Protobuf
Protocol Buffers (Protobuf) was selected as the base format for WorksAudit data for the following reasons:
- Protobuf can define a general data structure that can be extended as necessary while remaining both backward and forward compatible, as long as existing field numbers are never changed or reused: future libraries can read data written with an older version of the schema, and current libraries can read data written with a future version.
- Protobuf can be used to automatically generate libraries to produce/consume the data (serializer/deserializer) in most programming languages.
- By having a base data structure, we can have a single source of truth for the kind of data that we’re handling in the system.
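The compatibility point above can be illustrated with a simplified model. The sketch below is not the Protobuf wire format or API: it models a serialized message as a plain mapping from field number to value, and `SCHEMA_V1`, `SCHEMA_V2`, and `decode` are hypothetical names introduced only for illustration. It shows why a reader that skips unknown field numbers and defaults missing ones can handle both older and newer data.

```python
# Simplified model of Protobuf-style schema evolution (not the real wire format).
# A "message" is a dict of field number -> value; a "schema" maps
# field number -> (field name, default value). All names are hypothetical.

SCHEMA_V1 = {1: ("id", ""), 2: ("userOperationId", "")}
# V2 only adds a new field number; existing numbers keep their meaning.
SCHEMA_V2 = {**SCHEMA_V1, 3: ("sequenceNo", 0)}


def decode(schema, encoded):
    """Decode `encoded` with `schema`, skipping unknown field numbers
    and filling absent fields with their default value."""
    decoded = {name: default for name, default in schema.values()}
    for number, value in encoded.items():
        if number in schema:  # unknown numbers are skipped, not rejected
            name, _ = schema[number]
            decoded[name] = value
    return decoded


old_data = {1: "a-1", 2: "op-9"}        # written by a V1 producer
new_data = {1: "a-2", 2: "op-9", 3: 4}  # written by a V2 producer

# Backward compatible: a V2 reader accepts V1 data.
v2_reads_old = decode(SCHEMA_V2, old_data)
# Forward compatible: a V1 reader accepts V2 data.
v1_reads_new = decode(SCHEMA_V1, new_data)
```

In this model the V2 reader sees `sequenceNo` at its default value for old data, and the V1 reader simply does not see the field it does not know about.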
The source of the WorksAudit Protobuf structure is https://scm.hue.workslan/cloud-service/wap-audit-core/blob/master/spec/activity.proto.
To explore the WorksAudit data structure, it’s best to start with Activity:
// Represents an activity or event that happened in the system.
message Activity {
  // Globally unique ID (UUID) identifying an activity.
  // In HUE, this is generated when an activity record is converted from raw
  // log to Parquet row.
  string id = 1;
  // The globally unique ID of the user operation of which the activity is
  // a part. A user operation is an ordered sequence of individual activities.
  // In HUE, this is roughly equivalent to "correlationId".
  string userOperationId = 2;
  // 0-based sequence number of the activity within the process identified by
  // processId. There is no equivalent field in HUE. In HUE, activities with
  // the same "correlationId" are ordered by timestamp.
  // Same sequence number with the same processId suggests that the activities
  // happened in parallel.
  uint32 sequenceNo = 3;
  // UNIX timestamp recording the time the activity/event happened.
  uint64 timestamp = 4;
  // IANA TZ database code (e.g. "Asia/Tokyo")
  string timeZone = 5;
  // The context of the activity, mostly containing request information, e.g.
  // host, user-agent, HTTP method, URL, referer, request token, session token,
  // etc.
  Map context = 6;
  // The user that does the activity or participates in an event.
  User actor = 7;
  // The detail of the activity being done by the actor.
  ActivityClause activity = 8;
  // The location where the activity/event happened.
  Location location = 9;
  // The ID of the multilingual template for the text that describes
  // the activity.
  string description = 10;
  // The original user that impersonates the actor.
  User impersonator = 11;
}
Activity is the basic data structure for audit. Each audit log entry (record) represents an activity that happened in the system being audited.
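As a rough illustration of how the identifying fields fit together, the sketch below uses a plain Python dataclass as a stand-in for a subset of the Activity message; it is not the generated Protobuf class. `ActivityRecord` and `order_within_operation` are hypothetical names, the timestamp is assumed to be in seconds, and ordering by `sequenceNo` within one `userOperationId` is an assumption of this sketch (the definition ties `sequenceNo` to a `processId` that is not shown here).

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class ActivityRecord:
    """Plain-Python stand-in for a subset of the Activity message."""
    userOperationId: str
    sequenceNo: int
    timeZone: str
    # Generated per record, like the UUID described for the `id` field.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # UNIX timestamp; seconds are an assumption of this sketch.
    timestamp: int = field(default_factory=lambda: int(time.time()))


def order_within_operation(records, user_operation_id):
    """Return the records of one user operation ordered by sequenceNo,
    using the timestamp only as a tie-breaker."""
    selected = [r for r in records if r.userOperationId == user_operation_id]
    return sorted(selected, key=lambda r: (r.sequenceNo, r.timestamp))


records = [
    ActivityRecord("op-1", 1, "Asia/Tokyo", timestamp=1700000005),
    ActivityRecord("op-2", 0, "Asia/Tokyo", timestamp=1700000001),
    ActivityRecord("op-1", 0, "Asia/Tokyo", timestamp=1700000003),
]
ordered = order_within_operation(records, "op-1")
```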
This basic structure uses other structures defined elsewhere in the Protobuf file. The following sections describe each of these structures.
Map is a collection of key-value mappings, defined as follows:
// A collection of key and value pair, with additional metadata to describe
// the data type of the value.
message Map {
  // the collection of pairs.
  repeated Pair entries = 1;
  // The pair of String key and (typed) value.
  message Pair {
    string key = 1;
    Value value = 2;
    //...
  }
  //...
}
The value itself can be of different types, defined as follows:
// the actual content of the value stored in
message Value {
  // data type of the value's content
  enum ValueType {
    DOUBLE = 0;
    FLOAT = 1;
    INT32 = 2;
    INT64 = 3;
    UINT32 = 4;
    UINT64 = 5;
    SINT32 = 6;
    SINT64 = 7;
    FIXED32 = 8;
    FIXED64 = 9;
    SFIXED32 = 10;
    SFIXED64 = 11;
    BOOL = 12;
    STRING = 13;
    MULTILINGUAL_TEXT = 14;
    BYTES = 15;
    MAP = 16;
    Nil = 17; // used when the content needs to be explicitly set as empty.
    LIST = 18;
    TIMESTAMP = 19; // unix timestamp (down to second)
  }
  // Data type mostly equivalent to Protobuf's built-in types,
  // with some additional types meaningful only for the audit system.
  ValueType type = 1;
  oneof content {
    double doubleValue = 2;
    float floatValue = 3;
    int32 int32Value = 4;
    int64 int64Value = 5;
    uint32 uint32Value = 6;
    uint64 uint64Value = 7;
    sint32 sint32Value = 8;
    sint64 sint64Value = 9;
    fixed32 fixed32Value = 10;
    fixed64 fixed64Value = 11;
    sfixed32 sfixed32Value = 12;
    sfixed64 sfixed64Value = 13;
    bool boolValue = 14;
    string stringValue = 15;
    // Multilingual text ID. The value stored should be used as
    // a reference to get the actual text in a specific language
    // used when viewing the data.
    string multilingualTextValue = 16;
    bytes bytesValue = 17;
    Map mapValue = 18;
    ValueList listValue = 19;
    // unix timestamp (down to second)
    uint64 timestampValue = 20;
  }
}
message ValueList {
  repeated Value value = 1;
}
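The pairing of a `type` tag with exactly one `content` field can be sketched in Python using plain dictionaries shaped like the Value, Map, and ValueList messages. This is only an illustration, not the generated Protobuf API: `to_value` is a hypothetical helper, the mapping from Python types to ValueType (for example `int` to INT64 and `float` to DOUBLE) is an assumption, and leaving the content unset for Nil is likewise assumed.

```python
def to_value(obj):
    """Wrap a Python object as a dict shaped like the Value message:
    a `type` tag plus the single matching `content` field."""
    if obj is None:
        return {"type": "Nil"}  # explicitly empty; no content field (assumed)
    if isinstance(obj, bool):  # must precede int: bool is a subclass of int
        return {"type": "BOOL", "boolValue": obj}
    if isinstance(obj, int):
        return {"type": "INT64", "int64Value": obj}
    if isinstance(obj, float):
        return {"type": "DOUBLE", "doubleValue": obj}
    if isinstance(obj, str):
        return {"type": "STRING", "stringValue": obj}
    if isinstance(obj, bytes):
        return {"type": "BYTES", "bytesValue": obj}
    if isinstance(obj, dict):
        # A nested Map: repeated Pair entries, each with a key and a typed value.
        entries = [{"key": str(k), "value": to_value(v)} for k, v in obj.items()]
        return {"type": "MAP", "mapValue": {"entries": entries}}
    if isinstance(obj, (list, tuple)):
        # A ValueList: repeated Value under the `value` field.
        return {"type": "LIST", "listValue": {"value": [to_value(v) for v in obj]}}
    raise TypeError(f"unsupported type: {type(obj).__name__}")


context = to_value({"method": "GET", "retries": 2, "secure": True, "tags": ["a", None]})
```

The `bool` check comes before the `int` check because Python treats `True` and `False` as integers; reversing the order would tag booleans as INT64.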
The User structure represents a user involved in the activity being logged. It is defined as follows:
// Represents a user that does an activity, participates in an event, or is
// a target in an activity.
// Examples:
// When the user does an activity: _Alice_ submits an application.
// When a user becomes a target of an activity:
message User {
  string id = 1;
  string name = 2;
  UserType type = 3;
  enum UserType {
    USER = 0; // Standard user of the system.
    CLIENT = 1; // Access using client ID/API key.
  }
  // Data source (DB, schema, table) that the ID is referring to.
  string ref = 4;
}
The Location structure represents a location where the activity or event happened. It is defined as follows:
// Represents a location where an activity/event happened, or a location
// that is a target of an activity. This can represent a service,
// a subsystem, a page, an organizational unit, etc.
// Examples:
// Where an activity happened: Login is authenticated in the _auth_ system.
// When becoming a target of an activity: Alice disabled the access to the
// _accounting_system_.
message Location {
  // The ID of a location.
  string id = 1;
  // Optional ID of the multilingual description of the location with ID as
  // specified, or the name as is (not to be resolved to multilingual value).
  oneof name {
    string value = 2;
    bool nil = 3;
  }
  // Data source (DB, schema, table) that the ID is referring to.
  string ref = 4;
}
The activity details are represented by the ActivityClause structure:
message ActivityClause {
  ActivityCategory category = 1;
  ActivityVerb verb = 2;
  ActivityVocabulary object = 3;
  ActivityVocabulary specifier = 4;
  ActivityVocabulary preposition = 5;
  // ......
}
The values of these five fields determine the type of activity that the log record represents. For example, to log a “web page open” activity, we set:
- category = ACTIVITY_CATEGORY_WEB
- verb = ACTIVITY_VERB_OPEN
- object = ACTIVITY_WORD_PAGE
category and verb are required; the other fields are optional. When an optional field is not specified, it should be set to ACTIVITY_WORD_NIL.
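The required/optional rule can be sketched as a small builder. This uses plain dictionaries and enum names as strings rather than the generated Protobuf classes; `make_clause` and its parameter names are hypothetical and exist only to illustrate the rule.

```python
NIL = "ACTIVITY_WORD_NIL"


def make_clause(category, verb, object_=NIL, specifier=NIL, preposition=NIL):
    """Build a dict shaped like ActivityClause.

    category and verb are required; any optional field that the caller
    leaves out is set to ACTIVITY_WORD_NIL.
    """
    if not category or not verb:
        raise ValueError("category and verb are required")
    return {
        "category": category,
        "verb": verb,
        "object": object_,
        "specifier": specifier,
        "preposition": preposition,
    }


# The "web page open" activity from the example above.
page_open = make_clause(
    "ACTIVITY_CATEGORY_WEB", "ACTIVITY_VERB_OPEN", "ACTIVITY_WORD_PAGE"
)
```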
Depending on the value of category, the activity may carry different details, represented by extension:
message ActivityClause {
  // ......
  // The actual data content of the activity. Depending on the type of activity,
  // the data might be stored in one of the following structures.
  oneof extension {
    DataActivity data = 7;
    WebActivity web = 8;
    // authentication and account category uses AccountActivity extension
    AccountActivity account = 9;
    AuthorityActivity authorization = 10;
    ApprovalActivity approval = 11;
    ReportActivity report = 12;
  }
}
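Because extension is a oneof, at most one of its fields is set on any given record. The sketch below models that constraint on a plain dictionary; `which_extension` is a hypothetical helper, and the `url` key inside the web payload is invented for illustration because the fields of WebActivity are not shown here. With the generated Python classes, the standard `WhichOneof("extension")` method serves the same purpose.

```python
# Field names taken from the `extension` oneof in ActivityClause.
EXTENSION_FIELDS = ("data", "web", "account", "authorization", "approval", "report")


def which_extension(clause):
    """Return the name of the extension field that is set, or None.

    Raises ValueError if more than one is set, which a oneof does not allow.
    """
    set_fields = [name for name in EXTENSION_FIELDS if clause.get(name) is not None]
    if len(set_fields) > 1:
        raise ValueError(f"only one extension may be set, got {set_fields}")
    return set_fields[0] if set_fields else None


clause = {
    "category": "ACTIVITY_CATEGORY_WEB",
    "verb": "ACTIVITY_VERB_OPEN",
    "object": "ACTIVITY_WORD_PAGE",
    "web": {"url": "https://example.com/"},  # hypothetical WebActivity content
}
extension_name = which_extension(clause)
```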
An activity may also have a “sub-activity type”, as expressed by the following block:
message ActivityClause {
  // ......
  repeated SubActivityClause aliases = 13;
  // Represents an alias of the ActivityClause
  message SubActivityClause {
    ActivityCategory category = 1;
    ActivityVerb verb = 2;
    ActivityVocabulary object = 3;
    ActivityVocabulary specifier = 4;
    ActivityVocabulary preposition = 5;
  }
}
An example of such an activity is:
- category = ACTIVITY_CATEGORY_DATA
- verb = ACTIVITY_VERB_OPERATE
- object = ACTIVITY_WORD_DATA
So this activity is “Operating on data”. This is a very general statement that may raise some questions:
- What kind of operation is it?
- What kind of data is being operated on?
Let’s say the operation is actually “sharing a template”. In that case, the following fields are set:
- aliases[0].category = ACTIVITY_CATEGORY_DATA
- aliases[0].verb = ACTIVITY_VERB_SHARE
- aliases[0].object = ACTIVITY_WORD_TEMPLATE
The reason for having a “sub-activity”, instead of specifying these details directly in the main category, verb, and other fields, is that the activity can then be searched (in the Viewer) by its generic activity type, without the Viewer having to present every possible detailed activity type.
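How a search could take both levels into account can be sketched as follows. This is only an illustration with plain dictionaries, not the Viewer's actual search logic; `triple` and `matches` are hypothetical helpers, and only category, verb, and object are compared.

```python
NIL = "ACTIVITY_WORD_NIL"


def triple(clause):
    """Return the (category, verb, object) triple of a clause or alias."""
    return (clause["category"], clause["verb"], clause.get("object", NIL))


def matches(clause, category, verb, object_=NIL):
    """True if the main clause or any of its aliases equals the wanted triple."""
    wanted = (category, verb, object_)
    if triple(clause) == wanted:
        return True
    return any(triple(alias) == wanted for alias in clause.get("aliases", []))


# "Operating on data", refined by an alias to "sharing a template".
share_template = {
    "category": "ACTIVITY_CATEGORY_DATA",
    "verb": "ACTIVITY_VERB_OPERATE",
    "object": "ACTIVITY_WORD_DATA",
    "aliases": [
        {
            "category": "ACTIVITY_CATEGORY_DATA",
            "verb": "ACTIVITY_VERB_SHARE",
            "object": "ACTIVITY_WORD_TEMPLATE",
        }
    ],
}

found_by_generic = matches(
    share_template, "ACTIVITY_CATEGORY_DATA", "ACTIVITY_VERB_OPERATE", "ACTIVITY_WORD_DATA"
)
found_by_detail = matches(
    share_template, "ACTIVITY_CATEGORY_DATA", "ACTIVITY_VERB_SHARE", "ACTIVITY_WORD_TEMPLATE"
)
```

With this shape, a search on the generic "operate on data" type finds the record, and a more specific search on "share template" finds it as well.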
The Protobuf definition was originally designed using a traditional ERD, shown below for reference: