Overview
WorksAudit is intended to be offered as SaaS, so the system has to:
- Be infinitely scalable.
- Cost as little as possible when unused.
- Support any kind of system’s audit requirements:
  - Be designed so that it can receive, process, and store any kind of audit log data.
  - Provide APIs through which both end-user applications and the in-house Audit Log Viewer can consume all the data fed into the system.
- Be multi-tenant and multi-landscape.
WorksAudit is composed of hundreds of components (implemented as AWS resources), and showing them all in one picture would be overwhelming. The basic idea behind how WorksAudit is structured, however, is quite simple, and understanding it makes the details easier to follow.
The following image is a simplified architecture of WorksAudit that shows the main idea of how the whole system is constructed:
This diagram can be explained as follows:
- An audit system has 3 main areas of focus:
  - Systems to be audited, shown on the left side. These systems produce audit logs and send them to the middle area.
  - The area in the middle, which receives the log data from the systems described above. The purpose of this area is:
    - To transform the incoming log data so that it is usable downstream.
    - To store the data in a suitable format and storage.
    - To make the data easily available to auditors.
  - Systems used by auditors (end users), shown on the right side. These systems consume the data processed, stored, and presented by the middle area described above.
- The systems in the left area are not developed or maintained by the WorksAudit team. However, they use SDKs developed by the WorksAudit team. These SDKs are usually called Producers.
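The three areas can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the actual WorksAudit implementation: the `AuditLog` fields, the `MiddleArea` class, and its `receive` and `query` methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuditLog:
    """A normalized audit log record (hypothetical fields)."""
    tenant: str
    activity: str
    detail: str


@dataclass
class MiddleArea:
    """Receives raw logs, transforms them, stores them, and serves queries."""
    _stored: list = field(default_factory=list)

    def receive(self, raw: dict) -> None:
        # Transform: normalize the raw input into a usable record.
        log = AuditLog(
            tenant=raw["tenant"].strip().lower(),
            activity=raw["activity"].strip(),
            detail=raw.get("detail", ""),
        )
        # Store: kept in memory here; the real system uses dedicated storage.
        self._stored.append(log)

    def query(self, tenant: str) -> list:
        # Make the stored data available to auditor-facing systems.
        return [log for log in self._stored if log.tenant == tenant]


# Left area: audited systems produce logs (normally through a Producer SDK).
middle = MiddleArea()
middle.receive({"tenant": " Acme ", "activity": "login", "detail": "user=alice"})
middle.receive({"tenant": "other", "activity": "logout"})

# Right area: an auditor-facing system consumes the stored data.
results = middle.query("acme")
```

The point of the sketch is the separation of concerns: producers only hand data over, the middle area owns transformation and storage, and consumers only read.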
The following diagram shows how the simplified architecture described in the previous section is implemented as AWS resources. It is still simplified and does not show the actual implementation details, but it may help in understanding how WorksAudit is structured in AWS:
The diagram can be read from left to right. The whole flow can be summarized as follows:
- The systems to be audited send the log data using the Producer SDK, which produces logs in Protobuf format.
- The logs are sent through AWS Kinesis Firehose to be stored in the WorksAudit central bucket (`wap-audit-central-{env}`, where `{env}` is the environment name).
- The data collected in the WorksAudit central bucket is then processed by a data transformation (ETL) process that runs as a job in AWS Glue.
- The ETL job produces a database in Parquet format to be used by Athena. This database is stored in the WorksAudit database bucket (`wap-audit-db-{env}`).
- In addition to log data, some other data is synchronized from other systems:
  - User authorization data.
  - Multilingual data.
- The data in the databases is then fetched by various APIs that expose it to external consumers:
  - Query (GraphQL) API is the main API for log data search and retrieval, used by the official WorksAudit Log Viewer.
  - Query Lite (REST) API is mainly intended for direct CSV search and download by end-user scripts or backend systems.
  - Reference API is mainly for getting static reference data, for example multilingual names. However, it also exposes some dynamic data such as system status.
  - Auth API is for authentication and authorization-related tasks. All API endpoints require authentication, and this is the set of APIs to use for establishing an authenticated connection.
  - Settings API is for retrieving and updating system settings.
- To simplify access to the APIs from client applications, we also develop an SDK called the Consumer SDK.
- The Audit Log Viewer is a single-page application (SPA) where the end user (customer) can search and view the result of the whole process.
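As a small illustration of the two ends of this flow, the sketch below derives the bucket names from an environment name and parses a CSV result such as the Query Lite API is described as returning. Only the two bucket name patterns come from this document. The helper names, the `dev` environment name, and the CSV columns are assumptions made for the example, and no real WorksAudit endpoint is called.

```python
import csv
import io

# Bucket name patterns as given in this document.
CENTRAL_BUCKET_TEMPLATE = "wap-audit-central-{env}"
DB_BUCKET_TEMPLATE = "wap-audit-db-{env}"


def bucket_names(env: str) -> dict:
    """Return the central and database bucket names for an environment."""
    if not env:
        raise ValueError("env must be a non-empty environment name")
    return {
        "central": CENTRAL_BUCKET_TEMPLATE.format(env=env),
        "database": DB_BUCKET_TEMPLATE.format(env=env),
    }


def parse_csv_result(body: str) -> list:
    """Parse a CSV response body into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(body)))


# "dev" is an assumed environment name used only for illustration.
buckets = bucket_names("dev")

# Hypothetical CSV body; the real column names are not specified here.
rows = parse_csv_result("activity,tenant\nlogin,acme\n")
```

A script consuming CSV output this way needs nothing beyond the standard library, which fits the stated purpose of Query Lite for end-user scripts and backend systems.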
- Activity Specification. This contains the master list of all activity types supported by WorksAudit and the external systems that produce the activities.
- HUE Producer Specification. This specifies the details of the data to be sent by HUE producers.
- Collabo/WorksSuite Producer Specification. This specifies the details of the data to be sent by Collabo/WorksSuite producers.
- EBM Producer Specification. This specifies the details of the data to be sent by EBM producers.
- AC Producer Specification. This specifies how to map AC activities to WorksAudit’s.
- Expense Producer Specification. This specifies how to map Expense activities to WorksAudit’s.
- ID Structure Specification. This specifies how the IDs in WorksAudit are structured and how they relate to the IDs in the log data source systems.
| Project | Module | Description |
|---|---|---|
| wap-audit-core | | Hosts the WorksAudit Protobuf definition. It also has the Athena table definition and old Wiki. |
| wap-audit-auth | | Everything related to authentication and authorization. |
| | wap-audit-lambda-clientid-resolver | Lambda for resolving client ID given tenant and landscape information. This is a part of authentication flow. |
| | wap-audit-lambda-cognito-user-sync | Lambda for synchronizing user data into Cognito user database. |
| | wap-audit-lambda-hue-iam-restore | Lambda for restoring DynamoDB user data from synchronized user data. |
| | wap-audit-lambda-hue-iam-sync | Lambda for updating DynamoDB user table from user data synchronized from HUE. |
| | wap-audit-lambda-hue-ip-restriction-sync | Lambda for updating IP restriction data in DynamoDB settings table from IP restriction data synchronized from HUE. |
| | wap-audit-lambda-hue-username-sync | |