How To Build A Centralized Data Catalog System
January 21, 2021
Under the EU General Data Protection Regulation (GDPR), the sensitive information contained in users' personal data must be protected. Europe's fastest-growing neobroker therefore decided to encrypt its users' sensitive personal data so that the raw values never reached its database, which it considered incompatible with its GDPR obligations. Implementing a data catalog system allowed the neobroker to comply with GDPR while continuing to draw insights from its data to inform business decisions.
The Data2Bots engineering team built a centralized data catalog system that protects users' sensitive information, such as social security numbers and bank verification details, by encrypting PII (personally identifiable information). All three phases of the project (strategy, preparation, and implementation) were carried out in-house by our team of expert engineers.
The key technologies used were Google Cloud Data Catalog, a fully managed and highly scalable data discovery and metadata management service, and Google Cloud Pub/Sub, an asynchronous messaging service that gives microservices a simple way to communicate with one another.
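To illustrate the publish/subscribe model that Pub/Sub provides, here is a minimal in-memory sketch. It is not the Google Cloud Pub/Sub client library: the `InMemoryPubSub` class and its methods are hypothetical and only mimic the core semantics, namely that every subscription attached to a topic receives its own copy of each message published after the subscription was created.

```python
from collections import defaultdict, deque


class InMemoryPubSub:
    """Hypothetical, in-memory stand-in for a Pub/Sub service."""

    def __init__(self) -> None:
        # topic name -> names of the subscriptions attached to it
        self._topics = defaultdict(list)
        # subscription name -> queue of messages waiting to be pulled
        self._queues = {}

    def create_subscription(self, topic: str, subscription: str) -> None:
        self._topics[topic].append(subscription)
        self._queues[subscription] = deque()

    def publish(self, topic: str, data: bytes) -> None:
        # Fan out: each subscription receives the message independently.
        for subscription in self._topics[topic]:
            self._queues[subscription].append(data)

    def pull(self, subscription: str, max_messages: int = 10) -> list[bytes]:
        # Remove and return up to max_messages pending messages, oldest first.
        queue = self._queues[subscription]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


if __name__ == "__main__":
    bus = InMemoryPubSub()
    bus.create_subscription("user-events", "encryption-service")
    bus.create_subscription("user-events", "audit-log")
    bus.publish("user-events", b'{"user_id": 1}')
    print(bus.pull("encryption-service"))  # [b'{"user_id": 1}']
    print(bus.pull("audit-log"))           # [b'{"user_id": 1}']
```

Because a producer only knows the topic name, a consumer such as an encryption service can be added or replaced without touching the publisher, which is the decoupling that makes this pattern a good fit for microservices.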
Google Cloud Data Catalog is a fully managed, scalable metadata management service that lets organisations quickly discover, understand, and manage their data in Google Cloud. As a centralized service run entirely by Google Cloud, it maintains an optimized search index over the data assets that belong to an organisation's GCP (Google Cloud Platform) projects.
Data assets here means datasets, tables, views, text/CSV files, spreadsheets, and data streams. To build its index, Data Catalog relies on each asset's metadata, i.e., its name, description, and column definitions.
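The idea of building a search index from asset metadata can be sketched with a small inverted index. This is only an illustration that assumes the name, description, and column names are the indexed fields; Data Catalog's real index is managed by Google, and its query syntax and ranking are not reproduced here. `AssetMetadata`, `MetadataIndex`, and `tokenize` are hypothetical names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AssetMetadata:
    """Hypothetical metadata record for one data asset (table, view, file, ...)."""
    name: str
    description: str = ""
    columns: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    # Lower-case and split on anything that is not a letter or digit,
    # so "user_id" yields {"user", "id"}.
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


class MetadataIndex:
    """Inverted index from metadata terms to asset names."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # term -> names of assets containing it

    def add(self, asset: AssetMetadata) -> None:
        text = " ".join([asset.name, asset.description, *asset.columns])
        for term in tokenize(text):
            self._postings[term].add(asset.name)

    def search(self, query: str) -> set[str]:
        # Return assets whose metadata contains every query term (AND semantics).
        terms = tokenize(query)
        if not terms:
            return set()
        matches = [self._postings.get(t, set()) for t in terms]
        return set.intersection(*matches)


if __name__ == "__main__":
    index = MetadataIndex()
    index.add(AssetMetadata("users", "Registered customers", ["user_id", "email", "iban"]))
    index.add(AssetMetadata("trades", "Executed orders", ["trade_id", "user_id", "isin"]))
    print(sorted(index.search("user_id")))  # ['trades', 'users']
    print(index.search("iban"))             # {'users'}
```

Indexing only metadata, never row values, is what lets a catalog describe tables that hold sensitive data without copying that data into the index.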
The team protected sensitive PII with an encryption solution built on standard encryption algorithms. Data flowing into the data catalog was encrypted by inspecting ("sniffing") the schema of the incoming JSON data and using intelligent tags to identify PII fields, which were then encrypted with encryption keys and initialisation vectors obtained from a key service.
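The flow described above (inspect the schema, find PII-tagged fields, fetch a key and an IV from a key service, encrypt) can be sketched as follows. Everything here is an assumption for illustration: the `KeyService` class, the `"pii"` tag convention, and the `SCHEMA` layout are hypothetical, and the cipher is a toy HMAC-SHA256 counter-mode keystream, used only because Python's standard library has no AES. A production system should use a vetted authenticated cipher such as AES-GCM from an established cryptography library, with keys held in a managed key store.

```python
import base64
import hashlib
import hmac
import os


class KeyService:
    """Hypothetical key service: holds named keys and hands out fresh IVs."""

    def __init__(self) -> None:
        self._keys = {}

    def get_key(self, key_id: str) -> bytes:
        # Create a 256-bit key on first use; a real service would fetch it from a KMS.
        if key_id not in self._keys:
            self._keys[key_id] = os.urandom(32)
        return self._keys[key_id]

    def new_iv(self) -> bytes:
        # A fresh random 16-byte IV for every encryption.
        return os.urandom(16)


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # TOY cipher: HMAC-SHA256 over (IV || counter) as a keystream. Not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_value(value: str, key: bytes, iv: bytes) -> str:
    data = value.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, iv, len(data))))
    # Prepend the IV so the value can be decrypted later with the same key.
    return base64.b64encode(iv + cipher).decode("ascii")


def decrypt_value(token: str, key: bytes) -> str:
    raw = base64.b64decode(token)
    iv, cipher = raw[:16], raw[16:]
    data = bytes(a ^ b for a, b in zip(cipher, _keystream(key, iv, len(cipher))))
    return data.decode("utf-8")


# Hypothetical schema: each field carries tags; "pii" marks fields to encrypt.
SCHEMA = {
    "user_id": {"type": "string", "tags": []},
    "ssn": {"type": "string", "tags": ["pii"]},
    "bank_account": {"type": "string", "tags": ["pii"]},
    "country": {"type": "string", "tags": []},
}


def encrypt_pii(record: dict, schema: dict, keys: KeyService, key_id: str = "pii-v1") -> dict:
    """Return a copy of `record` with every PII-tagged field encrypted."""
    key = keys.get_key(key_id)
    result = {}
    for name, value in record.items():
        tags = schema.get(name, {}).get("tags", [])
        if "pii" in tags and value is not None:
            result[name] = encrypt_value(str(value), key, keys.new_iv())
        else:
            result[name] = value
    return result


if __name__ == "__main__":
    keys = KeyService()
    record = {"user_id": "u-42", "ssn": "123-45-6789", "country": "DE"}
    safe = encrypt_pii(record, SCHEMA, keys)
    print(safe["country"])                                  # DE
    print(decrypt_value(safe["ssn"], keys.get_key("pii-v1")))  # 123-45-6789
```

The key service hands out a new IV for every value on purpose: reusing an IV with the same key in a counter-style stream cipher would expose the XOR of the two plaintexts.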
All of these services, including the schema validator, the encryption service, and the key service, were built by the Data2Bots extended team.
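A schema validator of the kind listed above can be as simple as checking each incoming JSON message against the expected fields and types before it reaches the encryption step. The sketch below assumes a hypothetical `EXPECTED` schema of `(type, required)` pairs; the actual validator's rules are not described in this post.

```python
import json

# Hypothetical expected schema: field name -> (Python type, required?)
EXPECTED = {
    "user_id": (str, True),
    "ssn": (str, True),
    "country": (str, False),
    "age": (int, False),
}


def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in record:
            if required:
                errors.append(f"missing required field '{name}'")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected_type) or (
            isinstance(value, bool) and expected_type is not bool
        )
        if wrong_type:
            errors.append(
                f"field '{name}' should be {expected_type.__name__}, got {type(value).__name__}"
            )
    for name in record:
        if name not in schema:
            errors.append(f"unexpected field '{name}'")
    return errors


def validate_message(payload: bytes, schema: dict) -> list[str]:
    """Parse a raw JSON message and validate it against `schema`."""
    try:
        record = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level JSON value must be an object"]
    return validate(record, schema)


if __name__ == "__main__":
    print(validate_message(b'{"user_id": "u-1", "ssn": "123"}', EXPECTED))  # []
    print(validate_message(b'{"user_id": "u-1"}', EXPECTED))  # ["missing required field 'ssn'"]
```

Rejecting unexpected fields is a deliberate choice in this sketch: a field the schema does not know about carries no PII tag, so letting it through could allow unencrypted personal data to reach the catalog.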
With GDPR compliance in place, the company has improved its data handling and is deriving better insights that add business value. The data catalog system also ensures strong data governance across the organization, helping everyone who works with data to find, understand, trust, and govern it.
A data catalog system is one of the modern data solutions that can help your organisation inventory and organise its data. Data2Bots is a data solutions provider focused on solving modern data challenges, and our team of experts can build the data infrastructure your business needs to grow. Reach out to us by sending an email to firstname.lastname@example.org.
Ezeja Jennifer