Data Catalog on GCP and BigQuery

Every large corporation holds an enormous amount of data. Different departments use different data, and a person's role in the organisation should determine their level of access to sensitive information: some positions need this access, and others do not. For BigQuery, the solution is Data Catalog.

From Gartner: “A data catalog maintains an inventory of data assets through the discovery, description, and organisation of datasets. The catalog provides context to enable data analysts, data scientists, data stewards, and other data consumers to find and understand a relevant dataset for the purpose of extracting business value.”

When contractors like me join a data team, we need to become familiar with the data assets relevant to our work. By data assets, I mean datasets, tables, views, files, and data streams. BigQuery provides metadata such as schemas, lineage, and a tagging system, which makes it easy to navigate across many different tables.
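As a rough illustration of how that metadata can be reached programmatically, the sketch below builds the request URL for the Data Catalog REST method `entries.lookup`, which maps a BigQuery table to its catalog entry, and extracts the column names and types from the returned entry. It uses only the Python standard library and does not send the request: authentication (an OAuth 2.0 access token in an `Authorization: Bearer` header) is omitted, and the project, dataset, and table names are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Base URL of the Data Catalog v1 REST API.
DATACATALOG_API = "https://datacatalog.googleapis.com/v1"


def bigquery_linked_resource(project_id: str, dataset_id: str, table_id: str) -> str:
    """Full resource name that identifies a BigQuery table to Data Catalog."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def lookup_entry_url(project_id: str, dataset_id: str, table_id: str) -> str:
    """GET URL for entries.lookup, which returns the catalog entry of a table."""
    params = urlencode(
        {"linkedResource": bigquery_linked_resource(project_id, dataset_id, table_id)}
    )
    return f"{DATACATALOG_API}/entries:lookup?{params}"


def schema_summary(entry: dict) -> list[str]:
    """List 'column: TYPE' strings from the schema of a looked-up entry."""
    columns = entry.get("schema", {}).get("columns", [])
    return [f"{col['column']}: {col['type']}" for col in columns]


if __name__ == "__main__":
    # Hypothetical project, dataset, and table names.
    print(lookup_entry_url("my-project", "sales", "customers"))
```

The JSON returned by the lookup call can be passed to `schema_summary` after parsing, giving a quick overview of an unfamiliar table.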

Data Catalog tag templates enable the creation and management of standardised metadata about data assets. Tags based on these templates are attached to tables so that the data can be discovered quickly in Data Catalog. This way, we can quickly locate all the data sources we need and familiarise ourselves with their structure. For example, for data security purposes, someone may find all tables that store sensitive information and tag them, so that those tables can later be retrieved with a single search.
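A search for tagged tables can also be sent to the REST method `catalog.search`. The sketch below builds the request body for a search restricted to a list of projects. The template ID `data_sensitivity` and field ID `contains_pii` are hypothetical, and the `tag:<template>.<field>=<value>` query form is my reading of the Data Catalog search syntax, so treat it as an assumption and check it against the search-syntax reference before relying on it. Sending the request (a POST with an OAuth access token) is left out.

```python
import json

# catalog.search endpoint of the Data Catalog v1 REST API (called with POST).
SEARCH_URL = "https://datacatalog.googleapis.com/v1/catalog:search"


def tagged_tables_search(project_ids, template_id: str, field_id: str, value: str = "true") -> dict:
    """Request body that searches the given projects for entries whose tag
    (created from template_id) has field_id equal to value.

    ASSUMPTION: the `tag:<template>.<field>=<value>` qualifier follows the
    Data Catalog search syntax as I understand it; verify before use.
    """
    return {
        "scope": {"includeProjectIds": list(project_ids)},
        "query": f"tag:{template_id}.{field_id}={value}",
    }


if __name__ == "__main__":
    # Hypothetical project, template, and field IDs.
    body = tagged_tables_search(["my-project"], "data_sensitivity", "contains_pii")
    print(SEARCH_URL)
    print(json.dumps(body, indent=2))
```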

Creating tags in Data Catalog is quick and easy. First, create a BigQuery dataset and table, then open Data Catalog and enable the Data Catalog API. Next, go to Tag Templates in Data Catalog and create a new tag template.
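The console steps for the template have a REST counterpart, `projects.locations.tagTemplates.create`, where the template ID is passed as the `tagTemplateId` query parameter. The sketch below only builds the URL and JSON body for that call. The template itself (a required boolean flag, an enum of sensitivity levels, and a free-text owner field) is a hypothetical example rather than the author's template, and the project and location are placeholders.

```python
from urllib.parse import quote

# Base URL of the Data Catalog v1 REST API.
API = "https://datacatalog.googleapis.com/v1"


def create_tag_template_request(project_id: str, location: str, template_id: str):
    """Return (url, body) for a POST to projects.locations.tagTemplates.create.

    The display name, field IDs, and enum values below are a hypothetical
    data-sensitivity template.
    """
    parent = f"projects/{project_id}/locations/{location}"
    url = f"{API}/{parent}/tagTemplates?tagTemplateId={quote(template_id)}"
    body = {
        "displayName": "Data sensitivity",
        "fields": {
            # Required yes/no flag.
            "contains_pii": {
                "displayName": "Contains PII",
                "type": {"primitiveType": "BOOL"},
                "isRequired": True,
            },
            # Closed list of allowed values.
            "sensitivity_level": {
                "displayName": "Sensitivity level",
                "type": {
                    "enumType": {
                        "allowedValues": [
                            {"displayName": level}
                            for level in ("PUBLIC", "INTERNAL", "CONFIDENTIAL")
                        ]
                    }
                },
            },
            # Free-text field.
            "data_owner": {
                "displayName": "Data owner",
                "type": {"primitiveType": "STRING"},
            },
        },
    }
    return url, body
```

Using an enum for the sensitivity level keeps tag values consistent across teams, which is what makes later searches on that field reliable.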

Finally, search for the table and attach a tag based on the template, filling in its fields as appropriate.
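Attaching a tag corresponds to the REST method `projects.locations.entryGroups.entries.tags.create`, posted to the table's entry (whose `name` comes back from an entry lookup). The sketch below builds the URL and body, encoding each Python value as the matching typed tag field (`boolValue`, `doubleValue`, `stringValue`, or `enumValue`). The helper names, entry name, template name, and field values are hypothetical, and the request is not sent.

```python
# Base URL of the Data Catalog v1 REST API.
API = "https://datacatalog.googleapis.com/v1"


def tag_field_value(value, is_enum: bool = False) -> dict:
    """Encode one Python value as a Data Catalog tag field."""
    if is_enum:
        return {"enumValue": {"displayName": value}}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"boolValue": value}
    if isinstance(value, (int, float)):
        return {"doubleValue": float(value)}
    if isinstance(value, str):
        return {"stringValue": value}
    raise TypeError(f"unsupported tag value: {value!r}")


def create_tag_request(entry_name, template_name, values, enum_fields=(), column=None):
    """Return (url, body) for a POST to entries.tags.create.

    entry_name: the `name` of the table's catalog entry.
    template_name: full resource name of the tag template.
    values: mapping of field ID -> Python value.
    enum_fields: field IDs whose values are enum display names.
    column: optional column name, for a column-level tag.
    """
    body = {
        "template": template_name,
        "fields": {
            field_id: tag_field_value(value, field_id in enum_fields)
            for field_id, value in values.items()
        },
    }
    if column is not None:
        body["column"] = column
    return f"{API}/{entry_name}/tags", body
```

Passing `column` tags a single column instead of the whole table, which suits marking only the fields that hold sensitive data.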
