Dataplex is an intelligent data fabric that unifies distributed data and automates data management and governance. With Dataplex, you can use AI to simplify querying data, assuring its quality, and deriving business insights.
Dataplex performs governance at scale. Take, for example, a global retail company generating large amounts of sales, inventory, and customer data stored in Cloud Storage, Spanner, and Pub/Sub. With data distributed across systems, managing governance, ensuring quality, and maintaining compliance are complex and time-consuming. Dataplex simplifies this process by providing a central view to discover, profile, validate, track the lineage of, and control access to organizational data assets.
Why use Dataplex?
Dataplex governs data through the following features:
- Metadata cataloging. Retrieve metadata for Google Cloud resources (in BigQuery, Cloud SQL, Spanner, Vertex AI, Pub/Sub, Dataform, and Dataproc Metastore) and for third-party resources you bring into Dataplex, to get a snapshot of your data assets.
- Data discovery. Scan for structured and unstructured data in Cloud Storage buckets to extract and catalog their metadata.
- Data insights. Use AI to generate natural language questions about your data, to uncover patterns, assess data quality, and perform statistical analyses.
- Data profiling. Identify common characteristics of the column data in your BigQuery tables, for example, typical data values, data distribution, and null counts, which can inform data classification and quality assurance.
- Data quality. Define and measure the quality of the data in your BigQuery tables, by validating data against organizational policies and logging alerts if data doesn't meet quality criteria.
- Business glossary. Manage business-related terminology and definitions across your organization, and attach terms to table columns to promote a consistent understanding of data usage.
- Data lineage. Track how data moves through your systems: where it comes from, where it is passed to, and what transformations are applied to it.
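To make the data profiling and data quality features above more concrete, the following is a minimal conceptual sketch in plain Python. It does not call Dataplex or any Google Cloud API. The sample rows, the function names `profile_column` and `null_ratio_check`, and the `max_null_ratio` threshold are all hypothetical and chosen only for illustration. The sketch computes the kind of column statistics a profile might contain (null counts, distinct values, typical values) and then evaluates one simple quality rule against them.

```python
from collections import Counter

# Hypothetical sample rows standing in for a table; not real data.
rows = [
    {"sku": "A1", "price": 9.5},
    {"sku": "A2", "price": None},
    {"sku": "A1", "price": 12.0},
    {"sku": None, "price": 7.25},
]


def profile_column(rows, column):
    """Return simple profile statistics for one column (illustrative only)."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        # Most frequent non-null values, as (value, count) pairs.
        "top_values": Counter(non_null).most_common(3),
    }


def null_ratio_check(rows, column, max_null_ratio):
    """Hypothetical quality rule: pass if the share of nulls is within the limit."""
    if not rows:
        return True
    profile = profile_column(rows, column)
    return profile["null_count"] / len(rows) <= max_null_ratio


sku_profile = profile_column(rows, "sku")
price_ok = null_ratio_check(rows, "price", max_null_ratio=0.1)
```

In this sketch, the profile informs the rule: the null count gathered during profiling is what the quality check compares against its threshold. A managed service would additionally log an alert when a check like `price_ok` fails.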
Dataplex supports an end-to-end data lifecycle, from distributed discovery to business insights. Governance features are also available through BigQuery.
What's next
- Learn about BigQuery governance.
- Learn about BigQuery universal catalog.
- Learn how to search for data assets in BigQuery universal catalog.
- Learn how to manage entries and ingest custom sources.
- Learn how to import metadata into Dataplex.