Catalog

The Catalog is a centralized repository for managing and organizing data assets across your organization. It provides metadata management, data discovery, and automated schema tracking.

Overview

The Catalog enables you to discover, browse, and manage your data assets across multiple vault connections. It automatically extracts metadata, tracks schema changes, and provides detailed information about your tables and datasets.

Catalog Interface

The Catalog interface consists of three main sections:

Discover Catalog

Vault Selection: Choose from your configured vaults to discover available data sources
Dataset/Database Selection: Browse available databases or datasets within the selected vault
Table Discovery: View and select multiple tables for cataloging
Bulk Operations: Catalog multiple tables simultaneously

Browse Catalog

Search and Filter: Find cataloged tables using various filters
Metadata Viewing: Access comprehensive table and column information
Schema Details: View column names, data types, and nullability
Version History: Track schema changes and updates over time

Manage Catalog

Table Management: Add or remove tables from the catalog
Bulk Deletion: Remove multiple catalog entries at once
Maintenance Operations: Perform catalog cleanup and optimization

Using the Catalog

Discovering and Cataloging Data

1. Navigate to Discover Catalog
- Select the vault type (e.g., BigQuery, PostgreSQL, Snowflake)
- Choose the specific vault connection from the dropdown
- Select the target dataset or database
2. Select Tables for Cataloging
- View available tables in the selected dataset
- Use multi-select to choose tables (e.g., "customers", "orders", "orders_partitioned")
- Click "Catalog" to add the selected tables to the catalog
3. Automatic Metadata Extraction
- DeepDQ automatically extracts table schemas
- It captures column information, including names, types, and constraints
- It creates initial catalog entries with this metadata
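To make the idea of automatic metadata extraction concrete, here is a minimal sketch of reading column names, types, and nullability from a database's own metadata. It uses Python's built-in sqlite3 module purely as a stand-in source. How DeepDQ extracts schemas internally is not described here, and the function name extract_schema and the returned record shape are assumptions made for illustration.

```python
import sqlite3


def extract_schema(conn, table):
    """Return one record per column: name, declared type, and nullability.

    Illustrative only: SQLite stands in for a real vault connection.
    The table name is interpolated directly, so it must be trusted input.
    """
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"name": row[1], "type": row[2], "nullable": not row[3]}
        for row in rows
    ]


# Build a small in-memory table to extract metadata from
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER NOT NULL, name TEXT, signup_date DATE)"
)

schema = extract_schema(conn, "customers")
for column in schema:
    print(column)
```

Each record carries the same three pieces of information the catalog shows per column: a name, a data type, and whether the column accepts nulls.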

Browsing Cataloged Data

1. Access Browse Catalog
- View all cataloged tables across your organization
- Filter by vault type, dataset, or table name
- Search for specific tables or data assets
2. View Table Details
- Click on any table to view detailed information
- Basic Information: dataset, GCP project (for BigQuery), and source vault
- Schema Details: Complete column listing with data types
- Version Information: Creation and update timestamps
- Metadata: Comprehensive table documentation
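The filters described above (vault type, dataset, and table name) behave like simple predicates combined with AND. The sketch below illustrates that logic over a hypothetical in-memory list of catalog records. The record fields and the filter_catalog function are assumptions for illustration, not a DeepDQ API.

```python
# Hypothetical catalog records; field names are illustrative only
CATALOG = [
    {"table": "customers", "dataset": "sales", "vault_type": "BigQuery"},
    {"table": "orders", "dataset": "sales", "vault_type": "BigQuery"},
    {"table": "orders_partitioned", "dataset": "sales", "vault_type": "BigQuery"},
    {"table": "invoices", "dataset": "finance", "vault_type": "PostgreSQL"},
]


def filter_catalog(entries, vault_type=None, dataset=None, name_contains=None):
    """Keep entries that match every filter that was supplied.

    A filter left as None is ignored. The name filter is a
    case-insensitive substring match.
    """
    results = []
    for entry in entries:
        if vault_type is not None and entry["vault_type"] != vault_type:
            continue
        if dataset is not None and entry["dataset"] != dataset:
            continue
        if name_contains is not None and name_contains.lower() not in entry["table"].lower():
            continue
        results.append(entry)
    return results


matches = filter_catalog(CATALOG, vault_type="BigQuery", name_contains="order")
print([entry["table"] for entry in matches])
```

Leaving a filter unset widens the result, so calling filter_catalog with no filters returns every record.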

Table Information Display

Each cataloged table shows:

Table Name: Full table identifier
Source Information:
- Dataset/Database name
- GCP Project ID (for BigQuery)
- Source vault connection
Schema Version: Current version with timestamp
Last Updated: Most recent schema update
Column Details:
- Column names and data types (INT64, STRING, NUMERIC, DATE)
- Nullability (NULLABLE/NOT NULL modes)
- Complete schema structure
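The fields listed above map naturally onto a small record type. The following sketch models one catalog entry as Python dataclasses. The class names, field names, and example values are all hypothetical and chosen to mirror the display, not taken from DeepDQ itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str  # e.g. "INT64", "STRING", "NUMERIC", "DATE"
    nullable: bool  # True for NULLABLE, False for NOT NULL


@dataclass
class CatalogEntry:
    table_name: str
    dataset: str
    vault: str
    schema_version: int
    last_updated: datetime
    columns: List[Column] = field(default_factory=list)
    gcp_project: Optional[str] = None  # only meaningful for BigQuery vaults

    def describe(self) -> str:
        """Render the entry as text: a header line, then one line per column."""
        lines = [f"{self.dataset}.{self.table_name} (v{self.schema_version})"]
        for col in self.columns:
            mode = "NULLABLE" if col.nullable else "NOT NULL"
            lines.append(f"  {col.name}: {col.data_type} {mode}")
        return "\n".join(lines)


# Example values are made up for illustration
entry = CatalogEntry(
    table_name="customers",
    dataset="sales",
    vault="bq-prod",
    schema_version=1,
    last_updated=datetime(2024, 1, 1, tzinfo=timezone.utc),
    columns=[Column("id", "INT64", False), Column("name", "STRING", True)],
    gcp_project="example-project",
)
print(entry.describe())
```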

Managing Catalog Entries

1. Navigate to Manage Catalog
- Select vault type and specific vault
- View tables available for deletion or management
2. Remove Tables from Catalog
- Select tables to remove from cataloging
- Use bulk operations for multiple table deletion
- Confirm removal with "Delete Selected Tables"
3. Catalog Maintenance
- Regular cleanup of outdated entries
- Bulk management operations
- Schema version management

Key Features

Automated Schema Discovery

Multi-Vault Support: Discover tables across different database types
Real-time Schema Extraction: Automatic metadata collection
Schema Versioning: Track changes over time
Bulk Operations: Catalog multiple tables efficiently

Comprehensive Metadata Management

Column-Level Details: Data types, constraints, and nullability
Source Tracking: Vault and dataset origin information
Version Control: Schema change history and timestamps
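Schema versioning comes down to comparing two snapshots of a table's columns. The sketch below shows one simple way to compute such a difference: columns added, columns removed, and columns whose type or nullability changed. It is an illustration under assumptions. The Schema representation and the diff_schemas function are hypothetical, and DeepDQ's own change detection may work differently.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot format: column name -> (data type, nullable)
Schema = Dict[str, Tuple[str, bool]]


def diff_schemas(old: Schema, new: Schema) -> Dict[str, List[str]]:
    """Compare two schema snapshots and list the column names that differ."""
    old_names, new_names = set(old), set(new)
    return {
        "added": sorted(new_names - old_names),
        "removed": sorted(old_names - new_names),
        # Present in both snapshots, but type or nullability differs
        "changed": sorted(
            name for name in old_names & new_names if old[name] != new[name]
        ),
    }


v1 = {"id": ("INT64", False), "name": ("STRING", True), "total": ("NUMERIC", True)}
v2 = {"id": ("INT64", False), "name": ("STRING", False), "created": ("DATE", True)}

changes = diff_schemas(v1, v2)
print(changes)
```

A non-empty result is the kind of signal that could start a new schema version or feed a notification.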
Cross-Platform Support: Works with BigQuery, PostgreSQL, Snowflake, and other supported vaults

Integration with DeepDQ Features

Vault Integration: Seamless connection to configured vaults
Sentinel Compatibility: Cataloged tables available for quality monitoring
Data Lineage: Schema information feeds into lineage tracking
Alert Integration: Schema changes can trigger notifications

Supported Vault Types

The Catalog supports discovery and management across all DeepDQ vault types:

BigQuery: Datasets and tables with full schema extraction
PostgreSQL: Databases, schemas, and table structures
Snowflake: Warehouses, databases, and schema information
SQL Server: Databases and table metadata
MySQL: Database and table schema discovery
Databricks: Catalogs, schemas, and Delta Lake tables
Redshift: Schemas and table structures
Oracle: Schema and table metadata extraction

Best Practices

Efficient Cataloging

Start Small: Begin with critical datasets and expand gradually
Use Bulk Operations: Catalog related tables together for efficiency
Regular Updates: Refresh catalog entries to capture schema changes
Organize by Domain: Group related tables by business domain or project

Metadata Management

Consistent Naming: Follow naming conventions for easy discovery
Regular Maintenance: Remove obsolete or unused catalog entries
Version Tracking: Monitor schema changes and their impact
Documentation: Add business context and descriptions where possible

Integration Workflow

1. Configure Vaults: Set up secure connections to your data sources
2. Discover Tables: Use the catalog to find and inventory data assets
3. Catalog Tables: Add important tables to the centralized catalog
4. Enable Monitoring: Set up Sentinels for quality monitoring
5. Maintain Regularly: Keep the catalog current with schema changes
