Data Lineage

Data Lineage provides end-to-end visibility into how your data flows through systems, transformations, and processes. It helps you understand data dependencies and track the impact of changes through interactive visual diagrams.

Overview

The Data Lineage interface shows your tables, connections, and data flows as a visual diagram. You can create, save, and manage multiple lineage diagrams to document different aspects of your data architecture.

What is Data Lineage?

Data lineage shows:

  • Data Origins: Where your data originates
  • Transformation Steps: How data is modified as it moves through systems
  • Data Destinations: Where the processed data ultimately ends up
  • Dependencies: Which downstream systems depend on specific data sources
  • Impact Analysis: What would be affected if a data source changes
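
One way to picture these concepts is as a directed graph in which each edge means "data flows from source to destination". The sketch below is only an illustration of that idea, not the product's internal model; the table names and the `origins` and `destinations` helpers are hypothetical.

```python
# Hypothetical lineage edges: (source, destination) pairs.
EDGES = [
    ("raw_orders", "clean_orders"),
    ("raw_customers", "clean_customers"),
    ("clean_orders", "sales_report"),
    ("clean_customers", "sales_report"),
]


def all_nodes(edges):
    """Collect every node that appears on either end of an edge."""
    return {node for edge in edges for node in edge}


def origins(edges):
    """Data origins: nodes that never appear as a destination."""
    targets = {dst for _, dst in edges}
    return sorted(all_nodes(edges) - targets)


def destinations(edges):
    """Data destinations: nodes that never appear as a source."""
    sources = {src for src, _ in edges}
    return sorted(all_nodes(edges) - sources)


print(origins(EDGES))       # nodes with no upstream
print(destinations(EDGES))  # nodes with no downstream
```

Nodes in between, such as `clean_orders` here, correspond to transformation steps.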

Interface Overview

Main Canvas

The Data Lineage interface features a visual canvas where you can:

  • View Nodes: See database tables represented as interactive cards showing table details
  • Manage Connections: Visualize relationships between different data entities
  • Navigate: Use the search functionality to find specific tables, schemas, or databases
  • Layout Options: Choose between different visualization layouts (Hierarchical, Force-Directed, Circular, Grid)

Toolbar Controls

The interface provides several control options:

  • Add Node: Manually add tables or custom nodes to your lineage diagram
  • Update: Refresh the current lineage view with the latest information
  • Load: Load previously saved lineage configurations
  • Layout: Switch between different diagram layout options
  • Clear: Clear the current diagram to start fresh

Node Information

Each table node displays:

  • Table Name: The name of the database table
  • Database Type: Visual indicator of the database system (SQL Server, PostgreSQL, etc.)
  • Status: Online/offline status indicator
  • Schema Information: Host and schema details
  • Column Count: Number of columns in the table with expandable details
  • Last Updated: Timestamp of when the table information was last refreshed
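
If you want to mirror this information in your own notes or scripts, the fields above map naturally onto a small record type. The following is a minimal sketch under that assumption; the `TableNode` class, its field names, and the example host are hypothetical and do not describe how the product stores nodes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TableNode:
    """Hypothetical record holding the details shown on a table node."""
    table_name: str
    database_type: str          # e.g. "SQL Server", "PostgreSQL"
    online: bool                # Online/offline status
    host: str
    schema: str
    columns: list = field(default_factory=list)
    last_updated: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def column_count(self) -> int:
        return len(self.columns)

    def column_summary(self, visible: int = 3) -> str:
        """Show the first few columns and collapse the rest into a counter."""
        shown = ", ".join(self.columns[:visible])
        hidden = len(self.columns) - visible
        return f"{shown} (+{hidden} more columns)" if hidden > 0 else shown


node = TableNode(
    table_name="orders",
    database_type="PostgreSQL",
    online=True,
    host="db.example.internal",
    schema="public",
    columns=["id", "customer_id", "total", "status", "created_at"],
)
print(node.column_count, "|", node.column_summary())
```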

Lineage Management Features

Save and Load Functionality

  • Save Lineage: Save your current lineage configuration for future reference
  • Load Saved Lineage: Access previously saved lineage diagrams including:
    • SQL Server Lineage Test configurations
    • Database-specific lineage views (Redshift, Oracle, MySQL, BigQuery)
    • Custom lineage mappings you've created

Node Management

  • Add Custom Node: Create custom nodes for processes, systems, or data flows not automatically detected
  • Node Types: Choose from different node types:
    • Process/ETL Pipeline: For data transformation processes
    • Database System: For database connections
    • File/Dataset: For file-based data sources
    • Table/View: For specific database tables or views

Generate Lineage from Catalog

  • Automatic Generation: Create lineage diagrams directly from your data catalog
  • Vault Selection: Choose from configured vaults (PostgreSQL, MySQL, Snowflake, SQL Server, etc.)
  • Table Selection: Select specific tables from available catalogs (Orders, LineItem, Customers, etc.)
  • Bulk Operations: Select multiple tables at once to generate comprehensive lineage views

Visualization Options

Layout Types

Choose from multiple layout options to best visualize your data relationships:

  • Hierarchical (Top-Down): Traditional tree structure showing clear parent-child relationships
  • Force-Directed: Dynamic layout that automatically organizes nodes based on relationships
  • Circular: Arranges nodes in a ring, which helps reveal patterns and cycles
  • Grid: Structured grid layout for organized viewing
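
To make the Hierarchical (Top-Down) option more concrete, the sketch below shows one common way such a layout can assign rows: sources go in layer 0, and every other node sits one layer below its deepest parent. This is a generic layering technique and an assumption for illustration; the document does not state which algorithm the interface uses, and `hierarchical_layers` is a hypothetical name.

```python
from collections import defaultdict, deque


def hierarchical_layers(edges):
    """Map each node to a layer index (0 = top).

    Uses a topological traversal so that a node is placed only after
    all of its parents. Raises ValueError if the graph contains a
    cycle, because a strict top-down layering needs an acyclic graph.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    layer = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(sorted(layer))
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            # A child sits below its deepest parent.
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if visited != len(nodes):
        raise ValueError("cycle detected; cannot build a top-down layering")
    return layer


edges = [("raw", "staging"), ("raw", "mart"), ("staging", "mart"), ("mart", "report")]
print(hierarchical_layers(edges))
```

A diagram that triggers the cycle error is a case where the Circular or Force-Directed layout may be the better fit.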

Direction Controls

Control the flow direction of your lineage diagrams:

  • Directed: Shows clear directional flow from source to destination
  • Undirected: Shows relationships without specific directional emphasis
  • Bidirectional: Shows two-way data relationships and dependencies

Working with Lineage

Creating Lineage Diagrams

From Catalog

  1. Click "Generate Lineage from Catalog"
  2. Select your configured vault from the dropdown
  3. Choose tables from the available list (you can select multiple)
  4. Click "Generate Lineage" to create the visual diagram
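
Conceptually, these steps select a vault, pick a subset of its tables, and keep only the relationships among the selected tables. The sketch below illustrates that idea. It assumes a catalog that already records which tables each table reads from; how the product actually derives connections is not described here, and `CATALOG`, the vault name, and `generate_lineage` are all hypothetical.

```python
# Hypothetical catalog: vault name -> table name -> tables it reads from.
CATALOG = {
    "postgres_vault": {
        "Customers": [],
        "Orders": ["Customers"],
        "LineItem": ["Orders"],
    }
}


def generate_lineage(catalog, vault, selected_tables):
    """Return the nodes and edges for the selected tables in one vault.

    Only relationships where both ends were selected are kept, so the
    result is the part of the lineage graph induced by the selection.
    """
    tables = catalog[vault]
    missing = [t for t in selected_tables if t not in tables]
    if missing:
        raise KeyError(f"tables not found in vault {vault!r}: {missing}")

    selected = set(selected_tables)
    edges = sorted(
        (upstream, table)
        for table in selected
        for upstream in tables[table]
        if upstream in selected
    )
    return {"nodes": sorted(selected), "edges": edges}


result = generate_lineage(CATALOG, "postgres_vault", ["Orders", "LineItem"])
print(result)
```

Because `Customers` was not selected in this example, its connection to `Orders` is left out of the result.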

Manual Creation

  1. Use "Add Node" to manually add tables or processes
  2. Choose the appropriate node type (Process/ETL Pipeline, Database System, File/Dataset, Table/View)
  3. Enter descriptive names and configure node properties
  4. Connect nodes to show data flow relationships
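
The manual steps can be modeled in a few lines, which may help when planning a diagram before building it in the interface. This is a sketch only: the four node types come from the list above, while the `Diagram` class, its methods, and the example node names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    PROCESS = "Process/ETL Pipeline"
    DATABASE = "Database System"
    FILE = "File/Dataset"
    TABLE = "Table/View"


@dataclass
class Diagram:
    """Hypothetical in-memory lineage diagram."""
    nodes: dict = field(default_factory=dict)  # name -> NodeType
    edges: list = field(default_factory=list)  # (source, destination)

    def add_node(self, name, node_type):
        if name in self.nodes:
            raise ValueError(f"node {name!r} already exists")
        self.nodes[name] = node_type

    def connect(self, source, destination):
        # Both ends must exist before a data flow can be drawn.
        for name in (source, destination):
            if name not in self.nodes:
                raise KeyError(f"unknown node {name!r}")
        self.edges.append((source, destination))


diagram = Diagram()
diagram.add_node("orders.csv", NodeType.FILE)
diagram.add_node("load_orders", NodeType.PROCESS)
diagram.add_node("warehouse.orders", NodeType.TABLE)
diagram.connect("orders.csv", "load_orders")
diagram.connect("load_orders", "warehouse.orders")
print(diagram.edges)
```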

From Saved Configurations

  1. Click "Load" to access saved lineage diagrams
  2. Choose from your saved configurations (e.g., "SQL Server Lineage Test 1", "MySQL Lineage 1")
  3. Load the configuration to restore your previous work
  4. Modify or extend as needed
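
A saved configuration is essentially a named snapshot of nodes and connections. The sketch below shows a save and load round trip using JSON. The storage format is an assumption made for illustration, since the actual format is not documented here, and `save_lineage` and `load_lineage` are hypothetical helpers.

```python
import json


def save_lineage(name, nodes, edges):
    """Serialize a lineage diagram to a JSON string."""
    payload = {
        "name": name,
        "nodes": list(nodes),
        # JSON has no tuple type, so edges are stored as two-item lists.
        "edges": [list(edge) for edge in edges],
    }
    return json.dumps(payload, indent=2)


def load_lineage(text):
    """Restore (name, nodes, edges) from a JSON string."""
    payload = json.loads(text)
    edges = [tuple(edge) for edge in payload["edges"]]
    return payload["name"], payload["nodes"], edges


saved = save_lineage(
    "MySQL Lineage 1",
    ["Customers", "Orders"],
    [("Customers", "Orders")],
)
name, nodes, edges = load_lineage(saved)
print(name, nodes, edges)
```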

Managing Saved Lineage

  • Save Current Work: Save your lineage diagrams with descriptive names
  • Load Previous Work: Access previously saved configurations
  • Update Existing: Modify and update saved lineage diagrams
  • Delete Configurations: Remove outdated or unnecessary saved lineage diagrams

Real-time Updates

  • Dynamic Refresh: Tables show real-time status (Online/Offline)
  • Schema Sync: Table information automatically updates when schemas change
  • Connection Monitoring: Track the health of data connections in your lineage

Advanced Features

Search and Navigation

  • Global Search: Use the search bar to find specific tables, schemas, or databases
  • Filter Views: Focus on specific parts of your data ecosystem
  • Zoom and Pan: Zoom in, zoom out, and pan to move around large lineage diagrams

Node Details

Each node provides the following details:

  • Column Expansion: Click the overflow indicator (for example, "+2 more columns") to see detailed column information
  • Update Timestamps: See when table metadata was last refreshed
  • Connection Status: Visual indicators for table availability
  • Database Context: Clear identification of source database and schema

Connection Management

  • Visual Connections: See how data flows between different tables and systems
  • Connection Types: Different visual styles for different types of relationships
  • Impact Tracing: Follow connections to understand downstream impacts

Use Cases

Impact Analysis

Before making changes to your data infrastructure:

  1. Load or create a lineage diagram containing the table in question
  2. Trace downstream connections to see dependent systems
  3. Identify all processes and applications that would be affected
  4. Plan changes and communicate with stakeholders
  5. Use the visual diagram to present impact scope to decision makers
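
Tracing downstream connections (step 2) amounts to a graph traversal that starts at the table you plan to change. The sketch below uses a breadth-first search over a hypothetical edge list; the table names and the `downstream_impact` function are assumptions for illustration, not part of the product.

```python
from collections import defaultdict, deque


def downstream_impact(edges, changed):
    """Return every node reachable downstream of `changed`, sorted by name."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    affected = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # Skip nodes already seen so cycles cannot loop forever.
            if child not in affected and child != changed:
                affected.add(child)
                queue.append(child)
    return sorted(affected)


edges = [
    ("orders", "orders_clean"),
    ("orders_clean", "revenue_report"),
    ("orders_clean", "ml_features"),
    ("customers", "revenue_report"),
]
print(downstream_impact(edges, "orders"))
```

The returned list corresponds to the set of dependents you would review in steps 3 and 4.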

Root Cause Analysis

When data quality issues occur:

  1. Start from the problematic dataset in your lineage diagram
  2. Trace connections backward to identify potential sources of issues
  3. Check each connected table and transformation process
  4. Use the real-time status indicators to identify offline or problematic sources
  5. Verify data quality at each step in the lineage chain
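
The backward trace in steps 2 through 4 can be sketched the same way as a traversal, this time following connections upstream and flagging sources whose status is offline. The edge list, the status map, and the `upstream_suspects` function are hypothetical and only illustrate the procedure.

```python
from collections import defaultdict, deque


def upstream_suspects(edges, status, start):
    """Walk upstream from `start` breadth-first.

    Returns the offline nodes found, nearest to `start` first, since
    closer sources are usually the quickest to check.
    """
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)

    seen = {start}
    queue = deque([start])
    offline = []
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent in seen:
                continue
            seen.add(parent)
            queue.append(parent)
            if status.get(parent) == "offline":
                offline.append(parent)
    return offline


edges = [
    ("raw_events", "sessions"),
    ("sessions", "dashboard"),
    ("users", "dashboard"),
]
status = {"raw_events": "offline", "sessions": "online", "users": "online"}
print(upstream_suspects(edges, status, "dashboard"))
```

An empty result means every upstream source is online, so the next step is to check data quality at each transformation instead.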

Data Architecture Documentation

  • Visual Documentation: Create comprehensive diagrams of your data ecosystem
  • Team Collaboration: Share saved lineage configurations across teams
  • Architecture Planning: Use lineage views to plan system migrations and upgrades
  • Onboarding: Help new team members understand data relationships

Compliance and Governance

  • Data Flow Documentation: Show how sensitive data moves through systems
  • Regulatory Compliance: Document data handling processes for audit purposes
  • Change Management: Track and document modifications to data flows
  • Access Analysis: Understand which systems have access to specific data

Configuration and Management

Vault Integration

Data Lineage integrates with your configured vaults:

  • Multi-Database Support: Work with PostgreSQL, MySQL, Snowflake, SQL Server, and other supported databases
  • Unified View: Create lineage diagrams that span multiple database systems
  • Automatic Discovery: Generate lineage directly from cataloged tables
  • Real-time Sync: Keep lineage information current with your actual database schemas

Saved Lineage Management

  • Organized Storage: Keep multiple saved lineage configurations for different purposes
  • Version Control: Track changes to your lineage documentation over time
  • Team Sharing: Share lineage configurations across your organization
  • Backup and Recovery: Ensure your lineage documentation is preserved

Performance and Scalability

  • Large Diagrams: Handle complex data ecosystems with many tables and connections
  • Efficient Rendering: Smooth performance even with extensive lineage mappings
  • Smart Loading: Load only the information needed for the current view
  • Caching: Optimized performance for frequently accessed lineage diagrams

Best Practices

Getting Started

  • Start Small: Begin with critical data flows and expand coverage over time
  • Use Catalog Integration: Leverage the "Generate Lineage from Catalog" feature for automatic discovery
  • Save Frequently: Regularly save your lineage work to preserve documentation
  • Organize by Purpose: Create different saved lineage configurations for different use cases

Maintenance

  • Regular Updates: Refresh lineage diagrams when your data architecture changes
  • Validate Connections: Periodically verify that connections accurately represent data flows
  • Document Context: Use descriptive names for saved lineage configurations
  • Monitor Status: Pay attention to online/offline status indicators for early problem detection

Collaboration

  • Share Configurations: Use saved lineage to share knowledge across teams
  • Standard Naming: Develop consistent naming conventions for saved lineage
  • Regular Reviews: Schedule periodic reviews of lineage documentation with stakeholders
  • Training: Ensure team members understand how to read and maintain lineage diagrams