Data Lineage

Data Lineage provides end-to-end visibility into how your data flows through systems, transformations, and processes. It helps you understand data dependencies and track the impact of changes through interactive visual diagrams.

Overview

The Data Lineage interface provides a comprehensive view of your data ecosystem with visual representations of tables, connections, and data flows. You can create, save, and manage multiple lineage diagrams to document different aspects of your data architecture.

What is Data Lineage?

Data lineage shows:

Data Origins: Where your data originates
Transformation Steps: How data is modified as it moves through systems
Data Destinations: Where the processed data ultimately ends up
Dependencies: Which downstream systems depend on specific data sources
Impact Analysis: What would be affected if a data source changes
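Conceptually, a lineage diagram is a directed graph whose edges point from upstream to downstream. The sketch below is not the product's implementation. It assumes a hypothetical list of (upstream, downstream) edges with illustrative table names, and shows how origins, destinations, and direct dependencies fall out of that structure.

```python
from collections import defaultdict

# Hypothetical edges: (upstream, downstream). Names are illustrative only.
EDGES = [
    ("crm.customers", "etl.clean_customers"),
    ("erp.orders", "etl.join_orders"),
    ("etl.clean_customers", "etl.join_orders"),
    ("etl.join_orders", "dw.sales_fact"),
    ("dw.sales_fact", "bi.revenue_dashboard"),
]


def build_graph(edges):
    """Return (downstream, upstream) adjacency maps for (src, dst) edges."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def origins(edges):
    """Data origins: nodes that never appear as a downstream target."""
    downstream, upstream = build_graph(edges)
    nodes = set(downstream) | set(upstream)
    return sorted(n for n in nodes if not upstream.get(n))


def destinations(edges):
    """Data destinations: nodes that feed nothing further downstream."""
    downstream, upstream = build_graph(edges)
    nodes = set(downstream) | set(upstream)
    return sorted(n for n in nodes if not downstream.get(n))


def direct_dependents(edges, node):
    """Dependencies: nodes that read directly from `node`."""
    downstream, _ = build_graph(edges)
    return sorted(downstream.get(node, set()))


print(origins(EDGES))       # ['crm.customers', 'erp.orders']
print(destinations(EDGES))  # ['bi.revenue_dashboard']
print(direct_dependents(EDGES, "etl.join_orders"))  # ['dw.sales_fact']
```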

Interface Overview

Main Canvas

The Data Lineage interface features a visual canvas where you can:

View Nodes: See database tables represented as interactive cards showing table details
Manage Connections: Visualize relationships between different data entities
Navigate: Use the search functionality to find specific tables, schemas, or databases
Layout Options: Choose between different visualization layouts (Hierarchical, Force-Directed, Circular, Grid)

Toolbar Controls

The interface provides several control options:

Add Node: Manually add tables or custom nodes to your lineage diagram
Update: Refresh the current lineage view with the latest information
Load: Load previously saved lineage configurations
Layout: Switch between different diagram layout options
Clear: Clear the current diagram to start fresh

Node Information

Each table node displays:

Table Name: The name of the database table
Database Type: Visual indicator of the database system (SQL Server, PostgreSQL, etc.)
Status: Online/offline status indicator
Schema Information: Host and schema details
Column Count: Number of columns in the table, expandable to show column details
Last Updated: Timestamp of when the table information was last refreshed
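As an illustration, the card fields above can be modeled as a simple record. The class name `TableNode`, its field names, and the three-column collapse threshold are assumptions made for this sketch, not the product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class TableNode:
    """Hypothetical model of one table card; field names are illustrative."""
    table_name: str
    database_type: str   # e.g. "SQL Server", "PostgreSQL"
    online: bool         # online/offline status indicator
    host: str
    schema: str
    columns: List[str] = field(default_factory=list)
    last_updated: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def column_count(self) -> int:
        return len(self.columns)

    def column_preview(self, visible: int = 3) -> List[str]:
        """Collapsed view: the first `visible` columns plus a "+N more" marker."""
        shown = self.columns[:visible]  # slicing copies, so self.columns is untouched
        hidden = len(self.columns) - len(shown)
        if hidden > 0:
            suffix = "column" if hidden == 1 else "columns"
            shown.append(f"+{hidden} more {suffix}")
        return shown


node = TableNode("Customers", "PostgreSQL", True, "pg-host", "public",
                 ["id", "name", "email", "created_at", "region"])
print(node.column_preview())  # ['id', 'name', 'email', '+2 more columns']
```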

Lineage Management Features

Save and Load Functionality

Save Lineage: Save your current lineage configuration for future reference
Load Saved Lineage: Access previously saved lineage diagrams including:
- SQL Server Lineage Test configurations
- Database-specific lineage views (Redshift, Oracle, MySQL, BigQuery)
- Custom lineage mappings you've created

Node Management

Add Custom Node: Create custom nodes for processes, systems, or data flows not automatically detected
Node Types: Choose from the following node types:
- Process/ETL Pipeline: For data transformation processes
- Database System: For database connections
- File/Dataset: For file-based data sources
- Table/View: For specific database tables or views

Generate Lineage from Catalog

Automatic Generation: Create lineage diagrams directly from your data catalog
Vault Selection: Choose from configured vaults (PostgreSQL, MySQL, Snowflake, SQL Server, etc.)
Table Selection: Select specific tables from available catalogs (Orders, LineItem, Customers, etc.)
Bulk Operations: Select multiple tables at once to generate comprehensive lineage views

Visualization Options

Layout Types

Choose from multiple layout options to best visualize your data relationships:

Hierarchical (Top-Down): Traditional tree structure showing clear parent-child relationships
Force-Directed: Dynamic layout that positions nodes by simulating attraction along connections and repulsion between nodes, so closely related tables cluster together
Circular: Circular arrangement useful for identifying patterns and cycles
Grid: Structured grid layout for organized viewing
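The layout algorithms the product uses are not documented here. As a hedged sketch, a top-down hierarchical layout is commonly built by first assigning each node to a layer, which is one step of Sugiyama-style layered drawing. Longest-path layering, shown below, is one common way to do this. The same topological pass also exposes cycles, which cannot be layered this way and are the kind of structure the Circular layout helps you spot. The function name `assign_layers` is hypothetical.

```python
from collections import defaultdict, deque


def assign_layers(edges):
    """Longest-path layering for a top-down layout.

    Returns (layers, unlayered): `layers` maps node -> layer index (0 = top);
    `unlayered` lists nodes on or downstream of a cycle, which this
    topological pass cannot place.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Start from source nodes (no incoming edges) on layer 0.
    layer = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(sorted(layer))
    placed = []
    while queue:
        node = queue.popleft()
        placed.append(node)
        for child in children[node]:
            # A node sits one layer below its deepest parent.
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    layers = {n: layer[n] for n in placed}
    unlayered = sorted(nodes - set(placed))
    return layers, unlayered


print(assign_layers([("a", "b"), ("b", "c"), ("a", "c")]))
```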

Direction Controls

Control the flow direction of your lineage diagrams:

Directed: Shows clear directional flow from source to destination
Undirected: Shows relationships without specific directional emphasis
Bidirectional: Shows two-way data relationships and dependencies
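One way to read the three direction modes is as different rules for drawing the same set of edges. The sketch assumes an interpretation that this page does not spell out. Directed draws every edge as an arrow. Undirected draws each connected pair once without arrowheads. Bidirectional merges a pair of opposing edges into one double-headed arrow. The `Direction` enum and the `edge_styles` function are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    DIRECTED = "directed"
    UNDIRECTED = "undirected"
    BIDIRECTIONAL = "bidirectional"


def edge_styles(edges, mode):
    """Return sorted (a, b, style) tuples; style is "->", "--", or "<->"."""
    pairs = set(edges)
    drawn = set()
    for a, b in pairs:
        if mode is Direction.DIRECTED:
            drawn.add((a, b, "->"))
        elif mode is Direction.UNDIRECTED:
            first, second = sorted((a, b))
            drawn.add((first, second, "--"))  # one plain line per pair
        elif (b, a) in pairs:  # BIDIRECTIONAL, flow exists both ways
            first, second = sorted((a, b))
            drawn.add((first, second, "<->"))  # opposing edges merged
        else:  # BIDIRECTIONAL, one-way flow keeps a single arrow
            drawn.add((a, b, "->"))
    return sorted(drawn)


edges = [("orders", "sales"), ("sales", "orders"), ("customers", "orders")]
for mode in Direction:
    print(mode.value, edge_styles(edges, mode))
```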

Working with Lineage

Creating Lineage Diagrams

From Catalog

Click "Generate Lineage from Catalog"
Select your configured vault from the dropdown
Choose tables from the available list (you can select multiple)
Click "Generate Lineage" to create the visual diagram

Manual Creation

Use "Add Node" to manually add tables or processes
Choose the appropriate node type (Process/ETL Pipeline, Database System, File/Dataset, Table/View)
Enter descriptive names and configure node properties
Connect nodes to show data flow relationships
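The manual steps above amount to adding typed nodes and then adding edges between them. This sketch models that flow with hypothetical `NodeType`, `Node`, and `LineageDiagram` types. The enum values reuse the node type labels from this page, but nothing here is the product's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    # Labels mirror the node types listed on this page.
    PROCESS = "Process/ETL Pipeline"
    DATABASE = "Database System"
    FILE = "File/Dataset"
    TABLE = "Table/View"


@dataclass
class Node:
    name: str
    node_type: NodeType
    properties: dict = field(default_factory=dict)


@dataclass
class LineageDiagram:
    nodes: dict = field(default_factory=dict)  # name -> Node
    edges: set = field(default_factory=set)    # (source, target) pairs

    def add_node(self, name, node_type, **properties):
        """Add a uniquely named node of the given type."""
        if name in self.nodes:
            raise ValueError(f"node {name!r} already exists")
        self.nodes[name] = Node(name, node_type, dict(properties))
        return self.nodes[name]

    def connect(self, source, target):
        """Record that data flows from `source` to `target`."""
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node {name!r}")
        self.edges.add((source, target))


diagram = LineageDiagram()
diagram.add_node("Orders", NodeType.TABLE)
diagram.add_node("nightly_orders_etl", NodeType.PROCESS, schedule="daily")
diagram.add_node("orders_export.csv", NodeType.FILE)
diagram.connect("Orders", "nightly_orders_etl")
diagram.connect("nightly_orders_etl", "orders_export.csv")
```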

From Saved Configurations

Click "Load" to access saved lineage diagrams
Choose from your saved configurations (e.g., "SQL Server Lineage Test 1", "MySQL Lineage 1")
Load the configuration to restore your previous work
Modify or extend as needed

Managing Saved Lineage

Save Current Work: Save your lineage diagrams with descriptive names
Load Previous Work: Access previously saved configurations
Update Existing: Modify and update saved lineage diagrams
Delete Configurations: Remove outdated or unnecessary saved lineage configurations
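The storage format for saved lineage is not described on this page. If each configuration is stored by name as JSON, then save, load, update (overwrite by name), and delete map onto a small keyed store like the hypothetical `LineageStore` below.

```python
import json


class LineageStore:
    """Hypothetical keyed store for saved lineage configurations (JSON text)."""

    def __init__(self):
        self._saved = {}

    def save(self, name, nodes, edges):
        """Save or update: saving under an existing name overwrites it."""
        self._saved[name] = json.dumps(
            {"nodes": sorted(nodes), "edges": [list(e) for e in sorted(edges)]}
        )

    def load(self, name):
        """Return (nodes, edges) as sets; raises KeyError if not saved."""
        data = json.loads(self._saved[name])
        return set(data["nodes"]), {tuple(e) for e in data["edges"]}

    def delete(self, name):
        """Remove a saved configuration; unknown names are ignored."""
        self._saved.pop(name, None)

    def names(self):
        return sorted(self._saved)


store = LineageStore()
store.save("MySQL Lineage 1", {"Orders", "LineItem"}, {("Orders", "LineItem")})
print(store.names())
```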

Real-time Updates

Dynamic Refresh: Tables show real-time status (Online/Offline)
Schema Sync: Table information automatically updates when schemas change
Connection Monitoring: Track the health of data connections in your lineage

Advanced Features

Search and Navigation

Global Search: Use the search bar to find specific tables, schemas, or databases
Filter Views: Focus on specific parts of your data ecosystem
Zoom and Pan: Navigate large lineage diagrams with ease
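This page does not specify how Global Search matches names. The following is a minimal sketch that assumes case-insensitive substring matching against the database, schema, and table name of each entry. The `CatalogEntry` type and the `search` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    database: str
    schema: str
    table: str


def search(entries, query):
    """Case-insensitive substring match on database, schema, or table name."""
    needle = query.strip().lower()
    if not needle:
        return list(entries)  # an empty query shows everything
    return [
        e for e in entries
        if any(needle in part.lower() for part in (e.database, e.schema, e.table))
    ]


entries = [
    CatalogEntry("sales_pg", "public", "Orders"),
    CatalogEntry("sales_pg", "public", "LineItem"),
    CatalogEntry("crm_mysql", "crm", "Customers"),
]
print([e.table for e in search(entries, "order")])
```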

Node Details

Each node provides the following details:

Column Expansion: Click the collapsed column indicator (for example, "+2 more columns") to see detailed column information
Update Timestamps: See when table metadata was last refreshed
Connection Status: Visual indicators for table availability
Database Context: Clear identification of source database and schema

Connection Management

Visual Connections: See how data flows between different tables and systems
Connection Types: Different visual styles for different types of relationships
Impact Tracing: Follow connections to understand downstream impacts

Use Cases

Impact Analysis

Before making changes to your data infrastructure:

Load or create a lineage diagram containing the table in question
Trace downstream connections to see dependent systems
Identify all processes and applications that would be affected
Plan changes and communicate with stakeholders
Use the visual diagram to present impact scope to decision makers
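Tracing downstream connections is a graph traversal. The following is an illustrative sketch, not the product's implementation. It runs a breadth-first search over hypothetical (upstream, downstream) edge pairs and returns every node that depends on the changed table, directly or transitively, along with its distance in hops.

```python
from collections import defaultdict, deque


def downstream_impact(edges, start):
    """Map each node reachable from `start` (following edges forward) to its hop count."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in hops:  # visit each node once, even with cycles
                hops[child] = hops[node] + 1
                queue.append(child)
    del hops[start]  # report only the affected nodes, not the changed one
    return hops


edges = [("x", "a"), ("a", "b"), ("b", "c"), ("a", "d")]
print(downstream_impact(edges, "a"))
```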

Root Cause Analysis

When data quality issues occur:

Start from the problematic dataset in your lineage diagram
Trace connections backward to identify potential sources of issues
Check each connected table and transformation process
Use the real-time status indicators to identify offline or problematic sources
Verify data quality at each step in the lineage chain
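Root cause analysis is the same traversal run in the opposite direction. This sketch assumes edges given as (upstream, downstream) pairs and a hypothetical `status` mapping from node name to "online" or "offline". It lists upstream nodes nearest-first, so the closest candidates are checked first, and flags any that are offline.

```python
from collections import defaultdict, deque


def upstream_trace(edges, start, status):
    """Return (upstream nodes nearest-first, the subset marked offline)."""
    parents = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)

    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    offline = [n for n in order if status.get(n) == "offline"]
    return order, offline


edges = [("raw_orders", "stg_orders"), ("raw_customers", "stg_orders"),
         ("stg_orders", "report")]
print(upstream_trace(edges, "report", {"raw_customers": "offline"}))
```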

Data Architecture Documentation

Visual Documentation: Create comprehensive diagrams of your data ecosystem
Team Collaboration: Share saved lineage configurations across teams
Architecture Planning: Use lineage views to plan system migrations and upgrades
Onboarding: Help new team members understand data relationships

Compliance and Governance

Data Flow Documentation: Show how sensitive data moves through systems
Regulatory Compliance: Document data handling processes for audit purposes
Change Management: Track and document modifications to data flows
Access Analysis: Understand which systems have access to specific data

Configuration and Management

Vault Integration

Data Lineage works directly with the vaults you have configured:

Multi-Database Support: Work with PostgreSQL, MySQL, Snowflake, SQL Server, and other supported databases
Unified View: Create lineage diagrams that span multiple database systems
Automatic Discovery: Generate lineage directly from cataloged tables
Real-time Sync: Keep lineage information current with your actual database schemas

Saved Lineage Management

Organized Storage: Keep multiple saved lineage configurations for different purposes
Version Control: Track changes to your lineage documentation over time
Team Sharing: Share lineage configurations across your organization
Backup and Recovery: Ensure your lineage documentation is preserved
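This page does not describe how lineage versions are tracked. If two saved versions of a configuration are available as edge lists, a set difference is one simple way to document what changed between them. The `diff_lineage` and `nodes_of` helpers are hypothetical.

```python
def nodes_of(edges):
    """All node names that appear in a collection of (src, dst) edges."""
    return {n for edge in edges for n in edge}


def diff_lineage(old_edges, new_edges):
    """Summarize nodes and edges added or removed between two versions."""
    old, new = set(old_edges), set(new_edges)
    return {
        "added_nodes": sorted(nodes_of(new) - nodes_of(old)),
        "removed_nodes": sorted(nodes_of(old) - nodes_of(new)),
        "added_edges": sorted(new - old),
        "removed_edges": sorted(old - new),
    }


v1 = [("Orders", "stg_orders"), ("stg_orders", "sales_fact")]
v2 = [("Orders", "stg_orders"), ("stg_orders", "sales_fact_v2")]
print(diff_lineage(v1, v2))
```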

Performance and Scalability

Large Diagrams: Handle complex data ecosystems with many tables and connections
Efficient Rendering: Smooth performance even with extensive lineage mappings
Smart Loading: Load only the information needed for the current view
Caching: Optimized performance for frequently accessed lineage diagrams

Best Practices

Getting Started

Start Small: Begin with critical data flows and expand coverage over time
Use Catalog Integration: Leverage the "Generate Lineage from Catalog" feature for automatic discovery
Save Frequently: Regularly save your lineage work to preserve documentation
Organize by Purpose: Create different saved lineage configurations for different use cases

Maintenance

Regular Updates: Refresh lineage diagrams when your data architecture changes
Validate Connections: Periodically verify that connections accurately represent data flows
Document Context: Use descriptive names for saved lineage configurations
Monitor Status: Pay attention to online/offline status indicators for early problem detection

Collaboration

Share Configurations: Use saved lineage to share knowledge across teams
Standard Naming: Develop consistent naming conventions for saved lineage
Regular Reviews: Schedule periodic reviews of lineage documentation with stakeholders
Training: Ensure team members understand how to read and maintain lineage diagrams
