
Snowflake Vault Configuration

Snowflake vaults enable secure connections to Snowflake data warehouses, providing comprehensive data quality monitoring across your cloud data platform.

Overview

Snowflake is a cloud-native data warehouse platform that separates compute and storage, offering elastic scaling and multi-cloud deployment options. DeepDQ's Snowflake vault integration enables Sentinels to execute data quality queries, Data Catalog to discover warehouse schemas, Data Lineage to track data relationships, and DAB Chatbot to provide conversational analytics.

Configuration Parameters

Required Fields

  • Name: A unique identifier for your Snowflake vault
  • Type: Snowflake (automatically selected)
  • Account: Your Snowflake account identifier
  • Warehouse: The compute warehouse to use for queries
  • Database: The database name in Snowflake
  • Schema: The schema name within the database
  • User: Your Snowflake username
  • Password: Your Snowflake password (securely encrypted)

Example Configuration

Name: Production Snowflake Vault
Type: Snowflake
Account: your-account.region
Warehouse: YOUR_WAREHOUSE
Database: YOUR_DATABASE
Schema: YOUR_SCHEMA
User: your_username
Password: [encrypted]

Account Identifier Format

Snowflake account identifiers follow specific formats:

  • Legacy (account locator): account_locator.region
  • Current (preferred): orgname-account_name
  • Locator with cloud provider: account_locator.region.cloud_provider

Examples:

  • mycompany.us-east-1
  • myorg-myaccount
  • xy12345.us-central1.gcp
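
If you already have access to the account from another tool, the preferred orgname-account_name identifier can be looked up from any active Snowflake session. As an illustrative query:

```sql
-- Returns the two parts of the preferred orgname-account_name identifier
SELECT CURRENT_ORGANIZATION_NAME() AS org_name,
       CURRENT_ACCOUNT_NAME()      AS account_name;
```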

Supported Authentication Methods

Username/Password Authentication

  • Standard Snowflake user credentials
  • Multi-factor authentication (MFA) support
  • Password policy compliance
  • Currently the only authentication method supported by DeepDQ
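
When setting up credentials for DeepDQ, a dedicated user and role (recommended under Best Practices) might be created as sketched below; the user, role, and warehouse names are placeholders:

```sql
-- Placeholder names: deepdq_user, deepdq_role, YOUR_WAREHOUSE
CREATE ROLE IF NOT EXISTS deepdq_role;
CREATE USER IF NOT EXISTS deepdq_user
  PASSWORD = '<strong-password>'
  DEFAULT_ROLE = deepdq_role
  DEFAULT_WAREHOUSE = YOUR_WAREHOUSE;
GRANT ROLE deepdq_role TO USER deepdq_user;
```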

Warehouse Configuration

Warehouse Sizing

Choose appropriate warehouse sizes based on workload:

  • X-Small: Development and testing
  • Small: Light production workloads
  • Medium: Standard production queries
  • Large: Heavy analytical workloads
  • X-Large and above: Enterprise-scale processing

Auto-Suspend and Auto-Resume

  • Configure auto-suspend to minimize costs
  • Enable auto-resume for seamless query execution
  • Balance cost optimization with performance requirements
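
For example, a monitoring warehouse can be set to suspend quickly and resume on demand (AUTO_SUSPEND is in seconds; the warehouse name is a placeholder):

```sql
ALTER WAREHOUSE YOUR_WAREHOUSE SET
  AUTO_SUSPEND = 60    -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;  -- wake automatically when a query arrives
```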

Database and Schema Access

Required Privileges

The Snowflake user needs these minimum privileges:

  • USAGE on the warehouse, database, and schema
  • SELECT on tables to be monitored
  • MONITOR privilege for warehouse usage (optional)

Example privilege grants:

GRANT USAGE ON WAREHOUSE your_warehouse TO ROLE deepdq_role;
GRANT USAGE ON DATABASE your_database TO ROLE deepdq_role;
GRANT USAGE ON SCHEMA your_database.your_schema TO ROLE deepdq_role;
GRANT SELECT ON ALL TABLES IN SCHEMA your_database.your_schema TO ROLE deepdq_role;
GRANT ROLE deepdq_role TO USER your_username;
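
If new tables are added to the schema over time, a future grant keeps them visible without re-running the grants above (illustrative; typically requires schema ownership or the MANAGE GRANTS privilege):

```sql
GRANT SELECT ON FUTURE TABLES IN SCHEMA your_database.your_schema TO ROLE deepdq_role;
```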

Common Use Cases

Data Warehouse Quality Monitoring

  • Execute Sentinels for automated data quality checks
  • Monitor ETL pipeline outputs and data transformations
  • Validate data warehouse integrity and consistency
  • Track key performance indicators and business metrics

Schema Discovery and Cataloging

  • Automatic discovery of Snowflake databases, schemas, and tables
  • Metadata extraction for comprehensive data cataloging
  • Schema change detection and documentation
  • Data asset inventory management

Data Lineage in Data Warehouses

  • Map data flow across Snowflake objects
  • Track transformation dependencies and relationships
  • Monitor cross-database and cross-schema lineage
  • Visualize data warehouse architecture

Conversational Analytics with DAB

  • Natural language querying of Snowflake data
  • Interactive data exploration and insights
  • Automated analysis and reporting
  • Business intelligence through conversation

Best Practices

Security

  • Use dedicated Snowflake users for DeepDQ
  • Implement role-based access control (RBAC)
  • Enable network policies for IP restrictions
  • Regular credential rotation and security reviews
  • Network Security: Contact salesandsupport@deepanalyze.ai to get the DeepDQ Static IP for network policy configuration
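
Once you have the DeepDQ static IP, a network policy for the dedicated user can be sketched as follows; the IP below is a placeholder for the address provided by support:

```sql
CREATE NETWORK POLICY deepdq_policy
  ALLOWED_IP_LIST = ('<DEEPDQ_STATIC_IP>/32');
-- Apply the policy to the dedicated DeepDQ user only
ALTER USER deepdq_user SET NETWORK_POLICY = deepdq_policy;
```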

Performance

  • Right-size warehouses for monitoring workloads
  • Use clustering keys for large tables
  • Optimize query patterns for cost efficiency
  • Monitor warehouse credit consumption

Cost Optimization

  • Configure auto-suspend for cost control
  • Use appropriate warehouse sizes
  • Schedule monitoring during off-peak hours
  • Monitor and optimize query performance

Troubleshooting

Connection Issues

Account Identifier Problems

  • Verify correct account identifier format
  • Check region and cloud provider specifications
  • Validate organization name (for new format)
  • Ensure account is active and accessible

Authentication Failures

  • Confirm username and password accuracy
  • Check for MFA requirements
  • Verify user account status and permissions
  • Review network policy restrictions

Warehouse Access Issues

  • Ensure warehouse exists and is accessible
  • Check warehouse state (started/suspended)
  • Verify user privileges on warehouse
  • Monitor warehouse resource availability
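
A quick way to confirm the warehouse exists and check its current state (illustrative; the name is a placeholder):

```sql
SHOW WAREHOUSES LIKE 'YOUR_WAREHOUSE';
-- Inspect the "state" column: STARTED, SUSPENDED, etc.
```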

Performance Issues

Slow Query Execution

  • Review warehouse size and scaling
  • Check for warehouse queuing
  • Optimize query complexity and filters
  • Monitor concurrent query load
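
Recent long-running queries can be reviewed through the ACCOUNT_USAGE views (these lag real time by up to a few hours and require access to the SNOWFLAKE database):

```sql
-- Ten slowest queries in the last day; elapsed time is in milliseconds
SELECT query_id, query_text, warehouse_name, total_elapsed_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 10;
```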

High Credit Consumption

  • Review warehouse auto-suspend settings
  • Optimize query patterns and frequency
  • Consider warehouse scheduling
  • Monitor query execution plans
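
Per-warehouse credit consumption over the last week can be summarized like this (illustrative; uses the shared ACCOUNT_USAGE schema):

```sql
SELECT warehouse_name, SUM(credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;
```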

Advanced Configuration

Network Security

  • Configure network policies for IP whitelisting
  • Implement private connectivity (PrivateLink/Private Service Connect)
  • Set up VPC endpoints for secure access
  • Enable audit logging for security monitoring

Query Optimization

  • Use Snowflake's query optimizer
  • Implement result caching strategies
  • Leverage automatic clustering
  • Monitor query performance history

Snowflake-Specific Features

Time Travel and Zero-Copy Cloning

  • Access historical data for validation
  • Create development/test environments
  • Implement data recovery strategies
  • Track data lineage across clones
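
As a sketch of these features, Time Travel and zero-copy cloning look like this; the orders table is a hypothetical example:

```sql
-- Read the table as it was one hour ago (within the retention period)
SELECT * FROM your_database.your_schema.orders AT (OFFSET => -3600);

-- Zero-copy clone for validation or test environments
CREATE TABLE your_database.your_schema.orders_clone
  CLONE your_database.your_schema.orders;
```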

Semi-Structured Data Support

  • Monitor JSON, Avro, Parquet data
  • Validate nested data structures
  • Handle variant data types
  • Query JSON path expressions
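
For instance, a VARIANT column can be traversed with path expressions and expanded with FLATTEN; the events table and its raw column are hypothetical:

```sql
SELECT raw:customer.name::STRING AS customer_name,
       item.value:sku::STRING    AS sku
FROM your_database.your_schema.events,
     LATERAL FLATTEN(input => raw:items) item;
```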

Secure Data Sharing

  • Monitor shared data quality
  • Validate data provider compliance
  • Track data consumer usage
  • Ensure cross-organization data integrity