Skip to main content

Latest Partition Missing Detection

This guide explains how to create a Sentinel to detect when the latest partition in your data is missing for the current date. This is particularly useful for monitoring daily data pipelines and ensuring data freshness.

What is Latest Partition Missing Detection?

Latest Partition Missing detection monitors your partitioned data to ensure that new partitions are being created as expected. It checks if the most recent partition date in your data matches today's date. If the latest partition is from a previous date, it indicates that:

  • Data pipeline may have failed
  • Data ingestion process is delayed
  • Data source is not providing current data
  • Scheduled data updates are not running

Sample Table Schema

Below is a sample schema for a partitioned table used in this example:

CREATE TABLE sample_db.orders_partitioned (
order_id INT,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2),
status VARCHAR(50)
) PARTITIONED BY (partition_date DATE);

Example Dataset

For this example, consider a table where data is partitioned by date, and you expect new partitions to be created daily. The sentinel will check if today's partition exists.

Step 1: Create the Sentinel

  1. Go to Sentinels in the sidebar and click Add Sentinel.
  2. Fill in the form:
    • Sentinel Name: Latest Partition Check BQ

    • Sentinel Source: BigQuery (or your data source)

    • Vault: Select your vault

    • Sentinel Query: Use a query like:

      SELECT
      MAX(order_date) AS latest_partition_date
      FROM
      sample_db.orders_partitioned
    • Schedule (Cron): Set how often to run (e.g., 0 18 * * * for 6:00 PM daily)

    • Timezone: Choose your timezone (e.g., Asia/Kolkata)

    • Alert Gateway: Select your alert channel (e.g., slack incoming webhook)

Step 2: Define the Alert Condition

In the Alert Condition field, enter:

alert if the latest_partition_date is not today's date i.e. the latest partition is missing for today

This tells DeepDQ to compare the maximum partition date in your data with today's date. If they don't match, it indicates that today's partition is missing.

Example Alert Response

When the latest partition is missing, the alert condition validation will show a response like this:

Alert Condition: alert if the latest_partition_date is not today's date i.e. the latest partition is missing for today

⚠️ Alert Triggered! Your alert condition would trigger based on the sample data.

Execution Details:

  • Alert Condition Evaluation: True (Alert would trigger)
  • Explanation: This analysis will check the most recent date recorded in the data and see if it is different from today's date. If the latest date is not today, it will show those records, indicating that today's data is missing or hasn't been updated yet. This helps you know if the newest data for today is not available.

Matching Records (1 rows):

latest_partition_date
2025-08-29

This means the latest partition in your data is from August 29, 2025, but today is October 5, 2025, indicating that recent partitions are missing.

Understanding the Alert Logic

The sentinel works by:

  1. Querying the latest partition date: The SQL query finds the maximum date value in your partitioned data.
  2. Comparing with today: The alert condition checks if this latest date equals today's date.
  3. Triggering alert: If the latest partition date is not today, the alert is triggered.

Step 3: Save and Monitor

  1. Click Save Changes to create the sentinel.
  2. The sentinel will run on the defined schedule and alert you if the latest partition is missing.

Customizing the Alert Condition

You can modify the alert condition for different scenarios:

  • Check for specific lag tolerance:

    alert if the latest_partition_date is more than 1 day behind today's date
  • Weekend-aware checking:

    alert if the latest_partition_date is not today's date and today is not a weekend
  • Multiple partition columns:

    alert if the latest_partition_date is not today's date for any table partition
  • Business days only:

    alert if the latest_partition_date is not today's date and today is a business day

Common Use Cases

1. Daily Data Pipeline Monitoring

Monitor tables that receive daily data loads to ensure they're updated on time.

2. Real-time Data Validation

Check streaming data partitions to ensure continuous data flow.

3. ETL Process Monitoring

Validate that ETL processes are creating expected partitions.

4. Data Freshness Monitoring

Ensure data consumers have access to the most recent data.

Query Variations

Depending on your data source and partitioning strategy, you might need different queries:

For BigQuery

SELECT
MAX(_PARTITIONTIME) AS latest_partition_date
FROM
`project.dataset.table_name`

For Snowflake

SELECT
MAX(partition_date) AS latest_partition_date
FROM
database.schema.table_name

For MySQL

SELECT
MAX(partition_date) AS latest_partition_date
FROM
database.table_name

For SQL Server

SELECT
MAX(partition_date) AS latest_partition_date
FROM
[database].[schema].[table_name]

For PostgreSQL

SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name

For Databricks

SELECT
MAX(partition_date) AS latest_partition_date
FROM
catalog.schema.table_name

For Oracle

SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name

For Amazon Redshift

SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name

Troubleshooting

False Positives

  • Weekend Data: If your pipeline doesn't run on weekends, modify the alert condition.
  • Time Zone Issues: Ensure consistent timezone settings between sentinel and data pipeline.
  • Holiday Schedules: Account for holidays when data pipelines may not run.

False Negatives

  • Partial Partitions: Empty partitions might exist but contain no data.
  • Late Arrivals: Data might arrive after the sentinel runs.

Tips

  • Use this sentinel in combination with data volume checks for comprehensive monitoring.
  • Consider creating different alert conditions for critical vs. non-critical tables.
  • Test your alert condition using the validation feature before saving.
  • Monitor the sentinel execution logs to understand your data patterns.
  • Set up appropriate alert escalation based on business criticality.