Latest Partition Missing Detection
This guide explains how to create a Sentinel to detect when the latest partition in your data is missing for the current date. This is particularly useful for monitoring daily data pipelines and ensuring data freshness.
What is Latest Partition Missing Detection?
Latest Partition Missing detection monitors your partitioned data to ensure that new partitions are being created as expected. It checks if the most recent partition date in your data matches today's date. If the latest partition is from a previous date, it indicates that:
- Data pipeline may have failed
- Data ingestion process is delayed
- Data source is not providing current data
- Scheduled data updates are not running
Sample Table Schema
Below is a sample schema for a partitioned table used in this example:
CREATE TABLE sample_db.orders_partitioned (
order_id INT,
customer_id INT,
order_date DATE,
amount DECIMAL(10,2),
status VARCHAR(50)
) PARTITIONED BY (partition_date DATE);
Example Dataset
For this example, consider a table where data is partitioned by date, and you expect new partitions to be created daily. The sentinel will check if today's partition exists.
Step 1: Create the Sentinel
- Go to Sentinels in the sidebar and click Add Sentinel.
- Fill in the form:
-
Sentinel Name: Latest Partition Check BQ
-
Sentinel Source: BigQuery (or your data source)
-
Vault: Select your vault
-
Sentinel Query: Use a query like:
SELECT
MAX(order_date) AS latest_partition_date
FROM
sample_db.orders_partitioned -
Schedule (Cron): Set how often to run (e.g.,
0 18 * * *for 6:00 PM daily) -
Timezone: Choose your timezone (e.g., Asia/Kolkata)
-
Alert Gateway: Select your alert channel (e.g., slack incoming webhook)
-
Step 2: Define the Alert Condition
In the Alert Condition field, enter:
alert if the latest_partition_date is not today's date i.e. the latest partition is missing for today
This tells DeepDQ to compare the maximum partition date in your data with today's date. If they don't match, it indicates that today's partition is missing.
Example Alert Response
When the latest partition is missing, the alert condition validation will show a response like this:
Alert Condition: alert if the latest_partition_date is not today's date i.e. the latest partition is missing for today
⚠️ Alert Triggered! Your alert condition would trigger based on the sample data.
Execution Details:
- Alert Condition Evaluation: True (Alert would trigger)
- Explanation: This analysis will check the most recent date recorded in the data and see if it is different from today's date. If the latest date is not today, it will show those records, indicating that today's data is missing or hasn't been updated yet. This helps you know if the newest data for today is not available.
Matching Records (1 rows):
| latest_partition_date |
|---|
| 2025-08-29 |
This means the latest partition in your data is from August 29, 2025, but today is October 5, 2025, indicating that recent partitions are missing.
Understanding the Alert Logic
The sentinel works by:
- Querying the latest partition date: The SQL query finds the maximum date value in your partitioned data.
- Comparing with today: The alert condition checks if this latest date equals today's date.
- Triggering alert: If the latest partition date is not today, the alert is triggered.
Step 3: Save and Monitor
- Click Save Changes to create the sentinel.
- The sentinel will run on the defined schedule and alert you if the latest partition is missing.
Customizing the Alert Condition
You can modify the alert condition for different scenarios:
-
Check for specific lag tolerance:
alert if the latest_partition_date is more than 1 day behind today's date -
Weekend-aware checking:
alert if the latest_partition_date is not today's date and today is not a weekend -
Multiple partition columns:
alert if the latest_partition_date is not today's date for any table partition -
Business days only:
alert if the latest_partition_date is not today's date and today is a business day
Common Use Cases
1. Daily Data Pipeline Monitoring
Monitor tables that receive daily data loads to ensure they're updated on time.
2. Real-time Data Validation
Check streaming data partitions to ensure continuous data flow.
3. ETL Process Monitoring
Validate that ETL processes are creating expected partitions.
4. Data Freshness Monitoring
Ensure data consumers have access to the most recent data.
Query Variations
Depending on your data source and partitioning strategy, you might need different queries:
For BigQuery
SELECT
MAX(_PARTITIONTIME) AS latest_partition_date
FROM
`project.dataset.table_name`
For Snowflake
SELECT
MAX(partition_date) AS latest_partition_date
FROM
database.schema.table_name
For MySQL
SELECT
MAX(partition_date) AS latest_partition_date
FROM
database.table_name
For SQL Server
SELECT
MAX(partition_date) AS latest_partition_date
FROM
[database].[schema].[table_name]
For PostgreSQL
SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name
For Databricks
SELECT
MAX(partition_date) AS latest_partition_date
FROM
catalog.schema.table_name
For Oracle
SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name
For Amazon Redshift
SELECT
MAX(partition_date) AS latest_partition_date
FROM
schema.table_name
Troubleshooting
False Positives
- Weekend Data: If your pipeline doesn't run on weekends, modify the alert condition.
- Time Zone Issues: Ensure consistent timezone settings between sentinel and data pipeline.
- Holiday Schedules: Account for holidays when data pipelines may not run.
False Negatives
- Partial Partitions: Empty partitions might exist but contain no data.
- Late Arrivals: Data might arrive after the sentinel runs.
Tips
- Use this sentinel in combination with data volume checks for comprehensive monitoring.
- Consider creating different alert conditions for critical vs. non-critical tables.
- Test your alert condition using the validation feature before saving.
- Monitor the sentinel execution logs to understand your data patterns.
- Set up appropriate alert escalation based on business criticality.