Outlier Detection Sentinel Using z Score
Sample Table Schema
Below is a sample schema for the emp table used in this example:
CREATE TABLE emp (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
joining_date DATETIME,
salary INT
);
This guide walks you through creating a Sentinel in DeepDQ to detect salary outliers using z-score logic. It includes a sample dataset (as CSV), SQL query, alert condition, an explanation of the z-score method, and a sample alert response.
Example Dataset
Download the sample dataset as a CSV file: outlier-dataset.csv
Step 1: Create the Sentinel
- Go to Sentinels in the sidebar and click Add Sentinel.
- Fill in the form:
-
Sentinel Name: Salary Outlier Detection
-
Sentinel Source: MySQL (or your data source)
-
Vault: Select your vault
-
Sentinel Query: Use a query like:
SELECT * FROM emp LIMIT 200 -
Schedule (Cron): Set how often to run (e.g.,
35 20 * * *for 8:35 PM daily) -
Timezone: Choose your timezone
-
Alert Gateway: Select your alert channel (e.g., Slack)
-
Step 2: Define the Alert Condition
In the Alert Condition field, enter:
if there is an outlier. use z score
This tells DeepDQ to use z-score logic to detect salary outliers. The appropriate columns for outlier detection are automatically chosen by DeepDQ AI if specific column/columns are not specified in the alert condition. You can validate this condition using the Validate Alert Condition button, which will show if any records are flagged as outliers based on your sample data.
Example Alert Response
When an outlier is detected, the alert condition validation will show a response like this:
Alert Condition: if there is an outlier. use z score
Alert Triggered! Your alert condition would trigger based on the sample data.
Execution Details:
- Alert Condition Evaluation: True (Alert would trigger)
- Code Execution Status: Successful
Matching Records (1 row):
| department | emp_id | joining_date | name | salary |
|---|---|---|---|---|
| Engineering | 21 | 2025-05-22T00:00:00 | Jhonny English | 0 |
This means the sentinel has detected an employee (Jhonny English) whose salary is a statistical outlier based on the z-score calculation.
What is a Z-Score?
A z-score measures how many standard deviations a value is from the mean. In salary outlier detection:
- z = (salary - mean_salary) / stddev_salary
- A salary is considered an outlier if its z-score is above a certain threshold (commonly |z| > 2 or 3).
DeepDQ will automatically calculate the mean and standard deviation for the salary column and flag any records that are statistical outliers.
Step 3: Save and Monitor
- Click Save Changes to create the sentinel.
- The sentinel will run on the defined schedule and alert you if any salary outliers are detected.
Customizing the Alert Condition with Z-Score Thresholds
You can write natural language alert conditions to specify different z-score thresholds for outlier detection. Here are some examples:
-
Default (|z| > 3):
if there is an outlier. use z score (|z| > 3)Only records with a z-score above 3 or below -3 are flagged as outliers (very strict, only extreme outliers).
-
More sensitive (|z| > 2):
if there is an outlier. use z score (|z| > 2)Records with a z-score above 2 or below -2 will be flagged (less strict, more outliers detected).
-
One-sided (z > 2.5):
if there is an outlier. use z score (z > 2.5)Only records with a z-score greater than 2.5 (high outliers) will be flagged.
What happens:
- The sentinel will use your specified threshold to determine which records are considered outliers.
- Lowering the threshold (e.g., |z| > 2) will make the alert more sensitive and may result in more alerts.
- Raising the threshold (e.g., |z| > 4) will make the alert less sensitive and only flag extreme outliers.
Tips
- Adjust the SQL query to filter by department, date, or other criteria as needed.
- You can customize the alert condition for more advanced logic (e.g., only flag outliers for recent hires).
- Review the validation results to ensure your alert condition works as expected.