Skip to main content

Outlier Detection Sentinel Using z Score

Sample Table Schema

Below is a sample schema for the emp table used in this example:

CREATE TABLE emp (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
joining_date DATETIME,
salary INT
);

This guide walks you through creating a Sentinel in DeepDQ to detect salary outliers using z-score logic. It includes a sample dataset (as CSV), SQL query, alert condition, an explanation of the z-score method, and a sample alert response.

Example Dataset

Download the sample dataset as a CSV file: outlier-dataset.csv

Step 1: Create the Sentinel

  1. Go to Sentinels in the sidebar and click Add Sentinel.
  2. Fill in the form:
    • Sentinel Name: Salary Outlier Detection

    • Sentinel Source: MySQL (or your data source)

    • Vault: Select your vault

    • Sentinel Query: Use a query like:

      SELECT * FROM emp LIMIT 200
    • Schedule (Cron): Set how often to run (e.g., 35 20 * * * for 8:35 PM daily)

    • Timezone: Choose your timezone

    • Alert Gateway: Select your alert channel (e.g., Slack)

Step 2: Define the Alert Condition

In the Alert Condition field, enter:

if there is an outlier. use z score

This tells DeepDQ to use z-score logic to detect salary outliers. The appropriate columns for outlier detection are automatically chosen by DeepDQ AI if specific column/columns are not specified in the alert condition. You can validate this condition using the Validate Alert Condition button, which will show if any records are flagged as outliers based on your sample data.

Example Alert Response

When an outlier is detected, the alert condition validation will show a response like this:

Alert Condition: if there is an outlier. use z score

Alert Triggered! Your alert condition would trigger based on the sample data.

Execution Details:

  • Alert Condition Evaluation: True (Alert would trigger)
  • Code Execution Status: Successful

Matching Records (1 row):

departmentemp_idjoining_datenamesalary
Engineering212025-05-22T00:00:00Jhonny English0

This means the sentinel has detected an employee (Jhonny English) whose salary is a statistical outlier based on the z-score calculation.

What is a Z-Score?

A z-score measures how many standard deviations a value is from the mean. In salary outlier detection:

  • z = (salary - mean_salary) / stddev_salary
  • A salary is considered an outlier if its z-score is above a certain threshold (commonly |z| > 2 or 3).

DeepDQ will automatically calculate the mean and standard deviation for the salary column and flag any records that are statistical outliers.

Step 3: Save and Monitor

  1. Click Save Changes to create the sentinel.
  2. The sentinel will run on the defined schedule and alert you if any salary outliers are detected.

Customizing the Alert Condition with Z-Score Thresholds

You can write natural language alert conditions to specify different z-score thresholds for outlier detection. Here are some examples:

  • Default (|z| > 3):

    if there is an outlier. use z score (|z| > 3)

    Only records with a z-score above 3 or below -3 are flagged as outliers (very strict, only extreme outliers).

  • More sensitive (|z| > 2):

    if there is an outlier. use z score (|z| > 2)

    Records with a z-score above 2 or below -2 will be flagged (less strict, more outliers detected).

  • One-sided (z > 2.5):

    if there is an outlier. use z score (z > 2.5)

    Only records with a z-score greater than 2.5 (high outliers) will be flagged.

What happens:

  • The sentinel will use your specified threshold to determine which records are considered outliers.
  • Lowering the threshold (e.g., |z| > 2) will make the alert more sensitive and may result in more alerts.
  • Raising the threshold (e.g., |z| > 4) will make the alert less sensitive and only flag extreme outliers.

Tips

  • Adjust the SQL query to filter by department, date, or other criteria as needed.
  • You can customize the alert condition for more advanced logic (e.g., only flag outliers for recent hires).
  • Review the validation results to ensure your alert condition works as expected.