monitor_schema.cli#

Console script for monitor_schema.

Classes#

Analyzer

Configuration for running an analysis.

AnomalyFilter

Filter the anomalies based on certain criteria. If the alerts are filtered down to 0, the monitor won't fire.

BaselineType

Supported baseline types.

Cadence

Cadence for an analyzer or monitor run.

ColumnDataType

Options for configuring data type for a column.

ColumnDiscreteness

Classifies a column as discrete or continuous.

ColumnMatrix

Define the matrix of columns and segments to fan out for monitoring.

ColumnSchema

Schema configuration for a column.

DatasetMatrix

Define the matrix of fields and segments to fan out for monitoring.

DigestMode

Config mode that indicates the monitor will send out a digest message.

Document

The main document that dictates how the monitor should be run. This document is managed by WhyLabs internally.

DriftConfig

An analyzer that detects whether the data drifts relative to a baseline.

EntitySchema

Schema definition of an entity.

EveryAnomalyMode

Config mode that indicates the monitor will send out individual messages per anomaly.

FixedCadenceSchedule

Support for scheduling based on a predefined cadence.

GlobalAction

Actions that are configured at the team/organization level.

Granularity

Supported granularity.

Monitor

Customer-specified monitor configuration.

Segment

A segment is a list of tags.

SendEmail

Action to send an email.

SlackWebhook

Action to send a Slack webhook.

StddevConfig

Calculates upper bounds and lower bounds based on stddev from a series of numbers.

TargetLevel

Which nested level we are targeting.

TrailingWindowBaseline

A dynamic trailing window.

Functions#

main(→ None)

Generates schema and example document JSON.

_dump_json_yaml(→ None)

Module Contents#

class monitor_schema.cli.Analyzer[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Configuration for running an analysis.

An analysis targets a metric (which may be a complex object) for one or more fields in one or more segments. The output is a list of ‘anomalies’ that may indicate issues with the data.

metadata: monitor_schema.models.commons.Metadata | None#
id: str#
displayName: str | None#
tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
targetSize: int | None#
schedule: monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.FixedCadenceSchedule | None#
disabled: bool | None#
disableTargetRollup: bool | None#
targetMatrix: monitor_schema.models.analyzer.targets.ColumnMatrix | monitor_schema.models.analyzer.targets.DatasetMatrix | None#
dataReadinessDuration: str | None#
batchCoolDownPeriod: str | None#
backfillGracePeriodDuration: str | None#
config: monitor_schema.models.analyzer.algorithms.ConjunctionConfig | monitor_schema.models.analyzer.algorithms.DisjunctionConfig | monitor_schema.models.analyzer.algorithms.DiffConfig | monitor_schema.models.analyzer.algorithms.ComparisonConfig | monitor_schema.models.analyzer.algorithms.ListComparisonConfig | monitor_schema.models.analyzer.algorithms.FrequentStringComparisonConfig | monitor_schema.models.analyzer.algorithms.ColumnListChangeConfig | monitor_schema.models.analyzer.algorithms.FixedThresholdsConfig | monitor_schema.models.analyzer.algorithms.StddevConfig | monitor_schema.models.analyzer.algorithms.DriftConfig | monitor_schema.models.analyzer.algorithms.ExperimentalConfig | monitor_schema.models.analyzer.algorithms.SeasonalConfig#
class Config[source]#

Updates JSON schema anyOf to oneOf.

static schema_extra(schema: Dict[str, Any], model: pydantic.BaseModel) → None[source]#

Update specific fields here (for Union type, specifically).

class monitor_schema.cli.AnomalyFilter[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Filter the anomalies based on certain criteria. If the alerts are filtered down to 0, the monitor won’t fire.

includeColumns: List[monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
excludeColumns: List[monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
minWeight: float | None#
maxWeight: float | None#
minRankByWeight: int | None#
maxRankByWeight: int | None#
minTotalWeight: float | None#
maxTotalWeight: float | None#
minAlertCount: int | None#
maxAlertCount: int | None#
includeMetrics: List[monitor_schema.models.utils.METRIC_NAME_STR] | None#
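
To illustrate how such a filter might narrow a list of anomalies, here is a rough sketch covering only the column and weight criteria. The anomaly shape (a dict with `column` and `weight` keys) and the function name are assumptions made for illustration; the real evaluation happens in the WhyLabs backend and may differ.

```python
from typing import Any, Dict, List, Optional


def filter_anomalies(
    anomalies: List[Dict[str, Any]],
    include_columns: Optional[List[str]] = None,
    exclude_columns: Optional[List[str]] = None,
    min_weight: Optional[float] = None,
    max_weight: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Keep only the anomalies that pass every criterion that is set."""
    kept = []
    for anomaly in anomalies:
        column = anomaly["column"]
        weight = anomaly.get("weight", 0.0)
        if include_columns is not None and column not in include_columns:
            continue
        if exclude_columns is not None and column in exclude_columns:
            continue
        if min_weight is not None and weight < min_weight:
            continue
        if max_weight is not None and weight > max_weight:
            continue
        kept.append(anomaly)
    return kept


# Hypothetical anomalies produced by an analyzer run.
anomalies = [
    {"column": "age", "weight": 0.9},
    {"column": "zip_code", "weight": 0.8},
    {"column": "income", "weight": 0.1},
]

remaining = filter_anomalies(anomalies, exclude_columns=["zip_code"], min_weight=0.5)
# The monitor fires only if at least one anomaly survives the filter.
monitor_fires = len(remaining) > 0
```
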
class monitor_schema.cli.BaselineType[source]#

Bases: str, enum.Enum

Supported baseline types.

BatchTimestamp = 'BatchTimestamp'#
Reference = 'Reference'#
TrailingWindow = 'TrailingWindow'#
TimeRange = 'TimeRange'#
CurrentBatch = 'CurrentBatch'#
class monitor_schema.cli.Cadence[source]#

Bases: str, enum.Enum

Cadence for an analyzer or monitor run.

hourly = 'hourly'#
daily = 'daily'#
weekly = 'weekly'#
monthly = 'monthly'#
class monitor_schema.cli.ColumnDataType[source]#

Bases: str, enum.Enum

Options for configuring data type for a column.

integral = 'integral'#
fractional = 'fractional'#
boolean = 'bool'#
string = 'string'#
unknown = 'unknown'#
null = 'null'#
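
The enums on this page all derive from both str and enum.Enum. The sketch below re-declares ColumnDataType locally, rather than importing it, to show what that base combination implies: members compare equal to their string values and serialize to JSON as plain strings. Note that the member name `boolean` differs from its value `'bool'`.

```python
import json
from enum import Enum


class ColumnDataType(str, Enum):
    """Local copy of the members listed above, for illustration only."""

    integral = "integral"
    fractional = "fractional"
    boolean = "bool"
    string = "string"
    unknown = "unknown"
    null = "null"


# Lookup by value uses the serialized string; lookup by name uses the member name.
by_value = ColumnDataType("bool")
by_name = ColumnDataType["boolean"]

# Because the enum subclasses str, json.dumps writes the underlying string value.
encoded = json.dumps({"dataType": ColumnDataType.boolean})
```
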
class monitor_schema.cli.ColumnDiscreteness[source]#

Bases: str, enum.Enum

Classifies a column as discrete or continuous.

discrete = 'discrete'#
continuous = 'continuous'#
class monitor_schema.cli.ColumnMatrix[source]#

Bases: _BaseMatrix

Define the matrix of columns and segments to fan out for monitoring.

type: Literal[TargetLevel]#
include: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
exclude: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
profileId: str | None#
class monitor_schema.cli.ColumnSchema[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Schema configuration for a column.

Generated by WhyLabs initially, but users can override it.

discreteness: ColumnDiscreteness#
dataType: ColumnDataType#
classifier: str | None#
class monitor_schema.cli.DatasetMatrix[source]#

Bases: _BaseMatrix

Define the matrix of fields and segments to fan out for monitoring.

type: Literal[TargetLevel]#
class monitor_schema.cli.DigestMode[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Config mode that indicates the monitor will send out a digest message.

type: Literal['DIGEST']#
filter: AnomalyFilter | None#
creationTimeOffset: str | None#
datasetTimestampOffset: str | None#
groupBy: List[DigestModeGrouping] | None#
class monitor_schema.cli.Document[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

The main document that dictates how the monitor should be run. This document is managed by WhyLabs internally.

id: uuid.UUID | None#
schemaVersion: Literal[1]#
metadata: monitor_schema.models.commons.Metadata | None#
orgId: str#
datasetId: str#
granularity: Granularity#
allowPartialTargetBatches: bool | None#
entitySchema: monitor_schema.models.column_schema.EntitySchema | None#
weightConfig: monitor_schema.models.column_schema.EntityWeights | None#
analyzers: List[monitor_schema.models.analyzer.Analyzer]#
monitors: List[monitor_schema.models.monitor.Monitor]#
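
As a rough picture of how the pieces fit together, the sketch below builds a document as a plain dict using the field names listed above and checks that each monitor refers to an analyzer defined in the same document. The identifiers (`org-0`, `model-0`, and so on) and the helper `dangling_analyzer_ids` are hypothetical; the analyzer entries are stripped down to their `id` for brevity and would not validate as-is.

```python
import json
from typing import Any, Dict, List

document: Dict[str, Any] = {
    "schemaVersion": 1,
    "orgId": "org-0",
    "datasetId": "model-0",
    "granularity": "daily",
    # Other analyzer fields (config, targetMatrix, ...) are omitted here.
    "analyzers": [{"id": "drift-analyzer"}],
    "monitors": [
        {"id": "drift-monitor", "analyzerIds": ["drift-analyzer", "missing-analyzer"]}
    ],
}


def dangling_analyzer_ids(doc: Dict[str, Any]) -> List[str]:
    """Return analyzer IDs referenced by monitors but not defined in the document."""
    defined = {analyzer["id"] for analyzer in doc["analyzers"]}
    referenced = {ref for monitor in doc["monitors"] for ref in monitor["analyzerIds"]}
    return sorted(referenced - defined)


dangling = dangling_analyzer_ids(document)
serialized = json.dumps(document, indent=2)
```
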
class monitor_schema.cli.DriftConfig[source]#

Bases: AlgorithmConfig

An analyzer that detects whether the data drifts relative to a baseline.

By default, the analysis uses the Hellinger distance with a threshold of 0.7.

type: Literal[AlgorithmType]#
algorithm: Literal['hellinger', 'jensenshannon', 'kl_divergence', 'psi']#
metric: Literal[ComplexMetrics, ComplexMetrics]#
threshold: float#
minBatchSize: int | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
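
For intuition about the default algorithm, the Hellinger distance between two discrete distributions can be sketched as follows. This is the textbook formula applied to hypothetical, already-normalized histograms; how the backend bins the data, and whether it flags drift with a strict `>` comparison, are assumptions here.

```python
import math
from typing import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    # The result is 0 for identical distributions and 1 for disjoint ones.
    return math.sqrt(total / 2.0)


THRESHOLD = 0.7  # the default threshold quoted in the class description

# Hypothetical normalized histograms for the baseline and the target batch.
baseline = [0.5, 0.3, 0.2]
target = [0.1, 0.2, 0.7]

distance = hellinger_distance(baseline, target)
drifted = distance > THRESHOLD  # assumed comparison direction
```
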
class monitor_schema.cli.EntitySchema[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Schema definition of an entity.

metadata: monitor_schema.models.commons.Metadata | None#
columns: Dict[monitor_schema.models.utils.COLUMN_NAME_TYPE, ColumnSchema]#
class monitor_schema.cli.EveryAnomalyMode[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Config mode that indicates the monitor will send out individual messages per anomaly.

type: Literal['EVERY_ANOMALY']#
filter: AnomalyFilter | None#
class monitor_schema.cli.FixedCadenceSchedule[source]#

Bases: NoExtrasBaseModel

Support for scheduling based on a predefined cadence.

type: Literal['fixed']#
cadence: Literal[Cadence.hourly, Cadence.daily, Cadence.weekly, Cadence.monthly]#
exclusionRanges: List[TimeRange] | None#
class monitor_schema.cli.GlobalAction[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Actions that are configured at the team/organization level.

type: Literal['global']#
target: str#
class monitor_schema.cli.Granularity[source]#

Bases: str, enum.Enum

Supported granularity.

hourly = 'hourly'#
daily = 'daily'#
weekly = 'weekly'#
monthly = 'monthly'#
class monitor_schema.cli.Monitor[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Customer-specified monitor configuration.

metadata: monitor_schema.models.commons.Metadata | None#
id: str#
displayName: str | None#
tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
analyzerIds: List[constr(regex='^[A-Za-z0-9_\\-]+$')]#
schedule: monitor_schema.models.commons.FixedCadenceSchedule | monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.ImmediateSchedule#
disabled: bool | None#
severity: int | None#
mode: EveryAnomalyMode | DigestMode#
actions: List[GlobalAction | SendEmail | SlackWebhook | RawWebhook]#
class Config[source]#

Updates JSON schema anyOf to oneOf.

static schema_extra(schema: Dict[str, Any], model: pydantic.BaseModel) → None[source]#

Update specific fields here (for Union type, specifically).
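
A monitor payload built from the fields above might look like the following sketch. The literal `type` values ('fixed', 'DIGEST', 'email') come from the class listings on this page; the IDs, severity, and email address are made up, and `invalid_analyzer_ids` is a hypothetical helper that applies the documented analyzerIds pattern by hand.

```python
import re
from typing import Any, Dict, List

# Pattern copied from the analyzerIds constraint above.
ANALYZER_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")

monitor: Dict[str, Any] = {
    "id": "drift-monitor",
    "analyzerIds": ["drift-analyzer"],
    "schedule": {"type": "fixed", "cadence": "daily"},
    "severity": 3,
    "mode": {"type": "DIGEST"},
    "actions": [{"type": "email", "target": "oncall@example.com"}],
}


def invalid_analyzer_ids(payload: Dict[str, Any]) -> List[str]:
    """Return the analyzer IDs that do not match the documented pattern."""
    return [
        analyzer_id
        for analyzer_id in payload["analyzerIds"]
        if ANALYZER_ID_PATTERN.fullmatch(analyzer_id) is None
    ]


rejected = invalid_analyzer_ids(monitor)
```
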

class monitor_schema.cli.Segment[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

A segment is a list of tags.

We normalize these in the backend.

tags: List[SegmentTag]#
class monitor_schema.cli.SendEmail[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Action to send an email.

type: Literal['email']#
target: str#
class monitor_schema.cli.SlackWebhook[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Action to send a Slack webhook.

type: Literal['slack']#
target: pydantic.HttpUrl#
class monitor_schema.cli.StddevConfig[source]#

Bases: _ThresholdBaseConfig

Calculates upper bounds and lower bounds based on stddev from a series of numbers.

An analyzer that uses the standard deviation over a window of time.

The calculation falls back to a Poisson distribution if there is only 1 value in the baseline. For 2 values, we use the formula sqrt(sum((x_i - avg(x))^2) / (n - 1))

type: Literal[AlgorithmType]#
factor: float | None#
minBatchSize: int | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId#
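
The bound calculation can be sketched with the standard library, assuming the bounds are the baseline mean plus or minus `factor` sample standard deviations. The function name, the example values, and the factor of 3.0 are illustrative assumptions, and the single-value Poisson fallback is deliberately left out because its details are not given here.

```python
import statistics
from typing import Sequence, Tuple


def stddev_bounds(baseline: Sequence[float], factor: float) -> Tuple[float, float]:
    """Return (lower, upper) as mean -/+ factor * sample standard deviation."""
    if len(baseline) < 2:
        # The single-value (Poisson) fallback is not sketched here.
        raise ValueError("need at least two baseline values")
    mean = statistics.fmean(baseline)
    spread = statistics.stdev(baseline)  # sample stddev: divides by n - 1
    return mean - factor * spread, mean + factor * spread


# Hypothetical baseline metric values from a trailing window.
lower, upper = stddev_bounds([10.0, 12.0, 11.0, 13.0, 9.0], factor=3.0)
target_value = 17.0
is_anomaly = not (lower <= target_value <= upper)
```
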
class monitor_schema.cli.TargetLevel[source]#

Bases: str, enum.Enum

Which nested level we are targeting.

dataset = 'dataset'#
column = 'column'#
class monitor_schema.cli.TrailingWindowBaseline[source]#

Bases: _SegmentBaseline

A dynamic trailing window.

This is useful if you don’t have a static baseline to monitor against. This is the default mode for most monitors.

type: Literal[BaselineType]#
size: int#
offset: int | None#
exclusionRanges: List[monitor_schema.models.commons.TimeRange] | None#
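
One way to picture a trailing window is as a slice of the batches that precede the target batch. In the sketch below, `size` is the number of batches in the window, and `offset` is assumed to shift the window further back by that many batches; that reading of `offset`, the helper name, and the batch labels are all assumptions rather than documented behavior.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def trailing_window(
    batches: Sequence[T],
    target_index: int,
    size: int,
    offset: Optional[int] = None,
) -> List[T]:
    """Return up to `size` batches ending `offset` batches before the target."""
    end = max(0, target_index - (offset or 0))
    start = max(0, end - size)
    return list(batches[start:end])


# Hypothetical daily batches labeled day-0 .. day-9; day-9 is the target.
batches = [f"day-{i}" for i in range(10)]

window = trailing_window(batches, target_index=9, size=7)
shifted = trailing_window(batches, target_index=9, size=7, offset=1)
```
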
monitor_schema.cli.main() → None#

Generates schema and example document JSON.

monitor_schema.cli._dump_json_yaml(file_name: str, json_content: str) → None[source]#