monitor_schema.models#
Console script for monitor_schema.
Subpackages#
Submodules#
Classes#
DatasetMetric | Metrics that are applicable at the dataset level. |
SimpleColumnMetric | Simple column metrics that are basically just a single number. |
ComplexMetrics | Sketch-based metrics that can only be processed by certain algorithms. |
Analyzer | Configuration for running an analysis. |
BaselineType | Supported baseline types. |
ReferenceProfileId | A baseline based on a static reference profile. |
TimeRangeBaseline | A static time range. |
TrailingWindowBaseline | A dynamic trailing window. |
SingleBatchBaseline | Using current batch. |
DriftConfig | An analyzer using stddev for a window of time range. |
DiffConfig | Detecting the differences between two numerical metrics. |
ComparisonConfig | Compare the target against either an expected value or the baseline. |
ExperimentalConfig | Experimental algorithm that is not standardized by the above ones yet. |
FixedThresholdsConfig | Fixed threshold analysis. |
ColumnListChangeConfig | Compare whether the target is equal to a value or not. |
SeasonalConfig | An analyzer using stddev for a window of time range. |
StddevConfig | Calculates upper bounds and lower bounds based on stddev from a series of numbers. |
DatasetMatrix | Define the matrix of fields and segments to fan out for monitoring. |
ColumnMatrix | Define the matrix of columns and segments to fan out for monitoring. |
TargetLevel | Which nested level we are targeting. |
Metadata | Metadata for top-level objects such as monitors, analyzers, and schemas. |
Segment | A segment is a list of tags. |
ColumnDiscreteness | Classifying the type. |
ColumnDataType | Options for configuring data type for a column. |
ColumnSchema | Schema configuration for a column. |
WeightConfig | Object that specifies column weights. |
SegmentWeightConfig | Object that specifies column weights for a segment. |
EntitySchema | Schema definition of an entity. |
ImmediateSchedule | Schedule the monitor to run immediately. |
CronSchedule | Support for scheduling. |
Cadence | Cadence for an analyzer or monitor run. |
FixedCadenceSchedule | Support for scheduling based on a predefined cadence. |
Monitor | Customer specified monitor configs. |
Granularity | Supported granularity. |
Document | The main document that dictates how the monitor should be run. This document is managed by WhyLabs internally. |
GlobalAction | Actions that are configured at the team/organization level. |
SendEmail | Action to send an email. |
SlackWebhook | Action to send a Slack webhook. |
RawWebhook | Action to send a raw webhook. |
AnomalyFilter | Filter the anomalies based on certain criteria. If the alerts are filtered down to 0, the monitor won't fire. |
EveryAnomalyMode | Config mode that indicates the monitor will send out individual messages per anomaly. |
DigestMode | Config mode that indicates the monitor will send out a digest message. |
SegmentTag | A single tag key-value pair for a segment. |
Package Contents#
- class monitor_schema.models.DatasetMetric[source]#
-
Metrics that are applicable at the dataset level.
- profile_count = 'profile.count'#
- profile_last_ingestion_time = 'profile.last_ingestion_time'#
- profile_first_ingestion_time = 'profile.first_ingestion_time'#
- column_row_count_sum = 'column_row_count_sum'#
- shape_column_count = 'shape_column_count'#
- shape_row_count = 'shape_row_count'#
- input_count = 'input.count'#
- output_count = 'output.count'#
- classification_f1 = 'classification.f1'#
- classification_precision = 'classification.precision'#
- classification_recall = 'classification.recall'#
- classification_accuracy = 'classification.accuracy'#
- classification_fpr = 'classification.fpr'#
- classification_auroc = 'classification.auroc'#
- regression_mse = 'regression.mse'#
- regression_mae = 'regression.mae'#
- regression_rmse = 'regression.rmse'#
- class monitor_schema.models.SimpleColumnMetric[source]#
-
Simple column metrics that are basically just a single number.
- count = 'count'#
- median = 'median'#
- max = 'max'#
- min = 'min'#
- mean = 'mean'#
- stddev = 'stddev'#
- variance = 'variance'#
- unique_upper = 'unique_upper'#
- unique_upper_ratio = 'unique_upper_ratio'#
- unique_est = 'unique_est'#
- unique_est_ratio = 'unique_est_ratio'#
- unique_lower = 'unique_lower'#
- unique_lower_ratio = 'unique_lower_ratio'#
- count_bool = 'count_bool'#
- count_bool_ratio = 'count_bool_ratio'#
- count_integral = 'count_integral'#
- count_integral_ratio = 'count_integral_ratio'#
- count_fractional = 'count_fractional'#
- count_fractional_ratio = 'count_fractional_ratio'#
- count_string = 'count_string'#
- count_string_ratio = 'count_string_ratio'#
- count_null = 'count_null'#
- count_null_ratio = 'count_null_ratio'#
- inferred_data_type = 'inferred_data_type'#
- quantile_p5 = 'quantile_5'#
- quantile_p75 = 'quantile_75'#
- quantile_p25 = 'quantile_25'#
- quantile_p90 = 'quantile_90'#
- quantile_p95 = 'quantile_95'#
- quantile_p99 = 'quantile_99'#
- class monitor_schema.models.ComplexMetrics[source]#
-
Sketch-based metrics that can only be processed by certain algorithms.
- histogram = 'histogram'#
- frequent_items = 'frequent_items'#
- unique_sketch = 'unique_sketch'#
- column_list = 'column_list'#
- class monitor_schema.models.Analyzer[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Configuration for running an analysis.
An analysis targets a metric (note that a metric could be a complex object) for one or multiple fields in one or multiple segments. The output is a list of ‘anomalies’ that might show issues with data.
- metadata: monitor_schema.models.commons.Metadata | None#
- tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
- schedule: monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.FixedCadenceSchedule | None#
- targetMatrix: monitor_schema.models.analyzer.targets.ColumnMatrix | monitor_schema.models.analyzer.targets.DatasetMatrix | None#
- config: monitor_schema.models.analyzer.algorithms.ConjunctionConfig | monitor_schema.models.analyzer.algorithms.DisjunctionConfig | monitor_schema.models.analyzer.algorithms.DiffConfig | monitor_schema.models.analyzer.algorithms.ComparisonConfig | monitor_schema.models.analyzer.algorithms.ListComparisonConfig | monitor_schema.models.analyzer.algorithms.FrequentStringComparisonConfig | monitor_schema.models.analyzer.algorithms.ColumnListChangeConfig | monitor_schema.models.analyzer.algorithms.FixedThresholdsConfig | monitor_schema.models.analyzer.algorithms.StddevConfig | monitor_schema.models.analyzer.algorithms.DriftConfig | monitor_schema.models.analyzer.algorithms.ExperimentalConfig | monitor_schema.models.analyzer.algorithms.SeasonalConfig#
- class monitor_schema.models.BaselineType[source]#
-
Supported baseline types.
- BatchTimestamp = 'BatchTimestamp'#
- Reference = 'Reference'#
- TrailingWindow = 'TrailingWindow'#
- TimeRange = 'TimeRange'#
- CurrentBatch = 'CurrentBatch'#
- class monitor_schema.models.ReferenceProfileId[source]#
Bases:
_Baseline
A baseline based on a static reference profile.
A typical use case is to take a “gold” dataset and upload its profile to WhyLabs. For an ML model, this can also be the training dataset.
- type: Literal[BaselineType]#
- class monitor_schema.models.TimeRangeBaseline[source]#
Bases:
_SegmentBaseline
A static time range.
Instead of using a single profile or a trailing window, users can lock in a “good” period.
- type: Literal[BaselineType]#
- class monitor_schema.models.TrailingWindowBaseline[source]#
Bases:
_SegmentBaseline
A dynamic trailing window.
This is useful if you don’t have a static baseline to monitor against. This is the default mode for most monitors.
- type: Literal[BaselineType]#
- exclusionRanges: List[monitor_schema.models.commons.TimeRange] | None#
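To make the idea of a trailing window with exclusion ranges concrete, here is a minimal sketch in plain Python. It does not use monitor_schema: the function name `trailing_window`, the `size` parameter, and the representation of a time range as a `(start, end)` tuple are all assumptions for illustration. It also assumes that excluded batches are skipped and older batches take their place, which this page does not confirm.

```python
from datetime import datetime

def trailing_window(batches, target, size, exclusion_ranges=()):
    """Return up to `size` batch timestamps that precede `target`.

    Hypothetical helper: batches falling inside any (start, end) exclusion
    range are ignored, with `start` inclusive and `end` exclusive.
    """
    def excluded(ts):
        return any(start <= ts < end for start, end in exclusion_ranges)

    candidates = sorted(ts for ts in batches if ts < target and not excluded(ts))
    return candidates[-size:] if size > 0 else []

# Ten daily batches, with January 8 and 9 excluded from the baseline.
days = [datetime(2024, 1, d) for d in range(1, 11)]
excluded = [(datetime(2024, 1, 8), datetime(2024, 1, 10))]
window = trailing_window(days, datetime(2024, 1, 11), size=3, exclusion_ranges=excluded)
```

With these inputs the window holds the three most recent batches that are not excluded.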
- class monitor_schema.models.SingleBatchBaseline[source]#
Bases:
_SegmentBaseline
Using current batch.
This is used when you want to use one batch to monitor another batch in a different metric entity.
- type: Literal[BaselineType]#
- class monitor_schema.models.DriftConfig[source]#
Bases:
AlgorithmConfig
An analyzer using stddev for a window of time range.
This analysis will detect whether the data drifts or not. By default, we use hellinger distance with a threshold of 0.7.
- type: Literal[AlgorithmType]#
- algorithm: Literal['hellinger', 'jensenshannon', 'kl_divergence', 'psi']#
- metric: Literal[ComplexMetrics, ComplexMetrics]#
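As a rough illustration of the default mentioned above, the following sketch computes the Hellinger distance between two discrete distributions and compares it with a threshold of 0.7. The function names and the dictionary inputs are assumptions for this example and are not part of monitor_schema. How WhyLabs derives distributions from histogram or frequent-items sketches is not shown on this page, and treating "distance greater than threshold" as drift is also an assumption.

```python
import math

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions given as {item: probability}."""
    keys = set(p) | set(q)
    total = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(total / 2.0)

def is_drift(baseline, target, threshold=0.7):
    """Hypothetical check: flag drift when the distance exceeds the threshold."""
    return hellinger_distance(baseline, target) > threshold

baseline = {"a": 0.5, "b": 0.5}
identical = {"a": 0.5, "b": 0.5}
disjoint = {"c": 1.0}
```

The distance ranges from 0 for identical distributions to 1 for distributions with no items in common, so `disjoint` is flagged and `identical` is not.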
- class monitor_schema.models.DiffConfig[source]#
Bases:
AlgorithmConfig
Detecting the differences between two numerical metrics.
- type: Literal[AlgorithmType]#
- thresholdType: ThresholdType | None#
- class monitor_schema.models.ComparisonConfig[source]#
Bases:
AlgorithmConfig
Compare the target against either an expected value or the baseline.
This is useful to detect data type change, for instance.
- type: Literal[AlgorithmType]#
- operator: ComparisonOperator#
- expected: ExpectedValue | None#
- class monitor_schema.models.ExperimentalConfig[source]#
Bases:
AlgorithmConfig
Experimental algorithm that is not standardized by the above ones yet.
- type: Literal[AlgorithmType]#
- baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
- stub: AlgorithmType | None#
- class monitor_schema.models.FixedThresholdsConfig[source]#
Bases:
AlgorithmConfig
Fixed threshold analysis.
If the user sets neither an upper bound nor a lower bound, this algorithm becomes a no-op. WhyLabs might enforce the presence of either field in the future.
- type: Literal[AlgorithmType]#
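The no-op behaviour can be sketched in a few lines of plain Python. The function below is hypothetical: the parameter names `upper` and `lower` are assumptions (this page does not list the field names), and treating the bounds as inclusive is also an assumption.

```python
from typing import Optional

def fixed_threshold_anomaly(value: float,
                            upper: Optional[float] = None,
                            lower: Optional[float] = None) -> bool:
    """Return True when `value` falls outside the configured bounds."""
    if upper is None and lower is None:
        # Neither bound is set: nothing to compare against, so never flag.
        return False
    if upper is not None and value > upper:
        return True
    if lower is not None and value < lower:
        return True
    return False
```

A call such as `fixed_threshold_anomaly(11, upper=10)` flags the value, while a call with no bounds never does.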
- class monitor_schema.models.ColumnListChangeConfig[source]#
Bases:
AlgorithmConfig
Compare whether the target is equal to a value or not.
This is useful to detect data type change, for instance.
- type: Literal[AlgorithmType]#
- mode: Literal['ON_ADD_AND_REMOVE', 'ON_ADD', 'ON_REMOVE'] = 'ON_ADD_AND_REMOVE'#
- metric: Literal[ComplexMetrics]#
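One plausible reading of the three `mode` values, sketched with set differences. The helper name `column_list_changed` is hypothetical, and interpreting 'ON_ADD_AND_REMOVE' as "fire when columns are added or removed" is an assumption rather than documented behaviour.

```python
def column_list_changed(baseline_columns, target_columns, mode="ON_ADD_AND_REMOVE"):
    """Return True when the column list changed in the way selected by `mode`."""
    baseline, target = set(baseline_columns), set(target_columns)
    added = target - baseline      # columns present only in the target
    removed = baseline - target    # columns present only in the baseline
    if mode == "ON_ADD":
        return bool(added)
    if mode == "ON_REMOVE":
        return bool(removed)
    if mode == "ON_ADD_AND_REMOVE":
        return bool(added or removed)
    raise ValueError(f"unknown mode: {mode!r}")
```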
- class monitor_schema.models.SeasonalConfig[source]#
Bases:
_ThresholdBaseConfig
An analyzer using stddev for a window of time range.
This will fall back to Poisson distribution if there is only 1 value in the baseline.
This only works with TrailingWindow baseline (TODO: add backend validation)
- type: Literal[AlgorithmType]#
- algorithm: Literal['arima']#
- stddevTimeRanges: List[monitor_schema.models.commons.TimeRange] | None#
- class monitor_schema.models.StddevConfig[source]#
Bases:
_ThresholdBaseConfig
Calculates upper bounds and lower bounds based on stddev from a series of numbers.
An analyzer using stddev for a window of time range.
This calculation will fall back to a Poisson distribution if there is only 1 value in the baseline. For 2 values, we use the formula sqrt(sum((x_i - avg(x))^2) / (n - 1))
- type: Literal[AlgorithmType]#
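A minimal sketch of stddev-based bounds, using the sample standard deviation (dividing by n - 1) from the Python standard library. The function name `stddev_bounds` and the `factor` multiplier are assumptions for illustration; this page does not list the corresponding config fields, and the Poisson fallback for a single value is not sketched here.

```python
import statistics

def stddev_bounds(baseline, factor=3.0):
    """Return (lower, upper) as mean -/+ factor * sample standard deviation."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values; the Poisson fallback is not sketched")
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)  # sample stddev, divides by n - 1
    return mean - factor * sd, mean + factor * sd

lower, upper = stddev_bounds([10, 12, 14], factor=3.0)
```

For the baseline `[10, 12, 14]` the mean is 12 and the sample standard deviation is 2, so a target value would be compared against bounds of roughly 6 and 18.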
- class monitor_schema.models.DatasetMatrix[source]#
Bases:
_BaseMatrix
Define the matrix of fields and segments to fan out for monitoring.
- type: Literal[TargetLevel]#
- class monitor_schema.models.ColumnMatrix[source]#
Bases:
_BaseMatrix
Define the matrix of columns and segments to fan out for monitoring.
- type: Literal[TargetLevel]#
- include: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
- exclude: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
- class monitor_schema.models.TargetLevel[source]#
-
Which nested level we are targeting.
- dataset = 'dataset'#
- column = 'column'#
- class monitor_schema.models.Metadata[source]#
Bases:
NoExtrasBaseModel
Metadata for top-level objects such as monitors, analyzers, and schemas.
This object is managed by WhyLabs. Any user-provided values will be ignored on the WhyLabs side.
- class monitor_schema.models.Segment[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
A segment is a list of tags.
We normalize these in the backend.
- tags: List[SegmentTag]#
- class monitor_schema.models.ColumnDiscreteness[source]#
-
Classifies a column as discrete or continuous.
- discrete = 'discrete'#
- continuous = 'continuous'#
- class monitor_schema.models.ColumnDataType[source]#
-
Options for configuring data type for a column.
- integral = 'integral'#
- fractional = 'fractional'#
- boolean = 'bool'#
- string = 'string'#
- unknown = 'unknown'#
- null = 'null'#
- class monitor_schema.models.ColumnSchema[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Schema configuration for a column.
Should be generated by WhyLabs originally but can be overridden by users.
- discreteness: ColumnDiscreteness#
- dataType: ColumnDataType#
- class monitor_schema.models.WeightConfig[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Object that specifies column weights.
By default, the weight of a column is None (unspecified).
If the weight is unspecified, the column is EXCLUDED when you filter or sort by weight.
For sorting, unweighted columns take the LEAST PRECEDENCE, meaning that weighted columns have higher priority.
Weights are not hierarchical: if a segment weight config is specified and a column does not have a weight in that config, no hierarchy is used to resolve the value; it will be None.
The order of unweighted columns is undefined.
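The precedence rules can be sketched with a plain dictionary of column weights. Both helpers are hypothetical and not part of monitor_schema. Sorting heaviest-first is an assumption, and the relative order of unweighted columns in the result should not be relied on.

```python
def sort_by_weight(weights):
    """Order columns with weighted ones first (heaviest first), unweighted ones last.

    `weights` maps column name -> weight, where None means unspecified.
    """
    weighted = [c for c, w in weights.items() if w is not None]
    unweighted = [c for c, w in weights.items() if w is None]
    weighted.sort(key=lambda c: weights[c], reverse=True)
    return weighted + unweighted

def filter_by_weight(weights, minimum):
    """Keep columns whose weight is at least `minimum`; unweighted columns are dropped."""
    return [c for c, w in weights.items() if w is not None and w >= minimum]

weights = {"age": 0.2, "notes": None, "income": 0.9}
```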
- class monitor_schema.models.SegmentWeightConfig[source]#
Bases:
WeightConfig
Object that specifies column weights for a segment.
- segment: monitor_schema.models.segments.Segment | None#
- class monitor_schema.models.EntitySchema[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Schema definition of an entity.
- metadata: monitor_schema.models.commons.Metadata | None#
- columns: Dict[monitor_schema.models.utils.COLUMN_NAME_TYPE, ColumnSchema]#
- class monitor_schema.models.ImmediateSchedule[source]#
Bases:
NoExtrasBaseModel
Schedule the monitor to run immediately.
- type: Literal['immediate']#
- class monitor_schema.models.CronSchedule[source]#
Bases:
NoExtrasBaseModel
Support for scheduling.
- type: Literal['cron']#
- class monitor_schema.models.Cadence[source]#
-
Cadence for an analyzer or monitor run.
- hourly = 'hourly'#
- daily = 'daily'#
- weekly = 'weekly'#
- monthly = 'monthly'#
- class monitor_schema.models.FixedCadenceSchedule[source]#
Bases:
NoExtrasBaseModel
Support for scheduling based on a predefined cadence.
- type: Literal['fixed']#
- cadence: Literal[Cadence, Cadence, Cadence, Cadence]#
- class monitor_schema.models.Monitor[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Customer specified monitor configs.
- metadata: monitor_schema.models.commons.Metadata | None#
- tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
- analyzerIds: List[constr(regex='^[A-Za-z0-9_\\-]+$')]#
- schedule: monitor_schema.models.commons.FixedCadenceSchedule | monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.ImmediateSchedule#
- mode: EveryAnomalyMode | DigestMode#
- actions: List[GlobalAction | SendEmail | SlackWebhook | RawWebhook]#
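The `analyzerIds` constraint above is an anchored regular expression, so it can be checked ahead of time with the standard `re` module. This is a sketch only: the helper `is_valid_analyzer_id` is hypothetical, and the library's own validation may differ in detail.

```python
import re

# Same pattern as the analyzerIds constraint: letters, digits, underscore, hyphen.
ANALYZER_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")

def is_valid_analyzer_id(value: str) -> bool:
    """Return True when `value` consists only of the allowed characters."""
    return ANALYZER_ID_PATTERN.match(value) is not None
```

Identifiers containing spaces or punctuation other than `_` and `-` are rejected, as is the empty string.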
- class monitor_schema.models.Granularity[source]#
-
Supported granularity.
- hourly = 'hourly'#
- daily = 'daily'#
- weekly = 'weekly'#
- monthly = 'monthly'#
- class monitor_schema.models.Document[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
The main document that dictates how the monitor should be run. This document is managed by WhyLabs internally.
- schemaVersion: Literal[1]#
- metadata: monitor_schema.models.commons.Metadata | None#
- granularity: Granularity#
- entitySchema: monitor_schema.models.column_schema.EntitySchema | None#
- weightConfig: monitor_schema.models.column_schema.EntityWeights | None#
- analyzers: List[monitor_schema.models.analyzer.Analyzer]#
- monitors: List[monitor_schema.models.monitor.Monitor]#
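To show how the pieces nest, here is a hedged sketch of a document as a plain Python dictionary, using only field names that appear on this page. It is not validated against monitor_schema. Fields not listed here (for example identifiers or action targets) are omitted, so the real models may reject it. The `"drift"` value for the config `type` and the analyzer ID are assumptions.

```python
import json

document = {
    "schemaVersion": 1,
    "granularity": "daily",
    "analyzers": [
        {
            "schedule": {"type": "fixed", "cadence": "daily"},
            "targetMatrix": {"type": "column"},
            # "drift" is an assumed value; AlgorithmType members are not listed on this page.
            "config": {"type": "drift", "algorithm": "hellinger", "metric": "histogram"},
        }
    ],
    "monitors": [
        {
            "analyzerIds": ["example-drift-analyzer"],  # hypothetical ID
            "schedule": {"type": "immediate"},
            "mode": {"type": "DIGEST"},
            "actions": [{"type": "global"}],
        }
    ],
}

# Round-trip through JSON to confirm the structure is serializable.
restored = json.loads(json.dumps(document))
```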
- class monitor_schema.models.GlobalAction[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Actions that are configured at the team/organization level.
- type: Literal['global']#
- class monitor_schema.models.SendEmail[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Action to send an email.
- type: Literal['email']#
- class monitor_schema.models.SlackWebhook[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Action to send a Slack webhook.
- type: Literal['slack']#
- target: pydantic.HttpUrl#
- class monitor_schema.models.RawWebhook[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Action to send a raw webhook.
- type: Literal['raw']#
- target: pydantic.HttpUrl#
- class monitor_schema.models.AnomalyFilter[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Filter the anomalies based on certain criteria. If the alerts are filtered down to 0, the monitor won’t fire.
- class monitor_schema.models.EveryAnomalyMode[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Config mode that indicates the monitor will send out individual messages per anomaly.
- type: Literal['EVERY_ANOMALY']#
- filter: AnomalyFilter | None#
- class monitor_schema.models.DigestMode[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Config mode that indicates the monitor will send out a digest message.
- type: Literal['DIGEST']#
- filter: AnomalyFilter | None#
- groupBy: List[DigestModeGrouping] | None#
- class monitor_schema.models.SegmentTag[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
A single tag key-value pair for a segment.