monitor_schema.models.analyzer#

Analyzer module.

Classes#

`ReferenceProfileId`	A baseline based on a static reference profile.
`SingleBatchBaseline`	Using current batch.
`TimeRangeBaseline`	A static time range.
`TrailingWindowBaseline`	A dynamic trailing window.
`DatasetMetric`	Metrics that are applicable at the dataset level.
`SimpleColumnMetric`	Simple column metrics that are basically just a single number.
`ComplexMetrics`	Sketch-based metrics that can only be processed by certain algorithms.
`ComparisonConfig`	Compare the target against either an expected value or the baseline.
`ColumnListChangeConfig`	Detect whether columns were added to or removed from the target relative to the baseline.
`FixedThresholdsConfig`	Fixed threshold analysis.
`StddevConfig`	Calculates upper bounds and lower bounds based on stddev from a series of numbers.
`SeasonalConfig`	An analyzer using stddev for a window of time range.
`DriftConfig`	An analyzer that detects whether the data distribution has drifted from the baseline.
`ExperimentalConfig`	Experimental algorithm that has not yet been standardized like the ones above.
`DiffConfig`	Detecting the differences between two numerical metrics.
`Analyzer`	Configuration for running an analysis.
`BaselineType`	Supported baseline types.
`TargetLevel`	Which nested level we are targeting.
`DatasetMatrix`	Define the matrix of fields and segments to fan out for monitoring.
`ColumnMatrix`	Define the matrix of columns and segments to fan out for monitoring.

Package Contents#

class monitor_schema.models.analyzer.ReferenceProfileId[source]#

Bases: _Baseline

A baseline based on a static reference profile.

A typical use case is to take a “gold” dataset and upload its profile to WhyLabs. This can also be the training dataset of an ML model.

type: Literal[BaselineType]#

profileId: str#

class monitor_schema.models.analyzer.SingleBatchBaseline[source]#

Bases: _SegmentBaseline

Using current batch.

This is used when you want to use one batch to monitor another batch in a different metric entity.

type: Literal[BaselineType]#

offset: int | None#

datasetId: str#

class monitor_schema.models.analyzer.TimeRangeBaseline[source]#

Bases: _SegmentBaseline

A static time range.

Instead of using a single profile or a trailing window, a user can lock in a “good” period.

type: Literal[BaselineType]#

range: monitor_schema.models.commons.TimeRange#

class monitor_schema.models.analyzer.TrailingWindowBaseline[source]#

Bases: _SegmentBaseline

A dynamic trailing window.

This is useful if you don’t have a static baseline to monitor against. This is the default mode for most monitors.

type: Literal[BaselineType]#

size: int#

offset: int | None#

exclusionRanges: List[monitor_schema.models.commons.TimeRange] | None#
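As a rough illustration of how the four baseline shapes differ, the sketch below writes them as plain dictionaries and checks which non-optional fields are present. The field names are taken from the class listings above. The pairing of each `type` string with a class, the identifier values, and the `missing_fields` helper are assumptions made for this example and are not part of `monitor_schema`.

```python
import json

# Non-optional fields per baseline "type", read off the class listings above.
# ASSUMPTION: the mapping from each type string to a class is inferred from
# the names; it is not stated on this page.
REQUIRED_FIELDS = {
    "Reference": {"profileId"},
    "TrailingWindow": {"size"},
    "TimeRange": {"range"},
    "CurrentBatch": {"datasetId"},
}


def missing_fields(baseline: dict) -> set:
    """Return the required field names that the baseline dict lacks."""
    required = REQUIRED_FIELDS[baseline["type"]]
    return required - set(baseline)


# Hypothetical identifiers, used only for illustration.
reference = {"type": "Reference", "profileId": "ref-prof-123"}
trailing = {"type": "TrailingWindow", "size": 7, "offset": 1}
incomplete = {"type": "CurrentBatch"}  # datasetId omitted on purpose

print(json.dumps(trailing))
print(missing_fields(reference))
print(missing_fields(incomplete))
```

In real use the pydantic models themselves perform this validation; the dictionaries only show which fields each baseline carries.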

class monitor_schema.models.analyzer.DatasetMetric[source]#

Bases: str, enum.Enum

Metrics that are applicable at the dataset level.

profile_count = 'profile.count'#

profile_last_ingestion_time = 'profile.last_ingestion_time'#

profile_first_ingestion_time = 'profile.first_ingestion_time'#

column_row_count_sum = 'column_row_count_sum'#

shape_column_count = 'shape_column_count'#

shape_row_count = 'shape_row_count'#

input_count = 'input.count'#

output_count = 'output.count'#

classification_f1 = 'classification.f1'#

classification_precision = 'classification.precision'#

classification_recall = 'classification.recall'#

classification_accuracy = 'classification.accuracy'#

classification_fpr = 'classification.fpr'#

classification_auroc = 'classification.auroc'#

regression_mse = 'regression.mse'#

regression_mae = 'regression.mae'#

regression_rmse = 'regression.rmse'#

class monitor_schema.models.analyzer.SimpleColumnMetric[source]#

Bases: str, enum.Enum

Simple column metrics that are basically just a single number.

count = 'count'#

median = 'median'#

max = 'max'#

min = 'min'#

mean = 'mean'#

stddev = 'stddev'#

variance = 'variance'#

unique_upper = 'unique_upper'#

unique_upper_ratio = 'unique_upper_ratio'#

unique_est = 'unique_est'#

unique_est_ratio = 'unique_est_ratio'#

unique_lower = 'unique_lower'#

unique_lower_ratio = 'unique_lower_ratio'#

count_bool = 'count_bool'#

count_bool_ratio = 'count_bool_ratio'#

count_integral = 'count_integral'#

count_integral_ratio = 'count_integral_ratio'#

count_fractional = 'count_fractional'#

count_fractional_ratio = 'count_fractional_ratio'#

count_string = 'count_string'#

count_string_ratio = 'count_string_ratio'#

count_null = 'count_null'#

count_null_ratio = 'count_null_ratio'#

inferred_data_type = 'inferred_data_type'#

quantile_p5 = 'quantile_5'#

quantile_p75 = 'quantile_75'#

quantile_p25 = 'quantile_25'#

quantile_p90 = 'quantile_90'#

quantile_p95 = 'quantile_95'#

quantile_p99 = 'quantile_99'#
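Note that the quantile members have names (`quantile_p5`) that differ from their string values (`'quantile_5'`). The sketch below uses a small local mirror of the enum, not the library class, to show how lookup by value and lookup by name relate for a `str`-based `Enum`.

```python
from enum import Enum


class SimpleColumnMetricSketch(str, Enum):
    """Local mirror of three members listed above; not the library class."""

    median = "median"
    quantile_p5 = "quantile_5"
    quantile_p95 = "quantile_95"


# Lookup by value (what appears in a serialized config) ...
by_value = SimpleColumnMetricSketch("quantile_5")
# ... and lookup by member name (what appears in Python code).
by_name = SimpleColumnMetricSketch["quantile_p5"]

print(by_value is by_name)
print(by_value.value)
# Because the enum mixes in str, a member compares equal to its value.
print(by_value == "quantile_5")
```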

class monitor_schema.models.analyzer.ComplexMetrics[source]#

Bases: str, enum.Enum

Sketch-based metrics that can only be processed by certain algorithms.

histogram = 'histogram'#

frequent_items = 'frequent_items'#

unique_sketch = 'unique_sketch'#

column_list = 'column_list'#

class monitor_schema.models.analyzer.ComparisonConfig[source]#

Bases: AlgorithmConfig

Compare the target against either an expected value or the baseline.

This is useful to detect data type change, for instance.

type: Literal[AlgorithmType]#

operator: ComparisonOperator#

expected: ExpectedValue | None#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline | None#

class monitor_schema.models.analyzer.ColumnListChangeConfig[source]#

Bases: AlgorithmConfig

Detect whether columns were added to or removed from the target relative to the baseline.

This is useful for detecting schema changes, for instance.

type: Literal[AlgorithmType]#

mode: Literal['ON_ADD_AND_REMOVE', 'ON_ADD', 'ON_REMOVE'] = 'ON_ADD_AND_REMOVE'#

metric: Literal[ComplexMetrics]#

exclude: List[monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
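One plausible reading of the three `mode` values is a set comparison between the baseline's column list and the target's. The sketch below is written under that assumption. The `column_list_changed` function and the interpretation of `exclude` as "ignore these columns on both sides" are hypothetical and are not taken from the library.

```python
from typing import Iterable, Optional


def column_list_changed(
    baseline_cols: Iterable[str],
    target_cols: Iterable[str],
    mode: str = "ON_ADD_AND_REMOVE",
    exclude: Optional[Iterable[str]] = None,
) -> bool:
    """Return True if the column list changed in a way the mode cares about."""
    ignored = set(exclude or [])
    baseline = set(baseline_cols) - ignored
    target = set(target_cols) - ignored

    added = target - baseline    # columns present only in the target
    removed = baseline - target  # columns present only in the baseline

    if mode == "ON_ADD":
        return bool(added)
    if mode == "ON_REMOVE":
        return bool(removed)
    if mode == "ON_ADD_AND_REMOVE":
        # ASSUMPTION: either an addition or a removal triggers this mode.
        return bool(added or removed)
    raise ValueError(f"unknown mode: {mode}")


print(column_list_changed(["a", "b"], ["a", "b", "c"], mode="ON_ADD"))
print(column_list_changed(["a", "b"], ["a", "b", "c"], mode="ON_REMOVE"))
print(column_list_changed(["a", "b"], ["a", "b", "c"], exclude=["c"]))
```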

class monitor_schema.models.analyzer.FixedThresholdsConfig[source]#

Bases: AlgorithmConfig

Fixed threshold analysis.

If the user sets neither an upper bound nor a lower bound, this algorithm becomes a no-op. WhyLabs might enforce the presence of at least one of these fields in the future.

type: Literal[AlgorithmType]#

upper: float | None#

lower: float | None#
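The no-op behavior described above can be sketched as a small check. The function name and the use of strict inequalities are assumptions made for this example; the page does not say whether a value equal to a bound counts as a violation.

```python
from typing import Optional


def violates_fixed_thresholds(
    value: float,
    upper: Optional[float] = None,
    lower: Optional[float] = None,
) -> bool:
    """Return True if value falls outside the configured bounds.

    With both bounds unset, nothing can be violated, so the check is a no-op.
    """
    if upper is not None and value > upper:
        return True
    if lower is not None and value < lower:
        return True
    return False


print(violates_fixed_thresholds(12.0, upper=10.0))
print(violates_fixed_thresholds(5.0, upper=10.0, lower=1.0))
print(violates_fixed_thresholds(1e9))  # neither bound set: never flags
```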

class monitor_schema.models.analyzer.StddevConfig[source]#

Bases: _ThresholdBaseConfig

Calculates upper bounds and lower bounds based on stddev from a series of numbers.

An analyzer using stddev for a window of time range.

This calculation falls back to a Poisson distribution if there is only 1 value in the baseline. For 2 values, we use the sample standard deviation formula sqrt(sum((x_i - avg(x))^2) / (n - 1)).

type: Literal[AlgorithmType]#

factor: float | None#

minBatchSize: int | None#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId#
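A minimal sketch of how such bounds might be derived from a baseline series follows. The `stddev_bounds` function, the default `factor` of 3.0, and the way the Poisson fallback is applied (taking the spread as the square root of the single value, since a Poisson distribution's variance equals its mean) are all assumptions for illustration, not the backend's implementation.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def stddev_bounds(
    baseline_values: Sequence[float], factor: float = 3.0
) -> Tuple[float, float]:
    """Return (lower, upper) bounds as center -/+ factor * spread."""
    n = len(baseline_values)
    if n == 0:
        raise ValueError("baseline must contain at least one value")

    center = mean(baseline_values)
    if n == 1:
        # ASSUMPTION: Poisson fallback, where variance equals the mean.
        spread = math.sqrt(max(center, 0.0))
    else:
        # statistics.stdev is the sample standard deviation (n - 1 denominator).
        spread = stdev(baseline_values)

    return center - factor * spread, center + factor * spread


print(stddev_bounds([2.0, 4.0], factor=1.0))
print(stddev_bounds([9.0], factor=2.0))
```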

class monitor_schema.models.analyzer.SeasonalConfig[source]#

Bases: _ThresholdBaseConfig

An analyzer using stddev for a window of time range.

This will fall back to Poisson distribution if there is only 1 value in the baseline.

This only works with TrailingWindow baseline (TODO: add backend validation)

type: Literal[AlgorithmType]#

algorithm: Literal['arima']#

minBatchSize: int | None#

alpha: float | None#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline#

stddevTimeRanges: List[monitor_schema.models.commons.TimeRange] | None#

stddevMaxBatchSize: int | None#

stddevFactor: float | None#

class monitor_schema.models.analyzer.DriftConfig[source]#

Bases: AlgorithmConfig

An analyzer that detects whether the data distribution has drifted from the baseline.

By default, it uses the Hellinger distance with a threshold of 0.7.

type: Literal[AlgorithmType]#

algorithm: Literal['hellinger', 'jensenshannon', 'kl_divergence', 'psi']#

metric: Literal[ComplexMetrics, ComplexMetrics]#

threshold: float#

minBatchSize: int | None#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
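For intuition about the default algorithm, the sketch below computes the Hellinger distance between two discrete distributions and compares it with the threshold. It assumes the distributions are already aligned lists of probabilities, whereas the real analyzer operates on sketch-based metrics. The function names and the "greater than threshold means drift" rule are assumptions made for this example.

```python
import math
from typing import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two aligned discrete distributions.

    Ranges from 0.0 (identical) to 1.0 (no overlapping mass).
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def has_drifted(
    p: Sequence[float], q: Sequence[float], threshold: float = 0.7
) -> bool:
    """ASSUMPTION: drift is flagged when the distance exceeds the threshold."""
    return hellinger_distance(p, q) > threshold


baseline = [0.5, 0.5, 0.0]
same = [0.5, 0.5, 0.0]
disjoint = [0.0, 0.0, 1.0]

print(hellinger_distance(baseline, same))
print(hellinger_distance(baseline, disjoint))
print(has_drifted(baseline, disjoint))
```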

class monitor_schema.models.analyzer.ExperimentalConfig[source]#

Bases: AlgorithmConfig

Experimental algorithm that has not yet been standardized like the ones above.

type: Literal[AlgorithmType]#

implementation: str#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#

stub: AlgorithmType | None#

class monitor_schema.models.analyzer.DiffConfig[source]#

Bases: AlgorithmConfig

Detecting the differences between two numerical metrics.

type: Literal[AlgorithmType]#

mode: DiffMode#

thresholdType: ThresholdType | None#

threshold: float#

baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#

class monitor_schema.models.analyzer.Analyzer[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Configuration for running an analysis.

An analysis targets a metric (note that a metric could be a complex object) for one or multiple fields in one or multiple segments. The output is a list of ‘anomalies’ that might show issues with data.

metadata: monitor_schema.models.commons.Metadata | None#

id: str#

displayName: str | None#

tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#

targetSize: int | None#

schedule: monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.FixedCadenceSchedule | None#

disabled: bool | None#

disableTargetRollup: bool | None#

targetMatrix: monitor_schema.models.analyzer.targets.ColumnMatrix | monitor_schema.models.analyzer.targets.DatasetMatrix | None#

dataReadinessDuration: str | None#

batchCoolDownPeriod: str | None#

backfillGracePeriodDuration: str | None#

config: monitor_schema.models.analyzer.algorithms.ConjunctionConfig | monitor_schema.models.analyzer.algorithms.DisjunctionConfig | monitor_schema.models.analyzer.algorithms.DiffConfig | monitor_schema.models.analyzer.algorithms.ComparisonConfig | monitor_schema.models.analyzer.algorithms.ListComparisonConfig | monitor_schema.models.analyzer.algorithms.FrequentStringComparisonConfig | monitor_schema.models.analyzer.algorithms.ColumnListChangeConfig | monitor_schema.models.analyzer.algorithms.FixedThresholdsConfig | monitor_schema.models.analyzer.algorithms.StddevConfig | monitor_schema.models.analyzer.algorithms.DriftConfig | monitor_schema.models.analyzer.algorithms.ExperimentalConfig | monitor_schema.models.analyzer.algorithms.SeasonalConfig#

class Config[source]#

Updates JSON schema anyOf to oneOf.

static schema_extra(schema: Dict[str, Any], model: pydantic.BaseModel) → None[source]#: Update specific fields here (for Union type, specifically).
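The `tags` constraint above combines a length range with a character pattern. The sketch below approximates it with the standard `re` module. It assumes the pattern is applied with `re.match` semantics (anchored at the start of the string only), under which only the first character of a tag is actually constrained. Whether the library behaves this way depends on the pydantic version in use, so treat this as an illustration rather than a specification; `invalid_tags` is a hypothetical helper.

```python
import re
from typing import List

# Pattern copied from the tags field; lengths from min_length / max_length.
TAG_PATTERN = re.compile(r"[0-9a-zA-Z\-_]")
MIN_LENGTH, MAX_LENGTH = 3, 256


def invalid_tags(tags: List[str]) -> List[str]:
    """Return the tags that fail the length check or the pattern check."""
    bad = []
    for tag in tags:
        length_ok = MIN_LENGTH <= len(tag) <= MAX_LENGTH
        # ASSUMPTION: match() semantics, so only the first character is tested.
        pattern_ok = TAG_PATTERN.match(tag) is not None
        if not (length_ok and pattern_ok):
            bad.append(tag)
    return bad


print(invalid_tags(["prod", "ab", "team_a", "!oops"]))
```

Under the same assumption, a tag such as `"a b!"` would pass, because the pattern carries no end anchor or repetition.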

class monitor_schema.models.analyzer.BaselineType[source]#

Bases: str, enum.Enum

Supported baseline types.

BatchTimestamp = 'BatchTimestamp'#

Reference = 'Reference'#

TrailingWindow = 'TrailingWindow'#

TimeRange = 'TimeRange'#

CurrentBatch = 'CurrentBatch'#

class monitor_schema.models.analyzer.TargetLevel[source]#

Bases: str, enum.Enum

Which nested level we are targeting.

dataset = 'dataset'#

column = 'column'#

class monitor_schema.models.analyzer.DatasetMatrix[source]#

Bases: _BaseMatrix

Define the matrix of fields and segments to fan out for monitoring.

type: Literal[TargetLevel]#

class monitor_schema.models.analyzer.ColumnMatrix[source]#

Bases: _BaseMatrix

Define the matrix of columns and segments to fan out for monitoring.

type: Literal[TargetLevel]#

include: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#

exclude: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#

profileId: str | None#