monitor_schema.models.analyzer#

Analyzer module.

Submodules#

monitor_schema.models.analyzer.algorithms
monitor_schema.models.analyzer.baseline
monitor_schema.models.analyzer.targets

Classes#

ReferenceProfileId

A baseline based on a static reference profile.

SingleBatchBaseline

Uses the current batch as the baseline.

TimeRangeBaseline

A static time range.

TrailingWindowBaseline

A dynamic trailing window.

DatasetMetric

Metrics that are applicable at the dataset level.

SimpleColumnMetric

Simple column metrics that are represented by a single number.

ComplexMetrics

Sketch-based metrics that can only be processed by certain algorithms.

ComparisonConfig

Compares the target against either an expected value or the baseline.

ColumnListChangeConfig

Detects when columns are added to or removed from the target column list.

FixedThresholdsConfig

Fixed threshold analysis.

StddevConfig

Calculates upper and lower bounds based on the stddev of a series of numbers.

SeasonalConfig

An analyzer using stddev over a window of the time range.

DriftConfig

An analyzer that detects data drift against a baseline distribution.

ExperimentalConfig

An experimental algorithm that is not yet standardized.

DiffConfig

Detects the difference between two numerical metrics.

Analyzer

Configuration for running an analysis.

BaselineType

Supported baseline types.

TargetLevel

Which nested level we are targeting.

DatasetMatrix

Define the matrix of fields and segments to fan out for monitoring.

ColumnMatrix

Define the matrix of columns and segments to fan out for monitoring.

Package Contents#

class monitor_schema.models.analyzer.ReferenceProfileId[source]#

Bases: _Baseline

A baseline based on a static reference profile.

A typical use case is to upload the profile of a “gold” dataset to WhyLabs; for an ML model, this can also be the training dataset.

type: Literal[BaselineType]#
profileId: str#
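
As an illustration, a minimal sketch of constructing this baseline (assuming the type discriminator defaults to BaselineType.Reference and may also be passed explicitly; the profile ID is a hypothetical placeholder):

from monitor_schema.models.analyzer import BaselineType, ReferenceProfileId

# Baseline pinned to a previously uploaded "gold" reference profile.
baseline = ReferenceProfileId(
    type=BaselineType.Reference,
    profileId="ref-profile-2024-01-01",  # hypothetical reference profile ID
)
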
class monitor_schema.models.analyzer.SingleBatchBaseline[source]#

Bases: _SegmentBaseline

Uses the current batch as the baseline.

This is used when you want to use one batch to monitor another batch in a different metric entity.

type: Literal[BaselineType]#
offset: int | None#
datasetId: str#
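
A sketch of using another dataset's batch as the baseline (assuming the type discriminator defaults to BaselineType.CurrentBatch; the dataset ID and offset are hypothetical):

from monitor_schema.models.analyzer import BaselineType, SingleBatchBaseline

# Compare the target batch against the corresponding batch of another dataset.
baseline = SingleBatchBaseline(
    type=BaselineType.CurrentBatch,
    datasetId="model-123",  # hypothetical ID of the other dataset
    offset=0,               # optional; shown only to illustrate the field
)
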
class monitor_schema.models.analyzer.TimeRangeBaseline[source]#

Bases: _SegmentBaseline

A static time range.

Instead of using a single profile or a trailing window, users can lock in a “good” period.

type: Literal[BaselineType]#
range: monitor_schema.models.commons.TimeRange#
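
A sketch of locking the baseline to a fixed period. Note that the TimeRange fields below (start, end) are assumptions, since monitor_schema.models.commons.TimeRange is not documented on this page:

from datetime import datetime, timezone

from monitor_schema.models.analyzer import BaselineType, TimeRangeBaseline
from monitor_schema.models.commons import TimeRange

# Lock the baseline to a known-good week (assumed TimeRange fields: start, end).
baseline = TimeRangeBaseline(
    type=BaselineType.TimeRange,
    range=TimeRange(
        start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end=datetime(2024, 1, 8, tzinfo=timezone.utc),
    ),
)
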
class monitor_schema.models.analyzer.TrailingWindowBaseline[source]#

Bases: _SegmentBaseline

A dynamic trailing window.

This is useful if you don’t have a static baseline to monitor against. This is the default mode for most monitors.

type: Literal[BaselineType]#
size: int#
offset: int | None#
exclusionRanges: List[monitor_schema.models.commons.TimeRange] | None#
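
A sketch of the default trailing-window setup (assuming the type discriminator defaults to BaselineType.TrailingWindow; offset and exclusionRanges are optional):

from monitor_schema.models.analyzer import BaselineType, TrailingWindowBaseline

# Compare each target batch against the 14 preceding batches.
baseline = TrailingWindowBaseline(
    type=BaselineType.TrailingWindow,
    size=14,
)
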
class monitor_schema.models.analyzer.DatasetMetric[source]#

Bases: str, enum.Enum

Metrics that are applicable at the dataset level.

profile_count = 'profile.count'#
profile_last_ingestion_time = 'profile.last_ingestion_time'#
profile_first_ingestion_time = 'profile.first_ingestion_time'#
column_row_count_sum = 'column_row_count_sum'#
shape_column_count = 'shape_column_count'#
shape_row_count = 'shape_row_count'#
input_count = 'input.count'#
output_count = 'output.count'#
classification_f1 = 'classification.f1'#
classification_precision = 'classification.precision'#
classification_recall = 'classification.recall'#
classification_accuracy = 'classification.accuracy'#
classification_fpr = 'classification.fpr'#
classification_auroc = 'classification.auroc'#
regression_mse = 'regression.mse'#
regression_mae = 'regression.mae'#
regression_rmse = 'regression.rmse'#
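
The enum members are thin wrappers around the metric names shown above, for example:

from monitor_schema.models.analyzer import DatasetMetric

assert DatasetMetric.profile_count.value == 'profile.count'
assert DatasetMetric.classification_f1.value == 'classification.f1'
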
class monitor_schema.models.analyzer.SimpleColumnMetric[source]#

Bases: str, enum.Enum

Simple column metrics that are represented by a single number.

count = 'count'#
median = 'median'#
max = 'max'#
min = 'min'#
mean = 'mean'#
stddev = 'stddev'#
variance = 'variance'#
unique_upper = 'unique_upper'#
unique_upper_ratio = 'unique_upper_ratio'#
unique_est = 'unique_est'#
unique_est_ratio = 'unique_est_ratio'#
unique_lower = 'unique_lower'#
unique_lower_ratio = 'unique_lower_ratio'#
count_bool = 'count_bool'#
count_bool_ratio = 'count_bool_ratio'#
count_integral = 'count_integral'#
count_integral_ratio = 'count_integral_ratio'#
count_fractional = 'count_fractional'#
count_fractional_ratio = 'count_fractional_ratio'#
count_string = 'count_string'#
count_string_ratio = 'count_string_ratio'#
count_null = 'count_null'#
count_null_ratio = 'count_null_ratio'#
inferred_data_type = 'inferred_data_type'#
quantile_p5 = 'quantile_5'#
quantile_p75 = 'quantile_75'#
quantile_p25 = 'quantile_25'#
quantile_p90 = 'quantile_90'#
quantile_p95 = 'quantile_95'#
quantile_p99 = 'quantile_99'#
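
Note that the quantile members drop the "p" prefix in their string values, for example:

from monitor_schema.models.analyzer import SimpleColumnMetric

assert SimpleColumnMetric.quantile_p99.value == 'quantile_99'
assert SimpleColumnMetric.count_null_ratio.value == 'count_null_ratio'
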
class monitor_schema.models.analyzer.ComplexMetrics[source]#

Bases: str, enum.Enum

Sketch-based metrics that can only be processed by certain algorithms.

histogram = 'histogram'#
frequent_items = 'frequent_items'#
unique_sketch = 'unique_sketch'#
column_list = 'column_list'#
class monitor_schema.models.analyzer.ComparisonConfig[source]#

Bases: AlgorithmConfig

Compares the target against either an expected value or the baseline.

This is useful to detect data type change, for instance.

type: Literal[AlgorithmType]#
operator: ComparisonOperator#
expected: ExpectedValue | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline | None#
class monitor_schema.models.analyzer.ColumnListChangeConfig[source]#

Bases: AlgorithmConfig

Detects when columns are added to or removed from the target column list.

This is useful to detect schema changes, for instance.

type: Literal[AlgorithmType]#
mode: Literal['ON_ADD_AND_REMOVE', 'ON_ADD', 'ON_REMOVE'] = 'ON_ADD_AND_REMOVE'#
metric: Literal[ComplexMetrics]#
exclude: List[monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
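
A sketch of a column-list change check against a trailing window (assuming the type discriminator defaults to the column-list-change algorithm type; the excluded column name is a hypothetical placeholder):

from monitor_schema.models.analyzer import (
    BaselineType,
    ColumnListChangeConfig,
    ComplexMetrics,
    TrailingWindowBaseline,
)

# Alert when columns are added or removed relative to the trailing window.
config = ColumnListChangeConfig(
    mode='ON_ADD_AND_REMOVE',
    metric=ComplexMetrics.column_list,
    exclude=['debug_column'],  # hypothetical column to ignore
    baseline=TrailingWindowBaseline(type=BaselineType.TrailingWindow, size=7),
)
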
class monitor_schema.models.analyzer.FixedThresholdsConfig[source]#

Bases: AlgorithmConfig

Fixed threshold analysis.

If the user sets neither an upper bound nor a lower bound, this algorithm becomes a no-op. WhyLabs might enforce the presence of at least one of these fields in the future.

type: Literal[AlgorithmType]#
upper: float | None#
lower: float | None#
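
A minimal sketch (assuming the type discriminator defaults to the fixed-thresholds algorithm type, and that a metric selector is inherited from AlgorithmConfig rather than listed here); remember that omitting both bounds makes the analysis a no-op:

from monitor_schema.models.analyzer import FixedThresholdsConfig, SimpleColumnMetric

# Flag any batch whose null ratio falls outside [0, 0.1].
config = FixedThresholdsConfig(
    metric=SimpleColumnMetric.count_null_ratio,  # assumed inherited field
    lower=0.0,
    upper=0.1,
)
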
class monitor_schema.models.analyzer.StddevConfig[source]#

Bases: _ThresholdBaseConfig

Calculates upper and lower bounds based on the stddev of a series of numbers.

An analyzer using stddev over a window of the time range.

This calculation will fall back to a Poisson distribution if there is only 1 value in the baseline. For 2 or more values, we use the sample standard deviation formula sqrt(sum((x_i - avg(x))^2) / (n - 1)).

type: Literal[AlgorithmType]#
factor: float | None#
minBatchSize: int | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId#
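
A sketch of stddev-based bounds over a trailing window (same assumptions about the type discriminator and the inherited metric field as above):

from monitor_schema.models.analyzer import (
    BaselineType,
    SimpleColumnMetric,
    StddevConfig,
    TrailingWindowBaseline,
)

# Bounds at 3 standard deviations around the trailing-window values,
# requiring at least 7 baseline batches before producing results.
config = StddevConfig(
    metric=SimpleColumnMetric.median,  # assumed inherited field
    factor=3.0,
    minBatchSize=7,
    baseline=TrailingWindowBaseline(type=BaselineType.TrailingWindow, size=14),
)
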
class monitor_schema.models.analyzer.SeasonalConfig[source]#

Bases: _ThresholdBaseConfig

An analyzer using stddev over a window of the time range.

This will fall back to a Poisson distribution if there is only 1 value in the baseline.

This only works with a TrailingWindow baseline (TODO: add backend validation).

type: Literal[AlgorithmType]#
algorithm: Literal['arima']#
minBatchSize: int | None#
alpha: float | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline#
stddevTimeRanges: List[monitor_schema.models.commons.TimeRange] | None#
stddevMaxBatchSize: int | None#
stddevFactor: float | None#
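
A sketch of a seasonal (ARIMA) configuration on a trailing-window baseline (same assumptions as above; alpha and the stddev* fields are optional tuning knobs):

from monitor_schema.models.analyzer import (
    BaselineType,
    SeasonalConfig,
    SimpleColumnMetric,
    TrailingWindowBaseline,
)

config = SeasonalConfig(
    algorithm='arima',
    metric=SimpleColumnMetric.median,  # assumed inherited field
    minBatchSize=14,
    alpha=0.05,
    baseline=TrailingWindowBaseline(type=BaselineType.TrailingWindow, size=30),
)
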
class monitor_schema.models.analyzer.DriftConfig[source]#

Bases: AlgorithmConfig

An analyzer that detects data drift against a baseline distribution.

This analysis detects whether the data drifts. By default, we use the Hellinger distance with a threshold of 0.7.

type: Literal[AlgorithmType]#
algorithm: Literal['hellinger', 'jensenshannon', 'kl_divergence', 'psi']#
metric: Literal[ComplexMetrics, ComplexMetrics]#
threshold: float#
minBatchSize: int | None#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
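
A sketch of the default drift setup described above (Hellinger distance with a threshold of 0.7 on the histogram metric), assuming the type discriminator defaults appropriately:

from monitor_schema.models.analyzer import (
    BaselineType,
    ComplexMetrics,
    DriftConfig,
    TrailingWindowBaseline,
)

config = DriftConfig(
    algorithm='hellinger',
    metric=ComplexMetrics.histogram,
    threshold=0.7,
    baseline=TrailingWindowBaseline(type=BaselineType.TrailingWindow, size=7),
)
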
class monitor_schema.models.analyzer.ExperimentalConfig[source]#

Bases: AlgorithmConfig

An experimental algorithm that is not yet covered by the standardized configurations above.

type: Literal[AlgorithmType]#
implementation: str#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
stub: AlgorithmType | None#
class monitor_schema.models.analyzer.DiffConfig[source]#

Bases: AlgorithmConfig

Detects the difference between two numerical metrics.

type: Literal[AlgorithmType]#
mode: DiffMode#
thresholdType: ThresholdType | None#
threshold: float#
baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
class monitor_schema.models.analyzer.Analyzer[source]#

Bases: monitor_schema.models.commons.NoExtrasBaseModel

Configuration for running an analysis.

An analysis targets a metric (note that a metric can be a complex object) for one or more fields in one or more segments. The output is a list of ‘anomalies’ that might indicate issues with the data.

metadata: monitor_schema.models.commons.Metadata | None#
id: str#
displayName: str | None#
tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
targetSize: int | None#
schedule: monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.FixedCadenceSchedule | None#
disabled: bool | None#
disableTargetRollup: bool | None#
targetMatrix: monitor_schema.models.analyzer.targets.ColumnMatrix | monitor_schema.models.analyzer.targets.DatasetMatrix | None#
dataReadinessDuration: str | None#
batchCoolDownPeriod: str | None#
backfillGracePeriodDuration: str | None#
config: monitor_schema.models.analyzer.algorithms.ConjunctionConfig | monitor_schema.models.analyzer.algorithms.DisjunctionConfig | monitor_schema.models.analyzer.algorithms.DiffConfig | monitor_schema.models.analyzer.algorithms.ComparisonConfig | monitor_schema.models.analyzer.algorithms.ListComparisonConfig | monitor_schema.models.analyzer.algorithms.FrequentStringComparisonConfig | monitor_schema.models.analyzer.algorithms.ColumnListChangeConfig | monitor_schema.models.analyzer.algorithms.FixedThresholdsConfig | monitor_schema.models.analyzer.algorithms.StddevConfig | monitor_schema.models.analyzer.algorithms.DriftConfig | monitor_schema.models.analyzer.algorithms.ExperimentalConfig | monitor_schema.models.analyzer.algorithms.SeasonalConfig#
class Config[source]#

Updates JSON schema anyOf to oneOf.

static schema_extra(schema: Dict[str, Any], model: pydantic.BaseModel) None[source]#

Update specific fields here (for Union type, specifically).
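
Putting the pieces together, a sketch of a complete analyzer that applies a fixed-threshold check to a couple of columns; the id and column names are hypothetical placeholders, the optional schedule and metadata are omitted, and the type discriminators and inherited metric field are assumed to behave as noted above:

from monitor_schema.models.analyzer import (
    Analyzer,
    ColumnMatrix,
    FixedThresholdsConfig,
    SimpleColumnMetric,
    TargetLevel,
)

analyzer = Analyzer(
    id='null-ratio-fixed-threshold',  # hypothetical analyzer ID
    targetMatrix=ColumnMatrix(
        type=TargetLevel.column,
        include=['age', 'income'],  # hypothetical column names
    ),
    config=FixedThresholdsConfig(
        metric=SimpleColumnMetric.count_null_ratio,  # assumed inherited field
        upper=0.1,
    ),
)

# pydantic-style JSON dump, e.g. for inspection or upload.
print(analyzer.json(exclude_none=True))
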

class monitor_schema.models.analyzer.BaselineType[source]#

Bases: str, enum.Enum

Supported baseline types.

BatchTimestamp = 'BatchTimestamp'#
Reference = 'Reference'#
TrailingWindow = 'TrailingWindow'#
TimeRange = 'TimeRange'#
CurrentBatch = 'CurrentBatch'#
class monitor_schema.models.analyzer.TargetLevel[source]#

Bases: str, enum.Enum

Which nested level we are targeting.

dataset = 'dataset'#
column = 'column'#
class monitor_schema.models.analyzer.DatasetMatrix[source]#

Bases: _BaseMatrix

Define the matrix of fields and segments to fan out for monitoring.

type: Literal[TargetLevel]#
class monitor_schema.models.analyzer.ColumnMatrix[source]#

Bases: _BaseMatrix

Define the matrix of columns and segments to fan out for monitoring.

type: Literal[TargetLevel]#
include: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
exclude: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
profileId: str | None#
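
A sketch of a column target matrix using plain column names (the names are hypothetical; ColumnGroups values and the segment selection inherited from _BaseMatrix can be used as well):

from monitor_schema.models.analyzer import ColumnMatrix, TargetLevel

# Target two (hypothetical) feature columns and skip an identifier column.
matrix = ColumnMatrix(
    type=TargetLevel.column,
    include=['age', 'income'],
    exclude=['user_id'],
)
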