monitor_schema.models.analyzer#
Analyzer module.
Submodules#
Classes#
ReferenceProfileId | A baseline based on a static reference profile.
SingleBatchBaseline | Using current batch.
TimeRangeBaseline | A static time range.
TrailingWindowBaseline | A dynamic trailing window.
DatasetMetric | Metrics that are applicable at the dataset level.
SimpleColumnMetric | Simple column metrics that are basically just a single number.
ComplexMetrics | Sketch-based metrics that can only be processed by certain algorithms.
ComparisonConfig | Compare the target against either an expected value or the baseline.
ColumnListChangeConfig | Compare whether the target is equal to a value or not.
FixedThresholdsConfig | Fixed threshold analysis.
StddevConfig | Calculates upper bounds and lower bounds based on stddev from a series of numbers.
SeasonalConfig | An analyzer using stddev for a window of time range.
DriftConfig | Detect whether the data drifts relative to the baseline.
ExperimentalConfig | Experimental algorithm that is not standardized by the above ones yet.
DiffConfig | Detecting the differences between two numerical metrics.
Analyzer | Configuration for running an analysis.
BaselineType | Supported baseline types.
TargetLevel | Which nested level we are targeting.
DatasetMatrix | Define the matrix of fields and segments to fan out for monitoring.
ColumnMatrix | Define the matrix of columns and segments to fan out for monitoring.
Package Contents#
- class monitor_schema.models.analyzer.ReferenceProfileId[source]#
Bases:
_Baseline
A baseline based on a static reference profile.
A typical use case is to take a “gold” dataset and upload its profile to WhyLabs. This can also be the training dataset for an ML model.
- type: Literal[BaselineType]#
- class monitor_schema.models.analyzer.SingleBatchBaseline[source]#
Bases:
_SegmentBaseline
Using current batch.
This is used when you want to use one batch to monitor another batch in a different metric entity.
- type: Literal[BaselineType]#
- class monitor_schema.models.analyzer.TimeRangeBaseline[source]#
Bases:
_SegmentBaseline
A static time range.
Instead of using a single profile or a trailing window, the user can lock in a “good” period.
- type: Literal[BaselineType]#
- class monitor_schema.models.analyzer.TrailingWindowBaseline[source]#
Bases:
_SegmentBaseline
A dynamic trailing window.
This is useful if you don’t have a static baseline to monitor against. This is the default mode for most monitors.
- type: Literal[BaselineType]#
- exclusionRanges: List[monitor_schema.models.commons.TimeRange] | None#
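The interaction between a trailing window and its exclusion ranges can be sketched as follows. This is a minimal illustration rather than the WhyLabs implementation: the `size` and `step` parameters, the `(start, end)` tuple standing in for a `TimeRange`, and the choice to drop excluded batches without back-filling are all assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical stand-in for a time range: (start inclusive, end exclusive).
Range = Tuple[datetime, datetime]


def trailing_window_batches(
    target: datetime,
    size: int,
    exclusion_ranges: List[Range],
    step: timedelta = timedelta(days=1),
) -> List[datetime]:
    """Return the `size` batch timestamps before `target`, minus excluded ones."""
    # Oldest batch first, ending one step before the target batch.
    candidates = [target - step * i for i in range(size, 0, -1)]
    return [
        ts
        for ts in candidates
        if not any(start <= ts < end for start, end in exclusion_ranges)
    ]


target = datetime(2023, 1, 8)
holiday = (datetime(2023, 1, 3), datetime(2023, 1, 5))
baseline = trailing_window_batches(target, size=7, exclusion_ranges=[holiday])
```

Here the seven daily batches before the target shrink to five, because the two batches inside the excluded range are left out of the baseline.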
- class monitor_schema.models.analyzer.DatasetMetric[source]#
Metrics that are applicable at the dataset level.
- profile_count = 'profile.count'#
- profile_last_ingestion_time = 'profile.last_ingestion_time'#
- profile_first_ingestion_time = 'profile.first_ingestion_time'#
- column_row_count_sum = 'column_row_count_sum'#
- shape_column_count = 'shape_column_count'#
- shape_row_count = 'shape_row_count'#
- input_count = 'input.count'#
- output_count = 'output.count'#
- classification_f1 = 'classification.f1'#
- classification_precision = 'classification.precision'#
- classification_recall = 'classification.recall'#
- classification_accuracy = 'classification.accuracy'#
- classification_fpr = 'classification.fpr'#
- classification_auroc = 'classification.auroc'#
- regression_mse = 'regression.mse'#
- regression_mae = 'regression.mae'#
- regression_rmse = 'regression.rmse'#
- class monitor_schema.models.analyzer.SimpleColumnMetric[source]#
Simple column metrics that are basically just a single number.
- count = 'count'#
- median = 'median'#
- max = 'max'#
- min = 'min'#
- mean = 'mean'#
- stddev = 'stddev'#
- variance = 'variance'#
- unique_upper = 'unique_upper'#
- unique_upper_ratio = 'unique_upper_ratio'#
- unique_est = 'unique_est'#
- unique_est_ratio = 'unique_est_ratio'#
- unique_lower = 'unique_lower'#
- unique_lower_ratio = 'unique_lower_ratio'#
- count_bool = 'count_bool'#
- count_bool_ratio = 'count_bool_ratio'#
- count_integral = 'count_integral'#
- count_integral_ratio = 'count_integral_ratio'#
- count_fractional = 'count_fractional'#
- count_fractional_ratio = 'count_fractional_ratio'#
- count_string = 'count_string'#
- count_string_ratio = 'count_string_ratio'#
- count_null = 'count_null'#
- count_null_ratio = 'count_null_ratio'#
- inferred_data_type = 'inferred_data_type'#
- quantile_p5 = 'quantile_5'#
- quantile_p75 = 'quantile_75'#
- quantile_p25 = 'quantile_25'#
- quantile_p90 = 'quantile_90'#
- quantile_p95 = 'quantile_95'#
- quantile_p99 = 'quantile_99'#
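Note that for the quantile members the Python name and the serialized value differ (`quantile_p5` versus `'quantile_5'`). The sketch below reproduces a small subset of the members listed above in a stand-in enum to show the difference; the class name `SimpleColumnMetricSketch` and its `str, Enum` base classes are assumptions made for illustration.

```python
from enum import Enum


class SimpleColumnMetricSketch(str, Enum):
    """Illustrative subset of the members above; base classes are assumed."""

    median = "median"
    quantile_p5 = "quantile_5"
    quantile_p95 = "quantile_95"


# Look a member up by its serialized value, as a config loader might.
metric = SimpleColumnMetricSketch("quantile_95")
member_name = metric.name        # name used in Python code
serialized_value = metric.value  # string that appears in a serialized config
```

A config document would therefore carry `quantile_95`, while Python code refers to the member as `quantile_p95`.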
- class monitor_schema.models.analyzer.ComplexMetrics[source]#
Sketch-based metrics that can only be processed by certain algorithms.
- histogram = 'histogram'#
- frequent_items = 'frequent_items'#
- unique_sketch = 'unique_sketch'#
- column_list = 'column_list'#
- class monitor_schema.models.analyzer.ComparisonConfig[source]#
Bases:
AlgorithmConfig
Compare the target against either an expected value or the baseline.
This is useful to detect data type change, for instance.
- type: Literal[AlgorithmType]#
- operator: ComparisonOperator#
- expected: ExpectedValue | None#
- class monitor_schema.models.analyzer.ColumnListChangeConfig[source]#
Bases:
AlgorithmConfig
Compare whether the target is equal to a value or not.
This is useful to detect data type change, for instance.
- type: Literal[AlgorithmType]#
- mode: Literal['ON_ADD_AND_REMOVE', 'ON_ADD', 'ON_REMOVE'] = 'ON_ADD_AND_REMOVE'#
- metric: Literal[ComplexMetrics]#
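One plausible reading of the three `mode` values is sketched below using plain sets of column names. The function `column_list_change` is hypothetical, and the interpretation that `ON_ADD_AND_REMOVE` fires on either kind of change is an assumption, not something this page states.

```python
from typing import Dict, Set


def column_list_change(
    baseline: Set[str],
    target: Set[str],
    mode: str = "ON_ADD_AND_REMOVE",
) -> Dict[str, object]:
    """Compare two column lists and flag a change according to `mode`."""
    added = target - baseline
    removed = baseline - target
    if mode == "ON_ADD":
        anomaly = bool(added)
    elif mode == "ON_REMOVE":
        anomaly = bool(removed)
    elif mode == "ON_ADD_AND_REMOVE":
        # Assumed semantics: either an addition or a removal counts.
        anomaly = bool(added or removed)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"added": sorted(added), "removed": sorted(removed), "anomaly": anomaly}


result = column_list_change({"age", "income"}, {"age", "zip_code"})
```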
- class monitor_schema.models.analyzer.FixedThresholdsConfig[source]#
Bases:
AlgorithmConfig
Fixed threshold analysis.
If the user sets neither an upper bound nor a lower bound, this algorithm becomes a no-op. WhyLabs might enforce the presence of either field in the future.
- type: Literal[AlgorithmType]#
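The no-op behavior described above can be sketched as a small check. The parameter names `upper` and `lower` are hypothetical (the bound fields are not listed on this page), and treating the bounds as strict inequalities is an assumption.

```python
from typing import Optional


def breaches_fixed_thresholds(
    value: float,
    upper: Optional[float] = None,
    lower: Optional[float] = None,
) -> bool:
    """Return True when `value` falls outside the configured bounds."""
    if upper is None and lower is None:
        # Neither bound is set, so the check never fires (a no-op).
        return False
    if upper is not None and value > upper:
        return True
    if lower is not None and value < lower:
        return True
    return False
```

For example, `breaches_fixed_thresholds(120.0, upper=100.0)` flags the value, while `breaches_fixed_thresholds(120.0)` does not, because no bound was given.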
- class monitor_schema.models.analyzer.StddevConfig[source]#
Bases:
_ThresholdBaseConfig
Calculates upper bounds and lower bounds based on stddev from a series of numbers.
An analyzer using stddev for a window of time range.
This calculation falls back to a Poisson distribution if there is only 1 value in the baseline. With at least 2 values, the sample standard deviation is used: sqrt(sum((x_i - avg(x))^2) / (n - 1))
- type: Literal[AlgorithmType]#
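The bound calculation can be sketched with the standard library, whose `statistics.stdev` computes the sample standard deviation (dividing by n - 1). The `factor` multiplier and its default are hypothetical, and the single-value Poisson fallback is deliberately left out of this sketch.

```python
from statistics import mean, stdev
from typing import List, Tuple


def stddev_bounds(baseline: List[float], factor: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds as mean -/+ factor * sample stddev."""
    if len(baseline) < 2:
        # The single-value Poisson fallback is not sketched here.
        raise ValueError("need at least two baseline values")
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation, divides by n - 1
    return center - factor * spread, center + factor * spread


lower, upper = stddev_bounds([10.0, 12.0, 11.0, 13.0, 14.0], factor=2.0)
```

A target value outside the `lower` to `upper` interval would then be a candidate anomaly.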
- class monitor_schema.models.analyzer.SeasonalConfig[source]#
Bases:
_ThresholdBaseConfig
An analyzer using stddev for a window of time range.
This falls back to a Poisson distribution if there is only 1 value in the baseline.
This only works with a TrailingWindow baseline (TODO: add backend validation)
- type: Literal[AlgorithmType]#
- algorithm: Literal['arima']#
- stddevTimeRanges: List[monitor_schema.models.commons.TimeRange] | None#
- class monitor_schema.models.analyzer.DriftConfig[source]#
Bases:
AlgorithmConfig
Detect whether the data drifts relative to the baseline.
By default, the Hellinger distance is used with a threshold of 0.7.
- type: Literal[AlgorithmType]#
- algorithm: Literal['hellinger', 'jensenshannon', 'kl_divergence', 'psi']#
- metric: Literal[ComplexMetrics, ComplexMetrics]#
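For reference, the Hellinger distance between two discrete distributions can be computed as below. This is a textbook sketch over plain dictionaries of probabilities, not the sketch-based computation WhyLabs runs on `histogram` or `frequent_items` metrics; flagging drift when the distance exceeds the threshold is an assumption about the comparison direction.

```python
import math
from typing import Dict


def hellinger_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Hellinger distance between two discrete probability distributions."""
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    # Dividing by sqrt(2) scales the result into the range [0, 1].
    return math.sqrt(total) / math.sqrt(2)


baseline = {"a": 0.5, "b": 0.5}
target = {"a": 0.9, "b": 0.1}
distance = hellinger_distance(baseline, target)
drifted = distance > 0.7  # 0.7 is the default threshold quoted above
```

Identical distributions give a distance of 0, and distributions with no overlapping categories give 1.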
- class monitor_schema.models.analyzer.ExperimentalConfig[source]#
Bases:
AlgorithmConfig
Experimental algorithm that is not standardized by the above ones yet.
- type: Literal[AlgorithmType]#
- baseline: monitor_schema.models.analyzer.baseline.TrailingWindowBaseline | monitor_schema.models.analyzer.baseline.ReferenceProfileId | monitor_schema.models.analyzer.baseline.TimeRangeBaseline | monitor_schema.models.analyzer.baseline.SingleBatchBaseline#
- stub: AlgorithmType | None#
- class monitor_schema.models.analyzer.DiffConfig[source]#
Bases:
AlgorithmConfig
Detecting the differences between two numerical metrics.
- type: Literal[AlgorithmType]#
- thresholdType: ThresholdType | None#
- class monitor_schema.models.analyzer.Analyzer[source]#
Bases:
monitor_schema.models.commons.NoExtrasBaseModel
Configuration for running an analysis.
An analysis targets a metric (note that a metric could be a complex object) for one or multiple fields in one or multiple segments. The output is a list of ‘anomalies’ that might show issues with data.
- metadata: monitor_schema.models.commons.Metadata | None#
- tags: Optional[List[constr(min_length=3, max_length=256, regex='[0-9a-zA-Z\\-_]')]]#
- schedule: monitor_schema.models.commons.CronSchedule | monitor_schema.models.commons.FixedCadenceSchedule | None#
- targetMatrix: monitor_schema.models.analyzer.targets.ColumnMatrix | monitor_schema.models.analyzer.targets.DatasetMatrix | None#
- config: monitor_schema.models.analyzer.algorithms.ConjunctionConfig | monitor_schema.models.analyzer.algorithms.DisjunctionConfig | monitor_schema.models.analyzer.algorithms.DiffConfig | monitor_schema.models.analyzer.algorithms.ComparisonConfig | monitor_schema.models.analyzer.algorithms.ListComparisonConfig | monitor_schema.models.analyzer.algorithms.FrequentStringComparisonConfig | monitor_schema.models.analyzer.algorithms.ColumnListChangeConfig | monitor_schema.models.analyzer.algorithms.FixedThresholdsConfig | monitor_schema.models.analyzer.algorithms.StddevConfig | monitor_schema.models.analyzer.algorithms.DriftConfig | monitor_schema.models.analyzer.algorithms.ExperimentalConfig | monitor_schema.models.analyzer.algorithms.SeasonalConfig#
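The `tags` constraint above (length 3 to 256, pattern `[0-9a-zA-Z\-_]`) can be approximated with the standard library. The helper `invalid_tags` is hypothetical. It applies the pattern with `re.match`, which anchors at the start of the string only, so just the first character is checked; whether the schema's validator behaves the same way is an assumption.

```python
import re
from typing import List

# Same character class as in the `tags` annotation above.
TAG_PATTERN = re.compile(r"[0-9a-zA-Z\-_]")


def invalid_tags(tags: List[str]) -> List[str]:
    """Return the tags that break the length or pattern constraint."""
    return [
        tag
        for tag in tags
        if not (3 <= len(tag) <= 256 and TAG_PATTERN.match(tag))
    ]


rejected = invalid_tags(["prod", "ab", "!bad", "eu-west_1"])
```

Under this reading, `ab` is rejected for being too short and `!bad` for its first character, while a tag such as `ok!` would pass because only the start of the string is matched.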
- class monitor_schema.models.analyzer.BaselineType[source]#
Supported baseline types.
- BatchTimestamp = 'BatchTimestamp'#
- Reference = 'Reference'#
- TrailingWindow = 'TrailingWindow'#
- TimeRange = 'TimeRange'#
- CurrentBatch = 'CurrentBatch'#
- class monitor_schema.models.analyzer.TargetLevel[source]#
Which nested level we are targeting.
- dataset = 'dataset'#
- column = 'column'#
- class monitor_schema.models.analyzer.DatasetMatrix[source]#
Bases:
_BaseMatrix
Define the matrix of fields and segments to fan out for monitoring.
- type: Literal[TargetLevel]#
- class monitor_schema.models.analyzer.ColumnMatrix[source]#
Bases:
_BaseMatrix
Define the matrix of columns and segments to fan out for monitoring.
- type: Literal[TargetLevel]#
- include: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
- exclude: List[ColumnGroups | monitor_schema.models.utils.COLUMN_NAME_TYPE] | None#
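How `include` and `exclude` might combine is sketched below with plain column names. The function `resolve_columns` is hypothetical, column groups are not modeled, and two behaviors are assumptions: a missing `include` selects every column, and `exclude` wins when a column appears in both lists.

```python
from typing import Iterable, List, Optional


def resolve_columns(
    all_columns: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Return the columns targeted after applying include, then exclude."""
    columns = list(all_columns)
    # Assumption: no include list means every column is a candidate.
    included = set(columns) if include is None else set(include)
    excluded = set(exclude or [])
    # Assumption: exclude takes precedence over include.
    return [c for c in columns if c in included and c not in excluded]


targeted = resolve_columns(
    ["age", "income", "zip_code"],
    include=["age", "income"],
    exclude=["income"],
)
```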