whylogs
whylogs is an open-source library for logging any kind of data. With whylogs, users can generate summaries of their datasets (called whylogs profiles), which they can use to:
- Track changes in their dataset
- Create data constraints to know whether their data looks the way it should
- Quickly visualize key summary statistics about their datasets
These three functionalities enable a variety of use cases for data scientists, machine learning engineers, and data engineers:
- Detecting data drift (and resultant ML model performance degradation)
- Data quality validation
- Exploratory data analysis via data profiling
- Tracking data for ML experiments
- And many more…
Subpackages
whylogs.api
whylogs.api.fugue
whylogs.api.logger
whylogs.api.reader
whylogs.api.store
whylogs.api.whylabs
whylogs.api.writer
whylogs.api.writer.gcs
whylogs.api.writer.local
whylogs.api.writer.mlflow
whylogs.api.writer.s3
whylogs.api.writer.whylabs
whylogs.api.writer.whylabs_base
whylogs.api.writer.whylabs_batch_writer
whylogs.api.writer.whylabs_client
whylogs.api.writer.whylabs_estimation_result_writer
whylogs.api.writer.whylabs_reference_writer
whylogs.api.writer.whylabs_transaction_writer
whylogs.api.writer.writer
whylogs.api.annotations
whylogs.context
whylogs.core
whylogs.core.constraints
whylogs.core.metrics
whylogs.core.metrics.aggregators
whylogs.core.metrics.column_metrics
whylogs.core.metrics.compound_metric
whylogs.core.metrics.condition_count_metric
whylogs.core.metrics.decorators
whylogs.core.metrics.deserializers
whylogs.core.metrics.maths
whylogs.core.metrics.metric_components
whylogs.core.metrics.metrics
whylogs.core.metrics.multimetric
whylogs.core.metrics.serializers
whylogs.core.metrics.unicode_range
whylogs.core.model_performance_metrics
whylogs.core.proto
whylogs.core.validators
whylogs.core.view
whylogs.core.column_profile
whylogs.core.common
whylogs.core.configs
whylogs.core.dataset_profile
whylogs.core.datatypes
whylogs.core.errors
whylogs.core.feature_weights
whylogs.core.input_resolver
whylogs.core.metadata
whylogs.core.metric_getters
whylogs.core.predicate_parser
whylogs.core.preprocessing
whylogs.core.projectors
whylogs.core.relations
whylogs.core.resolvers
whylogs.core.schema
whylogs.core.segment
whylogs.core.segmentation_partition
whylogs.core.specialized_resolvers
whylogs.datasets
whylogs.experimental
whylogs.experimental.api
whylogs.experimental.constraints_generation
whylogs.experimental.constraints_generation.condition_counts
whylogs.experimental.constraints_generation.count_metrics
whylogs.experimental.constraints_generation.distribution_metrics
whylogs.experimental.constraints_generation.frequent_items
whylogs.experimental.constraints_generation.multi_metrics
whylogs.experimental.constraints_generation.types_metrics
whylogs.experimental.performance_estimation
whylogs.migration
whylogs.viz
Package Contents
Classes
- ResultSet: A holder object for profiling results.
- DatasetProfileView: A Writable is an object that contains data to write to a file or files.
Functions
- log
- log_classification_metrics: Function to track metrics based on validation data.
- log_regression_metrics: Function to track regression metrics based on validation data.
- profiling
- write
- init: Set up authentication for this whylogs logging session. There are three modes that you can authenticate in.
- v0_to_v1_view
- Calculate version number based on pyproject.toml
- class whylogs.ResultSet
Bases: whylogs.api.writer.writer._Writable, abc.ABC
A holder object for profiling results.
A whylogs.log call can result in more than one profile. This wrapper class simplifies the navigation among these profiles.
Note that currently we only hold one profile but we’re planning to add other kinds of profiles such as segmented profiles here.
- property performance_metrics: Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]
- Return type
Optional[whylogs.core.model_performance_metrics.ModelPerformanceMetrics]
- abstract view() Optional[whylogs.core.DatasetProfileView]
- Return type
Optional[whylogs.core.DatasetProfileView]
- abstract profile() Optional[whylogs.core.DatasetProfile]
- Return type
Optional[whylogs.core.DatasetProfile]
- get_writables() Optional[List[whylogs.api.writer.writer._Writable]]
- Return type
Optional[List[whylogs.api.writer.writer._Writable]]
- set_dataset_timestamp(dataset_timestamp: datetime.datetime) None
- Parameters
dataset_timestamp (datetime.datetime) –
- Return type
None
- add_model_performance_metrics(metrics: whylogs.core.model_performance_metrics.ModelPerformanceMetrics) None
- Parameters
metrics (whylogs.core.model_performance_metrics.ModelPerformanceMetrics) –
- Return type
None
- add_metric(name: str, metric: whylogs.core.metrics.metrics.Metric) None
- Parameters
name (str) –
metric (whylogs.core.metrics.metrics.Metric) –
- Return type
None
- whylogs.log(obj: Any = None, *, pandas: Optional[whylogs.core.stubs.pd.DataFrame] = None, row: Optional[Dict[str, Any]] = None, schema: Optional[whylogs.core.DatasetSchema] = None, name: Optional[str] = None, multiple: Optional[Dict[str, Loggable]] = None, dataset_timestamp: Optional[datetime.datetime] = None, trace_id: Optional[str] = None, tags: Optional[List[str]] = None, segment_key_values: Optional[Dict[str, str]] = None, debug_event: Optional[Dict[str, Any]] = None) result_set.ResultSet
- Parameters
obj (Any) –
pandas (Optional[whylogs.core.stubs.pd.DataFrame]) –
row (Optional[Dict[str, Any]]) –
schema (Optional[whylogs.core.DatasetSchema]) –
name (Optional[str]) –
multiple (Optional[Dict[str, Loggable]]) –
dataset_timestamp (Optional[datetime.datetime]) –
trace_id (Optional[str]) –
tags (Optional[List[str]]) –
segment_key_values (Optional[Dict[str, str]]) –
debug_event (Optional[Dict[str, Any]]) –
- Return type
result_set.ResultSet
- whylogs.log_classification_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, score_column: Optional[str] = None, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) result_set.ResultSet
Function to track classification metrics based on validation data. The user may also pass the names of the columns associated with the target, prediction, and/or score.
- Parameters
data (pd.DataFrame) – Dataframe with the data to log.
target_column (str) – Column name for the actual validated values.
prediction_column (str) – Column name for the predicted values.
score_column (Optional[str], optional) – Column name for the scores associated with each inferred value; if None, all scores are set to 1. By default None.
schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None
log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DataFrame will be logged in addition to the classification metrics. If False, only the calculated classification metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. To avoid profiling the input features twice, consider leaving this as False. By default False.
dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None
- Return type
result_set.ResultSet
Examples
    import pandas as pd
    import whylogs as why

    data = {
        "product": ["milk", "carrot", "cheese", "broccoli"],
        "category": ["dairies", "vegetables", "dairies", "vegetables"],
        "output_discount": [0, 0, 1, 1],
        "output_prediction": [0, 0, 0, 1],
    }
    df = pd.DataFrame(data)
    results = why.log_classification_metrics(
        df,
        target_column="output_discount",
        prediction_column="output_prediction",
        log_full_data=True,
    )
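To make "metrics based on validation data" concrete, the following plain-Python sketch (no whylogs calls) tallies the confusion matrix for the example data and derives accuracy, precision, and recall from it. It only illustrates the kind of quantities a binary classification summary is built from; it is not whylogs' internal implementation.

```python
from collections import Counter

# Same target/prediction values as the example above
targets = [0, 0, 1, 1]      # output_discount
predictions = [0, 0, 0, 1]  # output_prediction

# Count each (actual, predicted) pair
pairs = Counter(zip(targets, predictions))
tp = pairs[(1, 1)]  # predicted 1, actually 1
tn = pairs[(0, 0)]  # predicted 0, actually 0
fp = pairs[(0, 1)]  # predicted 1, actually 0
fn = pairs[(1, 0)]  # predicted 0, actually 1

accuracy = (tp + tn) / len(targets)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, tn, fp, fn, accuracy, precision, recall)  # 1 2 0 1 0.75 1.0 0.5
```

On this data the model never predicts a discount that should not be given (precision 1.0) but misses one of the two actual discounts (recall 0.5).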
- whylogs.log_regression_metrics(data: whylogs.core.stubs.pd.DataFrame, target_column: str, prediction_column: str, schema: Optional[whylogs.core.DatasetSchema] = None, log_full_data: bool = False, dataset_timestamp: Optional[datetime.datetime] = None) result_set.ResultSet
Function to track regression metrics based on validation data. The user may also pass the names of the columns associated with the target and prediction.
- Parameters
data (pd.DataFrame) – Dataframe with the data to log.
target_column (str) – Column name for the target values.
prediction_column (str) – Column name for the predicted values.
schema (Optional[DatasetSchema], optional) – Defines the schema for tracking metrics in whylogs, by default None
log_full_data (bool, optional) – Whether to log the complete dataframe or not. If True, the complete DF will be logged in addition to the regression metrics. If False, only the calculated regression metrics will be logged. In a typical production use case, the ground truth might not be available at the time the remaining data is generated. In order to prevent double profiling the input features, consider leaving this as False. by default False.
dataset_timestamp (Optional[datetime], optional) – dataset’s timestamp, by default None
- Return type
result_set.ResultSet
Examples
    import pandas as pd
    import whylogs as why

    # One row per observation; column names must match target_column and prediction_column
    df = pd.DataFrame({
        "target_temperature": [10.5, 24.3, 15.6],
        "predicted_temperature": [9.12, 26.42, 13.12],
    })
    results = why.log_regression_metrics(
        df,
        target_column="target_temperature",
        prediction_column="predicted_temperature",
    )
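Similarly, the plain-Python sketch below (no whylogs calls) computes mean absolute error, mean squared error, and root mean squared error for the temperatures in the example. These are common regression error statistics chosen for illustration; the sketch is not whylogs' internal implementation.

```python
import math

# Same values as the example above
targets = [10.5, 24.3, 15.6]        # target_temperature
predictions = [9.12, 26.42, 13.12]  # predicted_temperature

# Per-observation error: prediction minus target
errors = [p - t for t, p in zip(targets, predictions)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n  # mean absolute error
mse = sum(e * e for e in errors) / n   # mean squared error
rmse = math.sqrt(mse)                  # root mean squared error
print(round(mae, 4), round(mse, 4), round(rmse, 4))
```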
- whylogs.profiling(*, schema: Optional[whylogs.core.DatasetSchema] = None)
- Parameters
schema (Optional[whylogs.core.DatasetSchema]) –
- whylogs.write(profile: whylogs.core.DatasetProfile, base_dir: Optional[str] = None, filename: Optional[str] = None) None
- Parameters
profile (whylogs.core.DatasetProfile) –
base_dir (Optional[str]) –
filename (Optional[str]) –
- Return type
None
- whylogs.init(reinit: bool = False, allow_anonymous: bool = True, allow_local: bool = True, whylabs_api_key: Optional[str] = None, default_dataset_id: Optional[str] = None, config_path: Optional[str] = None, **kwargs: bool) whylogs.api.whylabs.session.session.Session
Set up authentication for this whylogs logging session. There are three modes that you can authenticate in:
- WHYLABS: Data is sent to WhyLabs and is associated with a specific WhyLabs account. You can get a WhyLabs API key from the WhyLabs Settings page after logging in.
- WHYLABS_ANONYMOUS: Data is sent to WhyLabs, but no authentication happens and no WhyLabs account is required. Sessions can be claimed into an account later on the WhyLabs website.
- LOCAL: No authentication. No data is automatically sent anywhere. Use this if you want to explore profiles locally or manually upload them somewhere.
Typically, you should only have to call why.init() with no arguments at the start of your application, notebook, or script. The arguments allow some customization of the logic that determines the session type. The priority order is:
1. If an API key is supplied directly to init(), use it and authenticate the session as WHYLABS.
2. If there is an API key in the WHYLABS_API_KEY environment variable, use it and authenticate the session as WHYLABS.
3. If there is an API key in the whylogs config file, use it and authenticate the session as WHYLABS.
4. If running in an interactive environment (notebook, Colab, etc.), prompt the user to pick a method explicitly. The options are determined by the allow_* argument values passed to init().
5. If allow_anonymous is True, authenticate the session as WHYLABS_ANONYMOUS.
6. If allow_local is True, authenticate the session as LOCAL.
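The priority order above can be sketched as a small, self-contained decision function. This is a hypothetical illustration of the documented rules, not whylogs' implementation: the function name resolve_session_type, its parameters, and the "PROMPT" return value standing in for the interactive prompt are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_session_type(
    api_key: Optional[str] = None,
    config_api_key: Optional[str] = None,
    interactive: bool = False,
    allow_anonymous: bool = True,
    allow_local: bool = True,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical sketch of the documented why.init() priority order."""
    env = os.environ if env is None else env
    # Steps 1-3: an explicit key, the WHYLABS_API_KEY env var, or a key from the config file
    if api_key or env.get("WHYLABS_API_KEY") or config_api_key:
        return "WHYLABS"
    # Step 4: interactive sessions ask the user (the allow_* filtering of choices is not modeled)
    if interactive:
        return "PROMPT"
    # Step 5: anonymous WhyLabs session
    if allow_anonymous:
        return "WHYLABS_ANONYMOUS"
    # Step 6: purely local session
    if allow_local:
        return "LOCAL"
    raise ValueError("No session type is allowed by the given arguments")


print(resolve_session_type(env={}, allow_anonymous=False))  # LOCAL
```

Passing env explicitly keeps the sketch deterministic; by default it reads the real process environment, mirroring step 2.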
- Parameters
reinit (bool) – Normally, init() is idempotent, so you can run it over and over again in a notebook without any issues, for example. If reinit=True then it will run the initialization logic again, so you can switch authentication methods without restarting.
allow_anonymous (bool) – If True, then the user will be able to choose WHYLABS_ANONYMOUS if no other authentication method is found.
allow_local (bool) – If True, then the user will be able to choose LOCAL if no other authentication method is found.
whylabs_api_key (Optional[str]) – A WhyLabs API key to use for uploading profiles. There are other ways to set an API key that don't require embedding it directly in code, such as setting the WHYLABS_API_KEY environment variable or supplying the API key interactively via the init() prompt in a notebook.
default_dataset_id (Optional[str]) – The default dataset id to use for uploading profiles. This is only used if the session is authenticated. This is a convenience argument so that you don’t have to supply the dataset id every time you upload a profile if you’re only using a single dataset id.
upload_on_log – If True, and the session type is WHYLABS or WHYLABS_ANONYMOUS, automatically upload the profile to WhyLabs from the why.log() function. By default, uploading to WhyLabs requires an explicit write() call from a WhyLabsWriter.
config_path (Optional[str]) –
kwargs (bool) –
- Return type
whylogs.api.whylabs.session.session.Session
- class whylogs.DatasetProfileView(*, columns: Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView], dataset_timestamp: Optional[datetime.datetime], creation_timestamp: Optional[datetime.datetime], metrics: Optional[Dict[str, Any]] = None, metadata: Optional[Dict[str, str]] = None)
Bases: whylogs.api.writer.writer._Writable
A Writable is an object that contains data to write to a file or files. These might be temporary files intended to be passed on to another consumer (e.g., WhyLabs servers) via a Writer.
- Parameters
columns (Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]) –
dataset_timestamp (Optional[datetime.datetime]) –
creation_timestamp (Optional[datetime.datetime]) –
metrics (Optional[Dict[str, Any]]) –
metadata (Optional[Dict[str, str]]) –
- property dataset_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property creation_timestamp: Optional[datetime.datetime]
- Return type
Optional[datetime.datetime]
- property model_performance_metrics: Any
- Return type
Any
- set_dataset_timestamp(dataset_timestamp: datetime.datetime) None
- Parameters
dataset_timestamp (datetime.datetime) –
- Return type
None
- merge(other: DatasetProfileView) DatasetProfileView
- Parameters
other (DatasetProfileView) –
- Return type
DatasetProfileView
- get_column(col_name: str) Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_name (str) –
- Return type
Optional[whylogs.core.view.column_profile_view.ColumnProfileView]
- get_columns(col_names: Optional[List[str]] = None) Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- Parameters
col_names (Optional[List[str]]) –
- Return type
Dict[str, whylogs.core.view.column_profile_view.ColumnProfileView]
- classmethod zero() DatasetProfileView
- Return type
DatasetProfileView
- classmethod deserialize(data: bytes) DatasetProfileView
- Parameters
data (bytes) –
- Return type
DatasetProfileView
- classmethod read(path: str) DatasetProfileView
- Parameters
path (str) –
- Return type
DatasetProfileView
- to_pandas(column_metric: Optional[str] = None, cfg: Optional[whylogs.core.configs.SummaryConfig] = None) whylogs.core.stubs.pd.DataFrame
- Parameters
column_metric (Optional[str]) –
cfg (Optional[whylogs.core.configs.SummaryConfig]) –
- Return type
whylogs.core.stubs.pd.DataFrame
- whylogs.v0_to_v1_view(msg: whylogs.core.proto.v0.DatasetProfileMessageV0, allow_partial: bool = False) whylogs.core.DatasetProfileView
- Parameters
msg (whylogs.core.proto.v0.DatasetProfileMessageV0) –
allow_partial (bool) –
- Return type
whylogs.core.DatasetProfileView