Getting Started

whylogs provides a standard to log any kind of data.

This example shows how to log data with whylogs, which generates statistical summaries called profiles. Profiles can be used in several ways, such as:

- Data Visualization
- Data Validation
- Tracking changes in your datasets

Table of Contents

In this example, we’ll explore the basics of logging data with whylogs:

- Installing whylogs
- Profiling data
- Interacting with the profile
- Writing/Reading profiles to/from disk

Installing whylogs

whylogs is made available as a Python package. You can get the latest version from PyPI with pip install whylogs:

[1]:

# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs

Minimum requirements:

- Python 3.7 through 3.10
- Windows, Linux x86_64, or macOS 10+
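
Before installing, you may want to confirm that your interpreter falls inside the Python range listed above. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and not part of whylogs, and the bounds are simply copied from the requirements above.

```python
import sys

# Bounds copied from the requirements listed above; adjust them if the
# supported range changes in a later release.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 10)


def is_supported_python(version_info=sys.version_info):
    """Return True if the major.minor version falls within the listed range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} within listed range: {is_supported_python()}")
```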

Loading a Pandas DataFrame

Before showing how we can log data, we first need the data itself. Let’s create a simple Pandas DataFrame:

[8]:

import pandas as pd
data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

df = pd.DataFrame(data)

Profiling with whylogs

To profile your data, call whylogs’ log function and then retrieve the profile from the returned result with profile():

[3]:

import whylogs as why

results = why.log(df)
profile = results.profile()

Analyzing Profiles

Once you’re done logging the data, you can generate a Profile View and inspect it as a pandas DataFrame:

[9]:

prof_view = profile.view()
prof_df = prof_view.to_pandas()

prof_df

[9]:

	cardinality/est	cardinality/lower_1	cardinality/upper_1	counts/inf	counts/n	counts/nan	counts/null	distribution/max	distribution/mean	distribution/median	...	frequent_items/frequent_strings	type	types/boolean	types/fractional	types/integral	types/object	types/string	types/tensor	ints/max	ints/min
column
animal	3.0	3.0	3.00015	0	4	0	0	NaN	0.000	NaN	...	[FrequentItem(value='cat', est=2, upper=2, low...	SummaryType.COLUMN	0	0	0	0	4	0	NaN	NaN
legs	3.0	3.0	3.00015	0	4	0	0	4.0	2.500	4.0	...	[FrequentItem(value='4', est=2, upper=2, lower...	SummaryType.COLUMN	0	0	4	0	0	0	4.0	0.0
weight	4.0	4.0	4.00020	0	4	0	0	4.3	2.875	4.1	...	NaN	SummaryType.COLUMN	0	4	0	0	0	0	NaN	NaN

3 rows × 31 columns

The resulting DataFrame has one row per column (feature) of the original data, with statistics such as:

- Counters, such as the number of samples and null values
- Inferred types, such as integral, fractional, and boolean
- Estimated cardinality
- Frequent items
- Distribution metrics: min, max, median, and quantile values
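
To build intuition for what these statistics mean, the sketch below computes exact, in-memory versions of a few of them with the Python standard library. It is an illustration only and not how whylogs works internally: the function `summarize_column` and its dictionary keys are hypothetical, and whylogs reports estimates (note the `cardinality/est` column with lower and upper bounds), so its numbers can differ from the exact ones.

```python
import statistics


def summarize_column(values):
    """Exact analogue of a few per-column statistics (illustrative only)."""
    non_null = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it from numeric values.
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    summary = {
        "n": len(values),
        "null": len(values) - len(non_null),
        "integral": sum(isinstance(v, int) and not isinstance(v, bool) for v in non_null),
        "fractional": sum(isinstance(v, float) for v in non_null),
        "string": sum(isinstance(v, str) for v in non_null),
        "cardinality": len(set(non_null)),  # exact count, not an estimate
    }
    if numeric:
        summary.update(
            min=min(numeric),
            max=max(numeric),
            mean=statistics.mean(numeric),
            median=statistics.median(numeric),
        )
    return summary


data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}

for name, values in data.items():
    print(name, summarize_column(values))
```

For the legs column, the exact median computed here is 3.0, while the profile above reports 4.0. On a four-row column the two can disagree, presumably because whylogs derives its quantiles differently from `statistics.median`.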

Writing to Disk

You can also store your profile on disk for later inspection:

[7]:

why.write(profile, "profile.bin")

This will create a binary profile file on your local filesystem.

Reading from Disk

You can read the profile back into memory with:

[8]:

n_prof = why.read("profile.bin")

Note: write expects a profile as a parameter, while read returns a Profile View. This means you can use the loaded profile for visualization and merging, but not for further tracking or updates.
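
The distinction in this note can be illustrated with a toy model: a summary that is read-only, yet can still be combined with another summary without access to the original rows. This is a conceptual sketch only. `CountSummary` is a hypothetical class written for this example, not a whylogs type, and it tracks far fewer statistics than a real Profile View.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class CountSummary:
    """Hypothetical read-only summary of one numeric column."""

    n: int
    minimum: float
    maximum: float
    total: float

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), min(values), max(values), sum(values))

    def merge(self, other):
        """Combine two summaries without needing the original rows."""
        return CountSummary(
            self.n + other.n,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
        )

    @property
    def mean(self):
        return self.total / self.n


# Two batches of the "legs" column, summarized separately and then merged.
batch_a = CountSummary.from_values([4, 2])
batch_b = CountSummary.from_values([0, 4])
merged = batch_a.merge(batch_b)
print(merged.n, merged.minimum, merged.maximum, merged.mean)  # 4 0 4 2.5

# The summary is frozen, so it cannot be updated in place.
try:
    merged.n = 5
except FrozenInstanceError:
    print("summary is read-only")
```

Merging yields the same summary as processing all four values at once, which is the property that makes profiles from different machines or time windows combinable.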

What’s Next?

There’s a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!

Basic
- Visualizing Profiles - Compare profiles to detect distribution shifts, visualize histograms and bar charts, and explore your data
- Logging Data - See the different ways you can log your data with whylogs
- Inspecting Profiles - A deeper dive on the metrics generated by whylogs
- Schema Configuration for Tracking Metrics - Configure tracking metrics according to data type or column features
- Data Constraints - Set constraints on your data to ensure its quality
- Merging Profiles - Merge your profiles logged across different computing instances, time periods or data segments
Integrations
- WhyLabs - Monitor your profiles continuously with the WhyLabs Observability Platform
- PySpark - Use whylogs with PySpark
- Writing Profiles - See different ways and locations to output your profiles
- Flask - See how you can create a Flask app with whylogs and WhyLabs integration
- Feature Stores - Learn how to log features from your Feature Store with feast and whylogs
- BigQuery - Profile data queried from a Google BigQuery table
- MLflow - Log your whylogs profiles to an MLflow environment

Or go to the examples page for the complete list of examples!