Getting Started#
whylogs provides a standard way to log any kind of data.
This guide shows how to log data with whylogs to generate statistical summaries called profiles. These profiles can be used in a number of ways, such as:
Data Visualization
Data Validation
Tracking changes in your datasets
Table of Contents#
In this example, we’ll explore the basics of logging data with whylogs:
Installing whylogs
Profiling data
Interacting with the profile
Writing/Reading profiles to/from disk
Installing whylogs#
whylogs is made available as a Python package. You can get the latest version from PyPI with pip install whylogs:
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs
Minimum requirements:
Python 3.7 through Python 3.10
Windows, Linux x86_64, and macOS 10+
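If you want to confirm that your interpreter falls within the Python range listed above before installing, a small check along these lines can help. This is only a sketch: the names `MIN_SUPPORTED`, `MAX_SUPPORTED`, and `is_supported` are hypothetical helpers introduced for illustration and are not part of whylogs.

```python
import sys

# Supported range as stated in the requirements above (inclusive on both ends).
# These constants and the helper below are illustrative, not part of whylogs.
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def is_supported(version=None):
    """Return True if (major, minor) falls within the stated range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_SUPPORTED <= tuple(version[:2]) <= MAX_SUPPORTED


if not is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the range listed above"
    )
```

Tuples compare element by element, so `(3, 10)` correctly sorts after `(3, 7)`, which a string comparison of version numbers would get wrong.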
Loading a Pandas DataFrame#
Before showing how we can log data, we first need the data itself. Let’s create a simple Pandas DataFrame:
import pandas as pd

data = {
    "animal": ["cat", "hawk", "snake", "cat"],
    "legs": [4, 2, 0, 4],
    "weight": [4.3, 1.8, 1.3, 4.1],
}
df = pd.DataFrame(data)
Profiling with whylogs#
To obtain a profile of your data, you can simply use whylogs’ log call, and navigate through the result to a specific profile with profile():
import whylogs as why
results = why.log(df)
profile = results.profile()
Analyzing Profiles#
Once you’re done logging the data, you can generate a Profile View and inspect it as a pandas DataFrame:
prof_view = profile.view()
prof_df = prof_view.to_pandas()
prof_df
column | cardinality/est | cardinality/lower_1 | cardinality/upper_1 | counts/inf | counts/n | counts/nan | counts/null | distribution/max | distribution/mean | distribution/median | ... | frequent_items/frequent_strings | type | types/boolean | types/fractional | types/integral | types/object | types/string | types/tensor | ints/max | ints/min |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
animal | 3.0 | 3.0 | 3.00015 | 0 | 4 | 0 | 0 | NaN | 0.000 | NaN | ... | [FrequentItem(value='cat', est=2, upper=2, low... | SummaryType.COLUMN | 0 | 0 | 0 | 0 | 4 | 0 | NaN | NaN |
legs | 3.0 | 3.0 | 3.00015 | 0 | 4 | 0 | 0 | 4.0 | 2.500 | 4.0 | ... | [FrequentItem(value='4', est=2, upper=2, lower... | SummaryType.COLUMN | 0 | 0 | 4 | 0 | 0 | 0 | 4.0 | 0.0 |
weight | 4.0 | 4.0 | 4.00020 | 0 | 4 | 0 | 0 | 4.3 | 2.875 | 4.1 | ... | NaN | SummaryType.COLUMN | 0 | 4 | 0 | 0 | 0 | 0 | NaN | NaN |
3 rows × 31 columns
This will provide you with valuable statistics on a column (feature) basis, such as:
Counters, such as number of samples and null values
Inferred types, such as integral, fractional and boolean
Estimated Cardinality
Frequent Items
Distribution Metrics: min, max, median, quantile values
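To get a feel for what some of these columns mean, it can help to compute their exact counterparts directly with pandas on the same toy dataset. The sketch below does not use whylogs at all. It is only a point of comparison, and the `exact` DataFrame and the "cardinality (exact)" label are names introduced here for illustration. Note that the profile reports an estimated cardinality with lower and upper bounds, whereas `nunique()` gives the exact distinct count.

```python
import pandas as pd

# Same toy dataset as in the example above.
df = pd.DataFrame(
    {
        "animal": ["cat", "hawk", "snake", "cat"],
        "legs": [4, 2, 0, 4],
        "weight": [4.3, 1.8, 1.3, 4.1],
    }
)

# Exact counterparts of a few profile metrics, computed directly with pandas.
exact = pd.DataFrame(
    {
        "counts/n": pd.Series(len(df), index=df.columns),  # rows per column
        "counts/null": df.isna().sum(),                    # missing values
        "cardinality (exact)": df.nunique(),               # distinct values
        "distribution/mean": df.mean(numeric_only=True),   # numeric columns only
    }
)
print(exact)
```

On this small dataset the exact values line up with the profile table: 3 distinct values for animal and legs, 4 for weight, and means of 2.5 and 2.875 for the two numeric columns. For the non-numeric animal column, pandas leaves the mean as NaN.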
Writing to Disk#
You can also store your profile on disk for further inspection:
why.write(profile, "profile.bin")
This will create a profile binary file in your local filesystem.
Reading from Disk#
You can read the profile back into memory with:
n_prof = why.read("profile.bin")
Note: write expects a profile as parameter, while read returns a Profile View. That means that you can use the loaded profile for visualization purposes and merging, but not for further tracking and updates.
What’s Next?#
There’s a lot you can do with the profiles you just created. Keep getting your hands dirty with the following examples!
Basic
Visualizing Profiles - Compare profiles to detect distribution shifts, visualize histograms and bar charts and explore your data
Logging Data - See the different ways you can log your data with whylogs
Inspecting Profiles - A deeper dive on the metrics generated by whylogs
Schema Configuration for Tracking Metrics - Configure tracking metrics according to data type or column features
Data Constraints - Set constraints to your data to ensure its quality
Merging Profiles - Merge your profiles logged across different computing instances, time periods or data segments
Integrations
WhyLabs - Monitor your profiles continuously with the WhyLabs Observability Platform
PySpark - Use whylogs with PySpark
Writing Profiles - See different ways and locations to output your profiles
Flask - See how you can create a Flask app with whylogs and WhyLabs integration
Feature Stores - Learn how to log features from your Feature Store with feast and whylogs
BigQuery - Profile data queried from a Google BigQuery table
MLflow - Log your whylogs profiles to an MLflow environment
Or go to the examples page for the complete list of examples!