Great Expectations

Always know what to expect from your data.

Important announcements regarding our upcoming 1.0 release

We’re planning a ton of work to take GX OSS to the next level as we officially graduate it to 1.0!

Our biggest goal is to improve the user and contributor experiences by streamlining the API, based on the feedback we’ve gotten from the community (thank you!) over the years.

Learn more about our plans for 1.0 and how we’ll be making this transition in our blog post.

As we gear up for the launch of our 1.0 release early next year, we want to share an important update regarding our current development process.

Temporary hold on PRs

We’re temporarily pausing the acceptance of new pull requests (PRs). We’re going to be updating the API and codebase frequently and significantly over the next few months, and we don’t want contributors to spend time and effort on a PR only to find that a change we’ve just merged breaks their work.

Looking forward

We deeply value the contributions and engagement of our community. Please hold onto your fantastic ideas and PRs until after the 1.0 release, when we will be excited to resume accepting them. We appreciate your understanding and support as we make this final push toward this exciting milestone. Please watch for updates in our Slack community, and thank you for being a crucial part of our journey!

What is GX?

Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing, documentation, and profiling.

Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.

See Down with Pipeline Debt! for an introduction to our philosophy of pipeline data quality testing.

Key features

Seamless operation

GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to perfectly meet your data quality needs.

Start fast

Get useful results quickly, even for large data volumes. GX’s Data Assistants provide curated Expectations for different domains, so you can accelerate your data discovery and rapidly deploy data quality checks throughout your pipelines. Auto-generated Data Docs keep your data quality documentation up to date.

Unified understanding

Expectations are GX’s workhorse abstraction: each Expectation declares an expected state of the data. The Expectation library provides a flexible, extensible vocabulary for data quality—one that’s human-readable, meaningful for technical and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing exactly what you expect from your data.

expect_column_values_to_not_be_null
expect_column_values_to_match_regex
expect_column_values_to_be_unique
expect_column_values_to_match_strftime_format
expect_table_row_count_to_be_between
expect_column_median_to_be_between
...and many more
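To make the idea of "declaring an expected state" concrete, here is a minimal, standard-library-only sketch. It is not GX's implementation or API: the function bodies, the list-of-dicts data format, and the shape of the returned dictionaries are simplified assumptions for illustration, borrowing only the Expectation names listed above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_column_values_to_not_be_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical stand-in: succeeds only if no row has None in `column`."""
    unexpected = [row for row in rows if row.get(column) is None]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


def expect_table_row_count_to_be_between(
    rows: List[Row], min_value: int, max_value: int
) -> Dict[str, Any]:
    """Hypothetical stand-in: succeeds if the row count is within the bounds."""
    observed = len(rows)
    return {"success": min_value <= observed <= max_value, "observed_value": observed}


# Illustrative data: one row is missing its email address.
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

null_check = expect_column_values_to_not_be_null(rows, "email")
count_check = expect_table_row_count_to_be_between(rows, min_value=1, max_value=10)

print(null_check)
print(count_check)
```

Each function name reads as a statement about the data, and each result says whether that statement held. That is the sense in which an Expectation is both a test and a piece of documentation.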

Secure and transparent

GX doesn’t ask you to exchange security for your insight. It processes your data in place, on your systems, so your security and governance procedures can maintain control at all times. And because GX’s core is and always will be open source, its complete transparency is the opposite of a black box.

Data contracts support

Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data from moving further in your pipelines.
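The flow described above (run a set of checks, summarize the results, then trigger Actions) can be sketched in plain Python. This is an illustrative model only, not the GX Checkpoint API: `run_checkpoint`, `alert_on_failure`, the `suite` dictionary, and the summary format are all hypothetical names chosen for this example.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], bool]
Action = Callable[[Dict[str, Any]], None]


def run_checkpoint(
    rows: List[Row], suite: Dict[str, Check], actions: List[Action]
) -> Dict[str, Any]:
    """Hypothetical checkpoint: evaluate every check, then run every action."""
    results = {name: check(rows) for name, check in suite.items()}
    summary = {"success": all(results.values()), "results": results}
    for action in actions:
        action(summary)
    return summary


alerts: List[str] = []


def alert_on_failure(summary: Dict[str, Any]) -> None:
    """Hypothetical action: record an alert message when any check fails."""
    if not summary["success"]:
        failed = sorted(name for name, ok in summary["results"].items() if not ok)
        alerts.append("Data quality check failed: " + ", ".join(failed))


# A tiny "suite" of named checks (assumed format, for illustration only).
suite: Dict[str, Check] = {
    "id_not_null": lambda rows: all(r.get("id") is not None for r in rows),
    "amount_non_negative": lambda rows: all(r["amount"] >= 0 for r in rows),
}

batch = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": -5.0}]
summary = run_checkpoint(batch, suite, [alert_on_failure])

print(summary)
print(alerts)
```

A pipeline step could inspect `summary["success"]` and stop before loading the batch downstream, which is one way to keep low-quality data from moving further.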

Readable for collaboration

Everyone stays on the same page about your data quality with GX’s inspectable, shareable, and human-readable Data Docs. You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.

Quick start

To see Great Expectations in action on your own data, install it using pip:

pip install great_expectations

Then, in a Python session or notebook, run:

import great_expectations as gx

context = gx.get_context()

(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources, which will teach you how to get up and running in minutes.)

For full documentation, visit https://docs.greatexpectations.io/.

If you need help, hop into our Slack channel—there are always contributors and other users there.

Integrations

Great Expectations works with the tools and systems that you're already using with your data, including:

Integration	Notes
DataHub	Data Catalog
AWS Glue	Data Integration
Athena	Data Source
AWS Redshift	Data Source
AWS S3	Data Source
BigQuery	Data Source
Databricks	Data Source
Deepnote	Collaborative data notebook
Google Cloud Platform (GCP)	Data Source
Microsoft Azure Blob Storage	Data Source
Microsoft SQL Server	Data Source
MySQL	Data Source
Pandas	Data Source
PostgreSQL	Data Source
Snowflake	Data Source
Spark	Data Source
SQLite	Data Source
Trino	Data Source
Apache Airflow	Orchestrator
Flyte	Orchestrator
Meltano	Orchestrator
Prefect	Orchestrator
ZenML	Orchestrator
Slack	Plugin
Jupyter Notebooks	Utility

What is GX not?

Great Expectations is not a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools like Spark, Airflow, dbt, Prefect, Dagster, Kedro, Flyte, etc. GX carries out your data quality pipeline testing while these tools execute the pipelines.

Great Expectations is not a database or storage software. It processes your data in place, on your existing systems. Expectations and Validation Results that GX produces are metadata about your data.

Great Expectations is not a data versioning tool. If you want to bring your data itself under version control, check out tools like DVC, Quilt, and lakeFS.

Great Expectations is not a language-agnostic platform. Instead, it follows the philosophy of “take the compute to the data” by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy), and Spark environments.
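As a rough illustration of "take the compute to the data", the sketch below pushes a null check into the database as a SQL aggregate instead of pulling every row into Python. It uses the standard-library `sqlite3` module as a stand-in for a SQLAlchemy-backed source; the `orders` table, the `count_nulls` helper, and the result format are assumptions made for this example and are not how GX itself is implemented.

```python
import sqlite3

# In-memory database standing in for "where the data already lives".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (2, None), (3, 12), (4, None)],
)


def count_nulls(connection: sqlite3.Connection, table: str, column: str) -> int:
    """Hypothetical helper: count NULLs inside the database engine.

    Identifiers cannot be bound as SQL parameters, so this sketch assumes
    `table` and `column` are trusted, hard-coded names.
    """
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    (null_count,) = connection.execute(query).fetchone()
    return null_count


null_count = count_nulls(conn, "orders", "customer_id")
result = {"success": null_count == 0, "unexpected_count": null_count}
print(result)
```

Only a single number crosses the boundary between the database and Python, so the cost of the check scales with the engine that holds the data rather than with the client that asks the question.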

Great Expectations is not exclusive to Python programming environments. It can be invoked from the command line without a Python environment. However, if you’re working in another ecosystem, you may want to explore ecosystem-specific alternatives such as assertR (for R environments) or TFDV (for TensorFlow environments).

Who maintains Great Expectations?

Great Expectations OSS is under active development by GX Labs and the Great Expectations community.

What's the best way to get in touch with the Great Expectations team?

If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our public Slack channel or post in our Discourse.

Can I contribute to the library?

Absolutely. Yes, please. See Contributing code, Contributing Expectations, Contributing packages, or Contribute to Great Expectations documentation, and please don't be shy with questions.

How do I stay up to date with Great Expectations?

You can get updates on everything GX with our email newsletter. Subscribe here!
