Data of an Unusual Size: A practical guide to analysis and interactive visualization of massive datasets

In this hands-on tutorial, you will learn the fundamentals of analyzing massive datasets through real-world examples, run on powerful machines on a public cloud. The tutorial starts with how the data is stored and read, then moves on to how it is processed and visualized. You will understand how large-scale analysis differs from local workflows, the unique challenges associated with scale, and some best practices for working productively with your data.

Setup for SciPy 2024

You can use Nebari (JupyterHub) hosted at scipy.quansight.dev to follow along with this tutorial.

Follow this participant's guide to register and sign in (re-register if you already registered for a different tutorial), select the Medium Instance in the Server Options, and click on the "Data of an Unusual Size" card in the JupyterLab launcher to clone the materials.

Cloning creates the tutorials/big-data-tutorial folder with all the material. In that folder, open 00-introduction.ipynb.

The environment for this tutorial is scipy-scipy-data-of-unusual-size, and it is automatically selected for you. :)
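
As a quick sanity check once the server is running, a notebook cell along the following lines can confirm that the libraries the notebooks are named after are importable in the selected environment. This is a minimal sketch, not part of the tutorial material: the package list is an assumption inferred from the notebook titles (pandas, hvPlot, Dask, Polars, DuckDB) rather than read from environment.yml, and `check_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed package list, inferred from the notebook names; the authoritative
# list of dependencies lives in environment.yml.
ASSUMED_PACKAGES = ["pandas", "hvplot", "dask", "polars", "duckdb"]


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per package without actually importing anything.
    for name, available in check_packages(ASSUMED_PACKAGES).items():
        print(f"{name:<8} {'ok' if available else 'MISSING'}")
```

If any package is reported as missing, the most likely cause is that a different kernel is active than the scipy-scipy-data-of-unusual-size environment.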

Live presentations

SciPy US 2024 (Upcoming!)
SciPy US 2023
PyCon US 2023

You can check out the tags for previous versions of this tutorial.

This repository is covered by the Nebari Code of Conduct and is released under the BSD 3-Clause license.

Folders and files

images
prep
.gitignore
00-introduction.ipynb
01-analyze-with-pandas.ipynb
02-intro-to-hvplot.ipynb
03-intro-to-dask.ipynb
04-storage-formats.ipynb
05-big-data-analysis-with-dask.ipynb
06-big-data-visualization.ipynb
07-big-data-dashboards.ipynb
08-big-data-application-pipeline.ipynb
09-collaborative-data-science.ipynb
10-polars.ipynb
11-duckdb.ipynb
12-conclusion.ipynb
README.md
environment.yml
