Welcome to "The Internals Of" Online Books project! 🤙
I'm Jacek Laskowski, a Freelance Data(bricks) Engineer specializing in Apache Spark (incl. Spark SQL and Spark Structured Streaming), Delta Lake, Databricks, and Apache Kafka (incl. Kafka Streams) with brief forays into a wider data engineering space (e.g., Trino, Dask and dbt, mostly during Warsaw Data Engineering meetups).
I'm very excited to have you here and hope you will enjoy exploring the internals of the following open source projects with me (in no particular order):
- Apache Spark
- Spark SQL
- Unity Catalog
- Spark Structured Streaming
- Delta Lake
- Spark on Kubernetes
- PySpark
- Apache Kafka (previously at gitbooks.io)
- Kafka Streams (previously at gitbooks.io)
- ksqlDB (no longer maintained)
- Apache Beam (no longer maintained)
- Spark Standalone (no longer maintained)
Please note that some books have less current content than others, but that's expected with a one-person project where so many things are truly interesting and thus time-consuming. Life's too short to taste everything :/
The aim of this project is to host all the current and future internals books under a single organization on GitHub and publish them to a single domain via GitHub Pages (until I find a better way to publish the books).
The book projects use a custom Docker image.
The official Docker image is no longer available and does not include all the plugins the books need.
See the build-image.sh shell script to learn more.
Execute the build-image.sh shell script to build the Docker image.
Use the docker run command with the build argument to build a book.
docker run \
--rm \
-it \
-p 8000:8000 \
-v "${PWD}:/docs" \
jaceklaskowski/mkdocs-material-insiders \
build --clean
TIP: Consult the Material for MkDocs documentation to get started.
Use the docker run command with the serve argument (with --dirtyreload for faster reloads) in the project root (the folder with mkdocs.yml).
docker run \
--rm \
-it \
-p 8000:8000 \
-v "${PWD}:/docs" \
jaceklaskowski/mkdocs-material-insiders \
serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000
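The build and serve invocations above differ only in their trailing arguments, so they could be wrapped in a small shell function. The following is a sketch under stated assumptions, not part of the project: the mkdocs_docker function name and the DRY_RUN variable are hypothetical and introduced here for illustration only. The docker run options and the image name are the ones used above.

```shell
# Hypothetical helper (not part of the books projects): wraps the docker run
# commands shown above. Set DRY_RUN=1 to print the command instead of running it.
IMAGE="jaceklaskowski/mkdocs-material-insiders"

mkdocs_docker() {
  subcommand="${1:-}"
  if [ "$#" -gt 0 ]; then
    shift
  fi

  # Pick the MkDocs arguments; any extra arguments are passed through as-is.
  case "$subcommand" in
    build) set -- build --clean "$@" ;;
    serve) set -- serve --dirtyreload --verbose --dev-addr 0.0.0.0:8000 "$@" ;;
    *)
      echo "usage: mkdocs_docker build|serve [extra mkdocs args]" >&2
      return 2
      ;;
  esac

  # Prepend the docker run options shared by both commands.
  set -- docker run --rm -it -p 8000:8000 -v "${PWD}:/docs" "$IMAGE" "$@"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command for inspection (paths are shown unquoted).
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

With the function sourced in the project root, mkdocs_docker build would build the book and mkdocs_docker serve would start the live-reloading server. DRY_RUN=1 mkdocs_docker serve only prints the command it would run.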
Run an interactive shell in a container.
docker run \
--rm \
-it \
-p 8000:8000 \
-v "${PWD}:/docs" \
--entrypoint sh \
jaceklaskowski/mkdocs-material-insiders
While inside, execute the following command to list outdated packages along with the latest version available for each.
python -m pip list --outdated
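If only the package names are needed (for example, to decide what to bump in the image), the listing could be filtered with awk. This is a minimal sketch: the outdated_names function is a hypothetical helper, and it assumes pip prints its default column layout, that is, a header line and a separator line followed by one package per line.

```shell
# Hypothetical helper: reads the column-formatted output of
# "python -m pip list --outdated" on stdin and prints one package name per line.
# Assumes the first two lines are the header and the dashed separator.
outdated_names() {
  awk 'NR > 2 { print $1 }'
}

# Usage inside the container:
#   python -m pip list --outdated | outdated_names
```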