Spark Structured Streaming (aka Spark Streams) is the stream processing module of Apache Spark. It offers a high-level declarative streaming Dataset API that is built on top of Spark SQL and allows for continuous, incremental execution of structured queries.
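For a first taste of the API, here is a minimal sketch of a streaming query (assuming the built-in rate source and console sink; the application name and rate are arbitrary):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("hello-streams").getOrCreate()

// The built-in rate source generates rows with `timestamp` and `value` columns
val rates = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "1")
  .load()

// The same Dataset API as for batch queries, executed incrementally
val query = rates.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()
```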
In this intensive one-day hands-on workshop, you will learn how to develop end-to-end continuous distributed streaming applications using Spark Structured Streaming. Once you complete this workshop, you will have an in-depth understanding of the origin, architecture, and building blocks of Spark Structured Streaming, and will be able to (among other things):
- Develop and execute your own streaming applications
- Explore available streaming sources and sinks
- Use Apache Kafka as a data source and a sink
- Understand output modes
- Learn how to monitor streaming queries
- Use web UI for monitoring and performance tuning
- Use `dropDuplicates` operator for streaming deduplication (with state)
- Explain streaming query plans
- Apply `groupBy` and `groupByKey` operators for streaming aggregations
- Use `window` function for aggregation
- Use event-time streaming watermark to handle late events (see the first sketch after this list)
- Use `flatMapGroupsWithState` operator for arbitrary stateful streaming aggregation (with explicit state logic; see the second sketch after this list)
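As an illustration of several of the topics above, a minimal sketch of a streaming query that reads from Apache Kafka, computes a windowed aggregation with an event-time watermark, and writes the results back to Kafka. It assumes the spark-sql-kafka-0-10 dependency is on the classpath; the broker address, topic names, window and watermark durations, and checkpoint location are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("kafka-windowed-agg").getOrCreate()
import spark.implicits._

// Placeholder broker address and input topic
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// Kafka rows carry binary key/value columns plus a timestamp
val events = input.select(
  $"value".cast("string").as("word"),
  $"timestamp")

// Windowed count per word; the watermark bounds state and drops events
// that arrive more than 10 minutes late
val counts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"word")
  .count()

// Write the aggregation back to Kafka; a checkpoint location is mandatory for the Kafka sink
val query = counts
  .selectExpr("CAST(word AS STRING) AS key", "CAST(count AS STRING) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "word-counts")
  .option("checkpointLocation", "/tmp/checkpoints/word-counts")
  .outputMode("update")
  .start()

query.awaitTermination()
```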
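And a sketch of arbitrary stateful processing with `flatMapGroupsWithState`, keeping an explicit running count per key. The socket source, host and port, and the Event/RunningCount case classes are made up for the example:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical input and output types for the example
case class Event(userId: String, action: String)
case class RunningCount(userId: String, count: Long)

val spark = SparkSession.builder.appName("stateful-counts").getOrCreate()
import spark.implicits._

// Hypothetical socket source emitting "userId,action" lines
val events = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()
  .as[String]
  .map { line =>
    val Array(userId, action) = line.split(",")  // assumes well-formed input
    Event(userId, action)
  }

// Explicit state logic: keep a running count of events per user
def updateCounts(
    userId: String,
    newEvents: Iterator[Event],
    state: GroupState[Long]): Iterator[RunningCount] = {
  val total = state.getOption.getOrElse(0L) + newEvents.size
  state.update(total)
  Iterator(RunningCount(userId, total))
}

val counts = events
  .groupByKey(_.userId)
  .flatMapGroupsWithState[Long, RunningCount](
    OutputMode.Update, GroupStateTimeout.NoTimeout)(updateCounts)

counts.writeStream
  .outputMode("update")
  .format("console")
  .start()
  .awaitTermination()
```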
The programming language of the workshop is Scala (Python or Java are acceptable, yet pose some mental challenge for the trainer).
The version of Apache Spark is 2.3.0 (or later once released).
To get the most out of the workshop, participants should have:
- Experience with the basic concepts of Scala language (or Java or Python)
- Familiarity with Spark SQL concepts like DataFrame and Dataset
- Familiarity using the command line and spark-shell in particular
Duration: 1 day