Scala & Apache Spark
11 modules · ~36 hours · Intermediate → Advanced
Master Scala 3 and Apache Spark 3+ for distributed data processing, from functional programming foundations to multi-terabyte ETL, Structured Streaming, MLlib, and tuning Spark on Kubernetes.
Course roadmap
| # | Module | Status | Topics |
|---|---|---|---|
| 0 | Setup & Hello Spark | Plan ready | Install Scala 3, sbt, Spark in Docker, first DataFrame |
| 1 | Scala Fundamentals | Plan ready | Types, immutability, case classes, pattern matching |
| 2 | Functional Programming | Plan ready | Higher-order functions, map/flatMap, Option/Either, type classes |
| 3 | Spark Core (RDDs) | Plan ready | RDD API, transformations vs actions, lineage, persist/cache |
| 4 | Spark SQL & DataFrames | Plan ready | DataFrame API, schema, Catalyst optimizer, joins |
| 5 | Datasets & Encoders | Plan ready | Type-safe API, Encoders, performance tradeoffs |
| 6 | Spark Streaming | Plan ready | Structured Streaming, watermarks, exactly-once, Kafka source |
| 7 | MLlib | Plan ready | Pipelines, transformers, estimators, model selection |
| 8 | Tuning & Optimization | Plan ready | Partitioning, shuffles, broadcast joins, AQE, skew handling |
| 9 | Production Spark | Plan ready | Spark on Kubernetes, dynamic allocation, Spark Operator, monitoring |
| 10 | Capstone | Plan ready | Build a streaming ETL pipeline: Kafka → Spark Streaming → Iceberg/Delta Lake |
What's available now
The curriculum plan is published. Module content is scheduled to roll out in the second half of 2026.
Related courses:
- aws-data-engineering — AWS-native data lakes
- Kubernetes — run Spark on K8s
- Python — PySpark alternative
Last updated
2026-05 — Curriculum plan published.