Advanced Spark programming in Scala


Learn how to make the most of the optimizations that Spark offers for free, and how to manually optimize and organize your Spark code to make it more robust and performant in those situations where the framework is not smart enough.


  • Programmers who are familiar with the basics of Spark programming and need to get acquainted with the nuts and bolts of the framework



MODULE 1. Spark optimizations 

  • Datasets vs DataFrames optimizations
  • Optimized vs. non-optimized file formats
  • The standard Catalog API

MODULE 2. Best practices on performance & modular design

  • Partitioning issues: Unpartitioned data and over-partitioning
  • Fixing memory problems
  • How to solve serialization issues
  • Caching: when it improves your process, and when it is just extra work
  • Tasks that never finish: detecting why this happens
  • Workflow structure: design patterns to properly modularize your ETLs and improve their testability

Location: at your premises or at our facilities in the Parque Científico Universidad Carlos III de Madrid

Do you need a tailor-made course?

Contact us if you need tailor-made training for yourself or your development team. We create exclusive content for your company.
