TRAINING

Advanced Spark programming in Scala

8h.

Learn how to make the most of the optimizations that Spark offers for free, and how to manually optimize and organize your Spark code to make it more robust and performant in situations where the framework is not smart enough.

AUDIENCE

  • Programmers who are familiar with the basics of Spark programming and need to get acquainted with the nuts and bolts of the framework

COURSE OUTLINE (8 hours)

 

MODULE 1. Spark optimizations 

  • Datasets vs DataFrames optimizations
  • Optimized vs non-optimized file formats
  • The standard Catalog API

MODULE 2. Best practices on performance & modular design

  • Partitioning issues: unpartitioned data and over-partitioning
  • Fixing memory problems
  • How to solve serialization issues
  • Caching: when it improves your process, and when it is just extra work
  • Tasks that never finish: how to detect why this is happening
  • Workflow structure: design patterns to properly modularize your ETLs and improve testability

Info

8h.

Location

Your premises / Our facilities at Parque Científico Universidad Carlos III de Madrid

Do you need a tailor-made course?

Contact us if you need tailor-made training for you or your development team. We create exclusive content for your company.

Request your own course
