TRAINING

Spark in Scala

16h.

This course offers an in-depth introduction to distributed programming with Apache Spark using Scala, the language Spark itself is implemented in and the best way to get the most out of the framework. It focuses on the fundamentals of Spark's computational model, with the course contents explained through interactive examples. Through practical exercises, it also shows how to analyze a program's performance using the SparkUI and how to apply basic optimizations.

AUDIENCE

  • Programmers with basic Scala knowledge who want to make the most of Spark using its language of choice
  • Spark programmers working in Java or Python who want to start using the framework from Scala

COURSE OUTLINE (16 hours)

MODULE 1. Computational model

  • Transformations and actions; jobs, stages and tasks
  • Cluster managers: YARN, Standalone, Mesos
  • Driver and executors; SparkUI
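As a taste of what this module covers, here is a minimal sketch of lazy transformations versus eager actions (assuming the spark-sql dependency is on the classpath and a local master; object and method names are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

object LazyEvalDemo {
  // Transformations (filter, map) only build the lineage graph;
  // the reduce action is what actually launches a job, visible in the SparkUI.
  def sumDoubledEvens(spark: SparkSession): Int = {
    val rdd     = spark.sparkContext.parallelize(1 to 10)
    val evens   = rdd.filter(_ % 2 == 0)   // transformation: lazy
    val doubled = evens.map(_ * 2)         // transformation: lazy
    doubled.reduce(_ + _)                  // action: triggers the job
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-eval-demo")
      .master("local[*]")                  // local mode, no cluster manager
      .getOrCreate()
    println(sumDoubledEvens(spark))        // prints 60
    spark.stop()
  }
}
```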

MODULE 2. Spark APIs

  • Spark libraries: Spark SQL, RDDs, MLlib, GraphX
  • Dataset: statically typed, checked at compile time
  • DataFrame: dynamically typed, errors surface at runtime
  • Datasets vs DataFrames
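The Dataset/DataFrame contrast can be sketched as follows (a minimal example assuming spark-sql on the classpath; the Person record and method names are illustrative only):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Illustrative record type for the example
case class Person(name: String, age: Int)

object ApiComparison {
  // Dataset: the filter lambda is type-checked at compile time
  def adultsTyped(spark: SparkSession): Long = {
    import spark.implicits._
    val ds: Dataset[Person] = Seq(Person("Ana", 34), Person("Luis", 28)).toDS()
    ds.filter(_.age >= 30).count()   // a typo like _.agge would not compile
  }

  // DataFrame: column names are strings, so typos only fail at runtime
  def adultsUntyped(spark: SparkSession): Long = {
    import spark.implicits._
    val df: DataFrame = Seq(Person("Ana", 34), Person("Luis", 28)).toDF()
    df.filter($"age" >= 30).count()  // $"agge" would only fail when executed
  }
}
```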

MODULE 3. Reading and writing in Spark

  • Files: JSON, Parquet
  • Databases: JDBC, NoSQL
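A minimal round-trip through the file formats in this module might look like this (assuming spark-sql on the classpath; the output directory and method names are placeholders, not part of the course material):

```scala
import org.apache.spark.sql.SparkSession

object ReadWriteDemo {
  // Writes a small DataFrame as Parquet and JSON, then reads the Parquet back.
  def roundTrip(spark: SparkSession, dir: String): Long = {
    import spark.implicits._
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // Parquet: columnar and compressed, stores the schema with the data
    df.write.mode("overwrite").parquet(s"$dir/demo.parquet")
    // JSON: line-delimited text, schema is inferred again on read
    df.write.mode("overwrite").json(s"$dir/demo.json")

    spark.read.parquet(s"$dir/demo.parquet").count()
  }
}
```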

MODULE 4. Patterns and antipatterns

  • Memory
  • Serialization issues
  • Caching
  • Tasks that never finish
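Caching, one of the patterns covered here, can be sketched in a few lines (assuming spark-core on the classpath; object and method names are illustrative only):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object CachingDemo {
  // Persists an RDD so the second action reuses the stored partitions
  // instead of recomputing the map from scratch.
  def cachedStats(sc: SparkContext): (Long, Int) = {
    val squares = sc.parallelize(1 to 1000).map(x => x * x)
    squares.persist(StorageLevel.MEMORY_ONLY)   // equivalent to .cache()
    val count = squares.count()                 // first action: computes and caches
    val max   = squares.max()                   // second action: served from cache
    squares.unpersist()                         // release the memory afterwards
    (count, max)
  }
}
```

Forgetting the `unpersist` call is itself a common memory antipattern: cached partitions compete with shuffle and execution memory on the executors.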


Location

Your premises / Our facilities at Parque Científico Universidad Carlos III de Madrid

Do you need a tailor-made course?

Contact us if you need tailor-made training for you or your development team. We create exclusive content for your company.

Request your own course

Start an awesome project with us