Episode 5 – An introduction to Spark

In this episode we’ll cover the basics of Apache Spark, including typical deployment situations, architecture and usage.

00:00 Recent events

  • Season's Greetings!
  • Jhon shamelessly plugs his mini cluster build
  • Apache Mesos
  • Amazon IoT solution

05:28 Main Topic

  • Who would use Apache Spark, why would you use it, and where would you use it?
  • Apache Spark Architecture
  • Apache Spark Components
  • Apache Spark MLlib
  • Apache Spark gotchas
  • Typical use cases for Apache Spark

28:20 Questions from our listeners

  • What happens if all my data does not fit in memory?
  • What is the security like for Spark?
  • Why Spark on Hadoop instead of standalone?
  • Python, Scala, Java or something else for Spark?
  • Can I access data on HDFS or local disk from my Spark script?

37:50 End


Please use the Contact Form on this blog or our Twitter feed to send us your questions or to suggest topics for future episodes.

Author: Jhon Masschelein

Tackler of advanced Cloud and Hadoop challenges in a world of open-source technologies. – Impossible is merely a matter of time and effort. –