Episode 5 – An introduction to Spark

In this episode we’ll cover the basics of Apache Spark, including typical deployment situations, architecture and usage.
 

00:00 Recent events

  • Season's Greetings!
  • Jhon shamelessly plugs his mini cluster build
  • Apache Mesos
  • Amazon IoT solution

05:28 Main Topic

  • Who would use Apache Spark, why would you use it, and where would you use it?
  • Apache Spark Architecture
  • Apache Spark Components
  • Apache Spark MLlib
  • Apache Spark gotchas
  • Typical use cases for Apache Spark

28:20 Questions from our Listeners:

  • What happens if all my data does not fit in memory?
  • What is the security like for Spark?
  • Why Spark on Hadoop instead of standalone?
  • Python, Scala, Java or something else for Spark?
  • Can I access data on HDFS or local disk from my Spark script?

37:50 End


Please use the Contact Form on this blog or our Twitter feed to send us your questions or to suggest future episode topics you would like us to cover.