When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show that explains why they made the choice to forgo the available distributions in favour of building ones own.
Podcast: Play in new window | Download (Duration: 1:34:59 — 54.6MB)
Subscribe: Apple Podcasts | Spotify | RSS | More
00:00 Recent events
- Dave:
- Which tool should I use?
- YaRrr! – The Pirate’s guide to R
- Blog: http://nathanieldphillips.com/thepiratesguidetor/
- YaRrr! – Download the book:
https://drive.google.com/file/d/0B4udF24Yxab0S1hnZlBBTmgzM3M/view - Video tutorials to go with the above:
https://www.youtube.com/playlist?list=PL9tt3I41HFS9gmeZFEuNrnu_7V_NFngfJ
- Listener Question from Sampath from Baltimore:
- When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools. I registered for Edx courses and working towards getting Cloudera Certification. Please provide me any advice.
- Jhon:
- More accountability for big-data algorithms
- 6 Illusions Execs Have About Big Data
- Michele:
- Hadoop release 3.0.0-alpha1 available
- Running Spark on Alluxio with S3
47:00 The pro’s and con’s of crafting your own distribution
With our special guest Michele Lamarca (@nonfacciocip).
Many thanks to Michele for being on the podcast with us and sharing his experiences!
01:34:59 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.