When we talk about Big Data and Hadoop in particular, we generally have one of the existing distributions from Cloudera, Hortonworks or other Big Data companies in mind. But sometimes, a pre-built distro just does not meet the needs. In this episode, we have a guest on the show that explains why they made the choice to forgo the available distributions in favour of building ones own.
00:00 Recent events
- Which tool should I use?
- YaRrr! – The Pirate’s guide to R
- Blog: http://nathanieldphillips.com/thepiratesguidetor/
- YaRrr! – Download the book:
- Video tutorials to go with the above:
- Listener Question from Sampath from Baltimore:
- When moving into a career in Big Data, is it better to pick a technology like Spark and try to build expertise on it versus having a broader knowledge on many tools. I registered for Edx courses and working towards getting Cloudera Certification. Please provide me any advice.
- More accountability for big-data algorithms
- 6 Illusions Execs Have About Big Data
- Hadoop release 3.0.0-alpha1 available
- Running Spark on Alluxio with S3
47:00 The pro’s and con’s of crafting your own distribution
With our special guest Michele Lamarca (@nonfacciocip).
Many thanks to Michele for being on the podcast with us and sharing his experiences!