Episode 58 – Big Data Roles: The data scientist

In this entry in our long-running “roles in Big Data” series, we talk to Eduardo Barbaro, a Sr. Data Scientist at Mobiquity. It is no exaggeration to say that the data scientist is pivotal to any big data or advanced analytics project, and we are grateful to Eduardo for taking the time to share his views and recount his experiences on the podcast.

Episode 57 – Dataworks Summit Sydney recap by Dave – Part 2

In this second part of Dave’s tale of the Sydney Dataworks Summit, the subjects include Apache Metron, a talk by Telstra (Australia’s leading mobile provider), Yarn 3.0 and Apache Zeppelin.

Episode 56 – Dataworks Summit Sydney recap by Dave – Part 1

Dave attended the Dataworks Summit in Sydney, and we go over the sessions he attended there. In this first of two episodes, the focus is on the new goodness that Hadoop 3.0 will bring us soon.

Episode 55 – Roaring News

In this edition of Roaring News, Dave covers the release of the Apache Metron-based HCP 1.3 and an HBase vs Cassandra benchmark battle. Jhon talks about Spark tuning and the scheduler's inner workings, and finishes with a tale of a compliance kettle…

Episode 54 – Hadoop sizing part 1: One big cluster, or many small ones

In this episode, we take an online article by Chris Riccomini and give our view on the debate over having a single big cluster versus many smaller ones. If you are architecting a Hadoop cluster and are faced with this choice, this episode should give you a lot of information on the subject.