SQL was one of the first data access methods added to vanilla Hadoop. Considering that many of the people working with Hadoop in the early days came from a database background, this is not surprising. Since then, the SQL ecosystem in Hadoop has grown considerably, and in this episode we give a general overview of many of the available choices. This episode runs a bit longer than normal, but we hope you’ll find it worthwhile!
Podcast duration: 53:38 (30.9MB)
00:00 Recent events
- Spark masterclasses
- NiFi on trains
- MiFID II and the active archive
- Mobile World Congress
08:30 Main Topic
SQL solutions:
- Apache Hive
- Apache Spark SQL
- Apache Phoenix
- Apache Impala (incubating)
- Apache HAWQ (incubating)
- Apache Drill
- Presto
- Oracle Big Data SQL
- IBM Big SQL
Technology topics:
- JDBC/ODBC
- SQL syntax compliance
- Multi-user concurrency
- Benchmarks
46:40 Questions from our Listeners:
- How much storage overhead should I count on if I add SQL to my Hadoop workflow?
- How do I make my SQL faster?
53:38 End
Please use the Contact Form on this blog or our Twitter feed to send us your questions or to suggest future episode topics you would like us to cover.