Forrester is calling a new function into being, the “Data Hunter”, which sounded interesting enough for us to spend some time on. Then we cover a nice guest blog on the Cloudera site, and we finish off with some rambling on the changes in the HPC world. Enjoy!
Loincloths and spears to the ready: the Data Hunter is born!
Now, we did not pony up the $300 they charge for the webinar, but we found the concept quite compelling and looked at the three “audience questions” that were included in the article.
The “Small File Problem” and a little “You’re Doing it Wrong”…?
This guest blog on the Cloudera website actually has some practical information that can be useful when you need to consolidate your incremental upload files to reduce the number of files your Hive queries need to traverse. The additional complexity here was that this had to happen in a live production environment, without service interruption and while keeping all data available and sane.
We do however need to remark that the author of this article was making life quite difficult for himself, since his “data estate” really does not seem to justify the use of any kind of Big Data technology. We fully agree with his own summary, where he states that using a standard RDBMS would most likely be a better solution…
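To make the consolidation idea a bit more concrete, here is a minimal sketch of the planning step only: grouping many small incremental files into batches that each stay close to a target size. This is our own illustration, not the method from the guest blog. The function name `plan_compaction`, the file names and the sizes are all hypothetical, and the code does not touch HDFS or Hive. Actually merging the files and swapping them in on a live system is the hard part and is not covered here.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group (name, size) pairs into batches of roughly target_bytes each.

    Hypothetical helper: files are processed in name order, on the assumption
    that incremental uploads carry sortable names. A file larger than the
    target ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(files):
        # Close the running batch if this file would push it over the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Made-up example: five incremental files, target batch size of 100 bytes.
example_files = [
    ("part-0001", 40),
    ("part-0002", 50),
    ("part-0003", 30),
    ("part-0004", 200),
    ("part-0005", 10),
]
example_plan = plan_compaction(example_files, target_bytes=100)
```

Each inner list in `example_plan` is a set of files that could be merged into one larger file, so queries have fewer files to open.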
Should “HPC” now be spelled “HPE”???
With the Enterprise branch of HP gobbling up Cray, after doing the same with whatever remained of Silicon Graphics way back in 2016, they now represent a large percentage of what could be considered “traditional HPC”. Of course, IBM is still in there too, but not many of the old supercomputer firms are still around.
Then again, the whole HPC world is undergoing a major redesign towards GPUs (and to a lesser extent FPGAs), so it does make sense that the ecosystem is changing…?
And that’s all we have for this episode. See you next week! Don’t forget to subscribe to our YouTube channel, and consider becoming a patron on Patreon to support your favorite podcast!