IDG Contributor Network: The siren song of Hadoop


Hadoop seems incredibly well-suited to shouldering machine-learning workloads. With HDFS you can store both structured and unstructured data across a cluster of machines, and SQL-on-Hadoop technologies like Hive make the structured data look like database tables. Execution frameworks like Spark let you distribute compute across the cluster as well. On paper, Hadoop is the perfect environment for running compute-intensive, distributed machine-learning algorithms across a vast amount of data.

Unfortunately, though, Hadoop seems incredibly well-suited for a lot of other things too. Streaming data? Storm and Flink! Security? Kerberos, Sentry, Ranger, and Knox! Data movement and message queues? Flume, Sqoop, and Kafka! SQL? Hive, Impala, and HAWQ! The Hadoop ecosystem has become a grab bag of often overlapping and competing technologies. The rivalry among Cloudera, Hortonworks, and MapR is responsible for some of this, as is the dynamism of the open source community.

