12. Layers on Top (September 2008)
Pig (from Pig Latin): MapReduce query language
Hive: SQL against the data (Facebook)
HBase: non-relational database
Mahout: machine learning
Distributed Lucene: search over HDFS
Hama: mathematics
17. Around Hadoop ... with SmartFrog now
Hardware; Hadoop; vertical applications; management, monitoring, virtualization
Editor's Notes
September 2008
This is one of Doug Cutting's kid's toys, the one Hadoop is named after.
Lots of people have those Google "our other computer is a datacentre" stickers. We just know where ours is and what runs on it. Soon, our machines will have Hadoop stickers on them.
Search for the Google MapReduce paper.
Everyone talks about word counting and click logs; here's something more fun: mapping out all sightings of devices with the same ID, debouncing them, and building up stats. Or group by time: which are the peak times? Are there special groups (school or college students) who can be identified by the times and dates of travel?
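To make the idea concrete, here is a minimal in-memory sketch of the map, shuffle and reduce shape applied to device sightings. It is not Hadoop's Java API. The record layout (device ID, station, Unix timestamp in seconds), the 60-second debounce window and the use of UTC hours are all hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical input records: (device_id, station, unix_timestamp_seconds)
records = [
    ("dev-1", "A", 0),
    ("dev-1", "A", 30),    # same station 30 s after the previous sighting: debounced
    ("dev-1", "B", 100),
    ("dev-2", "A", 3600),
    ("dev-2", "A", 3700),  # 100 s gap is beyond the window, so it is kept
]

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run(records, mapper, reducer):
    """Run one map/shuffle/reduce pass in memory and return {key: result}."""
    pairs = (kv for record in records for kv in mapper(record))
    return dict(reducer(k, vs) for k, vs in sorted(shuffle(pairs).items()))

# Job 1: group sightings by device ID, debounce, count distinct sightings.
def map_by_device(record):
    device_id, station, ts = record
    yield device_id, (ts, station)

def reduce_device(device_id, sightings, window=60):
    """Drop a sighting if it repeats the previous station within `window` seconds."""
    kept, prev = 0, None
    for ts, station in sorted(sightings):
        if prev is None or station != prev[1] or ts - prev[0] >= window:
            kept += 1
        prev = (ts, station)
    return device_id, kept

# Job 2: count sightings per hour of day (UTC) to find peak times.
def map_by_hour(record):
    _, _, ts = record
    yield (ts // 3600) % 24, 1

def reduce_sum(key, values):
    return key, sum(values)

print(run(records, map_by_device, reduce_device))  # per-device debounced counts
print(run(records, map_by_hour, reduce_sum))       # sightings per UTC hour
```

In a real cluster the shuffle is done by the framework across machines; keeping it as a separate function here just shows where that boundary sits between the two user-supplied functions.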
The NameNode really matters: lose it and the cluster is gone. Lose the JobTracker and all ongoing work is lost; currently the entire job chain needs to be restarted. But the rest of the cluster stays live.
This is how applications used to be written. An app server drives a cluster, with a database (Oracle, IBM DB2 or something else) behind it all. An O/R mapping layer, either entity beans or Spring/Hibernate, makes Java classes persistent, and JSP provides the front ends. On the side: message-driven beans for queued operations, and CORBA IIOP or Java WS-* to talk to other parts of the enterprise.
Here is where things get interesting.
We may have our own cluster, but it isn't a stable one. We need to adapt to whatever gets provided to us.