2. Hadoop as an application framework
■ Hadoop makes a great solution for web scale data
warehousing and batch data processing
■ But there’s more to Hadoop than big analytics queries
alone
■ Cloudera Hadoop offers a compelling foundation for
building and deploying large scale distributed internet
applications
3. Anatomy of a Hadoop-backed internet application
[Diagram: application components: Web UI service, API service, Cache service, processing tier, HBase database service, Spark batch, SolrCloud search service, HDFS filesystem]
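One plausible way the API service, cache service and database service in this slide fit together is a cache-aside read path. The sketch below is an assumption, not something the slide spells out: a plain dict stands in for Memcached, and the hypothetical `CountingStore` class stands in for HBase so the example runs without a cluster.

```python
from typing import Dict, Optional


class CountingStore:
    """Hypothetical stand-in for the database tier; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)


def get_record(key: str, cache: Dict[str, str], store: CountingStore) -> Optional[str]:
    """Cache-aside read: serve from the cache, else read the store and fill the cache."""
    if key in cache:
        return cache[key]
    value = store.get(key)
    if value is not None:
        # Only cache rows that exist, so a later insert is not hidden by a cached miss.
        cache[key] = value
    return value
```

With this shape, repeated reads of a hot key reach the database tier only once; every later read is answered by the cache tier.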
4. Hadoop as an operating system
■ Hadoop offers many of the foundation components you need to
build web scale applications:
■ Message queues [Kafka]
■ Stream processing [Spark]
■ Batch processing [Spark, MapReduce]
■ Database [HBase]
■ Search [SolrCloud]
■ Storage [HDFS]
5. Cloudera Manager integration
■ You can use Cloudera Manager to deploy, operate, monitor
and alert on your service’s custom components
■ Stuff like Memcached, your APIs, and your web UI
■ https://github.com/cloudera/cm_ext/wiki
■ You package your custom service components as parcels
for distribution across the cluster
■ A robust framework for packaging, versioning and upgrades
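To make the packaging and versioning point concrete, here is a minimal sketch of parcel file naming. It assumes the `NAME-VERSION-DISTRO.parcel` convention as I understand it from the cm_ext wiki linked above, where the name carries no hyphen and the distro suffix is the last hyphen-separated field; check the wiki before relying on it. The function names and the `MYAPP` parcel are hypothetical.

```python
from typing import NamedTuple


class ParcelName(NamedTuple):
    name: str
    version: str
    distro: str


def parcel_filename(name: str, version: str, distro: str) -> str:
    """Build a parcel file name, assuming the NAME-VERSION-DISTRO.parcel layout."""
    if "-" in name:
        raise ValueError("parcel name must not contain '-'")
    if "-" in distro:
        raise ValueError("distro suffix must not contain '-'")
    return f"{name}-{version}-{distro}.parcel"


def parse_parcel_filename(filename: str) -> ParcelName:
    """Split a parcel file name back into name, version and distro suffix."""
    suffix = ".parcel"
    if not filename.endswith(suffix):
        raise ValueError("not a .parcel file")
    stem = filename[: -len(suffix)]
    # The name ends at the first hyphen; the distro starts after the last one.
    name, _, rest = stem.partition("-")
    version, _, distro = rest.rpartition("-")
    if not (name and version and distro):
        raise ValueError("expected NAME-VERSION-DISTRO.parcel")
    return ParcelName(name, version, distro)
```

Keeping the version inside the file name is what lets several versions of the same custom service sit side by side on a node while an upgrade or rollback switches between them.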
6. Hadoop cluster
[Diagram: component view of the cluster: Web 1 and Web 2, Spark 1 to N, HBase 1 to N, SolrCloud 1 to N, Memcached 1 to N, two Hadoop Master / CM nodes, Network Services, Distribution Server, Firewall]
7. Deployment considerations
■ You will still need the support of foundation network services like DNS, NTP and
firewall
■ You may still need to deploy HAProxy; it can run on nodes in the Hadoop cluster with a
floating IP
■ http://blog.cloudera.com/blog/2013/08/how-to-achieve-higher-availability-for-hue/
■ Use Linux Control Groups (CGroups) to guarantee resource shares - configure from CM
■ http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_mc_cgroups.html
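As a rough illustration of what "resource shares" means for CPU, the sketch below computes the fraction of CPU each co-located role would receive when every role is busy at once. It assumes the proportional behaviour of the Linux cgroups CPU shares setting (default 1024 per group); the role names and share values are made up for the example and are not CM defaults.

```python
from typing import Dict


def cpu_fractions(shares: Dict[str, int]) -> Dict[str, float]:
    """Return each role's CPU fraction under full contention, given its CPU shares."""
    if any(value < 0 for value in shares.values()):
        raise ValueError("shares must be non-negative")
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one role needs a positive share")
    # Shares are relative weights: a role's slice is its weight over the sum of weights.
    return {role: value / total for role, value in shares.items()}


# Hypothetical node running three roles side by side.
example = cpu_fractions({"hbase": 2048, "api": 1024, "memcached": 1024})
```

Shares only matter under contention: when the other roles are idle, a role can use more than its fraction, which is why shares guarantee a minimum rather than impose a cap.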
8. Wrap up
■ By extending Cloudera Manager, Hadoop can be used to
build, deploy and operate complete, web-scale
applications in a consistent and predictable way
■ Hadoop can offer much more than data warehousing
alone
■ But still a little way to go until Hadoop becomes a fully
fledged Data Centre scale OS