1. Apache Hadoop HBase
● What is it?
● Why use it?
● Architecture
● Storage
● Related Projects
2. HBase – What is it?
● A Hadoop data store
● A NoSQL store for big data
● Open source, written in Java
● A distributed database
● Automatic sharding: table data is split into regions spread over the cluster
● Automatic region server failover
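A minimal sketch of automatic sharding, assuming a toy model: a table starts as a single region covering all row keys, and a region splits at its middle row key once it holds more than a hypothetical `MAX_ROWS_PER_REGION` rows. Real HBase splits on region size in bytes and then assigns the new regions to region servers; `Region`, `Table` and the threshold are illustrative names only, not the HBase API.

```python
from bisect import insort

# Hypothetical split threshold. Real HBase splits a region when it grows past
# a configured size in bytes, not a row count.
MAX_ROWS_PER_REGION = 4


class Region:
    """A contiguous, sorted range of row keys: [start_key, end_key)."""

    def __init__(self, start_key="", end_key=None):
        self.start_key = start_key
        self.end_key = end_key  # None means no upper bound
        self.rows = []          # row keys, kept sorted

    def holds(self, row_key):
        return row_key >= self.start_key and (
            self.end_key is None or row_key < self.end_key
        )


class Table:
    def __init__(self):
        # A new table starts as one region covering every possible row key.
        self.regions = [Region()]

    def put(self, row_key):
        region = next(r for r in self.regions if r.holds(row_key))
        if row_key not in region.rows:
            insort(region.rows, row_key)
        if len(region.rows) > MAX_ROWS_PER_REGION:
            self._split(region)

    def _split(self, region):
        # Split at the middle row key: keys below it stay in this region,
        # keys from it upward move to a new region placed right after it.
        mid = region.rows[len(region.rows) // 2]
        upper = Region(mid, region.end_key)
        upper.rows = [k for k in region.rows if k >= mid]
        region.rows = [k for k in region.rows if k < mid]
        region.end_key = mid
        self.regions.insert(self.regions.index(region) + 1, upper)


table = Table()
for key in ["row01", "row02", "row03", "row04", "row05"]:
    table.put(key)
print([(r.start_key, r.end_key, r.rows) for r in table.regions])
# [('', 'row03', ['row01', 'row02']), ('row03', None, ['row03', 'row04', 'row05'])]
```

Because each region covers a contiguous key range, the two halves produced by a split can be served by different region servers, which is how table data ends up spread over the cluster.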
3. HBase – Why / When to use it?
● Data runs to billions of rows
● Complex data
● High volume of I/O
● A large cluster: 5 or more data nodes
● No need for extra RDBMS features, e.g. transactions
5. HBase – Architecture
● HBase is a data store
● Uses Hadoop HDFS for distributed storage
● Table data is served by region servers
● Each region server's data is stored in HDFS, spread across data nodes
● A write-ahead log (WAL) is used to record changes
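The sketch below shows how a row key can be mapped to the region, and hence the region server, that serves it. It assumes a hypothetical layout of three regions described only by their sorted start keys; real HBase clients look this mapping up in the `hbase:meta` table instead of a hard-coded list, and `rs1` to `rs3` are made-up server names.

```python
from bisect import bisect_right

# Hypothetical region layout: region i covers [start_keys[i], start_keys[i + 1]).
# In real HBase this mapping is stored in the hbase:meta table.
REGION_START_KEYS = ["", "g", "p"]   # must be sorted; "" means "from the start"
REGION_SERVERS = ["rs1", "rs2", "rs3"]  # hypothetical server for each region


def locate(row_key):
    """Return (region start key, server) for the region that owns row_key."""
    # bisect_right gives the index of the first start key greater than
    # row_key; the owning region is the one just before that index.
    i = bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_START_KEYS[i], REGION_SERVERS[i]


print(locate("melon"))  # ('g', 'rs2')
```

Keeping regions as sorted, non-overlapping key ranges means a lookup is a binary search, so finding the right region server stays cheap however many regions the table has.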
7. HBase – Storage
● Client makes a call, e.g. a put
● Request is sent by RPC as a KeyValue to the region server
● KeyValue is routed to the region that holds the row
● Data is written to the WAL
● Data is written to the region's MemStore
● If the region server crashes, the WAL can be used to recover data
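The write path above can be sketched as a toy model, assuming a single region server whose WAL is an in-memory list standing in for a log file on HDFS. It shows the ordering that makes recovery possible: each edit goes to the WAL before the MemStore, so after a simulated crash the MemStore can be rebuilt by replaying the log. The class and method names and the `cf:name` column are hypothetical; real HBase also records timestamps, flushes the MemStore to files on HDFS, and replays a crashed server's WAL on the region servers that take over its regions.

```python
class RegionServer:
    """Toy model of the put path: WAL first, then the in-memory MemStore."""

    def __init__(self):
        self.wal = []        # stands in for the durable WAL file on HDFS
        self.memstore = {}   # in-memory state, lost if the server crashes

    def put(self, row, column, value):
        # 1. Append the edit to the write-ahead log first.
        self.wal.append((row, column, value))
        # 2. Then apply the edit to the MemStore.
        self.memstore[(row, column)] = value

    def crash(self):
        # Simulate a crash: in-memory state is lost, the WAL survives.
        self.memstore = {}

    def recover(self):
        # Replay the WAL in order; later edits overwrite earlier ones.
        for row, column, value in self.wal:
            self.memstore[(row, column)] = value


rs = RegionServer()
rs.put("row1", "cf:name", "alice")
rs.put("row1", "cf:name", "bob")
rs.crash()
rs.recover()
print(rs.memstore)  # {('row1', 'cf:name'): 'bob'}
```

Replaying the log in write order matters: it restores the latest value for each cell rather than an arbitrary one.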
8. HBase – Related Projects
● Apache Flume – move large data sets to Hadoop
● Apache Sqoop – command-line tool to move RDBMS data to Hadoop
● Apache HBase – non-relational database
● Apache Pig – analyse large data sets
● Apache Oozie – workflow scheduler
● Apache Mahout – machine learning and data mining
● Hue – Hadoop user interface
● Apache ZooKeeper – distributed coordination and configuration service
9. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours you need to solve your problems