Firebird meets NoSQL (Apache HBase)     Case Study     Firebird Conference 2011 – Luxembourg     25.11.2011 – 26.11.2011  ...
ATTENTION – Think BIG Source: http://www.telovation.com/articles/gallery-old-computers.html© Software Competence Center Ha...
ATTENTION – Think BIGGER Source: Google pictures search engine© Software Competence Center Hagenberg GmbH   3
Agenda       Big Data Trend       Scalability Challenges       Apache Hadoop/HBase       Firebird meets NoSQL – Case study...
Big Data Trend       Data volume grows dramatically             40% up to 60% per year is common       What is big?       ...
Big Data Trend The Three Vs                                                                          VOLUME        Big Dat...
Big Data Trend Typical application domains        Sensor networks        Social networks        Search engines        Heal...
Agenda       Big Data Trend       Scalability Challenges       Apache Hadoop/HBase       Firebird meets NoSQL – Case study...
Scalability Challenges RDBMS scalability issues       How do you tackle scalability issues in your RDBMS       environment...
Scalability Challenges A True Story (not happened to me)       A MySQL (doesn‘t matter) environment       Started with a r...
Scalability Challenges How can NoSQL help       NoSQL – Not Only SQL       NoSQL implementations target different user bas...
Agenda       Big Data Trend       Scalability Challenges       Apache Hadoop/HBase       Firebird meets NoSQL – Case study...
Apache Hadoop/HBase The Hadoop Universe       Apache Hadoop is an open source software for reliable,       scalable and di...
Apache Hadoop/HBase The Hadoop Universe       Other Hadoop-related Apache projects          Avro™: A data serialization sy...
Apache Hadoop/HBase MapReduce Framework Source: http://www.recessframework.org/page/map-reduce-anonymous-functions-lambdas...
Apache Hadoop/HBase MapReduce Framework Source: http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html© Softwar...
Apache Hadoop/HBase HBase – Overview       Open source Java implementation of Google‘s BigTable       concept       Non-re...
Apache Hadoop/HBase HBase – Architecture Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html© So...
Apache Hadoop/HBase HBase – Data Model       A sparse, multi-dimensional, sorted map             {rowkey, column, timestam...
Apache Hadoop/HBase HBase – API‘s       Native Java       REST       Thrift       Avro       Ruby shell       Apache MapRe...
Apache Hadoop/HBase HBase – Users       Facebook       Twitter       Mozilla       Trend Micro       Yahoo       Adobe    ...
Apache Hadoop/HBase HBase vs. RDBMS       HBase is not a drop-in replacement for a RDBMS         No data types just byte a...
Agenda       Big Data Trend       Scalability Challenges       Apache Hadoop/HBase       Firebird meets NoSQL – Case study...
Firebird meets NoSQL – Case study High-Level Requirements       Condition Monitoring System in the domain of solar       c...
Firebird meets NoSQL – Case study Prototypical implementation       Initial problem: You can‘t manage this data volume wit...
Firebird meets NoSQL – Case study Prototypical implementation       HBase data model based on the web application object  ...
Firebird meets NoSQL – Case study Prototypical implementation       MapReduce job implementation for data aggregation and ...
MapReduce Demonstration                                 Live Demonstration© Software Competence Center Hagenberg GmbH     ...
Agenda       Big Data Trend       Scalability Challenges       Apache Hadoop/HBase       Firebird meets NoSQL – Case study...
Q&A                           Questions and Answers© Software Competence Center Hagenberg GmbH        30
ATTENTION – Think                            small© Software Competence Center Hagenberg GmbH           31
The End                         Thanks for your attention!                                      thomas.steinmaurer@scch.at...
Resources       Apache Hadoop: http://hadoop.apache.org/       Apache HBase: http://hbase.apache.org/       MapReduce: htt...
Upcoming SlideShare
Loading in...5
×

Firebird meets NoSQL

2,665

Published on

Different application domains including sensor networks, social networks, science, financial services, condition monitoring systems demand the storage of a vast amount of data in the petabytes area. Prominent candidates are Google, Facebook, Yahoo!, Amazon just to name a few.

This data volume can't be tackled with convential relational database technologies anymore, either from a technical or licensing point of view or both. It demands a scale-out environment, which allows reliable, scalable and distributed processing. This trend in Big Data management is more and more approached with NoSQL solutions like Apache HBase on top of Apache Hadoop.

This session discusses big data management and their scalability challenges in general with a short introduction into Apache Hadoop/HBase and a case study on the co-existence of Apache Hadoop/HBase with Firebird in a sensor data aquisition system.

Published in: Technology
0 Comments
4 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,665
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
39
Comments
0
Likes
4
Embeds 0
No embeds

No notes for slide

Firebird meets NoSQL

  1. 1. Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 – Luxembourg 25.11.2011 – 26.11.2011 Thomas Steinmaurer DI Michael Zwick DI (FH) +43 7236 3343 896 +43 7236 3343 843 thomas.steinmaurer@scch.at michael.zwick@scch.at www.scch.at www.scch.atThe SCCH is an initiative of The SCCH is located at
  2. 2. ATTENTION – Think BIG Source: http://www.telovation.com/articles/gallery-old-computers.html© Software Competence Center Hagenberg GmbH 2
  3. 3. ATTENTION – Think BIGGER Source: Google pictures search engine© Software Competence Center Hagenberg GmbH 3
  4. 4. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A© Software Competence Center Hagenberg GmbH 4
  5. 5. Big Data Trend Data volume grows dramatically 40% up to 60% per year is common What is big? Gigabyte, Terabyte, Petabyte ...? It depends on your business and the technology you use to manage this data Google, Facebook, Yahoo etc. are all managing petabytes of data Most of the data is unstructured or semistructured, thus (far) away from our perfect relational/normalized world© Software Competence Center Hagenberg GmbH 5
  6. 6. Big Data Trend The Three Vs VOLUME Big Data is not just about data volume • Terabytes • Records • Transactions • Tables, files • Batch • Structured • Near time • Unstructured • Real time • Semistructured • Streams • All the above VELOCITY VARIETY Source: Big Data Analytics – TDWI Best Practices Report© Software Competence Center Hagenberg GmbH 6
  7. 7. Big Data Trend Typical application domains Sensor networks Social networks Search engines Health care (medical images and patient records) Science Bioinformatics Financial services Condition Monitoring systems© Software Competence Center Hagenberg GmbH 7
  8. 8. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study© Software Competence Center Hagenberg GmbH 8
  9. 9. Scalability Challenges RDBMS scalability issues How do you tackle scalability issues in your RDBMS environment? Scale-Up database server Faster/more CPU, more RAM, Faster I/O Create indices, optimize statements, partition data Reduce network traffic Pre-aggregate data Denormalize data to avoid complex join statements Replication for distributed workload Problem: This usually fails in the Big Data area due to technical or licensing issues Solution: Big Data Management demands Scale-Out© Software Competence Center Hagenberg GmbH 9
  10. 10. Scalability Challenges A True Story (not happened to me) A MySQL (doesn‘t matter) environment Started with a regular setup, a powerful database server Business grow, data volume increased dramatically The project team began to Partition data on several physical disks Replicate data to several machines to distribute workload Denormalize data to avoid complex joins Removed transaction support (ISAM storage engine) At the end, they gave up: More or less a few (or even one) denormalized table Data spread across several machines and physical disks Expensive and hard to administrate Now they are using a NoSQL solution© Software Competence Center Hagenberg GmbH 10
  11. 11. Scalability Challenges How can NoSQL help NoSQL – Not Only SQL NoSQL implementations target different user bases Document databases Key/Value databases Graph databases NoSQL can Scale-Out easily In the previous true story, they switched to a distributed key/value store implementation called HBase (NoSQL database) on top of Apache Hadoop© Software Competence Center Hagenberg GmbH 11
  12. 12. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A© Software Competence Center Hagenberg GmbH 12
  13. 13. Apache Hadoop/HBase The Hadoop Universe Apache Hadoop is an open source software for reliable, scalable and distributed computing Consists of Hadoop Common: Common utilities to support other Hadoop sub-projects Hadoop Distributed Filesystem (HDFS): Distributed file system Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters Can run on commodity hardware Widely used Amazon/A9, Facebook, Google, IBM, Joost, Last.fm, New York, Times, PowerSet, Veoh, Yahoo! ...© Software Competence Center Hagenberg GmbH 13
  14. 14. Apache Hadoop/HBase The Hadoop Universe Other Hadoop-related Apache projects Avro™: A data serialization system. Cassandra™: A scalable multi-master database with no single points of failure. Chukwa™: A data collection system for managing large distributed systems. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout™: A Scalable machine learning and data mining library. Pig™: A high-level data-flow language and execution framework for parallel computation. ZooKeeper™: A high-performance coordination service for distributed applications.© Software Competence Center Hagenberg GmbH 14
  15. 15. Apache Hadoop/HBase MapReduce Framework Source: http://www.recessframework.org/page/map-reduce-anonymous-functions-lambdas-php© Software Competence Center Hagenberg GmbH 15
  16. 16. Apache Hadoop/HBase MapReduce Framework Source: http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html© Software Competence Center Hagenberg GmbH 16
  17. 17. Apache Hadoop/HBase HBase – Overview Open source Java implementation of Google‘s BigTable concept Non-relational, distributed database Column-oriented storage, optimized for sparse data Multi-dimensional High Availability High Performance Runs on top of Apache Hadoop Distributed Filesystem (HDFS) Goals Billions of rows * Million of columns * Thousands of Versions Petabytes across thousands of commodity servers© Software Competence Center Hagenberg GmbH 17
  18. 18. Apache Hadoop/HBase HBase – Architecture Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html© Software Competence Center Hagenberg GmbH 18
  19. 19. Apache Hadoop/HBase HBase – Data Model A sparse, multi-dimensional, sorted map {rowkey, column, timestamp} -> cell Column = <column_family>:<qualifier> Rows are sorted lexicographically based on the rowkey© Software Competence Center Hagenberg GmbH 19
  20. 20. Apache Hadoop/HBase HBase – API‘s Native Java REST Thrift Avro Ruby shell Apache MapReduce Hive ...© Software Competence Center Hagenberg GmbH 20
  21. 21. Apache Hadoop/HBase HBase – Users Facebook Twitter Mozilla Trend Micro Yahoo Adobe ...© Software Competence Center Hagenberg GmbH 21
  22. 22. Apache Hadoop/HBase HBase vs. RDBMS HBase is not a drop-in replacement for a RDBMS No data types just byte arrays, interpretation happens at the client No query engine No joins No transactions Secondary indexes problematic Denormalized data HBase is bullet-proof to Store key/value pairs in a distributed environment Scale horizontally by adding more nodes to the cluster Offer a flexible schema Efficiently store sparse tables (NULL doesn‘t need any space) Support semi-structured and structered data© Software Competence Center Hagenberg GmbH 22
  23. 23. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A© Software Competence Center Hagenberg GmbH 23
  24. 24. Firebird meets NoSQL – Case study High-Level Requirements Condition Monitoring System in the domain of solar collectors and inverters Collecting measurement values (current, voltage, temperature ...) Expected data volume ~ 1300 billions measurement values (rows) per year Long-term storage solution necessary Web-based customer portal accesing aggregated and detailed data Backend data analysis, data mining, machine learning, fault detection© Software Competence Center Hagenberg GmbH 24
  25. 25. Firebird meets NoSQL – Case study Prototypical implementation Initial problem: You can‘t manage this data volume with a RDBMS Measurement values are a good example for key/value pairs Evaluated Apache Hadoop/HBase for storing detailed measurement values Virtualized Hadoop/HBase test cluster with 12 nodes RDBMS (Firebird/Oracle) for storing aggregated data© Software Competence Center Hagenberg GmbH 25
  26. 26. Firebird meets NoSQL – Case study Prototypical implementation HBase data model based on the web application object model Row-Key: <DataLogger-ID>-<Device-ID>-<Timestamp> 0000000000000050-0000000000000001-20110815134520 Column Families and Qualifiers based on classes and attributes in the object model Test data generator (TDG) and scanner (TDS) implementing the HBase data model for performance tests TDG: ~70000 rows per second TDS: <150ms response time with 400 simulated clients querying detail values for a particular DataLogger-ID/Device- ID for a given day for a HBase table with ~5 billions rows© Software Competence Center Hagenberg GmbH 26
  27. 27. Firebird meets NoSQL – Case study Prototypical implementation MapReduce job implementation for data aggregation and storage Throughput: 600000 rows per second Pre-aggregation of ~15 billions rows in ~7h, including storage in RDBMS Prototypical implementation based on a fraction of the expected data volume Simply add more nodes to the cluster to handle more data© Software Competence Center Hagenberg GmbH 27
  28. 28. MapReduce Demonstration Live Demonstration© Software Competence Center Hagenberg GmbH 28
  29. 29. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A© Software Competence Center Hagenberg GmbH 29
  30. 30. Q&A Questions and Answers© Software Competence Center Hagenberg GmbH 30
  31. 31. ATTENTION – Think small© Software Competence Center Hagenberg GmbH 31
  32. 32. The End Thanks for your attention! thomas.steinmaurer@scch.at© Software Competence Center Hagenberg GmbH 32
  33. 33. Resources Apache Hadoop: http://hadoop.apache.org/ Apache HBase: http://hbase.apache.org/ MapReduce: http://www.larsgeorge.com/2009/05/hbase-mapreduce- 101-part-i.html Hadoop/Hbase powered by: http://wiki.apache.org/hadoop/PoweredBy http://wiki.apache.org/hadoop/Hbase/PoweredBy HBase goals: http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS HBase architecture: http://www.larsgeorge.com/2009/10/hbase-architecture-101- storage.html Cloudera Distribution: http://www.cloudera.com/© Software Competence Center Hagenberg GmbH 33
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×