SlideShare a Scribd company logo
1 of 33
Download to read offline
Firebird meets NoSQL (Apache HBase)
     Case Study
     Firebird Conference 2011 – Luxembourg
     25.11.2011 – 26.11.2011
     Thomas Steinmaurer DI        Michael Zwick DI (FH)

     +43 7236 3343 896            +43 7236 3343 843
     thomas.steinmaurer@scch.at   michael.zwick@scch.at
     www.scch.at                  www.scch.at




The SCCH is an initiative of                              The SCCH is located at
ATTENTION – Think BIG




 Source: http://www.telovation.com/articles/gallery-old-computers.html

© Software Competence Center Hagenberg GmbH                              2
ATTENTION – Think BIGGER




 Source: Google pictures search engine

© Software Competence Center Hagenberg GmbH   3
Agenda


       Big Data Trend
       Scalability Challenges
       Apache Hadoop/HBase
       Firebird meets NoSQL – Case study
       Q&A




© Software Competence Center Hagenberg GmbH   4
Big Data Trend


       Data volume grows dramatically
             40% up to 60% per year is common
       What is big?
             Gigabyte, Terabyte, Petabyte ...?
             It depends on your business and the technology you use to
             manage this data
       Google, Facebook, Yahoo etc. are all managing petabytes
       of data
       Most of the data is unstructured or semistructured, thus
       (far) away from our perfect relational/normalized world



© Software Competence Center Hagenberg GmbH                              5
Big Data Trend
 The Three Vs

                                                                          VOLUME
        Big Data is not just about data
        volume
                                                                         • Terabytes
                                                                         • Records
                                                                         • Transactions
                                                                         • Tables, files




                                                           • Batch
                                                                                           • Structured
                                                           • Near time
                                                                                           • Unstructured
                                                           • Real time
                                                                                           • Semistructured
                                                           • Streams
                                                                                           • All the above
                                 VELOCITY                                                                     VARIETY
 Source: Big Data Analytics – TDWI Best Practices Report
© Software Competence Center Hagenberg GmbH                                                                             6
Big Data Trend
 Typical application domains

        Sensor networks
        Social networks
        Search engines
        Health care (medical images and patient records)
        Science
        Bioinformatics
        Financial services
        Condition Monitoring systems




© Software Competence Center Hagenberg GmbH                7
Agenda


       Big Data Trend
       Scalability Challenges
       Apache Hadoop/HBase
       Firebird meets NoSQL – Case study




© Software Competence Center Hagenberg GmbH   8
Scalability Challenges
 RDBMS scalability issues

       How do you tackle scalability issues in your RDBMS
       environment?
             Scale-Up database server
                  Faster/more CPU, more RAM, Faster I/O
             Create indices, optimize statements, partition data
             Reduce network traffic
             Pre-aggregate data
             Denormalize data to avoid complex join statements
             Replication for distributed workload
       Problem: This usually fails in the Big Data area due to
       technical or licensing issues
       Solution: Big Data Management demands Scale-Out

© Software Competence Center Hagenberg GmbH                        9
Scalability Challenges
 A True Story (not happened to me)

       A MySQL (doesn‘t matter) environment
       Started with a regular setup, a powerful database server
       Business grow, data volume increased dramatically
       The project team began to
             Partition data on several physical disks
             Replicate data to several machines to distribute workload
             Denormalize data to avoid complex joins
       Removed transaction support (ISAM storage engine)
       At the end, they gave up:
             More or less a few (or even one) denormalized table
             Data spread across several machines and physical disks
             Expensive and hard to administrate
       Now they are using a NoSQL solution
© Software Competence Center Hagenberg GmbH                              10
Scalability Challenges
 How can NoSQL help

       NoSQL – Not Only SQL
       NoSQL implementations target different user bases
             Document databases
             Key/Value databases
             Graph databases
       NoSQL can Scale-Out easily
       In the previous true story, they switched to a distributed
       key/value store implementation called HBase (NoSQL
       database) on top of Apache Hadoop




© Software Competence Center Hagenberg GmbH                         11
Agenda


       Big Data Trend
       Scalability Challenges
       Apache Hadoop/HBase
       Firebird meets NoSQL – Case study
       Q&A




© Software Competence Center Hagenberg GmbH   12
Apache Hadoop/HBase
 The Hadoop Universe

       Apache Hadoop is an open source software for reliable,
       scalable and distributed computing
       Consists of
             Hadoop Common: Common utilities to support other Hadoop
             sub-projects
             Hadoop Distributed Filesystem (HDFS): Distributed file
             system
             Hadoop MapReduce: A software framework for distributed
             processing of large data sets on compute clusters
       Can run on commodity hardware
       Widely used
             Amazon/A9, Facebook, Google, IBM, Joost, Last.fm, New
             York, Times, PowerSet, Veoh, Yahoo! ...
© Software Competence Center Hagenberg GmbH                            13
Apache Hadoop/HBase
 The Hadoop Universe

       Other Hadoop-related Apache projects
          Avro™: A data serialization system.
          Cassandra™: A scalable multi-master database with no single
          points of failure.
          Chukwa™: A data collection system for managing large distributed
          systems.
          HBase™: A scalable, distributed database that supports
          structured data storage for large tables.
          Hive™: A data warehouse infrastructure that provides data
          summarization and ad hoc querying.
          Mahout™: A Scalable machine learning and data mining library.
          Pig™: A high-level data-flow language and execution framework for
          parallel computation.
          ZooKeeper™: A high-performance coordination service for
          distributed applications.


© Software Competence Center Hagenberg GmbH                                   14
Apache Hadoop/HBase
 MapReduce Framework




 Source: http://www.recessframework.org/page/map-reduce-anonymous-functions-lambdas-php
© Software Competence Center Hagenberg GmbH                                               15
Apache Hadoop/HBase
 MapReduce Framework




 Source: http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html
© Software Competence Center Hagenberg GmbH                                  16
Apache Hadoop/HBase
 HBase – Overview

       Open source Java implementation of Google‘s BigTable
       concept
       Non-relational, distributed database
       Column-oriented storage, optimized for sparse data
       Multi-dimensional
       High Availability
       High Performance
       Runs on top of Apache Hadoop Distributed Filesystem
       (HDFS)
       Goals
             Billions of rows * Million of columns * Thousands of Versions
             Petabytes across thousands of commodity servers
© Software Competence Center Hagenberg GmbH                                  17
Apache Hadoop/HBase
 HBase – Architecture




 Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
© Software Competence Center Hagenberg GmbH                                      18
Apache Hadoop/HBase
 HBase – Data Model

       A sparse, multi-dimensional, sorted map
             {rowkey, column, timestamp} -> cell
       Column = <column_family>:<qualifier>
       Rows are sorted lexicographically based on the rowkey




© Software Competence Center Hagenberg GmbH                    19
Apache Hadoop/HBase
 HBase – API‘s

       Native Java
       REST
       Thrift
       Avro
       Ruby shell
       Apache MapReduce
       Hive
       ...




© Software Competence Center Hagenberg GmbH   20
Apache Hadoop/HBase
 HBase – Users

       Facebook
       Twitter
       Mozilla
       Trend Micro
       Yahoo
       Adobe
       ...




© Software Competence Center Hagenberg GmbH   21
Apache Hadoop/HBase
 HBase vs. RDBMS

       HBase is not a drop-in replacement for a RDBMS
         No data types just byte arrays, interpretation happens at the client
         No query engine
         No joins
         No transactions
         Secondary indexes problematic
         Denormalized data
       HBase is bullet-proof to
         Store key/value pairs in a distributed environment
         Scale horizontally by adding more nodes to the cluster
         Offer a flexible schema
         Efficiently store sparse tables (NULL doesn‘t need any space)
         Support semi-structured and structered data

© Software Competence Center Hagenberg GmbH                                     22
Agenda


       Big Data Trend
       Scalability Challenges
       Apache Hadoop/HBase
       Firebird meets NoSQL – Case study
       Q&A




© Software Competence Center Hagenberg GmbH   23
Firebird meets NoSQL – Case study
 High-Level Requirements

       Condition Monitoring System in the domain of solar
       collectors and inverters
       Collecting measurement values (current, voltage,
       temperature ...)
       Expected data volume
             ~ 1300 billions measurement values (rows) per year
       Long-term storage solution necessary
       Web-based customer portal accesing aggregated and
       detailed data
       Backend data analysis, data mining, machine learning,
       fault detection


© Software Competence Center Hagenberg GmbH                       24
Firebird meets NoSQL – Case study
 Prototypical implementation

       Initial problem: You can‘t manage this data volume with a
       RDBMS
       Measurement values are a good example for key/value
       pairs
       Evaluated Apache Hadoop/HBase for storing detailed
       measurement values
       Virtualized Hadoop/HBase test cluster with 12 nodes
       RDBMS (Firebird/Oracle) for storing aggregated data




© Software Competence Center Hagenberg GmbH                        25
Firebird meets NoSQL – Case study
 Prototypical implementation

       HBase data model based on the web application object
       model
             Row-Key: <DataLogger-ID>-<Device-ID>-<Timestamp>
             0000000000000050-0000000000000001-20110815134520
             Column Families and Qualifiers based on classes and
             attributes in the object model
       Test data generator (TDG) and scanner (TDS)
       implementing the HBase data model for performance tests
             TDG: ~70000 rows per second
             TDS: <150ms response time with 400 simulated clients
             querying detail values for a particular DataLogger-ID/Device-
             ID for a given day for a HBase table with ~5 billions rows


© Software Competence Center Hagenberg GmbH                                  26
Firebird meets NoSQL – Case study
 Prototypical implementation

       MapReduce job implementation for data aggregation and
       storage
             Throughput: 600000 rows per second
             Pre-aggregation of ~15 billions rows in ~7h, including storage
             in RDBMS
       Prototypical implementation based on a fraction of the
       expected data volume
             Simply add more nodes to the cluster to handle more data




© Software Competence Center Hagenberg GmbH                                   27
MapReduce Demonstration




                                 Live Demonstration




© Software Competence Center Hagenberg GmbH           28
Agenda


       Big Data Trend
       Scalability Challenges
       Apache Hadoop/HBase
       Firebird meets NoSQL – Case study
       Q&A




© Software Competence Center Hagenberg GmbH   29
Q&A




                           Questions and Answers




© Software Competence Center Hagenberg GmbH        30
ATTENTION – Think                            small




© Software Competence Center Hagenberg GmbH           31
The End




                         Thanks for your attention!

                                      thomas.steinmaurer@scch.at




© Software Competence Center Hagenberg GmbH                        32
Resources


       Apache Hadoop: http://hadoop.apache.org/
       Apache HBase: http://hbase.apache.org/
       MapReduce: http://www.larsgeorge.com/2009/05/hbase-mapreduce-
       101-part-i.html
       Hadoop/Hbase powered by:
       http://wiki.apache.org/hadoop/PoweredBy
       http://wiki.apache.org/hadoop/Hbase/PoweredBy
       HBase goals:
       http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS
       HBase architecture:
       http://www.larsgeorge.com/2009/10/hbase-architecture-101-
       storage.html
       Cloudera Distribution: http://www.cloudera.com/



© Software Competence Center Hagenberg GmbH                            33

More Related Content

What's hot

Intro to drupal
Intro to drupalIntro to drupal
Intro to drupalhernanibf
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmurTobias Koprowski
 
Which Postgres is Right for You? - Part 2
Which Postgres is Right for You? - Part 2Which Postgres is Right for You? - Part 2
Which Postgres is Right for You? - Part 2EDB
 
My site is slow
My site is slowMy site is slow
My site is slowhernanibf
 
Infinspan: In-memory data grid meets NoSQL
Infinspan: In-memory data grid meets NoSQLInfinspan: In-memory data grid meets NoSQL
Infinspan: In-memory data grid meets NoSQLManik Surtani
 
Deploying Web Applications with WildFly 8
Deploying Web Applications with WildFly 8Deploying Web Applications with WildFly 8
Deploying Web Applications with WildFly 8Arun Gupta
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and DeltaDatabricks
 
Making the Most of Your Postgres Rollout
Making the Most of Your Postgres RolloutMaking the Most of Your Postgres Rollout
Making the Most of Your Postgres RolloutEDB
 
Deployer - Deployment tool for PHP
Deployer - Deployment tool for PHPDeployer - Deployment tool for PHP
Deployer - Deployment tool for PHPhernanibf
 
Which postgres is_right_for_me_20130517
Which postgres is_right_for_me_20130517Which postgres is_right_for_me_20130517
Which postgres is_right_for_me_20130517EDB
 
Phase2 Large Drupal Multisites (gta case study)
Phase2   Large Drupal Multisites (gta case study)Phase2   Large Drupal Multisites (gta case study)
Phase2 Large Drupal Multisites (gta case study)Phase2
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project ExperienceEDB
 
KoprowskiT_SPBizConference_2AMaDisasterJustBegan
KoprowskiT_SPBizConference_2AMaDisasterJustBeganKoprowskiT_SPBizConference_2AMaDisasterJustBegan
KoprowskiT_SPBizConference_2AMaDisasterJustBeganTobias Koprowski
 
Postgres Deployment Automation with Terraform and Ansible
 Postgres Deployment Automation with Terraform and Ansible Postgres Deployment Automation with Terraform and Ansible
Postgres Deployment Automation with Terraform and AnsibleEDB
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013EDB
 
Minimize Headaches with Your Postgres Deployment
Minimize Headaches with Your Postgres DeploymentMinimize Headaches with Your Postgres Deployment
Minimize Headaches with Your Postgres DeploymentEDB
 
Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Ajay Kumar Uppal
 

What's hot (20)

Intro to drupal
Intro to drupalIntro to drupal
Intro to drupal
 
SQL Azure for ITPros
SQL Azure for ITProsSQL Azure for ITPros
SQL Azure for ITPros
 
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
44spotkaniePLSSUGWRO_CoNowegowKrainieChmur
 
Which Postgres is Right for You? - Part 2
Which Postgres is Right for You? - Part 2Which Postgres is Right for You? - Part 2
Which Postgres is Right for You? - Part 2
 
My site is slow
My site is slowMy site is slow
My site is slow
 
Infinspan: In-memory data grid meets NoSQL
Infinspan: In-memory data grid meets NoSQLInfinspan: In-memory data grid meets NoSQL
Infinspan: In-memory data grid meets NoSQL
 
Deploying Web Applications with WildFly 8
Deploying Web Applications with WildFly 8Deploying Web Applications with WildFly 8
Deploying Web Applications with WildFly 8
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
 
Making the Most of Your Postgres Rollout
Making the Most of Your Postgres RolloutMaking the Most of Your Postgres Rollout
Making the Most of Your Postgres Rollout
 
Deployer - Deployment tool for PHP
Deployer - Deployment tool for PHPDeployer - Deployment tool for PHP
Deployer - Deployment tool for PHP
 
Which postgres is_right_for_me_20130517
Which postgres is_right_for_me_20130517Which postgres is_right_for_me_20130517
Which postgres is_right_for_me_20130517
 
Phase2 Large Drupal Multisites (gta case study)
Phase2   Large Drupal Multisites (gta case study)Phase2   Large Drupal Multisites (gta case study)
Phase2 Large Drupal Multisites (gta case study)
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
 
KoprowskiT_SPBizConference_2AMaDisasterJustBegan
KoprowskiT_SPBizConference_2AMaDisasterJustBeganKoprowskiT_SPBizConference_2AMaDisasterJustBegan
KoprowskiT_SPBizConference_2AMaDisasterJustBegan
 
Postgres Deployment Automation with Terraform and Ansible
 Postgres Deployment Automation with Terraform and Ansible Postgres Deployment Automation with Terraform and Ansible
Postgres Deployment Automation with Terraform and Ansible
 
Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013Avoiding.the.pitfallsof.oracle.migration.2013
Avoiding.the.pitfallsof.oracle.migration.2013
 
Minimize Headaches with Your Postgres Deployment
Minimize Headaches with Your Postgres DeploymentMinimize Headaches with Your Postgres Deployment
Minimize Headaches with Your Postgres Deployment
 
Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2Trusted advisory on technology comparison --exadata, hana, db2
Trusted advisory on technology comparison --exadata, hana, db2
 

Similar to Firebird meets NoSQL

Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Hortonworks
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & HadoopBlackvard
 
Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Jean-Pierre König
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSatish Mohan
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetupRoby Chen
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxDr.Florence Dayana
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud ComputingFarzad Nozarian
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction葵慶 李
 

Similar to Firebird meets NoSQL (20)

Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 
Introduction To Big Data & Hadoop
Introduction To Big Data & HadoopIntroduction To Big Data & Hadoop
Introduction To Big Data & Hadoop
 
Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013Semantic web meetup 14.november 2013
Semantic web meetup 14.november 2013
 
Hadoop programming
Hadoop programmingHadoop programming
Hadoop programming
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Big Data and Cloud Computing
Big Data and Cloud ComputingBig Data and Cloud Computing
Big Data and Cloud Computing
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 

More from Mind The Firebird

Tips for using Firebird system tables
Tips for using Firebird system tablesTips for using Firebird system tables
Tips for using Firebird system tablesMind The Firebird
 
Using Azure cloud and Firebird to develop applications easily
Using Azure cloud and Firebird to develop applications easilyUsing Azure cloud and Firebird to develop applications easily
Using Azure cloud and Firebird to develop applications easilyMind The Firebird
 
A year in the life of Firebird .Net provider
A year in the life of Firebird .Net providerA year in the life of Firebird .Net provider
A year in the life of Firebird .Net providerMind The Firebird
 
How Firebird transactions work
How Firebird transactions workHow Firebird transactions work
How Firebird transactions workMind The Firebird
 
Using ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceUsing ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceMind The Firebird
 
Creating logs for data auditing in FirebirdSQL
Creating logs for data auditing in FirebirdSQLCreating logs for data auditing in FirebirdSQL
Creating logs for data auditing in FirebirdSQLMind The Firebird
 
Firebird Performance counters in details
Firebird Performance counters in detailsFirebird Performance counters in details
Firebird Performance counters in detailsMind The Firebird
 
Understanding Numbers in Firebird SQL
Understanding Numbers in Firebird SQLUnderstanding Numbers in Firebird SQL
Understanding Numbers in Firebird SQLMind The Firebird
 
Threading through InterBase, Firebird, and beyond
Threading through InterBase, Firebird, and beyondThreading through InterBase, Firebird, and beyond
Threading through InterBase, Firebird, and beyondMind The Firebird
 
New SQL Features in Firebird 3, by Vlad Khorsun
New SQL Features in Firebird 3, by Vlad KhorsunNew SQL Features in Firebird 3, by Vlad Khorsun
New SQL Features in Firebird 3, by Vlad KhorsunMind The Firebird
 
Orphans, Corruption, Careful Write, and Logging
Orphans, Corruption, Careful Write, and LoggingOrphans, Corruption, Careful Write, and Logging
Orphans, Corruption, Careful Write, and LoggingMind The Firebird
 
Firebird release strategy and roadmap for 2015/2016
Firebird release strategy and roadmap for 2015/2016Firebird release strategy and roadmap for 2015/2016
Firebird release strategy and roadmap for 2015/2016Mind The Firebird
 
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...Mind The Firebird
 
Working with Large Firebird databases
Working with Large Firebird databasesWorking with Large Firebird databases
Working with Large Firebird databasesMind The Firebird
 
Stored procedures in Firebird
Stored procedures in FirebirdStored procedures in Firebird
Stored procedures in FirebirdMind The Firebird
 
Superchaging big production systems on Firebird: transactions, garbage, maint...
Superchaging big production systems on Firebird: transactions, garbage, maint...Superchaging big production systems on Firebird: transactions, garbage, maint...
Superchaging big production systems on Firebird: transactions, garbage, maint...Mind The Firebird
 

More from Mind The Firebird (20)

Tips for using Firebird system tables
Tips for using Firebird system tablesTips for using Firebird system tables
Tips for using Firebird system tables
 
Using Azure cloud and Firebird to develop applications easily
Using Azure cloud and Firebird to develop applications easilyUsing Azure cloud and Firebird to develop applications easily
Using Azure cloud and Firebird to develop applications easily
 
A year in the life of Firebird .Net provider
A year in the life of Firebird .Net providerA year in the life of Firebird .Net provider
A year in the life of Firebird .Net provider
 
How Firebird transactions work
How Firebird transactions workHow Firebird transactions work
How Firebird transactions work
 
SuperServer in Firebird 3
SuperServer in Firebird 3SuperServer in Firebird 3
SuperServer in Firebird 3
 
Copycat presentation
Copycat presentationCopycat presentation
Copycat presentation
 
Using ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceUsing ТРСС to study Firebird performance
Using ТРСС to study Firebird performance
 
Overview of RedDatabase 2.5
Overview of RedDatabase 2.5Overview of RedDatabase 2.5
Overview of RedDatabase 2.5
 
Creating logs for data auditing in FirebirdSQL
Creating logs for data auditing in FirebirdSQLCreating logs for data auditing in FirebirdSQL
Creating logs for data auditing in FirebirdSQL
 
Firebird Performance counters in details
Firebird Performance counters in detailsFirebird Performance counters in details
Firebird Performance counters in details
 
Understanding Numbers in Firebird SQL
Understanding Numbers in Firebird SQLUnderstanding Numbers in Firebird SQL
Understanding Numbers in Firebird SQL
 
Threading through InterBase, Firebird, and beyond
Threading through InterBase, Firebird, and beyondThreading through InterBase, Firebird, and beyond
Threading through InterBase, Firebird, and beyond
 
New SQL Features in Firebird 3, by Vlad Khorsun
New SQL Features in Firebird 3, by Vlad KhorsunNew SQL Features in Firebird 3, by Vlad Khorsun
New SQL Features in Firebird 3, by Vlad Khorsun
 
Orphans, Corruption, Careful Write, and Logging
Orphans, Corruption, Careful Write, and LoggingOrphans, Corruption, Careful Write, and Logging
Orphans, Corruption, Careful Write, and Logging
 
Firebird release strategy and roadmap for 2015/2016
Firebird release strategy and roadmap for 2015/2016Firebird release strategy and roadmap for 2015/2016
Firebird release strategy and roadmap for 2015/2016
 
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...
Nbackup and Backup: Internals, Usage strategy and Pitfalls, by Dmitry Kuzmenk...
 
Working with Large Firebird databases
Working with Large Firebird databasesWorking with Large Firebird databases
Working with Large Firebird databases
 
Stored procedures in Firebird
Stored procedures in FirebirdStored procedures in Firebird
Stored procedures in Firebird
 
Firebird on Linux
Firebird on LinuxFirebird on Linux
Firebird on Linux
 
Superchaging big production systems on Firebird: transactions, garbage, maint...
Superchaging big production systems on Firebird: transactions, garbage, maint...Superchaging big production systems on Firebird: transactions, garbage, maint...
Superchaging big production systems on Firebird: transactions, garbage, maint...
 

Recently uploaded

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Firebird meets NoSQL

  • 1. Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 – Luxembourg 25.11.2011 – 26.11.2011 Thomas Steinmaurer DI Michael Zwick DI (FH) +43 7236 3343 896 +43 7236 3343 843 thomas.steinmaurer@scch.at michael.zwick@scch.at www.scch.at www.scch.at The SCCH is an initiative of The SCCH is located at
  • 2. ATTENTION – Think BIG Source: http://www.telovation.com/articles/gallery-old-computers.html © Software Competence Center Hagenberg GmbH 2
  • 3. ATTENTION – Think BIGGER Source: Google pictures search engine © Software Competence Center Hagenberg GmbH 3
  • 4. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A © Software Competence Center Hagenberg GmbH 4
  • 5. Big Data Trend Data volume grows dramatically 40% up to 60% per year is common What is big? Gigabyte, Terabyte, Petabyte ...? It depends on your business and the technology you use to manage this data Google, Facebook, Yahoo etc. are all managing petabytes of data Most of the data is unstructured or semistructured, thus (far) away from our perfect relational/normalized world © Software Competence Center Hagenberg GmbH 5
  • 6. Big Data Trend The Three Vs VOLUME Big Data is not just about data volume • Terabytes • Records • Transactions • Tables, files • Batch • Structured • Near time • Unstructured • Real time • Semistructured • Streams • All the above VELOCITY VARIETY Source: Big Data Analytics – TDWI Best Practices Report © Software Competence Center Hagenberg GmbH 6
  • 7. Big Data Trend Typical application domains Sensor networks Social networks Search engines Health care (medical images and patient records) Science Bioinformatics Financial services Condition Monitoring systems © Software Competence Center Hagenberg GmbH 7
  • 8. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study © Software Competence Center Hagenberg GmbH 8
  • 9. Scalability Challenges RDBMS scalability issues How do you tackle scalability issues in your RDBMS environment? Scale-Up database server Faster/more CPU, more RAM, Faster I/O Create indices, optimize statements, partition data Reduce network traffic Pre-aggregate data Denormalize data to avoid complex join statements Replication for distributed workload Problem: This usually fails in the Big Data area due to technical or licensing issues Solution: Big Data Management demands Scale-Out © Software Competence Center Hagenberg GmbH 9
  • 10. Scalability Challenges A True Story (not happened to me) A MySQL (doesn‘t matter) environment Started with a regular setup, a powerful database server Business grow, data volume increased dramatically The project team began to Partition data on several physical disks Replicate data to several machines to distribute workload Denormalize data to avoid complex joins Removed transaction support (ISAM storage engine) At the end, they gave up: More or less a few (or even one) denormalized table Data spread across several machines and physical disks Expensive and hard to administrate Now they are using a NoSQL solution © Software Competence Center Hagenberg GmbH 10
  • 11. Scalability Challenges How can NoSQL help NoSQL – Not Only SQL NoSQL implementations target different user bases Document databases Key/Value databases Graph databases NoSQL can Scale-Out easily In the previous true story, they switched to a distributed key/value store implementation called HBase (NoSQL database) on top of Apache Hadoop © Software Competence Center Hagenberg GmbH 11
  • 12. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A © Software Competence Center Hagenberg GmbH 12
  • 13. Apache Hadoop/HBase The Hadoop Universe Apache Hadoop is an open source software for reliable, scalable and distributed computing Consists of Hadoop Common: Common utilities to support other Hadoop sub-projects Hadoop Distributed Filesystem (HDFS): Distributed file system Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters Can run on commodity hardware Widely used Amazon/A9, Facebook, Google, IBM, Joost, Last.fm, New York, Times, PowerSet, Veoh, Yahoo! ... © Software Competence Center Hagenberg GmbH 13
  • 14. Apache Hadoop/HBase The Hadoop Universe Other Hadoop-related Apache projects Avro™: A data serialization system. Cassandra™: A scalable multi-master database with no single points of failure. Chukwa™: A data collection system for managing large distributed systems. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout™: A Scalable machine learning and data mining library. Pig™: A high-level data-flow language and execution framework for parallel computation. ZooKeeper™: A high-performance coordination service for distributed applications. © Software Competence Center Hagenberg GmbH 14
  • 15. Apache Hadoop/HBase MapReduce Framework Source: http://www.recessframework.org/page/map-reduce-anonymous-functions-lambdas-php © Software Competence Center Hagenberg GmbH 15
  • 16. Apache Hadoop/HBase MapReduce Framework Source: http://www.larsgeorge.com/2009/05/hbase-mapreduce-101-part-i.html © Software Competence Center Hagenberg GmbH 16
  • 17. Apache Hadoop/HBase HBase – Overview Open source Java implementation of Google‘s BigTable concept Non-relational, distributed database Column-oriented storage, optimized for sparse data Multi-dimensional High Availability High Performance Runs on top of Apache Hadoop Distributed Filesystem (HDFS) Goals Billions of rows * Million of columns * Thousands of Versions Petabytes across thousands of commodity servers © Software Competence Center Hagenberg GmbH 17
  • 18. Apache Hadoop/HBase HBase – Architecture Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html © Software Competence Center Hagenberg GmbH 18
  • 19. Apache Hadoop/HBase HBase – Data Model A sparse, multi-dimensional, sorted map {rowkey, column, timestamp} -> cell Column = <column_family>:<qualifier> Rows are sorted lexicographically based on the rowkey © Software Competence Center Hagenberg GmbH 19
  • 20. Apache Hadoop/HBase HBase – API‘s Native Java REST Thrift Avro Ruby shell Apache MapReduce Hive ... © Software Competence Center Hagenberg GmbH 20
  • 21. Apache Hadoop/HBase HBase – Users Facebook Twitter Mozilla Trend Micro Yahoo Adobe ... © Software Competence Center Hagenberg GmbH 21
  • 22. Apache Hadoop/HBase HBase vs. RDBMS HBase is not a drop-in replacement for a RDBMS No data types just byte arrays, interpretation happens at the client No query engine No joins No transactions Secondary indexes problematic Denormalized data HBase is bullet-proof to Store key/value pairs in a distributed environment Scale horizontally by adding more nodes to the cluster Offer a flexible schema Efficiently store sparse tables (NULL doesn‘t need any space) Support semi-structured and structered data © Software Competence Center Hagenberg GmbH 22
  • 23. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A © Software Competence Center Hagenberg GmbH 23
  • 24. Firebird meets NoSQL – Case study High-Level Requirements Condition Monitoring System in the domain of solar collectors and inverters Collecting measurement values (current, voltage, temperature ...) Expected data volume ~ 1300 billions measurement values (rows) per year Long-term storage solution necessary Web-based customer portal accesing aggregated and detailed data Backend data analysis, data mining, machine learning, fault detection © Software Competence Center Hagenberg GmbH 24
  • 25. Firebird meets NoSQL – Case study Prototypical implementation Initial problem: You can‘t manage this data volume with a RDBMS Measurement values are a good example for key/value pairs Evaluated Apache Hadoop/HBase for storing detailed measurement values Virtualized Hadoop/HBase test cluster with 12 nodes RDBMS (Firebird/Oracle) for storing aggregated data © Software Competence Center Hagenberg GmbH 25
  • 26. Firebird meets NoSQL – Case study Prototypical implementation HBase data model based on the web application object model Row-Key: <DataLogger-ID>-<Device-ID>-<Timestamp> 0000000000000050-0000000000000001-20110815134520 Column Families and Qualifiers based on classes and attributes in the object model Test data generator (TDG) and scanner (TDS) implementing the HBase data model for performance tests TDG: ~70000 rows per second TDS: <150ms response time with 400 simulated clients querying detail values for a particular DataLogger-ID/Device- ID for a given day for a HBase table with ~5 billions rows © Software Competence Center Hagenberg GmbH 26
  • 27. Firebird meets NoSQL – Case study Prototypical implementation MapReduce job implementation for data aggregation and storage Throughput: 600000 rows per second Pre-aggregation of ~15 billions rows in ~7h, including storage in RDBMS Prototypical implementation based on a fraction of the expected data volume Simply add more nodes to the cluster to handle more data © Software Competence Center Hagenberg GmbH 27
  • 28. MapReduce Demonstration Live Demonstration © Software Competence Center Hagenberg GmbH 28
  • 29. Agenda Big Data Trend Scalability Challenges Apache Hadoop/HBase Firebird meets NoSQL – Case study Q&A © Software Competence Center Hagenberg GmbH 29
  • 30. Q&A Questions and Answers © Software Competence Center Hagenberg GmbH 30
  • 31. ATTENTION – Think small © Software Competence Center Hagenberg GmbH 31
  • 32. The End Thanks for your attention! thomas.steinmaurer@scch.at © Software Competence Center Hagenberg GmbH 32
  • 33. Resources Apache Hadoop: http://hadoop.apache.org/ Apache HBase: http://hbase.apache.org/ MapReduce: http://www.larsgeorge.com/2009/05/hbase-mapreduce- 101-part-i.html Hadoop/Hbase powered by: http://wiki.apache.org/hadoop/PoweredBy http://wiki.apache.org/hadoop/Hbase/PoweredBy HBase goals: http://www.docstoc.com/docs/2996433/Hadoop-and-HBase-vs-RDBMS HBase architecture: http://www.larsgeorge.com/2009/10/hbase-architecture-101- storage.html Cloudera Distribution: http://www.cloudera.com/ © Software Competence Center Hagenberg GmbH 33