HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil

Cloudera, Inc.
Developing Real Time Analytics
Applications Using HBase in the Cloud

                        May 22, 2012
                         Rick Tucker
                      tech@sproxil.com




   tech@sproxil.com        May 22,2012   © 2012 Sproxil, Inc.
About Sproxil
• Brand protection, specializing in
  anti-counterfeiting solutions

• Solution requires a scalable and
  high-throughput text message processing engine

• Supports a real-time analytics web interface

[Figure: product verification in three steps: 1 SCRATCH, 2 TEXT, 3 VERIFY]



Why HBase?

[Diagram: USER SENDS TEXT MESSAGE -> TEXT MESSAGE IS PROCESSED (Amazon EC2 Cloud) -> CALCULATE ANALYTICS; USER RECEIVES REPLY]




Real-Time Analytics Engine
 • MapReduce too slow to maintain data in true real time

 • As data arrives, analytical data is updated through
   counters

Text Message Arrives  ->  Message Analyzed  ->  Increment Counters

  Genuine Product Authentication  ->  +1 Increment Counter for Genuine Authentications
  Repeat Customer                 ->  +1 Increment Counter for Repeat Customers
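
The counter approach above can be sketched as follows. This is a minimal in-memory illustration in Python, not Sproxil's implementation: the `Message` fields, the counter names, and the `AnalyticsCounters` class are all assumptions made for the example. In HBase itself each `+= 1` would presumably be an atomic counter increment on a cell (the Java client exposes `incrementColumnValue` for this), which is what keeps the analytics current without a batch MapReduce job.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    sender: str    # hypothetical sender identifier
    genuine: bool  # True if the product code verified as genuine


class AnalyticsCounters:
    """In-memory stand-in for a set of HBase counter cells (illustrative only)."""

    def __init__(self):
        self.counters = Counter()
        self.seen_senders = set()

    def record(self, msg: Message) -> None:
        # Every analyzed message updates the relevant counters immediately.
        self.counters["messages"] += 1
        if msg.genuine:
            self.counters["genuine_authentications"] += 1
        # A sender seen before counts as a repeat customer for this message.
        if msg.sender in self.seen_senders:
            self.counters["repeat_customers"] += 1
        self.seen_senders.add(msg.sender)


stats = AnalyticsCounters()
for m in [Message("user-1", True), Message("user-2", False), Message("user-1", True)]:
    stats.record(m)

print(dict(stats.counters))
```

Because the totals are maintained at write time, the web interface only has to read a handful of counter values instead of aggregating the message log on every request.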


Schema Design: Example 1

• Example: View log of text messages in
  chronological order
        • Rowkey: row prefix + timestamp

      Row
      transaction 2012-05-22 12:00:00
      transaction 2012-05-22 12:01:14
      transaction 2012-05-22 12:02:03

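Note: HBase sorts rowkeys lexicographically, so a scan over these keys returns
rows in chronological (oldest-first) order. Returning rows in reverse
chronological order requires a key that sorts newest first, such as a
reversed timestamp.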
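
A small sketch of how the key layout determines scan order. It is written in Python and uses `sorted()` on byte strings as a stand-in for HBase's bytewise rowkey ordering; the helper names and the `MAX_MILLIS` ceiling are assumptions for illustration and do not come from the slides.

```python
from datetime import datetime, timezone

# Hypothetical ceiling used to invert a millisecond timestamp (13 digits).
MAX_MILLIS = 10**13 - 1


def chronological_key(prefix: str, ts: datetime) -> bytes:
    # Fixed-width timestamp text, so lexicographic order equals time order.
    return f"{prefix} {ts:%Y-%m-%d %H:%M:%S}".encode()


def reverse_key(prefix: str, ts: datetime) -> bytes:
    # Subtracting from a constant makes newer timestamps sort first.
    millis = int(ts.timestamp() * 1000)
    return f"{prefix} {MAX_MILLIS - millis:013d}".encode()


times = [
    datetime(2012, 5, 22, 12, 1, 14, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 2, 3, tzinfo=timezone.utc),
]

# sorted() on bytes mimics the order in which a scan would return rows.
oldest_first = sorted(chronological_key("transaction", t) for t in times)
newest_first_times = [
    t for _, t in sorted((reverse_key("transaction", t), t) for t in times)
]

for key in oldest_first:
    print(key.decode())
print([f"{t:%H:%M:%S}" for t in newest_first_times])
```

The fixed-width formatting matters in both helpers: without zero padding, keys of different lengths would no longer sort in time order.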
Schema Design: Example 2

        • Rowkey: row prefix + user ID + timestamp


    Row
    transaction userID 1 2012-05-22 12:00:00
    transaction userID 1 2012-05-22 12:01:14
    transaction userID 2 2012-05-22 12:00:54
    transaction userID 2 2012-05-22 12:01:22
    transaction userID 2 2012-05-22 12:02:01
Note: HBase sorts rows lexicographically, so a scan returns each user's rows
together, in chronological (oldest-first) order within that user
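
With the user ID in the key, one user's messages form a contiguous range that can be read with a start row and a stop row. The sketch below simulates that in Python with a sorted list and `bisect`; the function names are hypothetical, and the real table would be scanned through the HBase client rather than a list.

```python
import bisect


def make_key(prefix: str, user_id: str, timestamp: str) -> bytes:
    return f"{prefix} {user_id} {timestamp}".encode()


def prefix_stop(prefix: bytes) -> bytes:
    # Smallest key that sorts after every key starting with `prefix`
    # (assumes the last byte of the prefix is below 0xFF).
    return prefix[:-1] + bytes([prefix[-1] + 1])


def scan(sorted_keys, start_row: bytes, stop_row: bytes):
    """Return keys in [start_row, stop_row), like a start/stop-row scan."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


rows = sorted([
    make_key("transaction", "userID 1", "2012-05-22 12:00:00"),
    make_key("transaction", "userID 1", "2012-05-22 12:01:14"),
    make_key("transaction", "userID 2", "2012-05-22 12:00:54"),
    make_key("transaction", "userID 2", "2012-05-22 12:01:22"),
    make_key("transaction", "userID 2", "2012-05-22 12:02:01"),
])

# All messages for user 2. The trailing space keeps "userID 2" from
# also matching a longer ID such as "userID 20".
user2_prefix = b"transaction userID 2 "
user2_rows = scan(rows, user2_prefix, prefix_stop(user2_prefix))

# Messages for user 2 within a one-minute window.
window_rows = scan(
    rows,
    make_key("transaction", "userID 2", "2012-05-22 12:01:00"),
    make_key("transaction", "userID 2", "2012-05-22 12:02:00"),
)

print(len(user2_rows), [k.decode() for k in window_rows])
```

Putting the user ID ahead of the timestamp is a trade-off: per-user reads become a single range, while a log across all users is no longer in one global time order.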

Critical Findings
• Schema design is crucial for successful HBase
  implementation
  – Pack as much info as possible into row keys


• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
     • Directly query for data you need
     • Use efficient filters when filtering large data sets
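
One way to see why a regex filter can be costly: a filter is evaluated against every row in the scanned range, whereas a query that names its key range only touches the rows it returns. The Python sketch below counts rows examined under each approach. The table contents, key format, and function names are made up for the illustration, and the sorted list only approximates how a region serves a scan.

```python
import bisect
import re


def build_table(users: int, per_user: int):
    # Hypothetical rowkeys: "transaction userID NNNN <timestamp>".
    keys = [
        f"transaction userID {u:04d} 2012-05-22 12:{m:02d}:00".encode()
        for u in range(users)
        for m in range(per_user)
    ]
    return sorted(keys)


def regex_scan(table, pattern: bytes):
    """Full scan with a regex test on every rowkey."""
    rx = re.compile(pattern)
    examined, hits = 0, []
    for key in table:
        examined += 1
        if rx.search(key):
            hits.append(key)
    return hits, examined


def prefix_scan(table, prefix: bytes):
    """Range scan bounded by the key prefix; only matching rows are read."""
    stop = prefix[:-1] + bytes([prefix[-1] + 1])
    lo = bisect.bisect_left(table, prefix)
    hi = bisect.bisect_left(table, stop)
    return table[lo:hi], hi - lo


table = build_table(users=1000, per_user=10)
regex_hits, regex_examined = regex_scan(table, rb"^transaction userID 0042 ")
prefix_hits, prefix_examined = prefix_scan(table, b"transaction userID 0042 ")

print(regex_examined, prefix_examined, regex_hits == prefix_hits)
```

Both approaches return the same rows, but the regex version reads the whole table to find them. This is also why packing query-relevant fields into the rowkey pays off: it turns a filter into a range.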




Thank You!

  Making Counterfeiting Unprofitable™

  Your global brand protection specialists, spanning 3 continents and
  speaking 9 languages

  tech@sproxil.com
  +1 617 682 9577

  America | Asia | Africa     Sproxil.com




Editor's Notes

  1. Processed a large volume of text messages; this has even led to the arrest of counterfeiters
  2. High-speed transactional operations critical; handle large volumes of text messages quickly; large volume of data; millions of records; schema supports sparse data
  3. Explain why regex is costly