Developing Real Time Analytics
Applications Using HBase in the Cloud

                        May 22, 2012
                         Rick Tucker
                      tech@sproxil.com




   tech@sproxil.com        May 22,2012   © 2012 Sproxil, Inc.
About Sproxil
• Brand protection, specializing in anti-counterfeiting solutions

• Solution requires a scalable and high-throughput text message processing engine

• Supports a real-time analytics web interface

[Figure: three-step product check: 1 SCRATCH, 2 TEXT, 3 VERIFY]



Why HBase?

[Diagram: message flow, hosted on the Amazon EC2 Cloud]
  USER SENDS TEXT MESSAGE
  TEXT MESSAGE IS PROCESSED
  CALCULATE ANALYTICS
  USER RECEIVES REPLY




Real-Time Analytics Engine
 • MapReduce too slow to maintain data in true real time

 • As data arrives, analytical data is updated through
   counters

Text Message Arrives -> Message Analyzed -> Increment Counters

  Message Analyzed                  Increment Counters
  Genuine Product Authentication    +1 Increment Counter for Genuine Authentications
  Repeat Customer                   +1 Increment Counter for Repeat Customers
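
The counter flow above can be sketched in a few lines. This is a minimal in-memory stand-in, not the HBase client API: the `CounterTable` class, the per-day row layout, and the column names are all assumptions made for illustration. The point is only that each arriving message bumps a handful of counters, so the analytics are current as soon as the message is processed and no batch job is needed.

```python
from collections import defaultdict


class CounterTable:
    """In-memory stand-in for a table of counter cells (hypothetical, not HBase)."""

    def __init__(self):
        self._cells = defaultdict(int)

    def increment(self, row, column, amount=1):
        # Add to the counter cell and return the new value.
        self._cells[(row, column)] += amount
        return self._cells[(row, column)]

    def get(self, row, column):
        # Counters that were never incremented read as zero.
        return self._cells.get((row, column), 0)


def record_message(table, day, is_genuine, is_repeat_customer):
    """Update the analytics counters for one analyzed text message."""
    table.increment(day, "messages")
    if is_genuine:
        table.increment(day, "genuine_authentications")
    if is_repeat_customer:
        table.increment(day, "repeat_customers")


table = CounterTable()
record_message(table, "2012-05-22", is_genuine=True, is_repeat_customer=False)
record_message(table, "2012-05-22", is_genuine=True, is_repeat_customer=True)
record_message(table, "2012-05-22", is_genuine=False, is_repeat_customer=False)

print(table.get("2012-05-22", "genuine_authentications"))
```

Reading a dashboard value is then a single cell lookup rather than an aggregation over the message log.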


Schema Design: Example 1

• Example: View log of text messages in
  chronological order
        • Rowkey: row prefix + timestamp

      Row
      transaction 2012-05-22 12:00:00
      transaction 2012-05-22 12:01:14
      transaction 2012-05-22 12:02:03

Note: HBase sorts rowkeys lexicographically, so with a timestamp suffix scans
return rows in chronological order (oldest first)
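
A small sketch of how this key sorts, using plain Python string sorting as a stand-in for HBase's byte-wise rowkey ordering. The helper names and the `MAX_EPOCH_SECONDS` ceiling are assumptions for illustration. It also shows one common way to get newest-first order instead: store a reversed timestamp (a fixed ceiling minus the real one) in the key.

```python
from datetime import datetime, timezone

# Assumed ceiling; keeps every reversed value at exactly 10 digits.
MAX_EPOCH_SECONDS = 10**10 - 1


def transaction_key(ts):
    """Row prefix + timestamp, as on the slide."""
    return "transaction " + ts.strftime("%Y-%m-%d %H:%M:%S")


def reversed_transaction_key(ts):
    """Row prefix + (ceiling - epoch seconds), zero-padded to a fixed width."""
    reversed_seconds = MAX_EPOCH_SECONDS - int(ts.timestamp())
    return f"transaction {reversed_seconds:010d}"


# Messages arrive out of order; the sorted keys decide the scan order.
arrivals = [
    datetime(2012, 5, 22, 12, 1, 14, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 2, 3, tzinfo=timezone.utc),
    datetime(2012, 5, 22, 12, 0, 0, tzinfo=timezone.utc),
]

oldest_first = sorted(transaction_key(ts) for ts in arrivals)
newest_first = [
    ts for _, ts in sorted((reversed_transaction_key(ts), ts) for ts in arrivals)
]

for key in oldest_first:
    print(key)
```

Both variants rely on fixed-width fields: a timestamp that is not zero-padded would break the match between lexicographic and chronological order.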
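Schema Design: Example 2

• Example: View log of text messages for each user
        • Rowkey: row prefix + user ID + timestamp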


    Row
    transaction userID 1 2012-05-22 12:00:00
    transaction userID 1 2012-05-22 12:01:14
    transaction userID 2 2012-05-22 12:00:54
    transaction userID 2 2012-05-22 12:01:22
    transaction userID 2 2012-05-22 12:02:01
Note: HBase sorts rows lexicographically, so a scan returns each user's rows
together, in chronological order
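
With the user ID in the key, one user's log becomes a contiguous run of rows that a prefix scan can read directly. The sketch below models that with a sorted Python list and `bisect`; it is not HBase client code. The zero-padded 8-digit user ID is an assumption added here so that, for example, user 10 does not sort between users 1 and 2, and the user 10 row is a hypothetical extra row included only to show that effect.

```python
from bisect import bisect_left


def user_transaction_key(user_id, timestamp):
    """Row prefix + zero-padded user ID + timestamp (padding width is assumed)."""
    return f"transaction {user_id:08d} {timestamp}"


def prefix_scan(sorted_keys, prefix):
    """Return the contiguous run of keys that start with `prefix`."""
    start = bisect_left(sorted_keys, prefix)  # jump straight to the first candidate
    rows = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing later can match
        rows.append(key)
    return rows


keys = sorted([
    user_transaction_key(2, "2012-05-22 12:01:22"),
    user_transaction_key(1, "2012-05-22 12:00:00"),
    user_transaction_key(2, "2012-05-22 12:02:01"),
    user_transaction_key(10, "2012-05-22 12:03:00"),  # hypothetical extra user
    user_transaction_key(1, "2012-05-22 12:01:14"),
    user_transaction_key(2, "2012-05-22 12:00:54"),
])

user_2_rows = prefix_scan(keys, "transaction 00000002 ")
for row in user_2_rows:
    print(row)
```

The trailing space in the prefix matters: it stops the scan for one user ID from matching a longer ID that begins with the same digits.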

Critical Findings
• Schema design is crucial for successful HBase
  implementation
  – Pack as much info as possible into row keys


• Use caution with Filters
  – E.g. Regex filters can be costly
  – Alternatives:
     • Directly query for data you need
     • Use efficient filters when filtering large data sets
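
A rough illustration of why a regex filter can be costly compared with querying directly by key. A filter is evaluated against every row the scan visits, so without start and stop keys it touches the whole table; a key range jumps to the rows of interest. The sketch models this on a sorted Python list and counts rows examined. The function names, the key layout with a zero-padded user ID, and the generated 3,000 rows are all assumptions for illustration, not measurements from the talk.

```python
import re
from bisect import bisect_left


def regex_filter_scan(sorted_keys, pattern):
    """Full scan: every key is tested against the regex."""
    compiled = re.compile(pattern)
    examined = 0
    matches = []
    for key in sorted_keys:
        examined += 1
        if compiled.search(key):
            matches.append(key)
    return matches, examined


def range_scan(sorted_keys, start_key, stop_key):
    """Direct query: seek to start_key, read until stop_key (exclusive)."""
    start = bisect_left(sorted_keys, start_key)
    examined = 0
    matches = []
    for key in sorted_keys[start:]:
        if key >= stop_key:
            break
        examined += 1  # counts only rows inside the requested range
        matches.append(key)
    return matches, examined


# Hypothetical table: 1,000 users with 3 messages each.
keys = sorted(
    f"transaction {user:08d} 2012-05-22 12:{minute:02d}:00"
    for user in range(1, 1001)
    for minute in range(3)
)

regex_matches, regex_examined = regex_filter_scan(keys, r"^transaction 00000042 ")
range_matches, range_examined = range_scan(
    keys, "transaction 00000042 ", "transaction 00000043 "
)

print(regex_examined, range_examined)
```

Both calls return the same rows; the difference is how many rows had to be read to find them, which is the reason to pack query-relevant fields into the row key.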




Thank You!

Making Counterfeiting Unprofitable™

Your global brand protection specialists, spanning 3 continents and speaking 9 languages

tech@sproxil.com
+1 617 682 9577

America | Asia | Africa     Sproxil.com




HBaseCon 2012 | Developing Real Time Analytics Applications Using HBase in the Cloud - Rick Tucker, Sproxil


Editor's Notes

  • #3 Processed a large volume of text messages; has even led to the arrest of counterfeiters
  • #4 High-speed transactional operations are critical. Handle large volumes of text messages quickly. Large volume of data: millions of records. Schema supports sparse data.
  • #8 Explain why regex is costly