Low-Latency “OLAP” with Hadoop and HBase
      Andrei Dragomir | Software Engineer




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Synopsis

  §  What are we trying to solve
  §  Description of our system
  §  How it works
  §  Minimizing Latency

In a nutshell

  §  Low-latency OLAP system
  §  Input data is stored in Hadoop DFS (e.g. log files) or in HBase tables
  §  The processing loop of the system takes a cube description and processes it (pre-aggregations) using Hadoop Map/Reduce
  §  The output is written to a statistics HBase table
  §  To get the data, users query a server, which scans the HBase table, applies the filters, roll-ups or drill-downs, and returns the result
Vocabulary

  Date        Country      City       OS      Browser     Sales
  2012-05-12  USA          NY         Win     FF          $0.0
  2012-05-12  USA          NY         Win     FF          $10.0
  2012-05-13  USA          SF         OSX     Chrome      $25.0
  2012-05-13  Canada       Ontario    Linux   Chrome      $0.0
  2012-05-14  USA          Chicago    OSX     Safari      $15.0
  ...         ...          ...        ...     ...         ...
  5 Visits    2 Countries  4 Cities:  3 OS:   3 Browser:  $50.0
  3 Days      USA: 4       NY: 2      Win: 2  FF: 2       3 sales
              Canada: 1    SF: 1      OSX: 2  Chrome: 2

  §  We want to get (mostly) numeric data: metrics
  §  These metrics have a set of labels (dimensions)
  §  We want to view the metrics by any combination of dimensions
OLAP Queries

  §  Rolling up to country level

       SELECT COUNT(visits), SUM(sales)
       GROUP BY country

       Country   visits   sales
       USA       4        $50
       Canada    1        0

  §  "Slicing" by browser

       SELECT COUNT(visits), SUM(sales)
       GROUP BY country
       HAVING browser = "FF"

       Country   visits   sales
       USA       2        $10
       Canada    0        0

  §  Top browsers by sales

       SELECT SUM(sales), COUNT(visits)
       GROUP BY browser
       ORDER BY sales

       Browser   sales   visits
       Chrome    $25     2
       Safari    $15     1
       FF        $10     2

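The three queries above can be reproduced in a few lines over the five sample visits from the Vocabulary table. This is only an illustrative sketch: the `aggregate` helper, the `COLUMNS` tuple and the in-memory list are assumptions made for the example, not part of the system described in the slides.

```python
from collections import defaultdict

# The five sample visits from the Vocabulary table:
# (date, country, city, os, browser, sales)
VISITS = [
    ("2012-05-12", "USA", "NY", "Win", "FF", 0.0),
    ("2012-05-12", "USA", "NY", "Win", "FF", 10.0),
    ("2012-05-13", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-13", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-14", "USA", "Chicago", "OSX", "Safari", 15.0),
]
COLUMNS = ("date", "country", "city", "os", "browser", "sales")


def aggregate(rows, group_by, where=None):
    """Return {group value: [visits, sales]} for the rows passing the filter."""
    idx = COLUMNS.index(group_by)
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        record = dict(zip(COLUMNS, row))
        if where is not None and not where(record):
            continue
        totals[row[idx]][0] += 1                 # COUNT(visits)
        totals[row[idx]][1] += record["sales"]   # SUM(sales)
    return dict(totals)


# Roll-up to country level.
by_country = aggregate(VISITS, "country")
# Slice: only visits made with the FF browser.
ff_by_country = aggregate(VISITS, "country", where=lambda r: r["browser"] == "FF")
# Top browsers, highest sales first.
top_browsers = sorted(aggregate(VISITS, "browser").items(),
                      key=lambda kv: kv[1][1], reverse=True)
```

Unlike the slide, the sliced result simply omits Canada instead of listing it with zero visits; filling in empty groups would be a display-time decision.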
Looking inside – physical diagram




Looking inside – logical diagram




Simplifying assumptions: pre-aggregation

  §  In most cases...
       §  Data needs to be summarized – hard to draw 1B data points
       §  You don’t need to look at all dimensions at the same time – hard to correlate
       §  Not all queries are used with the same frequency

A timeless CS problem: Optimize...

  Time                            Space
  §  Pre-aggregation              §  Runtime aggregation
  §  Fast                         §  Flexible
  §  Efficient reads – O(1)       §  I/O, CPU intensive
  §  Inflexible                   §  Slow – always need to look at all the data
  §  Processing latency           §  Low throughput
  §  Combinatorial explosion

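The "combinatorial explosion" of full pre-aggregation is easy to quantify: with n dimensions there are 2^n - 1 non-empty dimension subsets, each a possible GROUP BY. A small sketch, using the five dimensions of the Vocabulary table (the function name is an assumption made for illustration):

```python
from itertools import combinations

DIMENSIONS = ["date", "country", "city", "os", "browser"]


def all_groupings(dimensions):
    """Every non-empty subset of the dimensions, i.e. every possible GROUP BY."""
    return [combo
            for size in range(1, len(dimensions) + 1)
            for combo in combinations(dimensions, size)]


groupings = all_groupings(DIMENSIONS)
print(len(groupings))  # 2**5 - 1 groupings for 5 dimensions
```

Each extra dimension doubles the number of groupings, before even counting the distinct values inside each one, which is why pre-aggregating only a chosen list of reports is attractive.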
Solution ?

  §  Just do both!
  §  Can tune: pre-aggregate more, or rely on runtime aggregation
  §  Trade-off: ingestion + processing speed vs. query speed
  §  Works just like normal queries + materialized views

Solution ?

  §  Process: pre-aggregate all the report definitions, create an indexed HBase table.
  §  Query: use the indexes to get the data fast. Perform extra aggregation, filtering if needed at runtime.
  §  Platform strengths
       §  Parallelism in M/R
       §  Fast access and natural key ordering in HBase

Minimal HBase details

  §  Data is stored in tables
  §  Each row has a key, and any number of columns (long & wide)
  §  Ordered by row keys: clustered indexes built-in
  §  Sparse tables. NULLs are free.

  Row Key   Columns...
  u1        v1     v2     v3
  u2        v      X      ...
  u3        v      x      ...
  u4        x      v2     ...
  u5        ...    v3     ...
  u6        ...    v5     ...
  u7        ...    ...    ...
  u8        ...    ...    ...

Minimal HBase details

  §  Operations use row key: get(), put()
  §  Can scan a range of rows: [start, end)
  §  We can use the row key as a built-in indexing mechanism

  Row key   Column ...
  aaa       v1
  aab       v2
  aac       v3     ←
  aad       v4     ←
  aae       v5     ←
  aaf       v6     ←
  aba       ...
  abb       ...

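The access pattern on this slide (point get/put by row key, plus a scan over a half-open key range) can be mimicked with a sorted key list. This is a toy stand-in written for illustration, not the HBase client API; the `SortedTable` class and the placeholder values for rows `aba` and `abb` are assumptions.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []      # row keys, always in sorted order
        self._values = {}    # row key -> value

    def put(self, key, value):
        if key not in self._values:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) for start <= key < stop, in key order."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedTable()
# Insert in arbitrary order; "x" stands in for values the slide elides.
for key, value in [("aba", "x"), ("aaa", "v1"), ("aad", "v4"), ("aab", "v2"),
                   ("aaf", "v6"), ("aac", "v3"), ("aae", "v5"), ("abb", "x")]:
    table.put(key, value)

# The rows marked with arrows on the slide: [aac, aba) covers aac..aaf.
scanned = list(table.scan("aac", "aba"))
```

Because the rows are physically ordered by key, the scan touches only the contiguous slice between the two bounds, which is what makes the row key usable as an index.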
SaasBase vs. SQL Views Comparison




Reports configuration

  §    List of Dimensions (with custom classes, arguments, etc)
  §    List of Metrics (with custom classes, arguments, etc)
  §    List of Reports, each containing
        §    Dimensions (subset)
        §    Metrics (subset)
        §    Sorting, etc
  §    The reports configuration is used in the entire system: import, process, query

Solution ?

  Date         Country   City   Sales
  2012-05-12   USA       NY     3
  2012-05-12   USA       NY     10
  2012-05-13   USA       SF     25
  2012-05-13   CAN       ON     0
  2012-05-14   USA       CH     15

  visits_by_city: {
      dimensions: [country, city],
      metrics: [visits]
  },
  daily_sales: {
      dimensions: [year, month, day, country],
      metrics: [sales]
  }

  Statistics HBASE Output Table

  ROWKEY                        VALUE
  daily_sales/2012+05+12+USA    $13
  daily_sales/2012+05+13+CAN    $0
  daily_sales/2012+05+13+USA    $25
  daily_sales/2012+05+14+USA    $15
  visits_by_city/CAN+ON         1
  visits_by_city/USA+CH         1
  visits_by_city/USA+NY         2
  visits_by_city/USA+SF         1

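The mapping from input rows and report definitions to statistics row keys can be sketched as a single pass over the data. The row-key layout (`report/dim+dim+...`) is taken from the slide; everything else, including the helper names and the rule that each input row counts as one visit, is an assumption made for illustration. In the real system this work is done by a Map/Reduce job rather than an in-memory loop.

```python
from collections import defaultdict

# Input rows as shown on the slide (note the first row has sales = 3 here).
ROWS = [
    {"date": "2012-05-12", "country": "USA", "city": "NY", "sales": 3},
    {"date": "2012-05-12", "country": "USA", "city": "NY", "sales": 10},
    {"date": "2012-05-13", "country": "USA", "city": "SF", "sales": 25},
    {"date": "2012-05-13", "country": "CAN", "city": "ON", "sales": 0},
    {"date": "2012-05-14", "country": "USA", "city": "CH", "sales": 15},
]

REPORTS = {
    "visits_by_city": {"dimensions": ["country", "city"], "metrics": ["visits"]},
    "daily_sales": {"dimensions": ["year", "month", "day", "country"],
                    "metrics": ["sales"]},
}


def dimension_value(row, name):
    """year/month/day are derived from the date; other dimensions are columns."""
    year, month, day = row["date"].split("-")
    derived = {"year": year, "month": month, "day": day}
    return derived.get(name, row.get(name))


def metric_value(row, name):
    """Assumption: every input row is one visit; other metrics are columns."""
    return 1 if name == "visits" else row[name]


def preaggregate(rows, reports):
    """For every row and every report, add the metric into the report's key.

    Each report here has a single metric, so the key does not need to
    carry the metric name.
    """
    table = defaultdict(int)
    for row in rows:
        for report, spec in reports.items():
            dims = "+".join(dimension_value(row, d) for d in spec["dimensions"])
            for metric in spec["metrics"]:
                table[f"{report}/{dims}"] += metric_value(row, metric)
    return dict(sorted(table.items()))   # sorted, like HBase row keys


stats = preaggregate(ROWS, REPORTS)
```

Sorting the keys reproduces the order of the output table on the slide: all `daily_sales` rows first, ordered by date and country, followed by the `visits_by_city` rows.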
HBase natural order: hierarchical filtering




Sorting

  §  Add the metrics that you want to sort by to the row key...
  §  In a way that preserves the ordering
  §  ORDER BY metric DESC == Long.MAX_VALUE - metric

  2012+05+USA+0000000000+
  2012+05+USA+4294961296+SF = 1000 visits
  2012+05+USA+4294961396+NY = 900 visits
  ...
  2012+05+USA+9999999999+

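A minimal sketch of the inverted-metric trick, assuming non-negative metrics and a zero-padded decimal encoding so that lexicographic key order matches numeric order. The `descending_key` helper is hypothetical, and it pads to the full 19 digits of Java's `Long.MAX_VALUE` rather than the shorter illustrative numbers shown on the slide.

```python
LONG_MAX = 2**63 - 1          # Java's Long.MAX_VALUE
WIDTH = len(str(LONG_MAX))    # 19 digits, so every encoded metric has equal length


def descending_key(prefix, metric, label):
    """Row key whose lexicographic order equals descending metric order.

    Assumes 0 <= metric <= LONG_MAX.
    """
    inverted = LONG_MAX - metric
    return f"{prefix}+{inverted:0{WIDTH}d}+{label}"


# Keys are inserted in arbitrary order; sorting mimics HBase's row ordering.
keys = sorted([
    descending_key("2012+05+USA", 900, "NY"),
    descending_key("2012+05+USA", 1000, "SF"),
    descending_key("2012+05+USA", 15, "CH"),
])
```

A scan over the `2012+05+USA+` prefix then returns the cities with the most visits first, so a "top N" query only needs to read the first N rows.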
Minimizing Latency




Minimizing Import Latency

  §    Only import the minimal set of changes
  §    Map/Reduce input filters:
        §    c.a.s.a.i.FileCache – checks if a file was already processed
        §    c.a.s.a.i.FileDateFilter – checks if a date in the file path falls within a specified interval
        §    process files from 3 days ago up until now, once
        §    HBase scan (from import table) start and stop row
  §    Minimize map-task overhead – stitch input splits
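The two file-level input filters can be combined into one selection step. The sketch below only illustrates the idea; it is not the code of FileCache or FileDateFilter, whose internals the slides do not show. The path layout with an embedded `YYYY-MM-DD` date, the `select_new_files` name and the in-memory `processed` set are all assumptions.

```python
import re
from datetime import date, timedelta

# Assumed path convention: a YYYY-MM-DD date somewhere in the file path.
DATE_IN_PATH = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def select_new_files(paths, processed, today, days_back=3):
    """Keep paths not seen before whose embedded date is inside the window."""
    earliest = today - timedelta(days=days_back)
    selected = []
    for path in paths:
        if path in processed:          # already imported once: skip
            continue
        match = DATE_IN_PATH.search(path)
        if match is None:              # no date in the path: skip
            continue
        file_date = date(*map(int, match.groups()))
        if earliest <= file_date <= today:
            selected.append(path)
    processed.update(selected)         # remember them for the next run
    return selected


processed = set()
paths = ["/logs/2012-05-10/a.log", "/logs/2012-05-13/b.log", "/logs/2012-05-14/c.log"]
first_run = select_new_files(paths, processed, today=date(2012, 5, 14))
second_run = select_new_files(paths, processed, today=date(2012, 5, 14))
```

The first run picks up the two files inside the three-day window; the second run finds nothing new, so each file is processed once.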
Minimizing Import Latency

  §    Minimize map-task overhead – stitch input splits
  §    for 400000 files -> 400000 Map Tasks, slow reduce-copy phase
  §    o.a.h.m.i.CombineFileInputFormat – make 2GB splits
  §    c.a.s.a.m.i.FixedMappersTableInputFormat – stitches multiple HBase regions in the same map task

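The effect of stitching many small files into a few large splits can be sketched as greedy packing. This is a simplified illustration under stated assumptions, not the algorithm used by CombineFileInputFormat: the `stitch_splits` function is hypothetical and ignores data locality, which a real input format would likely take into account.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def stitch_splits(file_sizes, max_split_bytes=2 * GB):
    """Greedily pack (path, size) pairs into splits of at most max_split_bytes.

    A file larger than the limit ends up alone in its own split.
    """
    splits, current, current_size = [], [], 0
    for path, size in file_sizes:
        if current and current_size + size > max_split_bytes:
            splits.append(current)     # close the split before it overflows
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Ten 512 MB files would mean ten map tasks; stitched, they need only three.
files = [(f"part-{i}", 512 * MB) for i in range(10)]
splits = stitch_splits(files)
```

With one map task per split, the number of tasks is driven by total input size rather than by file count.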
Minimizing Import Latency

  §    If warehousing in HBase, use o.a.h.h.m.HFileOutputFormat
  §    ~ 100 times faster than using the API
  §    No shuffle step! You must use a global order partitioner
  §    Problem: data grows over time
  §    Solution: estimate output partitions based on input data size, and make partitions (regions) using this heuristic
  §    c.a.s.a.m.FileSizeDatePartitioner – inject input file sizes and dates and rebalance regions based on these, and a fixed size (2GB)

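One way to read the size-and-date heuristic: walk the input in date order, accumulate file sizes, and start a new region whenever the next day would push the running total past the fixed size. The sketch below is a guess at the general shape of such a partitioner, not the FileSizeDatePartitioner itself; both function names and the per-day size map are assumptions.

```python
from bisect import bisect_right

GB = 1024 ** 3


def region_boundaries(sizes_by_date, region_bytes=2 * GB):
    """Return the dates at which a new region should start."""
    boundaries, accumulated = [], 0
    for day in sorted(sizes_by_date):
        if accumulated and accumulated + sizes_by_date[day] > region_bytes:
            boundaries.append(day)     # this day opens a new region
            accumulated = 0
        accumulated += sizes_by_date[day]
    return boundaries


def partition_for(day, boundaries):
    """Index of the region holding `day`; regions stay globally ordered."""
    return bisect_right(boundaries, day)


# Hypothetical input sizes per day.
sizes = {
    "2012-05-12": 3 * GB // 2,
    "2012-05-13": 1 * GB,
    "2012-05-14": GB // 2,
    "2012-05-15": 1 * GB,
}
boundaries = region_boundaries(sizes)
```

Because partitions are contiguous, non-overlapping key ranges, each reducer's output is globally ordered relative to the others, which is what writing HFiles directly requires.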
Minimizing Processing Latency


  §    Processing involves reading the input (files, tables,
        events), pre-aggregating it (reducing cardinality) and
        generating tables that can be queried in real-time
  §    Processing does GROUP BY, COUNT/SUM/AVG, ORDER
        BY
  §    Minimize each M/R step: read, map, partition, combine,
        copy, sort, reduce, write
  §    Read
        §    Filter input data (incremental processing) – differentiate
              between OPEN and CLOSED data
        §    HBase Scan options: caching, batching, etc
        §    Ensure HBase table regions are distributed in the cluster
Minimizing Processing Latency

  §    c.a.s.a.m.j.SuperProcessor
        §    One shot M/R job: for all data, for all reports, emit the pre-aggregated values in 1 map() call
        §    no allocations
        §    Simple and tight
        §    no system calls (avoid context switches)
        §    no String <> byte[] transformations
        §    minimize Map > Combine > Reduce I/O
        §    NO ALLOCATIONS

Minimizing Query Latency

  §    c.a.s.a.m.t.ReportHandler
        §    Simple Thrift server
  §    Data is already processed and pre-aggregated
  §    Query time does HAVING/WHERE (filters), extra GROUP BY (roll-ups)
  §    Calculate an optimal set of HBase scan()s
        §    single / multiple scans
        §    start / stop rows (prefixes, index positions)
  §    Perform extra roll-ups / sorting
  §    Assorted sundries: paging, display-time ser/des, etc

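The query path can be sketched as two steps over the pre-aggregated statistics table: turn a key prefix into a [start, stop) range and scan it, then roll the scanned rows up further if the query asks for a coarser grouping. This is an in-memory illustration, not the ReportHandler; the helper names are assumptions, and `stop_row` assumes a non-empty prefix whose last character can be incremented.

```python
from bisect import bisect_left

# Part of the statistics table from the earlier slide, sorted by row key.
STATS = sorted({
    "daily_sales/2012+05+12+USA": 13,
    "daily_sales/2012+05+13+CAN": 0,
    "daily_sales/2012+05+13+USA": 25,
    "daily_sales/2012+05+14+USA": 15,
    "visits_by_city/CAN+ON": 1,
    "visits_by_city/USA+NY": 2,
}.items())
KEYS = [key for key, _ in STATS]


def stop_row(prefix):
    """Smallest key that sorts after every key starting with prefix."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def prefix_scan(prefix):
    """Rows whose key starts with prefix, as one contiguous slice."""
    lo = bisect_left(KEYS, prefix)
    hi = bisect_left(KEYS, stop_row(prefix))
    return STATS[lo:hi]


def rollup(rows, keep):
    """Sum values after keeping only the first `keep` '+'-separated parts."""
    totals = {}
    for key, value in rows:
        report, dims = key.split("/", 1)
        group = report + "/" + "+".join(dims.split("+")[:keep])
        totals[group] = totals.get(group, 0) + value
    return totals


# Filter: all countries on one day (a single scan, no aggregation needed).
may_13 = prefix_scan("daily_sales/2012+05+13")
# Roll-up: daily rows for May 2012 summed into one monthly value.
monthly = rollup(prefix_scan("daily_sales/2012+05"), keep=2)
```

The scan reads only the rows under the prefix, and the runtime roll-up works on already-reduced data, so the extra aggregation at query time stays cheap.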
Flexible

  §    Report configuration – the core of the system
  §    c.a.s.a.e.Dimension, c.a.s.a.e.Metric
        §    Can override ser/des, aggregate functions (for metrics)
        §    Can override behavior (only add 1 if X...)
        §    Emergent patterns are rolled-up in the reporting core
  §    The entire processing loop can be written outside of M/R for realtime
        §    Storm ?
  §    Applied in 4 use-cases right now, easy to extend
  §    Some programming required

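The "only add 1 if X" override might look like the following. Both classes are hypothetical and written only to illustrate the extension point; they are not the Dimension and Metric classes named on the slide, whose interfaces are not shown.

```python
from functools import reduce


class Metric:
    """Hypothetical base metric: sums one column of each input row."""

    def __init__(self, column):
        self.column = column

    def extract(self, row):
        return row[self.column]

    def aggregate(self, total, row):
        return total + self.extract(row)


class ConditionalCount(Metric):
    """Overrides extraction: contributes 1 only when the predicate holds."""

    def __init__(self, predicate):
        super().__init__(column=None)
        self.predicate = predicate

    def extract(self, row):
        return 1 if self.predicate(row) else 0


# Sales column of the five sample visits from the Vocabulary table.
ROWS = [{"sales": 0.0}, {"sales": 10.0}, {"sales": 25.0},
        {"sales": 0.0}, {"sales": 15.0}]

total_sales = reduce(Metric("sales").aggregate, ROWS, 0.0)
sale_count = reduce(ConditionalCount(lambda r: r["sales"] > 0).aggregate, ROWS, 0)
```

On the sample data this yields the "$50.0" and "3 sales" totals from the Vocabulary table, using the same aggregation loop for both metrics.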
Thank you


                                 adragomi@adobe.com / @adragomir
                                          http://hstack.org


       Our team: Adrian Muraru, Andrei Dulvac, Bogdan Dragu,
     Bogdan Drutu, Cosmin Lehene, Raluca Podiuc, Tudor Scurtu

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Break!
Break takes place in the Community Showcase (Hall 2)
Sessions will resume at 3:35pm




                                                       Page 40

More Related Content

What's hot

Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadataLouis liu
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
Sergio Zenatti Filho
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
Exactly once with spark streaming
Exactly once with spark streamingExactly once with spark streaming
Exactly once with spark streaming
Quentin Ambard
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
Flink Forward
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
Flink Forward
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache Kylin
Yang Li
 
Snowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at ScaleSnowflake Data Science and AI/ML at Scale
Snowflake Data Science and AI/ML at Scale
Adam Doyle
 
What is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | EdurekaWhat is HDFS | Hadoop Distributed File System | Edureka
What is HDFS | Hadoop Distributed File System | Edureka
Edureka!
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
Flink Forward
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
Tyler Wishnoff
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
Databricks
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
Databricks
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
DataWorks Summit
 
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Lessons from the Field: Applying Best Practices to Your Apache Spark Applicat...
Databricks
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
Databricks
 
iceberg introduction.pptx
iceberg introduction.pptxiceberg introduction.pptx
iceberg introduction.pptx
Dori Waldman
 
PostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQLPostgreSQL and CockroachDB SQL
PostgreSQL and CockroachDB SQL
CockroachDB
 

What's hot (20)

Teradata vs-exadata
Teradata vs-exadataTeradata vs-exadata
Teradata vs-exadata
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Exactly once with spark streaming
Exactly once with spark streamingExactly once with spark streaming
Exactly once with spark streaming
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 

Low Latency OLAP with Hadoop and HBase

  • 1. Low-Latency “OLAP” with Hadoop and HBase Andrei Dragomir | Software Engineer © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  • 2. Synopsis
      §  What are we trying to solve
      §  Description of our system
      §  How it works
      §  Minimizing Latency
  • 3. In a nutshell
      Low-latency OLAP system.
      Hadoop DFS stores the input data (e.g. log files or HBase tables).
      The processing loop of the system takes a cube description and processes it (pre-aggregations) using Hadoop Map/Reduce.
      The output is written to a statistics HBase table.
      To get the data, users query a server, which scans the HBase table, applies the filters, roll-ups or drill-downs, and returns the result.
  • 12. Vocabulary
      Date        Country  City     OS     Browser  Sales
      2012-05-12  USA      NY       Win    FF       $ 0.0
      2012-05-12  USA      NY       Win    FF       $ 10.0
      2012-05-13  USA      SF       OSX    Chrome   $ 25.0
      2012-05-13  Canada   Ontario  Linux  Chrome   $ 0.0
      2012-05-14  USA      Chicago  OSX    Safari   $ 15.0
      ...         ...      ...      ...    ...      ...

      5 Visits    2 Countries  4 Cities:  3 OS:   3 Browser:  $50.0
      3 Days      USA: 4       NY: 2      Win: 2  FF: 2       3 sales
                  Canada: 1    SF: 1      OSX: 2  Chrome: 2

      §  We want to get (mostly) numeric data: metrics
      §  These metrics have a set of labels (dimensions)
      §  We want to view the metrics by any combination of dimensions
  • 13. OLAP Queries
      §  Rolling up to country level
            SELECT COUNT(visits), SUM(sales)
            GROUP BY country

            Country  visits  sales
            USA      4       $50
            Canada   1       0
      §  “Slicing” by browser
            SELECT COUNT(visits), SUM(sales)
            GROUP BY country
            HAVING browser = “FF”

            Country  visits  sales
            USA      2       $10
            Canada   0       0
      §  Top browsers by sales
            SELECT SUM(sales), COUNT(visits)
            GROUP BY browser
            ORDER BY sales

            Browser  sales  visits
            Chrome   $25    2
            Safari   $15    1
            FF       $10    2
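The three queries on this slide can be reproduced over the five sample visits from the Vocabulary slide. The following is a minimal in-memory sketch, not the system's implementation: the `aggregate` helper and its `where` parameter are hypothetical names introduced for illustration, and the filter is applied per row (as a SQL `WHERE` would be).

```python
from collections import defaultdict

# Sample visits from the Vocabulary slide.
FIELDS = ("date", "country", "city", "os", "browser", "sales")
VISITS = [
    ("2012-05-12", "USA", "NY", "Win", "FF", 0.0),
    ("2012-05-12", "USA", "NY", "Win", "FF", 10.0),
    ("2012-05-13", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-13", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-14", "USA", "Chicago", "OSX", "Safari", 15.0),
]


def aggregate(rows, group_by, where=None):
    """Return {group value: (visit count, total sales)} for matching rows.

    `where` is an optional {field: required value} filter (hypothetical helper).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        record = dict(zip(FIELDS, row))
        if where and any(record[k] != v for k, v in where.items()):
            continue
        bucket = totals[record[group_by]]
        bucket[0] += 1                 # COUNT(visits)
        bucket[1] += record["sales"]   # SUM(sales)
    return {key: tuple(value) for key, value in totals.items()}


# Roll-up to country level.
by_country = aggregate(VISITS, "country")

# "Slice" by browser: only Firefox visits, grouped by country.
ff_by_country = aggregate(VISITS, "country", where={"browser": "FF"})

# Top browsers by sales, highest first.
top_browsers = sorted(
    aggregate(VISITS, "browser").items(),
    key=lambda item: item[1][1],
    reverse=True,
)
```

Unlike the slide's result table, this sketch omits groups with no matching rows (Canada has no Firefox visits), so `ff_by_country` contains only the USA entry.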
  • 14. Looking inside – physical diagram
  • 15. Looking inside – logical diagram
  • 16. Simplifying assumptions: pre-aggregation
      §  In most cases...
          §  Data needs to be summarized – hard to draw 1B data points
          §  You don’t need to look at all dimensions at the same time – hard to correlate
          §  Not all queries are used with the same frequency
  • 17. A timeless CS problem: Optimize...
      Time
      §  Pre-aggregation
          §  Fast
          §  Efficient reads – O(1)
          §  Inflexible
          §  Processing latency
          §  Combinatorial Explosion
      Space
      §  Runtime aggregation
          §  Flexible
          §  I/O, CPU intensive
          §  Slow – always need to look at all the data
          §  Low throughput
  • 18. Solution ?
      §  Just do both !
      §  Can tune: pre-aggregate more, or rely on runtime aggregation
      §  Ingestion + process speed vs Query speed
      §  Works just like normal queries + materialized views
  • 19. Solution ?
      §  Process: pre-aggregate all the report definitions, create an indexed HBase table.
      §  Query: use the indexes to get the data fast. Perform extra aggregation, filtering if needed at runtime.
      §  Platform strengths
          §  Parallelism in M/R
          §  Fast access and natural key ordering in HBase
  • 20. Minimal HBase details
      §  Data is stored in tables
      §  Each row has a key, and any number of columns (long & wide)
      §  Ordered by row keys: clustered indexes built-in
      §  Sparse tables. NULLs are free.

      Row Key  Columns...
      u1       v1   v2   v3
      u2       v    X    ...
      u3       v    x    ...
      u4       x    v2   ...
      u5       ...  v3   ...
      u6       ...  v5   ...
      u7       ...  ...  ...
      u8       ...  ...  ...
  • 21. Minimal HBase details
      §  Operations use row key: get(), put()
      §  Can scan a range of rows: [start, end)
      §  We can use the row key as a built-in indexing mechanism

      Row key  Column ...
      aaa      v1
      aab      v2
      aac      v3   ←
      aad      v4   ←
      aae      v5   ←
      aaf      v6   ←
      aba      ...
      abb      ...
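The row-key operations on this slide can be illustrated with a toy sorted key-value table. This is a sketch in plain Python, not the HBase client API: the `SortedTable` class is a hypothetical stand-in that keeps keys in sorted order and supports `get()`, `put()` and a half-open `[start, stop)` range scan.

```python
import bisect


class SortedTable:
    """Toy stand-in for a table whose rows are ordered by row key."""

    def __init__(self):
        self._keys = []     # row keys, kept sorted
        self._values = {}   # row key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Return rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, self._values[key]) for key in self._keys[lo:hi]]


table = SortedTable()
# Insert out of order; the table keeps rows sorted by key.
for key, value in [("aae", "v5"), ("aaa", "v1"), ("aac", "v3"),
                   ("aaf", "v6"), ("aab", "v2"), ("aad", "v4")]:
    table.put(key, value)

# The stop row is exclusive, so "aaf" is not returned.
scanned = table.scan("aac", "aaf")
```

Because the keys are sorted, a range scan only needs two binary searches and a contiguous read, which is why the row key can act as a built-in index.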
  • 22. SaasBase vs. SQL Views Comparison
  • 23. Reports configuration
      §  List of Dimensions (with custom classes, arguments, etc)
      §  List of Metrics (with custom classes, arguments, etc)
      §  List of Reports, each containing
          §  Dimensions (subset)
          §  Metrics (subset)
          §  Sorting, etc
      §  The reports configuration is used in the entire system: import, process, query
  • 26. Solution ?
      Input data:
      Date        Country  City  Sales
      2012-05-12  USA      NY    3
      2012-05-12  USA      NY    10
      2012-05-13  USA      SF    25
      2012-05-13  CAN      ON    0
      2012-05-14  USA      CH    15

      Reports configuration:
      visits_by_city: {
         dimensions: [country, city],
         metrics: [visits]
      },
      daily_sales: {
         dimensions: [year, month, day, country],
         metrics: [sales]
      }

      Statistics HBASE Output Table:
      ROWKEY                          VALUE
      daily_sales/2012+05+12+USA      $13
      daily_sales/2012+05+13+CAN      $0
      daily_sales/2012+05+13+USA      $25
      daily_sales/2012+05+14+USA      $15
      visits_by_city/CAN+ON           1
      visits_by_city/USA+CH           1
      visits_by_city/USA+NY           2
      visits_by_city/USA+SF           1
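The mapping from input rows and report definitions to the statistics table can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the Map/Reduce job itself: the function names are hypothetical, each report is assumed to carry exactly one metric, and `visits` is assumed to count one per input row.

```python
from collections import defaultdict

# Input rows and report definitions copied from the slide.
ROWS = [
    {"date": "2012-05-12", "country": "USA", "city": "NY", "sales": 3},
    {"date": "2012-05-12", "country": "USA", "city": "NY", "sales": 10},
    {"date": "2012-05-13", "country": "USA", "city": "SF", "sales": 25},
    {"date": "2012-05-13", "country": "CAN", "city": "ON", "sales": 0},
    {"date": "2012-05-14", "country": "USA", "city": "CH", "sales": 15},
]
REPORTS = {
    "visits_by_city": {"dimensions": ["country", "city"], "metrics": ["visits"]},
    "daily_sales": {"dimensions": ["year", "month", "day", "country"],
                    "metrics": ["sales"]},
}


def dimension_value(row, name):
    """Extract one dimension; year/month/day are derived from the date."""
    if name in ("year", "month", "day"):
        year, month, day = row["date"].split("-")
        return {"year": year, "month": month, "day": day}[name]
    return row[name]


def metric_value(row, name):
    """Assumption: every input row is one visit; other metrics are columns."""
    return 1 if name == "visits" else row[name]


def preaggregate(rows, reports):
    """Build {rowkey: value} with keys shaped like report/dim1+dim2+..."""
    table = defaultdict(int)
    for row in rows:
        for report, spec in reports.items():
            dims = "+".join(dimension_value(row, d) for d in spec["dimensions"])
            metric = spec["metrics"][0]  # sketch assumes one metric per report
            table[report + "/" + dims] += metric_value(row, metric)
    return dict(sorted(table.items()))  # sorted, like HBase row keys


statistics = preaggregate(ROWS, REPORTS)
```

With the slide's input, `statistics` contains the same eight row keys and values as the output table above, in the same order.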
  • 27. HBase natural order: hierarchical filtering
  • 29. Sorting
      §  Add the metrics that you want to sort by to the row key...
      §  In a way that preserves the ordering
      §  ORDER BY metric DESC == Long.MAX_VALUE – metric

      2012+05+USA+0000000000+
      2012+05+USA+4294961296+SF = 1000 visits
      2012+05+USA+4294961396+NY = 900 visits
      . . .
      2012+05+USA+9999999999+
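The idea of embedding an inverted metric in the row key can be sketched as follows. The slide subtracts from Long.MAX_VALUE; this sketch instead assumes a fixed-width, zero-padded 10-digit field with a maximum of 9999999999 (matching the width of the slide's boundary keys), so the generated numbers differ from those on the slide. The helper name and the CH figure are hypothetical.

```python
WIDTH = 10                   # assumption: fixed 10-digit, zero-padded field
MAX_METRIC = 10**WIDTH - 1   # assumption: 9999999999 instead of Long.MAX_VALUE


def descending_key(prefix, metric, suffix):
    """Build a row key that sorts rows by `metric` in descending order.

    Larger metrics map to smaller inverted values, so a plain ascending
    scan over the keys returns the largest metric first.
    """
    if not 0 <= metric <= MAX_METRIC:
        raise ValueError("metric does not fit the fixed-width field")
    inverted = MAX_METRIC - metric
    return f"{prefix}+{inverted:0{WIDTH}d}+{suffix}"


# SF and NY visit counts are from the slide; CH is a made-up extra row.
visits = {"NY": 900, "CH": 15, "SF": 1000}
keys = sorted(descending_key("2012+05+USA", v, city) for city, v in visits.items())
cities_in_scan_order = [key.rsplit("+", 1)[1] for key in keys]
```

Zero-padding matters: without a fixed width, lexicographic order would not agree with numeric order.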
  • 30. Minimizing Latency
  • 31. Minimizing Import Latency
      §  Only import the minimal set of changes
      §  Map/Reduce input filters:
          §  c.a.s.a.i.FileCache – checks whether a file has already been processed
          §  c.a.s.a.i.FileDateFilter – checks whether a date in the file path falls within a specified interval
          §  e.g. process files from 3 days ago up until now, once
      §  HBase scan (from import table) start and stop row
      §  Minimize map-task overhead – stitch input splits
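The two input filters can be approximated by a single predicate. This is a hedged sketch, not the FileCache or FileDateFilter classes named on the slide: it assumes paths embed the date as yyyy/MM/dd, models the "already processed" cache as a plain set, and uses a hypothetical `accept` function and made-up paths.

```python
import re
from datetime import date, timedelta

# Assumption: input paths contain the date as .../yyyy/MM/dd/...
DATE_IN_PATH = re.compile(r"(\d{4})/(\d{2})/(\d{2})")


def accept(path, start, end, already_processed):
    """Return True if the file is new and its path date is in [start, end]."""
    if path in already_processed:       # each file is processed only once
        return False
    match = DATE_IN_PATH.search(path)
    if match is None:                   # no date in the path: skip the file
        return False
    file_date = date(*map(int, match.groups()))
    return start <= file_date <= end


# "Process files from 3 days ago up until now, once."
today = date(2012, 5, 14)
window_start = today - timedelta(days=3)
seen = {"/logs/2012/05/12/a.log"}       # hypothetical cache contents

candidates = [
    "/logs/2012/05/12/a.log",   # already processed
    "/logs/2012/05/13/b.log",   # new and inside the window
    "/logs/2012/05/01/c.log",   # too old
    "/logs/nodate.log",         # no date in path
]
to_import = [p for p in candidates if accept(p, window_start, today, seen)]
```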
  • 32. Minimizing Import Latency
      §  Minimize map-task overhead – stitch input splits
          §  for 400000 files -> 400000 Map Tasks, slow reduce-copy phase
          §  o.a.h.m.i.CombineFileInputFormat – make 2GB splits
          §  c.a.s.a.m.i.FixedMappersTableInputFormat – stitches multiple HBase regions in the same map task
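The effect of stitching many small files into size-bounded splits can be shown with a simple greedy packer. This is only an illustration of the arithmetic: it is not how CombineFileInputFormat is implemented (which also takes data locality into account), and the function name and the 64 KB file size are assumptions.

```python
def combine_splits(files, max_split_bytes):
    """Greedily pack (name, size) pairs into splits of at most max_split_bytes.

    A file larger than the limit still gets a split of its own.
    """
    splits, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


TWO_GB = 2 * 1024**3
# Hypothetical workload: 400000 files of 64 KB each.
files = [(f"part-{i:06d}", 64 * 1024) for i in range(400_000)]
splits = combine_splits(files, TWO_GB)
map_tasks = len(splits)
```

Under these assumptions the 400000 files collapse into 13 splits, so the job starts 13 map tasks instead of 400000.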
  • 33. Minimizing Import Latency
      §  If warehousing in HBase, use o.a.h.h.m.HFileOutputFormat
          §  ~100 times faster than using the API
          §  No shuffle step! You must use a global order partitioner
      §  Problem: data grows over time
      §  Solution: estimate output partitions based on input data size, and make partitions (regions) using this heuristic
      §  c.a.s.a.m.FileSizeDatePartitioner – injects input file sizes and dates, and rebalances regions based on these and on a fixed size (2GB)
  • 34. Minimizing Processing Latency
      §  Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating tables that can be queried in real-time
      §  Processing does GROUP BY, COUNT/SUM/AVG, ORDER BY
      §  Minimize each M/R step: read, map, partition, combine, copy, sort, reduce, write
      §  Read
          §  Filter input data (incremental processing) – differentiate between OPEN and CLOSED data
          §  HBase Scan options: caching, batching, etc
          §  Ensure HBase table regions are distributed in the cluster
  • 35. Minimizing Processing Latency
      §  c.a.s.a.m.j.SuperProcessor
      §  One shot M/R job: for all data, for all reports, emit the pre-aggregated values in 1 map() call
      §  no allocations
      §  Simple and tight
      §  no system calls (avoid context switches)
      §  no String <> byte[] transformations
      §  minimize Map > Combine > Reduce I/O
      §  NO ALLOCATIONS
  • 36. Minimizing Query Latency
      §  c.a.s.a.m.t.ReportHandler
      §  Simple Thrift server
      §  Data is already processed and pre-aggregated
      §  Query time does HAVING/WHERE (filters), extra GROUP BY (roll-ups)
      §  Calculate an optimal set of HBase scan()s
          §  single / multiple scans
          §  start / stop rows (prefixes, index positions)
      §  Perform extra roll-ups / sorting
      §  Assorted sundries: paging, display-time ser/des, etc
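A query against the pre-aggregated table combines a key-range scan with a small amount of runtime aggregation. The sketch below is an in-memory approximation, not the ReportHandler or the HBase scan API: it reuses the daily_sales rows from the statistics output table shown earlier in the deck, models the table as a sorted list, and uses hypothetical `prefix_scan` and `roll_up` helpers.

```python
import bisect
from collections import defaultdict

# daily_sales rows from the statistics output table, kept sorted by key.
STATS = sorted([
    ("daily_sales/2012+05+12+USA", 13),
    ("daily_sales/2012+05+13+CAN", 0),
    ("daily_sales/2012+05+13+USA", 25),
    ("daily_sales/2012+05+14+USA", 15),
])
KEYS = [key for key, _ in STATS]


def prefix_scan(prefix):
    """Return all rows whose key starts with `prefix` (start/stop row scan)."""
    start = bisect.bisect_left(KEYS, prefix)
    # Assumption: keys are ASCII, so prefix + "\uffff" sorts after every match.
    stop = bisect.bisect_left(KEYS, prefix + "\uffff")
    return STATS[start:stop]


def roll_up(rows, keep):
    """Runtime GROUP BY: keep the first `keep` dimensions and sum the rest."""
    totals = defaultdict(int)
    for key, value in rows:
        report, dims = key.split("/", 1)
        totals[report + "/" + "+".join(dims.split("+")[:keep])] += value
    return dict(totals)


# Drill down to one day: a narrow scan, no extra aggregation needed.
one_day = prefix_scan("daily_sales/2012+05+13")

# Roll up to day level across countries: scan the report, aggregate at runtime.
per_day = roll_up(prefix_scan("daily_sales/"), keep=3)

# Filter on a trailing dimension (country): scan, then filter at runtime.
usa_total = sum(v for k, v in prefix_scan("daily_sales/") if k.endswith("+USA"))
```

Filters on leading key components translate directly into start and stop rows, while filters on trailing components, like the country here, have to be applied after the scan.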
  • 37. Flexible
      §  Report configuration – the core of the system
      §  c.a.s.a.e.Dimension, c.a.s.a.e.Metric
          §  Can override ser/des, aggregate functions (for metrics)
          §  Can override behavior (only add 1 if X...)
      §  Emergent patterns are rolled-up in the reporting core
      §  The entire processing loop can be written outside of M/R for realtime
          §  Storm ?
      §  Applied in 4 use-cases right now, easy to extend
      §  Some programming required
  • 38. Thank you
      adragomi@adobe.com / @adragomir
      http://hstack.org
      Our team: Adrian Muraru, Andrei Dulvac, Bogdan Dragu, Bogdan Drutu, Cosmin Lehene, Raluca Podiuc, Tudor Scurtu
  • 40. Break! Break takes place in the Community Showcase (Hall 2). Sessions will resume at 3:35pm.