SlideShare a Scribd company logo
1 of 32
Real-time “OLAP” for Big Data (+ use cases)
     Cosmin Lehene | Adobe
     #bigdataro - 30 January 2013




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
What we needed … and built


      OLAP Semantics
      Low Latency Ingestion
      High Throughput
      Real-time Query API




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   2
“Physical” Building Blocks




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   3
Logical Building Blocks


      Dimensions, Metrics
      Aggregations
      Roll-up, drill-down, slicing and dicing, sorting




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   4
OLAP 101 – Queries example




                 Date                           Country                       City            OS        Browser      Sale

                 2012-05-21                     USA                           NY              Windows   FF           0.0

                 2012-05-21                     USA                           NY              Windows   FF           10.0

                 2012-05-22                     USA                           SF              OSX       Chrome       25.0

                 2012-05-22                     Canada                        Ontario         Linux     Chrome       0.0

                 2012-05-23                     USA                           Chicago         OSX       Safari       15.0

                 5 visits,                      2                             4 cities:       3 OS-es   3 browsers   50.0
                 3 days                         countries                     NY: 2           Win: 2    FF: 2        3 sales
                                                USA: 4                        SF: 1           OSX: 2    Chrome:2
                                                Canada: 1



© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.               5
OLAP 101 – Queries example

      Rolling up to country level:                                               Country    visits   sales
  SELECT COUNT(visits), SUM(sales)
                                                                                  USA        4        $50
  GROUP BY country
                                                                                  Canada     1        0




      “Slice” by browser                                                         Country   visits sales
  SELECT COUNT(visits), SUM(sales)                                                USA       2         $10
  GROUP BY country
                                                                                  Canada    0         0
  HAVING browser = “FF”

                                                                                  Browser   sales     visits
      Top browsers by sales
  SELECT SUM(sales), COUNT(visits)                                                Chrome    $25       2

  GROUP BY browser                                                                Safari    $15       1
  ORDER BY sales                                                                  FF        $10       2

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   6
OLAP – Runtime Aggregation vs. Pre-aggregation


      Aggregate at runtime                                                      Pre-aggregate
            Most flexible                                                           Fast
            Fast – scatter gather                                                   Efficient – O(1)
            Space efficient                                                         High throughput
      But                                                                       But
            I/O, CPU intensive                                                      More effort to process (latency)
            slow for larger data                                                    Combinatorial explosion (space)
            low throughput                                                          No flexibility




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   7
SaasBase Map




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   8
SaasBase Domain Model Mapping




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   9
SaasBase - Domain Model Mapping




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   10
SaasBase - Ingestion, Processing, Indexing, Querying




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   11
SaasBase - Ingestion, Processing, Indexing, Querying




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   12
Ingestion




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   13
Ingestion(ETL) throughput vs. latency


      Historical data (large batches)
            Optimize for throughput
      Increments (latest data, smaller)
            Optimize for latency




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   14
Processing




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   15
Processing



      Processing involves reading the Input (files, tables, events), pre-
       aggregating it (reducing cardinality) and generating cubes that can be
       queried in real-time


      “Super Processor” code running in Storm, Map-Reduce, HBase




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   16
Processing for OLAP semantics

            GROUP BY (process, query)
            COUNT, SUM, AVG, etc. (process, query)
            SORT (process, query)
            HAVING (mostly query, can define pre-process constraints)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   17
SaasBase vs. SQL Views Comparison




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   18
Query Engine

      Always reads indexed, compact data
      Query parsing
      Scan strategy
            Single vs. multiple scans
            Start/stop rows (prefixes, index positions, etc.)
            Index selection (volatile indexes with incremental processing)
      Deserialization
      Post-aggregation, sorting, fuzzy-sorting etc.
      Paging
      Custom dimension/metric class loading




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   19
Adobe Business Catalyst

      Online business presence: e-commerce, marketing, web analytics etc.
      Use case: Web Analytics (visitors, channels, content, e-
       commerce, campaigns, etc.)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   20
BC - Workflow




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   21
Adobe Business Catalyst - Stats

      3 active datacenters
      Raw data ~6TB (from ~1TB 18 months ago)
      Visits table: ~1TB each(compressed)
      OLAP cubes (stats): 49GB – 64GB (compressed)


      ~30 minutes latency (from actual pageview/sale to chart in UI)
      10s – 100s of milliseconds latency for queries
      ~3000/s max concurrent OLAP queries (actual traffic is much lower)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   22
Adobe Pass for TV Everywhere

      Authentication & Authorization
      Single sign-on to Programmer content (e.g.
       Turner, NBC, Hulu, MTV, etc) with Cable operator credentials (e.g.
       Comcast, Dish, etc.)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   23
Adobe Pass – Use Case

      Analytics use case: Operational metrics (users, devices, latencies, etc.)
      Real-time ingestion in HBase
      High Frequency Map Reduce jobs (every 2 minutes)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   24
Adobe Pass - Stats (London Olympics 2012)

      67M streams ~ 5.3M hours
      1.5M concurrent streams
      > 7M unique users


      1 Technical & Engineering Emmy Award ;)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   25
Adobe Primetime – Real-time Video Analytics

      Unified video platform (acquisition, transcoding, broadcast, ads,
       analytics)




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   26
Adobe Primetime – Use Case


      Use Cases:
            Audience metrics – minutes latency ok
            Ads metrics – seconds to minutes ok
            Streaming QoS metrics – seconds must


      Requirements:
            Massive throughput (millions of streams, multiple
             heartbeats every 10 seconds)
            Low latency (end-to-end)


© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   27
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   28
Conclusions

      OLAP semantics on a simple data model
            Data as first class citizen
            Domain Specific “Language” for Dimensions, Metrics, Aggregations
      Framework for vertical analytics systems
      Tunable performance, resource allocation




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   29
Thank you!
                                                            Cosmin Lehene @clehene

                                                            http://hstack.org



© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   30
Related

  http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
  http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.   31
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

More Related Content

What's hot

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Regression and Classification: An Artificial Neural Network Approach
Regression and Classification: An Artificial Neural Network ApproachRegression and Classification: An Artificial Neural Network Approach
Regression and Classification: An Artificial Neural Network ApproachKhulna University
 
Artificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionArtificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionMohammed Bennamoun
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
A machine learning model for average fuel consumption in heavy vehicles
A machine learning model for average fuel consumption in heavy vehiclesA machine learning model for average fuel consumption in heavy vehicles
A machine learning model for average fuel consumption in heavy vehiclesVenkat Projects
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedYugal Kumar
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationMohammed Bennamoun
 
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQueryMongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQueryMongoDB
 
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...akshaya870130
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clusteringKrish_ver2
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)EdutechLearners
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 

What's hot (20)

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Regression and Classification: An Artificial Neural Network Approach
Regression and Classification: An Artificial Neural Network ApproachRegression and Classification: An Artificial Neural Network Approach
Regression and Classification: An Artificial Neural Network Approach
 
Artificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competitionArtificial Neural Networks Lect7: Neural networks based on competition
Artificial Neural Networks Lect7: Neural networks based on competition
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
A machine learning model for average fuel consumption in heavy vehicles
A machine learning model for average fuel consumption in heavy vehiclesA machine learning model for average fuel consumption in heavy vehicles
A machine learning model for average fuel consumption in heavy vehicles
 
Mc culloch pitts neuron
Mc culloch pitts neuronMc culloch pitts neuron
Mc culloch pitts neuron
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
Artificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computationArtificial Neural Networks Lect1: Introduction & neural computation
Artificial Neural Networks Lect1: Introduction & neural computation
 
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQueryMongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery
MongoDB Europe 2016 - Warehousing MongoDB Data using Apache Beam and BigQuery
 
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...
vdocument.in_prof-saroj-kaushik1-introduction-to-artificial-intelligence-lect...
 
3.5 model based clustering
3.5 model based clustering3.5 model based clustering
3.5 model based clustering
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Query trees
Query treesQuery trees
Query trees
 
Hadoop YARN
Hadoop YARNHadoop YARN
Hadoop YARN
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 

Viewers also liked

Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubesmister_zed
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?DataWorks Summit
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseDataWorks Summit
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveXu Jiang
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Cosmin Lehene
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)SANG WON PARK
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia Bharat Kalia
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Ziemowit Jankowski
 
Technical product manager
Technical product managerTechnical product manager
Technical product managerMark Long
 
Lotus Forms Webform Server 3.0 Overview & Architecture
Lotus Forms Webform Server 3.0 Overview & ArchitectureLotus Forms Webform Server 3.0 Overview & Architecture
Lotus Forms Webform Server 3.0 Overview & Architectureddrschiw
 
Building Faster Horses: Taking Over An Existing Software Product
Building Faster Horses: Taking Over An Existing Software ProductBuilding Faster Horses: Taking Over An Existing Software Product
Building Faster Horses: Taking Over An Existing Software ProductStacy Vicknair
 
Algorithm - Introduction
Algorithm - IntroductionAlgorithm - Introduction
Algorithm - IntroductionMadhu Bala
 
kafka-steaming-data
kafka-steaming-datakafka-steaming-data
kafka-steaming-dataBryan Jacobs
 
Introduction To Algorithm [2]
Introduction To Algorithm [2]Introduction To Algorithm [2]
Introduction To Algorithm [2]ecko_disasterz
 
University Course Timetabling by using Multi Objective Genetic Algortihms
University Course Timetabling by using Multi Objective Genetic AlgortihmsUniversity Course Timetabling by using Multi Objective Genetic Algortihms
University Course Timetabling by using Multi Objective Genetic AlgortihmsHalil Kaşkavalcı
 
VMworld 2015: vSphere Web Client- Yesterday, Today, and Tomorrow
VMworld 2015: vSphere Web Client- Yesterday, Today, and TomorrowVMworld 2015: vSphere Web Client- Yesterday, Today, and Tomorrow
VMworld 2015: vSphere Web Client- Yesterday, Today, and TomorrowVMworld
 

Viewers also liked (20)

Case Study Real Time Olap Cubes
Case Study Real Time Olap CubesCase Study Real Time Olap Cubes
Case Study Real Time Olap Cubes
 
IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?IS OLAP DEAD IN THE AGE OF BIG DATA?
IS OLAP DEAD IN THE AGE OF BIG DATA?
 
Low Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBaseLow Latency OLAP with Hadoop and HBase
Low Latency OLAP with Hadoop and HBase
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012
 
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
OLAP for Big Data (Druid vs Apache Kylin vs Apache Lens)
 
Business analysis
Business analysisBusiness analysis
Business analysis
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia
 
Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes Case study- Real-time OLAP Cubes
Case study- Real-time OLAP Cubes
 
Technical product manager
Technical product managerTechnical product manager
Technical product manager
 
docker
dockerdocker
docker
 
Core Management - Task 1
Core Management - Task 1Core Management - Task 1
Core Management - Task 1
 
Lotus Forms Webform Server 3.0 Overview & Architecture
Lotus Forms Webform Server 3.0 Overview & ArchitectureLotus Forms Webform Server 3.0 Overview & Architecture
Lotus Forms Webform Server 3.0 Overview & Architecture
 
Building Faster Horses: Taking Over An Existing Software Product
Building Faster Horses: Taking Over An Existing Software ProductBuilding Faster Horses: Taking Over An Existing Software Product
Building Faster Horses: Taking Over An Existing Software Product
 
IEA DSM Task 24 Transport Panel at BECC conference
IEA DSM Task 24 Transport Panel at BECC conferenceIEA DSM Task 24 Transport Panel at BECC conference
IEA DSM Task 24 Transport Panel at BECC conference
 
Algorithm - Introduction
Algorithm - IntroductionAlgorithm - Introduction
Algorithm - Introduction
 
kafka-steaming-data
kafka-steaming-datakafka-steaming-data
kafka-steaming-data
 
Introduction To Algorithm [2]
Introduction To Algorithm [2]Introduction To Algorithm [2]
Introduction To Algorithm [2]
 
University Course Timetabling by using Multi Objective Genetic Algortihms
University Course Timetabling by using Multi Objective Genetic AlgortihmsUniversity Course Timetabling by using Multi Objective Genetic Algortihms
University Course Timetabling by using Multi Objective Genetic Algortihms
 
VMworld 2015: vSphere Web Client- Yesterday, Today, and Tomorrow
VMworld 2015: vSphere Web Client- Yesterday, Today, and TomorrowVMworld 2015: vSphere Web Client- Yesterday, Today, and Tomorrow
VMworld 2015: vSphere Web Client- Yesterday, Today, and Tomorrow
 

Similar to Real-time OLAP Big Data Use Cases

HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeCloudera, Inc.
 
Xebia adobe flash mobile applications
Xebia adobe flash mobile applicationsXebia adobe flash mobile applications
Xebia adobe flash mobile applicationsMichael Chaize
 
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Romeo Kienzler
 
Oop2012 keynote Design Driven Development
Oop2012 keynote Design Driven DevelopmentOop2012 keynote Design Driven Development
Oop2012 keynote Design Driven DevelopmentMichael Chaize
 
Monitoring with Icinga2 at Adobe
Monitoring with Icinga2 at AdobeMonitoring with Icinga2 at Adobe
Monitoring with Icinga2 at AdobeIcinga
 
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...François Le Droff
 
xTech2006_DB2onRails
xTech2006_DB2onRailsxTech2006_DB2onRails
xTech2006_DB2onRailswebuploader
 
Moving to the cloud azure, office365, and intune - concurrency
Moving to the cloud   azure, office365, and intune - concurrencyMoving to the cloud   azure, office365, and intune - concurrency
Moving to the cloud azure, office365, and intune - concurrencyConcurrency, Inc.
 
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - Demo
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - DemoServerless Databases - Amazon DynamoDB and Amazon Aurora Serverless - Demo
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - DemoAmazon Web Services
 
The Yin and Yang of Software
The Yin and Yang of SoftwareThe Yin and Yang of Software
The Yin and Yang of Softwareelliando dias
 
GPSTEC324_STORAGE FOR HPC IN THE CLOUD
GPSTEC324_STORAGE FOR HPC IN THE CLOUDGPSTEC324_STORAGE FOR HPC IN THE CLOUD
GPSTEC324_STORAGE FOR HPC IN THE CLOUDAmazon Web Services
 
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017Amazon Web Services
 
Adobe jax2010 1_dashboard
Adobe jax2010 1_dashboardAdobe jax2010 1_dashboard
Adobe jax2010 1_dashboardguest9776673
 
Adobe Ask the AEM Community Expert Session Oct 2016
Adobe Ask the AEM Community Expert Session Oct 2016Adobe Ask the AEM Community Expert Session Oct 2016
Adobe Ask the AEM Community Expert Session Oct 2016AdobeMarketingCloud
 
Strengthening Adobe’s Enterprise Platform with Day Software and Open Development
Strengthening Adobe’s Enterprise Platform with Day Software and Open DevelopmentStrengthening Adobe’s Enterprise Platform with Day Software and Open Development
Strengthening Adobe’s Enterprise Platform with Day Software and Open DevelopmentCraig Randall
 
Windows Azure Platform + PHP - Jonathan Wong
Windows Azure Platform + PHP - Jonathan WongWindows Azure Platform + PHP - Jonathan Wong
Windows Azure Platform + PHP - Jonathan WongSpiffy
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWSAmazon Web Services
 
The Open PaaS Stack
The Open PaaS StackThe Open PaaS Stack
The Open PaaS StackGuy Korland
 

Similar to Real-time OLAP Big Data Use Cases (20)

HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
 
Xebia adobe flash mobile applications
Xebia adobe flash mobile applicationsXebia adobe flash mobile applications
Xebia adobe flash mobile applications
 
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
Information Retrieval, Applied Statistics and Mathematics onBigData - German ...
 
Oop2012 keynote Design Driven Development
Oop2012 keynote Design Driven DevelopmentOop2012 keynote Design Driven Development
Oop2012 keynote Design Driven Development
 
Monitoring with Icinga2 at Adobe
Monitoring with Icinga2 at AdobeMonitoring with Icinga2 at Adobe
Monitoring with Icinga2 at Adobe
 
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...
Flex and LiveCycle Data Services Best Practices from the Trenches (Adobe MAX ...
 
xTech2006_DB2onRails
xTech2006_DB2onRailsxTech2006_DB2onRails
xTech2006_DB2onRails
 
Moving to the cloud azure, office365, and intune - concurrency
Moving to the cloud   azure, office365, and intune - concurrencyMoving to the cloud   azure, office365, and intune - concurrency
Moving to the cloud azure, office365, and intune - concurrency
 
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - Demo
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - DemoServerless Databases - Amazon DynamoDB and Amazon Aurora Serverless - Demo
Serverless Databases - Amazon DynamoDB and Amazon Aurora Serverless - Demo
 
The Yin and Yang of Software
The Yin and Yang of SoftwareThe Yin and Yang of Software
The Yin and Yang of Software
 
Ibm db2 big sql
Ibm db2 big sqlIbm db2 big sql
Ibm db2 big sql
 
GPSTEC324_STORAGE FOR HPC IN THE CLOUD
GPSTEC324_STORAGE FOR HPC IN THE CLOUDGPSTEC324_STORAGE FOR HPC IN THE CLOUD
GPSTEC324_STORAGE FOR HPC IN THE CLOUD
 
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017
GPS: Storage for HPC in the Cloud - GPSTEC324 - re:Invent 2017
 
Adobe jax2010 1_dashboard
Adobe jax2010 1_dashboardAdobe jax2010 1_dashboard
Adobe jax2010 1_dashboard
 
Adobe Ask the AEM Community Expert Session Oct 2016
Adobe Ask the AEM Community Expert Session Oct 2016Adobe Ask the AEM Community Expert Session Oct 2016
Adobe Ask the AEM Community Expert Session Oct 2016
 
Strengthening Adobe’s Enterprise Platform with Day Software and Open Development
Strengthening Adobe’s Enterprise Platform with Day Software and Open DevelopmentStrengthening Adobe’s Enterprise Platform with Day Software and Open Development
Strengthening Adobe’s Enterprise Platform with Day Software and Open Development
 
Windows Azure Platform + PHP - Jonathan Wong
Windows Azure Platform + PHP - Jonathan WongWindows Azure Platform + PHP - Jonathan Wong
Windows Azure Platform + PHP - Jonathan Wong
 
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
(ARC346) Scaling To 25 Billion Daily Requests Within 3 Months On AWS
 
Big Data in the Cloud
Big Data in the Cloud Big Data in the Cloud
Big Data in the Cloud
 
The Open PaaS Stack
The Open PaaS StackThe Open PaaS Stack
The Open PaaS Stack
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Real-time OLAP Big Data Use Cases

  • 1. Real-time “OLAP” for Big Data (+ use cases) Cosmin Lehene | Adobe #bigdataro - 30 January 2013 © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  • 2. What we needed … and built  OLAP Semantics  Low Latency Ingestion  High Throughput  Real-time Query API © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 2
  • 3. “Physical” Building Blocks © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 3
  • 4. Logical Building Blocks  Dimensions, Metrics  Aggregations  Roll-up, drill-down, slicing and dicing, sorting © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 4
  • 5. OLAP 101 – Queries example Date Country City OS Browser Sale 2012-05-21 USA NY Windows FF 0.0 2012-05-21 USA NY Windows FF 10.0 2012-05-22 USA SF OSX Chrome 25.0 2012-05-22 Canada Ontario Linux Chrome 0.0 2012-05-23 USA Chicago OSX Safari 15.0 5 visits, 2 4 cities: 3 OS-es 3 browsers 50.0 3 days countries NY: 2 Win: 2 FF: 2 3 sales USA: 4 SF: 1 OSX: 2 Chrome:2 Canada: 1 © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 5
  • 6. OLAP 101 – Queries example  Rolling up to country level: Country visits sales SELECT COUNT(visits), SUM(sales) USA 4 $50 GROUP BY country Canada 1 0  “Slice” by browser Country visits sales SELECT COUNT(visits), SUM(sales) USA 2 $10 GROUP BY country Canada 0 0 HAVING browser = “FF” Browser sales visits  Top browsers by sales SELECT SUM(sales), COUNT(visits) Chrome $25 2 GROUP BY browser Safari $15 1 ORDER BY sales FF $10 2 © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 6
  • 7. OLAP – Runtime Aggregation vs. Pre-aggregation  Aggregate at runtime  Pre-aggregate  Most flexible  Fast  Fast – scatter gather  Efficient – O(1)  Space efficient  High throughput  But  But  I/O, CPU intensive  More effort to process (latency)  slow for larger data  Combinatorial explosion (space)  low throughput  No flexibility © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 7
  • 8. SaasBase Map © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 8
  • 9. SaasBase Domain Model Mapping © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 9
  • 10. SaasBase - Domain Model Mapping © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 10
  • 11. SaasBase - Ingestion, Processing, Indexing, Querying © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 11
  • 12. SaasBase - Ingestion, Processing, Indexing, Querying © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 12
  • 13. Ingestion © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 13
  • 14. Ingestion(ETL) throughput vs. latency  Historical data (large batches)  Optimize for throughput  Increments (latest data, smaller)  Optimize for latency © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 14
  • 15. Processing © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 15
  • 16. Processing  Processing involves reading the Input (files, tables, events), pre- aggregating it (reducing cardinality) and generating cubes that can be queried in real-time  “Super Processor” code running in Storm, Map-Reduce, HBase © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 16
  • 17. Processing for OLAP semantics  GROUP BY (process, query)  COUNT, SUM, AVG, etc. (process, query)  SORT (process, query)  HAVING (mostly query, can define pre-process constraints) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 17
  • 18. SaasBase vs. SQL Views Comparison © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 18
  • 19. Query Engine  Always reads indexed, compact data  Query parsing  Scan strategy  Single vs. multiple scans  Start/stop rows (prefixes, index positions, etc.)  Index selection (volatile indexes with incremental processing)  Deserialization  Post-aggregation, sorting, fuzzy-sorting etc.  Paging  Custom dimension/metric class loading © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 19
  • 20. Adobe Business Catalyst  Online business presence: e-commerce, marketing, web analytics etc.  Use case: Web Analytics (visitors, channels, content, e- commerce, campaigns, etc.) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 20
  • 21. BC - Workflow © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 21
  • 22. Adobe Business Catalyst - Stats  3 active datacenters  Raw data ~6TB (from ~1TB 18 months ago)  Visits table: ~1TB each(compressed)  OLAP cubes (stats): 49GB – 64GB (compressed)  ~30 minutes latency (from actual pageview/sale to chart in UI)  10s – 100s of milliseconds latency for queries  ~3000/s max concurrent OLAP queries (actual traffic is much lower) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 22
  • 23. Adobe Pass for TV Everywhere  Authentication & Authorization  Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc) with Cable operator credentials (e.g. Comcast, Dish, etc.) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 23
  • 24. Adobe Pass – Use Case  Analytics use case: Operational metrics (users, devices, latencies, etc.)  Real-time ingestion in HBase  High Frequency Map Reduce jobs (every 2 minutes) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 24
  • 25. Adobe Pass - Stats (London Olympics 2012)  67M streams ~ 5.3M hours  1.5M concurrent streams  > 7M unique users  1 Technical & Engineering Emmy Award ;) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 25
  • 26. Adobe Primetime – Real-time Video Analytics  Unified video platform (acquisition, transcoding, broadcast, ads, analytics) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 26
  • 27. Adobe Primetime – Use Case  Use Cases:  Audience metrics – minutes latency ok  Ads metrics – seconds to minutes ok  Streaming QoS metrics – seconds must  Requirements:  Massive throughput (millions of streams, multiple heartbeats every 10 seconds)  Low latency (end-to-end) © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 27
  • 28. © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 28
  • 29. Conclusions  OLAP semantics on a simple data model  Data as first class citizen  Domain Specific “Language” for Dimensions, Metrics, Aggregations  Framework for vertical analytics systems  Tunable performance, resource allocation © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 29
  • 30. Thank you! Cosmin Lehene @clehene http://hstack.org © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 30
  • 31. Related http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/ http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012 © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential. 31
  • 32. © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

Editor's Notes

  1. How many HBase users?
  2. Data as first class citizen
  3. Add the real building blocks HDFS, MapReduce, Hbase Storm
  4. Add the real building blocks HDFS, MapReduce, Hbase Storm
  5. Check contrast on projector
  6. Two approaches RDBMS / .OLAP
  7. Dimensions – readtransformserializedeserialize data attributesMetrics – read/transform/aggregate/serializeConstraints: ingestion filteringReport: instrument dimensions groups + metrics with aggregations, sorting
  8. QUERY ENGINE -> INDEX(always realtime)What’s the difference between this and HIVE/PIG/Impala
  9. Process = aggregate,generate indexes (natural)Query = uses indexes, can do extra aggregation
  10. LEFT: report definition, NOT a QUERYLIKE A VIEW - CREATED - THEN QUERIED
  11. >100K/sec/threadREALTIME
  12. ~12 hours to reprocess everything from scratch
  13. 2 datacenters (active-failover) on US West and East coasts (2NN + 19DN, 0.5PB total, 456 cores, 1.1TB RAM)
  14. ----- Meeting Notes (1/29/13 18:09) -----OlympicsSame SaasBase codebase running in Storm instead of HadoopSimpler aggregations, but strict latency requirements
  15. ----- Meeting Notes (1/29/13 18:12) -----draw line between player and chart
  16. Data analysts work with familiar concepts----- Meeting Notes (1/29/13 18:12) -----Future:
  17. …….