SlideShare a Scribd company logo
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Murtaza Doctor Principal Architect
Giang Nguyen Sr. Software Engineer
How we solved
Real-Time User
Segmentation using
Hbase
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Outline
• {rr} Story
• {rr} Personalization Platform
• User Segmentation: Problem Statement
• Design & Architecture
• Performance Metrics
• Q&A
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Personalized Product
Recommendations
• Targeted Promotional
Content
• Relevant Brand
Advertising
RichRelevance delivers:
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Multiple Personalization Placements
Personalized Product
Recommendations Targeted Promotions Relevant Brand Advertising
Inventory and
Margin Data
Catalog
Attribute Data
Real time user
behavior
Multi-Channel
purchase
history
3rd Party Data
Sources
Future Data
Sources
Input
Data
100+ algorithms dynamically targeted by
page, context, and user behavior, across
all retail channels.
Targeted content optimization to
customer segments
Monetization through relevant
brand advertising
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
 One place where all data
about your customer lives
(e.g. loyalty, customer
service, offline, catalogue)
 Direct access to data for
your entire enterprise via API
 Real-time actionable data
and business intelligence
 Link customer activity across
any channel
 Leverage 3rd party data (e.g.
weather, geography, census)
Delivering a
Customer-centric
Omni-channel
Data model
RichRelevanceDataMesh
Real-time
Segmentation
DMP
Integration
RichRelevance DataMesh Cloud Platform
Delivering a Single View of your Customer
Event-based
Triggers
Ad Hoc
Reporting
Omni-channel
Personalization
Loyalty
Integration
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Our cloud-based platform supports both real-time processes
and analytical use cases, utilizing technologies to name a
few: Crunch, Hive, HBase, Avro, Azkaban, Voldemort, Kafka
Someone clicks on a {rr} recommendation
every 21 milliseconds
Did You Know?
Our data capacity includes a 1.5 PB Hadoop infrastructure,
which enables us to employ 100+ algorithms in real-time
In the US, we serve 7000 requests per second with an average
response time of 50 ms
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
User Segmentation
Finding and Targeting the Right Audience
Example segment : savvy shopper
• Demographics
– Female
– Age 30 to 25
• DMA
– 98004
• Behavior
– Purchased a Gucci hand bag last month,
Louis Vuitton hand bag 2 weeks back,
Chanel hand bag last week
– Viewed products in the Cartier brand 2 days
ago
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Utilizes valuable targeting
data such as event logs,
off-line data, DMA,
demographics, and etc.
• Finds highly qualified
consumers and buckets them
into segments
– Example: for Retail, Segment
Builder supports view, click,
purchase on products,
categories, and brands
What is the {rr} Segment Builder?
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Segment’s list page
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Add/Edit segment page
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Create segments to capture the audience via web UI
– Segment definition consists of one or more behaviors joined
together using logical expressions
– Example: Find all users who viewed a category in the last 7 days at least
3 times, or clicked on a product at least 2 times in the last 2 days, and
from zip code 90210
• Each behavior is captured by a rule
• Each rule corresponds to a row key in HBase
• Each rule returns the users
• Rules are joined using set union or intersection
• Segment definition consists of one or more rules
Design: Segment Evaluation Engine
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
1.
Segment Builder
Add rule
2.
3.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segment Builder
Add additional rule
1.
3.
2.
4.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• User interacts with a retail site
• Events are triggered in real-time, each event is
converted into an Avro record
• Events are ingested using our real-time
framework built using Apache Kafka
Let’s start with real-time data ingestion
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Design Principles: Real-Time Solution
• Streaming: Support for streaming versus batch
• Reliability: no loss of events and at least once delivery of
messages
• Scalable: add more brokers, add more consumers,
partition data
• Distributed system with central coordinator like
zookeeper
• Persistence of messages
• Push & pull mechanism
• Support for compression
• Low operational complexity & easy to monitor
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Our Decision: Apache Kafka
• Distributed pub-sub messaging system
• More generic concept (Ingest & Publish)
• Can support both offline & online use-
cases
• Designed for persistent messages as
the common case
• Guarantee ordering of events
• Supports gzip & snappy (0.8) compression protocols
• Supports rewind offset & re-consumption of data
• No concept of master node, brokers are peers
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Common Consumer Framework
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Volume Facts?
• Daily clickstream event data ~ 150 GB
• Average size of message ~ 2 KB
• Batch Size ~ 5000 messages
• Producer throughput: 5000 messages/sec
• Real time Hbase Consumer: 7000 messages/sec
on average
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
End-to-end Real-time Story…
• User exhibits a behavior
• Events are generated at front-end data-center
• Events are streamed to backend data center via
Kafka
• Events are consumed and indexed in HBase
tables
• Takes seconds from event origination to indexing
in HBase
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
site.com
[retailer]
runtime
[websever]
clickstream
events
clicks
views
purchases kafka Producer
kafka
Broker
runtime colo
backend colo
mirrored
Kafka Broker
HBase
Consumer
HBase
Segment
Manager
Segment
History
Segment
Evaluator
avro
events
dimension tables
events
available in msecs
shopper
End-to-end Real-time Story…
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Row key represents the behavior
• Columns store the user id
• Cell stores the time when user exhibited
behavior. Used to capture frequency
• One column family U
Design: HBase Table
Rowkey Columns
338VBChanel1370377143 23b93laddf82:1370377143973
Hd92jslahd0a:1313323414192
338CCElectronic1370377143 z3be3la2dfa2:1370477142970
kd9zjsla3d01:1313323414192
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Row key contains an attribute value, siteId, metric, and
attribute, followed by the timestamp that this occured,
and finally the userGUID
[salt]_len_[siteId]_len_[metric]_len_[dimension]_len_[value]
[timestamp]
• _len_ is the integer length of the field following it. Fixed-
length fields (timestamp) don't get a length
• Timestamp is stored in day granularity
• [salt] is computed by first creating a hash from the siteId,
metric, and dimension, then combining this with a random
number between 0 and N (number of shards) for sharding
Design: User Index
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Each row is sharded to prevent hot-spotting
• Popular products or high level categories can
have millions of users, each day
• Each row key is sharded to keep the row small
• Shard into N number of regions (predefined)
• Calculate the hash with a random number
between 0 and N; we then spread this load to N
regions
Design: Row Sharding
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Each row key returns a set of users
• OR = Full outer join
• AND = Inner join
• Done in memory for small rules
• Merged on disk for large rules
Design: Joins
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Segmentation Performance
• Data indexed in real time
– User browses site and clicks product and is
added to a segment in less than a second
• 40K puts/sec over 2 tables, 8 regions per table
• Scaling achieved through addition of regions
• Segments with 4-5 attributes/dimensions can be
evaluated in msecs
• Large segments with millions of users can be
evaluated in secs
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
• Real-time data
ingestion an important
ingredient
• Indexing of
segmentation attributes
• HBase key design
critical for performance
• Distributed data
centers complicates
matters
Lessons Learned
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
© 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
Thank You!

More Related Content

What's hot

HIPAA Compliance in the Cloud
HIPAA Compliance in the CloudHIPAA Compliance in the Cloud
HIPAA Compliance in the Cloud
DataWorks Summit/Hadoop Summit
 
How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?
DataStax
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services Experience
Cloudera, Inc.
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
DataWorks Summit
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
In-Memory Computing Summit
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
Cloudera, Inc.
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
Mark Rittman
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
Cloudera, Inc.
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
DataWorks Summit
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with Search
Cloudera, Inc.
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
Jason Hubbard
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
DataWorks Summit
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft Azure
Precisely
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Cloudera, Inc.
 
Webinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkWebinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the Dark
DataStax
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
DataWorks Summit/Hadoop Summit
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and Streaming
Anjani Phuyal
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
PivotalOpenSourceHub
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
Cloudera, Inc.
 

What's hot (20)

HIPAA Compliance in the Cloud
HIPAA Compliance in the CloudHIPAA Compliance in the Cloud
HIPAA Compliance in the Cloud
 
How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?How much money do you lose every time your ecommerce site goes down?
How much money do you lose every time your ecommerce site goes down?
 
Relying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services ExperienceRelying on Data for Strategic Decision-Making--Financial Services Experience
Relying on Data for Strategic Decision-Making--Financial Services Experience
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
IMCSummit 2015 - Day 2 Developer Track - The Internet of Analytics – Discover...
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Moving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache KuduMoving Beyond Lambda Architectures with Apache Kudu
Moving Beyond Lambda Architectures with Apache Kudu
 
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
GeoWave: Open Source Geospatial/Temporal/N-dimensional Indexing for Accumulo,...
 
Intuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with SearchIntuitive Real-Time Analytics with Search
Intuitive Real-Time Analytics with Search
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
IoT Connected Brewery
IoT Connected BreweryIoT Connected Brewery
IoT Connected Brewery
 
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
Can you Re-Platform your Teradata, Oracle, Netezza and SQL Server Analytic Wo...
 
Mainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft AzureMainframe Modernization with Precisely and Microsoft Azure
Mainframe Modernization with Precisely and Microsoft Azure
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Webinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the DarkWebinar: Don't Leave Your Data in the Dark
Webinar: Don't Leave Your Data in the Dark
 
Instrumenting your Instruments
Instrumenting your Instruments Instrumenting your Instruments
Instrumenting your Instruments
 
Big Data, HPC and Streaming
Big Data, HPC and StreamingBig Data, HPC and Streaming
Big Data, HPC and Streaming
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 

Similar to HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study

Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.BI
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
DataStax
 
Hello Streams Overview
Hello Streams OverviewHello Streams Overview
Hello Streams Overview
psanet
 
Vistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT OperationsVistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT Operations
Vistara
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
Orkhan Gasimov
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics Platform
Arvind Sathi
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
MSAdvAnalytics
 
Navigator Systems ltd HireTrack NX questions
Navigator Systems ltd   HireTrack NX questionsNavigator Systems ltd   HireTrack NX questions
Navigator Systems ltd HireTrack NX questions
David Rose
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Kai Wähner
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
Big Data Joe™ Rossi
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
Big Data Joe™ Rossi
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics
Informatica Cloud
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
David Walker
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
DataWorks Summit/Hadoop Summit
 
Race IT Review
Race IT ReviewRace IT Review
Race IT Review
cschmidtva
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Greg Makowski
 
Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'
Amazon Web Services
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
VMware Tanzu
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
Cloudera, Inc.
 

Similar to HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study (20)

Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For EcommerceDeep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
Hello Streams Overview
Hello Streams OverviewHello Streams Overview
Hello Streams Overview
 
Vistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT OperationsVistara 3.1 - Delivering Unified IT Operations
Vistara 3.1 - Delivering Unified IT Operations
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics Platform
 
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
Cortana Analytics Workshop: The "Big Data" of the Cortana Analytics Suite, Pa...
 
Navigator Systems ltd HireTrack NX questions
Navigator Systems ltd   HireTrack NX questionsNavigator Systems ltd   HireTrack NX questions
Navigator Systems ltd HireTrack NX questions
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
 
5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics5 Ways to Make Waves with Informatica and Salesforce Analytics
5 Ways to Make Waves with Informatica and Salesforce Analytics
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Race IT Review
Race IT ReviewRace IT Review
Race IT Review
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
 
Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'Breaking Down the 'Monowhat'
Breaking Down the 'Monowhat'
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
HBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBaseHBaseCon 2013: ETL for Apache HBase
HBaseCon 2013: ETL for Apache HBase
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
Tobias Schneck
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
Mydbops
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
zjhamm304
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
manji sharman06
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
Ortus Solutions, Corp
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
Pablo Gómez Abajo
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
ScyllaDB
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
Sease
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
HarpalGohil4
 

Recently uploaded (20)

Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!Containers & AI - Beauty and the Beast!?!
Containers & AI - Beauty and the Beast!?!
 
Must Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during MigrationMust Know Postgres Extension for DBA and Developer during Migration
Must Know Postgres Extension for DBA and Developer during Migration
 
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...QA or the Highway - Component Testing: Bridging the gap between frontend appl...
QA or the Highway - Component Testing: Bridging the gap between frontend appl...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
Call Girls Chandigarh🔥7023059433🔥Agency Profile Escorts in Chandigarh Availab...
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!Introducing BoxLang : A new JVM language for productivity and modularity!
Introducing BoxLang : A new JVM language for productivity and modularity!
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsGetting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
Getting the Most Out of ScyllaDB Monitoring: ShareChat's Tips
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
From Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMsFrom Natural Language to Structured Solr Queries using LLMs
From Natural Language to Structured Solr Queries using LLMs
 
AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)AWS Certified Solutions Architect Associate (SAA-C03)
AWS Certified Solutions Architect Associate (SAA-C03)
 

HBaseCon 2013: Realtime User Segmentation using Apache HBase -- Architectural Case Study

  • 1. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Murtaza Doctor Principal Architect Giang Nguyen Sr. Software Engineer How we solved Real-Time User Segmentation using Hbase
  • 2. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Outline • {rr} Story • {rr} Personalization Platform • User Segmentation: Problem Statement • Design & Architecture • Performance Metrics • Q&A
  • 3. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Personalized Product Recommendations • Targeted Promotional Content • Relevant Brand Advertising RichRelevance delivers:
  • 4. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Multiple Personalization Placements Personalized Product Recommendations Targeted Promotions Relevant Brand Advertising Inventory and Margin Data Catalog Attribute Data Real time user behavior Multi-Channel purchase history 3rd Party Data Sources Future Data Sources Input Data 100+ algorithms dynamically targeted by page, context, and user behavior, across all retail channels. Targeted content optimization to customer segments Monetization through relevant brand advertising
  • 5. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.  One place where all data about your customer lives (e.g. loyalty, customer service, offline, catalogue)  Direct access to data for your entire enterprise via API  Real-time actionable data and business intelligence  Link customer activity across any channel  Leverage 3rd party data (e.g. weather, geography, census) Delivering a Customer-centric Omni-channel Data model RichRelevanceDataMesh Real-time Segmentation DMP Integration RichRelevance DataMesh Cloud Platform Delivering a Single View of your Customer Event-based Triggers Ad Hoc Reporting Omni-channel Personalization Loyalty Integration
  • 6. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Our cloud-based platform supports both real-time processes and analytical use cases, utilizing technologies to name a few: Crunch, Hive, HBase, Avro, Azkaban, Voldemort, Kafka Someone clicks on a {rr} recommendation every 21 milliseconds Did You Know? Our data capacity includes a 1.5 PB Hadoop infrastructure, which enables us to employ 100+ algorithms in real-time In the US, we serve 7000 requests per second with an average response time of 50 ms
  • 7. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. User Segmentation Finding and Targeting the Right Audience Example segment : savvy shopper • Demographics – Female – Age 30 to 25 • DMA – 98004 • Behavior – Purchased a Gucci hand bag last month, Louis Vuitton hand bag 2 weeks back, Chanel hand bag last week – Viewed products in the Cartier brand 2 days ago
  • 8. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Utilizes valuable targeting data such as event logs, off-line data, DMA, demographics, and etc. • Finds highly qualified consumers and buckets them into segments – Example: for Retail, Segment Builder supports view, click, purchase on products, categories, and brands What is the {rr} Segment Builder?
  • 9. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Segment’s list page
  • 10. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Add/Edit segment page
  • 11. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Create segments to capture the audience via web UI – Segment definition consists of one or more behaviors joined together using logical expressions – Example: Find all users who viewed a category in the last 7 days at least 3 times, or clicked on a product at least 2 times in the last 2 days, and from zip code 90210 • Each behavior is captured by a rule • Each rule corresponds to a row key in HBase • Each rule returns the users • Rules are joined using set union or intersection • Segment definition consists of one or more rules Design: Segment Evaluation Engine
  • 12. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. 1. Segment Builder Add rule 2. 3.
  • 13. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segment Builder Add additional rule 1. 3. 2. 4.
  • 14. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • User interacts with a retail site • Events are triggered in real-time, each event is converted into an Avro record • Events are ingested using our real-time framework built using Apache Kafka Let’s start with real-time data ingestion
  • 15. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Design Principles: Real-Time Solution • Streaming: Support for streaming versus batch • Reliability: no loss of events and at least once delivery of messages • Scalable: add more brokers, add more consumers, partition data • Distributed system with central coordinator like zookeeper • Persistence of messages • Push & pull mechanism • Support for compression • Low operational complexity & easy to monitor
  • 16. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Our Decision: Apache Kafka • Distributed pub-sub messaging system • More generic concept (Ingest & Publish) • Can support both offline & online use- cases • Designed for persistent messages as the common case • Guarantee ordering of events • Supports gzip & snappy (0.8) compression protocols • Supports rewind offset & re-consumption of data • No concept of master node, brokers are peers
  • 17. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
  • 18. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Common Consumer Framework
  • 19. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Volume Facts? • Daily clickstream event data ~ 150 GB • Average size of message ~ 2 KB • Batch Size ~ 5000 messages • Producer throughput: 5000 messages/sec • Real time Hbase Consumer: 7000 messages/sec on average
  • 20. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. End-to-end Real-time Story… • User exhibits a behavior • Events are generated at front-end data-center • Events are streamed to backend data center via Kafka • Events are consumed and indexed in HBase tables • Takes seconds from event origination to indexing in HBase
  • 21. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. site.com [retailer] runtime [websever] clickstream events clicks views purchases kafka Producer kafka Broker runtime colo backend colo mirrored Kafka Broker HBase Consumer HBase Segment Manager Segment History Segment Evaluator avro events dimension tables events available in msecs shopper End-to-end Real-time Story…
  • 22. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Row key represents the behavior • Columns store the user id • Cell stores the time when user exhibited behavior. Used to capture frequency • One column family U Design: HBase Table Rowkey Columns 338VBChanel1370377143 23b93laddf82:1370377143973 Hd92jslahd0a:1313323414192 338CCElectronic1370377143 z3be3la2dfa2:1370477142970 kd9zjsla3d01:1313323414192
  • 23. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Row key contains an attribute value, siteId, metric, and attribute, followed by the timestamp that this occured, and finally the userGUID [salt]_len_[siteId]_len_[metric]_len_[dimension]_len_[value] [timestamp] • _len_ is the integer length of the field following it. Fixed- length fields (timestamp) don't get a length • Timestamp is stored in day granularity • [salt] is computed by first creating a hash from the siteId, metric, and dimension, then combining this with a random number between 0 and N (number of shards) for sharding Design: User Index
  • 24. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Each row is sharded to prevent hot-spotting • Popular products or high level categories can have millions of users, each day • Each row key is sharded to keep the row small • Shard into N number of regions (predefined) • Calculate the hash with a random number between 0 and N; we then spread this load to N regions Design: Row Sharding
  • 25. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Each row key returns a set of users • OR = Full outer join • AND = Inner join • Done in memory for small rules • Merged on disk for large rules Design: Joins
  • 26. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Segmentation Performance • Data indexed in real time – User browses site and clicks product and is added to a segment in less than a second • 40K puts/sec over 2 tables, 8 regions per table • Scaling achieved through addition of regions • Segments with 4-5 attributes/dimensions can be evaluated in msecs • Large segments with millions of users can be evaluated in secs
  • 27. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. • Real-time data ingestion an important ingredient • Indexing of segmentation attributes • HBase key design critical for performance • Distributed data centers complicates matters Lessons Learned
  • 28. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential.
  • 29. © 2013 RichRelevance, Inc. All Rights Reserved. Confidential. Thank You!

Editor's Notes

  1. How we plan to go over stuff
  2. The RichRelevance DataMesh Cloud Platform delivers a single view of your customer by:Giving you one single place to house unlimited sets of dataExample use cases:Create your own run-time strategies (predictive models)Create and manage segments via toolAutomatic & real-time segment creationView performance of strategies against KPIs Run adhoc queries using SQL-like toolImport into offline toolsOLAP capabilitiesMarket Basket AnalysisCustomer Lifetime ValueSequential Pattern miningManage APIs, build products & applications
  3. Nuggets or Data Points1.5PB not as big as yahoo or facebook – huge from a retail industry perspective
  4. Distributed System:: i.e. producers, brokers and consumer entities can all be deployed to different hosts in different colos in a truly distributed fashion and coordination controlled through zookeeperPersistence of Messages: messages need to be persisted on the broker for reliability, replay and temporary storagePush & Pull Mechanism:: i.e. push data to Kafka server and pull data from it using a consumer. This allows for two different rates: rate at which messages are transferred to the kafka server and the rate at which the messages are consumed.: Kafka supports GZIP and version 0.8 will additionally support Snappy compression.