Relevance - Deal Personalization and Real Time Big Data Analytics
Prassnitha Sampath
psampath@groupon.com
About Me
•  Lead Engineer working on Real Time Data Infrastructure @ Groupon
•  Graduate of Portland State and Madras University
  
What are Groupon Deals?
Our Relevance Scenario
Users	
  
Scaling: Keeping Up With a Changing Business
Growing Number of Deals and Growing Users (2011, 2012, 2014)
•  100 Million+ subscribers
•  We need to store data such as user click history, email records, service logs, etc. This is billions of data points and terabytes of data.
  
Changing Business: Shift from Email to Mobile
•  Growth in Mobile Business
•  Reducing dependence on email marketing
100 Million+ App Downloads
  
Deal Personalization Infrastructure Use Cases
Offline System: Deliver Personalized Emails
Personalize billions of emails for hundreds of millions of users
Online System: Deliver Personalized Website & Mobile Experience
Personalize one of the most popular e-commerce mobile & web apps for hundreds of millions of users & page views
  
Deal Personalization Infrastructure Use Cases
Deliver Personalized Website, Mobile and Email Experience
Deal Performance
Understand User Behavior
Deliver Relevant Experience with High Quality Deals
Earlier System
[Diagram: Data Pipeline (User Logs, Email Records, User History, etc.), MySQL Store, Offline Personalization Map/Reduce, Online Deal Personalization API, Email]
  
Earlier System
[Diagram: Data Pipeline, MySQL Store, Offline Personalization Map/Reduce, Online Deal Personalization API, Email]
•  Scaling MySQL for data such as user click history and email records was painful unless we sharded the data
•  The Data Pipeline was not “Real Time”
  
Ideal System
[Diagram: Real Time Data Pipeline, Ideal Data Store, Offline Personalization Map/Reduce, Online Deal Personalization API, Email]
•  Common data store that serves data to both online and offline systems
•  Data store that scales to hundreds of millions of records
•  Data store that works well with our existing Hadoop based systems
•  Real Time pipeline that scales and can process about 100,000 messages/second
Final Design
[Diagram: Web Site Logs, Mobile Logs, Kafka Message Broker, Storm, HBase, Offline Personalization Map/Reduce, Online Deal Personalization API, Email]
  
Two Challenges With HBase
HBase: How to scale to 100,000 writes/second?
HBase:
•  How to run Map Reduce programs over HBase without affecting read latency?
•  How to batch load data into HBase without affecting read latencies?
  
	
  
Final HBase Design
[Diagram: Real Time HBase, Replication, Batch HBase, Bulk Load data via HFiles, Map Reduce Over HBase]
  
Leveraging System for Real Time Analytics
	
Various requirements from relevance algorithms to pre-compute real time analytics for better targeting
Category Level Multidimensional Performance Metrics: How do women in Dublin convert for Pizza deals?
Deal Level Performance Metrics: How do women in Dublin convert for a particular pizza deal?
  	
  
Leveraging System for Real Time Analytics
	
More Complex Examples
Category Level Multidimensional Performance Metrics: How do women in Dublin from the Dundrum area aged 30-35 convert for New York Style Pizza, when the deal is located within 2 miles, and when the deal is priced between €10-€20?
Deal Level Performance Metrics: How do women in Dublin from the Dundrum area aged 30-35 convert for a particular deal?
  
Leveraging System for Real Time Analytics
Even More Complex Examples
How do women in Dublin from the Dundrum area aged 30-35 who also like activities such as biking and are active customers on our mobile platform convert when the deal is located within 2 miles, and when the deal is priced between €10-€20?
How do women in Dublin from the Dundrum area aged 30-35 who also like activities such as biking and are active customers of Groupon deals on the mobile platform convert for this particular deal?
  
Power of Simple Counting
It turns out that all the earlier questions can be answered if we can count the appropriate events in the appropriate buckets.
Conversion rate for pizza deals for women in Dublin = (No. of Purchases by Women in Dublin for Pizza Deals) / (No. of Deal Impressions by Women in Dublin for Pizza Deals)
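As a minimal sketch of the ratio above: once the two counters for a bucket are available, the conversion rate is a single division. The function name and the counts below are hypothetical and only illustrate the arithmetic.

```python
def conversion_rate(purchases: int, impressions: int) -> float:
    """Purchases divided by impressions; 0.0 when nothing was shown."""
    if impressions == 0:
        return 0.0
    return purchases / impressions


# Hypothetical counts for the bucket (women, Dublin, pizza deals).
rate = conversion_rate(purchases=30, impressions=1200)
print(f"{rate:.3f}")  # prints 0.025
```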
  
Real Time Analytics Infrastructure
Kafka Topic: Real Time User events
Storm: Running Analytics Topology
Real Time infrastructure processing 100,000 requests/second
The Storm Topology calculates the various dimensions/buckets and updates the appropriate Redis bucket. Redis is sharded from the client side.
Redis 1, Redis 2, … Redis N
The Redis cluster handles over 3 Million events per second and stores over 14 Billion unique keys.
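Client-side sharding means the writer, not Redis, decides which instance owns a key. The slides do not say which hash or key layout was used, so the sketch below assumes a CRC32 hash taken modulo the shard count; the shard addresses and the key format are hypothetical.

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a counter key to a shard index using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


# Hypothetical addresses standing in for Redis 1 ... Redis N.
SHARDS = ["redis-1:6379", "redis-2:6379", "redis-3:6379"]

# Hypothetical key layout for one counting bucket.
key = "impressions:gender=F:city=Dublin:category=pizza"
print(SHARDS[shard_for(key, len(SHARDS))])
```

Every client that uses the same function and shard list sends a given key to the same instance. A plain modulo remaps most keys when the shard count changes, so a real deployment would need a plan for resharding.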
  
Real Time Analytics Infrastructure - Explained
Kafka Topic (with Real Time User events) -> Storm -> Redis Shards
Storm:
•  Read user event data from Kafka
•  Find out which buckets this event falls into
•  Increase the event counter for the appropriate bucket in Redis
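The three Storm steps can be sketched in a few lines. This is an illustration under assumptions, not the production topology: it assumes an event carries a flat set of dimensions, that every non-empty combination of dimensions is a bucket, and it uses an in-memory `Counter` in place of the Redis shards. The event shape and key format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def bucket_keys(event_type, dimensions):
    """Return one key per non-empty subset of the event's dimensions."""
    items = sorted(dimensions.items())
    keys = []
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            suffix = ":".join(f"{name}={value}" for name, value in subset)
            keys.append(f"{event_type}:{suffix}")
    return keys


counters = Counter()  # in-memory stand-in for the Redis shards


def process(event):
    """Find every bucket the event falls into and bump its counter."""
    for key in bucket_keys(event["type"], event["dimensions"]):
        counters[key] += 1


# Hypothetical user event as it might arrive from the Kafka topic.
process({
    "type": "impression",
    "dimensions": {"gender": "F", "city": "Dublin", "category": "pizza"},
})
print(counters["impression:category=pizza:city=Dublin:gender=F"])  # prints 1
```

Three dimensions produce seven buckets per event, which shows why the number of unique keys grows quickly as dimensions are added.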
  
Scaling Challenges - Kafka - Storm
	
  
•  Storm was hard to scale. We had to try various combinations to finalize how many bolts of each type are required for steady state operations and how many workers are needed overall.
•  Use the “topology.max.spout.pending” setting in Storm topologies. We found it very useful for shielding topologies from a sudden surge in traffic.
•  Build your entire infrastructure so that data duplicates are allowed.
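In Storm, “topology.max.spout.pending” caps how many emitted tuples a spout task may have un-acked at once. The toy model below is not Storm code; it only simulates that cap to show why a surge stays in the broker instead of flooding the bolts. The function name, tick model, and numbers are assumptions made for illustration.

```python
from collections import deque


def run_spout(incoming, max_pending, acks_per_tick):
    """Simulate a spout that stops emitting once max_pending tuples are un-acked."""
    queue = deque(incoming)  # messages still waiting in the broker
    pending = 0              # emitted but not yet acked
    peak_pending = 0
    ticks = 0
    while queue or pending:
        # Emit only while under the cap; the rest stays queued upstream.
        while queue and pending < max_pending:
            queue.popleft()
            pending += 1
        peak_pending = max(peak_pending, pending)
        # Downstream bolts ack a bounded number of tuples per tick.
        pending -= min(pending, acks_per_tick)
        ticks += 1
    return peak_pending, ticks


# A surge of 10,000 messages against bolts that ack 100 tuples per tick.
peak, ticks = run_spout(range(10_000), max_pending=500, acks_per_tick=100)
print(peak, ticks)
```

In this model the in-flight count never exceeds the cap, so the surge lengthens processing time rather than overloading the topology.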
  
Scaling Challenges - Redis
•  Reduce memory footprint: use hashes. They are very memory efficient compared to normal Redis keys.
•  In order to support a high rate of write operations, we turned off AOF and turned on RDB backups.
Redis was the easiest to scale compared with the other infrastructure pieces: Kafka, Storm, HBase.
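One common way to apply the hashes tip is to group many counters under a smaller number of Redis hashes, so each counter becomes a field rather than a top-level key. The slides do not describe the actual key scheme, so the grouping rule, the `counters:` prefix, and `NUM_HASHES` below are hypothetical, and a plain dictionary stands in for Redis.

```python
import zlib
from collections import defaultdict

# Hypothetical; in practice sized so that each hash stays small.
NUM_HASHES = 1024

# hash name -> {field: count}; in-memory stand-in for Redis hashes.
store = defaultdict(dict)


def hash_slot(counter_key):
    """Split a flat counter key into (hash name, field)."""
    bucket = zlib.crc32(counter_key.encode("utf-8")) % NUM_HASHES
    return f"counters:{bucket}", counter_key


def incr(counter_key, amount=1):
    """Increment one counter stored as a field inside its hash."""
    name, field = hash_slot(counter_key)
    store[name][field] = store[name].get(field, 0) + amount
    return store[name][field]


incr("impression:city=Dublin")
print(incr("impression:city=Dublin"))  # prints 2
```

Against a real Redis server the increment would be a single `HINCRBY` on the hash name and field. The memory saving relies on Redis storing small hashes in a compact encoding, so the number of fields per hash has to stay under the configured threshold.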
  
When Small is Big – Bloom Filters
•  Since both Kafka and Storm can send the same data twice, especially at scale, it was important to build downstream infrastructure that can handle duplicate data.
•  However, by its very nature the Analytics Topology (Counting Topology) cannot handle duplicates.
•  Storing individual messages for billions of messages is far too expensive and would take a lot more memory.
•  So we used bloom filters. At a very small % error rate, we could effectively de-dupe data with a very small memory footprint.
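A minimal Bloom filter sketch shows how the de-duplication check can sit in front of a counter. The sizing, the SHA-256 based double hashing, and the message ids are assumptions made for illustration; the slides do not describe the filter that was actually deployed.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash positions per item."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions from two halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical sizing: 2**20 bits (128 KiB) and 7 hash positions.
seen = BloomFilter(num_bits=1 << 20, num_hashes=7)


def count_once(message_id, counters, bucket):
    """Increment the bucket unless the message id was probably seen before."""
    if message_id in seen:
        return False
    seen.add(message_id)
    counters[bucket] = counters.get(bucket, 0) + 1
    return True
```

A Bloom filter never reports a previously added id as new, so duplicates are always dropped. Its errors are false positives, which here means a small fraction of genuinely new events are skipped, matching the small error rate the slide accepts in exchange for the memory saving.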
  
Avoiding Errors – Backups/Recovery Strategy
For a high volume system that also drives so much revenue for the company, a good backup/recovery strategy is necessary.
Redis: RDB backups are taken every few hours and stored in HDFS for later use.
HBase: HBase Snapshot functionality is used. Snapshots are taken every few hours.
Kafka/Storm: All input into the Kafka topic is stored in HDFS for 30 days, so any hour/day can be replayed from HDFS if necessary.
  
Monitoring
Overall end-to-end monitoring to test the complete flow of data
Kafka -> Storm -> HBase Pipeline
Crawler crawls the page and monitoring looks for the corresponding data in HBase
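An end-to-end check of this kind can be sketched as: generate a traceable event, then poll the store until it appears or a deadline passes. Everything below is hypothetical, including the function name, the marker format, and the `lookup` callback that stands in for an HBase read; the slides do not show the real monitor.

```python
import time
import uuid


def wait_for_marker(lookup, marker, timeout_s=60.0, poll_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll lookup(marker) until it returns True or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        if lookup(marker):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Hypothetical probe: the crawler's page view would carry this marker,
# and the set stands in for the HBase table the pipeline writes to.
marker = f"monitor-{uuid.uuid4()}"
fake_hbase = {marker}
print(wait_for_marker(lambda m: m in fake_hbase, marker))  # prints True
```

A result of False means the event did not reach the store within the timeout, which points to a stall somewhere along the pipeline rather than in one specific component.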
  
psampath@groupon.com
www.groupon.com/techjobs
Questions?
Thank you!
Slides prepared in collaboration with Ameya Kanitkar
  

Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase - NoSQL matters Dublin 2015

 
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015NoSQLmatters
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 




Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase - NoSQL matters Dublin 2015

  • 1. Relevance - Deal Personalization and Real Time Big Data Analytics. Prassnitha Sampath, psampath@groupon.com
  • 2. About Me
      • Lead Engineer working on Real Time Data Infrastructure @ Groupon
      • Graduate of Portland State and Madras University
  • 5. Scaling: Keeping Up With a Changing Business
      [Charts for 2011, 2012, and 2014: growing number of deals; growing users]
      • 100 million+ subscribers
      • We need to store data such as user click history, email records, and service logs. This is billions of data points and terabytes of data.
  • 6. Changing Business: Shift from Email to Mobile
      • Growth in mobile business
      • Reducing dependence on email marketing
      • 100 million+ app downloads
  • 7. Deal Personalization Infrastructure Use Cases
      • Offline system: deliver personalized emails. Personalize billions of emails for hundreds of millions of users.
      • Online system: deliver a personalized website and mobile experience. Personalize one of the most popular e-commerce mobile and web apps for hundreds of millions of users and page views.
  • 8. Deal Personalization Infrastructure Use Cases
      • Deliver a personalized website, mobile, and email experience
      • Deliver a relevant experience with high quality deals
      • Inputs: deal performance; understanding user behavior
  • 9. Earlier System
      [Diagram components: Data Pipeline (user logs, email records, user history, etc.); MySQL Store; Offline Personalization Map/Reduce; Email; Online Deal Personalization API]
  • 10. Earlier System
      [Diagram components: Data Pipeline; MySQL Store; Offline Personalization Map/Reduce; Email; Online Deal Personalization API]
      • Scaling MySQL for data such as user click history and email records was painful unless we sharded the data
      • The data pipeline was not "real time"
  • 11. Ideal System
      [Diagram components: Real Time Data Pipeline; Ideal Data Store; Offline Personalization Map/Reduce; Email; Online Deal Personalization API]
      • Common data store that serves data to both online and offline systems
      • Data store that scales to hundreds of millions of records
      • Data store that works well with our existing Hadoop-based systems
      • Real time pipeline that scales and can process about 100,000 messages/second
  • 12. Final Design
      [Diagram components: Web Site Logs; Mobile Logs; Kafka Message Broker; Storm; HBase; Offline Personalization Map/Reduce; Email; Online Deal Personalization API]
  • 13. Two Challenges With HBase
      • How to scale to 100,000 writes/second?
      • How to run Map Reduce programs over HBase without affecting read latency?
      • How to batch load data into HBase without affecting read latencies?
  • 14. Final HBase Design
      [Diagram components: Real Time HBase; Batch HBase; Replication; Bulk load of data via HFiles; Map Reduce over HBase]
  • 15. Leveraging System for Real Time Analytics
      Various requirements from relevance algorithms to pre-compute real time analytics for better targeting.
      • Category-level multidimensional performance metrics: How do women in Dublin convert for pizza deals?
      • Deal-level performance metrics: How do women in Dublin convert for a particular pizza deal?
  • 16. Leveraging System for Real Time Analytics: More Complex Examples
      • Category-level multidimensional performance metrics: How do women in Dublin from the Dundrum area aged 30-35 convert for New York style pizza, when the deal is located within 2 miles and priced between €10 and €20?
      • Deal-level performance metrics: How do women in Dublin from the Dundrum area aged 30-35 convert for a particular deal?
  • 17. Leveraging System for Real Time Analytics: Even More Complex Examples
      • How do women in Dublin from the Dundrum area aged 30-35, who also like activities such as biking and are active customers on our mobile platform, convert when the deal is located within 2 miles and priced between €10 and €20?
      • How do women in Dublin from the Dundrum area aged 30-35, who also like activities such as biking and are active customers of Groupon deals on the mobile platform, convert for this particular deal?
  • 18. Power of Simple Counting
      It turns out that all the earlier questions can be answered if we count the appropriate events in the appropriate buckets.
      Conversion rate for pizza deals for women in Dublin = (No. of purchases by women in Dublin for pizza deals) / (No. of deal impressions by women in Dublin for pizza deals)
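The counting idea on this slide can be sketched in a few lines. This is only an illustration: the in-memory `Counter`, the key layout `(event_type, gender, city, category)`, and the function names are assumptions made for the example, not the schema from the talk.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the counter store.
# Keys are (event_type, gender, city, category); this layout is assumed.
counts = Counter()

def record(event_type, gender, city, category):
    """Count one event in its bucket."""
    counts[(event_type, gender, city, category)] += 1

def conversion_rate(gender, city, category):
    """Purchases divided by impressions for one bucket, or None if no impressions."""
    impressions = counts[("impression", gender, city, category)]
    purchases = counts[("purchase", gender, city, category)]
    if impressions == 0:
        return None
    return purchases / impressions

# Example: 200 impressions and 5 purchases by women in Dublin for pizza deals.
for _ in range(200):
    record("impression", "F", "Dublin", "pizza")
for _ in range(5):
    record("purchase", "F", "Dublin", "pizza")

rate = conversion_rate("F", "Dublin", "pizza")
```

Returning `None` for an empty bucket keeps "no data yet" distinct from a genuine 0% conversion rate.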
  • 19. Real Time Analytics Infrastructure
      • Kafka topic with real time user events
      • Storm running the analytics topology
      • Real time infrastructure processing 100,000 requests/second
      • The Storm topology calculates the various dimensions/buckets and updates the appropriate Redis bucket. Redis is sharded from the client side.
      • The Redis cluster (Redis 1, Redis 2, ... Redis N) handles over 3 million events per second and stores over 14 billion unique keys
  • 20. Real Time Analytics Infrastructure - Explained
      Kafka topic with real time user events -> Storm -> Redis shards. Inside Storm:
      • Read user event data from Kafka
      • Find out which buckets this event falls into
      • Increase the event counter for the appropriate bucket in Redis
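The three steps inside Storm can be sketched as follows. Everything specific here is assumed for illustration: the dimension names, the key format, the CRC32-modulo shard choice, and the plain dictionaries standing in for Redis shards. The slides say only that buckets are computed per event and that Redis is sharded from the client side.

```python
import zlib
from itertools import combinations

# Hypothetical dimensions an event may carry.
DIMENSIONS = ("gender", "city", "age_band", "category")

def bucket_keys(event):
    """Return one counter key per non-empty combination of the dimensions present."""
    present = [(d, event[d]) for d in DIMENSIONS if d in event]
    keys = []
    for r in range(1, len(present) + 1):
        for combo in combinations(present, r):
            dims = "|".join(f"{d}={v}" for d, v in combo)
            keys.append(f"{event['type']}|{dims}")
    return keys

def shard_for(key, num_shards):
    """Client-side sharding: a stable hash of the key picks the shard (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards

def process(event, shards):
    """Increment the counter for every bucket the event falls into."""
    for key in bucket_keys(event):
        shard = shards[shard_for(key, len(shards))]
        shard[key] = shard.get(key, 0) + 1

# Dictionaries stand in for Redis shards in this sketch.
shards = [dict() for _ in range(4)]
event = {"type": "purchase", "gender": "F", "city": "Dublin", "category": "pizza"}
process(event, shards)
```

An event with n dimensions touches 2^n - 1 buckets in this scheme, so a real topology would likely restrict the combinations to those the relevance algorithms actually query.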
  • 21. Scaling Challenges - Kafka - Storm
      • Storm was hard to scale. We had to try many combinations to settle on how many bolts of each type are required for steady-state operation and how many workers are needed overall.
      • Use the "topology.max.spout.pending" setting in Storm topologies. We found it very useful for shielding topologies from sudden surges in traffic.
      • Build your entire infrastructure so that duplicate data is tolerated
  • 22. Scaling Challenges - Redis
      • Reduce the memory footprint: use hashes. They are very memory efficient compared to normal Redis keys.
      • To support a high volume of write operations, we turned off AOF and turned on RDB backups
      • Redis was the easiest of all the infrastructure pieces, compared with Kafka, Storm, and HBase
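One common way to apply the "use hashes" advice is to group many counters into a fixed number of Redis hashes instead of giving each counter its own top-level key. The sketch below shows only the key mapping. The `counters:` prefix, the number of hashes, and the CRC32 choice are assumptions, and a nested dictionary stands in for Redis; with a real client the increment would be an `HINCRBY` on the hash name and field.

```python
import zlib

# Hypothetical number of hashes to spread counters over.
NUM_HASHES = 1024

def to_hash_slot(flat_key, num_hashes=NUM_HASHES):
    """Map a flat counter key to (hash name, field within that hash)."""
    slot = zlib.crc32(flat_key.encode("utf-8")) % num_hashes
    return f"counters:{slot}", flat_key

def incr(store, flat_key, amount=1):
    """Dictionary stand-in for HINCRBY <hash name> <field> <amount>."""
    name, field = to_hash_slot(flat_key)
    bucket = store.setdefault(name, {})
    bucket[field] = bucket.get(field, 0) + amount
    return bucket[field]

store = {}
incr(store, "purchase|city=Dublin")
incr(store, "purchase|city=Dublin")
```

The memory saving depends on each hash staying small enough for Redis to keep it in its compact encoding, so the number of hashes has to be sized against the expected number of counters.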
  • 23. When Small is Big: Bloom Filters
      • Since both Kafka and Storm can send the same data twice, especially at scale, it was important to build downstream infrastructure that can handle duplicate data.
      • However, by its very nature the analytics topology (a counting topology) cannot handle duplicates.
      • Storing every individual message, for billions of messages, is far too expensive and would take much more memory.
      • So we used Bloom filters. At a very small error rate, we could effectively de-duplicate data with a very small memory footprint.
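A minimal Bloom filter for de-duplicating message IDs might look like the sketch below. It is not the implementation from the talk: the sizing uses the standard textbook formulas, and the SHA-256 double-hashing scheme, class name, and method name are choices made for this example.

```python
import hashlib
import math

class BloomFilter:
    """Fixed-size Bloom filter sized for an expected item count and error rate."""

    def __init__(self, expected_items, error_rate):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = max(1, math.ceil(-expected_items * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add_if_new(self, item):
        """Record item; return True if it was definitely not seen before."""
        positions = self._positions(item)
        seen = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return not seen

# Usage: count an event only when its (hypothetical) message ID is new.
seen_messages = BloomFilter(expected_items=1000, error_rate=0.01)
first = seen_messages.add_if_new("msg-1")   # new message, should be counted
second = seen_messages.add_if_new("msg-1")  # duplicate delivery, should be skipped
```

A Bloom filter never reports a seen item as new, so duplicates are always caught. Its errors go the other way: a small fraction of genuinely new messages are mistaken for duplicates and dropped, which slightly undercounts.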
  • 24. Avoiding Errors: Backup/Recovery Strategy
      For a high volume system that also drives so much revenue for the company, a good backup/recovery strategy is necessary.
      • Redis: RDB backups every few hours. The RDB backups are stored in HDFS for later use.
      • HBase: HBase snapshot functionality is used. Snapshots are taken every few hours.
      • Kafka/Storm: all input into the Kafka topic is stored in HDFS for 30 days, so any hour or day can be replayed from HDFS if necessary.
  • 25. Monitoring
      • Overall end-to-end monitoring to test the complete flow of data through the Kafka -> Storm -> HBase pipeline
      • A crawler crawls the page, and monitoring looks for the corresponding data in HBase
  • 26. Questions? Thank you! psampath@groupon.com, www.groupon.com/techjobs. Slides prepared in collaboration with Ameya Kanitkar.