SlideShare a Scribd company logo
COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS


                      SPEAKER: Vipul Sharma
                               Director of Data Engineering
                               Eventbrite



Monday, April 1, 13
Real Time Data Processing at Scale
                           Vipul Sharma – Director of Data Engineering


Monday, April 1, 13
Eventbrite by the Numbers




Monday, April 1, 13
Eventbrite by the Numbers




                                 1.5 million events
                              80 million tickets sold
                          $1 billion in gross ticket sales
                             Events in 179 countries




Monday, April 1, 13
Who am I?

          Director of Data Engineering at Eventbrite
          Infrastructure, Data Science, Analytics, Spam and Fraud

          linkedin.com/in/vipulsharma3
          @vipulsharma
          vipul@eventbrite.com




Monday, April 1, 13
Real Time

          • Definition of real time varies with use case
          • Real time at scale is a challenge
          • Active learning requires real time data processing
             • Spam/Fraud
             • Discovery
             • Search
          • Analytics
             • Real time analytics
          • Data Changes
             • Changes in inventory, user settings etc

Monday, April 1, 13
Scaling for Growth

          • Decouple Services
                 • Decouple services based on CAP, Size and Growth
                 • NoSQL attractive for out of the box sharding, replication and multi data
                   center support along with high write speeds
                 • Multiple data stores pose a challenges of data flow between services in real
                   time
          • Batch Processing
                 •    Batch processing for big data e.g. data science, analytics etc
                 •    MapReduce is not built for real time
                 •    Data locality requires data to be stored on HDFS
                 •    Data Sync to Hadoop in real time is a challenge


Monday, April 1, 13
Monday, April 1, 13
Challenges with Real Time
          • Data Flow
                 • How to transfer data captured in logs to services in real
                   time
                 • How to transfer data captured in database to services in
                   real time
          • Data Processing
                 • How to process significant data in real time
                 • Distributed data processing for real time



Monday, April 1, 13
Data Flow

          • Database polling
                 •    Rather than each application polling build a single polling service
                 •    Downstream applications polls from this service
                 •    Built for consistency and read scalability
                 •    Example: Event Cache
                 •    Excited about Linkedin’s Databus - http://data.linkedin.com/projects/
                      databus
          • Persisted Queues
                 • Transfer logs via a distributed persisted message queue
                 • Downstream applications subscribe to these queues getting a stream of
                   data
                 • Example: Firehose
                 • Excited about Linkedin’s Kafka - http://kafka.apache.org/index.html
Monday, April 1, 13
Data Processing

          • Denormalization
            • Write data ready to serve
            • NoSQL built for Denormalization
            • Example: See who’s visiting
          • Distributed Data Processing
            • Complex business logic needs more than de-normalization
            • Example: API stats using Storm
            • http://storm-project.net/



Monday, April 1, 13
Questions?




                      See it in action. Download our app:
                      eventbrite.com/eventbriteapp



Monday, April 1, 13
Thank You!
                      @vipulsharma/ vipul@eventbrite.com
Monday, April 1, 13
Monday, April 1, 13

More Related Content

What's hot

Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
Sudheer Kondla
 
Zillow's favorite big data & machine learning tools
Zillow's favorite big data & machine learning toolsZillow's favorite big data & machine learning tools
Zillow's favorite big data & machine learning tools
njstevens
 
Preview - Massive Scale Content at Re:Invent 2015
Preview - Massive Scale Content at Re:Invent 2015Preview - Massive Scale Content at Re:Invent 2015
Preview - Massive Scale Content at Re:Invent 2015
John Newton
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
Thang Bui (Bob)
 
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
Amazon Web Services
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
Hal Kalechofsky
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
✔ Eric David Benari, PMP
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
Blake Irvine
 
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Sanjay Sharma
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
✔ Eric David Benari, PMP
 
NYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talkNYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talk
Sanjay Sharma
 
Building an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesBuilding an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 Minutes
SingleStore
 
Big Data Analytics & Architecture
Big Data Analytics & ArchitectureBig Data Analytics & Architecture
Big Data Analytics & Architecture
Anjani Phuyal
 
Useful data presentation from DataShaka
Useful data presentation from DataShakaUseful data presentation from DataShaka
Useful data presentation from DataShaka
agenda21
 
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Jen Stirrup
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
Impetus Technologies
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConf
Qubole
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
Vincent Terrasi
 
NoSQL Now
NoSQL NowNoSQL Now
NoSQL Now
Orchestrate
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
SingleStore
 

What's hot (20)

Data platform architecture
Data platform architectureData platform architecture
Data platform architecture
 
Zillow's favorite big data & machine learning tools
Zillow's favorite big data & machine learning toolsZillow's favorite big data & machine learning tools
Zillow's favorite big data & machine learning tools
 
Preview - Massive Scale Content at Re:Invent 2015
Preview - Massive Scale Content at Re:Invent 2015Preview - Massive Scale Content at Re:Invent 2015
Preview - Massive Scale Content at Re:Invent 2015
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
 
Data Architecture Brief Overview
Data Architecture Brief OverviewData Architecture Brief Overview
Data Architecture Brief Overview
 
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, BlazegraphDatabase Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
 
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
 
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
Database Camp 2016 @ United Nations, NYC - Michael Glukhovsky, Co-Founder, Re...
 
NYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talkNYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talk
 
Building an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 MinutesBuilding an IoT Kafka Pipeline in Under 5 Minutes
Building an IoT Kafka Pipeline in Under 5 Minutes
 
Big Data Analytics & Architecture
Big Data Analytics & ArchitectureBig Data Analytics & Architecture
Big Data Analytics & Architecture
 
Useful data presentation from DataShaka
Useful data presentation from DataShakaUseful data presentation from DataShaka
Useful data presentation from DataShaka
 
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
Data Visualisation with Hadoop Mashups, Hive, Power BI and Excel 2013
 
Big Data Ecosystem- Impetus Technologies
Big Data Ecosystem-  Impetus TechnologiesBig Data Ecosystem-  Impetus Technologies
Big Data Ecosystem- Impetus Technologies
 
Atlanta MLConf
Atlanta MLConfAtlanta MLConf
Atlanta MLConf
 
How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?How to boost your datamanagement with Dremio ?
How to boost your datamanagement with Dremio ?
 
NoSQL Now
NoSQL NowNoSQL Now
NoSQL Now
 
Real-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQLReal-Time Analytics with Spark and MemSQL
Real-Time Analytics with Spark and MemSQL
 

Similar to COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013

Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
17aroumougamh
 
Open Data and APIs - DataWeave
Open Data and APIs - DataWeaveOpen Data and APIs - DataWeave
Open Data and APIs - DataWeave
DataWeave
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Lecture1
Lecture1Lecture1
Lecture1
Manish Singh
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
Treasure Data, Inc.
 
THINK DIFFERENT, THINK SIGNAL PROCESSING: APPROACHES TO REAL-TIME NETWORK DA...
THINK DIFFERENT, THINK SIGNAL PROCESSING:  APPROACHES TO REAL-TIME NETWORK DA...THINK DIFFERENT, THINK SIGNAL PROCESSING:  APPROACHES TO REAL-TIME NETWORK DA...
THINK DIFFERENT, THINK SIGNAL PROCESSING: APPROACHES TO REAL-TIME NETWORK DA...
Gigaom
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
VMware Tanzu Korea
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data miningAnkit Solanki
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceMercedes Coyle
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
Caserta
 
Big data from the trenches
Big data from the trenchesBig data from the trenches
Big data from the trenches
Azrul MADISA
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
DATAVERSITY
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
Rob Winters
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
Dunn Solutions Group
 
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
AWSCOMSUM
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
Tomy Rhymond
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1GurinderG
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
AIMLSEMINARS
 

Similar to COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013 (20)

Big Data Final Presentation
Big Data Final PresentationBig Data Final Presentation
Big Data Final Presentation
 
Open Data and APIs - DataWeave
Open Data and APIs - DataWeaveOpen Data and APIs - DataWeave
Open Data and APIs - DataWeave
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
The architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWSThe architecture of data analytics PaaS on AWS
The architecture of data analytics PaaS on AWS
 
THINK DIFFERENT, THINK SIGNAL PROCESSING: APPROACHES TO REAL-TIME NETWORK DA...
THINK DIFFERENT, THINK SIGNAL PROCESSING:  APPROACHES TO REAL-TIME NETWORK DA...THINK DIFFERENT, THINK SIGNAL PROCESSING:  APPROACHES TO REAL-TIME NETWORK DA...
THINK DIFFERENT, THINK SIGNAL PROCESSING: APPROACHES TO REAL-TIME NETWORK DA...
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
Big data
Big dataBig data
Big data
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data mining
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
 
5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer5 Things that Make Hadoop a Game Changer
5 Things that Make Hadoop a Game Changer
 
Big data from the trenches
Big data from the trenchesBig data from the trenches
Big data from the trenches
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
HP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big DataHP Discover: Real Time Insights from Big Data
HP Discover: Real Time Insights from Big Data
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
Lessons learnt - building a data lake with redshift, emr, and athena - aws co...
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1Big data – can it deliver speed and accuracy v1
Big data – can it deliver speed and accuracy v1
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
 

More from Gigaom

Structure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe WeinmanStructure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe Weinman
Gigaom
 
Structure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - RackspaceStructure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - Rackspace
Gigaom
 
Structure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey resultsStructure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey results
Gigaom
 
Structure 2014 - Launchpad Competition
Structure 2014 - Launchpad CompetitionStructure 2014 - Launchpad Competition
Structure 2014 - Launchpad Competition
Gigaom
 
Structure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshopStructure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshop
Gigaom
 
Structure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - BatteryStructure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - Battery
Gigaom
 
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Gigaom
 
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Gigaom
 
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit BendovStructure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Gigaom
 
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Gigaom
 
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA, Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Gigaom
 
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari GesherStructure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Gigaom
 
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris HaddadStructure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Gigaom
 
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Gigaom
 
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrathStructure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Gigaom
 
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve RussellStructure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Gigaom
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Gigaom
 
How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013
Gigaom
 
25 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 201325 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 2013
Gigaom
 
How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013
Gigaom
 

More from Gigaom (20)

Structure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe WeinmanStructure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe Weinman
 
Structure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - RackspaceStructure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - Rackspace
 
Structure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey resultsStructure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey results
 
Structure 2014 - Launchpad Competition
Structure 2014 - Launchpad CompetitionStructure 2014 - Launchpad Competition
Structure 2014 - Launchpad Competition
 
Structure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshopStructure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshop
 
Structure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - BatteryStructure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - Battery
 
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
 
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
 
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit BendovStructure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
 
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
 
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA, Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
 
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari GesherStructure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
 
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris HaddadStructure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
 
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
 
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrathStructure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
 
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve RussellStructure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013
 
25 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 201325 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 2013
 
How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013
 

Recently uploaded

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 

COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS from Structure:Data 2013

  • 1. COMPLEMENTING HADOOP WITH REAL-TIME DATA ANALYSIS SPEAKER: Vipul Sharma Director of Data Engineering Eventbrite Monday, April 1, 13
  • 2. Real Time Data Processing at Scale Vipul Sharma – Director of Data Engineering Monday, April 1, 13
  • 3. Eventbrite by the Numbers Monday, April 1, 13
  • 4. Eventbrite by the Numbers 1.5 million events 80 million tickets sold $1 billion in gross ticket sales Events in 179 countries Monday, April 1, 13
  • 5. Who am I? Director of Data Engineering at Eventbrite Infrastructure, Data Science, Analytics, Spam and Fraud linkedin.com/in/vipulsharma3 @vipulsharma vipul@eventbrite.com Monday, April 1, 13
  • 6. Real Time • Definition of real time varies with use case • Real time at scale is a challenge • Active learning requires real time data processing • Spam/Fraud • Discovery • Search • Analytics • Real time analytics • Data Changes • Changes in inventory, user settings etc Monday, April 1, 13
  • 7. Scaling for Growth • Decouple Services • Decouple services based on CAP, Size and Growth • NoSQL attractive for out of the box sharding, replication and multi data center support along with high write speeds • Multiple data stores pose a challenges of data flow between services in real time • Batch Processing • Batch processing for big data e.g. data science, analytics etc • MapReduce is not built for real time • Data locality requires data to be stored on HDFS • Data Sync to Hadoop in real time is a challenge Monday, April 1, 13
  • 9. Challenges with Real Time • Data Flow • How to transfer data captured in logs to services in real time • How to transfer data captured in database to services in real time • Data Processing • How to process significant data in real time • Distributed data processing for real time Monday, April 1, 13
  • 10. Data Flow • Database polling • Rather than each application polling build a single polling service • Downstream applications polls from this service • Built for consistency and read scalability • Example: Event Cache • Excited about Linkedin’s Databus - http://data.linkedin.com/projects/ databus • Persisted Queues • Transfer logs via a distributed persisted message queue • Downstream applications subscribe to these queues getting a stream of data • Example: Firehose • Excited about Linkedin’s Kafka - http://kafka.apache.org/index.html Monday, April 1, 13
  • 11. Data Processing • Denormalization • Write data ready to serve • NoSQL built for Denormalization • Example: See who’s visiting • Distributed Data Processing • Complex business logic needs more than de-normalization • Example: API stats using Storm • http://storm-project.net/ Monday, April 1, 13
  • 12. Questions? See it in action. Download our app: eventbrite.com/eventbriteapp Monday, April 1, 13
  • 13. Thank You! @vipulsharma/ vipul@eventbrite.com Monday, April 1, 13