Lambda Architecture:
How We Merged Batch and Real-Time
Sewook Wee, Senior Engineering Lead
Sotos Matzanas, Tech Lead
June 27, 2016
Our Goal
Our goal at Trulia is to give consumers an easy and enjoyable way to find their next home by providing data and insights to help them make the best decision.
Personalization Team
• Who we are
• What we do
• How we do it
Examples of User Trait
Trait categories: user-device linkage, activity summary, inferred insights
User ID: 30942342
Browser Cookies: [1411087c06c6b530c155b933fdee44e2ef8905]
SearchHistory: [
  {time: 2016-01-21T23:38:31Z,
   query: "/for_sale/San_Francisco,CA/2p_beds/1p_baths"}
]
lastVisited: 2016-01-21T23:52:55Z
locationPreference: {
  "San Francisco, CA": 0.52,
  "Oakland, CA": 0.39,
  "Los Angeles, CA": 0.09
}
User Type: { buyer: 0.87, renter: 0.13 }
Recommended properties: [
  { propertyId: 3223394214, score: 0.22 },
  { propertyId: 3223518578, score: 0.09 }
]
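One way to make the User Trait record above concrete is as a typed structure. This is a minimal sketch, not Trulia's schema: the Python field names, the use of a string for the user id (so unregistered users can be keyed by a secondary id), and the grouping of fields into the three trait categories are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SearchEvent:
    time: str   # ISO-8601 timestamp, e.g. "2016-01-21T23:38:31Z"
    query: str  # search URL path


@dataclass
class Recommendation:
    property_id: int
    score: float


@dataclass
class UserTrait:
    # User-device linkage (this grouping of fields is an assumption).
    user_id: str
    browser_cookies: list[str] = field(default_factory=list)
    # Activity summary.
    search_history: list[SearchEvent] = field(default_factory=list)
    last_visited: Optional[str] = None
    # Inferred insights: scores per location and user type, plus recommendations.
    location_preference: dict[str, float] = field(default_factory=dict)
    user_type: dict[str, float] = field(default_factory=dict)
    recommended_properties: list[Recommendation] = field(default_factory=list)


trait = UserTrait(
    user_id="30942342",
    browser_cookies=["1411087c06c6b530c155b933fdee44e2ef8905"],
    search_history=[SearchEvent("2016-01-21T23:38:31Z",
                                "/for_sale/San_Francisco,CA/2p_beds/1p_baths")],
    last_visited="2016-01-21T23:52:55Z",
    location_preference={"San Francisco, CA": 0.52, "Oakland, CA": 0.39, "Los Angeles, CA": 0.09},
    user_type={"buyer": 0.87, "renter": 0.13},
    recommended_properties=[Recommendation(3223394214, 0.22), Recommendation(3223518578, 0.09)],
)
```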
Why Lambda Architecture
We needed a way to…
• Recalculate the full User Trait from the full body of events at scale
• Read it back fast
• Add new metrics to old aggregates
• Refresh in near real time to catch up on the delta
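The first and third needs come from treating the batch view as a pure function of the full, immutable event history. Adding a metric then means changing that function and rerunning it over the master dataset. Below is a minimal sketch under that assumption; the `Event` fields, the event kinds, and the metric names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str       # hypothetical event types, e.g. "search" or "view"
    city: str
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order equals time order


def compute_trait(events: Iterable[Event]) -> dict:
    """Batch view: rebuild one user's trait from scratch over the full event history."""
    searches = 0
    city_counts: Counter = Counter()
    last_visited = None
    for e in events:
        if e.kind == "search":
            searches += 1
        city_counts[e.city] += 1
        if last_visited is None or e.timestamp > last_visited:
            last_visited = e.timestamp
    # Adding a new metric means adding it here and rerunning over the master dataset.
    return {"searchCount": searches, "cityCounts": dict(city_counts), "lastVisited": last_visited}


master_dataset = [
    Event("30942342", "search", "San Francisco, CA", "2016-01-21T23:38:31Z"),
    Event("30942342", "view", "Oakland, CA", "2016-01-21T23:52:55Z"),
]
trait = compute_trait(master_dataset)
```

Because the function reads the whole history and keeps no state between runs, its result does not depend on event order, which is what makes full recomputation safe.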
Enter: Lambda Architecture
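[Diagram: Events enter through the Event API and Kafka. They feed the Master Dataset, whose Batch Processing produces User Trait (Batch), and Real-Time Processing, which produces User Trait (Real-Time). The Serving Layer exposes both through the User Trait API to Desktop, Mobile Web, Mobile Apps, and Email & Push.]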
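In the diagram, the serving layer answers each read by combining the batch trait with the real-time delta accumulated since that batch. Below is a minimal sketch of that merge for a location preference, assuming (hypothetically) that preference is the normalized share of a user's activity per city. The counts are invented so that the result reproduces the example trait values.

```python
from collections import Counter


def serve_location_preference(batch_view: Counter, realtime_view: Counter) -> dict:
    """Serving layer: merge the batch view with the real-time delta at read time."""
    merged = batch_view + realtime_view  # Counter addition sums counts per key
    total = sum(merged.values())
    if total == 0:
        return {}
    return {city: count / total for city, count in merged.items()}


batch_view = Counter({"San Francisco, CA": 40, "Oakland, CA": 35, "Los Angeles, CA": 9})
realtime_view = Counter({"San Francisco, CA": 12, "Oakland, CA": 4})  # events since the last batch
preference = serve_location_preference(batch_view, realtime_view)
```

Keeping the two views separate and merging only on read means that a new batch can replace the batch view wholesale, and the real-time view only has to cover the gap since that batch.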
Our User Model
• We support both registered and unregistered users
• Registered users: user id + secondary id(s) (mobile, web, email)
• User login: link and merge all known activity on all devices
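The user model reduces to a linkage table from secondary ids to user ids. Unregistered users are simply keyed by their own secondary id, and a login links every known id to the user id so that their activity merges. This is a minimal sketch; the id strings and the flat dictionary standing in for the linkage table are hypothetical.

```python
from collections import defaultdict


def resolve(identity: str, linkage: dict[str, str]) -> str:
    """Map a secondary id (cookie, device, email) to its user id; unlinked ids map to themselves."""
    return linkage.get(identity, identity)


def merge_activity(events_by_id: dict[str, list], linkage: dict[str, str]) -> dict[str, list]:
    """Fold activity from every linked id into the owning user's history."""
    merged: dict[str, list] = defaultdict(list)
    for identity, events in events_by_id.items():
        merged[resolve(identity, linkage)].extend(events)
    return dict(merged)


linkage = {"cookie-a": "30942342", "device-b": "30942342"}
activity = {"cookie-a": ["search SF"], "device-b": ["view Oakland"], "cookie-c": ["search LA"]}
merged = merge_activity(activity, linkage)
```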
Our Real-Time Complications
• User linkage can change while a new batch is being calculated
• New user linkages can appear during the day and are not reflected in the batch calculation
• We needed to plan for these cases and make sure the eventual User Trait reflects the state of the user as of right now
Simple Real-Time Case
[Diagram: Events → Parse → Linkage Lookup (reads HBase) → Store (writes to Redis, keyed by user id). Legend: Transfers, Writes, Reads.]
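The simple case is a three-step pipeline applied to each event. Below is a minimal sketch that uses in-memory dictionaries as stand-ins for the HBase linkage table and the Redis store. The JSON field names `id` and `type` are hypothetical. Notice that the linkage lookup is a point-in-time read, so events already stored under an old or unlinked key stay there when a linkage later changes. The real-time complications listed earlier come from exactly this limitation.

```python
import json
from collections import Counter, defaultdict


class SimpleRealTimePipeline:
    """Parse -> Linkage Lookup -> Store, with dicts standing in for HBase and Redis."""

    def __init__(self, linkage_lookup: dict[str, str]):
        self.linkage_lookup = linkage_lookup  # stand-in for the HBase linkage table
        self.store: dict[str, Counter] = defaultdict(Counter)  # stand-in for Redis, keyed by user id

    def process(self, raw_event: str) -> str:
        event = json.loads(raw_event)                           # Parse
        identity = event["id"]
        user_id = self.linkage_lookup.get(identity, identity)  # Linkage Lookup (read)
        self.store[user_id][event["type"]] += 1                 # Store (write)
        return user_id


pipeline = SimpleRealTimePipeline({"cookie-a": "30942342"})
pipeline.process('{"id": "cookie-a", "type": "search"}')
pipeline.process('{"id": "cookie-z", "type": "view"}')
```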
Current Real-Time Design
[Diagram: Events arrive through a Kafka Spout → Parse → Linkage Lookup against Today's Lookup (HBase) → Store (Redis, keyed by user id). At Rebalance Time (@Batch Completion Time), a Rebalance step gets all user ids + secondary ids for today, reads Yesterday's Lookup (HBase), and sends each Lookup Change (secondary id → user id) as a Control Event. A Control Event Spout feeds a Control Bolt, which applies the changes to the Redis store. Legend: Transfers, Writes, Reads.]
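The rebalance step can be read as a diff between two linkage snapshots. Each secondary id whose owner differs between yesterday's and today's lookup becomes a control event, and the control bolt moves that id's real-time state to its new owner. Below is a minimal sketch of that reading, not the team's code. It assumes the real-time store keeps a sub-entry per identity under each owner key, so that one secondary id can be moved without splitting an aggregate. It also only considers ids present in today's snapshot. `ControlEvent`, `lookup_changes`, `apply_control_event`, and `read_trait` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlEvent:
    secondary_id: str
    old_owner: str  # owner in yesterday's lookup, or the id itself if it was unlinked
    new_owner: str  # owner in today's lookup


def lookup_changes(yesterday: dict[str, str], today: dict[str, str]) -> list[ControlEvent]:
    """Diff two linkage snapshots (secondary id -> user id) into control events."""
    changes = []
    for sid, new_owner in today.items():
        old_owner = yesterday.get(sid, sid)
        if old_owner != new_owner:
            changes.append(ControlEvent(sid, old_owner, new_owner))
    return changes


# Assumed real-time store layout: owner key -> identity -> counters.
Store = dict[str, dict[str, Counter]]


def apply_control_event(store: Store, ev: ControlEvent) -> None:
    """Control-bolt step (hypothetical): move one identity's state to its new owner."""
    old = store.get(ev.old_owner, {})
    moved = old.pop(ev.secondary_id, None)
    if not old:
        store.pop(ev.old_owner, None)  # drop owners left with no identities
    if moved is not None:
        store.setdefault(ev.new_owner, {}).setdefault(ev.secondary_id, Counter()).update(moved)


def read_trait(store: Store, owner: str) -> Counter:
    """Serve a user's real-time aggregate by summing its identities."""
    total = Counter()
    for counts in store.get(owner, {}).values():
        total.update(counts)
    return total


yesterday = {"cookie-a": "u1"}
today = {"cookie-a": "u2", "device-b": "u2"}  # cookie-a re-linked, device-b newly linked
store: Store = {"u1": {"cookie-a": Counter(search=3)}, "device-b": {"device-b": Counter(search=1)}}
for ev in lookup_changes(yesterday, today):
    apply_control_event(store, ev)
```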
Transition to a New Epoch
• Transition when the rebalance of all ids is complete
• Rebalance is complete when no new user id has been rebalanced for 30 seconds
• Redis keys with a TTL act as a heartbeat that disappears when no new control events arrive
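The completion rule can be expressed as a TTL heartbeat: every control event re-arms a 30-second expiry, and the rebalance counts as done once the heartbeat has existed and then lapsed. Below is a minimal in-memory sketch with an injectable clock for testing. With redis-py, re-arming would roughly correspond to `r.set(key, value, ex=30)` on each control event, and the check to `r.exists(key)`. The key name and the `RebalanceHeartbeat` class are hypothetical.

```python
import time
from typing import Callable, Optional


class RebalanceHeartbeat:
    """In-memory stand-in for a Redis key with a TTL that control events keep refreshing."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.expires_at: Optional[float] = None

    def on_control_event(self) -> None:
        # Each control event re-arms the TTL, like re-setting the Redis key with an expiry.
        self.expires_at = self.clock() + self.ttl

    def alive(self) -> bool:
        return self.expires_at is not None and self.clock() < self.expires_at

    def rebalance_complete(self) -> bool:
        # Done once the heartbeat existed and then expired: no control event for ttl seconds.
        return self.expires_at is not None and not self.alive()


fake_now = [0.0]
hb = RebalanceHeartbeat(ttl_seconds=30, clock=lambda: fake_now[0])
hb.on_control_event()  # control event at t=0, key lives until t=30
fake_now[0] = 29.0
still_running = hb.alive()
fake_now[0] = 31.0
done = hb.rebalance_complete()
```

Requiring that the heartbeat existed first keeps the epoch from flipping before the first control event has even arrived.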
Timeline: Epoch Transitions
[Diagram: a timeline with rows labeled Event Processing, Epoch, Serving From, and Batch Process. Milestones, in order: Midnight (Batch Layer events for Epoch N are cut off), Batch Layer N Done, Rebalance for N, Rebalance Done for N (serving switches to "Batch N + Speed N"), then the same sequence for N + 1, ending with Rebalance Done for N+1 and Serve N + 1. Spans show Batch Layer Epoch N and N + 1, Real-Time Layer Epoch N and N + 1, and Speed Layer events for Epoch N and N + 1.]
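One reading of this timeline is that the speed layer keeps one real-time view per open epoch. After midnight, new events go both to the view paired with the batch currently being served and to a fresh view for the next epoch. Serving switches to the new pair only after the new batch is done and its rebalance has completed. Below is a minimal sketch of that interpretation, not the team's implementation. `EpochServing` and its methods are hypothetical.

```python
from collections import Counter


class EpochServing:
    """Serve batch view N plus the real-time view of events after epoch N's midnight cutoff."""

    def __init__(self, epoch: int, batch_view: dict):
        self.serving_epoch = epoch
        self.batch = {epoch: Counter(batch_view)}  # epoch -> batch view (events up to its cutoff)
        self.speed = {epoch: Counter()}            # epoch -> real-time view (events after its cutoff)

    def on_midnight(self) -> None:
        # A new epoch starts: open its real-time view; the served one keeps counting too.
        self.speed[self.serving_epoch + 1] = Counter()

    def on_event(self, key: str) -> None:
        # The speed layer updates every open real-time view.
        for view in self.speed.values():
            view[key] += 1

    def on_epoch_ready(self, epoch: int, batch_view: dict) -> None:
        # Call only once the batch for `epoch` is done AND its rebalance has completed.
        if epoch != self.serving_epoch + 1 or epoch not in self.speed:
            raise ValueError("epochs must advance one at a time, after midnight")
        old = self.serving_epoch
        self.batch[epoch] = Counter(batch_view)
        self.serving_epoch = epoch
        del self.batch[old], self.speed[old]

    def read(self) -> Counter:
        return self.batch[self.serving_epoch] + self.speed[self.serving_epoch]


s = EpochServing(1, {"search": 10})
s.on_event("search")                     # before midnight: counted in epoch 1's real-time view
s.on_midnight()
s.on_event("search")                     # after midnight: counted in both open views
s.on_epoch_ready(2, {"search": 11})      # batch 2 covers the 11 events up to midnight
```

Because each batch view covers events up to its own cutoff and its paired real-time view covers only events after that cutoff, the served total stays continuous across the switch, with no double counting.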
Our Input and its Size
• Hundreds of millions of events per day
• Billions of events per month
• 12 TB of events, and growing
• Hundreds of millions of User Traits calculated daily
• Millions calculated in real-time
As a Result
• Continuously add new features to build data-driven products
• Retroactively apply new features to old data
• A virtuous cycle of learning more, personalizing more, and learning again
• Deliver data and insights to help consumers make the best decision
swee@trulia.com
smatzanas@trulia.com

Editor's Notes

  1. Introductions
  2. Sewook: Engage the audience with a show of hands: how many of you have heard of Trulia before? Trulia’s goal is to simplify the crazy experience of finding a home by providing data and insights to help you make a better decision. It’s not just about finding the best home in town, but about finding the best home for you. To do that, we need to know our users, so we formed a personalization team.
  3. Sewook: The personalization team works to understand what our users are looking for. We have built a user personalization platform based on the Lambda Architecture, where we track users’ activity in real time, process it, and build a digital signature, or profile, of each user. We call this profile a user trait, which I’ll explain a bit more on the next slide.
  4. Sewook: This slide explains how we’ve built our user traits. Essentially, we take the repository of consumer activity events, process it, and generate the user trait. The simplest approach to processing the data is batch processing, but that takes time, and during the batch cycles the user trait becomes stale. The other extreme is full event-by-event real-time processing, which is cool, but historically we ran into other issues with it, such as reprocessing the full data set. That is why we landed on the Lambda Architecture.
  5. Sewook: We like Lambda because it gives us the benefits of both batch and real-time processing. Through Lambda, we can recalculate the full trait from the full body of events in each batch cycle, at scale. Whenever we need to change business rules or cleanse old data, we can do it very easily. We can read back each individual trait quickly. In addition, we have a real-time layer where we can catch up on the delta. Hand the presentation to Sotos: Sotos will explain exactly what our personalization platform looks like.
  6. Sotos: I’m going to share how we implement the Lambda Architecture, our specific needs and complications, and what our current architecture looks like. I will mostly focus on the real-time process, but first I will walk you through our batch layer and how we built the batch part of our Lambda Architecture.
  7. Sotos: Before I dig into the complications of our real-time (RT) layer, let me explain our user model a bit. No matter what avenue a user comes from, we always build a user trait, even when all we have is a secondary ID; in cases where the user is registered, we also build a unique user trait. We discover linkages through our batch workflow and store all discovered linkages in a unique table per batch run.
  8. Sotos: The issue is that linkages may change from day to day, or we might discover a new linkage. We need to account for all these cases: our real-time platform needs to properly marry all of a user’s activity.
  9. Sotos
  10. Sotos
  11. Sotos: Explain what an epoch is.
  12. Build order for this slide:
     1. Batch layer events for Epoch N
     2. Midnight line
     3. Batch layer Epoch N (blue)
     4. Speed layer events for Epoch N (orange)
     5. Real-time layer Epoch N (not Rebalance for N)
     6. Batch layer N done line
     7. Rebalance for N
     8. Rebalance done for N line
     9. Serving Batch N