Sea of data
Story of data, scale and how we evolve
architecture to handle it.
Daniel Marchant (@driedtoast)
What do you think of when
you hear the word “data”?
Setting the stage
Data – Things known or assumed as facts,
making the basis of reasoning or calculation
Time – the indefinite continued progress of
existence and events in the past, present and
future regarded as a whole
What types of data
are there?
Types of data
● Customer Data - Data the customer provides,
the lifeblood of your application
● Business Data - Metrics on growth,
customer attrition, marketing, etc.
● Operation Data - Metrics and log messages
that help troubleshoot / monitor your
application
Let’s jump into the story...
Once upon a time...
A company was founded to
produce the best seamonkey
management application ever
produced. (purely fictional for
now)
More details: http://www.seamonkey.xyz (eventually)
A hypothetical system timeline
● Launch of application
● Reddit posts promote application
● Hacker News promotes application
● Product Hunt promotes application
Launch
Look ma, I got an app online!
Initial dataset
● Operation Data
○ cpu / memory / disk metrics
○ error messages in logs
● Business Data
○ Signup metrics
○ Access usage
● Customer Data
○ User
○ Seamonkey info
Launch Architecture
Architecture
● Load balancer - route traffic to application
● Application - handles requests and manages
data to the database
● Database - data storage
So simple, life is good! Some reads and writes!
Integrations
● Metric Service - google analytics,
kilometer.io, kissmetrics, mixpanel, etc...
● Operation Events - datadog, graylog,
newrelic, etc…
Troubleshooting
● Pretty straightforward
● Check application can write to DB
● Make sure database user can access tables
● Make sure the transactions scoped in the
application make sense
● Check rollback scenarios
A little about ACID
● Atomicity: all task(s) within a transaction are performed or
none of them are. An all-or-none principle.
● Consistency: a transaction does not violate the database's
integrity rules; the data must be in a consistent state at the
beginning and end of a transaction; no half-completed
transactions.
● Isolation: each transaction runs as if it were on its own;
concurrent transactions do not see each other's partial work.
● Durability: Once complete the transaction will persist as
complete; it will survive system failure, power loss and other
types of system breakdowns.
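A minimal sketch of the all-or-none principle, using Python's built-in sqlite3 module. The `tank` table, its columns, and the `move_seamonkeys` helper are assumptions made up for illustration; they are not part of the talk.

```python
import sqlite3

# Hypothetical schema for illustration: a tank can never hold a negative count.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tank ("
    "id INTEGER PRIMARY KEY, "
    "seamonkeys INTEGER CHECK (seamonkeys >= 0))"
)
conn.execute("INSERT INTO tank (id, seamonkeys) VALUES (1, 5), (2, 0)")
conn.commit()


def move_seamonkeys(conn, src, dst, count):
    """Move seamonkeys between tanks as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE tank SET seamonkeys = seamonkeys + ? WHERE id = ?",
                (count, dst),
            )
            conn.execute(
                "UPDATE tank SET seamonkeys = seamonkeys - ? WHERE id = ?",
                (count, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = move_seamonkeys(conn, 1, 2, 3)        # both updates are applied
failed = move_seamonkeys(conn, 1, 2, 10)   # CHECK fails, both updates roll back
counts = conn.execute("SELECT seamonkeys FROM tank ORDER BY id").fetchall()
```

The second call credits tank 2 before the debit on tank 1 fails, so the rollback is what keeps the data from ending up half-updated.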
Reddit
Oh, cool some people are looking at it!
Data evolution
● Operation Data Additions
○ Timers on critical logic
○ Customer requests
● Business Data Additions
○ Customer emails on problems
● Customer Data Additions
○ Seamonkey Tank
○ Seamonkey Social interactions
Architecture Changes
Architecture
● Load balancer - route traffic to application
● Application - still managing data, more
nodes added
● Worker - handles work from the db ‘queue’
table
● Cache - used to taper database reads
● Database - data storage master
● Read Only Database - slave data storage
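One common way a cache tapers database reads is the cache-aside pattern, sketched below. The talk does not say which caching strategy was used, so this is an assumption; plain dicts stand in for the real cache and database, and the key names are hypothetical.

```python
# Stand-ins for the real systems: one dict as the database, one as the cache.
database = {"tank:1": {"name": "Reef", "seamonkeys": 12}}
cache = {}
db_reads = 0  # counts how often a read actually reaches the database


def read(key):
    """Cache-aside read: try the cache first, fall back to the database."""
    global db_reads
    if key in cache:
        return cache[key]
    db_reads += 1
    value = database[key]
    cache[key] = value  # populate the cache for later reads
    return value


def write(key, value):
    """Write to the database, then invalidate so the next read reloads."""
    database[key] = value
    cache.pop(key, None)


first = read("tank:1")    # cache miss, reads the database
second = read("tank:1")   # cache hit, no database read
write("tank:1", {"name": "Reef", "seamonkeys": 13})
third = read("tank:1")    # invalidated, so the database is read again
```

If `write` skipped the `cache.pop`, the third read would return the old count, which is the stale-data symptom a missed invalidation produces.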
More integrations
● Gmail - customer emails
● DataLoop - Timers and statsd data
● Open Tracing - distributed event tracing
http://opentracing.io/
More Troubleshooting
● If the application isn’t displaying the right data,
is the cache invalidated properly?
● Has the worker overwritten the application’s
changes while processing the queued
work?
● Is replication working from master to
slave?
Hacker News
What have I gotten myself into?
Data evolution
● Operation Data / Business Data convergence
○ Customer requests
○ Customer emails to support cases
○ Customer usage to product roadmap
● Customer Data requirements stabilize
Architecture Changes
Architecture
● Application - still managing data, more
nodes added, application pushes writes to a
queue for non-critical work
● Worker - handles work coming from the queue
instead of the db, and writes from the application. Also
invalidates the cache now.
● Cache - used to taper database reads. App is
getting more complex invalidation logic
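The flow above can be sketched with an in-process queue and one worker thread. This is only a stand-in for a real message queue; the dict-based `database` and `cache` and the event shape are assumptions for illustration.

```python
import queue
import threading

database = {}
cache = {"tank:1": "stale"}   # a cached value that is about to become outdated
events = queue.Queue()        # stand-in for the real message queue


def worker():
    """Apply queued writes to the database, then invalidate the cache."""
    while True:
        event = events.get()
        if event is None:          # sentinel: stop the worker
            events.task_done()
            break
        key, value = event
        database[key] = value
        cache.pop(key, None)       # the worker owns invalidation now
        events.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The application only enqueues the non-critical write and moves on.
events.put(("tank:1", "fresh"))
events.join()      # wait until the worker has processed the write

events.put(None)   # shut the worker down
thread.join()
```

Between `events.put` and the worker running, a reader can still see the stale cache entry, which is why this design suits non-critical work.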
CAPs off to you!
● Consistency: related to, but narrower than, the C in
ACID. All data storage nodes see the same data.
● Availability: every request to a working node
gets a response
● Partition Tolerance: system continues to
operate even when network failures drop or
delay messages between nodes. A single node failure
should not cause the entire system to
collapse.
Troubleshooting
● Oh boy, more systems more debugging
“opportunities”
● If data isn’t updated, has the queue gotten
the event from the application? Has the
worker processed the change event and
written to db?
● Is the queue up? Is the worker up?
Product Hunt
There’s too many people on this planet.
We need another plague.
Data evolution
● Operation Data
○ Hopes for attrition
● Business Data
○ Monitors customer attrition
○ Hopes for NO attrition
● Customer Data
○ Grows insane
○ Working out archive strategies
Architecture Changes
Architecture
● Lifecycle service / database - added a service
to take over part of the monolith app; the service
handles only seamonkey growth and lifecycle
● Worker - still listens for events, writes to
lifecycle service
● Stream - replaced the queue with an
immutable stream for better data recovery
BASE
● Basically Available: the system guarantees availability
of the data in the CAP theorem sense; there will be a
response to any request. The response could be a failure to find
data, or the data could be in an inconsistent state.
● Soft state: the state of the data could change over time, even
without new input, because of ‘eventual consistency’
● Eventual consistency: data will eventually become
consistent once it stops receiving changes. The system will
continue to receive changes and is not checking the
consistency of every transaction before it moves onto the
next one.
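A toy model of eventual consistency: a primary that accepts writes and a replica that catches up later. The `Primary`, `Replica`, and `replicate` names are hypothetical and stand in for whatever replication mechanism a real datastore uses.

```python
class Primary:
    """Accepts writes immediately and records them in a log."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads; only knows the log entries applied so far."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already applied


def replicate(primary, replica):
    """Apply every log entry the replica has not seen yet."""
    pending = primary.log[replica.applied:]
    for key, value in pending:
        replica.data[key] = value
    replica.applied += len(pending)


primary, replica = Primary(), Replica()
primary.write("tank:1", "hatched")

stale_read = replica.data.get("tank:1")   # replica has not caught up yet
replicate(primary, replica)
fresh_read = replica.data.get("tank:1")   # now consistent with the primary
```

The replica always answers (basically available), its answer can change without a new write reaching it directly (soft state), and it matches the primary once replication catches up (eventual consistency).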
Troubleshooting
● If seamonkeys aren’t progressing, debug the
new service: is it up? Is the database for the service
up?
● If an event isn’t processed, reset the stream position to
catch up, and handle duplicate events on the
worker rather than in the stream.
● If the UI isn’t finding events, check that the service is up.
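Resetting the stream position means the worker will see some events twice, so it needs to be idempotent. A minimal sketch, assuming each event carries a unique `id`; the event fields and the `Worker` class are made up for illustration and do not reflect a specific streaming API.

```python
# An immutable stream modeled as a list that is only ever appended to.
stream = [
    {"id": 1, "tank": "reef", "delta": 2},
    {"id": 2, "tank": "reef", "delta": 3},
    {"id": 3, "tank": "reef", "delta": -1},
]


class Worker:
    def __init__(self):
        self.offset = 0    # position of the next event to read
        self.seen = set()  # ids of events already applied
        self.totals = {}   # seamonkey count per tank

    def poll(self, stream):
        """Process events from the current offset, skipping duplicates."""
        for event in stream[self.offset:]:
            if event["id"] not in self.seen:
                tank = event["tank"]
                self.totals[tank] = self.totals.get(tank, 0) + event["delta"]
                self.seen.add(event["id"])
            self.offset += 1


worker = Worker()
worker.poll(stream)
totals_before = dict(worker.totals)

worker.offset = 1      # reset the stream position to replay events 2 and 3
worker.poll(stream)
totals_after = dict(worker.totals)
```

Because the duplicate check lives in the worker, the stream itself stays a plain append-only record and the replay leaves the totals unchanged.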
Immutability for the
changing chaos...
Time and data
As the growth patterns show, time
and data start to involve trade-offs, raising
questions such as:
● How fast does the data update?
● How do we support a backup and restore?
● How do we ensure no data loss?
Immutability and Time
● If a point in time never changes, immutability
is achieved
● Pointer vs point in time: the current data version
is a pointer to the latest point in time
● A timeline of data changes allows for
restoration and easier debugging
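The pointer idea can be sketched as an append-only list of versions plus an index that marks the current one. The `VersionedRecord` class and its method names are hypothetical.

```python
class VersionedRecord:
    """Append-only timeline of values with a movable 'current' pointer."""

    def __init__(self):
        self._versions = []   # points in time; entries are never modified
        self._current = -1    # index of the version considered current

    def update(self, value):
        """Record a new point in time and point 'current' at it."""
        self._versions.append(value)
        self._current = len(self._versions) - 1

    def current(self):
        return self._versions[self._current]

    def restore(self, index):
        """Move the pointer back to an earlier point in time."""
        if not 0 <= index < len(self._versions):
            raise IndexError("no such version")
        self._current = index

    def history(self):
        return tuple(self._versions)


tank = VersionedRecord()
tank.update("3 seamonkeys")
tank.update("5 seamonkeys")
tank.update("0 seamonkeys")   # a bad update

tank.restore(1)               # restoration only moves the pointer
restored = tank.current()
timeline = tank.history()     # the bad update is still there for debugging
```

Restoring does not delete anything, so the timeline remains available for working out what went wrong.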
Distributed immutability
● Database transaction log is an immutable
stream of changes
○ Used for replication, most database /
datastores use this approach
● Immutable stream (Kafka, Kinesis) provides
an incoming change log; the latest state can
be a pointer into part of the stream. The reverse of the db
approach
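In both directions, state is what you get by replaying a change log. A minimal sketch, assuming a hypothetical log of `(operation, key, value)` tuples rather than any real transaction-log or stream format:

```python
def replay(log, upto=None):
    """Rebuild state by applying the first `upto` changes (all if None)."""
    state = {}
    for op, key, value in log[:upto]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# Append-only change log; entries are never edited after being written.
change_log = [
    ("set", "tank:1", 5),
    ("set", "tank:2", 1),
    ("delete", "tank:2", None),
    ("set", "tank:1", 8),
]

latest = replay(change_log)           # state after every change
earlier = replay(change_log, upto=2)  # state as of the second change
```

A replica that applies the same log in the same order arrives at the same state, and replaying a prefix gives a point-in-time view.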
What’s the point of all this?
Some plankton for thought
● If you have an idea where you'll end up,
you’ll have a better idea where to start
● Understanding reactions to growth will help
with setting up services as you grow
● Misery loves company, knowing everyone
has these pain points somehow makes you
happier
● Knowing where you’ve been helps you now
Thank you!
