SlideShare a Scribd company logo
Rounds Analytics Pipeline
Aviv Laufer CTO, Rounds @avivl
Founded 2008
35 person team (⅔ in R&D)
Tel-Aviv based
Raised over $22 million in funding from industry leading
investors such as Sequoia Capital, Samsung Ventures, Rhodium,
Verizon Ventures and many more. Over 25 million users worldwide
Client - Objective-C, Java, C/C++
Server - Python, Go, C++, and bits of Erlang
DB - MySQL (RDS), CouchBase (a few clusters)
Multi Cloud - AWS, GCE, SL, DO
Deployment - Ansible
Monitoring - Sensu, NewRelic, VictorOps
And yes we use Docker…...
Tools of the Trade
We tried to make it work quite a few times, and failed
We kept on trying
We think we got it right this time
ANALYTICS @ROUNDS
DO OR DO NOT. THERE IS NO TRY. - Master Yoda
One monolith app
Data was written to MySql/RDS - row by row
Batch ETL to Vertica
And then came July 2014
GENESIS
Data collection killed our backend app
Slow, failing ETL process
No real time view into events
Preferred users over analytics, we killed the event collection
We were flying blind
CHAOS
Separate ETL process from main app
Clients reports (a request for each event) to a different microservice
First very naive version written in Go - it scales!
Data is written to an Elasticsearch cluster.
ETL from ES to Vertica
EXODUS
Frontends - Receiving user analytics and perform sanity checks
Google Pub/Sub - Store events for future processing
Workers - Pull the events from Pub/Sub and stream to Google BigQuery and ES
...And Then There Were Three...
Clients send gzipped, batched
Frontend does sanity checks - Validation, versioning, etc.
Frontend replies fast (202 Accepted) and closes the connection in order
to save on mobile socket life
Geo load-balanced
Pushes analytics into Pub/Sub for future processing, mutation
Fan-In model
ANALYTICS - FRONTEND
Pulls analytics from Pub/Sub
Mutate/Enrich data if necessary
Inserts to various DBs, according to usage - Monitoring, BI, Warehousing, etc.
Separation of concerns - Worker cluster per target DB
Fan-out model
Renee Finch, golang.org/doc/gopher/pencil/ANALYTICS - WORKER
Golang, abstraction package
Receives rows, streams to BigQuery (as opposed to load jobs)
Sync (foreground insert) or Async (background insert)
Pros: Instant data availability, no job delay, fast
Cons: Harder handling of bad analytics, Google’s HTTP 500s (requires retry)
Open source, PRs merrily encouraged!
Collecting User Data and Usage - Blog Post - http://bit.ly/CollectingDataRounds
github.com/rounds/go-bqstreamerSTREAMING TO BigQuery
Frontends deployed in several locations in GCE (We Geo load balance them)
Workers are in GCE (Europe West)
ES cluster is in GCE (Europe West)
ACROSS THE UNIVERSE
We started using Elasticsearch for monitoring about a year before elastic.co relaize that
Every new feature received a monitoring dashboard
Debugging
Monitoring (custom sensu checks)
Data is kept for 30 to 90 days
Ad-hoc reporting using Kibana
ELASTICSEARCH
Store data from the beginning till the end of time
Standard(ish) SQL
Very fast
No DBA is needed
Business reports (SiSense) - “Kibana” for BigQuery
BigQuery
NEW ISSUES
Permissive vs hard scheme for events - allow clients ease of use while keeping the scheme
strict for ease of BI
Clients make mistakes (Arabic locale dates) - elasticsearch allows while BQ doesn’t
Things we’re integrating as solutions
Every event is a class - compile time validation generated from RAML
Wrote a library for event reporting server side
ANY QUESTIONS
DO YOU HAVE?

More Related Content

What's hot

Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
Claudio Pontili
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
Treasure Data, Inc.
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud Facility
Claudio Pontili
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
DataWorks Summit/Hadoop Summit
 
Building an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutesBuilding an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutes
Claudiu Barbura
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
Jason Flittner
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
Databricks
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overview
pzybrick
 
Serverless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDAServerless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDA
Nilesh Gule
 
Azure containers fundamentals
Azure containers fundamentalsAzure containers fundamentals
Azure containers fundamentals
Nilesh Gule
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Jeff Magnusson
 
Big data for dot net Devs with Spark
Big data for dot net Devs with SparkBig data for dot net Devs with Spark
Big data for dot net Devs with Spark
Nilesh Gule
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
Data Con LA
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?
Andreas Raible
 
CSharp
CSharpCSharp
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Databricks
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
Alluxio, Inc.
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
Elasticsearch
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the Cloud
Robert Sanders
 
PowerStream Demo
PowerStream DemoPowerStream Demo
PowerStream Demo
SingleStore
 

What's hot (20)

Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud Facility
 
Zero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using HadoopZero Downtime App Deployment using Hadoop
Zero Downtime App Deployment using Hadoop
 
Building an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutesBuilding an intelligent big data application in 30 minutes
Building an intelligent big data application in 30 minutes
 
Netflix Big Data Paris 2017
Netflix Big Data Paris 2017Netflix Big Data Paris 2017
Netflix Big Data Paris 2017
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
IPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution OverviewIPC Global Big Data To Decision Solution Overview
IPC Global Big Data To Decision Solution Overview
 
Serverless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDAServerless Event Driven Containers with KEDA
Serverless Event Driven Containers with KEDA
 
Azure containers fundamentals
Azure containers fundamentalsAzure containers fundamentals
Azure containers fundamentals
 
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
Watching Pigs Fly with the Netflix Hadoop Toolkit (Hadoop Summit 2013)
 
Big data for dot net Devs with Spark
Big data for dot net Devs with SparkBig data for dot net Devs with Spark
Big data for dot net Devs with Spark
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?
 
CSharp
CSharpCSharp
CSharp
 
Building Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and KubeflowBuilding Notebook-based AI Pipelines with Elyra and Kubeflow
Building Notebook-based AI Pipelines with Elyra and Kubeflow
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
Elastic Stack roadmap deep dive
Elastic Stack roadmap deep diveElastic Stack roadmap deep dive
Elastic Stack roadmap deep dive
 
Migrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the CloudMigrating Big Data Workloads to the Cloud
Migrating Big Data Workloads to the Cloud
 
PowerStream Demo
PowerStream DemoPowerStream Demo
PowerStream Demo
 

Similar to Rounds analytics pipeline

Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Cloudera, Inc.
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
Walter Liu
 
Griffith Bi Migration & Source Control
Griffith Bi Migration & Source ControlGriffith Bi Migration & Source Control
Griffith Bi Migration & Source Control
David Waters
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Jongwook Woo
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
Jonathan Holloway
 
Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)
Imply
 
Business model driven cloud adoption - what NI is doing in the cloud
Business model driven cloud adoption -  what  NI is doing in the cloudBusiness model driven cloud adoption -  what  NI is doing in the cloud
Business model driven cloud adoption - what NI is doing in the cloud
Ernest Mueller
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
Khanderao Kand
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of Hadoop
Bill Hayduk
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
Big Data Week
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Mopuru Babu
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Shirshanka Das
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
Yael Garten
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
nzhang
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Qubole
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
sparkflows
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
Databricks
 
Docker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to DockerDocker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to Docker
SmartWave
 

Similar to Rounds analytics pipeline (20)

Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Griffith Bi Migration & Source Control
Griffith Bi Migration & Source ControlGriffith Bi Migration & Source Control
Griffith Bi Migration & Source Control
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the EcosystemsIntroduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
Introduction to Big Data, MapReduce, its Use Cases, and the Ecosystems
 
Building data pipelines
Building data pipelinesBuilding data pipelines
Building data pipelines
 
Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)Druid: Under the Covers (Virtual Meetup)
Druid: Under the Covers (Virtual Meetup)
 
Business model driven cloud adoption - what NI is doing in the cloud
Business model driven cloud adoption -  what  NI is doing in the cloudBusiness model driven cloud adoption -  what  NI is doing in the cloud
Business model driven cloud adoption - what NI is doing in the cloud
 
Big dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosqlBig dataarchitecturesandecosystem+nosql
Big dataarchitecturesandecosystem+nosql
 
Testing Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of HadoopTesting Big Data: Automated ETL Testing of Hadoop
Testing Big Data: Automated ETL Testing of Hadoop
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Architecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystemArchitecting for change: LinkedIn's new data ecosystem
Architecting for change: LinkedIn's new data ecosystem
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache HiveHarnessing the Hadoop Ecosystem Optimizations in Apache Hive
Harnessing the Hadoop Ecosystem Optimizations in Apache Hive
 
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 minsSparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
Sparkflows - Build E2E Data Analytics Use Cases in less than 30 mins
 
Architecting Agile Data Applications for Scale
Architecting Agile Data Applications for ScaleArchitecting Agile Data Applications for Scale
Architecting Agile Data Applications for Scale
 
Docker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to DockerDocker Geneva Meetup - Introduction to Docker
Docker Geneva Meetup - Introduction to Docker
 

Recently uploaded

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 

Recently uploaded (20)

Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 

Rounds analytics pipeline

  • 1. Rounds Analytics Pipeline Aviv Laufer CTO, Rounds @avivl
  • 2. Founded 2008 35 person team (⅔ in R&D) Tel-Aviv based Raised over $22 million in funding from industry leading investors such as Sequoia Capital, Samsung Ventures, Rhodium, Verizon Ventures and many more. Over 25 million users worldwide
  • 3.
  • 4. Client - Objective-C, Java, C/C++ Server - Python, Go, C++, and bits of Erlang DB - MySQL (RDS), CouchBase (a few clusters) Multi Cloud - AWS, GCE, SL, DO Deployment - Ansible Monitoring - Sensu, NewRelic, VictorOps And yes we use Docker…... Tools of the Trade
  • 5. We tried to make it work quite a few times, and failed We kept on trying We think we got it right this time ANALYTICS @ROUNDS DO OR DO NOT. THERE IS NO TRY. - Master Yoda
  • 6. One monolith app Data was written to MySql/RDS - row by row Batch ETL to Vertica And then came July 2014 GENESIS
  • 7.
  • 8. Data collection killed our backend app Slow, failing ETL process No real time view into events Preferred users over analytics, we killed the event collection We were flying blind CHAOS
  • 9. Separate ETL process from main app Clients reports (a request for each event) to a different microservice First very naive version written in Go - it scales! Data is written to an Elasticsearch cluster. ETL from ES to Vertica EXODUS
  • 10. Frontends - Receiving user analytics and perform sanity checks Google Pub/Sub - Store events for future processing Workers - Pull the events from Pub/Sub and stream to Google BigQuery and ES ...And Then There Were Three...
  • 11. Clients send gzipped, batched Frontend does sanity checks - Validation, versioning, etc. Frontend replies fast (202 Accepted) and closes the connection in order to save on mobile socket life Geo load-balanced Pushes analytics into Pub/Sub for future processing, mutation Fan-In model ANALYTICS - FRONTEND
  • 12. Pulls analytics from Pub/Sub Mutate/Enrich data if necessary Inserts to various DBs, according to usage - Monitoring, BI, Warehousing, etc. Separation of concerns - Worker cluster per target DB Fan-out model Renee Finch, golang.org/doc/gopher/pencil/ANALYTICS - WORKER
  • 13. Golang, abstraction package Receives rows, streams to BigQuery (as opposed to load jobs) Sync (foreground insert) or Async (background insert) Pros: Instant data availability, no job delay, fast Cons: Harder handling of bad analytics, Google’s HTTP 500s (requires retry) Open source, PRs merrily encouraged! Collecting User Data and Usage - Blog Post - http://bit.ly/CollectingDataRounds github.com/rounds/go-bqstreamerSTREAMING TO BigQuery
  • 14. Frontends deployed in several locations in GCE (We Geo load balance them) Workers are in GCE (Europe West) ES cluster is in GCE (Europe West) ACROSS THE UNIVERSE
  • 15. We started using Elasticsearch for monitoring about a year before elastic.co relaize that Every new feature received a monitoring dashboard Debugging Monitoring (custom sensu checks) Data is kept for 30 to 90 days Ad-hoc reporting using Kibana ELASTICSEARCH
  • 16.
  • 17. Store data from the beginning till the end of time Standard(ish) SQL Very fast No DBA is needed Business reports (SiSense) - “Kibana” for BigQuery BigQuery
  • 18. NEW ISSUES Permissive vs hard scheme for events - allow clients ease of use while keeping the scheme strict for ease of BI Clients make mistakes (Arabic locale dates) - elasticsearch allows while BQ doesn’t Things we’re integrating as solutions Every event is a class - compile time validation generated from RAML Wrote a library for event reporting server side