SlideShare a Scribd company logo
1 of 40
• CTO at
Daniel Zohar
• A world leading language learning app
• Making a positive impact on millions of
people’s lives!
Data Pipelines @ Memrise
Past, Present & Future
Let’s talk about data-pipelines
Let’s talk about data-pipelines
(analytics)
Let’s talk about data-pipelines
a data pipeline is the software that consolidates data from
multiple sources and makes it available to be used
strategically
Let’s talk about data-pipelines
Let’s talk about data-pipelines
Let’s talk about data-pipelines
Data Pipelines at Memrise
Data Pipelines at Memrise
• Millions of user sessions a day
• ~90MM events per day (avg. 1,000/s,
max. 2000/s)
• ~330GB of raw data
Data Pipelines at Memrise
< 2014
Data Pipelines at Memrise
2014< 2014
Data Pipelines at Memrise 2014
Data Pipeline
Data Pipeline
Workflow Orchestration
Reporting
• ETL processes were written in MapReduce (Hive) -
not easy to work with and add new processes.
• UI was custom made. Every report required
planning, developer time and a deploy.
• Some reports running on production DB
Pain points
No access to raw data
Pain points
Data Pipelines at Memrise
20152014< 2014
Snowplow to the rescue!
Snowplow to the rescue!
Own your data!
Snowplow to the rescue!
Own your data!
and much more…
Data Pipelines at Memrise 2015
Data Pipeline
KINESIS
Data Analysis
Data Analysis
Workflow Orchestration
Workflow Orchestration
Workflow Orchestration
Workflow Orchestration
Data Pipelines at Memrise
201620152014< 2014
Realtime Monitoring
“Monitor everything” mentality (with alerts)
Integrate Back Into the Product
Integrate Back Into the Product
Integrate Back Into the Product
Thank You!
🚀
• Data Engineer
• Platform Lead
• Platform Engineer
Memrise is
hiring!
memrise.com/jobs

More Related Content

What's hot

Viewbix tracking journey
Viewbix tracking journeyViewbix tracking journey
Viewbix tracking journey
idan_by
 

What's hot (20)

Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
Analytics at Carbonite: presentation to Snowplow Meetup Boston April 2016
 
Apache Kylin 101
Apache Kylin 101Apache Kylin 101
Apache Kylin 101
 
Snowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againSnowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back again
 
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...The analytics journey at Viewbix - how they came to use Snowplow and the setu...
The analytics journey at Viewbix - how they came to use Snowplow and the setu...
 
Snowplow, Metail and Cascalog
Snowplow, Metail and CascalogSnowplow, Metail and Cascalog
Snowplow, Metail and Cascalog
 
Simply Business and Snowplow - Multichannel Attribution Analysis
Simply Business and Snowplow - Multichannel Attribution AnalysisSimply Business and Snowplow - Multichannel Attribution Analysis
Simply Business and Snowplow - Multichannel Attribution Analysis
 
How to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkHow to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in Splunk
 
2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling2016 09 measurecamp - event data modeling
2016 09 measurecamp - event data modeling
 
Splunk @ HomeAway
Splunk @ HomeAwaySplunk @ HomeAway
Splunk @ HomeAway
 
SplunkLive! Utrecht - KPN
SplunkLive! Utrecht - KPNSplunkLive! Utrecht - KPN
SplunkLive! Utrecht - KPN
 
Spark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business OperationsSpark Usage in Enterprise Business Operations
Spark Usage in Enterprise Business Operations
 
Snowplow at DA Hub emerging technology showcase
Snowplow at DA Hub emerging technology showcaseSnowplow at DA Hub emerging technology showcase
Snowplow at DA Hub emerging technology showcase
 
Driving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive AnalyticsDriving the On-Demand Economy with Spark and Predictive Analytics
Driving the On-Demand Economy with Spark and Predictive Analytics
 
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooksNotebooks @ Netflix: From analytics to engineering with Jupyter notebooks
Notebooks @ Netflix: From analytics to engineering with Jupyter notebooks
 
Understanding event data
Understanding event dataUnderstanding event data
Understanding event data
 
A taste of Snowplow Analytics data
A taste of Snowplow Analytics dataA taste of Snowplow Analytics data
A taste of Snowplow Analytics data
 
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
Data Science at Netflix - Principles for Speed & Scale [Rev 2019 keynote]
 
Viewbix tracking journey
Viewbix tracking journeyViewbix tracking journey
Viewbix tracking journey
 
Spark Summit EU: IBM Keynote
Spark Summit EU: IBM KeynoteSpark Summit EU: IBM Keynote
Spark Summit EU: IBM Keynote
 
Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2Yali presentation for snowplow amsterdam meetup number 2
Yali presentation for snowplow amsterdam meetup number 2
 

Similar to Memrise presentation @ London Snowplow meetup

Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
Srinath Perera
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
Stylight
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB
 

Similar to Memrise presentation @ London Snowplow meetup (20)

Big Data
Big DataBig Data
Big Data
 
Big Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and RoadmapBig Data Analytics Strategy and Roadmap
Big Data Analytics Strategy and Roadmap
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journey
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
Unbundling the Modern Streaming Stack With Dunith Dhanushka | Current 2022
 
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
Jeremy Engle's slides from Redshift / Big Data meetup on July 13, 2017
 
Overview of Bigdata Analytics
Overview of Bigdata Analytics Overview of Bigdata Analytics
Overview of Bigdata Analytics
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Gala Webminar September 2013
Gala Webminar September 2013Gala Webminar September 2013
Gala Webminar September 2013
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja Presentation on BigData by Swapnaja
Presentation on BigData by Swapnaja
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
 
The New Model
The New ModelThe New Model
The New Model
 
MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB Breakfast Milan -  Mainframe Offloading StrategiesMongoDB Breakfast Milan -  Mainframe Offloading Strategies
MongoDB Breakfast Milan - Mainframe Offloading Strategies
 
SoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in UtahSoftServe BI/BigData Workshop in Utah
SoftServe BI/BigData Workshop in Utah
 
Big data
Big dataBig data
Big data
 
Choosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloudChoosing technologies for a big data solution in the cloud
Choosing technologies for a big data solution in the cloud
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Memrise presentation @ London Snowplow meetup