Truecaller
towards a data-driven company
Marek Wiewiórka, Tomasz Żukowski
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Agenda
1. Truecaller - a global phonebook
2. Evolution of the company’s data architecture
3. Data as a company asset
Truecaller
■ World's largest mobile phone community (> 250 million users)
Truecaller In Numbers
■ 6+ billion application events daily
■ 3+ TB of compressed user-generated data daily
■ 65M+ active users and 250k application installations daily
■ 28M+ identified spam calls every day
■ ...
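To put the daily totals above in per-second terms, here is a back-of-the-envelope sketch. It assumes the load is spread evenly over the day and that "TB" means decimal terabytes; neither assumption comes from the slides, and real traffic is likely to peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Lower bounds taken from the slide; the real figures are "more than" these.
events_per_day = 6_000_000_000   # application events per day
bytes_per_day = 3 * 10**12       # compressed bytes per day, assuming 1 TB = 10^12 B


def per_second(daily_total: float) -> float:
    """Average rate per second for a daily total, assuming uniform load."""
    return daily_total / SECONDS_PER_DAY


events_per_second = per_second(events_per_day)
megabytes_per_second = per_second(bytes_per_day) / 10**6

print(f"{events_per_second:,.0f} events/s on average")
print(f"{megabytes_per_second:.1f} MB/s of compressed data on average")
```

Under these assumptions the averages come out at roughly 69 thousand events and 35 MB of compressed data per second.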
Not Really Data-Driven Beginnings...
■ Data layer and analytics powered by MySQL databases
■ No separation of OLTP and OLAP domains
■ Daily ETL processes that used to take longer than one day ;)
■ Problems with storing and querying historical (cold) data
■ Basic reporting without the possibility of doing real data science
■ Almost no DWH design principles in place...
Towards Really Scalable Data Architecture
[Architecture diagram: data ingestion, schema repo, DWH]
Towards Really Scalable Data Architecture
■ Both the data ingestion and the data storage/analytics layers are horizontally scalable
■ High availability for both master and worker nodes
■ Apache Avro, with its schema evolution features and a centralized schema repository, makes adding new event types seamless for ETL processes
■ Clear separation of staging data (raw, in Avro format) and reporting data (cleaned and enriched, in ORC format)
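The schema evolution point can be illustrated with a small sketch. This is plain Python that mimics two of Avro's schema-resolution rules (missing fields take the reader's default, unknown fields are dropped). It does not use the Avro library, and the record name, field names and sample events are hypothetical, not taken from the talk.

```python
from typing import Any, Dict

# Hypothetical "reader" schema for an application event, in Avro's JSON style.
# Version 2 adds an optional `country` field with a default value.
READER_SCHEMA_V2 = {
    "type": "record",
    "name": "AppEvent",  # illustrative name only
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "event_type", "type": "string"},
        {"name": "country", "type": ["null", "string"], "default": None},
    ],
}


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project a decoded record onto the reader schema.

    Fields missing from the record take the reader's default, fields the
    reader does not know are dropped, and a missing field without a
    default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved


# An event written with the old schema (no `country`) and one with the new.
old_event = {"user_id": 42, "event_type": "call_blocked"}
new_event = {"user_id": 7, "event_type": "search", "country": "SE"}

print(resolve(old_event, READER_SCHEMA_V2))
print(resolve(new_event, READER_SCHEMA_V2))
```

Because new fields carry defaults, an ETL job reading with the newer schema can process old and new events alike, which is one way adding event fields can stay seamless for downstream processes.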
Towards Really Self-Service Analytics
Really Analytical Tools
Jupyter Notebooks
What Can We Do With These Data?
■ Calculate spammer score
■ Visualize our business
■ Monitor KPIs after upgrades
■ Better target ads
■ Detect fraudulent user behaviour
LTV - how to calculate
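The slide's own calculation did not survive extraction. As a point of reference only, one common textbook formulation of lifetime value (LTV) multiplies average revenue per user (ARPU) by the expected customer lifetime, which under a constant monthly churn rate is 1 / churn. The sketch below assumes that simple model; the function name and the input figures are hypothetical and are not Truecaller's numbers or method.

```python
def simple_ltv(arpu_per_month: float, monthly_churn: float) -> float:
    """Lifetime value under a constant-churn model.

    With a constant monthly churn rate, the expected lifetime of a user
    is 1 / churn months, so LTV = ARPU * (1 / churn).
    """
    if not 0 < monthly_churn <= 1:
        raise ValueError("monthly_churn must be in the interval (0, 1]")
    expected_lifetime_months = 1 / monthly_churn
    return arpu_per_month * expected_lifetime_months


# Hypothetical inputs: 0.50 currency units per user per month, 5% monthly churn.
print(f"LTV = {simple_ltv(0.50, 0.05):.2f}")
```

A production model would typically segment users by cohort and discount future revenue; this version only shows the shape of the calculation.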
Market Share Estimation
Is Brexit Really a Problem?
■ Calculated on anonymized data of 200k users in the UK
■ Analysis prepared just after the Brexit referendum
What’s next
■ More digging into data (a lot of areas not even touched yet)
■ More advanced modelling
■ Streaming analytics
?

More Related Content

What's hot

Splunk Cloud
Splunk CloudSplunk Cloud
Splunk CloudSplunk
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overviewAlex Fok
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights Gunnar Peipman
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQLAppier
 
Test-Driven Machine Learning
Test-Driven Machine LearningTest-Driven Machine Learning
Test-Driven Machine LearningC4Media
 
Tracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubeTracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubePatroklos Papapetrou (Pat)
 
Managing Infrastructure as a Product - Introduction to Platform Engineering
Managing Infrastructure as a Product - Introduction to Platform EngineeringManaging Infrastructure as a Product - Introduction to Platform Engineering
Managing Infrastructure as a Product - Introduction to Platform EngineeringAdityo Pratomo
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to AnsibleKnoldus Inc.
 
Manage Microservices Chaos and Complexity with Observability
Manage Microservices Chaos and Complexity with ObservabilityManage Microservices Chaos and Complexity with Observability
Manage Microservices Chaos and Complexity with ObservabilityNGINX, Inc.
 
Accelerate Microservices Deployments with Automation
Accelerate Microservices Deployments with AutomationAccelerate Microservices Deployments with Automation
Accelerate Microservices Deployments with AutomationNGINX, Inc.
 
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...Edureka!
 
OpenTelemetry Introduction
OpenTelemetry Introduction OpenTelemetry Introduction
OpenTelemetry Introduction DimitrisFinas1
 
An Introduction To Java Profiling
An Introduction To Java ProfilingAn Introduction To Java Profiling
An Introduction To Java Profilingschlebu
 
Vector Search for Data Scientists.pdf
Vector Search for Data Scientists.pdfVector Search for Data Scientists.pdf
Vector Search for Data Scientists.pdfConnorShorten2
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelDr Ganesh Iyer
 
Splunk 101
Splunk 101Splunk 101
Splunk 101Splunk
 
Splunk HTTP Event Collector
Splunk HTTP Event CollectorSplunk HTTP Event Collector
Splunk HTTP Event CollectorSplunk
 

What's hot (20)

Splunk Architecture
Splunk ArchitectureSplunk Architecture
Splunk Architecture
 
Splunk Cloud
Splunk CloudSplunk Cloud
Splunk Cloud
 
Splunk Architecture overview
Splunk Architecture overviewSplunk Architecture overview
Splunk Architecture overview
 
Deep-Dive to Application Insights
Deep-Dive to Application Insights Deep-Dive to Application Insights
Deep-Dive to Application Insights
 
Introduction to GraphQL
Introduction to GraphQLIntroduction to GraphQL
Introduction to GraphQL
 
Test-Driven Machine Learning
Test-Driven Machine LearningTest-Driven Machine Learning
Test-Driven Machine Learning
 
Tracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQubeTracking and improving software quality with SonarQube
Tracking and improving software quality with SonarQube
 
Managing Infrastructure as a Product - Introduction to Platform Engineering
Managing Infrastructure as a Product - Introduction to Platform EngineeringManaging Infrastructure as a Product - Introduction to Platform Engineering
Managing Infrastructure as a Product - Introduction to Platform Engineering
 
Introduction to Ansible
Introduction to AnsibleIntroduction to Ansible
Introduction to Ansible
 
Manage Microservices Chaos and Complexity with Observability
Manage Microservices Chaos and Complexity with ObservabilityManage Microservices Chaos and Complexity with Observability
Manage Microservices Chaos and Complexity with Observability
 
The basics of fluentd
The basics of fluentdThe basics of fluentd
The basics of fluentd
 
Accelerate Microservices Deployments with Automation
Accelerate Microservices Deployments with AutomationAccelerate Microservices Deployments with Automation
Accelerate Microservices Deployments with Automation
 
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...
Splunk Architecture | Splunk Tutorial For Beginners | Splunk Training | Splun...
 
OpenTelemetry Introduction
OpenTelemetry Introduction OpenTelemetry Introduction
OpenTelemetry Introduction
 
An Introduction To Java Profiling
An Introduction To Java ProfilingAn Introduction To Java Profiling
An Introduction To Java Profiling
 
Vector Search for Data Scientists.pdf
Vector Search for Data Scientists.pdfVector Search for Data Scientists.pdf
Vector Search for Data Scientists.pdf
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement Model
 
Splunk 101
Splunk 101Splunk 101
Splunk 101
 
Splunk HTTP Event Collector
Splunk HTTP Event CollectorSplunk HTTP Event Collector
Splunk HTTP Event Collector
 
Selenium
SeleniumSelenium
Selenium
 

Similar to Truecaller towards a data-driven company

Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...GetInData
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...GetInData
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczGetInData
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessInside Analysis
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Nicola Sandoli
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?Aerospike, Inc.
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsKai Wähner
 
IT and OT Convergence
IT and OT ConvergenceIT and OT Convergence
IT and OT ConvergenceOpsRamp
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersDavid Walker
 
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...gogo6
 
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...HiveMQ
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSouth Tyrol Free Software Conference
 

Similar to Truecaller towards a data-driven company (20)

Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...Predicting Startup Market Trends based on the news and social media - Albert ...
Predicting Startup Market Trends based on the news and social media - Albert ...
 
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
Real time analytics that controls 50% of mobile network in Poland - Maciej Br...
 
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr MenclewiczData-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
Data-Driven Fast Track: Introduction to data-drivenness with Piotr Menclewicz
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven Business
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and Products
 
IT and OT Convergence
IT and OT ConvergenceIT and OT Convergence
IT and OT Convergence
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
 
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...
HiveMQ & HighByte Presents: Building an Enterprise Unified Namespace (UNS) to...
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
 

More from GetInData

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...GetInData
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competitionGetInData
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? GetInData
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierGetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...GetInData
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...GetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataGetInData
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataGetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...GetInData
 
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...
Analytics 101 - How to build a data-driven organisation? - Rafał Małanij, Get...GetInData
 
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...GetInData
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...GetInData
 
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataStrategies for on premise to Google Cloud migration - Mateusz Pytel, GetInData
Strategies for on premise to Google Cloud migration - Mateusz Pytel, GetInDataGetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
 
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...
Welcome to MLOps candy shop and choose your flavour! - Mateusz Pytel & Marius...GetInData
 

More from GetInData (20)

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
How NOT to win a Kaggle competition
How NOT to win a Kaggle competitionHow NOT to win a Kaggle competition
How NOT to win a Kaggle competition
 
How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team? How to become good Developer in Scrum Team?
How to become good Developer in Scrum Team?
 
OpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easierOpenLineage & Airflow - data lineage has never been easier
OpenLineage & Airflow - data lineage has never been easier
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
Creating Real-Time Data Streaming powered by SQL on Kubernetes - Albert Lewan...
 
MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...MLOps implemented - how we combine the cloud & open-source to boost data scie...
MLOps implemented - how we combine the cloud & open-source to boost data scie...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInDataFeast + Amundsen Integration - Mariusz Strzelecki, GetInData
Feast + Amundsen Integration - Mariusz Strzelecki, GetInData
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Big data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInDataBig data trends - Krzysztof Zarzycki, GetInData
Big data trends - Krzysztof Zarzycki, GetInData
 
Functioning incessantly of Data Science Platform with Kubeflow - Albert Lewan...
Truecaller towards a data-driven company

  • 1. Truecaller towards a data-driven company Marek Wiewiórka, Tomasz Żukowski
  • 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Agenda 1. Truecaller - a global phonebook 2. Evolution of the company’s data architecture 3. Data as a company asset
  • 3. Truecaller ■ World's largest mobile phone community (> 250 million users)
  • 4. Truecaller In Numbers ■ +6 billion application events daily ■ +3 TB of compressed user generated data daily ■ +65M active users and 250k application installations daily ■ +28M identified spam calls every day ■ ...
  • 5. Not ly Data-Driven Beginnings... ■ Data layer and analytics powered by MySQL databases ■ No separation of OLTP and OLAP domains ■ Daily ETL processes that used to take longer than one day ;) ■ Problems with storing and querying historical (cold) data ■ Basic reporting without possibility of doing real data science ■ Almost no DWH design principles in place...
  • 6. Towards ly Scalable Data Architecture DWH Data ingestion Schema repo
  • 7. Towards ly Scalable Data Architecture ■ Both data ingestion and data storage/analytics layers are horizontally scalable ■ High availability for both master and worker nodes ■ Apache Avro with schema evolution features and centralized schema repository makes adding new event types seamless for ETL processes ■ Clear separation of staging (raw - Avro format) and reporting (cleaned and enriched in ORC format) data
  • 8. Towards ly Self-Service Analytics
  • 9. ly Analytical Tools
  • 10. Jupyter Notebooks
  • 11. What Can We Do With These Data? ■ Calculate spammer score
  • 12. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business
  • 13. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades
  • 14. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads
  • 15. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads ■ Detect fraudulent user behaviour
  • 16. LTV - how to calculate
  • 17. Market Share Estimation
  • 18. Is Brexit ly a Problem? ■ Calculated on anonymized data of 200k users in the UK ■ Analysis prepared just after Brexit referendum
  • 19. Is Brexit ly a Problem?
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 21. What’s next ■ More digging into data (a lot of areas not even touched yet) ■ More advanced modelling ■ Streaming analytics
  • 22. ?
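
Slide 7 credits Avro schema evolution with making new event types and fields seamless for ETL. The sketch below illustrates only the general idea behind Avro-style schema resolution, using plain Python dictionaries rather than an Avro library: a field added with a default is filled in when an older record is read, and a field the reader does not know is ignored. The event layout, the field names and the `resolve` helper are hypothetical and are not taken from the slides.

```python
# Illustrative only: mimics Avro-style schema resolution with plain dicts.
# Field names and the event layout are hypothetical, not Truecaller's schemas.

MISSING = object()  # sentinel meaning "this reader field has no default"


def resolve(record, reader_fields):
    """Project a decoded record onto the reader's list of (name, default) fields."""
    resolved = {}
    for name, default in reader_fields:
        if name in record:
            # Field present in the written record: take it as-is.
            resolved[name] = record[name]
        elif default is not MISSING:
            # Field added after the record was written: fall back to the default.
            resolved[name] = default
        else:
            # No value and no default: the schemas are incompatible.
            raise ValueError(f"field {name!r} missing and has no default")
    # Fields in the record that the reader does not list are dropped.
    return resolved


# Version 2 of a hypothetical event schema adds "app_version" with a default.
READER_V2 = [
    ("event_type", MISSING),
    ("user_id", MISSING),
    ("app_version", "unknown"),
]

old_event = {"event_type": "call", "user_id": 42}
new_event = {"event_type": "call", "user_id": 43,
             "app_version": "7.1", "debug": True}

print(resolve(old_event, READER_V2))
print(resolve(new_event, READER_V2))
```

Under these assumptions, an ETL job that reads with the newer schema can process old and new events without code changes, which is one plausible reading of "seamless" on the slide.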