SlideShare a Scribd company logo

Truecaller towards a data-driven company

GetInData
GetInData

Did you like it? Check out our blog to stay up to date: https://getindata.com/blog Marek Wiewiórka and Tomasz Żukowski had a pleasure to give a presentation at Data Science Summit 2017 which took place in Warsaw on May 26th. They came forth with one of the most popular startup in Sweden - Trucaller. Please take a look at the presentation.

1 of 22
Download to read offline
Truecaller
towards a data-driven company
Marek Wiewiórka, Tomasz Żukowski
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Agenda
1. Truecaller - a global phonebook
2. Evolution of the company’s data architecture
3. Data as a company asset
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Truecaller
■ World's largest mobile phone community ( > 250 mln users)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Truecaller In Numbers
■ +6 billion application events daily
■ +3 TB of compressed user generated data daily
■ +65M active users and 250k application installations daily
■ +28M identified spam calls every day
■ ...
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Not ly Data-Driven Beginnings...
■ Data layer and analytics powered by MySQL databases
■ No separation of OLTP and OLAP domains
■ Daily ETL processes that used to take longer than one day ;)
■ Problems with storing and querying historical (cold) data
■ Basic reporting without possibility of doing real data
science
■ Almost no DWH design principles in place...
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Towards ly Scalable Data Architecture
DWH
Data ingestion
Schema
repo
Ad

Recommended

Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...Evention
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics MeetupIntroduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetup
Introduction to Hadoop and Cloudera, Louisville BI & Big Data Analytics Meetupiwrigley
 
Zfs Nuts And Bolts
Zfs Nuts And BoltsZfs Nuts And Bolts
Zfs Nuts And BoltsEric Sproul
 
Scalable and Available, Patterns for Success
Scalable and Available, Patterns for SuccessScalable and Available, Patterns for Success
Scalable and Available, Patterns for SuccessDerek Collison
 

More Related Content

What's hot

ClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaJack Gao
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Simplilearn
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Adam Kawa
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hPrecisely
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Yongho Ha
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHortonworks
 
Natural language processing PPT presentation
Natural language processing PPT presentationNatural language processing PPT presentation
Natural language processing PPT presentationSai Mohith
 
CATALOGO DE MEMORIAS RAM
CATALOGO DE MEMORIAS RAMCATALOGO DE MEMORIAS RAM
CATALOGO DE MEMORIAS RAMitzelcamas
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015Ivan Glushkov
 
Intent detection
Intent detectionIntent detection
Intent detectionBytesview
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural NetworkDerek Kane
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaEdureka!
 
Spark streaming
Spark streamingSpark streaming
Spark streamingWhiteklay
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Spark Summit
 

What's hot (20)

rnn BASICS
rnn BASICSrnn BASICS
rnn BASICS
 
ClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @SinaClickHouse北京Meetup ClickHouse Best Practice @Sina
ClickHouse北京Meetup ClickHouse Best Practice @Sina
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
 
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
Hadoop Adventures At Spotify (Strata Conference + Hadoop World 2013)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Blue brain Technology
Blue brain TechnologyBlue brain Technology
Blue brain Technology
 
Simplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-hSimplifying Big Data Integration with Syncsort DMX and DMX-h
Simplifying Big Data Integration with Syncsort DMX and DMX-h
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
NoSQL
NoSQLNoSQL
NoSQL
 
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
Spark 의 핵심은 무엇인가? RDD! (RDD paper review)
 
HBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minuteHBase: How to get MTTR below 1 minute
HBase: How to get MTTR below 1 minute
 
Natural language processing PPT presentation
Natural language processing PPT presentationNatural language processing PPT presentation
Natural language processing PPT presentation
 
CATALOGO DE MEMORIAS RAM
CATALOGO DE MEMORIAS RAMCATALOGO DE MEMORIAS RAM
CATALOGO DE MEMORIAS RAM
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
Intent detection
Intent detectionIntent detection
Intent detection
 
Data Science - Part VIII - Artifical Neural Network
Data Science - Part VIII -  Artifical Neural NetworkData Science - Part VIII -  Artifical Neural Network
Data Science - Part VIII - Artifical Neural Network
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
 

Similar to Truecaller towards a data-driven company

Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMatillion
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessInside Analysis
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Nicola Sandoli
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?Aerospike, Inc.
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Big Data Spain
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Kai Wähner
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsKai Wähner
 
IT and OT Convergence
IT and OT ConvergenceIT and OT Convergence
IT and OT ConvergenceOpsRamp
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...DataWorks Summit/Hadoop Summit
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersDavid Walker
 
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...gogo6
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSouth Tyrol Free Software Conference
 
Legacy IBM Systems and Splunk: Security, Compliance and Uptime
Legacy IBM Systems and Splunk: Security, Compliance and UptimeLegacy IBM Systems and Splunk: Security, Compliance and Uptime
Legacy IBM Systems and Splunk: Security, Compliance and UptimePrecisely
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...GetInData
 
Redis Day TLV 2018 - Redis & BioCatch
Redis Day TLV 2018 - Redis & BioCatchRedis Day TLV 2018 - Redis & BioCatch
Redis Day TLV 2018 - Redis & BioCatchRedis Labs
 

Similar to Truecaller towards a data-driven company (20)

Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven Business
 
The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit The Evolution of Big Data Pipelines at Intuit
The Evolution of Big Data Pipelines at Intuit
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
Tibco Augmented Intelligence - Analytics, IoT, Big Data, Streaming 20161025
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
Stream Processing as Game Changer for Big Data and Internet of Things by Kai ...
 
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser...
 
Streaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and ProductsStreaming Analytics - Comparison of Open Source Frameworks and Products
Streaming Analytics - Comparison of Open Source Frameworks and Products
 
IT and OT Convergence
IT and OT ConvergenceIT and OT Convergence
IT and OT Convergence
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
Worldpay - Delivering Multi-Tenancy Applications in A Secure Operational Plat...
 
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy ClustersData Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
Data Works Summit Munich 2017 - Worldpay - Multi Tenancy Clusters
 
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...Predictive Analytics and the Industrial Internet of Manufacturing Things with...
Predictive Analytics and the Industrial Internet of Manufacturing Things with...
 
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data HubSFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
SFSCON23 - Martin Rabanser - Real-time aeroplane tracking and the Open Data Hub
 
Legacy IBM Systems and Splunk: Security, Compliance and Uptime
Legacy IBM Systems and Splunk: Security, Compliance and UptimeLegacy IBM Systems and Splunk: Security, Compliance and Uptime
Legacy IBM Systems and Splunk: Security, Compliance and Uptime
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
Understanding Big Data Analytics - solutions for growing businesses - Rafał M...
 
Redis Day TLV 2018 - Redis & BioCatch
Redis Day TLV 2018 - Redis & BioCatchRedis Day TLV 2018 - Redis & BioCatch
Redis Day TLV 2018 - Redis & BioCatch
 

More from GetInData

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...GetInData
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataGetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...GetInData
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...GetInData
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...GetInData
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?GetInData
 
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...GetInData
 
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?GetInData
 
Elephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud readyElephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud readyGetInData
 
Druid - Real-time interactive analytics at scale
Druid - Real-time interactive analytics at scaleDruid - Real-time interactive analytics at scale
Druid - Real-time interactive analytics at scaleGetInData
 
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)GetInData
 
Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)GetInData
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem GetInData
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopGetInData
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache TezGetInData
 
Apache Flume
Apache FlumeApache Flume
Apache FlumeGetInData
 

More from GetInData (19)

How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...How do we work with customers on Big Data / ML / Analytics Projects using Scr...
How do we work with customers on Big Data / ML / Analytics Projects using Scr...
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInDataModel serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
Model serving made easy using Kedro pipelines - Mariusz Strzelecki, GetInData
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...Managing Big Data projects in a constantly changing environment - Rafał Zalew...
Managing Big Data projects in a constantly changing environment - Rafał Zalew...
 
Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...Monitoring environment based on satellite data with Python and PySpark - Albe...
Monitoring environment based on satellite data with Python and PySpark - Albe...
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?
 
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...
Zdarzeniowe zasilanie i przeliczanie danych w nowoczesnej hurtowni danych opa...
 
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
Open-source vs. public cloud in the Big Data landscape. Friends or Foes?
 
Elephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud readyElephants in the cloud or How to become cloud ready
Elephants in the cloud or How to become cloud ready
 
Druid - Real-time interactive analytics at scale
Druid - Real-time interactive analytics at scaleDruid - Real-time interactive analytics at scale
Druid - Real-time interactive analytics at scale
 
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
 
Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
Simplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in HadoopSimplified Data Management And Process Scheduling in Hadoop
Simplified Data Management And Process Scheduling in Hadoop
 
Quick Introduction to Apache Tez
Quick Introduction to Apache TezQuick Introduction to Apache Tez
Quick Introduction to Apache Tez
 
HCatalog
HCatalogHCatalog
HCatalog
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 

Recently uploaded

Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...DianaGray10
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfSafe Software
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsEvangelia Mitsopoulou
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 
How to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanHow to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanDatabarracks
 
Traffic Signboard Classification with Voice alert to the driver.pptx
Traffic Signboard Classification with Voice alert to the driver.pptxTraffic Signboard Classification with Voice alert to the driver.pptx
Traffic Signboard Classification with Voice alert to the driver.pptxharimaxwell0712
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfDomotica daVinci
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...ISPMAIndia
 
LF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIELF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIEDanBrown980551
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellencePrecisely
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys VasylievFwdays
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 

Recently uploaded (20)

Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
 
Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdfIntroducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
Introducing the New FME Community Webinar - Feb 21, 2024 (2).pdf
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
Battle of React State Managers in frontend applications
Battle of React State Managers in frontend applicationsBattle of React State Managers in frontend applications
Battle of React State Managers in frontend applications
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
How to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response PlanHow to write an effective Cyber Incident Response Plan
How to write an effective Cyber Incident Response Plan
 
Traffic Signboard Classification with Voice alert to the driver.pptx
Traffic Signboard Classification with Voice alert to the driver.pptxTraffic Signboard Classification with Voice alert to the driver.pptx
Traffic Signboard Classification with Voice alert to the driver.pptx
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdf
 
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
AI MODELS USAGE IN FINTECH PRODUCTS: PM APPROACH & BEST PRACTICES by Kasthuri...
 
LF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIELF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIE
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
Automate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center ExcellenceAutomate Your Master Data Processes for Shared Service Center Excellence
Automate Your Master Data Processes for Shared Service Center Excellence
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 

Truecaller towards a data-driven company

  • 1. Truecaller towards a data-driven company Marek Wiewiórka, Tomasz Żukowski
  • 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Agenda 1. Truecaller - a global phonebook 2. Evolution of the company’s data architecture 3. Data as a company asset
  • 3. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Truecaller ■ World's largest mobile phone community ( > 250 mln users)
  • 4. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Truecaller In Numbers ■ +6 billion application events daily ■ +3 TB of compressed user generated data daily ■ +65M active users and 250k application installations daily ■ +28M identified spam calls every day ■ ...
  • 5. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Not ly Data-Driven Beginnings... ■ Data layer and analytics powered by MySQL databases ■ No separation of OLTP and OLAP domains ■ Daily ETL processes that used to take longer than one day ;) ■ Problems with storing and querying historical (cold) data ■ Basic reporting without possibility of doing real data science ■ Almost no DWH design principles in place...
  • 6. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Towards ly Scalable Data Architecture DWH Data ingestion Schema repo
  • 7. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Towards ly Scalable Data Architecture ■ Both data ingestion and data storage/analytics layers are horizontally scalable ■ High availability for both master and worker nodes ■ Apache Avro with schema evolution features and centralized schema repository makes adding new event types seamless for ETL processes ■ Clear separation of staging (raw - Avro format) and reporting (cleaned and enriched in ORC format) data
  • 8. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Towards ly Self-Service Analytics
  • 9. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ly Analytical Tools
  • 10. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Jupyter Notebooks
  • 11. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What Can We Do With These Data? ■ Calculate spammer score
  • 12. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business
  • 13. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades
  • 14. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads
  • 15. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads ■ Detect fraudulent user behaviour
  • 16. © Copyright. All rights reserved. Not to be reproduced without prior written consent. LTV - how to calculate
  • 17. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Market Share Estimation
  • 18. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Is Brexit ly a Problem? ■ Calculated on anonymized data of 200k users in the UK ■ Analysis prepared just after Brexit referendum
  • 19. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Is Brexit ly a Problem?
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 21. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What’s next ■ More digging into data (a lot of areas not even touched yet) ■ More advanced modelling ■ Streaming analytics
  • 22. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ?