Truecaller towards a data-driven company

Marek Wiewiórka and Tomasz Żukowski had the pleasure of giving a presentation at Data Science Summit 2017, which took place in Warsaw on May 26th. They presented one of the most popular startups in Sweden: Truecaller.
Please take a look at the presentation.

Published in: Technology

Truecaller towards a data-driven company

  1. Truecaller towards a data-driven company Marek Wiewiórka, Tomasz Żukowski
  2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Agenda: 1. Truecaller - a global phonebook 2. Evolution of the company’s data architecture 3. Data as a company asset
  3. Truecaller ■ World's largest mobile phone community (> 250 million users)
  4. Truecaller In Numbers ■ 6+ billion application events daily ■ 3+ TB of compressed user-generated data daily ■ 65M+ active users and 250k application installations daily ■ 28M+ identified spam calls every day ■ ...
  5. Not ly Data-Driven Beginnings... ■ Data layer and analytics powered by MySQL databases ■ No separation of OLTP and OLAP domains ■ Daily ETL processes that used to take longer than one day ;) ■ Problems with storing and querying historical (cold) data ■ Basic reporting without possibility of doing real data science ■ Almost no DWH design principles in place...
  6. Towards ly Scalable Data Architecture DWH Data ingestion Schema repo
  7. Towards ly Scalable Data Architecture ■ Both data ingestion and data storage/analytics layers are horizontally scalable ■ High availability for both master and worker nodes ■ Apache Avro with schema evolution features and centralized schema repository makes adding new event types seamless for ETL processes ■ Clear separation of staging (raw - Avro format) and reporting (cleaned and enriched in ORC format) data
  8. Towards ly Self-Service Analytics
  9. ly Analytical Tools
  10. Jupyter Notebooks
  11. What Can We Do With These Data? ■ Calculate spammer score
  12. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business
  13. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades
  14. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads
  15. What Can We Do With These Data? ■ Calculate spammer score ■ Visualize our business ■ Monitor KPIs after upgrades ■ Better target ads ■ Detect fraudulent user behaviour
  16. LTV - how to calculate
  17. Market Share Estimation
  18. Is Brexit ly a Problem? ■ Calculated on anonymized data of 200k users in the UK ■ Analysis prepared just after the Brexit referendum
  19. Is Brexit ly a Problem?
  20.
  21. What’s next ■ More digging into data (a lot of areas not even touched yet) ■ More advanced modelling ■ Streaming analytics
  22. ?
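
Slide 7 credits Apache Avro schema evolution with making new event types and fields seamless for ETL. The slides do not show how this works, so the following is only a minimal sketch of the general idea behind Avro's schema resolution rules, written in plain Python rather than with an Avro library. The schema, the field names (`user_id`, `event_type`, `country`) and the `resolve` helper are all hypothetical and are not taken from the presentation.

```python
# Simplified illustration of Avro-style schema resolution.
# This is NOT the Avro library; schemas and field names are hypothetical.

# Newer reader schema: adds a "country" field with a default value.
READER_SCHEMA_V2 = {
    "name": "AppEvent",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "event_type", "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}


def resolve(record, reader_schema):
    """Project an already-decoded record onto the reader schema.

    Mirrors three Avro resolution rules:
    - a reader field present in the record is copied as-is,
    - a reader field missing from the record takes its default,
    - a record field unknown to the reader is ignored.
    A missing field without a default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# An event written before "country" existed, and one written after.
old_event = {"user_id": 42, "event_type": "call_blocked"}
new_event = {"user_id": 7, "event_type": "search", "country": "SE"}

print(resolve(old_event, READER_SCHEMA_V2))
print(resolve(new_event, READER_SCHEMA_V2))
```

Under these assumptions, an ETL job that reads with the newest schema can process old and new events in the same run, which is one plausible reading of "seamless" on the slide. In a real pipeline the reader schema would come from the centralized schema repository rather than being hard-coded.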
