
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; Krzysztof Zarzycki, GetInData

Many companies today are becoming data-rich and data-intensive. They have millions of users generating billions of interactions and events per day.
These massive streams of complex events can be processed and acted upon, e.g. to offer new products or next best actions, to communicate with users or to detect fraud, and the quicker we can do it, the more value we can generate.

In this talk we will present how, in joint development with our client and in just a few months, we built from the ground up a complex event processing platform for their intensive data streams. We will share how the system runs marketing campaigns and detects fraud by following the behavior of millions of users in real time and reacting to it instantly. The platform, designed and built with Big Data technologies to scale out cost-effectively, already ingests and processes billions of messages, or terabytes of data, per day on a still small cluster. We will share how we leveraged current best-of-breed open-source projects, including Apache Flink, Apache NiFi and Apache Kafka, and also which interesting problems we needed to solve. Finally, we will share where we're heading next: which use cases we're going to implement next and how.

Published in: Data & Analytics

  1. Assisting Millions of Users in Real-Time. Big Data Tech Warsaw 2018
  2. The Speakers: Who are these guys? Alexey Brodovshuk (@alexeybrod) and Krzysztof Zarzycki (@k_zarzyk).
  3. About Kcell: Kcell JSC is part of the largest Scandinavian telecommunications holding, Telia Company. It is the largest GSM operator in Kazakhstan, with more than 10,000,000 subscribers and great network coverage: 4G (40%), 3G (73%) and 2G (96%) of the population. Kcell has a strong software development team and lots of experience in building services and products. We like innovations. There is an ongoing process of company digital transformation: not only telco.
  4. Business needs: assisting millions of users in real time. Input (SMS events, voice usage events, data usage events, roaming events, location events) → Process → Actions.
  5. Use Cases: use case scenarios, just a few of many. Balance top-up case: if a subscriber tops up her balance too often in a short period of time, we can offer her a less expensive tariff or auto-payment services. (Diagram labels: Case, Trigger, UI.)
  6. Roaming and Fraud. Roaming case: send a trigger to the Marketing Platform if a subscriber visited country X and/or registered in visited mobile network Y, and his device type is Z. Fraud case in roaming: send an email to the anti-fraud unit if a subscriber registered in roaming but his balance at that moment is 0, a situation that is impossible in the standard case.
  7. Old System: why did we start to look for a new solution? It was an external vendor solution with three problems. (1) Black-box solution: Kcell developers can't fix, tweak or optimize it. (2) Scalability issues: limited to ~2,000 events/sec, and can't support all needed data sources. (3) Not reliable: multiple accidents which took too much time to resolve.
  8. Scale: required system throughput of 160K events/second; 10M subscribers; 22.15 TB/month.
  9. About GetInData: Big Data. Passion. Experience. Roots at Spotify; focus on Big Data from day 1; production experience; contributions to Apache Flink.
  10. New Solution: Real-time Stream Processing. Ingestion (HTTP push/pull, FTP, NFS, MQ) → events hub → events processing → outgestion (HTTP push/pull, FTP, MQ).
  11. New Solution: Real-time Stream Processing, with Flink for events processing. Ingestion (HTTP push/pull, FTP, NFS, MQ) → events hub → Flink events processing → outgestion (HTTP push/pull, FTP, MQ).
  12. Processing Flow: Real-time Stream Processing. Raw call events and data usage events are each ingested and transformed into transformed events, which feed processing backed by local state (RocksDB). The Admin UI submits and stops triggers through a control topic. Processing emits notification events, which outgestion delivers as HTTP calls.
  13. New Solution (Operations): Web UI, Monitoring, Security. On top of the same ingestion → events hub → Flink events processing → outgestion architecture: Admin UI (triggers workbench) with a Kafka-based API; monitoring with the ELK stack for logs and InfluxDB/Grafana for metrics; security with FreeIPA (Kerberos, LDAP/AD).
  14. New Solution (Data Lake): Data Lake and Sub-second OLAP Analytics. Alongside the real-time architecture: a Data Lake with historical storage (HDFS), batch (Spark) and SQL (Hive) to keep history, report and explore; and a column-oriented data store for online OLAP (Druid).
  15. Decisions made: some decisions our team made before or during project implementation. Streaming-first approach. Apache Kafka for the event hub. Apache Flink for powerful real-time analytics.
  16. Performance: Apache Avro. Keep state local to the process. Ingest reference data (subscriber profile data, as events) for local joins and enrichment of transformed events: no need to query external systems while processing, and correct correlation of data in time. Querying external systems does not work at >100K events/sec.
  17. Ease of Use: NiFi for data ingestion (no coding), but not for CEP; a web UI for configuring triggers.
  18. Reliability and battle-tested techniques: Flink on YARN, with HDFS HA for redundancy and running ~24/7; InfluxDB & Grafana for monitoring & alerting; ELK for log collection and aggregation. Security: Kerberos and AD thanks to FreeIPA; Apache Ranger for authorization.
  19. Extensiveness: one platform for the whole enterprise. Batch (ad hoc) queries too (Spark, Hive/Presto); online analytics (OLAP). Cost-Efficiency: open-source technologies; HDP as a license-free distribution; just start with a bunch of servers.
  20. Before You Start: words of wisdom. (1) A simple, sketchy trigger request quickly becomes a complex algorithm. (2) Know your data sources: do NOT assume that your data sources are ready for streaming. (3) DO prepare yourself to use open source: NiFi is a great framework, but not a comprehensive set of processors; HDP is a great distribution, but the versions in it quickly become outdated. (4) Start small, start fast.
  21. Our Collaboration: two heads are better than one. Joint development team: not a vendor solution, development as one team. Code quality: code review and automated tools for code quality control. Agile practices: distant geographic locations, but daily stand-ups. Deliver: go live quickly! Less than 4 months to the first production case running 24/7! DevOps/automation. Knowledge sharing: constant knowledge exchange in areas of expertise. Testing: a separate testing environment; automated unit/E2E tests.
  22. Future Work: we have already done a lot, but more great things are coming. Timeline: 2018 Q2, 2018 Q3, 2018 Q4. More data sources (geolocation data, CDRs, equipment logs); more triggers; Data Lake. Bright future: Machine Learning (we plan to include machine learning and other tools that would enhance our platform even more); Real-time BI (intraday view on business and operations); Customer 360 view (call center, clickstream, communication… all in one place, ready for behavioral analysis); Data Monetization (monetize valuable insights from our combined rich data sources); Predictive maintenance; Network Optimization (to lower operational costs and make better investments); and many more. Make it a company-wide, self-service go-to place for data analysis.
  23. Questions? Big Data Tech Warsaw 2018. Contact us: zarz@getindata.com, alexey.brodovshuk@gmail.com
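The balance top-up case on slide 5 boils down to a per-subscriber sliding-window count. The sketch below shows that technique in plain Python rather than Flink. The `TopUpTrigger` class is hypothetical, and so are its parameters: the talk does not say what "too often" means, so "more than 3 top-ups within one hour" is an assumption, as is the ordering of events by timestamp.

```python
from collections import defaultdict, deque

# Hypothetical parameters: the talk does not define "too often".
MAX_TOP_UPS = 3          # more than 3 top-ups...
WINDOW_SECONDS = 3600    # ...within one hour triggers an offer


class TopUpTrigger:
    """Per-subscriber sliding-window count of balance top-ups."""

    def __init__(self, max_top_ups=MAX_TOP_UPS, window_seconds=WINDOW_SECONDS):
        self.max_top_ups = max_top_ups
        self.window_seconds = window_seconds
        # subscriber id -> timestamps (seconds) of top-ups still inside the window
        self.recent = defaultdict(deque)

    def on_top_up(self, subscriber_id, timestamp):
        """Record one top-up; return True if the trigger fires for this subscriber.

        Assumes events for a subscriber arrive in timestamp order.
        """
        times = self.recent[subscriber_id]
        times.append(timestamp)
        # Evict top-ups that are older than the window.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        if len(times) > self.max_top_ups:
            times.clear()  # reset so one burst fires the trigger only once
            return True
        return False


trigger = TopUpTrigger()
print([trigger.on_top_up("subscriber-1", t) for t in (0, 600, 1200, 1800)])
# [False, False, False, True]
```

Keying the state by subscriber id mirrors how a stream processor partitions such a rule, so each subscriber's window is evaluated independently.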
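The two rules on slide 6 are stateless predicates over a single, already enriched event. The following is a minimal sketch assuming a hypothetical `RoamingEvent` record that carries the visited country, visited network, device type and current balance. It reads the slide's "OR/AND" as an inclusive OR between the country and network conditions, combined with AND on the device type. That reading is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoamingEvent:
    """Hypothetical enriched registration event; field names are illustrative."""
    subscriber_id: str
    visited_country: Optional[str]   # None when the subscriber is at home
    visited_network: Optional[str]   # None when the subscriber is at home
    device_type: str
    balance: float                   # balance at the moment of registration


def is_roaming_fraud(event: RoamingEvent) -> bool:
    """Fraud case: registered in roaming while the balance is exactly 0."""
    return event.visited_network is not None and event.balance == 0


def matches_roaming_campaign(event: RoamingEvent, country: str,
                             network: str, device_type: str) -> bool:
    """Marketing case, read as (country X OR network Y) AND device type Z."""
    in_target = event.visited_country == country or event.visited_network == network
    return in_target and event.device_type == device_type


event = RoamingEvent("subscriber-1", "country-X", "network-Q", "device-Z", 0.0)
print(is_roaming_fraud(event))                                                # True
print(matches_roaming_campaign(event, "country-X", "network-Y", "device-Z"))  # True
```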
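The figures on slides 7 and 8 can be put side by side as a rough sanity check. The arithmetic below assumes a 30-day month and decimal units (1 TB = 10^12 bytes):

```latex
\begin{aligned}
\frac{22.15\ \text{TB/month}}{30 \times 86\,400\ \text{s/month}}
  &= \frac{22.15 \times 10^{12}\ \text{B}}{2\,592\,000\ \text{s}}
  \approx 8.5\ \text{MB/s (monthly average)} \\[4pt]
160\,000\ \tfrac{\text{events}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}}
  &\approx 1.38 \times 10^{10}\ \text{events/day} \\[4pt]
\frac{160\,000\ \text{events/s}}{2\,000\ \text{events/s}} &= 80
\end{aligned}
```

The required throughput is therefore about 80 times the old system's limit. Sustained for a whole day, 160K events/sec would amount to roughly 13.8 billion events, which fits the "billions of messages per day" in the abstract. The monthly volume is an average and the 160K figure is a required throughput, so dividing one by the other would not give a meaningful per-event size.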
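Slide 12 shows the Admin UI submitting and stopping triggers through a control topic that the processing job consumes alongside the event streams. The sketch below models that idea with a hypothetical `TriggerRegistry`: control commands update the set of active triggers, and every event is checked against all of them. In Flink, a comparable design is commonly built with the broadcast state pattern, but the talk does not say which mechanism the team used. The command names and the notification shape are assumptions too.

```python
from typing import Callable, Dict, List, Optional

Predicate = Callable[[dict], bool]


class TriggerRegistry:
    """Hypothetical model of trigger control via a control topic.

    Control commands ("submit"/"stop") change the set of active triggers;
    every incoming event is evaluated against all active triggers.
    """

    def __init__(self):
        self.active: Dict[str, Predicate] = {}

    def on_control(self, command: str, trigger_id: str,
                   predicate: Optional[Predicate] = None) -> None:
        if command == "submit":
            if predicate is None:
                raise ValueError("submit requires a predicate")
            self.active[trigger_id] = predicate
        elif command == "stop":
            self.active.pop(trigger_id, None)  # stopping an unknown trigger is a no-op
        else:
            raise ValueError(f"unknown control command: {command}")

    def on_event(self, event: dict) -> List[dict]:
        """Return one notification event per active trigger matching the event."""
        return [
            {"trigger_id": trigger_id, "subscriber_id": event.get("subscriber_id")}
            for trigger_id, predicate in self.active.items()
            if predicate(event)
        ]


registry = TriggerRegistry()
registry.on_control("submit", "heavy-data-user", lambda e: e.get("mb_used", 0) > 100)
print(registry.on_event({"subscriber_id": "subscriber-1", "mb_used": 500}))
# [{'trigger_id': 'heavy-data-user', 'subscriber_id': 'subscriber-1'}]
registry.on_control("stop", "heavy-data-user")
print(registry.on_event({"subscriber_id": "subscriber-1", "mb_used": 500}))  # []
```

Routing trigger changes through a topic instead of redeploying the job lets new campaigns be started and stopped while the pipeline keeps running.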
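Slide 16 describes ingesting subscriber profile data as events and joining it locally, so processing never has to query an external system per event. Below is a minimal sketch of that local join. A plain dict stands in for the process-local state, which slide 12 says is backed by RocksDB. The `ProfileEnricher` class and the `tariff` and `mb_used` fields are hypothetical.

```python
class ProfileEnricher:
    """Hypothetical local join of transformed events with subscriber profiles.

    Profile (reference) data arrives as its own stream of events and is kept in
    process-local state, so enriching an event never calls an external system.
    A plain dict stands in for the keyed, RocksDB-backed state of the real job.
    """

    def __init__(self):
        self.profiles = {}  # subscriber id -> latest known profile

    def on_profile_event(self, subscriber_id, profile):
        # Later profile events overwrite earlier ones for the same subscriber.
        self.profiles[subscriber_id] = profile

    def on_transformed_event(self, event):
        # Local lookup; an empty profile means none has been seen yet.
        profile = self.profiles.get(event["subscriber_id"], {})
        return {**event, "profile": profile}


enricher = ProfileEnricher()
enricher.on_profile_event("subscriber-1", {"tariff": "basic"})
print(enricher.on_transformed_event({"subscriber_id": "subscriber-1", "mb_used": 12}))
# {'subscriber_id': 'subscriber-1', 'mb_used': 12, 'profile': {'tariff': 'basic'}}
```

Profile changes arrive through the stream itself, so an event is enriched with the profile as it stood when that event was processed. This is one plausible reading of the slide's point about correct correlation of data in time.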