Kanthaka - High Volume CDR Analyzer



'Kanthaka' is an attempt to bring the benefits of Big Data technologies to the telecom industry. The objective of the system is to analyze CDRs (Call Detail Records) and deliver results in near real time.
This is carried out as a final year project for my B.Sc. Engineering (Hons) degree at the University of Moratuwa, as a team with three colleagues, under the supervision of a senior lecturer and an industry expert.
The presentation covers the background, the findings of the literature review, and the architecture of the system as currently proposed. Any feedback on possible improvements is warmly welcome!

Published in: Technology


  1. Big Data CDR Analyzer
     Project Supervisors:
     Mr. Thilina Anjitha – hSenid
     Dr. Shahani Markus Weerawarana
     Team:
     080201N – M.K.P.R. Jayawardhana
     080254D – P.K.A.M. Kumara
     080331L – W.D.A.I. Paranawithana
     080357V – T.D.K. Perera
  2. Overview
     • Background
     • Current Situation
     • Scope and Assumptions
     • Kanthaka – Big Data CDR Analyzer System
     • Technology Comparison – MapReduce, NoSQL Databases
     • Architecture
     • Project Plan
     • Risks and Possible Remedies
     • References
  3. Background: Mobile Promotions
  4. Current Situation
     • Promotions are based only on subscribers' network usage
     • Only the active call switch is used to trigger promotions
     • No way of analyzing and processing high-volume CDR records
     • No efficient CDR analysis method
     • No access to historical data
     • Complex rules are not supported
  5. Kanthaka to the rescue
     • Selecting eligible users for both commercial-organization-based and network-usage-based promotions, e.g. giving a 20% discount to pizza lovers in the 16–40 age group who have called Pizza Hut more than 5 times a month
     • High-volume CDR analysis
     • Near real-time selection of users eligible for promotions
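The eligibility rule in the example above amounts to a filter over parsed CDRs. A minimal in-memory sketch, assuming hypothetical field names and a toy short number for Pizza Hut (neither comes from the slides):

```python
from collections import Counter

# Toy CDR tuples: (caller_msisdn, callee_msisdn, caller_age).
PIZZA_HUT = "0112729729"  # hypothetical short number, for illustration only

def eligible_subscribers(cdrs, min_calls=5, age_range=(16, 40)):
    """Select callers in the age group who called PIZZA_HUT
    more than `min_calls` times in the period covered by `cdrs`."""
    calls = Counter()
    ages = {}
    for caller, callee, age in cdrs:
        ages[caller] = age
        if callee == PIZZA_HUT:
            calls[caller] += 1
    lo, hi = age_range
    return {c for c, n in calls.items()
            if n > min_calls and lo <= ages[c] <= hi}

cdrs = [("0771", PIZZA_HUT, 25)] * 6 + [("0772", PIZZA_HUT, 55)] * 6
print(eligible_subscribers(cdrs))  # {'0771'}
```

In the real system the per-subscriber call counts would come from the database rather than an in-memory pass, but the shape of the rule is the same.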
  6. A CDR Analyzer system which
     ▫ can process 30 million records per day
     ▫ can produce results within 10–15 seconds
     ▫ provides a GUI to define dynamic rules
     ▫ can be used to offer real-time sales promotions to mobile subscribers
  7. Scope and Assumptions
     Scope – real system operation vs. operation expected of Kanthaka:
     • Real system: 30 M records, multiple rules, offers the promotion
     • Kanthaka: 30 M records, single rule, selects subscribers eligible for the promotion only
  8. Assumptions
     • CDR records arrive only in .CSV format
     • Event types may vary: SMS, voice call, MMS, USSD, top-up, GPRS, LBS
     • CDRs are received by the system asynchronously, in batches
     • Only 6 of the many attributes are considered during processing
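Under these assumptions, ingesting one batch reduces to parsing a CSV and projecting each row onto the 6 chosen attributes. A sketch with `csv` from the standard library; the attribute names are illustrative assumptions, since the slides do not specify the real field set:

```python
import csv
import io

# Hypothetical 6 attributes kept during processing (the actual
# field set is not specified in the slides):
FIELDS = ["caller", "callee", "event_type", "timestamp", "duration", "charge"]

def parse_cdr_batch(csv_text):
    """Parse one asynchronously received .CSV batch into dicts,
    keeping only the 6 chosen attributes and dropping the rest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{f: row[f] for f in FIELDS} for row in reader]

batch = """caller,callee,event_type,timestamp,duration,charge,cell_id
0771234567,0117654321,Voice,2012-06-01T10:00:00,120,4.50,A1
0779876543,1234,SMS,2012-06-01T10:00:05,0,0.50,B2
"""
records = parse_cdr_batch(batch)
print(len(records), records[0]["event_type"])  # 2 Voice
```

Note that the extra `cell_id` column in the input is silently discarded, mirroring the "only 6 attributes" assumption.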
  9. Technology Comparison
  10. Lots of data + high speed --> scale-out system
  11. MapReduce
     Hadoop MapReduce:
     • Can handle large volumes of data
     • Latency is too high for use cases where results are expected in near real time
     Example: counting the words of a 100 KB file
     Start time = 01.04.44, end time = 01.05.12, total time = 28 sec
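The computation Hadoop performs in that benchmark is itself trivial; the 28 seconds are dominated by job setup and the batch-oriented phases. A pure-Python sketch of the three phases (map, shuffle, reduce) that a word-count job runs, just to illustrate the model:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data cdr analyzer", "big data"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'cdr': 1, 'analyzer': 1}
```

In-process this finishes in microseconds; on a Hadoop cluster the same logic pays the fixed cost of job scheduling, task spawning, and disk-based shuffling, which is why it is a poor fit for 10–15 second result deadlines.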
  12. DB Technology Comparison
     • RDBMS
     ▫ Provides ACID properties
     ▫ Uses sharding to scale out
     ▫ Management overhead is huge when scaling up
     ▫ Performance degrades under higher data loads
     ▫ Less partition tolerant
  13. DB Technology Comparison Ctd.
     • NoSQL
     ▫ Many available options (Cassandra, HBase, MongoDB, Hive)
     ▫ Promises easy scale-out (many big users – Facebook, Twitter)
     ▫ Provides BASE properties under the CAP theorem
     ▫ Hard to fit the system into the limited data model
     ▫ Partition tolerant
     ▫ More memory --> higher performance
  14. DB Technology Comparison Ctd.
     • NewSQL
     ▫ Provides ACID properties
     ▫ Familiar relational data model
     ▫ Options available (ScaleDB, VoltDB)
     ▫ Runs entirely in memory, hence needs a lot of memory
     ▫ Promises speed
     ▫ Persistence achieved by replaying logs
  15. With persistence, less restrictive hardware requirements, and proven performance, NoSQL is the best option to try out.
     • Cassandra – a key-value column-family store (used at Facebook, Twitter, eBay)
     • HBase – a key-value column-family store (Facebook)
     • MongoDB – a document store (Adobe)
     • Hive – an HDFS-based database
  16. YCSB Benchmarks
     • With more big users, active mailing lists, and the most promising features (secondary indexes, counters), Cassandra is the best option to try out.
  17. Technology Selection
     Technologies left behind:
     • Complex Event Processing (CEP) engines ▫ no persistence
     • Rules engines ▫ more layers --> more latency
     • Hadoop
     • NoSQL DBs: HBase, MongoDB, Hive
     Technology selected:
     • NoSQL DB – Cassandra
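Cassandra's counter columns, mentioned as one of its selling points, fit the CDR use case directly: one wide row per subscriber, one counter column per rule-relevant event. A toy in-memory stand-in for that data model (plain Python, not the Cassandra API; row-key and column names are illustrative):

```python
from collections import defaultdict

class CounterColumnFamily:
    """Toy in-memory model of a Cassandra counter column family:
    one wide row per row key, holding named counter columns.
    In real Cassandra the increment would be a single
    'UPDATE ... SET col = col + 1' round trip per CDR."""
    def __init__(self):
        # row_key -> {column_name -> counter value}
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment(self, row_key, column, by=1):
        self.rows[row_key][column] += by

    def get(self, row_key, column):
        return self.rows[row_key][column]

cf = CounterColumnFamily()
for _ in range(6):
    # One increment per ingested CDR matching the rule's callee.
    cf.increment("0771234567", "calls_to:0112729729")
print(cf.get("0771234567", "calls_to:0112729729"))  # 6
```

With counts maintained incrementally at ingest time, the periodic eligibility query only has to read one column per subscriber instead of rescanning raw CDRs, which is what makes the 10–15 second target plausible.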
  18. Architecture
  19. Project Plan
     Milestone | Target date | Status
     First chapters of final report | – | Done
     ERU abstracts | – | Accepted
     ERU paper | 31/07/2012 | Due
     Architecture | 06/06/2012 | Done
     Setting up the Cassandra cluster | 06/06/2012 | Done
     GUI for rule definition | 15/06/2012 | Ongoing
     Bulk data load to Cassandra | 15/06/2012 | Ongoing
     System Requirement Specification | 20/06/2012 | Due
     Query data from database periodically | 26/06/2012 | Due
     Initial Design Document | 27/06/2012 | Due
     Algorithm for pre-processing | 10/07/2012 | Due
     Testing | 10/07/2012 | Due
     Final report | 10/08/2012 | Due
  20. Risks and Possible Remedies
     • NoSQL databases: high performance --> more memory
     Remedy: use an external cluster with decent memory
     • In the long run: more data --> performance degrades
     Remedy: archiving
  21. Risks and Possible Remedies Ctd.
     • Handling concurrency issues: locking the database --> low speed
     Remedy: use a shadow copy
     • NoSQL fails to meet the requirements
     Options: NewSQL – VoltDB (runs entirely in memory); CEP (needs measures to preserve persistence)
     • Handling sudden peaks
     Remedy: have an auto-balancing mechanism ready
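The shadow-copy remedy mentioned above can be sketched as a copy-on-write store: readers always see a complete snapshot, the writer prepares the next version aside, and publishing it is a single reference swap rather than a lock held across the whole update. A minimal sketch, assuming an in-memory key-value store (class and method names are hypothetical):

```python
import threading

class ShadowCopyStore:
    """Copy-on-write store: reads are lock-free against the current
    snapshot; writes build a shadow copy and publish it atomically."""
    def __init__(self, data=None):
        self._live = dict(data or {})
        self._write_lock = threading.Lock()  # serialises writers only

    def read(self, key, default=None):
        # Readers never block: they dereference the current snapshot.
        return self._live.get(key, default)

    def update(self, changes):
        with self._write_lock:
            shadow = dict(self._live)   # build the shadow copy aside
            shadow.update(changes)
            self._live = shadow         # publish: one reference swap

store = ShadowCopyStore({"rule": "v1"})
store.update({"rule": "v2"})
print(store.read("rule"))  # v2
```

The trade-off is extra memory for the duplicate copy during an update, in exchange for readers never waiting on a database lock.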
  22. Final Deliverables
     • Big Data CDR Analyzer system
     • Research paper
     • Final report
  23. References
     • http://www.slideshare.net/gvdinesh/cap-and-base-8169489
     • B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, "Benchmarking cloud serving systems with YCSB," 2010, pp. 143–154.
     Visit us at Kanthaka
  24. Thank You!