Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform
Atzmon Hen-Tov & Lior Schachter, Pontis
Businesses everywhere are increasingly challenged by their dependencies on legacy platforms. The dramatic increase in data volume, velocity, and variety is quickly outstripping the capabilities of these legacy systems. By transitioning from a legacy RDBMS to a Hadoop-based platform, Pontis was able to process and analyze billions of mobile subscriber events every day. In this talk, we’ll provide a quick overview of our legacy system, as well as our process for migrating to our target architecture. We’ll continue with a review of our Hadoop platform selection process, which involved a thorough RFP and a detailed analysis of the top Hadoop platform vendors. This session will focus on how we gradually transitioned to our big data platform over the course of several product versions, achieving higher scalability and a lower TCO with each version. We’ll outline the benefits of the target architecture and detail how we successfully integrated Hadoop into our organization. Our session will conclude with a look at technical solutions for dealing with big data deficiencies.



  • 1. Animated version
  • 2. [Architecture diagram: operator systems (OCS, IN, CDR, PCC, CRM) feed an integration layer; data flows into RT complex event processing and the decisioning engine; iCLM UI serves marketing operations with business discovery, monitoring & reporting and visual rules; subscriber data store on HBase; big data analytics on Hadoop M/R over event, aggregation, and profile data in a Hive DWH; the subscriber profile feeds the decisioning engine and outbound channels; marketing/CSR monitoring]
  • 3. We conducted an RFP to select the most Telco-grade platform. The RFP focused on non-functional capabilities such as sustainable performance, high availability, and manageability.
  • 4. The approach
     Each step should increase scalability and reduce TCO.
     Runtime (OLTP) processing:
       We replace the underlying plumbing, with minimal changes to business logic.
       All changes can be turned on/off via GUI configuration:
         Modular hybrid architecture.
         Ability to work in dual mode – good for QA, but also for production (legacy).
     Analytics processing:
       Calculate the profile in M/R (Java).
         Scalable.
         We have the best Java developers.
       Wrap it with a DSL (Domain-Specific Language).
         That’s how we’ve worked for years (ModelTalk paper).
         Non-Java programmers can do the job.
  • 5. Legacy Architecture
  • 6. Phase 1
  • 7. Phase 1 – File queues in NFS
    Resulting context
     Pure plumbing change – no changes to business-logic code.
     Offloading Oracle: 2× performance boost.
     No big data technology yet.
     Windows NFS client performance is a bottleneck.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
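The file-queue step above can be sketched as a minimal producer/consumer over a shared (NFS-mounted) directory. Class names and file-naming conventions here are illustrative assumptions, not Pontis' production code; atomic renames stand in for whatever coordination the real system used.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Optional;
import java.util.stream.Stream;

// Minimal sketch of a file-backed event queue over a shared directory
// (e.g. an NFS mount). Names and layout are illustrative assumptions.
public class FileQueue {
    private final Path dir;

    public FileQueue(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Producer: stage under a temp name, then rename atomically --
    // consumers never see half-written files.
    public void enqueue(String id, String payload) throws IOException {
        Path tmp = dir.resolve(id + ".tmp");
        Files.writeString(tmp, payload);
        Files.move(tmp, dir.resolve(id + ".evt"), StandardCopyOption.ATOMIC_MOVE);
    }

    // Consumer: claim the oldest event by renaming it; the rename acts as
    // a cheap lock between competing consumers on the same mount.
    public Optional<String> dequeue() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            Optional<Path> next = files
                .filter(p -> p.toString().endsWith(".evt"))
                .sorted()
                .findFirst();
            if (next.isEmpty()) return Optional.empty();
            Path claimed = Paths.get(next.get().toString() + ".work");
            Files.move(next.get(), claimed, StandardCopyOption.ATOMIC_MOVE);
            String payload = Files.readString(claimed);
            Files.delete(claimed);
            return Optional.of(payload);
        }
    }

    public static void main(String[] args) throws IOException {
        FileQueue q = new FileQueue(Files.createTempDirectory("fq-demo"));
        q.enqueue("001", "subscriber-event-payload");
        System.out.println(q.dequeue().orElse("<empty>"));
    }
}
```

Because only the rename is relied on for atomicity, the same code works unchanged when the directory later moves from plain NFS to MapR FS exposed over NFS (Phase 2), which is what makes this a "pure plumbing change".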
  • 8. Phase 2
  • 9. Phase 2 – Introducing MapR Hadoop Cluster
    Resulting context
     MapR FS + NFS:
       Horizontally scalable.
       Cheap compared to high-end NFS solutions.
       Fast and highly available (using VIPs).
     Avoids another hop to HDFS (Flume, Kafka).
     Hundreds of millions of small files are stored in HDFS – no need to merge files.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
    Phase 2   unlimited     200M
  • 10. Phase 2 – Introducing MapR Hadoop Cluster
    Resulting context
     Avro files:
       Complex object graph.
       Troubleshooting with Pig.
       Out-of-the-box upgrade (e.g. adding a field).
       Map/Reduce is incremental – the Avro record captures the subscriber state.
       Map/Reduce efficiency – avoids huge joins.
     Subscriber profile calculation:
       Performance: 2–3 hours.
       Linear scalability: no limit on the number of subscribers or amount of raw data (buy more nodes).
       A fast run over historical data allows for an early launch.
     Sqoop – very fast insertions into MS-SQL (tens of millions of records in minutes).
     Data analysts started working in the Hive environment.
     No HA for Oozie yet.
     Hue is premature.
     MS-SQL and ODBC over Hive are slow and limited.
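The "out-of-the-box upgrade" point relies on Avro schema evolution: a field added with a default value can still read records written under the old schema. A minimal illustrative schema (record and field names are assumptions, not the actual Pontis schema) might look like:

```json
{
  "type": "record",
  "name": "SubscriberState",
  "fields": [
    {"name": "subscriberId", "type": "string"},
    {"name": "events", "type": {"type": "array", "items": "string"}},
    {"name": "segment", "type": ["null", "string"], "default": null}
  ]
}
```

If `segment` is the newly added field, older `SubscriberState` records that lack it deserialize under this schema with `segment = null`, so the incremental M/R jobs keep working across versions without a data migration.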
  • 11. Phase 3
  • 12. Phase 3 – Introducing MapR M7 Tables
     Extensive YCSB load tests to find the best table structure and read/update granularity. Main conclusions:
       M7 can handle a very big heap – 90GB.
       Update granularity: small updates (using columns) = fast reads (*), whereas other KV stores need to update the entire BLOB.
     CSR tables migrated from Oracle to M7 tables:
       Tens of billions of records.
       Need sub-second random access per subscriber.
       99.9% writes – by runtime machines (almost every event-processing operation produces an update).
       0.1% reads – by the customer’s CSR representatives.
       Rows – per-subscriber key, tens of millions.
       2 CFs – TTL 365 days, 1 version.
       Qualifier:
         key: [date_class_event_id], value: record.
         Up to thousands per row.
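The per-row qualifier layout described above ([date_class_event_id]) can be sketched as a small key-composition helper. The field order, separators, and padding widths below are illustrative assumptions, not the production encoding:

```java
// Illustrative helper for composing the per-event column qualifier
// sketched in the slide ([date_class_event_id]). Padding widths and
// separators are assumptions, not the production schema.
public class QualifierKey {

    // Date first so qualifiers sort chronologically within a row;
    // zero-padding the id keeps lexicographic order = numeric order.
    public static String compose(String yyyymmdd, String eventClass, long eventId) {
        return String.format("%s_%s_%012d", yyyymmdd, eventClass, eventId);
    }

    public static void main(String[] args) {
        System.out.println(compose("20140321", "recharge", 42L));
    }
}
```

Putting the date at the front means a CSR lookup for one subscriber can range-scan the most recent qualifiers in that row instead of reading all "up to thousands" of them.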
  • 13. Phase 3 – Introducing MapR M7 Tables
    Resulting context
     Choosing the right features – nothing too demanding performance-wise.
     Tables are easy to create and manage – still some tweaking required.
     No cross-table ACID – we need to develop a solution to keep consistency across M7 tables, Oracle, and the file system.
     Hard for QA compared to an RDBMS – no easy way to query; we need to develop tools.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
    Phase 2   unlimited     200M
    Phase 3   unlimited     300M
  • 14. Phase 4
  • 15. Phase 4 – Migrating OLTP features to M7 tables
     Subscriber State table migrated from Oracle to M7 tables:
       25% writes – by runtime machines updating the state.
       100% reads – by runtime.
       Rows – per-subscriber key, tens of millions.
       1 CF – TTL -1 (no expiry), 1 version.
       YCSB used to validate the solution.
       Sizing model.
       Qualifier:
         key: state_name, value: state value.
         Dozens per row, but only ~10% are updated per event.
     Subscriber Profile table migrated from MS-SQL to M7 tables:
       Bulk insert once a day.
     Outbound Queue table migrated from MS-SQL to M7 tables.
  • 16. Phase 4 – Migrating OLTP features to M7 tables
    Resulting context
     No longer dependent on Oracle for OLTP.
     Real-time processing can handle billions of events per day.
     Sizing is linear and easy to calculate:
       Number of subscribers × state size × 80% should reside in cache.
       HW spec: 128GB RAM, 12 SAS drives.
     Consistency management is very complicated.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
    Phase 2   unlimited     200M
    Phase 3   unlimited     300M
    Phase 4   unlimited     unlimited
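The sizing rule above (subscribers × state size × 80% in cache) is simple enough to check with back-of-envelope arithmetic. The subscriber count and per-subscriber state size below are assumed figures for illustration, not measured Pontis numbers:

```java
// Back-of-envelope sizing per the rule above: 80% of the aggregate
// subscriber state should reside in cache. Input figures are
// illustrative assumptions.
public class Sizing {
    static final double CACHE_FRACTION = 0.8; // "80% should reside in cache"

    // Required cache in GiB for a given population and per-subscriber state size.
    public static double cacheGiB(long subscribers, long stateBytes) {
        return subscribers * stateBytes * CACHE_FRACTION / (1024.0 * 1024 * 1024);
    }

    public static void main(String[] args) {
        // e.g. 50M subscribers with 2 KiB of state each
        System.out.printf("%.1f GiB%n", cacheGiB(50_000_000L, 2048));
    }
}
```

Under those assumed inputs the rule asks for roughly 76 GiB of cache, which illustrates why the linearity matters: doubling the subscriber base simply doubles the cache requirement, and capacity planning reduces to counting 128GB-RAM nodes.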
  • 17. Phase 5
  • 18. Phase 5 – Decommission legacy RDBMS
    Resulting context
     MySQL is not a new technology in our stack (it is part of the MapR distribution).
     Removing Oracle/MS-SQL from our architecture has a significant impact on system cost, deployment, monitoring, etc.
  • 19. Atzmon Hen-Tov Lior Schachter