Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform
Atzmon Hen-Tov & Lior Schachter, Pontis
Businesses everywhere are increasingly challenged by their dependencies on legacy platforms. The dramatic increase in data volume, speed, and types of data is quickly outstripping the capabilities of these legacy systems. By transitioning from a legacy RDBMS to a Hadoop-based platform, Pontis was able to process and analyze billions of mobile subscriber events every day. In this talk, we’ll provide a quick overview of our legacy system, as well as our process for migrating to our target architecture. We’ll continue with a review of our Hadoop platform selection process, which involved a thorough RFP and a detailed analysis of the top Hadoop platform vendors. This session will focus on how we gradually transitioned to our big data platform over the course of several product versions, resulting in higher scalability and a lower TCO in each version. We’ll outline the benefits of the target architecture, and detail how we successfully integrated Hadoop into our organization. Our session will conclude with a look at technical solutions for dealing with big data deficiencies.

Transcript of "Data on the Move: Transitioning from a Legacy Architecture to a Big Data Platform"

1. Animated version
2. [Architecture overview diagram] Operator systems (OCS, IN, CDR, PCC, CRM) feed the integration layer and data flow; events pass through real-time complex event processing and the decisioning engine, with the iCLM UI serving marketing operations. Business discovery, monitoring & reporting run on visual rules; the subscriber data store sits on HBase; big data analytics runs on Hadoop M/R over event, aggregation and profile data, with Hive/DWH feeding the subscriber profile and decisioning engine across marketing and CSR channels.
3. We conducted an RFP for selecting the most telco-grade platform. The RFP focused on non-functional capabilities such as sustainable performance, high availability and manageability.
4. The approach
   - Each step should increase scalability and reduce TCO.
   - Runtime (OLTP) processing:
     - We replace the underlying plumbing with minimal changes to business logic.
     - All changes can be turned on/off via GUI configuration (a toggle sketch follows this slide):
       - Modular hybrid architecture.
       - Ability to work in dual mode – good for QA, but also for production (legacy).
   - Analytics processing:
     - Calculate the profile in M/R (Java):
       - Scalable.
       - We have the best Java developers.
     - Wrap it with a DSL (domain-specific language):
       - That is how we have worked for years (ModelTalk paper).
       - Non-Java programmers can do the job.
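A minimal sketch of the dual-mode idea described above, under assumptions of mine: the class and configuration keys are illustrative, not from the Pontis code base. The point is that a single GUI-managed flag decides whether an event goes to the legacy Oracle path or the new big-data path, and a dual mode can feed both for QA comparison runs.

    // Illustrative only: routes each event to the legacy or big-data pipeline
    // based on runtime configuration flags (names are hypothetical).
    public class EventRouter {

        public interface EventSink {
            void write(String subscriberId, byte[] event) throws Exception;
        }

        private final EventSink legacyOracleSink;   // legacy RDBMS path
        private final EventSink fileQueueSink;      // NFS file-queue / M7 path
        private final java.util.Properties config;  // loaded from the GUI-managed configuration

        public EventRouter(EventSink legacy, EventSink bigData, java.util.Properties config) {
            this.legacyOracleSink = legacy;
            this.fileQueueSink = bigData;
            this.config = config;
        }

        public void onEvent(String subscriberId, byte[] event) throws Exception {
            boolean useBigDataPath = Boolean.parseBoolean(
                    config.getProperty("pipeline.bigdata.enabled", "false"));
            // Dual mode: both sinks are active at once, useful for QA and for
            // running the legacy path in production while the new one is validated.
            boolean dualMode = Boolean.parseBoolean(
                    config.getProperty("pipeline.dual.mode", "false"));

            if (useBigDataPath || dualMode) {
                fileQueueSink.write(subscriberId, event);
            }
            if (!useBigDataPath || dualMode) {
                legacyOracleSink.write(subscriberId, event);
            }
        }
    }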
5. Legacy Architecture
6. Phase 1
7. Phase 1 – File queues in NFS
   Resulting context:
   - Pure plumbing change – no changes to business logic code (a file-queue sketch follows this slide).
   - Offloading Oracle: 2x performance boost.
   - No big data technology yet.
   - Windows NFS client performance is a bottleneck.

   Phase     # Customers   # Events
   Legacy    10M           120M
   Phase 1   10M           200M
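A rough sketch of the file-queue pattern this phase implies; the directory layout and file naming are assumptions for illustration. The producer writes each batch to a temporary file on the NFS mount and publishes it with an atomic rename, so consumers never see a half-written file.

    import java.nio.file.*;
    import java.util.UUID;

    // Illustrative file queue over an NFS mount (paths and names are hypothetical).
    public class NfsFileQueue {
        private final Path queueDir;

        public NfsFileQueue(Path queueDir) {
            this.queueDir = queueDir;
        }

        public void enqueue(byte[] batch) throws Exception {
            // Write to a hidden temp file first...
            Path tmp = queueDir.resolve("." + UUID.randomUUID() + ".tmp");
            Files.write(tmp, batch);
            // ...then atomically rename within the same directory, so consumers
            // only ever list complete files.
            Path ready = queueDir.resolve(System.currentTimeMillis() + "-" + UUID.randomUUID() + ".events");
            Files.move(tmp, ready, StandardCopyOption.ATOMIC_MOVE);
        }
    }

Consumers would simply list the *.events files in timestamp order, process each one, and delete or archive it when done.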
8. Phase 2
9. Phase 2 – Introducing a MapR Hadoop cluster
   Resulting context:
   - MapR FS + NFS:
     - Horizontally scalable.
     - Cheap compared to high-end NFS solutions.
     - Fast and highly available (using VIPs).
     - Avoids another hop into HDFS (Flume, Kafka) – events are written straight through the NFS mount (see the sketch after this slide).
     - Many small files are stored in HDFS (hundreds of millions) – no need to merge files.

   Phase     # Customers   # Events
   Legacy    10M           120M
   Phase 1   10M           200M
   Phase 2   unlimited     200M
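Because MapR FS is exposed over NFS, the Phase 1 producer barely changes; it just points at a path under the cluster mount. This fragment reuses the hypothetical NfsFileQueue class from the Phase 1 sketch; the mount point follows the usual /mapr/<cluster-name> convention and the rest of the path is assumed.

    // Same NfsFileQueue as before, now writing through the MapR NFS gateway.
    byte[] serializedEventBatch = "msisdn=972555000;event=RECHARGE".getBytes();
    NfsFileQueue queue = new NfsFileQueue(
            java.nio.file.Paths.get("/mapr/my.cluster.com/apps/events/incoming"));
    queue.enqueue(serializedEventBatch);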
10. Phase 2 – Introducing a MapR Hadoop cluster
    Resulting context:
    - Avro files (an example record follows this slide):
      - Complex object graph.
      - Troubleshooting with Pig.
      - Out-of-the-box upgrade path (e.g. adding a field).
      - Map/Reduce is incremental – the Avro record captures the subscriber state.
      - Map/Reduce efficiency – avoiding huge joins.
    - Subscriber profile calculation:
      - Performance: 2-3 hours.
      - Linear scalability: no limit on the number of subscribers or on raw data (buy more nodes).
      - A fast run over historical data allows for an early launch.
    - Sqoop – very fast insertions into MS-SQL (tens of millions of records in minutes).
    - Data analysts started working in the Hive environment.
    - No HA for Oozie yet.
    - Hue is premature.
    - MS-SQL and ODBC over Hive are slow and limited.
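For illustration, a minimal Avro record of the kind this slide describes. The schema and field names are invented for the example; the real subscriber-state schema is richer. Declaring a default on a new field is what makes the "out-of-the-box upgrade" work: files written before the field existed still deserialize against the newer schema.

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import java.io.File;

    public class SubscriberStateAvroExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical schema; the default on "segment" lets older files
            // that lack the field still be read with this schema.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"SubscriberState\",\"fields\":["
                + "{\"name\":\"subscriberId\",\"type\":\"string\"},"
                + "{\"name\":\"eventCount\",\"type\":\"long\"},"
                + "{\"name\":\"segment\",\"type\":\"string\",\"default\":\"none\"}]}");

            GenericRecord state = new GenericData.Record(schema);
            state.put("subscriberId", "sub-0012345");
            state.put("eventCount", 42L);
            state.put("segment", "roamers");

            // Written through the MapR NFS mount (the path is illustrative).
            try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
                writer.create(schema, new File("/mapr/my.cluster.com/apps/profiles/state-00001.avro"));
                writer.append(state);
            }
        }
    }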
11. Phase 3
12. Phase 3 – Introducing MapR M7 tables
    - Extensive YCSB load tests to find the best table structure and read/update granularity. Main conclusions:
      - M7 knows how to handle a very big heap – 90 GB.
      - Update granularity: small updates (using columns) = fast reads, whereas in other KV stores the entire BLOB must be rewritten.
    - CSR tables migrated from Oracle to an M7 table (an access sketch follows this slide):
      - Tens of billions of records.
      - Need sub-second random access per subscriber.
      - 99.9% writes – by runtime machines (almost every event-processing operation produces an update).
      - 0.1% reads – by the customer's CSR representatives.
      - Rows – keyed per subscriber, tens of millions.
      - 2 column families – TTL 365 days, 1 version.
      - Qualifier – key: [date_class_event_id], value: record; up to thousands per row.
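M7 tables are accessed through the standard HBase client API, so a write/read pair for the CSR table can be sketched as below (classic pre-1.0 HBase API, as used in this period). The table path, column-family name and qualifier/value contents are assumptions for the example, not the production schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CsrTableExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // M7 tables are addressed by filesystem path (this path is hypothetical).
            HTable csrTable = new HTable(conf, "/apps/pontis/csr_events");

            byte[] row = Bytes.toBytes("sub-0012345");

            // Write path (99.9% of traffic): one small column per processed event,
            // qualified by date_class_eventId, so a row accumulates the subscriber's history.
            Put put = new Put(row);
            put.add(Bytes.toBytes("ev"),                          // column family with TTL 365 days
                    Bytes.toBytes("20140211_RECHARGE_8713355"),   // date_class_event_id
                    Bytes.toBytes("amount=10;channel=USSD"));     // event record
            csrTable.put(put);

            // Read path (0.1% of traffic): sub-second fetch of a subscriber's
            // history for a CSR screen.
            Result history = csrTable.get(new Get(row));
            System.out.println("columns for subscriber: " + history.size());

            csrTable.close();
        }
    }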
13. Phase 3 – Introducing MapR M7 tables
    Resulting context:
    - Choosing the right features – nothing too demanding performance-wise.
    - Easy to create and manage tables – still some tweaking needed.
    - No cross-table ACID – we need to develop a solution for keeping consistency across the M7 table, Oracle and the file system.
    - Hard for QA compared to an RDBMS – no easy way to query; tools need to be developed.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
    Phase 2   unlimited     200M
    Phase 3   unlimited     300M
14. Phase 4
15. Phase 4 – Migrating OLTP features to M7 tables
    - Subscriber State table migrated from Oracle to an M7 table (an update sketch follows this slide):
      - 25% writes – by runtime machines updating the state.
      - 100% reads – by runtime.
      - Rows – keyed per subscriber, tens of millions.
      - 1 column family – no TTL (TTL -1), 1 version.
      - YCSB used to validate the solution.
      - Sizing model.
      - Qualifier – key: state_name, value: state value; dozens per row, but only ~10% are updated per event.
    - Subscriber Profile table migrated from MS-SQL to an M7 table:
      - Bulk insert once a day.
    - Outbound Queue table migrated from MS-SQL to an M7 table.
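Because each state variable lives in its own column, an event handler can write back only the handful of states it actually changed, which is the point of the "~10% updated per event" observation. This fragment reuses the imports from the CSR sketch above; table path, family and state names are placeholders.

    // Illustrative partial update of the Subscriber State table.
    HTable subscriberStateTable = new HTable(HBaseConfiguration.create(), "/apps/pontis/subscriber_state");

    Put stateUpdate = new Put(Bytes.toBytes("sub-0012345"));
    // Only the states touched by this event are written; the row's other
    // columns are left untouched on disk.
    stateUpdate.add(Bytes.toBytes("st"), Bytes.toBytes("lastRechargeDate"), Bytes.toBytes("2014-02-11"));
    stateUpdate.add(Bytes.toBytes("st"), Bytes.toBytes("balanceBucket"), Bytes.toBytes("LOW"));
    subscriberStateTable.put(stateUpdate);

    subscriberStateTable.close();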
16. Phase 4 – Migrating OLTP features to M7 tables
    Resulting context:
    - No longer dependent on Oracle for OLTP.
    - Real-time processing can handle billions of events per day.
    - Sizing is linear and easy to calculate (a worked example follows this slide):
      - Number of subscribers * state size * 80% should reside in cache.
      - HW spec: 128 GB RAM, 12 SAS drives.
    - Consistency management is very complicated.

    Phase     # Customers   # Events
    Legacy    10M           120M
    Phase 1   10M           200M
    Phase 2   unlimited     200M
    Phase 3   unlimited     300M
    Phase 4   unlimited     unlimited
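A worked instance of the sizing rule above, with assumed numbers (the subscriber count and average state size here are illustrative, not from the deck):

    50,000,000 subscribers x 4 KB state x 0.8  ≈ 160 GB of hot data
    with 128 GB RAM per node, roughly two nodes' worth of cache covers the hot set,
    leaving headroom on each node for the OS, MapR services and other tables.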
17. Phase 5
18. Phase 5 – Decommission the legacy RDBMS
    Resulting context:
    - MySQL is not a new technology in our stack (it is part of the MapR distribution).
    - Removing Oracle/MS-SQL from our architecture has a significant impact on system cost, deployment, monitoring, etc.
19. Atzmon Hen-Tov & Lior Schachter
