1. Big Data in Telecommunications
A Practical Roadmap for the Colombian CSP
David Callaghan
Senior Platform Architect/Data Scientist
david.callaghan@2cdata.com
Table of Contents
Executive Summary
Opportunities
Customer Experience
Network Management
Challenges
Staffing and Skills
Business Support
Current Environment
Big Data SDLC
Roadmap
Definitions
Big Data
NoSQL
Mobile
Social
Appendix I: Industry Survey
Copyright Dos Chihuahuas, LLC 2013 DRAFT ONLY not for distribution 1of 25
The following data mining applications are typical of CSPs and can be accomplished with the data described
above.
• Fraud Detection
• Subscription Fraud : Customer opens account with no intention of paying
• Superimposition Fraud : Legitimate account with legitimate activity with some illegitimate
activity superimposed
• Customer Profiling
Managing customer churn is a very profitable area for the application of predictive analysis.
A significant cost is incurred when a customer leaves. For example, when competing companies offer
incentives, such as a $50 bonus, people switch carriers repeatedly to earn those incentives. Using call
detail, billing, subscription, and customer information, it is possible to induce a model that
informs the next best action.
In 1991, using graph analysis, MCI calculated that it would be cheaper to add entire calling circles to
a plan rather than adding individuals. This resulted in the MCI Friends and Family plan, which was
one of the most successful marketing plans in telecommunication history. It is interesting to note
that MCI ultimately decided to have customers define their circle rather than using the call detail
data because of privacy concerns.
• Network Fault Isolation
Most of the network elements are capable of at least limited self-diagnosis, and these elements may
collectively generate millions of status and alarm messages each month. Because of the volume of
the data, and because a single fault may cause many different, seemingly unrelated, alarms to be
generated, the task of network fault isolation is quite difficult. Data mining has a role to play in
generating rules for identifying faults.
Telecommunication Alarm Sequence Analyzer (TASA) automatically discovers recurrent
patterns of alarms within the network data along with their statistical properties, using a
specialized data mining algorithm.
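The idea of discovering recurrent alarm patterns can be illustrated with a minimal sketch. This is not the TASA algorithm itself; it is a simplified stand-in that counts which alarm types appear together in the same fixed time window. The function name, the alarm type names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_alarm_pairs(alarms, window_seconds, min_support):
    """Count alarm-type pairs that co-occur within fixed time windows.

    alarms: iterable of (timestamp_seconds, alarm_type) tuples.
    Returns {(type_a, type_b): window_count} for pairs seen together
    in at least min_support windows.
    """
    # Group alarm types into non-overlapping windows by timestamp.
    windows = defaultdict(set)
    for timestamp, alarm_type in alarms:
        windows[timestamp // window_seconds].add(alarm_type)

    # Count each unordered pair once per window in which both types occur.
    counts = Counter()
    for types in windows.values():
        for pair in combinations(sorted(types), 2):
            counts[pair] += 1

    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical alarm log: (seconds since start, alarm type).
sample = [
    (0, "LINK_DOWN"), (5, "HIGH_BER"), (12, "LINK_DOWN"),
    (61, "LINK_DOWN"), (70, "HIGH_BER"),
    (130, "FAN_FAIL"),
    (185, "HIGH_BER"), (190, "LINK_DOWN"),
]
pairs = frequent_alarm_pairs(sample, window_seconds=60, min_support=2)
print(pairs)
```

Pairs that recur across many windows are candidates for a fault-identification rule; a production system would also need to account for alarm ordering and for windows that straddle a boundary.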
Each of these activities can be accomplished using structured and semi-structured internal data and fits into
the two categories that we will identify as key business drivers for CSPs in Colombia today :
• Customer Experience
• Network Management
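The calling-circle analysis mentioned under Customer Profiling can be sketched from call detail records alone. The example below is an illustration under stated assumptions, not MCI's actual method: it links two subscribers when they share at least `min_calls` calls and treats each connected group as a circle. The function name, threshold, and subscriber names are hypothetical.

```python
from collections import Counter, defaultdict


def calling_circles(call_records, min_calls=2):
    """Group subscribers into calling circles (connected components).

    call_records: iterable of (caller, callee) pairs.
    Two subscribers are linked when they share at least min_calls
    calls, counted in either direction.
    """
    # Count calls per unordered subscriber pair, ignoring self-calls.
    pair_counts = Counter(
        frozenset((a, b)) for a, b in call_records if a != b
    )

    # Build an undirected graph from pairs that meet the threshold.
    graph = defaultdict(set)
    for pair, n in pair_counts.items():
        if n >= min_calls:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)

    # Depth-first search to collect each connected component.
    seen, circles = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, circle = [node], set()
        while stack:
            current = stack.pop()
            if current in circle:
                continue
            circle.add(current)
            stack.extend(graph[current] - circle)
        seen |= circle
        circles.append(circle)
    return circles


# Hypothetical call detail records: (caller, callee).
calls = [
    ("ana", "luis"), ("luis", "ana"),
    ("luis", "sofia"), ("sofia", "luis"),
    ("ana", "carlos"),
    ("marta", "pedro"), ("pedro", "marta"),
]
circles = calling_circles(calls, min_calls=2)
print(sorted(sorted(c) for c in circles))
```

A single call, such as the one between ana and carlos here, falls below the threshold and does not join a circle. Any real use of call detail data this way raises the same privacy concerns that led MCI to let customers define their own circles.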
• Keeping the data in Big Data initiatives secure from external parties
• Getting functional managers to make decisions based on Big Data, rather than on intuition
• Reskilling the IT function to be able to use the new tools and technologies of Big Data
The best way to address these challenges is to build them into the DNA of a new organizational unit
explicitly charged with evangelizing and delivering Big Data solutions. This is discussed next in the
Roadmap.
Big Data SDLC
Big Data is a platform rather than a prepackaged solution. Specifically, Big Data is a platform upon which
you can build an entire ecosystem of products for the enterprise. However, it is not enterprise development.
Algorithms that are effective on gigabytes of data are untenable at terabyte scale. The same is true of error
rates of 1%. Waterfall approaches were acceptable, although far from optimal, at the enterprise level because
the data was small enough to allow interdepartmental politics to trump effective algorithmic design. Projects
are different at scale, and it is a mistake to take enterprise-level thinking to big-data scale.
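A short back-of-the-envelope calculation shows why. The sketch below assumes, purely for illustration, records of about 1 KB each, so that 1 GB holds roughly one million records and 1 TB roughly one billion. It compares an all-pairs algorithm with a sort-based one and counts the records affected by a 1% error rate. The function names are hypothetical.

```python
import math


def pairwise_comparisons(n):
    """Comparisons needed by an all-pairs (quadratic) algorithm."""
    return n * (n - 1) // 2


def sort_comparisons(n):
    """Approximate comparisons for a sort-based (n log n) algorithm."""
    return round(n * math.log2(n))


def expected_bad_records(n, error_rate):
    """Records expected to be wrong at a given error rate."""
    return round(n * error_rate)


# Assumption: ~1 KB per record, so 1 GB ~ 10**6 and 1 TB ~ 10**9 records.
for label, n in [("1 GB", 10**6), ("1 TB", 10**9)]:
    print(
        f"{label}: {n:,} records, "
        f"all-pairs {pairwise_comparisons(n):.2e}, "
        f"sort-based {sort_comparisons(n):.2e}, "
        f"bad records at 1% {expected_bad_records(n, 0.01):,}"
    )
```

Under these assumptions, a thousandfold increase in data multiplies the all-pairs work by about a million, while a 1% error rate goes from ten thousand bad records to ten million.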
Software Development Lifecycle at Scale
• Start Simply
• Prototype Perpetually
• Optimize Obsessively
• Be Opportunistic For Wins
The 2Cdata Knowledge Cycle: a Scalable, Repeatable, Flexible Approach
1. Define an Objective
Define a clear and measurable outcome. Using a framework when defining objectives can help.
1. Answer existing questions in existing businesses, with a focus on improved efficiency
2. Answer new questions in existing businesses, with a focus on opportunities for growth
3. Answer new questions in new businesses, with the goal of reshaping the competitive landscape
This is an ordered list for a reason; these could also be interpreted as phases or degrees of difficulty. You
may not want to reshape the competitive landscape before you parse a clickstream log, for example.
2. Identify Controls
• Automated decisions for real-time processes 37%
• Definitions of churn and other customer behaviors 35%
• Detection of fraud 33%
• Greater leverage and ROI for big data 30%
• Quantification of risks 30%
• Trending for market sentiments 30%
• Understanding of business change 29%
• Better planning and forecasting 29%
• Identification of root causes of cost 29%
• Understanding consumer behavior from clickstreams 27%
• Manufacturing yield improvements 6%
• Other 4%
The benefits were then broken down as follows:
Customer Experience
• Better targeted social media influencer marketing 61%
• Recognition of sales and marketing opportunities 38%
• Definitions of churn and other customer behaviors 35%
• Understanding consumer behavior from clickstreams 27%
BI in general can benefit
• More numerous and accurate business insights 45%
• Understanding of business change 29%
• Better planning and forecasting 29%
• Identification of root causes of cost 29%
Specific applications
• Automated decisions for real-time processes 37%
• Detection of fraud 33%
• Quantification of risks 30%
• Trending for market sentiments 30%
In your organization, what are the top potential barriers to implementing big data analytics?
• Inadequate staffing or skills for big data analytics 46%
• Cost, overall 42%
• Lack of business sponsorship 38%
• Difficulty of architecting big data systems 33%
• Current database software lacks in-database analytics 32%
• Lack of compelling business case 28%
• Scalability problems with big data 23%
• Cannot make big data usable for end users 22%
• Database software cannot process analytic queries fast enough 22%
• Current data warehouse modeled for reports and OLAP only 22%
• Current database software cannot load data fast enough 21%
• Can't find Hadoop experts to hire 11%
• Can't fund Hadoop's high operational expenses 7%
• Other 6%
The challenges were then broken down as follows:
Inadequate staffing and skills are the leading barrier
• Inadequate staffing or skills for big data analytics 46%
• Difficulty of architecting big data systems 33%
• Cannot make big data usable for end users 22%
• Can't find Hadoop experts to hire 11%
Lack of business support
• Cost, overall 42%
• Lack of business sponsorship 38%