Wrangling Customer Usage
Data with Hadoop
Clearwire – Thursday, June 27th
Carmen Hall – IT Director
Mathew Johnson – Sr. IT Manager
Starting With…
• …a little ingenuITy!
ingenuITy Day @ Clearwire
• Opportunity for everyone in IT to innovate and present
new and even crazy ideas
• One of those crazy ideas was from Roger Hosto
• Roger had the solution for Clearwire’s Big Data
problem: Hadoop
But Wait!
• Now we had a solution for Big Data
• We needed a Big Data opportunity
• We had just the thing…
The Perfect Problem
• Customer Usage Data – the commodity we deliver to Wholesale
partners
Totally (un)Wired
• Americans used more than 1,304 petabytes of
wireless data in 2012 - an increase of 69.3% over the
previous 12 months' usage (827 PB)
• Clearwire processes over 3B individual usage detail
records each month
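To put 3 billion records a month in operational terms, the short sketch below converts the monthly figure into average hourly and per-second rates. It assumes a 30-day month and evenly spread arrivals; real usage traffic is bursty, so peak rates will be higher, and "over 3B" makes these numbers a lower bound.

```python
# Average ingest rate implied by "over 3B" usage detail records per month.
# Assumptions: a 30-day month and perfectly uniform arrivals (real traffic is bursty).
RECORDS_PER_MONTH = 3_000_000_000  # lower bound from the slide
DAYS_PER_MONTH = 30                # assumption, not from the deck

hours_per_month = DAYS_PER_MONTH * 24
per_hour = RECORDS_PER_MONTH / hours_per_month
per_second = per_hour / 3600

print(f"~{per_hour:,.0f} records/hour, ~{per_second:,.0f} records/second")
# prints "~4,166,667 records/hour, ~1,157 records/second"
```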
Shifting Landscape
• The U.S. wireless industry is a $195.5 billion
enterprise - larger than publishing, agriculture, hotels
and lodging, air transportation and movies – just to
name a few
• Prepaid/Pay-As-You-Go services account for 23.4% of overall
market penetration, which raises the exposure to lost revenue
when usage delivery is delayed
• In some cases, a customer can consume data faster
than we can bill for it
Anatomy Of Latency - Legacy
[Diagram: components ASN GW, PTS, SPB, AAA, OSS SDU, IT Usage Processing, Wholesale Partners, Internet; stage latencies labeled 1 Hour and Up to 90 Minutes]
Let’s Talk Numbers
• Assume a 2GB plan
• An HD movie from Netflix consumes 2+ GB per hour
• Assume wholesale price = $6/GB
• Assume the retail price for a GB of data (as a top-up or
overage charge) ranges from $20 – $100
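These assumptions can be turned into a rough exposure estimate. The sketch below is illustrative arithmetic, not a figure from the deck: it assumes billing learns that a customer crossed the plan cap only after the full delivery delay, during which the customer keeps streaming at the HD rate. Under that assumption the unbilled overage is simply rate × delay, independent of the 2GB plan size (which only decides when the cap is crossed). The 2.5-hour delay is the legacy end-to-end delivery time reported on the Real Results slide; the function name and parameters are hypothetical.

```python
def overage_exposure(rate_gb_per_hour, delay_hours, wholesale_per_gb, retail_range_per_gb):
    """Estimate usage consumed past the plan cap before billing can react.

    Assumption (hypothetical model): billing learns the cap was crossed
    `delay_hours` after it happens, and the customer keeps consuming at
    `rate_gb_per_hour` in the meantime.
    """
    overage_gb = rate_gb_per_hour * delay_hours
    wholesale_cost = overage_gb * wholesale_per_gb        # cost at the wholesale price
    low, high = retail_range_per_gb
    retail_value = (overage_gb * low, overage_gb * high)  # value at top-up/overage prices
    return overage_gb, wholesale_cost, retail_value


gb, cost, (retail_low, retail_high) = overage_exposure(
    rate_gb_per_hour=2.0,               # HD Netflix, "2+ GB per hour" (lower bound)
    delay_hours=2.5,                    # legacy end-to-end delivery time
    wholesale_per_gb=6.0,               # assumed wholesale price
    retail_range_per_gb=(20.0, 100.0),  # assumed retail price range
)
print(f"{gb:.1f} GB unbilled: ${cost:.0f} at wholesale vs ${retail_low:.0f}-${retail_high:.0f} at retail")
# prints "5.0 GB unbilled: $30 at wholesale vs $100-$500 at retail"
```

Because exposure scales linearly with delay in this model, rerunning with `delay_hours=1.3` (the post-Atlas delivery time) cuts the unbilled overage to 2.6 GB.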
As if that wasn’t enough -
• Clearwire was locked into a very expensive vendor
contract which handled both network provisioning and
usage delivery needs
• Legacy solution was not adaptable or flexible
• We needed something innovative, reliable, internally
supportable, scalable – and we needed it fast
Putting ingenuITy to Work!
• Roger’s idea was suddenly a project
• We needed to build a platform to ingest, process, and
provide cleaned usage data for downstream
applications – and quickly
• We needed:
• A Hadoop Cluster
• 24x7 Operations
• Code to ingest data and handle a myriad of business
rules
• Integration with legacy and new systems
Atlas was Born
• Development work began immediately on Clearwire’s
private cloud infrastructure
• Selected the Apache Bigtop packaging of Apache Hadoop v1.0.1
• Wrote custom code, leveraging Hive and other common tools,
to ingest and process data
• Built the infrastructure
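The deck does not show Atlas's ingest code (only that it used Hive and other common tools). As an illustration of the kind of business-rule cleaning described here, the small standard-library sketch below validates, de-duplicates, and totals usage records. The record layout (subscriber_id, session_id, start_time, bytes_up, bytes_down), the rules, the sample rows, and the function name are all hypothetical; in Hive, the per-subscriber total would typically be a `SELECT ... GROUP BY` over a cleaned staging table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical usage detail record layout; the deck does not describe Clearwire's real format.
FIELDS = ["subscriber_id", "session_id", "start_time", "bytes_up", "bytes_down"]


def clean_and_aggregate(lines):
    """Apply example (hypothetical) business rules and total bytes per subscriber.

    Rules: reject rows with the wrong field count, a blank subscriber_id, or
    byte counts that are not non-negative integers; silently drop duplicates
    sharing the same (session_id, start_time).
    Returns (totals_by_subscriber, rejected_row_count).
    """
    totals = defaultdict(int)
    seen = set()
    rejected = 0
    for row in csv.reader(lines):
        if len(row) != len(FIELDS):
            rejected += 1
            continue
        rec = dict(zip(FIELDS, (v.strip() for v in row)))
        if not rec["subscriber_id"]:
            rejected += 1
            continue
        try:
            up, down = int(rec["bytes_up"]), int(rec["bytes_down"])
        except ValueError:
            rejected += 1
            continue
        if up < 0 or down < 0:
            rejected += 1
            continue
        key = (rec["session_id"], rec["start_time"])
        if key in seen:  # same record delivered twice: keep the first copy only
            continue
        seen.add(key)
        totals[rec["subscriber_id"]] += up + down
    return dict(totals), rejected


# Made-up sample input exercising each rule.
raw = io.StringIO(
    "sub1,s1,2013-06-27T10:00,100,900\n"
    "sub1,s1,2013-06-27T10:00,100,900\n"  # duplicate
    "sub2,s2,2013-06-27T10:05,50,-1\n"    # negative byte count
    ",s3,2013-06-27T10:07,10,10\n"        # missing subscriber
    "sub2,s4,2013-06-27T10:09,20,30\n"
)
totals, rejected = clean_and_aggregate(raw)
print(totals, rejected)  # {'sub1': 1000, 'sub2': 50} 2
```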
Hybrid Approach to Hadoop
• Virtual Edge Nodes
• Leveraged our existing private cloud
• Physical Data Nodes
• Per Unit Cost (Storage & CPU) was lower than
existing infrastructure
• Smaller and more efficient than you think
• 24 data nodes, each with 3TB of usable storage
• Gives us 72TB of usable space
• 3x block replication for production data
• Deployed identical DR/Analytics platform
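Because HDFS writes every block three times under 3x replication (the `dfs.replication` setting, whose default is 3), 72TB of usable space holds roughly a third as much unique data. The sketch below does that arithmetic, assuming the 72TB figure is measured before replication and ignoring space reserved for non-HDFS use (`dfs.datanode.du.reserved`) and local scratch space for MapReduce jobs, both of which would reduce it further. The function name is hypothetical.

```python
def hdfs_capacity_tb(nodes: int, usable_tb_per_node: float, replication: int) -> tuple[float, float]:
    """Return (total usable TB, TB of unique data storable at the given replication factor)."""
    total = nodes * usable_tb_per_node
    # Every block is stored `replication` times, so unique data is total / replication.
    return total, total / replication


total, unique = hdfs_capacity_tb(nodes=24, usable_tb_per_node=3, replication=3)
print(f"{total:.0f} TB usable, ~{unique:.0f} TB of unique production data at 3x replication")
# prints "72 TB usable, ~24 TB of unique production data at 3x replication"
```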
Operational in No Time
• 2.5 months from project approval to production
• Leveraged our existing support organizations
• The solution used common tools and did not require
specialized teams
• Fault tolerance inherent within Hadoop helps us
minimize late night calls
• An endless supply of data was quickly flowing through
the system
• The results were looking good!
Real Results
• 65% improvement in end-to-end delivery times
• From 2.5 hours to 1.3 hours
• Reduced catch-up time from upstream outages by
more than half
• Reduced outage impacts by introducing flexibility to
deliver partial files
• Eliminated 4-hour weekly usage delivery outages tied
to provisioning system maintenance
Anatomy of Latency - Now
[Diagram: components ASN GW, PTS, SPB, AAA, OSS SDU, Atlas, Medusa, Wholesale Partners, Internet; stage latencies labeled 1 Hour and Average of 15 Minutes (Atlas ~6 Minutes, Medusa ~9 Minutes)]
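The stage latencies in the two Anatomy of Latency diagrams also account for the end-to-end times on the Real Results slide. A minimal sketch, assuming the labeled stages run sequentially so their latencies simply add: the legacy path (1 hour plus up to 90 minutes) gives 2.5 hours, and the new path (1 hour plus ~6 minutes in Atlas and ~9 in Medusa) gives 1.25 hours, which rounds to the 1.3 hours quoted. The legacy 90 minutes is a worst case while the new 15 minutes is an average, so the comparison is approximate. The stage keys and helper name are illustrative.

```python
# Stage latencies in minutes, read off the two "Anatomy of Latency" diagrams.
# Assumption: stages run one after another, so end-to-end latency is their sum.
LEGACY_STAGES = {
    "1 Hour stage": 60,
    "IT Usage Processing (up to 90 Minutes)": 90,  # worst case, not an average
}
CURRENT_STAGES = {
    "1 Hour stage": 60,
    "Atlas (~6 Minutes)": 6,
    "Medusa (~9 Minutes)": 9,
}


def end_to_end_hours(stages: dict[str, int]) -> float:
    """Sum per-stage latencies (minutes) and convert to hours."""
    return sum(stages.values()) / 60


legacy = end_to_end_hours(LEGACY_STAGES)
current = end_to_end_hours(CURRENT_STAGES)
print(f"legacy:  {legacy:.2f} h")   # legacy:  2.50 h
print(f"current: {current:.2f} h")  # current: 1.25 h
```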
Real (Financial) Results
• 6-month return on investment
• Delivered at 1/3 the cost of competing solutions
• Foundational: enabled the Wholesale support plan for
legacy platform migration
• Saving Clearwire tens of millions of dollars over the life of
the contract and internalizing support and development
The Intangibles
• Proved to internal and external partners that we
deliver what we promise with limited negative impacts
to ongoing business
• This was KEY to the speed at which we were able to
migrate our billing platform
• Delivered more than just a single, targeted process –
delivered an enterprise usage platform to grow from
• Kept true to our innovative spirit and the commitment
to IT professionals that they can make a difference
Evolution – Proving More
The Atlas Hadoop platform is now a go-to IT solution
• LTE Usage Data – Now in production
• Other Data Sources - ESR Data
• Data Replication and real-time ETL
• Exploring opportunities with network team to move
closer to usage generation
• Changing mindset of what IT can mean to an
organization
Q & A
