Gaining Support for Hadoop in a Large Corporate Environment
  • As a presenter of advanced analytics proofs of concept to other corporations, I am questioned most frequently on the “how” by my audiences: not the “how” of the technology or the data we used, but “how” we were able to gain momentum and support in a large corporate enterprise to incorporate new technology and practices in analytics. I will share with you how a major telecommunications company, Sprint, created a research team of just 8 people who were able to infect the Enterprise with new infrastructure, new data, and new analytics, and transform them into new business benefits.

    When I speak with other companies about advanced analytics proofs of concept, their questions skip quickly past the “what” and on to the “how”: how did we gain support, how did we find success, how did we decide which technology to select? I will share with you some of the lessons we learned and answer many of these questions. This discussion will showcase how Sprint, a major telecommunications company, went from issuing a research challenge to enabling the entire enterprise in the area of analytics. I’ll walk you through how we repurposed an existing team and started with our first Proof of Concept on Hadoop. We are now in the midst of setting up a multi-petabyte, enterprise-supported Hadoop system with multiple funded projects, are augmenting our research facilities, and have a long list of use case trials in the works.
  • Capture data “before” it is processed by the Enterprise databases
  • Merge streaming data with static data from existing databases
  • Include geospatial tools from the start
  • Support a standard query language so that anyone can access & use the data
  • Make it easy to create UDFs
  • Use off-the-shelf hardware and open source where possible
  • Use off-the-shelf visualization tools
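
One of the principles above, merging streaming data with static data from existing databases, can be pictured with a small sketch. This is only an illustration of the idea, not the pipeline used at Sprint: the record fields (`site_id`, `bytes`), the `CELL_SITES` lookup table, and the `enrich` and `sample_stream` functions are all hypothetical names invented for the example.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical static reference data, standing in for a table that already
# lives in an existing database (cell site ID -> site coordinates).
CELL_SITES: Dict[str, Dict[str, float]] = {
    "site-001": {"lat": 39.10, "lon": -94.58},
    "site-002": {"lat": 38.97, "lon": -94.67},
}


def enrich(events: Iterable[dict],
           sites: Dict[str, Dict[str, float]]) -> Iterator[dict]:
    """Attach static site attributes to each streaming event as it arrives."""
    for event in events:
        site = sites.get(event["site_id"])
        if site is None:
            # Keep unmatched events and flag them instead of dropping data.
            yield {**event, "lat": None, "lon": None, "matched": False}
        else:
            yield {**event, **site, "matched": True}


def sample_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a live feed of network events."""
    yield {"site_id": "site-001", "bytes": 1200}
    yield {"site_id": "site-002", "bytes": 560}
    yield {"site_id": "site-999", "bytes": 75}  # not in the static table


for record in enrich(sample_stream(), CELL_SITES):
    print(record)
```

Because `enrich` is a generator, events are joined one at a time as they arrive rather than after the whole feed has been collected, which is the point of capturing data before it reaches the enterprise databases.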

Gaining Support for Hadoop in a Large Corporate Environment: Presentation Transcript

  • Gaining Support for Hadoop in a Large Corporate Environment
    Tuesday, June 3, 2014
    Hadoop for Business Apps, Hadoop Summit
  • ©2014 Sprint. This information is subject to Sprint policies regarding use and is the property of Sprint and/or its relevant affiliates and may contain restricted, confidential or privileged materials intended for the sole use of the intended recipient. Any review, use, distribution or disclosure is prohibited without authorization.
    Overview.
    • Create the team
      - Who are we?
    • Research challenge
    • Evaluate the data
      - Resource evaluation
    • What did we learn?
      - New Analytics
      - New Benefits
      - New Data
      - New Infrastructure
    • How did we move out of Research and into the Enterprise?
  • About Me.
    • Jennifer Lim has over 14 years of experience in large enterprise data warehousing and analytics. Most recently, she was a Research Scientist for the Sprint Advanced Analytics Lab and is now acting as a Lead Technology Architect, focusing on upgrading the enterprise analytics infrastructure in support of all those great use cases being discovered in the research lab. She has an MBA from Avila University and a BS from Iowa State University. Jennifer.Lim@sprint.com
    About Sprint.
    • Sprint is widely recognized for developing, engineering and deploying innovative technologies, including the first wireless 4G service from a national carrier in the United States; offering industry-leading mobile data services, leading prepaid brands including Virgin Mobile USA, Boost Mobile, and Assurance Wireless; instant national and international push-to-talk capabilities; and a global Tier 1 Internet backbone. www.sprint.com
  • The Team. Advanced Analytics Lab.
    • The CTO took a team focused on Network Technology Research and refocused them onto the new “gold”: Data.
    • Data Research Scientists and RF Engineers engaged in
      - Mobile Internet Research
        • Security & Privacy
        • Location: location accuracy, population estimation
        • Social Connection: social networks, influence, churn
      - Network Research
        • Wireless and IP Networks
        • Wireless and wireline security: fraud prevention
      - Architecture Research
        • Performing data platform & tool evaluations
      - Prototype Development
        • Use Case Development
        • Demonstration of new technologies & capabilities
    Summer 2011
  • Our Journey…
    - from data optimization
    - to a research idea
    - to a realization: was our data in the right place?
    - to developing a Hadoop-based analysis environment
    - to enhancing the technical capabilities of the enterprise data warehouse
    …to create Actionable Insights.
  • Historically – Data utilized for Optimization Tasks.
  • The Research Challenge.
    Diagram labels: XDRs; Voice; Texting; IP; Video; Websites Visited; Location; Applications Used; Social Networks; Calls & Texts.
    • Find Insights Available Nowhere Else
    • Find New Use Cases
  • Proof of Concept.
    Transition: from optimizing to asking questions about the data.
    October 2011
  • Prototype Infrastructure.
    • Current Enterprise infrastructure couldn’t be used to build the prototypes
      - No formal IT project, so we couldn’t use IT resources.
      - We didn’t have the funding to buy the latest & greatest.
      - We needed something that could store a lot of data without a lot of prep.
      - We wanted to experiment.
    • Current Lab infrastructure couldn’t be used to build the prototypes
      - Network focused
      - File based, focused on finding specific traffic in the same geo-location
    • Looked around, found some servers, dusted them off… grabbed open source Hadoop.
      - 5 TB; our servers were all memory & no disk
      - 5 data nodes & 1 manager node
  • What Did We Learn?
    • New Analytics
    • New Benefits
    • New Data
    • New Infrastructure
  • New Analytics.
    Finding ways to take network events… and creating this…
    Using Network data to create new Products, Increase Customer Satisfaction, Attract new Customers by providing actionable insights to Customers and Enterprise decision makers
  • New Benefits. New Data.
    • Incorporation of new, insightful data sets
    • Incorporation of new, specialized business rules
    • Geospatial! Techniques
    • Examination of new Business Intelligence and Visualization tools
    Becoming the Advocates & Demonstrators for new analytics
  • New Infrastructure. Lab Cluster.
    • Trials of distributions & server setups.
    • Training of internal resources. Big Data User Group.
    • Expansion of teams able to run Prototypes on the cluster.
      - Usage Based Cost / Finance
      - Application data transforms / Product
      - Location Accuracy Improvement / Network
      - Pathing Analysis / Marketing
      - Device Behavior Analysis / Device
      - Customer Text Analytics / Care
      - ….
    - Approximately 1 petabyte; our servers have 4 TB data drives and 256 GB RAM
    - 30 nodes: 23 data nodes, with management nodes & visualization nodes
    June 2013
  • New Infrastructure. Production Cluster.
    • Standard Visualizations and Analytics Tools Integrated.
    • Funding Proven Use Cases.
    • IT process & controls related to continuous data loading, transformations, and reliability.
    • Standards established.
    • Resources scaled: from a team of 5 supporting the lab cluster to more than 5 teams responsible for the system.
    - Over 2 petabytes; our servers have 4 TB data drives and 256 GB RAM (same as the lab cluster)
    - 52 data nodes, with management nodes & visualization nodes
    May 2014
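
As a rough back-of-the-envelope check on the figures quoted for the lab and production clusters, the sketch below divides each stated capacity by the stated number of data nodes. It assumes decimal units (1 PB = 1000 TB) and treats “approximately 1 Petabyte” and “over 2 Petabytes” as exactly 1 PB and 2 PB. Neither assumption comes from the slides, and the function and variable names are purely illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def tb_per_data_node(capacity_pb: float, data_nodes: int) -> float:
    """Average capacity per data node in TB, given total capacity in PB."""
    if data_nodes <= 0:
        raise ValueError("data_nodes must be positive")
    return capacity_pb * TB_PER_PB / data_nodes


# Figures from the slides, rounded as described in the lead-in (assumption).
lab_tb = tb_per_data_node(1.0, 23)    # lab cluster, June 2013
prod_tb = tb_per_data_node(2.0, 52)   # production cluster, May 2014
node_growth = 52 / 23                 # data nodes, production vs. lab

print(f"lab: {lab_tb:.1f} TB/node")
print(f"production: {prod_tb:.1f} TB/node")
print(f"data-node growth: {node_growth:.2f}x")
```

Under these assumptions both clusters land at roughly 40 TB per data node, which fits the slide's note that the production servers match the lab cluster's configuration; the growth came from adding nodes rather than from larger servers.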
  • Enterprise Analytics Architecture. Changes.
    Agile:
    • Enable faster development cycle
    • Deal with structured & unstructured data
    Scalable Hadoop environment:
    • Billions of objects, high read/write volume, terabytes / petabytes
    • Distribution model & consistency
    Partnering Across the Enterprise. Big Data User Group.
    • Marketing – Loyalty & Retention
    • Network Development & Engineering
    • Network Planning & Forecasting
    • Finance Accounting
    • Product – Consumer Apps & Entertainment
    • Product – Messaging & Instant Communications
    • Enterprise Architecture
    • IT Application Development & Operations
    • IT Data Management…