Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next 3 years. Factors that denote high potential for significant impact include:
- High potential for disruption to IT or the business
- Need for a major dollar investment
- Risk of being late to adopt
Level Seven - Expedient Big Data presentation
Introduction
Navigate. Doug Denton, Big Data Practice Lead, Level Seven
Guide. Tim Hoolihan, CTO, Dir. of Strategic Services, Level Seven
Explore. Michael DeAloia, Regional Vice President, Expedient Data Centers
Agenda
3:00 - Welcome & Introductions
3:05 - Explore the concept of Big Data
3:30 - Navigate through initial projects
4:00 - Beer break
4:15 - Back to work
4:30 - Guide to full company adoption
5:00 - QA and more beer (tour departs)
“BIG DATA DREAMS”
Michael C. DeAloia, Regional Vice President - Cleveland
Explore. Navigate. Guide.
BIG DATA DREAMS :: THE EXPEDIENT ECOLOGY
• Founded in 2001, Cleveland, Ohio
• 12 Year Tenure in Technology Infrastructure Services
• Scalable Platform Design
• Petabytes of Storage, 100s of Terabytes of Memory in our Cloud
• Thousands of Customers
• 2x Growth Y/Y in Cloud Services
BIG DATA DREAMS :: ROADMAP
• What is Big Data?
• History of Big Data
• 8 Laws of Big Data
• Q&A
• Big Data by the Numbers
What is Big Data?
Gartner has defined ‘Big Data’ as a Strategic Technology for 2013.
BIG DATA DREAMS :: WHAT IS BIG DATA
What is Big Data?
Big Data /bɪɡ dātə/ n. A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Big Data challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
What is Big Data?
The three Vs characterize what big data is all about, and also help define the major issues that IT needs to address:
• Volume: The massive scale and growth of unstructured data outstrips traditional storage and analytical solutions.
• Variety: Traditional data management processes can’t cope with the heterogeneity of big data, or “shadow” or “dark data,” such as access traces and Web search histories.
• Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
What is Big Data?
“Big Data is the new oil.”
- Bryan Trogdon, as quoted in ‘The Future of Big Data’, Pew Research Survey
What is Big Data?
Big Data Analytics IS:
• A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
• Working with data sets whose size and variety is beyond the ability of typical database software to capture, store, manage, and analyze.
• Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
• Distributed in nature. Analytics processing goes to where the data is for greater speed and efficiency.
• A new paradigm in which IT collaborates with business users and “data scientists” to identify and implement analytics that will increase operational efficiency and solve new business problems.
• Moving decision making down in the organization and empowering people to make better, faster decisions in real time.
Big Data Analytics IS NOT:
• Just about technology. At the business level, it’s about how to exploit the vastly enhanced sources of data to gain insight.
• Only about volume. It’s also about variety and velocity. But perhaps most important, it’s about value derived from the data.
• Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of big data at web scale, applications touch every industry.
• About “one-size-fits-all” traditional relational databases built on shared disk and memory architecture. Big data uses a grid of computing resources for massively parallel processing (MPP).
• Meant to replace relational databases or the data warehouse. Structured data continues to be critically important. However, traditional systems may not be suitable for the new sources and contexts of big data.
What is Big Data?
“Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
- Eric Schmidt, CEO, Google
“By 2015 the digital universe is expected to reach 8 zettabytes.”
- Intel
1 zettabyte = 18 million copies of the Library of Congress
Who works Big Data?
A new kind of professional is helping organizations make sense of the massive streams of digital information: the data scientist. Data scientists are responsible for modeling complex business problems, discovering business insights, and identifying opportunities.
They bring to the job:
• Skills for integrating and preparing large, varied data sets
• Advanced analytics and modeling skills to reveal and understand hidden relationships
• Business knowledge to apply context
• Communication skills to present results
More sources and more devices
• Mobile
  • Pictures
  • Video
  • SMS
  • GPS
• Social Media
  • Facebook
  • Twitter
  • YouTube
  • Reviews
• Automated Sources
  • RFID
  • Telemetry
  • Security cameras
Real-time correlation of data can be turned into golden nuggets of information.
BIG DATA DREAMS :: BY THE NUMBERS
Big Data Law #1: The Faster You Analyze Your Data, the Greater its Predictive Power.
Great list developed by Dave Feinleib, Managing Director of Big Data Group.
BIG DATA DREAMS :: THE 8 LAWS OF BIG DATA
Big Data Law #2: Maintain one copy of your data, not dozens.
Big Data Law #3: Use more diverse data, not just more data.
Big Data Law #4: Data has value far beyond what you originally anticipate.
Big Data Law #5: Plan for Exponential Growth.
Big Data Law #6: Solve a real pain point.
Big Data Law #7: Put data and humans together to get more insight.
Big Data Law #8: Big Data is transforming business the same way IT did.
Q&A
Michael C. DeAloia, Regional Vice President, Expedient Data Centers
m) 216.212.4067
e) email@example.com
BIG DATA DREAMS :: QUESTIONS & ANSWERS
Charting the Course to Big Data Implementation.
Doug Denton, Big Data Practice Lead
Tim Hoolihan, CTO, Dir. of Strategic Services
Explore. Navigate. Guide.
What’s Different About Big Data?
• Data that IT historically ignores
• Too much, too fast, too dirty to handle
• Represents 80% of all data
• Very different way of thinking about data
• Very different way of processing data
• A VERY BIG DEAL
You were blind, but now you see.
Why Now?
• Pretending 80% of data did not exist is OK
  • Not really, numb & blind is no way to live
• Revolutionary tools now available
  • Google, Facebook, Yahoo, IBM started
  • Open source community advances
  • HDFS, Map Reduce, Pig, Hive, JAQL, …
• Inexpensive, networked infrastructure available
It is all about technology, baby.
Where are we coming from?
• Relational databases are the norm
• Stored after analysis and transformation
• Optimized for predicted retrieval
• Best for well-understood, highly structured data
• Only works for 20% of our data
When it works, it works really well.
Where we’re going - Data at Rest
• Data stored in original format
• Divide and conquer to process
• Best for massive, poorly structured data
• Supplements relational database tools
Think “batch processing”.
Where we’re going - Data in Motion
• Data that you never write down
  • Network traffic, sensor data, phone calls
• Data that never stops
• Processing is done in real time
• Processing is done in memory
• Tools are less numerous
  • IBM Streams
Think “watch a stream flow by”.
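To make "data in motion" concrete, here is a minimal sketch in plain Python, not IBM Streams. It assumes a hypothetical sensor feed (`sensor_feed`) and keeps only a small sliding window in memory, so each reading is processed as it arrives and is never written down.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_mean(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # only the current window is held in memory
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # oldest value is about to be evicted by append()
        recent.append(value)
        total += value
        yield total / len(recent)


def sensor_feed() -> Iterator[float]:
    """Hypothetical stand-in for an unbounded sensor stream."""
    for value in [10.0, 12.0, 11.0, 50.0, 13.0]:
        yield value


means = list(rolling_mean(sensor_feed(), window=3))
print(means)
```

The generator never sees the whole data set; a real feed would simply keep yielding values and the same loop would keep up with it.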
Where are we now?
• Ecosystem of supporting tools well formed
  • Thanks Google, FB, Yahoo, IBM
  • Thanks Open Source Community
• Tool sets offered as premium aggregations
  • IBM Big Insights
• Cloud infrastructure economical & available
  • Expedient
Tools are ready for the craftsman.
What are the Tools?
• Distributed File System
• Distributed Map Reduce Runtime
• Jaql, Pig, Hive, Oozie, HBase, R and others
Find a knowledgeable craftsman.
What Makes the Tools Different?
First and foremost - the run-time environment
• Massively distributed
• Redundant
• Anticipates failure
• Runs on commodity servers & operating systems
What else?
Divide and conquer on a massive scale
• Break data into smaller chunks (map)
• Execute on chunks in parallel
• Execute code as close to the data as possible
• Execute multiple instances simultaneously
• Work with name-value pairs (tuples)
• Assemble comprehensive answer (reduce)
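The divide-and-conquer steps above can be sketched on a single machine. This is an illustrative word count in plain Python, not Hadoop code: a thread pool stands in for cluster nodes, and the function names (`split_into_chunks`, `map_chunk`, `reduce_pairs`, `word_count`) are made up for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break the input into smaller chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (name, value) tuple for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(mapped_chunks):
    """Reduce step: consolidate tuples that share a name into one total."""
    totals = defaultdict(int)
    for pairs in mapped_chunks:
        for name, value in pairs:
            totals[name] += value
    return dict(totals)


def word_count(lines, chunk_size=2, workers=4):
    chunks = split_into_chunks(lines, chunk_size)
    # Each chunk is mapped independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_pairs(mapped)


sample = ["big data is big", "data in motion", "data at rest"]
counts = word_count(sample)
print(counts)
```

On a real cluster the map code is shipped to the nodes that already hold each chunk, which is what "execute code as close to the data as possible" refers to; this sketch only shows the shape of the map and reduce steps.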
The Challenge
• New way of thinking about data
  • Everything is valuable data
• New way of thinking about processing data
  • No normalization, no relationships
  • Program extracts attribute and forms tuple
  • Tuples consolidated and reduced
  • Integration focus more on external sources, less DWs
• New tools and approaches
  • Lots of specialized tools community-managed
  • Technology adoption curve progressing rapidly
Meeting the Challenge
• Embrace the opportunity/inevitability
• Consider your place on the adoption curve
• Effectively, Efficiently, Intelligently:
  • Experiment with technologies
  • Prove concepts valuable to organization
  • Prototype high value applications for quick wins
  • Enable staff & organization
  • Make a practical plan based on experience
Now is the time for leadership.
Big Data is a Big Deal for Business
• Bigger deal for CEO than for IT
• CEO singles look better than IT home runs
• Better that the CEO drags IT than that IT pushes the CEO
• You will need money
• You will need help keeping the faith
Dig where gold has been found.
Now it’s time for beer.
Top 5 Projects for Big Data
His little black book is considered Big Data.
Think global. Drink local.
Proving the Value
• GM/CEO needs to be in front of IT
• Think POV (proof of value), not POC (proof of concept)
• Get rid of the engineering mindset
• Stop thinking about specific tools, for now
• Sell the story without mentioning the tools
You still need the tools!
Top 5 Big Data Projects: The Categories
1. Know Your Customer
2. Secure Cyber Assets
3. Optimize Operations
4. Expand Data Warehouse
5. Explore & Discover
Top 5 Big Data Projects: 1. Know Your Customer
• Social Media
  • Measure and track customer sentiment
  • Real-time customer engagement
  • Real-time selling
• Customer profiling
  • Recent transactions
  • Call center and web site activity
  • Rate likelihood of defection
T-Mobile cut defections by 50% in one quarter.
Top 5 Big Data Projects: 2. Secure Cyber Assets
• Analyze
  • Logs to inform security policies
  • Network traffic to identify outliers & patterns
• Enforce in real-time
  • Data in motion solution
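As one way to picture "network traffic to identify outliers", here is a minimal sketch. It assumes a hypothetical log format of `<source> <bytes>` per line and uses a simple median-based score as a swapped-in technique for illustration; the slides do not name a method, and a production security tool would be far more involved.

```python
from collections import Counter
from statistics import median


def bytes_per_source(log_lines):
    """Total bytes sent by each source address.

    Assumes a hypothetical log format of "<source> <bytes>" per line.
    """
    totals = Counter()
    for line in log_lines:
        source, size = line.split()
        totals[source] += int(size)
    return totals


def find_outliers(totals, threshold=5.0):
    """Flag sources whose volume is far from the median, scaled by the
    median absolute deviation (MAD)."""
    values = list(totals.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # score is undefined when most sources are identical
    return sorted(s for s, v in totals.items() if abs(v - center) / mad > threshold)


log_lines = [
    "10.0.0.1 700", "10.0.0.1 500", "10.0.0.2 1100", "10.0.0.3 1300",
    "10.0.0.4 1250", "10.0.0.5 1150", "10.0.0.6 98000",
]
totals = bytes_per_source(log_lines)
outliers = find_outliers(totals)
print(outliers)
```

The median-based score is used because a single very large sender would distort a plain average; the threshold of 5.0 is an arbitrary illustrative choice.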
Top 5 Big Data Projects: 3. Optimize Operations
• Predict equipment failures
• Just-in-time maintenance
• Identify sources of inefficiencies
Top 5 Big Data Projects: 4. Expand Data Warehouse
• Customer profile (email/doc/call contents)
• Predicted behavior (man/machine/process)
• Market segmentation
Top 5 Big Data Projects: 5. Explore and Discover
• Cost of new customer
• Cost of a new product
• Efficacy of treatment
• Predictive analytics
• Data science analysis
Moving Forward
• Pick your team
• Call your shot
• Assemble your tools
• Prove the value (and your good judgment)
• Plot your course
Why Partner?
• What does a Strategic Partnership look like?
• What is the role of a data scientist?
Tools
• The tools are great, but…
• Owning a Hadoop cluster doesn’t make a Big Data practice
• Just like owning a reporting tool doesn’t mean you have a strong Business Intelligence initiative
• …it takes strategy and experts
Scenario A
• Retrained DBA or Developer
• Cost Model
• Looking for Theta with Linear Regression
• Local Minimum problems
• Lots of Iteration
• Even in a matrix / vector world, may iterate
Scenario B
• Data Scientist
• Linear Algebra solving data in chunks
• Reducing several hundred iterations to one
• Use of proper data structures to leverage matrix / vector operations
• MIMD vs SIMD on the CPU
• Again, large cycles of optimization
• No local minimum problems
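The contrast between the two scenarios can be sketched for the simplest case, a straight-line fit with one feature. This is an illustration under stated assumptions, not the presenters' code: the iterative version uses plain gradient descent with an arbitrary learning rate and step count, and the direct version uses the closed-form least-squares solution, which is the one-feature case of the normal equation.

```python
def fit_by_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Iterative approach: start from a guess and step toward theta."""
    n = len(xs)
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        grad_intercept = sum(errors) / n
        grad_slope = sum(e * x for e, x in zip(errors, xs)) / n
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


def fit_in_closed_form(xs, ys):
    """Direct approach: solve the least-squares problem in a single pass."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

gd_intercept, gd_slope = fit_by_gradient_descent(xs, ys)
cf_intercept, cf_slope = fit_in_closed_form(xs, ys)
print(gd_intercept, gd_slope)
print(cf_intercept, cf_slope)
```

Both functions arrive at the same line; the first needs thousands of passes over the data plus a tuned learning rate, while the second needs one pass. With many features the direct route becomes a matrix problem, which is where the linear algebra and data structure skills on the slide come in.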
Hats to Wear.
• Algorithms in Context
• Linear Algebra
• Data Structures
• CPU Architecture
  • rare in the modern business app developer
• Concurrency Issues
• Cost Modeling
• Data Visualization
Why a Partner?
Multiple discipline jobs are hard, large barriers to entry
• Even with high market rates, supply can’t keep up
• Analogous to large ERP talent
• Retaining this talent is hard
  • Particularly when under-utilized
• Rather than keeping that skill sharp artificially, an outsourced data scientist is keeping sharp with real solutions
…in short
You don’t keep a trial attorney around full-time for the few times you may need them.
Why keep a data scientist full-time?
Is there a role for internal?
• Tweaks to Map/Reduce jobs
• Debugging
• Reporting
• Integrating new sources
• Hardware / Infrastructure
• Pilots
Explorer. Navigator. Guide.
• Reducing risk of failure
• Working with your team
• Identifying initial projects
• Selecting best tools
• Creating a strategic adoption roadmap
• Avoiding common pitfalls
• Taking you beyond the initial phase
Q&A
Doug Denton, Big Data Practice Lead
m) 440.478.6003
e) firstname.lastname@example.org
Tim Hoolihan, CTO, Dir. of Strategic Services
m) 330.338.1532
e) email@example.com