Level Seven - Expedient Big Data presentation



Michael DeAloia, Tim Hoolihan and I gave this presentation at a joint Level Seven - Expedient event on May 30th.

Speaker note: Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next 3 years. Factors that denote high potential for significant impact include high potential for disruption to IT or the business, the need for a major dollar investment, and the risk of being late to adopt.

    1. Introduction
        • Navigate. Doug Denton, Big Data Practice Lead, Level Seven
        • Guide. Tim Hoolihan, CTO, Dir. of Strategic Services, Level Seven
        • Explore. Michael DeAloia, Regional Vice President, Expedient Data Centers
    2. Agenda
        • 3:00 Welcome & Introductions
        • 3:05 Explore the concept of Big Data
        • 3:30 Navigate through initial projects
        • 4:00 Beer break
        • 4:15 Back to work
        • 4:30 Guide to full company adoption
        • 5:00 Q&A and more beer (tour departs)
    3. Explore the concept
    4. “Big Data Dreams”
        Michael C. DeAloia, Regional Vice President - Cleveland
        Explore. Navigate. Guide.
    5. Founded in 2001, Cleveland, Ohio
        • 12 Year Tenure in Technology Infrastructure Services
        • Scalable Platform Design
        • Petabytes of Storage, 100s of Terabytes of Memory in our Cloud
        • Thousands of Customers
        • 2x Growth Y/Y in Cloud Services
        BIG DATA DREAMS :: THE EXPEDIENT ECOLOGY
    6. Cloud & MS
        • Enterprise Cloud
        • Managed OS & App
        • Storage & Backup
        • Network Services
    7. 1 Boston, 2 Baltimore, 2 Cleveland, 1 Columbus, 1 Indianapolis, 2 Pittsburgh
        10GbE Interconnect
    8. Roadmap
        • What is Big Data?
        • History of Big Data
        • 8 Laws of Big Data
        • Q&A
        • Big Data by the Numbers
        BIG DATA DREAMS :: ROADMAP
    9. What is Big Data?
        Gartner has defined ‘Big Data’ as a Strategic Technology for 2013.
        BIG DATA DREAMS :: WHAT IS BIG DATA
    10. What is Big Data?
        Big Data /bɪɡ dātə/ n. A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
        Big Data challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
    11. What is Big Data?
        The three Vs characterize what big data is all about, and also help define the major issues that IT needs to address:
        • Volume: The massive scale and growth of unstructured data outstrip traditional storage and analytical solutions.
        • Variety: Traditional data management processes can’t cope with the heterogeneity of big data, including “shadow” or “dark” data such as access traces and Web search histories.
        • Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
    12. What is Big Data?
        “Big Data is the new oil.”
        - Bryan Trogdon, as quoted in ‘The Future of Big Data’, Pew Research survey
    13. What is Big Data?
        Big Data Analytics IS:
        • A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
        • Working with data sets whose size and variety are beyond the ability of typical database software to capture, store, manage, and analyze.
        • Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
        • Distributed in nature. Analytics processing goes to where the data is for greater speed and efficiency.
        • A new paradigm in which IT collaborates with business users and “data scientists” to identify and implement analytics that will increase operational efficiency and solve new business problems.
        • Moving decision making down in the organization and empowering people to make better, faster decisions in real time.
        Big Data Analytics IS NOT:
        • Just about technology. At the business level, it’s about how to exploit the vastly enhanced sources of data to gain insight.
        • Only about volume. It’s also about variety and velocity. But perhaps most important, it’s about value derived from the data.
        • Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of big data at web scale, applications touch every industry.
        • About “one-size-fits-all” traditional relational databases built on shared disk and memory architecture. Big data uses a grid of computing resources for massively parallel processing (MPP).
        • Meant to replace relational databases or the data warehouse. Structured data continues to be critically important. However, traditional systems may not be suitable for the new sources and contexts of big data.
    14. What is Big Data?
        “Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
        - Eric Schmidt, CEO, Google
        “By 2015 the digital universe is expected to reach 8 zettabytes.”
        - Intel
    15. 1 zettabyte = 18 million copies of the Library of Congress
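The figures on slides 14 and 15 mix storage units, so a quick unit check helps put them side by side. This is a minimal sketch that takes the quoted numbers at face value and assumes decimal SI prefixes (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes) and a 365-day year; the function names are illustrative, not from the presentation.

```python
# Decimal (SI) storage units, assumed for the figures quoted on slides 14 and 15.
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21


def annual_volume_zb(exabytes_per_period: float, days_per_period: float) -> float:
    """Convert a rate quoted as 'N exabytes every D days' to zettabytes per year."""
    periods_per_year = 365 / days_per_period
    return exabytes_per_period * EXABYTE * periods_per_year / ZETTABYTE


def bytes_per_copy(copies_per_zettabyte: float) -> float:
    """Size in bytes of one copy implied by 'N copies per zettabyte'."""
    return ZETTABYTE / copies_per_zettabyte


# "Five exabytes every two days" (slide 14), scaled to a 365-day year.
print(f"{annual_volume_zb(5, 2):.4f} ZB per year")        # 0.9125 ZB per year
# "1 zettabyte = 18 million copies of the Library of Congress" (slide 15).
print(f"{bytes_per_copy(18e6) / TERABYTE:.1f} TB per copy")  # 55.6 TB per copy
```

Read this way, the Schmidt rate works out to a little under one zettabyte of new data per year, and the Library of Congress comparison implies roughly 55.6 TB per copy.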
    16. Who Works Big Data?
        A new kind of professional is helping organizations make sense of the massive streams of digital information: the data scientist. Data scientists are responsible for modeling complex business problems, discovering business insights, and identifying opportunities. They bring to the job:
        • Skills for integrating and preparing large, varied data sets
        • Advanced analytics and modeling skills to reveal and understand hidden relationships
        • Business knowledge to apply context
        • Communication skills to present results
    33. More sources and more devices
        • Mobile
            • Pictures
            • Video
            • SMS
            • GPS
        • Social Media
            • Facebook
            • Twitter
            • YouTube
            • Reviews
        • Automated Sources
            • RFID
            • Telemetry
            • Security cameras
        Real-time correlation of data can be turned into golden nuggets of information.
        BIG DATA DREAMS :: BY THE NUMBERS
    34. Big Data Law #1
        The Faster You Analyze Your Data, the Greater Its Predictive Power.
        Great list developed by Dave Feinleib, Managing Director of Big Data Group.
        BIG DATA DREAMS :: THE 8 LAWS OF BIG DATA
    35. Big Data Law #2
        Maintain one copy of your data, not dozens.
    36. Big Data Law #3
        Use more diverse data, not just more data.
    37. Big Data Law #4
        Data has value far beyond what you originally anticipate.
    38. Big Data Law #5
        Plan for Exponential Growth
    39. Big Data Law #6
        Solve a real pain point.
    40. Big Data Law #7
        Put data and humans together to get more insight.
    41. Big Data Law #8
        Big Data is transforming business the same way IT did.
    42. Q&A
        Michael C. DeAloia, Regional Vice President, Expedient Data Centers
        m) 216.212.4067
        e) michael.dealoia@expedient.com
        BIG DATA DREAMS :: QUESTIONS & ANSWERS
    43. Charting the Course to Big Data Implementation.
        Doug Denton, Big Data Practice Lead
        Tim Hoolihan, CTO, Dir. of Strategic Services
        Explore. Navigate. Guide.
    44. What’s Different About Big Data?
        • Data that IT historically ignores
            • Too much, too fast, too dirty to handle
            • Represents 80% of all data
        • Very different way of thinking about data
        • Very different way of processing data
        • A VERY BIG DEAL
        You were blind, but now you see.
    45. Why Now?
        • Pretending 80% of data did not exist is OK
            • Not really; numb & blind is no way to live
        • Revolutionary tools now available
            • Google, Facebook, Yahoo, IBM started
            • Open source community advances
            • HDFS, MapReduce, Pig, Hive, Jaql, …
        • Inexpensive, networked infrastructure available
        It is all about technology, baby.
    46. Where are we coming from?
        • Relational databases are the norm
        • Stored after analysis and transformation
        • Optimized for predicted retrieval
        • Best for well-understood, highly structured data
        • Only works for 20% of our data
        When it works, it works really well.
    47. Where we’re going: Data at Rest
        • Data stored in original format
        • Divide and conquer to process
        • Best for massive, poorly structured data
        • Supplements relational database tools
        Think “batch processing”.
    48. Where we’re going: Data in Motion
        • Data that you never write down
            • Network traffic, sensor data, phone calls
        • Data that never stops
        • Processing is done in real time
        • Processing is done in memory
        • Tools are less numerous
            • IBM Streams
        Think “watch a stream flow by”.
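The “watch a stream flow by” model on slide 48 can be sketched in a few lines: each reading is processed once, in memory, and then discarded, and only a small fixed-size window of recent values is kept. This is a minimal illustration in plain Python, not the IBM Streams API; the moving-average rule, the window size, and the alert threshold are hypothetical choices.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sliding_average_alerts(
    readings: Iterable[float], window: int = 5, threshold: float = 100.0
) -> Iterator[Tuple[int, float]]:
    """Yield (position, moving average) whenever the average of the last
    `window` readings exceeds `threshold`.

    Only the current window is held in memory; older readings are never stored.
    `window` and `threshold` are hypothetical, illustrative defaults.
    """
    recent: deque = deque(maxlen=window)  # oldest value drops off automatically
    running_sum = 0.0
    for position, value in enumerate(readings):
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the append below
        recent.append(value)
        running_sum += value
        if len(recent) == window:
            average = running_sum / window
            if average > threshold:
                yield position, average


# A generator stands in for an unbounded feed such as sensor data.
sensor_feed = (v for v in [90, 95, 98, 102, 110, 120, 130, 85, 80, 70])
for position, average in sliding_average_alerts(sensor_feed, window=3, threshold=100):
    print(position, round(average, 2))
```

Because the function consumes an iterator and yields alerts as it goes, it never needs the whole stream, which is the point of the data-in-motion style.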
    49. Where are we now?
        • Ecosystem of supporting tools well formed
            • Thanks Google, FB, Yahoo, IBM
            • Thanks Open Source Community
        • Tool sets offered as premium aggregations
            • IBM BigInsights
        • Cloud infrastructure economical & available
            • Expedient
        Tools are ready for the craftsman.
    50. What are the Tools?
        • Distributed File System
        • Distributed MapReduce Runtime
        • Jaql, Pig, Hive, Oozie, HBase, R and others
        Find a knowledgeable craftsman.
    51. What Makes the Tools Different?
        First and foremost, the run-time environment:
        • Massively distributed
        • Redundant
        • Anticipates failure
        • Runs on commodity servers & operating systems
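Much of the redundancy and failure tolerance on slide 51 comes from how a distributed file system stores data: files are cut into fixed-size blocks and each block is copied to several machines. The sketch below assumes a simple round-robin placement and a replication factor of 3 (the HDFS default); real HDFS placement is rack-aware and more involved, and the node names and tiny block size here are hypothetical.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    A deliberately simplified placement policy, for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())


blocks = split_into_blocks(b"x" * 1000, block_size=256)     # 4 blocks: 256, 256, 256, 232 bytes
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical commodity servers
placement = place_replicas(len(blocks), nodes)
print(placement[0])                                             # ['node-a', 'node-b', 'node-c']
print(readable_after_failure(placement, {"node-b", "node-c"}))  # True
```

With three copies on distinct nodes, any two machines can fail without losing a block, and a scheduler can use the same placement map to run work on a node that already holds the data.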
    52. What else?
        Divide and conquer on a massive scale:
        • Break data into smaller chunks (map)
        • Execute on chunks in parallel
        • Execute code as close to the data as possible
        • Execute multiple instances simultaneously
        • Work with name-value pairs (tuples)
        • Assemble comprehensive answer (reduce)
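The divide-and-conquer steps on slide 52 (work on chunks in parallel, emit name-value pairs, assemble the answer with reduce) can be illustrated with the classic word-count example. This sketch runs in a single Python process and uses a thread pool to stand in for parallel map tasks; it shows the programming model only and is not Hadoop's API. The sample chunks are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_chunk(chunk: str) -> List[Tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) name-value pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_key(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single answer."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    with ThreadPoolExecutor() as pool:  # map tasks run concurrently
        mapped = pool.map(map_chunk, chunks)  # results come back in input order
        all_pairs = [pair for result in mapped for pair in result]
    grouped = shuffle(all_pairs)
    return dict(reduce_key(k, v) for k, v in grouped.items())


chunks = ["big data big deal", "data in motion", "data at rest"]
print(word_count(chunks))
# {'big': 2, 'data': 3, 'deal': 1, 'in': 1, 'motion': 1, 'at': 1, 'rest': 1}
```

On a real cluster, map tasks run on the nodes holding each chunk, and the shuffle moves pairs across the network so that each reducer sees every value for its keys.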
    53. Technology Adoption Curve
    54. The Challenge
        • New way of thinking about data
            • Everything is valuable data
        • New way of thinking about processing data
            • No normalization, no relationships
            • Program extracts attribute and forms tuple
            • Tuples consolidated and reduced
        • Integration focuses more on external sources, less on DWs
        • New tools and approaches
            • Lots of specialized, community-managed tools
            • Technology adoption curve progressing rapidly
    55. Meeting the Challenge
        • Embrace the opportunity/inevitability
        • Consider your place on the adoption curve
        • Effectively, Efficiently, Intelligently:
            • Experiment with technologies
            • Prove concepts valuable to the organization
            • Prototype high-value applications for quick wins
            • Enable staff & organization
            • Make a practical plan based on experience
        Now is the time for leadership.
    56. Big Data is a Big Deal for Business
        • Bigger deal for CEO than for IT
        • CEO singles look better than IT home runs
        • Better for the CEO to drag IT than for IT to push the CEO
        • You will need money
        • You will need help keeping the faith
        Dig where gold has been found.
    57. Now it’s time for beer.
        Top 5 Projects for Big Data
    58. His little black book is considered Big Data.
        Think global. Drink local.
    59. Navigate initial projects
    60. Proving the Value
        • GM/CEO needs to be in front of IT
        • Think POV, not POC
        • Get rid of the engineering mindset
        • Stop thinking about specific tools, for now
        • Sell the story without mentioning the tools
        You still need the tools!
    61. Top 5 Big Data Projects: The Categories
        1. Know Your Customer
        2. Secure Cyber Assets
        3. Optimize Operations
        4. Expand Data Warehouse
        5. Explore & Discover
    62. Top 5 Big Data Projects: 1. Know Your Customer
        • Social Media
            • Measure and track customer sentiment
            • Real-time customer engagement
            • Real-time selling
        • Customer profiling
            • Recent transactions
            • Call center and web site activity
            • Rate likelihood of defection
        T-Mobile cut defections by 50% in one quarter.
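Rating the likelihood of defection usually means scoring each customer with a model trained on past behavior. As a hedged sketch, assuming a logistic model whose weights were already fitted elsewhere, the scoring step combines signals like those listed on slide 62 (recent transactions, call center activity, sentiment). The feature names, weights, intercept, and 0.5 cut-off are all hypothetical values chosen for illustration; they are not figures from the presentation or from T-Mobile.

```python
import math
from typing import Dict

# Hypothetical fitted weights: positive values raise defection risk.
WEIGHTS: Dict[str, float] = {
    "days_since_last_transaction": 0.03,
    "call_center_contacts_30d": 0.6,
    "negative_sentiment_posts_30d": 0.8,
}
BIAS = -4.0  # hypothetical intercept


def defection_probability(profile: Dict[str, float]) -> float:
    """Logistic score: 1 / (1 + exp(-(BIAS + sum of weight * feature))).

    Missing features are treated as 0.
    """
    z = BIAS + sum(WEIGHTS[name] * profile.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


loyal = {"days_since_last_transaction": 5, "call_center_contacts_30d": 0,
         "negative_sentiment_posts_30d": 0}
at_risk = {"days_since_last_transaction": 60, "call_center_contacts_30d": 3,
           "negative_sentiment_posts_30d": 2}
for name, profile in [("loyal", loyal), ("at_risk", at_risk)]:
    p = defection_probability(profile)
    print(name, round(p, 3), "flag for retention offer" if p > 0.5 else "ok")
```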
    63. Top 5 Big Data Projects: 2. Secure Cyber Assets
        • Analyze
            • Logs to inform security policies
            • Network traffic to identify outliers & patterns
        • Enforce in real time
            • Data in motion solution
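Identifying outliers in network traffic can start as simply as flagging hosts whose volume sits far from the norm. A minimal sketch, assuming per-host byte counts have already been aggregated from logs: it uses the median/MAD-based modified z-score (a standard technique swapped in here, not one named in the slides), because a single huge outlier would inflate an ordinary mean and standard deviation. The host addresses and counts are hypothetical; 3.5 is a commonly used cut-off for this score.

```python
from statistics import median
from typing import Dict, List


def robust_outliers(bytes_by_host: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return hosts whose modified z-score exceeds `cutoff`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation. Median-based statistics are not dragged around by
    the very outliers we are trying to find.
    """
    values = list(bytes_by_host.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread at all; nothing stands out by this measure
    return [
        host for host, v in bytes_by_host.items()
        if abs(0.6745 * (v - center) / mad) > cutoff
    ]


# Hypothetical bytes sent per host over one interval.
traffic = {"10.0.0.1": 1200, "10.0.0.2": 1350, "10.0.0.3": 1100,
           "10.0.0.4": 1250, "10.0.0.5": 1300, "10.0.0.6": 98000}
print(robust_outliers(traffic))
```

For the real-time enforcement the slide describes, the same test would be re-run over a rolling window of recent traffic rather than a fixed batch.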
    64. Top 5 Big Data Projects: 3. Optimize Operations
        • Predict equipment failures
        • Just-in-time maintenance
        • Identify sources of inefficiencies
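Just-in-time maintenance depends on estimating when a machine will cross a failure threshold rather than waiting for it to break. One simple sketch, assuming a sensor reading (for example vibration) that drifts roughly linearly as a part wears: fit a least-squares line with `numpy.polyfit` and extrapolate to the threshold. The readings, units, and threshold are hypothetical, and practical predictive-maintenance models are usually richer than a straight line.

```python
import numpy as np


def hours_until_threshold(hours: np.ndarray, readings: np.ndarray, threshold: float) -> float:
    """Extrapolate a linear trend to estimate when `readings` reach `threshold`.

    Returns hours from the last observation, or infinity if the trend is
    flat or improving.
    """
    slope, intercept = np.polyfit(hours, readings, deg=1)  # least-squares line
    if slope <= 0:
        return float("inf")
    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - hours[-1])


# Hypothetical hourly vibration readings (mm/s) from one pump.
hours = np.array([0, 1, 2, 3, 4, 5], dtype=float)
vibration = np.array([2.0, 2.2, 2.4, 2.6, 2.8, 3.0])
remaining = hours_until_threshold(hours, vibration, threshold=4.0)
print(f"schedule maintenance within ~{remaining:.1f} hours")
```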
    65. Top 5 Big Data Projects: 4. Expand Data Warehouse
        • Customer profile (email/doc/call contents)
        • Predicted behavior (man/machine/process)
        • Market segmentation
    66. Top 5 Big Data Projects: 5. Explore and Discover
        • Cost of new customer
        • Cost of a new product
        • Efficacy of treatment
        • Predictive analytics
        • Data science analysis
    67. Moving Forward
        • Pick your team
        • Call your shot
        • Assemble your tools
        • Prove the value (and your good judgment)
        • Plot your course
    68. Guide through to complete adoption
    69. Why Partner?
        • What does a Strategic Partnership look like?
        • What is the role of a data scientist?
    70. Tools
        • The tools are great, but…
        • Owning a Hadoop cluster doesn’t make a Big Data practice
        • Just like owning a reporting tool doesn’t mean you have a strong Business Intelligence initiative
        • …it takes strategy and experts
    71. Scenario A
        • Retrained DBA or Developer
        • Cost Model
        • Looking for Theta with Linear Regression
        • Local Minimum problems
        • Lots of Iteration
        • Even in a matrix / vector world, may iterate
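Scenario A describes finding the linear regression parameters θ by repeatedly stepping down a cost function. A minimal sketch of that batch gradient descent loop, assuming the usual mean-squared-error cost J(θ) = (1/2m)·Σ(Xθ - y)²; the learning rate, iteration count, and data are hypothetical. One caveat worth knowing: for ordinary least-squares linear regression this cost is convex and has a single global minimum, so in practice the pain points are tuning the learning rate and the many passes over the data needed to converge; local minima become a real concern with non-convex models.

```python
import numpy as np


def gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float = 0.1,
                     iterations: int = 5000) -> np.ndarray:
    """Batch gradient descent for linear regression.

    X is m x n with a leading column of ones for the intercept.
    Minimizes J(theta) = (1 / 2m) * sum((X @ theta - y) ** 2).
    `alpha` and `iterations` are hypothetical tuning choices.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m  # dJ/dtheta: one full pass over the data
        theta -= alpha * gradient
    return theta


# Hypothetical noise-free data generated from y = 1 + 2x.
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
theta = gradient_descent(X, y)
print(np.round(theta, 3))
```

Every one of those thousands of iterations touches every row, which is what makes this approach expensive at Big Data scale.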
    72. Scenario B
        • Data Scientist
        • Linear Algebra solving data in chunks
        • Reducing several hundred iterations to one
        • Use of proper data structures to leverage matrix / vector operations
        • MIMD vs SIMD on the CPU
        • Again, large cycles of optimization
        • No local minimum problems
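Scenario B's “linear algebra solving data in chunks” can be read as the normal-equation approach: instead of iterating, solve XᵀXθ = Xᵀy once. Because XᵀX and Xᵀy are sums over rows, each chunk can compute its own partial products (a map step) and the partials can simply be added (a reduce step) before a single solve. This is a hedged sketch of that idea in NumPy, assuming the number of features is small enough for an n × n matrix to fit in memory; the chunk size and data are hypothetical, and `numpy.linalg.solve` is used instead of an explicit inverse for numerical stability.

```python
from typing import Iterable, Tuple

import numpy as np


def partial_normal_equations(X_chunk: np.ndarray, y_chunk: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Map step: one chunk's contribution to X^T X and X^T y."""
    return X_chunk.T @ X_chunk, X_chunk.T @ y_chunk


def solve_from_chunks(chunks: Iterable[Tuple[np.ndarray, np.ndarray]]) -> np.ndarray:
    """Reduce step: sum the partial products, then solve once for theta."""
    xtx, xty = None, None
    for X_chunk, y_chunk in chunks:
        a, b = partial_normal_equations(X_chunk, y_chunk)
        xtx = a if xtx is None else xtx + a
        xty = b if xty is None else xty + b
    return np.linalg.solve(xtx, xty)  # no learning rate, no iteration


# Hypothetical noise-free data from y = 1 + 2x, split into chunks of 10 rows.
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x
chunks = [(X[i:i + 10], y[i:i + 10]) for i in range(0, len(x), 10)]
print(np.round(solve_from_chunks(chunks), 6))
```

The matrix products are where vectorized, SIMD-friendly BLAS routines do the heavy lifting. The trade-off is that solving the n × n system costs roughly O(n³), so with very many features an iterative method can become attractive again.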
    73. Hats to Wear
        • Algorithms in Context
        • Linear Algebra
        • Data Structures
        • CPU Architecture
            • rare in the modern business app developer
        • Concurrency Issues
        • Cost Modeling
        • Data Visualization
    74. Why a Partner?
        Multiple-discipline jobs are hard, with large barriers to entry
        • Even with high market rates, supply can’t keep up
            • Analogous to large ERP talent
        • Retaining this talent is hard
            • Particularly when under-utilized
            • Rather than keeping that skill sharp artificially, an outsourced data scientist stays sharp by working on real solutions
    75. …in short
        You don’t keep a trial attorney around full-time for the few times you may need them.
        Why keep a data scientist full-time?
    76. Is there a role for internal?
        • Tweaks to MapReduce jobs
        • Debugging
        • Reporting
        • Integrating new sources
        • Hardware / Infrastructure
        • Pilots
    77. Explorer. Navigator. Guide.
        • Reducing risk of failure
        • Working with your team
        • Identifying initial projects
        • Selecting best tools
        • Creating a strategic adoption roadmap
        • Avoiding common pitfalls
        • Taking you beyond the initial phase
    78. Q&A
        Doug Denton, Big Data Practice Lead
        m) 440.478.6003
        e) doug.denton@lvlsvn.com
        Tim Hoolihan, CTO, Dir. of Strategic Services
        m) 330.338.1532
        e) tim.hoolihan@lvlsvn.com