Introduction
Navigate. Doug Denton
Big Data Practice Lead
Level Seven
Guide. Tim Hoolihan
CTO, Dir. of Strategic Services
Level Seven
Explore. Michael DeAloia
Regional Vice President
Expedient Data Centers
Agenda
3:00 - Welcome & Introductions
3:05 - Explore the concept of Big Data
3:30 - Navigate through initial projects
4:00 - Beer break
4:15 - Back to work
4:30 - Guide to full company adoption
5:00 - Q&A and more beer (tour departs)
Explore
the concept
“BIG DATA DREAMS”
Michael C. DeAloia
Regional Vice President - Cleveland
Explore. Navigate. Guide.
Founded in 2001
Cleveland, Ohio
12-Year Tenure in Technology Infrastructure Services
Scalable Platform Design
Petabytes of Storage, 100s of Terabytes of Memory in our Cloud
Thousands of Customers
2x Growth Y/Y in Cloud Services
BIGDATADREAMS::THEEXPEDIENTECOLOGY
Cloud & MS
Enterprise Cloud
Managed OS & App
Storage & Backup
Network Services
1 Boston
2 Baltimore
2 Cleveland
1 Columbus
1 Indianapolis
2 Pittsburgh
10GbE
Interconnect
What is Big Data?
History of Big Data
Big Data by the Numbers
8 Laws of Big Data
Q&A
BIGDATADREAMS::ROADMAP
What is Big Data?
Gartner has defined ‘Big Data’ as
a Strategic Technology for 2013.
BIGDATADREAMS::WHATISBIGDATA
What is Big Data?
Big Data /bɪɡ dā'tə/ n. A collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Big Data challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization.
What is Big Data?
The three Vs characterize what big data is all about, and also help define the major issues that IT needs to address:
•  Volume: The massive scale and growth of unstructured data outstrips traditional storage and analytical solutions.
•  Variety: Traditional data management processes can’t cope with the heterogeneity of big data, including “shadow” or “dark” data such as access traces and Web search histories.
•  Velocity: Data is generated in real time, with demands for usable information to be served up immediately.
What is Big Data?
“Big Data is the new oil.”
- Bryan Trogdon, as quoted in ‘The Future of Big Data,’ Pew Research Survey
What is Big Data?
Big Data Analytics IS:
•  A technology-enabled strategy for gaining richer, deeper insights into customers, partners, and the business, and ultimately gaining competitive advantage.
•  Working with data sets whose size and variety is beyond the ability of typical database software to capture, store, manage, and analyze.
•  Processing a steady stream of real-time data in order to make time-sensitive decisions faster than ever before.
•  Distributed in nature. Analytics processing goes to where the data is for greater speed and efficiency.
•  A new paradigm in which IT collaborates with business users and “data scientists” to identify and implement analytics that will increase operational efficiency and solve new business problems.
•  Moving decision making down in the organization and empowering people to make better, faster decisions in real time.
Big Data Analytics IS NOT:
•  Just about technology. At the business level, it’s about how to exploit the vastly enhanced sources of data to gain insight.
•  Only about volume. It’s also about variety and velocity. But perhaps most important, it’s about value derived from the data.
•  Generated or used only by huge online companies like Google or Amazon anymore. While Internet companies may have pioneered the use of big data at web scale, applications touch every industry.
•  About “one-size-fits-all” traditional relational databases built on shared disk and memory architecture. Big data uses a grid of computing resources for massively parallel processing (MPP).
•  Meant to replace relational databases or the data warehouse. Structured data continues to be critically important. However, traditional systems may not be suitable for the new sources and contexts of big data.
What is Big Data?
“Every two days now we create as much information as we did from the dawn of civilization up until 2003. That’s something like five exabytes of data.”
- Eric Schmidt, CEO, Google
“By 2015 the digital universe is expected to reach 8 zettabytes.”
- Intel
1 zettabyte = 18 million copies of the Library of Congress
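The figures on these slides can be checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal (SI) units, which the slides do not specify, and the function names are made up for this example. It derives the Library of Congress size implied by the 18 million ratio and how long it takes to produce one zettabyte at five exabytes every two days.

```python
# Decimal (SI) byte units; an assumption, since the slides do not say SI or binary.
EXABYTE = 10**18
ZETTABYTE = 10**21

# Ratio quoted on the slide: 1 zettabyte = 18 million Library of Congress copies.
COPIES_PER_ZETTABYTE = 18_000_000


def implied_library_size_bytes() -> float:
    """Size of one Library of Congress copy implied by the slide's ratio."""
    return ZETTABYTE / COPIES_PER_ZETTABYTE


def days_to_reach(total_bytes: int, bytes_per_two_days: int) -> float:
    """Days needed to produce total_bytes at a constant two-day rate."""
    return 2 * total_bytes / bytes_per_two_days


if __name__ == "__main__":
    size_tb = implied_library_size_bytes() / 10**12
    print(f"Implied Library of Congress size: {size_tb:.1f} TB")
    days = days_to_reach(ZETTABYTE, 5 * EXABYTE)
    print(f"Days to produce one zettabyte: {days:.0f}")
```

Under these assumptions the ratio implies roughly 56 TB per Library of Congress copy, and one zettabyte takes about 400 days at the quoted rate.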
Who works Big Data?
A new kind of professional is helping organizations make sense of the massive streams of digital information: the data scientist. Data scientists are responsible for modeling complex business problems, discovering business insights, and identifying opportunities.
They bring to the job:
•  Skills for integrating and preparing large, varied data sets
•  Advanced analytics and modeling skills to reveal and understand hidden relationships
•  Business knowledge to apply context
•  Communication skills to present results
BIGDATADREAMS::HISTORYOFBIGDATA
BIGDATADREAMS::BYTHENUMBERS
More sources and more devices
• Mobile
  • Pictures
  • Video
  • SMS
  • GPS
• Social Media
  • Facebook
  • Twitter
  • YouTube
  • Reviews
• Automated Sources
  • RFID
  • Telemetry
  • Security cameras
Real-time correlation of data can be turned into golden nuggets of information.
Big Data Law #1
The Faster You Analyze Your Data, the
Greater its Predictive Power.
BIGDATADREAMS::THE8LAWSOFBIGDATA
Great list developed by Dave Feinleib – Managing Director of Big Data Group.
Big Data Law #2
Maintain one copy of your data, not
dozens.
Big Data Law #3
Use more diverse data, not just more
data.
Big Data Law #4
Data has value far beyond what you
originally anticipate.
Big Data Law #5
Plan for Exponential Growth
Big Data Law #6
Solve a real pain point.
Big Data Law #7
Put data and humans together to get
more insight.
Big Data Law #8
Big Data is transforming business the
same way IT did.
Q&A
Michael C. DeAloia
Regional Vice President
Expedient Data Centers
m) 216.212.4067
e) michael.dealoia@expedient.com
BIGDATADREAMS::QUESTIONS&ANSWERS
Charting the Course to Big Data Implementation.
Doug Denton, Big Data Practice Lead
Tim Hoolihan, CTO, Dir. of Strategic Services
Explore. Navigate. Guide.
What’s Different About Big Data?
•  Data that IT historically ignores
•  Too much, too fast, too dirty to handle
•  Represents 80% of all data
•  Very different way of thinking about data
•  Very different way of processing data
•  A VERY BIG DEAL
You were blind, but now you see.
Why Now?
•  Pretending 80% of data did not exist is OK
•  Not really, numb & blind is no way to live
•  Revolutionary tools now available
•  Google, Facebook, Yahoo, IBM started
•  Open source community advances
•  HDFS, Map Reduce, Pig, Hive, JAQL, …
•  Inexpensive, networked infrastructure available
It is all about technology, baby.
Where are we coming from?
•  Relational databases are the norm
•  Stored after analysis and transformation
•  Optimized for predicted retrieval
•  Best for well-understood, highly structured data
•  Only works for 20% of our data
When it works, it works really well.
Where we’re going – Data at Rest
•  Data stored in original format
•  Divide and conquer to process
•  Best for massive, poorly structured data
•  Supplements relational database tools
Think “batch processing”.
Where we’re going – Data in Motion
•  Data that you never write down
•  Network traffic, sensor data, phone calls
•  Data that never stops
•  Processing is done in real time
•  Processing is done in memory
•  Tools are less numerous
•  IBM Streams
Think “watch a stream flow by”.
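A minimal sketch of the "data in motion" idea, assuming a hypothetical sensor feed (the `sensor_feed` function and its values are invented for illustration and are not from the slides). Each reading is processed in memory as it arrives and then discarded; only a small rolling window is ever held.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_average(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # oldest reading is evicted automatically
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # subtract the reading about to be evicted
        recent.append(value)
        total += value
        yield total / len(recent)


def sensor_feed() -> Iterator[float]:
    """Hypothetical stand-in for an endless sensor or network feed."""
    for value in [10.0, 12.0, 11.0, 50.0, 13.0]:
        yield value


if __name__ == "__main__":
    for average in rolling_average(sensor_feed(), window=3):
        print(f"{average:.2f}")
```

Because the function is a generator, nothing is written down and memory use stays fixed at the window size no matter how long the stream runs.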
Where are we now?
•  Ecosystem of supporting tools well formed
•  Thanks Google, FB, Yahoo, IBM
•  Thanks Open Source Community
•  Tool sets offered as premium aggregations
•  IBM BigInsights
•  Cloud infrastructure economical & available
•  Expedient
Tools are ready for the craftsman.
What are the Tools?
•  Distributed File System
•  Distributed Map Reduce Runtime
•  Jaql, Pig, Hive, Oozie, HBase, R and others
Find a knowledgeable craftsman.
What Makes the Tools Different?
First and foremost - the run-time environment
•  Massively distributed
•  Redundant
•  Anticipates failure
•  Runs on commodity servers & operating systems
What else?
Divide and conquer on a massive scale
•  Break data into smaller chunks (map)
•  Execute on chunks in parallel
•  Execute code as close to the data as possible
•  Execute multiple instances simultaneously
•  Work with name-value pairs (tuples)
•  Assemble comprehensive answer (reduce)
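The steps above can be sketched on a single machine. This is an illustration under stated assumptions, not how a Hadoop cluster is programmed: word counting is an example task chosen here, the function names are hypothetical, and a thread pool stands in for the many servers that would each run the map step next to their own chunk of data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def split_into_chunks(lines: list[str], chunk_size: int) -> list[list[str]]:
    """Break the input into smaller chunks that can be processed independently."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk: list[str]) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) name-value pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_pairs(pairs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Reduce step: consolidate pairs that share a key into one total per key."""
    totals: dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(lines: list[str], chunk_size: int = 2) -> dict[str, int]:
    chunks = split_into_chunks(lines, chunk_size)
    # Threads stand in for cluster nodes; each one maps its own chunk in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = pool.map(map_chunk, chunks)
        pairs = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    return reduce_pairs(pairs)


if __name__ == "__main__":
    print(word_count(["big data", "big deal", "data at rest"]))
```

The map function never needs to see more than its own chunk, which is what lets the real runtimes move the code to the data instead of the data to the code.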
Technology Adoption Curve
The Challenge
•  New way of thinking about data
•  Everything is valuable data
•  New way of thinking about processing data
•  No normalization, no relationships
•  Program extracts attribute and forms tuple
•  Tuples consolidated and reduced
•  Integration focus more on external sources, less DWs
•  New tools and approaches
•  Lots of specialized tools community-managed
•  Technology adoption curve progressing rapidly
Meeting the Challenge
•  Embrace the opportunity/inevitability
•  Consider your place on the adoption curve
•  Effectively, Efficiently, Intelligently:
•  Experiment with technologies
•  Prove concepts valuable to organization
•  Prototype high value applications for quick wins
•  Enable staff & organization
•  Make a practical plan based on experience
Now is the time for leadership.
Big Data is a Big Deal for Business
•  Bigger deal for CEO than for IT
•  CEO singles look better than IT home runs
•  Better that the CEO drags IT than IT pushes the CEO
•  You will need money
•  You will need help keeping the faith
Dig where gold has been found.
Now it’s time for beer.
Top 5 Projects
for Big Data
His little black book is
considered Big Data.
Think global. Drink local.
Navigate
initial projects
Proving the Value
•  GM/CEO needs to be in front of IT
•  Think POV, not POC
•  Get rid of the engineering mindset
•  Stop thinking about specific tools – for now
•  Sell the story without mentioning the tools
You still need the tools!
Top 5 Big Data Projects:
The Categories
1.  Know Your Customer
2.  Secure Cyber Assets
3.  Optimize Operations
4.  Expand Data Warehouse
5.  Explore & Discover
Top 5 Big Data Projects:
1. Know Your Customer.
•  Social Media
•  Measure and track customer sentiment
•  Real-time customer engagement
•  Real-time selling
•  Customer profiling
•  Recent transactions
•  Call center and web site activity
•  Rate likelihood of defection
T-Mobile cut defections by 50% in one quarter.
Top 5 Big Data Projects:
2. Secure Cyber Assets
•  Analyze
•  Logs to inform security policies
•  Network traffic to identify outliers & patterns
•  Enforce in real-time
•  Data in motion solution
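One simple way to "identify outliers" in traffic is a z-score test on per-host request counts. The sketch below assumes that approach purely for illustration; the slides do not name a technique, and the host addresses, counts, and threshold are invented.

```python
from statistics import mean, pstdev


def find_outliers(counts: dict[str, int], threshold: float = 2.0) -> list[str]:
    """Return hosts whose count is more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every host behaves identically, so nothing stands out
    return sorted(host for host, count in counts.items()
                  if (count - mu) / sigma > threshold)


if __name__ == "__main__":
    # Hypothetical requests per minute, keyed by source address.
    requests_per_host = {
        "10.0.0.1": 100, "10.0.0.2": 110, "10.0.0.3": 95,
        "10.0.0.4": 105, "10.0.0.5": 98, "10.0.0.6": 102,
        "10.0.0.7": 900,
    }
    print(find_outliers(requests_per_host))
```

In a data-in-motion setting the same test would run continuously over a sliding window of recent traffic rather than over a stored batch.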
Top 5 Big Data Projects:
3. Optimize Operations
•  Predict equipment failures
•  Just-in-time maintenance
•  Identify sources of inefficiencies
Top 5 Big Data Projects:
4. Expand Data Warehouse
•  Customer profile (email/doc/call contents)
•  Predicted behavior (man/machine/process)
•  Market segmentation
Top 5 Big Data Projects:
5. Explore and Discover
•  Cost of new customer
•  Cost of a new product
•  Efficacy of treatment
•  Predictive analytics
•  Data science analysis
Moving Forward
•  Pick your team
•  Call your shot
•  Assemble your tools
•  Prove the value (and your good judgment)
•  Plot your course
Guide
through to complete adoption
Why Partner?
•  What does a Strategic Partnership look like?
•  What is the role of a data scientist?
Tools
•  The tools are great, but…
•  Owning a Hadoop cluster doesn’t make a Big Data practice
•  Just like owning a reporting tool doesn’t mean you have a strong Business Intelligence initiative
•  …it takes strategy and experts
Scenario A
•  Retrained DBA or Developer
•  Cost Model
•  Looking for Theta with Linear Regression
•  Local Minimum problems
•  Lots of Iteration
•  Even in a matrix / vector world, may iterate
Scenario B
•  Data Scientist
•  Linear Algebra solving data in chunks
•  Reducing several hundred iterations to one
•  Use of proper data structures to leverage matrix / vector operations
•  MIMD vs SIMD on the CPU
•  Again, large cycles of optimization
•  No local minimum problems
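The contrast between the two scenarios can be sketched with a one-feature linear regression. This is an assumption-laden illustration, not the presenters' code: the data, learning rate, and function names are invented, and the closed-form solve here is the 2x2 normal equation written out by hand rather than a matrix library call. Scenario A finds theta by iterating thousands of times; Scenario B solves for it in one step.

```python
def fit_by_gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Scenario A: search for theta by repeatedly stepping down the cost gradient."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / n
        grad1 = sum(e * x for e, x in zip(errors, xs)) / n
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


def fit_by_normal_equation(xs, ys):
    """Scenario B: solve the least-squares normal equations directly, no iteration."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    determinant = n * sum_xx - sum_x * sum_x
    theta1 = (n * sum_xy - sum_x * sum_y) / determinant
    theta0 = (sum_y - theta1 * sum_x) / n
    return theta0, theta1


if __name__ == "__main__":
    # Hypothetical data lying exactly on y = 1 + 2x.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(fit_by_gradient_descent(xs, ys))
    print(fit_by_normal_equation(xs, ys))
```

Both routes arrive at essentially the same theta on this toy data; the difference is the amount of work, which is the point the slides make about knowing the linear algebra.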
Hats to Wear.
•  Algorithms in Context
•  Linear Algebra
•  Data Structures
•  CPU Architecture
•  rare in the modern business app developer
•  Concurrency Issues
•  Cost Modeling
•  Data Visualization
Why a Partner?
Multidisciplinary jobs are hard, with large barriers to entry
•  Even with high market rates, supply can’t keep up
•  Analogous to large ERP talent
•  Retaining this talent is hard
•  Particularly when under-utilized
•  Rather than keeping that skill sharp artificially, an outsourced data scientist is keeping sharp with real solutions
…in short
You don’t keep a trial attorney around full-time for the few
times you may need them.
Why keep a data scientist full-time?
Is there a role for internal?
•  Tweaks to Map/Reduce jobs
•  Debugging
•  Reporting
•  Integrating new sources
•  Hardware / Infrastructure
•  Pilots
Explorer. Navigator. Guide
•  Reducing risk of failure
•  Working with your team
•  Identifying initial projects
•  Selecting best tools
•  Creating a strategic adoption roadmap
•  Avoiding common pitfalls
•  Taking you beyond the initial phase
Q&A
Doug Denton, Big Data Practice Lead
m) 440.478.6003
e) doug.denton@lvlsvn.com
Tim Hoolihan, CTO, Dir. of Strategic Services
m) 330.338.1532
e) tim.hoolihan@lvlsvn.com

Level Seven - Expedient Big Data presentation


Editor's Notes

  • #11 Gartner defines a strategic technology as one with the potential for significant impact on the enterprise within the next 3 years. Factors that denote high potential for significant impact include:
    - High potential for disruption to IT or the business
    - Need for a major dollar investment
    - Risk of being late to adopt