Big Data vs. IT
STKI Summit 2013
IT at the crossroads: Lead, follow or get out of the way
Pini Cohen
Source: http://machinationsintomadness.com/good-vs-evil-how-about-good-vs-good/
Big Data Definition – 4 V’s (or more…)
• Volume – tens of TBs and more (15-20TB+)
• Velocity – the speed at which data is added (10M items per hour and more)
and the speed at which the data needs to be processed
• Variety – different types of data, structured and unstructured. Often
involves Internet of Things and social media data, but also voice, video, etc.
• Variability – the ability to cope with new attributes and changing data types
without interrupting the analytical process (without “import-export”)
• Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
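To put the velocity figure in perspective, the short calculation below converts the slide's "10M items per hour" into per-second and per-day rates. It is only back-of-the-envelope arithmetic; the function names are made up for this illustration.

```python
ITEMS_PER_HOUR = 10_000_000  # velocity figure quoted on the slide


def items_per_second(items_per_hour):
    """Convert an hourly ingest rate to a per-second rate."""
    return items_per_hour / 3600


def items_per_day(items_per_hour):
    """Convert an hourly ingest rate to a daily total."""
    return items_per_hour * 24


rate = items_per_second(ITEMS_PER_HOUR)
daily = items_per_day(ITEMS_PER_HOUR)
print(f"{rate:.0f} items per second")  # prints "2778 items per second"
print(f"{daily:,} items per day")      # prints "240,000,000 items per day"
```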
Pini Cohen's work Copyright@2013
Do not remove source or attribution
from any slide, graph or portion of
graph
The origins of the 3V’s:
• 2001 research note by Doug Laney of META Group (now part of Gartner)
“Big Data” theme: main current usage
• “Big Data is just marketing jargon.” - Doug Laney, Gartner
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
Image source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
Big Data at work:
• Orbitz Worldwide has collected 750 terabytes of unstructured
data on their consumers’ behavior – detailed information from
customer online visits and browsing sessions. Using Hadoop,
models have been developed intended to improve search results
and tailor the user experience based on everything from
location, interest in family travel versus solo travel, and even the
kind of device being used to explore travel options.
• The result? To date, a 7% increase in interaction rate, 37%
growth in stickiness of sessions and a net 2.6% in booking path
engagement.
Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
DoD R&D prioritizes 'Big Data'
• The Pentagon has invested billions in a new generation of
electronic systems that gather and store vast quantities of
imagery and other data from the battlefield, and the digital
deluge is so vast that sifting through it manually to generate
actionable information is not a sustainable option, officials said.
• The Pentagon joined four other federal departments and
agencies at a White House event in late March to announce
$200 million in governmentwide big data research efforts.
Source: http://www.federalnewsradio.com/885/2824044/DoD-RD-prioritizes-Big-Data
Technology: Elements & Concepts
• Storing data for analytics (mainly):
  • HDFS – Hadoop Distributed File System
  • MapReduce – programming model, mainly for analytics
  • Other add-ons: Pig, Hive, JAQL (IBM)
• Storing and retrieving data – DBMS:
  • NoSQL DBMS (“not only SQL”):
    • Cassandra
    • MongoDB
    • CouchDB
    • HBase
    • Redis
    • Neo4j
    • Riak
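As a rough illustration of the MapReduce programming model mentioned above, here is a minimal single-process word count in plain Python. It does not use Hadoop or HDFS; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this sketch and only mimic the three stages that a real framework would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data is coming", "big data vs traditional IT"])
print(counts["big"], counts["data"], counts["coming"])  # prints "2 2 1"
```

In a real cluster the map and reduce functions run on many machines in parallel and the framework performs the shuffle; here everything runs in one process purely to show the shape of the model.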
Big Data technologies (Hadoop etc.) vs. traditional IT - Infrastructure
Big Data | Traditional IT
Local storage | Centralized storage
Cheap HW & white boxes | Brand-name redundant servers
Is standardization needed?! (at the HW level). No server virtualization. | Standard infrastructure and virtual servers.
Why do I need backup? How do I tackle DRP (compute clusters that are stretched over locations)? | Well-established backup and DRP procedures
Open source solutions | Traditional vendors
In a new patch for a specific issue, it is sometimes written “not implemented yet” | Mature products and procedures
Different kind of programming (map-reduce), no joins | Traditional programming, SQL
Will Big Data infrastructure be part of the existing infrastructure, or will it be developed as a new domain?
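The "no joins" entry in the comparison can be made concrete with a small sketch. In SQL a join is one declarative statement; in map-reduce style the programmer typically tags records by source, groups them by key, and pairs them up in the reduce step (often called a reduce-side join). The data, tags, and function names below are hypothetical and the code runs in a single process, not on Hadoop.

```python
from collections import defaultdict

# Hypothetical input "tables": (customer_id, name) and (customer_id, product)
customers = [(1, "Alice"), (2, "Bob")]
orders = [(1, "flight"), (1, "hotel"), (2, "car")]


def map_customer(record):
    """Tag a customer record so the reducer can tell the sources apart."""
    customer_id, name = record
    return customer_id, ("C", name)


def map_order(record):
    """Tag an order record with its source."""
    customer_id, product = record
    return customer_id, ("O", product)


def reduce_join(customer_id, tagged_values):
    """Pair every customer name with every order that shares the key."""
    names = [value for tag, value in tagged_values if tag == "C"]
    products = [value for tag, value in tagged_values if tag == "O"]
    return [(customer_id, name, product) for name in names for product in products]


# Shuffle: group the tagged records from both sources by customer_id
grouped = defaultdict(list)
for key, tagged in [map_customer(c) for c in customers] + [map_order(o) for o in orders]:
    grouped[key].append(tagged)

joined = [row for key, values in grouped.items() for row in reduce_join(key, values)]
print(sorted(joined))
```

The equivalent in a traditional DBMS would be a single `SELECT ... JOIN ... ON customer_id` statement, which is the contrast the slide points at. Add-ons such as Pig and Hive exist partly to hide this kind of hand-written plumbing.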
Big Data technologies (Hadoop etc.) vs. traditional IT - BI
• Data Scientist vs. BI
• Data science incorporates varying elements and
builds on techniques and theories from many
fields, including math, statistics, data
engineering, pattern recognition and learning,
advanced computing, visualization, uncertainty
modeling, data warehousing, and high
performance computing with the goal of
extracting meaning from data and creating data
products. (Source: Wikipedia)
What is the business value of big data analytics?
• Big data is now a technology looking for a business need
• It can mean doing the same thing but better / faster (better
segmentation, more accurate analysis model)
• It can mean doing things cheaper
• Or it can mean doing completely new things (telematics,
sentiment analysis, recommendation engine, matching
competition’s pricing in real time, being able to analyze data we
haven’t been able to analyze in the past)
Decision making – old school vs. new school (big data)
• Old School:
  • Phase 1: Analyze existing data and prepare a general model
  • Phase 2: Apply the general model to a specific client
  • This means applying the same model to many clients as they arrive
• Issues with Old School decision making:
  • Time gap between preparing and applying the model
  • The number of combinations might be too big for a general model (example:
  recommendation based on interest)
  • The general model is biased towards the “mainstream” population
• New School (Big Data):
  • Phase 1: Prepare a specific model for the client and apply it instantly
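The contrast between the two approaches can be shown with a toy example. The "model" here is deliberately trivial (the most frequent interest in a list of events), and the clients, events, and function names are all hypothetical; the point is only to show the mainstream bias of a single general model versus a model built per client at request time.

```python
from collections import Counter

# Hypothetical browsing events: (client, interest)
events = [
    ("ann", "family"), ("ann", "family"), ("cat", "family"),
    ("dan", "family"), ("bob", "solo"), ("bob", "solo"),
]


def train_general_model(all_events):
    """Old school, phase 1: one model for everyone (the most common interest)."""
    return Counter(interest for _, interest in all_events).most_common(1)[0][0]


def recommend_old_school(general_model, client):
    """Old school, phase 2: every client gets the same pre-built answer."""
    return general_model


def recommend_new_school(all_events, client):
    """New school: build a model from this client's own events when asked."""
    own = Counter(interest for c, interest in all_events if c == client)
    if not own:
        # No history for this client: fall back to the general model
        return train_general_model(all_events)
    return own.most_common(1)[0][0]


general = train_general_model(events)
print(recommend_old_school(general, "bob"))  # prints "family"
print(recommend_new_school(events, "bob"))   # prints "solo"
```

Bob only ever browsed solo travel, yet the general model recommends family travel because most other clients prefer it. Building the per-client model on demand avoids both that bias and the time gap between training and use, at the cost of doing the computation for every request, which is where big data infrastructure comes in.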
You can’t run and you can’t hide – Hadoop / Big Data is coming
DBMS
