Big Data vs. IT - STKI - Pini Cohen
1. Big Data vs. IT
STKI Summit 2013
IT at the crossroads:
Lead, follow or get out of the way
Pini Cohen
Source: http://machinationsintomadness.com/good-vs-evil-how-about-good-vs-good/
2. Big Data Definition – 4 V’s (or more…)
• Volume – tens of TBs and more (15-20TB+)
• Velocity – the speed at which data is added (10M items per hour and more),
and the speed at which the data needs to be processed
• Variety – different types of data, structured and unstructured. In many cases this
involves the Internet of Things and social media, but also voice, video, etc.
• Variability – the ability to cope with new attributes and changing data types
without interrupting the analytical process (without “import-export”)
• Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
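To give the Velocity threshold a sense of scale, the following rough calculation converts the 10M items per hour figure into per-second and per-day rates. The average item size of 1,000 bytes is a hypothetical assumption for illustration only and does not come from the slides.

```python
# Back-of-the-envelope check of the "Velocity" threshold quoted on the slide.
ITEMS_PER_HOUR = 10_000_000  # figure from the slide
SECONDS_PER_HOUR = 3600

items_per_second = ITEMS_PER_HOUR / SECONDS_PER_HOUR
items_per_day = ITEMS_PER_HOUR * 24

# ASSUMPTION: average item size, not stated in the slides.
ASSUMED_ITEM_BYTES = 1_000

bytes_per_day = items_per_day * ASSUMED_ITEM_BYTES
# Days needed to accumulate the 15 TB "Volume" threshold (decimal terabytes).
days_to_reach_15_tb = 15 * 10**12 / bytes_per_day

print(f"{items_per_second:.0f} items/second")
print(f"{items_per_day:,} items/day")
print(f"{days_to_reach_15_tb:.1f} days to reach 15 TB at {ASSUMED_ITEM_BYTES} bytes/item")
```

Under that assumed item size, the Velocity and Volume thresholds describe workloads of a similar order: a stream at the quoted rate reaches the quoted volume in roughly two months.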
Pini Cohen's work Copyright@2013
Do not remove source or attribution
from any slide, graph or portion of
graph
3. The origins of the 3V’s:
• 2001 research note by Doug Laney of META Group (now part of Gartner), “3D Data Management: Controlling Data Volume, Velocity, and Variety”
4. Main current usage of the “Big Data” theme:
• “Big Data” is just marketing jargon. – Doug Laney, Gartner
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
5. Big Data at work:
• Orbitz Worldwide has collected 750 terabytes of unstructured
data on their consumers’ behavior – detailed information from
customer online visits and browsing sessions. Using Hadoop,
models have been developed intended to improve search results
and tailor the user experience based on everything from
location, interest in family travel versus solo travel, and even the
kind of device being used to explore travel options.
• The result? To date, a 7% increase in interaction rate, 37%
growth in stickiness of sessions and a net 2.6% in booking path
engagement.
Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
6. DoD R&D prioritizes 'Big Data'
• The Pentagon has invested billions in a new generation of
electronic systems that gather and store vast quantities of
imagery and other data from the battlefield, and the digital
deluge is so vast that sifting through it manually to generate
actionable information is not a sustainable option, officials said.
• The Pentagon joined four other federal departments and
agencies at a White House event in late March to announce
$200 million in governmentwide big data research efforts.
Source: http://www.federalnewsradio.com/885/2824044/DoD-RD-prioritizes-Big-Data
7. Technology: Elements and Concepts
• Storing data for analytics (mainly):
• HDFS – Hadoop Distributed File System
• MapReduce – programming model, mainly for analytics
• Other “add-ons”: Pig, Hive, JAQL (IBM)
• Storing and retrieving data – DBMS:
• NoSQL DBMS (“not only SQL”):
• Cassandra
• MongoDB
• CouchDB
• HBase
• Redis
• Neo4j
• Riak
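As a rough illustration of the MapReduce programming model listed above, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical and only mimic the three stages a framework would run across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input records, for illustration only.
docs = ["big data is coming", "big data vs traditional IT"]
counts = word_count(docs)
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can in principle be spread over many cheap machines, which is the property the model is built around.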
8. Big Data technologies (Hadoop etc.) vs. traditional IT - Infrastructure
Big Data | Traditional IT
Local storage | Centralized storage
Cheap hardware, “white boxes” | Brand-name redundant servers
Is standardization needed (at the hardware level)? No server virtualization. | Standard infrastructure and virtual servers
Why do I need backup? How do I tackle DRP (compute clusters stretched across locations)? | Well-established backup and DRP procedures
Open source solutions | Traditional vendors
A new patch for a specific issue sometimes says “not implemented yet” | Mature products and procedures
Different kind of programming (MapReduce), no joins | Traditional programming, SQL
Will Big Data infrastructure become part of the existing infrastructure, or will it be developed as a new domain?
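The “no joins” point can be made concrete with a small sketch. In SQL a join is one declarative clause; in a MapReduce-style program it typically has to be written by hand, for example as a reduce-side join in which both inputs are tagged and grouped by the join key. The code below is a single-process illustration in plain Python, not Hadoop code, and the tables, names, and helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data, not from the slides.
customers = [(1, "Dana"), (2, "Avi")]                      # (customer_id, name)
orders = [(1, "flight"), (1, "hotel"), (2, "car rental")]  # (customer_id, item)

# SQL equivalent, for comparison:
#   SELECT c.customer_id, c.name, o.item
#   FROM customers c JOIN orders o ON c.customer_id = o.customer_id;


def map_customers(record):
    """Map: key each customer by the join key and tag its origin."""
    customer_id, name = record
    return customer_id, ("customer", name)


def map_orders(record):
    """Map: key each order by the join key and tag its origin."""
    customer_id, item = record
    return customer_id, ("order", item)


def reduce_join(customer_id, tagged_values):
    """Reduce: pair every customer value with every order value for one key."""
    names = [v for tag, v in tagged_values if tag == "customer"]
    items = [v for tag, v in tagged_values if tag == "order"]
    return [(customer_id, name, item) for name in names for item in items]


# Shuffle: group all tagged values by join key.
groups = defaultdict(list)
mapped = [map_customers(c) for c in customers] + [map_orders(o) for o in orders]
for key, value in mapped:
    groups[key].append(value)

joined = [row for key in sorted(groups) for row in reduce_join(key, groups[key])]
print(joined)
```

Add-ons such as Pig and Hive exist largely to generate this kind of plumbing from a higher-level query.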
9. Big Data technologies (Hadoop etc.) vs. traditional IT - BI
• Data Scientist vs. BI
• Data science incorporates varying elements and
builds on techniques and theories from many
fields, including math, statistics, data
engineering, pattern recognition and learning,
advanced computing, visualization, uncertainty
modeling, data warehousing, and high
performance computing with the goal of
extracting meaning from data and creating data
products. (Source: Wikipedia)
10. What is the business value of big data analytics?
• Big data is now a technology looking for a business need
• It can mean doing the same thing but better or faster (better
segmentation, a more accurate analysis model)
• It can mean doing things cheaper
• Or it can mean doing completely new things (telematics,
sentiment analysis, recommendation engines, matching the
competition’s pricing in real time, analyzing data we
could not analyze in the past)
11. Decision making – old school vs. new school (big data)
• Old School:
• Phase 1: Analyze existing data and prepare a general model
• Phase 2: Apply the general model to a specific client
• This means applying the same model to many clients as they arrive
• Issues with Old School decision making:
• Time gap between preparing and applying the model
• The number of combinations might be too large for a general model (example:
recommendation based on interest)
• The general model is biased towards the “mainstream” population
• New School (Big Data):
• Phase 1: Prepare a specific model for the client and apply it instantly
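The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real recommendation engine: the “model” is simply the most frequently viewed category, and the event data, client IDs, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical browsing events: (client_id, category viewed). Not from the slides.
events = [
    ("a", "family travel"), ("a", "family travel"), ("a", "hotels"),
    ("b", "solo travel"), ("b", "solo travel"),
    ("c", "family travel"), ("c", "hotels"), ("c", "family travel"),
]


def build_general_model(all_events):
    """Old school, phase 1: one model from all existing data (most popular category)."""
    return Counter(category for _, category in all_events).most_common(1)[0][0]


def recommend_old_school(general_model, client_id):
    """Old school, phase 2: every arriving client gets the same general answer."""
    return general_model


def recommend_new_school(all_events, client_id):
    """New school: build a client-specific model on demand and apply it at once."""
    own = Counter(category for cid, category in all_events if cid == client_id)
    if not own:  # no history for this client: fall back to the general model
        return build_general_model(all_events)
    return own.most_common(1)[0][0]


general = build_general_model(events)
for client in ("a", "b", "c"):
    print(client, recommend_old_school(general, client), recommend_new_school(events, client))
```

Client "b" shows the mainstream bias: the general model recommends the globally popular category even though that client has only ever looked at something else, while the per-client model follows the client's own history.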
12. You can’t run and you can’t hide – Hadoop / Big Data is coming
(Figure label: DBMS)