5. Big Data vs. IT – STKI – Pini Cohen

  1. Big Data vs. IT
     STKI Summit 2013 – IT at the Crossroads: Lead, Follow or Get Out of the Way
     Pini Cohen
     Source: http://machinationsintomadness.com/good-vs-evil-how-about-good-vs-good/
  2. Big Data Definition – 4 V’s (or more…)
     • Volume – tens of TBs and more (15-20 TB+)
     • Velocity – the speed at which data is added (10M items per hour and more), and the speed at which the data needs to be processed
     • Variety – different types of data, structured and unstructured. In many cases this involves the Internet of Things and social media, but also voice, video, etc.
     • Variability – the ability to cope with new attributes and changing data types without interrupting the analytical process (without "import-export")
     • Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
     Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
     Pini Cohen's work, Copyright © 2013. Do not remove source or attribution from any slide, graph or portion of graph.
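The Volume and Velocity thresholds on the definition slide can be read as a rough checklist. The Python sketch below is only an illustration under stated assumptions: the `WorkloadProfile` class, its fields and the `matching_vs` function are hypothetical names, the Variety rule (two or more kinds of data) is a simplification chosen here, and the numbers are the slide's 2013 rules of thumb rather than any standard. It also shows that 10M items per hour works out to roughly 2,800 items per second.

```python
from dataclasses import dataclass, field

# Rough 2013 rules of thumb quoted on the slide; not an industry standard.
VOLUME_TB_THRESHOLD = 15               # "tens of TBs and more (15-20TB+)"
VELOCITY_ITEMS_PER_HOUR = 10_000_000   # "10M items per hour and more"


@dataclass
class WorkloadProfile:
    """Hypothetical description of a data workload, for illustration only."""
    volume_tb: float
    items_per_hour: int
    data_types: set[str] = field(default_factory=set)  # e.g. {"structured", "video"}
    schema_changes_online: bool = False  # new attributes handled without import-export?


def items_per_second(items_per_hour: int) -> float:
    """Convert an hourly ingest rate to a per-second rate."""
    return items_per_hour / 3600


def matching_vs(p: WorkloadProfile) -> list[str]:
    """Return which of the 4 V's a workload meets under the slide's thresholds."""
    vs = []
    if p.volume_tb >= VOLUME_TB_THRESHOLD:
        vs.append("Volume")
    if p.items_per_hour >= VELOCITY_ITEMS_PER_HOUR:
        vs.append("Velocity")
    if len(p.data_types) >= 2:       # assumption: several kinds of data = Variety
        vs.append("Variety")
    if p.schema_changes_online:      # copes with changing attributes in place
        vs.append("Variability")
    return vs


if __name__ == "__main__":
    clickstream = WorkloadProfile(
        volume_tb=20,
        items_per_hour=12_000_000,
        data_types={"structured", "social media"},
    )
    print(matching_vs(clickstream))                          # ['Volume', 'Velocity', 'Variety']
    print(round(items_per_second(VELOCITY_ITEMS_PER_HOUR)))  # 2778
```

The thresholds are kept as named constants so they can be adjusted as hardware and expectations change; the slide itself presents them as approximate.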
  3. The origins of the 3 V’s
     • 2002 research by Doug Laney from META Group (now Gartner)
  4. Main current usage of the “Big Data” theme
     • “Big Data” is just marketing jargon. – Doug Laney, Gartner
     Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
     Image source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
  5. Big Data at work
     • Orbitz Worldwide has collected 750 terabytes of unstructured data on its consumers’ behavior: detailed information from customer online visits and browsing sessions. Using Hadoop, it has developed models intended to improve search results and tailor the user experience based on everything from location and interest in family versus solo travel to the kind of device being used to explore travel options.
     • The result? To date, a 7% increase in interaction rate, 37% growth in session stickiness and a net 2.6% increase in booking path engagement.
     Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
  6. DoD R&D prioritizes Big Data
     • The Pentagon has invested billions in a new generation of electronic systems that gather and store vast quantities of imagery and other data from the battlefield, and the digital deluge is so vast that sifting through it manually to generate actionable information is not a sustainable option, officials said.
     • The Pentagon joined four other federal departments and agencies at a White House event in late March to announce $200 million in government-wide big data research efforts.
     Source: http://www.federalnewsradio.com/885/2824044/DoD-RD-prioritizes-Big-Data
  7. Technology: Elements and Concepts
     • Storing data for analytics (mainly):
        • HDFS – Hadoop Distributed File System
        • MapReduce – a programming method used mainly for analytics
        • Other add-ons: Pig, Hive, JAQL (IBM)
     • Storing and retrieving data – DBMS:
        • NoSQL ("not only SQL") DBMSs:
           • Cassandra
           • MongoDB
           • CouchDB
           • HBase
           • Redis
           • Neo4j
           • Riak
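MapReduce, listed on the technology slide as the programming method used mainly for analytics, splits a job into a map step that emits key/value pairs, a shuffle that groups the values by key, and a reduce step that aggregates each group. The sketch below simulates those three phases in plain Python on a word-count job. It is not Hadoop's Java API, and the function names are hypothetical; on a real cluster the framework runs many mappers and reducers in parallel over HDFS blocks and performs the shuffle itself.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group intermediate values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data is coming", "big data vs IT"]
    print(word_count(docs))
    # {'big': 2, 'coming': 1, 'data': 2, 'is': 1, 'it': 1, 'vs': 1}
```

Because each mapper call sees only one line and each reducer call sees only one key, both steps can be distributed across machines; only the shuffle needs to move data between them.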
  8. Big Data technologies (Hadoop etc.) vs. traditional IT – Infrastructure
     Big Data | Traditional IT
     Local storage | Centralized storage
     Cheap hardware (white boxes) | Brand-name redundant servers
     Is standardization needed (at the hardware level)?! No server virtualization | Standard infrastructure and virtual servers
     Why do I need backup? How do I tackle DRP (compute clusters that are stretched over several locations)? | Well-established backup and DRP procedures
     Open source solutions | Traditional vendors
     New patches for specific issues sometimes state "not implemented yet" | Mature products and procedures
     A different kind of programming (MapReduce), no joins | Traditional programming, SQL
     • Will Big Data infrastructure become part of the existing infrastructure, or will it be developed as a new domain?
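The infrastructure slide notes that MapReduce programming has no joins, whereas SQL provides them directly. One common workaround is a reduce-side join: each mapper tags its records with their source table and keys them by the join column, and the reducer combines the tagged records that arrive under the same key. The Python sketch below simulates this under assumptions: the `CUSTOMERS` and `ORDERS` tables and all function names are hypothetical, and the shuffle is a plain dictionary rather than a cluster framework.

```python
from collections import defaultdict

# Hypothetical input "tables" (illustrative data, not from the slide).
CUSTOMERS = [(1, "Alice"), (2, "Bob")]                      # (customer_id, name)
ORDERS = [(101, 1, 250.0), (102, 1, 80.0), (103, 2, 40.0)]  # (order_id, customer_id, amount)


def map_customers(rows):
    """Mapper for the customer table: key by customer_id, tag the record 'C'."""
    for customer_id, name in rows:
        yield customer_id, ("C", name)


def map_orders(rows):
    """Mapper for the order table: key by customer_id, tag the record 'O'."""
    for order_id, customer_id, amount in rows:
        yield customer_id, ("O", (order_id, amount))


def reduce_join(key, tagged_values):
    """Reducer: pair every customer record with every order record for one key."""
    names = [value for tag, value in tagged_values if tag == "C"]
    orders = [value for tag, value in tagged_values if tag == "O"]
    for name in names:
        for order_id, amount in orders:
            yield key, name, order_id, amount


def reduce_side_join(customers, orders):
    """Roughly: SELECT ... FROM customers c JOIN orders o ON c.id = o.customer_id."""
    groups = defaultdict(list)  # stands in for the framework's shuffle-by-key
    for mapped in (map_customers(customers), map_orders(orders)):
        for key, tagged in mapped:
            groups[key].append(tagged)
    joined = []
    for key in sorted(groups):
        joined.extend(reduce_join(key, groups[key]))
    return joined


if __name__ == "__main__":
    for row in reduce_side_join(CUSTOMERS, ORDERS):
        print(row)
    # (1, 'Alice', 101, 250.0)
    # (1, 'Alice', 102, 80.0)
    # (2, 'Bob', 103, 40.0)
```

The output matches an SQL inner join (a customer with no orders produces no rows). The tagging, grouping and pairing are the hand-written work that a single `JOIN` clause hides in traditional SQL programming.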
  9. Big Data technologies (Hadoop etc.) vs. traditional IT – BI
     • Data Scientist vs. BI
        • Data science incorporates varying elements and builds on techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing, with the goal of extracting meaning from data and creating data products. (Wikipedia)
  10. What is the business value of big data analytics?
     • Big data is currently a technology looking for a business need
     • It can mean doing the same things, but better or faster (better segmentation, more accurate analysis models)
     • Doing things cheaper
     • Or it can mean doing completely new things (telematics, sentiment analysis, recommendation engines, matching competitors’ pricing in real time, analyzing data we have not been able to analyze in the past)
  11. Decision making – old school vs. new school (big data)
     • Old school:
        • Phase 1: Analyze existing data and prepare a general model
        • Phase 2: Apply the general model to a specific client
        • This means applying the same model to many clients as they arrive
     • Issues with old-school decision making:
        • A time gap between preparing and applying the model
        • The number of combinations might be too large for a general model (example: recommendations based on interests)
        • The general model is biased towards the "mainstream" population
     • New school (big data):
        • Phase 1: Prepare a specific model for the client and apply it instantly
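The contrast between a general model prepared in advance and a model prepared per client at request time can be sketched with a toy travel recommender. Everything in it is hypothetical: the `CATALOG` tags, the `HISTORY` data and the scoring rule (number of shared interest tags) are assumptions chosen only to illustrate the slide's point that a popularity-based general model favors the mainstream population.

```python
from collections import Counter

# Hypothetical catalog: item -> set of interest tags (illustrative data only).
CATALOG = {
    "beach resort": {"family", "beach"},
    "ski lodge": {"solo", "ski"},
    "city hostel": {"solo", "city"},
    "theme park": {"family", "kids"},
}

# Hypothetical past purchases across all clients (batch input for the general model).
HISTORY = ["beach resort", "beach resort", "theme park", "city hostel"]


def build_general_model(history):
    """Old school, phase 1: rank items by overall popularity, offline."""
    return [item for item, _ in Counter(history).most_common()]


def recommend_old_school(general_model, k=2):
    """Old school, phase 2: every arriving client gets the same precomputed ranking."""
    return general_model[:k]


def recommend_new_school(client_interests, k=2):
    """New school: score every item for this client at request time.

    Items are ranked by how many of the client's interest tags they share;
    ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(
        CATALOG,
        key=lambda item: (-len(CATALOG[item] & client_interests), item),
    )
    return ranked[:k]


if __name__ == "__main__":
    general = build_general_model(HISTORY)
    print(recommend_old_school(general))          # ['beach resort', 'theme park']
    print(recommend_new_school({"solo", "ski"}))  # ['ski lodge', 'city hostel']
```

With this data the general model offers family-oriented items to everyone because most past buyers chose them, while the per-client model gives a solo skier solo-travel items. A real system would replace the tag-overlap rule with a proper model; the point here is only when the model is built and for whom.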
  12. You can’t run and you can’t hide – Big Data is coming
     [Figure labels: Hadoop, DBMS]