Zwei Jahre Big Data (Two Years of Big Data)

Published in: Technology, Business

  1. 1. Big Data: The Second Year. Joerg Blumtritt
  5. 5. The Future of Market Research
  6. 6. Hardware. Traditional: • exotic hardware • big central servers • SAN • RAID • hardware reliability • expensive • limited scalability. Big Data: • commodity hardware • racks of pizza boxes • Ethernet • JBOD • unreliable hardware • cost-effective • scales further
  7. 7. Software. Traditional: • monolithic • centralized storage • RDBMS • schema first • proprietary. Big Data: • distributed • storage & compute • nodes • raw data • open source
  8. 8. Volume Velocity Variety Quantification Data Science
  9. 9. ... the first V: 1. Volume – Very large data sets – Data Center → Data Warehouse → Internet Scale – Typical dimensions: billions or trillions of records, millions or billions of variables – e.g. Twitter: > 400 M Tweets per day – Technologies: MapReduce, HDFS, Project Voldemort
  10. 10. Map-Reduce http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v2.0
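The slide points to Hadoop's WordCount example. As a rough illustration of the same idea, here is a minimal single-process sketch in Python. It is not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this sketch, and the input documents are invented.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different
    # node; here the phases simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big servers", "big data small servers"]  # invented input
    print(word_count(docs))
    # {'big': 3, 'data': 2, 'servers': 2, 'small': 1}
```

Because map and reduce only ever see one document or one key at a time, the work can be spread over many commodity nodes, which is what makes the pattern attractive for the volumes named on the previous slide.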
  11. 11. second V: 1. Volume 2. Velocity – Very fast data streams – sensor data, smartphones, social media – Typical dimensions: 15k-300k/s – Real time inputs / real time outputs – Stream/event processing – Technologies: Storm, S4, Esper, HBase, Kafka
  12. 12. Storm http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html
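To make "real time inputs / real time outputs" more concrete, the following sketch keeps a rolling count per key over the most recent seconds of a stream. It is plain Python and does not use Storm or any of the other listed technologies; the class name `SlidingWindowCounter`, the window length and the example hashtags are assumptions made for illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts events per key over the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, key) pairs in arrival order
        self.counts = Counter()

    def add(self, timestamp, key):
        """Record one event, then drop everything that left the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Assumption: events arrive in timestamp order.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n=3):
        """Return the n most frequent keys currently inside the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=60)
    # Hypothetical stream of (seconds, hashtag) events.
    for ts, tag in [(0, "#tatort"), (10, "#dsds"), (30, "#tatort"), (70, "#dsds")]:
        window.add(ts, tag)
    print(window.top())
```

Each event is processed once when it arrives and the result is available immediately, in contrast to a batch job that has to rescan the stored data.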
  13. 13. ... and the last V: 1. Volume 2. Velocity 3. Variety / Variability – Manifold and highly variable data structures – data marketplaces, e.g. Datasift, GNIP, Enigma.io – No schema / NoSQL – Distributed storage – Immutability
  14. 14. {"created_at":"Sat Apr 13 08:07:34 +0000 2013", "id":322984390491774976, "id_str":"322984390491774976", "text":"getr\u00e4umt, ich h\u00e4tte \u00fcber den Skandal geblogt, dass wir immernoch geschirrsp\u00fchlen, genau wie zu Car\u00eames Zeiten.", "source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e", "truncated":false, "in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":10177792,"id_str":"10177792", "name":"Joerg Blumtritt", "screen_name":"jbenno", "location":"Stockdorf", "url":"http://slow-media.net", "description":"I just coined the word panfuturistic because it sounds cool. http://memeticturn.com/declaration-of-liquid-culture", "protected":false,"followers_count":2671,"friends_count":1599,"listed_count":141,"created_at":"Mon Nov 12 11:16:15 +0000 2007", "favourites_count":3582,"utc_offset":3600, "time_zone":"Berlin", "geo_enabled":true,"verified":false,"statuses_count":30140,"lang":"en", "contributors_enabled":false,"is_translator":false,"profile_background_color":"FFFFFF", "profile_background_image_url":"http://a0.twimg.com/profile_background_images/816896285/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/816896285/688fcbc8df9391dfd71012d06ca34002.jpeg", "profile_background_tile":false,"profile_image_url":"http://a0.twimg.com/profile_images/3315156408/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_image_url_https":"https://si0.twimg.com/profile_images/3315156408/db719e7db02772e468179545fb06e7f9_normal.jpeg", "profile_banner_url":"https://si0.twimg.com/profile_banners/10177792/1365261531", "profile_link_color":"0000FF", "profile_sidebar_border_color":"FFFFFF", "profile_sidebar_fill_color":"E0FF92", "profile_text_color":"000000", "profile_use_background_image":true,"default_profile":false,"default_profile_image":false, "following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":0,"favorite_count":0,"favorited":false,"retweeted":false,"lang":"de"}
  15. 15. Instead of fixing the consistency of the data in the structure up front, a function is defined that checks each record against the given criteria: function IsConsistent(Record, Schema) as Boolean
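The slide only gives the signature `IsConsistent(Record, Schema)`. One possible reading, sketched in Python: the schema is a set of per-field rules applied when a record is read, not a table layout enforced when it is written. The rule format (field name mapped to a predicate) and the example rules in `TWEET_SCHEMA` are assumptions for this sketch, not taken from the slides.

```python
def is_consistent(record, schema):
    """Return True if `record` satisfies every rule in `schema`.

    `schema` maps a field name to a predicate function. This rule format
    is an assumption made for the sketch; the slide does not specify one.
    """
    for field, predicate in schema.items():
        if field not in record:
            return False
        if not predicate(record[field]):
            return False
    return True


# Hypothetical rules for a tweet-like record.
TWEET_SCHEMA = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "text": lambda v: isinstance(v, str) and len(v) > 0,
    "lang": lambda v: isinstance(v, str) and len(v) == 2,
}

if __name__ == "__main__":
    tweet = {"id": 322984390491774976, "text": "hello", "lang": "de", "geo": None}
    print(is_consistent(tweet, TWEET_SCHEMA))          # extra fields are allowed
    print(is_consistent({"id": 1, "text": ""}, TWEET_SCHEMA))
```

Fields that the schema does not mention are simply ignored, so records with new or varying attributes can still be stored and checked later against a different set of rules.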
  16. 16. "Each event happens at a particular time and is always true." "Mutable" (operation → SQL): Create → INSERT; Read (Retrieve) → SELECT; Update (Modify) → UPDATE; Delete (Destroy) → DELETE. "Immutable": • Just C+R; nothing ever gets "updated" • Records are stored as files. Each record is a new file.
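A small sketch of the "just C+R" idea, assuming one JSON file per record in a local directory. The class name `AppendOnlyStore`, the record layout (`entity_id`, `timestamp`, `data`) and the follower numbers in the usage example are all hypothetical; a real system would more likely write to a distributed file system.

```python
import json
import time
import uuid
from pathlib import Path


class AppendOnlyStore:
    """Stores every record as a new file; nothing is updated or deleted."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def create(self, entity_id, data, timestamp=None):
        """Create: write a new, timestamped record file and return its path."""
        record = {
            "entity_id": entity_id,
            "timestamp": time.time() if timestamp is None else timestamp,
            "data": data,
        }
        path = self.directory / f"{uuid.uuid4().hex}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
        return path

    def read_all(self, entity_id):
        """Read: return the full history of one entity, oldest first."""
        records = [
            json.loads(p.read_text(encoding="utf-8"))
            for p in self.directory.glob("*.json")
        ]
        matching = (r for r in records if r["entity_id"] == entity_id)
        return sorted(matching, key=lambda r: r["timestamp"])

    def current(self, entity_id):
        """The "current" value is derived on read: the newest record wins."""
        history = self.read_all(entity_id)
        return history[-1]["data"] if history else None


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        store = AppendOnlyStore(tmp)
        store.create("jbenno", {"followers": 2600}, timestamp=1)  # invented value
        store.create("jbenno", {"followers": 2671}, timestamp=2)
        print(store.current("jbenno"), len(store.read_all("jbenno")))
```

What would be an UPDATE in SQL becomes a second record with a later timestamp, so the earlier fact stays available and remains true for its point in time.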
  17. 17. Diagram: Data Stream • Precomputed realtime view • Query • All Data • Precomputed View (Batch Mode)
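The labels on this slide suggest a layout in which a batch job precomputes a view over all data, a second view is kept up to date from the data stream, and a query combines the two. That reading of the diagram is an assumption, as are all names in the sketch below (`build_batch_view`, `RealtimeView`, `query`) and the example events.

```python
from collections import Counter


def build_batch_view(all_events):
    """Batch mode: recompute the view from the complete stored data set."""
    return Counter(event["key"] for event in all_events)


class RealtimeView:
    """Incrementally updated view over events the batch run has not seen yet."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["key"]] += 1


def query(key, batch_view, realtime_view):
    """Answer a query by merging the batch view with the realtime view."""
    return batch_view[key] + realtime_view.counts[key]


if __name__ == "__main__":
    # Hypothetical history already covered by the last batch run.
    history = [{"key": "tatort"}, {"key": "dsds"}, {"key": "tatort"}]
    batch_view = build_batch_view(history)

    # Hypothetical event that arrived after the batch run.
    realtime_view = RealtimeView()
    realtime_view.update({"key": "tatort"})

    print(query("tatort", batch_view, realtime_view))
```

In this sketch the realtime view only has to cover the gap since the last batch run, so it can stay small and be discarded once the next batch view includes those events.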
  18. 18. Volume Velocity Variety Quantification Data Science
  19. 19. "As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say: we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know." Donald Rumsfeld • known knowns • known unknowns • unknown unknowns • "data puking" (Dashboards) • "analysis throwing" (Modelling) • "data democracy" (Big Data) Avinash Kaushik
  20. 20. Data Science
  21. 21. • Text comparison of party programmes • Cosine vector distance
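A minimal sketch of cosine distance between two texts, assuming a plain bag-of-words representation. The slides do not say how the party programmes were tokenized or weighted, so the tokenizer and the helper names `term_vector`, `cosine_similarity` and `cosine_distance` are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector: lower-cased word -> number of occurrences."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); 1.0 means the same direction."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty texts
    return dot / (norm_a * norm_b)


def cosine_distance(text_a, text_b):
    """Distance in [0, 1] for count vectors: 0 = same word mix, 1 = no overlap."""
    return 1.0 - cosine_similarity(term_vector(text_a), term_vector(text_b))


if __name__ == "__main__":
    # Invented snippets standing in for two programme texts.
    print(cosine_distance("more data for everyone", "more privacy for everyone"))
```

Because the measure depends only on the angle between the vectors, a long programme and a short one with the same mix of words come out as close, which is usually what a text comparison wants.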
  22. 22. Chart: DSDS and Tatort by time of day over Fri 8.3., Sat 9.3. and Sun 10.3.; vertical axis from 0 to 1500.
  23. 23. Persona http://twitter.com/FlaviaReil/statuses/308321057499144193 http://twitter.com/froschmann1968/statuses/308321920200364034 http://twitter.com/VeronikaTangen/statuses/308322141676388352 http://twitter.com/froschmann1968/statuses/308322188501602304 http://twitter.com/QWallyTy/statuses/308322522863128576 http://twitter.com/Duftlavendel/statuses/308322911444406272 http://twitter.com/kakakiri/statuses/308323144836456448 http://twitter.com/Chake/statuses/308323468179566592 http://twitter.com/RegulaAeppli/statuses/308323570386350083 http://twitter.com/Imissmycat1/statuses/308323602342764544 http://twitter.com/WorldNewsGerman/statuses/308323834749140995 http://twitter.com/Zoran2010/statuses/308324446035386368
  24. 24. male female n/a
  25. 25. http://www.jasondavies.com/parallel-sets/ http://www.nytimes.com/interactive/2012/05/17/business/dealbook/how-the-facebook-offering-compares.html?_r=0 http://www.senchalabs.org/philogl/PhiloGL/examples/winds/
  26. 26. Volume Velocity Variety Quantification Data Science D3
  30. 30. Quantified Self
  42. 42. Digital Darwinism is the Evolution of Consumer Behavior when Society & Technology Evolve Faster Than the Ability To Adapt Brian Solis
  43. 43. { "name": "Joerg Blumtritt", "jobs": [ {"title": "Strategy Consultant", "startdate": "2005", "enddate": null}, {"title": "Chairman", "company": "Arbeitsgemeinschaft Social Media e.V.", "startdate": "2008", "enddate": null} ], "email": "joerg.blumtritt@mediagnosis.de", "twitter": "@jbenno", "blogs": ["http://beautifuldata.net", "http://slow-media.net", "http://kuirjeo.net", "http://memeticturn.net"], "website": "http://mediagnosis.de", "image": "http://slow-media.net/wp-content/uploads/jb_creeper.jpg", "bio": "http://beautifuldata.net/Joerg-blumtritt/" }
