
DLD Summer Workshop Big Data

Understanding Big Data and getting the right Mindset.

Published in: Technology

  1. Big Data Workshop, DLD Summer 15 (21/06/15, @rjudas)
  2. Understanding Big Data and getting the right mindset
  3. Agenda
     • Syncing
     • Defining Big Data
     • Hype or Evolution
     • Tech Drivers
     • Big Data – Big Business?
     • What's it all about?
     • How do we get there?
  4. Syncing
  5. Syncing
     • Please tell us your opinion about Big Data
     • Please tell us about your Big Data projects
  6. Defining Big Data
  7. Definition(s)
     • "Big Data describes datasets so large they become very difficult to manage with traditional database tools."
     • "Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures."
     • "Very pragmatically, it's about building net-new analytic applications based on new types of data that (an organization) wasn't previously tracking."
  8. The 3 V's
     • Variety: tables, images, videos, XML, logs
     • Velocity: batch, streams, real-time
     • Volume: lots of xBytes
  9. Variety
     • Mix of data types
     • BLOBs and CLOBs
     • Images, audio, videos, log files
     • Semi-structured, unstructured
     • Email, EDI messages, transaction logs, sensor data
  10. Velocity
     • Crucial: speed of the "feedback loop"
     • Streaming data
     • Complex event processing
     • From batch to (near) real-time
     • Different lifetime
  11. Volume: Big?
     • Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte
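The ladder of units on this slide can be made concrete with a small formatting helper. This is only a sketch: it assumes decimal (SI) prefixes, where each step is a factor of 1000, and `human_readable` is a hypothetical name chosen for illustration.

```python
# Decimal (SI) byte units, smallest to largest, in the order listed on the slide.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: float) -> str:
    """Format a byte count with the largest unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000

print(human_readable(4.4e21))  # a zettabyte-scale figure
print(human_readable(100e15))  # a petabyte-scale figure
```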
  12. Figures
     • "Digital Universe" according to the EMC/IDC study (2014): 4.4 zettabytes in 2013, 44 zettabytes in 2020
     • All human speech ever spoken: 42 zettabytes (16 kHz, 16 bit)
     • 2013: speculation put the NSA data center at 1 YB; realistic estimates are 3-12 EB
     • CERN / LHC data center passes 100 PB
  13. Volume: most famous quote
     • 2.5 exabytes of data created each day (2,500,000,000,000,000,000 bytes) ≈ 1 ZB/year
     • (with 90% of the world's data created in the last two years)
     • Source: IBM CMO Study 2011
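The "≈ 1 ZB/year" figure follows from simple arithmetic on the daily number. A quick check, assuming decimal units (1 EB = 10^18 bytes, 1 ZB = 10^21 bytes) and a 365-day year:

```python
EXABYTE = 10**18    # bytes, decimal definition (assumption)
ZETTABYTE = 10**21  # bytes, decimal definition (assumption)

bytes_per_day = 2.5 * EXABYTE           # figure quoted on the slide
bytes_per_year = bytes_per_day * 365    # assumes a 365-day year
zb_per_year = bytes_per_year / ZETTABYTE

print(f"{zb_per_year:.2f} ZB/year")
```

The result comes out slightly below one zettabyte per year, which is why the slide rounds it to "≈ 1 ZB/year".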
  14. Even more V's
     • Veracity: uncertainty of data, trustworthiness, accountability
     • Value: Big Data only if it generates value
     • Visibility: security, stitching together data from various sources
     • Validity: logic inference, correlation vs. causation
  15. Hype or Evolution?
  16. Old wine?
     • OLTP, OLAP, data warehouse
       - Around since the 1970s
       - ACID (Atomicity, Consistency, Isolation, Durability)
       - Based on SQL
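To illustrate what the ACID guarantees mean in practice, here is a minimal sketch of atomicity using Python's built-in `sqlite3` module. The `stock` and `orders` tables and the `place_order` helper are hypothetical and exist only for this example; they are not taken from the workshop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: stock may never go negative (enforced by a CHECK constraint).
conn.execute("CREATE TABLE stock (article TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))")
conn.execute("INSERT INTO stock VALUES ('widget', 5)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, article TEXT, qty INTEGER)")
conn.commit()

def place_order(article: str, qty: int) -> bool:
    """Record an order and reduce stock as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO orders (article, qty) VALUES (?, ?)", (article, qty))
            conn.execute("UPDATE stock SET qty = qty - ? WHERE article = ?", (qty, article))
        return True
    except sqlite3.IntegrityError:
        return False

ok = place_order("widget", 3)       # enough stock: both statements are committed
failed = place_order("widget", 10)  # violates the CHECK: the order row is rolled back too
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock_left = conn.execute("SELECT qty FROM stock WHERE article = 'widget'").fetchone()[0]
print(ok, failed, order_count, stock_left)
```

The second call fails on the stock update, and the order row inserted just before it disappears with the rollback. That all-or-nothing behaviour is the "A" in ACID.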
  17. Big Data 15 years ago
     [Figure: OLTP systems (orders, articles, receiving, etc.) feed a data warehouse, which feeds decision support systems (OLAP)]
  18. Business Intelligence
  20. Enter Big Data
     • http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
     • http://www.gartner.com/newsroom/id/1731916
     • http://chucksblog.emc.com/chucks_blog/2011/06/2011-idc-digital-universe-study-big-data-is-here-now-what.html
  21. "New" Big Data
     • New paradigm
       - BASE (Basically Available, Soft state, Eventual consistency)
     • New data model
       - Data lifecycle and variability
       - Data linking and referential integrity
     • New analytics
       - Real-time/streaming analysis, interactive
       - Machine learning
     • New infrastructure and tools
       - High-performance computing, storage, network
       - Multi-provider services integration
       - New data-centric service models and security models
  23. [Figure: Big Data landscape map. Labels include Hadoop on premise, cluster management / monitoring, NoSQL, NewSQL, MPP and graph databases, crowdsourcing, transformation, security, storage, app dev, cross-infrastructure / cloud services, analytics platforms, BI platforms for business analysts, data science platforms, data visualization, unstructured data, AI, social analytics, analytic services, machine learning, location/people/events, search, statistical computing, log analytics, real-time, SMB, frameworks, query / data access, collaboration / workflow, data sources (sensors, data markets), incubators, cloud deployment, and industry applications (government / regulation, security, education / learning, health, finance, human capital, legal, marketing, publisher tools, ad optimization)]
  24. Big Data
     • Hype AND evolution
     • Some vendors use it to remarket "old" stuff
     • Many "new" products/services
  25. Tech Drivers
  26. Drivers
     • Vendors: hardware, storage, network, software
     • Business: mobile, social, customer insights
     • Technology: open source technology, cloud computing
  27. The Elephant in the Room
  28. Hadoop
     • Hadoop is an open source "Big Data" framework
     • Distributed storage (HDFS) and processing (MapReduce)
     • Reliable, fault tolerant
     • Horizontal scalability from a single node to thousands of cluster nodes
     • Cost: $2,500 / TB vs. $250,000 / TB in data warehouses
  29. MapReduce
     • Programming model/framework for processing large data sets
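The MapReduce programming model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases using the classic word-count example; it is not Hadoop's API, and the function names are chosen here for illustration. A real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big business", "data beats opinion"]  # made-up sample input
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each map call looks at only one record and each reduce call at only one key, both phases can be spread over many machines.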
  30. NoSQL Databases
     • Traditional RDBMS outdated for modern paradigms
       - Big Data
       - Connectivity
       - Concurrency
       - Diversity
       - Cloud
  31. The difference: SQL / Tables
  32. The NoSQL difference
     • Document-based:
         {
           _id: ObjectId("2341"),
           type: "Article",
           author: "Chris Boos",
           title: "Introduction AutoPilot",
           date: ISODate("2015-04-21T13:21:12.343Z")
         },
         {
           _id: ObjectId("2342"),
           type: "Book",
           author: "Roland Judas",
           title: "Big Data",
           isbn: "978-0-213434235-5-7"
         }
     • Key-value:
         "User1", "Roland Judas"
         "User2", "Chris Boos"
         "User3", "Charly Brown"
     • Graph-based
     • Columns
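The difference between the document and key-value models can be sketched with plain Python data structures. These are stand-ins, not a real database API: `find` is a hypothetical helper, and the records simply reuse the sample values from the slide.

```python
import json

# Document model: each record describes itself, and fields may differ per document.
documents = [
    {"_id": "2341", "type": "Article", "author": "Chris Boos",
     "title": "Introduction AutoPilot", "date": "2015-04-21T13:21:12.343Z"},
    {"_id": "2342", "type": "Book", "author": "Roland Judas",
     "title": "Big Data", "isbn": "978-0-213434235-5-7"},
]

# Key-value model: an opaque value that can only be looked up by its key.
key_value = {"User1": "Roland Judas", "User2": "Chris Boos", "User3": "Charly Brown"}

def find(collection, **criteria):
    """Return all documents whose fields equal the given criteria (hypothetical helper)."""
    return [d for d in collection if all(d.get(k) == v for k, v in criteria.items())]

books = find(documents, type="Book")
# Without a schema, the application must cope with fields that may be missing.
isbns = [doc.get("isbn") for doc in documents]

print(json.dumps(books, indent=2))
print(isbns, key_value["User2"])
```

The document store can be queried on any field, while the key-value store answers only "what is stored under this key?".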
  33. Pros/Cons Hadoop / NoSQL
     • Pros
       - Highly flexible, agile, available, performant
       - Scalable
       - Modern, open technology with commercial support
       - Support for very large datasets on commodity hardware
     • Cons
       - Immature
       - No standardization
       - Schema-free means the application needs to know how to retrieve data
  34. Even more tools
     • Search/Index
     • Business Intelligence
     • Analytical Programming
     • Visualisation
  35. Machine Learning
  36. Big Data – Big Business?
  37. Big Data Market
     • Big Data market projected for 2015: $125bn* (in comparison, public cloud: $95bn**)
     • Big funding
       - Cloudera: $1.2bn
       - MongoDB: $300m
       - HortonWorks: $250m
       - DataStax: $190m
       - BIRST: $130m
     * According to Forbes.co / 2014/12/11 / 6 Predictions for Big Data / IDC Research
     ** According to Forrester Research
  38. Shares of Big Data Market
  39. Vendors love Big Data
  40. Vendors REALLY love Big Data!
     • Latest in corporate tech: in-memory
       - Oracle Exalytics
       - SAP HANA
     • "Has SAP Bet The House With The Biggest Update to its ERP in Two Decades?" http://www.forbes.com/sites/greatspeculations/2015/03/04/has-sap-bet-the-house-with-the-biggest-update-to-its-erp-in-two-decades/
  41. Even more Sales!!!
  42. Best Practices DWH / BI / Big Data
     • Analyze problem / data / quality
     • Data cleaning
     • Data quality initiatives
     • Sync business / IT
     • Buy stuff
     • Implement stuff
     • Train users
     • Use governance / strategic approaches
  43. And the success?
     • Through 2017, 60% of big data projects will fail to go beyond piloting and experimentation and will be abandoned.
     • Through 2017, fewer than half of lagging organizations will have made cultural or business model adjustments sufficient to benefit from big data.
     • Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases.
     Source: Gartner, "Predicts 2015: Big Data Challenges Move From Technology to the Organization"
  44. Challenges
     • Usage scenarios
       - Goals
     • Skills
       - Missing data scientists
       - Need to understand the math
     • Technical
       - Data integration
     • Privacy
       - Main discussion in Germany
  45. Syncing
     • What's your opinion?
     • Do you have experience with big vendors' offerings?
  46. What's it all about?
  48. What's it all about?
     • Data contains information of great business value
     • If you can extract those insights you can make far better decisions
     • Ultimately: predicting the future
  49. Common Use Cases
     • Customer insights
     • Market basket / pricing optimization
     • Fraud detection / security analytics
     • (Proactive) monitoring
     • Sensor data (IoT)
     • Data warehouse optimization
  51. Understanding is important
     [Figure: progression from Data to Information to Knowledge to Intelligence/Wisdom, plotted against understanding and connectedness; the steps are understanding relations, understanding patterns, and understanding principles]
  52. How do we get there?
  53. Syncing
     • Anyone heard about "Semantic Web" or "Ontology"?
     • Anyone having experience or projects around ontologies?
  54. Mapping the territory
     • Enterprise Architecture (traditional)
       - "Holistic" approach
       - Many "best practices" and patterns
     • Big Data Discovery
       - Kind of self-service for Big Data
       - Next big thing?
     • Semantic Layer
       - Should exist from BI implementation (proprietary)
       - Or use the modern approach, "Linked Data"
  59. Data + Semantic = Knowledge
  60. Key is getting machine-readable data
       <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
                xmlns:foaf="http://xmlns.com/foaf/0.1/"
                xmlns:admin="http://webns.net/mvcb/">
         <foaf:PersonalProfileDocument rdf:about="">
           <foaf:maker rdf:resource="#me"/>
           <foaf:primaryTopic rdf:resource="#me"/>
         </foaf:PersonalProfileDocument>
         <foaf:Person rdf:ID="me">
           <foaf:name>Roland Judas</foaf:name>
           <foaf:title>Mr.</foaf:title>
           <foaf:givenname>Roland</foaf:givenname>
           <foaf:family_name>Judas</foaf:family_name>
           <foaf:homepage rdf:resource="http://about.me/rjudas"/>
           <foaf:workplaceHomepage rdf:resource="http://arago.co"/>
           <foaf:knows>
             <foaf:Person>
               <foaf:name>Chris Boos</foaf:name>
             </foaf:Person>
           </foaf:knows>
         </foaf:Person>
       </rdf:RDF>
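"Machine readable" means a program can pull facts out of such a profile without any custom text parsing. The sketch below reads a shortened copy of the FOAF document with Python's standard XML parser. Note the assumption: `xml.etree.ElementTree` only understands the XML syntax, not RDF semantics, so a dedicated RDF library would be the more robust choice for real data.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the FOAF profile from the slide.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Roland Judas</foaf:name>
    <foaf:homepage rdf:resource="http://about.me/rjudas"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Chris Boos</foaf:name>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""

# Prefix-to-namespace mapping used in the path expressions below.
NS = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "foaf": "http://xmlns.com/foaf/0.1/"}

root = ET.fromstring(RDF_XML)
me = root.find("foaf:Person", NS)  # the top-level person description
my_name = me.findtext("foaf:name", namespaces=NS)
# Namespaced attributes are addressed as "{namespace-uri}localname".
homepage = me.find("foaf:homepage", NS).get(f"{{{NS['rdf']}}}resource")
friends = [p.findtext("foaf:name", namespaces=NS)
           for p in me.findall("foaf:knows/foaf:Person", NS)]

print(my_name, homepage, friends)
```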
  61. Ontologies
     • "A data model that represents knowledge as a set of concepts within a domain and the relationships between these concepts"
     • FOAF
     • Schema.org
     • DBpedia Ontology
     • GoodRelations
     • http://www.w3.org/wiki/Good_Ontologies
  62. Triples
     • Representation of facts
       Subject                 | Predicate        | Object
       Roland                  | is a (has type)  | Person
       http://about.me/rjudas  | rdf:type         | foaf:Person
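A triple store can be approximated with a list of (subject, predicate, object) tuples and a pattern-matching query. This is a minimal sketch: the `Triple` type and `match` helper are hypothetical names, and the two extra facts are taken from the FOAF profile shown earlier in the deck.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

facts = [
    Triple("http://about.me/rjudas", "rdf:type", "foaf:Person"),
    Triple("http://about.me/rjudas", "foaf:name", "Roland Judas"),
    Triple("http://about.me/rjudas", "foaf:workplaceHomepage", "http://arago.co"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]

# "Which subjects are of type foaf:Person?"
people = [t.subject for t in match(facts, predicate="rdf:type", obj="foaf:Person")]
print(people)
```

Leaving one or more positions open is the basic query pattern that languages such as SPARQL build on.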
  65. From Triples to Graphs
     [Figure: graph with Roland as the central vertex/node; labeled edges "is a" to Person, "likes" to DLD, and "plays" to Songs]
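Turning triples into a graph amounts to treating every subject and object as a node and every predicate as a labeled edge. A minimal sketch follows; the pairing of edges to targets ("likes" DLD, "plays" Songs) is one reading of the slide's figure and should be taken as an assumption, and `build_graph` is a hypothetical helper.

```python
from collections import defaultdict

# Triples as read from the slide's figure (edge targets are an assumption).
triples = [
    ("Roland", "is a", "Person"),
    ("Roland", "likes", "DLD"),
    ("Roland", "plays", "Songs"),
]

def build_graph(triples):
    """Build an adjacency list: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph.setdefault(obj, [])  # make sure pure targets appear as nodes too
    return dict(graph)

graph = build_graph(triples)
nodes = sorted(graph)
edges_from_roland = graph["Roland"]
print(nodes)
print(edges_from_roland)
```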
  66. Famous Examples
  67. A pragmatic approach: from the basement
  68. Bringing pieces together: Semantic, Graphs, Big Data, APIs
  69. http://github.com/arago/ogit
  71. Semantic Data Platform
  72. Visualization
  73. Use Cases from/beyond the IT Department
     • Ticket statistics
     • Provider management
     • Network planning
     • Comparing architectures
     • Forecasting technological trends
     • Data center planning
     • Application migration
     • Technical analysis for business processes
     • IT organisation insights
     • User ranking
  74. The right Mindset
     • Semantics
     • Graphs
     • APIs
     • "New" Big Data tools
  75. www.autopilot.co, www.graphit.co, www.tabtab.co
  76. Roland Judas
     • Frankfurt, Germany
     • Technical Evangelist, Product Manager at arago
     • Organizer Webmontag Frankfurt, Cloudcamp Frankfurt
     • Mail: rjudas@arago.de
     • Twitter: @rjudas (en), @rolandjudas (de)
     • http://about.me/rjudas
  77. Image References and Licenses
     • Facebook Datacenter: https://www.flickr.com/photos/intelfreepress/ (CC BY 2.0)
     • Winery: https://www.flickr.com/photos/joceykinghorn/ (CC BY-SA 2.0)
     • BI Dashboard: https://www.flickr.com/photos/ctsi-global/ (CC BY-SA 2.0)
     • Dollars: https://www.flickr.com/photos/amagill/ (CC BY 2.0)
     • Old Timer Truck: https://www.flickr.com/photos/ell-r-brown/ (CC BY 2.0)
     • SQL Designer: https://www.flickr.com/photos/ejk/ (CC BY-SA 2.0)
     • Crystal Ball: https://www.flickr.com/photos/frogman2212/ (CC BY 2.0)
     • MapReduce: https://www.flickr.com/photos/lkaestner/ (CC BY-SA 2.0)
     • Foaf: https://www.flickr.com/photos/dullhunk/ (CC BY 2.0)
     • Linked Open Data: Richard Cyganiak and Anja Jentzsch (CC BY-SA 3.0)
     • Rear-View Mirror: https://www.flickr.com/photos/labyrinthx-2/ (CC BY-SA 2.0)
     • Servers-8055_13.jpg: https://commons.wikimedia.org/wiki/User:Victorgrigas (CC BY-SA 3.0)
     • Watson: https://commons.wikimedia.org/wiki/User:Clockready (CC BY-SA 3.0)
     • Wolfram Alpha: https://www.flickr.com/photos/morville/ (CC BY 2.0)
     • Social_Network_Visualization: MartinGrandjean, http://www.martingrandjean.ch/wp-content/
