Big Data: Architectures and Approaches

ThoughtWorkers David Elliman and Ashok Subramanian describe how quickly the big data world is moving, amid predictions of rapid industry growth. For more on the increasingly large role the 'Internet of Things' is playing, read David's blog post or watch the video from the London-based event: http://www.thoughtworks.com/insights/blog/big-data-and-internet-things

Published in: Technology
Notes
  • Dave
    Reference: http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/
  • Ashok
    Big data analytics are driving rapid growth for public cloud computing vendors: revenues for the top 50 public cloud providers shot up 47% in the fourth quarter of last year, to $6.2 billion.
  • Dave
    http://nsa.gov1.info/utah-data-center/
  • Ashok
    Who is that handsome man!
  • Dave & Ashok
    Growth in retail, usage of iBeacons, Precision marketing, some sophistication with web analytics & CRM - greater penetration.
    Healthcare - remote monitoring, automated procedures
  • Ashok
  • Ashok
    Validation or Discovery
    picture of fork in the road?
  • Ashok & Dave
  • Dave
  • Ashok
    Exploring alternate models
  • Dave
  • Ashok
    Lambda Architecture - section heading
  • Ashok - high level description of components
  • Dave
    Batch Hadoop 2.0/MR2
    Goal: share a large cluster of machines between different frameworks. Similar to Mesos; both are steps towards a distributed data OS.
  • Dave
    Data Lakes
  • Dave
  • Ashok
  • Ashok
  • Ashok
    Fast and Scalable Analytics depends on efficient data structures
    Matching the Algorithm to the data structure
    Morphing the Raw data into the data structure
    Raw data > Data Structure > Algorithm > Insight
  • Conclusion
  • Ashok
    Balance shifting from Commercial to Open-Source
    Innovations coming from the open source world
  • Ashok
    Quantum computing - this is one apparently!
  • closing statement before Q&A
  • Dave
  • Big Data: Architectures and Approaches

    1. 1. Welcome. BIG DATA: Architectures and Approaches. David Elliman & Ashok Subramanian
    2. 2. Luke Barrett 1971-2014
    3. 3. http://upload.wikimedia.org/wikipedia/commons/f/f0/DARPA_Big_Data.jpg BIG DATA
    4. 4. https://www.flickr.com/photos/katerha/8380451137/
    5. 5. 1944 (timeline, 1940 to 2015) https://www.flickr.com/photos/timetrax/376152628/sizes/l
    6. 6. 1961
    7. 7. 1971
    8. 8. 1996 https://www.flickr.com/photos/epsos/8336691931 ge becomes more cost effective for storing da
    9. 9. 1996
    10. 10. 1998
    11. 11. 1998 https://www.usenix.org/conference/1999-usenix-annual-technical-conference/big-data-and-next-wave-infrastress-problems
    12. 12. 2004
    13. 13. 2006
    14. 14. 2008
    15. 15. 2010
    16. 16. 2013 "alottabytes"
    17. 17. 2015 https://www.flickr.com/photos/will-lion/2595830716/
    18. 18. https://www.flickr.com/photos/taedc/6998468974
    19. 19. http://blogs.gartner.com/doug-laney/batman-on-big-data/
    20. 20. https://www.flickr.com/photos/10ch/3347658610/
    21. 21. THE OPPORTUNITY
    22. 22. DATA and INSIGHT across three eras: <- 1990; 1990s - 2000; 2000 ->
    23. 23. Key Takeaways • This isn’t a new problem • The problem isn’t going away • Remember to focus on the VALUE https://www.flickr.com/photos/djwtwo/8331524425/
    24. 24. Where do we… https://www.flickr.com/photos/ekosystem/4334671818/
    25. 25. https://www.flickr.com/photos/libraryacu/7695938410/
    26. 26. Analytics - Goals (axes: Complexity, Value): Descriptive Analytics (What happened?); Diagnostic Analytics (Why did it happen?); Predictive Analytics (What will happen?); Prescriptive Analytics (How can we make it happen?)
    27. 27. https://www.flickr.com/photos/lopetz/3912416793/ REAL TIME BATCH
    28. 28. Volume Velocity REAL TIME BATCH
    29. 29. https://www.flickr.com/photos/ingythewingy/5510406450/
    30. 30. THINK BIG, ACT SMALL. Small is the New Big (Seth Godin)
    31. 31. https://www.flickr.com/photos/pauldineen/4529216647/
    32. 32. “80% of the work in any data project is in cleaning the data” – D J Patil https://www.flickr.com/photos/desideratum/8595251348/
    33. 33. https://www.flickr.com/photos/22280677@N07/2504310138/
    34. 34. https://www.flickr.com/photos/jm3/4814208649/
    35. 35. SQL
    36. 36. https://www.flickr.com/photos/marc_smith/6793088143/
    37. 37. Key Takeaways • Start small • Start with the ? • Iteratively follow the value • Using freely available tooling • Volume vs Velocity https://www.flickr.com/photos/djwtwo/8331524425/
    38. 38. Scaling the Solution https://www.flickr.com/photos/auntiep/4310240/
    39. 39. https://www.flickr.com/photos/111692634@N04/11407095913/
    40. 40. “Amdahl’s law is used to find the maximum expected improvement to an overall system when only part of the system is improved.” (attributed to Gene Amdahl, 1967)
    41. 41. https://twitter.com/PieCalculus/status/459485747842523136/photo/1
    42. 42. https://www.flickr.com/photos/rofi/2097239111/
    43. 43. Lambda Architecture: All Data; Batch, Speed, Serving; Query; query = function(all data)
    44. 44. Lambda Architecture: All Data, Scaled Data Store, Event Processing Network, Batch View, Realtime View, Batch Write, Random Write, Query
    45. 45. Lambda Architecture: All Data; Batch, Speed, Serving; Query; query = function(all data)
    46. 46. Batch - Hadoop (MR1): Client; Master Node (JobTracker, Name Node); four Slave Nodes (each with Task Tracker, Data Node, Map, Reduce); Metadata Operations to Get Block Info; Job assignment to cluster; Data Write; Data Read; Data Replication on Multiple Nodes
    47. 47. Batch - MapReduce Map Shuffle Reduce
    48. 48. Batch - Cascading
    49. 49. Batch - Spark
    50. 50. Batch - MPP database: Master Servers (Query planning & dispatch); Segment Servers (Query processing and data storage); Network Interconnect; External Sources (Loading, streaming, etc.); SQL or MapReduce
    51. 51. Lambda Architecture: All Data; Batch, Speed, Serving; Query; query = function(all data)
    52. 52. Speed - Storm
    53. 53. CEP
    54. 54. Lambda Architecture: All Data; Batch, Speed, Serving; Query; query = function(all data)
    55. 55. Lambda Architecture - Serving
    56. 56. http://www.wallzhq.com/wp-content/uploads/2014/02/matrix_binary-wide.jpg
    57. 57. Conventional Architectures: Pull-based Batch Loads; Enterprise Data Models; Complex ETL Logic; Poorly Suited to Non-Relational Data; Emergent design is difficult
    58. 58. Pivotal Business Data Lake Architecture http://www.gopivotal.com/sites/default/files/Pivotal-Business-Data-Lake-Technical_Brochure_WEB.PDF
    59. 59. DATA CORE: RAW FACTUAL DATA HISTORIZED EVENTS RETAIN BUSINESS KEY DATA LINEAGE
    60. 60. DATA INGESTION: EVENT DRIVEN MESSAGE QUEUE TRICKLE FEED BATCH LOAD
    61. 61. INFORMATION PUBLISHING: TOPICAL QUEUES POST PROCESSING
    62. 62. INFORMATION TIER: PURPOSE BUILT DATA SUBSETS TRANSFORMATION DATA GOVERNANCE MDM CONCERNS POST PROCESSING
    63. 63. PRESENTATION TIER: BUSINESS VALUE APPLICATIONS DATA SERVICES AD HOC QUERYING WRITE BACK?
    64. 64. Transformation Logic Data Post Processing Near Real Time Feed Emergent Design & Agile Delivery
    65. 65. Apache Kafka Apache Storm
    66. 66. Micro-data-services
    67. 67. Drive Towards In Memory Processing
    68. 68. https://www.tele-task.de/archive/lecture/overview/5721/
    69. 69. Remember https://www.flickr.com/photos/anjin/695894443/
    70. 70. Data Structures, Algorithms https://www.flickr.com/photos/herrolsen/7645876896/
    71. 71. Raw Data Data Structure Algorithm Insight
    72. 72. Key Takeaways • Embrace the cloud • Fit the Architecture to the problem • Remember Knuth https://www.flickr.com/photos/djwtwo/8331524425/
    73. 73. https://www.flickr.com/photos/tim_norris/2789759648/ SUMMARY
    74. 74. Hadoop Ecosystem (bar chart) http://www.datameer.com/blog/uncategorized/the-hadoop-ecosystem-visualized-in-datameer.html
    75. 75. https://www.flickr.com/photos/classblog/5136926303/ Commercial Open Source
    76. 76. https://blog.cloudera.com/blog/2011/10/the-community-effect/
    77. 77. https://www.flickr.com/photos/ctsi-global/6556284907/
    78. 78. https://www.flickr.com/photos/will-lion/2597608152/
    79. 79. https://www.flickr.com/photos/jurvetson/14105339228/
    80. 80. Open Questions http://talkmarketing.co.uk/wp-content/uploads/2013/07/Open-Ended-Questions.jpg
    81. 81. https://www.flickr.com/photos/typoatelier/5615759848/
    82. 82. https://www.flickr.com/photos/rembcc/3802038945/
    83. 83. https://www.flickr.com/photos/sidelong/246816211/
    84. 84. No matter how much you speed up the computers or the way you put computers together, the real issues are at the DATA LEVEL
    85. 85. https://www.flickr.com/photos/opensourceway/5556249000/
    86. 86. Enterprise Master Data Management
    87. 87. Localised Formats
    88. 88. Single System of Record
    89. 89. SoR is a process not a place
    90. 90. Database Integration (by another name)
    91. 91. http://www.bain.com/infographics/big-data/ Organisational Models
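
The Lambda Architecture slides (43 to 55) reduce to one idea: query = function(all data), with a batch layer that recomputes views over all data, a speed layer that covers what the batch has not yet seen, and a serving layer that merges the two. A minimal sketch of that split follows. It assumes a page-view counting use case; the `Event` type, the `SpeedLayer` class and the function names are hypothetical illustrations, not taken from the talk, and a real deployment would put something like Hadoop or Spark behind the batch view and Storm behind the realtime view.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical immutable page-view event; the master dataset only appends these."""
    timestamp: int
    page: str


def batch_view(all_data: Iterable[Event]) -> Counter:
    """Batch layer: recompute the view from scratch over all data."""
    return Counter(event.page for event in all_data)


class SpeedLayer:
    """Speed layer: incrementally counts events the last batch run has not seen."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        self.realtime_view[event.page] += 1

    def reset(self) -> None:
        # Call once a fresh batch view covers the events counted here.
        self.realtime_view.clear()


def query(page: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: answer by merging the batch view and the realtime view."""
    return batch[page] + realtime[page]


# The batch run sees the first three events.
master_dataset = [Event(1, "/home"), Event(2, "/pricing"), Event(3, "/home")]
batch = batch_view(master_dataset)

# Two more events arrive after the batch run: they are appended to the
# master dataset and also counted by the speed layer.
speed = SpeedLayer()
for event in [Event(4, "/home"), Event(5, "/blog")]:
    master_dataset.append(event)
    speed.on_event(event)

home_views = query("/home", batch, speed.realtime_view)
```

Once the next batch run finishes, `batch_view(master_dataset)` already includes the late events, so the speed layer can be reset without changing any query result.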

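
Slide 40 quotes Amdahl's law in words only. In its usual form, if a fraction p of the work can be parallelised across n workers, the overall speedup is 1 / ((1 - p) + p / n). The sketch below evaluates that formula; the function and parameter names are illustrative choices, not from the talk.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales with `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 95% of the work parallelisable, adding machines helps less and less:
# the serial 5% caps the speedup below 1 / 0.05 = 20.
for workers in (8, 64, 1024):
    print(workers, round(amdahl_speedup(0.95, workers), 2))
```

This is why the "Scaling the Solution" section matters: adding nodes to a cluster only pays off for the part of the job that actually parallelises.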