The European Conference on Software Architecture (ECSA) 14 - IBM BigData Reference Architecture

2,509 views

Published on

Published in: Internet
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,509
On SlideShare
0
From Embeds
0
Number of Embeds
58
Actions
Shares
0
Downloads
50
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

The European Conference on Software Architecture (ECSA) 14 - IBM BigData Reference Architecture

  1. 1. P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive Systems and BigData (A joint-venture between IBM Research Zurich and IBM Innovation Center DACH)
  2. 2. ● Motivation ● Use Cases ● How Databases scale ● Evolution of Large Scale Data Processing ● Requirements and Ingredients ● Architectural Proposal Agenda
  3. 3. Motivation – the World before 2000 .coms Traditional Enterprises MySQL, Postgres DB2, Oracle, Teradata
  4. 4. Motivation – the World after 2000 .coms Traditional Enterprises NoSQL DB2, Oracle, Teradata
  5. 5. ● Use ALL available data independently weather it is Use Cases – Basic Idea
  6. 6. ● Use ALL available data independently weather it is ● Inside Use Cases – Basic Idea
  7. 7. ● Use ALL available data independently weather it is ● Inside or outside your company Use Cases – Basic Idea
  8. 8. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured Use Cases – Basic Idea
  9. 9. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured Use Cases – Basic Idea
  10. 10. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured Use Cases – Basic Idea
  11. 11. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary Use Cases – Basic Idea
  12. 12. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary ● At rest Use Cases – Basic Idea
  13. 13. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary ● At rest or in motion Use Cases – Basic Idea
  14. 14. ● Scale-out because Scale-up not possible ● Why? ● There is a sweet spot (global optimum) for the ideal node size ● Determines the number of cluster nodes ● Because node price vs node size is not linear How do Databases Scale?
  15. 15. ● To minimize overall cluster price ● Use many node because of rather small node size ● Use commodity hardware ● Fault tolerance ● Won't go into CAP theorem here → Google ● For dynamic workloads and dynamic scale-in/out → CLOUD How do Databases Scale, Conclusion
  16. 16. ● Current situation in Enterprises ● BI Tools ● SQL (42%) ● R (33%) ● Python (26%) ● Excel (25%) ● Java, Ruby, C++ (17%) ● SPSS, SAS (9%) ● Current situation in .coms ● Writing MapReduce Jobs ● Using proprietary query languages ● Code everything from scratch Current situation
  17. 17. ● Programing language implementations ● Usage of high level query languages ● Pig ● Jaql ● SQL or SQL like ● BigSQL ● HQL ● CQL ● R push back ● BigR ● Rhadoop ● BigData spread sheets ● BI push back ● SPSS push back Evolution
  18. 18. Pig(Latin) / JAQL
  19. 19. BigSQL
  20. 20. Big Spreadsheets
  21. 21. BigR
  22. 22. SPSS
  23. 23. SPSS
  24. 24. Business Intelligence
  25. 25. ● Fault tolerance ● Dynamic and elastic scale-in and out ● Processing data of all types ● Use familiar ways of working with data Requirements Summary
  26. 26. ● NoSQL DB ● Cloud ● Push-back from Business Applications Ingredients
  27. 27. IBM Reference Architecture (current) Information Ingestion Security and Business Continuity Management Information Interaction Analytics Sources Shared Operational Information Traditional Sources 3rd Party Applications Governing Systems Visualization, Data Mining & Exploration Visualization, Data Mining & Exploration User Reports & Dashboards User Reports & Dashboards Accelerators & Application Frameworks Accelerators & Application Frameworks User Guided Applications & Advanced Analytics User Guided Applications & Advanced Analytics Content Repository Content Repository Master Data Hubs Master Data Hubs Reference Data Hubs Reference Data Hubs Warehouse ZoneWarehouse Zone Integrated Warehouse & Marts Zone (Guided Analytical Area) Integrated Warehouse & Marts Zone (Guided Analytical Area) Exploration Zone (Discovery Area) Exploration Zone (Discovery Area) LandingAreaZones LandingAreaZones
  28. 28. IBM Reference Architecture (transition)
  29. 29. 15 minutes Discussion Twitter: @romeokienzler Email: romeo . Kienzler (at) ch.ibm.com

×