Your SlideShare is downloading. ×
  • Like
The European Conference on Software Architecture (ECSA) 14 - IBM BigData Reference Architecture
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

The European Conference on Software Architecture (ECSA) 14 - IBM BigData Reference Architecture

  • 240 views
Published

 

Published in Internet
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
240
On SlideShare
0
From Embeds
0
Number of Embeds
1

Actions

Shares
Downloads
10
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive Systems and BigData (A joint-venture between IBM Research Zurich and IBM Innovation Center DACH)
  • 2. ● Motivation ● Use Cases ● How Databases scale ● Evolution of Large Scale Data Processing ● Requirements and Ingredients ● Architectural Proposal Agenda
  • 3. Motivation – the World before 2000 .coms Traditional Enterprises MySQL, Postgres DB2, Oracle, Teradata
  • 4. Motivation – the World after 2000 .coms Traditional Enterprises NoSQL DB2, Oracle, Teradata
  • 5. ● Use ALL available data independently weather it is Use Cases – Basic Idea
  • 6. ● Use ALL available data independently weather it is ● Inside Use Cases – Basic Idea
  • 7. ● Use ALL available data independently weather it is ● Inside or outside your company Use Cases – Basic Idea
  • 8. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured Use Cases – Basic Idea
  • 9. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured Use Cases – Basic Idea
  • 10. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured Use Cases – Basic Idea
  • 11. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary Use Cases – Basic Idea
  • 12. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary ● At rest Use Cases – Basic Idea
  • 13. ● Use ALL available data independently weather it is ● Inside or outside your company ● Structured, semi-structured, unstructured or binary ● At rest or in motion Use Cases – Basic Idea
  • 14. ● Scale-out because Scale-up not possible ● Why? ● There is a sweet spot (global optimum) for the ideal node size ● Determines the number of cluster nodes ● Because node price vs node size is not linear How do Databases Scale?
  • 15. ● To minimize overall cluster price ● Use many node because of rather small node size ● Use commodity hardware ● Fault tolerance ● Won't go into CAP theorem here → Google ● For dynamic workloads and dynamic scale-in/out → CLOUD How do Databases Scale, Conclusion
  • 16. ● Current situation in Enterprises ● BI Tools ● SQL (42%) ● R (33%) ● Python (26%) ● Excel (25%) ● Java, Ruby, C++ (17%) ● SPSS, SAS (9%) ● Current situation in .coms ● Writing MapReduce Jobs ● Using proprietary query languages ● Code everything from scratch Current situation
  • 17. ● Programing language implementations ● Usage of high level query languages ● Pig ● Jaql ● SQL or SQL like ● BigSQL ● HQL ● CQL ● R push back ● BigR ● Rhadoop ● BigData spread sheets ● BI push back ● SPSS push back Evolution
  • 18. Pig(Latin) / JAQL
  • 19. BigSQL
  • 20. Big Spreadsheets
  • 21. BigR
  • 22. SPSS
  • 23. SPSS
  • 24. Business Intelligence
  • 25. ● Fault tolerance ● Dynamic and elastic scale-in and out ● Processing data of all types ● Use familiar ways of working with data Requirements Summary
  • 26. ● NoSQL DB ● Cloud ● Push-back from Business Applications Ingredients
  • 27. IBM Reference Architecture (current) Information Ingestion Security and Business Continuity Management Information Interaction Analytics Sources Shared Operational Information Traditional Sources 3rd Party Applications Governing Systems Visualization, Data Mining & Exploration Visualization, Data Mining & Exploration User Reports & Dashboards User Reports & Dashboards Accelerators & Application Frameworks Accelerators & Application Frameworks User Guided Applications & Advanced Analytics User Guided Applications & Advanced Analytics Content Repository Content Repository Master Data Hubs Master Data Hubs Reference Data Hubs Reference Data Hubs Warehouse ZoneWarehouse Zone Integrated Warehouse & Marts Zone (Guided Analytical Area) Integrated Warehouse & Marts Zone (Guided Analytical Area) Exploration Zone (Discovery Area) Exploration Zone (Discovery Area) LandingAreaZones LandingAreaZones
  • 28. IBM Reference Architecture (transition)
  • 29. 15 minutes Discussion Twitter: @romeokienzler Email: romeo . Kienzler (at) ch.ibm.com