Big Data: Trend or Hype

Presented at SAS Forum Switzerland 2012

Published in: Technology
  • Big Data refers to especially large volumes of data that cannot be processed, or can be processed only inadequately, with standard databases and data management tools. The main difficulties are capturing, storing, searching, distributing, analyzing, and visualizing large data volumes. These volumes reach into terabytes, petabytes, exabytes, and zettabytes.
  • 10^21 bytes = 1,000,000,000,000,000,000,000 bytes
  • 1 terabyte = a consumer disk
  • Social media (Web 2.0) and sensor data. Structured and unstructured data.
    According to IDC, “In 2011, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes), growing by a factor of nine in just five years. That’s nearly as many bits of information in the digital universe as stars in the physical universe” (The 2011 Digital Universe Study: Extracting Value from Chaos, IDC, June 2011).
    The explosion of data isn’t new. It continues a trend that started in the 1970s. What has changed is the velocity of growth, the diversity of the data, and the imperative to make better use of it all to transform the business.
    “IDC’s prediction is that by 2020, the amount of information that IT is responsible for managing will grow by a factor of 44,” said May. “That means there will be work for IT to do. What is concerning is that the forecast is that IT headcount is only going to grow by a factor of 1.4. The situation is becoming unsustainable.”
  • Conclusion: data is growing massively in volume and diversity. Does this affect me? NEXT: It's relative.
    Now let’s take a look at the factors that are driving the need for a more strategic information management approach. This slide describes the factors and the diversity that Information Management must support so that organizations can better capitalize on data and information. This will allow us to contrast Information Management with a more traditional or limited approach to managing data. The diagram represents the various factors and then highlights an example of the factors that dictate the need for a comprehensive information management approach.
    Each element has a diversity of requirements or broad scope considerations that drive the need for a more strategic information management approach. Touch on each factor briefly and be familiar with the details below so that you can highlight the items most relevant to the account:
    Data Diversity – the information management strategy should accommodate the diversity of data, including the following attributes:
      • Big data: volume, velocity, variety – big data is relative, but all organizations need to think about the entirety of the data at their disposal and how the requirements for storage, scale, processing, etc., will multiply in the future.
      • Big text / unstructured data – in many cases, the big data phenomenon is being driven by text or unstructured data, and mining value out of that text is critical for both analytical and operational scenarios. Information management strategies must accommodate text and unstructured data effectively, which highlights the need for a combined analytics and data management capability.
      • In-flight (perishable) data, data at rest – the ability to support data that is in flight is becoming increasingly important given the large volumes of data that people face and the growing number of streaming data sources.
      • Internal, external – it’s not just about internal data, and not all internal data is “on premise”. It’s also about partner data and other third-party data sources that need to be leveraged and managed effectively.
      • Text / unstructured / semi-structured – although all data and data source types need to be accommodated in a comprehensive strategy, data types outside the traditional legacy or relational structure are worth noting. This is where many organizations struggle: they have a decent handle on transactional or relational data, but text, unstructured, or semi-structured data (whatever you want to call it) is typically a challenge.
      • All forms of information assets – addressing data diversity needs to extend beyond traditional sources of data. For example, analytical models are information assets that need to be leveraged and managed effectively.
    Consumption Diversity – the information management strategy needs to support an increasingly diverse set of consumption “devices” in a continuous and concurrent fashion:
      • Support for multiple form factors and form factor versions, including PCs, smart phones, tablets, and industrial devices (medical, network, etc.).
      • Support for user-based interaction as well as automated, machine-to-machine interaction using a service-based approach.
    Use Case Diversity – the information management strategy should support all relevant use cases for an organization, including:
      • Analytical prep, data migrations, single view, enterprise application integration, B2B integration, etc. Refer to the Data Management market slides that outline the use cases based on Gartner’s definition.
    Application Diversity – the information management strategy should support all types of applications, spanning operational and analytical:
      • Analytics – BI, Data Mining, Text Analytics, Forecasting, Optimization, etc.
      • Operational / Transactional – ERP, CRM, HR, Supply Chain, etc.
    Temporal Diversity – the information management strategy should support the various temporal requirements relating to data, from batch to extremely low latency (real-time or near-real-time):
      • Batch – information processing that is suited to batch must complete in the appropriate batch window. In many cases, IT is being squeezed on both ends: batch windows are shrinking (to support 24x7x365 processing driven by international business, online commerce, etc.) while the amount of data and the processing or analytical requirements are expanding.
      • Real-time / near-real-time – there is an increased need to support processing in real time via the introduction of rich information and analytic services.
    Deployment Diversity & Scope – the information management strategy needs to support multiple deployment approaches:
      • The strategy should effectively leverage multiple deployment or infrastructure options, including cloud (public, private, hybrid) and various “as a Service” types such as SaaS, PaaS, and Analytic Results as a Service; on-premise; and appliance.
      • The strategy should take an expansive view of the capabilities and services needed to deploy and manage production information systems, covering the development, test, and production cycle of operational applications as well as the iterative sandbox and production cycle of analytical applications.
    Architecture Scope – the information management strategy should support a robust architecture that covers a broad range of technical requirements:
      • Enterprise Architecture approach – although not related to diversity, it’s important that an enterprise architecture approach is used to formulate the overall information strategy. The architecture approach should accommodate the ability to manage the analytical assets or models that are leveraged in analytical scenarios.
      • Centralized or distributed – based on the needs of the organization (which will likely change), the information management strategy should support a centralized or distributed approach.
      • Service-based SOA – supporting the diverse set of information and analytic requirements typically calls for a service-based approach.
    Persona / Role Diversity – the information management strategy should support all of the different roles associated with managing information, including:
      • Business – support for the key business stakeholders, both business analysts and end users.
      • IT – support for the various roles that work with data in IT, including architects and DBAs.
      • Data Stewards – support for those responsible for managing data governance.
      • Data Scientists (data analyst, statistician, etc.) – support for those responsible for managing analytics.
  • Customer example: an 85 GB cube takes minutes to slice and dice.
  • Gartner IT trends from October 2011. In the list, Big Data is bracketed by approaches to solving the Big Data problem.
  • We have to prepare for large data volumes. How do I deal with them?
  • INFORMATION OVERLOAD: Not all 900 million Facebook users are relevant to me -> which of them are talking about my products and services? Storing the data alone adds no value. This sharpens the basic problem of IT: from the large amount of data (volume), which is structured very differently (variety) and grows quickly (velocity), the valuable, relevant data has to be filtered out.
      • 17% of the world’s population used a social networking site in 2011.
      • Twitter logs 100 million Tweets per day.
      • Facebook counts 350 million unique visitors per day.
      • 60 hours of video is transferred to YouTube every 60 seconds.
      • 80% of companies use social media for recruitment.
    Many competitors are talking about the first 3 Vs. We believe that Value is the key one. Just as speed alone isn’t enough, gaining value from the data is all that matters.
  • BIG Data is relevant to every phase. No more sampling! NEXT: Solution?
    While organizations may represent the process differently, they are all using the same lifecycle. And frankly, simply having more data doesn’t give you a competitive advantage. It’s only when an organization begins to revisit its analytic lifecycle, taking advantage of the additional data and compute power, that it begins to gain an advantage, both over its competition and over where it is today.
    NEXT: What does a solution look like?
  • Supercomputers process large data volumes in a relatively short time. “Monte Rosa” at CSCS (ETH Zürich) near Lugano.
  • BUT: the processors are commodity hardware; unfortunately the rest is not! High costs due to custom-built components. Approx. CHF 15 million in operating costs per year (“Monte Rosa” plus “Tödi” plus others). Focus on scientific applications and weather forecasting. NEXT: The approach can be adopted…
  • Gigantic demands on I/O throughput (several million invested in storage).
  • Distribute the data across many nodes. Each node processes only part of the data and is correspondingly faster. Consolidate the partial results into an overall result (MapReduce approach).
  • ADW = Analytical Data Warehouse -> relevant data from the Enterprise Data Warehouse and operational systems.
    The SAS Enterprise Analytics Architecture, which includes SAS High Performance Computing technology, enables organizations to “raise the relevance” in their data. SAS has introduced the concept of an Analytical Data Warehouse (ADW) to sit alongside the traditional Enterprise Data Warehouse (EDW). The ADW takes the complexity out of the EDW by surfacing only relevant data and INFORMATION to the business users.
    To draw a parallel: we all go to the grocery store. But do you realize there are two parts to a grocery store? There are the aisles with the bread, milk, cereal, etc. This is the part that the CONSUMER sees, and groceries there are organized the way the CONSUMER typically shops. The other part is generally behind the wall of the store, where items are organized quite differently: for maximum storage and for quick replenishment of the store shelves. So the “AISLES” are the ADW, while the part “BEHIND THE WALL” is the EDW.
    In our world, SAS leverages both the EDW and the ADW, depending upon the nature of the analytical problem, the required data, and the needs of the consumer. Our Information Management capabilities, combined with our advanced analytics, provide the necessary linkages to make this happen. In addition, SAS’ Grid, In-Database, and In-Memory technologies allow organizations to take large computational problems and distribute the analysis across the appropriate computing resources. This enables IT to “get out of” the “problem solving” business and “get into” the “information providing” business. We provide the BUSINESS users with INFORMATION, organized in a fashion for consumption, enabling them to solve problems quickly and confidently. SAS is the only vendor that can drive this High-Performance data-to-delivery process.
  • Massively parallel database systems with SAS as an embedded process: EMC Greenplum or Teradata. Apache Hadoop as an open source alternative. Proven (and developed) by Amazon, Google, Facebook, Yahoo. Scales practically without limit! Commodity hardware (blade servers). A second Linux (success story)?!
  • 2-4 CPUs (6-8 cores each). Up to 2 terabytes of main memory (96 gigabytes standard). Extreme CPU and memory density. Inexpensive because it is «off the shelf».
    DELL PowerEdge M910: 4GB to 1TB of DDR3 RAM, Intel Xeon Processor 7500 (8 Cores / 16 Threads per CPU).
    32 servers in one enclosure. A single rack can support 128 servers for a total of 1024 CPU cores and 2 terabytes of memory. And peak performance per rack is a staggering 12.3 teraflops.
  • Symmetric multiprocessing: in information technology, a symmetric multiprocessor system (SMP) is a multiprocessor architecture in which two or more identical processors share a common address space. This means that every processor addresses the same memory cell or the same peripheral register with the same (physical) address. Most multiprocessor systems today are SMP architectures.
    Massively Parallel Processing (MPP), in computer science, denotes distributing a task across several main processors, each of which may also have its own main memory. A massively parallel computer is therefore a parallel computer with a large number (in some cases several thousand) of independent execution units.
  • BIG Data is a hype! The buzzword will disappear, but the large data volumes will remain (e-commerce, Web 2.0). The technical prerequisites for BIG Analytics are in place: SAS High Performance Analytics, at a manageable financial outlay. BIG Analytics delivers the decisive competitive advantage, ALREADY TODAY.
  • The ever-growing data stream must be sorted early: some data should trigger events immediately, other information has to end up in sales reporting, and still other data is stored cheaply for later analysis.
  • Classic BI quickly reaches its limits here. Increasingly, the focus is on advanced analytics, which can uncover valuable patterns and relationships. And all of it has to be visualized meaningfully so that it remains understandable.
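
The note on distributing data across many nodes (each node works on only its share, and the partial results are then consolidated) can be illustrated with a small sketch. This is not SAS, Greenplum, Teradata, or Hadoop code; the function names `partition`, `map_partial`, `combine`, and `distributed_mean` are hypothetical, and the "nodes" are simulated as plain Python lists in a single process.

```python
from functools import reduce


def partition(records, n_nodes):
    """Split the records round-robin into one chunk per (simulated) node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_partial(chunk):
    """Work done on one node: reduce its chunk to a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(left, right):
    """Consolidation step: merge two partial (sum, count) results."""
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(records, n_nodes=4):
    """Compute a mean from per-node partial results only."""
    if not records:
        raise ValueError("records must not be empty")
    chunks = partition(records, n_nodes)
    # On a real cluster each chunk would be processed in parallel on its node;
    # here the map step runs sequentially for illustration.
    partials = [map_partial(chunk) for chunk in chunks]
    total, count = reduce(combine, partials, (0, 0))
    return total / count
```

Only the small (sum, count) pairs travel to the consolidation step, which is the point of bringing the work to the data rather than the data to the work.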

    1. Copyright © 2012, SAS Institute Inc. All rights reserved. make connections • share ideas • be inspired. Data. Guido Oswald, Solution Architect @SAS Switzerland
    2. BIG Data... Who?
    3. BIG Data... Who?
    4. Wikipedia says: Big Data refers to especially large volumes of data that cannot be processed, or can be processed only inadequately, with standard databases and data management tools. The main difficulties are capturing, storing, searching, distributing, analyzing, and visualizing large data volumes. These volumes reach into terabytes, petabytes, exabytes, and zettabytes.
    5. Is BIG Data relevant to me? Zettabytes?! 10^21 bytes = 1,000,000,000,000,000,000,000 bytes
    6. Zettabytes
    7. * According to IDC
    8. The diversity of data is growing!
    9. E = mc²
    10. Gartner IT Trends 2012 (October 2011): 1. Tablets; 2. Mobile applications and interfaces; 3. Contextual and social user experience; 4. The Internet of Things; 5. App stores and marketplaces; 6. Next-generation analytics; 7. Big Data; 8. In-memory computing; 9. Extremely low-energy servers; 10. Cloud computing
    11. BIG Data is coming. In your estimation, how has the volume of data processed in your company changed over the last twelve months? The data volume has ... (n=116, figures in percent). Source: CW-Marktstudie 2011: Big Data.
    12. Competitive advantage from BIG Data. [Chart labels: VOLUME, VARIETY, VELOCITY, RELEVANCE; DATA SIZE; TODAY, THE FUTURE]
    13. Analytics Lifecycle and BIG Data. How can we create strategic advantage?
        Lifecycle steps: IDENTIFY / FORMULATE PROBLEM; DATA PREPARATION; DATA EXPLORATION; TRANSFORM & SELECT; BUILD MODEL; VALIDATE MODEL; DEPLOY MODEL; EVALUATE / MONITOR RESULTS
        BUSINESS MANAGER: Domain Expert; Makes Decisions; Evaluates Processes and ROI
        IT SYSTEMS / MANAGEMENT: Model Validation; Model Deployment; Model Monitoring; Data Preparation
        BUSINESS ANALYST: Data Exploration; Data Visualization; Report Creation
        DATA MINER / STATISTICIAN: Exploratory Analysis; Descriptive Segmentation; Predictive Modeling
    14. The solution: supercomputers?
    15. The solution: supercomputers? “Monte Rosa” at CSCS (Lugano): AMD quad-core Opteron 2.4 GHz processors; 47,872 cores; 297.0 TFLOPS. BUT: Costs? Distributing the data / processing?
    16. Approach for BIG Analytics: traditional architecture
    17. Approach for BIG Analytics: instead of bringing the data to the «work», bring the «work» to the data
    18. Approach for BIG Analytics: MARKETING, SALES, FINANCE, SUPPLY CHAIN, RISK, HR; EDW, ADW
    19. The solution for BIG Analytics: parallel database system (EMC Greenplum, Teradata); Apache Hadoop; proven by Google, Facebook, Yahoo; scales practically without limit!; commodity hardware (blade servers)
    20. Commodity hardware: blade servers
    21. SAS solution for BIG Data
    22. Conclusion
    23. Thank you for your attention!
    24. BACKUP SLIDES
    25. Which data goes where? DECISIONS / ACTIONS / DATA; RAW RELEVANT DATA; LOW COST STORAGE; ENTERPRISE DATA WAREHOUSE
    26. Thinking business intelligence further: FORECASTING; DATA MINING; TEXT ANALYTICS; OPTIMIZATION; STATISTICS; INFORMATION MANAGEMENT
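
To make the zettabyte figures from slide 5 and the IDC quote in the notes more tangible, here is a small back-of-the-envelope sketch. It uses only numbers stated in the deck (10^21 bytes per zettabyte, 1.8 zettabytes in 2011, a 1 terabyte consumer disk, growth by a factor of nine in five years) and assumes decimal (SI) units and steady year-over-year growth; the helper names `units_needed` and `annual_growth_factor` are hypothetical.

```python
# Decimal (SI) units, as on the slide: 1 zettabyte = 10^21 bytes.
ZETTABYTE = 10**21
TERABYTE = 10**12
GIGABYTE = 10**9

# IDC figure quoted in the notes: 1.8 zettabytes created and replicated in 2011.
DIGITAL_UNIVERSE_2011 = 18 * ZETTABYTE // 10


def units_needed(total_bytes, unit_bytes):
    """Number of whole units (e.g. disks) needed to hold total_bytes."""
    return -(-total_bytes // unit_bytes)  # ceiling division on integers


def annual_growth_factor(total_factor, years):
    """Yearly multiplier that compounds to total_factor over the given years."""
    return total_factor ** (1 / years)


if __name__ == "__main__":
    disks = units_needed(DIGITAL_UNIVERSE_2011, TERABYTE)
    print(f"1 TB consumer disks needed for 1.8 ZB: {disks:,}")
    gigabytes = units_needed(DIGITAL_UNIVERSE_2011, GIGABYTE)
    print(f"1.8 ZB expressed in gigabytes: {gigabytes:,}")
    yearly = annual_growth_factor(9, 5)
    print(f"Growth by a factor of 9 in 5 years is about x{yearly:.2f} per year")
```

The gigabyte count reproduces the "1.8 trillion gigabytes" in the IDC quote, which is a quick consistency check on the units.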
