Big Data, IBM Power Event

Big Data – new insights
Big Data makes it possible to gain insight into, and exploit the potential of, large volumes of unstructured data.
Inspiration for how you can create value for your business.

Flemming Bagger, Nordic Segment Leader, Big Data, IBM
Søren Ravn, Consulting IT Specialist, IBM

Published in: Technology, Business

Transcript of "Big Data, IBM Power Event"

  1. Big Data Analytics. IBM Power Event – Hindsgavl Slot, May 2, 2012. Flemming Bagger, Nordic Sales Leader for Big Data Analytics and Data Warehousing; Søren Ravn, Consulting IT Specialist for Big Data.
  2. Why is 2012 the YEAR of Big Data? "Big Data: The next frontier for innovation, competition and productivity" (McKinsey Global Institute). "2012 will be the year of big data" (BBC, Nov 30, 2011). "Big Data will be the CIO issue of 2012" (IDC Prediction 2012 report). Searches for "big data" on Gartner's website increased 981% between March 2011 and October 2011. "Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies… They are increasingly asking the question, 'How can we use big data to deliver new insights?'" (Gartner, 2012).
  3. Insights from the IBM Global CEO Study 2010: the vast majority of CEOs experience the new economic environment as distinctly different. Percentages answering "not at all/to a limited extent", "to some extent" and "to a large/very large extent" (full sample; Nordics): more volatile (deeper/faster cycles, more risk) 13/18/69%; 13/19/68%. More uncertain (less predictable) 14/21/65%; 8/13/79%. More complex (multi-faceted, interconnected) 18/22/60%; 28/31/41%. Structurally different (sustained change) 26/21/53%; 34/29/37%. "Last year's experience was a wake-up call, like looking into the dark with no light at the end of the tunnel." (CEO, Industrial Products, The Netherlands.) Source: Q7, "To what extent is the new economic environment different?"; volatile n=1514, uncertain n=1521, complex n=1522, structurally different n=1523, Nordics n=83.
  4. IBM Institute for Business Value: which underprepared areas are the most critical for CMOs? The Marketing Priority Matrix plots underpreparedness (percent of CMOs reporting underpreparedness) against impact (percent of CMOs selecting the factor among their top five factors impacting marketing). Factors: (1) data explosion, (2) social media, (3) growth of channel and device choices, (4) shifting consumer demographics, (5) financial constraints, (6) decreasing brand loyalty, (7) growth market opportunities, (8) ROI accountability, (9) customer collaboration and influence, (10) privacy considerations, (11) global outsourcing, (12) regulatory considerations, (13) corporate transparency. Source: Q7 (n1=1733) and Q8 (n2=149 to 1141, where n2 is the number of respondents who selected the factor as important in Q7).
  5. Information is at the Center of a New Wave of Opportunity… and Organizations Need Deeper Insights. 44x as much data and content over the coming decade: 800,000 petabytes in 2009, growing to 35 zettabytes by 2020, and 80% of the world's data is unstructured. 1 in 3 business leaders frequently make decisions based on information they don't trust, or don't have. 1 in 2 business leaders say they don't have access to the information they need to do their jobs. 83% of CIOs cited "business intelligence and analytics" as part of their visionary plans to enhance competitiveness. 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions.
  6. The Big Data Conundrum: the percentage of available data an enterprise can analyze is decreasing in proportion to the amount of data available to that enterprise. Quite simply, this means that as enterprises we are getting "more naive" about our business over time. (Chart: data AVAILABLE to an organization versus data an organization can PROCESS.)
  7. What should a Big Data platform do? (The 3 Vs.) Analyze a variety of information: novel analytics on a broad set of mixed information that could not be analyzed before. Analyze information in motion: streaming data analysis; large-volume data bursts and ad-hoc analysis. Analyze extreme volumes of information: cost-efficiently process and analyze petabytes of information; manage and analyze high volumes of structured, relational data. Discover and experiment: ad-hoc analytics, data discovery and experimentation. Manage and plan: enforce data structure, integrity and control to ensure consistency for repeatable queries.
  8. IBM Big Data Strategy: Move the Analytics Closer to the Data. Netezza is for high economic value data that requires deep, extensive and frequent analysis, with results delivered in minutes. Streams is for low-latency, real-time analysis of high-velocity data, with results delivered in under a second, after which the data is discarded or stored elsewhere. BigInsights is for discovery and exploration on data of uncertain economic value, to identify patterns and correlations which can be proceduralised; it can also be used as a lower cost-per-terabyte store for data that is used or accessed in a non-time-critical manner.
  9. Why Didn't We Use All of the Big Data Before?
  10. One customer... Two data worlds. A traditional, structured, repeatable, linear world: product/service data (subscriptions, rate plans, media type, category/classification, price), customer data (segment, demographics, sex and age group, tenure, rate plan, credit rating, ARPU group), network data (availability, throughput/speed, latency, location, facilities), usage and transactions (voice, SMS, MMS, data and web sessions, signaling and authentication, probe/DPI), monthly sales reports, profitability analysis, customer surveys, and device data (class, manufacturer, model, OS, capability, keyboard type). Alongside it, a newer, less structured world: virtual worlds (starts, stops, success rates, errors), social networking and collaboration, content (setup time, connection time, communities, interface, discovery, navigation, recommendations), click streams, purchases and downloads, recency/frequency/monetary measures, and blogs/micro-blogs.
  11. Complementary Approaches for Different Use Cases. Traditional approach: structured, analytical, logical; the data warehouse; traditional sources such as transaction data, internal app data, mainframe data, OLTP system data and ERP data; structured, repeatable, linear; typical outputs are monthly sales reports, profitability analysis and customer surveys. New approach: creative, holistic thought, intuition; Hadoop and Streams; new sources such as web logs, social data, text data (emails), sentiment, sensor data (images) and RFID; unstructured, exploratory, iterative; typical outputs are brand sentiment, product strategy and maximum asset utilization. Enterprise integration bridges the two.
  12. IBM Big Data Strategy: Move the Analytics Closer to the Data. Netezza is for high economic value data that requires deep, extensive and frequent analysis, with results delivered in minutes. Streams is for low-latency, real-time analysis of high-velocity data, with results delivered in under a second, after which the data is discarded or stored elsewhere. BigInsights is for discovery and exploration on data of uncertain economic value, to identify patterns and correlations which can be proceduralised; it can also be used as a lower cost-per-terabyte store for data that is used or accessed in a non-time-critical manner.
  13. InfoSphere Streams: Analyze all your data, all the time, just in time. What if you could get IMMEDIATE insight? What if you could analyze MORE kinds of data? What if you could do it with exceptional price/performance? (Diagram: data from billing/transaction systems, traditional systems, and sensor events and signals flows through Streams to produce analytic results, alerts/actions, real-time customer offers and threat prevention, with more context drawn from enterprise storage and warehousing.)
  14. Traditional Computing versus Stream Computing. Traditional computing: historical fact finding; find and analyze information stored on disk; a batch paradigm with a pull model; query-driven, submitting queries to static data; relies on databases and data warehouses. Stream computing: real-time analysis of data in motion; analyze data before you store it; a stream of structured or unstructured data; analytic operations applied to streaming data in real time. Databases find the needle in the haystack; Streams finds the needle as it's blowing by.
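To make the contrast concrete, here is a minimal Java sketch (not SPL and not any IBM API; the readings, window size and alert threshold are invented for illustration) of the two shapes of computation: a batch query over data already at rest versus a sliding-window analysis applied to each reading as it arrives, before anything is stored.

```java
// Illustrative sketch only: batch query over stored data vs. per-event stream analysis.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class BatchVsStream {

    // Traditional approach: data is stored first, then a query runs over it.
    static double batchAverage(List<Double> storedReadings) {
        return storedReadings.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    // Stream approach: each reading is analyzed the moment it arrives,
    // here with a sliding window over the last N readings and an alert rule.
    static class SlidingWindowAnalyzer {
        private final Deque<Double> window = new ArrayDeque<>();
        private final int size;
        private final double alertThreshold;

        SlidingWindowAnalyzer(int size, double alertThreshold) {
            this.size = size;
            this.alertThreshold = alertThreshold;
        }

        void onReading(double value) {
            window.addLast(value);
            if (window.size() > size) {
                window.removeFirst();            // keep only the most recent readings
            }
            double avg = window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
            if (avg > alertThreshold) {
                System.out.printf("ALERT: window average %.1f exceeds %.1f%n", avg, alertThreshold);
            }
        }
    }

    public static void main(String[] args) {
        // Batch: query-driven analysis of data at rest.
        System.out.println("Batch average: " + batchAverage(List.of(10.0, 12.0, 11.0, 50.0)));

        // Stream: analysis happens before (or instead of) storing the data.
        SlidingWindowAnalyzer analyzer = new SlidingWindowAnalyzer(3, 20.0);
        for (double reading : new double[] {10.0, 12.0, 11.0, 50.0, 55.0, 60.0}) {
            analyzer.onReading(reading);
        }
    }
}
```

The batch method answers a question about what already happened; the streaming analyzer reacts to each event as it passes, which is the behavioral difference the slide describes.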
  15. InfoSphere Streams for superior real-time analytic processing. Streams Processing Language (SPL), built for streaming applications: reusable operators, rapid application development, continuous "pipeline" processing. Groups of operators compile into single processes: efficient use of cores, distributed execution, very fast data exchange, either automatic or tuned; scaled with the push of a button. Use the data that gives you a competitive advantage: Streams can handle virtually any data type, including data that is too expensive and time-sensitive for traditional approaches. Easy to extend: built-in adaptors; users add capability with familiar C++ and Java. Dynamic analysis: programmatically change topology at runtime, create new subscriptions and port properties, and extend applications incrementally without downtime. Easy to manage: automatic placement; multi-user and multiple applications. Flexible, high-performance transport: very low latency and high data rates.
  16. IBM Big Data Strategy: Move the Analytics Closer to the Data. Netezza is for high economic value data that requires deep, extensive and frequent analysis, with results delivered in minutes. Streams is for low-latency, real-time analysis of high-velocity data, with results delivered in under a second, after which the data is discarded or stored elsewhere. BigInsights is for discovery and exploration on data of uncertain economic value, to identify patterns and correlations which can be proceduralised; it can also be used as a lower cost-per-terabyte store for data that is used or accessed in a non-time-critical manner.
  17. InfoSphere BigInsights – A Full Hadoop Stack. User interface: integrated install, management console, development tooling. Analytics (ODS): visualization, application analytics, ML analytics, text analytics. Processing: MapReduce and AdaptiveMR, Pig, Hive, Jaql, Avro, Zookeeper, Oozie, Lucene. Storage: HBase, HDFS, GPFS-SNC. Data sources and connectors: Streams, DB2 LUW, DB2 z, Netezza, Teradata, Informix, Oracle, DataStage, Flume, R.
  18. What is Hadoop? Apache Hadoop is a free, open-source framework for data-intensive applications, inspired by Google technologies (MapReduce, GFS), originally built to address the scalability problems of web search and analytics, and extensively used by Yahoo!. It enables applications to work with thousands of nodes and petabytes of data in a highly parallel, cost-effective manner: the CPU and disks of a commodity box form a Hadoop node, boxes are combined into clusters, and new nodes can be added without changing data formats, how data is loaded, or how jobs are written. Processing: the MapReduce framework is how Hadoop understands and assigns work to the nodes (machines). Storage: the Hadoop Distributed File System (HDFS) is where Hadoop stores data; it spans all the nodes in a Hadoop cluster and links together the file systems on many local nodes to make them into one big file system.
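As a concrete illustration of the MapReduce model described above, the canonical word-count job written against the standard Apache Hadoop Java MapReduce API looks roughly like this (a generic example, not part of BigInsights; minor API details vary between Hadoop versions, and input/output paths are supplied as arguments):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map step: for each input line, emit (word, 1) key-value pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce step: sum the counts for each word across all mappers.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The framework splits the input across the cluster, runs the mapper on each split, groups the emitted pairs by key, and runs the reducer per key, which is exactly the "assigns work to the nodes" behavior the slide refers to.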
  19. Machine Learning Analytics. SystemML is a machine learning engine invented by IBM Research for native use on BigInsights. Directly implementing ML algorithms on MapReduce is difficult: natural mathematical operators need to be re-expressed in terms of key-value pairs and map and reduce functions, and data characteristics dictate the optimal MapReduce implementation, so the user bears responsibility for efficient hand-coding. Sample uses: finding non-obvious data correlations over Internet-scale data collections, e.g. topic modeling, recommender systems, ranking.
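As a small illustration of why that hand-coding is painful, the sketch below (plain Java with invented data, no Hadoop dependency) shows how even the simple operation y = A·x must be re-expressed: a "map" step turns each non-zero matrix entry into a (row, partial product) key-value pair, and a "reduce" step sums the partial products per row. A declarative engine such as SystemML is meant to generate this kind of plan so the user does not write it by hand.

```java
// Plain-Java sketch of re-expressing y = A * x as map and reduce over key-value pairs.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MatrixVectorMapReduce {

    // One non-zero entry of a sparse matrix A: value at (row, col).
    record Entry(int row, int col, double value) {}

    public static void main(String[] args) {
        // A 3x3 sparse matrix with 4 non-zeros, and a dense vector x (invented data).
        List<Entry> a = List.of(
                new Entry(0, 0, 2.0), new Entry(0, 2, 1.0),
                new Entry(1, 1, 3.0), new Entry(2, 0, 4.0));
        double[] x = {1.0, 2.0, 3.0};

        // "Map": each matrix entry becomes a key-value pair (row -> a_ij * x_j).
        // "Reduce": partial products are summed per row key to give y_i.
        Map<Integer, Double> y = new HashMap<>();
        for (Entry e : a) {                               // map phase
            double partial = e.value() * x[e.col()];
            y.merge(e.row(), partial, Double::sum);       // reduce phase, grouped by key
        }

        y.forEach((row, value) -> System.out.println("y[" + row + "] = " + value));
        // Expected: y[0] = 5.0, y[1] = 6.0, y[2] = 4.0
    }
}
```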
  20. Statistical and Predictive Analysis. A framework for machine learning (ML) implementations on Big Data: large, sparse data sets (e.g. 5B non-zero values), running on large BigInsights clusters with thousands of nodes. Productivity: build and enhance predictive models directly on Big Data using a high-level, declarative machine learning language (DML); e.g. 1,500 lines of Java code boil down to 15 lines of DML, and parallel SPSS data mining algorithms are implementable in DML. Optimization: algorithms compile into optimized parallel code for different clusters and different data characteristics; e.g. a 1-hour hand-coded execution comes down to 10 minutes. (Chart: execution time in seconds versus number of non-zeros in millions, comparing Java MapReduce, SystemML, and single-node R.)
  21. Customer Use Case: Log Analytics (storing computer logs and transaction data). Business problem: the size and volume of log data generated by computer systems constrains the ability of many enterprises to create and maintain effective platforms for compliance and analysis. IBM solution: ingests all system logging at low latency (under 15 minutes) and re-assembles the transactions into a whole, providing exact details on system component response times and trending. The solution can store more than a year's worth of data, and an analytics layer can be delivered through a web front end and standard browser-based tooling for ad-hoc analytics.
  22. Log Analysis is a Big Data Problem. Volume: a large number of devices; logs generated at the hardware, firmware, OS and middleware levels; aggregation over time for predictive analysis generates vast amounts of log data. Velocity: online analysis is needed to explore the data and discover meaningful correlations. Variety: log formats lack a unified structure, varying across device types and firmware/middleware versions, and log data needs to be supplemented with additional data such as performance and availability/fault data and reference data.
  23. Log Analysis – why. IBM and its customers have huge amounts of log data: system logs and application logs. We know there is valuable information hidden in these logs. Anomaly detection: what kind of alerts should I add to my automated monitoring system? Root cause analysis: what sequence of minor problems caused this major problem? Resource planning: where do I need to add redundancy? When should a particular machine be replaced? Marketing: how can I turn more of the visitors to my site into customers? But getting that information out requires extraction, transformation and complex statistical analysis at scale.
  24. Insight into your logs. A pipeline from logs to reports, dashboards and alerts, serving data analysts, analytics developers, programmers and end users: import, transform, analyze, then ad-hoc exploration and dashboards. Import: log files, performance data, fault data and reference data (network topology, device dictionaries) from various source systems into HDFS. Transform: identify record boundaries, extract information from text, identify patterns, find cross-log relationships and integrate across diverse data sources, build indexes. Analyze: sessionization (identify which records are part of the same session), identify subsequences containing a fault or performance issue, observe correlations, apply predictive operators. Visualize: ad-hoc exploration with BigSheets, and institutionalize the knowledge gleaned from ad-hoc exploration (network operating center dashboards, reports, alerts).
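The sessionization step mentioned above can be illustrated with a small, hypothetical Java sketch (the field names, the 30-minute timeout and the sample records are invented): log records for the same user are grouped into a new session whenever the gap since that user's previous record exceeds an inactivity timeout.

```java
// Hypothetical sessionization sketch: group per-user log records into sessions.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Sessionizer {

    record LogRecord(String user, long timestampMillis, String message) {}

    static final long SESSION_GAP_MILLIS = 30 * 60 * 1000; // 30-minute inactivity gap (assumed)

    // Groups records (assumed sorted by timestamp) into per-user sessions.
    static Map<String, List<List<LogRecord>>> sessionize(List<LogRecord> records) {
        Map<String, List<List<LogRecord>>> sessionsByUser = new LinkedHashMap<>();
        Map<String, Long> lastSeen = new LinkedHashMap<>();
        for (LogRecord r : records) {
            List<List<LogRecord>> sessions =
                    sessionsByUser.computeIfAbsent(r.user(), u -> new ArrayList<>());
            Long previous = lastSeen.get(r.user());
            // Start a new session on the user's first record or after a long gap.
            if (previous == null || r.timestampMillis() - previous > SESSION_GAP_MILLIS) {
                sessions.add(new ArrayList<>());
            }
            sessions.get(sessions.size() - 1).add(r);
            lastSeen.put(r.user(), r.timestampMillis());
        }
        return sessionsByUser;
    }

    public static void main(String[] args) {
        List<LogRecord> records = List.of(
                new LogRecord("alice", 0L, "login"),
                new LogRecord("alice", 5 * 60 * 1000L, "search"),
                new LogRecord("alice", 120 * 60 * 1000L, "login"),  // new session after a 2-hour gap
                new LogRecord("bob", 10 * 60 * 1000L, "login"));
        sessionize(records).forEach((user, sessions) ->
                System.out.println(user + " has " + sessions.size() + " session(s)"));
        // Expected: alice has 2 session(s), bob has 1 session(s)
    }
}
```

At production scale this grouping would run as a distributed job over HDFS rather than in memory, but the per-user gap logic is the same.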
  25. Optimizing capital investments based on double-digit-petabyte analysis. Business challenge: wind turbines are expensive and have a service life of ~25 years; the existing process for turbine placement requires weeks of analysis, uses a subset of the available data and does not yield optimal results. Project objectives: leverage a large volume of weather data (2+ PB today; ~20 PB by 2015) to optimize placement of turbines; reduce modeling time from weeks to hours; analyze data from turbines to optimize ongoing operations. Solution components: IBM InfoSphere BigInsights Enterprise Edition (a GPFS-based file system capable of running Hadoop and non-Hadoop apps, powerful and extensible query support (JAQL), read-optimized column storage) on IBM xSeries hardware. The benefits: clear fulfillment of Vestas business needs through IBM technology and expertise; reliability, security, scalability and integration needs fulfilled; standard enterprise software support; a single-vendor solution for software, hardware, storage and support.
  26. The Big Data Challenge. 7/25/2008: Google passes 1 trillion URLs. $187/second: cost of the last eBay outage ($16,156,800/day). 789.4 PB: current size of YouTube. 2/4/2011: IPv4 address space is exhausted; 4.3 billion addresses have been allocated. Roughly 3.4 x 10^38: size of the IPv6 address space. 100 million gigabytes: size of Google's index. 144 million: number of tweets per day. 1.7 trillion: items at Facebook (90 PB of data). 4.3 billion: mobile devices.
  27. The Big Data Challenge. The biggest Big Data challenge of our future: humans are limited, sensors are unbounded, and the "sensorization" of everything means everything is a sensor. The problem: we don't know the future value of a dot today, and we cannot connect dots we don't have.
  28. Current approaches might not be enough in the future. Understand the current state and the desired state…
  29. THINK. ibm.com/bigdata