IBM Netezza - The data warehouse in a big data strategy

Big Data - Trends and reality in Information Management.
This presentation was given at IBM Data Server Day on 22 May in Stockholm by Jacques Milman, Datawarehouse Architecture Leader, IBM.

Published in: Technology, Business

  • Big data includes a wide variety of relational and non-relational (structured and unstructured) data sources. The volume of this data is large, or can grow to be large. Though the data might be “noisy”, and a large portion may not be valuable, it can contain significant insight. Data may have an extremely short half-life.
  • Key Points: There are many varied use cases that may be addressed by Internet-scale analytics, across many industries. Utilities: already mentioned the weather analysis for wind turbine placement. Mention the smart meter analysis: reading large volumes of smart meter data and combining it with other consumption data and weather data to understand the impact on consumption. Also able to store anomalies in meter reads (data that doesn’t fit into a pre-defined structure), which aids in problem detection and repairs. IT: log analysis is a popular use case. Bringing in log data from many systems to track a transaction that unfolds across them helps determine where an error occurred. Previously this was possible, but only through mapping many log files to one relational structure, which was costly and time-consuming. E-commerce & multi-channel: both use cases were mentioned previously, so no need to dwell on them here. Transportation: taking in a huge variety of data (logistics, traffic data, weather patterns, fuel consumption) to optimize logistics. It contains all three elements of V3: significant variety; moderate velocity, though the time window to solve the problem is quite short and the data is very volatile; and volume. Transmission monitoring: voltage levels and transient voltage fluctuations.
  • Here is another example: something the University of Southern California Annenberg School of Communication did with the IBM Big Data platform’s BigSheets technology. USC@Annenberg created the Film Forecaster tool and used it to correctly predict 2011’s summer blockbusters by scraping Twitter and analyzing the tweets against a simple lexicon that indicated a positive or negative showing for a movie. They made quite an impact: this very solution was featured on ABC News (a national news network in the USA). More striking is the quote: the application was built by a communication master’s student who learned BigSheets in a day.
  • This picture is a little simplistic for two reasons. First, it gives pre-eminence to Netezza. That is because Netezza’s simplicity, performance and agile support for ad-hoc analysis often make it the default proposition for an analytic warehouse in a greenfield situation (though this is not necessarily true if there is an existing commitment to Power or to DB2). Second, it does not recognise the distinction between exploratory analysis and repeated analysis. If you are doing exploratory analysis of relational (i.e. structured) data, Netezza is the better platform; it thrives on ad-hoc analysis and has very rich tooling (INZA, SPSS etc.) for analytics. Exploratory analysis on unstructured data clearly belongs on BigInsights (BigI). Exploratory analysis on something in between (e.g. CDRs) could be done on Netezza, but if the data is not already being loaded (and even at a Netezza customer the raw XDRs are probably not loaded into the warehouse), then exploration in a low-cost Hadoop grid makes a lot of sense. We have at least one customer use case of this, where the analysis was implemented in Netezza once it became repeatable. But there are also use cases where the repeated analysis remains in BigI, exploiting its differentiating enterprise readiness.
  • If it’s data in motion (remember the babies being monitored), it has to be real-time, so it has to be Streams. That’s the easy one. If it’s unstructured data at rest, the best place to start is BigInsights, though you may subsequently load data into the relational warehouse for further insight. If it’s relational data, it’s unlikely you are going to move it to Hadoop. If it’s semi-structured, you have a choice, and you’ll be influenced by these other development factors. It may be that an organization has already developed a map-reduce solution that delivers a high-value analysis for data that was unloaded from the corporate EDW. Is the right solution to say ‘great, now you know the solution, re-code it in SQL using in-database analytics and implement it on your warehouse’? Maybe a better solution is to implement BigInsights to enterprise-harden the Hadoop environment and run the application as is, but with production-grade reliability and supportability. It may be that the volume is so huge that a DWH can’t handle it, and certainly can’t handle it economically (think Vestas). It may be better to go to the platform with more of the appropriate analytic skills or other development resources available. It may be that the customer wants to build their capability in Hadoop because they will have more challenging use cases later that will be clear-cut BigInsights use cases. It may be that the customer just wants to experiment cheaply and quickly (though actually that’s more a BigI Basic edition use case; we’ll be looking to enterprise-harden it later). But remember, these are influencers, not deciders. IBMers can adapt to whatever best matches the customer’s needs, because of the comprehensive nature of our big data portfolio.
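
The starting-point rules in the last note can be written down as a small decision function. This is only an illustrative sketch in Python: the `Workload` type, its field names, and the returned strings are hypothetical and not part of any IBM product or API. It encodes just the first-pass guidance (motion, structure) and leaves the "influencing factors" to human judgement.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload (names invented for this sketch)."""
    in_motion: bool   # True if the data must be analysed as it arrives
    structure: str    # "relational", "semi-structured", or "unstructured"


def suggest_platform(w: Workload) -> str:
    """Return the first-pass platform suggestion described in the notes."""
    if w.in_motion:
        # Data in motion has to be handled in real time.
        return "InfoSphere Streams"
    if w.structure == "unstructured":
        # Unstructured data at rest: start in the Hadoop-based platform.
        return "InfoSphere BigInsights"
    if w.structure == "relational":
        # Relational data is unlikely to be moved to Hadoop.
        return "Relational warehouse (e.g. Netezza)"
    if w.structure == "semi-structured":
        # No single answer: skills, volume, cost and existing code all weigh in.
        return "BigInsights or Netezza, depending on influencing factors"
    raise ValueError(f"unknown structure: {w.structure!r}")


if __name__ == "__main__":
    print(suggest_platform(Workload(in_motion=False, structure="semi-structured")))
```

The semi-structured branch deliberately returns a non-answer, since the notes stress that the remaining factors are influencers, not deciders.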
  • Transcript

    • 1. February 2012. IBM Netezza: the data warehouse in a big data strategy. © 2012 IBM Corporation
    • 2. Information Management. What is “BIG DATA”? All kinds of data. Large volumes. Valuable insight, but difficult to extract. Often extremely time sensitive. Existing sources of data continue to grow. New sources of data are now available: detailed customer data, internet sources, instrumentation. Data arrives at an increasing rate.
    • 3. Utilities: weather impact on power generation; transmission monitoring; smart grid management. Financial Services: fraud detection; risk management; 360° view of the customer. Transportation: weather and traffic impact on logistics and fuel consumption. IT: transition log analysis for multiple systems; cybersecurity. Health & Life Sciences: epidemic early warning; ICU monitoring; healthcare monitoring. Retail: customer 360° view; click-stream analysis; real-time promotions. Telecommunications: CDR processing; churn prediction; geomapping / marketing; network monitoring. Law Enforcement: real-time multimodal surveillance; situational awareness; cyber security detection.
    • 4. What is “BIG DATA”? MATHs. All kinds of data. Large volumes. Valuable insight, but difficult to extract. Often extremely time sensitive.
    • 5. Variety: manage the complexity of multiple relational and non-relational data types and schemas. Velocity: streaming data and large-volume data movement. Volume: scale from terabytes to zettabytes. Utilities: weather impact on power generation; transmission monitoring; smart grid management. Financial Services: fraud detection; risk management; 360° view of the customer. Transportation: weather and traffic impact on logistics and fuel consumption. IT: transition log analysis for multiple systems; cybersecurity. Health & Life Sciences: epidemic early warning; ICU monitoring; healthcare monitoring. Retail: customer 360° view; click-stream analysis; real-time promotions. Telecommunications: CDR processing; churn prediction; geomapping / marketing; network monitoring. Law Enforcement: real-time multimodal surveillance; situational awareness; cyber security detection.
    • 6. Marketing to a segment of one. Identifies items that shoppers are likely to buy in future visits. Coupon redemption rates as high as 24%. “Because of (Netezza’s) in-database technology, we believe we’ll be able to do 600 predictive models per year (10X as many as before) with the same staff.” - Eric Williams, CIO & Executive VP
    • 7. Netezza in-database analytics at Catalina Marketing. 35X improvement in staff productivity: model development reduced from 2+ months to 2 days; 90 models per year in 2006; 900 models per year in 2011, with the same staff; model scoring time reduced from 4.5 hours to 60 seconds. Increased depth of data per model: 150 to 3.2 million features; 1 million to 14.5 trillion records per analysis. ROI on IT investment: direct correlation between number of models and revenue.
    • 8. Big Data Analytics in Smarter Hospitals. Big Data enabled doctors from University of Ontario to apply neonatal infant monitoring to predict infection in ICU 24 hours in advance. IBM Data Baby (youtube.com)
    • 9. University of Ontario Institute of Technology. Use case: neonatal infant monitoring; predict infection in ICU 24 hours in advance. Solutions: 120 children monitored: 120K msg/sec, billion msg/day; trials expanding to include hospitals in US and China. [Diagram labels: Event Pre-processor; Analysis Framework; Sensor Network; Stream-based Distributed Interoperable Health care Infrastructure; Solutions (Applications)]
    • 10. Vestas optimizes capital investments based on 2.5 petabytes of information. Model the weather to optimize placement of turbines, maximizing power generation and longevity. Reduce time required to identify placement of turbine from weeks to hours. Incorporate 2.5 PB of structured and semi-structured information flows. Data volume expected to grow to 6 PB.
    • 11. Big Data Made Easy for the Little Guy. USC’s Film Forecaster correctly predicted a clamor for “Hangover 2” that resulted in a $100 million opening over Memorial Day weekend. Looked at 250K-500K tweets and broke down positive and negative messages using a lexicon of 1700 words. “The Film Forecaster sounds like a big undertaking for USC, but it really came down to one communications masters student who learned Big Sheets in a day, then pulled in the tweets and analyzed them.” - Ryan Kim
    • 12. IBM big data platform. InfoSphere BigInsights: Hadoop-based analytics for variety and volume. InfoSphere Streams: low-latency analytics for streaming data. InfoSphere Information Server: high-volume data integration and transformation. IBM optimized workload data warehouses: scalable, high-performance, mixed-workload analytics on structured data. [Diagram labels: Hadoop; Stream Computing; Information Integration; MPP Data Warehouse]
    • 13. IBM big data platform
    • 14. IBM big data platform. Analytics on Big Data at Rest: InfoSphere BigInsights (unstructured) and IBM Netezza (structured). Analytics on Big Data in Motion: InfoSphere Streams.
    • 15. IBM big data platform
    • 16. IBM big data platform. Big Data: Volume, Velocity, Variety. Combining data types & sources. Combining technologies to analyse it. Complementing the relational warehouse.
    • 17. (no text on this slide)
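
The Film Forecaster slide describes breaking tweets into positive and negative messages using a word lexicon. The Python sketch below illustrates that general technique only; it is not BigSheets and not USC's implementation. The two tiny word sets are invented stand-ins for the 1700-word lexicon, which the deck does not show, and the function names are hypothetical.

```python
import re

# Hypothetical stand-in lexicon; the real one is not given in the slides.
POSITIVE = {"hilarious", "awesome", "love", "great"}
NEGATIVE = {"boring", "awful", "hate", "terrible"}


def score_tweet(text: str) -> int:
    """Return (# positive words) - (# negative words) found in the tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def summarize(tweets: list[str]) -> dict[str, int]:
    """Count tweets whose lexicon score is positive, negative, or neutral."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for tweet in tweets:
        score = score_tweet(tweet)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Can't wait, the trailer was hilarious!",
        "Looks boring and awful",
        "Going to the movies tonight",
    ]
    print(summarize(sample))
```

A word-count approach like this ignores negation and sarcasm, which is part of why it is simple enough to learn and apply quickly.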
