New Innovations in Information Management for Big Data - Smarter Business 2013



Big data has changed the IT landscape. Learn how
your existing IIG investment, combined with our
latest innovations in integration and governance, is a
springboard to success with big data use cases that
unlock valuable new insights.

Presenter: David Corrigan, Big Data Specialist, IBM

Published in: Business, Technology
  • Key Points: There’s a notion that you should govern data to make it an asset, or because you ought to, or because compliance requires it. Those are all true, but the real reason to do it is competitive advantage.
    Information is supposed to inform all of our decisions: to unlock new insights for competitive advantage, to gain market share, and so on. But the biggest hindrance to using information is confidence. If users don’t trust the data, they won’t use it. Trusting the data means you actually use it to your advantage, and that’s the source of outperforming peers.
    Those same companies are able to transform their front office experience by making faster and better decisions at the point of interaction. In fact, 4 out of 5 companies with mature IIG rated their decision making as 7/10 (very good) or higher. In other words, better data means better decisions.
    And the users have confidence in their data because they know it’s trusted: it’s made obvious to them what has been done to verify, validate, and improve the information they are using. In other words, they make better decisions because they trust their data.
    Client Stories & Anecdotes: 24,800 lives saved with better information confidence. Premier used a variety of IBM software products to improve patient health and reduce costs. The InfoSphere products (Master Data Management, Information Server, DataStage, and QualityStage) created a single, trusted view of each data entry in the system. Together, the products produced a better data warehouse.
    Catchy Statement: The reason you integrate and govern data is as simple as this: you’ll outperform your competitors by making better decisions, because your employees have confidence in, and therefore use, the data available to them.
  • Key Points:
    - With so many initiatives dependent on data, simply getting access to the right data is a challenge.
    - InfoSphere Data Click accelerates a whole host of projects by making it easier to get started, without long waits for IT resources.
    - Data Click has been very well received since its introduction last year, and it is now becoming even more helpful by enabling integration of data from more big data sources (JSON, NoSQL, Hadoop, and many others via JDBC).
    How InfoSphere Data Click Works: Data Click now provides rapid access to a wide range of data in repositories such as Teradata, Netezza, SQL Server, Greenplum, Informix, Sybase, files, and more, in addition to the original sources (DB2, Oracle) and original target (Netezza).
    Message in a Nutshell: Universal connectivity with just two clicks.
  • Key Points: Through hundreds of client implementations, briefings, and consultations, we’ve determined a common set of big data use cases. Each use case requires different big data technology, a different set of governance capabilities, and a different level of appropriate governance.
    Big Data Exploration: this use case is all about ingesting big data quickly or discovering it in its source systems, determining its relative value, experimenting with it, and utilizing it. From an IIG perspective, it’s critical that you be able to discover the data and determine your confidence in it. That’s not to say it should be improved or governed yet while you’re exploring. The focus is on understanding your confidence level in the data, to determine whether you trust the outcomes or whether the data needs to be improved before it’s analyzed.
    Enhanced 360° View: this use case is about truly knowing everything about master entities such as the customer. In order to find big data for the customer, you first need to establish the unique customer record, and that’s where MDM, along with data quality and integration, plays a role.
    Security and Intelligence Extension: this use case is about monitoring data (log data, network data) to prevent data loss, threats, and fraud, among other things. IIG helps by automatically protecting and masking sensitive data, and by aiding in the detection of fraudulent individuals and networks.
    Operations Analysis: this use case is all about analyzing operational data from machines and networks, either streaming information or data at rest. It requires high volume data integration to move and integrate data among the zones.
    DW Augmentation: this use case focuses on augmenting the DW. Sometimes that means archiving data from the DW while still being able to access and analyze it; sometimes it includes complementing the DW with unstructured data and unconventional sources. IIG helps by providing high volume data integration to and from the DW, as well as archiving capabilities to track the lifecycle of data.
  • Key Points: This use case is about joining the power of MDM with the power of big data to truly know everything about your customer. MDM manages big data volumes for structured master data: matching, consolidating, and providing master data as a service. Data Explorer extends that view by finding and displaying all available big data related to that customer record.
    The capabilities you need for a true 360° view include:
    - Combining structured and unstructured master data: join master records with unstructured content in one view.
    - Onboarding new data sources: keep them as separate but linked records to enable a complete view and, as your confidence in those uncertain sources rises, merge them into a single golden record.
    - Hybrid MDM: the ability to act as a virtual/registry-style approach for some systems while acting as a transaction hub with a single physical record for other systems enables organizations to onboard big data systems rapidly as ‘virtual records’ and consolidate to the physical record over time.
    - Finding master entities within big data sources: the ability to match data at big data volumes as well as identify master records in new big data sources.
    Catchy Statement: Many software categories have proclaimed victory in the holy grail that is the “360° view”, but each has fallen short by offering only a piece of that view. Finally, this is a solution that delivers on that promise.
  • Key Points: This use case is about augmenting the DW with the power of new big data technologies. To do that effectively, you also need IIG capabilities, such as:
    - Self-service integration: the ability for business users, or data scientists and analytic professionals who work in the LOB, to access and integrate data on demand.
    - Understand context: view what data is available in the DW, what is available to augment the DW, and how it is related. This also includes a business glossary of terms, including very industry-specific terms, to ensure everyone is using the correct terminology.
    - Mask sensitive data to ensure privacy.
    - Protect and monitor data within the DW to prevent data loss and breaches.
    - Store and analyze archive files on Hadoop: manage the lifecycle of data and the compliance requirements for archiving and disposal of data.
  • Key Points: Confidence is iterative. Varying amounts of IIG are required for each big data use case. It’s only with agile governance that you can apply the appropriate level of governance to be successful. I’ll leave you with a final question: are you confident in your data? You definitely need to answer that question before you begin your big data journey.

    1. 1. New Innovations in Information Integration & Governance (IIG) for Big Data David Corrigan Director of Product Marketing, InfoSphere
    2. 2. Data Confidence Is Essential If you want to find new insights from big data . . . and ACT on those insights . . . you need confidence in the data used for insight Information Integration & Governance (IIG) • Make decisions with greater certainty • Analyze rapidly while providing necessary controls • Increase the value of data
    3. 3. Building Big Data Confidence is Essential • Outperform Competitors: 3x, organizations with IIG outperform their competitors • Transform the Front Office Experience: 80%, organizations rated their decision making as good or excellent • Establish Trusted Information: 77%, organizations establish high or very high level of trust in data
    4. 4. IIG Evolves for the Era of Big Data (1) How do I get access to new big data sources? Automated Integration: business users need rapid data provisioning among the zones (2) How do I digest all of this new information? Visual Context: categorize, index, and find big data to optimize its usage (3) How do I manage all of this new data? Agile Governance: ensure appropriate actions based on the value of the data
    5. 5. Six Innovations that Build Big Data Confidence Automated Integration • Data Click: self-service data provisioning for big data repositories • Big Match: integration of master records from big data with probabilistic matching powered by Hadoop * Visual Context • Information Governance Dashboard: visual context to give immediate status on governance policies • Big Data Catalogue: categorize metadata on all big data sources Agile Governance • Big Data Privacy & Security: monitor and mask sensitive big data in Hadoop, NoSQL, & relational systems * • MDM for Big Data: rapid mastering of new big data sources and extension of 360 view with unstructured big data * (* Statement of Direction)
    6. 6. InfoSphere Data Click Self-service Data Provisioning (Automated Integration) Innovation • Two-click data provisioning designed for business users • Integration of more big data sources: JSON, NoSQL, Hadoop, JDBC Value • Rapid provisioning of ad-hoc repositories • Faster time to insight • Self service to eliminate the IT bottleneck Usage • Enables rapid analysis of big data sources Callouts: data provisioning in 2 clicks; data access in 1/5000th the time of the traditional approach * Source: IBM performance lab testing, showing JDBC inserts at 5.8% to 74% faster
    7. 7. Big Match Find & Integrate Master Data in Big Data Sources (Automated Integration) How It Works • Probabilistic matching on big data platform (BigInsights-Hadoop) • Matching at a higher volume • Matching of a wider variety of data sets Client Value • Find master data within big data sources • Get an answer faster: enable real-time matching at big data volumes Usage • Provides more context by detecting master entities faster Callout: match millions of records Diagram labels: MDM, Big Match Engine, BigInsights * Source: IBM InfoSphere performance team test results
    8. 8. Big Data Catalogue Find Big Data More Easily (Visual Context) Innovation • Stores metadata on every available big data source • Provides structure to the Hadoop landing zone so data may be easily found and leveraged • Classifies data (origin, lineage, source, value, …) Value • Find data more easily within a growing Hadoop landing zone and a complex zone architecture • Rapidly leverage new big data sources Usage • Enables optimal usage of big data Callout: 170x improvement in metadata import performance* * Source: IBM internal performance results, where three test runs with the latest version averaged 11.46 seconds vs 1,964 seconds with the previous release
    9. 9. Information Governance Dashboard Visualize and Control Governance Innovation • Measurements for policies and KPIs • Rapid creation of tailored dashboards Value • Immediate insight into governance policy status • Interception of issues when they start, right at the source Usage • Raises data confidence with visual governance status Visual Context 1000s Of data points and policies visualized
    10. 10. Big Data Privacy and Security Protect a Wider Variety of Sources (Agile Governance) Innovation • Data activity monitoring of more NoSQL, Hadoop, and relational systems • Masking of sensitive data used in Hadoop Value • Protection is a prerequisite for the fundamental assumption of big data: sharing data for new insight • Automation enables protection without inhibiting speed Usage • Ensures sensitive data is protected and secure Callout: 80% faster activity monitoring* Diagram labels: InfoSphere Guardium, InfoSphere Optim; RDBMS, Hadoop, NoSQL, Data Warehouses, Application Data and Files * Source: IBM internal benchmarks of InfoSphere Guardium V9 p50
    11. 11. MDM for Big Data The Complete 360 View of Important Data (Agile Governance) How It Works • Extend the master view with federated, unstructured big data • Hybrid styles enable linking source records or consolidating based on confidence Client Value • Visualize every related data item in the 360 view • Rapidly onboard new big data sources • MDM adapts to the source Usage • Provides a complete understanding of the customer or master entity Callout: 21K customer-centric transactions per second* Diagram labels: MDM, Data Explorer * Source: InfoSphere MDM with DB2 pureScale achieves 21,000 customer-centric transactions a second, 2X the transaction rate of Oracle MDM on Exalogic/Exadata using ½ the number of cores. Approved claim in US/Canada only. Results valid as of 10/21/2012. Note to U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
    12. 12. Demonstration
    13. 13. InfoSphere Delivers Data Confidence For Big Data Use Cases Big Data Exploration • Understand confidence • Determine risk Enhanced 360° View of the Customer • Establish master record • Extend to all sources Operations Analysis • High volume data integration • Automatic data protection Security/Intelligence Extension • Automatic data protection • Mask sensitive information Data Warehouse Augmentation • High volume data integration • Agile big data archiving and retrieval
    14. 14. Use Case Spotlight: Enhanced 360 View MDM and Big Data Deliver the Complete 360 View Capabilities Required to Be Successful 1. Combine structured MDM and unstructured big data 2. Rapidly onboard uncertain data sources in a registry style to separate low and high confidence data 3. Find and match master data entities within big data sources Diagram labels: MDM (Single Version of the Truth), Data Explorer (Extended View of Master Data), Integration & Quality
    15. 15. Use Case Spotlight: Data Warehouse Augmentation Improve your data warehouse by improving data confidence Capabilities Required to Be Successful 1. Self-service integration for ad-hoc requests 2. Understand context of all available big data with a single metadata repository and business glossary 3. Mask any variety of sensitive data before ingestion 4. Automatically protect big data with activity monitoring 5. Store and analyze archive files on Hadoop Diagram labels: Data Warehouse; Integration & Quality (High performance data loads); MDM (More Accurate Analysis); Test Data Management (Self-service Testing); Archiving (Automated Archiving); Security & Privacy (Automated Data Protection)
    16. 16. A Busy Year of Innovation within the Labs Literally dozens of innovations that raise confidence in big data Two highlights: 1. BLU Acceleration 2. PureData System for Hadoop
    17. 17. BLU Acceleration IBM Research & Development Lab Innovations • Dynamic In-Memory: in-memory columnar processing with dynamic movement of unused data to storage • Actionable Compression: industry’s first data compression that preserves order so that the data can be used without decompressing • Parallel Vector Processing: multi-core and SIMD (Single Instruction Multiple Data) parallelism • Data Skipping: skips unnecessary processing of irrelevant data Super Fast, Super Easy: Create, Load and Go! No indexes, no aggregates, no tuning, no SQL changes, no schema changes
    18. 18. BLU Acceleration: Customers are Seeing Great Results “With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x, from one hour to two minutes! BLU Acceleration is truly amazing.” Yong Zhou, Sr. Manager of Data Warehouse & Business Intelligence Dept. “100x speed up with literally no tuning!” Lennart Henäng, IT Architect “Converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!” Iqbal Goralwalla, Head of DB2 Managed Services, Triton
    19. 19. PureData System for Hadoop Bringing big data to the enterprise • Simplify the delivery of unstructured data to the enterprise • Integrate Hadoop with the data warehouse • Leverage Hadoop for data archive • Provide best in class security • Provide data exploration across structured and unstructured data • Accelerate insight with machine data • Accelerate insight with social data
    20. 20. Confidence Is Essential for Actionable Insight Automated Integration Visual Context Agile Governance • Make decisions with greater certainty • Analyze rapidly while providing necessary controls • Increase the value of data
    21. 21. Understanding Your Data is the Basis for Confidence
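
Slides 7 and 11 mention probabilistic matching of master records and hybrid styles that link or consolidate records based on confidence. The deck does not describe how Big Match or InfoSphere MDM score records, so the following is only a minimal sketch of the general idea, using a Fellegi-Sunter-style sum of log-likelihood ratios. The field names, the m/u probabilities, the thresholds, and the sample records are all assumptions invented for illustration, not values from IBM.

```python
import math

# Hypothetical per-field probabilities (assumptions, not values from the deck):
#   m = P(field agrees | records describe the same entity)
#   u = P(field agrees | records describe different entities)
FIELD_WEIGHTS = {
    "name":  {"m": 0.95, "u": 0.01},
    "email": {"m": 0.90, "u": 0.001},
    "city":  {"m": 0.80, "u": 0.10},
}


def normalize(value):
    """Lowercase and trim so trivial formatting differences do not count."""
    return (value or "").strip().lower()


def match_score(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    score = 0.0
    for field, p in FIELD_WEIGHTS.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if not a or not b:
            continue  # a missing value contributes no evidence either way
        if a == b:
            score += math.log2(p["m"] / p["u"])
        else:
            score += math.log2((1 - p["m"]) / (1 - p["u"]))
    return score


def decide(score, merge_at=12.0, link_at=4.0):
    """Map a score to an action; both thresholds are illustrative only."""
    if score >= merge_at:
        return "merge"     # high confidence: consolidate into one record
    if score >= link_at:
        return "link"      # uncertain: keep separate but linked
    return "separate"      # low confidence: treat as a different entity


# Made-up sample data: one master record and three incoming candidates.
master = {"name": "Ana Diaz", "email": "ana@example.com", "city": "Austin"}
candidates = [
    {"name": "ana diaz ", "email": "ANA@example.com", "city": "Austin"},
    {"name": "Ana Diaz", "email": None, "city": "Dallas"},
    {"name": "Bo Chen", "email": "bo@example.com", "city": "Austin"},
]
decisions = [decide(match_score(master, c)) for c in candidates]
print(decisions)
```

With these made-up weights, the first candidate agrees on every field and is merged, the second agrees only on name and is kept as a linked record, and the third is left separate. Raising or lowering the two thresholds is one simple way to express the "link first, consolidate as confidence rises" approach the notes describe.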