New Innovations in Information Management for Big Data - Smarter Business 2013


Big data has changed the IT landscape. Learn how
your existing information integration and governance (IIG)
investment, combined with our latest innovations in
integration and governance, is a springboard to success
with big data use cases that unlock valuable new insights.
Presenter: David Corrigan, Big Data Specialist, IBM

  1. New Innovations in Information Integration & Governance (IIG) for Big Data
     David Corrigan, Director of Product Marketing, InfoSphere
  2. Data Confidence Is Essential
     If you want to find new insights from big data, and ACT on those insights, you need confidence in the data used for insight.
     Information Integration & Governance (IIG):
     • Make decisions with greater certainty
     • Analyze rapidly while providing necessary controls
     • Increase the value of data
  3. Building Big Data Confidence Is Essential
     • Outperform Competitors (3x): organizations with IIG outperform their competitors
     • Transform the Front Office Experience (80%): organizations rated their decision making as good or excellent
     • Establish Trusted Information (77%): organizations establish a high or very high level of trust in data
  4. IIG Evolves for the Era of Big Data
     • Automated Integration. (1) How do I get access to new big data sources? Business users need rapid data provisioning among the zones.
     • Visual Context. (2) How do I digest all of this new information? Categorize, index, and find big data to optimize its usage.
     • Agile Governance. (3) How do I manage all of this new data? Ensure appropriate actions based on the value of the data.
  5. Six Innovations that Build Big Data Confidence
     Automated Integration
     • Data Click: self-service data provisioning for big data repositories
     • Big Match*: integration of master records from big data with probabilistic matching powered by Hadoop
     Visual Context
     • Information Governance Dashboard: visual context to give immediate status on governance policies
     • Big Data Catalogue: categorize metadata on all big data sources
     Agile Governance
     • Big Data Privacy & Security: monitor and mask sensitive big data in Hadoop, NoSQL, and relational systems
     • MDM for Big Data*: rapid mastering of new big data sources and extension of the 360 view with unstructured big data
     * Statement of Direction
  6. InfoSphere Data Click: Self-service Data Provisioning (Automated Integration)
     Data provisioning in 2 clicks; data access in 1/5000th the time of the traditional approach
     Innovation:
     • Two-click data provisioning designed for business users
     • Integration of more big data sources: JSON, NoSQL, Hadoop, JDBC
     Value:
     • Rapid provisioning of ad-hoc repositories
     • Faster time to insight
     • Self-service to eliminate the IT bottleneck
     Usage:
     • Enables rapid analysis of big data sources
     * Source: IBM performance lab testing, showing JDBC inserts at 5.8% to 74% faster
  7. Big Match: Find & Integrate Master Data in Big Data Sources (Automated Integration)
     Match millions of records (diagram: MDM, Big Match engine, BigInsights)
     How It Works:
     • Probabilistic matching on a big data platform (BigInsights/Hadoop)
     • Matching at a higher volume
     • Matching of a wider variety of data sets
     Client Value:
     • Find master data within big data sources
     • Get an answer faster: enable real-time matching at big data volumes
     Usage:
     • Provides more context by detecting master entities faster
     * Source: IBM InfoSphere performance team test results
  8. Big Data Catalogue: Find Big Data More Easily (Visual Context)
     170x improvement in metadata import performance*
     Innovation:
     • Stores metadata on every available big data source
     • Provides structure to the Hadoop landing zone so data can be easily found and leveraged
     • Classifies data (origin, lineage, source, value…)
     Value:
     • Find data more easily within a growing Hadoop landing zone and a complex zone architecture
     • Rapidly leverage new big data sources
     Usage:
     • Enables optimal usage of big data
     * Source: IBM internal performance results, where three test runs with the latest version averaged 11.46 seconds vs. 1,964 seconds with the previous release
  9. Information Governance Dashboard: Visualize and Control Governance (Visual Context)
     1000s of data points and policies visualized
     Innovation:
     • Measurements for policies and KPIs
     • Rapid creation of tailored dashboards
     Value:
     • Immediate insight into governance policy status
     • Interception of issues when they start, right at the source
     Usage:
     • Raises data confidence with visual governance status
  10. Big Data Privacy and Security: Protect a Wider Variety of Sources (Agile Governance)
     80% faster activity monitoring*
     Innovation:
     • Data activity monitoring of more NoSQL, Hadoop, and relational systems
     • Masking of sensitive data used in Hadoop
     Value:
     • Protection is a prerequisite for the fundamental assumption of big data: sharing data for new insight
     • Automation enables protection without inhibiting speed
     Usage:
     • Ensures sensitive data is protected and secure
     Diagram: InfoSphere Guardium, InfoSphere Optim; RDBMS, Hadoop, NoSQL, data warehouses, application data and files
     * Source: IBM internal benchmarks of InfoSphere Guardium V9 p50
  11. MDM for Big Data: The Complete 360 View of Important Data (Agile Governance)
     21K customer-centric transactions per second*
     How It Works:
     • Extend the master view with federated, unstructured big data
     • Hybrid styles enable linking source records or consolidating them based on confidence
     Client Value:
     • Visualize every related data item in the 360 view
     • Rapidly onboard new big data sources
     • MDM adapts to the source
     Usage:
     • Provides a complete understanding of the customer or master entity
     Diagram: MDM, Data Explorer
     * Source: InfoSphere MDM with DB2 pureScale achieves 21,000 customer-centric transactions a second, 2X the transaction rate of Oracle MDM on Exalogic/Exadata using ½ the number of cores. Approved claim in US/Canada only. Results valid as of 10/21/2012.
     Note to U.S. Government Users Restricted Rights: Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
  12. Demonstration
  13. InfoSphere Delivers Data Confidence for Big Data Use Cases
     • Big Data Exploration: understand confidence; determine risk
     • Enhanced 360° View of the Customer: establish master record; extend to all sources
     • Operations Analysis: high-volume data integration; automatic data protection
     • Security/Intelligence Extension: automatic data protection; mask sensitive information
     • Data Warehouse Augmentation: high-volume data integration; agile big data archiving and retrieval
  14. Use Case Spotlight: Enhanced 360 View
     MDM and Big Data Deliver the Complete 360 View
     Capabilities required to be successful:
     1. Combine structured MDM and unstructured big data
     2. Rapidly onboard uncertain data sources in a registry style to separate low- and high-confidence data
     3. Find and match master data entities within big data sources
     Diagram: MDM (single version of the truth), Data Explorer (extended view of master data), Integration & Quality
  15. Use Case Spotlight: Data Warehouse Augmentation
     Improve your data warehouse by improving data confidence
     Capabilities required to be successful:
     1. Self-service integration for ad-hoc requests
     2. Understand the context of all available big data with a single metadata repository and business glossary
     3. Mask any variety of sensitive data before ingestion
     4. Automatically protect big data with activity monitoring
     5. Store and analyze archive files on Hadoop
     Diagram: Data Warehouse; Integration & Quality (high-performance data loads); MDM (more accurate analysis); Test Data Management (self-service testing); Archiving (automated archiving); Security & Privacy (automated data protection)
  16. A Busy Year of Innovation within the Labs
     Literally dozens of innovations that raise confidence in big data. Two highlights:
     1. BLU Acceleration
     2. PureData System for Hadoop
  17. BLU Acceleration: IBM Research & Development Lab Innovations
     • Dynamic In-Memory: in-memory columnar processing with dynamic movement of unused data to storage
     • Actionable Compression: the industry’s first data compression that preserves order, so the data can be used without decompressing
     • Parallel Vector Processing: multi-core and SIMD (Single Instruction Multiple Data) parallelism
     • Data Skipping: skips unnecessary processing of irrelevant data
     • Super Fast, Super Easy: Create, Load and Go! No indexes, no aggregates, no tuning, no SQL changes, no schema changes
  18. BLU Acceleration: Customers Are Seeing Great Results
     • “With BLU Acceleration, we’ve been able to reduce the time spent on pre-aggregation by 30x: from one hour to two minutes! BLU Acceleration is truly amazing.” (Yong Zhou, Sr. Manager of Data Warehouse & Business Intelligence Dept.)
     • “100x speed up with literally no tuning!” (Lennart Henäng, IT Architect)
     • “Converting this row-organized uncompressed table to a column-organized table in DB2 10.5 delivered a massive 15.4x savings!” (Iqbal Goralwalla, Head of DB2 Managed Services, Triton)
  19. PureData System for Hadoop: Bringing Big Data to the Enterprise
     • Simplify the delivery of unstructured data to the enterprise
     • Integrate Hadoop with the data warehouse
     • Leverage Hadoop for data archive
     • Provide best-in-class security
     • Provide data exploration across structured and unstructured data
     • Accelerate insight with machine data
     • Accelerate insight with social data
  20. Confidence Is Essential for Actionable Insight
     Automated Integration, Visual Context, Agile Governance
     • Make decisions with greater certainty
     • Analyze rapidly while providing necessary controls
     • Increase the value of data
  21. Understanding Your Data Is the Basis for Confidence
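The Big Match slide says only that it relies on probabilistic matching; the deck does not describe the algorithm. As a rough illustration of what probabilistic record matching involves, here is a minimal Python sketch of classic Fellegi-Sunter style scoring, which may well differ from what the Big Match engine actually does. Every field name, m/u probability, and threshold below is a hypothetical value chosen for illustration, not a product setting.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldModel:
    m: float  # P(field agrees | records are the same entity)
    u: float  # P(field agrees | records are different entities)

    def weight(self, agrees: bool) -> float:
        # Log-likelihood ratio contributed by this field's comparison.
        if agrees:
            return math.log2(self.m / self.u)
        return math.log2((1 - self.m) / (1 - self.u))


# Hypothetical parameters; a real system would estimate them from data.
MODEL = {
    "name": FieldModel(m=0.95, u=0.01),
    "birth_date": FieldModel(m=0.98, u=0.003),
    "postal_code": FieldModel(m=0.90, u=0.05),
}


def normalize(value: str) -> str:
    # Case-fold and collapse whitespace before comparing.
    return " ".join(value.lower().split())


def match_score(a: dict, b: dict) -> float:
    # Sum per-field weights; higher means more likely the same entity.
    return sum(
        model.weight(normalize(a[field]) == normalize(b[field]))
        for field, model in MODEL.items()
    )


def classify(score: float, upper: float = 12.0, lower: float = 0.0) -> str:
    # Hypothetical thresholds splitting scores into three bands.
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


record_a = {"name": "Jane Smith", "birth_date": "1980-04-02", "postal_code": "10001"}
record_b = {"name": "jane  smith", "birth_date": "1980-04-02", "postal_code": "10002"}
score = match_score(record_a, record_b)
print(round(score, 2), classify(score))  # 11.67 possible match
```

With these made-up parameters the two sample records agree on name and birth date but not on postal code, so the score lands between the thresholds. A three-band outcome like this is one plausible way to separate high-confidence matches (candidates for consolidation) from lower-confidence ones (candidates for linking or review).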
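The Big Data Privacy and Security slide mentions masking sensitive data used in Hadoop but does not say how InfoSphere Optim performs it. The Python sketch below shows two common, generic masking techniques, not Optim's API. The first is deterministic pseudonymization with a keyed hash (HMAC-SHA-256), so the same input always maps to the same token and masked data sets can still be joined. The second is partial redaction that keeps only the last four digits of a card number. The policy table, column names, and key are hypothetical.

```python
import hashlib
import hmac
import re

# Hypothetical key; a real deployment would fetch it from a managed secret store.
SECRET_KEY = b"replace-with-managed-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic token: equal inputs give equal tokens, preserving joins."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_card_number(card: str) -> str:
    """Drop separators and replace all but the last four digits with '*'."""
    digits = re.sub(r"\D", "", card)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"
    return local[:1] + "***@" + domain


# Hypothetical policy: which masking function applies to which column.
POLICY = {
    "customer_id": pseudonymize,
    "card_number": mask_card_number,
    "email": mask_email,
}


def mask_record(record: dict) -> dict:
    """Apply the policy column by column; unlisted columns pass through."""
    return {col: POLICY[col](val) if col in POLICY else val
            for col, val in record.items()}


row = {"customer_id": "C-1001", "card_number": "4111 1111 1111 1234",
       "email": "jane@example.com", "city": "Boston"}
print(mask_record(row))
```

Deterministic tokens keep masked data useful for analysis, but they are only as safe as the key: anyone who holds it can hash guesses and compare, which is why the key is kept outside the code in practice.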
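The BLU Acceleration slide names order-preserving ("actionable") compression and data skipping without showing how they work. Here is a minimal Python sketch of the general ideas, not IBM's implementation. An order-preserving dictionary assigns integer codes in sorted value order, so a range predicate on values can be rewritten as a range predicate on codes and evaluated without decoding. Per-block min/max summaries then let a scan skip blocks that cannot contain qualifying rows. The block size, sample dates, and all function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


def build_dictionary(values):
    """Order-preserving dictionary: codes follow sorted value order,
    so code(a) < code(b) exactly when a < b."""
    return {v: code for code, v in enumerate(sorted(set(values)))}


def code_range(dictionary, lo, hi):
    """Rewrite the value predicate lo <= v <= hi as start <= code < stop."""
    keys = sorted(dictionary)  # a key's position here equals its code
    return bisect_left(keys, lo), bisect_right(keys, hi)


@dataclass
class Block:
    codes: list
    min_code: int  # per-block summary used for skipping
    max_code: int


def make_blocks(codes, block_size):
    chunks = (codes[i:i + block_size] for i in range(0, len(codes), block_size))
    return [Block(chunk, min(chunk), max(chunk)) for chunk in chunks]


def count_in_range(blocks, start, stop):
    """Count rows with start <= code < stop; returns (matches, blocks_scanned)."""
    matches = scanned = 0
    for block in blocks:
        if block.max_code < start or block.min_code >= stop:
            continue  # data skipping: no row in this block can qualify
        scanned += 1
        matches += sum(start <= c < stop for c in block.codes)
    return matches, scanned


# Hypothetical column of ISO dates, loaded roughly in date order.
dates = ["2013-01-05", "2013-01-07", "2013-02-11", "2013-02-20",
         "2013-03-02", "2013-03-15", "2013-04-01", "2013-04-09"]
dictionary = build_dictionary(dates)
codes = [dictionary[d] for d in dates]  # encoded column: small integers
blocks = make_blocks(codes, block_size=2)

# February 2013 only; the predicate runs on codes, not decoded strings.
start, stop = code_range(dictionary, "2013-02-01", "2013-02-28")
print(count_in_range(blocks, start, stop))  # (2, 1)
```

The query finds two February rows while scanning only one of the four blocks. The sketch leaves out the bit packing, memory management, and SIMD evaluation a production column store would layer on top; it shows only why order preservation lets predicates run on encoded data and how block summaries enable skipping.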