The New Possible: Very Big Data for Serious Business Value


The Briefing Room with Robin Bloor and SAP
Slides from the Live Webcast on Oct. 9, 2012

With Big Data come big databases, but not the old-fashioned kind. The biggest databases these days bring the best of both worlds: traditional columnar power and functionality, combined with the scalability and schema-agnostic flexibility of NoSQL. This one-two punch creates a new world of possibilities for business value, because companies can dynamically weave together the contextual insights of Big Data with the granular focus of high-powered Business Intelligence.

Check out this episode of The Briefing Room to hear veteran analyst Robin Bloor explain how this new era of hybrid Very Large Databases will fundamentally transform the way companies store and analyze enterprise data. He'll be briefed by Courtney Claussen of SAP Sybase, who will tout her company's recent innovations for loading, processing and delivering massive amounts of data, both structured and unstructured, even in environments with thousands of concurrent users.

Published in: Technology

The New Possible: Very Big Data for Serious Business Value

  1. 1. Eric.kavanagh@bloorgroup.com
        Twitter Tag: #briefr
  2. 2. • Reveal the essential characteristics of enterprise software, good and bad
        • Provide a forum for detailed analysis of today's innovative technologies
        • Give vendors a chance to explain their product to savvy analysts
        • Allow audience members to pose serious questions... and get answers!
  3. 3. • November: Cloud
        • December: Innovators
        • January: Big Data
        • February: Performance
        • March: Integration
  4. 4. • Databases were designed primarily to store information for retrieval at a later time.
        • Big Data requires big databases.
        • The convergence of multi-structured data and the need to perform both transactional and operational analytics has led to substantial innovations in database technologies.
        • Today some of the biggest databases blend the best of both worlds, transforming the way organizations store and analyze enterprise data.
  5. 5. Robin Bloor is Chief Analyst at The Bloor Group. Robin.Bloor@Bloorgroup.com
  6. 6. • German-founded SAP is one of the largest software companies in the world. Its best-known products are SAP ERP, SAP Business Warehouse, SAP Business Objects, SAP Sybase IQ and SAP HANA.
        • SAP offers a comprehensive set of database management solutions that spans the needs of the enterprise, leveraging in-memory, cloud and mobile technologies.
        • Recent innovations include a Big Data analytics platform that loads, processes and delivers massive amounts of multi-structured data and is accessible on demand enterprise-wide.
  7. 7. Courtney Claussen is a product manager at Sybase, Inc., concentrating on Sybase's data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.
  8. 8. The New Possible: Very Big Data for Serious Business Value
        The Briefing Room with Dr. Robin Bloor and SAP
        October 9, 2012
        CONFIDENTIAL
  9. 9. AGENDA
        • Big Data Analytics: A Reality
        • SAP Sybase IQ: Built for Big Data Analytics
        • SAP Sybase IQ: Continuing Innovation
        © 2012 SAP AG. All rights reserved.
  10. 10. Big Data Analytics: A Reality
  11. 11. THE NEW DYNAMICS OF BUSINESS: COMPETING ON BIG DATA DRIVEN ANALYTICS
          Business Value*:
          • New Strategies & Business Models
          • Operational Efficiencies
          • Revenue Growth
          *A McKinsey study titled "Big data: The next frontier for innovation, competition, and productivity" (May 2011) found huge potential for Big Data analytics, with metrics as impressive as 60% improvements in retail operating margins, an 8% reduction in (US) national healthcare expenditures, and $150M savings in operational efficiencies in European economies.
  12. 12. Getting Value from Big Data
          Applied Big Data Analytics:
          • Find supply chain inefficiencies
          • Predict financial performance
          • Uncover insurance fraud
          • Optimize stocking of products
          • Dispense correct health care
          • Maintain customer loyalty
  13. 13. EDW AND BIG DATA PLATFORMS: CONTRASTS
          • EDW: enterprise data, relational, structured, indexed text
          • Big Data: clickstreams, sensors, log data, unstructured social media
          EDW -> Big Data:
          • Large -> ENORMOUS
          • Pre-processed data -> Raw data
          • Schema -> No schema
          • SQL -> Programmatic
          • OLAP -> Batch processing
          • Scale up -> Scale out
          [Diagram label: Business Value*]
  14. 14. EDW AND BIG DATA PLATFORMS: PARTNERSHIP
          • EDW: enterprise data, relational, structured, indexed text
          • Big Data: clickstreams, sensors, log data, unstructured social media
          Together:
          • Combine all relevant data for better insights
          • Real-time BI
          • SQL declarative processing
          • Big Data pre-processing with EDW deep analytics
          [Diagram label: Business Value*]
  15. 15. Big-data analytics plus data warehousing deserves a new platform
          Support massive numbers of users and workloads:
          • Data loading, mobile, OLAP*, integrated workflow, Web, operational reporting, specialized apps, data mining
          Analyze massive volumes of complex data from many sources:
          • MapReduce, RDBMS+, EDW, in-DB analytics, HDFS
          Pressures:
          • Volume
          • Velocity
          • Variety
          • Costs
          • Skills
          Requirements:
          • Platform accessible to all business processes and all business users
          • Requirement for data and algorithms together in the platform
          • Ability to distribute interactions throughout the enterprise
          *Online analytical processing
          +Relational database management system
  17. 17. Grid architecture: System scale out
          [Diagram: Full Mesh Interconnect, Storage Fabric]
          Multi-dimensional scale out
          • Multiple resources can scale out independently
            – Storage, server (CPU, memory), SAN switches, interconnect can scale on their own
          • Scale out is incremental and linear
            – No need to add large units of monolithic CPU/storage pairs
  18. 18. Deployed use case: comScore Networks measures the digital world
          • comScore provides solutions for online audience measurement, e-commerce, advertising, search, video and mobile to analysts with digital marketing and vertical-specific industry expertise
          • Large SAP Sybase IQ Multiplex Grid on v15.x with tens of servers and hundreds of CPU cores
          • Manages more than 150TB of data with trillions of rows and tens of thousands of tables
          • More than 200 concurrent users with highly parallel and distributed workloads
          • Incrementally scalable on commodity hardware
          [Diagram: Storage Fabric]
  19. 19. Community platform: Elastic virtual data marts
          [Diagram: VDM1 and VDM2 over a shared full mesh high-speed interconnect; virtual shared CPU and memory (Logical Server 1, Logical Server 2); storage fabric with virtual shared storage]
          Virtual data marts
          • A VDM is a logical binding of mutually exclusive nodes, memory, storage
            – A Logical Server (LS) is a mutually exclusive logical binding of nodes, memory
            – A Logical Server (LS) is a subset of a VDM
            – Bindings are elastic, i.e. they can dynamically grow/shrink
  20. 20. Robust load engine
          [Diagram: Extraction, transformation, and load (ETL) in SAP software; scale-out ETL projects over a full-mesh interconnect and storage fabric]
          Loading can be from multiple modes:
          • Parallel bulk load processing:
            – Load rates in excess of 250 GB/hr are common even with modest-size hardware nodes
          • Continuous and trickle feed via microbatching (change data capture)
          Page-level snapshot versioning:
          • Allows non-blocking concurrent loads and queries
          Load from client machines
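The "continuous and trickle feed via microbatching" mode on slide 20 can be pictured with a minimal Python sketch. This is not SAP Sybase IQ code: the `micro_batches` helper, the batch size, and the shape of the change records are all assumptions made for illustration. The only idea shown is that a continuous stream of captured changes is grouped into small batches, each of which would be applied as one load.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Change = Tuple[str, int]  # hypothetical change record: (operation, row id)


def micro_batches(changes: Iterable[Change], batch_size: int) -> Iterator[List[Change]]:
    """Group a (possibly endless) stream of change records into small batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    stream = iter(changes)
    while True:
        # Take up to batch_size records; an empty list means the stream ended.
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical captured changes, in arrival order.
captured = [("insert", 1), ("update", 1), ("insert", 2), ("delete", 1), ("insert", 3)]
batches = list(micro_batches(captured, batch_size=2))
```

With a batch size of 2, the five changes arrive as three small loads, the last one holding a single record. A real feed would typically also flush on a timer so that a slow stream does not wait for a full batch.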
  21. 21. Query engine: Distributed query processing
          [Diagram: Query 1 on a 5-node DQP, Query 2 on a 4-node DQP, over the storage fabric]
          Massively parallel processing
          • Leader node: Receives and initiates queries
            – Any node can be a leader
            – Leader node may satisfy query within itself
          • Worker node: Nodes pick up work units from leader
            – Many worker nodes per query
            – Same worker node can serve multiple queries
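The leader/worker pattern on slide 21 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's scheduler: the function names, the use of threads as stand-ins for worker nodes, and the toy "query" (summing a column) are all hypothetical. It shows a leader splitting a query into work units, workers picking the units up, and the leader combining partial results, including the case where the leader satisfies the query by itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_work_units(rows: Sequence[int], unit_size: int) -> List[Sequence[int]]:
    """Leader step: cut the input into independent work units."""
    return [rows[i:i + unit_size] for i in range(0, len(rows), unit_size)]


def worker_sum(unit: Sequence[int]) -> int:
    """Worker step: compute a partial result for one work unit."""
    return sum(unit)


def leader_run_query(rows: Sequence[int], workers: int, unit_size: int) -> int:
    """Leader receives the query, hands out work units, and merges partial results."""
    units = split_into_work_units(rows, unit_size)
    if workers == 0:
        # The leader may satisfy the query within itself.
        return sum(worker_sum(unit) for unit in units)
    # Threads stand in for worker nodes picking up units from the leader.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(worker_sum, units))
    return sum(partials)


rows = list(range(1, 101))
total_distributed = leader_run_query(rows, workers=4, unit_size=10)
total_local = leader_run_query(rows, workers=0, unit_size=10)
```

Both calls return the same total; only the placement of the work differs.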
  22. 22. Text search and analysis
          Pipeline:
          • Text load: file ingestion into blob or clob
          • Text filtering: filtering to plain text and formatting
          • Schema transform: hierarchical to relational
          • Entity extraction: categorization, tokenization
          [Diagram: table in SAP Sybase IQ -> text index -> full text queries -> visualization]
          TextCol (table in SAP Sybase IQ): abc, feed, dad, dead, beef, …
          Text index:
            ID  Term  Pos Info
            0   a     1,3,4
            1   b     1,5
            2   c     1
            3   d     2,3,4
            4   e     2,4,5
            5   f     2,5
            …
          Full-text queries:
            SELECT * FROM myTable WHERE CONTAINS (TextCol, 'd'); -- returns rows
            SELECT * FROM myTable CONTAINS (TextCol, 'd'); -- returns rows and scoring
            SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a AND NOT b'); -- Boolean
            SELECT * FROM myTable WHERE CONTAINS (TextCol, 'a NEAR b'); -- proximity
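The text index on slide 22 maps each term to the rows that contain it. The following Python sketch rebuilds that mapping from the slide's five sample values. It is an illustration only: it assumes, as the slide appears to, that each letter is a term and that rows are numbered from 1, and the function names are hypothetical rather than part of any SAP Sybase IQ API.

```python
from collections import defaultdict
from typing import Dict, List


def build_term_index(rows: List[str]) -> Dict[str, List[int]]:
    """Map each term (here: each distinct letter) to the row numbers containing it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for row_number, text in enumerate(rows, start=1):
        for term in sorted(set(text)):
            index[term].append(row_number)
    return dict(index)


def rows_with(index: Dict[str, List[int]], term: str) -> List[int]:
    """Rows containing a term, like the simple CONTAINS query on the slide."""
    return index.get(term, [])


def rows_with_and_not(index: Dict[str, List[int]], wanted: str, unwanted: str) -> List[int]:
    """Rows containing `wanted` but not `unwanted`, like the Boolean query on the slide."""
    excluded = set(rows_with(index, unwanted))
    return [row for row in rows_with(index, wanted) if row not in excluded]


text_col = ["abc", "feed", "dad", "dead", "beef"]
index = build_term_index(text_col)
```

Looking up `d` gives rows 2, 3 and 4, matching the slide's index, and the `a AND NOT b` case leaves rows 3 and 4. Proximity (`NEAR`) queries would additionally need the position of each term within a row, which this sketch does not store.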
  23. 23. In-database analytics
          No compromise for complex analytics:
          • Basic to advanced analytical functions available to SQL
          • Data never leaves the database until results are materialized
          • Analytics code and models are shareable
          • Analytics code and models are applicable to the latest data set
          • Average developer can build in-database analytical models
          [Diagram: process in SAP Sybase IQ database = logic and filtering applied in database, via built-in functions and external DLL "A"]
          Analytics simplified: Logic to data = fast and efficient
  24. 24. Federation with external file systems (Hadoop distributed file system)
          1. Client-side federation: Join data from SAP Sybase IQ and Hadoop at a client-application level
          2. ETL: Load Hadoop data into the column store of SAP Sybase IQ: Extract, transform, and load data from Hadoop distributed file system (HDFS) into schemas of SAP Sybase IQ
          3. Join HDFS data with data of SAP Sybase IQ on the fly: Fetch and join subsets of HDFS data on demand, using SQL queries from SAP Sybase IQ (data federation technique)
          4. Combine results of Hadoop MapReduce (MR) jobs with SAP Sybase IQ data on the fly: Initiate and join results of MR jobs on demand using SQL queries from data in SAP Sybase IQ (query federation technique)
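The first option on slide 24, client-side federation, amounts to fetching two result sets separately and joining them in the application. A minimal Python sketch of that idea follows. Everything in it is assumed for illustration: the two lists stand in for rows already fetched from the warehouse and from Hadoop, and the column layout and function name are hypothetical. No real driver or connector call is shown.

```python
from typing import Dict, List, Tuple


def client_side_join(
    warehouse_rows: List[Tuple[int, str]],
    hadoop_rows: List[Tuple[int, int]],
) -> List[Tuple[int, str, int]]:
    """Hash join performed in the client application, not in either data store."""
    # Build a lookup table from the structured result set: customer id -> name.
    names_by_id: Dict[int, str] = dict(warehouse_rows)
    joined: List[Tuple[int, str, int]] = []
    for customer_id, clicks in hadoop_rows:
        # Inner join: rows without a match on the other side are dropped.
        if customer_id in names_by_id:
            joined.append((customer_id, names_by_id[customer_id], clicks))
    return joined


# Stand-ins for already-fetched result sets (hypothetical data).
warehouse_rows = [(1, "Acme"), (2, "Globex"), (3, "Initech")]
hadoop_rows = [(2, 140), (3, 75), (4, 12)]
joined = client_side_join(warehouse_rows, hadoop_rows)
```

The trade-off against options 3 and 4 on the slide is that here both result sets must travel to the client before the join happens.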
  25. 25. Native MapReduce: Highly distributed processing without Hadoop
          [Diagram: SELECT … (Reducer … (Mapper … OVER PARTITION BY …) OVER PARTITION BY …); parallel mapper TPFs feed parallel reducer TPFs over the storage fabric]
          • TPFs (Table Parameterized Functions) consume/produce data sets in bulk
          • TPFs run in parallel
          • TPFs are fed with disjoint data sets
          • TPFs can be arbitrarily nested to multiple levels via sub-queries
          • TPFs currently available in popular, performance efficient C++
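The mapper/reducer nesting on slide 25 follows the general MapReduce shape: partition the input into disjoint sets, run a mapper over each partition, regroup the mapper output by key, and run a reducer per key. The Python sketch below shows that shape only. It is not TPF code (the slide says TPFs are written in C++), and the record layout and function names are hypothetical. In the slide's design the mapper and reducer calls would run in parallel; this sketch runs them sequentially to stay short.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # hypothetical clickstream record: (user, page)


def partition_by_user(rows: Iterable[Row]) -> Dict[str, List[Row]]:
    """Split the input into disjoint partitions, one per user."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return partitions


def mapper(partition: List[Row]) -> List[Tuple[str, int]]:
    """Consume one partition in bulk and emit (page, 1) pairs."""
    return [(page, 1) for _user, page in partition]


def reducer(page: str, counts: List[int]) -> Tuple[str, int]:
    """Consume all counts for one page and emit the total."""
    return page, sum(counts)


def page_views(rows: Iterable[Row]) -> Dict[str, int]:
    mapped: List[Tuple[str, int]] = []
    for partition in partition_by_user(rows).values():
        mapped.extend(mapper(partition))
    # Regroup mapper output by page so each reducer sees a disjoint key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for page, count in mapped:
        grouped[page].append(count)
    return dict(reducer(page, counts) for page, counts in grouped.items())


clicks = [("ann", "/home"), ("bob", "/home"), ("ann", "/pricing"), ("bob", "/home")]
views = page_views(clicks)
```

Because each partition and each key group is disjoint, the mapper and reducer calls are independent of one another, which is what makes parallel execution possible.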
  26. 26. SAP Sybase IQ: Continuing Innovation
          IMPORTANT LEGAL DISCLAIMER CONCERNING PROGRAM DATES, RELEASE-RELATED INFORMATION & CONTENT: All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.
  27. 27. SAP Sybase IQ: Next Wave: Innovations for extremely large databases (XLDB)
          [Diagram center: SAP Sybase IQ: Next Wave, Petabytes, Real-time]
          Storage Architecture
          • New generation column store
          • New partitioning and compression
          Loading Engine
          • Fully parallel bulk loading
          • Real-time loading into delta store
          System Reliability
          • Grid resiliency
          • Data availability
          Query Processing
          • Data affinity
          • Aggressively parallel and distributed
  28. 28. Summary
  29. 29. SAP SYBASE IQ: A COMPREHENSIVE PLATFORM FOR BIG DATA ANALYTICS
          Eco-System:
          • Optimized BI, EIM, Model, Replicate: Sybase PowerDesigner, Sybase Replication Server, SAP BusinessObjects, ISYS, Panopticon
          • Dev and admin tools: Bradmark, Symantec, Whitesands, Quest, ZEND
          • Predictive Analytics: SAS, SPSS, KXEN, Fuzzy Logix, Zementis, Visual Numerics
          • Packaged ILM apps: BMMSoft, SOLIX, PBS
          App. Services:
          • Web 2.0 APIs
          • Comprehensive ANSI SQL w/OLAP
          • Built-in Full Text Search
          • InDB Analytics w/ MapReduce + simulator
          • Hadoop, R Big Data OpnSrc APIs
          DBMS:
          • Most mature column store
          • Comprehensive lifecycle tiering
          • MPP queries + Virtual Marts + User scaling
          • High Speed loads
          • Structured + Unstructured Store
  30. 30. © 2012 SAP AG. All rights reserved.
          No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice.
          Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.
          Microsoft, Windows, Excel, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
          IBM, DB2, DB2 Universal Database, System i, System i5, System p, System p5, System x, System z, System z10, System z9, z10, z9, iSeries, pSeries, xSeries, zSeries, eServer, z/VM, z/OS, i5/OS, S/390, OS/390, OS/400, AS/400, S/390 Parallel Enterprise Server, PowerVM, Power Architecture, POWER6+, POWER6, POWER5+, POWER5, POWER, OpenPower, PowerPC, BatchPipes, BladeCenter, System Storage, GPFS, HACMP, RETAIN, DB2 Connect, RACF, Redbooks, OS/2, Parallel Sysplex, MVS/ESA, AIX, Intelligent Miner, WebSphere, Netfinity, Tivoli and Informix are trademarks or registered trademarks of IBM Corporation.
          Linux is the registered trademark of Linus Torvalds in the U.S. and other countries.
          Adobe, the Adobe logo, Acrobat, PostScript, and Reader are either trademarks or registered trademarks of Adobe Systems Incorporated in the United States and/or other countries.
          Oracle and Java are registered trademarks of Oracle.
          UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
          Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of Citrix Systems, Inc.
          HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
          SAP, R/3, SAP NetWeaver, Duet, PartnerEdge, ByDesign, SAP BusinessObjects Explorer, StreamWork, SAP HANA, and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries.
          Business Objects and the Business Objects logo, BusinessObjects, Crystal Reports, Crystal Decisions, Web Intelligence, Xcelsius, and other Business Objects products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Business Objects Software Ltd. Business Objects is an SAP company.
          Sybase and Adaptive Server, iAnywhere, Sybase 365, SQL Anywhere, and other Sybase products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of Sybase, Inc. Sybase is an SAP company.
          All other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves informational purposes only. National product specifications may vary.
          The information in this document is proprietary to SAP. No part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of SAP AG.
  31. 31. Twitter Tag: #briefr
  32. 32. The Universe of Big Data The  Bloor  Group  
  33. 33. In marketing terms BIG DATA is as big a trend as cloud computing (if you measure the trend in terms of column inches)
  34. 34. The Big Data Trend
          • Corporate data volumes grow at about 55% per annum
          • VLDB volumes grow at about 55% per annum
          • This is exponential
          • Data has been growing at this rate for at least 20 years
          • As such, there is nothing new about big data other than the current data volumes, which follow a well-established trend
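To make "this is exponential" on slide 34 concrete, the short Python sketch below compounds the slide's figure of about 55% growth per annum. The 55% rate and the 20-year span come from the slide; the function names are illustrative, and the calculation assumes the rate is constant and compounds yearly, which the slide only states approximately.

```python
import math

ANNUAL_GROWTH = 0.55  # "about 55% per annum", from the slide


def growth_factor(years: float) -> float:
    """Total multiplication of data volume after the given number of years."""
    return (1 + ANNUAL_GROWTH) ** years


def doubling_time_years() -> float:
    """Years needed for data volume to double at the assumed rate."""
    return math.log(2) / math.log(1 + ANNUAL_GROWTH)


twenty_year_factor = growth_factor(20)
doubling = doubling_time_years()
```

Under these assumptions volumes double roughly every 1.6 years and grow by a factor of roughly 6,400 over 20 years, which is why a steady trend can still feel like a sudden flood.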
  35. 35. So What's New?
          • Volume, velocity, variety, verifiability and other words beginning with V, but not all at once
          • Hadoop is new
          • Big Data in the cloud is new
          • And there's a new dynamic in data analytics
          • Volume (and velocity) is now mostly about events, not transactions, and the world of embedded processors is going to expand the number of events worth processing
  36. 36. The Analytics Two-Step
  37. 37. The Future?
          • The data growth trend is likely to continue
          • More and more companies will be drawn into using Big Data technologies
          • Will the two-step become a one-step? Not sure. If you gather Big Data, you also need to be able to throw it away
          • RDBMS (column store) will remain as the analytics engine
  38. 38. • Please explain why you believe that the Sybase IQ shared-some-things architecture is equal to or better than a shared-nothing architecture.
          • Are you seeing the same trend that I seem to be noticing with Big Data in respect to analytics?
          • Roughly how many of your customers are using Hadoop?
          • If I were a Sybase customer, would you recommend Hadoop as an ETL mechanism, or is it your view that Sybase IQ can do it all?
  39. 39. • Please describe the most extensive use of Sybase IQ (in respect of data volumes, daily ingest, instances, etc.).
          • How difficult is it to use (in other words, what are the labor/DBA overheads compared to a traditional RDBMS)?
          • Are your competitors always the usual suspects (i.e. other column store products)? Do you ever compete with the NoSQL crowd?
          • Explain how you usually fit with HANA in sites where both products are in use. Is HANA promoting sales of Sybase IQ?
  40. 40. Twitter Tag: #briefr
  41. 41. • This Month: Database
          • November: Cloud
          • December: Innovators
          • January: Big Data
          • 2013 Editorial Calendar (
  42. 42. Twitter Tag: #briefr