In the Mix: How Native Integration Improves MapReduce and Hadoop
 


The Briefing Room with Robin Bloor and Sybase, an SAP company
Slides from the Live Webcast on May 22, 2012

The either/or question pervades the world of analytics. With so many options for crunching numbers, deciding which avenue to take can be difficult. One positive trend is the convergence of open source solutions with traditional database technologies, merging the best of both approaches. In this episode of The Briefing Room, veteran database analyst Robin Bloor explains how new service layers are integrating the many available options for the legwork of analysis.

Bloor will be briefed by David Jonker and Courtney Claussen of Sybase, an SAP Company. Jonker will describe several of the key features in the latest iteration of Sybase IQ, a columnar database that’s been on the market for more than two decades. Claussen will provide details on a litany of improvements, including: a native MapReduce API, Predictive Model Markup Language (PMML) support, integration with Hadoop, and an expanded library of statistical and data mining algorithms that leverage the power of Sybase IQ PlexQ® massively parallel processing (MPP) technology.

For more information visit: http://www.insideanalysis.com

Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E

Usage Rights

© All Rights Reserved

    In the Mix: How Native Integration Improves MapReduce and Hadoop: Presentation Transcript

    • Tuesday, May 22, 12
    • Eric.kavanagh@bloorgroup.com Twitter Tag: #briefr
    • Reveal the essential characteristics of enterprise software, good and bad
    • Provide a forum for detailed analysis of today’s innovative technologies
    • Give vendors a chance to explain their product to savvy analysts
    • Allow audience members to pose serious questions... and get answers!
    • May: Analytics; June: Intelligence; July: Governance; August: Analytics; September: Integration; October: Database
    • Ultimately, analytics is about businesses making optimal decisions, although the range of technologies that inhabit this area is wide: statistical analysis, data mining, process mining, predictive analytics, predictive modeling, business process modeling and complex event processing. With the advent of big data, analytics has become “big analytics”, with organizations diving into large heaps of data that previously was not available or usable. A major challenge with this market trend is providing adequate performance for all BI and analytics workloads on the volumes of data that are now being assembled and that are continuously growing.
    • Robin Bloor is Chief Analyst at The Bloor Group. Robin.Bloor@Bloorgroup.com
    • SAP Sybase has a history of database innovation and application, from the corporate RDBMS through to the mobile and embedded market. Sybase IQ has been deployed in many application areas and is used in many complex predictive analytics deployments, where speed, data capacity, and versatility are critical. Recently it has been upgraded to work in a symbiotic manner with Hadoop in order to provide a comprehensive capability as a BI and analytics engine for Big Data applications.
    • David Jonker works in the area of Data Management & Analytics for SAP and is Product Marketing Director for Sybase IQ. In the last 5 years David has led product marketing teams for Sybase’s Data Management & Analytics product lines, including Sybase IQ, Sybase ASE, SQL Anywhere, and Advantage Database Server. His career includes over 10 years in software engineering and product management. Before joining Sybase, David had consulting, product management and software development roles.
    • Courtney Claussen is a product manager at Sybase, Inc., focusing on Sybase’s data warehousing and analytics products. She has enjoyed a 30-year career in software development, technical support and product marketing in the areas of computer-aided design, computer-aided software engineering, database management systems, middleware, and analytics.
    • Sybase IQ 15.4 Overview: Big data analytics & Hadoop
    • Sybase IQ: Widespread success
      Stands out as the leading enterprise data warehouse among the largest banks, insurance agencies, and telecom operators worldwide
      – Manage and analyze statistical measures for the entire nation of Canada
      – Analyze ALL Federal tax returns in the US
      – Analyze complex models in more than 200 financial institutions worldwide
      – Store and analyze massive amounts of industry segment data in 30 of the largest information providers in the world, including Transunion, Nielsen and Axiom
      © 2012 SAP AG. All rights reserved.
    • BIG DATA ANALYTICS ISSUES: Dealing with volume, variety, velocity, costs, skills
      – Volume: Managing and harnessing terabytes of data
      – Variety: Harmonizing silos of structured and unstructured data
      – Velocity: Keeping up with unpredictable data and query flows
      – Costs: Too expensive to acquire, operate, and expand
      – Skills: Lack of adequate skills for non-standard platforms and APIs
    • Sybase IQ 15: A powerful big data analytics platform in the making
      Releases: v15.0 (2009), v15.1 (2009), v15.2 (2010), v15.3 (2011), v15.4 (2011), leading to big data analytics
      Issue addressed and capability:
      – Skills: MapReduce API
      – Costs: PlexQ™ MPP Foundation
      – Variety: Text Search, Web 2.0 API
      – Velocity: In-Database Analytics API
      – Volume: VLDB Platform Foundation
    • Sybase IQ 15.4: A comprehensive platform for big data analytics
      [Architecture diagram. Labels: Eco-System (Sybase Control Center, Sybase PowerDesigner, certified ISV tools); Unstructured Data (Hadoop, Content Mgmt); Ingest + Persist; App Services (Web 2.0, Java, C/C++, SQL, Federation); Structured Data (DBMS).]
    • Details: In-Database Analytics & Hadoop
    • In-database analytics in Sybase IQ: No compromise for complex analytics
      – Basic to advanced analytical functions available to SQL directly from Sybase IQ engine
      – Data never leaves the database until results are materialized
      – Analytics code / models must be shareable yet must allow ad hoc analysis
      – Analytics code / models must be applicable to the latest data set
      – Standards-based access; concept extensibility is compulsory
      – Performance and scalability is a given
      – Average developer must be able to build in-database analytical models
      [Diagram: Sybase IQ process; database = logic/filtering; built-in functions and external DLL “A” applied in database.]
      Analytics simplified: logic to data = fast + efficient
    • In-database analytics in Sybase IQ: Custom function APIs
      Several different forms of C++ and Java UDF APIs for building custom in-database analytics, each valid at different locations within queries:
      1. {Scalar} to {Scalar functions}, e.g. sin, cosine, …
      2. {Scalar set} to {Scalar functions}, e.g. max, min, …
      3. {Scalar set} to {Scalar set}, e.g. OLAP windows, …
      4. {Scalar set} to {Tables}, e.g. join result sets, …
      5. {Scalar set, Tables} to {Tables}, e.g. MapReduce, …
      All variants are parallelizable, but (5) is also distributable across the PlexQ™ grid
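
The five input/output shapes listed on this slide can be sketched in plain Python. This is only an illustration of the shapes, not the Sybase IQ UDF API: every function name, the `Row` type, and the word-count example are hypothetical, and the real UDFs are written in C++ or Java.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, object]  # hypothetical stand-in for one table row


def scalar_to_scalar(x: float) -> float:
    """Form 1: one value in, one value out (like sin)."""
    return math.sin(x)


def set_to_scalar(values: Iterable[float]) -> float:
    """Form 2: a set of values in, one value out (like max)."""
    return max(values)


def set_to_set(values: List[float], window: int = 2) -> List[float]:
    """Form 3: one output per input, computed over a sliding window."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - window + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out


def set_to_table(values: Iterable[str]) -> List[Row]:
    """Form 4: scalar inputs in, a table (list of rows) out."""
    return [{"position": i, "value": v} for i, v in enumerate(values)]


def table_to_table(min_count: int, rows: Iterable[Row]) -> List[Row]:
    """Form 5: scalars plus a table in, a table out (MapReduce style)."""
    counts = Counter()
    for row in rows:
        # Map each row to its words, then reduce by summing per word.
        counts.update(str(row["text"]).split())
    return [{"word": w, "n": n} for w, n in sorted(counts.items()) if n >= min_count]
```

Only the last shape takes a whole table as input, which is consistent with the slide singling out form (5) as the one that can be distributed across a grid: its input rows can be partitioned across nodes before the reduce step.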
    • In-database analytics in Sybase IQ: Java custom functions (3)
      Feature: Java User Defined Function offers a new in-database analytics API
      Characteristics:
      – External algorithms written as Java functions, plugged into Sybase IQ
      – Java functions via SQL: run in-database, much faster than client side
      – Java functions run protected/fault tolerant (in a separate process)
      – Supports scalar and table outputs
      – Supports all data types
      Big Data Use Cases:
      – Ideal for ISV or custom data mining libraries for healthcare, eCommerce, public sector
      – Apps include: ISV partner Zementis built a plug-in for PMML (Predictive Model Markup Language) models, which validates PMML from SAS, R, ..., translates PMML to Java UDFs, and lets the Java UDFs be called from SQL
      [Diagram: PMML, Zementis plug-in, Java UDF, Sybase IQ.]
    • SYBASE IQ 15.4 DECONSTRUCTED: App services, integrating Sybase IQ + Hadoop at client side (6a)
      Feature: Client-side federation: join data from Sybase IQ AND Hadoop at a client application level
      Characteristics:
      – Client tool capable of querying Sybase IQ and Hadoop
      – Currently certified client tool is Quest Toad for Cloud
      – Better performance when results from sources are pre-computed/pre-aggregated
      Big Data Use Cases:
      – Ideal for bringing together Big Data Analytics pre-computations from different domains
      – Example (telecommunications): Sybase IQ with aggregated customer loyalty data and Hadoop with aggregated network utilization data; Quest Toad for Cloud can bring data from both sources, linking customer loyalty to network utilization or network faults (e.g. dropped calls)
      [Diagram: Toad for Cloud Databases connected to Hadoop Hive and Sybase IQ.]
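
As a rough illustration of what joining at the client application level means, the sketch below joins two small, already aggregated result sets in client memory. It does not use Toad for Cloud or any Sybase IQ or Hive driver; the function name, the `region` key, and the sample rows are all made up for the example.

```python
def client_side_join(iq_rows, hadoop_rows, key):
    """Hash join done by the client over two pre-aggregated result sets."""
    index = {row[key]: row for row in hadoop_rows}  # build side
    joined = []
    for row in iq_rows:                             # probe side
        match = index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical aggregates, one row per region from each source.
loyalty_from_iq = [
    {"region": "north", "loyalty_score": 0.82},
    {"region": "south", "loyalty_score": 0.64},
]
dropped_calls_from_hadoop = [
    {"region": "south", "dropped_calls": 1200},
    {"region": "north", "dropped_calls": 150},
]

report = client_side_join(loyalty_from_iq, dropped_calls_from_hadoop, "region")
```

Because the client holds both result sets in memory, the join stays cheap only when each source returns a handful of aggregated rows, which matches the slide's note about pre-computed or pre-aggregated results.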
    • SYBASE IQ 15.4 DECONSTRUCTED: App services, integrating Sybase IQ + Hadoop using ETL (6b)
      Feature: Load Hadoop unstructured data into Sybase IQ column store: extract, transform, load data from HDFS (Hadoop Distributed File System) into Sybase IQ schemas
      Characteristics:
      – Extract & load subsets of HDFS data into Sybase IQ column store: raw data from HDFS, or results of Hadoop MR jobs
      – HDFS data stored in Sybase IQ is treated like other Sybase IQ data: gets ACID properties of a DBMS; can be indexed, joined, parallelized; can be queried in an ad hoc way
      – Visible to BI and other client tools via Sybase IQ ANSI SQL API only
      – Currently, the Apache bulk data transfer utility SQOOP (built by Cloudera) is certified to provide this ETL capability
      Big Data Use Cases:
      – Ideal for combining subsets of HDFS data or summary of HDFS data into Sybase IQ for mid to long term usage in business reports
      – Example (eCommerce): clickstream data from weblogs stored in HDFS and outputs of MR jobs on that data (to study browsing behavior) ETL’d into Sybase IQ. The transactional sales data in Sybase IQ joined with clickstream data to understand and predict customer browsing to buying behavior
      [Diagram: Clickstream Data in HDFS, ETL via SQOOP, into Sybase IQ alongside Sales Data.]
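
The extract, load, then join flow on this slide can be sketched with Python's built-in `sqlite3` module standing in for the database. This is an assumption-heavy toy: SQLite is not Sybase IQ, no SQOOP or HDFS access is involved, and the log format, table names, and sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical raw clickstream lines, as they might sit in a file store.
RAW_CLICKSTREAM = [
    "2012-05-01\tcust1\t/product/42",
    "2012-05-01\tcust2\t/home",
    "2012-05-02\tcust1\t/product/42",
]


def extract_product_views(lines):
    """Extract + transform: keep only product page views as (customer, product_id)."""
    for line in lines:
        _day, customer, path = line.split("\t")
        if path.startswith("/product/"):
            yield customer, int(path.rsplit("/", 1)[1])


def load_and_join(lines):
    """Load the extracted subset into a table, then join it with sales data in SQL."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (customer TEXT, product_id INTEGER, amount REAL)")
    db.execute("INSERT INTO sales VALUES ('cust1', 42, 19.99)")
    db.execute("CREATE TABLE product_views (customer TEXT, product_id INTEGER)")
    db.executemany("INSERT INTO product_views VALUES (?, ?)", extract_product_views(lines))
    # Once loaded, the data can be indexed and queried like any other table.
    db.execute("CREATE INDEX idx_views ON product_views (customer, product_id)")
    query = """
        SELECT v.customer, v.product_id, COUNT(*) AS views, s.amount
        FROM product_views v
        JOIN sales s ON s.customer = v.customer AND s.product_id = v.product_id
        GROUP BY v.customer, v.product_id, s.amount
    """
    return db.execute(query).fetchall()
```

Calling `load_and_join(RAW_CLICKSTREAM)` returns one row per customer and product that was both viewed and bought, which is the browsing-to-buying link the eCommerce example describes.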
    • SYBASE IQ 15.4 DECONSTRUCTED: App services, integrating Sybase IQ + Hadoop using Data Federation (6c)
      Feature: Join HDFS data with Sybase IQ data on the fly: fetch and join subsets of HDFS data on demand using SQL queries from Sybase IQ (Data Federation technique)
      Characteristics:
      – Scan and fetch specified data subsets from HDFS via table UDF: can read and fetch HDFS data subsets; called as part of Sybase IQ SQL query; output joinable with Sybase IQ data
      – HDFS data not stored in Sybase IQ: fetched into Sybase IQ in-memory tables; ACID properties not applicable
      – Visible to BI/other client tools via Sybase IQ ANSI SQL API
      Big Data Use Cases:
      – Ideal for combining subsets of HDFS data with Sybase IQ data for operational (transient) business reports
      – Example (retail): Point Of Sale (POS) detailed data stored in HDFS. Sybase IQ EDW fetches POS data at fixed intervals from HDFS of specific hot selling SKUs, combines with inventory data in Sybase IQ to predict and prevent inventory “stockouts”
      [Diagram: POS Data in HDFS, UDF Bridge, Sybase IQ with Inventory Data.]
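
A minimal sketch of the data federation idea, assuming a Python generator as a stand-in for the table UDF: rows are fetched from the external store only when the query runs, joined with local data, and never persisted. The function names, record layout, and the simple "sold at least as many as are on hand" rule are hypothetical and not taken from the slides.

```python
def fetch_pos_subset(pos_records, skus):
    """Stand-in for a table UDF: scan the external store, yield only requested SKUs."""
    for rec in pos_records:
        if rec["sku"] in skus:
            yield rec


def stockout_risks(pos_records, inventory, hot_skus):
    """Join fetched POS rows with local inventory; nothing is stored afterwards."""
    sold = {}
    for rec in fetch_pos_subset(pos_records, hot_skus):
        sold[rec["sku"]] = sold.get(rec["sku"], 0) + rec["qty"]
    # Flag SKUs whose sales in the interval meet or exceed stock on hand.
    return sorted(sku for sku, qty in sold.items() if qty >= inventory.get(sku, 0))


# Hypothetical sample data.
pos_in_external_store = [
    {"sku": "A", "qty": 5},
    {"sku": "B", "qty": 1},
    {"sku": "A", "qty": 6},
    {"sku": "C", "qty": 9},
]
inventory_in_database = {"A": 10, "B": 50, "C": 3}

at_risk = stockout_risks(pos_in_external_store, inventory_in_database, {"A", "B"})
```

The fetched rows live only for the duration of the call, mirroring the slide's point that federated data is held in in-memory tables and does not gain the database's ACID guarantees.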
    • SYBASE IQ 15.4 DECONSTRUCTED: App services, integrating Sybase IQ + Hadoop using Query Federation (6d)
      Feature: Combine results of Hadoop MR jobs with Sybase IQ data on the fly: initiate and join results of Hadoop MR jobs on demand using SQL queries from Sybase IQ (Query Federation technique)
      Characteristics:
      – Trigger and fetch Hadoop MR job results via table UDF: can trigger Hadoop MR jobs; called as part of Sybase IQ SQL query; output joinable with Sybase IQ data
      – HDFS data not stored in Sybase IQ: fetched into Sybase IQ in-memory tables; ACID properties not applicable
      – Repeated use: put fetched data in tables
      – Visible to BI and other client tools via Sybase IQ ANSI SQL API
      Big Data Use Cases:
      – Ideal for combining Hadoop MR job results with Sybase IQ data for operational (transient) business reports
      – Example (utilities): Smart meter and smart grid data can be combined for load monitoring and demand forecast. Smart grid transmission quality data (multi-attribute time series data) stored in HDFS can be computed via Hadoop MR jobs triggered from Sybase IQ and combined with smart meter data stored in Sybase IQ to analyze demand and workload.
      [Diagram: Smart Grid Transmission Data in HDFS, UDF Bridge, Sybase IQ with Smart Meter Consumption Data.]
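
Query federation differs from data federation in that the query triggers a computation and joins its results, rather than fetching raw rows. The sketch below imitates that with a tiny in-process map/shuffle/reduce helper; it is not Hadoop and not the Sybase IQ UDF bridge, and the function names, fields, and sample readings are assumptions made for the example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Tiny in-process stand-in for a MapReduce job: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


def grid_quality_job(grid_readings):
    """The 'job': average transmission quality per region."""
    return run_mapreduce(
        grid_readings,
        mapper=lambda r: [(r["region"], r["quality"])],
        reducer=lambda values: sum(values) / len(values),
    )


def demand_report(meter_rows, grid_readings):
    """The 'query': trigger the job on demand, then join its output with local rows."""
    quality = grid_quality_job(grid_readings)
    return [
        {**row, "avg_quality": quality[row["region"]]}
        for row in meter_rows
        if row["region"] in quality
    ]


# Hypothetical sample data.
grid = [
    {"region": "east", "quality": 0.5},
    {"region": "east", "quality": 1.0},
    {"region": "west", "quality": 1.0},
]
meters = [{"region": "east", "kwh": 500}, {"region": "north", "kwh": 300}]

result = demand_report(meters, grid)
```

Each call to `demand_report` reruns the job, which is why the slide suggests putting fetched data in tables when the same results are needed repeatedly.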
    • SYBASE IQ 15.4: Unique, user community focused platform for big data analytics
      [Diagram. Labels: Data Discovery (Data Scientists); Application Modeling (Business Analysts); Reports/Dashboards (BI Programmers); Business Decisions (Business End Users); Full Mesh High Speed Interconnect; Infrastructure Management (DBAs); SAN Fabric.]
      – Dynamic, elastic PlexQ™ MPP grid: grow, shrink, provision on demand; heavy parallelization
      – Load, prepare, mine, report in a workflow: privacy through isolation of resources; collaboration through sharing of results/data via sharing of resources
    • Thank you
      Courtney Claussen, Product Manager, Sybase IQ, courtney.claussen@sap.com
      David Jonker, Product Marketing Director, Sybase IQ, david.jonker@sap.com
    • Most of the Big Data opportunity is, in the end, a Big Analytics opportunity. There are two challenges in this: managing the data and the data flow, and providing acceptable performance for analytics applications. Hadoop and its associated technologies can be both a blessing and a curse.
    • Hadoop = Key-value store & Parallel processing framework
    • Some NoSQL databases are DHT-based, some are specialized DBMS
    • Column-store DBMS vary, but in general they are MPP RDBMS and NewSQL DBMS
    • Data volumes (includes complexity of data structure)
    • Concurrency (includes also workload variability)
    • Computation (is application dependent)
    • Data flow architecture is a factor
    • In many ways this is similar to the Data Warehouse data flow challenge, writ larger
    • Latency is about application service levels
    • This is probably still a three-stage process
    • This is, by the way, a simplification
    • Big Analytics is here to stay
    • In some analytical application areas speed is desirable, in others speed is critical. Warning: workloads can be mixed
    • Analytic speed depends upon the database engine, but also data flow architecture
    • Business effectiveness depends upon integration with the business process
    • The prebuilt functions clearly make sense (for speed of processing). Are they intended to make some analytic tools unnecessary or simply to be called directly by such tools?
    • What does SAP see as the appropriate role(s) for Hadoop in most businesses?
    • As I understand it, Sybase IQ can fully replace Hadoop in some contexts. What are the situations where you think Hadoop AND Sybase IQ is appropriate?
    • I’m intrigued by the idea of JOINing data between Hadoop results and Sybase IQ, but I’m not sure of the role of such a capability. How is this different from using MR for data ingest?
    • As you can link up to Hadoop/Sybase IQ at the front or at the back-end, which would you tend to use when?
    • You speak of broad and comprehensive capability, in combination with Hadoop. So which areas do you think are sweet spots? And which kinds of application and/or data collections do you think require different approaches?
    • Who have been the early adopters of this Hadoop/Sybase IQ capability and what kind of business problems are they trying to solve?
    • What do you see as SAP HANA’s role in this? Are the same analytical capabilities being added to SAP HANA?