InfoSphere BigInsights

Presentation about InfoSphere BigInsights from IM Forum 2013 in Berlin

Published in: Technology

Transcript of "InfoSphere BigInsights"

  1. InfoSphere BigInsights: Hadoop business ready
     Wilfried Hoge, IT Architect Big Data
  2. Getting the Value from Big Data – Why a Platform?
     • Almost all big data use cases require an integrated set of big data technologies to address the business pain completely
     • Reduce time and cost and provide quick ROI by leveraging pre-integrated components
     • Be flexible in the combination of technologies
     • Start small with a single project and progress to others over your big data journey
     Diagram (Big Data Platform): Accelerators; Information Integration & Governance; Data Warehouse; Stream Computing; Hadoop System; Discovery; Application Development; Systems Management; data types: Data, Media, Content, Machine, Social
     © 2013 International Business Machines Corporation
  3. InfoSphere BigInsights: Bringing Hadoop to the enterprise
     • InfoSphere BigInsights is IBM's distribution of Hadoop that delivers additional value
     • Accelerators: speed time to value with analytic and application accelerators
     Diagram (Big Data Platform): Accelerators; Information Integration & Governance; Data Warehouse; Stream Computing; Hadoop System; Discovery; Application Development; Systems Management; data types: Data, Media, Content, Machine, Social
  4. New Architecture to Leverage All Data and Analytics
     Data in Motion; Data at Rest; Data in Many Forms
     Diagram labels: Information Ingestion and Operational Information; Decision Management; BI and Predictive Analytics; Navigation and Discovery; Intelligence Analysis
     • Landing Area, Analytics Zone and Archive: Raw Data, Structured Data, Text Analytics, Data Mining, Entity Analytics, Machine Learning
     • Real-time Analytics: Video/Audio, Network/Sensor, Entity Analytics, Predictive
     • Exploration, Integrated Warehouse, and Mart Zones: Discovery, Deep Reflection, Operational, Predictive
     • Stream Processing, Data Integration, Master Data
     • Streams
     • Information Governance, Security and Business Continuity
  5. New Architecture to Leverage All Data and Analytics (same architecture diagram as slide 4)
     InfoSphere BigInsights
     • brings Hadoop to the Enterprise
     • enhances ease of use and consumability
     • takes the complexity out of getting started with Hadoop
     • users across the organization can build applications and get insights at their fingertips without having to learn new skill sets
  6. Tools for Administrators
     • Monitoring capabilities provide a centralized dashboard view to visualize key performance indicators, including CPU, disk, memory, and network usage for the cluster; data services such as HDFS, HBase, ZooKeeper, and Flume; and application services including MapReduce, Hive, and Oozie
     • Status information and control over the major cluster capabilities
     • Advanced capabilities to control application permissions and deployment
     • Capability to view and control all applications from a single page
  7. BigSheets to analyze and visualize
     • Model “big data” collected from various sources in spreadsheet-like structures
     • Filter and enrich content with built-in functions
     • Combine data in different workbooks
     • Visualize results through spreadsheets, charts
     • Export data into common formats (if desired)
     No programming knowledge needed!
  8. Centralized dashboard & data flows
     A centralized dashboard to visualize analytic results:
     • BigSheets collections
     • Analytic application results
     • Monitoring metrics
     • Ability to view BigSheets data flows between and across data sets to quickly navigate and relate analysis and charts
     • Visualize inner and outer joins, enhanced filters for BigSheets columns, column data-type mapping for collections, application of analytics to BigSheets columns, etc.
  9. Tools for Developers
     Editors
     • A workflow editor that greatly simplifies the creation of complex Oozie workflows with a consumable interface
     • A Pig/Jaql Editor with content assist and syntax highlighting that enables users to create and execute new applications using Pig or Jaql in local or cluster mode from the Eclipse IDE
     Application development & deployment
     • Enablement of BigSheets macro and BigSheets reader development
     • Text Analytics development, including support for modular rule sets
     • Publish new application: BigSheets Macro, BigSheets Reader, AQL module, Jaql module
     Development lifecycle
     1. Sample your data
     2. Develop your application using BigInsights tools
     3. Test your application
     4. Package and publish your application
     5. Deploy your application on the cluster
  10. Running Applications on Big Data
      • Browse available applications
      • Deploy published applications (administrators only)
      • Launch (or schedule for launch) a deployed application
      • Monitor job (application) execution status
      • Predefined applications
        – Import & Export Data
        – Database & Files
        – Web and Social
        – Analyze and Query
        – Predictive Analytics
        – Text Analytics
        – SQL/Hive, Jaql, Pig, HBase
        – Accelerators
  11. Application linking and interfaces to build new apps
      • Compose new applications from existing applications and BigSheets
      • Invoke analytics applications from the web console, including integration within BigSheets
      • REST data source App that enables users to load data from any data source supporting REST APIs into BigInsights, including popular social media services
      • Sampling App that enables users to sample data for analysis
      • Subsetting App that enables users to subset data for data analysis
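The Sampling App on slide 11 is described only by its purpose. One common way to draw a fixed-size uniform sample from data too large to hold in memory is reservoir sampling. The sketch below illustrates that general technique and is not taken from BigInsights; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Return up to k records chosen uniformly at random in a single pass.

    Hypothetical helper for illustration; not a BigInsights API.
    """
    rng = random.Random(seed)
    sample = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (index + 1)
            # by overwriting a randomly chosen reservoir slot.
            slot = rng.randint(0, index)
            if slot < k:
                sample[slot] = record
    return sample
```

Because the input is consumed once and only `k` records are kept, the same function works on any iterable, for example a file read line by line.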
  12. Collaborative Big Data for many roles
      • Business users can get their hands on big data and use big data applications and BigSheets to get insights into their data
      • Data scientists can perform deeper analysis and get richer insights
      • Administrators are empowered to be more agile through better controls and views into key performance indicators
      • Developers can leverage unified tooling in a Big Data Application Development Lifecycle and are able to create and deploy new types of applications, with enhancements that simplify even complex workflows
  13. Built-in accelerators
      • Software components that accelerate development and/or implementation of specific solutions or use cases on top of the Big Data platform
      • Provide business logic, data processing, and UI/visualization, tailored for a given use case
      • Bundled with Big Data platform components: InfoSphere BigInsights and InfoSphere Streams
      • Key Benefits
        – Time to value
        – Leverage best practices around implementation of a given use case
      • Analytical Accelerators
        – Text analytics
        – Geospatial analytics
        – Machine learning
        – Time series
        – Data mining
      • Application Accelerators
        – Machine Data Analytics: operational data including logs for operations efficiency
        – Social Data Analytics: sentiment analytics, intent to purchase
        – Telecommunications: CDR streaming analytics, deep customer event analytics
        – Finance Analysis: streaming options, trading, insurance and banking DW models
  14. Machine Data Analytics Accelerator
      What does it do?
      • Provides the ability to ingest, parse and extract a wide variety of machine data
        – Faceted search enables easy navigation and discovery
        – Visualization enables easy analysis of the data
      Example Application: Facilities Management
      • Use real-time data from building devices such as meters, sensors and motion detectors to monitor and manage power usage
      Why should you care?
      • It enables clients to gain insights into operations, customer experience, transactions and behavior, processing machine data in minutes instead of days and weeks
      • With these insights, clients can:
        – Proactively plan to increase operational efficiency
        – Troubleshoot problems and investigate security incidents
        – Monitor end-to-end infrastructure to avoid service degradation or outages
  15. Machine Data Analytics Accelerator High-Level Workflow
  16. Use the Machine Data Analytics Accelerator by starting the predefined applications
  17. View results of MDA in web, BigSheets and dashboard
  18. BigInsights Enterprise Edition (component diagram; labels listed in extraction order)
      Sections: Connectivity and Integration; Infrastructure; Analytics and discovery; “Apps”; Administrative and development tools; Optional IBM and partner offerings
      Legend: Open Source; IBM
      Components: Streams, Netezza, Text processing engine and library, JDBC, Flume, Jaql, Hive, Pig, HBase, MapReduce, HDFS, ZooKeeper, Indexing, Lucene, Adaptive MapReduce, Oozie, Text compression, Enhanced security, Flexible scheduler, DB2, BigSheets, Web Crawler, Distrib file copy, DB export, Boardreader, DB import, Ad hoc query, Machine learning, Data processing, . . ., R, Integrated installer, IBM Cognos BI, GPFS (EAP), Accelerator for machine data analysis, Accelerator for social data analysis, Guardium, DataStage, Data Explorer, Sqoop, HCatalog
      Web console
      • Monitor cluster health, jobs, etc.
      • Add / remove nodes
      • Start / stop services
      • Inspect job status
      • Inspect workflow status
      • Deploy applications
      • Launch apps / jobs
      • Work with distrib file system
      • Work with spreadsheet interface
      • Support REST-based API
      • . . .
      Eclipse tools
      • Text analytics
      • MapReduce programming
      • Jaql, Hive, Pig development
      • BigSheets plug-in development
      • Oozie workflow generation
  19. BigInsights: Value Beyond Open Source
      Diagram labels: Enterprise Capabilities; Administration & Security; Workload Optimization; Connectors; Open source components; Advanced Engines; Visualization & Exploration; Development Tools; IBM-certified Apache Hadoop or …
      Key differentiators
      • Built-in analytics
      • Enterprise software integration
      • Spreadsheet-style analysis
      • Integrated installation of supported open source and other components
      • Web Console for admin and application access
      • Platform enrichment: additional security, performance features, . . .
      • World-class support
      • Full open source compatibility
      Business benefits
      • Quicker time-to-value due to IBM technology and support
      • Reduced operational risk
      • Enhanced business knowledge with flexible analytical platform
      • Leverages and complements existing software
  20. If this were easy, everyone would already be leveraging big data
      “Big Data offers big business gains but hidden costs and complexity present barriers that most organizations will struggle with”
      - The Cost of Big Data, Eric Savitz, Forbes 5/2012
      • Open source Apache Hadoop for enterprise usage is incomplete
      • Hadoop skills are in short supply
      • Custom built solutions lack integrated cluster management
      • Requires integration effort within the existing analytic ecosystem
      • Most integrated solutions do not help with archival
  21. Simplifying Big Data for the Enterprise: The new PureData System for Hadoop
      • Accelerate time to value
      • Accelerate time to insight
      • Simplify big data adoption and consumption
      • Extend the value of the data warehouse
      • Implement enterprise class big data
      • Minimize system setup and administration
      • Available in 2H2013
  22. Benefits of IBM PureData System for Hadoop
      Built-in Expertise; Simplified Experience; Integration by Design
      Accelerate Big Data Time to Value
      • Deploy 8x faster than custom-built solutions [1]
      • Built-in visualization to accelerate insight
      • Built-in analytic accelerators [2], unlike big data appliances on the market
      Simplify Big Data Adoption & Consumption
      • Single system console for full system administration
      • Rapid maintenance updates with automation
      • No assembly required, data load ready in hours
      Implement Enterprise Class Big Data
      • Only integrated Hadoop system with built-in archiving tools [2]
      • Delivered with more robust security than open source software
      • Architected for high availability
      [1] Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
      [2] Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.
  23. SQL Access for Hadoop: Why?
      • Data warehouse augmentation is a leading Hadoop use case
      • MapReduce is difficult
        – MapReduce Java API is tedious and requires programming expertise
        – Unfamiliar languages (e.g., Pig) also require special skills
      • SQL support would open the data to a much wider audience
        – Familiar, widely known syntax
        – Common catalog for identifying data and structure
        – Declarative: clear separation of the what (the data you're after) vs. the how (processing)
      Diagram (three data warehouse augmentation patterns): 1. Pre-Processing Hub; 2. Query-able Archive; 3. Exploratory Analysis
      Diagram labels: Information Integration; Data Warehouse; Streams (real-time processing); BigInsights (landing zone for all data); BigInsights (can combine with unstructured information)
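The contrast slide 23 draws between "the what" and "the how" can be made concrete with a small sketch. The example below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in SQL engine (not Hadoop, Hive, or Big SQL), and the `sales` table, its columns, and both function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (region, amount).
ROWS = [("north", 10.0), ("south", 5.0), ("north", 2.5), ("east", 7.0)]


def total_by_region_mapreduce(rows):
    """The 'how': spell out map, shuffle and reduce steps by hand."""
    # Map phase: emit one (key, value) pair per record.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle phase: group the values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: sum the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


def total_by_region_sql(rows):
    """The 'what': describe the result and let the engine plan the work."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()
```

Both functions compute the same totals; the SQL version states only the desired result, which is the property the slide argues would open the data to a wider audience.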
  24. SQL for Hadoop: What's the Problem?
      • SQL access to data in Hadoop is challenging
        – Data is in many formats
          • CSV, JSON, Hive RCFile, HBase, ...
          • Some formats (HBase composite keys) don't map cleanly to relational models
        – No schemas or statistics
        – Hadoop was not designed to be a query engine
      • Hive (with HiveQL): limited query access for Hadoop
        – SQL-like, but NOT SQL
          • Limited data types: no varchar(n), decimal(p,s), etc.
          • Limited join support
          • No subqueries
          • No windowed aggregates
        – Very limited JDBC/ODBC driver
        – Everything executes in MapReduce
          • Even very small queries requiring little processing
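For readers unfamiliar with the term, the following sketch shows the kind of subquery that slide 24 lists as unavailable in HiveQL at the time. It is an assumption-laden illustration: `sqlite3` stands in for a standard SQL engine, and the `orders` table and the function name `customers_above_average` are hypothetical.

```python
import sqlite3


def customers_above_average(orders):
    """Return customers with at least one order above the average order total."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, total REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
        # The inner SELECT is a scalar subquery used inside the WHERE clause.
        cursor = conn.execute(
            "SELECT DISTINCT customer FROM orders "
            "WHERE total > (SELECT AVG(total) FROM orders) "
            "ORDER BY customer"
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()
```

Without subquery support, the average would have to be computed in a separate query and passed into a second one by the application.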
  25. Big SQL: Native SQL Query Access for Hadoop
      • Native SQL access to data stored in BigInsights
        – ANSI SQL 92+
        – Standard syntax support (joins, data types, …)
      • Real JDBC/ODBC drivers
        – Prepared statements
        – Cancel support
        – Database metadata API support
        – Secure socket connections (SSL)
      • Optimization
        – Leveraging MapReduce parallelism, or …
        – Direct access for low-latency queries
      • Varied data sources
        – HBase (including secondary indexes)
        – CSV, delimited files, sequence files
        – JSON
        – Hive tables
      Diagram labels: Application; JDBC / ODBC Driver; JDBC / ODBC Server; Big SQL Engine; BigInsights; SQL; Data Sources (Hive tables, HBase tables, CSV files)
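Slide 25 names prepared statements and a metadata API among the driver features. The sketch below shows what those two ideas look like from application code, using Python's DB-API with `sqlite3` as a stand-in; it does not use the Big SQL JDBC/ODBC drivers, and the `events` table and both function names are hypothetical.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a small hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (host TEXT, severity INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("web01", 3), ("db01", 5), ("web02", 1)],
    )
    return conn


def find_events(conn, min_severity):
    """Run a parameterized query and return (column names, rows)."""
    # The '?' placeholder is bound by the driver, so the value is never
    # spliced into the SQL text by string concatenation.
    cursor = conn.execute(
        "SELECT host, severity FROM events WHERE severity >= ? ORDER BY host",
        (min_severity,),
    )
    # cursor.description carries result-set metadata such as column names.
    columns = [column[0] for column in cursor.description]
    return columns, cursor.fetchall()
```

The same statement text can be reused with different parameter values, which is the main practical benefit of prepared statements in any driver.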
  26. From Getting Started to Enterprise Deployment
      InfoSphere BigInsights Brings Hadoop to the Enterprise
      Diagram axes: Breadth of capabilities; Enterprise class
      • Apache Hadoop
      • Basic Edition (free download)
        – Web-based mgmt console
        – Jaql
        – Integrated install
      • Enterprise Edition (sold by # of terabytes managed)
        – Accelerators
        – Performance Optimization
        – Visualization Capabilities
        – Pre-built applications
        – Text analytics
        – Spreadsheet-style tool
        – RDBMS, warehouse connectivity
        – Administrative tools, security
        – Eclipse development tools
        – Enterprise Integration . . . .
      • PureData for Hadoop
        – Appliance simplicity for the enterprise
  27. Where to start with BigInsights?
      • Learn it at BigDataUniversity.com
      • Try it on Smart Cloud Enterprise: ibm.biz/Bdx8FF
      • Read about it in “Harness the Power of Big Data” at ibm.biz/Bdx8RP
      • Learn about Big Data at www.ibmbigdatahub.com
      • Register for “Big Data at the speed of business” event on April 30th at ibm.co/bigdataevent
      • Try BigSQL: bigsql.imdemocloud.com
      • YouTube Videos - Big Data Channel: youtube.com/user/ibmbigdata
  28. Please Note
      IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion.
      Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
      The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
      Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.