Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical platform

Technical introduction to IBM's BigInsights platform for managing and analyzing Big Data.

Published in: Technology

  1. 1. IBM BigInsights: Bringing you big value from Big Data. Created by C. M. Saracco, IBM Silicon Valley Lab, June 2016. © 2016 IBM Corporation
  2. 2. IBM Disclaimer: Information regarding potential future products is intended to outline our general product direction and should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remain at our sole discretion.
  3. 3. Agenda
     • The big picture about Big Data
     • IBM’s approach
     • Portfolio overview
     • BigInsights
       – Open source core platform with Apache Hadoop
       – IBM technologies for enhanced analytics
       – How BigInsights fits within a broader IT infrastructure
     • How IBM can help you get off to a quick start
  4. 4. The Big Picture about Big Data
  5. 5. Information is at the center of a new wave of opportunity, and organizations need deeper insights
     • 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
     • 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
     • 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
     • 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
     • Data volumes cited: 2.5 million items per minute; 300,000 tweets per minute; 200 million emails per minute; 220,000 photos per minute; 5 TB per flight; > 1 PB per day from gas turbines; 1 ZB = 1 billion TB
  6. 6. Big Data presents big opportunities: extract insight from a high volume, variety and velocity of data in a timely and cost-effective manner
     • Variety: Manage and benefit from diverse data types and data structures
     • Velocity: Analyze streaming data and large volumes of persistent data
     • Volume: Scale from terabytes to zettabytes
  7. 7. What we hear from customers
     • Lots of potentially valuable data is dormant or discarded due to size/performance issues
     • Large volumes of unstructured or semi-structured data are not worth integrating fully (e.g., tweets, logs)
     • Not clear what should be analyzed (exploratory, iterative)
     • Information is distributed across multiple systems and/or the Internet
     • Some data has a short useful lifespan
     • Volumes can be extremely high
     • A query-ready resource for “cold” historic data is needed (to prevent unwieldy growth of data warehouses)
     • Analysis is needed in the context of existing information (not stand-alone)
  8. 8. Merging the traditional and Big Data approaches
     • Traditional approach (structured & repeatable): business users determine what question to ask; IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys
     • Big Data approach (iterative & exploratory): IT delivers a platform to enable creative discovery; business explores what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization
  9. 9. Why invest in analytics?
     • Analytics pay back $13.01 for every dollar spent [1]
     • 69% created significant positive impact on business outcomes [2]
     • 60% created significant positive impact on revenues [2]
     • 53% created significant competitive advantage [2]
     [1] “Analytics Pays Back $13.01 for Every Dollar Spent”, Nucleus Research, September 2014
     [2] “Analytics: The speed advantage”, IBM Institute for Business Value, 2014
  10. 10. Big Data scenarios span many industries
     • Identify criminals and threats from disparate video, audio, and data feeds
     • Make risk decisions based on real-time transactional data
     • Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
     • Detect life-threatening conditions at hospitals in time to intervene
     • Multi-channel customer sentiment and experience analysis
  11. 11. IBM Big Data and analytics sample architecture (diagram)
     • All data sources: streaming data, text data, applications data, time series, geospatial, relational, social network, video and image
     • Big Data platform capabilities: information ingest, real-time analytics, warehouse and data marts, analytic appliances
     • Zones: ingestion and operational information; landing and archive zone; real-time analytics zone; enterprise warehouse and mart zone; analytic appliances
     • Across all zones: information governance, security and business continuity
     • Advanced analytics / new insights: exploration and discovery (What do you have?), descriptive (What has happened?), predictive (What could happen?), prescriptive (Best outcomes?), cognitive (Learn dynamically?)
     • New / enhanced applications: automated process, case management, analytic applications, alerts, Watson, cloud services, ISV solutions
  12. 12. Big Data use expanding rapidly. Big data adoption over time, as reported by respondents:
     • Educate (learning about big data capabilities): 24%-26% in 2012 to 2014; 10% in 2015; 250% decrease
     • Explore (exploring internal use cases and developing a strategy): 43%-47% in 2012 to 2014; 53% in 2015; 125% increase
     • Engage (implementing infrastructure and running pilot activities): 22%-27% in 2012 to 2014; 25% in 2015; 0% change
     • Execute (using big data and analytics pervasively across the enterprise): 5%-6% in 2012 to 2014; 13% in 2015; 210% increase
     Source: 2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)
  13. 13. Big Data technologies pay off (chart). Source: 2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)
  14. 14. Big Data ROI often < 18 months (chart): return on investment period for big data and analytics projects, as reported by respondents. Source: 2015 IBV study “Analytics: The Upside of Disruption” (ibm.biz/w3_2015analytics)
  15. 15. Big Data in practice: focus areas (chart). Source: survey summaries from Forbes, May 2015
  16. 16. IBM’s approach
  17. 17. IBM analytics platform strategy for Big Data
     • Integrate and manage the full variety, velocity and volume of Big Data
     • Apply advanced analytics
     • Visualize all available data for ad-hoc analysis
     • Support workload optimization and scheduling
     • Provide for security and governance
     • Integrate with enterprise software
     • IBM Analytics Platform (diagram): built on Spark; hybrid; trusted
       – Discovery & exploration, prescriptive analytics, predictive analytics, content analytics, business intelligence
       – Data management, Hadoop & NoSQL, content management, data warehouse, information integration & governance
       – Spark analytics operating system; machine learning
       – On premises and on cloud; data at rest and in motion; inside and outside the firewall; structured and unstructured
  18. 18. IBM BigInsights for Apache Hadoop and Spark (shown within the IBM Analytics Platform diagram)
     • Analytical platform for persistent Big Data
       – 100% open source core with IBM add-ons for analysts, data scientists, and admins
       – On premises or cloud
     • Distinguishing characteristics
       – Built-in analytics: enhances business knowledge
       – Enterprise software integration: complements and extends existing capabilities
       – Production-ready: speeds time-to-value
     • IBM advantage
       – Combination of software, hardware, services and research
  19. 19. Overview of BigInsights
     • IBM Open Platform: 100% open source platform compliant with ODPi
       – Apache Hadoop ecosystem
       – Apache Spark ecosystem
     • IBM-specific BigInsights features
       – Big SQL (industry standard SQL)
       – Text analytics
       – BigSheets (spreadsheet-style tool)
       – Big R (R support)
       – IBM Streams, Cognos (limited use licenses)
     • Free Quick Start (non-production): IBM Open Platform, IBM added value features, community support
  20. 20. BigInsights ISV Partner Ecosystem lHelium SW
  21. 21. A Closer Look at IBM BigInsights
  23. 23. IBM Open Platform foundational components
     • Apache Hadoop
       – Distributed file system, popular API (MapReduce) for clustered computing
       – Originally designed for batch processing of massive data volumes, varied data formats
     • Apache Spark
       – General purpose, high-speed data processing engine for clustered computing
       – In-memory processing, popular built-in libraries (e.g., machine learning)
       – No built-in storage. Attaches to other data stores (e.g., Hadoop Distributed File System)
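
To make the MapReduce programming model named on this slide concrete, here is a minimal single-process sketch in plain Python. It does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and only mimic the map, shuffle and reduce steps that a cluster would spread across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence; a real cluster runs them in parallel."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big value", "big insights"]))
```

The point of the split is that `map_phase` and `reduce_phase` each see only one line or one key at a time, which is what lets a framework distribute them over a cluster.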
  24. 24. IBM Open Platform: a closer look
     • Timely updates as new open source versions are released
     • Install only those components you want / need
     • Compliant with ODPi runtime
     • Component versions: Ambari 2.2; Flume 1.6.0; Hadoop (includes MapReduce, YARN) 2.7.2; HBase 1.2.0; Hive 1.2.1; Kafka 0.9.0.1; Knox 0.7.0; Oozie 4.2.0; Parquet 2.2; Phoenix 4.6.1; Pig 0.15.0; Ranger 0.5.2; Slider 0.90.2; Solr 5.5; Spark 1.6.1; Sqoop 1.4.6; Titan 1.0.0; ZooKeeper 3.4.6
  25. 25. What is ODPi?
     • ODPi has an open governance model. Developers form a Technical Steering Committee
     • All members have an equal vote on ODPi Core decisions
     • ODPi has a Board of Directors responsible for the financial, legal and promotional aspects of ODPi
     • Non-profit organization accelerating the delivery of Big Data solutions by powering a platform called ODPi Core
     • The ODPi Core focuses on a small but critical set of projects
     • Goal: enable a rapid start and an industry-driven definition
     • ODPi members include: Ampool, Altiscale, ArenaData, AsiaInfo, Capgemini, DataTorrent, EMC, GE, Hortonworks, IBM, Infosys, NEC, Pivotal, PLDT, SAS, Squid Solutions, SyncSort, Telstra, Toshiba, UNIFi, VMware, WANdisco, Xiilab, zData and Zettaset
     • ODPi & Apache Software Foundation (ASF)
       – ODPi supports the ASF mission
       – ASF provides governance around individual projects without looking at the ecosystem and collections of projects
       – ODPi provides a vendor-led, consistent packaging model and certification for Big Data components as an ecosystem: test once, run anywhere for big data applications
  27. 27. SQL for Hadoop (Big SQL)
     • Architecture (diagram): a SQL-based application uses the IBM data server client to reach the Big SQL engine (SQL MPP run-time), which sits on top of data storage (DFS)
     • Comprehensive, standard SQL
       – SELECT: joins, unions, aggregates, subqueries . . .
       – GRANT/REVOKE, INSERT … INTO
       – UPDATE / DELETE (HBase)
       – Procedural logic in SQL
       – Stored procedures, user-defined functions
       – IBM data server JDBC and ODBC drivers
     • Optimization and performance
       – IBM MPP engine (C++) replaces the Java MapReduce layer
       – Continuously running daemons (no start-up latency)
       – Message passing allows data to flow between nodes without persisting intermediate results
       – In-memory operations with the ability to spill to disk (useful for aggregations and sorts that exceed available RAM)
       – Cost-based query optimization with 140+ rewrite rules
     • Various storage formats supported
       – Data persisted in DFS, Hive, HBase
       – No IBM proprietary format required
     • Integration with RDBMSs via LOAD, query federation
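
As an illustration of the kind of standard SQL listed on this slide (join, aggregate, subquery), the sketch below runs one such query from Python. It uses the standard-library `sqlite3` module purely as a stand-in engine, not Big SQL or the IBM data server drivers; the `customers` and `orders` tables and their rows are invented for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in SQL engine (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC'), (3, 'EMEA');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 250.0), (3, 3, 50.0), (4, 1, 25.0);
""")

# A join, an aggregate and a scalar subquery in one statement.
QUERY = """
    SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.amount > (SELECT AVG(amount) FROM orders) / 2
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for region, order_count, total in rows:
        print(region, order_count, total)
```

Because the statement is plain SQL, the same query text would in principle be portable to any engine that supports these constructs, which is the argument the slide makes for a standards-based interface.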
  28. 28. Big SQL query federation = virtualized data access (diagram: SQL tools and applications query virtualized data drawn from multiple data sources)
     • Transparent
       – Appears to be one source
       – Programmers don’t need to know how / where data is stored
     • Heterogeneous
       – Accesses data from diverse sources
     • High function
       – Full query support against all data
       – Capabilities of sources as well
     • Autonomous
       – Non-disruptive to data sources, existing applications, systems
     • High performance
       – Optimization of distributed queries
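
The idea of transparent, virtualized access can be sketched with a small analogy: one connection, several physically separate databases, one query. The example below uses SQLite's `ATTACH DATABASE` from Python as a stand-in; it is not how Big SQL federation is configured, and the `warehouse` and `hadoop` schema names and their rows are hypothetical.

```python
import sqlite3

# One connection plays the role of the federation layer (analogy only).
conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for two independent data sources.
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
conn.execute("ATTACH DATABASE ':memory:' AS hadoop")

conn.executescript("""
    CREATE TABLE warehouse.customers (id INTEGER, name TEXT);
    CREATE TABLE hadoop.clicks (customer_id INTEGER, page TEXT);
    INSERT INTO warehouse.customers VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO hadoop.clicks VALUES (1, '/home'), (1, '/pricing'), (2, '/home');
""")

# The query joins across both sources as if they were a single database.
federated = conn.execute("""
    SELECT c.name, COUNT(*) AS clicks
    FROM warehouse.customers c
    JOIN hadoop.clicks k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

if __name__ == "__main__":
    print(federated)
```

The application only sees qualified table names; where each table physically lives is hidden behind the connection, which mirrors the "appears to be one source" property on the slide.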
  30. 30. Text analytics
     • Distills structured info from unstructured text
       – Sentiment analysis
       – Consumer behavior
       – Illegal or suspicious activities
       – …
     • Parses text and detects meaning with annotators
     • Understands the context in which the text is analyzed
     • Features pre-built extractors for names, addresses, phone numbers, etc.
     • Sample input text shown on the slide:
       – “I had an iphone, but it's dead @JoaoVianaa. (I've no idea where it's) !Want a Galaxy now !!!”
       – “@rakonturmiami im moving to miami in 3 months. i look foward to the new lifestyle”
       – “I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Des Moines) w/ 2 others http://4sq.com/gbsaYR”
  31. 31. Extracting information from text (diagram)
     • Analyze text (single column or document) through text preparation, producing tagged syntax
       – Sentence segmentation
       – Tokenization
       – Part-of-speech tagging
       – Language detection
     • Describe via extractors: Information Extraction (IE), producing classified words / attributes
       – Extraction operations via lexical analysis
       – Extraction operations via deep linguistic analysis: verb-centric abstraction, noun-centric abstraction, shallow parsing, …
       – Span operations
       – Join operations
       – Consolidations
       – …
     • Recognize: entity recognition, machine data, primitives, sentiment, …
     • Resulting analytics: entity analytics, preventative maintenance, customer segmentation, sentiment, affinity, …
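
The text preparation stage can be pictured with a small, deliberately naive sketch of sentence segmentation and tokenization with character spans. This is plain Python with regular expressions, not the BigInsights text analytics engine; the patterns and the function names `split_sentences` and `tokenize` are assumptions made for illustration.

```python
import re

# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A token is a word (optionally with an apostrophe part) or a single symbol.
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(text):
    """Sentence segmentation: split text into sentence strings."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    """Tokenization: return (token, begin, end) spans within the sentence."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(sentence)]


if __name__ == "__main__":
    for sentence in split_sentences("I had a phone. It's dead! Want a new one."):
        print(tokenize(sentence))
```

Keeping the begin and end offsets with each token is what makes later span, join and consolidation operations possible, since extractors can then reason about where matches sit relative to one another.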
  32. 32. Text analytics tooling
     • Web-based tool to define rules to extract data and derive information from unstructured text
     • Graphical interface to describe the structure of various textual formats, from log file data to natural language
  33. 33. Pre-built text extractors
     • The extractor library contains a rich set of pre-built extractors
       – Finance actions
       – Named entities
       – Generic
       – Machine data
       – Sentiment analysis
     • You can control output properties
       – Output columns and names
       – Row filters
     • Some pre-built extractors can be customized
       – Add / remove dictionary terms
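
The customization options on this slide (dictionary terms, output columns, row filters) can be mimicked in a few lines of Python. The `DictionaryExtractor` class below is entirely hypothetical and is not the BigInsights extractor API; it only sketches what "add / remove dictionary terms" and "control output properties" might mean in practice.

```python
import re


class DictionaryExtractor:
    """Hypothetical dictionary-based extractor with customizable terms."""

    def __init__(self, label, terms):
        self.label = label
        self.terms = {term.lower() for term in terms}

    def add_term(self, term):
        self.terms.add(term.lower())

    def remove_term(self, term):
        self.terms.discard(term.lower())

    def extract(self, text, columns=("label", "match"), row_filter=None):
        """Return one dict per dictionary hit, limited to the chosen columns."""
        rows = []
        for m in re.finditer(r"\w+", text):
            if m.group().lower() not in self.terms:
                continue
            row = {"label": self.label, "match": m.group(),
                   "begin": m.start(), "end": m.end()}
            # Optional row filter: keep the row only if the predicate accepts it.
            if row_filter is None or row_filter(row):
                rows.append({name: row[name] for name in columns})
        return rows


if __name__ == "__main__":
    brands = DictionaryExtractor("Brand", ["iphone"])
    brands.add_term("Galaxy")
    print(brands.extract("I had an iphone, want a Galaxy now"))
```

Passing `columns=("match", "begin")` or a `row_filter` such as `lambda row: row["begin"] > 10` changes the shape of the output without touching the matching logic.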
  35. 35. Spreadsheet-style analysis (BigSheets)
     • Web-based analysis and visualization
     • Spreadsheet-like interface
       – Explore, manipulate data without writing code
       – Invoke pre-built functions
       – Generate charts
       – Export results of analysis
       – Create custom plug-ins
       – . . .
  37. 37. What is Big R? “End-to-end integration of R-Project with BigInsights”
     • Diagram: R clients, scalable statistics engine, data sources, embedded R execution, R packages. Pull data (summaries) to the R client, or push R functions right on the data
     • 1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm (no MapReduce code)
     • 2. Scale out R
       – Partitioning of large data (“divide”)
       – Parallel cluster execution of pushed-down R code (“conquer”)
       – All of this from within the R environment (Jaql and MapReduce are hidden from you)
       – Almost any R package can run in this environment
     • 3. Scalable machine learning
       – A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
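
The "divide" and "conquer" steps described for Big R follow a general partition, apply, combine pattern. The sketch below shows that pattern in Python rather than R, to stay consistent with the other examples here; it is not Big R's API, and `partition`, `group_apply` and the sample flight rows are hypothetical. A thread pool stands in for parallel execution across a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def partition(rows, key):
    """Divide: split the rows into groups that share the same key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def group_apply(rows, key, func, max_workers=4):
    """Conquer: run func on every partition concurrently, then combine."""
    groups = partition(rows, key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


def average_delay(rows):
    """User-supplied function pushed down to each partition."""
    return mean([row["delay"] for row in rows])


if __name__ == "__main__":
    flights = [
        {"carrier": "AA", "delay": 10},
        {"carrier": "AA", "delay": 20},
        {"carrier": "UA", "delay": 5},
    ]
    print(group_apply(flights, "carrier", average_delay))
```

Only the small per-partition results travel back to the caller, which corresponds to the slide's distinction between pulling summaries to the client and pushing functions to the data.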
  39. 39. Limited use license: IBM Streams
     • Platform for real-time Big Data analytics on “data in motion”
       – Gigabytes per second or more
       – Terabytes per day or more
       – All kinds of data
       – Insights in microseconds
     • Connectivity to varied data sources: sensor, video, audio, text, Hadoop and relational
     • Diagram highlights: millions of events per second; microsecond latency; just-in-time decisions; powerful analytics; persist to BigInsights, …
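
Analytics on data in motion typically work on a bounded window over an unbounded stream rather than on stored data. The sketch below shows that idea with Python generators; it is not IBM Streams code, and `sliding_average`, `alerts` and the sample readings are assumptions for illustration.

```python
from collections import deque


def sliding_average(events, window_size):
    """Yield the average of the most recent window_size values per event."""
    window = deque(maxlen=window_size)  # old values fall out automatically
    for value in events:
        window.append(value)
        yield sum(window) / len(window)


def alerts(events, window_size, threshold):
    """Yield (event index, average) whenever the windowed average is too high."""
    for index, average in enumerate(sliding_average(events, window_size)):
        if average > threshold:
            yield index, average


if __name__ == "__main__":
    readings = [1, 2, 3, 4]  # stands in for an endless sensor feed
    print(list(alerts(readings, window_size=2, threshold=2.0)))
```

Because both functions are generators, each event is processed as it arrives and only the current window is held in memory, which is the property that lets stream processing keep up with high event rates.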
  40. 40. Limited use license: Cognos BI
     • Model, explore, analyze data from many sources
     • Visualize and report on results
     • Connection to BigInsights via Big SQL
     • In-memory dynamic views cache data in Cognos for quick data access
     • Part of IBM BigInsights for Apache Hadoop
     • Demo: https://www.youtube.com/watch?v=yxnoGrK6PSY
  41. 41. Thinking cloud? Think IBM!
     • Lower skill + less cost equals better economics, lower risk of failure, and faster innovation
     • Buy only what you need. Start small and grow.
  42. 42. IBM BigInsights on cloud
     • Build
       – Ready-to-run Hadoop clusters in the cloud
       – IBM Open Platform: 100% open source Hadoop; will align with ODP
       – Based on proven, performant reference architectures
     • Manage
       – Key platform components monitored for availability
       – Hadoop, OS and BigInsights patched and maintained
       – Ambari cluster manager for complete control
     • Support
       – 24x7 cloud operations and support team
       – Access to deep Hadoop expertise
       – Faster time to problem resolution
     • Protect
       – Deployed in world-class, secure SoftLayer data centers
       – Dedicated physical machines
       – Certified SSAE SOC2 Type 1, ISO 27001
     • http://www.ibm.com/cloud and http://www.bluemix.net
  43. 43. Summary and Fast Start
  44. 44. IBM investing heavily in Big Data and analytics
     • $24B: investment in both organic development and 30+ acquisitions
     • $100M: announced investment in IBM Interactive Experience, creating 10 new labs worldwide
     • 9 Analytics Solution Centers
     • 1,000 universities: developing curriculum and training for analytics
     • $1B: to bring cognitive services and applications to market
  45. 45. Spark investments: community, core, and consumption
     • Core: accelerating Spark capabilities
     • Community: growing Spark knowledge & expertise
     • Consumption: using Spark within IBM & partner products
     • Items shown: Spark Technology Center; Big Data University; SystemML open source contribution; Spark stand-alone; Hadoop distribution; IBM portfolio; 30+ research initiatives; 3500+ IBM developers and researchers
  46. 46. The bottom line about IBM and Big Data
     • Big Data is a strategic initiative for IBM
       – Significant investments across software, hardware and services
     • BigInsights
       – Enables firms to exploit the growing variety, velocity, and volume of data
       – Delivers a diverse range of analytics
       – Leverages and extends open source
       – Provides enterprise-class features and supporting services
       – Complements existing software investments and commercial offerings
     • IBM advantage
       – Full solution spanning software, hardware & services
       – Rapid technology advances through partnerships with IBM Research
       – Global reach
  47. 47. Jump start your efforts with IBM Analytics Stampede: leading the charge for your analytics success
     • IBM’s Expertise: takes the guesswork out and delivers savings in time and cost for your early enablement and success
     • IBM’s Analytics Solution: provides unmatched capabilities for processing and analyzing all types of data
     • Skills & Knowledge Transfer: ensures knowledge transfer and a training roadmap for skills enablement in your organization for new analytics requirements
     • Diagram labels (Stampede, time to insights): research, product selection, services, solution success, knowledge transfer, analytics prototypes, BVA / roadmaps, standard roadmap, IBM expertise, use case selection, skills & knowledge
     • https://www-01.ibm.com/software/data/services/stampede.html
  48. 48. Want to learn more?
     • Download the Quick Start offering
     • Follow tutorials, videos, and more
     • Links are all available from HadoopDev: https://developer.ibm.com/hadoop/