Big Data: Introducing InfoSphere BigInsights, IBM's Hadoop-based analytical platform

Technical introduction to IBM's InfoSphere BigInsights platform for managing and analyzing Big Data. Updated July 2014 for BigInsights 3.0.


Transcript

  • 1. Introducing IBM’s InfoSphere BigInsights. Cynthia M. Saracco, Senior Solution Architect, IBM Silicon Valley Lab
  • 2. IBM Big Data Platform Strategy (© 2013 IBM Corporation)
    Analytic Applications: BI / Reporting, Exploration / Visualization, Industry App, Predictive Analytics, Content Analytics
    IBM Big Data Platform: Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse
    • Integrate and manage the full range of Big Data
    • Apply advanced analytics
    • Explore and visualize data for ad hoc analysis
    • Speed development of new analytic applications
    • Provide high levels of performance and scalability
    • Integrate with enterprise software
    . . . .
  • 3. BigInsights Brings Hadoop to the Enterprise
    BigInsights = analytical platform for persistent Big Data
    – Based on open source & IBM technologies
    – Deep customer engagements, product plan flexibility
    Distinguishing characteristics
    – Built-in analytics: enhances business knowledge
    – Enterprise software integration: complements and extends existing capabilities
    – Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value; simplifies development and maintenance
    IBM advantage
    – Combination of software, hardware, services and advanced research
  • 4. From Getting Started to Enterprise Deployment: Different BigInsights Editions For Varying Needs
    Chart axes: breadth of capabilities, enterprise class. All editions build on Apache Hadoop.
    Quick Start: Free. Non-production. Same features as Standard Edition plus text analytics and Big R
    Standard Edition
    – Spreadsheet-style tool
    – Web console
    – Dashboards
    – Pre-built applications
    – Eclipse tooling
    – RDBMS connectivity
    – Big SQL
    – Monitoring and alerts
    – Platform enhancements
    – . . .
    Enterprise Edition
    – Accelerators
    – GPFS – FPO
    – Adaptive MapReduce
    – Text analytics
    – Enterprise Integration
    – Big R
    – InfoSphere Streams*
    – Watson Explorer*
    – Cognos BI*
    – Data Click*
    – . . .
    * Limited use license
  • 5. BigInsights Content
    Function | Version | Open Source | Enterprise Edition
    Integrated Install |  | Inc | Inc
    Hadoop (including common utilities, HDFS, MapReduce v1) | 2.2 | Inc | Inc
    Pig (programming / query language) | 0.12.0 | Inc | Inc
    Flume (data collection/aggregation) | 1.3.1 | Inc | Inc
    Hive (data summarization/querying) | 0.12.0 | Inc | Inc
    Lucene (text search) | 4.7.0 | Inc | Inc
    Solr (enterprise search based on Lucene) | 4.7.0 | Inc | Inc
    Zookeeper (process coordination) | 3.4.5 | Inc | Inc
    Avro (data serialization) | 1.7.4 | Inc | Inc
    HBase (real time read/write) | 0.96.0 | Inc | Inc
    Sqoop (RDBMS bulk data transfer) | 1.4.3 | Inc | Inc
  • 6. BigInsights Content (cont’d)
    Function | Open Source | Enterprise Edition
    Big SQL (standard SQL query support, JDBC/ODBC drivers, LOAD from RDBMSs, etc.) | n/a | Inc
    Integration with Netezza, DB2 LUW with DPF from Jaql | n/a | Inc
    Big R (support for Project R statistics and visualization) | n/a | Inc
    LDAP authentication, Kerberos authentication, Guardium support, etc. | n/a | Inc
    Web console with admin facilities, application catalog, etc. | n/a | Inc
    Business process accelerators (social data, machine data analytics) | n/a | Inc
    Platform enhancements (GPFS-FPO, Adaptive MapReduce, efficient processing of compressed text files, flexible job scheduler, high availability, monitoring and alerts, etc.) | n/a | Inc
    Text analytics | n/a | Inc
    Eclipse tools for text analytic development, Jaql, Hive, Java, Big SQL | n/a | Inc
    Applications for data import/export, social media, ad hoc query, etc. | n/a | Inc
    Spreadsheet-like analytical tool | n/a | Inc
    IBM support | n/a | Inc
    Streams, Watson Explorer, Data Click, Cognos BI (limited use licenses) | n/a | Inc
    Unlimited storage | n/a | Inc
  • 7. A Closer Look at BigInsights . . . .
  • 8. Web Installation Tool
    – Seamless process for single node and cluster environments
    – Integrated installation of all selected components
    – Post-install validation of IBM and open source components
    Get up and running quickly! No need to iteratively download, configure, and test multiple open source projects and prerequisite software.
  • 9. Integrated Web Console
    Manage BigInsights
    – Inspect / monitor system health
    – Add / drop nodes
    – Start / stop services
    – Run / monitor jobs (applications)
    – Explore / modify file system
    – Create custom dashboards
    – . . .
    Launch applications
    – Spreadsheet-like analysis tool
    – Pre-built applications (IBM supplied or user developed)
    Publish applications
    Monitor cluster, applications, data, etc.
  • 10. Spreadsheet-style Analysis
    Web-based analysis and visualization
    Spreadsheet-like interface
    – Define and manage long running data collection jobs
    – Analyze content of the text on the pages that have been retrieved
  • 11. Big Data Application Ecosystem
    Diagram labels: Eclipse, App library, Big Data Web Console, App Development, Publish
    App Development (MapReduce, Text Analytics, Query)
    • Code application program, and generate associated App
    • Deploy Apps to Enterprise Manager
    Data integration scenario: pre-defined work flows simplify loading data from various sources
    • Work flows can be configured, deployed, executed and scheduled
    Development tooling:
    • Text analytics
    • MapReduce
    • Query languages
    • . . .
    Application scenarios (web log, email, social media, ...):
    • Samples provide starting point, speed time to value
  • 12. Pre-built Applications
    20+ software samples based on common customer needs
    – Useful as a starting point for various applications
    – Accessible through Web console
    Available assets
    – Data movement
      • From relational DBMS, files, REST-based sources
      • To relational DBMS, files
    – Web crawler, social media data collectors, etc.
    – Ad hoc query
    – Monitoring
    – Data sampling and subsetting
    – TeraGen-TeraSort, WordCount sample applications
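For readers unfamiliar with the WordCount sample named on slide 12, the MapReduce pattern behind it can be sketched in plain Python. This is a single-process illustration only, not the BigInsights sample application or any Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, standing in for the framework's shuffle step."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: list[int]) -> tuple[str, int]:
    """Sum the per-occurrence counts for one word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

On a cluster the map and reduce calls would run in parallel on different nodes; here they run in order so the data flow is easy to follow.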
  • 13. Running Applications from the Web Console
  • 14. Chaining Applications (Drag-and-Drop)
  • 15. Building a Big Data program – Big SQL example
    BigInsights plug-in: Java MapReduce, Big SQL, Jaql, Hive, Pig, text analytics, etc.
  • 16. Visualizing Results through Dashboards
    • Built-in dashboards for monitoring system health, application status, distributed file system, etc.
    • Easy to customize: add, group, or remove widgets for:
      – BigSheets collections and charts
      – Cluster/system monitoring
      – HDFS monitoring
      – MapReduce metrics
    • Third party Widgets or Open Social Gadgets can be added to a dashboard
    • Create new, custom dashboards to suit your needs!
  • 17. Big SQL 3.0 (11-Apr-2014)
  • 18. BigInsights and Text Analytics
    • Distills structured info from unstructured text
      – Sentiment analysis
      – Consumer behavior
      – Illegal or suspicious activities
    • Parses text and detects meaning with annotators
    • Understands the context in which the text is analyzed
    • Features pre-built extractors for names, addresses, phone numbers, etc.
    • Built-in support for English, Spanish, French, German, Portuguese, Dutch, Japanese, Chinese
    Unstructured text (document, email, etc): "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."
    Output: Classification and Insight
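To make "distills structured info from unstructured text" on slide 18 concrete, here is a minimal sketch that pulls player names and a match score out of the slide's sample passage. It uses a fixed name list and one regular expression in plain Python. It is not the BigInsights text analytics engine or its extractor language, and `KNOWN_PLAYERS`, `SCORE_PATTERN`, and `extract` are names assumed for illustration.

```python
import re

# Hypothetical dictionary of known names; a production extractor would rely
# on much richer linguistic rules than a fixed list.
KNOWN_PLAYERS = ("Arjen Robben", "Iker Casillas", "Andres Iniesta")

# Matches scores written as "<digits>-<digits>", for example "1-0".
SCORE_PATTERN = re.compile(r"\b(\d+)-(\d+)\b")

SAMPLE = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the second "
    "half, Netherlands' striker, Arjen Robben, had a breakaway, but the "
    "keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta "
    "scored for Spain for the win."
)


def extract(text: str) -> dict:
    """Return structured fields found in free text."""
    players = [name for name in KNOWN_PLAYERS if name in text]
    scores = [(int(a), int(b)) for a, b in SCORE_PATTERN.findall(text)]
    return {"players": players, "scores": scores}
```

Calling `extract(SAMPLE)` yields a small record (a list of player names and a list of score tuples) that could be loaded into a table, which is the general shape of output the slide describes as "Classification and Insight".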
  • 19. Text Analytics Lifecycle
  • 20. Big R: “End-to-end integration of R into IBM BigInsights”
    Diagram labels: R Clients, Scalable Statistics Engine, Data Sources, Embedded R Execution, R Packages
    1. Explore, visualize, transform, and model big data using familiar R syntax and paradigm
    2. Scale out R
      • Partitioning of large data (“divide”)
      • Parallel cluster execution of pushed down R code (“conquer”)
      • All of this from within the R environment (Jaql, Map/Reduce are hidden from you)
      • Almost any R package can run in this environment
    3. Scalable machine learning
      • A scalable statistics engine that provides canned algorithms, and an ability to author new ones, all via R
    Pull data (summaries) to R client. Or, push R functions right on the data
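The "divide" and "conquer" steps on slide 20 can be illustrated with a small partition-then-combine sketch. It is written in Python rather than R, uses local threads in place of cluster nodes, and does not use any Big R function; `partition`, `partial_stats`, and `distributed_mean` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence


def partition(values: Sequence[float], n_parts: int) -> list[Sequence[float]]:
    """Divide: split the data into at most n_parts contiguous chunks."""
    if not values or n_parts < 1:
        raise ValueError("need a non-empty sequence and n_parts >= 1")
    size = -(-len(values) // n_parts)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_stats(chunk: Sequence[float]) -> tuple[float, int]:
    """Function pushed to each partition: returns a small summary only."""
    return sum(chunk), len(chunk)


def distributed_mean(values: Sequence[float], n_parts: int = 4) -> float:
    """Conquer: run partial_stats per chunk in parallel, then combine."""
    chunks = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Only the per-chunk summaries travel back to the caller, which mirrors the slide's contrast between pulling summaries to the client and pushing functions to the data.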
  • 21. Application Accelerators: Quickly build, deploy custom applications in high-value areas
    IBM Accelerator for Telco Event Data Analytics
    • Telcos
    • Campaign management, real-time promotion, fraud detection, service assurance and network monitoring
    • Ships with Streams v3, but works with BigInsights or PureData for Analytics (a.k.a. Netezza)
    IBM Accelerator for Social Data Analytics
    • B2C businesses
    • Sample applications: Customer acquisition / retention, Customer Segmentation or Micro Segmentation, Marketing Campaign Optimization, Lead generation, Brand Management or Surveillance
    • Ships with BigInsights v2 and Streams v3
    IBM Accelerator for Machine Data Analytics
    • Cross-industry: manufacturing, oil & gas, energy and utility, healthcare, travel and transportation, CPG, Retail, etc.
    • Operational efficiency monitoring, security incident investigation, proactive maintenance, troubleshooting, outage prevention, efficiency tracking, etc.
    • Ships with BigInsights v2
  • 22. Adaptive MapReduce (Platform Symphony) option
    Other Grid Server (timeline across Client, Broker, Engines)
    – Serialize input data
    – Network transport (client to broker)
    – Wait for engine to poll broker
    – Network transport (broker to engine)
    – De-serialize input data
    – Compute result
    – Serialize result
    – Post result back to broker
    Each engine polls broker ~5 times per second (configurable). Send work when engine ready
    Platform Symphony (timeline across Client, SSM, Engines; compute time & logging)
    – Serialize input
    – Network transport
    – Network transport (SSM to engine)
    – De-serialize
    – Compute result
    – Serialize
    – Network transport (engine to SSM)
    No wait time due to polling, faster serialization/de-serialization, more network efficient protocol
    Platform Symphony advantages:
    • Efficient C language routines use CDR (common data representation) and IOCP rather than slow, heavy-weight XML data encoding
    • Network transit time is reduced by avoiding text based HTTP protocol and encoding data in more compact CDR binary format
    • Processing time for all Symphony services is reduced by using a native HPC C/C++ implementation for system services rather than Java
    • Platform Symphony has a more efficient “push model” that avoids entirely the architectural problems with polling
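The cost of the polling model on slide 22 can be estimated with simple arithmetic. The sketch below takes the slide's figure of about 5 polls per second and assumes (this is not stated in the slides) that work arrives at a uniformly random point in the poll cycle, so the mean wait is half the poll interval. The 100 ms task duration in the usage note is a hypothetical value.

```python
def poll_interval_ms(polls_per_second: float) -> float:
    """Time between two consecutive polls, in milliseconds."""
    return 1000.0 / polls_per_second


def expected_poll_wait_ms(polls_per_second: float) -> float:
    """Mean wait before pickup, assuming uniformly random arrival."""
    return poll_interval_ms(polls_per_second) / 2.0


def polling_overhead_fraction(task_ms: float, polls_per_second: float) -> float:
    """Share of (wait + compute) time that is spent waiting for a poll."""
    wait = expected_poll_wait_ms(polls_per_second)
    return wait / (wait + task_ms)
```

Under these assumptions, 5 polls per second gives a 200 ms interval and a 100 ms mean wait, so a task that computes for 100 ms spends about half of its elapsed time waiting. A push model removes that term, which matters most for short tasks.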
  • 23. GPFS – FPO
    • File system alternative to HDFS. Optional.
    • Key features
      – No single point of failure
      – Built-in High Availability
      – POSIX compliance
      – Enhanced Security with ACL support
      – Support for Storage Pools
      – SnapShot capability
  • 24. InfoSphere Data Click: self-service data integration on-demand
    • Broad connectivity: traditional and big data sources
    • Simple end-to-end experience
    • Web-based configuration
  • 25. Growing Ecosystem of Solutions
    IBM Solutions: Platform Symphony, Cognos Consumer Insight
    Partner Solutions
    . . . with more to come
  • 26. BigInsights and the data warehouse
    Diagram labels: BigInsights (Filter, Transform, Aggregate), Data warehouse, Big Data analytic applications, Traditional analytic tools
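The Filter, Transform, Aggregate labels on slide 26 describe a common preprocessing pattern: reduce raw data to a compact summary before it reaches the warehouse. A minimal Python sketch of that pattern follows. The log format, the `RAW_LOG` records, and the function names are all hypothetical, and nothing here uses a BigInsights or warehouse API.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical raw web-log lines in the form "timestamp status path".
RAW_LOG = [
    "2014-07-01T10:00:00 200 /products/1",
    "2014-07-01T10:00:05 404 /missing",
    "2014-07-01T10:01:00 200 /products/2",
    "malformed line",
    "2014-07-02T09:30:00 200 /products/1",
]


def parse(line: str) -> Optional[dict]:
    """Transform one raw line into a record, or None if it is malformed."""
    parts = line.split()
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    timestamp, status, path = parts
    return {"day": timestamp[:10], "status": int(status), "path": path}


def summarize(lines: Iterable[str]) -> Counter:
    """Filter to successful requests, then aggregate hits per (day, path)."""
    records = (parse(line) for line in lines)
    ok = (r for r in records if r is not None and r["status"] == 200)
    return Counter((r["day"], r["path"]) for r in ok)
```

`summarize(RAW_LOG)` returns one count per day and path, a much smaller result than the raw log and the kind of output that would be suitable for loading into a warehouse table.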
  • 27. BigInsights and the data warehouse
    BigInsights: query-ready platform for “cold” warehouse data
    Diagram labels: Data Warehouse, Big Data analytic applications, Traditional analytic tools
  • 28. BigInsights: Value Beyond Open Source
    Diagram labels: Enterprise Capabilities (Administration & Security, Workload Optimization, Connectors, Advanced Engines, Visualization & Exploration, Development Tools); Open source components (IBM-certified Apache Hadoop and related projects)
    Key differentiators
    • Built-in text analytics
    • Enterprise software integration
    • SQL support
    • Spreadsheet-style analysis
    • Integrated installation of supported open source and other components
    • Web Console for admin and application access
    • Platform enrichment: additional security, performance features, GPFS (alternative file system), . . .
    • World-class support
    • Full open source compatibility
    Business benefits
    • Quicker time-to-value due to IBM technology and support
    • Reduced operational risk
    • Enhanced business knowledge with flexible analytical platform
    • Leverages and complements existing software
  • 29. Want to learn more?
    Download Quick Start Edition
    Test drive the technologies
    – Follow online tutorials
    – Enroll in online classes
    – Watch video demos, read articles, etc.
    Links all available from HadoopDev
    – https://developer.ibm.com/hadoop/