InfoSphere BigInsights
Analytics power for Hadoop – field experience
Wilfried Hoge
IT Architect Big Data
@wilfriedhoge
Stephan Reimann
IT Specialist Big Data
@stereimann
© 2015 International Business Machines Corporation 2
IBM BigInsights – Open Source and IBM Value Adds
•  Real-time Analytics: InfoSphere Streams
•  Enterprise Performance: Adaptive MapReduce & Big SQL
•  Storage Integration: GPFS POSIX distributed filesystem
•  Data Governance and Security: Data Click, LDAP and secured cluster
•  Search: BigIndex and Data Explorer
•  Data Exploration: BigSheets “schema-on-read” tooling
•  Predictive Modeling: Big R, “scalable data mining” on R
•  Text Analytics: text processing with AQL
•  Application Tooling: toolkits and accelerators
•  ANSI SQL: Big SQL, optimized SQL support
100% based on Apache open source Hadoop components: MapReduce, HDFS, HBase, Flume, Pig, Lucene, Jaql, ZooKeeper, Oozie, Hive, Sqoop, HCatalog
Key Differentiators for BigInsights
Enterprise Performance & Integration
•  Workload / performance optimization
•  GPFS
•  Security
•  Key integrations & connectors with the enterprise ecosystem
Analytics
•  Text analytics
•  Social Data Analytics Accelerators
•  Machine Data Analytics Accelerators
•  Execute R in an integrated application
Usability & Productivity
•  Big SQL
•  BigSheets
•  Development Tools
•  Web Console
Field experience – analyzing binary data
The challenge
•  Use case
– Enable users to analyze data that is provided in binary format without having to run scripts
•  Challenges
– Binary-to-csv transformation
– Direct access to the csv data on HDFS to analyze its content
– SQL access to the csv data from BI tools
– Analysis of the data by technical business users
– Flexible automation capabilities (scheduling)
Field experience – analyzing binary data
The binary file – direct analysis not possible
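Converting such a file means decoding fixed-layout records and writing one csv row per record. The sketch below illustrates this in Python with the standard struct and csv modules. The record layout (little-endian uint32 site number, uint16 head number, float64 value) and the names RECORD, HEADER, and binary_to_csv are hypothetical: the customer's real format is not described, and in the project the decoding was done by customer-provided Java classes.

```python
import csv
import io
import struct

# Hypothetical record layout for illustration only (the real format is not shown):
# little-endian uint32 site number, uint16 head number, float64 value = 14 bytes.
RECORD = struct.Struct("<IHd")
HEADER = ["site_num", "head_num", "value"]


def binary_to_csv(data: bytes, out) -> int:
    """Decode fixed-size binary records and write them as csv rows to `out`.

    Returns the number of records written. Raises ValueError if the input
    ends with a truncated record.
    """
    if len(data) % RECORD.size != 0:
        raise ValueError(
            f"{len(data)} bytes is not a multiple of the {RECORD.size}-byte record size"
        )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    count = 0
    for site_num, head_num, value in RECORD.iter_unpack(data):
        writer.writerow([site_num, head_num, value])
        count += 1
    return count


if __name__ == "__main__":
    sample = RECORD.pack(1, 255, 0.5) + RECORD.pack(2, 3, 1.25)
    buffer = io.StringIO()
    print(binary_to_csv(sample, buffer))
    print(buffer.getvalue(), end="")
```

Checking the length up front makes a truncated file fail loudly instead of silently dropping its last bytes.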
Running Applications on Big Data
•  Browse available applications
•  Deploy published applications (administrators only)
•  Launch (or schedule for launch) a deployed application
•  Monitor job (application) execution status
•  Predefined applications
– Import & Export Data: Database & Files; Web and Social
– Analyze and Query: Predictive Analytics; Text Analytics; SQL/Hive, Jaql, Pig, HBase
– Accelerators
Tools for Developers
Editors
•  A workflow editor that greatly simplifies the creation of complex Oozie workflows through a consumable interface
•  A Pig/Jaql editor with content assist and syntax highlighting that enables users to create and execute new Pig or Jaql applications in local or cluster mode from the Eclipse IDE
Application development & deployment
•  Enablement of BigSheets macro and BigSheets reader development
•  Text Analytics development, including support for modular rule sets
•  Publishing of new applications: BigSheets macro, BigSheets reader, AQL module, Jaql module
1. Sample your data
2. Develop your application using BigInsights tools
3. Test your application
4. Package and publish your application
5. Deploy your application on the cluster
Field experience – analyzing binary data
Developing and publishing a transformation application
Field experience – analyzing binary data
The transformation application – user can convert binary data to csv
BigSheets to analyze and visualize
•  Model “big data” collected from various sources in spreadsheet-like structures
•  Filter and enrich content with built-in functions
•  Combine data in different workbooks
•  Visualize results through spreadsheets and charts
•  Export data into common formats (if desired)
No programming knowledge needed!
Field experience – analyzing binary data
The csv file – BigSheets offers easy analysis
Field experience – analyzing binary data
An analytical result with BigSheets
Field experience – analyzing binary data
The loader application – create tables for analysis
Big SQL 3.0 – Architected for Performance
•  Leverage IBM's rich SQL heritage, expertise, and technology
–  Modern SQL:2011 capabilities
–  DB2 compatible SQL PL support
•  SQL bodied functions and stored procedures
•  Application logic/security encapsulation
•  Architected from the ground up for performance
–  Low latency and high throughput
•  MapReduce replaced with a modern MPP architecture
–  Compiler and runtime are native code (not Java)
–  Big SQL worker daemons live directly on the cluster
–  Continuously running (no startup latency)
–  Processing happens locally at the data
•  Operations occur in memory with the ability to spill to disk
–  Supports aggregations and sorts larger than available RAM
•  Integration with BigSheets (source & target)
[Diagram: a SQL-based application connects through the IBM Data Server Client to Big SQL, whose SQL MPP runtime in InfoSphere BigInsights reads data sources in Parquet, CSV, Seq, RC, Avro, ORC, JSON, and custom formats]
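Sorting more data than fits in RAM generally relies on an external merge sort: sort memory-sized chunks, spill each chunk to disk as a run, then stream-merge the runs. The Python sketch below illustrates this general spill-to-disk technique. It is not Big SQL's implementation, and the function names and the max_in_memory parameter are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill_run(sorted_chunk: List[int]) -> str:
    """Write one sorted run to a temporary file (one integer per line); return its path."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{x}\n" for x in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a spilled run back from disk, one value at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], max_in_memory: int) -> Iterator[int]:
    """Yield `values` in ascending order while buffering at most `max_in_memory` items.

    Full buffers are sorted and spilled to disk as runs; the runs are then
    combined with a k-way heap merge that holds one value per run in memory.
    """
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    runs: List[str] = []
    buffer: List[int] = []
    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            runs.append(_spill_run(sorted(buffer)))
            buffer = []
    if buffer:
        runs.append(_spill_run(sorted(buffer)))
    readers = [_read_run(p) for p in runs]
    try:
        yield from heapq.merge(*readers)
    finally:
        for reader in readers:
            reader.close()  # closes the run file even if the caller stops early
        for p in runs:
            os.remove(p)
```

Because the result is produced as a generator, the merged output never has to be materialized in memory either.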
Big SQL 3.0 – Architecture cont.
•  Head (coordinator / management) node
–  Listens for JDBC/ODBC connections and compiles / optimizes the query
–  Coordinates the execution of the query
–  Can optionally store user data in a traditional RDBMS table (single node only)
•  Big SQL worker processes reside on compute nodes (some or all)
•  Worker nodes stream data between each other as needed
•  Workers can spill large data sets to local disk if needed
–  Allows Big SQL to work with data sets larger than available memory
[Diagram: management nodes host Big SQL, the Hive Metastore, the Name Node, and the Job Tracker; each compute node runs a Task Tracker, a Data Node, and a Big SQL worker; all nodes sit on GPFS/HDFS]
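One common way for workers that "stream data between each other" to evaluate a GROUP BY is a two-phase aggregation. Each worker first pre-aggregates its local rows. Partial results are then hash-partitioned so that every key lands on exactly one owning worker, and that worker finishes it. The Python sketch below simulates this pattern sequentially. The function names and the CRC32 partitioning are assumptions for illustration, not Big SQL's actual protocol.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (grouping key, value) as stored on one worker node


def local_partial_sums(rows: List[Row]) -> Dict[str, int]:
    """Phase 1: a worker pre-aggregates the rows stored on its own node."""
    partial: Dict[str, int] = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def owner(key: str, num_workers: int) -> int:
    """Deterministic hash partitioning: every worker agrees which worker owns a key."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def distributed_sum(node_rows: List[List[Row]]) -> Dict[str, int]:
    """Simulate SELECT key, SUM(value) ... GROUP BY key across len(node_rows) workers."""
    num_workers = len(node_rows)
    partials = [local_partial_sums(rows) for rows in node_rows]
    # Phase 2: "stream" each partial result to the worker that owns its key.
    inboxes: List[List[Row]] = [[] for _ in range(num_workers)]
    for partial in partials:
        for key, subtotal in partial.items():
            inboxes[owner(key, num_workers)].append((key, subtotal))
    # Phase 3: each owner finishes its keys; the coordinator concatenates the disjoint results.
    result: Dict[str, int] = {}
    for inbox in inboxes:
        result.update(local_partial_sums(inbox))
    return result
```

Pre-aggregating locally means only one subtotal per key and node crosses the network, rather than every row.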
Big SQL 3.0 – Features
Application Portability & Integration
•  Data shared with Hadoop ecosystem
•  Comprehensive file format support
•  Superior enablement of IBM software
•  Enhanced by third-party software
Performance
•  Modern MPP runtime
•  Powerful SQL query rewriter
•  Cost-based optimizer
•  Optimized for concurrent user throughput
•  Results not constrained by memory
Federation
•  Distributed requests to multiple data sources within a single SQL statement
•  Main data sources supported: DB2 LUW, DB2/z, Teradata, Oracle, Netezza
Enterprise Features
•  Advanced security/auditing
•  Resource and workload management
•  Self-tuning memory management
•  Comprehensive monitoring
Rich SQL
•  Comprehensive SQL support
•  IBM SQL PL compatibility
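Federation means that one SQL statement can span tables held by different data sources. As a local, single-engine analogy only, SQLite's ATTACH DATABASE lets one statement join tables from two independent databases. The sketch below uses Python's built-in sqlite3 module. Big SQL federation to DB2, Teradata, Oracle, or Netezza is configured differently and is not shown. The table names and federated_totals are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def federated_totals(customers: List[Tuple[int, str]],
                     orders: List[Tuple[int, float]]) -> List[Tuple[str, float]]:
    """Join tables that live in two separate databases with a single SQL statement."""
    con = sqlite3.connect(":memory:")                    # first data source ("main")
    con.execute("ATTACH DATABASE ':memory:' AS remote")  # second, independent data source
    con.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE remote.orders (customer_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO main.customers VALUES (?, ?)", customers)
    con.executemany("INSERT INTO remote.orders VALUES (?, ?)", orders)
    rows = con.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM main.customers AS c
        JOIN remote.orders AS o ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    con.close()
    return rows
```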
Field experience – analyzing binary data
Run complex SQL on generated tables
INSERT INTO Sites
  (Counter, Tested, Site1, Site_num1, Number_of_xxxx_tested,
   XA1, Percentage_of_xxxx_per_yyyy,
   Counter_plus_one, Pass, Site2, Site_num2, Number_of_pass_xxxx,
   ZB2, xxxxx_of_site_num, xxxx_file_name)
SELECT 12000 + ROW_NUMBER() OVER () * 10, 'Tested', 'Site', tab1.site_num,
  (SELECT SUM(tab2.piece_count) FROM tab2
    WHERE tab2.site_num = tab1.site_num) AS num_xxxx_tested,
  'PA', (SELECT SUM(tab2.piece_count) FROM tab2
    WHERE tab2.site_num = tab1.site_num AND tab2.head_num = 255),
  34000 + ROW_NUMBER() OVER () * 10 + 1, 'Pass', 'Site', tab1.site_num,
  -- count the passed pieces of this site (filter on the inner alias tab12)
  (SELECT COUNT(*) FROM tab1 AS tab12
    WHERE tab12.site_num = tab1.site_num AND tab12.piece_Flg = 0) AS num_xxxx_passed,
  'PA',
  -- NULLIF(x, 0) yields NULL instead of a division-by-zero error when no piece passed
  (SELECT SUM(tab2.piece_count) FROM tab2
    WHERE tab2.site_num = tab1.site_num)
  / NULLIF((SELECT COUNT(*) FROM tab1 AS tab12
    WHERE tab12.site_num = tab1.site_num AND tab12.piece_Flg = 0), 0),
  tab1.xxxx_file_name
FROM tab1 AS tab1, tab2 AS tab2
GROUP BY tab1.site_num, tab1.piece_Flg, tab1.xxxx_file_name;
Highlights: the ROW_NUMBER() rank function and correlated subselects
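Two details of this query can be checked on a small scale: what NULLIF actually guards against, and how a correlated subselect computes per-site counts. The sketch below uses Python's built-in sqlite3 module with hypothetical toy tables that only mimic the column names above. It shows that NULLIF(0.001, n) simply returns 0.001 for any n other than 0.001, while NULLIF(n, 0) turns a zero divisor into NULL. SQLite already returns NULL on division by zero. DB2-compatible engines typically raise an error instead, which is where the guard matters.

```python
import sqlite3
from typing import List, Tuple


def nullif_demo() -> Tuple:
    """Show why NULLIF(0.001, n) is not a divide-by-zero guard but NULLIF(n, 0) is."""
    con = sqlite3.connect(":memory:")
    row = con.execute("SELECT NULLIF(0.001, 5), NULLIF(5, 0), NULLIF(0, 0)").fetchone()
    con.close()
    return row


def ratio_per_site(tab1_rows: List[Tuple[int, int]],
                   tab2_rows: List[Tuple[int, int]]) -> List[Tuple]:
    """Correlated subselects per site, with NULLIF(count, 0) guarding the division."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab1 (site_num INTEGER, piece_flg INTEGER)")
    con.execute("CREATE TABLE tab2 (site_num INTEGER, piece_count INTEGER)")
    con.executemany("INSERT INTO tab1 VALUES (?, ?)", tab1_rows)
    con.executemany("INSERT INTO tab2 VALUES (?, ?)", tab2_rows)
    rows = con.execute(
        """
        SELECT s.site_num,
               (SELECT SUM(t2.piece_count) FROM tab2 AS t2
                 WHERE t2.site_num = s.site_num) AS tested,
               (SELECT COUNT(*) FROM tab1 AS t1
                 WHERE t1.site_num = s.site_num AND t1.piece_flg = 0) AS passed,
               1.0 * (SELECT SUM(t2.piece_count) FROM tab2 AS t2
                       WHERE t2.site_num = s.site_num)
                   / NULLIF((SELECT COUNT(*) FROM tab1 AS t1
                              WHERE t1.site_num = s.site_num AND t1.piece_flg = 0), 0) AS ratio
        FROM (SELECT DISTINCT site_num FROM tab1) AS s
        ORDER BY s.site_num
        """
    ).fetchall()
    con.close()
    return rows
```

The 1.0 factor forces a decimal result; without it, two integer operands would use integer division.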
Application linking and interfaces to build new apps
•  Compose new applications from existing applications and BigSheets
•  Invoke analytics applications from the web console, including integration within BigSheets
•  REST data source app that enables users to load data from any data source supporting REST APIs into BigInsights, including popular social media services
•  Sampling app that enables users to sample data for analysis
•  Subsetting app that enables users to subset data for analysis
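The source does not say how the Sampling App selects records. A common technique for drawing a uniform sample in one pass over data too large to hold in memory is reservoir sampling (Algorithm R). The Python sketch below is an illustration of that technique only. The name reservoir_sample and its seed parameter are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)        # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)         # inclusive: item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item       # replace a random slot
    return reservoir
```

Memory use stays at k items no matter how long the input is, so the same logic could run per input split.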
Field experience – analyzing binary data
User builds his/her own application flow
Field experience – analyzing binary data
What was achieved 1/2
– Conversion from binary to csv (Transformation App)
•  Customer-provided Java classes read the binary files and produce csv output
•  Developer embedded the Java code in a BigInsights application
•  User can provide source and target paths
•  User can provide filters if only part of the data set should be extracted
•  User can schedule the application (with parameters)
•  Application automatically has a REST interface for external scheduling
•  Application uses MapReduce to scale out when a large number of files must be transformed
– User can analyze the csv files with BigSheets
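The application's parameters (source path, target path, optional filter) and its per-file scale-out can be pictured as a map-only job: each file is converted independently and no reduce step is needed. The Python sketch below uses a thread pool in place of MapReduce tasks. The decode callable is a hypothetical stand-in for the customer's Java reader. The names convert_file and convert_directory are also hypothetical.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Union

# Hypothetical stand-in for the customer's Java reader: bytes -> rows of strings.
Decoder = Callable[[bytes], Iterable[List[str]]]
# Optional user-supplied filter: keep a row only if it returns True.
RowFilter = Callable[[List[str]], bool]


def convert_file(src: Path, target_dir: Path, decode: Decoder,
                 keep: Optional[RowFilter]) -> Path:
    """'Map' step: decode one binary file and write the kept rows to <target_dir>/<stem>.csv."""
    dst = target_dir / (src.stem + ".csv")
    with dst.open("w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for row in decode(src.read_bytes()):
            if keep is None or keep(row):
                writer.writerow(row)
    return dst


def convert_directory(source_dir: Union[str, Path], target_dir: Union[str, Path],
                      decode: Decoder, keep: Optional[RowFilter] = None,
                      workers: int = 4) -> List[Path]:
    """Convert every file in source_dir independently and in parallel."""
    src_dir, dst_dir = Path(source_dir), Path(target_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.iterdir() if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: convert_file(p, dst_dir, decode, keep), files))
```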
Field experience – analyzing binary data
What was achieved 2/2
– Create SQL tables from csv (Loader App)
•  Developer embedded the necessary SQL in the app
•  User can create tables from csv files
– User can run complex SQL on the tables with his/her preferred front-end tool
– User can combine apps and create his/her own flow
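A loader of this kind essentially turns a csv column list and an HDFS directory into table DDL. The Python sketch below generates a statement modeled on Big SQL's CREATE EXTERNAL HADOOP TABLE ... ROW FORMAT DELIMITED syntax. It assumes every column is VARCHAR(255) and that the csv files in the directory carry no header row. Please check the syntax against the reference for your Big SQL version. The helper names and the example path are hypothetical.

```python
import re
from typing import List


def sanitize_identifier(name: str) -> str:
    """Map a csv header such as 'Site Num' to a safe SQL identifier such as 'site_num'."""
    ident = re.sub(r"[^0-9A-Za-z_]", "_", name.strip())
    if not ident or ident[0].isdigit():
        ident = "c_" + ident
    return ident.lower()


def create_table_ddl(table: str, columns: List[str], location: str,
                     delimiter: str = ",") -> str:
    """Build a CREATE EXTERNAL HADOOP TABLE statement over a directory of csv files.

    Every column is typed VARCHAR(255) as a simplifying assumption.
    """
    names = [sanitize_identifier(c) for c in columns]
    if not names or len(set(names)) != len(names):
        raise ValueError("need at least one column and unique column names")
    if "'" in location or "'" in delimiter:
        raise ValueError("quotes in location/delimiter are not handled by this sketch")
    cols = ",\n  ".join(f"{n} VARCHAR(255)" for n in names)
    return (
        f"CREATE EXTERNAL HADOOP TABLE {sanitize_identifier(table)} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'"
    )
```

An external table leaves the csv files in place, so dropping the table would not delete the transformed data.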
IBM BigInsights brings efficient integration of R with Big R
•  R as a big data query language
– Outside-in execution
•  R as a statistical language for deep computing
– Inside-out execution
– Partitioning of large data (“divide”)
– Parallel cluster execution of pushed-down R code (“conquer”)
– Almost any R package can run in this environment
•  R as the gateway to scalable machine learning
– A scalable ML engine that provides canned algorithms, and an ability to author new ones, all via R
[Diagram: R clients either pull data (summaries) to the R client or push R functions right onto the data; embedded R execution with R packages and a scalable ML engine run next to the data sources]
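The “divide and conquer” pattern can be sketched in a few lines of Python: partition the rows by a key, run the same function on every partition in parallel, and collect one result per partition. This only illustrates the pattern. It is not Big R's API, the name divide_and_conquer is hypothetical, and threads stand in for cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")


def divide_and_conquer(rows: Iterable[Tuple[Hashable, float]],
                       fn: Callable[[List[float]], R],
                       workers: int = 4) -> Dict[Hashable, R]:
    """Partition (key, value) rows by key ('divide'), apply fn to each partition in parallel ('conquer')."""
    partitions: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)
    keys = list(partitions)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: fn(partitions[k]), keys))
    return dict(zip(keys, results))


if __name__ == "__main__":
    readings = [("site1", 1.0), ("site2", 4.0), ("site1", 3.0)]
    print(divide_and_conquer(readings, mean))  # one mean per site
```

Any function that works on a single partition, such as a model fit from an R package in the Big R case, can be plugged in as fn without changes.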
SystemML – Declarative high-level language
[Figure: Topic Detection in Social Media. SystemML factors a sparse documents × tokens matrix (stored as row, column, value triples such as 1 1 0.10) into W (documents × K topics) and H (K topics × tokens)]
§  Modeled after R syntax and semantics
§  Expressivity
–  Express a wide class of algorithms: descriptive statistics, linear & logistic regression, decision trees, SVM, MCMC simulation, etc.
§  Productivity
–  Enable programmer productivity: the algorithm developer does not have to worry about scalability, numeric stability, or optimizations
§  Performance and Scalability
–  Optimizer to generate low-level execution plans
•  Cost-based operator selection based on
−  data characteristics (dimensions, sparsity)
−  cluster characteristics (memory, parallelism)
•  Generation of the runtime execution plan
§  Big Data
–  Sparsity-driven data representation and operator implementations for data sets with billions of non-zero values
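Reading the topic-detection figure as non-negative matrix factorization is an assumption, because the slide shows only the W and H factors. Under that reading, the model and the classic Lee and Seung multiplicative updates for the squared-error objective would be as follows. Here n is the number of documents, m the number of tokens, and K the number of topics. ⊙ and the fraction bar denote element-wise multiplication and division.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{n \times m}, \quad
W \in \mathbb{R}_{\ge 0}^{n \times K}, \quad
H \in \mathbb{R}_{\ge 0}^{K \times m}

\min_{W \ge 0,\; H \ge 0} \; \lVert V - W H \rVert_F^2

H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H}, \qquad
W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}
```

Both updates are written as matrix products. With a sparse V, the terms W^T V and V H^T only touch the non-zero entries, which is the kind of case the sparsity-driven operators above target.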
BigInsights Certifications
ISV partner solutions (solution type: certified BigInsights version)
•  Data Integration: 2.1 (3.0 in process, 4Q)
•  Reporting: 2.1 & 3.0
•  Data Security: 2.1.2
•  Customer Analytics: 2.1.2
•  Cluster Mgt: 3.0
•  Analytics: 2.1.2 (3.0 in process)
•  Data Vis: 2.1 (3.0 in process)
•  Visual Reporting: 2.1 & 3.0
•  Data Virtualization: 2.1.2 & 3.0
•  TDHC: 3.0
•  Analytics: 2.1.2 & 3.0
•  Aster: 3.0*
•  Data Integration: 2.1 (3.0 in process, 3Q)
•  Backup & Recovery: 2.1.2
IBM products (solution type: certified BigInsights version)
•  Business Intelligence: 2.1.2 (3.0 end of Nov '14)
•  Predictive Analytics: 2.1.2 (3.0 mid-4Q)
•  InfoSphere Information Server v11.3, Data Integration: 3.0
Also listed: SPSS, v10.2.1, AS v1.0.1
* Requires Service Engagement
BigInsights ISV Partner Ecosystem
[Figure: ISV partner logos]
Get started with BigInsights
•  Hadoop Dev: links to videos, white papers, labs, and more
http://developer.ibm.com/hadoop/
•  BigInsights Trials
http://ibm.com/software/data/infosphere/hadoop/trials.html
BigInsights has a simple but effective security system based on a gateway to Hadoop
•  All Hadoop servers are connected over a private network
•  Unrestricted communication between cluster servers on the private network
•  BigInsights Web Console acts as a gateway into the cluster
•  Authentication through PAM, LDAP, or Kerberos
•  Role-based authorization
•  Authorization is enforced at 3 levels:
– UI level
– Data level
– MapReduce level
•  Authorization is also respected by services (e.g. SQL)
[Diagram: users and external sources reach the services, data nodes, infrastructure nodes, and distributed filesystem through the gateway / Web Console, which relies on an authentication authority]
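Role-based authorization enforced at several levels can be pictured as a lookup from a user's roles to (level, action) permissions. In the Python sketch below, the role names, the actions, and the Level enum are hypothetical. The source does not list BigInsights' actual roles or how the Web Console stores them.

```python
from enum import Enum
from typing import Dict, FrozenSet, Iterable, Tuple


class Level(Enum):
    """The three enforcement levels named on the slide."""
    UI = "ui"
    DATA = "data"
    MAPREDUCE = "mapreduce"


Permission = Tuple[Level, str]

# Hypothetical roles and actions, for illustration only.
ROLE_PERMISSIONS: Dict[str, FrozenSet[Permission]] = {
    "admin": frozenset({
        (Level.UI, "deploy_app"), (Level.UI, "run_app"),
        (Level.DATA, "read"), (Level.DATA, "write"),
        (Level.MAPREDUCE, "kill_job"),
    }),
    "analyst": frozenset({(Level.UI, "run_app"), (Level.DATA, "read")}),
}


def is_authorized(roles: Iterable[str], level: Level, action: str) -> bool:
    """Grant the request if any of the user's roles carries the (level, action) permission."""
    return any((level, action) in ROLE_PERMISSIONS.get(role, frozenset()) for role in roles)
```

Unknown roles map to an empty permission set, so the check denies by default.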
Manage your HDFS Files
•  Navigate the distributed file system to see what’s stored
•  Create/remove/rename directories
•  Modify permissions
•  Upload, download, remove, rename, and edit files
•  Execute Hadoop file system shell commands
About the Hadoop-DS Benchmark
§ Created by IBM
§ The Big Data Decision Support Benchmark (Hadoop-DS) is inspired by, and highly compliant with, TPC-DS
-  Fully complies with the TPC-DS schema requirement
-  Uses all 99 queries
-  Meets the multi-user requirement
-  Has been audited by a TPC-DS auditor, but as a non-TPC benchmark
§ Select deviations from TPC-DS due to Hadoop limitations:
-  No data maintenance operations, referential integrity enforcement, or ACID property validation, as these are not feasible with HDFS
-  Additional statistics used
-  Metric adjustments
-  No price/performance measures included
-  Not an official TPC benchmark result
IBM Big SQL – Runs 100% of the TPC-DS queries
Key points
§  With competing solutions, many queries needed to be rewritten, some significantly
§  Owing to various restrictions, some queries could not be rewritten or failed at run time
§  Rewriting queries in a benchmark scenario where the results are known is one thing; doing this against real databases in production is another
Competitive environments require significant effort
IBM Big SQL – Leading performance
Power run (single stream), elapsed time, as measured across the subset of queries that both Impala and Hive can run
•  Big SQL: 48:29
•  Impala: 2:55:35
•  Hive: 4:30:35
3.6x faster than Impala, 5.6x faster than Hive
* Subject to findings of TPC auditor – full disclosure report expected late October 2014
IBM Big SQL – Leading performance
Throughput run (four streams), elapsed time, as measured across the subset of queries that both Impala and Hive can run
•  Big SQL: 1:54:02
•  Impala: 4:08:39
•  Hive: 16:32:12
2.2x faster than Impala, 8.7x faster than Hive
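The speedup factors on the two performance slides are ratios of elapsed times. The small Python helper below recomputes them from the H:MM:SS (or M:SS) labels. The function names are illustrative. With the times shown, the ratios come out to about 3.6 (Impala) and 5.6 (Hive) for the power run, and about 2.2 (Impala) and 8.7 (Hive) for the throughput run.

```python
def to_seconds(elapsed: str) -> int:
    """Convert an 'H:MM:SS' or 'M:SS' label to seconds."""
    total = 0
    for part in elapsed.split(":"):
        total = total * 60 + int(part)
    return total


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` finished than `baseline`."""
    return to_seconds(baseline) / to_seconds(candidate)


if __name__ == "__main__":
    power = {"Big SQL": "48:29", "Impala": "2:55:35", "Hive": "4:30:35"}
    throughput = {"Big SQL": "1:54:02", "Impala": "4:08:39", "Hive": "16:32:12"}
    for name, run in (("power", power), ("throughput", throughput)):
        for other in ("Impala", "Hive"):
            print(f"{name}: Big SQL vs {other}: {speedup(run[other], run['Big SQL']):.1f}x")
```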

 
Hadoop and SQL: Delivery Analytics Across the Organization
Hadoop and SQL:  Delivery Analytics Across the OrganizationHadoop and SQL:  Delivery Analytics Across the Organization
Hadoop and SQL: Delivery Analytics Across the Organization
 
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit OrlandoGimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
Gimel and PayPal Notebooks @ TDWI Leadership Summit Orlando
 
Ibm db2update2019 icp4 data
Ibm db2update2019   icp4 dataIbm db2update2019   icp4 data
Ibm db2update2019 icp4 data
 
Geode Meetup Apachecon
Geode Meetup ApacheconGeode Meetup Apachecon
Geode Meetup Apachecon
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
ds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suiteds_Pivotal_Big_Data_Suite_Product_Suite
ds_Pivotal_Big_Data_Suite_Product_Suite
 

More from Wilfried Hoge

Cloud Data Services - from prototyping to scalable analytics on cloud
Cloud Data Services - from prototyping to scalable analytics on cloudCloud Data Services - from prototyping to scalable analytics on cloud
Cloud Data Services - from prototyping to scalable analytics on cloudWilfried Hoge
 
Is it harder to find a taxi when it is raining?
Is it harder to find a taxi when it is raining? Is it harder to find a taxi when it is raining?
Is it harder to find a taxi when it is raining? Wilfried Hoge
 
innovations born in the cloud - cloud data services from IBM to prototype you...
innovations born in the cloud - cloud data services from IBM to prototype you...innovations born in the cloud - cloud data services from IBM to prototype you...
innovations born in the cloud - cloud data services from IBM to prototype you...Wilfried Hoge
 
2015.05.07 watson rp15
2015.05.07 watson rp152015.05.07 watson rp15
2015.05.07 watson rp15Wilfried Hoge
 
Twitter analytics in Bluemix
Twitter analytics in BluemixTwitter analytics in Bluemix
Twitter analytics in BluemixWilfried Hoge
 
2013.12.12 big data heise webcast
2013.12.12 big data heise webcast2013.12.12 big data heise webcast
2013.12.12 big data heise webcastWilfried Hoge
 
2012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum22012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum2Wilfried Hoge
 
IBM - Big Value from Big Data
IBM - Big Value from Big DataIBM - Big Value from Big Data
IBM - Big Value from Big DataWilfried Hoge
 

More from Wilfried Hoge (8)

Cloud Data Services - from prototyping to scalable analytics on cloud
Cloud Data Services - from prototyping to scalable analytics on cloudCloud Data Services - from prototyping to scalable analytics on cloud
Cloud Data Services - from prototyping to scalable analytics on cloud
 
Is it harder to find a taxi when it is raining?
Is it harder to find a taxi when it is raining? Is it harder to find a taxi when it is raining?
Is it harder to find a taxi when it is raining?
 
innovations born in the cloud - cloud data services from IBM to prototype you...
innovations born in the cloud - cloud data services from IBM to prototype you...innovations born in the cloud - cloud data services from IBM to prototype you...
innovations born in the cloud - cloud data services from IBM to prototype you...
 
2015.05.07 watson rp15
2015.05.07 watson rp152015.05.07 watson rp15
2015.05.07 watson rp15
 
Twitter analytics in Bluemix
Twitter analytics in BluemixTwitter analytics in Bluemix
Twitter analytics in Bluemix
 
2013.12.12 big data heise webcast
2013.12.12 big data heise webcast2013.12.12 big data heise webcast
2013.12.12 big data heise webcast
 
2012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum22012.04.26 big insights streams im forum2
2012.04.26 big insights streams im forum2
 
IBM - Big Value from Big Data
IBM - Big Value from Big DataIBM - Big Value from Big Data
IBM - Big Value from Big Data
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

InfoSphere BigInsights - Analytics power for Hadoop - field experience

  • 1. InfoSphere BigInsights: Analytics power for Hadoop – field experience
    Wilfried Hoge, IT Architect Big Data (@wilfriedhoge)
    Stephan Reimann, IT Specialist Big Data (@stereimann)
  • 2. IBM BigInsights – Open Source and IBM Value Adds
    © 2015 International Business Machines Corporation
    •  Real-time Analytics: InfoSphere Streams
    •  Enterprise Performance: Adaptive MapReduce & Big SQL
    •  Storage Integration: GPFS POSIX distributed filesystem
    •  Data Governance and Security: Data Click, LDAP and secured cluster
    •  Search: BigIndex and Data Explorer
    •  Data Exploration: BigSheets "schema-on-read" tooling
    •  Predictive Modeling: Big R, "scalable data mining" on R
    •  Text Analytics: text processing with AQL
    •  Application Tooling: toolkits and accelerators
    •  ANSI SQL: Big SQL optimized SQL support
    •  100% based on Apache open source Hadoop components: MapReduce, HDFS, HBase, Flume, Pig, Lucene, Jaql, ZooKeeper, Oozie, Hive, Sqoop, HCatalog
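"Schema-on-read" means raw data is stored as loaded and a structure is imposed only when it is read, so different readers can interpret the same bytes differently. The Python sketch below illustrates that general idea only; it is not BigSheets code, and `RAW`, `SENSOR_SCHEMA` and `SITE_ONLY_SCHEMA` are hypothetical names and data invented for illustration.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# A schema is a list of (column name, converter) pairs applied at read time.
Schema = List[Tuple[str, Callable[[str], object]]]

# Hypothetical raw data, stored untouched: no schema was enforced on write.
RAW = "1,255,4\n1,1,2\n2,1,5\n"

# Two hypothetical schemas over the same raw text.
SENSOR_SCHEMA: Schema = [("site_num", int), ("head_num", int), ("piece_count", int)]
SITE_ONLY_SCHEMA: Schema = [("site_num", str)]

def read_with_schema(raw: str, schema: Schema) -> List[Dict[str, object]]:
    """Parse raw CSV text, interpreting only the leading columns the schema names."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        rows.append({name: conv(fields[i]) for i, (name, conv) in enumerate(schema)})
    return rows
```

Reading `RAW` with `SENSOR_SCHEMA` yields typed integers suitable for sums, while `SITE_ONLY_SCHEMA` projects a single text column; the stored data never changes.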
  • 3. Key Differentiators for BigInsights
    Enterprise Performance & Integration
    •  Workload / performance optimization
    •  GPFS
    •  Security
    •  Key integrations & connectors with the enterprise ecosystem
    Analytics
    •  Text analytics
    •  Social Data Analytics Accelerators
    •  Machine Data Analytics Accelerators
    •  Execute R in an integrated application
    Usability & Productivity
    •  Big SQL
    •  BigSheets
    •  Development tools
    •  Web Console
  • 4. Field experience – analyzing binary data
    The challenge
    •  Use case
       – Enable users to analyze data that is delivered in a binary format without having to run scripts
    •  Challenges
       – Transforming the binary data to CSV
       – Accessing the CSV data on HDFS to analyze its content directly
       – Accessing the CSV data from BI tools through SQL
       – Enabling technical business users to analyze the data
       – Flexible automation capabilities (scheduling)
  • 5. Field experience – analyzing binary data
    The binary file – direct analysis not possible
  • 6. Running Applications on Big Data
    •  Browse available applications
    •  Deploy published applications (administrators only)
    •  Launch (or schedule for launch) a deployed application
    •  Monitor job (application) execution status
    •  Predefined applications
       – Import & export data: databases & files, web and social
       – Analyze and query: predictive analytics, text analytics, SQL/Hive, Jaql, Pig, HBase
       – Accelerators
  • 7. Tools for Developers
    Editors
    •  A workflow editor that greatly simplifies the creation of complex Oozie workflows through a consumable interface
    •  A Pig/Jaql editor with content assist and syntax highlighting that lets users create and execute new Pig or Jaql applications in local or cluster mode from the Eclipse IDE
    Application development & deployment
    •  Enablement of BigSheets macro and BigSheets reader development
    •  Text Analytics development, including support for modular rule sets
    •  Publishing new applications: BigSheets macro, BigSheets reader, AQL module, Jaql module
    1. Sample your data
    2. Develop your application using BigInsights tools
    3. Test your application
    4. Package and publish your application
    5. Deploy your application on the cluster
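An Oozie workflow is a directed acyclic graph of actions: each action runs only after the actions it depends on have finished. The sketch below shows that dependency ordering with Python's standard `graphlib`; it is an illustration of the concept, not Oozie's engine or workflow format, and the action names (`transform`, `load`, `analyze`) are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set, Tuple

# Each action receives the results of the actions that have already run.
Action = Callable[[Dict[str, object]], object]

def run_workflow(actions: Dict[str, Action],
                 depends_on: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, object]]:
    """Run every action after all of its prerequisites.

    Raises ValueError for dependencies on unknown actions and
    graphlib.CycleError if the dependencies contain a cycle.
    """
    graph = {name: set(depends_on.get(name, set())) for name in actions}
    unknown = set().union(*graph.values()) - actions.keys()
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    order = list(TopologicalSorter(graph).static_order())
    results: Dict[str, object] = {}
    for name in order:
        results[name] = actions[name](results)
    return order, results
```

A chain such as transform, then load, then analyze has exactly one valid order; independent branches may run in any order the sorter chooses.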
  • 8. Field experience – analyzing binary data
    Developing and publishing a transformation application
  • 9. Field experience – analyzing binary data
    The transformation application – the user can convert binary data to CSV
  • 10. BigSheets to analyze and visualize
    •  Model "big data" collected from various sources in spreadsheet-like structures
    •  Filter and enrich content with built-in functions
    •  Combine data in different workbooks
    •  Visualize results through spreadsheets and charts
    •  Export data into common formats (if desired)
    No programming knowledge needed!
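BigSheets exposes these operations without programming. To make clear what filtering, enriching and combining workbooks compute, here is a rough plain-Python equivalent over lists of rows. It is not BigSheets' implementation, and the function names and sample columns (`site`, `tested`, `passed`) are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def filter_rows(rows: List[Row], keep: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy the predicate (like a sheet filter)."""
    return [r for r in rows if keep(r)]

def enrich(rows: List[Row], column: str, fn: Callable[[Row], object]) -> List[Row]:
    """Add a computed column to every row (like a formula column)."""
    return [{**r, column: fn(r)} for r in rows]

def combine(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two workbooks on a shared column."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]
```

The join builds an index on the right-hand workbook first, so combining is linear in the size of both inputs rather than comparing every pair of rows.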
  • 11. Field experience – analyzing binary data
    The CSV file – BigSheets offers easy analysis
  • 12. Field experience – analyzing binary data
    An analytical result with BigSheets
  • 13. Field experience – analyzing binary data
    The loader application – create tables for analysis
  • 14. Big SQL 3.0 – Architected for Performance
    •  Leverage IBM's rich SQL heritage, expertise, and technology
       – Modern SQL:2011 capabilities
       – DB2-compatible SQL PL support: SQL-bodied functions and stored procedures, application logic/security encapsulation
    •  Architected from the ground up for performance
       – Low latency and high throughput
    •  MapReduce replaced with a modern MPP architecture
       – Compiler and runtime are native code (not Java)
       – Big SQL worker daemons live directly on the cluster
       – Continuously running (no startup latency)
       – Processing happens locally at the data
    •  Operations occur in memory with the ability to spill to disk
       – Supports aggregations and sorts larger than available RAM
    •  Integration with BigSheets (source & target)
    Diagram: a SQL-based application connects through the IBM Data Server Client to the Big SQL MPP runtime in InfoSphere BigInsights, which reads data sources in Parquet, CSV, Seq, RC, Avro, ORC, JSON and custom formats.
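Sorting more data than fits in RAM is usually done with an external merge sort: sort memory-sized chunks, spill each sorted run to disk, then stream a k-way merge of the runs. The sketch below shows that general technique in Python; it is not Big SQL's implementation, and `max_in_memory` is a hypothetical stand-in for a real memory budget.

```python
import heapq
import tempfile
from typing import Iterable, Iterator, List

def _spill(run: List[int]):
    """Write one sorted run to a temporary file, one integer per line."""
    tmp = tempfile.TemporaryFile(mode="w+")
    tmp.writelines(f"{x}\n" for x in run)
    tmp.seek(0)
    return tmp

def _read_run(tmp) -> Iterator[int]:
    for line in tmp:
        yield int(line)

def external_sort(values: Iterable[int], max_in_memory: int = 1000) -> Iterator[int]:
    """Yield values in ascending order, holding at most max_in_memory in a buffer."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be positive")
    runs, buffer = [], []
    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            buffer.sort()
            runs.append(_spill(buffer))  # spill a full sorted run to disk
            buffer = []
    buffer.sort()
    try:
        # k-way merge of the spilled runs plus the in-memory tail
        yield from heapq.merge(buffer, *(_read_run(t) for t in runs))
    finally:
        for t in runs:
            t.close()
```

`heapq.merge` keeps only one pending element per run in memory, so the merge phase needs memory proportional to the number of runs, not to the data size.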
  • 15. Big SQL 3.0 – Architecture (cont.)
    •  Head (coordinator / management) node
       – Listens for JDBC/ODBC connections and compiles / optimizes the query
       – Coordinates the execution of the query
       – Can optionally store user data in traditional RDBMS tables (single node only)
    •  Big SQL worker processes reside on compute nodes (some or all)
    •  Worker nodes stream data between each other as needed
    •  Workers can spill large data sets to local disk if needed
       – Allows Big SQL to work with data sets larger than available memory
    Diagram: management nodes host the Big SQL head, the Hive Metastore, the NameNode and the JobTracker; each compute node hosts a TaskTracker, a DataNode and a Big SQL worker, on top of GPFS/HDFS.
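The coordinator/worker split can be pictured as a scatter-gather aggregation: rows are hash-partitioned by the grouping key ("workers stream data between each other"), each worker aggregates its partition locally, and the head node merges the partial results. This single-process Python sketch illustrates that pattern under those assumptions; it is not Big SQL's runtime, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence

Key = Callable[[Sequence], Hashable]
Value = Callable[[Sequence], int]

def shuffle(rows: List[Sequence], key: Key, num_workers: int) -> List[List[Sequence]]:
    """Hash-partition rows so that every key lands on exactly one worker."""
    parts: List[List[Sequence]] = [[] for _ in range(num_workers)]
    for row in rows:
        # crc32 over repr() is stable across processes, unlike hash() on str
        bucket = zlib.crc32(repr(key(row)).encode()) % num_workers
        parts[bucket].append(row)
    return parts

def worker_sum(part: List[Sequence], key: Key, value: Value) -> Dict[Hashable, int]:
    """Local aggregation on one worker."""
    acc: Dict[Hashable, int] = defaultdict(int)
    for row in part:
        acc[key(row)] += value(row)
    return dict(acc)

def coordinator_sum(rows: List[Sequence], key: Key, value: Value,
                    num_workers: int = 3) -> Dict[Hashable, int]:
    """Scatter rows to workers, then gather and merge their partial sums."""
    result: Dict[Hashable, int] = {}
    for part in shuffle(rows, key, num_workers):
        # Hash partitioning makes key sets disjoint, so a plain merge is exact.
        result.update(worker_sum(part, key, value))
    return result
```

Partitioning on the grouping key is what lets the head node merge without re-aggregating; partitioning on anything else would require summing partial results per key.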
  • 16. Big SQL 3.0 – Features
    Application Portability & Integration
    •  Data shared with the Hadoop ecosystem
    •  Comprehensive file format support
    •  Superior enablement of IBM software
    •  Enhanced by third-party software
    Performance
    •  Modern MPP runtime
    •  Powerful SQL query rewriter
    •  Cost-based optimizer
    •  Optimized for concurrent user throughput
    •  Results not constrained by memory
    Federation
    •  Distributed requests to multiple data sources within a single SQL statement
    •  Main data sources supported: DB2 LUW, DB2/z, Teradata, Oracle, Netezza
    Enterprise Features
    •  Advanced security/auditing
    •  Resource and workload management
    •  Self-tuning memory management
    •  Comprehensive monitoring
    Rich SQL
    •  Comprehensive SQL support
    •  IBM SQL PL compatibility
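Federation means one SQL statement can read from several independent data sources. Big SQL's federation to DB2, Teradata, Oracle or Netezza cannot be reproduced in a few lines, so the sketch below uses SQLite's `ATTACH DATABASE` as a local analogy: two separate databases queried by a single join. It only illustrates the idea and is not how Big SQL federation is configured; the table names and data are hypothetical.

```python
import sqlite3
from typing import List, Tuple

def federated_totals() -> List[Tuple[str, int]]:
    """Join tables that live in two separately attached databases."""
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database under its own schema name.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("CREATE TABLE main.measurements (site_num INTEGER, piece_count INTEGER)")
    conn.execute("CREATE TABLE warehouse.sites (site_num INTEGER, site_name TEXT)")
    conn.executemany("INSERT INTO main.measurements VALUES (?, ?)", [(1, 4), (1, 2), (2, 5)])
    conn.executemany("INSERT INTO warehouse.sites VALUES (?, ?)", [(1, "north"), (2, "south")])
    # One statement reads from both databases.
    rows = conn.execute(
        """SELECT s.site_name, SUM(m.piece_count)
           FROM main.measurements AS m
           JOIN warehouse.sites AS s ON s.site_num = m.site_num
           GROUP BY s.site_name
           ORDER BY s.site_name"""
    ).fetchall()
    conn.close()
    return rows
```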
  • 17. Field experience – analyzing binary data
    Run complex SQL on generated tables (the slide highlights the rank functions, ROW_NUMBER() OVER (), and the subselects):

        INSERT INTO Sites (Counter, Tested, Site1, Site_num1, Number_of_xxxx_tested,
                           XA1, Percentage_of_xxxx_per_yyyy,
                           Counter_plus_one, Pass, Site2, Site_num2, Number_of_pass_xxxx,
                           ZB2, xxxxx_of_site_num, xxxx_file_name)
        SELECT 12000 + ROW_NUMBER() OVER () * 10, 'Tested', 'Site', tab1.Site_num,
               (SELECT SUM(tab2.piece_count) FROM tab2
                 WHERE tab2.site_num = tab1.site_num) AS num_xxxx_tested,
               'PA',
               (SELECT SUM(tab2.piece_count) FROM tab2
                 WHERE tab2.site_num = tab1.site_num AND tab2.head_num = 255),
               34000 + ROW_NUMBER() OVER () * 10 + 1, 'Pass', 'Site', tab1.site_num,
               (SELECT COUNT(*) FROM tab1 AS tab12
                 WHERE tab1.site_num = tab12.site_num AND tab1.piece_Flg = 0) AS num_xxxx_passed,
               'PA',
               (SELECT SUM(tab2.piece_count) FROM tab2
                 WHERE tab2.site_num = tab1.site_num)
                 / NULLIF((SELECT COUNT(*) FROM tab1 AS tab12
                            WHERE tab1.site_num = tab12.site_num AND tab1.piece_Flg = 0), 0),
               tab1.xxxx_file_name
        FROM tab1 AS tab1, tab2 AS tab2
        GROUP BY tab1.site_num, tab1.piece_Flg, tab1.xxxx_file_name;
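The guarded division in this query depends on `NULLIF` argument order: `NULLIF(a, b)` returns NULL when `a` equals `b` and `a` otherwise, so the zero-safe divisor is `NULLIF(count, 0)`. Engines such as DB2 typically raise an error on division by zero, while a NULL divisor simply yields NULL. The sketch below runs a simplified per-site version of the pattern in SQLite via Python's `sqlite3` as a stand-in for Big SQL. The table and column names follow the slide's anonymized ones, but the reduced schema and sample rows are assumptions.

```python
import sqlite3
from typing import Iterable, List, Tuple

def site_summary(tab1_rows: Iterable[Tuple], tab2_rows: Iterable[Tuple]) -> List[Tuple]:
    """Per site: pieces tested, pieces passed, and tested/passed guarded by NULLIF."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (site_num INTEGER, piece_flg INTEGER, file_name TEXT)")
    conn.execute("CREATE TABLE tab2 (site_num INTEGER, head_num INTEGER, piece_count INTEGER)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", tab1_rows)
    conn.executemany("INSERT INTO tab2 VALUES (?, ?, ?)", tab2_rows)
    query = """
        SELECT s.site_num,
               (SELECT SUM(t2.piece_count) FROM tab2 AS t2
                 WHERE t2.site_num = s.site_num) AS num_tested,
               (SELECT COUNT(*) FROM tab1 AS p
                 WHERE p.site_num = s.site_num AND p.piece_flg = 0) AS num_passed,
               -- 1.0 * forces real division; NULLIF turns a zero count into NULL
               1.0 * (SELECT SUM(t2.piece_count) FROM tab2 AS t2
                       WHERE t2.site_num = s.site_num)
                   / NULLIF((SELECT COUNT(*) FROM tab1 AS p
                              WHERE p.site_num = s.site_num AND p.piece_flg = 0), 0)
        FROM (SELECT DISTINCT site_num FROM tab1) AS s
        ORDER BY s.site_num
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

A site with no passing pieces gets NULL in the ratio column instead of an error or a meaningless value.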
  • 18. Application linking and interfaces to build new apps
    •  Compose new applications from existing applications and BigSheets
    •  Invoke analytics applications from the web console, including integration within BigSheets
    •  REST data source app that lets users load data from any data source supporting REST APIs into BigInsights, including popular social media services
    •  Sampling app that lets users sample data for analysis
    •  Subsetting app that lets users subset data for analysis
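A sampling app has to draw a fair sample from data whose size may not be known in advance. Reservoir sampling (Algorithm R) does this in a single pass with memory for only `k` items. The sketch below shows the algorithm itself; whether the BigInsights Sampling app uses it is not stated on the slide, so treat it as an illustration of one suitable technique.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample
```

Each element ends up in the sample with probability k / n, so the result is unbiased even though n is unknown until the stream ends.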
  • 19. Field experience – analyzing binary data
    The user builds his/her own application flow
  • 20. Field experience – analyzing binary data
    What was achieved (1/2)
    –  Conversion from binary to CSV (Transformation App)
       •  Customer-provided Java classes read the binary file and produce CSV output
       •  A developer embedded the Java code in a BigInsights application
       •  The user can provide source and target paths
       •  The user can provide filters if only part of the data set should be extracted
       •  The user can schedule the application (with parameters)
       •  The application automatically has a REST interface for external scheduling
       •  The application uses MapReduce to scale when a larger number of files has to be transformed
    –  The user can analyze the CSV files with BigSheets
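The customer's binary format and Java classes are not shown, so the following is only a sketch of the conversion step under an assumed layout: fixed-width little-endian records of `site_num` (uint16), `head_num` (uint8), `piece_flg` (uint8) and `piece_count` (uint32). The optional `keep` predicate mirrors the user-supplied filter. It is written in Python rather than Java, and the layout and field names are hypothetical.

```python
import csv
import io
import struct
from typing import Callable, Dict, Iterator, Optional

# Hypothetical fixed-width record layout ("<" = little-endian, no padding).
RECORD = struct.Struct("<HBBI")
FIELDS = ["site_num", "head_num", "piece_flg", "piece_count"]

def decode_records(data: bytes) -> Iterator[Dict[str, int]]:
    """Decode consecutive fixed-width records into dicts."""
    if len(data) % RECORD.size:
        raise ValueError("input ends with a truncated record")
    for values in RECORD.iter_unpack(data):
        yield dict(zip(FIELDS, values))

def binary_to_csv(data: bytes, keep: Optional[Callable[[Dict[str, int]], bool]] = None) -> str:
    """Convert binary records to CSV text, optionally keeping only matching records."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in decode_records(data):
        if keep is None or keep(record):
            writer.writerow(record)
    return out.getvalue()
```

Because each file converts independently, a function like `binary_to_csv` is the natural unit of work for a map task when many files must be transformed.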
  • 21. Field experience – analyzing binary data
    What was achieved (2/2)
    –  Creating SQL tables from CSV (Loader App)
       •  A developer embedded the necessary SQL in the app
       •  The user can create tables from CSV files
    –  The user can run complex SQL on the tables with a preferred front-end tool
    –  The user can combine apps and create his/her own flow
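The Loader App's embedded SQL is not shown. As a rough sketch of what "create tables from CSV files" involves, the function below derives a table definition from the CSV header and bulk-inserts the rows. It uses SQLite via Python's `sqlite3` as a stand-in for Big SQL, and `load_csv_table` is a hypothetical name. Identifiers are validated because they cannot be bound as query parameters, and columns are declared `NUMERIC` so that SQLite's type affinity stores numeric text as numbers.

```python
import csv
import io
import re
import sqlite3

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def load_csv_table(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Create `table` from the CSV header, insert all rows, return the row count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        raise ValueError("CSV input has no header row")
    for name in [table, *header]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(f"{c} NUMERIC" for c in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)
```

Once loaded, the table can be queried with ordinary SQL from any front end that can reach the database.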
  • 22. IBM BigInsights brings efficient integration of R with Big R
    •  R as a big data query language
       – Outside-in execution
    •  R as a statistical language for deep computing
       – Inside-out execution
       – Partitioning of large data ("divide")
       – Parallel cluster execution of pushed-down R code ("conquer")
       – Almost any R package can run in this environment
    •  R as the gateway to scalable machine learning
       – A scalable ML engine that provides canned algorithms, and the ability to author new ones, all via R
    Diagram: R clients, a scalable ML engine, data sources, and embedded R execution with R packages; the two paths are labeled "Pull data (summaries) to R client" and "Or, push R functions right on the data".
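Inside-out execution follows a divide-and-conquer pattern: partition the data by a key, run the same user function on every partition in parallel, and collect one result per partition. The Python sketch below shows that pattern with a thread pool standing in for cluster nodes. It is not Big R's API, and `partition_by` and `group_apply` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

R = TypeVar("R")

def partition_by(rows: List[Sequence], key: Callable[[Sequence], Hashable]) -> Dict[Hashable, List[Sequence]]:
    """Divide: group rows into partitions by key."""
    parts: Dict[Hashable, List[Sequence]] = defaultdict(list)
    for row in rows:
        parts[key(row)].append(row)
    return dict(parts)

def group_apply(rows: List[Sequence], key: Callable[[Sequence], Hashable],
                fn: Callable[[List[Sequence]], R], max_workers: int = 4) -> Dict[Hashable, R]:
    """Conquer: run fn on every partition in parallel, one result per key."""
    parts = partition_by(rows, key)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {k: pool.submit(fn, part) for k, part in parts.items()}
        return {k: f.result() for k, f in futures.items()}
```

The user function only ever sees one partition, which is what allows the same code to be shipped to wherever each partition lives.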
  • 23. SystemML – Declarative high-level language
    Figure: Topic detection in social media. A sparse documents × tokens matrix, stored as (document, token, value) triples such as (1, 1, 0.10), (1, 2, 0.30), (1, 3, 0.22), (1, 4, 1.24), is factorized into W (documents × K topics) and H (K topics × tokens).
    §  Modeled after R syntax and semantics
    §  Expressivity
       – Express a wide class of algorithms: descriptive statistics, linear & logistic regression, decision trees, SVM, MCMC simulation, etc.
    §  Productivity
       – Enable programmer productivity: the algorithm developer does not have to worry about scalability, numeric stability and optimizations
    §  Performance and scalability
       – Optimizer to generate low-level execution plans
         •  Cost-based operator selection based on data characteristics (dimensions, sparsity) and cluster characteristics (memory, parallelism)
         •  Generation of the runtime execution plan
    §  Big Data
       – Sparsity-driven data representation and operator implementations for data sets with billions of non-zero values
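The slide's figure factorizes a documents × tokens matrix V into W (documents × K topics) and H (K topics × tokens). One standard way to compute such a factorization is non-negative matrix factorization with Lee and Seung's multiplicative updates. The plain-Python sketch below is assumed here as an illustration of what W and H mean; it is not SystemML DML and not necessarily the algorithm behind the slide's example.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]

def frob_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v: Matrix, k: int, iterations: int = 200, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (docs x tokens) into W (docs x k) and H (k x tokens)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h
```

Row i of W gives document i's weight on each topic, and row t of H gives topic t's weight on each token. The multiplicative form keeps every entry non-negative, which is what makes the topics interpretable.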
  • 24. BigInsights Certifications
    ISV partner solutions (solution type: BigInsights version certified)
    •  Data Integration: 2.1 (3.0 in process 4Q)
    •  Reporting: 2.1 & 3.0
    •  Data Security: 2.1.2
    •  Customer Analytics: 2.1.2
    •  Cluster Mgt: 3.0
    •  Analytics: 2.1.2 (3.0 in process)
    •  Data Vis: 2.1 (3.0 in process)
    •  Visual Reporting: 2.1 & 3.0
    •  Data Virtualization: 2.1.2 & 3.0
    •  TDHC: 3.0
    •  Analytics: 2.1.2 & 3.0
    •  Aster: 3.0 *
    •  Data Integration: 2.1 (3.0 in process 3Q)
    •  Backup & Recovery: 2.1.2
    IBM products (solution type: BigInsights version certified)
    •  Business Intelligence: 2.1.2 (3.0 end of Nov '14)
    •  Predictive Analytics: 2.1.2 (3.0 mid 4Q)
    •  InfoSphere Information Server v11.3, Data Integration: 3.0
    •  SPSS v10.2.1, AS v1.0.1
    * Requires service engagement
  • 25. BigInsights ISV Partner Ecosystem
  • 26. Get started with BigInsights
    •  Hadoop Dev (links to videos, white papers, labs, ...): http://developer.ibm.com/hadoop/
    •  BigInsights trials: http://ibm.com/software/data/infosphere/hadoop/trials.html
  • 27. IBM big data – THINK
  • 28. BigInsights has a simple but effective security system based on a gateway to Hadoop
    •  All Hadoop servers are connected over a private network
    •  Unrestricted communication between cluster servers on the private network
    •  The BigInsights Web Console acts as a gateway into the cluster
    •  Authentication through PAM, LDAP, Kerberos
    •  Role-based authorization
    •  Authorization is enforced at 3 levels:
       – UI level
       – Data level
       – MapReduce level
    •  Authorization is also respected by services (e.g. SQL)
    Diagram components: authentication authority, gateway / Web Console, external sources, users, services, data nodes, infrastructure nodes, distributed filesystem.
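Role-based authorization checked at several levels can be modeled as a mapping from roles to permissions, where each permission is qualified by its level (UI, data, MapReduce). The sketch below is a hypothetical model of that idea, not BigInsights' actual role definitions. The roles `admin`, `analyst` and `viewer` are invented, though making deployment an administrator-only permission matches slide 6.

```python
from typing import Dict, Iterable, Set

# Hypothetical roles; permissions are written as "<level>:<action>".
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"ui:deploy_app", "ui:run_app", "data:read", "data:write", "mapreduce:submit"},
    "analyst": {"ui:run_app", "data:read", "mapreduce:submit"},
    "viewer": {"data:read"},
}

LEVELS = ("ui", "data", "mapreduce")

def is_authorized(roles: Iterable[str], level: str, action: str) -> bool:
    """True if any of the user's roles grants `action` at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    wanted = f"{level}:{action}"
    return any(wanted in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

Unknown roles grant nothing, so a misconfigured user fails closed rather than open.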
  • 29. Manage your HDFS files
    •  Navigate the distributed file system to see what's stored
    •  Create/remove/rename directories
    •  Modify permissions
    •  Upload/download files, remove/rename files, edit files
    •  Execute Hadoop file system shell commands
  • 30. About the Hadoop-DS Benchmark
    §  Created by IBM
    §  The Big Data Decision Support Benchmark (Hadoop-DS) is inspired by, and highly compliant with, TPC-DS
       -  Fully complies with the TPC-DS schema requirement
       -  Uses all 99 queries
       -  Meets the multi-user requirement
       -  Has been audited by a TPC-DS auditor, but as a non-TPC benchmark
    §  Selected deviations from TPC-DS due to Hadoop limitations:
       -  No data maintenance operations, referential integrity enforcement, or ACID property validation, as these are not feasible with HDFS
       -  Additional statistics used
       -  Metric adjustments
       -  No price/performance measures included
       -  Not an official TPC benchmark result
  • 31. IBM Big SQL – Runs 100% of the TPC-DS queries
    Key points
    §  With competing solutions, many queries needed to be rewritten, some significantly
    §  Owing to various restrictions, some queries could not be rewritten or failed at run time
    §  Rewriting queries in a benchmark scenario where the results are known is one thing; doing this against real databases in production is another
    Competitive environments require significant effort
  • 32. IBM Big SQL – Leading performance
    Chart: Power run (single-stream), elapsed time in seconds, measured across the subset of queries that Impala and Hive can both run: Big SQL 48:29, Impala 2:55:35, Hive 4:30:35
    3.6x faster than Impala, 5.6x faster than Hive
    * Subject to findings of TPC auditor; full disclosure report expected late October 2014
  • 33. IBM Big SQL – Leading performance
    Chart: Throughput run (four streams), elapsed time in seconds, measured across the subset of queries that Impala and Hive can both run: Big SQL 1:54:02, Impala 4:08:39, Hive 16:32:12
    2.2x faster than Impala, 4x faster than Hive
    * Subject to findings of TPC auditor; full disclosure report expected late October 2014
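The speed-up factors on slides 32 and 33 are ratios of the elapsed times shown on the bars. The short Python check below recomputes them from those times, read as h:mm:ss, or mm:ss for 48:29. It reproduces 3.6x and 5.6x for the power run and 2.2x for Impala in the throughput run. For Hive in the throughput run it gives about 8.7x, which does not match the "4x" on slide 33, so either that time or that label is likely mistranscribed.

```python
from typing import Dict

def to_seconds(duration: str) -> int:
    """Convert 'h:mm:ss' or 'mm:ss' to seconds."""
    seconds = 0
    for part in duration.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# Elapsed times as printed on the slides.
RUNS: Dict[str, Dict[str, str]] = {
    "power (single-stream)": {"Big SQL": "48:29", "Impala": "2:55:35", "Hive": "4:30:35"},
    "throughput (four streams)": {"Big SQL": "1:54:02", "Impala": "4:08:39", "Hive": "16:32:12"},
}

def speedups(run: Dict[str, str]) -> Dict[str, float]:
    """Other engines' elapsed time divided by Big SQL's, rounded to one decimal."""
    base = to_seconds(run["Big SQL"])
    return {engine: round(to_seconds(t) / base, 1)
            for engine, t in run.items() if engine != "Big SQL"}

for name, run in RUNS.items():
    print(name, speedups(run))
```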