EVENT SPEAKER
DANISH BI MEETUP, SEP’ 2016
FROM LOTS OF REPORTS (WITH SOME DATA ANALYSIS) 

TO MASSIVE DATA ANALYSIS (WITH SOME REPORTING)
MARK RITTMAN, ORACLE ACE DIRECTOR
info@rittmanmead.com www.rittmanmead.com @rittmanmead 2
About the Speaker
•Mark Rittman, Co-Founder of Rittman Mead
‣Oracle ACE Director, specialising in Oracle BI&DW
‣14 Years' Experience with Oracle Technology
‣Regular columnist for Oracle Magazine
•Author of two Oracle Press Oracle BI books
‣Oracle Business Intelligence Developers Guide
‣Oracle Exalytics Revealed
•Writer for Rittman Mead Blog: http://www.rittmanmead.com/blog
•Email: mark.rittman@rittmanmead.com
•Twitter: @markrittman
20 Years in Oracle BI and Data Warehousing
•Started back in 1996 on a bank Oracle DW project
•Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
•Went on to use Oracle Developer/2000 and Designer/2000
•Our initial users queried the DW using SQL*Plus
•Later on, we rolled out Discoverer/2000 to everyone else
•And life was fun…
Data Warehouses and Enterprise BI Tools
•Data warehouses provided a unified view of the business
‣Single place to store key data and metrics
‣Joined-up view of the business
‣Aggregates and conformed dimensions
‣ETL routines to load, cleanse and conform data
•BI tools for simple, guided access to information
‣Tabular data access using SQL-generating tools
‣Drill paths, hierarchies, facts, attributes
‣Fast access to pre-computed aggregates
‣Packaged BI for fast-start ERP analytics
[Figure: Source systems (a core ERP platform plus retail, banking, call center, e-commerce and CRM applications, on Oracle, MongoDB, Sybase, IBM DB/2 and MS SQL Server) feed a Data Warehouse with an ODS / Foundation Layer and an Access & Performance Layer, queried by Business Intelligence Tools.]
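The "aggregates and conformed dimensions" idea can be sketched in a few lines of Python. This is a minimal illustration with invented data: a tiny sales fact table keyed to a product dimension that every fact table would share (conformed), plus a revenue-by-category aggregate built ahead of query time so a query reads the summary instead of scanning the detail rows. All table and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical conformed product dimension: one definition shared by every fact table
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Books"},
}

# Hypothetical detail-level fact rows: (product_key, quantity, revenue)
sales_fact = [(1, 2, 20.0), (2, 1, 15.0), (3, 4, 40.0), (1, 1, 10.0)]


def build_category_aggregate(facts, dim):
    """Pre-compute revenue by category once, as a nightly ETL load might."""
    agg = defaultdict(float)
    for product_key, _qty, revenue in facts:
        agg[dim[product_key]["category"]] += revenue
    return dict(agg)


# Built ahead of query time
revenue_by_category = build_category_aggregate(sales_fact, product_dim)


def query_revenue(category):
    """Answer from the aggregate table rather than scanning the detail facts."""
    return revenue_by_category.get(category, 0.0)


print(query_revenue("Hardware"))  # 45.0
```

The trade-off is the classic one: the aggregate must be rebuilt whenever the facts change, in exchange for fast, predictable query times.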
Reporting Back Then…
•Examples were Crystal Reports, Oracle Reports, Cognos Impromptu, Business Objects
•Reports were written against a carefully-curated BI dataset, or connected directly to ERP/CRM systems
•Adding data from external sources, or other RDBMSs, was difficult and involved IT resources
•Report-writing was a skilled job
•High ongoing cost for maintenance and changes
•Little scope for analysis or predictive modeling
•Users were often frustrated at the pace of delivery
Then Came Enterprise BI Tools
•For example Oracle OBIEE, SAP Business Objects, IBM Cognos
•Full-featured, IT-orientated enterprise BI platforms
•Metadata layers, integrated security, web delivery
•Pre-built ERP metadata layers, dashboards + reports
•Federated queries across multiple sources
•Single version of the truth across the enterprise
•Mobile, web dashboards, alerts, published reports
•Integration with SOA and web services
Traditional Three-Layer Relational Data Warehouses
[Figure: Traditional Relational Data Warehouse. Traditional structured data sources are loaded into a Staging layer; ETL moves data into a Foundation / ODS layer and then into a Performance / Dimensional layer. A BI tool (OBIEE) with a metadata layer reads the warehouse directly, while an OLAP / in-memory tool loads data into its own database.]
•Three-layer architecture - staging, foundation and access/performance
•All three layers stored in a relational database (Oracle)
•ETL used to move data from layer to layer
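The layer-to-layer flow can be sketched in Python, assuming hypothetical CSV-style source rows: staging holds data exactly as delivered, one ETL step cleanses and types it into the foundation layer, and a second ETL step reshapes it into a dimensional performance layer. The function and field names are illustrative only, not any particular ETL tool's API.

```python
# Staging: rows exactly as extracted from a source system (all strings, untrimmed)
staging = [
    {"order_id": "1", "customer": " Acme ", "amount": "100.50"},
    {"order_id": "2", "customer": "acme", "amount": "49.50"},
    {"order_id": "3", "customer": "Beta Ltd", "amount": ""},  # bad row
]


def etl_to_foundation(rows):
    """Cleanse and conform: trim/normalise names, cast types, reject bad rows."""
    clean = []
    for r in rows:
        if not r["amount"]:
            continue  # a real ETL tool would route this row to an error table
        clean.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": float(r["amount"]),
        })
    return clean


def etl_to_performance(rows):
    """Build a customer dimension with surrogate keys and a fact table referencing it."""
    customer_dim = {}
    facts = []
    for r in rows:
        # New customers get the next surrogate key; known customers reuse theirs
        key = customer_dim.setdefault(r["customer"], len(customer_dim) + 1)
        facts.append({"order_id": r["order_id"], "customer_key": key, "amount": r["amount"]})
    return customer_dim, facts


foundation = etl_to_foundation(staging)
customer_dim, sales_fact = etl_to_performance(foundation)
print(customer_dim)  # {'Acme': 1}
```

Because " Acme " and "acme" are conformed to the same value in the foundation layer, both orders land against a single customer key in the performance layer.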
And All Was Good…
(a big BI project)
Lots of reports 

(with some data analysis)
Meanwhile…
The world got digitised
and connected.
and users got impatient…
Reporting and Dashboards…
became self-service 

data discovery
Advanced analytics for everyone
Cloud and SaaS have won
BI has changed
The Gartner BI & Analytics Magic Quadrant 2016
Gartner’s View of A “Modern BI Platform” in 2016
Analytic workflow component: Traditional BI platform vs. Modern BI platform
•Data source
‣Traditional: Upfront dimensional modeling required (IT-built star schemas)
‣Modern: Upfront modeling not required (flat files/flat tables)
•Data ingestion and preparation
‣Traditional: IT-produced
‣Modern: IT-enabled
•Content authoring
‣Traditional: Primarily IT staff, but also some power users
‣Modern: Business users
•Analysis
‣Traditional: Predefined, ad hoc reporting, based on predefined model
‣Modern: Free-form exploration
•Insight delivery
‣Traditional: Distribution and notifications via scheduled reports or portal
‣Modern: Sharing and collaboration, storytelling, open APIs
2007 - 2015
Died of ingratitude by business users

Just when we got the infrastructure right

Doesn’t anyone appreciate a single version of the truth?

Don’t say we didn’t warn you

No, you can’t just export it to Excel

Watch out, OLAP, you’re next
Analytic data platforms 

Meet the New Data Warehouse: The “Data Reservoir”
•Data now lands in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Data lands in the data lake or reservoir in raw form, then is minimally processed
•Data is then accessed directly by “data scientists”, or processed further into the DW
[Figure: Data Reservoir architecture, split into Data Transfer and Data Access. Operational data (transactions, customer master data) and unstructured data (voice + chat transcripts) arrive through file-based, stream-based (data streams) and ETL-based integration into a Data Factory and a Data Reservoir on the Hadoop platform. The reservoir holds raw customer data (data stored in the original format, usually files, such as SS7, ASN.1, JSON etc.) and mapped customer data (data sets produced by mapping and transforming raw data). On the access side: Business Intelligence Tools; Discovery & Development Labs (a safe & secure discovery and development environment with data sets and samples, models and programs); and marketing / sales applications consuming models, machine learning and segments.]
Hadoop is the new 

Data Warehouse
Hadoop: The Default Platform Today for Analytics
•Enterprise high-end RDBMSs such as Oracle can scale into the petabytes, using clustering
‣Sharded databases (e.g. Netezza) can scale further, but with complexity / single-workload trade-offs
•Hadoop was designed from the outset for massive horizontal scalability, using cheap hardware
•Anticipates hardware failure and makes multiple copies of data as protection
•The more nodes you add, the more stable it becomes
•And at a fraction of the cost of traditional RDBMS platforms
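The "multiple copies as protection" point can be illustrated with a toy model of block replication in Python. It assumes a simplified placement policy (each block copied to three distinct nodes, chosen round-robin); HDFS's real placement is rack-aware, though its default replication factor is indeed 3. Node and block names are hypothetical.

```python
# Toy model of block replication: not HDFS's actual rack-aware placement policy
REPLICATION_FACTOR = 3


def place_replicas(blocks, nodes, rf=REPLICATION_FACTOR):
    """Assign each block to rf distinct nodes, rotating the starting node."""
    if rf > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + j) % len(nodes)] for j in range(rf)]
    return placement


def readable_blocks(placement, failed_nodes):
    """A block stays readable while at least one replica is on a live node."""
    return {b for b, replicas in placement.items()
            if any(n not in failed_nodes for n in replicas)}


nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = [f"blk_{i}" for i in range(10)]
placement = place_replicas(blocks, nodes)

# Lose two whole servers: with 3 replicas on distinct nodes, every block survives
print(len(readable_blocks(placement, {"node2", "node4"})))  # 10
```

Losing any two nodes is harmless here; only a failure covering all three replica holders of a block makes that block unreadable.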
But Why Hadoop? Reason #1 - Flexible Storage
•Data from new-world applications is not like historic data
•Typically comes in non-tabular form
•JSON, log files, key/value pairs
•Users often want it speculatively
•Haven’t thought it through
•Schema can evolve
•Or maybe there isn’t one
•But the end-users want it now
•Not when you’re ready
[Figure: Big Data Management Platform with Discovery & Development Labs (a safe & secure discovery and development environment holding data sets and samples, models and programs). Schema-on-read analysis supports correlating, modeling, machine learning and scoring to build a single customer view and an enriched customer profile.]
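Schema-on-read, as opposed to the schema-on-write loading of a relational DW, can be sketched in Python: raw JSON events are stored untouched, and each query projects the fields it needs at read time, tolerating records written before a field existed. The event fields are hypothetical.

```python
import json

# Raw events land as-is (one JSON document per line); the schema evolved over time
raw_log = """\
{"user": "alice", "page": "/home"}
{"user": "bob", "page": "/pricing", "device": "mobile"}
{"user": "carol", "page": "/home", "device": "desktop", "referrer": "search"}
"""


def read_with_schema(raw, fields):
    """Apply a schema at query time: pick requested fields, defaulting missing ones to None."""
    for line in raw.splitlines():
        if line.strip():
            record = json.loads(line)
            yield tuple(record.get(f) for f in fields)


# A query written after "device" was introduced still reads the older records
rows = list(read_with_schema(raw_log, ["user", "device"]))
print(rows)  # [('alice', None), ('bob', 'mobile'), ('carol', 'desktop')]
```

Nothing had to be remodeled or reloaded when `device` and `referrer` appeared; each reader decides which shape it wants.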
But Why Hadoop? Reason #2 - Massive Scalability
•Enterprise high-end RDBMSs such as Oracle can scale
‣Clustering for single-instance DBs can scale to >PB
‣Exadata scales further by offloading queries to storage
‣Sharded databases (e.g. Netezza) can scale further
‣But cost (and complexity) become limiting factors
‣$1m/node is not uncommon
But Why Hadoop? Reason #2 - Massive Scalability
•Hadoop’s main design goal was to enable virtually limitless horizontal scalability
•Rather than a small number of large, powerful servers, it spreads processing over large numbers of small, cheap, redundant servers
•Processes the data where it’s stored, avoiding I/O bottlenecks
•The more nodes you add, the more stable it becomes!
•At an affordable cost - this is key
•$50k/node vs. $1m/node
•And … the Hadoop platform is a better fit for new types of processing and analysis
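"Processing the data where it's stored" amounts to a scheduling preference, sketched below in Python under simplified, hypothetical assumptions: each block's replica locations are known, and the scheduler gives a task to a node holding a local copy, falling back to a remote read only when none of those nodes has a free slot. Real Hadoop schedulers also weigh rack locality, which this omits.

```python
def schedule(block_locations, free_slots):
    """Assign one task per block, preferring a node that already stores the block.

    block_locations: block id -> list of nodes holding a replica
    free_slots: node -> number of task slots available
    Returns (assignments, number of data-local tasks).
    """
    slots = dict(free_slots)  # don't mutate the caller's dict
    assignments, local = {}, 0
    for block, replicas in block_locations.items():
        node = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if node is not None:
            local += 1  # data-local: the block never crosses the network
        else:
            # Remote read: fall back to any node with capacity
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free slots left in the cluster")
        slots[node] -= 1
        assignments[block] = node
    return assignments, local


locations = {"b1": ["n1", "n2"], "b2": ["n2", "n3"], "b3": ["n1", "n3"], "b4": ["n1", "n2"]}
assignments, local = schedule(locations, {"n1": 1, "n2": 1, "n3": 1, "n4": 1})
# b4's replica holders (n1, n2) are busy, so it runs remotely on n4
print(assignments, local)  # {'b1': 'n1', 'b2': 'n2', 'b3': 'n3', 'b4': 'n4'} 3
```

On the slide's own figures, the budget for one $1m node buys twenty $50k nodes, each adding local disk and task slots for a scheduler like this to use.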
But Why Hadoop? Reason #3 - Processing Frameworks
[Figure: Big Data Platform, all running natively under Hadoop. YARN (cluster resource management) sits over HDFS (cluster filesystem holding raw data) and runs batch (MapReduce), interactive (Impala, Drill, Tez, Presto), streaming + in-memory (Spark, Storm) and graph + search (Solr, Giraph) frameworks, used for modeling, scoring and building an enriched customer profile.]
•Hadoop started out synonymous with MapReduce and Java coding
•But YARN (Yet Another Resource Negotiator) broke this dependency
•Modern Hadoop platforms provide overall cluster resource management, but support multiple processing frameworks
‣General-purpose (e.g. MapReduce)
‣Graph processing
‣Machine Learning
‣Real-Time Processing (Spark Streaming, Storm)
•Even the Hadoop resource management framework can be swapped out
‣Apache Mesos
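The shift YARN introduced, one resource manager handing out containers to many processing frameworks, can be sketched as a toy first-fit allocator in Python. The application names, node sizes and first-fit policy are assumptions for illustration, and the class is not YARN's API; YARN's real schedulers (Capacity, Fair) use queues and are considerably more sophisticated.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy cluster-wide allocator shared by every framework (not YARN's actual API)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def request(self, app, memory_mb, vcores, count):
        """First-fit: place up to `count` containers; return how many were granted."""
        granted = 0
        for _ in range(count):
            for node in self.nodes:
                if node.memory_mb >= memory_mb and node.vcores >= vcores:
                    node.memory_mb -= memory_mb
                    node.vcores -= vcores
                    node.containers.append(app)
                    granted += 1
                    break
        return granted


rm = ResourceManager([Node("n1", 8192, 4), Node("n2", 8192, 4)])
# Different (hypothetical) jobs from different frameworks share one pool of capacity
print(rm.request("mapreduce-job", 2048, 1, 3))  # 3
print(rm.request("spark-job", 4096, 2, 3))      # 2: the cluster runs out of room
print(rm.request("tez-query", 1024, 1, 1))      # 1
```

The point is the separation of concerns: the frameworks only ask for containers, and the resource manager alone decides where they run.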
Combine With DW for Old-World/New-World Solution
But … Analytic RDBMSs Are The New Data Mart
•Most high-end RDBMS vendors provide connectors to load data into and out of Hadoop platforms
‣Bulk extract
‣External tables
‣Query federation
•Use high-end RDBMSs as specialist engines
‣a.k.a. "Data Marts"
[Figure: A Hadoop-based Big Data Management Platform (Big Data Platform, all running natively under Hadoop: YARN cluster resource management over HDFS, with batch (MapReduce), interactive (Impala, Drill, Tez, Presto), streaming + in-memory (Spark, Storm) and graph + search (Solr, Giraph) engines for modeling, scoring and an enriched customer profile) alongside Discovery & Development Labs (a safe & secure discovery and development environment with data sets and samples, models and programs), a Data Warehouse holding curated data for a historical view and business-aligned access, and Business Intelligence Tools.]
BI Innovation is happening

around Hadoop
hold on though…
isn’t Hadoop Slow?
too slow

for ad-hoc querying?
welcome to 2016
(Hadoop 2.0)
Hadoop is now fast
Hadoop 2.0 Processing Frameworks + Tools
Cloudera Impala - Fast, MPP-style Access to Hadoop Data
•Cloudera’s answer to Hive query response-time issues
•MPP SQL query engine running on Hadoop; bypasses MapReduce for direct data access
•Mostly in-memory, but spills to disk if required
•Uses the Hive metastore to access Hive table metadata
•Similar SQL dialect to Hive - not as rich, though, and no support for Hive SerDes, storage handlers etc.
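The general MPP pattern behind engines like Impala, where each node aggregates its local slice of the data and a coordinator merges the much smaller partial results, can be sketched with Python's standard library. The partitions are in-memory lists, threads stand in for separate machines, and the data is hypothetical; this illustrates the pattern only and says nothing about Impala's internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (country, amount) rows, split across three "nodes"
partitions = [
    [("DK", 10), ("UK", 5), ("DK", 3)],
    [("UK", 7), ("US", 1)],
    [("DK", 2), ("US", 4), ("US", 6)],
]


def local_aggregate(rows):
    """Runs on each node: SUM(amount) GROUP BY country over local rows only."""
    partial = Counter()
    for country, amount in rows:
        partial[country] += amount
    return partial


def coordinator_merge(partials):
    """Coordinator: combine the partial aggregates into the final answer."""
    total = Counter()
    for p in partials:
        total.update(p)  # adds counts key by key
    return dict(total)


# Threads stand in for separate machines scanning their own partitions
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(local_aggregate, partitions))

print(coordinator_merge(partials))  # {'DK': 15, 'UK': 12, 'US': 11}
```

Only one small partial result per node travels to the coordinator, which is why this shape scales much better than shipping every row to a single server.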
Apache Parquet - Column-Orientated Storage for Analytics
•Beginners usually store data in HDFS using text file formats (CSV), but these have limitations
•Apache Avro is often used for general-purpose processing
‣Splittability, schema evolution, in-built metadata, support for block compression
•Parquet is now commonly used with Impala due to its column-orientated storage
‣Mirrors work in the RDBMS world around column stores
‣Only return (project) the columns you require across a wide table
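Why a column-orientated layout helps when a query projects a few columns from a wide table can be shown with a toy Python model of the two layouts. It only counts how many stored values each layout has to touch; it doesn't model Parquet's actual encoding, compression or row groups, and the table is hypothetical.

```python
# A hypothetical "wide" table: 1,000 rows x 50 columns
NUM_ROWS, NUM_COLS = 1000, 50
columns = [f"c{i}" for i in range(NUM_COLS)]

# Row-orientated layout: each record stored together (like CSV lines)
row_store = [{c: r * NUM_COLS + i for i, c in enumerate(columns)} for r in range(NUM_ROWS)]

# Column-orientated layout: each column stored contiguously (the Parquet idea)
column_store = {c: [row[c] for row in row_store] for c in columns}


def scan_row_store(wanted):
    """Every whole row has to be read, even to return two columns."""
    touched = 0
    result = []
    for row in row_store:
        touched += len(row)
        result.append(tuple(row[c] for c in wanted))
    return result, touched


def scan_column_store(wanted):
    """Only the projected columns are read."""
    touched = sum(len(column_store[c]) for c in wanted)
    result = list(zip(*(column_store[c] for c in wanted)))
    return result, touched


rows_a, cost_a = scan_row_store(["c3", "c7"])
rows_b, cost_b = scan_column_store(["c3", "c7"])
assert rows_a == rows_b  # same answer either way
print(cost_a, cost_b)  # 50000 2000
```

Both layouts return identical results, but the columnar scan touches 2 of 50 columns, a 25x reduction in data read for this query.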
Cloudera Kudu - Combining Best of HBase and Column-Store
•But Parquet (and HDFS) have significant limitations for real-time analytics applications
‣Append-only orientation and focus on column-store make streaming ingestion harder
•Cloudera Kudu aims to combine the best of HDFS + HBase
‣Real-time analytics-optimised
‣Supports updates to data
‣Fast ingestion of data
‣Accessed using SQL-style tables and a get/put/update/delete API
Example Impala DDL + DML Commands with Kudu
•Kudu storage used with Impala - create tables using the Kudu storage handler
•Can now UPDATE, DELETE and INSERT into Hadoop tables, not just SELECT and LOAD DATA
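-- Create a Kudu-backed table through Impala (Impala-Kudu syntax of the time);
-- 'kudu.key_columns' names the primary key column
CREATE TABLE `my_first_table` (
  `id`   BIGINT,
  `name` STRING
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'my_first_table',
  'kudu.master_addresses' = 'kudu-master.example.com:7051',
  'kudu.key_columns' = 'id'
);

-- Insert a row
INSERT INTO my_first_table VALUES (99, 'sarah');
-- INSERT IGNORE skips rows whose primary key already exists (here, id 99)
INSERT IGNORE INTO my_first_table VALUES (99, 'sarah');
-- Row-level update and delete, not possible on plain HDFS-backed tables
UPDATE my_first_table SET name = 'bob' WHERE id = 3;
DELETE FROM my_first_table WHERE id < 3;
-- Delete driven by a join against another table
DELETE c FROM my_second_table c, stock_symbols s WHERE c.name = s.symbol;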
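The difference between Kudu's primary-keyed, updatable storage and HDFS's append-only files can be sketched as a toy Python class mirroring the statements above. It models only the semantics those statements illustrate (a unique key, INSERT vs INSERT IGNORE on a duplicate, row-level UPDATE and DELETE); it is not Kudu's client API, treating a plain duplicate INSERT as an error is an assumption for the illustration, and rows 1-3 are hypothetical seed data.

```python
class KeyedTable:
    """Toy primary-keyed table: each key maps to exactly one mutable row."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value, ignore=False):
        if key in self.rows:
            if ignore:
                return 0  # INSERT IGNORE: silently skip the duplicate key
            raise KeyError(f"key already present: {key}")  # assumed behaviour
        self.rows[key] = value
        return 1

    def update(self, predicate, value):
        """UPDATE ... SET name = value WHERE predicate(id)."""
        keys = [k for k in self.rows if predicate(k)]
        for k in keys:
            self.rows[k] = value
        return len(keys)

    def delete(self, predicate):
        """DELETE ... WHERE predicate(id)."""
        keys = [k for k in self.rows if predicate(k)]
        for k in keys:
            del self.rows[k]
        return len(keys)


t = KeyedTable()
for key, name in [(1, "ann"), (2, "joe"), (3, "kim"), (99, "sarah")]:
    t.insert(key, name)
t.insert(99, "sarah", ignore=True)  # duplicate key skipped, no error
t.update(lambda k: k == 3, "bob")   # UPDATE ... SET name = 'bob' WHERE id = 3
t.delete(lambda k: k < 3)           # DELETE ... WHERE id < 3
print(t.rows)  # {3: 'bob', 99: 'sarah'}
```

An append-only file format can only emulate the UPDATE and DELETE steps by rewriting files, which is the gap Kudu was designed to close.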
and it’s now in-memory
Apache Spark
•Another DAG execution engine running on YARN
•More mature than Tez, with a richer API and more vendor support
•Uses the concept of an RDD (Resilient Distributed Dataset)
‣RDDs are like tables or Pig relations, but can be cached in-memory
‣Great for in-memory transformations, or iterative/cyclic processes
•Spark jobs comprise a DAG of tasks operating on RDDs
•Access through Scala, Python or Java APIs
•Related projects include
‣Spark SQL
‣Spark Streaming
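Two RDD properties called out above, transformations that only record a DAG (lineage) until an action runs, and the option to cache a result in memory for reuse, can be mimicked in plain Python. This is a teaching toy, not PySpark: the class is hypothetical and its method names only echo Spark's map/filter/cache/collect vocabulary.

```python
class ToyRDD:
    """Lazily evaluated dataset: transformations record lineage, actions execute it."""

    def __init__(self, compute, parent=None):
        self._compute = compute  # function producing this dataset's records
        self.parent = parent     # lineage pointer (one edge of the DAG)
        self._cache = None
        self._cached = False

    @classmethod
    def parallelize(cls, data):
        return cls(lambda: list(data))

    def map(self, f):
        return ToyRDD(lambda: [f(x) for x in self._records()], parent=self)

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._records() if pred(x)], parent=self)

    def cache(self):
        self._cached = True
        return self

    def _records(self):
        if self._cached:
            if self._cache is None:
                self._cache = self._compute()  # materialise once, keep in memory
            return self._cache
        return self._compute()  # otherwise recompute from lineage

    def collect(self):  # action: triggers execution
        return list(self._records())

    def count(self):  # action
        return len(self._records())


calls = []


def expensive_parse(x):
    calls.append(x)  # record each time the transformation actually runs
    return x * 10


parsed = ToyRDD.parallelize(range(5)).map(expensive_parse).cache()
big = parsed.filter(lambda v: v >= 20)
assert calls == []  # nothing has run yet: only the DAG exists
print(big.collect())   # [20, 30, 40]
print(parsed.count())  # 5, served from the cache
print(len(calls))      # 5: the parse ran once per record, not twice
```

Without the `.cache()` call, the `count()` action would re-run `expensive_parse` over every record, which is exactly the cost iterative algorithms avoid by caching.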
Spark SQL - Adding SQL Processing to Apache Spark
•Spark SQL and DataFrames allow RDDs in Spark to be processed using SQL queries
•Bring in and federate additional data from JDBC sources
•Load, read and save data in Hive, Parquet and other structured tabular formats
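import org.apache.spark.sql.SaveMode
// In spark-shell (Spark 1.x) sqlContext is predefined; implicits enable .toDF() on RDDs of case classes
import sqlContext.implicits._

// accessLogs: assumed RDD of a case class with `agent` and `endpoint` fields for the current window
// Drop crawler traffic and WordPress admin/asset requests
val accessLogsFilteredDF = accessLogs
  .filter(r => !r.agent.matches(".*(spider|robot|bot|slurp).*"))
  .filter(r => !r.endpoint.matches(".*(wp-content|wp-admin).*"))
  .toDF()
// registerTempTable returns Unit, so register separately instead of chaining it into the val
accessLogsFilteredDF.registerTempTable("accessLogsFiltered")

// Triple-quoted string so the SQL can span lines; assumes a `posts` table is already registered
val topTenPostsLast24Hour = sqlContext.sql("""
  SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) AS total
  FROM accessLogsFiltered a
  JOIN posts p ON a.endpoint = p.POST_SLUG
  GROUP BY p.POST_TITLE, p.POST_AUTHOR
  ORDER BY total DESC
  LIMIT 10
""")

// Persist top ten table for this window to HDFS as parquet file
topTenPostsLast24Hour.write
  .mode(SaveMode.Overwrite)
  .parquet("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet")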
Accompanied by Innovations in Underlying Platform
Cluster Resource Management to

support multi-tenant distributed services
In-Memory Distributed Storage,

to accompany In-Memory Distributed Processing
Dataflow Pipelines 

are the new ETL
New ways to do BI
Hadoop is the new ETL Engine
Copyright © 2015, Oracle and/or its affiliates. All rights reserved. |
Proprietary ETL is Dead. Apache-based ETL is What’s Next
[Figure, from an Oracle Open World 2015 presentation: timeline of ETL from the 1990s (1994 marked) through five eras: Eon of Scripts and PL-SQL; Period of Proprietary Batch ETL Engines; Era of SQL E-LT/Pushdown; Big Data ETL in Batch; Streaming ETL. Tools shown along it: scripted SQL and stored procs; Informatica, Ascential/IBM, Ab Initio, Acta/SAP, SyncSort; Oracle Data Integrator and Warehouse Builder; ODI for Exadata, Columnar, In-Mem, Hive, Pig & Oozie, Spark and Spark Streaming. Callout: "Proprietary ETL engines die circa 2015, folded into big data".]
Machine Learning & Search for 

“Automagic” Schema Discovery
New ways to do BI
Google GOODS - Catalog + Search At Google-Scale
•By definition there's lots of data in a big data system ... so how do you find the data you want?
•Google's own internal solution - GOODS ("Google Dataset Search")
‣Uses a crawler to discover new datasets
‣ML classification routines to infer domain
‣Data provenance and lineage
‣Indexes and catalogs 26bn datasets
•Other users and vendors also have solutions
‣Oracle Big Data Discovery
‣Datameer
‣Platfora
‣Cloudera Navigator
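The crawl, infer and index loop behind dataset catalogs like GOODS can be illustrated with a small Python sketch: a crawler walks a directory tree (standing in for HDFS), infers a header-only schema from each CSV file, and builds an inverted index from column names to datasets so users can search. The directory layout, file names and CSV-only support are assumptions; GOODS itself, with ML-based domain classification and provenance tracking, is far richer.

```python
import csv
import os
import tempfile
from collections import defaultdict


def crawl(root):
    """Discover CSV datasets under root and infer a (header-only) schema for each."""
    catalog = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".csv"):
                path = os.path.join(dirpath, name)
                with open(path, newline="") as f:
                    header = next(csv.reader(f), [])
                rel = os.path.relpath(path, root).replace(os.sep, "/")
                catalog[rel] = [h.strip().lower() for h in header]
    return catalog


def build_index(catalog):
    """Inverted index: column name -> datasets containing it."""
    index = defaultdict(set)
    for dataset, cols in catalog.items():
        for col in cols:
            index[col].add(dataset)
    return index


def search(index, *terms):
    """Datasets containing every requested column."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Build a throwaway, hypothetical "data lake" to crawl
with tempfile.TemporaryDirectory() as lake:
    os.makedirs(os.path.join(lake, "sales"))
    os.makedirs(os.path.join(lake, "web"))
    for rel, header in [("sales/orders.csv", "order_id,customer_id,amount"),
                        ("web/clicks.csv", "session_id,customer_id,url"),
                        ("web/pages.csv", "url,title")]:
        with open(os.path.join(lake, rel), "w", newline="") as f:
            f.write(header + "\n")
    catalog = crawl(lake)

index = build_index(catalog)
print(search(index, "customer_id"))  # ['sales/orders.csv', 'web/clicks.csv']
```

Searching on a shared column such as `customer_id` surfaces joinable datasets that nobody had to register by hand, which is the core value of a crawled catalog.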
A New Take on BI
Web-Based Data Analysis Notebooks
•Came out of the data science movement, as a way to "show workings"
‣A set of reproducible steps that tell a story about the data
‣As well as being a better command-line environment for data analysis
•One example is Jupyter, an evolution of the IPython notebook
‣Supports PySpark, Pandas etc.
•See also Apache Zeppelin
Meanwhile
in the real world …
https://www.youtube.com/watch?v=h1UmdvJDEYY
And Emerging Open-Source

BI Tools and Platforms
http://larrr.com/wp-content/uploads/2016/05/paper.pdf
See an example in action:
https://speakerdeck.com/markrittman/oracle-big-data-discovery-extending-into-machine-learning-a-quantified-self-case-study
http://www.rittmanmead.com

EVENT SPEAKER
DANISH BI MEETUP, SEP’ 2016
FROM LOTS OF REPORTS (WITH SOME DATA ANALYSIS) 

TO MASSIVE DATA ANALYSIS (WITH SOME REPORTING)
MARK RITTMAN, ORACLE ACE DIRECTOR

More Related Content

What's hot

OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...Mark Rittman
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsMark Rittman
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Mark Rittman
 
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsMark Rittman
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Mark Rittman
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017Rittman Analytics
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyMark Rittman
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design PatternsJohn Yeung
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...StampedeCon
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016StampedeCon
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalMichael Rainey
 

What's hot (20)

OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...OTN EMEA TOUR 2016  - OBIEE12c New Features for End-Users, Developers and Sys...
OTN EMEA TOUR 2016 - OBIEE12c New Features for End-Users, Developers and Sys...
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 
Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...Unlock the value in your big data reservoir using oracle big data discovery a...
Unlock the value in your big data reservoir using oracle big data discovery a...
 
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business AnalyticsOracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
Oracle BI Hybrid BI : Mode 1 + Mode 2, Cloud + On-Premise Business Analytics
 
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop : Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
Enkitec E4 Barcelona : SQL and Data Integration Futures on Hadoop :
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case StudyOracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
Oracle Big Data Spatial & Graph 
Social Media Analysis - Case Study
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
Big Data Architecture and Design Patterns
Big Data Architecture and Design PatternsBig Data Architecture and Design Patterns
Big Data Architecture and Design Patterns
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
SQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle ProfessionalSQL on Hadoop for the Oracle Professional
SQL on Hadoop for the Oracle Professional
 

Viewers also liked

Data Lake Architektur: Von den Anforderungen zur Technologie
Data Lake Architektur: Von den Anforderungen zur TechnologieData Lake Architektur: Von den Anforderungen zur Technologie
Data Lake Architektur: Von den Anforderungen zur TechnologieJens Albrecht
 
Bitcoin & Blockchain for Friends
Bitcoin & Blockchain for FriendsBitcoin & Blockchain for Friends
Bitcoin & Blockchain for FriendsSam Wouters
 
7 Things Banks should do with Blockchain
7 Things Banks should do with Blockchain7 Things Banks should do with Blockchain
7 Things Banks should do with BlockchainSam Wouters
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Sotaro Kimura
 
Bb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBlackboard APAC
 
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBlackboard APAC
 
Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloud Native Hadoop #cwt2016
Cloud Native Hadoop #cwt2016Cloudera Japan
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldDataWorks Summit/Hadoop Summit
 
ICTSC5 大人達の戦い LT資料
ICTSC5 大人達の戦い LT資料ICTSC5 大人達の戦い LT資料
ICTSC5 大人達の戦い LT資料Ken SASAKI
 
ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話ICTSC5 DMM.comラボの紹介+お給料の話
ICTSC5 DMM.comラボの紹介+お給料の話Ken SASAKI
 
Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Kafkaを活用するためのストリーム処理の基本
Kafkaを活用するためのストリーム処理の基本Sotaro Kimura
 
ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話ICTSC6 ちょっとだけ数学の話
ICTSC6 ちょっとだけ数学の話Ken SASAKI
 
大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016大規模データに対するデータサイエンスの進め方 #CWT2016
大規模データに対するデータサイエンスの進め方 #CWT2016Cloudera Japan
 
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)ちょっと理解に自信がないなという皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)
ちょっと理解に自信がないな という皆さまに贈るHadoop/Sparkのキホン (IBM Datapalooza Tokyo 2016講演資料)hamaken
 
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) 40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料)
40分でわかるHadoop徹底入門 (Cloudera World Tokyo 2014 講演資料) hamaken
 
AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayShinpei Ohtani
 

Viewers also liked (20)

Data Lake Architektur: Von den Anforderungen zur Technologie
Data Lake Architektur: Von den Anforderungen zur TechnologieData Lake Architektur: Von den Anforderungen zur Technologie
Data Lake Architektur: Von den Anforderungen zur Technologie
 
Bitcoin & Blockchain for Friends
Bitcoin & Blockchain for FriendsBitcoin & Blockchain for Friends
Bitcoin & Blockchain for Friends
 
7 Things Banks should do with Blockchain
7 Things Banks should do with Blockchain7 Things Banks should do with Blockchain
7 Things Banks should do with Blockchain
 
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
Hadoop基盤上のETL構築実践例 ~多様なデータをどう扱う?~
 
Application of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructureApplication of postgre sql to large social infrastructure
Application of postgre sql to large social infrastructure
 
Bb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up ReportsBb Tour ANZ 17 - X-Ray Roll Up Reports
Bb Tour ANZ 17 - X-Ray Roll Up Reports
 
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product UpdatesBb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
Bb Tour ANZ 2017 - Moodlerooms & X-Ray Learning Analytics Product Updates
 
Cloud Native Hadoop #cwt2016

Organising the Data Lake - Information Management in a Big Data World

What's New in Apache Hadoop 2.8.0 (Excerpt)

ICTSC5: The Adults' Battle (Lightning Talk Slides)

ICTSC5: Introducing DMM.com Labo + A Talk About Pay

Stream Processing Basics for Making the Most of Kafka

ICTSC6: Just a Little Bit of Math

How to Approach Data Science on Large-Scale Data #CWT2016

Hadoop/Spark Basics for Anyone Not Quite Confident in Their Understanding (IBM Datapalooza Tokyo 2016 Talk Slides)

A Thorough Introduction to Hadoop in 40 Minutes (Cloudera World Tokyo 2014 Talk Slides)

AWS Lambda and Amazon API Gateway

Amazon Aurora

Application of postgre sql to large social infrastructure jp
 

Similar to From Lots of Reports (with Some Data Analysis) to Massive Data Analysis (with Some Reporting)

IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B... (Mark Rittman)

ODI 11g in the Enterprise - BIWA 2013 (Mark Rittman)

Hadoop and the Data Warehouse: Point/Counter Point (Inside Analysis)

Transform your DBMS to drive engagement innovation with Big Data (Ashnikbiz)

Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage (Precisely)

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)

Deploying Full Oracle BI Platforms to Oracle Cloud - OOW2015 (Mark Rittman)

Deploying Full BI Platforms to Oracle Cloud (Mark Rittman)

Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ... (Pentaho)

Trafodion overview (Rohit Jain)

Architecting Agile Data Applications for Scale (Databricks)

How to use Big Data and Data Lake concept in business using Hadoop and Spark... (Institute of Contemporary Sciences)

IBM Cloud Day January 2021 - A well architected data lake (Torsten Steinbach)

The Maturity Model: Taking the Growing Pains Out of Hadoop (Inside Analysis)

Introduction To Big Data & Hadoop (Blackvard)

5 Things that Make Hadoop a Game Changer (Caserta)

Prague data management meetup 2018-03-27 (Martin Bém)

Oracle Data Integration - Overview (Jeffrey T. Pollock)

ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... (DATAVERSITY)

Options for Data Prep - A Survey of the Current Market (Dremio Corporation)
 


More from Mark Rittman

IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...

OBIEE12c and Embedded Essbase 12c - An Initial Look at Query Acceleration Use...

Adding a Data Reservoir to your Oracle Data Warehouse for Customer 360-Degree...

What is Big Data Discovery, and how it complements traditional business anal...

Delivering the Data Factory, Data Reservoir and a Scalable Oracle Big Data Ar...

End to-end hadoop development using OBIEE, ODI, Oracle Big Data SQL and Oracl...

OBIEE11g Seminar by Mark Rittman for OU Expert Summit, Dubai 2015

BIWA2015 - Bringing Oracle Big Data SQL to OBIEE and ODI

OGH 2015 - Hadoop (Oracle BDA) and Oracle Technologies on BI Projects

UKOUG Tech'14 Super Sunday : Deep-Dive into Big Data ETL with ODI12c

Part 1 - Introduction to Hadoop and Big Data Technologies for Oracle BI & DW ...

Part 4 - Hadoop Data Output and Reporting using OBIEE11g

Part 2 - Hadoop Data Loading using Hadoop Tools and ODI12c
 



From Lots of Reports (with Some Data Analysis) to Massive Data Analysis (with Some Reporting)

  • 1. EVENT SPEAKER: DANISH BI MEETUP, SEP 2016
 FROM LOTS OF REPORTS (WITH SOME DATA ANALYSIS) TO MASSIVE DATA ANALYSIS (WITH SOME REPORTING)
 MARK RITTMAN, ORACLE ACE DIRECTOR
  • 2. About the Speaker
 info@rittmanmead.com www.rittmanmead.com @rittmanmead
 •Mark Rittman, Co-Founder of Rittman Mead
 ‣Oracle ACE Director, specialising in Oracle BI & DW
 ‣14 years' experience with Oracle technology
 ‣Regular columnist for Oracle Magazine
 •Author of two Oracle Press Oracle BI books
 ‣Oracle Business Intelligence Developers Guide
 ‣Oracle Exalytics Revealed
 ‣Writer for the Rittman Mead Blog: http://www.rittmanmead.com/blog
 •Email: mark.rittman@rittmanmead.com
 •Twitter: @markrittman
  • 3. 20 Years in Oracle BI and Data Warehousing
 •Started back in 1996 on an Oracle DW project for a bank
 •Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
 •Went on to use Oracle Developer/2000 and Designer/2000
 •Our initial users queried the DW using SQL*Plus
 •Later on, we rolled out Discoverer/2000 to everyone else
 •And life was fun…
  • 4. Data Warehouses and Enterprise BI Tools
 •Data warehouses provided a unified view of the business
 ‣Single place to store key data and metrics
 ‣Joined-up view of the business
 ‣Aggregates and conformed dimensions
 ‣ETL routines to load, cleanse and conform data
 •BI tools for simple, guided access to information
 ‣Tabular data access using SQL-generating tools
 ‣Drill paths, hierarchies, facts, attributes
 ‣Fast access to pre-computed aggregates
 ‣Packaged BI for fast-start ERP analytics
 [Diagram: source systems (Core ERP Platform, Retail, Banking, Call Center, E-Commerce, CRM) on Oracle, MongoDB, Sybase, IBM DB/2 and MS SQL Server databases feed a data warehouse (ODS / Foundation Layer; Access & Performance Layer), which Business Intelligence Tools query]
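The "SQL-generating tools" bullet can be illustrated with a minimal sketch. It assumes a hypothetical star schema, with a `fact_sales` fact table and `dim_customer` and `dim_date` dimensions, plus a toy metadata layer that maps business names to physical columns. Real BI tools such as OBIEE use far richer metadata models, and none of the names below come from the talk.

```python
# Hypothetical metadata layer: business names mapped to physical star-schema
# columns. All table and column names are illustrative only.
DIMENSIONS = {
    "Region": ("dim_customer", "region"),
    "Year": ("dim_date", "calendar_year"),
}
MEASURES = {
    "Revenue": "SUM(f.sale_amount)",
    "Orders": "COUNT(DISTINCT f.order_id)",
}
# Join condition from the fact table (alias f) to each dimension table.
JOINS = {
    "dim_customer": "f.customer_key = dim_customer.customer_key",
    "dim_date": "f.date_key = dim_date.date_key",
}


def generate_sql(dimensions, measures):
    """Translate business names into a GROUP BY query over the fact table."""
    select_cols, tables, group_by = [], [], []
    for name in dimensions:
        table, column = DIMENSIONS[name]
        select_cols.append(f'{table}.{column} AS "{name}"')
        group_by.append(f"{table}.{column}")
        if table not in tables:
            tables.append(table)  # join each dimension table only once
    for name in measures:
        select_cols.append(f'{MEASURES[name]} AS "{name}"')
    sql = "SELECT " + ", ".join(select_cols) + "\nFROM fact_sales f"
    for table in tables:
        sql += f"\nJOIN {table} ON {JOINS[table]}"
    if group_by:  # only aggregate by dimensions when some were requested
        sql += "\nGROUP BY " + ", ".join(group_by)
    return sql


print(generate_sql(["Region"], ["Revenue"]))
```

The user only ever picks "Region" and "Revenue". The join paths and the GROUP BY clause come from the metadata, which is what made this style of tool approachable for people who did not write SQL.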
  • 5. Reporting Back Then…
 •Examples were Crystal Reports, Oracle Reports, Cognos Impromptu, Business Objects
 •Reports were written against a carefully curated BI dataset, or connected directly to ERP/CRM
 •Adding data from external sources, or other RDBMSs, was difficult and involved IT resources
 •Report-writing was a skilled job
 •High ongoing cost for maintenance and changes
 •Little scope for analysis or predictive modeling
 •Often user frustration at the pace of delivery
  • 6. Then Came Enterprise BI Tools
 •For example Oracle OBIEE, SAP Business Objects, IBM Cognos
 •Full-featured, IT-orientated enterprise BI platforms
 •Metadata layers, integrated security, web delivery
 •Pre-built ERP metadata layers, dashboards + reports
 •Federated queries across multiple sources
 •Single version of the truth across the enterprise
 •Mobile, web dashboards, alerts, published reports
 •Integration with SOA and web services
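A federated query can be sketched as pushing work down to each source and then combining the partial results in the BI server. This minimal example assumes two hypothetical sources, an ERP `orders` table and a CRM `customers` table, simulated with in-memory SQLite databases. It is not how any particular product implements federation.

```python
import sqlite3

# Two independent "source systems" (hypothetical schemas), each its own database.
erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER, segment TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Enterprise"), (2, "SMB")])


def federated_revenue_by_segment():
    """Aggregate in the ERP source, look up segments in the CRM source,
    then join the two partial results in the 'BI server' layer."""
    # Push the SUM down to the source that owns the detail rows.
    revenue = dict(erp.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
    segments = dict(crm.execute("SELECT customer_id, segment FROM customers"))
    result = {}
    for customer_id, total in revenue.items():
        segment = segments.get(customer_id, "Unknown")
        result[segment] = result.get(segment, 0.0) + total
    return dict(sorted(result.items()))  # stable, alphabetical output


print(federated_revenue_by_segment())  # {'Enterprise': 150.0, 'SMB': 75.0}
```

Pushing the aggregation down means only one row per customer crosses the network. That is the main reason federated engines try to do as much work as possible inside each source.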
  • 7. Traditional Three-Layer Relational Data Warehouses
 [Diagram: Traditional Relational Data Warehouse. Traditional structured data sources are loaded into a Staging layer; ETL moves data into a Foundation / ODS layer, and ETL again into a Performance / Dimensional layer. A BI tool (OBIEE) with a metadata layer reads directly from the warehouse, while an OLAP / in-memory tool loads data into its own database.]
 •Three-layer architecture: staging, foundation and access/performance
 •All three layers stored in a relational database (Oracle)
 •ETL used to move data from layer to layer
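The flow from the staging layer through the foundation layer to the performance layer can be sketched in a few lines of Python. This assumes hypothetical in-memory rows rather than Oracle tables. It also simplifies the performance layer to a single pre-computed aggregate; in practice that layer usually holds dimensional star schemas as well.

```python
from collections import defaultdict

# Staging layer: rows landed as-is from a source system (hypothetical data).
staging = [
    {"cust": " ACME ", "country": "dk", "amount": "100.50"},
    {"cust": "Globex", "country": "DK", "amount": "20"},
    {"cust": "acme", "country": "se", "amount": "5.25"},
]


def to_foundation(rows):
    """Cleanse and conform: trim and normalise names, upper-case country
    codes, and cast amounts from text to numbers."""
    return [
        {
            "customer": r["cust"].strip().title(),
            "country": r["country"].strip().upper(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]


def to_performance(rows):
    """Pre-compute an aggregate (total amount per country) for fast BI queries."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)


foundation = to_foundation(staging)
performance = to_performance(foundation)
print(performance)  # {'DK': 120.5, 'SE': 5.25}
```

Each step is a separate ETL hop, so a cleansing rule changed in `to_foundation` flows through to every downstream aggregate on the next load.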
  • 8. And All Was Good…
  • 9. (a big BI project)
  • 10.
  • 11. Lots of reports (with some data analysis)
  • 14.
  • 15. and users got impatient…
  • 16.
  • 17. Reporting and dashboards… became self-service data discovery
  • 19. Cloud and SaaS have won
  • 20.
  • 22. The Gartner BI & Analytics Magic Quadrant 2016
  • 23.
  • 24.
  • 25. Gartner's View of a "Modern BI Platform" in 2016
 Analytic workflow component: traditional BI platform vs. modern BI platform
 ‣Data source: upfront dimensional modeling required (IT-built star schemas) vs. upfront modeling not required (flat files / flat tables)
 ‣Data ingestion and preparation: IT-produced vs. IT-enabled
 ‣Content authoring: primarily IT staff, but also some power users vs. business users
 ‣Analysis: predefined, ad hoc reporting, based on a predefined model vs. free-form exploration
 ‣Insight delivery: distribution and notifications via scheduled reports or portal vs. sharing and collaboration, storytelling, open APIs
  • 26. 2007 - 2015
Died of ingratitude by business users
Just when we got the infrastructure right
Doesn’t anyone appreciate a single version of the truth?
Don’t say we didn’t warn you
No you can’t just export it to Excel
Watch out OLAP you’re next
  • 28. Meet the New Data Warehouse: The “Data Reservoir”
•Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Data lands in the data lake or reservoir in raw form, then is minimally processed
•Data then accessed directly by “data scientists”, or processed further into the DW
[Diagram: Data Reservoir on a Hadoop platform, fed from operational data (transactions, customer master data), unstructured data (voice + chat transcripts) and data streams via file-, stream- and ETL-based integration; a Data Factory turns Raw Customer Data (stored in the original format, usually files, such as SS7, ASN.1, JSON etc.) into Mapped Customer Data (data sets produced by mapping and transforming raw data); consumed by Business Intelligence Tools, Marketing / Sales Applications (models, machine learning, segments) and Discovery & Development Labs (a safe & secure discovery and development environment for data sets, samples, models and programs)]
  • 29. Hadoop is the new Data Warehouse
  • 30. Hadoop: The Default Platform Today for Analytics
•Enterprise High-End RDBMSs such as Oracle can scale into the petabytes, using clustering
‣Sharded databases (e.g. Netezza) can scale further but with complexity / single workload trade-offs
•Hadoop was designed from the outset for massive horizontal scalability - using cheap hardware
•Anticipates hardware failure and makes multiple copies of data as protection
•The more nodes you add, the more stable it becomes
•And at a fraction of the cost of traditional RDBMS platforms
  • 31. But Why Hadoop? Reason #1 - Flexible Storage
•Data from new-world applications is not like historic data
‣Typically comes in non-tabular form
‣JSON, log files, key/value pairs
•Users often want it speculatively
‣Haven’t thought it through
•Schema can evolve
‣Or maybe there isn’t one
•But the end-users want it now
‣Not when you’re ready
[Diagram: Big Data Management Platform and Discovery & Development Labs (safe & secure environment for data sets and samples, models and programs) using schema-on-read analysis, correlating, modeling, machine learning and scoring to build a single customer view / enriched customer profile]
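Schema-on-read means the raw data is stored exactly as it arrives and a structure is applied only when it is read. A minimal Python sketch, assuming JSON-lines events with hypothetical field names; engines such as Hive or Spark do this at scale, but `infer_schema` and `read_with_schema` here are illustrative helpers, not their APIs.

```python
import json

# Hypothetical raw events landed as JSON lines, stored as-is with no upfront schema;
# the third record adds a field the earlier ones lack.
RAW = [
    '{"user": "u1", "event": "view", "page": "/home"}',
    '{"user": "u2", "event": "click", "page": "/pricing"}',
    '{"user": "u1", "event": "click", "page": "/pricing", "device": "mobile"}',
]

def infer_schema(records):
    """Derive a column -> set of Python type names mapping at read time."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

def read_with_schema(lines):
    """Apply the inferred schema when reading: missing fields come back as None."""
    records = [json.loads(line) for line in lines]
    schema = infer_schema(records)
    columns = sorted(schema)
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows

columns, rows = read_with_schema(RAW)
print(columns)  # ['device', 'event', 'page', 'user']
print(rows[0])  # (None, 'view', '/home', 'u1')
```

Nothing had to be changed at load time when the `device` field appeared; the new column simply shows up the next time the data is read.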
  • 32. But Why Hadoop? Reason #2 - Massive Scalability
•Enterprise High-End RDBMSs such as Oracle can scale
‣Clustering for single-instance DBs can scale to >PB
‣Exadata scales further by offloading queries to storage
‣Sharded databases (e.g. Netezza) can scale further
‣But cost (and complexity) become limiting factors
‣$1m/node is not uncommon
  • 33. But Why Hadoop? Reason #2 - Massive Scalability
  • 34. But Why Hadoop? Reason #2 - Massive Scalability
•Hadoop’s main design goal was to enable virtually limitless horizontal scalability
•Rather than a small number of large, powerful servers, it spreads processing over large numbers of small, cheap, redundant servers
•Processes the data where it’s stored, avoiding I/O bottlenecks
•The more nodes you add, the more stable it becomes!
•At an affordable cost - this is key
‣$50k/node vs. $1m/node
•And … the Hadoop platform is a better fit for new types of processing and analysis
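Why more nodes can mean more stability comes down to replication: each block of data is stored on several different servers, so losing a server does not lose data. A simplified Python sketch, assuming three replicas per block (HDFS's default replication factor) and random placement; real HDFS placement is rack-aware, and `place_blocks` / `readable_blocks` are hypothetical helpers.

```python
import random

def place_blocks(num_blocks, nodes, replication=3, seed=0):
    """Put each block on `replication` distinct nodes (random, not rack-aware)."""
    rng = random.Random(seed)
    return {block: rng.sample(nodes, replication) for block in range(num_blocks)}

def readable_blocks(placement, failed):
    """A block survives if at least one of its replicas is on a live node."""
    return {block for block, replicas in placement.items()
            if any(node not in failed for node in replicas)}

nodes = [f"node{i}" for i in range(10)]
placement = place_blocks(num_blocks=1000, nodes=nodes)

# With three replicas on distinct nodes, any two node failures lose no data.
failed = {"node3", "node7"}
print(len(readable_blocks(placement, failed)))  # 1000
```

On the slide's own figures, the budget for one $1m node would buy twenty $50k nodes, each adding storage, compute and another place to keep replicas.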
  • 35. But Why Hadoop? Reason #3 - Processing Frameworks
[Diagram: Big Data Platform, all running natively under Hadoop YARN (cluster resource management) on HDFS (cluster filesystem holding raw data): Batch (MapReduce); Interactive (Impala, Drill, Tez, Presto); Streaming + In-Memory (Spark, Storm); Graph + Search (Solr, Giraph); producing an enriched customer profile via modeling and scoring]
•Hadoop started by being synonymous with MapReduce, and Java coding
•But YARN (Yet Another Resource Negotiator) broke this dependency
•Modern Hadoop platforms provide overall cluster resource management, but support multiple processing frameworks
‣General-purpose (e.g. MapReduce)
‣Graph processing
‣Machine Learning
‣Real-Time Processing (Spark Streaming, Storm)
•Even the Hadoop resource management framework can be swapped out
‣Apache Mesos
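MapReduce, the original Hadoop processing model, splits a job into a map step, a shuffle that groups emitted values by key, and a reduce step. A single-process Python sketch of the classic word-count example, assuming in-memory lists of text lines; on a cluster each phase would run in parallel across nodes, and these functions are illustrative rather than Hadoop's Java API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in an input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle/sort: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

lines = ["Hadoop stores data", "Spark processes data", "YARN schedules Spark and Hadoop jobs"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["data"], counts["spark"], counts["hadoop"])  # 2 2 2
```

Because each mapper and reducer only sees its own slice of the data, the same three functions scale out simply by running more copies of them.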
  • 36. Combine With DW for Old-World/New-World Solution
  • 37. But … Analytic RDBMSs Are The New Data Mart
•Most high-end RDBMS vendors provide connectors to load data in/out of Hadoop platforms
‣Bulk extract
‣External tables
‣Query federation
•Use high-end RDBMSs as specialist engines
‣a.k.a. "Data Marts"
[Diagram: Data Warehouse (curated data: historical view and business-aligned access) serving Business Intelligence Tools, alongside the Big Data Management Platform (batch, interactive, streaming + in-memory and graph + search engines running natively under Hadoop YARN over HDFS) and Discovery & Development Labs (safe & secure environment for data sets and samples, models and programs)]
  • 38. BI Innovation is happening around Hadoop
  • 39. BI Innovation is happening around Hadoop
  • 50. Hadoop 2.0 Processing Frameworks + Tools
  • 51. Cloudera Impala - Fast, MPP-style Access to Hadoop Data
•Cloudera’s answer to Hive query response time issues
•MPP SQL query engine running on Hadoop, bypasses MapReduce for direct data access
•Mostly in-memory, but spills to disk if required
•Uses Hive metastore to access Hive table metadata
•Similar SQL dialect to Hive - not as rich though, and no support for Hive SerDes, storage handlers etc
  • 52. Apache Parquet - Column-Orientated Storage for Analytics
•Beginners usually store data in HDFS using text file formats (CSV) but these have limitations
•Apache Avro often used for general-purpose processing
‣Splittability, schema evolution, in-built metadata, support for block compression
•Parquet now commonly used with Impala due to column-orientated storage
‣Mirrors work in RDBMS world around column-store
‣Only return (project) the columns you require across a wide table
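The benefit of column-orientated storage for a projection can be shown by counting how many values each layout has to read. A deliberately simplified Python sketch, assuming a small hypothetical table; real Parquet files also add per-column encoding, compression and statistics, which this ignores.

```python
from collections import defaultdict

# Hypothetical wide table: five columns, but the query below needs only two.
rows = [
    {"id": 1, "name": "ann", "country": "DK", "spend": 120.0, "notes": "free text"},
    {"id": 2, "name": "bob", "country": "UK", "spend": 80.0, "notes": "free text"},
    {"id": 3, "name": "cai", "country": "DK", "spend": 45.0, "notes": "free text"},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column (column-oriented layout)."""
    return {col: [r[col] for r in records] for col in records[0]}

def values_read_row_layout(records):
    """Row layout: each record is stored together, so a scan reads every field."""
    return sum(len(r) for r in records)

def values_read_column_layout(columns, wanted):
    """Column layout: only the projected columns are read."""
    return sum(len(columns[c]) for c in wanted)

columns = to_columns(rows)
wanted = ["country", "spend"]

# Equivalent of SELECT country, SUM(spend) ... GROUP BY country, using only two columns
spend_by_country = defaultdict(float)
for country, spend in zip(columns["country"], columns["spend"]):
    spend_by_country[country] += spend

print(values_read_row_layout(rows), values_read_column_layout(columns, wanted))  # 15 6
print(dict(spend_by_country))  # {'DK': 165.0, 'UK': 80.0}
```

The wider the table and the fewer columns a query touches, the bigger the gap between the two counts.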
  • 53. Cloudera Kudu - Combining Best of HBase and Column-Store
•But Parquet (and HDFS) have significant limitations for real-time analytics applications
‣Append-only orientation and focus on column-store make streaming ingestion harder
•Cloudera Kudu aims to combine the best of HDFS + HBase
‣Real-time analytics-optimised
‣Supports updates to data
‣Fast ingestion of data
‣Accessed using SQL-style tables and get/put/update/delete API
  • 54. Example Impala DDL + DML Commands with Kudu
•Kudu storage used with Impala - create tables using Kudu storage handler
•Can now UPDATE, DELETE and INSERT into Hadoop tables, not just SELECT and LOAD DATA

    -- Create an Impala table stored in Kudu, keyed on id
    CREATE TABLE `my_first_table` (
      `id` BIGINT,
      `name` STRING
    )
    TBLPROPERTIES(
      'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
      'kudu.table_name' = 'my_first_table',
      'kudu.master_addresses' = 'kudu-master.example.com:7051',
      'kudu.key_columns' = 'id'
    );

    -- Insert a row; INSERT IGNORE skips a row whose key already exists
    INSERT INTO my_first_table VALUES (99, "sarah");
    INSERT IGNORE INTO my_first_table VALUES (99, "sarah");

    -- Change and remove rows in place
    UPDATE my_first_table SET name="bob" where id = 3;
    DELETE FROM my_first_table WHERE id < 3;

    -- Delete rows that match a join against another table
    DELETE c FROM my_second_table c, stock_symbols s WHERE c.name = s.symbol;
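The difference between append-only files and a keyed store like Kudu is easiest to see in how an update is done. A toy Python model, assuming a single-node in-memory table; `KeyedTable` and `update_append_only` are hypothetical, and Kudu's actual duplicate-key behaviour for INSERT varies between Impala/Kudu versions.

```python
class KeyedTable:
    """Toy primary-key table (hypothetical): rows are addressed by key,
    so single rows can be inserted, updated and deleted in place."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value, ignore=False):
        if key in self.rows:
            if ignore:
                return False  # INSERT IGNORE-style: skip the duplicate quietly
            raise KeyError(f"duplicate primary key: {key}")
        self.rows[key] = value
        return True

    def update(self, key, value):
        if key in self.rows:
            self.rows[key] = value

    def delete_where(self, predicate):
        for key in [k for k in self.rows if predicate(k)]:
            del self.rows[key]


def update_append_only(rows, key, value):
    """Append-only files can't be changed in place: 'updating' one row
    means writing out a complete new copy of every row."""
    return [(k, value if k == key else v) for k, v in rows]


# Replay the slide's statements against the toy table.
t = KeyedTable()
for k, v in [(1, "ann"), (2, "cai"), (3, "dan")]:
    t.insert(k, v)
t.insert(99, "sarah")
t.insert(99, "sarah", ignore=True)  # returns False, table unchanged
t.update(3, "bob")
t.delete_where(lambda k: k < 3)
print(t.rows)  # {3: 'bob', 99: 'sarah'}
```

The keyed version touches one entry per UPDATE; the append-only version rewrites the whole data set, which is why frequent updates and streaming ingestion are awkward on plain HDFS files.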
  • 55. and it’s now in-memory
  • 57. Apache Spark
•Another DAG execution engine running on YARN
•More mature than Tez, with richer API and more vendor support
•Uses concept of an RDD (Resilient Distributed Dataset)
‣RDDs like tables or Pig relations, but can be cached in-memory
‣Great for in-memory transformations, or iterative/cyclic processes
•Spark jobs comprise a DAG of tasks operating on RDDs
•Access through Scala, Python or Java APIs
•Related projects include
‣Spark SQL
‣Spark Streaming
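Two of the RDD ideas above can be shown in miniature: transformations are recorded lazily as a lineage and only run when an action asks for results, and caching keeps a computed dataset in memory for reuse across iterations. A toy, single-process Python analogue, assuming a hypothetical `LazyDataset` class; it is not Spark's API and ignores partitioning, distribution and fault recovery.

```python
class LazyDataset:
    """Toy, single-process analogue of an RDD (hypothetical, not Spark's API).
    Transformations are recorded as lineage and only run when collect() is
    called; cache() keeps the computed result in memory for reuse."""

    def __init__(self, source, steps=()):
        self._source = source        # callable returning the base data
        self._steps = list(steps)    # recorded lineage: ("map" | "filter", fn)
        self._cache_enabled = False
        self._cached = None
        self.computations = 0        # times the lineage was actually evaluated

    def map(self, fn):
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    def cache(self):
        self._cache_enabled = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        self.computations += 1
        data = list(self._source())
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        if self._cache_enabled:
            self._cached = data
        return data


# Iterative use: without cache() the lineage would be re-evaluated on every pass.
even_squares = (LazyDataset(lambda: range(10))
                .map(lambda x: x * x)
                .filter(lambda x: x % 2 == 0)
                .cache())
for _ in range(5):
    total = sum(even_squares.collect())
print(total, even_squares.computations)  # 120 1
```

Recording the lineage rather than the data is also what lets a real RDD be rebuilt after a node failure, which this toy does not attempt.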
  • 58. Spark SQL - Adding SQL Processing to Apache Spark
•Spark SQL and DataFrames allow RDDs in Spark to be processed using SQL queries
•Bring in and federate additional data from JDBC sources
•Load, read and save data in Hive, Parquet and other structured tabular formats

    import org.apache.spark.sql.SaveMode
    import sqlContext.implicits._  // needed for .toDF() on an RDD of case classes

    // Assumes accessLogs is an RDD of a case class with agent and endpoint fields,
    // and that a "posts" table is already registered with sqlContext
    val accessLogsFilteredDF = accessLogs
      .filter(r => !r.agent.matches(".*(spider|robot|bot|slurp).*"))
      .filter(r => !r.endpoint.matches(".*(wp-content|wp-admin).*"))
      .toDF()
    accessLogsFilteredDF.registerTempTable("accessLogsFiltered")

    val topTenPostsLast24Hour = sqlContext.sql("""
      SELECT p.POST_TITLE, p.POST_AUTHOR, COUNT(*) AS total
      FROM accessLogsFiltered a
      JOIN posts p ON a.endpoint = p.POST_SLUG
      GROUP BY p.POST_TITLE, p.POST_AUTHOR
      ORDER BY total DESC
      LIMIT 10""")

    // Persist top ten table for this window to HDFS as parquet file
    topTenPostsLast24Hour.write
      .mode(SaveMode.Overwrite)
      .parquet("/user/oracle/rm_logs_batch_output/topTenPostsLast24Hour.parquet")
  • 59. Accompanied by Innovations in Underlying Platform
•Cluster Resource Management to support multi-tenant distributed services
•In-Memory Distributed Storage, to accompany In-Memory Distributed Processing
  • 61. New ways to do BI
  • 62. New ways to do BI
  • 63. Hadoop is the new ETL Engine
  • 64. Proprietary ETL is Dead. Apache-based ETL is What’s Next (Oracle OpenWorld 2015; Copyright © 2015, Oracle and/or its affiliates)
•Proprietary ETL engines die circa 2015: folded into big data
[Timeline of ETL eras: 1990’s Eon of Scripts and PL-SQL (scripted SQL, stored procs); Period of Proprietary Batch ETL Engines (Informatica, Ascential/IBM, Ab Initio, Acta/SAP, SyncSort); Era of SQL E-LT/Pushdown; Big Data ETL in Batch; Streaming ETL. Oracle tools shown along the timeline: Warehouse Builder, Oracle Data Integrator, ODI for Exadata, ODI for Columnar, ODI for In-Mem, ODI for Hive, ODI for Pig & Oozie, ODI for Spark, ODI for Spark Streaming]
  • 65. Machine Learning & Search for “Automagic” Schema Discovery
  • 66. New ways to do BI
  • 67. Google GOODS - Catalog + Search At Google-Scale
•By definition there's lots of data in a big data system ... so how do you find the data you want?
•Google's own internal solution - GOODS ("Google Dataset Search")
‣Uses a crawler to discover new datasets
‣ML classification routines to infer domain
‣Data provenance and lineage
‣Indexes and catalogs 26bn datasets
•Other users, vendors also have solutions
‣Oracle Big Data Discovery
‣Datameer
‣Platfora
‣Cloudera Navigator
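A catalog of this kind boils down to crawling dataset metadata, tagging each dataset with an inferred domain, and indexing the results for search. A minimal Python sketch, assuming hypothetical crawled paths and column names; the regex rules stand in for the ML classifiers mentioned above and are not how GOODS actually works.

```python
import re

# Hypothetical crawled dataset metadata: path -> column names.
CRAWLED = {
    "/data/raw/web_logs": ["ts", "client_ip", "url", "user_agent"],
    "/data/crm/customers": ["customer_id", "email", "postcode"],
    "/data/finance/invoices": ["invoice_id", "customer_id", "amount"],
}

# Rule-based stand-in for ML domain classification (an assumption for illustration).
DOMAIN_RULES = {
    "web": re.compile(r"url|user_agent|client_ip"),
    "customer": re.compile(r"customer|email|postcode"),
    "finance": re.compile(r"invoice|amount"),
}

def build_catalog(crawled):
    """Index each dataset by its column names and inferred domains."""
    catalog = {}
    for path, cols in crawled.items():
        domains = {d for d, rx in DOMAIN_RULES.items() if any(rx.search(c) for c in cols)}
        catalog[path] = {"columns": set(cols), "domains": domains}
    return catalog

def search(catalog, term):
    """Return dataset paths whose inferred domains or column names match the term."""
    term = term.lower()
    return sorted(path for path, meta in catalog.items()
                  if term in meta["domains"] or any(term in c for c in meta["columns"]))

catalog = build_catalog(CRAWLED)
print(search(catalog, "customer"))  # ['/data/crm/customers', '/data/finance/invoices']
```

The invoices data set turns up in a "customer" search because its `customer_id` column tags it with that domain, which is the kind of cross-dataset discovery a catalog is for.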
  • 68. A New Take on BI
  • 69. Web-Based Data Analysis Notebooks
•Came out of the data science movement, as a way to "show workings"
•A set of reproducible steps that tell a story about the data
•As well as being a better command-line environment for data analysis
•One example is Jupyter, evolution of the IPython notebook
‣Supports PySpark, Pandas etc
•See also Apache Zeppelin
  • 70. Meanwhile in the real world … https://www.youtube.com/watch?v=h1UmdvJDEYY
  • 71. And Emerging Open-Source BI Tools and Platforms
  • 72. And Emerging Open-Source BI Tools and Platforms http://larrr.com/wp-content/uploads/2016/05/paper.pdf
  • 74. And Emerging Open-Source BI Tools and Platforms
  • 76. To see an example:
  • 77. See an example in action: https://speakerdeck.com/markrittman/oracle-big-data-discovery-extending-into-machine-learning-a-quantified-self-case-study