Oracle Unified Information Architecture + Analytics by Example

Oracle's Unified Information Architecture in Action
Harald Erb Oracle Business Analytics EMEA Local Cluster DE/CH

Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Agenda
 Oracle's Unified Information Architecture
 Analytics by Example – The MoviePlex Lab
– MoviePlex Application & Architecture
– SQL Access over Hadoop for Oracle BI
(Hive vs. Impala, Oracle SQL Connector for Hadoop)
– Oracle DWH 12c SQL Pattern Matching (Work in Progress)
 MoviePlex Lab – extended: BI Mobile Example

Oracle's Unified Information Architecture

Traditional Data Warehouse / BI Architectures *)
 Warehouse is usually a three-layer architecture:
Staging, Foundation and Access/Performance Layer
 All three Layers stored in a relational database (Oracle),
and additionally in other data sources (e.g. Essbase data marts)
 ETL used to move data from Layer-to-Layer
*) Copyright © 2013 Deloitte Development LLC

BI system refresh is closely coupled with ETL
Example: Required time slots to refresh an Exalytics system
[Figure: refresh timeline with time slots t1 to t4. Labels: Oracle DWH Reference Architecture, TimesTen (Exalytics), Oracle BI Foundation Suite (Exalytics), Load Times (Full / Incremental), Cache Seeding, Information Access]
 TimesTen Database (t3)
– BI Summary Advisor
recommends necessary
Aggregate Tables
– They need to be loaded, compressed, refreshed and indexed,
their statistics updated, etc.
 BI Server Cache (t4)
– has to be purged and then seeded, e.g. triggered by the
BI Server's Event Polling mechanism or via scripts
using nqcmd
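
The scripted route for t4 can be sketched as follows. This is a minimal sketch, not the lab's actual script: the DSN, credentials, file names, subject area "MoviePlex" and the seed query are hypothetical placeholders. `SAPurgeAllCache()` is the BI Server procedure that purges the whole query cache; seeding replays logical SQL so that the results land in the cache again.

```python
# Logical SQL sent to the BI Server; the seed query is a hypothetical placeholder.
PURGE_STATEMENT = "Call SAPurgeAllCache();"
SEED_QUERIES = [
    'SELECT "Time"."Year", "Sales"."Revenue" FROM "MoviePlex";',
]


def script_text(statements):
    """Return the content of an nqcmd input file, one statement per line."""
    return "\n".join(statements) + "\n"


def nqcmd_args(dsn, user, password, script_path, output_path):
    """Build the nqcmd call: -d DSN, -u user, -p password, -s script, -o output."""
    return ["nqcmd", "-d", dsn, "-u", user, "-p", password,
            "-s", script_path, "-o", output_path]


def refresh_plan(dsn, user, password):
    """Return (script name, script text, nqcmd arguments): purge first, then seed."""
    steps = [
        ("purge_cache.sql", [PURGE_STATEMENT], "purge.log"),
        ("seed_cache.sql", SEED_QUERIES, "seed.log"),
    ]
    return [
        (name, script_text(statements), nqcmd_args(dsn, user, password, name, log))
        for name, statements, log in steps
    ]
```

To use it, write each script text to its file name and run the argument list (for example with `subprocess.run(args, check=True)`) after the ETL load has finished, so that the cache is seeded from fresh data.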

Deeper Insight Exists Beyond Structured Data
„High Value Data“

The rise of Big Data and Hadoop
 New way to process, store and analyze data
– Family of open-source products used to store and analyze
distributed datasets
– Hadoop is the enabling framework, automatically parallelizes
and co-ordinates jobs
– MapReduce is the programming framework for filtering, sorting
and aggregating data - can be written in any language (Java etc)
 New paradigm for TCO - low-cost servers, cheap clustering
– Hadoop can be used as an extension of the DWH staging
layer  cheap processing & storage
– But complex analytic algorithms running against large sets of
multi-structured data are much faster on Hadoop
 BI users might benefit from additional data stored in Hadoop
Apache Hadoop = most well-known Big Data technology

Hadoop Distributed File System (HDFS)
Low-Cost, Clustered, Fault-Tolerant Storage
 The filesystem behind Hadoop, used to store
data for Hadoop analysis
– Unix-like, uses commands such as
ls, mkdir, chown, chmod
– Fault-tolerant, with rapid fault detection
and recovery
– High-throughput, with streaming data access
and large block sizes
– Allows fast loads, no structure syntax checks
 Designed for data-locality, placing data close
to where it is processed
 Accessed from the command-line, via internet
(hdfs://), GUI tools etc

The Hadoop “Data Warehouse”
Idea: Process the data locally where it lives – then return only the results
 Hive
– SQL-like semantic DWH layer
for Hadoop
– Facebook helped to develop
Hive – now adopted as Apache
Project
 Could even be used instead
of a traditional DWH or
data mart:
– Limited functionality now
– But products maturing
– with unbeatable TCO
Source: M. Rittman, Oracle BIWA SIG Summit 2014, San Francisco

Cloudera Distribution of Hadoop (CDH)
Complete Hadoop solution - part of Oracle’s Big Data Appliance
 CDH delivers core elements
of Hadoop – scalable storage
and distributed computing
 Additional components:
– user interface
– enterprise capabilities
(e.g. security)
– integration with a broad
range of hardware and
software solutions
– entire solution is thoroughly
tested and fully documented
2014 Gartner Magic Quadrant for DWH
Database Management Systems

“[Facebook] started in the Hadoop world. We are now
bringing in relational to enhance that. We're kind of going [in]
the other direction ... We've been there, and [we] realized
that using the wrong technology for certain kinds of
problems can be difficult.”
Ken Rubin
Director of Analytics
Facebook
Source: http://tdwi.org/Articles/2013/05/06/Facebooks-Relational-Platform.aspx?Page=1
No need for Relational Warehouses anymore?

The new Analytics Warehouse Architecture...*)

...can be implemented with Oracle *)

Keep all potentially valuable data
Data Warehouse: Oracle Database, “Schema on write”
• Integrated data
• Accurate/trusted
• Modeled
• Aggregated
• Consistent
• Cleansed
• Optimized performance
• Metadata
Data Reservoir: Cloudera Hadoop, “Schema on read”
• Structured and unstructured
• Fast loads
• Historical archive
• Cheap storage
• Fault tolerant
• Parallel processing
Oracle Big Data Connectors, Oracle Data Integrator
Oracle Unified Information Architecture
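
The difference between “Schema on write” and “Schema on read” can be illustrated with a small sketch. It is an illustration under assumptions, not how Oracle Database or Hadoop implement either model: the column names in `SCHEMA` are hypothetical, and plain Python lists stand in for the warehouse table and the reservoir.

```python
import json

# Hypothetical target schema: column name -> type used to cast each value.
SCHEMA = {"custId": int, "movieId": int, "activity": int}


def write_with_schema(table, record):
    """Schema on write: validate and cast before the row is stored."""
    # A missing column raises KeyError, a bad value raises ValueError,
    # so nothing inconsistent ever reaches the table.
    row = {col: cast(record[col]) for col, cast in SCHEMA.items()}
    table.append(row)


def load_raw(store, line):
    """Schema on read: loading just appends the raw line, unchecked (fast loads)."""
    store.append(line)


def read_with_schema(store):
    """Apply the schema only at query time; skip lines that do not fit it."""
    for line in store:
        try:
            record = json.loads(line)
            yield {col: cast(record[col]) for col, cast in SCHEMA.items()}
        except (ValueError, KeyError, TypeError):
            continue  # problems surface when reading, not when loading
```

The reservoir accepts everything and defers interpretation, which is why it suits a historical archive of potentially valuable data; the warehouse pays the validation cost up front and returns trusted rows.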

Options to do In-Place Analysis in both the Warehouse and on Hadoop
[Diagram: In-Database Analytics, Oracle Database, Oracle Advanced Analytics, Oracle Spatial & Graph, Applications, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator]

[Diagram: In-Database Analytics, Data Warehouse, Oracle Advanced Analytics, Oracle Database, Applications, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle BI Foundation Suite, Endeca Information Discovery]
BI and Information Discovery: Distinct but Complementary Capabilities

New Tools and Processes required by End Users
Business Intelligence: Proven Answers to Known Questions
– Known & clearly defined questions: Who, What, When?
– Modeled data: conforms to a single model
Information Discovery: Fast Answers to New Questions
– Uncertain or open-ended questions: Why, How, What Else?
– Un-modeled data: diverse and changing models
New questions require new data exploration; insights yield mature models and KPIs

[Diagram: In-Database Analytics, Data Warehouse, Oracle Advanced Analytics, Oracle Database, Oracle Event Processing, Apache Flume, Applications, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Real-Time Decisions, Oracle BI Foundation Suite, Endeca Information Discovery]
Stream into Hadoop, Handle and Cache Events, Automate Decisioning

[Diagram: In-Database Analytics, Data Warehouse, Oracle Advanced Analytics, Oracle Database, Oracle Event Processing, Apache Flume, Applications, Oracle NoSQL Database, Cloudera Hadoop, Oracle R Distribution, Oracle Big Data Connectors, Oracle Data Integrator, Oracle Enterprise Manager, Oracle Real-Time Decisions, Oracle BI Foundation Suite, Endeca Information Discovery, Big Data Appliance, Exadata, Exalytics]
Solution for all Data: Complete, Integrated, Scalable

Real-World Scenario
Example

 Unify access to all data leveraging Oracle
Engineered Systems and a common Analytics API
 Analytics API
– will enable languages like SQL, R and Graph
languages to be applied to all data
– will extend the languages to better address new
data types
 Goal: a single
logical system
Strategy
Pictures: DOAG News Feb. 2014, Jean-Pierre Dijcks,
Oracle Corp., Big Data Product Management

Analytics by Example – The MoviePlex Lab

MoviePlex Lab Topics
 MoviePlex Application & Architecture
 SQL Access over Hadoop for Oracle BI:
– Direct Access: via Hive or Impala
– Via DWH: Oracle SQL Connector for Hadoop (OSCH)
 Oracle DWH 12c SQL Pattern Matching (Work in Progress)

MoviePlex
A fictitious on-line movie streaming company
Oracle Big Data Lite VM
Oracle Business Intelligence VM
Oracle YouTube Channel: MoviePlex-Videos
Download and YouTube Links  see Appendix

Oracle Exadata
Oracle Big Data Appliance
MoviePlex Architecture
Application Log
Log of all activity on site
Capture activity necessary for MoviePlex site
Streamed into HDFS using Flume
Load Recommendations
Customer Profile
(e.g. recommended movies)
Oracle NoSQL DB
HDFS
Map Reduce ORCH - CF Recs.
Map Reduce
Hive - Activities
Map Reduce
Pig - Sessionize
Clustering/Market Basket
Oracle Advanced Analytics
Oracle Exalytics
Endeca Information Discovery
Oracle Business Intelligence EE
“Mood” Recommendations
Load Session & Activity Data
Oracle Big Data Connectors
Query Session & Activity Data

MoviePlex Application Simple profile updates
•Goal
–Deliver a personal experience to every user
–Each user profile must be retrieved and updated with minimal latency
•Challenge
–Need to service this at web scale
–100k’s of customers buying 100k’s of movies
•Products Featured
–Oracle NoSQL DB
•Value
–Minimize latency

MoviePlex Application Advanced Analytics - Movies based on Mood
•Goal
–Provide a compelling user experience
–Deliver targeted recommendations based on your “current mood” - or movie selections from this session
•Challenge
–Need to service this request in real-time
•Products Featured
–Oracle Advanced Analytics
•Value
–Leverage the scalability, performance and advanced analytic features of the Oracle Database

MoviePlex Application NoSQL Database as the Key Value store
•Goal
–Each user profile must be retrieved and updated with minimal latency
•Challenge
–Need to service this request in real-time & at scale
•Products Featured
–Oracle NoSQL DB
•Value
–Minimize latency
–Simple programming model
[Screenshot labels: key, value, elapsed, code]

MoviePlex Application Advanced Profile Attributes
•Goal
–Deliver a personal experience to every user
–Deliver targeted recommendations based on past movie viewing habits
•Challenge
–Deliver genres and movies that are targeted to the current customer
–Log files are massive, semi-structured, constantly updated
•Products Featured
–Oracle R Connector for Hadoop
•Value
–Enhanced user experience = more $$

MoviePlex Data Warehouse
 Integrated Sources
– Customer Data and Segments
from CRM System
– Movie Database
– Billing Information
– User Activity from MoviePlex App.
 Extensions
– Pre-filtered Application Log Data
from HDFS
– External Social Data acquired by Endeca
Web Acquisition Toolkit (Kapow Catalyst) –
stored in RDBMS or HDFS
(not implemented yet)
Data Sources and Relational Model

MoviePlex Architecture & Application
SQL Access over Hadoop for Oracle BI
–Direct Access: via Hive or Impala
–Via DWH: Oracle SQL Connector for Hadoop (OSCH)
Oracle DWH 12c SQL Pattern Matching (Work in Progress)

SQL Access over Hadoop: Hive
 HiveQL – ANSI-92
 Uses an RDBMS metastore to hold table
and column definitions in schemas
 Accesses HDFS and other formats that
provide a SerDe (Serializer and
Deserializer): HBase, Oracle
NoSQL, JSON, XML, etc.
 Hive tables map onto
HDFS-stored files
– Managed Tables or
– External Tables
Picture: M. Rittman, Oracle BIWA SIG Summit 2014, San Francisco

 HiveQL queries are automatically
translated into Java MapReduce
jobs (Batch Processing)
 “Oracle-like” query optimizer,
compiler, executor
 Selection and filtering part
becomes Map tasks
 Aggregation part becomes the
Reduce tasks
 Scales Out on Large Data Sets
 Extensible via User Defined
Functions and Plug-ins
Transforming HiveQL Queries into MapReduce Jobs
Source: M. Rittman, Oracle BIWA SIG Summit 2014, San Francisco
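
How a HiveQL query splits into Map and Reduce tasks can be sketched in plain Python. This is a minimal sketch under assumptions, not Hive's actual compiler output: the table `movieapp_log`, its columns, and the filter value are hypothetical, and lists of dicts stand in for HDFS input splits.

```python
from collections import defaultdict

# Query being mimicked (hypothetical table and columns):
#   SELECT genreId, COUNT(*) FROM movieapp_log WHERE activity = 2 GROUP BY genreId


def map_task(rows):
    """Map: apply the selection/filter (WHERE) and emit (group key, 1) pairs."""
    for row in rows:
        if row["activity"] == 2:
            yield row["genreId"], 1


def shuffle(pairs):
    """Shuffle/sort: bring all values that share a key together, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_task(key, values):
    """Reduce: compute the aggregate (COUNT) for one group."""
    return key, sum(values)


def run_query(input_splits):
    """Run one map task per input split, then one reduce call per key."""
    pairs = [pair for split in input_splits for pair in map_task(split)]
    return [reduce_task(key, values) for key, values in shuffle(pairs).items()]
```

Because every map task only sees its own split, the filter step scales out with the number of data files; only the much smaller (key, value) pairs travel to the reducers.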

Example 1
MoviePlex Lab: Streaming Application Logs into HDFS using Flume

External Hive Table MOVIEAPP_LOG_JSON
Content of a JSON file containing MoviePlex application log data
MoviePlex Lab: Create External Table MOVIEAPP_LOG_JSON
Example 1
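
The slide shows the table definition only as a screenshot, so the statement below is a hedged reconstruction of what an external Hive table over JSON files generally looks like, not the lab's exact DDL. The column names, the HDFS location, and the choice of SerDe class are assumptions; the lab may well use a different JSON SerDe.

```python
def external_table_ddl(table, columns, serde_class, location):
    """Render HiveQL for an external table.

    Hive only records metadata; the files stay where they are in HDFS
    and are parsed by the SerDe each time the table is queried.
    """
    column_list = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {column_list}\n)\n"
        f"ROW FORMAT SERDE '{serde_class}'\n"
        f"LOCATION '{location}'"
    )


# All values below are illustrative assumptions, not taken from the lab.
DDL = external_table_ddl(
    table="movieapp_log_json",
    columns=[
        ("custId", "INT"),
        ("movieId", "INT"),
        ("genreId", "INT"),
        ("event_time", "STRING"),
        ("activity", "INT"),
    ],
    serde_class="org.apache.hive.hcatalog.data.JsonSerDe",
    location="/user/oracle/moviework/applog_json",
)
```

Dropping an external table removes only the metastore entry, which is why it is a safe choice for log files that Flume keeps writing into HDFS.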

Example 1
MoviePlex Lab: Query Table MOVIEAPP_LOG_JSON with Oracle BI

Set up ODBC Connection at the Oracle BI Server
 OBIEE 11.1.1.7+ ships with Hive ODBC drivers, but the
DataDirect 7.x versions must be used (Linux only)
 OK for testing, but not yet certified:
Cloudera ODBC Driver for Apache Hive,
Version 2.5.5
 Configure the ODBC connection in odbc.ini,
name needs to match BI Server Repository
ODBC name
 Server Configuration  see Metadata
Repository Builder's Guide – Chapter 16
[ODBC Data Sources]
AnalyticsWeb=Oracle BI Server
Cluster=Oracle BI Server
SSL_Sample=Oracle BI Server
bda_vm=Oracle 7.1 Apache Hive Wire Protocol
[bda_vm]
Driver=/u01/app/Middleware/Oracle_BI1/common/ODBC/Merant/7.0.1/lib/ARhive27.so
Description=Oracle 7.1 Apache Hive Wire Protocol
ArraySize=16384
Database=moviework
DefaultLongDataBuffLen=1024
EnableLongDataBuffLen=1024
EnableDescribeParam=0
Hostname=bigdatalite
LoginTimeout=30
MaxVarcharSize=2000
PortNumber=10000
RemoveColumnQualifiers=0
StringDescribeType=12
TransactionMode=0
UseCurrentSchema=0

SQL Access over Hadoop: Impala
Datasheet: http://www.cloudera.com/content/dam/cloudera/Resources/PDF/DS_Impala.pdf
 Created by Cloudera, Impala is a massively
parallel processing (MPP) SQL query engine
 Circumvents MapReduce, 10-100x faster
than Apache Hive
 Leverages Hive Metadata
 Access: HDFS and HBase (a non-relational
database that allows quick lookups in Hadoop
and adds transactional capabilities to Hadoop)
 ANSI-92 SQL support with user-defined
functions (UDFs)
 Supports common Hadoop file formats: text,
Sequence Files, Avro, Parquet, …
 Memory bound

Example 2
MoviePlex Lab: Faster Queries on MOVIEAPP_LOG_JSON possible?
External Hive Table MOVIEAPP_LOG_JSON could
not be reused because its SerDe (Serializer and
Deserializer) row format is not yet supported
by Impala

Example 2
MoviePlex Lab: Create Managed Table MOVIEAPP_LOG_CSV
Managed Hive Table
MOVIEAPP_LOG_CSV is
created here with Hue, an
open-source web-based
interface for Apache
Hadoop.

Example 2
MoviePlex Lab: Query Table MOVIEAPP_LOG_CSV with Oracle BI
Managed Hive Table
MOVIEAPP_LOG_CSV can be
imported and queried with Oracle
Business Intelligence, but the Cloudera
ODBC Driver for Impala is not yet
supported

Example 3
Pre-ETL over HDFS Data with HiveQL
MoviePlex Lab: Create Managed Table MOVIEAPP_LOG_STAGE
Managed Hive Table MOVIEAPP_LOG_STAGE
Insert into Hive Table MOVIEAPP_LOG_STAGE
from External Hive Table MOVIEAPP_LOG_JSON
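
The load step is shown only as a screenshot; a statement of the following shape would do the job. This is a sketch under assumptions: the column list and the filter predicate are hypothetical, and the helper merely renders HiveQL text that could be pasted into Hue or the Hive shell.

```python
def stage_load_hiveql(source, target, columns, predicate=None):
    """Render an INSERT INTO ... SELECT that copies rows into the staging table.

    Hive executes the statement as MapReduce jobs; an optional predicate
    pre-filters the log data before it reaches the warehouse.
    """
    statement = (
        f"INSERT INTO TABLE {target}\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM {source}"
    )
    if predicate:
        statement += f"\nWHERE {predicate}"
    return statement


# Column names and filter are illustrative assumptions, not taken from the lab.
LOAD_STAGE = stage_load_hiveql(
    source="movieapp_log_json",
    target="movieapp_log_stage",
    columns=["custId", "movieId", "genreId", "activity"],
    predicate="activity IS NOT NULL",
)
```

Reading through the JSON SerDe once and writing a plain delimited managed table means later consumers no longer depend on the SerDe.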

Example 3
Pre-ETL over HDFS Data with HiveQL
MoviePlex Lab: Execution of MapReduce Jobs during Table Load
Execution of HiveQL Insert Statement for
Managed Hive Table MOVIEAPP_LOG_STAGE
Hue - Job Browser displays
Job Status and Metrics /
Job Details

Oracle SQL Connector for HDFS (OSCH)
Use Oracle SQL to Load or Access Data on HDFS
 Option to access and analyze data with Oracle
SQL. Input formats:
– Text files in place on HDFS
– Hive tables (managed and external)
Note: no indexes and no partitioning, so every
query is a full table scan
 Data files are read in parallel
– Example: If there are 96 data files and the
database can support 96 PQ slaves, all 96
files can be read in parallel
– OSCH automatically balances the load
across the PQ slaves
 Certified: CDH3 & CDH4, Apache Hadoop 1.0, 1.1
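
The idea of spreading data files across PQ slaves can be illustrated with a simple greedy scheme. This is not OSCH's actual balancing algorithm, which the slide does not describe; it is a generic sketch (largest file first, always to the least-loaded worker) with hypothetical file names and sizes.

```python
import heapq


def assign_files(file_sizes, workers):
    """Greedy balancing: give the next-largest file to the least-loaded worker."""
    heap = [(0, worker) for worker in range(workers)]  # (bytes assigned, worker id)
    assignment = {worker: [] for worker in range(workers)}
    # Largest files first; the name breaks ties so the result is deterministic.
    for name, size in sorted(file_sizes.items(), key=lambda item: (-item[1], item[0])):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return assignment
```

With 96 equally sized files and 96 workers every worker receives exactly one file, which corresponds to the fully parallel case in the example above; with fewer workers or skewed file sizes, the largest files determine how long the scan takes.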

Example 4
Oracle SQL Connector for HDFS (OSCH)
MoviePlex Lab: Access Table MOVIEAPP_LOG_STAGE via OSCH
Step 1: Create External Table
MOVIE_FACT_MW_HDFS_EXT_TAB
in Oracle RDBMS
Oracle SQL Connector for HDFS uses the
ORACLE_LOADER access driver, Oracle
Directory MOVIEDEMO_DIR points to path
/home/oracle/movie/moviedemo/osch
Step 2: Publish Data Path to Managed
Hive Table MOVIEAPP_LOG_STAGE
File Location of managed Hive Table
MOVIEAPP_LOG_STAGE
Inserted Data from Hive
Table MOVIEAPP_LOG_JSON

SQL Access over Hadoop: In summary
Hive
Characteristics
• Creates MapReduce jobs for batch-mode queries
• Accesses HDFS + other formats that provide a SerDe
Oracle BI Support
• OBIEE 11.1.1.7+ ships with DataDirect ODBC Driver
• Cloudera Hive ODBC (Hive2 Server) not yet supported
Pros
• Handles any size of data
• Access to many data sources and formats
• User Defined Functions
Cons
• Performance
• No caching – query fully executed every time
• HiveQL – ANSI-92
• Multi-Map for Joins

Impala
Characteristics
• Processes queries in MPP platform
• Leverages Hive metadata, but replaces MapReduce
Oracle BI Support
• Support for Cloudera Impala ODBC is expected for OBIEE not earlier than version 12c
Pros
• Fast queries
• Same metastore as Hive
• Additional formats (Parquet)
Cons
• Memory bound
• Join order is important
• No caching – query fully executed every time
• SQL-92, immature

Oracle SQL Connector for HDFS (OSCH)
Characteristics
• Leverages External Tables
• Accesses data in-place on HDFS with Oracle SQL
• No indexes or partitioning
Oracle BI Support
• Access via Oracle RDBMS version 11g and 12c (incl. necessary patches)
Pros
• Leverage Oracle SQL & Security
• Join with data in Oracle
• Query HDFS data in-place
Cons
• Must stream all data to Oracle
• No support of Hive partitioned tables
• No predicate push down

 MoviePlex Architecture & Application
 SQL Access over Hadoop for Oracle BI
– Direct Access: via Hive or Impala
– Via DWH: Oracle SQL Connector for Hadoop (OSCH)
 Oracle DWH 12c SQL Pattern Matching (Work in Progress)

Via Direct Database Request only (“MoviePlex Inc. ORCL 12c Connection Pool”)
Example 5
Oracle DWH 12c SQL Pattern Matching
Idea: Use Advanced SQL Features over Hadoop and DWH Data

Example 5
Oracle DWH 12c SQL Pattern Matching
Idea: Use Advanced SQL Features over Hadoop and DWH Data

MoviePlex Lab – extended: BI Mobile Example

Analytics Warehouse
 Data Generation: data is generated in source systems (both structured and unstructured data)
 Data Processing and Storage: mashups are created from source systems and staged to support transformation and subsequent loads into downstream systems
 Predictive Analysis and Sentiment Analysis: data is processed using Oracle Endeca Server to combine structured and unstructured content. New Oracle 12c SQL functions and In-Database Analytics (R, ODM) are also used to process data for statistical and predictive analytics
 Reporting, Visualization & Analytics: visualization is used to perform predictive analytics, analyze structured/unstructured content, and view outcomes
OBIEE
Dashboards
Oracle Endeca
Studio
Oracle
BI Mobile
Oracle Endeca
Server
Business
User
Customer Service
Representative
Power
User
Mobile
User
Emails Files Shares External
Line of Business
(LOB)
Applications
Unstructured
Web
Social
Interactions
( ETL Workflow | Federation | Optimization)
MoviePlex – End User Scenarios
In-Database:
R, ODM, ...
1
4
2
3
Oracle Endeca
Integrator & WAT
Data Reservoir
(Hadoop)
Oracle Data
Warehouse
Oracle
BI Server
Mobile

Example
MoviePlex Lab: Sales by Geography and Customer Segments
Monitor the Business with Dashboards

Monitor the Business with BI Mobile
 Marketing
Are people buying recommended movies?
What is our close rate? Browse vs Buy
Comedies really sell, regardless of recommendations
(will look at this later)

Oracle BI Mobile – Complete Mobile Analytics
 Extend Oracle BI to mobile devices – smartphones,
tablets – automatically
 Optimized for touch-gestures, interactions
 Location Intelligence
 Offline support
 Enhanced containerized security via BI Mobile
Security Toolkit
 NEW Self Service product capability allowing business
users to create and distribute mobile apps
 Users build targeted business apps with zero-coding
 Stunning, interactive apps in minutes
BI Mobile | HD Client
IT Controlled – Managed - Consistent
BI Mobile | App Designer
Purpose Built Analytical Apps

Which BI Mobile?
BI Mobile HD Client
• Need to discover and access existing BI dashboards on mobile devices
• Ad-hoc BI users, similar to desktop users; high degree of flexibility and interactivity
• Need offline access to dashboards
• Apple iOS & Android tablets and phones supported
• Focus on structured data found in OBIEE semantic model
• Oracle BI Apps dashboards on mobile devices – write once, deploy anywhere
BI Mobile App Designer
• Need to create and deploy custom mobile reports/apps on mobile devices
• Functional users, focused workflows, mobile-first experience
• Portal integration is key
• Apple iOS & Android tested; all HTML5 devices expected to work (Windows Mobile, BlackBerry 10)
• Leverage data sources in OBIEE semantic model and easily upload spreadsheet data
• Customization and flexibility the prime deal driver; alternative to OBIEE visualizations
Both are licensed together – Get Both Capabilities

BI Mobile App Designer
MoviePlex Lab: Choose Device Type and Data Source
Example

MoviePlex Lab: Create & Test new Mobile Application
Example

Example
MoviePlex Lab: Deploy Application and Test with Mobile Device

MoviePlex Lab: Pages of MoviePlex Mobile Application
Example

Summary
Hadoop and DWH are complementary
Hadoop is still maturing
They will become more integrated
SQL (DWH + Hadoop) = More Business Value
Get familiar with (BI Access over) Hadoop

Appendix
 Oracle Developer Virtual Machines
– Big Data Lite, Version 2.5
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
– OBIEE 11.1.1.7.1 - Sample Application (V309 R2)
http://www.oracle.com/technetwork/middleware/bi-foundation/obiee-samples-167534.html
 Oracle YouTube Channel: Big Data / MoviePlex Videos
– Part 1. Overview: Improve the Customer Experience (10 min)
https://www.youtube.com/watch?v=P_hbTw5Gtfc
– Part 2. Deliver a Personalized Service - Oracle MoviePlex Application (5 min)
https://www.youtube.com/watch?v=Qh_zON11-rg
– Part 3. Manage Online Profiles w/Oracle NoSQL DB (5 min)
https://www.youtube.com/watch?v=zB8X4qDPZuQ
– Part 4. Turn Clicks into Value - Flume & Hive (5 min)
https://www.youtube.com/watch?v=IwrjJUoUwXY
– Part 5. Integrate All Your Data with Oracle Big Data Connectors (8 min)
https://www.youtube.com/watch?v=y61vpB4_wT4
– Part 6. Maximize the Business Impact with Oracle Advanced Analytics (8 min)
https://www.youtube.com/watch?v=5tYuZY6dyA8
Sources and Download Links
