From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
Mark Rittman, Oracle ACE Director
ODTUG KScope’18, Orlando, June 2018
About the Presenter
• Oracle ACE Director, Independent Analyst
• Past ODTUG Exec Board Member + Oracle Scene Editor
• Author of two books on Oracle BI
• Co-founder & CTO of Rittman Mead
• 15+ Years in Oracle BI, DW, ETL + now Big Data
• Host of the Drill to Detail Podcast (www.drilltodetail.com)
• Based in Brighton & work in London, UK
Data Lakes are the new Data Warehouse
Meet the New Data Warehouse: The “Data Lake”
•Data now landed in Hadoop clusters, NoSQL databases and Cloud Storage
•Flexible data storage platform with cheap storage, flexible schema support + compute
•Solves the problem of how to store new types of data, with flexibility on when to process it
•Typically used by data scientists as a source for new models or insights of interest
•Data Warehouses still have their place
•But very few new ones are being built
•Nobody leaves college dreaming of being an ETL developer
•Except Michael Rainey
From “What is a Data Lake”, https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
Data Lakes Need Data Engineers
What is Data Engineering?
•When “Big Data” first became popular, all users were termed “data scientists”
•Over time, this evolved into two distinct roles:
• Data Scientists, who focus on new insights + models, working from laptops using R + sampled data
• Data Engineers, who make at-scale data consumable in some form, either directly or by data scientists

What is Data Engineering?
•Data Engineers
•Can code, run clusters
•Create data pipelines & prepare data
•Train and build predefined ML models
•Limited knowledge of the math behind ML
•They may be DBAs, BI developers
•Experience with DevOps, cloud
and….
Oracle Analytics Cloud Data Lake Edition
•Oracle’s Cloud Analytics platform, built on Oracle BI EE and Oracle DV technology
•Available as customer-managed and Oracle-managed (Autonomous Analytics Cloud)
•Available as three packaging options
•Oracle Analytics Cloud Standard (aka Oracle DV in Oracle Cloud)
•Oracle Analytics Cloud Enterprise (aka OBIEE12c in Oracle Cloud)
•Oracle Analytics Cloud Data Lake (aka …?)
Oracle Analytics Cloud Data Lake Edition
Oracle Analytics Cloud
Data Sources: social, sensors, enterprise, personal, SaaS, mobile
Developers, Executives, Data Stewards, Analysts
Data Catalog: One place to collect, search, explore & curate all data
Data Preparation: Prepare enriched, sharable, & reliable datasets across all data
Data Analysis: Understand & act using smarts: search, visualization, & storytelling
Oracle Database Services, Oracle Big Data Cloud, Oracle Storage Cloud
Data Engineers
OAC Data Lake Edition: Key Features
•All functionality in OAC Standard Edition plus
•Integration with Oracle Big Data Cloud
•Additional data flow/data prep operators
•ML model build and train capability
•Text analytics and NLP processing
•Data flow execution in Apache Spark (*)
•Replicate from Cloud and On-Premise Apps
•Oracle Service Cloud, Taleo, Fusion Apps
•Incremental Ingest from DBs, Cloud + files
•Continuous Ingest from GoldenGate
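One common way to picture the “Incremental Ingest” feature listed above is a high-watermark pattern: each run picks up only rows changed since the previous run. The sketch below is a generic illustration in plain Python, not a description of how Oracle Analytics Cloud implements ingest. The row layout, the `updated_at` column and the function name are all hypothetical.

```python
def incremental_batch(rows, last_watermark):
    """Return rows changed after the watermark, plus the advanced watermark."""
    fresh = [row for row in rows if row["updated_at"] > last_watermark]
    # If nothing changed, keep the old watermark so the next run starts from the same point.
    new_watermark = max((row["updated_at"] for row in fresh), default=last_watermark)
    return fresh, new_watermark


# Hypothetical source rows; ISO-8601 strings in one format compare correctly as text.
source = [
    {"id": 1, "updated_at": "2018-06-01T10:00:00"},
    {"id": 2, "updated_at": "2018-06-02T09:30:00"},
    {"id": 3, "updated_at": "2018-06-03T14:15:00"},
]

batch, watermark = incremental_batch(source, "2018-06-01T23:59:59")
print([row["id"] for row in batch], watermark)
```

Storing the returned watermark between runs is what makes the load incremental; a continuous feed such as GoldenGate would replace the polling step entirely.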
Integrates with Oracle Big Data Cloud and Event Hub
Long-Term Replacement for Big Data Discovery
Big Data Discovery
•Visual Face of Data in Hadoop
•Data Preparation and Enrichment
•Spark Data Transformations
•Standalone technology + processes
Oracle Analytics Cloud Data Lake Edition
•Visual Face of Data in Cloud
•Data Preparation and Enrichment
•Spark Data Transformations
•Oracle Analytics Cloud
OAC Data Lake Edition Use-Cases
•Explore, catalog and discover data in Oracle Big Data Cloud, Oracle Database
•Enrich and transform raw data into valuable information and insights
•Analyze at-scale data using Data Visualization
•Combine data from SaaS, social and real-time
•Create predictive and classification models
•Analyze the sentiment in social media feeds
•Data engineering without the hand-coding
Example Scenario
Scenario: Ingest and Analyze Real-Time Feeds
[Architecture diagram, built up step by step over eight slides]
IoT events via Fluentd
Social Media data via Fluentd
Firewall
Event Hub Cloud REST Proxy
Event Hub Cloud Kafka Connect (shown twice)
Oracle Big Data Cloud
Oracle Analytics Cloud Data Lake Edition
Stages: INGEST, TRANSFORM, ANALYZE
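To make the ingest step in this scenario concrete, the sketch below builds the kind of HTTP request a forwarder could send to a Kafka REST proxy. It assumes the Event Hub Cloud REST Proxy accepts the Confluent-style v2 JSON envelope (`POST /topics/<topic>` with a `records` array), which should be checked against the service documentation. The proxy URL, topic name and event fields are hypothetical, and the request is only constructed, never sent.

```python
import json
from urllib.request import Request


def build_produce_request(proxy_url, topic, events):
    """Wrap events in a Confluent-style v2 envelope (an assumption for Event Hub)."""
    body = json.dumps({"records": [{"value": event} for event in events]})
    return Request(
        url=f"{proxy_url.rstrip('/')}/topics/{topic}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical endpoint, topic and IoT event.
request = build_produce_request(
    "https://eventhub.example.com/restproxy",
    "iot-events",
    [{"device": "d1", "temp_c": 18.5}],
)
print(request.get_method(), request.full_url)
```

In the scenario on the slides Fluentd plays the forwarder role, so code like this would normally be replaced by Fluentd output configuration; the sketch only shows the shape of the message that crosses the firewall.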
Scenario: Ingest and Analyze Real-Time Feeds
[Infrastructure diagram, built up over three slides]
ID & Access Management
Auditing
Object Storage
VCN
Availability Domain 1
ORACLE CLOUD INFRASTRUCTURE (REGION)
Cloud Infrastructure
Oracle Cloud Platform-as-a-Service Stack
Oracle Big Data Cloud, Ambari and Hive ThriftServer
Oracle Event Hub Cloud Service - Dedicated
OAC Data Lake Edition
Managing and Cataloging the Cloud Data Lake
•Catalog of all data assets
•Projects
•Connection to Hive Thrift Server
•IoT and Social Media Data Sets
•Data Flows and Sequences
•Managed data lake store
•Control the lifecycle of your data lake assets
•Security
•Scheduling
Data Preparation Features from OAC Standard Edition
1. Split the timestamp field that’s not in a valid format
2. Choose the “space” character as the delimiter
3. Convert the first split column into a date datatype
4. Choose the correct date format for this field’s values
5. Repeat for the TIME split column, concatenate with ’T’ in-between and finally convert the resulting field into TIMESTAMP
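The five preparation steps above are done through the UI, but the same logic can be sketched in a few lines of Python for anyone who wants to see what the transformation amounts to. The input layout (`YYYY-MM-DD HH:MM:SS`) is an assumption, since the slide does not show the raw values, and `to_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def to_timestamp(raw: str) -> datetime:
    """Split on the space, parse date and time, rejoin with 'T', parse as timestamp."""
    date_part, time_part = raw.split(" ", 1)                  # steps 1-2: split on space
    day = datetime.strptime(date_part, "%Y-%m-%d").date()     # steps 3-4: date + format
    clock = datetime.strptime(time_part, "%H:%M:%S").time()   # step 5: same for TIME
    iso_text = f"{day.isoformat()}T{clock.isoformat()}"       # concatenate with 'T'
    return datetime.fromisoformat(iso_text)                   # convert to TIMESTAMP


# Assumed sample value; real feeds may use a different date or time format.
print(to_timestamp("2018-06-12 09:30:15").isoformat())
```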
Extended Data Flow Capability for Data Lake Edition
Data Flows are sequences of data transformations executed on the BI Server; Spark execution is on the roadmap for OAC DL
Data Flows are based on the technology previously announced as “Dataflow ML”, now delivered as part of Oracle Analytics Cloud
•Create Essbase Cube
•Time Series Forecast
•Sentiment Analysis
•Predictive / ML Model Train and Build
•Run custom R and Python scripts
Example: Enrich With Sentiment, Then Visualize
1. Add a Sentiment Analysis step to the data flow, persist the final enriched dataset back to a Hive table
2. Add a calculation to convert sentiment description values to a positive/negative cumulative score
3. Analyze results in the Data Visualization UI
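Step 2 above, turning sentiment descriptions into a cumulative score, can be sketched as follows. The label values (`Positive`, `Neutral`, `Negative`) and the +1/0/-1 weights are assumptions for illustration; the actual values written by the sentiment step may differ.

```python
from itertools import accumulate

# Assumed labels and weights; adjust to whatever the sentiment step actually emits.
SCORES = {"Positive": 1, "Neutral": 0, "Negative": -1}


def cumulative_sentiment(labels):
    """Map each label to +1/0/-1 and return the running total in order."""
    return list(accumulate(SCORES[label] for label in labels))


# Hypothetical sequence of social media posts, oldest first.
labels = ["Positive", "Negative", "Negative", "Neutral", "Positive"]
print(cumulative_sentiment(labels))
```

Plotting the running total over time gives the rising or falling sentiment line that step 3 visualizes.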
Using Explain Feature to Automate Deriving Context
1. Right-click on an attribute column to “explain” the drivers of its values
2. ML algorithm explains basic facts, drivers, anomalies and identifies segments of interest
Display Selected Attribute Explanations on Dashboard
Transform, Aggregate and Join Datasets
Multi-step dataset joins
Aggregate Datasets
Binning and Grouping
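The three operations named on this slide (join, bin, aggregate) can be sketched together in plain Python. The two datasets, the temperature bands and the function names are hypothetical and stand in for whatever data sets are being combined in a data flow.

```python
from collections import defaultdict

# Hypothetical lookup and fact datasets.
devices = {"d1": "kitchen", "d2": "garage"}
readings = [
    {"device": "d1", "temp_c": 18.5},
    {"device": "d1", "temp_c": 23.0},
    {"device": "d2", "temp_c": 9.0},
    {"device": "d2", "temp_c": 11.5},
]


def temp_bin(temp_c):
    """Binning: place a reading in a named band (thresholds are arbitrary)."""
    if temp_c < 10:
        return "cold"
    if temp_c < 20:
        return "mild"
    return "warm"


def summarize(readings, devices):
    """Join readings to devices, group by (location, band), return (count, mean)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in readings:
        location = devices[row["device"]]            # join on device id
        key = (location, temp_bin(row["temp_c"]))    # bin, then group
        totals[key]["count"] += 1
        totals[key]["sum"] += row["temp_c"]
    return {key: (t["count"], t["sum"] / t["count"]) for key, t in totals.items()}


for key, stats in summarize(readings, devices).items():
    print(key, stats)
```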
Predictive Modeling and Forecasting
1. Select the prediction model best suited to predicting Kudos from Strava bike rides
2. Select the column whose values are to be predicted, and the model parameter values
3. Train the model and then test against the remaining dataset
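The train-then-test step above follows the usual holdout pattern: fit on part of the data, then measure error on the rows held back. The sketch below uses a hand-written least-squares line rather than any of the models offered in the product, and the ride data (distance in km, kudos) is invented purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Invented rides: (distance_km, kudos). Real Strava data would replace this.
rides = [(10, 4), (20, 7), (30, 8), (40, 11), (50, 13), (60, 15)]
train, test = rides[:4], rides[4:]   # train on the first rows, test on the remainder

model = fit_line([d for d, _ in train], [k for _, k in train])
error = mean_abs_error(model, [d for d, _ in test], [k for _, k in test])
print(model, error)
```

A real evaluation would shuffle the rows before splitting so that the held-back rides are not systematically the longest ones.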
Analyzing Data At-Scale Hosted on Big Data Cloud
OAC Data Lake: What Works, What’s Coming?
•Data Flow feature enables multi-step transform of ingested data
•Sentiment Analysis operator useful for social/text data enrichment
•Enables BI developers to train and build predictive models
•ML-driven Explain feature automates understanding of context
•Basic data engineering for BI developers
•More data lake features expected in v5, v6
Integration of features from Oracle Big Data Preparation Cloud Service
Enhanced Summary view highlights data shape and data quality
Coming soon to London, Autumn/Fall 2018



https://mjr-analytics.com