In this session, we'll look at the role of the data engineer in designing, provisioning, and enabling an Oracle Cloud data lake using Oracle Analytics Cloud Data Lake Edition. We'll also examine the use of data flow and data pipeline authoring tools and how machine learning and AI can be applied to this task. Furthermore, we'll explore connecting to database and SaaS sources along with sources of external data via Oracle Data-as-a-Service. Finally, we'll delve into how traditional Oracle Analytics developers can transition their skills into this role and start working as data engineers on Oracle Public Cloud data lake projects.
This presentation shows all the possible options for moving an Oracle BI on-premises system to Oracle Analytics Cloud. We will walk through each step of the migration, the issues we have encountered along the way, and how to troubleshoot them. In addition, we will review the most common administration tasks.
How to Capitalize on Big Data with Oracle Analytics Cloud (Perficient, Inc.)
The average age of a company listed on the S&P 500 has fallen from almost 60 years old in the 1950s to less than 20 years old today. Innovative companies that are willing to embrace transformative technologies make the list today, while businesses that are hesitant to embrace change risk becoming obsolete.
Innovators use big data solutions as a competitive advantage to increase revenue, reduce cost, and improve cash flow. Turn big data into actionable insights with Oracle Analytics Cloud.
We identify the big data opportunities in front of you and how to take advantage of them:
- Big data and its architecture
- Why a big data strategy is imperative to remaining relevant
- How Oracle Analytics Cloud can help you connect people, places, data, and systems to fundamentally change how you analyze, understand, and act on information
Oracle Autonomous Data Warehouse Cloud and Data Visualization (Edelweiss Kammermann)
With the release of the Oracle Autonomous Data Warehouse Cloud service, Oracle offers a simple way to create a data warehouse in the cloud with fast query performance: a fully managed service requiring no human effort for database tuning.
In this session we will see how easily we can create an Autonomous Data Warehouse Cloud instance and start loading data with SQL Developer 18. We will also see how to connect from Oracle Data Visualization (DV) to analyze your data in a very intuitive way, exploring it and finding patterns.
Oracle Analytics Cloud: connect; prepare; explore; share. Liberate all data and connect to more than 50 different data sources. Powerful tools for auditable and traceable data blending, wrangling, cleansing, & modeling. Intuitive and rich exploration with self-service data visualization. Build collective intelligence by collaborating with peers and socialize insights across the organization or the world.
In this session, you will see a demo of Oracle Business Intelligence Visual Analyzer, taking a real-world business use case from end to end, to learn how straightforward it is to tell a compelling story with data and prototype with greater speed, while gaining insight into information with this cutting-edge data visualization tool.
This is a brief technology introduction to Oracle Stream Analytics and how to use the platform to develop streaming data pipelines that support a wide variety of industry use cases.
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive... (DataWorks Summit)
Emerging regulations such as GDPR and the increasing incidence of data breaches, such as the one at Equifax, are bringing a firm's handling and processing of sensitive data, such as the personal data of its customers and employees, into focus. Enterprises now need to be able to discover and manage sensitive data usage to meet compliance and regulatory reporting requirements and to prevent reputational damage in the event of a data breach.
In this talk, we will outline how, using a foundation of open source technologies such as Apache Ranger, Apache Atlas, and the recently announced Hortonworks DataPlane Service platform components, data stewards, analysts, and data engineers can better understand their sensitive data assets across multiple data lakes at scale. We will demonstrate how enterprises can get a comprehensive 360-degree view of their sensitive data: where it is located, who is accessing what data and how frequently, when it was accessed, deleted, or moved, how it is protected, and where it came from. In addition, we will show how such data can be discovered and profiled to understand its characteristics. We will also demonstrate organization and classification use cases that facilitate curating sensitive data into collections for various business purposes, and how those collections can be aggregated and summarized to provide a single view of an enterprise's sensitive data footprint from risk management, audit, compliance, and forensics perspectives.
Speakers
Srikanth Venkat, Senior Director, Product Management, Hortonworks
Ashwin Rajeeva, Founder, Vidyash OU
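To make the discovery-and-profiling step described in the abstract above concrete, here is a minimal sketch of pattern-based column profiling for sensitive data. It is an illustration of the general technique only: real deployments would attach classifications in Apache Atlas and enforce policies with Apache Ranger, and the regex patterns and 80% match threshold here are assumptions, not anything from the talk.

```python
import re

# Illustrative PII patterns; a production profiler would use a much richer
# rule set (and likely ML-assisted classification).
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def profile_column(values, threshold=0.8):
    """Return the PII tag whose pattern matches at least `threshold`
    of the non-empty values, or None if the column looks non-sensitive."""
    non_null = [v for v in values if v]
    if not non_null:
        return None
    for tag, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in non_null if pattern.match(v))
        if hits / len(non_null) >= threshold:
            return tag
    return None

print(profile_column(["alice@example.com", "bob@example.org"]))  # email
print(profile_column(["just", "plain", "words"]))                # None
```

A profiler like this, run across every table in a data lake, is what lets a catalog answer "where does personal data live?" at scale.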
CON6619 - OpenWorld Presentation. Oracle data integration, big data, data governance, and cloud integration. Replication, ETL, Data Quality, Streaming Big Data, and Data Preparation
Site | https://www.infoq.com/qconai2018/
Youtube | https://www.youtube.com/watch?v=2h0biIli2F4&t=19s
At PayPal, data engineers, analysts, and data scientists work with a variety of data sources (Messaging, NoSQL, RDBMS, Documents, TSDB), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL), and execution models (stream, batch, interactive).
Due to this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc., which impacts time-to-market (TTM). To solve this problem and to make product development more effective, PayPal Data Platform developed "Gimel", a unified analytics data platform which provides access to any storage through a single unified data API and SQL, powered by a centralized data catalog.
In this session, we will introduce you to the various components of Gimel - Compute Platform, Data API, PCatalog, GSQL and Notebooks. We will provide a demo depicting how Gimel reduces TTM by helping our engineers write a single line of code to access any storage without knowing the complexity behind the scenes.
Offload, Transform, and Present - the New World of Data Integration (Michael Rainey)
How much time and effort (and budget) do organizations spend moving data around the enterprise? Unfortunately, quite a lot. These days, ETL developers are tasked with performing the Extract (E) and Load (L), and spending less time on their craft, building Transformations (T). This changes in the new world of data integration. By offloading data from the RDBMS to Hadoop, with the ability to present it back to the relational database, data can be seamlessly integrated between different source and target systems. Transformations occur on data offloaded to Hadoop, using the latest ETL technologies, or in the target database, with a standard ETL-on-RDBMS tool. In this session, we’ll discuss how the new world of data integration will provide focus on transforming data into insightful information by simplifying the data movement process.
Presented at Enkitec E4 2017.
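The offload-transform-present flow in the abstract above can be illustrated end to end in miniature. This is a conceptual sketch only: an in-memory SQLite database stands in for the source RDBMS, a plain Python list stands in for the Hadoop offload area, and the table names are invented. The point is the shape of the flow, not the technologies.

```python
import sqlite3

# Source "RDBMS" with some raw transactional data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (region TEXT, amount REAL)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 70.0)])

# Offload (E + L): extract raw rows out of the relational database.
offloaded = src.execute("SELECT region, amount FROM sales").fetchall()

# Transform (T): aggregation happens on the offloaded copy,
# keeping the heavy lifting off the source system.
totals = {}
for region, amount in offloaded:
    totals[region] = totals.get(region, 0.0) + amount

# Present: load the transformed result back for relational consumers.
src.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")
src.executemany("INSERT INTO sales_by_region VALUES (?, ?)", totals.items())
print(sorted(src.execute("SELECT * FROM sales_by_region").fetchall()))
# [('APAC', 70.0), ('EMEA', 150.0)]
```

In the session's real-world version, the offload target is Hadoop and the transform runs there at scale, but the division of labor is the same.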
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven Analytics, Machine Learning and Data Lakes which is where data management tech really shines. Join us for this presentation where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
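Of the patterns the session names, the Transaction Outbox is the easiest to show in miniature: the business write and the event record commit in one local transaction, and a separate relay (GoldenGate, in the session's telling) ships the outbox rows downstream. The sketch below uses SQLite and invented table and event names purely for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")

def place_order(order_id, item):
    # Single atomic transaction: either both rows commit or neither does,
    # so the event log can never disagree with the business data.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, item))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "OrderPlaced", "id": order_id}),))

place_order(1, "widget")

# A change-data-capture relay would read (and then mark or delete) these rows.
events = [json.loads(p) for (p,) in db.execute("SELECT payload FROM outbox")]
print(events)  # [{'event': 'OrderPlaced', 'id': 1}]
```

A log-based replicator reading the outbox table is what gives this pattern its stronger consistency story compared with dual writes to a database and a message broker.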
This presentation discusses a major shift in enterprise data management: the movement away from the older hub-and-spoke data architecture toward the newer, more modern Kappa architecture.
Start today on a relevant and incremental MDM journey.
A turnkey MDM solution allows you to collaborate on, maintain and provision accurate and reliable data across the enterprise; however, extended implementation times can delay time to value. Many successful MDM projects start small and grow over time. Open source provides a vehicle to start your MDM journey and deliver value - today.
This slideshow will show you:
* How an integrated solution for data integration, data quality and master data management can speed up and simplify implementation
* Why an active data model allows you to quickly reflect unique data requirements
* The importance of a dynamic MDM interface that enables immediate collaboration and stewardship
To view the entire webinar with the demonstration, click on : http://nxy.in/bhl3z
If you wish to see other webinars, click on: http://nxy.in/hkidj
For Live Webinars, click here: http://nxy.in/pjeph
Oracle Data Integration overview, vision and roadmap. Covers GoldenGate, Data Integrator (ODI), Data Quality (EDQ), Metadata Management (MM) and Big Data Preparation (BDP)
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition (Rittman Analytics)
Presentation at ODTUG KScope'18 on the data engineering and advanced analytics capabilities in Oracle Analytics Cloud Data Lake Edition, Oracle Big Data Cloud and Oracle Event Hub Cloud Service
PayPal Datalake Journey | Teradata - Edge of Next | San Diego | 2017 October... (Deepak Chandramouli)
PayPal Data Lake Journey | 2017-Oct | San Diego | Teradata Edge of Next
Gimel [http://www.gimel.io] is a Big Data Processing Library, open sourced by PayPal.
https://www.youtube.com/watch?v=52PdNno_9cU&t=3s
Gimel empowers analysts, scientists, and data engineers alike to access a variety of big data and traditional data stores with just SQL or a single line of code (the Unified Data API).
This is possible via a catalog of technical properties abstracted from users, along with a rich collection of data store connectors available in the Gimel library.
A catalog provider can be Hive, User Supplied (runtime), or UDC.
In addition, PayPal recently open sourced UDC (Unified Data Catalog), which can host and serve the technical metadata of data stores and objects. Visit http://www.unifieddatacatalog.io to experience it first hand.
https://dataworkssummit.com/san-jose-2018/expo-theatre/gimel-paypals-analytics-data-platform/
At PayPal, data engineers, analysts, and data scientists work with a variety of data sources (Messaging, NoSQL, RDBMS, Documents, TSDB), compute engines (Spark, Flink, Beam, Hive), languages (Scala, Python, SQL) and execution models (stream, batch, interactive). Due to this complex matrix of technologies and thousands of datasets, engineers spend considerable time learning about different data sources, formats, programming models, APIs, optimizations, etc. which impacts time-to-market (TTM). To solve this problem and to make product development more effective, PayPal Data Platform developed “Gimel”, a unified analytics data platform which provides access to any storage through a single unified data API and SQL, that is powered by a centralized data catalog. In this session, we will introduce you to the various components of Gimel – Compute Platform, Data API, PCatalog, GSQL, and Notebooks. We will provide a demo depicting how Gimel reduces TTM by helping our engineers write a single line of code to access any storage without knowing the complexity behind the scenes.
Unified Data Catalog - Recommendations Powered by Apache Spark & Neo4j (Deepak Chandramouli)
Youtube | https://youtu.be/zGX0fRLdd6s?list=PLPaGQXwz_-RaoHicnGhL5SyOAp3_lUTQ2&t=1
This is a talk from PayPal at Nodes Online Summit, organized by Neo4j.
For more session details and video - please visit this link.
https://neo4j.com/online-summit/session/recommendations-unified-data-catalog-spark-neo4j
Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes... (Databricks)
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
How to Take Advantage of an Enterprise Data Warehouse in the Cloud (Denodo)
Watch full webinar here: [https://buff.ly/2CIOtys]
As organizations collect increasing amounts of diverse data, integrating that data for analytics becomes more difficult. Technology that scales poorly and fails to support semi-structured data fails to meet the ever-increasing demands of today’s enterprise. In short, companies everywhere can’t consolidate their data into a single location for analytics.
In this Denodo DataFest 2018 session we’ll cover:
* Bypassing the mandate of a single enterprise data warehouse
* Modern data sharing to easily connect different data types located in multiple repositories for deeper analytics
* How cloud data warehouses can scale both storage and compute, independently and elastically, to meet variable workloads
Presentation by Harsha Kapre, Snowflake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake (Rittman Analytics)
In this session, we look at the role of the data engineer in designing, provisioning, and enabling an Oracle Cloud data lake using Oracle Analytics Cloud, Data Lake. Attendees will learn how to use data flow and data pipeline authoring tools and how machine learning and AI can be applied to this task, as well as how to connect to database and SaaS sources along with sources of external data via Oracle Data as a Service. Discover how traditional Oracle Analytics developers can transition their skills into this role and start working as data engineers on Oracle Public Cloud data lake projects.
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse (Rittman Analytics)
“Tech startups can't afford DBAs, and they don't have time to provision servers and scale them up and down or deal with patches or downtime. They've never heard of indexes and they need data loaded and ready for analysis in days, not months. In this session learn how Oracle Database developers can build data warehouses as a hip startup data engineer would—but using a proper database built on Oracle technology. Oracle Data Visualization Desktop provides analytics and data exploration with techniques explained in this session. Hear real-world development experiences from working on data and analytics projects at a tech startup in the UK.”
Planning a Strategy for Autonomous Analytics and Data Warehousing (Rittman Analytics)
As Oracle Analytics and Data Warehousing become self-driving and autonomous, the need for a strategy within your BI function becomes all the more important. How you deliver BI content to your users, the skills your developers now need, and the most efficient way to manage your cloud estate are vital components of an autonomous cloud analytics strategy; this session explains what has changed, what is significant, and what the implications of that change are.
Risk Analytics Using Knowledge Graphs / FIBO with Deep Learning (Cambridge Semantics)
This EDM Council webinar, sponsored by Cambridge Semantics Inc. and featuring FI Consulting, explores the challenges common to a risk analytics pipeline, application of graph analytics to mortgage loan data and use cases in adjacent areas including customer service, collections, fraud and AML.
Agile Data Engineering: Introduction to Data Vault 2.0 (2018) (Kent Graziano)
(updated slides used for North Texas DAMA meetup, Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity.
The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for, and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are
• How to build and design structures incrementally, without constant refactoring
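The three core Data Vault structures the talk introduces, hubs (business keys), satellites (descriptive attributes over time), and links (relationships between hubs), can be sketched as DDL. The sketch below runs the DDL through SQLite for self-containment; the column names follow common Data Vault convention (hash key, business key, load date, record source) but are an assumption, not taken from the slides.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,   -- hash of the business key
    customer_bk TEXT NOT NULL,      -- the business key itself
    load_date   TEXT,
    record_source TEXT);

CREATE TABLE sat_customer (
    customer_hk TEXT REFERENCES hub_customer,
    load_date   TEXT,
    name TEXT,                      -- descriptive attributes live in
    city TEXT,                      -- satellites, versioned by load_date
    PRIMARY KEY (customer_hk, load_date));

CREATE TABLE link_customer_order (
    link_hk     TEXT PRIMARY KEY,
    customer_hk TEXT REFERENCES hub_customer,
    order_hk    TEXT,               -- would reference a hub_order
    load_date   TEXT,
    record_source TEXT);
""")

tables = [r[0] for r in db.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['hub_customer', 'link_customer_order', 'sat_customer']
```

Because new attributes land in new satellites and new relationships in new links, structures can grow incrementally without refactoring the existing model, which is the agility claim the abstract makes.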
With the world’s supply chain system in crisis, it’s clear that better solutions are needed. Digital twins built on knowledge graph technology allow you to achieve an end-to-end view of the process, supporting real-time monitoring of critical assets.
GoldenGate and Oracle Data Integrator - A Perfect Match... (Michael Rainey)
Oracle Data Integrator and Oracle GoldenGate excel as standalone products, but paired together they are the perfect match for real-time data warehousing. Following Oracle’s Next Generation Reference Data Warehouse Architecture, this discussion will provide best practices on how to configure, implement, and process data in real-time using ODI and GoldenGate. Attendees will see common real-time challenges solved, including parent-child relationships within micro-batch ETL.
Presented at Rittman Mead BI Forum 2013 Masterclass.
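The parent-child challenge mentioned above arises because a child row can land in the same micro-batch as, or even before, its parent. One common strategy is to load parents first and defer any child whose parent key has not yet been loaded. This is an illustrative Python sketch of that idea only, not ODI or GoldenGate code:

```python
def apply_micro_batch(batch, loaded_parents):
    """Split a change batch into loadable rows and deferred child rows.

    batch: list of dicts with 'type' ('parent'/'child'), 'key', and for
    children a 'parent_key'. loaded_parents: set of parent keys already
    loaded in earlier batches (updated in place).
    """
    parents = [r for r in batch if r["type"] == "parent"]
    children = [r for r in batch if r["type"] == "child"]

    # Load parents first so same-batch children can resolve against them.
    loaded_parents.update(r["key"] for r in parents)

    ready = [c for c in children if c["parent_key"] in loaded_parents]
    deferred = [c for c in children if c["parent_key"] not in loaded_parents]
    return parents + ready, deferred
```

Deferred children are simply carried over to the next micro-batch, by which time their parent has usually arrived through the replication stream.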
How Spark is Enabling the New Wave of Converged Applications by MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single compute engine. Spark is speeding up data pipeline development, enabling richer predictive analytics, and bringing a new class of applications to market.
Why You Need Manageability Now More than Ever and How to Get It by Gustavo Rene Antunez
Whether you are operating in a completely on-premises environment or have some kind of hybrid cloud setup, you need to be able to clearly monitor and manage your entire organization in one single, unified structure. In this session learn how IOUG’s volunteer team decided to review Oracle Management Cloud Services to see if this “single pane of glass” was up to the challenge of providing the information data professionals need to serve their organization. Come and see how to put the pieces together, illustrated with real examples from Oracle Public Cloud services.
Why Your Data Science Architecture Should Include a Data Virtualization Tool ... by Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven extremely useful for deriving valuable insights from existing data. Platforms like Spark and rich libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists.
However, most architectures laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging, and to hear a customer story on the use of machine learning with data virtualization.
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W... by Rittman Analytics
Fun presentation given at the Brighton Data Forum in April 2018 analyzing Daily Mail reader comments on the article about my WiFi kettle incident back in 2016.
Analytics is Taking over the World (Again) - UKOUG Tech'17 by Rittman Analytics
In this presentation we'll look at some of the new industries and new technologies that are only possible today with analytics, how employee empowerment and improving your fitness are spin-offs of the same technology used to track boxes around a warehouse and spot fraudulent bank transactions, and how Oracle are embedding these new analytics capabilities in their cloud-based HR.
Petabytes to Personalization - Data Analytics with Qubit and Looker by Rittman Analytics
How do you turn petabytes of customer data into a personalized retail and e-commerce experience? With Qubit, the customer personalization platform that (with the help of Google Cloud Platform and Looker) gives customers the power of real-time ad-hoc analytics. Because of the scale of data enabled by GCP and the abstraction layer of Looker, Qubit customers are able to use their Live Tap product to make every visitor experience relevant and engaging.
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt... by Rittman Analytics
As big data and data warehousing scale up and move into the cloud, they're increasingly likely to be delivered as services using distributed cloud query engines such as Google BigQuery, loaded using streaming data pipelines and queried using BI tools such as Looker. In this session the presenter will walk through how data modelling and query processing work when storing petabytes of customer event-level activity in a distributed data store and query engine like BigQuery, how data ingestion and processing work in an always-on streaming data pipeline, how additional services such as the Google Natural Language API can be used to classify incoming unstructured data for sentiment and extract entity nouns, and how BI tools such as Looker and Google Data Studio bring data discovery and business metadata layers to cloud big data analytics.
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle... by Rittman Analytics
Most DBAs are aware something interesting is going on with big data and the Hadoop product ecosystem that underpins it, but aren't so clear about what each component in the stack does, what problem each part solves, and why those problems couldn't be solved using the old approach. We'll look at where it's all going with the advent of Spark and machine learning, what's happening with ETL, metadata and analytics on this platform ... and why IaaS and data-warehousing-as-a-service will have such a big impact, sooner than you think.
A series of tweets I posted about my 11hr struggle to make a cup of tea with my WiFi kettle ended up going viral, was picked up by the national and then international press, and led to thousands of retweets, comments and references in the media. In this session we’ll take the data I recorded on this Twitter activity over the period and use Oracle Big Data Graph and Spatial to understand what caused the breakout and the tweet going viral, who the key influencers and connectors were, and how the tweet spread over time and geography from my original series of posts in Hove, England.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) by Rittman Analytics
Set of product roadmap + capabilities slides from Oracle Data Integration Product Management, and thoughts on data integration on big data implementations by Mark Rittman (Independent Analyst)
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph: SHORT REPORT / NOTES by Subhajit Sahu
Graph algorithms, like PageRank, operate over compact graph representations. Compressed Sparse Row (CSR) is an adjacency-list based graph representation used in these experiments.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
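To make the CSR representation mentioned above concrete, here is a plain-Python sketch of PageRank over a CSR adjacency structure (an `offsets` array slicing a flat `targets` array). This is an illustrative reference version only; the report's actual experiments use OpenMP and CUDA implementations.

```python
def pagerank_csr(offsets, targets, n, damping=0.85, iters=50):
    """PageRank over a CSR graph: targets[offsets[u]:offsets[u+1]]
    are the out-neighbors of node u."""
    rank = [1.0 / n] * n
    out_degree = [offsets[u + 1] - offsets[u] for u in range(n)]
    for _ in range(iters):
        contrib = [0.0] * n
        dangling = 0.0
        for u in range(n):
            if out_degree[u] == 0:
                dangling += rank[u]  # dangling mass, spread uniformly below
                continue
            share = rank[u] / out_degree[u]
            for v in targets[offsets[u]:offsets[u + 1]]:
                contrib[v] += share
        base = (1 - damping) / n + damping * dangling / n
        rank = [base + damping * c for c in contrib]
    return rank

# 3-node cycle 0 -> 1 -> 2 -> 0 in CSR form
ranks = pagerank_csr(offsets=[0, 1, 2, 3], targets=[1, 2, 0], n=3)
```

The CSR layout keeps each node's edge list contiguous in memory, which is why it maps well onto the OpenMP and CUDA launch-configuration comparisons in the notes.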
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
3. T: +44 01273 041134 (UK) W: https://mjr-analytics.com E: info@mjr-analytics.com
Take the Next Step with MJR Analytics
● Specialists in Modern Cloud Analytics
● Founded by Mark Rittman in 2018
● 100% Cloud focus + project delivery
○ Oracle Autonomous Analytics Cloud
○ Oracle Autonomous DW Cloud
○ Oracle Data Integration Cloud
○ Oracle Big Data Cloud
● Speak to us now during OOW 2018
info@mjr-analytics.com
+44 7866 568246
https://www.mjr-analytics.com
MJR Analytics & Red Pill
Analytics Tech’18 Happy Hour
4pm-6pm today, Pump House
13. OAC Data Lake Features for Data Engineers
● Explore, catalog and discover data in Oracle Big Data Cloud and Oracle Database
● Enrich and transform raw data into valuable information and insights
● Analyze at-scale data using Data Visualization
● Combine data from SaaS, social, and real-time sources
● Create predictive and classification models
● Analyze the sentiment in social media feeds
● Data engineering without the hand-coding