Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
Harald Erb
Oracle Business Analytics & Big Data
The New Data Lake
Oracle’s elastically scalable Big Data Cloud
DOAG Big Data Days,
22 September 2017
Click-through version of the live demo
Speaker
Harald Erb
Sales Engineer, Information Architect
Business Analytics & Big Data
+49 (0)6103 397-403
harald.erb@oracle.com
My business analytics journey so far
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
1 Introduction
Data Lake & Data Labs
Concepts, Oracle Cloud
1876: Edison’s Invention Factory, Menlo Park, NJ
Data Management: Logical Architecture
• Sources: non-structured (logs, social media, external data, interactions), structured data (master data, applications, channels, data stores), ad-hoc files or data sets
• Data ingestion: batch integration, real-time integration, data streaming
• Data Lake: raw data sets → data processing, data enrichment, data aggregation → curated & transformed data sets (Line of Governance)
• Data Lab: sandboxes, data catalog, data discovery tools, transformations, prototyping, analytic tools
• Enterprise Information Store and Operational Data Store
• Data Federation & Virtualization Layer: common SQL access to ALL data
• Consumption: data wrangling, data discovery / business intelligence, data-driven applications, advanced analytics
• Cross-cutting: orchestration, scheduling & monitoring; metadata management
Data Lake: Concept and Possible Functional Areas *)
Tiers: Intake Tier, Management Tier, Consumption Tier. Cross-cutting layers: Information Lifecycle Management, Metadata, Security & Governance.
• Sources: polystructured data sources (logs, social media, external data, interactions) and structured data (master data, applications, channels, data stores)
• Intake tier (source system zone, transient zone, raw zone)
  • Connectivity: interfaces for ODBC, JDBC, NFS, file shares, web services, REST API, SFTP, polling
  • Intake processing: unstructured data (push/real-time), semi-structured data (push-pull), structured data (pull)
  • File validation checks (duplication, integrity, size, periodicity)
  • Data integrity checks (column/record counts, schema validation)
  • Lineage tracking (metadata capture, watermarks)
  • Deep integrity checks (bit-level scans, periodic checksums)
• Management tier (data hub integration)
  • Data profiling of unstructured/structured data (completeness, correctness, coherence)
  • Data cleansing: deletion (tuple, pairwise), imputation (mean/median, predicted value)
  • Transformation: structured data (table/attribute level; aggregation, decomposition), unstructured data (word/document level: stop words, stemming, ...; extraction, tagging, entity recognition)
  • Load distribution: vertical partitioning (range, modulo, key-value, random), horizontal pipelining
  • Enrichment, metadata collection, data lineage tracking
• Consumption tier (data discovery, data provisioning)
  • Data classification (named entity class, topic modelling, text clustering)
  • Relation extraction (column types, patterns, referential integrity, features, semantics)
  • Indexing (inverted index, faceted/fuzzy search, semantic analysis)
  • Metadata publication (catalog of raw and data hub zones)
  • Data formatting (standard/custom) and data selection (row/column-based, content-based)
  • External access interfaces for SQL, JDBC, web services, SFTP (push-/pull-based)
• Consumers: data discovery / business intelligence, data-driven applications, advanced analytics
*) Cf. P. Pasupuleti, B. S. Purra
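The intake-tier checks listed above (duplicate detection, size, periodic checksums, record counts, schema validation) can be illustrated with a minimal Python sketch. It assumes CSV files delivered to a landing location and a known list of expected columns; the helper names and the result fields are hypothetical and not part of any Oracle product.

```python
import csv
import hashlib
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of the file content, usable for deep integrity and duplicate checks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_intake_file(path: Path, expected_columns: list, seen_checksums: set) -> dict:
    """Run simple file validation and data integrity checks on one CSV file (hypothetical helper)."""
    result = {"file": path.name, "size_bytes": path.stat().st_size, "errors": []}

    # File validation: reject empty files and exact duplicates of earlier deliveries.
    if result["size_bytes"] == 0:
        result["errors"].append("empty file")
        return result
    checksum = file_checksum(path)
    result["checksum"] = checksum
    if checksum in seen_checksums:
        result["errors"].append("duplicate file")
    seen_checksums.add(checksum)

    # Data integrity: schema validation on the header, then a column count per record.
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        if header != expected_columns:
            result["errors"].append(f"unexpected header: {header}")
        record_count = 0
        for line_no, row in enumerate(reader, start=2):
            record_count += 1
            if len(row) != len(expected_columns):
                result["errors"].append(f"line {line_no}: {len(row)} columns")
    result["record_count"] = record_count
    return result
```

Keeping the set of seen checksums outside the function lets the caller persist it between deliveries, which is one simple way to detect re-sent files over time.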
Data Lab: Key Requirements
• Based on raw data
• Full access to data sources (select only)
• Complete sandbox environment
• Agile experimentation: “fail fast”
Why Cloud?
Machine learning at scale: use a cluster massively at times, but not permanently
Alex Sadovsky, Director of Data Science @ The Oracle Data Cloud, describes how to embrace cloud computing, Hive, and Spark to create machine learning solutions at scale (YouTube video).
Why Cloud?
For experimental work: some thoughts from Alex Sadovsky
“Data Scientists should not be System Administrators
• If hardware fails, throw it away
• If someone messes up the OS, trash it
• No support tickets, no time wasted”
“Data scientists should not have to deal with system administrators
• Science is about experimentation
• Experimentation is about testing boundaries
• No support tickets, no time wasted”
“Don’t be afraid to throw money (more computer resources) at a problem
• Engineers and their time are often more expensive than computer resources
• “Burstable” solutions”
Why Cloud?
Case study: the self-built 10-node Spark cluster
(Image source: www.csm.ornl.gov/PR/clusters.jpg)
Building it yourself is expensive!
• Cost of system engineers
• Cost of additional development time
  • in terms of staff (salaries)
  • in terms of displaced work
• Cost of specialized developer skills
  • People with Python/SQL skills are currently still easier to find than Spark/Hive specialists
• Be lazy: why not pay ~3 to 5 € per hour for a single cloud instance instead of building and maintaining your own infrastructure for data experiments?
Oracle Data Management & Analytics Platform
• Data: social, sensors, personal, SaaS, mobile, enterprise; batch and streaming
• Integration services: Data Integration Cloud Service
• Data Lake (Big Data Cloud Services)
  • Store & execute: Oracle Hadoop, Cloudera, NoSQL
  • Catalog: Data Catalog, Cloud Navigator
  • Query: Elasticsearch, Spark SQL
  • Database integration: connectors, Big Data SQL
• Analytics services: Oracle Analytics Cloud; location & network relationships (spatial, graph); machine learning (ORAAH, R, Spark ML)
• Capabilities: search, smarts, prediction, learning, mobile, natural language, personalized
• Users: developers, business, IT, analysts
• Foundation: compute, storage, network, identity
All data • Real-time & batch • Data science & business users • Agile • Scalable • Economical
Oracle Data Lake & Analytics: Features
(Diagram: Oracle Public Cloud)
• Data sources: files, database, real-time, cloud, more; ingested via streaming and batch
• High-performance messaging; event processing and cache
• Adaptor-based integration; change data capture
• Metadata enrichment
• Long-term, low-cost data storage
• Reporting database
• Data scientists / developers: notebooks and discovery SQL access to the data lake
• Business users: data discovery & analytics, reporting
Oracle Data Lake & Analytics: Key Technologies and Services
(Diagram: Oracle Public Cloud)
• Data sources: files, database, real-time, cloud, more (batch and streaming)
• Streaming and integration: Kafka (Kafka API, queue), GoldenGate, adaptors, Integration Cloud Service, Big Data Preparation
• Processing: Spark, Spark SQL, Spark Streaming, Hive SQL on Tez, HBase (HBase API, HBase Python)
• Storage and caching: Object Storage (REST API), Alluxio
• Access: JDBC, Oracle DB
• Data scientists / developers: R Studio, CloudBerry, web browser, HTML notebooks
• Business users: Oracle Analytics Cloud (business intelligence, data visualization, Essbase), BI Mobile / Day by Day, Smart View / Office, web browser, Data Visualization Desktop
2 Oracle Cloud Journey
“The New Data Lake”
Setting up storage and Big Data services
Basic cloud modules
Scenario: The New Data Lake
(Architecture diagram: data sources (files, database, real-time, cloud, more) feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks)
Oracle Storage Cloud Service
Object Storage
• Very low-cost, flexible storage of any kind of data
• for structured and unstructured data
• Unlike familiar file systems, objects contain data but are not organized in a hierarchy.
• Each object resides at the same level of an address space, is characterized by its extended metadata, and is assigned a unique identifier.
• Servers or end users can retrieve an object without needing to know the physical location of the data.
• This approach is useful for automating and streamlining data storage in cloud computing environments.
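The flat object model described above can be sketched as a tiny in-memory store. This is an illustration under stated assumptions, not the Oracle Storage Cloud Service API: every object lives in one address space, is addressed by a unique identifier, and carries its own metadata. Names containing "/" only look hierarchical; "folders" are emulated by prefix filtering. The class and method names are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    object_id: str  # unique identifier assigned by the store
    name: str       # user-visible name; "/" has no structural meaning
    data: bytes
    metadata: dict = field(default_factory=dict)


class FlatObjectStore:
    """Hypothetical in-memory object store: one flat namespace, no directory tree."""

    def __init__(self) -> None:
        self._by_id = {}       # object_id -> StoredObject
        self._id_by_name = {}  # name -> object_id

    def put(self, name: str, data: bytes, **metadata: str) -> str:
        # Overwriting a name replaces the previous object entirely.
        old_id = self._id_by_name.get(name)
        if old_id is not None:
            del self._by_id[old_id]
        object_id = str(uuid.uuid4())
        # Extended metadata travels with the object, e.g. size and a content hash.
        meta = {"size": str(len(data)), "sha256": hashlib.sha256(data).hexdigest(), **metadata}
        self._by_id[object_id] = StoredObject(object_id, name, data, meta)
        self._id_by_name[name] = object_id
        return object_id

    def get(self, object_id: str) -> StoredObject:
        # Callers only need the identifier, not a physical location.
        return self._by_id[object_id]

    def list_names(self, prefix: str = "") -> list:
        # "Folders" are just a filter on a name prefix.
        return sorted(n for n in self._id_by_name if n.startswith(prefix))
```

For example, `store.put("bikes/2017/trips.csv", data, source="demo")` returns an identifier, and `store.list_names("bikes/2017/")` emulates listing a folder although no folder exists.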
The New Data Lake: Setting up and using Object Storage
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
The New Data Lake: Setting up Object Storage and accessing it via CloudBerry *)
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
*) Info + download: www.cloudberrylab.com/solutions/oracle-cloud
The New Data Lake: Object Storage, copy of a bootstrap script (for later automated installation/configuration)
(Architecture diagram: data sources such as SFDC, Eloqua, RightNow, and Twitter feed the Oracle Public Cloud; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Example: importing Zeppelin notebooks from GitHub
Bootstrap scripting for the Oracle Big Data Cloud
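One way a bootstrap step could import notebooks is to fetch the note JSON from GitHub and post it to the import endpoint of the Apache Zeppelin notebook REST API (`POST /api/notebook/import`). The Python sketch below only builds the request and does not send it. The host, the port (8080 is Zeppelin's default, but the port on a Big Data Cloud cluster may differ), and the GitHub location are assumptions, and authentication is not covered.

```python
import json
import urllib.request


def build_zeppelin_import_request(note: dict, host: str = "localhost", port: int = 8080) -> urllib.request.Request:
    """Build (but do not send) a POST request that imports a note JSON into Zeppelin."""
    url = f"http://{host}:{port}/api/notebook/import"
    body = json.dumps(note).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_note(raw_url: str, timeout: float = 30.0) -> dict:
    """Download a note JSON file, e.g. from a raw GitHub URL (location is hypothetical)."""
    with urllib.request.urlopen(raw_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

In a bootstrap script, the two would be combined as `urllib.request.urlopen(build_zeppelin_import_request(fetch_note(url)))`, with `url` pointing to the raw JSON file of an exported note in your own repository.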
Example: installing Anaconda including TensorFlow and configuring Python
Bootstrap scripting for the Oracle Big Data Cloud
Object Storage: Upload Bootstrap Script via Web UI
The New Data Lake
Object Storage: further operations via web UI
The New Data Lake
The New Data Lake: Setting up the Big Data Cloud Service
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Setting up the Big Data Cloud Service: cluster configuration
The New Data Lake
Demo sequence
Setting up the Big Data Cloud Service: keys for SSH access
The New Data Lake
Demo sequence
Setting up the Big Data Cloud Service: adjusting access for SSH and the Ambari admin console
The New Data Lake
3 Oracle Cloud Journey – Part 1
Coding & analysis with notebooks
Making full use of big data technologies
Notebooks
Documents with live executable code, in almost any programming language
Demo Use Case
New York City bikes: analyzing historical and streaming data
Historical data: downloaded from the Amazon cloud (AWS S3)
Real-time data
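As an illustration of a first analysis step on the historical bike data, the Python sketch below aggregates trips per start station from a CSV extract. It assumes the column names `tripduration` (in seconds) and `start station name`, as used in some of the public Citi Bike trip files of that period; the header has varied between releases, so check the files you actually download. In the demo itself this kind of step runs in notebooks on Spark/Hive rather than in plain Python.

```python
import csv
import io
from collections import defaultdict


def trips_per_start_station(csv_text: str) -> list:
    """Return (station, trip count, average duration in minutes), busiest stations first."""
    counts = defaultdict(int)
    total_seconds = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        station = row["start station name"]
        counts[station] += 1
        total_seconds[station] += float(row["tripduration"])
    summary = [
        (station, counts[station], total_seconds[station] / counts[station] / 60.0)
        for station in counts
    ]
    # Sort by trip count (descending), then by station name for a stable order.
    summary.sort(key=lambda item: (-item[1], item[0]))
    return summary
```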
Big Data Cloud Journey: notebook basics, file operations in HDFS and Object Store, creating Hive tables
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Oracle Big Data Cloud Service – Compute Edition
Hive - Tez Query Execution Engine
docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_performance_tuning/content/hive_perf_best_pract_config_tez.html
Big Data Cloud Journey: working with Spark Scala and Spark SQL, in-memory caching
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Big Data Cloud Journey: working with data discovery tools
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Oracle Data Visualization
Components: Data Set Management, Lightweight Data Profiling, Data Flow Editor, Visual Analyzer
Data preparation, data blending, and interactive analysis in one tool
Oracle Data Visualization: process flow
• Import, refresh, and manage data sets
• Inspect new data, check its quality and completeness, and understand it
• Clean, filter, combine, and enrich data sets (build data pipelines)
• Analyze interactively, recognize relationships, and visualize them
• Comment on results and document the analysis steps (storytelling)
Iterations: add new data sources; look at further existing data sets; prepare and enrich data sets further; pursue completely new questions and analysis ideas; implement new perspectives and data views.
With the tool
• complete analysis projects can be implemented,
• rapid prototyping is possible (instead of rigid paper specifications),
• one-off analyses can be implemented that do not (yet) justify extending the business intelligence platform.
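The inspection step above (checking new data for quality and completeness) is what lightweight data profiling supports interactively in the tool. As a tool-independent illustration, here is a minimal Python sketch that computes per-column completeness, distinct-value counts, and the frequency of the most common value. The function name and the choice to treat blank strings as missing are assumptions.

```python
from collections import Counter


def profile_columns(rows: list) -> dict:
    """Per column: completeness (share of non-empty values), distinct count, top value count."""
    columns = sorted({name for row in rows for name in row})
    profile = {}
    for name in columns:
        # Missing keys and blank strings both count as missing values.
        values = [str(row.get(name, "")).strip() for row in rows]
        present = [v for v in values if v]
        counts = Counter(present)
        profile[name] = {
            "completeness": len(present) / len(rows),
            "distinct": len(counts),
            "top_count": counts.most_common(1)[0][1] if counts else 0,
        }
    return profile
```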
Oracle Analytics Cloud (OAC)
And now all of this “enterprise ready”, please
• Oracle Data Visualization *), Oracle Business Intelligence, Oracle Day by Day, Oracle Essbase
• Database as a Service (DBaaS): middle-tier schema
• Oracle Storage Cloud Service (OSCS): backup, restore, DataViz mashup, …
*) Importing custom DV plugins and R scripts or additional R packages is technically possible, but official support by Oracle is still in preparation.
Oracle Analytics Cloud: cloud.oracle.com/de_DE/oac
Oracle Analytics Cloud: interaction with Big Data services
(Architecture diagram: data sources such as SFDC, Eloqua, RightNow, and Twitter feed the Oracle Public Cloud; Big Data components include Kafka, GoldenGate, Spark SQL, Spark Streaming, Hive on Tez, Alluxio, HBase, Object Storage, Big Data Preparation, and Integration Cloud Service; Oracle Analytics Cloud (business intelligence, data visualization, Essbase) connects via JDBC and Oracle DB; clients include BI Mobile / Day by Day, Smart View / Office, a web browser, and Data Visualization Desktop)
4 Oracle Cloud Journey
“The New Data Lake”
Setting up the Event Hub Service
Oracle Event Hub Cloud Service uses Apache Kafka as its key technology
The New Data Lake
Apache Kafka
A message broker whose architecture enables processing of data streams with very high message throughput at low latency.
Key components
• Applications that write data to a Kafka cluster are called producers; applications that read data from it are called consumers.
• Data sent to a Kafka cluster is grouped into so-called topics.
• A topic can in turn be divided into several partitions, and each partition can be stored redundantly on several nodes (brokers). Within a partition, records are stored in the order in which they are written.
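The topic/partition model described above can be illustrated with a small in-memory simulation in Python. This is not the Kafka client API: it only mimics how records with the same key land in the same partition, how each partition is an append-only log addressed by offsets, and why ordering is guaranteed only within a partition. Kafka's default partitioner hashes keys with murmur2; CRC32 is used here purely to get a deterministic example. The class names are hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Simplified topic: a list of partitions, each an append-only log of (key, value)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def produce(self, key: str, value: str) -> tuple:
        """Append a record and return (partition, offset), similar to a producer acknowledgement."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


class Consumer:
    """Reads every partition of a topic, remembering the next offset per partition."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offsets = [0] * topic.num_partitions

    def poll(self) -> list:
        records = []
        for partition, log in enumerate(self.topic.partitions):
            for offset in range(self.offsets[partition], len(log)):
                key, value = log[offset]
                records.append((partition, offset, key, value))
            self.offsets[partition] = len(log)
        return records
```

Because the consumer tracks offsets per partition, records of one station come back in write order, while records of different stations in different partitions have no guaranteed relative order.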
Use cases for Kafka
The New Data Lake
Source: Confluent (www.confluent.io)
The New Data Lake: Setting up the Event Hub Service (OEHCS)
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
OEHCS configuration – Part 1
The New Data Lake
OEHCS configuration – Part 2
The New Data Lake
5 Oracle Cloud Journey – Part 2
Streaming Data
Making full use of big data technologies
Big Data Cloud Journey: working with Kafka and Spark Streaming (simulation: bike usage as a real-time data stream)
(Architecture diagram: data sources feed the Oracle Public Cloud via batch and streaming; components include Kafka, Spark Streaming, Spark SQL, Hive on Tez, Alluxio, HBase, Object Storage with REST API, and HTML notebooks; clients include R Studio, CloudBerry, a web browser, and Data Visualization Desktop)
Oracle Big Data Cloud Service – Compute Edition
Big Data File System (BDFS) – In-memory Caching Layer (by Alluxio)
docs.oracle.com/en/cloud/paas/big-data-compute-cloud/csspc/big-data-file-system-bdfs.html
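The idea behind an in-memory caching layer such as BDFS can be illustrated with a read-through cache that keeps recently used blocks in memory in front of slower storage (for example object storage). The Python sketch below is a conceptual illustration only: it does not use the Alluxio API, least-recently-used (LRU) eviction is an assumed policy, and the capacity and loader function are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughLRUCache:
    """Keeps up to `capacity` blocks in memory; misses are loaded from the backing store."""

    def __init__(self, capacity: int, load_from_store: Callable[[str], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_from_store = load_from_store
        self._blocks = OrderedDict()  # path -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(path)  # mark as most recently used
            return self._blocks[path]
        self.misses += 1
        data = self.load_from_store(path)  # slow path, e.g. object storage
        self._blocks[path] = data
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data
```

Repeated reads of the same hot data (for example iterative Spark jobs over one data set) are then served from memory, and only misses go to the slower store.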
Using the Event Hub Service in the demo scenario
Big Data Cloud Journey
Simulation: real-time data stream (bike usage)
Visualization: live map
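In the demo, bike usage is simulated as a real-time stream that flows through the Event Hub (Kafka) into Spark Streaming and on to a live map. A core operation in such a job is a windowed aggregation. The Python sketch below shows a tumbling-window count of events per station over timestamped events; the event layout, the 60-second window, and the function name are assumptions, and a Spark job would express the same logic with Spark's own windowing operators.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(events: Iterable, window_seconds: int = 60) -> dict:
    """Count events per station in fixed, non-overlapping time windows.

    Each event is (epoch_seconds, station_id); the result maps each window start
    (epoch seconds) to a {station_id: count} dictionary.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, station in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][station] += 1
    # Convert the nested defaultdicts to plain dicts, ordered by window start.
    return {start: dict(counts) for start, counts in sorted(windows.items())}
```

Each window's counts could then be pushed to the live map as one update per window.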
Demo sequence
6 Oracle Cloud Journey
“The New Data Lake”
Cluster scale-out at the push of a button
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 122
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 123
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 124
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 125
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 126
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. 127

More Related Content

What's hot

IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
Rob Thomas
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
Nasrin Hussain
 
Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
m_hepburn
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Tyrone Systems
 
Introduction of big data and analytics
Introduction of big data and analyticsIntroduction of big data and analytics
Introduction of big data and analytics
Sanjeev Solanki
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil
 
Introduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data ScienceIntroduction to Data Mining, Business Intelligence and Data Science
Introduction to Data Mining, Business Intelligence and Data Science
IMC Institute
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
Information Security Awareness Group
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Karan Desai
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
Nati Shalom
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
Putchong Uthayopas
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Lucian Neghina
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
nabati
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
rjain51
 
Big data – a brief overview
Big data – a brief overviewBig data – a brief overview
Big data – a brief overview
Dorai Thodla
 
SuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalSuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-final
stelligence
 
IBM Big Data References
IBM Big Data ReferencesIBM Big Data References
IBM Big Data References
Rob Thomas
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
AmpoolIO
 
Big Data Analytics in Government
Big Data Analytics in GovernmentBig Data Analytics in Government
Big Data Analytics in Government
Deepak Ramanathan
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
Ivo Vachkov
 

DOAG Big Data Days 2017 - Cloud Journey

  • 1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | Harald Erb, Oracle Business Analytics & Big Data. The New Data Lake: Oracle’s elastically scalable Big Data Cloud. DOAG Big Data Days, 22 September 2017. Click-through version of the live demo
  • 2. Speaker: Harald Erb, Sales Engineer, Information Architect, Business Analytics & Big Data, +49 (0)6103 397-403, harald.erb@oracle.com. My Business Analytics journey so far
  • 3. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 4. 1 Introduction: Data Lake & Data Labs, concepts, Oracle Cloud
  • 5. 1876: Edison’s Invention Factory, Menlo Park, NJ
  • 6. Logical architecture
    - Sources: non-structured (logs, social media, external data, interactions) and structured (master data, applications, channels, data stores, ad hoc files or data sets)
    - Data ingestion: batch integration, real-time integration, data streaming
    - Data management: Data Lake (raw data sets; data processing and data enrichment; curated & transformed data sets; data aggregation); Data Lab (sandboxes, data catalog, data discovery tools, transformations, prototyping, analytic tools); enterprise information store; operational data store; line of governance
    - Data federation & virtualization layer: common SQL access to ALL data
    - Orchestration, scheduling & monitoring; metadata management; data wrangling
    - Consumers: data discovery / business intelligence, data-driven applications, advanced analytics
  • 7. Data Lake: concept and possible functional areas*)
    - Sources: polystructured data sources (logs, social media, external data, interactions) and structured data (master data, applications, channels, data stores)
    - Intake tier: source system zone, transient zone, raw zone
      - Connectivity: processing interfaces for ODBC, JDBC, NFS file shares, web services, REST API, SFTP polling
      - Intake processing: unstructured data (push/real-time), semi-structured data (push-pull), structured data (pull)
      - File validation checks (duplication, integrity, size, periodicity); data integrity checks (column/record counts, schema validation); lineage tracking (metadata capture, watermarks); deep integrity checks (bit-level scans, periodic checksums)
    - Management tier (data hub): integration, data profiling, data cleansing, enrichment, metadata collection, data lineage tracking, transformation
      - Profiling of unstructured/structured data (completeness, correctness, coherence)
      - Cleansing: deletion (tuple-wise, pairwise), imputation (mean/median, predicted value); structured data at table/attribute level, unstructured data at word/document level (stop words, stemming, ...)
      - Transformation: structured data (aggregation, decomposition), unstructured data (extraction, tagging, entity recognition)
      - Load distribution: vertical partitioning (range, modulo, key-value, random), horizontal pipelining
    - Consumption tier
      - Data discovery: data classification (named entity classes, topic modelling, text clustering); relation extraction (column types, patterns, referential integrity, features, semantics); indexing data (inverted index, faceted/fuzzy search, semantic analysis)
      - Data provisioning: metadata publication (catalog of raw and data hub zones); data formatting (standard/custom) and data selection (row-/column-, content-based); external access interfaces for SQL, JDBC, web services, SFTP (push-/pull-based)
    - Cross-cutting layers: information lifecycle management, metadata, security & governance
    - Consumers: data discovery / business intelligence, data-driven applications, advanced analytics
    *) Cf. P. Pasupuleti, B. S. Purra
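Slide 7 lists several intake-tier checks: duplication, size, schema validation, and column/record counts. The Python sketch below shows them under stated assumptions. It is not taken from the presentation or from any Oracle service. The function `validate_file`, the `IntakeReport` class and the default size limit are hypothetical, and only CSV payloads are handled. Periodicity checks, lineage watermarks and bit-level scans are left out.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field


@dataclass
class IntakeReport:
    """Result of the intake checks for one incoming file (hypothetical structure)."""
    name: str
    sha256: str
    size_bytes: int
    row_count: int
    problems: list = field(default_factory=list)


def validate_file(name, payload, expected_columns, seen_hashes, max_bytes=10_000_000):
    """Run file-level and data-level checks on one CSV payload given as bytes."""
    digest = hashlib.sha256(payload).hexdigest()
    problems = []

    # File validation: duplicate detection via a content checksum, plus a size limit.
    if digest in seen_hashes:
        problems.append("duplicate")
    if not 0 < len(payload) <= max_bytes:
        problems.append("size")

    # Data integrity: the header must match the expected schema and
    # every data row must have the expected number of columns.
    reader = csv.reader(io.StringIO(payload.decode("utf-8")))
    header = next(reader, [])
    if header != expected_columns:
        problems.append("schema")
    rows = list(reader)
    if any(len(row) != len(expected_columns) for row in rows):
        problems.append("column_count")

    # Only files that pass every check are remembered as accepted into the raw zone.
    if not problems:
        seen_hashes.add(digest)
    return IntakeReport(name, digest, len(payload), len(rows), problems)


if __name__ == "__main__":
    seen = set()
    columns = ["id", "amount"]
    for file_name, data in [
        ("a.csv", b"id,amount\n1,10\n2,20\n"),
        ("b.csv", b"id,amount\n1,10\n2,20\n"),  # same content as a.csv
        ("c.csv", b"id,total\n1\n"),            # wrong header, short row
    ]:
        report = validate_file(file_name, data, columns, seen)
        print(report.name, report.row_count, report.problems)
```

Keying duplicate detection on a content hash means that a file re-delivered under a new name is still recognized.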
  • 8. Data Lab: key requirements
    - Based on raw data
    - Full access to data sources (SELECT only)
    - Complete sandbox environment
    - Agile experimentation: “fail fast”
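One way to picture the "full access to data sources (SELECT only)" requirement is a sandbox connection that can read the source but cannot change it. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for a real source system, opened through SQLite's read-only URI mode. The `sales` table and the helper functions are hypothetical. In a production database the same effect would normally come from granting the data lab account SELECT privileges only.

```python
import os
import sqlite3
import tempfile


def create_source_db(path):
    """Create a tiny stand-in 'source system' with a hypothetical sales table."""
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        con.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [("north", 120.0), ("south", 80.0), ("north", 40.0)],
        )
        con.commit()
    finally:
        con.close()


def open_read_only(path):
    """Open the source in SQLite's read-only URI mode: queries work, writes fail."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def run_demo():
    """Query the source from the 'lab' connection and try a forbidden write."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "source.db")
        create_source_db(db_path)
        lab = open_read_only(db_path)
        try:
            totals = lab.execute(
                "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
            ).fetchall()
            try:
                lab.execute("DELETE FROM sales")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True  # SQLite reports "attempt to write a readonly database"
        finally:
            lab.close()
    return totals, write_blocked


if __name__ == "__main__":
    print(run_demo())
```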
  • 9. Why cloud? Machine learning at scale
    - Alex Sadovsky, Director of Data Science at the Oracle Data Cloud, describes how to embrace cloud computing, Hive, and Spark to create machine learning solutions at scale (YouTube URL).
    - Use clusters heavily at times, but not permanently.
  • 10. Why cloud? For experimental work: some thoughts from Alex Sadovsky
    - “Data scientists should not be system administrators”: if hardware fails, throw it away; if someone messes up the OS, trash it; no support tickets, no time wasted.
    - “Data scientists should not have to deal with system administrators”: science is about experimentation; experimentation is about testing boundaries; no support tickets, no time wasted.
    - “Don’t be afraid to throw money (more computer resources) at a problem”: engineers and their time are often more expensive than computer resources; “burstable” solutions.
  • 11. Why cloud? Case study: the self-built 10-node Spark cluster (image: www.csm.ornl.gov/PR/clusters.jpg)
    - Building it yourself is expensive!
    - Cost of system engineers
    - Cost of additional development time, both in personnel (salary) and in displaced work
    - Cost of special developer skills: people with Python/SQL skills are currently still easier to find than Spark/Hive specialists
    - Be lazy. Why not pay ~3-5 € per hour for a single cloud instance instead of building and maintaining your own infrastructure for data experiments?
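The "be lazy" argument is a break-even calculation. The Python sketch below takes the slide's ~3-5 € per instance-hour as input. The presentation gives no monthly cost for the self-built cluster, so the 6,000 € in the example is a purely hypothetical figure, and the function names are illustrative.

```python
def cloud_cost(hours, eur_per_instance_hour, instances=1):
    """Pay-per-use cost: only the hours the instances actually run are billed."""
    return hours * eur_per_instance_hour * instances


def break_even_hours(self_build_eur, eur_per_instance_hour, instances=1):
    """Hours of cloud usage at which the cloud costs as much as the self-built option."""
    return self_build_eur / (eur_per_instance_hour * instances)


if __name__ == "__main__":
    SELF_BUILD_EUR_PER_MONTH = 6000.0  # hypothetical: hardware amortization + engineer time
    for rate in (3.0, 4.0, 5.0):       # the slide's ~3..5 EUR per instance-hour
        hours = break_even_hours(SELF_BUILD_EUR_PER_MONTH, rate, instances=10)
        print(f"{rate:.0f} EUR/h, 10 instances: break-even at {hours:.0f} h/month")
    # A bursty workload: 10 instances for 60 hours in a month at 4 EUR/h.
    print(cloud_cost(60, 4.0, instances=10))
```

Under these assumed numbers, a 10-instance cloud cluster is cheaper as long as it runs fewer than 120 to 200 hours per month, depending on the hourly rate. The 60-hour burst in the example costs 2,400 €.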
  • 12. Oracle Data Management & Analytics Platform
    - Data lake: Big Data Cloud Services on compute, storage, network and identity
    - Users: developers, business, IT, analysts
    - Analytics services: Oracle Analytics Cloud
    - Location & network relationships: Spatial, Graph
    - Machine learning: ORAAH, R, Spark ML
    - Store & execute: Oracle Hadoop, Cloudera, NoSQL
    - Catalog: Data Catalog, Cloud Navigator
    - Query: Elasticsearch, SparkSQL
    - Database integration: connectors, Big Data SQL
    - Integration services: Data Integration Cloud Service (batch, streaming)
    - Data: personal, SaaS, mobile, enterprise, social, sensors
    - Capabilities: search, smarts, prediction, learning, mobile, natural language, personalization
    - All data; real-time & batch; data science & business users; agile; scalable; economical
  • 13. Oracle Data Lake & Analytics: key characteristics
    [Diagram: data sources (files, databases, real-time, cloud, others) reach the Oracle Public Cloud via streaming and batch, through adapter-based integration, change data capture, high-performance messaging, event processing and caching, and metadata enrichment. The lake provides long-term, low-cost data storage and SQL access to the data lake. Data scientists/developers use notebooks and discovery tools; business users use data discovery & analytics and reporting on a reporting database.]
  • 14. Oracle Data Lake & Analytics: key technologies and services
    [Diagram. Data sources: files, databases, real-time, cloud, others (batch). Components: Kafka (Kafka API, Kafka queue, streaming), GoldenGate, Object Storage (REST API), Big Data Prep, adapters, Integration Cloud Service, Spark (SparkSQL, Spark Streaming), Hive on Tez, Alluxio, HBase (HBase API), Python, HTML notebooks, JDBC access to Oracle DB. Data scientists/developers: RStudio, CloudBerry, notebooks in the web browser. Business users: Oracle Analytics Cloud (Business Intelligence, Data Visualization, Essbase, BI Mobile/Day by Day, Smart View/Office, web browser) and Data Visualization Desktop.]
  • 15. Section 2. Oracle Cloud Journey "The New Data Lake": setting up storage and Big Data services
  • 16. Scenario "The New Data Lake": core cloud modules
    [Diagram: data sources (files, databases, real-time, cloud, others) in batch and streaming mode; Object Storage (REST API), Kafka (Kafka API, Kafka queue), Spark (SparkSQL, Spark Streaming), Hive on Tez, Alluxio, HBase (HBase API), Python and HTML notebooks.]
  • 17. Oracle Storage Cloud Service: Object Storage
    • Very inexpensive, flexible storage for any kind of data, structured or unstructured
    • Unlike familiar file systems, objects contain data but are not organized in a hierarchy.
    • Every object sits at the same level of a flat address space, is described by its extended metadata, and is assigned a unique identifier.
    • Servers or end users can retrieve an object without knowing the physical location of its data.
    • This approach is useful for automating and streamlining data storage in cloud computing environments.
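The flat, identifier-based addressing described above can be illustrated with a small in-memory model. This is a conceptual sketch only: the class and method names are hypothetical and do not reflect the Oracle Storage Cloud Service API (which the scenario accesses over REST). It shows that names form one flat key space, that each object carries its own metadata and a unique ID, and that apparent "folders" are just name prefixes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str]
    # Unique identifier assigned when the object is stored.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatObjectStore:
    """Toy in-memory model of a flat object namespace (not Oracle's API)."""

    def __init__(self) -> None:
        # One flat mapping: no directories, every name lives at the same level.
        self._objects = {}

    def put(self, name: str, data: bytes, **metadata: str) -> str:
        """Store (or overwrite) an object and return its unique identifier."""
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[name] = obj
        return obj.object_id

    def get(self, name: str) -> bytes:
        # Callers address objects by name only; where the bytes live is hidden.
        return self._objects[name].data

    def get_metadata(self, name: str) -> dict[str, str]:
        return dict(self._objects[name].metadata)

    def list_names(self, prefix: str = "") -> list[str]:
        # "Folders" are just a naming convention: a prefix filter over flat keys.
        return sorted(name for name in self._objects if name.startswith(prefix))


if __name__ == "__main__":
    store = FlatObjectStore()
    store.put("bikes/2017-01.csv", b"...", content_type="text/csv")
    store.put("bikes/2017-02.csv", b"...", content_type="text/csv")
    store.put("scripts/bootstrap.sh", b"#!/bin/bash\n", content_type="text/x-shellscript")
    print(store.list_names("bikes/"))  # ['bikes/2017-01.csv', 'bikes/2017-02.csv']
```

The object names and the `content_type` metadata key are illustrative assumptions; a real object store attaches its own system metadata in addition to user-defined entries.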
  • 18. The New Data Lake: setting up and using Object Storage
    [Architecture diagram of the "New Data Lake" scenario: data sources, Object Storage, Kafka, Spark, Hive on Tez, Alluxio, HBase; users work with RStudio, CloudBerry, notebooks and Data Visualization Desktop.]
  • 23. The New Data Lake: setting up Object Storage and accessing it via CloudBerry*)
    [Architecture diagram of the "New Data Lake" scenario]
    *) Info and download: www.cloudberrylab.com/solutions/oracle-cloud
  • 29. The New Data Lake. Object Storage: storing a copy of a bootstrap script (for later automated installation and configuration)
    [Architecture diagram of the "New Data Lake" scenario; data sources shown: SFDC, Eloqua, RightNow, Twitter, others]
  • 30. Bootstrap scripting for the Oracle Big Data Cloud. Example: importing Zeppelin notebooks from GitHub
  • 31. Bootstrap scripting for the Oracle Big Data Cloud. Example: installing Anaconda including TensorFlow and configuring Python
  • 33. The New Data Lake. Object Storage: uploading the bootstrap script via the web UI
  • 34. The New Data Lake. Object Storage: further operations via the web UI
  • 36. The New Data Lake: setting up the Big Data Cloud Service
    [Architecture diagram of the "New Data Lake" scenario]
  • 40. The New Data Lake: setting up the Big Data Cloud Service, cluster configuration (demo sequence)
  • 41. The New Data Lake: setting up the Big Data Cloud Service, keys for SSH access (demo sequence)
  • 47. The New Data Lake: setting up the Big Data Cloud Service, adjusting access for SSH and the Ambari admin console
  • 57. Section 3. Oracle Cloud Journey, part 1: coding and analysis with notebooks. Making full use of Big Data technologies
  • 58. Notebooks: documents with live, executable code in almost any programming language
  • 60. Demo use case New York City Bikes: analyzing historical and streaming data
    [Diagram labels: historical data, downloaded from the Amazon cloud (AWS S3); real-time data]
  • 61. Big Data Cloud Journey: notebook basics, file operations in HDFS and Object Storage, creating Hive tables
    [Architecture diagram of the "New Data Lake" scenario]
  • 62. Oracle Big Data Cloud Service – Compute Edition: Hive with the Tez query execution engine
    docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_performance_tuning/content/hive_perf_best_pract_config_tez.html
  • 66. Big Data Cloud Journey: working with Spark Scala and Spark SQL, in-memory caching
    [Architecture diagram of the "New Data Lake" scenario]
  • 69. Big Data Cloud Journey: working with data discovery tools
    [Architecture diagram of the "New Data Lake" scenario]
  • 70. Oracle Data Visualization: data preparation, blending and interactive analysis in one tool
    Components: Data Set Management, Lightweight Data Profiling, Data Flow Editor, Visual Analyzer
  • 71. Oracle Data Visualization: process flow
    • Import, refresh and manage data sets
    • Inspect new data, check it for quality and completeness, and understand it
    • Clean, filter, combine and enrich data sets (build data pipelines)
    • Analyze interactively, discover and visualize relationships
    • Annotate results and document the analysis steps (storytelling)
    Iterations: add new data sources; look at other existing data sets; prepare and enrich data sets further; pursue completely new questions and analysis ideas; implement new perspectives and views of the data.
    With the tool,
    • complete analysis projects can be carried out,
    • rapid prototyping is possible (instead of rigid specifications on paper),
    • one-off analyses can be done that do not (yet) justify extending the business intelligence platform.
  • 74. Oracle Analytics Cloud (OAC): "And now all of this 'enterprise-ready', please"
    [Diagram: OAC comprises Oracle Data Visualization*), Oracle Business Intelligence, Oracle Day by Day and Oracle Essbase, together with Database as a Service (DBaaS) for the middle-tier schema and Oracle Storage Cloud Service (OSCS) for backup, restore, DataViz mashups, …]
    *) Importing custom DV plug-ins and R scripts or additional R packages is technically possible, but official support from Oracle is still in preparation.
    Oracle Analytics Cloud: cloud.oracle.com/de_DE/oac
  • 75. Oracle Analytics Cloud: interplay with Big Data services
    [Architecture diagram: Big Data services (Kafka, GoldenGate, Object Storage, Spark, Hive on Tez, Alluxio, HBase, Integration Cloud Service) alongside Oracle Analytics Cloud (Business Intelligence, Data Visualization, Essbase, BI Mobile/Day by Day, Smart View/Office) and Oracle DB via JDBC; data sources: SFDC, Eloqua, RightNow, Twitter, others]
  • 76. Section 4. Oracle Cloud Journey "The New Data Lake": setting up the Event Hub Service
  • 77. The New Data Lake: Oracle Event Hub Cloud Service uses Apache Kafka as its key technology
    Apache Kafka: a message broker whose architecture allows data streams to be processed with very high message throughput at low latency.
    Key components:
    • Applications that write data to a Kafka cluster are called producers; applications that read data from it are called consumers.
    • Data sent to a Kafka cluster is grouped into so-called topics.
    • A topic can in turn be split into several partitions, and each partition can be stored redundantly on several nodes (brokers). Within a partition, records are stored in the order in which they were written.
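The ordering guarantee in the last bullet is easiest to see in a tiny model. The following is a deliberately simplified, single-process illustration, not the Kafka or OEHCS client API: all class and method names, the topic name and the station keys are hypothetical, and replication across brokers is not modeled. Producers append to a partition chosen by key, and a consumer reads each partition in offset order.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """Single-process model of a Kafka topic's partitions (not the Kafka client API)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.name = name
        # Each partition is an append-only log; list index == record offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. CRC32 is a stand-in; Kafka's Java client uses murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> tuple[int, int]:
        """Producer side: append a record and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Consumer side: reads every partition in order, tracking the next offset per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.next_offset = defaultdict(int)  # partition -> next offset to read

    def poll(self) -> list[tuple[int, int, str, str]]:
        """Return records written since the last poll as (partition, offset, key, value)."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            for offset in range(self.next_offset[p], len(log)):
                key, value = log[offset]
                records.append((p, offset, key, value))
            self.next_offset[p] = len(log)
        return records


if __name__ == "__main__":
    topic = ToyTopic("bike-usage", num_partitions=3)
    for i in range(3):
        topic.produce("station-72", f"pickup {i}")
    topic.produce("station-79", "return A")
    consumer = ToyConsumer(topic)
    for record in consumer.poll():
        print(record)  # station-72 records appear in the order they were written
    print(consumer.poll())  # [] : nothing new since the last poll
```

Ordering holds only per partition: records with different keys may land in different partitions, and the model makes no promise about their relative order.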
  • 78. The New Data Lake: ways to use Kafka (source: Confluent, www.confluent.io)
  • 79. The New Data Lake: setting up the Event Hub Service (OEHCS)
    [Architecture diagram of the "New Data Lake" scenario]
  • 80. The New Data Lake: OEHCS configuration, part 1
  • 98. The New Data Lake: OEHCS configuration, part 2
  • 105. Section 5. Oracle Cloud Journey, part 2: streaming data. Making full use of Big Data technologies
  • 106. Big Data Cloud Journey: working with Kafka and Spark Streaming (simulation: bike usage as a real-time data stream)
    [Architecture diagram of the "New Data Lake" scenario]
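The demo simulates bike usage as a real-time stream that Spark Streaming consumes from Kafka. As a hedged, framework-free illustration of the kind of windowed aggregation such a job typically performs, the sketch below generates hypothetical bike events and counts pickups per station in tumbling (fixed, non-overlapping) time windows. It is plain Python, not Spark's API; the event fields, station IDs and event rate are assumptions, and the slide does not describe the aggregation actually used in the demo.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BikeEvent:
    timestamp: float  # seconds since the start of the simulation
    station_id: int   # hypothetical station identifier
    action: str       # "pickup" or "return"


def simulate_events(n: int, num_stations: int = 5, seed: int = 42) -> list[BikeEvent]:
    """Generate a reproducible, time-ordered stream of hypothetical bike events."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n):
        t += rng.expovariate(1.0)  # on average about one event per second
        events.append(BikeEvent(t, rng.randrange(num_stations), rng.choice(["pickup", "return"])))
    return events


def pickups_per_window(events: list[BikeEvent], window_seconds: float) -> dict[int, Counter]:
    """Tumbling windows: fixed-size, non-overlapping time buckets; count pickups per station."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    windows = {}
    for event in events:
        if event.action != "pickup":
            continue
        window = int(event.timestamp // window_seconds)  # index of the time bucket
        windows.setdefault(window, Counter())[event.station_id] += 1
    return windows


if __name__ == "__main__":
    stream = simulate_events(100)
    for window, counts in sorted(pickups_per_window(stream, 30.0).items()):
        print(f"window {window}: {dict(counts)}")
```

Spark Streaming applies the same idea to unbounded input by processing the stream in small batches; here the whole "stream" is a finite list so the result can be checked directly.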
  • 107. Oracle Big Data Cloud Service – Compute Edition: Big Data File System (BDFS), an in-memory caching layer (from Alluxio)
    docs.oracle.com/en/cloud/paas/big-data-compute-cloud/csspc/big-data-file-system-bdfs.html
  • 108. Big Data Cloud Journey: using the Event Hub Service in the demo scenario. Simulation: real-time data stream (bike usage); visualization: live map
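The general idea behind an in-memory caching layer such as BDFS can be shown in a few lines: reads are served from memory when possible and fall back to the slower backing store otherwise, with least-recently-used (LRU) eviction once capacity is reached. This is a conceptual sketch only; the class and parameter names are hypothetical, and the actual tiering and eviction policies of BDFS/Alluxio are not modeled.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy in-memory read-through cache with LRU eviction (conceptual; not BDFS or Alluxio)."""

    def __init__(self, backing_read: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._read = backing_read  # slow path, e.g. a read from object storage
        self._capacity = capacity
        self._entries = OrderedDict()  # path -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._entries:
            self._entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self._entries[path]
        self.misses += 1
        data = self._read(path)
        self._entries[path] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


if __name__ == "__main__":
    def slow_read(path: str) -> bytes:
        return f"contents of {path}".encode()

    cache = ReadThroughCache(slow_read, capacity=2)
    for path in ["a", "b", "a", "c", "b"]:
        cache.read(path)
    print(cache.hits, cache.misses)  # 1 4
```

In the run above, only the second read of "a" is a hit; reading "c" evicts "b", so the final read of "b" goes back to the slow store.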
  • 115. Demo sequence
  • 117. Section 6. Oracle Cloud Journey "The New Data Lake": cluster scale-out at the push of a button