BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA
HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Big Data - in the cloud or rather on-premises?
Guido Schmutz
Kassel, 21.9.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Cloud Primer
2. Big Data and IoT Architecture
3. Big Data in the Cloud
4. Various Models for Big Data in Cloud
5. Big Data On-Premises
6. Hybrid Big Data Solutions
Cloud Primer
Instance
• the thing running in the cloud provider’s infrastructure
• can be a VM but does not have to be
Instance Type
• the size of the instance (Combination of CPU, Memory, Disk Storage => Cost)
• Azure: Instance sizes
Instance Control
• lifecycle of an instance
• Instances can be stopped or terminated (deleted)
Images
• the template used for provisioning an
instance
Serverless
• Run code “without” servers => only
specify functions (Java, C#, Python,
Node.js)
• Pay only for the compute time you
consume
• easy scale-out
• management and capacity planning
decisions are made by the provider
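As a rough illustration of "only specify functions", here is a minimal sketch of a function in the style of an AWS Lambda Python handler, which receives an `event` and a `context`. The event shape (a `readings` list) and the response fields are assumptions made for this example and do not come from the slides.

```python
import json


def handler(event, context):
    """Entry point a serverless platform would invoke once per event.

    The 'readings' field is a hypothetical payload used for illustration.
    """
    readings = event.get("readings", [])
    if not readings:
        # Nothing to aggregate: report a client error.
        return {"statusCode": 400, "body": json.dumps({"error": "no readings"})}
    average = sum(readings) / len(readings)
    return {
        "statusCode": 200,
        "body": json.dumps({"count": len(readings), "average": average}),
    }


# Local call for experimentation; on a real platform the provider calls handler().
response = handler({"readings": [20.5, 21.0, 22.5]}, None)
```

There is no server setup, scaling logic, or capacity planning in the code; under this model those concerns stay with the provider.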
Regions and Availability Zones
• represents geographic distribution of
cloud provider
• Regions are the geographic areas
where a service is offered
• Availability Zones (AZ) add high
availability within a Region
• communication within a Region (between its
AZs) costs less than across Regions
Cloud Primer – Specific Instances
On-Demand Instance
• flexible, on-demand usage
• billing increment dependent on provider
Temporary Instance
• can disappear at any time (bid price)
• are charged significantly less
• well suited for Hadoop workloads (if storage
and compute are separated)
• AWS: spot instances
Reserved Instance
• reserved capacity in advance
• reduced pricing (up to 75% less than on-demand)
Dedicated Instance
• pay for instances
• run on hardware dedicated to you
• Amazon decides placement
Dedicated Host
• pay for entire physical server
• full flexibility of placement of instances (VM)
• solves existing server-bound licenses issues
Bare Metal
• bare hardware resources, no virtualization by
cloud provider
• full flexibility / full control
• almost no automation provided
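The pricing differences between these instance types can be sketched with a little arithmetic. The hourly rate and the discount for temporary instances below are hypothetical values chosen for illustration; only the "up to 75%" figure for reserved instances comes from the slide, and it is used here as an upper bound.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH, discount=0.0):
    """Cost of one instance for the given hours, after a fractional discount."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return hourly_rate * hours * (1.0 - discount)


ON_DEMAND_RATE = 0.40  # hypothetical on-demand price per hour

on_demand = monthly_cost(ON_DEMAND_RATE)
reserved = monthly_cost(ON_DEMAND_RATE, discount=0.75)   # slide's upper bound
temporary = monthly_cost(ON_DEMAND_RATE, discount=0.60)  # hypothetical discount

for name, cost in [("on-demand", on_demand), ("reserved", reserved), ("temporary", temporary)]:
    print(f"{name:10s} {cost:8.2f}")
```

The sketch ignores that temporary instances can disappear at any time, so a real comparison would also need to price in re-running interrupted work.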
Cloud Primer - Storage
Block Storage
• most common type offered by a cloud
provider
• disk-like storage
• comes with each instance when provisioned
• accessed as filesystem mounts => volumes,
disks
• persistent volumes survive beyond lifetime
of instance that spawned it
• ephemeral volumes are limited to life of
instance to which they are attached
• AWS: EBS
• Azure: VHDs & Azure File Storage
• Oracle: Block Storage
Object Storage
• each chunk of data is treated as its own
entity independent of any instance
• content of each object is opaque to the
provider
• API or URL is used to access data (no
mount)
• well suited for Big Data
• hot and cold storage options
• AWS: S3 & Glacier
• Azure: Azure Blob Storage
• Oracle: Object Storage & Archive Storage
Cloud Primer – Usage Patterns
Short Lived (Transient)
👍 Minimal maintenance, high efficiency
👎 spin up time, higher resource demand
👎 data transfer to permanent storage
Self-Service
👍 efficiency of on-demand creation
👎 need to maintain tooling
Cloud-Only
👍 data transfers stay within the cloud, minimal
on-premises costs, integration with provider
👎 higher cloud expenditure
Long lived (Long Running)
👍 less time waiting for clusters to start/stop
👍 lower resource demand
👎 wasted idle time (if there is any)
👎 maintenance burden, growing size over time
Managed
👍 easy alignment with budget constraints
👎 waiting time for usage, admin effort
Hybrid
👍 lower cloud expenditure, local resources
available
👎 complex workflows, data transfer costs
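The trade-off between short-lived and long-lived clusters (spin-up time versus idle time) can be made concrete with a small cost model. This is a sketch under simplifying assumptions: both clusters use the same node count and hourly rate, data transfer to permanent storage is not priced, and all numbers are hypothetical.

```python
def transient_cost(runs, job_hours, startup_hours, nodes, rate):
    """Cluster is created per run, so every run also pays for spin-up time."""
    return runs * (job_hours + startup_hours) * nodes * rate


def long_lived_cost(hours_in_period, nodes, rate):
    """Cluster runs for the whole period, whether it is busy or idle."""
    return hours_in_period * nodes * rate


def break_even_runs(hours_in_period, job_hours, startup_hours):
    """Number of runs per period at which both models cost the same."""
    return hours_in_period / (job_hours + startup_hours)


HOURS_IN_MONTH = 720          # 30 days
NODES, RATE = 10, 0.50        # hypothetical cluster size and price per node-hour

short_lived = transient_cost(runs=30, job_hours=3, startup_hours=0.25,
                             nodes=NODES, rate=RATE)
long_lived = long_lived_cost(HOURS_IN_MONTH, NODES, RATE)
threshold = break_even_runs(HOURS_IN_MONTH, job_hours=3, startup_hours=0.25)
```

With one three-hour job per day the transient cluster is far cheaper in this model; the long-lived cluster only pays off once the number of runs approaches `threshold`, that is, when the cluster would be busy almost all the time anyway.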
Big Data & IoT Architecture
Big Data & IoT Reference Architecture
[Diagram: reference architecture. Labels shown: Bulk Source (DB, File, Extract, CDC); Event Source (Location, Weather, IoT Data, Mobile Apps); Edge Node; Edge Cluster (Rule Engine, Event Hub, Storage, Serverless); Core Processing (Event Hub, Stream Processing, Stream Analytics, Batch Analytics, Parallel Processing, Serverless, Reference / Models, State / Results); Storage (Raw, Refined, Results); interfaces (Import, SQL, Search, Stream, Service, Export); consumers (BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps)]
Big Data & IoT Reference Architecture
[Diagram: the reference architecture with deployment zones labelled: Edge; Internet / Cloud / On-Premises; Cloud / On-Premises]
1) Bulk Source – Bulk Processing
[Diagram: the reference architecture, repeated for this scenario]
2) Bulk Source - Edge & Bulk Processing
[Diagram: the reference architecture, repeated for this scenario]
3) Event Source – Stream & Bulk Processing
[Diagram: the reference architecture, repeated for this scenario]
4) Event Source – Edge & Stream & Bulk Processing
[Diagram: the reference architecture, repeated for this scenario]
5) Stream Ingestion – Edge & Stream Processing
[Diagram: the reference architecture, repeated for this scenario]
Big Data in the Cloud
Big Data in the Cloud – two usage patterns
Short Lived Cluster (Transient)
data is repurposed, and used for a
specific use case in a specific workload
Cluster spun up only when needed
Flexibility
• spin up arbitrary number of nodes quickly
• Expand quickly from very small to very large
Simplicity
• use as is, solve problem and move on
Long Lived Cluster (Long Running)
data is acquired and augmented
continuously
cluster is in permanent use for mixed
workloads
Performance
• Raw compute performance across wide range
of workloads
• time of availability
BDaaS – Possible Cost Optimizations
Autoscaling
• scale up when a query comes in
• scale down when jobs finish
• match utilization with job demand
• benchmark: auto-scaling saves 33% in
compute costs compared to static-
sized cluster
Excess capacity
• Hadoop is fault tolerant, can take
advantage of unreliable instances
such as temporary instances
• benchmark: if 50% is done on spot
nodes, save 80% compared to normal
nodes
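The autoscaling idea (match utilization with job demand) can be sketched as follows. The demand trace, the capacity of four jobs per node, and the node limits are hypothetical; the resulting saving is a property of this made-up trace and is not meant to reproduce the benchmark figures quoted on the slide.

```python
from math import ceil


def nodes_needed(pending_jobs, jobs_per_node, min_nodes=1, max_nodes=20):
    """Cluster size for the current demand, clamped to the allowed range."""
    wanted = ceil(pending_jobs / jobs_per_node)
    return max(min_nodes, min(max_nodes, wanted))


def node_hours(hourly_demand, jobs_per_node, autoscale):
    """Total node-hours over the trace, with or without autoscaling."""
    sizes = [nodes_needed(jobs, jobs_per_node) for jobs in hourly_demand]
    if not autoscale:
        # A static cluster has to be sized for the peak hour all the time.
        sizes = [max(sizes)] * len(sizes)
    return sum(sizes)


hourly_demand = [0, 0, 4, 12, 40, 40, 16, 2]  # hypothetical pending jobs per hour

static_hours = node_hours(hourly_demand, jobs_per_node=4, autoscale=False)
elastic_hours = node_hours(hourly_demand, jobs_per_node=4, autoscale=True)
saving = 1 - elastic_hours / static_hours
```

How much autoscaling saves depends entirely on how bursty the workload is: a flat demand curve yields no saving, while a trace with short peaks like the one above yields a large one.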
Common workload distribution with Big Data applications
Data Locality vs. Compute/Storage Separation
[Diagram: two cluster layouts side by side. Data Local Compute: a Master Node and Workers #1 to #3, each with its own Disk and Processing, linked by the Network. Separate Compute and Storage: a Master Node and Compute nodes #1 to #3 with Processing only, linked by the Network to a Storage tier holding the Disks.]
Separation of compute
and storage – the
fundamental difference
• store data in Object
Storage instead of DFS
• bring up Compute nodes
only for data processing
• multiple workloads on
separate clusters can
access same data
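A toy model can make the separation of compute and storage tangible. The `ObjectStore` and `ComputeCluster` classes below are hypothetical stand-ins written for this sketch and do not correspond to any provider's API; the point is only that data addressed by key outlives the clusters that process it.

```python
class ObjectStore:
    """Toy stand-in for an object storage service: data is addressed by key."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def keys(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


class ComputeCluster:
    """Toy compute cluster that holds no data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.running = True

    def run(self, prefix, func):
        if not self.running:
            raise RuntimeError(f"cluster {self.name} has been terminated")
        # Read each object through the store's API and apply the job to it.
        return [func(self.store.get(key)) for key in self.store.keys(prefix)]

    def terminate(self):
        self.running = False


store = ObjectStore()
store.put("raw/2017-09-20.csv", b"a,b\n1,2\n")
store.put("raw/2017-09-21.csv", b"a,b\n3,4\n5,6\n")

etl = ComputeCluster("etl", store)
line_counts = etl.run("raw/", lambda data: data.count(b"\n"))
etl.terminate()  # compute is gone, the data is not

adhoc = ComputeCluster("adhoc", store)
sizes = adhoc.run("raw/", len)  # a second workload reads the same objects
```

Terminating the first cluster has no effect on the stored objects, which is what allows compute nodes to be brought up only for the duration of a job.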
A new way to Manage Big Data
Big Data: Traditional Assumptions
• Bare-metal
• Data Locality
• HDFS on local disks
Big Data: A New Approach
• Containers and VMs
• Compute and storage separation
• Shared storage
Benefits and Value
• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights
5 ½ ways to get Big Data in the Cloud
1. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on Bare Metal
2. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on VM
3. Hadoop PaaS from Cloud Provider’s Marketplace
4. Dedicated (Long-Running) Big-Data-as-a-Service
5. Elastic (Transient) Big-Data-as-a-Service (storage and compute
separated)
6. “Cloud on Premises” (Cloud Stack from Vendors on Premises)
Various Models for Big Data in
Cloud
1. Bare Metal Cloud (Bring Your Own Hadoop - BYOH)
2. IaaS with any Hadoop Distribution (Bring Your Own Hadoop)
3. PaaS with Hadoop (from Marketplace)
4. Dedicated (Long-Running) BDaaS
5. Elastic (Transient) BDaaS
6. BDaaS + Analytics SaaS
1) Bare Metal Cloud (BYOH)
Compute (Bare Metal)
• Amazon: n.a. (Dedicated Host is close, but runs VMs)
• Azure: n.a. (Dedicated Host is close, but runs VMs)
• Oracle: Oracle Compute
Storage (Bare Metal)
• Amazon: n.a.
• Azure: n.a.
• Oracle: Oracle Block Volume & Object Storage, Data Transfer Service
Big Data (Custom)
• Bring Your Own Hadoop (BYOH)
Analytics (Custom)
• Custom (SQL, Machine Learning, ...)
Intelligence (Custom)
• Custom (Image and Speech Recognition, Bots, ...)
2) IaaS (Bring Your Own Hadoop)
Compute (Bare Metal)
• Amazon: Amazon EC2 & EC2 …
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage (Bare Metal)
• Amazon: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
• Azure: Storage (Blob), Data Lake Store, Import/Export
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Custom)
• Amazon, Azure, Oracle: Bring Your Own Hadoop (BYOH)
Analytics (Custom)
• Amazon, Azure, Oracle: Custom (SQL, Machine Learning, ...)
Intelligence (Custom)
• Amazon, Azure, Oracle: Custom (Image and Speech Recognition, Bots, ...)
3) PaaS (Hadoop from Marketplace)
Compute (Bare Metal)
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage (Bare Metal)
• Amazon: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Custom)
• Amazon: Hadoop (Hortonworks, MapR)
• Azure: Hadoop (Cloudera, Hortonworks, MapR)
• Oracle: n.a.
Analytics (Custom)
• Amazon, Azure: Custom (SQL, Machine Learning, ...)
Intelligence (Custom)
• Amazon, Azure: Custom (Image and Speech Recognition, Bots, ...)
4) Dedicated BDaaS
Compute (Bare Metal)
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage (Bare Metal)
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Custom)
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS (Cloudera)
Analytics (Custom)
• Amazon, Azure, Oracle: Custom (SQL, Machine Learning, ...)
Intelligence (Custom)
• Amazon, Azure, Oracle: Image and Speech Recognition, Bots, ...
5) Elastic BDaaS
Compute (Bare Metal)
• Amazon: Amazon EC2
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage (Bare Metal)
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Custom)
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS Compute Edition (Hortonworks)
Analytics (Custom)
• Amazon, Azure, Oracle: Custom (SQL, Machine Learning, ...)
Intelligence (Custom)
• Amazon, Azure, Oracle: Image and Speech Recognition, Bots, ...
6) BDaaS + Analytics SaaS
Compute (Bare Metal)
• Amazon: Amazon EC2 & EC2 Dedicated Hosts
• Azure: Azure VM
• Oracle: General Purpose Compute & Dedicated Compute
Storage (Bare Metal)
• Amazon: S3, EBS, Glacier
• Azure: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store
• Oracle: Oracle Object & Archive Storage, Data Transfer Service
Big Data (Custom)
• Amazon: Amazon EMR
• Azure: Azure HDInsight (Hortonworks)
• Oracle: Big Data CS Compute Edition / Big Data CS
Analytics (Custom)
• Amazon: Machine Learning, Polly, ...
• Azure: Machine Learning, Data Lake Analytics, ...
• Oracle: Big Data Discovery CS, Analytics Cloud, Data Spatial & Graph
Intelligence (Custom)
• Amazon: Alexa, Lex, Polly
• Azure: Cortana, Speech API, Computer Vision API, Video API, ...
• Oracle: n.a.
Oracle Cloud
[Diagram: the reference architecture mapped to Oracle Cloud services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Big Data Preparation CS, Big Data SQL, Data Spatial & Graph, Object Storage, Archive Storage, Block Storage, Data Transfer Service, GoldenGate, Oracle Data Integrator CS, Integration CS, Messaging CS, Process CS, Mobile CS, Container CS, Application Container CS, Visual Builder, BI CS, Analytics CS, Data Visualization CS]
Amazon AWS
[Diagram: the reference architecture mapped to AWS services: Elastic MapReduce (EMR), Kinesis Streams, Kinesis Firehose, Kinesis Analytics, AWS IoT Platform, Greengrass, Rules Engine, Lambda, S3, Glacier, EBS, EFS, DynamoDB, Redshift, Athena, Glue, Data Pipeline, Elasticsearch, ElastiCache, EC2, Auto Scaling, EC2 Container Service, EC2 Container Registry, Snowball, Snowball Edge, Snowmobile, Direct Connect, ML, TensorFlow, Polly, Lex, Rekognition, Alexa, SQS, SNS, Email, Pinpoint, API Gateway, Mobile Hub, Mobile SDK, QuickSight]
Microsoft Azure
[Diagram: the reference architecture mapped to Azure services: HDInsight, Data Lake Store, Data Lake Analytics, Storage Blob, Storage Block, Table Storage, Cosmos DB, SQL Data Warehouse, Redis Cache, Event Hub, Event Grid, Stream Analytics, Time Series Insights, IoT Suite, IoT Hub, IoT Edge, Functions, Container Service, Container Registry, Container Instances, Import/Export, Machine Learning, Speech API, Vision API, Cortana, Bot Service, Service Bus, Notification Hub, API Management, BizTalk Services, Power BI]
Big Data On-Premises
On-Premises – Oracle Cloud Machine
[Diagram: the reference architecture mapped to Oracle Cloud Machine services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Big Data SQL, Data Spatial & Graph, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Integration CS, Messaging CS, Process CS, Mobile CS, Container CS, Application Container CS, BI CS]
On Premises – Open Source
Hybrid Big Data Solutions
[Diagram: the reference architecture divided into the zones Cloud; On-Prem; On-Prem/Edge/Internet]
[Diagram: the reference architecture, again divided into the zones Cloud; On-Prem; On-Prem/Edge/Internet]
[Diagram: the reference architecture divided into the zones Cloud; On-Prem/Edge/Internet; On-Prem]
[Diagram: the reference architecture divided into the zones Cloud; On-Prem/Edge]
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
@gschmutz guidoschmutz.wordpress.com

More Related Content

What's hot

How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...Amazon Web Services
 
Well Architected Framework - Data
Well Architected Framework - Data Well Architected Framework - Data
Well Architected Framework - Data Craig Milroy
 
How to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseHow to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseRightScale
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaKai Wähner
 
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Amazon Web Services
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasDataWorks Summit
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journeyAmazon Web Services
 
Cloud Adoption Framework - Overview_partner.pptx
Cloud Adoption Framework - Overview_partner.pptxCloud Adoption Framework - Overview_partner.pptx
Cloud Adoption Framework - Overview_partner.pptxabhishek22611
 
Containers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioContainers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioAraf Karsh Hamid
 
CAF presentation 09 16-2020
CAF presentation 09 16-2020CAF presentation 09 16-2020
CAF presentation 09 16-2020Michael Nichols
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSAmazon Web Services
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingAraf Karsh Hamid
 
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatchAmazon Web Services
 

What's hot (20)

AWS Intro & History
AWS Intro & HistoryAWS Intro & History
AWS Intro & History
 
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
How a Global Healthcare Company Built a Migration Factory to Quickly Move Tho...
 
Well Architected Framework - Data
Well Architected Framework - Data Well Architected Framework - Data
Well Architected Framework - Data
 
How to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your EnterpriseHow to Set Up a Cloud Cost Optimization Process for your Enterprise
How to Set Up a Cloud Cost Optimization Process for your Enterprise
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache KafkaReal-Life Use Cases & Architectures for Event Streaming with Apache Kafka
Real-Life Use Cases & Architectures for Event Streaming with Apache Kafka
 
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
Getting started on your AWS migration journey
Getting started on your AWS migration journeyGetting started on your AWS migration journey
Getting started on your AWS migration journey
 
Cloud Adoption Framework - Overview_partner.pptx
Cloud Adoption Framework - Overview_partner.pptxCloud Adoption Framework - Overview_partner.pptx
Cloud Adoption Framework - Overview_partner.pptx
 
Migrating to the Cloud
Migrating to the CloudMigrating to the Cloud
Migrating to the Cloud
 
Containers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes IstioContainers Docker Kind Kubernetes Istio
Containers Docker Kind Kubernetes Istio
 
FinOps
FinOpsFinOps
FinOps
 
CAF presentation 09 16-2020
CAF presentation 09 16-2020CAF presentation 09 16-2020
CAF presentation 09 16-2020
 
Executing a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWSExecuting a Large-Scale Migration to AWS
Executing a Large-Scale Migration to AWS
 
Serverless
ServerlessServerless
Serverless
 
Big Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb ShardingBig Data Redis Mongodb Dynamodb Sharding
Big Data Redis Mongodb Dynamodb Sharding
 
Serverless Architectures.pdf
Serverless Architectures.pdfServerless Architectures.pdf
Serverless Architectures.pdf
 
SMS-and-CloudEndure-Module4
SMS-and-CloudEndure-Module4SMS-and-CloudEndure-Module4
SMS-and-CloudEndure-Module4
 
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch
(DVO315) Log, Monitor and Analyze your IT with Amazon CloudWatch
 

Viewers also liked

Cisco Connect Toronto 2017 - Cloud and On Premises Collaboration Security Exp...
Cisco Connect Toronto 2017 - Cloud and On Premises Collaboration Security Exp...Cisco Connect Toronto 2017 - Cloud and On Premises Collaboration Security Exp...
Cisco Connect Toronto 2017 - Cloud and On Premises Collaboration Security Exp...Cisco Canada
 
Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?Guido Schmutz
 
GIS & Cloud Computing - GAASC 2010 Fall Summit - Florence, SC
GIS & Cloud Computing - GAASC 2010 Fall Summit - Florence, SCGIS & Cloud Computing - GAASC 2010 Fall Summit - Florence, SC
GIS & Cloud Computing - GAASC 2010 Fall Summit - Florence, SCJim Tochterman
 
OOW16 - Deploying Oracle E-Business Suite for On-Premises Cloud and Oracle Cl...
OOW16 - Deploying Oracle E-Business Suite for On-Premises Cloud and Oracle Cl...OOW16 - Deploying Oracle E-Business Suite for On-Premises Cloud and Oracle Cl...
OOW16 - Deploying Oracle E-Business Suite for On-Premises Cloud and Oracle Cl...vasuballa
 
Spatial Cloud Computing And Gis Web Version, Urisa October 2012
Spatial Cloud Computing And Gis Web Version, Urisa October 2012Spatial Cloud Computing And Gis Web Version, Urisa October 2012
Spatial Cloud Computing And Gis Web Version, Urisa October 2012HughPW
 
Cloud GIS Software – GEOCIRRUS
Cloud GIS Software – GEOCIRRUSCloud GIS Software – GEOCIRRUS
Cloud GIS Software – GEOCIRRUSGeoCirrus
 
Cloud GIS - GIS in the Rockies 2011
Cloud GIS - GIS in the Rockies 2011Cloud GIS - GIS in the Rockies 2011
Cloud GIS - GIS in the Rockies 2011chelm
 
How to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudHow to Build Modern Data Architectures Both On Premises and in the Cloud
How to Build Modern Data Architectures Both On Premises and in the CloudVMware Tanzu
 
David Overton: GIS in the cloud
David Overton: GIS in the cloudDavid Overton: GIS in the cloud
David Overton: GIS in the cloudAGI Geocommunity
 

Big Data - in the cloud or rather on-premises?

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
    Big Data - in the cloud or rather on-premises?
    Guido Schmutz
    Kassel, 21.9.2017
    @gschmutz guidoschmutz@wordpress.com
  • 2. Guido Schmutz
      • Working at Trivadis for more than 20 years
      • Oracle ACE Director for Fusion Middleware and SOA
      • Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
      • Head of Trivadis Architecture Board
      • Technology Manager @ Trivadis
      • More than 30 years of software development experience
    Contact: guido.schmutz@trivadis.com
    Blog: http://guidoschmutz.wordpress.com
    Slideshare: http://www.slideshare.net/gschmutz
    Twitter: gschmutz
  • 3. Agenda
    1. Cloud Primer
    2. Big Data and IoT Architecture
    3. Big Data in the Cloud
    4. Various Models for Big Data in Cloud
    5. Big Data On-Premises
    6. Hybrid Big Data Solutions
  • 5. Cloud Primer
    Instance
      • the thing running in the cloud provider’s infrastructure
      • can be a VM but does not have to be
    Instance Type
      • the size of the instance (combination of CPU, memory, disk storage => cost)
      • Azure: Instance sizes
    Instance Control
      • lifecycle of an instance
      • instances can be stopped or terminated (deleted)
  • 6. Cloud Primer
    Images
      • the template used for provisioning an instance
    Serverless
      • run code “without” servers => only specify functions (Java, C#, Python, Node.js)
      • pay only for the compute time you consume
      • easy scale-out
      • management and capacity planning decisions are made by the provider
    Regions and Availability Zones
      • represent the geographic distribution of a cloud provider
      • Regions are the geographic areas where a service is offered
      • Availability Zones (AZ) add high availability within a Region
      • communication within an AZ in the same region costs less than across regions
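The "only specify functions" point on the Serverless slide can be illustrated with a minimal sketch: the deployable unit is a single function, and provisioning, scaling and capacity planning are left to the provider. The `handler(event, context)` shape follows the convention AWS Lambda uses for Python handlers; the event layout (a `readings` list) is an assumption made up for this illustration and does not come from the slides.

```python
def handler(event, context):
    """Summarise a batch of sensor readings passed in the event payload.

    `event` is assumed (hypothetically) to look like {"readings": [1.0, 2.0]}.
    `context` is provider-supplied runtime information; it is unused here.
    """
    readings = event.get("readings", [])
    if not readings:
        # Nothing to aggregate: return an explicit empty summary.
        return {"count": 0, "mean": None}
    return {"count": len(readings), "mean": sum(readings) / len(readings)}


if __name__ == "__main__":
    # Local invocation for testing; in a serverless platform the provider
    # calls handler() once per incoming event.
    print(handler({"readings": [1, 2, 3]}, None))
```

Because the function keeps no state between calls, the provider can run as many copies in parallel as the event rate requires, which is what makes scale-out easy.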
  • 7. Cloud Primer – Specific Instances
    On-Demand Instance
      • flexible, on-demand usage
      • billing increment depends on the provider
    Temporary Instance
      • can disappear at any time (bid price)
      • are charged significantly less
      • well suited for Hadoop workloads (if storage and compute are separated)
      • AWS: spot instances
    Reserved Instance
      • capacity reserved in advance
      • reduced pricing (up to 75% less than on-demand)
    Dedicated Instance
      • pay for instances
      • run on hardware dedicated to you
      • Amazon decides placement
    Dedicated Host
      • pay for an entire physical server
      • full flexibility in the placement of instances (VMs)
      • solves issues with existing server-bound licenses
    Bare Metal
      • bare hardware resources, no virtualization by the cloud provider
      • full flexibility / full control
      • almost no automation provided
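To make the trade-off between these instance models concrete, here is a rough cost sketch. The 75% reserved discount is the upper bound quoted on the slide; the hourly rate, the 60% discount for temporary instances, the 730-hour month, and the rule that reserved capacity is billed whether or not it is used are all assumptions for illustration only, not provider prices.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # assumed average number of hours in a month


@dataclass(frozen=True)
class PricingModel:
    name: str
    discount: float       # fraction off the on-demand hourly rate
    pay_when_idle: bool   # True if capacity is billed even when unused


def monthly_cost(model: PricingModel, on_demand_rate: float, hours_used: float) -> float:
    """Return the monthly bill for one instance under the given model."""
    billed_hours = HOURS_PER_MONTH if model.pay_when_idle else hours_used
    return round(on_demand_rate * (1 - model.discount) * billed_hours, 2)


ON_DEMAND = PricingModel("on-demand", discount=0.0, pay_when_idle=False)
RESERVED = PricingModel("reserved", discount=0.75, pay_when_idle=True)     # slide's upper bound
TEMPORARY = PricingModel("temporary", discount=0.60, pay_when_idle=False)  # hypothetical discount

if __name__ == "__main__":
    for hours in (100, 730):
        for model in (ON_DEMAND, RESERVED, TEMPORARY):
            print(hours, model.name, monthly_cost(model, 1.00, hours))
```

Under these assumptions a lightly used instance (100 hours) is cheaper on-demand than reserved, while a permanently running one is far cheaper reserved; temporary instances are cheapest per hour but can disappear at any time.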
  • 8. Cloud Primer - Storage
    Block Storage
      • most common type offered by a cloud provider
      • disk-like storage
      • comes with each instance when provisioned
      • accessed as filesystem mounts => volumes, disks
      • persistent volumes survive beyond the lifetime of the instance that spawned them
      • ephemeral volumes are limited to the life of the instance to which they are attached
      • AWS: EBS
      • Azure: VHDs & Azure File Storage
      • Oracle: Block Storage
    Object Storage
      • each chunk of data is treated as its own entity, independent of any instance
      • content of each object is opaque to the provider
      • an API or URL is used to access data (no mount)
      • well suited for Big Data
      • hot and cold storage options
      • AWS: S3 & Glacier
      • Azure: Azure Blob Storage
      • Oracle: Object Storage & Archive Storage
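The lifetime rules above can be sketched with a toy in-memory model. Nothing here talks to a real cloud API: `ObjectStore`, `Instance` and their methods are hypothetical names invented for this illustration, and plain dictionaries stand in for volumes.

```python
class ObjectStore:
    """Toy object store: opaque bytes addressed by key, independent of any instance."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


class Instance:
    """Toy instance with one ephemeral and one persistent block volume."""

    def __init__(self, persistent_volume: dict):
        self.ephemeral = {}                  # lives and dies with the instance
        self.persistent = persistent_volume  # outlives the instance

    def terminate(self) -> None:
        # Ephemeral data is lost on termination; the persistent volume
        # object is owned outside the instance and is left untouched.
        self.ephemeral.clear()


if __name__ == "__main__":
    store = ObjectStore()
    volume = {}
    vm = Instance(volume)
    vm.ephemeral["/tmp/scratch"] = b"intermediate"
    vm.persistent["/data/results"] = b"final"
    store.put("raw/2017-09-21.csv", b"a,b,c")
    vm.terminate()
    print(vm.ephemeral, volume, store.get("raw/2017-09-21.csv"))
```

Keeping input and output data in object storage rather than on instance volumes is what allows compute to be stopped or replaced freely, which is the precondition the previous slide names for running Hadoop workloads on temporary instances.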
  • 9. Cloud Primer – Usage Patterns
    Short Lived (Transient)
      👍 minimal maintenance, high efficiency
      👎 spin-up time, higher resource demand
      👎 data transfer to permanent storage
    Self-Service
      👍 efficiency of on-demand creation
      👎 need to maintain tooling
    Cloud-Only
      👍 data transfers stay within the cloud, minimal on-premises costs, integration with provider
      👎 higher cloud expenditure
    Long Lived (Long Running)
      👍 less time waiting for clusters to start/stop
      👍 lower resource demand
      👎 wasted idle time (if any)
      👎 maintenance burden, growing size over time
    Managed
      👍 easy alignment with budget constraints
      👎 waiting time for usage, admin effort
    Hybrid
      👍 lower cloud expenditure, local resources available
      👎 complex workflows, data transfer costs
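The short-lived versus long-lived trade-off (spin-up time on one side, wasted idle time on the other) can be put into numbers with a small sketch. The job counts, durations and the 15-minute spin-up time are assumptions chosen for illustration; the function names are hypothetical.

```python
def transient_cluster_hours(jobs: int, job_hours: float, spin_up_hours: float) -> float:
    """Billed hours when a fresh cluster is started for every job."""
    return jobs * (job_hours + spin_up_hours)


def long_lived_cluster_hours(days: int) -> float:
    """Billed hours when one cluster runs around the clock."""
    return 24.0 * days


def idle_fraction(jobs: int, job_hours: float, days: int) -> float:
    """Share of a long-lived cluster's billed time spent doing no work."""
    total = long_lived_cluster_hours(days)
    return (total - jobs * job_hours) / total


if __name__ == "__main__":
    # Assumed workload: 10 two-hour jobs per 30-day month, 15 minutes spin-up.
    print(transient_cluster_hours(10, 2.0, 0.25))
    print(long_lived_cluster_hours(30))
    print(idle_fraction(10, 2.0, 30))
```

With a sparse workload like this one, the transient pattern bills a small fraction of the hours despite paying the spin-up overhead on every job; as the cluster approaches continuous use, the idle share shrinks and the long-lived pattern's shorter waiting times become the better deal.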
  • 10. Big Data & IoT Architecture
  • 11. Big Data & IoT Reference Architecture [diagram; labels: Bulk Source, Event Source, Location, DB Extract, DB CDC, File, Weather, IoT Data, Mobile Apps, Edge Node, Rule Engine, Event Hub, Serverless, Serverless Processing, Cluster, Storage, Raw, Refined, Results, Core Processing, Batch Analytics, Parallel Processing, Reference / Models, Stream Processing, Stream Analytics, State / Results, Import, SQL / Stream, SQL / Export, Service / Stream / Export, Search, BI Tools, Enterprise Data Warehouse, Search / Explore, Enterprise Apps]
  • 12. Big Data & IoT Reference Architecture [same diagram as slide 11, with deployment zones marked: Edge; Cloud / On-Premises; Internet / Cloud / On-Premises]
  • 13. 1) Bulk Source – Bulk Processing [same diagram as slide 11]
  • 14. 2) Bulk Source – Edge & Bulk Processing [same diagram as slide 11]
  • 15. 3) Event Source – Stream & Bulk Processing [same diagram as slide 11]
  • 16. 4) Event Source – Edge & Stream & Bulk Processing [same diagram as slide 11]
  • 17. 5) Stream Ingestion – Edge & Stream Processing [same diagram as slide 11]
  • 18. Big Data & IoT Reference Architecture [same diagram as slide 11]
  • 19. Big Data in the Cloud
  • 20. Big Data in the Cloud – two usage patterns
      Short Lived Cluster (Transient): data is repurposed and used for a specific use case in a specific workload; the cluster is spun up only when needed
        Flexibility • spin up an arbitrary number of nodes quickly • expand quickly from very small to very large
        Simplicity • use as is, solve the problem and move on
      Long Lived Cluster (Long Running): data is acquired and augmented continuously; the cluster is in permanent use for mixed workloads
        Performance • raw compute performance across a wide range of workloads • time of availability
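The trade-off between the two patterns can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing model: the node count, the price per node-hour, the job length and the spin-up time are all hypothetical figures chosen for illustration, and it assumes that spin-up time is billed.

```python
def long_running_cost(nodes, price_per_node_hour, hours_in_period):
    """Long-lived cluster: billed for the whole period, busy or idle."""
    return nodes * price_per_node_hour * hours_in_period


def transient_cost(nodes, price_per_node_hour, runs, job_hours, spin_up_hours):
    """Transient cluster: billed only around each run, including spin-up time."""
    return nodes * price_per_node_hour * runs * (job_hours + spin_up_hours)


def break_even_runs(hours_in_period, job_hours, spin_up_hours):
    """Runs per period at which both patterns cost the same (same cluster size)."""
    return hours_in_period / (job_hours + spin_up_hours)


if __name__ == "__main__":
    month = 30 * 24  # hours in a 30-day month
    # Hypothetical: 10 nodes at 0.50 per node-hour, one 2-hour job per day,
    # 15 minutes of spin-up per run.
    print("long running:", long_running_cost(10, 0.50, month))
    print("transient:   ", transient_cost(10, 0.50, 30, 2.0, 0.25))
    print("break-even runs per month:", break_even_runs(month, 2.0, 0.25))
```

Under these assumptions the transient cluster is far cheaper for one job per day; the long-lived cluster only pays off once it is busy for most of the period, which matches the "permanent use for mixed workloads" description above.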
  • 21. BDaaS – Possible Cost Optimizations
      Autoscaling • scale up when a query comes in • scale down when jobs finish • match utilization with job demand • benchmark: auto-scaling saves 33% in compute costs compared to a static-sized cluster
      Excess capacity • Hadoop is fault tolerant and can take advantage of unreliable instances such as temporary instances • benchmark: if 50% is done on spot nodes, save 80% compared to normal nodes
      [figure: Common workload distribution with Big Data applications]
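The autoscaling idea (match the node count to job demand instead of sizing for the peak) can be sketched as follows. This is an illustrative toy, not a provider's autoscaling policy: the hourly demand profile, the tasks-per-node capacity and the node limits are hypothetical, and the resulting saving is not meant to reproduce the benchmark figure quoted on the slide.

```python
import math


def nodes_needed(pending_tasks, tasks_per_node, min_nodes=1, max_nodes=20):
    """Scale to current demand, clamped to [min_nodes, max_nodes]."""
    wanted = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, wanted))


def node_hours(demand_per_hour, tasks_per_node, static_nodes):
    """Return (static, autoscaled) node-hours for an hourly demand profile."""
    static = static_nodes * len(demand_per_hour)
    autoscaled = sum(
        nodes_needed(d, tasks_per_node, max_nodes=static_nodes)
        for d in demand_per_hour
    )
    return static, autoscaled


if __name__ == "__main__":
    # Hypothetical demand: idle at the edges, a burst in the middle.
    demand = [0, 0, 40, 80, 80, 20, 0, 0]
    static, autoscaled = node_hours(demand, tasks_per_node=10, static_nodes=8)
    print("static node-hours:    ", static)
    print("autoscaled node-hours:", autoscaled)
    print("saving: {:.0%}".format(1 - autoscaled / static))
```

The saving depends entirely on how bursty the workload is: the flatter the demand profile, the closer the autoscaled figure gets to the static one.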
  • 22. Data Locality vs. Compute/Storage Separation
      [diagram, Data Local Compute: Master Node and Workers #1 to #3, each with Disk and Processing, connected by Network]
      [diagram, Separate Compute and Storage: Master Node and Compute #1 to #3 (Processing), connected by Network to Storage (Disks)]
      Separation of compute and storage – the fundamental difference
      • store data in Object Storage instead of DFS
      • bring up Compute nodes only for data processing
      • multiple workloads on separate clusters can access the same data
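A toy model may help to show what the separation buys. In the sketch below, `ObjectStore` and `TransientCluster` are hypothetical stand-ins written for this illustration (they are not a real storage or Hadoop API): data lives only in the shared store, compute clusters come and go, and two independent workloads read the same data.

```python
class ObjectStore:
    """Stand-in for shared object storage; data outlives any cluster."""

    def __init__(self):
        self._objects = {}

    def put(self, key, records):
        self._objects[key] = list(records)

    def get(self, key):
        return list(self._objects[key])


class TransientCluster:
    """Compute-only cluster: holds no data and exists only while in use."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Terminating the cluster loses nothing: the data is in the store.
        self.store = None
        return False

    def run(self, key, job):
        return job(self.store.get(key))


store = ObjectStore()
store.put("raw/readings", [3, 8, 5, 13])

# Two separate workloads, each on its own short-lived cluster, same data.
with TransientCluster("etl", store) as etl:
    total = etl.run("raw/readings", sum)
with TransientCluster("reporting", store) as reporting:
    peak = reporting.run("raw/readings", max)

print(total, peak)
```

With data-local compute the equivalent of `__exit__` would also discard the data on the workers' disks, which is why clusters in that model tend to be long lived.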
  • 23. A new way to Manage Big Data
      Big Data, Traditional Assumptions: Bare-metal; Data Locality; HDFS on local disks
      Big Data, A New Approach: Containers and VMs; Compute and storage separation; Shared storage
      Benefits and Value: Big-Data-as-a-Service; Agility and cost savings; Faster time-to-insights
  • 24. 5 ½ ways to get Big Data in the Cloud
      1. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on Bare Metal
      2. “Bring your own Hadoop” (MapR, Cloudera, Hortonworks) on VM
      3. Hadoop PaaS from Cloud Provider’s Marketplace
      4. Dedicated (Long-Running) Big-Data-as-a-Service
      5. Elastic (Transient) Big-Data-as-a-Service (storage and compute separated)
      6. “Cloud on Premises” (Cloud Stack from Vendors on Premises)
  • 25. Various Models for Big Data in Cloud
  • 26. Various Models for Big Data in Cloud
      1. Bare Metal Cloud (Bring Your Own Hadoop - BYOH)
      2. IaaS with any Hadoop Distribution (Bring Your Own Hadoop)
      3. PaaS with Hadoop (from Marketplace)
      4. Dedicated (Long-Running) BDaaS
      5. Elastic (Transient) BDaaS
      6. BDaaS + Analytics SaaS
  • 27. 1) Bare Metal Cloud (BYOH)
      Layers: Compute (Bare Metal), Storage (Bare Metal), Big Data (Custom), Analytics (Custom), Intelligence (Custom)
      Amazon: n.a. (Dedicated Host is close, but runs VMs)
      Azure: n.a. (Dedicated Host is close, but runs VMs)
      Oracle: Compute: Oracle Compute; Storage: Oracle Block Volume & Object Storage, Data Transfer Service; Big Data: Bring Your Own Hadoop (BYOH); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
  • 28. 2) IaaS (Bring Your Own Hadoop)
      Amazon: Compute: Amazon EC2 & EC2; Storage: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile; Big Data: Bring Your Own Hadoop (BYOH); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
      Azure: Compute: Azure VM; Storage: Storage (Blob), Data Lake Store, Import/Export; Big Data: Bring Your Own Hadoop (BYOH); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
      Oracle: Compute: General Purpose Compute & Dedicated Compute; Storage: Oracle Object & Archive Storage, Data Transfer Service; Big Data: Bring Your Own Hadoop (BYOH); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
  • 29. 3) PaaS (Hadoop from Marketplace)
      Amazon: Compute: Amazon EC2; Storage: S3, EBS, Glacier, Snowball, Snowball Edge, Snowmobile; Big Data: Hadoop (Hortonworks, MapR); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
      Azure: Compute: Azure VM; Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store; Big Data: Hadoop (Cloudera, Hortonworks, MapR); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Custom (Image-, Speech-Recognition, Bots, …)
      Oracle: Compute: General Purpose Compute & Dedicated Compute; Storage: Oracle Object & Archive Storage, Data Transfer Service; Big Data: n.a.
  • 30. 4) Dedicated BDaaS
      Amazon: Compute: Amazon EC2; Storage: S3, EBS, Glacier; Big Data: Amazon EMR; Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
      Azure: Compute: Azure VM; Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store; Big Data: Azure HDInsight (Hortonworks); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
      Oracle: Compute: General Purpose Compute & Dedicated Compute; Storage: Oracle Object & Archive Storage, Data Transfer Service; Big Data: Big Data CS (Cloudera); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
  • 31. 5) Elastic BDaaS
      Amazon: Compute: Amazon EC2; Storage: S3, EBS, Glacier; Big Data: Amazon EMR; Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
      Azure: Compute: Azure VM; Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store; Big Data: Azure HDInsight (Hortonworks); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
      Oracle: Compute: General Purpose Compute & Dedicated Compute; Storage: Oracle Object & Archive Storage, Data Transfer Service; Big Data: Big Data CS Compute Edition (Hortonworks); Analytics: Custom (SQL, Machine Learning, ..); Intelligence: Image-, Speech-Recognition, Bots, …
  • 32. 6) BDaaS + Analytics SaaS
      Amazon: Compute: Amazon EC2 & EC2 Dedicated Hosts; Storage: S3, EBS, Glacier; Big Data: Amazon EMR; Analytics: Machine Learning, Polly, …; Intelligence: Alexa, Lex, Polly
      Azure: Compute: Azure VM; Storage: Azure Storage (Blob, Block, Disk, File), Azure Data Lake Store; Big Data: Azure HDInsight (Hortonworks); Analytics: Machine Learning, Data Lake Analytics, …; Intelligence: Cortana, Speech API, Computer Vision API, Video API, ...
      Oracle: Compute: General Purpose Compute & Dedicated Compute; Storage: Oracle Object & Archive Storage, Data Transfer Service; Big Data: Big Data CS Compute Edition / Big Data CS; Analytics: Big Data Discovery CS, Analytics Cloud, Data Spatial & Graph; Intelligence: n.a.
  • 33. Oracle Cloud [reference architecture diagram of slide 11, annotated with Oracle Cloud services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Data Spatial & Graph, BigData SQL, Integration CS, Messaging CS, BI CS, Process CS, Mobile CS, Container CS, Application Container CS, GoldenGate, Visual Builder, Big Data Preparation CS, Data Visualization CS, Oracle Data Integrator CS, Analytics CS]
  • 34. Amazon AWS [reference architecture diagram of slide 11, annotated with AWS services: Elastic MapReduce (EMR), Polly, ML, Lex, Rekognition, Alexa, Kinesis Analytics, Kinesis Streams, Kinesis Firehose, Snowmobile, Snowball, Snowball Edge, Direct Connect, AWS IoT Platform, Greengrass, Rules Engine, Lambda, S3, Glacier, EBS, EFS, DynamoDB, EC2 Auto Scaling, Athena, Redshift, EC2 Container Service, EC2 Container Registry, Mobile Hub, Mobile SDK, SQS, SNS, Email, Pinpoint, API Gateway, Elasticsearch, ElastiCache, TensorFlow, Glue, Data Pipeline, QuickSight]
  • 35. Microsoft Azure [reference architecture diagram of slide 11, annotated with Azure services: HDInsight, Storage Blob, Storage Block, Data Lake Store, Data Lake Analytics, Machine Learning, Event Hub, Event Grid, Stream Analytics, IoT Suite, IoT Hub, IoT Edge, Cosmos DB, Table Storage, Redis Cache, SQL Data Warehouse, Import/Export, Speech API, Vision API, Cortana, Bot Service, Service Bus, Notification Hub, API Management, Power BI, BizTalk Services, Functions, Container Service, Container Registry, Container Instances, Time Series Insights]
  • 37. On-Premises – Oracle Cloud Machine [reference architecture diagram of slide 11, annotated with Oracle services: IoT CS, Event Hub CS, Stream Analytics, Big Data CS, Big Data CS – Compute, NoSQL CS, Big Data Discovery CS, Object Storage, Archive Storage, Block Storage, Data Transfer Service, Data Spatial & Graph, BigData SQL, Integration CS, Messaging CS, BI CS, Process CS, Mobile CS, Container CS, Application Container CS]
  • 38. On-Premises – Open Source [reference architecture diagram of slide 11]
  • 39. Hybrid Big Data Solutions
  • 40. Hybrid Big Data Solutions [reference architecture diagram of slide 11, with zones marked: On-Prem/Edge/Internet; Cloud; On-Prem]
  • 41. Hybrid Big Data Solutions [reference architecture diagram of slide 11, with zones marked: On-Prem/Edge/Internet; Cloud; On-Prem]
  • 42. Hybrid Big Data Solutions [reference architecture diagram of slide 11, with zones marked: On-Prem/Edge/Internet; Cloud; On-Prem]
  • 43. Hybrid Big Data Solutions [reference architecture diagram of slide 11, with zones marked: On-Prem/Edge; Cloud]