Lowering the entry point to getting going with Hadoop and obtaining business ... - DataWorks Summit
SAS is a leader in advanced analytics with over 40 years of experience. They provide tools to manage, explore, develop models, and deploy analytics from, with, and within Hadoop. This allows customers to realize value from Hadoop throughout the entire analytics lifecycle. SAS helps address challenges like Hadoop skills shortages and tools not being optimized for big data. They demonstrated identifying reasons for abandoned shopping carts using Hadoop and SAS analytics tools.
How to get started in Big Data without Big Costs - StampedeCon 2016 - StampedeCon
Looking to implement Hadoop but haven’t pulled the trigger yet? You are not alone. Many companies have heard the hype about how Hadoop can solve the challenges presented by big data, but few have actually implemented it. What’s preventing them from taking the plunge? Can it be done in small steps to ensure project success?
This session will discuss some of the items to consider when getting started with Hadoop and how to go about making the decision to move to the de facto big data platform. Starting small can be a good approach when your company is learning the basics and deciding what direction to take. There is no need to invest large amounts of time and money up front if a proof of concept is all you aim to provide. Using well-known data sets on virtual machines provides a low-cost, low-effort way to find out whether your big data journey with Hadoop will be successful.
The Big Data Journey – How Companies Adopt Hadoop - StampedeCon 2016 - StampedeCon
Hadoop adoption is a journey. Depending on the business, the process can take weeks, months, or even years. Hadoop is a transformative technology, so the challenges have less to do with the technology and more to do with how a company adapts itself to a new way of thinking about data. Companies that have lived with an application-driven business for the last two decades face real challenges in suddenly becoming data-driven. They need to begin thinking less in terms of single, siloed servers and more about “the cluster”.
The concept of the cluster becomes the center of data gravity, drawing all the applications to it. Companies, especially their IT organizations, embark on a process of understanding how to maintain and operationalize this environment and provide the data lake as a service to the business. They must empower the business by providing the resources for the use cases which drive both renovation and innovation. IT needs to adopt new technologies and new methodologies which enable the solutions. This is not technology for technology's sake. Hadoop is a data platform servicing and enabling all facets of an organization. Building out and expanding this platform is the ongoing journey as word gets out to businesses that they can have any data they want at any time. Success is what drives the journey.
The length of the journey varies from company to company. Sometimes the challenges are based on the size of the company but many times the challenges are based on the difficulty of unseating established IT processes companies have adopted without forethought for the past two decades. Companies must navigate through the noise. Sifting through the noise to find those solutions which bring real value takes time. As the platform matures and becomes mainstream, more and more companies are finding it easier to adopt Hadoop. Hundreds of companies have already taken many steps; hundreds more have already taken the first step. As the wave of successful Hadoop adoption continues, more and more companies will see the value in starting the journey and paving the way for others.
This document discusses strategies for successfully utilizing a data lake. It notes that creating a data lake is just the beginning and that challenges include data governance, metadata management, access, and effective use of the data. The document advocates for data democratization through discovery, accessibility, and usability. It also discusses best practices like self-service BI and automated workload migration from data warehouses to reduce costs and risks. The key is to address the "data lake dilemma" of these challenges to avoid a "data swamp" and slow adoption.
1) Before Netezza, data analysis took a long time due to large datasets and restrictions. Productivity was low.
2) SAS software integrates with Netezza to enable faster analytics on large datasets without constraints.
3) The integration allows scoring algorithms and transforms to run directly on the Netezza database, improving performance and reducing data movement compared to traditional architectures.
This document provides information about Aetna, a health insurance company. It summarizes that Aetna serves about 46 million customers, helping them make healthcare decisions and manage healthcare spending. Aetna offers various medical, pharmacy, dental, life, and disability insurance plans as well as Medicaid services and behavioral health programs. As of March 2015, Aetna had approximately 23.7 million medical members, 15.5 million dental members, and 15.4 million pharmacy members. Aetna works with over 1.1 million healthcare professionals, including more than 674,000 primary care doctors and specialists, and 5,589 hospitals across the US and globally.
SAS and Netezza Enzee universe presentation_20_june2011 - Pavel Zhivulin
SAS is a leader in analytics software with over $2 billion in annual revenue. It has partnerships with Netezza to integrate SAS analytics capabilities with Netezza's high performance data warehouse appliances. Key products of this partnership include SAS Access to Netezza for optimized data transfer and SAS Scoring Accelerator for Netezza which deploys SAS predictive models directly in the Netezza database for faster, more scalable scoring.
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake... - NoSQLmatters
Come to this deep dive on how Pivotal's Data Lake Vision is evolving by embracing next generation in-memory data exchange and compute technologies around Spark and Tachyon. Did we say Hadoop, SQL, and what's the shortest path to get from past to future state? The next generation of data lake technology will leverage the availability of in-memory processing, with an architecture that supports multiple data analytics workloads within a single environment: SQL, R, Spark, batch and transactional.
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari - Hortonworks
Teradata Viewpoint provides a unified monitoring solution for Teradata Database, Aster, and Hadoop. It integrates with Ambari to simplify monitoring Hadoop. Viewpoint uses Ambari's REST APIs to collect metrics and alerts from Hadoop and store them in a database for trend analysis and visualization. This allows Viewpoint to deliver comprehensive Hadoop monitoring without having to understand its various monitoring technologies.
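The abstract doesn't include the actual calls, but the pattern it describes (polling Ambari over REST and persisting the results) can be sketched in a few lines of Python. The host, cluster name, and credentials below are placeholders, and the exact fields available vary by Ambari version.

```python
import requests

AMBARI = "http://ambari.example.com:8080"   # placeholder host
CLUSTER = "mycluster"                        # placeholder cluster name
AUTH = ("admin", "admin")                    # placeholder credentials

def get(path, params=None):
    """GET a resource from the Ambari REST API and return parsed JSON."""
    r = requests.get(f"{AMBARI}/api/v1{path}", auth=AUTH, params=params, timeout=10)
    r.raise_for_status()
    return r.json()

# List each service in the cluster with its current state.
services = get(f"/clusters/{CLUSTER}/services", params={"fields": "ServiceInfo/state"})
for item in services["items"]:
    info = item["ServiceInfo"]
    print(info["service_name"], info["state"])

# Pull current alert instances, which a monitor like Viewpoint could
# persist in a database for trend analysis and visualization.
alerts = get(f"/clusters/{CLUSTER}/alerts", params={"fields": "Alert/state,Alert/text"})
for item in alerts["items"]:
    print(item["Alert"]["state"], item["Alert"]["text"])
```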
Turn Data Into Actionable Insights - StampedeCon 2016 - StampedeCon
At Monsanto, emerging technologies such as IoT, advanced imaging, and geospatial platforms, along with molecular breeding, ancestry, and genomics data sets, have made us rethink how we approach developing, deploying, scaling, and distributing our software to accelerate predictive and prescriptive decisions. We created a cloud-based Data Science platform for the enterprise to address this need. Our primary goals were to perform analytics@scale and integrate analytics with our core product platforms.
As part of this talk, we will share our journey of transformation, showing how we enabled a collaborative discovery analytics environment for data science teams to perform model development; provisioned data through APIs and streams; deployed models to production through our auto-scaling big data compute in the cloud to perform streaming, cognitive, predictive, prescriptive, historical, and batch analytics@scale; and integrated analytics with our core product platforms to turn data into actionable insights.
10 Amazing Things To Do With a Hadoop-Based Data Lake - VMware Tanzu
Greg Chase, Director, Product Marketing, presents "Big Data: 10 Amazing Things to do With a Hadoop-based Data Lake" at the Strata Conference + Hadoop World 2014 in NYC.
Learn about SAS and Cloudera technical integration, how SAS builds on the enterprise data hub, and SAS In-Memory Statistics for Hadoop, machine learning capabilities.
Building a Data Pipeline With Tools From the Hadoop Ecosystem - StampedeCon 2016 - StampedeCon
This document discusses building a data pipeline using tools from the Apache Hadoop ecosystem. It begins with an introduction to the speaker and why Hadoop is useful for data pipelines. It then provides a matrix comparing the different Hadoop distributions and their included components. It outlines the various tiers of projects in the Hadoop ecosystem, noting that the list is not exhaustive. It also presents the typical data lifecycle of capture, enrichment, analysis, presentation, reporting, archival, and removal. The document concludes with a reference to demo code and a call for questions.
This document discusses how to build a successful data lake by focusing on the right data, platform, and interface. It emphasizes the importance of saving raw data to analyze later, organizing the data lake into zones with different governance levels, and providing self-service tools to find, understand, provision, prepare, and analyze data. It promotes the use of a smart data catalog like Waterline Data to automate metadata tagging, enable data discovery and collaboration, and maximize business value from the data lake.
Hadoop-based data lakes have become increasingly popular within today’s modern data architectures for their scalability, ability to handle data variety, and low cost. Many organizations start slowly with their data lake initiatives, but as the lakes grow, they struggle with data consistency, quality, and security, and lose confidence in their data lake initiatives.
This talk will discuss the need for good data governance mechanisms for Hadoop data lakes, their relationship with productivity, and how they help organizations meet regulatory and compliance requirements. The talk advocates adopting a different mindset for designing and implementing flexible governance mechanisms on Hadoop data lakes.
This document discusses deploying a governed data lake using Hadoop and Waterline Data Inventory. It begins by outlining the benefits of a data lake and differences between data lakes and data warehouses. It then discusses using Hadoop as the platform for the data lake and some challenges around governance, scale, and usability. The document proposes a three phase approach using Waterline Data Inventory to organize, inventory, and open up the data lake. It provides screenshots and descriptions of Waterline's key capabilities like metadata discovery, data profiling, sensitive data identification, governance tools, and self-service catalog. It also includes an overview of Waterline Data as a company.
This document discusses architecting Hadoop for adoption and data applications. It begins by explaining how traditional systems struggle as data volumes increase and how Hadoop can help address this issue. Potential Hadoop use cases are presented such as file archiving, data analytics, and ETL offloading. Total cost of ownership (TCO) is discussed for each use case. The document then covers important considerations for deploying Hadoop such as hardware selection, team structure, and impact across the organization. Lastly, it discusses lessons learned and the need for self-service tools going forward.
It Takes a Village: Organizational Alignment to Deliver Big Data Value in Hea... - DataWorks Summit
The business and technology teams within a health insurer must align the company’s central data platform with its data strategy. That requires substantial organizational alignment. Hear the firsthand perspective from Health Care Service Corporation (HCSC), the largest customer-owned health insurance company in the United States. The speaker will cover how they integrated membership information, regulatory compliance, and the general ledger, to improve overall healthcare management. At HCSC, the strong alignment between executive leadership, business portfolio direction, architectural strategy, technology delivery, and program management have helped create leading-edge capabilities which help the company respond nimbly to a quickly evolving healthcare industry.
Partners 2013 LinkedIn Use Cases for Teradata Connectors for Hadoop - Eric Sun
Teradata Connectors for Hadoop enable high-volume data movement between Teradata and Hadoop platforms. LinkedIn conducted a proof-of-concept using the connectors for use cases like copying clickstream data from Hadoop to Teradata for analytics and publishing dimension tables from Teradata to Hadoop for machine learning. The connectors help address challenges of scalability and tight processing windows for these large-scale data transfers.
Hortonworks Oracle Big Data Integration - Hortonworks
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
The document provides an agenda and overview of SAP Vora 1.4. It discusses SAP Vora's role in big data and data lakes, how it addresses challenges with big data, and its usage patterns across different industries like financial services, telecommunications, oil and gas, retail, and manufacturing. Key points include that SAP Vora leverages Hadoop and Spark for scalable and affordable big data storage and processing, provides a unified access layer and simplified data modeling for different data sources, and seamlessly integrates with SAP HANA for enterprise-grade analytics.
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W... - StampedeCon
This session will be a detailed recount of the design, implementation, and launch of the next-generation Shutterstock Data Platform, with strong emphasis on conveying clear, understandable learnings that can be transferred to your own organizations and projects. This platform was architected around the prevailing use of Kafka as a highly-scalable central data hub for shipping data across your organization in batch or streaming fashion. It also relies heavily on Avro as a serialization format and a global schema registry to provide structure that greatly improves quality and usability of our data sets, while also allowing the flexibility to evolve schemas and maintain backwards compatibility.
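As a concrete (and invented, not Shutterstock's) illustration of that Avro-over-Kafka pattern, the sketch below serializes a record against a fixed schema with fastavro and publishes it with kafka-python; in the platform described above the schema would come from the global schema registry rather than being hard-coded.

```python
import io

from fastavro import parse_schema, schemaless_writer
from kafka import KafkaProducer  # pip install kafka-python fastavro

# Hypothetical event schema; the platform described above would fetch
# this from a global schema registry instead of embedding it here.
SCHEMA = parse_schema({
    "type": "record",
    "name": "PageView",
    "namespace": "example.events",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "url", "type": "string"},
        {"name": "ts_ms", "type": "long"},
    ],
})

def encode(record: dict) -> bytes:
    """Serialize a record to compact Avro binary (schema not embedded)."""
    buf = io.BytesIO()
    schemaless_writer(buf, SCHEMA, record)
    return buf.getvalue()

producer = KafkaProducer(bootstrap_servers="broker.example.com:9092")
producer.send("page-views", encode({
    "user_id": "u42",
    "url": "/search?q=hadoop",
    "ts_ms": 1470000000000,
}))
producer.flush()
```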
As a company, Shutterstock has always focused heavily on leveraging open source technologies in developing its products and infrastructure, and open source has been a driving force in big data more so than almost any other software sub-sector. With this plethora of constantly evolving data technologies, it can be a daunting task to select the right tool for your problem. We will discuss our approach for choosing specific existing technologies and when we made decisions to invest time in home-grown components and solutions.
We will cover advantages and the engineering process of developing language-agnostic APIs for publishing to and consuming from the data platform. These APIs can power some very interesting streaming analytics solutions that are easily accessible to teams across our engineering organization.
We will also discuss some of the massive advantages a global schema for your data provides for downstream ETL and data analytics. ETL into Hadoop and creation and maintenance of Hive databases and tables becomes much more reliable and easily automated with historically compatible schemas. To complement this schema-based approach, we will cover results of performance testing various file formats and compression schemes in Hadoop and Hive, the massive performance benefits you can gain in analytical workloads by leveraging highly optimized columnar file formats such as ORC and Parquet, and how you can use good old-fashioned Hive as a tool for easily and efficiently converting existing datasets into these formats.
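The Hive-based conversion mentioned above amounts to a CREATE TABLE ... AS SELECT into a columnar format. Here is a minimal sketch using PyHive; the host, database, and table names are placeholders, and ORC is interchangeable with Parquet (STORED AS PARQUET) in the DDL.

```python
from pyhive import hive  # pip install 'pyhive[hive]'

# Placeholder connection details for a HiveServer2 endpoint.
conn = hive.Connection(host="hive.example.com", port=10000, database="events")
cur = conn.cursor()

# Rewrite an existing row-oriented table into compressed ORC. The
# SELECT runs through Hive's normal execution engine, so the format
# conversion parallelizes across the cluster like any other query.
cur.execute("""
    CREATE TABLE page_views_orc
    STORED AS ORC
    TBLPROPERTIES ('orc.compress' = 'SNAPPY')
    AS SELECT * FROM page_views_raw
""")
```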
Finally, we will cover lessons learned in launching this platform across our organization, future improvements and further design, and the need for data engineers to understand and speak the languages of data scientists and web, infrastructure, and network engineers.
Innovation in the Data Warehouse - StampedeCon 2016 - StampedeCon
Enterprise Holdings first started with Hadoop as a POC in 2013. Today, we have clusters on premises and in the cloud. This talk will explore our experience with Big Data and outline three common big data architectures (batch, lambda, and kappa). Then, we’ll dive into the decision points necessary for your own cluster, for example: cloud vs. on premises, physical vs. virtual, workload, and security. These decisions will help you understand what direction to take. Finally, we’ll share some lessons learned about the pieces of our architecture that worked well and rant about those which didn’t. No deep Hadoop knowledge is necessary; the talk is aimed at the architect or executive level.
Operational Analytics Using Spark and NoSQL Data Stores - DATAVERSITY
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark (a generic PySpark sketch follows the list below):
•The emergence of NoSQL, Hadoop and Apache Spark
•NoSQL Use Cases
•The need for operational analytics
•Types of operational analysis
•Key requirements for operational analytics
•Operational analytics using the Basho Data Platform with Apache Spark.
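The Basho connector's API is not shown in this abstract, so the sketch below uses plain PySpark as a stand-in to show the shape of such an operational-analytics job: aggregate recent events (here read from an invented HDFS export path) into a per-service error rate.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("operational-analytics").getOrCreate()

# Hypothetical export of recent request events from a NoSQL store.
events = spark.read.json("hdfs:///exports/requests/latest/")

# A typical operational metric: error rate per service.
error_rates = (
    events.groupBy("service")
          .agg(F.count("*").alias("requests"),
               F.sum(F.when(F.col("status") >= 500, 1).otherwise(0)).alias("errors"))
          .withColumn("error_rate", F.col("errors") / F.col("requests"))
)
error_rates.orderBy(F.desc("error_rate")).show()
```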
The document discusses Teradata's portfolio for Hadoop, including the Teradata Aster Big Analytics Appliance, the Teradata Appliance for Hadoop, a commodity offering with Dell, and support for the Hortonworks Data Platform. It provides consulting, training, support, and managed services for Hadoop. Teradata SQL-H gives business users standard SQL access to data stored in Hadoop through Teradata, allowing queries to run quickly on Teradata while accessing data from Hadoop efficiently through HCatalog.
Expand a Data warehouse with Hadoop and Big Data - jdijcks
After investing years in the data warehouse, are you now supposed to start over? Nope. This session discusses how to leverage Hadoop and big data technologies to augment the data warehouse with new data, new capabilities and new business models.
This document discusses navigating user data management and data discovery. It provides an overview of evaluating and selecting data management tools for a Hadoop data lake. Key criteria for evaluation include metadata curation, lineage and versioning, integration capabilities, and performance. Several vendors were evaluated, with Global ID, Attivio, and Waterline Data scoring highest based on the criteria. The presentation emphasizes selecting a limited number of tools based on business and user requirements.
This document discusses how data science and AI are fueling new business models driven by data. It summarizes that (1) connected devices, customers, and sensors are generating massive amounts of data across manufacturing, distribution, marketing, sales, and service; (2) technologies like cloud computing, streaming data, IoT, and machine learning are enabling new ways to harness this data; and (3) a modern data architecture is needed to encompass all data sources, enable analytics and machine learning, and power actionable intelligence across edge, cloud, and on-premises environments.
The document provides an overview of Apache Hadoop and how it addresses challenges with traditional data architectures. It discusses how Hadoop uses HDFS for distributed storage and YARN as a data operating system to allow for distributed computing. It also summarizes different data access methods in Hadoop including MapReduce for batch processing and how the Hadoop ecosystem continues to evolve and include technologies like Spark, Hive and HBase.
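To make the batch model concrete, here is the textbook word count written for Hadoop Streaming, which lets MapReduce drive a plain Python script as both mapper and reducer (an illustration, not taken from the document; paths in the comment are placeholders).

```python
#!/usr/bin/env python
"""Word count for Hadoop Streaming: run with 'map' or 'reduce' as argv[1].

Submit with (placeholder paths):
  hadoop jar hadoop-streaming.jar \
    -input /data/text -output /data/counts \
    -mapper 'wordcount.py map' -reducer 'wordcount.py reduce' \
    -file wordcount.py
"""
import sys

def do_map():
    # Emit one (word, 1) pair per word; the framework sorts pairs by
    # key between the map and reduce phases.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def do_reduce():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    do_map() if sys.argv[1] == "map" else do_reduce()
```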
Managing Enterprise Hadoop Clusters with Apache Ambari - Hortonworks
This document discusses Apache Ambari, an open-source platform for managing Hadoop clusters. It provides an overview of Ambari, describing its key features including stacks, blueprints, views and extensibility points. It also demonstrates Ambari's capabilities for cluster deployment, management, monitoring and upgrades through a stack and blueprint demo.
This document discusses organizing data in a data lake or "data reservoir". It describes the changing data landscape with multiple platforms for different analytical workloads. It outlines issues with the current siloed approach to data integration and management. The document introduces the concept of a data reservoir - a collaborative, governed environment for rapidly producing information. Key capabilities of a data reservoir include data collection, classification, governance, refinery, consumption, and virtualization. It describes how a data reservoir uses zones to organize data at different stages and uses workflows and an information catalog to manage the information production process across the reservoir.
This document discusses new features in SAP HANA SPS 10 for Hadoop and Spark integration, including a native Spark SQL integration using a Spark adapter, Ambari integration with the HANA cockpit for unified administration of HANA and Hadoop nodes, and data lifecycle management between HANA and Hadoop using a relocation agent. It also provides steps for configuring the Spark controller and details the Ambari integration with the HANA cockpit.
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303) - Amazon Web Services
This document discusses building a big data analytics data lake. It begins with an overview of what a data lake is and the benefits it provides like quick data ingestion without schemas and storing all data in one centralized location. It then discusses important capabilities like ingestion, storage, cataloging, search, security and access controls. The document provides an example of how biotech company AMGEN built their own data lake on AWS. It concludes with a demonstration of an AWS data lake solution package that can be deployed via CloudFormation to build an initial data lake.
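The session's own templates aren't reproduced here, but the ingest-then-catalog loop it describes can be sketched with boto3 against today's S3 and Glue APIs (which postdate parts of this 2016 talk); the bucket, database, and column names are invented.

```python
import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

BUCKET = "example-data-lake"  # placeholder bucket

# 1) Ingest: land a raw file in the lake without any upfront schema.
s3.upload_file("clicks-2016-11-01.json", BUCKET,
               "raw/clicks/dt=2016-11-01/part-0.json")

# 2) Catalog: register the dataset so query engines (Athena, EMR) can find it.
glue.create_database(DatabaseInput={"Name": "lake_raw"})
glue.create_table(
    DatabaseName="lake_raw",
    TableInput={
        "Name": "clicks",
        "StorageDescriptor": {
            "Columns": [{"Name": "user_id", "Type": "string"},
                        {"Name": "url", "Type": "string"}],
            "Location": f"s3://{BUCKET}/raw/clicks/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {"SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"},
        },
        "PartitionKeys": [{"Name": "dt", "Type": "string"}],
    },
)
```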
This document provides an overview of Apache Atlas and how it addresses big data governance issues for enterprises. It discusses how Atlas provides a centralized metadata repository that allows users to understand data across Hadoop components. It also describes how Atlas integrates with Apache Ranger to enable dynamic security policies based on metadata tags. Finally, it outlines new capabilities in upcoming Atlas releases, including cross-component data lineage tracking and a business taxonomy/catalog.
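Atlas exposes that centralized repository over REST. As a hedged sketch using the Atlas v2 API's basic search (placeholder host and credentials), a client can look up every entity carrying a given classification tag, the same tags Ranger policies can key on:

```python
import requests

ATLAS = "http://atlas.example.com:21000"  # placeholder host
AUTH = ("admin", "admin")                 # placeholder credentials

# Basic search: find entities carrying the PII classification tag.
resp = requests.post(
    f"{ATLAS}/api/atlas/v2/search/basic",
    auth=AUTH,
    json={"classification": "PII", "limit": 25},
    timeout=10,
)
resp.raise_for_status()
for entity in resp.json().get("entities", []):
    print(entity["typeName"], entity["attributes"].get("qualifiedName"))
```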
Presentation big dataappliance-overview_oow_v3x - KinAnx
The document outlines Oracle's Big Data Appliance product. It discusses how businesses can use big data to gain insights and make better decisions. It then provides an overview of big data technologies like Hadoop and NoSQL databases. The rest of the document details the hardware, software, and applications that come pre-installed on Oracle's Big Data Appliance - including Hadoop, Oracle NoSQL Database, Oracle Data Integrator, and tools for loading and analyzing data. The summary states that the Big Data Appliance provides a complete, optimized solution for storing and analyzing less structured data, and integrates with Oracle Exadata for combined analysis of all data sources.
Hadoop and the Data Warehouse: Point/Counter Point - Inside Analysis
Robin Bloor and Teradata
Live Webcast on April 22, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=2e69345c0a6a4e5a8de6fc72652e3bc6
Can you replace the data warehouse with Hadoop? Is Hadoop an ideal ETL subsystem? And what is the real magic of Hadoop? Everyone is looking to capitalize on the insights that lie in the vast pools of big data. Generating the value of that data relies heavily on several factors, especially choosing the right solution for the right context. With so many options out there, how do organizations best integrate these new big data solutions with the existing data warehouse environment?
Register for this episode of The Briefing Room to hear veteran analyst Dr. Robin Bloor as he explains where Hadoop fits into the information ecosystem. He’ll be briefed by Dan Graham of Teradata, who will offer perspective on how Hadoop can play a critical role in the analytic architecture. Bloor and Graham will interactively discuss big data in the big picture of the data center and will also seek to dispel several common misconceptions about Hadoop.
Visit InsideAnalysis.com for more information.
Overview of Apache Trafodion (incubating), an enterprise-class transactional SQL-on-Hadoop DBMS, with operational use cases, what it takes to be a world-class RDBMS, some performance information, and the new company Esgyn, which will leverage Apache Trafodion for operational solutions.
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy - Inside Analysis
The Briefing Room with Neil Raden and Teradata
Live Webcast on August 19, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=1acd0b7ace309f765dc3196001d26a5e
Modern enterprises have been able to solve information management woes with the data warehouse, now a staple across the IT landscape that has evolved to a high level of sophistication and maturity with thousands of global implementations. Today’s modern enterprise has a similar challenge; big data and the fast evolution of the Hadoop ecosystem create plenty of new opportunities but also a significant number of operational pains as new solutions emerge.
Register for this episode of The Briefing Room to hear veteran Analyst Neil Raden as he explores the details and nature of Hadoop’s evolution. He’ll be briefed by Cesar Rojas of Teradata, who will share how Teradata solves some of the Hadoop operational challenges. He will also explain how the integration between Hadoop and the data warehouse can help organizations develop a more responsive and robust data management environment.
Visit InsideAnalysis.com for more information.
This document discusses Hortonworks and its mission to enable modern data architectures through Apache Hadoop. It provides details on Hortonworks' commitment to open source development through Apache, engineering Hadoop for enterprise use, and integrating Hadoop with existing technologies. The document outlines Hortonworks' services and the Hortonworks Data Platform (HDP) for storage, processing, and management of data in Hadoop. It also discusses Hortonworks' contributions to Apache Hadoop and related projects as well as enhancing SQL capabilities and performance in Apache Hive.
Today's organizations contend with more diverse applications, data, and systems than ever before – silos that are often fragmented and difficult to leverage together. iWay Big Data Integrator (BDI) simplifies the creation, management, and use of Hadoop-based data lakes. It provides a modern, native approach to Hadoop-based data integration and management that ensures high levels of capability, compatibility, and flexibility to help your organization.
Join us to learn how you can simplify adoption of Apache Hadoop using iWay Big Data Integrator. Learn about our ability to streamline the deployment of ingestion, transformation, and extraction tasks.
See the pre-recorded webcast online at: http://www.informationbuilders.com/webevents/online/24427#sthash.J0cRy1PG.dpuf
The document summarizes Oracle's Big Data Appliance and solutions. It discusses the Big Data Appliance hardware which includes 18 servers with 48GB memory, 12 Intel cores, and 24TB storage per node. The software includes Oracle Linux, Apache Hadoop, Oracle NoSQL Database, Oracle Data Integrator, and Oracle Loader for Hadoop. Oracle Loader for Hadoop can be used to load data from Hadoop into Oracle Database in online or offline mode. The Big Data Appliance provides an optimized platform for storing and analyzing large amounts of data and is integrated with Oracle Exadata.
The document discusses Oracle's data integration products and big data solutions. It outlines five core capabilities of Oracle's data integration platform, including data availability, data movement, data transformation, data governance, and streaming data. It then describes eight core products that address real-time and streaming integration, ELT integration, data preparation, streaming analytics, dataflow ML, metadata management, data quality, and more. The document also outlines five cloud solutions for data integration including data migrations, data warehouse integration, development and test environments, high availability, and heterogeneous cloud. Finally, it discusses pragmatic big data solutions for data ingestion, transformations, governance, connectors, and streaming big data.
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse - Rizaldy Ignacio
Big SQL 3.0 provides a powerful way to run SQL queries on Hadoop data without compromises. It uses a modern MPP architecture instead of MapReduce for high performance. Federation allows Big SQL to access external data sources within a single SQL statement, enabling hybrid data warehouse scenarios.
Hadoop and NoSQL joining forces by Dale Kim of MapR - Data Con LA
More and more organizations are turning to Hadoop and NoSQL to manage big data. In fact, many IT professionals consider each of those terms to be synonymous with big data. At the same time, these two technologies are seen as different beasts that handle different challenges. That means they are often deployed in a rather disjointed way, even when intended to solve the same overarching business problem. The emerging trend of “in-Hadoop databases” promises to narrow the deployment gap between them and enable new enterprise applications. In this talk, Dale will describe that integrated architecture and how customers have deployed it to benefit both the technical and the business teams.
How Oracle has managed to separate the SQL engine from its flagship database to process queries, together with the access drivers that can read data both from files on the Hadoop Distributed File System and from the data warehousing tool, Hive.
Hadoop and SQL: Delivering Analytics Across the Organization - Seeling Cheung
This document summarizes a presentation given by Nicholas Berg of Seagate and Adriana Zubiri of IBM on delivering analytics across organizations using Hadoop and SQL. Some key points discussed include Seagate's plans to use Hadoop to enable deeper analysis of factory and field data, the evolving Hadoop landscape and rise of SQL, and a performance comparison showing IBM's Big SQL outperforming Spark SQL, especially at scale. The document provides an overview of Seagate and IBM's strategies and experiences with Hadoop.
Leveraging SAP HANA with Apache Hadoop and SAP Analytics - Method360
The rise of big data and the Apache Hadoop platform allows for the capture and processing of data at an unprecedented scale and velocity. Watch this slide deck to get a comprehensive overview of the Apache Hadoop platform architecture and learn how to leverage the strengths of both the Apache Hadoop and SAP HANA platforms.
Hadoop and the Data Warehouse: When to Use Which - DataWorks Summit
In recent years, Apache™ Hadoop® has emerged from humble beginnings to disrupt the traditional disciplines of information management. As with all technology innovation, hype is rampant, and data professionals are easily overwhelmed by diverse opinions and confusing messages.
Even seasoned practitioners sometimes miss the point, claiming for example that Hadoop replaces relational databases and is becoming the new data warehouse. It is easy to see where these claims originate, since both Hadoop and Teradata® systems run in parallel, scale up to enormous data volumes, and have shared-nothing architectures. At a conceptual level, it is easy to think they are interchangeable, but the differences overwhelm the similarities. This session will shed light on the differences and help architects, engineering executives, and data scientists identify when to deploy Hadoop and when it is best to use an MPP relational database in a data warehouse, discovery platform, or other workload-specific applications.
Two of the most trusted experts in their fields, Steve Wooledge, VP of Product Marketing at Teradata, and Jim Walker of Hortonworks, will examine how big data technologies are being used today by practical big data practitioners.
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud - DataWorks Summit
This document discusses how organizations can leverage data and analytics to power their business models. It provides examples of Fortune 100 companies that are using Attunity products to build data lakes and ingest data from SAP and other sources into Hadoop, Apache Kafka, and the cloud in order to perform real-time analytics. The document outlines the benefits of Attunity's data replication tools for extracting, transforming, and loading SAP and other enterprise data into data lakes and data warehouses.
Are you confused by Big Data? Get in touch with this new "black gold" and familiarize yourself with undiscovered insights through our complimentary introductory lesson on Big Data and Hadoop!
Azure Cafe Marketplace with Hortonworks March 31 2016 - Joan Novino
Azure Big Data: “Got Data? Go Modern and Monetize”.
In this session you will learn how the Hortonworks Data Platform (HDP), architected, developed, and built completely in the open, provides an enterprise-ready data platform for adopting a Modern Data Architecture.
Hitachi Data Systems Hadoop Solution. Customers are seeing exponential growth of unstructured data from their social media websites to operational sources. Their enterprise data warehouses are not designed to handle such high volumes and varieties of data. Hadoop, the latest software platform that scales to process massive volumes of unstructured and semi-structured data by distributing the workload through clusters of servers, is giving customers new options to tackle data growth and deploy big data analysis to help better understand their business. Hitachi Data Systems is launching its latest Hadoop reference architecture, which is pre-tested with the Cloudera Hadoop distribution to provide a faster time to market for customers deploying Hadoop applications. HDS, Cloudera and Hitachi Consulting will present together and explain how to get you there. Attend this WebTech and learn how to:
•Solve big-data problems with Hadoop.
•Deploy Hadoop in your data warehouse environment to better manage your unstructured and structured data.
•Implement Hadoop using the HDS Hadoop reference architecture.
For more information on the Hitachi Data Systems Hadoop Solution please read our blog: http://blogs.hds.com/hdsblog/2012/07/a-series-on-hadoop-architecture.html
The document provides an overview of Hadoop, including:
- What Hadoop is and its core modules like HDFS, YARN, and MapReduce (see the WebHDFS sketch after this list).
- Reasons for using Hadoop like its ability to process large datasets faster across clusters and provide predictive analytics.
- When Hadoop should and should not be used, such as for real-time analytics versus large, diverse datasets.
- Options for deploying Hadoop including as a service on cloud platforms, on infrastructure as a service providers, or on-premise with different distributions.
- Components that make up the Hadoop ecosystem like Pig, Hive, HBase, and Mahout.
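As a concrete touchpoint for the HDFS module listed above, the NameNode serves the WebHDFS REST API; this sketch lists a directory over plain HTTP (host, port, and path are placeholders; the default WebHDFS port differs across Hadoop versions).

```python
import requests

NAMENODE = "http://namenode.example.com:50070"  # placeholder host and port

# LISTSTATUS is the WebHDFS equivalent of 'hdfs dfs -ls'.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/data/raw",
    params={"op": "LISTSTATUS", "user.name": "hdfs"},
    timeout=10,
)
resp.raise_for_status()
for status in resp.json()["FileStatuses"]["FileStatus"]:
    print(status["type"], status["length"], status["pathSuffix"])
```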
Similar to "Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovation Cloud Solution Center Lead at DataVard (20)
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs...Dataconomy Media
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Frankfurt v 11.0 | "Can we be responsible for misuse of data & a...Dataconomy Media
The document discusses emerging technologies and their potential impacts, and questions how individuals and societies can responsibly address issues arising from new technologies. It notes that governments, regulators, and individuals struggle to understand new concepts that spread rapidly. It asks if there are existing systems or forms of cooperation that could help societies address responsibilities related to technologies, but offers no definitive solutions, mainly posing questions.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data ...Dataconomy Media
Every day we are challenged with more data, more use cases, and an ever-increasing demand for analytics. In this talk Bjorn will explain how autonomous data management and machine learning help innovators become more productive, and give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo...Dataconomy Media
This document contains an agenda and presentation materials for a talk on building and deploying an anti-money laundering (AML) model using DataRobot. The agenda includes introductions to DataRobot and AML, an AML demo, a real AML use case example, and a question and answer section. The presentation materials provide background on DataRobot, including its history and products. It also gives an overview of money laundering and how AML works, both traditionally using rule-based systems and how machine learning can help by reducing false positives and improving efficiency. A case study shows how DataRobot has helped other organizations with AML use cases.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So...Dataconomy Media
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "Building Kubernetes Operators with KUDO for Dat...Dataconomy Media
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de...Dataconomy Media
The document discusses data innovation and Men on the Moon's approach. It notes that while there is a large amount of available data worldwide, only a small portion is used to create value. Most data science projects also fail. The document then outlines Men on the Moon's "Data Thinking" approach, which combines design thinking and data science. Their approach involves defining a data vision, identifying use cases, prototyping solutions, and enabling employees. The goal is to leverage data to create valuable solutions for people through data innovation.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness...Dataconomy Media
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe...Dataconomy Media
People analytics applies data science techniques such as machine learning and pattern recognition to employee data to generate insights and reports that help businesses make smarter talent and operational decisions. These decisions can improve workforce effectiveness, engagement, recruitment, retention, and performance, while also increasing sales and reducing fraud and accidents. People analytics technologies include surveys, correlation analysis, machine learning, and AI, which can help companies improve their culture, develop employee skills, and boost growth when the results are properly implemented.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" -...Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any such issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
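The distributed redundancy the talk describes can be sketched with the DataStax Python driver (pip install cassandra-driver); the contact point, keyspace, and datacenter names below are hypothetical, not the speaker's configuration.

```python
# Minimal sketch: a keyspace whose rows are replicated three times per
# datacenter, so losing a server, a rack, or a whole DC leaves copies.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # one contact point; the driver discovers the rest
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3,
        'dc2': 3
    }
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (
        id uuid PRIMARY KEY,
        payload text
    )
""")
cluster.shutdown()
```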
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th...Dataconomy Media
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging data at scale can be handled while accounting for time, location, and complex hierarchical rules.
Data Natives Hamburg v 6.0 | "Interpersonal behavior: observing Alex to under...Dataconomy Media
This document discusses using machine learning to analyze individual and interpersonal behavior for clinical diagnosis and screening. It focuses on analyzing non-verbal behaviors like interpersonal synchronization that have been shown to be impaired in conditions like autism spectrum disorder. The document proposes that machine learning could provide an objective, automated tool for diagnosing conditions more quickly by analyzing video recordings of social interactions. This may help address bottlenecks in healthcare systems and allow earlier access to treatment.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"...Dataconomy Media
This document discusses the end-to-end experimentation platform at GetYourGuide for A/B testing. It outlines the challenges of running experiments such as imbalanced assignments, suspicious metric changes, and non-converging results. It also describes the tools used for planning experiments, monitoring assignments, performing daily checks, and analyzing results. The goal is to validate UX changes, estimate effects on customers, and make more objective decisions through A/B testing while addressing issues that could impact experiment quality.
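The abstract does not include code; as a generic illustration of one such "daily check", here is a hedged two-proportion z-test in Python for comparing conversion rates between variants. The sample numbers are invented, and this is not GetYourGuide's actual tooling.

```python
# Daily A/B check sketch: is variant B's conversion rate significantly
# different from A's? Uses a standard two-proportion z-test.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return p_b - p_a, p_value

# A heavily imbalanced sample ratio in a 50/50 experiment would itself be
# a red flag (the "imbalanced assignments" issue the talk mentions).
lift, p = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_050)
print(f"lift={lift:.4f}, p={p:.3f}")
```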
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A...Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any such issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ...Dataconomy Media
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented: it aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That is why this talk examines what role cloud computing plays in driving innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" -...Dataconomy Media
A presentation of the time-series properties of financial instruments and the possibilities for frequency decomposition and information extraction using the Fourier transform (FT), the short-time Fourier transform (STFT), and wavelets, with an outlook on current research on wavelet neural networks.
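As a rough illustration of the wavelet decomposition the talk covers, here is a sketch using the PyWavelets library (pip install PyWavelets); the "price" series below is synthetic, not real market data.

```python
# Multilevel discrete wavelet transform: split a signal into a slow
# approximation (trend) and detail bands of increasing frequency.
import numpy as np
import pywt

t = np.linspace(0, 1, 512)
# Slow cycle plus a fast oscillation, standing in for a price series.
signal = np.sin(2 * np.pi * 3 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

coeffs = pywt.wavedec(signal, "db4", level=4)
approx, details = coeffs[0], coeffs[1:]
print("trend coefficients:", approx.shape)
for i, d in enumerate(details, start=1):
    print(f"detail band {i}: {d.shape}")

# Crude denoising: zero the finest detail band and reconstruct.
coeffs[-1] = np.zeros_like(coeffs[-1])
smoothed = pywt.waverec(coeffs, "db4")
```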
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with ...Dataconomy Media
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help address limitations like ETL costs, scaling issues, and time-to-market for new models, and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
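Ignite's ML/DL APIs live on the JVM side; as a small getting-started taste from Python, this hedged sketch uses the pyignite thin client (pip install pyignite) to put training rows into a cache that cluster-side jobs could consume. The host, port, and cache name are hypothetical.

```python
# Thin-client sketch: store feature vectors in an Ignite cache so that
# in-cluster (JVM) training jobs can read them without a separate ETL hop.
from pyignite import Client

client = Client()
client.connect("127.0.0.1", 10800)  # default thin-client port

cache = client.get_or_create_cache("training_data")
cache.put(1, [5.1, 3.5, 1.4, 0.2])  # feature vector keyed by row id
cache.put(2, [4.9, 3.0, 1.4, 0.2])
print(cache.get(1))

client.close()
```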
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz...Dataconomy Media
"Machine learning algorithms require significant amounts of training data which has been centralized on one machine or in a datacenter so far. For numerous applications, such need of collecting data can be extremely privacy-invasive. Recent advancements in AI research approach this issue by a new paradigm of training AI models, i.e., Federated Learning.
In federated learning, edge devices (phones, computers, cars etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users’ data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
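Federated averaging, the core of this paradigm, can be sketched in a few lines of numpy. This toy linear-regression example is illustrative only, not the speaker's code: each "device" computes an update on its local data, and only model weights ever leave the device.

```python
# Toy federated-averaging rounds: raw data stays on each device; the
# server only sees and averages the locally updated weights.
import numpy as np

def local_update(weights, X, y, lr=0.1):
    # One gradient step of linear regression on this device's data.
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

rng = np.random.default_rng(0)
global_weights = np.zeros(3)
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]

for _ in range(10):
    updates = [local_update(global_weights, X, y) for X, y in devices]
    global_weights = np.mean(updates, axis=0)  # federated averaging

print(global_weights)
```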
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI Model Garden powered experiences and learn more about the integration of these generative AI APIs. We will see in action what the Gemini family of generative models offers developers for building and deploying AI-driven applications. Vertex AI includes a suite of foundation models, referred to as the PaLM and Gemini families of generative AI models, which come in different versions. We will cover how to use the API to:
- execute prompts in text and chat
- cover multimodal use cases with image prompts
- fine-tune and distill models to improve knowledge domains
- run function calls with foundation models to optimize them for specific tasks
At the end of the session, developers will understand how to innovate with generative AI and develop apps following current generative AI industry trends.
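A minimal sketch against the Vertex AI Python SDK (pip install google-cloud-aiplatform) follows; the exact module path and model name vary by SDK version, and a GCP project with Vertex AI enabled is assumed. The project id below is hypothetical.

```python
# Text prompt and multi-turn chat against a Gemini model on Vertex AI.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # hypothetical project

model = GenerativeModel("gemini-1.0-pro")

# Plain text prompt.
response = model.generate_content("Suggest three names for a travel app.")
print(response.text)

# Multi-turn chat keeps conversation state in the session object.
chat = model.start_chat()
print(chat.send_message("What is a vector database?").text)
```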
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Marlon Dumas
This webinar discusses the limitations of traditional approaches to business process simulation based on hand-crafted models with restrictive assumptions. It shows how process mining techniques can be assembled to discover high-fidelity digital twins of end-to-end processes from event data.
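One way to assemble such process-mining steps is with the open-source pm4py library (pip install pm4py); this is a generic sketch, not the webinar's own pipeline, and the event-log file name is hypothetical.

```python
# Discover a process model from an event log, then check how faithfully
# it replays the real data: high fitness is a prerequisite for a
# trustworthy digital twin.
import pm4py

log = pm4py.read_xes("purchase_to_pay.xes")
net, initial_marking, final_marking = pm4py.discover_petri_net_inductive(log)

fitness = pm4py.fitness_token_based_replay(log, net, initial_marking, final_marking)
print(fitness)
```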
Enhanced data collection methods can help uncover the true extent of child abuse and neglect. This includes Integrated Data Systems from various sources (e.g., schools, healthcare providers, social services) to identify patterns and potential cases of abuse and neglect.
Codeless Generative AI Pipelines (GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
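A small sketch of the vector-database end of such a pipeline, using pymilvus (pip install pymilvus); the URI, collection name, and toy embeddings are hypothetical, and in a real deployment NiFi would compute and deliver the embeddings.

```python
# Store and search toy embeddings in Milvus via the pymilvus client.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
client.create_collection(collection_name="docs", dimension=4)

client.insert(
    collection_name="docs",
    data=[
        {"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "text": "streaming with NiFi"},
        {"id": 2, "vector": [0.9, 0.1, 0.0, 0.2], "text": "Milvus stores vectors"},
    ],
)

# Retrieve the nearest neighbour for a query embedding.
hits = client.search(collection_name="docs", data=[[0.1, 0.2, 0.3, 0.4]], limit=1)
print(hits)
```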
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
"Integration of Hadoop in Business landscape", Michal Alexa, IT and Innovation Cloud Solution Center Lead at DataVard
1. # 1
Integration of Hadoop in Business landscape
Michal Alexa
Service Line Manager
Data Innovation Lab
December 2016
2. # 2
What happens on the Internet in 60 seconds (2014):
• 3,472 images pinned
• 72 hours of new video content uploaded
• 204,000,000 emails sent
• 4,000,000 search queries
• 277,000 tweets
• 347,222 photos sent
• Users swipe 416,667 times
• 2,460,000 new items of content shared
• 216,000 photos shared
• $83,000 in online sales
• 48,000 apps downloaded from the iTunes store
• 26,380 new reviews
5. # 5
Big-Data and Business world
Big-Data:
• Java, Python, Pig Latin
• Massive clusters for big-data processing
• Structured & unstructured data
• Apache & open source
• Distributions (e.g. Cloudera)
• Engines (Spark, Impala)
• Fast-paced evolution since 2006
Business:
• ABAP
• Client/Server
• Classic RDBMS as relational database
• Proprietary software with interfaces
• Engines: OLTP, OLAP
• World positioning: 76% of finance transactions, 78% of food production, 82% of medical devices
• Steady evolution since 1972
8. # 8
Biggest struggles in Data Management
Scalability:
• Lifetime sizing of a platform during procurement is no longer possible
• Hardware requirements limit possible growth
• Scaling up often comes at great cost, and scaling down is usually valueless
Data-Pipelines:
• Data transformations are I/O-intensive operations
• They take a lot of time and consume a lot of resources
• Limitations on the format of data
Granularity and Velocity:
• Limitations on the granularity of data; often only aggregated and cleaned data are stored
• Raw data are necessary for data science activities
Data-Silos:
• Too many places for storing data
• No interconnection between company units limits the possibilities for analyzing data
Extensibility:
• Data analysis requires many programming languages
• Limited application compatibility
9. # 9
What is Apache Hadoop?
A software framework for storing, processing, and analyzing "big data":
• Scalable
• Distributed
• Fault-tolerant
• Open source
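The slide stops at the definition; as a classic first Hadoop program, here is a hedged word-count sketch using Hadoop Streaming, which lets the deck's own Python run as mapper and reducer (input/output paths and the streaming jar location are supplied on the hadoop command line and are not shown here).

```python
# mapper.py: reads raw text on stdin and emits "word<TAB>1" per word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word.lower()}\t1")
```

```python
# reducer.py: input arrives sorted by key, so all counts for one word
# are adjacent and can be summed in a single pass.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

The same pair can be tested locally without a cluster: `cat input.txt | python mapper.py | sort | python reducer.py`.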
11. # 11
"Data-Lake" in Business infrastructure
[Diagram: a Data-Lake placed alongside BW, the source systems, and their logs]
12. # 12
"Data-Lake" in Business infrastructure
[Diagram: the same landscape with a second BW attached to the Data-Lake]
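To make the diagrams concrete, here is a hedged PySpark sketch of landing raw logs in the lake at full granularity, the point the earlier slide made about raw data for data science. The HDFS paths are hypothetical.

```python
# Land raw, unaggregated log records in the data lake, partitioned by
# ingest date, so downstream users are not limited to cleaned aggregates.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date

spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

logs = spark.read.json("hdfs:///landing/web_logs/")
(logs
    .withColumn("ingest_date", current_date())
    .write
    .mode("append")
    .partitionBy("ingest_date")
    .parquet("hdfs:///lake/raw/web_logs/"))

spark.stop()
```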
14. # 14
Emerging new technologies – Integration answers to Big Data
Smart Data Access (SDA):
• Data federation feature available on SAP HANA
• Not fully read-write
• Supports Sybase ASE, Sybase IQ, Teradata, Hadoop, and some other databases
Dynamic Tiering (DT):
• Supports only Write-Optimized DSO and PSA
• Some restrictions; Sybase IQ only
• Limited disaster recovery
• Read & write, but only on HANA
Nearline Storage (NLS):
• Moves data from the online database to a "nearline" database
• Read-only
• Uses DAP (Data Archiving Processes)
• SAP positions Sybase IQ as the "one and only" storage
SAP HANA VORA:
• DB interface between HANA and Hadoop (Spark)
• Heavily Java-based – no ABAP workbench integration etc.
• No UI – engine only
• Allows reporting within Hadoop based on Spark
Data Lifecycle Manager (DLM):
• HANA native only, no ERP
• Offloading to IQ or Spark
The slide groups these technologies under two headings: Offloading and Integration.
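Once a DBA has set up Smart Data Access, a Hadoop table appears in HANA as a virtual table and can be queried like any other. A hedged sketch with SAP's hdbcli Python driver (pip install hdbcli) follows; host, port, credentials, and table names are all hypothetical.

```python
# Join in-memory HANA data with Hadoop data exposed through an SDA
# virtual table, from a plain Python client.
from hdbcli import dbapi

conn = dbapi.connect(address="hana-host", port=39015, user="DEMO", password="***")
cur = conn.cursor()

cur.execute("""
    SELECT o.order_id, o.amount, c.clicks
    FROM "SALES"."ORDERS" AS o
    JOIN "VIRT"."HADOOP_CLICKSTREAM" AS c  -- virtual table backed by Hadoop
      ON o.customer_id = c.customer_id
""")
for row in cur.fetchmany(10):
    print(row)
conn.close()
```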
15. # 15
Business <> Hadoop struggle
Hadoop integration with businesses is difficult for several reasons: technology readiness, IT culture, data integration, and operations.
IT culture gap:
• Development strategy
• Software logistics
• Rapid prototyping
• Data protection / personal data
• SOX compliance
Data integration gap (a minimal ETL sketch follows below):
• ETL
• Loading of data
• Staging & enriching of data within Hadoop
• Data flows from SAP to Hadoop and back
Operational gap:
• Running applications 24x7 between SAP and Hadoop
• Job scheduling
• Testing
• Patching & upgrades
We should aim to close these gaps.
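To make the "data integration gap" concrete, here is a hedged PySpark sketch of staging SAP extracts in Hadoop, enriching them with Hadoop-native data, and writing a curated result set back for SAP to pick up. All paths and column names are hypothetical.

```python
# Stage, enrich, deliver: the flow "from SAP to Hadoop and back".
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("sap-enrich").getOrCreate()

# Stage: a raw SAP extract and Hadoop-native clickstream, both in the lake.
orders = spark.read.parquet("hdfs:///lake/staging/sap_orders/")
clicks = spark.read.parquet("hdfs:///lake/raw/web_logs/")

# Enrich: attach per-customer click counts to each order.
click_counts = clicks.groupBy("customer_id").count().withColumnRenamed("count", "clicks")
enriched = orders.join(click_counts, on="customer_id", how="left")

# Deliver: a curated table that a scheduled job can load back into SAP.
enriched.filter(col("clicks") > 0).write.mode("overwrite").parquet(
    "hdfs:///lake/curated/orders_with_clicks/"
)
spark.stop()
```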
16. # 16
Summary
• Hadoop is awesome! Let's make it truly available to all businesses.
• Start small: a small amount of data and a fast turnover.
• Think about how to enable the new technology for others.