Running together in Retail Environment
Author: Douglas Bernardini
Big Data Platform
HDP Platform overview
• A collection of Apache Hadoop and related Apache projects that run together as an integrated platform.
• Open source: the components are Apache Software Foundation projects.
• Works across component technologies and integrates with pre-existing EDW, RDBMS and MPP systems.
• Runs on Linux and Windows.
• Authentication, authorization and data protection.
• Native integration with major BI/analytics developers and vendors.
Hortonworks Data Platform (HDP)
• Data Integration: Flume (real-time ingest), Storm (real-time ingest), Sqoop (batch integration)
• Data Management: YARN (processing), HDFS (storage)
• Data Access: Pig (script/ETL), MapReduce (process), Hive (SQL-like), HBase (online), Spark (in-memory)
Hadoop Technology Advantages & Profile
• Scalable: stores and distributes very large data sets across hundreds or thousands of servers operating in parallel, holding thousands of terabytes of data.
• Cost effective: offers computing and storage capacity for hundreds of dollars per terabyte.
• Flexible: can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, marketing campaign analysis and fraud detection.
• Fast: able to process terabytes of data in minutes and petabytes in hours.
• Resilient to failure: data on an individual node is also replicated to other nodes in the cluster, so in the event of a failure another copy is available for use.
Source: External. In almost all cases the data comes from outside the corporation, for example from social networks or suppliers.
Size: Big. Normally used for tens to hundreds of terabytes, up to petabyte scale.
Structure: Not structured. Data is not separated into columns/rows and has no schema.
Data Management
Storage: Hadoop Distributed File System (HDFS)
Stores data across clusters of servers
• NameNode and DataNodes
• Large volume: up to 200 PB of storage on a single cluster of 4,500 servers, supporting close to a billion files and blocks.
• Minimal data motion: Hadoop moves compute processes to the data on HDFS and not the other way around, because moving computation is cheaper than moving data.
• Dynamic diagnosis: checks the health of the file system and rebalances the data across nodes.
• Rollback: allows operators to bring back the previous version of HDFS after an upgrade.
• Node redundancy: supports high availability (HA).
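To make the storage model more concrete, here is a minimal sketch of how a file could be cut into blocks and each block copied to several DataNodes. The 128 MB block size and the replication factor of 3 are common HDFS defaults; the round-robin placement rule and all function names are assumptions made for illustration and do not reproduce the rack-aware placement that HDFS actually uses.

```python
BLOCK_SIZE_MB = 128   # common HDFS default block size (configurable)
REPLICATION = 3       # common HDFS default replication factor (configurable)


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes.

    Hypothetical round-robin rule, for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)]
            for i in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [
        block_id
        for block_id, nodes in placement.items()
        if any(node != failed_node for node in nodes)
    ]
```

With a replication factor above 1, every block keeps at least one replica after a single node failure, which is the resilience property the slide describes.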
Processing: Hadoop YARN
Manages cluster resources
• Multi-tenancy: multiple access engines (batch, interactive and real-time) can use Hadoop as a common standard and simultaneously access the same data set.
• Cluster utilization: dynamic allocation of cluster resources improves utilization over the more static MapReduce rules used in early versions of Hadoop.
• Scalability: as data center processing power continues to expand rapidly, the ResourceManager can schedule clusters of thousands of nodes managing petabytes of data.
Data Access
Batch: MapReduce
Process Data
• The Map function: the InputFormat divides the input data into ranges (splits), and a map task is created for each range.
  • The JobTracker distributes those tasks to the worker nodes. The output of each map task is partitioned into groups of key-value pairs, one group for each reduce task.
• The Reduce function: collects the various results and combines them to answer the larger problem that the master node needs to solve.
  • Each reduce task collects the data from all of the maps for its keys and combines them to solve the problem.
Data flow: unstructured data → Map → Reduce → data analysis
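The map, partition and reduce steps described above can be sketched in a few lines of plain Python. This is a single-process word count meant only to show the data flow; the function names are illustrative assumptions, and a real job would run the map and reduce tasks on different worker nodes.

```python
from collections import defaultdict


def map_phase(split):
    """Map task: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped_outputs):
    """Group the values of every map task by key, ready for the reducers."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce task: combine the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    """Run one map task per split, then shuffle and reduce the results."""
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle(mapped))
```

Calling `word_count(["big data", "Big platform"])` counts `big` twice because both map tasks emit the same key, and the shuffle step brings those pairs together for a single reduce.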
Data Integration & Governance
Real-time Ingest: Apache Flume
High-volume data ingestion
• Stream data
  • Ingests streaming data from multiple sources into Hadoop for storage and analysis.
• Guaranteed data delivery
  • Channel-based transactions guarantee reliable message delivery.
  • When a message moves from one agent to another, two transactions are started: one on the agent that delivers the event and one on the agent that receives it.
  • This ensures guaranteed delivery semantics.
• Horizontal scaling
  • Agents can be added to ingest new data streams and additional volume.
Data flow: unstructured data → agent nodes → collector nodes → HDFS storage area
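The two-transaction handoff between agents can be illustrated with a small simulation. This is a sketch under simplifying assumptions: `Channel` and `forward` are hypothetical names rather than Flume classes, and a boolean flag stands in for a network failure. The point it shows is that the delivering agent removes an event from its channel only after the receiving agent has stored it.

```python
from collections import deque


class Channel:
    """In-memory stand-in for an agent's channel (hypothetical class)."""

    def __init__(self):
        self.events = deque()

    def take(self):
        # Sender-side transaction begins: look at the event without removing it.
        return self.events[0] if self.events else None

    def commit_take(self):
        # Sender-side commit: only now does the event leave the channel.
        self.events.popleft()

    def put(self, event):
        # Receiver-side transaction: store the event and commit.
        self.events.append(event)


def forward(sender, receiver, link_ok=True):
    """Move one event between two agents; return True if it was delivered."""
    event = sender.take()
    if event is None:
        return False
    if not link_ok:
        # The receiver never committed, so the sender rolls back:
        # the event stays in the sender's channel and can be retried.
        return False
    receiver.put(event)       # receiving agent commits first
    sender.commit_take()      # delivering agent commits afterwards
    return True
```

Because the sender commits last, a failure between the two commits can at worst deliver an event twice, never lose it.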
Real-time Ingest: Apache Storm
Very-large-volume data ingestion
• Fast: benchmarked as processing more than a million messages/records per second per node.
• Scalable: parallel calculations run across a cluster of machines.
• Fault-tolerant: when workers die, Storm automatically restarts them. If a node dies, its workers are restarted on another node.
• Reliable: Storm guarantees that each unit of data (tuple) will be processed at least once or exactly once. Messages are only replayed when there are failures.
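The at-least-once guarantee can be sketched as a loop that replays a tuple only when its processing fails. The names `process_stream` and `FlakyBolt` are hypothetical and this is not Storm's API; the sketch only illustrates the replay-on-failure idea in a single process.

```python
def process_stream(tuples, bolt, max_attempts=3):
    """Process every tuple at least once, replaying only the ones that fail.

    `bolt` is any callable that raises an exception when processing fails.
    Tuples must be hashable. Returns (results, attempts_per_tuple).
    """
    results = []
    attempts = {}
    pending = list(tuples)
    while pending:
        item = pending.pop(0)
        attempts[item] = attempts.get(item, 0) + 1
        try:
            results.append(bolt(item))   # success plays the role of the ack
        except Exception:
            if attempts[item] >= max_attempts:
                raise                    # give up after too many replays
            pending.append(item)         # failure: schedule a replay
    return results, attempts


class FlakyBolt:
    """Demo bolt that fails once for each tuple listed in `fail_once`."""

    def __init__(self, fail_once):
        self.fail_once = set(fail_once)

    def __call__(self, item):
        if item in self.fail_once:
            self.fail_once.discard(item)
            raise RuntimeError("simulated worker failure")
        return item * 10
```

Running `process_stream([1, 2, 3], FlakyBolt([2]))` processes tuples 1 and 3 once and tuple 2 twice, which matches the statement that messages are replayed only when there are failures.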
Batch Integration: Apache Sqoop
Connects to traditional RDBMS
• Data imports: moves selected data from external stores and EDWs into Hadoop to optimize the cost-effectiveness of combined data storage and processing.
• Improvements: compression and indexing for query performance.
• Parallel data transfer: for faster performance and optimal system utilization.
• Fast data copies: from external systems into Hadoop.
• Load balancing: mitigates excessive storage and processing loads by moving them to other systems.
• Efficient data analysis: improves the efficiency of data analysis by combining structured data with unstructured data in a schema-on-read data lake.
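Parallel data transfer usually works by dividing a key range of the source table among several tasks (Sqoop exposes this through its `--split-by` and `--num-mappers` options). The sketch below shows one simple way to compute such slices; the function names and the table and column used in the queries are hypothetical, and the even split is a simplification rather than Sqoop's actual code.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous
    slices, one per parallel transfer task."""
    if num_mappers < 1 or max_id < min_id:
        raise ValueError("invalid range or mapper count")
    total = max_id - min_id + 1
    num_mappers = min(num_mappers, total)   # never create empty slices
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def split_queries(table, column, ranges):
    """Build one bounded query per slice (illustrative SQL only)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {lo} AND {column} <= {hi}"
        for lo, hi in ranges
    ]
```

Each task then reads only its own slice, so the transfers can run at the same time without overlapping rows.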
Script: Pig
Easy programming language
• Easily programmed: complex tasks involving interrelated data transformations can be simplified and encoded as data flow sequences. Pig programs accomplish huge tasks, but they are easy to write and maintain.
• Iterative data processing: extract-transform-load (ETL) data pipelines and research on raw data.
• Extensible: Pig users can create custom functions to meet their particular processing requirements.
• Self-optimizing: because the system automatically optimizes the execution of Pig jobs, the user can focus on semantics.
SQL: Hive
SQL-like tools
• Familiar: query data with a SQL-based language. Tables are similar to tables in a relational database, and data units are organized in a taxonomy from larger to more granular units.
• Fast: interactive response times, even over huge datasets.
• Partitioned: each table can be sub-divided into partitions that determine how data is distributed within sub-directories of the table directory.
• Scalable and extensible: as data variety and volume grow, more commodity machines can be added without a corresponding reduction in performance.
• Uses JobTracker (MapReduce) functionality.
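The partitioning point can be illustrated by computing the sub-directory each row would land in. Hive names partition directories `column=value` under the table directory; the helper names, the example table path and the sample columns below are assumptions for illustration.

```python
from collections import defaultdict
import posixpath


def partition_path(table_dir, partition_values):
    """Build the sub-directory for one partition, e.g. .../year=2016/month=7."""
    parts = [f"{col}={val}" for col, val in partition_values]
    return posixpath.join(table_dir, *parts)


def group_rows_by_partition(rows, table_dir, partition_cols):
    """Map each partition directory to the rows that would be stored in it.

    Partition columns are encoded in the path, so they are dropped from
    the stored row data.
    """
    layout = defaultdict(list)
    for row in rows:
        values = [(col, row[col]) for col in partition_cols]
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_path(table_dir, values)].append(data)
    return dict(layout)
```

A query that filters on a partition column only needs to read the matching sub-directories, which is why partitioning helps response times on large tables.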
OnLine: HBase
NoSQL tools with SQL-like command interface
• Apache HBase is an open source NoSQL database that provides real-time read/write access to large datasets.
• Scales linearly to handle huge data sets with billions of rows and millions of columns.
• Easily combines data sources that use a wide variety of different structures and schemas.
• Natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
• A common choice for storing semi-structured data such as log data.
In-memory: Spark
Fast, in-memory data processing for Hadoop
• Elegant and expressive development APIs in Scala, Java, R and Python.
• Allows data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
• Designed for data science; its abstraction makes data science easier.
• Data scientists commonly use machine learning, a set of techniques and algorithms that can learn from data. These algorithms are often iterative, so keeping the dataset in memory between iterations speeds them up.
ERP/DW-BI Platform
SAP/HANA Architecture
Fast in-memory database
• Traditional DBMS: SQL interface, transactional isolation and recovery (ACID).
• Parallel data flow model: calculations can be executed in parallel and distributed across hosts.
• Latest-generation data storage:
  • Columnar and row-based
  • Nearly eliminates the need for indexes
  • High data compression
• Automatic recovery: from memory errors without a system reboot.
• Native tools: Predictive Analysis Library and analytical and special interfaces.
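Columnar storage and compression are easier to picture with a small example. The sketch below pivots rows into columns and applies dictionary encoding, one common columnar compression technique. It is an illustration of the idea under simplifying assumptions, not HANA's implementation, and all names and sample values are hypothetical.

```python
def rows_to_columns(rows, column_names):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return {
        name: [row[i] for row in rows]
        for i, name in enumerate(column_names)
    }


def dictionary_encode(values):
    """Replace each value by a small integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def dictionary_decode(dictionary, codes):
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]
```

A column with few distinct values, such as a region code, shrinks to a short dictionary plus a list of small integers, and an aggregate over one column (for example a sum of quantities) touches only that column's list instead of every full row.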
SAP/HANA Technology Advantages & Profile
100x faster
• Optimization: InfoCubes and DataStore Objects (DSO) with better performance.
• Faster remodeling: improved and lean data models, simplified data modeling and fewer materialized layers.
• Data marts: integrated and embedded flexibility. OLAP and OLTP can also be executed in one system.
• Increased flexibility: optimized Layered Scalable Architecture. Aggregates and cubes are no longer required (optional).
• Improved response times: for existing transactions and entire business processes, through the general performance improvement of the underlying HANA database.
Size: Big, but not considered big by web 2.0 standards. Tens of terabytes, not reaching petabytes.
Structure: Structured. Data is separated into columns/rows and has a schema.
Source: Internal. In almost all cases the data comes from inside the corporation, from ERP/CRM/SCM systems.
SAP/HANA Evolution
• Starting point: the SAP landscape consists of SAP ERP running on a relational database, connected to an OLAP engine (e.g. SAP BI) and perhaps using a business intelligence tool such as BOBJ.
Landscape: OLTP (SAP/ECC) → ETL → OLAP (SAP/BW) → Analytics (SAP/BOBJ)
• Introducing HANA in parallel: install and run the in-memory engine (HANA) together with the traditional SAP instances.
  • Two BW extractors run at the same time and export the same data.
  • Key factor: a comparison of processing performance on real data.
Landscape: OLTP (SAP/ECC) → ETL → OLAP (SAP/BW) → Analytics (SAP/BOBJ), plus a 2nd ETL → SAP/HANA → Analytics (SAP/BOBJ)
• BW database upgrade: re-create traditional-style BI in memory.
Landscape: OLTP (SAP/ECC) → ETL → OLAP (SAP/BW) on SAP/HANA → Analytics (SAP/BOBJ)
• ERP/BI full database upgrade: eliminate the traditional database and run both instances in memory, using non-materialized views.
Landscape (BI 2.0): OLTP (SAP/ECC) and OLAP on SAP/HANA → Analytics (SAP/BOBJ)
Sizing on SAP/Hana
• Memory
  • Traditional sizing is driven by CPU performance; SAP HANA sizing is driven by memory.
  • The amount of master and transactional data determines the main memory required.
  • Main memory is required for:
    • Storing the business data
    • Temporary memory space
    • Supporting complex queries
    • Buffers and caches
• CPU
  • Behaves differently with SAP HANA compared to traditional databases.
  • Queries: complex and executed at maximum speed.
• Disk size
  • Disk storage space is still required.
  • Preserves database information if the system shuts down (either intentionally or due to a power loss).
  • Data changes: periodically copied to disk, which ensures a full image of the business data on disk.
  • Logging mechanism: enables system recovery.
SAP/Hana on VM
• SAP HANA on vSphere is fully supported.
• Combining SAP HANA and vSphere provides additional benefits for deployment and availability.
• Some customer slots remain for controlled-availability proofs of concept of SAP on SAP HANA.
• Blue Medora plug-in for SAP HANA:
  • Monitor memory and vCPU utilization
  • Add/delete resources
    • Underutilized: deploy more SAP HANA
    • Overutilized: SAP HANA unleashed
  • Workload management
  • Determine consolidation ratios
SAP/HANA on Cloud
• Amazon AWS: SAP partner
  • SAP BW on HANA trial (PoC)
  • The AWS server provides a HANA instance
  • Ready to go in 30 minutes
• OLAP: BW on HANA or any other data warehouse application with predominantly OLAP workloads, including data marts running many complex queries.
• OLTP: any transactional application, such as Business Suite on HANA, predominantly running simple queries or CRUD operations.
Disaster Recovery
• Storage replication:
  • The storage itself replicates all data to another location, within one data center or between several. The technology is hardware vendor-specific, and multiple concepts are available on the market.
• System replication:
  • SAP HANA replicates all data to another location, within one data center or between several. The technology is independent of hardware vendor concepts and reusable with a changing infrastructure.
SAP/HANA High Availability
Host Auto-Failover
• Standby mode:
  • A standby host holds no data and accepts no requests or queries.
  • When an active (worker) host fails, a standby host automatically takes its place.
  • Since the standby host can take over operations from any of the primary hosts, it needs access to all of the database volumes.
• Once repaired:
  • The failed host can be rejoined to the system as the new standby host to re-establish the failure recovery capability.
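The failover cycle described above (a standby takes over for a failed worker, and the repaired host rejoins as the new standby) can be sketched as a small state model. The `Cluster` class and the host names are hypothetical, and the sketch ignores data volumes and timing; it only tracks which role each host has.

```python
class Cluster:
    """Tracks worker, standby and failed hosts (hypothetical model)."""

    def __init__(self, workers, standbys):
        self.workers = list(workers)
        self.standbys = list(standbys)
        self.failed = []

    def fail(self, host):
        """A worker fails: the first standby host takes its place."""
        if host not in self.workers:
            raise ValueError(f"{host} is not an active worker")
        if not self.standbys:
            raise RuntimeError("no standby host available")
        replacement = self.standbys.pop(0)
        self.workers[self.workers.index(host)] = replacement
        self.failed.append(host)
        return replacement

    def repair(self, host):
        """A repaired host rejoins the system as the new standby."""
        self.failed.remove(host)
        self.standbys.append(host)
```

After one `fail` followed by `repair`, the cluster again has the same number of workers and one standby, so it can tolerate the next failure.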
ERP/DW-BI & Big Data Platform
Architecture Proposal
• Data sources: not structured data; structured data (ERP/CRM/SCM)
• Hortonworks Data Platform
  • Data Integration: Flume (real-time ingest), Storm (real-time ingest), Sqoop (batch integration)
  • Data Management: YARN (processing), HDFS (storage)
  • Data Access: Pig (script/ETL), MapReduce (process), Hive (SQL-like), HBase (online), Spark (in-memory)
• HANA: data repositories, OLAP engine, predictive engine, spatial engine
• Analytics: application logic and rendering
Business Case: CRM/Retail
Internal structured data sources
• Point-of-sale data: data captured when the customer makes purchases, either in-store or on the company's e-commerce site (4 TB).
• Inventory and stock information: which products are in stock at which locations and under which promotions (7 TB).
• CRM data: from all the interactions the customer has had with the company at the support site (8 TB).
• Total data size: 21 TB
External unstructured data sources
• Social media data: sentiment analysis of the customer's social media activity, such as Facebook (70 TB).
• Historical web log information: record of the customer's past browsing behavior on the company's web site (30 TB).
• Geographic customer behavior: origin/destination of potential customers near stores (20 TB).
• Total data size: 120 TB
Business Case: Data Process
dbernard@d2-data.com
Questions?
29

More Related Content

What's hot

Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsMethod360
 
SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707Henrique Pinto
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)Will Gardella
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Data Con LA
 
SAP HANA SPS10- Hadoop Integration
SAP HANA SPS10- Hadoop IntegrationSAP HANA SPS10- Hadoop Integration
SAP HANA SPS10- Hadoop IntegrationSAP Technology
 
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...Ocean9, Inc.
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionDataWorks Summit/Hadoop Summit
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopBigData Research
 
Hawq wp 042313_final
Hawq wp 042313_finalHawq wp 042313_final
Hawq wp 042313_finalEMC
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platformmartinbpeters
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 

What's hot (20)

Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
 
SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707SAP HANA Vora SITMTY 20160707
SAP HANA Vora SITMTY 20160707
 
Big data/Hadoop/HANA Basics
Big data/Hadoop/HANA BasicsBig data/Hadoop/HANA Basics
Big data/Hadoop/HANA Basics
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
SAP HANA and Apache Hadoop for Big Data Management (SF Scalable Systems Meetup)
 
Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014Hadoop Innovation Summit 2014
Hadoop Innovation Summit 2014
 
SAP HANA SPS10- Hadoop Integration
SAP HANA SPS10- Hadoop IntegrationSAP HANA SPS10- Hadoop Integration
SAP HANA SPS10- Hadoop Integration
 
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...
Hadoop, Spark and Big Data Summit presentation with SAP HANA Vora and a path ...
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Hawq wp 042313_final
Hawq wp 042313_finalHawq wp 042313_final
Hawq wp 042313_final
 
Splice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakesSplice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakes
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
DoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics PlatformDoneDeal - AWS Data Analytics Platform
DoneDeal - AWS Data Analytics Platform
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 

Viewers also liked

Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...Richard Winter
 
Hug India Jul10 Hadoop Map
Hug India Jul10 Hadoop MapHug India Jul10 Hadoop Map
Hug India Jul10 Hadoop MapSanjay Sharma
 
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Sanjay Sharma
 
NYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talkNYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talkSanjay Sharma
 
Bp presentation business intelligence and advanced data analytics september ...
Bp presentation business intelligence  and advanced data analytics september ...Bp presentation business intelligence  and advanced data analytics september ...
Bp presentation business intelligence and advanced data analytics september ...Barrett Peterson
 
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDPViplava Kumar Madasu
 
Big DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationBig DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationUyoyo Edosio
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringHakka Labs
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBMongoDB
 
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business IntelligenceLeveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business IntelligenceRightpoint
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...VMware Tanzu
 

Viewers also liked (12)

Tcod a framework for the total cost of big data - december 6 2013 - winte...
Tcod   a framework for the total cost of big data  - december 6 2013  - winte...Tcod   a framework for the total cost of big data  - december 6 2013  - winte...
Tcod a framework for the total cost of big data - december 6 2013 - winte...
 
Hug India Jul10 Hadoop Map
Hug India Jul10 Hadoop MapHug India Jul10 Hadoop Map
Hug India Jul10 Hadoop Map
 
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
Cloud expo june 2013: Building a Real Time Analytics Platform on Big Data in ...
 
NYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talkNYC Cassandra March 13- lighting talk
NYC Cassandra March 13- lighting talk
 
Bp presentation business intelligence and advanced data analytics september ...
Bp presentation business intelligence  and advanced data analytics september ...Bp presentation business intelligence  and advanced data analytics september ...
Bp presentation business intelligence and advanced data analytics september ...
 
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP
4AA6-8601ENW-HPE_RA_SAP_HANA_Vora_Hortonworks_HDP
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Big DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and ApplicationBig DataParadigm, Challenges, Analysis, and Application
Big DataParadigm, Challenges, Analysis, and Application
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
 
Building LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDBBuilding LinkedIn's Learning Platform with MongoDB
Building LinkedIn's Learning Platform with MongoDB
 
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business IntelligenceLeveraging Microsoft Power BI To Support Enterprise Business Intelligence
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence
 
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT T...
 

Similar to How can Hadoop & SAP be integrated

Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdfavenkatram
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCPBlibBlobb
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 

Similar to How can Hadoop & SAP be integrated (20)

Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Google Data Engineering.pdf
Google Data Engineering.pdfGoogle Data Engineering.pdf
Google Data Engineering.pdf
 
Data Engineering on GCP
Data Engineering on GCPData Engineering on GCP
Data Engineering on GCP
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Big data
Big dataBig data
Big data
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
paper
paperpaper
paper
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 

More from Douglas Bernardini

More from Douglas Bernardini (19)

Top reasons to choose SAP hana
Top reasons to choose SAP hanaTop reasons to choose SAP hana
Top reasons to choose SAP hana
 
The REAL face of Big Data
The REAL face of Big DataThe REAL face of Big Data
The REAL face of Big Data
 
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapRHadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
Hadoop benchmark: Evaluating Cloudera, Hortonworks, and MapR
 
R-language
R-languageR-language
R-language
 
REDSHIFT - Amazon
REDSHIFT - AmazonREDSHIFT - Amazon
REDSHIFT - Amazon
 
Splunk
SplunkSplunk
Splunk
 
Finance month closing with HANA
Finance month closing with HANAFinance month closing with HANA
Finance month closing with HANA
 
RDBMS x NoSQL
RDBMS x NoSQLRDBMS x NoSQL
RDBMS x NoSQL
 
SAP - SOLUTION MANAGER
SAP - SOLUTION MANAGER SAP - SOLUTION MANAGER
SAP - SOLUTION MANAGER
 
MS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTUREMS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTURE
 
DBA oracle
DBA oracleDBA oracle
DBA oracle
 
Hortonworks.Cluster Config Guide
Hortonworks.Cluster Config GuideHortonworks.Cluster Config Guide
Hortonworks.Cluster Config Guide
 
SAP Business Objects - Lopes Supermarket
SAP   Business Objects - Lopes SupermarketSAP   Business Objects - Lopes Supermarket
SAP Business Objects - Lopes Supermarket
 
SAP - Business Objects - Ri happy
SAP - Business Objects - Ri happySAP - Business Objects - Ri happy
SAP - Business Objects - Ri happy
 
Hadoop on retail
Hadoop on retailHadoop on retail
Hadoop on retail
 
Retail: Big data e Omni-Channel
Retail: Big data e Omni-ChannelRetail: Big data e Omni-Channel
Retail: Big data e Omni-Channel
 
Granular Access Control Using Cell Level Security In Accumulo
Granular Access Control  Using Cell Level Security  In Accumulo             Granular Access Control  Using Cell Level Security  In Accumulo
Granular Access Control Using Cell Level Security In Accumulo
 
Proposta aderencia drogaria onofre
Proposta aderencia   drogaria onofreProposta aderencia   drogaria onofre
Proposta aderencia drogaria onofre
 
SAP-Solution-Manager
SAP-Solution-ManagerSAP-Solution-Manager
SAP-Solution-Manager
 


How can Hadoop & SAP be integrated

  • 1. Running together in Retail Environment. Author: Douglas Bernardini
  • 3. Big Data Platform: HDP Platform overview
      • Collection of Hadoop and Apache solutions running together and integrated.
      • Open source: Apache Software Foundation.
      • Works across component technologies and integrates with pre-existing EDW, RDBMS and MPP systems.
      • Linux and Windows.
      • Authentication, authorization and data protection.
      • Native integration with major BI/analytics developers and vendors.
      • Hortonworks Data Platform (HDP) components:
          • Integration: Flume (real-time ingest), Storm (real-time ingest), Sqoop (batch integration)
          • Data Management: YARN (processing), HDFS (storage)
          • Data Access: Pig (script/ETL), MapReduce (process), Hive (SQL-like), HBase (online), Spark (in-memory)
  • 4. Big Data Platform: Hadoop Technology Advantages & Profile
      • Scalable: stores and distributes very large data sets across hundreds of servers operating in parallel, up to thousands of nodes holding thousands of terabytes of data.
      • Cost effective: the savings are substantial, with computing and storage capabilities for hundreds of dollars per terabyte.
      • Flexible: can be used for a wide variety of purposes, such as log processing, recommendation systems, data warehousing, market campaign analysis and fraud detection.
      • Fast: able to efficiently process terabytes of data in minutes, and petabytes in hours.
      • Resilient to failure: data on an individual node is also replicated to other nodes in the cluster, so that in the event of a failure another copy remains available.
      • Profile:
          • Source: external. In almost all cases from outside the corporation, such as social networks or suppliers.
          • Size: big. Normally tens to hundreds of terabytes, up to petabyte scale.
          • Structure: not structured. Data is not separated into columns/rows and has no schema.
  • 5. Data Management: Storage, Hadoop File System (HDFS)
      • Stores data in several clusters and servers.
      • NameNode and DataNodes.
      • Large volume: 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks.
      • Minimal data motion: Hadoop moves compute processes to the data on HDFS and not the other way around. Moving computation is cheaper than moving data.
      • Dynamic diagnosis: checks the health of the file system and rebalances the data across nodes.
      • Rollback: allows operators to bring back the previous version of HDFS after an upgrade.
      • Node redundancy: supports high availability (HA).
  • 6. Data Management: Processing, Hadoop YARN
      • Manages processing on top of HDFS.
      • Multi-tenancy: lets multiple access engines use Hadoop as the common standard, so that batch, interactive and real-time engines can simultaneously access the same data set.
      • Cluster utilization: dynamic allocation of cluster resources improves utilization over the more static MapReduce rules used in early versions.
      • Scalability: as data center processing power continues to expand rapidly, ResourceManager scheduling lets clusters grow to thousands of nodes managing petabytes of data.
  • 7. Data Access: Batch, MapReduce
      • Processes data.
      • The Map function: the InputFormat divides the input data into ranges (parts), and a map task is created for each range of the input.
      • The JobTracker distributes those tasks to the worker nodes. The output of each map task is partitioned into a group of key-value pairs for each reduce.
      • The Reduce function: collects the various results and combines them to answer the larger problem that the master node needs to solve.
      • Each reduce collects the data from all of the maps for its keys and combines them to solve the problem.
      • Flow: unstructured data -> Map -> Reduce -> data analysis
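The map, partition and reduce steps described for slide 7 can be illustrated with a minimal in-process sketch. This is plain Python, not the Hadoop API: the function names (`split_input`, `map_task`, `shuffle`, `reduce_task`, `run_job`) and the word-count example are assumptions chosen for illustration only.

```python
from collections import defaultdict


def split_input(lines, num_splits):
    """Divide the input into ranges; one map task is created per range."""
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def map_task(split):
    """Map: emit a (key, value) pair for every word in the range."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(mapped_outputs):
    """Partition the map output so each key's values end up together."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def run_job(lines, num_splits=2):
    """Run map, shuffle and reduce in sequence (a real cluster runs them in parallel)."""
    splits = split_input(lines, num_splits)
    mapped = [map_task(split) for split in splits]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())
```

For example, `run_job(["milk bread", "bread eggs", "milk milk"])` counts how often each product name appears across the three input lines.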
  • 8. Data Integration & Governance: Real-time Ingest, Apache Flume
      • High-volume data ingestion.
      • Stream data: ingests streaming data from multiple sources into Hadoop for storage and analysis.
      • Guarantee data delivery:
          • Channel-based transactions guarantee reliable message delivery.
          • When a message moves from one agent to another, two transactions are started: one on the agent that delivers the event and the other on the agent that receives the event.
          • This ensures guaranteed delivery semantics.
      • Scale horizontally: to ingest new data streams and additional volume.
      • Flow: unstructured data -> agent nodes -> collector nodes -> HDFS storage area
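The two-transaction handoff described for slide 8 can be sketched as follows. This is a simplified model under stated assumptions, not the Flume API: `Channel`, `DeliveryError` and `forward_one` are hypothetical names, and the `fail` flag merely simulates an unavailable receiver.

```python
from collections import deque


class Channel:
    """Toy stand-in for an agent's channel (hypothetical, not Flume's class)."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def peek(self):
        return self._events[0] if self._events else None

    def pop(self):
        return self._events.popleft()

    def __len__(self):
        return len(self._events)


class DeliveryError(Exception):
    """Raised when the receiving agent cannot accept the event."""


def forward_one(sender, receiver, fail=False):
    """Move one event between agents using two cooperating transactions."""
    event = sender.peek()  # sender transaction: read the event but do not remove it yet
    if event is None:
        return False
    try:
        if fail:  # simulated outage on the receiving agent
            raise DeliveryError("receiver unavailable")
        receiver.put(event)  # receiver transaction commits
    except DeliveryError:
        return False  # sender transaction rolls back: the event stays in its channel
    sender.pop()  # sender transaction commits only after the receiver has committed
    return True
```

Because the sender removes the event only after the receiver has stored it, a failed handoff leaves the event available for a later retry.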
  • 9. Data Integration & Governance: Real-time Ingest, Apache Storm
      • Very-large-volume data ingestion.
      • Fast: benchmarked as processing a million+ messages/records per second per node.
      • Scalable: parallel calculations run across a cluster of machines.
      • Fault-tolerant: when workers die, Storm automatically restarts them. If a node dies, the worker is restarted on another node.
      • Reliable: Storm guarantees that each unit of data (tuple) is processed at least once or exactly once. Messages are only replayed when there are failures.
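The at-least-once guarantee mentioned for slide 9 (tuples are replayed only when processing fails) can be sketched in plain Python. This is not Storm code: `process_at_least_once`, the `handler` callback and the `max_attempts` limit are assumptions made for the illustration.

```python
from collections import deque


def process_at_least_once(tuples, handler, max_attempts=3):
    """Process every tuple; replay a tuple only when its handler fails."""
    pending = deque((item, 1) for item in tuples)
    acked = []
    while pending:
        item, attempt = pending.popleft()
        try:
            handler(item)
            acked.append(item)  # acknowledged: never replayed again
        except Exception:
            if attempt >= max_attempts:
                raise  # give up after too many failed attempts
            pending.append((item, attempt + 1))  # failure: queue the tuple for replay
    return acked
```

A handler that fails once on a given tuple still sees that tuple again later, which is why downstream logic in an at-least-once system has to tolerate duplicates.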
  • 10. Data Integration & Governance: Batch Integration, Apache Sqoop
      • Connects to traditional RDBMS.
      • Data imports: moves selected data from external stores and EDWs into Hadoop to optimize the cost-effectiveness of combined data storage and processing.
      • Improvements: compression and indexing for query performance.
      • Parallel data transfer: for faster performance and optimal system utilization.
      • Fast data copies: from external systems into Hadoop.
      • Load balancing: mitigates excessive storage and processing loads by moving them to other systems.
      • Efficient data analysis: improves the efficiency of data analysis by combining structured data with unstructured data in a schema-on-read data lake.
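The parallel data transfer listed for slide 10 relies on dividing a table into key ranges that separate tasks can import at the same time. The sketch below shows one simple way to compute such ranges; it is an illustration in plain Python, not Sqoop's own splitting code, and `split_ranges` with its parameters is a hypothetical helper.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Return inclusive (low, high) key ranges, one per parallel import task."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_id < min_id:
        raise ValueError("max_id must not be smaller than min_id")
    total = max_id - min_id + 1
    tasks = min(num_mappers, total)  # never create more tasks than keys
    base, extra = divmod(total, tasks)
    ranges = []
    low = min_id
    for index in range(tasks):
        size = base + (1 if index < extra else 0)  # spread the remainder evenly
        high = low + size - 1
        ranges.append((low, high))
        low = high + 1
    return ranges
```

Each range would then translate into one bounded query against the source table (for instance, a `WHERE id BETWEEN low AND high` clause), so no row is read twice or skipped.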
  • 11. Data Access: Script, Pig
      • Easy programming language.
      • Easily programmed: complex tasks involving interrelated data transformations can be simplified and encoded as data flow sequences. Pig programs accomplish huge tasks, but they are easy to write and maintain.
      • Iterative data processing: extract-transform-load (ETL) data pipelines and research tools on raw data.
      • Extensible: Pig users can create custom functions to meet their particular processing requirements.
      • Self-optimizing: because the system automatically optimizes the execution of Pig jobs, the user can focus on semantics.
  • 12. Data Access: SQL, Hive
      • SQL-like tools.
      • Familiar: query data with a SQL-based language. Tables are similar to tables in a relational database, and data units are organized in a taxonomy from larger to more granular units.
      • Fast: interactive response times, even over huge datasets.
      • Partitioned: each table can be subdivided into partitions that determine how data is distributed within subdirectories of the table directory.
      • Scalable and extensible: as data variety and volume grow, more commodity machines can be added without a corresponding reduction in performance.
      • Uses JobTracker (MapReduce) functionality.
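The partitioning point for slide 12 (partitions determine how data is distributed within subdirectories of the table directory) can be made concrete with a short sketch. It assumes the common `column=value` naming of partition subdirectories; the table directory `/warehouse/sales`, the column names and both helper functions are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def partition_path(table_dir, partition_cols, row):
    """Build the subdirectory a row belongs to, one level per partition column."""
    path = PurePosixPath(table_dir)
    for col in partition_cols:
        path = path / f"{col}={row[col]}"
    return str(path)


def group_by_partition(table_dir, partition_cols, rows):
    """Group rows by the partition directory in which they would be stored."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(table_dir, partition_cols, row)].append(row)
    return dict(layout)
```

A query that filters on a partition column only needs to read the matching subdirectories, which is where much of the benefit of partitioning comes from.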
  • 13. Data Access: OnLine, HBase
      • NoSQL tools with a SQL-like command interface.
      • Apache HBase is an open-source NoSQL database that provides real-time read/write access to large datasets.
      • Scales linearly to handle huge data sets with billions of rows and millions of columns.
      • Easily combines data sources that use a wide variety of different structures and schemas.
      • Natively integrated with Hadoop and works seamlessly alongside other data access engines through YARN.
      • A choice for storing semi-structured data such as log data.
  • 14. Data Access: InMemory, Spark
      • Fast, in-memory data processing for Hadoop.
      • Elegant and expressive development APIs in Scala, Java, R and Python.
      • Allows data workers to efficiently execute streaming, machine learning or SQL workloads that need fast iterative access to datasets.
      • Designed for data science; its abstraction makes data science easier.
      • Data scientists commonly use machine learning, a set of techniques and algorithms that can learn from data. These algorithms are often iterative.
  • 16. ERP/DW-BI Platform: SAP/HANA Architecture
      • Fast in-memory database.
      • Traditional DBMS: SQL interface, transactional isolation and recovery (ACID).
      • Parallel data flow model: calculations can be executed in parallel, distributed across hosts.
      • Latest-generation data storage:
          • Columnar and row-based
          • Near elimination of indexes
          • High data compression
      • Automatic recovery: from memory errors without a system reboot.
      • Native tools: Predictive Analysis Library, and analytical and special interfaces.
  • 17. ERP/DW-BI Platform: SAP/HANA Technology Advantages & Profile
      • 100x faster.
      • Optimization: InfoCubes and DataStore Objects (DSO) with better performance.
      • Faster remodeling: improved and lean data models, with simplified data modeling and fewer materialized layers.
      • Datamarts: integrated and embedded flexibility. OLAP and OLTP may also be executed in one system.
      • Increased flexibility: optimized Layered Scalable Architecture. Aggregates and cubes are no longer required (optional).
      • Improved response times: for existing transactions and entire business processes, through the general performance improvement of the underlying HANA database.
      • Profile:
          • Size: not considered big for the web 2.0 era. Tens of terabytes, not reaching petabytes.
          • Structure: structured. Data is separated into columns/rows and has a schema.
          • Source: internal. In almost all cases from inside the corporation, from ERP/CRM/SCM.
  • 18. ERP/DW-BI Platform: SAP/HANA Evolution
      • Starting point: the SAP landscape consists of SAP ERP running on a relational database, connected to an OLAP engine (e.g. SAP BI) and perhaps using a Business Intelligence Accelerator such as BOBJ.
          • Landscape: OLTP (SAP/ECC) -> ETL -> OLAP (SAP/BW) -> Analytics (SAP/BOBJ)
      • Introducing HANA in parallel: install and run the in-memory engine (HANA) together with the traditional SAP instances.
          • Two BW extractors run at the same time and export the same data.
          • Key factor: comparison of processing performance on real data.
          • Landscape: OLTP (SAP/ECC) -> ETL -> OLAP (SAP/BW) -> Analytics (SAP/BOBJ), plus a 2nd ETL -> SAP/HANA -> Analytics (SAP/BOBJ)
  • 19. ERP/DW-BI Platform: SAP/HANA Evolution
      • BW database upgrade: traditional-style BI re-created in memory.
          • Landscape: OLTP (SAP/ECC) -> ETL -> OLAP (SAP/BW) on SAP/HANA -> Analytics (SAP/BOBJ)
      • ERP/BI full database upgrade: eliminate the traditional database and run both instances in memory, using non-materialized views.
          • Landscape: OLTP (SAP/ECC) and OLAP (BI 2.0) on SAP/HANA -> Analytics (SAP/BOBJ)
  • 20. ERP/DW-BI Platform: Sizing on SAP/HANA
      • Memory
          • Traditional sizing: CPU performance <> SAP HANA memory; master/transactional data > main memory.
          • Main memory is required for: storing the business data; temporary memory space; supporting complex queries; buffers and caches.
      • CPU
          • Behaves differently with SAP HANA compared to traditional databases.
          • Queries: complex and at maximum speed.
      • Disk size
          • Disk storage space is still required.
          • Preserves database information if the system shuts down (either intentionally or due to a power loss).
          • Data changes: periodically copied to disk (ensures a full image of the business data on disk).
          • Logging mechanism: enables system recovery.
  • 21. ERP/DW-BI Platform: SAP/HANA on VM
      • SAP HANA on vSphere is fully supported.
      • Combining SAP HANA and vSphere provides additional benefits with regard to deployment and availability.
      • Some remaining customer slots for SAP on SAP HANA controlled-availability proofs of concept.
      • SAP HANA > BlueMedora plug-in:
          • Monitor memory and vCPU utilization
          • Add/delete resources
          • Underutilized: deploy more SAP HANA
          • Overutilized: SAP HANA unleashed
          • Workload management
          • Determine consolidation ratios
  • 22. ERP/DW-BI Platform: SAP/HANA on Cloud
      • Amazon AWS: SAP partner
      • SAP BW on HANA trial (PoC)
      • The AWS server provides a HANA
      • Ready to go in 30 min
      • OLAP: BW on HANA or any other data warehouse application with predominantly OLAP workloads, including data marts running many complex queries.
      • OLTP: any transactional application, such as Business Suite on HANA, predominantly running simple queries or CRUD operations.
  • 23. ERP/DW-BI Platform: Disaster Recovery
      • Storage replication: the storage itself replicates all data to another location, within one data center or between several. The technology is hardware vendor-specific, and multiple concepts are available on the market.
      • System replication: SAP HANA replicates all data to another location, within one data center or between several. The technology is independent of hardware vendor concepts and reusable with a changing infrastructure.
  • 24. ERP/DW-BI Platform: SAP/HANA High Availability, Host Auto-Failover
      • Standby mode:
          • A standby host holds no data and accepts no requests or queries.
          • When an active (worker) host fails, a standby host automatically takes its place.
          • Since the standby host can take over operations from any of the primary hosts, it needs access to all of the database volumes.
      • Once repaired:
          • The failed host can be rejoined to the system as the new standby host to re-establish the failure recovery capability.
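The failover cycle described for slide 24 can be modeled in a few lines. This is a toy model under stated assumptions, not SAP HANA's mechanism or API: `HanaSystem`, `fail_over` and `rejoin_as_standby` are hypothetical names, and host and volume identifiers are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class HanaSystem:
    """Toy model: worker hosts serve database volumes, standby hosts serve none."""

    workers: dict                                 # host name -> database volume
    standby: list = field(default_factory=list)   # idle hosts waiting to take over

    def fail_over(self, failed_host):
        """Hand the failed worker's volume to the first available standby host."""
        if failed_host not in self.workers:
            raise KeyError(f"{failed_host} is not an active worker")
        if not self.standby:
            raise RuntimeError("no standby host available")
        replacement = self.standby.pop(0)
        # The standby can take over because it has access to all database volumes.
        self.workers[replacement] = self.workers.pop(failed_host)
        return replacement

    def rejoin_as_standby(self, repaired_host):
        """A repaired host comes back as the new standby, restoring failover capacity."""
        self.standby.append(repaired_host)
```

After one failure and one repair the system has the same number of workers and standbys as before, only with the host roles swapped.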
  • 25. ERP/DW-BI & Big Data Platform
  • 26. ERP/DW-BI & Big Data Platform: Architecture Proposal
      • Not structured data: Hortonworks Data Platform
          • Integration: Flume (real-time ingest), Storm (real-time ingest), Sqoop (batch integration)
          • Data Management: YARN (processing), HDFS (storage)
          • Data Access: Pig (script/ETL), MapReduce (process), Hive (SQL-like), HBase (online), Spark (in-memory)
      • Structured data: ERP/CRM/SCM
      • HANA: OLAP engine, predictive engine, spatial engine, application logic and rendering
      • Analytics and data repositories
  • 27. ERP/DW-BI & Big Data Platform: Business Case, CRM/Retail
      • Internal structured data sources:
          • Point-of-sale data: data captured when the customer makes purchases, either in-store or on the company's e-commerce site (4 TB).
          • Inventory and stock information: which products are in stock at which locations and under which promotions (7 TB).
          • CRM data: from all the interactions the customer has had with the company at the support site (8 TB).
          • Total data size: 21 TB
      • External unstructured data sources:
          • Social media data: sentiment analysis of the customer's social media, such as Facebook (70 TB).
          • Historical web log information: record of the customer's past browsing behavior on the company's web site (30 TB).
          • Geographic customer behavior: origin/destination of potential customers near stores (20 TB).
          • Total data size: 120 TB
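The volumes in the business case for slide 27 can be tallied with a short calculation. The figures are the per-source sizes listed on the slide; the dictionary keys and helper names are assumptions made for the sketch. Note that the three listed internal figures add up to 19 TB, whereas the slide states an internal total of 21 TB; the sketch uses the per-source figures as listed.

```python
# Per-source sizes in terabytes, as listed on the slide (key names are illustrative).
internal_tb = {"point_of_sale": 4, "inventory_and_stock": 7, "crm": 8}
external_tb = {"social_media": 70, "web_logs": 30, "geographic_behavior": 20}


def total(sizes):
    """Sum the sizes of all sources in one group."""
    return sum(sizes.values())


def share_percent(part, whole):
    """Percentage of the whole taken up by one part, rounded to one decimal."""
    return round(100 * part / whole, 1)


internal_total = total(internal_tb)               # sum of the listed internal sources
external_total = total(external_tb)               # sum of the listed external sources
combined_total = internal_total + external_total
external_share = share_percent(external_total, combined_total)
```

With these inputs the external, unstructured sources account for the large majority of the combined volume, which is the motivation for keeping them on the Hadoop side of the proposed architecture.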
  • 28. ERP/DW-BI & Big Data Platform: Business Case, Data Process