The document discusses using SDN (software-defined networking) to improve big data applications. SDN can optimize data flows within and between data centers to improve the efficiency of big data tasks like transferring large files. By controlling data flows, an SDN controller can prioritize large transfers and handle varying traffic patterns to help big data applications run smoothly. Open issues that remain include scalable controller management and intelligent flow table and rule management.
2. “SDN for Big Data and Big Data for SDN”
Introduction
Big Data & its features
Software Defined Network & its features
SDN for Big Data
EXPECTED BENEFITS OF SDN
IMPROVEMENT TECHNIQUES
OPEN ISSUES
BY AHMED KASSAB
2
8/13/2018
3. Big data
Big data refers to data that exceeds conventional sizes.
Although the term has no specific definition, it
generally means data that cannot be handled using
conventional methods of data processing.
6. HANA
SAP HANA Hadoop Integration. Regardless of structure, you can
combine the in-memory processing power of SAP HANA with Hadoop's
ability to store and process huge amounts of data. ... SAP HANA Hadoop
integration is designed for users who may want to start using SAP HANA
with their Hadoop ecosystem.
7. HADOOP
Apache Hadoop is a collection of open-source software utilities that facilitate using
a network of many computers to solve problems involving massive amounts of
data and computation.
Hadoop is close to being the industry leader in big data storage.
8. EIM
Enterprise information management (EIM) is a field of interest within information
technology. It specializes in finding solutions for optimal use of information within
organizations, for instance to support decision-making processes or day-to-day
operations that require the availability of knowledge.
9. EPM
Oracle's Enterprise Performance Management (EPM) system includes a suite of
performance management applications, a suite of business intelligence (BI)
applications, a common foundation of BI tools and services, and a variety of
data sources, all integrated using Oracle Fusion Middleware.
Business intelligence combines a broad set of data analysis applications, including
ad hoc analytics and querying, enterprise reporting, online analytical processing
(OLAP), mobile BI, real-time BI, operational BI, cloud and software-as-a-service BI,
open source BI, collaborative BI, and location intelligence.
10. EDW
In computing, a data warehouse (DW or DWH), also known as an enterprise data
warehouse (EDW), is a system used for reporting and data analysis, and is
considered a core component of business intelligence. DWs are central repositories
of integrated data from one or more disparate sources.
11. BIG DATA
Big data applications depend on underlying networks that make the
transfer of information possible. These networks may be real
(conventional) or virtual (in case of services hosted in data centers).
Either way, the responsibility of smooth execution of the application,
despite increasing traffic volume, lies with the service provider.
Service providers face many challenges in providing a
high quality of service. It is therefore in their best interest
to increase the efficiency of the applications. SDN has the
potential to improve big data application performance.
13. SDN
SDN is now a dominant technology paradigm for controlling networks. Existing
networks are increasingly being updated to support SDN because of the benefits
that SDN brings to the network. The primary advantage that SDN offers comes
from separating the control plane and the data plane within the network.
14. SDN FOR BIG DATA
An SDN controller, if employed, can help in this scenario by changing the routing
decisions based on the traffic demand while AWS manages the increase or decrease in
the resources required to run the service. Since big data applications are
developed to manage large files over the same networks, a
number of problems may occur during transfer. These problems are related
to how TCP works and how large files are transferred over TCP sessions. However, the
primary problem is that the network conditions may change during the transfer
because the file is so large. Also, Hadoop uses data nodes to store and process files.
These data nodes are servers or VMs within a data center environment. When a
file is stored on the system, it is divided into smaller, more manageable parts, and
each part is stored separately [1]. When requested, the file parts are retrieved from
multiple locations within the data center and sent to the public data network
(PDN). The part of the transfer that happens within the data center is
affected by the network conditions within the data center. This is where SDN can
help and bring about efficiency improvements.
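The splitting of a stored file into separately placed parts can be illustrated with a minimal Python sketch. The part size, the node names, and the round-robin placement are assumptions made for illustration only; they are not Hadoop's actual placement policy, which also involves replication that this sketch leaves out.

```python
def split_into_parts(file_size, part_size):
    """Return (offset, length) pairs that together cover file_size bytes."""
    parts = []
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((offset, length))
        offset += length
    return parts


def place_parts(parts, data_nodes):
    """Assign parts to data nodes in round-robin order (assumed policy)."""
    placement = {node: [] for node in data_nodes}
    for index, part in enumerate(parts):
        node = data_nodes[index % len(data_nodes)]
        placement[node].append(part)
    return placement


# Hypothetical example: a 1000-byte file, 300-byte parts, three data nodes.
parts = split_into_parts(file_size=1000, part_size=300)
placement = place_parts(parts, ["node-a", "node-b", "node-c"])
```

Retrieving the file means reading parts from several nodes at once, which is why conditions on the links between those nodes matter for the overall transfer.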
15. EXPECTED BENEFITS OF USING SDN IN BIG DATA
1. Within the data center:
Servers that act as data nodes transfer huge amounts of information to
other servers and services. The data undergoes processing that involves
breaking large files down into smaller chunks. These chunks are transported to
their respective data nodes over the data center network. When
used within the data center (the most probable use case), the SDN
controller controls the flow of information between data nodes, as shown in
Fig. 1. The controller can also be set up to identify larger flows and prioritize
them so that the application runs smoothly. Besides flow
optimization, the controller can be tuned to handle varying traffic
patterns.
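Identifying and prioritizing larger flows can be sketched as follows. This is a minimal illustration, not a controller implementation: the byte threshold, the flow identifiers, and the two priority classes are all assumed values chosen for the example.

```python
# Assumed cutoff above which a flow is treated as "large".
LARGE_FLOW_THRESHOLD_BYTES = 10 * 1024 * 1024


def classify_flows(flow_bytes, threshold=LARGE_FLOW_THRESHOLD_BYTES):
    """Map each flow id to 'high' or 'normal' priority by bytes carried."""
    return {
        flow_id: "high" if carried >= threshold else "normal"
        for flow_id, carried in flow_bytes.items()
    }


def service_order(flow_bytes, threshold=LARGE_FLOW_THRESHOLD_BYTES):
    """Return flow ids with high-priority flows first, largest first."""
    priority = classify_flows(flow_bytes, threshold)
    return sorted(
        flow_bytes,
        key=lambda flow_id: (priority[flow_id] != "high", -flow_bytes[flow_id]),
    )


# Hypothetical per-flow byte counters between data nodes.
flows = {
    "dn1->dn2": 500 * 1024 * 1024,
    "dn3->dn4": 2 * 1024,
    "dn2->dn5": 50 * 1024 * 1024,
}
priorities = classify_flows(flows)
order = service_order(flows)
```

In a real deployment the byte counters would come from switch statistics collected by the controller; here they are hardcoded.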
16. EXPECTED BENEFITS OF SDN
2. Between data centers:
The benefit is that the same policies can be implemented across
geographical regions. This situation is also depicted in Fig. 1. The basic
principle of operation is the same; however, local SDN controllers handle
traffic within one data center, whereas the master SDN controller
controls traffic between data centers.
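The division of work between local controllers and the master controller can be sketched as a simple dispatch rule. The host and data center names and the controller labels below are hypothetical, used only to illustrate the two-tier idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    data_center: str


def responsible_controller(src, dst):
    """Pick the controller tier for a flow between two hosts.

    Traffic that stays inside one data center goes to that data center's
    local controller; traffic that crosses data centers goes to the master.
    """
    if src.data_center == dst.data_center:
        return f"local-controller:{src.data_center}"
    return "master-controller"


# Hypothetical hosts in two data centers.
a = Host("dn1", "dc-east")
b = Host("dn2", "dc-east")
c = Host("dn7", "dc-west")

intra = responsible_controller(a, b)
inter = responsible_controller(a, c)
```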
18. EXPECTED BENEFITS OF SDN
It is pertinent to note that multi-tier controller placement is not far-fetched, since
data centers already support applications and virtual machines that span multiple
geographical locations while remaining part of the same application. Such an
arrangement can only work, however, when the data centers are connected
using direct links, which is typically the case.
19. IMPROVEMENT TECHNIQUES
Since the term big data became popular, the research community has been working to find
ways to employ SDN to make big data applications more efficient by
improving existing protocols and/or techniques. TCP window size defines how quickly
information is transported from source to destination. With existing protocols,
the window is shrunk again if a problem is encountered during the transfer. It is therefore an
adaptive mechanism; however, for transfers of larger files this may create further
congestion.
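The adaptive behavior of the window can be illustrated with a simplified additive-increase/multiplicative-decrease loop. This is a sketch under stated assumptions: the window is counted in segments, it grows by one per round, it halves on a problem, and slow start and timeouts are left out. It does not model any specific TCP variant.

```python
def simulate_window(rounds, loss_rounds, initial=1, increase=1,
                    decrease_factor=0.5, minimum=1):
    """Return the window size used in each round of a transfer.

    The window grows by `increase` after a clean round and is multiplied
    by `decrease_factor` after a round listed in `loss_rounds`.
    """
    window = initial
    history = []
    for round_number in range(rounds):
        history.append(window)
        if round_number in loss_rounds:
            window = max(minimum, int(window * decrease_factor))
        else:
            window += increase
    return history


# Hypothetical transfer: eight rounds with a problem in round 4.
history = simulate_window(rounds=8, loss_rounds={4})
```

For a very large file the transfer lasts many rounds, so the window is likely to be cut back repeatedly as network conditions change along the way.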
times and a slight improvement in map/reduce performance can result in a
significant overall improvement in the application. One thing that is unique to big
data applications is that the request is smaller than the response in terms of data. For
instance, a request of a few kilobytes can fetch a file as large as a few gigabytes. Flow
optimization is another technique; it hardcodes flows, which optimizes not
only table space but flow search time as well. Rule caching is another
technique, employed to burn the most frequently used rules into the
hardware. Burning rules into hardware speeds up their execution.
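Rule caching can be sketched as choosing the most frequently matched rules for a limited number of hardware slots. The rule names, the match log, and the slot count below are hypothetical, and counting matches is only one possible way to decide what "most frequently used" means.

```python
from collections import Counter


def select_cached_rules(match_log, hardware_slots):
    """Return the rules with the most matches, up to hardware_slots of them."""
    counts = Counter(match_log)
    return [rule for rule, _ in counts.most_common(hardware_slots)]


def lookup_path(rule, cached_rules):
    """Report whether a rule is served from hardware or from software."""
    return "hardware" if rule in cached_rules else "software"


# Hypothetical log of which rule each packet matched.
match_log = ["r1", "r2", "r1", "r3", "r1", "r2"]
cached = select_cached_rules(match_log, hardware_slots=2)
```

With two slots, the two most frequently matched rules are placed in hardware and the remaining rule falls back to the slower software path.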
21. OPEN ISSUES
Scalable controller management
Intelligent flow table / rule management
Highly flexible language abstraction
Wireless mobile big data