2. CONTENTS
1.Hadoop
1.1 What is Hadoop Technology?
1.2 Why Hadoop?
1.3 When to use or not to use Hadoop
1.4 Hadoop’s Developers
1.5 Uses for Hadoop
1.6 Who Uses Hadoop
3. CONTENTS
2.Clustering
3.Network Virtualization
3.1 Virtualization?
3.2 What is NV?
3.3 Virtual Networking: Namespaces and Open vSwitch
3.4 What is Mininet?
3.5 Why Mininet?
3.6 Writing Own Topologies
4. 1.1 What is Hadoop Technology
Open source software framework designed for storage and processing of
large-scale data on clusters of commodity hardware
Created by Doug Cutting and Mike Cafarella in 2005.
Cutting named the program after his son’s toy elephant.
5. 1.2 Why Hadoop
Distributed cluster system
Platform for massively scalable applications
Enables parallel data processing
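The idea of parallel data processing can be sketched in a few lines. This is plain Python, not the Hadoop API: it only illustrates the split, process-independently, then merge pattern that Hadoop's MapReduce engine applies across a cluster. The function names and the sample input are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count words in one chunk of lines, independently of other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Reduce step: combine the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical input, already split into chunks (a cluster would split a file into blocks).
chunks = [
    ["hadoop stores data", "hadoop processes data"],
    ["data on commodity hardware"],
]

# Each chunk is handled by its own worker; threads stand in for cluster nodes here.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, chunks))

word_counts = merge_counts(partial_counts)
print(word_counts["data"], word_counts["hadoop"])
```

Because each chunk is counted without looking at the others, adding more chunks and more workers scales the same program to larger inputs.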
6. 1.3 When to use or not to use Hadoop
Hadoop is good for
Indexing data
Log analysis
Image manipulation
Sorting large-scale data
Data mining
Hadoop is NOT good for
Real-time processing (Hadoop is batch-oriented)
Random access (Hadoop is not a database)
Computation-intensive tasks with little data
7. 1.4 Hadoop’s Developers
2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support
distribution for the Nutch search engine project.
The project was funded by Yahoo.
2006: Yahoo gave the project to the Apache Software Foundation.
8. 1.5 Uses for Hadoop
Data-intensive text processing
Assembly of large genomes
Graph mining
Machine learning and data mining
Large-scale social network analysis
10. 2.Clustering
Load the packages required to implement the k-means clustering algorithm
Create the lists x and y and display them
Plot and display a scatter chart of x and y
Create an array X that stores the pairs (x, y)
Call KMeans with the number of clusters set to two and store its output in the
variable kmeans, which represents the clustering model
kmeans=KMeans(n_clusters=2)
Fit the kmeans clustering model on array X
Extract the centroids and labels from the model kmeans and print them to the console
Open the dataset file “faithful.csv” and store it in a variable “d”
Display a scatter chart showing all elements of the dataset with their designated
clusters and centroids
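To make the fit and extract steps concrete, here is a minimal sketch of what a two-cluster k-means fit does. It is not scikit-learn's KMeans: it is a hand-written version of the basic Lloyd iteration using only the standard library, so the function name kmeans, the choice of the first k points as starting centroids, and the sample x and y values are all assumptions for illustration.

```python
import math


def kmeans(points, k=2, iterations=100):
    """Basic Lloyd iteration: returns (centroids, labels)."""
    # Assumption: start from the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the fit has converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical sample data, not taken from the slides.
x = [1.0, 1.5, 1.2, 8.0, 8.5, 9.0]
y = [1.0, 2.0, 1.5, 8.0, 9.0, 8.5]
X = list(zip(x, y))  # array of (x, y) pairs

centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

On this sample the first three points end up in cluster 0 and the last three in cluster 1. With scikit-learn installed, the fitted model exposes the same two results as the attributes cluster_centers_ and labels_.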
11. 3.Network Virtualization
3.1 Virtualization?
Virtualization:
Transparent abstraction of the physical resources
that supports multiple logical views of their properties
Virtual Anything:
o Virtual Memory (we know this)
o Process Abstraction of OS (we know this too)
o Port abstraction at Transport Layer (we saw this)
o Virtual Machines (OS platform)
13. 3.Network Virtualization
3.3 Virtual Networking: Namespaces and Open vSwitch
h1 and h2 in separate network namespaces
Open vSwitch in root namespace
Let’s see how we can do this…
14. 3.Network Virtualization
3.3 Virtual Networking: Namespaces and Open vSwitch
# Create host namespaces
ip netns add h1
ip netns add h2
# Create switch
ovs-vsctl add-br s1
# Create links
ip link add h1-eth0 type veth peer name s1-eth1
ip link add h2-eth0 type veth peer name s1-eth2
ip link show
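The commands on this slide follow a regular pattern per host, which a short Python sketch can generate. The script only builds and prints command strings; it does not run them, since they need root privileges. The first three kinds of command are the ones shown on the slide. The remaining four (moving the host end of each link into its namespace, attaching the switch end to s1, assigning an address, and bringing interfaces up) are an assumption about the likely next steps, and the 10.0.0.x/24 addresses and the function name are hypothetical.

```python
def namespace_network_commands(hosts=("h1", "h2"), switch="s1", subnet="10.0.0"):
    """Return shell commands (as strings) for hosts in namespaces joined by one switch."""
    # From the slide: one namespace per host, then the Open vSwitch bridge.
    cmds = [f"ip netns add {host}" for host in hosts]
    cmds.append(f"ovs-vsctl add-br {switch}")
    for i, host in enumerate(hosts, start=1):
        host_if = f"{host}-eth0"
        switch_if = f"{switch}-eth{i}"
        # From the slide: a veth pair acts as the virtual cable.
        cmds.append(f"ip link add {host_if} type veth peer name {switch_if}")
        # Assumed follow-up steps, not shown on the slide:
        cmds.append(f"ip link set {host_if} netns {host}")        # host end into namespace
        cmds.append(f"ovs-vsctl add-port {switch} {switch_if}")   # switch end onto bridge
        cmds.append(f"ip netns exec {host} ip addr add {subnet}.{i}/24 dev {host_if}")
        cmds.append(f"ip netns exec {host} ip link set {host_if} up")
        cmds.append(f"ip link set {switch_if} up")
    return cmds


commands = namespace_network_commands()
for cmd in commands:
    print(cmd)
```

Printing the commands first makes it easy to review them before pasting them into a root shell on a test machine.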
15. 3.Network Virtualization
3.4 What is Mininet?
A virtual network environment that can run on a single PC
Runs real kernel, switch, and application code on a single machine
Interfaces: CLI, UI, Python
Many OpenFlow features are built in
Useful for SDN experimentation
16. 3.Network Virtualization
3.5 Why Mininet?
Fast
Custom topology creation possible
Can run real programs
Anything that can run on Linux can run on a Mininet host.
Programmable OpenFlow switches:
Useful for SDN
Open Source
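As an illustration of custom topology creation, the sketch below describes two hosts attached to one switch using the method names addSwitch, addHost, and addLink from Mininet's Python Topo class. So that it runs without Mininet installed, it uses RecordingTopo, a hypothetical stand-in class that only records the calls. The function name build_single_switch is also an assumption.

```python
class RecordingTopo:
    """Hypothetical stand-in that mimics the three Topo methods used below."""

    def __init__(self):
        self.hosts = []
        self.switches = []
        self.links = []

    def addHost(self, name):
        self.hosts.append(name)
        return name

    def addSwitch(self, name):
        self.switches.append(name)
        return name

    def addLink(self, node1, node2):
        self.links.append((node1, node2))


def build_single_switch(topo, n_hosts=2):
    """Add one switch and n_hosts hosts, each linked to the switch."""
    switch = topo.addSwitch("s1")
    for i in range(1, n_hosts + 1):
        host = topo.addHost(f"h{i}")
        topo.addLink(host, switch)
    return topo


topo = build_single_switch(RecordingTopo())
print(topo.hosts, topo.switches, topo.links)
```

On a machine with Mininet installed, passing an instance of mininet.topo.Topo instead of the stand-in and handing the result to Mininet(topo=topo) should build the same two-host network, though that path is not exercised here.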