Distributed Data Processing Workshop - SBU

Distributed Data Processing Workshop
Shahid Beheshti University Campus
School of Computer Science and Engineering
Course: Distributed Databases
Instructor: Dr. Hadi Tabatabaei
Presenter: Abolfazl Sedighi
Aban 1393 (October-November 2014)

Distributed Data Processing
School of Computer Science and Engineering
A. Sedighi
@amirsedighi
Hexican.com
sedighi@gmail.com

Every Game Needs Its Playing Yard


What can I do on a Single Machine?
● MVC Programming
● Regular Biz Apps
● 100 GB of Data
● Web Surfing
● ...

Introduction
This is a four-session, hands-on, step-by-step
tutorial on setting up a Linux cluster on your
own machine (notebook or PC) to try out a few
big-data processing frameworks and tools.

What are we going to do?
● Your notebook or PC is enough to get started.
– Setting your Linux cluster up.
● Distributed Log Management and Realtime Search-Engines
– What is Elasticsearch?
– Elasticsearch on the cluster.
– Monitoring and Usage.
● The most popular Distributed Data Processing Framework.
– What is Apache Hadoop?
– Apache Hadoop on the cluster.
– Usage scenarios.

What will we learn?
● Building up our knowledge of big data.
● Getting familiar with distributed data processing.
● Maximizing availability and reliability.
● Increasing data storage capacity.
● Improving data processing performance.
● Data locality is a silver bullet.
● Increasing cluster utilization.
● Taming giants by giving them a try.

Preparing the Linux Cluster - VirtualBox

Preparing the Cluster - Hosting
● VirtualBox
– Memory Size, Disk Capacity and CPU cores.
– Network Interfaces.
● NAT, provides Internet.
● Host-Only, provides cluster communication.

Preparing the Cluster – Adding a Host-Only Network

NAT Interface

Host-Only Interface
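
The two interfaces can also be attached from the command line instead of the VirtualBox GUI. The following is a minimal sketch, not the procedure used in the workshop: the VM name `node1` and the adapter name `vboxnet0` are assumptions, and the script only prints the `VBoxManage` commands so they can be reviewed before being run by hand (with the VM powered off).

```shell
#!/bin/sh
# Print the VBoxManage commands that give a VM one NAT adapter (Internet)
# and one host-only adapter (cluster communication).
# Arguments: VM name, host-only adapter name (both hypothetical examples).
vbox_network_commands() {
  vm_name="$1"
  hostonly_if="$2"
  # Creates a new host-only network on the host (typically named vboxnetN).
  echo "VBoxManage hostonlyif create"
  # Adapter 1: NAT, provides Internet access.
  echo "VBoxManage modifyvm $vm_name --nic1 nat"
  # Adapter 2: host-only, used for traffic between cluster nodes.
  echo "VBoxManage modifyvm $vm_name --nic2 hostonly --hostonlyadapter2 $hostonly_if"
}

vbox_network_commands node1 vboxnet0
```

Printing rather than executing keeps the sketch safe to run on a machine without VirtualBox; pipe the output to `sh` only after checking the names.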

Preparing the Cluster – First Node
● Creating a Linux machine inside VirtualBox.
● Installing Linux. (I've used Ubuntu 12.04.)
– Make sure Samba is installed.
– Make sure OpenSSH is installed.
● Put everything the cluster needs on the first node.
– Create an “install” folder on it.
– Install prerequisites such as Java on it.
● Shutting down the first node.

Preparing the Cluster – Cloning, the VirtualBox Side
● Cloning the first node. (tutorial)
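
Cloning can also be scripted with `VBoxManage clonevm`. The sketch below assumes the first node is registered as `node1` and that the clones should be called `node2`, `node3`, and so on; both naming choices are illustrative. It prints one command per clone rather than running it.

```shell
#!/bin/sh
# Print one "VBoxManage clonevm" command per additional node.
# Arguments: name of the source VM, total number of nodes wanted.
clone_commands() {
  source_vm="$1"
  node_count="$2"
  i=2
  while [ "$i" -le "$node_count" ]; do
    # --register adds the clone to the VirtualBox VM list.
    echo "VBoxManage clonevm $source_vm --name node$i --register"
    i=$((i + 1))
  done
}

clone_commands node1 3
```

The source VM should be shut down before cloning, as in the previous step.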

Preparing the Cluster – Cloning, the Linux Side
● Turning the new node on.
● Network configuration
– sudo nano /etc/hosts
– sudo nano /etc/hostname
– sudo nano /etc/network/interfaces
– sudo rm /etc/udev/rules.d/70-persistent-net.rules
● sudo reboot
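
What goes into those files depends on your addressing scheme. As an illustration only, the sketch below generates `/etc/hosts` lines and an `/etc/network/interfaces` stanza, assuming node names `node1..nodeN`, static addresses `192.168.56.101` upward (the usual VirtualBox host-only subnet), and `eth1` as the host-only interface. None of these values come from the slides.

```shell
#!/bin/sh
# Print /etc/hosts entries for nodes 1..N on the assumed host-only subnet.
hosts_entries() {
  node_count="$1"
  i=1
  while [ "$i" -le "$node_count" ]; do
    echo "192.168.56.$((100 + i)) node$i"
    i=$((i + 1))
  done
}

# Print a static stanza for /etc/network/interfaces on node number $1.
# eth1 is assumed to be the host-only adapter; the NAT adapter stays on DHCP.
interfaces_stanza() {
  node_index="$1"
  cat <<EOF
auto eth1
iface eth1 inet static
    address 192.168.56.$((100 + node_index))
    netmask 255.255.255.0
EOF
}

hosts_entries 3
interfaces_stanza 2
```

On each clone, `/etc/hostname` would then hold the matching name (for example `node2`), and the same `/etc/hosts` entries would be added on every node.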

Preparing the Cluster – No Password Login
● Do this:
– ssh-keygen
– ssh-copy-id -i ~/.ssh/id_rsa.pub user@host
● Or this:
– ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
– scp ~/.ssh/id_rsa.pub user@host:~/master_key
– ssh user@host
– cat ~/master_key >> ~/.ssh/authorized_keys
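
The manual variant has two common pitfalls: appending the same key twice, and leaving permissions too open, which makes `sshd` ignore the file under its default settings. The helper below is a hypothetical sketch of the step that runs on the target node; `authorize_key` is not a standard tool, and the demo uses a placeholder key in a temporary directory rather than a real home directory.

```shell
#!/bin/sh
# Append a public key to authorized_keys once, with the permissions sshd expects.
# Arguments: path to the public key file, path to the .ssh directory.
authorize_key() {
  pubkey_file="$1"
  ssh_dir="$2"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  # Append only if this exact key line is not already present.
  if ! grep -qxF -- "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
    cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
  fi
  chmod 600 "$ssh_dir/authorized_keys"
}

# Demo in a throwaway directory with a placeholder (not a real) key.
demo_home="$(mktemp -d)"
printf 'ssh-rsa PLACEHOLDER-KEY user@master\n' > "$demo_home/master_key"
authorize_key "$demo_home/master_key" "$demo_home/.ssh"
authorize_key "$demo_home/master_key" "$demo_home/.ssh"  # second call adds nothing
cat "$demo_home/.ssh/authorized_keys"
rm -rf "$demo_home"
```

On a real node the call would be `authorize_key ~/master_key ~/.ssh`, after which `ssh user@host` from the master should no longer prompt for a password.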

Preparing the Cluster – Distributed Shell
● Do it like a Commander
– Installing DSH (Optional)
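
DSH reads the list of machines from a group file, one hostname per line, and runs a command on all of them. The sketch below assumes a group called `cluster` and node names `node1..nodeN`; it writes the group file into a temporary directory for demonstration (on a real setup it would live at `~/.dsh/group/cluster`) and prints the `dsh` command rather than executing it.

```shell
#!/bin/sh
# Write a DSH group file listing node1..nodeN, one hostname per line.
# Arguments: path of the group file, number of nodes.
write_dsh_group() {
  group_file="$1"
  node_count="$2"
  mkdir -p "$(dirname "$group_file")"
  : > "$group_file"   # start from an empty file
  i=1
  while [ "$i" -le "$node_count" ]; do
    echo "node$i" >> "$group_file"
    i=$((i + 1))
  done
}

demo_dir="$(mktemp -d)"
write_dsh_group "$demo_dir/group/cluster" 3
cat "$demo_dir/group/cluster"
# -M prefixes each output line with the machine name, -c runs concurrently,
# -r ssh selects ssh as the remote shell, -g picks the group.
echo "dsh -M -c -r ssh -g cluster -- uptime"
rm -rf "$demo_dir"
```

This relies on the password-free SSH login set up in the previous step; otherwise every node would prompt for a password.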

Preparing the Cluster – Enjoy it
● To scale your cluster, just repeat the cloning
step.

Next?
● An introduction to distributed log management
and analytical search engines.
– How does Elasticsearch work?
– Workshop.
● An introduction to Apache Hadoop.
– How does Apache Hadoop work?
– Workshop.
