
Distributed Data Processing Workshop - SBU

This presentation is about how to prepare a distributed data processing environment on your PC.

Published in: Data & Analytics

  1. Distributed Data Processing Workshop
     Shahid Beheshti University campus, Faculty of Computer Science and Engineering
     Course: Distributed Databases
     Instructor: Dr. Hadi Tabatabaei
     Presented by: Abolfazl Sedighi
     Aban 1393 (October/November 2014)
  2. Distributed Data Processing
     School of Computer Science and Engineering
     A. Sedighi, @amirsedighi, Hexican.com, sedighi@gmail.com
  3. Every Game Needs Its Playing Field
  4. Every Game Needs Its Playing Field
  5. What can I do on a single machine?
     ● MVC programming
     ● Regular business apps
     ● 100 GB of data
     ● Web surfing
     ● ...
  6. Linux Cluster
  9. Introduction
     This is a four-session, hands-on, step-by-step tutorial on setting up a Linux cluster on your own machine (notebook or PC) so you can try out a few big-data processing frameworks and tools.
  10. What are we going to do?
      ● A notebook or a PC is enough to get started.
          – Setting up your Linux cluster.
      ● Distributed log management and real-time search engines.
          – What is Elasticsearch?
          – Elasticsearch on the cluster.
          – Monitoring and usage.
      ● The most popular distributed data processing framework.
          – What is Apache Hadoop?
          – Apache Hadoop on the cluster.
          – Usage scenarios.
  11. What will we learn?
      ● Building up our knowledge of big data.
      ● Getting familiar with distributed data processing.
      ● Maximizing availability and reliability.
      ● Increasing data storage capacity.
      ● Improving data processing performance.
      ● Data locality is a silver bullet.
      ● Increasing cluster utilization.
      ● Taming giants by giving them a try.
  12. Preparing the Linux Cluster – VirtualBox
  13. Preparing the Cluster – Hosting
      ● VirtualBox
          – Memory size, disk capacity and CPU cores.
          – Network interfaces.
              ● NAT, provides Internet access.
              ● Host-only, provides cluster communication.
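
The two-interface layout on this slide can also be configured from the command line instead of the VirtualBox GUI. The following is a minimal sketch under stated assumptions, not the presenter's procedure: it only prints the `VBoxManage` commands for one VM so they can be reviewed before being run. The VM name `node1`, the adapter name `vboxnet0`, and the memory and CPU figures are illustrative assumptions.

```shell
# Print (do not run) the VBoxManage commands that give a VM one NAT
# interface for Internet access and one host-only interface for
# cluster traffic. All names and sizes are illustrative assumptions.
vbox_network_cmds() {
  local vm="$1" mem_mb="${2:-1024}" cpus="${3:-1}" adapter="${4:-vboxnet0}"
  # Resources: memory in MB and number of virtual CPU cores.
  echo "VBoxManage modifyvm $vm --memory $mem_mb --cpus $cpus"
  # NIC 1: NAT, provides Internet access.
  echo "VBoxManage modifyvm $vm --nic1 nat"
  # NIC 2: host-only, provides node-to-node communication.
  echo "VBoxManage modifyvm $vm --nic2 hostonly --hostonlyadapter2 $adapter"
}

vbox_network_cmds node1 1024 1
```

The VM has to be powered off when these settings are changed, and the host-only network must already exist on the host. `VBoxManage hostonlyif create` is the usual way to create one from the command line, though this depends on the VirtualBox version and host OS.
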
  14. Preparing the Cluster – Adding a Host-Only Network
  15. Preparing the Cluster – Adding a NAT Interface
  16. Preparing the Cluster – Adding a Host-Only Interface
  17. Preparing the Cluster – First Node
      ● Creating a Linux machine inside VirtualBox.
      ● Installing Linux (I used Ubuntu 12.04).
          – Select Samba.
          – Select OpenSSH.
      ● Putting everything the nodes will need on the first node.
          – An “install” folder.
          – Prerequisites such as Java.
      ● Shutting down the first node.
  18. Preparing the Cluster – Cloning, the VirtualBox Side
      ● Cloning the first node. (tutorial)
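
Cloning can be scripted as well as done through the GUI. As a hedged sketch, the function below prints one `VBoxManage clonevm` command per additional node so the list can be checked before anything is executed. The base VM name `node1` and the `nodeN` naming scheme are assumptions, not names taken from the slides.

```shell
# Print the VBoxManage commands that clone a powered-off base VM into
# additional nodes. VM names (node1, node2, ...) are assumptions.
clone_cmds() {
  local base="$1" count="$2" i=2
  while [ "$i" -le "$count" ]; do
    # --register adds the clone to the VirtualBox VM list.
    echo "VBoxManage clonevm $base --name node$i --register"
    i=$((i + 1))
  done
}

# For a three-node cluster, node1 is the base and two clones are needed.
clone_cmds node1 3
```

Piping the output to `sh` would run the commands. Printing first keeps the step reviewable.
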
  19. Preparing the Cluster – Cloning, the Linux Side
      ● Turning the new node on.
      ● Network configuration
          – sudo nano /etc/hosts
          – sudo nano /etc/hostname
          – sudo nano /etc/network/interfaces
          – sudo rm /etc/udev/rules.d/70-persistent-net.rules
      ● sudo reboot
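
The slide lists which files to edit but not what goes in them. As an illustration only, the helpers below print candidate `/etc/hosts` entries and a static `/etc/network/interfaces` stanza for the host-only interface. The `192.168.56.x` address range, the `nodeN` hostnames, and `eth1` as the host-only NIC are all assumptions; use the values of your own host-only network.

```shell
# Print /etc/hosts entries for an N-node cluster on the host-only
# network. The 192.168.56.x range and the nodeN naming scheme are
# assumptions; use whatever your host-only adapter actually provides.
hosts_entries() {
  local count="$1" prefix="${2:-192.168.56}" i=1
  while [ "$i" -le "$count" ]; do
    echo "$prefix.$((100 + i)) node$i"
    i=$((i + 1))
  done
}

# Print a static /etc/network/interfaces stanza for one node's
# host-only interface (eth1 is assumed to be the host-only NIC).
interface_stanza() {
  local ip="$1" iface="${2:-eth1}"
  printf 'auto %s\niface %s inet static\n    address %s\n    netmask 255.255.255.0\n' \
    "$iface" "$iface" "$ip"
}

hosts_entries 3
interface_stanza 192.168.56.102
```

Every node gets the same `/etc/hosts` entries, while `/etc/hostname` and the interface address differ per node. Deleting `70-persistent-net.rules` matters because a clone normally gets new MAC addresses, and the stale rules would otherwise keep the interface names bound to the original node's MACs.
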
  20. Preparing the Cluster – No Password Login
      ● Do this:
          – ssh-keygen
          – ssh-copy-id -i ~/.ssh/id_rsa.pub user@host
      ● Or this:
          – ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
          – scp ~/.ssh/id_rsa.pub user@host:~/master_key
          – ssh user@host
          – cat ~/master_key >> ~/.ssh/authorized_keys
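
The manual variant appends the key with `cat ... >>`, which adds a duplicate line every time it is repeated. A possible refinement, sketched here as an assumption rather than as part of the workshop, is a small helper that appends the key only if it is missing and then sets the file permissions `sshd` expects. The function name and the placeholder key in the demo are hypothetical.

```shell
# Append a public key to an authorized_keys file only if it is not
# already present, then restrict the file to its owner.
# Both paths are arguments so the function can be tried on scratch files.
add_authorized_key() {
  local pubkey_file="$1" auth_file="$2"
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x: whole-line match, -F: fixed string, so the key is compared literally.
  if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  chmod 600 "$auth_file"
}

# Demo on scratch files; the key below is a placeholder, not a real key.
demo_dir="$(mktemp -d)"
echo "ssh-rsa AAAAexample user@master" > "$demo_dir/master_key"
add_authorized_key "$demo_dir/master_key" "$demo_dir/authorized_keys"
add_authorized_key "$demo_dir/master_key" "$demo_dir/authorized_keys"  # no-op the second time
cat "$demo_dir/authorized_keys"
rm -rf "$demo_dir"
```

On a real node the call would be along the lines of `add_authorized_key ~/master_key ~/.ssh/authorized_keys`. The `~/.ssh` directory itself should be mode 700, which `ssh-keygen` normally sets when it creates the directory.
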
  21. Preparing the Cluster – Distributed Shell
      ● Do it like a commander.
          – Installing DSH (optional).
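
Since DSH is optional, the same idea can be approximated with a plain `ssh` loop once passwordless login works. The sketch below is a stand-in for a distributed shell, not DSH itself. The hosts-file format (one hostname per line) and the `REMOTE_SHELL` variable used for dry runs are assumptions introduced for this example.

```shell
# Run one command on every host listed in a file (one hostname per
# line). This is a plain ssh loop standing in for a distributed shell,
# not DSH itself. REMOTE_SHELL is a hypothetical knob for dry runs.
run_on_all() {
  local hosts_file="$1"; shift
  local host
  while IFS= read -r host; do
    # Skip blank lines and comments.
    case "$host" in ''|'#'*) continue ;; esac
    echo "== $host =="
    # -n keeps ssh from swallowing the rest of the hosts file on stdin.
    ${REMOTE_SHELL:-ssh -n} "$host" "$@"
  done < "$hosts_file"
}

# Dry run: echo replaces ssh, so nothing is contacted.
hosts_file="$(mktemp)"
printf 'node1\nnode2\n' > "$hosts_file"
REMOTE_SHELL=echo run_on_all "$hosts_file" uptime
rm -f "$hosts_file"
```

Without `REMOTE_SHELL` set, the loop connects to each host in turn and runs the command there. Unlike a real distributed shell it is strictly sequential.
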
  22. Preparing the Cluster – Enjoy It
      ● To scale your cluster, just repeat the cloning step.
  23. Next?
      ● An introduction to distributed log management and analytical search engines.
          – How does Elasticsearch work?
          – Workshop.
      ● An introduction to Apache Hadoop.
          – How does Apache Hadoop work?
          – Workshop.
