
Virtual machines for data science - how to install and use

Tutorial on how to install and use virtual machines for data science, covering both the Python and the Spark virtual machines.
This tutorial assumes that you have already installed VirtualBox and Vagrant.

If you need to install VirtualBox and Vagrant, please read these manuals:

- Linux - http://www.slideshare.net/datascienceschool/virtual-machines-for-data-science-how-to-install-linux
- Windows - http://www.slideshare.net/datascienceschool/virtual-machines-for-data-science-how-to-install-windows
- Mac OS X - http://www.slideshare.net/datascienceschool/virtual-machines-for-data-science-how-to-install-mac-os-x

Published in: Data & Analytics

  1. Virtual Machine Install and Usage: for Windows, Mac OS X and Linux
  2. Install: for Windows, Mac OS X and Linux
  3. Intro
     This tutorial shows you how to install virtual machines for data science. Two different machines are available:
     1. Python machine: suitable for many data science projects using the Python scientific stack (NumPy, scikit-learn, NetworkX, etc.)
     2. Spark machine: suitable for practicing big data projects with the Apache Spark processing engine (including Spark MLlib, Spark SQL, etc.)
  4. Prerequisites
     This tutorial assumes that you have already installed VirtualBox and Vagrant, the software tools needed to run the virtual machine. If you don't have VirtualBox and Vagrant installed yet, follow the instructions for your operating system before you continue:
     1. Windows instructions
     2. Linux instructions
     3. Mac OS X instructions
     If you already have VirtualBox and Vagrant installed, continue with the next slides.
  5. Installing the VM
     Once VirtualBox and Vagrant are installed, you are ready to install the virtual machine itself, which is quite easy:
     ➔ Download the "Vagrantfile" for the virtual machine you want (Spark or Python) from the website vm.datascience-school.com, or directly from here:
        1. Python Virtual Machine - download link
        2. Spark Virtual Machine - download link
     ➔ Vagrant uses this file to download the virtual machine in the next step.
  6. Third step: Installing the VM
     ➔ Create a directory of your choice, for example:
        1) Windows: C:\Vagrant_Spark or C:\Vagrant_Python
           Important: your user name must be written in English characters, and Microsoft Visual C++ must be installed on your computer.
        2) Linux: /home/user_name/Vagrant_Spark or /home/user_name/Vagrant_Python
        3) Mac OS X: /Users/user_name/Vagrant_Spark or /Users/user_name/Vagrant_Python
     ➔ Put the downloaded "Vagrantfile" into the directory you just created. (NOTE: the file must be named exactly "Vagrantfile", with no extension.)
     ➔ Open a Command Prompt / Terminal and change to that directory:
        ● Windows: cd C:\Vagrant_Spark (or the directory you created)
        ● Linux: cd /home/user_name/dir_name
        ● Mac OS X: cd /Users/user_name/dir_name
       Then run the command "vagrant up --provider virtualbox". This downloads the virtual machine box (Spark or Python) and takes several minutes.
     ➔ Open VirtualBox: you should see Spark_machine or Python_machine running.
  7. Third step: Installing the VM
     It should look like this:
  8. Usage of virtual machines
     If all of the above steps ran successfully, you are ready to use the virtual machine. Congratulations! For instructions on using the virtual machines, see the next slides.
  9. Usage: for Windows, Mac OS X and Linux
  10. Use the machine - Spark
      How to use the Spark VM:
      1. If it is not already running, start the virtual machine by running the command "vagrant up" in a Terminal, from the directory containing your Spark machine's Vagrantfile.
      2. Once the virtual machine is running, open the IPython Spark notebooks by navigating your web browser to "http://localhost:8001".
      3. On this web page you will find the practical assignments, with explanations, for the "Intro to big data with Apache Spark" course from Data Science School.
      4. Open the file "Lab_0_SPARK" in the "Lab_0" folder and run its cells to verify that you do not encounter any errors. If a cell returns [('yes', 4), ('no', 3)], congratulations: your VM is installed correctly!
  11. Use the machine - Python
      How to use the Python VM:
      1. If it is not already running, start the virtual machine by running the command "vagrant up" from a command prompt, in the directory containing your Python machine's Vagrantfile.
      2. Once the virtual machine is running, you can access the following:
         ▷ Hello World Flask app:
           http://localhost:5000/ - to see how the app renders in the browser
           http://localhost:5001/ - to edit the Flask app files
         ▷ IPython notebooks: http://localhost:5002/ - to select a Python notebook, or to upload notebooks of your choice. On this web page you will find the first practical assignment, with explanations, for the "Intro to data science with Python" course from Data Science School.
         ▷ Neo4j graph database browser: http://localhost:7474/browser/ - where you can practice Neo4j queries
  12. Use the machine - notebook
      It should look similar to this (the Spark and Python machines have different files available):
      You can now either take part in data science classes via Data Science School, or upload any IPython notebook of your choice from the Internet. Please be aware that, although the Python virtual machine supports most scientific Python libraries, some rarer libraries used by external notebooks may not be supported at this time.
  13. Thank you for your attention!
      Virtual Machines by datascience-school.com
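
The prerequisite check on slide 4 can be automated with a short Python script. This is a minimal sketch that only checks whether the `vagrant` executable and `VBoxManage` (VirtualBox's command-line tool) can be found on your PATH; it does not check versions. On Windows, VirtualBox's install folder is often not on PATH, so a MISSING result for `VBoxManage` there does not necessarily mean VirtualBox is absent.

```python
import shutil

# Command-line tools assumed to be on PATH after installing the prerequisites:
# "vagrant" is Vagrant's CLI, "VBoxManage" is VirtualBox's CLI.
REQUIRED_TOOLS = ("vagrant", "VBoxManage")


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```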
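
Slides 5 and 6 stress that the downloaded file must be named exactly "Vagrantfile", with no extension. A common slip is a browser saving it as `Vagrantfile.txt`. The sketch below looks for that mistake before you run `vagrant up`; `check_vagrantfile` is a hypothetical helper, and its content test is only a heuristic that assumes the file contains the usual `Vagrant.configure` block.

```python
from pathlib import Path


def check_vagrantfile(directory):
    """Return a list of problems with the Vagrantfile in `directory` (empty list = looks OK).

    Heuristic only: assumes a typical Vagrantfile containing `Vagrant.configure`.
    """
    directory = Path(directory)
    target = directory / "Vagrantfile"
    if not target.is_file():
        # Browsers sometimes add an extension when saving the download.
        near_misses = sorted(p.name for p in directory.glob("Vagrantfile.*"))
        if near_misses:
            return [f"rename {near_misses[0]} to Vagrantfile (no extension)"]
        return [f"no Vagrantfile in {directory}"]
    text = target.read_text(encoding="utf-8", errors="replace")
    if "Vagrant.configure" not in text:
        return ["Vagrantfile does not contain a Vagrant.configure block"]
    return []


if __name__ == "__main__":
    print(check_vagrantfile(".") or "Vagrantfile looks OK")
```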
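
The directory and `vagrant up --provider virtualbox` steps on slide 6 can also be scripted. In this sketch, `default_vm_dir` and `vagrant_up` are hypothetical helper names: the Windows directory mirrors the slide's `C:\Vagrant_Spark` example, and on Linux and Mac OS X the current user's home directory is used. With `dry_run=True` the command is returned without being run.

```python
import platform
import subprocess
from pathlib import Path


def default_vm_dir(machine="Spark", system=None):
    """Suggest a directory like the examples on slide 6.

    machine is "Spark" or "Python"; system defaults to platform.system()
    ("Windows", "Linux" or "Darwin").
    """
    system = system or platform.system()
    name = f"Vagrant_{machine}"
    if system == "Windows":
        # Slide 6 example: C:\Vagrant_Spark
        return "C:\\" + name
    # Linux (/home/user_name) and Mac OS X (/Users/user_name).
    return str(Path.home() / name)


def vagrant_up(vm_dir, dry_run=False):
    """Run `vagrant up --provider virtualbox` inside vm_dir (the `cd` + command on slide 6)."""
    vm_dir = Path(vm_dir)
    cmd = ["vagrant", "up", "--provider", "virtualbox"]
    if not dry_run:
        if not (vm_dir / "Vagrantfile").is_file():
            raise FileNotFoundError(f"put the downloaded Vagrantfile in {vm_dir} first")
        # check=True raises CalledProcessError if vagrant exits with an error.
        subprocess.run(cmd, cwd=vm_dir, check=True)
    return cmd


if __name__ == "__main__":
    vm_dir = default_vm_dir("Spark")
    print("cd", vm_dir, "&&", " ".join(vagrant_up(vm_dir, dry_run=True)))
```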
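
Besides looking at the VirtualBox window (slides 6 and 7), you can ask Vagrant for the machine state. `vagrant status --machine-readable` prints comma-separated lines of the form `timestamp,target,type,data`; this sketch keeps only the lines whose type is `state`. The machine name Vagrant reports comes from the Vagrantfile (often `default`) and may differ from the `Spark_machine` / `Python_machine` name shown in VirtualBox.

```python
import subprocess


def parse_machine_states(output):
    """Map each machine name to its state, from `vagrant status --machine-readable` output."""
    states = {}
    for line in output.splitlines():
        fields = line.split(",")
        # timestamp, target (machine name), type, data...
        if len(fields) >= 4 and fields[2] == "state":
            states[fields[1]] = fields[3]
    return states


def machine_states(vm_dir):
    """Run vagrant in vm_dir (the directory holding the Vagrantfile) and parse its answer."""
    result = subprocess.run(
        ["vagrant", "status", "--machine-readable"],
        cwd=vm_dir, capture_output=True, text=True, check=True,
    )
    return parse_machine_states(result.stdout)
```

A state of `running` for your machine corresponds to what slide 6 asks you to confirm in VirtualBox.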
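
The check on slide 10 expects the output `[('yes', 4), ('no', 3)]`. We don't know exactly what the `Lab_0_SPARK` cell computes; a list of (key, count) pairs like this is what a PySpark count built from `map` and `reduceByKey` typically returns. The sketch below is a pure-Python stand-in for that pattern on made-up input, so it runs without Spark; the PySpark equivalent is shown in a comment. In Spark, the order of `collect()` results is not guaranteed.

```python
from operator import add


def reduce_by_key(pairs, func):
    """Pure-Python stand-in for Spark's RDD.reduceByKey.

    Combines the values of each key with func, keeping keys in first-seen order.
    """
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


# Hypothetical input; the real Lab_0 data may differ.
words = ["yes", "no", "yes", "yes", "no", "yes", "no"]

# PySpark analogue (requires a SparkContext `sc`):
#   sc.parallelize(words).map(lambda w: (w, 1)).reduceByKey(add).collect()
pairs = [(w, 1) for w in words]  # like rdd.map(lambda w: (w, 1))

if __name__ == "__main__":
    print(reduce_by_key(pairs, add))  # [('yes', 4), ('no', 3)]
```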
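
Slides 10 and 11 list the local addresses where the VMs expose their services. This sketch checks which of them answer; the URLs are copied from the slides, while `is_reachable` is a hypothetical helper. Proxies are disabled so that `localhost` is contacted directly. Usually only one VM is running at a time, so expect the other machine's addresses to show as down.

```python
import http.client
import urllib.error
import urllib.request

# Addresses from slides 10 (Spark VM) and 11 (Python VM).
SERVICES = {
    "Spark VM notebooks": "http://localhost:8001/",
    "Flask app": "http://localhost:5000/",
    "Flask app editor": "http://localhost:5001/",
    "Python VM notebooks": "http://localhost:5002/",
    "Neo4j browser": "http://localhost:7474/browser/",
}

# An opener with an empty ProxyHandler ignores any http_proxy settings.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url (any status code counts)."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        err.close()
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:22} {url:32} {'up' if is_reachable(url) else 'down'}")
```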
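
Slide 12 warns that some rarer libraries used by external notebooks may not be available on the Python VM. Before working through a downloaded notebook, you could run a rough check like the one below inside the VM (for example, paste it into a notebook cell and call `missing_modules("path/to/notebook.ipynb")`), so that it inspects the VM's own Python installation. It assumes the notebook uses the nbformat 4 JSON layout (a top-level `cells` list) and only recognizes simple `import x` and `from x import y` lines, so treat the result as a hint rather than a guarantee.

```python
import importlib.util
import json
import re

# Matches "import pkg..." or "from pkg... import" at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:from\s+([\w.]+)\s+import|import\s+([\w.]+))", re.MULTILINE)


def notebook_imports(nb):
    """Collect top-level module names imported in the code cells of a notebook dict."""
    modules = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat may store source as a list of lines
            source = "".join(source)
        for from_mod, import_mod in IMPORT_RE.findall(source):
            name = (from_mod or import_mod).split(".")[0]
            if name:  # skip relative imports such as "from . import x"
                modules.add(name)
    return modules


def missing_modules(path):
    """Return the sorted names of imported modules that this Python cannot find."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    return sorted(m for m in notebook_imports(nb) if importlib.util.find_spec(m) is None)
```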
