By :: Jaideep Katkar
Under the Guidance of :: Dr. Tran Thanh
CS267_Graph_Lab
Published in: Technology, Education
  1. By :: Jaideep Katkar, Under the Guidance of :: Dr. Tran Thanh
  2. GraphLab Overview: A New Framework for Parallel Machine Learning
     – High-level abstractions for machine learning problems
     – Shared-memory multiprocessor
     – Assumes no fault tolerance is needed
     – Concurrent-access processing models with sequential-consistency guarantees
  3. How Does GraphLab Work?
     – The user's data is represented by a directed graph
     – Each block of data is associated with a vertex or a directed edge
     – Shared data table
     – User functions:
         • Update: modifies vertex and edge state; has read-only access to the shared table
         • Fold: sequentially aggregates into a key entry of the shared table; may modify vertex data
         • Merge: combines partial results so that the Fold function can be parallelized
         • Apply: finalizes the key entry in the shared table
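The Fold, Merge and Apply roles above can be sketched in plain C++ without the GraphLab library. This is only an illustration under stated assumptions: the accumulator type `SumCount`, the function names and the chunking scheme are all hypothetical, and the chunks are processed one after another here, where a real runtime could fold them in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator stored under one key of the shared data table.
struct SumCount {
    double sum = 0.0;
    std::size_t count = 0;
};

// Fold: aggregate one vertex's data into the accumulator, sequentially.
void fold_vertex(SumCount& acc, double vertex_value) {
    acc.sum += vertex_value;
    acc.count += 1;
}

// Merge: combine two partial accumulators, which is what lets
// independent folds over different chunks be run in parallel.
void merge_partials(SumCount& into, const SumCount& other) {
    into.sum += other.sum;
    into.count += other.count;
}

// Apply: finalize the accumulator into the value kept in the shared table.
double apply_final(const SumCount& acc) {
    return acc.count == 0 ? 0.0 : acc.sum / static_cast<double>(acc.count);
}

// Folds each of `parts` contiguous chunks of vertex data, merges the
// partial results, and applies the finalizer to obtain the average.
double sync_average(const std::vector<double>& vertex_data, std::size_t parts) {
    assert(parts > 0);
    std::vector<SumCount> partial(parts);
    const std::size_t n = vertex_data.size();
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t begin = p * n / parts;
        const std::size_t end = (p + 1) * n / parts;
        for (std::size_t v = begin; v < end; ++v) {
            fold_vertex(partial[p], vertex_data[v]);
        }
    }
    SumCount total;
    for (const SumCount& part : partial) {
        merge_partials(total, part);
    }
    return apply_final(total);
}
```

Because Merge is associative, the result does not depend on how many chunks the vertices are split into.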
  4. GAS (Gather, Apply, Scatter) Decomposition
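The slide gives only the title, so the following is a hedged sketch of what a Gather, Apply, Scatter decomposition can look like, using PageRank on a toy graph. It is written in plain C++ and does not use the GraphLab API; the `Graph` type, the function names, the unnormalized rank formula and the activation scheme are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Toy directed graph stored as adjacency lists (hypothetical layout).
struct Graph {
    std::vector<std::vector<std::size_t>> in_neighbors;
    std::vector<std::vector<std::size_t>> out_neighbors;
};

Graph make_graph(std::size_t num_vertices,
                 const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    Graph g;
    g.in_neighbors.resize(num_vertices);
    g.out_neighbors.resize(num_vertices);
    for (const auto& e : edges) {
        assert(e.first < num_vertices && e.second < num_vertices);
        g.out_neighbors[e.first].push_back(e.second);
        g.in_neighbors[e.second].push_back(e.first);
    }
    return g;
}

// Gather: sum the rank contributions arriving along the in-edges of v.
double gather(const Graph& g, const std::vector<double>& rank, std::size_t v) {
    double total = 0.0;
    for (std::size_t u : g.in_neighbors[v]) {
        total += rank[u] / static_cast<double>(g.out_neighbors[u].size());
    }
    return total;
}

// Apply: turn the gathered sum into the new value of the vertex.
double apply_rank(double gathered, double damping) {
    return (1.0 - damping) + damping * gathered;
}

// Scatter: if the vertex changed enough, activate its out-neighbors.
void scatter(const Graph& g, std::size_t v, double change, double tol,
             std::vector<bool>& next_active) {
    if (change <= tol) return;
    for (std::size_t w : g.out_neighbors[v]) next_active[w] = true;
}

// Runs synchronous rounds until no vertex is active or max_rounds is reached.
std::vector<double> pagerank_gas(const Graph& g, double damping, double tol,
                                 std::size_t max_rounds) {
    const std::size_t n = g.in_neighbors.size();
    std::vector<double> rank(n, 1.0);
    std::vector<bool> active(n, true);
    for (std::size_t round = 0; round < max_rounds; ++round) {
        std::vector<double> next_rank = rank;
        std::vector<bool> next_active(n, false);
        bool any_active = false;
        for (std::size_t v = 0; v < n; ++v) {
            if (!active[v]) continue;
            any_active = true;
            const double updated = apply_rank(gather(g, rank, v), damping);
            next_rank[v] = updated;
            scatter(g, v, std::fabs(updated - rank[v]), tol, next_active);
        }
        if (!any_active) break;
        rank.swap(next_rank);
        active.swap(next_active);
    }
    return rank;
}
```

Splitting the vertex program this way keeps each phase local to a vertex and its edges, which is what makes the work easy to distribute.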
  5. GraphLab Toolkit
     • Topic Modeling contains applications like LDA, which can be used to cluster documents and extract topical representations.
     • Graph Analytics contains applications like PageRank and triangle counting, which can be applied to general graphs to estimate community structure.
     • Clustering contains standard data clustering tools such as K-means.
     • Collaborative Filtering contains a collection of applications used to make predictions about users' interests and to factorize large matrices.
     • Graphical Models contains tools for making joint predictions about collections of related random variables.
     • Computer Vision contains a collection of tools for reasoning about images.
  6. Running GraphLab on EC2 Cluster
     Requirements ::
     • You should have an Amazon EC2 account eligible to run in the us-east-1a zone.
     • From the Amazon AWS console, obtain your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (under your account name in the top right corner -> security credentials -> access keys).
     • You should have a keypair attached to the zone you are running in (in our example, us-east-1a).
     • Install boto, the AWS Python client. To install it, run: 'sudo pip install boto'.
     • Download and install GraphLab as described on the next slides.
  7. Satisfying Dependencies on Ubuntu
     All the dependencies can be satisfied from the repository. The command below installs gcc and the JDK, which are needed to compile GraphLab programs:
     Downloading GraphLab version 2.2
     You can download GraphLab directly from our GitHub repository. GitHub also offers a zip download of the repository if you do not have git. The git command line for cloning the repository is:
  8. Compiling and Running GraphLab
     In the graphlabapi directory, running the configure script creates two sub-directories, release/ and debug/. Change into either of these directories and run make to build the release or the debug version, respectively. Note that this compiles all of GraphLab, including all toolkits.
  9. Running Stochastic Gradient Descent (SGD) in the Collaborative Filtering Toolkit
     The collaborative filtering toolkit contains tools for computing a linear model of the data and for predicting missing values based on this linear model. This is useful when computing recommendations for users.
     http://docs.graphlab.org/collaborative_filtering.html
  10. Running SGD on Netflix Data to Predict User Ratings
     Input file (training) for Netflix data:
     [user] [item] [rating]
     1000 2 5.0
     3 7 12.0
     6 2 2.1
     Creating a directory to load the Netflix data
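To make the `[user] [item] [rating]` layout concrete, here is a minimal C++ reader for such lines. It is a sketch, not the toolkit's own loader: the `Rating` struct and `parse_ratings` function are hypothetical names, and skipping blank or malformed lines is an assumption about how such input might be handled.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One training example: a user id, an item id and the observed rating.
struct Rating {
    unsigned long user;
    unsigned long item;
    double value;
};

// Reads whitespace-separated "[user] [item] [rating]" lines.
// Lines that do not start with three parseable fields are skipped.
std::vector<Rating> parse_ratings(std::istream& in) {
    std::vector<Rating> ratings;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Rating r{};
        if (fields >> r.user >> r.item >> r.value) {
            ratings.push_back(r);
        }
    }
    return ratings;
}
```

Any `std::istream` works as input, so the same function can read from a `std::ifstream` opened on a training file or from a `std::istringstream` holding the three sample lines above.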
  11. Command Line Arguments to Run SGD
     --gamma=XX        Gradient descent step size.
     --lambda=XX       Gradient descent regularization.
     --step_dec=XX     Multiplicative step decrease. Should be between 0.1 and 1. Default is 0.9.
     --D=X             Feature vector width. Common values are 20 - 150.
     --max_iter=XX     Maximum number of iterations.
     --maxval=XX       Maximum allowed rating.
     --minval=XX       Minimum allowed rating.
     --predictions=XX  File name to write predictions to. Note that you will need a user/item pair input file named something.predict to enable predictions (see section: ratings).
     --tol=XX          Stop computation when the absolute error of prediction is less than the tolerance. Default is 1e-3.
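The roles of these parameters can be illustrated with a generic matrix-factorization SGD loop in plain C++. This is a sketch of the textbook regularized update, not the toolkit's implementation: the struct and function names are hypothetical, the constant 0.1 initialization is an assumption chosen to keep the example deterministic, and the --tol stopping rule is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rating { std::size_t user; std::size_t item; double value; };

// Field names mirror the command line flags listed above.
struct SgdParams {
    double gamma;          // step size
    double lambda;         // regularization weight
    double step_dec;       // multiplicative step decrease per iteration
    std::size_t D;         // feature vector width
    std::size_t max_iter;  // number of passes over the ratings
    double minval;         // minimum allowed rating
    double maxval;         // maximum allowed rating
};

struct Model {
    std::vector<std::vector<double>> p;  // user feature vectors
    std::vector<std::vector<double>> q;  // item feature vectors
};

// Assumption: every feature starts at 0.1 (real code would randomize).
Model init_model(std::size_t num_users, std::size_t num_items, std::size_t D) {
    Model m;
    m.p.assign(num_users, std::vector<double>(D, 0.1));
    m.q.assign(num_items, std::vector<double>(D, 0.1));
    return m;
}

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) s += a[d] * b[d];
    return s;
}

// Prediction clipped to the allowed rating range [minval, maxval].
double predict(const Model& m, std::size_t u, std::size_t i,
               double minval, double maxval) {
    return std::min(maxval, std::max(minval, dot(m.p[u], m.q[i])));
}

// One gradient step per observed rating; the step shrinks after each pass.
void train_sgd(Model& m, const std::vector<Rating>& ratings,
               const SgdParams& params) {
    double step = params.gamma;
    for (std::size_t iter = 0; iter < params.max_iter; ++iter) {
        for (const Rating& r : ratings) {
            std::vector<double>& pu = m.p[r.user];
            std::vector<double>& qi = m.q[r.item];
            const double err = r.value - dot(pu, qi);
            for (std::size_t d = 0; d < pu.size(); ++d) {
                const double pud = pu[d];
                const double qid = qi[d];
                pu[d] += step * (err * qid - params.lambda * pud);
                qi[d] += step * (err * pud - params.lambda * qid);
            }
        }
        step *= params.step_dec;
    }
}
```

A caller would build the model with `init_model(num_users, num_items, params.D)`, run `train_sgd`, and then call `predict` for each user/item pair of interest.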
  12. Output File
     SGD is a simple gradient descent algorithm. Prediction in SGD is done as: r_ui = p_u * q_i, where r_ui is the scalar rating of user u for item i, p_u is the user feature vector of size D, q_i is the item feature vector of size D, and the product is the dot product of the two vectors.
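Written out, the prediction rule above is a sum over the D feature dimensions. The numbers in the worked line are made up for illustration (D = 2) and do not come from the Netflix data.

```latex
\hat{r}_{ui} \;=\; p_u^{\top} q_i \;=\; \sum_{d=1}^{D} p_{ud}\, q_{id}

% Worked example with D = 2, p_u = (1.0,\ 2.0), q_i = (1.5,\ 0.5):
\hat{r}_{ui} \;=\; 1.0 \cdot 1.5 + 2.0 \cdot 0.5 \;=\; 2.5
```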
  13. Creating a GraphLab Project
     • To create a GraphLab project, simply create a sub-directory in the graphlab/apps/ folder with your project name, for instance graphlab/apps/my_first_GraphLabProject.
     • Create a text file called CMakeLists.txt in it with the following contents ::
         project(My_GraphLabProject)
         add_graphlab_executable(my_first_GraphLabProject <ProgramName>.cpp)
  14. Hello World in GraphLab
         #include <graphlab.hpp>
         using namespace graphlab;

         int main(int argc, char** argv) {
           // Initialize MPI before creating any GraphLab objects.
           graphlab::mpi_tools::init(argc, argv);
           // Distributed communication layer.
           graphlab::distributed_control dc;
           dc.cout() << "Hello World!\n";
           graphlab::mpi_tools::finalize();
         }
     • dc is the distributed communication layer, which is needed by a number of the core GraphLab objects whether you are running distributed or not.
     • To create the program, run the configure script, then run "make" in the debug/ or release/ build folders. When executed, the program prints "Hello World!".
  15. Thank You
     References ::
     http://graphlab.com/community/events/conference14.html
     http://graphlab.com/learn/notebooks/introduction_to_sframes.html
     http://en.wikipedia.org/wiki/GraphLab
     https://www.youtube.com/watch?v=lRN91_-hlkg
     https://wiki.engr.illinois.edu/download/attachments/227740647/GraphLab.pdf?version=1&modificationDate=1382500521000#page=1&zoom=auto,0,280
     http://arxiv.org/pdf/1204.6078v1.pdf
     http://select.cs.cmu.edu/code/graphlab/doxygen/html/index.html
