Algo Project Proposal



  1. Project Proposal: Solving map-reduce problems using a distributed cluster via HTTP. 2007095, Ritesh M Nayak [email_address]
  2. Objective
      • Solve map-reduce problems using a distributed cluster made up of many transient nodes.
      • The nodes are ordinary web browsers.
      • Build a proof of concept of the technology and package it as a library for generic use.
  3. The current technology [diagram: one server connected to three clients over TCP sockets]
  4. Problems
      • Clients are usually executables that have to be downloaded and installed.
      • Raw TCP connections require opening ports and firewalls, which is not possible in all settings.
      • Examples are the @home projects, the SETI project, and the Google science project.
  5. Solution
      • Make the clients thin: remove the download and install overhead.
      • Use resources the browser already has, such as its JavaScript engine, whose computational capacity is largely untapped.
      • Make the entire process seamless, with no user intervention and even without the user's knowledge.
  6. Inspiration
      • People spend hours on the net. Home pages, social networks, and mailboxes often stay open for long stretches. During this time, the bandwidth and computational capacity of the user's system can be used to do the server's work (if the user agrees).
      • Bandwidth is becoming cheaper and often unlimited, so it is not much of a concern.
  7. How it will look [diagram: one server connected to three browsers over HTTP, with no firewalls in the way]
      • Server (master): acts as the master and does the work of dividing the problem.
      • Browsers (slaves): form the computational nodes while connected to the server.
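The division of labour on this slide can be sketched as a word count, a common map-reduce illustration. This is a minimal in-process simulation, not the proposed library: the function names (`splitIntoChunks`, `mapChunk`, `reducePairs`, `wordCount`) are hypothetical, and the HTTP transport between master and browsers is left out, so each chunk simply stands in for one browser's share of the work.

```typescript
// Hypothetical names throughout; nothing here is defined by the proposal.
type Pair = [string, number];

// Master side: split the input into roughly equal chunks, one per task.
function splitIntoChunks<T>(items: T[], chunkCount: number): T[][] {
  const size = Math.max(1, Math.ceil(items.length / chunkCount));
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Browser side: the map step emits (word, 1) for every word in its chunk.
function mapChunk(lines: string[]): Pair[] {
  const pairs: Pair[] = [];
  for (const line of lines) {
    for (const word of line.toLowerCase().split(/\s+/)) {
      if (word.length > 0) pairs.push([word, 1]);
    }
  }
  return pairs;
}

// Master side: the reduce step sums the counts for each word.
function reducePairs(allPairs: Pair[][]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const pairs of allPairs) {
    for (const [word, count] of pairs) {
      totals.set(word, (totals.get(word) || 0) + count);
    }
  }
  return totals;
}

// Simulation: each chunk is mapped as if by a separate browser, then reduced.
function wordCount(lines: string[], browserCount: number): Map<string, number> {
  const chunks = splitIntoChunks(lines, browserCount);
  return reducePairs(chunks.map(mapChunk));
}
```

In a real deployment `mapChunk` would run inside the browser and its output would travel back to the master over HTTP; only the split and reduce steps would stay on the server.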
  8. Problems to be solved
      • Implementing map-reduce algorithms on the server and the client.
      • Estimating the problem size and deciding on division strategies, plus optimization.
      • Selection algorithms for picking the most stable clients.
      • Synchronization algorithms for critical-section problems.
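One possible shape for the client-selection problem is sketched below. The proposal does not specify a selection algorithm, so the scoring heuristic (completion rate weighted by the logarithm of connection time) and every name in it (`ClientStats`, `stabilityScore`, `pickMostStable`) are assumptions made purely for illustration.

```typescript
// Hypothetical bookkeeping the master might keep per connected browser.
interface ClientStats {
  id: string;
  connectedMs: number;    // how long the browser has stayed connected
  tasksCompleted: number; // results returned successfully
  tasksLost: number;      // tasks handed out but never returned
}

// Assumed heuristic: completion rate, weighted by log of minutes connected.
// A client with no history gets a neutral completion rate of 0.5.
function stabilityScore(c: ClientStats): number {
  const handedOut = c.tasksCompleted + c.tasksLost;
  const completionRate = handedOut === 0 ? 0.5 : c.tasksCompleted / handedOut;
  return completionRate * Math.log(1 + c.connectedMs / 60000);
}

// Return the k highest-scoring clients without mutating the input array.
function pickMostStable(clients: ClientStats[], k: number): ClientStats[] {
  return clients
    .slice()
    .sort((a, b) => stabilityScore(b) - stabilityScore(a))
    .slice(0, k);
}
```

The logarithm keeps very long sessions from dominating the score; any other monotonic weighting would fit the same structure.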
  9. Problems to be solved (continued)
      • For problems that cannot be tackled by divide and conquer, parallelizing the serial code.
      • Algorithms for error recovery, node failure, and new node addition.
      • And many more.
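Node failure and node addition can be handled together with a lease-based task queue, sketched here under stated assumptions. The proposal names the problem but not a mechanism; leases are one common approach, and `Task`, `TaskQueue`, and the `leaseMs` parameter are hypothetical. A browser that closes mid-task never returns its result, so its lease expires and the task is handed to whichever node asks next.

```typescript
// Hypothetical task record; the lease is the deadline by which a result is due.
interface Task {
  id: number;
  assignedTo: string | null;
  leaseExpiresAt: number; // ms timestamp, meaningful only while assigned
  done: boolean;
}

class TaskQueue {
  private tasks: Task[] = [];
  private leaseMs: number;

  constructor(taskCount: number, leaseMs: number) {
    this.leaseMs = leaseMs;
    for (let id = 0; id < taskCount; id++) {
      this.tasks.push({ id, assignedTo: null, leaseExpiresAt: 0, done: false });
    }
  }

  // Hand the next available task to a node. New nodes join simply by calling
  // this. A task whose lease has expired is treated as available again,
  // because its previous node is presumed to have left.
  assign(nodeId: string, now: number): number | null {
    for (const t of this.tasks) {
      const available = t.assignedTo === null || t.leaseExpiresAt <= now;
      if (!t.done && available) {
        t.assignedTo = nodeId;
        t.leaseExpiresAt = now + this.leaseMs;
        return t.id;
      }
    }
    return null; // nothing to hand out right now
  }

  // Accept a result only from the node that currently holds the task.
  complete(taskId: number, nodeId: string): boolean {
    const t = this.tasks[taskId];
    if (!t || t.done || t.assignedTo !== nodeId) return false;
    t.done = true;
    return true;
  }

  allDone(): boolean {
    return this.tasks.every((t) => t.done);
  }
}
```

Rejecting late results from a node whose task was reassigned is a design choice that keeps the bookkeeping simple; accepting the first result from any node would waste less work at the cost of a looser ownership rule.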
  10. Technology
      • Any server technology that follows the web model (e.g. J2EE).
      • Client-side JavaScript, or ActiveX plugins (add-ons).
      • The Comet paradigm for asynchronous communication between server and client.
  11. Technology Hurdles
      • Current web models are not mature enough to handle such requirements; this will require a new way of looking at web apps (asynchrony and push technology). No known design patterns exist.
      • The main client-side resource is JavaScript, which is limited in its capabilities. An efficient JavaScript library has to be developed.
  12. Prospects
      • With the semantic web and large-scale information mining becoming a reality, this technology can give an edge to companies and researchers alike, who can break processor-intensive computations into small pieces and delegate them to clients.
      • Companies do not need to invest in more hardware.
      • A new programming paradigm in which the browser does the server's work.
  13. Examples to learn from
      • Yahoo and Apache's Hadoop, which runs map-reduce on distributed servers.
      • Amazon's EC2 computational facility.
      • The large body of material on grid and cluster computing.