Using Grids to support Information Filtering Systems

Presentation given at the ICEIS 2009 Conference.

  1. Using Grids to support Information Filtering Systems: Running Collaborative Filtering Recommendations on gLite middleware. Leandro N. Ciuffo (leandro.ciuffo@ct.infn.it), INFN-Catania, Italy
  2. Recommender system settings
  3. http://canalcinefilia.com.br: a MovieLens-like recommender system (http://movielens.umn.edu). Explicit data collection (users need to rate > 20 movies)
  4. No dedicated servers / standard technologies. Hosted by a commercial Web hosting service (diagram: www, MySQL)
  5. Data set: 225 out of 390 users are able to get movie recommendations; 921 movies; 33,147 ratings
  6. Rating matrix: 225 users × 921 movies, ~207,000 "cells"
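
The matrix size on the slide can be checked with a few lines of arithmetic. The density figure below is only a rough sketch: it assumes that all 33,147 ratings belong to the 225 users who receive recommendations, which the slides do not state, so it is best read as an upper bound.

```python
# Figures quoted on the "Data set" and "Rating Matrix" slides.
users = 225
movies = 921
ratings = 33_147  # assumption: all of these belong to the 225 eligible users

cells = users * movies       # total user/movie pairs in the rating matrix
density = ratings / cells    # fraction of cells that hold a rating

print(f"cells   = {cells:,}")
print(f"density = {density:.1%}")
```

Under that assumption roughly one cell in six is filled, which is why the prediction step has so many empty cells to estimate.
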
  7. Users' similarity (Pearson r correlation): the complexity of maintaining a similarity matrix with the Pearson correlation between every pair of users is O(m²n)
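
The O(m²n) cost is easier to see in code. The sketch below assumes m is the number of users and n the number of movies, and that similarity is computed only over the movies both users have rated. The function names and the handling of degenerate cases (returning 0.0) are illustrative choices, not taken from the slides.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson r between two users over the movies both have rated.

    Each argument maps movie id -> rating. Returns 0.0 when fewer than two
    movies are shared or when one user gave every shared movie the same
    rating (an assumed convention for the undefined cases).
    """
    common = ratings_a.keys() & ratings_b.keys()
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(ratings_a[m] for m in common) / n
    mean_b = sum(ratings_b[m] for m in common) / n
    cov = sum((ratings_a[m] - mean_a) * (ratings_b[m] - mean_b) for m in common)
    var_a = sum((ratings_a[m] - mean_a) ** 2 for m in common)
    var_b = sum((ratings_b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def similarity_matrix(all_ratings):
    """Pearson r for every unordered pair of users.

    There are m(m-1)/2 pairs and each comparison touches up to n movies,
    which gives the O(m^2 n) total quoted on the slide.
    """
    users = sorted(all_ratings)
    sims = {}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            sims[(u, v)] = pearson(all_ratings[u], all_ratings[v])
    return sims


# Toy data: three users, three movies (ids and ratings are made up).
toy = {
    "ana": {1: 5, 2: 3, 3: 4},
    "bia": {1: 4, 2: 2, 3: 3},
    "caio": {1: 1, 2: 5, 3: 2},
}
sims = similarity_matrix(toy)
```

Because every pair is independent of the others, the outer loop is the natural place to split the work across Grid jobs.
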
  8. Neighborhood: similarity threshold r > 0.3. Pearson correlation ranges over -1 ≤ r ≤ 1 (diagram values: 0.6, 0.5, 0.7, 0.4, 1, 0.3, 0.8, 0.9)
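
A minimal sketch of the neighborhood selection, assuming similarities are stored once per unordered pair of users. Only the 0.3 threshold comes from the slide; the data layout and names are hypothetical.

```python
SIMILARITY_THRESHOLD = 0.3  # from the slide: neighbors need r > 0.3


def neighborhood(user, similarities, threshold=SIMILARITY_THRESHOLD):
    """Return {neighbor: r} for users whose similarity to `user` exceeds threshold.

    `similarities` maps an unordered pair (u, v) to Pearson r.
    """
    neighbors = {}
    for (u, v), r in similarities.items():
        if r <= threshold:
            continue  # strict inequality: r == threshold does not qualify
        if u == user:
            neighbors[v] = r
        elif v == user:
            neighbors[u] = r
    return neighbors


# Made-up similarities for illustration.
sims = {
    ("ana", "bia"): 0.9,
    ("ana", "caio"): 0.3,
    ("ana", "davi"): -0.2,
    ("bia", "caio"): 0.6,
}
ana_neighbors = neighborhood("ana", sims)
```

Here only `bia` ends up in the neighborhood of `ana`, since `caio` sits exactly on the threshold and `davi` is negatively correlated.
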
  9. Generating recommendations: will this user enjoy the movie “Yes Man”? The prediction is generated as a weighted mean of the ratings given by the user's neighborhood (diagram weights: 0.9, 0.8, 0.6, 0.4, 0.3). A movie must be rated by at least 8 neighbors. This repeats for every movie not rated
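
The prediction step can be sketched as a similarity-weighted mean. The slide only says "weighted mean", so the plain form used here (without mean-centering) is an assumption, as are the function name and data layout. The minimum of 8 neighbors is from the slide.

```python
MIN_NEIGHBORS = 8  # from the slide: a movie must be rated by at least 8 neighbors


def predict(movie, neighbors, ratings, min_neighbors=MIN_NEIGHBORS):
    """Similarity-weighted mean of the neighbors' ratings for `movie`.

    neighbors: {neighbor: r}, with r > 0 (already filtered by the threshold)
    ratings:   {user: {movie: rating}}
    Returns None when fewer than `min_neighbors` neighbors rated the movie.
    """
    rated = [
        (r, ratings[n][movie])
        for n, r in neighbors.items()
        if movie in ratings.get(n, {})
    ]
    if len(rated) < min_neighbors:
        return None
    return sum(r * x for r, x in rated) / sum(r for r, _ in rated)


# Toy data using the five weights shown on the slide; ratings are made up.
neighbors = {"n1": 0.9, "n2": 0.8, "n3": 0.6, "n4": 0.4, "n5": 0.3}
ratings = {
    "n1": {"Yes Man": 5},
    "n2": {"Yes Man": 4},
    "n3": {"Yes Man": 4},
    "n4": {"Yes Man": 3},
    "n5": {"Yes Man": 2},
}
# The minimum is lowered to 5 only so the toy example produces a value.
toy_prediction = predict("Yes Man", neighbors, ratings, min_neighbors=5)
strict_prediction = predict("Yes Man", neighbors, ratings)  # only 5 raters, so None
```

Neighbors with higher similarity pull the prediction toward their own rating, and movies with too few raters are skipped rather than guessed.
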
  10. www.canalcinefilia.com.br predictions: 65,388 predictions
  11. Grid Computing
  12. Computationally intensive research
  13. Computation-intensive applications / experiments
  14. EGEE Grid (www.eu-egee.org). EGEE numbers: >260 sites; 54 countries; ~114,000 CPUs; >20 petabytes; >16,000 users; >200 VOs; >150,000 jobs/day
  15. The global network coverage
  16. The global Grid coverage: SEE-Grid, DEISA, BalticGrid, TeraGrid, EGEE, EUChinaGrid, OSG, NAREGI, EUMedGrid, EUIndiaGrid, EUAsiaGrid, EELA
  17. Implementation on the gLite-based Grid
  18. GILDA (https://gilda.ct.infn.it): Grid INFN Laboratory for Dissemination Activities; Grid test-bed for training; a "standard" t-Infrastructure adopted by many projects; users can practice before running their codes on the production e-Infrastructures; ~11 sites, 285 CPUs (the number of sites may change over time, since they are managed on a “best effort” basis)
  19. EELA (www.eu-eela.eu): e-science grid facility for Europe and Latin America; co-funded by the EC (FP7); ~5800 CPUs
  20. EELA (www.eu-eela.eu)
  21. Diagram: www, MySQL
  22. Diagram: .JDL, LFC, Grid UI
  23. Diagram: SE, SE
  24. Input sandbox: mdclient.config, Start.sh, Recommender.class (diagram: Grid UI)
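
The slides show a .JDL file and the three input sandbox files but not the job description itself. The sketch below generates one plausible JDL per job using standard JDL attributes (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`). Passing a user range as arguments to `Start.sh`, the output file names, and the way users are split across jobs are all assumptions made for illustration.

```python
def split_users(n_users, n_jobs):
    """Split user indexes 0..n_users-1 into contiguous, near-equal ranges."""
    base, extra = divmod(n_users, n_jobs)
    ranges, start = [], 0
    for job in range(n_jobs):
        size = base + (1 if job < extra else 0)
        if size:
            ranges.append((start, start + size - 1))
        start += size
    return ranges


def render_jdl(first_user, last_user):
    """Render a JDL job description for one slice of users (hypothetical layout)."""
    # Input sandbox file names are the ones listed on the slide.
    input_files = ["Start.sh", "mdclient.config", "Recommender.class"]
    # Output file names are assumed; the slides do not name them.
    output_files = ["std.out", "std.err"]

    def quoted(names):
        return ", ".join(f'"{name}"' for name in names)

    lines = [
        'Executable = "Start.sh";',
        # Assumption: Start.sh receives the first and last user index to process.
        f'Arguments = "{first_user} {last_user}";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {{{quoted(input_files)}}};",
        f"OutputSandbox = {{{quoted(output_files)}}};",
    ]
    return "\n".join(lines) + "\n"


# One job description per slice of the 225 users, here split four ways.
jobs = [render_jdl(first, last) for first, last in split_users(225, 4)]
print(jobs[0])
```

Each rendered description would be written to its own .JDL file on the Grid UI and submitted separately, so that the WMS can dispatch the slices to different computing elements.
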
  25. Diagram: WMS, Grid UI
  26. Diagram: SE, SE; WN, WN, WN; CE 1, CE 2, CE n
  27. Output - Version I
  28. Diagram: .SQL files (×9); WN, WN, WN; CE 1, CE 2, CE n
  29. Diagram: Grid UI, WMS; .SQL, .SQL, .SQL; Output sandbox
  30. Diagram: www, MySQL
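
In Version I the worker nodes appear to return .SQL files through the output sandbox, which are then loaded into the site's MySQL database. A minimal sketch of how a worker might render its predictions as SQL follows. The table name, column names, and two-decimal rounding are hypothetical, since the slides do not show the schema.

```python
def to_sql(predictions, table="predictions"):
    """Render (user_id, movie_id, score) tuples as one multi-row INSERT.

    Values are cast to int/float before formatting so that only numeric
    literals end up in the generated statement.
    """
    if not predictions:
        return ""
    rows = ",\n".join(
        f"  ({int(user)}, {int(movie)}, {float(score):.2f})"
        for user, movie, score in predictions
    )
    return f"INSERT INTO {table} (user_id, movie_id, score) VALUES\n{rows};\n"


# Made-up predictions for illustration.
sql = to_sql([(1, 10, 3.967), (1, 11, 4.5)])
print(sql)
```

Batching many rows into one statement keeps each .SQL file small and lets the web host load a whole job's output in a single round trip.
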
  31. Output - Version II
  32. Diagram: AMGA; WN, WN, WN; CE 1, CE 2, CE n
  33. Diagram: www, MySQL
  34. Implementation on OurGrid
  35. OurGrid (www.ourgrid.org): opportunistic Grid; job submissions can be handled by a web portal
  36. What's next?
  37. Future work: run experiments using the Netflix Prize database; create a new version using Amazon EC2; provide performance comparisons among EELA (gLite), OurGrid, and Amazon EC2. I'm looking for partners
  38. The End. leandro.ciuffo@ct.infn.it; http://www.canalcinefilia.com.br; http://canalcinefilia.com.br/en/credits/about.php; http://applications.eu-eela.eu/application_details.php?ID=59