DIET_BLAST


Published on

Journée Calcul Intensif pour la biologie Lille 14/06/11.
http://www.lifl.fr/~touzet/calculintensif11.html

Speaker notes (translated from French):
  • By comparing two sequences, one of which has a known role, we hope, if they are similar, to find the role of the new sequence in a protein or a gene. The databases used are simply flat files that contain sequences along with their descriptions.
  • If the database is not already "installed" in DAGDA, the user passes it as a parameter; otherwise we use DAGDA's shared identifier system (aliases on the data, which avoid using a uuid identifier that is impractical to share/exchange). Note: splitting the input file and merging the results have a negligible cost compared with the cost of running the BLAST jobs.
  • When a job is sent to a node that does not have the data, the node downloads it into a cache, on which LRU is used to select the data item to delete when space is needed. Until the data item is actually deleted, it remains available to the other nodes (it is therefore a replica).
  • Three times the best result for the scheduling where the data is present, regardless of how the data was replicated (random or least loaded)… Note that when there is no replication, the best response time is obtained by executing the job where it was submitted…
  • 5 databases with sizes from 150 MB to 5 GB. 5 algorithms: blastn, blastp, blastx, tblastn and tblastx (DNA vs DNA, protein vs protein, DNA vs protein, etc.). The longest is tblastx (about 20 times longer than blastn). The initial peak in the SRA graph: the submitted job sets are small and the replications finish after the last job has been submitted, so the average job time is high since the jobs are all scheduled on the same machines. The waiting time decreases as the replications complete, the last submitted jobs benefiting fully from them. MCT sends each job to the node that will be the fastest to execute it at submission time: it often copies the databases to the fastest site, even if that means deleting a very large database that will then take long to retransmit, which it will do anyway for the longest jobs (tblastx). For the fastest jobs (blastn), it favors the nodes that already have the database, even if they are slow and even if they are very numerous…
  • Optorsim
  • Explicit: the user explicitly decides to replicate the data. Implicit: calls to the services trigger the data replications. Unlike DTM, the data are replicated, not moved. Direct access to stored data + direct addition of a data item into DIET. Automatic data management: when we want to install a data item on a node that no longer has enough space, a data item is deleted using an algorithm chosen in the node's configuration. Transfer optimization: the "best" source for a data item is chosen based on statistics gathered during previous transfers. Storage usage management: we can choose how much memory and disk space are reserved for the data managed by DAGDA. Data backup/restoration: we can record the current state of the data distributions and restore the situation when DIET restarts. (For example, a reservation ends and we want to continue an experiment later; on restart, the data are put back as they were before the interruption.)
  • Unlike DTM, it is the SeD that downloads the data, not the client that pushes them "authoritatively". Only the data descriptions (type, size, etc.) are sent with the requests. If a maximum size has been configured for the messages sent by DAGDA, data items that are too large are sent in several parts. This also limits the amount of memory needed for the transfers; DTM loads everything into memory before sending the data.
  • The DAGDA "core" manages data identification and lookup as well as the choice of sources/destinations for the transfers. The extended DAGDA components manage the resource limits set by the users and the backup/restoration of data. The API makes it possible to access/add data directly in the platform and to launch replications.
  • A request is a set of sequences to "BLAST" against a given database. A sub-request is a subset of those sequences to BLAST against the same database.
  • Use of plugin schedulers
  • Maximum division: if the initial request file contains n sequences, we create n request files, each containing a single sequence. Division into n sub-requests: with n available nodes, we create n sub-requests of identical size, so each node has only one request to process. With Random, MCT & Round-Robin, the multiplication of requests causes overhead that the scheduling does not compensate; the best option remains splitting the requests into as many parts as there are available nodes. With SRA, the more requests there are, the more reliable the frequencies, and thus the more efficient the algorithm: the overhead is compensated by the scheduling. Overall, Dynamic-SRA is better, even when splitting the request into n parts, provided there are enough nodes (here 300 SeDs): with 300 files, the frequencies are roughly acceptable. With fewer nodes, hence fewer requests, the frequencies become more and more approximate, and since we optimize the platform throughput, dynamic SRA becomes less and less good.
  • The algorithms have different complexities: BLASTN is the fastest (DNA, a 4-letter alphabet, vs DNA). BLASTP: protein, a 20-letter alphabet, vs proteins. BLASTX: DNA translated into protein vs proteins (DNA translation + BLASTP). TBLASTX: the longest, DNA translated into protein vs a DNA database translated into proteins (translation of all the sequences, then BLASTP). Overall, the frequency changes do not have much influence on MCT.
Transcript:

1. DIET_BLAST: software architecture and small research problems. Frédéric Desprez, LIP ENS Lyon / INRIA Grenoble Rhône-Alpes, EPI GRAAL/Avalon, 14/06/11
2. Agenda
- Introduction
- One target application: BLAST using large databases over the grid
- One problem: joint scheduling and replication
  - PhD thesis of Antoine Vernois (with C. Blanchet and P. Vicat-Blanc Primet)
  - PhD thesis of Gaël Le Mahec (with V. Breton)
- One middleware: DIET/DAGDA
- Conclusion and future work
- Décrypthon
3. Introduction
- Several applications are ready (and not only number-crunching ones!)
- Huge data sets are becoming available around the world
- Data management is one of the most important issues of today's applications
- Replication has to be used to improve platform throughput
- Services for resource management and data management/replication are available in most grid middleware…
- … but they usually work separately
- Our approach
  - Put data management into the resource-management sphere
  - Perform request scheduling together with data replication, based on information provided by the application
- Application to a large-scale bioinformatics application
4. One target application: BLAST over the grid
- Basic Local Alignment Search Tool (BLAST)
  - Finds homologies between nucleotide or amino acid sequences
- Objectives
  - Find clues about the function of a protein/gene by comparing a newly discovered sequence to well-known ones
- Biological databases
  - BLAST takes biological databases containing large sets of annotated sequences as an entry parameter
- Usage
  - Biologists generally perform BLAST searches on large sets of sequences

  … A T C A A G T C …
    |   | |   | | |
  … A C C A - G T C …
5. One target application: BLAST over the grid
- Each sequence of the entry set can be treated independently
  - A simple parallelization: perform the search for each sequence on a different node
  - And an efficient one: a sequence weighs at most a few kilobytes and a search on it takes at most a few minutes
- Requirements to "gridify" BLAST applications
  - A way to submit and distribute the requests
  - A way to replicate databases
- Then software to perform
  - the splitting of the sequence set before the computation
  - the merging of the results on exit (see the sketch below)
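To make the split/merge step concrete, here is a minimal sketch of splitting a multi-FASTA entry set into coarse-grain sub-request files and merging the per-node outputs afterwards. It is an illustration, not the DIET_BLAST code: the file naming scheme and the plain concatenation of BLAST outputs are assumptions.

```python
# Sketch of the split/merge steps around a gridified BLAST run.
# Not the DIET_BLAST implementation; file layout and names are assumptions.

def split_fasta(path):
    """Yield (header, sequence) pairs from a multi-FASTA file."""
    header, seq = None, []
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line, []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)

def write_subrequests(path, n_parts):
    """Split the entry set into n_parts sub-request files (coarse grain)."""
    records = list(split_fasta(path))
    chunk = max(1, -(-len(records) // n_parts))  # ceiling division
    files = []
    for i in range(0, len(records), chunk):
        name = f"subrequest_{i // chunk}.fasta"
        with open(name, "w") as out:
            for header, seq in records[i:i + chunk]:
                out.write(f"{header}\n{seq}\n")
        files.append(name)
    return files

def merge_results(result_files, merged_path):
    """Concatenate per-node BLAST outputs into a single result file."""
    with open(merged_path, "w") as out:
        for name in result_files:
            with open(name) as part:
                out.write(part.read())
```

With maximum division (small grain), `n_parts` is simply the number of sequences, so each sub-request holds exactly one sequence.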
6. One target application: BLAST over the grid
- Grid-BLAST execution
  - The user submits a large set of sequences and a database (or a database identifier)
  - The entry set is divided
  - The database is replicated on the platform using a data manager
  - The resource broker returns a set of computing resources
  - Each sequence is treated on a different server
  - The results are merged into a single result file
7. Parallelization and distribution of bioinformatics requests over the Grid
- Context
  - Large sets of requests are submitted by many users of the grid
  - Large databases are used as parameters of the requests
  - The computing nodes of the grid have different computing and storage capacities
  - Several different BLAST applications, depending on users' needs
- The jobs have to be scheduled on-line (the scheduling is made job by job)
  - We want to optimize resource usage:
    - No useless replication
    - Platform throughput optimization
Where to replicate the databases? How to distribute the requests?
8. Related work: job scheduling & data replication
- In "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications", Ranganathan and Foster analyzed the performance of various combinations of job scheduling and data replication algorithms.
- They evaluated four job scheduling strategies (restated in the sketch below):
  - JobRandom: the job is sent to a random site
  - JobLeastLoaded: the job is sent to the node with the fewest waiting jobs
  - JobDataPresent: the job is sent to the least loaded site on which the data is present
  - JobLocal: the job is run locally
- And three replication strategies:
  - DataDoNothing: the data are not replicated; data may be downloaded from a remote site into a cache using the LRU replacement algorithm
  - DataRandom: the most "popular" data are replicated on a random site
  - DataLeastLoaded: the most "popular" data are replicated on the least loaded site in the neighborhood of the node
Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications, Ranganathan, K., Foster, I., HPDC '02, Washington, DC, USA, IEEE, 2002.
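Each strategy is just a different node-selection rule. The sketch below restates the four job-scheduling strategies over a toy node model; the Node fields and the fallback in JobDataPresent are assumptions of this sketch, not details of Ranganathan and Foster's simulator.

```python
import random

class Node:
    """Toy node model (assumed fields, for illustration only)."""
    def __init__(self, name, queue_len, datasets, site):
        self.name = name            # node identifier
        self.queue_len = queue_len  # number of waiting jobs
        self.datasets = datasets    # set of database names stored locally
        self.site = site            # site the node belongs to

def job_random(nodes, db, origin):
    return random.choice(nodes)

def job_least_loaded(nodes, db, origin):
    return min(nodes, key=lambda n: n.queue_len)

def job_data_present(nodes, db, origin):
    # Least loaded among the nodes that already hold the database;
    # fall back to least loaded overall if nobody has it yet.
    holders = [n for n in nodes if db in n.datasets]
    return job_least_loaded(holders or nodes, db, origin)

def job_local(nodes, db, origin):
    return origin  # run where the job was submitted
```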
9. Related work: job scheduling & data replication
- Across the different combinations, they measured:
  - Best average response time when:
    - the job is scheduled where the data is
    - data is replicated (even randomly)
  - Least data transferred on average when:
    - the job is scheduled where the data is
    - data is replicated
  - Best average idle time when:
    - the job is scheduled where the data is
    - data is replicated
- Scheduling should take the data distribution into account.
- It is not always necessary to couple data movement and computation.
10. Related work: integration of scheduling & replication
- In "Integration of Scheduling and Replication in Data Grids", Chakrabarti et al. present the Integrated Replication and Scheduling Strategy (IRS), which couples scheduling and replication strategies.
  - It improves performance by working alternately on the data mapping and the task mapping.
- Algorithm principle (restated in the sketch below)
  - Job scheduling can use two approaches:
    - One based on data availability: the job is sent to the node that holds most of the needed data.
    - One based on scheduling cost (time to transfer the data, job waiting time, and job execution time): the job is sent to the node with the smallest cost.
  - After each job submission, a data usage matrix is updated.
  - The replication algorithm has two phases:
    - Estimation of the maximum number of replications for a data item.
    - The data replication itself, using the data usage matrix.
Integration of Scheduling and Replication in Data Grids, Chakrabarti, A., Dheepak, R.A., Sengupta, S., HiPC 2004, Lecture Notes in Computer Science, 2004.
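A hedged restatement of the two scheduling rules and the usage-matrix update, reusing the toy Node model from the previous sketch and assuming each job carries a set of input data ids (`job.inputs`); the `cost()` estimator stands in for the paper's exact model, which is not reproduced here.

```python
def schedule_by_availability(job, nodes):
    """IRS rule 1: send the job to the node holding most of its input data.
    job.inputs and node.datasets are assumed to be sets of data ids."""
    return max(nodes, key=lambda n: len(job.inputs & n.datasets))

def schedule_by_cost(job, nodes, cost):
    """IRS rule 2: send the job where transfer + wait + execution time
    is smallest; cost(job, node) is an assumed estimator."""
    return min(nodes, key=lambda n: cost(job, n))

def record_usage(usage, job, site):
    """After each submission, update the data usage matrix that the
    two-phase replication step later reads to decide what to copy where."""
    for data_id in job.inputs:
        usage[(data_id, site)] = usage.get((data_id, site), 0) + 1
```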
11. Parallelization and distribution of bioinformatics requests over the Grid
- Obtaining dynamic information about node status is a difficult task involving complex grid services
- The information obtained can be several minutes old
  - For jobs with a long execution time: not really a problem
  - For jobs that take a few minutes: the information is outdated
- In this context, without more information there is little we can do… Using the classical Minimum Completion Time (MCT) strategy is a good way to schedule the jobs (a sketch follows this slide), but…
- With a little more information we can do better
  - Analysis of the execution traces of bioinformatics clusters showed that the way biologists submit jobs is homogeneous if the observed time interval is long enough. We will use this information to optimize grid resource usage.
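For reference, MCT in this setting reduces to estimating, for each node, when the job would finish there, counting a database transfer if the node lacks the data, and taking the minimum. A minimal sketch, with the cost models `est_run` and `est_transfer` as assumed inputs:

```python
from dataclasses import dataclass, field

@dataclass
class SeD:
    """Toy server model (assumed fields, for illustration only)."""
    name: str
    ready_time: float                     # when the node's queue drains (s)
    datasets: set = field(default_factory=set)

@dataclass
class Job:
    db: str                               # database the request runs against
    algo: str                             # blastn, blastp, ...

def mct_schedule(job, nodes, est_run, est_transfer):
    """Minimum Completion Time: send the job to the node with the earliest
    estimated finish time *at submission time*, counting a database transfer
    when the node does not already hold the data."""
    def completion(node):
        transfer = 0.0 if job.db in node.datasets else est_transfer(job, node)
        return node.ready_time + transfer + est_run(job, node)
    return min(nodes, key=completion)
```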
12. Parallelization and distribution of bioinformatics requests over the Grid
- Problem modeling
- Scheduling and Replication Algorithm (SRA)
  - Relies on the integer approximation of a linear program (sketched below)
  - Gives the data distribution and the job scheduling
  - Takes into account the specificities of the biologists' submissions
A. Vernois
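The slides do not spell the program out; as a hedged reconstruction of its general shape (in the spirit of the Vernois et al. work), one maximizes the served fraction of the steady-state request stream, with relaxed placement variables that are then rounded to integers:

```latex
% Hedged sketch of a joint replication/scheduling linear program.
% x_{ijk}: rate of requests (database j, algorithm k) served on node i
% \delta_{ij}: database j stored on node i (0/1, relaxed then rounded)
\begin{align*}
\max\ & \rho\\
\text{s.t.}\quad
  & \sum_i x_{ijk} \ge \rho\, f_{jk}
      && \text{serve a fraction } \rho \text{ of each request type } (j,k)\\
  & \sum_{j,k} t_{ijk}\, x_{ijk} \le 1
      && \text{compute capacity of node } i\\
  & x_{ijk} \le M\, \delta_{ij}
      && \text{a node serves only databases it stores}\\
  & \sum_j s_j\, \delta_{ij} \le D_i
      && \text{disk capacity of node } i
\end{align*}
```

Here f_{jk} is the measured submission frequency of request type (database j, algorithm k), t_{ijk} the execution time of such a request on node i per unit of time, s_j the size of database j, D_i the storage available on node i, and M a large constant. Solving the relaxation and rounding the placement variables gives both the replica placement and the share of each request type to send to each node.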
13. Scheduling and Replication Algorithm
- SRA gives good results when the request frequencies are constant
  - The data distribution and the scheduling are efficient
  - It does not need dynamic information about the grid
A. Vernois
14. Scheduling and Replication Algorithm
- Simulation experiments
  - Grid'5000 platform
  - 2540 heterogeneous CPUs on 9 sites
  - 5 databases of different sizes
  - 5 algorithms of different complexities
  - 1 request every 0.3 s
  - The data are randomly distributed before the start
- For small sets of requests
  - SRA does not have enough time to replicate the data
- For large sets of requests
  - SRA's replications and job scheduling prevent the computing nodes from saturating
A. Vernois
15. Using MCT (figure). A. Vernois
16. Using SRA (figure). A. Vernois
17. SRA with frequency variation detection
- On the grid, the number and variety of users can make the time needed to reach constant frequencies very long. Some "events" can temporarily modify the frequencies:
  - Data challenges
  - An important conference submission deadline
  - Holidays and week-ends
  - …
- By detecting such a frequency variation, we can correct the data distribution and the job scheduling
G. Le Mahec
18. SRA with frequency variation detection
- Algorithm principle (a detection sketch follows this slide)
  - SRA gives an initial data distribution
  - The Resource Broker records the submissions and updates the frequencies
  - If a frequency varies beyond a threshold:
    - a new data distribution is computed using SRA
    - the transfers start asynchronously if possible; otherwise, the job scheduling will cause the data transfers
  - Tasks are scheduled using the job distribution given by SRA
G. Le Mahec
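A minimal sketch of the detection step, with the sliding-window length and the 30% relative threshold as illustrative parameters rather than values from the talk:

```python
from collections import Counter

class FrequencyMonitor:
    """Track request frequencies over a sliding window and flag
    variations beyond a relative threshold (illustrative values)."""
    def __init__(self, window=1000, threshold=0.3):
        self.window = window
        self.threshold = threshold
        self.recent = []           # last `window` (db, algo) pairs
        self.reference = None      # frequencies SRA last planned for

    def record(self, db, algo):
        self.recent.append((db, algo))
        if len(self.recent) > self.window:
            self.recent.pop(0)

    def frequencies(self):
        counts = Counter(self.recent)
        total = len(self.recent)
        return {key: c / total for key, c in counts.items()}

    def drifted(self):
        """True if some request type's share moved beyond the threshold,
        i.e. SRA should recompute the data distribution."""
        current = self.frequencies()
        if self.reference is None:
            self.reference = current
            return False
        for key in set(current) | set(self.reference):
            old = self.reference.get(key, 0.0)
            new = current.get(key, 0.0)
            if abs(new - old) > self.threshold * max(old, 1e-9):
                self.reference = current  # replan against the new mix
                return True
        return False
```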
19. SRA with frequency variation detection
- The frequencies vary and we do not redistribute the data:
  - The scheduling is no longer efficient for large sets of jobs.
  - The average execution time of the jobs increases with the number of jobs submitted.
G. Le Mahec
20. SRA with frequency variation detection
- The frequencies vary, we detect the variations and redistribute the data:
  - The scheduling efficiency is preserved.
  - The average execution time of the jobs increases only moderately.
G. Le Mahec
21. Implementation within a middleware: DIET
- GridRPC API
- Network Enabled Servers paradigm
- Some features
  - Distributed (hierarchical) scheduling
  - Plugin schedulers
  - Data management
  - Workflow management
  - Sequential and parallel tasks (through batch schedulers)
  - Clusters, grids, clouds
  - Open source
  - …
http://graal.ens-lyon.fr/DIET/
22. Data/replica management
- Two needs
  - Keep the data in place to reduce the overhead of communications between clients and servers
  - Replicate data whenever possible
- Three approaches for DIET
  - DTM (LIFC, Besançon)
    - Hierarchy similar to DIET's
    - Distributed data manager
    - Redistribution between servers
  - JuxMem (Paris, Rennes)
    - P2P data cache
  - DAGDA (IN2P3, Clermont-Ferrand, and now Univ. of Picardie)
    - Replication
    - Joins task scheduling and data management
- Work done within the GridRPC Working Group (OGF)
  - Relations with workflow management
23. DAGDA
- Data Arrangement for Grid and Distributed Applications
- A data manager for the DIET middleware providing:
  - Explicit data replication: using the API
  - Implicit data replication: data items are replicated on the selected servers
  - Direct data get/put through the API
  - Automatic data management, using a selected data replacement algorithm when necessary (sketched below):
    - LRU: the least recently used data item is deleted
    - LFU: the least frequently used data item is deleted
    - FIFO: the "oldest" data item is deleted
  - Transfer optimization by selecting the best source
    - using statistics on previous transfers
  - Storage resource usage management
    - The space reserved for the data can be configured by the "user"
  - Data status backup/restoration
    - Allows stopping and restarting DIET, saving the data status on each node
G. Le Mahec
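The three replacement policies differ only in which victim they pick when space runs out. A toy sketch of the victim selection (DAGDA itself also tracks data sizes against the configured disk and memory limits; this is not its C++ internals):

```python
import time

class ReplacementCache:
    """Toy cache illustrating DAGDA's three replacement choices.
    Victim selection only; sizes and space limits are omitted."""
    def __init__(self, policy="LRU"):
        self.policy = policy
        self.meta = {}   # data id -> dict(added, last_used, uses)

    def touch(self, data_id):
        """Record an access (or the arrival) of a data item."""
        now = time.time()
        entry = self.meta.setdefault(
            data_id, {"added": now, "last_used": now, "uses": 0})
        entry["last_used"] = now
        entry["uses"] += 1

    def victim(self):
        """Return the data id to evict, or None if the cache is empty."""
        if not self.meta:
            return None
        if self.policy == "LRU":    # least recently used
            key = lambda d: self.meta[d]["last_used"]
        elif self.policy == "LFU":  # least frequently used
            key = lambda d: self.meta[d]["uses"]
        else:                       # FIFO: oldest insertion first
            key = lambda d: self.meta[d]["added"]
        return min(self.meta, key=key)
```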
24. DAGDA
- Transfer model
  - Uses the pull model.
  - The data are sent independently of the service call.
  - The data can be sent in several parts (see the sketch below).
1: The client sends a request for a service
2: DIET selects some SeDs using a scheduling heuristic
3: The client sends its request to the SeD
4: The SeD downloads the data from the client and/or from other DIET servers
5: The SeD performs the call
6: The persistent data are updated
G. Le Mahec
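The pull model plus the "several parts" point can be read as: the SeD fetches each data item from its chosen source in bounded-size messages, so memory use stays at one chunk rather than the whole database. A hedged sketch with `read_chunk` as an assumed source accessor (in DIET this would go through the CORBA interface):

```python
def pull_data(read_chunk, dest_path, max_msg=32 * 1024 * 1024):
    """Download a data item in bounded-size parts (pull model).
    read_chunk(offset, size) -> bytes is an assumed accessor on the
    source node; max_msg caps each message, mirroring DAGDA's
    configurable maximum message size.  Returns the bytes received."""
    offset = 0
    with open(dest_path, "wb") as out:
        while True:
            part = read_chunk(offset, max_msg)
            if not part:          # empty read: transfer complete
                break
            out.write(part)
            offset += len(part)
    return offset
```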
25. DAGDA
- DAGDA architecture
  - Each data item is associated with a unique identifier
  - DAGDA controls the disk and memory space limits; if necessary, it uses a data replacement algorithm
  - The CORBA interface is used to communicate between the DAGDA nodes
  - Users can access data and perform replications through the API
G. Le Mahec
26. Putting everything together (figure). G. Le Mahec
27. Putting everything together
- Experiments over Grid'5000 using DIET and DAGDA
- 300 nodes (4 sites), 40000 sequences, 4 BLAST algorithms, 2 databases
- 2 different sequence splits
  - Small grain: one sequence after another
  - Coarse grain: splitting as a function of the number of available servers
- 4 scheduling algorithms
  - random, MCT, round-robin, dynamic SRA
G. Le Mahec
28. Putting everything together
- Maximum request partition vs. partition into n sub-requests (n the number of available nodes)
- For MCT, Random & Round-Robin: the best request partitioning uses the number of available nodes (coarse grain).
- For Dynamic-SRA: one sequence per request (small grain).
G. Le Mahec
29. Putting everything together
- Comparison of MCT and dynamic SRA over 1000 nodes (8 sites)
- 4 databases between 175 MB and 6.5 GB
- 4 algorithms
- 40000 sequences
- Frequency variation
G. Le Mahec
30. Putting everything together
Using MCT / using Dynamic-SRA (comparison figures)
- Requests 1 to 20000:
  - BLASTP vs DB 1: 30%
  - BLASTN vs DB 2: 30%
  - BLASTX vs DB 3: 20%
  - TBLASTX vs DB 4: 20%
- Requests 20001 to 40000:
  - BLASTP vs DB 1: 10%
  - BLASTP vs DB 3: 30%
  - BLASTN vs DB 2: 10%
  - BLASTN vs DB 4: 10%
  - BLASTX vs DB 1: 10%
  - BLASTX vs DB 3: 10%
  - TBLASTX vs DB 2: 20%
G. Le Mahec
31. Conclusion and future work about DIET_BLAST
- Data management is one of the most important issues of large-scale applications over the grid
- It has to be linked with request scheduling to get the best performance
- Static algorithms perform well even on a dynamic platform
- Future work
  - Provide a set of replication algorithms tuned for specific application classes within DAGDA
  - Increase the exchanges between resource and data management systems
  - Use other performance metrics (fairness)
  - Manage replication for workflow scheduling
32. A collaboration around three founding partners
- AFM
  - coordination of the call for projects towards the scientific community
  - funding of the research projects
- IBM
  - expertise in grid computing + life sciences
  - endowment of 6 universities with supercomputers (Shared University Research program)
  - access to the World Community Grid (WCG)
- CNRS
  - scientific steering of the program
  - scientific and technological expertise for porting the applications
  - management of the WCG jobs and results (A. Carbone's HCMD 1 and 2)
  - AFM's scientific strategy: curing neuromuscular diseases and rare diseases, most of which are of genetic origin
http://www.decrypthon.fr/
33. With the participation of other partners
- Steering of the grid resources by optimizing planning and execution
  - use of the DIET software developed by ENS (successor to United Devices)
  - follow-up of each project's programs
- Installation of supercomputers (based on Power G5)
- 500 Gflops of power / 473 Gflops already present in the universities
- The Réseau National de Télécommunications pour l'Enseignement et la Recherche (RENATER) connects all the resources
- ENS Lyon: GRAAL/Avalon team
34. Deployment example with universities
SeD = Server Daemon, installed on any server running LoadLeveler. Note that we can define rescue SeDs.
MA = Master Agent, coordinates jobs. We can define rescue or multiple Master Agents.
WN = worker node
http://www.decrypthon.fr/
(Deployment diagram: SeD + LoadLeveler at Orsay, Bordeaux, Lille, and Jussieu; project users go through a web interface; Orsay Decrypthon1/Decrypthon2 and CRIHAN with DB2; the DIET Décrypthon Master Agent; the AFM clinical database; a data manager interface.)
35. Philosophy of the Décrypthon grid
- Transparency
  - An interface as close as possible to the one users are used to, for submitting and monitoring jobs and downloading their results.
- Reactivity
  - New algorithms are ported to the grid as quickly as possible, to stay close to the practical use case.
36. Data management
Credits: H. N'Guyen, O. Poch, IGBMC
Décrypthon Grid: Grid Resources Dedicated to Neuromuscular Disorders, Bard, N., Bolze, R., Caron, E., Desprez, F., Heymann, M., Friedrich, A., Moulinier, L., Nguyen, N.-H., Poch, O., Toursel, T., 8th HealthGrid conference, Paris, France, June 2010.
37. Data management (cont'd)
38. SM2PH: a pilot project and a success story
- SM2PH: from Structural Mutation to Pathology Phenotypes in Human
  (Friedrich A. et al., Human Mutation, 2010 Feb; 31(2):127-35)
- Goal: estimate the structural impact of a mutation and correlate it with human genotype, phenotype, and pathology
- First step: introduce SStEISy (Sequence/STructure/Evolution Inference in SYstems) reasoning in the context of human monogenic diseases
- Infrastructure:
  - BIRD database (sequences, genomic data, alignments, 3D structures or models, pathology descriptions, analysis results…)
  - genotype/phenotype description (UMD, LOVD)
  - web server, interactive analysis interface
- Software and algorithms: scoring function for predicting mutation consequences in structural and variability contexts, correlation between mutation and phenotype severity, …
Credits: H. N'Guyen, O. Poch, IGBMC
39. SM2PH-db (http://decrypthon.igbmc.fr/sm2ph/)
- entry: 2,296 proteins involved in human monogenic diseases
- for each entry: a wide range of information involved in the genotype/phenotype relationship
  - Evolutionary view: multiple alignment of complete sequences
  - Structural view: 3D models (1,596 wild-type proteins, 10,245 mutant proteins)
  - Informational view:
    - structural and functional annotations (UniProt, Pfam, Prosite, InterPro, GO)
    - mutation and phenotypic data (24,962 missense mutations)
- Structural sampling, eukaryote filter
- Automatic update every 2 months; current version: 9
40. Complex interconnected programs
- On a single data item, up to 25 programs are applied, in cascade or in parallel.
- Runtime reduced from 15 days to 1 day on the Décrypthon platform.
Credits: H. N'Guyen, O. Poch, IGBMC
41. What's next?
- New platform based on IBM BlueGene/P
- Mixed grid/cloud approach using SysFera-DS
  - LoadLeveler on existing platforms
  - IBM Cloud software on new platforms
  - Extension to public clouds
  - Seamless access to resources
- Tight integration of recent developments from IGBMC around data management
- Work on the user interfaces
42. SysFera
- SysFera-DS: a complete software stack for HPC…
- … and simple, transparent access to cloud infrastructures
http://www.sysfera.fr/
- Inside the cloud
  - The DIET platform is virtualized inside the cloud (as a Xen image, for example)
  - Very flexible and scalable, as DIET nodes can be launched on demand
  - Dynamic adaptation to the load
- Cloud manager
  - EC2 interface
  - EC2 is treated as a new batch system
  - Automatic deployment of VMs with associated services
43. Questions?
