Advanced Data Mining and Integration Research for Europe (ADMIRE)

Presentation for the UK e-Science All Hands Meeting 2009 on 8 December 2009.

Published in: Sports, Technology
  • * This is not about projects, publications
    * Where did we suddenly appear from
  • * One of the papers that is signposting
  • * Sensors, large machines, interaction with data (software), interaction between people, interaction of software on data, ...
  • * More explicit forms of demands
  • * A proposed solution
    * How do you go about implementing a solution under the fourth paradigm?
  • Advanced Data Mining and Integration Research for Europe (ADMIRE)

    1. 1. Advanced Data Mining and Integration Research for Europe (ADMIRE). Jano van Hemert, research.nesc.ac.uk
    2. 2. Beyond the Data Deluge. Gordon Bell, Tony Hey (Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA) and Alex Szalay (Department of Physics and Astronomy, Johns Hopkins University, 3701 San Martin Drive, Baltimore, MD 21218, USA; e-mail: szalay@jhu.edu). Science, Vol 323, 6 March 2009, p. 1297. Published by AAAS. Downloaded from www.sciencemag.org on July 6, 2009.

       The demands of data-intensive science represent a challenge for diverse scientific communities.

       Since at least Newton’s laws of motion in the 17th century, scientists have recognized experimental and theoretical science as the basic research paradigms for understanding nature. In recent decades, computer simulations have become an essential third paradigm: a standard tool for scientists to explore domains that are inaccessible to theory and experiment, such as the evolution of the universe, car passenger crash testing, and predicting climate change. As simulations and experiments yield ever more data, a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science (1). For example, new types of computer clusters are emerging that are optimized for data movement and analysis rather than computing, while in astronomy and other sciences, integrated data systems allow data analysis and storage on site instead of requiring download of large amounts of data.

       Today, some areas of science are facing hundred- to thousandfold increases in data volumes from satellites, telescopes, high-throughput instruments, sensor networks, accelerators, and supercomputers, compared to the volumes generated only a decade ago (2). In astronomy and particle physics, these new experiments generate petabytes (1 petabyte = 10^15 bytes) of data per year. In bioinformatics, the increasing volume (3) and the extreme heterogeneity of the data are challenging scientists (4). In contrast to the traditional hypothesis-led approach to biology, Venter and others have argued that a data-intensive inductive approach to genomics (such as shotgun sequencing) is necessary to address large-scale ecosystem questions (5, 6).

       Other research fields also face major data management challenges. In almost every laboratory, “born digital” data proliferate in files, spreadsheets, or databases stored on hard drives, digital notebooks, Web sites, blogs, and wikis. The management, curation, and archiving of these digital data are becoming increasingly burdensome for research scientists.

       Over the past 40 years or more, Moore’s Law has enabled transistors on silicon chips to get smaller and processors to get faster. At the same time, technology improvements for disks for storage cannot keep up with the ever increasing flood of scientific data generated by the faster computers. In university research labs, Beowulf clusters (groups of usually identical, inexpensive PC computers that can be used for parallel computations) have ...

       Figure caption: Moon and Pleiades from the VO. Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the moon, synthesized within the World Wide Telescope service. Credit: Jonathan Fay/Microsoft.
    4. 4. Distilling meaning from data. Felice Frankel and Rosalind Reid. Nature, Vol 455, 4 September 2008, Books & Arts. Overlaid on the page: the opening of Stephen H. Muggleton’s commentary “Exceeding human limits”, Nature, Vol 440, 23 March 2006, p. 409.

       Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.

       It is a breathtaking time in science as masses of data pour in, promising new insights. But how can we find meaning in these terabytes? To search successfully for new science in large datasets, we must find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations. Old habits of representing data can fail to meet these challenges, preventing us from reaching beyond the familiar questions and answers.

       To extract new meaning from the sea of data, scientists have begun to embrace the tools of visualization. Yet few appreciate that visual representation is also a form of communication. A rich body of communication expertise holds the potential to greatly improve these tools. We propose that graphic artists, communicators and visualization scientists should be brought into conversation with theorists and experimenters before all the data have been gathered. If we design experiments in ways that offer varied opportunities for representing and communicating data, techniques for extracting new understanding can be made available.

       Visual representation is familiar in data-intensive fields. Years before a detector is built for a facility such as the Large Hadron Collider near Geneva, for example, physicists will have pored over simulations. They examine how important events will ‘look’ in the displays that reveal and communicate what is going on inside the machine. Such discussions tend to take place within the visual conventions of a field. But perhaps conversations might be broadened to consider alternative representations of the same data. These might suggest other approaches to collecting, organizing and querying data that will maximize the transparency of experimental results and thus aid intuition, discovery and communication.

       Unfortunately, visualization experts and communicators are often consulted only after [...] they will create effective computer displays, slides and figures for publication. Meanwhile, they may be developing their tools in isolation, kept at arm’s length by scientists who are busy getting their experiments done. Opportunities for useful dialogue are thus squandered.

       When scientists, graphic artists, writers, animators and other designers come together to discuss problems in the visual representation of science, such as at the Image and Meaning workshops run by Harvard University (www.imageandmeaning.org), it becomes clear that representations repeatedly fail to communicate understanding or address obvious questions about the underlying data. A three-dimensional volume rendering may give no hint of important uncertainties or data gaps; solid surfaces or sharp edges may suggest data where they do not exist. A graphic artist might propose ways to reveal gaps or deviations from expectation early in an experiment, guiding subsequent data collection or highlighting new avenues of enquiry. When we asked Harvard University chemist George Whitesides to change the geometry of a self-assembled monolayer with clearly delineated hydrophobic and hydrophilic areas to create an image for submission to a journal, he found himself redesigning the experiment, and unexpected science emerged.

       [...] those run by the US National Science Foundation’s Picturing to Learn project (www.picturingtolearn.org), teach us that attempting to visually communicate scientific data and concepts opens a path to understanding. When science and design students collaborate, their drive to understand one another’s ideas pushes them to create new ways of seeing science. Investment in visual communication training for young scientists will pay off handsomely for any data-intensive discipline.

       The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding (by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists) we cannot be sure whether a display is accurate or misleading.

       The greatest opportunity and risk lie in that last step in the path: understanding. Whether verbal or visual, any language that is garbled and inconsistent fails to do its job. Let’s talk. Let’s all talk.

       Felice Frankel is senior research fellow in the faculty of arts and sciences at Harvard University, Cambridge, Massachusetts 02138, USA. With G. M. Whitesides, she is co-author of On the Surface of Things: Images of the Extraordinary in Science. e-mail: felice_frankel@harvard.edu
       Rosalind Reid is executive director of the Initiative in Innovative Computing at Harvard University and former Editor of American Scientist.

       Image caption: Discussing visual communication before designing experiments may reveal new science. Credit: D. Armendariz.

       Overlay (Muggleton, “Exceeding human limits”): Scientists are turning to automated processes and technologies in a bid to cope with ever higher volumes of data. But automation offers so much more to the future of science than just data handling, says Stephen H. Muggleton.

       The collection and curation of data throughout the sciences is becoming increasingly automated. For example, a single high-throughput experiment in biology can easily generate more than a gigabyte of data per day, and in astronomy automatic data collection leads to more than a terabyte of data per night. Throughout the sciences the volumes of archived data are increasing exponentially, supported not only by low-cost digital storage but also by the growing efficiency of automated instrumentation. It is clear that the future of science involves the expansion of automation in all its aspects: data ...
    5. 5. Extracts from Stephen H. Muggleton, “Exceeding human limits” (Nature, Vol 440, 23 March 2006), shown over the Frankel and Reid page from slide 4 (Nature, Vol 455, 4 September 2008).

       Pull quote: “Owing to the scale and rate of data generation, computational models of scientific data now require automatic construction and modification.”

       Pull quote: “There is a severe danger that increases in speed and volume of data generation could lead to decreases in comprehensibility.”

       Although hybrid models can be built by simply patching two models together, the underlying differences lead to unpredictable and error-prone behaviour when changes are made. One encouraging development in this respect is the emergence within computer science of new formalisms (5) that integrate, in a sound fashion, two major branches of mathematics: mathematical logic and probability calculus. Mathematical logic provides a formal foundation for logic programming languages such as Prolog, whereas probability calculus provides the basic axioms of probability for ...

       Today’s generation of microfluidic machines is designed to carry out a specific series of chemical reactions, but further flexibility could be added to this tool kit by developing what one might call a ‘chemical Turing machine’. The universal Turing machine, devised in 1936 by Alan Turing, was intended to mimic the pencil-and-paper operations of a mathematician. The chemical Turing machine would be a universal processor capable of performing a broad range of chemical operations on both the reagents available to it at the start and those chemicals it later generates. The machine would automatically prepare and test chemical compounds but it would also be programmable, thus allowing much the same flexibility as a real chemist has in the lab.
    9. 9. Aims
       • ADMIRE aims to deliver a consistent and easy-to-use technology for extracting information and knowledge.
       • The project is motivated by the difficulty of extracting meaningful information by data mining combinations of data from multiple heterogeneous and distributed resources.
       • It will also provide an abstract view of data mining and integration, which will give users and developers the power to cope with complexity and heterogeneity of services, data and processes.
    10. 10. [Diagram: three roles and the activities that link them]
       • Computational Thinkers. Creating: data models & computational methods.
       • Domain Specialists. Interaction: experiments & knowledge creation.
       • Data-Intensive Engineers. Execution: implementations, compute & data resources.
       • Links between the roles: Formulation, Mapping, Steering.
    11. 11. Separating concerns
       • User and application diversity
       • Tool level: iterative DMI process development. Accommodating many application domains, many tool sets, many process representations and many working practices.
       • Gateway interface: DMI canonical representation and abstract machine (one model).
       • Enactment level: mapping, optimisation and enactment. Composing or hiding many autonomous resources & services, multiple enactment mechanisms and multiple platform implementations.
       • System diversity and complexity
    12. 12. Architecture
    13. 13. Architecture
    14. 14. Use case
    15. 15. Use case
       • TS23.embryo.organ system.sensory organ.nose.nasal cavity.epithelium.olfactory
       • TS23.embryo.organ system.visceral organ.alimentary system.oral region.upper jaw.tooth.incisor
       • TS23.embryo.organ system.visceral organ.alimentary system.oral region.lower jaw.tooth.incisor
       • TS23.embryo.organ system.visceral organ.liver and biliary system.liver.lobe
    16. 16. Formulation [Diagram: workflow in three phases]
       • Training phase: images, manual annotations, image integration, image processing, feature generation, feature selection/extraction, classifier construction.
       • Testing phase: image processing, feature generation, feature selection/extraction, prediction evaluation.
       • Deployment phase: apply classifier, automatic annotations.
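
The three phases on the Formulation slide read like a conventional supervised-learning workflow. The following is a minimal Python sketch under loud assumptions: the feature vectors are made-up numbers standing in for processed images, the labels are hypothetical, and the nearest-centroid classifier is a stand-in chosen for brevity, not the method used in the ADMIRE use case.

```python
from math import dist  # available from Python 3.8


def train(features, labels):
    """Training phase: build one centroid per annotation label."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(model, vec):
    """Deployment phase: annotate a new feature vector with the closest label."""
    return min(model, key=lambda lab: dist(model[lab], vec))


def evaluate(model, features, labels):
    """Testing phase: fraction of held-out examples predicted correctly."""
    hits = sum(predict(model, v) == lab for v, lab in zip(features, labels))
    return hits / len(labels)


# Hypothetical feature vectors and manual annotations (assumptions).
train_x = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = ["expressed", "expressed", "not expressed", "not expressed"]
test_x = [[0.7, 0.9], [0.3, 0.2]]
test_y = ["expressed", "not expressed"]

model = train(train_x, train_y)
accuracy = evaluate(model, test_x, test_y)
automatic_annotation = predict(model, [0.85, 0.75])
```

Keeping the three phases as separate functions mirrors the separation on the slide: the classifier built in training is the only thing the testing and deployment steps need.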
    18. 18. Formulation [Diagram: training, testing and deployment phases, as on slide 16], shown with a listing in the Data-Intensive Systems Process Engineering Language:

           /* import non-universal components from the computational environment */
           import uk.org.ogsadai.SQLQuery; // get definition of SQLQuery
           import uk.org.ogsadai.TupleToWebRowSetCharArrays; // serialisation
           import uk.org.ogsadai.DeliverToRequestStatus;
           /* construct and identify instances of the PE */
           SQLQuery query = new SQLQuery();
           TupleToWebRowSetCharArrays wrs = new TupleToWebRowSetCharArrays();
           DeliverToRequestStatus del = new DeliverToRequestStatus();
           /* form connection c1 with an explicit literal stream expression
              as its source and query as its destination */
           String q1 = "SELECT * FROM weather";
           |- q1 -| => expression->query;
           String resourceID = "MySQLResource";
           |- resourceID -| => resource->query;
           query->data => data->wrs;
           wrs->result => input->del;
    20. 20. Formulation [Diagram and listing as on slide 18], with the layers labelled: Formulation, Data-Intensive Systems Process Engineering Language, Java, OGSA-DAI.
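
The connections in the listing describe a three-stage stream: a query processing element feeds a serialiser, which feeds a delivery element. A rough Python analogue using generators and the standard-library `sqlite3` module may help make that shape concrete. It is not the OGSA-DAI or DISPEL API; the in-memory `weather` table, its columns, its rows and the function names are all assumptions made for illustration.

```python
import sqlite3


def sql_query(connection, expression):
    """Query stage: stream result rows one tuple at a time."""
    for row in connection.execute(expression):
        yield row


def tuples_to_text(rows):
    """Serialisation stage: turn each tuple into a comma-separated line."""
    for row in rows:
        yield ",".join(str(value) for value in row)


def deliver(lines):
    """Delivery stage: drain the stream and hand back the collected result."""
    return list(lines)


# Hypothetical data resource: an in-memory table standing in for "MySQLResource".
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE weather (station TEXT, temperature REAL)")
connection.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("Edinburgh", 4.5), ("Vienna", 1.0)],
)

# Wire the stages together in the same order as the listing:
# query -> serialiser -> delivery.
result = deliver(tuples_to_text(sql_query(connection, "SELECT * FROM weather")))
connection.close()
```

Because each stage is a generator, rows flow through the pipeline one at a time and nothing is materialised until the delivery stage drains the stream.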
    23. 23. Architecture results [Figure: speedup (0 to 5) against number of images (5,000 to 20,000), with one series per cluster size from 2 to 8 nodes.]
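
Speedup on a chart like this is conventionally the single-node execution time divided by the execution time on n nodes, and parallel efficiency is that speedup divided by n. The slide does not state its baseline, so the sketch below assumes the conventional definition; the timings are invented placeholders, not measurements from the presentation.

```python
def speedup(t_single, t_parallel):
    """Ratio of single-node time to parallel time (higher is better)."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, nodes):
    """Speedup per node; 1.0 would be perfect linear scaling."""
    return speedup(t_single, t_parallel) / nodes


# Hypothetical execution times in seconds, keyed by number of nodes.
timings = {1: 6000.0, 2: 3200.0, 4: 1800.0, 8: 1500.0}

table = {
    nodes: (speedup(timings[1], t), efficiency(timings[1], t, nodes))
    for nodes, t in timings.items()
}

for nodes, (s, e) in sorted(table.items()):
    print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

With these placeholder numbers, efficiency falls as nodes are added, which is the kind of flattening a speedup plot makes visible.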
    24. 24. The ‘hump’ [Figure: workflow execution time. Processing time in seconds (0 to 6000) against number of computing nodes (1 to 8), broken down by processing element PE1 to PE9, for 6400, 12800 and 19200 images.]
    26. 26. Data mining results

        Table 1. The preliminary result of classification performance using 10-fold validation

        Gene expression     Sensitivity   Specificity
        Humerus             0.7525        0.7921
        Handplate           0.7105        0.7231
        Fibula              0.7273        0.718
        Tibia               0.7467        0.7451
        Femur               0.7241        0.7345
        Ribs                0.5614        0.7538
        Petrous part        0.7903        0.7538
        Scapula             0.7882        0.7099
        Head mesenchyme     0.7857        0.5507

        Note: Sensitivity: true positive rate. Specificity: true negative rate.
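
The note under the table defines sensitivity as the true positive rate and specificity as the true negative rate. The sketch below shows how both follow from actual and predicted labels; the two label lists are invented for illustration and do not reproduce any row of Table 1.

```python
def sensitivity_specificity(actual, predicted):
    """Return (true positive rate, true negative rate) for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # correctly flagged
    fn = sum(1 for a, p in pairs if a and not p)      # missed positives
    tn = sum(1 for a, p in pairs if not a and not p)  # correctly rejected
    fp = sum(1 for a, p in pairs if not a and p)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical annotations: True means the gene is expressed in the structure.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity {sens:.4f}, specificity {spec:.4f}")
```

Reporting the two rates separately, as the table does, shows whether a classifier errs by missing expressed structures or by over-annotating unexpressed ones.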
    27. 27. Where we are
       • Architecture prototype works
       • Intuitive workbench created
       • Will be connected next
       • Two more use cases
    28. 28. Team (http://www.admire-project.eu/)
       • National e-Science Centre (http://research.nesc.ac.uk/): Malcolm Atkinson, Jano van Hemert, Liangxiu Han, Gagarine Yaikhom, Chee-Sun Liew
       • EPCC: Mark Parsons et al.
       • University of Vienna: Peter Brezany et al.
       • Universidad Politécnica de Madrid: Oscar Corcho
       • Slovak Academy of Sciences: Ladislav Hluchý
       • Fujitsu Labs Europe: David Snelling
       • ComArch SA: Marcin Choiński
