Advanced Data Mining and Integration Research for Europe (ADMIRE)
Jano van Hemert
The University of Edinburgh
research.nesc.ac.uk

Downloaded from www.sciencemag.org on July 6, 2009
COMPUTER SCIENCE

Beyond the Data Deluge
Gordon Bell,1 Tony Hey,1 Alex Szalay2

The demands of data-intensive science represent a challenge for diverse scientific communities.

Since at least Newton’s laws of motion in the 17th century, scientists have recognized experimental and theoretical science as the basic research paradigms for understanding nature. In recent decades, computer simulations have become an essential third paradigm: a standard tool for scientists to explore domains that are inaccessible to theory and experiment, such as the evolution of the universe, car passenger crash testing, and predicting climate change. As simulations and experiments yield ever more data, a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science (1). For example, new types of computer clusters are emerging that are optimized for data movement and analysis rather than computing, while in astronomy and other sciences, integrated data systems allow data analysis and storage on site instead of requiring download of large amounts of data.

Today, some areas of science are facing hundred- to thousandfold increases in data volumes from satellites, telescopes, high-throughput instruments, sensor networks, accelerators, and supercomputers, compared to the volumes generated only a decade ago (2). In astronomy and particle physics, these new experiments generate petabytes (1 petabyte = 10^15 bytes) of data per year. In bioinformatics, the increasing volume (3) and the extreme heterogeneity of the data are challenging scientists (4). In contrast to the traditional hypothesis-led approach to biology, Venter and others have argued that a data-intensive inductive approach to genomics (such as shotgun sequencing) is necessary to address large-scale ecosystem questions (5, 6).

Other research fields also face major data management challenges. In almost every laboratory, “born digital” data proliferate in files, spreadsheets, or databases stored on hard drives, digital notebooks, Web sites, blogs, and wikis. The management, curation, and archiving of these digital data are becoming increasingly burdensome for research scientists.

Over the past 40 years or more, Moore’s Law has enabled transistors on silicon chips to get smaller and processors to get faster. At the same time, technology improvements for disks for storage cannot keep up with the ever increasing flood of scientific data generated by the faster computers. In university research labs, Beowulf clusters (groups of usually identical, inexpensive PC computers that can be used for parallel computations) have [...]

Figure: Moon and Pleiades from the VO. Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the moon, synthesized within the World Wide Telescope service. CREDIT: JONATHAN FAY/MICROSOFT

1 Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. 2 Department of Physics and Astronomy, Johns Hopkins University, 3701 San Martin Drive, Baltimore, MD 21218, USA. E-mail: szalay@jhu.edu

www.sciencemag.org   SCIENCE   VOL 323   6 MARCH 2009   1297
Published by AAAS

Vol 455 | 4 September 2008
BOOKS & ARTS

Distilling meaning from data

Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.

It is a breathtaking time in science as masses of data pour in, promising new insights. But how can we find meaning in these terabytes? To search successfully for new science in large datasets, we must find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations. Old habits of representing data can fail to meet these challenges, preventing us from reaching beyond the familiar questions and answers.

To extract new meaning from the sea of data, scientists have begun to embrace the tools of visualization. Yet few appreciate that visual representation is also a form of communication. A rich body of communication expertise holds the potential to greatly improve these tools. We propose that graphic artists, communicators and visualization scientists should be brought into conversation with theorists and experimenters before all the data have been gathered. If we design experiments in ways that offer varied opportunities for representing and communicating data, techniques for extracting new understanding can be made available.

Visual representation is familiar in data-intensive fields. Years before a detector is built for a facility such as the Large Hadron Collider near Geneva, for example, physicists will have pored over simulations. They examine how important events will ‘look’ in the displays that reveal and communicate what is going on inside the machine. Such discussions tend to take place within the visual conventions of a field. But perhaps conversations might be broadened to consider alternative representations of the same data. These might suggest other approaches to collecting, organizing and querying data that will maximize the transparency of experimental results and thus aid intuition, discovery and communication.

Unfortunately, visualization experts and communicators are often consulted only after [...] they will create effective computer displays, slides and figures for publication. Meanwhile, they may be developing their tools in isolation, kept at arm’s length by scientists who are busy getting their experiments done. Opportunities for useful dialogue are thus squandered.

When scientists, graphic artists, writers, animators and other designers come together to discuss problems in the visual representation of science, such as at the Image and Meaning workshops run by Harvard University (www.imageandmeaning.org), it becomes clear that representations repeatedly fail to communicate understanding or address obvious questions about the underlying data. A three-dimensional volume rendering may give no hint of important uncertainties or data gaps; solid surfaces or sharp edges may suggest data where they do not exist. A graphic artist might propose ways to reveal gaps or deviations from expectation early in an experiment, guiding subsequent data collection or highlighting new avenues of enquiry. When we asked Harvard University chemist George Whitesides to change the geometry of a self-assembled monolayer with clearly delineated hydrophobic and hydrophilic areas to create an image for submission to a journal, he found himself redesigning the experiment, and unexpected science emerged.

[...] those run by the US National Science Foundation’s Picturing to Learn project (www.picturingtolearn.org), teach us that attempting to visually communicate scientific data and concepts opens a path to understanding. When science and design students collaborate, their drive to understand one another’s ideas pushes them to create new ways of seeing science. Investment in visual communication training for young scientists will pay off handsomely for any data-intensive discipline.

The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding (by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists) we cannot be sure whether a display is accurate or misleading.

The greatest opportunity and risk lie in that last step in the path: understanding. Whether verbal or visual, any language that is garbled and inconsistent fails to do its job. Let’s talk. Let’s all talk.

Felice Frankel is senior research fellow in the faculty of arts and sciences at Harvard University, Cambridge, Massachusetts 02138, USA. With G. M. Whitesides, she is co-author of On the Surface of Things: Images of the Extraordinary in Science.
e-mail: felice_frankel@harvard.edu
Rosalind Reid is executive director of the Initiative in Innovative Computing at Harvard University and former Editor of American Scientist.

Figure: Discussing visual communication before designing experiments may reveal new science. Credit: D. ARMENDARIZ

Vol 440 | 23 March 2006
COMMENTARY

Exceeding human limits

Scientists are turning to automated processes and technologies in a bid to cope with ever higher volumes of data. But automation offers so much more to the future of science than just data handling, says Stephen H. Muggleton.

The collection and curation of data throughout the sciences is becoming increasingly automated. For example, a single high-throughput experiment in biology can easily generate more than a gigabyte of data per day, and in astronomy automatic data collection leads to more than a terabyte of data per night. Throughout the sciences the volumes of archived data are increasing exponentially, supported not only by low-cost digital storage but also by the growing efficiency of automated instrumentation. It is clear that the future of science involves the expansion of automation in all its aspects: data [...]

Image credit: FIREFLY PRODUCTIONS/CORBIS

[...] and charge distribution of individual molecules need to be integrated with models describing the interdependency of chemical reactions. However, differences in mathematical underpinnings of, say, differential equations, bayesian networks and logic programs make integrating these various models virtually impossible. Although hybrid models can be built by simply patching two models together, the underlying differences lead to unpredictable and error-prone behaviour when changes are made. One encouraging development in this respect is the emergence within computer science of new formalisms (5) that integrate, in a [...]

[...] On such timescales it should become easier for scientists to reproduce new experiments and refute their hypotheses. Today’s generation of microfluidic machines is designed to carry out a specific series of chemical reactions, but further flexibility could be added to this tool kit by developing what one might call a ‘chemical Turing machine’. The universal Turing machine, devised in 1936 by Alan Turing, was intended to mimic the pencil-and-paper operations of a mathematician. The chemical Turing machine would be a universal processor capable of performing a broad range of chemical operations on both the reagents available to it at the start and [...]

“Owing to the scale and rate of data generation, computational models of scientific data now require automatic construction and modification.”

“There is a severe danger that increases in speed and volume of data generation could lead to decreases in comprehensibility.”
                                                                                                 thoseof mathe- of input materials, with o
                                                                              Scientists are turning
                                                                                                             chemicals bothhandling, ever higher volumes of data.
                                                                                                                   technologies a
                                                                                                                                       data in the statement
    understanding can be made Discussing visual communication before designing experiments may reveal new science. instrument and software
    available.
                                                sound fashion, two major branches more to the future
                                                                              But automation         so much                developers, visualization
        Visual representation is familiar in data- that representations repeatedly fail to com- scientists, graphic artists and cognitive psy-


                                                matics: mathematical logic and probabilityauto- On such timescales it sho
                                                 it later generates. The machine would cal-                                    clear and undeniable
    intensive fields. Years before a detector is built municate understanding or address obvious chologists — we cannot be sure whether a dis-




                                                                                                                                                                                                           FIREFLY PRODUCTIONS/CORBIS
    for a facility such as the Large Hadron Collider questions about the underlying data. A three- play is accurate or misleading.                            The collection and curation
    near Geneva, for example, physicists will have dimensional volume rendering may give no                 The greatest opportunity and risk lie in that     of data throughout the

 s                                              culus. Mathematicaland test chemical com- scientists to reproduce n
                                                 matically prepare logic provides a formal                                     experimentation.
    pored over simulations. They examine how hint of important uncertainties or data gaps; last step in the path: understanding. Whether                      sciences is becoming increas-
    important events will ‘look’ in the displays solid surfaces or sharp edges may suggest data verbal or visual, any language that is garbled                ingly automated. For exam-
    that reveal and communicate what is going where they do not exist. A graphic artist might and inconsistent fails to do its job. Let’s talk.               ple, a single high-throughput

                                                 pounds but it would also be programmable, Stephen H. Muggleton is
 learning approaches foundation for logic programming languages refute their hypotheses.
    on inside the machine. Such discussions tend propose ways to reveal gaps or deviations from Let’s all talk.                                          I
                                                                                                                                                              experiment in biology can
    to take place within the visual conventions of expectation early in an experiment, guiding Felice Frankel is senior research fellow in the                easily generate more than a
                                                                                                                                             gigabyte of data per day, and in astronomy
    a field. But perhaps conversations might be subsequent data collection or highlighting new faculty of arts and sciences at Harvard University,

ng scientific models such as Prolog, much theprobability calculusa Computing and the Centr
                                                 thus allowing whereas same flexibility as
    broadened to consider alternative represen- avenues of enquiry. When we asked Harvard Cambridge, Massachusetts 02138, USA. With data collection leads to more than a
                                                                                                                                             automatic

                                                                                                                                   Today’s generation of m
    tations of the same data. These might suggest University chemist George Whitesides to G. M. Whitesides, she is co-author of terabyte of data per night. Throughout the sci-
                                                                                                                                             On the Surface
    other approaches to collecting, organizing and change the geometry of a self-assembled of Things: Images of the Extraordinary in Science. volumes of archived data are increas-
                                                                                                                                             ences the

                                                 real chemist has in the lab.
 p’ systems with no provides the basic axioms of probability for is designed to carry ou                                       Systems Biology at Imper
    querying data that will maximize the transpar- monolayer with clearly delineated hydropho- e-mail: felice_frankel@harvard.edu ing exponentially, supported not only by
    ency of experimental results and thus aid intui- bic and hydrophilic areas to create an image Rosalind Reid is executive director of the Initiative storage but also by the growing
    tion, discovery and communication.
                                                                                                                                             low-cost digital
                                                       for submission to a journal, he found himself in Innovative Computing at Harvard University of automated instrumentation. It is
                                                                                                                                             efficiency

  to the collection of                              One can think of a chemical Turing 2BZ, UK.
        Unfortunately, visualization experts and redesigning the experiment, and unexpected and former Editor of American Scientist. that the future of science involves the
    communicators are often consulted only after science emerged.
                                                                                                                                             clear
                                                                                                                                             expansion of automation in all its aspects: data
Aims
•   ADMIRE aims to deliver a consistent and easy-to-
    use technology for extracting information and
    knowledge.
•   The project is motivated by the difficulty of extracting
    meaningful information by data mining combinations
    of data from multiple heterogeneous and
    distributed resources.
•   It will also provide an abstract view of data mining
    and integration, which will give users and
    developers the power to cope with complexity and
    heterogeneity of services, data and processes.
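[Diagram: three roles and the links between them]
•   Computational Thinkers: Formulation (data models & computational methods)
•   Domain Specialists: Interaction (experiments & knowledge creation)
•   Data-Intensive Engineers: Execution (implementations, compute & data resources)
•   Links: Creating (between Computational Thinkers and Domain Specialists),
    Mapping (between Computational Thinkers and Data-Intensive Engineers),
    Steering (between Domain Specialists and Data-Intensive Engineers)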
Separating concerns
[Diagram, top to bottom]
User and application diversity
Tool level: iterative DMI process development
    Accommodating:
    •   Many application domains
    •   Many tool sets
    •   Many process representations
    •   Many working practices
Gateway interface (one model): DMI canonical representation and abstract machine
Enactment level: mapping, optimisation and enactment
    Composing or hiding:
    •   Many autonomous resources & services
    •   Multiple enactment mechanisms
    •   Multiple platform implementations
System diversity and complexity
Architecture
Use case

TS23.embryo.organ system.sensory organ.nose.nasal cavity.epithelium.olfactory
TS23.embryo.organ system.visceral organ.alimentary system.oral region.upper jaw.tooth.incisor
TS23.embryo.organ system.visceral organ.alimentary system.oral region.lower jaw.tooth.incisor
TS23.embryo.organ system.visceral organ.liver and biliary system.liver.lobe
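The annotation terms above read as dotted paths through an anatomy hierarchy. As a minimal sketch, assuming each "." separates one level of that hierarchy, such terms could be split and compared as follows. The helper names `parse_term` and `common_ancestor` are illustrative only and are not part of ADMIRE.

```python
def parse_term(term: str) -> list[str]:
    """Split a dotted anatomy term into its hierarchy levels."""
    return [level.strip() for level in term.split(".")]


def common_ancestor(a: str, b: str) -> str:
    """Return the deepest path prefix shared by two terms."""
    shared = []
    for x, y in zip(parse_term(a), parse_term(b)):
        if x != y:
            break
        shared.append(x)
    return ".".join(shared)


# Two of the terms shown on the slide (upper and lower incisor).
upper = ("TS23.embryo.organ system.visceral organ.alimentary system."
         "oral region.upper jaw.tooth.incisor")
lower = ("TS23.embryo.organ system.visceral organ.alimentary system."
         "oral region.lower jaw.tooth.incisor")

ancestor = common_ancestor(upper, lower)
print(ancestor)
```

Here the two incisor terms share everything down to the oral region, which is the kind of relationship an annotation tool might exploit when grouping images.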
Formulation
[Diagram: classification workflow in three phases]
Data preparation: Manual annotations; Images; Image integration
Training phase: Image processing; Feature generation; Feature selection/extraction; Classifier construction
Testing phase: Image processing; Feature generation; Feature selection/extraction; Prediction evaluation
Deployment phase: Apply classifier; Automatic annotations
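The three phases of this workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the images are tiny made-up intensity grids, the features (mean intensity and intensity range) and the nearest-centroid classifier are stand-ins chosen for brevity, and none of the function names come from ADMIRE.

```python
from statistics import mean


def process_image(image):
    """Image processing stand-in: clamp pixel intensities to [0, 1]."""
    return [[min(1.0, max(0.0, p)) for p in row] for row in image]


def generate_features(image):
    """Feature generation stand-in: mean intensity and intensity range."""
    pixels = [p for row in image for p in row]
    return [mean(pixels), max(pixels) - min(pixels)]


def select_features(features, keep):
    """Feature selection: keep only the chosen feature indices."""
    return [features[i] for i in keep]


def construct_classifier(samples, labels):
    """Classifier construction: a nearest-centroid rule (illustrative choice)."""
    centroids = {}
    for label in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def classify(sample):
        def distance(label):
            return sum((a - b) ** 2 for a, b in zip(sample, centroids[label]))
        return min(centroids, key=distance)

    return classify


def to_sample(image, keep=(0,)):
    """Run one image through processing, feature generation and selection."""
    return select_features(generate_features(process_image(image)), keep)


# Training phase: manually annotated images (made-up data).
train_images = [[[0.9, 0.8], [0.7, 0.9]], [[0.8, 0.9], [0.9, 0.8]],
                [[0.1, 0.2], [0.1, 0.0]], [[0.2, 0.1], [0.0, 0.1]]]
train_labels = ["expressed", "expressed", "not expressed", "not expressed"]
classify = construct_classifier([to_sample(i) for i in train_images], train_labels)

# Testing phase: evaluate predictions against held-back annotations.
test_images = [[[0.7, 0.8], [0.9, 0.6]], [[0.0, 0.1], [0.2, 0.1]]]
test_labels = ["expressed", "not expressed"]
predictions = [classify(to_sample(i)) for i in test_images]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)

# Deployment phase: annotate a new, unlabelled image automatically.
prediction = classify(to_sample([[0.6, 0.9], [0.8, 0.7]]))
print(accuracy, prediction)
```

The training and testing branches deliberately share `to_sample`, mirroring the repeated image processing, feature generation and feature selection boxes in the diagram.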
Data-Intensive Systems Process Engineering Language
[Slide labels over the workflow diagram and the code: OGSA-DAI, Java]

/* import non-universal components from the computational environment */
import uk.org.ogsadai.SQLQuery; // get definition of SQLQuery
import uk.org.ogsadai.TupleToWebRowSetCharArrays; // serialisation
import uk.org.ogsadai.DeliverToRequestStatus;

/* construct and identify instances of the processing elements (PEs) */
SQLQuery query = new SQLQuery();
TupleToWebRowSetCharArrays wrs = new TupleToWebRowSetCharArrays();
DeliverToRequestStatus del = new DeliverToRequestStatus();

/* form connection c1 with an explicit literal stream expression as its source
and query as its destination */

String q1 = "SELECT * FROM weather";
|- q1 -| => expression->query;
String resourceID = "MySQLResource";
|- resourceID -| => resource->query;
/* stream the query results into the serialiser, then deliver its output */
query->data => data->wrs;
wrs->result => input->del;
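The listing above wires three processing elements into a stream pipeline: a query step, a serialisation step and a delivery step. As a rough analogy only, the same shape can be sketched with Python generators. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for the "MySQLResource" data resource, the table contents are made up, and the functions `sql_query`, `tuples_to_text` and `deliver` are hypothetical stand-ins, not the OGSA-DAI activities.

```python
import sqlite3


def sql_query(expressions, connection):
    """Stand-in for the query element: run each expression, stream out rows."""
    for expression in expressions:
        yield from connection.execute(expression)


def tuples_to_text(rows):
    """Stand-in for the serialisation element: one comma-separated line per tuple."""
    for row in rows:
        yield ",".join(str(value) for value in row)


def deliver(lines):
    """Stand-in for the delivery element: collect the stream into a list."""
    return list(lines)


# Hypothetical data resource: an in-memory table named as in the slide's query.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE weather (station TEXT, temperature REAL)")
connection.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("Edinburgh", 11.5), ("Geneva", 14.0)],
)

# A literal stream holding one query expression feeds the first element;
# each element's output stream is the next element's input.
q1 = "SELECT * FROM weather"
result = deliver(tuples_to_text(sql_query(iter([q1]), connection)))
print(result)
```

Because generators are lazy, rows flow through the chain one at a time, which loosely mirrors the streaming connections between elements in the listing.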
                                                                                                                                                                                                                                                      
                                                                                                                                             
                                                                                                                                                                         
                                                                                                                                                                   



                                                                                                                                                                                                                                                       
                                                                                       
                                                                                                                                                                                                                                                        
                                                                                  
                                                                                                                                                                                                                                                                   

                                                                                                                
                                                                                                                          
                                                                                                           
                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                      
                                                                                                                                         
                                                                                         
                                                                                                                                
                                                                                                          

                                                                                                                                                                                                                                                                                                  
                                                                                                                 
                                                                                                                                                                                                                                                        
                                                               
                                                                                                                                                                                                                                                                       
                                                                                                                                                          
                                                                                                                                            
                                                                                             
                                                                                                                                                         
                                                                                                                                                                                                                 
                                                                                                                                                                                                                    
                                                                                                                                                                                                                                                             
                                                                                              
                                                                                                                                                                     
                                                                 
                                                                                                                                                       
                                                                                           
                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                     
                                                                                           
                                                                                                                                                                                                                                                                   
                                                                                                                                                            
                                                                                                                                                                             
                                                                                                                                                                                                                                                                                
                                                                                                                                                     
                                                                                                                                                                                               
                                                                                                                                                         
                                                                                                                                                                                                                                 
                                                                             
                                                                                                                                                                                               
                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                             
                                                                  
                                                                                                                                                                                                                                                                                                      
Architecture results
[Figure: speedup (y-axis, 0.5 to 5) against number of images (x-axis, 0 to 20000), one curve each for 2, 3, 4, 5, 6, 7, and 8 nodes]
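The slide does not say how speedup was computed. A minimal sketch, assuming the usual definition (run time on a baseline configuration divided by run time on n nodes) and a single-node baseline, might look like the following. The function names and the timings are hypothetical and are not taken from the ADMIRE experiments.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Speedup = baseline run time / parallel run time (assumed definition)."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds: float, parallel_seconds: float, nodes: int) -> float:
    """Parallel efficiency = speedup / number of nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(baseline_seconds, parallel_seconds) / nodes


# Hypothetical run times in seconds, keyed by node count (illustration only).
timings = {1: 6000.0, 2: 3200.0, 4: 1800.0, 8: 1250.0}

baseline = timings[1]
for nodes, seconds in sorted(timings.items()):
    s = speedup(baseline, seconds)
    e = efficiency(baseline, seconds, nodes)
    print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

A speedup below the node count, as in these made-up numbers, is what the curves on the slide show as well: the y-axis tops out at 5 for up to 8 nodes.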
The ‘hump’
[Figure: processing time in seconds (y-axis, 0 to 6000) against number of computing nodes (x-axis, 1 to 8), in three panels for 6400, 12800, and 19200 images; series: workflow execution time and PE1 to PE9]
Data mining results
Table 1. The preliminary result of classification performance using 10-fold validation

Gene expression      Sensitivity   Specificity
Humerus              0.7525        0.7921
Handplate            0.7105        0.7231
Fibula               0.7273        0.718
Tibia                0.7467        0.7451
Femur                0.7241        0.7345
Ribs                 0.5614        0.7538
Petrous part         0.7903        0.7538
Scapula              0.7882        0.7099
Head mesenchyme      0.7857        0.5507

Note: Sensitivity: true positive rate. Specificity: true negative rate.
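As an illustration of the two measures in the note, here is a minimal sketch that computes sensitivity and specificity from binary labels. The function name and the toy labels are hypothetical; this is not the ADMIRE classifier or its data, and in a 10-fold setting the same calculation would be applied to the held-out predictions.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: true positive rate. Specificity: true negative rate.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels (illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.4f}, specificity = {spec:.4f}")
```

Reporting both values matters because a classifier can score well on one and poorly on the other, as the Head mesenchyme and Ribs rows of Table 1 suggest.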



Where we are

• Architecture prototype works
• Intuitive workbench created
• Prototype and workbench will be connected next
• Two more use cases
Team
National e-Science Centre: Malcolm Atkinson, Jano van Hemert, Liangxiu Han, Gagarine Yaikhom, Chee-Sun Liew
EPCC: Mark Parsons et al.
University of Vienna: Peter Brezany et al.
Universidad Politécnica de Madrid: Oscar Corcho
Slovak Academy of Sciences: Ladislav Hluchý
Fujitsu Labs Europe: David Snelling
ComArch SA: Marcin Choiński
http://www.admire-project.eu/
http://research.nesc.ac.uk/

More Related Content

What's hot

Sla2009 D Curation Heidorn
Sla2009 D Curation HeidornSla2009 D Curation Heidorn
Sla2009 D Curation HeidornBryan Heidorn
 
Building the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceBuilding the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceLarry Smarr
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyFabio Porto
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataGudmundur Thorisson
 
Cloud-Based Solutions for Scientific Computing
Cloud-Based Solutions for Scientific ComputingCloud-Based Solutions for Scientific Computing
Cloud-Based Solutions for Scientific ComputingIan Lewis
 
How to use science maps to navigate large information spaces? What is the lin...
How to use science maps to navigate large information spaces? What is the lin...How to use science maps to navigate large information spaces? What is the lin...
How to use science maps to navigate large information spaces? What is the lin...Andrea Scharnhorst
 
Advanced Cyberinfrastructure Enabled Services and Applications in 2021
Advanced Cyberinfrastructure Enabled Services and Applications in 2021Advanced Cyberinfrastructure Enabled Services and Applications in 2021
Advanced Cyberinfrastructure Enabled Services and Applications in 2021Larry Smarr
 
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Bryan Heidorn
 
The Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark DataThe Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark Datavbrant
 
Advancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureAdvancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureDaniel S. Katz
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Ian Foster
 
Strong field science core proposal for uph ill site
Strong field science core proposal for uph ill siteStrong field science core proposal for uph ill site
Strong field science core proposal for uph ill siteahsanrabbani
 
Data At Risk poster for UNESCO Conference
Data At Risk poster for UNESCO ConferenceData At Risk poster for UNESCO Conference
Data At Risk poster for UNESCO ConferenceChris Muller
 
13,573,002 Method Patent The Heart Beacon Cycle
13,573,002 Method Patent The Heart Beacon Cycle13,573,002 Method Patent The Heart Beacon Cycle
13,573,002 Method Patent The Heart Beacon CycleSAW Concepts LLC
 
IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs Club 澳洲互联网俱乐部
 

What's hot (19)

Sla2009 D Curation Heidorn
Sla2009 D Curation HeidornSla2009 D Curation Heidorn
Sla2009 D Curation Heidorn
 
Building the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data ScienceBuilding the Pacific Research Platform: Supernetworks for Big Data Science
Building the Pacific Research Platform: Supernetworks for Big Data Science
 
Summary of 3DPAS
Summary of 3DPASSummary of 3DPAS
Summary of 3DPAS
 
Emc 2013 Big Data in Astronomy
Emc 2013 Big Data in AstronomyEmc 2013 Big Data in Astronomy
Emc 2013 Big Data in Astronomy
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 
Towards Knowledge-Enabled Society
Towards Knowledge-Enabled SocietyTowards Knowledge-Enabled Society
Towards Knowledge-Enabled Society
 
Cloud-Based Solutions for Scientific Computing
Cloud-Based Solutions for Scientific ComputingCloud-Based Solutions for Scientific Computing
Cloud-Based Solutions for Scientific Computing
 
E scidocdays review
E scidocdays reviewE scidocdays review
E scidocdays review
 
How to use science maps to navigate large information spaces? What is the lin...
How to use science maps to navigate large information spaces? What is the lin...How to use science maps to navigate large information spaces? What is the lin...
How to use science maps to navigate large information spaces? What is the lin...
 
Advanced Cyberinfrastructure Enabled Services and Applications in 2021
Advanced Cyberinfrastructure Enabled Services and Applications in 2021Advanced Cyberinfrastructure Enabled Services and Applications in 2021
Advanced Cyberinfrastructure Enabled Services and Applications in 2021
 
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
 
The Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark DataThe Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark Data
 
Advancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated CyberinfrastructureAdvancing Science through Coordinated Cyberinfrastructure
Advancing Science through Coordinated Cyberinfrastructure
 
Keller geo edu
Keller geo eduKeller geo edu
Keller geo edu
 
Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...Rethinking how we provide science IT in an era of massive data but modest bud...
Rethinking how we provide science IT in an era of massive data but modest bud...
 
Strong field science core proposal for uph ill site
Strong field science core proposal for uph ill siteStrong field science core proposal for uph ill site
Strong field science core proposal for uph ill site
 
Data At Risk poster for UNESCO Conference
Data At Risk poster for UNESCO ConferenceData At Risk poster for UNESCO Conference
Data At Risk poster for UNESCO Conference
 
13,573,002 Method Patent The Heart Beacon Cycle
13,573,002 Method Patent The Heart Beacon Cycle13,573,002 Method Patent The Heart Beacon Cycle
13,573,002 Method Patent The Heart Beacon Cycle
 
IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research IDs书友会 - 主题1 - Swinburne Next Generation Research
IDs书友会 - 主题1 - Swinburne Next Generation Research
 

Viewers also liked

Knowledge Management - Concept of BA
Knowledge Management - Concept of BAKnowledge Management - Concept of BA
Knowledge Management - Concept of BAIMRAN KHAN
 
Concepts of Knowledge Management: Case Study -KPMG, Calibro,Knova
Concepts of Knowledge Management: Case Study -KPMG, Calibro,KnovaConcepts of Knowledge Management: Case Study -KPMG, Calibro,Knova
Concepts of Knowledge Management: Case Study -KPMG, Calibro,KnovaUyoyo Edosio
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining ProcessMarc Berman
 
5 Ps of strategy - strategic management - Manu Melwin Joy
5 Ps of strategy  - strategic management - Manu Melwin Joy5 Ps of strategy  - strategic management - Manu Melwin Joy
5 Ps of strategy - strategic management - Manu Melwin Joymanumelwin
 
Database Architecture and Basic Concepts
Database Architecture and Basic ConceptsDatabase Architecture and Basic Concepts
Database Architecture and Basic ConceptsTony Wong
 
The Importance of Listening to Your Customers
The Importance of Listening to Your CustomersThe Importance of Listening to Your Customers
The Importance of Listening to Your CustomersDrift
 

Viewers also liked (7)

Knowledge Management - Concept of BA
Knowledge Management - Concept of BAKnowledge Management - Concept of BA
Knowledge Management - Concept of BA
 
Concepts of Knowledge Management: Case Study -KPMG, Calibro,Knova
Concepts of Knowledge Management: Case Study -KPMG, Calibro,KnovaConcepts of Knowledge Management: Case Study -KPMG, Calibro,Knova
Concepts of Knowledge Management: Case Study -KPMG, Calibro,Knova
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
5 Ps of strategy - strategic management - Manu Melwin Joy
5 Ps of strategy  - strategic management - Manu Melwin Joy5 Ps of strategy  - strategic management - Manu Melwin Joy
5 Ps of strategy - strategic management - Manu Melwin Joy
 
Database Architecture and Basic Concepts
Database Architecture and Basic ConceptsDatabase Architecture and Basic Concepts
Database Architecture and Basic Concepts
 
The Importance of Listening to Your Customers
The Importance of Listening to Your CustomersThe Importance of Listening to Your Customers
The Importance of Listening to Your Customers
 
Employee engagement
Employee engagementEmployee engagement
Employee engagement
 

Similar to Advanced Data Mining and Integration Research for Europe (ADMIRE)

4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lr4th_paradigm_book_complete_lr
4th_paradigm_book_complete_lrDominic A Ienco
 
ICSTI Annual Meeting 2014 Tokyo Y. Murayama
ICSTI Annual Meeting 2014 Tokyo Y. MurayamaICSTI Annual Meeting 2014 Tokyo Y. Murayama
ICSTI Annual Meeting 2014 Tokyo Y. MurayamaYasuhiro Murayama
 
Solar System Model
Solar System ModelSolar System Model
Solar System Modelcharsh
 
Foundations for the Future of Science
Foundations for the Future of ScienceFoundations for the Future of Science
Foundations for the Future of ScienceGlobus
 
OII Summer Doctoral Programme 2010: Global brain by Meyer & Schroeder
OII Summer Doctoral Programme 2010: Global brain by Meyer & SchroederOII Summer Doctoral Programme 2010: Global brain by Meyer & Schroeder
OII Summer Doctoral Programme 2010: Global brain by Meyer & SchroederEric Meyer
 
Curation of Research Data
Curation of Research DataCuration of Research Data
Curation of Research DataMichael Day
 
Slides by Y. Murayama at Japan-France Joint Meeting on Open Access and Open D...
Slides by Y. Murayama at Japan-France Joint Meeting on Open Access and Open D...Slides by Y. Murayama at Japan-France Joint Meeting on Open Access and Open D...
Slides by Y. Murayama at Japan-France Joint Meeting on Open Access and Open D...Yasuhiro Murayama
 
Evolution of e-Research
Evolution of e-ResearchEvolution of e-Research
Evolution of e-ResearchDavid De Roure
 
06 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.201406 e science-bio diversity@ pacc 18.07.2014
06 e science-bio diversity@ pacc 18.07.2014VinothkumaR Ramu
 
201109021 mcguinness ska_meeting
201109021 mcguinness ska_meeting201109021 mcguinness ska_meeting
201109021 mcguinness ska_meetingDeborah McGuinness
 
When The New Science Is In The Outliers
When The New Science Is In The OutliersWhen The New Science Is In The Outliers
When The New Science Is In The Outliersaimsnist
 
Data Science definition
Data Science definitionData Science definition
Data Science definitionCarloLauro1
 
Let's talk about Data Science
Let's talk about Data ScienceLet's talk about Data Science
Let's talk about Data ScienceCarlo Lauro
 
Discover Data Portal
Discover Data PortalDiscover Data Portal
Discover Data PortalTom Loughran
 
Report to the NAC
Report to the NACReport to the NAC
Report to the NACLarry Smarr
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicDavid De Roure
 
Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?Open Data in a Big Data World: easy to say, but hard to do?
Open Data in a Big Data World: easy to say, but hard to do?LEARN Project
 
Research Data Sharing LERU
Research Data Sharing LERU Research Data Sharing LERU
Research Data Sharing LERU LIBER Europe
 

Advanced Data Mining and Integration Research for Europe (ADMIRE)

  • 1. Advanced Data Mining and Integration Research for Europe (ADMIRE). Jano van Hemert, research.nesc.ac.uk. [Logo: The University of Edinburgh]
  • 2. COMPUTER SCIENCE. Beyond the Data Deluge. Gordon Bell¹, Tony Hey¹, Alex Szalay². Science, Vol 323, 6 March 2009, p. 1297. Published by AAAS. Downloaded from www.sciencemag.org on July 6, 2009.

    The demands of data-intensive science represent a challenge for diverse scientific communities.

    Since at least Newton’s laws of motion in the 17th century, scientists have recognized experimental and theoretical science as the basic research paradigms for understanding nature. In recent decades, computer simulations have become an essential third paradigm: a standard tool for scientists to explore domains that are inaccessible to theory and experiment, such as the evolution of the universe, car passenger crash testing, and predicting climate change. As simulations and experiments yield ever more data, a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science (1). For example, new types of computer clusters are emerging that are optimized for data movement and analysis rather than computing, while in astronomy and other sciences, integrated data systems allow data analysis and storage on site instead of requiring download of large amounts of data.

    Today, some areas of science are facing hundred- to thousandfold increases in data volumes from satellites, telescopes, high-throughput instruments, sensor networks, accelerators, and supercomputers, compared to the volumes generated only a decade ago (2). In astronomy and particle physics, these new experiments generate petabytes (1 petabyte = 10^15 bytes) of data per year. In bioinformatics, the increasing volume (3) and the extreme heterogeneity of the data are challenging scientists (4). In contrast to the traditional hypothesis-led approach to biology, Venter and others have argued that a data-intensive inductive approach to genomics (such as shotgun sequencing) is necessary to address large-scale ecosystem questions (5, 6).

    Other research fields also face major data management challenges. In almost every laboratory, “born digital” data proliferate in files, spreadsheets, or databases stored on hard drives, digital notebooks, Web sites, blogs, and wikis. The management, curation, and archiving of these digital data are becoming increasingly burdensome for research scientists.

    Over the past 40 years or more, Moore’s Law has enabled transistors on silicon chips to get smaller and processors to get faster. At the same time, technology improvements for disks for storage cannot keep up with the ever increasing flood of scientific data generated by the faster computers. In university research labs, Beowulf clusters (groups of usually identical, inexpensive PC computers that can be used for parallel computations) have

    [Figure: Moon and Pleiades from the VO. Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the moon, synthesized within the World Wide Telescope service. Credit: Jonathan Fay/Microsoft]

    ¹Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. ²Department of Physics and Astronomy, Johns Hopkins University, 3701 San Martin Drive, Baltimore, MD 21218, USA. E-mail: szalay@jhu.edu
  • 3. Beyond the Data Deluge (Science, Vol 323, 6 March 2009, p. 1297): the same page as slide 2, shown again.
  • 4. Vol 455 | 4 September 2008. BOOKS & ARTS. Distilling meaning from data.

    Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.

    It is a breathtaking time in science as masses of data pour in, promising new insights. But how can we find meaning in these terabytes? To search successfully for new science in large datasets, we must find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations. Old habits of representing data can fail to meet these challenges, preventing us from reaching beyond the familiar questions and answers.

    To extract new meaning from the sea of data, scientists have begun to embrace the tools of visualization. Yet few appreciate that visual representation is also a form of communication. A rich body of communication expertise holds the potential to greatly improve these tools. We propose that graphic artists, communicators and visualization scientists should be brought into conversation with theorists and experimenters before all the data have been gathered. If we design experiments in ways that offer varied opportunities for representing and communicating data, techniques for extracting new understanding can be made available.

    Visual representation is familiar in data-intensive fields. Years before a detector is built for a facility such as the Large Hadron Collider near Geneva, for example, physicists will have pored over simulations. They examine how important events will ‘look’ in the displays that reveal and communicate what is going on inside the machine. Such discussions tend to take place within the visual conventions of a field. But perhaps conversations might be broadened to consider alternative representations of the same data. These might suggest other approaches to collecting, organizing and querying data that will maximize the transparency of experimental results and thus aid intuition, discovery and communication.

    Unfortunately, visualization experts and communicators are often consulted only after [...] they will create effective computer displays, slides and figures for publication. Meanwhile, they may be developing their tools in isolation, kept at arm’s length by scientists who are busy getting their experiments done. Opportunities for useful dialogue are thus squandered.

    When scientists, graphic artists, writers, animators and other designers come together to discuss problems in the visual representation of science, such as at the Image and Meaning workshops run by Harvard University (www.imageandmeaning.org), it becomes clear that representations repeatedly fail to communicate understanding or address obvious questions about the underlying data. A three-dimensional volume rendering may give no hint of important uncertainties or data gaps; solid surfaces or sharp edges may suggest data where they do not exist. A graphic artist might propose ways to reveal gaps or deviations from expectation early in an experiment, guiding subsequent data collection or highlighting new avenues of enquiry. When we asked Harvard University chemist George Whitesides to change the geometry of a self-assembled monolayer with clearly delineated hydrophobic and hydrophilic areas to create an image for submission to a journal, he found himself redesigning the experiment, and unexpected science emerged.

    [...] those run by the US National Science Foundation’s Picturing to Learn project (www.picturingtolearn.org), teach us that attempting to visually communicate scientific data and concepts opens a path to understanding. When science and design students collaborate, their drive to understand one another’s ideas pushes them to create new ways of seeing science. Investment in visual communication training for young scientists will pay off handsomely for any data-intensive discipline.

    The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding, by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists, we cannot be sure whether a display is accurate or misleading.

    The greatest opportunity and risk lie in that last step in the path: understanding. Whether verbal or visual, any language that is garbled and inconsistent fails to do its job. Let’s talk. Let’s all talk.

    Felice Frankel is senior research fellow in the faculty of arts and sciences at Harvard University, Cambridge, Massachusetts 02138, USA. With G. M. Whitesides, she is co-author of On the Surface of Things: Images of the Extraordinary in Science. e-mail: felice_frankel@harvard.edu
    Rosalind Reid is executive director of the Initiative in Innovative Computing at Harvard University and former Editor of American Scientist.

    [Figure: Discussing visual communication before designing experiments may reveal new science. Credit: D. Armendariz]

    [Overlaid page] Vol 440 | 23 March 2006. COMMENTARY. Exceeding human limits.

    Scientists are turning to automated processes and technologies in a bid to cope with ever higher volumes of data. But automation offers so much more to the future of science than just data handling, says Stephen H. Muggleton.

    The collection and curation of data throughout the sciences is becoming increasingly automated. For example, a single high-throughput experiment in biology can easily generate more than a gigabyte of data per day, and in astronomy automatic data collection leads to more than a terabyte of data per night. Throughout the sciences the volumes of archived data are increasing exponentially, supported not only by low-cost digital storage but also by the growing efficiency of automated instrumentation. It is clear that the future of science involves the expansion of automation in all its aspects: data

    [Image credit: Firefly Productions/Corbis]
  • 5. Exceeding human limits (Stephen H. Muggleton, Vol 440 | 23 March 2006): cropped excerpts and pull-quotes, shown over the pages from slide 4.

    “Owing to the scale and rate of data generation, computational models of scientific data now require automatic construction and modification.”

    “There is a severe danger that increases in speed and volume of data generation could lead to decreases in comprehensibility.”

    [...] and charge distribution of individual molecules need to be integrated with models describing the interdependency of chemical reactions. [...] differences in the mathematical underpinnings of, say, differential equations, bayesian networks and logic programs make integrating these various models virtually impossible. Although hybrid models can be built by simply patching two models together, the underlying differences lead to unpredictable and error-prone behaviour when changes are made.

    One encouraging development in this respect is the emergence within computer science of new formalisms (5) that integrate, in a sound fashion, two major branches of mathematics: mathematical logic and probability calculus. Mathematical logic provides a formal foundation for logic programming languages such as Prolog, whereas probability calculus provides the basic axioms of probability for [...] bayesian networks.

    On such timescales it should become easier for scientists to reproduce new experiments and refute their hypotheses.

    Today’s generation of microfluidic machines is designed to carry out a specific series of chemical reactions. However, further flexibility could be added to this tool kit by developing what one might call a ‘chemical Turing machine’. The universal Turing machine, devised in 1936 by Alan Turing, was intended to mimic the pencil-and-paper operations of a mathematician. The chemical Turing machine would be a universal processor capable of performing a broad range of chemical operations on both the reagents available to it at the start and those chemicals it later generates. The machine would automatically prepare and test chemical compounds but it would also be programmable, thus allowing much the same flexibility as a real chemist has in the lab. One can think of a chemical Turing

    Stephen H. Muggleton is [...] Computing and the Centre [...] Systems Biology at Imper[...] 2BZ, UK.
that the future of science involves the communicators are often consulted only after science emerged. clear expansion of automation in all its aspects: data
  • 9. Aims • ADMIRE aims to deliver a consistent and easy-to-use technology for extracting information and knowledge. • The project is motivated by the difficulty of extracting meaningful information by data mining combinations of data from multiple heterogeneous and distributed resources. • It will also provide an abstract view of data mining and integration, which will give users and developers the power to cope with complexity and heterogeneity of services, data and processes.
  • 10. [Diagram] Computational Thinkers (Creating: data models & computational methods); Domain Specialists (Interaction: experiments & knowledge creation); Data-Intensive Engineers (Execution: implementations, compute & data resources). Labels between the groups: Formulation, Mapping, Steering.
  • 11. Separating concerns [Diagram] User and application diversity: iterative DMI process development. Tool level (accommodating): many application domains, many tool sets, many process representations, many working practices. Gateway interface (one model): DMI canonical representation and abstract machine. Enactment level (composing or hiding): many autonomous resources & services, multiple enactment mechanisms, multiple platform implementations. Mapping, optimisation and enactment. System diversity and complexity.
  • 15. Use case
    TS23.embryo.organ system.sensory organ.nose.nasal cavity.epithelium.olfactory
    TS23.embryo.organ system.visceral organ.alimentary system.oral region.upper jaw.tooth.incisor
    TS23.embryo.organ system.visceral organ.alimentary system.oral region.lower jaw.tooth.incisor
    TS23.embryo.organ system.visceral organ.liver and biliary system.liver.lobe
  • 16. Formulation [Workflow diagram] Phases: training, testing, deployment. Inputs: images, manual annotations. Steps: image integration, image processing, feature generation, feature selection/extraction, classifier construction, prediction evaluation, apply classifier. Output: automatic annotations.
  • 20. Formulation [Workflow diagram, as on slide 16: training, testing and deployment phases] Labels on the slide: Data-Intensive Systems Process Engineering Language, Java, OGSA-DAI. Code shown on the slide:
      /* import non-universal components from the computational environment */
      import uk.org.ogsadai.SQLQuery; // get definition of SQLQuery
      import uk.org.ogsadai.TupleToWebRowSetCharArrays; // serialisation
      import uk.org.ogsadai.DeliverToRequestStatus;
      /* construct and identify instances of the PE */
      SQLQuery query = new SQLQuery();
      TupleToWebRowSetCharArrays wrs = new TupleToWebRowSetCharArrays();
      DeliverToRequestStatus del = new DeliverToRequestStatus();
      /* form connection c1 with an explicit literal stream expression as its source and query as its destination */
      String q1 = "SELECT * FROM weather";
      |- q1 -| => expression->query;
      String resourceID = "MySQLResource";
      |- resourceID -| => resource->query;
      query->data => data->wrs;
      wrs->result => input->del;
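The slide 20 code connects three processing elements in a chain: a query, a serialisation step and a delivery step. As a rough illustration of that streaming pattern only, here is a minimal Python sketch using generators. It does not use OGSA-DAI or the slide's language: the function names, the in-memory SQLite table and the sample rows are all hypothetical stand-ins.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, Tuple


def sql_query(conn: sqlite3.Connection, expression: str) -> Iterator[Tuple]:
    """Stand-in for the query element: stream the header, then each result row."""
    cursor = conn.execute(expression)
    yield tuple(col[0] for col in cursor.description)  # column names first
    for row in cursor:
        yield row


def tuples_to_text(rows: Iterable[Tuple]) -> Iterator[str]:
    """Stand-in for the serialisation element: one CSV line per tuple."""
    for row in rows:
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="").writerow(row)
        yield buffer.getvalue()


def deliver(lines: Iterable[str]) -> str:
    """Stand-in for the delivery element: collect the stream into one result."""
    return "\n".join(lines)


def run_pipeline(conn: sqlite3.Connection, expression: str) -> str:
    """Chain the three elements: query -> serialise -> deliver."""
    return deliver(tuples_to_text(sql_query(conn, expression)))


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory table; the rows are made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (city TEXT, temp_c REAL)")
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?)",
        [("Edinburgh", 11.5), ("Vienna", 17.0)],
    )
    conn.commit()
    return conn
```

Calling `run_pipeline(demo_connection(), "SELECT * FROM weather ORDER BY city")` returns the serialised rows as one string. Because each stage is a generator, rows flow through one at a time, which is the property a data-streaming workflow relies on.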
  • 23. Architecture results [Plot: speedup (y-axis, 0 to 5) versus number of images (x-axis, 5000 to 20000), one series each for 2, 3, 4, 5, 6, 7 and 8 nodes.]
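The slide does not say how speedup was computed. A common definition, assumed here, is the single-node execution time divided by the execution time on n nodes. The sketch below uses that definition; the function names and the timings in the usage note are hypothetical, not values from the plot.

```python
def speedup(time_one_node: float, time_n_nodes: float) -> float:
    """Speedup under the assumed definition T(1) / T(n)."""
    if time_n_nodes <= 0:
        raise ValueError("execution time must be positive")
    return time_one_node / time_n_nodes


def efficiency(time_one_node: float, time_n_nodes: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes used."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return speedup(time_one_node, time_n_nodes) / nodes
```

For example, with made-up timings of 1000 s on one node and 250 s on eight nodes, `speedup(1000.0, 250.0)` gives 4.0 and `efficiency(1000.0, 250.0, 8)` gives 0.5.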
  • 24. The ‘hump’ [Plot: workflow execution time. Processing time in seconds (y-axis, 0 to 6000), broken down by PE1 to PE9, versus number of computing nodes (1 to 8), in three panels: 6400 images, 12800 images, 19200 images.]
  • 26. Data mining results
    Table 1. The preliminary result of classification performance using 10-fold validation
    Gene expression     Sensitivity   Specificity
    Humerus             0.7525        0.7921
    Handplate           0.7105        0.7231
    Fibula              0.7273        0.718
    Tibia               0.7467        0.7451
    Femur               0.7241        0.7345
    Ribs                0.5614        0.7538
    Petrous part        0.7903        0.7538
    Scapula             0.7882        0.7099
    Head mesenchyme     0.7857        0.5507
    Note: Sensitivity: true positive rate. Specificity: true negative rate.
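The table note defines sensitivity as the true positive rate and specificity as the true negative rate. A minimal sketch of how those two rates follow from binary labels is given below. The function names and the labels in the usage note are hypothetical; this is not the project's evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, where 1 means 'expressed'."""
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives that were predicted positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives that were predicted negative."""
    return tn / (tn + fp)
```

With made-up labels `actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]`, the counts are 3 true positives, 1 false negative, 4 true negatives and 2 false positives, so sensitivity is 0.75 and specificity is 4/6. In a 10-fold validation these rates would be computed on each held-out fold and then combined.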
  • 27. Where we are • Architecture prototype works • Intuitive workbench created • Will be connected next • Two more use cases
  • 28. Team (http://www.admire-project.eu/) National e-Science Centre: Malcolm Atkinson, Jano van Hemert, Liangxiu Han, Gagarine Yaikhom, Chee-Sun Liew. EPCC: Mark Parsons et al. University of Vienna: Peter Brezany et al. Universidad Politécnica de Madrid: Oscar Corcho. Slovak Academy of Sciences: Ladislav Hluchý. Fujitsu Labs Europe: David Snelling. ComArch SA: Marcin Choiński. http://research.nesc.ac.uk/

Editor's Notes

  1. * This is not about projects, publications * Where did we suddenly appear from
  2. * One of the papers that is signposting
  3. * Sensors, large machines, interaction with data (software), interaction between people, interaction of software on data, ...
  4. * More explicit forms of demands
  5. * More explicit forms of demands
  6. * A proposed solution * How do you go about implementing a solution under the fourth paradigm?