Data-Intensive Research
Edinburgh Data-Intensive Research. The term "data-intensive" refers to huge volumes of data, complex patterns of data integration and analysis, and intricate interactions between data and users. Current methods and tools are failing to address data-intensive challenges effectively. They fail for several reasons, all of which are aspects of scalability: the deluge of computational methods and the plethora of computational systems prevent effective and efficient use of resources; user interfaces are not adopted quickly enough to satisfy the demand for scientific computing; and data and knowledge are created outside the contexts that collaborative research needs in order to be effective. The Edinburgh Data-Intensive Research group addresses these scalability issues by providing mappings from abstract formulations of research challenges to concrete, optimised executions; by developing intuitive interfaces through which researchers can access and steer these executions; and by developing systems that help create new research challenges. In this talk I will present several exemplars where we have dealt with scalability issues in scientific scenarios.

Published in: Technology, Education
Notes

  • * Research focuses on progressing computer science
    * by evaluating both generic and tailored methodologies
    * in a multidisciplinary context with
    * rich use cases to test hypotheses
  • * Formulation = an abstract description of the data-intensive challenge
    * Execution = an implementation of the challenge that runs on a computational platform
    * Interaction = necessary to manage the formulation process and to steer the execution
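The three notes above separate what a challenge is (formulation) from how it runs (execution) and how people direct it (interaction). As an illustration only, the Java sketch below keeps those concerns apart: a hypothetical `Formulation` names abstract steps, a hypothetical `Mapper` binds each step to a concrete implementation, and the resulting `Execution` checks a steering flag before each step so that an interaction layer can stop it. None of these classes come from the talk or from any Edinburgh software, and modelling the data as a list of integers is purely an assumption for the sake of a runnable example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.UnaryOperator;

// Hypothetical sketch: all class names are illustrative, not from any real toolkit.
// Data is modelled as a list of integers purely for illustration.
class Formulation {
    // Abstract description: an ordered list of step names, with no implementation attached.
    final List<String> steps;
    Formulation(List<String> steps) { this.steps = steps; }
}

class Mapper {
    // Registry of concrete implementations, keyed by abstract step name.
    private final Map<String, UnaryOperator<List<Integer>>> registry = new LinkedHashMap<>();

    void register(String step, UnaryOperator<List<Integer>> impl) { registry.put(step, impl); }

    // Mapping: bind every abstract step to a concrete implementation, failing early if one is missing.
    Execution map(Formulation f) {
        List<UnaryOperator<List<Integer>>> plan = new ArrayList<>();
        for (String step : f.steps) {
            UnaryOperator<List<Integer>> impl = registry.get(step);
            if (impl == null) throw new IllegalArgumentException("no implementation for step: " + step);
            plan.add(impl);
        }
        return new Execution(plan);
    }
}

class Execution {
    private final List<UnaryOperator<List<Integer>>> plan;
    // Steering: an interaction layer can set this flag to stop the run between steps.
    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    int stepsCompleted = 0;

    Execution(List<UnaryOperator<List<Integer>>> plan) { this.plan = plan; }

    List<Integer> run(List<Integer> input) {
        List<Integer> data = input;
        for (UnaryOperator<List<Integer>> op : plan) {
            if (stopRequested.get()) break; // honour a steering request before starting the next step
            data = op.apply(data);
            stepsCompleted++;
        }
        return data;
    }
}
```

Because the formulation only holds step names, the same formulation could in principle be mapped onto different registries (for example a local one for testing and another backed by remote resources) without changing its description.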



  • * Scaling 1: from Rapid to portal building
    * Scaling 2: from portal to Gaussian use (140 students)
    * Mention myExperiment
  • Transcript

    • 1. Edinburgh Data-Intensive Research. Jano van Hemert, research.nesc.ac.uk
    • 2. Computer Science Research: efficient distributed systems, effective algorithms, data-intensive computing
    • 3. Computer Science Research (efficient distributed systems, effective algorithms, data-intensive computing) paired with Interdisciplinary Applications (reusable computational models, intuitive interfaces, collaborative environments); new conceptual models for systems
    • 4. Interdisciplinary Applications: reusable computational models, intuitive interfaces, collaborative environments. Domains: Developmental Biology, Medical Genetics, Emergency Response, Chemistry, Brain Imaging, Neuro-informatics, Quantitative Genetics, Seismology. [Slide background: excerpt from a NERIES project newsletter on the Virtual European Broad-band Seismograph Network (VEBSN), including the caption "The Greek earthquake of February 14, 2008 as recorded by the vertical component of broadband stations of the VEBSN (mainly in the European-Mediterranean area) and made available by ORFEUS."]
    • 5. Computational Thinkers: Formulation (data models & computational methods). Domain Specialists: Interaction (experiments & knowledge creation). Data-Intensive Engineers: Execution (implementations, compute & data resources). Mapping leads from Formulation to Execution, Steering from Interaction to Execution, and Creating connects Interaction and Formulation. Edinburgh Data-Intensive Research
    • 7. Interaction (experiments & knowledge creation), with Steering leading to Execution (Data-Intensive Engineers)
    • 10. Figure 3: Screenshots of the DGEMap Web Portal, showing the facility for adding new project details to the database. (Deliverable D2.8, Design Study Contract number 011993, page 2)
    • 11. The same DGEMap Web Portal screenshot as slide 10, overlaid with question marks.
    • 12. Scaling • More users able to join in • Deal with more experiments • Better reproducibility (in progress) Want your own scientific computing portal? Ask me!
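    • 13. Computational Thinkers: Formulation (data models & computational methods). Domain Specialists: Interaction (experiments & knowledge creation). Data-Intensive Engineers: Execution (implementations, compute & data resources). Mapping leads from Formulation to Execution, Steering from Interaction to Execution, and Creating connects Interaction and Formulation.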
    • 14. Formulation (data models & computational methods), with Mapping leading to Execution (Data-Intensive Engineers)
    • 15. Formulation (data models & computational methods): Classification of Gene Expression Patterns, with Mapping leading to Execution (Data-Intensive Engineers)
    • 16. Formulation of the annotation workflow (diagram labelled "Java"). Phases: training, testing, deployment. Components: images, manual annotations, image integration, image processing, feature generation, feature selection/extraction, classifier construction, prediction evaluation, apply classifier, automatic annotations. Image processing, feature generation and feature selection/extraction appear in both the training and testing phases.
    • 17. As slide 16, alongside an example in the Data-Intensive Systems Process Engineering Language (DISPEL):
        /* import non-universal components from the computational environment */
        import uk.org.ogsadai.SQLQuery;                    // get definition of SQLQuery
        import uk.org.ogsadai.TupleToWebRowSetCharArrays;  // serialisation
        import uk.org.ogsadai.DeliverToRequestStatus;      // deliver results to the request status
        /* construct and identify instances of the PEs (processing elements) */
        SQLQuery query = new SQLQuery();
        TupleToWebRowSetCharArrays wrs = new TupleToWebRowSetCharArrays();
        DeliverToRequestStatus del = new DeliverToRequestStatus();
        /* form connection c1 with an explicit literal stream expression
           as its source and query as its destination */
        String q1 = "SELECT * FROM weather";
        |- q1 -| => expression->query;
        /* feed the resource identifier into query's resource input */
        String resourceID = "MySQLResource";
        |- resourceID -| => resource->query;
        /* stream query results to the serialiser, then deliver the serialised result */
        query->data => data->wrs;
        wrs->result => input->del;
    • 19. As slide 17, with an added "OGSA-DAI" label.
    • 22. TS23.embryo.organ system.sensory organ.nose.nasal cavity.epithelium.olfactory
        TS23.embryo.organ system.visceral organ.alimentary system.oral region.upper jaw.tooth.incisor
        TS23.embryo.organ system.visceral organ.alimentary system.oral region.lower jaw.tooth.incisor
        TS23.embryo.organ system.visceral organ.liver and biliary system.liver.lobe
    • 23. Data mining results
        Table 1. Preliminary classification performance using 10-fold cross-validation
        Gene expression    Sensitivity   Specificity
        Humerus            0.7525        0.7921
        Handplate          0.7105        0.7231
        Fibula             0.7273        0.718
        Tibia              0.7467        0.7451
        Femur              0.7241        0.7345
        Ribs               0.5614        0.7538
        Petrous part       0.7903        0.7538
        Scapula            0.7882        0.7099
        Head mesenchyme    0.7857        0.5507
        Note: Sensitivity: true positive rate ("how well we can predict that it is there"). Specificity: true negative rate ("how well we can predict that it is not there").
    • 24. Scaling • Size of experiment • Volume of data • Available resources Want your own (distributed) data integration & mining? Ask me!
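    • 25. Computational Thinkers: Formulation (data models & computational methods). Domain Specialists: Interaction (experiments & knowledge creation). Data-Intensive Engineers: Execution (implementations, compute & data resources). Mapping leads from Formulation to Execution, Steering from Interaction to Execution, and Creating connects Interaction and Formulation.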
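    • 26. Creating (cropped view of the framework diagram; partially visible: Domain Specialists, Interaction, Experiments & knowledge creation)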
    • 27. Spatial atlases for developmental biology (shown over the same cropped diagram view as slide 26)
    • 28. Next Generation Embryology ≈ Google Maps for Developmental Biology http://research.nesc.ac.uk/nextgenerationembryology
    • 29. Annotating on-line
    • 30. Scaling • Larger collaborations • Handle more & diverse knowledge • Speed-up “Fourth Paradigm” (http://bit.ly/dwQzYe) Want your own 3D visualisation & annotation? Ask me!
    • 31. Multi-disciplinary
        [1] D. Rodríguez González, T. Carpenter, J.I. van Hemert, and J. Wardlaw. An open source toolkit for a medical imaging de-identification. European Radiology, first published online, 2010.
        [2] R.R. Kitchen, V.S. Sabine, A.H. Sims, E.J. Macaskill, L. Renshaw, J.S. Thomas, J.I. van Hemert, J.M. Dixon, and J.M.S. Bartlett. Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles. BMC Genomics, 11, 2010.
        [3] C.A. Morrison, N. Robertson, A. Turner, J. van Hemert, and J. Koetsier. Molecular Orbital Calculations of Inorganic Compounds, chapter 3.33, pages 261–267. Wiley-VCH, 3rd edition, 2010.
        [4] A. Tichopad, T. Bar, L. Pecen, R.R. Kitchen, M. Kubista, and M.W. Pfaffl. Quality control for quantitative PCR based on amplification compatibility test. Methods, 50:308–312, 2010.
        [5] R.R. Kitchen, M. Kubista, and A. Tichopad. Statistical aspects of quantitative real-time PCR experiment design. Methods, 50:231–236, 2010.
        [6] J. Koetsier, A. Turner, P. Richardson, and J.I. van Hemert. Rapid chemistry portals through engaging researchers. In IEEE 5th International Conference on e-Science, in press, 2009.
        [7] L. Han, J. van Hemert, R. Baldock, and M.P. Atkinson. Automating gene expression annotation for mouse embryo. In R. Huang, Q. Yang, J. Pei et al., editors, Advanced Data Mining and Applications, 5th International Conference, volume LNAI 5678. Springer, 2009.
        [8] J. O'Donoghue and J.I. van Hemert. Using the DCC Lifecycle Model to curate a gene expression database: A case study. International Journal of Digital Curation, in press, 2009.
        [9] J.D. Armstrong and J.I. van Hemert. Towards a virtual fly brain. Philosophical Transactions A, 367(1896):2387–2397, June 2009.
    • 33. Edinburgh Data-Intensive Research. Jano van Hemert, j.vanhemert@ed.ac.uk
        Academics: Malcolm Atkinson
        Research Assistants: Jos Koetsier, Liangxiu Han, David Rodriguez, Gagarine Yaikhom, Laura Valkonen
        PhD Students: Thomas French, Luna De Ferrari, Rob Kitchen, Chee-Sun Liew
        Research Students: Gary, Vijay, Hwee, Yue, Charalampos, Jeff, Gideon, Charis, Gareth, Harika, Andrejs
        IDEA Lab 29: Fan Zhu, "A scientific gateway for real time geophysical experiments"
        http://research.nesc.ac.uk/partners/
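The DISPEL fragment on slide 17 wires three processing elements (PEs) into a stream. Reading `query->data => data->wrs` as "connect the `data` output of `query` to the `data` input of `wrs`", the literal query text and resource ID feed `query`, its results feed the serialiser `wrs`, and `wrs` feeds the delivery PE `del`. The Java sketch below imitates that wiring with in-memory queues so the dataflow can be run and inspected. It does not use OGSA-DAI: `Pipe`, `SqlQueryPE`, `SerialisePE`, `DeliverPE` and `WeatherWorkflow` are hypothetical stand-ins, the "database" is an in-memory map, and the comma-separated output is an assumption rather than WebRowSet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of the slide-17 dataflow; none of these classes are OGSA-DAI's.

// A connection between two PEs: an unbounded FIFO of items.
class Pipe<T> {
    private final Queue<T> items = new ArrayDeque<>();
    void put(T item) { items.add(item); }
    T take() { return items.poll(); }            // returns null when empty
    boolean isEmpty() { return items.isEmpty(); }
}

// Stand-in for SQLQuery: reads a query and a resource ID, emits one tuple per row.
class SqlQueryPE {
    final Pipe<String> expression = new Pipe<>();
    final Pipe<String> resource = new Pipe<>();
    final Pipe<List<String>> data = new Pipe<>();
    // Assumed in-memory "database": resource ID -> table name -> rows.
    private final Map<String, Map<String, List<List<String>>>> resources;

    SqlQueryPE(Map<String, Map<String, List<List<String>>>> resources) { this.resources = resources; }

    void process() {
        String sql = expression.take().trim();
        String resourceId = resource.take();
        String prefix = "SELECT * FROM ";
        // Only the single query form used on the slide is supported.
        if (!sql.startsWith(prefix)) throw new IllegalArgumentException("unsupported query: " + sql);
        String table = sql.substring(prefix.length()).trim();
        for (List<String> row : resources.get(resourceId).get(table)) data.put(row);
    }
}

// Stand-in for TupleToWebRowSetCharArrays: serialises each tuple as comma-separated text.
class SerialisePE {
    final Pipe<List<String>> data;
    final Pipe<String> result = new Pipe<>();
    SerialisePE(Pipe<List<String>> data) { this.data = data; }

    void process() {
        while (!data.isEmpty()) result.put(String.join(",", data.take()));
    }
}

// Stand-in for DeliverToRequestStatus: collects everything it receives.
class DeliverPE {
    final Pipe<String> input;
    final List<String> delivered = new ArrayList<>();
    DeliverPE(Pipe<String> input) { this.input = input; }

    void process() {
        while (!input.isEmpty()) delivered.add(input.take());
    }
}

class WeatherWorkflow {
    // Wires the PEs as on slide 17 and runs them one after another (no concurrency).
    static List<String> run(Map<String, Map<String, List<List<String>>>> resources) {
        SqlQueryPE query = new SqlQueryPE(resources);
        query.expression.put("SELECT * FROM weather");  // |- q1 -| => expression->query
        query.resource.put("MySQLResource");            // |- resourceID -| => resource->query
        SerialisePE wrs = new SerialisePE(query.data);  // query->data => data->wrs
        DeliverPE del = new DeliverPE(wrs.result);      // wrs->result => input->del
        query.process();
        wrs.process();
        del.process();
        return del.delivered;
    }
}
```

Here a connection is modelled as two PEs sharing one pipe. A real enactment engine would run the PEs concurrently and stream data between them, rather than running each to completion in turn as this sketch does.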
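The anatomy terms on slide 22 are dot-separated paths from a stage-level root down to a specific structure; TS23 is, as far as we can tell, Theiler Stage 23 in the EMAP mouse anatomy nomenclature. A minimal Java sketch, assuming "." only ever acts as a separator and never occurs inside a component name, splits such paths and finds the deepest shared ancestor of two terms, which gives a rough measure of how closely two annotated structures are related. The class `AnatomyPath` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for dot-separated anatomy paths such as
// "TS23.embryo.organ system.visceral organ.liver and biliary system.liver.lobe".
// Assumes '.' only ever appears as a separator, never inside a component name.
class AnatomyPath {
    static List<String> components(String path) {
        return Arrays.asList(path.split("\\."));
    }

    // Longest common prefix of the two paths, i.e. their deepest shared ancestor ("" if none).
    static String commonAncestor(String a, String b) {
        List<String> pa = components(a), pb = components(b);
        List<String> shared = new ArrayList<>();
        for (int i = 0; i < Math.min(pa.size(), pb.size()) && pa.get(i).equals(pb.get(i)); i++) {
            shared.add(pa.get(i));
        }
        return String.join(".", shared);
    }

    // Number of edges between the two terms, going up to the shared ancestor and back down.
    static int distance(String a, String b) {
        String ancestor = commonAncestor(a, b);
        int shared = ancestor.isEmpty() ? 0 : components(ancestor).size();
        return (components(a).size() - shared) + (components(b).size() - shared);
    }
}
```

For the two incisor terms on the slide, the shared ancestor is the oral region, three levels above each term, so they are six edges apart; the liver lobe only meets them at "visceral organ".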
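Table 1 on slide 23 reports sensitivity (true positive rate) and specificity (true negative rate) under 10-fold cross-validation. The Java sketch below shows how those two rates follow from confusion-matrix counts, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and shows one simple way to assign items to 10 folds (round-robin by index, without stratification). The fold scheme and the class name `Metrics` are assumptions for illustration; the talk does not say how its folds were built.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the fold scheme is an assumption, not the one used in the talk.
class Metrics {
    // Sensitivity (true positive rate): of the items where the structure IS expressed,
    // the fraction the classifier labels positive. TP / (TP + FN).
    static double sensitivity(boolean[] actual, boolean[] predicted) {
        int tp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) { if (predicted[i]) tp++; else fn++; }
        }
        return tp + fn == 0 ? Double.NaN : (double) tp / (tp + fn);
    }

    // Specificity (true negative rate): of the items where the structure is NOT expressed,
    // the fraction the classifier labels negative. TN / (TN + FP).
    static double specificity(boolean[] actual, boolean[] predicted) {
        int tn = 0, fp = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!actual[i]) { if (predicted[i]) fp++; else tn++; }
        }
        return tn + fp == 0 ? Double.NaN : (double) tn / (tn + fp);
    }

    // Round-robin assignment of n item indices to k folds. In round f, fold f is the
    // test set and the remaining k - 1 folds form the training set.
    static List<List<Integer>> folds(int n, int k) {
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) result.add(new ArrayList<>());
        for (int i = 0; i < n; i++) result.get(i % k).add(i);
        return result;
    }
}
```

The table does not say whether its rates were computed from counts pooled over all ten test folds or averaged per fold; either can be built from these helpers.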
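Slide 28 compares the atlas viewer to Google Maps. Web map viewers of that kind commonly serve a large image as a pyramid of fixed-size tiles, halving the resolution at each level, so a browser only fetches the tiles in view. Assuming, hypothetically, that an embryo section image were served the same way (the talk does not describe the viewer's internals), the Java sketch below computes how many tiles each level needs and which tile holds a given pixel. The class `TilePyramid` and the 256-pixel tile size are assumptions.

```java
// Hypothetical tile-pyramid arithmetic; tile size and level scheme are assumptions.
class TilePyramid {
    static final int TILE = 256; // assumed tile edge length in pixels

    // Image side length at a level (level 0 = full resolution; each level halves it).
    static int scaled(int fullSide, int level) {
        int factor = 1 << level;
        return (fullSide + factor - 1) / factor; // ceiling division
    }

    // Tiles needed along one axis at a level.
    static int tilesAlong(int fullSide, int level) {
        return (scaled(fullSide, level) + TILE - 1) / TILE;
    }

    // Number of levels, counting level 0, until the whole image fits in a single tile.
    static int levels(int width, int height) {
        int level = 0;
        while (tilesAlong(width, level) > 1 || tilesAlong(height, level) > 1) level++;
        return level + 1;
    }

    // Column and row of the tile containing full-resolution pixel (x, y) at a level.
    static int[] tileFor(int x, int y, int level) {
        return new int[] { (x >> level) / TILE, (y >> level) / TILE };
    }
}
```

For a 1000 × 600 pixel image this gives 4 × 3 tiles at full resolution, 2 × 2 at the next level and a single tile at the third.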
