The document discusses the challenges of data-intensive science across diverse fields. As experiments and simulations generate ever-larger data volumes, a new research paradigm of data-intensive science is emerging. This involves techniques for performing science at extreme scales, such as computer clusters optimized for data analysis. Many fields now face hundred- to thousand-fold increases in data from various instruments. Effectively managing, analyzing, and archiving these "digital deluges" presents significant challenges for scientists.
The demands of data-intensive science represent a challenge for diverse scientific communities. Data volumes from various sources are increasing exponentially, creating data management challenges. New approaches and technologies are needed to enable scientists to effectively analyze and store massive amounts of data.
Understanding the Big Picture of e-Science – Andrew Sallans
E-science involves large-scale collaborative research enabled by new technologies like high-speed networks and cheap data storage. It produces massive amounts of complex data from areas like climate modeling, particle physics experiments, biomedical research grids, and citizen science projects. This represents a major change for research that requires new infrastructure, expertise, and approaches. Universities like UVA are responding by establishing research computing support services in their libraries to help scientists with the computational and data aspects of e-science throughout the research lifecycle.
Knowledge – dynamics – landscape – navigation – what have interfaces to digit... – Andrea Scharnhorst
When we google, search Wikipedia, and share information on Mendeley, we obviously deal with complex networks of information. But traditional information spaces – the collections of libraries, for instance – and their classification systems are also evolving complex systems. This talk explores the possibilities of using concepts and methods from statistical physics to analyze information dynamics. We depart from information dynamics in scholarly communication and point to current encounters between physics and scientometrics. We discuss in more depth the evolution of category systems in libraries (Universal Decimal Classification) in comparison to online spaces (Wikipedia). The talk closes with an introduction to a new European network – the COST Action KnowEscape – in which information professionals, sociologists, computer scientists, physicists, and digital humanities scholars in a unique alliance seek knowledge maps to better navigate large information spaces.
Talk on June 11, 2013 by Andrea Scharnhorst at the IMT in Lucca, Italy.
This document discusses the growing issue of large quantities of data from scientific research and simulations. It notes that data volumes are doubling every 9 months for genomic sequencing and that various scientific projects are now generating petabytes and exabytes of data. Current solutions for data management and analysis are often ad-hoc and inadequate for small and medium scale research. The document advocates for a national research cyberinfrastructure strategy in Canada that could provide standardized services and leverage partnerships with international organizations and commercial cloud providers to support scientists in dealing with large and complex research data. It raises questions about leadership and roles for universities, research organizations, and government in developing solutions.
The document discusses the evolution of science and research from the 1940s to present day. It notes Vannevar Bush's 1945 concerns about the growing mountain of research that scientists did not have time to fully understand or remember. It then discusses the current "data explosion" and challenges of accessing, sharing, and building on increasingly large amounts of data and research. The document advocates for reusable, reproducible, and transparent science through connected resources and environments that facilitate collaboration and knowledge sharing.
This document discusses the need for digital curation specialists in library settings to manage the growing volume of scholarly data and output. It recognizes that libraries have the skills and infrastructure to curate digital resources but will need new roles like digital curators, archivists, and data scientists. These roles require new training programs and concentrations in areas like data curation to develop specialists that can preserve, organize, and provide access to digital collections over the long term.
Building the Pacific Research Platform: Supernetworks for Big Data Science – Larry Smarr
The document summarizes Dr. Larry Smarr's presentation on building the Pacific Research Platform (PRP) to enable big data science across research universities on the West Coast. The PRP provides 100-1000 times more bandwidth than today's internet to support research fields from particle physics to climate change. In under 2 years, the prototype PRP has connected researchers and datasets across California through optical networks and is now expanding nationally and globally. The next steps involve adding machine learning capabilities to the PRP through GPU clusters to enable new discoveries from massive datasets.
The document summarizes the 3DPAS theme of the e-Science Institute, which focuses on dynamic distributed data-intensive programming systems and applications. It discusses 14 example applications in different domains that were analyzed in terms of their data, infrastructure usage, and dynamic properties. It provides more detailed descriptions of 6 applications, covering their data sources, processing workflows, and current and future infrastructure usage. The goal of the 3DPAS theme is to understand the unique challenges of data-intensive applications and identify requirements to support future exascale applications involving distributed and dynamic data and computing.
The document discusses big data in astronomy and the LineA-DEXL case. It provides an outline and introduction to big data in science and hypothesis-driven research. It discusses data management techniques like data partitioning and parallel workflow processing. It then provides details on the Laboratório Nacional de Computação Científica (LNCC) and its role in supporting computational modeling and bioinformatics. It discusses astronomy surveys that generate large amounts of data like the Dark Energy Survey and challenges of data from the Large Synoptic Survey Telescope. Finally, it discusses the need for data infrastructure, metadata management, and distributed data management to support scientific research involving big data.
This document discusses open access to scientific research data. It notes that scientific research is increasingly data-driven and large-scale, especially in fields like high-energy physics, astronomy, and biology. However, inadequate access to research data is a problem, limiting opportunities to reuse data and validate or build upon past findings. The document examines some incentive-based approaches and key developments related to improving data sharing. It provides examples of large-scale data generation projects and challenges around managing and analyzing big data. Overall, the document argues that unrestricted sharing of scientific data deposited in the public domain could accelerate research and advance knowledge.
- Scientific names for species can change over time as taxonomy knowledge evolves
- An event-centric ontology model represents names and changes through time using different URIs for taxon concepts at different times
- Transition and snapshot models can then simplify the descriptions by linking concepts over time or just showing current names
- This approach allows integrated representation of taxonomy knowledge and its revisions in a computable way
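The event-centric model above can be sketched in code. This is a hypothetical, minimal sketch: the class names, the example URIs, and the aster renaming example are illustrative assumptions, not taken from the source's ontology.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an event-centric taxon model: each taxon concept
# gets its own URI per time period, and "transition" links connect
# successive concepts so a name change stays computable.

@dataclass
class TaxonConcept:
    uri: str               # time-specific URI for this concept
    name: str              # scientific name valid in this period
    valid_from: int        # year the concept came into use
    successors: list = field(default_factory=list)  # transition links

def current_name(concept: TaxonConcept) -> str:
    """Snapshot view: follow transition links to the latest concept."""
    while concept.successors:
        concept = concept.successors[-1]
    return concept.name

# A renaming event recorded as two linked, time-stamped concepts
# (the New England aster reclassification, used here as an example).
old = TaxonConcept("ex:taxon/123/1950", "Aster novae-angliae", 1950)
new = TaxonConcept("ex:taxon/123/1995", "Symphyotrichum novae-angliae", 1995)
old.successors.append(new)

print(current_name(old))  # Symphyotrichum novae-angliae
```

The transition model corresponds to walking the `successors` links; the snapshot model corresponds to calling `current_name` and discarding the history.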
Cloud-Based Solutions for Scientific Computing – Ian Lewis
This paper discusses examples of cloud computing for scientific applications, including public, private, and hybrid cloud solutions for scientists, and research into performance-improving mechanisms.
How to use science maps to navigate large information spaces? What is the lin... – Andrea Scharnhorst
A. Scharnhorst (2016) Wie können Wissenschaftskarten zur Suche in grossen Informationsräumen eingesetzt werden? How to use science maps to navigate large information spaces? What is the link between science maps and predictive models of science? Invited lecture Fraunhofer-Institut für Naturwissenschaftlich-Technische Trendanalysen, Euskirchen, Germany, December 7, 2016
The Path to Enlightened Solutions for Biodiversity's Dark Data – vbrant
Large amounts of scientific data remain uncurated, especially small datasets, which are currently invisible or "dark data". This dark data should be curated locally with the involvement of non-scientists at long-lived institutions like libraries and museums that have experience managing scholarly information over time. New roles and skills are needed for data scientists, digital curators, and biological information specialists to help address this problem by developing the necessary cyberinfrastructure, data standards, and educational programs to make more scientific dark data accessible.
Advancing Science through Coordinated Cyberinfrastructure – Daniel S. Katz
How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at First Workshop - Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil 10 APR 2014
This document discusses the need for geoinformatics and cyberinfrastructure in the geosciences. It argues that answering complex scientific questions requires integrating all available data, but that currently it is too difficult to find, work with, and access relevant data and tools. The document advocates for strong partnerships between geoscientists and computer scientists to build user-friendly tools that facilitate data sharing, integration and analysis in order to accelerate scientific progress. It emphasizes that data needs to be organized into databases and data systems with standards and formats to make it easily discovered and used by the broader community.
Rethinking how we provide science IT in an era of massive data but modest bud... – Ian Foster
A talk given in January 2012 at a wonderful conference organized in Zakopane, Poland, by colleagues from the erstwhile GridLab project. I talked about how increasing data volumes demand radically new approaches to delivering research computing. Lively discussion ensued.
Strong field science core proposal for uphill site – ahsanrabbani
This document provides a 3-year project summary for an international collaboration on strong field science:
1. The project aims to strengthen India's participation in experiments studying matter under extreme light intensities through international collaboration.
2. Indian researchers will perform experiments at international facilities and host foreign researchers, while also building expertise for potential future domestic facilities.
3. The project aims to consolidate Indian efforts in this area and increase international exposure, with a goal of developing robust Indian competence in studying matter with petawatt laser pulses.
Data At Risk poster for UNESCO Conference – Chris Muller
We, CODATA's Data-At-Risk Task Group (DARTG), were privileged to run a session at the great "Memory of the World" conference in Vancouver in September 2012. Our theme was the identification and protection of important scientific data that is in danger of being lost forever.
see: Web Page: http://sawconcepts.com/heartbeacon/index.html System of Systems Framework for a Better World: Patent Application #13,573,002: The Heart Beacon Cycle LINK Method. The patent application describes a procedural template framework to form and maintain trade federations among widely distributed organizations, deriving key procedures and building-block components from several key DoD / DARPA system-of-systems projects. The main embodiment is written to be immune to recent Supreme Court / USPTO limitations on software method process patents. The invention improves the military's "greatest invention" system-of-systems situational awareness program by adding metrics and meters to monetize time / space trade federation activities.
The document provides an overview of the NGSP Group's research at Swinburne University of Technology, including three key areas:
1) Data management in cloud computing focusing on storage, placement, and replication strategies.
2) Performance management of scientific workflows, addressing temporal quality of service, constraints, monitoring, and violation handling.
3) Security and privacy protection in the cloud, particularly developing noise obfuscation techniques to protect indirect private information during normal cloud service processes.
The document discusses the concept of "Ba", a shared space for knowledge creation proposed by Japanese philosophers. It provides three types of Ba - physical, virtual, and mental. Ba exists at different levels from individual to organization and beyond. Knowledge is also categorized into explicit and tacit forms. The SECI model is presented as a framework for knowledge conversion between tacit and explicit through socialization, externalization, combination, and internalization processes. Examples of companies applying Ba concepts such as dedicated teams and cross-functional groups at Toshiba and Maekawa are outlined.
The document describes the 8 step data mining process:
1) Defining the problem, 2) Collecting data, 3) Preparing data, 4) Pre-processing, 5) Selecting an algorithm and parameters, 6) Training and testing, 7) Iterating models, 8) Evaluating the final model. It discusses issues like defining classification vs estimation problems, selecting appropriate inputs and outputs, and determining when sufficient data has been collected for modeling.
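The eight steps above can be sketched end to end. This is a stdlib-only toy, not the document's own method: the simulated two-class data and the single-threshold classifier are illustrative stand-ins for whatever algorithm step 5 would actually select.

```python
import random

# 1) Define the problem: classify x as 1 if it comes from the "high" group.
# 2) Collect data (simulated here for self-containment).
random.seed(0)
data = [(random.gauss(0, 1), 0) for _ in range(100)] + \
       [(random.gauss(3, 1), 1) for _ in range(100)]

# 3) Prepare / 4) pre-process: shuffle, then split into train and test sets.
random.shuffle(data)
train, test = data[:150], data[150:]

# 5) Select an algorithm and parameters: here, a single decision threshold.
def fit_threshold(rows):
    """Pick the threshold that maximizes training accuracy."""
    def acc(t):
        return sum((x > t) == bool(y) for x, y in rows) / len(rows)
    return max((x for x, _ in rows), key=acc)

# 6) Train and test.
threshold = fit_threshold(train)
accuracy = sum((x > threshold) == bool(y) for x, y in test) / len(test)

# 7) Iterating models and 8) evaluating the final one would repeat
# steps 5-6 with other algorithms; a single pass is shown.
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
```

The train/test split in steps 3-6 is also where the document's question of "sufficient data" shows up in practice: too small a training set makes the fitted threshold unstable.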
5 Ps of strategy – strategic management – Manu Melwin Joy
Professor Henry Mintzberg articulated the "5 Ps of strategy" – plan, ploy, position, pattern, and perspective – arguing that strategy must be understood through each of these five lenses. The document discusses Mintzberg's conceptualization of each in turn.
The document provides an overview of database architecture and basic concepts such as what a database is, structured query language (SQL), and stored procedures. A database allows for structured storage and retrieval of complex data. SQL is used to manipulate and retrieve data from databases. Stored procedures are programs stored in databases that perform specific tasks like validating arguments. They provide benefits like improved performance and protection of database integrity.
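The concepts above can be sketched with Python's built-in `sqlite3` module. Note one assumption: SQLite has no stored procedures, so a parameterized helper function stands in for one here; in a server database such as PostgreSQL, the same logic would live inside a `CREATE PROCEDURE` body.

```python
import sqlite3

# Structured storage: create a table and insert rows via parameterized SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (name TEXT, size_gb REAL)")
conn.executemany("INSERT INTO datasets VALUES (?, ?)",
                 [("survey_a", 12.5), ("survey_b", 480.0)])

def largest_dataset(conn):
    """Stand-in for a stored procedure: encapsulates a fixed query so
    callers cannot inject arbitrary SQL, protecting database integrity."""
    return conn.execute(
        "SELECT name, size_gb FROM datasets ORDER BY size_gb DESC LIMIT 1"
    ).fetchone()

print(largest_dataset(conn))  # ('survey_b', 480.0)
```

Keeping the query inside one function mirrors the stored-procedure benefits the document mentions: the SQL is validated once, and callers only see a narrow, safe interface.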
The Importance of Listening to Your Customers – Drift
The document discusses the importance of listening to customers and provides examples of companies that failed or succeeded by listening to customer feedback. It introduces the "Spotlight Framework" to categorize customer feedback into user experience issues, product marketing issues, and positioning issues to prioritize responses. It advocates using an incremental approach to make many small updates in response to feedback rather than large changes, in order to strengthen the brand through improved customer experience.
This document discusses employee engagement and its importance. It defines three levels of employee engagement: actively engaged employees who strive to meet and exceed expectations; not engaged employees who feel overlooked and have unproductive relationships; and actively disengaged employees who undermine others and damage the organization. Factors that influence engagement include importance, attrition rates, productivity, costs, and innovation. Measuring engagement involves listening, surveying current levels using tools like the Gallup Q12, and analyzing survey results.
This document introduces the concept of a "fourth paradigm" of scientific discovery based on data-intensive science. It discusses how scientific breakthroughs will increasingly depend on advanced computing capabilities to analyze massive datasets. The document presents perspectives from pioneering computer scientist Jim Gray and others on how data-intensive science represents a new era that requires new tools and ways of working. It also discusses the need for digital libraries and infrastructure to support the long-term curation, analysis, and sharing of scientific data and knowledge.
The document provides instructional steps for a lesson on the solar system. Students will research the planets and sun online and in books. They will take notes and create a model or poster showing the organization of the solar system. Students will present their projects and be assessed using a rubric focusing on inclusion of planets, order, size, color, and facts about each planet. The goal is for students to understand how the solar system is organized and limitations of models in representing it.
Foundations for the future of science discusses using artificial intelligence and machine learning to advance scientific research. Key points discussed include using AI to analyze large datasets, develop scientific models, and automate experimental workflows. The document also outlines several examples of how the Globus data platform is currently enabling AI-powered scientific applications across multiple domains. Overall, the document advocates that embracing "AI for science" has the potential to accelerate scientific discovery by overcoming limitations in human analysis capabilities and computational resources.
OII Summer Doctoral Programme 2010: Global brain by Meyer & Schroeder – Eric Meyer
The document discusses how technology is driving research to become more collaborative globally through distributed and networked tools. It examines several case studies where technologies enabled large-scale collaborative research projects that addressed questions too big for individual labs. These include distributed computing for particle physics, genomic studies, and proteomics. Challenges discussed include interoperability, data sharing policies, and sustaining momentum in infrastructure.
Supplementary presentation slides from a lecture on digital preservation given as part of the MSc in Information and Library Management at the University of the West of England (UWE), Frenchay Campus, Bristol, 10 March 2010.
e-Science and Technology Infrastructure for Biodiversity Research discusses e-science: conducting research using vast computational resources and data shared over the internet, in areas such as astronomy, biology, earth science, and health. Key aspects of e-infrastructure discussed are that it provides on-demand access to distributed resources, much like a power grid, and that it supports scientific discovery through computational tools. Challenges to e-infrastructure include organizational, financial, legal, and technical issues. LifeWatch is highlighted as a European e-science infrastructure providing advanced capabilities for research on biodiversity systems.
The document discusses the evolving landscape of semantic technologies and their applications to scientific domains like eScience. It introduces the Tetherless World Constellation, a research group applying semantic web techniques. Examples are given of projects applying semantics to areas like virtual observatories and provenance capture. The value of semantic technologies is discussed for integration, discovery, and validation of scientific data and models. Modular ontologies and semantically-enabled frameworks are presented as important directions for reuse and collaboration.
1) The document discusses challenges in using machine learning and data analytics for materials science research. Because most materials are irrelevant for a given purpose, models need to identify statistically exceptional subgroups of materials rather than averaging over all data.
2) Two candidate criteria for identifying promising subgroups in catalysis applications are discussed: materials with small oxygen-carbon-oxygen angles, and materials with large carbon-oxygen bond lengths.
3) The concept of a model's domain of applicability is introduced: models perform best when applied only to data similar to what they were trained on, rather than to all data globally, so identifying these reliable domains is important.
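The subgroup idea in point 1 can be sketched on synthetic data: instead of modelling the global average, we score candidate subgroups by how far their mean property deviates from the global mean, weighted by subgroup size. The dataset, the 110-degree O-C-O angle cutoff, and the quality function below are illustrative assumptions, not values from the document.

```python
import random
import statistics

random.seed(0)

# Synthetic materials dataset: each record has a geometric descriptor
# (O-C-O angle, degrees) and a target property (e.g. an adsorption energy).
# Small-angle materials are constructed to be the "exceptional" subgroup.
materials = []
for _ in range(500):
    angle = random.uniform(100.0, 140.0)
    prop = random.gauss(0.0, 0.3)
    if angle < 110.0:
        prop -= 1.5  # the statistically exceptional subgroup
    materials.append({"oco_angle": angle, "property": prop})

global_mean = statistics.mean(m["property"] for m in materials)

def subgroup_quality(selector):
    """Score a subgroup: size-weighted deviation of its mean property from
    the global mean (a common quality function in subgroup discovery)."""
    sub = [m["property"] for m in materials if selector(m)]
    if not sub:
        return 0.0, 0
    weight = len(sub) / len(materials)
    return weight * abs(statistics.mean(sub) - global_mean), len(sub)

# Compare a candidate selector against the trivial "all data" view.
q_small_angle, n = subgroup_quality(lambda m: m["oco_angle"] < 110.0)
q_all, _ = subgroup_quality(lambda m: True)

print(f"global mean property: {global_mean:.2f}")
print(f"small-angle subgroup: n={n}, quality={q_small_angle:.3f}")
print(f"all data: quality={q_all:.3f}")  # zero by construction
```

A global model averages the exceptional subgroup away; the quality score makes the small-angle selector stand out, which is the point of searching for subgroups rather than fitting everything at once.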
This document discusses Jim Gray's vision of a fourth paradigm of scientific discovery based on data-intensive science. It outlines three activities of data-intensive science - data capture, curation, and analysis - and argues that funding is needed to develop tools to support these activities across different scales and types of data. It also discusses the need for digital libraries to archive both data and documents, similar to traditional libraries, to support scientific communication and the construction of a permanent scientific record.
Data Science is an interdisciplinary approach that combines computational science, statistics, and domain knowledge to extract meaningful insights from large and complex data. It aims to address challenges posed by the data revolution, characterized by big data from diverse sources. There is no single agreed-upon definition, but most emphasize applying techniques from computer science, statistics, and the relevant domain to discover patterns, make predictions, and support decision-making from data. Key aspects include developing appropriate methodologies for knowledge discovery, forecasting, and decision-making using large and diverse data from sources such as surveys, social media, and sensors. The integration of domain knowledge representation with computational and statistical tools is seen as an important novelty that can enhance data analysis and interpretation.
The document discusses the importance of data in answering questions and unraveling mysteries across many fields such as life sciences, astronomy, physics, geosciences, and more. It notes that data comes from every field and is shared around the world through various grid infrastructures. The document also highlights how much data is being produced and stored, and emphasizes that computational tools are essential for comprehending and analyzing large data collections.
The ITIC will examine NASA's IT infrastructure, software, and data environments to identify opportunities for improvement. This includes investigating collaborative tools, high performance computing, data storage, and aerospace communications. The committee will also review the OCIO's strategic plans and IT governance across NASA to recommend best practices for managing IT infrastructure. The goal is to help NASA utilize leading edge capabilities and disruptive technologies to enhance distributed teams and mission activities.
The Evolution of e-Research: Machines, Methods and Music (David De Roure)
The document summarizes the evolution of e-research over three generations from 1981 to the present. The first generation saw early adopters using tools within their disciplines with some reuse. The second generation was characterized by increased reuse of tools, data and methods across areas. The third generation is defined by radical sharing of resources globally across any discipline through social networks and reusable research objects. The document also discusses several specific projects and tools that exemplify each generation of e-research including myExperiment, Galaxy, and SALAMI.
Open Data in a Big Data World: easy to say, but hard to do? (LEARN Project)
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
This document summarizes a presentation about opportunities for data exchange and optimizing conditions for data sharing. It discusses several LIBER projects, including Europeana, which aims to make cultural content available online. It notes that with proper infrastructure, researchers can collaborate on shared data sets across locations. Challenges include authentication, skills, and managing the large amounts of data being generated. Overall, the presentation argues that data sharing can advance scientific inquiry if barriers are addressed and key stakeholders work together.
Advanced Data Mining and Integration Research for Europe (ADMIRE)
1. Advanced Data Mining and Integration Research for Europe (ADMIRE)
Jano van Hemert
The University of Edinburgh
research.nesc.ac.uk
2. Beyond the Data Deluge
Gordon Bell (1), Tony Hey (1), Alex Szalay (2)
COMPUTER SCIENCE. Science, Vol. 323, 6 March 2009, p. 1297. Published by AAAS.
The demands of data-intensive science represent a challenge for diverse scientific communities.

Since at least Newton's laws of motion in the 17th century, scientists have recognized experimental and theoretical science as the basic research paradigms for understanding nature. In recent decades, computer simulations have become an essential third paradigm: a standard tool for scientists to explore domains that are inaccessible to theory and experiment, such as the evolution of the universe, car passenger crash testing, and predicting climate change. As simulations and experiments yield ever more data, a fourth paradigm is emerging, consisting of the techniques and technologies needed to perform data-intensive science (1). For example, new types of computer clusters are emerging that are optimized for data movement and analysis rather than computing, while in astronomy and other sciences, integrated data systems allow data analysis and storage on site instead of requiring download of large amounts of data.

Today, some areas of science are facing hundred- to thousandfold increases in data volumes from satellites, telescopes, high-throughput instruments, sensor networks, accelerators, and supercomputers, compared to the volumes generated only a decade ago (2). In astronomy and particle physics, these new experiments generate petabytes (1 petabyte = 10^15 bytes) of data per year. In bioinformatics, the increasing volume (3) and the extreme heterogeneity of the data are challenging scientists (4). In contrast to the traditional hypothesis-led approach to biology, Venter and others have argued that a data-intensive inductive approach to genomics (such as shotgun sequencing) is necessary to address large-scale ecosystem questions (5, 6). Other research fields also face major data management challenges. In almost every laboratory, "born digital" data proliferate in files, spreadsheets, or databases stored on hard drives, digital notebooks, Web sites, blogs, and wikis. The management, curation, and archiving of these digital data are becoming increasingly burdensome for research scientists.

Over the past 40 years or more, Moore's Law has enabled transistors on silicon chips to get smaller and processors to get faster. At the same time, technology improvements for disks for storage cannot keep up with the ever increasing flood of scientific data generated by the faster computers. In university research labs, Beowulf clusters (groups of usually identical, inexpensive PC computers that can be used for parallel computations) have…

Figure: Moon and Pleiades from the VO. Astronomy has been one of the first disciplines to embrace data-intensive science with the Virtual Observatory (VO), enabling highly efficient access to data and analysis tools at a centralized site. The image shows the Pleiades star cluster from the Digitized Sky Survey combined with an image of the Moon, synthesized within the World Wide Telescope service. (Credit: Jonathan Fay/Microsoft)

(1) Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. (2) Department of Physics and Astronomy, Johns Hopkins University, 3701 San Martin Drive, Baltimore, MD 21218, USA. E-mail: szalay@jhu.edu
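The storage argument above can be made concrete with back-of-envelope arithmetic. The hardware figures below (per-disk throughput, cluster size) are illustrative assumptions, not numbers from the article; the point is that a single disk needs months to read a petabyte once, while striping the data over many disks brings that to hours, which is why clusters optimized for data movement rather than computing matter.

```python
PETABYTE = 10**15  # bytes, using the article's definition

# Assumed, illustrative hardware figures (not from the article):
DISK_THROUGHPUT = 150 * 10**6  # bytes/s sequential read for one disk
CLUSTER_DISKS = 1000           # disks scanned in parallel in a data cluster

def scan_days(nbytes: int, disks: int) -> float:
    """Days needed to read `nbytes` once, striped evenly over `disks` disks."""
    return nbytes / (disks * DISK_THROUGHPUT) / 86_400  # seconds per day

single = scan_days(PETABYTE, 1)
parallel = scan_days(PETABYTE, CLUSTER_DISKS)
print(f"one disk: {single:.0f} days per petabyte scan")
print(f"{CLUSTER_DISKS} disks: {parallel * 24:.1f} hours per petabyte scan")
```

Under these assumptions a single disk takes on the order of two and a half months per full scan; a thousand-disk cluster takes under two hours, and the ratio scales linearly with the number of disks.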
4. Distilling meaning from data
BOOKS & ARTS. Nature, Vol. 455, 4 September 2008.
Buried in vast streams of data are clues to new science. But we may need to craft new lenses to see them, explain Felice Frankel and Rosalind Reid.

It is a breathtaking time in science as masses of data pour in, promising new insights. But how can we find meaning in these terabytes? To search successfully for new science in large datasets, we must find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations. Old habits of representing data can fail to meet these challenges, preventing us from reaching beyond the familiar questions and answers.

To extract new meaning from the sea of data, scientists have begun to embrace the tools of visualization. Yet few appreciate that visual representation is also a form of communication. A rich body of communication expertise holds the potential to greatly improve these tools. We propose that graphic artists, communicators and visualization scientists should be brought into conversation with theorists and experimenters before all the data have been gathered. If we design experiments in ways that offer varied opportunities for representing and communicating data, techniques for extracting new understanding can be made available.

Visual representation is familiar in data-intensive fields. Years before a detector is built for a facility such as the Large Hadron Collider near Geneva, for example, physicists will have pored over simulations. They examine how important events will 'look' in the displays that reveal and communicate what is going on inside the machine. Such discussions tend to take place within the visual conventions of a field. But perhaps conversations might be broadened to consider alternative representations of the same data. These might suggest other approaches to collecting, organizing and querying data that will maximize the transparency of experimental results and thus aid intuition, discovery and communication.

Unfortunately, visualization experts and communicators are often consulted only after … They will create effective computer displays, slides and figures for publication. Meanwhile, they may be developing their tools in isolation, kept at arm's length by scientists who are busy getting their experiments done. Opportunities for useful dialogue are thus squandered.

When scientists, graphic artists, writers, animators and other designers come together to discuss problems in the visual representation of science, such as at the Image and Meaning workshops run by Harvard University (www.imageandmeaning.org), it becomes clear that representations repeatedly fail to communicate understanding or address obvious questions about the underlying data. A three-dimensional volume rendering may give no hint of important uncertainties or data gaps; solid surfaces or sharp edges may suggest data where they do not exist. A graphic artist might propose ways to reveal gaps or deviations from expectation early in an experiment, guiding subsequent data collection or highlighting new avenues of enquiry. When we asked Harvard University chemist George Whitesides to change the geometry of a self-assembled monolayer with clearly delineated hydrophobic and hydrophilic areas to create an image for submission to a journal, he found himself redesigning the experiment, and unexpected science emerged.

Initiatives such as those run by the US National Science Foundation's Picturing to Learn project (www.picturingtolearn.org) teach us that attempting to visually communicate scientific data and concepts opens a path to understanding. When science and design students collaborate, their drive to understand one another's ideas pushes them to create new ways of seeing science. Investment in visual communication training for young scientists will pay off handsomely for any data-intensive discipline.

The ingrained habits of highly trained scientists make them rarely as adventurous as these young minds. We think we are on the path to insight when shading reveals contours in 3D renderings, or when bursts of red appear on heat maps, for example. But the algorithms used to produce the graphics may create illusions or embed assumptions. The human visual system creates in the brain an apparent understanding of what a picture represents, not necessarily a picture of the underlying science. Unless we know all the steps from hypothesis to understanding (by conversing with theorists, experimentalists, instrument and software developers, visualization scientists, graphic artists and cognitive psychologists) we cannot be sure whether a display is accurate or misleading.

The greatest opportunity and risk lie in that last step in the path: understanding. Whether verbal or visual, any language that is garbled and inconsistent fails to do its job. Let's talk. Let's all talk.

Figure: Discussing visual communication before designing experiments may reveal new science. (Credit: D. Armendariz)

Felice Frankel is senior research fellow in the faculty of arts and sciences at Harvard University, Cambridge, Massachusetts 02138, USA. With G. M. Whitesides, she is co-author of On the Surface of Things: Images of the Extraordinary in Science. E-mail: felice_frankel@harvard.edu. Rosalind Reid is executive director of the Initiative in Innovative Computing at Harvard University and former Editor of American Scientist.

Exceeding human limits
COMMENTARY. Nature, Vol. 440, 23 March 2006, p. 409.
Scientists are turning to automated processes and technologies in a bid to cope with ever higher volumes of data. But automation offers so much more to the future of science than just data handling, says Stephen H. Muggleton.

The collection and curation of data throughout the sciences is becoming increasingly automated. For example, a single high-throughput experiment in biology can easily generate more than a gigabyte of data per day, and in astronomy automatic data collection leads to more than a terabyte of data per night. Throughout the sciences the volumes of archived data are increasing exponentially, supported not only by low-cost digital storage but also by the growing efficiency of automated instrumentation. It is clear that the future of science involves the expansion of automation in all its aspects: data…

(Image: Firefly Productions/Corbis)
5. Exceeding human limits (continued)
Stephen H. Muggleton. Nature, Vol. 440, 23 March 2006, p. 409.

…differences in the mathematical underpinnings of, say, differential equations, bayesian networks and logic programs make integrating these various models virtually impossible. Although hybrid models can be built by simply patching two models together, the underlying differences lead to unpredictable and error-prone behaviour when changes are made. One encouraging development in this respect is the emergence within computer science of new formalisms (5) that integrate, in a sound fashion, two major branches of mathematics: mathematical logic and probability calculus.

"There is a severe danger that increases in speed and volume of data generation could lead to decreases in comprehensibility."

"Owing to the scale and rate of data generation and the interdependency of chemical reactions, computational models … now require automatic construction and modification."

Today's generation of microfluidic machines is designed to carry out a specific series of chemical reactions. However, flexibility could be added to this tool kit by developing what one might call a 'chemical Turing machine'. The universal Turing machine, devised by Alan Turing in 1936, was intended to mimic the pencil-and-paper operations of a mathematician. The chemical Turing machine would be a universal processor capable of performing a broad range of chemical operations on both the reagents available to it at the start and those chemicals it later generates. Such chips contain chambers, ducts, gates and reagent stores, and allow synthesis and testing at high speed …

On such timescales it should become easier for scientists to reproduce new experiments and refute their hypotheses.
matically prepare logic provides a formal experimentation.
pored over simulations. They examine how hint of important uncertainties or data gaps; last step in the path: understanding. Whether sciences is becoming increas-
important events will ‘look’ in the displays solid surfaces or sharp edges may suggest data verbal or visual, any language that is garbled ingly automated. For exam-
that reveal and communicate what is going where they do not exist. A graphic artist might and inconsistent fails to do its job. Let’s talk. ple, a single high-throughput
pounds but it would also be programmable, Stephen H. Muggleton is
learning approaches foundation for logic programming languages refute their hypotheses.
on inside the machine. Such discussions tend propose ways to reveal gaps or deviations from Let’s all talk. I
experiment in biology can
to take place within the visual conventions of expectation early in an experiment, guiding Felice Frankel is senior research fellow in the easily generate more than a
gigabyte of data per day, and in astronomy
a field. But perhaps conversations might be subsequent data collection or highlighting new faculty of arts and sciences at Harvard University,
ng scientific models such as Prolog, much theprobability calculusa Computing and the Centr
thus allowing whereas same flexibility as
broadened to consider alternative represen- avenues of enquiry. When we asked Harvard Cambridge, Massachusetts 02138, USA. With data collection leads to more than a
automatic
Today’s generation of m
tations of the same data. These might suggest University chemist George Whitesides to G. M. Whitesides, she is co-author of terabyte of data per night. Throughout the sci-
On the Surface
other approaches to collecting, organizing and change the geometry of a self-assembled of Things: Images of the Extraordinary in Science. volumes of archived data are increas-
ences the
real chemist has in the lab.
p’ systems with no provides the basic axioms of probability for is designed to carry ou Systems Biology at Imper
querying data that will maximize the transpar- monolayer with clearly delineated hydropho- e-mail: felice_frankel@harvard.edu ing exponentially, supported not only by
ency of experimental results and thus aid intui- bic and hydrophilic areas to create an image Rosalind Reid is executive director of the Initiative storage but also by the growing
tion, discovery and communication.
low-cost digital
for submission to a journal, he found himself in Innovative Computing at Harvard University of automated instrumentation. It is
efficiency
to the collection of One can think of a chemical Turing 2BZ, UK.
Unfortunately, visualization experts and redesigning the experiment, and unexpected and former Editor of American Scientist. that the future of science involves the
communicators are often consulted only after science emerged.
clear
expansion of automation in all its aspects: data
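Muggleton's point about integrating mathematical logic with probability calculus can be illustrated with a toy example. The sketch below is a minimal, hypothetical ProbLog-style program evaluated by exhaustive enumeration of possible worlds; the facts, rule and names are invented for illustration and are not Muggleton's formalism or any real system's API.

```python
from itertools import product

# Toy probabilistic logic program (hypothetical example, ProbLog-style):
#   0.8 :: enzyme_active.
#   0.6 :: substrate_present.
#   product_formed :- enzyme_active, substrate_present.
facts = {"enzyme_active": 0.8, "substrate_present": 0.6}
rules = {"product_formed": [("enzyme_active", "substrate_present")]}

def holds(atom, world):
    # An atom holds if it is a true fact in this world,
    # or the body of some rule for it holds entirely.
    if atom in world:
        return world[atom]
    return any(all(holds(b, world) for b in body)
               for body in rules.get(atom, ()))

def prob(query):
    # Sum the probability of every possible world in which the query holds.
    names = list(facts)
    total = 0.0
    for truth in product([True, False], repeat=len(names)):
        world = dict(zip(names, truth))
        p = 1.0
        for name, value in world.items():
            p *= facts[name] if value else 1.0 - facts[name]
        if holds(query, world):
            total += p
    return total

print(round(prob("product_formed"), 2))  # 0.48, i.e. 0.8 * 0.6
```

Real probabilistic logic systems avoid enumerating worlds, but the semantics sketched here, a probability distribution over logic programs, is the sound integration the text describes, as opposed to patching two models together.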
9. Aims
• ADMIRE aims to deliver a consistent and easy-to-
use technology for extracting information and
knowledge.
• The project is motivated by the difficulty of extracting
meaningful information by data mining combinations
of data from multiple heterogeneous and
distributed resources.
• It will also provide an abstract view of data mining
and integration, which will give users and
developers the power to cope with complexity and
heterogeneity of services, data and processes.
11. Separating concerns
Above the gateway: user and application diversity
• Many application domains, many tool sets, many process representations, many working practices
• Tool level: iterative DMI process development, accommodating this diversity
Gateway interface: one model, the DMI canonical representation and abstract machine
Below the gateway: system diversity and complexity, composed or hidden by the gateway
• Enactment level: mapping, optimisation and enactment
• Many autonomous resources & services, multiple enactment mechanisms, multiple platform implementations
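The separation of concerns above can be sketched in code: users describe a DMI process once against a canonical model, and an enactment layer maps it onto one of possibly many concrete mechanisms. All names here (Step, Workflow, LocalEnactor) are hypothetical illustrations, not the ADMIRE gateway's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Step:
    # One named operation in the canonical DMI representation.
    name: str
    fn: Callable[[Iterable], Iterable]

@dataclass
class Workflow:
    # The "one model" at the gateway: an ordered pipeline of steps,
    # independent of where or how it will be enacted.
    steps: List[Step]

class LocalEnactor:
    # One possible enactment mechanism: run every step in-process.
    # Other enactors could map the same workflow onto distributed
    # services or different platform implementations.
    def enact(self, workflow: Workflow, data: Iterable) -> list:
        for step in workflow.steps:
            data = step.fn(data)
        return list(data)

# A user composes against the canonical model only:
wf = Workflow(steps=[
    Step("select", lambda rows: (r for r in rows if r["specificity"] > 0.7)),
    Step("project", lambda rows: (r["tissue"] for r in rows)),
])
rows = [{"tissue": "Humerus", "specificity": 0.7921},
        {"tissue": "Head mesenchyme", "specificity": 0.5507}]
print(LocalEnactor().enact(wf, rows))  # ['Humerus']
```

The design choice this illustrates is that adding a new enactment mechanism or platform never changes user-facing workflows, which is how the gateway copes with system diversity.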
26. Data mining results
Table 1. Preliminary classification performance using 10-fold cross-validation

Gene expression    Sensitivity  Specificity
Humerus            0.7525       0.7921
Handplate          0.7105       0.7231
Fibula             0.7273       0.7180
Tibia              0.7467       0.7451
Femur              0.7241       0.7345
Ribs               0.5614       0.7538
Petrous part       0.7903       0.7538
Scapula            0.7882       0.7099
Head mesenchyme    0.7857       0.5507

Note: Sensitivity is the true positive rate; specificity is the true negative rate.
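As a reminder of what the two columns measure, sensitivity and specificity can be computed directly from confusion-matrix counts. The counts below are hypothetical, chosen only so the rates land near the Humerus row; they are not the study's actual data.

```python
def sensitivity(tp: int, fn: int) -> float:
    # True positive rate: fraction of actual positives correctly detected.
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # True negative rate: fraction of actual negatives correctly rejected.
    return tn / (tn + fp)

# Hypothetical counts: 76 of 101 positives found, 80 of 101 negatives rejected.
print(round(sensitivity(tp=76, fn=25), 4))  # 0.7525
print(round(specificity(tn=80, fp=21), 4))  # 0.7921
```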
27. Where we are
• Architecture prototype works
• Intuitive workbench created
• Will be connected next
• Two more use cases
28. Team
National e-Science Centre
http://www.admire-project.eu/
Malcolm Atkinson
Jano van Hemert
Liangxiu Han
Gagarine Yaikhom
Chee-Sun Liew
EPCC
Mark Parsons et al.
University of Vienna
Peter Brezany et al.
Universidad Politécnica de Madrid
Oscar Corcho
Slovak Academy of Sciences
Ladislav Hluchý
Fujitsu Labs Europe
David Snelling
ComArch SA
Marcin Choiński
http://research.nesc.ac.uk/
Editor's Notes
* This is not about projects, publications
* Where did we suddenly appear from
* One of the papers that is signposting
* Sensors, large machines, interaction with data (software), interaction between people, interaction of software on data, ...
* More explicit forms of demands
* A proposed solution
* How do you go about implementing a solution under the fourth paradigm?