"The Mudslide Hypothesis of Science" - OSCON - Kaitlin Thaney
The document discusses the "mudslide hypothesis" which suggests that traditions persist not due to excellence but because of resistance to change from influential people and the difficulties of transition. It argues that outdated research practices waste time and resources, and that while research is changing, discovery is still suboptimal due to reliance on old systems not designed for modern mediums. It calls for rethinking approaches to research to maximize reuse, allow for network effects, and redefine performance metrics to better support current research workflows.
An overview of Digital Science - a new company started out of Macmillan Publishers dedicated to making research more efficient through better use of technology.
Making the web work for science - RIT Dean's Lecture Series - Kaitlin Thaney
The document discusses challenges with the current state of scientific research and proposes ways to leverage the power of the open web to improve science. It notes that current systems are designed to create friction rather than enable open collaboration. The document advocates for adopting practices of open source development like using community-driven metadata for software and open, iterative development. It also argues that policies and incentives need to change to reward openness, reuse and reproducibility in order to avoid wasted time, money and opportunities.
Big data repositories are seeing an increase in smaller, niche datasets as more researchers contribute data. This "long tail of data" poses challenges for discovery, access, and attribution. The authors propose a centralized data repository that would make any dataset discoverable and accessible regardless of size or topic by automating metadata generation and attribution to help researchers find and share relevant data.
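The automated metadata generation proposed above can be illustrated with a minimal sketch. The `describe_dataset` helper and its field names are invented for illustration, not part of any proposed repository: given a CSV file, it derives basic descriptive metadata (checksum, size, column names, row count) so that even a small "long tail" dataset carries enough description to be discovered.

```python
import csv
import hashlib
import os

def describe_dataset(path):
    """Derive minimal descriptive metadata from a CSV file.
    (Illustrative sketch; a real repository would add far richer fields.)"""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    with open(path, newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)           # header row
        n_rows = sum(1 for _ in reader)  # remaining data rows
    return {
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "sha256": digest,
        "columns": columns,
        "rows": n_rows,
    }
```

Running this over every deposited file would give each dataset a machine-readable record for discovery and a checksum for attribution, with no manual curation step.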
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture) - Robert Grossman
The document discusses lessons learned from Bionimbus, a petabyte-scale science cloud service provider. Some key points:
- Bionimbus provides cloud-based storage and computing resources for genomic and biomedical research projects dealing with large datasets, such as sequencing a million genomes which would generate around 1 exabyte of data.
- Lessons from operating at this scale include how to effectively store, manage, analyze, and share extremely large datasets across many users and projects in an open science environment.
- The growth of "big data" sciences like genomics poses challenges around data volume but also opportunities to drive new scientific discoveries through large-scale genomic analysis.
A keynote given on experiences in curating workflows and web services.
3rd International Digital Curation Conference: "Curating our Digital Scientific Heritage: a Global Collaborative Challenge"
11-13 December 2007
Renaissance Hotel
Washington DC, USA
This is a presentation I gave at the Library of Congress as part of a NFAIS/FLICC/CENDI meeting as outlined here: http://www.chemspider.com/blog/making-the-web-work-for-science-presentation-at-the-library-of-congress.html
The presentation provides an overview of some of the challenges publishers face moving forward, how they are responding to them, how InChI is an enabling technology, and why quality is important.
National Resource for Network Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to make use of high-throughput proteomic-level data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to predict, analyze, and visualize the effects of these perturbations and alterations, while aggregating additional information about the known roles of the involved interactions and their participants.
This document summarizes a presentation about Globus Genomics, a service that provides genomic data analysis tools and workflows through a web interface. It allows users to securely transfer data, run standardized analysis pipelines, access computational resources on demand through Amazon Web Services, and collaborate on shared data and workflows. The service aims to make genomic analysis more accessible, reproducible, and sustainable through various pricing models and support for individual labs and bioinformatics cores.
Microsoft genomics to advance clinical science - Bruno Denys
Microsoft has invested massively in advanced clinical genomics services in its Azure public cloud. This is a healthcare revolution, allowing new treatments to be created based on better genomic information.
This document discusses ways to incentivize scientists to share their data through self-interest. It describes two existing models where data sharing is successful: oceanographic research consortia that require data sharing, and biomedical research projects that organize data generation and sharing through a common platform. The document proposes a distributed graph database and computing platform that would allow researchers to query diverse public and private datasets, providing immediate returns for data sharing. By making others' data useful to analyze and mine, researchers would be competitively disadvantaged not to share their own data. The goal is to enable open sharing by addressing current problems and remaining agile for future needs.
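The query-across-sources idea behind the proposed platform can be sketched in miniature: triples from a "public" and a "private" dataset are merged into one store and queried together, so contributing data immediately makes one's own queries richer. This is only a toy illustration; the proposal is a distributed system, and every dataset, entity, and predicate name below is invented.

```python
# Toy triple stores standing in for diverse datasets.
public = [
    ("TP53", "regulates", "MDM2"),
    ("BRCA1", "interacts_with", "BARD1"),
]
private = [
    ("TP53", "mutated_in", "sample_42"),
]

def query(store, subject=None, predicate=None):
    """Return triples matching the given subject and/or predicate."""
    return [
        (s, p, o) for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
    ]

merged = public + private
# Everything known about TP53, regardless of which dataset contributed it:
tp53_facts = query(merged, subject="TP53")
```

The incentive argument falls out of the merge step: a researcher who withholds their triples still sees everyone else's, but their own questions are answered from a smaller graph than their sharing competitors'.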
The National Resource for Network Biology (NRNB) aims to advance network biology science through bioinformatic methods, software, infrastructure, collaboration, and training. In the past year, the NRNB made progress in its specific aims, including developing new network analysis methods, catalyzing changes in network representation, establishing software and databases, engaging in collaborations, and providing training opportunities. Going forward, the NRNB plans to further develop methods for differential and predictive network analysis, multi-scale network representation, and pathway analysis tools.
Science has evolved from the isolated individual tinkering in the lab, through the era of the “gentleman scientist” with his or her assistant(s), to group-based then expansive collaboration and now to an opportunity to collaborate with the world. With the advent of the internet the opportunity for crowd-sourced contribution and large-scale collaboration has exploded and, as a result, scientific discovery has been further enabled. The contributions of enormous open data sets, liberal licensing policies and innovative technologies for mining and linking these data have given rise to platforms that are beginning to deliver on the promise of semantic technologies and nanopublications, facilitated by the unprecedented computational resources available today, especially the increasing capabilities of handheld devices. The speaker will provide an overview of his experiences in developing a crowdsourced platform for chemists allowing for data deposition, annotation and validation. The challenges of mapping chemical and pharmacological data, especially in regard to data quality, will be discussed. The promise of distributed participation in data analysis is already in place.
The document discusses using microformats as an alternative to more complex semantic web standards to integrate existing biological web resources. It proposes hAction, a microformat for biology, that could hook together disparate biological resources more simply than existing options. A demo is shown as a proof of concept that microformats may provide a way to share biological data across the web without large overheads.
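The low-overhead appeal of microformats is that structured data rides along as ordinary HTML class attributes, extractable with nothing more than a standard HTML parser. The sketch below illustrates that style; the class names are invented for illustration and may differ from the actual hAction proposal.

```python
from html.parser import HTMLParser

# Hypothetical hAction-style markup: plain HTML, with structure carried
# by class attributes rather than a full semantic web stack.
SNIPPET = """
<div class="haction">
  <span class="protein">BRCA1</span>
  <span class="interaction">binds</span>
  <span class="protein">BARD1</span>
</div>
"""

class MicroformatParser(HTMLParser):
    """Collect the text of elements whose class is in a target set."""
    def __init__(self, targets):
        super().__init__()
        self.targets = targets
        self.current = None
        self.found = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in self.targets:
            self.current = cls

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))
            self.current = None

parser = MicroformatParser({"protein", "interaction"})
parser.feed(SNIPPET)
# parser.found == [("protein", "BRCA1"), ("interaction", "binds"), ("protein", "BARD1")]
```

Any page already publishing such markup becomes a data source for free, which is the "without large overheads" argument in a nutshell.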
The NRNB has been funded as an NIGMS Biomedical Technology Research Resource since 2010. During the previous five-year period, NRNB investigators introduced a series of innovative methods for network biology including network-based biomarkers, network-based stratification of genomes, and automated inference of gene ontologies using network data. Over the next five years, we will seek to catalyze major phase transitions in how biological networks are represented and used, working across three broad themes: (1) From static to differential networks, (2) From descriptive to predictive networks, and (3) From flat to hierarchical networks bridging across scales. All of these efforts leverage and further support our growing stable of network technologies, including the popular Cytoscape network analysis infrastructure.
Globus Genomics provides tools and services to help researchers manage and analyze large genomic datasets. It uses Globus data management tools to securely transfer data between institutions. Researchers can then run analysis workflows on cloud compute resources through Galaxy interfaces. This enables researchers to assemble diverse datasets, apply multiple computational models, and publish results for others to discover, validate, and reuse. Examples show researchers using Globus Genomics to process petabytes of sequencing data and perform genome-wide analysis across many institutions. The goal is to accelerate scientific discovery by making it easier for researchers to find "needles in haystacks" through data-intensive computational approaches.
Biovision2017: Accessing the scientific literature - Peter Murray-Rust
This document summarizes discussions from the ContentMine fellowship on using text mining to extract information from scientific literature. Several fellows describe their projects involving text mining papers to build databases on topics like depressive behaviors in animals, cancer research facts, cell migration patterns, genomic software tools, and weevil-plant associations. One fellow discusses being prevented from downloading papers in bulk by Elsevier to text mine for his research on detecting problematic studies.
FAIRy stories: the FAIR Data principles in theory and in practice - Carole Goble
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
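The Schema.org approach mentioned in the abstract above amounts to embedding small machine-readable records in dataset landing pages. A minimal sketch of such a record follows; `name`, `description`, `identifier`, `license` and `keywords` are standard schema.org Dataset properties, but all the values here are hypothetical placeholders.

```python
import json

# A minimal schema.org Dataset record of the kind embedded in a landing
# page as JSON-LD (inside a <script type="application/ld+json"> tag).
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example proteomics dataset",
    "description": "Mass-spectrometry measurements under three perturbations.",
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["proteomics", "FAIR", "network biology"],
}

json_ld = json.dumps(dataset, indent=2)
```

Even this much makes a dataset findable by general-purpose crawlers and gives its licence and identifier a stable, machine-readable home, which is most of the F and R in FAIR.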
Keynote for Theory and Practice of Digital Libraries 2017
The theory and practice of digital libraries provides a long history of thought around how to manage knowledge, ranging from collection development to cataloging and resource description. These tools were all designed to make knowledge findable and accessible to people. Even technical progress in information retrieval and question answering is targeted at helping answer a human’s information need.
However, demand is increasingly for data: data needed not for people’s consumption but to drive machines. As an example of this demand, there has been explosive growth in job openings for Data Engineers – professionals who prepare data for machine consumption. In this talk, I give an overview of the information needs of machine intelligence and ask: are our knowledge management techniques applicable for serving this new consumer?
Semantics for Bioinformatics: What, Why and How of Search, Integration and An... - Amit Sheth
Amit Sheth's Keynote at Semantic Web Technologies for Science and Engineering Workshop (held in conjunction with ISWC2003), Sanibel Island, FL, October 20, 2003.
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata - Michel Dumontier
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets they need to analyze, if there is a lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
This document provides a summary of a presentation on open scientific knowledge and building a knowledgebase beyond traditional journals. The presentation discusses the problems with publishers controlling infrastructure and restricting access to knowledge. It demonstrates software tools like getpapers and AMI that can be used to freely access and search across scientific literature. The presentation advocates for open access to all scientific literature and building a sustainable community and organization to achieve this goal.
Being FAIR: Enabling Reproducible Data Science - Carole Goble
Talk presented at Early Detection of Cancer Conference, OHSU, Portland, Oregon USA, 2-4 Oct 2018, http://earlydetectionresearch.com/ in the Data Science session
enabling communities of researchers working together across institutional bou... - Brian Bot
Sage Bionetworks is a non-profit organization that pilots various components to build a scientific research commons and enable more open and collaborative biomedical research. It supports ~40 employees working on research platform development and leadership. Sage brings together researchers across disciplines and institutions through initiatives like the Pan-Cancer Atlas Consortium and the CommonMind Consortium to facilitate data sharing and collaborative analysis around common questions.
This presentation discusses open science, knowledge sharing, and the commons. It addresses making sharing easy, legal and scalable through an integrated approach. Key challenges discussed include ensuring content is legally and technically accessible, dealing with semantic disagreements, navigating different legal implementations around knowledge sharing, and addressing "rights" issues with licensing frameworks for data. The presentation advocates for a norms-based approach through principles rather than licenses to create legal zones of certainty and promote interoperability.
Towards semantic systems chemical biology - Bin Chen
Introduces a semantic framework for studying systems chemical biology / systems pharmacology, covering three major projects: Chem2Bio2RDF, Chem2Bio2OWL, and SLAP (semantic link association prediction).
The information revolution has transformed many business sectors over the last decade and the pharmaceutical industry is no exception. Developments in scientific and information technologies have unleashed an avalanche of content on research scientists, who are struggling to access and filter it efficiently. Furthermore, this domain has traditionally suffered from a lack of standards in how entities, processes and experimental results are described, leading to difficulties in determining whether results from two different sources can be reliably compared. The need to transform the way the life-science industry uses information has led to new thinking about how companies should work beyond their firewalls. In this talk we will provide an overview of the traditional approaches major pharmaceutical companies have taken to knowledge management and describe the business reasons why pre-competitive, cross-industry and public-private partnerships have gained much traction in recent years. We will consider the scientific challenges concerning the integration of biomedical knowledge, highlighting the complexities in representing everyday scientific objects in computerised form. This leads us to discuss how the semantic web might lead us to a long-overdue solution. The talk will be illustrated by focusing on the EU Open PHACTS initiative (openphacts.org), established to provide a unique public-private infrastructure for pharmaceutical discovery. The aims of this work will be described, along with how technologies such as just-in-time identity resolution, nanopublication and interactive visualisations are helping to build a powerful software platform designed to appeal directly to scientific users across the public and private sectors.
The document discusses the key features and functions that an ideal scientific data management system should have. It should manage users, instruments, biological samples, experiments and related workflows. It should support standards, ontologies, data models and data exchange formats. The system should be accessible from any device and integrate with other software and external resources. It should support the full lifecycle of information, enable collaboration and knowledge generation from documents and data. It should also be prepared to handle large increases in data volumes.
Role of bioinformatics in life sciences research - Anshika Bansal
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
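A core task in that list, comparing sequences, reduces to dynamic programming. As a minimal illustration in plain Python (toy sequences, not tied to any database or tool named above), the Levenshtein edit distance between two nucleotide strings can be computed as:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming: the minimum number
    of substitutions, insertions, and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

# Two toy sequences, two substitutions apart
print(edit_distance("GATTACA", "GACTATA"))  # → 2
```

Real alignment tools score matches and gaps with biologically motivated weights, but the underlying table-filling recurrence is the same.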
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
Opportunities for X-Ray science in future computing architectures - Ian Foster
The world of computing continues to evolve rapidly. In just the past 10 years, we have seen the emergence of petascale supercomputing, cloud computing that provides on-demand computing and storage with considerable economies of scale, software-as-a-service methods that permit outsourcing of complex processes, and grid computing that enables federation of resources across institutional boundaries. These trends show no signs of slowing down: the next 10 years will surely see exascale, new cloud offerings, and terabit networks. In this talk I review several of these developments and discuss their potential implications for X-ray science and X-ray facilities.
Cool Informatics Tools and Services for Biomedical Research - David Ruau
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing without and with code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
The swings and roundabouts of a decade of fun and games with Research Objects - Carole Goble
Research Objects and their instantiation as RO-Crate: motivation, explanation, examples, history and lessons, and opportunities for scholarly communications, delivered virtually to the 17th Italian Research Conference on Digital Libraries.
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice) - BigData_Europe
Overview of Open PHACTS, the BDE Pilot project in SC1, presented at BDE SC1 Workshop 3, 13 December, 2017.
https://www.big-data-europe.eu/the-final-big-data-europe-workshop/
This document discusses three possible strategies for identifying biological knowledge from scientific literature: 1) Allowing authors to validate biological entities during the writing process, 2) Performing discourse analysis to understand persuasive elements and relationships between ideas, and 3) Encouraging collaboration between authors and databases to identify hypotheses. It focuses on the challenges of current fact extraction techniques and the potential for modeling discourse and rhetorical moves to improve knowledge representation.
Keynote talk at the International Conference on Supercomputing 2009, at IBM Yorktown in New York. This is a major update of a talk first given in New Zealand last January. The abstract follows.
The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think Facebook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
The document discusses the increasing scale and complexity of knowledge generation in science domains like astronomy and medicine over recent centuries. It argues that knowledge generation can be viewed as a systems problem involving many actors and processes. The document proposes a service-oriented approach using web services as an integrating framework to address challenges of scale, complexity, and distributed collaboration in e-Science. Key challenges discussed include semantics, documentation, scaling issues, and sociological factors like incentives.
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle... - Rothamsted Research, UK
Workshop within the Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
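The core of the semantic-graph approach described here is representing facts as subject-predicate-object triples and querying them by pattern matching. A minimal sketch in plain Python with made-up biological facts (not the actual KnetMiner schema or a Neo4j API):

```python
# Toy triple store: facts as (subject, predicate, object) tuples,
# queried by pattern matching with None as a wildcard.
triples = [
    ("TP53", "encodes", "p53"),
    ("p53", "involved_in", "apoptosis"),
    ("MDM2", "regulates", "p53"),
]

def match(pattern, store):
    """Return every triple matching an (s, p, o) pattern; None matches anything."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Which facts have p53 as their object?
print(match((None, None, "p53"), triples))
# → [('TP53', 'encodes', 'p53'), ('MDM2', 'regulates', 'p53')]
```

Real triple stores and graph databases add indexing, a query language (SPARQL or Cypher), and shared vocabularies, but the wildcard-pattern idea is the same.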
Megaphones to (No)where: On Sustaining Change - Kaitlin Thaney
The document discusses the challenges of sustaining social change movements over the long term. It notes that while some campaigns see initial bursts of interest and participation, engagement often drops off rapidly. True change requires coordinating efforts and receiving feedback from those most impacted over long periods. Designing movements with input and accountability to impacted communities is important for achieving lasting impact rather than just short-term goals.
Lessons in Resilience - International Women's Day Keynote @ Brooklyn College - Kaitlin Thaney
This document provides lessons on resilience from the author's experiences. It discusses 8 lessons: 1) Know your limits for compromise and willpower. 2) Cultivate supportive relationships. 3) Accept that things will not always go as planned and dust yourself off. 4) Prioritize self-care as it is important for self-preservation. 5) Know when to make a career change if needed. 6) Show gratitude to others. 7) Advocate for yourself and seek help from others. 8) Your identity is more than just your career. The document encourages learning from failures and having a support system.
Kaitlin Thaney discusses building capacity for open science by investing in networks of open practice and communities that sustain open activity over time. Current systems create friction despite original open intentions, but shifting practices towards openness requires supporting professional development, interoperable tools, and incentives for collaboration and sharing. Mozilla aims to empower researchers through open, collaborative research on the web.
Fueling the Open Movement - Compute Midwest - Kaitlin Thaney
The document discusses Mozilla and its role in protecting and promoting an open internet. It provides background on Mozilla's history starting from the Netscape browser wars in the late 1990s. It describes Mozilla as a software engineering organization, global non-profit, network of contributors, and global social change movement aimed at protecting the open internet. The document calls on readers to join Mozilla in its mission.
The document discusses shifting scientific practice towards more open, collaborative and web-enabled research. It outlines current challenges around measuring contributions beyond publications alone. It then presents several initiatives to promote open scholarship, including contributorship badges to recognize different types of scientific work, dashboards to improve software discoverability, and community events bringing together researchers. Sustaining these changes requires addressing incentives, skills development, and lowering barriers to participation.
This document discusses Mozilla Science Lab's efforts to promote open and collaborative research practices through enabling access to content, data, code and materials online. It advocates for adopting standards and best practices that reward openness, interoperability and sharing. It also highlights the need for infrastructure, tools, repositories, incentives and training to support open research and foster sustainable practitioner communities.
Building capacity for open science - COASP Meeting - Kaitlin Thaney
The document discusses building capacity for open science. It outlines current challenges including a lack of incentives for open research. Examples are provided of open science practices like code as a research object and open iterative development. The need to teach web literacy in research and lower barriers to participation is discussed. Sustaining open science requires further adoption incentives, professional development supports, and helping scale community-driven efforts.
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference - Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It argues that current systems are creating friction despite original intentions of openness. It advocates for open tools, standards, best practices, and incentives to support web-enabled open research through improved access to content, data, code, materials. This would allow for communication, reuse, and scaling in a distributed environment. It also discusses fostering open source development communities of practice and building capacity for open research through professional development, training, and rewards.
Leveraging the power of the web - Open Repositories 2015 - Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It notes that current systems are creating friction despite original intentions of openness. It advocates for building capacity for open, web-enabled research through infrastructure, tools, standards, incentives and training to support reuse, collaboration and interoperability. The goal is to foster sustainable communities of practitioners doing open science.
Building capacity for open, data-driven science - Grand Rounds - Kaitlin Thaney
Kaitlin Thaney gave a presentation on building capacity for open, data-driven science. She discussed leveraging the power of the web for open scholarship through access to content, data, code and materials. Adopting practices from open source development like code as a research object and iterative development can help further open science. Building capacity requires fostering sustainable practitioner communities through rewards, incentives and reputation systems while providing professional development support and lowering barriers to entry. Shifting to open practices is challenging and requires tools, cultural awareness, connections, skills training and incentives.
Kaitlin Thaney is piloting contributorship badges for science to help researchers use open web tools to advance science. The project involves building communities, teaching open science skills, and empowering others to learn and solve problems together. Thaney is hacking the reward system and seeks input on a GitHub project and help with a global sprint in June to further open practices in science.
Piloting Contributorship Badges for Science - Kaitlin Thaney
This document discusses piloting contributorship badges for science. It introduces Project CRediT, which aims to standardize terms for contributor roles in scientific research. By establishing common terms for roles like data curation or project administration, friction in crediting contributions can be reduced. The document provides links to related resources and invites input on the PaperBadger tool, which is on GitHub and allows assigning standardized contributor roles.
"Designing for Truth, Scale and Sustainability" - WSSSPE2 Keynote - Kaitlin Thaney
1) The document discusses designing scientific research practices and tools for truth, scale, and sustainability. It argues current systems are designed for friction rather than collaboration and progress.
2) It notes a perception crisis where up to 70% of research cannot be reproduced, representing wasted money. Shifting practice requires a multi-faceted approach including open tools, standards, incentives and recognition to foster reuse.
3) The document calls for further adoption of "web-enabled science" through access to content, data, code and materials with rewards for openness and collaboration. It discusses rethinking professional development to lower barriers to entry and foster sustainable practitioner communities.
"Making the Web Work for Science" - NCI CBIIT - Kaitlin Thaney
This document discusses making scientific research more open and reproducible by leveraging the power of the web. It argues that current scientific practices and reward systems create friction and limit progress. Adopting open, web-enabled approaches could help address issues like lack of reproducibility and wasted resources by encouraging sharing of content, data, code and materials online. However, shifting practices requires a multi-faceted approach that addresses infrastructure, incentives, skills training and cultural norms.
"Building Capacity for Open Research" - AAMC - Kaitlin Thaney
This document discusses challenges with the current state of scientific research and proposes approaches to shift towards more open and reproducible practices. It notes that current systems are designed to create friction and rewards the wrong behaviors. To address this, it advocates taking a multi-faceted approach including improving infrastructure for open tools, standards, best practices, incentives and recognition, training, and policies. Key steps proposed are baking reproducible practices into academia, creating opportunities for experimentation and cross-disciplinary work, and rethinking how researchers are rewarded to support more open science.
Making the web work for science - eResearch NZ - Kaitlin Thaney
1) The document discusses how current scientific practices are outdated and designed to create friction, limiting access to information and collaboration.
2) It argues that leveraging the power of the open web through tools, data sharing, and interoperability could help advance science by improving access, reuse of resources, and transparency.
3) However, changing practices and building an open, collaborative culture also requires training researchers in digital skills and establishing social infrastructure to encourage openness and reward sharing.
Making the web work for science - University of Queensland - Kaitlin Thaney
The document discusses how current scientific research practices are outdated and designed to create friction, preventing optimal sharing and collaboration. It advocates transitioning to more "web-enabled science" by improving open access to content, data, code, materials and tools while rewarding openness. However, changing practices faces challenges like skills gaps. Collective efforts are needed to build capacity through training, educate the next generation, and instill best digital and reproducible practices to help research fully leverage the web.
"Let's talk about the web" - Citizen Cyberscience Summit - Kaitlin Thaney
The document discusses the need to improve how knowledge is shared on the web and make it more open and accessible. It notes there is still friction that prevents the original intentions of the web from being fully realized. It advocates helping researchers use the open web to change science and making systems interoperable between humans and machines. It also stresses the importance of building digital literacy and skills through education to grow communities and prepare people for an increasingly digital world.
1. knowledge sharing in the
sciences
kaitlin thaney
program manager, science commons
costa rica - aCCCeso - 11 nov 2009
This presentation is licensed under the Creative Commons Attribution 3.0 Unported license.
3. make sharing easy, legal and scalable
integrated approach
building part of the infrastructure for
knowledge sharing
4. knowledge sharing is at the root of
scholarship and science
the system of print publishing is a
system of sharing knowledge
then came the move to digital ...
5. knowledge sharing
journal articles
data
ontologies
annotations
plasmids and cell lines
7. access is step one
content needs to be legally and
technically accessible
9. “By open access to the literature, we mean its
free availability on the public internet,
permitting users to read, download, copy,
distribute, print, search, or link to the full texts of
the articles, crawl them for indexing, pass them as
data to software, or use them for any other lawful
purpose, without financial, legal or technical
barriers other than those inseparable from gaining
access to the internet itself.”
Image from the Public Library of Science, licensed to the public, under
CC-BY-3.0
10. “The only constraint on reproduction and
distribution, and the only role for copyright in this
domain, should be to give authors control over the
integrity of their work and the right to be
properly acknowledged and cited.”
18. ideally ...
contact author, obtain material,
recreate experiment
build on the existing work, publish
and repeat ...
19. the reality ...
materials difficult to find, fulfill, lack
resources
reagents and assays often re-invented
or reverse engineered
locked in contracts, bureaucracy,
deliberate withholding, “club mentality”
21. solves the access problem via
contract
UBMTA (the Uniform Biological
Material Transfer Agreement, a
standardized MTA)
SLA
SCMTA
standard icons, CC
methodology, metadata
55. issue of license proliferation
whatever you do to the least of the
databases, you do to the integrated system
(the most restrictive wins)
risk for unintended consequences
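The "most restrictive wins" rule can be stated operationally: an integrated database inherits the union of its sources' obligations, so a single restrictive source constrains the whole. A toy sketch in plain Python with hypothetical license terms:

```python
# Hypothetical per-source obligations; merging databases unions them,
# so the single most restrictive source governs the whole integration.
sources = {
    "db_public_domain": set(),
    "db_attribution": {"attribute"},
    "db_share_alike": {"attribute", "share_alike"},
}

def combined_obligations(dbs):
    """Obligations on an integrated database: the union over all sources."""
    result = set()
    for name in dbs:
        result |= sources[name]
    return result

print(sorted(combined_obligations(sources)))  # → ['attribute', 'share_alike']
```

The public-domain source contributes nothing, yet the combined database still carries share-alike terms, which is exactly the proliferation risk the slide warns about.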
60. national law / jurisdiction-based
hurdles
sui generis,
“sweat of the brow”
Crown copyright
“level of skill”
how internat’l data sharing efforts
are affected?
62. attribution:
(legal entity)
“triggered by making of a copy”
does it apply to facts?
how to attribute? (papers, ontologies, data)
“in a manner specified by ...”
attribution stacking
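Attribution stacking is mechanical: each attribution-licensed input adds a credit that every downstream reuse must carry, so large integrations accumulate long credit lists. A toy sketch with made-up dataset names:

```python
# Each attribution-licensed input contributes a credit line that every
# downstream reuse must reproduce; integrating n such sources stacks n credits.
def stack_attributions(datasets):
    """datasets: list of (name, requires_attribution) pairs."""
    return [f"Data from {name}, used under an attribution license"
            for name, requires_attribution in datasets
            if requires_attribution]

inputs = [("GeneDB-1", True), ("OpenAssay", True), ("PD-Facts", False)]
credits = stack_attributions(inputs)
print(len(credits))  # → 2
```

With three inputs this is manageable; with the hundreds of sources typical of integrated data webs, the stacked list becomes a real burden, which is why the protocol below asks for attribution through norms rather than legal terms.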
64. we shouldn’t use the law to make it
hard to do the wrong thing ...
65. need for a legally accurate and
simple solution
reducing or eliminating the need to
make the distinction of what’s protected
requires modular, standards based
approach to licensing
70. ... must promote legal predictability and certainty.
... must be easy to use and understand.
... must impose the lowest possible transaction costs on
users.
full text:
http://sciencecommons.org/projects/publishing/open-access-data-protocol/
71. norms approach
set of principles (not license)
open, accessible, interoperable
create legal zones of certainty
72. calls for data providers to waive all rights
necessary for data extraction and re-use
requires provider place no additional
obligations (like share-alike) to limit
downstream use
request behavior (like attribution) through
norms and terms of use
78. at best, we’re partially right.
at worst, we’re really wrong.
79. infrastructure for a data web
the digital commons
law + content + technology +
community
80. resist the temptation to treat
as property
embrace the potential to treat instead
as a network resource
81. early days of WWW
no licenses (even free)
debate over code
CERN’s decision
view/edit source
network effects