An overview of Digital Science - a new company started out of Macmillan Publishers dedicated to making research more efficient through better use of technology.
"The Mudslide Hypothesis of Science" - OSCONKaitlin Thaney
The document discusses the "mudslide hypothesis" which suggests that traditions persist not due to excellence but because of resistance to change from influential people and the difficulties of transition. It argues that outdated research practices waste time and resources, and that while research is changing, discovery is still suboptimal due to reliance on old systems not designed for modern mediums. It calls for rethinking approaches to research to maximize reuse, allow for network effects, and redefine performance metrics to better support current research workflows.
Making the web work for science - RIT Dean's Lecture Series - Kaitlin Thaney
The document discusses challenges with the current state of scientific research and proposes ways to leverage the power of the open web to improve science. It notes that current systems are designed to create friction rather than enable open collaboration. The document advocates for adopting practices of open source development like using community-driven metadata for software and open, iterative development. It also argues that policies and incentives need to change to reward openness, reuse and reproducibility in order to avoid wasted time, money and opportunities.
Building capacity for open, data-driven science - Grand Rounds - Kaitlin Thaney
Kaitlin Thaney gave a presentation on building capacity for open, data-driven science. She discussed leveraging the power of the web for open scholarship through access to content, data, code and materials. Adopting practices from open source development like code as a research object and iterative development can help further open science. Building capacity requires fostering sustainable practitioner communities through rewards, incentives and reputation systems while providing professional development support and lowering barriers to entry. Shifting to open practices is challenging and requires tools, cultural awareness, connections, skills training and incentives.
Leveraging the power of the web - Open Repositories 2015 - Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It notes that current systems are creating friction despite original intentions of openness. It advocates for building capacity for open, web-enabled research through infrastructure, tools, standards, incentives and training to support reuse, collaboration and interoperability. The goal is to foster sustainable communities of practitioners doing open science.
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference - Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It argues that current systems are creating friction despite original intentions of openness. It advocates for open tools, standards, best practices, and incentives to support web-enabled open research through improved access to content, data, code, materials. This would allow for communication, reuse, and scaling in a distributed environment. It also discusses fostering open source development communities of practice and building capacity for open research through professional development, training, and rewards.
Big data repositories are seeing an increase in smaller, niche datasets as more researchers contribute data. This "long tail of data" poses challenges for discovery, access, and attribution. The authors propose a centralized data repository that would make any dataset discoverable and accessible regardless of size or topic by automating metadata generation and attribution to help researchers find and share relevant data.
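The automated metadata generation the summary proposes can be sketched in a few lines: derive a minimal descriptive record directly from a dataset's raw content so even small, long-tail datasets become discoverable without manual curation. The record fields and the DOI below are illustrative, not a real repository schema.

```python
import csv
import io

def infer_metadata(csv_text, dataset_id):
    """Derive a minimal metadata record from raw CSV content:
    column names, inferred column types, and record count."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    def column_type(values):
        try:
            for v in values:
                float(v)
            return "numeric"
        except ValueError:
            return "text"

    columns = [
        {"name": name, "type": column_type([r[i] for r in rows])}
        for i, name in enumerate(header)
    ]
    return {"id": dataset_id, "n_records": len(rows), "columns": columns}

record = infer_metadata("site,depth_m\nA,4.2\nB,7.9\n", "doi:10.5072/example")
print(record["n_records"], record["columns"][1]["type"])  # → 2 numeric
```

A production pipeline would add provenance and attribution fields on top of this, but the principle is the same: the metadata is computed from the data itself, not typed in by the depositor.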
"The Mudslide Hypothesis of Science" - OSCONKaitlin Thaney
The document discusses the "mudslide hypothesis" which suggests that traditions persist not due to excellence but because of resistance to change from influential people and the difficulties of transition. It argues that outdated research practices waste time and resources, and that while research is changing, discovery is still suboptimal due to reliance on old systems not designed for modern mediums. It calls for rethinking approaches to research to maximize reuse, allow for network effects, and redefine performance metrics to better support current research workflows.
Making the web work for science - RIT Dean's Lecture SeriesKaitlin Thaney
The document discusses challenges with the current state of scientific research and proposes ways to leverage the power of the open web to improve science. It notes that current systems are designed to create friction rather than enable open collaboration. The document advocates for adopting practices of open source development like using community-driven metadata for software and open, iterative development. It also argues that policies and incentives need to change to reward openness, reuse and reproducibility in order to avoid wasted time, money and opportunities.
Building capacity for open, data-driven science - Grand RoundsKaitlin Thaney
Kaitlin Thaney gave a presentation on building capacity for open, data-driven science. She discussed leveraging the power of the web for open scholarship through access to content, data, code and materials. Adopting practices from open source development like code as a research object and iterative development can help further open science. Building capacity requires fostering sustainable practitioner communities through rewards, incentives and reputation systems while providing professional development support and lowering barriers to entry. Shifting to open practices is challenging and requires tools, cultural awareness, connections, skills training and incentives.
Leveraging the power of the web - Open Repositories 2015Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It notes that current systems are creating friction despite original intentions of openness. It advocates for building capacity for open, web-enabled research through infrastructure, tools, standards, incentives and training to support reuse, collaboration and interoperability. The goal is to foster sustainable communities of practitioners doing open science.
Leveraging the power of the web - Rocky Mountain Advanced Computing Conference Kaitlin Thaney
This document discusses leveraging the power of the open web for science. It argues that current systems are creating friction despite original intentions of openness. It advocates for open tools, standards, best practices, and incentives to support web-enabled open research through improved access to content, data, code, materials. This would allow for communication, reuse, and scaling in a distributed environment. It also discusses fostering open source development communities of practice and building capacity for open research through professional development, training, and rewards.
Big data repositories are seeing an increase in smaller, niche datasets as more researchers contribute data. This "long tail of data" poses challenges for discovery, access, and attribution. The authors propose a centralized data repository that would make any dataset discoverable and accessible regardless of size or topic by automating metadata generation and attribution to help researchers find and share relevant data.
Keynote for Theory and Practice of Digital Libraries 2017
The theory and practice of digital libraries provides a long history of thought around how to manage knowledge, ranging from collection development to cataloging and resource description. These tools were all designed to make knowledge findable and accessible to people. Even technical progress in information retrieval and question answering is targeted at helping answer a human’s information need.
However, increasingly demand is for data. Data that is needed not for people’s consumption but to drive machines. As an example of this demand, there has been explosive growth in job openings for Data Engineers – professionals who prepare data for machine consumption. In this talk, I overview the information needs of machine intelligence and ask the question: Are our knowledge management techniques applicable for serving this new consumer?
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture) - Robert Grossman
The document discusses lessons learned from Bionimbus, a petabyte-scale science cloud service provider. Some key points:
- Bionimbus provides cloud-based storage and computing resources for genomic and biomedical research projects dealing with large datasets, such as sequencing a million genomes which would generate around 1 exabyte of data.
- Lessons from operating at this scale include how to effectively store, manage, analyze, and share extremely large datasets across many users and projects in an open science environment.
- The growth of "big data" sciences like genomics poses challenges around data volume but also opportunities to drive new scientific discoveries through large-scale genomic analysis.
Presentation of our short paper
"A First Step Towards Content Protecting Plagiarism Detection"
at the Joint Conference on Digital Libraries (JCDL) 2020, taking place in Wuhan, China, August 2, 2020.
Pre-print of the paper: https://arxiv.org/pdf/2005.11504.pdf
Code and Data: https://github.com/ag-gipp/20CppdData
Enabling knowledge management in the Agronomic Domain - Pierre Larmande
This talk focuses mainly on ongoing projects at the Institute of Computational Biology:
Agronomic Linked Data (AgroLD): a Semantic Web knowledge base designed to integrate data from various publicly available plant-centric data sources.
GIGwA: a tool developed to manage large genomic, transcriptomic and genotyping datasets resulting from NGS analyses.
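A Semantic Web knowledge base like AgroLD stores facts as subject-predicate-object triples and answers queries by pattern matching over them. Real deployments use an RDF triple store queried with SPARQL; this stdlib-only sketch just shows the triple-pattern idea, and the gene/trait URIs are invented for the example.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
TRIPLES = {
    ("ex:gene42", "ex:expressedIn", "ex:leaf"),
    ("ex:gene42", "ex:associatedWith", "ex:droughtTolerance"),
    ("ex:gene7", "ex:expressedIn", "ex:root"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard,
    like a variable in a SPARQL basic graph pattern."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# "Which genes are expressed in the leaf?"
genes = [s for s, _, _ in match(p="ex:expressedIn", o="ex:leaf")]
print(genes)  # → ['ex:gene42']
```

The integration payoff comes from using shared URIs across sources: once two databases describe the same gene with the same identifier, their triples merge into one queryable graph.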
The document discusses various challenges in social network analysis including collecting and extracting network data at scale from sources such as the web, validating automated data extraction methods, and developing algorithms and software that can analyze large and complex network datasets. It also outlines different network analysis methods, visualization and simulation techniques, and recommendations for how tools can better support networking, referrals, and workflows across multiple data sources and programs. Scaling methods and algorithms to very large network sizes and developing standards to integrate diverse data and tools are highlighted as key challenges.
Making the web work for science - eResearch NZ - Kaitlin Thaney
1) The document discusses how current scientific practices are outdated and designed to create friction, limiting access to information and collaboration.
2) It argues that leveraging the power of the open web through tools, data sharing, and interoperability could help advance science by improving access, reuse of resources, and transparency.
3) However, changing practices and building an open, collaborative culture also requires training researchers in digital skills and establishing social infrastructure to encourage openness and reward sharing.
The need for a transparent data supply chain - Paul Groth
1. The document discusses the need for transparency in data supply chains. It notes that data goes through multiple steps as it is collected, modeled, and applied in applications.
2. It illustrates the complexity of data supply chains using examples of how data is reused and integrated from multiple sources to build models and how bias can propagate.
3. The document argues that transparency is important to understand where data comes from, how it has been processed, and help address issues like bias, privacy, or other problems at their source in the data supply chain.
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do... - William Gunn
This document discusses topic modeling on 350 million documents from Mendeley. It describes how topic modeling can be used to categorize documents into topics and subcategories, though categorization is imperfect and topics change over time. It also discusses how topic modeling and metrics can help with fact discovery and reproducibility of research to build more robust datasets.
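The categorization step described above can be illustrated with a toy sketch. A real system infers topics statistically from the corpus (e.g. with LDA); here the topic vocabularies are hand-picked purely to show how a document ends up assigned to the topic whose vocabulary it shares most.

```python
from collections import Counter

# Hand-picked topic vocabularies, standing in for ones a topic model
# would learn from the corpus itself.
TOPICS = {
    "genomics": {"gene", "sequence", "genome", "dna"},
    "networks": {"graph", "node", "edge", "cluster"},
}

def assign_topic(text):
    """Assign a document to the topic with the largest vocabulary overlap."""
    words = set(text.lower().split())
    scores = Counter({t: len(words & vocab) for t, vocab in TOPICS.items()})
    topic, score = scores.most_common(1)[0]
    return topic if score > 0 else "unknown"

print(assign_topic("sequencing the gene reveals genome structure"))  # → genomics
```

The summary's caveat shows up even in this toy: documents that match no vocabulary fall into "unknown", and as the corpus vocabulary drifts over time the topic definitions must be re-learned.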
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
FAIRy stories: the FAIR Data principles in theory and in practice - Carole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years, FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
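The Schema.org approach the abstract mentions makes a dataset findable by web search engines through embedded JSON-LD. The sketch below builds a minimal Schema.org Dataset description; the title, DOI and license values are placeholders, not a real record.

```python
import json

# Minimal Schema.org "Dataset" markup as JSON-LD. Embedded in a landing
# page inside <script type="application/ld+json">, this is what dataset
# search engines harvest to make the data Findable.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example rice phenotyping dataset",
    "identifier": "https://doi.org/10.5072/example",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["FAIR", "phenotyping"],
}
print(json.dumps(dataset, indent=2))
```

Note how much of FAIR this one snippet carries: a resolvable identifier (F), a standard vocabulary (I), and an explicit license (R).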
This document discusses ways to incentivize scientists to share their data through self-interest. It describes two existing models where data sharing is successful: oceanographic research consortia that require data sharing, and biomedical research projects that organize data generation and sharing through a common platform. The document proposes a distributed graph database and computing platform that would allow researchers to query diverse public and private datasets, providing immediate returns for data sharing. By making others' data useful to analyze and mine, researchers would be competitively disadvantaged not to share their own data. The goal is to enable open sharing by addressing current problems and remaining agile for future needs.
- DaMaHub is a distributed platform and local client that allows scientists to organize, share, and preserve their research data and results in an easy and secure way.
- It employs blockchain and IPFS technologies to make scientific data findable, accessible, interoperable, and reusable while preserving authenticity.
- As both a distributed platform and local client, DaMaHub integrates into researchers' workflows and makes data management and open sharing simple.
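The IPFS component rests on content addressing: a file's identifier is derived from its bytes, so the same data always yields the same address and any copy can be verified against it, which is how authenticity is preserved. A minimal sketch of that idea with a plain SHA-256 digest (real IPFS CIDs add multihash/multibase framing on top):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an identifier from the content itself, so the address
    doubles as an integrity check for any retrieved copy."""
    return "sha256-" + hashlib.sha256(data).hexdigest()

addr = content_address(b"experimental results v1")

# The same bytes always produce the same address...
assert content_address(b"experimental results v1") == addr
# ...and any tampering changes it, so recipients can detect modification.
assert content_address(b"experimental results v2") != addr
print(addr[:15])  # → sha256-
```

This is also why content-addressed data deduplicates for free across a distributed network: two researchers storing the same file produce the same address.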
This document summarizes key points about data science and privacy regulation:
1. Regulation aims to alter behavior according to standards to achieve defined outcomes, and can involve standard-setting, information gathering, and modifying behavior.
2. With "big data", problems arise for the laissez-faire conception of privacy regulation due to market failures, insider threats, and mass surveillance capabilities.
3. Designing for privacy is important, such as data minimization, decentralization, consent requirements, and easy-to-use privacy interfaces. The "data exhaust" from ubiquitous data collection threatens privacy in Europe.
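One "designing for privacy" technique from point 3, data minimization, can be sketched concretely: replace direct identifiers with keyed pseudonyms before analysis, so the working dataset never carries raw emails. The secret key below is a placeholder; in practice it would live in a vault, never alongside the data.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # placeholder; keep real keys out of the dataset

def pseudonymize(identifier: str) -> str:
    # A keyed HMAC rather than a bare hash, so an outsider cannot
    # re-identify users by hashing candidate identifiers themselves.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"email": "alice@example.org", "visits": 12}
minimized = {"user": pseudonymize(record["email"]), "visits": record["visits"]}
print("email" in minimized)  # → False
```

The same pseudonym is produced for the same person, so analyses that need to link a user's records across tables still work, while the identifier itself is gone from the data exhaust.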
ESA Ignite talk on UC3 Dash platform for data sharing - Carly Strasser
Ignite talk (20 slides / 15 seconds per slide) for ESA 2014 meeting in Sacramento, CA 12 August 2014. On the Dash platform for helping researchers manage and share their data via institutional repositories
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive... - Merce Crosas
This document discusses the challenges of sharing large-scale and sensitive data and approaches to address them. It describes how data sharing needs to continue supporting discovery, citation, access and reuse of data as datasets increase in size from GBs to TBs and PBs. Current collaborations are working on integrating large datasets with Dataverse and moving computing resources closer to data storage. The document also discusses the DataTags system for sharing sensitive data while maintaining privacy and security.
enabling communities of researchers working together across institutional bou... - Brian Bot
Sage Bionetworks is a non-profit organization that pilots various components to build a scientific research commons and enable more open and collaborative biomedical research. It supports ~40 employees working on research platform development and leadership. Sage brings together researchers across disciplines and institutions through initiatives like the Pan-Cancer Atlas Consortium and the CommonMind Consortium to facilitate data sharing and collaborative analysis around common questions.
This document discusses biohackathons, which are events where participants collaborate to create new tools or applications in bioscience over a weekend. It provides examples of two biohackathons held in 2014 at Stanford University and University of California San Diego. At these events, around 30 participants formed teams and worked on projects like tools for visualizing and editing bio data or finding similarities between biomedical concepts. The goals were to foster collaborative creativity, learning, and networking. The best projects received small cash prizes.
A Big Picture in Research Data Management - Carole Goble
A personal view of the big picture in Research Data Management, given at GFBio - de.NBI Summer School 2018 Riding the Data Life Cycle! Braunschweig Integrated Centre of Systems Biology (BRICS), 03 - 07 September 2018
This document summarizes digital science and the future of online research. It discusses how technology is changing the research workflow by enabling more efficient sharing of ideas, literature reviews, results, materials and data. However, there are still roadblocks to overcome, including specialization of tools, lack of interoperability, and accessibility issues. The key constituencies that must be considered are machines/tools, researchers, and decision makers. While the future of research is digital, adoption remains uneven and cultural shifts are needed to fully realize the benefits of new technologies.
Keynote for Theory and Practice of Digital Libraries 2017
The theory and practice of digital libraries provides a long history of thought around how to manage knowledge ranging from collection development, to cataloging and resource description. These tools were all designed to make knowledge findable and accessible to people. Even technical progress in information retrieval and question answering are all targeted to helping answer a human’s information need.
However, increasingly demand is for data. Data that is needed not for people’s consumption but to drive machines. As an example of this demand, there has been explosive growth in job openings for Data Engineers – professionals who prepare data for machine consumption. In this talk, I overview the information needs of machine intelligence and ask the question: Are our knowledge management techniques applicable for serving this new consumer?
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Robert Grossman
The document discusses lessons learned from Bionimbus, a petabyte-scale science cloud service provider. Some key points:
- Bionimbus provides cloud-based storage and computing resources for genomic and biomedical research projects dealing with large datasets, such as sequencing a million genomes which would generate around 1 exabyte of data.
- Lessons from operating at this scale include how to effectively store, manage, analyze, and share extremely large datasets across many users and projects in an open science environment.
- The growth of "big data" sciences like genomics poses challenges around data volume but also opportunities to drive new scientific discoveries through large-scale genomic analysis.
Presentation of our short paper
"A First Step Towards Content Protecting Plagiarism Detection"
at the Joint Conference on Digital Libraries (JCDL) 2020 taking place at Wuhan, China, August 2, 2020.
Pre-print of the paper: https://arxiv.org/pdf/2005.11504.pdf
Code and Data: https://github.com/ag-gipp/20CppdData
Enabling knowledge management in the Agronomic DomainPierre Larmande
This talk will focus mainly on, ongoing projects at the Institute of Computational Biology
Agronomic Linked Data (AgroLD): is a Semantic Web knowledge base designed to integrate data from various publically available plant centric data sources.
GIGwA: is a tool developed to manage genomic, transcriptomic and genotyping large data resulting from NGS analyses.
The document discusses various challenges in social network analysis including collecting and extracting network data at scale from sources such as the web, validating automated data extraction methods, and developing algorithms and software that can analyze large and complex network datasets. It also outlines different network analysis methods, visualization and simulation techniques, and recommendations for how tools can better support networking, referrals, and workflows across multiple data sources and programs. Scaling methods and algorithms to very large network sizes and developing standards to integrate diverse data and tools are highlighted as key challenges.
Making the web work for science - eResearch nzKaitlin Thaney
1) The document discusses how current scientific practices are outdated and designed to create friction, limiting access to information and collaboration.
2) It argues that leveraging the power of the open web through tools, data sharing, and interoperability could help advance science by improving access, reuse of resources, and transparency.
3) However, changing practices and building an open, collaborative culture also requires training researchers in digital skills and establishing social infrastructure to encourage openness and reward sharing.
The need for a transparent data supply chainPaul Groth
1. The document discusses the need for transparency in data supply chains. It notes that data goes through multiple steps as it is collected, modeled, and applied in applications.
2. It illustrates the complexity of data supply chains using examples of how data is reused and integrated from multiple sources to build models and how bias can propagate.
3. The document argues that transparency is important to understand where data comes from, how it has been processed, and help address issues like bias, privacy, or other problems at their source in the data supply chain.
Sci Know Mine 2013: What can we learn from topic modeling on 350M academic do...William Gunn
This document discusses topic modeling on 350 million documents from Mendeley. It describes how topic modeling can be used to categorize documents into topics and subcategories, though categorization is imperfect and topics change over time. It also discusses how topic modeling and metrics can help with fact discovery and reproducibility of research to build more robust datasets.
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
https://ucsb.zoom.us/meeting/register/tZYod-ippz4pHtaJ0d3ERPIFy2QIvKqjwpXR
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey to wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how we implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need”.
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based industry-lead approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms into the researcher lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
This document discusses ways to incentivize scientists to share their data through self-interest. It describes two existing models where data sharing is successful: oceanographic research consortia that require data sharing, and biomedical research projects that organize data generation and sharing through a common platform. The document proposes a distributed graph database and computing platform that would allow researchers to query diverse public and private datasets, providing immediate returns for data sharing. By making others' data useful to analyze and mine, researchers would be competitively disadvantaged not to share their own data. The goal is to enable open sharing by addressing current problems and remaining agile for future needs.
- DaMaHub is a distributed platform and local client that allows scientists to organize, share, and preserve their research data and results in an easy and secure way.
- It employs blockchain and IPFS technologies to make scientific data findable, accessible, interoperable, and reusable while preserving authenticity.
- As both a distributed platform and local client, DaMaHub integrates into researchers' workflows and makes data management and open sharing simple.
This document summarizes key points about data science and privacy regulation:
1. Regulation aims to alter behavior according to standards to achieve defined outcomes, and can involve standard-setting, information gathering, and modifying behavior.
2. With "big data", problems arise for the laissez-faire conception of privacy regulation due to market failures, insider threats, and mass surveillance capabilities.
3. Designing for privacy is important, such as data minimization, decentralization, consent requirements, and easy-to-use privacy interfaces. The "data exhaust" from ubiquitous data collection threatens privacy in Europe.
ESA Ignite talk on UC3 Dash platform for data sharingCarly Strasser
Ignite talk (20 slides / 15 seconds per slide) for ESA 2014 meeting in Sacramento, CA 12 August 2014. On the Dash platform for helping researchers manage and share their data via institutional repositories
Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive...Merce Crosas
This document discusses the challenges of sharing large-scale and sensitive data and approaches to address them. It describes how data sharing needs to continue supporting discovery, citation, access and reuse of data as datasets increase in size from GBs to TBs and PBs. Current collaborations are working on integrating large datasets with Dataverse and moving computing resources closer to data storage. The document also discusses the DataTags system for sharing sensitive data while maintaining privacy and security.
enabling communities of researchers working together across institutional bou...Brian Bot
Sage Bionetworks is a non-profit organization that pilots various components to build a scientific research commons and enable more open and collaborative biomedical research. It supports ~40 employees working on research platform development and leadership. Sage brings together researchers across disciplines and institutions through initiatives like the Pan-Cancer Atlas Consortium and the CommonMind Consortium to facilitate data sharing and collaborative analysis around common questions.
This document discusses biohackathons, which are events where participants collaborate to create new tools or applications in bioscience over a weekend. It provides examples of two biohackathons held in 2014 at Stanford University and University of California San Diego. At these events, around 30 participants formed teams and worked on projects like tools for visualizing and editing bio data or finding similarities between biomedical concepts. The goals were to foster collaborative creativity, learning, and networking. The best projects received small cash prizes.
A Big Picture in Research Data Management - Carole Goble
A personal view of the big picture in Research Data Management, given at the GFBio - de.NBI Summer School 2018 "Riding the Data Life Cycle!", Braunschweig Integrated Centre of Systems Biology (BRICS), 03-07 September 2018.
This document summarizes digital science and the future of online research. It discusses how technology is changing the research workflow by enabling more efficient sharing of ideas, literature reviews, results, materials and data. However, there are still roadblocks to overcome, including specialization of tools, lack of interoperability, and accessibility issues. The key constituencies that must be considered are machines/tools, researchers, and decision makers. While the future of research is digital, adoption remains uneven and cultural shifts are needed to fully realize the benefits of new technologies.
Semantic search uses language processing to analyze the meaning of content and search queries to return more relevant results. It involves classifying content using taxonomies, identifying named entities, extracting relationships between entities, and matching these based on meaning. Implementing semantic search requires preparing content through classification, metadata, and information architecture, as well as technologies for semantic tagging, entity extraction, triple stores, and integrating these capabilities with existing search and content management systems.
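As a rough illustration of the matching step described above (a toy sketch with an invented two-concept taxonomy, not how any production engine is built), concept-level matching can return relevant results even when the query and document share no words:

```python
# Toy sketch of a semantic search pipeline: classify documents against a
# taxonomy, then match a query to documents via shared concepts rather
# than raw string overlap. Purely illustrative; real systems use NLP
# libraries, entity extraction, and triple stores instead of keyword maps.

TAXONOMY = {  # concept -> surface terms that signal it
    "oncology": {"cancer", "tumor", "chemotherapy"},
    "genomics": {"genome", "dna", "sequencing"},
}

def classify(text):
    """Tag text with every taxonomy concept whose terms appear in it."""
    words = set(text.lower().split())
    return {concept for concept, terms in TAXONOMY.items() if words & terms}

def semantic_search(query, docs):
    """Rank documents by the number of taxonomy concepts shared with the query."""
    query_concepts = classify(query)
    scored = [(len(classify(d) & query_concepts), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

docs = [
    "New chemotherapy protocols for tumor reduction",
    "Advances in dna sequencing hardware",
]
# The word "cancer" never appears in the first document, yet it matches
# because both the query and the document map to the "oncology" concept.
print(semantic_search("cancer treatment outcomes", docs))
```

The design point the slides make is visible even at this scale: the match happens in concept space (both texts tag as "oncology"), which is why classification and metadata preparation come before search integration.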
The document discusses research design and measurement. It defines key concepts in research design such as the different types of scales used in measurement (nominal, ordinal, interval, ratio), sources of measurement error, and criteria for evaluating measurement tools. It also outlines the different descriptors of research design including the degree of question crystallization, data collection methods, time dimensions, research environment, and purpose of studies.
Future of text analysis Forrester briefing - Stuart Shulman
Dr. Stuart Shulman gave a presentation on the future of text analysis. He discussed how text analysis tools will enable quicker processing and more accurate results through features like advanced search, metadata tagging, and active machine learning. Projects will leverage user credentials to control access and allow for shared analysis across distributed teams. Text from various sources will be imported into a unified repository for eDiscovery and search. DiscoverText was introduced as a tool that incorporates these capabilities.
Federated Search Webinar for SLA (Special Libraries Assoc.) - Helen Mitchell
A comprehensive presentation on Federated Search (FS) Technologies including the types of FS, FS Challenges & Benefits, a case study, FS Evaluation Criteria, Examples of FS Solutions, Best Practices and Future Vision of where FS Technologies may go.
Machine Learned Relevance at A Large Scale Search Engine - Salford Systems
The document discusses machine learned relevance at a large scale search engine. It provides biographies of the two authors who have extensive experience in machine learning and search engines. It then outlines the topics to be covered, including an introduction to machine learned ranking for search, relevance evaluation methodologies, data collection and metrics, the Quixey search engine system, model training approaches, and conclusions.
This presentation was provided by Kristi Holmes of Northwestern University during the NISO hot topic virtual conference "Effective Data Management," which was held on September 29, 2021.
Search Solutions 2011: Successful Enterprise Search By Design - Marianne Sweeny
When your colleagues say they want Google, they don’t mean the Google Search Appliance. They mean the Google Search user experience: pervasive, expedient and delivering the information that they need. Successful enterprise search does not start with the application features, is not part of the information architecture, does not come from a controlled vocabulary and does not emerge on its own from the developers. It requires enterprise-specific data mining, enterprise-specific user-centered design and fine tuning to turn “search sucks” into search success within the firewall. This presentation looks at action items, tools and deliverables for Discovery, Planning, Design and Post Launch phases of an enterprise search deployment.
Anita de Waard from Elsevier discussed research data management from a publisher's perspective. She outlined tools her organization has developed to enable open and integrated RDM, including metrics to measure data usage. While tools see adoption, challenges include a lack of researcher urgency, distributed responsibility for RDM, integrating many available tools, and unclear business models. She welcomed questions on her organization's role in supporting best practices.
How to explore and access knowledge in research: steps for using academic search engines and online databases, followed by a discussion of which resources are reliable.
ASIS&T webinar: People directories - Access Innovations - Bert Carelli
This document discusses using taxonomies to create people directories and author networks. It outlines Access Innovations' background in building taxonomies and their data harmony software. Taxonomies can play a role in developing better resources about people by linking entities like authors, publications, and institutions. This allows for knowledge discovery and collaboration through detailed author profiles, visualizing co-author networks, and integrating identity into publisher systems. Standards like VIAF, ORCID, and Project VIVO aim to connect names and publications across repositories through semantic linking of author data.
IR Strangelove or: How I Learned to Stop Worrying and Love the Institutional ... - OCLC Research
A view of the research support landscape and RLG partnership activities to help academic librarians provide better services. Given at the Spring CNI briefing in Minneapolis April 6, 2009.
By Ricky Erway, OCLC Research
Lost In Translation: When Machines Meet STM Content - scrazzl
This is an edited version of the presentation that I gave at STM innovations #ukinno on December the 4th, 2013. It covers the Resources Identification Initiative and highlights some of the partners that are using the scrazzl product API to power research product discovery on their websites.
"Building Capacity for Open Research" - AAMC - Kaitlin Thaney
This document discusses challenges with the current state of scientific research and proposes approaches to shift towards more open and reproducible practices. It notes that current systems are designed to create friction and reward the wrong behaviors. To address this, it advocates taking a multi-faceted approach including improving infrastructure for open tools, standards, best practices, incentives and recognition, training, and policies. Key steps proposed are baking reproducible practices into academia, creating opportunities for experimentation and cross-disciplinary work, and rethinking how researchers are rewarded to support more open science.
Building the FAIR Research Commons: A Data Driven Society of Scientists - Carole Goble
Science is knowledge work. The scientific method and scholarly communication are about facilitating “knowledge turns” – that is, the turning of observation and hypothesis through experimentation, comparison, and analysis into new, pooled knowledge. Turns depend on the FAIR flow and availability of data, methods for automated processing, reproducible results and on a society of scientists coordinating and collaborating. We need to build a new form of Research Commons and I will present my steps towards this.
Presented at the symposium "The Future of a Data-Driven Society", Maastricht University, 25 Jan 2018, which accompanied the 42nd Dies Natalis, where I was awarded an honorary doctorate.
Personal video:
https://www.youtube.com/watch?v=k5WN6KDDatU&index=4&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
Video of the symposium:
https://www.youtube.com/watch?v=JN9eMMtCHf8&t=19s&index=6&list=PLzi-FBaZlOOagma5dCW7WSA5lv22tmNMD
This presentation sets out some of the challenges around citing and identifying datasets and introduces DataCite, the international data citation initiative. DataCite was founded on 1 December 2009 to support researchers by providing methods for them to locate, identify, and cite research datasets with confidence.
This presentation was given by Adam Farquhar at the STM Publishers Association Innovation Conference on 4 December 2009.
Megaphones to (No)where: On Sustaining Change - Kaitlin Thaney
The document discusses the challenges of sustaining social change movements over the long term. It notes that while some campaigns see initial bursts of interest and participation, engagement often drops off rapidly. True change requires coordinating efforts and receiving feedback from those most impacted over long periods. Designing movements with input and accountability to impacted communities is important for achieving lasting impact rather than just short-term goals.
Lessons in Resilience - International Women's Day Keynote @ Brooklyn College - Kaitlin Thaney
This document provides lessons on resilience from the author's experiences. It discusses 8 lessons: 1) Know your limits for compromise and willpower. 2) Cultivate supportive relationships. 3) Accept that things will not always go as planned and dust yourself off. 4) Prioritize self-care as it is important for self-preservation. 5) Know when to make a career change if needed. 6) Show gratitude to others. 7) Advocate for yourself and help from others. 8) Your identity is more than just your career. The document encourages learning from failures and having a support system.
Kaitlin Thaney discusses building capacity for open science by investing in networks of open practice and communities that sustain open activity over time. Current systems create friction despite original open intentions, but shifting practices towards openness requires supporting professional development, interoperable tools, and incentives for collaboration and sharing. Mozilla aims to empower researchers through open, collaborative research on the web.
Fueling the Open Movement - Compute Midwest - Kaitlin Thaney
The document discusses Mozilla and its role in protecting and promoting an open internet. It provides background on Mozilla's history starting from the Netscape browser wars in the late 1990s. It describes Mozilla as a software engineering organization, global non-profit, network of contributors, and global social change movement aimed at protecting the open internet. The document calls on readers to join Mozilla in its mission.
The document discusses shifting scientific practice towards more open, collaborative and web-enabled research. It outlines current challenges around measuring contributions beyond publications alone. It then presents several initiatives to promote open scholarship, including contributorship badges to recognize different types of scientific work, dashboards to improve software discoverability, and community events bringing together researchers. Sustaining these changes requires addressing incentives, skills development, and lowering barriers to participation.
This document discusses Mozilla Science Lab's efforts to promote open and collaborative research practices through enabling access to content, data, code and materials online. It advocates for adopting standards and best practices that reward openness, interoperability and sharing. It also highlights the need for infrastructure, tools, repositories, incentives and training to support open research and foster sustainable practitioner communities.
Building capacity for open science - COASP Meeting - Kaitlin Thaney
The document discusses building capacity for open science. It outlines current challenges including a lack of incentives for open research. Examples are provided of open science practices like code as a research object and open iterative development. The need to teach web literacy in research and lower barriers to participation is discussed. Sustaining open science requires further adoption incentives, professional development supports, and helping scale community-driven efforts.
Kaitlin Thaney is piloting contributorship badges for science to help researchers use open web tools to advance science. The project involves building communities, teaching open science skills, and empowering others to learn and solve problems together. Thaney is hacking the reward system and seeks input on a GitHub project and help with a global sprint in June to further open practices in science.
Piloting Contributorship Badges for Science - Kaitlin Thaney
This document discusses piloting contributorship badges for science. It introduces Project CRedIT, which aims to standardize terms for contributor roles in scientific research. By having common terms for things like data curation or project administration, friction can be eliminated. The document provides links to related resources and invites input on the PaperBadger tool, which is on GitHub and allows assigning standardized contributor roles.
"Designing for Truth, Scale and Sustainability" - WSSSPE2 Keynote - Kaitlin Thaney
1) The document discusses designing scientific research practices and tools for truth, scale, and sustainability. It argues current systems are designed for friction rather than collaboration and progress.
2) It notes a perception crisis where up to 70% of research cannot be reproduced, representing wasted money. Shifting practice requires a multi-faceted approach including open tools, standards, incentives and recognition to foster reuse.
3) The document calls for further adoption of "web-enabled science" through access to content, data, code and materials with rewards for openness and collaboration. It discusses rethinking professional development to lower barriers to entry and foster sustainable practitioner communities.
"Making the Web Work for Science" - NCI CBIIT - Kaitlin Thaney
This document discusses making scientific research more open and reproducible by leveraging the power of the web. It argues that current scientific practices and reward systems create friction and limit progress. Adopting open, web-enabled approaches could help address issues like lack of reproducibility and wasted resources by encouraging sharing of content, data, code and materials online. However, shifting practices requires a multi-faceted approach that addresses infrastructure, incentives, skills training and cultural norms.
Making the web work for science - University of Queensland - Kaitlin Thaney
The document discusses how current scientific research practices are outdated and designed to create friction, preventing optimal sharing and collaboration. It advocates transitioning to more "web-enabled science" by improving open access to content, data, code, materials and tools while rewarding openness. However, changing practices faces challenges like skills gaps. Collective efforts are needed to build capacity through training, educate the next generation, and instill best digital and reproducible practices to help research fully leverage the web.
"Let's talk about the web" - Citizen Cyberscience Summit - Kaitlin Thaney
The document discusses the need to improve how knowledge is shared on the web and make it more open and accessible. It notes there is still friction that prevents the original intentions of the web from being fully realized. It advocates helping researchers use the open web to change science and making systems interoperable between humans and machines. It also stresses the importance of building digital literacy and skills through education to grow communities and prepare people for an increasingly digital world.
The document discusses challenges with the current scientific research system and proposes moving towards a more "open science" model. It notes that the existing system was not designed for knowledge sharing and reproducibility. Open science principles advocate for open access to content, data, code, and materials in order to enable greater collaboration, reuse of results, and transparency. However, moving research practices will require building skills and changing incentives across the scientific community. The document calls for coordinating efforts to provide training, establish best practices, and foster connections to support operating research on the web in a more open and interoperable manner.
The document discusses issues with the current scientific research system including broken incentives discouraging collaboration and sharing, inflexible tools, and research that is difficult to build upon or reuse. It notes that up to 70% of academic research cannot be reproduced, representing a waste of resources. The document advocates shifting practices to instill best digital and reproducible research practices through "research hygiene" and ensuring systems are interoperable through coordination and collaboration, in order to help researchers use open web technologies to improve science. It provides links to join the efforts through teaching, hacking, building and learning.
Making the Web work for science - SciTechLA - Kaitlin Thaney
Kaitlin Thaney discusses making science more open and reproducible through open web technologies. She defines open science as making the entire research process transparent by sharing hypotheses, protocols, data, and analysis. However, current research practices and systems do not fully support open science. Barriers include a lack of access to materials, difficulty publishing negative results, and the "digital skills gap" of many researchers. Promoting a culture of open source through training, community building, and interoperable standards and platforms can help shift practices towards more open and reproducible science at scale.
The document discusses increasing digital literacy for science researchers by teaching them best practices for open and reproducible science. It notes that self-education on digital tools does not scale effectively. The Mozilla Foundation is working with over 100 instructors to provide bootcamp trainings in digital and reproducible research skills to thousands of learners. It asks what core competencies are needed for open science and how to build broader capacity through assessment and participation in their efforts and online conversations.
This document discusses the need to build open science gateways that maximize scale, foster connectivity, and minimize friction in the research process. It notes that our current systems create unnecessary barriers and that we must learn from open source methods. It calls for building capacity through training and skills development, changing perceptions of open science, and making the web work better for scientific collaboration and knowledge sharing.
The need for disruption in science - Envisioning the Future, CIA - Kaitlin Thaney
The document discusses the need for disruption in science. It notes that up to 70% of academic research cannot be reproduced, representing a waste of money and effort. Additionally, existing research systems are hindering progress by locking scientists into old mechanisms and traditions that persist due to influential people resisting change. However, the document expresses hope that systems can be improved by rethinking how knowledge is defined, produced and shared in order to increase reproducibility and openness in scientific research.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI for test automation with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence - IndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silos continue to crumble, many organizations still treat monitoring and observability as the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Programming Foundation Models with DSPy - Meetup Slides - Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracy - Tomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
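As a hand-rolled illustration of the retrieval idea (the entities and relations below are invented, and a real GraphRAG system queries a knowledge graph store and feeds the facts to an LLM), graph retrieval amounts to walking a query entity's neighborhood and serializing the edges as prompt context:

```python
# Minimal sketch of graph-based retrieval for RAG: given a query entity,
# collect facts from its neighborhood in a tiny in-memory knowledge graph
# and render them as context lines for an LLM prompt. Illustrative only;
# the triples here are made up for the example.

KG = {  # subject -> list of (relation, object) edges
    "BRCA1": [("associated_with", "breast cancer"), ("located_on", "chromosome 17")],
    "breast cancer": [("treated_by", "tamoxifen")],
}

def retrieve_context(entity, hops=2):
    """Breadth-first walk up to `hops` steps out, emitting one fact per edge."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(hops):
        next_frontier = []
        for subj in frontier:
            for rel, obj in KG.get(subj, []):
                facts.append(f"{subj} {rel} {obj}")
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return facts

# These fact strings would be prepended to the user's question in the
# LLM prompt, grounding the generated answer in the graph.
print(retrieve_context("BRCA1"))
```

The accuracy gain the talk describes comes from exactly this grounding step: the model answers from retrieved graph facts rather than from its parametric memory alone.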
Building Production Ready Search Pipelines with Spark and Milvus - Zilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack - shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
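At its core, the vector search described above ranks stored items by similarity between embedding vectors rather than keyword overlap. A minimal cosine-similarity sketch (the 3-dimensional vectors are invented stand-ins for real model embeddings, and `INDEX` is a toy in-memory map, not the Atlas API):

```python
import math

# Minimal sketch of vector search: rank items by cosine similarity between
# a query embedding and stored embeddings. The 3-d vectors are toy values;
# a real system stores model-generated embeddings with hundreds of
# dimensions and uses an approximate-nearest-neighbor index for speed.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

INDEX = {  # document id -> stored embedding
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_finance": [0.0, 0.1, 0.9],
}

def vector_search(query_vec, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(INDEX, key=lambda name: cosine(query_vec, INDEX[name]), reverse=True)
    return ranked[:k]

# A query embedded near the "animals" region retrieves the animal docs,
# even though no keywords are involved at all.
print(vector_search([0.85, 0.15, 0.05]))
```

This is the "semantic, context-aware" property the deck advertises: relevance falls out of distance in embedding space, so a query and a document never need to share a single term.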
Infrastructure Challenges in Scaling RAG with Custom AI models - Zilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 - Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
52. Stakeholder map: RAE teams, pharma, startups, tools developers, content providers, partners/collaborators, big business, funding agencies, researchers, grant officers, administrators
53. The same stakeholder map, with "partners/collaborators" replaced by the call to action "work with us"