The document discusses the importance of data curation for scientific progress and integrity. It outlines the library's role in connecting researchers to content and supporting the research lifecycle. Barriers to data sharing like poor discovery, unfamiliar processes, and loss of control can be addressed through tools and services that provide identifiers, metadata, data use agreements, and data management planning guidance. Embedding best practices into existing researcher tools and workflows can help promote data curation.
Integration of research literature and data (InFoLiS) (Philipp Zumstein)
Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically to provide an open database platform for data, from the raw data coming from experimental measurements or models, through intermediate manipulations, to the final published results. The sheer amount of data involved creates some unique software-technical challenges. One of these challenges is addressed in the part of the study presented here, namely to characterize scientific data (with the initial focus being detailed chemistry data from the combustion kinetics community), so that efficient searches can be made. A formalization of this characterization comes in the form of schemas of tags and keywords describing the data, and ontologies describing the relationships between data types and between the characterizations themselves. These will be translated to metadata tags connected to the data points within a non-relational database of data for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from only making available data accessible to the community to also enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concept of the ‘electronic lab notebook’ and the existence and availability of extensive concept extraction tools, primarily from the chemical informatics field.
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future (ASIS&T)
Wendy A. Kozlowski, Dianne Dietrich, Gail Steinhart and Sarah Wright
Cornell University Library, Ithaca, NY
Research Data in eCommons @ Cornell: Present and Future
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Poster RDAP13: Data information literacy multiple paths to a single goal (ASIS&T)
Jake Carlson, Jon Jeffryes, Brian Westra and Sarah Wright
Data Information Literacy: Multiple Paths to a Single Goal
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
A demonstration of the DMPTool, which helps researchers create data management plans now required by the National Science Foundation and other US grant funding agencies. See http://www.cdlib.org/uc3/webinars/20111019/ for recording.
This presentation was provided by Melissa Levine of the University of Michigan during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016
Poster RDAP13: A Workflow for Depositing to a Research Data Repository: A Cas... (ASIS&T)
Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro
JHU Data Management Services
Johns Hopkins University Sheridan Libraries
A Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Data Citation Implementation Guidelines By Tim Clark (datascienceiqss)
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
Lesson 8 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license, attribution and citation requested.
This webinar is intended for librarians, staff, and information professionals interested in improving usability for the DMPTool in their institution. This webinar will also help institutions begin to formalize which individuals or resources will be available to help researchers using the tool. This webinar will be most useful for users that need to customize the tool for their institution.
Bianca Crowley, Collections Coordinator, Biodiversity Heritage Library/Smithsonian Libraries
Keri Thompson, Digital Projects Librarian, Smithsonian Libraries
Constance Rinaldo, Librarian of the Ernst Mayr Library Museum of Comparative Zoology, Harvard University
Biodiversity Heritage Library Content Liberator
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Poster RDAP13: Provenance of Figures in the Global Change Information System (ASIS&T)
Justin Goldstein, Curt Tilmes, Ana Pinheiro Privette, Robert David, Marshall Ma, Jin Zheng, Steven Aulenbach and Fred Burnett
Provenance of Figures in the Global Change Information System
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Research Integrity Advisor and Data Management (ARDC)
Dr Paul Wong from the Australian Research Data Commons presented at the University of Technology Sydney's RIA Data Management Workshop on 21 June 2018. In partnership with the Australian Research Council, the National Health and Medical Research Council, the Australian Research Data Commons, and RMIT University, this is part of a national workshop series in data management for research integrity advisors.
Talk at JISC Repositories conference intended for repository managers or research managers on some of the issues involved. Talk had to be originally given unaided because of a technology problem!
This presentation was provided by Lisa Johnston, University of Minnesota, for a NISO Virtual Conference on data curation held on Wednesday, August 31, 2016
Slides from Monday 30 July - Data in the Scholarly Communications Life Cycle Course which is part of the FORCE11 Scholarly Communications Institute.
Presenter - Natasha Simons
Merritt’s micro-services-based architecture provides a number of options for easy integration with diverse external discovery services with specific disciplinary focus on scientific data sharing. By removing many of the barriers faced by researchers interested in data publication, the integrations of Merritt with DataShare and Research Hub exemplify a new service model for cooperative and distributed data sharing. The widespread adoption of such sharing is critical to open scientific inquiry and advancement.
OU Library Research Support webinar: Data sharing (Daniel Crane)
Slides from a webinar delivered on 06th February 2018 for OU research staff and students. Covers data sharing policies; Benefits of data sharing; Data repositories; Preparing data for sharing; and Re-using data.
Talk given at the Data Visualisation and the Future of Academic Publishing event. https://www.eventbrite.com/e/data-visualisation-and-the-future-of-academic-publishing-tickets-25372801733?password=dataviz
A presentation given by Manjula Patel (UKOLN) at the Repository Curation Environments (RECURSE) Workshop held at the 4th International Digital Curation Conference, Edinburgh, 1st December 2008,
http://www.dcc.ac.uk/events/dcc-2008/programme/
Scott Edmunds slides for class 8 from the HKU Data Curation (module MLIM7350 from the Faculty of Education) course covering science data, medical data and ethics, and the FAIR data principles.
To facilitate data sharing from within the University of California system and beyond, the University of California Curation Center (UC3) is developing a new ingest and discovery layer for our data curation service, Dash. Dash uses the Merritt repository for preservation and a self-service overlay layer for submission and discovery of research datasets. The new overlay, dubbed Stash (STore And SHare), will feature an enhanced user interface with a simple and intuitive deposit workflow, while still accommodating rich metadata. Stash will enable individual scholars to upload data through local file browse or drag-and-drop operation; describe data in terms of scientifically meaningful metadata, including methods, references, and geospatial information; identify datasets for persistent citation and retrieval; preserve and share data in an appropriate repository; and discover, retrieve, and reuse data through faceted search and browse. Stash can be implemented in conjunction with any standards-compliant repository that supports the SWORD protocol for deposit and the OAI-PMH protocol for metadata harvesting. Stash will feature native support for the DataCite or Dublin Core metadata schemas, but is designed to accommodate other schemas to support discipline-specific applications. By alleviating many of the barriers that have historically precluded wider adoption of open data principles, Stash empowers individual scholars to assert active curation control over their research outputs; encourages more widespread data preservation, publication, sharing, and reuse; and promotes open scholarly inquiry and advancement.
Data “publication” attempts to appropriate for data the prestige of publication in the scholarly literature. While the scholarly communication community substantially endorses the idea, it hasn’t fully resolved what a data publication should look like or how data peer review should work. To contribute an important and neglected perspective on these issues, we surveyed ~250 researchers across the sciences and social sciences, asking what expectations “data publication” raises and what features would be useful to evaluate the trustworthiness and impact of a data publication and the contribution of its creator(s).
In early 2014, we asked science and social science researchers...
• What expectations do the terms publication and peer review raise in reference to data?
• What features would be useful to evaluate the trustworthiness, evaluate the impact, and enhance the prestige of a data publication?
Although there is consensus that datasets should be treated like “first class” research objects in how they are discovered, cited, and recognized, this is still far from a reality. Datasets are poorly indexed by search engines, and they are rarely cited in formal reference lists. A solution that a number of journals are implementing is to publish discovery and citation proxy objects in the form of peer-reviewed “data papers.” A strength of this approach is that it requires dataset creators to write up rich and useful metadata for the paper, but an accompanying weakness is that busy creators are not always willing to invest the necessary time and energy. To enhance dataset discoverability without burdening creators, EZID (easy-eye-dee) will begin using dataset metadata to automatically generate lightweight, non-peer reviewed publications that will increase the exposure of the metadata to search engines. EZID (ezid.cdlib.org) maintains public DataCite metadata records for over 167,000 datasets, any of which could be viewed as HTML or as a dynamically generated PDF. In cases where the creator has submitted only the required DataCite metadata, the document will function as a cover-sheet or landing page. If the creator chooses to submit optional Abstract and Methods metadata (over 2,000 records already contain Abstracts), the document expands to more closely resemble a traditional journal article, while retaining the linking functionality of a landing page. A potential bonus is that providing an incrementally improved document in exchange for the effort of submitting incrementally improved metadata may encourage authors to submit more than the minimum required metadata.
Software development should build on the successful work of others. The DMPTool helps researchers with data management planning, but what about other phases of the data life cycle? In this webinar, we will discuss what software integration with the DMPTool might look like, and why it is important. Topics include:
1. Background: why tools integration is important; why we are talking about this in terms of the DMPTool.
2. Details and plans for DMPTool2 regarding software integration and compatibility.
3. Future possibilities for software integration for DMPTool2.
4. Example of successful integration of tools: work at the Center for Open Science.
Data management plans existed long before the NSF started requiring them. DMPs have inherent value despite their being relatively unknown to researchers until now. Proper, thorough data management plans are potentially a major time saver and a huge asset for the project. In this webinar, we will cover how to go beyond funder requirements and develop more thorough DMPs. The Gulf of Mexico Research Initiative requires an extensive data management plan for projects it funds; we will hear about their efforts and how they are planning to use the DMPTool going forward.
Libraries and Research Data Curation: Barriers and Incentives for Preservation, Sharing, and Reuse
1. Future of Scientific Publishing: Open Access to Manuscripts and Big Data
Stanford University, June 27, 2013
Libraries and Research Data Curation
Barriers and Incentives for Preservation, Sharing, and Reuse
Stephen Abrams
University of California Curation Center
California Digital Library
www.cdlib.org/uc3
2. Why is data curation important?
Accelerating scientific progress
Enabling appropriate scrutiny and verification of results
Promoting integrity and debate
Facilitating new collaborations
Avoiding needless duplication of effort
Increasingly, complying with institutional policies, publication requirements, and funder mandates
Cf. Whyte and Tedds (2011), “Making the case for research data management,” DCC briefing paper, www.dcc.ac.uk/resources/briefing-papers/making-case-rdm
3. The library’s role
A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time
Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, conference.ifla.org/past/ifla78/116-tenopir-en.pdf
Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles
[Figure: the scholarly lifecycle and the information lifecycle, with stages labeled Create, Research, Publish, Share, Reuse, Discover, Collect, Curation, Preserve, and Access]
4. Addressing barriers to adoption
Critical issues on both the demand…
Poor discovery
and supply side …
Unfamiliar processes
Loss of control
Inadequate guidance
Cf. Schäfer et al. (2011), Baseline Report on Drivers and Barriers in Data Sharing, hdl:10013/epic.39262
Better access to tools and resources
Embedded best practices
Data use agreements
Data management planning
Data publication and citation
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
5. Data publication and citation
Provide the same infrastructural support for data that exists for traditional publications
Unique, actionable identifiers
Stable citation
Bi-directional references between publications and the data that underlie their analysis, synthesis, and summarization
Discovery via disciplinary portals, catalogs, and web searches
Use and impact metrics
www.flickr.com/photos/fotobib/5555065521 www.flickr.com/photos/minhmeoinfo/4597866532
6. Data publication and citation
Provide the same infrastructural support for data that exists
for traditional publications
http://n2t.net/ezid
ARK and DOI identifiers
Descriptive metadata
Resolution targets
Aggregation by DataCite
(and soon) Primo and Web of Knowledge
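As a rough illustration of what an "actionable" identifier with descriptive metadata and a resolution target might look like in practice, the sketch below turns an ARK or DOI string into a resolver URL and formats a simple citation. The record layout, field names, and citation format are assumptions made for illustration and are not EZID's data model. The n2t.net host comes from the slides; doi.org is the general DOI resolver.

```python
from dataclasses import dataclass, field

# Resolver hosts: n2t.net appears in the slides; doi.org is the general DOI resolver.
RESOLVERS = {"ark": "http://n2t.net/", "doi": "https://doi.org/"}


def resolver_url(identifier: str) -> str:
    """Turn 'ark:/...' or 'doi:...' into a URL that a browser can follow."""
    scheme, sep, rest = identifier.partition(":")
    scheme = scheme.lower()
    if not sep or scheme not in RESOLVERS or not rest:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    if scheme == "ark":
        # ARKs keep their 'ark:' label in the resolver URL.
        return RESOLVERS["ark"] + "ark:" + rest
    return RESOLVERS["doi"] + rest


@dataclass
class IdentifierRecord:
    """Hypothetical record: an identifier, where it resolves, and who/what/when."""
    identifier: str
    target: str
    metadata: dict = field(default_factory=dict)

    def citation(self) -> str:
        m = self.metadata
        return (f"{m['creator']} ({m['year']}). {m['title']}. "
                f"{m['publisher']}. {resolver_url(self.identifier)}")
```

For example, `resolver_url("ark:/90135/q13j39xf")` gives `http://n2t.net/ark:/90135/q13j39xf`, the form of ARK link used elsewhere in this deck.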
7. Embedded best practices
Data curation is an unfamiliar set of concepts, practices, and
jargon to most researchers
www.flickr.com/photos/vixon/116447718
8. Embedded best practices
Data curation is an unfamiliar set of concepts, practices, and
jargon to most researchers
It’s easier to augment systems than change behaviors
Embed curation best practices into tools and workflows already
used by researchers
www.flickr.com/photos/34067077@N00/4576265327 www.flickr.com/photos/wealthofhealth4/6919840647
9. Embedded best practices
Excel is often the database of choice for many researchers
Excel add-in and Azure web service
Automates …
Best practices check
Data description
Persistent identifier and
citation generation
Repository submission
http://dataup.cdlib.org/
2013 Innovation Award winner
10. Embedded best practices
Excel is often the database of choice for many researchers
Excel add-in and Azure web service
Automates …
Best practices check
Data description
Persistent identifier and
citation generation
Repository submission
ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
http://n2t.net/ark:/90135/q13j39xf
2013 Innovation Award winner
11. Embedded best practices
Excel is often the database of choice for many researchers
Excel add-in and Azure web service
Automates …
Best practices check
Data description
Persistent identifier and
citation generation
Repository submission
ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
DataONE federation
http://dataone.org/
http://cn.dataone.org/onemercury
2013 Innovation Award winner
12. Embedded best practices
Excel is often the database of choice for many researchers
Excel add-in and Azure web service
Automates …
Best practices check
Data description
Persistent identifier and
citation generation
Repository submission
ONEShare repository
http://merritt.cdlib.org/m/oneshare_dataup
DataONE federation
http://dataone.org/
So you don’t need to know …
Metadata schema
XML syntax
Identifier registration
Packaging standards
Submission protocol
Aggregation/harvesting
mechanism
2013 Innovation Award winner
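To give a sense of what an automated best-practices check on a spreadsheet might involve, here is a minimal sketch over a table held as a list of rows. The rules shown (every column named, no duplicate names, no blank rows, consistent row length) are assumptions chosen for illustration; the slides do not list the checks that DataUp actually performs.

```python
def check_table(rows):
    """Return a list of human-readable problems found in a table.

    `rows` is a list of lists; the first row is treated as the header.
    """
    problems = []
    if not rows:
        return ["table is empty"]

    header = [str(cell).strip() for cell in rows[0]]
    seen = set()
    for index, name in enumerate(header, start=1):
        if not name:
            problems.append(f"column {index} has no name")
        elif name.lower() in seen:
            problems.append(f"duplicate column name: {name}")
        seen.add(name.lower())

    # Data rows are numbered as they would appear in the spreadsheet (header = row 1).
    for number, row in enumerate(rows[1:], start=2):
        if all(str(cell).strip() == "" for cell in row):
            problems.append(f"row {number} is blank")
        elif len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} cells, expected {len(header)}"
            )
    return problems
```

A tool built along these lines could report the problems to the researcher inside the spreadsheet itself, which fits the idea of augmenting a familiar system rather than asking for a change in behavior.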
13. Data use agreements
Maintain control over the dissemination of research results
through click-through DUAs
Assert explicit license requirements and terms of use
Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001
http://datashare.ucsf.edu/
15. Data use agreements
Maintain control over the dissemination of research results
through click-through DUAs
Assert explicit license requirements and terms of use
Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001
From: no-reply-merritt@ucop.edu
Subject: Merritt DUA acceptance
Name: Stephen Abrams
Affiliation: California Digital Library
Collection: UCSF DataShare
Object: Frontotemporal Lobar Degeneration (FTLD)
Date: 2013-05-31 09:50:34 PDT
Terms of use: As part of this agreement, Consumer submits to the following
statements:
(1) I will receive access to de-identified data and will not attempt to establish the
identity of any of the study subjects.
(2) I will share these data only with my immediate co-workers, and I will not transfer
these data to other research groups. I understand that these data are available to
other research groups through the process by which I obtain them.
(3) I will require anyone in my group who utilizes these data, or anyone with whom I
share these data to comply with this data use agreement
...
http://datashare.ucsf.edu/
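The click-through step and the acceptance notice shown above can be sketched with Python's standard email library. This is only an illustration: the function names, the recipient address, and the use of a simple boolean to represent the click-through are assumptions, not a description of how Merritt or DataShare implement DUAs. The sender address, subject, and field labels follow the sample message on this slide.

```python
from email.message import EmailMessage


def acceptance_notice(name, affiliation, collection, obj, accepted_at, owner_email):
    """Build the notification sent to the data owner after a consumer accepts."""
    msg = EmailMessage()
    msg["From"] = "no-reply-merritt@ucop.edu"  # sender shown in the sample message
    msg["To"] = owner_email                     # hypothetical recipient
    msg["Subject"] = "Merritt DUA acceptance"
    msg.set_content(
        f"Name: {name}\n"
        f"Affiliation: {affiliation}\n"
        f"Collection: {collection}\n"
        f"Object: {obj}\n"
        f"Date: {accepted_at}\n"
    )
    return msg


def request_download(consumer, accepted_terms, owner_email, accepted_at):
    """Allow the download only after click-through; return the notice to send."""
    if not accepted_terms:
        raise PermissionError("data use agreement must be accepted before download")
    return acceptance_notice(
        consumer["name"],
        consumer["affiliation"],
        consumer["collection"],
        consumer["object"],
        accepted_at,
        owner_email,
    )
```

Keeping the gate and the notification in one place means the data owner hears about every release, which is how the click-through preserves a measure of control over dissemination.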
16. Data use agreements
Maintain control over the dissemination of research results
through click-through DUAs
Assert explicit license requirements and terms of use
Notification of consumer acceptance
Cf. Brazhnik and Jones (2007), “Anatomy of data integration,” Journal of Biomedical Informatics 40(3): 252-69, doi:10.1016/j.jbi.2006.09.001
Next steps …
Disciplinary survey of current DUA practice
Collaborate with Creative Commons to establish “model” DUAs
17. Data management planning
Researchers are being asked to plan for data curation by
institutional policy and as a pre-condition for publication and
grant funding
Cf. Office of Science and Technology Policy (2013), Increasing Access to the Results of Federally Funded Scientific Research, www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
18. Data management planning
DMPTool provides guidance and resources for managing plans
Edit, publish, and share DMPs
Customizable for funding agency requirements
Customizable for general, disciplinary, and institutional resources
19 requirement templates
43 resource sets
Next steps …
DMPTool2: Follow-on development – Sloan Foundation
Outreach and training – IMLS
http://dmptool.org/
http://blog.dmptool.org/
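One way to picture "customizable for funding agency requirements" is a template listing the sections a funder expects, against which a draft plan can be checked. The sketch below is hypothetical throughout: the funder name, section names, and data layout are invented for illustration and do not reflect DMPTool's actual templates.

```python
# Hypothetical requirement templates keyed by funder; section names are invented.
TEMPLATES = {
    "Example Funder": [
        "data description",
        "metadata standards",
        "access and sharing",
        "preservation",
    ],
}


def missing_sections(funder, plan):
    """List template sections that the draft plan omits or leaves empty."""
    required = TEMPLATES[funder]
    return [section for section in required if not plan.get(section, "").strip()]
```

A draft that fills in only "data description" would be reported as missing the other three sections, giving the researcher a concrete list to work through before submission.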
19. Removing barriers, providing incentives
“Access to and sharing of data are essential for the conduct and advancement of science”
Arzberger et al. (2004), “Promoting access to public research data for scientific, economic, and social development,” Data Science Journal 3: 135-52, doi:10.2481/dsj.3.135
Libraries are a natural partner for the research community
Deep and broad experience in the curation, preservation, and dissemination of digital assets
Subject area specialization in science, technology, engineering, and mathematics
Collaborations with campus IT groups and data centers
20. Removing barriers, providing incentives
Libraries are a natural partner for the research community
Effective discovery through … Data publication and citation
Maintain control through … Data use agreements
Familiar processes through … Embedded best practices
Guidance and resources through … Data management planning
www.slideshare.net/UC3/uc3-librariesandcurationbarriersandincentives
www.cdlib.org/uc3
uc3@ucop.edu
n2t.net/ezid datashare.ucsf.edu merritt.cdlib.org dmptool.org dataup.org
Barry Egan, File rio 2006, http://www.flickr.com/photos/vixon/116447718
Wealth of Health, Nanomedicinescientifist working at the laboratory, http://www.flickr.com/photos/wealthofhealth4/6919840647
Martin Caltrane, Work desk, http://www.flickr.com/photos/34067077@N00/4576265327