Talk at CNI 2015 Spring Membership Meeting in Seattle on April 14th, 2015, see http://www.cni.org/events/membership-meetings/upcoming-meeting/spring-2015/
Abstract: The goal of the InFoLiS project is to connect research data and publications. Links between data and literature are created automatically by means of text mining and made available as Linked Open Data (LOD) for seamless integration into different retrieval systems. This enables scientists to directly access information about corresponding research data in a literature information system, and, vice versa, it is possible to directly find different interpretations and analyses in the literature of the same research data. In our talk, we will describe our methods for generating the links and give insight into the Linked Data infrastructure including the services we are currently building. Most importantly, we will detail how our solutions can be used by other institutions and invite all interested participants to discuss with us their ideas and thoughts on the requirements for these services to ensure broad interoperability with existing systems and infrastructures. InFoLiS is a joint project by the GESIS – Leibniz Institute for the Social Sciences, Cologne, Mannheim University Library, and Mannheim University supported by a grant from the DFG – German Research Foundation.
Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.
Poster RDAP13: Research Data in eCommons @ Cornell: Present and Future - ASIS&T
Wendy A. Kozlowski, Dianne Dietrich, Gail Steinhart and Sarah Wright
Cornell University Library, Ithaca, NY
Research Data in eCommons @ Cornell: Present and Future
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Discussion of the role of academic libraries in the curation, preservation, and sharing of research data, particularly with regard to addressing barriers and providing incentives. Four specific tools are presented: EZID, data use agreements (DUAs) in the Merritt/DataShare repository, DataUp, and DMPTool.
The goal of the Very Open Data Project is to provide a software-technical foundation for this exchange of data, more specifically to provide an open database platform for data ranging from the raw data coming from experimental measurements or models, through intermediate manipulations, to the final published results. The sheer amount of data involved creates some unique software-technical challenges. One of these challenges is addressed in the part of the study presented here, namely to characterize scientific data (with the initial focus being detailed chemistry data from the combustion kinetic community), so that efficient searches can be made. A formalization of this characterization comes in the form of schemas describing the tags and keywords attached to data, and ontologies describing the relationships between data types and between the characterizations themselves. These will be translated into metadata tags connected to the data points within a non-relational database of data for the community.
The focus of the initial work will be on data and its accessibility. As the project progresses, the emphasis will shift from only making available data accessible to the community toward enabling the community itself to contribute its own data with minimal effort. This will involve, for example, the concepts of the ‘electronic lab notebook’ and the existence and availability of extensive concept extraction tools, primarily from the chemical informatics field.
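The idea of attaching metadata tags to data points so that they can be searched efficiently can be illustrated with a small sketch. This is not the project's implementation: the class name TagIndex, the record identifiers, the tags, and the payload values are all hypothetical, and a plain in-memory dictionary stands in for the non-relational database mentioned above.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory stand-in for a document store with tag-based lookup."""

    def __init__(self):
        self._records = {}                 # record id -> payload (any dict)
        self._by_tag = defaultdict(set)    # normalized tag -> set of record ids

    def add(self, record_id, payload, tags):
        """Store a record and index it under each of its tags."""
        self._records[record_id] = payload
        for tag in tags:
            self._by_tag[tag.lower()].add(record_id)

    def search(self, *tags):
        """Return sorted ids of records that carry ALL of the given tags."""
        if not tags:
            return []
        id_sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*id_sets))


# Hypothetical records; the values are made up for illustration only.
index = TagIndex()
index.add("exp-001", {"ignition_delay_ms": 1.2}, ["methane", "ignition delay", "shock tube"])
index.add("mod-002", {"species_count": 53}, ["methane", "kinetic model"])

print(index.search("methane"))                # both records carry this tag
print(index.search("methane", "shock tube"))  # only the experiment matches
```

Because each tag maps directly to a set of record ids, a multi-tag query reduces to a set intersection instead of a scan over every record, which is one simple way to get the kind of efficient search the abstract describes.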
Poster RDAP13: Data information literacy multiple paths to a single goal - ASIS&T
Jake Carlson, Jon Jeffryes, Brian Westra and Sarah Wright
Data Information Literacy: Multiple Paths to a Single Goal
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
Poster RDAP13: A Workflow for Depositing to a Research Data Repository: A Cas... - ASIS&T
Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro
JHU Data Management Services
Johns Hopkins University Sheridan Libraries
A Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
RDAP 15 EarthCollab: Connecting Scientific Information Sources using the Sema... - ASIS&T
Research Data Access and Preservation Summit, 2015
Minneapolis, MN
April 22-23, 2015
Erica M. Johns, Jon Corson-Rikert, Huda J. Khan, Dean B. Krafft and Matthew S. Mayernik
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
Key lecture for the EURO-BASIN Training Workshop on Introduction to Statistical Modelling for Habitat Model Development, 26-28 Oct, AZTI-Tecnalia, Pasaia, Spain (www.euro-basin.eu)
February 18 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
A demonstration of the DMPTool, which helps researchers create data management plans now required by the National Science Foundation and other US grant funding agencies. See http://www.cdlib.org/uc3/webinars/20111019/ for recording.
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Using data management plans as a research tool: an introduction to the DART Project
Amanda L. Whitmire, Ph.D., Assistant Professor, Data Management Specialist, Oregon State University Libraries & Press
This presentation was provided by Karen Baker, University of Illinois - Urbana-Champaign, during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016
This presentation was provided by Tim McGeary of Duke University during the NISO virtual conference, Open Data Projects, held on Wednesday, June 13, 2018.
An analysis and characterization of DMPs in NSF proposals from the University... - Megan O'Donnell
Beginning in July 2011, the University of Illinois at Urbana-Champaign Library, working in conjunction with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of Data Management Plans (DMPs) in newly submitted National Science Foundation (NSF) grant proposals. The DMP became a required element in all NSF proposals beginning on January 18th, 2011. This analysis was undertaken to provide the Illinois campus and library with detailed information on the DMPs being submitted by Illinois researchers. In particular, the analysis allows us to categorize the grant applicants’ proposed DMP data storage venues and data reuse mechanisms, and provides us with data on the use of DMP templates developed by the University of Illinois Library.
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Keynote Address: Data Management Plan Requirements at the US Department of Energy
Laura J. Biven, Ph.D., Senior Science and Technology Advisor, Office of the Deputy Director for Science Programs, Office of Science, US Department of Energy
This presentation was provided by Lisa Johnston, University of Minnesota, for a NISO Virtual Conference on data curation held on Wednesday, August 31, 2016
Presentation by Lisa Federer (UCLA) on 16 July 2013 as part of the IMLS-sponsored DMPTool Webinar Series.
Description: This webinar will discuss the special needs of health sciences researchers and help you learn how to talk to researchers in the health and medical fields about their data management needs. We will cover NIH Data Sharing Policy and how to write a data management plan that meets NIH’s requirements. After viewing this webinar, participants will understand: who is required to submit a plan; specific information that should be included in a plan; how to use the DMPTool to write an NIH-specific DMP; and where to find additional resources for help.
Publishing Ada: A Retrospective Look at the First Three Years of an Open Peer... - Karen Estlund
Presentation by: Karen Estlund, Sarah Hamid, and Bryce Peake
At the CNI spring 2012 meeting, we presented on a new collaborative journal publishing project from The Fembot Collective and the University of Oregon (UO) Libraries, Ada: A Journal of Gender, New Media, and Technology. The Fembot Collective is a collaborative of feminist media scholars, producers, and artists engaged with the intersection of new media and technology and scholarly communication. One aspiration of this project was to reclaim the means of scholarly production through a community-centered model of open peer review and multi-modal publication processes. As a work in progress, Ada has continuously evolved to meet the needs of diverse authors, readers, and commentators. In the face of changing scholarly communication practices, the Fembot and library collaboration offers an alternative system of open-access publication and review that recaptures academic production structures in favor of cross-disciplinary, multi-modal, collaborative knowledge. Our community standards state that “responding is political work” emphasizing a space that demands constant redirection and active participation by its collaborators in order to generate new expressions of feminist open access scholarship over time. Now in our third year of publication and working on our ninth issue, we will review lessons learned about audience, production, infrastructure, design and assessment. We will discuss the ways in which our intervention has been transformed by, while also transforming, discussions about participatory media, open and collaborative peer review, production costs, and the intersections of technical and intellectual labor.
http://adanewmedia.org
http://fembotcollective.org
https://library.uoregon.edu/digitalscholarship
Software curation as a digital preservation service - Keith Webster
Presentation to the Coalition for Networked Information Spring Conference, Seattle, April 2015 by Keith Webster of Carnegie Mellon University and Euan Cochrane of Yale. Describes need for software curation services, and offers two examples, one from each of our universities, of library engagement.
The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI).
The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project has designed, specified, and tested a framework for web-based synchronization that is based on SiteMaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology.
The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources whereas a Change List details resources that were created, deleted or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies capabilities that require a third party to recurrently poll for up-to-date information about a repository’s resources, but also publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
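The relationship between a Resource List and a Change List can be sketched as a comparison of two inventories taken at different times. The example below is only an illustration of that idea, not the ResourceSync serialization itself (which is Sitemap-based XML): the Snapshot type, the change_list function, and the example.org URIs are hypothetical names chosen for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot type: resource URI -> last-modified timestamp (ISO 8601 string).
Snapshot = Dict[str, str]


def change_list(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Compare two inventories and return sorted (uri, change) pairs.

    change is one of 'created', 'updated' or 'deleted'.
    """
    changes = []
    for uri, modified in new.items():
        if uri not in old:
            changes.append((uri, "created"))
        elif old[uri] != modified:
            changes.append((uri, "updated"))
    for uri in old:
        if uri not in new:
            changes.append((uri, "deleted"))
    return sorted(changes)


# Two made-up inventories of the same repository, taken a month apart.
before = {
    "http://example.org/res1": "2014-01-01T00:00:00Z",
    "http://example.org/res2": "2014-01-02T00:00:00Z",
    "http://example.org/res3": "2014-01-03T00:00:00Z",
}
after = {
    "http://example.org/res1": "2014-01-01T00:00:00Z",  # unchanged
    "http://example.org/res2": "2014-02-01T00:00:00Z",  # modified
    "http://example.org/res4": "2014-02-02T00:00:00Z",  # new
}

for uri, change in change_list(before, after):
    print(change, uri)
```

A third party that already holds the earlier inventory only needs the three reported changes to catch up, rather than re-reading the full list, which is the motivation for offering a Change List alongside a Resource List.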
Piloting Linked Data to Connect Library and Archive Resources to the New Worl... - Laura Akerman
Presentation for the CNI (Coalition for Networked Information) Fall Forum, December 2012. Describes Emory University Library’s first-hand experience in interlinking Civil War-related materials and other online resources by leveraging open linked data principles. The library has been actively evaluating linked data’s potential to replace current library processes and services (bibliographic services, finding aids, cataloging, and metadata work) as a more efficient and sustainable means, and one that could bring greater benefit to end users for research and learning. The Library’s initial focus was on workforce education and hands-on learning through real-time experiments: the Connections project was begun to prepare staff to work with linked data, a process that has culminated in a 3-month hands-on pilot to build and convert some data. The pilot introduced the concept to a wide range of staff, including subject liaisons, archivists, metadata librarians, and programmers. Emory’s “silos” of data were interlinked with other open data sources as a way to enhance user discovery and use of library materials on a very limited scale.
Semantic Linking & Retrieval for Digital Libraries - Stefan Dietze
An overview of recent works on entity linking and retrieval in large corpora, specifically bibliographic data. The works address both traditional Linked Data and knowledge graphs as well as data extracted from Web markup, such as the Web Data Commons.
Decomposing Social and Semantic Networks in Emerging “Big Data” Research - Han Woo PARK
A paper that gives a good summary of the background to big data's emergence as an academic field.
http://www.sciencedirect.com/science/article/pii/S1751157713000473
Park, H.W., & Leydesdorff, L. (2013). Decomposing Social and Semantic Networks in Emerging “Big Data” Research. Journal of Informetrics, 7(3), 756-765. DOI: 10.1016/j.joi.2013.05.004
Tools for the Management of Research Data - Heinz Pampel
Workshop “Wege in die Köpfe” of the DFG project “EWIG - Entwicklung von Workflowkomponenten für die Langzeitarchivierung von Forschungsdaten in den Geowissenschaften” (development of workflow components for the long-term archiving of research data in the geosciences) | Berlin, 3 July 2014
Big data is prevalent in our daily life. Not surprisingly, big data has become a hot topic discussed by the commercial world, the media, magazines, the general public, and elsewhere. From an academic point of view, is it a research area worth exploring, or is it just another hype? Are only computer science or IS-related scholars suited to big data research because of its nature, or are scholars from other research areas also suited to this subject? This study aims to answer these questions using an informetric approach with the SSCI journal database as its data source, applying the quantitative analytical power of informetrics to a representative data source. The research shows that big data research is in its growth phase, with an exponential growth pattern since 2012 and great potential for years to come. Perhaps surprisingly, computer science or IS-related disciplines are not at the top of the five leading research areas in the results. In fact, the top five research disciplines are more diversified than expected: Business Economics (#1), Government Law (#2), Information Science/Library Science (#3), Social Science (#4), and Computer Science (#5). Scholars from US universities are the most productive in this subject, while Asian countries, including Taiwan, are also visible. In addition, this study finds that big data publications in the SSCI journal database during 2005-2015 do fit Lotka's law. The study contributes to understanding current big data research trends and shows the way for researchers who are interested in conducting future research in big data, regardless of their research backgrounds.
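For readers unfamiliar with Lotka's law, it states that the number of authors who publish n papers is roughly proportional to 1/n^a, with a close to 2 in the classical formulation. The sketch below computes the expected counts under that assumption; the function name lotka_expected and the figure of 100 single-paper authors are hypothetical, and the exponent the study actually fitted is not given in the abstract.

```python
def lotka_expected(single_paper_authors, max_papers, exponent=2.0):
    """Expected number of authors with 1..max_papers papers under Lotka's law.

    Uses y(n) = y(1) / n**exponent, where y(1) is the number of authors
    with exactly one paper. exponent=2.0 is the classical value, assumed here.
    """
    return [single_paper_authors / (n ** exponent) for n in range(1, max_papers + 1)]


# Made-up starting point: 100 authors with exactly one paper.
expected = lotka_expected(100, 4)
for n, count in enumerate(expected, start=1):
    print(f"{n} paper(s): about {count:.2f} authors")
```

Testing whether a publication set fits the law amounts to comparing observed author counts against a curve of this shape, usually after estimating the exponent from the data rather than fixing it at 2.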
Interlinking educational data to Web of Data (Thesis presentation) - Enayat Rajabi
This is a thesis presentation about interlinking educational data to the Web of Data. I explain how I used the Linked Data approach to expose and interlink educational data to the Linked Open Data cloud.
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016) - dri_ireland
Presentation given by Martin Donnelly, Senior Institutional Support Officer at the Digital Curation Centre (DCC), as part of the panel session “Digital data sharing: the opportunities and challenges of opening research” at the Digital Humanities conference, Krakow, 15 July 2016. The presentation looks at digital data curation at the DCC.
In materials sciences, a large amount of research data is generated through a broad spectrum of different
experiments. As of today, experimental research data including meta-data in materials science is often
stored decentralized by the researcher(s) conducting the experiments without generally accepted standards
on what and how to store data. The conducted research and experiments often involve a considerable
investment from public funding agencies that desire the results to be made available in order to increase
their impact. In order to achieve the goal of citable and (openly) accessible materials science experimental
research data in the future, not only does an adequate infrastructure need to be established, but the question of
how to measure the quality of the experimental research data also needs to be addressed. In this publication, the
authors identify requirements and challenges towards a systematic methodology to measure experimental
research data quality prior to publication and derive different approaches on that basis. These methods are
critically discussed and assessed by their contribution and limitations towards the set goals. Concluding, a
combination of selected methods is presented as a systematic, functional and practical quality measurement
and assurance approach for experimental research data in materials science with the goal of supporting
the accessibility and dissemination of existing data sets.
Metadata for digital long-term preservation - Michael Day
Presentation given at the Max Planck Gesellschaft eScience Seminar 2008: Aspects of long-term archiving, hosted by the Gesellschaft für Wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG), Göttingen, Germany, 19-20 June 2008
Research Objects: more than the sum of the parts - Carole Goble
Workshop on Managing Digital Research Objects in an Expanding Science Ecosystem, 15 Nov 2017, Bethesda, USA
https://www.rd-alliance.org/managing-digital-research-objects-expanding-science-ecosystem
Research output is more than just the rhetorical narrative. The experimental methods, computational codes, data, algorithms, workflows, Standard Operating Procedures, samples and so on are the objects of research that enable reuse and reproduction of scientific experiments, and they too need to be examined and exchanged as research knowledge.
A first step is to think of Digital Research Objects as a broadening out to embrace these artefacts or assets of research. The next is to recognise that investigations use multiple, interlinked, evolving artefacts. Multiple datasets and multiple models support a study; each model is associated with datasets for construction, validation and prediction; an analytic pipeline has multiple codes and may be made up of nested sub-pipelines, and so on. Research Objects (http://researchobject.org/) is a framework by which the many, nested and contributed components of research can be packaged together in a systematic way, and their context, provenance and relationships richly described.
Integration of research literature and data (InFoLiS)
1. Integration of research literature and data (InFoLiS)
Katarina Boland (GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany)
Philipp Zumstein (Mannheim University Library, Mannheim, Germany)
CNI 2015 Spring Membership Meeting
April 14th, 2015
2. Introduction
The InFoLiS project: Integration of research data and publications
InFoLiS I: 05/2011 - 05/2013
InFoLiS II: 08/2014 - 08/2016
InFoLiS is funded by the DFG (SU 647/2-1)
3. InFoLiS Project Goals
[Diagram: a catalogue of publications (SSOAR (GESIS), Primo (UB MA), ...) and a data catalogue of research data (da|ra (GESIS), ...), each receiving queries and returning responses, connected by links.]
4. Outline
Part I: Generation of Links
Part II: How can you reuse it?
6. Citation of Research Data
Recommendation [1]:
Creator (Publication Date): Title. Publication Agent. Identifier
Creator (Publication Date): Title. Version. Publication Agent. Type of Resource. Identifier.
→ Extraction based on these patterns?
[1] see http://auffinden-zitieren-dokumentieren.de/zitieren/empfohlene-datenzitation/
7. References to Datasets
"... presentation and discussion of the empirical findings. For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used and for both periods, the impact factors are estimated using linear regression models."
→ data from the <title> of the years <year> are used
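The generalised phrase on this slide can be read as a text pattern with slots for the study title and the years. The following is a minimal sketch of that reading, assuming a plain regular expression; the function name and the handling of year lists are illustrative choices, not the project's implementation.

```python
import re

# The title and year placeholders from the slide become named groups.
# Assumption: years are four-digit numbers joined by ", " or " and ".
PATTERN = re.compile(
    r"data from the (?P<title>.+?) of the years "
    r"(?P<years>\d{4}(?:(?:, | and )\d{4})*) are used"
)


def find_dataset_references(text):
    """Return (title, [years]) for every match of the pattern in text."""
    return [
        (m.group("title"), re.findall(r"\d{4}", m.group("years")))
        for m in PATTERN.finditer(text)
    ]


sentence = (
    "For this purpose, data from the Socio-Economic Panel (SOEP) "
    "of the years 1990 and 2003 are used."
)
references = find_dataset_references(sentence)
print(references)
```

A pattern this literal only matches one phrasing, which is why the following slides show several quite different ways of referring to a dataset.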
8. References to Datasets
Table 1: Population forecast for Germany depending on age cohorts - proportion in percent.
Data base: 10th Population Forecast of the Federal Statistical Office, variant 5.
→ Data base: <number> <title> of the <publication agent>, variant <variant>
9. References to Datasets
Consulted were furthermore ...
→ Consulted were furthermore <title1>, <title2>, <title3>, ..., <titleN>.
10. References to Datasets
Table 3: Sample of the surveys conducted in the years 2003 and 2004 as well as size of the sample, with valid data from both surveys
(Source: Ditton et al. 2005a)
→ (Source: <citation of descriptive publication>)
11. References to Datasets
...are hard to detect!
see also...
Green, Toby (2009). We Need Publishing Standards for Datasets and Data Tables. OECD Publishing White Paper. doi: 10.1787/603233448430
Altman, Micah and Gary King (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. In: D-Lib Magazine 13.3. url: http://www.dlib.org/dlib/march07/altman/03altman.html
12. Automatic Identification of References
Why not simply search for study titles in publications?
“ALLBUS/GGSS 1996 (Allgemeine Bevölkerungsumfrage der Sozialwissenschaften/German General Social Survey 1996)”
“ALLBUS 96”
“Youth 2010”
16. General idea
How do humans recognize study references?
Source: Estimations based on SOEP, wave 2002.
17. General idea
How do humans recognize study references?
Source: Estimations based on xyz, wave 2002.
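The two versions of this slide suggest that the surrounding words identify a study reference even when the study name itself is unknown. One way to mimic that idea is to turn a sentence containing a known study name into a context pattern and apply it to other sentences. The sketch below is a simplified illustration under that assumption; the function names, the three-word context window and the treatment of numbers are hypothetical choices, and the project's actual method is described in Boland et al. (2012).

```python
import re


def _generalize(word):
    """Escape a context word and let any run of digits match any number."""
    return re.sub(r"\d+", r"\\d+", re.escape(word))


def derive_pattern(sentence, seed, window=3):
    """Build a context pattern from one occurrence of a known study name."""
    start = sentence.find(seed)
    if start < 0:
        raise ValueError("seed does not occur in sentence")
    left_words = sentence[:start].split()[-window:]
    right_words = sentence[start + len(seed):].split()[:window]
    left = r"\s+".join(_generalize(w) for w in left_words)
    right = r"\s+".join(_generalize(w) for w in right_words)
    # The study name is replaced by a lazy wildcard between the two contexts.
    return re.compile(left + r"\s*(?P<title>.+?)\s*" + right)


def extract_titles(pattern, sentences):
    """Apply a derived pattern and collect the text found in the title slot."""
    return [m.group("title") for s in sentences for m in pattern.finditer(s)]


pattern = derive_pattern("Source: Estimations based on SOEP, wave 2002.", "SOEP")
found = extract_titles(pattern, [
    "Source: Estimations based on ALLBUS, wave 1998.",
    "This sentence mentions no dataset.",
])
print(found)
```

Titles found this way could in turn serve as new seeds, so that each round yields further patterns; the sketch shows a single round only.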
19. Reference Extraction
for details see...
Katarina Boland, Dominique Ritze, Kai Eckert & Brigitte Mathiak (2012). Identifying References to Datasets in Publications. In: Proceedings of the Second International Conference on Theory and Practice of Digital Libraries (TPDL), Lecture Notes in Computer Science Volume 7489, pp. 150-161. Berlin: Springer. doi:10.1007/978-3-642-33290-6_17
21. Mapping to Datasets in da|ra: granularity of registration vs. citation
Strategies: 1) greedy; 2) exact; 3) best
22. Mapping to Datasets in da|ra
[Figure: tree of ALLBUS dataset records. Recoverable node labels: ALLBUS; ALLBUScompact; ALLBUS 1996; ALLBUS 1998; ALLBUS 2000; ALLBUS 2000 CAPI/PAPI; ALLBUScompact 2000; ALLBUScompact 2000 CAPI/PAPI; ALLBUScompact 2000 CAPI; ALLBUS - Cumulation 1980-2006; ALLBUS - Cumulation 1980-2008; ALLBUScompact - Cumulation 1980-2010; ...]
→ use ontology
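The slides name three mapping strategies (greedy, exact, best) without defining them. The sketch below shows one possible reading, purely as an assumption: "exact" accepts only identical titles, "greedy" accepts every registered title that contains all cited words, and "best" picks the most similar greedy candidate. The function names and the similarity measure (difflib from the Python standard library) are illustrative and not taken from the project.

```python
from difflib import SequenceMatcher

# Registered titles taken from the node labels of the figure above.
REGISTERED = [
    "ALLBUS 2000 CAPI/PAPI",
    "ALLBUScompact 2000 CAPI/PAPI",
    "ALLBUScompact 2000 CAPI",
    "ALLBUScompact - Cumulation 1980-2010",
]


def match_exact(citation, registered):
    """Assumed reading of 'exact': titles identical up to letter case."""
    return [t for t in registered if t.lower() == citation.lower()]


def match_greedy(citation, registered):
    """Assumed reading of 'greedy': every title containing all cited words."""
    wanted = citation.lower().split()
    return [t for t in registered
            if all(w in t.lower().split() for w in wanted)]


def match_best(citation, registered):
    """Assumed reading of 'best': the most similar greedy candidate, if any."""
    candidates = match_greedy(citation, registered)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda t: SequenceMatcher(None, citation.lower(), t.lower()).ratio(),
    )


cited = "ALLBUScompact 2000"
print(match_exact(cited, REGISTERED))
print(match_greedy(cited, REGISTERED))
print(match_best(cited, REGISTERED))
```

The example illustrates the granularity problem: the citation names a study and year, while the registered records are more specific versions, so an exact comparison finds nothing and the other two strategies have to decide how many records to link.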
23. Ontology: Approach
Vocabulary: e.g. DDI-RDF Discovery Vocabulary [2]
[2] Thomas Bosch, Richard Cyganiak, Arofan Gregory, Joachim Wackerow (2013): DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data. In: Proceedings of the 6th Linked Data on the Web (LDOW) Workshop at the 22nd International World Wide Web Conference (WWW). CEUR Workshop Proceedings, pp. 46-55
26. Thank you for your attention!
katarina.boland@gesis.org
Next part: How can you reuse it?
33. (Internal) Data structure
Entities: Document, Pattern, Execution of Algorithm, Study Title, Study URI
Questions the structure can answer:
Which other study titles are found with the new configuration of the algorithm?
How was a pattern derived?
Which studies are found in a document?
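To make the three questions on this slide concrete, the entities can be sketched as plain records that keep track of where each result came from. This is a hypothetical model, not the project's schema: all class names, field names, identifiers and the configuration key below are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pattern:
    regex: str            # the context pattern itself
    source_document: str  # document in which the pattern was derived
    seed_title: str       # known study title that led to the pattern


@dataclass(frozen=True)
class StudyLink:
    document: str     # identifier of the publication
    study_title: str  # title as found in the text
    study_uri: str    # identifier of the matching dataset record
    pattern: Pattern  # pattern that produced this hit


@dataclass
class Execution:
    algorithm: str
    config: dict
    links: list[StudyLink] = field(default_factory=list)


def studies_in(document: str, execution: Execution) -> set[str]:
    """Which studies are found in a document?"""
    return {l.study_uri for l in execution.links if l.document == document}


def new_titles(old: Execution, new: Execution) -> set[str]:
    """Which other study titles are found with the new configuration?"""
    return {l.study_title for l in new.links} - {l.study_title for l in old.links}


p = Pattern(r"based on (?P<title>.+?), wave \d+", "doc-1", "SOEP")
run1 = Execution("bootstrapping", {"max_iterations": 1},
                 [StudyLink("doc-1", "SOEP", "study:soep", p)])
run2 = Execution("bootstrapping", {"max_iterations": 2},
                 run1.links + [StudyLink("doc-2", "ALLBUS", "study:allbus", p)])
print(new_titles(run1, run2), studies_in("doc-2", run2))
# "How was a pattern derived?" is answered by the provenance fields of Pattern.
print(run2.links[1].pattern.seed_title, run2.links[1].pattern.source_document)
```

Keeping the pattern and the execution attached to every link is what makes the results traceable when the algorithm is re-run with a different configuration.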
36. RESTful API (web services)
GET, POST, PUT, DELETE, PATCH resources
Search, perform algorithms, upload files
open for integration into other workflows, e.g. in
resource discovery systems
research data catalogues
digital repositories
possible to orchestrate over a web interface for
individual use
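From a client's perspective, working with such web services amounts to sending HTTP requests with the listed methods to resource URLs. The sketch below only builds the requests and does not send them. The base URL, the resource names ("file", "execution") and the payload fields are placeholders invented for illustration; they are not the project's actual API.

```python
import json
from urllib.request import Request

# Hypothetical base URL; replace with the address of a real service instance.
BASE_URL = "https://infolis.example.org/api"


def build_request(method, resource, payload=None):
    """Create an HTTP request for a resource, with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(f"{BASE_URL}/{resource}", data=data,
                   headers=headers, method=method)


# Upload a file description, start an algorithm run, then read the result.
upload = build_request("POST", "file", {"fileName": "paper.pdf"})
start = build_request("POST", "execution", {"algorithm": "bootstrapping"})
status = build_request("GET", "execution/1")

for req in (upload, start, status):
    print(req.get_method(), req.full_url)
# Sending a request would be done with urllib.request.urlopen(req).
```

Because every step is an ordinary HTTP call, the same sequence could be issued by a discovery system, a data catalogue or a repository as part of its own workflow.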
41. Quoting the Horizon Report 2014
“Visionary leadership for research data management
models is also required to determine how to best
incorporate data connections into library catalogs” (NMC
Horizon Report 2014 - Library Edition, p. 7)
42. Current situation: Several steps needed
Common situation today:
Search online catalogue
Evaluate search results
Find the full text of the relevant source
Read the publication
Spot the research data
Moreover, the reverse information is often missing completely:
Which publications are built on some specific research data?
43. Two Approaches
Client-side
load additional data in the catalogue view (e.g. via Ajax)
enrich view, links
up-to-date data
Embed data in the web presentation
Server-side
add additional data to your catalogue database (e.g. Primo enrichment process)
enrich view, links, search, sort, filter
time-lagged because of the update mechanism
Do the data fit into existing infrastructure? (fields, tables, database)
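The trade-off between the two approaches (up-to-date data versus time lag) can be illustrated with a small simulation. In the sketch below, the link service is a plain dictionary and both enrichment functions are hypothetical; a real client-side integration would run in the browser, and a real server-side one would go through the catalogue's own update process.

```python
from typing import Callable, Dict, List

Links = Dict[str, List[str]]  # publication id -> titles of cited datasets


def enrich_at_view_time(record: dict, lookup: Callable[[str], List[str]]) -> dict:
    """Client-side style: ask the link service whenever a record is displayed."""
    return {**record, "datasets": lookup(record["id"])}


def enrich_in_batch(catalogue: List[dict], snapshot: Links) -> List[dict]:
    """Server-side style: copy a snapshot of the links into the catalogue."""
    return [{**r, "datasets": snapshot.get(r["id"], [])} for r in catalogue]


service: Links = {"pub1": ["ALLBUS 2010"]}
catalogue = [{"id": "pub1", "title": "Some publication"}]

# Periodic update: the catalogue stores a copy of the links as of now.
indexed = enrich_in_batch(catalogue, {k: list(v) for k, v in service.items()})

# A new link is generated after the update has run.
service["pub1"] = service["pub1"] + ["own research data"]

live = enrich_at_view_time(catalogue[0], lambda pid: service.get(pid, []))
print(live["datasets"])
print(indexed[0]["datasets"])
```

The stored copy lags behind until the next update, but because the links then live in the catalogue database they can also be searched, sorted and filtered, which a view-time lookup does not offer.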
44. Integration as links
Link from catalogue entry ...
… to the corresponding research data
45. Integration as popup
Cited research data: 2
• ALLBUS 2010 (used in 512 publications)
• part of ALLBUS (used in 13,456 publications)
• own research data (used in 1 publication)
48. Enrich your research data catalogue
Cited in: Ritze, D., Paulheim, H., & Eckert, K. (2013). Evaluation Measures for Ontology Matchers in Supervised Matching Scenarios. In The Semantic Web – ISWC 2013 (pp. 392–407).
Tags from Publication: Supervised Ontology Matching, Evaluation, Recall, Precision, F-Measure, Precision@N-Curves, ROC-Curves, Precision-Recall-Curves
49. Current Goals of the Project
1. Expansion to other disciplines and languages
2. Linked data based infrastructure
3. Improve the reusability of generated links
50. Dissemination
our web services will be open for everyone
project webpage
http://infolis.github.io/
background information,
slides, publications, news
Additionally our code is open source
https://github.com/infolis
you can install/try out everything locally
development of code
51. Questions, Discussions, Feedback
Questions?
Discussions
Give us feedback
Small online survey: http://t1p.de/infolis
http://wiki.bib.uni-mannheim.de/limesurvey/index.php?sid=55594