Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator
The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data intensive science, presents current solutions, and lays out a roadmap to the future where new information technologies significantly increase the pace of scientific discovery and innovation.
Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center
Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the more subtle nuances of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility -- the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used, or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. The talk will describe recent work toward this end being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.
These are the slides from Robert H. McDonald's Future Trends Panel presentation at the Inter-institutional Approaches to Supporting Scholarly Communication Symposium, held on August 16, 2012 at the Georgia Institute of Technology.
RDAP13 Mark Leggott: Stewarding research data using the Islandora framework -- ASIS&T
Mark Leggott, University of PEI/DiscoveryGarden
Islandora: Stewarding research data using the Islandora framework
Mark Leggott, Thornton Staples and Kathleen Van Ekris
Panel: Global scientific data infrastructure
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
A basic course on Research data management: part 1 - part 4 -- Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
Creating a sustainable business model for a digital repository: the Dryad experience -- ASIS&T
Creating a sustainable business model for a digital repository: the Dryad experience
Peggy Schaeffer
Datadryad.org
Presentation at Research Data Access & Preservation Summit
22 March 2012
Discussion of the role of academic libraries in the curation, preservation, and sharing of research data, particularly with regard to addressing barriers and providing incentives. Four specific tools are presented: EZID, data use agreements (DUAs) in the Merritt/DataShare repository, DataUp, and DMPTool.
Slides from my Metadata Workshop at Content Strategy Applied 2012. The session included several hands on exercises, which is where a lot of the interesting conversation took place.
About the Webinar
In May 2012, the Library of Congress announced a new modeling initiative focused on reflecting the MARC 21 library standard as a Linked Data model for the Web, with an initial model to be proposed by the consulting company Zepheira. The goal of the initiative is to translate the MARC 21 format to a Linked Data model while retaining the richness and benefits of existing data in the historical format.
In this webinar, Eric Miller of Zepheira will report on progress towards this important goal, starting with an analysis of the translation problem and concluding with potential migration scenarios for a broad-based transition from MARC to a new bibliographic framework.
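For readers unfamiliar with what such a translation involves, here is a toy Python sketch that maps a couple of MARC 21 fields from a simplified record into Linked Data style triples. The predicate URIs and the record are made up for illustration; this is not Zepheira's actual model or the eventual bibliographic framework vocabulary:

# A simplified MARC record: tag -> value (real MARC also has indicators and subfields).
marc_record = {
    "001": "ocm12345678",                 # control number
    "100": "Austen, Jane",                # main entry: personal name
    "245": "Pride and prejudice",         # title statement
}

# Hypothetical mapping from MARC tags to predicate URIs (illustrative only).
TAG_TO_PREDICATE = {
    "100": "http://example.org/vocab/creator",
    "245": "http://example.org/vocab/title",
}

def marc_to_triples(record):
    """Yield (subject, predicate, object) triples for the mapped MARC fields."""
    subject = f"http://example.org/work/{record['001']}"
    for tag, predicate in TAG_TO_PREDICATE.items():
        if tag in record:
            yield (subject, predicate, record[tag])

for s, p, o in marc_to_triples(marc_record):
    print(f"<{s}> <{p}> \"{o}\" .")     # N-Triples-style output

A real migration would use an RDF library and a published vocabulary; the point here is only to show the shape of the record-to-graph translation the initiative is working through.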
Supporting Libraries in Leading the Way in Research Data Management -- Marieke Guy
Marieke Guy, Institutional Support Officer, Digital Curation Centre, UKOLN, University of Bath, UK presents on Supporting Libraries in Leading the Way in Research Data Management at Online Information, London, 20th-21st November 2012.
Publishing your research: Research Data Management (Introduction) -- Jamie Bisset
Publishing your research: Research Data Management (Introduction) (November 2013) slides. Delivered as part of the Durham University Researcher Development Programme. Further Training available at https://www.dur.ac.uk/library/research/training/
Presentation given to the High Performance Computing Summer School as part of a hands-on workshop developing software management plans and looking at software as data within the context of research data management best practices.
Spring 2014 Data Management Lab: Session 1 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
I shall provide a summary of JISC work in the area of ‘Big Data’. My primary focus will be on how to manage the huge amount of research data produced in UK Universities. I shall cover the history of JISC interventions to improve research data management and look at next steps. I shall touch on some other areas of work like ‘Digging into Data’ and web archiving which also deal with ‘big data’.
Similar to Needs for Data Management & Citation Throughout the Information Lifecycle (20)
Selecting efficient and reliable preservation strategies -- Micah Altman
This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods, and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event-based simulation, and hierarchical modeling, and then uses empirically calibrated sensitivity analysis to identify effective strategies.
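As a rough, hypothetical illustration of the kind of quantitative prediction such a framework performs (not the authors' actual model), the following Python sketch uses a simple Monte Carlo simulation to estimate the chance that every replica of an object is lost, under an assumed annual failure rate and a periodic audit-and-repair process; the parameter values are made up for illustration:

import random

def prob_total_loss(copies=3, annual_fail=0.05, years=20, audit_every=1, trials=20000):
    """Estimate the probability that every replica fails before being repaired.

    copies: number of independent replicas kept
    annual_fail: assumed probability that a single replica is lost in a year
    audit_every: years between audits that re-replicate from surviving copies
    """
    losses = 0
    for _ in range(trials):
        alive = copies
        for year in range(1, years + 1):
            # each surviving replica may fail independently this year
            alive = sum(1 for _ in range(alive) if random.random() > annual_fail)
            if alive == 0:           # nothing left to repair from
                losses += 1
                break
            if year % audit_every == 0:
                alive = copies       # audit detects losses and restores replicas
    return losses / trials

# crude sensitivity analysis over replica count and audit frequency
for copies in (2, 3, 4):
    for audit_every in (1, 5):
        p = prob_total_loss(copies=copies, audit_every=audit_every)
        print(f"copies={copies} audit_every={audit_every}y -> P(total loss) ~ {p:.4f}")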
This discussion, convened by the Dubai Future Foundation, focuses on identifying the significance of the concept of well-being for social science and policy, and the opportunities to measure it at scale.
Matching Uses and Protections for Government Data Releases: Presentation at the Simons Institute -- Micah Altman
In the work included below, and presented at the Simons Institute, we describe work in progress that aims to align emerging methods of data protection with research uses.
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP 2019 -- Micah Altman
Libraries enable patrons to access a wide range of information, but much of the access to this information is now directly managed by publishers. This has led to a significant gap between library values, patrons' perceptions of privacy, and effective privacy protection for access to digital resources.
In the work included below, and presented at NERCOMP 2019, we review privacy principles based on ALA, IFLA, and NISO policies. We then organize and compare the high-level privacy protections required by the ALA checklist, NISO, and the GDPR. This framework of principles and controls is then used to score the privacy policies and practices of major vendors of research library content. We evaluate each element of the vendors' privacy policies, and use instrumented browsers to identify the types of tracking mechanisms used by different vendors. We use this set of privacy scores to support analyses of change over time, and of potential gaps between patron expectations and privacy policies and practices.
Presentation by Philip Cohen on collaborative work with Micah Altman as part of the MIT CREOS research talk series. Presented in fall 2018, in Cambridge, MA.
Contemporary journal peer review is beset by a range of problems. These include (a) long delay times to publication, during which time research is inaccessible; (b) weak incentives to conduct reviews, resulting in high refusal rates as the pace of journal publication increases; (c) quality control problems that produce both errors of commission (accepting erroneous work) and omission (passing over important work, especially null findings); (d) unknown levels of bias, affecting both who is asked to perform peer review and how reviewers treat authors; and (e) opacity in the process that impedes error correction and more systematic learning, and enables conflicts of interest to pass undetected. Proposed alternative practices attempt to address these concerns, especially open peer review and post-publication peer review. However, systemic solutions will require revisiting the functions of peer review in its institutional context.
Presentation by Philip Cohen and Micah Altman on developing an exchange system for peer review in support for open science. Prepared for presentation at the ACRL-SSRC meeting on Open scholarship in the social sciences. Washington DC, Dec 2018
Redistricting in the US -- An Overview -- Micah Altman
This presentation was prepared for the International Seminar on Electoral Districting, National Electoral Institute El Colegio de México. http://www.ine.mx/seminario-internacional-distritacion-electoral/
A History of the Internet: Scott Bradner's Program on Information Science Talk -- Micah Altman
Scott Bradner is a Berkman Center affiliate who worked for 50 years at Harvard in the areas of computer programming, system management, networking, IT security, and identity management. He was involved in the design, operation, and use of data networks at Harvard University from the early days of the ARPANET, and served in many leadership roles in the IETF. He presented the talk recorded below, entitled A History of the Internet, as part of the Program on Information Science Brown Bag Series:
Bradner abstracted his talk as follows:
In a way, the Russians caused the Internet. This talk will describe how that happened (hint: it was not actually the Bomb) and follow the path that has led to the current Internet of (unpatchable) Things (the IoT) and the Surveillance Economy.
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS - Program ... -- Micah Altman
The web is now firmly established as the primary communication and publication platform for sharing and accessing social and cultural materials. This networked world has created both opportunities and pitfalls for libraries and archives in their mission to preserve and provide ongoing access to knowledge. How can the affordances of the web be leveraged to drastically extend the plurality of representation in the archive? What challenges are imposed by the intrinsic ephemerality and mutability of online information? What methodological reorientations are demanded by the scale and dynamism of machine-generated cultural artifacts? This talk will explore the interplay of the web, contemporary historical records, and the programs, technologies, and approaches by which libraries and archives are working to extend their mission to preserve and provide access to the evidence of human activity in a world distinguished by the ubiquity of born-digital materials.
Information Science Brown Bag talks, hosted by the Program on Information Science, consist of regular discussions and brainstorming sessions on all aspects of information science and the uses of information science and technology to assess and solve institutional, social, and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
Labor And Reward In Science: Commentary on Cassidy Sugimoto's Program on Info... -- Micah Altman
Cassidy Sugimoto is Associate Professor in the School of Informatics and Computing, Indiana University Bloomington, who researches within the domain of scholarly communication and scientometrics, examining the formal and informal ways in which knowledge producers consume and disseminate scholarship. She presented this talk, entitled Labor And Reward In Science: Do Women Have An Equal Voice In Scholarly Communication? A Brown Bag With Cassidy Sugimoto, as part of the Program on Information Science Brown Bag Series.
Despite progress, gender disparities in science persist. Women remain underrepresented in the scientific workforce and under-rewarded for their contributions. This talk will examine multiple layers of gender disparities in science, triangulating data from scientometrics, surveys, and social media to provide a broader perspective on the gendered nature of scientific communication. The extent of gender disparities and the ways in which new media are changing these patterns will be discussed. The talk will end with a discussion of interventions, with a particular focus on the roles of libraries, publishers, and other actors in the scholarly ecosystem.
Utilizing VR and AR in the Library Space -- Micah Altman
Matt Bernhardt is a web developer in the MIT Libraries and a collaborator in our program. He presented this talk, entitled Reality Bytes - Utilizing VR and AR in The Library Space, as part of the Program on Information Science Brown Bag Series.
Terms like "virtual reality" and "augmented reality" have existed for a long time. In recent years, thanks to products like Google Cardboard and games like Pokemon Go, an increasing number of people have gained first-hand experience with these once-exotic technologies. The MIT Libraries are no exception to this trend. The Program on Information Science has conducted enough experimentation that we would like to share what we have learned, and solicit ideas for further investigation.
For slides and comments see: http://informatics.mit.edu/blog
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots -- Micah Altman
Catherine D'Ignazio is an Assistant Professor of Civic Media and Data Visualization at Emerson College, a principal investigator at the Engagement Lab, and a research affiliate at the MIT Media Lab/Center for Civic Media. She presented this talk, entitled, Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots as part of Program on Information Science Brown Bag Series.
Communities, governments, libraries and organizations are swimming in data—demographic data, participation data, government data, social media data—but very few understand what to do with it. Though governments and foundations are creating open data portals and corporations are creating APIs, these rarely focus on use, usability, building community or creating impact. So although there is an explosion of data, there is a significant lag in data literacy at the scale of communities and citizens. This creates a situation of data-haves and have-nots which is troubling for an open data movement that seeks to empower people with data. But there are emerging technocultural practices that combine participation, creativity, and context to connect data to everyday life. These include data journalism, citizen science, emerging forms for documenting and publishing metadata, novel public engagement in government processes, and participatory data art. This talk surveys these practices both lovingly and critically, including their aspirations and the challenges they face in creating citizens that are truly empowered with data.
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA... -- Micah Altman
Access to high-quality, relevant information is absolutely foundational for a quality education. Yet, so many schools across the developing world lack fundamental resources, like textbooks, libraries, electricity and Internet connectivity. The SolarSPELL (Solar Powered Educational Learning Library) is designed specifically to address these infrastructural challenges, by bringing relevant, digital educational content to offline, off-grid locations. SolarSPELL is a portable, ruggedized, solar-powered digital library that broadcasts a webpage with open-access educational content over an offline WiFi hotspot, content that is curated for a particular audience in a specified locality—in this case, for schoolchildren and teachers in remote locations. It is a hands-on, iteratively developed project that has involved undergraduate students in all facets and at every stage of development. This talk will examine the design, development, and deployment of a for-the-field technology that looks simple but has a quite complex background.
Laura Hosman is Assistant Professor at Arizona State University, holding a joint appointment in the School for the Future of Innovation in Society and in The Polytechnic School. Her work is action-oriented and focuses on the role for information and communications technology (ICT) in developing countries. Presently, she focuses on ICT-in-education projects, and brings her passion for experiential learning to the classroom by leading real-world-focused, project-based courses that have seen student-built technology deployed in schools in Haiti, Vanuatu, Micronesia, Samoa, and Tonga.
Making Decisions in a World Awash in Data: We're going to need a different bo... -- Micah Altman
In his abstract, Scriffignano summarizes as follows:
I explore some of the ways in which the massive availability of data is changing, and the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly-available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g. cyber crime, cyber terrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inference can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: the new normal (how the data around us continues to change), how we are reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
The Open Access Network: Rebecca Kennison's Talk for the MIT Program on Infor... -- Micah Altman
Rebecca Kennison, who is the Principal of K|N Consultants, the co-founder of the Open Access Network, and was the founding director of the Center for Digital Research and Scholarship, gave this talk on Come Together Right Now: An Introduction To The Open Access Network as part of the Program on Information Science Brown Bag Series.
Gary Price, MIT Program on Information Science -- Micah Altman
Gary Price, who is chief editor of InfoDocket, contributing editor of Search Engine Land, co-founder of Full Text Reports and who has worked with internet search firms and library systems developers alike, gave this talk on Issues in Curating the Open Web at Scale as part of the Program on Information Science Brown Bag Series.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
PHP Frameworks: I want to break free (IPC Berlin 2024) -- Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
The Art of the Pitch: WordPress Relationships and Sales -- Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... -- BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf -- Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring & observability to ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
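As a small, vendor-neutral illustration of two such foundational concepts (structured logs and a simple counter metric), here is a toy Python sketch using only the standard library; it is an assumption-laden example, not the speaker's stack or any particular observability product's API:

import json, logging, time

logging.basicConfig(level=logging.INFO, format="%(message)s")
REQUEST_COUNT = {"ok": 0, "error": 0}   # a toy in-process counter metric

def handle_request(path):
    start = time.monotonic()
    status = "unknown"
    try:
        result = do_work(path)
        REQUEST_COUNT["ok"] += 1
        status = "ok"
        return result
    except Exception:
        REQUEST_COUNT["error"] += 1
        status = "error"
        raise
    finally:
        # structured (JSON) log line: easy for machines to parse and correlate
        logging.info(json.dumps({
            "event": "request_handled",
            "path": path,
            "status": status,
            "duration_ms": round((time.monotonic() - start) * 1000, 2),
        }))

def do_work(path):
    return f"served {path}"

handle_request("/hello")
print(REQUEST_COUNT)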
Securing your Kubernetes cluster: a step-by-step guide to success! -- KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Essentials of Automations: The Art of Triggers and Actions in FME -- Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Pushing the limits of ePRTC: 100ns holdover for 100 days -- Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf -- 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5 -- DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GridMate - End to end testing is a critical piece to ensure quality and avoid... -- ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf -- Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
UiPath Test Automation using UiPath Test Suite series, part 4 -- DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 -- Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Secstrike: Reverse Engineering & Pwnable tools for CTF.pptx
Needs for Data Management & Citation Throughout the Information Lifecycle
1. Needs for Data Management & Citation Throughout the Information Lifecycle
Micah Altman, Director of Research, MIT Libraries
Prepared for NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data, September 2012
2. Collaborators and Co-Conspirators
• Jonathan Crabtree, Merce Crosas, Gary King, Tom Lipkis, Nancy McGovern, John Willinsky
• Research Support
– Library of Congress (PA#NDP03-1)
– National Science Foundation (DMS-0835500, SES 0112072)
– Institute for Museum and Library Services (LG-05-09-0041-09)
– Sloan Foundation
– Amazon Web Services
– Massachusetts Institute of Technology
3. Related Work
Reprints available from: http://maltman.hmdc.harvard.edu
• Altman, M. 2012. "Data Citation in The Dataverse Network®." In P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. National Academies Press. Forthcoming.
• Altman, M., & Crabtree, J. 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS." Archiving 2011 (pp. 165-170). Society for Imaging Science and Technology.
• Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist 72(1): 169-182.
• Altman, M. 2008. "A Fingerprint Method for Verification of Scientific Data." In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer-Verlag.
• Altman, M. and G. King. 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data." D-Lib Magazine 13(3/4), March/April.
4. Preview
• Principled approach to data management
• Lifecycle data management planning
• Lifecycle data management tracking
• Lifecycle data management infrastructure
• [Exemplar Projects]
6. “Data science is suddenly sexy – does that mean data is the new black?”
7. Valuable Data is Lost
• Researchers lack archiving capability
• Incentives for data sharing are weak
Examples:
– Intentionally Discarded: “Destroyed, in accord with [nonexistent] APA 5-year post-publication rule.”
– Unintentional Hardware Problems: “Some data were collected, but the data file was lost in a technical malfunction.”
– Acts of Nature: “The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”
– Discarded or Lost in a Move: “As I retired ... Unfortunately, I simply didn’t have the room to store these data sets at my house.”
– Obsolescence: “Speech recordings stored on a LISP Machine ..., an experimental computer which is long obsolete.”
– Simply Lost: “For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.”
8. Unpublished Data Ends up in the “Desk Drawer”
• Null results are less likely to be published
• Outliers are routinely discarded
(Image: Daniel Schectman’s lab notebook providing initial evidence of quasi crystals)
9. Data Behind Publications Unavailable for Review, Reuse, Replication
10. Model Science
“Citations to unpublished data and personal communications cannot be used to support claims in a published paper.”
“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
11. Compliance with Policies is Low
• Compliance is low even in the best examples of journals
• Checking compliance manually is tedious and doesn’t scale
12. Special Challenges for Long-Term Access to New Forms of Data
• Some examples
– GIS and geospatial trails
– Facebook & social networks
– Text: blogs, tweets
– Cell phone data
• Challenges
– Proprietary – intellectual property
– Size
– Dynamic content
– Fixity
– Format
Source: [Calberese 2008]
14. “The published article is not scientific output – it’s a summary of scientific output.”
-- corollary of Buckheit & Donoho 1995
15. Information Lifecycle
(Diagram: the information lifecycle) Modeling; Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use (scientific, educational, scientometric, institutional); Long-term access.
16. Stakeholders
(Diagram: stakeholders arranged around the information lifecycle) Data Consumers; Data Sources/Subjects; Data Archives/Publishers; Researchers; Research Sponsors; Research Organizations; Scholarly Publishers; Service/Infrastructure Providers.
17. Legal Requirements and Rights
(Diagram: the landscape of legal requirements and rights) Contract: click-wrap, terms of use, licenses. Intellectual property: trade secret, patent, copyright, DMCA, trademark, fair use, moral rights, attribution, database rights. Journal replication requirements and funder open access requirements. Privacy and confidentiality: HIPAA, the Common Rule (45 CFR 26), FERPA, the EU Privacy Directive, CIPSEA, FOIA, state FOI laws, state privacy laws, privacy torts (invasion, defamation), rights of publicity. Access rights: classified, sensitive but unclassified, ITAR. Potentially harmful information (archeological sites, animal testing, ...).
18. Stakeholders, Rights and Requirements
(Diagram: the stakeholders from slide 16 overlaid on the legal landscape from slide 17) Scholarly publishers, infrastructure/service providers, primary researchers, research organizations, data archives, research sponsors, sources/subjects, and consumers (secondary research, participative science, public policy uses) each face different combinations of contract, intellectual property, privacy/confidentiality, and access-rights constraints.
19. Stakeholder Drivers per Stage of Information Lifecycle
Stage: Research Proposal, Design, and Data Collection
Actor | Legal constraints | Concerns
Subjects | Consent/contract; privacy | Public benefit; future access to own information
Sources | Intellectual property; contract | Business confidentiality; IP; profit from licenses
Funder | Open access; confidentiality | Public benefit; policy relevance; reproducible research; future access
Primary Researcher | Confidentiality; contract; IP | Publication potential; compliance with institutional/funder requirements
Research Institution | Confidentiality; contract; IP | Compliance with funder requirements; license, IP, and confidentiality compliance
20. Stakeholder Drivers per Stage of Information Lifecycle
Stage: Data Storage and Analysis (Pre-publication)
Actor | Legal constraints | Concerns
Primary Researcher | Confidentiality; contract; IP | Publication potential; compliance with institutional/funder requirements
Research Institution | Confidentiality; contract; IP | License, IP, and confidentiality compliance; records management
Service Providers | Contract; (in selected cases) confidentiality requirements | Contract; service business model; service deployment
21. Stakeholder Drivers per Stage of Information Lifecycle
Stage: Publication
Actor | Legal constraints | Concerns
Primary Researcher | Compliance for: source/subjects, sponsor, host institution, publisher | Scholarly attribution/credit; promote use of research; track use/impact of research
Sponsor | | Track research products; track compliance; track use/impact
Research Institution | Sponsor compliance; records management; intellectual property | Track OA products
Scholarly/Journal Publisher | IP; contract | Impact/use; profit/business model; replicability
Data Publisher | IP | Profit/business model; replicability; connection to publication
22. Stakeholder Drivers per Stage of Information Lifecycle
Stage: Re(use)
Actor | Legal constraints | Concerns
Research Reader | Access rights | Provenance
Secondary Researcher | Access rights; confidentiality; contract | Replicability; data reintegration/reanalysis; linking publications and data; provenance
“Citizen/Community Scientist” | Access rights | Data redissemination/reanalysis; linking publications and data
Public Policy | Access rights | Provenance; replicability; linking publications and data
Education/Teaching | Access rights | “Classroom” use; MOOC use
24. Some Formal “DMP” Requirements
• The Final NIH Statement on Sharing Research Data was published in the NIH Guide on February 26, 2003.
“Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.”
– Data are to be shared no later than acceptance for publication of the main findings from the final data set
• NSF: all proposals must (as of 1/1/2011) include a data management plan.
– Specific requirements are vague, for the most part: “will be determined by the community of interest through the process of peer review and program management.”
• Wellcome Trust: “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
25. DMP Goals
• Orchestrate data for current use
• Control disclosure
• Compliance with contracts, regulations, law, and policy
• Maximize value of information assets
• Ensure short-term and long-term dissemination
26. DMP Elements
• Orchestrate data for current use: quality assurance; storage, backup, replication, and versioning; data formats; data organization; budget; metadata and documentation
• Control disclosure: access and sharing; intellectual property rights; legal requirements; security
• Compliance with contracts, regulations, law, and policy: access and sharing; adherence; responsibility; ethics and privacy; security
• Value of information assets: data description; data value; relation to collection; relation to evidence base; budget
• Ensure short-term and long-term dissemination: data description; institutional archiving commitments; audience; access and sharing; data formats; data organization; metadata and documentation; budget
27. DMP Details
• Sharing: plans for depositing in an existing public database; access procedures; embargo periods; access charges; timeframe for access; technical access methods; restrictions on access; restrictions on use
• Long-term access (preservation): requirements for data destruction, if applicable; procedures for long-term preservation; institution responsible for long-term costs of data preservation; succession plans for data should the archiving entity go out of existence
• Formats: generation and dissemination formats and procedural justification; storage format and archival justification; format documentation
• Metadata and documentation: internal and external identifiers and citations; metadata to be provided; metadata standards used; planned documentation and supporting materials; quality assurance procedures for metadata and documentation
• Data organization: file organization; naming conventions
• Storage, backup, replication, and versioning: facilities; methods; procedures; frequency; replication; version management; recovery guarantees
• Security: procedural controls; technical controls; confidentiality concerns; access control rules
• Budget: cost of preparing data and documentation; cost of storage and backup; cost of permanent archiving and access
• Intellectual property rights: entities who hold property rights; types of IP rights in data; protections provided; dispute resolution process
• Legal requirements: provider requirements and plans to meet them; institutional requirements and plans to meet them
• Responsibility: individual or project team role responsible for data management; qualifications, certifications, and licenses of responsible parties
• Ethics and privacy: informed consent; protection of privacy; data use agreements; other ethical issues
• Adherence: when adherence to the data management plan will be checked or demonstrated; who is responsible for managing data in the project; who is responsible for checking adherence to the data management plan; auditing procedures and framework
• Value of information assets: project use value; institutional audience and uses; public audience and uses; relation to institutional collection; relation to disciplinary evidence base; cost of re-creating data
28. Approaching Requirement Overlap
• Sanity-check DMP details with lifecycle questions:
– Who wants it?
– What do they need it for?
– When will it be used?
• Be conscious of elements that serve multiple goals or lifecycle stages:
– Metadata/documentation
– Identifiers
– Budgets
– Formats
– IP rights and confidentiality restrictions
– Responsibilities/adherence
• Use tracking tools and methods throughout the lifecycle (see the sketch after this slide)
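As a hedged illustration of that last bullet, the following Python sketch represents DMP elements as a structured checklist so adherence can be checked programmatically rather than by manual inspection. The element names and evidence strings are hypothetical, not a standard schema:

# A hypothetical, minimal machine-checkable DMP: each element records whether it
# has been addressed and where the supporting documentation lives.
dmp = {
    "access_and_sharing":       {"addressed": True,  "evidence": "deposit agreement with archive"},
    "metadata_standards":       {"addressed": True,  "evidence": "DDI 2.5 codebook"},
    "storage_and_backup":       {"addressed": True,  "evidence": "3 replicas, nightly backup"},
    "identifiers_and_citation": {"addressed": False, "evidence": None},
    "budget":                   {"addressed": False, "evidence": None},
}

def check_adherence(plan):
    """Return the list of DMP elements that are still unaddressed."""
    return [name for name, item in plan.items() if not item["addressed"]]

missing = check_adherence(dmp)
if missing:
    print("DMP elements still to address:", ", ".join(missing))
else:
    print("All tracked DMP elements addressed.")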
30. What do we track?
What tools and methods provide technical leverage or incentives for management across lifecycle stages and among actors?
• Identification: identifiers, references, citations
• Provenance: relationship of delivered data to the history of inputs and modifications, and the actors responsible for them; revision control; versioning
• Authenticity: assertions about the provenance of the records
• Respect des fonds: assertions about the original organization of the records
• Chain of custody: assertions about the ownership of the records
• Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
• Auditing: verification of properties & policy compliance
Source: bulleted list of attributes adapted from Moore 2008
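To make the provenance and chain-of-custody bullets concrete, here is a small, hypothetical Python sketch. It is not any particular provenance standard (such as W3C PROV), though it borrows the same ideas: for each derived file it records the inputs, the transformation applied, the responsible agent, and a content hash so each link in the chain can later be verified:

import hashlib, json
from datetime import datetime, timezone

def content_hash(path):
    """Bit-level fingerprint of a file, used to pin each link in the chain."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def provenance_record(output_path, input_paths, activity, agent):
    """A single link in a provenance chain for a derived data file."""
    return {
        "output": {"path": output_path, "sha256": content_hash(output_path)},
        "inputs": [{"path": p, "sha256": content_hash(p)} for p in input_paths],
        "activity": activity,              # e.g. "recode + merge, script v1.2"
        "agent": agent,                    # person or software responsible
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Example with hypothetical file names (uncomment when the files exist):
# record = provenance_record("analysis.csv", ["raw_survey.csv"], "cleaning script clean.py", "M. Altman")
# print(json.dumps(record, indent=2))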
31. Tracking Across Information Lifecycle
(Diagram: the information lifecycle, Creation/Collection, Storage/Ingest, Processing, Internal Sharing, Analysis, External dissemination/publication, Re-use, and Long-term access, annotated with identifiers and with metadata for integrity, provenance, custody, and citation.)
32. Data Citation: a Point of Leverage
• Services
– Identifiers to specific fixed versions of data are needed to establish unambiguous chains of provenance
– Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
– Persistence of identifiers is needed to maintain long-term access
• Incentives
– Scholarly credit (intellectual attribution) is a large motivator for many researchers; citation creates an incentive for researchers to publish data
– Scholars also comply with enforceable journal policies; requiring data citation is a light-weight method to make data access policies auditable
– Impact/usage is a motivator for public research funders; data citation provides a foundation for measures of usage and impact
33. Emerging Practices for Data Citation
• Publishers
– OECD iLibrary
– Thomson Reuters Data Citation Index
• Data archives
– Dataverse Network
– Data-PASS
• Harmonization efforts
– DataCite
– NAS BRDI
– ICSU/CODATA
• Discipline specific
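Since DataCite-registered DOIs (like Crossref DOIs) support HTTP content negotiation, a persistent identifier can be resolved directly to machine-readable metadata for building citations and services. A minimal sketch using only the Python standard library, with a placeholder DOI, and assuming the registrar exposes CSL JSON via content negotiation:

import json
import urllib.request

def fetch_citation_metadata(doi):
    """Resolve a DOI to machine-readable (CSL JSON) metadata via content negotiation."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example with a placeholder DOI (substitute a real dataset DOI before running):
# meta = fetch_citation_metadata("10.1234/example-dataset")
# print(meta.get("title"), meta.get("author"), meta.get("publisher"))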
34. Identifier and Citation Use Cases
• Attribution: provide scholarly attribution; provide legal attribution; identify contributors to data
• Discovery: locate data via identifier; locate data integral to an article; locate works related to data (articles, derivatives, sources)
• Verification: associate a work with the version of evidence used; verify fixity of bits; verify fixity of information; verify “authenticity” of the work
• Access: access to a surrogate; on-line access to the object; machine understandability; long-term understandability
• Persistence: does evidence persist as long as the assertions based on it? Is the durability of the evidence transparent?
35. Emerging Principles for Data Citation
• Data citations should be first-class objects for publication: they should appear with citations to other works and be as easy to cite as other works (an illustrative citation follows this slide)
• Citations should persist, and enable access to the fixed version of the data cited, at least as long as the citing work exists
• Citations should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
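As a concrete illustration of these principles, the sketch below assembles a citation string that carries contributors, a fixed version, a persistent resolvable identifier, and a fixity fingerprint. Every value is a placeholder, and the ordering of elements is a stylistic assumption rather than a prescribed format.

    # Sketch: a data citation carrying the elements named in the principles above.
    # All field values are placeholders, not a real dataset.
    citation_fields = {
        "authors": "Doe, Jane; Roe, Richard",
        "year": 2012,
        "title": "Example Survey Dataset",
        "distributor": "Example Data Archive",
        "persistent_id": "doi:10.1234/example-dataset",  # globally resolvable identifier
        "version": "V2",                                  # fixed version of the data
        "fingerprint": "UNF:5:placeholderhash==",         # semantic fixity check
    }

    citation = ("{authors} ({year}). {title}. {distributor}. {persistent_id}, "
                "{version} [{fingerprint}]").format(**citation_fields)
    print(citation)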
36. Fixity
• Are files or bitstreams corrupted?
• Do semantics remain the same over time, across formats, software, and analysis systems?
• Some semantic approaches: Universal Numeric Fingerprint (canonicalization); perceptual signatures (characterization of significant properties)
(A minimal fixity sketch follows this slide.)
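The sketch below contrasts bit-level fixity (a hash of the raw bytes) with a simplified semantic fixity check that hashes a canonicalized form of the values, so that the same data re-exported in a different format yields the same fingerprint. The canonicalization shown is a stand-in for approaches such as the Universal Numeric Fingerprint, not the UNF algorithm itself.

    # Sketch: two complementary fixity checks.
    import hashlib

    def bit_fixity(path):
        # Any byte-level corruption of the file changes this digest.
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def semantic_fixity(rows, significant_digits=7):
        # Canonicalize values (round floats to a fixed number of significant
        # digits, stringify everything) before hashing, so the fingerprint
        # survives re-encoding as long as the values themselves are unchanged.
        canonical_rows = []
        for row in rows:
            cells = []
            for value in row:
                if isinstance(value, float):
                    cells.append(format(value, ".%dg" % significant_digits))
                else:
                    cells.append(str(value))
            canonical_rows.append(",".join(cells))
        blob = "\n".join(canonical_rows).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    # The same values written with different precision give the same fingerprint:
    print(semantic_fixity([[1.0000000001, "a"], [2.5, "b"]]))
    print(semantic_fixity([[1.0, "a"], [2.50, "b"]]))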
37. Audit [aw-dit]: an independent evaluation of records and activities to assess a system of controls.
Fixity mitigates risk only if used for auditing.
38. Example: Functions of Storage Auditing
• Detect corruption or deletion of content
• Verify compliance with storage/replication policies
• Prompt repair actions
39. Audit Design Choices
• Audit regularity and coverage: on demand (manually); on event; randomized sample; scheduled/comprehensive
• Audit procedure, algorithms, certifying authority
• Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing
• Trust model
• Threat model
(A randomized-sample audit sketch follows this slide.)
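The sketch below illustrates one of the design choices above: a randomized-sample audit of stored content against a manifest of expected digests. The manifest format, directory layout, and sampling fraction are all assumptions for the sake of the example.

    # Sketch: randomized-sample fixity audit against a manifest of expected
    # SHA-256 digests ({relative_path: digest}); the manifest and layout are
    # hypothetical.
    import hashlib
    import os
    import random

    def sha256_of(path):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def audit_sample(root, manifest, sample_fraction=0.1):
        # Check a random sample of files; report missing or corrupted content
        # so that repair actions can be prompted.
        sample_size = max(1, int(len(manifest) * sample_fraction))
        problems = []
        for rel_path in random.sample(list(manifest), sample_size):
            full_path = os.path.join(root, rel_path)
            if not os.path.exists(full_path):
                problems.append((rel_path, "missing"))
            elif sha256_of(full_path) != manifest[rel_path]:
                problems.append((rel_path, "checksum mismatch"))
        return problems

    # Example use (paths and digests are placeholders):
    # for path, problem in audit_sample("/archive", {"data/file1.csv": "ab12..."}):
    #     print("repair needed:", path, problem)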
41. Many Tools, Few Solutions
“Poor carpenters blame their tools” – Proverb
“If all you have is a hammer, everything looks like a nail” – Another proverb
“Ultimately, some people need holes – but no one needs a drill.” – Yet another proverb
• Many scientific tools are embedded in the needs, perspectives, and practices of specific disciplines
• Identify common requirements
• Identify gaps across lifecycle stages and among actors
42. Core Requirements for Data Sharing Infrastructure
• Stakeholder incentives
– recognition; citation; payment; compliance; services
• Dissemination
– access to metadata; documentation; data
• Access control
– authentication; authorization; rights management
• Provenance
– chain of control; verification of metadata, bits, semantic content
• Persistence
– bits; semantic content; use
• Legal protection
– rights management; consent; record keeping; auditing
• Usability
– discovery; deposit; curation; administration; collaboration
• Business model
Sources: King 2007; ICSU 2004; NSB 2005
43. Mind the Gaps
(Lifecycle stages considered: collection, analysis, storage, dissemination, reuse.)
• Scientific Workflow Software (e.g. Taverna)
– Strengths: close integration across the supported lifecycle stages; perceived as a useful service by researchers; high performance
– Gaps: discipline-centric; doesn’t address most storage requirements (replication, access control)
• Storage Grid/VRE (e.g. iRODS)
– Strengths: integration across the supported lifecycle stages; storage is perceived as a useful service by researchers; high performance
– Gaps: loose integration of analysis, insufficient for reproducibility
• Institutional Repository (e.g. DSpace)
– Strengths: low cost; institutional commitment to long-term access
– Gaps: access and discovery mechanisms usually tailored to publications, not data
• Reproducible Publication Systems (e.g. StatWeave)
– Strengths: close integration of analysis and scientific publication; reduces risk of embarrassment when working with “co-authors”; ensures one form of reproducibility (calibration, mechanical replicability)
– Gaps: addresses replication but not reuse for secondary analysis or integration
• “Data Archive”
– Strengths: richer support for reuse; often supports cross-discipline discovery and long-term access
– Gaps: varied models (curated database, “virtual archive”, disciplinary repository); often discipline-centric
45. Automatic Auditing of Data Replication & Integrity Policies
safearchive.org
46. The Distributed Content Replication Problem
• We hold digital assets we wish to preserve
• Many of these assets are not replicated
• Even when replicated, assets remain vulnerable to single points of failure because replicas are managed by a single institution
A partial solution: LOCKSS
• Self-contained open-source software
• Harvests resources via open interfaces
• Replicates content through a secure peer-to-peer protocol
• Self-repairing
• Zero trust
• Used by hundreds of institutions for collaborative preservation
What we needed:
• Auditing – how many replicas exist, where are they, and are they current?
• Policy – prove that replication is consistent with a policy, such as TRAC
• Collaboration – coordinate with partners to replicate content
47. Resilience of Peer-to-Peer with the Accountability of a Centralized System
Facilitating collaborative replication and preservation with cyberinfrastructure:
• Collaborators declare explicit, non-uniform resource commitments
• Policy records and schematizes commitments and desired TRAC replication properties
• Storage layer provides replication, integrity, freshness, and versioning
• SafeArchive software provides monitoring, auditing, transparency, and provisioning
• Content is harvested through HTTP (LOCKSS) or OAI-PMH (a harvest sketch follows this slide)
• Integration of LOCKSS, institutional repositories, and TRAC
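OAI-PMH harvesting, mentioned above, is a simple HTTP protocol. The sketch below lists record identifiers from an OAI-PMH endpoint and follows resumption tokens across result pages; the base URL is a placeholder, and a real harvester would also handle sets, date ranges, and error responses.

    # Sketch: harvest record identifiers from an OAI-PMH endpoint.
    # The endpoint URL in the example is a placeholder.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

    def list_identifiers(base_url, metadata_prefix="oai_dc"):
        # Yield record identifiers, following resumption tokens across pages.
        params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
        while True:
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                tree = ET.parse(resp)
            for header in tree.iter(OAI_NS + "header"):
                identifier = header.find(OAI_NS + "identifier")
                if identifier is not None:
                    yield identifier.text
            token = tree.find(".//" + OAI_NS + "resumptionToken")
            if token is None or not (token.text or "").strip():
                break
            # Later pages are requested with only the resumption token.
            params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}

    # Example (placeholder endpoint):
    # for oai_id in list_identifiers("https://repository.example.edu/oai"):
    #     print(oai_id)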
48. ORCID is an international, interdisciplinary, open, and not-for-profit organization created for the benefit of all stakeholders, including research institutions, funding organizations, publishers, and researchers, to enhance the scientific discovery process and improve collaboration and the efficiency of research funding.
ORCID aims to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents.
http://orcid.org
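ORCID exposes a public API through which an identifier can be resolved to the researcher's public record. The sketch below is based on the current public API at pub.orcid.org; the versioned path and response structure are assumptions relative to this talk, and the ORCID iD shown is the example iD used in ORCID documentation, not a specific researcher discussed here.

    # Sketch: fetch the public record for an ORCID iD as JSON.
    # The v3.0 API path and the response structure are assumptions; adjust to the
    # API version actually in use.
    import json
    import urllib.request

    def fetch_orcid_record(orcid_id):
        req = urllib.request.Request(
            "https://pub.orcid.org/v3.0/%s/record" % orcid_id,
            headers={"Accept": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    record = fetch_orcid_record("0000-0002-1825-0097")  # documentation example iD
    name = record["person"]["name"]
    print(name["given-names"]["value"], name["family-name"]["value"])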
49. ORCID Launch to Public in October
The ORCID Launch Partners Program includes research institutions, publishers, research funders, data repositories, and third-party providers, such as: the American Physical Society, Aries Systems, Avedas, Boston University, the California Institute of Technology, CrossRef, Elsevier, Faculty of 1000, figshare, Hindawi Publishing Corporation, KNODE, Nature Publishing Group, SafetyLit, Symplectic, Thomson Reuters, Total-Impact, and Wellcome Trust.
At launch, the ORCID Registry will:
• Allow researchers and scholars to register for an ORCID identifier, create ORCID records, and manage their privacy settings
• Contain ORCID records created by universities on behalf of their researchers and scholars
• Allow researchers and scholars to link their ORCID record to external identifiers, including Scopus and ResearcherID
• Facilitate synchronization of ORCID identifier record data with external systems, including Scopus
• Bi-directionally link to a number of author profile and manuscript submission systems, including the American Physical Society, Aries Systems, Hindawi Publishing Corporation, Nature Publishing Group, and ScholarOne Manuscripts
• Allow researchers and scholars to search and upload publication metadata from CrossRef
• (Soon after launch) link to grant application systems
50. Data Management Workflows for Open Access Journals
http://bit.ly/DVNOJS
51. Embed Real Data Archives in Journals
• Embed a remotely managed data archive in an OJS journal
• Replaces “supplemental materials”
• Adds:
– Online analysis
– Independent storage
– Persistent identifiers and citation
– Data versioning
– Enhanced discoverability and interoperability
– Format normalization
– Fixity and replication
52. Integrated Policies, Workflow, Access
• OJS and DVN
– Support workflows
– Enforce policies
– Disseminate content
• Integrate policies for
– Access and data license
– Embargoes
– Citation
• Coordinate
– Submission
– Review
– Publication
• Link
– Content
– Subscriptions & notifications
– Usage Metrics
54. How will we see the geography of science when we reveal how research connects through data?
Research & node layout: Kevin Boyack and Dick Klavans (mapofscience.com); Data: Thomson ISI; Graphics & typography: W. Bradford Paley (didi.com/brad); Commissioned by Katy Börner (scimaps.org)
Seed Magazine, Mar 7, 2007
http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/
55. Summary
• Principled approach to data management
– Follow information through information lifecycle
– Assess stakeholder requirements
– Track management, use, impact across lifecycle
• Data management planning goals
– Orchestrate data for current use
– Protect against disclosure
– Comply with contracts, regulations, law, and policy
– Maximize value of information assets
– Ensure short term and long term dissemination
• Lifecycle data management tracking
– Identification – identifiers, references, citations
– Provenance – relationship of delivered data to history of inputs and modifications and actors responsible for
these
– Authenticity: assertions about the provenance of the records
– Chain of custody: assertions about the ownership of the records
– Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
– Auditing: verification of properties & policy compliance
• Data citation is a key leverage point
– Services: establish provenance; access; long-term preservation
– Incentives: scholarly credit; reproducible research policies; impact/usage analysis
– Data citations should be first-class objects for publication: they should appear with citations to other works and be as easy to cite as other works
56. Additional References
• Buckheit, J. and Donoho, D.L. 1995. WaveLab and reproducible research. In: Antoniadis, A. (ed.), Wavelets and Statistics. New York, NY: Springer, pp. 55–81.
• International Council for Science (ICSU). 2004. ICSU Report of the CSPR Assessment Panel on Scientific Data and Information.
• King, Gary. 2007. An Introduction to the Dataverse Network as an Infrastructure for Data Sharing. Sociological Methods and Research 36.
• Moore, R. 2008. Towards a Theory of Digital Preservation. International Journal of Digital Curation 3(1).
• National Science Board (NSB). 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSF (NSB-05-40).
This work by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Most of the different stakeholders have stronger relationships with, and stakes in, research at different stages, but researchers and research institutions are in the middle: they have a strong stake in most stages. Researchers are more directly concerned with collection, processing, analysis, and dissemination; organizations have a higher stake in internal sharing, re-use, and long-term access.
This section is a more detailed deep-dive into drivers at major stages of the information lifecycle. It is not intended to be part of the main presentation, but could be used to respond to questions or to focus on a particular stage.