This discussion, convened by the Dubai Future Foundation, focuses on the significance of the concept of well-being for social science and policy, and on the opportunities to measure it at scale.
Privacy Gaps in Mediated Library Services: Presentation at NERCOMP 2019 (Micah Altman)
Libraries enable patrons to access a wide range of information, but much of the access to this information is now directly managed by publishers. This has led to a significant gap among library values, patrons' perceptions of privacy, and effective privacy protection for access to digital resources.
In the work included below, and presented at NERCOMP 2019, we review privacy principles based on ALA, IFLA, and NISO policies. We then organize and compare the high-level privacy protections required by the ALA checklist, NISO, and GDPR. This framework of principles and controls is then used to score the privacy policies and practices of major vendors of research library content. We evaluate each element of each vendor's privacy policy, and use instrumented browsers to identify the types of tracking mechanisms used by different vendors. We use this set of privacy scores to support analyses of change over time, and of potential gaps between patron expectations and vendors' privacy policies and practices.
Matching Uses and Protections for Government Data Releases: Presentation at t... (Micah Altman)
In the work included below, and presented at the Simons Institute, we describe work in progress that aims to align emerging methods of data protection with research uses.
Managing Confidential Information – Trends and Approaches (Micah Altman)
Personal information is ubiquitous and it is becoming increasingly easy to link information to individuals. Laws, regulations and policies governing information privacy are complex, but most intervene through either access or anonymization at the time of data publication.
Trends in information collection and management, such as cloud storage, "big" data, and debates about the right to limit access to published but personal information, complicate data management and make traditional approaches to managing confidential data less and less effective.
This session, presented as part of the Program on Information Science seminar series, examines trends in information privacy. It also discusses emerging approaches and research on managing confidential research information throughout its lifecycle.
Reproducibility from an Informatics Perspective (Micah Altman)
Scientific reproducibility is most often viewed through a methodological or statistical lens, and increasingly through a computational lens. Over the last several years, I've taken part in collaborations that approach reproducibility from the perspective of informatics: as a flow of information across a lifecycle that spans collection, analysis, publication, and reuse.
These slides sketch this approach, and were presented at a recent workshop on reproducibility at the National Academy of Sciences, and at one of our Program on Information Science brown bag talks. See: informatics.mit.edu
July IAP: Confidential Information - Storage, Sharing, & Publication - with M... (Micah Altman)
This class focuses on the tools and good practices for storing confidential data, sharing data for collaboration, and publishing data or derivative results for broad use. Topics covered in this class include: an overview of information security standards and frameworks; information security core practices (credentials, authentication, authorization, and auditing); information partitioning and secure linking; file, disk, and network encryption tools and practices; cloud storage practices for confidential information; data “de-identification” tools and practices; statistical disclosure limitation approaches and tools; and data use agreements.
Infrastructure and practices for data citation have made substantial progress over the last decade. This increases the potential rewards for data publication and reproducible science; however, overall incentives remain relatively weak.
Author's note: This summarizes a presentation given at the *National Academies of Sciences* as part of the [*Data Citation Workshop: Developing Policy and Practice*](http://sites.nationalacademies.org/pga/brdi/index.htm).
This presentation was provided by Glenn Hampson of Open Scholarship Initiative, during the NISO hot topic virtual conference "Open Research." The event was held on November 17, 2021.
Assessing Digital Output in New Ways
Mike Taylor, Research Specialist, Elsevier Labs
Presented during NISO/BISG 8th Annual Changing Standards Landscape on June 27, 2014
Writing Analytics for Epistemic Features of Student Writing #icls2016 talk (Simon Knight)
Talk presented at #ICLS2016 in Singapore. I discuss levels of description as sites of epistemic cognition, focusing on writing and on the use of textual features to associate rubric scores with epistemic cognition.
My thanks to my collaborators (listed on the paper), particularly Laura Allen, who also generously let me adapt the later slides on NLP studies of writing.
Abstract: Literacy, encompassing the ability to produce written outputs from the reading of multiple sources, is a key learning goal. Selecting information and evaluating and integrating claims from potentially competing documents is a complex literacy task. Prior research exploring differing behaviours and their association with constructs such as epistemic cognition has used ‘multiple document processing’ (MDP) tasks. Using this model, 270 paired participants wrote a review of a document. Reports were assessed using a rubric associated with features of complex literacy behaviours. This paper focuses on the conceptual and empirical associations between those rubric marks and textual features of the reports, measured with a set of natural language processing (NLP) indicators. Findings indicate the potential of NLP indicators for providing feedback on the writing of such outputs, demonstrating clear relationships both across rubric facets and between rubric facets and specific NLP indicators.
RDAP 16: Sustainability of data infrastructure: The history of science scienc... (ASIS&T)
Research Data Access and Preservation Summit, 2016
Atlanta, GA
May 4-7, 2016
Part of Panel 2, Sustainability
Presenter:
Kristin Eschenfelder, University of Wisconsin-Madison
Panel Leads:
Kristin Briney, University of Wisconsin-Milwaukee & Erica Johns, Cornell University
"Reproducibility from the Informatics Perspective" (Micah Altman)
Dr. Altman will provide expert comment on the need for informatics modeling as part of the National Academies workshop: Statistical Challenges in Assessing and Fostering the Reproducibility of Scientific Results.
This workshop addresses statistical challenges in assessing and fostering the reproducibility of scientific results by examining three issues from a statistical perspective: the extent of reproducibility, the causes of reproducibility failures, and potential remedies.
This presentation was provided by Mike Taylor of Digital Science during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
With big data research all the rage, how are librarians being asked to engage with data? As big data research takes off across business, science, and the humanities, librarians need to understand big data and the issues around its storage and curation. How can it be made accessible? What tools and resources are required to use and analyze big data? In this webinar, panelists Caroline Muglia and Jill Parchuck share how big data is being used on their campuses and how they, as librarians, are supporting the sourcing and storage of this data.
This presentation was provided by Stacy Konkiel of Altmetric during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
This presentation was provided by William Gunn of Elsevier during the NISO Virtual Conference, Advancing Altmetrics, held on Wednesday, December 13, 2017.
This presentation was provided by Gabriela Mejias of ORCID, during the NISO hot topic virtual conference "Open Research." The event was held on November 17, 2021.
Many people are surprised to learn that, even though they don’t participate on social media and only use their computers for work, they have a digital life. This is partly because publicly available information about you is collected from the internet, and companies use this information to create records about you. Join Kimberley Barker for an overview of topics such as digital privacy, online reputation management, personal branding, and online identity.
Krishnaprasad Thirunarayan, Trust Management: Multimodal Data Perspective
Invited Tutorial, The 2015 International Conference on Collaboration Technologies and Systems (CTS 2015), June 2015
MIT Program on Information Science Talk -- Ophir Frieder on Searching in Hars... (Micah Altman)
Ophir Frieder, who holds the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair in Computer Science and Information Processing at Georgetown University, gave this talk on Searching in Harsh Environments as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Ophir rebuts the myth that "Google has solved search" and discusses the challenges of searching for complex objects, through hidden collections, and in harsh environments. For more, see: http://informatics.mit.edu/blg
An invited talk in the Big Data session of the Industrial Research Institute meeting in Seattle, Washington.
Some notes on how to train data science talent and exploit the fact that the membrane between academia and industry has become more permeable.
This presentation was provided by John Wilbanks of Sage Bionetworks during the NISO Symposium "Privacy Implications of Research Data," held on September 11, 2016, in conjunction with International Data Week in Denver, Colorado.
Reputation Management for Early Career Researchers (Micah Altman)
In the rapidly changing world of research and scholarly communications, researchers face a fast-growing range of options to publicly disseminate, review, and discuss research, and these choices will affect their long-term reputation. Early-career scholars must be especially thoughtful in choosing how much effort to invest in dissemination and communication, and what strategies to use.
Dr. Micah Altman briefly reviews a number of bibliometric and scientometric studies of quantitative research impact, a sampling of influential qualitative writings offering advice in this area, and an environmental scan of emerging researcher profile systems. Based on this review, and on his professional experience serving on dozens of review panels, Dr. Altman suggests some steps early-career researchers may consider when disseminating their research and participating in public reviews and discussion.
Understanding ICPSR - An Orientation and Tours of ICPSR Data Services and Edu... (ICPSR)
This is ICPSR's core workshop deck, designed to introduce, remind, and refresh your knowledge of ICPSR. It contains four "tours" or sub-presentations describing ICPSR's general reason for being; its social and behavioral research data, complete with search strategies; its training, educational, and instructional resources; and its data management and curation services, data repository options, and support resources (content and budget estimates) for those writing grant proposals.
This presentation was provided by Toby Green of Coherent Digital, during the NISO hot topic virtual conference "Open Research." The event was held on November 17, 2021.
Data citation supports attribution, provenance, discovery, and persistence. It is not (and should not be) sufficient for all of these things, but it's an important component. In the last two years, there have been several major efforts to standardize data citation practices, build citation infrastructure, and analyze data citation practices.
This session, presented as part of the Program on Information Science seminar series, examines data citation from an information lifecycle perspective: its use cases, requirements, and research opportunities. It also discusses emerging infrastructure and standardization efforts around data citation.
A number of principles have emerged for data citation. The most central is that data citations should be treated consistently with citations to other objects: data citations should at least provide the minimal core elements expected in other modern citations, should be included in the references section along with citations to other objects, and should be indexed in the same way.
Adoption of data citation by journals can provide positive and sustainable incentives for more reproducible science and more complete attribution. This would act to brighten the dark matter of science -- revealing connections among evidence bases that are not now visible through citations of articles.
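To make the "minimal core elements" principle concrete, here is a minimal Python sketch of a structured data citation. The element set (creators, year, title, optional version, publisher or repository, persistent identifier) and the DataCite-style ordering are assumptions chosen for illustration, not a layout prescribed in this talk; the `DataCitation` class, its fields, and the example dataset and DOI are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for an assumed set of core data-citation elements."""
    creators: tuple[str, ...]      # dataset authors, in citation order
    year: int                      # publication year of the dataset
    title: str                     # dataset title
    publisher: str                 # repository or archive holding the data
    identifier: str                # persistent identifier, e.g. a DOI
    version: Optional[str] = None  # dataset version, if the repository assigns one

    def format(self) -> str:
        """Render one reference-list entry in an assumed DataCite-like layout:
        Creators (Year). Title. Version. Publisher. Identifier
        """
        if not self.creators:
            raise ValueError("a data citation needs at least one creator")
        parts = [f"{'; '.join(self.creators)} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        # Render bare DOIs as resolvable URLs; other identifiers pass through unchanged.
        ident = self.identifier
        if ident.startswith("10."):
            ident = f"https://doi.org/{ident}"
        parts.append(ident)
        return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical dataset; the DOI below is made up for illustration.
    citation = DataCitation(
        creators=("Doe, Jane", "Roe, Richard"),
        year=2014,
        title="Hypothetical Survey Replication Data",
        publisher="Example Data Repository",
        identifier="10.5555/hypothetical-dataset",
        version="2",
    )
    print(citation.format())
    # prints: Doe, Jane; Roe, Richard (2014). Hypothetical Survey Replication Data. Version 2. Example Data Repository. https://doi.org/10.5555/hypothetical-dataset
```

Keeping the elements as separate fields, rather than as a preformatted string, is what lets the same record be rendered in a reference list and indexed the same way as a citation to an article.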
Privacy in Research Data Management - Use Cases (Micah Altman)
From Integrating Approaches to Privacy across the Research Lifecycle http://privacytools.seas.harvard.edu/fall-2013-workshop
This workshop will consider how emerging tools and perspectives from a variety of disciplines, such as computer science, social science, law, and the health sciences, should be integrated in the management of confidential research data. Multidisciplinary discussion groups will grapple with these issues in the context of exemplar research use cases.
State of the Art Informatics for Research Reproducibility, Reliability, and... (Micah Altman)
In March, I had the pleasure of being the inaugural speaker in a new lecture series (http://library.wustl.edu/research-data-testing/dss_speaker/dss_altman.html) initiated by the Washington University in St. Louis Libraries and dedicated to the topics of data reproducibility, citation, sharing, privacy, and management.
In the presentation embedded below, I provide an overview of the major categories of new initiatives to promote research reproducibility, reliability, and reuse, and of the related state of the art in informatics methods for managing data.
Slide deck from a presentation delivered at the University of Copenhagen Faculty of Law, 3 November 2017, concerning the relevance of the Knowledge Commons Research Framework to the study of biobank institutions.
INFORMATION WANTS SOMEONE ELSE TO PAY FOR IT: AS SCIENCE AND SCHOLARSHIP EVO... (Micah Altman)
Dr. Altman provided this keynote plenary for the 57th Annual Meeting of the National Federation of Advanced Information Services (NFAIS).
More content is being created by scientists and scholars than ever, and vastly greater collections of information are the subject of science and scholarship. Simultaneously, both the community of users of this information and its uses are changing. This talk reflects on trends in the generation and use of durable information assets in scholarship and science, and on the changing relationship among consumers, purchasers, and funders.
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES (Micah Altman)
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk reviews emerging big data sources for social scientific analysis and explores the challenges these present. Many of these sources pose distinct challenges for acquisition, processing, analysis, inference, sharing, and preservation.
Dr. Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology. Dr. Altman is also a Non-Resident Senior Fellow at The Brookings Institution. Prior to arriving at MIT, Dr. Altman served at Harvard University for fifteen years as the Associate Director of the Harvard-MIT Data Center, Archival Director of the Henry A. Murray Archive, and Senior Research Scientist in the Institute for Quantitative Social Sciences.
Dr. Altman conducts research in social science, information science, and research methods, focusing on the intersections of information, technology, privacy, and politics, and on the dissemination, preservation, reliability, and governance of scientific knowledge.
Scientific Reproducibility from an Informatics Perspective (Micah Altman)
This talk, prepared for the MIT Program on Information Science and updating a talk at the National Academies workshop on reproducibility, frames reproducibility from an informatics perspective.
Ethical Principles for the All Data Revolution (Melissa Moody)
A presentation by Stephanie Shipp, from the Research Highlights session at the 2019 Women in Data Science Charlottesville Conference. Hosted by the UVA Data Science Institute.
AAPOR - comparing found data from social media and made data from surveys (Cliff Lampe)
This presentation was given at the 2014 AAPOR conference and examines specific ways in which "big data" from social media differ from data acquired through surveys.
There are many online and in-person courses available for librarians to learn about research data management, data analysis, and visualization, but after you have taken a course, how do you go about applying what you have learned? While it is possible to just start offering classes and consultations, your service will have a better chance of becoming relevant if you consider stakeholders and review your institutional environment. This lecture will give you some ideas to get started with data services at your institution.
A talk at the Urban Science workshop at the Puget Sound Regional Council on July 20, 2014, organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Labs and the University of Washington.
Selecting efficient and reliable preservation strategies (Micah Altman)
This article addresses the problem of formulating efficient and reliable operational preservation policies that ensure bit-level information integrity over long periods, and in the presence of a diverse range of real-world technical, legal, organizational, and economic threats. We develop a systematic, quantitative prediction framework that combines formal modeling, discrete-event-based simulation, and hierarchical modeling, and we then use empirically calibrated sensitivity analysis to identify effective strategies.
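The abstract describes a simulation-based framework without giving its details. As a much simpler illustration of the general idea, the Python sketch below runs a Monte Carlo estimate of the chance that every replica of an object is lost before a periodic audit can repair it. The independent per-period failure model, the audit-and-repair rule, the parameter values, and the function name `simulate_loss_probability` are all assumptions for illustration, not the authors' model.

```python
import random


def simulate_loss_probability(
    copies: int,
    failure_prob: float,
    audit_interval: int,
    horizon: int,
    trials: int = 10_000,
    seed: int = 0,
) -> float:
    """Estimate the probability that all replicas are lost within `horizon` periods.

    Toy discrete-time model (an assumption, not the published framework):
    - each intact copy fails independently with `failure_prob` per period;
    - every `audit_interval` periods, an audit re-creates damaged copies
      from any surviving copy;
    - the object is lost for good once no intact copy remains.
    """
    if copies < 1 or audit_interval < 1 or trials < 1 or horizon < 0:
        raise ValueError("copies, audit_interval, trials must be >= 1 and horizon >= 0")
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded so strategy comparisons are repeatable
    losses = 0
    for _ in range(trials):
        intact = copies
        for period in range(1, horizon + 1):
            # Each intact copy independently survives this period or fails.
            intact = sum(1 for _ in range(intact) if rng.random() >= failure_prob)
            if intact == 0:
                losses += 1
                break
            if period % audit_interval == 0:
                intact = copies  # audit-and-repair restores the full replica set
    return losses / trials


if __name__ == "__main__":
    # Hypothetical parameters: 10% failure chance per copy per period, 10 periods.
    for copies in (1, 2, 3):
        for audit_interval in (1, 5):
            p = simulate_loss_probability(copies, 0.1, audit_interval, horizon=10, trials=2_000)
            print(f"copies={copies} audit_every={audit_interval}: loss~{p:.3f}")
```

Sweeping `copies` and `audit_interval`, as in the `__main__` block, is only a crude stand-in for the sensitivity analysis the abstract mentions; independent failures understate the correlated technical, legal, organizational, and economic threats the article targets.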
Presentation by Philip Cohen on collaborative work with Micah Altman as part of the MIT CREOS research talk series. Presented in fall 2018, in Cambridge, MA.
Contemporary journal peer review is beset by a range of problems. These include (a) long delay times to publication, during which time research is inaccessible; (b) weak incentives to conduct reviews, resulting in high refusal rates as the pace of journal publication increases; (c) quality control problems that produce both errors of commission (accepting erroneous work) and omission (passing over important work, especially null findings); (d) unknown levels of bias, affecting both who is asked to perform peer review and how reviewers treat authors; and (e) opacity in the process that impedes error correction and more systematic learning, and enables conflicts of interest to pass undetected. Proposed alternative practices attempt to address these concerns, especially open peer review and post-publication peer review. However, systemic solutions will require revisiting the functions of peer review in its institutional context.
Presentation by Philip Cohen and Micah Altman on developing an exchange system for peer review in support of open science. Prepared for presentation at the ACRL-SSRC meeting on Open Scholarship in the Social Sciences, Washington, DC, December 2018.
Redistricting in the US -- An OverviewMicah Altman
This presentation was prepared for the International Seminar on Electoral Districting, National Electoral Institute, El Colegio de México. http://www.ine.mx/seminario-internacional-distritacion-electoral/
A History of the Internet: Scott Bradner’s Program on Information Science Talk Micah Altman
Scott Bradner is a Berkman Center affiliate who worked for 50 years at Harvard in the areas of computer programming, system management, networking, IT security, and identity management. He was involved in the design, operation, and use of data networks at Harvard University from the early days of the ARPANET and served in many leadership roles in the IETF. He presented the talk recorded below, entitled A History of the Internet, as part of the Program on Information Science Brown Bag Series:
Bradner abstracted his talk as follows:
In a way the Russians caused the Internet. This talk will describe how that happened (hint: it was not actually the Bomb) and follow the path that has led to the current Internet of (unpatchable) Things (the IoT) and the Surveillance Economy.
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
The web is now firmly established as the primary communication and publication platform for sharing and accessing social and cultural materials. This networked world has created both opportunities and pitfalls for libraries and archives in their mission to preserve and provide ongoing access to knowledge. How can the affordances of the web be leveraged to drastically extend the plurality of representation in the archive? What challenges are imposed by the intrinsic ephemerality and mutability of online information? What methodological reorientations are demanded by the scale and dynamism of machine-generated cultural artifacts? This talk will explore the interplay of the web, contemporary historical records, and the programs, technologies, and approaches by which libraries and archives are working to extend their mission to preserve and provide access to the evidence of human activity in a world distinguished by the ubiquity of born-digital materials.
Information Science Brown Bag talks, hosted by the Program on Information Science, consist of regular discussions and brainstorming sessions on all aspects of information science and uses of information science and technology to assess and solve institutional, social, and research problems. These are informal talks. Discussions are often inspired by real-world problems being faced by the lead discussant.
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Micah Altman
Cassidy Sugimoto is Associate Professor in the School of Informatics and Computing, Indiana University Bloomington, who researches within the domain of scholarly communication and scientometrics, examining the formal and informal ways in which knowledge producers consume and disseminate scholarship. She presented this talk, entitled Labor And Reward In Science: Do Women Have An Equal Voice In Scholarly Communication? A Brown Bag With Cassidy Sugimoto, as part of the Program on Information Science Brown Bag Series.
Despite progress, gender disparities in science persist. Women remain underrepresented in the scientific workforce and underrewarded for their contributions. This talk will examine multiple layers of gender disparities in science, triangulating data from scientometrics, surveys, and social media to provide a broader perspective on the gendered nature of scientific communication. The extent of gender disparities and the ways in which new media are changing these patterns will be discussed. The talk will end with a discussion of interventions, with a particular focus on the roles of libraries, publishers, and other actors in the scholarly ecosystem.
Utilizing VR and AR in the Library Space:Micah Altman
Matt Bernhardt is a web developer in the MIT Libraries and a collaborator in our program. He presented this talk, entitled Reality Bytes - Utilizing VR and AR in The Library Space, as part of the Program on Information Science Brown Bag Series.
Terms like "virtual reality" and "augmented reality" have existed for a long time. In recent years, thanks to products like Google Cardboard and games like Pokemon Go, an increasing number of people have gained first-hand experience with these once-exotic technologies. The MIT Libraries are no exception to this trend. The Program on Information Science has conducted enough experimentation that we would like to share what we have learned, and solicit ideas for further investigation.
For slides and comments see: http://informatics.mit.edu/blog
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsMicah Altman
Catherine D'Ignazio is an Assistant Professor of Civic Media and Data Visualization at Emerson College, a principal investigator at the Engagement Lab, and a research affiliate at the MIT Media Lab/Center for Civic Media. She presented this talk, entitled Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots, as part of the Program on Information Science Brown Bag Series.
Communities, governments, libraries and organizations are swimming in data—demographic data, participation data, government data, social media data—but very few understand what to do with it. Though governments and foundations are creating open data portals and corporations are creating APIs, these rarely focus on use, usability, building community or creating impact. So although there is an explosion of data, there is a significant lag in data literacy at the scale of communities and citizens. This creates a situation of data-haves and have-nots which is troubling for an open data movement that seeks to empower people with data. But there are emerging technocultural practices that combine participation, creativity, and context to connect data to everyday life. These include data journalism, citizen science, emerging forms for documenting and publishing metadata, novel public engagement in government processes, and participatory data art. This talk surveys these practices both lovingly and critically, including their aspirations and the challenges they face in creating citizens that are truly empowered with data.
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...Micah Altman
Access to high-quality, relevant information is absolutely foundational for a quality education. Yet, so many schools across the developing world lack fundamental resources, like textbooks, libraries, electricity and Internet connectivity. The SolarSPELL (Solar Powered Educational Learning Library) is designed specifically to address these infrastructural challenges, by bringing relevant, digital educational content to offline, off-grid locations. SolarSPELL is a portable, ruggedized, solar-powered digital library that broadcasts a webpage with open-access educational content over an offline WiFi hotspot, content that is curated for a particular audience in a specified locality—in this case, for schoolchildren and teachers in remote locations. It is a hands-on, iteratively developed project that has involved undergraduate students in all facets and at every stage of development. This talk will examine the design, development, and deployment of a for-the-field technology that looks simple but has a quite complex background.
Laura Hosman is Assistant Professor at Arizona State University, holding a joint appointment in the School for the Future of Innovation in Society and in The Polytechnic School. Her work is action-oriented and focuses on the role for information and communications technology (ICT) in developing countries. Presently, she focuses on ICT-in-education projects, and brings her passion for experiential learning to the classroom by leading real-world-focused, project-based courses that have seen student-built technology deployed in schools in Haiti, Vanuatu, Micronesia, Samoa, and Tonga.
Making Decisions in a World Awash in Data: We’re going to need a different bo...Micah Altman
In his abstract, Scriffignano summarizes as follows:
I explore some of the ways in which the massive availability of data is changing the types of questions we must ask in the context of making business decisions. Truth be told, nearly all organizations struggle to make sense out of the mounting data already within the enterprise. At the same time, businesses, individuals, and governments continue to try to outpace one another, often in ways that are informed by newly available data and technology, but just as often using that data and technology in alarmingly inappropriate or incomplete ways. Multiple “solutions” exist to take data that is poorly understood, promising to derive meaning that is often transient at best. A tremendous amount of “dark” innovation continues in the space of fraud and other bad behavior (e.g., cybercrime, cyberterrorism), highlighting that there are very real risks to taking a fast-follower strategy in making sense out of the ever-increasing amount of data available. Tools and technologies can be very helpful or, as Scriffignano puts it, “they can accelerate the speed with which we hit the wall.” Drawing on unstructured, highly dynamic sources of data, fascinating inferences can be derived if we ask the right questions (and maybe use a bit of different math!). This session will cover three main themes: the new normal (how the data around us continues to change), how we are reacting (bringing data science into the room), and the path ahead (creating a mindset in the organization that evolves). Ultimately, what we learn is governed as much by the data available as by the questions we ask. This talk, both relevant and occasionally irreverent, will explore some of the new ways data is being used to expose risk and opportunity and the skills we need to take advantage of a world awash in data.
The Open Access Network: Rebecca Kennison’s Talk for the MIT Program on Infor...Micah Altman
Rebecca Kennison, who is the Principal of K|N Consultants and the co-founder of the Open Access Network, and who was the founding director of the Center for Digital Research and Scholarship, gave this talk on Come Together Right Now: An Introduction To The Open Access Network as part of the Program on Information Science Brown Bag Series.
Gary Price, MIT Program on Information ScienceMicah Altman
Gary Price, who is chief editor of InfoDocket, contributing editor of Search Engine Land, co-founder of Full Text Reports and who has worked with internet search firms and library systems developers alike, gave this talk on Issues in Curating the Open Web at Scale as part of the Program on Information Science Brown Bag Series.
Attribution from a Research Library Perspective, on NISO Webinar: How Librari...Micah Altman
Dr Altman's talk summarizes the lifecycle of research attribution, with special attention to person identifiers and contributor roles. The talk describes and discusses ORCID’s new “collect-and-connect” program and the CASRAI CRediT contributor taxonomy as exemplars of emerging good practice. It closes by describing how identifiers are being incorporated into a broader range of scholarly outputs, such as software.
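Person identifiers such as ORCID iDs end in a check character. ORCID documents that this character is computed with the ISO 7064 MOD 11-2 algorithm, so simple transcription errors can be caught before an identifier is stored. The following is a minimal validation sketch; the function names are illustrative and not part of any ORCID library.

```python
def orcid_check_character(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, characters, and check character of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "")
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check.upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A passing checksum only shows that the identifier is well formed. It does not show that the iD is registered or that it belongs to the contributor being credited.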
Dr Micah Altman presented this at the Society of American Archivists 2016 Research Forum.
In this presentation I discuss some key potential topics for preservation research in the next five years.
Software Repositories for Research -- An Environmental ScanMicah Altman
Presented at the Software Preservation Network Forum:
"We discuss the results of an environmental scan characterizing the current landscape of software repositories, hubs, and publication venues that are used in research and scholarship. The study aims to characterize the research and scholarship use cases supported by exemplar repositories, their models for sustainability, and the related key affordances, significant properties which the repository offers/maintains. We supplement this with a scan of funder and publisher policies toward software curation and citation, and a summary of key policy resources and guidelines. Using this environmental scan, we discuss a preliminary gap analysis. It is hoped that by addressing these key questions, new insights will be provided into the types of decisions research libraries can expect to make when designing future pilot software curation services."
Best Practices for Sharing Economics Data
1. Prepared for
Second Open Economics International Workshop
June 2013
“Not-bad” Practices for Sharing Economics Data
Dr. Micah Altman
<escience@mit.edu>
Director of Research, MIT Libraries
Non-Resident Senior Fellow, The Brookings Institution
2. DISCLAIMER
These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, or (with the exception of co-authored, previously published work) my collaborators.
Secondary disclaimer:
“It’s tough to make predictions, especially about the future!”
-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.
“Not-bad” Practices for Sharing Economics Data 2
3. Collaborators & Co-Conspirators
• Jonathan Crabtree, Merce Crosas, Gary King, Michael McDonald, Nancy McGovern, Salil Vadhan & many others
• Research Support
Thanks to the Library of Congress, the National Science Foundation, IMLS, the Sloan Foundation, the Joyce Foundation, the Massachusetts Institute of Technology, & Harvard University.
4. Related Work
• M. Altman. 2013. “Data Citation in the Dataverse Network®.” In Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop.
• National Digital Stewardship Alliance. 2013 (forthcoming). 2014 National Agenda for Digital Stewardship.
• M. Altman, M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young. 2009. “Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences.” The American Archivist 72(1): 169–182.
• M. Altman. 2008. “A Fingerprint Method for Verification of Scientific Data.” In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer-Verlag.
• M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine 13(3/4) (March/April).
Most reprints available from: informatics.mit.edu
6. Some Trends
• Shifting Evidence Base
• High Performance Collaboration (here comes everybody…)
• More Data
• Publish, then Filter
• More Learners
• More Open
The Lifecycle and Institutional Ecology of Data
7. Why not ‘best’ practices?
• Few models for systematic valuation of data
– how much will data X be worth to community Y at time Z?
See: National Digital Stewardship Alliance, 2013 (forthcoming), 2014 National Agenda for Digital Stewardship. Library of Congress.
• Optimality of practices is generally strongly dependent on operational context
• Context of data sharing very dynamic
– change in publication models
– change in evidence base
– change in data management methodologies
– change in policies
• Paucity of evidence to establish data practices as best:
– Descriptive: adoption, compliance
– Predictive: association of best practices & desired outcomes
– Causal: intervention with best practices linked to improvement
Best practices neither best nor practiced.
8. Why ‘not bad’ practices?
• Avoid clearly bad practices
• Document operational and tacit knowledge
• Elicit assumptions
• Provide basis for auditing, evaluation, and improvement
13. Legal Constraints
• Contract: License; Click-Wrap TOU
• Intellectual Property: Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark
• Confidentiality: Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; State Privacy Laws; CIPSEA
• Access Rights: FOIA; State FOI Laws; Journal Replication Requirements; Funder Open Access; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; EAR; ITAR; Export Restrictions
14. Data Dissemination Policies - How
• License: Creative Commons
Version 4.0 of the Creative Commons licenses
– Legally well crafted
– Avoids attribution stacking – attribution through links
– Handles sui generis database rights, licensee rights to publicity, etc.
– Machine actionable
See: wiki.creativecommons.org/4.0
• Confidentiality
Deidentification & public use files insufficient.
– Need multiple modes of access, including protected access to confidential data.
See: National Research Council. 2005. Expanding Access to Research Data: Reconciling Risks and Opportunities. Washington, DC: The National Academies Press.
Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections.” Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf
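One way to make a license machine actionable, as the slide recommends, is to record it as a resolvable license URL in structured metadata. The sketch below assumes the schema.org `Dataset` vocabulary serialized as JSON-LD. The dataset name, DOI, and creator are hypothetical placeholders.

```python
import json


def dataset_metadata(name, identifier, license_url, creators):
    """Build a schema.org Dataset record whose license is a dereferenceable URL."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "identifier": identifier,
        "license": license_url,  # software can follow this URL to the license terms
        "creator": [{"@type": "Person", "name": c} for c in creators],
    }


record = dataset_metadata(
    name="Example replication dataset",            # hypothetical
    identifier="https://doi.org/10.5555/example",  # hypothetical DOI
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creators=["A. Researcher"],                    # hypothetical
)
print(json.dumps(record, indent=2))
```

A record like this can be embedded in a dataset landing page. Harvesters and search tools could then read the license terms without parsing prose.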
15. Data Dissemination Policy - When
• Timeliness [NRC Recommendations]
– Sharing data should be a regular practice.
– Investigators should share their data by the time of publication of initial major results of analyses of the data except in compelling circumstances.
– Data relevant to public policy should be shared as quickly and widely as possible.
– Plans for data sharing should be an integral part of a research plan whenever data sharing is feasible.
Fienberg, et al. (eds). 1985. Sharing Research Data. Washington, DC: The National Academies Press.
16. Data Dissemination Policy - Where
• With journals. Follow the NISO recommendations for supplemental materials:
http://www.niso.org/workrooms/supplementalrecommendations
• With sustainable, well-known, collaboratively stewarded repositories
– Example: data-pass.org
Also see:
M. Altman, M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young. 2009. “Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences.” The American Archivist 72(1): 169–182.
17. Data Citation Policies
• Data Citation First Principles
(Harvard Workshop, NRC Report, CODATA forthcoming)
– Data citations should be treated as first-class objects of publication
– At minimum, all data necessary to understand, assess, and extend the conclusions in scholarly work should be cited.
See:
Altman, Micah. “Data Citation in The Dataverse Network.” Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. Ed. Paul F. Uhlir. National Academies Press, 2012.
M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine 13(3/4) (March/April).
• Data-PASS recommendations
– Minimal elements: author, date, title, persistent id
– Location: must appear with other elements
– Recommended: fixity information, such as a Universal Numeric Fingerprint
See: data-pass.org/citations.html
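The Data-PASS minimal elements map naturally onto a small record type. This sketch renders the elements in the order listed above (author, date, title, persistent id), followed by optional fixity information. It uses hypothetical values and is not an official Data-PASS formatter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    author: str
    date: str
    title: str
    persistent_id: str            # e.g. a DOI or Handle
    fixity: Optional[str] = None  # e.g. a UNF, when one is available

    def __post_init__(self):
        # All four minimal elements are required.
        for field_name in ("author", "date", "title", "persistent_id"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"missing required citation element: {field_name}")

    def render(self) -> str:
        """Render all elements together in a single citation string."""
        text = f'{self.author}, {self.date}, "{self.title}", {self.persistent_id}'
        if self.fixity:
            text += f" {self.fixity}"
        return text


citation = DataCitation(
    author="A. Researcher",               # hypothetical
    date="2013",
    title="Example Replication Data",     # hypothetical
    persistent_id="doi:10.5555/example",  # hypothetical
)
print(citation.render())
```

Keeping the elements in one object means they are always emitted together. This matches the recommendation that the elements appear in the same place.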
18. Reproducibility Policies
• Science
– “Unpublished data and personal communications. Citations to unpublished data and personal communications cannot be used to support claims in a published paper. Papers will be held for publication until all "in press" citations are published.”
– “Data and materials availability. All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science.”
• Support for publishing replication
– Registered replication reports:
http://www.psychologicalscience.org/index.php/replication
– ICMJE Clinical Trials Registration:
http://www.icmje.org/publishing_10register.html
– Journals of Negative/Null Results
19. Policies are not Self-Enforcing/Sustaining
• Technical and financial sustainability must be planned to ensure long-term access
See: National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSF. http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
• Long-term access requires initial investment in data preparation
– Capture tacit knowledge, create metadata
– Transfer to stable formats
20. Compliance with Data Sharing Policies is often Low
Compliance is low even in the best examples of journals
Checking compliance is labor-intensive without citation and repository standards
[See Glandon 2011; McCullough et al. 2008]
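Standard persistent identifiers are what make compliance checks automatable. As a rough illustration, suppose data-availability statements are available as text and a DOI is the required identifier. A first-pass screen can then be a pattern match. The regular expression is a simplification, and the sample statements are hypothetical.

```python
import re

# Simplified DOI pattern: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def cites_doi(statement: str) -> bool:
    """True if the statement contains something that looks like a DOI."""
    return DOI_PATTERN.search(statement) is not None


def compliance_rate(statements) -> float:
    """Share of statements that cite a DOI (0.0 for an empty list)."""
    statements = list(statements)
    if not statements:
        return 0.0
    return sum(cites_doi(s) for s in statements) / len(statements)


sample = [  # hypothetical data-availability statements
    "Replication files: https://doi.org/10.5555/abc-123",
    "Data available from the authors on request.",
    "See doi:10.5555/xyz.",
]
print(compliance_rate(sample))
```

A match only shows that an identifier is present. Whether it resolves to the right, complete data still needs review.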
21. Technical Infrastructure Examples
• CKAN
– Open Source
– Established
– Built on a Python platform
– http://ckan.org/
• Dataverse Network
– http://thedata.org
– Open Source
– Flexible archival models
– Semantic Fixity (UNF)
[Altman 2008]
• MyExperiment
– http://www.myexperiment.org/
– Long lasting
– Archives complete workflows to produce results
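CKAN exposes catalog metadata programmatically through its Action API. For example, `package_search` accepts a query `q` and a result count `rows`, and it returns JSON with `success` and `result` fields. The sketch below only builds the request URL and parses a response, so it runs offline. The base URL and the canned response are hypothetical.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text: str) -> list:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [pkg.get("title", pkg.get("name", "")) for pkg in payload["result"]["results"]]


url = package_search_url("https://ckan.example.org/", "economics replication", rows=5)
print(url)

# Hypothetical, abbreviated response of the shape package_search returns.
canned = '{"success": true, "result": {"count": 1, "results": [{"name": "wages-2012", "title": "Wage Survey 2012"}]}}'
print(dataset_titles(canned))
```

Against a live portal, the URL could be fetched with `urllib.request.urlopen`, and the response body passed to `dataset_titles`.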
Technical Criteria
• Long term access
– Replication, independence
• Verifiability and fixity
• Provenance
• Workflows/code
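Semantic fixity, as in the UNF cited above, fingerprints the normalized content of data rather than the bytes of a particular file format. The sketch below follows that idea in simplified form. It rounds each value to a fixed number of significant digits, writes it in canonical exponential notation, and hashes the result. It is not the UNF specification and will not reproduce real UNF strings; function names are hypothetical.

```python
import base64
import hashlib


def normalize_value(x, digits=7):
    """Canonical text for a value, rounded to `digits` significant digits."""
    if x is None:
        return "<missing>"
    return format(float(x), f".{digits - 1}e")


def semantic_fingerprint(column, digits=7):
    """Hash normalized values so the result ignores how the column was stored."""
    h = hashlib.sha256()
    for value in column:
        h.update(normalize_value(value, digits).encode("ascii"))
        h.update(b"\n")  # separator keeps [1, 2] distinct from [12]
    return base64.b64encode(h.digest()[:16]).decode("ascii")


print(semantic_fingerprint([1, 2.5, 3]))
```

The fingerprint depends only on normalized values. Reading the same column from text or from a native statistical format should therefore leave it unchanged, while any substantive edit changes it.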
22. Final Observations
• Best practices aren’t…
– document context of practice & measure desired outcomes
• Not-bad practice starts with analysis…
– lifecycle; requirements; sustainability; predicted costs and benefits
• Effective data sharing requires policies:
– dissemination, citation, replication, auditing
• Effective data sharing requires infrastructure:
– For verifiability, provenance, workflows/code, & long-term access
• Policies are not self-enforcing
– combine incentives, transparency, auditing, & evaluation
23. Additional Bibliography (Selected)
• McCullough, B. D., Kerry Anne McGeary, and Teresa D. Harrison. 2008. "Do Economics Journal Archives Promote Replicable Research?" Canadian Journal of Economics 41 (4).
• Schneier, Bruce. 2012. Liars and Outliers. Wiley.
• Borgman, Christine. 2011. "The Conundrum of Sharing Research Data." Journal of the American Society for Information Science and Technology: 1–40.
• Glandon, P. 2011. Report on the American Economic Review Data Availability Compliance Project. http://www.aeaweb.org/aer/2011_Data_Compliance_Report.pdf
• King, Gary. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research 36: 173–199.
• International Council for Science (ICSU). 2004. ICSU Report of the CSPR Assessment Panel on Scientific Data and Information.
This work by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Best practices aren't. The core issue is that there are few models for the systematic valuation of data: we have no robust, general, proven way of answering the question of how much data X is worth to community Y at time Z. Thus the "bestness" (optimality) of a practice generally depends strongly on its operational context, and the context of data sharing is currently both highly complex and dynamic. Until there is systematic descriptive evidence that best practices are used, predictive evidence that best practices are associated with future desired outcomes, and causal evidence that applying best practices yields improved outcomes, we cannot be sure that any practice is "best".

Nevertheless, one should use established "not-bad" practices, for several reasons: first, to avoid practices that are clearly bad; second, because using such practices helps document operational and tacit knowledge; third, because selecting practices helps elicit the assumptions under which they are applied; and finally, because not-bad practices provide a basis for auditing, evaluation, and eventual improvement.

Specific not-bad practices for data sharing fall into roughly three categories:
• Analytic practices: lifecycle analysis and requirements analysis
• Policy practices for data dissemination, licensing, privacy, availability, citation, and reproducibility
• Technical practices for sharing and reproducibility, including fixity, replication, and provenance

This presentation at the Second Open Economics International Workshop (sponsored by the Sloan Foundation, MIT, and OKFN) provides an overview of these practices and links to specific recommendations, standards, and tools.
The LHC produces a petabyte every two weeks, Sloan Galaxy Zoo has hundreds of thousands of “authors”, 50K people attend a class from the University of Michigan, and to understand public opinion, instead of surveying hundreds of people per month, we can analyze 10,000 tweets per second.
Different stakeholders have stronger stakes in research at different stages, but researchers and research institutions are in the middle: they have a strong stake in most stages. Researchers are more directly concerned with collection, processing, analysis, and dissemination; organizations have a higher stake in internal sharing, re-use, and long-term access.