Dataverse hpdm symposium
1. @dataverseorg
Mercè Crosas, Director of Data Science, IQSS, Harvard University
@mercecrosas
Harvard Purdue Data Management Symposium, June 16-17, 2015
New!
2. About Dataverse
● Gives credit and control to data authors and distributors
● Follows best practices, standards for data management and archiving
● Dataverse development started in 2006 at Harvard’s IQSS
● Now widely used, with a vibrant development and user community
● Helped instigate and is at the center of a cultural change toward open and reproducible research
Science requires community access to data
Technology solution: an open source software project for sharing, citing and archiving data
4. Dataverse 4.0
A full rewrite that improves usability, defines a rigorous and standardized data publishing workflow, and leverages the latest technologies.
5. Rich Set of Features
● Standard, persistent data citation
● Branding for each dataverse
● Standard, extensible metadata:
○ citation metadata
○ domain-specific metadata
○ file-level metadata
● Faceted search for all metadata
● Multiple levels of access control
○ CC0 / terms of use / restricted
● Multiple roles and permissions
● Re-formatting of tabular data files
● Extraction of file metadata
● Versioning
● APIs for search, deposit, access
Upgraded Technology
● UI improved by usability testing
● Built with open source solutions
● Enhanced UI framework
○ PrimeFaces and Bootstrap
● Widely used, community-driven enterprise software platform
○ Java EE 7 and GlassFish
● Reliable, scalable search platform
○ Solr
● Web standard programmatic interfaces
○ RESTful APIs
● Standards for archiving and interoperability
○ OAI-PMH, LOCKSS
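The programmatic interfaces listed above (RESTful APIs for search and OAI-PMH for harvesting) can be illustrated with a minimal sketch that only builds request URLs and makes no network calls. The host name is a placeholder, and the `/api/search` and `/oai` paths and the `q`, `type` and `per_page` parameters are assumptions about a typical installation, not something stated on the slides. The `verb`, `metadataPrefix` and `set` arguments come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the installation you are targeting.
BASE_URL = "https://demo.dataverse.example"


def search_url(query, item_type="dataset", per_page=10):
    """Build a search request URL (assumed path: /api/search)."""
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{BASE_URL}/api/search?{urlencode(params)}"


def oai_list_records_url(metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (assumed endpoint: /oai)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # "set" is optional in OAI-PMH and restricts harvesting to one set.
        params["set"] = set_spec
    return f"{BASE_URL}/oai?{urlencode(params)}"


print(search_url("structural biology"))
print(oai_list_records_url())
```

Passing the resulting URLs to any HTTP client would be the next step. Check the installation's own API guide for the exact paths and parameters before relying on them.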
6. Dataverse Installations worldwide
Dataverse software installations around the world serve as public data repositories (Harvard and Odum Dataverses) or institutional research data repositories.
7. Harvard Dataverse
● A collaboration between the Harvard Library and IQSS
● Open to research data worldwide:
○ > 1,000 dataverses
○ > 58,000 datasets
○ > 270,000 files
○ > 1.3 million downloads
○ > 11,000 registered users
● Includes dataverses for:
○ individual researchers
○ research teams
○ journals
○ institutions or organizations
● Rate of data deposit has increased by a factor of 30 since last year
10. Dataverse is now more than a software project and a data repository
● Development of protocols and software
● Collaboration with the Library
● User support, curation and training
● User community input
● Grant partnerships
● Collaboration with the broader data community
● Interns program
● Outreach (meetings, papers)
● Journals integration
● External tools integration
● Technical support for other installations
11. A vibrant community
Contributions:
● Internationalization, translation to Chinese (Fudan University)
● Integration with Archivematica (University of Toronto)
● Integration with iRODS (UNC)
● Integration with Shibboleth (Netherlands Dataverse, DANS)
12. What’s next
● Support new data types
○ Sensitive information
○ Large-scale data (terabytes to petabytes)
○ Streaming data (e.g., sensors, cell phones)
● Make datasets shared in Dataverse more reusable
● Extend APIs to build community of app contributors
● Build integrated tools (e.g., data visualization, analysis)