A basic course on Research data management, part 4: caring for your data, or ...Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
A basic course on Research data management, part 1: what and whyLeon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
Winning the Tour de France, Research Data and Data StewardshipAlastair Dunning
Presentation to Sport Data Valley given at TU Delft Library meeting on value of Data Stewardship and Curation for those working with data from elite and public sport
May 2016
A basic course on Research data management, part 4: caring for your data, or ...Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
A basic course on Research data management, part 1: what and whyLeon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
Winning the Tour de France, Research Data and Data StewardshipAlastair Dunning
Presentation to Sport Data Valley given at TU Delft Library meeting on value of Data Stewardship and Curation for those working with data from elite and public sport
May 2016
Data Publishing and Institutional RepositoriesVarsha Khodiyar
Slides presented at the Force16 panel discussion on 18th April 2016 "Libraries united in opening new scholarly platforms" https://www.force11.org/meetings/force2016/program/agenda/concurrent-session-libraries-united-opening-new-scholarly
Good (enough) research data management practicesLeon Osinski
Slides of a lecture on research data management (RDM), given for 3rd year students (Eindhoven University of Technology, major Psychology & Technology), as part of the course 0HV90 Quantitative Research. At the end of the slides a handy summary 'Research data management basics in a nutshell' is added.
Data Management in the context of Open Science.
Because open access become mandatory for publications and project-funded research data, it is the responsibility of each researcher to be informed and then trained in new practices.
This is module 4 in the EDI Data Publishing training course. In this module, you will learn how to group your data files and other information products into a publishable unit.
This is module 6 in the EDI Data Publishing training course. In this module, you will learn how to create quality metadata and be introduced to the landscape of data repositories and their functions.
The challenge of sharing data well, how publishers can helpVarsha Khodiyar
Researchers, academic institutes and funders are increasingly recognizing the importance of data sharing for reproducible science. However, it is not always straightforward and clear to researchers as to how best to share data in a useful way. At Springer Nature we are working on several initiatives to help facilitate the sharing of research data in a reusable way, with our overarching goal being to publish research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) to share their data in a useful way.
This presentation gives an overview of the key things that we need to consider before publishing data from the repository. It briefly discusses research data management, research data lifecycle, FAIR principles of research data management and then move on to key elements that should be considered while preparing datasets for publishing through repository.
This presentation introduces the basics of the Dataverse including preparing the submission to the Dataverse, creating an account and logging in, adding datasets to the Dataverse account, and metadata.
A basic course on Research data management, part 3: sharing your dataLeon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
FAIR Ddata in trustworthy repositories: the basicsOpenAIRE
This video illustrates how certified digital repositories contribute to making and keeping research data findable, accessible, interoperable and reusable (FAIR). Trustworthy repositories support Open Access to data, as well as Restricted Access when necessary, and they offer support for metadata, sustainable and interoperable file formats, and persistent identifiers for future citation. Presented by Marjan Grootveld (DANS, OpenAIRE).
Main references
• Core Trust Seal for trustworthy digital repositories: https://www.coretrustseal.org/
• EUDAT FAIR checklist: https://doi.org/10.5281/zenodo.1065991
• European Commission’s Guidelines on FAIR data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• FAIR data principles: www.force11.org/group/fairgroup/fairprinciples
• Overview of metadata standards and tools: https://rdamsc.dcc.ac.uk/
Data Publishing and Institutional RepositoriesVarsha Khodiyar
Slides presented at the Force16 panel discussion on 18th April 2016 "Libraries united in opening new scholarly platforms" https://www.force11.org/meetings/force2016/program/agenda/concurrent-session-libraries-united-opening-new-scholarly
Good (enough) research data management practicesLeon Osinski
Slides of a lecture on research data management (RDM), given for 3rd year students (Eindhoven University of Technology, major Psychology & Technology), as part of the course 0HV90 Quantitative Research. At the end of the slides a handy summary 'Research data management basics in a nutshell' is added.
Data Management in the context of Open Science.
Because open access become mandatory for publications and project-funded research data, it is the responsibility of each researcher to be informed and then trained in new practices.
This is module 4 in the EDI Data Publishing training course. In this module, you will learn how to group your data files and other information products into a publishable unit.
This is module 6 in the EDI Data Publishing training course. In this module, you will learn how to create quality metadata and be introduced to the landscape of data repositories and their functions.
The challenge of sharing data well, how publishers can helpVarsha Khodiyar
Researchers, academic institutes and funders are increasingly recognizing the importance of data sharing for reproducible science. However, it is not always straightforward and clear to researchers as to how best to share data in a useful way. At Springer Nature we are working on several initiatives to help facilitate the sharing of research data in a reusable way, with our overarching goal being to publish research that is robust and reproducible. I will talk about the effort that goes into our flagship data journal, Scientific Data, to facilitate best practices in publication and sharing of research data, and share some of our experiences publishing Challenge datasets. I will also describe some of the newer Research Data Services that are now available to help all researchers (not only Springer Nature authors) to share their data in a useful way.
This presentation gives an overview of the key things that we need to consider before publishing data from the repository. It briefly discusses research data management, research data lifecycle, FAIR principles of research data management and then move on to key elements that should be considered while preparing datasets for publishing through repository.
This presentation introduces the basics of the Dataverse including preparing the submission to the Dataverse, creating an account and logging in, adding datasets to the Dataverse account, and metadata.
A basic course on Research data management, part 3: sharing your dataLeon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
FAIR Ddata in trustworthy repositories: the basicsOpenAIRE
This video illustrates how certified digital repositories contribute to making and keeping research data findable, accessible, interoperable and reusable (FAIR). Trustworthy repositories support Open Access to data, as well as Restricted Access when necessary, and they offer support for metadata, sustainable and interoperable file formats, and persistent identifiers for future citation. Presented by Marjan Grootveld (DANS, OpenAIRE).
Main references
• Core Trust Seal for trustworthy digital repositories: https://www.coretrustseal.org/
• EUDAT FAIR checklist: https://doi.org/10.5281/zenodo.1065991
• European Commission’s Guidelines on FAIR data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
• FAIR data principles: www.force11.org/group/fairgroup/fairprinciples
• Overview of metadata standards and tools: https://rdamsc.dcc.ac.uk/
"Open Science, Open Data" training for participants of Software Writing Skills for Your Research - Workshop for Proficient, Helmholtz Centre Potsdam - GFZ German Research Centre for Geosciences, Telegrafenberg, December 16, 2015
This is a presentation for the Erwin Hahn Instiutute in Essen, explaining the background, functional design and technical architecture of the Donders Repository. Furthermore, it explains how it aligns with the DCCN project management and with the researchers workflow
Donders Repository - removing barriers for management and sharing of research...Robert Oostenveld
This is the presentation I gave at the monthly meeting of the Donders Institute PhD council. It shortly explains the Donders Repository, but mainly addresses how to deal with direct and indirectly identifying personal data, with anonymization, pseudomimization and de-identification, and with blurring of research data prior to sharing.
Talk at JISC Repositories conference intended for repository managers or research managers on some of the issues involved. Talk had to be originally given unaided because of a technology problem!
The Dutch Approach to Research Data Infrastructurepkdoorn
The Dutch Approach to Research Data Infrastructure
Peter Doorn (DANS), Marc Dupuis (SURF), Maurice Vanderfeesten (SURF)
ANDS Invitational Research Data Infrastructure Workshop, Prato, April 11-13, 2011
Open science curriculum for students, June 2019Dag Endresen
Living Norway seminar on Open Science in Trondheim 12th June 2019.
https://livingnorway.no/2019/04/26/living-norway-seminar-2019/
https://www.gbif.no/events/2019/living-norway-seminar.html
Scholars and researchers are being asked by an increasing number of research sponsors and journals to outline how they will manage and share their research data. This is an introduction to data management and sharing practices with some specific information for Columbia University researchers.
PROOF course Writing articles and abstracts in English, part: Copyright in ac...Leon Osinski
For this presentation students need to have seen 5 web lectures on copyright. During the presentation, the knowledge gained by the students by looking at the web lectures will be tested on the basis of a number of practical questions.
Research data management: course OGO Quantitative research (21-11-2018)Leon Osinski
Slides underlying a presentation on research data management for course OGO Quantitative research (Eindhoven University of Technology, Eindhoven, the Netherlands)
Presentation on the suitability of Creative Commons licenses for research data, held at the meeting of UKB working group Research Data on 18 April 2018 by Leon Osinski, Eindhoven University of Technology
A basic course on Research data management: part 1 - part 4Leon Osinski
Slides belonging to a basic course on research data management. The course consists of 4 parts:
Part 1: what and why
1.1 data management plans
Part 2: protecting and organizing your data
2.1 data safety and data security
2.2 file naming, organizing data (TIER documentation protocol)
Part 3: sharing your data
3.1 via collaboration platforms (during research)
3.2 via data archives (after your research)
Part 4: caring for your data, or making data usable
4.1 tidy data
4.2 documentation/metadata
4.3 licenses
4.4 open data formats
A basic course on Reseach data management, part 2: protecting and organizing ...Leon Osinski
A basic course on research data management for PhD students. The course consists of 4 parts. The course was given at Eindhoven University of Technology (TUe), 24-01-2017
Copyright and citation issues : PROOF course Writing articles and abstracts /...Leon Osinski
As an author of scholarly papers, you will use in your paper materials (text fragments, picture, tables, figures) of other people. In most cases this material is copyright-protected which means that in most cases, not always, you have to ask permission to re-use that material and to attribute the source of the material. This is also the first topic of this lecture: you as a user of copyright-protected material.
In the second place, when you’re done writing you want to publish your paper in a journal. In most cases, not always, this goes with a transfer of the copyright that you initially own to a publisher. Transfer of copyright has some consequences and this is the second topic of this presentation: you as a producer of copyright-protected material.
Onderzoeksdata-bepalingen van financiers van universitair onderzoek in NL: Ma...Leon Osinski
Onderzoeksdata-bepalingen van financiers van universitair onderzoek in NL : presentatie Master Class Research Data Management in Nederland, Maastricht, 3/4 april 2014.
UKB Werkgroep Datamanagement,Voorwaarden van Financiers.
Maarten van Bentum, Henk van den Hoogen, Leon Osinski
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Adjusting OpenMP PageRank : SHORT REPORT / NOTESSubhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
How to make your research data open : presentation held at the VU Open Science Festival, 18 October 2018
1.
Sjef Öllers, Leon Osinski
Eindhoven University of Technology, 4TU.Centre for Research Data
Amsterdam, VU, 18-10-2018
Available under CC BY license
2. Opening your data
during your research versus after your research
During your research via collaboration or sharing
platforms
Data sharing is (more) aimed at collaboration, at
working together on data
Being able to control access to data is crucial
After your research via data archives or repositories
Keeping and, if necessary, publishing an archival
copy of data – a copy that cannot be changed - is
essential
Long term preservation of your data - at least 10
years - is important
3. SURFdrive : Dutch academic Dropbox, 250 Gb, maximum data transfer 16 Gb
Google Drive, Dropbox: don’t use these to store sensitive data
DataverseNL: based on Harvard’s Dataverse Project, in NL maintained by DANS
Open Science Framework: for a manual see
https://pedermisager.netlify.com/post/how-to-share-your-data-with-osf/ and
https://doi.org/10.1177%2F2515245918757689
[===============]
CRCNS : neuroscience
OpenML : machine learning
DCCD : dendrochronological research (tree-rings)
Opening your data
during your research (‘active’ data)
4. 1. On request
“I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share their data…”
Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)
2. On a (personal) website
“Let me start by saying that the reason why I put all excel files online, including all the detailed excel formulas about data
constructions and adjustments, is precisely because I want to promote an open and transparent debate about these important and
sensitive measurement issues.”
Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014 ; originally published as Addendum: Response to
FT, 28-05-2014
3. A data journal
Journal of open psychology data, Geoscience data journal, Data in brief, Scientific data
Opening your data
after your research (static data, ‘frozen’ datasets, ‘milestone’ datasets)
5. 4. Choose a repository where other researchers in your discipline are sharing their
data, for example TurBase (turbulence data), Lxcat (plasma data), OpenNeuro
(neuroimaging data), KNB (ecological, environmental data), Pangaea (earth
sciences), Dryad (biology, life sciences), ICPSR (social and behavioral sciences)
If not available, use an established multidisciplinary or general repository that at
least assigns a persistent identifier to your data (DOI) and requires that you
provide adequate metadata
Zenodo, Figshare, DANS, … ;
4TU.Centre for Research Data
+ DOI’s [persistent identifier for citability and retrievability]
+ open access
+ long-term availability, Data Seal of Approval
+ self-upload (single data sets < 3Gb)
Opening your data
after your research: via an archive or repository
6. FAIR data
Findable: easy to find by both humans and computer systems;
Accessible: stored for long term with well-defined license and access
conditions (open access when possible)*;
Interoperable: ready to be combined with other datasets by humans
as well as computer systems;
Reusable: ready to be used for future research and to be processed
further using computational
methods.
*”FAIR is not equal to Open: The ‘A’ in FAIR stands for ‘Accessible
under well defined conditions’. There may be legitimate reasons to
shield data and services generated with public funding from public
access.”
7. 4TU.Centre for Research Data
FAIR data
With 4TU.Centre for Research Data data is made FAIR
Data are assigned a DOI findable
Data can be linked to publications (DOI reservation is possible) findable
Data are assigned descriptive/discovery metadata findable, interoperable
Data are assigned a user license (of your choice) re-useable
Data are open access (restricted access options are being developed)
accessible
Data are archived/preserved for the long term accessible
Metadata can be harvested by Google (Data Sets), etc. findable
9. Wrap up
Opening your data is more than just putting your data
on the internet
Make your data FAIR by saving it in an established
disciplinary or general data archive