Presentation at the “Open Science: connecting the actors” event on the 21st of November 2022:
Share best practices, foster community, and encourage knowledge-sharing on Open Science.
At the heart of the Open Access Belgium community is the ambition to open up the way we organize and conduct scientific research.
The Open Science teams of the Belgian universities have developed and tested a wide range of training methods, training materials, networking activities
and data solutions to facilitate and foster Open Science. Achievements, tools and lessons learned by different institutions will be shared in this networking event.
Programme can be found here: https://openaccess.be/2022/10/04/open-science-connecting-the-actors/
Integrating research indicators for use in the repositories infrastructure (Petr Knoth)
The current repository infrastructure, which consists of thousands of repositories, does not make effective use of research indicators, which are heavily exploited by commercial players in the area. Research indicators, including citation counts and Mendeley reader counts, enable the development and improvement of functionality researchers use on a daily basis. For example, they make it possible to improve performance in information retrieval and recommendation tasks, and they serve as an enabler for research analytics and metrics functionality, such as the analysis of research trends or collaboration networks. We believe there is a strong case for making better use of these indicators within the repositories infrastructure to improve the functionality of services users rely on.
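One way an indicator such as a citation count can improve retrieval, as described above, is by re-ranking text-matched results. The following sketch is purely illustrative (the records, field names, and the log-damped boost formula are assumptions, not any repository's actual ranking method):

```python
# Hypothetical sketch: re-ranking repository search results with a
# citation-count indicator. The log-damped boost keeps very highly
# cited papers from completely dominating text relevance.
import math

def boosted_score(text_score, citations, weight=0.3):
    """Combine a text-relevance score with a log-damped citation boost."""
    return text_score * (1 + weight * math.log1p(citations))

results = [
    {"title": "Paper A", "text_score": 0.82, "citations": 4},
    {"title": "Paper B", "text_score": 0.78, "citations": 350},
    {"title": "Paper C", "text_score": 0.90, "citations": 0},
]

ranked = sorted(results,
                key=lambda r: boosted_score(r["text_score"], r["citations"]),
                reverse=True)
for r in ranked:
    print(r["title"])  # Paper B rises above the higher text-score Paper C
```

The `log1p` damping is a common design choice for popularity signals, since raw counts span several orders of magnitude.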
DATA SCIENCE AND BIG DATA ANALYTICS (randyburney60861)
CHAPTER 2: DATA ANALYTICS LIFECYCLE
DATA ANALYTICS LIFECYCLE
• Data science projects differ from BI projects
• More exploratory in nature
• Critical to have a project process
• Participants should be thorough and rigorous
• Break large projects into smaller pieces
• Spend time to plan and scope the work
• Documenting adds rigor and credibility
DATA ANALYTICS LIFECYCLE
• Data Analytics Lifecycle Overview
• Phase 1: Discovery
• Phase 2: Data Preparation
• Phase 3: Model Planning
• Phase 4: Model Building
• Phase 5: Communicate Results
• Phase 6: Operationalize
• Case Study: GINA
2.1 DATA ANALYTICS LIFECYCLE OVERVIEW
• The data analytics lifecycle is designed for Big Data problems and data science projects
• With six phases, project work can occur in several phases simultaneously
• The cycle is iterative, reflecting how real projects proceed
• Work can return to earlier phases as new information is uncovered
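The iterative movement between phases described above can be sketched as a simple state model. The phase names follow the chapter outline; the transition rule (advance one phase, or fall back to any earlier phase) is an illustrative simplification:

```python
# Illustrative sketch of the six-phase lifecycle. The rule that work may
# advance one phase at a time or return to any earlier phase is a
# simplification of the iteration the chapter describes.
PHASES = [
    "Discovery", "Data Preparation", "Model Planning",
    "Model Building", "Communicate Results", "Operationalize",
]

def can_move(current, target):
    """Allow advancing to the next phase or returning to any earlier one."""
    i, j = PHASES.index(current), PHASES.index(target)
    return j == i + 1 or j < i

print(can_move("Model Building", "Data Preparation"))  # backtracking: True
print(can_move("Discovery", "Model Building"))         # skipping ahead: False
```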
2.1.1 KEY ROLES FOR A SUCCESSFUL ANALYTICS PROJECT
• Business User – understands the domain area
• Project Sponsor – provides requirements
• Project Manager – ensures objectives are met
• Business Intelligence Analyst – provides business domain expertise based on a deep understanding of the data
• Database Administrator (DBA) – creates the database environment
• Data Engineer – provides technical skills, assists with data management and extraction, supports the analytic sandbox
• Data Scientist – provides analytic techniques and modeling
2.1.2 BACKGROUND AND OVERVIEW OF DATA ANALYTICS LIFECYCLE
• The Data Analytics Lifecycle defines the analytics process and best practices from discovery to project completion
• The Lifecycle draws on aspects of:
• The scientific method
• The Cross Industry Standard Process for Data Mining (CRISP-DM), a process model for data mining
• Davenport’s DELTA framework
• Hubbard’s Applied Information Economics (AIE) approach
• “MAD Skills: New Analysis Practices for Big Data” by Cohen et al.
https://en.wikipedia.org/wiki/Scientific_method
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
http://www.informationweek.com/software/information-management/analytics-at-work-qanda-with-tom-davenport/d/d-id/1085869?
https://en.wikipedia.org/wiki/Applied_information_economics
https://pafnuty.wordpress.com/2013/03/15/reading-log-mad-skills-new-analysis-practices-for-big-data-cohen/
OVERVIEW OF DATA ANALYTICS LIFECYCLE
2.2 PHASE 1: DISCOVERY
1. Learning the Business Domain
2. Resources
3. Framing the Problem
4. Identifying Key Stakeholders
5. Interviewing the Analytics Sponsor
6. Developing Initial Hypotheses
7. Identifying Potential Data Sources
2.3 PHASE 2: DATA PREPARATION
• Includes steps to explore, preprocess, and condition data
• Create a robust environment – the analytics sandbox
• Data preparation tends to be the most labor-intensive phase of the lifecycle
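A minimal sketch of the "explore, preprocess, and condition" step mentioned above. The records and cleaning rules (trim whitespace, drop incomplete rows, coerce types) are invented for illustration:

```python
# Illustrative data-conditioning sketch: the raw records and the
# specific cleaning rules are assumptions, not from the chapter.
raw = [
    {"id": "1", "age": " 34 "},
    {"id": "2", "age": ""},     # incomplete record
    {"id": "3", "age": "28"},
]

def condition(rows):
    """Trim whitespace, drop rows missing a value, coerce to integers."""
    clean = []
    for row in rows:
        age = row["age"].strip()
        if not age:              # drop incomplete records
            continue
        clean.append({"id": int(row["id"]), "age": int(age)})
    return clean

print(condition(raw))  # [{'id': 1, 'age': 34}, {'id': 3, 'age': 28}]
```

Even this toy example shows why the phase is labor-intensive: every field needs its own validation and coercion rules.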
Presentation to IASSIST 2013, in the session Expanding Scholarship: Research Journals and Data Linkages. Describes PREPARDE workshop on repository accreditation for data publication and invites comments on guidelines.
Member privacy is of paramount importance to LinkedIn. The company must protect the sensitive data users provide. On the other hand, our members join LinkedIn to find each other, necessitating the sharing of certain data. This privacy paradox can only be addressed by giving users control over where and how their data is used. While this approach is extremely important, it also presents scaling challenges.
In this talk, we will discuss the challenges behind enforcing compliance at scale as well as LinkedIn's solution. Our comprehensive record-level offline compliance framework includes schema metadata tracking, alternate read-time views of the same dataset, physical purging of data on HDFS, and features for users to define custom filtering rules using SQL, assigning such customizations to specific datasets, groups of datasets, or use cases. We achieve this using many open-source projects like Hadoop, Hive, Gobblin, and Wherehows, as well as a homegrown data access layer called Dali. We also show how the same Hadoop-powered framework can be used for enforcing compliance on other stores like Pinot, Salesforce, and Espresso.
While there is no one-size-fits-all solution to guaranteeing user data privacy, this talk will provide a blueprint and concrete example of how to enforce compliance at scale, which we hope proves useful to organizations working to improve their privacy commitments.
ISSAC BUENROSTRO, Staff Software Engineer, LinkedIn, and ANTHONY HSU, Staff Software Engineer, LinkedIn
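The "custom filtering rules using SQL" idea from the talk can be illustrated in miniature. Everything below (the schema, the consent column, the rule) is invented for this sketch; LinkedIn's actual Dali/Gobblin-based framework is far more involved:

```python
import sqlite3

# Toy illustration of record-level compliance filtering with a
# user-defined SQL predicate. Schema and rule are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (member_id INTEGER, consented INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 1)])

# A compliance rule expressed as a SQL predicate, applied at read time
# to produce an alternate filtered view of the same dataset.
rule = "consented = 1"
visible = conn.execute(f"SELECT member_id FROM events WHERE {rule}").fetchall()
print(visible)  # only consenting members remain

# Physical purge of non-compliant records, mirroring purging on HDFS.
conn.execute(f"DELETE FROM events WHERE NOT ({rule})")
```

The key design point is that the same predicate drives both the read-time view and the physical purge, so the two cannot drift apart.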
Are you interested in offering data management services at your library but aren’t sure where to start? Then this class is for you! During this session, we will:
• Outline the data management topics that are commonly offered in libraries
• Present strategies for how to determine what services might be most useful on your campus and create synergistic partnerships with other university entities
• Dive into how to offer support with data management plans
• Present a case study for using an institutional repository to archive and share research data
• Identify additional training opportunities and open educational resources you can use to develop robust DM services
The class will consist of a mix of presentations, hands-on activities, and discussion, so come ready to participate!
2013 DataCite Summer Meeting - DOIs and Supercomputing (Terry Jones, Oak Rid...) (datacite)
2013 DataCite Summer Meeting - Making Research Better
DataCite. Co-sponsored by CODATA.
Thursday, 19 September 2013 at 13:00 - Friday, 20 September 2013 at 12:30
Washington, DC. National Academy of Sciences
http://datacite.eventbrite.co.uk/
Presentation given at the Indiana University School of Medicine's Ruth Lilly Medical Library. Contains information and resources specific to Indiana University Purdue University Indianapolis (IUPUI). For full class materials, see LYD17_IUPUIWorkshop folder here: https://osf.io/r8tht/.
Lawrence-f1000-publishing with data-nfdp13 (DataDryad)
Presentation by Rebecca Lawrence on F1000's initiatives for publishing with data given at the Now and Future of Data Publishing Symposium, 22 May 2013, Oxford, UK
RDM Roadmap to the Future, or: Lords and Ladies of the Data (Robin Rice)
Story of the new 2017-2020 University of Edinburgh RDM Roadmap, with a Tolkienesque theme for IASSIST-CARTO 2018 in Montreal: "Once upon a data point: sustaining our data storytellers".
Curoverse Presentation at ICG-11 (November 2016) (Arvados)
Huge genomic datasets are being created all around the world, and their scale is accelerating. But these data gain greater meaning when analyzed in concert with other datasets stored in institutions around the world. Due to data residency restrictions, regulatory barriers, and sheer data volume, it is impossible to effectively centralize all of these data in one place. In order to achieve regional and global use of many data sets in concert, we must overcome these challenges with a new approach to managing, analyzing and sharing sequencing data: Federated Computing.
Federated Computing is difficult from a technical perspective because of the variety of IT infrastructures and workflow engines available, which makes reproducibility across environments nearly impossible, and from a practical perspective because of privacy and competitive concerns among researchers. Federated Computing becomes easier with a scalable, open source, multi-platform, standards-based biomedical big data computing platform that can be deployed in public cloud, private cloud, and HPC environments, and enables bit-for-bit reproducibility of analyses across every deployment.
We present Arvados (http://arvados.org), a free and open source platform for managing and processing biomedical data designed for scale, reproducibility, and federation. Workflows and queries can travel across multiple Arvados clusters, running exactly the same way on each one, regardless of the underlying compute & storage infrastructure.
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT (Tony Ross-Hellauer)
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
Research Data Management: An Introductory Webinar from OpenAIRE and EUDAT (OpenAIRE)
A FAIR Approach to Publishing and Sharing Machine Learning Models (Ben Blaiszik)
While there has been a significant increase in the amount of machine learning research across various domains of science, the processes for publishing the results and making the resulting models and code available for reuse have been lacking. In this talk, we discuss FAIR data principles applied to machine learning models and how the Data and Learning Hub for Science (DLHub) can help make models more easily discoverable and usable in common scientific workflows. Visit https://www.dlhub.org for more information.
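As a rough illustration of what FAIR principles mean for a published model, an entry needs at minimum a persistent identifier, rich descriptive metadata, a clear license, and a machine-readable invocation recipe. The field names below are assumptions for this sketch, not DLHub's actual schema:

```python
import json

# Illustrative FAIR-style metadata record for a published ML model.
# All field names and values are invented for this sketch.
model_record = {
    "identifier": "hdl:example/model-001",       # Findable: persistent ID
    "title": "Example property-prediction model",
    "authors": ["A. Researcher"],
    "license": "CC-BY-4.0",                      # Reusable: explicit license
    "access_url": "https://example.org/models/model-001",  # Accessible
    "input_schema": {"type": "array", "items": "float"},   # Interoperable
    "dependencies": {"python": "3.10", "scikit-learn": "1.3"},
}

# Serializing to JSON keeps the record machine-readable for indexing.
print(json.dumps(model_record, indent=2))
```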
The webinar was held on 6 October 2020.
The webinar is relevant for new and existing Crossref members, publishers, editors, researchers, service providers, hosting platforms, funders, and librarians; really, anyone interested in finding out a bit more about what Crossref is and does.
This webinar covers:
• How to register content with Crossref
• How to make updates to your metadata in order to make changes, corrections, or to add more detail
• Participation reports
• Additional services and where to find help.
Sessions presented in English by Crossref staff.
February 18, 2015 NISO Virtual Conference - Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Learning to Curate Research Data
Jennifer Doty, Research Data Librarian, Emory Center for Digital Scholarship, Emory University, Robert W. Woodruff Library
More information on the community of practice: https://www.openaire.eu/cop-training
20221121_KU Leuven Research Data Repository_OpenScienceBelgium.pptx (OpenAccessBelgium)
The OpenAIRE Research Graph is a massive collection of metadata and links connecting research entities such as articles, datasets, software, and other research outputs.
Openaccess.be is the central information space for Open Science in Belgium. Open Access Belgium is a collaboration between the Open Science teams of the Belgian universities. Apart from keeping this webpage up to date and writing blogposts about Open Science in Belgium, we also organise a yearly event during Open Access Week for the Belgian Open Science community.
The OpenAIRE project, in the vanguard of the open access and open data movements in Europe was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013.
In support of its research programme CERN has developed tools for Big Data management and extended Digital Library capabilities for Open Data. Through Zenodo these Big Science tools could be effectively shared with the long-tail of research.
To address problems with the peer-review process, many journals have experimented with different types of peer-review models. Open peer review was adopted by several journals in order to encourage transparency in the process, and there are now a number of different ways in which this is implemented. By Axel Cleeremans (ULB), Chief Editor for Frontiers in Psychology, and Louisa Flintoft, Executive Editor, BMC In-House Journals.
• Introduction by Emilie Menz
This section provides an overview of the Open Science requirements stipulated by the F.N.R.S. and how to comply with them. Presented by Sandrine Brognaux (UMons).
The FAIR principles and Open Data explained by Myriam Mertens (UGent) as an introduction to the webinar on FAIR data and research data management: https://www.youtube.com/watch?v=TEnq2P0r4mo
2. Data Curation Requires Datasets
• Data curation: adding value to (meta)data for long-term preservation
• Imagined (ideal) workflow:
1. Researcher provides data to curator for curation
• Voluntary submission
• Automatic part of ingest in institutional repository
2. Curator makes changes and recommendations
3. Data is put online for long-term preservation
• Is that realistic for many institutions?
4. Proposed Workflow
1. Find datasets online
• Employ existing data linking architectures
• Use repository APIs
2. Produce (meta)data augmentation plan for discovered datasets
• Develop plan based on current best practices for FAIR metadata
• Recommend changes that maintain existing DOI networks
3. Provide researchers with an easily actionable curation plan
5. Step 1: Where Are The Datasets?
• Difficulties:
• Datasets are broadly distributed
• Affiliation information is not located in a consistent location (or format!)
• Existing data linking systems (e.g., Scholix, DataCite) have limited coverage
• Solution:
• Use repository APIs to search for institutional datasets
• Search outside of just <creator><affiliation> field
6. Example Python Code
• Python code to search for institutional records
• searchQuery can include multiple items
• Universiteit Gent
• UGent
• Ghent University
• 00cv9y106 (ROR id)
• Saves DOIs of all datasets to a CSV file
• Can use OAI-PMH to extract more metadata information
• Focused on several popular repositories, easily extended
• Zenodo, OSF, Dryad, Figshare, PANGAEA
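The slide describes the script only at a high level, so here is a minimal sketch of what such a search could look like, using Zenodo's public records API as one example. The endpoint and the `hits`/`doi` response shape come from Zenodo's REST API; the search terms, function names, and output file name are illustrative assumptions, and real code would also need result paging and adapters for the other repositories' APIs.

```python
import csv
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Name variants under which the institution may appear in metadata
# (hypothetical list: full name, abbreviation, English name, ROR id).
SEARCH_TERMS = ["Universiteit Gent", "UGent", "Ghent University", "00cv9y106"]

def build_query(terms):
    """Combine all name variants into a single OR query string."""
    return " OR ".join(f'"{t}"' for t in terms)

def search_zenodo(terms, page_size=100):
    """Query Zenodo's public records API and return the list of hits."""
    params = urlencode({"q": build_query(terms), "size": page_size})
    with urlopen(f"https://zenodo.org/api/records?{params}") as resp:
        return json.load(resp)["hits"]["hits"]

def extract_dois(records):
    """Pull the DOI out of each record dict, skipping records without one."""
    return [r["doi"] for r in records if "doi" in r]

def save_dois(dois, path="institutional_datasets.csv"):
    """Write one DOI per row so the list can be ingested elsewhere."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["doi"])
        writer.writerows([d] for d in dois)

# Usage: save_dois(extract_dois(search_zenodo(SEARCH_TERMS)))
```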
7. Step 2: What To Do With What You’ve Found
• Repositories often allow metadata fields to be edited
• WITHOUT triggering the creation of a new version (and therefore a new DOI)
• Editable fields vary by repository:
• STRICT: editing any metadata fields creates a new version
• LENIENT: most fields can be edited (title, authors, relatedTo)
8. Develop Recommendation Plan
• Is the title clear?
• Are keywords provided?
• Are there links to related publications?
• Do the authors have linked ORCIDs or affiliations?
• Is there sufficient documentation in a README?
• Can this information be provided in <description descriptionType="Abstract">?
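A checklist like this can be sketched as a simple function over a metadata record. The field names below (title, keywords, related_identifiers, creators, description) are hypothetical placeholders and would need to be mapped onto each repository's actual schema:

```python
def recommend(metadata):
    """Return a list of recommendation strings for one dataset record.

    `metadata` is a plain dict with assumed keys; adapt the key names
    to the real schema of the repository being checked.
    """
    recs = []
    if not metadata.get("title"):
        recs.append("Add a clear, descriptive title.")
    if not metadata.get("keywords"):
        recs.append("Add keywords, at least those of the related publication.")
    if not metadata.get("related_identifiers"):
        recs.append("Link the dataset to its related publication(s).")
    creators = metadata.get("creators", [])
    if not any(c.get("orcid") or c.get("affiliation") for c in creators):
        recs.append("Add ORCIDs and/or affiliations for the authors.")
    if not metadata.get("description"):
        recs.append("Provide README-style documentation, e.g. in "
                    '<description descriptionType="Abstract">.')
    return recs
```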
9. Step 3: Communicating the Recommendations
• Implementation relies on participation of the researcher
• Curation plan must be easily actionable with clearly articulated benefits
• Reduce burden on researcher to interpret instructions
Metadata Field | Current Value | Recommended Changes | Rationale
Title | | |
Abstract | | |
… | | |
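The action plan could be generated programmatically once recommendations exist. Below is a minimal plain-text formatter; the column names are taken from the slide, while the function name and row shape are assumptions:

```python
def format_plan(rows):
    """Render rows as a plain-text table for the researcher email.

    Each row is a (field, current value, recommended change, rationale)
    tuple of strings, matching the columns on the slide.
    """
    header = ("Metadata Field", "Current Value", "Recommended Changes", "Rationale")
    table = [header] + list(rows)
    # Pad each column to the width of its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(4)]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        for row in table
    )
```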
10. Current results
• Currently, the code harvests >2000 total records
• Frequently encountered issues:
• Abstracts redundant with publication
• No direct contact information
• Missing keywords
Source | Number of Records Found
DataCite | 236
Dryad | 196
Figshare | 302
OSF | 186
PANGAEA | 724
Zenodo | 710
11. Conclusions
• Relatively simple method to provide value to existing datasets
• Benefits even if author declines to make recommended edits:
• Helps institution find their research outputs
• Provides researchers with FAIRness recommendations that they can implement for future datasets
• Communicates the existence (and utility!) of data support staff
Thank you. Today we are going to talk about post-ingest curation, and what we at UGent have been considering to curate research data despite not having an institutional repository.
As I’m sure you are all aware, research institutions are becoming increasingly aware of the importance and utility of data curation, which, for our purposes, we can broadly define as any activity that adds value to data or metadata prior to its long-term preservation in a data repository. It is usually conceptualized as an ideal workflow, wherein the researcher provides their data to a curator for curation. This can either be a voluntary submission, as envisioned here in this diagram from the Data Curation Network, with the researcher actively seeking out and requesting the assistance of a curator, or it can be automatic, such as when a researcher deposits their data in an institutional repository and that institution’s curators can immediately begin to work on the data, making the curation a necessary part of the data’s path towards preservation. Regardless, the curator is then able to make changes and recommendations, ideally through some kind of back and forth dialogue with the researcher, before finally the curated data is put online for long-term preservation. Therefore, in this conception, curators always get to curate the data BEFORE it is published online. What we asked in this project is how realistic that workflow is for most researchers and institutions, and whether an alternate model might be necessary for cases in which such curation is not so automatic.
Looking outside the ideal, we turned our focus on what often actually happens to datasets in the research data lifecycle. The scientist completes some research project, generating a manuscript for publication and some associated data. They submit their manuscript to a journal and (as it approaches acceptance) have to publish their data online. Even if the researcher knows about curation services available to them at their institution, they might not feel that they have time to go through rounds of curation as they need their datasets online NOW, and so they circumvent the data curators and deposit the datasets directly into the general or domain-specific repository of their choice. They annotate the dataset with metadata according to their own understanding of best practices and what little time they have available to dedicate to documentation, and the data then sits online in the repository without ever having the opportunity to have value added to it by data curation specialists.
Note that, while this workflow is possible for researchers from any institution, it’s especially likely for institutions that don’t have their own institutional repository, as the data curators will never have datasets automatically pass through their desks on the way to the institutional repository.
What we aimed to do in this project is to define an alternate workflow for curators, wherein they can go out and find these datasets where they are posted online. Then, once they have knowledge of these datasets that are associated with their institution, they can develop individualized recommendation plans for the creators of those datasets, with the hope that the researcher implements those changes, thereby improving the FAIRness of those datasets.
Our proposed workflow comes in three steps:
First, we find the datasets that have already been posted online. To do so, we first looked at existing data linking architectures, such as Scholix, but ultimately ended up relying on repository APIs for the most popular repositories for researchers from our institution.
Then, once we’ve found the datasets through these various methods, we can develop augmentation plans for these uncovered datasets. This is because many popular repositories actually allow users to edit the metadata of their published datasets without triggering the generation of a new version, therefore preserving existing DOI link networks. So, it should not be considered “too late” to curate a dataset just because it has already been hosted online. There are still things that can be done to improve its FAIRness.
Finally, once we’ve developed a set of recommendations for a given dataset in an online repository, the last step is to create an action plan that can be communicated to the researcher, providing them with an easily actionable way to improve the FAIRness of their own datasets.
So, the first step is to find the datasets online. This is more difficult than it sounds, because datasets are broadly distributed across many different repositories. To make matters worse, the affiliation information is not consistent, in location or format. Some records have the affiliation information associated with the creators. Some use the name of the institution written out in full, whereas others use the ROR, a specific id for institutions. For these and other complicated reasons, existing data linking systems end up missing a lot of the datasets that are out there. This can be easily verified… If you compare the results from using these services to just going onto one of these repository pages and entering the name of your institution into the search bar, you’ll find many records that these systems fail to pick up.
Our solution then is to use the APIs to find as many additional institutional datasets as possible, and wherever possible, by searching outside of the CREATOR:AFFILIATION field.
We’ve written some Python code which harnesses the APIs of popular repositories to search for institutional datasets. Importantly for us, and probably for many institutions in Belgium, our institution is known by many names, all of which we see authors freely use when tagging their datasets. As currently implemented, the code saves the DOIs of all the datasets it finds to a CSV file, because that is what is most important to ingest into the systems that we use, but you could easily use OAI-PMH or alternate systems to extract more metadata.
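For the OAI-PMH option mentioned here, the request format is fixed by the protocol, so a small sketch is possible. The `GetRecord` verb, `metadataPrefix` parameter, and Dublin Core namespace are standard OAI-PMH; the helper names and the Zenodo endpoint in the usage comment are assumptions:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Dublin Core element namespace used in oai_dc responses.
DC = "{http://purl.org/dc/elements/1.1/}"

def get_record_url(base_url, identifier, prefix="oai_dc"):
    """Build a standard OAI-PMH GetRecord request URL."""
    return base_url + "?" + urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
    )

def dc_titles(response_xml):
    """Extract all dc:title values from an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(DC + "title")]

# Usage (endpoint assumed): get_record_url("https://zenodo.org/oai2d",
#                                          "oai:zenodo.org:1234567")
```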
Lastly, we focused on the main repositories which are used by researchers from Ghent University, but this could easily be extended to focus on other repositories, insofar as they have APIs to plug into.
Once the dataset records have been located, the next step is to figure out what to do with what you’ve found. The first part of that is determining what you can edit without triggering the creation of a new version (and therefore a new DOI). Even though these new DOIs are typically linked to the DOIs of the older versions, our thought was that it is best to avoid these potential issues. Different repositories vary with respect to which metadata fields are editable without triggering a new version, from very strict repositories (like Dryad), which allow essentially no editing, to very lenient repositories like Zenodo, for which you can edit almost anything, including the title, abstract, and authors.
Once you’ve decided which metadata fields are in principle editable, you can then develop an individualized recommendation plan for that record. What exact recommendations you provide will depend on your institution’s priorities, current best practices, but we’ve collected here a few of the major items that could be included in such a plan: is the title clear? Are there keywords? Has it been linked to a publication? Are there ORCIDs linked? Have the authors provided something like an ROR? Did they provide a detailed README, and if not, could that information be provided in the abstract field?
The last step is to communicate the recommendation plan to the researcher. Because the actual implementation of the plan relies on the participation of the researcher, steps should be taken to maximize the likelihood that they cooperate. For this, we envision a clearly articulated plan like in the table shown here, which outlines the metadata field in question, what that value currently contains, what the curators believe should be changed for that field, and their rationale. Anything that can reduce the burden on the researcher and lets them clearly see the reasoning and benefit behind the recommendations.
This is all very interesting of course, but as the saying goes, ‘the proof of the pudding is in the eating’, so here are some of our results. By running the code we gathered over 2000 dataset records from five major repositories and DataCite. We analysed a subset of these records to get some idea of the issues we will encounter in the future. A first issue is the redundancy of abstracts: most datasets have the same abstract as their corresponding publication. This isn’t necessarily a big issue when a datasheet or README is provided for the dataset, but when the abstract is the only information on the content of the dataset, there might not be enough information for researchers to reuse the data. A possible solution to this was mentioned a few slides ago: a README could be provided in the abstract metadata field.
A second issue is the absence of contact information in the dataset metadata. In most cases the contact information is found via the linked publication, so contact information can be found but this is not a good practice, we want to encourage researchers to provide contact information in their dataset metadata as well.
A third issue concerns keywords as in: there are no keywords provided. The dataset should at least have some of the related publication’s keywords and ideally have its own specific keywords to improve FAIRness.
This is the basis of our proposal to provide curatorial benefits after a dataset has already been uploaded to a repository. Of course, we don’t want to suggest that this solves all problems, and it will not find ALL datasets, but it still has several benefits. Even if the author declines to make the recommended edits, the integration with repository APIs helps institutions find their research outputs. And the emails to the researchers, which include the detailed plan of how to improve the FAIRness of their datasets, provide the researchers with knowledge that they can carry with them in the future, and work as a way to let them know of the curatorial services that your institution might offer, and how those services can help improve their online data.
If you would like more information, please contact us. Thank you for your attention!