Using electronic laboratory notebooks in the academic life sciences: a group ... - SC CTSI at USC and CHLA
This document summarizes a webinar on using electronic laboratory notebooks (eLNs). The webinar featured a presentation by Dr. Ulrich Dirnagl on his experience using eLNs to make research teams more efficient. He believes paper notebooks are outdated and that eLNs can help address the reproducibility crisis in research. The webinar covered the benefits of eLNs like collaboration, data sharing, and compliance with regulations. It also reviewed different types of eLNs and pricing models. While implementation challenges exist, eLNs were found to improve oversight, record keeping, and transparency if selected and supported properly.
The document provides an overview of the Donders Repository, which aims to securely store original research data, document the research process, and make data accessible to researchers and the public. It describes the procedural design including different roles, collection types, and states. The technical architecture is based on IRODS software and scalable storage. The repository fits into researchers' workflows and supports the timeline of projects from initiation to data sharing. Standards like BIDS help make neuroimaging data FAIR (Findable, Accessible, Interoperable, Reusable).
Using Open Science to advance science - advancing open data - Robert Oostenveld
This document discusses using open science practices like open data to advance science. It notes the benefits of open data like improved reproducibility and opportunities for data mining. However, sharing neuroimaging and other human subject data presents challenges regarding data size, sensitivity, and privacy regulations. The document promotes using the Brain Imaging Data Structure (BIDS) format to organize data in an open, standardized way. It also discusses the gradient between personal/identifiable data that requires protection and de-identified research data that can be shared, as well as legal constraints and appropriate repositories for sharing data responsibly.
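To make the BIDS idea concrete, here is a minimal sketch of the kind of standardized file naming BIDS prescribes. The naming pattern (subject/session folders, entity-based file names) follows the public BIDS specification; the specific subjects and task names below are invented examples, and real datasets would be built with dedicated BIDS tooling rather than a helper like this.

```python
# Sketch of BIDS-style file naming: subject and session folders, with
# file names built from underscore-separated entities plus a suffix.
# Example subjects/tasks are invented; the pattern follows the BIDS spec.

def bids_path(sub, modality, suffix, ses=None, ext=".nii.gz"):
    """Build a BIDS-style relative path like sub-01/anat/sub-01_T1w.nii.gz."""
    folders = [f"sub-{sub}"]
    entities = [f"sub-{sub}"]
    if ses:
        folders.append(f"ses-{ses}")
        entities.append(f"ses-{ses}")
    folders.append(modality)
    filename = "_".join(entities + [suffix]) + ext
    return "/".join(folders + [filename])

print(bids_path("01", "anat", "T1w"))
# sub-01/anat/sub-01_T1w.nii.gz
print(bids_path("02", "func", "task-rest_bold", ses="pre"))
# sub-02/ses-pre/func/sub-02_ses-pre_task-rest_bold.nii.gz
```

Because every lab that follows the convention produces the same layout, generic tools can locate and interpret the data without dataset-specific glue code, which is what makes the shared data Findable and Interoperable in practice.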
Donders Repository - removing barriers for management and sharing of research... - Robert Oostenveld
This is the presentation I gave at the monthly meeting of the Donders Institute PhD council. It briefly explains the Donders Repository, but mainly addresses how to deal with directly and indirectly identifying personal data; with anonymization, pseudonymization, and de-identification; and with blurring of research data prior to sharing.
This document discusses provenance and research objects. It introduces key concepts from the PROV model including entities, activities, and agents. It explains how research objects can bundle digital resources from a scientific experiment along with provenance and context. Finally, it provides an example of capturing provenance from workflow runs using the Common Workflow Language and storing it in a research object bundle.
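The core PROV vocabulary described above can be illustrated with a minimal sketch. The relation names (wasGeneratedBy, used, wasAssociatedWith, wasDerivedFrom) are the real PROV-DM terms; the entities, activity, and agent are invented examples, and real systems would serialize this as PROV-O or PROV-N rather than plain Python records.

```python
# Minimal sketch of the PROV model's core terms: entities (things),
# activities (processes), agents (people/software), linked by PROV
# relations. Example names are invented; relation names are from PROV-DM.

prov = {
    "entities":   ["raw_data.csv", "clean_data.csv"],
    "activities": ["cleaning_run"],
    "agents":     ["alice"],
    "relations": [
        ("clean_data.csv", "wasGeneratedBy",    "cleaning_run"),
        ("cleaning_run",   "used",              "raw_data.csv"),
        ("cleaning_run",   "wasAssociatedWith", "alice"),
        ("clean_data.csv", "wasDerivedFrom",    "raw_data.csv"),
    ],
}

def generated_by(entity):
    """Trace which activity produced a given entity."""
    return [obj for subj, pred, obj in prov["relations"]
            if subj == entity and pred == "wasGeneratedBy"]

print(generated_by("clean_data.csv"))  # ['cleaning_run']
```

A research object bundle would carry a record like this alongside the data and code it describes, so a later reader can answer "where did this file come from?" by walking the relations.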
This is a keynote I gave at the polyweb workshop on the state of the art of data science reproducibility. In the first part I review tools that have been developed over the last few years; in the second part I focus on proposals I have been involved in to facilitate workflow reproducibility and preservation.
This short document appears to be a digital file with metadata about its production and distribution. It includes the title "JenH2k", copyright and producer information, confirms it was produced by "GOLDENTEAM", and provides a URL and view count possibly related to its online distribution.
Dengue hemorrhagic fever by Dr. Muhammad Tauseef Javed - Tauseef Jawaid
This document discusses dengue hemorrhagic fever (DHF), a severe mosquito-borne viral illness characterized by increased vascular permeability, hypovolemia, and abnormal blood clotting. It is caused by any of four dengue virus serotypes and transmitted by the Aedes aegypti mosquito. DHF symptoms include high fever, bleeding, and plasma leakage that can lead to shock. Treatment focuses on fluid replacement and management of bleeding and shock. Prevention emphasizes eliminating mosquito breeding sites and using protective measures against mosquito bites.
This document provides instructions for accessing and using the Value Line Research Center database. It begins with four steps to access the database through the library's website. It then provides background information on Value Line's mission and the types of investment information and analysis it offers. The document explains that the library subscribes to several Value Line publications available through the database. It concludes by demonstrating how to navigate to the publications and search through company reports, which have the same layout as the print versions.
Scholastic photojournalists and the publication of graphic, spot news images - Bradley Wilson
A case study examination of how scholastic photojournalists compare with their advisers and professional photojournalists regarding the publication of various images from the Boston Marathon bombing. This presentation also shows how the case study approach and the use of current events can be incorporated into the classroom.
NEOBR Board of Realtors Monthly meeting for February 2013. The topic is ZipForm training conducted by Jeff Savage, CRS, e-PRO, SRES with RE/MAX Grand Lake in Grove, Oklahoma.
The document describes the scheduling flow of jobs in a pipelined system. It shows the scheduling of 5 jobs (JOB0 to JOB4) based on certain conditions. The jobs have multiple steps that need to complete with specific condition codes for the next job to be scheduled. For example, JOB1 can only be scheduled after STEP01 of JOB0 completes with a condition code of 0. The scheduling aims to keep the pipeline full by scheduling the next job as soon as the conditions allow.
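The condition-code-driven scheduling described above can be sketched in a few lines. This is a hedged illustration, not the actual system's logic: the job and step names follow the document's example (JOB1 becomes eligible once STEP01 of JOB0 ends with condition code 0), but the data structures are invented.

```python
# Sketch of dependency scheduling on condition codes: a successor job is
# eligible only after the named step of its predecessor completes with
# the required code. Job/step names follow the example in the text.

# (job, step) -> condition code recorded when that step completed
completed = {}

# successor job -> (predecessor job, step, required condition code)
dependencies = {
    "JOB1": ("JOB0", "STEP01", 0),
    "JOB2": ("JOB1", "STEP01", 0),
}

def eligible(job):
    """A job with no recorded dependency is always eligible; otherwise
    its predecessor step must have completed with the required code."""
    if job not in dependencies:
        return True
    pred, step, code = dependencies[job]
    return completed.get((pred, step)) == code

completed[("JOB0", "STEP01")] = 0
print(eligible("JOB1"))  # True: STEP01 of JOB0 ended with condition code 0
print(eligible("JOB2"))  # False: JOB1 has not completed its step yet
```

Scheduling each job the moment its predicate turns true is what keeps the pipeline full: the scheduler never waits for a whole job chain to finish before dispatching the next eligible one.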
World hunger is a global issue caused by factors like colonialism, wars, climate change and insufficient aid that destroy environments and agriculture. The UN's Food and Agriculture Organization is responsible for ensuring global food security and reducing hunger, which disproportionately impacts countries in Africa, Asia, and South America as shown in the provided graph. Potential solutions need to be implemented to alleviate hunger worldwide.
This document discusses transmedia storytelling, which is telling stories across multiple platforms. Transmedia allows for dynamic content, participation from audiences, and interactive messages rather than static, passive content. It provides examples of brands like Nike and EA Sports that have used transmedia successfully. The key aspect of transmedia is contextualized content that keeps the core story constant while varying how it is told across different platforms.
We are a practical change management consultancy with a drive for excellent execution. Our approach improves collaboration between departments and counters waste in business processes. The result is engaged employees, faster customer service, and a better financial result.
We base our way of working on recognized methods such as Lean and Prince 2. With our approach, Lean becomes LeaRn again. With us there are no elaborate PowerPoints or theoretical reports; we get to work with your people right away. In doing so, we combine 'hard' skills (facts & figures) and 'soft' skills (behaviour). When we are not running workshops with your people, you will find NRG-Advicers on the work floor. We ask questions, listen, motivate, and dare to confront.
We have already had the privilege of guiding fine organisations* in the service and manufacturing industries through process improvements and the introduction of a culture of continuous improvement.
It is our intrinsic motivation to make your organisation perform sustainably better. We are not satisfied until then!
zJOS System Events Automation Users Guide - Deru Sudibyo
This chapter discusses preparing the zJOS address spaces needed to run zJOS/Sekar. It describes the XDI procedure JCL, including parameters for the main XDI system, license key, load library, and parameter library. The chapter emphasizes not editing key members like XDIEMScc directly unless very familiar with the product internals.
Harmon Dobson founded Whataburger in 1950 in Corpus Christi, Texas. He started with a portable stand selling burgers for 25 cents and saw success expanding to multiple locations across Texas, Tennessee, and Florida. Dobson ran Whataburger as a sole proprietorship and later partnership with his wife Grace. After Dobson's death in 1967, his son Tom Dobson became president and moved headquarters to San Antonio, continuing to grow the Whataburger brand across the southern United States based on Harmon Dobson's vision of quality burgers and customer service.
Perception of Women in the Mining and Mineral Exploration Industries, a Canad... - MafaldaArias
This document discusses perceptions of women in the mining and mineral exploration industries from a Canadian perspective. It provides an overview of the author's views on increased awareness and acceptance of women in mining. It notes the advantages of a diversified workforce and presents facts about the low percentages of women in mining-related fields and labor forces. Examples from two Canadian mining companies that have initiatives to attract and retain female employees are also given. Finally, the history of women's groups in the Canadian mining industry from the 1970s onward is briefly outlined.
Bungy jumping is an extreme sport that involves jumping from a high place while attached to a large elastic cord. Participants jump from bridges, buildings, or other structures and experience a thrilling free fall before being bounced back up by the cord. Though dangerous if not performed properly, bungy jumping provides an adrenaline-pumping experience for thrill-seekers looking for an adventurous activity.
1. The document discusses the challenges of widespread adoption of e-research technologies by everyday researchers. While early adopters found success, most researchers are not using the infrastructure services that have been created.
2. It argues that repositories and other e-research tools need to focus on the needs and perspectives of researchers. Researchers work with data, so tools should emphasize data sharing and metadata. They should also support collaboration and open participation in the scientific process.
3. For technologies to truly enable new forms of research, their use needs to become integrated into the everyday work of all researchers, not just a specialized few. Systems must be easy to use, empower researchers' autonomy, and intersect seamlessly with digital and physical research environments.
1. The document discusses the challenges of adopting e-research technologies by everyday researchers and moving from specialized scientists doing specialized science to widespread adoption.
2. It proposes a more data-centric and collaborative approach focused on the social process of science and empowering researchers.
3. Key lessons for repositories include understanding user needs, being open-minded about problems and solutions, embracing the web instead of creating barriers, and thinking of repositories as a cloud service instead of an institutional system.
This document summarizes key aspects of computational research methods and the myExperiment platform. It discusses how myExperiment allows researchers to automate, share, and reuse workflows and other methods. It also addresses challenges around reproducibility, provenance, collaboration, and incentives for sharing methods. MyExperiment provides social features and aims to build a community around openly exchanging and improving computational research techniques.
Metadata and Semantics Research Conference, Manchester, UK 2015
Research Objects: why, what and how
In practice the exchange, reuse, and reproduction of scientific experiments is hard, dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows, and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Nor should they be viewed merely as second-class artifacts tethered to publications, but as the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects, and the term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, and sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? And are there any examples?
I'll present our practical experiences of the why, what, and how of Research Objects.
myExperiment - Defining the Social Virtual Research Environment - David De Roure
myExperiment is a social networking site for scientists to share workflows, data, and other research objects. It allows users to create profiles, join groups, and share content while maintaining control over privacy. The site aims to facilitate collaboration and reuse in scientific research. It was launched in 2007 and has over 1000 registered users sharing hundreds of workflows and other research objects. The open source software powering the site can also be downloaded and customized for specific communities or projects.
This document discusses software as a research object and the importance of research software. Some key points:
- Many researchers rely on software for their work but few have formal software training. Software is integral to modern research.
- Studies have found low reproducibility in scientific publications due to issues with unavailable software and code. Proper documentation and sharing of research software is needed.
- The Software Sustainability Institute aims to cultivate better, more sustainable research software to enable world-class research. They provide training, community support, and advocate for improved software practices and policies.
- Culture change is needed to incentivize sharing of research software and code. Mechanisms are emerging to properly credit software contributions and cite software in publications.
Preserving the Inputs and Outputs of Scholarship - tsbbbu
Tim Babbitt discusses the changing context of research and scholarship due to digitization and the internet. The inputs and outputs of research are increasingly digital and complex, including data, code, presentations, and more. ProQuest has a history of preserving scholarship through microfilming and is exploring how to preserve the full range of digital scholarly outputs and their linkages in a sustainable way. Key questions include balancing new and old preservation methods and moving beyond preserving individual objects to also preserving networks and linkages between scholarly works.
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5 (Sept.-Oct. 2014), pp. 4-8, IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
This document discusses open science and provides examples of open science practices and their benefits. It contains the following key points:
1. It introduces three speakers who will discuss open science from different perspectives: the CEO of Kitware will discuss why and how to practice open science, a Kitware engineer will discuss building an open source research program, and a Sandia National Labs researcher will discuss open science in government research collaborations.
2. It outlines the core principles of open science including openly documenting hypotheses, methods, data, and results to ensure reproducibility.
3. It provides examples of open science journals and publishing platforms that integrate publications, code, data and results to enable reproducibility.
4. It discusses the benefits of adopting these open science practices.
This document discusses data citation and using identifiers to cite datasets. It explains that identifiers provide exposure, transparency, citation tracking and verification for datasets. Identifiers associate an alphanumeric string with the location of an object, like a dataset, and can include optional metadata. Common identifier systems like DOIs provide a precise way to identify and cite datasets. Services like EZID make it easy to create and manage identifiers for datasets. The document encourages attendees to get started with data citation by creating test identifiers and discussing options with librarians.
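The core idea, an identifier is an alphanumeric string mapped to an object's location plus optional metadata, can be sketched as a tiny registry. This is an illustration only: the DOI below is invented, and real identifiers are minted and resolved through services such as doi.org and EZID, not a local dictionary.

```python
# Sketch of identifier resolution and data citation: an identifier maps
# to a location and optional metadata. The DOI and repository URL are
# invented examples; real DOIs resolve via https://doi.org/.

registry = {
    "10.1234/example.dataset.v1": {
        "location": "https://repository.example.org/datasets/42",
        "metadata": {"creator": "Doe, J.", "year": 2014,
                     "title": "Example survey dataset"},
    },
}

def resolve(identifier):
    """Return the location currently registered for an identifier."""
    return registry[identifier]["location"]

def cite(identifier):
    """Format a simple dataset citation from the registered metadata."""
    m = registry[identifier]["metadata"]
    return f'{m["creator"]} ({m["year"]}). {m["title"]}. https://doi.org/{identifier}'

print(resolve("10.1234/example.dataset.v1"))
print(cite("10.1234/example.dataset.v1"))
```

The indirection is the point: if the dataset moves, only the registry entry changes, while every published citation of the identifier keeps resolving correctly.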
myExperiment is a social network in which you can search for publicly shared scientific workflows, but also propose, share, and develop new ones, with the aim of creating communities and building relationships. The presentation illustrates myExperiment's vision of Open Science.
Dave de Roure - The myExperiment approach towards Open Science - shwu
Dave de Roure's talk on myExperiment, including thoughts on protocol and workflow sharing and online communities. Presented at the Open Science workshop at the Pacific Symposium on Biocomputing, January 5th, 2009
Facilitate Research Communities Adoption of Open Science Publishing Principle... - OpenAIRE
Pre-conference Workshop: Facilitate Research Communities Adoption of Open Science Publishing Principles: The Role of Repositories and the OpenAIRE-Connect Services.
COAR Annual Meeting, May 21, 2019 - Lyon, France
Hosted by The Center for Direct Scientific Communication (CCSD).
The Purdue University Research Repository (PURR) is a collaboration between Purdue Libraries, ITaP, and OVPR that provides data services including a virtual research environment, data publication with DOIs, and archiving. It is a free online platform for Purdue faculty, students, and staff to manage, share, and publish research data. PURR assists with data management plans, offers collaborative project spaces, publishes datasets with DOIs for increased citation and reuse, and preserves datasets for 10 years with potential for permanent inclusion in the Libraries collection. A demonstration showed how PURR can be used by researchers.
Gridforum David De Roure Newe Science 20080402 - vrij
The document discusses the evolution of e-Science and how it enables new forms of collaborative research. Key points include:
- e-Science has progressed from specialized teams doing "heroic science" to everyday researchers conducting routine research using ubiquitous digital tools and data sharing.
- Web 2.0 technologies and approaches like open data, workflows, and social networking are empowering researchers and supporting new types of collaborative, data-driven science.
- Future e-Science relies on making these technologies simple and accessible to researchers from all domains to further break down barriers to collaborative, data-centric research.
Knowledge Infrastructure for Global Systems Science - David De Roure
Presentation at the First Open Global Systems Science Conference, Brussels, 8-10 November 2012
http://www.gsdp.eu/nc/news/news/date/2012/10/31/first-open-global-systems-science-conference/
This document discusses open science and its various components such as open data, open access, open code, and open peer review. It emphasizes that open science promotes transparency, collaboration, and reproducibility. While open science aims to make research more accessible and equitable, the document notes that open science faces challenges in terms of widespread adoption due to entrenched publishing and evaluation practices that still prioritize commercial publishers and journal impact factors over open principles. It calls for more action and systemic changes to fully realize the goals of open science.
The goal of this talk is to highlight open source opportunities for students, especially the opportunity to earn $5000 through the Google Summer of Code program. I will discuss tips on how to engage with open source communities and the benefits of contributing. I will provide motivating examples of how students can gain significant experience by contributing to challenging distributed systems problems while impacting scientific research. I will focus specifically on a concrete example: the Apache Airavata software suite for web-based science gateways. I will list some example GSoC topics of interest and provide some recipes for success in getting accepted and navigating through to completion.
Data Curation and Debugging for Data Centric AI - Paul Groth
It is increasingly recognized that data is a central challenge for AI systems - whether training an entirely new model, discovering data for a model, or applying an existing model to new data. Given this centrality of data, there is need to provide new tools that are able to help data teams create, curate and debug datasets in the context of complex machine learning pipelines. In this talk, I outline the underlying challenges for data debugging and curation in these environments. I then discuss our recent research that both takes advantage of ML to improve datasets but also uses core database techniques for debugging in such complex ML pipelines.
Presented at DBML 2022 at ICDE - https://www.wis.ewi.tudelft.nl/dbml2022
Content + Signals: The value of the entire data estate for machine learning - Paul Groth
Content-centric organizations have increasingly recognized the value of their material for analytics and decision support systems based on machine learning. However, as anyone involved in machine learning projects will tell you, the difficulty is not in the provision of the content itself but in the production of the annotations necessary to make use of that content for ML. The transformation of content into training data often requires manual human annotation. This is expensive, particularly when the nature of the content requires subject matter experts to be involved.
In this talk, I highlight emerging approaches to tackling this challenge using what's known as weak supervision - using other signals to help annotate data. I discuss how content companies often overlook resources that they have in-house to provide these signals. I aim to show how looking at a data estate in terms of signals can amplify its value for artificial intelligence.
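The weak supervision idea above can be sketched in a few lines of Python: several noisy "labeling functions" each vote on a document, and a majority vote over the non-abstaining votes produces a training label. The keyword rules and labels below are invented for illustration; they are not from the talk, and a real system would learn to weight and denoise the signals rather than take a plain majority.

```python
# Sketch of weak supervision: noisy labeling functions vote on each
# document, and the majority label becomes a (weak) training annotation.
# The labeling functions below are hypothetical examples.

from collections import Counter

ABSTAIN = None

def lf_has_price(doc):
    # In-house signal: documents mentioning a currency symbol -> "commerce"
    return "commerce" if "$" in doc else ABSTAIN

def lf_mentions_patient(doc):
    # Domain keyword signal -> "medical"
    return "medical" if "patient" in doc.lower() else ABSTAIN

def lf_mentions_contract(doc):
    return "legal" if "contract" in doc.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_has_price, lf_mentions_patient, lf_mentions_contract]

def weak_label(doc):
    """Majority vote over non-abstaining labeling functions."""
    votes = [lf(doc) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

docs = [
    "Price dropped to $10.",
    "The patient was admitted yesterday.",
    "Nothing relevant here.",
]
labels = [weak_label(d) for d in docs]
```

The point of the sketch is that signals an organization already owns (taxonomies, house style rules, legacy metadata) can play the role of these labeling functions.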
Data Communities - reusable data in and outside your organization - Paul Groth
Data is critical both to facilitating an organization and as a product in its own right. How can you make that data more usable for both internal and external stakeholders? There is a myriad of recommendations, advice, and strictures about what data providers should do to facilitate data (re)use. It can be overwhelming. Based on recent empirical work (analyzing data reuse proxies at scale, understanding data sensemaking, and looking at how researchers search for data), I talk about what practices are a good place to start for helping others to reuse your data. I put this in the context of the notion of data communities, which organizations can use to help foster the use of data both internally and externally.
Presentation for NEC Lab Europe.
Knowledge graphs are increasingly built using complex, multifaceted machine learning-based systems relying on a wide range of different data sources. To be effective, these graphs must constantly evolve and thus be maintained. I present work on combining knowledge graph construction (e.g. information extraction) and refinement (e.g. link prediction) in end-to-end systems. In particular, I will discuss recent work on using inductive representations for link prediction. I then discuss the challenges of ongoing system maintenance, knowledge graph quality, and traceability.
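To make the refinement step concrete, here is a toy link prediction sketch. It uses a simple common-neighbour heuristic as a stand-in for the learned inductive representations the talk discusses; the entities, relations, and triples are invented for illustration.

```python
# Toy sketch of knowledge graph refinement via link prediction: rank
# unseen entity pairs by how many neighbours they share in the graph.
# All triples below are illustrative, not from any real knowledge graph.

from itertools import combinations

triples = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "headache"),
    ("aspirin", "treats", "fever"),
]

# Build an undirected neighbour map, ignoring relation labels for simplicity.
neighbours = {}
for head, _, tail in triples:
    neighbours.setdefault(head, set()).add(tail)
    neighbours.setdefault(tail, set()).add(head)

def common_neighbour_score(a, b):
    """Score a candidate link (a, b) by the number of shared neighbours."""
    return len(neighbours.get(a, set()) & neighbours.get(b, set()))

# Rank entity pairs that are not yet linked as candidate new edges.
entities = sorted(neighbours)
candidates = [
    (a, b, common_neighbour_score(a, b))
    for a, b in combinations(entities, 2)
    if b not in neighbours[a]
]
candidates.sort(key=lambda c: -c[2])
```

A learned link predictor replaces `common_neighbour_score` with a model-based score, but the surrounding pipeline (enumerate candidates, score, rank, feed back into the graph) has the same shape.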
The document discusses knowledge graphs and their future directions. It summarizes a panel discussion on knowledge graphs at ESWC 2020 and references several papers on industry-scale knowledge graphs, weak supervision for knowledge graph construction, and representing entities and identities in knowledge bases. It concludes that knowledge graph construction involves complex pipelines with many components and calls for an updated theory of knowledge engineering to address the demands of modern knowledge graphs at large scale and with continuous changes.
Thoughts on Knowledge Graphs & Deeper Provenance - Paul Groth
Thinking about the need for deeper provenance for knowledge graphs but also using knowledge graphs to enrich provenance. Presented at https://seminariomirianandres.unirioja.es/sw19/
This document discusses how semantic technologies can help link datasets to publications and institutions to enable new forms of data search and showcasing. It notes that standard schemas and formats are needed to allow linkages between data repositories. Knowledge graphs can help relate entities like papers, authors and institutions to facilitate disambiguation and multi-institutional search capabilities. Semantic technologies are seen as central to efficiently building these linkages at scale across the research data ecosystem.
This document discusses Elsevier's Health Knowledge Graph (H-Graph) which connects Elsevier healthcare products, data, and content to power advanced clinical decision support applications. The H-Graph contains over 400,000 medical concepts with 4.9 million semantic relations extracted from medical literature using natural language processing. It aims to integrate Elsevier's existing products through linked data standards while minimizing impact on current workflows. The document outlines Elsevier's approach to linked data, including the need to control namespaces and prioritize developer experience.
The Challenge of Deeper Knowledge Graphs for Science - Paul Groth
Over the past 5 years, we have seen multiple successes in the development of knowledge graphs for supporting science in domains ranging from drug discovery to social science. However, in order to really improve scientific productivity, we need to expand and deepen our knowledge graphs. To do so, I believe we need to address two critical challenges: 1) dealing with low resource domains; and 2) improving quality. In this talk, I describe these challenges in detail and discuss some efforts to overcome them through the application of techniques such as unsupervised learning, the use of non-experts in expert domains, and the integration of action-oriented knowledge (i.e. experiments) into knowledge graphs.
More ways of symbol grounding for knowledge graphs? - Paul Groth
This document discusses various ways to ground the symbols used in knowledge graphs. It describes the traditional "symbol grounding problem" where symbols are defined based only on other symbols. It then outlines several approaches to grounding symbols in non-symbolic ways, such as by linking them to perceptual modalities like images, audio, and simulation. It also discusses grounding symbols via embeddings, relationships to physical entities, and operational semantics. The document argues that richer grounding could help integrate these notions and enhance interoperability, exchange, identity, and reasoning over knowledge graphs.
Diversity and Depth: Implementing AI across many long tail domains - Paul Groth
Presentation at the IJCAI 2018 Industry Day
Elsevier serves researchers, doctors, and nurses. They have come to expect the same AI-based services that they use in everyday life in their work environment, e.g. recommendations, answer-driven search, and summarized information. However, providing these sorts of services over the plethora of low resource domains that characterize science and medicine is a challenging proposition. (For example, most off-the-shelf NLP components are trained on newspaper corpora and exhibit much worse performance on scientific text.) Furthermore, the level of precision expected in these domains is quite high. In this talk, we overview our efforts to overcome this challenge through the application of four techniques: 1) unsupervised learning; 2) leveraging highly skilled but low-volume expert annotators; 3) designing annotation tasks for non-experts in expert domains; and 4) transfer learning. We conclude with a series of open issues for the AI community stemming from our experience.
Progressive Provenance Capture Through Re-computation - Paul Groth
Provenance capture relies upon instrumentation of processes (e.g. probes or extensive logging). The more instrumentation we can add to processes the richer our provenance traces can be, for example, through the addition of comprehensive descriptions of steps performed, mapping to higher levels of abstraction through ontologies, or distinguishing between automated or user actions. However, this instrumentation has costs in terms of capture time/overhead and it can be difficult to ascertain what should be instrumented upfront. In this talk, I'll discuss our research on using record-replay technology within virtual machines to incrementally add additional provenance instrumentation by replaying computations after the fact.
From Text to Data to the World: The Future of Knowledge Graphs - Paul Groth
Keynote Integrative Bioinformatics 2018
https://docs.google.com/document/d/1E7D4_CS0vlldEcEuknXjEnSBZSZCJvbI5w1FdFh-gG4/edit
Can we improve research productivity through providing answers stemming from knowledge graphs? In this presentation, I discuss different ways of building and combining knowledge graphs.
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs - Paul Groth
A look at how the thinking about Web Data and the sources of semantics can help drive decisions on combining latent and explicit knowledge. Examples from Elsevier and lots of pointers to related work.
The need for a transparent data supply chain - Paul Groth
1. The document discusses the need for transparency in data supply chains. It notes that data goes through multiple steps as it is collected, modeled, and applied in applications.
2. It illustrates the complexity of data supply chains using examples of how data is reused and integrated from multiple sources to build models and how bias can propagate.
3. The document argues that transparency is important to understand where data comes from, how it has been processed, and help address issues like bias, privacy, or other problems at their source in the data supply chain.
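One lightweight way to get the transparency argued for above is to make every processing step append a provenance record as data moves through the chain. The sketch below is a minimal illustration under assumed names; a production system would use a provenance standard such as W3C PROV rather than this ad hoc structure.

```python
# Minimal sketch of a transparent data supply chain: each processing step
# records itself in a lineage trail, so a downstream consumer can see where
# the data came from and how it was transformed. Step names, the source
# filename, and the fields are illustrative assumptions.

import hashlib
import json

def fingerprint(data):
    """Short content hash so each lineage entry pins down the data state."""
    blob = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

class TracedDataset:
    def __init__(self, data, source):
        self.data = data
        self.lineage = [{"step": "ingest", "source": source,
                         "fingerprint": fingerprint(data)}]

    def transform(self, step_name, fn):
        """Apply fn to the data and record the step in the lineage trail."""
        self.data = fn(self.data)
        self.lineage.append({"step": step_name,
                             "fingerprint": fingerprint(self.data)})
        return self

rows = TracedDataset([{"age": 41}, {"age": -3}], source="survey_2023.csv")
rows.transform("drop_invalid", lambda d: [r for r in d if r["age"] >= 0])
```

With such a trail, a bias or privacy problem found in an application can be traced back to the step, and ultimately the source, that introduced it.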
Taking AI to the Next Level in Manufacturing.pdf - ssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Main news related to the CCS TSI 2023 (2023/1695) - Jakub Marek
An English translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 online followers.
The original Czech version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092
The video recording (in Czech) of the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH
Webinar: Designing a schema for a Data Warehouse - Federico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which include databases of any type that back the applications used by the company, data files exported by some applications, and APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires first gathering information about the business processes that need to be analysed. These processes must then be translated into so-called star schemas: denormalised database designs where each table represents either a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
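As a rough sketch of the star schema idea the webinar describes, the snippet below builds one fact table at a stated granularity plus two dimension tables in SQLite (via Python's built-in sqlite3 module) and runs a typical analytical join. All table and column names are invented for illustration.

```python
# Minimal star schema sketch: a central fact table referencing dimension
# tables, queried by joining facts to dimensions. Names are illustrative.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    full_date  TEXT,
    month      TEXT
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT,
    category    TEXT
);
-- Fact table granularity: one row per product per day.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
""")
conn.execute("INSERT INTO dim_date VALUES (1, '2024-06-01', '2024-06')")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 10, 99.5)")

# A typical analytical query: revenue per category per month.
row = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date   d ON d.date_key   = f.date_key
    GROUP BY p.category, d.month
""").fetchone()
```

Note that the dimensions are deliberately denormalised (e.g. `month` stored directly on `dim_date` rather than in a separate table), which is exactly the "avoid snowflakes" point in the topic list above.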
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed - Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
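At its core, the vector search described above ranks documents by the similarity of their embedding vectors to a query vector. The brute-force sketch below illustrates the idea with made-up 3-dimensional "embeddings"; a real deployment would use a learned embedding model and an approximate nearest-neighbour index such as Atlas Vector Search rather than scanning every document.

```python
# Toy vector search: rank documents by cosine similarity between their
# embedding vectors and the query vector. Vectors are invented examples.

import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

documents = {
    "doc_cats":   [0.9, 0.1, 0.0],
    "doc_dogs":   [0.8, 0.2, 0.1],
    "doc_stocks": [0.0, 0.1, 0.9],
}

def search(query_vec, k=2):
    """Return the ids of the k most similar documents."""
    ranked = sorted(documents,
                    key=lambda d: cosine(documents[d], query_vec),
                    reverse=True)
    return ranked[:k]

# A query vector close to the "animal" documents retrieves them first.
results = search([1.0, 0.0, 0.0])
```

The same ranking principle underlies the LLM use case mentioned in the outline: retrieved nearest neighbours become the context passed to the model.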
Skybuffer SAM4U tool for SAP license adoption - Tatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, a complimentary SAP software asset management tool for customers.
SAM4U delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring a fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Generating privacy-protected synthetic data using Secludy and Milvus - Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you're at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We'll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
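One of the pitfalls hinted at above is that model-generated markup is not guaranteed to be well-formed. A minimal guardrail, sketched here with Python's standard xml.etree library, is to parse the model's output before accepting it; the sample snippets stand in for real model output.

```python
# Guardrail sketch for AI-assisted XML development: parse generated markup
# before accepting it, so malformed output is caught early. The candidate
# strings below are stand-ins for real model output.

import xml.etree.ElementTree as ET

def accept_generated_xml(candidate):
    """Return the parsed root element, or None if the XML is malformed."""
    try:
        return ET.fromstring(candidate)
    except ET.ParseError:
        return None

good = accept_generated_xml("<article><title>AI and XML</title></article>")
bad = accept_generated_xml("<article><title>Unclosed</article>")
```

Well-formedness is only the first gate; validating against an XSD or Schematron schema, as the abstract discusses, would be the next step.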
HCL Notes and Domino license cost reduction in the world of DLAU - panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to put into action immediately
Fueling AI with Great Data with Airbyte Webinar - Zilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Building Production Ready Search Pipelines with Spark and Milvus - Zilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we'll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we'll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources, from PDF floorplans to web pages, using FME transformers like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it's populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We'll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers - akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
4. MEET JULIE
PhD Student
"institutional influences on patterns of collaboration in producing research of interdisciplinary character"
Faculteit der Exacte Wetenschappen
6. I AM NOT A LAWYER
Web of Knowledge Terms of Use
You are entitled to access the product, download or extract reasonable amounts of data from the
product that are required for the activities you carry out individually or as part of your employment, and
include insubstantial portions of extracted data in your work documents and reports, provided that such
documents or reports are for the benefit of (and belong to) your organization, or where such documents
or reports are intended for the benefit of third parties (not your organization), extracted data is
immaterial in the context of such documents or reports and used only for illustrative/demo purposes.
Thomson Reuters determines a âreasonable amountâ of data to download by comparing your download
activity against the average annual download rates for all Thomson Reuters clients using the product in
question. Thomson Reuters determines an âinsubstantial portionâ of downloaded data to mean an
amount of data taken from the product which (1) would not have significant commercial value of its
own; and (2) would not act as a substitute for access to a Thomson Reuters product for someone who
does not have access to the product.
You are not entitled to do anything that would cause a breach of the terms of the agreement between
your organization and Thomson Reuters, such as (1) allowing anyone else to use your
username/password, (2) downloading excessive amounts of data, (3) providing data to anyone else,
other than in licensed, source-acknowledged documents or reports created as part of your normal work,
(4) archiving or using downloaded data to create a derivative database or metrics, (5) using the product
or any downloaded data to provide services to anyone outside your organization, or (6) using the
product in a way that risks damaging, disabling, overburdening or impairing the operation of the
product, or any other personâs use or enjoyment of the product.
18. 5 TAKE-AWAYS
1. Open Data is a boon to young scientists as consumers
2. Trade-offs for producers of open data
3. Producers need support
4. Clear simple guidelines for data publication
5. Data citation is a key to open data
Editor's Notes
Talk about citation data: difficult to get; it took 2 weeks to gather a couple of hundred citation scores
Open data to the rescue…
My own community
Faster; easier to experiment; access to more data
Effective at the institutional level. Examples: UniProt, ChEMBL, astronomical data services, US government weather data
Not as much experience at the personal level, but good examples from open source software
Built software during my PhD and released it as open source…
A fairly highly cited paper in the UK e-Science All Hands Meeting (not the biggest outlet in the world)
Led to new collaborators
Exposing your dirty laundry is scary
Lots of questions about the software; people want support. This is a distraction and can take time away from "science"