On the importance (and absence) of annotation in Next Generation Sequencing Data

With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data “FAIR”: findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job of creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate them with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to improve the authoring of experimental metadata so that online datasets are more useful to the scientific community. The CEDAR Workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology in driving open science. It also demonstrates a means of simplifying access to scientific datasets and enhancing data reuse to drive new discoveries.

Interpreting Complex Real World Data for Pharmaceutical Research

Paul Agapow

This document discusses using real world data (RWD) for pharmaceutical research and development. It notes that while RWD is attractive due to its scale and realism, it is also complex and difficult to interpret. The document proposes several approaches for analyzing RWD, including using machine learning on graphical representations of patient data, analyzing temporal trajectories, integrating multiple 'omics data sources, and generating hypotheses rather than attempting to definitively model patient populations. It concludes that more work is needed to build larger, more diverse real world datasets and address challenges around privacy, methods validation, and scaling analysis techniques.

Automating the process of continuously prioritising data, updating and deploy...

Ola Spjuth

Presentation at Data Innovation Summit 2019 in Stockholm, Sweden. Abstract: Microscopes are capable of producing vast amounts of data, and when they are used in automated laboratories, both the number and the size of images present many challenges for storing, categorizing, analyzing, and annotating the data and for transforming it into actionable information that can be used for decision-making, either by humans or by machines. In this presentation I will describe the informatics system we have established at the Department of Pharmaceutical Biosciences at Uppsala University. It consists of computational hardware (CPUs, GPUs, storage), middleware (Kubernetes), an imaging database (OMERO), and a workflow system (Pachyderm), used to perform online prioritization of new data, together with a continuous analytics system that automates the process from captured images to continuously updated and deployed AI models. The AI methodologies include deep learning models trained on image data and conventional machine learning models trained on features extracted from images or chemical structures. Thanks to its microservice architecture, the system is scalable and can be expanded into hybrid architectures with cloud computing resources. The informatics system serves a robotized cell profiling setup with incubators, liquid handling, and high-content microscopy. The lab is quite young and is targeting applications primarily in drug screening and toxicity assessment, with the aim of improving research using AI and intelligent design of experiments.

Cheminformatics Workflows Using Mobile Apps for Drug Discovery

Sean Ekins

This document discusses using mobile apps for drug discovery workflows. It describes how mobile devices are revolutionizing computing through new user interfaces and availability anywhere. Several examples are given of simple app workflows for tasks like looking up structures, running searches, and sharing data. The document advocates for mobile and cloud-based approaches to replace desktop-based cheminformatics workflows. This could make specialized tasks more accessible and collaboration easier. The potential for mobile apps to transform existing software vendors is also noted. The majority of the document focuses on examples of developing mobile apps to enable drug discovery for tuberculosis.

GWAS in a model organism: Arabidopsis thaliana

Golden Helix Inc

GWA studies are perhaps most often used for studying the genetic basis of human diseases, but this technology also has great utility for studying the natural variation of other organisms. In this webcast, Ashley Hintz, Field Application Scientist, will discuss the utility of SVS for analyzing plant GWA data, using publicly available SNP data for Arabidopsis thaliana as a case study. Along the way, Ashley will demonstrate how SVS can be used to manage data, analyze population structure, perform genotype QA and ultimately replicate a published genetic association in A. thaliana using EMMAX regression. She will also address the flexibility of SVS for analyzing the genomes of other plant and animal species.

Interpreting transcriptomics (ers berlin 2017)

Paul Agapow

Working with Quertle

Janet Delicata

Quertle is a biomedical big data analytics company that provides a platform using artificial intelligence and other advanced techniques to analyze over 40 million biomedical documents. Their platform allows for more comprehensive and precise searches compared to keyword-based searches, and can discover relationships and make connections that other tools cannot. The platform also provides predictive visual analytics and concept-oriented exploration of the data to provide actionable insights. Quertle aims to help address issues with the growing volume of biomedical literature and information that is missed with current approaches.

Reproducible research: theory

C. Tobin Magle

This document discusses reproducible research and provides guidance on how to conduct research in a reproducible manner. It covers: 1. The importance of reproducible research due to large datasets, computational analyses, and the potential for human error. Ensuring reproducibility requires new expertise and infrastructure. 2. Key aspects of reproducible research include data management plans, version control, use of file formats and software/tools that allow reproducibility, and publishing data and code to allow others to replicate results. 3. Reproducible research benefits the scientific community by increasing transparency and allowing researchers to re-analyze their own data in the future. Journals and funders are increasingly requiring reproducibility.

Ai in drug design webinar 26 feb 2019

Alexander Tropsha presented on using AI and machine learning for drug design and discovery. He discussed using QSAR models to predict properties and activity of molecules based on their structural descriptors. He also introduced ReLeaSE, a new method using deep reinforcement learning to generate novel drug-like molecules and guide chemical library design through a cycle of molecule generation, model building, and iterative improvement. If successful, this approach could disrupt traditional computational drug discovery pipelines.
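The QSAR idea in this abstract (predicting activity from structural descriptors) can be illustrated with the simplest possible model, an ordinary least-squares fit. This is a minimal sketch under stated assumptions: the descriptor values and activities are synthetic, the function names `fit_linear_qsar` and `predict` are hypothetical, and it is not the method presented in the webinar (ReLeaSE, the deep reinforcement learning approach, is not shown).

```python
import numpy as np

# Synthetic toy data: rows are molecules, columns are two hypothetical
# structural descriptors in arbitrary units. Not real compounds or measurements.
descriptors = np.array([
    [1.0, 0.2],
    [2.0, 0.1],
    [3.0, 0.5],
    [4.0, 0.3],
    [5.0, 0.9],
])
# Activities generated from a known linear rule so the fit can be checked.
activity = 0.5 + 1.2 * descriptors[:, 0] - 0.3 * descriptors[:, 1]


def fit_linear_qsar(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, w1, w2, ...]."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predicted activity for each row of descriptors."""
    return coef[0] + X @ coef[1:]


coef = fit_linear_qsar(descriptors, activity)
print(coef, predict(coef, np.array([[2.5, 0.4]])))
```

A linear model is used here only because its output is easy to check; practical QSAR work usually involves many more descriptors, non-linear learners, and external validation sets.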

AI in translational medicine webinar

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

Dr. Dennis Wang discusses possible ways to make ML methods more powerful for discovery and to reduce ambiguity within translational medicine, allowing data-informed decision-making to deliver the next generation of diagnostics and therapeutics to patients more quickly, at lower cost, and at scale. The talk by Dr. Dennis Wang was followed by a panel discussion with Mr. Albert Wang, M. Eng., Head, IT Business Partner, Translational Research & Technologies, Bristol-Myers Squibb.

RDA Scholarly Infrastructure 2015

William Gunn

eScience at the Royal Society of Chemistry and our current initiatives

Access to scientific information has changed in a manner that was likely never even imagined by the early pioneers of the internet. The quantities of data, the array of tools available to search and analyze, the devices, and the shift in community participation continue to expand, while the pace of change does not appear to be slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day, and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of this eScience cheminformatics platform and the nature of the solutions that it helps to enable, including structure validation, text mining and semantic markup, the National Chemical Database Service for the United Kingdom and the development of a chemistry data repository. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community and facilitated by collaboration, and these efforts will ultimately accelerate scientific progress.

eScience Resources for the Chemistry Community from the Royal Society of Chem...

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

Our access to scientific information has changed in ways that were hardly imagined even by the early pioneers of the internet. The immense quantities of data and the array of tools available to search and analyze online content continue to expand, while the pace of change does not appear to be slowing. ChemSpider is one of the chemistry community’s primary online public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider serves data to tens of thousands of chemists every day, and it serves as the foundation for many important international projects to integrate chemistry and biology data, facilitate drug discovery efforts and help to identify new chemicals from under the ocean. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions that it helps to enable. We will also discuss the possibilities it offers in the domain of crowdsourcing and open data sharing. The future of scientific information and communication will be underpinned by these efforts, influenced by increasing participation from the scientific community and facilitated by collaboration, and these efforts will ultimately accelerate scientific progress.

From Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases

Making Open the Default - Bjorn Brembs

Right to Research

The document discusses issues with the current scholarly publishing system, including limited access, lack of impact analysis and functional search capabilities, poor peer review, and lack of archiving for publications, software, and data. It notes there are over 575 proposed solutions but the system remains dysfunctional. Issues highlighted include the weakening relationship between journal impact factors and actual citation rates, bias in genetic association studies toward higher impact journals, and that retracted studies are disproportionately published in high impact journals. The document argues for publishing in journals that can be widely read instead of solely high impact journals, and for making science open by default to save time and money while benefiting more people.

Analyzing Perturbed Co-Expression Networks in Cancer Using a Graph Database

CEDAR: Center for Expanded Data Annotation and Retrieval

This document summarizes an analysis of perturbed co-expression networks in cancer using a graph database. It describes building co-expression networks from The Cancer Genome Atlas RNA-seq data for several cancer types and tissues. Key network analysis methods applied include PageRank to identify important genes, Louvain clustering to identify communities of co-expressed genes, and finding shortest paths. The analysis provides insights into differences between normal and tumor networks and identifies gene communities related to processes like immune response and DNA replication.
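The general pattern described here (build a co-expression network, then rank genes with PageRank) can be sketched in a few lines. This is a minimal sketch assuming Python with NumPy, an expression matrix with genes as rows and samples as columns, and an arbitrary |Pearson r| threshold of 0.8 for drawing an edge; the toy matrix is synthetic and the function names are hypothetical. It does not use a graph database, and Louvain clustering and shortest paths are omitted.

```python
import numpy as np


def coexpression_adjacency(expr: np.ndarray, threshold: float) -> np.ndarray:
    """Unweighted co-expression network from a genes x samples matrix.

    Two genes are linked when the absolute Pearson correlation of their
    expression profiles is at least `threshold`.
    """
    corr = np.corrcoef(expr)                      # genes x genes correlation matrix
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                    # drop self-loops
    return adj


def pagerank(adj: np.ndarray, damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """PageRank by power iteration; genes with no edges spread their rank uniformly."""
    n = adj.shape[0]
    degree = adj.sum(axis=1)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # Each gene passes rank / degree to every neighbour (0 for isolated genes).
        share = np.divide(rank, degree, out=np.zeros(n), where=degree > 0)
        dangling = rank[degree == 0].sum()
        new_rank = (1.0 - damping) / n + damping * (adj.T @ share + dangling / n)
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank


# Synthetic toy data (6 samples): genes 0-2 are perfectly (anti-)correlated,
# genes 3 and 4 are only weakly related to anything else.
a = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
expr = np.vstack([
    a,
    2.0 * a + 3.0,
    -a,
    [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 1.0, 0.0, -1.0],
])
rank = pagerank(coexpression_adjacency(expr, threshold=0.8))
print(np.round(rank, 4))
```

In this toy case the three linked genes score higher than the two isolated ones, which is the sense in which PageRank highlights "important" hub genes. With TCGA-scale data one would normally use a weighted, sparse network and a graph library or database rather than dense matrices.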

Advancing Foundation and Practice of Software Analytics

Tao Xie

The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...

The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed, the CEDAR Workbench, is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality metadata, and to share and manage these resources. The CEDAR Workbench provides a versatile, REST-based environment for authoring metadata that are enriched with terms from ontologies. The metadata are available as JSON, JSON-LD, or RDF for easy integration into scientific applications and reusability on the Web. Users can leverage our APIs for validating and submitting metadata to external repositories. The CEDAR Workbench is freely available and open-source.
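To make the idea of checking metadata against a template concrete, here is a minimal sketch. It is not the CEDAR template model or the CEDAR REST API: the template layout, field names, controlled vocabularies, and the `validate_instance` function are hypothetical stand-ins, with a small set of allowed strings playing the role of ontology terms.

```python
import json

# Hypothetical template (NOT the CEDAR format): each field says whether it is
# required and, optionally, which values are allowed (a stand-in for ontology terms).
TEMPLATE = {
    "organism": {"required": True, "allowed": {"Homo sapiens", "Mus musculus"}},
    "tissue": {"required": True, "allowed": None},
    "sequencing_platform": {
        "required": True,
        "allowed": {"Illumina HiSeq 2500", "Illumina NovaSeq 6000"},
    },
    "notes": {"required": False, "allowed": None},
}


def validate_instance(instance: dict, template: dict) -> list:
    """Return human-readable problems; an empty list means the record passes."""
    problems = []
    for field, spec in template.items():
        value = instance.get(field)
        if value in (None, ""):
            if spec["required"]:
                problems.append(f"missing required field: {field}")
            continue
        allowed = spec["allowed"]
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: {value!r} is not an allowed value")
    # Fields the template does not define are flagged rather than silently kept.
    for field in instance:
        if field not in template:
            problems.append(f"unexpected field: {field}")
    return problems


# A typical hand-written record: free-text organism, no platform, ad hoc field.
record = json.loads('{"organism": "human", "tissue": "liver", "batch": "7"}')
for problem in validate_instance(record, TEMPLATE):
    print(problem)
```

Returning every problem at once, rather than stopping at the first error, lets a submitter fix a record in a single pass; restricting values to controlled terms is what makes records like this one findable and comparable across repositories.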

CEUTF - TEOLOGIA

WandersonLo

Relazione Progetto cRIO

Aht ren alde

Ingurugiro Etxea Fundazioa

Energiak etorkizunean maketa

Ingurugiro Etxea Fundazioa

Top Ten Digital Engagement Tools - WASHTO 2013 Annual Meeting

krmobley1

1. The document discusses top ten digital engagement tools and techniques for public outreach, including hashtags, social media sites, Twitter, hosted discussions/blogs, virtual meetings, video sharing, Textizen, interactive surveying, crowdsourcing, and management dashboards. 2. Virtual meetings allow for regional discussions through webinars, while interactive surveying and crowdsourcing gather public input on priorities and preferences. 3. Successful digital engagement requires strategy, regular updates, and viewing social media as a true conversation rather than just broadcasting information.

Tips for UXD that works

Albert Wang

This document discusses tips for successful user experience (UX) design. It recommends exploring abstract concepts to push boundaries. It also suggests listening to users like an orchestra musician listens to other musicians to understand their needs. Finally, it advises acting like an owner who is invested in the product's success. The goal is to design products and services that users value by understanding them and continuously improving based on their feedback.

Formato plano 10th week5_complex_sent

The document provides information about different types of sentences: simple sentences contain a subject and verb and express a complete thought, compound sentences contain two independent clauses joined by a coordinator, and complex sentences contain an independent clause joined by one or more dependent clauses introduced by a subordinating conjunction. Examples of each type of sentence are given. The last part of the document contains a story told through 20 sentences for students to identify as simple, compound, or complex.

Ict4 d rhul talk

Hugh Shanahan

Formato de clase 8y9 acronyms

This document is from a Spanish language school in Cartagena, Colombia. It is about a lesson on colloquial English vocabulary used in songs. The lesson introduces common slang terms and their standard English translations. It then discusses the use of acronyms in text messages and asks students to complete example text messages using appropriate acronyms. Students are also asked to transform a sample conversation into a text message chat using acronyms and omitting unnecessary grammar.

Openid+Opensocial

Galeria Rammstein Slides

NATALIA LAVERDE

Formato de clase 8y9 future

1. The document discusses the future simple tense and how it is used to describe future actions. It explains that "will" is used for future predictions and facts, as well as spontaneous responses, promises, requests, threats and orders. 2. "Going to" is used to show future intentions and plans. It is also used for predictions based on present evidence. Both "will" and "going to" are used for predictions, sometimes with little difference in meaning. 3. The document provides examples of conversations using the future tense and presents activities for students to practice and demonstrate their understanding of the future simple tense.

Linux & Open Source - Lezione 1

Formato plano 7th week4_simpl_pasrvspastcont

VPI Ontario

vporcaro

Vincent Porcaro Inc. (VPI) is a third-party logistics company founded in 1996 that provides fulfillment services like assembly, fulfillment, distribution and warehousing to over 40 national and international companies out of its headquarters in Providence, RI. VPI has a distribution center in Ontario, CA, near major shipping hubs, with a 50,000-square-foot warehouse facility equipped to meet various customer needs. VPI aims to provide extraordinary fulfillment through inventory management, quality control, picking/packing/assembly and shipping services.

Folio

souk06

Presentazione Progetto CRio

Viernes santo la merced 2012

Claudio Obregón

Formato plano 6th week6_future_simple

The document is a lesson plan for a 6th grade English class about future simple tenses and future plans. It includes an indicator of achievement stating that students will be able to talk about future plans. It provides an overview of the future simple tense, explaining that it is used to talk about predictions for the future using will or be going to. It notes that be going to is used when a plan is already in place, while will can show willingness or talk about immediate future plans without prior arrangement. Comparisons are made between will and be going to. The lesson plan continues for 5 pages with sections on exploration, conceptualization, production, modeling, workshop, and evaluation.

Aht ren kontra

Ingurugiro Etxea Fundazioa

Similar to On the importance (and absence) of annotation in Next Generation Sequencing Data

Finding and Accessing Human Genomics Datasets

Manuel Corpas

This document summarizes a workshop about finding and accessing human genomic datasets. The workshop covered various data sources such as public repositories, case studies on accessing data from the University of Cambridge, and a demonstration of the Repositive platform which aims to simplify accessing genomic data through a single search. Hands-on sessions allowed participants to search for genomic data themes in small groups using Repositive and report their results. Overall the workshop aimed to educate researchers on challenges of accessing genomic data and introduce Repositive as a tool to help address fragmentation and simplify the workflow for discovering and accessing genomic datasets.

Dia sds2015 web version

Michael Brodie

2016 09 cxo forum

Chris Dwan

Life sciences big data use cases

Guy Coates

Genome sharing projects around the world nijmegen oct 29 - 2015

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure

Genome sharing projects across the world. Did you ever wonder what happened to the exponential increase in genome sequencing data? It is out there around the world, and a lot of it is consented for research use. This means that if you just know where to find the data, you can potentially analyse gigabytes of data to power your research. In this talk Fiona will present community genome initiatives and genome sharing projects across the world, how you can benefit from this wealth of data in your work, and how you can boost your academic career by sharing and collaborating. By Fiona Nielsen, founder and CEO of DNAdigest and Repositive. With a background in software development, Fiona pursued her career in bioinformatics research at Radboud University Nijmegen. Now a scientist-turned-entrepreneur, Fiona founded DNAdigest and its social enterprise spin-out Repositive Ltd. Both the charity and the company focus on efficient and ethical sharing of genetics data for research to accelerate diagnostics and cures for genetic diseases.

From Replication Crisis to Credibility Revolution

Koki Ikeda

The document discusses issues related to the replication crisis in psychology and potential solutions. It notes that questionable research practices like p-hacking and HARKing are common but unintentional. Solutions proposed include transparency through open science, pre-registration of studies including pre-reviews, direct replication of studies, and higher evidentiary standards. Institutional changes are also needed to incentivize practices like pre-registration and increasing acceptance of replication studies. While rigorous methods may initially lower productivity, they can increase it long-term by allowing easier reuse of materials and data and identifying reliable findings sooner.

Activities at the Royal Society of Chemistry to gather, extract and analyze b...

The phrase “Big Data” is generally used to describe a large volume of structured and/or unstructured data that cannot be processed using traditional database and software techniques. In the domain of chemistry, the Royal Society of Chemistry certainly hosts large structured databases of chemistry data, for example ChemSpider, as well as unstructured content in the form of our collection of scientific articles. Our research literature provides value to its readership and, to take one of our databases as an example, ChemSpider is currently accessed by many tens of thousands of scientists every day. But do these collections constitute “Big Data”, or is it the potential that lies within the collections that can contribute to the Big Data movement? This presentation will discuss our activities to contribute both data and service-based access to our data sets in support of grant-based projects such as the Innovative Medicines Initiative Open PHACTS project (to support drug discovery) and the PharmaSea initiative (to identify novel natural products from the ocean). We will also provide an overview of our activities to perform data mining of public patent collections and examine what can be done with the data. We are presently extracting physicochemical properties and textual forms of NMR spectra and, with the resulting data, are building predictive models (for melting points at present) and assembling a large NMR spectral database containing many hundreds of thousands of spectral-structure pairs. Our experiences to date have demonstrated that we are working at the edge of current algorithmic and computing capabilities for predictive model building, with over a quarter of a million melting points producing a matrix of over 200 billion descriptors. Our work to produce the NMR spectral database will necessitate batch processing of the data to examine consistency between the spectral-structure pairs, along with other forms of data validation.
The intention is to take the experience gained from this work on a public patent corpus and apply it to the RSC back file of publications to mine data and enable new paths to the discoverability of both data and the associated publications.

Workshop finding and accessing data - fiona - lunteren april 18 2016

Workshop presentation on finding and accessing human genomics data for research. Including statistics of publicly available data sources and tips on how to save time in your workflow of data access. Presented at BioSB2016, pre-conference PhD retreat for young researchers in bioinformatics and systems biology at Congrescentrum De Werelt in Lunteren. #BioSB2016 #BioSB16 Link to event: http://www.youngcb.nl/events/biosb-phd-retreat-2016/ Read more about my work: http://DNAdigest.org http://repositive.io https://uk.linkedin.com/in/fionanielsen

Workshop - finding and accessing data - Cambridge August 22 2016

U.S. Army Engineer Research and Development Center

Finding and accessing human genomic data for research. University of Cambridge, United Kingdom, Seminar Room G. Monday, 22 August 2016, from 10:00 to 12:00 (BST). Charlotte, Nadia and Fiona presented an overview of data sources around the world where you can find genomics data for your research and gave examples of the data access application process for dbGaP and EGA, with specific details relevant to University of Cambridge researchers.

CINECA webinar slides: Modular and reproducible workflows for federated molec...

CINECAProject

Genetic analysis of molecular traits such as gene expression, splicing and chromatin accessibility requires a number of complex analysis steps that can easily take weeks or months for an analyst to implement from scratch. In the CINECA project, we have developed a number of modular Nextflow workflows that standardise and automate these steps. In this webinar, we will give an overview of the CINECA workflows for genotype imputation, gene expression and splicing quantification, data normalisation and association testing, and demonstrate how these workflows can be used in a federated setting without transferring identifiable personal data between partners. The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing. This webinar took place on 10th November 2020 and is part of the CINECA webinar series. For previous and upcoming CINECA webinars see: https://www.cineca-project.eu/webinars

The fourth paradigm: data intensive scientific discovery - Jisc Digifest 2016

Jisc

There is broad recognition within the scientific community that the emerging data deluge will fundamentally alter disciplines in areas throughout academic research. A wide variety of researchers - from scientists and engineers to social scientists and humanities researchers - will require tools, technologies, and platforms that seamlessly integrate into standard scientific methodologies and processes. 'The fourth paradigm' refers to the data management techniques and the computational systems needed to manipulate, visualize, and manage large amounts of research data. This talk will illustrate the challenges researchers will face, the opportunities these changes will afford, and the resulting implications for data-intensive researchers. In addition, the talk will review the global movement towards open access, research repositories and open science and the importance of curation of digital data. The talk concludes with some comments on the research requirements for campus e-infrastructure and the end-to-end performance of the network.

Science 2.0: an illustration of good research practices in a real study

wolf vanpaemel

High Performance Computing and the Opportunity with Cognitive Technology

IBM Watson

With the ability to reduce “time to insight” and accelerate research breakthroughs by providing immense computational power, high performance computing (HPC) is becoming increasingly important in the marketplace. Meanwhile, cognitive technology has risen to prominence, similarly accelerating new insight, but through a very different approach: by analyzing previously ignored unstructured data, which accounts for 80% of new data created today. By combining the computing power of HPC with the machine learning, natural language processing, and even computer vision techniques found within cognitive technology, there is a huge opportunity to accelerate breakthroughs and enable better decision-making than ever before. Watch the replay of the webinar: https://www.youtube.com/watch?v=Hxgieboj3W0

In Silico Approaches for Predicting Hazards from Chemical Structure and Exist...

A high level overview of using artificial intelligence and chemical structure information to predict toxicity in various species. Discusses molecular docking, deep learning, quantitative structure activity relationships, Bayesian networks and cats (lots of cat pictures). Part of my artificial intelligence for national security, artificial intelligence for warfighter readiness, and alternative methods for toxicity prediction research portfolios.

Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...

Spark Summit

This document describes a project at Novartis to use Apache Spark for high-dimensional data analysis from drug screening. Large datasets from various screening technologies were analyzed using Spark pipelines for quality control, normalization, and classification. Visualizations were built using WebGL. The goals were to speed up multi-day batch jobs, create a unified analysis workflow, and build an application for scientists. Future work includes elastic infrastructure, supervised learning of cell phenotypes, and contributing methods to open source.

RNP support to data-driven research

Leandro Ciuffo

The document summarizes the results of a survey conducted in Brazil on researchers' practices and perceptions around open access to research data. Some key findings include:
- Most respondents were professors/researchers at public universities conducting research in fields like health sciences, biology and engineering.
- Researchers generate a variety of data types and formats, with estimated volumes mostly under 50 GB per year.
- While a majority of researchers are knowledgeable about research data management, top reasons for not sharing data include needing to publish first and lacking infrastructure for sharing.
- Next steps include publishing survey results, deploying a prototype research data repository, and recruiting beta testers.

Workshop finding and accessing data - fiona nadia charlotte - cambridge apr...