How to best manage your data and make the most of it for your research, with the ODAM framework (Open Data for Access and Mining): give open access to your data and make it ready to be mined.
DataONE Education Module 09: Analysis and Workflows (DataONE)
Lesson 9 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
DataONE Education Module 01: Why Data Management? (DataONE)
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
DataONE Education Module 03: Data Management Planning (DataONE)
Lesson 3 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Lesson 7 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Next-Generation Search Engines for Information Retrieval (Waqas Tariq)
In recent years there have been significant advances in scientific data management and retrieval, particularly in standards and protocols for archiving data and metadata. Scientific data is generally rich, hard to understand, and spread across different places. To integrate these pieces, a data archive and its associated metadata should be generated and stored in a format that is locatable, retrievable, and understandable; more importantly, it should remain accessible as technology changes, for example XML. New search technologies built around these protocols make searching easy, fast, and robust. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search for, share, and obtain spatiotemporal data used across a range of climate and ecological sciences.
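As a toy illustration of why a self-describing format such as XML stays usable as tools change, the snippet below writes and re-reads a small metadata record with Python's standard library. The field names are invented for the example, not taken from Mercury or any specific metadata standard:

```python
import xml.etree.ElementTree as ET

# A small, hypothetical metadata record (illustrative field names only).
record = """<metadata>
  <title>Soil moisture, site A, 2014</title>
  <creator>Example Lab</creator>
  <coverage lat="44.56" lon="-123.28"/>
  <format>text/csv</format>
</metadata>"""

root = ET.fromstring(record)
title = root.findtext("title")                 # locate by element name
coverage = root.find("coverage")
lat = float(coverage.get("lat"))               # attributes parse the same way
lon = float(coverage.get("lon"))
print(title, lat, lon)
```

Because the record carries its own structure, any XML-aware tool (not just the one that produced it) can locate the title and spatial coverage.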
Our regular Introduction to Data Management (DM) workshop (90-minutes). Covers very basic DM topics and concepts. Audience is graduate students from all disciplines. Most of the content is in the NOTES FIELD.
February 18 2015 NISO Virtual Conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Using data management plans as a research tool: an introduction to the DART Project
Amanda L. Whitmire, Ph.D., Assistant Professor, Data Management Specialist, Oregon State University Libraries & Press
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article
Carole Goble ,
Better Software, Better Research
Issue No.05 - Sept.-Oct. (2014 vol.18)
pp: 4-8
IEEE Computer Society
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
S. Venkataraman (DCC) talks about the basics of Research Data Management and how to apply this when creating or reviewing a Data Management Plan (DMP). He discusses data formats and metadata standards, persistent identifiers, licensing, controlled vocabularies and data repositories.
Link: dcc.ac.uk/resources
Crossing the Analytics Chasm and Getting the Models You Developed Deployed (Robert Grossman)
There are two cultures in data science and analytics: those who develop analytic models and those who deploy analytic models into operational systems. In this talk, we review the life cycle of analytic models and provide an overview of approaches that have been developed for managing analytic models and workflows and for deploying them, including analytic engines and analytic containers. We give a quick overview of languages for analytic models (PMML) and analytic workflows (PFA). We also describe the emerging discipline of AnalyticOps, which has borrowed some of the techniques of DevOps.
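The gap between the two cultures is usually bridged by exporting a model into a portable description that a separate engine can score without the training code. The sketch below illustrates that pattern only; the document format is invented for the example and does not follow the actual PMML or PFA specifications:

```python
import json

# Developer side: fit a trivial least-squares line through the origin and
# export it as a language-neutral document (hypothetical format).
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
model_doc = json.dumps({"type": "linear", "slope": slope, "intercept": 0.0})

# Operations side: an "analytic engine" that understands only the exported
# document, never the training code or its environment.
def score(doc: str, x: float) -> float:
    m = json.loads(doc)
    if m["type"] != "linear":
        raise ValueError("unsupported model type")
    return m["slope"] * x + m["intercept"]

print(score(model_doc, 10.0))  # → 20.0
```

The point of standards like PMML and PFA is exactly this separation: the deployment side needs a spec-compliant engine, not the data scientist's toolchain.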
What is Data Commons and How Can Your Organization Build One? (Robert Grossman)
This is a talk that I gave at the Molecular Medicine Tri Conference on data commons and data sharing to accelerate research discoveries and improve patient outcomes. It also covers how your organization can build a data commons using the Open Commons Consortium's Data Commons Framework and the University of Chicago's Gen3 data commons platform.
This is a talk that I gave at BioIT World West on March 12, 2019, called: A Gen3 Perspective of Disparate Data: From Pipelines in Data Commons to AI in Data Ecosystems.
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data... (Databricks)
Chesapeake Regional Information System for our Patients (CRISP) is a nonprofit healthcare information exchange (HIE) whose customers include states like Maryland and healthcare providers such as Johns Hopkins. CRISP’s work supports the local healthcare community by securely sharing the kind of data that facilitates care and improves health outcomes.
When the pandemic started, the Maryland Department of Health reached out to CRISP with a request: Get us the demographic data we need to track COVID-19 and proactively support our communities. As a result, CRISP employees spent long hours attempting to handle multiple data sources with complex data enrichment processes. To automate these requests, CRISP partnered with Slalom to build a data platform powered by Databricks and Delta Lake.
Using the power of the Databricks Lakehouse platform and the flexibility of Delta Lake, Slalom helped CRISP provide the Maryland Department of Health with near real-time reporting of key COVID-19 measures. With this information, Maryland has been able to track the path of the pandemic, target the locations of new testing sites, and ultimately improve access for vulnerable communities.
The work did not stop there: once CRISP’s customers saw the value of the platform, more requests started coming in. Now, nearly one year since the platform was created, CRISP has processed billions of records from hundreds of data sources in an effort to combat the pandemic. Notable outcomes from the work include hourly contact tracing with data already cross-referenced for individual risk factors, automated reporting on COVID-19 hospitalizations, real-time ICU capacity reporting for EMTs, tracking of COVID-19 patterns in student populations, tracking of the vaccination campaign, connecting Maryland MCOs to vulnerable people who need to be prioritized for the vaccine, and analysis of the impact of COVID-19 on pregnancies.
Lesson 2 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license; attribution and citation requested.
Big data analytics is the process of examining large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits. Enterprises are increasingly looking for actionable insights in their data, and many big data projects originate from the need to answer specific business questions. With the right big data analytics platforms in place, an enterprise can boost sales, increase efficiency, and improve operations, customer service and risk management. Notably, the business area getting the most attention is increasing efficiency and optimizing operations. Big data analytics lets you extract only the relevant information from terabytes, petabytes and exabytes of data and analyse it to transform your business decisions. Becoming proactive with big data analytics isn't a one-time endeavour; it is a culture change, a new way of gaining ground.
Keywords: business, analytics, exabytes, efficiency, data sets
This is an overview of the Data Biosphere Project, its goals, its architecture, and the three core projects that form its foundation. We also discuss data commons.
FAIRy stories: the FAIR Data principles in theory and in practice (Carole Goble)
FAIRy stories: the FAIR Data principles in theory and in practice
The ‘FAIR Guiding Principles for scientific data management and stewardship’ [1] launched a global dialogue within research and policy communities and started a journey toward wider accessibility and reusability of data and preparedness for automation-readiness (I am one of the army of authors). Over the past 5 years FAIR has become a movement, a mantra and a methodology for scientific research, and increasingly in the commercial and public sector. FAIR is now part of NIH, European Commission and OECD policy. But just figuring out what the FAIR principles really mean and how to implement them has proved more challenging than one might have guessed. To quote the novelist Rick Riordan: “Fairness does not mean everyone gets the same. Fairness means everyone gets what they need.”
As a data infrastructure wrangler I lead and participate in projects implementing forms of FAIR in pan-national European biomedical Research Infrastructures. We apply web-based, industry-led approaches like Schema.org; work with big pharma on specialised FAIRification pipelines for legacy data; promote FAIR by Design methodologies and platforms in the research lab; and expand the principles of FAIR beyond data to computational workflows and digital objects. Many use Linked Data approaches.
In this talk I’ll use some of these projects to shine some light on the FAIR movement. Spoiler alert: although there are technical issues, the greatest challenges are social. FAIR is a team sport. Knowledge Graphs play a role – not just as consumers of FAIR data but as active contributors. To paraphrase another novelist, “It is a truth universally acknowledged that a Knowledge Graph must be in want of FAIR data.”
[1] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
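One concrete flavour of the web-based approaches mentioned in the abstract is Schema.org markup for datasets. A minimal example follows; the property names come from the public Schema.org vocabulary, but every value is invented for illustration:

```python
import json

# Hypothetical Dataset description using Schema.org property names.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example metabolite concentrations",
    "description": "Illustrative record only, not a real dataset.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "keywords": ["metabolomics", "FAIR", "example"],
}

# Embedded in a web page as JSON-LD, this is what dataset search
# engines harvest to make the data Findable.
print(json.dumps(dataset, indent=2))
```

The "F" in FAIR largely comes down to publishing records like this at stable URLs so that both humans and crawlers can discover them.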
Guest speaker at the 2nd national-level webinar "Big Data Driven Solutions to Combat Covid 19", 4 July 2020, Ethiraj College for Women (Auto), Chennai.
GBIF and reuse of research data, Bergen (2016-12-14) (Dag Endresen)
Biodiversity informatics seminar at the Department of Biology, University of Bergen on data publication and reuse of GBIF-mediated biodiversity data on 14th December 2016. Organized by the Norwegian GBIF Node and the Norwegian Biodiversity Information Center (NBIC, Artsdatabanken).
See also: http://www.gbif.no/events/2016/data-publishing-seminar-in-bergen.html
See also: http://doi.org/10.13140/RG.2.2.24290.32969
Meeting the NSF DMP Requirement, June 13, 2012 (IUPUI)
June 13 version of the IUPUI workshop Meeting the NSF Data Management Plan Requirement: What you need to know. This workshop is co-sponsored by the Office of the Vice Chancellor for Research and the University Library.
Most of the time, when you hear about Artificial Intelligence (AI), people talk about new algorithms or the computational power needed to train them. But data is one of the most important factors in AI.
Research Data (and Software) Management at Imperial: (Everything you need to ... (Sarah Anna Stewart)
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Being FAIR: FAIR data and model management, SSBSS 2017 Summer School (Carole Goble)
Lecture 1:
Being FAIR: FAIR data and model management
In recent years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and workflows. The “FAIR” (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship [1] have proved to be an effective rallying cry. Funding agencies expect data (and, increasingly, software) management, retention and access plans. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. The multi-component, multi-disciplinary nature of Systems and Synthetic Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Our FAIRDOM project (http://www.fair-dom.org) supports Systems Biology research projects with their research data, methods and model management, with an emphasis on standards smuggled in by stealth and sensitivity to asset sharing and credit anxiety. The FAIRDOM Platform has been installed by over 30 labs or projects. Our public, centrally hosted Asset Commons, the FAIRDOMHub.org, supports the outcomes of 50+ projects.
Now established as a grassroots association, FAIRDOM has over 8 years of experience of practical asset sharing and data infrastructure at the researcher coal-face ranging across European programmes (SysMO and ERASysAPP ERANets), national initiatives (Germany's de.NBI and Systems Medicine of the Liver; Norway's Digital Life) and European Research Infrastructures (ISBE) as well as in PI's labs and Centres such as the SynBioChem Centre at Manchester.
In this talk I will explore how FAIRDOM has been designed to support Systems Biology projects, show examples of its configuration and use, and explore the technical and social challenges we face.
I will also refer to European efforts to support public archives for the life sciences. ELIXIR (http://www.elixir-europe.org/) is the European Research Infrastructure of 21 national nodes and a hub, funded by national agreements to coordinate and sustain key data repositories and archives for the life-science community, improve access to them and related tools, support training, and create a platform for dataset interoperability. As Head of the ELIXIR-UK Node and co-lead of the ELIXIR Interoperability Platform, I will show how this work relates to your projects.
[1] Wilkinson, M. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
A talk outlining the virtues and processes of Research Data Management for PhD students in the geosciences. Given by Stuart Macdonald at the Introduction to RDM Workshop, School of Geosciences, University of Edinburgh, on 2 November 2015
Spring 2014 Data Management Lab: Session 2 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)
What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.
Metadata management for data storage spaces:
INDEXATOR is a metadata management tool that addresses the problems of organising, documenting, storing and sharing data in a research unit or infrastructure, and fits naturally into a collective's data management plan.
The central idea is that the storage space becomes the data repository, so the metadata should go to the data and not the other way around.
Given the diversity of domains, the chosen approach is to be as flexible and pragmatic as possible, allowing each collective to choose its own controlled vocabulary corresponding to the reality of its field and activities. The main idea is to "capture" the user's metadata as easily as possible using their own vocabulary; the whole terminology can be defined in a spreadsheet.
The JSON format was chosen because it is well suited to describing metadata and is readable by both humans and machines.
The tool is built around a web interface coupled with a MongoDB database. The web interface allows you to (i) describe a dataset using metadata of various types (Description) and (ii) search datasets by their metadata (Accessibility).
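As a rough sketch of the describe-and-search idea (illustrative only, not INDEXATOR's actual code or schema), JSON metadata records can be filtered on controlled-vocabulary fields in the style of a MongoDB find query:

```python
import json

# Hypothetical records using a lab-chosen controlled vocabulary.
records = [
    {"dataset": "run-01", "organism": "tomato", "assay": "NMR"},
    {"dataset": "run-02", "organism": "tomato", "assay": "LC-MS"},
    {"dataset": "run-03", "organism": "grape",  "assay": "NMR"},
]

def search(records, **criteria):
    """Return records matching every key/value pair (MongoDB-style exact match)."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

hits = search(records, organism="tomato", assay="NMR")
print(json.dumps(hits))
```

In a real deployment the list would be a MongoDB collection and `search` a `find()` call; the point is that once metadata is captured as JSON documents, dataset discovery reduces to simple field matching.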
Data Management in the context of Open Science.
Because open access has become mandatory for publications and project-funded research data, it is the responsibility of each researcher to stay informed and be trained in the new practices.
BioStatFlow is a web application for analyzing "omics" data, including metabolomics data, with statistical methods.
BioStatFlow is available online: http://biostatflow.org
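BioStatFlow's internals are not shown here, but a typical first statistical step on an omics table, PCA on a samples-by-variables matrix, can be sketched with NumPy (toy data, not BioStatFlow's implementation):

```python
import numpy as np

# Toy data: 4 samples x 3 metabolite intensities (invented values,
# deliberately close to a single underlying trend).
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.2],
              [0.9, 1.8, 2.9],
              [3.0, 6.1, 9.1]])

Xc = X - X.mean(axis=0)                  # center each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                           # sample coordinates on the PCs
explained = s**2 / np.sum(s**2)          # fraction of variance per PC

print(np.round(explained, 3))            # PC1 dominates this toy matrix
```

Plotting the first two columns of `scores` gives the usual PCA scores plot used to spot sample groupings and outliers in metabolomics.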
ODAM is an Experiment Data Table Management System (EDTMS) that gives you open access to your data and makes it ready to be mined, with a data explorer as a bonus.
Spectra processing is crucial in metabolomics approaches, especially for proton NMR metabolomic profiling, since each processing step may impact the following steps. Among the different processing steps, data reduction (binning or bucketing) strongly impacts subsequent statistical data analysis and potential biomarker discovery. Based on a recently published work, we propose an improved method of data reduction, called ERVA which stands for Extraction of Relevant Variables for Analysis. This new method, by providing buckets centred on resonance peaks and rid of any non-significant signal, helps to recover the chemical fingerprints of metabolites. Moreover, we take advantage of the concentration variability of each compound from a series of samples of a complex mixture, to highlight chemical information. This is performed by linking the buckets into clusters based on significant correlations, thus bringing a helpful support for compound identification. As a proof of concept, this new method has been applied to a tomato 1H-NMR dataset to test its ability to recover fruit extract composition.
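The two ideas in this abstract, peak-centred buckets and linking buckets through correlations across samples, can be sketched with NumPy. This is a toy illustration with invented data, not the published ERVA implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20

# Toy bucket intensities across samples: buckets 0 and 1 come from the
# same (hypothetical) metabolite, so they scale together with its
# concentration; buckets 2 and 3 vary independently.
conc_a = rng.uniform(1, 5, n_samples)
conc_b = rng.uniform(1, 5, n_samples)
buckets = np.stack([conc_a * 1.0,                   # bucket 0: metabolite A
                    conc_a * 0.6,                   # bucket 1: metabolite A
                    conc_b * 1.0,                   # bucket 2: metabolite B
                    rng.uniform(1, 5, n_samples)])  # bucket 3: noise

corr = np.corrcoef(buckets)

# Link buckets whose intensities correlate strongly across samples,
# mimicking the recovery of a compound's chemical fingerprint.
linked = corr[0, 1] > 0.9
print(linked)
```

Exploiting concentration variability this way is what lets correlated buckets be grouped into candidate compound fingerprints before identification.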
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and climate affected?
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In cancer cells:
Unlike healthy cells that "burn" the entire sugar molecule to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis, and frequently do not complete the second step, oxidative phosphorylation.
This yields only 2 molecules of ATP per glucose molecule instead of the roughly 36 ATP that healthy cells gain. As a result, cancer cells need to use far more sugar molecules to get enough energy to survive.
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
introduction to WARBERG PHENOMENA:
WARBURG EFFECT Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose than do normal cells from outside.
Otto Heinrich Warburg (; 8 October 1883 – 1 August 1970) In 1931 was awarded the Nobel Prize in Physiology for his "discovery of the nature and mode of action of the respiratory enzyme.
WARNBURG EFFECT : cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is highly conserved process of posttranscriptional gene silencing by which double stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) is reported in a wide range of eukaryotes ranging from worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi ( non coding RNA)
MiRNA
Length (23-25 nt)
Trans acting
Binds with target MRNA in mismatch
Translation inhibition
Si RNA
Length 21 nt.
Cis acting
Bind with target Mrna in perfect complementary sequence
Piwi-RNA
Length ; 25 to 36 nt.
Expressed in Germ Cells
Regulates trnasposomes activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is large(>500kD) RNA multi- protein Binding complex which triggers MRNA degradation in response to MRNA
Unwinding of double stranded Si RNA by ATP independent Helicase
Active component of RISC is Ago proteins( ENDONUCLEASE) which cleave target MRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1.PAZ(PIWI/Argonaute/ Zwille)- Recognition of target MRNA
2.PIWI (p-element induced wimpy Testis)- breaks Phosphodiester bond of mRNA.)RNAse H activity.
MiRNA:
The Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression .
This pdf is about the Schizophrenia.
For more details visit on YouTube; @SELF-EXPLANATORY;
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks...!
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool utilized to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been accomplished using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows for the ultra-fast high-resolution imaging of cellular processes over time and space and were studied in its natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provide insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enables researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction as well as vascularization and tumor metastasis with exceptional detail. This webinar will also give an overview of IVM being utilized in drug development, offering a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows for the evaluation of therapeutic intervention in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancements of novel therapeutic strategies.
Seminar of U.V. Spectroscopy by SAMIR PANDASAMIR PANDA
Spectroscopy is a branch of science dealing the study of interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption spectroscopy or reflect spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light received by the analyte.
1. Daniel Jacob – INRA - 2018
How to best manage your data
to make the most of it for your research
Make your data great now
Give an open access to your data
and make them ready to be mined
Open Data for Access and Mining
ODAM Framework
Daniel Jacob
INRA UMR 1332 BFP – Metabolism Group
Bordeaux Metabolomics Facility
Oct 2018
2.
DATA Studies
Project
During a research project
Know-how Knowledge
Input Output
3.
What do they become?
Among the possible scenarios, two are extremes:
• Nothing! The data sit on some disk space (until the disk dies!)
• Creation of a comprehensive database managing all
data and metadata in their entirety, associated with a
visualization and querying interface.
Expected objectives
After the project is completed
DATA Studies
Project
4.
Expected objectives
Scientific Data Repositories
Enrichment
Expected links
DATA Studies
Project
Publishing policies
http://www.omicsdi.org/ …
5.
Raw Data
Processed
data
Analyzed
data
Published
data
Processed data are raw
data that have been processed so
that they highlight
certain features, i.e. the
types of variables linked to
the focus of the study.
Raw data
Information
(often partial)
Open access
Specialized Data Repositories
Data flow Concerned data
Experiment
Data Tables
Know-how Knowledge
Annotation, Curation, Validation
Partly not automatically
reproducible because it
requires human expertise
6.
Open Data
Accessible data,
including documents that cannot be
understood or exploited by machines
(programmatically)
Open API
Queryable data
according to the imposed API scheme
Metadata
Application Program Interface
PUBLISH DATA "5 STARS" THE FAIR DATA PRINCIPLES
7.
Data capture Using data
As simple as possible, and in a normalized way
As far as possible, the most appropriate choice seems to be to keep the
scientist's familiar worksheets,
while allowing other, more efficient approaches, throughout the data flow
Producers
are scientists
Consumers
are scientists
8.
Before the project begins
After the project is completed
Future
Data flow
DATA
A data management plan or DMP is a formal
document that outlines how data are to be handled
both during a research project, and after the project is
completed.
The goal of a data management plan is to consider the
many aspects of data management, metadata
generation, data preservation, and analysis before the
project begins.
This ensures that data are well managed in the
present, and prepared for preservation in the future.
Description by metadata (Ontology / Controlled Vocabulary)
Data capturing, data formatting, data linking
Implying data archaeology after several months / years
https://dmp.opidor.fr/
https://www6.inra.fr/datapartage/
Data management
Publishing policies
9.
During a research project
Know-how
Knowledge
Project
Data flow
DATA
Data / Metadata
Data Mining Modeling
Find out
“biomarkers”
Explain data
Data/Metadata Exploration
Data Visualization
Data mining / Modeling
Data exploration: Descriptive statistics
A first glimpse of the data that can show trends.
Allows the data to be well characterized, which is
necessary in order to choose how to analyze them.
Repetition of multiple scenarios on
different subsets of data
Selection of subsets of data
Implying lots of data manipulation
data capturing, data formatting, data linking,
data import / data export
Linking both metadata and data for data
mining
Data processing
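The exploration loop sketched on this slide (select a subset, compute descriptive statistics per factor level, repeat on other subsets) can be illustrated in a few lines. This is a minimal sketch in Python with invented sample values; the deck's own tooling is R-based:

```python
from statistics import mean, stdev

# Hypothetical rows from an experiment data table: one dict per sample.
rows = [
    {"SampleID": "S1", "Treatment": "Control", "FruitFW": 10.2},
    {"SampleID": "S2", "Treatment": "Control", "FruitFW": 11.0},
    {"SampleID": "S3", "Treatment": "Stress",  "FruitFW": 7.9},
    {"SampleID": "S4", "Treatment": "Stress",  "FruitFW": 8.4},
]

def describe(rows, factor, variable):
    """Group a quantitative variable by the levels of a factor and
    return simple descriptive statistics per level."""
    groups = {}
    for r in rows:
        groups.setdefault(r[factor], []).append(r[variable])
    return {level: {"n": len(v), "mean": mean(v), "sd": stdev(v)}
            for level, v in groups.items()}

stats = describe(rows, "Treatment", "FruitFW")
```

Repeating a scenario on a different subset then amounts to calling `describe` again with another factor or variable, rather than rebuilding a worksheet by hand.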
10.
Before the project begins
After the project is completed
Future
During a research project
Know-how
KnowledgeProject
Data flow
2
1 2
1
Data management
Time is clearly explicit
Data processing
Time is often implicit
Reduce data manipulation
Data sharing & data availability
Facilitate the subsequent data mining
Facilitate the data dissemination
Make the two axes consistent:
Motivations
How ?
DATA
The "data life" must therefore be integrated
into the scientific research process
11.
seeding harvesting
samples
preparation
samples analysis
Sample
identifiers
Experiment
Data Tables
Experiment Design
Make both metadata and data
available for data mining
Several operators,
techniques, data
types, SOPs, …
Each time we plan to share data coming from a common experimental
design, the classic challenges for making the data quickly usable by every
partner are data storage and data access
Several partners
Use-Case
“Metabolism”
Research question Project Experiment Experimental set-up
During a research project
Plant Metabolism
• Systems Biology
• Biomarkers
associated with plant
performance
12.
Whatever the kind of experiment, this assumes
a design of experiment (DoE) involving
individuals, samples, or other entities as the
main objects of study (e.g. plants, tissues,
bacteria, …).
This also assumes the observation of dependent
variables resulting from the effects of some
controlled experimental factors.
Moreover, the objects of study usually each have
an identifier, and the variables
can be quantitative or qualitative.
Promote good practices
samples : Sample features
Data capture
The experimental context: needs / wishes
seeding harvesting
samples
preparation
Sample
identifiers
Experiment Design (DoE)
samples analysis
Use-Case “Metabolism”
identifier factors Quantitative Qualitative
Data
Promote non-proprietary
format like CSV or TSV
13.
Promote good practices
samples : Sample features
Data capture
The experimental context: needs / wishes
seeding harvesting
samples
preparation
Sample
identifiers
Experiment Design (DoE)
samples analysis
Use-Case “Metabolism”
Shortname     | Description                              | Unit                     | Type
SampleID      | Pool of several harvests                 |                          | Identifier
Treatment     | Treatment applied to plants              |                          | Factor
DevStage      | Fruit development stage                  |                          | Factor
FruitAge      | Fruit age                                | days post-anthesis (dpa) | Factor
FruitDiameter | Fruit diameter                           | mm                       | Variable
FruitHeight   | Fruit height                             | mm                       | Variable
FruitFW       | Fruit fresh weight                       | g                        | Variable
Rank          | Row of the individual plant on the table |                          | Feature
Truss         | Position of the truss on the stem        |                          | Feature
Description of the different
columns within data files
Metadata
Data
Promote non-proprietary
format like CSV or TSV
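A data subset in this scheme is just a flat, non-proprietary TSV file whose first row names the columns. A minimal sketch of reading one with Python's standard library (the file content and column names below are illustrative, echoing the sample-features table; the deck's own tooling is R-based):

```python
import csv
import io

# Illustrative content of a samples TSV data subset (tab-separated).
samples_tsv = (
    "SampleID\tTreatment\tFruitAge\tFruitFW\n"
    "S1\tControl\t20\t10.2\n"
    "S2\tStress\t20\t8.4\n"
)

# csv.DictReader handles the non-proprietary TSV format promoted above.
reader = csv.DictReader(io.StringIO(samples_tsv), delimiter="\t")
rows = list(reader)

# Quantitative variables arrive as strings and must be cast explicitly.
fresh_weights = [float(r["FruitFW"]) for r in rows]
```

Because the format is plain TSV, the same file remains readable in a spreadsheet, keeping the scientist's familiar worksheet workflow intact.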
14.
Experiment
Data Tables
Data storage
drag & drop
No database schema, no programming code and no additional configuration on the server side.
Data capture Minimal effort (PUT)
Merely dropping data files in a data
storage (e.g. a local NAS or remote
storage space)
PUT
Data capture Using Data
The "core idea"
(See Good Practices)
Data center
mount
Data can be downloaded,
explored and mined
Data analysis / mining
Maximum efficiency (GET)
http://myhost.org/
GET
As simple as possible, and in a normalized way
15.
Experiment
Data Tables
Data storage
+2 metadata files
drag & drop
No database schema, no programming code and no additional configuration on the server side.
Data capture Minimal effort (PUT)
Merely dropping data files in a data
storage (e.g. a local NAS or remote
storage space)
Web API
Data center
mount
Data can be downloaded,
explored and mined
Data analysis / mining
Maximum efficiency (GET)
http://myhost.org/
GET
PUT
Data capture Using Data
EDTMS
Experiment Data Tables Management System
(EDTMS)
Implementation
FAIR: Interoperable
16.
s_subsets.tsv This metadata file associates a key concept with each data subset file EDTMS
Metadata files
In order to allow data to be explored and mined, we have to adjoin some minimal but relevant metadata:
There must be a relationship between object types, which we assume to be of the “obtainedFrom" type.
Linking two tables together implies a common attribute, i.e. an identifier in most cases.
Optional(*)
(*) in fact, rather deferred
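The “obtainedFrom" relationship amounts to a foreign-key join: each row of a child subset carries the identifier of the parent row it was obtained from. A hedged Python sketch of such a join (table contents and column names are invented for illustration; the actual merging is done server-side by the framework):

```python
# Parent subset: one row per plant, keyed by PlantID.
plants = [
    {"PlantID": "P1", "Treatment": "Control"},
    {"PlantID": "P2", "Treatment": "Stress"},
]

# Child subset: samples, each "obtainedFrom" a plant via the shared PlantID.
samples = [
    {"SampleID": "S1", "PlantID": "P1", "FruitFW": 10.2},
    {"SampleID": "S2", "PlantID": "P2", "FruitFW": 8.4},
]

def merge_obtained_from(parent, child, key):
    """Merge a child subset with its parent along the common identifier,
    as implied by an 'obtainedFrom' link between the two tables."""
    index = {row[key]: row for row in parent}
    return [{**index[row[key]], **row} for row in child]

merged = merge_obtained_from(plants, samples, "PlantID")
```

Each merged row then carries both the sample measurements and the experimental factors of the plant it came from, which is exactly what statistical analysis needs.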
17.
a_attributes.tsv This metadata file allows each attribute (variable) to be annotated with
some minimal but relevant metadata
factor
quantitative
qualitative
identifier
categories
EDTMS
Metadata files
In order to allow data to be explored and mined, we have to adjoin some minimal but relevant metadata:
Plants
Harvests
Samples
Compounds
…
…
Optional (*)
Good Practices
Description of the different columns within data files
(*) in fact, rather deferred
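An a_attributes.tsv file can drive generic tooling: knowing whether a column is an identifier, a factor, or a quantitative variable tells a program how to parse and check it. A small Python sketch (the attribute names and the coerce helper are illustrative, not part of ODAM):

```python
# Illustrative attribute annotations, as a_attributes.tsv would declare them.
attributes = {
    "SampleID":  "identifier",
    "Treatment": "factor",
    "FruitFW":   "quantitative",
}

def coerce(row, attributes):
    """Cast each value according to its declared attribute type:
    quantitative values become floats, everything else stays a string."""
    return {k: (float(v) if attributes.get(k) == "quantitative" else v)
            for k, v in row.items()}

row = coerce({"SampleID": "S1", "Treatment": "Control", "FruitFW": "10.2"},
             attributes)
```

This is why the metadata files are "minimal but relevant": a handful of typed annotations is enough to make any data subset self-describing for downstream tools.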
18.
s_subsets.tsv
a_attributes.tsv
…
…
EDTMS
Time
Data
Make both data flows consistent
Additional subsets can be added step by step, as soon as data are produced.
Metadata files some minimal but relevant metadata
19.
Using Data
FRIM1: Fruit Integrative Modelling http://www.erasysbio.net/index.php?index=266
Dataset example
20.
Using Data
FRIM1: Fruit Integrative Modelling
21.
Using Data
FRIM1: Fruit Integrative Modelling
22.
Metadata files
In order to allow data to be explored and
mined, we have to adjoin some minimal
but relevant metadata
Using Data
FRIM1: Fruit Integrative Modelling
23.
Using Data
http://myhost.org/
Application
Programming
Interface
FAIR: Interoperable
EDTMS
GET http://myhost.org/getdata/<data format>/<dataset name>/< … >/< … >
REST Services: hierarchical tree of resource naming (URL)
Metadata files
In order to allow data to be explored and
mined, we have to adjoin some minimal
but relevant metadata
http://pmb-bordeaux.fr/odamsw/
Data Emancipation
24.
plants samples activome qNMR_metabo
Identifiers
https://pmb-bordeaux.fr/getdata/xml/frim1/(activome,qNMR_metabo)/treatment/Control
Retrieval of data subsets by merging all the subsets of lower rank than the specified
subsets, following the path defined by the “obtainedFrom" links.
Avoids lots of data
manipulation
Facilitates linking both
metadata and data for
data mining
(activome,qNMR_metabo) = plants + samples
+ (aliquots+activome)
+ (pools+qNMR_metabo)
FRIM1
Application
Programming
Interface
EDTMS
http://pmb-bordeaux.fr/odamsw/
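The REST scheme above is a hierarchical path: base URL, then getdata, the output format, the dataset name, and optional subset/attribute/value segments. A sketch of composing such URLs in Python (the URL pattern follows the slide; the helper function itself is hypothetical, not part of ODAM):

```python
def getdata_url(base, fmt, dataset, *segments):
    """Compose a getdata query URL following the hierarchical
    /getdata/<format>/<dataset>/<...> resource-naming scheme."""
    parts = [base.rstrip("/"), "getdata", fmt, dataset, *segments]
    return "/".join(parts)

# Reproduces the example query shown on the slide.
url = getdata_url("https://pmb-bordeaux.fr", "xml", "frim1",
                  "(activome,qNMR_metabo)", "treatment", "Control")
```

Because the scheme is purely positional, any HTTP client (a browser, curl, or the Rodam package) can build and issue the same query.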
25.
Metadata files
In order to allow data to be explored and
mined, we have to adjoin some minimal
but relevant metadata
Using Data
FAIR: Interoperable
EDTMS
Develop, if needed, lightweight tools:
- R scripts (Galaxy), lightweight GUIs
(R Shiny)
Tools
Data emancipation
regarding Tools
Data API Tools
Data
http://myhost.org/
Application
Programming
Interface
Data Emancipation
26.
https://pmb-bordeaux.fr/dataexplorer/?ds=frim1
FRIM1 Example online
R shiny
Visual data exploration
a first key step for deeper analyses
https://pmb-bordeaux.fr/dataexplorer/
Using Data
EDTMS
27.
FRIM1
Metadata files
In order to allow data to be explored and
mined, we have to adjoin some minimal
but relevant metadata
Using Data
EDTMS
32.
FRIM1
Export as …
Using Data
As far as possible,
keep the old way of
using the scientist's
worksheets …
33.
The R package
Rodam
Copy-Paste
The Comprehensive R
Archive Network
https://cran.r-project.org
FRIM1 Using Data
… while allowing
a way to be more
efficient ...
34.
The R package
Rodam
Selection of subsets of data
Repetition of multiple scenarios on
different subsets of data
Data mining / Modeling https://cran.r-project.org/web/packages/Rodam/index.html
The Comprehensive R Archive Network Using Data
35.
R markdown
knitr
Reproducible Research … with R and RStudio
R code
https://rmarkdown.rstudio.com/authoring_quick_tour.html
The R package
Rodam
EDTMS
ODAM Framework
36.
Reproducible Research … with R and RStudio
“How you gather your data directly impacts how reproducible your research will be.
If all of your data gathering steps are tied together by your source code, then independent
researchers (and you) can more easily regather the data“
II. 6 - Gathering Data with R
II. 7 - Preparing Data for Analysis
“Once we have gathered the raw data that we want to include in our statistical analyses
we generally need to clean it up so that it can be merged into a single data file.”
https://englianhu.files.wordpress.com/2016/01/reproducible-research-with-r-and-studio-2nd-edition.pdf
This is exactly what the ODAM framework aims to address, in a
normalized way, as easily and as quickly as possible
Chap II Data Gathering and Storage (70 pages out of 300)
Christopher Gandrud (2015)
https://github.com/christophergandrud/Rep-Res-Book
37.
https://pmb-bordeaux.fr/dataexplorer/?ds=frim1
doi:10.15454/95JUTK
https://data.inra.fr/
FAIR: Findable
Data Dissemination
R scripts (Rmd) If applicable
38.
Data as the subject of a paper
Data Paper
The Data Paper describes the data
It includes the associated descriptive elements (metadata)
and all the technical information (methods, formulas, software
applications, …) needed to understand how the data were
obtained and to enable their reuse by other scientists
https://www6.inra.fr/datapartage/Partager-Publier/Valoriser-ses-donnees/Publier-un-Data-Paper
This tool allows you to generate a draft data paper (scientific publication describing
a dataset) from the DOI of a dataset deposited in the data.inra.fr portal
A draft data paper
https://data.inra.fr/datapartage-datapapers-web/
39.
Make your data great now
All the actors in the data acquisition chain must be convinced that
the data repository brings added value, all the more so if data
producers deposit their data as soon as possible, i.e.:
• Reduce data manipulation
• Data sharing & data availability
• Facilitate the subsequent data mining
• Facilitate the reproducible research
• Facilitate the data dissemination
• Assistance with decision-making in the selection of samples, in
the choice of additional analyses,
• Assistance with annotation by cross-referencing, a priori
knowledge input,
• etc.
Make the two axes consistent
Data processing
Data management
This implies that (bio)computer scientists
• propose useful and/or innovative tools
• to motivate and convince researchers to submit
their data as soon as possible.
The data management system becomes
completely independent of data usage.
One dataset Several applications
&
One application Several datasets
The "data life" must therefore be integrated into
the scientific research process
40.
Need to take care of data
Take-home message
Thank you for your attention
Have fun!
Open Data for Access and Mining
https://hub.docker.com/r/odam/getdata/
http://pmb-bordeaux.fr/dataexplorer/
https://github.com/INRA/ODAM
https://cran.r-project.org/package=Rodam
http://pmb-bordeaux.fr/odamsw/
Editor's Notes
Presentation of both a piece of work and a reflection on data management,
namely how to get the most out of data at every stage of its life cycle.
Schematic view of a project, from the data standpoint, in terms of inputs / outputs
Time is more often implicit
Time is clearly explicit
As far as possible, keep the old way of using the scientist's worksheets,
https://englianhu.files.wordpress.com/2016/01/reproducible-research-with-r-and-studio-2nd-edition.pdf
Chap 6 - Gathering Data with R (p109, PDF p138)
How you gather your data directly impacts how reproducible your research will be. You should try your best to document every step of your data gathering process. Reproduction will be easier if your documentation (especially variable descriptions and source code) makes it easy for you and others to understand what you have done. If all of your data gathering steps are tied together by your source code, then independent researchers (and you) can more easily regather the data. Regathering data will be easiest if running your code allows you to get all the way back to the raw data files –the rawer the better.
Chap 7 - Preparing Data for Analysis (p129, PDF p158)
Once we have gathered the raw data that we want to include in our statistical analyses we generally need to clean it up so that it can be merged into a single data file.