Large-scale integration of data and text

This document discusses large-scale integration of biological data and text mining. It covers association networks that connect entities through "guilt by association", protein networks in STRING spanning 2000+ genomes, and genomic context evidence such as gene fusion, gene neighborhood, and phylogenetic profiles. It then gives examples of querying protein networks in STRING and discusses text-mining challenges, including the exponential growth of the literature and the limitations of current natural language processing. Finally, it describes the Jensen Lab's approach of integrating curated knowledge, experimental data, predictions, and data from databases like STRING, STITCH, PubChem, COMPARTMENTS, Gene Ontology, UniProtKB, and disease databases into a common framework with

Large-scale integration of data and text
Lars Juhl Jensen

Korbel et al., Nature Biotechnology, 2004

Beyer et al., Nature Reviews Genetics, 2007

Letunic & Bork, Trends in Biochemical Sciences, 2008

von Mering et al., Nucleic Acids Research, 2005

Franceschini et al., Nucleic Acids Research, 2013

Exercise 1
Query STRING for human TYMS
Show network in confidence mode
Show up to 20 interaction partners
Show only experimental evidence
Show also low-confidence links

what you learned in school
pronoun pronoun verb preposition noun

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
Saric et al., Proceedings of ACL, 2004

Szklarczyk et al., Nucleic Acids Research, 2015 (string-db.org)

Kuhn et al., Nucleic Acids Research, 2014 (stitch-db.org)

Binder et al., Database, 2014 (compartments.jensenlab.org)

Santos et al., submitted, 2015 (tissues.jensenlab.org)

Frankild et al., Methods, 2015 (diseases.jensenlab.org)

Exercise 2
Find TYMS-related diseases
http://diseases.jensenlab.org
Find some inhibitors of TYMS
http://stitch-db.org
Assess their tissue specificity
http://tissues.jensenlab.org

Large-scale integration of data and text

1. Large-scale integration of data and text Lars Juhl Jensen

2. three parts

3. association networks

4. guilt by association

6. protein networks

7. STRING

8. 2000+ genomes

9. genomic context

10. gene fusion

11. Korbel et al., Nature Biotechnology, 2004

12. gene neighborhood

13. Korbel et al., Nature Biotechnology, 2004

14. phylogenetic profiles

15. Korbel et al., Nature Biotechnology, 2004
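The phylogenetic-profile idea on the slides above can be sketched in a few lines: genes that are present and absent in the same genomes are candidates for a functional association. This is a minimal illustration only. The gene names, genome labels, and the choice of Jaccard similarity are assumptions made for the example, not the scoring used by STRING.

```python
from itertools import combinations

def jaccard(profile_a, profile_b):
    """Jaccard similarity between two sets of genomes in which a gene occurs."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)

def rank_pairs(profiles):
    """Score every gene pair by profile similarity, most similar first."""
    scored = [
        (a, b, jaccard(profiles[a], profiles[b]))
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)

# Hypothetical presence profiles: gene -> genomes that contain an ortholog.
profiles = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome3"},
}

ranking = rank_pairs(profiles)
```

Here geneA and geneB share an identical profile and end up at the top of the ranking, while geneC shares no genome with either of them.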

16. a real example

20. Cell, cellulosomes, cellulose

21. experimental data

22. gene coexpression

24. physical interactions

25. Jensen & Bork, Science, 2008

26. genetic interactions

27. Beyer et al., Nature Reviews Genetics, 2007

28. curated knowledge

29. pathways

30. Letunic & Bork, Trends in Biochemical Sciences, 2008

31. many databases

32. different formats

33. different identifiers

34. variable quality

35. not comparable

36. not same species

37. hard work

38. (students)

39. parsers

40. mapping files

41. quality scores

42. phylogenetic profiles

45. affinity purification

46. von Mering et al., Nucleic Acids Research, 2005

47. score calibration

48. gold standard

49. von Mering et al., Nucleic Acids Research, 2005

50. implicit weighting by quality

51. common scale
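One way to read the score-calibration slides above: raw scores from different evidence types are not comparable, so each is mapped to the fraction of pairs that agree with a gold standard, which puts all evidence on a common scale. The sketch below does this with simple binning. The function name, bin edges, pairs, and gold standard are all hypothetical; the slides do not specify the actual procedure.

```python
from bisect import bisect_right

def calibrate(scored_pairs, gold_standard, bin_edges):
    """Per raw-score bin, return the fraction of pairs found in the gold standard.

    Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf); empty bins give None.
    """
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for pair, raw_score in scored_pairs:
        index = bisect_right(bin_edges, raw_score)
        totals[index] += 1
        if pair in gold_standard:
            hits[index] += 1
    return [h / t if t else None for h, t in zip(hits, totals)]

# Hypothetical raw scores for unordered protein pairs.
scored_pairs = [
    (frozenset({"A", "B"}), 0.9),
    (frozenset({"A", "C"}), 0.8),
    (frozenset({"B", "C"}), 0.2),
    (frozenset({"C", "D"}), 0.1),
]
# Hypothetical gold standard of known associations.
gold_standard = {frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"B", "C"})}

calibrated = calibrate(scored_pairs, gold_standard, bin_edges=[0.5])
```

With these toy numbers, half of the low-scoring pairs and all of the high-scoring pairs are in the gold standard, so the calibrated values are 0.5 and 1.0.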

52. homology-based transfer

53. orthologous groups

54. Franceschini et al., Nucleic Acids Research, 2013

55. Exercise 1: Query STRING for human TYMS; show network in confidence mode; show up to 20 interaction partners; show only experimental evidence; show also low-confidence links
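The exercise is meant to be done in the STRING web interface, but the same query can be approximated programmatically. The sketch below only builds a request URL and does not contact the server. The endpoint path and the parameter names (`identifiers`, `species`, `limit`, `required_score`) are assumptions based on the public STRING REST API and should be checked against its current documentation. The value 150 is likewise assumed to correspond to the low-confidence cutoff. 9606 is the NCBI taxonomy identifier for human.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST API (tab-separated output).
STRING_API = "https://string-db.org/api/tsv/interaction_partners"

def string_partners_url(gene, species=9606, limit=20, required_score=150):
    """Build a query URL for the interaction partners of one gene."""
    params = {
        "identifiers": gene,
        "species": species,
        "limit": limit,                    # up to 20 interaction partners
        "required_score": required_score,  # assumed low-confidence threshold
    }
    return STRING_API + "?" + urlencode(params)

url = string_partners_url("TYMS")
```

Restricting the result to experimental evidence is not part of this sketch; that filtering would have to be applied to the returned per-channel scores.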

56. text mining

57. >10 km

58. too much to read

59. exponential growth

60. ~40 seconds per paper

61. computer

62. as smart as a dog

63. teach it specific tricks

66. named entity recognition

67. comprehensive lexicon

68. cyclin dependent kinase 1

69. CDC2

70. orthographic variation

71. expansion rules

72. prefixes and suffixes

73. CDC2

74. hCdc2

75. flexible matching

76. spaces and hyphens

77. cyclin dependent kinase 1

78. cyclin-dependent kinase 1

79. “black list”

80. SDS
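The named entity recognition steps on the slides above (a lexicon, expansion rules for prefixes, flexible matching of spaces and hyphens, and a black list) can be sketched as a small dictionary tagger. This is an illustrative toy, not the tagger behind the resources in this talk. The lexicon contents, the identifier CDK1, and the single "h" prefix rule are assumptions chosen to reproduce the slide examples.

```python
import re

# Illustrative lexicon: name -> identifier. SDS is included to show the black list.
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS",
}
# Names that are too ambiguous to tag (SDS is also a common lab reagent).
BLACKLIST = {"sds"}

def name_to_pattern(name):
    """Case-insensitive pattern allowing an 'h' prefix and space/hyphen variation."""
    tokens = re.split(r"[\s-]+", name)
    core = r"[\s-]?".join(re.escape(token) for token in tokens)
    return re.compile(r"\bh?" + core + r"\b", re.IGNORECASE)

def tag(text):
    """Return (start offset, matched text, identifier) for each lexicon hit."""
    hits = []
    for name, identifier in LEXICON.items():
        if name.lower() in BLACKLIST:
            continue
        for match in name_to_pattern(name).finditer(text):
            hits.append((match.start(), match.group(0), identifier))
    return sorted(hits)

hits = tag("hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel.")
```

Both hCdc2 and the hyphenated long name resolve to the same identifier, while the blacklisted SDS is left untagged.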

81. text corpus

82. ~22 million abstracts

83. Medline

84. ~2 million full-text articles

85. restricted access

86. information extraction

87. co-mentioning

88. counting

89. within documents

90. within paragraphs

91. within sentences

92. scoring scheme

95. score calibration
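The co-mentioning slides above describe counting how often two entities appear together within documents, paragraphs, and sentences. A minimal sketch of such counting follows, assuming entities have already been tagged. The weights, the corpus layout, and the gene names other than TYMS are hypothetical, and the actual scoring scheme is not given on the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights per co-mention level (not taken from the slides).
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

def comention_scores(corpus):
    """Weighted co-mention counts.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity names.
    """
    scores = Counter()
    for document in corpus:
        sentence_sets = [sent for para in document for sent in para]
        paragraph_sets = [set().union(*para) for para in document]
        document_set = set().union(*paragraph_sets)
        levels = {
            "document": [document_set],
            "paragraph": paragraph_sets,
            "sentence": sentence_sets,
        }
        for level, groups in levels.items():
            # A pair counts at most once per level per document.
            pairs = set()
            for entities in groups:
                pairs.update(combinations(sorted(entities), 2))
            for pair in pairs:
                scores[pair] += WEIGHTS[level]
    return scores

# Toy corpus with pre-tagged entities.
corpus = [
    [[{"TYMS", "DHFR"}, {"TP53"}], [{"TYMS"}]],
    [[{"TYMS"}], [{"TP53"}]],
]
scores = comention_scores(corpus)
```

Pairs mentioned in the same sentence accumulate weight from all three levels, so they score higher than pairs that only share a document. The resulting raw scores would still need calibration against a gold standard.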

96. NLP (Natural Language Processing)

97. grammatical analysis

98. part-of-speech tagging

99. what you learned in school: pronoun pronoun verb preposition noun

100. semantic tagging

101. words of special interest

102. sentence parsing

103. Gene and protein names; cue words for entity recognition; verbs for relation extraction: [nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1] (Saric et al., Proceedings of ACL, 2004)
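To make the bracketed example on this slide concrete, the sketch below pulls regulator and target names out of the chunked sentence using the cue verb. It is a deliberately simple regular-expression toy and not the method of Saric et al. The function names and the assumption that regulators follow the cue phrase are illustrative.

```python
import re

# Matches a gene/protein chunk such as "[nxpg CYC1 and CYC7]".
PG_CHUNK = re.compile(r"\[nxpg ([^\]]+)\]")

def names_in(text):
    """Collect individual names from all [nxpg ...] chunks in the text."""
    names = []
    for chunk in PG_CHUNK.findall(text):
        names.extend(re.split(r"\s*,\s*|\s+and\s+", chunk.strip()))
    return names

def extract_regulation(sentence, cue="is controlled by"):
    """Return (regulator, target) pairs for a passive 'X is controlled by Y' sentence."""
    targets_part, found, regulators_part = sentence.partition(cue)
    if not found:
        return []
    return [
        (regulator, target)
        for regulator in names_in(regulators_part)
        for target in names_in(targets_part)
    ]

sentence = (
    "[nxexpr The expression of [nxgene the cytochrome genes "
    "[nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]"
)
relations = extract_regulation(sentence)
```

For the slide's sentence this yields HAP1 as the regulator of both CYC1 and CYC7. A pattern this rigid finds few relations but is rarely wrong when it does, which is the precision and recall trade-off noted on the following slides.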

104. more precise

105. worse recall

106. web resources

107. general approach

108. text mining

109. curated knowledge

110. experimental data

111. computational predictions

112. common identifiers

113. quality scores

114. score calibration

115. visualization

116. STRING

117. protein networks

118. Szklarczyk et al., Nucleic Acids Research, 2015 (string-db.org)

119. STITCH

120. chemical networks

121. Kuhn et al., Nucleic Acids Research, 2014 (stitch-db.org)

122. PubChem

123. pathways

124. drug targets

125. high-throughput screens

126. COMPARTMENTS

127. subcellular localization

128. Binder et al., Database, 2014 (compartments.jensenlab.org)

129. Gene Ontology

130. GO annotations

131. UniProtKB

132. model organism databases

133. sequence-based prediction

134. PSORT

135. YLoc

136. TISSUES

137. tissue expression

138. Santos et al., submitted, 2015 (tissues.jensenlab.org)

139. Brenda Tissue Ontology

140. high-throughput studies

141. EST libraries

142. microarrays

143. RNA-Seq

144. mass spectrometry

145. immunohistochemistry

146. DISEASES

147. disease associations

148. Frankild et al., Methods, 2015 (diseases.jensenlab.org)

149. Disease Ontology

150. genetics studies

151. Genetics Home Reference

152. NHGRI GWAS Catalog

153. DistiLD

154. cancer mutation data

155. COSMIC

156. Exercise 2: Find TYMS-related diseases (http://diseases.jensenlab.org); find some inhibitors of TYMS (http://stitch-db.org); assess their tissue specificity (http://tissues.jensenlab.org)

157. thank you!
