Accurate and timely collection of facts from a range of text sources is crucial for supporting the work of experts in detecting and understanding highly complex diseases. In this talk I illustrate several applications using techniques that benchmark Natural Language Processing (NLP) pipelines against human-curated biomedical gold standards. (1) In the BioCaster project, high-throughput text mining on multilingual news was employed to map infectious disease outbreaks; to detect departures from the norm, we show the effectiveness of a range of time-series analysis algorithms evaluated against ProMED-mail. (2) In the PhenoMiner project, we show how an ensemble approach combined with SVM learning-to-rank improves named entity recognition for biomedical concepts; performance nevertheless remains fragile when adapting to new disease domains. (3) Finally, I discuss how, in the SIPHS project, we are building concept recognition systems based on deep learning to understand the 'voice of the patient' in social media messages.
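As a rough illustration of the kind of time-series anomaly detection such surveillance systems rely on, here is a minimal sketch using a generic moving-baseline z-score rule (the algorithm, window size, and counts are illustrative assumptions, not necessarily those evaluated in BioCaster):

```python
# Hypothetical sketch: flag days whose count exceeds the recent baseline
# mean by `threshold` standard deviations (a generic aberration detector).
from statistics import mean, stdev

def detect_anomalies(counts, baseline=7, threshold=3.0):
    """Return indices of days whose count exceeds mean + threshold * sd
    of the preceding `baseline` days."""
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        mu, sd = mean(window), stdev(window)
        if counts[i] > mu + threshold * max(sd, 1e-9):
            flagged.append(i)
    return flagged

# Invented daily outbreak-mention counts; day 8 spikes well above baseline.
daily_mentions = [2, 3, 1, 2, 4, 2, 3, 2, 15, 3]
print(detect_anomalies(daily_mentions))  # [8]
```

In practice such detectors are tuned and validated against a trusted feed (here, ProMED-mail reports would serve as the gold standard).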
It is my firm belief that Kenya and other developing countries should be in the mainstream of adopting technology for excellent service delivery.
Applications of technology in veterinary medicine can improve education and service delivery. Here I highlight informatics, diagnostics, biotechnology, data analysis, simulation modelling, and networks to outline policy changes for Kenya.
This document provides an overview of the November 2000 issue of JALA (the Journal of the Association for Laboratory Automation). It describes the development of a novel robotic system for the New York Cancer Project biorepository in collaboration with the Medical Automation Research Center. The biorepository receives 50-100 blood samples per day, which are processed robotically to extract, quantify, aliquot, and store DNA, plasma, and RNA for access by investigators. The robotic system aims to provide rapid random access to the hundreds of thousands of DNA samples stored for high-throughput analysis in studies of gene-environment interactions and cancer risk.
Ciclo autonómico: short paper, WITFOR 2016, paper 42
This paper presents an ongoing project to develop a biocomputational platform to analyze genomic data from cancer patients and bacteria in Costa Rica. The platform will integrate genomic data processing, prediction of drug sensitivity, and identification of new therapeutic targets. It will use pattern recognition techniques and mathematical models on genomic and drug response data to predict personalized therapy. Preliminary results include databases to store cancer and bacteria genomic data, and tools for exploring relationships between genomic features and drug responses. The platform aims to help identify optimal personalized treatments to overcome drug resistance in cancer and bacterial infections.
The document discusses the intersection of precision medicine, biomarkers, and healthcare policy. It describes how biomarkers and -omics data can be used for precision medicine to improve diagnostic accuracy, deliver targeted therapies, and stratify patient populations. However, clinical validation of biomarkers now requires large datasets and years of studies due to regulatory and payer requirements. This has reduced incentives for diagnostic innovation. The document also discusses challenges around clinical interpretation of complex multi-omic tests, evolving medical training and workflows, and disconnects between patent and reimbursement policies.
Bibliographic references in APA and Vancouver format (Elena Rodado, elenard6)
The document describes a student conducting research on kidney cancer. The student searches bibliographic databases Scopus and CINAHL to find information on kidney cancer using the search term "(cancer* OR neoplasm*) AND (kidney OR renal)". The student finds several relevant articles from each database and imports them into the Mendeley reference manager. The student then generates bibliographies for the articles in both APA and Vancouver citation styles.
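The Boolean logic of that search string can be approximated locally over record titles; a rough sketch, where the database wildcard `*` becomes `\w*` in a regular expression and the example titles are invented:

```python
import re

# Local approximation of the database query
# "(cancer* OR neoplasm*) AND (kidney OR renal)" applied to record titles.
TERM_A = re.compile(r"\b(cancer\w*|neoplasm\w*)\b", re.I)
TERM_B = re.compile(r"\b(kidney|renal)\b", re.I)

def matches(title):
    # Both parenthesized groups must match somewhere (the AND clause).
    return bool(TERM_A.search(title)) and bool(TERM_B.search(title))

titles = [
    "Renal cell carcinoma and neoplasms of the kidney",
    "Outcomes in renal transplantation",
    "Breast cancer screening guidelines",
]
print([t for t in titles if matches(t)])
```

Real databases apply such queries to indexed fields (title, abstract, keywords) with their own truncation and stemming rules, so results will differ from this title-only sketch.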
Forum on Personalized Medicine: Challenges for the next decade (Joaquin Dopazo)
Bioinformatics and Big Data in the era of Personalized Medicine
10th Anniversary Instituto Roche Forum on Personalized Medicine: Challenges for the next decade.
Santiago de Compostela (Spain), September 25th 2014
1) Quantitative medicine uses large amounts of medical data and advanced analytics to determine the most effective treatment for individual patients based on their specific clinical profile and biomarkers. This approach can help reduce healthcare costs and improve outcomes compared to the traditional one-size-fits-all model.
2) However, realizing the promise of quantitative personalized medicine is challenging due to the huge quantities of diverse medical data located in dispersed systems, lack of computing capabilities, and barriers to data sharing.
3) Grid and service-oriented computing approaches are helping to address these challenges by enabling federated querying, analysis, and sharing of medical data and services across organizations through virtual integration rather than true consolidation.
Presentation at the Canadian Cancer Research Conference satellite bioinformatics.ca workshop. This one is an introduction to tcga, icgc and cosmic databases.
Presentation about how deeply bioinformatics is involved in the medical field, presented at the University of Colombo in 2007 for an undergraduate seminar.
Cancer genome databases & Ecological databases (Waliullah Wali)
Introduction
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis.
Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
Cancer genome databases
COSMIC cancer database
COSMIC is an online database of somatically acquired mutations found in human cancer.
The database is freely available.
Types of data
Expert curation data
Genome-wide screen data
Expert curation data
Manually entered by COSMIC's expert curators.
Consists of comprehensive literature curation followed by periodic updates.
Includes additional data points relevant to each disease and publication.
Provides accurate frequency data, as mutation-negative samples are specified.
Genome-wide screen data
Uploaded from publications reporting large scale genome screening data or imported from other databases such as TCGA and ICGC.
Provides unbiased molecular profiling of diseases while covering the whole genome.
Provides objective frequency data by accounting for non-mutant genes across each genome.
Facilitates finding novel driver genes in cancer.
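As a toy illustration of the frequency calculation that recording mutation-negative samples enables (the counts here are invented, not taken from COSMIC):

```python
# Toy sketch: mutation frequency from curated counts. Recording
# mutation-negative (wild-type) samples supplies the denominator,
# which is what makes the reported frequency meaningful.
def mutation_frequency(mutant_samples, wild_type_samples):
    return mutant_samples / (mutant_samples + wild_type_samples)

# e.g. 45 mutant samples and 155 tested-negative samples for a gene
print(mutation_frequency(45, 155))  # 0.225
```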
To access the COSMIC cancer database, open http://cancer.sanger.ac.uk/cosmic in your browser.
Searching Process
Examples
Ecological databases
Ecological databases are sources for finding ecological datasets and quickly determining the best ways to use them.
BioOne
DataONE
GEOBASE
BioOne
BioOne is a nonprofit publisher that aims to make scientific research more accessible.
BioOne was established in 1999 in Washington, DC.
Its full-text collection, BioOne Complete, includes both subscription and open-access titles.
It serves a community of over 140 society and institutional publishers, 4,000 accessing institutions, and millions of researchers worldwide.
To access the BioOne ecological database, open http://www.bioone.org/ in your browser.
This document provides an overview of bioinformatics and highlights several key points:
- Bioinformatics has emerged as a field to help analyze the vast amounts of biological data being generated through high-throughput technologies. It integrates biology, computer science, and information technology.
- The size of the human genome and rate of data generation has grown exponentially, necessitating computational approaches. International efforts like the Human Genome Project helped sequence the entire human genome.
- Bioinformatics tools and databases are used to study genomics, transcriptomics, proteomics and more to better understand living systems at the molecular level and enable applications in medicine, agriculture, forensics and more. This work also raises ethical, legal and social considerations.
dkNET Webinar: Illuminating the Druggable Genome with Pharos, 10/23/2020 (dkNET)
Abstract
Pharos (https://pharos.nih.gov/) is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these data types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied "dark" targets that could potentially be further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. In this webinar, Dr. Tudor Oprea will introduce how to use Pharos to find targets of interest for drug discovery.
The top 3 key questions that Pharos can answer:
1. What are the novel drug targets that may play a role in a specific disease?
2. What are the diseases that are related directly or indirectly to a drug target?
3. Who are the researchers related, directly or indirectly, to a drug target?
Presenter: Tudor Oprea, MD, PhD, Professor of Medicine, Chief of Translational Informatics Division & Internal Medicine, University of New Mexico
dkNET Webinar Information: https://dknet.org/about/webinar
This document provides an introduction to bioinformatics. It defines bioinformatics as the analysis of large amounts of biological data, such as DNA sequences, using computer programs. It discusses how next-generation sequencing technologies are generating terabytes of nucleotide sequence data that is analyzed by automated computer programs. The document then provides examples of the types of biological data that is analyzed in bioinformatics, including DNA, RNA, protein sequences and their interactions. It also discusses some common programming languages and analysis techniques used in bioinformatics.
Stephen Friend, Institute of Development, Aging and Cancer, 2011-11-29 (Sage Base)
The document proposes a new approach called Arch2POCM for drug development that moves from disease targets to clinical validation. It discusses issues with the current drug discovery process, noting that $200 billion is spent annually but only a handful of new medicines are approved each year while productivity declines. Arch2POCM would require a more data-driven and collaborative approach involving scientists, clinicians, and citizens to better link knowledge and accelerate the elimination of human disease. It presents the mission of Sage Bionetworks: to create a commons of evolving integrative networks that map diseases and enable discovery.
Bioinformatics involves the application of computer technology to manage biological information. Computers are used to gather, store, analyze, and integrate biological and genetic data, which can then be applied to areas like drug discovery. The need for bioinformatics arose from the large amount of genomic data generated by the Human Genome Project. It combines molecular biology and computer science to understand diseases and find new drug targets. Many universities, government agencies, and pharmaceutical companies have formed bioinformatics groups with computational biologists and computer scientists.
Patient-Organized Genomic Research Studies (Melanie Swan)
DIYgenomics has developed a methodology for the conduct of patient-organized genomic research studies, obtaining outcomes by linking genomic data to phenotypic data and intervention. The general hypothesis is that individuals with one or more polymorphisms in the main variants associated with conditions may be more likely to have baseline out-of-bounds phenotypic biomarker levels, and could benefit the most from targeted intervention.
Computational challenges in precision medicine and genomics (Gary Bader)
Genomics is mapping complex data about human biology and promises major medical advances. In particular, genomics is enabling precision medicine, the use of a patient's genome and physiological state to improve therapeutic efficacy and outcome. However, routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with "Big data". These data are so complex and large that typical researchers are not able to cope with them. Collectively, these data require an understanding of many aspects of experimental biology and medicine to correctly process and interpret. Data size is also an issue, as individual researchers may need to handle tens of terabytes (genomes from a few hundred patients), which is challenging to download and store on typical workstations. To effectively support precision medicine, scientists from a wide range of disciplines, including computer science, must develop algorithms to improve precision medicine (e.g. diagnostics and prognostics), genome interpretation, raw data processing and secure high performance computing.
Potentials of 3D models in anticancer drug screening (Anjali R.)
A short presentation about the differences between 2D and 3D culture models, why researchers are moving toward 3D models in anticancer drug screening, the methods used to do so, and a recent case study of a 3D tumour model used for drug screening.
Bioinformatics is the application of computer technology to manage biological information. It involves gathering, storing, analyzing, and integrating genetic data. This allows for gene-based drug discovery and personalized medicine. The document outlines several key applications of bioinformatics such as diagnosing hereditary diseases, developing drug targets, and performing gene therapy. It also discusses trends like integrating genomic data into electronic health records, direct-to-consumer genetic testing services, and large-scale population studies. Challenges include disease commonality, lack of treatment options, and cost effectiveness of genetic tests.
Emerging collaboration models for academic medical centers: our place in the... (Rick Silva)
- The document discusses emerging collaboration models between academic medical centers and other organizations in the genomics and precision medicine field, as genomic sequencing capabilities advance and more clinical cases are needed to power artificial intelligence platforms. It explores new partnership approaches around data sharing, patient engagement, infrastructure needs, and how academic medical centers can position themselves in this evolving ecosystem.
Personalised Medicine Essay, mark: 95 out of 100 (John Boikov)
Personalized medicine promises to uniquely treat patients without side effects but faces challenges before widespread clinical use. While sequencing costs have dropped, analyzing vast genomic data remains difficult. Additional studies are needed to prove causal gene-disease links before clinical use. New Zealand cannot compete on scale but expertise in bioinformatics and networks position it well for health software opportunities. Advances in sequencing technologies have driven costs below $1000 per genome. However, downstream analysis and storage costs remain significant hurdles. Additional large-scale studies are still needed to establish clinical validity before widespread clinical adoption within the next 10-15 years. New Zealand organizations are well positioned to develop specialized clinical analysis software and services.
This document summarizes Andrew Su's presentation on using crowdsourcing and citizen science for biology. Some key points:
- The biomedical literature is growing rapidly but most genes are poorly annotated due to the large amount of data and limited curation by human scientists.
- Projects like the Gene Wiki and Wikidata have harnessed the "long tail" of scientists to collaboratively curate and annotate gene information, resulting in high-quality structured data.
- Experiments using Amazon Mechanical Turk showed that non-experts can accurately perform tasks like identifying disease mentions in text, matching the performance of experts. This approach could scale to annotate the vast biomedical literature.
- The presenter's
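The aggregation step behind such Mechanical Turk experiments is often a simple majority vote over redundant worker labels; a minimal sketch with invented labels:

```python
from collections import Counter

# Majority vote over redundant crowd labels: the most common label wins.
def majority_vote(labels):
    return Counter(labels).most_common(1)[0][0]

# Five workers judge whether a text span is a disease mention.
worker_labels = ["disease", "disease", "not_disease", "disease", "not_disease"]
print(majority_vote(worker_labels))  # disease
```

Production crowdsourcing pipelines typically go further (worker-quality weighting, gold-question filtering), but majority vote is the usual baseline against which expert agreement is measured.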
The document provides information about a workshop on cancer genomic databases, including The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Catalogue of Somatic Mutations in Cancer (COSMIC). It summarizes the goals, data access, and analysis tools available for each database. It also discusses controlled access vs open data and the process for applying for access to controlled TCGA and ICGC genomic and clinical data.
The document discusses several use cases for applying data mining and machine learning techniques in healthcare and biomedical research. Three examples are:
1) Early diagnosis of cancers like lung cancer and breast cancer through predictive modeling of patient data to detect cancers at earlier stages when survival rates are higher.
2) Predicting patient responses to drug therapies for cancers like breast cancer by combining different types of molecular profiling data using techniques like support vector machines and random forests.
3) Using imaging data and temporal analysis of metrics like medication purchases to better understand and predict chronic diseases like diabetes and associated health complications.
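The second use case combines the outputs of different classifier families; a minimal sketch of that combination on synthetic data (the dataset, feature count, and soft-voting scheme here are illustrative assumptions, not details from the talks):

```python
# Sketch: average SVM and random-forest class probabilities ("soft voting")
# on synthetic data standing in for molecular profiling features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(probability=True, random_state=0).fit(X_tr, y_tr)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Soft vote: average the two probability estimates per sample.
p = (svm.predict_proba(X_te)[:, 1] + rf.predict_proba(X_te)[:, 1]) / 2
pred = (p > 0.5).astype(int)
acc = (pred == y_te).mean()
print("ensemble accuracy:", acc)
```

Combining heterogeneous data types in practice also involves per-platform normalization and feature selection before any such vote.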
Data integration: The STITCH database of protein-small molecule interactions (Lars Juhl Jensen)
The document discusses the STITCH database, which integrates protein-small molecule interaction data from several sources. It describes how STITCH combines data on interactions, pathways, protein complexes, gene neighborhoods, gene fusions, and phylogenetic profiles. It also discusses how STITCH uses text mining to extract interaction data from literature and integrates this with curated interaction databases. Finally, it summarizes how STITCH assigns confidence scores to predicted interactions based on the different lines of evidence and how these scores are calibrated against known interactions.
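The evidence-combination idea the summary describes can be sketched as combining per-channel confidences under an independence assumption (the actual STITCH/STRING scoring also corrects for a prior probability of random interaction, which is omitted here, and the channel scores below are invented):

```python
# Sketch: combine independent evidence-channel confidences p_i into one
# score via S = 1 - prod(1 - p_i), so any strong channel dominates.
def combined_score(channel_scores):
    remaining_doubt = 1.0
    for p in channel_scores:
        remaining_doubt *= 1.0 - p
    return 1.0 - remaining_doubt

# e.g. experiments 0.7, text mining 0.4, database import 0.5
print(round(combined_score([0.7, 0.4, 0.5]), 3))  # 0.91
```

Calibration against known interactions, as the summary notes, is what turns these raw combined scores into interpretable confidences.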
Text mining for protein and small molecule relations (Lars Juhl Jensen)
The document discusses using text mining to identify relationships between proteins and small molecules mentioned in biomedical documents. It describes techniques for entity recognition and identification, as well as methods for extracting relationships between entities using co-occurrence analysis and natural language processing. Examples are provided to illustrate how relationships can be identified between proteins mentioned in a sample text passage.
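The co-occurrence analysis mentioned above can be sketched as counting entity pairs that appear in the same sentence (entity recognition is assumed already done; the entities and sentences below are invented):

```python
from collections import Counter
from itertools import combinations

# Each sentence is reduced to the set of entities recognized in it;
# co-mention counts then serve as a crude association signal.
sentences = [
    {"BRCA1", "olaparib"},
    {"BRCA1", "olaparib", "PARP1"},
    {"PARP1", "olaparib"},
    {"BRCA1"},
]

pair_counts = Counter()
for entities in sentences:
    for pair in combinations(sorted(entities), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))
```

Real systems normalize these counts (e.g. against each entity's overall frequency) and supplement them with NLP-based relation extraction to distinguish genuine interactions from incidental co-mentions.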
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a... (Julien PLU)
The document presents research on developing an adaptive entity linking system called ADEL. It discusses 6 problems in entity linking and proposes research questions to address adaptivity to different text, entity types, knowledge bases, and languages. It describes ADEL's modular framework including extraction, linking, and pruning modules. Evaluation shows ADEL achieves state-of-the-art results on multiple datasets. Future work focuses on knowledge base and language adaptivity, improving the system, and engineering a distributed architecture.
This document summarizes Andrew Su's presentation on using crowdsourcing and citizen science for biology. Some key points:
- The biomedical literature is growing rapidly but most genes are poorly annotated due to the large amount of data and limited curation by human scientists.
- Projects like the Gene Wiki and Wikidata have harnessed the "long tail" of scientists to collaboratively curate and annotate gene information, resulting in high-quality structured data.
- Experiments using Amazon Mechanical Turk showed that non-experts can accurately perform tasks like identifying disease mentions in text, matching the performance of experts. This approach could scale to annotate the vast biomedical literature.
- The presenter's
The document provides information about a workshop on cancer genomic databases, including The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Catalogue of Somatic Mutations in Cancer (COSMIC). It summarizes the goals, data access, and analysis tools available for each database. It also discusses controlled access vs open data and the process for applying for access to controlled TCGA and ICGC genomic and clinical data.
The document discusses several use cases for applying data mining and machine learning techniques in healthcare and biomedical research. Three examples are:
1) Early diagnosis of cancers like lung cancer and breast cancer through predictive modeling of patient data to detect cancers at earlier stages when survival rates are higher.
2) Predicting patient responses to drug therapies for cancers like breast cancer by combining different types of molecular profiling data using techniques like support vector machines and random forests.
3) Using imaging data and temporal analysis of metrics like medication purchases to better understand and predict chronic diseases like diabetes and associated health complications.
Data integration: The STITCH database of protein-small molecule interactionsLars Juhl Jensen
The document discusses the STITCH database, which integrates protein-small molecule interaction data from several sources. It describes how STITCH combines data on interactions, pathways, protein complexes, gene neighborhoods, gene fusions, and phylogenetic profiles. It also discusses how STITCH uses text mining to extract interaction data from literature and integrates this with curated interaction databases. Finally, it summarizes how STITCH assigns confidence scores to predicted interactions based on the different lines of evidence and how these scores are calibrated against known interactions.
Text mining for protein and small molecule relationsLars Juhl Jensen
The document discusses using text mining to identify relationships between proteins and small molecules mentioned in biomedical documents. It describes techniques for entity recognition and identification, as well as methods for extracting relationships between entities using co-occurrence analysis and natural language processing. Examples are provided to illustrate how relationships can be identified between proteins mentioned in a sample text passage.
Knowledge extraction in Web media: at the frontier of NLP, Machine Learning a...Julien PLU
The document presents research on developing an adaptive entity linking system called ADEL. It discusses 6 problems in entity linking and proposes research questions to address adaptivity to different text, entity types, knowledge bases, and languages. It describes ADEL's modular framework including extraction, linking, and pruning modules. Evaluation shows ADEL achieves state-of-the-art results on multiple datasets. Future work focuses on knowledge base and language adaptivity, improving the system, and engineering a distributed architecture.
This document discusses using natural language processing (NLP) techniques to extract biological information from literature to help interpret large genomics datasets. The author describes developing a method to identify gene regulatory interactions by parsing Medline abstracts. This information can then be combined with data from experiments to classify protein associations and interactions. While literature provides important context, it should not be used alone. The author also intends to apply these NLP methods to full text articles to extract information from different sections like introductions and discussions.
One tagger, many uses - Illustrating the power of ontologies in named entity ...Lars Juhl Jensen
The document describes a C++ tagger that can recognize named entities in biomedical literature with high precision and recall. It can identify molecular entities, genes, proteins, chemicals, and can assess studiedness, association networks, localization, expressions, tissues, diseases, side effects, organisms, and habitats. The tagger is fast, flexible, inherently thread-safe, and uses ontologies, dictionaries, expansion rules, and blacklists to identify entities. It has been used in various databases and tools for data integration, literature mining, and interactive annotation.
STRING - Protein networks from data and text miningLars Juhl Jensen
This document discusses protein networks and how they can be constructed from data and text mining. It describes challenges like different data sources using different formats and identifiers and issues with data quality. It also outlines techniques used to parse the data, map identifiers, assign quality scores, and implicitly weight evidence by quality to build a comprehensive protein interaction network across all available sources. The resulting database is made freely available online as a web resource, downloadable files, and via an API and apps to facilitate its use.
Exploiting NLP for Digital Disease Informatics (Nigel Collier)
These are the slides from my talk at the Department of Computer Science at Sheffield University. The talk covers broad ground in my experience of applying natural language processing to knowledge discovery from various media including social media, news and the scientific literature.
1. Exploiting NLP for Digital Disease Informatics
University of Warwick, October 15th 2015
Nigel Collier, Language Technology Lab
Department of Theoretical and Applied Linguistics
4. Really understanding natural language is the next grand challenge
• High throughput methods have transformed biomedicine into a data-rich science
• All genes in a genome, all proteins in a proteome, all transcripts in a cell, all metabolic processes in a tissue…
• A significant portion of human health data is ‘messy data’ existing only as unstructured text
• Biomedical literature, clinical trials data, lab notebooks, clinical records, diagnostic reports, news reports, social media messages
• Represents the most contextually grounded, high precision information about an individual’s health, attitudes and behaviours
• Natural language processing (NLP) is a cornerstone technology to translate ‘messy data’ into structured forms that are systematically encoded, e.g. SNOMED-CT, ICD.
5. Experience from personal research
(1) Global infectious disease alerting and mapping
(2) Extracting a database of phenotype terms
(3) Understanding the voice of the patient
(4) Chemical cancer risk assessment
(5) Critical hypothesis generation from literature
6. Typical workflow from text to knowledge
raw text document → sentence segmentation → tokenization → lexical featurisation → entity recognition → trigger detection → relation extraction → event extraction → entity grounding → knowledge objects (with syntactic parsing supporting the extraction stages)
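The staged workflow above can be sketched as a chain of plain functions. This is a toy illustration only, not the actual BioCaster pipeline: the entity dictionary, the ontology identifiers and the trigger word list are all made up for the example.

```python
import re

# Hypothetical dictionary mapping surface forms to ontology identifiers.
ENTITY_DICT = {"avian influenza": "DISEASE:H5N1", "hanoi": "GEO:VN-HN"}

def segment_sentences(text):
    # Naive sentence segmentation on terminal punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

def recognize_entities(sentence):
    # Entity recognition + grounding by dictionary lookup.
    return [(surface, eid) for surface, eid in ENTITY_DICT.items()
            if surface in sentence.lower()]

def extract_events(sentence, entities):
    # Trigger detection + relation extraction: pair a disease with a
    # location when an outbreak trigger word is present.
    triggers = {"outbreak", "reported", "confirmed"}
    if triggers & set(tokenize(sentence)):
        diseases = [e for _, e in entities if e.startswith("DISEASE:")]
        places = [e for _, e in entities if e.startswith("GEO:")]
        return [{"disease": d, "location": p} for d in diseases for p in places]
    return []

def text_to_knowledge(text):
    # Run the full staged pipeline over a document.
    events = []
    for sentence in segment_sentences(text):
        events.extend(extract_events(sentence, recognize_entities(sentence)))
    return events

events = text_to_knowledge("An outbreak of avian influenza was confirmed near Hanoi.")
# One structured event linking the grounded disease to the grounded location.
```

In a production pipeline each of these stages would typically be a trained statistical model rather than a dictionary or keyword rule.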
7. Broad Research Objectives
• Extrinsic: Robust data collection from across health-related text types: literature, patient records, news, social media (public health alerts, developing disease profiles, etc.)
• Intrinsic: Understand how NLP/ML/Ontology techniques perform and can be improved in operational settings
8. BIOCASTER: GLOBAL INFECTIOUS DISEASE ALERTING AND MAPPING
Case study #1
[5] Collier, N. et al. (2008). BioCaster: detecting public health rumors with a Web-based text mining system. Bioinformatics, 24(24), 2940-2941.
[6] Collier, N., et al. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. J. Biomedical Semantics, 2(S-5), S9.
[7] Hay, S. I., et al. (2013). Global mapping of infectious disease. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 368(1614), 20120250.
9. Infectious diseases spread rapidly
“We live in a world where threats to health arise from the speed and volume of air travel, the way we produce and trade food, the way we use and misuse antibiotics, and the way we manage the environment…” - Dr. Margaret Chan, DG WHO
• SARS, 2003 (HK, world)
• H5N1 flu, 2003- (PRC, Thailand, ROC, Vietnam)
• Foot & mouth, 2001 (United Kingdom)
• Ebola, 2014- (Guinea, Liberia, Sierra Leone, Nigeria)
10. Digital epidemic surveillance with BioCaster
• Trend graphs
• Event summaries
• Event alerts
• Ontology browsing
• Email/GeoRSS alerting
• Watchboard, etc.
• Real time Twitter analysis
• Up to date news in 12 languages
• Event database search
GHSI partners: US, UK, FR, DE, IT, JP, CA, WHO
12. Technical challenges: REAL TIME SCALING
X0,000 news providers
30,000-40,000 news items/day → 900 on topic/day → 200 events/day → 4 alerts/day
13. Technical challenges: MULTILINGUALITY
“Avian Flu” across languages: Influenza aviaire (French), 鳥インフルエンザ (Japanese), 조류인플루엔자 (Korean), โรคไข้หวัดนก (Thai), Cúm gia cầm (Vietnamese)
Increased sensitivity and timeliness from multilingual news
[Figure: news event counts for the porcine foot-and-mouth outbreak in South Korea, 2010-2011]
14. Technical challenges: AMBIGUITY
• Ambiguity: “Obama fever builds as Americans await a new era” (‘fever’ used figuratively, not as a symptom)
• Entity identification and toponym grounding: “Equine influenza in Camden”: Camden (UK), Camden (AU), Camden (CA) + 19 others
• Variant transliterations: Tajoura, Tajura, Tajoora…
• Coreference: “Two British holidaymakers fell ill…” / “Two male pensioners died…” 2 or 4 victims?
• Temporal identification: “The Spanish flu outbreak…” (a historical event, not a current one)
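One simple way to approach the variant-transliteration problem above is fuzzy string matching against a gazetteer. This is an illustrative sketch only, assuming Python's difflib similarity ratio, a three-entry toy gazetteer and an arbitrary threshold; it is not the matching method BioCaster used.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Character-level similarity in [0, 1]; a crude proxy for matching
    # variant transliterations of the same toponym.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def ground_toponym(mention, gazetteer, threshold=0.8):
    # Return the best-matching canonical place name, or None if nothing
    # in the gazetteer is similar enough.
    best = max(gazetteer, key=lambda name: similarity(mention, name))
    return best if similarity(mention, best) >= threshold else None

# Toy gazetteer for the example.
gazetteer = ["Tajoura", "Tripoli", "Benghazi"]
print(ground_toponym("Tajura", gazetteer))   # matches the canonical "Tajoura"
print(ground_toponym("Tajoora", gazetteer))  # so does this variant spelling
```

Note that fuzzy matching alone does not solve the Camden-style problem, where the surface form is identical and disambiguation must come from context.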
18. 5 detection algorithms
1. Early Aberration Reporting System (EARS) C2 algorithm
• captures the number of standard deviations by which the current count exceeds the history mean;
• St = max(0, (Ct − (μt + kσt)) / σt)
2. EARS C3 algorithm
• similar to C2, except that C3 uses a weighted sum of the previous 3 days for the current period;
3. W2 algorithm
• a modified version of C2 which ignores history counts on Saturdays and Sundays to compensate for day-of-week effects;
4. F statistic
• compares the variance in the history window to the variance in the current window;
• St = σt² / σb²
5. Exponentially Weighted Moving Average (EWMA)
• gives less weight to days in the history that are further from the test day;
• St = (Yt − μt) / [σt · (λ/(2−λ))^(1/2)], where Y1 = C1 and Yt = λCt + (1−λ)Yt−1
Model parameters were estimated based on an additional 5 epidemic data sets from ProMED-mail (data not shown)
[8] Burkom, H. S. (2005). “Accessible Alerting Algorithms for Biosurveillance”. National Syndromic Surveillance Conference.
[9] Jackson, M. L. et al. (2007). “A simulation study comparing aberration detection algorithms for syndromic surveillance”. BMC Medical Informatics and Decision Making, 7(6). DOI: 10.1186/1472-6947-7-6.
[10] Madoff, L. (2004). “ProMED-mail: An early warning system for emerging diseases”. Clin Infect Dis, 39(2): 227–232.
19. Creating a benchmark data set

#   Disease            Country      ProMED alerts
1   Hand, foot, mouth  PR China     9
2   Ebola              Congo        17
3   Yellow fever       Brazil       28
4   Influenza          USA          21
5   Cholera            Iraq         5
6   Chikungunya        Singapore    8
7   Anthrax            USA          15
8   Yellow fever       Argentina    5
9   Ebola Reston       Philippines  15
10  Influenza          Egypt        49
11  Plague             USA          8
12  Dengue             Brazil       27
13  Dengue             Indonesia    14
14  Measles            UK           13
15  Chikungunya        Malaysia     15
16  Yellow fever       Senegal      0
17  Influenza          Indonesia    35
18  Influenza          Bangladesh   3

14 countries and 11 infectious disease types. 366 days of news data were collected from BioCaster for each disease and country. The study period is 17th June 2008 to 17th June 2009.
21. Field evaluation
• (2006-2012) Global Health Security Initiative – a unique initiative by G7+WHO+EC to bring together end-users, system providers and stakeholders to test the feasibility of open source public health intelligence systems.
[12] Barboza, P., Vaillant, L., Le Strat, Y., Hartley, D. M., Nelson, N. P., Mawudeku, A., Madoff, L. C., Linge, J. P., Collier, N., Brownstein, J. S. and Astagneau,
P. (2014). Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases
Outbreaks. PloS one, 9(3), e90536.
[13] Barboza, P., Vaillant, L., Mawudeku, A., Nelson, N., Hartley, D., Madoff, L., Linge, J., Collier, N., Brownstein, J., Yangarber, R. and Astagneau, P. (2013),
“Evaluation of epidemic intelligence systems integrated in the Early Alerting and Reporting project for the detection of A/H5N1 Influenza events”, PLoS One,
8(3):e57252.
Major findings for A/H5N1:
- Detection rates for individual systems from
31% to 38%
- Rising to 72% for the combined system
- PPV ranged from 3% to 24%
- F1 ranged from 6% to 27%
- Sensitivity ranged from 38% to 72%
- Average improvement in alerting over WHO or
OIE was 10.2 days
22. User outcomes
• Used by WHO and Japanese MoH to detect early cases during
the A(H1N1) pandemic;
• Used by ECDC to monitor diseases during the Shanghai Expo
2010, London Olympics 2012;
• Used by French Institute for Public Health to monitor for human-to-human A(H5N1) transmission;
• Used by GHSI members to monitor for suspected accidental or
deliberate releases;
• Used by CDC to help monitor the health impact of the oil spill in the Gulf of Mexico;
23. PHENOMINER/PHENEBANK: EXTRACTING A
DATABASE OF PHENOTYPE TERMS
Case study #2
[14] Collier, N., Groza, T., Smedley, D., Robinson, P., Oellrich, A. and Rebholz-Schuhmann, D. (2015). PhenoMiner: from text to a database of
phenotypes associated with OMIM diseases. Database, Oxford University Press (in press).
[15] Collier, N., Oellrich, A. and Groza, T. (2013), “Toward knowledge support for analysis and interpretation of complex traits”, Genome Biology
14(9):214.
24. What is a phenotype?
Image courtesy of Washington, Haendel, Mungall, Ashburner, Westfield and Lewis (2009), “Linking human diseases
to animal models using ontology-based phenotype annotation”, PLoS Biology, 7(11):e1000247.
25. “… patients were selected for FOXP2 screening only if
they fulfilled the following criteria: presence of
speech articulation problems diagnosed by a clinician …”
HPO: 0009088 Speech articulation difficulties
Image courtesy of Damian Smedley,
Wellcome Trust Sanger Institute,
Hinxton and Tudor Groza, University
of Queensland, Brisbane
Coding personal terminology
26. SVM learn-to-rank (pairwise)
Maximum entropy
Priority list heuristic
“… patients were selected for FOXP2 screening only if
they fulfilled the following criteria: presence of
speech articulation problems diagnosed by a clinician”
27. Creating a benchmark data set
• Data from OMIM cited autoimmune literature (112 abstracts, 472
phenotypes, 1611 gene/gene products).
29. F-scores using 3 hypothesis resolution strategies
[16] Collier, N., Tran, M., Le, H. Ha, Q., Oellrich, A. Rebholz-Schuhmann, D. (2013), “Learning to recognize phenotype candidates in the auto-immune literature
using SVM re-ranking”, PLoS One 8(10): e72965.
38. How can we do domain adaptation better (with fewer annotations)?
[17] Collier, N., Paster, F., Campus, H., & Tran, A. M. V. (2014), “The impact of near domain transfer on biomedical named entity recognition”, Proc. 5th International Workshop on Health Text Mining and Information Analysis (LOUHI) at the Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden, pp. 11-20.
39. SIPHS: UNDERSTANDING THE VOICE OF THE
PATIENT
Case study #3
[18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms in social media messages”,
in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17-21 September 2015, pp.
1675-1680.
[19] Limsopatham, N. and Collier, N. (2015), “Towards the semantic interpretation of personal health messages from social media”, in Proceedings
of the 24th ACM International Conference on Information and Knowledge Management (CIKM 2015), Workshop on Understanding the City with
Urban Informatics (UCUI 2015), Melbourne, Australia, 19-23 October 2015.
40. What do people talk about?

Types                   Tweet samples
Influenza confirmation  “I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr”
Influenza symptoms      “My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1”
Flu shots               “I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia”
Self protection         “Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1”
Medication              “Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote”
41. Tracked anxiety indicators have a moderate-to-strong correlation with CDC seasonal flu tracking

Category  Spearman’s rho  P-value
A         0.66            0.020
S         0.66            0.021
I         0.58            0.048
P         0.67            0.017
A+I+P     0.68            0.008
A+I+P+S   0.67            0.017
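Spearman’s rank correlation, the statistic used for the comparison above, can be computed in pure Python. The weekly series below are invented for illustration and are not the study’s data:

```python
def rank(xs):
    """Assign average ranks (ties share the mean of their rank positions)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical weekly series: tweet counts vs. CDC counts over 12 weeks.
tweets = [120, 180, 260, 400, 350, 290, 210, 150, 130, 110, 100, 95]
cdc = [900, 1400, 2100, 2900, 2600, 2300, 1700, 1300, 1100, 950, 1000, 850]
```

Two series that rise and fall together get a rho close to 1 even when the raw magnitudes differ, which is why rank correlation suits noisy count data.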
[Figure: weekly counts for tweet categories A, S, I, P, A+I+P and A+I+P+S plotted against CDC counts, weeks 46-5 of the 2009-2010 flu season. Data source: CDC (2009-2010 flu season)]
“Cover your mouth if coughing,
use a tissue, wash your hands often & get
a flu shot - protect and defend your
community”
“I’m still getting flu shots, nothing is worth
flu turning into bronchitis into pneumonia”
“I can ignore this sore throat no longer.
And, um, maybe I should have gotten
that H1N1 vaccine.“
42. Frustratingly simple models work better
Classifying respiratory syndrome: turning 225,000 tweets into a high-correlation influenza tracker
[22] Doan, S., Ohno-Machado, L. and Collier, N. (2012), "Enhancing Twitter data analysis with simple semantic filtering: example in tracking Influenza-Like Illnesses", in
the 2nd IEEE Conference on Healthcare Informatics, Imaging and Systems Biology: Analyzing Big Data for Healthcare and Biomedical Sciences, California, USA,
September 27-28.
43. Coding the voice of the patient in SIPHS
• Integrate the language of social media and life-science ontologies
• ‘Voice of the patient’ – real time public health mapping/risk analysis
• Code patient-centred vocabulary and links
• Generate public health summaries, e.g. infectious diseases, ADRs
Twitter message                            SNOMED preferred term  SNOMED ID
No way I’m getting any sleep 2nite         Insomnia               193462001
Take _DRUG_ and can’t even focus forreal   Unable to concentrate  60032008
_DRUG_ makes u skinny                      Weight loss            89362005
44. “You shall know a word by the company it keeps”
– (Firth, J. R. 1957)
• Existing work [23,24] used word-vector similarity to measure the semantic similarity between texts
• Performance seems to depend on the vector representation used (e.g. CBOW [23], GloVe [24])
• Recent advances in deep learning [23,24] allow learned representations of terms (i.e. DWRs) that capture the semantic similarity of terms based on their co-occurrences, e.g. continuous bag-of-words (CBOW) [23], Global Vectors (GloVe) [24]
[23] Mikolov et al. (2013), “Distributed representations of words and phrases and their compositionality”, NIPS.
[24] Pennington et al. (2014), “GloVe: Global vectors for word representation”, EMNLP.
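As a toy example of the vector-similarity idea: cosine similarity between dense word vectors. The 4-dimensional vectors below are invented for illustration, not trained CBOW or GloVe embeddings:

```python
def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

# Toy "embeddings": semantically related terms point in similar directions.
vec = {
    "insomnia":  [0.9, 0.1, 0.0, 0.2],
    "sleepless": [0.8, 0.2, 0.1, 0.3],
    "fracture":  [0.0, 0.9, 0.8, 0.1],
}
```

With trained embeddings the same comparison lets a system judge that “sleepless” is closer to “insomnia” than an unrelated clinical term is, purely from co-occurrence statistics.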
45. Related work – Phrase-based MT
• Phrase-based MT [3]: Translate between languages by learning local
term dependencies from parallel corpora
We adapt phrase-based MT to translate from social media language to
formal medical language
Can’t even focus forreal no concentrate ???
[25] Koehn et al. Statistical phrase-based translation. NAACL 2003
46. Adapting Phrase-based MT for Twitter Normalisation
• We use phrase-based MT to translate social media text to formal
medical text, then map the translated symptoms to a SNOMED-CT
concept
Can’t even focus forreal unable to focus unable to concentrate
(ID 60032008)
translate
find semantic distance
[18] Limsopatham, N. and Collier, N. (2015), “Adapting phrase-based machine translation to normalise medical terms
in social media messages”, in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,
Lisbon, Portugal, September, pp. 1675-1680.
47. System architecture

A Twitter phrase (e.g. ‘No way I’m getting any sleep 2nite’)
→ translated by a phrase-based MT model, such as Koehn et al. (2003), trained on pairs of Twitter phrases and SNOMED-CT terms (e.g. ‘no sleep week’ = ‘Insomnia’, ‘so unfocussed!!!’ = ‘Unable to concentrate’)
→ our mapping approach (i.e. Sim, rSim)
→ a ranking of mapped concepts (e.g. 1. Insomnia (193462001), 2. Productivity at work (224403006))
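A minimal sketch of the mapping stage, using a set-based one-hot bag-of-words cosine. The concept list is a two-entry stand-in, and the translation is hard-coded in place of the trained MT model’s output; none of this is the authors’ implementation:

```python
from math import sqrt

# Tiny stand-in for a SNOMED-CT concept table (IDs from the slides).
concepts = {
    "193462001": "insomnia",
    "60032008": "unable to concentrate",
}

def bow(text):
    """One-hot bag of words: the set of lowercased tokens."""
    return set(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two one-hot word sets."""
    inter = len(a & b)
    return inter / (sqrt(len(a)) * sqrt(len(b))) if a and b else 0.0

def map_to_concept(translation):
    """Return the SNOMED-CT ID whose description is closest
    to the (MT-produced) translation of the tweet phrase."""
    scored = [(cosine(bow(translation), bow(desc)), cid)
              for cid, desc in concepts.items()]
    return max(scored)[1]
```

For example, once the MT step has turned “Can’t even focus forreal” into formal wording, the mapper only has to pick the nearest concept description, which is a much easier problem than matching the raw tweet.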
48. Experimental Setup
• Instantiations of our approach:
Sim(1): using only the best translation
Sim(5): using the top 5 translations
rSim(5): using the top 5 translations
• Baseline: Cosine similarity of vector representations of the original
tweet and the description of a concept
One-hot
Continuous Bags of Words (CBOW)
Global Vector (GloVe)
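The evaluation metric reported on the following slides, MRR-5, can be sketched as follows. This is an illustrative implementation, not the authors’ evaluation code; the function name and data shapes are assumptions:

```python
def mrr_at_k(rankings, gold, k=5):
    """Mean reciprocal rank truncated at k: for each query, score 1/rank
    of the first correct concept if it appears in the top k, else 0."""
    total = 0.0
    for ranked, correct in zip(rankings, gold):
        for i, cid in enumerate(ranked[:k], start=1):
            if cid == correct:
                total += 1.0 / i
                break
    return total / len(rankings)
```

So a system that always puts the correct concept first scores 1.0, while one that finds it at rank 2 for half the tweets and misses otherwise scores 0.25.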
49. Experimental Results
• RQ1: Does our approach perform better than SOTA DWR baselines?
MRR-5:
         Baseline  Sim(1)  Sim(5)  rSim(5)
One-hot  0.1675    0.2232  0.2491  0.2458
CBOW     0.1896    –       –       –
GloVe    0.1869    –       –       –

Yes, all instances of our approach markedly outperformed the DWR baselines, by up to 33% MRR-5.
Twitter message: “unable to sleep at all”
Baseline:
Mapping: “unable to sleep at all” → ‘unable to concentrate’
Our approach:
Translation: “unable to sleep at all” → “insomnia of”
Mapping: “insomnia of” → ‘insomnia’
50. Experimental Results
• RQ2: Which types of DWRs are effective for our approach?
MRR-5:
         Baseline  Sim(1)  Sim(5)  rSim(5)
One-hot  0.1675    0.2232  0.2491  0.2458
CBOW     0.1896    0.2070  0.2104  0.2109
GloVe    0.1869    0.2500  0.2638  0.2617

Both Sim and rSim outperform the baseline, regardless of the vector representation used.
51. Experimental Results
• RQ3: Would the performance improve if we consider both the original and the translated text when mapping a concept?

MRR-5:
         Sim(1)  Sim(1)+  Sim(5)  Sim(5)+  rSim(5)  rSim(5)+
One-hot  0.2232  0.2420   0.2491  0.2556   0.2458   0.2594
CBOW     0.2070  0.1953   0.2104  0.2144   0.2109   0.2070
GloVe    0.2500  0.2532   0.2638  0.2600   0.2617   0.2509

Performance improved when using the one-hot representation.
52. Summary
• How we exploit the base of medical evidence is changing as access to unstructured
‘messy’ data opens up new opportunities
• Data access, bias and standards
• We can expect impact in epidemic detection, pharmacovigilance, translational health, disease mapping, risk communication, rare disease profiling and many other areas.
• Encoding the data increases value through data mining, exchange and integration
• Machine learning outperforms dictionaries and hand-built rules
• Finding the right lexical representation and right target form is key
53. Thank you
Contributions by:
Nigel Collier
nhc30@cam.ac.uk
Anna Korhonen
alk23@cam.ac.uk
Nut Limsopatham
nl347@cam.ac.uk
Further information at the
Language Technology Lab
http://ltl.mml.cam.ac.uk/
Funding: