Natural language is a powerful tool that humans have developed over hundreds of thousands of years. Extensive use, the flexibility of language, human creativity, and the social, cultural, and economic changes of daily life have added new constructs, styles, and features to it. One such feature is the ability to express ideas, opinions, and facts implicitly.
Consider the tweet 'New Sandra Bullock astronaut lost in space movie looks absolutely terrifying' and the snippet from a clinical narrative 'He is suffering from nausea and severe headaches. Dolasteron was prescribed'. The tweet implicitly mentions the entity Gravity, and the clinical snippet implicitly mentions the relationship between the medication Dolasteron and the clinical condition nausea. Extracting such implicit information has received little attention in the information extraction literature. This dissertation developed a solution consisting of four steps: knowledge acquisition, knowledge modeling, implicit information detection, and information extraction. The solution is demonstrated by extracting implicit entities and relationships from clinical narratives and implicit entities from tweets.
This document summarizes the properties and benefits of magnesium, noting that magnesium deficiency is common and can result from a mineral-poor diet, prolonged diarrhea, diabetes, intestinal malabsorption, or alcoholism. It also explains that magnesium is essential for good physical and cognitive performance, aids muscle relaxation, and is linked to cardiovascular health.
Phat monday invigorating your vocab instruction - jenniferplucker
The document discusses various strategies for teaching vocabulary words directly:
1) Alike but Different graphic organizers allow students to compare related concepts and their distinctions. An example compares parts of the brain.
2) Songs and podcasts set vocabulary words to music to improve memorization. An example is provided for the "Atoms Family" song.
3) Knowledge Rating Charts assess students' prior familiarity with vocabulary words on a scale from unfamiliar to fully understanding multiple meanings. An example chart covers skateboarding terms.
4) Word walls and notebooks display vocabulary for reference. The Frayer model provides definitions, examples, non-examples and comparisons for deep understanding of concept meanings.
Mycenae reached its peak around 1350 BC, with a citadel and lower town housing 30,000 inhabitants spanning 32 hectares. The Mycenaeans became successful merchants through contacts with Crete and Eastern countries, acquiring supplies, materials, and technological knowledge. Wealth was strictly accumulated by kings ruling from palaces that were centers of entrepreneurship, crafts, agriculture, trade, and redistribution of goods. Skilled artisans working in the palaces created fine artworks like pottery, ivory carvings, metalwork, textiles, stone sculptures, and seals using imported materials. Writing also developed during this period to help administrate commerce. The fortified citadel of Mycenae had a
The document looks back on 2013, a year marked by crisis, unemployment, and poverty. As 2014 approaches, it proposes resolutions such as pursuing the unattainable, setting fears aside, being brave, dreaming, reading, fighting for goals, facing challenges, investigating, and discovering. It also suggests not giving up, smiling, sharing with others, and helping those in need. 2014 is presented as a blank page with more than 365 reasons to write a ...
The document summarizes key lessons from the E-tourism Summit 09, including addressing tourists' needs, engaging them through apps and content, and making tourism easy and fun. It emphasizes the importance of providing engaging, credible, useful, and understandable content in multiple languages and platforms to cater to modern tourists who research and book online, travel with mobile phones, and want on-demand information in their own language. It promotes Mobiguide as a solution to deliver multilingual content across websites, mobile apps, social media, and more.
The document describes the Elephant Road Resort located in Arugam Bay, Sri Lanka. It is situated on the quiet jungle side facing a quiet wild lagoon, surrounded by nature and within 10 minutes walking distance of the beach and village center. It offers 4 luxury suites with jungle and lagoon views and private yoga balconies. Guests have access to a chill out restaurant, wifi, laundry services, and an attached tuk tuk driver for local transportation. The resort offers a relaxing getaway surrounded by wildlife in nature.
The document presents different structures for ordering questions in questionnaires, such as the funnel structure and the diamond structure. It also discusses guidelines for deciding when to use questionnaires and covers question types such as open-ended and closed questions. It explains four common scales for measuring responses: nominal, ordinal, interval, and ratio.
La inteligencia competitiva como actividad fundamental en la gestión de la in... - IPAE_INNOVA
Presentation by Ender Azkarate, "La inteligencia competitiva como actividad fundamental en la gestión de la innovación", given at the Seminario Avanzado Modelos de Gestión de la Innovación during the Semana de la Innovación - Innotec 2014, organized by the Centro de Innovación de IPAE
This document presents a Tae Kwon Do study guide with questions and answers for each belt grade, from white belt to black belt. It details the minimum time requirements, forms (poomse), technical terms, and theoretical and nutritional knowledge required to advance from one grade to the next. The guide provides concise information on different areas of Tae Kwon Do to assess the knowledge of those who practice this martial art and advance
Proyecto Lingüístico de Centro del IES El Fontanal 2013-2014 - tomasrodriguezreyes
This document presents the Proyecto Lingüístico de Centro (school language project) of IES El Fontanal for the 2013/2014 school year. The project's main objective is to improve students' competence in linguistic communication through several lines of action, such as developing reading, language teaching, and the use of new methodologies. The project is annual, multidisciplinary, and has the consensus of the teaching staff.
Operaciones principales de una base de datos ricardo cuevas - RiCUSA
The document describes different operations between tables: union, intersection, projection, selection, and Cartesian product. Union combines the rows of two tables into a new table. Intersection returns the rows common to both tables. Projection extracts a specific field from a table. Selection filters a table according to a condition. The Cartesian product generates all possible combinations of the rows of two tables.
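The five table operations summarized above can be illustrated with a minimal Python sketch. It assumes each table is a set of tuples, so duplicate rows collapse as in relational algebra. The table names, rows, and function names are hypothetical and do not come from the summarized document.

```python
from itertools import product

# Hypothetical tables: each row is a tuple (id, name).
students = {(1, "Ana"), (2, "Luis"), (3, "Marta")}
athletes = {(2, "Luis"), (4, "Pedro")}

def union(a, b):
    """Rows that appear in either table."""
    return a | b

def intersection(a, b):
    """Rows that appear in both tables."""
    return a & b

def projection(table, column_index):
    """Values of a single field, taken from every row."""
    return {row[column_index] for row in table}

def selection(table, predicate):
    """Rows for which the condition holds."""
    return {row for row in table if predicate(row)}

def cartesian_product(a, b):
    """Every row of a concatenated with every row of b."""
    return {row_a + row_b for row_a, row_b in product(a, b)}

all_people = union(students, athletes)
both = intersection(students, athletes)
names = projection(students, 1)
later_ids = selection(students, lambda row: row[0] > 1)
pairs = cartesian_product(students, athletes)
```

Here `pairs` holds 3 × 2 = 6 combined rows, one for each pairing of a `students` row with an `athletes` row.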
The document presents the Plan de Desarrollo Municipal (PDM, municipal development plan) of the municipality of Porco for the period 2004-2008. The PDM includes a diagnosis of the municipality's current situation, with information on its location, political-administrative division, population, and socioeconomic aspects. It also presents the strategic vision and the five-year project schedule for achieving sustainable development in the region, based on the principles of citizen participation, equity, and comprehensiveness. The general objective
The Economic, Social and Cultural Impact of the City Arts and Culture Cluster - Callum Lee
This report analyzes the economic, social, and cultural impact of arts and culture organizations located in the City of London. It finds that in 2011/12, these organizations generated £225 million for the City of London economy and supported over 6,700 jobs. Their economic impact comes from direct operations, spending in supply chains, and audience spending in other local businesses. They also provide significant social benefits through educational programs that engaged over 300,000 people, and over 1,100 volunteer opportunities. Surveys show that audiences feel the organizations offer high quality, innovative experiences and international artists not otherwise accessible. The report concludes that the arts cluster enhances the City's appeal and London's status as a global city.
Here is some advice for those of you who are beginning residency training - Medico Apps
This document provides advice for residents beginning their training, emphasizing the importance of admitting when you don't know something and asking for help from senior residents or attendings. It also stresses respecting colleagues and patients, prioritizing tasks with potentially life-threatening issues needing immediate attention, and taking responsibility for fully understanding a patient's condition and test results rather than relying only on reports from others. Residents are advised to be diligent learners, reading extensively on their own and not embarrassing peers during rounds or conferences.
The document describes the widespread use of ICT for learning among university students in Australia, with 95% reporting regular use. It also notes that the Open University of Australia has doubled its enrollment over the last 4 years to 55,000 students, 65% of whom are women. In addition, Swinburne University of Technology has increased its enrollment by more than 200% over the last 2 years through online learning.
This document provides a guide to the basic functions and tools of Microsoft Word 2007. It explains the main parts of the Word window, such as the quick start bar, the toolbars, and the rulers. It also describes how to set different paragraph alignments and indents, insert images, and use formatting such as bullets and columns. The document serves as a practical introduction to learning Word 2007.
A client expresses feelings of being a burden and having ruined their life by leaving college. The therapeutic response by the nurse reflects the client's feelings of having lost opportunities without making judgments or giving advice. The nurse helps the client explore alternatives without prematurely analyzing the situation.
The document describes the Third Way as a political proposal that attempts to move beyond the frameworks of the right and the left. It is put forward as an economic project addressed to all of society without distinction. After the Second World War, the term came into common use to describe projects seeking a middle path between the law of the market and the dictatorship of the state. Finally, the Third Way emphasizes ideas such as subsidiarity, community responsibility, programs of ...
Help files. Formats.
Help authoring tools.
Generic and context-sensitive help.
Tables of contents, indexes, search systems, and others.
Embedding help in the application.
Types of manuals: user manual, reference guide, quick guides, installation, configuration, and administration manuals. Audience and structure.
Creating multimedia tutorials. Tools for capturing screens and action sequences.
Tools for creating interactive tutorials; simulation.
Implicit Entity Recognition in Clinical Documents - Sujan Perera
With the increasing automation of health care information processing, it has become crucial to extract meaningful information from textual notes in electronic medical records. One of the key challenges is to extract and normalize entity mentions. State-of-the-art approaches have focused on the recognition of entities that are explicitly mentioned in a sentence. However, clinical documents often contain phrases that indicate the entities but do not contain their names. We term those implicit entity mentions and introduce the problem of implicit entity recognition (IER) in clinical documents. We propose a solution to IER that leverages entity definitions from a knowledge base to create entity models, projects sentences to the entity models and identifies implicit entity mentions by evaluating semantic similarity between sentences and entity models. The evaluation with 857 sentences selected for 8 different entities shows that our algorithm outperforms the most closely related unsupervised solution. The similarity value calculated by our algorithm proved to be an effective feature in a supervised learning setting, helping it to improve over the baselines, and achieving F1 scores of .81 and .73 for different classes of implicit mentions. Our gold standard annotations are made available to encourage further research in the area of IER.
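As a rough illustration of the pipeline described in this abstract (entity models built from definitions, sentences projected onto them, similarity scored), here is a minimal Python sketch. It is not the authors' algorithm: it substitutes a plain bag-of-words cosine similarity for their semantic similarity measure. The entity definitions, the stopword list, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "to", "and", "or", "by", "with"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def build_entity_models(definitions):
    """Turn each entity definition into a bag-of-words model."""
    return {entity: Counter(tokenize(text)) for entity, text in definitions.items()}

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_entities(sentence, models):
    """Score a sentence against every entity model, best match first."""
    vector = Counter(tokenize(sentence))
    return sorted(((cosine(vector, m), e) for e, m in models.items()), reverse=True)

# Hypothetical definitions standing in for knowledge-base entries.
definitions = {
    "shortness of breath": "difficulty breathing; a sensation of uncomfortable or labored breathing",
    "edema": "swelling caused by excess fluid trapped in body tissues",
}
models = build_entity_models(definitions)
ranking = rank_entities("Patient reports labored breathing at rest", models)
```

The sentence never names "shortness of breath", yet it shares "labored" and "breathing" with that entity's definition, so that entity ranks first. A real system would need a similarity measure that also handles synonyms and negation, which token overlap does not.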
Smart Data in Health – How we will exploit personal, clinical, and social “Bi... - Amit Sheth
The document discusses using big health data from personal, clinical, and social sources to better understand health outcomes through tools like the Kno.e.sis research center, which analyzes data from sensors, medical records, and social media to provide personalized health information and recommendations to improve care. It also describes specific projects like kHealth, which monitors asthma patients using mobile and sensor data, and PREDOSE, which tracks prescription drug abuse using information extracted from social media.
Predicting Adverse Drug Reactions Using PubChem Screening Data - Yannick Pouliot
This document discusses predicting adverse drug reactions using PubChem screening data. It aims to determine if specific classes of adverse drug reactions can be identified from patterns of compound reactivity in PubChem bioassay screens. The document outlines the hypothesis that drugs with increased frequency of tissue-specific adverse drug reactions can be identified from their bioassay screening patterns. It then presents results of predictive modeling for different system organ classes, showing areas under the curve for various models and highlighting top assays correlated with specific adverse event classes. Lessons learned are discussed around database and data loading challenges.
This document discusses chiropractic and the use of scanning technologies to analyze the nervous system and spinal health. It makes the following key points:
1) Various scanning and surface EMG technologies can provide objective data about a patient's nervous system function, spinal health, and stress levels that help chiropractors identify areas of nerve interference called subluxations.
2) Technologies like thermal scanning, heart rate variability, and inclinometry can measure a patient's "GAP" or general adaptive potential and see how well they can adapt to stressors. Lower GAP indicates more subluxations and health issues.
3) When used together, these scanning tools provide a comprehensive "CORE analysis" that takes a "
The document discusses a project to analyze and predict sepsis early using clinical data. It aims to predict sepsis 6 hours before clinical diagnosis to allow for earlier treatment. The author handles missing data and class imbalance in a large dataset. Features are engineered and selected. Decision trees and XGBoost models are used for prediction, achieving partial success. Further research is needed on time-series modeling, feature importance, and model performance with a domain expert.
Using Data Analytics to Discover the 100 Trillion Bacteria Living Within Each... - Larry Smarr
The document summarizes Dr. Larry Smarr's talk on using data analytics to analyze the human microbiome. Some key points:
- Next-generation sequencing and supercomputing are used to map the microbiomes of hundreds of people to analyze bacterial species abundance in health and diseases like IBD.
- Analysis with Dell Analytics and Ayasdi reveals major differences in bacterial phyla and protein families between healthy and disease states that can be used to noninvasively diagnose disease. Certain species are found at much higher or lower levels in disease states.
- Continued microbiome profiling and topological data analysis may help discover new diagnostic biomarkers for disease states and track disease progression.
This document summarizes a statistics lecture about the research process and why statistics are needed in optometry and vision science. It discusses the steps of evidence-based practice including asking questions, acquiring evidence, appraising evidence, and applying evidence. It also covers generating and testing theories, levels of measurement, measurement error, validity, reliability, types of research such as correlational and experimental research, and methods of data collection and analysis. The goal is to explain the research process and why statistics are an essential tool for evidence-based practice in optometry.
Mental Workload Alerts - Reliable Brain Measurements of HCI using fNIRS - Uni... - Max L. Wilson
A human-computer interaction research talk about how we measure mental workload, and how people might reflect on this type of personal data in the future. The research is carried out at the University of Nottingham in the School of Computer Science, involving functional Near Infrared Spectroscopy (fNIRS)
Clinician Decision Support By Clinicians, For CliniciansDATA360US
Jason Jones' Presentation on "Clinician Decision Support By Clinicians, For Clinicians" at DATA 360 Healthcare Informatics Conference - March 5th, 2015
This document discusses how telehealth and real-time analytics can help critical care achieve better health outcomes, better care, and lower costs. It describes how monitoring patients and gaining situation awareness is important for critical care. Real-time data analytics can help clinicians understand a patient's current physiological status and trajectory. Pattern recognition in patient data may help identify issues earlier. The challenges of big data in healthcare including volume, velocity, variety and veracity are discussed. Technologies that provide real-time situation awareness and predictive analytics could help improve patient care and outcomes in the ICU.
"Hacking the Software for Life" - Brad Perkins (Chief Medical Officer, Human ...Hyper Wellbeing
"Hacking the Software for Life" - Brad Perkins (Chief Medical Officer, Human Logevity, Inc.)
Delivered at the inaugural Hyper Wellbeing Summit, 14th November 2016, Mountain View, California.
For more information including details of subsequent events, please visit http://hyperwellbeing.com
The summit was created to foster a community around an emerging industry - Wellness as a Service (WaaS). Consumer technologies, in particular wearables and mobile, are powering a consumer revolution. A revolution to turn health and wellness into platform delivered services. A revolution enabling consumer data-driven disease risk reduction. A revolution extending health care past sick care towards consumer-led lifelong health, wellness and lifestyle optimization.
GEMC- Advanced Cardiac Life Support- for ResidentsOpen.Michigan
This is a lecture by Rockefeller Oteng from the Ghana Emergency Medicine Collaborative. To download the editable version (in PPT), to access additional learning modules, or to learn more about the project, see http://openmi.ch/em-gemc. Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike-3.0 License: http://creativecommons.org/licenses/by-sa/3.0/.
This document summarizes an AI lecture on applications of AI in medicine. It discusses 6 applications: 1) diagnosing diabetic eye disease using deep learning, 2) detecting anemia from retinal images using deep learning, 3) predicting cardiovascular risk using retinal images and deep learning, 4) performing differential diagnosis using probabilistic graphs, 5) extracting symptoms from clinical conversations using RNNs, 6) predicting osteoarthritis using machine learning. It also describes research on using machine learning to detect early biomarkers of osteoarthritis from MRI scans before symptoms appear. The lecture highlights how AI can help with early disease detection and augment doctor capabilities.
This document provides information from the American Academy of Sleep Medicine on sleepiness, alertness and fatigue among medical residents. It discusses the scope of the problem of sleep deprivation among residents and its impact on learning, patient care, health and safety. Specific consequences discussed include increased medical errors, impaired performance, risks of drowsy driving, and negative health effects. The document provides recommendations for managing alertness through napping, recovery sleep, healthy sleep habits and avoiding drowsy driving.
The document provides an overview of evidence-based medicine, including what it is, why it is important, where evidence comes from, levels of evidence, and the steps to practice evidence-based medicine. It discusses formulating a clear clinical question, searching for relevant information and studies, critically appraising the evidence, and applying the evidence to a specific patient case. Key aspects covered include evaluating the validity, importance, and applicability of various types of studies to determine the strength and relevance of the evidence. The goal is to systematically review and apply the best available research findings to clinical decision making.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
2. 2
Information Extraction
• More than 70% of data in organizations exists in unstructured form (source: https://en.wikipedia.org/wiki/Unstructured_data)
• Extraction of structured information from unstructured data is a fundamental task
“All home medications although his insulin dose (nph 20 qPM) was halved (--> NPH 10 qPM) on the floor, and his sugars were running in the 150s-250s range.”
[Figure: concepts related to the sentence: Insulin, Cisapride, Diabetes Mellitus, Hyperglycemia, Proinsulin, Porcine Insulin, Insulin Glulisine, connected by "is a" edges]
3. 3
Information Extraction
• Almost exclusively focused on explicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he started
to have left shoulder twinges and tingling in his hands. A stress test done on
2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to
fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed
accumulation of fluid in his extremities. He does not have any chest pain.”
4. 4
Information Extraction
• Almost exclusively focused on explicit information
Named Entity Recognition Relationship Extraction
Entity Linking
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he started
to have left shoulder twinges and tingling in his hands. A stress test done on
2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to
fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed
accumulation of fluid in his extremities. He does not have any chest pain.”
Person Person C0018795
C0015672
C0008031
5. 5
Information Extraction
• Misses the implicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac
catheterization because of a positive exercise tolerance test. Recently, he started
to have left shoulder twinges and tingling in his hands. A stress test done on
2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to
fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed
accumulation of fluid in his extremities. He does not have any chest pain.”
Person Person C0018795
C0015672
C0008031
No shortness of breath
edema
Named Entity Recognition Relationship Extraction
Entity Linking Implicit information extraction
6. 6
Thesis Statement
Implicit factual information in unstructured text can be efficiently
extracted by bridging syntactic and semantic gaps in natural language
usage and augmenting information extraction techniques with
relevant domain knowledge.
7. 7
• Express sarcasm/sentiment
• “I'm striving to be positive in what I say on Twitter. So I'll refrain
from making a comment about the latest Michael Bay movie.”
• Provide descriptive information
• “small fluid adjacent to the gallbladder with gallstones which may
represent inflammation”
• Emphasize features of the entity
• “Mason Evans 12 year long shoot won big in golden globe”
• Communicate the common understanding
• “He is suffering from nausea and severe headaches. Dolasetron was prescribed.”
• Stylistic Preferences
• “Democratic candidate Bernie Sanders … The Vermont senator …”
Credit:http://bit.ly/2b9Bnjk
8. 8
Significance
• Volume
• 20% movie references and 40% book references in tweets
• 35% edema and 40% shortness of breath references in clinical
narratives
• Value
Explicit Information
Computer Assisted Coding
30-day Readmission
Prediction
Sentiment Analysis
Structured
Information
9. 9
Significance
• Volume
• 20% movie references and 40% book references in tweets
• 35% edema and 40% shortness of breath references in clinical
narratives
• Value
Ignoring implicit information in text would adversely affect
downstream applications
Explicit Information
Implicit Information
Computer Assisted Coding
30-day Readmission
Prediction
Sentiment Analysis
Structured
Information
10. 10
Role of Knowledge
New Sandra Bullock astronaut lost in space
movie looks absolutely terrifying
The patient showed accumulation of fluid in his
extremities, but respirations were unlabored and there
were no use of accessory muscles.
Edema Accumulation of an excessive amount
of watery fluid in cells or intercellular
tissues
Shortness
of breath
Labored or difficult breathing
associated with a variety of disorders
Sandra
Bullock
Gravity
Knowledge Bases
Image credits: http://bit.ly/2b5HPDQ and Icon made by Freepik from www.flaticon.com
Credit: http://bit.ly/2bi34FG
Credit: http://bit.ly/1x3sack
Credit: http://bit.ly/2b9CejW
Credit: http://bit.ly/2aXM97v
12. 12
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
13. 13
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
14. 14
Implicit Entities in Clinical Documents
Sentence | Entity
“small fluid adjacent to the gallbladder with gallstones which may represent inflammation.” | Cholecystitis
“His tip of the appendix is inflamed.” | Appendicitis
“The respirations were unlabored and there were no use of accessory muscles.” | Shortness of breath (NEG)
• One should know the physiological observations that characterize a particular entity
• Negations are embedded in the phrases indicating entities
• “Patient denies shortness of breath”
• “The respirations were unlabored”
15. 15
Knowledge Acquisition
• Unified Medical Language System (UMLS) – integrates many health and biomedical vocabularies
• Linguistic Knowledge – WordNet
• Synonyms/antonyms
• Syntactic variations of the same term
CUI AUI STR
CUI TUI
CUI STR DEF SAB
Definitions for shortness of breath
A disorder characterized by an uncomfortable sensation of difficulty
breathing
Difficult or labored breathing
Labored or difficulty breathing associated with a variety of disorders,
indicating inadequate ventilation or low blood oxygen or a subjective
experience of breathing discomfort
16. 16
Knowledge Modeling
• Each entity has multiple definitions
• Each definition is processed to create an entity indicator
• The representative power of each term (e.g., r1) is calculated with a measure inspired by TF-IDF
• A collection of entity indicators constitutes the entity model
[Figure: definition1, definition2, definition3 → Entity Indicator1, Entity Indicator2, Entity Indicator3 → Entity Model]
Definition | Entity Indicator
A disorder characterized by an uncomfortable sensation of difficulty breathing | (uncomfortable, r1), (sensation, r2), (difficulty, r3), (breathing, r4)
Difficult or labored breathing | (difficult, r5), (labored, r6), (breathing, r4)
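The slide only says that representative power comes from a measure "inspired by TF-IDF", so the exact formula is not known. The following is a minimal sketch under stated assumptions: the stop-word list, the tokenizer, and the weight `tf * log(1 + N / entity_freq)` are all hypothetical stand-ins, chosen so that a term gets one weight per entity (as `breathing` carries `r4` in both indicators above).

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the slides do not say which one was used.
STOP_WORDS = {"a", "an", "the", "or", "of", "by", "with", "in", "and", "to"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def build_entity_models(definitions_by_entity):
    """Return {entity: [indicator, ...]}; each indicator maps term -> weight."""
    n_entities = len(definitions_by_entity)
    # Number of entities whose definitions mention each term.
    entity_freq = Counter()
    for defs in definitions_by_entity.values():
        entity_freq.update({t for d in defs for t in tokenize(d)})

    models = {}
    for entity, defs in definitions_by_entity.items():
        # Term frequency over all definitions of this entity.
        term_freq = Counter(t for d in defs for t in tokenize(d))
        # Assumed TF-IDF-like weight, not the dissertation's actual measure.
        weight = {t: c * math.log(1 + n_entities / entity_freq[t])
                  for t, c in term_freq.items()}
        # One indicator per definition, sharing the per-entity term weights.
        models[entity] = [{t: weight[t] for t in set(tokenize(d))} for d in defs]
    return models

DEFINITIONS = {
    "shortness of breath": [
        "A disorder characterized by an uncomfortable sensation of difficulty breathing",
        "Difficult or labored breathing",
    ],
    "edema": [
        "Accumulation of an excessive amount of watery fluid in cells or intercellular tissues",
    ],
}
models = build_entity_models(DEFINITIONS)
```

Here `models["shortness of breath"]` holds two indicators, one per definition, and the set of indicators plays the role of the entity model.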
17. 17
Detecting Sentences with Implicit Entities
• Sentences that contain an entity's representative term but not the entity name may contain an implicit mention of the entity.
“However, Mr. Smith is comfortably breathing in room air.”
Candidate sentence for shortness of breath
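The detection rule above can be sketched as a simple filter. This is an illustration only: the function name `is_candidate` and the choice of representative terms are assumptions, and a real system would also need to handle synonyms and spelling variants of the entity name.

```python
import re

def is_candidate(sentence, entity_name, representative_terms):
    """True if the sentence has a representative term but no explicit entity name."""
    text = sentence.lower()
    if entity_name.lower() in text:
        return False  # explicit mention, not a candidate for implicit extraction
    tokens = set(re.findall(r"[a-z]+", text))
    return bool(tokens & set(representative_terms))

# Hypothetical representative terms for "shortness of breath".
SOB_TERMS = {"breathing", "labored"}
```

With these assumed terms, "However, Mr. Smith is comfortably breathing in room air." is kept as a candidate, while "Patient denies shortness of breath" is not, because it names the entity explicitly.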
18. 18
Information Extraction – Entity Linking
• The similarity between the entity model and the pruned sentence is measured to annotate sentences with positive or negative labels
• We developed a semantic similarity measure that accounts for synonyms and antonyms
[Figure: the candidate sentence is compared with Indicator1, Indicator2, and Indicator3 of the entity model, giving sim1, sim2, and sim3]
19. 19
Information Extraction – Entity Linking
Candidate Sentence terms: ct1, ct2, ct3, ct4
Entity Indicator terms: et5, et6, et7
WordNet term similarity: if antonym then -1, else max similarity
$$\frac{\sum_{et} sim(et) \cdot rp(et)}{\sum_{et} rp(et)}$$
> t1: Positive Annotation
< t2: Negative Annotation
Evaluation
• Re-annotated the SemEval-2014 task 7 dataset for implicit entities
• Entities are selected considering the frequency of appearance and with expert feedback
• 857 sentences selected for 8 entities
• Annotated by three domain experts
• Annotation agreement 0.58
Entity | Positive Annotations | Negative Annotations
Shortness of Breath | 93 | 94
Edema | 115 | 35
Syncope | 96 | 92
Cholecystitis | 78 | 36
Gastrointestinal Gas | 18 | 14
Colitis | 12 | 11
Cellulitis | 8 | 2
Fasciitis | 7 | 3
21. 21
Annotation Performance
Algorithm | Positive Precision | Positive Recall | Positive F1 | Negative Precision | Negative Recall | Negative F1
Our | 0.66 | 0.87 | 0.75 | 0.73 | 0.73 | 0.73
MCS | 0.50 | 0.93 | 0.65 | 0.31 | 0.76 | 0.44
SVM | 0.73 | 0.82 | 0.77 | 0.66 | 0.67 | 0.67
Adding similarity value as a feature for the supervised algorithm:
SVM+MCS | 0.73 | 0.82 | 0.77 | 0.66 | 0.66 | 0.66
SVM+Our | 0.77 | 0.85 | 0.81 | 0.72 | 0.75 | 0.73
• Baselines
• MCS algorithm (Mihalcea 2006)
• SVM (trained on n-grams)
• Our algorithm outperforms the selected baselines in the negative category.
• SVM is able to leverage the supervision to beat our algorithm in the positive category.
22. 22
Similarity as a Feature to Supervised Algorithm
• Added similarity value of unsupervised algorithms as a feature to the
SVM.
Positive Annotations Negative Annotations
23. 23
Annotation Performance – A Study with the Confidence
• Each annotation has a confidence ranging from 1 to 5
• Low confidence reflects incomplete or ambiguous information
• Annotation performance increases as the confidence increases
• The negative class shows a significant increase
24. 24
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
25. 25
• Use diverse characteristics of the entity
– “New Sandra Bullock astronaut lost in space movie looks absolutely
terrifying”
– “ISRO sends probe to Mars for less money than it takes Hollywood to send a
woman to space.”
– “oh yeah there is that new space movie coming out that looks terrifying i am
going to go see it”
• Use time-sensitive phrases
[Figure: timeline with the movies Gravity, Furious 7, and The Martian, the dates Fall 2013, April 2014, and Fall 2015, and the phrases “space movie”, “fastest movie to earn $1 billion”, and “Paul Walker's last movie”]
Tweets with Implicit Entities
Credit: http://bit.ly/2bkePJ6
26. 26
• Use diverse characteristics of the entity
– “… Richard Linklater movie …”
– “… Ellar Coltrane on his 12-year movie …”
– “… 12-year long movie shoot …”
– “… Mason Evan's childhood movie …”
• Use time-sensitive phrases
[Figure: timeline with the movies Gravity, Furious 7, and The Martian, the dates Fall 2013, April 2014, and Fall 2015, and the phrases “space movie”, “fastest movie to earn $1 billion”, and “Paul Walker's last movie”]
Tweets with Implicit Entities
Credit: http://bit.ly/2bk8xdp
27. 27
Knowledge Acquisition
• Acquiring factual knowledge
• Source – DBpedia
• Not all factual knowledge is important: a movie has ‘starring’ and ‘director’ as well as ‘billed’ and ‘license’
• Rank the relationships based on joint probability with the entity type
• Values of top-k relationships and the value of rdfs:comment are obtained
• Acquiring contextual knowledge
• Source – contemporary tweets
• We collect 1000 tweets with explicit mentions of the entity
• Number of views for the entity’s Wikipedia page within the last t days
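The relationship ranking can be sketched as follows. The slide does not define how the joint probability is estimated, so this sketch assumes it is the fraction of all entities that both have the given type and carry the property; the function name `rank_properties` and the toy data are hypothetical.

```python
from collections import Counter

def rank_properties(entities, entity_type, k):
    """Rank properties by an assumed estimate of P(property, type); return the top k.

    entities: list of (type, set_of_property_names) pairs.
    """
    total = len(entities)
    counts = Counter()
    for etype, props in entities:
        if etype == entity_type:
            counts.update(props)
    # Joint probability estimate: count(type and property) / total entities.
    ranked = sorted(counts, key=lambda p: (-counts[p] / total, p))
    return ranked[:k]

# Toy data, not drawn from DBpedia.
TOY_ENTITIES = [
    ("Film", {"starring", "director"}),
    ("Film", {"starring", "director", "license"}),
    ("Film", {"starring"}),
    ("Book", {"author"}),
]
top_film_properties = rank_properties(TOY_ENTITIES, "Film", k=2)
```

In this toy data `starring` and `director` outrank `license`, matching the slide's point that only some relationships are worth keeping.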
28. 28
Knowledge Acquisition
Wikipedia page titles
and anchor texts
Contemporary
tweets
Generate
semantic cues
Factual
knowledge
Clean tweets
Generate n-grams
• Need to extract meaningful phrases from acquired
knowledge
• Meaningful phrases = Wikipedia titles + anchor texts
• Matching n-grams are added to semantic cues
• Non-matching n-grams are added to semantic cues
after removing stop words
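A minimal sketch of the cue-generation step, assuming lowercase matching, n-grams up to length 3, and a caller-supplied stop-word set; none of these choices are stated on the slide, and `known_phrases` stands in for the Wikipedia page titles and anchor texts.

```python
import re

def ngrams(tokens, max_n):
    """All contiguous n-grams of length 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

def semantic_cues(text, known_phrases, stop_words, max_n=3):
    """Matching n-grams are kept as-is; others are kept after stop-word removal."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    cues = set()
    for gram in ngrams(tokens, max_n):
        if gram in known_phrases:
            cues.add(gram)
        else:
            kept = [w for w in gram.split() if w not in stop_words]
            if kept:
                cues.add(" ".join(kept))
    return cues

cues = semantic_cues("Sandra Bullock lost in space",
                     known_phrases={"sandra bullock"},
                     stop_words={"in"})
```

The n-gram `lost in space` is not a known phrase in this toy setup, so it survives as `lost space`, while the bare stop word `in` is dropped.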
29. 29
Knowledge Modeling – Entity Model Network
• A property graph reflecting the topical relationships between entities
[Figure: entity model network with the nodes Gravity, Interstellar, The Martian, Sandra Bullock, Alfonso Cuarón, Christopher Nolan, Matt Damon, Mars orbiter mission, Woman in space, and astronaut. Legend: Factual Knowledge, Contextual Knowledge, Entity]
$$\text{specificity}(c_j) = \frac{|N|}{|N_{c_j}|}$$
where $|N|$ is the total number of entities and $|N_{c_j}|$ is the number of entities adjacent to cue $c_j$
frequency = total number of times the phrase occurs in tweets
time salience = number of Wikipedia views
30. 30
Detecting Tweets with Implicit Entities
• Tweets are filtered with keywords – movie, film, book, novel
• Applied simple annotation technique – dictionary matching
• Tweets that are not annotated with an entity of the types we are looking for are considered to have implicit entity mentions
Keywords
Entity
Dictionary
Annotating
Tweets
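The filtering described on this slide can be sketched in a few lines. The keyword list comes from the slide; the whole-word dictionary match and the function names are assumptions for illustration.

```python
import re

KEYWORDS = {"movie", "film", "book", "novel"}

def has_keyword(tweet):
    """True if the tweet contains one of the filter keywords as a whole word."""
    return bool(KEYWORDS & set(re.findall(r"[a-z]+", tweet.lower())))

def explicit_entities(tweet, dictionary):
    """Entity names from the dictionary that appear verbatim (whole words) in the tweet."""
    text = tweet.lower()
    return [name for name in dictionary
            if re.search(r"\b" + re.escape(name.lower()) + r"\b", text)]

def may_have_implicit_entity(tweet, dictionary):
    """Keyword present but no dictionary entity matched."""
    return has_keyword(tweet) and not explicit_entities(tweet, dictionary)

# Hypothetical entity dictionary.
MOVIE_DICTIONARY = ["Gravity", "The Martian", "Furious 7"]
```

Plain dictionary matching is deliberately crude here: a title such as "Gravity" would also match a tweet about physics, which a fuller pipeline would need to handle.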
31. 31
Information Extraction – Entity Linking
• Two Step Process
• Step 1: Candidate selection and filtering
• Objective - prune the search space to reduce number of entities to be
considered in disambiguation step from EMN
• Step 2: Disambiguation
• Objective - sort the selected candidate entities to place the implicitly
mentioned entity in top position
32. 32
Entity Linking - Candidate selection and filtering
m1
m2 m4
m5
m3
m7
m6
c1
c5
c8
c4
c6
c3
c2
c9
c7
“ISRO sends probe to Mars for less money
than it takes Hollywood movie to send a
woman to space”
m8
Entity Factual Knowledge Contextual Knowledge
33. 33
m1
m2 m4
m5
m3
m7
m6
c1
c5
c8
c6
c3
c2
c9
c7
“ISRO sends probe to Mars for less money
than it takes Hollywood movie to send a
woman to space” c5
c2 c7
c8
m8
Factual Knowledge Contextual Knowledge Entity
Entity Linking - Candidate selection and filtering
c4
35. 35
m1
m2 m4
m5
m3
m7
m6
c1
c5
c8
c6
c3
c2
c9
c7
c5
c2
m1
m2
m4
m5
m3
$$\text{score}(m_i) = \sum_{c_j \in \mathbb{C}} \text{specificity}(c_j) \cdot \text{frequency}(c_j, m_i)$$
c7
c8
m6
m7
m2
m4
m6
m7
m3
ℂ is the set of matching cues
m8
Factual Knowledge Contextual Knowledge Entity
“ISRO sends probe to Mars for less money
than it takes Hollywood movie to send a
woman to space”
Entity Linking - Candidate selection and filtering
c4
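Candidate selection can be sketched as scoring every entity adjacent to a cue matched in the tweet, then keeping the best ones. The graph layout (`adjacency`, `frequency` dictionaries) and the toy numbers are assumptions; the score is the sum of specificity times frequency over matching cues, and the cut-off of 25 follows the top-25 candidate set mentioned in the evaluation.

```python
from collections import defaultdict

def specificity(cue, adjacency, n_entities):
    """|N| / |N_cj|: total entities over the number of entities adjacent to the cue."""
    return n_entities / len(adjacency[cue])

def score_candidates(tweet_cues, adjacency, frequency, n_entities, top_k=25):
    """Sum specificity * frequency over the cues matched in the tweet."""
    scores = defaultdict(float)
    for cue in tweet_cues:
        for entity in adjacency.get(cue, ()):
            scores[entity] += (specificity(cue, adjacency, n_entities)
                               * frequency.get((cue, entity), 0))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

# Toy entity model network (hypothetical counts).
adjacency = {
    "woman in space": {"Gravity"},
    "mars": {"Gravity", "The Martian"},
}
frequency = {
    ("woman in space", "Gravity"): 3,
    ("mars", "The Martian"): 5,
    ("mars", "Gravity"): 1,
}
candidates = score_candidates(["woman in space", "mars", "hollywood"],
                              adjacency, frequency, n_entities=4)
```

A cue that touches only one entity (`woman in space`) counts for more than one shared by several, which is what lets Gravity outrank The Martian in this toy example.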
36. 36
• Formulated as a ranking problem
• SVMrank to rank candidates
• Similarity between the candidate entity and the tweet
• Temporal salience of the candidate entity
x1 x2 x3 … xn
$$x_j = \text{specificity}(c_j) \cdot \text{frequency}(c_j, e_i)$$
$$\frac{\text{temporal salience}(e_i)}{\sum_{e \in E_c} \text{temporal salience}(e)}$$
$E_c$ is the selected candidate set
m2
m6
m4m3
m7
Entity Linking - Disambiguation
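The two feature groups on this slide can be sketched as one vector per candidate: a cue-wise similarity part and a normalized temporal salience. How these features were encoded for SVMrank is not shown in the slides, so the layout below, the function name, and the page-view numbers are assumptions.

```python
def feature_vector(entity, candidates, cue_order, specificity, frequency, page_views):
    """Cue features x_j followed by temporal salience normalized over the candidate set."""
    xs = [specificity[c] * frequency.get((c, entity), 0) for c in cue_order]
    total_views = sum(page_views[e] for e in candidates)
    salience = page_views[entity] / total_views if total_views else 0.0
    return xs + [salience]

# Toy values (hypothetical).
cue_order = ["woman in space", "mars"]
cue_specificity = {"woman in space": 4.0, "mars": 2.0}
cue_frequency = {
    ("woman in space", "Gravity"): 3,
    ("mars", "The Martian"): 5,
    ("mars", "Gravity"): 1,
}
page_views = {"Gravity": 300, "The Martian": 100}
selected = ["Gravity", "The Martian"]

gravity_features = feature_vector("Gravity", selected, cue_order,
                                  cue_specificity, cue_frequency, page_views)
```

Vectors like `gravity_features` would then be handed to a learning-to-rank model; the ranking step itself is left out of this sketch.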
37. Evaluation Dataset
Entity Type | Annotation | Tweets | Entity
Movie | Explicit | 391 | 107
Movie | Implicit | 207 | 54
Movie | NULL | 117 | 0
Book | Explicit | 200 | 24
Book | Implicit | 190 | 53
Book | NULL | 70 | 0
• Tweets were collected in August 2014 using keywords
• Tweets were manually annotated with the DBpedia URL of entities
• The tweets annotated with NULL have neither an explicit nor an implicit mention of an entity
37
38. Entity Model Network Creation
• 15,000 tweets for movies and books in July 2014
• 617 movies and 102 books
• Recent 1000 tweets per entity to build its contextual knowledge
• May 2014 version of DBpedia used to extract factual knowledge
• Temporal salience is obtained for July 2014
m1
m2 m4
m5
m3
m7
m6
c1
c5
c8
c4
c6
c3
c2
c9
c7
38Factual Knowledge Contextual Knowledge Entity
39. Evaluation - Implicit Entity Linking
• How many tweets had correct entity within selected candidate set (top-25)?
• How many entities were correctly linked by our disambiguation approach?
Entity Type | Candidate Selection Recall | Disambiguation Accuracy
Movie | 90.33% | 60.97%
Book | 94.73% | 61.05%
• Importance of contextual knowledge
Step | Entity Type | Without Contextual Knowledge | With Contextual Knowledge
Candidate Selection Recall | Movie | 77.29% | 90.33%
Candidate Selection Recall | Book | 76.84% | 94.73%
Disambiguation Accuracy | Movie | 51.7% | 60.97%
Disambiguation Accuracy | Book | 50.0% | 61.05%
39
40. Qualitative Error Analysis
Error | Tweet | Entity
Lack of contextual knowledge | ‘That Movie Where Shailene Woodley Has Her First Nude Scene? The Trailer Is RIGHT HERE!: No one can say Shailene Woodley isn't brave!’ | White Bird in a Blizzard
Novel entities | ‘”hey, what's wrawng widdis goose?" RT @TIME: Mark Wahlberg could be starring in a movie about the BP oil spill http://ti.me/1oZh55V' | Deepwater Horizon
Cold start of entities | ‘Video: George R.R. Martin's Children's Book Gets Re-release http://bit.ly/1qNNH5r’ | The Ice Dragon
Multiple implicit entity mentions | ‘That moment when you realize that hazel grace and Augustus are brother and sister in one movie and in love battling cancer in another’ | Divergent, The Fault in Our Stars
40
41. 41
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
43. 43
• Implicit relationships:
• Exist between symptoms, disorders, medications, and procedures
• Can be established by leveraging domain knowledge
• The existing knowledge bases fall short in eliciting relationships
• Data + Knowledge can help to elicit such implicit relationships
efficiently
Implicit Relationships in Clinical Narratives
45. 45
A Scenario
Atrial fibrillation
Hypertension
Diabetes
Fatigue
Syncope
Weight loss
Chest pain
Discomfort in chest
Dizzy
Shortness of Breath
Nausea
Vomiting
Headache
Cough
Weight gain
Atrial fibrillation
Hypertension
Diabetes
Chest pain
Weight gain
Discomfort in chest
Cough
Headache
Edema
Shortness of Breath
Knowledge base does not know about edema. Now edema can
be a symptom of any disorder in the document.
Observed
Disorders
Observed
Symptoms
46. 46
Knowledge Acquisition
• Hierarchical knowledge and non-hierarchical knowledge
Hierarchical Knowledge
Retrieved from UMLS
Non-hierarchical Knowledge
Extracted from Web Resources
+
Feedback from domain expert
www.nlm.nih.gov www.en.wikipedia.org
www.webmd.com www.mayoclinic.com
www.clevelandclinic.org ww.healthline.org
MRHIER
CUI | AUI | PAUI | PTR
C0013404 | A0052186 | A0111363 | A0434168.A2367943. …
C0013604 | A0052723 | A0135504 | A0434168.A2367943
MRCONSO
CUI | AUI | SAB | STR
C0013404 | A0052186 | MSH | Shortness of breath
C0013604 | A0052723 | MSH | Edema
48. 48
Detecting Unexplained Symptoms
• Clinical documents were semantically annotated for entities using
cTAKES
• Known relationships are populated
• Unexplained symptoms were detected
[Figure label: Modeled Knowledge]
Credit:http://bit.ly/2aMWVAd
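Detecting unexplained symptoms amounts to a set difference once the known relationships are populated. The sketch below assumes the knowledge base is a mapping from disorder to its known symptoms; the toy entries are illustrative placeholders, not clinical facts from the dissertation's knowledge base.

```python
def unexplained_symptoms(observed_disorders, observed_symptoms, knowledge_base):
    """Symptoms in the document that no observed disorder explains via the knowledge base."""
    explained = set()
    for disorder in observed_disorders:
        explained |= knowledge_base.get(disorder, set())
    return set(observed_symptoms) - explained

# Toy knowledge base (hypothetical entries for illustration only).
TOY_KB = {
    "Hypertension": {"Headache"},
    "Atrial fibrillation": {"Chest pain"},
}
unexplained = unexplained_symptoms(
    observed_disorders=["Hypertension", "Atrial fibrillation"],
    observed_symptoms=["Chest pain", "Headache", "Edema"],
    knowledge_base=TOY_KB,
)
```

In this toy document edema is left unexplained, so it becomes the starting point for finding a plausible disorder among those observed.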
49. 49
Information Extraction – Unknown Relationships
• A naïve method would assume a relationship between the unexplained symptom and all disorders in the clinical narrative
• Can we leverage the knowledge we have about the symptom to find the most plausible disorders?
• Intuition: a symptom is most likely to be shared by similar disorders
54. 54
Information Extraction – Unknown Relationships
5. Eliminate non-matching candidate disorders
[Figure: symptom S linked to the remaining candidate disorders D2 and D4]
We are left with the most plausible disorders for the unexplained symptom. If this scenario occurs frequently, it increases the confidence in this relationship.
55. 55
Evaluation
• A corpus of 1,500 electronic medical records was used
• Records were annotated with cTAKES, and the most frequent entities were selected
• UMLS semantic types were used to categorize disorders and symptoms
• Initial knowledge base - 86 disorders, 42 symptoms, 255 disorder-symptom relationships
56. Evaluation – Relationship Prediction
• There were 29 distinct unexplained symptoms
• Precision of the questions generated
• 1st iteration - 105 correct from 142 (73.94%)
• 2nd iteration - 20 correct from 29 (68.96%)
• 3rd iteration - 4 correct from 9 (44.44%)
Top 5 unexplained symptoms
Symptom | Number of unexplained instances
Edema | 910
Syncope | 336
Systolic Murmur | 168
Tachycardia | 143
Angina | 136
Top 5 co-occurring disorders with edema
Disorder | Number of co-occurrences
Hypertension | 647
Hyperlipidemia | 641
Claudication | 454
Coronary atherosclerosis | 395
Coronary artery disease | 242
56
57. 57
Evaluation – Increment in Explainability
Knowledge base | Number of unexplained relationships | Increment in explainability
Initial knowledge base | 2251 | 0%
After 1st iteration | 878 | 60.99%
After 2nd iteration | 806 | 64.19%
58. 58
Summary
• Implicit information is a frequent occurrence in text, and ignoring it would adversely affect downstream applications.
• Linguistic and world knowledge play an important role in decoding implicit information.
• This dissertation demonstrated the characteristics of implicit information and developed a solution to capture factual implicit constructs.
Knowledge Acquisition → Knowledge Modeling → Detecting Implicit Information → Information Extraction
59. 59
Contributions
• Identified and demonstrated the value of implicit information.
• Studied the characteristics of how implicit information manifests.
• Demonstrated the value of knowledge in extracting factual implicit information.
- Linguistic - Domain - Contextual
• Developed a framework for factual implicit information extraction.
• Demonstrated the usage of the framework to solve three implicit information extraction problems.
60. 60
Graduate Life@Kno.e.sis
Journal Publications:
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas
Nair, Semantics Driven Approach for Knowledge Acquisition from EMRs, IEEE Journal of
Biomedical and Health Informatics.
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu
Chen and Amit Sheth. 'I just wanted to tell you that loperamide WILL WORK': A
Web-Based Study of Extra-Medical Use of Loperamide.
Conference Publications:
• Sujan Perera, Pablo Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad
Thirunarayan, Implicit Entity Linking in Tweets, ESWC 2016
• Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex,
Christopher Heid, Greg Mott, Implicit Entity Recognition in Clinical Documents, *SEM
2015
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, Data
Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in the
Healthcare, BIBM 2012
• Menasha Thilakaratne, Ruvan Weerasinghe, Sujan Perera, Knowledge-driven Approach
to Predict Personality Traits by Leveraging Social Media Data, WI 2016
Workshop and Posters:
• Sujan Perera, Amit Sheth, Krishnaprasad Thirunarayan, Challenges in Understanding
Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can
Help, DARE 2013
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu
Chen, Amit Sheth. A Web-Based Study of Self-Treatment of Opioid Withdrawal
Symptoms with Loperamide, CPDD 2012
Internships:
• ezDI Summer 2012
• IBM Watson Summer 2014 and 2015
Awards and grants:
• George Thomas Graduate Fellowship
• NSF travel grants: BIBM and ICHI
Program Committee:
• DARE (2013), EKAW (2014, 2016), ISWC (2015), IJCAI (2016)
External Reviewer:
• ISWC, ESWC, IJSWIS, IEEE Intelligent
Systems, Applied Ontology, ODBASE
Proposal Contributions:
• eDrugTrends (NIH R01)
• Healthcare Outcome Prediction (NSF-SCH)
Mentoring:
• Adarsh Alex (MSc)
• Menasha Thilakaratne (BSc)
62. 62
Coffee Mates and Colleagues
Thank You
Funding
• ezDI
• George Thomas Fellowship
• NSF: CNS 1513721 Context-Aware Harassment Detection on Social Media
Editor's Notes
Hi, good morning everyone, and thanks for attending my dissertation defense.
My dissertation topic is “Knowledge-driven Implicit Information Extraction”
My dissertation committee is Dr. Sheth, my advisor; Dr. Prasad; Dr. Raymer; and, joining us remotely, Dr. Mendes from IBM Research
It is commonly estimated that more than 70% of data is unstructured
The field of information extraction focuses on extracting structured information from unstructured data
Information extraction can add more semantics to the text
For example, once we know that the insulin here is a medication and not a regular English term, we know more about the text
We know it is probably treating diabetes and that it should not be used with Cisapride
So far, we as a field have focused exclusively on explicit information extraction
Here is a text snippet extracted from a clinical narrative
With current information extraction techniques we can identify these entities and relationships; that is, we can perform named entity recognition, entity linking, and relationship extraction on this text
However, this snippet also has these two phrases, which say that the patient does not have shortness of breath and that he has edema.
These are implicit, and we don’t know how to identify such implicit entities in text
This is the problem I address in my dissertation work
My thesis statement is …
Implicit communication is not a random occurrence; people have genuine reasons for it
The first example here expresses negative sentiment towards a Michael Bay movie and is sarcastic in nature
Sometimes the term used to indicate an entity does not provide complete and important information. In the second example, the term cholecystitis alone does not provide enough information; having a better idea about the cause helps accurate treatment, and in the medical domain it is important to know these nuances
We also speak implicitly when we forget a name, or to emphasize features of the entity rather than the entity itself
One more instance is when we know the other party has the knowledge required to interpret the statement
Implicit information has enough volume to get noticed, and enough value, at least in the domains we looked at.
Our studies showed that some entities are mentioned implicitly 20%-40% of the time
In terms of value, extracting structured information is very important to applications like computer-assisted coding, to secondary data analysis such as prediction tasks on medical data, and to tasks like sentiment analysis on tweets
But all of them have been functioning without the implicit information
Sometimes this handicaps them
For example a sentiment analysis application would not know the target of the positive sentiment in the tweet “new Sandra Bullock space movie is terrific”
Hence ignoring implicit entity mentions would adversely affect downstream applications
Whenever we talk implicitly we assume a common understanding between us and the audience
For example, in the first scenario here, the speaker assumes that the audience knows the physiological observations that characterize edema and shortness of breath
In the second scenario, the speaker assumes that the audience knows Sandra Bullock starred in Gravity
This knowledge is available in different formats.
Such resources contain linguistic knowledge, common sense knowledge, domain-specific and cross-domain world knowledge
http://icons.mysitemyway.com/legacy-icon/115791-magic-marker-icon-people-things-people-audience/
Icon made by Freepik from www.flaticon.com
In this dissertation I address the problem of implicit information extraction.
Here are the four main components of the solution I developed.
As demonstrated, domain knowledge is an important component, hence the solution acquires relevant domain knowledge.
The acquired knowledge is not always readable by machine algorithms
Hence we model it in a way that can be consumed by algorithms.
Then, given a text, it identifies whether the text has implicit information of interest and applies techniques to extract such information.
In this talk I will demonstrate three applications of implicit information extraction and show how each of these high-level components is realized to solve the problem at hand.
Before going into the solutions, let me define the focus of the dissertation
Many things can be expressed implicitly: ideas, opinions, and facts.
What we observe is that when entities are mentioned implicitly in clinical documents, they are described through physiological observations.
Sometimes you need the semantics of the language to understand what is in the description
One of the prominent places where you can obtain medical knowledge is UMLS.
It provides human-curated definitions of entities in terms of their physiological observations, which is ideal input for extracting implicit entities.
Linguistic knowledge about the semantics of the terms can be obtained from WordNet
The definitions are text snippets and cannot be used as they are to understand implicit entity mentions.
Hence, we process them and create models that can be read by the machines.
The idea is that an entity model consists of the essential characteristics described in the definitions and is machine readable
An entity can have multiple definitions. Each definition uses different terminology to explain the entity, and this is important since the entity's manifestations in text can also use different terminology.
So the entity model benefits from this and uses all definitions: each definition yields an entity indicator, and a collection of entity indicators is called an entity model.
Here is an example for shortness of breath…
The r_i in the entity indicator is called the representative power, and it reflects the term’s ability to discriminatively identify a mention of the entity
The entity representative term (ERT) can be used to identify candidate sentences with implicit entity mentions
Sentences that contain the ERT but not the entity name are identified as candidate sentences, and they are pruned to keep the essential portion of the sentence.
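The candidate detection step described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the function name, the prefix matching on the ERT, and the choice of "breath" as the ERT for shortness of breath are all assumptions, and the pruning step is left out.

```python
import re


def find_candidate_sentences(sentences, ert, entity_names):
    """Keep sentences that contain the ERT but no explicit entity name.

    Assumption: the ERT is matched as a word prefix (so "breath" also
    matches "breathing"); the source does not say how matching is done.
    """
    ert_pattern = re.compile(r"\b" + re.escape(ert) + r"\w*", re.IGNORECASE)
    candidates = []
    for sentence in sentences:
        lowered = sentence.lower()
        has_ert = ert_pattern.search(sentence) is not None
        has_name = any(name.lower() in lowered for name in entity_names)
        if has_ert and not has_name:
            candidates.append(sentence)
    return candidates


# Toy sentences written for this sketch (not taken from the evaluation data).
sentences = [
    "The patient denies shortness of breath.",
    "He is breathing comfortably on room air.",
    "No edema was noted in the lower extremities.",
]
candidates = find_candidate_sentences(
    sentences, ert="breath", entity_names=["shortness of breath", "dyspnea"]
)
print(candidates)
```

Only the second sentence is kept: the first names the entity explicitly and the third does not contain the ERT.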
The entity linking step takes the pruned sentence and the entity model as inputs and compares them to see whether the sentence mentions the entity of the given model.
Basically, the pruned sentence is compared with each entity indicator and the similarity between them is measured
Similarity is calculated by comparing the terms in the entity indicator and the candidate sentence
WordNet is used to understand the lexical semantics
Each term similarity is weighted by the term's representative power and normalized
The total similarity value lies between -1 and +1
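A rough sketch of this weighted, normalized similarity is shown below. Several parts are assumptions made only for illustration: a small hand-written synonym table stands in for WordNet, the synonym score of 0.8 is arbitrary, and the sign of the result is flipped when the sentence is negated, which is one plausible way to obtain the -1 to +1 range mentioned above.

```python
def term_similarity(a, b, synonyms):
    """Toy lexical similarity; a stand-in for a WordNet-based measure."""
    if a == b:
        return 1.0
    if b in synonyms.get(a, ()) or a in synonyms.get(b, ()):
        return 0.8  # hypothetical score for synonym pairs
    return 0.0


def indicator_similarity(indicator, sentence_terms, synonyms, negated=False):
    """indicator maps each term to its representative power r_i."""
    total_power = sum(indicator.values())
    if total_power == 0:
        return 0.0
    score = 0.0
    for term, power in indicator.items():
        best = max(
            (term_similarity(term, t, synonyms) for t in sentence_terms),
            default=0.0,
        )
        score += power * best  # weight by representative power
    score /= total_power  # normalize to [0, 1]
    return -score if negated else score  # assumed handling of negation


def entity_similarity(entity_model, sentence_terms, synonyms, negated=False):
    """Compare the sentence with every indicator and keep the strongest match."""
    scores = [
        indicator_similarity(ind, sentence_terms, synonyms, negated)
        for ind in entity_model
    ]
    return max(scores, key=abs, default=0.0)


# Hypothetical entity model for shortness of breath with two indicators.
entity_model = [{"difficulty": 0.3, "breathing": 0.7}, {"breathless": 1.0}]
synonyms = {"breathing": {"respiration"}, "difficulty": {"labored"}}
terms = ["patient", "has", "labored", "respiration"]

positive = entity_similarity(entity_model, terms, synonyms)
negative = entity_similarity(entity_model, terms, synonyms, negated=True)
print(positive, negative)
```

A threshold on the absolute value would then decide whether the sentence mentions the entity at all, and the sign whether the mention is negated; the threshold itself is not given in these notes.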
We evaluated this work by creating a dataset by re-annotating a SemEval dataset.
We selected 8 entities based on their frequency of appearance and selected 850 sentences with implicit mentions of these entities.
The table shows the distribution of the sentences for each entity across the two classes.
In order to assess how our proposed algorithm works, we used one unsupervised and one supervised baseline.
As shown in the first three rows, our algorithm does well in identifying the negated mentions of the entities, and the SVM does slightly better in identifying positive mentions.
So as the next step we added the similarity value we calculated as a feature to the supervised solution.
The supervised solution was able to leverage the semantics of the similarity value to enhance its results.
For example, in ‘he was placed on mechanical ventilation shortly after presentation’, mechanical ventilation is not in the definitions, but experts annotate it as a positive mention; the SVM captures such bigrams in training
Although the supervised algorithm does well with the added feature, it needs to be trained for each clinical entity.
This means it needs labeled data for each entity; we checked whether this can be compensated for with our features
Then we studied our results against annotation confidence; each annotation has a confidence marked from 1 to 5.
Low confidence reflects incomplete or ambiguous information
So we expect our results to increase as the annotation confidence increases.
High-confidence negations are mostly explicit; low-confidence negations are often cases where it is unclear whether the sentence expresses negation or simply lacks the entity.
For example, annotators often argue whether ‘patient breathing at a rate of 15-20’ means the negation of entity ‘shortness of breath’ (because that is a normal breathing pattern) or just lacks a mention of the entity.
Twitter is a very dynamic environment, and it reacts to real-world events quickly
This is reflected in implicit mentions of entities in tweets
Tweets use diverse characteristics of the entities to mention them in an implicit manner
The phrases they use are time-sensitive
Static, manually curated definitions do not give sufficient information to decipher them
Look at these examples:
The same entity can be referred to using different characteristics: all these phrases refer to the movie Gravity using different features of the movie
Knowledge should be up to date: the same phrase can be used to refer to different entities at different times
The same entity can be referred to with different phrases at different times
First example is similar to medical entity definitions where each entity had more than one way to express it, hence multiple definitions. But in this case the phrases listed here are not paraphrases, they are literally different ways to express the entity.
Knowledge acquisition should acquire very comprehensive knowledge: we use three different sources to acquire three different types of knowledge
We acquire factual knowledge from DBpedia.
Contextual knowledge, which reflects the topics temporally relevant to the entity, is acquired from daily communications on Twitter.
Knowledge about the temporal relevance of the entity is obtained from the number of hits to its Wikipedia page.
The collected knowledge has unstructured text components, especially tweets, and we need to identify meaningful phrases in them
In order to do that, we create a dictionary of meaningful phrases using Wikipedia page titles and anchor texts
Then this dictionary is used to spot meaningful phrases in unstructured text; the spotted phrases are called semantic cues
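Dictionary-based spotting of semantic cues can be sketched like this. It is only an illustration under simplifying assumptions: the dictionary here is a tiny hand-made set rather than one built from Wikipedia titles and anchor texts, tokenization is plain whitespace splitting, and the greedy longest-match strategy is a choice made for this sketch.

```python
def spot_semantic_cues(text, phrase_dictionary, max_len=4):
    """Greedy longest-match lookup of dictionary phrases in the text."""
    tokens = text.lower().split()
    cues = []
    i = 0
    while i < len(tokens):
        matched = None
        # Try the longest phrase first so "sandra bullock" beats "sandra".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrase_dictionary:
                matched = candidate
                i += n
                break
        if matched is None:
            i += 1
        else:
            cues.append(matched)
    return cues


# Hypothetical dictionary entries, for illustration only.
phrase_dictionary = {"sandra bullock", "astronaut", "space"}
cues = spot_semantic_cues(
    "New Sandra Bullock astronaut lost in space movie", phrase_dictionary
)
print(cues)
```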
Knowledge is modelled to reflect the association of the semantic cues with the entities
Some semantic cues are shared by multiple entities
The modelled knowledge reflects the topical relationship between entities
The semantic cues are weighted based on their specificity towards entities: the cues that are common among multiple entities are given low weight
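One simple way to realize such specificity weighting is sketched below. The actual weighting formula is not given in these notes, so the inverse of the number of entities sharing a cue is used purely as an assumption; the entities and cues are toy data.

```python
from collections import defaultdict


def build_entity_model_network(entity_cues):
    """Return a mapping cue -> {entity: weight}.

    Assumed weighting: 1 / (number of entities sharing the cue), so a cue
    shared by many entities contributes little evidence for any one of them.
    """
    cue_to_entities = defaultdict(set)
    for entity, cues in entity_cues.items():
        for cue in cues:
            cue_to_entities[cue].add(entity)
    network = {}
    for cue, entities in cue_to_entities.items():
        weight = 1.0 / len(entities)
        network[cue] = {entity: weight for entity in entities}
    return network


# Toy cue sets for three movies (illustrative, not the evaluated network).
entity_cues = {
    "Gravity": {"sandra bullock", "astronaut", "space"},
    "The Heat": {"sandra bullock", "melissa mccarthy"},
    "Interstellar": {"space", "christopher nolan"},
}
network = build_entity_model_network(entity_cues)
print(network["astronaut"], network["sandra bullock"])
```

In this toy network "astronaut" points only to Gravity with weight 1.0, while "sandra bullock" is split between two movies at 0.5 each.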
The entity model network provides the necessary infrastructure to perform entity linking
Now we need to identify tweets with implicit entities
We do this by first filtering tweets from the stream using keywords and then annotating them for explicit entity mentions; the tweets that are not annotated with entities are assumed to have implicit entity mentions
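The filtering logic above can be sketched as follows. The explicit-mention check here is a plain substring lookup against a list of known entity names, which is only a stand-in assumption for a real entity annotator; the keywords and tweets are toy inputs.

```python
def split_tweets(tweets, keywords, known_entity_names):
    """Split keyword-filtered tweets into explicit and assumed-implicit."""
    explicit, assumed_implicit = [], []
    for tweet in tweets:
        lowered = tweet.lower()
        if not any(keyword in lowered for keyword in keywords):
            continue  # outside the domain of interest
        # Stand-in for an explicit entity annotator (assumption).
        if any(name.lower() in lowered for name in known_entity_names):
            explicit.append(tweet)
        else:
            assumed_implicit.append(tweet)
    return explicit, assumed_implicit


tweets = [
    "Gravity is the best movie of the year",
    "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying",
    "Coffee first, then work",
]
explicit, assumed_implicit = split_tweets(
    tweets, keywords=["movie", "film"], known_entity_names=["Gravity"]
)
print(len(explicit), len(assumed_implicit))
```

Note that the assumed-implicit bucket will also contain tweets that mention no entity at all, so it is a candidate set rather than a clean set of implicit mentions.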
The information extraction phase links tweets with implicit entities to the correct entity
It consists of two phases
The first phase finds the candidate entities, and the second phase ranks the candidate entities so that the correct entity is ranked in the top place
Let's walk through one example here
First it matches the phrases in the incoming tweet with the nodes in the entity model network
All entities that have a matching node as a neighbor become candidates
The matching entities are scored based on the strength of the evidence: the strength captures how many semantic cues match and how specific those cues are for a particular entity
Then we take the top-k entities as candidates for the ranking step
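Candidate selection as walked through above might look like the sketch below. The network literal, the additive scoring, and the alphabetical tie-break are assumptions for illustration; the notes only say that the score reflects how many cues match and how specific they are.

```python
from collections import defaultdict


def select_candidates(tweet_cues, network, k=25):
    """Score entities by summed weights of matching cues; return the top k."""
    scores = defaultdict(float)
    for cue in tweet_cues:
        for entity, weight in network.get(cue, {}).items():
            scores[entity] += weight
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical entity model network: cue -> {entity: specificity weight}.
network = {
    "sandra bullock": {"Gravity": 0.5, "The Heat": 0.5},
    "astronaut": {"Gravity": 1.0},
    "space": {"Gravity": 0.5, "Interstellar": 0.5},
}
top = select_candidates(["sandra bullock", "astronaut", "space"], network, k=2)
print(top)
```

With these toy weights Gravity collects evidence from all three cues and comes first, while the other movies each match a single shared cue.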
The ranking step considers the similarity between the tweet and the entity and temporal salience of the entity to rank candidates
Isn’t the ranking from candidate selection enough? No, that ranking considers only matching phrases; what about the impact of the non-matching phrases? If the non-matching phrases are more specific to an entity, that entity should be ranked lower.
In order to evaluate this work we annotated tweets from two domains: movies and books
Here are the statistics of the annotation.
When you filter the tweets with keywords there are three types of tweets: tweets with explicit entities, tweets with implicit entities, and tweets with no entities
For example, the evaluation dataset has 391 tweets with explicit movie entities, 207 tweets with implicit movie entities, and 117 with no movie mentions
Total tweets: 1175
The evaluation dataset was collected in August 2014; we created the entity model network for July 2014
We collected entities of interest by collecting 15,000 tweets and annotating them
So the EMN consists of 617 movies and 102 books
We collected a maximum of 1000 tweets per entity to extract contextual knowledge
Wikipedia statistics for July 2014 are used to estimate the temporal salience
We evaluated both candidate selection and disambiguation
Candidate selection achieved more than 90% recall and disambiguation achieved around 61% accuracy
Then we evaluated the contribution of the knowledge extracted from Twitter conversations: there is a clear improvement in results in both steps when entity models are built with contextual knowledge
Candidate selection recall: the proportion of tweets that had the correct entity within the top k (k=25) candidates
Disambiguation accuracy: the proportion of tweets that had the correct entity in the top position
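The two metrics defined above can be computed as in this short sketch. The data structures (a dict of ranked entity lists per tweet and a dict of gold entities) and the toy values are assumptions for illustration.

```python
def candidate_selection_recall(ranked_lists, gold, k=25):
    """Share of tweets whose correct entity is within the top k candidates."""
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for tweet_id, ranked in ranked_lists.items()
        if gold[tweet_id] in ranked[:k]
    )
    return hits / len(ranked_lists)


def disambiguation_accuracy(ranked_lists, gold):
    """Share of tweets whose correct entity is ranked in the top position."""
    if not ranked_lists:
        return 0.0
    hits = sum(
        1 for tweet_id, ranked in ranked_lists.items()
        if ranked and ranked[0] == gold[tweet_id]
    )
    return hits / len(ranked_lists)


# Toy rankings for four tweets (not the evaluation data).
ranked_lists = {
    "t1": ["Gravity", "Interstellar"],
    "t2": ["The Heat", "Gravity"],
    "t3": ["Interstellar"],
    "t4": ["Gravity"],
}
gold = {"t1": "Gravity", "t2": "Gravity", "t3": "Gravity", "t4": "Gravity"}

recall = candidate_selection_recall(ranked_lists, gold, k=25)
accuracy = disambiguation_accuracy(ranked_lists, gold)
print(recall, accuracy)
```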
Why disambiguation is hard: the number of candidates is high and they are all relevant. Compare finding candidates for the term Gravity with finding candidates for movies with Sandra Bullock; the second has more candidates
Also, when disambiguating the term Gravity, the candidates are entities of different types, but in the second scenario all of them are movies, and disambiguation among entities of the same type is harder than among entities of different types
Here are the four types of errors we found in the entity linking step
Show correct examples and how the EMN is activated for them
Look at this example document snippet; it lists disorders, symptoms, medications, and procedures,
but it does not tell how they are related
A medical professional reading this document will create a graph like this
These relationships exist between all types of entities in clinical documents.
If you have relevant domain knowledge about the relationships between these entities, it is possible to establish these relationships
Let's have a look at a scenario
The task boils down to finding the knowledge missing from the knowledge base in an efficient manner
Assume that you have this kind of knowledge which has the relationships between disorders and symptoms
Now given a document with entities shown on your right, you can establish the relationships between disorders and symptoms.
However, it seems the knowledge base does not know why edema is present in the document.
This is frequently the case with clinical knowledge bases: they lack non-hierarchical relationships
Now you have the choice of asking a domain expert for an explanation of the presence of edema in this document.
Instead, we developed an algorithm that leverages what is known about edema to make the best guess among the three disorders here
Essentially, it analyzes the clinical narratives with the available partial domain knowledge and identifies the gaps in the knowledge base.
Then it suggests links that should be in the knowledge base; these links ultimately help to establish implicit relationships between entities in clinical records
We start with collecting the knowledge about clinical entities in standard knowledge bases
This knowledge includes hierarchical and non-hierarchical relationships between the entities.
Hierarchical knowledge is obtained from UMLS and non-hierarchical knowledge is obtained from credible Web resources
Then we model the entities of interest using collected knowledge.
Entity models look like this.
In order to detect unexplained symptoms, we assume that all symptoms in a clinical document should be explained by the disorders present in the same document.
Whenever we see symptoms present without explanation, we flag them.
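The flagging rule above reduces to a set difference, sketched here. The knowledge base is a plain dict with toy entries; in the dissertation this step is done with the IntellegO ontology and perceptual reasoning, which this sketch does not attempt to reproduce.

```python
def find_unexplained_symptoms(doc_disorders, doc_symptoms, knowledge_base):
    """Return symptoms not explained by any disorder in the same document.

    knowledge_base maps a disorder to the set of symptoms it is known to
    cause (a simplified stand-in for the modeled knowledge).
    """
    explained = set()
    for disorder in doc_disorders:
        explained |= knowledge_base.get(disorder, set())
    return sorted(set(doc_symptoms) - explained)


# Toy knowledge base entries, for illustration only.
knowledge_base = {
    "Hypertension": {"Headache"},
    "Coronary artery disease": {"Angina"},
}
unexplained = find_unexplained_symptoms(
    doc_disorders=["Hypertension", "Coronary artery disease"],
    doc_symptoms=["Angina", "Edema"],
    knowledge_base=knowledge_base,
)
print(unexplained)
```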
The first step to do this is to identify the entities in clinical documents and establish known relationships
We used cTAKES to annotate clinical documents and then used acquired knowledge to establish relationships
All this is done using the IntellegO ontology with perceptual reasoning
Now our task is to find the relationship that is not present in the knowledge base
A naïve method would say any of the co-occurring disorders can have the relationship
But, can we use the knowledge that we have about the symptoms to make smart guesses?
Let's walk through an exercise that essentially does this
Assume S is the unexplained symptom and D1 to D5 are co-occurring disorders
We know that S is a symptom of D6 and D7
We know more about D6 and D7; let's use this knowledge
Now it seems that there is an overlap between the co-occurring disorders and the disorders that are similar to D6 and D7
This tells us that D2 and D4 are the best candidates to have a relationship with S
So we pruned the search space and consult the domain expert only about the relationship of S with D2 and D4
Now, if you see this combination frequently, you have more confidence in your guesses. This is exactly what we did
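The exercise above can be written down as a small set computation. How disorder similarity is derived is not described in these notes, so the `similar_to` map is a hypothetical input; D1 to D7 and S are the placeholder names from the walkthrough, and D8 is an extra placeholder added for this sketch.

```python
def plausible_disorders(symptom, cooccurring, knowledge_base, similar_to):
    """Keep co-occurring disorders that resemble known causes of the symptom."""
    # Disorders already known to have this symptom (D6 and D7 in the example).
    known = {d for d, symptoms in knowledge_base.items() if symptom in symptoms}
    # Disorders considered similar to those known disorders (assumed input).
    similar = set()
    for disorder in known:
        similar |= similar_to.get(disorder, set())
    # The overlap is the pruned candidate list to show to a domain expert.
    return sorted(set(cooccurring) & similar)


knowledge_base = {"D6": {"S"}, "D7": {"S"}}
similar_to = {"D6": {"D2", "D8"}, "D7": {"D2", "D4"}}  # hypothetical
candidates = plausible_disorders(
    "S", ["D1", "D2", "D3", "D4", "D5"], knowledge_base, similar_to
)
print(candidates)
```

Counting how often each surviving symptom-disorder pair shows up across the corpus would then give the frequency-based confidence mentioned above.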
We evaluated this work with 1500 electronic medical records
Our initial knowledge base had 86 disorders, 42 symptoms, and 255 relationships
In our evaluation dataset there were 29 distinct unexplained symptoms, and the table on the right shows the disorders that most frequently appeared when edema was found unexplained
Our algorithm ran in an iterative manner
Run the first round, find relationships, add them to the knowledge base, and run the same reasoning again on the corpus
The precision of the generated questions is around 73% in the first iteration and decreases in later iterations
Number of correct relationships:
1st iteration – 105 (105/142)
2nd iteration – 20 (20/29)
3rd iteration – 4 (4/9)
Adding more relationships improves the explainability of the documents
This is because we are identifying more implicit relationships in documents
Explainability of the corpus increased by 64% after the 2nd iteration
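The increment figures can be reproduced with a one-line formula, sketched below. The formula itself is an assumption (the relative drop in unexplained relationships); it is used here because it matches the reported 60.99% and 64.19% up to rounding, with the counts taken from the evaluation slide.

```python
def explainability_increment(initial_unexplained, remaining_unexplained):
    """Percentage of initially unexplained relationships now explained."""
    explained = initial_unexplained - remaining_unexplained
    return explained / initial_unexplained * 100


initial = 2251  # unexplained relationships with the initial knowledge base
after_first = explainability_increment(initial, 878)
after_second = explainability_increment(initial, 806)
print(f"{after_first:.2f}% {after_second:.2f}%")
```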
In summary, we showed that we cannot afford to ignore implicit information in text
Knowledge about the world plays a significant role in extracting implicit information. I have demonstrated this with three applications.
Here is a summary of what we did in each component for each application demonstrated in this talk
Unsupervised – no labelled data
Here is the explicit list of the contributions of the dissertation
I have helped a few people during their ezDI internships