Here are 3 similar requests with the experts' answers:
Nearest neighbor 1 I am still breastfeeding my son who is 9 months old. My periods
are very irregular. Can I get pregnant again?
Expert: As long as you are breastfeeding, your fertility is reduced. However,
irregular periods do not necessarily mean that you are infertile. It is
possible to get pregnant before your first period. But to plan a pregnancy,
I would recommend weaning the baby and waiting for at least 3 regular cycles.
Then try to conceive. Come see us if it does not work within 6 months.
Nearest neighbor 2 My cycles are very irregular since giving birth 6 months ago. I am
still breast
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
Drug discovery and development is a long and expensive process and over time has notoriously bucked Moore’s law that it now has its own law called Eroom’s Law named after it (the opposite of Moore’s). It is estimated that the attrition rate of drug candidates is up to 96% and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes for the high attrition rate is drug safety, which accounts for 30% of the failures.
Even if a drug is approved in market, it could be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, Deep learning and ML based techniques have had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use which can generally divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address the recent progress in predictive models and techniques built for various toxicities. It will also cover some publicly available databases, tools and platforms available to easily leverage them.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWScsandit
The Web considers one of the main sources of customer opinions and reviews which they are represented in two formats; structured data (numeric ratings) and unstructured data (textual comments). Millions of textual comments about goods and services are posted on the web by customers and every day thousands are added, make it a big challenge to read and understand them to make them a useful structured data for customers and decision makers. Sentiment
analysis or Opinion mining is a popular technique for summarizing and analyzing those opinions and reviews. In this paper, we use natural language processing techniques to generate some rules to help us understand customer opinions and reviews (textual comments) written in the Arabic language for the purpose of understanding each one of them and then convert them to a structured data. We use adjectives as a key point to highlight important information in the text then we work around them to tag attributes that describe the subject of the reviews, and we associate them with their values (adjectives).
This document compares classification and regression models using the CARET package in R. Four classification algorithms are evaluated on Titanic survival data and three regression algorithms are evaluated on property liability data. For classification, random forests performed best based on the F-measure metric. For regression, gradient boosted models performed best based on RMSE. The document concludes classification can predict Titanic survivor characteristics while regression can predict property hazards.
This document discusses applying machine learning algorithms to three datasets: a housing dataset to predict prices, a banking dataset to predict customer churn, and a credit card dataset for customer segmentation. For housing prices, linear regression, regression trees and gradient boosted trees are applied and evaluated on test data using R2 and RMSE. For customer churn, logistic regression and random forests are used with sampling to address class imbalance, and evaluated using confusion matrix metrics. For credit card data, k-means clustering with PCA is used to segment customers into groups.
This document outlines the aim, objectives, scope, and structure of a dissertation on using genetic programming to optimize and combine K nearest neighbor classifiers for intrusion detection. The aim is to use genetic programming with the KDD Cup 1999 dataset to develop a numeric classifier that shows improved performance over individual KNN classifiers. The objectives are to determine if a GP-based numeric classifier outperforms individual KNN classifiers, if GP combination techniques produce higher performance than KNN component classifiers, and if heterogeneous KNN classifier combination performs better than homogeneous combination. The document describes the methodology that will be used, including developing an optimal KNN classifier using fitness evaluation in the first phase and combining optimal KNN classifiers based on ROC curves in the second phase.
El documento describe las principales partes de un computador, incluyendo la CPU, que es el cerebro del computador donde se producen la mayoría de los cálculos; la tarjeta madre, que contiene los conectores para conectar otras tarjetas; y la memoria RAM, que permite acceder aleatoriamente a los bytes de memoria sin tener que acceder a los anteriores.
IRJET- Classification of Chemical Medicine or Drug using K Nearest Neighb...IRJET Journal
This document proposes using a combination of K-nearest neighbors (KNN) and genetic algorithms to classify chemical medicine or drug data with improved accuracy. KNN is described as a simple and effective classification algorithm that stores training data instances. Genetic algorithms are presented as evolutionary algorithms useful for optimization problems. The proposed system applies genetic search to rank attribute importance, selects high-ranked attributes, and then applies both KNN and genetic algorithms to classify the drug data, aiming to improve classification accuracy over using either technique alone. The combination of KNN and genetic algorithms is expected to better optimize classification of complex medical data compared to other algorithms.
Drug discovery and development is a long and expensive process and over time has notoriously bucked Moore’s law that it now has its own law called Eroom’s Law named after it (the opposite of Moore’s). It is estimated that the attrition rate of drug candidates is up to 96% and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes for the high attrition rate is drug safety, which accounts for 30% of the failures.
Even if a drug is approved in market, it could be withdrawn due to safety problems. Therefore, evaluating drug safety extensively as early as possible is paramount in accelerating drug discovery and development. This talk provides a high-level overview of the current process of rational drug design that has been in place for many decades and covers some of the major areas where the application of AI, Deep learning and ML based techniques have had the most gains.
Specifically, this talk covers a variety of drug safety related AI and ML based techniques currently in use which can generally divided into 3 main categories:
1. Discovery,
2. Toxicity and Safety, and
3. Post-Market Monitoring.
We will address the recent progress in predictive models and techniques built for various toxicities. It will also cover some publicly available databases, tools and platforms available to easily leverage them.
We will also compare and contrast various modeling techniques including deep learning techniques and their accuracy using recent research. Finally, the talk will address some of the remaining challenges and limitations yet to be addressed in the area of drug discovery and safety assessment.
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal for personalized medicine. However, current machine learning approaches are either too complex or perform poorly.
For more information:
http://societyofdatascientists.com/controlling-informative-features-for-improved-accuracy-and-faster-predictions-in-omentum-cancer-models/?src=slideshare
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWScsandit
The Web considers one of the main sources of customer opinions and reviews which they are represented in two formats; structured data (numeric ratings) and unstructured data (textual comments). Millions of textual comments about goods and services are posted on the web by customers and every day thousands are added, make it a big challenge to read and understand them to make them a useful structured data for customers and decision makers. Sentiment
analysis or Opinion mining is a popular technique for summarizing and analyzing those opinions and reviews. In this paper, we use natural language processing techniques to generate some rules to help us understand customer opinions and reviews (textual comments) written in the Arabic language for the purpose of understanding each one of them and then convert them to a structured data. We use adjectives as a key point to highlight important information in the text then we work around them to tag attributes that describe the subject of the reviews, and we associate them with their values (adjectives).
This document compares classification and regression models using the CARET package in R. Four classification algorithms are evaluated on Titanic survival data and three regression algorithms are evaluated on property liability data. For classification, random forests performed best based on the F-measure metric. For regression, gradient boosted models performed best based on RMSE. The document concludes classification can predict Titanic survivor characteristics while regression can predict property hazards.
This document discusses applying machine learning algorithms to three datasets: a housing dataset to predict prices, a banking dataset to predict customer churn, and a credit card dataset for customer segmentation. For housing prices, linear regression, regression trees and gradient boosted trees are applied and evaluated on test data using R2 and RMSE. For customer churn, logistic regression and random forests are used with sampling to address class imbalance, and evaluated using confusion matrix metrics. For credit card data, k-means clustering with PCA is used to segment customers into groups.
This document outlines the aim, objectives, scope, and structure of a dissertation on using genetic programming to optimize and combine K nearest neighbor classifiers for intrusion detection. The aim is to use genetic programming with the KDD Cup 1999 dataset to develop a numeric classifier that shows improved performance over individual KNN classifiers. The objectives are to determine if a GP-based numeric classifier outperforms individual KNN classifiers, if GP combination techniques produce higher performance than KNN component classifiers, and if heterogeneous KNN classifier combination performs better than homogeneous combination. The document describes the methodology that will be used, including developing an optimal KNN classifier using fitness evaluation in the first phase and combining optimal KNN classifiers based on ROC curves in the second phase.
El documento describe las principales partes de un computador, incluyendo la CPU, que es el cerebro del computador donde se producen la mayoría de los cálculos; la tarjeta madre, que contiene los conectores para conectar otras tarjetas; y la memoria RAM, que permite acceder aleatoriamente a los bytes de memoria sin tener que acceder a los anteriores.
La democracia requiere paz, equilibrio social y estabilidad. Los principios de justicia, amor, libertad y libertad de expresión deben ser protegidos de acuerdo con la constitución. Los ciudadanos deben hacer oír sus voces para sistematizar los valores y principios de la sociedad.
This document appears to be a magazine titled "Good Eating" for spring/summer 2006, as various page numbers are listed throughout ranging from pages 3 to 43 without any other context provided.
Sourajit Aiyer - GSCGI WealthGram, Switzerland - Earning Alpha at a Passive C...South Asia Fast Track
This document provides a summary of a quantitative finance portfolio strategy called "Consistent Dividend Consistent Fundamentals" or CDCF. The strategy aims to outperform benchmarks during volatile markets by investing in a concentrated portfolio of stocks that demonstrate consistency in fundamentals like revenue, earnings, and profit margins as well as consistent dividend payments.
The summary describes the screening methodology used to select stocks for the portfolio, which evaluates companies over the past 3 years for growth in key financial metrics, consistent profit margins, low debt levels, and consistent dividend payments. The strategy is tested by backtesting portfolios of Canadian and Indian stocks. Results show the CDCF portfolios outperformed their respective benchmarks, indicating the strategy could generate alpha returns
El documento resume la situación de posguerra y los locos años 20. Inicialmente, los países se encontraban en una difícil situación económica y social luego de la guerra, con altos niveles de pobreza y desempleo. Sin embargo, pronto surgieron nuevas ideas y tecnologías que dieron paso a una era de prosperidad basada en el consumo masivo, la educación y los medios de comunicación. No obstante, esta burbuja económica terminó reventando en 1929 con el crac de la bolsa de Wall Street y una nueva crisis mundial.
Este documento presenta una lección sobre el uso del pasado simple y el verbo "to use to" en inglés. La lección incluye observar un video, organizar información sobre las reglas gramaticales, completar ejercicios de aplicación y enviarlos con comentarios de metacognición.
Report of Social Life's work exploring how Malmö City can think about the comprehensive social and physical regeneration of its lower income neighbourhoods, by developing a new approach to placemaking that has the potential to be funded through social investment.
Este documento presenta la información básica sobre el curso de Patología II impartido en la Facultad de Ciencias de la Salud de la Universidad Regional Autónoma de Los Andes en el período abril-septiembre de 2015. El curso tiene un valor de 5 créditos y se centra en profundizar el estudio de las patologías de acuerdo a los diferentes aparatos y sistemas del cuerpo humano. El objetivo general del curso es caracterizar los aspectos relacionados con la etiología, patogenia, cambios morfológicos
This documentary tells the story of photographer Bob Gruen's career capturing the punk rock scene in the 1970s and 1980s. It follows Gruen's journey through the music industry during this time period, as he photographed famous artists like Alice Cooper and documented the alternative lifestyle and culture of punk music. The documentary has a linear narrative structure, using Gruen's perspective and interviews to chart his experiences meeting people, traveling for work, and feeling immersed in the world of rock and roll photography.
The document summarizes the agenda for the final Cool Car Club meeting. It discusses going over the agenda, presenting fun facts and a PowerPoint, sharing puns, and holding elections for various officer positions for the upcoming year. These positions include President, Vice President, Co-IT Director, Secretary, Director of Promotions, Field Trip Coordinator, Director of Hospitality, Community Service Manager, and Public Relations officer. The terms for each position are listed as well as duties.
The document discusses conventions for different elements of a pop magazine, including the cover, contents page, and double page spreads. Some key conventions mentioned are:
- The cover should feature an eye-catching masthead, main image of a pop star between 17-25, and cover lines linking to the image. The color scheme should appeal to the target audience, usually pinks, purples, blues and yellows.
- The contents page typically copies the cover image and uses page numbers, subheadings, and bold text to make navigation easy. It also reflects the magazine's color scheme.
- Double page spreads usually include a large main image, are formatted as a question and answer interview, and
Get Fast Money Before You Enter Your Next PaydayHendry Mansoor
Payday loans that do not require a debit card can provide cash quickly to people in need, without the risk of losing a debit card or having their credit checked. The website www.paydayloansnodebitcard.org.uk offers several types of payday loans without debit cards, which can be applied for quickly online without faxing and provide an easy, secure way to access instant cash loans.
Este documento describe cómo eliminar archivos y carpetas en Windows utilizando el comando DEL. Explica que primero se creó una carpeta en Documentos y luego se usó el comando DEL para eliminarla, lo que hizo que apareciera una confirmación para eliminar la carpeta. Al escribir "S", la carpeta se eliminó de inmediato. También se puede eliminar archivos y carpetas de la misma manera desde una unidad USB.
Recent studies have identified patterns in how new technologies are adopted by society. Researchers analyzed data on the spread of innovations over the last hundred years and found similarities regardless of the technology. Most new devices or services start slowly with only a few early adopters but eventually reach a tipping point where adoption accelerates rapidly until market saturation is approached.
Joseph Fisher, General Manager of the Marco Beach Ocean Resort, writes a letter of recommendation for Mariamalia Moncaleano for a management position. He states that Mariamalia was an excellent Director of Housekeeping at the Conrad Miami Hotel under his leadership. Mariamalia is smart, skilled, and a gifted manager with strong people skills and intuitive leadership that earned her respect within the company. Her unmatched team-building skills helped build camaraderie in her department that still continues today.
Evolution of Knowledge Discovery and Management inscit2006
The document discusses the evolution of knowledge discovery and management over time. It outlines how knowledge discovery has progressed from early efforts using simulated data due to lack of large real-world datasets, to today where there is no shortage of complex real-world problems and data available. Key areas evolving now include automated data analysis, integration of tools and databases, and handling different data types like text and images. While expert systems showed early promise but had limited results, knowledge discovery has led to valuable applications through continued academic and industrial research.
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...Kees van Bochove
Talk from Kees van Bochove, The Hyve at SCOPE Summit, Real World Data track, Jan 26, 2017, Miami
A large open source initiative for standardisation and epidemiological analysis for real world data is OHDSI: Observational Health Data Sciences and Informatics. OHDSI leverages the OMOP common data model for observational data, and provides data analysis tools for a broad range of use cases. This talk will explain OMOP and OHDSI with case study IMI EMIF, in which health data from over 50 million patients from 13 national and regional European registries is brought together.
La democracia requiere paz, equilibrio social y estabilidad. Los principios de justicia, amor, libertad y libertad de expresión deben ser protegidos de acuerdo con la constitución. Los ciudadanos deben hacer oír sus voces para sistematizar los valores y principios de la sociedad.
This document appears to be a magazine titled "Good Eating" for spring/summer 2006, as various page numbers are listed throughout ranging from pages 3 to 43 without any other context provided.
Sourajit Aiyer - GSCGI WealthGram, Switzerland - Earning Alpha at a Passive C...South Asia Fast Track
This document provides a summary of a quantitative finance portfolio strategy called "Consistent Dividend Consistent Fundamentals" or CDCF. The strategy aims to outperform benchmarks during volatile markets by investing in a concentrated portfolio of stocks that demonstrate consistency in fundamentals like revenue, earnings, and profit margins as well as consistent dividend payments.
The summary describes the screening methodology used to select stocks for the portfolio, which evaluates companies over the past 3 years for growth in key financial metrics, consistent profit margins, low debt levels, and consistent dividend payments. The strategy is tested by backtesting portfolios of Canadian and Indian stocks. Results show the CDCF portfolios outperformed their respective benchmarks, indicating the strategy could generate alpha returns
El documento resume la situación de posguerra y los locos años 20. Inicialmente, los países se encontraban en una difícil situación económica y social luego de la guerra, con altos niveles de pobreza y desempleo. Sin embargo, pronto surgieron nuevas ideas y tecnologías que dieron paso a una era de prosperidad basada en el consumo masivo, la educación y los medios de comunicación. No obstante, esta burbuja económica terminó reventando en 1929 con el crac de la bolsa de Wall Street y una nueva crisis mundial.
Este documento presenta una lección sobre el uso del pasado simple y el verbo "to use to" en inglés. La lección incluye observar un video, organizar información sobre las reglas gramaticales, completar ejercicios de aplicación y enviarlos con comentarios de metacognición.
Report of Social Life's work exploring how Malmö City can think about the comprehensive social and physical regeneration of its lower income neighbourhoods, by developing a new approach to placemaking that has the potential to be funded through social investment.
Este documento presenta la información básica sobre el curso de Patología II impartido en la Facultad de Ciencias de la Salud de la Universidad Regional Autónoma de Los Andes en el período abril-septiembre de 2015. El curso tiene un valor de 5 créditos y se centra en profundizar el estudio de las patologías de acuerdo a los diferentes aparatos y sistemas del cuerpo humano. El objetivo general del curso es caracterizar los aspectos relacionados con la etiología, patogenia, cambios morfológicos
This documentary tells the story of photographer Bob Gruen's career capturing the punk rock scene in the 1970s and 1980s. It follows Gruen's journey through the music industry during this time period, as he photographed famous artists like Alice Cooper and documented the alternative lifestyle and culture of punk music. The documentary has a linear narrative structure, using Gruen's perspective and interviews to chart his experiences meeting people, traveling for work, and feeling immersed in the world of rock and roll photography.
The document summarizes the agenda for the final Cool Car Club meeting. It discusses going over the agenda, presenting fun facts and a PowerPoint, sharing puns, and holding elections for various officer positions for the upcoming year. These positions include President, Vice President, Co-IT Director, Secretary, Director of Promotions, Field Trip Coordinator, Director of Hospitality, Community Service Manager, and Public Relations officer. The terms for each position are listed as well as duties.
The document discusses conventions for different elements of a pop magazine, including the cover, contents page, and double page spreads. Some key conventions mentioned are:
- The cover should feature an eye-catching masthead, main image of a pop star between 17-25, and cover lines linking to the image. The color scheme should appeal to the target audience, usually pinks, purples, blues and yellows.
- The contents page typically copies the cover image and uses page numbers, subheadings, and bold text to make navigation easy. It also reflects the magazine's color scheme.
- Double page spreads usually include a large main image, are formatted as a question and answer interview, and
Get Fast Money Before You Enter Your Next PaydayHendry Mansoor
Payday loans that do not require a debit card can provide cash quickly to people in need, without the risk of losing a debit card or having their credit checked. The website www.paydayloansnodebitcard.org.uk offers several types of payday loans without debit cards, which can be applied for quickly online without faxing and provide an easy, secure way to access instant cash loans.
Este documento describe cómo eliminar archivos y carpetas en Windows utilizando el comando DEL. Explica que primero se creó una carpeta en Documentos y luego se usó el comando DEL para eliminarla, lo que hizo que apareciera una confirmación para eliminar la carpeta. Al escribir "S", la carpeta se eliminó de inmediato. También se puede eliminar archivos y carpetas de la misma manera desde una unidad USB.
Recent studies have identified patterns in how new technologies are adopted by society. Researchers analyzed data on the spread of innovations over the last hundred years and found similarities regardless of the technology. Most new devices or services start slowly with only a few early adopters but eventually reach a tipping point where adoption accelerates rapidly until market saturation is approached.
Joseph Fisher, General Manager of the Marco Beach Ocean Resort, writes a letter of recommendation for Mariamalia Moncaleano for a management position. He states that Mariamalia was an excellent Director of Housekeeping at the Conrad Miami Hotel under his leadership. Mariamalia is smart, skilled, and a gifted manager with strong people skills and intuitive leadership that earned her respect within the company. Her unmatched team-building skills helped build camaraderie in her department that still continues today.
Evolution of Knowledge Discovery and Management inscit2006
The document discusses the evolution of knowledge discovery and management over time. It outlines how knowledge discovery has progressed from early efforts using simulated data due to lack of large real-world datasets, to today where there is no shortage of complex real-world problems and data available. Key areas evolving now include automated data analysis, integration of tools and databases, and handling different data types like text and images. While expert systems showed early promise but had limited results, knowledge discovery has led to valuable applications through continued academic and industrial research.
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...Kees van Bochove
Talk from Kees van Bochove, The Hyve at SCOPE Summit, Real World Data track, Jan 26, 2017, Miami
A large open source initiative for standardisation and epidemiological analysis for real world data is OHDSI: Observational Health Data Sciences and Informatics. OHDSI leverages the OMOP common data model for observational data, and provides data analysis tools for a broad range of use cases. This talk will explain OMOP and OHDSI with case study IMI EMIF, in which health data from over 50 million patients from 13 national and regional European registries is brought together.
The document discusses experimental and quasi-experimental research methods. It defines key characteristics of experimental research such as random assignment, control and intervention groups, and pre- and post-testing. Issues of internal and external validity are examined. Common statistical analyses for experimental designs are introduced, including t-tests, ANOVA, and multiple regression. Examples of experimental designs like single-group, non-equivalent groups, interrupted time series, and factorial designs are also summarized.
The document discusses approaches for modern disease surveillance using collaboration and semantic web technologies. It describes how tools like InSTEDD Evolve use machine learning, social media, and geospatial data to improve early detection of disease outbreaks and facilitate effective coordination of public health responses. Key components of the proposed approach include automated analysis, user feedback loops, and representation of unstructured data to enable early detection and verification of health-related events.
This document discusses various methods for information extraction from electronic health records (EHR) data, including rule-based and machine learning approaches. Rule-based methods discussed include developing regular expressions for clinical text classification using top-down and bottom-up approaches. Machine learning methods include using support vector machines to classify implicit opinions like side effects in drug reviews. The document also reviews various software tools for named entity recognition and information extraction from EHRs, such as MetaMap, cTakes, MedEx, and NegEx for negation detection.
Our classification technique uses a deep CNN to classify skin lesions. An image is warped through the CNN architecture into a probability distribution over clinical skin disease classes. The CNN was pretrained on a large generic image dataset and fine-tuned on a dataset of over 129,000 skin lesions spanning 2,032 diseases. Data integration from multiple sources is key to future digital medicine, but challenges include data quality, availability, and privacy. Techniques like distributed learning models and homomorphic encryption can help address privacy concerns while enabling large-scale data sharing and analysis.
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
This document provides an overview of point-of-care tools for answering clinical questions. It discusses common types of clinical questions, levels of evidence, and criteria for selecting resources. Examples are given of searching various point-of-care tools like ACP Pier, UpToDate, and DynaMed to answer clinical questions related to diagnosis, prognosis, treatment, screening, and drug dosing or interactions. Free resources available through libraries are also highlighted.
V.8.0-Emerging Frontiers and Future Directions for Predictive AnalyticsElinor Velasquez
This document proposes a novel methodology for predictive analytics based on topological-geometric-analytic-algebraic principles. It views the universe as a canonical heat bath partitioned into components that act as restricted thermal reservoirs. Each component has a well-defined structure and invariant that allows for new predictions. The methodology generalizes concepts like entropy and reinterprets prediction in terms of biological form and function. This provides a new framework for predictive modeling, especially with big data.
Answer questions Minimum 100 words each and reference (questions.docxamrit47
Answer questions Minimum 100 words each and reference (questions #1-2) KEEP questions WITH ANSWER
1. A key point to get out of this topic is the idea that these errors are theoretical. You won't be sure as to whether one occurred or not. Why are they theoretical in nature? Hint: think about a study and knowing the "truth"
2. Pick a study of interest and identify the null and alternative hypothesis. How does this fit in with regards to the topic of a type I and type II error? Always keep this in mind when you are trying to identify what a type I and type II error are.
A minimum of 75 words each question and References (IF NEEDED)(Response #1 – 7) KEEP RESPONSE WITH ANSWER
Make sure the Responses includes the Following: (a) an understanding of the weekly content as supported by a scholarly resource, (b) the provision of a probing question. (c) stay on topic
1) According to the reading, we set the alpha which is the largest probability for type I error. To increase the power of a hypothesis researchers can use larger samples which provides more information and raise the significance level which increases the probability that the hypothesis will be rejected.
2) A Type 1 error occurs when individuals involved in research make the decision to reject the belief of truth when in actuality the hypothesis is true. Type 1 errors are errors in research when the researcher makes the wrong decision to reject a true null hypothesis. Type II errors are considered less of a problem than Type 1 errors, but can prove to be detrimental in the field of medicine. This type of error occurs when researchers decide to keep a false null hypothesis, when in fact the hypothesis is true. The method to avoid making Type 1 decisions is to test the null hypothesis at the highest level (Alpha Level). This will lessen the possibilities of making this type of error (Privitera, 2018).
3) According to Privitera (2018) a type 1 error is the probability of rejecting a null hypotheses that is actually true, researchers purposely make this error. A null hypotheses is a statement about a population parameter that is assumed to be true, this hypotheses is a starting point (Privitera, 2018). The type 2 error or beta error is the probability of retaining a null hypotheses that is actually false (Privitera, 2018). The type 1 error is committed when a researcher decides to reject previous notions of truth that are in fact true (Privitera, 2018). The best way to avoid these types of errors is to be open minded and not reject notions if there is fact to back the notions up. In my opinion a type 1 error is something committed with bias by the researcher. I say this because as a researcher it is their job to find all facts or at least most all facts and apply them to their study or research, especially if they commit a type one error knowingly. If a researcher does this error then they are not following through with basic research guidelines.
4) A one-tailed test is used in a hypothesi.
This reflection paper discusses the importance of proper procedures when performing a venipuncture. The key steps outlined include properly identifying the patient, explaining the procedure to gain consent, selecting an appropriate vein, cleaning the site, and carefully drawing and handling the blood samples. Following standardized protocols is essential to obtaining quality samples and preventing errors or injuries. Adhering to precautions and best practices helps ensure accurate test results and safe experiences for both medical professionals and patients.
Clinical decision support, medical knowledge representation and workflow technology were discussed. Key points include:
1) Clinical informatics uses knowledge representation methods like terminologies, facts, and clinical guidelines to model medical knowledge.
2) Workflow technology can automate healthcare processes and clinical pathways using editors, engines and worklist handlers.
3) The presenter's research uses a tool called RetroGuide that applies workflow concepts to retrospectively analyze patient data using temporal modeling, helping improve care quality.
Answer all questions individually and cite all work!!1. Provid.docxfestockton
Answer all questions individually and cite all work!!
1. Provide an example of an idea, creativity, and innovation and argue why it best fits that category.
2. Identify three catalysts to enable innovativeness. Explain how they would enable innovation in your organization.
3. Why is it significant that an organization allow for failure? What are some significant ways an organization can allow for failure and still find success?
4. Making a pivot has saved organizations from completely deteriorating. Research an organization of your choice that has made an impactful pivot. Write an 8-10 sentence summary of the organization and the monumental pivot.
iStockphoto/Thinkstock
chapter 11
Nominal Data and the
Chi-Square Tests: What Occurs
Versus What Is Expected
Learning Objectives
After reading this chapter, you will be able to. . .
1. describe nominal data.
2. complete and explain the chi-square goodness-of-fit test.
3. complete and explain the chi-square test of independence.
4. present and interpret the results of the two types of chi-square test in proper APA format.
CN
CO_LO
CO_TX
CO_NL
CT
CO_CRD
suk85842_11_c11.indd 407 10/23/13 1:45 PM
CHAPTER 11Section 11.1 Nominal Data
When there was an important development in statistical analysis in the early part of the 20th century, more often than not Karl Pearson was associated with it. Many of those
who made important contributions were members of the department that Pearson founded
at University College London. William Sealy Gosset, who developed the t-tests; Ronald A.
Fisher, who developed analysis of variance; and Charles Spearman, who developed factor
analysis, all gravitated to Pearson’s department at some point. Although social relations
among these men were not always harmonious, they were enormously productive schol-
ars, and this was particularly true of Pearson. Besides the correlation coefficient named for
him, Pearson also developed an analytical approach related to Spearman’s factor analysis
called principal components analysis, and he developed the procedures that are the sub-
jects of this chapter, the chi-square tests. The Greek letter chi (x) is pronounced “kie” like
“pie.” Chi is the equivalent of the letter c, rather than the letter x, which it resembles.
11.1 Nominal Data
With the exception of Spearman’s rho in Chapter 9, the attention in Chapters 1 through 10 has been directed at procedures designed for interval or ratio data. Sometimes
the data is not interval scale, nor is it the ordinal-scale data that Spearman’s rho accom-
modates. When the data is nominal scale, often one of the chi-square (x2) tests is used.
It will be helpful to review what makes data nominal scale. Nominal data either fits into a
category or it does not, which is why nominal data is sometimes called categorical or clas-
sification data. Because the analysis is based on counting the data, it is also called count or
frequency data. Compared to ratio, interva ...
This study evaluated the use of an interactive computer module to supplement a traditional paper informed consent form for pediatric endoscopy. Parents who received the supplemental electronic module were more likely to achieve informed consent compared to those who only received the paper form. The electronic module did not impact parent satisfaction, anxiety, or the number of questions asked of physicians. The results suggest that electronic tools can enhance traditional informed consent methods.
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...journal ijrtem
: A wealth of data in public health care systems has been collected and meanwhile there are plenty
of new technological improvements which have considerable influence on current data pool. Nevertheless,
important obstacles are challenging to utilize existing clinical data. Enhanced technological improvements lead
patients to search their symptoms and corresponding diagnosis on online resources. In this study, it is aimed to
develop a machine learning model to suit in different availability of users. Most of the current systems allow
people to choose related symptom in web interfaces or Q&A forums. In addition to these applications it is aimed
to implement a new technique which extracts the text-based symptoms and its related parameters such as, severity,
duration, location, cause, accompanied by any other indicators. This study is applicable for patient`s everyday
language statements besides medical expression of symptoms for corresponding symptoms. Extracted terms are
used as an input of the model and analyzed for matching diagnosis where an accuracy of 72.5% has been
accomplished.
E-Symptom Analysis System to Improve Medical Diagnosis and Treatment Recommen...IJRTEMJOURNAL
A wealth of data in public health care systems has been collected and meanwhile there are plenty
of new technological improvements which have considerable influence on current data pool. Nevertheless,
important obstacles are challenging to utilize existing clinical data. Enhanced technological improvements lead
patients to search their symptoms and corresponding diagnosis on online resources. In this study, it is aimed to
develop a machine learning model to suit in different availability of users. Most of the current systems allow
people to choose related symptom in web interfaces or Q&A forums. In addition to these applications it is aimed
to implement a new technique which extracts the text-based symptoms and its related parameters such as, severity,
duration, location, cause, accompanied by any other indicators. This study is applicable for patient`s everyday
language statements besides medical expression of symptoms for corresponding symptoms. Extracted terms are
used as an input of the model and analyzed for matching diagnosis where an accuracy of 72.5% has been
accomplished.
A discriminative-feature-space-for-detecting-and-recognizing-pathologies-of-t...Damian R. Mingle, MBA
Each year it has become more and more difficult for healthcare providers to determine if a patient has a pathology
related to the vertebral column. There is great potential to become more efficient and effective in terms of quality
of care provided to patients through the use of automated systems. However, in many cases automated systems
can allow for misclassification and force providers to have to review more causes than necessary. In this study, we
analyzed methods to increase the True Positives and lower the False Positives while comparing them against stateof-the-art
techniques in the biomedical community. We found that by applying the studied techniques of a data-driven
model, the benefits to healthcare providers are significant and align with the methodologies and techniques utilized
in the current research community.
演講-Meta analysis in medical research-張偉豪Beckett Hsieh
This document provides an overview of meta-analysis. It defines meta-analysis as a quantitative approach to systematically combining results from previous studies to arrive at conclusions about the body of research. It discusses key aspects of planning and conducting a meta-analysis such as defining the research question, searching for relevant literature, determining study eligibility, extracting data, analyzing effect sizes, assessing heterogeneity, and addressing publication bias. Software for performing meta-analyses and specific effect sizes like risk ratio and odds ratio are also mentioned.
Text Extraction Engine to Upgrade Clinical Decision Support SystemIJRTEMJOURNAL
New generation technological improvements lead patients to search their symptoms and
corresponding diagnosis on online resources. In this study, it is aimed to develop a machine learning model to
suit in different availability of users. Most of the current systems allow people to choose related symptom in web
interfaces. In addition to these applications it is aimed to implement a new technique which extracts the text-based
symptoms and its related parameters such as, severity, duration, location, cause, accompanied by any other
indicators. This study is applicable for patient`s everyday language statements besides medical expression of
symptoms for corresponding symptoms which is also supporting initial clinical decision system. Extracted terms
are analyzed for matching diagnosis where an accuracy of 90% has been accomplished.
Similar to 210 2008 using text mining to classify requests and prepare semiautomatic answers (20)
International Conference on NLP, Artificial Intelligence, Machine Learning an...gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
A review on techniques and modelling methodologies used for checking electrom...nooriasukmaningtyas
The proper function of the integrated circuit (IC) in an inhibiting electromagnetic environment has always been a serious concern throughout the decades of revolution in the world of electronics, from disjunct devices to today’s integrated circuit technology, where billions of transistors are combined on a single chip. The automotive industry and smart vehicles in particular, are confronting design issues such as being prone to electromagnetic interference (EMI). Electronic control devices calculate incorrect outputs because of EMI and sensors give misleading values which can prove fatal in case of automotives. In this paper, the authors have non exhaustively tried to review research work concerned with the investigation of EMI in ICs and prediction of this EMI using various modelling methodologies and measurement setups.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Manufacturing Process of molasses based distillery ppt.pptx
210 2008 using text mining to classify requests and prepare semiautomatic answers
1. Paper 210-2008
Using Text Mining to Classify Lay Requests to a Medical Expert
Forum and to Prepare Semiautomatic Answers
Wolfgang Himmel, Department of General Practice/Family Medicine, University of Göttingen,
Germany
Ulrich Reincke, Competence Center Enterprise Intelligence, SAS Institute, Heidelberg,
Germany
Hans Wilhelm Michelmann, Department of Obstetrics and Gynecology, University of Göttingen,
Germany
ABSTRACT
We developed a scoring procedure to automatically classify lay requests to an internet medical forum about involuntary
childlessness. The requests should be classified according to their subject matter (32 categories) and the sender’s
expectation (6 categories). Building upon this procedure, the experts’ answers to former requests could be the basis of an
automatic answer for new incoming requests, finding their “nearest neighbors”. Our text mining approach comprised the
following steps: a large start list of relevant words and the calculation of the Cramer’s V statistic for the association
between relevant words and the 38 categories. We trained logistic regression models with high precision and recall. We
then simulated a scenario in which a subset of requests (n=50) served as ‘new’ requests. To find the most nearest
neighbors, we applied a formula, which gave high weight to singular value decompositions (SVDs) but also considered
the automatically classified subject matter of this ‘new’ request and—to a lesser degree—the sender’s expectation. If we
can implement this procedure into the real life of a health internet forum, the workload of medical experts could be
lightened, visitors to a forum could receive a more timely answer to their question and they could become aware of former
questions and answers that were rather similar to the one they had sent.
INTRODUCTION
Both healthy and sick people increasingly use electronic media to obtain medical information and advice (Umefjord
2007). One important application is web-based “ask the doctor” services. People may send requests to these services
to better understand their disease, to learn about new therapies or to ask for a second opinion (Widman 1997,
Eysenbach 1999).
To facilitate the work of medical experts in web forums, it would be helpful to classify visitors’ requests automatically.
The classification of medical requests - or any other documents - can refer to (1) their subject matter, e.g. the drug
treatment of high blood pressure or the pros and cons of screening for colon cancer and (2) the sender’s expectation,
e.g. to receive a commentary on the current treatment (second opinion), to get general information about a certain
disease or biological processes, or where to seek adequate medical help.
Text mining has been successfully applied for automatic classifications of large volumes of documents (Weiss et al.),
for example in classifying consumer complaints. The automatic classification of medical requests is more difficult.
Since such requests come from lay people, they are often very long and unstructured; at the same time, many of
these requests ask for the same information although they tell a different story. Moreover, lay people often mix
personal experiences with laboratory data or medical issues with the need for psychological help.
We have finished an initial trial to automatically classify these requests using standard text-mining software and are
developing further strategies to refine this process (Himmel et al. 2007). We use a large sample of requests in the
section “Wish for a Child” on the German website www.rund-ums-baby.de, which provides information for parents and
potential parents.
In this preparatory work, we developed a scoring procedure that calculated for any medical request the probability to
belong to a certain subject matter (e.g. hormones or insemination) and to belong to a respective expectation of the
sender (e.g. general information or interpretation of laboratory data). We will first describe this procedure and then
present a method how to find the “nearest neighbors” for the requests studied. If this procedure leads to satisfying
results, it will be possible to classify new incoming requests to the expert forum and to look for the nearest neighbors
of this request in the large stock of the expert forum. These nearest neighbors and the experts’ former answers to
these requests could be the basis of an automatic answer.
1
Pharma, Life Sciences and HealthcareSAS Global Forum 2008
2. METHODS
The analysis is based on a random sample of requests (n=988) from the German website www.rund-ums-baby.de.
According to the above-mentioned 2 dimensions (subject matter and expectations), these requests were first
manually classified to provide a basis for training and validation. The categories for the subject matter dimension were
mutually exclusive; the categories for the expectation dimension permitted overlapping (see Box 1 for the 38
categories).
Validation Data
Dimensions
Category
Precision
(%) *
Recall
(%) *
Medical category
Abortion 91 100
Abrasion 100 100
Clomifen 100 100
Oviduct 100 100
Examination of the oviduct 100 100
Ovulation 90 86
Endometriosis 75 100
Nourishment 100 100
Luteal phase defects 88 100
Sexual intercourse 100 100
Habitual abortion 100 100
Hormones 78 78
Insemination 100 100
IVF 81 88
Cost 100 100
Cryo transfer 100 75
Medical drugs 92 100
Multiples 100 100
Menstruation 90 100
Naturopathy 90 100
Validation Data
Dimensions
Category
Precision
(%) *
Recall
(%) *
Medical category
PCO 100 100
Birth control pill 100 100
Problems Sexual intercourse 100 100
Thyroid gland 100 100
Pregnancy worries 100 92
Pregnancy symptoms 100 100
Pregnancy test 88 88
Semen analysis 88 93
Stimulation 63 100
Intermenstrual bleeding 100 100
Cycle 80 86
Cysts 100 100
Expectations**
General information 92 84
Actual treatment 85 72
Results 86 82
Feelings 100 61
Interpretation 78 69
Possibilities 82 81
* Model (selection criterion): To calculate recall and precision, we first chose the best model according to the following selection criteria: Akaike’s Information Criterion,
Schwarz Baysian Criterion, cross validation misclassification of the training data, cross validation error of the training data.
** multiple categories possible
Box 1: Goodness of automatic classification
A TEXT MINING APPROACH TO CLASSIFY THE DOCUMENTS
To automatically classify the 988 requests sent to the expert forum, we developed an individual text mining approach
comprising the following steps: (1) A large start list of possibly relevant words (2) A word count of all words from the
start list appearing in all requests of each category. (3) Calculation of the average Cramer’s V statistic for the
significant word associations in each category, principle component analysis, and singular value decomposition (SVD)
with the SAS Text Miner (Albright 2006, Reincke 2003). (4) On the basis for these 3 types of input variables, we
trained logistic regression models and built a meta-model for the automatic classification. (5) Calculation of recall and
precision for a subset of validation data.
2
Pharma, Life Sciences and HealthcareSAS Global Forum 2008
3. SCORING OF DOCUMENTS FOR A SEMI-AUTOMATIC ANSWERING
A scoring procedure should calculate for all 988 requests the probability to fall in any of the 38 classifications of the
subject matter and expectation dimension. We then determined “nearest neighbors” to these requests, according to
the 2 dimensions, and additionally considered the SVD as a distance measure. We simulated a scenario in which a
subset of requests (n=50) served as ‘new’ requests. For each of these requests, 3 similar former requests should be
found as “nearest neighbors”, if possible. The experts’ former answers to these requests should be displayed as
preliminary answers to the ‘new’ request.
As a crude measure of the quality of our procedure, we calculated how many of the 3 displayed requests and their
respective expert answers corresponded to the ‘new’ request.
RESULTS
TEXT MINING APPROACH
The start list of words for the sample of 988 requests consisted of about 11,000 terms, resulting in a total of about
4,100 synonyms, which we called ‘parents’ (Table 1 in Box 2). Terms and parents were treated as binary variables
(0/1) and we could determine whether a term or parent appeared in a document or not (Table 2 in Box 2). This large
table was then transposed in a transaction table (Table 3 in Box 2), showing whether a document fell into a certain
category (Box 2 shows an example for category 37) or not. This transaction table also shows whether a certain parent
(Box 2, parent 4) was present in this document or not.
Table 1 Synonym List Table 2 Binary Indicator Variables Based on Parents
Term Parent Document ID
ABARTIG(p1)
abartig(t1)
ABBAUEN(p2)
abbauen(t2)
ABBAUEN(p2)
abbaut(t3)
ABBAUEN(p2)
abgebaut(t4)
abartig abartig 330 0 0 1 0
abbauen abbauen 333 0 0 0 0
abbaut abbauen 336 0 0 0 0
abgebaut abbauen 338 0 0 0 0
abblutet abbluten 353 0 0 0 0
abbort abort 355 0 0 0 1
Table 3 Transposing Table 2 into a Transaction Table
target_id target Document ID parent_id parent
37 1 299 4 0
37 1 296 4 1
37 1 293 4 0
37 1 286 4 0
37 1 290 4 0
37 1 289 4 0
37 0 292 4 0
Box 2: Terms, Parents and Categories Appearing in Each Document in Binary Presentation
We calculated the average Cramer’s V statistic for each parent with each of the 38 categories. Significance levels for
including a parent’s Cramer’s V statistic was alternatively set at 1%, 2%, 5%, 10%, 20%, 30% and 40%. We summed
for each category all Cramer’s V coefficients over the significant words. About 4 million chi2
tests were necessary for
this task and were performed automatically with one call of the SAS procedure proc freq.
proc freq noprint data=trans ;
tables parent*target
/ChiSq measures outpct
expected nowarn out=freq_count ;
output out=ChiSq pchi measures cramv ;
by parent_id target_id;
run;
3
Pharma, Life Sciences and HealthcareSAS Global Forum 2008
5. Frequency; n (%)
Word
In category In other
categories
Cramer’s V
p
category “oviduct”
tube 16 (100) 60 (6) .44 > 0.001
fallopian tube 16 (100) 60 (6) .44 > 0.001
removed 8 (50) 16 (2) .40 > 0.001
exception 2 (13) 0 (0) .35 > 0.001
away 8 (50) 35 (4) .29 > 0.001
link 7 (44) 28 (3) .28 > 0.001
category “examination of the oviduct”
tube 15 (79) 61 (6) .37 > 0.001
fallopian tube 15 (79) 61 (6) .37 > 0.001
laparoscopy 11 (58) 35 (4) .35 > 0.001
endoscopy 12 (63) 43 (4) .35 > 0.001
x-ray 3 (16) 1 (0) .34 > 0.001
angiography 2 (11) 0 (0) .32 > 0.001
Box 4: Most predictive words for the categories “oviduct” and “examination of the oviduct”
(2) For some categories, the input variables also included variables from other categories, most often with a negative
sign. For example, the meta-model for “pregnancy test” included a sample of words (as an indicator variable)
predictive for the category “menstruation” with a negative sign. This means that a lack of words predictive for
“menstruation” was a strong indicator for the category “pregnancy test”.
On the basis of the different input variables, we trained regression models and combined them to meta-models in
order to classify a training sample of requests. We calculated a probability for each request to belong to any of the 38
categories and set a cut-off of 50%. Applying these models to a subset of requests (validation data), we achieved
high rates of precision and recall, especially for the subject matter dimension; in the expectations dimension, recall
was in some instances not quite as satisfying (Box 1).
NEAREST NEIGHBORS
The good rate of classification of a validation sample of requests motivated us to experiment with a semi-automatic
answering procedure. At the moment this procedure is, of course, only being tested in the form of a simulation.
For this simulation, we restricted ourselves to 2 methods of dimension reduction: Cramer’s V indicator variables on
the basis of the chi
2
statistic and a SVD with SAS Text Miner. A sub-sample of 50 randomly selected requests were
considered to be new incoming requests to the expert forum. Out of the more than 6,000 requests from the whole
sample of the website we tried to find matches for these 50 requests. Each of these new requests and its three
matches should correspond in the subject matter and expectation dimension. Moreover, their first 10 SVDs should be
rather close to each other. To find the most nearest neighbors we applied the following formula:
Score = match_s + .5 * match_e - 5 * dist_svd;
meaning that we considered the subject matter (_s) dimension with a factor of 1, the expectation dimension (_e) with
a factor of 0.5 and weighed the SVDs with the factor 5. This procedure is graphically displayed in Box 5.
5
Pharma, Life Sciences and HealthcareSAS Global Forum 2008
7. Excerpts from ‘new’ and former requests
‘New’ request Who can tell me if I am pregnant? After ovulation, my morning
temperature has increased constantly to 370
C for 5 days now.
Automatic classification Ovulation (82%), General information (99%)
Neighbor 1 For quite a long time, I have been trying to get pregnant and have
measured my temperature daily. But it differs constantly. Why?
Automatic classification Ovulation (100%), General information (94%), Treatment
opportunities (84%)
Expert Answer to Neighbor 1 Sometimes it is because of your mode of measurement or your
thermometer is not working properly. (Not appropriate!)
Box 6: Nearest neighbor: a poor example
Considering all 50 requests of the sub-sample, we found a good fit with three adequate matches in 29% of the cases,
in 39 cases (78%) at least one match met the sender’s subject matter and expectation and its sublime contents.
However, in 11 requests (22%) none of the matches corresponded to the sender’s information need; in these cases,
none of the experts’ former answers would have been helpful for the sender.
DISCUSSION
A combination of different methods to automatically classify requests to a medical expert forum, according to their
subject matter and the sender’s expectations, yielded rates of precision and recall above 80% in nearly all categories.
This was a rather satisfying basis to implement a semi-automatic answering procedure for new requests. As
simulation model was successful for about 80% of ‘new’ requests.
Several factors may have contributed to this result:
1. A high input of expert knowledge to build a large and meaningful start list of words for our chi2
statistics.
2. The combination of different text mining strategies, including meta-models for regression, obviously met with the
different text nature of our categories.
3. In some instances, the meta-models not only considered words or PCAs significant for a category but also the
lack of certain words is a (negative) predictor for a category.
4. Although we had a lot of subject categories (n=32) they are still to simplistic to find nearest neighbors; SVDs were
a good means to find, within a given category, those requests that are very close to each other.
One important limitation must be mentioned: although matches to a new request had to correspond with respect to
the subject matter and the expectation and should be close to each other with regard to the SVDs, this does not
protect us against mismatches due to false classifications. In this case, the experts’ answers from former requests
cannot meet the sender’s information needs on principle. If a semi-automatic answering of medical requests should
become reality, the possibility of such mistakes should be clearly mentioned. But even then, visitors to an expert
health forum will be disappointed if they do not receive a more adequate and individual answer in due time.
In conclusion, a text mining strategy as presented in our paper may be helpful for health politicians and researchers to
identify in vivo health needs and information needs of the public in different medical areas. It may lighten the workload
of experts and help visitors receive a more timely answer, because similar patient requests and their corresponding
answers could be collated, even before the expert himself/herself replies.
REFERENCES
Albright R. Taming Text with the SVD. 2004.
Available at: ftp://ftp.sas.com/techsup/download/EMiner/TamingTextwiththeSVD.pdf
Eysenbach G, Diepgen TL. Patients looking for information of the Internet and seeking teleadvice: motivation,
expectations, and misconceptions as expressed in e-mails sent to physicians. Arch Dermatol. 1999;135:151-6.
Himmel, W, Reincke, U, Michelmann HW. Semi-automatic answering of health questions to a medical internet forum
using text mining [in German]. In: Muche R, Bödeker R-H (eds.). KSFE 2007 - Proceedings der 11. Konferenz der SAS-
Anwender in Forschung und Entwicklung (KSFE); March 1-2, 2007, Ulm University. Aachen: Shaker (ISBN 978-3-8322-
6680-6) 2007; 113-22.
7
Pharma, Life Sciences and HealthcareSAS Global Forum 2008
8. Reincke U. Profiling and classification of scientific documents with SAS Text Miner. Paper presented at the third
“Knowledge Discovery” workshop in October 2003 in Karlsruhe, Germany. Available at: http://km.aifb.uni-
karlsruhe.de/ws/LLWA/akkd/8.pdf
Umefjord G, Sandström H, Malker H, Petersson G. Medical text-based consultations on the Internet: A 4-year study.
Int J Med Inform. 2007; doi:10.1016/ijmedinf.2007.01.009
Weiss SM, Indurkhya N, Zhang T, Damerau FJ. Text Mining: Predictive Methods for analyzing unstructured
information. New York: Springer; 2005.
Widman LE, Tong DA. Requests for medical advice from patients and families to health care providers who publish
on the World Wide Web. Arch Intern Med. 1997;157:209-12
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Professor Wolfgang Himmel
Dept. General Practice / Family Medicine
University Medical Center Göttingen
Georg-August-University
Humboldtallee 38
37073 Göttingen
Tel # 00 49. (0) 551. 39-2648
Fax # 00 49. (0) 551. 39-9530
whimmel@gwdg.de
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
8
Pharma, Life Sciences and HealthcareSAS Global Forum 2008