Data Mining in Radiology Reports

A data mining study of a large-scale collection of radiology reports.

Published in: Education

  1. Data Mining in Radiology Reports
     Saeed Mehrabi
     Spring 2010, INFO-I535
     Dr. Patrick W. Jamieson
     Dr. Josette Jones
  2. Outline
     Introduction to data and text mining
     Our data set
     Structuring free text
     Results
     Similar works
     Discussion
  3. What Is Data Mining?
     Data mining is the extraction of useful patterns from data sources such as databases, texts, and the web.
     There is a big gap between stored data and knowledge, and the transition won’t occur automatically.
     Many interesting things you want to find cannot be found using database queries, for example:
     “Find me people likely to buy my products.”
     “Who is likely to respond to my promotion?”
  4. Why Data Mining Now?
     The data is abundant.
     The data is being warehoused.
     The computing power is affordable.
     The competitive pressure is strong.
     Data mining tools have become available.
  5. Text Mining
     Text mining applies and adapts data mining techniques to the text domain.
     Structured vs. free text
     Structured text can be stored in a relational database.
     Representing the data available in free text in a structured format makes information exchange, data mining, and information retrieval more feasible.
  6. Data Set
     Our corpus consists of:
     594,000 de-identified radiology reports
     36 million words
     4.3 million sentences
     The reports were dictated by the Indiana University Radiology faculty, a group of 40 radiologists, from 1993 to 1998.
  7. Structuring Free Text
     Regular expressions were used to detect sentence boundaries in the reports.
     A regular expression is a concise and flexible way of matching strings of text, such as particular characters or words.
     Sentences were annotated to propositions, which are simply sentences that express the same concept for similar findings across reports.
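The slide does not give the regular expression that was actually used. As a minimal sketch, assuming a simple boundary rule (a sentence ends at `.`, `?` or `!` followed by whitespace and an uppercase letter), regex-based sentence detection could look like this in Python; `SENTENCE_BOUNDARY` and `split_sentences` are hypothetical names.

```python
import re

# Hypothetical boundary rule, not the project's actual pattern:
# split after ".", "?" or "!" when whitespace and an uppercase letter follow.
# Decimals such as "3.5 cm" survive because no whitespace follows their dot.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")


def split_sentences(report_text: str) -> list[str]:
    """Split free report text into sentences with a regex boundary rule."""
    text = " ".join(report_text.split())  # collapse line breaks and runs of spaces
    return [s for s in SENTENCE_BOUNDARY.split(text) if s]


if __name__ == "__main__":
    report = ("No acute intracranial hemorrhage. A 3.5 cm mass is seen\n"
              "in the left lobe. Ventricles are normal in size.")
    for sentence in split_sentences(report):
        print(sentence)
```

A rule this simple still splits after abbreviations such as "Dr." followed by a capitalized name, which is one reason real report text usually needs a list of exceptions.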
  8. Structuring Free Text (Cont.)
     A proposition is a declarative sentence that is either true or false, but not both.
     Today is a beautiful sunny day. (A proposition)
     x + 2 = 4 (Not a proposition, because its truth value depends on x)
     Users can select propositions and map sentences to them.
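To make the sentence-to-proposition mapping concrete, here is a minimal Python data model. The field names (`prop_id`, `category`, `is_normal`) and the classes `Proposition` and `Annotations` are hypothetical; the slides only say that users select a proposition and map sentences to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A declarative statement about a finding (hypothetical schema)."""
    prop_id: int
    text: str
    category: str    # e.g. "brain and skull"
    is_normal: bool  # True for a normal finding, False for an abnormal one


class Annotations:
    """Holds propositions and the user-chosen mapping from sentences to them."""

    def __init__(self) -> None:
        self.propositions: dict[int, Proposition] = {}
        self.sentence_to_prop: dict[str, int] = {}

    def add_proposition(self, prop: Proposition) -> None:
        self.propositions[prop.prop_id] = prop

    def map_sentence(self, sentence: str, prop_id: int) -> None:
        """Record that `sentence` expresses proposition `prop_id`."""
        if prop_id not in self.propositions:
            raise KeyError(f"unknown proposition {prop_id}")
        self.sentence_to_prop[sentence] = prop_id

    def sentences_for(self, prop_id: int) -> list[str]:
        """All sentences mapped to one proposition, in insertion order."""
        return [s for s, p in self.sentence_to_prop.items() if p == prop_id]


if __name__ == "__main__":
    ann = Annotations()
    ann.add_proposition(Proposition(1, "The ventricles are normal in size.",
                                    "brain and skull", True))
    ann.map_sentence("Ventricles are normal in size.", 1)
    ann.map_sentence("Normal ventricular size.", 1)
    print(ann.sentences_for(1))
```

Because each sentence maps to exactly one proposition in this sketch, a plain dictionary is enough, and differently worded sentences collapse onto the same proposition.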
  10. Corpus Annotation
      To annotate each new sentence from the radiology reports, the software first proposes candidate propositions.
      Experts review the suggested propositions and correct them as needed before validation.
      If no suitable proposition exists in the ontology, the expert can create a new one.
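The slides do not say how the software ranks candidate propositions. Purely as an illustration, the sketch below swaps in a simple token-overlap (Jaccard) score; `suggest_propositions`, `k` and `min_score` are hypothetical. An empty result corresponds to the case where the expert creates a new proposition.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_propositions(sentence: str, propositions: list[str],
                         k: int = 3, min_score: float = 0.2) -> list[tuple[float, str]]:
    """Return up to k (score, proposition) pairs, best first.

    Candidates scoring below min_score are dropped; an empty list means no
    existing proposition is close enough and the expert would create one.
    """
    sent_toks = tokens(sentence)
    scored = [(jaccard(sent_toks, tokens(p)), p) for p in propositions]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    ontology = ["The ventricles are normal in size.",
                "There is no acute intracranial hemorrhage.",
                "The lungs are clear."]
    print(suggest_propositions("Ventricles normal in size.", ontology))
    print(suggest_propositions("Pineal gland is calcified.", ontology))
```

Whatever the real scoring method is, the expert review step stays the same: the ranked list is only a proposal to accept, correct, or replace.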
  12. Results
      The ontology of propositions is built in parallel with the experts' annotation of sentences to existing propositions.
      So far, 427,433 unique sentences from the corpus have been annotated, representing a total of 2,561,330 sentences, or about 60% of all sentences.
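The gap between 427,433 annotated unique sentences and 2,561,330 covered sentences comes from repetition: once a distinct sentence is annotated, every identical occurrence in the corpus is covered as well. A minimal Python sketch of that bookkeeping on a toy corpus (the function name `annotation_coverage` is hypothetical), followed by the ratios implied by the slide's figures; note that the 4.3 million total is itself a rounded number.

```python
from collections import Counter


def annotation_coverage(sentences: list[str], annotated: set[str]) -> tuple[int, int, float]:
    """Count how many sentence occurrences are covered by annotating distinct sentences.

    sentences: every sentence occurrence in the corpus, duplicates included
    annotated: the distinct sentences that have been annotated
    Returns (covered occurrences, total occurrences, covered fraction).
    """
    counts = Counter(sentences)
    covered = sum(n for s, n in counts.items() if s in annotated)
    total = sum(counts.values())
    return covered, total, (covered / total if total else 0.0)


if __name__ == "__main__":
    corpus = ["Lungs are clear.", "Lungs are clear.", "Lungs are clear.",
              "No fracture.", "Small effusion."]
    print(annotation_coverage(corpus, {"Lungs are clear."}))  # (3, 5, 0.6)

    # Ratios implied by the reported figures:
    print(f"{2_561_330 / 4_300_000:.1%}")  # share of all sentences: 59.6%
    print(f"{2_561_330 / 427_433:.1f}")    # mean occurrences per annotated sentence: 6.0
```

On average, then, each annotated unique sentence stands for about six sentences in the corpus, which is why annotating distinct sentences covers so much of the text.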
  13. Results (Cont.)
      The propositions are categorized into main findings such as brain and skull, general radiology, ...
      All propositions, together with information such as whether they describe a normal or abnormal finding and the number of sentences mapped to them, are stored in a relational database.
      We can find the most frequent (highest-ranked) propositions by sorting them by the number of sentences mapped to them, and we can count the normal and abnormal propositions and sentences in each category.
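The slide says the propositions live in a relational database but does not give its schema. Assuming a single hypothetical `proposition` table (the rows below are made up, not project data), the ranking and per-category counts could be expressed in SQL through Python's built-in `sqlite3` module.

```python
import sqlite3

SCHEMA = """
CREATE TABLE proposition (
    prop_id     INTEGER PRIMARY KEY,
    statement   TEXT    NOT NULL,
    category    TEXT    NOT NULL,
    is_normal   INTEGER NOT NULL,  -- 1 = normal finding, 0 = abnormal
    n_sentences INTEGER NOT NULL   -- number of sentences mapped to it
);
"""


def build_db(rows: list[tuple]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO proposition VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def top_propositions(conn: sqlite3.Connection, limit: int) -> list[tuple]:
    """Most frequent propositions, ranked by mapped-sentence count."""
    return conn.execute(
        "SELECT statement, n_sentences FROM proposition "
        "ORDER BY n_sentences DESC LIMIT ?", (limit,)).fetchall()


def category_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Per category: normal/abnormal proposition counts and sentence counts."""
    return conn.execute("""
        SELECT category,
               SUM(is_normal)                                           AS normal_props,
               SUM(1 - is_normal)                                       AS abnormal_props,
               SUM(CASE WHEN is_normal = 1 THEN n_sentences ELSE 0 END) AS normal_sents,
               SUM(CASE WHEN is_normal = 0 THEN n_sentences ELSE 0 END) AS abnormal_sents
        FROM proposition
        GROUP BY category
        ORDER BY category
    """).fetchall()


if __name__ == "__main__":
    toy_rows = [  # illustrative counts only
        (1, "The ventricles are normal in size.", "brain and skull", 1, 120),
        (2, "There is no acute intracranial hemorrhage.", "brain and skull", 1, 95),
        (3, "There is a small subdural hematoma.", "brain and skull", 0, 12),
        (4, "The lungs are clear.", "general radiology", 1, 150),
        (5, "There is a right pleural effusion.", "general radiology", 0, 30),
    ]
    conn = build_db(toy_rows)
    print(top_propositions(conn, 2))
    print(category_summary(conn))
```

Keeping the normal/abnormal flag and the sentence count as plain columns lets both the ranking and the per-category breakdown be answered with ordinary `ORDER BY` and `GROUP BY` queries.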
  19. Similar works
      CLEF (Clinical E-Science Framework)
      The CLEF corpus consists of both structured records and free-text documents (clinical narratives, radiology reports, and histopathology reports).
      It provides semantic annotation of clinical text to assist in the development and evaluation of an information extraction system.
  20. LEXIMER (LEXIcon Mediated Entropy Reduction)
  21. LEXIMER (Cont.)
      Phrase isolation: scanning the report text and separating its content into phrases.
      Noise reduction: decreasing the amount of non-clinically relevant information contained in the report.
      Signal extraction: pulling out the positive statements and recommendations from the clinically relevant phrases.
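The three LEXIMER stages can be illustrated as a small pipeline. This is not LEXIMER's actual algorithm: its lexicon and entropy-reduction logic are not described here, so the word lists (`NOISE_MARKERS`, `NEGATION_PREFIXES`, `RECOMMENDATION_MARKERS`) and the phrase-splitting rule below are hypothetical stand-ins that only mirror the stage names.

```python
import re

# Hypothetical word lists for illustration only; LEXIMER's actual lexicon
# and entropy-reduction logic are not described on the slide.
NOISE_MARKERS = ("comparison", "dictated by", "electronically signed")
NEGATION_PREFIXES = ("no ", "without ", "negative for ")
RECOMMENDATION_MARKERS = ("recommend", "suggest", "follow-up", "follow up")

# A phrase break is ";", a newline, or a period that is not a decimal point
# (so "3.5 cm" stays in one phrase).
PHRASE_BREAK = re.compile(r"(?:(?<!\d)\.|\.(?!\d)|[;\n])+")


def isolate_phrases(report: str) -> list[str]:
    """Stage 1: scan the report text and separate it into phrases."""
    return [p.strip() for p in PHRASE_BREAK.split(report) if p.strip()]


def reduce_noise(phrases: list[str]) -> list[str]:
    """Stage 2: drop phrases without clinically relevant content."""
    return [p for p in phrases
            if not any(m in p.lower() for m in NOISE_MARKERS)]


def extract_signal(phrases: list[str]) -> tuple[list[str], list[str]]:
    """Stage 3: keep positive (non-negated) statements and recommendations."""
    findings, recommendations = [], []
    for p in phrases:
        low = p.lower()
        if any(m in low for m in RECOMMENDATION_MARKERS):
            recommendations.append(p)
        elif not low.startswith(NEGATION_PREFIXES):
            findings.append(p)
    return findings, recommendations


def run_pipeline(report: str) -> tuple[list[str], list[str]]:
    return extract_signal(reduce_noise(isolate_phrases(report)))


if __name__ == "__main__":
    report = ("Comparison: none. There is a 3.5 cm mass in the right upper lobe. "
              "No pleural effusion. Recommend CT follow-up in 3 months. "
              "Dictated by the attending radiologist.")
    findings, recs = run_pipeline(report)
    print(findings)  # ['There is a 3.5 cm mass in the right upper lobe']
    print(recs)      # ['Recommend CT follow-up in 3 months']
```

The sketch also shows the weakness raised in the Discussion: once the report is cut into phrases, context that spans a whole sentence is no longer available to the later stages.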
  22. NLP Using OLAP for Assessing Recommendations in Radiology Reports
      Database:
      4,279,179 radiology reports from a single tertiary health care center
      10-year period (1995-2004)
      Reports from the most common imaging modalities, with patient demographics
      LEXIMER, in conjunction with online analytic processing (OLAP), was used to classify reports into those with recommendations for imaging (IREC) and those without.
      IREC rates were determined by patient age group, gender, imaging modality, indication, disease, subspecialty, and referring physician.
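Computing IREC rates per group is essentially an OLAP roll-up: a binary measure (does the report recommend further imaging?) aggregated along dimensions such as modality or age group. A minimal pure-Python sketch, assuming each report has already been classified and stored as a dictionary with hypothetical keys (`modality`, `age_group`, `irec`); the records below are made up.

```python
from collections import defaultdict


def irec_rates(reports: list[dict], *dimensions: str) -> dict[tuple, float]:
    """IREC rate (share of reports with an imaging recommendation) per cell.

    Each cell is one combination of values of the given dimensions, like a
    slice of an OLAP cube; passing one dimension gives a one-way roll-up.
    """
    totals: dict[tuple, int] = defaultdict(int)
    with_rec: dict[tuple, int] = defaultdict(int)
    for report in reports:
        cell = tuple(report[d] for d in dimensions)
        totals[cell] += 1
        if report["irec"]:
            with_rec[cell] += 1
    return {cell: with_rec[cell] / totals[cell] for cell in sorted(totals)}


if __name__ == "__main__":
    reports = [  # made-up, already-classified reports
        {"modality": "CT", "age_group": "40-59", "irec": True},
        {"modality": "CT", "age_group": "60+", "irec": False},
        {"modality": "CT", "age_group": "60+", "irec": False},
        {"modality": "MR", "age_group": "60+", "irec": True},
    ]
    print(irec_rates(reports, "modality"))
    print(irec_rates(reports, "modality", "age_group"))
```

Passing several dimensions at once gives the finer cells of the cube, from which coarser roll-ups can be recomputed by grouping on fewer dimensions.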
  23. Discussion
      The CLEF work covers a very limited number of reports.
      In LEXIMER, there is no validation of the classification method, and phrases cannot convey the full meaning of a sentence.
      What distinguishes our work from the others is the large amount of data that is mined and the consistent expert validation.
  24. References
      Friedlin, J., Mahoui, M., Jones, J., Kashyap, V., & Jamieson, P. (2010). Knowledge Discovery and Data Mining of Free Text Radiology. Submitted to the Journal of Biomedical Informatics.
      Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y., Setzer, A., et al. (2008). Semantic Annotation of Clinical Text: The CLEF Corpus. Retrieved April 20, 2010, from ftp://ftp.dcs.shef.ac.uk/home/robertg/papers/lrec08-clefcorpus.pdf
      Dang, P. A., Kalra, M. K., Blake, M. A., Schultz, T. J., Stout, M., Lemay, P. R., Freshman, D. J., Halpern, E. F., & Dreyer, K. J. (2008). Natural Language Processing Using Online Analytic Processing for Assessing Recommendations in Radiology Reports. J Am Coll Radiol, 5(3), 197-204.
      http://www.nuance.com/healthcare/products/radcube-for-radiology.asp
