THE IMPORTANCE OF DATA MINING
Musa Mohd. Nordin, FRCP, FAMM, President, Federation of Islamic Medical Associations
Noor Conference | Global Knowledge Forum | http://www.noor.org.sa | Day 2 - Panel 2 - The Importance Of Data Mining By Musa Mohd. Nordin, Noor

Published in: Education, Technology
  1. 1. THE IMPORTANCE OF DATA MINING
        Musa Mohd. Nordin, FRCP, FAMM, President, Federation of Islamic Medical Associations
  2. 2. WHAT WE WILL DISCUSS:
        • Why data mine?
        • What is data mining?
        • Applications of data mining
        • Data mining vis-à-vis genomics & informatics
        • Conclusions
  3. 3. Why Mine Data?
        • Lots of data collected and warehoused
            - Web data, e-commerce
            - Purchases at department stores
            - Bank & credit card transactions
        • Computers cheaper & more powerful
        • Competitive pressure is strong
            - Make better decisions
            - Serve customers
            - Gain competitive edge
  4. 4. Why Mine Data? Scientific Viewpoint
        • Data collected & stored at enormous speeds (GB/hour)
            - remote sensors on a satellite
            - telescopes scanning the skies
            - microarrays generating gene expression data
            - scientific simulations generating terabytes of data
        • Data volumes overwhelm traditional techniques
            - enormous, multidimensional, heterogeneous data
        • Data mining may help scientists
            - in segmenting & analysing data
            - in hypothesis generation
            - in knowledge discovery
  5. 5. SIZE OF MEDICAL KNOWLEDGE
        • NLM Metathesaurus
            - 875,255 concepts
            - 2.14 million concept names
        • Biomarkers & prognosis
            - one marker in 12 years (1989-2001)
            - 400 markers in 1 year (2004)
        • Drug development
            - one molecule & 1,000 compounds (1985)
            - 40,000 cDNAs and 1 million compounds (2005)
  6. 6. Mining Large Data Sets - Motivation
        • Information “hidden” in the data that is not readily evident
        • Human analysts may take weeks to discover useful information
        • Much of the data is never analyzed at all
        [Figure: The Data Gap. Total new disk (TB) since 1995 versus number of analysts.]
        From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”
  7. 11. Data mining is the process of identifying VALID, NOVEL, potentially USEFUL & UNDERSTANDABLE patterns in data.
  8. 12. Origins of Data Mining
        • Emerged late 1980s
        • Flourished 1990s
        • Roots traced to 3 disciplines:
            - Classical Statistics
            - Artificial Intelligence
            - Machine Learning
        • Pre 1993: “Torturing the data into a confession”
        • Post 1993: “Charming the data into a confession”
        [Figure: Data Mining drawing on Machine Learning/Pattern Recognition, Statistics/AI, and Database systems.]
  9. 13. Knowledge Discovery Process
        [Figure: Raw Data → Integration → Data Warehouse → Selection & Cleaning → Target Data → Transformation → Transformed Data → Data Mining → Patterns and Rules → Interpretation & Evaluation → Knowledge → Understanding.]
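The stages named on the Knowledge Discovery Process slide can be sketched as a toy pipeline. This is only an illustration, not the presenter's method: the records in `RAW`, the function names, and the choice of pair co-occurrence counting as the "data mining" step are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records (assumed for illustration): one list of findings per patient.
RAW = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough"],
    ["Cough", "fever ", None],
    [],
    ["fatigue", "headache"],
    ["fever", "cough", "headache"],
]

def select_and_clean(records):
    """Selection & cleaning: drop missing values and empty records, normalise text."""
    cleaned = []
    for rec in records:
        items = [x.strip().lower() for x in rec if x]
        if items:
            cleaned.append(items)
    return cleaned

def transform(records):
    """Transformation: represent each record as a set of distinct items."""
    return [frozenset(r) for r in records]

def mine_pairs(transactions):
    """Data mining (one simple technique): count how often each item pair co-occurs."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

def interpret(counts, n, min_support=0.5):
    """Interpretation & evaluation: keep pairs whose support meets the threshold."""
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def run_pipeline(raw, min_support=0.5):
    target = select_and_clean(raw)       # target data
    transformed = transform(target)      # transformed data
    counts = mine_pairs(transformed)     # patterns
    return interpret(counts, len(transformed), min_support)  # knowledge

if __name__ == "__main__":
    for pair, support in run_pipeline(RAW).items():
        print(pair, round(support, 2))
```

With the sample records above, only the pair of "cough" and "fever" clears the 50% support threshold; real mining steps would use far larger data and richer techniques.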
  10. 14. DATA MINING: MEDICAL APPLICATIONS
        • Medical diagnostics tools
        • Medical image analysis
        • Micro-array gene expression
        • Protein structure & function prediction
        • New drug development
        • Disease surveillance
        • Bioterrorism surveillance
        • Environmental health impacts
  11. 15. Strong Government Initiatives
        • US: US$3B for Human Genome Project
        • Germany: US$62M and US$18M over 3 years to support proteomics and bacterial genomes, respectively
        • Britain: Budget to grow at 7% a year for next 4 years in bioinformatics and other post-genomics research
        • Italy: US$195M fund to focus on human genetics, cancer and bioinformatics
        • Sweden: US$91.4M for biotech, biosciences, healthcare
        • Singapore: US$1.2B in life sciences
        • Malaysia: BioValley will be valued at US$13.2B in 10 years
        • Japan: US$489.6M invested towards sequencing and analysis
        • Korea: US$1.7M for 2 plant genome projects
  12. 17. • There are 6 feet of DNA in each of our cells, packed into a structure only 0.0004 inches across
        • There are 100 trillion (100,000,000,000,000) cells in the body
        • If all the DNA in the human body were put end to end, it would reach to the sun and back over 600 times (100 trillion x 6 feet is about 114 billion miles; divided by 93 million miles, that is about 1,200 one-way trips, or roughly 600 round trips)
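The arithmetic on the DNA slide can be checked with a few lines. A minimal sketch, using the slide's own figures (6 feet per cell, 100 trillion cells, 93 million miles to the sun); the conversion factor of 5,280 feet per mile and the constant and function names are additions for the example.

```python
FEET_PER_MILE = 5280               # standard conversion, not stated on the slide
DNA_FEET_PER_CELL = 6              # from the slide
CELLS_IN_BODY = 100 * 10**12       # 100 trillion, from the slide
SUN_DISTANCE_MILES = 93_000_000    # from the slide

def dna_sun_trips():
    """Return (total miles of DNA, one-way trips to the sun, round trips)."""
    total_feet = DNA_FEET_PER_CELL * CELLS_IN_BODY
    total_miles = total_feet / FEET_PER_MILE
    one_way = total_miles / SUN_DISTANCE_MILES
    round_trips = one_way / 2
    return total_miles, one_way, round_trips

if __name__ == "__main__":
    miles, one_way, round_trips = dna_sun_trips()
    print(f"{miles:.3e} miles, {one_way:.0f} one-way trips, {round_trips:.0f} round trips")
```

The one-way count comes out a little above 1,200, which is the figure in the slide's parenthesis; halving it gives the "over 600 times" there and back.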
  13. 18. Life sciences research: from gene to function
        [Figure: in the cell nucleus, DNA (gene) → gene expression by RNA synthesis → mRNA → mRNA translation by protein synthesis → protein (NH2 ... COOH) → function-1, function-2, ... function-n. Associated approaches: whole-genome sequence projects, genome-wide micro-array analysis, “high-throughput” protein analysis.]
        • Protein function:
            - prediction by bioinformatics
            - proof by laboratory research
  14. 19. Paradigm Shift in Life Sciences
        • Past experiments were hypothesis driven
            - Evaluate hypothesis
            - Complement existing knowledge
        • Present experiments are data driven
            - Discover knowledge from large amounts of data
  15. 20. DATA MINING: GENOMICS & BIOINFORMATICS
        • Experiments are increasingly complex
        • Driven by advances in detector development
        • Resulting in an increase in the amount and complexity of data
        • Data mining (DM) harnesses this development
        • It translates data into useful biological, medical, pharmaceutical & agricultural knowledge
  16. 21. DATA MINING: CONCLUSIONS
        • Knowledge discovery from databases
        • “Cutting edge” of the art & science of medicine
        • “Competitive edge” of the business of medicine
        • Applicable to other sciences and arts
        • Knowledge discovery:
            - in search of excellence (IHSAN)
            - transformation (ISLAH) towards benefiting humanity
  17. 22. Thank You