"Recent Advances in Data Mining and Applications for Heliophysics"


  1. 1. Recent Advances in Data Mining and Applications for Heliophysics Kirk D. Borne George Mason University and QSS Group Inc., NASA-Goddard [email_address] or [email_address] http://rings.gsfc.nasa.gov/nvo_datamining.html
  2. 2. “Recent Advances in Data Mining and Applications for Solar and Space Physics” LSSP Seminar – GSFC Code 612 – June 9, 2006 Kirk Borne (QSS / SSDOO)
     ABSTRACT: Modern approaches to long-standing scientific research problems now rely heavily upon novel computational techniques. Much of this is driven by a common research challenge that pervades most disciplines: the growing volumes of scientific data that need to be processed and analyzed. Relevant computational techniques include data mining, evolutionary computing, and high-performance computing (HPC). For example, facilitating data-intensive science is a focus of the Goddard space sciences HPC initiative. Several examples of scientific data mining in large data sets will be presented from the author's own astronomy research. Additional examples will be given from the author's collaborations on data mining of remote sensing data for wildfire detection and on data mining within solar coronal mass ejection data sets. The goals of the talk are to illustrate and to motivate collaborative research opportunities across the LSSP, involving scientific discovery within existing and upcoming large solar and space physics mission data collections. Our goals would be: (a) to demonstrate and augment the legacy value of the tremendous investment of resources that has gone into the acquisition of large NASA mission data sets; and (b) to reap the maximum scientific benefit from those investments.
     BIO: Dr. Kirk Borne has a PhD in Astronomy from Caltech, and he subsequently had positions at the University of Michigan, Carnegie's Department of Terrestrial Magnetism, Space Telescope Science Institute, and Hughes/Raytheon STX in Goddard's Code 631. He currently works for QSS Group Inc. as Program Manager for Goddard's SSDOO support contract, managing staff in Codes 612.4, 690.1, and 605. Dr. Borne is also Associate Research Professor of Astrophysics and Computational Sciences at George Mason University (GMU) in Fairfax, Virginia, and Adjunct Associate Professor in the Database Technologies Program at the University of Maryland University College, where he teaches a graduate course in data mining. He is a senior member of the U.S. National Virtual Observatory (NVO) project and of the planned Large Synoptic Survey Telescope project. His research interests include: extragalactic astronomy, numerical modeling, scientific data mining, computational science, and science education technologies.
  3. 3. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  4. 4. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  5. 5. The New Face of Science – 1 <ul><li>Big Data (usually geographically distributed) </li></ul><ul><ul><li>High-Energy Particle Physics </li></ul></ul><ul><ul><li>Astronomy and Space Physics </li></ul></ul><ul><ul><li>Earth Observing System (Remote Sensing) </li></ul></ul><ul><ul><li>Human Genome and Bioinformatics </li></ul></ul><ul><ul><li>Numerical Simulations of any kind </li></ul></ul><ul><ul><li>Digital Libraries (electronic publication repositories) </li></ul></ul><ul><li>e-Science </li></ul><ul><ul><li>Built on Web Services (e-Gov, e-Biz) paradigm </li></ul></ul><ul><ul><li>Distributed heterogeneous data are the norm </li></ul></ul><ul><ul><li>Data integration across projects & institutions </li></ul></ul><ul><ul><li>One-stop shopping: “The right data, right now.” </li></ul></ul>
  6. 6. The New Face of Science – 2 <ul><li>Databases enable scientific discovery </li></ul><ul><ul><li>Data Handling and Archiving (management of massive data resources) </li></ul></ul><ul><ul><li>Data Discovery (finding data wherever they exist) </li></ul></ul><ul><ul><li>Data Access (WWW-Database interfaces) </li></ul></ul><ul><ul><li>Data/Metadata Browsing (serendipity) </li></ul></ul><ul><ul><li>Data Sharing and Reuse (within project teams; and by other scientists – scientific validation) </li></ul></ul><ul><ul><li>Data Integration (from multiple sources) </li></ul></ul><ul><ul><li>Data Fusion (across multiple modalities & domains) </li></ul></ul><ul><ul><li>Data Mining (KDD = Knowledge Discovery in Databases) </li></ul></ul>
  7. 7. The Promise of e-Science <ul><li>The best of Google and Amazon.com </li></ul><ul><ul><li>Go to one place to shop for all your data needs </li></ul></ul><ul><ul><li>Use scientific indexing (through scientific metadata) </li></ul></ul><ul><ul><li>Find the data that you need </li></ul></ul><ul><ul><li>Ignore data that are not relevant </li></ul></ul><ul><ul><li>Recommend “also relevant” data sets </li></ul></ul><ul><ul><li>Access distributed data seamlessly (transparently) </li></ul></ul><ul><ul><li>Integrate multiple data sets </li></ul></ul><ul><ul><li>Integrate data sets into analysis/visualization software packages </li></ul></ul><ul><ul><li>Provide value-added services </li></ul></ul><ul><ul><li>Provide intelligence within the archive </li></ul></ul><ul><ul><li>Provide intelligence at the point of service </li></ul></ul>
  8. 8. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  9. 9. Sun-Earth Space Environment – Rich Source of Heliophysical Phenomena
  10. 10. Multi-point Observations and Models of Space Plasmas Deliver a Deluge of Physical Measurements
  11. 12. Space Science data volumes are growing and growing and… <ul><li>a few terabytes “yesterday” (10,000 CDROMs) </li></ul><ul><li>tens of terabytes “today” (100,000 CDROMs) </li></ul><ul><li>100s of petabytes “tomorrow” (within 10-20 years) (1,000,000,000 CDROMs) </li></ul>
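The CD-ROM equivalents on this slide are order-of-magnitude figures. A minimal sketch of the arithmetic follows, assuming a capacity of 700 MB per CD-ROM (the slide does not state one) and decimal units; `CDROM_BYTES` and `cdroms_needed` are names introduced here for illustration.

```python
# Assumption: one CD-ROM holds 700 MB (decimal megabytes); the slide gives no capacity.
CDROM_BYTES = 700 * 10**6

TB = 10**12  # terabyte, decimal
PB = 10**15  # petabyte, decimal


def cdroms_needed(n_bytes):
    """Number of CD-ROMs required to hold n_bytes (ceiling division)."""
    return -(-n_bytes // CDROM_BYTES)


if __name__ == "__main__":
    # Representative volumes chosen to match the slide's three eras.
    for label, volume in [("a few TB", 7 * TB),
                          ("tens of TB", 70 * TB),
                          ("100s of PB", 700 * PB)]:
        print(f"{label}: about {cdroms_needed(volume):,} CD-ROMs")
```

Under that assumption, 7 TB, 70 TB, and 700 PB correspond to the slide's 10,000, 100,000, and 1,000,000,000 discs.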
  12. 13. Technological Advances: the cause and the solution?
  13. 14. Data Access and Analysis Tools are Essential, but do not scale well with Exponential Data Growth
  14. 15. The Data Flood is Everywhere! <ul><li>Huge quantities of data are being generated in all business, government, and research domains: </li></ul><ul><ul><li>Banking, retail, marketing, telecommunications, homeland security, computer networks, other business transactions ... </li></ul></ul><ul><ul><li>Scientific data: genomics, space science, physics, etc. </li></ul></ul><ul><ul><li>Web, text, and e-commerce </li></ul></ul>
  15. 16. (Credit: Tim Eastman)
  16. 17. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  17. 18. How do we learn about our Universe and the World around us? WE GATHER INFORMATION, FROM WHICH WE DERIVE KNOWLEDGE, FROM WHICH WE LEARN WHAT IT ALL MEANS Data → Information → Knowledge → Understanding / Wisdom!
  18. 19. Data-Information-Knowledge-Wisdom <ul><li>T.S. Eliot (1934): </li></ul><ul><li>“Where is the wisdom we have lost in knowledge? </li></ul><ul><li>Where is the knowledge we have lost in information?” </li></ul>
  19. 20. Astronomy Example Data: (a) Imaging data (ones & zeroes) (b) Spectral data (ones & zeroes) <ul><li>Information (catalogs / databases): </li></ul><ul><ul><li>Measure brightness of galaxies from image (e.g., 14.2 or 21.7) </li></ul></ul><ul><ul><li>Measure redshift of galaxies from spectrum (e.g., 0.0167 or 0.346) </li></ul></ul><ul><li>Knowledge: </li></ul><ul><ul><li>Hubble Diagram → </li></ul></ul><ul><ul><li>Redshift-Brightness Correlation → </li></ul></ul><ul><ul><li>Redshift = Distance </li></ul></ul>Understanding: the Universe is expanding!!
  20. 21. So what is Data Mining? <ul><li>Data Mining is Knowledge Discovery in Databases (KDD) </li></ul><ul><li>Data mining is defined as “an information extraction activity whose goal is to discover hidden facts contained in (large) databases.” </li></ul>
  21. 22. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  22. 23. Data Mining <ul><li>Data Mining is the Killer App for Scientific Databases. </li></ul><ul><li>Scientific Data Mining References: </li></ul><ul><ul><li>http://rings.gsfc.nasa.gov/nvo_datamining.html </li></ul></ul><ul><ul><li>http://www.itsc.uah.edu/f-mass/ </li></ul></ul><ul><ul><ul><li>Framework for Mining and Analysis of Space Science data (F-MASS) </li></ul></ul></ul><ul><li>Data mining is used to find patterns and relationships in data. (EDA = Exploratory Data Analysis) </li></ul><ul><li>Patterns can be analyzed via 2 types of models: </li></ul><ul><ul><li>Descriptive: Describe patterns and create meaningful subgroups or clusters. (Unsupervised Learning, Clustering) </li></ul></ul><ul><ul><li>Predictive: Forecast explicit values, based upon patterns in known results. (Supervised Learning, Classification) </li></ul></ul><ul><li>How does this apply to Scientific Research? … </li></ul><ul><ul><li>through KNOWLEDGE DISCOVERY </li></ul></ul><ul><li>Data → Information → Knowledge → Understanding / Wisdom! </li></ul>
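To make the descriptive/predictive distinction concrete, here is a minimal sketch on one-dimensional data. The descriptive half is a basic k-means clustering (no labels needed); the predictive half is a nearest-centroid classifier trained on labeled examples. Both are stand-ins chosen for brevity, not methods named in the talk, and the values and the "faint"/"bright" labels are hypothetical.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Descriptive model: group unlabeled values into k clusters (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def train_centroids(labeled):
    """Predictive model, training step: mean value per known class."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def classify(value, centroids):
    """Predictive model, prediction step: assign the class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


if __name__ == "__main__":
    print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1]))
    model = train_centroids([(1.0, "faint"), (1.2, "faint"),
                             (9.0, "bright"), (9.5, "bright")])
    print(classify(2.0, model))
```

The clustering step discovers the two groups on its own; the classifier can only reproduce classes it was shown during training.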
  23. 24. Data Mining is a core database function <ul><li>Data Mining has many names / aliases: </li></ul><ul><ul><li>Knowledge Discovery in Databases (KDD) </li></ul></ul><ul><ul><li>Machine Learning (ML) </li></ul></ul><ul><ul><li>Exploratory Data Analysis (EDA) </li></ul></ul><ul><ul><li>Intelligent Data Analysis (IDA) </li></ul></ul><ul><ul><li>On-Line Analytical Processing (OLAP) </li></ul></ul><ul><ul><li>Business Intelligence (BI) </li></ul></ul><ul><ul><li>Customer Relationship Management (CRM) </li></ul></ul><ul><ul><li>Business Analytics </li></ul></ul><ul><ul><li>Target Marketing </li></ul></ul><ul><ul><li>Cross-Selling </li></ul></ul><ul><ul><li>Market Basket Analysis </li></ul></ul><ul><ul><li>Credit Scoring </li></ul></ul><ul><ul><li>Case-Based Reasoning (CBR) </li></ul></ul><ul><ul><li>Connecting the Dots </li></ul></ul><ul><ul><li>Intrusion Detection Systems (IDS) </li></ul></ul><ul><ul><li>Recommendation / Personalization Systems! </li></ul></ul>
  24. 25. Data Mining is Ready for Prime Time <ul><li>Why are Data Mining & Knowledge Discovery such hot topics? -- because of the enormous interest in existing huge databases and their potential for new discoveries. </li></ul><ul><li>Data mining is ready for general application because it engages three technologies that are now sufficiently mature: </li></ul><ul><ul><ul><ul><li>Massive data collection & delivery </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Powerful multiprocessor computers </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Sophisticated data mining algorithms </li></ul></ul></ul></ul><ul><li>5 Reasons to use Data Mining: </li></ul><ul><ul><li>Most agencies collect and refine massive quantities of data. </li></ul></ul><ul><ul><li>Data mining moves beyond the analysis of past events … to predicting future trends and behaviors that may be missed because they lie outside the experts’ expectations. </li></ul></ul><ul><ul><li>Data mining tools can answer complex questions that traditionally were too time-consuming to resolve. </li></ul></ul><ul><ul><li>Data mining tools can explore the intricate interdependencies within databases in order to discover hidden patterns and relationships. </li></ul></ul><ul><ul><li>Data mining allows decision-makers to make proactive, knowledge-driven decisions. </li></ul></ul>
  25. 26. Examples of real Data Mining in Action <ul><li>Classic Textbook Example of Data Mining (Legend?) : Data mining of grocery store logs indicated that men who buy diapers also tend to buy beer at the same time. </li></ul><ul><li>Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. </li></ul><ul><li>Astronomers examined objects with extreme colors in a huge database to discover the most distant Quasars ever seen. </li></ul><ul><li>Credit card companies recommend products to cardholders based on analysis of their monthly expenditures. </li></ul><ul><li>Airline purchase transaction logs revealed that 9-11 hijackers bought one-way airline tickets with the same credit card. </li></ul><ul><li>Wal-Mart studied product sales in their Florida stores in 2004 when several hurricanes passed through Florida. Wal-Mart found that, before the hurricanes arrived, people purchased 7 times as many strawberry pop tarts compared to normal shopping days. </li></ul>
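The diapers-and-beer story is an instance of market basket (association) analysis. A minimal sketch of the three standard rule measures (support, confidence, lift) follows; the four baskets are invented for illustration and are not data from the talk.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transaction log."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline rate; above 1 means a positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


if __name__ == "__main__":
    # Hypothetical shopping baskets.
    baskets = [
        {"diapers", "beer"},
        {"diapers", "beer", "milk"},
        {"diapers", "milk"},
        {"milk", "bread"},
    ]
    print(support(baskets, {"diapers", "beer"}))
    print(confidence(baskets, {"diapers"}, {"beer"}))
    print(lift(baskets, {"diapers"}, {"beer"}))
```

A rule such as "diapers implies beer" is usually reported only when its support and confidence both exceed thresholds chosen by the analyst.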
  26. 27. Strawberry pop tarts???
  27. 28. Astronomy Data Mining in Action: Exploring the Time Domain. Mega-Flares on normal Sun-like stars = a star like our Sun increased in brightness 300X one night! … say what??
  28. 29. Data Mining Methods and Some Examples <ul><li>Clustering </li></ul><ul><li>Classification </li></ul><ul><li>Associations </li></ul><ul><li>Neural Nets </li></ul><ul><li>Decision Trees </li></ul><ul><li>Pattern Recognition </li></ul><ul><li>Correlation/Trend Analysis </li></ul><ul><li>Principal Component Analysis </li></ul><ul><li>Independent Component Analysis </li></ul><ul><li>Regression Analysis </li></ul><ul><li>Outlier/Glitch Identification </li></ul><ul><li>Visualization </li></ul><ul><li>Autonomous Agents </li></ul><ul><li>Self-Organizing Maps (SOM) </li></ul><ul><li>Link (Affinity) Analysis </li></ul>Some examples: <ul><li>Classify new data items using the known classes & groups </li></ul><ul><li>Find unusual co-occurring associations of attribute values among DB items </li></ul><ul><li>Organize information in the database based on relationships among key data descriptors </li></ul><ul><li>Identify linkages between data items based on features shared in common </li></ul><ul><li>Group together similar items and separate dissimilar items in DB </li></ul><ul><li>Predict a numeric attribute value </li></ul>
  29. 30. Some Data Mining Techniques Graphically Represented <ul><li>Self-Organizing Map (SOM) </li></ul>Outlier (Anomaly) Detection Clustering Link Analysis Decision Tree Neural Network
  30. 31. Data Mining Application: Outlier Detection <ul><ul><ul><li>statistical analysis of “typical” events </li></ul></ul></ul><ul><ul><ul><li>automated search for “rare” events </li></ul></ul></ul>Figure: The clustering of data clouds (dc#) within a multidimensional parameter space (p#). Such a mapping can be used to search for and identify clusters, voids, outliers, one-of-a-kinds, relationships, and associations among arbitrary parameters in a database (or among various parameters in geographically distributed databases).
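An automated search for "rare" events can be sketched in one dimension with a robust outlier test. The example below uses the modified z-score based on the median absolute deviation (MAD) with the commonly used cutoff of 3.5; this is one possible technique chosen for illustration, not necessarily the one behind the figure, and the measurements are hypothetical.

```python
import statistics


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation,
    so a single extreme value cannot hide itself by inflating the spread.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to compare against
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical repeated measurements with one extreme reading.
    readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 3000.0]
    print(find_outliers(readings))
```

The same idea generalizes to the multidimensional parameter spaces on the slide by replacing the absolute deviation with a distance from the nearest cluster.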
  31. 32. Outlier Detection: Serendipitous Discovery of Rare or New Objects & Events
  32. 33. Learning From Legacy Temporal Data (Time Series): Classify New Data (Bayes Analysis or Markov Modeling)
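As a minimal sketch of Markov modeling on legacy time series, the example below estimates first-order transition probabilities from a sequence of labeled states and uses them to predict the most likely next state. The state names ("quiet", "flare") and the sequence are hypothetical; a real application would first have to define how raw measurements map to states.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive pairs in `sequence`."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def most_likely_next(state, probabilities):
    """Most probable successor of `state` under the estimated model."""
    successors = probabilities[state]
    return max(successors, key=successors.get)


if __name__ == "__main__":
    # Hypothetical legacy record of daily activity states.
    history = "quiet quiet flare quiet quiet quiet flare quiet".split()
    model = transition_probabilities(history)
    print(model)
    print(most_likely_next("quiet", model))
```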
  33. 34. Principal Components Analysis & Independent Components Analysis Cepheid Variables: Cosmic Yardsticks -- One Correlation -- Two Classes!
  34. 35. Classification Methods: Decision Trees, Neural Networks, SVM (Support Vector Machines) <ul><li>There are 2 Classes! </li></ul><ul><li>How do you ... </li></ul><ul><li>Separate them? </li></ul><ul><li>Distinguish them? </li></ul><ul><li>Learn the rules? </li></ul><ul><li>Classify them? </li></ul>Apply Kernel (SVM)
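The "Apply Kernel" step can be illustrated without an SVM solver. The sketch below shows only the underlying idea: two classes that no single threshold separates in the original feature become separable after mapping each point into a higher-dimensional space. The explicit map x → (x, x²) and the data are assumptions made for illustration; an actual SVM would apply a kernel function implicitly and fit a maximum-margin boundary.

```python
def lift(x):
    """Explicit feature map from 1-D to 2-D: x -> (x, x squared)."""
    return (x, x * x)


def separable_by_threshold(class_a, class_b):
    """True if a single threshold on a 1-D feature puts the classes on opposite sides."""
    return max(class_a) < min(class_b) or max(class_b) < min(class_a)


if __name__ == "__main__":
    # Hypothetical data: one class near zero, the other far from zero on both sides.
    inner = [-0.5, 0.0, 0.7]
    outer = [-3.0, -2.2, 2.5]

    print(separable_by_threshold(inner, outer))  # original feature

    inner_sq = [lift(x)[1] for x in inner]
    outer_sq = [lift(x)[1] for x in outer]
    print(separable_by_threshold(inner_sq, outer_sq))  # second lifted coordinate
```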
  35. 36. Data Mining: For Exploration, Discovery, and Decision Support (in science, government, homeland security, and business)
  36. 37. Sample Space Science Data Mining Use Cases <ul><ul><li>Discover data stored in geographically distributed heterogeneous systems. </li></ul></ul><ul><ul><li>Search huge databases for trends and correlations in high-dimensional parameter spaces: identify new properties or new classes of scientific objects. </li></ul></ul><ul><ul><li>Discover new linkages & associations among data parameters. </li></ul></ul><ul><ul><li>Search for rare, one-of-a-kind, and exotic objects in huge databases. </li></ul></ul><ul><ul><li>Identify repeating patterns of temporal variations from millions or billions of observations. </li></ul></ul><ul><ul><li>Identify parameter glitches / anomalies / deviations either in static databases (e.g., archives) or in dynamic data (e.g., science / instrumental / engineering data streams). </li></ul></ul><ul><ul><li>Find clusters, nearest neighbors, outliers, and/or zones of avoidance in the distribution of objects or other observables in arbitrary parameter spaces. </li></ul></ul><ul><ul><li>Serendipitously explore huge scientific databases through access to distributed, autonomous, federated, heterogeneous, multi-experiment, multi-institutional scientific data archives. </li></ul></ul>
  37. 38. OUTLINE <ul><li>The New Face of Science </li></ul><ul><li>Heliophysics (Data) Environment </li></ul><ul><li>Knowledge Discovery </li></ul><ul><li>Data Mining Examples and Techniques </li></ul><ul><li>Discovery Informatics for Large Database Science </li></ul><ul><ul><li>Heliophysics Example </li></ul></ul>
  38. 39. Existing Space Science Data Infrastructure <ul><li>The Recent Past: many independent distributed heterogeneous data archives </li></ul><ul><li>Today: VxO’s = Virtual Observatories </li></ul><ul><ul><li>Web Services-enabled: e-Science paradigm (middleware, standards, protocols) ** </li></ul></ul><ul><ul><li>Provides seamless uniform access to distributed heterogeneous data sources </li></ul></ul><ul><ul><ul><li>“Find the right data, right now” </li></ul></ul></ul><ul><ul><ul><li>“One-stop shopping for all of your data needs” </li></ul></ul></ul><ul><ul><li>Emerging environment consists of many VxO’s – for example: </li></ul></ul><ul><ul><ul><li>NVO = National Virtual Observatory (precursor to VAO = Virtual Astro Obs) </li></ul></ul></ul><ul><ul><ul><li>VSO = Virtual Solar Observatory </li></ul></ul></ul><ul><ul><ul><li>VSPO = Virtual Space Physics Observatory </li></ul></ul></ul><ul><ul><ul><li>NVAO = National Virtual Aeronomy Observatory </li></ul></ul></ul><ul><ul><ul><li>VITMO = Virtual Ionospheric, Thermospheric, Magnetospheric Observatory </li></ul></ul></ul><ul><ul><ul><li>VHO = Virtual Heliospheric Observatory </li></ul></ul></ul><ul><ul><ul><li>VMO = Virtual Magnetospheric Observatory </li></ul></ul></ul><ul><li>** Standards for data formats, data/metadata exchange, data models, registries, Web Services, VO queries, query results, semantics </li></ul><ul><li>** And of course: The Grid, Web Services, Semantic Web, etc. ... </li></ul>
  39. 40. Our science data systems should enable distributed multi-mission database access, discovery, mining, and analysis. → DISCOVERY INFORMATICS
  40. 41. What is Informatics? <ul><li>Informatics is the discipline of structuring, storing, accessing, and distributing information describing complex systems. </li></ul><ul><li>Examples: </li></ul><ul><ul><li>Bioinformatics </li></ul></ul><ul><ul><li>Geographic Information Systems (= Geoinformatics) </li></ul></ul><ul><ul><li>New! Discovery Informatics for Space Science </li></ul></ul><ul><li>Common features of X-informatics: </li></ul><ul><ul><li>Basic object granule is defined </li></ul></ul><ul><ul><li>Common community tools operate on object granules </li></ul></ul><ul><ul><li>Data-centric and Information-centric approaches </li></ul></ul><ul><ul><li>Data-driven science </li></ul></ul><ul><ul><li>X-informatics is key enabler of scientific discovery in the era of large data science </li></ul></ul>
  41. 42. X-Informatics Compared <ul><li>Bioinformatics: common tools = BLAST, FASTA; object granules = Gene Sequence </li></ul><ul><li>Geoinformatics: common tools = GIS; object granules = Points, Vectors, Polygons </li></ul><ul><li>Space Science Informatics: common tools = Classification, Clustering, Bayes Inference, Cross Correlations, Principal Components, ???; object granules = Time Series, Event List, Catalog </li></ul>
  42. 43. Discovery Informatics <ul><li>Key enabler for new science discovery in large databases </li></ul><ul><li>Essential tool (Large data science is here to stay) </li></ul><ul><li>Common data integration, browse, and discovery tools will enable exponential knowledge discovery within exponentially growing data collections </li></ul><ul><li>X-informatics represents the 3rd leg of scientific research: experiment, theory, and data-driven exploration (Reference: Jim Gray, KDD-2003) </li></ul><ul><li>Discovery Informatics should parallel Bioinformatics and Geoinformatics: become a stand-alone research sub-discipline </li></ul>
  43. 44. Key Role of Data Mining <ul><li>Data Mining (KDD) is the killer app for scientific databases </li></ul><ul><li>Space and Earth Science Examples: </li></ul><ul><ul><li>Neural Network for Pixel Classification: Event Detection and Prediction (e.g., Wildfires) </li></ul></ul><ul><ul><li>Bayesian Network for Object Classification </li></ul></ul><ul><ul><li>PCA for finding Fundamental Planes of Galaxy Parameters </li></ul></ul><ul><ul><li>PCA (weakest component) for Outlier Detection: anomalies, novel discoveries, new objects </li></ul></ul><ul><ul><li>Link Analysis (Association Mining) for Causal Event Detection (e.g., linking Solar Surface, CME, and Space Weather events) </li></ul></ul><ul><ul><li>Clustering analysis: Spatial, Temporal, or any scientific database parameters </li></ul></ul><ul><ul><li>Markov models: Temporal mining of time series data </li></ul></ul>
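The "PCA (weakest component) for Outlier Detection" item can be sketched in two dimensions, where the principal axes of the covariance matrix have a closed form. Points that follow the dominant correlation have small projections onto the weakest component, so the point with the largest projection is the candidate anomaly. The data are hypothetical, and real applications would use a numerical library on many more parameters.

```python
import math


def pca_2d(points):
    """Return the mean, (variance, axis) of the strongest and of the weakest component."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    half_gap = math.hypot((a - c) / 2, b)
    strong_var = (a + c) / 2 + half_gap
    weak_var = (a + c) / 2 - half_gap
    # Orientation of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    strong_axis = (math.cos(theta), math.sin(theta))
    weak_axis = (-math.sin(theta), math.cos(theta))
    return (mx, my), (strong_var, strong_axis), (weak_var, weak_axis)


def most_unusual(points):
    """Point with the largest absolute projection onto the weakest component."""
    (mx, my), _, (_, (wx, wy)) = pca_2d(points)
    return max(points, key=lambda p: abs((p[0] - mx) * wx + (p[1] - my) * wy))


if __name__ == "__main__":
    # Hypothetical catalog: five objects on the trend y = 2x and one that is not.
    catalog = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (3, 9)]
    print(most_unusual(catalog))
```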
  44. 45. Space Science Knowledge Discovery
  45. 46. This is the Informatics Layer
  46. 47. This is the Informatics Layer <ul><li>Informatics Layer: </li></ul><ul><li>Provides standardized representations of the “information extracted” – for use in the KDD (data mining) layer. </li></ul><ul><li>Standardization is not required (nor feasible) at the “data source” layer. </li></ul><ul><li>The informatics is discipline-specific. </li></ul><ul><li>Informatics enables KDD across large distributed heterogeneous scientific data repositories. </li></ul>
  47. 48. Space Weather Example CME = Coronal Mass Ejection SEP = Solar Energetic Particle
  48. 49. Key Role of Discovery Informatics <ul><li>The key role of Discovery Informatics is: </li></ul><ul><ul><li>... data integration and fusion ... </li></ul></ul><ul><ul><li>... across multiple heterogeneous data collections ... </li></ul></ul><ul><ul><li>... to enable scientific knowledge discovery ... </li></ul></ul><ul><ul><li>... and decision support. </li></ul></ul>
  49. 50. Future Work: Discovery Informatics Applications <ul><li>Query-By-Example (QBE) science data systems: </li></ul><ul><ul><li>“Find more data entries similar to this one” </li></ul></ul><ul><ul><li>“Find the data entry most dissimilar to this one” </li></ul></ul><ul><li>Automated Recommendation (Filtering) Systems: </li></ul><ul><ul><li>“Other users who examined these data also retrieved the following...” </li></ul></ul><ul><ul><li>“Other data that are relevant to these data include...” </li></ul></ul><ul><li>Information Retrieval Metrics for Scientific Databases: </li></ul><ul><ul><li>Precision = “How much of the retrieved data is relevant to my query?” </li></ul></ul><ul><ul><li>Recall = “How much of the relevant data did my query retrieve?” </li></ul></ul><ul><li>Semantic Annotation (Tagging) Services: </li></ul><ul><ul><li>Report discoveries back to the science database for community reuse </li></ul></ul><ul><li>Science / Technical / Math (STEM) Education: </li></ul><ul><ul><li>Transparent reuse and analysis of scientific data in inquiry-based classroom learning (http://serc.carleton.edu/usingdata/, DLESE.org) </li></ul></ul><ul><li>Key concepts that need defining (by community consensus): Similarity, Relevance, Semantics (dictionaries, ontologies) </li></ul>
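The precision and recall definitions above, and the two Query-By-Example requests, can be sketched as follows. Precision and recall follow directly from the slide's wording. The similarity measure is an assumption: the slide says "Similarity" still needs a community definition, so plain Euclidean distance between feature vectors stands in here. The catalog entries and their identifiers are hypothetical.

```python
import math


def precision_recall(retrieved, relevant):
    """Precision: share of retrieved items that are relevant.
    Recall: share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def most_similar(query, catalog):
    """QBE: key of the catalog entry closest to `query` (Euclidean distance, an assumption)."""
    return min(catalog, key=lambda key: math.dist(query, catalog[key]))


def most_dissimilar(query, catalog):
    """QBE: key of the catalog entry farthest from `query`."""
    return max(catalog, key=lambda key: math.dist(query, catalog[key]))


if __name__ == "__main__":
    print(precision_recall({"a", "b", "c", "d"}, {"b", "c", "e"}))

    # Hypothetical entries described by two numeric parameters each.
    catalog = {"A": (1.0, 2.0), "B": (1.1, 2.1), "C": (9.0, 9.0)}
    print(most_similar((1.0, 2.05), catalog))
    print(most_dissimilar((1.0, 2.05), catalog))
```

In practice the parameters would need rescaling before distances are compared, since Euclidean distance is dominated by whichever parameter has the largest numeric range.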
  50. 52. (science knowledge sharing & re-use) (*** = Repositories of information, knowledge, and scientific results.)
  51. 53. Informatics: Synergy between Scientific Measurement, Mining, and Modeling
  52. 54. Data Mining and Discovery Informatics: It is more than just connecting the dots Reference: http://homepage.interaccess.com/~purcellm/lcas/Cartoons/cartoons.htm
