
Data Mining The Sky



Published in: Technology

  1. 1. Mining the sky
  2. 2. Data analysis in astronomy
     Data mining techniques are rapidly gaining acceptance in a variety of scientific disciplines.
     The large amounts of data collected in astronomical surveys require semi-automated techniques for analysis.
     The focus here is on extracting useful information from a single survey.
  3. 3. Data mining is a multidisciplinary field, borrowing and enhancing ideas from diverse areas such as signal and image processing, image understanding, statistics, mathematical optimization, computer vision, and pattern recognition.
     Mining scientific data sets is an area rich in mathematical problems.
  4. 4. Use of data mining techniques in astronomy
     Data mining is the process of uncovering patterns, anomalies, and statistically significant structures in data.
     Neural networks are used to discriminate between stars and galaxies.
     The SKICAT project uses decision trees for star/galaxy classification in the DPOSS survey.
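As a rough illustration of tree-based star/galaxy separation, the sketch below fits a depth-one decision tree (a decision stump) on a single feature. This is far simpler than the trees used in SKICAT and is not that project's method; the `concentration` feature, its values, and the function names are all hypothetical and invented for this example.

```python
from typing import List, Tuple

# Hypothetical toy catalog: (concentration index, label). The values are
# invented for illustration; a real catalog has many attributes per object.
TRAINING: List[Tuple[float, str]] = [
    (0.92, "star"), (0.88, "star"), (0.95, "star"), (0.90, "star"),
    (0.55, "galaxy"), (0.62, "galaxy"), (0.48, "galaxy"), (0.70, "galaxy"),
]


def fit_stump(samples: List[Tuple[float, str]]) -> float:
    """Return the threshold with the fewest training errors.

    Objects whose feature value is >= threshold are labeled 'star'.
    """
    values = sorted(v for v, _ in samples)
    # Candidate splits: midpoints between consecutive sorted feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold: float) -> int:
        return sum((v >= threshold) != (label == "star") for v, label in samples)

    return min(candidates, key=errors)


def classify(value: float, threshold: float) -> str:
    """Apply the single-split rule to one object."""
    return "star" if value >= threshold else "galaxy"


threshold = fit_stump(TRAINING)
print(classify(0.91, threshold), classify(0.50, threshold))
```

A full decision tree repeats this split search recursively on each side of the threshold and over many features.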
  5. 5. Astro-informatics
     Problems in astronomy increasingly require the use of machine learning and data mining techniques:
       • Detection of spurious objects
       • Record image
       • Object classification and clustering
       • Compression
       • Source separation
  9. 9. Mining a single astronomical survey
     A survey is defined by the wavelength of the light used, the depth of the images, and the angular resolution of the images.
     Data is available in two forms: images and a catalog.
     The original data obtained from the telescope are images; after some processing, a catalog is obtained that holds information about every object in the images.
     In the survey, the catalog is more important than the images.
  10. 10. Issues in astronomy
      Compression (e.g., galaxy images, spectra)
      Classification (e.g., stars, galaxies, or gamma-ray bursts)
      Reconstruction (e.g., blurred galaxy images, mass distribution from weak gravitational lensing)
      Feature extraction (signature features of stars, galaxies, and quasars)
      Parameter estimation (e.g., star parameter measurement, photometric redshift prediction, cosmological parameters)
      Model selection (e.g., are there 0, 1, 2, ... planets around the star, or is a cosmological model with non-zero neutrino mass more favorable?)
  11. 11. Science requirements for data mining
      Cross-identification: the classical problem of associating the source list of one database with the source list of another.
      Cross-correlation: the search for correlations, tendencies, and trends between physical parameters in multi-dimensional data.
      Nearest-neighbor identification: the general application of clustering algorithms in multi-dimensional parameter space, usually within a database.
      Systematic data exploration: the application of a broad range of event-based and relationship-based queries to a database in the hope of discovering new objects or a new class of objects.
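Cross-identification, as described in the slide above, can be sketched as a nearest-neighbor search on the sky. The example below is a minimal brute-force version, assuming each catalog is a list of (id, RA, Dec) tuples in degrees and that a fixed 2-arcsecond match radius is acceptable. The catalog contents, the `cross_identify` name, and the radius are assumptions made for illustration, not part of the original slides.

```python
import math
from typing import List, Optional, Tuple

Source = Tuple[str, float, float]  # (id, RA in degrees, Dec in degrees)


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_identify(cat_a: List[Source], cat_b: List[Source],
                   max_sep_arcsec: float = 2.0) -> List[Tuple[str, Optional[str]]]:
    """Pair each source in cat_a with its nearest cat_b source within the radius.

    Sources with no counterpart inside the radius are paired with None.
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best_id, best_sep = None, max_sep_arcsec / 3600.0
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        matches.append((id_a, best_id))
    return matches


# Hypothetical toy catalogs; the coordinates are invented for illustration.
optical = [("A1", 150.0000, 2.0000), ("A2", 150.0100, 2.0100)]
radio = [("B1", 150.0002, 2.0001), ("B2", 151.0000, 3.0000)]
print(cross_identify(optical, radio))
```

The double loop costs time proportional to the product of the catalog sizes, so a real survey would normally replace it with a spatial index such as a k-d tree or a sky-pixelization scheme.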
  12. 12. KDD
      KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data.
      Data mining (DM) is the core of knowledge discovery.
      The KDD process involves:
        • Data mining object
        • Data preparation
        • Data processing
        • Analysis
        • Evolution
  13. 13. Primary tasks of data mining:
        • Classification (finding descriptions of several predefined classes and classifying a data item into one of them)
        • Regression (mapping a data item to a real-valued variable)
        • Clustering (identifying a finite set of clusters or categories in the data)
        • Deviation and change detection (discovering the most significant changes in the data)
        • Dependency modeling (finding a model that describes significant dependencies between the variables)
        • Summarization (finding a compact description of the data)
  14. 14. Machine learning and data mining tasks will continue to prove useful with astronomical databases.
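Of the primary tasks listed above, deviation detection is the simplest to sketch. The example below flags outlying measurements with a robust (median-based) modified z-score. This is one common choice of technique, not something the slides prescribe. The magnitudes, the `flag_deviations` name, and the 3.5 cutoff are assumptions made for illustration.

```python
import statistics
from typing import List


def flag_deviations(values: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a few extreme values do not distort the baseline.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread at all: nothing can be ranked as deviant
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


# Hypothetical brightness measurements (magnitudes) of one object over time;
# the values are invented, with a single sudden brightening at index 5.
magnitudes = [15.02, 15.01, 14.99, 15.00, 15.03, 13.20, 15.01, 14.98]
print(flag_deviations(magnitudes))
```

Using the median instead of the mean matters here: the one bright point would otherwise inflate the standard deviation and could partly mask itself.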