Mining the sky
Data analysis in astronomy
Data mining techniques are rapidly gaining acceptance in a variety of scientific disciplines.
The large amounts of data collected in astronomical surveys require semi-automated techniques for analysis.
The focus here is on extracting useful information from a single survey.
Data mining is a multi-disciplinary field that borrows and enhances ideas from diverse areas such as signal and image processing, image understanding, statistics, mathematical optimization, computer vision, and pattern recognition.
Mining scientific data sets is an area rich in mathematical problems.
Use of data mining techniques in astronomy
Data mining is the process of uncovering patterns, anomalies, and statistically significant structures in data.
Neural networks are used to discriminate between stars and galaxies.
The SKICAT project uses decision trees for star/galaxy classification in the DPOSS survey.
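To make the decision-tree idea concrete, here is a minimal sketch of a one-level tree (a "decision stump") that separates stars from galaxies on a single feature. It is not the SKICAT method. The feature name `concentration`, its values, and the helper names `fit_stump` and `predict` are all invented for illustration.

```python
# Toy one-level decision tree ("stump") for star/galaxy separation.
# The feature and every value below are hypothetical, not survey data.

def fit_stump(values, labels):
    """Return (threshold, label_below, label_above, errors) with the fewest training errors."""
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    classes = sorted(set(labels))
    best = None
    for t in candidates:
        for below in classes:
            for above in classes:
                if below == above:
                    continue
                # Count objects the rule "value <= t -> below, else above" gets wrong.
                errors = sum(
                    1 for v, y in zip(values, labels)
                    if (below if v <= t else above) != y
                )
                if best is None or errors < best[3]:
                    best = (t, below, above, errors)
    return best


def predict(stump, value):
    """Apply a fitted stump to one feature value."""
    threshold, below, above, _ = stump
    return below if value <= threshold else above


# Hypothetical "concentration" feature: point-like stars score high here.
concentration = [0.91, 0.88, 0.95, 0.90, 0.55, 0.62, 0.48, 0.70]
labels = ["star"] * 4 + ["galaxy"] * 4

stump = fit_stump(concentration, labels)
print(stump)
print(predict(stump, 0.93), predict(stump, 0.50))
```

A full decision tree applies the same threshold search recursively to each side of the split and over several features.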
Astro-informatics
Problems in astronomy increasingly require the use of machine learning and data mining techniques:
Detection of spurious objects
Record image
Object classification and clustering
Compression
Source separation

Mining a single astronomical survey
A survey is defined by the wavelength of the light used, the depth of the images, and the angular resolution of the images.
Data is available in two forms: images and a catalog.
The original data obtained from the telescope are images; after some processing, a catalog is obtained that holds information about every object in the images.
In a survey, the catalog is more important than the images.
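The step from images to a catalog can be sketched as follows. This is a deliberately simplified illustration, assuming a background-free image stored as a list of pixel rows and a fixed detection threshold. The function name `extract_catalog`, the catalog fields, and the pixel values are hypothetical and are not taken from any particular survey pipeline.

```python
# Toy image-to-catalog step: threshold the image, group neighboring bright
# pixels into objects, and record a few measurements per object.
# The image values are invented for illustration.

def extract_catalog(image, threshold):
    """Label 4-connected groups of pixels above threshold and summarize each one."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    catalog = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood fill to collect every pixel belonging to this object.
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            flux = sum(image[y][x] for y, x in pixels)
            catalog.append({
                "id": len(catalog) + 1,
                # Flux-weighted centroid in pixel coordinates.
                "x": sum(x * image[y][x] for y, x in pixels) / flux,
                "y": sum(y * image[y][x] for y, x in pixels) / flux,
                "flux": flux,
                "npix": len(pixels),
            })
    return catalog


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 9, 0, 0, 0],
    [0, 4, 6, 0, 0, 0],
    [0, 0, 0, 0, 7, 0],
    [0, 0, 0, 0, 8, 0],
]
catalog = extract_catalog(image, threshold=1)
for entry in catalog:
    print(entry)
```

Each catalog row summarizes one object with a handful of numbers, which is why later mining steps usually work on the catalog instead of the pixels.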
Issues in astronomy
Compression (e.g., galaxy images, spectra)
Classification (e.g., stars, galaxies, or gamma-ray bursts)
Reconstruction (e.g., blurred galaxy images, mass distribution from weak gravitational lensing)
Feature extraction (e.g., signature features of stars, galaxies, and quasars)
Parameter estimation (e.g., star parameter measurement, photometric redshift prediction, cosmological parameters)
Model selection (e.g., are there 0, 1, 2, ... planets around the star, or is a cosmological model with non-zero neutrino mass more favorable?)
Science requirements for data mining
Cross-identification: the classical problem of associating the source list of one database with the source list of another.
Cross-correlation: the search for correlations, tendencies, and trends between physical parameters in multi-dimensional data.
Nearest-neighbor identification: the general application of clustering algorithms in multi-dimensional parameter space, usually within a single database.
Systematic data exploration: the application of a broad range of event-based and relationship-based queries to a database in the hope of discovering new objects or a new class of objects.
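Cross-identification can be illustrated with a small positional match. The sketch below assumes each source list is a dictionary of name to (RA, Dec) in degrees and pairs every source in the first list with its nearest neighbor in the second, provided the two lie within a chosen radius. The source names, coordinates, and the 2 arcsecond radius are hypothetical.

```python
import math

# Toy positional cross-identification between two source lists.
# Coordinates and names are invented for illustration.

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_identify(catalog_a, catalog_b, radius_arcsec):
    """Map each source in A to its nearest source in B within the radius, else None."""
    matches = {}
    for name_a, (ra_a, dec_a) in catalog_a.items():
        best_name, best_sep = None, radius_arcsec
        for name_b, (ra_b, dec_b) in catalog_b.items():
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b) * 3600.0
            if sep <= best_sep:
                best_name, best_sep = name_b, sep
        matches[name_a] = best_name
    return matches


catalog_a = {"A1": (150.0000, 2.0000), "A2": (150.0100, 2.0100), "A3": (151.0, 3.0)}
catalog_b = {"B1": (150.0002, 2.0001), "B2": (150.0103, 2.0099), "B3": (149.5, 1.5)}

matches = cross_identify(catalog_a, catalog_b, radius_arcsec=2.0)
print(matches)
```

This brute-force loop compares every pair of sources, so it only suits small lists; large catalogs need some form of spatial indexing to keep the search tractable.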
