Data analysis in astronomy
Data mining techniques are rapidly gaining acceptance in a variety of scientific disciplines. The large amounts of data collected in astronomical surveys require semi-automated techniques for analysis. The focus here is on extracting useful information from a single survey.
Data mining is a multidisciplinary field, borrowing and enhancing ideas from diverse areas such as signal and image processing, image understanding, statistics, mathematical optimization, computer vision, and pattern recognition. Mining scientific data sets is an area rich in mathematical problems.
Use of data mining techniques in astronomy
Data mining is the process of uncovering patterns, anomalies, and statistically significant structures in data. Neural networks are used to discriminate between stars and galaxies. The SKICAT project uses decision trees for star/galaxy classification in the DPOSS survey.
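To make the decision-tree idea concrete, here is a minimal sketch of a one-level tree (a "stump") that separates stars from galaxies on a single feature. It is not the SKICAT method: the `extendedness` feature, its values, and the helper names `fit_stump` and `predict` are assumptions made for illustration, and real classifiers use many features and deeper trees.

```python
# Minimal sketch: a one-level decision tree (a "stump") for star/galaxy
# separation. Feature values and labels are synthetic, for illustration only.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def majority(labels):
    """Most frequent label in a non-empty list."""
    return max(set(labels), key=labels.count)

def fit_stump(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Assumes the feature takes at least two distinct values.
    """
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, threshold, majority(left), majority(right))
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label

def predict(stump, value):
    """Classify one feature value with a fitted stump."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label

# Hypothetical "extendedness" feature: small for point-like sources,
# larger for extended ones.
extendedness = [0.02, 0.05, 0.08, 0.10, 0.45, 0.60, 0.75, 0.90]
labels = ["star", "star", "star", "star",
          "galaxy", "galaxy", "galaxy", "galaxy"]

stump = fit_stump(extendedness, labels)
print(stump)
print(predict(stump, 0.07), predict(stump, 0.70))
```

A full decision tree applies the same split search recursively to each side of the threshold until the leaves are pure or a depth limit is reached.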
Astro-informatics
Problems in astronomy increasingly require the use of machine learning and data mining techniques:
Detection of spurious objects
Object classification and clustering
Mining a single astronomical survey
A survey is defined by the wavelength of the light used, the depth of the images, and their angular resolution. Data is available in two forms: images and a catalog. The original data obtained from the telescope are images; after some processing, a catalog is produced that holds information about every object in the images. Within the survey, the catalog is more important than the images.
Issues in astronomy
Compression (e.g. galaxy images, spectra)
Classification (e.g. stars, galaxies, or gamma-ray bursts)
Reconstruction (e.g. blurred galaxy images, mass distribution from weak gravitational lensing)
Feature extraction (e.g. signature features of stars, galaxies, and quasars)
Parameter estimation (e.g. star parameter measurement, photometric redshift prediction, cosmological parameters)
Model selection (e.g. are there 0, 1, 2, … planets around the star, or is a cosmological model with non-zero neutrino mass more favorable?)
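Parameter estimation, one of the issues listed above, comes down to fitting the free parameters of a model to data. The simplest case is a straight line fitted by ordinary least squares, sketched below. The `colour` and `redshift` numbers are synthetic placeholders and `fit_line` is a hypothetical helper; real photometric redshift estimation uses several bands and non-linear models.

```python
# Minimal sketch: ordinary least squares for a straight line y = a + b*x.
# The "colour" and "redshift" values are synthetic placeholders.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

colour = [0.2, 0.4, 0.6, 0.8, 1.0]
redshift = [0.05, 0.11, 0.14, 0.21, 0.24]

intercept, slope = fit_line(colour, redshift)
print(intercept, slope)

# Use the fitted parameters to predict the value for a new input.
predicted = intercept + slope * 0.5
print(predicted)
```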
Science requirements for data mining
Cross-identification: the classical problem of associating the source list of one database with the source list of another.
Cross-correlation: searching for correlations, tendencies, and trends between physical parameters in multi-dimensional data.
Nearest-neighbor identification: a general application of clustering algorithms in multi-dimensional parameter space, usually within a single database.
Systematic data exploration: applying a broad range of event-based and relationship-based queries to a database in the hope of discovering new objects or a new class of objects.
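Cross-identification can be sketched as a positional match: for each source in one catalog, find the nearest source in the other catalog within a search radius. The example below is a brute-force illustration under assumed inputs. The catalog entries, the 2 arcsecond radius, and the function names are hypothetical, and catalogs of survey size would need a spatial index rather than a double loop.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula).

    All inputs are in degrees.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    d_ra = ra2 - ra1
    d_dec = dec2 - dec1
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(catalog_a, catalog_b, radius_arcsec=2.0):
    """Map each source id in catalog_a to the id of the nearest catalog_b
    source within the radius, or to None if there is no such source."""
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for id_a, ra_a, dec_a in catalog_a:
        best_id, best_sep = None, radius_deg
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best_id, best_sep = id_b, sep
        matches[id_a] = best_id
    return matches

# Synthetic catalogs: (id, right ascension in degrees, declination in degrees)
catalog_a = [("A1", 150.0000, 2.2000),
             ("A2", 150.1000, 2.3000),
             ("A3", 151.0000, -1.0000)]
catalog_b = [("B1", 150.0002, 2.2001),
             ("B2", 150.1000, 2.3004),
             ("B3", 10.0000, 45.0000)]

matches = cross_match(catalog_a, catalog_b)
print(matches)
```

The same nearest-neighbor search, applied in a space of physical parameters instead of sky coordinates, is the basis of the nearest-neighbor identification requirement.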
KDD
Knowledge discovery in databases (KDD) is the automatic extraction of non-obvious, hidden knowledge from large volumes of data. Data mining (DM) is the core of the knowledge discovery process. The KDD process involves:
Data mining object
Primary tasks of data mining:
Classification (learning a description of several predefined classes and classifying a data item into one of them)
Regression (mapping a data item to a real-valued quantity)
Clustering (identifying a finite set of clusters or categories in the data)
Deviation and change detection (discovering the most significant changes in the data)
Dependency modeling (finding a model that describes significant dependencies between the variables)
Summarization (finding a compact description of the data)
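As an illustration of the clustering task, the following is a minimal sketch of k-means (Lloyd's algorithm) on two-dimensional points. The points and starting centers are synthetic, the function name `kmeans` is an assumption, and the number of clusters is taken as known in advance, which is rarely true for real catalogs.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm for 2-D points, starting from the given centers.

    Returns the final centers and the list of points assigned to each.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    (sum(m[0] for m in members) / len(members),
                     sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters

# Synthetic points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]

centers, clusters = kmeans(points, centers=[(0.0, 0.0), (6.0, 6.0)])
print(centers)
print([len(c) for c in clusters])
```

Deviation detection can reuse the result: a point that lies far from every cluster center is a candidate anomaly.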
Machine learning and data mining techniques will continue to prove useful for astronomical databases.