Machine learning astronomical structure

Published in: Technology

TO STUDY MACHINE LEARNING APPROACH FOR CLASSIFICATION OF ASTRONOMICAL STRUCTURE

Sky Image Cataloging and Analysis Tool (SKICAT) to Classify New Astronomical Structures

Introduction:

In astronomy and space sciences, we currently face a data glut crisis. The problem of dealing with the huge volume of data accumulated from a variety of sources, of correlating the data, and of extracting and visualizing the important trends is now fully recognized. This problem will become more acute very rapidly with the advent of new telescopes, detectors, and space missions, with the data flux measured in terabytes. We face a critical need for information processing technology and methodology with which to manage this data avalanche in order to produce interesting scientific results quickly and efficiently. Developments in the fields of machine learning and AI can provide at least some solutions. Much of the future of scientific information processing lies in the implementation of these methods.

We present an application of supervised classification to the automation of the tasks of cataloging and analyzing objects in digitized sky images. The Sky Image Cataloging and Analysis Tool (SKICAT) was developed for use on the images resulting from the 2nd Palomar Observatory Sky Survey (POSS-II) conducted by the California Institute of Technology (Caltech). The photographic plates collected from the survey are digitized at the Space Telescope Science Institute. This process will result in about 3,000 digital images of 23,040 x 23,040 16-bit pixels each, totalling over 3 terabytes of data. When complete, the survey will cover the entire northern sky in three colors, detecting virtually every sky object down to a B magnitude of 22. This is at least one magnitude fainter than previous comparable photographic surveys. We estimate that on the order of 10^7 galaxies and 10^9 stellar objects (including over 10^5 quasars) are detectable in this survey.
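The quoted data volume can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (3,000 images, 23,040 pixels per side, 16 bits per pixel); treating a terabyte as 10^12 bytes is an assumption made here, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the survey size quoted in the text.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
NUM_IMAGES = 3_000          # "about 3,000 digital images"
PIXELS_PER_SIDE = 23_040    # each image is 23,040 x 23,040 pixels
BYTES_PER_PIXEL = 2         # 16-bit pixels

bytes_per_image = PIXELS_PER_SIDE ** 2 * BYTES_PER_PIXEL
total_bytes = NUM_IMAGES * bytes_per_image

print(f"per image: {bytes_per_image / 1e9:.2f} GB")
print(f"survey:    {total_bytes / 1e12:.2f} TB")
```

Under these assumptions each digitized plate comes to roughly 1.06 GB and the whole survey to roughly 3.19 TB, which agrees with the "over 3 terabytes" stated above.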
This data set will be the most comprehensive large-scale imaging survey produced to date and will not be surpassed in scope until the completion of a fully digital all-sky survey. The purpose of SKICAT is to enable and maximize the extraction of meaningful information from such a large database in a timely manner. The system is built in a modular way, incorporating several existing algorithms and packages.

SKICAT:

  • Manual analysis
  • Removal of noise
  • Image segmentation
  • Feature extraction
  • Classification

Classification method: decision tree classifier

Evaluation:

  • Much faster than manual classification
  • Also classifies very faint celestial objects

There are three basic functional components to SKICAT, serving the purposes of sky object:

  • Catalog construction
  • Catalog management
  • High-level statistical and scientific analysis

SKICAT is based on state-of-the-art machine learning, high-performance database, and image processing techniques (FOCUS).

The core of the new system consists of two integrated machine learning algorithms. These algorithms automatically produce decision trees from training data, or examples, provided by astronomers. In other words, a machine learning program learns to classify new data based on training data provided by human experts.

SKICAT has a correct sky object classification rate of about 94%, which exceeds the performance requirement of 90% needed for accurate scientific analysis of the data. The best performance of a commercially available learning algorithm was about 75%. By training the learning algorithms to predict classes for faint astronomical objects on the survey plates, the algorithms can learn to classify objects that are actually too faint for humans to recognize. The training data for faint objects was obtained from a limited set of charge-coupled device (CCD) images taken at a much higher resolution than the survey images.

The SKICAT system will produce a comprehensive survey catalog database containing about one-half billion entries by automatically processing about three terabytes (24 trillion bits, at 8 bits to a byte) of image data. Because SKICAT can classify sky objects that are too faint for humans to recognize, the SKICAT catalog will contain a wealth of new information not obtainable using traditional cataloging methods.

We are currently classifying objects into four major categories:

  • star (s)
  • star with fuzz (sf)
  • galaxy (g)
  • artifact (long)

Decision Tree Algorithms Used:

  • ID3
  • GID3*
  • O-BTree
  • Ruler System

ID3:

ID3 starts by placing all the training examples at the root node of the tree. An attribute is selected to partition the data: for each value of the attribute a branch is created, and the subset of examples that have the attribute value specified by the branch is moved to the newly created child node. The algorithm is applied recursively to each child node until either all examples at a node are of one class, or all the examples at that node have the same values for all the attributes. Every leaf in the decision tree represents a classification rule.

GID3*:

We generalised the ID3 algorithm so that it does not necessarily branch on each value of the chosen attribute. GID3* can branch on arbitrary individual values of an attribute and "lump" the rest of the values in a single default branch. Unlike the other branches of the tree, each of which represents a single value, the default branch represents a subset of the values of an attribute. Unnecessary subdivision of the data may thus be reduced. To do this, GID3* utilizes a vector distance measure applied to the class vectors of an example partition, in conjunction with the entropy measure, to create for each attribute a phantom attribute that has only a subset of the original attribute's values.

O-BTree:

The O-BTree algorithm [Fayy92b] was designed to overcome problems with the information entropy selection measure itself. O-BTree creates strictly binary trees and utilizes a measure from a family of measures (C-SEP) that detects class separation rather than class impurity. Information entropy is a member of the class of impurity measures. O-BTree employs an orthogonality measure rather than entropy for branching.

Ruler System:

PERFORMANCE MEASURE:
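As a rough illustration of the ID3 procedure described earlier, and of a classification rate like the one quoted for SKICAT, the following is a minimal sketch and not the SKICAT implementation. It assumes categorical attributes and entropy-based information gain as the selection measure; the attribute names ("core", "halo"), the toy examples, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning on one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Grow a tree recursively; leaves are class labels."""
    pure = len(set(labels)) == 1
    # True when no remaining attribute can separate the examples
    # (also true when no attributes are left).
    identical = all(row[a] == rows[0][a] for row in rows for a in attrs)
    if pure or identical:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "branches": {}}
    # One branch per observed value of the chosen attribute.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx],
                                      remaining)
    return node


def classify(tree, row, default=None):
    """Follow branches until a leaf; unseen values yield `default`."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], default)
    return tree


def classification_rate(tree, rows, labels):
    """Fraction of examples whose predicted class matches the label."""
    hits = sum(classify(tree, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy training set; real SKICAT features are not given here.
rows = [
    {"core": "sharp", "halo": "none"},
    {"core": "sharp", "halo": "faint"},
    {"core": "diffuse", "halo": "faint"},
    {"core": "diffuse", "halo": "none"},
    {"core": "sharp", "halo": "none"},
]
labels = ["s", "sf", "g", "g", "s"]

tree = id3(rows, labels, ["core", "halo"])
print(tree)
print(classification_rate(tree, rows, labels))
```

On this toy set the rate is computed on the training examples themselves, so it says nothing about generalization; a figure comparable to the one quoted for SKICAT would require held-out examples labelled independently, as with the CCD images mentioned in the text.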
