SlideShare a Scribd company logo
1 of 16
Download to read offline
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
DOI : 10.5121/ijnsa.2012.4213 163
IDENTIFICATION OF IMAGE SPAM BY
USING LOW LEVEL & METADATA
FEATURES
Anand Gupta1
, Chhavi Singhal2
and Somya Aggarwal1
1
Department of Computer Engineering,
2
Department of Electronic and Communication Engineering
Netaji Subhas Institute of Technology, New Delhi, India
{omaranand, chhavisinghal28, somya3322}@gmail.com
ABSTRACT
Spammers are constantly evolving new spam technologies, the latest of which is image spam. Till now
research in spam image identification has been addressed by considering properties like colour, size,
compressibility, entropy, content etc. However, we feel the methods of identification so evolved have
certain limitations due to embedded obfuscation like complex backgrounds, compression artifacts and
wide variety of fonts and formats .To overcome these limitations, we have proposed 2
methodologies(however there can be more). Each methodology has 4 stages. Both the methodologies are
almost similar except in the second stage where methodology I extracts low level features while the other
extracts metadata features. Also a comparison between both the methodologies is shown. The method
works on images with and without noise separately. Colour properties of the images are altered so that
OCR (Optical Character Recognition) can easily read the text embedded in the image. The proposed
methods are tested on a dataset of 1984 spam images and are found to be effective in identifying all types
of spam images having (1) only text, (2) only images or (3) both text and images. The encouraging
experimental results show that the methodology I achieves an accuracy of 92% while the other achieves
an accuracy of 93.3%.
KEYWORDS
Low level feature, anti obfuscation technique, noise & entropy
1. INTRODUCTION
Image spam is a kind of spam in e-mail where the message text of the spam is presented as an
image file. Anti spam filters label an e-mail (with image attached) as spam if they find
suspicious text embedded in that image. For that, the filters employ OCR that reads text
embedded in images. It works by measuring the geometry in images, searching for shapes that
match the shapes of letters, then translating a matched geometric shape into real text. To defeat
OCR, spammers upset the geometry of letters enough—by altering colours, for example—so
that OCR can't "see" a letter, even though the human eye easily recognize it. To overcome this
falsity, low level and metadata features of images are extracted as they are effective against
randomly added noises and simple translational shift of the images. We now review the prior
significant work in the area of image spam identification.
2. PRIOR WORK
Till now spam identification has been carried out by considering the following spam image
properties.
1. Content (C) 2. Metadata features (M) 3. Low level features (L) 4. Text region (T)
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
164
The following matrix shows the properties as used in the previous works. Whereas the left most
column shows the reference numbers, the top most row shows the properties employed as given
above. Numerals ‘1’ and ‘0’ mean that the given property is used and not used respectively.
In [1], a scheme is proposed which implements a spam filter based on the text in the subject and
body fields of e-mails, and the text embedded into attached images. The conventional document
processing steps (tokenization, indexing and classification) are improved upon in [1] by
employing text extraction using OCR from attached images. SpamAssassin (SA)[2] is a widely
deployed filter program that uses OCR software to pull words out of images and then uses the
text based traditional methods to filter spam. This happens to be an improvement of the earlier.
In [3][13], a spam filtering technique has been proposed that uses image information (metadata
features) such as file size, area, compressibility etc., and states a characteristics that appears for
each information entity.
On the contrary, [4] identifies spam using a probabilistic boosting tree based on global image
features (low level features), i.e. colour and gradient orientation histograms. In the year 2010, a
feature extraction scheme that concentrates on both low-level and metadata features is proposed
in [5].It does not rely on extracting the text.
In [6], a mechanism is proposed to ascertain spam embedded main body e-mail file, called as
Partial Image Spam Inspector (PIMSI). The significant feature of this method is that it evaluates
both low level features and metadata features to confirm whether a mail is spam or not. It
analyses spam images by dividing it into 2 databases. Database of object image spam consists of
images and its properties such as RGB colours, contrast and brightness. In the database of Vocal
Spam, all keywords of the advertised spam images are recorded and are compared with the text
extracted using OCR.
To overcome the shortcomings of the methods mentioned above, a spam identification model
has been proposed in [7] which does not exploit low level features and OCR to extract text from
images. Instead, it identifies spam by using the visual-BOW (VBOW) based duplicate image
detection and statistical language model. Computation-efficient edge-detection method is used
to locate possible text regions, and then text coverage rate in an image is calculated. Text region
in a large majority of normal image is less than 15%, while text region in most spam is larger
than such a threshold.
2.1. Motivation
The following drawbacks in prior related works have motivated us to develop a method that
mitigates them.
Nowadays, spammers use different image processing technologies to vary the properties of
individual messages e.g. by changing the foreground colours, backgrounds, font types or even
rotating and adding artifacts to the images. Thus, they pose great challenges to conventional
spam filters.
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
165
[4][6][5][8] use colour histograms to distinguish spam images from normal images. Colour
histograms of natural images tend to be continuous, while the colour histograms of artificial
spam images tend to have some isolated peaks. We point out however that the discriminating
capability of the above feature is not likely to be satisfactory, since colour distribution is solely
dependent on the format of the image. Figure 1 shows a sample image and the difference
between its colour histograms when saved with different formats (jpeg and gif) is illustrated in
Figures 2 and 3.
Figure 1. Original Image
Figure 2. Colour histogram of fig 1 in jpeg format
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
166
Figure 3. Colour histogram of fig 1 in gif format
[3][5][6] use metadata features. Different images can have similar (even same) metadata
features. This technology of image spam detection may be wrong and has low accuracy rate.
After carrying out experiments we have found that 58.65% of spam images and 44.28% of
normal images are smaller than 10KB. It implies that metadata features can be similar for both
kind of images. Hence it is not a reliable method to distinguish between spam images from
normal images.
The method mentioned in [7] has helped achieve significant results in identifying spam images
which contain only text. However, few spam images contain both text and images. Figures 4
and 5 show that edge detection is not able to distinguish between text and images. Also edge
detection will detect noise and treat it as text.
3. SYSTEM ARCHITECTURE
We have proposed two methodologies based on low level features and entropy. Figure 6 depicts
the basic framework of the two methodologies employed.
Figure 4. Original Image Figure 5. Edge detection of fig 4
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
167
Figure 6. Basic framework of the two methodologies
3.1. Methodology I
Figure 7 shows the system architecture of methodology I.
Figure 7. Methodology I
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
168
3.1.1. Stage I – Identification of noise
Canny’s edge detection [9] is used to identify noise in images. Images with noise are stored in
set A and images without noise are stored in set B. Edge detection highlights even the slightest
of noise added in an image. Images in Set A are more likely to be spam images. Set A is further
classified into databases, namely -(A1) Dots & Dashes (A2) Lines. This classification is done
on the basis of type of noise usually found in spam images. Figure 9 shows a sample image for
the noise found in Figure 8. They also display the difficulty to identify noise without edge
detection.
Figure 8. Image with noise
Figure 9. Edge detection shows line in fig 8
3.1.2. Stage II – Extraction of low level features
In this stage, all the input images pass through Intensity Plotter [9], which plots the variation of
intensity along a line segment or a multiline path of an image. Since spam images are artificially
generated, we expect their low level features to be different from those of images typically
included as attachments to personal e-mails. The plots thus obtained do not change on varying
the format of the image (from gif to jpeg or vice versa). Stage II has two sub stages. Images
with noise are classified into two sets, S and C. The classification is based on the difference in
the shape of the plots obtained. Figures 10 and 11 show the intensity plots of normal and spam
images respectively. Herein X and Y are two element vectors specifying X and Y data of the
image. Images common to both Set B and Set C are labelled as normal images and are not
processed further. Images in set S are directly passed to stage IV. Images with noise are
classified (as mentioned above) on the basis of plots obtained in two sets, S1 and C1. Images in
set S1 and C1 are passed to stage III.
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
169
Figure 10. Intensity plot for normal images
Figure 11. Intensity plot for spam images
3.1.3. Stage III – Removal of noise
Spammers add noise in images so that it becomes difficult for OCR to read the embedded text.
To overcome this difficulty, obfuscation techniques are applied. Therefore, only images with
noise are passed through this stage. In this technique we alter the RGB properties of the images.
Images thus obtained are stored in set D. Figures 13 and 14 show a comparison between the text
identified by OCR in Figure 12 before and after stage III. Table 1 shows a comparison between
the words identified by OCR before and after stage III.
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
170
Figure 12. Original Image
DEMDEAG ERANDE ING
ClTC:DMGB,PI( —~,I,._-
Teda•,r's Breaking news sent shares up +122% in
just a few minutes, revelutienaryr new preduet in
eenstruetien. r___..F-—— u_
Huge PR campaign is under wa•,r, eemhined with
teda•,r's news there's ne telling where this stuck is
geing te end up, ——
GET IN MDNDAX FEB 12*'* and EXPERIENCE A TRUE
-*"' WINNER
REAL CDMPANX, REAL RE*.•'DLLITIDNAR'f
PRDDUCT
AND REAL LDNG TERM PDTENTIAL
Ti CLIRRENT PRICE: $3.DD "“——
5 DAT EXPECTED: $15.DD
_`_"‘•-T.ADD DMGB TD XDUR RADAR, •,reu went
regret it
Figure 13. Text identified in Figure 12 before applying anti-obfuscation technique
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
171
DEMOBAG BRANDS INC
OTC:DMGB.PK
Today's Breaking news sent shares up +122% in
Just a few minutes, revolutionary new product in
construction.
Huge PR campaign is under way, combined with
today's news there's no telling where this stock is
going to end up,
GET I,N MO.NDAY FEB 12W and EXPERIENCE A TRUE
WINNER
REAL COMPANY, REAL REVOLUTIONARY PRODUCT
AND REALLONG TERMPOTENTIAL
CURRENT PRICE : $3.00
5 DAY EXPECTED : $15.00
ADD DMGB TO YOUR RADAR, you wont regret it
Figure 14. Text identified in Figure 12 after applying anti-obfuscation technique
Table 1. Comparison between the words identified by OCR before and after applying
stage III on Figure 12
Word in image with
noise
Words as read by OCR
before stage III
Words as read by
OCR after stage III
DEMOBAG DEMDEAG DEMOBAG
OTC:DMGB.PK C1TC:DMGB,PIC--,I,- OTC:DMGB.PK
Today’s Teda.,r’s Today’s
Revolutionary Revelutienaryr revolutionary
Construction Eenstruction construction
Product Preduet product
Blank R-----f-----u-
Going Geing going
Way Wa.r way
Stock stuck stock
To te to
MONDAY MDNDAX MO.NDAY
3.1.4. Stage IV – Content Extraction using OCR
Input to this stage comprises of images in set S and D. Images are passed through OCR, which
identifies embedded text and compares it with a list of keywords. If text identified matches with
the list of spam words, then the image is labelled as spam image else it is a normal image. If
OCR fails, then the graphs plotted in stage II are considered. Images in sets S1 and S are
labelled as spam images. Images in set C1 are labelled as normal images.
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
172
3.2. Methodology II
In this methodology a score system is employed in which scores are allotted to each image on
the basis of their results obtained after each stage. Images with a score of 2 or more are
classified as spam images and the rest are classified as normal images.
3.2.1. Stage I – Identification of noise
Canny’s edge detection [9] is used to identify noise in images. Images with noise are more
likely to be spam images hence images with noise are allotted a score of 1 while images without
noise are allotted a score of 0.
Figure 15. Stage I
3.2.1. Stage II – Calculation of entropy
In this stage, entropy of all the images is calculated. ENTROPY returns a scalar value
representing the entropy of an intensity image. Entropy is a statistical measure of randomness
that can be used to characterize the texture of the input image. Entropy is defined as –
sum(p.*log2(p)) where p contains the histogram counts returned from IMHIST. In this stage
score has been allotted to each image in accordance with the value of entropy and its extension.
The basis for the 4 different categories has been explained in the experiments.
Figure 16. Stage II
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
173
3.2.3. Stage III – Removal of noise
Only images with noise are passed through this stage. In this technique we alter the RGB
properties of the images. Images thus obtained are further passed to stage IV.
3.2.4. Stage IV – Content Extraction using OCR
Images are passed through OCR, which identifies embedded text and compares it with a list of
keywords. The images are allotted a score of 1 or 0 depending upon whether text identified
matches or doesn’t match with the list of words respectively.
Figure 17. Stage IV
4. EXPERIMENTS AND RESULTS
We have collected two sets of images to test the filter: spam images and normal images. The
dataset consist of spam images in gif and jpeg format [12]. A set of 1984 images are taken, out
of which 802 are normal images and 1182 are spam images. Experiments are performed on a
system with the following specifications: 32- bit operating system, 2.40 Ghz processor and 4 Gb
RAM. Matlab version 7.7 is used. It employs image processing techniques like Canny‘s edge
detection, intensity plotter and entropy calculator. For applying anti obfuscation techniques, we
make use of an online editor [11]. Free OCR V3 is used to extract the content of the spam
image. The experiment has shown that our technique is effective in identifying spam images in
any format. According to the experimental results, detection rate of methodology I is 0.92, false
positive rate is 0.0064 and false negative rate is 0.059 by calculation. Detection rate of
methodology II is 93.3%. Entropy of 96% of normal images is above 4. The following are the
results.
4.1. Result I
Figure 18. shows a comparison between the execution time of both the methodologies. Average
execution time of methodology I is 0.174 seconds and the average execution time of
methodology II is 0.06 seconds. However, methodology 2 is more time efficient than the first
methodology but it is unable to give correct results when the extension of the image is changed.
When we changed the extension of image from jpeg to gif we found out that the entropy falls by
3 units on an average thus decreasing its detection rate.
Intensity plotter on the other hand is totally independent of the extension of the image and hence
does not show any such variation.
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
174
Figure 18. Comparison between execution time of methodology I and methodology II.
4.2. Result II
Figures 19 and 20 show the efficiency of spam and normal images which were correctly
identified by using methodology I. We have tabulated the results by classifying hams and spam
images into the following categories.
Spam Images: Advertisement, URL (Uniform Resource Locator), Pornography, Stock
Normal Images: Only Text, Text and Images, Only Images.
4.2.1. Discussion
Hams with only images are easiest to identify because the range of colour components used is
quite vast. Their intensity plots are curved and continuous. Their plots are easily distinguishable
Figure 19. Histogram showing
efficiency of spam images
Figure 20. Histogram showing
efficiency of normal images
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
175
from those of spam images. Hams with only text consist mostly of survey forms, documents and
newspaper articles. Due to the use of limited colours (mostly black and white) intensity plots of
these images may be similar to that of spam images. Hence their efficiency is less than that of
hams with only images. However hams with only text have efficiency higher than that of hams
with both text and images because the former can be easily read by OCR .The presence of
colourful images in the background of the latter makes it difficult for OCR to read the
embedded text. Spam images concerning Advertisements are made attractive by adding
colourful images and text so that they look like real advertisements. Hence, they are often
confused with hams with both text and images and are toughest to identify.
4.3. Result III
We have taken five samples of a single image and increased its brightness from 20% to 80%.
Table 2 shows that OCR depends on the brightness of images. Tick (✓) shows images that
OCR could read, cross (×) shows images that OCR could not read and P shows images that
OCR could partially read.
Table 2. Effectiveness of OCR on varying brightness of images
Type of images 20% 35% 50% 65% 80%
URL Spam × ✓ ✓ ✓ ×
Stock Spam × ✓ ✓ P ×
Vulgar Spam × ✓ ✓ P ×
Advertisement Spam × ✓ P P ×
Noise Spam × ✓ ✓ ✓ ×
Ham with only text ✓ ✓ ✓ ✓ ✓
4.3.1. Discussion
OCR reads successfully the embedded text in an image if its brightness is confined to a
particular range. However, it fails if brightness of an image deviates from a particular threshold
having both upper and lower limit.
4.4. Result IV
We have varied the background colour of an image keeping its font colour constant. We have
set the font colour at [R:255 G:0 B:0] and decreased the background colour from [R:255 G:240
B:240] to [R:255 G:0 B:0] in sets of 30 keeping R constant. Figure 21 shows that the detection
rate of OCR decreases as background colour approaches font colour. Black dots ( ) show that
OCR has failed. Red dots ( ) show that OCR has been successful. The largest circle is [R:255
G:0 B:0].
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
176
Figure 21. Detection rate of OCR decreases as background colour approaches font colour.
4.4.1. Discussion
OCR reads successfully the embedded text in an image if there exists a dissimilarity between
font colour and background colour. As font colour approaches background colour it becomes
difficult for OCR to read text.
4.5. Result V
We have artificially added 'salt & pepper' noise in spam image, and have gradually increased the
amount of noise added from 0.005 to 0.020. On increasing noise, distortion in the graphs
obtained by intensity plotter also increases. Also entropy of normal image decreases by 20%
approximately as amount of noise added increases.
4.5.1. Discussion
4.5.1. Discussion
Intensity plot of artificial spam images tends to be straight. Intensity plotter gives correct plot
for spam images till a particular amount of added noise, above that it gives a distorted plot for
spam images.
Figure 22. Intensity plot (noise is less) Figure 23. Intensity plot (noise is more)
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
177
5. CONCLUSION
In this paper we present two methodologies for addressing image spam problems, however
more methodologies can be proposed. It takes into account some of the recent evolutions of the
spammers tricks in which obfuscation techniques are used to the extent that standard OCR tools
become nearly ineffective. Methodology I extracts low level features while methodology II
extracts metadata features of the image. Both the methodologies have their advantages and
disadvantages. Experiments reveal that time taken by methodology II is less compared to
methodology I, but entropy is not independent of the extension of image and shows a decreases
in value when extension of the image changes from jpg to gif. However, intensity plotter is
independent of extension and is effective in distinguishing between spam and normal images.
Anti-obfuscation techniques are applied to improve text filtering performance of the system.
According to the experimental results, we have concluded that the detection rate depends on the
type of spam images, i.e. whether it contains only text, text and images or images only.
Spammers add noise and changes brightness of an image upto an extent only because then it
hampers the clarity of the images and the user is unable to read the text.
REFERENCES
[1] G. Fumera, I. Pillai and F. Roli, “Spam Filtering Based On The Analysis Of Text Information
Embedded Into Image”, Journal of Machine Learning Research (JMLR), vol. 7, pp. 2699-2720,
December 2006.
[2] Apache.org (2011). The apache spamassassin project. Webpage (last accessed May 3, 2011):
http://spamassassin.apache.org/index.html.
[3] M. Uemura and T. Tabata, “Design and Evaluation of a Bayesian-filter-based Image Spam
Filtering Method”, in Proceedings of the 2nd
International Conference on Information Security
and Assurance (ISA '08), pp. 46-51, Busan, Korea, 24-26 April 2008.
[4] G. Yan, Y. Ming, Z. Xiaonan, B. Pardo, W. Ying, T. N. Pappas and A. Choudhary, ” Image
Spam Hunter”, in Proceeding of the International Conference on Acoustics, Speech and Signal
Processing (ICASSP '08), pp. 1765-1768, Las Vegas, Nevada, USA, 30 March- 4 April 2008.
[5] C. Wang, F. Zhang, F. Li and Q. Liu, “Image Spam Classification based on Low Level Image
Features”, in Proceeding of the 8th
International Conference on Communications, Circuits and
Systems (ICCCAS '10), pp. 290-293, Chengdu China, 28-30 July 2010.
[6] P. Klangpraphant, and P. Bhattarakosol, “PIMSI: A Partial Image Spam Inspector”, in
Proceeding of the 5th
International Conference on Future Information Technology (FutureTech),
pp. 1-6, Busan, South Korea, 21-23 May 2010.
[7] J. H. Hsia and M. S. Chen, “Language-Model-based Detection Cascade for Efficient
Classification of Image-based Spam e-mail”, in Proceeding of the International Conference on
Multimedia and Expo (ICME '09), pp. 1182-1185, New York USA, 28 June-3 July 2009.
[8] M. Soranamageswari and C. Meena, “Statistical Feature Extraction for Classification of Image
Spam Using Artificial Neural Networks”, in Proceeding of the 2nd
International Conference on
Machine Learning and Computing (ICMLC '10), pp. 101-105, Bangalore India, 9-11 February
2010.
[9] Mathworks The Matlab image processing toolbox.M, Available at
http://www.mathworks.com/access/helpdesk/help/toolbox/images/.. downloaded on july 10
[10] Bag of Visual words Model: Recognizing Object Categories, Available at
http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf.
[11] Image editor, http://www.lunapic.com/editor/?action=contrast, downloaded on july 10,2011
International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012
178
[12] Image spam dataset, http://www.cs.jhu.edu/~mdredze/datasets/image_spam/, downloaded on
june 3,2011.
[13] Bhaskar Mehta, Saurabh Nangia, Manish Gupta, “Detecting Image Spam using Visual Features
and Near Duplicate Detection”, in Proceeding of the International World Wide Web Conference
Committee (IW3C2), April 21-25, 2008.

More Related Content

What's hot

Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR TransformText Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR TransformIOSR Journals
 
Off-line English Character Recognition: A Comparative Survey
Off-line English Character Recognition: A Comparative SurveyOff-line English Character Recognition: A Comparative Survey
Off-line English Character Recognition: A Comparative Surveyidescitation
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Text extraction from images
Text extraction from imagesText extraction from images
Text extraction from imagesGarby Baby
 
Representation and recognition of handwirten digits using deformable templates
Representation and recognition of handwirten digits using deformable templatesRepresentation and recognition of handwirten digits using deformable templates
Representation and recognition of handwirten digits using deformable templatesAhmed Abd-Elwasaa
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a surveyeSAT Journals
 
IRJET- Detection and Recognition of Text for Dusty Image using Long Short...
IRJET-  	  Detection and Recognition of Text for Dusty Image using Long Short...IRJET-  	  Detection and Recognition of Text for Dusty Image using Long Short...
IRJET- Detection and Recognition of Text for Dusty Image using Long Short...IRJET Journal
 
A Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosA Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosCSCJournals
 
Text Extraction from Image using Python
Text Extraction from Image using PythonText Extraction from Image using Python
Text Extraction from Image using Pythonijtsrd
 
Ocr accuracy improvement on
Ocr accuracy improvement onOcr accuracy improvement on
Ocr accuracy improvement onsipij
 
Text extraction from natural scene image, a survey
Text extraction from natural scene image, a surveyText extraction from natural scene image, a survey
Text extraction from natural scene image, a surveySOYEON KIM
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Divya Gera
 
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...Editor IJMTER
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHarshana Madusanka Jayamaha
 

What's hot (16)

Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR TransformText Extraction of Colour Images using Mathematical Morphology & HAAR Transform
Text Extraction of Colour Images using Mathematical Morphology & HAAR Transform
 
Off-line English Character Recognition: A Comparative Survey
Off-line English Character Recognition: A Comparative SurveyOff-line English Character Recognition: A Comparative Survey
Off-line English Character Recognition: A Comparative Survey
 
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Text extraction from images
Text extraction from imagesText extraction from images
Text extraction from images
 
Representation and recognition of handwirten digits using deformable templates
Representation and recognition of handwirten digits using deformable templatesRepresentation and recognition of handwirten digits using deformable templates
Representation and recognition of handwirten digits using deformable templates
 
Inpainting scheme for text in video a survey
Inpainting scheme for text in video   a surveyInpainting scheme for text in video   a survey
Inpainting scheme for text in video a survey
 
IRJET- Detection and Recognition of Text for Dusty Image using Long Short...
IRJET-  	  Detection and Recognition of Text for Dusty Image using Long Short...IRJET-  	  Detection and Recognition of Text for Dusty Image using Long Short...
IRJET- Detection and Recognition of Text for Dusty Image using Long Short...
 
A Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosA Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In Videos
 
Text Extraction from Image using Python
Text Extraction from Image using PythonText Extraction from Image using Python
Text Extraction from Image using Python
 
Ocr accuracy improvement on
Ocr accuracy improvement onOcr accuracy improvement on
Ocr accuracy improvement on
 
Text extraction from natural scene image, a survey
Text extraction from natural scene image, a surveyText extraction from natural scene image, a survey
Text extraction from natural scene image, a survey
 
E1803012329
E1803012329E1803012329
E1803012329
 
Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...Texture features based text extraction from images using DWT and K-means clus...
Texture features based text extraction from images using DWT and K-means clus...
 
C1803011419
C1803011419C1803011419
C1803011419
 
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
CHARACTER RECOGNITION USING NEURAL NETWORK WITHOUT FEATURE EXTRACTION FOR KAN...
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural network
 

Similar to IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES

Spam image email filtering using K-NN and SVM
Spam image email filtering using K-NN and SVMSpam image email filtering using K-NN and SVM
Spam image email filtering using K-NN and SVMIJECEIAES
 
Blank Background Image Lossless Compression Technique
Blank Background Image Lossless Compression TechniqueBlank Background Image Lossless Compression Technique
Blank Background Image Lossless Compression TechniqueCSCJournals
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfSachin414679
 
IRJET- Real-Time Text Reader for English Language
IRJET- Real-Time Text Reader for English LanguageIRJET- Real-Time Text Reader for English Language
IRJET- Real-Time Text Reader for English LanguageIRJET Journal
 
On Text Realization Image Steganography
On Text Realization Image SteganographyOn Text Realization Image Steganography
On Text Realization Image SteganographyCSCJournals
 
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSB
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSBON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSB
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSBijcsit
 
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...Mohammad Liton Hossain
 
A Survey Paper on Character Recognition
A Survey Paper on Character RecognitionA Survey Paper on Character Recognition
A Survey Paper on Character Recognitionijsrd.com
 
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICES
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICESA HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICES
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICESijcsit
 
A Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosA Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosCSCJournals
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
IRJET- Face Spoof Detection using Machine Learning with Colour Features
IRJET-  	  Face Spoof Detection using Machine Learning with Colour FeaturesIRJET-  	  Face Spoof Detection using Machine Learning with Colour Features
IRJET- Face Spoof Detection using Machine Learning with Colour FeaturesIRJET Journal
 
Two Methods for Recognition of Hand Written Farsi Characters
Two Methods for Recognition of Hand Written Farsi CharactersTwo Methods for Recognition of Hand Written Farsi Characters
Two Methods for Recognition of Hand Written Farsi CharactersCSCJournals
 
Ijarcet vol-2-issue-3-1078-1080
Ijarcet vol-2-issue-3-1078-1080Ijarcet vol-2-issue-3-1078-1080
Ijarcet vol-2-issue-3-1078-1080Editor IJARCET
 
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...Frequency based edge-texture feature using Otsu’s based enhanced local ternar...
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...journalBEEI
 

Similar to IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES (20)

E017322833
E017322833E017322833
E017322833
 
Spam image email filtering using K-NN and SVM
Spam image email filtering using K-NN and SVMSpam image email filtering using K-NN and SVM
Spam image email filtering using K-NN and SVM
 
Blank Background Image Lossless Compression Technique
Blank Background Image Lossless Compression TechniqueBlank Background Image Lossless Compression Technique
Blank Background Image Lossless Compression Technique
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
 
Sub1586
Sub1586Sub1586
Sub1586
 
IRJET- Real-Time Text Reader for English Language
IRJET- Real-Time Text Reader for English LanguageIRJET- Real-Time Text Reader for English Language
IRJET- Real-Time Text Reader for English Language
 
On Text Realization Image Steganography
On Text Realization Image SteganographyOn Text Realization Image Steganography
On Text Realization Image Steganography
 
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSB
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSBON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSB
ON THE IMAGE QUALITY AND ENCODING TIMES OF LSB, MSB AND COMBINED LSB-MSB
 
A Survey of Image Based Steganography
A Survey of Image Based SteganographyA Survey of Image Based Steganography
A Survey of Image Based Steganography
 
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
DEVELOPMENT OF AN ALPHABETIC CHARACTER RECOGNITION SYSTEM USING MATLAB FOR BA...
 
A Survey Paper on Character Recognition
A Survey Paper on Character RecognitionA Survey Paper on Character Recognition
A Survey Paper on Character Recognition
 
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICES
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICESA HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICES
A HYBRID COPY-MOVE FORGERY DETECTION TECHNIQUE USING REGIONAL SIMILARITY INDICES
 
A Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In VideosA Survey On Thresholding Operators of Text Extraction In Videos
A Survey On Thresholding Operators of Text Extraction In Videos
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
IRJET- Face Spoof Detection using Machine Learning with Colour Features
IRJET-  	  Face Spoof Detection using Machine Learning with Colour FeaturesIRJET-  	  Face Spoof Detection using Machine Learning with Colour Features
IRJET- Face Spoof Detection using Machine Learning with Colour Features
 
Two Methods for Recognition of Hand Written Farsi Characters
Two Methods for Recognition of Hand Written Farsi CharactersTwo Methods for Recognition of Hand Written Farsi Characters
Two Methods for Recognition of Hand Written Farsi Characters
 
40120140501009
4012014050100940120140501009
40120140501009
 
Ijarcet vol-2-issue-3-1078-1080
Ijarcet vol-2-issue-3-1078-1080Ijarcet vol-2-issue-3-1078-1080
Ijarcet vol-2-issue-3-1078-1080
 
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...Frequency based edge-texture feature using Otsu’s based enhanced local ternar...
Frequency based edge-texture feature using Otsu’s based enhanced local ternar...
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 

Recently uploaded

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 

Recently uploaded (20)

Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 

IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES

  • 1. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 DOI : 10.5121/ijnsa.2012.4213 163 IDENTIFICATION OF IMAGE SPAM BY USING LOW LEVEL & METADATA FEATURES Anand Gupta1 , Chhavi Singhal2 and Somya Aggarwal1 1 Department of Computer Engineering, 2 Department of Electronic and Communication Engineering Netaji Subhas Institute of Technology, New Delhi, India {omaranand, chhavisinghal28, somya3322}@gmail.com ABSTRACT Spammers are constantly evolving new spam technologies, the latest of which is image spam. Till now research in spam image identification has been addressed by considering properties like colour, size, compressibility, entropy, content etc. However, we feel the methods of identification so evolved have certain limitations due to embedded obfuscation like complex backgrounds, compression artifacts and wide variety of fonts and formats .To overcome these limitations, we have proposed 2 methodologies(however there can be more). Each methodology has 4 stages. Both the methodologies are almost similar except in the second stage where methodology I extracts low level features while the other extracts metadata features. Also a comparison between both the methodologies is shown. The method works on images with and without noise separately. Colour properties of the images are altered so that OCR (Optical Character Recognition) can easily read the text embedded in the image. The proposed methods are tested on a dataset of 1984 spam images and are found to be effective in identifying all types of spam images having (1) only text, (2) only images or (3) both text and images. The encouraging experimental results show that the methodology I achieves an accuracy of 92% while the other achieves an accuracy of 93.3%. KEYWORDS Low level feature, anti obfuscation technique, noise & entropy 1. INTRODUCTION Image spam is a kind of spam in e-mail where the message text of the spam is presented as an image file. Anti spam filters label an e-mail (with image attached) as spam if they find suspicious text embedded in that image. For that, the filters employ OCR that reads text embedded in images. It works by measuring the geometry in images, searching for shapes that match the shapes of letters, then translating a matched geometric shape into real text. To defeat OCR, spammers upset the geometry of letters enough—by altering colours, for example—so that OCR can't "see" a letter, even though the human eye easily recognize it. To overcome this falsity, low level and metadata features of images are extracted as they are effective against randomly added noises and simple translational shift of the images. We now review the prior significant work in the area of image spam identification. 2. PRIOR WORK Till now spam identification has been carried out by considering the following spam image properties. 1. Content (C) 2. Metadata features (M) 3. Low level features (L) 4. Text region (T)
  • 2. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 164 The following matrix shows the properties as used in the previous works. Whereas the left most column shows the reference numbers, the top most row shows the properties employed as given above. Numerals ‘1’ and ‘0’ mean that the given property is used and not used respectively. In [1], a scheme is proposed which implements a spam filter based on the text in the subject and body fields of e-mails, and the text embedded into attached images. The conventional document processing steps (tokenization, indexing and classification) are improved upon in [1] by employing text extraction using OCR from attached images. SpamAssassin (SA)[2] is a widely deployed filter program that uses OCR software to pull words out of images and then uses the text based traditional methods to filter spam. This happens to be an improvement of the earlier. In [3][13], a spam filtering technique has been proposed that uses image information (metadata features) such as file size, area, compressibility etc., and states a characteristics that appears for each information entity. On the contrary, [4] identifies spam using a probabilistic boosting tree based on global image features (low level features), i.e. colour and gradient orientation histograms. In the year 2010, a feature extraction scheme that concentrates on both low-level and metadata features is proposed in [5].It does not rely on extracting the text. In [6], a mechanism is proposed to ascertain spam embedded main body e-mail file, called as Partial Image Spam Inspector (PIMSI). The significant feature of this method is that it evaluates both low level features and metadata features to confirm whether a mail is spam or not. It analyses spam images by dividing it into 2 databases. Database of object image spam consists of images and its properties such as RGB colours, contrast and brightness. In the database of Vocal Spam, all keywords of the advertised spam images are recorded and are compared with the text extracted using OCR. To overcome the shortcomings of the methods mentioned above, a spam identification model has been proposed in [7] which does not exploit low level features and OCR to extract text from images. Instead, it identifies spam by using the visual-BOW (VBOW) based duplicate image detection and statistical language model. Computation-efficient edge-detection method is used to locate possible text regions, and then text coverage rate in an image is calculated. Text region in a large majority of normal image is less than 15%, while text region in most spam is larger than such a threshold. 2.1. Motivation The following drawbacks in prior related works have motivated us to develop a method that mitigates them. Nowadays, spammers use different image processing technologies to vary the properties of individual messages e.g. by changing the foreground colours, backgrounds, font types or even rotating and adding artifacts to the images. Thus, they pose great challenges to conventional spam filters.
  • 3. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 165 [4][6][5][8] use colour histograms to distinguish spam images from normal images. Colour histograms of natural images tend to be continuous, while the colour histograms of artificial spam images tend to have some isolated peaks. We point out however that the discriminating capability of the above feature is not likely to be satisfactory, since colour distribution is solely dependent on the format of the image. Figure 1 shows a sample image and the difference between its colour histograms when saved with different formats (jpeg and gif) is illustrated in Figures 2 and 3. Figure 1. Original Image Figure 2. Colour histogram of fig 1 in jpeg format
  • 4. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 166 Figure 3. Colour histogram of fig 1 in gif format [3][5][6] use metadata features. Different images can have similar (even same) metadata features. This technology of image spam detection may be wrong and has low accuracy rate. After carrying out experiments we have found that 58.65% of spam images and 44.28% of normal images are smaller than 10KB. It implies that metadata features can be similar for both kind of images. Hence it is not a reliable method to distinguish between spam images from normal images. The method mentioned in [7] has helped achieve significant results in identifying spam images which contain only text. However, few spam images contain both text and images. Figures 4 and 5 show that edge detection is not able to distinguish between text and images. Also edge detection will detect noise and treat it as text. 3. SYSTEM ARCHITECTURE We have proposed two methodologies based on low level features and entropy. Figure 6 depicts the basic framework of the two methodologies employed. Figure 4. Original Image Figure 5. Edge detection of fig 4
  • 5. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 167 Figure 6. Basic framework of the two methodologies 3.1. Methodology I Figure 7 shows the system architecture of methodology I. Figure 7. Methodology I
  • 6. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 168 3.1.1. Stage I – Identification of noise Canny’s edge detection [9] is used to identify noise in images. Images with noise are stored in set A and images without noise are stored in set B. Edge detection highlights even the slightest of noise added in an image. Images in Set A are more likely to be spam images. Set A is further classified into databases, namely -(A1) Dots & Dashes (A2) Lines. This classification is done on the basis of type of noise usually found in spam images. Figure 9 shows a sample image for the noise found in Figure 8. They also display the difficulty to identify noise without edge detection. Figure 8. Image with noise Figure 9. Edge detection shows line in fig 8 3.1.2. Stage II – Extraction of low level features In this stage, all the input images pass through Intensity Plotter [9], which plots the variation of intensity along a line segment or a multiline path of an image. Since spam images are artificially generated, we expect their low level features to be different from those of images typically included as attachments to personal e-mails. The plots thus obtained do not change on varying the format of the image (from gif to jpeg or vice versa). Stage II has two sub stages. Images with noise are classified into two sets, S and C. The classification is based on the difference in the shape of the plots obtained. Figures 10 and 11 show the intensity plots of normal and spam images respectively. Herein X and Y are two element vectors specifying X and Y data of the image. Images common to both Set B and Set C are labelled as normal images and are not processed further. Images in set S are directly passed to stage IV. Images with noise are classified (as mentioned above) on the basis of plots obtained in two sets, S1 and C1. Images in set S1 and C1 are passed to stage III.
  • 7. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 169 Figure 10. Intensity plot for normal images Figure 11. Intensity plot for spam images 3.1.3. Stage III – Removal of noise Spammers add noise in images so that it becomes difficult for OCR to read the embedded text. To overcome this difficulty, obfuscation techniques are applied. Therefore, only images with noise are passed through this stage. In this technique we alter the RGB properties of the images. Images thus obtained are stored in set D. Figures 13 and 14 show a comparison between the text identified by OCR in Figure 12 before and after stage III. Table 1 shows a comparison between the words identified by OCR before and after stage III.
  • 8. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 170 Figure 12. Original Image DEMDEAG ERANDE ING ClTC:DMGB,PI( —~,I,._- Teda•,r's Breaking news sent shares up +122% in just a few minutes, revelutienaryr new preduet in eenstruetien. r___..F-—— u_ Huge PR campaign is under wa•,r, eemhined with teda•,r's news there's ne telling where this stuck is geing te end up, —— GET IN MDNDAX FEB 12*'* and EXPERIENCE A TRUE -*"' WINNER REAL CDMPANX, REAL RE*.•'DLLITIDNAR'f PRDDUCT AND REAL LDNG TERM PDTENTIAL Ti CLIRRENT PRICE: $3.DD "“—— 5 DAT EXPECTED: $15.DD _`_"‘•-T.ADD DMGB TD XDUR RADAR, •,reu went regret it Figure 13. Text identified in Figure 12 before applying anti-obfuscation technique
  • 9. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 171 DEMOBAG BRANDS INC OTC:DMGB.PK Today's Breaking news sent shares up +122% in Just a few minutes, revolutionary new product in construction. Huge PR campaign is under way, combined with today's news there's no telling where this stock is going to end up, GET I,N MO.NDAY FEB 12W and EXPERIENCE A TRUE WINNER REAL COMPANY, REAL REVOLUTIONARY PRODUCT AND REALLONG TERMPOTENTIAL CURRENT PRICE : $3.00 5 DAY EXPECTED : $15.00 ADD DMGB TO YOUR RADAR, you wont regret it Figure 14. Text identified in Figure 12 after applying anti-obfuscation technique Table 1. Comparison between the words identified by OCR before and after applying stage III on Figure 12 Word in image with noise Words as read by OCR before stage III Words as read by OCR after stage III DEMOBAG DEMDEAG DEMOBAG OTC:DMGB.PK C1TC:DMGB,PIC--,I,- OTC:DMGB.PK Today’s Teda.,r’s Today’s Revolutionary Revelutienaryr revolutionary Construction Eenstruction construction Product Preduet product Blank R-----f-----u- Going Geing going Way Wa.r way Stock stuck stock To te to MONDAY MDNDAX MO.NDAY 3.1.4. Stage IV – Content Extraction using OCR Input to this stage comprises of images in set S and D. Images are passed through OCR, which identifies embedded text and compares it with a list of keywords. If text identified matches with the list of spam words, then the image is labelled as spam image else it is a normal image. If OCR fails, then the graphs plotted in stage II are considered. Images in sets S1 and S are labelled as spam images. Images in set C1 are labelled as normal images.
  • 10. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 172 3.2. Methodology II In this methodology a score system is employed in which scores are allotted to each image on the basis of their results obtained after each stage. Images with a score of 2 or more are classified as spam images and the rest are classified as normal images. 3.2.1. Stage I – Identification of noise Canny’s edge detection [9] is used to identify noise in images. Images with noise are more likely to be spam images hence images with noise are allotted a score of 1 while images without noise are allotted a score of 0. Figure 15. Stage I 3.2.1. Stage II – Calculation of entropy In this stage, entropy of all the images is calculated. ENTROPY returns a scalar value representing the entropy of an intensity image. Entropy is a statistical measure of randomness that can be used to characterize the texture of the input image. Entropy is defined as – sum(p.*log2(p)) where p contains the histogram counts returned from IMHIST. In this stage score has been allotted to each image in accordance with the value of entropy and its extension. The basis for the 4 different categories has been explained in the experiments. Figure 16. Stage II
  • 11. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 173 3.2.3. Stage III – Removal of noise Only images with noise are passed through this stage. In this technique we alter the RGB properties of the images. Images thus obtained are further passed to stage IV. 3.2.4. Stage IV – Content Extraction using OCR Images are passed through OCR, which identifies embedded text and compares it with a list of keywords. The images are allotted a score of 1 or 0 depending upon whether text identified matches or doesn’t match with the list of words respectively. Figure 17. Stage IV 4. EXPERIMENTS AND RESULTS We have collected two sets of images to test the filter: spam images and normal images. The dataset consist of spam images in gif and jpeg format [12]. A set of 1984 images are taken, out of which 802 are normal images and 1182 are spam images. Experiments are performed on a system with the following specifications: 32- bit operating system, 2.40 Ghz processor and 4 Gb RAM. Matlab version 7.7 is used. It employs image processing techniques like Canny‘s edge detection, intensity plotter and entropy calculator. For applying anti obfuscation techniques, we make use of an online editor [11]. Free OCR V3 is used to extract the content of the spam image. The experiment has shown that our technique is effective in identifying spam images in any format. According to the experimental results, detection rate of methodology I is 0.92, false positive rate is 0.0064 and false negative rate is 0.059 by calculation. Detection rate of methodology II is 93.3%. Entropy of 96% of normal images is above 4. The following are the results. 4.1. Result I Figure 18. shows a comparison between the execution time of both the methodologies. Average execution time of methodology I is 0.174 seconds and the average execution time of methodology II is 0.06 seconds. However, methodology 2 is more time efficient than the first methodology but it is unable to give correct results when the extension of the image is changed. When we changed the extension of image from jpeg to gif we found out that the entropy falls by 3 units on an average thus decreasing its detection rate. Intensity plotter on the other hand is totally independent of the extension of the image and hence does not show any such variation.
  • 12. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 174 Figure 18. Comparison between execution time of methodology I and methodology II. 4.2. Result II Figures 19 and 20 show the efficiency of spam and normal images which were correctly identified by using methodology I. We have tabulated the results by classifying hams and spam images into the following categories. Spam Images: Advertisement, URL (Uniform Resource Locator), Pornography, Stock Normal Images: Only Text, Text and Images, Only Images. 4.2.1. Discussion Hams with only images are easiest to identify because the range of colour components used is quite vast. Their intensity plots are curved and continuous. Their plots are easily distinguishable Figure 19. Histogram showing efficiency of spam images Figure 20. Histogram showing efficiency of normal images
  • 13. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 175 from those of spam images. Hams with only text consist mostly of survey forms, documents and newspaper articles. Due to the use of limited colours (mostly black and white) intensity plots of these images may be similar to that of spam images. Hence their efficiency is less than that of hams with only images. However hams with only text have efficiency higher than that of hams with both text and images because the former can be easily read by OCR .The presence of colourful images in the background of the latter makes it difficult for OCR to read the embedded text. Spam images concerning Advertisements are made attractive by adding colourful images and text so that they look like real advertisements. Hence, they are often confused with hams with both text and images and are toughest to identify. 4.3. Result III We have taken five samples of a single image and increased its brightness from 20% to 80%. Table 2 shows that OCR depends on the brightness of images. Tick (✓) shows images that OCR could read, cross (×) shows images that OCR could not read and P shows images that OCR could partially read. Table 2. Effectiveness of OCR on varying brightness of images Type of images 20% 35% 50% 65% 80% URL Spam × ✓ ✓ ✓ × Stock Spam × ✓ ✓ P × Vulgar Spam × ✓ ✓ P × Advertisement Spam × ✓ P P × Noise Spam × ✓ ✓ ✓ × Ham with only text ✓ ✓ ✓ ✓ ✓ 4.3.1. Discussion OCR reads successfully the embedded text in an image if its brightness is confined to a particular range. However, it fails if brightness of an image deviates from a particular threshold having both upper and lower limit. 4.4. Result IV We have varied the background colour of an image keeping its font colour constant. We have set the font colour at [R:255 G:0 B:0] and decreased the background colour from [R:255 G:240 B:240] to [R:255 G:0 B:0] in sets of 30 keeping R constant. Figure 21 shows that the detection rate of OCR decreases as background colour approaches font colour. Black dots ( ) show that OCR has failed. Red dots ( ) show that OCR has been successful. The largest circle is [R:255 G:0 B:0].
  • 14. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 176 Figure 21. Detection rate of OCR decreases as background colour approaches font colour. 4.4.1. Discussion OCR reads successfully the embedded text in an image if there exists a dissimilarity between font colour and background colour. As font colour approaches background colour it becomes difficult for OCR to read text. 4.5. Result V We have artificially added 'salt & pepper' noise in spam image, and have gradually increased the amount of noise added from 0.005 to 0.020. On increasing noise, distortion in the graphs obtained by intensity plotter also increases. Also entropy of normal image decreases by 20% approximately as amount of noise added increases. 4.5.1. Discussion 4.5.1. Discussion Intensity plot of artificial spam images tends to be straight. Intensity plotter gives correct plot for spam images till a particular amount of added noise, above that it gives a distorted plot for spam images. Figure 22. Intensity plot (noise is less) Figure 23. Intensity plot (noise is more)
  • 15. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 177 5. CONCLUSION In this paper we present two methodologies for addressing image spam problems, however more methodologies can be proposed. It takes into account some of the recent evolutions of the spammers tricks in which obfuscation techniques are used to the extent that standard OCR tools become nearly ineffective. Methodology I extracts low level features while methodology II extracts metadata features of the image. Both the methodologies have their advantages and disadvantages. Experiments reveal that time taken by methodology II is less compared to methodology I, but entropy is not independent of the extension of image and shows a decreases in value when extension of the image changes from jpg to gif. However, intensity plotter is independent of extension and is effective in distinguishing between spam and normal images. Anti-obfuscation techniques are applied to improve text filtering performance of the system. According to the experimental results, we have concluded that the detection rate depends on the type of spam images, i.e. whether it contains only text, text and images or images only. Spammers add noise and changes brightness of an image upto an extent only because then it hampers the clarity of the images and the user is unable to read the text. REFERENCES [1] G. Fumera, I. Pillai and F. Roli, “Spam Filtering Based On The Analysis Of Text Information Embedded Into Image”, Journal of Machine Learning Research (JMLR), vol. 7, pp. 2699-2720, December 2006. [2] Apache.org (2011). The apache spamassassin project. Webpage (last accessed May 3, 2011): http://spamassassin.apache.org/index.html. [3] M. Uemura and T. Tabata, “Design and Evaluation of a Bayesian-filter-based Image Spam Filtering Method”, in Proceedings of the 2nd International Conference on Information Security and Assurance (ISA '08), pp. 46-51, Busan, Korea, 24-26 April 2008. [4] G. Yan, Y. Ming, Z. Xiaonan, B. Pardo, W. Ying, T. N. Pappas and A. Choudhary, ” Image Spam Hunter”, in Proceeding of the International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 1765-1768, Las Vegas, Nevada, USA, 30 March- 4 April 2008. [5] C. Wang, F. Zhang, F. Li and Q. Liu, “Image Spam Classification based on Low Level Image Features”, in Proceeding of the 8th International Conference on Communications, Circuits and Systems (ICCCAS '10), pp. 290-293, Chengdu China, 28-30 July 2010. [6] P. Klangpraphant, and P. Bhattarakosol, “PIMSI: A Partial Image Spam Inspector”, in Proceeding of the 5th International Conference on Future Information Technology (FutureTech), pp. 1-6, Busan, South Korea, 21-23 May 2010. [7] J. H. Hsia and M. S. Chen, “Language-Model-based Detection Cascade for Efficient Classification of Image-based Spam e-mail”, in Proceeding of the International Conference on Multimedia and Expo (ICME '09), pp. 1182-1185, New York USA, 28 June-3 July 2009. [8] M. Soranamageswari and C. Meena, “Statistical Feature Extraction for Classification of Image Spam Using Artificial Neural Networks”, in Proceeding of the 2nd International Conference on Machine Learning and Computing (ICMLC '10), pp. 101-105, Bangalore India, 9-11 February 2010. [9] Mathworks The Matlab image processing toolbox.M, Available at http://www.mathworks.com/access/helpdesk/help/toolbox/images/.. downloaded on july 10 [10] Bag of Visual words Model: Recognizing Object Categories, Available at http://www.robots.ox.ac.uk/~az/icvss08_az_bow.pdf. [11] Image editor, http://www.lunapic.com/editor/?action=contrast, downloaded on july 10,2011
  • 16. International Journal of Network Security & Its Applications (IJNSA), Vol.4, No.2, March 2012 178 [12] Image spam dataset, http://www.cs.jhu.edu/~mdredze/datasets/image_spam/, downloaded on june 3,2011. [13] Bhaskar Mehta, Saurabh Nangia, Manish Gupta, “Detecting Image Spam using Visual Features and Near Duplicate Detection”, in Proceeding of the International World Wide Web Conference Committee (IW3C2), April 21-25, 2008.