Group Details:- 
Dhara Shah z3299353 
Imad Hashmi z3193866 
Zuo Cui z3261136 
Our Paper:- Y. Xie, F. Yu, K. Achan, R. Panigrahy, G. Hulten and I. Osipkov, Spamming Botnets: Signatures and Characteristics, in Proceedings of ACM SIGCOMM 2008, pp. 171-182, Seattle, USA, August 2008. 
Is this paper technically sound? 
The paper is based on experiments conducted on three months of data collected from Hotmail's servers. To reproduce similar results we needed the algorithm or rules that the AutoRE software uses to generate regular expressions, and data on which experiments could be conducted. 
To get the details of the software we tried contacting the authors, but unfortunately we did not receive any reply (proof attached in the appendix). We suspect this is because it is Microsoft group research and commercial product details are confidential. Hence we looked at open source spam detection software to understand the working of AutoRE. We could not compare the techniques used by open source spam detection software with those of AutoRE, as we did not have all the details of AutoRE. 
There are a number of spam detection tools available, both commercial and open source, but to our knowledge none of them is based on signatures. The idea in this paper is genuine and novel because other content-based filters do not generate signatures and rely on a complete scan of the email. Following are some of the rules used to identify a spam URL [3]. We discuss URLs only, because AutoRE works with URLs only: 
• Uses a numeric IP address in URL 
• Uses %-escapes inside a URL's hostname 
• Completely unnecessary %-escapes inside a URL 
• Dotted-decimal IP address followed by CGI 
• Uses non-standard port number for HTTP 
• Has Yahoo Redirect URI 
• Contains an URL-encoded hostname (HTTP77) 
• URI contains ".com" in middle 
• URI contains ".com" in middle and end 
• URI contains ".net" or ".org", then ".com" 
• URI hostname has long hexadecimal sequence 
• URI hostname has long non-vowel sequence 
• CGI in .info TLD other than third-level "www" 
• CGI in .biz TLD other than third-level "www"
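A few of the URL rules listed above can be approximated with short checks. The following is only a minimal sketch: the function name `url_flags`, the flag names and the 12-character threshold for a "long" hexadecimal run are our own assumptions, not the actual rule definitions of the tool cited in [3].

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns; thresholds are illustrative assumptions.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
LONG_HEX = re.compile(r"[0-9a-f]{12,}", re.IGNORECASE)


def url_flags(url):
    """Return the names of the toy rules that a URL triggers."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    flags = set()
    # Rule: numeric IP address used as the host.
    if IPV4.fullmatch(host):
        flags.add("numeric_ip")
    # Rule: %-escapes inside the host part of the URL.
    if "%" in parts.netloc:
        flags.add("escaped_host")
    # Rule: non-standard port number for HTTP.
    try:
        port = parts.port
    except ValueError:  # malformed or out-of-range port
        port = None
    if parts.scheme == "http" and port not in (None, 80):
        flags.add("odd_port")
    # Rule: long hexadecimal sequence in the hostname.
    if LONG_HEX.search(host):
        flags.add("long_hex_host")
    # Rule: ".com" appearing in the middle of the hostname.
    if ".com." in host:
        flags.add("com_in_middle")
    return flags


print(url_flags("http://192.168.10.5:8080/cgi-bin/x"))
print(url_flags("http://www.bank.com.example.net/login"))
```

Each check inspects a single URL in isolation, which is the main contrast with a signature that is derived from a whole group of URLs.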
There is a long list of email header criteria which can be applied to identify spam but that is beyond the scope here. 
Next, we tried collecting data from the University's mail server to verify the characteristics of spam emails mentioned in the paper (proof attached in the appendix), but due to the university's security concerns we could not get the data. Hence we redirected our Yahoo, Gmail and Hotmail accounts to our CSE account and accessed the CSE account via the “pine” utility. Pine is a text-based email reader that lets us see detailed email headers. We tried to distinguish the headers of spam emails from those of legitimate emails. However, CSE does not apply anti-spam technology of its own; it relies on the University's server for this. We verified this by observing that all emails coming to CSE are forwarded by the University's server. We also understood that even if a user marks an email as spam, the system does not categorize it as spam until it satisfies the basic property of burstiness. We marked a few legitimate email IDs as spam, but the email server never classified them as spam because they were never sending in bulk. 
Result from Pine is as follows:- 
INFPACM003.services.comms.unsw.edu.au ([149.171.193.26]) (IP doesn't match sender domain) 
(for <dsha472@cse.unsw.edu.au>) By note With Smtp ; 
Fri, 18 Jun 2010 20:23:12 +1000 
Received: from mta156.mail.in.yahoo.com ([203.84.221.168]) by INFPACM003.services.comms.unsw.edu.au with SMTP; 18 Jun 2010 20:02:46 
+1000 
Received: from 68.142.207.198 (HELO web32405.mail.mud.yahoo.com) 
(68.142.207.198) by mta156.mail.in.yahoo.com with SMTP; Fri, 18 Jun 2010 15:53:07 +0530 
Received: (qmail 20395 invoked by uid 60001); 18 Jun 2010 10:23:04 -0000 
Received: from [117.193.43.248] by web32405.mail.mud.yahoo.com via HTTP; Fri 
, 18 Jun 2010 03:23:03 PDT 
Received: From INFPACM001.services.comms.unsw.edu.au ([149.171.193.18]) 
(for <dsha472@cse.unsw.edu.au>) By note With Smtp ; 
Fri, 18 Jun 2010 20:04:32 +1000 
Received: from mta177.mail.in.yahoo.com ([202.86.5.206]) by INFPACM001.services.comms.unsw.edu.au with SMTP; 18 Jun 2010 19:52:33 
+1000 
Received: from 65.54.190.16 (EHLO bay0-omc1-s5.bay0.hotmail.com) 
(65.54.190.16) by mta177.mail.in.yahoo.com with SMTP; Fri, 18 Jun 2010 15:34:22 +0530 
Received: from BL2PRD0102HT003.prod.exchangelabs.com ([65.54.190.61]) by bay0-omc1-s5.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); 
Fri, 18 Jun 2010 03:04:00 -0700 
Received: from BL2PRD0102MB009.prod.exchangelabs.com ([169.254.34.168]) by BL2PRD0102HT003.prod.exchangelabs.com ([169.254.220.82]) with mapi; Fri, 18 
Jun 2010 10:03:59 +0000
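The relay path in headers like the ones above can be pulled out mechanically. The sketch below is an illustration under our own assumptions: the helper names `relay_ips` and `external_ips` are hypothetical, and only bracketed IPv4 addresses are considered. The sample text reuses three of the Received lines shown above, shortened.

```python
import ipaddress
import re

# Matches an IPv4 address written in square brackets, e.g. [203.84.221.168].
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(header_text):
    """Return bracketed IPv4 addresses in the order they appear."""
    return BRACKETED_IP.findall(header_text)


def external_ips(header_text):
    """Drop private, link-local and loopback addresses (internal hops)."""
    result = []
    for text in relay_ips(header_text):
        ip = ipaddress.ip_address(text)
        if not (ip.is_private or ip.is_link_local or ip.is_loopback):
            result.append(text)
    return result


sample = (
    "Received: from mta156.mail.in.yahoo.com ([203.84.221.168]) by "
    "INFPACM003.services.comms.unsw.edu.au with SMTP\n"
    "Received: from BL2PRD0102MB009.prod.exchangelabs.com ([169.254.34.168]) "
    "by BL2PRD0102HT003.prod.exchangelabs.com with mapi\n"
    "Received: from [117.193.43.248] by web32405.mail.mud.yahoo.com via HTTP\n"
)
print(relay_ips(sample))
print(external_ips(sample))
```

Because each relay normally adds its Received line at the top, the last line in a message is usually the hop closest to the sender.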
Are the ideas and results presented in this paper novel? 
In our opinion, the idea of the AutoRE framework is significantly novel. Although some previous works used regular expressions for spam detection based on URLs in the email content, AutoRE is quite different from them, for the reasons below: 
First, AutoRE can automatically generate regular expressions based on the discovered URLs. Currently, manually written regular expressions are required in most detection frameworks. With the rapid growth in the volume of spam, it is becoming increasingly hard, even impossible, to generate regular expressions manually. Borrowing from methods used in worm detection systems (Singh's research [2]), AutoRE generates spam signatures automatically. This technique therefore reduces the human workload and improves the accuracy of the regular expressions. 
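To make the idea of generating a regular expression from a set of URLs concrete, here is a toy sketch. It is not AutoRE's algorithm, whose details were not available to us. It simply keeps the prefix and suffix shared by all URLs and replaces the differing middle part with a length-bounded character class. The function name `generalize` and the example URLs are hypothetical.

```python
import os
import re


def generalize(urls):
    """Build one anchored regex: shared prefix + bounded class + shared suffix."""
    prefix = os.path.commonprefix(urls)
    # Find the shared suffix of what remains after the prefix.
    reversed_rest = [u[len(prefix):][::-1] for u in urls]
    suffix = os.path.commonprefix(reversed_rest)[::-1]
    middles = [u[len(prefix):len(u) - len(suffix)] for u in urls]
    lengths = [len(m) for m in middles]
    # Pick the narrowest character class that covers every middle part.
    if all(re.fullmatch(r"[0-9]+", m) for m in middles):
        char_class = r"\d"
    elif all(re.fullmatch(r"[A-Za-z0-9]+", m) for m in middles):
        char_class = "[A-Za-z0-9]"
    else:
        char_class = r"\S"
    return "^%s%s{%d,%d}%s$" % (
        re.escape(prefix), char_class, min(lengths), max(lengths),
        re.escape(suffix),
    )


urls = [
    "http://deal.example/p/1234.html",
    "http://deal.example/p/98765.html",
    "http://deal.example/p/55.html",
]
pattern = generalize(urls)
print(pattern)
print(bool(re.fullmatch(pattern, "http://deal.example/p/777.html")))
```

The resulting pattern also matches URLs that were never observed, such as the last one above, which a list of complete URLs cannot do.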
Second, AutoRE can predict future domain-agnostic botnets. Most previous research and current detection frameworks target specific individual botnets. For botnets that merely behave similarly, these frameworks cannot detect them automatically; they can only act on the domains of botnets that have already been captured, and are helpless against domains that may appear in the future. AutoRE, however, is able to analyse and group domains that show similar behaviour and then merge domain-specific regular expressions into domain-agnostic regular expressions. AutoRE thereby gains the ability to detect domains with the same behaviour both now and in the future. 
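The step from domain-specific to domain-agnostic signatures can be illustrated with a small sketch. Again, this is our own simplification and not the procedure from the paper: URLs are grouped by the "shape" of their path (digit runs abstracted), and a shape seen on at least `min_domains` different hosts is emitted with a wildcard host. All names and URLs here are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit


def path_shape(url):
    """Escape the URL path and abstract every run of digits to \\d+."""
    path = urlsplit(url).path
    return re.sub(r"\d+", lambda m: r"\d+", re.escape(path))


def domain_agnostic(urls, min_domains=2):
    """Return wildcard-host regexes for path shapes shared across domains."""
    domains_by_shape = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname
        if host is not None:
            domains_by_shape[path_shape(url)].add(host)
    return sorted(
        r"^https?://[^/]+" + shape + "$"
        for shape, domains in domains_by_shape.items()
        if len(domains) >= min_domains
    )


urls = [
    "http://shop-a.example/p/1234.html",
    "http://shop-b.example/p/777.html",
    "http://news.example/about",
]
signatures = domain_agnostic(urls)
print(signatures)
print(bool(re.match(signatures[0], "http://shop-c.example/p/42.html")))
```

The host `shop-c.example` never appears in the input, yet the merged signature matches it, which is the sense in which such a signature can anticipate future domains.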
From these points of view, AutoRE can be considered as an innovative framework in the field of spam detection. 
Are there any weaknesses of this paper that you have not mentioned in your answers to the above questions? 
One of the weaknesses is that AutoRE doesn't deal with proxy URLs. These proxy URLs usually bear no relation to their redirect destination, so it is hard to group them using AutoRE. They could be resolved to their redirect destination, and AutoRE could then check whether that destination address is spam, but following the link is exactly what the spammer wishes. The paper offers no remedy for this situation. Another weakness is that AutoRE cannot detect the growing volume of image spam. The authors could borrow ideas from other image spam detection frameworks (such as Uemura's research [1]), using information about the image, such as URL, file name or size, to improve this framework.
Do you think the results of this paper are of practical significance? 
Even though AutoRE was tested only on randomly sampled Hotmail data, the results are quite compelling. As the authors mention, the regular expression signatures can detect 10 times more spam than previous complete-URL-based signatures, and they significantly reduce the false positive rate in detecting botnet spam and hosts. AutoRE is able to capture an additional 16-18% of the spam that bypassed well-known spam filters (e.g. Spamhaus). Meanwhile, both the transient nature of the attacks and the fact that only a few spam emails are sent by each bot make it difficult for earlier spam filtering frameworks to detect and blacklist individual bots. Hence, AutoRE is a practical aid that helps existing spam filtering frameworks detect spam. Most importantly, AutoRE is also capable of “predicting” future botnets regardless of domain name, and it is also useful for characterising current botnets. 
However, no single framework can be permanently suitable for all kinds of spam. If AutoRE is widely deployed in real time, spam senders will try to find weaknesses in the framework and ways to exploit them to hide spam from AutoRE. Thus, AutoRE needs to be updated frequently to remain effective. 
What is your assessment of the readability, organization and overall presentation of the paper? 
The idea of the paper is well described overall. Readers get a fair idea of what the authors want them to understand as they proceed through the topics. There are, however, a few improvements we deem important. The abstract gives the impression that AutoRE processes the complete email contents, including the body, for signature generation, which is not the case. As the algorithm works only on the URLs inside the email contents, the abstract should mention that this is not a content-based filtering system. Another point we noted is that the focus of the paper seems divided between two different topics: AutoRE and botnet characteristics. Although the paper addresses both topics, they sometimes seem unrelated, as AutoRE generates signatures only from an already received collection of emails. The way these spam emails are sent, and how different botnet characteristics affect that, may be better described in a separate paper with more detail, which could then be referenced here as required by AutoRE. There is a lot of detail associated with topics like dynamic and static IP addresses, email sending behaviours of botnets and traffic correlations. A lot of data and statistics can be collected along these lines for analysis. The paper itself suggests that this is an interesting future direction, because due importance cannot be given to all areas in a single paper.
If you were a reviewer whose recommendation is being sought by the editor of the journal or the conference proceedings on whether or not to publish this paper, what would be your recommendation? 
This is a very important topic and a well-known subject. The authors do not need to explain its importance at length, as a lot of investment is already being made in the field of spam detection. The authors also have a complete working implementation of the algorithm, which has been tested on real-world data. Given the results claimed by the authors, the idea seems to carry a lot of weight, although for unknown reasons the software has not been put into practice. 
The paper is definitely worth publishing in a related conference. The false positive rate of applying AutoRE signatures is significantly lower than that of existing mechanisms, although AutoRE does not cover the complete email contents. 
How can the work presented in this paper be improved? 
The paper tries to solve the very important problem of spam email using a mix of content-based and non-content-based filtering. With a significantly low false positive rate and detection of a high number of spam campaigns, the results are quite impressive. However, we suggest that the work can be improved in a number of ways. 
• Improvement of Signature 
AutoRE generates a signature of the spam campaign, which it applies to emails arriving later to find similarities. This signature creation can be improved in a number of ways. Currently it involves only the URLs inside the email message. This signature generation mechanism is incomplete, since a lot of spam emails do not contain URLs. 
• Handling of Proxy URLs 
The system at the moment does not work with proxy URLs. This means that many different URLs redirecting to a single resource will not be picked up by the signature. This could be solved by building a blacklist database of all domains providing redirection services to spammers. A domain found in multiple subsequent emails is a good candidate for the blacklist database. Spammers would find it hard to register new redirection domains quickly enough to evade such a list.
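The blacklist suggestion can be sketched as a simple count of how many separate emails mention each host. This is only an illustration of the proposal: the function name `blacklist_candidates`, the threshold `min_emails=3` and the example hosts are assumptions, and a real system would also need a way to exclude popular legitimate hosts.

```python
from collections import Counter
from urllib.parse import urlsplit


def blacklist_candidates(emails, min_emails=3):
    """emails: iterable of URL lists, one list per email.

    Returns hosts that appear in at least min_emails different emails.
    """
    seen = Counter()
    for urls in emails:
        # Count each host once per email, however often it is repeated.
        hosts = {urlsplit(u).hostname for u in urls}
        hosts.discard(None)
        seen.update(hosts)
    return sorted(h for h, n in seen.items() if n >= min_emails)


emails = [
    ["http://r.example/a"],
    ["http://r.example/b", "http://ok.example/"],
    ["http://r.example/c", "http://r.example/d"],
]
print(blacklist_candidates(emails))
```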
• Keeping signature up-to-date 
AutoRE works on historic data. Since it generates spam signatures and identifies spam emails based on historic data, keeping those signatures correct and up to date is a big challenge in itself. If a signature becomes outdated, the low false positive rate may change significantly and the system may lose its strength. The paper does not explain anything about this. Having a mechanism to update the signatures would greatly boost the software's performance. 
• Detecting Image spam 
A lot of spam emails today are sent in the form of images. The purpose of using images is to hide email contents from content-based filters. This important case should be handled by filtering systems like AutoRE. One way of doing this is to generate a signature of the image as well. Some basic characteristics like image size, type and dimensions can be recorded inside the email signature to identify similar images in other emails. Advanced image signature algorithms like colour histograms might not be feasible at such a scale, but calculating an image hash might turn out to be useful. 
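A minimal sketch of the image hash idea follows, assuming the attachment is available as raw bytes together with its declared content type. The function name `image_signature` is hypothetical, and image dimensions are left out because reading them requires decoding the image.

```python
import hashlib


def image_signature(data, content_type):
    """Return (type, size in bytes, SHA-256 digest) for an image attachment."""
    return (content_type, len(data), hashlib.sha256(data).hexdigest())


first = image_signature(b"\x89PNG fake bytes", "image/png")
again = image_signature(b"\x89PNG fake bytes", "image/png")
other = image_signature(b"\x89PNG other bytes", "image/png")
print(first == again)  # identical bytes give the same signature
print(first == other)  # any byte change gives a different digest
```

An exact hash only matches byte-identical images, so it would miss images that differ by even one pixel. The type and size fields are cheaper to compare but far less specific.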
• Dependence on Botnet burstiness 
AutoRE relies heavily on the burstiness property of spamming botnets, with the assumption that botnets will be rented for a short time only. This can ultimately result in the generation of a totally incorrect spam signature if botnets start throttling their sending speed. However, this topic remains wide open, because waiting for the right spam emails to use as signature data is not an option. 
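One simple way to see how throttling defeats a burstiness test is to measure the largest share of a campaign's emails that falls inside any single time window. This is our own illustrative measure, not the criterion used in the paper. The function name and the 60-second window are assumptions, and timestamps are plain seconds.

```python
def max_fraction_in_window(timestamps, window_seconds):
    """Largest fraction of timestamps that fit in one window of the given length."""
    ts = sorted(timestamps)
    if not ts:
        return 0.0
    best = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most window_seconds.
        while ts[end] - ts[start] > window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best / len(ts)


bursty = [0, 10, 20, 30, 10000]
throttled = [0, 1000, 2000, 3000, 4000]
print(max_fraction_in_window(bursty, 60))     # 0.8
print(max_fraction_in_window(throttled, 60))  # 0.2
```

The same five emails score 0.8 when sent in a burst and 0.2 when spread out, so a detector that requires a high score would not group the throttled campaign.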
References: 
[1] M. Uemura and T. Tabata, “Design and Evaluation of a Bayesian-filter-based Image Spam Filtering Method”, in 2008 International Conference on Information Security and Assurance, IEEE, 2008. 
[2] S. Singh, C. Estan, G. Varghese, and S. Savage, Automated worm fingerprinting, in OSDI, 2004. 
[3] Apache SpamAssassin
Appendix:- 
Following are the proof of our efforts:- 
1. Letter from the IT Department of UNSW
Email to Microsoft Team:- 
Respected Sir/Madam, I am a student at The University of New South Wales,Sydney,Australia. 
Your paper on "Spamming Botnets: Signatures and Characteristics" is my anchor 
paper for a research study in my course "Advance Computer Networks". 
First of all, I would like to appreciate the manner in which the paper is written, 
It was very interesting and inspiring to go through the paper. 
Secondly I needed a favour from you to help me in research study on your paper. 
I would be highly obliged if you could help in my research study. 
I understand your limitations and would highly appreciate any help you could 
provide me. I am hoping for some kind of pointers to move ahead on my research 
work all, I am expected is to do is try and conduct some experiments on anchor 
paper to understand the topic well and if possible come up with any difficulties 
not mentioned in paper. I would be waiting for your reply eagerly. 
Thanks and Regards, 
Dhara Shah 
Master of Engineering Science specialization Information Technology 
The University of New South Wales Student. 
Inquiry Regarding your paper on "Spamming Botnets : Signatures and Characteristics" 
Dharaben Shah You forwarded this message on 4/13/2010 12:36 AM. 
Sent: 
Tuesday, April 13, 2010 12:34 AM 
To: 
rina@microsoft.com
Our Diary 
Release Date: - 11th March, 2010. 
Read the abstracts of 8 topics each and nominated 2 topics per group member by 17th March, 2010. 
Got the final selected topic by 19th March, 2010. 
Till 28th March we went through the anchor paper thoroughly and wrote a one-page write-up summarising our understanding of the paper. 
On 28th March we decided the approach ahead. We listed the references mentioned in the anchor paper and each of us was assigned 8 of them. Our objective was to find where the references were used in the anchor paper and to write a small summary explaining their use in the anchor paper. The deadline for this work was 4th April. Every Monday we discussed our progress, as it was our lab time. 
Next, we emailed the researchers of our anchor paper and tried to get the code of the software mentioned in our anchor paper. Our efforts were futile, as the software was not available commercially and, it being Microsoft research, details were not revealed to us. Hence we decided to move ahead and gather more literature to find a way to experiment with the anchor paper. 
Till 11th April we had been working on the anchor paper only, as it took us time to understand it and find a way to experiment. From 11th April, for 2 weeks (till 25th April), the following tasks were assigned to the group members: - Dhara - working on the anchor paper and finding a way to conduct experiments on it. Imad – Future work and related work. Zuo – Past and related work. 
Outcome: - Possible areas to explore are creating a botnet and sending emails to test how various mail service providers detect spam email, and demonstrating the difference between regular expression and token conjunction signatures. A 2-page write-up on the key findings of the paper, and future and background work. 
From 25th April to 12th May we worked on the presentation, as our presentation was on 13th May. From 13th May to 20th May we tried getting data from the University mail server and tried setting up a mail server to get data to test the findings. After failing to set up the mail server, from 21st May to 27th May we tried getting University data and setting up a botnet. From 27th May to 25th June we tried collecting data through pine and applying for University data.

IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
IRJET - Detection and Prevention of Phishing Websites using Machine Learning ...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Detecting Phishing using Machine Learning
Detecting Phishing using Machine LearningDetecting Phishing using Machine Learning
Detecting Phishing using Machine Learning
 
Classification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkClassification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social Network
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using Voting
 
Detecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBSDetecting Spambot as an Antispam Technique for Web Internet BBS
Detecting Spambot as an Antispam Technique for Web Internet BBS
 
Robo10 tr
Robo10 trRobo10 tr
Robo10 tr
 
IRJET- Detecting Phishing Websites using Machine Learning
IRJET- Detecting Phishing Websites using Machine LearningIRJET- Detecting Phishing Websites using Machine Learning
IRJET- Detecting Phishing Websites using Machine Learning
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web SpamLow Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam
 
Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam Low Cost Page Quality Factors To Detect Web Spam
Low Cost Page Quality Factors To Detect Web Spam
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine Optimization
 
Captcha Recognition and Robustness Measurement using Image Processing Techniques
Captcha Recognition and Robustness Measurement using Image Processing TechniquesCaptcha Recognition and Robustness Measurement using Image Processing Techniques
Captcha Recognition and Robustness Measurement using Image Processing Techniques
 

NetworkPaperthesis2

  • 1. Group Details:
Dhara Shah z3299353
Imad Hashmi z3193866
Zuo Cui z3261136

Our Paper: Y. Xie, F. Yu, K. Achan, R. Panigrahy, G. Hulten and I. Osipkov, Spamming Botnets: Signatures and Characteristics, in Proceedings of ACM SIGCOMM 2008, pp. 171-182, Seattle, USA, August 2008.

Is this paper technically sound?
The paper is based on experiments conducted on three months of data collected from Hotmail's servers. To reproduce similar results we needed the algorithm or rules that the AutoRE software uses to generate regular expressions, and data on which to run the experiments. To get details of the software we tried contacting the authors, but received no reply (proof attached in the appendix). We suspect this is because it is Microsoft research and the details of a commercial product are confidential. We therefore looked at open source spam detection software to understand how AutoRE might work. We could not compare the techniques used by the open source software with those of AutoRE, as we did not have all the details of AutoRE.

There are a number of spam detection tools available, both commercial and open source, but none of them is based on signatures. The idea in this paper is genuine and novel because other content-based filters do not generate signatures and rely on a complete scan of the email. The following are some of the rules used to identify a spam URL [3]. We discuss URL rules only, because AutoRE works with URLs only:
    - Uses a numeric IP address in URL
    - Uses %-escapes inside a URL's hostname
    - Completely unnecessary %-escapes inside a URL
    - Dotted-decimal IP address followed by CGI
    - Uses non-standard port number for HTTP
    - Has Yahoo Redirect URI
    - Contains a URL-encoded hostname (HTTP77)
    - URI contains ".com" in middle
    - URI contains ".com" in middle and end
    - URI contains ".net" or ".org", then ".com"
    - URI hostname has long hexadecimal sequence
    - URI hostname has long non-vowel sequence
    - CGI in .info TLD other than third-level "www"
    - CGI in .biz TLD other than third-level "www"
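A few of the URL rules listed above can be approximated with short checks. The following is a minimal sketch, not SpamAssassin's actual rule definitions: the function name `url_flags`, the flag names and the length thresholds (16 hexadecimal characters, 7 consonants) are assumptions chosen for illustration.

```python
import re
from urllib.parse import urlsplit


def url_flags(url):
    """Return the names of the illustrative heuristics that fire for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lower-cased by urlsplit, not %-decoded
    flags = []

    # "Uses a numeric IP address in URL"
    if re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", host):
        flags.append("numeric_ip")

    # "Uses %-escapes inside a URL's hostname"
    if "%" in parts.netloc:
        flags.append("escaped_hostname")

    # "Uses non-standard port number for HTTP"
    try:
        port = parts.port
    except ValueError:  # malformed port text
        port = None
    if parts.scheme == "http" and port not in (None, 80):
        flags.append("nonstandard_port")

    # 'URI contains ".com" in middle'
    if ".com." in host:
        flags.append("com_in_middle")

    # "URI hostname has long hexadecimal sequence" (threshold is an assumption)
    if re.search(r"[0-9a-f]{16,}", host):
        flags.append("long_hex_host")

    # "URI hostname has long non-vowel sequence" (threshold is an assumption)
    if re.search(r"[b-df-hj-np-tv-z]{7,}", host):
        flags.append("long_consonant_run")

    return flags


if __name__ == "__main__":
    for u in ("http://www.example.com/index.html",
              "http://192.168.10.5:8080/cgi-bin/x"):
        print(u, url_flags(u))
```

Each check inspects one URL in isolation, which is the contrast with AutoRE drawn above: AutoRE derives a signature from many URLs in a campaign rather than scoring URLs individually.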
  • 2. There is a long list of email header criteria that can be applied to identify spam, but that is beyond the scope here.

Next, we tried collecting data from the University's mail server to verify the characteristics of spam emails described in the paper (proof attached in the appendix), but because of the University's security concerns we could not get the data. Hence we redirected our Yahoo, Gmail and Hotmail accounts to our CSE account and accessed the CSE account via the "pine" utility. Pine is a text-based email reader that lets us see detailed email headers. We tried to distinguish the headers of spam emails from those of legitimate emails. However, CSE does not apply any anti-spam technology of its own; it relies on the University's server for this. We verified this by observing that all emails arriving at CSE are forwarded by the University's server. We also found that even if a user marks an email as spam, the system does not categorize the sender as a spammer until it satisfies the basic property of burstiness. We marked a few legitimate email addresses as spam, but the email server never classified them as spam because they were never sending in bulk.
Result from Pine is as follows:
INFPACM003.services.comms.unsw.edu.au ([149.171.193.26]) (IP doesn't match sender domain) (for <dsha472@cse.unsw.edu.au>) By note With Smtp ; Fri, 18 Jun 2010 20:23:12 +1000
Received: from mta156.mail.in.yahoo.com ([203.84.221.168]) by INFPACM003.services.comms.unsw.edu.au with SMTP; 18 Jun 2010 20:02:46 +1000
Received: from 68.142.207.198 (HELO web32405.mail.mud.yahoo.com) (68.142.207.198) by mta156.mail.in.yahoo.com with SMTP; Fri, 18 Jun 2010 15:53:07 +0530
Received: (qmail 20395 invoked by uid 60001); 18 Jun 2010 10:23:04 -0000
Received: from [117.193.43.248] by web32405.mail.mud.yahoo.com via HTTP; Fri , 18 Jun 2010 03:23:03 PDT

Received: From INFPACM001.services.comms.unsw.edu.au ([149.171.193.18]) (for <dsha472@cse.unsw.edu.au>) By note With Smtp ; Fri, 18 Jun 2010 20:04:32 +1000
Received: from mta177.mail.in.yahoo.com ([202.86.5.206]) by INFPACM001.services.comms.unsw.edu.au with SMTP; 18 Jun 2010 19:52:33 +1000
Received: from 65.54.190.16 (EHLO bay0-omc1-s5.bay0.hotmail.com) (65.54.190.16) by mta177.mail.in.yahoo.com with SMTP; Fri, 18 Jun 2010 15:34:22 +0530
Received: from BL2PRD0102HT003.prod.exchangelabs.com ([65.54.190.61]) by bay0-omc1-s5.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.4675); Fri, 18 Jun 2010 03:04:00 -0700
Received: from BL2PRD0102MB009.prod.exchangelabs.com ([169.254.34.168]) by BL2PRD0102HT003.prod.exchangelabs.com ([169.254.220.82]) with mapi; Fri, 18 Jun 2010 10:03:59 +0000
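The relay path in headers like the ones above can be read mechanically. The sketch below uses Python's standard `email` module on three of the `Received:` lines from the first message; the function name `relay_hops` and the simple regular expressions are assumptions for illustration and will not cover every header variant seen in practice.

```python
import re
from email import message_from_string

# Three hops copied from the Pine output above, one header per line.
RAW = """\
Received: from mta156.mail.in.yahoo.com ([203.84.221.168]) by INFPACM003.services.comms.unsw.edu.au with SMTP; 18 Jun 2010 20:02:46 +1000
Received: from 68.142.207.198 (HELO web32405.mail.mud.yahoo.com) (68.142.207.198) by mta156.mail.in.yahoo.com with SMTP; Fri, 18 Jun 2010 15:53:07 +0530
Received: from [117.193.43.248] by web32405.mail.mud.yahoo.com via HTTP; Fri, 18 Jun 2010 03:23:03 PDT

"""

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def relay_hops(raw_headers):
    """Return (sender_ip, receiving_host) for each Received header, newest first."""
    msg = message_from_string(raw_headers)
    hops = []
    for value in msg.get_all("Received", []):
        # The text before " by " describes the sending side of the hop.
        from_part = value.split(" by ", 1)[0]
        ips = IP_RE.findall(from_part)
        by_match = re.search(r"\bby\s+(\S+)", value)
        hops.append((ips[0] if ips else None,
                     by_match.group(1) if by_match else None))
    return hops


if __name__ == "__main__":
    for sender_ip, receiver in relay_hops(RAW):
        print(sender_ip, "->", receiver)
```

Each relay prepends its own `Received:` line, so the last entry in the list is the hop closest to the original sender.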
  • 3. Are the ideas and results presented in this paper novel?
In our opinion, the idea of the AutoRE framework is significantly novel. Although some previous work used regular expressions for spam detection based on the URLs in email content, AutoRE is quite different from it, for the reasons below.

First, AutoRE can automatically generate regular expressions from the URLs it discovers. Currently, most detection frameworks require hand-written regular expressions. With the rapid growth in the volume of spam, it is becoming increasingly difficult, even impossible, to generate regular expressions manually. Borrowing from worm detection methods (Singh's research [2]), AutoRE generates spam signatures automatically. This reduces human workload and improves the accuracy of the regular expressions.

Second, AutoRE has the capacity to predict future botnets regardless of domain. Most previous research and current detection frameworks target specific individual botnets. They cannot automatically detect other botnets with similar behaviour; they can only act on the domains of botnets that have already been captured, and against possible future domains they are helpless. AutoRE, however, is able to analyse and group domains with similar behaviour and then merge domain-specific regular expressions into domain-agnostic regular expressions. AutoRE thereby gains the ability to detect domains with the same behaviour, both now and in the future. From these points of view, AutoRE can be considered an innovative framework in the field of spam detection.

Are there any weaknesses of this paper that you have not mentioned in your answers to the above questions?
One weakness is that AutoRE does not deal with proxy URLs. These proxy URLs usually have no relevance to their redirect destination, so it is hard to group them using AutoRE. They can be traced to their redirect destination, and AutoRE can then use the destination address to decide whether the email is spam, but following the redirect is exactly what the spammer wants. The paper offers no remedy for this situation. Another weakness is that AutoRE cannot detect the increasingly common image spam. The authors could borrow ideas from image spam detection frameworks (such as Uemura's research [1]) and use image information such as URL, file name or size to improve the framework.
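To make the difference between domain-specific and domain-agnostic signatures concrete, here is a toy sketch. It is not AutoRE's algorithm, whose details we did not have: the token-wise generalization, the function names and the `.example` URLs are all assumptions for illustration, and the sketch ignores query strings and requires every path to have the same token layout.

```python
import re
from urllib.parse import urlsplit


def _generalize(tokens):
    """Keep a token shared by all URLs, else emit a character class with a length range."""
    if len(set(tokens)) == 1:
        return re.escape(tokens[0])
    if all(t.isdigit() for t in tokens):
        cls = "[0-9]"
    elif all(t.isalpha() for t in tokens):
        cls = "[a-zA-Z]"
    elif all(t.isalnum() for t in tokens):
        cls = "[a-zA-Z0-9]"
    else:
        cls = "."  # differing delimiters or empty tokens: fall back to any character
    lengths = [len(t) for t in tokens]
    return f"{cls}{{{min(lengths)},{max(lengths)}}}"


def path_pattern(urls):
    """Generalize URL paths position by position (toy stand-in for signature generation)."""
    # The capturing group keeps the delimiters in the token list.
    token_lists = [re.split(r"([^a-zA-Z0-9]+)", urlsplit(u).path) for u in urls]
    if len({len(t) for t in token_lists}) != 1:
        raise ValueError("paths do not share a token layout")
    return "".join(_generalize(column) for column in zip(*token_lists))


def domain_specific(urls):
    """Regex tied to the single domain that all the given URLs share."""
    hosts = {urlsplit(u).hostname for u in urls}
    if len(hosts) != 1:
        raise ValueError("expected URLs from a single domain")
    return "http://" + re.escape(hosts.pop()) + path_pattern(urls)


def domain_agnostic(urls):
    """Regex that keeps the path shape but accepts any host name."""
    return "http://[^/]+" + path_pattern(urls)


if __name__ == "__main__":
    a = ["http://shop-a.example/p/12345/view.html",
         "http://shop-a.example/p/987/view.html"]
    b = ["http://deals-b.example/p/55555/view.html",
         "http://deals-b.example/p/1234/view.html"]
    print(domain_specific(a))
    print(domain_agnostic(a + b))
```

The domain-specific pattern only matches further URLs on `shop-a.example`, while the domain-agnostic pattern also matches a previously unseen host that uses the same path shape, which is the sense in which such a signature can cover future domains.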
  • 4. Do you think the results of this paper are of practical significance?
Even though AutoRE was tested only on a random sample of Hotmail email, the results are compelling. As the authors mention, the regular expression signatures can detect 10 times more spam than previous signatures based on complete URLs, and they significantly reduce the false positive rate when detecting botnet spam and hosts. AutoRE is able to capture an additional 16-18% of the spam that bypassed well-known spam filters (e.g. Spamhaus). Meanwhile, both the transient nature of the attacks and the fact that each bot sends only a few spam emails make it harder for previous spam filtering frameworks to detect and blacklist individual bots. Hence AutoRE is a practical aid to existing spam filtering frameworks. Most importantly, AutoRE is also capable of "predicting" future botnets regardless of domain name, and it is quite useful for characterizing current botnets.

However, no single framework can be permanently suitable for all kinds of spam. If AutoRE is widely deployed in real time, spam senders will look for weaknesses in the framework and find ways to exploit them so that their spam escapes detection. Thus AutoRE needs to be updated frequently to remain effective.

What is your assessment of the readability, organization and overall presentation of the paper?
The idea of the paper is well described overall. The reader gets a fair idea of what the authors want them to understand as they proceed through the topics. There are, however, a few improvements we consider important. The abstract gives the impression that AutoRE processes the complete email contents, including the body, for signature generation, which is not the case. As the algorithm works only on the URLs inside the email contents, the abstract should state that this is not a content-based filtering system.

Another point we noted is that the focus of the paper seems divided between two different topics: AutoRE and botnet characteristics. Although the paper addresses both topics, they sometimes seem unrelated, as AutoRE generates signatures only from an already received collection of emails. The way these spam emails are sent, and how different botnet characteristics affect that, might be better described in a separate paper with more detail, which could then be referred to here as required by AutoRE. There is a lot of detail associated with topics such as dynamic and static IP addresses, the email sending behaviour of botnets, and traffic correlations. A lot of data and statistics can be collected along these lines for analysis. The paper itself suggests that this is an interesting future direction, because due importance cannot be given to all areas in a single paper.
  • 5. If you were a reviewer whose recommendation is being sought by the editor of the journal or the conference proceedings on whether or not to publish this paper, what would be your recommendation?
This is a very important topic and a well-known subject. The authors do not need to explain its importance at length, as considerable investment is already being made in the field of spam detection. The authors also have a complete working implementation of the algorithm which has been tested on real-world data. Given the results claimed by the authors, the idea seems to carry a lot of weight, although for unknown reasons the software has not been put into practice. The paper is definitely worth publishing in a related conference. The false positive rate of AutoRE signatures is significantly lower than that of existing mechanisms, although AutoRE does not cover the complete email contents.

How can the work presented in this paper be improved?
The paper tries to solve the very important problem of spam email using a mix of content-based and non-content-based filtering. With a significantly low false positive rate and detection of a high number of spam campaigns, the results are quite impressive. However, we suggest that the work can be improved in a number of ways.

    - Improvement of Signature
      AutoRE generates a signature of the spam campaign, which it applies to emails arriving later to find similarities. This signature creation can be improved in a number of ways. Currently it involves only the URLs inside the email message. This signature generation mechanism is incomplete, since a lot of spam emails do not contain URLs.
    - Handling of Proxy URLs
      The system at the moment does not work with proxy URLs. This means that a lot of different URLs redirecting to a single resource will not be picked up by the signature. This can be solved by building a blacklist database of all domains providing redirection services to spammers. A domain found in multiple subsequent emails is a good candidate for the blacklist database. It will not be possible for spammers to quickly register new domains for redirection services.
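The blacklist suggestion for proxy URLs can be sketched as a simple count of how many distinct messages each domain appears in. This is only an illustration of the idea: the function name `blacklist_candidates`, the `min_emails` threshold and the `.example` domains are assumptions, and the input is assumed to be the URLs already extracted from suspected spam.

```python
from collections import Counter
from urllib.parse import urlsplit


def blacklist_candidates(emails, min_emails=3):
    """emails: iterable of URL lists, one list per message.

    Returns the domains that occur in at least `min_emails` different messages.
    """
    seen = Counter()
    for urls in emails:
        # Count each domain once per message so a single link-heavy
        # email cannot push a domain over the threshold on its own.
        domains = {urlsplit(u).hostname for u in urls}
        domains.discard(None)
        seen.update(domains)
    return {domain for domain, count in seen.items() if count >= min_emails}


if __name__ == "__main__":
    sample = [
        ["http://r.example/a1"],
        ["http://r.example/b2", "http://ok.example/"],
        ["http://r.example/c3", "http://r.example/c4"],
    ]
    print(blacklist_candidates(sample))
```

A deployed version would also need a way to confirm that a candidate domain really provides redirection, which this sketch does not attempt.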
  • 6.
    - Keeping signature up-to-date
      AutoRE works on historic data. Since it generates spam signatures and identifies spam emails based on historic data, keeping those signatures correct and up-to-date is itself a big challenge. If a signature expires, the low false positive rate may change significantly and the system may lose its strength. The paper does not explain anything about this. A mechanism to update the signatures would greatly improve the software's performance.
    - Detecting Image spam
      A lot of spam emails today are sent in the form of images. The purpose of using images is to hide email contents from content-based filters. This important case should be dealt with by content-based filtering systems like AutoRE. One way of doing this is to generate a signature of the image as well. Some basic characteristics such as image size, type and dimensions can be recorded inside the email signature to identify similar images in other emails. Advanced image signature algorithms such as colour histograms might not be feasible at such a scale, but calculating an image hash might turn out to be useful.
    - Dependence on Botnet burstiness
      AutoRE relies heavily on the burstiness property of spamming botnets, on the assumption that botnets are rented only for a short time. This could ultimately result in a totally incorrect spam signature if botnets start throttling their sending speed. However, this topic remains wide open, because waiting for the right spam email to use as signature data is not an option.

Reference:
[1] Uemura, M. & Tabata, T. 2008, 'Design and Evaluation of a Bayesian-filter-based Image Spam Filtering Method', 2008 International Conference on Information Security and Assurance, IEEE, 2008.
[2] S. Singh, C. Estan, G. Varghese, and S. Savage. Automated worm fingerprinting. In OSDI, 2004.
[3] Apache SpamAssassin
  • 7. Appendix: The following is the proof of our efforts: 1. Letter from the IT Department of UNSW
  • 8. Email to Microsoft Team:
Respected Sir/Madam,

I am a student at The University of New South Wales,Sydney,Australia. Your paper on "Spamming Botnets: Signatures and Characteristics" is my anchor paper for a research study in my course "Advance Computer Networks". First of all, I would like to appreciate the manner in which the paper is written, It was very interesting and inspiring to go through the paper. Secondly I needed a favour from you to help me in research study on your paper. I would be highly obliged if you could help in my research study. I understand your limitations and would highly appreciate any help you could provide me. I am hoping for some kind of pointers to move ahead on my research work all, I am expected is to do is try and conduct some experiments on anchor paper to understand the topic well and if possible come up with any difficulties not mentioned in paper. I would be waiting for your reply eagerly.

Thanks and Regards,
Dhara Shah
Master of Engineering Science specialization Information Technology
The University of New South Wales Student.

Inquiry Regarding your paper on "Spamming Botnets : Signatures and Characteristics"
Dharaben Shah
You forwarded this message on 4/13/2010 12:36 AM.
Sent: Tuesday, April 13, 2010 12:34 AM
To: rina@microsoft.com
  • 9. Our Diary
Release Date: 11th March, 2010.

We each read the abstracts of 8 topics and nominated 2 topics per group member by 17th March, 2010. We received the final selected topic on 19th March, 2010.

Until 28th March we went through the anchor paper thoroughly and wrote a one-page summary of our understanding of the paper. On 28th March we decided the approach ahead: we listed the references mentioned in the anchor paper and each of us was assigned 8 of them. Our objective was to find where each reference was used in the anchor paper and to write a short summary explaining its use there. The deadline for this work was 4th April. Every Monday we discussed our progress, as that was our lab time.

Next, we emailed the researchers of our anchor paper and tried to obtain the code of the software described in it. Our efforts were futile: the software was not commercially available and, this being Microsoft research, the details were not revealed to us. Hence we decided to move ahead and gather more literature to find a way to experiment with the anchor paper. Until 11th April we worked only on the anchor paper, as it took us time to understand it and to find a way to experiment.

From 11th April, for 2 weeks (until 25th April), the following tasks were assigned to the group members:
    - Dhara: working on the anchor paper and finding a way to conduct experiments on it.
    - Imad: future work and related work.
    - Zuo: past and related work.
Outcome: Possible areas to explore were creating a botnet and sending emails to test various mail service providers and see how they detect spam email, and demonstrating the difference between Regular Expression and Token Conjunction signatures. We also produced a 2-page write-up on the key findings of the paper and on future and background work.

From 25th April to 12th May we worked on the presentation, which was on 13th May.

From 13th May to 20th May we tried to get data from the University mail server and tried to set up a mail server to collect data to test the findings. After failing to set up the mail server, from 21st May to 27th May we tried to get University data and to set up a botnet. From 27th May to 25th June we tried collecting data through pine and applying for University data.