MOBILE DEVICE FORENSICS USING NLP
Presented By:
Ankita Jadhao
Roll no. CSE15S2002
Supervised By:
Dr. A. J. Agrawal
Department of Computer Science & Engineering
Shri Ramdeobaba College of Engineering and Management
Nagpur
8/24/2015
Contents
• Introduction
• Motivation
• Review of Literature
• Issues
• Methodology
• Existing System
• Advantages
• Bibliography
Introduction
• Text mining is also known as Text Data Mining (TDM) and
Knowledge Discovery in Textual Databases (KDT).
• Text mining tasks:
1. Exploratory data analysis
2. Information extraction
3. Text classification
Fig.1 Overview of Process
Introduction
Where Text Mining Is Used
• Biomedical applications
- Identification of biological entities such as protein and gene names, as well as
chemical compounds and drugs.
• Software applications
- Used by firms to automate the mining and analysis processes and to improve their results.
- Software for tracking and monitoring terrorist activities.
• Online media applications
- Give readers a better search experience, which in turn increases site
“stickiness” and revenue.
• Marketing applications
- Analytical customer relationship management (CRM).
Introduction
• Sentiment analysis
- Analysis of movie reviews to estimate how favorable a review is toward a movie.
- Detection of emotions in text, in the related area of affective computing.
• Security applications
- Monitoring and analysis of online plain-text sources such as Internet news and blogs
for national security purposes.
- Detection of criminal activity.
Introduction
• Mobile phones are used to store and transmit personal and corporate
information.
• Mobile phones are used by both criminals and law enforcement.
• Only limited text corpora are available.
• A simple methodology is proposed for feature extraction.
• What is a corpus?
A text corpus (plural: corpora) is a large and structured set of texts.
Motivation
• The number of mobile devices is growing rapidly.
• The average cell phone user sends over 15,000 texts annually.
• The average 18- to 24-year-old sends almost 40,000 text messages
every year.
• Most tools and methodologies merely acquire all supported data
and dump the output to a spreadsheet or HTML report.
• Search hits must be manually examined and noted in a report.
• Problems: 1. Simple keyword searches
2. Limited corpora
Literature Review
• Corpora: A corpus linguistics study of SMS text messaging[3]
-Tagg developed a text message corpus in British English, but an American English
corpus focusing on forensic application is desirable.
- Even for neutral (non-drug-related) text messages, a British English corpus would
skew results because the language differs significantly.
• Integrating Machine Learning into the Forensic Process
Approaches:
1. The digital forensic process can be summarized as preservation, isolation,
correlation, and logging [4].
2. The process begins with acquisition, proceeds to analysis, and concludes with presentation [5].
3. Preservation, extraction, and then interpretation [1].
Literature Review
• Natural Language Processing: Dela Rosa and Ellen
- The ability to detect linguistic patterns is an invaluable tool when applied to text
messaging data.
- NLTK's machine learning algorithms can be trained on a training set and assessed
on a test set to create an experimental model.
- Applied k-nearest neighbor (kNN) and support vector machine (SVM)
algorithms to micro-text classification.
Issues in Mobile Forensics
• Corpora are not available
• Accuracy Problem
• Feature Extraction
• Micro-Text Problem
Overview
•Mobile Device Forensic Extraction
•Text Message Corpus
•Feature Extraction
•Supervised Machine Learning
Mobile Device Forensic Extraction
• Text messages were extracted from the mobile device.
• Administrative access to the device was gained by utilizing the
redsn0w software to “jailbreak” the device.
• The text message database was accessed on the device by
navigating to the default location
• An MD5 hash value was computed for the text message
database file to mathematically verify that the file had not been
altered during the execution of the methodology
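The MD5 integrity check above can be sketched with Python's standard hashlib module. This is a minimal illustration, not the original tooling: it assumes the text message database has already been copied off the device to a local file, and the helper names are hypothetical. The file is read in chunks so a large database does not need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_unchanged(path: Path, baseline_md5: str) -> bool:
    """True if the file's current MD5 still matches the digest taken at acquisition."""
    return md5_of_file(path) == baseline_md5
```

Record the digest immediately after acquisition and call verify_unchanged(path, baseline) again after analysis; a mismatch means the file changed. MD5 is no longer collision-resistant, so examiners often record a SHA-256 digest (hashlib.sha256) alongside it.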
Corpora
1. First, collect the corpus data.
2. Save the text in plain text format.
3. Provide identifying information at the beginning of the text.
4. Carry out any pre-processing of the text.
5. Save the corpus in Extensible Markup Language (XML)
format.
Fig 3 Common Structures for Text Corpora
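Steps 1 to 5 can be sketched with Python's standard xml.etree.ElementTree. This is an illustration under assumptions: the TextMessage dataclass and the build_corpus_root / build_corpus_xml functions are hypothetical, while the element names (corpus_data, text_message, class, subscriber, message_body, timestamp, type) follow the corpus format used in this presentation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class TextMessage:
    label: int       # 1 = drug-related, 0 = neutral
    subscriber: int
    body: str
    timestamp: str   # e.g. "9/4/2012 2:40 PM"
    direction: str   # "Incoming" or "Outgoing"


def build_corpus_root(messages: List[TextMessage]) -> ET.Element:
    """Build a <corpus_data> tree with one <text_message> child per message."""
    root = ET.Element("corpus_data")
    for msg in messages:
        node = ET.SubElement(root, "text_message")
        ET.SubElement(node, "class").text = str(msg.label)
        ET.SubElement(node, "subscriber").text = str(msg.subscriber)
        ET.SubElement(node, "message_body").text = msg.body  # &, < and > are escaped on output
        ET.SubElement(node, "timestamp").text = msg.timestamp
        ET.SubElement(node, "type").text = msg.direction
    return root


def build_corpus_xml(messages: List[TextMessage]) -> str:
    """Serialize the corpus to a string (no XML declaration)."""
    return ET.tostring(build_corpus_root(messages), encoding="unicode")
```

To write a file that starts with an XML declaration, wrap the root in ET.ElementTree(build_corpus_root(messages)) and call .write("corpus.xml", encoding="UTF-8", xml_declaration=True); the file name here is only an example.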
Corpora
<?xml version="1.0" encoding="UTF-8"?>
<corpus_data>
<text_message>
<class>0</class>
<subscriber>1</subscriber>
<message_body>Text Message</message_body>
<timestamp>9/4/2012 2:40 PM</timestamp>
<type>Incoming</type>
</text_message>
</corpus_data>
• The class element indicates whether each text message is drug-related (1)
or neutral (0).
• The data were modified, and additional text messages were developed.
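Reading the corpus back into (message, class) pairs, the shape a supervised classifier needs, might look like the following stdlib-only sketch. The function names are hypothetical, and the sample document mirrors the single-message example above.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple

SAMPLE_CORPUS = """<?xml version="1.0" encoding="UTF-8"?>
<corpus_data>
  <text_message>
    <class>0</class>
    <subscriber>1</subscriber>
    <message_body>Text Message</message_body>
    <timestamp>9/4/2012 2:40 PM</timestamp>
    <type>Incoming</type>
  </text_message>
</corpus_data>"""


def load_labeled_messages(xml_text: str) -> List[Tuple[str, int]]:
    """Return (message_body, class) pairs from a <corpus_data> document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.iter("text_message"):
        label_text = node.findtext("class")
        if label_text is None:
            raise ValueError("text_message without a <class> element")
        pairs.append((node.findtext("message_body", default=""), int(label_text)))
    return pairs


def label_counts(pairs: List[Tuple[str, int]]) -> Counter:
    """How many messages fall into each class (useful for spotting imbalance)."""
    return Counter(label for _, label in pairs)
```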
Information Extraction System
Fig 1 Simple Pipeline Architecture for an Information Extraction System
The system first converts unstructured natural-language sentences into
structured data.
This process of getting meaning from text is called information extraction.
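The pipeline in Fig 1 follows the NLTK book [6]: sentence segmentation, tokenization, part-of-speech tagging, entity detection, and relation detection. The sketch below covers only the first three stages, using regular expressions and a hypothetical toy lexicon in place of a trained tagger so it runs without downloading any models. With NLTK installed, nltk.sent_tokenize, nltk.word_tokenize, and nltk.pos_tag play these roles (they require NLTK's downloadable model data).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon standing in for a trained part-of-speech tagger.
TOY_LEXICON: Dict[str, str] = {
    "we": "PRP", "saw": "VBD", "the": "DT", "yellow": "JJ", "dog": "NN",
}


def segment_sentences(raw: str) -> List[str]:
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Word tokens (optionally with an ASCII apostrophe suffix) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag punctuation as '.', known words from the lexicon, and anything else as NN."""
    tagged = []
    for tok in tokens:
        if re.fullmatch(r"[^\w\s]", tok):
            tagged.append((tok, "."))
        else:
            tagged.append((tok, TOY_LEXICON.get(tok.lower(), "NN")))
    return tagged


def extract(raw: str) -> List[List[Tuple[str, str]]]:
    """Raw text -> sentences -> tokens -> (token, tag) pairs, one list per sentence."""
    return [pos_tag(tokenize(s)) for s in segment_sentences(raw)]
```

Entity and relation detection would then operate on these tagged sentences; the defaulting of unknown words to NN is a crude assumption that a real tagger replaces.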
Information Extraction System
Example:
String: We saw the yellow dog
Fig 2 Segmentation and Labeling at both the Token and Chunk Levels
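Chunk-level labeling of "We saw the yellow dog" can be illustrated with a small noun-phrase chunker. In NLTK this is typically done with nltk.RegexpParser and a grammar such as NP: {<DT>?<JJ>*<NN>}; the stdlib-only sketch below hand-codes an equivalent greedy rule, extended (as an assumption) to treat a lone personal pronoun (PRP) as an NP so that "We" is chunked too. The input tags are supplied by hand.

```python
from typing import List, Tuple, Union

Tagged = Tuple[str, str]             # (token, POS tag)
Chunk = Tuple[str, List[Tagged]]     # ("NP", [tagged tokens...])


def chunk_noun_phrases(tagged: List[Tagged]) -> List[Union[Tagged, Chunk]]:
    """Greedy left-to-right NP chunking: <PRP> | <DT>?<JJ>*<NN.*>."""
    result: List[Union[Tagged, Chunk]] = []
    i, n = 0, len(tagged)
    while i < n:
        if tagged[i][1] == "PRP":                    # a lone pronoun forms an NP
            result.append(("NP", [tagged[i]]))
            i += 1
            continue
        j = i
        if tagged[j][1] == "DT":                     # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":        # any number of adjectives
            j += 1
        if j < n and tagged[j][1].startswith("NN"):  # required noun head
            result.append(("NP", tagged[i:j + 1]))
            i = j + 1
        else:                                        # no match: leave token unchunked
            result.append(tagged[i])
            i += 1
    return result


sentence = [("We", "PRP"), ("saw", "VBD"), ("the", "DT"), ("yellow", "JJ"), ("dog", "NN")]
print(chunk_noun_phrases(sentence))
```

On the example sentence this produces two NP chunks, [We] and [the yellow dog], with "saw" left outside any chunk.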
Feature Extraction
Data Representation
– “Bag of words” is the most commonly used representation: either word counts
or binary (present/absent) values
– “Phrases” can also be used for commonly occurring combinations
of words
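The two bag-of-words variants, and the "phrases" extension, can be sketched with collections.Counter. The tokenizer is a simplifying assumption (lowercase letters, digits and apostrophes), and adjacent word pairs stand in for "phrases".

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text: str, binary: bool = False) -> Dict[str, int]:
    """Word -> count, or word -> 1 when binary=True (presence only)."""
    counts = Counter(tokens(text))
    return {word: 1 for word in counts} if binary else dict(counts)


def bag_of_bigrams(text: str) -> Dict[Tuple[str, str], int]:
    """Counts of adjacent word pairs, a simple stand-in for 'phrases'."""
    toks = tokens(text)
    return dict(Counter(zip(toks, toks[1:])))
```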
There are three aspects of feature extraction:
• Feature construction;
• Feature subset generation (or search strategy);
• Evaluation criterion estimation
Approach for Feature Extraction
• Using a count of known drug-related unigrams as a feature
• NLTK was used to identify bigrams of interest
• An alternate approach:
- Use two-word pairs (bigrams) as features and allow the algorithm to
determine which bigrams are most effective in classifying text
messages as drug-related or neutral.
Example:
1. “After school today let’s go smoke some weed at my house.”
2. “Hey pull that weed in my flower garden when you get
home.”
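The first approach, a count of known drug-related unigrams, can be sketched in a few lines and shows why it struggles on the two example messages: both receive the same score. The three-word lexicon below is a hypothetical stand-in; the actual list of drug-related terms is not given here.

```python
import re
from typing import FrozenSet

# Hypothetical stand-in for a list of known drug-related unigrams.
DRUG_TERMS: FrozenSet[str] = frozenset({"weed", "acid", "pills"})


def drug_term_count(message: str, lexicon: FrozenSet[str] = DRUG_TERMS) -> int:
    """Number of tokens in the message that appear in the drug-term lexicon."""
    words = re.findall(r"[a-z]+", message.lower())
    return sum(1 for word in words if word in lexicon)


examples = [
    "After school today let’s go smoke some weed at my house.",
    "Hey pull that weed in my flower garden when you get home.",
]
# Both messages score 1, so this feature alone cannot separate them.
print([drug_term_count(m) for m in examples])
```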
Approach for Feature Extraction
•While the first text message was drug-related, the second
was neutral and would therefore be a false positive.
•The hypothesis was that drugrelated terms would exist in
frequented bigrams, such as “smoke weed,” “mary jane,” “hit
acid,” “pop pilz,” etc. and that these bigrams would increase
classification accuracy
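Bigram features can be expressed as an NLTK-style featureset, a dict mapping feature names to values. In this sketch the bigram vocabulary is taken from the hypothesis above ("smoke weed", "mary jane", "hit acid", "pop pilz"); the contains(...) naming scheme and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Bigram = Tuple[str, str]

# Bigrams named in the hypothesis; the list itself is illustrative.
BIGRAMS_OF_INTEREST: List[Bigram] = [
    ("smoke", "weed"), ("mary", "jane"), ("hit", "acid"), ("pop", "pilz"),
]


def bigrams(message: str) -> List[Bigram]:
    """Adjacent lowercase word pairs in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    return list(zip(words, words[1:]))


def bigram_features(message: str, vocabulary: Iterable[Bigram] = BIGRAMS_OF_INTEREST) -> Dict[str, bool]:
    """NLTK-style featureset: one boolean 'contains(w1 w2)' entry per vocabulary bigram."""
    present = set(bigrams(message))
    return {f"contains({a} {b})": (a, b) in present for a, b in vocabulary}
```

Note that the first example message contains "smoke some weed", whose bigrams are (smoke, some) and (some, weed), so a fixed "smoke weed" feature would not fire on it; this is one reason to let the algorithm select bigrams from the training data, as in the alternate approach.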
Algorithm
Supervised Machine Learning
• Input: the text message corpus.
• Bigrams were selected as features.
• The system was trained using NLTK’s implementation of the Naïve
Bayes classifier.
• It was hypothesized that a smaller training set might increase
accuracy.
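The source trained NLTK’s NaiveBayesClassifier; with NLTK installed that is roughly nltk.NaiveBayesClassifier.train(train_set) on a list of (featureset, label) pairs, followed by classifier.classify(featureset). To keep the example dependency-free, the sketch below swaps in a hand-written multinomial Naive Bayes over bigram counts with add-one smoothing. It is not NLTK’s implementation (NLTK scores feature/value pairs and uses expected-likelihood smoothing by default), and the four training messages are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> List[Bigram]:
    """Adjacent lowercase word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


class BigramNaiveBayes:
    """Multinomial Naive Bayes over bigram counts with add-one (Laplace) smoothing."""

    def fit(self, data: List[Tuple[str, int]]) -> "BigramNaiveBayes":
        self.doc_counts = Counter(label for _, label in data)  # for priors P(label)
        self.bigram_counts: Dict[int, Counter] = defaultdict(Counter)
        for text, label in data:
            self.bigram_counts[label].update(bigrams(text))
        self.vocab = {bg for counts in self.bigram_counts.values() for bg in counts}
        self.totals = {label: sum(c.values()) for label, c in self.bigram_counts.items()}
        return self

    def log_score(self, text: str, label: int) -> float:
        """log P(label) plus the sum of log P(bigram | label) over known bigrams."""
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denominator = self.totals[label] + len(self.vocab)
        for bg in bigrams(text):
            if bg in self.vocab:  # bigrams never seen in training are skipped
                score += math.log((self.bigram_counts[label][bg] + 1) / denominator)
        return score

    def classify(self, text: str) -> int:
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Invented toy training data: 1 = drug-related, 0 = neutral.
train = [
    ("want to smoke weed later", 1),
    ("lets smoke weed after school", 1),
    ("pull the weed in the garden", 0),
    ("see you after school today", 0),
]
model = BigramNaiveBayes().fit(train)
print(model.classify("we should smoke weed tonight"))
```

Working in log space avoids floating-point underflow when many small probabilities are multiplied. Accuracy would be measured by classifying a held-out test set and comparing predictions with the corpus labels.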
Applications of Mobile Forensics
• SMS analysis techniques are highly applicable to Twitter “tweet”
analysis.
• Useful for corporate investigations and for criminal and civil
defense.
• Useful for law enforcement investigators analyzing social
media profiles for evidence of criminal activity.
Conclusion
• Natural language processing and machine classification
have been applied to mobile device forensic analysis in a
novel way.
• Researchers are free to develop a better text message
classification methodology using the text message corpus.
• To support the development of more efficient corpora and methods,
the corpus has been made available to the research community.
Future Work
• The “micro-text” problem may be overcome by using more
efficient feature extraction techniques.
•Future research recommendations include determination of the
frequency of text messaging between criminal suspects
•Calculating the average time span between sent and received
messages in text message conversation threads.
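Both future-work measurements can be computed directly from the subscriber, timestamp, and type fields of the corpus. The sketch below assumes timestamps use the US month/day format of the sample ("9/4/2012 2:40 PM") and, as a simplifying assumption, defines the response time as the gap between consecutive messages in a thread whose direction switches between Incoming and Outgoing. The Record type and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean
from typing import List, NamedTuple, Optional

TIME_FORMAT = "%m/%d/%Y %I:%M %p"  # matches timestamps like "9/4/2012 2:40 PM"


class Record(NamedTuple):
    subscriber: int
    timestamp: str
    direction: str  # "Incoming" or "Outgoing"


def messages_per_subscriber(records: List[Record]) -> Counter:
    """How many messages were exchanged with each subscriber."""
    return Counter(r.subscriber for r in records)


def mean_reply_minutes(records: List[Record], subscriber: int) -> Optional[float]:
    """Average gap, in minutes, between consecutive messages in a thread that switch direction."""
    thread = sorted(
        (datetime.strptime(r.timestamp, TIME_FORMAT), r.direction)
        for r in records
        if r.subscriber == subscriber
    )
    gaps = [
        (later - earlier).total_seconds() / 60
        for (earlier, d1), (later, d2) in zip(thread, thread[1:])
        if d1 != d2
    ]
    return mean(gaps) if gaps else None
```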
References
1. D. R. O’Day and R. A. Calix, “Text Message Corpus: Applying Natural Language
Processing to Mobile Device Forensics,” Purdue University Calumet, 2200 169th Street,
Hammond, IN 46323, USA.
2. D. Phuc and N. T. K. Phung, “Using Naïve Bayes model and natural language
processing for classifying messages on online forum,” in 2007 IEEE International
Conference on Research, Innovation and Vision for the Future, March 2007,
pp. 247-252.
3. A. Smith, “Americans and Text Messaging,” 2011. [Online]. Available: http://pewinternet.org
media/Files/Reports/2011/Americans %20and%20Text%20Messaging.pdf
4. B. Carrier, File System Forensic Analysis. Boston, MA: Addison-Wesley, 2005, p. 8.
5. C. Altheide and H. Carvey, Digital Forensics with Open Source Tools. Waltham,
MA: Syngress, 2011.
6. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing
Text with the Natural Language Toolkit. Sebastopol, CA: O’Reilly Media, 2009,
pp. 221-255.
Thank you!
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
457503602-5-Gas-Well-Testing-and-Analysis-pptx.pptx
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 

MOBILE DEVICE FORENSICS USING NLP

  • 1. MOBILE DEVICE FORENSICS USING NLP. Presented by: Ankita Jadhao (Roll no. CSE15S2002). Supervised by: Dr. A. J. Agrawal. Department of Computer Science & Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur. 8/24/2015.
  • 2. Contents • Introduction • Motivation • Review of Literature • Issues • Methodology • Existing System • Advantages • Bibliography
  • 3. Introduction • Text mining is also known as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT). • Text mining tasks: 1. Exploratory data analysis 2. Information extraction 3. Text classification • Fig. 1: Overview of the process
  • 4. Introduction: Where Text Mining Is Used • Biomedical applications - Identification of biological entities such as protein and gene names, as well as chemical compounds and drugs. • Software applications - Automating the mining and analysis processes; also used by various firms to improve their results. - Software for tracking and monitoring terrorist activities. • Online media applications - Providing readers with a better search experience, which in turn increases site “stickiness” and revenue. • Marketing applications - Analytical customer relationship management (CRM).
  • 5. Introduction • Sentiment analysis - Analysis of movie reviews to estimate how favorable a review is toward a movie. - Detection of emotions in the related area of affective computing. • Security applications - Monitoring and analysis of online plain-text sources, such as Internet news and blogs, for national security purposes. - Criminal activity.
  • 6. Introduction • Mobile phones are used to store and transmit personal and corporate information. • Law enforcement, criminals, and mobile phone devices • Few text message corpora are available. • A simple methodology is proposed for feature extraction. • What is a corpus? A text corpus (plural: corpora) is a large, structured set of texts.
  • 7. Motivation • Mobile device use is growing rapidly. • The average cell phone user sends over 15,000 text messages annually. • The average 18-24-year-old sends almost 40,000 text messages every year. • Most tools and methodologies merely acquire all supported data and dump the output to a spreadsheet or HTML report. • Search hits must be examined manually and noted in a report. • Problems: 1. Simple keyword searches 2. Limited corpora
  • 8. Literature Review • Corpora: A corpus linguistics study of SMS text messaging [3] - Tagg developed a text message corpus in British English, but an American English corpus focused on forensic applications is desirable. - Using a British English corpus would skew results even for neutral (non-drug-related) text messages, because the language differs significantly. • Integrating machine learning into the forensic process. Approaches: 1. The digital forensic process can be summarized as preservation, isolation, correlation, and logging [4]. 2. It begins with acquisition, proceeds to analysis, and concludes with presentation [5]. 3. Preservation, extraction, and then interpretation [1].
  • 9. Literature Review • Natural Language Processing: Dela Rosa and Ellen - The ability to detect linguistic patterns is an invaluable tool when applied to text messaging data. - NLTK machine learning algorithms can be trained on a training set and assessed on a test set to create an experimental model. - They applied k-nearest neighbor (kNN) and support vector machine (SVM) algorithms to micro-text classification.
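To make the kNN idea on this slide concrete, here is a minimal pure-Python sketch (not Dela Rosa and Ellen's code, and not NLTK). It represents each message as a bag of words, ranks training messages by cosine similarity, and takes a majority vote among the k nearest. The training messages, the whitespace tokenizer, and k = 3 are hypothetical choices for illustration only.

```python
import math
from collections import Counter

# Hypothetical labeled micro-texts: 1 = drug-related, 0 = neutral.
TRAINING = [
    ("lets smoke some weed tonight", 1),
    ("got any weed to smoke", 1),
    ("smoke pot after class", 1),
    ("pull the weed in the garden", 0),
    ("see you at school tomorrow", 0),
    ("dinner is at six tonight", 0),
]


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(message: str, training: list, k: int = 3) -> int:
    """Label a message by majority vote among its k most similar training messages."""
    query = bag_of_words(message)
    ranked = sorted(
        training,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(knn_classify("smoke weed after school", TRAINING))
print(knn_classify("see you at dinner tonight", TRAINING))
```

The SVM approach mentioned on the slide would replace the neighbor vote with a learned decision boundary over the same kind of vectors; it is not sketched here.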
  • 10. Issues in Mobile Forensics • Corpora are not available • Accuracy problem • Feature extraction • Micro-text problem
  • 11. Overview • Mobile Device Forensic Extraction • Text Message Corpus • Feature Extraction • Supervised Machine Learning
  • 12. Mobile Device Forensic Extraction • Text messages were extracted from the mobile device. • Administrative access to the device was gained by using the redsn0w software to “jailbreak” it. • The text message database was accessed by navigating to its default location on the device. • An MD5 hash value was computed for the text message database file to mathematically verify that the file had not been altered during the execution of the methodology.
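The integrity check described above amounts to hashing the extracted database file at acquisition and re-hashing it after analysis. A minimal sketch using Python's standard hashlib module follows; the function names and the file path in the usage comment are hypothetical, and the slide does not say which tool the authors used to compute the hash.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_unchanged(path: str, acquisition_hash: str) -> bool:
    """True if the file's current MD5 matches the hash recorded at acquisition time."""
    return md5_of_file(path) == acquisition_hash.strip().lower()


# Usage (hypothetical path to a working copy of the extracted database):
# recorded = md5_of_file("sms_copy.db")
# ... run the analysis ...
# assert verify_unchanged("sms_copy.db", recorded)
```

MD5 is no longer considered collision-resistant, so recording a SHA-256 digest as well (hashlib.sha256 is a drop-in replacement in this code) is a common precaution.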
  • 13. Corpora • 1. First, collect the corpus data. 2. Save the text in plain-text format. 3. Provide an identification of the text at its beginning. 4. Carry out any pre-processing of the text. 5. The corpus was saved in Extensible Markup Language (XML) format. • Fig. 3: Common structures for text corpora
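Steps 3 to 5 can be sketched with Python's standard xml.etree.ElementTree module. The element names follow the record layout shown on the following slide; the record dictionaries, the whitespace-collapsing pre-processing step, and the function names are assumptions for illustration, not the authors' code.

```python
import xml.etree.ElementTree as ET

FIELDS = ("class", "subscriber", "message_body", "timestamp", "type")


def preprocess(body: str) -> str:
    """Step 4 (assumed): collapse runs of whitespace and trim the message body."""
    return " ".join(body.split())


def build_corpus(records: list) -> ET.ElementTree:
    """Steps 1-3: wrap each record in a <text_message> element under one <corpus_data> root."""
    root = ET.Element("corpus_data")
    for record in records:
        message = ET.SubElement(root, "text_message")
        for field in FIELDS:
            value = str(record[field])
            if field == "message_body":
                value = preprocess(value)
            ET.SubElement(message, field).text = value
    return ET.ElementTree(root)


def save_corpus(tree: ET.ElementTree, path: str) -> None:
    """Step 5: write the corpus as UTF-8 XML with an XML declaration."""
    tree.write(path, encoding="UTF-8", xml_declaration=True)


corpus = build_corpus([
    {"class": 0, "subscriber": 1, "message_body": "Text   Message ",
     "timestamp": "9/4/2012 2:40 PM", "type": "Incoming"},
])
print(ET.tostring(corpus.getroot(), encoding="unicode"))
```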
  • 14. Corpora
      <?xml version="1.0" encoding="UTF-8"?>
      <corpus_data>
        <text_message>
          <class>0</class>
          <subscriber>1</subscriber>
          <message_body>Text Message</message_body>
          <timestamp>9/4/2012 2:40 PM</timestamp>
          <type>Incoming</type>
        </text_message>
      </corpus_data>
    • Class indicates whether each individual text message is drug-related (1) or neutral (0). • The data were modified, and additional text messages were developed.
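Reading the corpus back into (message, label) pairs gives the input that the classifiers discussed later need. A minimal sketch with the standard library's xml.etree.ElementTree; the function name is hypothetical, and it assumes every record follows the layout above. For a file on disk, ET.parse(path).getroot() would replace ET.fromstring.

```python
import xml.etree.ElementTree as ET


def load_labeled_messages(xml_text: str) -> list:
    """Return (message_body, class) pairs; class 1 = drug-related, 0 = neutral."""
    root = ET.fromstring(xml_text)
    pairs = []
    for message in root.iter("text_message"):
        body = message.findtext("message_body", default="")
        label = int(message.findtext("class", default="0"))
        pairs.append((body, label))
    return pairs


SAMPLE = """<corpus_data>
  <text_message>
    <class>0</class>
    <subscriber>1</subscriber>
    <message_body>Text Message</message_body>
    <timestamp>9/4/2012 2:40 PM</timestamp>
    <type>Incoming</type>
  </text_message>
</corpus_data>"""

print(load_labeled_messages(SAMPLE))
```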
  • 15. Information Extraction System • Fig. 1: Simple pipeline architecture for an information extraction system • We first convert the unstructured data of natural language sentences into structured data. Getting meaning from text in this way is called information extraction.
  • 16. Information Extraction System • Example string: “We saw the yellow dog” • Fig. 2: Segmentation and labeling at both the token and chunk levels
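The chunk level of this example can be imitated in a few lines of Python. This is a hand-rolled analogue of a regular-expression noun-phrase chunk grammar, not NLTK's RegexpParser. The part-of-speech tags are assigned by hand (Penn Treebank style) rather than by a tagger, and the chunk rule (a personal pronoun, or an optional determiner, any adjectives, then one or more nouns) is an assumption.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def chunk_noun_phrases(tagged: list) -> list:
    """Group (word, tag) tokens into ("NP", [...]) chunks; other tokens pass through unchanged."""
    result = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] == "PRP":
            # A personal pronoun forms a one-token noun phrase.
            result.append(("NP", [tagged[i]]))
            i += 1
            continue
        j = i
        if tagged[j][1] == "DT":
            j += 1  # optional determiner
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1  # any number of adjectives
        end = j
        while end < len(tagged) and tagged[end][1] in NOUN_TAGS:
            end += 1  # one or more nouns close the chunk
        if end > j:
            result.append(("NP", tagged[i:end]))
            i = end
        else:
            result.append(tagged[i])  # token stays outside any chunk
            i += 1
    return result


sentence = [("We", "PRP"), ("saw", "VBD"), ("the", "DT"), ("yellow", "JJ"), ("dog", "NN")]
print(chunk_noun_phrases(sentence))
```

The result groups “We” and “the yellow dog” as noun phrases and leaves “saw” at the token level, which is the two-level segmentation the figure describes.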
  • 17. Feature Extraction • Data representation - “Bag of words” is the most commonly used representation, with either counts or binary values. - “Phrases” can also be used for commonly occurring combinations of words. • There are three aspects of feature extraction: • Feature construction • Feature subset generation (or search strategy) • Evaluation criterion estimation
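The count and binary bag-of-words variants differ only in what each feature value records, and “phrase” features can be approximated by adjacent word pairs. A short Python sketch; the naive tokenizer and the function names are assumptions. The binary version returns a dict mapping each feature to True, the same dict shape NLTK classifiers accept as a featureset.

```python
from collections import Counter

PUNCTUATION = ".,!?;:\"()"


def tokenize(text: str) -> list:
    """Naive tokenizer: lowercase, split on whitespace, strip surrounding punctuation."""
    tokens = (token.strip(PUNCTUATION) for token in text.lower().split())
    return [token for token in tokens if token]


def bow_counts(text: str) -> dict:
    """Bag of words with counts: how many times each word occurs."""
    return dict(Counter(tokenize(text)))


def bow_binary(text: str) -> dict:
    """Bag of words with binary values: which words occur at all."""
    return {token: True for token in tokenize(text)}


def bow_phrases(text: str) -> dict:
    """'Phrase' features: counts of adjacent word pairs (bigrams)."""
    tokens = tokenize(text)
    return dict(Counter(f"{a} {b}" for a, b in zip(tokens, tokens[1:])))


message = "Weed, weed everywhere in the garden!"
print(bow_counts(message))
print(bow_binary(message))
print(bow_phrases(message))
```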
  • 18. Approach for Feature Extraction • Using a count of known drug-related unigrams as a feature • NLTK was used to identify bigrams of interest. • The alternative approach - Use two-word pairs (bigrams) as features and let the algorithm determine which bigrams are most effective in classifying text messages as drug-related or neutral. Example: 1. “After school today let’s go smoke some weed at my house.” 2. “Hey pull that weed in my flower garden when you get home.”
  • 19. Approach for Feature Extraction • The first text message is drug-related, but the second is neutral; a unigram count of “weed” would flag both, so the second would be a false positive. • The hypothesis was that drug-related terms would occur in frequently occurring bigrams, such as “smoke weed,” “mary jane,” “hit acid,” and “pop pilz,” and that these bigrams would increase classification accuracy.
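The false-positive problem can be reproduced on the two example messages. In this sketch the drug lexicon is a hypothetical list built from terms on the slide, the tokenizer is naive, and bigrams are adjacent word pairs; the authors' exact feature code is not shown in the slides.

```python
DRUG_UNIGRAMS = {"weed", "acid", "pilz"}  # hypothetical lexicon, taken from the slide's examples


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (token.strip(".,!?;:\"") for token in text.lower().split())
    return [token for token in tokens if token]


def drug_unigram_count(text: str) -> int:
    """Feature 1: how many tokens appear in the known drug-term list."""
    return sum(token in DRUG_UNIGRAMS for token in tokenize(text))


def bigram_features(text: str) -> dict:
    """Feature 2: one boolean feature per adjacent word pair."""
    tokens = tokenize(text)
    return {f"{a} {b}": True for a, b in zip(tokens, tokens[1:])}


drug_msg = "After school today let's go smoke some weed at my house."
neutral_msg = "Hey pull that weed in my flower garden when you get home."

# The unigram count cannot tell the two messages apart...
print(drug_unigram_count(drug_msg), drug_unigram_count(neutral_msg))
# ...but the bigram contexts around "weed" differ.
print(sorted(b for b in bigram_features(drug_msg) if "weed" in b))
print(sorted(b for b in bigram_features(neutral_msg) if "weed" in b))
```

Note that the drug-related example says “smoke some weed,” so the adjacent bigram “smoke weed” never appears in it; adjacency-only bigrams are brittle to intervening words. NLTK's BigramCollocationFinder.from_words accepts a window_size argument that pairs words within a wider window, which is one possible way to address this.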
  • 20. Algorithm
  • 21. Supervised Machine Learning • Input: the text message corpus. • Bigrams were selected as features. • The system was trained using NLTK’s implementation of the Naïve Bayes classifier. • It was hypothesized that a smaller training set might increase accuracy.
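The slides use NLTK's Naïve Bayes classifier. To show the idea without that dependency, below is a small pure-Python stand-in (not NLTK's implementation) that trains on boolean bigram features and scores a message with a log class prior plus add-one-smoothed per-bigram likelihoods, considering only the bigrams present in the message. The class name and the training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigrams(text: str) -> set:
    """Set of adjacent lowercase word pairs, with surrounding punctuation stripped."""
    tokens = [t.strip(".,!?;:") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


class BigramNaiveBayes:
    def __init__(self):
        self.class_docs = Counter()               # number of training messages per class
        self.feature_docs = defaultdict(Counter)  # class -> bigram -> messages containing it

    def train(self, labeled):
        for text, label in labeled:
            self.class_docs[label] += 1
            for pair in bigrams(text):
                self.feature_docs[label][pair] += 1
        return self

    def classify(self, text):
        total = sum(self.class_docs.values())
        features = bigrams(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total)  # log prior
            for pair in features:
                # Add-one smoothing over the two outcomes (bigram present / absent).
                score += math.log((self.feature_docs[label][pair] + 1) / (n_docs + 2))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: 1 = drug-related, 0 = neutral.
train = [
    ("lets go smoke weed after school", 1),
    ("wanna smoke weed tonight", 1),
    ("going to hit acid this weekend", 1),
    ("pull that weed in the garden", 0),
    ("that weed in the yard is huge", 0),
    ("see you after school today", 0),
]
model = BigramNaiveBayes().train(train)
print(model.classify("smoke weed at my house"))
print(model.classify("pull that weed in my garden"))
```

With NLTK installed, the corresponding route is to build (featureset, label) pairs, train with nltk.NaiveBayesClassifier.train, and measure held-out performance with nltk.classify.accuracy; comparing that accuracy across training-set sizes is one way the smaller-training-set hypothesis could be checked.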
  • 22. Applications of Mobile Forensics • SMS analysis techniques are highly applicable to Twitter “tweet” analysis. • Useful for corporate investigations and for criminal and civil defense. • Useful for law enforcement investigators analyzing social media profiles for evidence of criminal activity.
  • 23. Conclusion • Natural language processing and machine classification have been applied to mobile device forensic analysis in a novel way. • Researchers are free to develop a better text message classification methodology using the text message corpus. • To help develop more efficient corpora, the corpus has been made available to the research community.
  • 24. Future Work • The “micro-text” problem could be overcome by using more efficient feature extraction techniques. • Future research recommendations include determining the frequency of text messaging between criminal suspects. • Calculating the average time span between sent and received messages in text message conversation threads.
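Both future-work measurements could be computed from the subscriber, timestamp, and type fields already present in the corpus records. The sketch below is one possible reading, not a method from the slides: “frequency” as the number of messages per subscriber thread, and “time span” as the minutes from an incoming message to the next outgoing reply in the same thread. The timestamp format string is inferred from the sample value “9/4/2012 2:40 PM”, and the records and function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

TIME_FORMAT = "%m/%d/%Y %I:%M %p"  # inferred from the sample value "9/4/2012 2:40 PM"


def messages_per_subscriber(records: list) -> Counter:
    """Count messages exchanged with each subscriber (one conversation thread each)."""
    return Counter(record["subscriber"] for record in records)


def average_reply_minutes(records: list):
    """Mean minutes from an Incoming message to the next Outgoing one in the same thread."""
    threads = defaultdict(list)
    for record in records:
        when = datetime.strptime(record["timestamp"], TIME_FORMAT)
        threads[record["subscriber"]].append((when, record["type"]))
    gaps = []
    for events in threads.values():
        events.sort()  # chronological order within the thread
        pending = None  # time of the earliest unanswered incoming message
        for when, kind in events:
            if kind == "Incoming" and pending is None:
                pending = when
            elif kind == "Outgoing" and pending is not None:
                gaps.append((when - pending).total_seconds() / 60)
                pending = None
    return sum(gaps) / len(gaps) if gaps else None


records = [
    {"subscriber": "1", "timestamp": "9/4/2012 2:40 PM", "type": "Incoming"},
    {"subscriber": "1", "timestamp": "9/4/2012 2:50 PM", "type": "Outgoing"},
    {"subscriber": "2", "timestamp": "9/4/2012 3:00 PM", "type": "Incoming"},
    {"subscriber": "2", "timestamp": "9/4/2012 3:30 PM", "type": "Outgoing"},
]
print(messages_per_subscriber(records))
print(average_reply_minutes(records))
```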
  • 25. References
    1. D. R. O’Day and R. A. Calix, “Text message corpus: Applying natural language processing to mobile device forensics,” Purdue University Calumet, 2200 169th Street, Hammond, IN 46323, USA.
    2. D. Phuc and N. T. K. Phung, “Using Naïve Bayes model and natural language processing for classifying messages on online forum,” in 2007 IEEE International Conference on Research, Innovation and Vision for the Future, March 2007, pp. 247-252.
    3. A. Smith, “Americans and Text Messaging,” 2011. [Online]. http://pewinternet.org media/Files/Reports/2011/Americans %20and%20Text%20Messaging.pdf
    4. B. Carrier, File System Forensic Analysis. Boston, MA: Addison-Wesley, 2005, p. 8.
    5. C. Altheide and H. Carvey, Digital Forensics with Open Source Tools. Waltham, MA: Syngress, 2011.
    6. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Sebastopol, CA: O’Reilly Media, 2009, pp. 221-255.
  • 26. Thank you!