JSPM’s RAJARSHI SHAHU COLLEGE OF
ENGG. PUNE-33
DEPARTMENT OF COMPUTER ENGG.

BE Computer Engineering
Preliminary Project Presentation
2013-14
1
Identifying Fraudulent Activities Over
Online Application Through Clickstream
Analysis
Group No.: 08
Exam Seat No.    Name of Student
B80374244        Nikita Hiremath
B80374265        Surbhi Sonkhaskar
B80374270        Shital More

Guided by: Ms. V.M. Barkade

2
Introduction
• The Internet has become integrated into the day-to-day activities of
human beings.
• Fraud followed the advent of e-commerce.
• Every online application that involves monetary transactions must
ensure the safety of the money people invest through it.
• Clickstream analysis is one technique that helps detect fraud by
analyzing user behavior.
3
Problem Statement
To develop a business solution for identifying fraudulent activities
on an online application through clickstream analysis using Hadoop.

4
Scope
1. Only order frauds are detected.
2. The system is limited to detecting the fraudulent user.
3. No recovery measures are provided.
4. Analysis is done only for a particular session of a user.
5. There is no restriction on the number of clicks or on the timestamp.

5
Literature Survey
• Paper 1:
Wichian Premchaiswadi; Walisa Romsaiyud, "Extracting WebLog of Siam
University for Learning User Behavior on MapReduce", Siam
University, Thailand. (ICIAS 2012)

• Paper 2:
Narayanan Sadagopan; Jie Li, "Characterizing Typical and Atypical User
Sessions in Clickstreams", Yahoo!. WWW 2008.

• Paper 3:
Bimal Viswanath; Ansley Post; Krishna P. Gummadi; Alan Mislove, "An
Analysis of Social Network-based Sybil Defenses", MPI-SWS and
Northeastern University. SIGCOMM'10.

6
Literature Survey
Paper 1
Abstract:
MapReduce is a framework that allows developers to write applications that
rapidly process and analyze large volumes of data in a massively parallel
scale. Moreover, a clickstream is a record of a user's activity on the Internet. Using a
clickstream analysis we can collect, analyze, and report aggregate data about which
pages visitors visit in what order – and which are the result of the succession of
mouse clicks each visitor makes. Clickstream analysis can reveal usage patterns
leading to a heightened understanding of users’ behavior. In this paper, we
introduced a novel and efficient web log mining model for web users clustering. In
general, our model consists of three main steps; 1) Computing the similarity
measure of any path in a web page, 2) Defining the k-mean clustering for group
customerID, 3) Generating the report based on the Hadoop MapReduce Framework.
Consequently, our experiments were run on real world data derived from weblogs
of Siam University at Bangkok, Thailand (www.siam.edu).

7
In this paper they have proposed:
The paper shows how two algorithms, graph similarity calculation and
fuzzy k-means clustering, can be used to analyze user behavior from
clickstreams. These algorithms take graphs and a data set as input,
respectively.

From this paper we have referred:
A study of an existing system that uses clickstream analysis to study
user behavior on an educational website.

8
Paper 2

Abstract:
Millions of users retrieve information from the Internet using search engines.
Mining these user sessions can provide valuable information about the quality of
user experience and the perceived quality of search results. Often search engines
rely on accurate estimates of Click Through Rate (CTR) to evaluate the quality of
user experience. The vast heterogeneity in the user population and presence of
automated software programs (bots) can result in high variance in the estimates of
CTR. To improve the estimation accuracy of user experience metrics like CTR, we
argue that it is important to identify typical and atypical user sessions in
clickstreams. Our approach to identify these sessions is based on detecting outliers
using Mahalanobis distance in the user session space. Our user session model
incorporates several key clickstream characteristics including a novel conformance
score obtained by Markov Chain analysis. Editorial results show that our approach
of identifying typical and atypical sessions has a precision of about 89%. Filtering
out these atypical sessions reduces the uncertainty (95% confidence interval) of the
mean CTR by about 40%. These results demonstrate that our approach of
identifying typical and atypical user sessions is extremely valuable for cleaning
“noisy" user session data for increased accuracy in evaluating user experience.

9
In this paper they have proposed:
Use of Markov Chain analysis to improve the detection of typical and
atypical user sessions. They have also used Click Through Rate (CTR) to
evaluate the quality of user experience.

From this paper we have referred:
Various techniques for distinguishing typical and atypical users based on
the clicks they make. The paper suggests a few models (a click-based model,
a time-based model and a hybrid model) by which sessions can be divided
and analyzed. The concept of Click Through Rate is also taken from this
paper.
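
The abstract of Paper 2 identifies atypical sessions as outliers by Mahalanobis distance in the user session space. The idea can be sketched as below. This is a minimal illustration only: the two features per session (number of clicks, mean seconds between clicks), the baseline data and the threshold of 3.0 are all assumptions made for the example, not values from the paper.

```python
import math

# Hypothetical baseline of sessions assumed to be typical:
# (number of clicks, mean seconds between clicks)
BASELINE = [(12, 8.0), (15, 7.5), (10, 9.0), (14, 6.5), (11, 8.5), (13, 7.0)]


def mean_vector(rows):
    """Per-feature mean of a list of 2-feature rows."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def covariance_2x2(rows, mu):
    """Sample covariance matrix entries (sxx, sxy, syy) for 2 features."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(point, mu, cov):
    """Mahalanobis distance of a 2-feature point from the baseline."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse expanded
    d2 = (dx * dx * syy - 2 * dx * dy * sxy + dy * dy * sxx) / det
    return math.sqrt(d2)


def is_atypical(point, mu, cov, threshold=3.0):
    """Flag a session whose distance exceeds the (assumed) threshold."""
    return mahalanobis(point, mu, cov) > threshold


mu = mean_vector(BASELINE)
cov = covariance_2x2(BASELINE, mu)
print(is_atypical((13, 7.5), mu, cov))   # session close to the baseline
print(is_atypical((200, 0.2), mu, cov))  # very many, very fast clicks
```

Estimating the mean and covariance from a baseline that is assumed clean keeps a single extreme session from distorting the very statistics used to judge it.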

10
Paper 3

Abstract:
Recently, there has been much excitement in the research community over using
social networks to mitigate multiple identity, or Sybil, attacks. A number of
schemes have been proposed, but they differ greatly in the algorithms they use and
in the networks upon which they are evaluated. As a result, the research community
lacks a clear understanding of how these schemes compare against each other, how
well they would work on real-world social networks with different structural
properties, or whether there exist other (potentially better) ways of Sybil defense. In
this paper, we show that, despite their considerable differences, existing Sybil
defense schemes work by detecting local communities (i.e., clusters of nodes more
tightly knit than the rest of the graph) around a trusted node. Our finding has
important implications for both existing and future designs of Sybil defense
schemes. First, we show that there is an opportunity to leverage the substantial
amount of prior work on general community detection algorithms in order to defend
against Sybils. Second, our analysis reveals the fundamental limits of current social
network-based Sybil defenses: We demonstrate that networks with well-defined
community structure are inherently more vulnerable to Sybil attacks, and that, in
such networks, Sybils can carefully target their links in order to make their attacks
more effective.

11
In this paper they have proposed:
An analysis of Sybil defenses on social networks. The authors show that
even a network with well-defined community structure can be targeted by
such attacks.

From this paper we have referred:
This paper gave us more information about Sybil attacks on online social
networks. Our understanding is that a Sybil attack on an online shopping
website cannot block the site completely, but a partial Sybil attack can be
carried out through order frauds.

12
Requirement Analysis
Software Requirements
• Shell Script
• Apache Hadoop 0.20.x
• Pig Script 0.9.1
• Ubuntu 12.04

Hardware Requirements
• Processor: Intel Pentium IV 2.1 GHz or above
• Clock speed: 500 MHz
• RAM: 128 MB
• HD: 20 GB or higher

13
Proposed System
Data gathering

Extraction of weblogs

Storing and structuring data

Pattern matching and map reduce algorithm
Data analysis

HDFS

Data visualization
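
The pattern matching and map reduce stage listed above can be illustrated in miniature. The sketch below is only an assumption-laden toy: the tab-separated log format (session ID, timestamp, URL), the sample lines and the "/order/submit" pattern are all hypothetical, and the map and reduce phases are imitated in plain Python rather than run on Hadoop.

```python
from collections import defaultdict

# Hypothetical weblog lines: session_id <TAB> timestamp <TAB> url
SAMPLE_LOG = [
    "s1\t2013-10-01T10:00:00\t/home",
    "s1\t2013-10-01T10:00:05\t/order/submit",
    "s2\t2013-10-01T10:00:07\t/order/submit",
    "s2\t2013-10-01T10:00:08\t/order/submit",
    "s2\t2013-10-01T10:00:09\t/order/submit",
]


def map_phase(lines, pattern="/order/submit"):
    """Emit (session_id, 1) for every click whose URL contains the pattern."""
    for line in lines:
        session_id, _timestamp, url = line.split("\t")
        if pattern in url:
            yield session_id, 1


def reduce_phase(pairs):
    """Sum the emitted counts per session ID."""
    totals = defaultdict(int)
    for session_id, count in pairs:
        totals[session_id] += count
    return dict(totals)


order_clicks = reduce_phase(map_phase(SAMPLE_LOG))
print(order_clicks)
```

A later analysis step could then compare each session's count against whatever limit the application considers normal.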
14
SYSTEM DIAGRAMS
• Class diagram
• State Transition Diagram
• System Architecture Diagram
• Use case diagram
• Activity Diagram
• Object Diagram
• Sequence Diagram
• Collaboration Diagram
• State chart Diagram
• Component Diagram
• Deployment Diagram
15
WORKING OF THE SYSTEM

USER 1, USER 2, USER 3
EXTRACTING WEBLOGS
FLUME AGENT
16
FLUME AGENT
PROVIDING WEBLOGS
SERVER
STORING AND STRUCTURING DATA
HDFS

17
HDFS
DATA NODES
PROVIDING THE MATCHED VALUES
PATTERN MATCHING ALGORITHM
18
DATA ANALYSIS
HDFS
PROVIDING PROCESSED DATA
DATA VISUALIZATION USING DATA ANALYTICS TOOLS
SERVER

19
Algorithms
The various pattern matching algorithms that can be applied are:
1. Brute force algorithm
2. Boyer-Moore algorithm
3. Not So Naïve algorithm
4. Knuth-Morris-Pratt algorithm

Of the algorithms listed above, we are going to use the Knuth-Morris-Pratt
algorithm, since its running time is linear in the combined length of the
text and the pattern, which makes it efficient for matching short as well
as long patterns.
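
A minimal sketch of the Knuth-Morris-Pratt algorithm follows. The function names and the sample click sequence are illustrative assumptions, not part of the project code; the search works on any indexable sequence, so it can match a pattern of page names inside a session's click sequence as well as a substring inside a string.

```python
def build_failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern items matched so far
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = table[k - 1]  # fall back without re-reading the text
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # keep going to find overlapping matches
    return matches


# Hypothetical click sequence of one session
clicks = ["home", "cart", "order", "cart", "order"]
print(kmp_search(clicks, ["cart", "order"]))
print(kmp_search("abababca", "aba"))
```

The failure table is what lets the search avoid moving backwards in the text, which is why the total work stays linear.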

20
MATHEMATICAL MODEL
Bernoulli’s Distribution:
This distribution best describes all situations where a "trial" is made resulting
in either "success" or "failure," such as when tossing a coin, or when modeling
the success or failure of a surgical procedure. The Bernoulli distribution is
defined as:
$f(x) = p^{x} (1-p)^{1-x}$, for $x \in \{0, 1\}$,
where p is the probability that a particular event (e.g., success) will occur.

Arithmetic Mean:
The arithmetic mean of a set of data is found by taking the sum of the
data, and then dividing the sum by the total number of values in the set. A
mean is commonly referred to as an average.
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
where
n is the total number of elements and $x_i$ is the i-th value

Arithmetic Mode:
The mode is the most frequently occurring value in a frequency distribution.
21
Arithmetic Median:
The median is the middle value of the data once it is sorted; for an even
number of values it is the mean of the two middle values.

Variance:
The variance ($\sigma^2$) is defined as the sum of the squared distances of
each term in the distribution from the mean ($\mu$), divided by the number of
terms in the distribution ($N$):
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
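
The quantities in this mathematical model can be checked on a small example. The sketch below uses Python's standard `statistics` module; the clicks-per-session values and the probability 0.3 are hypothetical numbers chosen only for illustration.

```python
import statistics

# Hypothetical number of clicks in five sessions
clicks_per_session = [4, 7, 7, 9, 13]


def bernoulli_pmf(x, p):
    """f(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return (p ** x) * ((1 - p) ** (1 - x))


mean = statistics.mean(clicks_per_session)          # sum of values / n
median = statistics.median(clicks_per_session)      # middle value when sorted
mode = statistics.mode(clicks_per_session)          # most frequent value
variance = statistics.pvariance(clicks_per_session) # divides by N (population)

print(mean, median, mode, variance)
print(bernoulli_pmf(1, 0.3), bernoulli_pmf(0, 0.3))
```

`pvariance` is used rather than `variance` because the definition above divides by the number of terms N, not by N - 1.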

22
Future Scope
• This system can be implemented for any online commercial application.
• Currently only the detection of fraudulent users is done; the system can
be expanded to undertake the necessary authentication steps.

23
References
1. Sadagopan, N., and Li, J. Characterizing typical and atypical user
sessions in clickstreams. In Proc. of WWW (2008).

2. Wang, G., Konolige, T., Wilson, C., Wang, X., Zheng, H., and Zhao, B. Y.
You are How You Click: Clickstream Analysis for Sybil Detection.

3. Premchaiswadi, W., and Romsaiyud, W. Extracting WebLog of Siam
University for Learning User Behavior on MapReduce. ICIAS (2012).

4. Yu, H., Kaminsky, M., Gibbons, P. B., and Flaxman, A.
SybilGuard: defending against sybil attacks via social networks. In Proc.
of SIGCOMM (2006).

5. Douceur, J. R. The Sybil attack. In Proc. of IPTPS (2002).
24
THANK YOU!

25

More Related Content

What's hot

Web Analytics in 10 slides
Web  Analytics in 10 slidesWeb  Analytics in 10 slides
Web Analytics in 10 slides
Aishwarya Saseendran
 
web analytics overview
web analytics overviewweb analytics overview
web analytics overview
Masih Nabizadeh
 
Web Metircs and KPI
Web Metircs and KPIWeb Metircs and KPI
Web Metircs and KPI
Shipra Malik
 
Click stream analysis and hadoop framwork
Click stream analysis and hadoop framworkClick stream analysis and hadoop framwork
Click stream analysis and hadoop framwork
Marwadi Univercity
 
Cursorcomp ipm
Cursorcomp ipmCursorcomp ipm
Cursorcomp ipm
Ouzza Brahim
 
Dynamic Organization of User Historical Queries
Dynamic Organization of User Historical QueriesDynamic Organization of User Historical Queries
Dynamic Organization of User Historical Queries
IJMER
 
BAQMaR - Conference DM
BAQMaR - Conference DMBAQMaR - Conference DM
BAQMaR - Conference DM
BAQMaR
 
Web Analytics Concepts & Theories
Web Analytics Concepts & TheoriesWeb Analytics Concepts & Theories
Web Analytics Concepts & Theories
mattPROv1
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
butest
 
Website Analytics
Website AnalyticsWebsite Analytics
Website Analytics
Visitor Analytics
 
Affiliate Summit Orlando Meetup Group: Google Analytics for Beginners
Affiliate Summit Orlando Meetup Group:  Google Analytics for BeginnersAffiliate Summit Orlando Meetup Group:  Google Analytics for Beginners
Affiliate Summit Orlando Meetup Group: Google Analytics for Beginners
Missy Ward
 
Technical SEO
Technical SEOTechnical SEO
Technical SEO
Visitor Analytics
 

What's hot (12)

Web Analytics in 10 slides
Web  Analytics in 10 slidesWeb  Analytics in 10 slides
Web Analytics in 10 slides
 
web analytics overview
web analytics overviewweb analytics overview
web analytics overview
 
Web Metircs and KPI
Web Metircs and KPIWeb Metircs and KPI
Web Metircs and KPI
 
Click stream analysis and hadoop framwork
Click stream analysis and hadoop framworkClick stream analysis and hadoop framwork
Click stream analysis and hadoop framwork
 
Cursorcomp ipm
Cursorcomp ipmCursorcomp ipm
Cursorcomp ipm
 
Dynamic Organization of User Historical Queries
Dynamic Organization of User Historical QueriesDynamic Organization of User Historical Queries
Dynamic Organization of User Historical Queries
 
BAQMaR - Conference DM
BAQMaR - Conference DMBAQMaR - Conference DM
BAQMaR - Conference DM
 
Web Analytics Concepts & Theories
Web Analytics Concepts & TheoriesWeb Analytics Concepts & Theories
Web Analytics Concepts & Theories
 
LyonALMProposal20041018.doc
LyonALMProposal20041018.docLyonALMProposal20041018.doc
LyonALMProposal20041018.doc
 
Website Analytics
Website AnalyticsWebsite Analytics
Website Analytics
 
Affiliate Summit Orlando Meetup Group: Google Analytics for Beginners
Affiliate Summit Orlando Meetup Group:  Google Analytics for BeginnersAffiliate Summit Orlando Meetup Group:  Google Analytics for Beginners
Affiliate Summit Orlando Meetup Group: Google Analytics for Beginners
 
Technical SEO
Technical SEOTechnical SEO
Technical SEO
 

Similar to Clickstream ppt copy

Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...
IJECEIAES
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning TechniquesAnalysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
IRJET Journal
 
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
RSIS International
 
Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection
IJSTA
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning Models
IRJET Journal
 
An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...
csandit
 
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdfTanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
ShrutiGarg649495
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...
csandit
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
cscpconf
 
ICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptxICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptx
ajagbesundayadeola
 
Crime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data MiningCrime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data Mining
Anavadya Shibu
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
IRJET Journal
 
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
IRJET Journal
 
Using Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternUsing Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime Pattern
Zakaria Zubi
 
Major
MajorMajor
data mining for terror attacks
data mining for terror attacksdata mining for terror attacks
data mining for terror attacks
Nilu Desai
 
M phil-computer-science-biometric-system-projects
M phil-computer-science-biometric-system-projectsM phil-computer-science-biometric-system-projects
M phil-computer-science-biometric-system-projects
Vijay Karan
 
M.Phil Computer Science Biometric System Projects
M.Phil Computer Science Biometric System ProjectsM.Phil Computer Science Biometric System Projects
M.Phil Computer Science Biometric System Projects
Vijay Karan
 
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEEFINALYEARSTUDENTPROJECTS
 

Similar to Clickstream ppt copy (20)

Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...Concept drift and machine learning model for detecting fraudulent transaction...
Concept drift and machine learning model for detecting fraudulent transaction...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning TechniquesAnalysis on Fraud Detection Mechanisms Using Machine Learning Techniques
Analysis on Fraud Detection Mechanisms Using Machine Learning Techniques
 
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
Trustworthy Sensing for Public Safety in Cloud Centric Things of Internet wit...
 
Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection Meta Classification Technique for Improving Credit Card Fraud Detection
Meta Classification Technique for Improving Credit Card Fraud Detection
 
Automated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning ModelsAutomated Feature Selection and Churn Prediction using Deep Learning Models
Automated Feature Selection and Churn Prediction using Deep Learning Models
 
An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...An improvised model for identifying influential nodes in multi parameter soci...
An improvised model for identifying influential nodes in multi parameter soci...
 
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdfTanvi_Sharma_Shruti_Garg_pre.pdf.pdf
Tanvi_Sharma_Shruti_Garg_pre.pdf.pdf
 
A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...A mathematical model of access control in big data using confidence interval ...
A mathematical model of access control in big data using confidence interval ...
 
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
A MATHEMATICAL MODEL OF ACCESS CONTROL IN BIG DATA USING CONFIDENCE INTERVAL ...
 
ICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptxICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptx
 
Crime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data MiningCrime Data Analysis, Visualization and Prediction using Data Mining
Crime Data Analysis, Visualization and Prediction using Data Mining
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
 
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
A Comparative Study for Credit Card Fraud Detection System using Machine Lear...
 
Using Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime PatternUsing Data Mining Techniques to Analyze Crime Pattern
Using Data Mining Techniques to Analyze Crime Pattern
 
Major
MajorMajor
Major
 
data mining for terror attacks
data mining for terror attacksdata mining for terror attacks
data mining for terror attacks
 
M phil-computer-science-biometric-system-projects
M phil-computer-science-biometric-system-projectsM phil-computer-science-biometric-system-projects
M phil-computer-science-biometric-system-projects
 
M.Phil Computer Science Biometric System Projects
M.Phil Computer Science Biometric System ProjectsM.Phil Computer Science Biometric System Projects
M.Phil Computer Science Biometric System Projects
 
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
IEEE 2014 JAVA DATA MINING PROJECTS Discovering emerging topics in social str...
 

Recently uploaded

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
Claudio Di Ciccio
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 

Recently uploaded (20)

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024

Clickstream ppt copy

  • 1. JSPM’s RAJARSHI SHAHU COLLEGE OF ENGG. PUNE-33 DEPARTMENT OF COMPUTER ENGG. BE Computer Engineering Preliminary Project Presentation 2013-14
  • 2. Identifying Fraudulent Activities Over Online Application Through Clickstream Analysis. Group No.: 08. Exam Seat No. and Name of Student: B80374244, Nikita Hiremath; B80374265, Surbhi Sonkhaskar; B80374270, Shital More. Guided by: Ms. V.M.Barkade
  • 3. Introduction • The Internet has become integrated into the day-to-day activities of human beings. • Frauds followed the advent of e-commerce. • All online applications involving monetary transactions need to ensure the safety of the money people invest. • Clickstream analysis is one technique that helps detect frauds by analyzing user behavior.
  • 4. Problem Statement: To develop a business solution for identifying fraudulent activities on an online application through clickstream analysis using Hadoop.
  • 5. Scope 1. Detecting only order frauds. 2. Limited to detecting the fraudulent user. 3. No recovery measures. 4. Analysis will be done only for a particular session of a user. 5. There is no restriction on the number of clicks or on the time stamp.
  • 6. Literature Survey • Paper 1: Wichian Premchaiswadi; Walisa Romsaiyud, “Extracting WebLog of Siam University for Learning User Behavior on MapReduce”, Siam University, Thailand (ICIAS 2012). • Paper 2: Narayanan Sadagopan; Jie Li, “Characterizing Typical and Atypical User Sessions in Clickstreams”, Yahoo! (WWW 2008). • Paper 3: Bimal Viswanath; Ansley Post; Krishna P. Gummadi; Alan Mislove, “An Analysis of Social Network-based Sybil Defenses”, MPI-SWS and Northeastern University (SIGCOMM ’10).
  • 7. Literature Survey Paper 1 Abstract: MapReduce is a framework that allows developers to write applications that rapidly process and analyze large volumes of data on a massively parallel scale. Moreover, a clickstream is a record of a user's activity on the Internet. Using a clickstream analysis we can collect, analyze, and report aggregate data about which pages visitors visit in what order, and which are the result of the succession of mouse clicks each visitor makes. Clickstream analysis can reveal usage patterns leading to a heightened understanding of users’ behavior. In this paper, we introduced a novel and efficient web log mining model for web users clustering. In general, our model consists of three main steps: 1) computing the similarity measure of any path in a web page, 2) defining the k-mean clustering for group customerID, 3) generating the report based on the Hadoop MapReduce Framework. Consequently, our experiments were run on real-world data derived from weblogs of Siam University at Bangkok, Thailand (www.siam.edu).
  • 8. In this paper they have proposed: The paper shows how two algorithms, calculating the similarity of the graph and fuzzy k-means clustering, can be used to analyze user behavior from the clickstream. These algorithms take graphs and a data set as input, respectively. From this paper we have referred: A study of an existing system that used clickstream analysis to study user behavior on an educational website.
  • 9. Paper 2 Abstract: Millions of users retrieve information from the Internet using search engines. Mining these user sessions can provide valuable information about the quality of user experience and the perceived quality of search results. Often search engines rely on accurate estimates of Click Through Rate (CTR) to evaluate the quality of user experience. The vast heterogeneity in the user population and presence of automated software programs (bots) can result in high variance in the estimates of CTR. To improve the estimation accuracy of user experience metrics like CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our approach to identify these sessions is based on detecting outliers using Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics including a novel conformance score obtained by Markov Chain analysis. Editorial results show that our approach of identifying typical and atypical sessions has a precision of about 89%. Filtering out these atypical sessions reduces the uncertainty (95% confidence interval) of the mean CTR by about 40%. These results demonstrate that our approach of identifying typical and atypical user sessions is extremely valuable for cleaning “noisy” user session data for increased accuracy in evaluating user experience.
  • 10. In this paper they have proposed: Use of Markov Chain analysis to improve the detection of typical and atypical user sessions. They also use Click Through Rate (CTR) to evaluate the quality of user experience. From this paper we have referred: Various techniques for distinguishing typical and atypical users based on the clicks made by the user. The paper suggests a few models, namely the Click-based model, the Time-based model and the Hybrid model, using which sessions can be divided and analyzed. The concept of Click Through Rate is also taken from this paper.
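The idea of scoring a session with a Markov chain can be illustrated with a minimal sketch. This is not the paper's exact conformance score: it assumes a first-order chain estimated from example sessions and uses the average log-probability of a session's transitions as the score. The page labels, the `floor` value for unseen transitions, and the function names are all hypothetical.

```python
import math
from collections import defaultdict

def fit_transitions(sessions):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        for cur, nxt in zip(pages, pages[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return probs

def conformance(pages, probs, floor=1e-6):
    """Average log-probability of a session's transitions (higher = more typical).

    Transitions never seen in training get the small probability `floor`
    (an assumption made here so that the logarithm is defined).
    """
    steps = list(zip(pages, pages[1:]))
    if not steps:
        return 0.0
    logp = sum(math.log(probs.get(cur, {}).get(nxt, floor)) for cur, nxt in steps)
    return logp / len(steps)

# Hypothetical training sessions; the page labels are made up.
training = [
    ["home", "search", "product", "cart", "checkout"],
    ["home", "search", "product", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
]
model = fit_transitions(training)

typical = ["home", "search", "product", "cart", "checkout"]
atypical = ["checkout", "checkout", "checkout", "home"]
print(conformance(typical, model) > conformance(atypical, model))
```

A session made of transitions that were common in training scores close to zero, while a session full of unseen transitions scores strongly negative, so a threshold on this score could flag candidates for closer inspection.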
  • 11. Paper 3 Abstract: Recently, there has been much excitement in the research community over using social networks to mitigate multiple identity, or Sybil, attacks. A number of schemes have been proposed, but they differ greatly in the algorithms they use and in the networks upon which they are evaluated. As a result, the research community lacks a clear understanding of how these schemes compare against each other, how well they would work on real-world social networks with different structural properties, or whether there exist other (potentially better) ways of Sybil defense. In this paper, we show that, despite their considerable differences, existing Sybil defense schemes work by detecting local communities (i.e., clusters of nodes more tightly knit than the rest of the graph) around a trusted node. Our finding has important implications for both existing and future designs of Sybil defense schemes. First, we show that there is an opportunity to leverage the substantial amount of prior work on general community detection algorithms in order to defend against Sybils. Second, our analysis reveals the fundamental limits of current social network-based Sybil defenses: We demonstrate that networks with well-defined community structure are inherently more vulnerable to Sybil attacks, and that, in such networks, Sybils can carefully target their links in order to make their attacks more effective.
  • 12. In this paper they have proposed: An analysis of Sybil attacks on social networking sites. They show how even a well-structured site can be targeted by such attacks. From this paper we have referred: More information about Sybil attacks on online social networks. We understood that a Sybil attack on an online shopping website cannot completely block the site, but a partial Sybil attack can be carried out through order frauds.
  • 13. Requirement Analysis. Software Requirements: • Shell Script • Apache Hadoop 0.20.x • Pig Script 0.9.1 • Ubuntu 12.04. Hardware Requirements: • Processor: Intel Pentium IV 2.1 GHz or above • Clock speed: 500 MHz • RAM: 128 MB • HD: 20 GB or higher
  • 14. Proposed System (diagram labels): Data gathering; Extraction of weblogs; Storing and structuring data; Pattern matching and map reduce algorithm; Data analysis; HDFS; Data visualization
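The map and reduce stages named in the proposed system can be sketched in plain Python as a stand-in for the Hadoop/Pig job. This is only an illustration under stated assumptions: the log format (`session_id timestamp url`, separated by whitespace) and the function names are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (session_id, url) pairs; malformed lines are skipped."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            session_id, _timestamp, url = parts
            yield session_id, url

def reduce_phase(pairs):
    """Group the pairs by session id and count clicks per URL."""
    grouped = defaultdict(lambda: defaultdict(int))
    for session_id, url in pairs:
        grouped[session_id][url] += 1
    return {sid: dict(urls) for sid, urls in grouped.items()}

# Hypothetical weblog lines in the assumed format.
log_lines = [
    "s1 1385000000 /home",
    "s1 1385000005 /order",
    "s1 1385000006 /order",
    "s2 1385000010 /home",
    "broken line",
]
clicks = reduce_phase(map_phase(log_lines))
print(clicks)
```

The per-session click counts produced this way are the kind of structured data that the later pattern matching and analysis stages would consume.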
  • 15. SYSTEM DIAGRAMS • Class Diagram • State Transition Diagram • System Architecture Diagram • Use Case Diagram • Activity Diagram • Object Diagram • Sequence Diagram • Collaboration Diagram • State Chart Diagram • Component Diagram • Deployment Diagram
  • 16. WORKING OF THE SYSTEM (diagram labels): User 1, User 2, User 3; Extracting Weblogs; Flume Agent
  • 20. Algorithms. The various pattern matching algorithms that can be applied are: 1. Brute force algorithm 2. Boyer-Moore algorithm 3. Not So Naïve algorithm 4. Knuth-Morris-Pratt algorithm. Of the algorithms listed above, we are going to use the Knuth-Morris-Pratt algorithm, since it is the most efficient of them for matching short as well as long patterns.
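A minimal sketch of Knuth-Morris-Pratt matching is given here for reference. It is written over generic sequences, so it works on a string or on a list of page labels; the click sequence and the suspicious pattern in the example are hypothetical and are not taken from the project.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure

def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern items currently matched
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading the text
        if item == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches

# Hypothetical click sequence and a made-up repeated-order pattern.
clicks = ["home", "cart", "order", "cart", "order", "cart", "order"]
suspicious = ["cart", "order", "cart", "order"]
print(kmp_search(clicks, suspicious))   # → [1, 3]
print(kmp_search("abababca", "aba"))    # → [0, 2]
```

Because the failure table lets the search resume from a partial match, each item of the text is examined without backing up, which gives running time linear in the combined length of the text and the pattern.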
  • 21. MATHEMATICAL MODEL. Bernoulli Distribution: This distribution describes any situation where a "trial" is made that results in either "success" or "failure", such as tossing a coin or modeling the success or failure of a surgical procedure. The Bernoulli distribution is defined as $f(x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$, where $p$ is the probability that a particular event (e.g., success) will occur. Arithmetic Mean: The arithmetic mean of a set of data is found by taking the sum of the data and dividing it by the total number of values in the set; it is commonly referred to as the average: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $n$ is the total number of elements. Mode: The mode is the most frequently occurring value in a frequency distribution.
  • 22. Median: The median is the middle value of the data when the values are arranged in order. Variance: The variance ($\sigma^2$) is defined as the sum of the squared distances of each term in the distribution from the mean ($\mu$), divided by the number of terms in the distribution ($N$): $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$.
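The quantities in the mathematical model can be computed with Python's standard `statistics` module, as in the short sketch below. The sample data (clicks per session) and the helper `bernoulli_pmf` are hypothetical and only illustrate the definitions.

```python
from statistics import mean, median, mode, pvariance

def bernoulli_pmf(x, p):
    """f(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

# Hypothetical number of clicks in five sessions.
clicks_per_session = [3, 5, 5, 8, 4]

print(mean(clicks_per_session))       # sum of values / number of values → 5
print(median(clicks_per_session))     # middle of the sorted values → 5
print(mode(clicks_per_session))       # most frequent value → 5
print(pvariance(clicks_per_session))  # population variance (divides by N) → 2.8
print(bernoulli_pmf(1, 0.3))          # probability of "success" → 0.3
```

`pvariance` is used rather than `variance` because the slide's definition divides by the number of terms N, which is the population variance.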
  • 23. Future Scope • This system can be implemented for any online commercial application. • Currently only detection of fraudulent users is performed; the system can be expanded to undertake the necessary authentication steps.
  • 24. References 1. SADAGOPAN, N., AND LI, J. Characterizing typical and atypical user sessions in clickstreams. In Proc. of WWW (2008). 2. WANG, G., KONOLIGE, T., WILSON, C., WANG, X., ZHENG, H., AND ZHAO, B. Y. You are how you click: Clickstream analysis for Sybil detection. 3. PREMCHAISWADI, W., AND ROMSAIYUD, W. Extracting WebLog of Siam University for learning user behavior on MapReduce. 4. YU, H., KAMINSKY, M., GIBBONS, P. B., AND FLAXMAN, A. SybilGuard: Defending against Sybil attacks via social networks. In Proc. of SIGCOMM (2006). 5. DOUCEUR, J. R. The Sybil attack. In Proc. of IPTPS (2002).