Collective Hyping Detection System To Identify Online Spam Activities Using AI 2018
Department of CSE, TOCE 1
CHAPTER 1
PREAMBLE
1.1 Introduction
Many approaches have been successfully developed to detect online spam. Fangtao Li
and colleagues [1] initially analyzed several attributes related to spam behavior, such as content,
sentiment, product, and metadata features, and exploited a two-view semi-supervised method to
identify spam reviews. Song Feng and colleagues [2] defined three types of reviewers (any-time,
multi-time, and single-time reviewers) and statistically characterized the distributional footprints of
deceptive reviews by using natural language processing (NLP) techniques. Geli Fei and
colleagues [3] proposed a model to detect spammed products or product groups by comparing the
differences in rating behaviors between suspicious and normal users.
All these models rely on content features that spammers can easily evade, for example by inserting special
characters, but other features, such as temporal and network information, have been employed as
well. Qian Xu and colleagues [4] collected large-scale real-world datasets from telecommunication
service providers and combined temporal and user network information to classify Short Message
Service (SMS) spammers. Sihong Xie and colleagues [5] proposed a model that uses
only temporal features, with no semantic or rating behavior analysis, to detect abnormal bursts
as the number of reviews increases. Finally, Tyler Moore and colleagues [6] studied the
temporal correlations between spam and phishing websites.
Intuitively, these works can also be used to uncover sophisticated spam strategies.
However, Amazon has sued more than 1,000 product review sellers who sell fake promotions on
Fiverr.com (one of the most famous being Spam Reviewer Cloud;
http://money.cnn.com/2015/10/18/technology/amazon-lawsuit-fake-reviews). On such user
cloud platforms, business owners can pay real users to post anonymous comments. This makes
spam detection very challenging, as the advent of a massive number
of apparently genuine fake reviewers (which we refer to as “genuine fakes” in this article) makes
the fraud pattern much harder to track.
To date, many third-party platforms have created fake review markets that connect online
product sellers with fake review providers. In real-world business processes, massive numbers of
random but genuine fake review providers conduct real transactions and write positive
comments to claim a bonus (many e-commerce websites believe they can reduce spam reviews by
allowing only real buyers to write them). Existing research ignores the latent connections in
product networks, which are difficult to discover, especially now that these spam activities have
become a hyping and advertising investment that is increasingly popular among
homogeneous competitors online. As a result, antispam rules can be easily avoided, which
impairs the efficiency and effectiveness of detection. In this work, we propose a new
task, collaborative marketing hyping detection, which aims to detect groups of online stores
that simultaneously adopt marketing hyping [1].
This field involves various challenges:
• How can a heterogeneous product information network be defined to infer latent
collaborative hyping behaviors? Network information might not be directly observed in the
original datasets, so we need to build a relationship matrix between products to represent
their underlying correlation.
• What features need to be selected to best solve our problem? Traditional features such as
semantic clues or user relations might no longer be suitable for discovering fraud due to rapidly
evolving spam strategies. Hence, we need to choose dedicated features according to our specific
scenario.
• How can we design a model that effectively identifies collaborative marketing hyping
behavior? A model that can employ the power of heterogeneous product networks to discover
collective hyping behavior is required here.
To overcome these challenges, we propose an unsupervised shapelet learning model that discovers
the temporal features of product reviews and then integrates the heterogeneous product network
information as regularization terms to discover the products that are subject to collaborative
hyping. We define three regularization terms that reflect the underlying correlations among
users, products, and online store networks [1].
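To make the notion of a shapelet feature concrete, the Java sketch below computes the distance that shapelet-based models commonly use: the minimum, over all windows of a time series, of the length-normalized squared Euclidean distance between the shapelet and that window. It is an assumed illustration on a hypothetical daily review-count series; it does not learn shapelets and does not include the regularization terms, and the class name `ShapeletDistance` is invented for this example.

```java
/** Hypothetical sketch of the distance feature used by shapelet-based models. */
final class ShapeletDistance {

    /**
     * Returns the minimum, over every window of the series with the shapelet's length L,
     * of (1 / L) * sum_l (series[start + l] - shapelet[l])^2.
     */
    static double minDistance(double[] series, double[] shapelet) {
        int len = shapelet.length;
        if (len == 0 || len > series.length) {
            throw new IllegalArgumentException("shapelet must be non-empty and no longer than the series");
        }
        double best = Double.POSITIVE_INFINITY;
        // Slide the shapelet along the series and keep the best-matching window.
        for (int start = 0; start + len <= series.length; start++) {
            double sum = 0.0;
            for (int l = 0; l < len; l++) {
                double diff = series[start + l] - shapelet[l];
                sum += diff * diff;
            }
            best = Math.min(best, sum / len);
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical daily review counts for one product, with a burst on days 2-4.
        double[] dailyReviews = {0, 1, 1, 9, 9, 1, 0};
        // Hypothetical shapelet describing a sudden jump in reviews.
        double[] burstShapelet = {1, 9, 9};
        System.out.println(minDistance(dailyReviews, burstShapelet)); // prints 0.0
    }
}
```

A small distance means the series contains a window shaped like the shapelet (here, a sudden burst of reviews); distances to several such shapelets could serve as the temporal features of each product.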
1.2 Objectives And Goals
The proposed solution aims to identify spam comments and detect products that adopt an evolving
spam strategy for promotion. Specifically, an unsupervised learning model combines
heterogeneous product review networks to discover collective hyping activities.
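The report does not specify how the heterogeneous product network is built. As an assumption for illustration only, one simple product-product relation is the number of reviewers two products share; the Java sketch below builds that symmetric matrix from hypothetical `{reviewerId, productIndex}` pairs. `CoReviewMatrix` and its input format are invented names, and product indices are assumed to lie in `0..numProducts-1`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: product-product relationship matrix weighted by shared reviewers. */
final class CoReviewMatrix {

    /**
     * @param reviews     rows of {reviewerId, productIndex}
     * @param numProducts number of products; indices must lie in 0..numProducts-1
     * @return symmetric matrix whose entry [i][j] counts reviewers who reviewed both i and j
     */
    static int[][] build(int[][] reviews, int numProducts) {
        // Group the distinct products reviewed by each reviewer (repeat reviews count once).
        Map<Integer, Set<Integer>> productsByReviewer = new HashMap<>();
        for (int[] review : reviews) {
            productsByReviewer.computeIfAbsent(review[0], id -> new HashSet<>()).add(review[1]);
        }
        int[][] matrix = new int[numProducts][numProducts];
        // Each pair of distinct products sharing a reviewer gains one unit of weight.
        for (Set<Integer> products : productsByReviewer.values()) {
            Integer[] p = products.toArray(new Integer[0]);
            for (int x = 0; x < p.length; x++) {
                for (int y = x + 1; y < p.length; y++) {
                    matrix[p[x]][p[y]]++;
                    matrix[p[y]][p[x]]++;
                }
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Reviewer 1 reviewed products 0 and 1; reviewer 2 reviewed products 0, 1 and 2.
        int[][] reviews = {{1, 0}, {1, 1}, {2, 0}, {2, 1}, {2, 2}};
        System.out.println(Arrays.deepToString(build(reviews, 3)));
        // prints [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
    }
}
```

Store membership could be encoded the same way (two products linked when sold by the same store), giving one matrix per relation.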
1.3 Existing System
Traditional features such as semantic clues or user relations might no longer be suitable
for discovering fraud due to rapidly evolving spam strategies. Hence, we need to choose
dedicated features according to our specific scenario.
Disadvantages:
• Suffers from the inaccuracy caused by using only the user name information.
• Overlooks the fact that stores usually purchase fake reviews periodically.
1.4 Proposed System
We propose an unsupervised shapelet learning model to discover the temporal features of
product reviews and then integrate the heterogeneous product network information as
regularization terms to discover the products that are subject to collaborative hyping. We define
three regularization terms that reflect the underlying correlations among users, products, and
online store networks.
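The exact form of the three regularization terms is not given here. A common way to turn a network into a regularizer, shown only as an assumed illustration, is the graph smoothness penalty: the sum over i < j of A[i][j] * (s[i] - s[j])^2, which equals s^T L s with the graph Laplacian L = D - A for a symmetric adjacency matrix A. It is small when strongly connected products receive similar hyping scores s. The hypothetical `GraphRegularizer` class computes this penalty for one network; a model with user, product, and store networks could add one weighted term per network to its objective.

```java
/** Hypothetical graph-smoothness regularizer over a product network. */
final class GraphRegularizer {

    /**
     * Returns the sum over i < j of adjacency[i][j] * (scores[i] - scores[j])^2,
     * which equals s^T L s for a symmetric adjacency matrix and Laplacian L = D - A.
     */
    static double penalty(double[][] adjacency, double[] scores) {
        int n = scores.length;
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double diff = scores[i] - scores[j];
                // Strongly linked products with different scores add a large penalty.
                total += adjacency[i][j] * diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Products 0 and 1 are strongly linked; product 2 is weakly linked to product 1.
        double[][] adjacency = {{0, 2, 0}, {2, 0, 1}, {0, 1, 0}};
        double[] hypingScores = {1.0, 1.0, 0.0};
        System.out.println(penalty(adjacency, hypingScores)); // prints 1.0
    }
}
```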
Advantages:
• Addresses marketing hyping, which has gained increased popularity among homogeneous competitors online.
• Improves the efficiency and effectiveness of detection performance.
• Detects groups of online stores that simultaneously adopt marketing hyping.
CHAPTER 2
LITERATURE SURVEY
• Learning to Identify Review Spam by Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan
Zhu [1]
In this paper, we study the review spam identification task in our product review mining
system. We manually build a review spam collection based on our crawled reviews. We first
employ supervised learning methods and analyze the effect of different features in review spam
identification. We also observe that a spammer consistently writes spam. This provides us
another view to identify review spam: we can identify whether the author of the review is a spammer.
Based on this observation, we provide a two-view semi-supervised method to exploit the large
amount of unlabeled data. The experimental results show that the two-view co-training algorithms
can achieve better results than the single-view algorithm. Our designed machine learning
methods achieve significant improvements compared with the heuristic baselines.
• Distributional Footprints of Deceptive Product Reviews by Song Feng, Longfei Xing,
Anupam Gogar & Yejin Choi [2]
This paper postulates that there are natural distributions of opinions in product reviews.
In particular, we hypothesize that for a given domain, there is a set of representative
distributions of review rating scores. A deceptive business entity that hires people to write fake
reviews will necessarily distort its distribution of review scores, leaving distributional footprints
behind. To validate this hypothesis, we introduce strategies to create a dataset with a
pseudo-gold standard that is labeled automatically based on different types of distributional
footprints. A range of experiments confirms the hypothesized connection between the
distributional anomaly and deceptive reviews. This study also provides novel quantitative
insights into the characteristics of natural distributions of opinions in the TripAdvisor hotel
review and the Amazon product review domains.
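The summary above does not state which distance the authors use to compare rating distributions. As an assumed illustration, the Java sketch below normalizes 1- to 5-star counts into proportions and measures how far a product's distribution lies from a reference distribution with the total variation distance (half the L1 distance). The class `RatingFootprint` and the example counts are hypothetical; a large distance is only one signal of a distorted distribution, not proof of deception.

```java
/** Hypothetical sketch comparing a product's rating distribution with a reference distribution. */
final class RatingFootprint {

    /** Converts counts of 1- to 5-star ratings into proportions that sum to 1. */
    static double[] normalize(int[] counts) {
        double total = 0.0;
        for (int c : counts) total += c;
        if (total == 0.0) {
            throw new IllegalArgumentException("no ratings");
        }
        double[] proportions = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            proportions[i] = counts[i] / total;
        }
        return proportions;
    }

    /** Total variation distance: 0 for identical distributions, 1 for non-overlapping ones. */
    static double totalVariation(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions must have the same number of bins");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            sum += Math.abs(p[i] - q[i]);
        }
        return sum / 2.0;
    }

    public static void main(String[] args) {
        // Assumed reference distribution for the domain (counts of 1..5 stars).
        double[] reference = normalize(new int[] {10, 10, 20, 30, 30});
        // Hypothetical product whose ratings are concentrated on 5 stars.
        double[] product = normalize(new int[] {0, 0, 0, 10, 90});
        System.out.println(totalVariation(reference, product)); // about 0.6, up to floating-point rounding
    }
}
```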
• Exploiting Burstiness in Reviews for Review Spammer Detection by Geli Fei, Arjun
Mukherjee & Bing Liu [3]
In this paper, we proposed to exploit bursts in detecting opinion spammers, because reviewers
in the same burst tend to be similar in nature. A graph propagation method for identifying spammers
was presented. A novel evaluation method based on supervised learning was also described to
deal with the difficult problem of evaluation without ground truth data; it classifies reviews
using a set of features different from those used to identify spammers. Our
experimental results on Amazon.com reviews from the software domain showed that the
proposed method is effective, both objectively, based on supervised learning (classification),
and subjectively, based on human expert evaluation. The fact that the supervised
classification results are consistent with human judgment also indicates that the proposed
supervised-learning-based evaluation technique is
justified.
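The summary does not describe how bursts are detected. The Java sketch below uses a deliberately simple rule, assumed for illustration and not necessarily the method of Fei et al.: a day is flagged as part of a burst when its review count exceeds the mean plus k standard deviations of the series. `BurstDetector` and the threshold parameter `k` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical burst detector over a product's daily review counts. */
final class BurstDetector {

    /** Returns the indices of days whose count exceeds mean + k * (population) standard deviation. */
    static List<Integer> burstDays(int[] dailyCounts, double k) {
        int n = dailyCounts.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty series");
        }
        double mean = 0.0;
        for (int c : dailyCounts) mean += c;
        mean /= n;
        double variance = 0.0;
        for (int c : dailyCounts) variance += (c - mean) * (c - mean);
        double std = Math.sqrt(variance / n);
        double threshold = mean + k * std;
        List<Integer> bursts = new ArrayList<>();
        for (int day = 0; day < n; day++) {
            if (dailyCounts[day] > threshold) bursts.add(day);
        }
        return bursts;
    }

    public static void main(String[] args) {
        // Hypothetical daily review counts with a spike on days 4 and 5.
        int[] counts = {2, 1, 3, 2, 25, 30, 2, 1};
        System.out.println(burstDays(counts, 1.0)); // prints [4, 5]
    }
}
```

Because the spike itself inflates the mean and standard deviation, a sliding baseline or a median-based statistic would be a natural refinement of this sketch.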
• SMS Spam Detection Using Non-Content Features by Qian Xu, Evan Wei Xiang
and Qiang Yang [4]
In this paper, we have examined mobile-phone SMS message features from static,
network, and temporal views, and proposed an effective way to identify important features that
can be used to construct an anti-spam algorithm. We exploited a temporal analysis to design
features that can detect SMS spammers with high performance, and incorporated these
features into an SVM classification algorithm. Our evaluation on a real SMS dataset showed that
the temporal and network features can be effectively incorporated to build an SVM
classifier, with an AUC improvement of around 8% compared with classifiers based
only on conventional static features.
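AUC, the metric in which the improvement above is reported, equals the probability that a randomly chosen spammer receives a higher classifier score than a randomly chosen legitimate sender, with ties counted as one half. The Java sketch below computes it directly from that definition in O(n·m) time; `AucCalculator` and the example scores are hypothetical.

```java
/** Hypothetical sketch: AUC as the probability that a positive outscores a negative. */
final class AucCalculator {

    static double auc(double[] positiveScores, double[] negativeScores) {
        if (positiveScores.length == 0 || negativeScores.length == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative score");
        }
        double wins = 0.0;
        // Compare every (spammer, legitimate) pair; ties contribute one half.
        for (double p : positiveScores) {
            for (double n : negativeScores) {
                if (p > n) {
                    wins += 1.0;
                } else if (p == n) {
                    wins += 0.5;
                }
            }
        }
        return wins / ((double) positiveScores.length * negativeScores.length);
    }

    public static void main(String[] args) {
        double[] spammers = {0.9, 0.8, 0.4};   // hypothetical classifier scores for spammers
        double[] legitimate = {0.7, 0.3, 0.2}; // hypothetical scores for legitimate senders
        System.out.println(auc(spammers, legitimate)); // 8 of 9 pairs ranked correctly
    }
}
```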
• Temporal Correlations between Spam and Phishing Websites by Tyler Moore,
Richard Clayton & Henry Stern [6]
Empirical study of malicious online activity is hard. Attackers remain elusive,
compromises happen fast, and strategies change frequently. Unfortunately, each of these factors
cannot be changed. In this paper, we have combined phishing website lifetimes with detailed
spam data, and consequently we have provided several new insights. First, we have
demonstrated the gravity of the threat posed by attackers using fast-flux techniques. They send
out 68% of spam while hosting only 3% of all phishing websites. They also transmit spam
effectively: the bulk is sent out early, it stops once the site is removed, and keeps going
whenever websites are overlooked by the take-down companies. In this respect, we also
conclude that long-lived phishing websites continue to cause harm and should be taken down.
• A Shapelet Transform for Time Series Classification by Jason Lines, Luke M. Davis &
Anthony Bagnall [8]
In this paper, we have proposed a shapelet transform for TSC that extracts the k best
shapelets from a dataset in a single pass. We implement this using a novel caching algorithm to
store shapelets, and apply a simple, parameter-free cross-validation approach for extracting the
most significant shapelets. We transform a total of 26 data sets with our filter and demonstrate
that a C4.5 decision tree classifier trained with transformed data is competitive with an
implementation of the original shapelet decision tree. We show that our filtered data can be
applied to further, non-tree based classifiers to achieve improved classification performance,
whilst still maintaining the interpretability of shapelets. We provide two implementations of the
filter using different quality measures for discriminating between shapelets: we use information
gain, as proposed in the original shapelet work, in the first, and introduce the application of the
F-statistic as an evaluation method for shapelets in the second. We show that classifiers trained using features derived from
an F-statistic filter are competitive with classifiers trained with the information gain approach,
whilst being easier to apply to multi-class classification problems. Finally, we provide
exploratory data analysis of the shapelets extracted by our filter on the Gun/NoGun problem
and compare them with the output of [20]. We show that the shapelets we find are consistent with
the discriminatory shapelet in the original work, and that our approach can lead to further
insight into the problem by looking at a number of the top shapelets.
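The F-statistic quality measure mentioned above can be illustrated with a one-way ANOVA over a shapelet's distances to the training series, grouped by class: F = (SSB / (C - 1)) / (SSW / (N - C)), where SSB and SSW are the between-class and within-class sums of squares, C is the number of classes, and N is the number of series. A larger F suggests the shapelet separates the classes better. The sketch below is a generic ANOVA computation, not the paper's code, and `ShapeletFStat` is a hypothetical name.

```java
/** Hypothetical sketch: one-way ANOVA F-statistic over shapelet distances grouped by class. */
final class ShapeletFStat {

    /** distancesByClass[c] holds the shapelet's distances to every series labelled with class c. */
    static double fStatistic(double[][] distancesByClass) {
        int numClasses = distancesByClass.length;
        if (numClasses < 2) {
            throw new IllegalArgumentException("need at least two classes");
        }
        int total = 0;
        double grandSum = 0.0;
        for (double[] group : distancesByClass) {
            if (group.length == 0) {
                throw new IllegalArgumentException("every class needs at least one series");
            }
            for (double d : group) grandSum += d;
            total += group.length;
        }
        if (total <= numClasses) {
            throw new IllegalArgumentException("need more series than classes");
        }
        double grandMean = grandSum / total;
        double between = 0.0; // SSB: spread of class means around the grand mean
        double within = 0.0;  // SSW: spread of distances around their own class mean
        for (double[] group : distancesByClass) {
            double mean = 0.0;
            for (double d : group) mean += d;
            mean /= group.length;
            between += group.length * (mean - grandMean) * (mean - grandMean);
            for (double d : group) within += (d - mean) * (d - mean);
        }
        return (between / (numClasses - 1)) / (within / (total - numClasses));
    }

    public static void main(String[] args) {
        // Hypothetical shapelet distances: small for class A, large for class B.
        double[][] distances = {{1, 2, 3}, {7, 8, 9}};
        System.out.println(fStatistic(distances)); // prints 54.0
    }
}
```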
CHAPTER 3
SYSTEM DESIGN
3.1 Design Consideration
The purpose of design is to plan a solution to the problem specified by the requirements
document. This phase is the first step in moving from the problem domain to the solution domain.
In other words, starting with what is needed, design takes us toward how to satisfy
those needs. The design of the system is perhaps the most critical factor affecting
the quality of the software and has a major impact on the later phases, especially testing and
maintenance.
The system design describes all the major data structures, file formats, outputs, and major
modules in the system, and their specifications are decided.
3.2 System Architecture
The architectural design process is concerned with establishing a basic structural
framework for a system. It involves identifying the major components of the system and the
communications between these components.
The initial design process of identifying these subsystems and establishing a
framework for subsystem control and communication is called architectural design, and
the output of this design process is a description of the software architecture [5].
The proposed architecture for this system is given below. It shows how the system is
designed and briefly how it works.
Fig 3.1 System Architecture (components: Spammer Cloud, Fake Reviewers, Online Stores, Target Products; labelled flows: Post TP and Pay ($), Fake Review Quality Guarantee, Purchase & Hyping (€), Check Hyping Quality and Pay ($). Legend: (€) fake reviewers make genuine purchases; ($) store owners pay fake reviewers and the purchasing cost through the user cloud.)
3.3 Use Case Diagram
A use case diagram shows the various interactions of actors with a system. A use case is a
coherent piece of functionality that a system can provide by interacting with actors. Actors are
the external end users of the system.
Fig 3.2 Use Case Diagram (actors: User, Admin; use cases: Register, Login, Browse products, Buy products, Give reviews, Analyze reviews, Detect spams, Display spammers, Block spam users)
3.4 Dataflow Diagram
The DFD is a simple graphical formalism that can be used to represent a
system in terms of the input data to the system, the various processing carried out on this
data, and the output data generated by the system.
A DFD model uses a very limited number of primitive symbols to represent
the functions performed by a system and the data flow among the functions.
The main reason why the DFD technique is so popular is probably that DFD is a
very simple formalism: it is easy to understand and use. Starting with the set of high-level
functions that a system performs, a DFD model hierarchically represents various
subfunctions. In fact, any hierarchical model is easy to understand.
Fig 3.3 Data Flow Diagram (external entity: website user; processes and data stores: search products, list of products, buy product, review products, get total review, set of reviews, extract feature, NLP, spam detection, blocked user, fetch details, get details, request data, extract data)
3.5 Activity Diagram
An activity diagram is another important UML diagram used to describe the dynamic aspects of
the system. It is basically a flowchart that represents the flow from one activity to
another, where an activity is an operation of the system, so the control flow
is drawn from one operation to another.
Fig 3.4 Activity Diagram (activities: Browse products, Buy products, Feedback, Process review, Natural language processing, Evaluate users / spam analysis, Results for spammers)
CHAPTER 4
SYSTEM REQUIREMENT SPECIFICATION
A System Requirement Specification (SRS) is a central document that forms the foundation of
the software development process. It records the requirements of a system and describes its
major features. An SRS is essentially an organization's written understanding of a client's or
potential customer's system requirements and dependencies at a particular point in
time, usually before any actual design or development work. It is a two-way
insurance policy that ensures that both the client and the organization understand
each other's requirements from that perspective at a given point in time.
Writing a software requirements specification reduces development effort, as a careful
review of the document can reveal omissions, misunderstandings, and inconsistencies early
in the development cycle, when these problems are easier to correct. The SRS discusses
the product but not the project that developed it; hence the SRS serves as a basis
for later enhancement of the finished product.
The SRS may need to be changed, but it does provide a foundation for continued
production evaluation. In simple words, the software requirement specification is the
starting point of the software development activity. The SRS translates the ideas in
the minds of the clients (the input) into a formal document (the output of the requirements
phase). Thus the output of this phase is a set of formally specified requirements, which
ideally are complete and consistent, while the input has none of these properties.
4.1 Hardware Requirements
The most common set of requirements defined by any operating system or software
application is the physical computer resources, also known as hardware. A hardware
requirements list is often accompanied by a hardware compatibility list (HCL), especially
in the case of operating systems. The hardware requirements are as follows:
• Processor : Intel Core i3, 2.1 GHz
• Memory : 4 GB
• Hard Disk : 40 GB
• Monitor : 15-inch VGA color
4.2 Software Requirements
Software requirements may be calculations, technical details, data manipulation
and processing and other specific functionality that define what a system is supposed to
accomplish. Behavioural requirements describing all the cases where the system uses the
functional requirements are captured in use cases. These are things that the system is
required to do.
• Operating System : Windows 7 / 8
• Language : Java / J2EE
• Database : MySQL
• Tools : NetBeans, Navicat, Tomcat Server
Java Server Pages
JavaServer Pages (JSP) is a technology that helps software developers create
dynamically generated web pages based on HTML, XML, or other document types. Released in
1999 by Sun Microsystems, JSP is similar to PHP, but it uses the Java programming language.
To deploy and run JavaServer Pages, a compatible web server with a servlet container, such
as Apache Tomcat or Jetty, is required.
Architecturally, JSP may be viewed as a high-level abstraction of Java servlets. JSPs are
translated into servlets at runtime; each JSP's servlet is cached and reused until the original JSP is
modified.
JSP can be used independently or as the view component of a server-side model–view–
controller design, normally with JavaBeans as the model and Java servlets (or a framework such
as Apache Struts) as the controller. This is a type of Model 2 architecture.
JSP allows Java code and certain pre-defined actions to be interleaved with static web
markup content, with the resulting page being compiled and executed on the server to deliver a
document. The compiled pages, as well as any dependent Java libraries, use Java bytecode rather
than a native software format. Like any other Java program, they must be executed within a Java
virtual machine (JVM) that integrates with the server's host operating system to provide an
abstract platform-neutral environment.
JSPs are usually used to deliver HTML and XML documents, but through the use of
OutputStream, they can deliver other types of data as well. The web container creates JSP
implicit objects such as pageContext, application (the ServletContext), session, request, and response.
Fig 4.1 JSP Model (components: web browser; server containing a servlet filter (controller), JSP pages (view), and JavaBeans (model); data sources/database)
A JavaServer Pages compiler is a program that parses JSPs and transforms them into
executable Java servlets. A program of this type is usually embedded into the application
server and run automatically the first time a JSP is accessed, but pages may also be precompiled
for better performance, or compiled as a part of the build process to test for errors. Some JSP
containers support configuring how often the container checks JSP file timestamps to see
whether the page has changed. Typically, this interval would be set to a short period
(perhaps seconds) during software development, and a longer period (perhaps minutes, or even
never) for a deployed web application.
Java Servlet
A servlet is a Java programming language class used to extend the capabilities of
a server. Although servlets can respond to any type of request, they are commonly used to
extend the applications hosted by web servers, so they can be thought of as Java applets that run
on servers instead of in web browsers. These kinds of servlets are the Java counterpart to other
dynamic Web content technologies such as PHP and ASP.NET.
Fig. 4.2 Life of a JSP File (translation phase: JSP page (.jsp) → JSP translator (Tomcat) → servlet source code (.java) → Java compiler (embedded in the server) → servlet class (.class); execution phase: the class runs in the JRE and the response is written to an in-memory text buffer. (a) Translation occurs at this point if the JSP has been changed or is new. (b) If not, translation is skipped.)
Three methods are central to the life cycle of a servlet: init(), service(),
and destroy(). They are implemented by every servlet and are invoked at specific times by the
server.
• During the initialization stage of the servlet life cycle, the web container initializes the
servlet instance by calling the init() method, passing an object implementing the
javax.servlet.ServletConfig interface. This configuration object allows the servlet to
access name-value initialization parameters from the web application.
• After initialization, the servlet instance can service client requests. Each request is serviced
in its own separate thread. The web container calls the service() method of the servlet for
every request. The service() method determines the kind of request being made and
dispatches it to an appropriate method to handle the request. The developer of the servlet
must provide an implementation for these methods. If a request is made for a method that is
not implemented by the servlet, the method of the parent class is called, typically resulting in
an error being returned to the requester.
• Finally, the web container calls the destroy() method, which takes the servlet out of service.
The destroy() method, like init(), is called only once in the lifecycle of a servlet.
The following is a typical user scenario of these methods.
1. Assume that a user requests to visit a URL.
   • The browser then generates an HTTP request for this URL.
   • This request is then sent to the appropriate server.
2. The HTTP request is received by the web server and forwarded to the servlet container.
   • The container maps this request to a particular servlet.
   • The servlet is dynamically retrieved and loaded into the address space of the
container.
3. The container invokes the init() method of the servlet.
   • This method is invoked only when the servlet is first loaded into memory.
   • It is possible to pass initialization parameters to the servlet so that it may configure
itself.
4. The container invokes the service() method of the servlet.
   • This method is called to process the HTTP request.
   • The servlet may read data that have been provided in the HTTP request.
   • The servlet may also formulate an HTTP response for the client.
5. The servlet remains in the container's address space and is available to process any other
HTTP requests received from clients.
   • The service() method is called for each HTTP request.
6. The container may, at some point, decide to unload the servlet from its memory.
   • The algorithms by which this decision is made are specific to each container.
7. The container calls the servlet's destroy() method to relinquish any resources such as file
handles that are allocated for the servlet; important data may be saved to a persistent store.
8. The memory allocated for the servlet and its objects can then be garbage collected.
MySQL
MySQL is an open-source relational database management system that uses SQL. Structured Query
Language (SQL) is a special-purpose programming language designed for managing data held in a
relational database management system (RDBMS). Originally based
upon relational algebra and tuple relational calculus, SQL consists of a data definition
language and a data manipulation language. The scope of SQL includes data insert,
query, update and delete, schema creation and modification, and data access control. Although
SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes
procedural elements.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks." Despite not entirely adhering to the relational model as described by Codd, it became
the most widely used database language.
SQL became a standard of the American National Standards Institute (ANSI) in 1986,
and of the International Organization for Standardization (ISO) in 1987. Since then, the standard
has been enhanced several times with added features. Despite these
standards, code is not completely portable among different database systems, which can lead
to vendor lock-in. Different vendors do not perfectly adhere to the standard, for instance by
adding extensions, and the standard itself is sometimes ambiguous.
NetBeans IDE
NetBeans IDE is the official IDE for Java 8. With its editors, code analyzers, and
converters, you can quickly and smoothly upgrade your applications to use new Java 8 language
constructs, such as lambdas, functional operations, and method references. Batch analyzers and
converters are provided to search through multiple applications at the same time, matching
patterns for conversion to new Java 8 language constructs. With its constantly improving Java
Editor, many rich features and an extensive range of tools, templates and samples, NetBeans
IDE sets the standard for developing with cutting edge technologies out of the box. An IDE is
much more than a text editor. The NetBeans Editor indents lines, matches words and brackets,
and highlights source code syntactically and semantically. It also provides code templates, coding
tips, and refactoring tools. The editor supports many languages from Java, C/C++, XML and
HTML, to PHP, Groovy, Javadoc, JavaScript and JSP. Because the editor is extensible, you can
plug in support for many other languages. Keeping a clear overview of large applications, with
thousands of folders and files, and millions of lines of code, is a daunting task. NetBeans IDE
provides different views of your data, from multiple project windows to helpful tools for setting
up your applications and managing them efficiently, letting you drill down into your data
quickly and easily, while giving you versioning tools via Subversion, Mercurial, and Git
integration out of the box. When new developers join your project, they can understand the
structure of your application because your code is well-organized.
Design GUIs for Java SE, HTML5, Java EE, PHP, C/C++, and Java ME applications
quickly and smoothly by using editors and drag-and-drop tools in the IDE. For Java SE
applications, the NetBeans GUI Builder automatically takes care of correct spacing and
alignment, while supporting in-place editing, as well. The GUI builder is so easy to use and
intuitive that it has been used to prototype GUIs live at customer presentations. The cost of
buggy code increases the longer it remains unfixed. NetBeans provides static analysis tools,
especially integration with the widely used FindBugs tool, for identifying and fixing common
problems in Java code. In addition, the NetBeans Debugger lets you place breakpoints in your
source code, add field watches, step through your code, run into methods.
The NetBeans Profiler provides expert assistance for optimizing your application's speed and
memory usage, and makes it easier to build reliable and scalable Java SE, JavaFX and Java EE
applications. NetBeans IDE includes a visual debugger for Java SE applications, letting you
debug user interfaces without looking into source code. Take GUI snapshots of your applications
and click on user interface elements to jump back into the related source code.
Fig. 4.3 Snapshot of NetBeans
Apache
The Apache HTTP Server is web server software notable for playing a key role in the
initial growth of the World Wide Web. In 2009 it became the first web server software to
surpass the 100 million web site milestone. Apache is developed and maintained by an open
community of developers under the auspices of the Apache Software Foundation. Since April
1996 Apache has been the most popular HTTP server software in use. As of November 2010
Apache served over 59.36% of all websites and over 66.56% of the first one million busiest
websites.
Navicat Premium
Navicat Premium is a multi-connection database administration tool that allows you to
connect to MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL databases
simultaneously within a single application, making it easy to administer multiple kinds of
database.
Navicat Premium combines the functions of other Navicat members and supports most of
the features in MySQL, MariaDB, SQL Server, SQLite, Oracle and PostgreSQL including
Stored Procedure, Event, Trigger, Function, View, etc.
Navicat Premium enables you to transfer data easily and quickly across various database
systems, or to a plain text file with the designated SQL format and encoding. Batch jobs for
different kinds of databases can also be scheduled to run at a specific time. Other features
include an Import/Export Wizard, Query Builder, Report Builder, Data Synchronization, Backup,
Job Scheduler and more. Navicat's features are sophisticated enough to meet the specific needs
of professional developers, yet easy to learn for users who are new to database servers.
Navicat can establish a secure SSH session through SSH tunnelling, which provides strong
authentication and encrypted communication between two hosts. Authentication can use a
password or a public/private key pair. Navicat also provides HTTP tunnelling for cases where
an ISP does not allow direct connections to its database servers but does allow HTTP
connections. HTTP tunnelling is a method of connecting to a server that uses the same
protocol (http://) and the same port (port 80) as a web server does.
CHAPTER 5
IMPLEMENTATION
5.1 Programming Language Selection
Java is a small, simple, safe, object-oriented, interpreted or dynamically optimized, bytecoded,
architecture-neutral, garbage-collected, multithreaded programming language with a strongly
typed exception-handling mechanism for writing distributed and dynamically extensible programs.
With most programming languages, you either compile or interpret a program so that you can
run it on your computer. The Java programming language is unusual in that a program is both
compiled and interpreted. The compiler translates a program into Java bytecodes, the
platform-independent codes that are interpreted by the interpreter on the Java platform. The
interpreter parses and runs each Java bytecode instruction on the computer. Compilation
happens just once; interpretation occurs each time the program is executed. The accompanying
figure illustrates how this works. You can think of Java bytecodes as the machine code
instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it is a
development tool or a Web browser that can run applets, is an implementation of the Java VM.
Fig. 5.1 Features of Java
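As a minimal illustration of the compile-once, interpret-everywhere model described above, the program below is compiled a single time with `javac PlatformInfo.java` into `PlatformInfo.class` bytecode, and that same class file can then be run with `java PlatformInfo` on any platform that has a JVM. The class name `PlatformInfo` and its `describe` method are hypothetical and not part of the project code; the sketch only reads standard system properties to show which OS and JVM are executing the bytecode.

```java
import java.util.Properties;

// Hypothetical example class, not part of the project code.
// Compile once:  javac PlatformInfo.java   (produces PlatformInfo.class bytecode)
// Run anywhere:  java PlatformInfo         (the local JVM interprets the bytecode)
final class PlatformInfo {

    // Builds a one-line description of the platform the bytecode is running on,
    // falling back to "unknown" when a property is missing.
    static String describe(Properties props) {
        return "OS: " + props.getProperty("os.name", "unknown")
                + ", JVM: " + props.getProperty("java.vm.name", "unknown")
                + ", Java version: " + props.getProperty("java.version", "unknown");
    }

    public static void main(String[] args) {
        // The same class file prints different values on Windows, Linux or macOS,
        // because only the JVM underneath changes, not the bytecode.
        System.out.println(describe(System.getProperties()));
    }
}
```

The output depends on the machine, which is the point: the bytecode is identical, only the virtual machine running it differs.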
5.2 Selection of Platform
A platform is the hardware or software environment in which a program runs. As mentioned
earlier, some of the most popular platforms are Windows 2000, Linux, Solaris, and MacOS.
Most platforms can be described as a combination of the operating system and hardware. The
Java platform differs from most other platforms in that it is a software-only platform that runs on
top of other hardware-based platforms.
The Java platform has two components:
• The Java Virtual Machine (JVM)
• The Java Application Programming Interface (Java API)
We’ve already been introduced to the Java VM. It’s the base for the Java platform and is ported
onto various hardware-based platforms.
The Java API is a large collection of ready-made software components that provide many useful
capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into
libraries of related classes and interfaces; these libraries are known as packages.
The figure depicts a program that’s running on the Java platform. As the figure shows, the Java
API and the virtual machine insulate the program from the hardware.
Fig. 5.2 Java Interpreter Architecture (layers from top to bottom: myProgram.java, Java API, Java Virtual Machine, hardware-based platform; the Java API and the Java Virtual Machine together form the Java platform)
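To make the idea of Java API packages concrete, the hypothetical helper below (class and method names are illustrative, not from the project) relies only on ready-made classes from two standard packages, `java.time` and `java.util`, to turn a list of review dates into per-day review counts, a plausible input for the temporal features described in Section 5.3. It assumes Java 9 or later for `List.of`.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the project code. It uses only ready-made
// classes from two Java API packages: java.time (dates) and java.util (collections).
final class DailyReviewCounts {

    // Counts how many reviews fall on each calendar day; TreeMap keeps days sorted.
    static Map<LocalDate, Integer> countPerDay(List<LocalDate> reviewDates) {
        Map<LocalDate, Integer> counts = new TreeMap<>();
        for (LocalDate day : reviewDates) {
            counts.merge(day, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = List.of(
                LocalDate.of(2018, 5, 1), LocalDate.of(2018, 5, 1), LocalDate.of(2018, 5, 2));
        System.out.println(countPerDay(dates)); // prints {2018-05-01=2, 2018-05-02=1}
    }
}
```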
5.3 Functional Description of Modules
• Shapelet Learning Model: Shapelets are discriminative subsequences of time series that
best predict the target variable. Shapelet learning models are usually designed for
classification, with the aim of identifying the similarity between two items.[4]
• Product Network Regularization: The product network provides correlation
information about all online stores. We model three types of heterogeneous information
networks as regularization terms: store-based regularization, product-based regularization,
and user-correlation regularization.[9]
• Collaborative Hyping Detection Model: We propose our collaborative hyping
detection model (CHDM) to solve the collective marketing hyping problem defined
earlier. This model integrates all the regularization terms defined above into a shapelet
learning model that uses temporal features and product network information for
clustering.
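Two of the building blocks listed above can be sketched in a few lines of Java. This is not the CHDM implementation from [9]; it is a simplified illustration under stated assumptions. The shapelet distance is taken as the minimum, over every equally long window of a series, of the mean squared difference to the shapelet (similar to the distance used in shapelet-learning work such as [7]), and each network regularizer is modelled as a weighted graph-smoothness penalty that grows when linked stores, products or users have dissimilar feature vectors. The class and method names are hypothetical.

```java
// Hypothetical sketch of two building blocks named in Section 5.3; NOT the CHDM code.
// Assumptions: shapelet distance = minimum mean squared difference over sliding windows;
// network regularizer = weighted graph-smoothness penalty over linked entities.
final class HypingModelSketch {

    // Minimum, over all length-|s| windows of t, of the mean squared difference to s.
    static double shapeletDistance(double[] t, double[] s) {
        if (s.length == 0 || s.length > t.length) {
            throw new IllegalArgumentException("shapelet must be non-empty and no longer than the series");
        }
        double best = Double.POSITIVE_INFINITY;
        for (int start = 0; start + s.length <= t.length; start++) {
            double sum = 0.0;
            for (int k = 0; k < s.length; k++) {
                double diff = t[start + k] - s[k];
                sum += diff * diff;
            }
            best = Math.min(best, sum / s.length);
        }
        return best;
    }

    // Graph smoothness penalty: sum over edges (i, j) of w[i][j] * ||f_i - f_j||^2.
    // w is a symmetric adjacency matrix (e.g. store, product or user links);
    // each edge is counted once by visiting only pairs with i < j.
    static double networkRegularizer(double[][] f, double[][] w) {
        double penalty = 0.0;
        for (int i = 0; i < f.length; i++) {
            for (int j = i + 1; j < f.length; j++) {
                if (w[i][j] == 0.0) {
                    continue; // unlinked pair contributes nothing
                }
                double sq = 0.0;
                for (int d = 0; d < f[i].length; d++) {
                    double diff = f[i][d] - f[j][d];
                    sq += diff * diff;
                }
                penalty += w[i][j] * sq;
            }
        }
        return penalty;
    }

    public static void main(String[] args) {
        double[] reviews = {1, 1, 2, 9, 10, 2, 1}; // daily review counts with a burst
        double[] spike = {9, 10};                  // candidate shapelet: a two-day spike
        System.out.println(shapeletDistance(reviews, spike)); // prints 0.0

        // Stores 0 and 1 are linked with weight 2; store 2 is unlinked.
        double[][] features = {{0.0}, {1.0}, {5.0}};
        double[][] links = {{0, 2, 0}, {2, 0, 0}, {0, 0, 0}};
        System.out.println(networkRegularizer(features, links)); // prints 2.0
    }
}
```

Running `main` prints `0.0` (the spike occurs exactly in the series) and then `2.0` (only the linked pair contributes). In a full model, one such penalty per network (store, product, user) would presumably be weighted and added to the clustering objective; the actual weights and optimization procedure of CHDM are not reproduced here.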
CHAPTER 6
SYSTEM TESTING
Types of Testing:
• Unit Testing
Individual components are tested to ensure that they operate correctly. Each
component is tested independently, without the other system components. This system
was tested with a set of proper test data for each module, and the results were
checked against the expected output. Unit testing focuses verification effort on the
smallest unit of software design, the module, which is why it is also known as MODULE
TESTING. This testing was carried out during each phase, and each module was found to
work satisfactorily with respect to its expected output.
• Integration Testing
Integration testing is another aspect of testing that is generally done to
uncover errors associated with the flow of data across interfaces. The unit-tested
modules are grouped together and tested in small segments, which makes it easier to
isolate and correct errors. This approach is continued until all modules have been
integrated to form the system as a whole.
• System Testing
System testing is a series of different tests whose primary purpose is to fully
exercise the computer-based system. It ensures that the entire integrated
software system meets its requirements, and it tests a configuration to ensure known and
predictable results. An example of system testing is configuration-oriented
system integration testing. System testing is based on process descriptions and flows,
emphasizing pre-driven processes and integration points.
• Performance Testing
Performance testing ensures that output is produced within the time limits, and it
measures the time the system takes to compile, to respond to users, and to process
requests sent to it to retrieve results.
• Validation Testing
Validation testing can be defined in many ways, but a simple definition is that
validation succeeds when the software functions in a manner that the end user can
reasonably expect.
• Black Box Testing
Black box testing is done to find the following:
  - Incorrect or missing functions
  - Interface errors
  - Errors in external database access
  - Performance errors
  - Initialization and termination errors
• White Box Testing
White box testing allows the tests to:
  - Check that all independent paths within a module have been exercised at least once
  - Exercise all logical decisions on their true and false sides
  - Execute all loops at their boundaries and within their operational bounds
  - Exercise internal data structures to ensure their validity
  - Ensure that all possible validity checks and validity lookups have been provided to validate data entry
• Acceptance Testing
This is the final stage of the testing process before the system is accepted for operational
use. The system is tested with data supplied by the system procurer rather
than with simulated data.
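The unit and white-box testing ideas listed above can be illustrated with a small example. The sketch below tests a hypothetical function, `countInWindow`, that counts review timestamps inside a half-open time window; the function and test names are invented for this illustration and are not the project's test suite. Plain Java checks are used instead of a test framework such as JUnit so that the file runs on its own.

```java
// Hypothetical unit under test plus hand-rolled checks; not the project's test suite.
final class WindowCounterTest {

    // Unit under test: counts timestamps t with start <= t < end (half-open window).
    static int countInWindow(long[] timestamps, long start, long end) {
        int count = 0;
        for (long t : timestamps) {
            if (t >= start && t < end) {
                count++;
            }
        }
        return count;
    }

    // Minimal assertion helper that works without the JVM's -ea flag.
    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("FAILED: " + name);
        }
        System.out.println("passed: " + name);
    }

    public static void main(String[] args) {
        long[] none = {};
        long[] some = {10, 20, 30};
        // White-box cases: the loop runs zero times, once, and several times,
        // and the window condition is exercised on both its true and false sides.
        check(countInWindow(none, 0, 100) == 0, "empty input (loop body never runs)");
        check(countInWindow(new long[] {5}, 0, 10) == 1, "single element inside window");
        check(countInWindow(some, 10, 30) == 2, "start is inclusive, end is exclusive");
        check(countInWindow(some, 31, 40) == 0, "all elements outside window");
    }
}
```

Each `check` prints a `passed:` line; the first failing case stops the run with an `AssertionError` naming that case.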
CHAPTER 7
CONCLUSION
As previously discussed, the MSSD model identifies spam stores or products one by one by
detecting abnormal singleton reviewers appearing in an assigned time window. However, this
method misses the latent information that underlies evolving hyping activities. We picked two
representative cases that the MSSD model tagged as “spam” but that our model placed in the
“clean” class. Both show a remarkable purchasing burst, with 80 percent of the buyers in the
time window being singleton reviewers. In our experiment, we define customers who have made
fewer than five online transactions since registration as singleton reviewers. Because of
Taobao's customer-level segmentation strategies and privacy policies, this definition provides
the closest match to the definition of singleton reviewers in the MSSD model.
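The singleton-reviewer statistic used in this comparison can be written down directly. The sketch below is a hypothetical illustration, not the project's or the MSSD model's code: it assumes each buyer in the time window is represented only by the number of transactions made since registration, and it applies the fewer-than-five threshold stated above. The class and method names are invented.

```java
// Hypothetical sketch of the singleton-reviewer ratio; not the project's or MSSD's code.
// Assumption: each buyer in the window is summarised by the number of transactions
// made since registration.
final class SingletonRatio {

    // "Fewer than five transactions since registration" marks a singleton reviewer.
    static final int SINGLETON_THRESHOLD = 5;

    static boolean isSingleton(int transactionsSinceRegistration) {
        return transactionsSinceRegistration < SINGLETON_THRESHOLD;
    }

    // Fraction of buyers in the window who are singleton reviewers (0.0 for an empty window).
    static double singletonRatio(int[] transactionCounts) {
        if (transactionCounts.length == 0) {
            return 0.0;
        }
        int singletons = 0;
        for (int count : transactionCounts) {
            if (isSingleton(count)) {
                singletons++;
            }
        }
        return (double) singletons / transactionCounts.length;
    }

    public static void main(String[] args) {
        int[] window = {1, 2, 4, 0, 12}; // made-up buyers: four have fewer than five transactions
        System.out.println(singletonRatio(window)); // prints 0.8
    }
}
```

For the five made-up buyers in `main`, four fall below the threshold, so the ratio is 0.8, matching the 80 percent level observed in the two cases above.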
REFERENCES
[1] F. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to Identify Review Spam,” Proc. Int’l
Joint Conf. Artificial Intelligence, 2011, pp. 2488–2493.
[2] S. Feng et al., “Distributional Footprints of Deceptive Product Reviews,” Proc. Int’l Conf.
Web and Social Media, 2012, pp. 98–105.
[3] G. Fei et al., “Exploiting Burstiness in Reviews for Review Spammer Detection,” Proc. Int’l
Conf. Web and Social Media, 2013, pp. 175–184.
[4] Q. Xu et al., “SMS Spam Detection Using Noncontent Features,” IEEE Intelligent Systems,
vol. 27, no. 6, 2012, pp. 44–51.
[5] S. Xie et al., “Review Spam Detection via Temporal Pattern Discovery,” Proc. ACM Int’l
Conf. Knowledge Discovery and Data Mining, 2012, pp. 823–831.
[6] T. Moore, R. Clayton, and H. Stern, “Temporal Correlations between Spam and Phishing
Websites,” Proc. 2nd Usenix Conf. Large-scale Exploits and Emergent Threats, 2009, p. 5.
[7] J. Grabocka et al., “Learning Time-Series Shapelets,” Proc. ACM Int’l Conf. Knowledge
Discovery and Data Mining, 2014, pp. 392–401.
[8] J. Lines et al., “A Shapelet Transform for Time Series Classification,” Proc. ACM Int’l Conf.
Knowledge Discovery and Data Mining, 2012, pp. 289–297.
[9] Q. Zhang et al., “Exploring Heterogeneous Product Networks for Discovering Collective
Marketing Hyping Behavior,” Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining,
2016, pp. 40–51.
Appendix A
Snapshots
Fig. A.1 Registration Page
Fig. A.2 Login Page
Fig. A.3 User Home
Fig. A.4 IP Blocking
Fig. A.5 Blocked User Message
Appendix B
Conference Details
The paper entitled “Collective Hyping Detection System To Identify Online Spam Activities
Using AI” is to be presented and published in the proceedings of the National Conference on
Science, Engineering and Management (NCSEM-2018), to be held on 24-25 May 2018 at
The Oxford College of Engineering, Bengaluru.

Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 

VTU final year project report Main

Collective Hyping Detection System To Identify Online Spam Activities Using AI 2018
Department of CSE, TOCE
CHAPTER 1
PREAMBLE
1.1 Introduction
Many approaches have been successfully developed to detect online spam. Fangtao Li and colleagues [1] initially analyzed several attributes related to spam behavior, such as content, sentiment, product, and metadata features, and exploited a two-view semi-supervised method to identify spam reviews. Song Feng and colleagues [2] defined three types of reviewers (any-time, multi-time, and single-time reviewers) and statistically characterized the distributional footprints of deceptive reviews using natural language processing (NLP) techniques. Geli Fei and colleagues [3] proposed a model to detect spammed products or product groups by comparing the differences in rating behaviors between suspicious and normal users.
All these models rely on content features, which can easily be evaded by inserting special characters, but other features, such as temporal and network information, have been employed as well. Qian Xu and colleagues [4] collected large-scale real-world datasets from telecommunication service providers and combined temporal and user network information to classify Short Message Service (SMS) spammers. Sihong Xie and colleagues [5] proposed a model that uses only temporal features, with no semantic or rating behavior analysis, to detect abnormal bursts in the number of reviews. Finally, Tyler Moore and colleagues [6] studied temporal correlations between spam and phishing websites.
Intuitively, these works could also be used to uncover sophisticated spam strategies. Amazon has sued more than 1,000 product review sellers who sell fake promotions on Fiverr.com (one of the most famous being Spam Reviewer Cloud; http://money.cnn.com/2015/10/18/technology/amazon-lawsuit-fake-reviews). On such user cloud platforms, business owners can purchase anonymous comments generated by real users by paying for them. This makes spam detection very challenging, as the advent of a massive number of apparently genuine fake reviewers (which we refer to here as "genuine fakes") makes the fraud pattern much more nebulous to track. To date, many third-party platforms have created various fake review markets for online product sellers and fake review providers. In real-world business processes, massive numbers of random but genuine fake review providers conduct real transactions and write positive comments to claim a bonus (many e-commerce websites think they can reduce spam reviews by allowing only real buyers to write them).
Existing research ignores the latent connections in product networks, which are difficult to discover, especially now that these spam activities have become a hyping and advertising investment that has gained increased popularity among homogeneous competitors online. Thus, antispam rules can be easily avoided, which also impairs the efficiency and effectiveness of detection. In this work, we coin a new solution, collaborative marketing hyping detection, which aims to detect groups of online stores that simultaneously adopt marketing hyping [1]. This field involves various challenges:
• How can a heterogeneous product information network be defined to infer latent collaborative hyping behaviors? Network information might not be directly observed in the original datasets, so we need to build a relationship matrix between products to represent their underlying correlation.
• What features should be selected to best solve our problem? Traditional features such as semantic clues or user relations might no longer be suitable for discovering fraud due to rapidly evolving spam strategies. Hence, we need to choose dedicated features according to our specific scenario.
• How can we design a model that effectively identifies collaborative marketing hyping behavior? A model that can exploit heterogeneous product networks to discover collective hyping behavior is required here.
To overcome these challenges, we propose an unsupervised shapelet learning model to discover the temporal features of product reviews and then integrate the heterogeneous product network information as regularization terms to discover the products that are subject to collaborative hyping. We define three regularization terms that reflect the underlying correlations among users, products, and online store networks [1].
1.2 Objectives And Goals
The proposed solution aims to identify spam comments and detect products that adopt an evolving spam strategy for promotion. Specifically, an unsupervised learning model combines heterogeneous product review networks to discover collective hyping activities.
1.3 Existing System
Traditional features such as semantic clues or user relations might no longer be suitable for discovering fraud due to rapidly evolving spam strategies. Hence, dedicated features must be chosen according to the specific scenario.
Disadvantages:
• Decreases the inaccuracy caused by only using the user name information.
• Stores usually purchase fake reviews periodically.
1.4 Proposed System
We propose an unsupervised shapelet learning model to discover the temporal features of product reviews and then integrate the heterogeneous product network information as regularization terms to discover the products that are subject to collaborative hyping. We define three regularization terms that reflect the underlying correlations among users, products, and online store networks.
Advantages:
• Gained increased popularity among homogeneous competitors online.
• Improves the efficiency and effectiveness of detection.
• Aims to detect groups of online stores that simultaneously adopt marketing hyping.
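The proposed system builds on shapelets: short subsequences whose closeness to a time series serves as a feature. As a minimal sketch, assuming each product's reviews are aggregated into a series of daily review counts (an assumption; the report does not fix the series representation), the distance between a shapelet and a series can be taken as the smallest mean squared difference over all equal-length windows, a form commonly used in shapelet learning. The class name ShapeletDistance is hypothetical and this is not the project's own code; learning the shapelets and the three regularization terms are not shown.

```java
/**
 * Hypothetical helper: shapelet distance as the minimum, over all windows of the
 * series that have the shapelet's length, of the mean squared difference.
 */
final class ShapeletDistance {

    private ShapeletDistance() {}

    static double distance(double[] series, double[] shapelet) {
        int m = shapelet.length;
        if (m == 0 || m > series.length) {
            throw new IllegalArgumentException("shapelet must be non-empty and no longer than the series");
        }
        double best = Double.POSITIVE_INFINITY;
        // Slide the shapelet across every start position of the series.
        for (int start = 0; start + m <= series.length; start++) {
            double sum = 0.0;
            for (int j = 0; j < m; j++) {
                double d = series[start + j] - shapelet[j];
                sum += d * d;
            }
            best = Math.min(best, sum / m); // keep the best-matching window
        }
        return best;
    }
}
```

For example, the burst-shaped shapelet {5, 9, 5} has distance 0 to the series {0, 0, 5, 9, 5, 0, 0} because it occurs there exactly; products whose series lie close to the same learned shapelets could then be examined together as candidates for collective hyping.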
CHAPTER 2
LITERATURE SURVEY
• Learning to Identify Review Spam by Fangtao Li, Minlie Huang, Yi Yang and Xiaoyan Zhu [1]
In this paper, we study the review spam identification task in our product review mining system. We manually build a review spam collection based on our crawled reviews. We first employ supervised learning methods and analyze the effect of different features in review spam identification. We also observe that spammers consistently write spam. This provides another view for identifying review spam: we can identify whether the author of a review is a spammer. Based on this observation, we provide a two-view semi-supervised method to exploit the large amount of unlabeled data. The experimental results show that the two-view co-training algorithms achieve better results than the single-view algorithm. Our machine learning methods achieve significant improvements compared with the heuristic baselines.
• Distributional Footprints of Deceptive Product Reviews by Song Feng, Longfei Xing, Anupam Gogar & Yejin Choi [2]
This paper postulates that there are natural distributions of opinions in product reviews. In particular, we hypothesize that for a given domain, there is a set of representative distributions of review rating scores. A deceptive business entity that hires people to write fake reviews will necessarily distort its distribution of review scores, leaving distributional footprints behind. To validate this hypothesis, we introduce strategies to create a dataset with a pseudo-gold standard that is labeled automatically based on different types of distributional footprints. A range of experiments confirms the hypothesized connection between distributional anomalies and deceptive reviews. This study also provides novel quantitative insights into the characteristics of natural distributions of opinions in the TripAdvisor hotel review and Amazon product review domains.
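Feng et al.'s idea that hired reviewers distort the natural distribution of rating scores can be illustrated with a small sketch. It assumes 1-to-5 star ratings and a reference distribution for the domain supplied by the caller (both assumptions), and scores a product by the Kullback-Leibler divergence of its smoothed rating distribution from that reference. The divergence measure, the add-one smoothing, and the class name RatingFootprint are illustrative choices, not necessarily what the paper uses.

```java
import java.util.Arrays;

/** Hypothetical helper comparing a product's rating distribution with a reference distribution. */
final class RatingFootprint {

    private RatingFootprint() {}

    /** Converts star ratings in 1..levels into a probability distribution with add-one smoothing. */
    static double[] distribution(int[] ratings, int levels) {
        double[] probs = new double[levels];
        Arrays.fill(probs, 1.0); // add-one (Laplace) smoothing avoids zero probabilities
        for (int r : ratings) {
            if (r < 1 || r > levels) throw new IllegalArgumentException("rating out of range: " + r);
            probs[r - 1] += 1.0;
        }
        double total = ratings.length + levels;
        for (int i = 0; i < levels; i++) probs[i] /= total;
        return probs;
    }

    /** KL(p || q) in nats; both arrays must be strictly positive distributions of equal length. */
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) throw new IllegalArgumentException("length mismatch");
        double kl = 0.0;
        for (int i = 0; i < p.length; i++) kl += p[i] * Math.log(p[i] / q[i]);
        return kl;
    }
}
```

Against a uniform reference, a product with ten 5-star ratings gets a positive divergence, while a product with no ratings smooths to the uniform distribution and scores 0.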
• Exploiting Burstiness in Reviews for Review Spammer Detection by Geli Fei, Arjun Mukherjee & Bing Liu [3]
In this paper, we proposed to exploit bursts in detecting opinion spammers, because reviewers in the same burst tend to be similar in nature. A graph propagation method for identifying spammers was presented. A novel evaluation method based on supervised learning was also described to deal with the difficult problem of evaluation without ground truth data; it classifies reviews using a different set of features from those used to identify spammers. Our experimental results using Amazon.com reviews from the software domain showed that the proposed method is effective, both objectively, based on supervised learning (classification), and subjectively, based on human expert evaluation. The fact that the supervised learning/classification results are consistent with human judgment also indicates that the proposed supervised-learning-based evaluation technique is justified.
• SMS Spam Detection Using Non-Content Features by Qian Xu, Evan Wei Xiang and Qiang Yang [4]
In this paper, we have examined mobile-phone SMS message features from static, network, and temporal views, and proposed an effective way to identify important features that can be used to construct an anti-spam algorithm. We exploited temporal analysis to design features that detect SMS spammers with high performance, and incorporated these features into an SVM classification algorithm. Our evaluation on a real SMS dataset showed that temporal and network features can be effectively incorporated to build an SVM classifier, with an improvement of around 8% in AUC compared with classifiers based only on conventional static features.
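Review bursts, exploited by Fei et al., can be flagged in a simplified way by comparing each day's review count with the mean plus k standard deviations of the whole series. This threshold rule is a stand-in chosen for brevity, not the burst-detection method of the paper; the class name BurstDetector and the daily granularity are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: flags days whose review count is unusually high. */
final class BurstDetector {

    private BurstDetector() {}

    /** Returns zero-based indices of days whose count exceeds mean + k * (population) standard deviation. */
    static List<Integer> burstDays(int[] dailyCounts, double k) {
        List<Integer> bursts = new ArrayList<>();
        int n = dailyCounts.length;
        if (n == 0) return bursts;
        double mean = 0.0;
        for (int c : dailyCounts) mean += c;
        mean /= n;
        double variance = 0.0;
        for (int c : dailyCounts) variance += (c - mean) * (c - mean);
        double threshold = mean + k * Math.sqrt(variance / n);
        for (int day = 0; day < n; day++) {
            if (dailyCounts[day] > threshold) bursts.add(day); // strictly above the threshold
        }
        return bursts;
    }
}
```

With k = 2, the counts {1, 2, 1, 1, 20, 1, 2, 1} yield a single burst on day 4 (zero-based); a constant series yields none, because no day exceeds the mean.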
• Temporal Correlations between Spam and Phishing Websites by Tyler Moore, Richard Clayton & Henry Stern [6]
Empirical study of malicious online activity is hard. Attackers remain elusive, compromises happen fast, and strategies change frequently. Unfortunately, none of these factors can be changed. In this paper, we have combined phishing website lifetimes with detailed spam data and consequently provided several new insights. First, we have demonstrated the gravity of the threat posed by attackers using fast-flux techniques. They send out 68% of spam while hosting only 3% of all phishing websites. They also transmit spam effectively: the bulk is sent out early, it stops once the site is removed, and it keeps going whenever websites are overlooked by the take-down companies. In this respect, we also conclude that long-lived phishing websites continue to cause harm and should be taken down.
• A Shapelet Transform for Time Series Classification by Jason Lines, Luke M. Davis & Anthony Bagnall [8]
In this paper, we have proposed a shapelet transform for time series classification (TSC) that extracts the k best shapelets from a dataset in a single pass. We implement this using a novel caching algorithm to store shapelets, and apply a simple, parameter-free cross-validation approach for extracting the most significant shapelets. We transform a total of 26 datasets with our filter and demonstrate that a C4.5 decision tree classifier trained with transformed data is competitive with an implementation of the original shapelet decision tree. We show that our filtered data can be applied to other, non-tree-based classifiers to achieve improved classification performance, whilst still maintaining the interpretability of shapelets. We provide two implementations of the filter using different quality measures for discriminating between shapelets: the first uses information gain, as proposed in the original shapelet work, and the second introduces the F-statistic as an evaluation method for shapelets. We show that classifiers trained using features derived from an F-statistic filter are competitive with classifiers trained with the information gain approach, whilst being easier to apply to multi-class classification problems. Finally, we provide exploratory data analysis of the shapelets extracted by our filter on the Gun/NoGun problem and compare them with the output of [20]. We show that the shapelets we find are consistent with the discriminatory shapelet in the original work, and that our approach can lead to further insight into the problem by looking at a number of the top shapelets.
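Lines et al. introduce the F-statistic as a quality measure for candidate shapelets. A minimal sketch, assuming the distances from every training series to one candidate shapelet have already been computed: group the distances by class label and compute the one-way ANOVA F-statistic, the ratio of the between-class to the within-class mean square. The class name ShapeletQuality is hypothetical, and the paper's caching and k-best selection are omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: scores one candidate shapelet by how well its distances separate the classes. */
final class ShapeletQuality {

    private ShapeletQuality() {}

    /** distances[i] is the distance from series i to the shapelet; labels[i] is that series' class. */
    static double fStatistic(double[] distances, int[] labels) {
        int n = distances.length;
        if (n != labels.length) throw new IllegalArgumentException("length mismatch");
        Map<Integer, double[]> groups = new HashMap<>(); // label -> {sum of distances, count}
        double grandSum = 0.0;
        for (int i = 0; i < n; i++) {
            double[] g = groups.computeIfAbsent(labels[i], key -> new double[2]);
            g[0] += distances[i];
            g[1] += 1;
            grandSum += distances[i];
        }
        int classes = groups.size();
        if (classes < 2 || n <= classes) throw new IllegalArgumentException("need at least two classes and n > classes");
        double grandMean = grandSum / n;
        double between = 0.0; // between-class sum of squares
        for (double[] g : groups.values()) {
            double mean = g[0] / g[1];
            between += g[1] * (mean - grandMean) * (mean - grandMean);
        }
        double within = 0.0; // within-class sum of squares
        for (int i = 0; i < n; i++) {
            double[] g = groups.get(labels[i]);
            double d = distances[i] - g[0] / g[1];
            within += d * d;
        }
        return (between / (classes - 1)) / (within / (n - classes));
    }
}
```

Distances {1, 2, 3} for one class and {7, 8, 9} for the other give F = 54: class means of 2 and 8 around a grand mean of 5 give a between-class sum of squares of 54 with 1 degree of freedom, against a within-class sum of squares of 4 with 4 degrees of freedom.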
CHAPTER 3
SYSTEM DESIGN
3.1 Design Consideration
The purpose of design is to plan a solution to the problem specified by the requirements document. This phase is the first step in moving from the problem domain to the solution domain. In other words, starting with what is needed, design takes us toward how to satisfy those needs. The design of the system is perhaps the most critical factor affecting the quality of the software and has a major impact on the later phases, particularly testing and maintenance. The system design describes all the major data structures, file formats, outputs, and major modules in the system, and specifies each of them.
3.2 System Architecture
The architectural design process is concerned with establishing a basic structural framework for a system. It involves identifying the major components of the system and the communication between these components. The initial design process of identifying these subsystems and establishing a framework for subsystem control and communication is called architectural design, and the output of this design process is a description of the software architecture [5]. The proposed architecture for this system is given below. It shows how the system is designed and briefly how it works.
Fig 3.1 System Architecture (components: Spammer Cloud, Fake Reviewers, Online Stores, Target Products; flow labels: Post TP and Pay, Fake Review, Quality Guarantee, Purchase & Hyping, Check Hyping Quality and Pay; legend: (€) fake reviewers make genuine purchases, ($) store owners pay fake reviewers and purchasing cost through the user cloud)
3.3 Use Case Diagram
A use case diagram shows the various interactions of actors with a system. A use case is a coherent piece of functionality that a system can provide by interacting with actors. Actors are the external end users of the system.
Fig 3.2 Use Case Diagram (actors: User, Admin; use cases: Register, Login, Browse products, Buy products, Give reviews, Analyze reviews, Detect spams, Display spammers, Block spam users)
3.4 Dataflow Diagram
The DFD is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system. A DFD model uses a very limited number of primitive symbols to represent the functions performed by a system and the data flow among those functions. The main reason the DFD technique is so popular is probably that DFD is a very simple formalism: it is easy to understand and use. Starting with the set of high-level functions that a system performs, a DFD model hierarchically represents the various sub-functions. In fact, any hierarchical model is easy to understand.
Fig 3.3 Data Flow Diagram (labels: Website User, Search Products, List of Products, Buy Product, Review Products, Set of Reviews, Get Total Review, Request Data, Extract Data, Extract Feature, NLP, Spam Detection, Fetch Details, Get Details, Blocked User)
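Fig 3.3 routes reviews through feature extraction and spam detection before a user is blocked, but the report does not specify which features are used. As a hypothetical illustration only, one simple text signal is near-duplicate wording: the sketch tokenizes each review, compares it with earlier reviews using the Jaccard similarity of their word sets, and flags reviews at or above a threshold. The class DuplicateReviewCheck, the tokenization rule, and the threshold are all assumptions, not the project's detection logic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical spam signal: reviews that nearly repeat an earlier review's wording. */
final class DuplicateReviewCheck {

    private DuplicateReviewCheck() {}

    /** Lowercases the text and splits it on runs of non-letter, non-digit characters. */
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) out.add(t); // a leading delimiter yields one empty token; skip it
        }
        return out;
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    /** Returns indices of reviews at least `threshold` similar to some earlier review. */
    static List<Integer> nearDuplicates(List<String> reviews, double threshold) {
        List<Set<String>> seen = new ArrayList<>();
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < reviews.size(); i++) {
            Set<String> current = tokens(reviews.get(i));
            for (Set<String> earlier : seen) {
                if (jaccard(current, earlier) >= threshold) {
                    flagged.add(i);
                    break;
                }
            }
            seen.add(current);
        }
        return flagged;
    }
}
```

The threshold of 0.8 used in the check below is an arbitrary example; a signal like this would normally be combined with other features before any user is blocked.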
3.5 Activity Diagram
An activity diagram is another important UML diagram for describing the dynamic aspects of the system. It is essentially a flowchart that represents the flow from one activity to another, where an activity can be described as an operation of the system. The control flow is therefore drawn from one operation to another.
Fig 3.4 Activity Diagram (activities: Browse Products, Buy Products, Feedback, Process Review, Evaluate Users, Spam Analysis, Natural Language Processing, Results for Spammers)
CHAPTER 4
SYSTEM REQUIREMENT SPECIFICATION
A System Requirement Specification (SRS) is a central document that forms the foundation of the software development process. It lists the requirements of a system and also describes its major features. An SRS is essentially an organization's written understanding of a client or potential client's system requirements and dependencies at a particular point in time, usually before any actual design or development work. It is a two-way insurance policy that ensures that both the client and the organization understand each other's requirements from that perspective at a given point in time. Writing a software requirements specification reduces development effort, because careful review of the document can reveal omissions, misunderstandings, and inconsistencies early in the development cycle, when these problems are easier to correct. The SRS discusses the product but not the project that developed it; hence the SRS serves as a basis for later enhancement of the finished product. The SRS may need to be changed, but it does provide a foundation for continued production evaluation. In simple words, the software requirement specification is the starting point of the software development activity. The SRS translates the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase). Hence the output of this phase is a set of formally specified requirements, which ideally are complete and consistent, while the input has none of these properties.
4.1 Hardware Requirements
The most common set of requirements defined by any operating system or software application is the physical computer resources, also known as hardware. A hardware requirements list is often accompanied by a hardware compatibility list (HCL), especially in the case of operating systems. The hardware requirements are as follows:
• System: Intel i3 2.1 GHz
• Memory: 4 GB
• Hard Disk: 40 GB
• Monitor: 15 VGA Color
4.2 Software Requirements
Software requirements may be calculations, technical details, data manipulation and processing, and other specific functionality that define what a system is supposed to accomplish. Behavioural requirements describing all the cases where the system uses the functional requirements are captured in use cases. These are things that the system is required to do.
• Operating System: Windows 7 / 8
• Language: Java / J2EE
• Database: MySQL
• Tools: NetBeans, Navicat, Tomcat Server
Java Server Pages
JavaServer Pages (JSP) is a technology that helps software developers create dynamically generated web pages based on HTML, XML, or other document types. Released in 1999 by Sun Microsystems, JSP is similar to PHP, but it uses the Java programming language. To deploy and run JavaServer Pages, a compatible web server with a servlet container, such as Apache Tomcat or Jetty, is required.
Architecturally, JSP may be viewed as a high-level abstraction of Java servlets. JSPs are translated into servlets at runtime; each JSP servlet is cached and reused until the original JSP is modified. JSP can be used independently or as the view component of a server-side model-view-controller design, normally with JavaBeans as the model and Java servlets (or a framework such as Apache Struts) as the controller. This is a type of Model 2 architecture.
JSP allows Java code and certain predefined actions to be interleaved with static web markup content, with the resulting page being compiled and executed on the server to deliver a document. The compiled pages, as well as any dependent Java libraries, use Java bytecode rather than a native software format. Like any other Java program, they must be executed within a Java virtual machine (JVM) that integrates with the server's host operating system to provide an abstract, platform-neutral environment.
JSPs are usually used to deliver HTML and XML documents, but through the use of OutputStream, they can deliver other types of data as well. The web container creates JSP implicit objects such as pageContext, application (the ServletContext), session, request, and response.
Fig 4.1 JSP Model (components: Web Browser; Server containing the Servlet/Filter (Controller), JSP Pages (View), and JavaBeans (Model); Data Sources/Database; flow label: Instantiate)
A JavaServer Pages compiler is a program that parses JSPs and transforms them into executable Java servlets. A program of this type is usually embedded into the application server and run automatically the first time a JSP is accessed, but pages may also be precompiled for better performance, or compiled as part of the build process to test for errors.
Some JSP containers support configuring how often the container checks JSP file timestamps to see whether the page has changed. Typically, this interval would be set short (perhaps seconds) during software development, and longer (perhaps minutes, or even never) for a deployed web application.
Java Servlet
A servlet is a Java programming language class used to extend the capabilities of a server. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers, so they can be thought of as Java applets that run on servers instead of in web browsers. These kinds of servlets are the Java counterpart to other dynamic web content technologies such as PHP and ASP.NET.
Fig. 4.2 Life of a JSP File (labels: Request, Response, JSP Container, Translation Phase, Execution Phase, JSP Page (.jsp), JSP Translator (Tomcat), Servlet Source Code (Java), Java Compiler (embedded server), Servlet Class (.class), Text Buffer (in memory), JRE, System; notes: (a) translation occurs at this point if the JSP has been changed or is new; (b) if not, translation is skipped)
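The translation decision in Fig. 4.2 (translate a JSP only when it is new or has changed, otherwise reuse the cached servlet) can be modelled in plain Java. This is a toy model, not a real JSP container API: ToyJspCache is hypothetical, and a timestamp passed by the caller stands in for the JSP file's last-modified time.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not a real container API) of when a JSP container retranslates a page. */
final class ToyJspCache {
    private final Map<String, Long> translatedAt = new HashMap<>(); // page -> source timestamp at last translation
    private int translations = 0;

    /** Returns true if this request triggered a (simulated) translation and compilation. */
    boolean handleRequest(String page, long sourceLastModified) {
        Long previous = translatedAt.get(page);
        if (previous != null && sourceLastModified <= previous) {
            return false; // (b) unchanged: translation skipped, cached servlet reused
        }
        translatedAt.put(page, sourceLastModified); // (a) new or changed: translate and compile
        translations++;
        return true;
    }

    int translationCount() {
        return translations;
    }
}
```

A first request for a page triggers translation, a repeat request with the same timestamp does not, and a newer timestamp triggers it again, mirroring notes (a) and (b) of the figure.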
  • 17. Collective Hyping Detection System To Identify Online Spam Activities Using AI 2018 Department of CSE, TOCE 17 Three methods are central to the life cycle of a servlet. These are init(), service(), and destroy(). They are implemented by every servlet and are invoked at specific times by the server.  During the initialization stage of the servlet life cycle, the web container initializes the servlet instance by calling the init() method, passing an object implementing the javax.servlet.ServletConfig interface. This configuration object allows the servlet to access name-value initialization parameters from the web application.  After initialization, the servlet instance, can service client requests. Each request is serviced in its own separate thread. The web container calls the service() method of the servlet for every request. The service() method determines the kind of request being made and dispatches it to an appropriate method to handle the request. The developer of the servlet must provide an implementation for these methods. If a request is made for a method that is not implemented by the servlet, the method of the parent class is called, typically resulting in an error being returned to the requester.  Finally, the web container calls the destroy() method that takes the servlet out of service. The destroy() method, like init(), is called only once in the lifecycle of a servlet. The following is a typical user scenario of these methods. 1. Assume that a user requests to visit a URL.  The browser then generates an HTTP request for this URL.  This request is then sent to the appropriate server. 2. The HTTP request is received by the web server and forwarded to the servlet container.  The container maps this request to a particular servlet.  The servlet is dynamically retrieved and loaded into the address space of the container. 3. The container invokes the init() method of the servlet. 
 This method is invoked only when the servlet is first loaded into memory.  It is possible to pass initialization parameters to the servlet so that it may configure itself. 4. The container invokes the service() method of the servlet.  This method is called to process the HTTP request.  The servlet may read data that have been provided in the HTTP request.
The servlet may also formulate an HTTP response for the client.
5. The servlet remains in the container's address space and is available to process any other HTTP requests received from clients. The service() method is called for each HTTP request.
6. The container may, at some point, decide to unload the servlet from its memory. The algorithms by which this decision is made are specific to each container.
7. The container calls the servlet's destroy() method to relinquish any resources, such as file handles, that are allocated for the servlet; important data may be saved to a persistent store.
8. The memory allocated for the servlet and its objects can then be garbage collected.
MySQL
MySQL is a relational database management system (RDBMS) that is queried using Structured Query Language (SQL). SQL is a special-purpose programming language designed for managing data held in an RDBMS. Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language and a data manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks." Despite not entirely adhering to the relational model as described by Codd, it became the most widely used database language. SQL became a standard of the American National Standards Institute (ANSI) in 1986 and of the International Organization for Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added features.
Despite these standards, code is not completely portable among different database systems, which can lead to vendor lock-in. The different vendors do not perfectly adhere to the standard, for instance because they add extensions, and the standard itself is sometimes ambiguous.
NetBeans IDE
NetBeans IDE is the official IDE for Java 8. With its editors, code analyzers, and converters, you can quickly and smoothly upgrade your applications to use new Java 8 language constructs, such as lambdas, functional operations, and method references. Batch analyzers and converters are provided to search through multiple applications at the same time, matching patterns for conversion to new Java 8 language constructs.
With its constantly improving Java Editor, many rich features, and an extensive range of tools, templates, and samples, NetBeans IDE sets the standard for developing with cutting-edge technologies out of the box.
An IDE is much more than a text editor. The NetBeans Editor indents lines, matches words and brackets, and highlights source code syntactically and semantically. It also provides code templates, coding tips, and refactoring tools. The editor supports many languages, from Java, C/C++, XML, and HTML to PHP, Groovy, Javadoc, JavaScript, and JSP. Because the editor is extensible, you can plug in support for many other languages.
Keeping a clear overview of large applications, with thousands of folders and files and millions of lines of code, is a daunting task. NetBeans IDE provides different views of your data, from multiple project windows to helpful tools for setting up and managing your applications efficiently. It lets you drill down into your data quickly and easily, and it provides versioning tools through Subversion, Mercurial, and Git integration out of the box. When new developers join your project, they can understand the structure of your application because your code is well organized.
Design GUIs for Java SE, HTML5, Java EE, PHP, C/C++, and Java ME applications quickly and smoothly by using editors and drag-and-drop tools in the IDE.
For Java SE applications, the NetBeans GUI Builder automatically takes care of correct spacing and alignment while also supporting in-place editing. The GUI Builder is so easy to use and intuitive that it has been used to prototype GUIs live at customer presentations.
The cost of buggy code increases the longer it remains unfixed. NetBeans provides static analysis tools for identifying and fixing common problems in Java code, notably through integration with the widely used FindBugs tool. In addition, the NetBeans Debugger lets you place breakpoints in your source code, add field watches, step through your code, and run into methods. The NetBeans Profiler provides expert assistance for optimizing your application's speed and memory usage, and makes it easier to build reliable and scalable Java SE, JavaFX, and Java EE applications. NetBeans IDE also includes a visual debugger for Java SE applications, which lets you debug user interfaces without looking into the source code. You can take GUI snapshots of your applications and click on user interface elements to jump back to the related source code.
Fig. 4.3 Snapshot of NetBeans
Apache
The Apache HTTP Server is web server software notable for playing a key role in the initial growth of the World Wide Web. In 2009 it became the first web server software to surpass the 100-million-website milestone. Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. Since April 1996, Apache has been the most popular HTTP server software in use. As of November 2010, Apache served over 59.36% of all websites and over 66.56% of the one million busiest websites.
Navicat Premium
Navicat Premium is a multi-connection database administration tool. It lets you connect simultaneously to MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL databases from a single application, which makes it easy to administer multiple kinds of databases. Navicat Premium combines the functions of other Navicat members and supports most of the features in MySQL, MariaDB, SQL Server, SQLite, Oracle, and PostgreSQL, including stored procedures, events, triggers, functions, and views.
Navicat Premium enables you to transfer data easily and quickly across various database systems, or to a plain text file with the designated SQL format and encoding. Batch jobs for different kinds of databases can also be scheduled to run at a specific time. Other features include an Import/Export Wizard, Query Builder, Report Builder, Data Synchronization, Backup, Job Scheduler, and more. Navicat's features are sophisticated enough to meet the specific needs of professional developers, yet easy to learn for users who are new to database servers.
You can establish a secure SSH session in Navicat through SSH Tunnelling, which provides strong authentication and secure encrypted communication between two hosts. The authentication method can use a password or a public/private key pair. Navicat also provides HTTP Tunnelling for cases where your ISP does not allow direct connections to its database servers but does allow HTTP connections. HTTP Tunnelling is a method for connecting to a server that uses the same protocol (http://) and the same port (port 80) as a web server does.
CHAPTER 5
IMPLEMENTATION
5.1 Programming Language Selection
Java is a small, simple, safe, object-oriented, interpreted or dynamically optimized, bytecoded, architecture-neutral, garbage-collected, multithreaded programming language. It has a strongly typed exception-handling mechanism and is designed for writing distributed and dynamically extensible programs.
With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. The compiler first translates a program into an intermediate language called Java bytecode, which is platform-independent code that the interpreter on the Java platform executes. The interpreter parses and runs each Java bytecode instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.
You can think of Java bytecode as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it is a development tool or a Web browser that can run applets, is an implementation of the Java VM.
Fig. 5.1 Features of Java
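A minimal illustration of this compile-once, interpret-each-run cycle is sketched below. The class name HelloBytecode is hypothetical and not part of the project code. javac compiles the class once into HelloBytecode.class, javap -c disassembles that file to show the bytecode instructions, and java runs the same class file on whatever Java VM is installed.

```java
// HelloBytecode.java: hypothetical example class, not part of the project code.
//   javac HelloBytecode.java   -> compiles once into HelloBytecode.class (bytecode)
//   javap -c HelloBytecode     -> disassembles the class file to show its bytecode
//   java HelloBytecode         -> the Java VM executes the bytecode on this machine
public class HelloBytecode {

    // Builds the greeting; java.vm.name is a standard system property naming the running VM.
    public static String message() {
        return "Hello from " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(message());
    }
}
```

The printed VM name depends on the installed JDK. The bytecode shown by javap is the same on every operating system.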
5.2 Selection of Platform
A platform is the hardware or software environment in which a program runs. Some of the most popular platforms are Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it is a software-only platform that runs on top of other hardware-based platforms.
The Java platform has two components:
• The Java Virtual Machine (JVM)
• The Java Application Programming Interface (Java API)
We have already been introduced to the Java VM. It is the base for the Java platform and is ported onto various hardware-based platforms.
The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages.
The figure depicts a program that is running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.
Fig. 5.2 Java Interpreter Architecture (layers, top to bottom: myProgram.java; the Java platform, made up of the Java API and the Java Virtual Machine; the hardware-based platform)
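A small sketch of a program using ready-made components from three standard packages (java.util, java.time, java.nio.file) is shown below. It also asks the JVM about the host through the standard os.name and java.version system properties. The class PlatformInfo and the file name data/reviews.csv are hypothetical, and the file does not need to exist. The same compiled class runs unchanged on any supported operating system; only the reported values differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: uses classes from several Java API packages.
public class PlatformInfo {

    // Collects a few facts that the JVM exposes about the underlying platform.
    public static List<String> describe() {
        List<String> facts = new ArrayList<>();                    // java.util
        facts.add("OS: " + System.getProperty("os.name"));         // host operating system
        facts.add("Java: " + System.getProperty("java.version"));  // installed Java version
        facts.add("Today: " + LocalDate.now());                    // java.time
        // java.nio.file joins path parts with the host's own separator ('/' or '\')
        Path reviews = Paths.get("data", "reviews.csv");
        facts.add("Path: " + reviews);
        return facts;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```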
5.3 Functional Description of Modules
• Shapelet Learning Model: Shapelets are discriminative subsequences of time series that best predict the target variable. Shapelet learning models are usually designed for classification, with the aim of identifying the similarity between two items.[4]
• Product Network Regularization: The product network provides correlation information about all online stores. We model three types of heterogeneous information networks as regularization terms: store-based regularization, product-based regularization, and user-correlation regularization.[9]
• Collaborative Hyping Detection Model: We propose our collaborative hyping detection model (CHDM) to solve the collective marketing hyping problem defined earlier. This model integrates all the regularization terms defined above into a shapelet learning model that uses temporal features and product network information for clustering.
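This report does not reproduce the CHDM objective. The sketch below therefore illustrates only two standard ingredients of such a model, under stated assumptions.

The first is the distance between a shapelet and a time series: the mean squared distance to the best-matching window. This follows the learning-shapelets formulation [7], which replaces the hard minimum used here with a differentiable soft minimum.

The second is a graph smoothness penalty, the sum over edges of w_ij * ||f_i - f_j||^2, over a store, product, or user network. This form is an assumption for illustration; the regularization terms actually used in [9] may differ. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of two building blocks named in Section 5.3.
// The exact formulations used by CHDM [9] may differ.
public class ShapeletSketch {

    // Mean squared distance between the shapelet and the best-matching window of the series:
    // min over j of (1/L) * sum_l (series[j + l] - shapelet[l])^2, with L = shapelet length.
    public static double shapeletDistance(double[] series, double[] shapelet) {
        int len = shapelet.length;
        if (len == 0 || len > series.length) {
            throw new IllegalArgumentException("shapelet must be non-empty and no longer than the series");
        }
        double best = Double.POSITIVE_INFINITY;
        for (int j = 0; j + len <= series.length; j++) {   // slide over every window start
            double sum = 0.0;
            for (int l = 0; l < len; l++) {
                double diff = series[j + l] - shapelet[l];
                sum += diff * diff;
            }
            best = Math.min(best, sum / len);
        }
        return best;
    }

    // Weighted edge between two nodes (stores, products, or users) in the product network.
    public static final class Edge {
        final int i;
        final int j;
        final double weight;

        public Edge(int i, int j, double weight) {
            this.i = i;
            this.j = j;
            this.weight = weight;
        }
    }

    // Smoothness penalty: sum over edges of weight * ||f_i - f_j||^2.
    // It is small when strongly connected nodes have similar feature vectors.
    public static double graphRegularization(double[][] features, List<Edge> edges) {
        double penalty = 0.0;
        for (Edge e : edges) {
            double sq = 0.0;
            for (int d = 0; d < features[e.i].length; d++) {
                double diff = features[e.i][d] - features[e.j][d];
                sq += diff * diff;
            }
            penalty += e.weight * sq;
        }
        return penalty;
    }

    public static void main(String[] args) {
        double[] sales = {1, 1, 2, 9, 9, 2, 1};   // toy daily-sales series with a burst
        double[] burst = {2, 9, 9};               // candidate shapelet
        System.out.println(shapeletDistance(sales, burst));                 // prints 0.0
        double[][] features = {{1.0}, {1.0}, {4.0}};
        List<Edge> edges = Arrays.asList(new Edge(0, 1, 1.0), new Edge(1, 2, 0.5));
        System.out.println(graphRegularization(features, edges));          // prints 4.5
    }
}
```

The data in main are toy values, not results from this project. The hard minimum keeps the example readable. Gradient-based shapelet learning, as in [7], needs the soft-minimum version.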
CHAPTER 6
SYSTEM TESTING
Types of Testing:
• Unit Testing
Individual components are tested to ensure that they operate correctly. Each component is tested independently, without the other system components. This system was tested with a set of proper test data for each module, and the results were checked against the expected output. Unit testing focuses verification effort on the smallest unit of software design, the module, and is therefore also known as MODULE TESTING. This testing was carried out in phases, and each module was found to work satisfactorily with respect to its expected output.
• Integration Testing
Integration testing is generally done to uncover errors associated with the flow of data across interfaces. The unit-tested modules are grouped together and tested in small segments, which makes it easier to isolate and correct errors. This approach was continued until all modules had been integrated to form the system as a whole.
• System Testing
System testing is a series of different tests whose primary purpose is to fully exercise the computer-based system. System testing ensures that the entire integrated software system meets its requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.
• Performance Testing
Performance testing ensures that output is produced within the required time limits. It measures the time the system takes to compile, to respond to users, and to process requests sent to it to retrieve results.
• Validation Testing
Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that can reasonably be expected by the end user.
• Black Box Testing
Black box testing is done to find the following:
  • Incorrect or missing functions
  • Interface errors
  • Errors in external database access
  • Performance errors
  • Initialization and termination errors
• White Box Testing
White box testing allows the tests to:
  • Check whether all independent paths within a module have been exercised at least once
  • Exercise all logical decisions on their true and false sides
  • Execute all loops at their boundaries and within their operational bounds
  • Exercise internal data structures to ensure their validity
  • Ensure that all possible validity checks and validity lookups have been provided to validate data entry
• Acceptance Testing
This is the final stage of the testing process before the system is accepted for operational use. The system is tested with data supplied by the system procurer rather than with simulated data.
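The sketch below shows the white-box criteria applied to a single unit. The method countBelow and the class WhiteBoxDemo are hypothetical and not taken from the project code. The threshold of 5 simply mirrors the singleton-reviewer definition used in Chapter 7. The test cases run the loop zero, one, and many times, and drive the decision onto both its true and false sides, including the boundary value. A plain check helper stands in for a test framework such as JUnit, so the example runs on its own.

```java
// Hypothetical unit under test plus a plain-Java white-box test driver.
public class WhiteBoxDemo {

    // Unit under test: counts customers whose transaction count is below the threshold.
    public static int countBelow(int[] transactionCounts, int threshold) {
        int count = 0;
        for (int c : transactionCounts) {   // loop: exercised 0, 1 and many times in main()
            if (c < threshold) {             // decision: exercised on its true and false sides
                count++;
            }
        }
        return count;
    }

    // Minimal assertion helper that names the failing case.
    static void check(String name, int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(name + ": expected " + expected + " but got " + actual);
        }
        System.out.println("PASS " + name);
    }

    public static void main(String[] args) {
        check("loop runs zero times", 0, countBelow(new int[] {}, 5));
        check("decision true side", 1, countBelow(new int[] {4}, 5));
        check("decision false side at boundary", 0, countBelow(new int[] {5}, 5));
        check("loop runs many times", 2, countBelow(new int[] {0, 4, 5, 12}, 5));
    }
}
```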
CHAPTER 7
CONCLUSION
As previously discussed, the MSSD model identifies spam stores or products one by one by detecting abnormal singleton reviewers appearing in an assigned time window. However, this method misses the latent information that underlies evolving hyping activities. We picked two representative cases that were tagged as "spam" by the MSSD model but that our model placed in a "clean" class. Both show a remarkable purchasing burst, with 80 percent of buyers in the time window being singleton reviewers. In our experiment, we define singleton reviewers as customers who have made fewer than five transactions online since their registration. Because of the different customer-level segmentation strategies and privacy policies in Taobao, this definition provides the best match with the definition of singleton reviewers in the MSSD model.
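The singleton-reviewer statistic can be computed as sketched below. For a given time window, the code finds the share of distinct buyers who are singleton reviewers under the definition above (fewer than five transactions since registration). The class SingletonShare, its Purchase layout, and the toy data are hypothetical. The sketch reproduces only this statistic, not the MSSD model or CHDM.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SingletonShare {

    // Threshold from Chapter 7: fewer than five transactions since registration.
    static final int SINGLETON_LIMIT = 5;

    // Hypothetical purchase record: who bought and on which day.
    public static final class Purchase {
        final String buyerId;
        final LocalDate date;

        public Purchase(String buyerId, LocalDate date) {
            this.buyerId = buyerId;
            this.date = date;
        }
    }

    // Share of distinct buyers in [start, end] (inclusive) who are singleton reviewers.
    // transactionsByBuyer holds each buyer's total transactions since registration.
    public static double singletonShare(List<Purchase> purchases,
                                        Map<String, Integer> transactionsByBuyer,
                                        LocalDate start, LocalDate end) {
        Set<String> buyers = new HashSet<>();
        for (Purchase p : purchases) {
            if (!p.date.isBefore(start) && !p.date.isAfter(end)) {
                buyers.add(p.buyerId);   // each buyer is counted once per window
            }
        }
        if (buyers.isEmpty()) {
            return 0.0;
        }
        long singletons = buyers.stream()
                .filter(b -> transactionsByBuyer.getOrDefault(b, 0) < SINGLETON_LIMIT)
                .count();
        return (double) singletons / buyers.size();
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2018, 5, 1);
        List<Purchase> purchases = Arrays.asList(
                new Purchase("a", day), new Purchase("b", day), new Purchase("c", day),
                new Purchase("d", day), new Purchase("e", day),
                new Purchase("f", day.plusDays(30)));   // outside the window below
        Map<String, Integer> tx = new HashMap<>();
        tx.put("a", 1); tx.put("b", 2); tx.put("c", 1);
        tx.put("d", 4); tx.put("e", 40); tx.put("f", 1);
        System.out.println(singletonShare(purchases, tx, day, day.plusDays(6)));   // prints 0.8
    }
}
```

The toy data reproduce the 80 percent share mentioned above. As the two cases show, a high share on its own is exactly the signal that can mislead a singleton-based detector.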
REFERENCES
[1] F. Li, M. Huang, Y. Yang, and X. Zhu, “Learning to Identify Review Spam,” Proc. Int’l Joint Conf. Artificial Intelligence, 2011, pp. 2488–2493.
[2] S. Feng et al., “Distributional Footprints of Deceptive Product Reviews,” Proc. Int’l Conf. Web and Social Media, 2012, pp. 98–105.
[3] G. Fei et al., “Exploiting Burstiness in Reviews for Review Spammer Detection,” Proc. Int’l Conf. Web and Social Media, 2013, pp. 175–184.
[4] Q. Xu et al., “SMS Spam Detection Using Noncontent Features,” IEEE Intelligent Systems, vol. 27, no. 6, 2012, pp. 44–51.
[5] S. Xie et al., “Review Spam Detection via Temporal Pattern Discovery,” Proc. ACM Int’l Conf. Knowledge Discovery and Data Mining, 2012, pp. 823–831.
[6] T. Moore, R. Clayton, and H. Stern, “Temporal Correlations between Spam and Phishing Websites,” Proc. 2nd Usenix Conf. Large-scale Exploits and Emergent Threats, 2009, p. 5.
[7] J. Grabocka et al., “Learning Time-Series Shapelets,” Proc. ACM Int’l Conf. Knowledge Discovery and Data Mining, 2014, pp. 392–401.
[8] J. Lines et al., “A Shapelet Transform for Time Series Classification,” Proc. ACM Int’l Conf. Knowledge Discovery and Data Mining, 2012, pp. 289–297.
[9] Q. Zhang et al., “Exploring Heterogeneous Product Networks for Discovering Collective Marketing Hyping Behavior,” Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, 2016, pp. 40–51.
Appendix A
Snapshots
Fig. A.1 Registration Page
Fig. A.2 Login Page
Fig. A.3 User Home
Fig. A.4 IP Blocking
Fig. A.5 Blocked User Message
Appendix B
Conference Details
The paper entitled “Collective Hyping Detection System To Identify Online Spam Activities Using AI” is to be presented and published in the proceedings of the National Conference on Science Engineering and Management (NCSEM-2018), which will be held on 24-25 May 2018 at The Oxford College of Engineering, Bengaluru.