M.Phil Computer Science Data Mining Projects
Web : www.kasanpro.com Email : sales@kasanpro.com
List Link : http://kasanpro.com/projects-list/m-phil-computer-science-data-mining-projects
Title :Bridging Socially Enhanced Virtual Communities
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/bridging-socially-enhanced-virtual-communities
Abstract : Interactions spanning multiple organizations have become an important aspect of today's collaboration
landscape. Organizations create alliances to fulfill strategic objectives. The dynamic nature of collaborations
increasingly demands automated techniques and algorithms to support the creation of such alliances. Our
approach is based on recommending potential alliances through the discovery of currently relevant competence
sources and on supporting their semi-automatic formation. The environment is service-oriented, comprising humans and software
services with distinct capabilities. To mediate between previously separated groups and organizations, we introduce
the broker concept that bridges disconnected networks. We present a dynamic broker discovery approach based on
interaction mining techniques and trust metrics. We evaluate our approach by using simulations in real Web services
testbeds.
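As an illustration of the broker idea (not the paper's actual interaction-mining algorithm), a node that bridges otherwise disconnected groups can be scored by how many distinct communities its neighbors span; the `adjacency` and `community` inputs below are hypothetical:

```python
def broker_scores(adjacency, community):
    """Score each node by the number of distinct communities among its
    neighbors; nodes linking otherwise separated groups score highest."""
    return {node: len({community[n] for n in neighbors})
            for node, neighbors in adjacency.items()}
```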
Title :Mood Recognition During Online Self Assessment Test
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/mood-recognition-during-online-self-assessment-test
Abstract : Individual emotions play a crucial role during any learning interaction. Identifying a student's emotional
state and providing personalized feedback, based on integrated pedagogical models, has been considered to be one
of the main limitations of traditional e-learning tools. This paper presents an empirical study that illustrates how learner
mood may be predicted during online self-assessment tests. Here, a previous method of determining student mood
has been refined based on the assumption that the influence on learner mood of questions already answered declines
in relation to their distance from the current question. Moreover, this paper sets out to indicate that "exponential logic"
may help produce more efficient models if integrated adequately with affective modeling. The results show that these
assumptions may prove useful to future research.
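The exponential-decay assumption described above can be sketched as a weighted mood estimate (a minimal illustration, not the paper's refined model; the +1/-1 encoding and decay factor are assumptions):

```python
def predict_mood(answers, decay=0.5):
    """Weight each past answer (+1 correct, -1 wrong) by decay**distance
    from the current question, so recent answers dominate the estimate."""
    n = len(answers)
    score = sum((1 if correct else -1) * decay ** (n - 1 - i)
                for i, correct in enumerate(answers))
    weight = sum(decay ** (n - 1 - i) for i in range(n))
    return score / weight if weight else 0.0
```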
Title :On The Path To A World Wide Web Census: A Large Scale Survey
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/world-wide-web-census-large-scale-survey
Abstract : How large is the World Wide Web? We present the results of the largest Web survey performed to
date, using an interdisciplinary approach that borrows methods from ecology. In addition to Web server counts, we
also present other information collected, such as Web server market share, operating system type used by Web
servers and Web server distribution. The software system used to collect data is a prototype of a system that we
believe can be used for a complete Web census.
Title :Knowledge Sharing In Virtual Organizations: Barriers and Enablers
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/knowledge-sharing-in-virtual-organizations-barriers-enablers
Abstract : Modern organizations have to deal with many drastic external and internal constraints due notably to the
globalization of the economy, fast technological change, and shifts in customer demand. Moreover,
organizations' functionally divided, hierarchical internal structures are too rigid, making it difficult for them to adjust
to the changing constraints resulting from the pressure of their external environment. Consequently, to survive and
maintain their competitive advantage in the market, modern organizations must alter their internal structure to become
organic and flexible systems able to adapt and progress in a high velocity environment. Virtual organizations are
among the most popular solutions which provide organizations with more agility and improve their efficiency and
effectiveness. Despite many success stories materialized by economic and non-economic benefits, many virtual
organizations have failed to reach their goals due to the problems they have encountered while trying to manage
knowledge. In this work, we analyze the barriers and enablers of knowledge management in virtual organizations.
Title :Adaptive Provisioning of Human Expertise in Service-oriented Systems
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/adaptive-provisioning-human-expertise-service-oriented-systems
Abstract : Web-based collaborations have become essential in today's business environments. Due to the availability
of various SOA frameworks, Web services emerged as the de facto technology to realize flexible compositions of
services. While most existing work focuses on the discovery and composition of software-based services, we highlight
concepts for a people-centric Web. Knowledge-intensive environments clearly demand the provisioning of human
expertise along with sharing of computing resources or business data through software-based services. To address
these challenges, we introduce an adaptive approach allowing humans to provide their expertise through services
using SOA standards, such as WSDL and SOAP. The seamless integration of humans in the SOA loop triggers
numerous social implications, such as evolving expertise and drifting interests of human service providers. Here we
propose a framework that is based on interaction monitoring techniques enabling adaptations in SOA-based
socio-technical systems.
Title :Cost-aware rank join with random and sorted access
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/cost-aware-rank-join-random-sorted-access
Abstract : In this project, we address the problem of joining ranked results produced by two or more services on the
Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns
tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join
operators combine objects of two or more relations and output the k combinations with the highest aggregate score.
While the past literature has studied suitable bounding schemes for this setting, in this paper we focus on the
definition of a pulling strategy, which determines the order of invocation of the joined services. We propose the CARS
(Cost-Aware with Random and Sorted access) pulling strategy, which is derived at compile-time and is oblivious of
the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small
set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real
and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always
within a very short margin of that of an oracle-based optimal strategy. In addition, CARS is shown to be robust to the
uncertainty that may characterize the estimated parameters.
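The rank-join setting itself can be sketched as follows. This is a plain threshold-based rank join with alternating sorted accesses, not the CARS pulling strategy (whose compile-time optimization the abstract only summarizes); the sum aggregate and one-tuple-per-key simplification are assumptions:

```python
def rank_join(R, S, k):
    """Top-k rank join over two services returning (join_key, score)
    tuples in descending score order. Aggregate score = sum of the two
    scores; one tuple per join key per service is assumed for simplicity.
    Random access is simulated by a dict index over tuples seen so far."""
    seen_R, seen_S = {}, {}
    results = []
    i = j = 0
    while i < len(R) or j < len(S):
        # Alternate sorted accesses; fall back to the non-exhausted side.
        if i < len(R) and (j >= len(S) or i <= j):
            key, score = R[i]; i += 1
            seen_R[key] = score
            if key in seen_S:
                results.append((score + seen_S[key], key))
        else:
            key, score = S[j]; j += 1
            seen_S[key] = score
            if key in seen_R:
                results.append((seen_R[key] + score, key))
        # Best aggregate score any unseen combination could still reach.
        r_last = R[i - 1][1] if i else R[0][1]
        s_last = S[j - 1][1] if j else S[0][1]
        threshold = max(R[0][1] + s_last, r_last + S[0][1])
        results.sort(reverse=True)
        if len(results) >= k and results[k - 1][0] >= threshold:
            break
    return results[:k]
```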
Title :USHER Improving Data Quality with Dynamic Forms
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/usher-improving-data-quality-dynamic-forms
Abstract : Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best
opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving
data quality at entry time. In this paper, we propose USHER, an end-to-end system for form design, entry, and data
quality assurance. Using previous form submissions, USHER learns a probabilistic model over the questions of the
form. USHER then applies this model at every step of the data entry process to improve data quality. Before entry, it
induces a form layout that captures the most important data values of a form instance as quickly as possible. During
entry, it dynamically adapts the form to the values being entered, and enables real-time feedback to guide the data
enterer toward their intended values. After entry, it re-asks questions that it deems likely to have been entered
incorrectly. We evaluate all three components of USHER using two real-world data sets. Our results demonstrate that
each component has the potential to improve data quality considerably, at a reduced cost when compared to current
practice.
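As a toy version of the "after entry" step above (USHER's actual model is a full probabilistic graphical model learned over form questions; the marginal-frequency check and names below are simplifications):

```python
from collections import Counter, defaultdict

def fit_counts(submissions):
    """Count value frequencies per question from past form submissions."""
    counts = defaultdict(Counter)
    for sub in submissions:
        for question, value in sub.items():
            counts[question][value] += 1
    return counts

def flag_for_reask(counts, entry, threshold=0.05):
    """Return questions whose entered value is improbable under the model,
    i.e. candidates to re-ask after entry."""
    flagged = []
    for question, value in entry.items():
        total = sum(counts[question].values())
        p = counts[question][value] / total if total else 0.0
        if p < threshold:
            flagged.append(question)
    return flagged
```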
Title :A Dual Framework and Algorithms for Targeted Data Delivery
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/algorithms-targeted-data-delivery
Abstract : In this project, we develop a framework for comparing pull based solutions and present dual optimization
approaches. The first approach maximizes user utility while satisfying constraints on the usage of system resources.
The second approach satisfies the utility of user profiles while minimizing the usage of system resources. We present
an adaptive algorithm and show how it can incorporate feedback to improve user utility with only a moderate increase
in resource utilization.
http://kasanpro.com/ieee/final-year-project-center-thanjavur-reviews
Title :Selecting Attributes for Sentiment Classification Using Feature Relation Networks
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/sentiment-classification-using-feature-relation-networks
Abstract : A major concern when incorporating large sets of diverse n-gram features for sentiment classification is
the presence of noisy, irrelevant, and redundant attributes. These concerns can often make it difficult to harness the
augmented discriminatory potential of extended feature sets. We propose a rule-based multivariate text feature
selection method called Feature Relation Network (FRN) that considers semantic information and also leverages the
syntactic relationships between n-gram features. FRN is intended to efficiently enable the inclusion of extended sets
of heterogeneous n-gram features for enhanced sentiment classification. Experiments were conducted on three online
review test beds in comparison with methods used in prior sentiment classification research. FRN outperformed the
comparison univariate, multivariate, and hybrid feature selection methods; it was able to select attributes resulting in
significantly better classification accuracy irrespective of the feature subset sizes. Furthermore, by incorporating
syntactic information about n-gram relations, FRN is able to select features in a more computationally efficient manner
than many multivariate and hybrid techniques.
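One of the n-gram relations FRN exploits is subsumption between shorter and longer n-grams; a much-simplified sketch follows (the 0.95 ratio and substring test are illustrative assumptions, and FRN's full relation network is considerably richer):

```python
def prune_redundant_ngrams(freq, ratio=0.95):
    """Subsumption-style pruning: when a longer n-gram occurs almost as
    often as a shorter n-gram it contains, the two are near-duplicates,
    so keep only the shorter, more general feature. Containment is
    checked as a plain substring test for brevity."""
    kept = {}
    for ngram, f in freq.items():
        words = ngram.split()
        redundant = any(
            len(words) > len(sub.split()) and sub in ngram
            and f >= ratio * freq[sub]
            for sub in freq if sub != ngram
        )
        if not redundant:
            kept[ngram] = f
    return kept
```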
Title :Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/aggregate-recommendation-diversity-using-ranking-based
Abstract : Recommender systems are becoming increasingly important to individual users and businesses for
providing personalized recommendations. However, while the majority of algorithms proposed in recommender
systems literature have focused on improving recommendation accuracy, other important aspects of recommendation
quality, such as the diversity of recommendations, have often been overlooked. In this paper, we introduce and
explore a number of item ranking techniques that can generate recommendations that have substantially higher
aggregate diversity across all users while maintaining comparable levels of recommendation accuracy.
Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several
real-world rating datasets and different rating prediction algorithms.
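The core re-ranking idea can be sketched directly: among items whose predicted rating clears a threshold, recommend the least popular first instead of the highest-predicted (the 3.5 threshold and dict inputs below are illustrative):

```python
def diverse_top_n(predictions, popularity, n, t_min=3.5):
    """Item re-ranking for aggregate diversity: restrict to items with
    predicted rating >= t_min (preserving accuracy), then rank by
    ascending popularity, breaking ties by higher predicted rating."""
    candidates = [(item, r) for item, r in predictions.items() if r >= t_min]
    candidates.sort(key=lambda x: (popularity[x[0]], -x[1]))
    return [item for item, _ in candidates[:n]]
```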
Title :Integration of Sound Signature in Graphical Password Authentication System
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/sound-signature-graphical-password-authentication-system
Abstract : In this project, a graphical password system with a supportive sound signature to increase the
remembrance of the password is discussed. In the proposed work, a click-based graphical password scheme called Cued
Click Points (CCP) is presented. In this system a password consists of a sequence of images, in which the user
selects one click-point per image. In addition, the user is asked to select a sound signature corresponding to each
click-point; this sound signature is used to help the user log in. The system showed very good performance in terms of speed,
accuracy, and ease of use. Users preferred CCP to PassPoints, saying that selecting and remembering only one
click-point per image was easier, and that the sound signature helped considerably in recalling the click-points.
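Verification of a CCP login attempt reduces to a per-image tolerance check; real CCP implementations use centered discretization rather than storing raw coordinates, so this is only a sketch (the 9-pixel tolerance is an illustrative value):

```python
def within_tolerance(click, stored, tol=9):
    """True when a click falls inside the tolerance square of a point."""
    cx, cy = click
    sx, sy = stored
    return abs(cx - sx) <= tol and abs(cy - sy) <= tol

def verify_ccp(clicks, stored_points, tol=9):
    """A login succeeds only if every click falls within the tolerance
    square of the stored click-point on the corresponding image."""
    return len(clicks) == len(stored_points) and all(
        within_tolerance(c, s, tol) for c, s in zip(clicks, stored_points))
```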
Title :Monitoring Service Systems from a Language-Action Perspective
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/monitoring-service-systems-language-action
Abstract : Exponential growth in the global economy is being supported by service systems, realized by
recasting mission-critical application services accessed across organizational boundaries. The Language-Action
Perspective (LAP) is based upon the notion that "expert behavior requires an exquisite sensitivity to
context and that such sensitivity is more in the realm of the human than in that of the artificial."
Business processes are increasingly distributed and open, making them prone to failure. Monitoring is, therefore, an
important concern not only for the processes themselves but also for the services that comprise these processes. We
present a framework for multilevel monitoring of these service systems. It formalizes interaction protocols, policies,
and commitments that account for standard and extended effects following the language-action perspective, and
allows specification of goals and monitors at varied abstraction levels. We demonstrate how the framework can be
implemented and evaluate it with multiple scenarios, such as merchant-customer transactions, that include
specifying and monitoring open-service policy commitments.
Title :A Personalized Ontology Model for Web Information Gathering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/ontology-model-web-information-gathering
Abstract : As a model for knowledge description and formalization, ontologies are widely used to represent user
profiles in personalized web information gathering. However, when representing user profiles, many models have
utilized only knowledge from either a global knowledge base or user local information. In this paper, a personalized
ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns
ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model
is evaluated by comparing it against benchmark models in web information gathering. The results show that this
ontology model is successful.
Title :Publishing Search Logs-A Comparative Study of Privacy Guarantees
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/publishing-search-logs-privacy-guarantees
Abstract : Search engine companies collect the "database of intentions", the histories of their users' search queries.
These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing
search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent
keywords, queries and clicks of a search log. We first show how methods that achieve variants of k-anonymity are
vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured by differential privacy
unfortunately does not provide any utility for this problem. Our paper concludes with a large experimental study using
real applications where we compare ZEALOUS and previous work that achieves k-anonymity in search log publishing.
Our results show that ZEALOUS yields comparable utility to k-anonymity while at the same time achieving much
stronger privacy guarantees.
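ZEALOUS's two-thresholds-plus-noise recipe can be sketched as follows (the parameter names k, tau, b and the Laplace-via-exponentials construction are illustrative; the paper derives the calibrated values that yield its privacy guarantee):

```python
import random

def zealous_publish(keyword_counts, k=5, tau=10, b=2.0):
    """ZEALOUS-style sketch: keep only keywords contributed often enough,
    add Laplace(b) noise to their counts (the difference of two iid
    exponentials with mean b), and publish only noisy counts that clear
    a second threshold."""
    published = {}
    for kw, count in keyword_counts.items():
        if count < k:          # first threshold: drop infrequent keywords
            continue
        noisy = count + random.expovariate(1 / b) - random.expovariate(1 / b)
        if noisy >= tau:       # second threshold, on the noisy count
            published[kw] = noisy
    return published
```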
Title :Scalable Scheduling of Updates in Streaming Data Warehouse
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/scheduling-updates-streaming-data-warehouse
Abstract : This study of collective behavior seeks to understand how individuals behave in a social networking
environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present
opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict
collective behavior in social media. In particular, given information about some individuals, how can we infer the
behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown
effective in addressing the heterogeneity of connections presented in social media. However, the networks in social
media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails
scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an
edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed
approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction
performance to other non-scalable methods.
Title :The Awareness Network, To Whom Should I Display My Actions? And, Whose Actions Should I Monitor?
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/accessing-monitoring-inawareness-network
Abstract : The concept of awareness plays a pivotal role in research in Computer Supported Cooperative Work.
Recently, Software Engineering researchers interested in the collaborative nature of software development have
explored the implications of this concept in the design of software development tools. A critical aspect of awareness is
the associated coordinative work practices of displaying and monitoring actions. This aspect concerns how colleagues
monitor one another's actions to understand how these actions impact their own work and how they display their
actions in such a way that others can easily monitor them while doing their own work. In this paper, we focus on an
additional aspect of awareness: the identification of the social actors who should be monitored and the actors to
whom their actions should be displayed. We address this aspect by presenting software developers' work practices
based on ethnographic data from three different software development teams. In addition, we illustrate how these
work practices are influenced by different factors, including the organizational setting, the age of the project, and the
software architecture. We discuss how our results are relevant for both CSCW and Software Engineering
researchers.
Title :The World in a Nutshell Concise Range Queries
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/world-nutshell-concise-range-queries
Abstract : With the advance of wireless communication technology, it is quite common for people to view maps or get
related services from handheld devices, such as mobile phones and PDAs. Range queries, as one of the most
commonly used tools, are often posed by the users to retrieve needful information from a spatial database. However,
due to the limits of communication bandwidth and hardware power of handheld devices, displaying all the results of a
range query on a handheld device is neither communication-efficient nor informative to the users, simply
because there are often too many results returned from a range query.
In view of this problem, we present a novel idea that a concise representation of a specified size for the range query
results, while incurring minimal information loss, shall be computed and returned to the user. Such a concise range
query not only reduces communication costs, but also offers better usability to the users, providing an opportunity for
interactive exploration.
The usefulness of the concise range queries is confirmed by comparing it with other possible alternatives, such as
sampling and clustering. Unfortunately, we prove that finding the optimal representation with minimum information
loss is an NP-hard problem. Therefore, we propose several effective and nontrivial algorithms to find a good
approximate result. Extensive experiments on real-world data have demonstrated the effectiveness and efficiency of
the proposed techniques.
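The flavor of a concise answer can be shown with a deliberately naive partition (the paper's algorithms choose the partition to minimize information loss, which is the NP-hard part; the equal-size split below does not attempt that):

```python
def concise_representation(points, k):
    """Summarize range-query results as at most k bounding boxes with
    counts: sort the 2-D points by x, split into k equal-size runs, and
    report each run's bounding box and cardinality."""
    pts = sorted(points)
    size = -(-len(pts) // k)   # ceiling division
    summary = []
    for i in range(0, len(pts), size):
        chunk = pts[i:i + size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        summary.append(((min(xs), min(ys)), (max(xs), max(ys)), len(chunk)))
    return summary
```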
Title :A Query Formulation Language for the Data Web
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/query-formulation-language-data-web
Abstract : We present a query formulation language called MashQL in order to easily query and fuse structured data
on the web. The main novelty of MashQL is that it allows people with limited IT-skills to explore and query one or
multiple data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of
these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source
has an offline or inline schema. This poses several language-design and performance complexities that we
fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose
the Data Web scenario. We also chose querying RDF, as it is the most primitive data model; hence, MashQL can be
similarly used for querying relational databases and XML. We present two implementations of MashQL, an online
mashup editor, and a Firefox add-on. The former illustrates how MashQL can be used to query and mash up the Data
Web as simply as filtering and piping web feeds; the add-on illustrates using the browser as a web
composer rather than only a navigator. To end, we evaluate MashQL on querying two datasets, DBLP and DBPedia,
and show that our indexing techniques allow instant user-interaction.
Title :Exploring Application-Level Semantics for Data Compression
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/exploring-application-level-semantics-data-compression
Abstract : Natural phenomena show that many creatures form large social groups and move in regular patterns.
However, previous works focus on finding the movement patterns of each single object or all objects. In this paper, we
first propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their
movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D, which
exploits the obtained group movement patterns to reduce the amount of delivered data.
The compression algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge
phase, we propose a Merge algorithm to merge and compress the location data of a group of moving objects. In the
entropy reduction phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that
obtains the optimal solution. Moreover, we devise three replacement rules and derive the maximum compression
ratio. The experimental results show that the proposed compression algorithm leverages the group movement
patterns to reduce the amount of delivered data effectively and efficiently.
Title :Data Leakage Detection
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/data-leakage-detection
Abstract : A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of
the data is leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must
assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently
gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of
identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases
we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying
the guilty party.
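A stripped-down version of the guilt computation (the paper's model apportions blame across multiple agents and drives the allocation strategies; the independent-guessing assumption and p_guess parameter below are simplifications):

```python
def guilt_probability(leaked, agent_data, p_guess=0.2):
    """Probability the agent leaked, assuming each leaked object it
    received could instead have been guessed independently by the target
    with probability p_guess; the agent is innocent only if every
    overlapping object was guessed."""
    overlap = set(leaked) & set(agent_data)
    if not overlap:
        return 0.0
    return 1.0 - p_guess ** len(overlap)
```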
Title :Knowledge Based Interactive Postmining of Association Rules Using Ontologies
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/knowledge-based-interactive-postmining-association-rules-using-ontologies
Abstract : In Data Mining, the usefulness of association rules is strongly limited by the huge amount of delivered
rules. To overcome this drawback, several methods were proposed in the literature, such as concise itemset
representations, redundancy reduction, and post-processing. However, being generally based on statistical
information, most of these methods do not guarantee that the extracted rules are interesting to the user. Thus, it is
crucial to help the decision-maker with an efficient post-processing step in order to reduce the number of rules. This
paper proposes a new interactive approach to prune and filter discovered rules. First, we propose to use ontologies in
order to improve the integration of user knowledge in the post-processing task. Second, we propose the Rule Schema
formalism extending the specification language proposed by Liu et al. for user expectations. Furthermore, an
interactive framework is designed to assist the user throughout the analyzing task. Applying our new approach over
voluminous sets of rules, we were able, by integrating domain expert knowledge in the post-processing step, to
reduce the number of rules to several dozens or less. Moreover, the quality of the filtered rules was validated by the
domain expert at various points in the interactive process.
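The Rule Schema idea of filtering mined rules through ontology concepts can be sketched with a simple parent map (the rule and ontology encodings below are hypothetical simplifications of the paper's formalism):

```python
def filter_rules(rules, schema_antecedent, schema_consequent, is_a):
    """Keep a rule (antecedent_items, consequent_items) only when every
    antecedent item is subsumed by the schema's antecedent concept and
    every consequent item by its consequent concept, walking the is_a
    parent map of the ontology."""
    def subsumed(item, concept):
        while item is not None:
            if item == concept:
                return True
            item = is_a.get(item)
        return False
    return [r for r in rules
            if all(subsumed(i, schema_antecedent) for i in r[0])
            and all(subsumed(i, schema_consequent) for i in r[1])]
```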
Title :A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/link-analysis-mining-relational-databases
Abstract : This work introduces a link-analysis procedure for discovering relationships in a relational database or a
graph, generalizing both simple and multiple correspondence analysis. It is based on a random-walk model through
the database defining a Markov chain having as many states as elements in the database. Suppose we are interested
in analyzing the relationships between some elements (or records) contained in two different tables of the relational
database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest
and preserving the main characteristics of the initial chain is extracted by stochastic complementation. This reduced
chain is then analyzed by projecting jointly the elements of interest in the diffusion-map subspace and visualizing the
results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to
multiple correspondence analysis when the database takes the form of a simple star schema. On the other hand, a
kernel version of the diffusion-map distance, generalizing the basic diffusion-map distance to directed graphs, is also
introduced and the links with spectral clustering are discussed. Several datasets are analyzed by using the proposed
methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
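Stochastic complementation itself has a compact form, P11 + P12 (I - P22)^-1 P21; a sketch restricted to a single censored state (so the matrix inverse reduces to a scalar), with the transition matrix given as a dict of dicts:

```python
def stochastic_complement(P, keep):
    """Censor a Markov chain to the states in `keep`, redistributing the
    probability of excursions through the censored state. With one
    censored state c, (I - P22)^-1 is the scalar 1 / (1 - P[c][c])."""
    censored = [s for s in P if s not in keep]
    if len(censored) != 1:
        raise ValueError("sketch handles exactly one censored state")
    c = censored[0]
    alpha = 1.0 / (1.0 - P[c][c])
    return {i: {j: P[i][j] + P[i][c] * alpha * P[c][j] for j in keep}
            for i in keep}
```

Each row of the reduced chain still sums to one, which the test below checks.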
Title :Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/query-planning-continuous-aggregation-queries
Abstract : Continuous queries are used to monitor changes to time-varying data and to provide results useful for
online decision making. Typically, a user desires to obtain the value of some aggregation function over distributed data
items, for example, the value of a client's portfolio, or the AVG of temperatures sensed by a set of sensors. In
these queries a client specifies a coherency requirement as part of the query. We present a low-cost, scalable
technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a
network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as various
fragments of a dynamic web-page are served by one or more nodes of a content distribution network, our technique
involves decomposing a client query into sub-queries and executing sub-queries on judiciously chosen data
aggregators with their individual sub-query incoherency bounds. We provide a technique for obtaining the optimal set of
sub-queries, with their incoherency bounds, that satisfies the client query's coherency requirement with the least
number of refresh messages sent from aggregators to the client. For estimating the number of refresh messages, we build a
query cost model which can be used to estimate the number of messages required to satisfy the client specified
incoherency bound. Performance results using real-world traces show that our cost based query planning leads to
queries being executed using less than one third the number of messages required by existing schemes.
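One plausible way to divide a client's incoherency bound among sub-queries is shown below (the paper derives its allocation from a query cost model; the proportional heuristic and input shapes here are assumptions, valid for SUM-like aggregates where sub-query bounds add up):

```python
def allocate_incoherency(sub_queries, total_bound):
    """Split the client's incoherency bound among (name, dynamics) pairs
    in proportion to each sub-query's data dynamics, a proxy for how
    often it would otherwise trigger refresh messages."""
    total_dyn = sum(dyn for _, dyn in sub_queries)
    return {name: total_bound * dyn / total_dyn for name, dyn in sub_queries}
```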
Title :Scalable learning of collective behavior
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/scalable-learning-collective-behavior
Abstract : This study of collective behavior seeks to understand how individuals behave in a social networking
environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present
opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict
collective behavior in social media. In particular, given information about some individuals, how can we infer the
behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown
effective in addressing the heterogeneity of connections presented in social media. However, the networks in social
media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails
scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an
edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed
approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction
performance to other non-scalable methods.
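The payoff of edge-centric clustering is easy to see in code: a node's social dimensions are just the communities of its incident edges, so the node-by-community matrix stays sparse even for huge networks (the edge-to-community map would come from clustering edge instances in the paper; here it is simply given):

```python
from collections import defaultdict

def sparse_social_dimensions(edges, edge_community):
    """Derive each node's sparse social dimensions from an edge-level
    community assignment: a node is affiliated only with the communities
    of its incident edges."""
    dims = defaultdict(set)
    for (u, v), c in zip(edges, edge_community):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)
```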
Title :Horizontal Aggregations in SQL to prepare Data Sets for Data Mining Analysis
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/horizontal-aggregations-sql-data-mining-analysis
Abstract : Preparing a data set for analysis is generally the most time-consuming task in a data mining project,
requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have
limitations to prepare data sets because they return one column per aggregated group. In general, a significant
manual effort is required to build data sets, where a horizontal layout is required. We propose simple, yet powerful,
methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of
numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal
aggregations build data sets with a horizontal denormalized layout (e.g. point-dimension, observation-variable,
instance-feature), which is the standard layout required by most data mining algorithms. We propose three
fundamental methods to evaluate horizontal aggregations: CASE: Exploiting the programming CASE construct; SPJ:
Based on standard relational algebra operators (SPJ queries); PIVOT: Using the PIVOT operator, which is offered by
some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method
has similar speed to the PIVOT operator and it is much faster than the SPJ method. In general, the CASE and PIVOT
methods exhibit linear scalability, whereas the SPJ method does not.
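As a minimal illustration of the CASE method described above, the following sketch generates and runs a CASE-based horizontal aggregation on an in-memory SQLite table. The table, column names, and data are invented for the example, not taken from the paper.

```python
import sqlite3

def horizontal_case_query(table, group_col, pivot_col, agg_col, pivot_values):
    """Generate SQL for the CASE-based horizontal aggregation: one
    aggregated output column per distinct value of pivot_col."""
    cases = ", ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {agg_col} ELSE 0 END) AS {pivot_col}_{v}"
        for v in pivot_values
    )
    return f"SELECT {group_col}, {cases} FROM {table} GROUP BY {group_col}"

# Toy demo on an in-memory database (table/column names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("s1", "Q1", 10), ("s1", "Q2", 20), ("s2", "Q1", 5)])
sql = horizontal_case_query("sales", "store", "quarter", "amount", ["Q1", "Q2"])
rows = sorted(conn.execute(sql).fetchall())
# one row per store, one aggregated column per quarter
```

The same horizontal layout could be produced with the PIVOT operator on DBMSs that offer it; the CASE form above is the portable variant.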
M.Phil Computer Science Data Mining Projects
Title :A Machine Learning Approach for Identifying Disease-Treatment Relations in Short Texts
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/machine-learning-identifying-disease-treatment-relations-short-texts
Abstract : The Machine Learning (ML) field has gained momentum in almost every domain of research and has
recently become a reliable tool in the medical domain. The empirical domain of automatic learning is used in
tasks such as medical decision support, medical imaging, protein-protein interaction, extraction of medical knowledge,
and for overall patient management care.
ML is envisioned as a tool by which computer-based systems can be integrated into the healthcare field in order to
provide better, more efficient medical care. This paper describes a ML-based methodology for building an application that is
capable of identifying and disseminating healthcare information.
It extracts sentences from published medical papers that mention diseases and treatments, and identifies semantic
relations that exist between diseases and treatments.
Our evaluation results for these tasks show that the proposed methodology obtains reliable outcomes that could be
integrated in an application to be used in the medical care domain. The potential value of this paper stands in the ML
settings that we propose and in the fact that we outperform previous results on the same data set.
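The relation-identification step can be pictured with a toy multinomial Naive Bayes classifier over bag-of-words features. This is a simplified stand-in for the paper's ML settings, and the labelled sentences below are invented examples, not the paper's data set.

```python
import math
from collections import Counter, defaultdict

def train_nb(sentences, labels):
    """Count word frequencies per relation label (toy tokenizer: lowercase split)."""
    word_counts = defaultdict(Counter)   # label -> word counts
    label_counts = Counter(labels)
    vocab = set()
    for sent, lab in zip(sentences, labels):
        toks = sent.lower().split()
        word_counts[lab].update(toks)
        vocab.update(toks)
    return word_counts, label_counts, vocab

def predict_nb(model, sentence):
    """Pick the label maximizing log P(label) + sum of log P(token | label),
    with Laplace smoothing over the training vocabulary."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for lab in label_counts:
        lp = math.log(label_counts[lab] / total)
        n = sum(word_counts[lab].values())
        for tok in sentence.lower().split():
            lp += math.log((word_counts[lab][tok] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

# Invented labelled sentences standing in for annotated medical text.
train_x = ["aspirin treats headache", "vaccine prevents measles",
           "smoking causes cancer", "obesity causes diabetes"]
train_y = ["cure", "prevent", "side_effect", "side_effect"]
model = train_nb(train_x, train_y)
pred = predict_nb(model, "drug treats fever")
```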
Title :m-Privacy for Collaborative Data Publishing
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/privacy-collaborative-data-publishing
Abstract : In this paper, we consider the collaborative data publishing problem for anonymizing horizontally
partitioned data at multiple data providers. We consider a new type of "insider attack" by colluding data providers who
may use their own data records (a subset of the overall data) in addition to the external background knowledge to
infer the data records contributed by other data providers. The paper addresses this new threat and makes several
contributions. First, we introduce the notion of m-privacy, which guarantees that the anonymized data satisfies a given
privacy constraint against any group of up to m colluding data providers. Second, we present heuristic algorithms
exploiting the equivalence group monotonicity of privacy constraints and adaptive ordering techniques for efficiently
checking m-privacy given a set of records. Finally, we present a data provider-aware anonymization algorithm with
adaptive m-privacy checking strategies to ensure high utility and m-privacy of anonymized data with efficiency.
Experiments on real-life datasets suggest that our approach achieves utility and efficiency better than or comparable
to existing and baseline algorithms while providing the m-privacy guarantee.
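A naive sketch of the m-privacy check, with k-anonymity as the instantiated privacy constraint: every coalition of up to m providers must leave at least k unseen records in each published equivalence group. The brute-force enumeration below is for illustration only; the paper's heuristic algorithms with equivalence-group monotonicity and adaptive ordering avoid it.

```python
from itertools import combinations

def is_m_private(groups, providers, m, k):
    """Check m-privacy under k-anonymity: for every coalition of up to m
    colluding providers, each anonymized equivalence group must still hold
    at least k records the coalition did not contribute.
    groups: list of lists of (provider_id, record) pairs."""
    for size in range(m + 1):
        for coalition in combinations(providers, size):
            attackers = set(coalition)
            for group in groups:
                unseen = [r for pid, r in group if pid not in attackers]
                if 0 < len(unseen) < k:   # remaining group is too small
                    return False
    return True

# Toy horizontally partitioned data from three providers (values invented).
providers = ["P1", "P2", "P3"]
groups = [[("P1", "r1"), ("P2", "r2"), ("P3", "r3")],
          [("P1", "r4"), ("P2", "r5"), ("P2", "r6"), ("P3", "r7")]]
ok_m1 = is_m_private(groups, providers, m=1, k=2)   # one colluder: holds
ok_m2 = is_m_private(groups, providers, m=2, k=2)   # two colluders: fails
```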
Title :Spatial Approximate String Search
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/spatial-approximate-string-search
Abstract : This work deals with the approximate string search in large spatial databases. Specifically, we investigate
range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We
dub this query the spatial approximate string (SAS) query. In Euclidean space, we propose an approximate solution,
the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps
a concise representation of the union of q-grams from strings under the sub-tree of u. We analyze the pruning
functionality of such signatures based on the set resemblance between the query string and the q-grams from the
sub-trees of index nodes. We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for
which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information
stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly
outperforms the baseline algorithm in practice. The RSASSOL combines the q-gram based inverted lists and the
reference nodes based pruning. Extensive experiments on large real data sets demonstrate the efficiency and
effectiveness of our approaches.
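The min-wise signature idea behind the pruning can be sketched as follows: each string is reduced to a q-gram set, and a signature keeps, per hash function, the minimum hashed q-gram; the fraction of matching minima estimates the Jaccard resemblance of the underlying sets. The hash construction and parameters here are illustrative, not the paper's exact scheme.

```python
import hashlib

def qgrams(s, q=2):
    """The set of q-grams (q=2 here) of a string, with '#' padding so that
    prefixes and suffixes also form grams."""
    s = f"#{s}#"
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def minwise_signature(grams, num_hashes=32, seed=7):
    """Min-wise signature: for each of num_hashes hash functions, keep the
    minimum hashed q-gram. The expected fraction of matching minima equals
    the Jaccard resemblance of the two q-gram sets."""
    return [min(int(hashlib.md5(f"{seed}:{i}:{g}".encode()).hexdigest(), 16)
                for g in grams)
            for i in range(num_hashes)]

def resemblance(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

sa = minwise_signature(qgrams("theatre"))
sb = minwise_signature(qgrams("theater"))   # similar string: many shared grams
sc = minwise_signature(qgrams("museum"))    # dissimilar string: no shared grams
```

An index node would store such a signature for the union of q-grams in its sub-tree, letting the search prune sub-trees whose estimated resemblance to the query string is too low.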
Title :Predicting iPhone Sales from iPhone Tweets
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/predicting-iphone-sales-iphone-tweets
Abstract : Recent research in the field of computational social science has shown how data resulting from the
widespread adoption and use of social media channels such as Twitter can be used to predict outcomes such as
movie revenues, election winners, localized moods, and epidemic outbreaks. The underlying assumptions of this
research stream on predictive analytics are that social media actions such as tweeting, liking, commenting, and rating
are proxies for user/consumer attention to a particular object/product and that the persistent, shared digital artefact
can create social influence. In this paper, we demonstrate how social media data from Twitter can be used
to predict the sales of iPhones. Based on a conceptual model of social data consisting of social graph (actors, actions,
activities, and artefacts) and social text (topics, keywords, pronouns, and sentiments), we develop and evaluate a
linear regression model that transforms iPhone tweets into a prediction of the quarterly iPhone sales with an average
error close to the established prediction models from investment banks. This strong correlation between iPhone
tweets and iPhone sales becomes marginally stronger after incorporating sentiments of tweets. We discuss the
findings and conclude with implications for predictive analytics with big social data.
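The core of such a prediction model can be pictured as an ordinary least-squares fit of quarterly sales on tweet volume. All numbers below are invented for illustration, not the paper's data, and the real model also folds in sentiment and other social-text features.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x
    (here: tweet volume -> unit sales)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

tweets = [120, 150, 200, 260]   # quarterly tweet counts, thousands (invented)
sales = [26, 32, 42, 54]        # quarterly sales, millions (invented)
a, b = fit_ols(tweets, sales)
forecast = a + b * 300          # prediction for a quarter with 300k tweets
```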
Title :A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/clustering-based-feature-subset-selection-algorithm-high-dimensional-data
Abstract : Feature selection involves identifying a subset of the most useful features that produces compatible results
as the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and
effectiveness points of view. While the efficiency concerns the time required to find a subset of features, the
effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature
selection algorithm, FAST, is proposed and experimentally evaluated in this paper. The FAST algorithm works in two
steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second
step, the most representative feature that is strongly related to target classes is selected from each cluster to form a
subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of
FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we
adopt the efficient minimum-spanning tree clustering method. The efficiency and effectiveness of the FAST algorithm
are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several
representative feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to
four types of well-known classifiers, namely, the probability-based Naive Bayes, the tree-based C4.5, the
instance-based IB1, and the rule-based RIPPER before and after feature selection. The results, on 35 publicly
available real-world high dimensional image, microarray, and text data, demonstrate that FAST not only produces
smaller subsets of features but also improves the performances of the four types of classifiers.
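A compact sketch of FAST's two steps, with absolute Pearson correlation standing in for symmetric uncertainty and a tiny hand-made data set; the real algorithm's measure, cut rule details, and scale differ.

```python
def pearson(xs, ys):
    """Absolute Pearson correlation, a cheap stand-in for the symmetric
    uncertainty measure used by FAST."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def fast_select(features, target):
    """Two-step FAST sketch: (1) Prim's MST over feature-feature distances;
    (2) cut edges weaker than both endpoints' relevance to the target, then
    keep the most target-relevant feature of each resulting cluster."""
    names = list(features)
    rel = {f: pearson(features[f], target) for f in names}
    in_tree, edges = {names[0]}, []
    while len(in_tree) < len(names):          # Prim's MST construction
        d, u, v = min((1 - pearson(features[u], features[v]), u, v)
                      for u in in_tree for v in names if v not in in_tree)
        edges.append((u, v, 1 - d))
        in_tree.add(v)
    parent = {f: f for f in names}            # union-find for components
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for u, v, corr in edges:
        if corr >= min(rel[u], rel[v]):       # edge survives unless weaker
            parent[find(u)] = find(v)         # than both endpoints' relevance
    clusters = {}
    for f in names:
        clusters.setdefault(find(f), []).append(f)
    return sorted(max(c, key=lambda f: rel[f]) for c in clusters.values())

# Toy data: f2 duplicates f1; f3 is independent (all values invented).
features = {"f1": [1, 2, 3, 4, 5, 6],
            "f2": [2, 4, 6, 8, 10, 12],
            "f3": [1, -1, 0, 0, -1, 1]}
target = [4, -1, 3, 4, 2, 9]                  # exactly f1 + 3*f3
selected = fast_select(features, target)      # one of f1/f2, plus f3
```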
Title :Crowdsourcing Predictors of Behavioral Outcomes
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/crowdsourcing-predictors-behavioral-outcomes
Abstract : Generating models from large data sets--and determining which subsets of data to mine--is becoming
increasingly automated. However, choosing what data to collect in the first place requires human intuition or
experience, usually supplied by a domain expert. This paper describes a new approach to machine science which
demonstrates for the first time that non-domain experts can collectively formulate features, and provide values for
those features such that they are predictive of some behavioral outcome of interest. This was accomplished by
building a web platform in which human groups interact to both respond to questions likely to help predict a behavioral
outcome and pose new questions to their peers. This produces a dynamically growing online survey, and this
cooperative behavior also leads to models that can predict users' outcomes based on their responses to the
user-generated survey questions. Here we describe two web-based experiments that instantiate this approach: the
first site led to models that can predict users' monthly electric energy consumption; the other led to models that can
predict users' body mass index. As exponential increases in content are often observed in successful online
collaborative communities, the proposed methodology may, in the future, lead to similar exponential rises in discovery
and insight into the causal factors of behavioral outcomes.
Title :Data Extraction for Deep Web Using WordNet
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/data-extraction-deep-web-using-wordnet
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to
achieve the efficiency and accuracy of automatic wrappers. Further investigations indicate that the development of a
lightweight ontological technique using existing lexical database for English (WordNet) is able to check the similarity
of data records and detect the correct data region with higher precision using the semantic properties of these data
records. The advantages of this method are that it can extract three types of data records, namely, single-section data
records, multiple-section data records, and loosely structured data records, and it also provides options for aligning
iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than
the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from
multilingual web pages and that it is domain independent.
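The record-similarity idea can be sketched with a tiny hand-made synonym table standing in for WordNet; the real system queries WordNet synsets, and all names and thresholds here are purely illustrative.

```python
# Hand-made synonym sets standing in for WordNet synsets (illustrative only).
SYNONYMS = {
    "price": {"cost", "price", "rate"},
    "cost": {"cost", "price", "rate"},
    "title": {"title", "name", "heading"},
    "name": {"title", "name", "heading"},
}

def semantically_similar(term_a, term_b):
    """True if term_b falls in term_a's synonym set (WordNet-style check)."""
    synset = SYNONYMS.get(term_a.lower(), {term_a.lower()})
    return term_b.lower() in synset

def similar_records(rec_a, rec_b, min_matches=2):
    """Two candidate data records belong to the same data region when
    enough of their field labels are semantically similar; the threshold
    is arbitrary, for illustration."""
    matches = sum(any(semantically_similar(fa, fb) for fb in rec_b)
                  for fa in rec_a)
    return matches >= min_matches

rec1 = ["title", "price", "author"]
rec2 = ["name", "cost", "publisher"]   # different labels, same semantics
same_region = similar_records(rec1, rec2)
```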
Title :Data Extraction for Deep Web Using WordNet
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/data-extraction-deep-web-using-wordnet-code
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to
achieve the efficiency and accuracy of automatic wrappers. Further investigations indicate that the development of a
lightweight ontological technique using existing lexical database for English (WordNet) is able to check the similarity
of data records and detect the correct data region with higher precision using the semantic properties of these data
records. The advantages of this method are that it can extract three types of data records, namely, single-section data
records, multiple-section data records, and loosely structured data records, and it also provides options for aligning
iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than
the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from
multilingual web pages and that it is domain independent.
Title :Data Extraction for Deep Web Using WordNet
Language : PHP
Project Link : http://kasanpro.com/p/php/data-extraction-deep-web-using-wordnet-implement
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to
achieve the efficiency and accuracy of automatic wrappers. Further investigations indicate that the development of a
lightweight ontological technique using existing lexical database for English (WordNet) is able to check the similarity
of data records and detect the correct data region with higher precision using the semantic properties of these data
records. The advantages of this method are that it can extract three types of data records, namely, single-section data
records, multiple-section data records, and loosely structured data records, and it also provides options for aligning
iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than
the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from
multilingual web pages and that it is domain independent.
Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/retrieval-medical-records-data-mining
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare
and the efficiency of healthcare systems. Due to time and cost constraints, most people rely on healthcare systems to
obtain healthcare services. It is therefore very important to develop an automated tool that is capable of identifying
and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate and
relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword
searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor (KNN)
algorithm to obtain the relation between disease and treatment. In this way, improvement of patient care is achieved effectively.
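The keyword-plus-KNN pipeline can be sketched as a Jaccard-similarity nearest-neighbour vote over labelled sentences; the sentences and labels below are invented examples, not Medline data.

```python
from collections import Counter

def jaccard(a, b):
    """Keyword overlap between two sentences (token-set Jaccard)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def knn_relation(query, labelled, k=3):
    """k-nearest-neighbour vote over keyword similarity, a toy stand-in
    for the KNN disease-treatment relation step."""
    ranked = sorted(labelled, key=lambda sl: jaccard(query, sl[0]),
                    reverse=True)
    votes = Counter(lab for _, lab in ranked[:k])
    return votes.most_common(1)[0][0]

labelled = [
    ("antibiotics cure bacterial infection", "cure"),
    ("insulin controls diabetes", "cure"),
    ("vaccination prevents polio infection", "prevent"),
    ("screening prevents late cancer diagnosis", "prevent"),
    ("aspirin causes stomach irritation", "side_effect"),
]
rel = knn_relation("penicillin cure throat infection", labelled, k=3)
```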
Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/retrieval-medical-records-data-mining-code
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare
and the efficiency of healthcare systems. Due to time and cost constraints, most people rely on healthcare systems to
obtain healthcare services. It is therefore very important to develop an automated tool that is capable of identifying
and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate and
relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword
searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor (KNN)
algorithm to obtain the relation between disease and treatment. In this way, improvement of patient care is achieved effectively.
Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : PHP
Project Link : http://kasanpro.com/p/php/retrieval-medical-records-data-mining-implement
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare
and the efficiency of healthcare systems. Due to time and cost constraints, most people rely on healthcare systems to
obtain healthcare services. It is therefore very important to develop an automated tool that is capable of identifying
and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate and
relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword
searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor (KNN)
algorithm to obtain the relation between disease and treatment. In this way, improvement of patient care is achieved effectively.
Title :Design and analysis of concept adapting real time data stream Applications
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/concept-adapting-real-time-data-stream-applications
Abstract : Real-time signals are continuous in nature and change abruptly; hence there is a need to apply an
efficient, concept-adapting real-time data stream mining technique to take intelligent decisions online. Concept
drift in a real-time data stream refers to a change in the class (concept) definitions over time. It is also called
NON-STATIONARY LEARNING (NSL).
The most important criterion is to solve the real-time data stream mining problem with 'concept drift' in an effective manner.
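One standard way to detect such drift online is a DDM-style error-rate monitor: track the running error rate p and its standard deviation s, remember the best p_min + s_min seen, and signal drift when p + s exceeds p_min + 3*s_min. This is an illustrative textbook technique, not necessarily the exact method used in this project.

```python
import math

class DriftDetector:
    """DDM-style concept-drift monitor over a stream of prediction outcomes."""
    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed one outcome (1 = misclassified); returns True on drift."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n >= 30 and p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s        # best performance so far
        return self.n >= 30 and p + s > self.p_min + 3 * self.s_min

det = DriftDetector()
# ~10% error rate for 100 steps, then the concept changes and every
# prediction fails (synthetic stream, values invented)
stream = [1 if i % 10 == 0 else 0 for i in range(100)] + [1] * 40
alarms = [i for i, e in enumerate(stream) if det.update(e)]
```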
Title :Data Extraction for Deep Web Using WordNet
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/data-extraction-deep-web-using-wordnet-module
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to
achieve the efficiency and accuracy of automatic wrappers. Further investigations indicate that the development of a
lightweight ontological technique using existing lexical database for English (WordNet) is able to check the similarity
of data records and detect the correct data region with higher precision using the semantic properties of these data
records. The advantages of this method are that it can extract three types of data records, namely, single-section data
records, multiple-section data records, and loosely structured data records, and it also provides options for aligning
iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than
the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from
multilingual web pages and that it is domain independent.
Title :Answering General Time-Sensitive Queries
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/answering-general-time-sensitive-queries
Abstract : Time is an important dimension of relevance for a large number of searches, such as over blogs and news
archives. So far, research on searching over such collections has largely focused on locating topically similar
documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this
paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the
documents in a news archive is important and should be considered in conjunction with the topic similarity to derive
the final document ranking. Earlier work has focused on improving retrieval for "recency" queries that target recent
documents. We propose a more general framework for handling time-sensitive queries and we automatically identify
the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that
seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental
evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using the
Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over
a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust
and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
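The blending of topical and temporal evidence can be sketched as a linear combination over the detected important intervals; the blend weight and similarity values below are illustrative, and the paper's scoring techniques are richer than this hard-interval simplification.

```python
def temporal_score(doc_time, intervals):
    """1.0 if the document's publication time falls inside any important
    interval detected for the query, else 0.0."""
    return 1.0 if any(lo <= doc_time <= hi for lo, hi in intervals) else 0.0

def rank(docs, query_sim, intervals, lam=0.6):
    """Final score = lam * topic similarity + (1 - lam) * temporal score."""
    scored = [(lam * query_sim[d] + (1 - lam) * temporal_score(t, intervals), d)
              for d, t in docs]
    return [d for s, d in sorted(scored, reverse=True)]

# Invented documents with publication years and topical similarities.
docs = [("d1", 2004), ("d2", 1998), ("d3", 2004)]
query_sim = {"d1": 0.5, "d2": 0.7, "d3": 0.4}
ranking = rank(docs, query_sim, intervals=[(2003, 2005)])
# d2 is most topical but outside the important interval, so it drops
```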
Title :Answering General Time-Sensitive Queries
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/answering-general-time-sensitive-queries-framwork
Abstract : Time is an important dimension of relevance for a large number of searches, such as over blogs and news
archives. So far, research on searching over such collections has largely focused on locating topically similar
documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this
paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the
documents in a news archive is important and should be considered in conjunction with the topic similarity to derive
the final document ranking. Earlier work has focused on improving retrieval for "recency" queries that target recent
documents. We propose a more general framework for handling time-sensitive queries and we automatically identify
the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that
seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental
evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using the
Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over
a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust
and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
Title :A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/indexing-scalable-record-linkage-deduplication
Abstract : Record linkage is the process of matching records from several databases that refer to the same entities.
When applied on a single database, this process is known as deduplication. Increasingly, matched data are becoming
important in many application areas, because they can contain information that is not available otherwise, or that is
too costly to acquire. Removing duplicate records in a single database is a crucial step in the data cleaning process,
because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the
increasing size of today's databases, the complexity of the matching process becomes one of the major challenges
for record linkage and deduplication. In recent years, various indexing techniques have been developed for record
linkage and deduplication. They are aimed at reducing the number of record pairs to be compared in the matching
process by removing obvious non-matching pairs, while at the same time maintaining high matching quality. This
paper presents a survey of twelve variations of six indexing techniques. Their complexity is analysed, and their
performance and scalability are evaluated within an experimental framework using both synthetic and real data sets. No
such detailed survey has so far been published.
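The common idea underlying the surveyed techniques, restricting comparisons to records that share a blocking key, can be sketched as follows. The key definition here is a deliberately crude example; the surveyed indexing techniques are far more refined.

```python
from itertools import combinations

def blocking_key(rec):
    """Crude blocking key: first letter of surname + postcode prefix."""
    return (rec["surname"][:1].lower(), rec["zip"][:2])

def candidate_pairs(records):
    """Compare only records that fall into the same block, instead of all
    n*(n-1)/2 pairs -- the core idea of indexing for record linkage."""
    blocks = {}
    for i, r in enumerate(records):
        blocks.setdefault(blocking_key(r), []).append(i)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs

records = [                       # invented records
    {"surname": "Smith", "zip": "12345"},
    {"surname": "Smyth", "zip": "12399"},
    {"surname": "Jones", "zip": "54321"},
    {"surname": "Smith", "zip": "54000"},
]
pairs = candidate_pairs(records)
full = len(records) * (len(records) - 1) // 2   # brute-force pair count
```

Note the trade-off the survey evaluates: the crude key above misses the (Smith, 12345) vs (Smith, 54000) pair, illustrating how blocking trades matching completeness for fewer comparisons.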
Title :Decentralized Probabilistic Text Clustering
Language : NS2
Project Link : http://kasanpro.com/p/ns2/decentralized-probabilistic-text-clustering
Abstract : Text clustering is an established technique for improving quality in information retrieval, for both
centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly
distributed environments, such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high
scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of
its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers
probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental
evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the
algorithm.
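The shortlisting idea can be sketched as follows: a peer compares a document only with the few clusters whose summaries overlap it most, rather than with all clusters. Cluster summaries and the probe size are illustrative, and the paper's probabilistic guarantees are more involved than this deterministic toy.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency maps."""
    num = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def assign(doc, cluster_summaries, probe=2):
    """Shortlist the `probe` clusters sharing the most terms with the
    document, then assign to the most cosine-similar of those -- the
    few-comparisons idea that gives the approach its scalability."""
    doc_tf = Counter(doc.lower().split())
    shortlist = sorted(cluster_summaries,
                       key=lambda c: len(doc_tf.keys()
                                         & cluster_summaries[c].keys()),
                       reverse=True)[:probe]
    return max(shortlist, key=lambda c: cosine(doc_tf, cluster_summaries[c]))

summaries = {   # term-frequency cluster summaries (toy values)
    "sports": Counter({"match": 5, "team": 4, "score": 3}),
    "finance": Counter({"stock": 5, "market": 4, "bank": 2}),
    "tech": Counter({"software": 5, "cloud": 3, "market": 1}),
}
label = assign("the team won the match with a record score", summaries)
```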
Title :Decentralized Probabilistic Text Clustering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/decentralized-probabilistic-text-clustering-code
Abstract : Text clustering is an established technique for improving quality in information retrieval, for both
centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly
distributed environments, such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high
scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of
its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers
probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental
evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the
algorithm.
Title :Effective Pattern Discovery for Text Mining
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/effective-pattern-discovery-text-mining
Abstract : Many data mining techniques have been proposed for mining useful patterns in text documents. However,
how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text
mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of
polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based
approaches should perform better than the term-based ones, but many experiments do not support this hypothesis.
This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern
deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding
relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate
that the proposed solution achieves encouraging performance.
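The pattern-deploying step can be pictured as projecting frequent termsets discovered in relevant documents onto individual term weights; the mining (length-limited termsets) and the weighting scheme below are heavily simplified relative to the paper's technique.

```python
from itertools import combinations
from collections import Counter

def frequent_patterns(docs, min_support=2, max_len=2):
    """Frequent termsets (up to max_len terms) from relevant documents."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for n in range(1, max_len + 1):
            counts.update(combinations(terms, n))
    return {p: c for p, c in counts.items() if c >= min_support}

def deploy(patterns):
    """Pattern deploying: share each pattern's support among its terms so
    discovered patterns become a term-weight vector usable for ranking."""
    weights = Counter()
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] += support / len(pattern)
    return weights

# Invented relevant documents.
docs = ["data mining methods", "text mining patterns", "mining data streams"]
weights = deploy(frequent_patterns(docs))
# terms supported by frequent patterns dominate; rare terms drop out
```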
Title :Ranking Model Adaptation for Domain-Specific Search
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/adaptation-domain-specific-search
Abstract : With the explosive emergence of vertical search domains, applying the broad-based ranking model directly
to different domains is no longer desirable due to domain differences, while building a unique ranking model for each
domain is both laborious for labeling data and time-consuming for training models. In this paper, we address these
difficulties by proposing a regularization based algorithm called ranking adaptation SVM (RA-SVM), through which we
can adapt an existing ranking model to a new domain, so that the amount of labeled data and the training cost is
reduced while the performance is still guaranteed. Our algorithm only requires the prediction from the existing ranking
models, rather than their internal representations or the data from auxiliary domains. In addition, we assume that
documents similar in the domain-specific feature space should have consistent rankings, and add some constraints to
control the margin and slack variables of RA-SVM adaptively. Finally, ranking adaptability measurement is proposed
to quantitatively estimate if an existing ranking model can be adapted to a new domain. Experiments performed over
Letor and two large scale datasets crawled from a commercial search engine demonstrate the applicabilities of the
proposed ranking adaptation algorithms and the ranking adaptability measurement.
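The adaptation idea can be sketched as learning a small correction to the old ranker under a pairwise hinge loss, with a regularizer that keeps the adapted ranker close to the existing one. Plain subgradient descent stands in for the paper's SVM solver, and all scores and features below are invented.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adapt_ranker(old_score, pairs, feats, lam=0.1, lr=0.1, epochs=200):
    """RA-SVM-flavoured sketch: learn a correction w so that
    score(x) = old_score(x) + w.x orders each (better, worse) pair with
    margin 1, while the lam term keeps w small, i.e. keeps the adapted
    ranker close to the existing broad-based one."""
    dim = len(next(iter(feats.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            margin = (old_score[good] + dot(w, feats[good])
                      - old_score[bad] - dot(w, feats[bad]))
            if margin < 1.0:                       # hinge violated
                for i in range(dim):
                    w[i] += lr * (feats[good][i] - feats[bad][i])
            for i in range(dim):
                w[i] -= lr * lam * w[i]            # stay near the old ranker
    return w

# Toy new domain where the old ranker misorders a vs b (values invented).
old_score = {"a": 0.2, "b": 0.5, "c": 0.1}         # existing ranker's output
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
pairs = [("a", "b"), ("b", "c")]                   # labelled order: a > b > c
w = adapt_ranker(old_score, pairs, feats)
score = {d: old_score[d] + dot(w, feats[d]) for d in feats}
```

Only the old ranker's predictions are consulted, mirroring the paper's point that no internal representations or auxiliary-domain data are needed.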
Title :Ranking Model Adaptation for Domain-Specific Search
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/ranking-adaptation-domain-specific-search
Abstract : With the explosive emergence of vertical search domains, applying the broad-based ranking model directly
to different domains is no longer desirable due to domain differences, while building a unique ranking model for each
domain is both laborious for labeling data and time-consuming for training models. In this paper, we address these
difficulties by proposing a regularization based algorithm called ranking adaptation SVM (RA-SVM), through which we
can adapt an existing ranking model to a new domain, so that the amount of labeled data and the training cost is
reduced while the performance is still guaranteed. Our algorithm only requires the prediction from the existing ranking
models, rather than their internal representations or the data from auxiliary domains. In addition, we assume that
documents similar in the domain-specific feature space should have consistent rankings, and add some constraints to
control the margin and slack variables of RA-SVM adaptively. Finally, ranking adaptability measurement is proposed
to quantitatively estimate if an existing ranking model can be adapted to a new domain. Experiments performed over
Letor and two large scale datasets crawled from a commercial search engine demonstrate the applicabilities of the
proposed ranking adaptation algorithms and the ranking adaptability measurement.
Title :Scalable Learning of Collective Behavior
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/scalable-learning-collective-behavior-code
Abstract : The study of collective behavior aims to understand how individuals behave in a social networking
environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present
opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict
collective behavior in social media. In particular, given information about some individuals, how can we infer the
behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown
effective in addressing the heterogeneity of connections presented in social media. However, the networks in social
media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails
scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an
edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed
approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction
performance to other non-scalable methods.
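The edge-centric construction of sparse social dimensions can be sketched as follows, with the edge clustering itself given by hand; in the paper it comes from clustering the edges (e.g. by k-means over edge representations).

```python
from collections import defaultdict

def edge_centric_dimensions(edges, edge_labels):
    """Build sparse social dimensions from a clustering of *edges*:
    dimension c of a node counts the node's incident edges assigned to
    edge cluster c. Since each edge belongs to exactly one cluster, each
    node touches few dimensions, keeping the representation sparse."""
    dims = defaultdict(lambda: defaultdict(int))
    for (u, v), c in zip(edges, edge_labels):
        dims[u][c] += 1
        dims[v][c] += 1
    return {node: dict(d) for node, d in dims.items()}

# Toy graph; edge cluster assignments are hand-picked for illustration.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "c")]
edge_labels = [0, 0, 1, 1, 1]
dims = edge_centric_dimensions(edges, edge_labels)
# node "c" bridges both affiliations; the others stay in one dimension
```

These sparse per-node vectors then feed a discriminative classifier for the collective behavior prediction.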
Title :Scalable Learning of Collective Behavior
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/scalable-learning-collective-behavior-implement
Abstract : The study of collective behavior aims to understand how individuals behave in a social networking
environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present
opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict
collective behavior in social media. In particular, given information about some individuals, how can we infer the
behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown
effective in addressing the heterogeneity of connections presented in social media. However, the networks in social
media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails
scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an
edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed
approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction
performance to other non-scalable methods.
Title :Resilient Identity Crime Detection
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/resilient-identity-crime-detection
Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity
crime. The existing non-data-mining detection systems of business rules, scorecards, and known fraud matching
have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new
multilayered detection system complemented with two additional layers: communal detection (CD) and spike
detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic
social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to
increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a
variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing
legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million
real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns
are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud
detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to
the design, implementation, and evaluation of all detection systems.
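A minimal sketch of how the two layers could combine into one suspicion score, assuming toy attributes (address, phone), an illustrative whitelist, and weights and thresholds that are not taken from the paper:

```python
def suspicion_score(app, history, communal_whitelist, window=5, spike_threshold=3):
    """Toy multilayer score. CD: sharing an address that is a known real
    community (e.g. a family home) lowers suspicion. SD: a spike of duplicate
    phone numbers in the recent window raises it. Weights are illustrative."""
    score = 1.0
    shared = any(p["address"] == app["address"] for p in history)
    if shared and app["address"] in communal_whitelist:
        score -= 0.5          # communal detection: real social relationship
    recent_phones = [p["phone"] for p in history[-window:]]
    dupes = recent_phones.count(app["phone"])
    if dupes >= spike_threshold:
        score += 0.3 * dupes  # spike detection: duplicates suggest an attack
    return round(score, 2)

history = [{"address": "12 Oak St", "phone": "555-0100"} for _ in range(3)]
spiky = {"address": "99 Elm St", "phone": "555-0100"}    # duplicated phone
family = {"address": "12 Oak St", "phone": "555-0199"}   # whitelisted link
print(suspicion_score(spiky, history, {"12 Oak St"}))    # raised: 1.9
print(suspicion_score(family, history, {"12 Oak St"}))   # lowered: 0.5
```

The key design point mirrored here is that CD subtracts from the score (tamper resistance to synthetic links comes from the whitelist) while SD adds to it on a variable-size attribute window.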
M.Phil Computer Science Data Mining Projects
Title :Resilient Identity Crime Detection
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/resilient-identity-crime-detection-code
Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity
crime. The existing non-data-mining detection systems of business rules, scorecards, and known fraud matching
have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new
multilayered detection system complemented with two additional layers: communal detection (CD) and spike
detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic
social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to
increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a
variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing
legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million
real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns
are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud
detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to
the design, implementation, and evaluation of all detection systems.
Title :Resilient Identity Crime Detection
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/resilient-identity-crime-detection-implement
Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity
crime. The existing non-data-mining detection systems of business rules, scorecards, and known fraud matching
have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new
multilayered detection system complemented with two additional layers: communal detection (CD) and spike
detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic
social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to
increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a
variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing
legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million
real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns
are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud
detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to
the design, implementation, and evaluation of all detection systems.
Title :Resilient Identity Crime Detection
Language : PHP
Project Link : http://kasanpro.com/p/php/resilient-identity-crime-detection-module
Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity
crime. The existing non-data-mining detection systems of business rules, scorecards, and known fraud matching
have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new
multilayered detection system complemented with two additional layers: communal detection (CD) and spike
detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic
social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to
increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a
variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing
legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million
real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns
are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud
detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to
the design, implementation, and evaluation of all detection systems.
Title :Real-Time Analysis of Physiological Data to Support Medical Applications
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/real-time-analysis-physiological-data-support-medical-applications
Abstract : This paper presents a flexible framework that performs real-time analysis of physiological data to monitor
people's health conditions in any context (e.g., during daily activities, in hospital environments). Given historical
physiological data, different behavioral models tailored to specific conditions (e.g., a particular disease, a specific
patient) are automatically learnt. A suitable model for the currently monitored patient is exploited in the real-time
stream classification phase. The framework has been designed to perform both instantaneous evaluation and stream
analysis over a sliding time window. To allow ubiquitous monitoring, real-time analysis could also be executed on
mobile devices. As a case study, the framework has been validated in the intensive care scenario. Experimental
validation, performed on 64 patients affected by different critical illnesses, demonstrates the effectiveness and the
flexibility of the proposed framework in detecting different severity levels of monitored people's clinical situations.
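The sliding-window stream analysis described above can be sketched as a small monitor over a single heart-rate signal; the window size and severity thresholds below are illustrative stand-ins for the models the framework learns per condition.

```python
from collections import deque

def make_monitor(window_size=5, thresholds=(90, 120)):
    """Sliding-window severity classifier over one heart-rate stream.
    Thresholds are illustrative; the paper learns per-patient models."""
    window = deque(maxlen=window_size)
    normal_max, warning_max = thresholds
    def feed(sample):
        window.append(sample)
        mean = sum(window) / len(window)  # stream analysis over the window
        if mean <= normal_max:
            return "normal"
        if mean <= warning_max:
            return "warning"
        return "critical"
    return feed

feed = make_monitor()
for hr in (80, 85, 88, 90):
    status = feed(hr)
print(status)     # window mean 85.75 -> "normal"
print(feed(150))  # one spike lifts the window mean only to "warning"
```

Averaging over the window rather than classifying each instantaneous sample is what smooths out single-sample spikes, at the cost of reacting one window late.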
Title :Contextual query classification in web search
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/contextual-query-classification-web-search
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality
of a search engine's results. User context elements like interests, preferences, and intents are the main sources
exploited in information retrieval approaches to better fit user information needs. Using user intent to improve
query-specific retrieval relies on classifying web queries into three types according to the user intent: informational,
navigational, and transactional. However, the query type classification strategies involved are based solely on query
features, where the query type decision is made without the user context represented by the search history. In this
paper, we present a contextual query classification method making use of both query features and the user context,
defined by quality indicators of the previous query session type, called the query profile. We define a query session
as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that
our approach is promising.
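A toy sketch of blending query features with the session's query profile: keyword lists and the 0.5 blending weight are illustrative assumptions, not the paper's quality indicators.

```python
def classify_query(query, session_profile):
    """Toy contextual classifier: keyword-based query-feature scores blended
    with the session's query profile (proportion of each type so far).
    Keyword lists and the blending weight are illustrative."""
    q = query.lower()
    scores = {"navigational": 0.0, "transactional": 0.0, "informational": 0.0}
    if any(t in q for t in ("www", ".com", "homepage", "login")):
        scores["navigational"] += 1.0
    if any(t in q for t in ("buy", "download", "price", "order")):
        scores["transactional"] += 1.0
    if any(t in q for t in ("how", "what", "why", "definition")):
        scores["informational"] += 1.0
    for qtype, prior in session_profile.items():
        scores[qtype] += 0.5 * prior   # user context acts as a prior
    return max(scores, key=scores.get)

profile = {"informational": 0.8, "navigational": 0.1, "transactional": 0.1}
print(classify_query("how to normalize a database", profile))  # informational
```

The session profile breaks ties for ambiguous queries: a query with no strong features inherits the dominant type of the current session.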
Title :Contextual query classification in web search
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/contextual-query-classification-web-search-results
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality
of a search engine's results. User context elements like interests, preferences, and intents are the main sources
exploited in information retrieval approaches to better fit user information needs. Using user intent to improve
query-specific retrieval relies on classifying web queries into three types according to the user intent: informational,
navigational, and transactional. However, the query type classification strategies involved are based solely on query
features, where the query type decision is made without the user context represented by the search history. In this
paper, we present a contextual query classification method making use of both query features and the user context,
defined by quality indicators of the previous query session type, called the query profile. We define a query session
as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that
our approach is promising.
Title :Contextual query classification in web search
Language : PHP
Project Link : http://kasanpro.com/p/php/query-classification-web-search
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality
of a search engine's results. User context elements like interests, preferences, and intents are the main sources
exploited in information retrieval approaches to better fit user information needs. Using user intent to improve
query-specific retrieval relies on classifying web queries into three types according to the user intent: informational,
navigational, and transactional. However, the query type classification strategies involved are based solely on query
features, where the query type decision is made without the user context represented by the search history. In this
paper, we present a contextual query classification method making use of both query features and the user context,
defined by quality indicators of the previous query session type, called the query profile. We define a query session
as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that
our approach is promising.
Title :Annotating Search Results from Web Databases
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/annotating-search-results-web-databases
Abstract : An increasing number of databases have become web accessible through HTML form-based search
interfaces. The data units returned from the underlying database are usually encoded into the result pages
dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many
applications such as deep web data collection and Internet comparison shopping, they need to be extracted and
assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units
on a result page into different groups such that the data in the same group have the same semantics. Then, for each
group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label
for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result
pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
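The two phases described above, aligning data units across result records and then annotating each aligned group, can be sketched as follows. Alignment here is simple positional alignment, and the annotation rules are illustrative heuristics, not the paper's multi-aspect annotators.

```python
import re

def align_and_annotate(records):
    """records: list of result records, each a list of data-unit strings.
    Phase 1 aligns units by position; phase 2 labels each aligned group
    with simple format heuristics (illustrative only)."""
    groups = list(zip(*records))  # align data units by position
    labels = []
    for group in groups:
        if all(re.match(r"^\$\d", u) for u in group):
            labels.append("Price")
        elif all(re.fullmatch(r"\d{4}", u) for u in group):
            labels.append("Year")
        else:
            labels.append("Title")
    return labels

records = [
    ["Data Mining Concepts", "2011", "$54.99"],
    ["Web Data Management", "2013", "$41.50"],
]
print(align_and_annotate(records))  # -> ['Title', 'Year', 'Price']
```

Checking the whole aligned group, rather than a single unit, is what makes the label robust: one title that happens to contain digits cannot flip the group to "Year".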
Title :Annotating Search Results from Web Databases
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/annotating-search-results-web-databas
Abstract : An increasing number of databases have become web accessible through HTML form-based search
interfaces. The data units returned from the underlying database are usually encoded into the result pages
dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many
applications such as deep web data collection and Internet comparison shopping, they need to be extracted and
assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units
on a result page into different groups such that the data in the same group have the same semantics. Then, for each
group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label
for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result
pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
Title :Annotating Search Results from Web Databases
Language : PHP
Project Link : http://kasanpro.com/p/php/annotating-search-results-web-databases-efficient
Abstract : An increasing number of databases have become web accessible through HTML form-based search
interfaces. The data units returned from the underlying database are usually encoded into the result pages
dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many
applications such as deep web data collection and Internet comparison shopping, they need to be extracted and
assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units
on a result page into different groups such that the data in the same group have the same semantics. Then, for each
group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label
for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result
pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/cost-sensitive-decision-tree-classification-credit-card-identity-crime-detec
Abstract :
Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/cost-sensitive-decision-tree-classification-credit-card-identity-crime-d
Abstract :
Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/cost-sensitive-decision-tree-classification-credit-card-identity-fraud-crime-detection
Abstract :
Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : PHP
Project Link : http://kasanpro.com/p/php/decision-tree-classification-credit-card-identity-crime-detection-system
Abstract :
Title :A cost-sensitive decision tree approach for fraud detection
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/credit-card-identity-crime-detection-system-cost-sensitive-decision-tree-classification
Abstract : With developments in information technology, fraud is spreading all over the world, resulting in
huge financial losses. Though fraud prevention mechanisms such as CHIP&PIN are developed for credit card
systems, these mechanisms do not prevent the most common fraud types such as fraudulent credit card usages over
virtual POS (Point of Sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection
becomes the essential tool and probably the best way to stop such fraud types. In this study, a new cost-sensitive
decision tree approach which minimizes the sum of misclassification costs while selecting the splitting attribute at
each non-terminal node is developed and the performance of this approach is compared with the well-known
traditional classification models on a real world credit card data set. In this approach, misclassification costs are taken
as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known
methods on the given problem set, not only with respect to well-known performance metrics such as accuracy and
true positive rate, but also a newly defined cost-sensitive metric specific to the credit card fraud detection domain.
Accordingly, financial losses due to fraudulent transactions can be further decreased by the implementation of this approach in fraud
detection systems.
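The splitting criterion described above can be sketched as follows: for each candidate attribute, sum the misclassification costs of the resulting children and keep the cheapest. The attributes, labels, and cost figures are illustrative, not from the paper's data set.

```python
def misclassification_cost(labels, cost):
    """Cost of a leaf: whichever class is cheapest to predict, given
    per-example costs of missing a fraud (fn) vs flagging a legit app (fp)."""
    cost_predict_legit = sum(cost["fn"] for y in labels if y == "fraud")
    cost_predict_fraud = sum(cost["fp"] for y in labels if y == "legit")
    return min(cost_predict_legit, cost_predict_fraud)

def best_split(rows, labels, attributes, cost):
    """Pick the attribute whose split minimizes total misclassification cost."""
    best, best_cost = None, float("inf")
    for attr in attributes:
        values = set(r[attr] for r in rows)
        total = 0.0
        for v in values:
            child = [y for r, y in zip(rows, labels) if r[attr] == v]
            total += misclassification_cost(child, cost)
        if total < best_cost:
            best, best_cost = attr, total
    return best, best_cost

rows = [
    {"amount": "high", "country": "A"},
    {"amount": "high", "country": "B"},
    {"amount": "low",  "country": "A"},
    {"amount": "low",  "country": "B"},
]
labels = ["fraud", "fraud", "legit", "legit"]
cost = {"fn": 10.0, "fp": 1.0}  # missing a fraud is 10x worse (illustrative)
print(best_split(rows, labels, ["amount", "country"], cost))  # ('amount', 0.0)
```

Unlike an accuracy-based split, this criterion reflects that the two error types have very different prices, which is the paper's central point.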
Title :A cost-sensitive decision tree approach for fraud detection
Language : VB.NET
Project Link :
http://kasanpro.com/p/vb-net/cost-sensitive-decision-tree-classify-credit-card-identity-crime-detection-system
Abstract : With developments in information technology, fraud is spreading all over the world, resulting in
huge financial losses. Though fraud prevention mechanisms such as CHIP&PIN are developed for credit card
systems, these mechanisms do not prevent the most common fraud types such as fraudulent credit card usages over
virtual POS (Point of Sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection
becomes the essential tool and probably the best way to stop such fraud types. In this study, a new cost-sensitive
decision tree approach which minimizes the sum of misclassification costs while selecting the splitting attribute at
each non-terminal node is developed and the performance of this approach is compared with the well-known
traditional classification models on a real world credit card data set. In this approach, misclassification costs are taken
as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known
methods on the given problem set, not only with respect to well-known performance metrics such as accuracy and
true positive rate, but also a newly defined cost-sensitive metric specific to the credit card fraud detection domain.
Accordingly, financial losses due to fraudulent transactions can be further decreased by the implementation of this approach in fraud
detection systems.
Title :Predicting Home Service Demands from Appliance Usage Data
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/predicting-home-service-demands-from-appliance-usage-data
Abstract : Power management in homes and offices requires appliance usage prediction when the future user
requests are not available. The randomness and uncertainties associated with an appliance usage make the
prediction of appliance usage from energy consumption data a non-trivial task. A general model for prediction at the
appliance level is still lacking. In this work, we propose to enrich learning algorithms with expert knowledge and
propose a general model using a knowledge driven approach to forecast if a particular appliance will start at a given
hour or not. The approach is both knowledge driven and data driven. The overall energy management for a
house requires that the prediction is done for the next 24 hours in the future. The proposed model is tested over the
Irise data and the results are compared with some trivial knowledge driven predictors.
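The knowledge-plus-data idea above can be sketched as an hourly start-probability model gated by an expert rule; the quiet-hours rule, threshold, and event data are hypothetical illustrations, not the paper's model or the Irise data.

```python
from collections import defaultdict

def train_hourly_model(events):
    """events: list of (day, hour) appliance-start records. Returns the
    empirical probability that the appliance starts at each hour."""
    days = {d for d, _ in events}
    counts = defaultdict(int)
    for _, hour in events:
        counts[hour] += 1
    return {h: counts[h] / len(days) for h in range(24)}

def predict_start(model, hour, expert_quiet_hours=range(0, 6), threshold=0.5):
    """Data-driven estimate gated by expert knowledge: starts during quiet
    night hours are ruled out regardless of the data (illustrative rule)."""
    if hour in expert_quiet_hours:
        return False
    return model.get(hour, 0.0) >= threshold

# 10 observed days: the appliance starts at 19:00 on 9 of them.
events = [(d, 19) for d in range(8)] + [(8, 19), (9, 7)]
model = train_hourly_model(events)
print(predict_start(model, 19))  # frequent evening starts -> True
print(predict_start(model, 7))   # rare morning start -> False
```

The expert gate is what handles the randomness the abstract mentions: a few noisy night-time events in the data cannot produce an implausible prediction.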
Title :Data Mining and Wireless Sensor Network for Groundnut Pest/Disease Interaction and Predictions - A
Preliminary Study
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/data-mining-wireless-sensor-network-groundnut-pest-disease-predictions
Abstract : Data driven precision agriculture aspects, particularly the pest/disease management, require a dynamic
crop-weather data. An experiment was conducted in semi-arid region of India to understand the
crop-weather-pest/disease relations using wireless sensory and field-level surveillance data on closely related and
interdependent pest (Thrips) - disease (Bud Necrosis) dynamics of groundnut (peanut) crop. Various data mining
techniques were used to turn the data into useful information, knowledge, relations, trends, and correlations of the
crop-weather-pest/disease continuum. These dynamics obtained from the data mining techniques and trained through
mathematical models were validated with corresponding ground level surveillance data. It was found that Bud
Necrosis viral disease infection is strongly influenced by Humidity, Maximum Temperature, prolonged duration of leaf
wetness, and the age of the crop, and is propelled by the carrier pest Thrips. Results obtained from data of four
continuous agricultural seasons (monsoon & post-monsoon) have led to the development of cumulative and
non-cumulative prediction models, which can assist the user community in taking appropriate ameliorative measures.
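One of the basic crop-weather-disease relations above can be illustrated with a plain Pearson correlation between a weather variable and disease incidence; the humidity and incidence numbers below are hypothetical toy values, not the field surveillance data.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

humidity  = [60, 65, 70, 80, 85, 90]   # weekly mean humidity (%), toy data
incidence = [2, 3, 5, 9, 12, 15]       # bud necrosis incidence (%), toy data
r = pearson(humidity, incidence)
print(round(r, 2))  # -> 0.99, a strong positive association
```

Such pairwise correlations are only the first step; the abstract's prediction models additionally combine several weather variables, crop age, and the Thrips carrier dynamics.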
Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : ASP.NET with VB
Project Link :
http://kasanpro.com/p/asp-net-with-vb/mining-social-media-data-understanding-students-learning-experiences
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational
experiences - opinions, feelings, and concerns about the learning process. Data from such uninstrumented
environments can provide valuable knowledge to inform student learning. Analyzing such data, however, can be
challenging. The complexity of students' experiences reflected in social media content requires human
interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we
developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on
engineering students' Twitter posts to understand issues and problems in their educational experiences. We first
conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep
deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets
reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000
tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and
results that show how informal social media data can provide insights into students' experiences.
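The key design point above, that one tweet can reflect several problems at once, is what makes the task multi-label. A minimal keyword-based sketch follows; the label lexicons are hypothetical stand-ins for categories a real system would learn from the qualitative coding.

```python
# Hypothetical keyword lexicons per problem label (illustrative only).
LEXICON = {
    "sleep_deprivation": {"sleep", "tired", "awake", "allnighter"},
    "heavy_study_load": {"homework", "exam", "lab", "deadline"},
    "negative_emotion": {"hate", "stress", "stressed", "frustrated"},
}

def label_tweet(tweet):
    """Multi-label assignment: each label is decided independently,
    so a tweet can receive zero, one, or several labels."""
    tokens = set(tweet.lower().split())
    return sorted(lbl for lbl, words in LEXICON.items() if tokens & words)

print(label_tweet("No sleep again, this exam deadline is killing me"))
# -> ['heavy_study_load', 'sleep_deprivation']
```

Deciding each label independently, rather than forcing a single class, mirrors the one-vs-rest structure typical of multi-label classifiers.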
Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/mining-social-media-data-understanding-students-learning-experien
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational
experiences - opinions, feelings, and concerns about the learning process. Data from such uninstrumented
environments can provide valuable knowledge to inform student learning. Analyzing such data, however, can be
challenging. The complexity of students' experiences reflected in social media content requires human
interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we
developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on
engineering students' Twitter posts to understand issues and problems in their educational experiences. We first
conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep
deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets
reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000
tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and
results that show how informal social media data can provide insights into students' experiences.
Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/mining-social-media-data-understanding-students-learning-experiences-implement
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational
experiences - opinions, feelings, and concerns about the learning process. Data from such uninstrumented
environments can provide valuable knowledge to inform student learning. Analyzing such data, however, can be
challenging. The complexity of students' experiences reflected in social media content requires human
interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we
developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on
engineering students' Twitter posts to understand issues and problems in their educational experiences. We first
conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep
deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets
reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000
tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and
results that show how informal social media data can provide insights into students' experiences.
Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/viral-marketing-cost-effective-time-critical-campaigns-large-scale-social-n
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and
advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing,
in social networks can be used to increase product adoption or widely spread content over the network. The common
perception of viral marketing as being cheap, easy, and massively effective makes it an ideal replacement for
traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a
few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature.
With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can
we economically spend more resources to increase the spreading speed? We investigate the cost-effective massive
viral marketing problem, taking into consideration the limited influence propagation. Both analytical analysis based on
power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To
minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks
and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a
relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics that
rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio
better than O(log n) is unlikely to be possible.
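A sketch of the kind of greedy seeding baseline VirAds is compared against: repeatedly seed the node that newly covers the most users within one hop, until a coverage target is met. The toy graph, one-hop coverage model, and target are illustrative assumptions, not the paper's propagation model.

```python
def greedy_seed(adj, coverage_target):
    """adj: adjacency dict node -> list of neighbors.
    Greedily add the seed with the largest marginal one-hop coverage gain
    until the covered fraction reaches coverage_target."""
    n = len(adj)
    covered, seeds = set(), []
    while len(covered) / n < coverage_target:
        gain, best = -1, None
        for node in adj:
            new = len(({node} | set(adj[node])) - covered)
            if new > gain:
                gain, best = new, node
        seeds.append(best)
        covered |= {best} | set(adj[best])
    return seeds, len(covered) / n

# Toy network: a hub (1) plus a short tail reachable only through 4-5-6.
adj = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1, 5], 5: [4, 6], 6: [5]}
seeds, frac = greedy_seed(adj, coverage_target=0.9)
print(seeds)  # -> [1, 5]: the hub alone is not enough, the tail needs a seed
```

The limited-propagation point of the abstract shows up even here: because influence dies out (one hop), covering the tail requires paying for an extra seed rather than relying on the hub's influence to percolate.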
Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/cost-effective-viral-marketing-time-critical-campaigns-large-scale-so
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and
advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing,
in social networks can be used to increase product adoption or widely spread content over the network. The common
perception of viral marketing as being cheap, easy, and massively effective makes it an ideal replacement for
traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a
few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature.
With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can
we economically spend more resources to increase the spreading speed? We investigate the cost-effective massive
viral marketing problem, taking into consideration the limited influence propagation. Both analytical analysis based on
power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To
minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks
and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a
relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics that
rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio
better than O(log n) is unlikely to be possible.
Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : C#
Project Link :
http://kasanpro.com/p/c-sharp/effective-viral-marketing-time-critical-campaigns-large-scale-social-networks
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing, in social networks can be used to increase product adoption or to spread content widely over the network. The common perception of viral marketing as cheap, easy, and massively effective makes it an ideal replacement for traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature. With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can we economically spend more resources to increase the spreading speed? We investigate the cost-effective massive viral marketing problem, taking into consideration the limited influence propagation. Both analytical study based on power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics that rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio better than O(log n) is unlikely to be possible.
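The degree-centrality greedy baseline that VirAds is compared against can be sketched as follows. This is a simplified illustration, not the paper's algorithm: seeds are picked in decreasing-degree order until the nodes reachable within a bounded number of hops of the seed set cover a target fraction of the network (the function name and graph representation are assumptions for illustration).

```python
from collections import deque

def degree_greedy_seeds(adj, coverage, hops):
    """Pick seeds in decreasing-degree order until the nodes reachable
    within `hops` steps of some seed cover a `coverage` fraction of the
    graph. `adj` maps each node to a list of neighbours."""
    target = coverage * len(adj)
    seeds, reached = [], set()
    for v in sorted(adj, key=lambda u: len(adj[u]), reverse=True):
        if len(reached) >= target:
            break
        if v in reached:          # already influenced; don't waste a seed
            continue
        seeds.append(v)
        reached.add(v)
        frontier = deque([(v, 0)])  # BFS limited to `hops` steps
        while frontier:
            u, d = frontier.popleft()
            if d == hops:
                continue
            for w in adj[u]:
                if w not in reached:
                    reached.add(w)
                    frontier.append((w, d + 1))
    return seeds, reached
```

Because propagation fades after a few hops, the hop bound models the limited influence propagation the abstract describes; the abstract's point is that seeding chosen this way can be far from cost-optimal on power-law networks.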
Title :Green Mining: Investigating Power Consumption across Versions
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/green-mining-investigating-power-consumption-versions
Abstract : Power consumption is increasingly becoming a concern not only for electrical engineers but for software engineers as well, due to the increasing popularity of new power-limited contexts such as mobile computing, smartphones, and cloud computing. Software changes can alter software power consumption behaviour and can cause power performance regressions. By tracking software power consumption we can build models that provide suggestions to avoid power regressions. There is much research on software power consumption, but little focus on the relationship between software changes and power consumption. Most work measures the power consumption of a single software task; instead, we seek to extend this work across the history (revisions) of a project. We develop a set of tests for a well-established product and then run those tests across all versions of the product while recording the power usage of these tests. We provide and demonstrate a methodology that enables the analysis of power consumption performance for over 500 nightly builds of Firefox 3.6; we show that software change does induce changes in power consumption. This methodology and case study are a first step towards combining power measurement and mining software repositories research, thus enabling developers to avoid power regressions via power-consumption awareness.
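The measurement loop described above, running the same test suite against each revision and recording its energy cost, can be sketched as below. This is a minimal illustration of the idea, not the paper's tooling; the sample format (timestamp, watts) and the `run_tests` callback are assumptions.

```python
def energy_joules(samples):
    """Estimate energy (joules) from (timestamp_seconds, watts) samples
    by trapezoidal integration of power over time."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

def profile_revisions(revisions, run_tests):
    """Run the same test workload against each revision and record its
    energy cost, so revision-to-revision power regressions stand out.
    `run_tests(rev)` is a hypothetical callback that checks out `rev`,
    runs the tests, and returns the sampled power readings."""
    return {rev: energy_joules(run_tests(rev)) for rev in revisions}
```

Holding the workload fixed across revisions is what lets a jump in the recorded energy be attributed to a software change rather than to the test itself.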
Title :Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster
number
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/categorical-numerical-attribute-data-clustering-based
Abstract : Most of the existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering framework based on the concept of object-cluster similarity and gives a unified similarity metric that can be applied directly to data with categorical, numerical, and mixed attributes. Accordingly, an iterative clustering algorithm is developed, whose outstanding performance is experimentally demonstrated on different benchmark data sets. Moreover, to circumvent the difficult problem of selecting the number of clusters, we further develop a penalized competitive learning algorithm within the proposed clustering framework. The embedded competition and penalization mechanisms enable this improved algorithm to determine the number of clusters automatically by gradually eliminating redundant clusters. The experimental results show the efficacy of the proposed approach.
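The gap between numerical and categorical similarity that the abstract describes is commonly bridged by combining the two scales, as in the classic k-prototypes dissimilarity sketched below. Note this is a standard illustration of a mixed-attribute metric, not the paper's unified object-cluster similarity; the function name and the weighting parameter `gamma` are assumptions.

```python
def mixed_dissimilarity(x, y, num_idx, cat_idx, gamma=1.0):
    """k-prototypes-style dissimilarity for mixed records: squared
    Euclidean distance on the numerical attributes plus a gamma-weighted
    mismatch count on the categorical attributes."""
    num = sum((x[i] - y[i]) ** 2 for i in num_idx)
    cat = sum(1 for i in cat_idx if x[i] != y[i])
    return num + gamma * cat
```

The awkward part, which the unified metric in the paper aims to remove, is that `gamma` must be tuned to balance the two incommensurable scales.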
Title :Categorical-and-numerical-attribute data clustering using K - Mode clustering and Fuzzy K - Mode clustering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/categorical-numerical-attribute-data-clustering-fuzzy
Abstract : Most of the existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering framework based on the concept of object-cluster similarity and gives a unified similarity metric that can be applied directly to data with categorical, numerical, and mixed attributes.
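The k-modes algorithm named in the title above replaces means with attribute-wise modes and Euclidean distance with a mismatch count. The sketch below is a minimal illustrative loop, not the project's implementation; the fuzzy k-modes variant extends it by giving each record a graded membership in every cluster instead of a hard assignment.

```python
from collections import Counter

def matching_dissim(x, mode):
    """k-modes dissimilarity: number of attributes on which the record
    differs from the cluster mode."""
    return sum(a != b for a, b in zip(x, mode))

def k_modes(data, modes, iters=10):
    """Minimal k-modes loop: assign each record to the nearest mode, then
    recompute each mode attribute-wise as the most frequent value in its
    cluster. Returns the final modes and the last hard assignment."""
    clusters = [[] for _ in modes]
    for _ in range(iters):
        clusters = [[] for _ in modes]
        for x in data:
            j = min(range(len(modes)), key=lambda k: matching_dissim(x, modes[k]))
            clusters[j].append(x)
        modes = [
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*c)) if c else m
            for c, m in zip(clusters, modes)
        ]
    return modes, clusters
```

Because the dissimilarity is a simple mismatch count, k-modes handles categorical attributes natively but, unlike the unified-metric framework above, still needs a separate mechanism for numerical ones.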
M.Phil Computer Science Data Mining Projects
Web : www.kasanpro.com Email : sales@kasanpro.com
List Link : http://kasanpro.com/projects-list/m-phil-computer-science-data-mining-projects

Title : Bridging Socially Enhanced Virtual Communities
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/bridging-socially-enhanced-virtual-communities
Abstract : Interactions spanning multiple organizations have become an important aspect of today's collaboration landscape. Organizations create alliances to fulfill strategic objectives. The dynamic nature of collaborations increasingly demands automated techniques and algorithms to support the creation of such alliances. Our approach is based on recommending potential alliances through the discovery of currently relevant competence sources and on supporting their semi-automatic formation. The environment is service-oriented, comprising humans and software services with distinct capabilities. To mediate between previously separated groups and organizations, we introduce the broker concept, which bridges disconnected networks. We present a dynamic broker discovery approach based on interaction mining techniques and trust metrics. We evaluate our approach using simulations in real Web services testbeds.

Title : Mood Recognition During Online Self-Assessment Test
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/mood-recognition-during-online-self-assessment-test
Abstract : Individual emotions play a crucial role during any learning interaction. Identifying a student's emotional state and providing personalized feedback, based on integrated pedagogical models, has been considered to be one of the main limits of traditional tools of e-learning. This paper presents an empirical study that illustrates how learner mood may be predicted during online self-assessment tests. Here, a previous method of determining student mood has been refined based on the assumption that the influence on learner mood of questions already answered declines in relation to their distance from the current question. Moreover, this paper sets out to indicate that "exponential logic" may help produce more efficient models if integrated adequately with affective modeling. The results show that these assumptions may prove useful to future research.

Title : On The Path To A World Wide Web Census: A Large Scale Survey
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/world-wide-web-census-large-scale-survey
Abstract : How large is the World Wide Web? We present the results of the largest Web survey performed to date. We use an interdisciplinary approach which borrows methods from ecology. In addition to Web server counts, we also present other information collected, such as Web server market share, the operating systems used by Web servers, and Web server distribution. The software system used to collect the data is a prototype of a system that we believe can be used for a complete Web census.

Title : Knowledge Sharing In Virtual Organizations: Barriers and Enablers
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/knowledge-sharing-in-virtual-organizations-barriers-enablers
Abstract : Modern organizations have to deal with many drastic external and internal constraints, due notably to the globalization of the economy, fast technological change, and shifts in customer demand. Moreover, organizations' functionally divided, hierarchical internal structures are too rigid and make it difficult for them to adjust to the changing constraints resulting from the pressure of their external environment. Consequently, to survive and maintain their competitive advantage in the market, modern organizations must alter their internal structure to become organic and flexible systems able to adapt and progress in a high-velocity environment. Virtual organizations are among the most popular solutions which provide organizations with more agility and improve their efficiency and effectiveness. Despite many success stories materialized by economic and non-economic benefits, many virtual organizations have failed to reach their goals due to the problems they have encountered while trying to manage knowledge. In this work, we analyze the barriers and enablers of knowledge management in virtual organizations.

Title : Adaptive Provisioning of Human Expertise in Service-oriented Systems
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/adaptive-provisioning-human-expertise-service-oriented-systems
Abstract : Web-based collaborations have become essential in today's business environments. Due to the availability of various SOA frameworks, Web services emerged as the de facto technology for realizing flexible compositions of services. While most existing work focuses on the discovery and composition of software-based services, we highlight concepts for a people-centric Web. Knowledge-intensive environments clearly demand the provisioning of human expertise along with the sharing of computing resources or business data through software-based services. To address these challenges, we introduce an adaptive approach allowing humans to provide their expertise through services using SOA standards such as WSDL and SOAP. The seamless integration of humans in the SOA loop triggers numerous social implications, such as the evolving expertise and drifting interests of human service providers. Here we propose a framework based on interaction monitoring techniques that enables adaptations in SOA-based socio-technical systems.
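The adaptive-provisioning idea above can be sketched in a few lines: monitored interactions continuously update each human provider's expertise score, and provisioning picks the currently best-rated provider. The following Python sketch is purely illustrative (the listed project itself is in C#); the class names, the exponential-moving-average update, and the `record_interaction` API are assumptions, not the project's actual design.

```python
# Hypothetical sketch: adapting provisioning of human services from
# monitored interaction outcomes. All names are illustrative.

class HumanService:
    def __init__(self, name, skill):
        self.name = name
        self.skill = skill          # advertised capability, e.g. "review"
        self.expertise = 0.5        # neutral prior score in [0, 1]

    def record_interaction(self, success, alpha=0.3):
        # Exponential moving average: recent interactions weigh more,
        # so drifting interests/expertise are reflected quickly.
        self.expertise = (1 - alpha) * self.expertise + alpha * (1.0 if success else 0.0)

def provision(services, skill):
    # Adaptive provisioning: choose the currently best-rated provider.
    candidates = [s for s in services if s.skill == skill]
    return max(candidates, key=lambda s: s.expertise, default=None)

alice, bob = HumanService("alice", "review"), HumanService("bob", "review")
for ok in (True, True, True):
    alice.record_interaction(ok)
for ok in (True, False, False):
    bob.record_interaction(ok)
best = provision([alice, bob], "review")
```

With these interaction histories, `alice` accumulates a higher score than `bob`, so she is provisioned for the next "review" request.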
Title : Cost-aware rank join with random and sorted access
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/cost-aware-rank-join-random-sorted-access
Abstract : In this project, we address the problem of joining ranked results produced by two or more services on the Web. We consider services endowed with two kinds of access that are often available: i) sorted access, which returns tuples sorted by score; ii) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of two or more relations and output the k combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, in this paper we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose the CARS (Cost-Aware with Random and Sorted access) pulling strategy, which is derived at compile-time and is oblivious to the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. In addition, CARS is shown to be robust to the uncertainty that may characterize the estimated parameters.

Title : USHER: Improving Data Quality with Dynamic Forms
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/usher-improving-data-quality-dynamic-forms
Abstract : Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose USHER, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, USHER learns a probabilistic model over the questions of the form. USHER then applies this model at every step of the data entry process to improve data quality. Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as possible. During entry, it dynamically adapts the form to the values being entered, and enables real-time feedback to guide the data enterer toward their intended values. After entry, it re-asks questions that it deems likely to have been entered incorrectly. We evaluate all three components of USHER using two real-world data sets. Our results demonstrate that each component has the potential to improve data quality considerably, at a reduced cost when compared to current practice.

Title : A Dual Framework and Algorithms for Targeted Data Delivery
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/algorithms-targeted-data-delivery
Abstract : In this project, we develop a framework for comparing pull-based solutions and present dual optimization approaches. The first approach maximizes user utility while satisfying constraints on the usage of system resources. The second approach satisfies the utility of user profiles while minimizing the usage of system resources. We present an adaptive algorithm and show how it can incorporate feedback to improve user utility with only a moderate increase in resource utilization.

http://kasanpro.com/ieee/final-year-project-center-thanjavur-reviews

Title : Selecting Attributes for Sentiment Classification Using Feature Relation Networks
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/sentiment-classification-using-feature-relation-networks
Abstract : A major concern when incorporating large sets of diverse n-gram features for sentiment classification is the presence of noisy, irrelevant, and redundant attributes. These concerns can often make it difficult to harness the augmented discriminatory potential of extended feature sets. We propose a rule-based multivariate text feature selection method called Feature Relation Network (FRN) that considers semantic information and also leverages the syntactic relationships between n-gram features. FRN is intended to efficiently enable the inclusion of extended sets of heterogeneous n-gram features for enhanced sentiment classification. Experiments were conducted on three online review test beds in comparison with methods used in prior sentiment classification research. FRN outperformed the comparison univariate, multivariate, and hybrid feature selection methods; it was able to select attributes resulting in significantly better classification accuracy irrespective of the feature subset sizes. Furthermore, by incorporating syntactic information about n-gram relations, FRN is able to select features in a more computationally efficient manner than many multivariate and hybrid techniques.
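One relation FRN exploits is subsumption between n-grams: a bigram that never occurs outside the documents of its constituent unigrams adds no new evidence. The Python sketch below illustrates only that subsumption idea as a toy filter; it is a drastic simplification of FRN, not the published algorithm, and all function names and the example corpus are invented for illustration.

```python
# Toy illustration of subsumption-based n-gram pruning (NOT the actual FRN
# algorithm): keep a bigram only when its document set is strictly more
# specific than that of each constituent unigram.
from collections import defaultdict

def ngram_doc_freq(docs, n):
    df = defaultdict(set)
    for i, doc in enumerate(docs):
        toks = doc.lower().split()
        for j in range(len(toks) - n + 1):
            df[" ".join(toks[j:j + n])].add(i)
    return df

def select_features(docs):
    uni, bi = ngram_doc_freq(docs, 1), ngram_doc_freq(docs, 2)
    kept = set(uni)                      # all unigrams survive
    for bigram, docs_b in bi.items():
        w1, w2 = bigram.split()
        # keep the bigram only if it is more specific than both unigrams
        if docs_b < uni[w1] and docs_b < uni[w2]:
            kept.add(bigram)
    return kept

docs = ["not good at all", "good movie", "not bad", "bad acting not good"]
feats = select_features(docs)
```

Here the sentiment-bearing bigram "not good" survives because it is more specific than either "not" or "good" alone, while a bigram with the same document frequency as one of its parts is pruned as redundant.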
Title : Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/aggregate-recommendation-diversity-using-ranking-based
Abstract : Recommender systems are becoming increasingly important to individual users and businesses for providing personalized recommendations. However, while the majority of algorithms proposed in the recommender systems literature have focused on improving recommendation accuracy, other important aspects of recommendation quality, such as the diversity of recommendations, have often been overlooked. In this paper, we introduce and explore a number of item ranking techniques that can generate recommendations with substantially higher aggregate diversity across all users while maintaining comparable levels of recommendation accuracy. Comprehensive empirical evaluation consistently shows the diversity gains of the proposed techniques using several real-world rating datasets and different rating prediction algorithms.

Title : Integration of Sound Signature in Graphical Password Authentication System
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/sound-signature-graphical-password-authentication-system
Abstract : In this project, a graphical password system with a supportive sound signature to increase the memorability of the password is discussed. In the proposed work, a click-based graphical password scheme called Cued Click Points (CCP) is presented. In this system a password consists of a sequence of images in which the user selects one click-point per image. In addition, the user is asked to select a sound signature corresponding to each click-point; this sound signature is later used to help the user log in. The system showed very good performance in terms of speed, accuracy, and ease of use. Users preferred CCP to PassPoints, saying that selecting and remembering only one point per image was easier, and that the sound signature helps considerably in recalling the click-points.

Title : Monitoring Service Systems from a Language-Action Perspective
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/monitoring-service-systems-language-action
Abstract : The exponential growth in the global economy is being supported by service systems, realized by recasting mission-critical application services accessed across organizational boundaries. The Language-Action Perspective (LAP) is based upon the notion that "expert behavior requires an exquisite sensitivity to context and that such sensitivity is more in the realm of the human than in that of the artificial." Business processes are increasingly distributed and open, making them prone to failure. Monitoring is, therefore, an important concern not only for the processes themselves but also for the services that comprise these processes. We present a framework for multilevel monitoring of these service systems. It formalizes interaction protocols, policies, and commitments that account for standard and extended effects following the language-action perspective, and allows specification of goals and monitors at varied abstraction levels. We demonstrate how the framework can be implemented and evaluate it with multiple scenarios, such as a merchant-customer transaction, that include specifying and monitoring open-service policy commitments.

Title : A Personalized Ontology Model for Web Information Gathering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/ontology-model-web-information-gathering
Abstract : As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering. However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful.

Title : Publishing Search Logs - A Comparative Study of Privacy Guarantees
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/publishing-search-logs-privacy-guarantees
Abstract : Search engine companies collect the "database of intentions", the histories of their users' search queries. These search logs are a gold mine for researchers. Search engine companies, however, are wary of publishing search logs in order not to disclose sensitive information. In this paper we analyze algorithms for publishing frequent keywords, queries, and clicks of a search log. We first show how methods that achieve variants of k-anonymity are vulnerable to active attacks. We then demonstrate that the stronger guarantee ensured by differential privacy unfortunately does not provide any utility for this problem. Our paper concludes with a large experimental study using real applications where we compare ZEALOUS and previous work that achieves k-anonymity in search log publishing. Our results show that ZEALOUS yields comparable utility to k-anonymity while at the same time achieving much stronger privacy guarantees.

Title : Scalable Scheduling of Updates in Streaming Data Warehouse
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/scheduling-updates-streaming-data-warehouse
Abstract : The study of collective behavior seeks to understand how individuals behave in a social networking environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown effective in addressing the heterogeneity of connections presented in social media. However, the networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction performance to other non-scalable methods.

Title : The Awareness Network, To Whom Should I Display My Actions? And, Whose Actions Should I Monitor?
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/accessing-monitoring-inawareness-network
Abstract : The concept of awareness plays a pivotal role in research in Computer Supported Cooperative Work. Recently, Software Engineering researchers interested in the collaborative nature of software development have explored the implications of this concept in the design of software development tools. A critical aspect of awareness is the associated coordinative work practices of displaying and monitoring actions. This aspect concerns how colleagues monitor one another's actions to understand how these actions impact their own work, and how they display their actions in such a way that others can easily monitor them while doing their own work. In this paper, we focus on an additional aspect of awareness: the identification of the social actors who should be monitored and the actors to whom their actions should be displayed. We address this aspect by presenting software developers' work practices based on ethnographic data from three different software development teams. In addition, we illustrate how these work practices are influenced by different factors, including the organizational setting, the age of the project, and the software architecture. We discuss how our results are relevant for both CSCW and Software Engineering researchers.

Title : The World in a Nutshell: Concise Range Queries
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/world-nutshell-concise-range-queries
Abstract : With the advance of wireless communication technology, it is quite common for people to view maps or get related services from handheld devices, such as mobile phones and PDAs. Range queries, as one of the most commonly used tools, are often posed by users to retrieve needful information from a spatial database. However, due to the limits of communication bandwidth and hardware power of handheld devices, displaying all the results of a range query on a handheld device is neither communication-efficient nor informative to the users. This is simply because there are often too many results returned from a range query. In view of this problem, we present a novel idea that a concise representation of a specified size for the range query results, while incurring minimal information loss, shall be computed and returned to the user. Such a concise range query not only reduces communication costs, but also offers better usability to the users, providing an opportunity for interactive exploration. The usefulness of concise range queries is confirmed by comparing them with other possible alternatives, such as sampling and clustering. Unfortunately, we prove that finding the optimal representation with minimum information loss is an NP-hard problem. Therefore, we propose several effective and nontrivial algorithms to find a good approximate result. Extensive experiments on real-world data have demonstrated the effectiveness and efficiency of the proposed techniques.

Title : A Query Formulation Language for the Data Web
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/query-formulation-language-data-web
Abstract : We present a query formulation language called MashQL in order to easily query and fuse structured data on the web. The main novelty of MashQL is that it allows people with limited IT skills to explore and query one or multiple data sources without prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. More importantly, to be robust and cover most cases in practice, we do not assume that a data source has an offline or inline schema. This poses several language-design and performance complexities that we fundamentally tackle. To illustrate the query formulation power of MashQL, and without loss of generality, we chose the Data Web scenario. We also chose querying RDF, as it is the most primitive data model; hence, MashQL can be similarly used for querying relational databases and XML. We present two implementations of MashQL: an online mashup editor and a Firefox add-on. The former illustrates how MashQL can be used to query and mash up the Data Web as simply as filtering and piping web feeds; the Firefox add-on illustrates using the browser as a web composer rather than only a navigator. To end, we evaluate MashQL on querying two datasets, DBLP and DBPedia, and show that our indexing techniques allow instant user interaction.

Title : Exploring Application-Level Semantics for Data Compression
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/exploring-application-level-semantics-data-compression
Abstract : Natural phenomena show that many creatures form large social groups and move in regular patterns.
However, previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D, which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental results show that the proposed compression algorithm leverages the group movement patterns to reduce the amount of delivered data effectively and efficiently.

Title : Data Leakage Detection
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/data-leakage-detection
Abstract : A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.
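The two ingredients of the data-leakage abstract above, overlap-based suspicion and planted fake records, can be sketched very naively in Python. This is an illustrative toy, not the paper's probabilistic guilt model; the function name, the scoring rule, and the sample data are all assumptions.

```python
# Naive sketch of leakage attribution: each agent received a subset of
# records plus one unique "realistic but fake" record. A leaked set is
# scored against each agent; a planted fake appearing in the leak is
# treated as conclusive. (Toy model, not the paper's formulation.)

def suspicion(agents, leaked):
    # agents: {name: (records_given, fake_record)}
    scores = {}
    for name, (given, fake) in agents.items():
        if fake in leaked:
            scores[name] = 1.0      # the agent's planted fake was leaked
        else:
            # fraction of the leak explainable by this agent's allocation
            scores[name] = len(given & leaked) / len(leaked)
    return scores

agents = {
    "A": ({"r1", "r2", "r3"}, "fakeA"),
    "B": ({"r2", "r4", "r5"}, "fakeB"),
}
leaked = {"r2", "r4", "fakeB"}
scores = suspicion(agents, leaked)
```

Agent B is pinned by its fake record, while agent A only explains a third of the leak; the paper's allocation strategies are precisely about distributing records so that such scores separate the guilty agent as sharply as possible.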
Title : Knowledge Based Interactive Postmining of Association Rules Using Ontologies
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/knowledge-based-interactive-postmining-association-rules-using-ontologies
Abstract : In data mining, the usefulness of association rules is strongly limited by the huge number of delivered rules. To overcome this drawback, several methods have been proposed in the literature, such as itemset concise representations, redundancy reduction, and post-processing. However, being generally based on statistical information, most of these methods do not guarantee that the extracted rules are interesting for the user. Thus, it is crucial to help the decision-maker with an efficient post-processing step in order to reduce the number of rules. This paper proposes a new interactive approach to prune and filter discovered rules. First, we propose to use ontologies in order to improve the integration of user knowledge in the post-processing task. Second, we propose the Rule Schema formalism, extending the specification language proposed by Liu et al. for user expectations. Furthermore, an interactive framework is designed to assist the user throughout the analysis task. Applying our new approach over voluminous sets of rules, we were able, by integrating domain expert knowledge in the post-processing step, to reduce the number of rules to a few dozen or fewer. Moreover, the quality of the filtered rules was validated by the domain expert at various points in the interactive process.

Title : A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/link-analysis-mining-relational-databases
Abstract : This work introduces a link-analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random-walk model through the database, defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain is extracted by stochastic complementation. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion-map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to multiple correspondence analysis when the database takes the form of a simple star schema. On the other hand, a kernel version of the diffusion-map distance, generalizing the basic diffusion-map distance to directed graphs, is also introduced, and the links with spectral clustering are discussed. Several datasets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
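The starting point of the link-analysis abstract above, a Markov chain over database elements, is easy to make concrete: treat rows of linked tables as nodes, links as edges, and row-normalize the adjacency matrix into random-walk transition probabilities. The sketch below shows only that first step on a made-up customer-product example; stochastic complementation and diffusion maps, which the paper builds on top of this chain, are omitted.

```python
# Minimal sketch: a random-walk Markov chain over the elements of a small
# relational database (customers linked to products via a purchase table).
# Example data and function name are illustrative.

def transition_matrix(edges, nodes):
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[idx[u]][idx[v]] += 1.0
        A[idx[v]][idx[u]] += 1.0   # walks traverse a link in both directions
    for row in A:                  # row-normalize counts into probabilities
        s = sum(row)
        if s:
            for j in range(n):
                row[j] /= s
    return A

nodes = ["cust1", "cust2", "prodA", "prodB"]
edges = [("cust1", "prodA"), ("cust1", "prodB"), ("cust2", "prodA")]
P = transition_matrix(edges, nodes)
```

Each row of `P` sums to 1, so `P[i][j]` is the probability that a random walker at element `i` moves to element `j`; distances between states of this chain are what correspondence analysis and its extension ultimately measure.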
Title : Query Planning for Continuous Aggregation Queries over a Network of Data Aggregators
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/query-planning-continuous-aggregation-queries
Abstract : Continuous queries are used to monitor changes to time-varying data and to provide results useful for online decision making. Typically a user desires to obtain the value of some aggregation function over distributed data items, for example, to know the value of a client's portfolio, or the AVG of temperatures sensed by a set of sensors. In these queries a client specifies a coherency requirement as part of the query. We present a low-cost, scalable technique to answer continuous aggregation queries using a network of aggregators of dynamic data items. In such a network of data aggregators, each data aggregator serves a set of data items at specific coherencies. Just as various fragments of a dynamic web page are served by one or more nodes of a content distribution network, our technique involves decomposing a client query into sub-queries and executing the sub-queries on judiciously chosen data aggregators with their individual sub-query incoherency bounds. We provide a technique for finding the optimal set of sub-queries with their incoherency bounds which satisfies the client query's coherency requirement with the least number of refresh messages sent from aggregators to the client. For estimating the number of refresh messages, we build a query cost model which can be used to estimate the number of messages required to satisfy the client-specified incoherency bound. Performance results using real-world traces show that our cost-based query planning leads to queries being executed using less than one third the number of messages required by existing schemes.
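A toy way to see the sub-query planning problem above: the client's total incoherency bound C must be split among sub-queries, and a larger share for a volatile item means fewer refresh messages from its aggregator. The Python sketch below uses a simple proportional-to-volatility split as a stand-in heuristic; the paper instead derives the split from its full query cost model, and the function name and sample data here are invented.

```python
# Hypothetical heuristic: divide a client query's incoherency bound among
# sub-queries in proportion to each data item's volatility, so the most
# rapidly changing items get the most slack (fewest refreshes). This is a
# stand-in for the paper's cost-model-based optimization.

def allocate_bound(volatility, total_bound):
    # volatility: {item: expected rate of change of the item's value}
    total = sum(volatility.values())
    return {item: total_bound * v / total for item, v in volatility.items()}

# stock_x changes four times as fast as stock_y, so it receives four
# times the incoherency slack out of a total bound of 10.0
bounds = allocate_bound({"stock_x": 4.0, "stock_y": 1.0}, total_bound=10.0)
```

The per-item bounds always sum to the client's bound, preserving the query's overall coherency requirement while trading message traffic between aggregators.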
Title : Scalable learning of collective behavior
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/scalable-learning-collective-behavior
Abstract : The study of collective behavior seeks to understand how individuals behave in a social networking environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown effective in addressing the heterogeneity of connections presented in social media. However, the networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while demonstrating a comparable prediction performance to other non-scalable methods.

Title : Horizontal Aggregations in SQL to prepare Data Sets for Data Mining Analysis
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/horizontal-aggregations-sql-data-mining-analysis
Abstract : Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations have limitations for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code to return aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g. point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: CASE, exploiting the programming CASE construct; SPJ, based on standard relational algebra operators (SPJ queries); and PIVOT, using the PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator, and it is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not.

Title : A Machine Learning Approach for Identifying Disease-Treatment Relations in Short Texts
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/machine-learning-identifying-disease-treatment-relations-short-texts
Abstract : The Machine Learning (ML) field has gained momentum in almost every domain of research and has just recently become a reliable tool in the medical domain. The empirical domain of automatic learning is used in tasks such as medical decision support, medical imaging, protein-protein interaction, extraction of medical knowledge, and overall patient management care. ML is envisioned as a tool by which computer-based systems can be integrated into the healthcare field in order to provide better, more efficient medical care. This paper describes an ML-based methodology for building an application that is capable of identifying and disseminating healthcare information. It extracts sentences from published medical papers that mention diseases and treatments, and identifies semantic relations that exist between diseases and treatments. Our evaluation results for these tasks show that the proposed methodology obtains reliable outcomes that could be integrated into an application to be used in the medical care domain. The potential value of this paper stands in the ML settings that we propose and in the fact that we outperform previous results on the same data set.

Title : m-Privacy for Collaborative Data Publishing
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/privacy-collaborative-data-publishing
Abstract : In this paper, we consider the collaborative data publishing problem for anonymizing horizontally partitioned data at multiple data providers. We consider a new type of "insider attack" by colluding data providers who may use their own data records (a subset of the overall data) in addition to the external background knowledge to infer the data records contributed by other data providers. The paper addresses this new threat and makes several contributions.
First, we introduce the notion of m-privacy, which guarantees that the anonymized data satisfies a given privacy constraint against any group of up to m colluding data providers. Second, we present heuristic algorithms exploiting the equivalence group monotonicity of privacy constraints and adaptive ordering techniques for efficiently checking m-privacy given a set of records. Finally, we present a data provider-aware anonymization algorithm with adaptive m-privacy checking strategies to ensure high utility and m-privacy of anonymized data with efficiency. Experiments on real-life datasets suggest that our approach achieves better or comparable utility and efficiency than existing and baseline algorithms while providing the m-privacy guarantee.

Title :Spatial Approximate String Search
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/spatial-approximate-string-search
Abstract : This work deals with approximate string search in large spatial databases. Specifically, we investigate range queries augmented with a string similarity search predicate in both Euclidean space and road networks. We dub this query the spatial approximate string (SAS) query. In Euclidean space, we propose an approximate solution, the MHR-tree, which embeds min-wise signatures into an R-tree. The min-wise signature for an index node u keeps a concise representation of the union of q-grams from strings under the sub-tree of u. We analyze the pruning functionality of such signatures based on the set resemblance between the query string and the q-grams from the sub-trees of index nodes. We also discuss how to estimate the selectivity of a SAS query in Euclidean space, for which we present a novel adaptive algorithm to find balanced partitions using both the spatial and string information stored in the tree. For queries on road networks, we propose a novel exact method, RSASSOL, which significantly outperforms the baseline algorithm in practice.
RSASSOL combines q-gram-based inverted lists with reference-node-based pruning. Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of our approaches.

Title :Predicting iPhone Sales from iPhone Tweets
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/predicting-iphone-sales-iphone-tweets
Abstract : Recent research in the field of computational social science has shown how data resulting from the widespread adoption and use of social media channels such as Twitter can be used to predict outcomes such as movie revenues, election winners, localized moods, and epidemic outbreaks. The underlying assumptions of this research stream on predictive analytics are that social media actions such as tweeting, liking, commenting, and rating are proxies for users'/consumers' attention to a particular object/product, and that the shared digital artefact, being persistent, can create social influence. In this paper, we demonstrate how social media data from Twitter can be used
to predict the sales of iPhones. Based on a conceptual model of social data consisting of social graph (actors, actions, activities, and artefacts) and social text (topics, keywords, pronouns, and sentiments), we develop and evaluate a linear regression model that transforms iPhone tweets into a prediction of quarterly iPhone sales with an average error close to that of the established prediction models from investment banks. This strong correlation between iPhone tweets and iPhone sales becomes marginally stronger after incorporating the sentiments of tweets. We discuss the findings and conclude with implications for predictive analytics with big social data.

Title :A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/clustering-based-feature-subset-selection-algorithm-high-dimensional-data
Abstract : Feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm, FAST, is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Since features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features.
To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST with several representative feature selection algorithms, namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers.

Title :Crowdsourcing Predictors of Behavioral Outcomes
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/crowdsourcing-predictors-behavioral-outcomes
Abstract : Generating models from large data sets, and determining which subsets of data to mine, is becoming increasingly automated. However, choosing what data to collect in the first place requires human intuition or experience, usually supplied by a domain expert. This paper describes a new approach to machine science which demonstrates for the first time that non-domain experts can collectively formulate features, and provide values for those features, such that they are predictive of some behavioral outcome of interest. This was accomplished by building a web platform in which human groups interact both to respond to questions likely to help predict a behavioral outcome and to pose new questions to their peers. This results in a dynamically growing online survey, and this cooperative behavior also leads to models that can predict users' outcomes based on their responses to the user-generated survey questions.
Here we describe two web-based experiments that instantiate this approach: the first site led to models that can predict users' monthly electric energy consumption; the other led to models that can predict users' body mass index. As exponential increases in content are often observed in successful online collaborative communities, the proposed methodology may, in the future, lead to similar exponential rises in discovery and insight into the causal factors of behavioral outcomes.

Title :Data Extraction for Deep Web Using WordNet
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/data-extraction-deep-web-using-wordnet
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to achieve the efficiency and accuracy of automatic wrappers. Further investigation indicates that a lightweight ontological technique built on an existing lexical database for English (WordNet) is able to check the similarity of data records and detect the correct data region with higher precision using the semantic properties of these data records. The advantages of this method are that it can extract three types of data records, namely single-section data records, multiple-section data records, and loosely structured data records, and that it also provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from
multilingual web pages and that it is domain independent.

http://kasanpro.com/ieee/final-year-project-center-thanjavur-reviews

Title :Data Extraction for Deep Web Using WordNet
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/data-extraction-deep-web-using-wordnet-code
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to achieve the efficiency and accuracy of automatic wrappers. Further investigation indicates that a lightweight ontological technique built on an existing lexical database for English (WordNet) is able to check the similarity of data records and detect the correct data region with higher precision using the semantic properties of these data records. The advantages of this method are that it can extract three types of data records, namely single-section data records, multiple-section data records, and loosely structured data records, and that it also provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from multilingual web pages and that it is domain independent.

Title :Data Extraction for Deep Web Using WordNet
Language : PHP
Project Link : http://kasanpro.com/p/php/data-extraction-deep-web-using-wordnet-implement
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to achieve the efficiency and accuracy of automatic wrappers. Further investigation indicates that a lightweight ontological technique built on an existing lexical database for English (WordNet) is able to check the similarity of data records and detect the correct data region with higher precision using the semantic properties of these data records.
The advantages of this method are that it can extract three types of data records, namely single-section data records, multiple-section data records, and loosely structured data records, and that it also provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from multilingual web pages and that it is domain independent.

Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/retrieval-medical-records-data-mining
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare and the efficiency of healthcare systems. Due to time and cost constraints, most people rely on healthcare systems to obtain healthcare services. It has therefore become very important for a healthcare system to provide an automated tool that is capable of identifying and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate, and relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor (KNN) algorithm to get the relation between disease and treatment. As a result, improvement of patient care is achieved effectively.

Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/retrieval-medical-records-data-mining-code
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare and the efficiency of healthcare systems.
Due to time and cost constraints, most people rely on healthcare systems to obtain healthcare services. It has therefore become very important for a healthcare system to provide an automated tool that is capable of identifying and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate, and relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor algorithm
(KNN) to get the relation between disease and treatment. As a result, improvement of patient care is achieved effectively.

Title :An Effective Retrieval of Medical Records using Data Mining Techniques
Language : PHP
Project Link : http://kasanpro.com/p/php/retrieval-medical-records-data-mining-implement
Abstract : Nowadays, the standard of the healthcare domain depends mainly on the delivery of modern healthcare and the efficiency of healthcare systems. Due to time and cost constraints, most people rely on healthcare systems to obtain healthcare services. It has therefore become very important for a healthcare system to provide an automated tool that is capable of identifying and disseminating relevant healthcare information. This work focuses on the retrieval of updated, accurate, and relevant information from Medline datasets using a Machine Learning approach. The proposed work uses a keyword searching algorithm for extracting relevant information from Medline datasets and the K-Nearest Neighbor (KNN) algorithm to get the relation between disease and treatment. As a result, improvement of patient care is achieved effectively.

Title :Design and analysis of concept adapting real time data stream Applications
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/concept-adapting-real-time-data-stream-applications
Abstract : Real-time signals are continuous in nature and change abruptly; hence there is a need to apply an efficient, concept-adapting real-time data stream mining technique to take intelligent decisions online. Concept drift in a real-time data stream refers to a change in the class (concept) definitions over time; this is also called non-stationary learning (NSL). The most important goal is to solve the real-time data stream mining problem in the presence of concept drift in a robust manner.
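The concept-adapting stream entry above hinges on noticing when the class definitions change. As a purely illustrative sketch (not the project's implementation), the Python class below compares a model's error rate in a recent sliding window against an older reference window and flags drift when the rate jumps; the class name and the `window`/`margin` parameters are hypothetical choices.

```python
from collections import deque

class WindowDriftDetector:
    """Minimal sliding-window concept-drift detector (illustrative sketch).

    Feed it the 0/1 error stream of any online classifier; it signals
    drift when the recent error rate exceeds the reference error rate
    by more than a fixed margin.
    """
    def __init__(self, window=50, margin=0.2):
        self.window = window
        self.margin = margin
        self.reference = deque(maxlen=window)  # older errors
        self.recent = deque(maxlen=window)     # newest errors

    def add_error(self, err):
        # err is 1 if the model misclassified the instance, else 0
        if len(self.recent) == self.recent.maxlen:
            self.reference.append(self.recent[0])  # slide oldest into reference
        self.recent.append(err)

    def drift_detected(self):
        if len(self.recent) < self.window or not self.reference:
            return False  # not enough history yet
        recent_rate = sum(self.recent) / len(self.recent)
        ref_rate = sum(self.reference) / len(self.reference)
        return recent_rate - ref_rate > self.margin

# Simulated stream: the model is accurate, then the concept changes.
det = WindowDriftDetector(window=50, margin=0.2)
for _ in range(100):
    det.add_error(0)   # stable concept, low error
for _ in range(50):
    det.add_error(1)   # concept drift: errors spike
print(det.drift_detected())
```

In a real deployment, a detected drift would typically trigger retraining or replacement of the current model on the most recent window of data.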
Title :Data Extraction for Deep Web Using WordNet
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/data-extraction-deep-web-using-wordnet-module
Abstract : Our survey shows that the techniques used in data extraction from deep webs need to be improved to achieve the efficiency and accuracy of automatic wrappers. Further investigation indicates that a lightweight ontological technique built on an existing lexical database for English (WordNet) is able to check the similarity of data records and detect the correct data region with higher precision using the semantic properties of these data records. The advantages of this method are that it can extract three types of data records, namely single-section data records, multiple-section data records, and loosely structured data records, and that it also provides options for aligning iterative and disjunctive data items. Experimental results show that our technique is robust and performs better than the existing state-of-the-art wrappers. Tests also show that our wrapper is able to extract data records from multilingual web pages and that it is domain independent.

Title :Answering General Time-Sensitive Queries
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/answering-general-time-sensitive-queries
Abstract : Time is an important dimension of relevance for a large number of searches, such as those over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered in conjunction with topic similarity to derive the final document ranking.
Earlier work has focused on improving retrieval for "recency" queries that target recent documents. We propose a more general framework for handling time-sensitive queries, and we automatically identify the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
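The core idea in the time-sensitive query entry above is to blend topic similarity with a temporal signal derived from the important intervals identified for the query. The Python sketch below is a hypothetical illustration of that blending step, not the paper's actual scoring function; the `alpha` mixing weight and all names are assumptions.

```python
def temporal_score(doc_time, intervals):
    """1.0 if the publication time falls in an important interval, else 0.0.

    intervals: list of (start, end) timestamps deemed relevant to the query.
    """
    return 1.0 if any(start <= doc_time <= end for start, end in intervals) else 0.0

def combined_score(topic_sim, doc_time, intervals, alpha=0.7):
    """Blend topic similarity with temporal relevance.

    alpha is a hypothetical mixing weight, not taken from the paper.
    """
    return alpha * topic_sim + (1 - alpha) * temporal_score(doc_time, intervals)

# A query about an event known to matter around times 100-200:
intervals = [(100, 200)]
in_window = combined_score(0.6, 150, intervals)   # relevant topic, right period
off_window = combined_score(0.6, 900, intervals)  # same topic, wrong period
print(in_window > off_window)  # the temporal signal breaks the tie
```

Two documents with identical topic similarity are thus separated by publication time, which is exactly the behavior the abstract argues topic similarity alone cannot provide.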
Title :Answering General Time-Sensitive Queries
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/answering-general-time-sensitive-queries-framwork
Abstract : Time is an important dimension of relevance for a large number of searches, such as those over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered in conjunction with topic similarity to derive the final document ranking. Earlier work has focused on improving retrieval for "recency" queries that target recent documents. We propose a more general framework for handling time-sensitive queries, and we automatically identify the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
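The abstract above also mentions automatically detecting the important time intervals for a query. One simple way to approximate that step, offered purely as an illustration and not as the paper's method, is to histogram the publication times of the query's top-matching documents and keep the time buckets whose counts stand out; `bucket_size` and `factor` below are hypothetical parameters.

```python
from collections import Counter

def important_buckets(doc_times, bucket_size=30, factor=2.0):
    """Return time buckets holding unusually many matching documents.

    doc_times: publication timestamps of documents matching the query.
    A bucket is 'important' when its count exceeds factor * average count.
    """
    counts = Counter(t // bucket_size for t in doc_times)
    avg = sum(counts.values()) / len(counts)
    return sorted(bucket for bucket, c in counts.items() if c > factor * avg)

# Matching documents cluster around times 60-89 (bucket 2):
times = [5, 40, 62, 63, 65, 70, 71, 75, 80, 88, 130, 200]
print(important_buckets(times))  # -> [2]
```

The intervals recovered this way could then feed a temporal component of the ranking score, tying interval detection to the scoring step the abstract describes.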
Title :A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/indexing-scalable-record-linkage-deduplication
Abstract : Record linkage is the process of matching records from several databases that refer to the same entities. When applied to a single database, this process is known as deduplication. Increasingly, matched data are becoming important in many application areas, because they can contain information that is not available otherwise, or that is too costly to acquire. Removing duplicate records in a single database is a crucial step in the data cleaning process, because duplicates can severely influence the outcomes of any subsequent data processing or data mining. With the increasing size of today's databases, the complexity of the matching process has become one of the major challenges for record linkage and deduplication. In recent years, various indexing techniques have been developed for record linkage and deduplication. They aim to reduce the number of record pairs to be compared in the matching process by removing obvious non-matching pairs, while at the same time maintaining high matching quality. This paper presents a survey of twelve variations of six indexing techniques. Their complexity is analysed, and their performance and scalability are evaluated within an experimental framework using both synthetic and real data sets. No such detailed survey has so far been published.

Title :Decentralized Probabilistic Text Clustering
Language : NS2
Project Link : http://kasanpro.com/p/ns2/decentralized-probabilistic-text-clustering
Abstract : Text clustering is an established technique for improving quality in information retrieval, in both centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly distributed environments, such as peer-to-peer networks.
Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the algorithm.

Title :Decentralized Probabilistic Text Clustering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/decentralized-probabilistic-text-clustering-code
Abstract : Text clustering is an established technique for improving quality in information retrieval, in both centralized and distributed environments. However, traditional text clustering algorithms fail to scale on highly distributed environments, such as peer-to-peer networks. Our algorithm for peer-to-peer clustering achieves high scalability by using a probabilistic approach for assigning documents to clusters. It enables a peer to compare each of its documents only with very few selected clusters, without significant loss of clustering quality. The algorithm offers probabilistic guarantees for the correctness of each document assignment to a cluster. Extensive experimental evaluation with up to 1 million peers and 1 million documents demonstrates the scalability and effectiveness of the
algorithm.

Title :Effective Pattern Discovery for Text Mining
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/effective-pattern-discovery-text-mining
Abstract : Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. This paper presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Title :Ranking Model Adaptation for Domain-Specific Search
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/adaptation-domain-specific-search
Abstract : With the explosive emergence of vertical search domains, applying the broad-based ranking model directly to different domains is no longer desirable due to domain differences, while building a unique ranking model for each domain is both laborious for labeling data and time-consuming for training models. In this paper, we address these difficulties by proposing a regularization-based algorithm called ranking adaptation SVM (RA-SVM), through which we can adapt an existing ranking model to a new domain, so that the amount of labeled data and the training cost are reduced while performance is still guaranteed.
Our algorithm only requires the predictions from the existing ranking models, rather than their internal representations or the data from auxiliary domains. In addition, we assume that documents similar in the domain-specific feature space should have consistent rankings, and add some constraints to control the margin and slack variables of RA-SVM adaptively. Finally, a ranking adaptability measurement is proposed to quantitatively estimate whether an existing ranking model can be adapted to a new domain. Experiments performed over Letor and two large-scale datasets crawled from a commercial search engine demonstrate the applicability of the proposed ranking adaptation algorithms and the ranking adaptability measurement.

Title :Ranking Model Adaptation for Domain-Specific Search
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/ranking-adaptation-domain-specific-search
Abstract : With the explosive emergence of vertical search domains, applying the broad-based ranking model directly to different domains is no longer desirable due to domain differences, while building a unique ranking model for each domain is both laborious for labeling data and time-consuming for training models. In this paper, we address these difficulties by proposing a regularization-based algorithm called ranking adaptation SVM (RA-SVM), through which we can adapt an existing ranking model to a new domain, so that the amount of labeled data and the training cost are reduced while performance is still guaranteed. Our algorithm only requires the predictions from the existing ranking models, rather than their internal representations or the data from auxiliary domains. In addition, we assume that documents similar in the domain-specific feature space should have consistent rankings, and add some constraints to control the margin and slack variables of RA-SVM adaptively.
Finally, a ranking adaptability measurement is proposed to quantitatively estimate whether an existing ranking model can be adapted to a new domain. Experiments performed over Letor and two large-scale datasets crawled from a commercial search engine demonstrate the applicability of the proposed ranking adaptation algorithms and the ranking adaptability measurement.

Title :Scalable Learning of Collective Behavior
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/scalable-learning-collective-behavior-code
Abstract : The study of collective behavior aims to understand how individuals behave in a social networking environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present
opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown effective in addressing the heterogeneity of connections presented in social media. However, the networks in social media are normally of colossal size, involving hundreds of thousands of actors. The scale of these networks entails scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while demonstrating prediction performance comparable to other non-scalable methods.

Title :Scalable Learning of Collective Behavior
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/scalable-learning-collective-behavior-implement
Abstract : The study of collective behavior aims to understand how individuals behave in a social networking environment. Oceans of data generated by social media like Facebook, Twitter, Flickr, and YouTube present opportunities and challenges to study collective behavior on a large scale. In this work, we aim to learn to predict collective behavior in social media. In particular, given information about some individuals, how can we infer the behavior of unobserved individuals in the same network? A social-dimension-based approach has been shown effective in addressing the heterogeneity of connections presented in social media. However, the networks in social media are normally of colossal size, involving hundreds of thousands of actors.
The scale of these networks entails scalable learning of models for collective behavior prediction. To address the scalability issue, we propose an edge-centric clustering scheme to extract sparse social dimensions. With sparse social dimensions, the proposed approach can efficiently handle networks of millions of actors while demonstrating prediction performance comparable to other non-scalable methods.

Title :Resilient Identity Crime Detection
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/resilient-identity-crime-detection
Abstract : Identity crime is well known, prevalent, and costly, and credit application fraud is a specific case of identity crime. The existing non-data-mining detection systems of business rules and scorecards, and known fraud matching, have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new multilayered detection system complemented with two additional layers: communal detection (CD) and spike detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper-resistant to synthetic social relationships. It is a whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to increase the suspicion score, and is probe-resistant for attributes. It is an attribute-oriented approach on a variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing legal behavior, and remove redundant attributes. Experiments were carried out on CD and SD with several million real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns are sudden and exhibit sharp spikes in duplicates.
Although this research is specific to credit application fraud detection, the concepts of resilience, adaptivity, and quality data discussed in the paper are general to the design, implementation, and evaluation of all detection systems.

Title :Resilient Identity Crime Detection
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/resilient-identity-crime-detection-code
Abstract : Identity crime is well known, prevalent, and costly, and credit application fraud is a specific case of identity crime. The existing non-data-mining detection systems of business rules and scorecards, and known fraud matching, have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new multilayered detection system complemented with two additional layers: communal detection (CD) and spike detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper-resistant to synthetic social relationships. It is a whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to increase the suspicion score, and is probe-resistant for attributes. It is an attribute-oriented approach on a
  • 15. variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to the design, implementation, and evaluation of all detection systems. Title :Resilient Identity Crime Detection Language : ASP.NET with VB Project Link : http://kasanpro.com/p/asp-net-with-vb/resilient-identity-crime-detection-implement Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity crime. The existing nondata mining detection system of business rules and scorecards, and known fraud matching have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new multilayered detection system complemented with two additional layers: communal detection (CD) and spike detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million real credit applications. 
Results on the data support the hypothesis that successful credit application fraud patterns are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to the design, implementation, and evaluation of all detection systems. Title :Resilient Identity Crime Detection Language : PHP Project Link : http://kasanpro.com/p/php/resilient-identity-crime-detection-module Abstract : Identity crime is well known, prevalent, and costly; and credit application fraud is a specific case of identity crime. The existing nondata mining detection system of business rules and scorecards, and known fraud matching have limitations. To address these limitations and combat identity crime in real time, this paper proposes a new multilayered detection system complemented with two additional layers: communal detection (CD) and spike detection (SD). CD finds real social relationships to reduce the suspicion score, and is tamper resistant to synthetic social relationships. It is the whitelist-oriented approach on a fixed set of attributes. SD finds spikes in duplicates to increase the suspicion score, and is probe-resistant for attributes. It is the attribute-oriented approach on a variable-size set of attributes. Together, CD and SD can detect more types of attacks, better account for changing legal behavior, and remove the redundant attributes. Experiments were carried out on CD and SD with several million real credit applications. Results on the data support the hypothesis that successful credit application fraud patterns are sudden and exhibit sharp spikes in duplicates. Although this research is specific to credit application fraud detection, the concept of resilience, together with adaptivity and quality data discussed in the paper, are general to the design, implementation, and evaluation of all detection systems. 
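The spike detection idea above (sudden bursts of duplicates raising a suspicion score) can be sketched as a simple trailing-mean ratio. This is a minimal illustration, not the paper's actual scoring function; the window size, the example counts, and the ratio form are all assumptions.

```python
def spike_score(counts, window=7):
    """Score the latest step against the trailing mean.

    counts: per-step duplicate counts for one attribute value.
    Returns a ratio greater than 1 when the latest step spikes
    above the mean of the previous `window` steps.
    """
    if len(counts) < 2:
        return 0.0
    history = counts[-window - 1:-1]           # up to `window` prior steps
    baseline = sum(history) / len(history)     # trailing mean
    current = counts[-1]
    return current / baseline if baseline else float(current)

# A sudden burst of duplicate applications yields a high score:
steady = [2, 3, 2, 3, 2, 3, 2, 2]
burst = [2, 3, 2, 3, 2, 3, 2, 20]
```

In a full SD layer this per-attribute score would be combined with the communal detection output into the overall suspicion score; here it only shows why sharp duplicate spikes are easy to surface.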
Title :Real-Time Analysis of Physiological Data to Support Medical Applications
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/real-time-analysis-physiological-data-support-medical-applications
Abstract : This paper presents a flexible framework that performs real-time analysis of physiological data to monitor people's health conditions in any context (e.g., during daily activities, in hospital environments). Given historical physiological data, different behavioral models tailored to specific conditions (e.g., a particular disease, a specific patient) are automatically learnt. A suitable model for the currently monitored patient is exploited in the real-time stream classification phase. The framework has been designed to perform both instantaneous evaluation and stream analysis over a sliding time window. To allow ubiquitous monitoring, real-time analysis can also be executed on mobile devices. As a case study, the framework has been validated in the intensive care scenario. Experimental validation, performed on 64 patients affected by different critical illnesses, demonstrates the effectiveness and flexibility of the proposed framework in detecting different severity levels of monitored patients' clinical situations.

Title :Contextual query classification in web search
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/contextual-query-classification-web-search
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality of a search engine's results. User context elements such as interests, preferences, and intents are the main sources exploited in information retrieval approaches to better fit user information needs. Using user intent to improve query-specific retrieval relies on classifying web queries into three types according to that intent: informational, navigational, and transactional. However, the query type classification strategies involved are based solely on query features, where the query type decision is made independently of the user context represented by the search history. In this paper, we present a contextual query classification method making use of both query features and the user context, defined by quality indicators of the previous query session type called the query profile. We define a query session as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that our approach is promising.

Title :Contextual query classification in web search
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/contextual-query-classification-web-search-results
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality of a search engine's results. User context elements such as interests, preferences, and intents are the main sources exploited in information retrieval approaches to better fit user information needs. Using user intent to improve query-specific retrieval relies on classifying web queries into three types according to that intent: informational, navigational, and transactional. However, the query type classification strategies involved are based solely on query features, where the query type decision is made independently of the user context represented by the search history. In this paper, we present a contextual query classification method making use of both query features and the user context, defined by quality indicators of the previous query session type called the query profile. We define a query session as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that our approach is promising.

http://kasanpro.com/ieee/final-year-project-center-thanjavur-reviews

Title :Contextual query classification in web search
Language : PHP
Project Link : http://kasanpro.com/p/php/query-classification-web-search
Abstract : There has been increasing interest in exploiting multiple sources of evidence to improve the quality of a search engine's results. User context elements such as interests, preferences, and intents are the main sources exploited in information retrieval approaches to better fit user information needs. Using user intent to improve query-specific retrieval relies on classifying web queries into three types according to that intent: informational, navigational, and transactional. However, the query type classification strategies involved are based solely on query features, where the query type decision is made independently of the user context represented by the search history. In this paper, we present a contextual query classification method making use of both query features and the user context, defined by quality indicators of the previous query session type called the query profile. We define a query session as a sequence of queries of the same type. Preliminary experimental results carried out using TREC data show that our approach is promising.
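The informational/navigational/transactional split, with the session profile used as context, can be illustrated with a toy rule-based classifier. The cue lists and the fallback-to-session tie-break below are illustrative assumptions, not the paper's feature set.

```python
NAVIGATIONAL_CUES = {"login", "homepage", "www", ".com", "facebook", "youtube"}
TRANSACTIONAL_CUES = {"download", "buy", "order", "watch", "free"}

def classify_query(query, session_profile=None):
    """Classify a query as informational, navigational, or transactional.

    session_profile: optional dominant type of the current query session
    (the "query profile"), used as a fallback when no lexical cue fires.
    """
    tokens = query.lower().split()
    # Navigational cues may appear inside a token (e.g. "site.com").
    if any(cue in tok for tok in tokens for cue in NAVIGATIONAL_CUES):
        return "navigational"
    if any(tok in TRANSACTIONAL_CUES for tok in tokens):
        return "transactional"
    # No cue matched: fall back to the session context, else default.
    return session_profile or "informational"
```

For example, an ambiguous query such as "cheap flights" inherits the type of the ongoing session, which is exactly the role the query profile plays in the contextual method.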
Title :Annotating Search Results from Web Databases
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/annotating-search-results-web-databases
Abstract : An increasing number of databases have become web-accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine-processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.
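The align-then-aggregate idea can be sketched in miniature: group data units that share a position across result records, then let several simple annotators vote on one label per group. The positional alignment rule, the two pattern-based annotators, and the sample rows are all hypothetical, standing in for the paper's richer alignment and annotation aspects.

```python
from collections import Counter

def align_and_annotate(records, annotators):
    """Align data units column-wise across result records, then let
    several annotators vote on a label for each aligned group.

    records: result rows, each a list of data-unit strings.
    annotators: functions mapping a list of units to a label or None.
    """
    groups = list(zip(*records))  # naive positional alignment
    labels = []
    for units in groups:
        votes = Counter(
            lbl for ann in annotators
            if (lbl := ann(list(units))) is not None
        )
        labels.append(votes.most_common(1)[0][0] if votes else "unknown")
    return labels

# Hypothetical annotators keyed on surface patterns:
def price_annotator(units):
    return "price" if all(u.startswith("$") for u in units) else None

def title_annotator(units):
    return "title" if all(u[:1].isupper() for u in units) else None

rows = [["Data Mining Concepts", "$45.00"],
        ["Web Mining Primer", "$30.50"]]
```

An annotation wrapper would then record these learned group/label pairs so that new result pages from the same site can be labeled without re-running the analysis.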
Title :Annotating Search Results from Web Databases
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/annotating-search-results-web-databas
Abstract : An increasing number of databases have become web-accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine-processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.

Title :Annotating Search Results from Web Databases
Language : PHP
Project Link : http://kasanpro.com/p/php/annotating-search-results-web-databases-efficient
Abstract : An increasing number of databases have become web-accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine-processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.

Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/cost-sensitive-decision-tree-classification-credit-card-identity-crime-detec
Abstract :

Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/cost-sensitive-decision-tree-classification-credit-card-identity-crime-d
Abstract :

Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/cost-sensitive-decision-tree-classification-credit-card-identity-fraud-crime-detection
Abstract :

Title :A cost sensitive decision tree classification in credit card identity crime detection system
Language : PHP
Project Link : http://kasanpro.com/p/php/decision-tree-classification-credit-card-identity-crime-detection-system
Abstract :

Title :A cost-sensitive decision tree approach for fraud detection
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/credit-card-identity-crime-detection-system-cost-sensitive-decision-tree-classification
Abstract : With developments in information technology, fraud is spreading all over the world, resulting in huge financial losses. Though fraud prevention mechanisms such as CHIP&PIN have been developed for credit card systems, these mechanisms do not prevent the most common fraud types, such as fraudulent credit card usage over virtual POS (Point Of Sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection becomes an essential tool and probably the best way to stop such fraud types. In this study, a new cost-sensitive decision tree approach which minimizes the sum of misclassification costs while selecting the splitting attribute at each non-terminal node is developed, and its performance is compared with well-known traditional classification models on a real-world credit card data set. In this approach, misclassification costs are taken as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known methods on the given problem set not only with respect to well-known performance metrics such as accuracy and true positive rate, but also with respect to a newly defined cost-sensitive metric specific to the credit card fraud detection domain. Accordingly, financial losses due to fraudulent transactions can be decreased further by implementing this approach in fraud detection systems.
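Selecting a splitting attribute by total misclassification cost rather than impurity can be sketched as below. The cost values, the majority-vote leaf rule, and the four-row toy data set are illustrative assumptions; the study's own cost model treats costs as varying per transaction.

```python
def split_cost(rows, attr, cost_fn, cost_fp):
    """Total misclassification cost of splitting on `attr`.

    rows: (features_dict, is_fraud) pairs. Each branch predicts its
    majority class; minority rows are charged cost_fn for a missed
    fraud or cost_fp for a false alarm.
    """
    total = 0.0
    for value in {r[0][attr] for r in rows}:
        branch = [r for r in rows if r[0][attr] == value]
        frauds = sum(1 for _, y in branch if y)
        predict_fraud = frauds > len(branch) - frauds
        for _, y in branch:
            if y and not predict_fraud:
                total += cost_fn      # missed fraud
            elif not y and predict_fraud:
                total += cost_fp      # false alarm
    return total

def best_split(rows, attrs, cost_fn=10.0, cost_fp=1.0):
    # Pick the attribute whose split minimizes total cost.
    return min(attrs, key=lambda a: split_cost(rows, a, cost_fn, cost_fp))

# Toy data: "foreign" separates fraud cleanly, "weekend" does not.
apps = [({"foreign": 1, "weekend": 0}, True),
        ({"foreign": 1, "weekend": 1}, True),
        ({"foreign": 0, "weekend": 1}, False),
        ({"foreign": 0, "weekend": 0}, False)]
```

Because a missed fraud is charged ten times a false alarm here, the cost-minimizing split can differ from the accuracy-maximizing one, which is the point of the cost-sensitive criterion.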
Title :A cost-sensitive decision tree approach for fraud detection
Language : VB.NET
Project Link : http://kasanpro.com/p/vb-net/cost-sensitive-decision-tree-classify-credit-card-identity-crime-detection-system
Abstract : With developments in information technology, fraud is spreading all over the world, resulting in huge financial losses. Though fraud prevention mechanisms such as CHIP&PIN have been developed for credit card systems, these mechanisms do not prevent the most common fraud types, such as fraudulent credit card usage over virtual POS (Point Of Sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection becomes an essential tool and probably the best way to stop such fraud types. In this study, a new cost-sensitive decision tree approach which minimizes the sum of misclassification costs while selecting the splitting attribute at each non-terminal node is developed, and its performance is compared with well-known traditional classification models on a real-world credit card data set. In this approach, misclassification costs are taken as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known methods on the given problem set not only with respect to well-known performance metrics such as accuracy and true positive rate, but also with respect to a newly defined cost-sensitive metric specific to the credit card fraud detection domain. Accordingly, financial losses due to fraudulent transactions can be decreased further by implementing this approach in fraud detection systems.

Title :PREDICTING HOME SERVICE DEMANDS FROM APPLIANCE USAGE DATA
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/predicting-home-service-demands-from-appliance-usage-data
Abstract : Power management in homes and offices requires appliance usage prediction when future user requests are not available. The randomness and uncertainties associated with appliance usage make the prediction of appliance usage from energy consumption data a non-trivial task. A general model for prediction at the appliance level is still lacking. In this work, we propose to enrich learning algorithms with expert knowledge and propose a general model using a knowledge-driven approach to forecast whether a particular appliance will start at a given hour. The approach is both knowledge-driven and data-driven. Overall energy management for a house requires that the prediction be done for the next 24 hours. The proposed model is tested on the Irise data, and the results are compared with some trivial knowledge-driven predictors.

Title :Data Mining and Wireless Sensor Network for Groundnut Pest/Disease Interaction and Predictions - A
Preliminary Study
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/data-mining-wireless-sensor-network-groundnut-pest-disease-predictions
Abstract : Data-driven precision agriculture, particularly pest/disease management, requires dynamic crop-weather data. An experiment was conducted in a semi-arid region of India to understand crop-weather-pest/disease relations using wireless sensor and field-level surveillance data on the closely related and interdependent pest (Thrips) - disease (Bud Necrosis) dynamics of the groundnut (peanut) crop. Various data mining techniques were used to turn the data into useful information, knowledge, relations, trends, and correlations of the crop-weather-pest/disease continuum. These dynamics, obtained from the data mining techniques and trained through mathematical models, were validated with corresponding ground-level surveillance data. It was found that Bud Necrosis viral disease infection is strongly influenced by humidity, maximum temperature, prolonged duration of leaf wetness, and the age of the crop, and is propelled by the carrier pest Thrips. Results obtained from four continuous agriculture seasons (monsoon & post-monsoon) of data have led to the development of cumulative and non-cumulative prediction models, which can assist the user community in taking ameliorative measures.

Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/mining-social-media-data-understanding-students-learning-experiences
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational experiences - opinions, feelings, and concerns about the learning process. Data from such an uninstrumented environment can provide valuable knowledge to inform student learning. Analyzing such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on engineering students' Twitter posts to understand issues and problems in their educational experiences. We first conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000 tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and results that show how informal social media data can provide insights into students' experiences.

Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/mining-social-media-data-understanding-students-learning-experien
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational experiences - opinions, feelings, and concerns about the learning process. Data from such an uninstrumented environment can provide valuable knowledge to inform student learning. Analyzing such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on engineering students' Twitter posts to understand issues and problems in their educational experiences. We first conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000 tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and results that show how informal social media data can provide insights into students' experiences.

Title :Mining Social Media Data for Understanding Student's Learning Experiences
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/mining-social-media-data-understanding-students-learning-experiences-implement
Abstract : Students' informal conversations on social media (e.g. Twitter, Facebook) shed light on their educational
experiences - opinions, feelings, and concerns about the learning process. Data from such an uninstrumented environment can provide valuable knowledge to inform student learning. Analyzing such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focus on engineering students' Twitter posts to understand issues and problems in their educational experiences. We first conducted a qualitative analysis on samples taken from about 25,000 tweets related to engagement and sleep deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000 tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and results that show how informal social media data can provide insights into students' experiences.

Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : ASP.NET with VB
Project Link : http://kasanpro.com/p/asp-net-with-vb/viral-marketing-cost-effective-time-critical-campaigns-large-scale-social-n
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing, in social networks can be used to increase product adoption or widely spread content over the network. The common perception of viral marketing as cheap, easy, and massively effective makes it an ideal replacement for traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature. With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can more resources be spent economically to increase the spreading speed? We investigate the cost-effective massive viral marketing problem, taking into consideration the limited influence propagation. Both analytical analysis based on power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics which rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio better than O(log n) is unlikely to be possible.

Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : ASP.NET with C#
Project Link : http://kasanpro.com/p/asp-net-with-c-sharp/cost-effective-viral-marketing-time-critical-campaigns-large-scale-so
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing, in social networks can be used to increase product adoption or widely spread content over the network. The common perception of viral marketing as cheap, easy, and massively effective makes it an ideal replacement for traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature. With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can more resources be spent economically to increase the spreading speed? We investigate the cost-effective massive viral marketing problem, taking into consideration the limited influence propagation. Both analytical analysis based on power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics which rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio better than O(log n) is unlikely to be possible.

Title :Cost-effective Viral Marketing for Time-critical Campaigns in Large-scale Social Networks
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/effective-viral-marketing-time-critical-campaigns-large-scale-social-networks
Abstract : Online social networks (OSNs) have become one of the most effective channels for marketing and advertising. Since users are often influenced by their friends, "word-of-mouth" exchanges, so-called viral marketing, in social networks can be used to increase product adoption or widely spread content over the network. The common perception of viral marketing as cheap, easy, and massively effective makes it an ideal replacement for traditional advertising. However, recent studies have revealed that the propagation often fades quickly within only a few hops from the sources, counteracting the assumption of self-perpetuating influence considered in the literature. With only limited influence propagation, is massively reaching customers via viral marketing still affordable? How can more resources be spent economically to increase the spreading speed? We investigate the cost-effective massive viral marketing problem, taking into consideration the limited influence propagation. Both analytical analysis based on power-law network theory and numerical analysis demonstrate that viral marketing might involve costly seeding. To minimize the seeding cost, we provide mathematical programming to find optimal seeding for medium-size networks and propose VirAds, an efficient algorithm, to tackle the problem on large-scale networks. VirAds guarantees a relative error bound of O(1) from the optimal solutions in power-law networks and outperforms greedy heuristics which rely on degree centrality. Moreover, we also show that, in general, approximating the optimal seeding within a ratio better than O(log n) is unlikely to be possible.
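A hop-limited greedy seeding heuristic, of the kind such algorithms are compared against, can be sketched as follows. This is not VirAds itself; the tiny graph, the two-hop limit, and the marginal-coverage rule are illustrative assumptions reflecting the "propagation fades within a few hops" observation.

```python
def reachable(graph, seeds, hops):
    """Nodes reachable from `seeds` within `hops` BFS steps (inclusive)."""
    frontier, seen = set(seeds), set(seeds)
    for _ in range(hops):
        frontier = {v for u in frontier for v in graph.get(u, [])} - seen
        seen |= frontier
    return seen

def greedy_seed(graph, budget, hops=2):
    """Repeatedly pick the node adding the most hop-limited coverage."""
    seeds, covered = [], set()
    for _ in range(budget):
        best = max(
            (n for n in graph if n not in seeds),
            key=lambda n: len(reachable(graph, seeds + [n], hops) - covered),
        )
        seeds.append(best)
        covered = reachable(graph, seeds, hops)
    return seeds

# Tiny illustrative network as adjacency lists:
net = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"],
       "d": ["a", "e"], "e": ["d", "f"], "f": ["e"]}
```

Note that with a two-hop limit the best single seed here is the bridge node "d", not the highest-degree node "a", which hints at why pure degree-centrality heuristics can be outperformed.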
Title :Green Mining: Investigating Power Consumption across Versions
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/green-mining-investigating-power-consumption-versions
Abstract : Power consumption is increasingly becoming a concern not only for electrical engineers but for software engineers as well, due to the increasing popularity of new power-limited contexts such as mobile computing, smart phones, and cloud computing. Software changes can alter software power consumption behaviour and can cause power performance regressions. By tracking software power consumption, we can build models that provide suggestions to avoid power regressions. There is much research on software power consumption, but little focus on the relationship between software changes and power consumption. Most work measures the power consumption of a single software task; instead, we seek to extend this work across the history (revisions) of a project. We develop a set of tests for a well-established product and then run those tests across all versions of the product while recording the power usage of these tests. We provide and demonstrate a methodology that enables the analysis of power consumption performance for over 500 nightly builds of Firefox 3.6; we show that software change does induce changes in power consumption. This methodology and case study are a first step towards combining power measurement and mining software repositories research, thus enabling developers to avoid power regressions via power consumption awareness.

Title :Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/categorical-numerical-attribute-data-clustering-based
Abstract : Most existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a non-trivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering framework based on the concept of object-cluster similarity and gives a unified similarity metric which can be simply applied to data with categorical, numerical, and mixed attributes. Accordingly, an iterative clustering algorithm is developed, whose outstanding performance is experimentally demonstrated on different benchmark data sets. Moreover, to circumvent the difficult selection problem of the cluster number, we further develop a penalized competitive learning algorithm within the proposed clustering framework. The embedded competition and penalization mechanisms enable this improved algorithm to determine the number of clusters automatically by gradually eliminating redundant clusters. The experimental results show the efficacy of the proposed approach.

Title :Categorical-and-numerical-attribute data clustering using K - Mode clustering and Fuzzy K - Mode clustering
Language : C#
Project Link : http://kasanpro.com/p/c-sharp/categorical-numerical-attribute-data-clustering-fuzzy
Abstract : Most existing clustering approaches are applicable to purely numerical or categorical data only, but not both. In general, it is a non-trivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering framework based on the concept of object-cluster similarity and gives a unified similarity metric which can be simply applied to the data with categorical, numerical, and