www.redpel.com +917620593389 redpelsoftware@gmail.com
WhitePel Software Pvt Ltd
63/A, Ragvilas, Lane No. C, Koregaon Park, Pune 411001
www.whitepel.com, info@whitepel.com, whitepelpune@gmail.com
Data Mining projects for Java / .NET
Check the following projects, and check for any spelling mistakes before showing them
to your guide:
Use of FCM & Fuzzy Min-Max Algorithm in Lung Cancer Detection
Abstract— Lung cancer is a disease characterized by uncontrolled cell growth in tissues of
the lung and is the most common fatal malignancy in both men and women. Early detection
and treatment of lung cancer can greatly improve the survival rate of patients. Artificial
Neural Networks (ANN), Fuzzy C-Means (FCM), and the Fuzzy Min-Max Neural Network (FMNN)
are useful in medical diagnosis because of several advantages: an ANN offers fault
tolerance, flexibility, and non-linearity; FCM gives the best results for overlapping data
sets, allows a data point to belong to more than one cluster center, and always converges;
and FMNN offers online adaptation, non-linear separability, short training time, and both
soft and hard decisions. In this work, we propose to use FCM and FMNN on standard datasets
to detect lung cancer.
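As a rough, self-contained sketch of the Fuzzy C-Means step (not the project's actual
implementation; the data points and parameter choices below are invented for illustration),
FCM assigns every point a graded membership in each cluster, which is why it copes well with
overlapping data:

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    # Minimal Fuzzy C-Means for c >= 2 clusters: unlike hard k-means, every
    # point keeps a membership degree in every cluster, so overlapping data
    # is handled gracefully. Deterministic seeding keeps the sketch simple.
    centers = [points[round(i * (len(points) - 1) / (c - 1))] for i in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u[i][j] = degree of point i in cluster j.
        u = []
        for p in points:
            dists = [max(1e-9, sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5)
                     for ctr in centers]
            u.append([1.0 / sum((dists[j] / dists[k]) ** (2 / (m - 1))
                                for k in range(c)) for j in range(c)])
        # Center update: membership-weighted mean of all points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(len(points))) / sum(w)
                                 for d in range(len(points[0]))))
    return centers, u

# Two visibly separated 2-D groups (toy stand-ins for extracted image features).
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, memberships = fuzzy_c_means(pts, c=2)
```

In the proposed pipeline the soft memberships produced by FCM would then feed the FMNN
classifier; note that each membership row sums to 1, reflecting the soft assignment.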
Systematic Prediction of Keyword Queries over the IMDB
Database.
ABSTRACT: Keyword queries on databases provide easy access to data, but often suffer
from low ranking quality, i.e., low precision and/or recall, as shown in recent benchmarks. It
would be useful to identify queries that are likely to have low ranking quality in order to
improve user satisfaction. For instance, the system may suggest alternative queries to the
user for such hard queries. In this paper, we analyze the characteristics of hard queries and propose
a novel framework to measure the degree of difficulty for a keyword query over a database,
considering both the structure and the content of the database and the query results. We
evaluate our query difficulty prediction model against two effectiveness benchmarks for
popular keyword search ranking methods. Our empirical results show that our model
predicts the hard queries with high accuracy. Further, we present a suite of optimizations to
minimize the incurred time overhead.
Performance Evaluation and Estimation Model Using
Regression Method for Hadoop WordCount.
ABSTRACT : Given the rapid growth in cloud computing, it is important to analyze the
performance of different Hadoop MapReduce applications and to understand the
performance bottleneck in a cloud cluster that contributes to higher or lower performance. It
is also important to analyze the underlying hardware in cloud cluster servers to enable the
optimization of software and hardware to achieve the maximum
performance possible. Hadoop is based on MapReduce, which is one of the most popular
programming models for big data analysis in a parallel computing environment. In this
paper, we present a detailed performance analysis, characterization, and evaluation of the
Hadoop MapReduce WordCount application.
We also propose an estimation model, based on Amdahl's law and the regression method, to
estimate performance and total processing time versus different input sizes for a given
processor architecture. The estimation regression model is verified to estimate performance
and run time with an error margin of less than 5%.
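The estimation idea can be illustrated with ordinary least squares on a toy run-time table
(the sizes and times below are hypothetical, not the paper's measurements): model the
WordCount run time as a fixed serial overhead plus a per-gigabyte cost, in the spirit of
Amdahl's law, and extrapolate.

```python
def fit_linear(xs, ys):
    # Ordinary least squares for T(n) = a + b*n, where 'a' plays the role of
    # the fixed (serial) overhead and 'b' the per-unit processing cost.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

sizes = [1, 2, 4, 8, 16]            # input size in GB (hypothetical)
times = [35, 62, 118, 231, 455]     # measured run time in seconds (hypothetical)
a, b = fit_linear(sizes, times)
pred_32 = a + b * 32                # extrapolated run time for a 32 GB input
```

On this toy data the fitted line reproduces the measured points to well within a 5%
relative error, mirroring the error margin the abstract reports.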
An Efficient Privacy-Preserving Ranked Keyword
Search Method
Abstract — Cloud data owners prefer to outsource documents in encrypted form in order to
preserve privacy. It is therefore essential to develop efficient and reliable ciphertext
search techniques. One challenge is that the relationships between documents are normally
concealed by encryption, which leads to significant degradation of search accuracy. In
addition, the volume of data in data centers has grown dramatically. This makes it even more
challenging to design ciphertext search schemes that can provide efficient and reliable
online information retrieval over large volumes of encrypted data. In this paper, a
hierarchical clustering method is proposed to
support more search semantics and also to meet the demand for fast ciphertext search
within a big data environment. The proposed hierarchical approach clusters the documents
based on the minimum relevance threshold, and then partitions the resulting clusters into
sub-clusters until the constraint on the maximum cluster size is reached. In the search
phase, this approach achieves linear computational complexity even as the document
collection grows exponentially in size. To verify the authenticity of search results, a
structure called the minimum hash sub-tree is designed in this paper. Experiments have been
conducted using a collection built from IEEE Xplore. The results show that with a
sharp increase in the number of documents in the dataset, the search time of the proposed
method increases linearly, whereas the search time of the traditional method increases
exponentially. Furthermore, the proposed method outperforms the traditional method in rank
privacy and in the relevance of retrieved documents.
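The recursive splitting step can be sketched as follows (a simplification: the real scheme
also applies a minimum relevance threshold and works over an encrypted index; the document
vectors and size bound here are invented):

```python
def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def partition(docs, max_size):
    # Recursively bisect a cluster until every sub-cluster respects the
    # maximum-size constraint. Assumes the document vectors are distinct.
    if len(docs) <= max_size:
        return [docs]
    # Seed with the two farthest-apart documents, assign the rest greedily.
    s0, s1 = max(((a, b) for a in docs for b in docs), key=lambda p: distance(*p))
    left, right = [], []
    for d in docs:
        (left if distance(d, s0) <= distance(d, s1) else right).append(d)
    return partition(left, max_size) + partition(right, max_size)

# Toy relevance vectors for six documents.
docs = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
clusters = partition(docs, max_size=3)
```

Search then only descends into the few clusters relevant to the query, which is what keeps
the cost linear as the collection grows.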
PRISM: PRivacy-aware Interest Sharing and Matching
in Mobile Social Networks.
Abstract —In a profile matchmaking application of mobile social networks, users need to
reveal their interests to each other in order to find the common interests. A malicious user
may exploit this personal information to harm a user. Therefore, mutual interests need to
be discovered in a privacy-preserving manner. In this paper, we propose
an efficient privacy protection and interests sharing protocol referred to as PRivacy-aware
Interest Sharing and Matching (PRISM). PRISM enables users
to discover mutual interests without revealing their interests. Unlike existing approaches,
PRISM does not require revealing the interests to a trusted server. Moreover, the protocol
considers attacking scenarios that have not been addressed previously and provides an
efficient solution. The inherent mechanism reveals any cheating attempt by a malicious
user. PRISM also includes a procedure to eliminate Sybil attacks. We analyze the
security of PRISM against both passive and active attacks. Through implementation, we
also present a detailed analysis of the performance of PRISM and compare it with existing
approaches. The results show the effectiveness of PRISM without any significant
performance degradation.
Mapping Bug Reports to Relevant Files Using Instance
Selection and Feature Selection.
Abstract—
Open source projects such as Eclipse and Firefox maintain open bug repositories to which
users report bugs. These users are usually non-technical and cannot assign the correct
class to a bug.
Triaging bugs to the developers who will fix them is a tedious and time-consuming task.
Developers are usually experts in particular areas; for example, some developers specialize
in the GUI while others specialize in core Java functionality. Assigning a bug to the
relevant developer saves time and helps maintain developers' interest by giving them bugs
according to their interests. However, assigning the right bug to the right developer is
quite difficult for the triager without knowing the actual class the bug belongs to. In this
research, we classify bugs with different labels on the basis of the bug summary. A
Multinomial Naïve Bayes text classifier is used for classification. For feature selection,
the Chi-Square and TF-IDF algorithms were used. Using Naïve Bayes and Chi-Square, we obtain
an average accuracy of 83%.
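The Chi-Square feature selection step can be sketched as below (the bug-report summaries and
component labels are invented; the real system would also apply TF-IDF and feed the selected
features to a Multinomial Naïve Bayes classifier):

```python
def chi_square(term, label, docs):
    # Chi-square score of a term for a class: how far the observed
    # term/class co-occurrence counts deviate from independence.
    # docs is a list of (set_of_words, label) pairs.
    n = len(docs)
    a = sum(1 for words, y in docs if term in words and y == label)      # term & class
    b = sum(1 for words, y in docs if term in words and y != label)      # term & other
    c = sum(1 for words, y in docs if term not in words and y == label)  # class only
    d = n - a - b - c                                                    # neither
    den = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / den if den else 0.0

# Hypothetical bug-report summaries labelled by component.
docs = [({"button", "window", "error"}, "GUI"),
        ({"dialog", "button", "error"}, "GUI"),
        ({"nullpointer", "error", "thread"}, "Core"),
        ({"thread", "deadlock", "error"}, "Core")]
print(chi_square("button", "GUI", docs))  # discriminative term -> high score
print(chi_square("error", "GUI", docs))   # appears everywhere -> score 0
```

Terms with the highest scores are kept as features, shrinking the vocabulary the classifier
has to handle.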
Inference Patterns from Big Data using Aggregation,
Filtering and Tagging: A Survey.
Abstract: This paper reviews various approaches to inferring patterns from Big Data using
aggregation, filtering, and tagging. Earlier research shows that data aggregation is
concerned with the gathered data and how efficiently it can be utilized. Understandably, at
the time of data gathering one does not care much about whether the gathered data will be
useful or not. Hence, filtering and tagging of the data are crucial steps in collecting the
relevant data to fulfill the need. The main goal of this paper is therefore to present a
detailed and comprehensive survey of the different approaches. To make the concept clearer,
we provide a brief introduction to Big Data and how it works, the working of two data
aggregation tools (namely, Flume and Sqoop) and of two data processing tools (Hive and
Mahout), and various algorithms that are useful for understanding the topic. Finally, we
include comparisons between the aggregation tools, the processing tools, and the various
algorithms in terms of their pre-processing, matching time, results, and reviews.
Outsourced Similarity Search on Metric Data Assets.
ABSTRACT:
This paper considers a cloud computing setting in which similarity querying of metric data is
outsourced to a service provider. The data is to be revealed only to trusted users, not to the
service provider or anyone else. Users query the server for the most similar data objects to
a query example. Outsourcing offers the data owner scalability and a low initial investment.
The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable
(e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents
techniques that transform the data prior to supplying it to the service provider for similarity
queries on the transformed data. Our techniques provide interesting trade-offs between
query cost and accuracy. They are then further extended to offer an intuitive privacy
guarantee. Empirical studies with real data demonstrate that the techniques are capable of
offering privacy while enabling efficient and accurate processing of similarity queries.
CCD: A Distributed Publish/Subscribe Framework for
Rich Content Formats.
Abstract:
In this paper, we propose a content-based publish/subscribe (pub/sub) framework that
delivers matching content to subscribers in their desired format. Such a framework enables
the pub/sub system to accommodate richer content formats including multimedia
publications with image and video content. In our proposed framework, users (consumers), in
addition to specifying their information needs (subscription queries), also specify a
profile that describes their receiving context, including characteristics of the device
used to receive the content (e.g., the resolution of a PDA used by a consumer). The pub/sub
system, besides being responsible for matching and routing the published content, also
becomes responsible for converting the content into the suitable format for each user.
Content conversion is achieved through a set of content adaptation
operators (e.g., image transcoder, document translator, etc.). We study algorithms for
placement of such operators in a heterogeneous pub/sub broker overlay in order to minimize
communication and computation resource consumption. Our experimental results show that
careful placement of operators in the pub/sub overlay network results in significant cost
reduction.
Measuring the Sky: On Computing Data Cubes via
Skylining the Measures.
ABSTRACT:Data cube is a key element in supporting fast OLAP. Traditionally, an
aggregate function is used to compute the values in data cubes. In this paper, we extend
the notion of data cubes with a new perspective. Instead of using an aggregate function, we
propose to build data cubes using the skyline operation as the "aggregate function." Data
cubes built in this way are called "group-by skyline cubes" and can support a variety of
analytical tasks. Nevertheless, there are several challenges in implementing group-by
skyline cubes in data warehouses: 1) the skyline operation is computationally intensive,
2) the skyline operation is holistic, and 3) a group-by skyline cube contains both grouping
and skyline dimensions, rendering it infeasible to pre-compute all cuboids in advance. This
paper gives details on how to store, materialize, and query such cubes.
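The skyline operation itself is just Pareto-dominance filtering; a minimal sketch (with an
invented hotel dataset where smaller is better in both dimensions):

```python
def dominates(p, q):
    # p dominates q if p is no worse in every dimension and strictly
    # better in at least one (smaller values are better here).
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    # Keep exactly the points that no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical hotels as (price, distance-to-beach) tuples.
hotels = [(50, 8), (60, 5), (80, 2), (90, 4), (55, 9)]
print(skyline(hotels))  # -> [(50, 8), (60, 5), (80, 2)]
```

In a group-by skyline cube this operation plays the role that SUM or COUNT plays in a
conventional cube, evaluated per cuboid cell, which is exactly what makes it holistic and
computationally expensive.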
Finding Frequently Occurring Itemset Pairs on Big
Data.
Abstract—Frequent Itemset Mining (FIM) is one of the best-known techniques for extracting
knowledge from data. The combinatorial explosion of FIM methods becomes even more
problematic when they are applied to Big Data. Fortunately, recent improvements in the
field of parallel programming already provide good tools to tackle this problem. However,
these tools come with their own technical challenges, e.g., balanced data distribution and
inter-communication costs. In this paper, we investigate the applicability of FIM
techniques on the MapReduce platform. We introduce two new methods for mining large
datasets: Dist-Eclat focuses on speed, while BigFIM is optimized to run on very large
datasets. In our experiments we show the scalability of our methods.
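The Eclat idea underlying Dist-Eclat can be sketched with plain tid-set intersections (a
single-machine toy with an invented transaction list; the paper's contribution is
distributing this search over MapReduce workers):

```python
def eclat(prefix, items, min_support, results):
    # Depth-first Eclat over a vertical layout: 'items' is a list of
    # (item, tid-set) pairs; intersecting tid-sets gives the support of
    # the extended itemset without rescanning the transactions.
    while items:
        item, tids = items.pop()
        if len(tids) >= min_support:
            results[frozenset(prefix + [item])] = len(tids)
            suffix = [(other, tids & otids) for other, otids in items
                      if len(tids & otids) >= min_support]
            eclat(prefix + [item], suffix, min_support, results)
    return results

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
# Vertical layout: item -> set of ids of the transactions containing it.
vertical = {}
for tid, t in enumerate(transactions):
    for item in t:
        vertical.setdefault(item, set()).add(tid)
freq = eclat([], sorted(vertical.items()), min_support=3, results={})
```

Dist-Eclat partitions these prefix-based search trees across workers, which is why balanced
data distribution matters so much in practice.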
Mining Social Media for Understanding Students’
Learning Experiences.
Abstract—Students’ informal conversations on social media (e.g., Twitter, Facebook) shed
light on their educational experiences: opinions, feelings, and concerns about the
learning process. Data from such uninstrumented environments can provide valuable
knowledge to inform student learning. Analyzing such data, however, can be challenging.
The complexity of students’ experiences reflected from social media content requires
human interpretation. However, the growing scale of data demands automatic data analysis
techniques. In this paper, we developed a workflow to integrate both qualitative analysis
and large-scale data mining techniques. We focused on engineering students’ Twitter posts
to understand issues and problems in their educational experiences. We first conducted a
qualitative analysis on samples taken from about 25,000 tweets related to engineering
students’ college life. We found engineering students encounter problems such as heavy
study load, lack of social engagement, and sleep deprivation. Based on these results, we
implemented a multi-label classification algorithm to classify tweets reflecting students’
problems. We then used the algorithm to train a detector of student problems from about
35,000 tweets streamed at the geo-location of Purdue University. This work, for the first
time, presents a methodology and results that show how informal social media data can
provide insights into students’ experiences.
Private Search and Content-Protecting Location-Based
Queries on Google Maps.
ABSTRACT:
In this paper we present a solution to one of the location-based query problems. This
problem is defined as follows: (i) a user wants to query a database of location data, known
as Points Of Interest (POIs), and does not want to reveal his/her location to the server due
to privacy concerns; (ii) the owner of the location data, that is, the location server, does not
want to simply distribute its data to all users. The location server desires to have some
control over its data, since the data is its asset. We propose a major enhancement upon
previous solutions by introducing a two stage approach, where the first step is based on
Oblivious Transfer and the second step is based on Private Information Retrieval, to
achieve a secure solution for both parties. The solution we present is efficient and practical
in many scenarios. We implement our solution on a desktop machine and a mobile device
to assess the efficiency of our protocol. We also introduce a security model and analyse
the security in the context of our protocol. Finally, we highlight a security weakness of our
previous work and present a solution to overcome it.
CLUSTBIGFIM-FREQUENT ITEMSET MINING OF BIG
DATA USING PRE-PROCESSING BASED ON
MAPREDUCE FRAMEWORK.
ABSTRACT:
Nowadays, enormous amounts of data are generated through the Internet of Things (IoT) as
technologies advance and people use them in day-to-day activities; this data is termed Big
Data, with its own characteristics and challenges. Frequent Itemset Mining algorithms aim
to discover frequent itemsets in a transactional database, but as dataset sizes increase,
traditional frequent itemset mining cannot handle them. The MapReduce programming model
solves the problem of large datasets, but its large communication cost reduces execution
efficiency. We propose a new pre-processing technique based on k-means, applied to the
BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm
to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets
from the generated clusters using the MapReduce programming model. Results show that the
execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means
clustering algorithm before the BigFIM algorithm as a pre-processing step.
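The pre-processing step can be sketched with a plain k-means (toy 1-D data and deterministic
seeding, invented for illustration; ClustBigFIM runs this clustering over MapReduce before
handing the clusters to BigFIM):

```python
def k_means(points, k, iters=20):
    # Plain k-means used as a pre-processing step: partition the data into
    # k clusters that can then be mined for frequent itemsets independently.
    centers = points[:k]  # deterministic seeding keeps the sketch simple
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

pts = [(1.0,), (1.2,), (0.8,), (9.0,), (9.5,), (8.5,)]
centers, clusters = k_means(pts, k=2)
```

Mining each cluster separately shrinks the candidate space every worker sees, which is where
the reported efficiency gain over plain BigFIM comes from.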
Clustering and Sequential Pattern Mining of Online
Collaborative Learning Data.
Abstract : Group work is widespread in education. The growing use of online tools
supporting group work generates huge amounts of data. We aim to exploit this data to
support mirroring: presenting useful high-level views of information about the group,
together with desired patterns characterizing the behavior of strong groups. The goal is to
enable the groups and their facilitators to see relevant aspects of the group's operation and
provide feedback if these are more likely to be associated with positive or negative
outcomes and indicate where the problems are. We explore how useful mirror information
can be extracted via a theory-driven approach and a range of clustering and sequential
pattern mining. The context is a senior software development project where students use
the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker
groups and gain insight into the success factors. The results point to the importance of
leadership and group interaction, and provide promising indicators of whether they are
occurring.
Patterns indicating good individual practices were also identified. We found that some key
measures can be mined from early data. The results are promising for advising groups at
the start and early identification of effective and poor practices, in time for remediation.
Monitoring Online Tests.
Abstract : E-testing systems are widely adopted in academic environments, as well as in
combination with other assessment means, providing tutors with powerful tools to submit
different types of tests in order to assess learners’ knowledge. Among these, multiple-
choice tests are extremely popular, since they can be automatically corrected. However,
many learners do not welcome this type of test, because often, it does not let them properly
express their capacity, due to the characteristics of multiple-choice questions of being
closed-ended. Many examiners also doubt the real effectiveness of structured tests in
assessing learners’ knowledge, and wonder whether learners are more conditioned by the
question type than by its actual difficulty.
In this project, we propose a data exploration approach exploiting information
visualization in order to involve tutors in a visual data mining process aiming to detect
structures, patterns, and relations between data, which can potentially reveal previously
unknown knowledge inherent in tests, such as the test strategies used by the learners,
correlations among different questions, and many other aspects, including their impact on
the final score. The approach captures the occurrence of question browsing and answering
events by the learners and uses these data to visualize charts containing a chronological
review of
tests. Other than identifying the most frequently employed strategies, the tutor can
determine their effectiveness by correlating their use with the final test scores.
Profile Matching in Social Networking.
ABSTRACT : In this paper, we study user profile matching with privacy-preservation in
mobile social networks (MSNs) and introduce a family of novel profile matching protocols.
We first propose an explicit Comparison-based Profile Matching protocol (eCPM) which
runs between two parties, an initiator and a responder. The eCPM enables the initiator to
obtain the comparison-based matching result about a specified attribute in their profiles,
while preventing their attribute values from disclosure. We then propose an implicit
Comparison-based Profile Matching protocol (iCPM) which allows the initiator to directly
obtain some messages instead of the comparison result from the responder. The messages,
which are unrelated to the user profile, can be divided into multiple categories by the
responder. The initiator implicitly chooses the category of interest, which remains unknown
to the responder. Two
messages in each category are prepared by the responder, and only one message can be
obtained by the initiator according to the comparison result on a single attribute. We further
generalize the iCPM to an implicit Predicate-based Profile Matching protocol (iPPM) which
allows complex comparison criteria spanning multiple attributes. The anonymity analysis
shows all these protocols achieve the confidentiality of user profiles. In addition, the eCPM
reveals the comparison result to the initiator and provides only conditional anonymity; the
iCPM and the iPPM do not reveal the result at all and provide full anonymity. We analyze
the communication overhead and the anonymity strength of the protocols.
Analysis of Twitter Trends Based on Keyword Detection
and Link Detection.
ABSTRACT:
Detection of emerging topics is now receiving renewed interest motivated by the rapid
growth of social networks. Conventional term-frequency-based approaches may not be
appropriate in this context, because the information exchanged in social-network posts
includes not only text but also images, URLs, and videos. We focus on the emergence of
topics signaled by the social aspects of these networks. Specifically, we focus on mentions
of users: links between users that are generated dynamically (intentionally or
unintentionally) through replies, mentions, and retweets. We propose a probability model of
the mentioning behavior
of a social network user, and propose to detect the emergence of a new topic from the
anomalies measured through the model. Aggregating anomaly scores from hundreds of
users, we show that we can detect emerging topics only based on the reply/mention
relationships in social-network posts. We demonstrate our technique in several real data
sets we gathered from Twitter. The experiments show that the proposed mention-anomaly-
based approaches can detect new topics at least as early as text-anomaly-based
approaches, and in some cases much earlier when the topic is poorly identified by the
textual contents in posts.
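A drastically simplified version of the mention-anomaly idea (the actual paper uses a
sequential Bayesian probability model; the mention history and smoothing constant below are
invented) scores a post by how surprising its mention count is under the user's past
behavior:

```python
import math
from collections import Counter

def anomaly_score(history, observed, smoothing=1.0):
    # -log probability of the observed number of mentions under the user's
    # historical mention-count distribution (Laplace smoothed). A burst of
    # mentions, typical of an emerging topic, yields a high score.
    counts = Counter(history)
    total = len(history) + smoothing * (max(history + [observed]) + 1)
    p = (counts[observed] + smoothing) / total
    return -math.log(p)

history = [0, 1, 0, 2, 1, 0, 1, 0]    # mentions per post so far (hypothetical)
print(anomaly_score(history, 1))      # a common count -> low score
print(anomaly_score(history, 6))      # a mention burst -> high score
```

Aggregating such scores over hundreds of users, as the paper does, is what separates a
genuine emerging topic from one user's noisy behavior.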
Big Data Frequent Pattern Mining.
Abstract : Frequent pattern mining is an essential data mining task, with a goal of
discovering knowledge in the form of repeated patterns. Many efficient pattern mining
algorithms have been discovered in the last two decades, yet most do not scale to the type
of data we are presented with today, the so-called “Big Data”. Scalable parallel algorithms
hold the key to solving the problem in this context. In this chapter, we review recent
advances in parallel frequent pattern mining, analyzing them through the Big Data lens. We
identify three areas as challenges to designing parallel frequent pattern mining algorithms:
memory scalability, work partitioning, and load balancing. With these challenges as a frame
of reference, we extract and describe key algorithmic design patterns from the wealth of
research conducted in this domain.
Bootstrapping Privacy Ontology for Web Services.
ABSTRACT: Ontologies have become the de-facto modeling tool of choice, employed in
many applications and prominently in the semantic web. Nevertheless, ontology
construction remains a daunting task. Ontological bootstrapping, which aims at
automatically generating concepts and their relations in a given domain, is a promising
technique for ontology construction. Bootstrapping an ontology based on a set of predefined
textual sources, such as web services, must address the problem of multiple, largely
unrelated concepts. In this paper, we propose an ontology bootstrapping process for web
services. We exploit the advantage that web services usually consist of both WSDL and
free text descriptors. The WSDL descriptor is evaluated using two methods, namely Term
Frequency/Inverse Document Frequency (TF/IDF) and web context generation. Our
proposed ontology bootstrapping process integrates the results of both methods and
applies a third method to validate the concepts using the service free text descriptor,
thereby offering a more accurate definition of ontologies. We extensively validated our
bootstrapping method using a large repository of real-world web services and verified the
results against existing ontologies. The experimental results indicate high precision.
Furthermore, a comparison of recall versus precision when each method is implemented
separately demonstrates the advantage of our integrated bootstrapping approach.
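The TF/IDF evaluation of a descriptor can be sketched as follows (the token lists are
invented stand-ins for tokenized WSDL descriptors):

```python
import math

def tf_idf(docs):
    # Term Frequency / Inverse Document Frequency: a term frequent in one
    # document but rare across the corpus scores highest, making it a good
    # candidate concept for that document.
    n = len(docs)
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    return [{t: doc.count(t) / len(doc) * math.log(n / df[t]) for t in set(doc)}
            for doc in docs]

docs = [["book", "flight", "ticket", "flight"],
        ["book", "hotel", "room"],
        ["weather", "forecast", "city"]]
scores = tf_idf(docs)
print(max(scores[0], key=scores[0].get))  # -> flight
```

In the bootstrapping process, the top-scoring terms from this method are intersected with
the web-context method and then validated against the service's free-text descriptor.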
Then and Now: On the Maturity of the Cybercrime
Markets.
ABSTRACT: Due to the rise and rapid growth of e-commerce, the use of credit cards for online
purchases has dramatically increased, causing an explosion in credit card fraud. As credit
cards become the most popular mode of payment for both online and regular purchases, cases
of associated fraud are also rising. In real life, fraudulent transactions are scattered
among genuine transactions, and simple pattern matching techniques are often not sufficient
to detect those frauds accurately. Implementation of efficient fraud detection systems has
thus become imperative for all credit card issuing banks to minimize their losses. Many
modern techniques based on artificial intelligence, data mining, fuzzy logic, machine
learning, sequence alignment, genetic programming, etc., have evolved for detecting various
kinds of credit card fraud. A clear understanding of all these approaches will certainly
lead to an efficient credit card fraud detection system. This paper presents a survey of
various techniques used in credit card
fraud detection mechanisms and evaluates each methodology based on certain design
criteria.
Social Set Analysis: A Set Theoretical Approach to
Big Data Analytics.
ABSTRACT : Current analytical approaches in computational social science can be
characterized by four dominant paradigms: text analysis (information extraction and
classification), social network analysis (graph theory), social complexity analysis (complex
systems science), and social simulations (cellular automata and agent-based modeling).
However, when it comes to organizational and societal units of analysis, there exists no
approach to conceptualize, model, analyze, explain, and predict social media interactions
as individuals’ associations with ideas, values, identities, and so on. To address this
limitation, based on the sociology of associations and the mathematics of set theory, this
paper presents a new approach to big data analytics called social set analysis. Social set
analysis consists of a generative framework for the philosophies of computational social
science, theory of social data, conceptual and formal models of social data, and an
analytical framework for combining big social data sets with organizational and societal data
sets. Three empirical studies of big social data are presented to illustrate and demonstrate
social set analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical
interaction analysis, and event-studies-oriented set-theoretical visualizations. Implications for
big data analytics, current limitations of the set-theoretical approach, and future directions
are outlined.
Personalized Travel Sequence Recommendation on
Multi-Source Big Social Media.
ABSTRACT: Recent years have witnessed an increased interest in recommender systems.
Despite significant progress in this field, there still remain numerous avenues to explore.
Indeed, this paper provides a study of exploiting online travel information for personalized
travel package recommendation. A critical challenge along this line is to address the unique
characteristics of travel data, which distinguish travel packages from traditional items for
recommendation. To that end, in this paper, we first analyze the characteristics of the
existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST
model can represent travel packages and tourists by different topic distributions, where the
topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations,
travel seasons) of the landscapes. Then, based on this topic model representation, we
propose a cocktail approach to generate the lists for personalized travel package
recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-
season topic (TRAST) model for capturing the latent relationships among the tourists in
each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail
recommendation approach on the real-world travel package data. Experimental results
show that the TAST model can effectively capture the unique characteristics of the travel
data and the cocktail approach is, thus, much more effective than traditional
recommendation techniques for travel package recommendation. Also, by considering
tourist relationships, the TRAST model can be used as an effective assessment for travel
group formation.
A Parallel Patient Treatment Time Prediction
Algorithm and Its Applications in Hospital Queuing-
Recommendation in a Big Data Environment.
Abstract: There is a need for continuous monitoring of the vital parameters of patients in
critical situations. In the current hospital scenario, such parameters are shown on a
digital display that is observed by a nurse, so a dedicated person (nurse) is required for
monitoring. But looking
at the growing population, this ratio of one nurse per patient would become a considerable
problem in the future, so manual monitoring of patients should be replaced by some other
method. Online monitoring has attracted considerable attention for many years. It includes
applications that are not limited to industrial process monitoring and control but extend to
civilian areas such as healthcare, home automation, and traffic control. This paper discusses
the feasibility of an Instant Notification System in a Heterogeneous Sensor Network with
deployment of the XMPP protocol for medical
application. The system aims to provide an environment which enables medical
practitioners to remotely monitor various vital parameters of patients. For academic purposes,
we have limited this system to monitoring patients' body temperature and blood
pressure. The proposed system collects data from various heterogeneous sensor networks
– for example: patients’ body temperature, and blood pressure - converts it to a standard
packet and provides the facility to send it over a network using Extensible Messaging and
Presence Protocol (XMPP), known in more common terms as Instant Messaging (IM). The use of
heterogeneous sensor networks (HSN) provides the much-required platform independence,
while XMPP enables instant notification.
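The "standard packet" step described above can be sketched as follows: readings from heterogeneous sensors are normalized into one common format before being sent over XMPP. The field names and the stanza layout here are illustrative assumptions, not the actual system's format.

```java
// Sketch: normalize a heterogeneous sensor reading into one standard
// message body that can travel inside an XMPP (IM) stanza.
class SensorPacket {
    private final String patientId;
    private final String sensorType;   // e.g. "temperature", "bloodPressure"
    private final double value;
    private final long timestampMillis;

    SensorPacket(String patientId, String sensorType,
                 double value, long timestampMillis) {
        this.patientId = patientId;
        this.sensorType = sensorType;
        this.value = value;
        this.timestampMillis = timestampMillis;
    }

    /** Serialize the reading into a simple XML body suitable for an IM message. */
    String toStanzaBody() {
        return "<reading patient='" + patientId + "' type='" + sensorType
             + "' value='" + value + "' ts='" + timestampMillis + "'/>";
    }
}
```

Because every sensor type is mapped to the same packet shape, the notification server never needs sensor-specific logic, which is what gives the platform independence mentioned above.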
Relevance Feature Discovery for Text Mining.
Abstract —It is a big challenge to guarantee the quality of discovered relevance features in
text documents for describing user preferences because of large scale terms and data
patterns. Most existing popular text mining and classification methods have adopted term-
based approaches. However, they have all suffered from the problems of polysemy and
synonymy. Over the years, it has often been hypothesized that pattern-based
methods should perform better than term-based ones in describing user preferences; yet,
how to effectively use large scale patterns remains a hard problem in text mining. To make
a breakthrough in this challenging issue, this paper presents an innovative model for
relevance feature discovery. It discovers both positive and negative patterns in text
documents as higher level features and deploys them over low-level features (terms). It also
classifies terms into categories and updates term weights based on their specificity and
their distributions in patterns. Substantial experiments using this model on RCV1, TREC
topics and Reuters-21578 show that the proposed model significantly outperforms both the
state-of-the-art term-based methods and the pattern based methods.
A Novel Methodology of Frequent Itemset Mining on
Hadoop.
Abstract — Frequent Itemset Mining is one of the classical data mining problems in most of
the data mining applications. It requires very large computations and I/O traffic capacity.
Also, resources such as a single processor's memory and CPU are very limited, which degrades
the performance of the algorithm. In this paper we propose a distributed algorithm that runs
on Hadoop, one of the most popular recent distributed frameworks, which mainly focuses on
the MapReduce paradigm. The proposed approach takes into account
inherent characteristics of the Apriori algorithm related to the frequent itemset generation
and through a block-based partitioning uses a dynamic workload management. The
algorithm greatly enhances the performance and achieves high scalability compared to the
existing distributed Apriori based approaches. The proposed algorithm is implemented and
tested on large-scale datasets distributed over a cluster.
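The MapReduce support-counting step that distributed Apriori variants rely on can be sketched on a single machine as below: each "mapper" emits a count for every candidate itemset contained in a transaction, and the "reducer" sums the counts and prunes candidates below the minimum support. This illustrates the paradigm only, not the paper's block-partitioned algorithm.

```java
import java.util.*;

// Single-machine sketch of MapReduce-style support counting for Apriori.
class AprioriSupportCount {
    static Map<Set<String>, Integer> countSupport(
            List<Set<String>> transactions,
            List<Set<String>> candidates,
            int minSupport) {
        // "Map" phase: emit a count for each candidate found in a transaction.
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : transactions) {
            for (Set<String> c : candidates) {
                if (t.containsAll(c)) {
                    counts.merge(c, 1, Integer::sum);
                }
            }
        }
        // "Reduce" phase: prune candidates below the minimum support.
        counts.values().removeIf(n -> n < minSupport);
        return counts;
    }
}
```

On a real cluster the transaction list would be split into blocks, each mapper would count its block independently, and the shuffle phase would bring all partial counts for the same itemset to one reducer.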
Online Java Compiler
Abstract
In today's competitive and fast-moving world, everything is moving to the Internet, and all
things are going online. So we created software called "Online Java Compiler with Security
Editor".
The main aim of this project is to make it easy to write a Java program, compile it, and
debug it online. The client machine does not need the Java Development Kit; it only connects
to the server. The server has the Java compiler, so it executes the Java code and returns any
error messages to the appropriate client machine.
This project also includes a security editor. This editor encrypts and decrypts files using
the RSA algorithm. There are many security algorithms, but RSA is very efficient for
encrypting and decrypting files.
This project also provides a viewer for the Java API. It is very useful for writing Java
programs easily; for example, if there is any error in the format of an API call, the API
can be viewed through this module.
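The RSA encrypt/decrypt step of the security editor can be sketched with textbook RSA, as below. A production editor should use java.security/javax.crypto with proper padding; this only illustrates key generation and modular exponentiation.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Textbook RSA sketch: key generation plus modular-exponentiation
// encrypt/decrypt. Illustrative only; no padding, not production crypto.
class RsaSketch {
    final BigInteger n, e, d;

    RsaSketch(int bits) {
        SecureRandom rnd = new SecureRandom();
        e = BigInteger.valueOf(65537);   // common public exponent
        BigInteger p, q, phi;
        do {
            p = BigInteger.probablePrime(bits / 2, rnd);
            q = BigInteger.probablePrime(bits / 2, rnd);
            phi = p.subtract(BigInteger.ONE).multiply(q.subtract(BigInteger.ONE));
        } while (!phi.gcd(e).equals(BigInteger.ONE));  // e must be coprime to phi
        n = p.multiply(q);
        d = e.modInverse(phi);           // private exponent
    }

    BigInteger encrypt(BigInteger m) { return m.modPow(e, n); }
    BigInteger decrypt(BigInteger c) { return c.modPow(d, n); }
}
```

In the editor, a file would be split into blocks smaller than n, each encrypted with the server's public key (e, n) and decrypted with the private key d.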
A Cloud Service Architecture for Analyzing Big
Monitoring Data.
Abstract: Cloud monitoring is a source of big data that is constantly produced from
traces of infrastructures, platforms, and applications. Analysis of monitoring data delivers
insights into the system's workload and usage patterns and ensures workloads are operating
at optimum levels. The analysis process involves data query and
extraction, data analysis, and result visualization. Since the volume of monitoring data is
big, these operations require a scalable and reliable architecture to extract, aggregate, and
analyze data in an arbitrary range of granularity. Ultimately, the results of analysis become
the knowledge of the system and should be shared and
communicated. This paper presents our cloud service architecture that explores a search
cluster for data indexing and querying. We develop REST APIs through which the data can be
accessed by different analysis modules. This architecture enables extensions to integrate with
software frameworks of both batch processing (such as Hadoop) and stream processing
(such as Spark) of big data. The analysis results are structured in Semantic Media Wiki
pages in the context of the monitoring data source and the analysis process. This cloud
architecture is empirically assessed to evaluate its responsiveness when processing a large
set of data records under node failures.
A Tutorial on Secure Outsourcing of Large-scale
Computations for Big Data.
ABSTRACT: Today's society is collecting a massive and exponentially growing amount of
data that can potentially revolutionize scientific and engineering fields and promote business
innovations. With the advent of cloud computing, in order to analyze data in a cost-effective
and practical way, users can outsource their computing tasks to the cloud, which offers
access to vast computing resources on an on-demand and pay-per-use basis. However,
since users' data contains sensitive information that needs to be kept secret for ethical,
security, or legal reasons, many users are reluctant to adopt cloud computing. To this end,
researchers have proposed techniques that enable users to offload computations to the
cloud while protecting their data privacy. In this paper, we review the recent advances in the
secure outsourcing of large-scale computations for big data analysis. We first introduce the
two most fundamental and common computational problems, i.e., linear algebra and
optimization, and then provide an extensive review of the data privacy preserving
techniques. After that, we explain how researchers have exploited the data privacy
preserving techniques to construct secure outsourcing algorithms for large-scale
computations.
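One classic disguise technique for the linear-algebra problems surveyed above can be sketched as follows: to outsource the matrix product C = A*B, the client scales the inputs with secret random diagonal matrices, so the client does only the cheap O(n^2) scalings while the untrusted server performs the O(n^3) multiplication on disguised data. This is a simplified illustration of the general idea, not a scheme from the paper.

```java
import java.util.Random;

// Outsourced matrix multiplication disguised by secret diagonal scalings:
// A' = D1 A D2, B' = D2^{-1} B D3; the server returns C' = A'B' = D1 (AB) D3,
// and the client undoes D1 and D3 to recover C = AB.
class OutsourcedMatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int t = 0; t < k; t++)
                for (int j = 0; j < m; j++)
                    c[i][j] += a[i][t] * b[t][j];
        return c;
    }

    /** Client-side protocol: disguise, "send" to the server, undo the disguise. */
    static double[][] outsource(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        Random rnd = new Random(7);
        double[] d1 = randomDiag(n, rnd), d2 = randomDiag(k, rnd), d3 = randomDiag(m, rnd);
        double[][] ap = new double[n][k], bp = new double[k][m];
        for (int i = 0; i < n; i++)                       // A' = D1 A D2
            for (int j = 0; j < k; j++) ap[i][j] = d1[i] * a[i][j] * d2[j];
        for (int i = 0; i < k; i++)                       // B' = D2^{-1} B D3
            for (int j = 0; j < m; j++) bp[i][j] = b[i][j] / d2[i] * d3[j];
        double[][] cp = multiply(ap, bp);                 // server's O(n^3) work
        for (int i = 0; i < n; i++)                       // recover C = D1^{-1} C' D3^{-1}
            for (int j = 0; j < m; j++) cp[i][j] = cp[i][j] / d1[i] / d3[j];
        return cp;
    }

    static double[] randomDiag(int n, Random rnd) {
        double[] d = new double[n];
        for (int i = 0; i < n; i++) d[i] = 1.0 + rnd.nextDouble(); // nonzero entries
        return d;
    }
}
```

Real schemes add permutations and formal privacy arguments on top of scaling, but the asymmetric division of work (cheap disguise for the client, heavy computation for the cloud) is the core of the outsourcing idea.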
Protection of Big Data Privacy.
ABSTRACT : In recent years, big data have become a hot research topic. The increasing
amount of big data also increases the chance of breaching the privacy of individuals. Since
big data require high computational power and large storage, distributed systems are used.
As multiple parties are involved in these systems, the
risk of privacy violation is increased. There have been a number of privacy-preserving
mechanisms developed for privacy protection at different stages (e.g., data generation, data
storage, and data processing) of a big data life cycle. The goal of this paper is to provide a
comprehensive overview of the privacy preservation
mechanisms in big data and present the challenges for existing mechanisms. In particular,
in this paper, we illustrate the infrastructure of big data and the state-of-the-art privacy-
preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the
challenges and future research directions related to privacy preservation in big data.
Towards a Virtual Domain Based Authentication on
MapReduce.
ABSTRACT : This paper has proposed a novel authentication solution for the MapReduce
(MR) model, a new distributed and parallel computing paradigm commonly deployed to
process big data by major IT players, such as Facebook and Yahoo. It identifies a set of
security, performance, and scalability requirements that are
specified from a comprehensive study of a job execution process using MR and of security
threats and attacks in this environment. Based on the requirements, it critically analyzes the
state-of-the-art authentication solutions, discovering that the authentication services
currently proposed for the MR model are not adequate.
This paper then presents a novel layered authentication solution for the MR model and
describes the core components of this solution, which includes the virtual domain based
authentication framework (VDAF). These novel ideas are significant because, first, the
approach embeds the characteristics of MR-in-cloud deployments into security solution
designs, and this will allow the MR model to be delivered as software as a service in a public
cloud environment along with our proposed authentication solution; second, VDAF supports
the authentication of every interaction by any MR component involved in a job execution
flow, so long as the interactions are for accessing resources of the job; third, this continuous
authentication service is provided in such a manner that the costs incurred in providing it
are as low as possible.
Predicting Instructor Performance Using Data Mining
Techniques in Higher Education.
ABSTRACT : Data mining applications are becoming a more common tool in understanding
and solving educational and administrative problems in higher education. In general,
research in educational mining focuses on modeling students' performance instead of
instructors' performance. One of the common tools to evaluate instructors' performance is
the course evaluation questionnaire to evaluate based on students' perception. In this
paper, four different classification techniques - decision tree algorithms, support vector
machines, artificial neural networks, and discriminant analysis - are used to build classifier
models. Their performances are compared over a data set composed of responses of students
to a real course evaluation questionnaire using accuracy, precision, recall, and specificity
performance metrics. Although all the classifier models show comparably high classification
performances, the C5.0 classifier is the best with respect to accuracy, precision, and
specificity. In addition, an analysis of the variable importance for each classifier model is done.
Accordingly, it is shown that many of the questions in the course evaluation questionnaire
appear to be irrelevant. Furthermore, the analysis shows that the instructors' success based
on the students' perception mainly depends on the interest of the students in the course.
The findings of this paper indicate the effectiveness and expressiveness of data mining
models in course evaluation and higher education mining. Moreover, these findings may be
used to improve the measurement instruments.
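The four evaluation metrics used above to compare the classifiers are all derived from a binary confusion matrix; a minimal sketch of the standard definitions:

```java
// Standard binary-classification metrics from confusion-matrix counts:
// true positives (tp), false positives (fp), true negatives (tn),
// false negatives (fn).
class ClassifierMetrics {
    final double accuracy, precision, recall, specificity;

    ClassifierMetrics(int tp, int fp, int tn, int fn) {
        accuracy    = (double) (tp + tn) / (tp + fp + tn + fn);
        precision   = (double) tp / (tp + fp);
        recall      = (double) tp / (tp + fn);   // also called sensitivity
        specificity = (double) tn / (tn + fp);
    }
}
```

For a multiclass questionnaire label, these would be computed per class (one-vs-rest) and averaged, which is the usual way such comparisons are reported.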
Intra- and Inter-Fractional Variation Prediction of Lung
Tumors Using Fuzzy Deep Learning.
ABSTRACT : Tumor movements should be accurately predicted to improve delivery
accuracy and reduce unnecessary radiation exposure to healthy tissue during radiotherapy.
The tumor movements pertaining to respiration are divided into intra-fractional variation
occurring in a single treatment session and inter- fractional variation arising between
different sessions. Most studies of patients' respiration movements deal with intra-fractional
variation. Previous studies on inter-fractional variation are rarely formulated mathematically and cannot
predict movements well due to inconstant variation. Moreover, the computation time of the
prediction should be reduced. To overcome these limitations, we propose a new predictor
for intra- and inter-fractional data variation, called intra- and inter-fraction fuzzy deep
learning (IIFDL), where FDL, equipped with breathing clustering, predicts the movement
accurately and decreases the computation time. Through the experimental results, we
validated that the IIFDL improved root-mean-square error (RMSE) by 29.98% and
prediction overshoot by 70.93%, compared with existing methods. The results also showed
that the IIFDL enhanced the average RMSE and overshoot by 59.73% and 83.27%,
respectively. In addition, the average computation time of IIFDL was 1.54 ms for both intra-
and inter-fractional variation, which was much smaller than the existing methods. Therefore,
the proposed IIFDL might achieve real-time estimation as well as better tracking techniques
in radiotherapy.
Web Service Personalized Quality of Service
Prediction via Reputation-Based Matrix Factorization.
Abstract—With the fast development of Web services in service oriented systems, the
requirement of efficient Quality of Service (QoS) evaluation methods becomes strong.
However, many QoS values are unknown in reality. Therefore, it is necessary to predict the
unknown QoS values of Web services based on the obtainable QoS values. Generally, the
QoS values of similar users are employed to make predictions for the current user.
However, the QoS values may be contributed from unreliable users, leading to inaccuracy
of the prediction results. To address this problem, we present a highly credible approach,
called reputation-based Matrix Factorization (RMF), for predicting the unknown Web service
QoS values. RMF first calculates the reputation of each user based on their contributed
QoS values to quantify the credibility of users, and then takes the users' reputation into
consideration for achieving more accurate QoS prediction. Reputation-based
matrix factorization is applicable to the prediction of QoS data in the presence of unreliable
user-provided QoS values. Extensive experiments are conducted with real-world Web
service QoS data sets, and the experimental results show that our proposed approach
outperforms other existing approaches.
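The idea of reputation-weighted matrix factorization can be sketched as below: each observed QoS value is approximated by a dot product of latent user and service vectors, and each user's gradient contribution is scaled by a reputation weight in [0, 1] so that unreliable users influence the model less. The hyperparameters and this particular weighting scheme are illustrative assumptions, not the exact RMF formulation.

```java
import java.util.Random;

// Sketch of reputation-weighted matrix factorization via SGD.
// Unknown QoS entries are marked NaN and skipped during training.
class ReputationMF {
    static double[][] factorize(double[][] q, double[] reputation,
                                int k, int epochs, double lr, double reg) {
        int users = q.length, services = q[0].length;
        Random rnd = new Random(1);
        double[][] u = randomMatrix(users, k, rnd), s = randomMatrix(services, k, rnd);
        for (int ep = 0; ep < epochs; ep++) {
            for (int i = 0; i < users; i++) {
                for (int j = 0; j < services; j++) {
                    if (Double.isNaN(q[i][j])) continue;   // unknown QoS value
                    double err = q[i][j] - dot(u[i], s[j]);
                    double w = reputation[i];              // credibility weight
                    for (int f = 0; f < k; f++) {
                        double uf = u[i][f], sf = s[j][f];
                        u[i][f] += lr * (w * err * sf - reg * uf);
                        s[j][f] += lr * (w * err * uf - reg * sf);
                    }
                }
            }
        }
        // Completed matrix of predicted QoS values, including unknown entries.
        double[][] pred = new double[users][services];
        for (int i = 0; i < users; i++)
            for (int j = 0; j < services; j++) pred[i][j] = dot(u[i], s[j]);
        return pred;
    }

    static double dot(double[] a, double[] b) {
        double d = 0;
        for (int f = 0; f < a.length; f++) d += a[f] * b[f];
        return d;
    }

    static double[][] randomMatrix(int r, int c, Random rnd) {
        double[][] m = new double[r][c];
        for (int i = 0; i < r; i++)
            for (int j = 0; j < c; j++) m[i][j] = 0.1 * rnd.nextDouble();
        return m;
    }
}
```

Setting a user's reputation near zero effectively removes that user's reported values from the fit, which is how the approach stays accurate in the presence of unreliable contributors.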
A Supermodularity-Based Differential Privacy Preserving
Algorithm for Data Anonymization.
Maximizing data usage and minimizing privacy risk are two conflicting goals. Organizations
always apply a set of transformations on their data before releasing it. While determining
the best set of transformations has been the focus of extensive work in the database
community, most of this work suffered from one or both of the following major problems:
scalability and privacy guarantee. Differential Privacy provides a theoretical formulation for
privacy that ensures that the system essentially behaves the same way regardless of
whether any individual is included in the database. In this paper, we address both scalability
and privacy risk of data anonymization. We propose a scalable algorithm that meets
differential privacy when applying a specific random sampling. The contribution of the paper
is two-fold: 1) we propose a personalized anonymization technique based on an aggregate
formulation and prove that it can be implemented in polynomial time; and 2) we show that
combining the proposed aggregate formulation with specific sampling gives an
anonymization algorithm that satisfies differential privacy. Our results rely heavily on
exploring the supermodularity properties of the risk function, which allow us to employ
techniques from convex optimization. Through experimental studies we compare our
proposed algorithm with other anonymization schemes in terms of both time and privacy
risk.
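The differential-privacy guarantee referred to above is most commonly achieved by adding calibrated noise to query answers. The sketch below shows the standard Laplace mechanism for a count query (sensitivity 1); it illustrates the definition the paper builds on, not the paper's sampling-based anonymization algorithm itself.

```java
import java.util.Random;

// Laplace mechanism for a count query: add Laplace(0, sensitivity/epsilon)
// noise to the true count, giving epsilon-differential privacy.
class LaplaceMechanism {
    static double noisyCount(long trueCount, double epsilon, Random rnd) {
        double scale = 1.0 / epsilon;            // sensitivity 1 for counts
        // Sample Laplace(0, scale) by inverse transform sampling.
        double u = rnd.nextDouble() - 0.5;
        double noise = -scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
        return trueCount + noise;
    }
}
```

A smaller epsilon means larger noise and stronger privacy; the noise is unbiased, so averages over many queries still track the true counts.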
A Data-Mining Model for Protection of FACTS-Based
Transmission Line.
Synopsis:
This paper presents a data-mining model for fault-zone identification of a flexible ac
transmission systems (FACTS)-based transmission line including a thyristor-controlled
series compensator (TCSC) and unified power-flow controller (UPFC), using ensemble
decision trees. Given the randomness in the ensemble of decision trees stacked inside the
random forests model, it provides effective decisions on fault-zone identification. Half-cycle
post-fault current and voltage samples from the fault inception are used as an input vector
against target output "1" for a fault after the TCSC/UPFC and "-1" for a fault before the
TCSC/UPFC for fault-zone identification. The algorithm is tested on simulated fault data
with wide variations in operating parameters of the power system network, including noisy
environment providing a reliability measure of 99% with faster response time (3/4th cycle
from fault inception). The results of the presented approach using the RF model indicate
reliable identification of the fault zone in FACTS-based transmission lines.
A Temporal Pattern Search Algorithm for Personal History
Event Visualization.
Synopsis:
We present Temporal Pattern Search (TPS), a novel algorithm for searching for temporal
patterns of events in personal histories. The traditional method of searching for
such patterns uses an automaton-based approach over a single array of events, sorted by
time stamps. Instead, TPS operates on a set of arrays, where each array contains all events
of the same type, sorted by time stamps. TPS searches for a particular item in the pattern
using a binary search over the appropriate arrays. Although binary search is considerably
more expensive per item, it allows TPS to skip many unnecessary events in personal
histories. We show that TPS's running time is bounded by O(m^2 n lg(n)), where m is the
length (number of events) of a search pattern, and n is the number of events in a record
(history). Although the asymptotic running time of TPS is inferior to that of a
nondeterministic finite automaton (NFA) approach (O(mn)), TPS performs better than NFA
under our experimental conditions. We also show TPS is very competitive with Shift-And, a
bit-parallel approach, with real data. Since the experimental conditions we describe here
subsume the conditions under which analysts would typically use TPS (i.e., within an
interactive visualization program), we argue that TPS is an appropriate design choice for us.
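The core TPS primitive described above can be sketched as follows: events are grouped into one sorted array per event type, and the next occurrence of a type at or after a given time is found by binary search instead of scanning a single merged timeline. The method names here are illustrative, not the paper's API.

```java
import java.util.*;

// TPS-style search: one sorted timestamp array per event type, with binary
// search used to skip ahead to the next event of the required type.
class TemporalPatternSearch {
    private final Map<String, long[]> eventsByType;   // type -> sorted timestamps

    TemporalPatternSearch(Map<String, long[]> eventsByType) {
        this.eventsByType = eventsByType;
    }

    /** Earliest timestamp of `type` at or after `from`, or -1 if none. */
    long nextOccurrence(String type, long from) {
        long[] times = eventsByType.get(type);
        if (times == null) return -1;
        int i = Arrays.binarySearch(times, from);
        if (i < 0) i = -i - 1;          // not found: use the insertion point
        return i < times.length ? times[i] : -1;
    }

    /** End time of the earliest match of the pattern starting at or after `from`, or -1. */
    long matchPattern(List<String> pattern, long from) {
        long t = from;
        for (String type : pattern) {
            t = nextOccurrence(type, t);
            if (t < 0) return -1;
            t = t + 1;                  // next item must be strictly later
        }
        return t - 1;
    }
}
```

Each binary search costs O(lg n) rather than O(1) per step, but it jumps directly over all irrelevant events, which is why TPS wins in practice on records with many event types.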
Adaptive Cluster Distance Bounding for High Dimensional
Indexing.
Synopsis:
We consider approaches for similarity search in correlated, high-dimensional data sets,
which are derived within a clustering framework. We note that indexing by "vector
approximation" (VA-File), which was proposed as a technique to combat the "Curse of
Dimensionality," employs scalar quantization, and hence necessarily ignores dependencies
across dimensions, which represents a source of suboptimality. Clustering, on the other
hand, exploits interdimensional correlations and is thus a more compact representation of
the data set. However, existing methods to prune irrelevant clusters are based on bounding
hyperspheres and/or bounding rectangles, whose lack of tightness compromises their
efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance
bound based on separating hyperplane boundaries of Voronoi clusters to complement our
cluster based index. This bound enables efficient spatial filtering, with a relatively small
preprocessing storage overhead and is applicable to Euclidean and Mahalanobis similarity
measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets,
show that our indexing method is scalable with data set size and data dimensionality and
outperforms several recently proposed indexes. Relative to the VA-File, over a wide range
of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the
same amount of sequential IO operations, by factors reaching 100X and more.
Approximate Shortest Distance Computing: A Query-
Dependent Local Landmark Scheme.
Synopsis:
Shortest distance query is a fundamental operation in large-scale networks. Many existing
methods in the literature take a landmark embedding approach, which selects a set of graph
nodes as landmarks and computes the shortest distances from each landmark to all nodes
as an embedding. To answer a shortest distance query, the precomputed distances from
the landmarks to the two query nodes are used to compute an approximate shortest
distance based on the triangle inequality. In this paper, we analyze the factors that affect
the accuracy of distance estimation in landmark embedding. In particular, we find that a
globally selected, query-independent landmark set may introduce a large relative error,
especially for nearby query nodes. To address this issue, we propose a query-dependent
local landmark scheme, which identifies a local landmark close to both query nodes and
provides more accurate distance estimation than the traditional global landmark approach.
We propose efficient local landmark indexing and retrieval techniques, which achieve low
offline indexing complexity and online query complexity. Two optimization techniques on
graph compression and graph online search are also proposed, with the goal of further
reducing index size and improving query accuracy. Furthermore, the challenge of immense
graphs whose index may not fit in memory leads us to store the embedding in a relational
database, so that a query of the local landmark scheme can be expressed with relational
operators. Effective indexing and query optimization mechanisms are designed in this
context. Our experimental results on large-scale social networks and road networks
demonstrate that the local landmark scheme reduces the shortest distance estimation error
significantly when compared with global landmark embedding and the state-of-the-art
sketch-based embedding.
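The global-landmark baseline that the local-landmark scheme improves on can be sketched as below: with precomputed shortest distances d(l, v) from each landmark l to every node v, the distance between query nodes s and t is estimated as the minimum over landmarks of d(l, s) + d(l, t), an upper bound by the triangle inequality.

```java
// Global-landmark distance estimation via the triangle inequality.
class LandmarkEstimate {
    /** distFromLandmark[l][v] = precomputed shortest distance from landmark l to node v. */
    static int estimate(int[][] distFromLandmark, int s, int t) {
        int best = Integer.MAX_VALUE;
        for (int[] d : distFromLandmark) {
            best = Math.min(best, d[s] + d[t]);
        }
        return best;
    }
}
```

On a path graph 0-1-2-3 with landmarks at the endpoints, the estimate for the nearby pair (1, 2) is 3 while the true distance is 1, which illustrates the large relative error for nearby query nodes that motivates choosing a local landmark close to both endpoints.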
A Fast Clustering-Based Feature Subset Selection Algorithm
for High Dimensional Data.
Synopsis:
Feature selection involves identifying a subset of the most useful features that produces
results compatible with those of the original entire set of features. A feature selection algorithm may
be evaluated from both the efficiency and effectiveness points of view. While the efficiency
concerns the time required to find a subset of features, the effectiveness is related to the
quality of the subset of features. Based on these criteria, a fast clustering-based feature
selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The
FAST algorithm works in two steps. In the first step, features are divided into clusters by
using graph-theoretic clustering methods. In the second step, the most representative
feature that is strongly related to target classes is selected from each cluster to form a
subset of features. Since features in different clusters are relatively independent, the clustering-
based strategy of FAST has a high probability of producing a subset of useful and
independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-
spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST
algorithm are evaluated through an empirical study. Extensive experiments are carried out
to compare FAST and several representative feature selection algorithms, namely, FCBF,
ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers,
namely, the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1,
and the rule-based RIPPER before and after feature selection. The results, on 35 publicly
available real-world high-dimensional image, microarray, and text data, demonstrate that
the FAST not only produces smaller subsets of features but also improves the
performances of the four types of classifiers.
Advance Mining of Temporal High Utility Itemset.
Synopsis:
The stock market domain is a dynamic and unpredictable environment. Traditional
techniques, such as fundamental and technical analysis can provide investors with some
tools for managing their stocks and predicting their prices. However, these techniques
cannot discover all the possible relations between stocks and thus there is a need for a
different approach that will provide a deeper kind of analysis. Data mining can be used
extensively in the financial markets and help in stock-price forecasting. Therefore, we
propose in this paper a portfolio management solution with business intelligence
characteristics. We know that the temporal high utility itemsets are the itemsets with support
larger than a pre-specified threshold in current time window of data stream. Discovery of
temporal high utility itemsets is an important process for mining interesting patterns like
association rules from data streams. We propose a novel algorithm for temporal association
mining with a utility approach, which allows us to find temporal high utility itemsets while
generating fewer candidate itemsets.
Data Leakage Detection.
Synopsis:
We study the following problem: A data distributor has given sensitive data to a set of
supposedly trusted agents (third parties). Some of the data are leaked and found in an
unauthorized place (e.g., on the web or somebody's laptop). The distributor must assess
the likelihood that the leaked data came from one or more agents, as opposed to having
been independently gathered by other means. We propose data allocation strategies
(across the agents) that improve the probability of identifying leakages. These methods do
not rely on alterations of the released data (e.g., watermarks). In some cases, we can also
inject "realistic but fake" data records to further improve our chances of detecting leakage
and identifying the guilty party.
Best Keyword Cover Search.
Synopsis:
It is common that the objects in a spatial database (e.g., restaurants/hotels) are
associated with keyword(s) to indicate their businesses/services/features. An
interesting problem known as Closest Keywords search is to query objects, called
keyword cover, which together cover a set of query keywords and have the
minimum inter-objects distance. In recent years, we observe the increasing
availability and importance of keyword rating in object evaluation for better
decision making. This motivates us to investigate a generic version of Closest
Keywords search called Best Keyword Cover which considers inter-objects distance
as well as the keyword rating of objects. The baseline algorithm is inspired by the
methods of Closest Keywords search which is based on exhaustively combining
objects from different query keywords to generate candidate keyword covers. When
the number of query keywords increases, the performance of the baseline algorithm
drops dramatically as a result of massive candidate keyword covers generated. To
attack this drawback, this work proposes a much more scalable algorithm called
keyword nearest neighbor expansion (keyword-NNE). Compared to the baseline
algorithm, keyword-NNE algorithm significantly reduces the number of candidate
keyword covers generated. The in-depth analysis and extensive experiments on real
data sets have justified the superiority of our keyword-NNE algorithm.
A Generalized Flow-Based Method for Analysis of Implicit
Relationships on Wikipedia.
Synopsis:
We focus on measuring relationships between pairs of objects in Wikipedia whose pages
can be regarded as individual objects. Two kinds of relationships between two objects exist:
in Wikipedia, an explicit relationship is represented by a single link between the two pages
for the objects, and an implicit relationship is represented by a link structure containing the
two pages. Some of the previously proposed methods for measuring relationships are
cohesion-based methods, which underestimate objects having high degrees, although such
objects could be important in constituting relationships in Wikipedia. The other methods are
inadequate for measuring implicit relationships because they use only one or two of the
following three important factors: distance, connectivity, and cocitation. We propose a new
method using a generalized maximum flow which reflects all the three factors and does not
underestimate objects having high degree. We confirm through experiments that our
method can measure the strength of a relationship more appropriately than these previously
proposed methods do. Another remarkable aspect of our method is mining elucidatory
objects, that is, objects constituting a relationship. We explain that mining elucidatory
objects would open a novel way to deeply understand a relationship.
An Exploration of Improving Collaborative Recommender
Systems via User-Item Subgroups.
Synopsis:
Collaborative filtering (CF) is one of the most successful recommendation approaches. It
typically associates a user with a group of like-minded users based on their preferences
over all the items, and recommends to the user those items enjoyed by others in the group.
However, we find that two users with similar tastes on one item subset may have totally
different tastes on another set. In other words, there exist many user-item subgroups each
consisting of a subset of items and a group of like-minded users on these items. It is more
natural to make preference predictions for a user via the correlated subgroups than the
entire user-item matrix. In this paper, to find meaningful subgroups, we formulate the
Multiclass Co-Clustering (MCoC) problem and propose an effective solution to it. Then we
propose a unified framework to extend the traditional CF algorithms by utilizing the
subgroups information for improving their top-N recommendation performance. Our
approach can be seen as an extension of traditional clustering CF models. Systematic
experiments on three real world data sets have demonstrated the effectiveness of our
proposed approach.
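A toy example of why subgroups matter (this is not the paper's MCoC formulation, which solves a multiclass co-clustering problem; it only shows that similarity computed per item subset can differ sharply from similarity over all items):

```python
def disagreement(u, v, items):
    # mean absolute rating difference between users u and v over an item subset
    return sum(abs(u[i] - v[i]) for i in items) / len(items)

alice = [5, 5, 1, 1]          # loves the movie items, dislikes the book items
bob   = [5, 4, 5, 4]          # rates almost everything highly
movies, books = [0, 1], [2, 3]

on_movies = disagreement(alice, bob, movies)   # like-minded on this subgroup
on_books  = disagreement(alice, bob, books)    # opposite tastes on this one
```

Predicting Alice's book ratings from Bob's would mislead, even though the two look similar on the movie subgroup.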
Decision Trees for Uncertain Data.
Synopsis:
Traditional decision tree classifiers work with data whose values are known and precise. We
extend such classifiers to handle data with uncertain information. Value uncertainty arises in
many applications during the data collection process. Example sources of uncertainty include
measurement/quantization errors, data staleness, and multiple repeated measurements. With
uncertainty, the value of a data item is often represented not by one single value, but by multiple
values forming a probability distribution. Rather than abstracting uncertain data by statistical
derivatives (such as mean and median), we discover that the accuracy of a decision tree
classifier can be much improved if the "complete information" of a data item (taking into account
the probability density function (pdf)) is utilized. We extend classical decision tree building
algorithms to handle data tuples with uncertain values. Extensive experiments have been
conducted which show that the resulting classifiers are more accurate than those using value
averages. Since processing pdfs is computationally more costly than processing single values
(e.g., averages), decision tree construction on uncertain data is more CPU demanding than that
for certain data. To tackle this problem, we propose a series of pruning techniques that can
greatly improve construction efficiency.
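A minimal sketch of the key ingredient, assuming each uncertain value is given as a discrete sample of its pdf: when evaluating a split, an uncertain tuple is divided fractionally between the two branches according to its probability mass, instead of being assigned wholly by its mean.

```python
def mass_below(pdf, threshold):
    # pdf: list of (value, probability) points approximating a tuple's pdf
    return sum(p for v, p in pdf if v <= threshold)

def fractional_split(tuples, threshold):
    # each uncertain tuple contributes fractional class counts to both sides,
    # weighted by the probability mass of its pdf on each side
    left, right = {}, {}
    for pdf, label in tuples:
        m = mass_below(pdf, threshold)
        left[label] = left.get(label, 0.0) + m
        right[label] = right.get(label, 0.0) + (1.0 - m)
    return left, right

data = [([(1.0, 0.5), (3.0, 0.5)], "a"),   # value is 1 or 3, equally likely
        ([(4.0, 1.0)], "b")]               # value known precisely
l, r = fractional_split(data, threshold=2.0)
```

A full implementation would compute entropy over these fractional counts to pick the best split; the paper's pruning techniques reduce how many thresholds must be examined.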
Building Confidential and Efficient Query Services in the
Cloud with RASP Data Perturbation..
Synopsis:
With the wide deployment of public cloud computing infrastructures, using clouds to host
data query services has become an appealing solution for the advantages on scalability and
cost-saving. However, some data might be so sensitive that the data owner does not want to
move to the cloud unless the data confidentiality and query privacy are guaranteed. On the
other hand, a secured query service should still provide efficient query processing and
significantly reduce the in-house workload to fully realize the benefits of cloud computing.
We propose the random space perturbation (RASP) data perturbation method to provide
secure and efficient range query and kNN query services for protected data in the cloud.
The RASP data perturbation method combines order preserving encryption, dimensionality
expansion, random noise injection, and random projection, to provide strong resilience to
attacks on the perturbed data and queries. It also preserves multidimensional ranges, which
allows existing indexing techniques to be applied to speed up range query processing. The
kNN-R algorithm is designed to work with the RASP range query algorithm to process the
kNN queries. We have carefully analyzed the attacks on data and queries under a precisely
defined threat model and realistic security assumptions. Extensive experiments have been
conducted to show the advantages of this approach on efficiency and security.
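A much-simplified sketch of the perturbation step only (RASP additionally combines order-preserving encryption with the projection; here a 1-D value is expanded with a random noise dimension and a constant, then mixed by a secret invertible matrix, and the owner recovers it with the inverse):

```python
import random

A = [[2, 0, 0], [0, 3, 0], [1, 1, 1]]               # secret invertible matrix (toy)
A_inv = [[0.5, 0, 0], [0, 1/3, 0], [-0.5, -1/3, 1]]  # its inverse

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def perturb(x):
    # dimensionality expansion: append random noise and the constant 1,
    # then mix with the secret matrix; the server only ever sees the output
    return matvec(A, [x, random.random(), 1.0])

def recover(y):
    return matvec(A_inv, y)[0]
```

The noise dimension makes repeated perturbations of the same value look different, which is part of what gives resilience against attacks on the perturbed data.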
A Methodology for Direct and Indirect Discrimination
Prevention in Data Mining.
Synopsis:
Data mining is an increasingly important technology for extracting useful knowledge hidden
in large collections of data. There are, however, negative social perceptions about data
mining, among which are potential privacy invasion and potential discrimination. The latter
consists of unfairly treating people on the basis of their membership in a specific group.
Automated data collection and data mining techniques such as classification rule mining
have paved the way to making automated decisions, like loan granting/denial, insurance
premium computation, etc. If the training data sets are biased with respect to discriminatory
(sensitive) attributes like gender, race, religion, etc., discriminatory decisions may ensue.
For this reason, anti-discrimination techniques including discrimination discovery and
prevention have been introduced in data mining. Discrimination can be either direct or
indirect. Direct discrimination occurs when decisions are made based on sensitive
attributes. Indirect discrimination occurs when decisions are made based on nonsensitive
attributes which are strongly correlated with biased sensitive ones. In this paper, we tackle
discrimination prevention in data mining and propose new techniques applicable for direct
or indirect discrimination prevention individually or both at the same time. We discuss how
to clean training data sets and outsourced data sets in such a way that direct and/or indirect
discriminatory decision rules are converted to legitimate (nondiscriminatory) classification
rules. We also propose new metrics to evaluate the utility of the proposed approaches and
we compare these approaches. The experimental evaluations demonstrate that the
proposed techniques are effective at removing direct and/or indirect discrimination biases in
the original data set while preserving data quality.
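One common measure from the direct-discrimination literature is the extended lift (elift) of a classification rule: how much adding the sensitive attribute to a context boosts the confidence of a negative-outcome rule. The sketch below uses illustrative data and a hand-picked rule; the paper's metrics and cleaning procedure are more elaborate.

```python
def conf(records, antecedent, consequent):
    # confidence of the rule antecedent -> consequent over dict records
    matching = [r for r in records if all(r[k] == v for k, v in antecedent.items())]
    if not matching:
        return 0.0
    hits = sum(all(r[k] == v for k, v in consequent.items()) for r in matching)
    return hits / len(matching)

def elift(records, sensitive, context, outcome):
    # extended lift: confidence of (context AND sensitive -> outcome)
    # divided by confidence of (context -> outcome)
    base = conf(records, context, outcome)
    return conf(records, {**context, **sensitive}, outcome) / base if base else float("inf")

records = (
    [{"city": "X", "gender": "F", "denied": True}] * 3
    + [{"city": "X", "gender": "M", "denied": False}] * 3
)
score = elift(records, {"gender": "F"}, {"city": "X"}, {"denied": True})
```

An elift well above 1 signals that the sensitive attribute, not the context, is driving the denials; prevention techniques then sanitize such rules.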
Anomaly Detection for Discrete Sequences: A Survey.
Synopsis:
This survey attempts to provide a comprehensive and structured overview of the existing
research for the problem of detecting anomalies in discrete/symbolic sequences. The
objective is to provide a global understanding of the sequence anomaly detection problem
and how existing techniques relate to each other. The key contribution of this survey is the
classification of the existing research into three distinct categories, based on the problem
formulation that they are trying to solve. These problem formulations are: 1) identifying
anomalous sequences with respect to a database of normal sequences; 2) identifying an
anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence
whose frequency of occurrence is anomalous. We show how each of these problem
formulations is characteristically distinct from each other and discuss their relevance in
various application domains. We review techniques from many disparate and disconnected
application domains that address each of these formulations. Within each problem
formulation, we group techniques into categories based on the nature of the underlying
algorithm. For each category, we provide a basic anomaly detection technique, and show
how the existing techniques are variants of the basic technique. This approach shows how
different techniques within a category are related or different from each other. Our
categorization reveals new variants and combinations that have not been investigated
before for anomaly detection. We also provide a discussion of relative strengths and
weaknesses of different techniques. We show how techniques developed for one problem
formulation can be adapted to solve a different formulation, thereby providing several novel
adaptations to solve the different problem formulations. We also highlight the applicability of
the techniques that handle discrete sequences to other related areas such as online
anomaly detection and time series anomaly detection.
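As a concrete instance of the first problem formulation, a basic window-based technique (in the spirit of the survey's baseline methods, not any single paper's algorithm) scores a test sequence by the fraction of its fixed-length windows never seen in the normal database:

```python
def windows(seq, k):
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def anomaly_score(normal_seqs, test_seq, k=3):
    # fraction of k-length windows of the test sequence that never
    # occur anywhere in the database of normal sequences
    seen = {w for s in normal_seqs for w in windows(s, k)}
    ws = windows(test_seq, k)
    return sum(w not in seen for w in ws) / len(ws)

normal = ["abcabcabc"]             # symbolic training sequences (toy)
ok = anomaly_score(normal, "abcab")    # all windows seen before
bad = anomaly_score(normal, "abxba")   # every window is novel
```

Most techniques in the survey's first category can be read as refinements of this scheme (smoothed frequencies, probabilistic models of the windows, and so on).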
Discovering Conditional Functional Dependencies.
Synopsis:
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs
are a recent extension of functional dependencies (FDs) by supporting patterns of
semantically related constants, and can be used as rules for cleaning relational data.
However, finding quality CFDs is an expensive process that involves intensive manual
effort. To effectively identify data cleaning rules, we develop techniques for discovering
CFDs from relations. Already hard for traditional FDs, the discovery problem is more difficult
for CFDs. Indeed, mining patterns in CFDs introduces new challenges. We provide three
methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for
mining closed item sets, and is used to discover constant CFDs, namely, CFDs with
constant patterns only. Constant CFDs are particularly important for object identification,
which is essential to data cleaning and data integration. The other two algorithms are
developed for discovering general CFDs. One algorithm, referred to as CTANE, is a
levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other,
referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for
discovering FDs. It leverages closed-item-set mining to reduce the search space. As
verified by our experimental study, CFDMiner can be multiple orders of magnitude faster
than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given
relation is large, but it does not scale well with the arity of the relation. FastCFD is far more
efficient than CTANE when the arity of the relation is large; better still, leveraging
optimization based on closed-item-set mining, FastCFD also scales well with the size of the
relation. These algorithms provide a set of cleaning-rule discovery tools for users to choose
for different applications.
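A stripped-down illustration of constant CFD discovery (CFDMiner works on frequent closed itemsets; this toy simply checks, for each constant of one attribute, whether another attribute is constant on all matching tuples):

```python
def constant_cfds(tuples, lhs_attr, rhs_attr):
    # emit ([lhs_attr = c] -> [rhs_attr = v]) whenever every tuple with
    # lhs_attr = c agrees on a single value v of rhs_attr
    cfds = []
    for c in {t[lhs_attr] for t in tuples}:
        vals = {t[rhs_attr] for t in tuples if t[lhs_attr] == c}
        if len(vals) == 1:
            cfds.append((lhs_attr, c, rhs_attr, vals.pop()))
    return cfds

rows = [{"cc": "44", "country": "UK"}, {"cc": "44", "country": "UK"},
        {"cc": "1", "country": "US"}, {"cc": "1", "country": "Canada"}]
cfds = constant_cfds(rows, "cc", "country")
```

The discovered rule (country code 44 implies country UK) can then flag the conflicting cc = 1 tuples for cleaning; no plain FD holds here, which is exactly where CFDs add value.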
Capturing Telic/Atelic Temporal Data Semantics:
Generalizing Conventional Conceptual Models.
Synopsis:
Time provides context for all our experiences, cognition, and coordinated collective action.
Prior research in linguistics, artificial intelligence, and temporal databases suggests the
need to differentiate between temporal facts with goal-related semantics (i.e., telic) from
those that are intrinsically devoid of culmination (i.e., atelic). To differentiate between telic and
atelic data semantics in conceptual database design, we propose an annotation-based
temporal conceptual model that generalizes the semantics of a conventional conceptual
model. Our temporal conceptual design approach involves: 1) capturing "what" semantics
using a conventional conceptual model; 2) employing annotations to differentiate between
telic and atelic data semantics that help capture "when" semantics; 3) specifying temporal
constraints, specifically nonsequenced semantics, in the temporal data dictionary as
metadata. Our proposed approach provides a mechanism to represent telic/atelic temporal
semantics using temporal annotations. We also show how these semantics can be formally
defined using constructs of the conventional conceptual model and axioms in first-order
logic. Via what we refer to as the "semantics of composition," i.e., semantics implied by the
interaction of annotations, we illustrate the logical consequences of representing telic/atelic
data semantics during temporal conceptual design.
A New Algorithm for Inferring User Search Goals with
Feedback Sessions.
Synopsis:
For a broad-topic and ambiguous query, different users may have different search goals
when they submit it to a search engine. The inference and analysis of user search goals
can be very useful in improving search engine relevance and user experience. In this paper,
we propose a novel approach to infer user search goals by analyzing search engine query
logs. First, we propose a framework to discover different user search goals for a query by
clustering the proposed feedback sessions. Feedback sessions are constructed from user
click-through logs and can efficiently reflect the information needs of users. Second, we
propose a novel approach to generate pseudo-documents to better represent the feedback
sessions for clustering. Finally, we propose a new criterion, "Classified Average Precision
(CAP)", to evaluate the performance of inferring user search goals. Experimental results are
presented using user click-through logs from a commercial search engine to validate the
effectiveness of our proposed methods.
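A toy sketch of the clustering step, assuming each feedback session has already been turned into a pseudo-document (a bag of terms from the clicked results); the greedy single-pass grouping below stands in for the paper's actual clustering:

```python
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return num / (na * nb) if na and nb else 0.0

def cluster_sessions(docs, threshold=0.5):
    # greedy single-pass clustering; the first member of each cluster
    # serves as its representative (a deliberate simplification)
    clusters = []
    for doc in docs:
        for cluster in clusters:
            if cosine(doc, cluster[0]) >= threshold:
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters

# hypothetical pseudo-documents for the ambiguous query "apple"
s1 = Counter("apple fruit pie".split())
s2 = Counter("fruit apple recipe".split())
s3 = Counter("apple iphone mac".split())
s4 = Counter("iphone apple store".split())
clusters = cluster_sessions([s1, s2, s3, s4])
```

The two resulting clusters correspond to two distinct search goals (the fruit vs. the company) behind the same query.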
Automatic Discovery of Association Orders between Name
and Aliases from the Web using Anchor Texts-based Co-
occurrences.
Synopsis:
Many celebrities and experts from various fields are referred to not only by their personal
names but also by aliases on the web. Aliases are very important in information retrieval for
retrieving complete information about a personal name from the web, as some of the
person's web pages may be referred to by his or her aliases. The aliases for a personal
name are extracted by a previously proposed alias extraction method. In information retrieval,
the web search engine automatically expands a search query on a person's name by
tagging his or her aliases for complete information retrieval, thereby improving recall in the
relation detection task and achieving a significant mean reciprocal rank (MRR) for the search engine.
To further improve recall and MRR over the previously proposed methods, our method
orders the aliases based on their associations with the name, using anchor texts-based
co-occurrences between the name and its aliases, so that the search engine can tag the
aliases according to the order of association.
The association orders will automatically be discovered by creating an anchor texts-based
co-occurrence graph between name and aliases. Ranking support vector machine (SVM)
will be used to create connections between name and aliases in the graph by performing
ranking on anchor texts-based co-occurrence measures. The hop distances between nodes
in the graph will yield the associations between the name and its aliases. The hop
distances will be found by mining the graph. The proposed method is expected to outperform
previously proposed methods, achieving substantial gains in recall and MRR.
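The hop-distance step can be sketched as a breadth-first search over the co-occurrence graph; the graph below is a hand-made toy, not output of the ranking SVM stage:

```python
from collections import deque

def hop_distances(graph, name):
    # BFS over the anchor-text co-occurrence graph from the name node
    dist = {name: 0}
    q = deque([name])
    while q:
        u = q.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# hypothetical co-occurrence graph for a name and two of its aliases
graph = {
    "hideki matsui": ["godzilla"],
    "godzilla": ["hideki matsui", "number 55"],
    "number 55": ["godzilla"],
}
d = hop_distances(graph, "hideki matsui")
ordered = sorted(["godzilla", "number 55"], key=d.get)
```

Aliases closer to the name in the graph are tagged first when the query is expanded.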
Effective Navigation of Query Results Based on Concept
Hierarchies.
Synopsis:
Search queries on biomedical databases, such as PubMed, often return a large number of
results, only a small subset of which is relevant to the user. Ranking and categorization,
which can also be combined, have been proposed to alleviate this information overload
problem. Results categorization for biomedical databases is the focus of this work. A natural
way to organize biomedical citations is according to their MeSH annotations. MeSH is a
comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav
system, a novel search interface that enables the user to navigate a large number of query
results by organizing them using the MeSH concept hierarchy. First, the query results are
organized into a navigation tree. At each node expansion step, BioNav reveals only a small
subset of the concept nodes, selected such that the expected user navigation cost is
minimized. In contrast, previous works expand the hierarchy in a predefined static manner,
without navigation cost modeling. We show that the problem of selecting the best concepts
to reveal at each node expansion is NP-complete and propose an efficient heuristic as well
as a feasible optimal algorithm for relatively small trees. We show experimentally that
BioNav outperforms state-of-the-art categorization systems by up to an order of magnitude,
with respect to the user navigation cost. BioNav for the MEDLINE database is available at.
Dealing With Concept Drifts in Process Mining.
Synopsis:
Although most business processes change over time, contemporary process mining
techniques tend to analyze these processes as if they are in a steady state. Processes may
change suddenly or gradually. The drift may be periodic (e.g., because of seasonal
influences) or one-of-a-kind (e.g., the effects of new legislation). For the process
management, it is crucial to discover and understand such concept drifts in processes. This
paper presents a generic framework and specific techniques to detect when a process
changes and to localize the parts of the process that have changed. Different features are
proposed to characterize relationships among activities. These features are used to
discover differences between successive populations. The approach has been implemented
as a plug-in of the ProM process mining framework and has been evaluated using both
simulated event data exhibiting controlled concept drifts and real-life event data from a
Dutch municipality.
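A minimal sketch of the detection idea, assuming each trace is a sequence of activities and the feature characterizing a population is the relative frequency of directly-follows pairs; a drift is flagged when the distance between successive windows spikes (the paper's framework supports richer features and statistical tests):

```python
from collections import Counter

def follows_freqs(traces):
    # relative frequencies of directly-follows activity pairs in a window
    c = Counter((a, b) for t in traces for a, b in zip(t, t[1:]))
    total = sum(c.values())
    return {k: v / total for k, v in c.items()}

def drift_score(window1, window2):
    # total variation distance between the two populations' feature vectors
    f1, f2 = follows_freqs(window1), follows_freqs(window2)
    return sum(abs(f1.get(k, 0.0) - f2.get(k, 0.0)) for k in set(f1) | set(f2)) / 2

before = ["abc"] * 5   # toy traces: activity b follows a, c follows b
after  = ["acb"] * 5   # after the drift, the order of b and c is swapped
```

Because the score is computed per relation, the pairs with the largest frequency change also localize which part of the process drifted.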
A Probabilistic Approach to String Transformation.
Synopsis:
Many problems in natural language processing, data mining, information retrieval, and
bioinformatics can be formalized as string transformation, which is a task as follows. Given
an input string, the system generates the k most likely output strings corresponding to the
input string. This paper proposes a novel and probabilistic approach to string
transformation, which is both accurate and efficient. The approach includes the use of a log
linear model, a method for training the model, and an algorithm for generating the top k
candidates, whether there is or is not a predefined dictionary. The log linear model is
defined as a conditional probability distribution of an output string and a rule set for the
transformation conditioned on an input string. The learning method employs maximum
likelihood estimation for parameter estimation. The string generation algorithm based on
pruning is guaranteed to generate the optimal top k candidates. The proposed method is
applied to correction of spelling errors in queries as well as reformulation of queries in web
search. Experimental results on large scale data show that the proposed approach is very
accurate and efficient, improving upon existing methods in terms of accuracy and efficiency
in different settings.
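A brute-force stand-in for the candidate-generation step (the paper trains the rule weights by maximum likelihood and prunes the search for the optimal top k; here the log-weights are hand-set and each rule is applied at most once per candidate):

```python
import heapq

def top_k_transforms(s, rules, k=3):
    # rules: {(src, dst): log-weight}; the score of an output string is the
    # sum of the log-weights of the rules applied (log-linear sketch)
    candidates = {s: 0.0}
    for (src, dst), w in rules.items():
        for cand, score in list(candidates.items()):
            if src in cand:
                out = cand.replace(src, dst, 1)
                if score + w > candidates.get(out, float("-inf")):
                    candidates[out] = score + w
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])

rules = {("ie", "ei"): -0.1, ("e", "a"): -2.0}   # hand-set log-weights (assumed)
best = top_k_transforms("recieve", rules)
```

For a spelling-correction query like "recieve", the cheap ie-to-ei rule puts "receive" near the top while the expensive rule's outputs fall behind.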
Confucius: A Tool Supporting Collaborative Scientific
Workflow Composition.
Synopsis:
Modern scientific data management and analysis usually rely on multiple scientists with
diverse expertise. In recent years, such a collaborative effort is often structured and
automated by a data flow-oriented process called scientific workflow. However, such
workflows may have to be designed and revised among multiple scientists over a long time
period. Existing workbenches are single user-oriented and do not support scientific workflow
application development in a "collaborative fashion". In this paper, we report our research
on the enabling techniques in the aspects of collaboration provenance management and
reproducibility. Based on a scientific collaboration ontology, we propose a service-oriented
collaboration model supported by a set of composable collaboration primitives and patterns.
The collaboration protocols are then applied to support effective concurrency control in the
process of collaborative workflow composition. We also report the design and development
of Confucius, a service-oriented collaborative scientific workflow composition tool that
extends an open-source, single-user development environment.
Extended XML Tree Pattern Matching: Theories and
Algorithms.
Synopsis:
As businesses and enterprises generate and exchange XML data more often, there is an
increasing need for efficient processing of queries on XML data. Searching for the
occurrences of a tree pattern query in an XML database is a core operation in XML query
processing. Prior works demonstrate that holistic twig pattern matching algorithm is an
efficient technique to answer an XML tree pattern with parent-child (P-C) and ancestor-
descendant (A-D) relationships, as it can effectively control the size of intermediate results
during query processing. However, XML query languages (e.g., XPath and XQuery) define
more axes and functions such as negation function, order-based axis, and wildcards. In this
paper, we study a large class of XML tree patterns, called extended XML tree patterns,
which may include P-C and A-D relationships, negation functions, wildcards, and order
restrictions. We establish a theoretical framework about "matching cross" which
demonstrates the intrinsic reason behind the optimality of holistic algorithms. Based on
our theorems, we propose a set of novel algorithms to efficiently process three categories of
extended XML tree patterns. A set of experimental results on both real-life and synthetic
data sets demonstrate the effectiveness and efficiency of our proposed theories and
algorithms.
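For intuition, a naive recursive tree pattern matcher over P-C ("/") and A-D ("//") edges with a "*" wildcard can be written in a few lines; holistic twig algorithms exist precisely because this approach materializes large intermediate results on real databases:

```python
def descendants(node):
    # node is (tag, [children]); yield all proper descendants
    for child in node[1]:
        yield child
        yield from descendants(child)

def matches(node, pattern):
    # pattern: (tag, [(axis, subpattern), ...]); axis is "/" (parent-child)
    # or "//" (ancestor-descendant); tag "*" is a wildcard
    tag, children = node
    ptag, branches = pattern
    if ptag not in (tag, "*"):
        return False
    for axis, sub in branches:
        pool = children if axis == "/" else list(descendants(node))
        if not any(matches(n, sub) for n in pool):
            return False
    return True

tree = ("book", [("title", []), ("author", [("name", [])])])
twig_ok  = ("book", [("/", ("title", [])), ("//", ("name", []))])
twig_bad = ("book", [("/", ("name", []))])   # name is not a direct child
```

The extended patterns in the paper add negation and order restrictions on top of exactly these axes.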
Efficient Ranking on Entity Graphs with Personalized
Relationships.
Synopsis:
Authority flow techniques like PageRank and ObjectRank can provide personalized ranking of
typed entity-relationship graphs. There are two main ways to personalize authority flow ranking:
Node-based personalization, where authority originates from a set of user-specific nodes; edge-
based personalization, where the importance of different edge types is user-specific. We
propose the first approach to achieve efficient edge-based personalization using a combination
of precomputation and runtime algorithms. In particular, we apply our method to ObjectRank,
where a personalized weight assignment vector (WAV) assigns different weights to each edge
type or relationship type. Our approach includes a repository of rankings for various WAVs. We
consider the following two classes of approximation: (a) SchemaApprox is formulated as a
distance minimization problem at the schema level; (b) DataApprox is a distance minimization
problem at the data graph level. SchemaApprox is not robust since it does not distinguish
between important and trivial edge types based on the edge distribution in the data graph. In
contrast, DataApprox has a provable error bound. Both SchemaApprox and DataApprox are
expensive so we develop efficient heuristic implementations, ScaleRank and PickOne
respectively. Extensive experiments on the DBLP data graph show that ScaleRank provides a
fast and accurate personalized authority flow ranking.
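A toy power-iteration sketch of edge-based personalization (not the paper's precomputation scheme): each edge type's share of a node's outgoing authority is set by the weight assignment vector, so changing the WAV changes the ranking.

```python
def authority_flow(edges, wav, nodes, d=0.85, iters=100):
    # edges: list of (u, v, edge_type); wav: edge_type -> weight
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out = {n: sum(wav[t] for u, v, t in edges if u == n) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - d) / len(nodes) for n in nodes}
        for u, v, t in edges:
            if out[u]:
                nxt[v] += d * rank[u] * wav[t] / out[u]
        rank = nxt
    return rank

# hypothetical bibliographic graph
nodes = ["author", "paper1", "paper2"]
edges = [("author", "paper1", "writes"),
         ("author", "paper2", "writes"),
         ("paper1", "paper2", "cites")]
rank = authority_flow(edges, {"writes": 1.0, "cites": 1.0}, nodes)
```

With these weights, paper2 outranks paper1 because it receives authority along both edge types; ScaleRank's job is to approximate such rankings without rerunning the iteration for every user-specific WAV.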
A Survey of XML Tree Patterns.
Synopsis:
With XML becoming a ubiquitous language for data interoperability purposes in various
domains, efficiently querying XML data is a critical issue. This has lead to the design of
algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model
of XML. Tree patterns are graphic representations of queries over data trees. They are
actually matched against an input data tree to answer a query. Since the turn of the 21st
century, an astounding research effort has been focusing on tree pattern models and
matching optimization (a primordial issue). This paper is a comprehensive survey of these
topics, in which we outline and compare the various features of tree patterns. We also
review and discuss the two main families of approaches for optimizing tree pattern
matching, namely pattern tree minimization and holistic matching. We finally present actual
tree pattern-based developments, to provide a global overview of this significant research
topic.
Coupled Behavior Analysis for Capturing Coupling
Relationships in Group-based Market Manipulations.
Synopsis:
Coupled behaviors, which refer to behaviors having some relationships between them, are
usually seen in many real-world scenarios, especially in stock markets. Recently, the
coupled hidden Markov model (CHMM)-based coupled behavior analysis has been
proposed to consider the coupled relationships in a hidden state space. However, it requires
aggregation of the behavioral data to cater for the CHMM modeling, which may overlook the
couplings within the aggregated behaviors to some extent. In addition, the Markov
assumption limits its capability to capture temporal couplings. Thus, this paper proposes a
novel graph-based framework for detecting abnormal coupled behaviors. The proposed
framework represents the coupled behaviors in a graph view without aggregating the
behavioral data and is flexible to capture richer coupling information of the behaviors (not
necessarily temporal relations). On top of that, the couplings are learned via relational
learning methods and an efficient anomaly detection algorithm is proposed as well.
Experimental results on a real-world data set in stock markets show that the proposed
framework outperforms the CHMM-based one in both technical and business measures.
Group Enclosing Queries.
Synopsis:
Given a set of points P and a query set Q, a group enclosing query (Geq) fetches the point
p* ∈ P such that the maximum distance of p* to all points in Q is minimized. This problem is
equivalent to the Min-Max case (minimizing the maximum distance) of aggregate nearest
neighbor queries for spatial databases. This work first designs a new exact solution by
exploring new geometric insights, such as the minimum enclosing ball, the convex hull, and
the furthest Voronoi diagram of the query group. To further reduce the query cost, especially
when the dimensionality increases, we turn to approximation algorithms. Our main
approximation algorithm has a worst case √2-approximation ratio if one can find the exact
nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a
large number of data sets up to six dimensions. We also discuss how to extend it to higher
dimensions (up to 74 in our experiment) and show that it still maintains a very good
approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend
the √2-approximation algorithm to get a (1 + ε)-approximate solution for the Geq problem.
Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where
N and M are the sizes of the data set P and query group Q. Extensive experiments on both
synthetic and real data sets, up to 10 million points and 74 dimensions, confirm the
efficiency, effectiveness, and scalability of the proposed algorithms, especially their
significant improvement over the state-of-the-art method.
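The problem statement is easy to make concrete. Below, the exact answer is a brute-force min-max scan, and the approximation picks the nearest neighbor of the query centroid (a stand-in for the minimum enclosing ball center that underlies the paper's sqrt(2) bound):

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def geq_exact(P, Q):
    # point of P minimizing its maximum distance to all query points
    return min(P, key=lambda p: max(dist(p, q) for q in Q))

def geq_approx(P, Q):
    # nearest neighbor of the query centroid (illustrative simplification)
    c = [sum(xs) / len(Q) for xs in zip(*Q)]
    return min(P, key=lambda p: dist(p, c))

P = [(0, 0), (5, 5), (10, 0)]
Q = [(4, 4), (6, 6)]
```

The brute-force scan costs O(|P| x |Q|) per query; the paper's geometric machinery exists to avoid exactly that.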
Facilitating Document Annotation using Content and
Querying Value.
Synopsis:
A large number of organizations today generate and share textual descriptions of their
products, services, and actions. Such collections of textual data contain a significant amount
of structured information, which remains buried in the unstructured text. While information
extraction algorithms facilitate the extraction of structured relations, they are often
expensive and inaccurate, especially when operating on top of text that does not contain
any instances of the targeted structured information. We present a novel alternative
approach that facilitates the generation of structured metadata by identifying documents
that are likely to contain information of interest, information that will subsequently be
useful for querying the database. Our approach relies on the idea that
humans are more likely to add the necessary metadata during creation time, if prompted by
the interface; or that it is much easier for humans (and/or algorithms) to identify the
metadata when such information actually exists in the document, instead of naively
prompting users to fill in forms with information that is not available in the document. As a
major contribution of this paper, we present algorithms that identify structured attributes that
are likely to appear within the document, by jointly utilizing the content of the text and the
query workload. Our experimental evaluation shows that our approach generates superior
results compared to approaches that rely only on the textual content or only on the query
workload, to identify attributes of interest.
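A highly simplified scoring sketch (the attribute names, values, and workload frequencies below are hypothetical; the paper's joint model is richer): an attribute is worth prompting for when the text contains evidence for it and the query workload actually asks for it.

```python
def suggest_attributes(doc, attribute_values, query_freq):
    # content evidence: some known value of the attribute occurs in the text;
    # combined multiplicatively with the attribute's query-workload frequency
    scores = {}
    for attr, values in attribute_values.items():
        content = any(v.lower() in doc.lower() for v in values)
        scores[attr] = (1.0 if content else 0.0) * query_freq.get(attr, 0.0)
    return sorted(scores, key=scores.get, reverse=True)

doc = "Selling a Canon DSLR, black body, 12 megapixels"
attribute_values = {"brand": ["Canon", "Nikon"], "color": ["black", "red"],
                    "os": ["Android", "iOS"]}
query_freq = {"brand": 0.6, "color": 0.2, "os": 0.9}   # hypothetical workload
ranking = suggest_attributes(doc, attribute_values, query_freq)
```

Note how "os" ranks last despite its high query frequency: the document carries no evidence for it, so prompting the annotator for it would be wasted effort.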
A System to Filter Unwanted Messages from OSN User
Walls.
Synopsis:
One fundamental issue in today's Online Social Networks (OSNs) is to give users the ability
to control the messages posted on their own private space to prevent unwanted content
from being displayed. Up to now, OSNs have provided little support for this requirement. To fill the gap, in
this paper, we propose a system allowing OSN users to have a direct control on the
messages posted on their walls. This is achieved through a flexible rule-based system,
that allows users to customize the filtering criteria to be applied to their walls, and a Machine
Learning-based soft classifier automatically labeling messages in support of content-based
filtering.
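The architecture can be sketched as rules evaluated over labels produced by the classifier; the keyword lookup below is only a stand-in for the paper's ML soft classifier, and the rule and labels are hypothetical:

```python
def filter_message(msg, rules, classify):
    # rules: list of (predicate, action); the classifier supplies content
    # labels that each user-defined predicate can test against
    labels = classify(msg)
    for predicate, action in rules:
        if predicate(msg, labels):
            return action
    return "show"

# stand-in for the ML soft classifier: a trivial keyword lookup
def classify(msg):
    return {"vulgar"} if "damn" in msg.lower() else set()

rules = [(lambda msg, labels: "vulgar" in labels, "block")]
```

Because rules and classifier are decoupled, users can tighten or relax filtering without retraining anything.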
Creating Evolving User Behaviour Profiles Automatically.
Synopsis:
Knowledge about computer users is very beneficial for assisting them, predicting their future
actions or detecting masqueraders. In this paper, a new approach for automatically creating
and recognizing the behavior profile of a computer user is presented. In this case,
a computer user's behavior is represented as the sequence of commands she/he types
during her/his work. This sequence is transformed into a distribution of relevant
subsequences of commands in order to find out a profile that defines its behavior. Also,
because a user profile is not necessarily fixed but rather it evolves/changes, we propose an
evolving method to keep up to date the created profiles using an Evolving Systems
approach. In this paper, we combine the evolving classifier with a trie-based user profiling to
obtain a powerful self-learning online scheme. We also develop further the recursive
formula of the potential of a data point to become a cluster center using cosine distance,
which is provided in the Appendix. The novel approach proposed in this paper can be
applicable to any problem of dynamic/evolving user behavior modeling where it can be
represented as a sequence of actions or events. It has been evaluated on several real data
streams.
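A toy version of the profiling idea, assuming a profile is simply the distribution of command bigrams and comparisons use the cosine distance mentioned above (the paper's trie-based evolving classifier is considerably more sophisticated):

```python
from collections import Counter

def profile(commands, n=2):
    # relative frequencies of n-gram command subsequences
    grams = [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def cosine_distance(p, q):
    num = sum(p[g] * q.get(g, 0.0) for g in p)
    norm_p = sum(v * v for v in p.values()) ** 0.5
    norm_q = sum(v * v for v in q.values()) ** 0.5
    return 1.0 - (num / (norm_p * norm_q) if norm_p and norm_q else 0.0)

p1 = profile(["ls", "cd", "ls", "cd"])       # the user's stored profile
p2 = profile(["cd", "ls", "cd", "ls"])       # the same user, a later session
p3 = profile(["rm", "wget", "rm", "wget"])   # a masquerader's session
```

A session whose distance to the stored profile is small updates the profile (evolution); a large distance raises a masquerader alarm.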
Load Shedding in Mobile Systems with MobiQual.
Synopsis:
In location-based, mobile continual query (CQ) systems, two key measures of quality-of-
service (QoS) are: freshness and accuracy. To achieve freshness, the CQ server must
perform frequent query reevaluations. To attain accuracy, the CQ server must receive and
process frequent position updates from the mobile nodes. However, it is often difficult to
obtain fresh and accurate CQ results simultaneously, due to 1) limited resources in
computing and communication and 2) fast-changing load conditions caused by continuous
mobile node movement. Hence, a key challenge for a mobile CQ system is: How do we
achieve the highest possible quality of the CQ results, in both freshness and accuracy, with
currently available resources? In this paper, we formulate this problem as a load-shedding
one and develop MobiQual, a QoS-aware approach that performs both update load
shedding and query load shedding. The design of MobiQual highlights three important
features. 1) Differentiated load shedding: We apply different amounts of query load
shedding and update load shedding to different groups of queries and mobile nodes,
respectively. 2) Per-query QoS specification: Individualized QoS specifications are used to
maximize the overall freshness and accuracy of the query results. 3) Low-cost adaptation:
MobiQual dynamically adapts, with a minimal overhead, to changing load conditions and
available resources. We conduct a set of comprehensive experiments to evaluate the
effectiveness of MobiQual. The results show that, through a careful combination of update
and query load shedding, the MobiQual approach leads to much higher freshness and
accuracy in the query results in all cases, compared to existing approaches that lack the
QoS-awareness properties of MobiQual, as well as the solutions that perform query-only or
update-only load shedding.
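The differentiated load-shedding idea can be sketched as follows. This is a hypothetical weight-proportional allocation, not MobiQual's actual adaptation algorithm; the group names, rates, and QoS weights are invented for illustration.

```python
def shed_load(groups, capacity):
    """Differentiated load-shedding sketch: when total demand exceeds
    server capacity, grant each group a share of capacity proportional
    to (requested_rate * qos_weight), so low-priority groups shed more.

    groups: list of (name, requested_rate, qos_weight) tuples.
    Returns {name: granted_rate}.
    """
    demand = sum(r for _, r, _ in groups)
    if demand <= capacity:
        # No shedding needed: every group gets its full requested rate.
        return {name: r for name, r, _ in groups}
    weighted = sum(r * w for _, r, w in groups)
    granted = {}
    for name, r, w in groups:
        share = capacity * (r * w) / weighted
        granted[name] = min(r, share)  # never grant more than requested
    return granted

# Fast-moving nodes get 3x the QoS weight of slow-moving ones.
g = shed_load([("fast_movers", 100, 3), ("slow_movers", 100, 1)], 100)
```

With these numbers the fast-moving group keeps 75 updates/s and the slow-moving group 25, mirroring the idea that shedding should hit queries and nodes unevenly, according to their per-group QoS specifications.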
Fast Nearest Neighbor Search with Keywords.
Synopsis:
Conventional spatial queries, such as range search and nearest neighbor retrieval, involve
only conditions on objects' geometric properties. Today, many modern applications call for
novel forms of queries that aim to find objects satisfying both a spatial predicate, and a
predicate on their associated texts. For example, instead of considering all the restaurants,
a nearest neighbor query would instead ask for the restaurant that is the closest among
those whose menus contain "steak, spaghetti, brandy" all at the same time. Currently, the
best solution to such queries is based on the IR 2-tree, which, as shown in this paper, has a
few deficiencies that seriously impact its efficiency. Motivated by this, we develop a new
access method called the spatial inverted index that extends the conventional inverted
index to cope with multidimensional data, and comes with algorithms that can answer
nearest neighbor queries with keywords in real time. As verified by experiments, the
proposed techniques outperform the IR 2-tree in query response time significantly, often by
a factor of orders of magnitude.
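A toy version of the keyword-aware nearest-neighbor idea is shown below, assuming a plain in-memory inverted index. The paper's spatial inverted index additionally organizes each posting list spatially so that it never scans all candidates; the restaurant names and coordinates here are invented.

```python
import math
from collections import defaultdict

class KeywordIndex:
    """Toy inverted index over spatial objects: keyword -> object ids."""

    def __init__(self):
        self.points = {}                   # id -> (x, y)
        self.postings = defaultdict(set)   # keyword -> {ids}

    def add(self, oid, xy, keywords):
        self.points[oid] = xy
        for kw in keywords:
            self.postings[kw].add(oid)

    def nearest(self, q, keywords):
        """Nearest object whose text contains ALL query keywords."""
        candidates = set.intersection(*(self.postings[k] for k in keywords))
        return min(candidates,
                   key=lambda oid: math.dist(q, self.points[oid]),
                   default=None)

idx = KeywordIndex()
idx.add("r1", (0.0, 1.0), ["steak", "brandy"])
idx.add("r2", (5.0, 5.0), ["steak", "spaghetti", "brandy"])
idx.add("r3", (0.2, 0.1), ["pizza"])
print(idx.nearest((0.0, 0.0), ["steak", "brandy"]))  # -> r1
```

Note that r3 is closer to the query point than r1, but it fails the keyword predicate, which is exactly why combining the spatial and textual conditions in one access method matters.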
Annotating Search Results from Web Databases.
Synopsis:
An increasing number of databases have become web accessible through HTML form-
based search interfaces. The data units returned from the underlying database are usually
encoded into the result pages dynamically for human browsing. For the encoded data units
to be machine processable, which is essential for many applications such as deep web data
collection and Internet comparison shopping, they need to be extracted out and assigned
meaningful labels. In this paper, we present an automatic annotation approach that first
aligns the data units on a result page into different groups such that the data in the same
group have the same semantics. Then, for each group, we annotate it from different aspects
and aggregate the different annotations to predict a final annotation label for it. An
annotation wrapper for the search site is automatically constructed and can be used to
annotate new result pages from the same web database. Our experiments indicate that the
proposed approach is highly effective.
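The alignment-and-annotation step can be illustrated with a toy sketch. The two format-based rules and the example records below are hypothetical; the paper aggregates several independent annotators and builds a reusable wrapper per search site rather than hard-coding regexes.

```python
import re

def align_units(records):
    """Align data units across result records into columns and guess a
    label for each column from simple format cues."""
    columns = list(zip(*records))  # unit i of every record -> column i
    labels = []
    for col in columns:
        if all(re.fullmatch(r"\$\d+(\.\d{2})?", u) for u in col):
            labels.append("price")   # every unit looks like $NN.NN
        elif all(re.fullmatch(r"\d{4}", u) for u in col):
            labels.append("year")    # every unit is a 4-digit number
        else:
            labels.append("text")    # fallback label
    return labels

rows = [["Gone with the Wind", "1939", "$9.99"],
        ["Casablanca", "1942", "$7.50"]]
print(align_units(rows))  # -> ['text', 'year', 'price']
```

The key point the sketch preserves is that annotation is decided per aligned group, not per individual data unit, so one noisy value cannot mislabel a whole column once more annotators vote.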
Credibility Ranking of Tweets during High Impact Events.
Synopsis:
Twitter has evolved from being a conversation or opinion sharing medium among friends
into a platform to share and disseminate information about current events. Events in the real
world create a corresponding spur of posts (tweets) on Twitter. Not all content posted on
Twitter is trustworthy or useful in providing information about the event. In this paper, we
analyzed the credibility of information in tweets corresponding to fourteen high impact news

Towards a virtual domain based authentication on mapreduce
 
Toward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architectureToward a real time framework in cloudlet-based architecture
Toward a real time framework in cloudlet-based architecture
 
Protection of big data privacy
Protection of big data privacyProtection of big data privacy
Protection of big data privacy
 
Privacy preserving and delegated access control for cloud applications
Privacy preserving and delegated access control for cloud applicationsPrivacy preserving and delegated access control for cloud applications
Privacy preserving and delegated access control for cloud applications
 
Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...
 
Frequency and similarity aware partitioning for cloud storage based on space ...
Frequency and similarity aware partitioning for cloud storage based on space ...Frequency and similarity aware partitioning for cloud storage based on space ...
Frequency and similarity aware partitioning for cloud storage based on space ...
 
Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...Multiagent multiobjective interaction game system for service provisoning veh...
Multiagent multiobjective interaction game system for service provisoning veh...
 
Efficient multicast delivery for data redundancy minimization over wireless d...
Efficient multicast delivery for data redundancy minimization over wireless d...Efficient multicast delivery for data redundancy minimization over wireless d...
Efficient multicast delivery for data redundancy minimization over wireless d...
 
Cloud assisted io t-based scada systems security- a review of the state of th...
Cloud assisted io t-based scada systems security- a review of the state of th...Cloud assisted io t-based scada systems security- a review of the state of th...
Cloud assisted io t-based scada systems security- a review of the state of th...
 
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
I-Sieve: An inline High Performance Deduplication System Used in cloud storageI-Sieve: An inline High Performance Deduplication System Used in cloud storage
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
 
Bayes based arp attack detection algorithm for cloud centers
Bayes based arp attack detection algorithm for cloud centersBayes based arp attack detection algorithm for cloud centers
Bayes based arp attack detection algorithm for cloud centers
 
Architecture harmonization between cloud radio access network and fog network
Architecture harmonization between cloud radio access network and fog networkArchitecture harmonization between cloud radio access network and fog network
Architecture harmonization between cloud radio access network and fog network
 
Analysis of classical encryption techniques in cloud computing
Analysis of classical encryption techniques in cloud computingAnalysis of classical encryption techniques in cloud computing
Analysis of classical encryption techniques in cloud computing
 
An anomalous behavior detection model in cloud computing
An anomalous behavior detection model in cloud computingAn anomalous behavior detection model in cloud computing
An anomalous behavior detection model in cloud computing
 
A tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big dataA tutorial on secure outsourcing of large scalecomputation for big data
A tutorial on secure outsourcing of large scalecomputation for big data
 
A parallel patient treatment time prediction algorithm and its applications i...
A parallel patient treatment time prediction algorithm and its applications i...A parallel patient treatment time prediction algorithm and its applications i...
A parallel patient treatment time prediction algorithm and its applications i...
 
A mobile offloading game against smart attacks
A mobile offloading game against smart attacksA mobile offloading game against smart attacks
A mobile offloading game against smart attacks
 

Data Mining for Java and .NET, 2016-17

www.redpel.com +917620593389 redpelsoftware@gmail.com
WhitePel Software Pvt Ltd
63/A, Ragvilas, Lane No. C, Koregaon Park, Pune 411001
www.whitepel.com, info@whitepel.com, whitepelpune@gmail.com

Data Mining for Java / .NET

Check the following projects, and also check for any spelling mistakes before showing them to your guide:

Use of FCM & Fuzzy Min-Max Algorithm in Lung Cancer

Abstract— Lung cancer is a disease characterized by uncontrolled cell growth in tissues of the lung and is the most common fatal malignancy in both men and women. Early detection and treatment of lung cancer can greatly improve the survival rate of patients. Artificial Neural Network (ANN), Fuzzy C-Means (FCM) and Fuzzy Min-Max Neural Network (FMNN) are useful in medical diagnosis because of several advantages: ANN offers fault tolerance, flexibility and non-linearity; FCM gives the best results for overlapping data sets, allows a data point to belong to more than one cluster center, and always converges; and FMNN offers online adaptation, non-linear separability, short training time, and both soft and hard decisions. In this work, we propose to use FCM and FMNN on standard datasets to detect lung cancer.

Systematic Prediction of Keywords over IMDB Database

ABSTRACT: Keyword queries on databases provide easy access to data, but often suffer from low ranking quality, i.e., low precision and/or recall, as shown in recent benchmarks. It would be useful to identify queries that are likely to have low ranking quality in order to improve user satisfaction. For instance, the system may suggest alternative queries to the user for such hard queries. In this paper, we analyze the characteristics of hard queries and propose a novel framework to measure the degree of difficulty of a keyword query over a database, considering both the structure and the content of the database and the query results.
We evaluate our query difficulty prediction model against two effectiveness benchmarks for popular keyword search ranking methods. Our empirical results show that our model predicts hard queries with high accuracy. Further, we present a suite of optimizations to minimize the incurred time overhead.
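As a rough intuition for this kind of predictor, a query tends to be hard when the ranker cannot separate results, i.e., when the top-k ranking scores are nearly flat. The sketch below is an illustrative heuristic only; the score-variance rule and the function name are our own invention, not the framework proposed in the paper:

```python
def query_difficulty(topk_scores):
    """Toy difficulty score for a keyword query: a flat top-k score
    list means the ranker cannot separate relevant from irrelevant
    results, so the query is treated as 'hard'."""
    if not topk_scores:
        return 1.0                      # no results at all: maximally hard
    mean = sum(topk_scores) / len(topk_scores)
    var = sum((s - mean) ** 2 for s in topk_scores) / len(topk_scores)
    return 1.0 / (1.0 + var)            # low variance -> high difficulty

easy = query_difficulty([9.1, 4.2, 1.0, 0.4])   # clearly separated scores
hard = query_difficulty([2.1, 2.0, 2.0, 1.9])   # nearly flat scores
assert hard > easy
```

A real predictor would also exploit the database structure and content, as the abstract describes; this sketch looks at result scores alone.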
Performance Evaluation and Estimation Model Using Regression Method for Hadoop WordCount

ABSTRACT: Given the rapid growth in cloud computing, it is important to analyze the performance of different Hadoop MapReduce applications and to understand the performance bottlenecks in a cloud cluster that contribute to higher or lower performance. It is also important to analyze the underlying hardware in cloud cluster servers to enable the optimization of software and hardware to achieve maximum performance. Hadoop is based on MapReduce, one of the most popular programming models for big data analysis in a parallel computing environment. In this paper, we present a detailed performance analysis, characterization, and evaluation of the Hadoop MapReduce WordCount application. We also propose an estimation model based on Amdahl's law and the regression method to estimate performance and total processing time versus different input sizes for a given processor architecture. The estimation regression model is verified to estimate performance and run time with an error margin of less than 5%.

An Efficient Privacy-Preserving Ranked Keyword Search Method

Abstract— Cloud data owners prefer to outsource documents in an encrypted form for the purpose of privacy preservation. Therefore it is essential to develop efficient and reliable ciphertext search techniques. One challenge is that the relationships between documents are normally concealed in the process of encryption, which leads to significant degradation of search accuracy. Also, the volume of data in data centers has experienced dramatic growth.
This makes it even more challenging to design ciphertext search schemes that can provide efficient and reliable online information retrieval on large volumes of encrypted data. In this paper, a hierarchical clustering method is proposed to support richer search semantics and to meet the demand for fast ciphertext search in a big data environment. The proposed hierarchical approach clusters the documents based on a minimum relevance threshold, and then partitions the resulting clusters into sub-clusters until the constraint on the maximum cluster size is reached. In the search phase, this approach achieves linear computational complexity against an exponential size increase of the document collection. To verify the authenticity of search results, a structure called the minimum hash sub-tree is designed in this paper. Experiments have been conducted using a collection set built from IEEE Xplore. The results show that with a
sharp increase of documents in the dataset, the search time of the proposed method increases linearly, whereas the search time of the traditional method increases exponentially. Furthermore, the proposed method has an advantage over the traditional method in rank privacy and the relevance of retrieved documents.

PRISM: PRivacy-aware Interest Sharing and Matching in Mobile Social Networks

Abstract— In a profile-matchmaking application of mobile social networks, users need to reveal their interests to each other in order to find common interests. A malicious user may harm another user by learning his personal information. Therefore, mutual interests need to be found in a privacy-preserving manner. In this paper, we propose an efficient privacy-protection and interest-sharing protocol referred to as PRivacy-aware Interest Sharing and Matching (PRISM). PRISM enables users to discover mutual interests without revealing their interests. Unlike existing approaches, PRISM does not require revealing the interests to a trusted server. Moreover, the protocol considers attack scenarios that have not been addressed previously and provides an efficient solution. The inherent mechanism reveals any cheating attempt by a malicious user. PRISM also provides a procedure to eliminate Sybil attacks. We analyze the security of PRISM against both passive and active attacks. Through implementation, we also present a detailed analysis of PRISM's performance and compare it with existing approaches. The results show the effectiveness of PRISM without any significant performance degradation.

Mapping Bug Reports to Relevant Files Using Instance Selection and Feature Selection

Abstract— Open source projects, for example Eclipse and Firefox, have open source bug repositories.
Users report bugs to these repositories. These users are usually non-technical and cannot assign the correct class to a bug. Triaging bugs to developers for fixing is a tedious and time-consuming task. Developers are usually experts in particular areas; for example, a few developers are experts in GUI work and others in Java functionality. Assigning a particular bug to the relevant developer saves time and helps maintain developers' interest by assigning bugs
according to their interest. However, assigning the right bug to the right developer is quite difficult for a triager without knowing the actual class the bug belongs to. In this research, we classify bugs into different labels on the basis of the bug summary. A multinomial Naïve Bayes text classifier is used for classification. For feature selection, the Chi-Square and TF-IDF algorithms were used. Using Naïve Bayes and Chi-Square, we obtain an average accuracy of 83%.

Inference of Patterns from Big Data Using Aggregation, Filtering and Tagging: A Survey

Abstract: This paper reviews various approaches to inferring patterns from Big Data using aggregation, filtering and tagging. Earlier research shows that data aggregation is concerned with the gathered data and how efficiently it can be utilized. It is understandable that at the time of data gathering one does not care much about whether the gathered data will be useful or not. Hence, filtering and tagging of the data are the crucial steps in collecting the relevant data to fulfill the need. Therefore the main goal of this paper is to present a detailed and comprehensive survey of the different approaches. To make the concepts clearer, we provide a brief introduction to Big Data and how it works, describe two data aggregation tools (namely, Flume and Sqoop) and two data processing tools (Hive and Mahout), and cover various algorithms that are useful for understanding the topic. Finally, we include comparisons between the aggregation tools, the processing tools, and the various algorithms in terms of pre-processing, matching time, results, and reviews.

Outsourced Similarity Search on Metric Data Assets

ABSTRACT: This paper considers a cloud computing setting in which similarity querying of metric data is outsourced to a service provider.
The data is to be revealed only to trusted users, not to the service provider or anyone else. Users query the server for the data objects most similar to a query example. Outsourcing offers the data owner scalability and a low initial investment. The need for privacy may be due to the data being sensitive (e.g., in medicine), valuable (e.g., in astronomy), or otherwise confidential. Given this setting, the paper presents techniques that transform the data prior to supplying it to the service provider for similarity queries on the transformed data. Our techniques provide interesting trade-offs between query cost and accuracy. They are then further extended to offer an intuitive privacy guarantee. Empirical studies with real data demonstrate that the techniques are capable of offering privacy while enabling efficient and accurate processing of similarity queries.
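One simple way to see how data can be transformed before outsourcing while still supporting similarity queries is to apply a secret isometry (a rotation plus a translation), which preserves Euclidean distances. This is only an illustrative stand-in for the paper's transformations, and the helper names are assumptions:

```python
import math

def make_isometry(angle, shift):
    """Secret key: a rotation angle plus a translation vector.
    Rotations and translations preserve Euclidean distances, so the
    server can answer nearest-neighbour queries on the transformed
    points without ever seeing the original coordinates."""
    c, s = math.cos(angle), math.sin(angle)
    def transform(p):
        x, y = p
        return (c * x - s * y + shift[0], s * x + c * y + shift[1])
    return transform

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

T = make_isometry(1.234, (40.0, -7.0))       # kept secret by the data owner
data = [(1.0, 2.0), (5.0, 1.0), (9.0, 9.0)]
outsourced = [T(p) for p in data]            # what the server stores

q = (4.0, 2.0)
# The server finds the nearest neighbour using transformed points only,
# and gets the same answer the owner would get on the plaintext data:
nearest = min(range(len(data)), key=lambda i: dist(outsourced[i], T(q)))
assert nearest == min(range(len(data)), key=lambda i: dist(data[i], q))
```

Plain isometries like this leak distances and are vulnerable to known-sample attacks, which is exactly why the paper explores trade-offs between query cost, accuracy, and the strength of the privacy guarantee.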
CCD: A Distributed Publish/Subscribe Framework for Rich Content Formats

Abstract: In this paper, we propose a content-based publish/subscribe (pub/sub) framework that delivers matching content to subscribers in their desired format. Such a framework enables the pub/sub system to accommodate richer content formats, including multimedia publications with image and video content. In our proposed framework, users (consumers), in addition to specifying their information needs (subscription queries), also specify a profile describing their receiving context, including characteristics of the device used to receive the content (e.g., the resolution of a PDA used by a consumer). The pub/sub system, besides being responsible for matching and routing the published content, also becomes responsible for converting the content into the suitable format for each user. Content conversion is achieved through a set of content adaptation operators (e.g., image transcoder, document translator, etc.). We study algorithms for the placement of such operators in a heterogeneous pub/sub broker overlay in order to minimize communication and computation resource consumption. Our experimental results show that careful placement of operators in the pub/sub overlay network results in significant cost reduction.

Measuring the Sky: On Computing Data Cubes via Skylining the Measures

ABSTRACT: A data cube is a key element in supporting fast OLAP. Traditionally, an aggregate function is used to compute the values in data cubes. In this paper, we extend the notion of data cubes with a new perspective.
Instead of using an aggregate function, we propose to build data cubes using the skyline operation as the "aggregate function." Data cubes built in this way are called "group-by skyline cubes" and can support a variety of analytical tasks. Nevertheless, there are several challenges in implementing group-by skyline cubes in data warehouses: 1) the skyline operation is computationally intensive, 2) the skyline operation is holistic, and 3) a group-by skyline cube contains both grouping and skyline dimensions, rendering it infeasible to pre-compute all cuboids in advance. This paper gives details on how to store, materialize, and query such cubes.
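The skyline operation used as the "aggregate function" above keeps exactly the points that no other point dominates. A minimal sketch of that operation, minimizing every dimension, with an invented (price, distance) hotel example:

```python
def skyline(points):
    """Return the skyline (Pareto-optimal set) of points where smaller
    is better in every dimension. A point survives unless some other
    point dominates it: <= in all dimensions and < in at least one."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

hotels = [(50, 4), (60, 1), (70, 3), (40, 9), (45, 7)]  # (price, distance)
# (70, 3) is dominated by (60, 1): cheaper AND closer, so it drops out.
assert sorted(skyline(hotels)) == [(40, 9), (45, 7), (50, 4), (60, 1)]
```

This quadratic scan also illustrates why the operation is computationally intensive, and its holistic nature: removing or adding one point can change which other points survive, so skylines of sub-groups cannot simply be aggregated.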
Finding Frequently Occurring Itemset Pairs in Big Data

Abstract— Frequent Itemset Mining (FIM) is one of the best-known techniques to extract knowledge from data. The combinatorial explosion of FIM methods becomes even more problematic when they are applied to Big Data. Fortunately, recent improvements in the field of parallel programming already provide good tools to tackle this problem. However, these tools come with their own technical challenges, e.g., balanced data distribution and inter-communication costs. In this paper, we investigate the applicability of FIM techniques on the MapReduce platform. We introduce two new methods for mining large datasets: Dist-Eclat focuses on speed, while BigFIM is optimized to run on very large datasets. In our experiments we show the scalability of our methods.

Mining Social Media for Understanding Students' Learning Experiences

Abstract— Students' informal conversations on social media (e.g., Twitter, Facebook) shed light on their educational experiences: opinions, feelings, and concerns about the learning process. Data from such uninstrumented environments can provide valuable knowledge to inform student learning. Analyzing such data, however, can be challenging. The complexity of students' experiences reflected in social media content requires human interpretation. However, the growing scale of data demands automatic data analysis techniques. In this paper, we developed a workflow to integrate both qualitative analysis and large-scale data mining techniques. We focused on engineering students' Twitter posts to understand issues and problems in their educational experiences. We first conducted a qualitative analysis on samples taken from about 25,000 tweets related to engineering students' college life.
We found that engineering students encounter problems such as a heavy study load, lack of social engagement, and sleep deprivation. Based on these results, we implemented a multi-label classification algorithm to classify tweets reflecting students' problems. We then used the algorithm to train a detector of student problems from about 35,000 tweets streamed at the geo-location of Purdue University. This work, for the first time, presents a methodology and results that show how informal social media data can provide insights into students' experiences.

Private Search and Content-Protecting Location-Based Queries on Google Maps
ABSTRACT: In this paper we present a solution to one of the location-based query problems. This problem is defined as follows: (i) a user wants to query a database of location data, known as Points Of Interest (POIs), and does not want to reveal his/her location to the server due to privacy concerns; (ii) the owner of the location data, that is, the location server, does not want to simply distribute its data to all users. The location server desires to have some control over its data, since the data is its asset. We propose a major enhancement upon previous solutions by introducing a two-stage approach, where the first step is based on Oblivious Transfer and the second step is based on Private Information Retrieval, to achieve a secure solution for both parties. The solution we present is efficient and practical in many scenarios. We implement our solution on a desktop machine and a mobile device to assess the efficiency of our protocol. We also introduce a security model and analyse the security in the context of our protocol. Finally, we highlight a security weakness of our previous work and present a solution to overcome it.

ClustBigFIM: Frequent Itemset Mining of Big Data Using Pre-Processing Based on the MapReduce Framework

ABSTRACT: Nowadays, an enormous amount of data is being generated through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities; this data is termed Big Data, with its own characteristics and challenges. Frequent Itemset Mining algorithms aim to discover frequent itemsets from a transactional database, but as dataset size increases, the task cannot be handled by traditional frequent itemset mining.
The MapReduce programming model solves the problem of large datasets, but it has a large communication cost which reduces execution efficiency. We propose a new pre-processing technique, based on k-means, applied to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: k-means clustering to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing step.
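The ClustBigFIM pipeline described above (partition the transactions first, then mine each cluster) can be sketched in miniature. This is a single-machine toy in Python, not the paper's MapReduce implementation; the function names, the use of 1-D k-means on transaction length as the clustering key, and all parameters are our own illustrative assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations

def apriori(transactions, min_support):
    """Classic level-wise Apriori: return {frozenset itemset: support count}."""
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    freq = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(freq)
    k = 2
    while freq:
        # candidate k-itemsets from unions of frequent (k-1)-itemsets
        candidates = {a | b for a, b in combinations(freq, 2) if len(a | b) == k}
        counts = defaultdict(int)
        for t in transactions:
            ts = set(t)
            for c in candidates:
                if c <= ts:
                    counts[c] += 1
        freq = {s: c for s, c in counts.items() if c >= min_support}
        result.update(freq)
        k += 1
    return result

def cluster_then_mine(transactions, k, min_support, seed=0):
    """Toy stand-in for the pre-processing step: 1-D k-means on transaction
    length partitions the data; Apriori then runs on each partition."""
    rng = random.Random(seed)
    centers = rng.sample([len(t) for t in transactions], k)
    for _ in range(10):
        groups = defaultdict(list)
        for t in transactions:
            c = min(centers, key=lambda c: abs(c - len(t)))
            groups[c].append(t)
        centers = [sum(len(t) for t in g) / len(g) for g in groups.values()]
    return [apriori(g, min_support) for g in groups.values()]
```

Note that per-cluster supports only approximate global supports; the actual ClustBigFIM pre-processing and its MapReduce jobs are considerably more involved.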
Clustering and Sequential Pattern Mining of Online Collaborative Learning Data.
Abstract: Group work is widespread in education. The growing use of online tools supporting group work generates huge amounts of data. We aim to exploit this data to support mirroring: presenting useful high-level views of information about the group, together with desired patterns characterizing the behavior of strong groups. The goal is to enable the groups and their facilitators to see relevant aspects of the group's operation, understand whether these are more likely to be associated with positive or negative outcomes, and indicate where the problems are. We explore how useful mirror information can be extracted via a theory-driven approach and a range of clustering and sequential pattern mining techniques. The context is a senior software development project where students use the collaboration tool TRAC. We extract patterns distinguishing the better from the weaker groups and gain insights into the success factors. The results point to the importance of leadership and group interaction, and provide promising indications of whether they are occurring. Patterns indicating good individual practices were also identified. We found that some key measures can be mined from early data. The results are promising for advising groups at the start and for early identification of effective and poor practices, in time for remediation.

Monitoring Online Tests.
Abstract: E-testing systems are widely adopted in academic environments, as well as in combination with other assessment means, providing tutors with powerful tools to submit different types of tests in order to assess learners' knowledge. Among these, multiple-choice tests are extremely popular, since they can be automatically corrected.
However, many learners do not welcome this type of test because, often, it does not let them properly express their capacity, due to the closed-ended nature of multiple-choice questions. Many examiners also doubt the real effectiveness of structured tests in assessing learners' knowledge, and wonder whether learners are more conditioned by the question type than by its actual difficulty. In this project, we propose a data exploration approach exploiting information visualization in order to involve tutors in a visual data mining process aimed at detecting structures, patterns, and relations between data, which can potentially reveal previously unknown knowledge inherent in tests, such as the test strategies used by the learners, correlations among different questions, and many other aspects, including their impact on the final score. It captures the occurrence of question browsing and answering events by the learners and uses these data to visualize charts containing a chronological review of
tests. Beyond identifying the most frequently employed strategies, the tutor can determine their effectiveness by correlating their use with the final test scores.

Profile Matching in Social Networking.
ABSTRACT: In this paper, we study user profile matching with privacy preservation in mobile social networks (MSNs) and introduce a family of novel profile matching protocols. We first propose an explicit Comparison-based Profile Matching protocol (eCPM) which runs between two parties, an initiator and a responder. The eCPM enables the initiator to obtain the comparison-based matching result about a specified attribute in their profiles, while preventing their attribute values from disclosure. We then propose an implicit Comparison-based Profile Matching protocol (iCPM) which allows the initiator to directly obtain some messages instead of the comparison result from the responder. The messages unrelated to the user profile can be divided into multiple categories by the responder. The initiator implicitly chooses the interested category, which is unknown to the responder. Two messages in each category are prepared by the responder, and only one message can be obtained by the initiator, according to the comparison result on a single attribute. We further generalize the iCPM to an implicit Predicate-based Profile Matching protocol (iPPM) which allows complex comparison criteria spanning multiple attributes. The anonymity analysis shows all these protocols achieve confidentiality of user profiles. In addition, the eCPM reveals the comparison result to the initiator and provides only conditional anonymity; the iCPM and the iPPM do not reveal the result at all and provide full anonymity.
We analyze the communication overhead and the anonymity strength of the protocols.

Analysis of Twitter Trends Based on Keyword Detection and Link Detection.
ABSTRACT: Detection of emerging topics is now receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged in social-network posts includes not only text but also images, URLs, and videos. We focus on the emergence of topics signaled by the social aspects of these networks. Specifically, we focus on links between users that are generated dynamically (intentionally or unintentionally) through replies, mentions, and retweets. We propose a probability model of the mentioning behavior of a social network user, and propose to detect the emergence of a new topic from the
anomalies measured through the model. Aggregating anomaly scores from hundreds of users, we show that we can detect emerging topics based only on the reply/mention relationships in social-network posts. We demonstrate our technique on several real data sets gathered from Twitter. The experiments show that the proposed mention-anomaly-based approaches can detect new topics at least as early as text-anomaly-based approaches, and in some cases much earlier, when the topic is poorly identified by the textual contents of posts.

Big Data Frequent Pattern Mining.
Abstract: Frequent pattern mining is an essential data mining task, with the goal of discovering knowledge in the form of repeated patterns. Many efficient pattern mining algorithms have been discovered in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called "Big Data". Scalable parallel algorithms hold the key to solving the problem in this context. In this chapter, we review recent advances in parallel frequent pattern mining, analyzing them through the Big Data lens. We identify three areas as challenges to designing parallel frequent pattern mining algorithms: memory scalability, work partitioning, and load balancing. With these challenges as a frame of reference, we extract and describe key algorithmic design patterns from the wealth of research conducted in this domain.

Bootstrapping a Privacy Ontology for Web Services.
ABSTRACT: Ontologies have become the de-facto modeling tool of choice, employed in many applications and prominently in the semantic web. Nevertheless, ontology construction remains a daunting task.
Ontological bootstrapping, which aims at automatically generating concepts and their relations in a given domain, is a promising technique for ontology construction. Bootstrapping an ontology based on a set of predefined textual sources, such as web services, must address the problem of multiple, largely unrelated concepts. In this paper, we propose an ontology bootstrapping process for web services. We exploit the advantage that web services usually consist of both WSDL and free-text descriptors. The WSDL descriptor is evaluated using two methods, namely Term Frequency/Inverse Document Frequency (TF/IDF) and web context generation. Our proposed ontology bootstrapping process integrates the results of both methods and applies a third method to validate the concepts using the service free-text descriptor, thereby offering a more accurate definition of ontologies. We extensively validated our bootstrapping method using a large repository of real-world web services and verified the results against existing ontologies. The experimental results indicate high precision.
Furthermore, the recall versus precision comparison of the results when each method is separately implemented presents the advantage of our integrated bootstrapping approach.

Then and Now: On the Maturity of the Cybercrime Markets.
ABSTRACT: Due to the rise and rapid growth of e-commerce, the use of credit cards for online purchases has dramatically increased, causing an explosion in credit card fraud. As credit cards become the most popular mode of payment for both online and regular purchases, cases of fraud associated with them are also rising. In real life, fraudulent transactions are scattered among genuine transactions, and simple pattern matching techniques are often not sufficient to detect those frauds accurately. Implementation of efficient fraud detection systems has thus become imperative for all credit card issuing banks to minimize their losses. Many modern techniques based on artificial intelligence, data mining, fuzzy logic, machine learning, sequence alignment, genetic programming, etc., have evolved for detecting various credit card fraudulent transactions. A clear understanding of all these approaches will certainly lead to an efficient credit card fraud detection system. This paper presents a survey of various techniques used in credit card fraud detection mechanisms and evaluates each methodology based on certain design criteria.

Social Set Analysis: A Set Theoretical Approach to Big Data Analytics.
ABSTRACT: Current analytical approaches in computational social science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis (graph theory), social complexity analysis (complex systems science), and social simulations (cellular automata and agent-based modeling).
However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualize, model, analyze, explain, and predict social media interactions as individuals’ associations with ideas, values, identities, and so on. To address this limitation, based on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called social set analysis. Social set analysis consists of a generative framework for the philosophies of computational social science, theory of social data, conceptual and formal models of social data, and an analytical framework for combining big social data sets with organizational and societal data
sets. Three empirical studies of big social data are presented to illustrate and demonstrate social set analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical interaction analysis, and event-studies-oriented set-theoretical visualizations. Implications for big data analytics, current limitations of the set-theoretical approach, and future directions are outlined.

Personalized Travel Sequence Recommendation on Multi-Source Big Social Media.
ABSTRACT: Recent years have witnessed an increased interest in recommender systems. Despite significant progress in this field, there still remain numerous avenues to explore. Indeed, this paper provides a study of exploiting online travel information for personalized travel package recommendation. A critical challenge along this line is to address the unique characteristics of travel data, which distinguish travel packages from traditional items for recommendation. To that end, in this paper, we first analyze the characteristics of existing travel packages and develop a tourist-area-season topic (TAST) model. This TAST model can represent travel packages and tourists by different topic distributions, where the topic extraction is conditioned on both the tourists and the intrinsic features (i.e., locations, travel seasons) of the landscapes. Then, based on this topic model representation, we propose a cocktail approach to generate the lists for personalized travel package recommendation. Furthermore, we extend the TAST model to the tourist-relation-area-season topic (TRAST) model for capturing the latent relationships among the tourists in each travel group. Finally, we evaluate the TAST model, the TRAST model, and the cocktail recommendation approach on real-world travel package data.
Experimental results show that the TAST model can effectively capture the unique characteristics of travel data, and the cocktail approach is thus much more effective than traditional recommendation techniques for travel package recommendation. Also, by considering tourist relationships, the TRAST model can be used as an effective assessment for travel group formation.

A Parallel Patient Treatment Time Prediction Algorithm and Its Applications in Hospital Queuing-Recommendation in a Big Data Environment.
Abstract: There is a need for continuous monitoring of the vital parameters of a patient in critical situations. In the current hospital scenario, such parameters are shown on a digital display observed by a nurse, so a dedicated person (a nurse) is required for such monitoring. But looking
at the growing population, this ratio of one nurse per patient would be a considerable problem in the future, so manual patient monitoring should be replaced by some other method. Online monitoring has attracted considerable attention for many years. Its applications are not limited to industrial process monitoring and control, but extend to civilian areas such as healthcare, home automation, and traffic control. This paper discusses the feasibility of an instant notification system in a heterogeneous sensor network with deployment of the XMPP protocol for medical applications. The system aims to provide an environment which enables medical practitioners to remotely monitor various vital parameters of patients. For academic purposes, we have limited this system to monitoring patients' body temperature and blood pressure. The proposed system collects data from various heterogeneous sensor networks (for example, patients' body temperature and blood pressure), converts it to a standard packet, and provides the facility to send it over a network using the Extensible Messaging and Presence Protocol (XMPP), in more common terms Instant Messaging (IM). Use of heterogeneous sensor networks (HSN) provides the much-required platform independence, while XMPP enables the instant notification.

Relevance Feature Discovery for Text Mining.
Abstract: It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of large-scale terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy.
Over the years, the hypothesis has often been held that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to effectively use large-scale patterns remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics and Reuters-21578 show that the proposed model significantly outperforms both the state-of-the-art term-based methods and the pattern-based methods.

A Novel Methodology of Frequent Itemset Mining on Hadoop.
Abstract: Frequent itemset mining is one of the classical data mining problems, found in most data mining applications. It requires very large computation and I/O traffic capacity. Also, the resources of a single processor, such as memory and CPU, are very limited, which degrades the performance of the algorithm. In this paper we propose one such distributed algorithm which runs on Hadoop, one of the most popular recent distributed frameworks, which mainly focuses on the MapReduce paradigm. The proposed approach takes into account inherent characteristics of the Apriori algorithm related to frequent itemset generation and, through block-based partitioning, uses dynamic workload management. The algorithm greatly enhances performance and achieves high scalability compared to existing distributed Apriori-based approaches. The proposed algorithm is implemented and tested on large-scale datasets distributed over a cluster.

Online Java Compiler.
Abstract: In today's fast-paced and competitive world, everything is moving online. We therefore created software called "Online Java Compiler with Security Editor". The main aim of this project is to make it easy to write, compile, and debug a Java program online. The client machine does not need the Java Development Kit; it is only connected to the server. The server has the Java compiler, so the server executes the Java code and returns any error messages to the appropriate client machine. This project also includes a security editor. This editor encrypts and decrypts files, using the RSA algorithm for the encryption and decryption process.
There are many security algorithms, but the RSA algorithm is very efficient for encrypting and decrypting files. The project can also be used to view all types of Java APIs, which is very useful for writing Java programs easily; for example, if there is any error in the use of an API, the user can look up the API through this module.

A Cloud Service Architecture for Analyzing Big Monitoring Data.
Abstract: Cloud monitoring is a source of big data that is constantly produced from traces of infrastructures, platforms, and applications. Analysis of monitoring data delivers
insights into the system's workload and usage patterns and ensures workloads are operating at optimum levels. The analysis process involves data query and extraction, data analysis, and result visualization. Since the volume of monitoring data is large, these operations require a scalable and reliable architecture to extract, aggregate, and analyze data at an arbitrary range of granularity. Ultimately, the results of analysis become knowledge of the system and should be shared and communicated. This paper presents our cloud service architecture that explores a search cluster for data indexing and query. We develop REST APIs so that the data can be accessed by different analysis modules. This architecture enables extensions to integrate with software frameworks for both batch processing (such as Hadoop) and stream processing (such as Spark) of big data. The analysis results are structured in Semantic MediaWiki pages in the context of the monitoring data source and the analysis process. This cloud architecture is empirically assessed to evaluate its responsiveness when processing a large set of data records under node failures.

A Tutorial on Secure Outsourcing of Large-scale Computations for Big Data.
ABSTRACT: Today's society is collecting a massive and exponentially growing amount of data that can potentially revolutionize scientific and engineering fields, and promote business innovations. With the advent of cloud computing, in order to analyze data in a cost-effective and practical way, users can outsource their computing tasks to the cloud, which offers access to vast computing resources on an on-demand and pay-per-use basis.
However, since users' data contains sensitive information that needs to be kept secret for ethical, security, or legal reasons, many users are reluctant to adopt cloud computing. To this end, researchers have proposed techniques that enable users to offload computations to the cloud while protecting their data privacy. In this paper, we review the recent advances in the secure outsourcing of large-scale computations for big data analysis. We first introduce the two most fundamental and common computational problems, i.e., linear algebra and optimization, and then provide an extensive review of data privacy preserving techniques. After that, we explain how researchers have exploited the data privacy preserving techniques to construct secure outsourcing algorithms for large-scale computations.

Protection of Big Data Privacy.
ABSTRACT: In recent years, big data has become a hot research topic. The increasing amount of big data also increases the chance of breaching the privacy of individuals. Since big data requires high computational power and large storage, distributed systems are used. As multiple parties are involved in these systems, the risk of privacy violation is increased. A number of privacy-preserving mechanisms have been developed for privacy protection at different stages (e.g., data generation, data storage, and data processing) of the big data life cycle. The goal of this paper is to provide a comprehensive overview of the privacy preservation mechanisms in big data and present the challenges for existing mechanisms. In particular, in this paper, we illustrate the infrastructure of big data and the state-of-the-art privacy-preserving mechanisms in each stage of the big data life cycle. Furthermore, we discuss the challenges and future research directions related to privacy preservation in big data.

Towards a Virtual Domain Based Authentication on MapReduce.
ABSTRACT: This paper proposes a novel authentication solution for the MapReduce (MR) model, a new distributed and parallel computing paradigm commonly deployed to process big data by major IT players, such as Facebook and Yahoo. It identifies a set of security, performance, and scalability requirements that are specified from a comprehensive study of a job execution process using MR, and of security threats and attacks in this environment. Based on the requirements, it critically analyzes the state-of-the-art authentication solutions, discovering that the authentication services currently proposed for the MR model are not adequate.
This paper then presents a novel layered authentication solution for the MR model and describes the core components of this solution, which includes the virtual domain based authentication framework (VDAF). These ideas are significant because, first, the approach embeds the characteristics of MR-in-cloud deployments into security solution designs, which will allow the MR model to be delivered as software as a service in a public cloud environment along with our proposed authentication solution; second, VDAF supports the authentication of every interaction by any MR components involved in a job execution flow, so long as the interactions are for accessing resources of the job; third, this continuous authentication service is provided in such a manner that the costs incurred in providing it are as low as possible.
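The core idea of authenticating every interaction within a job's scope can be illustrated with a small sketch. This is our own simplification, not the paper's VDAF design; the class and method names (`VirtualDomain`, `issue_token`, `verify`) and the token format are hypothetical. A job-scoped key derived from a master secret signs each resource access with an HMAC, so tokens are bound to one job, to one resource, and to a short validity window.

```python
import hashlib
import hmac
import time

class VirtualDomain:
    """Toy per-job authentication domain: every interaction between
    components carries an HMAC token keyed to this job only."""

    def __init__(self, job_id, master_secret):
        self.job_id = job_id
        # derive a job-scoped key so tokens are useless outside this job
        self.key = hmac.new(master_secret, job_id.encode(), hashlib.sha256).digest()

    def issue_token(self, component, resource, ts=None):
        """Sign one interaction: which component touches which resource, when."""
        ts = int(time.time()) if ts is None else ts
        msg = f"{component}|{resource}|{ts}".encode()
        mac = hmac.new(self.key, msg, hashlib.sha256).hexdigest()
        return {"component": component, "resource": resource, "ts": ts, "mac": mac}

    def verify(self, token, max_age=300, now=None):
        """Reject expired or tampered tokens; constant-time MAC comparison."""
        now = int(time.time()) if now is None else now
        if now - token["ts"] > max_age:
            return False
        msg = f"{token['component']}|{token['resource']}|{token['ts']}".encode()
        expected = hmac.new(self.key, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, token["mac"])
```

Because verification needs only the job key, any component in the job's flow can check any other component's request without a round trip to a central server, which is in the spirit of the low-cost continuous authentication the abstract describes.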
Predicting Instructor Performance Using Data Mining Techniques in Higher Education.
ABSTRACT: Data mining applications are becoming a more common tool in understanding and solving educational and administrative problems in higher education. In general, research in educational mining focuses on modeling students' performance instead of instructors' performance. One of the common tools to evaluate instructors' performance is the course evaluation questionnaire, which evaluates based on students' perception. In this paper, four different classification techniques (decision tree algorithms, support vector machines, artificial neural networks, and discriminant analysis) are used to build classifier models. Their performances are compared over a data set composed of responses of students to a real course evaluation questionnaire using accuracy, precision, recall, and specificity performance metrics. Although all the classifier models show comparably high classification performance, the C5.0 classifier is the best with respect to accuracy, precision, and specificity. In addition, an analysis of the variable importance for each classifier model is done. Accordingly, it is shown that many of the questions in the course evaluation questionnaire appear to be irrelevant. Furthermore, the analysis shows that the instructors' success based on the students' perception mainly depends on the interest of the students in the course. The findings of this paper indicate the effectiveness and expressiveness of data mining models in course evaluation and higher education mining. Moreover, these findings may be used to improve the measurement instruments.

Intra- and Inter-Fractional Variation Prediction of Lung Tumors Using Fuzzy Deep Learning.
ABSTRACT: Tumor movements should be accurately predicted to improve delivery accuracy and reduce unnecessary radiation exposure to healthy tissue during radiotherapy. The tumor movements pertaining to respiration are divided into intra-fractional variation, occurring in a single treatment session, and inter-fractional variation, arising between different sessions. Most studies of patients' respiration movements deal with intra-fractional variation. Previous studies on inter-fractional variation are hardly mathematized and cannot predict movements well due to inconstant variation. Moreover, the computation time of the
prediction should be reduced. To overcome these limitations, we propose a new predictor for intra- and inter-fractional data variation, called intra- and inter-fraction fuzzy deep learning (IIFDL), where FDL, equipped with breathing clustering, predicts the movement accurately and decreases the computation time. Through the experimental results, we validated that the IIFDL improved the root-mean-square error (RMSE) by 29.98% and prediction overshoot by 70.93%, compared with existing methods. The results also showed that the IIFDL enhanced the average RMSE and overshoot by 59.73% and 83.27%, respectively. In addition, the average computation time of IIFDL was 1.54 ms for both intra- and inter-fractional variation, which was much smaller than that of the existing methods. Therefore, the proposed IIFDL might achieve real-time estimation as well as better tracking techniques in radiotherapy.

Web Service Personalized Quality of Service Prediction via Reputation-Based Matrix Factorization.
Abstract: With the fast development of Web services in service-oriented systems, the requirement for efficient Quality of Service (QoS) evaluation methods becomes strong. However, many QoS values are unknown in reality. Therefore, it is necessary to predict the unknown QoS values of Web services based on the obtainable QoS values. Generally, the QoS values of similar users are employed to make predictions for the current user. However, the QoS values may be contributed by unreliable users, leading to inaccuracy of the prediction results. To address this problem, we present a highly credible approach, called reputation-based Matrix Factorization (RMF), for predicting unknown Web service QoS values.
RMF first calculates the reputation of each user based on their contributed QoS values to quantify the credibility of users, and then takes the users' reputation into consideration to achieve more accurate QoS prediction. Reputation-based matrix factorization is applicable to the prediction of QoS data in the presence of unreliable user-provided QoS values. Extensive experiments are conducted with real-world Web service QoS data sets, and the experimental results show that our proposed approach outperforms other existing approaches.

A Supermodularity-Based Differential Privacy Preserving Algorithm for Data Anonymization.
Maximizing data usage and minimizing privacy risk are two conflicting goals. Organizations always apply a set of transformations on their data before releasing it. While determining the best set of transformations has been the focus of extensive work in the database
community, most of this work has suffered from one or both of the following major problems: scalability and privacy guarantee. Differential privacy provides a theoretical formulation for privacy that ensures that the system essentially behaves the same way regardless of whether any individual is included in the database. In this paper, we address both the scalability and the privacy risk of data anonymization. We propose a scalable algorithm that meets differential privacy when applying a specific random sampling. The contribution of the paper is two-fold: 1) we propose a personalized anonymization technique based on an aggregate formulation and prove that it can be implemented in polynomial time; and 2) we show that combining the proposed aggregate formulation with specific sampling gives an anonymization algorithm that satisfies differential privacy. Our results rely heavily on exploring the supermodularity properties of the risk function, which allow us to employ techniques from convex optimization. Through experimental studies we compare our proposed algorithm with other anonymization schemes in terms of both time and privacy risk.

A Data-Mining Model for Protection of FACTS-Based Transmission Line.
Synopsis: This paper presents a data-mining model for fault-zone identification of a flexible AC transmission systems (FACTS)-based transmission line, including a thyristor-controlled series compensator (TCSC) and unified power-flow controller (UPFC), using ensemble decision trees. Given the randomness in the ensemble of decision trees stacked inside the random forests model, it provides effective decisions on fault-zone identification.
Half-cycle postfault current and voltage samples from the fault inception are used as an input vector against target output "1" for the fault after TCSC/UPFC and "-1" for the fault before TCSC/UPFC for fault-zone identification. The algorithm is tested on simulated fault data with wide variations in operating parameters of the power system network, including a noisy environment, providing a reliability measure of 99% with faster response time (3/4th cycle from fault inception). The results of the presented approach using the RF model indicate reliable identification of the fault zone in FACTS-based transmission lines.

A Temporal Pattern Search Algorithm for Personal History Event Visualization.
Synopsis:
We present Temporal Pattern Search (TPS), a novel algorithm for searching for temporal patterns of events in personal histories. The traditional method of searching for such patterns uses an automaton-based approach over a single array of events, sorted by
time stamps. Instead, TPS operates on a set of arrays, where each array contains all events of the same type, sorted by time stamps. TPS searches for a particular item in the pattern using a binary search over the appropriate arrays. Although binary search is considerably more expensive per item, it allows TPS to skip many unnecessary events in personal histories. We show that TPS's running time is bounded by O(m^2 n lg(n)), where m is the length (number of events) of a search pattern, and n is the number of events in a record (history). Although the asymptotic running time of TPS is inferior to that of a nondeterministic finite automaton (NFA) approach (O(mn)), TPS performs better than NFA under our experimental conditions. We also show TPS is very competitive with Shift-And, a bit-parallel approach, on real data. Since the experimental conditions we describe here subsume the conditions under which analysts would typically use TPS (i.e., within an interactive visualization program), we argue that TPS is an appropriate design choice for us.

Adaptive Cluster Distance Bounding for High Dimensional Indexing.
Synopsis:
We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. We note that indexing by "vector approximation" (VA-File), which was proposed as a technique to combat the "Curse of Dimensionality," employs scalar quantization, and hence necessarily ignores dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand, exploits interdimensional correlations and is thus a more compact representation of the data set.
However, existing methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster-based index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead, and is applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations, by factors reaching 100X and more.

Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme.
Synopsis:
Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding approach, which selects a set of graph nodes as landmarks and computes the shortest distances from each landmark to all nodes as an embedding. To answer a shortest distance query, the precomputed distances from the landmarks to the two query nodes are used to compute an approximate shortest distance based on the triangle inequality. In this paper, we analyze the factors that affect the accuracy of distance estimation in landmark embedding. In particular, we find that a globally selected, query-independent landmark set may introduce a large relative error, especially for nearby query nodes. To address this issue, we propose a query-dependent local landmark scheme, which identifies a local landmark close to both query nodes and provides more accurate distance estimation than the traditional global landmark approach. We propose efficient local landmark indexing and retrieval techniques, which achieve low offline indexing complexity and online query complexity. Two optimization techniques on graph compression and graph online search are also proposed, with the goal of further reducing index size and improving query accuracy. Furthermore, the challenge of immense graphs whose index may not fit in memory leads us to store the embedding in a relational database, so that a query of the local landmark scheme can be expressed with relational operators. Effective indexing and query optimization mechanisms are designed in this context.
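The global-landmark estimate that motivates the local scheme can be sketched in a few lines. The toy graph, landmark choice, and unweighted BFS distances below are illustrative only, not the paper's implementation; the example also shows the large relative error for nearby query nodes that the local scheme addresses:

```python
from collections import deque

def bfs_distances(adj, source):
    """Single-source shortest path lengths (unweighted graph) via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def landmark_estimate(landmark_dists, s, t):
    """Triangle-inequality upper bound: min over landmarks l of d(s,l)+d(l,t)."""
    return min(d[s] + d[t] for d in landmark_dists)

# Toy graph: cycle 0-1-2-3-4-0
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
landmarks = [2]                       # one globally selected landmark
landmark_dists = [bfs_distances(adj, l) for l in landmarks]
print(landmark_estimate(landmark_dists, 0, 4))  # estimates 4; true distance is 1
```

With the single global landmark 2, the estimate for the adjacent pair (0, 4) is 2 + 2 = 4, while the true distance is 1, which is exactly the nearby-node error that a query-dependent local landmark avoids.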
Our experimental results on large-scale social networks and road networks demonstrate that the local landmark scheme reduces the shortest distance estimation error significantly when compared with global landmark embedding and the state-of-the-art sketch-based embedding.

A Fast Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data.
Synopsis:
Feature selection involves identifying a subset of the most useful features that produces results comparable to those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning tree (MST) clustering method. The efficiency and effectiveness of the FAST
algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several representative feature selection algorithms, namely, FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely, the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but also improves the performances of the four types of classifiers.

Advance Mining of Temporal High Utility Itemset.
Synopsis:
The stock market domain is a dynamic and unpredictable environment. Traditional techniques, such as fundamental and technical analysis, can provide investors with some tools for managing their stocks and predicting their prices. However, these techniques cannot discover all the possible relations between stocks, and thus there is a need for a different approach that will provide a deeper kind of analysis. Data mining can be used extensively in the financial markets and help in stock-price forecasting. Therefore, we propose in this paper a portfolio management solution with business intelligence characteristics. Temporal high utility itemsets are the itemsets with support larger than a pre-specified threshold in the current time window of a data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. We propose a novel algorithm for temporal association mining with a utility approach.
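As a toy illustration of the utility measure involved (not the proposed algorithm, whose point is to avoid enumerating candidates), a brute-force high-utility itemset scan over one time window of a transaction stream might look like this; the quantities and per-unit utilities are made-up examples:

```python
from itertools import combinations

def itemset_utility(transactions, itemset, ext_utility):
    """Total utility of an itemset: over transactions containing every
    member item, sum quantity * per-unit (external) utility."""
    total = 0
    for tx in transactions:
        if all(i in tx for i in itemset):
            total += sum(tx[i] * ext_utility[i] for i in itemset)
    return total

def high_utility_itemsets(transactions, ext_utility, min_util, max_size=2):
    """Brute-force scan: keep every itemset whose utility meets min_util."""
    items = sorted({i for tx in transactions for i in tx})
    result = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            u = itemset_utility(transactions, combo, ext_utility)
            if u >= min_util:
                result[combo] = u
    return result

# Current time window of the stream: each transaction maps item -> quantity
window = [{'a': 2, 'b': 1}, {'a': 1, 'c': 3}, {'b': 2, 'c': 1}]
ext = {'a': 5, 'b': 3, 'c': 1}   # per-unit profit of each item
print(high_utility_itemsets(window, ext, min_util=10))
```

The temporal aspect enters by sliding `window` forward as new transactions arrive; a candidate-reducing algorithm would prune most of the combinations this sketch enumerates.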
This makes it possible to find temporal high utility itemsets while generating fewer candidate itemsets.

Data Leakage Detection.
Synopsis:
We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject "realistic but fake" data records to further improve our chances of detecting leakage and identifying the guilty party.
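A simplified version of the guilt-assessment idea can be sketched as follows. The uniform-leaker assumption and the `p_guess` prior (probability an object was gathered independently) are illustrative simplifications, not the paper's exact probabilistic model:

```python
def guilt_probability(leaked, agent_data, agent, p_guess=0.2):
    """Simplified agent-guilt model: for each leaked object the agent holds,
    assume probability p_guess it was gathered independently; otherwise it
    was leaked by one of the agents holding it, chosen uniformly. The agent
    is guilty if it leaked at least one object."""
    prob_innocent = 1.0
    for obj in leaked:
        holders = [a for a, objs in agent_data.items() if obj in objs]
        if agent in holders:
            prob_innocent *= 1 - (1 - p_guess) / len(holders)
    return 1 - prob_innocent

# Agent A received objects {1,2,3}; agent B received {3,4}; {1,3} leaked.
agents = {'A': {1, 2, 3}, 'B': {3, 4}}
leaked = {1, 3}
print(round(guilt_probability(leaked, agents, 'A'), 3))  # 0.88
print(round(guilt_probability(leaked, agents, 'B'), 3))  # 0.4
```

Object 1 was given only to A, so A's guilt probability is much higher; this is why the allocation strategies above try to give each agent a distinguishable subset.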
Best Keyword Cover Search.
Synopsis:
It is common that the objects in a spatial database (e.g., restaurants/hotels) are associated with keyword(s) to indicate their businesses/services/features. An interesting problem known as Closest Keywords search is to query objects, called a keyword cover, which together cover a set of query keywords and have the minimum inter-objects distance. In recent years, we observe the increasing availability and importance of keyword rating in object evaluation for better decision making. This motivates us to investigate a generic version of Closest Keywords search called Best Keyword Cover, which considers inter-objects distance as well as the keyword rating of objects. The baseline algorithm is inspired by the methods of Closest Keywords search, which is based on exhaustively combining objects from different query keywords to generate candidate keyword covers. When the number of query keywords increases, the performance of the baseline algorithm drops dramatically as a result of the massive candidate keyword covers generated. To address this drawback, this work proposes a much more scalable algorithm called keyword nearest neighbor expansion (keyword-NNE). Compared to the baseline algorithm, the keyword-NNE algorithm significantly reduces the number of candidate keyword covers generated. The in-depth analysis and extensive experiments on real data sets have justified the superiority of our keyword-NNE algorithm.

A Generalized Flow-Based Method for Analysis of Implicit Relationships on Wikipedia.
Synopsis:
We focus on measuring relationships between pairs of objects in Wikipedia whose pages can be regarded as individual objects.
Two kinds of relationships between two objects exist: in Wikipedia, an explicit relationship is represented by a single link between the two pages for the objects, and an implicit relationship is represented by a link structure containing the two pages. Some of the previously proposed methods for measuring relationships are cohesion-based methods, which underestimate objects having high degrees, although such objects could be important in constituting relationships in Wikipedia. The other methods are inadequate for measuring implicit relationships because they use only one or two of the
following three important factors: distance, connectivity, and cocitation. We propose a new method using a generalized maximum flow that reflects all three factors and does not underestimate objects having high degree. We confirm through experiments that our method can measure the strength of a relationship more appropriately than these previously proposed methods do. Another remarkable aspect of our method is mining elucidatory objects, that is, objects constituting a relationship. We explain that mining elucidatory objects would open a novel way to deeply understand a relationship.

An Exploration of Improving Collaborative Recommender Systems via User-Item Subgroups.
Synopsis:
Collaborative filtering (CF) is one of the most successful recommendation approaches. It typically associates a user with a group of like-minded users based on their preferences over all the items, and recommends to the user those items enjoyed by others in the group. However, we find that two users with similar tastes on one item subset may have totally different tastes on another set. In other words, there exist many user-item subgroups, each consisting of a subset of items and a group of like-minded users on these items. It is more natural to make preference predictions for a user via the correlated subgroups than via the entire user-item matrix. In this paper, to find meaningful subgroups, we formulate the Multiclass Co-Clustering (MCoC) problem and propose an effective solution to it. Then we propose a unified framework to extend the traditional CF algorithms by utilizing the subgroup information for improving their top-N recommendation performance. Our approach can be seen as an extension of traditional clustering CF models.
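The intuition that similarity should be computed per user-item subgroup rather than over the whole matrix can be illustrated with a toy ratings matrix. The cosine measure and the data below are illustrative, not the MCoC formulation itself:

```python
import numpy as np

def similarity_on_items(R, u, v, item_idx):
    """Cosine similarity of users u and v restricted to a subset of items
    (one candidate user-item subgroup), ignoring unrated (zero) entries."""
    ru, rv = R[u, item_idx], R[v, item_idx]
    mask = (ru > 0) & (rv > 0)
    if not mask.any():
        return 0.0
    a, b = ru[mask], rv[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Ratings (0 = unrated). Users 0 and 1 agree strongly on items {0,1}
# but disagree on items {2,3} -- one global similarity would blur this.
R = np.array([[5, 4, 1, 5],
              [5, 5, 5, 1],
              [1, 1, 5, 4]])
print(similarity_on_items(R, 0, 1, [0, 1]))  # high agreement
print(similarity_on_items(R, 0, 1, [2, 3]))  # much lower agreement
```

A subgroup-aware recommender would treat users 0 and 1 as like-minded only on the first item subset, instead of averaging the two behaviors into one similarity score.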
Systematic experiments on three real-world data sets have demonstrated the effectiveness of our proposed approach.

Decision Trees for Uncertain Data.
Synopsis:
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been
conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.

Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation.
Synopsis:
With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution for its advantages in scalability and cost saving. However, some data might be sensitive, and the data owner may not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We propose the random space perturbation (RASP) data perturbation method to provide secure and efficient range query and kNN query services for protected data in the cloud. The RASP data perturbation method combines order-preserving encryption, dimensionality expansion, random noise injection, and random projection to provide strong resilience to attacks on the perturbed data and queries. It also preserves multidimensional ranges, which allows existing indexing techniques to be applied to speed up range query processing. The kNN-R algorithm is designed to work with the RASP range query algorithm to process the kNN queries.
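A minimal sketch of the perturbation idea follows: dimensionality expansion with a random noise component, a secret invertible matrix, and a range condition rewritten as a half-space test on the perturbed points. This is a toy illustration only; in the actual RASP scheme the query itself is transformed so that rows of the secret inverse matrix are never exposed to the server:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(points, A):
    """Store y = A @ (x, noise, 1): each record is extended with a random
    noise coordinate and a homogeneous 1, then mixed by secret matrix A."""
    n, d = points.shape
    z = np.hstack([points, rng.uniform(0, 1, (n, 1)), np.ones((n, 1))])
    return z @ A.T

def range_condition(A_inv, dim, bound):
    """Rewrite 'x[dim] <= bound' as a half-space test on perturbed data:
    the matching row of A^{-1} recovers that coordinate from y."""
    w = A_inv[dim]
    return lambda y: w @ y <= bound

d = 2
A = rng.normal(size=(d + 2, d + 2))   # secret invertible matrix
A_inv = np.linalg.inv(A)
pts = np.array([[1.0, 2.0], [5.0, 6.0]])
Y = perturb(pts, A)
cond = range_condition(A_inv, 0, 3.0)  # query: x0 <= 3
print([bool(cond(y)) for y in Y])      # [True, False]
```

The server evaluating `cond` sees only mixed coordinates, yet the half-space test answers the original range predicate exactly.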
We have carefully analyzed the attacks on data and queries under a precisely defined threat model and realistic security assumptions. Extensive experiments have been conducted to show the advantages of this approach on efficiency and security.

A Methodology for Direct and Indirect Discrimination Prevention in Data Mining.
Synopsis:
Data mining is an increasingly important technology for extracting useful knowledge hidden in large collections of data. There are, however, negative social perceptions about data mining, among them potential privacy invasion and potential discrimination. The latter consists of unfairly treating people on the basis of their belonging to a specific group. Automated data collection and data mining techniques such as classification rule mining have paved the way to making automated decisions, like loan granting/denial, insurance premium computation, etc. If the training data sets are biased with regard to discriminatory (sensitive) attributes like gender, race, or religion, discriminatory decisions may ensue.
For this reason, anti-discrimination techniques including discrimination discovery and prevention have been introduced in data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes. Indirect discrimination occurs when decisions are made based on nonsensitive attributes which are strongly correlated with biased sensitive ones. In this paper, we tackle discrimination prevention in data mining and propose new techniques applicable for direct or indirect discrimination prevention individually or both at the same time. We discuss how to clean training data sets and outsourced data sets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (nondiscriminatory) classification rules. We also propose new metrics to evaluate the utility of the proposed approaches and we compare these approaches. The experimental evaluations demonstrate that the proposed techniques are effective at removing direct and/or indirect discrimination biases in the original data set while preserving data quality.

Anomaly Detection for Discrete Sequences: A Survey.
Synopsis:
This survey attempts to provide a comprehensive and structured overview of the existing research for the problem of detecting anomalies in discrete/symbolic sequences. The objective is to provide a global understanding of the sequence anomaly detection problem and how existing techniques relate to each other. The key contribution of this survey is the classification of the existing research into three distinct categories, based on the problem formulation that they are trying to solve.
These problem formulations are: 1) identifying anomalous sequences with respect to a database of normal sequences; 2) identifying an anomalous subsequence within a long sequence; and 3) identifying a pattern in a sequence whose frequency of occurrence is anomalous. We show how each of these problem formulations is characteristically distinct from each other and discuss their relevance in various application domains. We review techniques from many disparate and disconnected application domains that address each of these formulations. Within each problem formulation, we group techniques into categories based on the nature of the underlying algorithm. For each category, we provide a basic anomaly detection technique, and show how the existing techniques are variants of the basic technique. This approach shows how different techniques within a category are related or different from each other. Our categorization reveals new variants and combinations that have not been investigated before for anomaly detection. We also provide a discussion of relative strengths and weaknesses of different techniques. We show how techniques developed for one problem formulation can be adapted to solve a different formulation, thereby providing several novel adaptations to solve the different problem formulations. We also highlight the applicability of the techniques that handle discrete sequences to other related areas such as online anomaly detection and time series anomaly detection.
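As a concrete instance of the second formulation (finding an anomalous subsequence inside a long test sequence), a simple n-gram frequency scorer trained on normal sequences might look like this; it is an illustrative baseline, not any specific technique from the survey:

```python
from collections import Counter

def ngram_counts(sequences, n=2):
    """Frequency of every length-n window across the normal sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

def anomaly_scores(seq, counts, n=2):
    """Score each window of the test sequence by inverse relative frequency
    in the normal data; n-grams never seen get the maximum score 1.0."""
    total = sum(counts.values())
    return [1 - counts[tuple(seq[i:i + n])] / total
            for i in range(len(seq) - n + 1)]

normal = ["ababab", "ababab"]          # normal training sequences
counts = ngram_counts(normal)
scores = anomaly_scores("abaxba", counts)
print(scores.index(max(scores)))       # index of the most anomalous bigram
```

The never-seen bigram "ax" gets the highest score, localizing the anomalous subsequence, which is exactly what formulation 2 asks for.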
Discovering Conditional Functional Dependencies.
Synopsis:
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation is large, but it does not scale well with the arity of the relation.
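To make the notion concrete, here is a minimal check of a CFD with a mixed constant/variable pattern against a relation. This is an illustrative validity test, not one of the discovery algorithms described above; the attribute names and tuples are made up:

```python
def violates_cfd(rows, lhs, rhs, pattern):
    """Check a CFD (lhs -> rhs, pattern): among rows matching the constant
    pattern on the lhs attributes ('_' is a wildcard), equal lhs values
    must imply equal rhs values. Returns True if some pair violates this."""
    seen = {}
    for row in rows:
        if all(pattern[a] in ('_', row[a]) for a in lhs):
            key = tuple(row[a] for a in lhs)
            if key in seen and seen[key] != row[rhs]:
                return True   # same lhs values, different rhs value
            seen[key] = row[rhs]
    return False

rows = [
    {"country": "UK", "zip": "EC1", "city": "London"},
    {"country": "UK", "zip": "EC1", "city": "Londn"},   # dirty tuple
    {"country": "NL", "zip": "EC1", "city": "Edam"},
]
# CFD: (country, zip -> city) required to hold only where country = 'UK'
print(violates_cfd(rows, ["country", "zip"], "city",
                   {"country": "UK", "zip": "_"}))  # True: data needs cleaning
```

Unlike a plain FD, the pattern restricts the rule to UK rows, so the NL tuple is irrelevant to the violation.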
FastCFD is far more efficient than CTANE when the arity of the relation is large; better still, by leveraging optimization based on closed-item-set mining, FastCFD also scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools for users to choose from for different applications.

Capturing Telic/Atelic Temporal Data Semantics: Generalizing Conventional Conceptual Models.
Synopsis:
Time provides context for all our experiences, cognition, and coordinated collective action. Prior research in linguistics, artificial intelligence, and temporal databases suggests the need to differentiate between temporal facts with goal-related semantics (i.e., telic) and those that are intrinsically devoid of culmination (i.e., atelic). To differentiate between telic and atelic data semantics in conceptual database design, we propose an annotation-based temporal conceptual model that generalizes the semantics of a conventional conceptual model. Our temporal conceptual design approach involves: 1) capturing "what" semantics using a conventional conceptual model; 2) employing annotations to differentiate between telic and atelic data semantics that help capture "when" semantics; 3) specifying temporal
constraints, specifically nonsequenced semantics, in the temporal data dictionary as metadata. Our proposed approach provides a mechanism to represent telic/atelic temporal semantics using temporal annotations. We also show how these semantics can be formally defined using constructs of the conventional conceptual model and axioms in first-order logic. Via what we refer to as the "semantics of composition," i.e., semantics implied by the interaction of annotations, we illustrate the logical consequences of representing telic/atelic data semantics during temporal conceptual design.

A New Algorithm for Inferring User Search Goals with Feedback Sessions.
Synopsis:
For a broad-topic and ambiguous query, different users may have different search goals when they submit it to a search engine. The inference and analysis of user search goals can be very useful in improving search engine relevance and user experience. In this paper, we propose a novel approach to infer user search goals by analyzing search engine query logs. First, we propose a framework to discover different user search goals for a query by clustering the proposed feedback sessions. Feedback sessions are constructed from user click-through logs and can efficiently reflect the information needs of users. Second, we propose a novel approach to generate pseudo-documents to better represent the feedback sessions for clustering. Finally, we propose a new criterion, "Classified Average Precision (CAP)," to evaluate the performance of inferring user search goals. Experimental results are presented using user click-through logs from a commercial search engine to validate the effectiveness of our proposed methods.
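The first step, grouping feedback sessions that reflect the same information need, can be sketched with a toy greedy cosine clustering. The binary clicked-result encoding, the vocabulary, and the threshold below are illustrative stand-ins for the paper's pseudo-document construction:

```python
import numpy as np

def cluster_sessions(vectors, threshold=0.5):
    """Greedy single-pass clustering: each session joins the first cluster
    whose centroid is cosine-similar enough, else starts a new cluster.
    Each resulting cluster approximates one user search goal."""
    clusters = []   # list of [centroid_sum, member_indices]
    for i, v in enumerate(vectors):
        for c in clusters:
            centroid = c[0] / len(c[1])
            sim = v @ centroid / (np.linalg.norm(v) * np.linalg.norm(centroid))
            if sim >= threshold:
                c[0] += v
                c[1].append(i)
                break
        else:
            clusters.append([v.astype(float).copy(), [i]])
    return [c[1] for c in clusters]

# Ambiguous query "java": sessions clicking travel results vs programming ones
vocab = ["island", "beach", "code", "jvm"]
sessions = np.array([[1, 1, 0, 0],
                     [1, 0, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 1, 0]])
print(cluster_sessions(sessions))  # two inferred goals: [[0, 1], [2, 3]]
```

The two recovered clusters correspond to the two latent search goals behind the same query string.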
Automatic Discovery of Association Orders between Name and Aliases from the Web using Anchor Texts-based Co-occurrences.
Synopsis:
Many celebrities and experts from various fields may be referred to not only by their personal names but also by their aliases on the web. Aliases are very important in information retrieval for retrieving complete information about a personal name from the web, as some of the web pages of the person may also be referred to by his aliases. The aliases for a personal name are extracted by a previously proposed alias extraction method. In information retrieval, the web search engine automatically expands the search query on a person name by tagging his aliases for complete information retrieval, thereby improving recall in the relation detection task and achieving a significant mean reciprocal rank (MRR) of the search engine.
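The MRR metric referred to here is straightforward to compute; a minimal sketch with made-up result lists:

```python
def mean_reciprocal_rank(result_lists, relevant):
    """MRR: average over queries of 1/rank of the first relevant result
    (contributing 0 when no relevant result is returned)."""
    total = 0.0
    for results, rel in zip(result_lists, relevant):
        for rank, doc in enumerate(results, start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(result_lists)

runs = [["d1", "d2", "d3"],   # first relevant result at rank 2
        ["d4", "d5", "d6"]]   # first relevant result at rank 1
rel = [{"d2"}, {"d4"}]
print(mean_reciprocal_rank(runs, rel))  # (1/2 + 1) / 2 = 0.75
```

Expanding a query with well-ordered aliases pushes relevant pages toward rank 1, which is exactly what raises this score.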
For further substantial improvement on recall and MRR over the previously proposed methods, our proposed method will order the aliases based on their associations with the name, using the definition of anchor texts-based co-occurrences between name and aliases, in order to help the search engine tag the aliases according to the order of associations. The association orders will automatically be discovered by creating an anchor texts-based co-occurrence graph between name and aliases. A ranking support vector machine (SVM) will be used to create connections between name and aliases in the graph by performing ranking on anchor texts-based co-occurrence measures. The hop distances between nodes in the graph will yield the associations between name and aliases; these hop distances will be found by mining the graph. The proposed method will outperform previously proposed methods, achieving substantial growth in recall and MRR.

Effective Navigation of Query Results Based on Concept Hierarchies.
Synopsis:
Search queries on biomedical databases, such as PubMed, often return a large number of results, only a small subset of which is relevant to the user. Ranking and categorization, which can also be combined, have been proposed to alleviate this information overload problem. Results categorization for biomedical databases is the focus of this work. A natural way to organize biomedical citations is according to their MeSH annotations. MeSH is a comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav system, a novel search interface that enables the user to navigate a large number of query results by organizing them using the MeSH concept hierarchy. First, the query results are organized into a navigation tree.
At each node expansion step, BioNav reveals only a small subset of the concept nodes, selected such that the expected user navigation cost is minimized. In contrast, previous works expand the hierarchy in a predefined static manner, without navigation cost modeling. We show that the problem of selecting the best concepts to reveal at each node expansion is NP-complete and propose an efficient heuristic as well as a feasible optimal algorithm for relatively small trees. We show experimentally that BioNav outperforms state-of-the-art categorization systems by up to an order of magnitude with respect to the user navigation cost. BioNav for the MEDLINE database is available at.

Dealing With Concept Drifts in Process Mining Services.
Synopsis:
Although most business processes change over time, contemporary process mining techniques tend to analyze these processes as if they were in a steady state. Processes may change suddenly or gradually. The drift may be periodic (e.g., because of seasonal
influences) or one-of-a-kind (e.g., the effects of new legislation). For process management, it is crucial to discover and understand such concept drifts in processes. This paper presents a generic framework and specific techniques to detect when a process changes and to localize the parts of the process that have changed. Different features are proposed to characterize relationships among activities. These features are used to discover differences between successive populations. The approach has been implemented as a plug-in of the ProM process mining framework and has been evaluated using both simulated event data exhibiting controlled concept drifts and real-life event data from a Dutch municipality.

A Probabilistic Approach to String Transformation.
Synopsis:
Many problems in natural language processing, data mining, information retrieval, and bioinformatics can be formalized as string transformation, which is a task as follows: given an input string, the system generates the k most likely output strings corresponding to the input string. This paper proposes a novel and probabilistic approach to string transformation, which is both accurate and efficient. The approach includes the use of a log-linear model, a method for training the model, and an algorithm for generating the top k candidates, whether or not there is a predefined dictionary. The log-linear model is defined as a conditional probability distribution of an output string and a rule set for the transformation conditioned on an input string. The learning method employs maximum likelihood estimation for parameter estimation. The string generation algorithm based on pruning is guaranteed to generate the optimal top k candidates.
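A toy stand-in for the candidate-generation step follows: dictionary words within one edit of the input, ranked by corpus frequency rather than the paper's trained log-linear rule scores. The dictionary and frequencies are made-up examples:

```python
import heapq

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings exactly one edit (delete/replace/insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | replaces | inserts

def top_k(word, freq, k=2):
    """Top-k correction candidates: dictionary words within one edit of the
    input, scored here by raw corpus frequency as a stand-in for a
    log-linear rule score."""
    cands = [(f, w) for w, f in freq.items() if w in edits1(word) or w == word]
    return [w for f, w in heapq.nlargest(k, cands)]

freq = {"search": 120, "perch": 9, "banana": 50}
print(top_k("serch", freq))  # ['search', 'perch']
```

A real system prunes the (large) edit neighborhood with the learned rule set instead of enumerating it, which is what makes the top-k generation efficient.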
The proposed method is applied to correction of spelling errors in queries as well as reformulation of queries in web search. Experimental results on large-scale data show that the proposed approach is very accurate and efficient, improving upon existing methods in both accuracy and efficiency in different settings. Confucius: A Tool Supporting Collaborative Scientific Workflow Composition. Synopsis: Modern scientific data management and analysis usually rely on multiple scientists with diverse expertise. In recent years, such collaborative efforts have often been structured and automated by a data-flow-oriented process called scientific workflow. However, such workflows may have to be designed and revised among multiple scientists over a long time period. Existing workbenches are single-user-oriented and do not support scientific workflow application development in a collaborative fashion. In this paper, we report our research
on the enabling techniques in the aspects of collaboration provenance management and reproducibility. Based on a scientific collaboration ontology, we propose a service-oriented collaboration model supported by a set of composable collaboration primitives and patterns. The collaboration protocols are then applied to support effective concurrency control in the process of collaborative workflow composition. We also report the design and development of Confucius, a service-oriented collaborative scientific workflow composition tool that extends an open-source, single-user development environment. Extended XML Tree Pattern Matching: Theories and Algorithms. Synopsis: As businesses and enterprises generate and exchange XML data more often, there is an increasing need for efficient processing of queries on XML data. Searching for the occurrences of a tree pattern query in an XML database is a core operation in XML query processing. Prior work demonstrates that holistic twig pattern matching is an efficient technique for answering an XML tree pattern with parent-child (P-C) and ancestor-descendant (A-D) relationships, as it can effectively control the size of intermediate results during query processing. However, XML query languages (e.g., XPath and XQuery) define more axes and functions, such as the negation function, order-based axes, and wildcards. In this paper, we study a large class of XML tree patterns, called extended XML tree patterns, which may include P-C and A-D relationships, negation functions, wildcards, and order restrictions. We establish a theoretical framework around "matching cross" which demonstrates the intrinsic reason behind the optimality of holistic algorithms.
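For readers unfamiliar with the two axes mentioned above, the difference between a parent-child and an ancestor-descendant step can be seen on a toy document. This uses Python's DOM-style ElementTree purely for illustration; holistic twig algorithms actually operate on region-encoded element streams, not on an in-memory tree.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<lib><book><title>XML</title><info><author>Liu</author></info></book></lib>")

# P-C step: title must be a direct child of book.
pc = [e.tag for e in doc.findall("./book/title")]
# A-D step: author may appear at any depth below the context node.
ad = [e.tag for e in doc.findall(".//author")]
print(pc, ad)  # ['title'] ['author']
```

Note that `./book/author` would match nothing here, since author is a grandchild of book; only the descendant axis reaches it.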
Based on our theorems, we propose a set of novel algorithms to efficiently process three categories of extended XML tree patterns. Experimental results on both real-life and synthetic data sets demonstrate the effectiveness and efficiency of our proposed theories and algorithms. Efficient Ranking on Entity Graphs with Personalized Relationships. Synopsis: Authority flow techniques like PageRank and ObjectRank can provide personalized ranking of typed entity-relationship graphs. There are two main ways to personalize authority flow ranking: node-based personalization, where authority originates from a set of user-specific nodes; and edge-based personalization, where the importance of different edge types is user-specific. We propose the first approach to achieve efficient edge-based personalization, using a combination of precomputation and runtime algorithms. In particular, we apply our method to ObjectRank,
where a personalized weight assignment vector (WAV) assigns different weights to each edge type or relationship type. Our approach includes a repository of rankings for various WAVs. We consider the following two classes of approximation: (a) SchemaApprox is formulated as a distance minimization problem at the schema level; (b) DataApprox is a distance minimization problem at the data graph level. SchemaApprox is not robust, since it does not distinguish between important and trivial edge types based on the edge distribution in the data graph. In contrast, DataApprox has a provable error bound. Both SchemaApprox and DataApprox are expensive, so we develop efficient heuristic implementations, ScaleRank and PickOne respectively. Extensive experiments on the DBLP data graph show that ScaleRank provides a fast and accurate personalized authority flow ranking. A Survey of XML Tree Patterns. Synopsis: With XML becoming a ubiquitous language for data interoperability purposes in various domains, efficiently querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over data trees. They are actually matched against an input data tree to answer a query. Since the turn of the 21st century, a considerable research effort has focused on tree pattern models and matching optimization (a primordial issue). This paper is a comprehensive survey of these topics, in which we outline and compare the various features of tree patterns. We also review and discuss the two main families of approaches for optimizing tree pattern matching, namely pattern tree minimization and holistic matching.
We finally present actual tree pattern-based developments, to provide a global overview of this significant research topic. Coupled Behavior Analysis for Capturing Coupling Relationships in Group-based Market Manipulations. Synopsis: Coupled behaviors, which refer to behaviors having some relationships between them, are usually seen in many real-world scenarios, especially in stock markets. Recently, the coupled hidden Markov model (CHMM)-based coupled behavior analysis has been proposed to consider the coupled relationships in a hidden state space. However, it requires
aggregation of the behavioral data to cater for the CHMM modeling, which may overlook the couplings within the aggregated behaviors to some extent. In addition, the Markov assumption limits its capability to capture temporal couplings. Thus, this paper proposes a novel graph-based framework for detecting abnormal coupled behaviors. The proposed framework represents the coupled behaviors in a graph view without aggregating the behavioral data, and is flexible enough to capture richer coupling information about the behaviors (not necessarily temporal relations). On top of that, the couplings are learned via relational learning methods, and an efficient anomaly detection algorithm is proposed as well. Experimental results on a real-world data set from stock markets show that the proposed framework outperforms the CHMM-based one in both technical and business measures. Group Enclosing Queries. Synopsis: Given a set of points P and a query set Q, a group enclosing query (GEQ) fetches the point p* ∈ P such that the maximum distance of p* to all points in Q is minimized. This problem is equivalent to the Min-Max case (minimizing the maximum distance) of aggregate nearest neighbor queries for spatial databases. This work first designs a new exact solution by exploring new geometric insights, such as the minimum enclosing ball, the convex hull, and the furthest Voronoi diagram of the query group. To further reduce the query cost, especially as the dimensionality increases, we turn to approximation algorithms. Our main approximation algorithm has a worst-case √2-approximation ratio if one can find the exact nearest neighbor of a point. In practice, its approximation ratio never exceeds 1.05 for a large number of data sets of up to six dimensions.
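The GEQ definition above admits a one-line brute-force baseline. This sketch only pins down the Min-Max semantics; the paper's contribution is precisely avoiding this linear scan via geometric pruning and approximation.

```python
import math

def geq(points, query_group):
    """Brute-force group enclosing query: return the point of `points`
    whose maximum distance to all query points is smallest
    (the Min-Max aggregate nearest neighbor)."""
    return min(points, key=lambda p: max(math.dist(p, q) for q in query_group))

P = [(0, 0), (5, 5), (2, 2)]
Q = [(1, 1), (3, 3)]
print(geq(P, Q))  # (2, 2): its max distance to Q is sqrt(2), the smallest
```

This costs O(|P|·|Q|) distance computations per query, which is what the exact solution's pruning structures (minimum enclosing ball, convex hull, furthest Voronoi diagram) are designed to beat.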
We also discuss how to extend it to higher dimensions (up to 74 in our experiments) and show that it still maintains very good approximation quality (still close to 1) and low query cost. In fixed dimensions, we extend the √2-approximation algorithm to obtain a (1 + ε)-approximate solution for the GEQ problem. Both approximation algorithms have O(log N + M) query cost in any fixed dimension, where N and M are the sizes of the data set P and query group Q. Extensive experiments on both synthetic and real data sets, with up to 10 million points and 74 dimensions, confirm the efficiency, effectiveness, and scalability of the proposed algorithms, especially their significant improvement over the state-of-the-art method. Facilitating Document Annotation Using Content and Querying Value. Synopsis: A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount
of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, information that will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, and that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within a document, by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest. A System to Filter Unwanted Messages from OSN User Walls. Synopsis: One fundamental issue in today's Online Social Networks (OSNs) is giving users the ability to control the messages posted on their own private space, to avoid unwanted content being displayed. Up to now, OSNs have provided little support for this requirement.
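A wall filter of the kind motivated above can be sketched as a toy rule check. The rule format, user names, and labels here are illustrative assumptions, not the paper's actual system; in particular, the label is assumed to come from some upstream content classifier.

```python
# Per-user filtering rules: blocked labels would come from a (hypothetical)
# soft classifier over message content; blocked words are user-chosen.
USER_RULES = {"alice": {"blocked_labels": {"vulgar", "violent"},
                        "blocked_words": {"spam"}}}

def allow_on_wall(owner, text, label):
    """Return True if `text` (classified as `label`) may be displayed on
    `owner`'s wall under that user's filtering rules."""
    rules = USER_RULES.get(owner, {})
    if label in rules.get("blocked_labels", set()):
        return False
    return not (set(text.lower().split()) & rules.get("blocked_words", set()))

print(allow_on_wall("alice", "great spam offer", "neutral"))  # False
print(allow_on_wall("alice", "nice photo", "neutral"))        # True
```

The point of the design is the separation of concerns: users author rules, while a machine-learning classifier supplies the content labels the rules test against.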
To fill the gap, in this paper we propose a system allowing OSN users to have direct control over the messages posted on their walls. This is achieved through a flexible rule-based system that allows users to customize the filtering criteria to be applied to their walls, and a machine learning-based soft classifier that automatically labels messages in support of content-based filtering. Creating Evolving User Behaviour Profiles Automatically. Synopsis: Knowledge about computer users is very beneficial for assisting them, predicting their future actions, or detecting masqueraders. In this paper, a new approach for automatically creating and recognizing the behavior profile of a computer user is presented. In this case, a computer user's behavior is represented as the sequence of commands she/he types during her/his work. This sequence is transformed into a distribution of relevant subsequences of commands in order to find a profile that defines the user's behavior. Also,
because a user profile is not necessarily fixed but rather evolves/changes, we propose an evolving method to keep the created profiles up to date, using an Evolving Systems approach. In this paper, we combine the evolving classifier with trie-based user profiling to obtain a powerful self-learning online scheme. We also further develop the recursive formula for the potential of a data point to become a cluster center using cosine distance, which is provided in the Appendix. The novel approach proposed in this paper is applicable to any problem of dynamic/evolving user behavior modeling where the behavior can be represented as a sequence of actions or events. It has been evaluated on several real data streams. Load Shedding in Mobile Systems with MobiQual. Synopsis: In location-based mobile continual query (CQ) systems, two key measures of quality-of-service (QoS) are freshness and accuracy. To achieve freshness, the CQ server must perform frequent query reevaluations. To attain accuracy, the CQ server must receive and process frequent position updates from the mobile nodes. However, it is often difficult to obtain fresh and accurate CQ results simultaneously, due to 1) limited resources in computing and communication and 2) fast-changing load conditions caused by continuous mobile node movement. Hence, a key challenge for a mobile CQ system is: how do we achieve the highest possible quality of the CQ results, in both freshness and accuracy, with the currently available resources? In this paper, we formulate this problem as a load shedding one, and develop MobiQual, a QoS-aware approach to performing both update load shedding and query load shedding. The design of MobiQual highlights three important features.
1) Differentiated load shedding: we apply different amounts of query load shedding and update load shedding to different groups of queries and mobile nodes, respectively. 2) Per-query QoS specification: individualized QoS specifications are used to maximize the overall freshness and accuracy of the query results. 3) Low-cost adaptation: MobiQual dynamically adapts, with minimal overhead, to changing load conditions and available resources. We conduct a set of comprehensive experiments to evaluate the effectiveness of MobiQual. The results show that, through a careful combination of update and query load shedding, the MobiQual approach leads to much higher freshness and accuracy in the query results in all cases, compared to existing approaches that lack the QoS-awareness properties of MobiQual, as well as to solutions that perform query-only or update-only load shedding. Fast Nearest Neighbor Search with Keywords. Synopsis:
Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions on objects' geometric properties. Today, many modern applications call for novel forms of queries that aim to find objects satisfying both a spatial predicate and a predicate on their associated texts. For example, instead of considering all the restaurants, a nearest neighbor query would ask for the restaurant that is the closest among those whose menus contain "steak, spaghetti, brandy" all at the same time. Currently, the best solution to such queries is based on the IR2-tree, which, as shown in this paper, has a few deficiencies that seriously impact its efficiency. Motivated by this, we develop a new access method called the spatial inverted index that extends the conventional inverted index to cope with multidimensional data, and comes with algorithms that can answer nearest neighbor queries with keywords in real time. As verified by experiments, the proposed techniques outperform the IR2-tree in query response time significantly, often by orders of magnitude. Annotating Search Results from Web Databases. Synopsis: An increasing number of databases have become web-accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine-processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels.
In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective. Credibility Ranking of Tweets during High Impact Events. Synopsis: Twitter has evolved from a medium for conversation or opinion sharing among friends into a platform to share and disseminate information about current events. Events in the real world create a corresponding surge of posts (tweets) on Twitter. Not all content posted on Twitter is trustworthy or useful in providing information about the event. In this paper, we analyzed the credibility of information in tweets corresponding to fourteen high impact news