Statistical Analysis based Hypothesis Testing Method in Biological Knowledge Discovery (ijcsa)
The correlations and interactions among different biological entities comprise a biological system. Although already-revealed interactions contribute to the understanding of existing systems, researchers face many questions every day regarding the inter-relationships among entities. Their queries can play a role in exploring new relations that may open up new areas of investigation. In this paper, we introduce a text-mining-based method for answering biological queries through statistical computation, so that researchers can arrive at new knowledge discoveries. It allows users to submit queries in natural linguistic form, where each query is treated as a hypothesis. Our proposed approach analyzes the hypothesis and measures its p-value with respect to the existing literature. Based on the measured value, the system either accepts or rejects the hypothesis from a statistical point of view. Moreover, even if it does not find any direct relationship among the entities of the hypothesis, it presents a network giving an integral overview of all the entities through which they might be related. This also helps researchers widen their view and formulate new hypotheses for further investigation. The method gives researchers a quantitative evaluation of their assumptions so that they can reach a logical conclusion, and thus aids biological knowledge discovery. The system also provides a graphical interactive interface for submitting hypotheses for assessment in a more convenient way.
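The abstract does not spell out the statistical computation, but a minimal sketch of how a co-occurrence hypothesis over a literature corpus could be scored is a one-sided hypergeometric test; the function name and the counts below are illustrative assumptions, not the authors' implementation:

```python
from math import comb

def cooccurrence_p_value(n_both, n_a, n_b, n_total):
    """One-sided hypergeometric p-value: probability of seeing at least
    n_both documents that mention both entities, given that n_a documents
    mention entity A and n_b mention entity B, out of n_total documents."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(n_both, upper + 1))
    return tail / comb(n_total, n_b)

# Hypothetical counts: 50 papers mention A, 40 mention B, 10 mention both.
p = cooccurrence_p_value(10, 50, 40, 1000)
```

A small p-value would lead the system to accept the hypothesis that the two entities are related more often than chance would predict.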
National Resource for Networks Biology's TR&D Theme 3: Although networks have been very useful for representing molecular interactions and mechanisms, network diagrams do not visually resemble the contents of cells. Rather, the cell involves a multi-scale hierarchy of components – proteins are subunits of protein complexes which, in turn, are parts of pathways, biological processes, organelles, cells, tissues, and so on. In this technology research project, we will pursue methods that move Network Biology towards such hierarchical, multi-scale views of cell structure and function.
New kinds of intrusions cause deviations in the normal behaviour of traffic flows in computer networks every day. This study focused on enhancing the learning capabilities of an IDS to detect anomalies in network traffic flows by comparing the k-means approach of data mining for intrusion detection with an outlier-detection approach. The k-means approach uses clustering to group traffic-flow data into normal and abnormal clusters. Outlier detection calculates an outlier score (the neighbourhood outlier factor, NOF) for each flow record, whose value decides whether a traffic flow is normal or abnormal. The two methods were then compared in terms of various performance metrics and the amount of computer resources they consume. Overall, k-means was more accurate and precise and had a better classification rate than outlier detection for intrusion detection using traffic flows. This will help systems administrators in their choice of IDS.
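As an illustration of the clustering step only (not the study's actual code, features, or dataset), a bare-bones k=2 k-means pass over toy flow-feature vectors could look like this; the deterministic min/max seeding is an assumption made for reproducibility:

```python
def two_means(points, iters=20):
    """Minimal k=2 k-means: split flow-feature vectors into two clusters,
    seeded with the lexicographically smallest and largest points."""
    centroids = [min(points), max(points)]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            # Assign each point to the nearest centroid (squared distance).
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Recompute centroids; keep the old one if a cluster emptied out.
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters

# Four "normal" flows near the origin, two anomalous ones far away.
flows = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.2), (0.1, 0.1), (10.0, 10.0), (9.5, 10.5)]
normal, abnormal = two_means(flows)
```

In the study's setting, whichever cluster matches known-benign traffic would be labelled normal and the other abnormal.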
Technology R&D Theme 2: From Descriptive to Predictive Networks (Alexander Pico)
National Resource for Networks Biology's TR&D Theme 2: Genomics is mapping complex data about human biology and promises major medical advances. However, the routine use of genomics data in medical research is in its infancy, due mainly to the challenges of working with highly complex “big data”. In this theme, we will use network information to help organize, analyze and integrate these data into models that can be used to make clinically relevant diagnoses and predictions about an individual.
The quality of online news articles is decisive both for a reliable perception of their informativeness and for including them as references when creating an encyclopaedia entry for a public-attention event. To tackle the enormous volume, variety and complexity of articles disseminated online, several natural language processing techniques have been developed to capture the quality of web content, based on concepts such as objectivity classification and stylometric features, knowledge maturing, factual density, or simple word count. This paper utilizes factual density as a quality measure of the information reported on the missing Malaysia Airlines Flight 370, a public-attention event, at two points in time: the coverage during the initial investigation and the reporting marking the first anniversary of the plane's disappearance. The results suggest that factual density can be used in creating a high-quality encyclopaedia entry, albeit under strict conditions regarding the confidence level of the automated fact extraction.
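Factual density is simply the number of extracted facts per word. The paper depends on automated fact extraction; the stand-in extractor below (counting sentences that contain a number) is a deliberately naive illustration of the measure, not the extraction method the paper uses:

```python
import re

def factual_density(text, fact_extractor):
    """Facts per word: the quality measure used to compare article snapshots."""
    words = re.findall(r"\w+", text)
    return len(fact_extractor(text)) / max(len(words), 1)

def naive_facts(text):
    # Crude stand-in for a real fact extractor: treat each sentence
    # containing a digit as carrying one verifiable fact.
    return [s for s in re.split(r"[.!?]", text) if re.search(r"\d", s)]

factual = "Flight MH370 carried 239 people. It departed on 8 March 2014."
vague = "The mystery endures. Many theories abound."
```

Under this measure, the first snippet scores far higher than the second, matching the intuition that fact-dense reporting is better encyclopaedia source material.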
The NRNB has been funded as an NIGMS Biomedical Technology Research Resource since 2010. During the previous five-year period, NRNB investigators introduced a series of innovative methods for network biology including network-based biomarkers, network-based stratification of genomes, and automated inference of gene ontologies using network data. Over the next five years, we will seek to catalyze major phase transitions in how biological networks are represented and used, working across three broad themes: (1) From static to differential networks, (2) From descriptive to predictive networks, and (3) From flat to hierarchical networks bridging across scales. All of these efforts leverage and further support our growing stable of network technologies, including the popular Cytoscape network analysis infrastructure.
Event detection and summarization based on social networks and semantic query... (ijnlc)
Events can be characterized by a set of descriptive, collocated keywords extracted from documents. Intuitively, documents describing the same event will contain similar sets of keywords, and the keyword graph for a document collection will contain clusters corresponding to individual events. Helping users understand events is an acute problem nowadays, as users struggle to keep up with the tremendous amount of information published on the Internet every day. Detecting events from online web resources is a challenging task that is attracting growing attention. An important data source for event detection is the Web search log, because the information it contains reflects users' activities and interest in various real-world events. Major issues playing a role in event detection from web search logs include the effectiveness and efficiency of detecting events. We focus on modeling the content of events by their semantic relations with other events and on generating structured summarizations. Event mining is a useful way to understand computer-system behaviors. The focus of recent work on event mining has shifted from discovering frequent patterns to event summarization, which provides a comprehensible explanation of an event sequence based on certain aspects.
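The keyword-graph intuition above can be sketched as follows: link keywords that co-occur in a document and treat connected components as candidate events. This is a simplification for illustration; real systems weight edges and apply proper graph clustering rather than plain connectivity:

```python
from collections import defaultdict
from itertools import combinations

def event_clusters(documents):
    """documents: list of keyword lists. Keywords co-occurring in a document
    are linked; connected components approximate individual events."""
    adj = defaultdict(set)
    for keywords in documents:
        for a, b in combinations(set(keywords), 2):
            adj[a].add(b)
            adj[b].add(a)
        for k in keywords:
            adj[k]  # make sure isolated keywords appear as nodes
    # Connected components via iterative DFS.
    seen, clusters = set(), []
    for node in list(adj):
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```

Two documents sharing a keyword (e.g., "tsunami") end up in the same component, while unrelated stories form separate clusters.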
Alluding Communities in Social Networking Websites using Enhanced Quasi-clique (IJMTST Journal)
Social media is rapidly attracting a global crowd. On websites such as Facebook and Twitter, one can share, view and like posts such as images, videos and texts. Users also interact with each other. Communities are part of several such social networking websites. In a community, people can learn more about their areas of interest, share information on those topics, and discuss their perspectives. This paper proposes how a community can be suggested to a user based on an enhanced quasi-clique technique.
National Resource for Networks Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to make use of high-throughput proteomic level data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to further predict, analyze, and visualize the effects of these perturbations and alterations, while enabling researchers to aggregate additional information regarding the known roles of the involved interactions and their participants.
Community Finding with Applications on Phylogenetic Networks [Extended Abstract] (Luís Rita)
[Master Thesis Extended Abstract]
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasing amounts of data are needed. Although some software tools already exist, they either do not scale well or require advanced skills to be useful in phylogenetics.
The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and Layered Label Propagation (LLP); to benchmark them using two synthetic networks – Girvan-Newman (GN) and Lancichinetti-Fortunato-Radicchi (LFR); to test them in real networks, particularly, in one derived from a Staphylococcus aureus MLST dataset; to compare visualization frameworks – Cytoscape.js and D3.js, and, finally, to make it all available online (mscthesis.herokuapp.com).
Louvain, Infomap and LLP were implemented in JavaScript. Unless otherwise stated, the following conclusions hold for both GN and LFR. In terms of speed, Louvain outperformed all others. Considering accuracy, in networks with well-defined communities Louvain was the most accurate; for higher mixing, LLP was the best. In contrast to weakly mixed networks, it is advantageous to increase the resolution parameter in highly mixed GN networks. In LFR, higher resolution decreases detection accuracy, independently of the mixing parameter. Increasing the average node degree enhanced partitioning accuracy and suggested that detection by chance was minimized. Generating GN networks with higher mixing or average degree is computationally more intensive, whether using the algorithm developed in the thesis or the LFR implementation. In the S. aureus network, Louvain was the fastest and the most accurate in detecting the clusters of seven groups of strains directly evolved from the common ancestor.
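LLP layers a resolution parameter on top of plain label propagation, whose core idea fits in a few lines. The sketch below is a deterministic toy variant, not the thesis's JavaScript implementation; real versions randomize the node update order and break ties randomly:

```python
from collections import Counter

def label_propagation(adj, iters=20):
    """adj: {node: set(neighbours)}. Every node repeatedly adopts the most
    common label among its neighbours; stable labels define communities."""
    labels = {n: n for n in adj}
    for _ in range(iters):
        changed = False
        for n in sorted(adj):          # deterministic order for illustration
            if not adj[n]:
                continue
            counts = Counter(labels[m] for m in adj[n])
            top = max(counts.values())
            best = max(l for l in counts if counts[l] == top)  # deterministic tie-break
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    communities = {}
    for n, l in labels.items():
        communities.setdefault(l, set()).add(n)
    return list(communities.values())
```

On two triangles joined by a single edge, the labels settle into one community per triangle, which is the behavior the GN/LFR benchmarks measure at scale.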
A scalable, lexicon based technique for sentiment analysis (ijfcstjournal)
The rapid increase in the volume of sentiment-rich social media on the web has resulted in increased interest among researchers in sentiment analysis and opinion mining. With so much social media available on the web, sentiment analysis is now considered a big-data task, and conventional sentiment analysis approaches fail to efficiently handle the vast amount of sentiment data available nowadays. The main focus of this research was to find a technique that can efficiently perform sentiment analysis on big data sets: one that can categorize text as positive, negative or neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large data set of tweets using Hadoop, and the performance of the technique was measured in terms of speed and accuracy. The experimental results show that the technique is very efficient at handling big sentiment data sets.
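A lexicon-based scorer is naturally expressed as a pure per-tweet map step, which is what makes it easy to distribute over Hadoop. The tiny lexicon below is a made-up illustration, not the paper's word list:

```python
# Toy polarity lexicon (illustrative only; real lexicons hold thousands of words).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}

def classify(tweet, lexicon=LEXICON):
    """Sum per-word polarity scores; the sign of the total gives the class."""
    score = sum(lexicon.get(w.strip(".,!?").lower(), 0) for w in tweet.split())
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Because each tweet is scored independently, the map phase can run on every split of the corpus in parallel with no shared state, and the reduce phase only has to tally class counts.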
Mining in Ontology with Multi Agent System in Semantic Web: A Novel Approach (ijma)
A large amount of data is present on the web. The web contains a huge number of pages, and finding suitable information among them is a very cumbersome task. Data need to be organized in a formal manner so that users can easily access and use them. To retrieve information from documents there are many Information Retrieval (IR) techniques, but current IR techniques are not advanced enough to exploit the semantic knowledge within documents and give precise results. IR technology is a major factor in handling annotations in Semantic Web (SW) languages. With the rapid growth of the web and the huge amount of information available on it, which may be unstructured, semi-structured or structured, it has become increasingly difficult to identify relevant pieces of information on the Internet. Knowledge-representation languages are used for retrieving information, so there is a need to build an ontology using a well-defined methodology; the process of developing an ontology is called Ontology Development. Secondly, cloud computing and data mining have become prominent phenomena in the current application of information technology. With changing trends and the emergence of new concepts in the information-technology sector, data mining and knowledge discovery have proved to be of significant importance. Data mining can be defined as the process of extracting information from a database that the database does not explicitly represent, which can be used to draw generalized conclusions based on the trends observed in the data. A database may be described as a collection of formally structured data. Multi-agent data mining may be defined as the use of various agents that cooperatively interact with the environment to achieve a specified objective. Multi-agents always act on behalf of users and coordinate, cooperate, negotiate and exchange data with each other; an agent may be a software agent, a robot or a human being. Knowledge discovery can be defined as the process of critically searching large collections of data with the aim of finding patterns that can be used to draw generalized conclusions; these patterns are sometimes referred to as knowledge about the data. Cloud computing can be defined as the delivery of computing services in which shared resources, information and software are provided over a network, for example the information superhighway; it is normally provided as a web-based service that hosts all the required resources. Knowledge mining is used in many fields of study, such as science and medicine, finance, education, manufacturing and commerce. In this paper, the Semantic Web addresses the first part of this challenge by trying to make data machine-understandable in the form of an ontology, while Multi-Agent...
Resource management and search are important yet challenging in large-scale distributed systems such as P2P networks. Most existing P2P systems rely on indexing to efficiently route queries over the network. However, searches based on such indices face two key issues. First, the majority of existing search schemes rely on simple keyword-based indices that can only support exact string matches without taking the meaning of words into account. Second, it is difficult, if not impossible, to devise query-based indexing schemes that can represent all possible concept combinations without resulting in exponential index sizes. To address these problems, we present BSI, a novel P2P indexing and query-routing strategy to support semantics-based content searches. The BSI indexing structure captures the semantic content of documents using a reference ontology. Our indexing scheme can efficiently handle multi-concept queries by maintaining summary-level information for each individual concept and for concept combinations, using a novel space-efficient Two-level Semantic Bloom Filter (TSBF) data structure. By using TSBFs to represent a large document and query base, BSI significantly reduces the communication and storage costs of indices. Furthermore, we devise a low-overhead mechanism that allows peers to dynamically estimate the relevance strength of a peer for multi-concept queries with high accuracy, based solely on TSBFs. We also propose a routing-index compression mechanism that observes peers' dynamic storage limitations with minimal loss of information by exploiting the reference ontology structure. Based on the proposed index structure, we design a novel query-routing algorithm that exploits semantic information to route queries to semantically relevant peers. Performance evaluation demonstrates that our approach can improve the search recall of unstructured P2P systems by up to 383.71% while keeping communication cost low compared to the state-of-the-art search mechanism OSQR [7].
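The two-level TSBF design is specific to BSI, but its building block, a plain Bloom filter, is what lets a peer summarize which concepts its documents cover in a fixed number of bits. This generic sketch (sizes and hashing scheme are arbitrary choices, not the paper's parameters) shows the add / probabilistic-membership behavior such indices rely on:

```python
import hashlib

class BloomFilter:
    """Space-efficient set summary: membership tests may yield false
    positives but never false negatives."""

    def __init__(self, m=256, k=4):
        self.m, self.k = m, k   # m bits, k hash functions
        self.bits = 0           # bit array packed into one integer

    def _positions(self, item):
        # Derive k positions by salting a cryptographic hash with an index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

A peer would publish one such summary per concept (plus combination-level summaries in the TSBF), so a router can test "does this peer cover concept X?" without shipping the document index.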
Sensing complicated meanings from unstructured data: a novel hybrid approach (IJECEIAES)
The majority of data on computers nowadays is unstructured data and unstructured text. The inherent ambiguity of natural language makes it incredibly difficult, but also highly profitable, to find hidden information or comprehend complex semantics in unstructured text. In this paper, we present a hybrid architecture combining natural language processing (NLP) and a convolutional neural network (CNN), called automated analysis of unstructured text using machine learning (AAUT-ML), for the detection of complex semantics from unstructured data; it enables formal semantic knowledge to be extracted from an unstructured text corpus. AAUT-ML has been evaluated on three datasets, data mining (DM), operating system (OS) and database (DB), and compared with existing models, i.e., YAKE, term frequency-inverse document frequency (TF-IDF) and text-R. The results show better outcomes in terms of precision, recall, and macro-averaged F1-score. This work presents a novel method for identifying complex semantics in unstructured data.
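For context on the baselines: TF-IDF weights a term by its in-document frequency against its corpus rarity, so corpus-wide terms score zero while document-specific terms stand out. A compact sketch over a toy corpus (illustrative only, unrelated to the paper's DM/OS/DB datasets):

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns per-document {term: tf*idf} maps."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append({t: tf[t] / len(d) * math.log(n / df[t]) for t in tf})
    return scores
```

A term like "os" appearing in every document gets idf = log(1) = 0, while a term concentrated in one document dominates that document's ranking.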
The quality of online news articles is decisive both for a reliable perception of their informativeness and for
including them as a reference when creating an encyclopaedia entry for a public attention event. Tackling
the enormous volume, variety and complexity of different articles disseminated online, several natural
language processing techniques have been developed for the purpose of capturing the quality of the web
content based on the concepts of objectivity classification and stylometric features, knowledge maturing,
factual density, or simple word count. This paper explores utilizes the factual density as a quality measure
of the information reported on the missing Malaysia Airliners Flight 370 as a public attention event in two
instances, considering the coverage during the initial investigation and the reporting marking the one year
anniversary since the plane got missing. The results suggest that the factual density can be utilized in
creating a high-quality encyclopaedia entry, however, under strict conditions in terms of increased
confidence level of the automated factual extraction.
The NRNB has been funded as an NIGMS Biomedical Technology Research Resource since 2010. During the previous five-year period, NRNB investigators introduced a series of innovative methods for network biology including network-based biomarkers, network-based stratification of genomes, and automated inference of gene ontologies using network data. Over the next five years, we will seek to catalyze major phase transitions in how biological networks are represented and used, working across three broad themes: (1) From static to differential networks, (2) From descriptive to predictive networks, and (3) From flat to hierarchical networks bridging across scales. All of these efforts leverage and further support our growing stable of network technologies, including the popular Cytoscape network analysis infrastructure.
Event detection and summarization based on social networks and semantic query...ijnlc
Events can be characterized by a set of descriptive, collocated keywords extracted documents. Intuitively,
documents describing the same event will contain similar sets of keywords, and the graph for a document collection will contain clusters individual events. Helping users to understand the event is an acute problem nowadays as the users are struggling to keep up with tremendous amount of information published every day in the Internet. The challenging task is to detect the events from online web resources, it is getting more attentions. The important data source for event detection is a Web search log because the information it contains reflects users’ activities and interestingness to various real world events. There are three major issues playing role for event detection from web search logs: effectiveness, efficiency of
detected events. We focus on modeling the content of events by their semantic relations with other events
and generating structured summarization. Event mining is a useful way to understand computer system behaviors. The focus of recent works on event mining has been shifted to event summarization from discovering frequent patterns. Event summarization provides a comprehensible explanation of the event sequence based on certain aspects.
Alluding Communities in Social Networking Websites using Enhanced Quasi-cliqu...IJMTST Journal
Social media is attracting global crowd rapidly. In websites such as Facebook, twitter etc one can share, view, like posts, such as images, videos, texts. Users also interact with each other. Communities are part of few such social networking websites. In a community people can learn more about their area of interest, share information on those topics, discuss about their perspectives etc. This paper recommends how community can be suggested to a user based on enhanced quasi clique technique.
National Resource for Networks Biology's TR&D Theme 1: In this theme, we will develop a series of tools and methodologies for conducting differential analyses of biological networks perturbed under multiple conditions. The novel algorithmic methodologies enable us to make use of high-throughput proteomic level data to recover biological networks under specific biological perturbations. The software tools developed in this project enable researchers to further predict, analyze, and visualize the effects of these perturbations and alterations, while enabling researchers to aggregate additional information regarding the known roles of the involved interactions and their participants.
Community Finding with Applications on Phylogenetic Networks [Extended Abstract]Luís Rita
[Master Thesis Extended Abstract]
With the advent of high-throughput sequencing methods, new ways of visualizing and analyzing increasingly amounts of data are needed. Although some software already exist, they do not scale well or require advanced skills to be useful in phylogenetics.
The aim of this thesis was to implement three community finding algorithms – Louvain, Infomap and Layered Label Propagation (LLP); to benchmark them using two synthetic networks – Girvan-Newman (GN) and Lancichinetti-Fortunato-Radicchi (LFR); to test them in real networks, particularly, in one derived from a Staphylococcus aureus MLST dataset; to compare visualization frameworks – Cytoscape.js and D3.js, and, finally, to make it all available online (mscthesis.herokuapp.com).
Louvain, Infomap and LLP were implemented in JavaScript. Unless otherwise stated, next conclusions are valid for GN and LFR. In terms of speed, Louvain outperformed all others. Considering accuracy, in networks with well-defined communities, Louvain was the most accurate. For higher mixing, LLP was the best. Contrarily to weakly mixed, it is advantageous to increase the resolution parameter in highly mixed GN. In LFR, higher resolution decreases the accuracy of detection, independently of the mixing parameter. The increase of the average node degree enhanced partitioning accuracy and suggested detection by chance was minimized. It is computationally more intensive to generate GN with higher mixing or average degree, using the algorithm developed in the thesis or the LFR implementation. In S. aureus network, Louvain was the fastest and the most accurate in detecting the clusters of seven groups of strains directly evolved from the common ancestor.
A scalable, lexicon based technique for sentiment analysisijfcstjournal
Rapid increase in the volume of sentiment rich social media on the web has resulted in an increased
interest among researchers regarding Sentimental Analysis and opinion mining. However, with so much
social media available on the web, sentiment analysis is now considered as a big data task. Hence the
conventional sentiment analysis approaches fails to efficiently handle the vast amount of sentiment data
available now a days. The main focus of the research was to find such a technique that can efficiently
perform sentiment analysis on big data sets. A technique that can categorize the text as positive, negative
and neutral in a fast and accurate manner. In the research, sentiment analysis was performed on a large
data set of tweets using Hadoop and the performance of the technique was measured in form of speed and
accuracy. The experimental results shows that the technique exhibits very good efficiency in handling big
sentiment data sets.
Mining in Ontology with Multi Agent System in Semantic Web : A Novel Approachijma
A large amount of data is present on the web. It contains huge number of web pages and to find suitable
information from them is very cumbersome task. There is need to organize data in formal manner so that
user can easily access and use them. To retrieve information from documents, there are many Information
Retrieval (IR) techniques. Current IR techniques are not so advanced that they can be able to exploit
semantic knowledge within documents and give precise results. IR technology is major factor responsible
With the rapid growth of the web and the huge amount of information available on it, in unstructured, semi-structured, or structured form, it has become increasingly difficult to identify relevant pieces of information on the internet. IR technology is a major factor responsible for handling annotations in Semantic Web (SW) languages. Knowledge representation languages are used for retrieving information, so there is a need to build ontologies using a well-defined methodology; this process is called Ontology Development. Secondly, cloud computing and data mining have become prominent phenomena in current applications of information technology. With changing trends and emerging concepts in the information technology sector, data mining and knowledge discovery have proved to be of significant importance. Data mining can be defined as the process of extracting information from a database that is not explicitly represented in it, which can be used to draw generalized conclusions from the trends observed in the data. A database may be described as a collection of formally structured data. Multi-agent data mining may be defined as the use of multiple agents that cooperatively interact with the environment to achieve a specified objective. Multi-agents act on behalf of users and coordinate, cooperate, negotiate, and exchange data with each other. An agent may be a software agent, a robot, or a human being. Knowledge discovery can be defined as the process of critically searching large collections of data with the aim of finding patterns that can be used to draw generalized conclusions; these patterns are sometimes referred to as knowledge about the data. Cloud computing can be defined as the delivery of computing services in which shared resources, information, and software are provided over a network, for example, the information superhighway. Cloud computing is normally provided as a web-based service which hosts all the required resources. Knowledge mining is used in many fields of study, such as science and medicine, finance, education, manufacturing, and commerce. In this paper, the Semantic Web addresses the first part of this challenge by trying to make the data machine-understandable in the form of an ontology, while Multi-Agent
Resource management and search is very important yet challenging in large-scale distributed systems like P2P networks. Most existing P2P systems rely on indexing to efficiently route queries over the network. However, searches based on such indices face two key issues. First, the majority of existing search schemes rely on simple keyword-based indices that can only support exact string-based matches without taking into account the meaning of words. Second, it is difficult, if not impossible, to devise query-based indexing schemes that can represent all possible concept combinations without resulting in exponential index sizes. To address these problems, we present BSI, a novel P2P indexing and query routing strategy to support semantic-based content searches. The BSI indexing structure captures the semantic content of documents using a reference ontology. Our indexing scheme can efficiently handle multi-concept queries by maintaining summary-level information for each individual concept and concept combination using a novel space-efficient Two-level Semantic Bloom Filter (TSBF) data structure. By using TSBFs to represent a large document and query base, BSI significantly reduces the communication cost and storage cost of indices. Furthermore, we devise a low-overhead mechanism that allows peers to dynamically estimate the relevance strength of a peer for multi-concept queries with high accuracy, based solely on TSBFs. We also propose a routing index compression mechanism that observes peers' dynamic storage limitations with minimal loss of information by exploiting the reference ontology structure. Based on the proposed index structure, we design a novel query routing algorithm that exploits semantic information to route queries to semantically relevant peers. Performance evaluation demonstrates that our proposed approach can improve the search recall of unstructured P2P systems by up to 383.71% while keeping the communication cost low compared to the state-of-the-art search mechanism OSQR [7].
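The TSBF itself is not reproduced in the abstract; as a rough sketch of the underlying mechanism, a plain single-level Bloom filter can be written in a few lines of Python (the array size, hash count, and concept labels below are illustrative, not the paper's parameters):

```python
import hashlib

class BloomFilter:
    """Minimal single-level Bloom filter: k salted hashes over an m-bit array.
    Purely illustrative; the paper's TSBF adds a second, concept-combination tier."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k bit positions by salting the item with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        # May report a false positive, but never a false negative.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("ontology:Vehicle")
bf.add("ontology:Car")
assert "ontology:Car" in bf  # inserted concepts are always found
```

The asymmetry is what makes the structure safe for query routing: a false positive merely forwards a query to one peer too many, while a "no" can be trusted to prune the peer, since Bloom filters guarantee no false negatives.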
Sensing complicated meanings from unstructured data: a novel hybrid approach - IJECEIAES
The majority of data on computers nowadays is in the form of unstructured data and unstructured text. The inherent ambiguity of natural language makes it incredibly difficult, but also highly profitable, to find hidden information or comprehend complex semantics in unstructured text. In this paper, we present a hybrid architecture combining natural language processing (NLP) and a convolutional neural network (CNN), called automated analysis of unstructured text using machine learning (AAUT-ML), for the detection of complex semantics from unstructured data, enabling formal semantic knowledge to be extracted from an unstructured text corpus for different users. AAUT-ML has been evaluated using three datasets: data mining (DM), operating system (OS), and database (DB), and compared with existing models, i.e., YAKE, term frequency-inverse document frequency (TF-IDF), and text-R. The results show better outcomes in terms of precision, recall, and macro-averaged F1-score. This work presents a novel method for identifying complex semantics using unstructured data.
Annotation Approach for Document with Recommendation - ijmpict
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data comprise a significant amount of structured information, which remains buried in the unstructured text. While information extraction systems simplify the extraction of structured associations, they are frequently expensive and inaccurate, particularly when working on top of text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies structured metadata generation by recognizing documents that are likely to contain information of interest, data which will be beneficial for querying the database. Moreover, we introduce algorithms to extract attribute-value pairs and devise new mechanisms to map such pairs to manually created schemes. We apply a clustering technique to the item content information to complement the user rating information, which improves the accuracy of collaborative similarity and mitigates the cold start problem.
The Internet has become the most popular surfing environment, which increases the size of service-oriented data. As data size grows, finding and retrieving the most similar data from a large volume becomes a more difficult task. This problem is addressed by various research methods that attempt to cluster the large volume of data. In existing research, a Clustering-based Collaborative Filtering approach (ClubCF) was introduced, whose main goal is to cluster similar kinds of data together so that retrieval time cost can be reduced considerably. However, existing methods cannot find similar reviews accurately, which must be addressed for an efficient and accurate recommendation system. The proposed research method ensures this by introducing a novel technique, namely Modified Collaborative Filtering and Clustering with Regression (MoCFCR). In this method, the k-means algorithm is initially used to cluster similar movie reviewers together so that the recommendation process can be done more easily. To handle the large volume of data, this work adopts the MapReduce framework, which divides the entire dataset into subsets assigned to separate nodes with individual key values. After clustering, the clustered outcomes are merged using an inverted-index procedure in which the similarity between movies is calculated. Collaborative filtering is then applied to remove movies that are not relevant to the input. Finally, movie recommendations are made accurately using the logistic regression method. The overall evaluation of the proposed method is done in Hadoop, from which it can be shown that the proposed technique provides better outcomes than existing techniques.
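MoCFCR itself combines k-means, MapReduce, an inverted index, and logistic regression; the collaborative-filtering core alone can be illustrated with a toy memory-based sketch in Python (the users, movies, and ratings below are invented for illustration):

```python
from math import sqrt

# Toy user x movie rating matrix (names and ratings are illustrative).
ratings = {
    "alice": {"Matrix": 5, "Titanic": 1, "Inception": 4},
    "bob":   {"Matrix": 4, "Titanic": 2, "Inception": 5, "Avatar": 4},
    "carol": {"Matrix": 1, "Titanic": 5},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

# Nearest neighbour of alice by rating similarity ...
sims = {name: cosine(ratings["alice"], r)
        for name, r in ratings.items() if name != "alice"}
nearest = max(sims, key=sims.get)
# ... and the movies that neighbour rated which alice has not yet seen.
unseen = [m for m in ratings[nearest] if m not in ratings["alice"]]
```

Here alice's taste aligns with bob's, so bob's extra movie becomes the candidate recommendation; clustering reviewers first (as MoCFCR does with k-means) simply restricts this neighbour search to one cluster.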
On the benefit of logic-based machine learning to learn pairwise comparisons - journalBEEI
In recent years, many daily processes such as internet web searching, e-mail filtering, social media services, and e-commerce have benefited from machine learning techniques (ML). The implementation of ML techniques has been largely focused on black-box methods whose general conclusions are not easily interpretable. Hence, elaboration with other declarative software models to identify the correctness and completeness of the models is not easy to perform. On the other hand, the emergence of some logic-based machine learning techniques, with the advantage of a white-box approach, has proven to be well-suited for many software engineering tasks. In this paper, we propose the use of a logic-based approach to learn user preferences in the form of pairwise comparisons. APARELL, a novel approach of inductive learning, is able to model the user's preferences in a description logic representation. This offers a rich, relational representation which can then be used to produce a set of recommendations. A user study has been performed in our experiment to evaluate the implementation of a pairwise preference recommender system when compared to a standard list interface. The result of the experiment shows that the pairwise interface was significantly better than the other interface in many ways.
A recommender system using novel deep network collaborative filtering - IAESIJAI
The recommendation model aims to predict the user's preferred items among millions by analyzing user-item relations. Collaborative Filtering has been one of the most successful recommendation approaches in the last few years; however, it suffers from the issue of sparsity. This research work develops a deep network collaborative filtering (DeepNCF) model, which incorporates a graph neural network (GNN) and novel network collaborative filtering (NCF) for performance enhancement. At first, a user-item dual network is constructed; thereafter, a custom weighted dual-mode modularity is developed for edge clustering. Furthermore, the GNN is utilized for capturing the complex relations between users and items. DeepNCF is evaluated on two distinctive datasets, Amazon and MovieLens, for recall@20 and recall@50, and the normalized discounted cumulative gain (NDCG) metric is evaluated on the Amazon dataset for NDCG@20 and NDCG@50. The proposed method outperforms the most relevant research and is accurate enough to give personalized and diverse recommendations.
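The recall@k metric used in that evaluation is simple to state: of the items the user actually cares about, how many appear in the top-k recommendations. A minimal Python version (item ids are illustrative):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items that appear in the top-k recommendations."""
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / len(relevant)

# A ranked recommendation list and the user's actually-relevant items
# (ids are made up for illustration).
recommended = ["i3", "i7", "i1", "i9", "i4"]
relevant = {"i1", "i4", "i8"}
score = recall_at_k(recommended, relevant, 5)  # 2 of the 3 relevant items are in the top-5
```

NDCG@k refines this by additionally rewarding relevant items that are ranked near the top of the list rather than merely present in it.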
Towards enhancing the user experience of ChIP-Seq data analysis web tools - IJECEIAES
Deoxyribonucleic acid (DNA) sequencing is the process of determining the sequence of the main chemical bases in the DNA. Next-generation sequencing (NGS) is the state-of-the-art DNA sequencing technique. The NGS technique has advanced biological science in analyzing human DNA due to its scalability, high throughput, and speed. Analyzing human DNA is crucial to determine a person's predisposition to certain diseases and ability to respond to certain medications. ChIP-sequencing is a method that combines chromatin immunoprecipitation (ChIP) with NGS sequencing to analyze protein interactions with DNA and identify binding sites. Many online web tools have been developed to conduct ChIP-Seq data analysis to discover or find motifs, i.e., patterns of binding sites. Since these ChIP-Seq web tools need to be used by clinical practitioners, they must comply with web-related usability requirements, including effectiveness, efficiency, and satisfaction, to enhance the user experience (UX). To that end, we have conducted an empirical study to understand their UX design. Specifically, we have evaluated the usability of 8 widely used ChIP-Seq web tools against 6 known usability quality metrics. Our study shows that the design of the studied ChIP-Seq web tools does not follow UX design principles.
Advance Clustering Technique Based on Markov Chain for Predicting Next User M... - idescitation
According to the survey, India is one of the leading countries in the world for technical and management education. The number of students is increasing day by day, at a growth rate of 45% per annum. Advancement in technology has a special effect on the education system and helps in upgrading higher education. Some universities and colleges are using these technologies; the weblog is one of them. The main aim of this paper is to represent web logs using a clustering technique for predicting the next user movement and for user behavior analysis. The paper centres on a web log clustering technique based on Markov chain results; we present an approach to web clustering (clustering web site users) and predicting their behavior for the next visit. Methodology: for generating effective results, web usage data from approximately 14 engineering colleges is used, and an advanced clustering approach is presented after optimizing other clustering approaches. Results: the user behavior is predicted with the help of the advanced clustering approach based on FPCM and k-means, and the proposed algorithm is used to mine and predict users' preferred paths. Existing approaches for predicting user behavior are not sufficient because of their sensitivity to noise; with the help of ACM, noise is reduced, providing more accurate results. Approach Implementation: the algorithm was implemented in MATLAB, DTRG, and Java. The experimental results have validated the method's effectiveness in predicting user behavior in comparison with some previous studies.
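The paper's ACM algorithm (FPCM plus k-means) is not reproduced in the abstract, but the Markov-chain part of next-page prediction is easy to sketch: count page-to-page transitions in the logs, normalise each row, and predict the highest-probability successor. The sessions below are invented for illustration:

```python
from collections import defaultdict

# Toy clickstream sessions from a college web log; page names are invented.
sessions = [
    ["home", "courses", "fees"],
    ["home", "courses", "faculty"],
    ["home", "courses", "fees"],
    ["home", "admissions"],
]

# First-order Markov chain: count page-to-page transitions ...
counts = defaultdict(lambda: defaultdict(int))
for session in sessions:
    for cur, nxt in zip(session, session[1:]):
        counts[cur][nxt] += 1

# ... then normalise each row into transition probabilities.
prob = {page: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for page, nexts in counts.items()}

def predict_next(page):
    """Most probable next page given the current one."""
    return max(prob[page], key=prob[page].get)
```

Clustering users first, as the paper does, means each cluster gets its own transition matrix, so the prediction reflects the behaviour of similar users rather than the whole population.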
Framework for opinion as a service on review data of customer using semantics... - IJECEIAES
Opinion mining plays a significant role in representing the original and unbiased perception of products/services. However, there are various challenges associated with performing effective opinion mining in the present era of distributed computing systems with dynamic user behaviour. Existing approaches are laborious in extracting knowledge from user reviews, which are subjected to various rounds of operations with complex procedures. The proposed system addresses this problem by introducing a novel framework called opinion-as-a-service, which is meant for direct utilization of the extracted knowledge in a user-friendly manner. The proposed system introduces a set of three sequential algorithms that perform aggregation of the incoming stream of opinion data, indexing, and the application of semantics for extracting knowledge. The study outcome shows that the proposed system outperforms existing systems in mining performance.
A COMPREHENSIVE STUDY ON WILLINGNESS MAXIMIZATION FOR SOCIAL ACTIVITY PLANNIN... - Nexgen Technology
Behavioural Modelling Outcomes prediction using Causal Factors - IJMER
Generating models from large data sets, and determining which subsets of data to mine, is becoming increasingly automated. However, choosing what data to collect in the first place requires human intuition or experience, usually supplied by a domain expert. This paper describes a new approach to machine science which demonstrates for the first time that non-domain experts can collectively formulate features, and provide values for those features, such that they are predictive of some behavioral outcome of interest. This was accomplished by building a web platform in which human groups interact both to respond to questions likely to help predict a behavioral outcome and to pose new questions to their peers. This results in a dynamically growing online survey, and this cooperative behavior also leads to models that can predict users' outcomes based on their responses to the user-generated survey questions. Here we describe two web-based experiments that instantiate this approach: the first site led to models that can predict users' monthly electric energy consumption; the other led to models that can predict users' body mass index. As exponential increases in content are often observed in successful online collaborative communities, the proposed methodology may, in the future, lead to similar exponential rises in discovery and insight into the causal factors of behavioral outcomes.
Similar to Summary on the Conference of WISE 2013 (20)
Recommender system slides for undergraduate - Yueshen Xu
Slides for undergraduate in IR class. Presented in Chinese
Mainly focus on the background, application, real case, idea, basic method of recommender systems
This is an introduction of Topic Modeling, including tf-idf, LSA, pLSA, LDA, EM, and some other related materials. I know there are definitely some mistakes, and you can correct them with your wisdom. Thank you~
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
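For reference, the Monolithic PageRank baseline that Levelwise PageRank is compared against is a plain power iteration. A minimal Python version on a toy graph with no dead ends, the precondition the abstract mentions (graph, damping factor, and iteration count are illustrative):

```python
# Toy directed graph as adjacency lists; node ids are illustrative.
# Every node has at least one out-link, satisfying the no-dead-end precondition.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(graph)
d = 0.85  # damping factor

# Power iteration: repeatedly redistribute rank along out-links,
# mixing in a uniform teleport term of (1 - d) / n.
rank = {u: 1.0 / n for u in graph}
for _ in range(100):
    new = {u: (1 - d) / n for u in graph}
    for u, outs in graph.items():
        share = d * rank[u] / len(outs)
        for v in outs:
            new[v] += share
    rank = new

top = max(rank, key=rank.get)  # node with the highest rank
```

The levelwise variant instead processes strongly connected components in topological order, so each block converges using only already-finalised ranks from upstream blocks; with a dead end, rank would leak out of the system and this sketch's invariant (total rank stays 1) would break.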
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
2. Overview
Introduction
The 14th International Conference on Web Information Systems Engineering (WISE)
13th ~ 15th, Nanjing, China
Before: HK, Kyoto, Singapore, Roma, Brisbane, NY, Nancy, Poznan, etc.
Statistics of acceptance
Num. of Research papers: 48
Acceptance rate: 24%
Num. of Long papers: 25; Num. of Short papers: 23
10 Demos, 5 challenge reports
Come from: 38 countries around the world
07/11/14 Middleware, CCNT, ZJU 2
3. Overview
General Co-chairs
PC Co-chairs
Yahoo! Research Lab
Victoria University
University of New South Wales
Aristotle University
AT&T Lab
Industry Chairs
Google Research
HKUST
Tutorial Co-chairs
CUHK
Poznan University
4. Overview
Publicity Co-chairs
Society Representative
Aristotle University
University of New South Wales
University of Queensland
Keynote Speaker
Peking University, Academician: Towards web-based video processing
UCSB, ACM Fellow: Data-driven Methodologies for understanding, managing and analyzing Online Social Networks
5. Overview
Keynote Speaker
University of Technology, Sydney, Australia, Senior Member, IEEE: Big Data Related Research Issues and Progress
New Jersey Institute of Technology: Security of Cyber-Physical Systems
Distinguished Young Scientists Forum on Big Data
Jianmin Wang, Tsinghua Univ.
Enhong Chen, USTC
Aoying Zhou, East China Normal Univ.
Guoren Wang, Northeastern Univ.
Etc.
6. Session
Web Mining (2): 11
Web Recommendation (2): 9
Hidden Web: 4
Web Services: 4
Semi-structured Data and Modeling: 7
Social Web (2) : 11
Web Monitoring and Management: 6
Innovative Techniques and Creations (2): 8
Web Text Mining: 6
Networks and Graphs: 6
Demo (2): 5
7. Web Mining(I)
Ying Xu, Zhiqiang Gao, Campbell Wilson, Zhizheng Zhang, Man Zhu, Qiu Ji:
Entity Correspondence with Second-Order Markov Logic. 1-14
Youliang Zhong, Lan Du, Jian Yang: Learning Social Relationship Strength
via Matrix Co-Factorization with Multiple Kernels. 15-28
Shengsheng Shi, Wu Wei, Yulong Liu, Haitao Wang, Lei Luo, Chunfeng
Yuan, Yihua Huang: NEXIR: A Novel Web Extraction Rule Language toward
a Three-Stage Web Data Extraction Model. 29-42
Jun Deng, Liang Du, Yi-Dong Shen: Heterogeneous Metric Learning for
Cross-Modal Multimedia Retrieval. 43-56
Margarita Karkali, François Rousseau, Alexandros Ntoulas, Michalis
Vazirgiannis: Efficient Online Novelty Detection in News Online. 57-71
In this paper we propose a KPMCF model to learn social relationship strength based on users' latent features inferred from both profile and interaction information. The proposed model takes a unified approach, integrating Matrix Co-Factorization with Multiple Kernels. We conduct experiments on real-world data sets for typical web mining applications, showing that the proposed model produces better relationship strength measurements in comparison with other social factors.
In this paper, we propose a Bayesian personalized ranking based heterogeneous metric learning (BPRHML) algorithm, which optimizes for correctly ranking the retrieval results. It uses pairwise preference constraints as training data and explicitly optimizes for preserving these constraints. To further encourage the smoothness of learning results, we integrate graph regularization with Bayesian personalized ranking.
In this paper, we propose a new novelty detection algorithm based on the Inverse Document
Frequency (IDF) scoring function. Computing novelty based on IDF enables us to avoid similarity
comparisons with previous documents in the text online, thus leading to faster execution times. At the
same time, our proposed approach outperforms several commonly used baselines when applied on a
real-world news articles dataset.
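The idea of scoring novelty by IDF alone, without pairwise comparisons against earlier documents, can be sketched in a few lines. The smoothing and the toy news history below are illustrative, not the paper's exact formula:

```python
import math

# Toy history of already-seen news items; the novelty of a new document is the
# mean IDF of its terms over that history (rare terms imply novel content).
history = [
    "stocks fall on weak earnings",
    "stocks rise on strong earnings",
    "markets rally as stocks rise",
]

def idf(term, docs):
    """Smoothed inverse document frequency of a term over a document list."""
    df = sum(term in doc.split() for doc in docs)
    return math.log((len(docs) + 1) / (df + 1))

def novelty(doc, docs):
    """Mean IDF of a document's terms: high when its vocabulary is unseen."""
    terms = doc.split()
    return sum(idf(t, docs) for t in terms) / len(terms)

seen_again = novelty("stocks rise on earnings", history)
fresh = novelty("volcano erupts in iceland", history)
# fresh scores higher: none of its terms occur in the history.
```

Because the score depends only on per-term document frequencies, the history can be kept as a single counter table and each new document is scored in time linear in its length, which is the source of the faster execution the summary claims.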
Eric Xing, CMU
Yueting Zhuang, YanFei Wang, Fei Wu, Yin Zhang, Weiming Lu: Supervised
Coupled Dictionary Learning with Group Structures for Multi-modal
Retrieval. AAAI 2013, Regular Paper
Deng Cai, Xiaofei He, Jiawei Han, Thomas S. Huang: Graph Regularized
Nonnegative Matrix Factorization for Data Representation. IEEE Trans. Pattern
Anal. Mach. Intell. 33(8): 1548-1560 (2011)
8. Web Mining(II)
Daling Wang, Shi Feng, Dong Wang, Ge Yu: Detecting Opinion Drift from
Chinese Web Comments Based on Sentiment Distribution Computing.
72-81
Peng Zhao, Xue Li, Ke Wang: Feature Extraction from Micro-blogs for
Comparison of Products and Services. 82-91
Shahida Jabeen, Xiaoying Gao, Peter Andreae: Directional Context Helps:
Guiding Semantic Relatedness Computation by Asymmetric Word
Associations. 92-101
Jun Hou, Richi Nayak: The Heterogeneous Cluster Ensemble Method
Using Hubness for Clustering Text Documents. 102-110
Abdul Wahid, Xiaoying Gao, Peter Andreae: Exploiting User Queries for
Search Result Clustering. 111-120
The proposed approach first determines possible drift timestamps according to the change in comment numbers, computes different sentiment orientations and their distributions at these timestamps, detects opinion drift according to the distribution changes, and analyzes the influence of related events occurring at those timestamps. Extensive experiments were conducted on a real comment set from a Chinese forum.
In this paper, we present our system, OpinionAnalyzer, a novel social network analyzer designed to collect opinions from Twitter micro-blogs about two given similar products for an effective comparison between them. The system outcome is a structure of features of the given products that people have expressed opinions about; the corresponding sentiment analysis on those features is then performed. Our system can be used to understand users' preference for a certain product and the reasons why users prefer it.
We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents.
Adaboost & Bagging
George Mason
9. Web Recommendation(I)
Xin Liu: Towards Context-Aware Social Recommendation via Trust
Networks. 121-134
Weilong Yao, Jing He, Guangyan Huang, Jie Cao, Yanchun
Zhang: Personalized Recommendation on Multi-Layer Context
Graph. 135-148
Giseli Rabello Lopes, Luiz André P. Paes Leme, Bernardo Pereira
Nunes, Marco Antonio Casanova, Stefan Dietze: Recommending Tripleset
Interlinking through a Social Network Approach. 149-161
Chong Wang, Yao Shen, Huan Yang, Minyi Guo: Improving Rocchio
Algorithm for Updating User Profile in Recommender Systems. 162-174
Kai Wang, Richong Zhang, Xudong Liu, Xiaohui Guo, Hailong Sun, Jinpeng
Huai: Time-Aware Recommendation based on Tensor Factorization. 175-
188
We employ random walks to collect the most relevant ratings based on the multi-dimensional trustworthiness of users in the trust network. A factorization machines model is then applied on the collected ratings to predict missing ratings. Evaluation based on a real dataset demonstrates that our approach improves the accuracy of state-of-the-art social, context-aware and trust-aware recommendation models.
In this paper, we propose a Multi-Layer Context Graph (MLCG) model which incorporates a variety of contextual information into the recommendation process and models the interactions between users and items for better recommendation. Moreover, we provide a new ranking algorithm based on Personalized PageRank for recommendation in MLCG, which captures users' preferences and current situations. Top-K Recommendation
In this paper, we exploit a 3-way tensor to integrate context information. Based on this model, we propose a time-aware recommendation approach. In addition, a tensor factorization-based approach that maximizes the ranking performance measure is proposed for predicting possible temporal-spatial correlations.
SVM
Supervised vs. Unsupervised
10. Web Recommendation(II)
Fangfang Li, Guandong Xu, Longbing Cao, Xiaozhong Fan, Zhendong Niu:
CGMF: Coupled Group-Based Matrix Factorization for Recommender
System. 189-198
Zhengang Wu, Liangwen Yu, Huiping Sun, Zhi Guan, Zhong Chen:
Authenticating Users of Recommender Systems Using Naive Bayes. 199-
208
Junyang Rao, Aixia Jia, Yansong Feng, Dongyan Zhao: Taxonomy Based
Personalized News Recommendation: Novelty and Diversity. 209-218
Xiaochi Wei, Heyan Huang, Xin Xin, Xianxiang Yang: Distinguishing Social
Ties in Recommender Systems by Graph-Based Algorithms. 219-228
In this paper, we propose an innovative coupled group-based matrix factorization model for
recommender system by leveraging the user and item groups learned by topic modeling and
incorporating couplings between users and items and within users and items.
Given a recommendation list, we improve a user's satisfaction by introducing taxonomy-based novelty and diversity metrics to include novel but potentially related items in the list and to filter out redundant ones. The experimental results show that coarse-grained knowledge resources can help a content-based news recommender system provide accurate as well as user-oriented
recommendations. ::: Case Study
In this paper, we investigate the issue of distinguishing different users’ influence power in
recommendation systematically. We propose to employ three graph-based algorithms (including
PageRank, HITS, and heat diffusion) to distinguish and propagate the influence among the friends of
an active user, and then integrate them into the factorization-based social recommendation
framework.
Tomoharu Iwata, Amar Shah, Zoubin Ghahramani: Discovering latent influence in
online social activities via shared cascade poisson processes. 266-274, SIGKDD,
2013
11. Social Web (I)
Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Lam Ngoc Tran. An
Evaluation of Aggregation Techniques in Crowdsourcing, pp, 1-15
Zhunchen Luo, Jintao Tang and Ting Wang. Propagated Opinion Retrieval in Twitter
Meiling Wang, Xiang Zhou, Qiuming Tao, Wei Wu. Diversifying Tag
Selection Result for Tag Clouds by Enhancing both Coverage and
Dissimilarity
Zhiang Wu, Alfredo Cuzzocrea. Community Detection in Multi-relational Social Networks
Maria Giatsoglou, Despoina Chatzakou. Community Detection in Social
Networks by Leveraging Interactions and intensities
Hemank Lamba and Ramasuri Narayanam. A Novel and Model
Independent Approach for Efficient Influence Maximization in Social
Networks
We attempt to address this challenge by introducing a novel co-ranking framework,
named MutuRank. It makes full use of the mutual influence between relations and actors to transform
the multi-relational network to the single-relational network. We then present GMM-NK (Gaussian
Mixture Model with Neighbor Knowledge) based on local consistency principle to enhance the
performance of spectral clustering process in discovering overlapping communities.
In this paper we present a community detection approach for user interaction networks which
exploits both their structural properties and intensity patterns. The proposed approach builds on
existing graph clustering methods that identify both communities of nodes and outliers. The
importance of incorporating interactions’ intensity in the community detection algorithm is initially
investigated by a benchmarking process on synthetic graphs.
In this paper, we address precisely this problem by proposing a new framework which fuses both link
and interaction data to build a backbone for a given social network, which can further be
used for efficient influence maximization. We then conduct thorough experimentation with several real-
life social network datasets such as DBLP, Epinions, Digg, and Slashdot.
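Once a backbone is extracted, a greedy seed-selection step typically follows. As a deterministic stand-in for expected influence spread, this sketch greedily maximises one-hop coverage on a hypothetical backbone graph; the paper's objective and backbone construction differ.

```python
# Greedy seed selection on a backbone graph: repeatedly pick the node whose
# one-hop neighbourhood adds the most not-yet-covered nodes (a simplified
# proxy for marginal influence gain).

def greedy_seeds(neighbors, k):
    """neighbors: dict node -> set of backbone neighbours."""
    covered, seeds = set(), []
    for _ in range(k):
        best = max(neighbors,
                   key=lambda v: len(({v} | neighbors[v]) - covered))
        seeds.append(best)
        covered |= {best} | neighbors[best]
    return seeds, covered

backbone = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"},
            "d": {"c", "e"}, "e": {"d"}}
seeds, covered = greedy_seeds(backbone, k=2)  # two seeds cover all five nodes
```

The greedy scheme inherits the usual (1 - 1/e) approximation guarantee when the spread function is submodular, which is why it is the standard baseline for influence maximization.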
12. Social Web (II)
Lijiang Chen, Yibing Zhao, Shimin Chen. Personalized List
Recommendation in Twitter, pp. 88-103
John Pfaltz. The Irreducible Spine of Undirected Networks
Fotios Psallidas, Alexandros Ntoulas. SocWeb: Efficient Monitoring of
Social Network Activities, pp. 118-136
Xiang Wang, Lele Yu, and Bin Cui. A Multiple Feature Integration Model
to Infer Occupation from Social Media Records, pp. 137-150
Jinpeng Chen, Zhenyu Wu, et al. Recommending Interesting Landmarks
Based on Geo-tags from Photo Sharing Sites, pp. 151-159
To address the challenge of bootstrapping Twitter Lists, we envision a novel tool that automatically
creates personalized Twitter Lists and recommends them to users. Compared with lists created by
real Twitter users, the lists generated by our algorithms achieve 73.6% similarity. (Demo)
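One plausible way to score a generated list against a user-created one is member overlap; the Jaccard measure below is an assumption for illustration, not necessarily the similarity used to obtain the 73.6% figure.

```python
# Jaccard similarity between two Twitter Lists, treated as sets of members.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical lists: three members shared out of five distinct members.
generated = ["@alice", "@bob", "@carol", "@dave"]
real_list = ["@alice", "@bob", "@carol", "@erin"]
similarity = jaccard(generated, real_list)  # 3 / 5 = 0.6
```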
In this paper, we propose a comprehensive framework to infer user’s occupation from his/her social
activities recorded in micro-blog message streams. A multi-source integrated classification model
is set up with carefully selected features. We first identify some beneficial basic content features,
and then we proceed to tailor a community discovery based latent dimension solution to extract
community features.
By using DFCM, we can cluster a large-scale geo-tagged web photo collection into groups (or
landmarks) by location. We then provide friendlier and more comprehensive overviews for each
landmark. Subsequently, we model the users’ dynamic behaviors using the fused user similarity,
which not only captures the overall semantic similarity but also extracts the trajectory similarity and
the landmark trajectory similarity.
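The fusion step can be sketched as a weighted combination of the three similarity signals named above. The weights and signal values below are illustrative assumptions, not the paper's learned parameters.

```python
# Fuse several per-user-pair similarity signals into one score via a convex
# combination (weights assumed to sum to 1).

def fused_similarity(components, weights):
    """components/weights: dicts keyed by signal name."""
    return sum(weights[k] * components[k] for k in components)

signals = {"semantic": 0.8, "trajectory": 0.5, "landmark_trajectory": 0.2}
weights = {"semantic": 0.5, "trajectory": 0.3, "landmark_trajectory": 0.2}
score = fused_similarity(signals, weights)  # 0.40 + 0.15 + 0.04 = 0.59
```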
Social Media / Video Search
Bei Pan, Yu Zheng, David Wilkie, Cyrus Shahabi. Crowd Sensing of Traffic
Anomalies Based on Human Mobility and Social Media. SIGSPATIAL, 2013.
Jing Yuan, Yu Zheng, Xing Xie. Discovering Regions of Different Functions in a City
Using Human Mobility and POIs. SIGKDD, 2012.
13. Web Text Mining
Seema Nagar, Kanika Narang, Sameep Mehta, L. V. Subramaniam, Kuntal
Dey. Topical Discussions on Unstructured Microblogs: Analysis from a
Geographical Perspective, pp. 160-173
Lili Yang, Chunping Li, et al. Discovering Correlated Entities from News
Archives, pp. 174-187
Min Peng, Jiajia Huang, et al. High Quality Microblog Extraction Based on
Multiple Features Fusion and Time Frequency Transformation, pp. 188-201
David S. Batista, Rui Silva, Bruno Martins, et al. A Minwise Hashing Method
for Addressing Relationship Extraction from Text, pp. 216-230
Roberto Rodriguez, Victor M. Pavon, Fernando Macias, et al. Generating a
Conceptual Representation of a Legacy Web Application, pp. 231-240
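The minwise hashing paper listed above rests on a well-known primitive: two text instances whose shingle sets are similar agree on many minimum hash values, so short signatures approximate Jaccard similarity cheaply. This pure-Python sketch uses salted MD5 digests as the hash family, which is an illustrative choice rather than the paper's.

```python
# MinHash signatures over token sets, with signature agreement as an
# estimator of Jaccard similarity.
import hashlib

def minhash_signature(shingles, num_hashes=64):
    sig = []
    for i in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{i}:{s}".encode()).hexdigest(), 16)
            for s in shingles))
    return sig

def estimated_jaccard(sig_a, sig_b):
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical relation-instance contexts: true Jaccard is 3/5 = 0.6.
a = {"drug", "inhibits", "protein", "kinase"}
b = {"drug", "inhibits", "protein", "enzyme"}
est = estimated_jaccard(minhash_signature(a), minhash_signature(b))
```

With 64 hash functions the estimate concentrates around the true Jaccard, which lets relationship extraction scale to large corpora by comparing short signatures instead of full contexts.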
We identify and characterize topical discussions at different geographical granularities, such as
countries and cities. We observe geographical localization of evolution of topical discussions.
Experimental results suggest that these discussion threads tend to evolve more strongly over
geographically finer granularities: they evolve more at city levels compared to country levels, and
more at country levels compared to globally.
We propose an extraction framework to get high-quality information by considering different features
globally in social media. Specifically, in order to reduce computing time and improve extraction
precision, some important social media features are employed, transformed into the wavelet domain,
and fused further to get a weighted ensemble value. A large-scale Sina microblog dataset is used
to evaluate the framework’s performance.
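The "transform into the wavelet domain and fuse" step can be sketched with a one-level Haar transform per feature series, followed by a weighted sum of the low-frequency coefficients as the ensemble value. The feature series and weights below are illustrative assumptions.

```python
# One-level Haar transform per feature, then weighted fusion of the
# low-frequency (average) coefficients into a single ensemble series.

def haar_step(series):
    """One Haar level: pairwise (averages, differences)."""
    avgs = [(series[i] + series[i + 1]) / 2 for i in range(0, len(series), 2)]
    difs = [(series[i] - series[i + 1]) / 2 for i in range(0, len(series), 2)]
    return avgs, difs

def fuse(features, weights):
    """Weighted sum of each feature's average coefficients, per position."""
    fused = None
    for name, series in features.items():
        avgs, _ = haar_step(series)
        if fused is None:
            fused = [0.0] * len(avgs)
        for i, a in enumerate(avgs):
            fused[i] += weights[name] * a
    return fused

# Hypothetical per-microblog feature series sampled over time.
features = {"retweets": [4, 2, 8, 6], "comments": [1, 3, 5, 7]}
weights = {"retweets": 0.7, "comments": 0.3}
ensemble = fuse(features, weights)  # [0.7*3 + 0.3*2, 0.7*7 + 0.3*6]
```

Discarding the difference coefficients acts as a denoising step, which is one reason to fuse in the wavelet domain rather than on the raw series.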
14. Networks and Graphs
Shanshan Huang and Xiaojun Wan. AKMiner: Domain-Specific Knowledge
Graph Mining from Academic Literatures, pp. 241-255
Dayong Ye and Minjie Zhang. A Study on the Evolution of Cooperation in
Networks, pp. 285-298
Natwar Modani, Kuntal Dey, Ritesh Gupta, Shantanu Godbole. CDR
Analysis Based Telco Churn Prediction and Customer Behavior Insights:
A Case Study, pp. 256-269
Helan Liang, Yanhua Du, Sujian Li. An Improved Genetic Algorithm for
Service Selection under Temporal Constraints in Cloud Computing, pp.
309-318
In this paper, we propose a novel system called AKMiner (Academic Knowledge Miner) to
automatically mine useful knowledge from the articles in a specific domain, and then visually
present the knowledge graph to users. Our system consists of two major components: a) the
extraction module which extracts academic concepts and relations jointly based on Markov Logic
Network, and b) the visualization module which generates knowledge graphs, including concept-
cloud graphs and concept relation graphs.
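The Markov Logic Network extraction itself is beyond a snippet, but the concept-relation graph that AKMiner visualizes can be sketched in a much simpler form: link two extracted concepts whenever they co-occur in a sentence, weighting edges by co-occurrence count. The sentences and concept list below are illustrative assumptions, and co-occurrence is a crude stand-in for jointly extracted relations.

```python
# Build a weighted concept-relation graph from sentence-level co-occurrence.
from itertools import combinations

def concept_graph(sentences, concepts):
    edges = {}
    for sent in sentences:
        present = sorted(c for c in concepts if c in sent)
        for a, b in combinations(present, 2):
            edges[(a, b)] = edges.get((a, b), 0) + 1
    return edges

sentences = ["CRF improves NER accuracy",
             "NER pipelines often use CRF features",
             "topic models summarise corpora"]
concepts = {"CRF", "NER", "topic models"}
graph = concept_graph(sentences, concepts)  # {("CRF", "NER"): 2}
```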
In this paper, a self-organisation based strategy is proposed for the evolution of cooperation in
networks, which utilises the strengths of current strategies while avoiding their limitations.
The proposed strategy is empirically evaluated and exhibits good performance.
Moreover, we also find theoretically that, in static networks, the final proportion of cooperators evolved
by any pure strategy fluctuates cyclically, irrespective of the initial proportion of cooperators.
In this case study paper, we present our experience of participating in a competitive evaluation for
churn prediction and customer insights for a leading Asian telecom operator. We build a data mining
model to predict churners using key performance indicators (KPI) based on customer Call Detail
Records (CDR) and additional customer data available with the operator. Further, we analyze the
social network formed between the (prepaid and postpaid) churners as well as the entire subscriber
base. (Case Study)
15. Thank You!
Q&A
Summary of WISE 2013