Towards A Skills Taxonomy∗
Mohsen Sayyadiharikandeh
School of Informatics and Computing
Indiana University
Bloomington, Indiana 47405
msayyadi@indiana.edu
Raquel L. Hill
School of Informatics and Computing
Indiana University
Bloomington, Indiana 47405
ralhill@indiana.edu
ABSTRACT
When evaluating job applications, recruiters and employers
try to determine whether the information that is provided
by a job seeker is accurate and whether it describes an in-
dividual that possesses sufficient skills. These questions are
related to the hidden skills and skills resolution problems.
In this paper we argue that using a skills taxonomy to iden-
tify and resolve unknown relationships between text that
describe an applicant and job descriptions is the best way
for addressing these problems. Unfortunately, no compre-
hensive, publicly available taxonomy exists. To this end,
this work proposes an automated process for creating a skills
taxonomy.
Effective and efficient methods for bootstrapping a taxon-
omy are critical to any process that names and characterizes
the properties and interrelationships of entities. To this end,
we present three potential methods for bootstrapping and
extending our skills taxonomy. We propose a hybrid scheme
that combines the beneficial features of those methods. Our
hybrid approach seeds the bootstrapping process with pub-
licly available resources and identifies new skill terms and
corresponding entity relationships. In this paper, we focus
specifically on using Wikipedia as our corpus and exploiting
its structure to populate the taxonomy. We begin by con-
structing a relationship graph of possible skill terms from
Wikipedia. We then use a data mining methodology to iden-
tify skill terms. Our results are promising, and we are able
to achieve a 98% classification rate.
Keywords
skill, ontology, bootstrapping, job application
1. INTRODUCTION
Research data show that recruiters and employers screen
applicants based on age, class, personality, and other charac-
teristics that may not be gleaned from a resume. In an ideal
∗
Submitted to DDTA’16. Please do not circulate.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
© 2016 Copyright held by the owner/author(s).
ACM ISBN XXXX.
DOI: XXXXX
situation, one might imagine that employers hire the most
skilled applicant, but sociological research shows this is not
the case [15, 12, 19, 21, 17, 8]. On the other hand, research
also shows that job applicants lack the tools to express and
present their skills. In addition, constant changes in technology limit recruiters' ability to specify a skill set that meets job requirements [7]. Both issues contribute to the hidden skills problem, which concerns representing, identifying, and measuring skills that have not been explicitly articulated in
the job description or resume. For example, a job description may state that the applicant should know PHP, while the applicant's resume lists Symfony as a skill. Symfony is a PHP web application framework, but the person evaluating the resume may not recognize the relationship between PHP and Symfony. The applicant therefore has a hidden skill, PHP, which has not been explicitly articulated and may not be detected by the evaluator. In addition to the hidden skills problem, there is
the skills resolution problem. The skills resolution problem
involves using online social media and online blogs to iden-
tify and confirm the skills of potential applicants. Software like HiringSolved [14] scrapes social media to build a potential skill cloud for applicants. The effectiveness of such
software depends in part on the software’s ability to map
text taken from social media to specific skills.
This research seeks to address the hidden skills and skills
resolution problems by proposing a framework for devel-
oping a skills taxonomy that specifies a hierarchy of job
skills and relationships between the specified skills. The
proposed framework uses the web as its language corpus,
and uses a hybrid model to extract relationship informa-
tion from the corpus. It extends prior work by using the
structure of web text to define rules for extracting informa-
tion and to specify the relationships between skill terms. We
present three potential methods for bootstrapping and ex-
tending our skills taxonomy and propose a hybrid method
that combines the beneficial features of those methods. Our
hybrid approach seeds the bootstrapping process with pub-
licly available resources. In this paper, we focus specifically
on using Wikipedia as our corpus and exploiting its struc-
ture to populate the taxonomy. We begin by constructing
a relationship graph of possible skill terms related to the computing domain from Wikipedia. We then use a data min-
ing methodology to identify skill terms. Experiments show
promising results, and we are able to achieve a 98% classifi-
cation rate.
The remainder of this paper is organized as follows. In Section 2 we review prior efforts to create a skills taxonomy. Section 3 describes a motivating example of the skills resolution problem and how a skills taxonomy can be used to resolve a job seeker's skills from online data. Section 4 describes three different approaches for bootstrapping a taxonomy, along with our hybrid approach, and presents the results of our initial experiments. We conclude and discuss future work in Section 5.
2. RELATED WORK
The problem of creating a dataset or list of skills has been
tackled before [4, 2, 10], but the results cannot be accessed
freely for scientific projects. In addition, these datasets may
not specify the relationship between skill terms, thereby lim-
iting their use for solving the hidden skills and skills resolu-
tion problems.
Itsyourskills is a regularly updated project whose aim is to cover skills across industries and functions [4]. The project website offers an API for developers, but it is not freely available. Skill-project's objective is to build a comprehensive, organized database of all human skills [2]. Its developers claim to have built the largest multilingual skills database through the contributions of a skillful community. However, no API is available for developers to benefit from this project, and the manual process for generating the lists is time consuming.
Bastian et al. proposed a large-scale topic extraction and inference method for inferring skills from users' LinkedIn profiles [10]. They note that "Skills" is a data-driven feature on LinkedIn, meaning that the LinkedIn recommender engine can use member data to propose skills. However, LinkedIn also allows members to specify their skills and endorse (verify) the skills of others. LinkedIn offers an API to developers, but skills are not among the attributes of basic profile data [1] and cannot be scraped.
Though there are instances of skill data sets, as previously
mentioned, they are not publicly available or they do not
contain comprehensive relationship information.
3. MOTIVATING EXAMPLE: SKILLS RES-
OLUTION PROBLEM
In this section we present a simple example of the skill res-
olution problem and discuss how a skill taxonomy provides
an effective solution for this problem. We use active users of
Stack Overflow [3] who have LinkedIn profiles to illustrate
this problem. Our method for identifying skills involves creating a skill cloud from the content of the replies that a user posts in the Stack Overflow forum. User responses are evaluated by the forum's community and assigned ratings of technical relevance; we therefore consider skills derived from highly rated responses to be reliable. We implemented three simple methods: extracting keywords from posts, using the most frequently used tags, and using tags related to a pre-defined set of skills.
For the first method of creating the skill cloud, we retrieved user posts and removed stop words. We used term frequency–inverse document frequency (TF-IDF) to measure the importance of terms and to identify possible skills.
As shown in Figure 1, this approach produced noisy results.
There were many irrelevant terms in the keyword cloud,
which limits its usefulness as a skill cloud.
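The TF-IDF weighting described above can be sketched in a few lines of Python. This is a minimal illustration over a toy post collection; the real pipeline would run over a user's full Stack Overflow history:

```python
import math
from collections import Counter

def tfidf_scores(posts, stop_words):
    """Score terms in each post by TF-IDF; posts is a list of token lists."""
    n = len(posts)
    # Document frequency: in how many posts does each term appear?
    df = Counter()
    for post in posts:
        df.update(set(post) - stop_words)
    scores = []
    for post in posts:
        tf = Counter(t for t in post if t not in stop_words)
        scores.append({t: (c / len(post)) * math.log(n / df[t])
                       for t, c in tf.items()})
    return scores

posts = [["php", "apache", "rewrite", "php"],
         ["python", "list", "php"],
         ["python", "pandas", "dataframe"]]
scores = tfidf_scores(posts, stop_words={"the", "a"})
# Terms unique to one post score higher than terms shared across posts.
```

As in our experiment, terms that occur across many documents (like 'php' here) are down-weighted, while terms specific to one post stand out.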
Figure 1: Keyword cloud using posts authored by the user

To refine the noisy results of the first method, our second method uses Stack Overflow's embedded structure to identify relevant skill terms. A useful feature of Stack Overflow
is that the questions are associated with tags selected by the users. Tags are words or phrases describing the topic of the question, and up to 5 tags are allowed per question. We created another cloud for the same user, using the frequency of tags in the user's replies as the importance indicator. Using tag frequency yields more relevant terms and less noise, as illustrated in Figure 2.
Words like 'mod-rewrite' or '.htaccess' are not relevant skill terms, and therefore an automatic, intelligent way of filtering is needed. For our third method, we manually created a simple list of computer science related skills (a flat table with no complex structure) and used it to enhance the accuracy of our skill cloud. We simply included in our skill cloud those tags from the user's Stack Overflow posts that also appeared in the skills list. Figure 3 shows that the irrelevant words are filtered out.
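The third method reduces to intersecting a tag-frequency table with the skill list. A minimal sketch, with a hypothetical skill list standing in for our manually created one:

```python
from collections import Counter

# Hypothetical flat skill list (a stand-in for our manually created table).
SKILLS = {"php", "mysql", "python", "css", "javascript"}

def skill_cloud(tag_lists, skills=SKILLS):
    """Keep only tags that appear in the skill list, weighted by frequency."""
    tags = Counter(t.lower() for tags in tag_lists for t in tags)
    return {t: c for t, c in tags.items() if t in skills}

answers = [["php", "mod-rewrite"], ["php", ".htaccess", "mysql"], ["css"]]
cloud = skill_cloud(answers)
# → {'php': 2, 'mysql': 1, 'css': 1}; 'mod-rewrite' and '.htaccess' dropped
```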
Figure 2: Tag cloud using frequent tags in posts authored by the user
However, the weakness of this method is that it discards possibly relevant words that have no exact match in the skills data set. Such terms may allow us to infer additional skills through further analysis. For example, tags like 'Array' or 'List', which indicate that the user has some knowledge of 'data structures', can be used to infer such knowledge. For an automatic agent to perform this inference task, we would need a robust taxonomy that contains different entities with embedded relationships between them. Contextual domain information may ease the design of such a taxonomy, but populating the instances for
each entity would be a time-consuming task. In this paper we explore three potential approaches for bootstrapping a skill taxonomy from an initial set of skills. We also propose a hybrid approach that seeds the bootstrapping process with publicly available data and identifies candidate skill terms.

Figure 3: Tag cloud using frequently associated tags in posts with the skill dataset
4. METHODOLOGY
Based on prior works we propose to develop a skills tax-
onomy and a method capable of automatically bootstrap-
ping the taxonomy. Bootstrapping is a process for learning
relationship rules that alternates between learning rules or
rule accuracy from sets of instances of included entities and
finding instances using sets of rules. We studied different
approaches for determining relationships between skills, and
discovered several that in combination could help with creat-
ing and bootstrapping our skills taxonomy. We present these
works below and propose a hybrid approach that leverages
the benefits of these prior works.
4.1 Word2Vec
Word2Vec is a tool that creates a language model that
enables a user to measure the semantic similarity between a
queried word and words found in the input corpus [18]. The
resulting language model contains continuous vector representations of words, and these vectors can be used to measure word similarity. This feature can be employed in our project to bootstrap an initial data set of skills. Another useful property of the tool is that context can be used to filter out irrelevant terms.
Word2Vec can be trained on a corpus like Wikipedia or on any other text corpus. We conducted preliminary experiments using both Wikipedia (the text8 corpus, which contains 100 megabytes of Wikipedia content) and our own corpus of job descriptions scraped from popular job advertisement websites. Initial experiments with Word2Vec produced noisy results containing terms that did not fit our context. For example, our experiment assumed a computer science, programming languages context; when querying 'python' with Wikipedia as our corpus, we get a list of words containing monty, moby, dracula, and circus. Table 1 shows the top 10 most similar words to 'python'.
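Under the hood, a Word2Vec similarity query ranks vocabulary words by the cosine similarity of their vectors. The following sketch illustrates this with tiny hand-made vectors; the numbers are purely illustrative, not taken from any trained model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, topn=3):
    """Rank vocabulary words by cosine similarity to the query word."""
    q = vectors[query]
    ranked = sorted(((w, cosine(q, v)) for w, v in vectors.items() if w != query),
                    key=lambda p: p[1], reverse=True)
    return ranked[:topn]

# Toy 3-d vectors (illustrative only; real models use 100+ dimensions).
vectors = {"python": [0.9, 0.1, 0.2], "perl": [0.8, 0.2, 0.3],
           "php": [0.7, 0.3, 0.2], "circus": [0.1, 0.9, 0.8]}
top = most_similar("python", vectors)  # perl and php rank above circus
```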
We also created our own corpus using job descriptions
collected by using the Bing search engine API [6] to scrape
the Indeed.com website. Table 2 shows the results. While the terms seem more related to computer science, not all of them can be considered skills. To increase the overall accuracy of these results, we propose to use the structure of web text, such as Wikipedia articles, to specify semantic relationships between terms, thereby enabling us to filter out irrelevant terms.

Table 1: Top 10 most similar words to 'python' from Word2Vec using a pre-trained model on the text8 corpus

Term        Similarity
xml         0.84
PHP         0.81
JavaScript  0.81
txt         0.81
pdf         0.80
ldap        0.79
Xhtml       0.78
Searchable  0.77
Perl        0.75
ASCII       0.73

Table 2: Top 10 most similar words to 'python' from Word2Vec using a model trained on our own corpus

Term             Similarity
Object-oriented  0.78
Interfaces       0.75
Implementations  0.75
Compilers        0.71
Scripting        0.71
Functionality    0.71
PHP              0.70
javascript       0.70
Fortran          0.69
Programmers      0.69
4.2 Exploiting Wikipedia Structure
The information retrieval and semantic web research communities often rely on Wikipedia as an external source of knowledge. Researchers use the content and embedded structure of its articles to enhance the performance of tasks like query expansion [9] and named entity recognition [16, 20]. Mining the wiki can enhance our bootstrapping process by providing a hierarchy of semantic relationships between terms.
We consider Wikipedia a convenient knowledge source because it is a large corpus containing millions of articles about named entities. Wikipedia is perhaps the biggest knowledge base in computing that can be mined in a more structured way than search engine results. In addition, it has broader coverage than well-known ontologies like WordNet [22]. Our interest is to analyze the content of Wikipedia and
its internal structure in order to bootstrap our skills taxon-
omy. We propose to use the Wikipedia API [5], a web service
that provides convenient access to wiki features, data, and
meta-data. The Wikipedia API enables us to retrieve the
corresponding Wikipedia entry for each candidate word and
extract categories of the associated article or the content of
it. For instance, we use the first sentence of the entry (via a simple lexico-syntactic pattern) to build a hierarchy of concepts and identify the relationships between terms. We use this relationship information to create a graph of terms, with links in the graph indicating a relationship between two adjacent terms.
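As a rough illustration of such a lexico-syntactic pattern, the sketch below extracts a (term, hypernym) pair from an "X is a Y" first sentence; the regular expression is a simplified stand-in for the patterns we actually use:

```python
import re

# A minimal lexico-syntactic pattern over a Wikipedia-style first sentence:
# "<Term> is a|an <head noun phrase> ..." yields a (term, hypernym) edge.
PATTERN = re.compile(
    r"^(?P<term>.+?) is an? (?P<hypernym>[a-z][\w -]*?)(?: that| which| for|[.,])")

def hypernym_edge(first_sentence):
    """Return (term, hypernym) if the sentence matches, else None."""
    m = PATTERN.match(first_sentence)
    return (m.group("term"), m.group("hypernym").strip()) if m else None

s = "Python is a general-purpose programming language that emphasizes readability."
print(hypernym_edge(s))  # ('Python', 'general-purpose programming language')
```

Each extracted edge becomes a link in the relationship graph, with the hypernym one level up in the hierarchy.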
Relationships between terms in Wikipedia can be derived in many other ways, including using HTML hyperlinks within articles as indicators of relationships. That approach, however, produces a lot of noise in the graph, which must be filtered out because links can carry semantic meanings far beyond the relationships expected in our taxonomy. In addition to links, the Wikipedia "category" tag may be used to indicate a relationship between terms. Using the category tag may still produce a large graph in terms of branching factor and multiple-inheritance relations, which creates obstacles for searching and mining [22]. In our project we decided to use this tag for creating the Wikipedia graph.
We used the Wikipedia API to start scraping from the article titled 'Python (programming language)'. Using category relationships, we scraped the parent categories and subcategories of each category on paths of length up to 2 from the starter node (the Python article). The resulting undirected graph had 1359 nodes and 1581 edges.
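The category crawl can be sketched as a bounded breadth-first search. In this illustration a toy dictionary stands in for the Wikipedia API call that fetches an article's categories:

```python
from collections import deque

# Stub standing in for a Wikipedia API request for an article's categories;
# the real implementation would fetch them over HTTP.
TOY_CATEGORIES = {
    "Python (programming language)": ["Scripting languages"],
    "Scripting languages": ["Programming language families"],
    "Programming language families": [],
}

def build_category_graph(start, get_categories, max_depth=2):
    """Breadth-first crawl of category links up to max_depth from start."""
    nodes, edges = {start}, set()
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth bound
        for cat in get_categories(page):
            edges.add((page, cat))
            if cat not in nodes:
                nodes.add(cat)
                queue.append((cat, depth + 1))
    return nodes, edges

nodes, edges = build_category_graph("Python (programming language)",
                                    TOY_CATEGORIES.__getitem__)
```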
Two human annotators manually annotated the nodes in the graph, assigning each node a binary value of either 1 or 0, with 1 indicating a new skill term. We used the annotated data set as a training set for machine learning algorithms that classify terms as either related or not related to IT skills. Approximately 80 percent of the nodes are not related to skills, while 20 percent carry some skills-related information. We used Gephi to visualize the graph and the Force Atlas algorithm for its layout. Figure 4 shows the resulting Python-related graph; nodes shown in green were labeled as skills.
Figure 4: Resulting Wikipedia graph starting from the Python article
As we can see, most of the desired nodes lie in the dense sub-graph. We defined some structure-based features and fit models using different feature sets.
For this binary classification problem we used logistic regression and a decision tree for the initial experiments. We started with a few minimal features (degree, average degree of neighbors, and existence in the k-core) and incrementally added features until we obtained satisfying results; Table 3 lists the full feature set.

Table 3: List of features

Feature                     Definition
Degree                      number of adjacent nodes
AVG degree of neighbors     average degree of adjacent nodes
Exist in k-core             whether the node exists in the 2-core subgraph
No. of neighbors as skills  number of adjacent skill nodes within a path of length 2
Path to skill               minimum path length to a skill node
Cluster no.                 id of the cluster the node is assigned to
Cosine similarity           maximum cosine similarity with skills in the dataset
No. of similar skills       number of skills whose cosine similarity is above a threshold
Target column               whether the article is related to IT skills

We used 10-fold cross-validation, randomly choosing 80 percent of the data set for training and 20 percent for testing; as is standard in machine learning, we fit the model on the training set and evaluated it on the test set.
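The structural features above are straightforward to compute from an adjacency-list representation of the graph. A sketch of three of them (degree, average neighbor degree, and 2-core membership) on a toy graph:

```python
def degree(adj, n):
    return len(adj[n])

def avg_neighbor_degree(adj, n):
    """Average degree of the node's neighbors (0.0 for isolated nodes)."""
    return sum(len(adj[m]) for m in adj[n]) / len(adj[n]) if adj[n] else 0.0

def k_core(adj, k=2):
    """Nodes of the k-core: iteratively prune nodes of degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(alive):
            if sum(1 for m in adj[n] if m in alive) < k:
                alive.discard(n)
                changed = True
    return alive

# Toy undirected graph as an adjacency dict.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
features = {n: (degree(adj, n), avg_neighbor_degree(adj, n), n in k_core(adj))
            for n in adj}
```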
Since 80 percent of the dataset carries one label, a maximum likelihood estimator (MLE) that predicts every node as non-relevant achieves 80 percent accuracy. We therefore treat MLE as a baseline against which the other algorithms are compared. Table 4 shows the accuracy, precision, recall, and F1-score of the predictions. We can also reframe our problem as a retrieval task: so far we have treated it as binary classification, with 'related to IT skills' as the 'True' label and 'not related to IT skills' as the 'False' one. We can instead consider the positively labeled nodes as the set of "relevant" examples and compute precision and recall scores accordingly. Accuracy measures the average performance of the classifier: the fraction of correct predictions. Precision is the fraction of samples predicted positive that are truly related to IT skills:
Precision = |relevant samples ∩ retrieved samples| / |retrieved samples|

Recall is the fraction of samples that are relevant to IT skills and are successfully retrieved:

Recall = |relevant samples ∩ retrieved samples| / |relevant samples|

The F1-score is defined as:

F1-score = (2 × Precision × Recall) / (Precision + Recall)
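These metrics can be computed directly from the true and predicted labels. The sketch below also shows why the MLE baseline is weak despite its high accuracy: predicting everything as non-relevant yields zero recall on the skill class.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision/recall/F1 treating label 1 ('IT skill') as relevant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A classifier that predicts everything as non-relevant (the MLE baseline)
# scores 0 on all three metrics even though its accuracy is high on skewed data.
p, r, f = precision_recall_f1([1, 0, 0, 0, 1], [0, 0, 0, 0, 0])
```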
The feature weights of the fitted logistic regression model and the most important decision tree rules show that noOfSkillsInNeighbour is the most important feature for the prediction task.
The proposed automatic method for using Wikipedia's structure to identify relevant skills is scalable and more fiscally efficient than prior approaches that used Amazon's Mechanical Turk to find the associated articles [10]. However, additional work is needed to refine the resulting list and remove articles that are related to the computing domain but cannot be considered skills.

Table 4: Performance of classifiers

Algorithm            Acc.   Prec.  Rec.   F1-score
MLE                  0.80   0.80   0.80   0.80
Logistic Regression  0.81   0.78   0.37   0.53
Decision Tree        0.98   0.98   0.94   0.96
4.3 Co-occurrence on web
Learning an ontology by web mining has been extensively
used by researchers [23, 13] Sanches and Moreno proposed
a methodology to build automatically an ontology, extract-
ing information from the World Wide Web by the help of
search engine from an initial keyword. Their method relies
extensively on a publicly available search engine and ex-
tracts concepts (based on its relation to the initial one and
statistical data about appearance) [23].
KnowItAll [11] is an information extraction system that can extract large numbers of high-quality instances of classes (for example, instances of the class City) or relations (such as instances of capitalOf(City, Country)). It is seeded with an extensible ontology and a few generic rule templates, from which it creates text extraction rules for each class and relation in its ontology. It is a Web-based information extraction system with a novel extract-and-assess architecture that uses generic lexico-syntactic patterns and Web-scale statistics to extract class and relation instances from unstructured text on the Web. KnowItAll evaluates the information it extracts using statistics computed by treating the web as a large, unstructured corpus of text.
The common component of these information extraction systems is that they query a search engine with some initial terms and measure the co-occurrence of other terms with the seed terms in the search results. We employ these ideas for learning a skills taxonomy. We can use search engine operators to focus on websites that post job advertisements, like Indeed.com and Monster.com; in other words, we can exploit the co-occurrence of terms in job descriptions for bootstrapping purposes.
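A simple form of this co-occurrence measure is the fraction of seed-containing documents in which a candidate term also appears. A sketch over hypothetical tokenized job-description snippets:

```python
from collections import Counter

def cooccurrence_scores(documents, seed, candidates):
    """Fraction of seed-containing documents in which each candidate also
    appears; documents stand in for search-engine result snippets."""
    seed_docs = [set(d) for d in documents if seed in d]
    if not seed_docs:
        return {}
    counts = Counter(c for d in seed_docs for c in candidates if c in d)
    return {c: counts[c] / len(seed_docs) for c in candidates}

# Hypothetical job-description snippets (tokenized).
docs = [["python", "django", "sql"], ["python", "django"],
        ["java", "spring"], ["python", "excel"]]
scores = cooccurrence_scores(docs, "python", ["django", "sql", "spring"])
# django co-occurs with python in 2 of 3 seed documents, spring in none
```

Candidates scoring above a threshold would be proposed as terms related to the seed skill.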
4.4 Hybrid approach
Knowing the strengths and weaknesses of different ap-
proaches we decided to use a hybrid approach to build a
more robust method for building a skills taxonomy. The
best way to combine the outputs of three methods can be
learned through experimental exploration. Again we pro-
pose a machine learning based method in this level which
takes input features from other prior levels and outputs la-
bels. During the first phase of our experimental exploration,
we will run each method with the initial terms in our taxon-
omy and collect the results from each of them. Subsequently,
we will define meta-features and feed them to the last layer
classifier in order to do the final prediction task. Perhaps
any method that we use to learn how to combine the outputs
of the three previous methods is based on the assumption
that a frequent term in a set of domain specific texts with
close structure in Wikipedia indicates occurrence of a rele-
vant concept in our taxonomy. The general framework that
we propose is shown in Figure 5:
Figure 5: Proposed framework for hybrid approach
However, this is only a macroscopic view of the hybrid method; how features from the previous layers can collaboratively help a classifier learn new skills must be determined by further experiments.
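A minimal sketch of this final combination layer, with hypothetical names and weights (in practice the weights would be learned by the final-layer classifier, not hand-set):

```python
def meta_features(term, w2v_sim, wiki_graph_score, cooccurrence):
    """One score per method becomes one meta-feature per candidate term."""
    return [w2v_sim.get(term, 0.0),
            wiki_graph_score.get(term, 0.0),
            cooccurrence.get(term, 0.0)]

def predict(features, weights=(0.4, 0.4, 0.2), threshold=0.5):
    """Label a term a skill when the weighted meta-score passes a threshold."""
    return sum(w * f for w, f in zip(weights, features)) >= threshold

# Hypothetical per-method scores for one candidate term.
x = meta_features("django", {"django": 0.8}, {"django": 0.9}, {"django": 0.7})
is_skill = predict(x)  # 0.4*0.8 + 0.4*0.9 + 0.2*0.7 = 0.82 → True
```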
5. CONCLUSIONS
Effective and efficient methods for bootstrapping a taxonomy are critical to any process that names and characterizes the properties and interrelationships of entities. We
presented three potential methods for bootstrapping and ex-
tending our skills taxonomy and propose a hybrid method
that combines the beneficial features of those methods. Our
hybrid approach seeds the bootstrapping process with pub-
licly available resources. In this paper, we focus specifically
on using Wikipedia as our corpus and exploiting its struc-
ture to populate the taxonomy. We begin by constructing
a relationship graph of possible skill terms from Wikipedia.
We identified a set of features in order to characterize the
nodes in Wikipedia graph. We were able to achieve 98% ac-
curacy with decision tree in finding the related nodes within
the graph that originated with one term.
These results are promising, but limited. As we extend our relationship graph by searching for additional terms within Wikipedia or other related corpora on the web, more noise
is introduced. We are hopeful that our fully implemented
hybrid model will eliminate irrelevant terms from the graph
and produce accurate results.
6. ACKNOWLEDGMENTS
This work is funded by National Science Foundation grant 1537768.
7. REFERENCES
[1] https://developer.linkedin.com/docs/fields/basic-
profile.
[2] http://skill-project.org.
[3] http://stackoverflow.com/.
[4] https://www.itsyourskills.com/list-of-all-skills.
[5] https://www.mediawiki.org/wiki/api:mainpage.
[6] http://www.bing.com/developers/s/apibasics.html.
[7] A rare opportunity never knocks.
[8] A. Acquisti and C. M. Fong. An experiment in hiring
discrimination via online social networks (july 17, 2015).
[9] M. Almasri, C. Berrut, and J.-P. Chevallet. Exploiting
wikipedia structure for short query expansion in cultural
heritage. In M.-F. Moens, C. Viard-Gaudin,
H. Zargayouna, and O. R. Terrades, editors,
CORIA-CIFED, pages 287–302. ARIA-GRCE, 2014.
[10] M. Bastian, M. Hayes, W. Vaughan, S. Shah,
P. Skomoroch, H. Kim, S. Uryasev, and C. Lloyd. Linkedin
skills: Large-scale topic extraction and inference. In
Proceedings of the 8th ACM Conference on Recommender
Systems, RecSys ’14, pages 1–8, New York, NY, USA,
2014. ACM.
[11] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M.
Popescu, T. Shaked, S. Soderland, D. S. Weld, and
A. Yates. Web-scale information extraction in knowitall:
(preliminary results). In Proceedings of the 13th
International Conference on World Wide Web, WWW ’04,
pages 100–110, New York, NY, USA, 2004. ACM.
[12] G. B. C. for Racial Equality, S. Carter, J. Hubbuck,
Nottingham, and D. C. R. Council. Half a chance? : a
report on job discrimination against young blacks in
Nottingham. London : Commission for Racial Equality in
association with Nottingham and District Community
Relations Council, 1980. Nottinghamshire. Nottingham.
Young coloured persons. Employment. Racial
discrimination (BNB/PRECIS).
[13] M. Hazman, S. R. El-Beltagy, and A. Rafea.
Ontology learning from domain specific web documents.
Int. J. Metadata Semant. Ontologies, 4(1/2):24–33, May
2009.
[14] HiringSolved. hiringsolved.com.
[15] R. Jowell and P. Prescott-Clarke. Racial discrimination
and white-collar workers in britain. Race Class,
11(4):397–417, 1970.
[16] J. Kazama and K. Torisawa. Exploiting wikipedia as
external knowledge for named entity recognition. In Joint
Conference on Empirical Methods in Natural Language
Processing and Computational Natural Language Learning,
pages 698–707, 2007.
[17] T. Kneese, A. Rosenblat, and D. Boyd. Understanding fair
labor practices in a networked age. October 2014.
[18] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient
estimation of word representations in vector space. CoRR,
abs/1301.3781, 2013.
[19] P. A. Riach and J. Rich. Testing for racial discrimination
in the labour market. Cambridge Journal of Economics,
15(3):239–256, 1991.
[20] A. E. Richman and P. Schone. Mining wiki resources for multilingual named entity recognition. In ACL '08, 2008.
[21] L. A. Rivera. Hiring as cultural matching: The case of elite
professional service firms. American Sociological Review,
77(6):999–1022, 2012.
[22] M. Strube and S. P. Ponzetto. Wikirelate! computing
semantic relatedness using wikipedia. In Proceedings of the
21st National Conference on Artificial Intelligence -
Volume 2, AAAI’06, pages 1419–1424. AAAI Press, 2006.
[23] D. Sánchez and A. Moreno. Creating ontologies from web documents. In Recent Advances in Artificial Intelligence Research and Development, IOS Press, 113, pp. 11–18.

MAPPING COMMON ERRORS IN ENTITY RELATIONSHIP DIAGRAM DESIGN OF NOVICE DESIGNERS
 
IRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi LanguageIRJET- Review of Chatbot System in Marathi Language
IRJET- Review of Chatbot System in Marathi Language
 
Automatic Summarization in Chinese Product Reviews
Automatic Summarization in Chinese Product ReviewsAutomatic Summarization in Chinese Product Reviews
Automatic Summarization in Chinese Product Reviews
 
Integrated expert recommendation model for online communitiesst02
Integrated expert recommendation model for online communitiesst02Integrated expert recommendation model for online communitiesst02
Integrated expert recommendation model for online communitiesst02
 
IRJET- Review of Chatbot System in Hindi Language
IRJET-  	  Review of Chatbot System in Hindi LanguageIRJET-  	  Review of Chatbot System in Hindi Language
IRJET- Review of Chatbot System in Hindi Language
 
Novel character segmentation reconstruction approach for license plate recogn...
Novel character segmentation reconstruction approach for license plate recogn...Novel character segmentation reconstruction approach for license plate recogn...
Novel character segmentation reconstruction approach for license plate recogn...
 
How and when to flatten java classes
How and when to flatten java classesHow and when to flatten java classes
How and when to flatten java classes
 
Fundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and AnswersFundamentals of Database Systems Questions and Answers
Fundamentals of Database Systems Questions and Answers
 
Meta Modelling Techniques
Meta Modelling TechniquesMeta Modelling Techniques
Meta Modelling Techniques
 
IRJET- Information Chatbot for an Educational Institute
IRJET- Information Chatbot for an Educational InstituteIRJET- Information Chatbot for an Educational Institute
IRJET- Information Chatbot for an Educational Institute
 
EXTENDING WS-CDL TO SUPPORT REUSABILITY
EXTENDING WS-CDL TO SUPPORT REUSABILITY EXTENDING WS-CDL TO SUPPORT REUSABILITY
EXTENDING WS-CDL TO SUPPORT REUSABILITY
 

Viewers also liked

Digital Catapult's Innovation Optimism Index
Digital Catapult's Innovation Optimism IndexDigital Catapult's Innovation Optimism Index
Digital Catapult's Innovation Optimism IndexCallum Lee
 
Ports Services Ref
Ports Services RefPorts Services Ref
Ports Services RefTrevor Doke
 
2013 10 sub 21 sueca resultats
2013 10 sub 21 sueca resultats2013 10 sub 21 sueca resultats
2013 10 sub 21 sueca resultatsclubbillarsueca
 
Enter & Work in Confine Space0001
Enter & Work in Confine Space0001Enter & Work in Confine Space0001
Enter & Work in Confine Space0001Trevor Doke
 
Survivorship Care Plans in the U.S.: Current Status and Future Challenges
Survivorship Care Plans in the U.S.: Current Status and Future ChallengesSurvivorship Care Plans in the U.S.: Current Status and Future Challenges
Survivorship Care Plans in the U.S.: Current Status and Future ChallengesCarevive
 
Apresentação alphaville alphaville urbanismo
Apresentação alphaville   alphaville urbanismo Apresentação alphaville   alphaville urbanismo
Apresentação alphaville alphaville urbanismo Geraldo Mendes Cerqueira
 
Enjeux de pouvoir au cœur de la ville intelligente
Enjeux de pouvoir  au cœur de la ville intelligenteEnjeux de pouvoir  au cœur de la ville intelligente
Enjeux de pouvoir au cœur de la ville intelligenteOPcyberland
 
Power point tecno
Power point tecnoPower point tecno
Power point tecnoDavidchill
 
Optimization of International student expenses at Wayne state ppt
Optimization of International student expenses at Wayne state pptOptimization of International student expenses at Wayne state ppt
Optimization of International student expenses at Wayne state pptParag Kapile
 
Current Status of Open Access Policy
Current Status of Open Access PolicyCurrent Status of Open Access Policy
Current Status of Open Access Policysmine
 
Story of Dhirubhai ambani
Story of Dhirubhai ambaniStory of Dhirubhai ambani
Story of Dhirubhai ambanirenujain1208
 
IE7610_REPORT_GROUP_8
IE7610_REPORT_GROUP_8IE7610_REPORT_GROUP_8
IE7610_REPORT_GROUP_8Parag Kapile
 
Careers in Computer Games: Game Analytics
Careers in Computer Games: Game AnalyticsCareers in Computer Games: Game Analytics
Careers in Computer Games: Game AnalyticsThomas Hulvershorn
 
Milton nascimento canção da américa
Milton nascimento   canção da américaMilton nascimento   canção da américa
Milton nascimento canção da américaClaudineiCamara
 

Viewers also liked (20)

GoMetro
GoMetroGoMetro
GoMetro
 
Digital Catapult's Innovation Optimism Index
Digital Catapult's Innovation Optimism IndexDigital Catapult's Innovation Optimism Index
Digital Catapult's Innovation Optimism Index
 
Ports Services Ref
Ports Services RefPorts Services Ref
Ports Services Ref
 
2013 10 sub 21 sueca resultats
2013 10 sub 21 sueca resultats2013 10 sub 21 sueca resultats
2013 10 sub 21 sueca resultats
 
re-disign
re-disignre-disign
re-disign
 
2013 sub21 lliure
2013 sub21 lliure2013 sub21 lliure
2013 sub21 lliure
 
Enter & Work in Confine Space0001
Enter & Work in Confine Space0001Enter & Work in Confine Space0001
Enter & Work in Confine Space0001
 
Survivorship Care Plans in the U.S.: Current Status and Future Challenges
Survivorship Care Plans in the U.S.: Current Status and Future ChallengesSurvivorship Care Plans in the U.S.: Current Status and Future Challenges
Survivorship Care Plans in the U.S.: Current Status and Future Challenges
 
Apresentação alphaville alphaville urbanismo
Apresentação alphaville   alphaville urbanismo Apresentação alphaville   alphaville urbanismo
Apresentação alphaville alphaville urbanismo
 
Agrobalkan77 ltd presentation
Agrobalkan77 ltd presentationAgrobalkan77 ltd presentation
Agrobalkan77 ltd presentation
 
Enjeux de pouvoir au cœur de la ville intelligente
Enjeux de pouvoir  au cœur de la ville intelligenteEnjeux de pouvoir  au cœur de la ville intelligente
Enjeux de pouvoir au cœur de la ville intelligente
 
PRESENTATION CYNARA CARDUNCULUS 2016.pptx
PRESENTATION CYNARA CARDUNCULUS 2016.pptxPRESENTATION CYNARA CARDUNCULUS 2016.pptx
PRESENTATION CYNARA CARDUNCULUS 2016.pptx
 
Power point tecno
Power point tecnoPower point tecno
Power point tecno
 
Optimization of International student expenses at Wayne state ppt
Optimization of International student expenses at Wayne state pptOptimization of International student expenses at Wayne state ppt
Optimization of International student expenses at Wayne state ppt
 
Current Status of Open Access Policy
Current Status of Open Access PolicyCurrent Status of Open Access Policy
Current Status of Open Access Policy
 
Story of Dhirubhai ambani
Story of Dhirubhai ambaniStory of Dhirubhai ambani
Story of Dhirubhai ambani
 
IE7610_REPORT_GROUP_8
IE7610_REPORT_GROUP_8IE7610_REPORT_GROUP_8
IE7610_REPORT_GROUP_8
 
Careers in Computer Games: Game Analytics
Careers in Computer Games: Game AnalyticsCareers in Computer Games: Game Analytics
Careers in Computer Games: Game Analytics
 
Milton nascimento canção da américa
Milton nascimento   canção da américaMilton nascimento   canção da américa
Milton nascimento canção da américa
 
Lumbar puncture
Lumbar punctureLumbar puncture
Lumbar puncture
 

Similar to Taxonomy_DDTA2016

MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...cscpconf
 
Multiagent system for scrutiny of
Multiagent system for scrutiny ofMultiagent system for scrutiny of
Multiagent system for scrutiny ofcsandit
 
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...cscpconf
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Modern Resume Analyser For Students And Organisations
Modern Resume Analyser For Students And OrganisationsModern Resume Analyser For Students And Organisations
Modern Resume Analyser For Students And OrganisationsIRJET Journal
 
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS ijaia
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Jill Lyons
 
Student’s Skills Evaluation Techniques using Data Mining.
Student’s Skills Evaluation Techniques using Data Mining.Student’s Skills Evaluation Techniques using Data Mining.
Student’s Skills Evaluation Techniques using Data Mining.IOSRjournaljce
 
Creating a Use Case
Creating a Use Case                                               Creating a Use Case
Creating a Use Case CruzIbarra161
 
Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...christophefeltus
 
Data mining for prediction of human
Data mining for prediction of humanData mining for prediction of human
Data mining for prediction of humanIJDKP
 
Tourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation SystemTourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation SystemIRJET Journal
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...ijcsit
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...AIRCC Publishing Corporation
 
Information Extraction using Rule based Software Agents in Knowledge Grid
Information Extraction using Rule based Software Agents in Knowledge GridInformation Extraction using Rule based Software Agents in Knowledge Grid
Information Extraction using Rule based Software Agents in Knowledge Grididescitation
 
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND RANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND Rijaia
 
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R gerogepatton
 
CV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNINGCV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNINGIRJET Journal
 

Similar to Taxonomy_DDTA2016 (20)

Measurement And Validation
Measurement And ValidationMeasurement And Validation
Measurement And Validation
 
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
 
Multiagent system for scrutiny of
Multiagent system for scrutiny ofMultiagent system for scrutiny of
Multiagent system for scrutiny of
 
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
MULTIAGENT SYSTEM FOR SCRUTINY OF ADMISSION FORMS USING AUTOMATIC KNOWLEDGE C...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Modern Resume Analyser For Students And Organisations
Modern Resume Analyser For Students And OrganisationsModern Resume Analyser For Students And Organisations
Modern Resume Analyser For Students And Organisations
 
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
AUTOMATED TOOL FOR RESUME CLASSIFICATION USING SEMENTIC ANALYSIS
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
 
Student’s Skills Evaluation Techniques using Data Mining.
Student’s Skills Evaluation Techniques using Data Mining.Student’s Skills Evaluation Techniques using Data Mining.
Student’s Skills Evaluation Techniques using Data Mining.
 
Creating a Use Case
Creating a Use Case                                               Creating a Use Case
Creating a Use Case
 
Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...
 
Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...Alignment of remmo with rbac to manage access rights in the frame of enterpri...
Alignment of remmo with rbac to manage access rights in the frame of enterpri...
 
Data mining for prediction of human
Data mining for prediction of humanData mining for prediction of human
Data mining for prediction of human
 
Tourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation SystemTourism Based Hybrid Recommendation System
Tourism Based Hybrid Recommendation System
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
 
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
DEVELOPING A FRAMEWORK FOR ONLINE PRACTICE EXAMINATION AND AUTOMATED SCORE GE...
 
Information Extraction using Rule based Software Agents in Knowledge Grid
Information Extraction using Rule based Software Agents in Knowledge GridInformation Extraction using Rule based Software Agents in Knowledge Grid
Information Extraction using Rule based Software Agents in Knowledge Grid
 
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND RANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
ANALYSIS OF ENTERPRISE SHARED RESOURCE INVOCATION SCHEME BASED ON HADOOP AND R
 
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
Analysis of Enterprise Shared Resource Invocation Scheme based on Hadoop and R
 
CV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNINGCV INSPECTION USING NLP AND MACHINE LEARNING
CV INSPECTION USING NLP AND MACHINE LEARNING
 

Taxonomy_DDTA2016

∗ Submitted to DDTA'16. Please do not circulate. © 2016 Copyright held by the owner/author(s).

1. INTRODUCTION

Research data show that recruiters and employers screen applicants based on age, class, personality, and other characteristics that may not be gleaned from a resume. In an ideal situation, one might imagine that employers hire the most skilled applicant, but sociological research shows this is not the case [15, 12, 19, 21, 17, 8]. On the other hand, research also shows that job applicants lack the tools to express and present their skills. In addition, constant changes in technology limit recruiters' ability to specify a skill set that meets job requirements [7]. Both issues contribute to the hidden skills problem, which concerns representing, identifying, and measuring skills that have not been explicitly articulated in the job description or resume. For example, a job description may state that the applicant should know PHP, while an applicant's resume lists Symfony as a skill. Symfony is a PHP web application framework, but the person evaluating the resume may not recognize the relationship between PHP and Symfony. The applicant therefore has a hidden skill, PHP, which has not been explicitly articulated by the applicant and may not be detected by the evaluator. In addition to the hidden skills problem, there is the skills resolution problem: using online social media and blogs to identify and confirm the skills of potential applicants.
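The PHP/Symfony example above can be made concrete with a small sketch: given a taxonomy that records parent relationships between skill terms, a hidden skill is any ancestor of a term the applicant actually listed. The tiny taxonomy below is hypothetical, included only to illustrate the lookup.

```python
# Minimal sketch of hidden-skill resolution: walk parent links in a
# skills taxonomy from a resume term up to broader skills. The taxonomy
# entries here are made-up examples, not part of the paper's data set.

PARENTS = {
    "Symfony": ["PHP"],
    "PHP": ["web development"],
    "Django": ["Python"],
    "Python": ["programming"],
}

def implied_skills(term):
    """Return the term plus every ancestor reachable via parent links."""
    seen, frontier = set(), [term]
    while frontier:
        t = frontier.pop()
        if t not in seen:
            seen.add(t)
            frontier.extend(PARENTS.get(t, []))
    return seen

# A resume listing Symfony implicitly covers the hidden skill PHP.
print("PHP" in implied_skills("Symfony"))  # True
```

An evaluator (or an automated screener) can then match a job description's required skills against `implied_skills` of every resume term, instead of requiring exact string matches.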
Software like HiringSolved [14] scrapes social media to build a potential skill cloud for applicants. The effectiveness of such software depends in part on its ability to map text taken from social media to specific skills.

This research seeks to address the hidden skills and skills resolution problems by proposing a framework for developing a skills taxonomy that specifies a hierarchy of job skills and the relationships between them. The proposed framework uses the web as its language corpus and a hybrid model to extract relationship information from that corpus. It extends prior work by using the structure of web text to define rules for extracting information and to specify the relationships between skill terms. We present three potential methods for bootstrapping and extending our skills taxonomy and propose a hybrid method that combines the beneficial features of those methods. Our hybrid approach seeds the bootstrapping process with publicly available resources. In this paper, we focus specifically on using Wikipedia as our corpus and exploiting its structure to populate the taxonomy. We begin by constructing a relationship graph of possible skill terms related to the computing domain from Wikipedia. We then use a data mining methodology to identify skill terms. Experiments show promising results, and we are able to achieve a 98% classification rate.

The remainder of this paper is organized as follows. In Section 2 we review prior efforts for creating a skills taxonomy. Section 3 describes a motivating example of the skills resolution problem and how a skills taxonomy can be used to resolve a job seeker's skills from online data. Section 4 describes three different approaches for bootstrapping a taxonomy, along with our hybrid approach, and presents the results of our initial experiments. We conclude the paper and discuss future work in Section 5.

2. RELATED WORK

The problem of creating a dataset or list of skills has been tackled before [4, 2, 10], but the results cannot be accessed freely for scientific projects. In addition, these datasets may not specify the relationships between skill terms, thereby limiting their use for solving the hidden skills and skills resolution problems.

Itsyourskills is a regularly updated project whose aim is to cover skills across industries and functions [4]. The project website offers an API for developers, but it is not freely available. Skill-project's objective is to build a comprehensive and organized database of all human skills [2]. The developers claim to have built the largest multilingual skills database through the contributions of a skillful community. However, no API is available for developers to benefit from this project, and the manual process for generating the lists is time consuming. Bastian et al. proposed a large-scale topic extraction and inference method for inferring skills from users' LinkedIn profiles [10]. They assume that "Skills" are a data-driven feature on LinkedIn, meaning that the LinkedIn recommender engine can use user data to propose skills. However, LinkedIn also allows members to specify their skills and endorse (verify) the skills of others. LinkedIn offers an API to developers, but skills are not among the attributes of basic profile data [1] and cannot be scraped.
Though there are instances of skills data sets, as previously mentioned, they are either not publicly available or do not contain comprehensive relationship information.

3. MOTIVATING EXAMPLE: SKILLS RESOLUTION PROBLEM

In this section we present a simple example of the skills resolution problem and discuss how a skills taxonomy provides an effective solution. We use active users of Stack Overflow [3] who have LinkedIn profiles to illustrate this problem. Our method for identifying skills creates a skill cloud from the content of the replies that a user posts in the Stack Overflow forum. User responses have been evaluated by the forum's community and assigned ratings of technical relevance; we therefore consider skills derived from highly rated responses to be reliable. We implemented three simple methods: extracting keywords from posts, using the most frequently used tags, and using tags related to a pre-defined set of skills.

For the first method of creating the skill cloud, we retrieved user posts and removed stop words. We used term frequency-inverse document frequency (TF-IDF) to measure the importance of terms and to identify possible skills. As shown in Figure 1, this approach produced noisy results. There were many irrelevant terms in the keyword cloud, which limits its usefulness as a skill cloud.

Figure 1: Keyword cloud built from posts authored by a user.

To refine the noisy results of the first method, our second method uses Stack Overflow's embedded structure to identify relevant skill terms. A useful feature of Stack Overflow is that questions are associated with tags selected by users. Tags are words or phrases describing the topic of the question, and up to five tags are allowed when posting a question. We created another cloud for the same user, using the frequency of tags in his replies as the importance indicator. Using tag frequency yields more relevant terms and less noise, as illustrated in Figure 2.

Figure 2: Tag cloud built from frequent tags in posts authored by a user.

Still, words like 'mod-rewrite' or '.htaccess' are not relevant skill terms, so an automatic means of intelligent filtering is needed. For our third method, we manually created a simple list of computer science related skills (a flat table with no complex structure) and used it to enhance the accuracy of our skill cloud. Simply, we included in the skill cloud only those tags from the user's Stack Overflow posts that also appeared in the skills list. Figure 3 shows that the irrelevant words are filtered out.

Figure 3: Tag cloud of frequent tags from a user's posts, filtered against the skills dataset.

However, the weakness of this method is that it discards possibly relevant words that have no exact match in the skills data set. Such terms may allow us to infer additional skills through further analysis. For example, tags like 'Array' or 'List', which indicate that the user has some knowledge of 'data structures', can be used to infer such knowledge. For an automatic agent to perform this inference task, we would need a robust taxonomy containing different entities with embedded relationships between them. Contextual domain information may ease the design of such a taxonomy, but populating the instances for each entity would be a time consuming task. In this paper we explore three potential approaches for bootstrapping a skills taxonomy from an initial set of skills. We also propose a hybrid approach that seeds the bootstrapping process with publicly available data and identifies candidate skill terms.

4. METHODOLOGY

Based on prior work, we propose to develop a skills taxonomy and a method capable of automatically bootstrapping it. Bootstrapping is a process for learning relationship rules that alternates between learning rules (or rule accuracy) from sets of instances of included entities and finding instances using sets of rules. We studied different approaches for determining relationships between skills and discovered several that, in combination, could help create and bootstrap our skills taxonomy. We present these works below and propose a hybrid approach that leverages their benefits.

4.1 Word2Vec

Word2Vec is a tool that creates a language model enabling a user to measure the semantic similarity between a queried word and words found in the input corpus [18]. The resulting language model contains continuous vector representations of words, and these vectors can be used to measure word similarity. This feature can be employed in our project for bootstrapping an initial data set of skills. Another useful feature of this tool is that context can be used to filter out irrelevant terms.

Word2Vec can use a corpus like Wikipedia to create a language model, or any text corpus can be used to train one. We conducted preliminary experiments using both Wikipedia (the text8 corpus, containing 100 megabytes of Wikipedia content) and our own corpus of job descriptions scraped from popular job advertisement websites. Initial experiments using Word2Vec produced some noisy results containing terms that did not fit our context. For example, our experiment assumed a computer science, programming languages context.
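The third skill-cloud method above (tag counting filtered by a skills list) can be sketched in a few lines. The tag data and skills list below are made-up illustrations, not the paper's actual data.

```python
# Sketch of the third skill-cloud method: count the tags attached to a
# user's Stack Overflow replies, then keep only tags that appear in a
# manually curated skills list. All data here is hypothetical.
from collections import Counter

SKILLS_LIST = {"php", "javascript", "mysql", "apache"}

reply_tags = [
    ["php", "mod-rewrite", ".htaccess"],
    ["php", "mysql"],
    ["javascript", "jquery"],
    ["php", "apache", ".htaccess"],
]

tag_counts = Counter(tag for tags in reply_tags for tag in tags)

# Filter against the skills list so noise like '.htaccess' drops out.
skill_cloud = {t: c for t, c in tag_counts.items() if t in SKILLS_LIST}
print(skill_cloud)  # {'php': 3, 'mysql': 1, 'javascript': 1, 'apache': 1}
```

The filtering step is exactly what discards near-miss terms such as 'jquery' here, which motivates replacing the flat skills list with a taxonomy that can map such terms to broader skills.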
When querying 'python' with Wikipedia as our corpus, we get a list of words containing monty, moby, dracula, and circus. Table 1 shows the top 10 most similar words for 'python'. We also created our own corpus of job descriptions, collected by using the Bing search engine API [6] to scrape the Indeed.com website. Table 2 shows the results. While these terms seem more related to computer science, not all of them can be considered skills.

Table 1: Top 10 words most similar to 'python' according to Word2Vec, using pre-trained models on the text8 corpus

  Term        Similarity
  xml         0.84
  PHP         0.81
  JavaScript  0.81
  txt         0.81
  pdf         0.80
  ldap        0.79
  Xhtml       0.78
  Searchable  0.77
  Perl        0.75
  ASCII       0.73

Table 2: Top 10 words most similar to 'python' according to Word2Vec, using a model trained on our own corpus

  Term             Similarity
  Object-oriented  0.78
  Interfaces       0.75
  Implementations  0.75
  Compilers        0.71
  Scripting        0.71
  Functionality    0.71
  PHP              0.70
  javascript       0.70
  Fortran          0.69
  Programmers      0.69

To increase the overall accuracy of these results, we propose to use the structure of web text, such as Wikipedia articles, to specify semantic relationships between terms, thereby enabling us to filter irrelevant terms.

4.2 Exploiting Wikipedia Structure

The research community in information retrieval and the semantic web often relies on Wikipedia as an external source of knowledge. Researchers use the content and embedded structure of its articles to enhance the performance of tasks like query expansion [9] and named entity recognition [16, 20]. Mining the wiki can enhance our bootstrapping process by providing a hierarchy of semantic relationships between terms. We consider Wikipedia a convenient knowledge source because it is a large corpus containing millions of articles about named entities. Wikipedia is perhaps the biggest knowledge base in computing that can be mined in a more structured way than search engine results.
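The similarity scores in Tables 1 and 2 are cosine similarities between word vectors. A minimal sketch with toy three-dimensional vectors illustrates the computation (real Word2Vec vectors have hundreds of dimensions; the vector values below are invented for illustration):

```python
# Cosine similarity between word vectors, as used by Word2Vec to rank
# terms similar to a query word. The toy 3-d vectors are made up.
import math

vectors = {
    "python": [0.9, 0.1, 0.3],
    "perl":   [0.8, 0.2, 0.35],
    "circus": [0.1, 0.9, 0.2],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# A related language should score far closer to 'python' than an
# off-topic term like 'circus' from the noisy text8 results.
print(cosine(vectors["python"], vectors["perl"]) >
      cosine(vectors["python"], vectors["circus"]))  # True
```

Ranking every vocabulary word by this score against the query vector yields exactly the kind of top-10 lists shown in the tables, which is why contextual filtering is still needed afterwards.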
In addition, it has more coverage than well-known ontologies like WordNet [22].

Our interest is to analyze the content of Wikipedia and its internal structure in order to bootstrap our skills taxonomy. We propose to use the Wikipedia API [5], a web service that provides convenient access to wiki features, data, and metadata. The Wikipedia API enables us to retrieve the corresponding Wikipedia entry for each candidate word and extract the categories of the associated article or the content of
it. For instance, we use the first sentence of the entry (via a simple lexico-syntactic pattern) to build a hierarchy of concepts and identify the relationships between terms. We use this relationship information to create a graph of terms, with the links within the graph indicating a relationship between two adjacent terms.

Relationships between terms in Wikipedia can be derived in many other ways, including using HTML hyperlinks within articles as indicators of relationships. That approach produces a lot of noise in the graph, which must be filtered out because links can have semantic meanings far beyond the expected semantic relationships in our taxonomy. In addition to links, the Wikipedia "category" tag may be used to indicate a relationship between terms. Using the category tag may still produce a large graph in terms of branching factor and multiple-inheritance relations, which creates obstacles for searching and mining [22]. In our project we decided to use this tag to create the Wikipedia graph.

We used the Wikipedia API to start scraping from the article titled 'Python (programming language)'. Using category relationships, we scraped the parent categories and their subcategories for each category on paths of length up to 2 from the starter node (the Python article). The resulting undirected graph had 1359 nodes and 1581 edges. Two human annotators manually annotated the nodes in the graph, labeling each node with a binary value of either 1 or 0, with 1 indicating a new skill term. We used the annotated data set as a training set for machine learning algorithms that classify terms as either related or not related to IT skills. Approximately 80 percent of the nodes are not related to skills, and 20 percent have some information related to skills. We used Gephi to visualize the graph and the Force Atlas algorithm to lay it out. Figure 4 shows the resulting Python-related graph.
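The depth-2 category crawl described above can be sketched as a breadth-first traversal. In this offline sketch, `fetch_categories` and its toy lookup table are stand-ins for calls to the MediaWiki API (e.g. `action=query&prop=categories`); real runs would issue HTTP requests instead.

```python
from collections import deque

# Stand-in for MediaWiki API category lookups; titles and category
# names below are illustrative, not actual API responses.
TOY_CATEGORIES = {
    "Python (programming language)": ["Programming languages", "Scripting languages"],
    "Programming languages": ["Computer languages"],
    "Scripting languages": ["Programming language families"],
}

def fetch_categories(title):
    """Return the categories of an article/category page (stubbed)."""
    return TOY_CATEGORIES.get(title, [])

def build_category_graph(seed, max_depth=2):
    """Breadth-first crawl of category links up to max_depth hops from
    the seed article, returning an undirected adjacency-set graph like
    the one described in the text."""
    adj = {seed: set()}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for cat in fetch_categories(node):
            adj.setdefault(cat, set())
            adj[node].add(cat)
            adj[cat].add(node)
            frontier.append((cat, depth + 1))
    return adj

graph = build_category_graph("Python (programming language)")
print(len(graph), sum(len(v) for v in graph.values()) // 2)  # nodes, edges
```

Run against the live API from the same seed, this traversal yields the 1359-node, 1581-edge graph reported above.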
Nodes shown in green were labeled as skills.

Figure 4: Resulting Wikipedia graph, starting from the Python article

As we can see, most of the desired nodes exist in the dense sub-graph. We defined a set of structure-based features and fit models using different feature sets. For this binary classification problem we used logistic regression and a decision tree in our initial experiments. We started with a few minimal features (degree, average degree of neighbors, and exist-in-k-core) and incrementally added other features until we obtained satisfying results.

Table 3: List of features

Feature                     Definition
Degree                      number of adjacent nodes
AVG degree of neighbors     average degree of adjacent nodes
Exist in k-core             whether the node exists in the 2-core subgraph
No. of neighbors as skills  number of skill nodes within a path of length 2
Path to skill               minimum path length to a skill node
Cluster no.                 the id of the cluster to which the node is assigned
Cosine similarity           maximum cosine similarity with skills in the dataset
No. of similar skills       number of skills whose cosine similarity is above a threshold
Target column               whether the article is related to IT skills

We used cross-validation with 10 folds, randomly choosing 80 percent of the data set for the training set and 20 percent for the test set. As is common in machine learning, we fit the model on the training set and tested it on the test set. Since 80 percent of the dataset has one label, the maximum likelihood estimator (MLE) can achieve 80 percent accuracy by predicting every node as non-relevant. We therefore consider MLE as a baseline and compare the other algorithms against it. Table 4 shows the accuracy, precision, recall, and F1-score of the predictions.

We can reframe our problem here as a retrieval task. So far we have treated the problem as a binary classification task, assuming that 'being related to IT skills' was the 'True' label and 'not related to IT skills' the 'False' one.
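The experimental setup above can be sketched with scikit-learn. The features here are synthetic stand-ins with the ~80/20 class skew reported in the text; the real inputs are the structural features of the annotated Wikipedia graph from Table 3.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)

# Synthetic stand-in data: three structural features per node and a
# binary label with roughly 20% positives (skill-related nodes).
n = 1000
X = rng.normal(size=(n, 3))
y = (X[:, 2] + 0.3 * rng.normal(size=n) > 0.85).astype(int)

# 80/20 train/test split, stratified to preserve the class skew.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = DecisionTreeClassifier(random_state=0)
# 10-fold cross-validation on the training split, as in the paper.
cv_acc = cross_val_score(clf, X_tr, y_tr, cv=10).mean()

# Fit on the training set, evaluate precision/recall/F1 on the test set.
clf.fit(X_tr, y_tr)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_te, clf.predict(X_te), average="binary")
print(cv_acc, prec, rec, f1)
```

Because ~80% of labels are negative, a majority-class predictor already scores 0.80 accuracy, which is why the MLE baseline in Table 4 is a necessary point of comparison.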
We can instead consider the positive examples as the set of "relevant" examples and compute precision and recall scores accordingly. Accuracy measures the average performance of the classifier: the fraction of correct predictions. Precision is the fraction of positively predicted samples that are truly related to IT skills:

    Precision = |relevant samples ∩ retrieved samples| / |retrieved samples|

Recall is the fraction of the samples relevant to IT skills that are successfully retrieved:

    Recall = |relevant samples ∩ retrieved samples| / |relevant samples|

The F1-score is defined as follows:

    F1-score = 2 × (Precision × Recall) / (Precision + Recall)

The weights of the features in the fitted logistic regression model and the most important decision tree rules show that noOfSkillsInNeighbour is the most important feature for the prediction task. The proposed automatic method for using Wikipedia's structure to identify relevant skills is scalable, and more fiscally efficient than prior approaches that used Amazon's Mechanical Turk to find the associated articles [10]. However,
additional work is needed to refine the resulting list and remove articles that are related to the computing domain but cannot be considered skills.

Table 4: Performance of classifiers

Algorithm            Acc.  Prec.  Rec.  F1-score
MLE                  0.80  0.80   0.80  0.80
Logistic Regression  0.81  0.78   0.37  0.53
Decision Tree        0.98  0.98   0.94  0.96

4.3 Co-occurrence on the Web

Learning an ontology by web mining has been used extensively by researchers [23, 13]. Sánchez and Moreno proposed a methodology to automatically build an ontology, extracting information from the World Wide Web with the help of a search engine, starting from an initial keyword. Their method relies extensively on a publicly available search engine and extracts concepts based on their relation to the initial one and statistical data about their appearance [23].

KnowItAll [11] is an information extraction system that can extract large numbers of high-quality instances of classes (for example, instances of the class City) or relations (such as instances of capitalOf(City, Country)). It is seeded with an extensible ontology and a few generic rule templates from which it creates text extraction rules for each class and relation in its ontology. It is a web-based information extraction system with a novel extract-and-assess architecture that uses generic lexico-syntactic patterns and web-scale statistics to extract class and relation instances from unstructured text on the web. KnowItAll evaluates the information it extracts using statistics computed by treating the web as a large corpus of text, with no structure.

The common component of these information retrieval systems is that they query a search engine with some initial terms and measure the co-occurrence of other terms in the search results with the seed terms. We employ these ideas for learning a skills taxonomy. We can use search engine operators to focus on certain types of websites that post job advertisements, like Indeed.com and Monster.com.
In other words, we can exploit the co-occurrence values of terms in job descriptions for bootstrapping purposes.

4.4 Hybrid approach

Knowing the strengths and weaknesses of the different approaches, we decided to use a hybrid approach to build a more robust method for constructing a skills taxonomy. The best way to combine the outputs of the three methods can be learned through experimental exploration. Again, we propose a machine-learning-based method at this level, one that takes input features from the prior levels and outputs labels. During the first phase of our experimental exploration, we will run each method with the initial terms in our taxonomy and collect the results from each of them. Subsequently, we will define meta-features and feed them to the last-layer classifier in order to perform the final prediction task. Any method that we use to learn how to combine the outputs of the three previous methods is based on the assumption that a frequent term in a set of domain-specific texts, with a close structure in Wikipedia, indicates the occurrence of a relevant concept in our taxonomy. The general framework that we propose is shown in Figure 5.

Figure 5: Proposed framework for the hybrid approach

However, this is a macroscopic view of the hybrid method; the details of how features from previous layers can collaboratively help a classifier learn new skills must be determined through further experiments.

5. CONCLUSIONS

Effective and efficient methods for bootstrapping a taxonomy are critical to any process that names and characterizes the properties and interrelationships of entities. We presented three potential methods for bootstrapping and extending our skills taxonomy and proposed a hybrid method that combines the beneficial features of those methods. Our hybrid approach seeds the bootstrapping process with publicly available resources.
In this paper, we focus specifically on using Wikipedia as our corpus and exploiting its structure to populate the taxonomy. We begin by constructing a relationship graph of possible skill terms from Wikipedia. We identified a set of features to characterize the nodes in the Wikipedia graph, and we were able to achieve 98% accuracy with a decision tree in finding the related nodes within the graph that originated from one term. These results are promising, but limited. As we extend our relationship graph by searching for additional terms within Wikipedia or other related corpora on the web, more noise is introduced. We are hopeful that our fully implemented hybrid model will eliminate irrelevant terms from the graph and produce accurate results.

6. ACKNOWLEDGMENTS

This work is funded by the National Science Foundation: 1537768.

7. REFERENCES

[1] https://developer.linkedin.com/docs/fields/basic-profile.
[2] http://skill-project.org.
[3] http://stackoverflow.com/.
[4] https://www.itsyourskills.com/list-of-all-skills.
[5] https://www.mediawiki.org/wiki/api:mainpage.
[6] http://www.bing.com/developers/s/apibasics.html.
[7] A rare opportunity never knocks.
[8] A. Acquisti and C. M. Fong. An experiment in hiring discrimination via online social networks (July 17, 2015).
[9] M. Almasri, C. Berrut, and J.-P. Chevallet. Exploiting Wikipedia structure for short query expansion in cultural heritage. In M.-F. Moens, C. Viard-Gaudin,
H. Zargayouna, and O. R. Terrades, editors, CORIA-CIFED, pages 287–302. ARIA-GRCE, 2014.
[10] M. Bastian, M. Hayes, W. Vaughan, S. Shah, P. Skomoroch, H. Kim, S. Uryasev, and C. Lloyd. LinkedIn skills: Large-scale topic extraction and inference. In Proceedings of the 8th ACM Conference on Recommender Systems, RecSys '14, pages 1–8, New York, NY, USA, 2014. ACM.
[11] O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D. S. Weld, and A. Yates. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th International Conference on World Wide Web, WWW '04, pages 100–110, New York, NY, USA, 2004. ACM.
[12] Commission for Racial Equality, S. Carter, J. Hubbuck, and Nottingham and District Community Relations Council. Half a chance? A report on job discrimination against young blacks in Nottingham. London: Commission for Racial Equality in association with Nottingham and District Community Relations Council, 1980.
[13] M. Hazman, S. R. El-Beltagy, and A. Rafea. Ontology learning from domain specific web documents. Int. J. Metadata Semant. Ontologies, 4(1/2):24–33, May 2009.
[14] HiringSolved. hiringsolved.com.
[15] R. Jowell and P. Prescott-Clarke. Racial discrimination and white-collar workers in Britain. Race & Class, 11(4):397–417, 1970.
[16] J. Kazama and K. Torisawa. Exploiting Wikipedia as external knowledge for named entity recognition. In Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 698–707, 2007.
[17] T. Kneese, A. Rosenblat, and D. Boyd. Understanding fair labor practices in a networked age. October 2014.
[18] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013.
[19] P. A. Riach and J. Rich. Testing for racial discrimination in the labour market.
Cambridge Journal of Economics, 15(3):239–256, 1991.
[20] A. E. Richman and P. Schone. Mining wiki resources for multilingual named entity recognition. In ACL '08, 2008.
[21] L. A. Rivera. Hiring as cultural matching: The case of elite professional service firms. American Sociological Review, 77(6):999–1022, 2012.
[22] M. Strube and S. P. Ponzetto. WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2, AAAI'06, pages 1419–1424. AAAI Press, 2006.
[23] D. Sánchez and A. Moreno. Creating ontologies from web documents. In Recent Advances in Artificial Intelligence Research and Development, IOS Press, 113, pages 11–18.