This paper proposes a methodology for identifying hot topics and tracking technology trends in the patent domain. The methodology combines term frequency information with the International Patent Classification (IPC) to capture semantic information about word categorization, an approach not previously employed for topic detection and trend tracking. Term Frequency and Proportional Document Frequency (TF*PDF) is used to detect hot topics in patents, and IPCs are used to calculate the semantic importance of terms based on the IPCs across which the terms are distributed. Aging Theory is also used to calculate the variation of trends over time. Four types of trends (very stable, stable, normal, and unstable) are defined and evaluated using TF*PDF and TF*PDF combined with Aging Theory. Experimental results show that for very stable trends, the combination of TF*PDF and Aging Theory achieves a precision of 0.976; for stable trends and for all trends, TF*PDF achieves precisions of 0.959 and 0.84, respectively. By applying TF*PDF with semantic information taken into account, we also present a new criterion for weighting hot topics and tracking technology trends.
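TF*PDF, in its commonly stated form, sums over channels c = 1..D a normalised term frequency multiplied by an exponential of the term's proportional document frequency: W_j = Σ_c (F_jc / sqrt(Σ_k F_kc²)) · exp(n_jc / N_c), where F_jc is the frequency of term j in channel c, n_jc is the number of documents in c containing j, and N_c is the number of documents in c. The abstract does not give the exact formula used, so the Python sketch below assumes this standard form and, hypothetically, treats each IPC code as a channel. The Aging Theory component is not modelled.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF*PDF weights (a sketch, assuming channels = IPC codes).

    channels: dict mapping a channel id to a list of documents,
              where each document is a list of terms.
    Returns a dict mapping each term j to its weight W_j.
    """
    weights = Counter()
    for docs in channels.values():
        n_docs = len(docs)  # N_c
        if n_docs == 0:
            continue
        # F_jc: raw frequency of term j over all documents in channel c
        freq = Counter(term for doc in docs for term in doc)
        # n_jc: number of documents in channel c that contain term j
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Normalise term frequencies by their Euclidean norm within the channel
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            # Terms spread over many documents of a channel get an exponential boost
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)
```

For example, `tf_pdf({"H04L": [["network", "packet"], ["network", "routing"]], "G06F": [["memory", "cache"], ["network", "cache"]]})` ranks `network` highest, because it is frequent and appears in many documents in more than one channel.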
Research Paper Selection Based On an Ontology and Text Mining Technique Using... (IOSR Journals)
This document proposes an ontology and text mining technique for selecting research papers. It involves three phases: 1) constructing a research ontology using keywords and their frequencies from past papers, 2) classifying new papers based on the ontology keywords, and 3) clustering the papers in each domain using text mining and the K-means algorithm. By addressing the limitations of keyword-based methods, the technique aims to group papers more effectively and to assign them systematically to relevant reviewers.
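Phase 3 can be illustrated with a minimal K-means sketch. It assumes each paper is represented as a term-frequency vector over a fixed keyword list (a hypothetical stand-in for the ontology keywords) and seeds the centroids with the first k papers for determinism; practical implementations usually use k-means++ seeding and TF-IDF weights.

```python
import math
from collections import Counter


def to_vector(text, vocabulary):
    """Term-frequency vector of `text` over a fixed keyword vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each paper goes to its nearest centroid (Euclidean)
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its member papers
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With the vocabulary `["ontology", "retrieval", "network", "routing"]`, papers about ontologies and retrieval end up in one cluster and papers about networks and routing in the other.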
More Than Just Black and White: A Case for Grey Literature References in Scie... (Aravind Sesagiri Raamkumar)
This study analyzed the referencing of grey literature in scientific papers from the ACM Digital Library and proposed techniques to boost the retrieval of grey literature in scientific paper information retrieval systems. The study found that grey literature materials were referenced in around 16% of the bibliographic references analyzed. It then proposed a boosting technique that assigns a boosting weight to increase the ranking of documents that are grey literature or reference grey literature frequently. An experiment found the boosting technique improved retrieval of grey literature for literature search queries compared to baseline techniques. The study concluded by discussing limitations and opportunities for future work applying these techniques to recommender systems.
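The summary does not give the boosting formula. As a hypothetical sketch only, a boosting weight could multiply a baseline retrieval score, with one fixed boost for documents that are grey literature and an extra boost per grey-literature reference a document cites. `grey_boost` and `ref_boost` are illustrative parameters, not values from the study.

```python
def rerank_with_grey_boost(results, grey_docs, grey_refs, grey_boost=0.5, ref_boost=0.1):
    """Re-rank baseline results, boosting grey literature (hypothetical scheme).

    results:   list of (doc_id, base_score) pairs from a baseline retriever
    grey_docs: set of doc_ids that are themselves grey literature
    grey_refs: dict mapping doc_id -> number of grey-literature references it cites
    """
    boosted = []
    for doc_id, score in results:
        # Multiplier grows if the document is grey literature and with each grey reference
        multiplier = 1.0
        if doc_id in grey_docs:
            multiplier += grey_boost
        multiplier += ref_boost * grey_refs.get(doc_id, 0)
        boosted.append((doc_id, score * multiplier))
    # Highest boosted score first
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)
```

A multiplicative boost keeps the baseline ranking intact among documents that receive no boost, which makes its effect easy to compare against the baseline.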
Comparison of Techniques for Measuring Research Coverage of Scientific Papers... (Aravind Sesagiri Raamkumar)
The document compares techniques for measuring research coverage in scientific papers, including HITS, Topical and Peripheral Coverage (TPC), and Topical Coverage (TC). An experiment on papers in information retrieval found that TPC provided the best results by identifying a diverse set of papers, recent papers, and papers covering various sub-topics. TPC and TC performed better than HITS at identifying seminal papers. Combining TPC and HITS may further improve identification of survey papers. The document concludes that the integrated TPC and HITS technique is best for building initial reading lists for literature review.
Presented on December 7, 2016, at ICADL'16.
Full text can be found at http://link.springer.com/chapter/10.1007/978-3-319-49304-6_12
Extended version can be found at https://arxiv.org/abs/1609.01415
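HITS, one of the techniques compared above, can be run on a citation graph in which an edge points from a citing paper to a cited one. Papers cited by good hubs become authorities (candidate seminal papers), and papers citing many authorities become hubs (candidate survey papers). The Python sketch below is the textbook power-iteration form of HITS, not the authors' implementation; TPC and TC are not reproduced because the summary does not define them.

```python
import math


def hits(graph, iterations=50):
    """Textbook HITS by power iteration.

    graph: dict mapping each paper to the list of papers it cites.
    Returns (hubs, authorities), each a dict paper -> L2-normalised score.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the papers citing it
        auths = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auths[v] += hubs[u]
        norm = math.sqrt(sum(a * a for a in auths.values())) or 1.0
        auths = {n: a / norm for n, a in auths.items()}
        # Hub update: sum of authority scores of the papers it cites
        hubs = {n: sum(auths[v] for v in graph.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(h * h for h in hubs.values())) or 1.0
        hubs = {n: h / norm for n, h in hubs.items()}
    return hubs, auths
```

On a toy graph where three papers all cite `seminal` and only `p2` also cites `other`, `seminal` gets the top authority score and `p2` the top hub score.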
LEAN THINKING IN SOFTWARE ENGINEERING: A SYSTEMATIC REVIEW (ijseajournal)
The field of Software Engineering has undergone considerable transformation in recent decades due to the influence of the philosophy of Lean Thinking. The purpose of this systematic review is to identify the practices and approaches proposed over the last five years by researchers in this area who have worked under the influence of this philosophy. The search strategy gathered 549 studies, 80 of which were classified as relevant for synthesis in this review. Seventeen Lean Thinking tools adapted to Software Engineering were catalogued, as well as 35 software development practices created under the influence of this philosophy. The study provides a roadmap of results describing the current state of the art and identifies gaps that point to opportunities for further research.
This document summarizes a lecture on research methodology given by Dr. Said Mirza Pahlevi. The lecture covered four main topics: 1) computer science as a discipline, 2) the nature of research, 3) types of research methodology, and 4) characteristics and roles of research. The lecture defined computer science, discussed what constitutes research, described quantitative, qualitative and design research methods, and outlined the roles of different types of researchers.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
This document presents a systematic literature review of 38 empirical studies on factors relating to successful business intelligence (BI) system implementation. The review identified 10 key factors that frequently influenced implementation success based on their frequency of occurrence in the literature. These factors included management support, data source systems, organizational resources, IT infrastructure, vision, champion, team skills, project manager, user participation, and change management. The study aims to help researchers better identify relevant studies for literature reviews on factors impacting BI system implementation.
Sabrina is a PhD student interested in studying agile software development teams. She needs to select a research method but is unfamiliar with the options. Dr. Who recommends Grounded Theory (GT) as a way to generate a new theory by collecting qualitative data from practitioners. However, Sabrina finds the GT literature complex. The patterns in this document provide an overview of GT procedures to help make it more accessible for software engineering researchers. They describe how to get started with GT by reading key books and examples, applying for ethics approval to collect data, and avoiding an initial hypothesis to allow theory to emerge from the data.
Big data is prevalent in our daily lives. Not surprisingly, big data has become a hot topic discussed in the commercial world, the media, magazines, the general public, and elsewhere. From an academic point of view, is it a research area worth exploring, or is it just another hype? Are only computer science or IS scholars suited to big data research because of its nature, or are scholars from other research areas also suited to this subject? This study aims to answer these questions using an informetrics approach with data drawn from the SSCI journal database, applying the robust quantitative power of informetrics, which can analyze information in any form, to a representative data source. The research shows that big data research is in its growth phase, with an exponential growth pattern since 2012 and great potential for years to come. Perhaps surprisingly, computer science and IS-related disciplines do not lead the ranking of research areas in these results; the top five disciplines are more diverse than expected: Business Economics (#1), Government Law (#2), Information Science/Library Science (#3), Social Science (#4), and Computer Science (#5). Scholars from US universities are the most productive in this subject, while Asian countries, including Taiwan, are also visible. In addition, this study finds that big data publications in the SSCI journal database from 2005 to 2015 fit Lotka's law. This study contributes to understanding current big data research trends and offers guidance to researchers from any background who are interested in conducting future research on big data.
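Lotka's law states that the number of authors a(n) who publish n papers falls off as a(n) = C / n^α, with α classically close to 2. The abstract does not say how the fit was tested. As a rough sketch only, α and C can be estimated by least squares on log a(n) versus log n; published informetric studies usually add a goodness-of-fit test (for example Kolmogorov-Smirnov), which this hypothetical helper omits.

```python
import math
from collections import Counter


def fit_lotka(papers_per_author):
    """Estimate Lotka's law a(n) = C / n**alpha by log-log least squares.

    papers_per_author: iterable of publication counts, one entry per author.
    Needs at least two distinct publication counts. Returns (alpha, C).
    """
    # a(n): number of authors who published exactly n papers
    authors_with_n = Counter(papers_per_author)
    xs = [math.log(n) for n in authors_with_n]
    ys = [math.log(count) for count in authors_with_n.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope of log a(n) against log n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    # log a(n) = log C - alpha * log n, so alpha = -slope and C = exp(intercept)
    return -slope, math.exp(intercept)
```

Feeding it synthetic counts that follow a(n) = 400 / n² exactly (400 single-paper authors, 100 with two papers, 25 with four, and so on) recovers α = 2 and C = 400.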
- The document discusses evaluation and assessment in software engineering research and argues for a greater focus on design science.
- Design science aims to produce prescriptive knowledge for professionals through the rigorous design and evaluation of solutions in context. This contrasts with explanatory science, which studies existing practice.
- Three key aspects of design science research are proposed: the technological rule or theory, empirical instances of problems and solutions, and assessment of the knowledge produced.
- Choosing the right research context, balancing rigor with relevance through appropriate methods, and assessing contributions in terms of relevance, rigor and novelty are discussed as important aspects of design science.
Business Process Management Research As An Interdisciplinary Field (harryyjin)
The document provides an overview of research on business process management (BPM) as an interdisciplinary field. It analyzes 115 BPM research articles published between 2000 and 2009 in various disciplines. Key findings include that BPM research is emerging rapidly, with Hammer's book on business process reengineering being highly influential. However, the technical side of BPM represents the mainstream of research. Popular topics include web services, Petri nets, process mining, and supply chain management. The analysis provides a comprehensive picture of BPM as a multidisciplinary research area.
This document outlines the format and structure of a research report. It discusses the different types of research reports, including technical reports, popular reports, interim reports, and summary reports. It notes that the intended audience and purpose should be considered when determining the type of report. The document also details the typical sections included in a research report, such as the title page, abstract, introduction, methodology, results, and conclusions. It emphasizes that a research report must be well-organized, complete, and carefully written to effectively communicate the research findings to peers in the field.
Universidad Técnica Particular de Loja
Academic Term: April-August 2011
Program: English
Instructor: Lic. Alba Bitalina Vargas Saritama
Cycle: Seventh
Bimester: Second
Research on Image Recognition and Tracking Based on Knowledge Mapping (ijtsrd)
From 2015 to 2020, 489 articles on the theme of target recognition and tracking included in the CNKI database were selected as research objects. This paper analyzes the number of papers, the research power, and the research hotspots of target recognition and tracking in China. Using cluster analysis of high-frequency keywords, it identifies two research fields, recognition and tracking, in order to grasp the development trend and provide a reference for theoretical research. Chen Rong "Research on Image Recognition and Tracking Based on Knowledge Mapping" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-6 , October 2020, URL: https://www.ijtsrd.com/papers/ijtsrd35700.pdf Paper Url: https://www.ijtsrd.com/engineering/information-technology/35700/research-on-image-recognition-and-tracking-based-on-knowledge-mapping/chen-rong
Navigation through citation network based on content similarity using cosine ... (Salam Shah)
The volume of scientific literature has increased over the past few decades; new topics and information are added in the form of articles, papers, text documents, web logs, and patents. This rapid growth of information has produced a tremendous number of additions to current and past knowledge. During this process, new topics have emerged, some topics have split into many sub-topics, and many others have merged to form a single topic. Manually selecting and searching for a topic in such a huge amount of information has been found to be an expensive and labor-intensive task. To meet the emerging need for an automatic process that locates, organizes, connects, and makes associations among these sources, researchers have proposed different techniques that automatically extract components of the information presented in various formats and organize or structure them. The data targeted for component extraction might be text, video, or audio. Various algorithms have been used to structure information, group similar information into clusters, and weight it by importance. The organized, structured, and weighted data is then compared with other structures using various algorithms to find similarities. Semantic patterns can be found by employing visualization techniques that show the similarity or relation between topics over time or in relation to a specific event. In this paper, we propose a model based on the Cosine Similarity Algorithm for citation networks that answers questions such as how to connect documents with the help of citations and content similarity, and how to visualize and navigate through documents.
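Cosine similarity between two term-frequency vectors a and b is a·b / (‖a‖‖b‖). A minimal sketch of how it could be combined with citation links, assuming plain whitespace tokenisation and a hypothetical similarity threshold of 0.3 (neither is specified in the abstract), ranks the documents a paper cites by content similarity, giving a navigation order through the citation network:

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product only needs the terms the two documents share
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_citations(doc_id, texts, citations, threshold=0.3):
    """Cited documents of `doc_id` whose similarity meets `threshold`, most similar first.

    texts:     dict doc_id -> document text
    citations: dict doc_id -> list of cited doc_ids
    """
    scored = [
        (cited, cosine_similarity(texts[doc_id], texts[cited]))
        for cited in citations.get(doc_id, [])
    ]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Filtering citation edges by content similarity keeps the links a reader is most likely to want to follow, while the raw citation graph still defines which moves are possible.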
Technological Route between Pioneerism and Improvement (Roberto Nani)
The document presents a methodology for determining the technological route between pioneering inventions and incremental improvements using patent data. The methodology involves: (1) performing an initial patent search, (2) analyzing patent trends over time to calculate an "intellectual property density", (3) fitting the density data to logistic curves to identify major technological phases, and (4) using text clustering to identify new areas of innovation. As a case study, the methodology is applied to analyze the evolution of textile loom weft insertion technologies from 1932 to 2008. Three major technological phases were identified from the logistic curve fitting, each characterized by midpoint year and other metrics.
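Step (3) fits density data to logistic curves and characterises each phase by its midpoint year. The original fitting procedure is not described. The sketch below assumes that the saturation level K of a phase is already known and uses the Fisher-Pry linearisation, so plain least squares suffices; in practice K would typically be estimated too, for example with a nonlinear fit such as `scipy.optimize.curve_fit`. The input is any cumulative count per year, not the document's intellectual property density metric.

```python
import math


def fit_logistic(years, cumulative, saturation):
    """Fit N(t) = K / (1 + exp(-r * (t - t_m))) for a known saturation K.

    Uses the Fisher-Pry linearisation ln(N / (K - N)) = r * (t - t_m):
    the least-squares slope is the growth rate r, and the zero crossing
    of the fitted line is the midpoint year t_m.
    Returns (r, t_m).
    """
    xs, ys = [], []
    for t, n in zip(years, cumulative):
        if 0 < n < saturation:  # the transform is undefined at 0 and at K
            xs.append(t)
            ys.append(math.log(n / (saturation - n)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Line y = r * x + b with b = mean_y - r * mean_x; y = 0 at x = -b / r
    midpoint = mean_x - mean_y / r
    return r, midpoint
```

On synthetic data generated from K = 100, r = 0.3, and t_m = 1970, the fit returns those parameters; with real patent counts, each technological phase would need its own K and its own fitting window.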
Data Mining of Project Management Data: An Analysis of Applied Research Studies (Gurdal Ertek)
Data collected and generated during and after projects, such as data residing in project management software and post-project review documents, can be a major source of actionable insights and competitive advantage. This paper presents a rigorous methodological analysis of the applied research published in the academic literature on the application of data mining (DM) to project management (PM). The objective of the paper is to provide a comprehensive analysis and discussion of where and how data mining is applied to project management data and to provide practical insights for future research in the field.
https://dl.acm.org/citation.cfm?id=3176714
https://ertekprojects.com/ftp/papers/2017/ertek_et_al_2017_Data_Mining_of_Project_Management_Data.pdf
This document provides guidance on writing the methodology chapter of a thesis. It discusses the key purposes and components of the methodology chapter, including an introduction, research questions or objectives, research design and framework, data collection methods, data analysis procedures, and considerations of reliability and validity. Examples of methodology chapter outlines and components are also provided for reference. The document emphasizes using the past tense and passive voice when describing the methodology.
A guide to deal with uncertainties in software project management (ijcsit)
Various project management approaches do not consider the impact that uncertainties have on a project. The threats that uncertainty poses to a project's day-to-day work are real and immediate, and expectations of a project are often high. The project manager faces a dilemma: decisions must be made in the present about future situations that are inherently uncertain. The use of uncertainty management in a project can be a determining factor in its success. This paper presents a systematic review of uncertainty management in software projects and proposes a guide based on the review. The guide aims to present, in a structured way, best practices for managing uncertainties in software projects, including techniques and strategies for containing uncertainty.
This document discusses three frameworks used by Gartner Group to analyze information systems research: the technology maturity curve, adoption curve, and identification of strategic technologies. The maturity curve tracks how a technology matures over time through various stages from embryonic to obsolescence. The adoption curve shows how technologies are adopted cumulatively by organizations over time. Considering where technologies fall on these curves can provide insights into appropriate research questions and methodologies. Identifying strategic technologies may help determine promising areas for new research.
Towards a Software Engineering Research Framework: Extending Design Science R... (IRJET Journal)
This document proposes a framework for software engineering research that extends design science research. It discusses how software engineering is a relatively young discipline driven by technical innovations and trends. While much research has explored solutions, fundamental problems still exist. The proposed framework aims to consider both research paradigms and the theoretical and trans-disciplinary foundations of software engineering as an applied discipline. The framework includes elements of defining the research problem, determining if the research is theory-oriented or practice-oriented, incorporating relevant theories and knowledge, and employing a design-build-test-evaluate cycle. The goal is to provide a model that integrates prescriptions from different research paradigms while accounting for software engineering's characteristics as an applied field.
GReAT model: a model for the automatic generation of semantic relations betwee... (ijcsity)
The large amount of available unstructured text belonging to different domains, such as healthcare (e.g., medical records), justice (e.g., laws, declarations), and insurance (e.g., declarations), increases the effort required to analyze information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing, or annotating the texts. In particular, text summarization strategies have proven very useful for providing a compact view of an original text. However, the available strategies for generating these summaries do not fit well in domains that require taking into consideration the temporal dimension of the text (e.g., a recent piece of text in a medical record is more important than a previous one) and the profile of the person who requires the summary (e.g., the medical specialization). To cope with these limitations, this paper presents "GReAT", a model for automatic summary generation that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and to discover new information through the detection of related information. The GReAT model was implemented in software to be validated in a health institution, where it has proven very useful for displaying a preview of the information in medical health records and for discovering new facts and hypotheses within that information. Several tests were executed on the implemented software, covering functionality, usability, and performance. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information incurred by providing a text shorter than the original.
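The abstract reports precision and recall on the tool's output without defining the unit of evaluation. As a minimal sketch, assuming the summary is judged sentence by sentence against a set of sentences an expert marked as relevant (a hypothetical setup), precision is the share of retrieved sentences that are relevant and recall is the share of relevant sentences that were retrieved:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a set of retrieved items against relevant ones.

    retrieved: ids of the sentences the summarizer kept (hypothetical unit)
    relevant:  ids of the sentences an expert judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets, where the ratio is undefined
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

For a summary that keeps sentences s1 to s4 when the expert marked s1, s3, and s5 as relevant, precision is 0.5 and recall is 2/3; the missed sentence s5 is one simple proxy for the information lost by shortening the text.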
IRJET- Characteristics of Research Process and Methods for Web-Based Rese... (IRJET Journal)
This document discusses characteristics of web-based research support systems (WRSS). It presents a framework for WRSS that focuses on supporting various phases of the research process through different information systems and sub-systems. WRSS aim to help scientists find relevant information, choose appropriate tools, and effectively present research results. The document provides an overview of the research process and phases that WRSS could support, such as idea generation, problem definition, planning experiments, analyzing data, and disseminating findings.
This document presents a study on the theoretical and empirical validation that has been done on aspect-oriented software maintainability metrics. It describes the methodology used, which involved searching literature sources and selecting papers related to aspect-oriented maintainability metrics. The results are discussed in tables, showing that most papers focus on empirical validation of metrics rather than theoretical validation. Several papers are described that empirically validated specific metrics related to maintainability. However, the study notes that more theoretical validation of metrics is still needed before empirical validation. Threats to validity, such as bias and limited data extraction, are also presented.
A Comprehensive Survey on Comparisons across Contextual Pre-Filtering, Contex... (TELKOMNIKA JOURNAL)
Recently, there has been growing interest in recommender systems (RS), particularly context-aware RS. Methods for generating context-aware recommendations are classified into pre-filtering, post-filtering, and contextual modelling approaches. In this paper, we present several novel approaches for the different variants of each of these three contextualization paradigms and provide a complete survey of state-of-the-art comparisons across them. We then identify the significant challenges that current RS researchers need to address, which will help academics and practitioners compare these three approaches and select the best alternative for their strategies.
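Contextual pre-filtering, the first of the three paradigms, uses the target context to select only the relevant ratings and then applies an ordinary context-free recommender to the reduced user-by-item data. A minimal sketch, assuming ratings are (user, item, context, rating) tuples and using a simple item-mean model as a hypothetical stand-in for the 2D recommender:

```python
from collections import defaultdict


def prefilter_recommend(ratings, user, context, top_n=3):
    """Contextual pre-filtering followed by an item-mean recommender.

    ratings: list of (user, item, context, rating) tuples
    Returns up to `top_n` items the user has not rated, best first.
    """
    # Pre-filtering: keep only ratings given in the target context
    in_context = [(u, i, r) for u, i, c, r in ratings if c == context]
    # Items the user already rated in any context are not recommended again
    seen = {i for u, i, _, _ in ratings if u == user}
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, item, rating in in_context:
        totals[item] += rating
        counts[item] += 1
    # Context-free 2D model on the filtered data: mean rating per item
    scores = {item: totals[item] / counts[item] for item in totals if item not in seen}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

The same ratings can yield different rankings per context: a museum rated poorly in summer but highly in winter is ranked last for a summer request and first for a winter one. Post-filtering would instead adjust the output of a context-free model, and contextual modelling would put the context inside the model itself.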
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac... (IRJET Journal)
This paper proposes a method for mining users' rare sequential topic patterns (URSTPs) from tweet data. It involves preprocessing tweets to extract topics, identifying user sessions, generating sequential topic pattern (STP) candidates, and selecting URSTPs based on rarity analysis. Experiments show that the approach can identify special users and interpretable URSTPs that indicate users' characteristics. The paper aims to capture personalized and abnormal user behaviors through the sequential relationships between topics extracted from successive tweets.
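The candidate-generation and rarity steps can be sketched in a simplified form. This sketch assumes each tweet already carries a single topic label, considers only length-2 patterns of consecutive topics within a session, and uses hypothetical thresholds: a pattern counts as user-aware rare when it occurs in few sessions overall but in a large share of one user's sessions. The paper's actual rarity analysis is not specified in the summary and may differ.

```python
from collections import Counter


def topic_pairs(session):
    """Length-2 sequential topic patterns (ordered, consecutive) in one session."""
    return set(zip(session, session[1:]))


def user_rare_patterns(sessions_by_user, max_global_support=0.2, min_user_support=0.5):
    """Patterns that are rare across all sessions but frequent for one user.

    sessions_by_user: dict user -> list of sessions, each a list of topic labels
    Returns dict user -> sorted list of that user's rare patterns.
    """
    all_sessions = [s for sessions in sessions_by_user.values() for s in sessions]
    total = len(all_sessions)
    # Global support: number of sessions (from any user) containing each pattern
    global_counts = Counter(p for s in all_sessions for p in topic_pairs(s))
    result = {}
    for user, sessions in sessions_by_user.items():
        user_counts = Counter(p for s in sessions for p in topic_pairs(s))
        rare = [
            p for p, c in user_counts.items()
            if global_counts[p] / total <= max_global_support
            and c / len(sessions) >= min_user_support
        ]
        if rare:
            result[user] = sorted(rare)
    return result
```

A user who repeatedly moves from `weather` to `stocks` while everyone else moves from `sports` to `news` would be flagged as a special user with the pattern `("weather", "stocks")`.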
The social dynamics of software development (aliaalistartup)
This document summarizes a study examining the software procurement processes between a university and several software vendors over a decade. It presents three case studies of information systems development histories and analyzes them using a social process model to depict how relationships evolved over time. Major events that changed the relationship between parties were identified as "encounters", with stable periods in between labeled "episodes". While traditional models view procurement statically, this longitudinal study revealed the dynamic nature of procurement strategies over time in the case studies.
Sabrina is a PhD student interested in studying agile software development teams. She needs to select a research method but is unfamiliar with the options. Dr. Who recommends Grounded Theory (GT) as a way to generate a new theory by collecting qualitative data from practitioners. However, Sabrina finds the GT literature complex. The patterns in this document provide an overview of GT procedures to help make it more accessible for software engineering researchers. They describe how to get started with GT by reading key books and examples, applying for ethics approval to collect data, and avoiding an initial hypothesis to allow theory to emerge from the data.
Big data is prevalent in our daily life. Not surprisingly, big data becomes a hot topic discussedby commercial worlds, media, magazines, general publics and elsewhere. From academic point of view, isit a research area of potential worth being explored? Or it is just another hype? Are there only computer orIS related scholars suitable for big data research due to its nature? Or scholars from other research areas are alsosuitable for this subject? This study aims to answer these questions through the use of informetricsapproach and data source form the SSCI Journal database, leveraging informetric‟s robust natures ofquantitative power of analyze information in any form onto the data source of representativeness. This research shows that big data research is at its growth phase with an exponential growth patternsince 2012 and with great potential for years to come. And perhaps surprisingly, computer or IS relateddisciplinesare not on the top 5 research areas fromthis research results. In fact, the top five research disciplinesare more diversified then expected: business economics (#1), Government Law (#2), InformationScience/ Library Science (#3), Social Science (#4) and Computer Science (#5). Scholars from the USuniversities are the most productive in this subject while Asian countries, including Taiwan, are alsovisible. Besides, this study also identifies that big data publications from SSCI journal database during2005-2015 do fit Lotka‟s law. This study contributes tounderstand the current big data research trends and also show the ways toresearchers who are interested to conduct future research in big data regardless of their research backgrounds.
- The document discusses evaluation and assessment in software engineering research and argues for a greater focus on design science.
- Design science aims to produce prescriptive knowledge for professionals through rigorous design and evaluation of solutions in context. This contrasts with explanatory science which studies existing practice.
- Three key aspects of design science research are proposed - the technological rule or theory, empirical instances of problems and solutions, and assessment of the knowledge produced.
- Choosing the right research context, balancing rigor with relevance through appropriate methods, and assessing contributions in terms of relevance, rigor and novelty are discussed as important aspects of design science.
Business Process Management Research As An Interdisciplinary Field (harryyjin)
The document provides an overview of research on business process management (BPM) as an interdisciplinary field. It analyzes 115 BPM research articles published between 2000 and 2009 in various disciplines. Key findings include that BPM research is emerging rapidly, with Hammer's book on business process reengineering being highly influential. However, the technical side of BPM represents the mainstream of research. Popular topics include web services, Petri nets, process mining, and supply chain management. The analysis provides a comprehensive picture of BPM as a multidisciplinary research area.
This document outlines the format and structure of a research report. It discusses the different types of research reports, including technical reports, popular reports, interim reports, and summary reports. It notes that the intended audience and purpose should be considered when determining the type of report. The document also details the typical sections included in a research report, such as the title page, abstract, introduction, methodology, results, and conclusions. It emphasizes that a research report must be well-organized, complete, and carefully written to effectively communicate the research findings to peers in the field.
Universidad Técnica Particular de Loja
Academic Term: April-August 2011
Degree Program: English
Instructor: Lic. Alba Bitalina Vargas Saritama
Cycle: Seventh
Two-Month Period: Second
Research on Image Recognition and Tracking Based on Knowledge Mapping (ijtsrd)
During 2015-2020, 489 articles on the theme of target recognition and tracking, included in the CNKI database, were selected as research objects. This paper analyzes the number of papers, research power and research hotspots of target recognition and tracking in China. It uses cluster analysis of high-frequency keywords to obtain two research fields, the recognition field and the tracking field, in order to grasp the development trend and provide a reference for theoretical research. Chen Rong "Research on Image Recognition and Tracking Based on Knowledge Mapping" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-6, October 2020, URL: https://www.ijtsrd.com/papers/ijtsrd35700.pdf Paper Url: https://www.ijtsrd.com/engineering/information-technology/35700/research-on-image-recognition-and-tracking-based-on-knowledge-mapping/chen-rong
Similar to Hot Topic Detection and Technology Trend Tracking for Patents utilizing Term Frequency and Proportional Document Frequency and Semantic Information
Navigation through citation network based on content similarity using cosine ... (Salam Shah)
The volume of scientific literature has increased in the past few decades; new topics and information are added in the form of articles, papers, text documents, web logs, and patents. The rapid growth of information has added a tremendous amount to current and past knowledge. During this process, new topics emerged, some topics split into many sub-topics, and many topics merged to form single topics. Manually selecting and searching for a topic in such a huge amount of information has been found to be an expensive and labor-intensive task. To meet the emerging need for an automatic process to locate, organize, connect, and associate these sources, researchers have proposed different techniques that automatically extract components of the information presented in various formats and organize or structure them. The data targeted for component extraction might be in the form of text, video, or audio. Various algorithms have structured information, grouped similar information into clusters, and weighted it on the basis of its importance. The organized, structured, and weighted data is then compared with other structures using various algorithms to find similarities. Semantic patterns can be found by employing visualization techniques that show the similarity or relation between topics over time or in relation to a specific event. In this paper, we propose a model based on the Cosine Similarity Algorithm for citation networks. It answers questions such as how to connect documents with the help of citations and content similarity, and how to visualize and navigate through the documents.
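Cosine similarity, the measure the model above is built on, compares two documents as term vectors. The sketch below makes three assumptions: simple lower-cased whitespace tokenization, raw term counts, and a hypothetical 0.3 threshold for keeping a cited document as a navigation link. The summary does not say how the authors combine citations with content similarity, so `similar_citations` is an illustrative assumption.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words term counts (lower-cased, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_citations(doc: str, cited: dict[str, str],
                      threshold: float = 0.3) -> list[tuple[str, float]]:
    """Rank documents cited by `doc` by content similarity (hypothetical threshold)."""
    query = tf_vector(doc)
    scored = [(doc_id, cosine_similarity(query, tf_vector(text)))
              for doc_id, text in cited.items()]
    kept = [s for s in scored if s[1] >= threshold]
    return sorted(kept, key=lambda s: s[1], reverse=True)


links = similar_citations("hot topic detection for patents",
                          {"p1": "topic detection in patents",
                           "p2": "image recognition and tracking"})
print(links)
```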
Technological Route between Pioneerism and Improvement (Roberto Nani)
The document presents a methodology for determining the technological route between pioneering inventions and incremental improvements using patent data. The methodology involves: (1) performing an initial patent search, (2) analyzing patent trends over time to calculate an "intellectual property density", (3) fitting the density data to logistic curves to identify major technological phases, and (4) using text clustering to identify new areas of innovation. As a case study, the methodology is applied to analyze the evolution of textile loom weft insertion technologies from 1932 to 2008. Three major technological phases were identified from the logistic curve fitting, each characterized by midpoint year and other metrics.
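The logistic-curve step can be illustrated under one assumption that the summary does not state: the saturation level K of the cumulative curve is known. In that case the logit transform ln(N / (K - N)) is linear in time, and ordinary least squares gives the growth rate and the midpoint year. This is a minimal sketch with hypothetical names and synthetic data, not the methodology's actual fitting routine.

```python
import math


def fit_logistic_midpoint(years: list[float], cumulative: list[float],
                          saturation: float) -> tuple[float, float]:
    """Fit N(t) = K / (1 + exp(-r (t - t_m))) for a known saturation level K.

    Uses the logit transform ln(N / (K - N)) = r * t - r * t_m, which is linear
    in t, and ordinary least squares. Returns (growth rate r, midpoint year t_m).
    Requires at least two distinct years with 0 < N < K.
    """
    points = [(t, math.log(n / (saturation - n)))
              for t, n in zip(years, cumulative) if 0 < n < saturation]
    if len(points) < 2:
        raise ValueError("need at least two points strictly between 0 and K")
    mean_t = sum(t for t, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    intercept = mean_y - slope * mean_t
    return slope, -intercept / slope


# Synthetic cumulative patent counts (hypothetical: K = 1000, r = 0.3, midpoint 1970)
years = list(range(1950, 1991, 5))
cumulative = [1000 / (1 + math.exp(-0.3 * (t - 1970))) for t in years]
print(fit_logistic_midpoint(years, cumulative, 1000.0))
```

When K is unknown, the usual alternative is a nonlinear least-squares fit of all three parameters.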
Data Mining of Project Management Data: An Analysis of Applied Research Studies (Gurdal Ertek)
Data collected and generated during and after projects, such as data residing in project management software and post-project review documents, can be a major source of actionable insights and competitive advantage. This paper presents a rigorous methodological analysis of the applied research published in the academic literature on the application of data mining (DM) to project management (PM). The objective of the paper is to provide a comprehensive analysis and discussion of where and how data mining is applied to project management data and to provide practical insights for future research in the field.
https://dl.acm.org/citation.cfm?id=3176714
https://ertekprojects.com/ftp/papers/2017/ertek_et_al_2017_Data_Mining_of_Project_Management_Data.pdf
This document provides guidance on writing the methodology chapter of a thesis. It discusses the key purposes and components of the methodology chapter, including an introduction, research questions or objectives, research design and framework, data collection methods, data analysis procedures, and considerations of reliability and validity. Examples of methodology chapter outlines and components are also provided for reference. The document emphasizes using the past tense and passive voice when describing the methodology.
A guide to deal with uncertainties in software project management (ijcsit)
Various project management approaches do not consider the impact that uncertainties have on a project. The threats posed by uncertainty in a project's day-to-day work are real and immediate, and expectations for a project are often high. The project manager faces a dilemma: decisions must be made in the present about future situations that are inherently uncertain. Managing uncertainty in a project can be a determining factor in the project's success. This paper presents a systematic review of uncertainty management in software projects and proposes a guide based on the review. The guide aims to present best practices for managing uncertainty in software projects in a structured way, including techniques and strategies for uncertainty containment.
This document discusses three frameworks used by Gartner Group to analyze information systems research: the technology maturity curve, adoption curve, and identification of strategic technologies. The maturity curve tracks how a technology matures over time through various stages from embryonic to obsolescence. The adoption curve shows how technologies are adopted cumulatively by organizations over time. Considering where technologies fall on these curves can provide insights into appropriate research questions and methodologies. Identifying strategic technologies may help determine promising areas for new research.
Towards a Software Engineering Research Framework: Extending Design Science R... (IRJET Journal)
This document proposes a framework for software engineering research that extends design science research. It discusses how software engineering is a relatively young discipline driven by technical innovations and trends. While much research has explored solutions, fundamental problems still exist. The proposed framework aims to consider both research paradigms and the theoretical and trans-disciplinary foundations of software engineering as an applied discipline. The framework includes elements of defining the research problem, determining if the research is theory-oriented or practice-oriented, incorporating relevant theories and knowledge, and employing a design-build-test-evaluate cycle. The goal is to provide a model that integrates prescriptions from different research paradigms while accounting for software engineering's characteristics as an applied field.
Great model a model for the automatic generation of semantic relations betwee... (ijcsity)
The large amount of available unstructured text belonging to different domains, such as healthcare (e.g. medical records), justice (e.g. laws, declarations), and insurance (e.g. declarations), increases the effort required to analyze information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing or annotating the texts. In particular, text summary strategies have proven very useful for providing a compact view of an original text. However, the available summarization strategies do not fit well in domains that must take into account two factors. The first is the temporal dimension of the text (e.g. a recent piece of text in a medical record is more important than a previous one). The second is the profile of the person who requires the summary (e.g. the medical specialization). To cope with these limitations, this paper presents "GReAT", a model for automatic summary generation. It relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and to discover new information by detecting related information. The GReAT model was implemented in software to be validated in a health institution. There it has proven very useful for displaying a preview of the information in medical health records and for discovering new facts and hypotheses within that information. Several tests, including Functionality, Usability and Performance tests, were executed on the implemented software. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information caused by providing a text shorter than the original.
IRJET- Characteristics of Research Process and Methods for Web-Based Rese... (IRJET Journal)
This document discusses characteristics of web-based research support systems (WRSS). It presents a framework for WRSS that focuses on supporting various phases of the research process through different information systems and sub-systems. WRSS aim to help scientists find relevant information, choose appropriate tools, and effectively present research results. The document provides an overview of the research process and phases that WRSS could support, such as idea generation, problem definition, planning experiments, analyzing data, and disseminating findings.
This document presents a study on the theoretical and empirical validation that has been done on aspect-oriented software maintainability metrics. It describes the methodology used, which involved searching literature sources and selecting papers related to aspect-oriented maintainability metrics. The results are discussed in tables, showing that most papers focus on empirical validation of metrics rather than theoretical validation. Several papers are described that empirically validated specific metrics related to maintainability. However, the study notes that more theoretical validation of metrics is still needed before empirical validation. Threats to validity, such as bias and limited data extraction, are also presented.
A Comprehensive Survey on Comparisons across Contextual Pre-Filtering, Contex... (TELKOMNIKA JOURNAL)
Recently, there has been growing interest in recommender systems (RS), and particularly in context-aware RS. Methods for generating context-aware recommendations are classified into pre-filtering, post-filtering and contextual modelling approaches. In this paper, we present several novel approaches for the different variants of each of these three contextualization paradigms, together with a complete survey of state-of-the-art comparisons across them. We then identify the significant challenges that current RS researchers need to address. This will help academics and practitioners compare the three approaches and select the best alternative for their strategies.
Mining Users Rare Sequential Topic Patterns from Tweets based on Topic Extrac... (IRJET Journal)
This paper proposes a method to mine rare sequential topic patterns (URSTPs) from tweet data. It involves preprocessing tweets to extract topics, identifying user sessions, generating sequential topic pattern (STP) candidates, and selecting URSTPs based on rarity analysis. Experiments show the approach can identify special users and interpretable URSTPs, indicating users' characteristics. The paper aims to capture personalized and abnormal user behaviors through sequential relationships between extracted topics from successive tweets.
The social dynamics of software development (aliaalistartup)
This document summarizes a study examining the software procurement processes between a university and several software vendors over a decade. It presents three case studies of information systems development histories and analyzes them using a social process model to depict how relationships evolved over time. Major events that changed the relationship between parties were identified as "encounters", with stable periods in between labeled "episodes". While traditional models view procurement statically, this longitudinal study revealed the dynamic nature of procurement strategies over time in the case studies.
Nowadays much is written about how to manage projects, but too little about what really happens in project actuality. Project Actuality emerged from the Rethinking Project Management (RPM) agenda in 2006 and aims at understanding what really happens in the project context. To understand the project actuality phenomenon, we first need a better comprehension of its definition and of how to observe and analyse it. This paper presents the results of a systematic review conducted to collect evidence on Project Actuality. The research covered four search engines and publications from 1994 to 2013. Among other findings, the study concludes that project actuality has been analysed by several methods and techniques, mostly in large organizations and the public sector, in Northern Europe. The most common definitions, techniques, and tips were identified, as well as the intent of transforming the results into knowledge.
An effective pre processing algorithm for information retrieval systems (ijdms)
The Internet is probably the most successful distributed computing system ever. However, our capabilities for data querying and manipulation on the Internet are primitive at best. User expectations have grown over time, along with the amount of operational data over the past few decades. Data users expect deeper, more exact, and more detailed results. Result retrieval for a user query always depends on how the data is stored and indexed. In information retrieval systems, tokenization is an integral part whose prime objective is to identify tokens and their counts. In this paper, we propose an effective tokenization approach based on a training vector, and the results show the efficiency and effectiveness of the proposed algorithm. Tokenization of documents helps satisfy users' information needs more precisely and sharply reduces the search space, and it is considered part of information retrieval. Pre-processing of the input document is an integral part of tokenization: it pre-processes documents and generates their respective tokens. Probabilistic IR computes its scores from these tokens and so works over a reduced search space. The comparative analysis is based on two parameters: the number of tokens generated and pre-processing time.
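The abstract does not describe its training-vector tokenizer. The following is therefore only a generic sketch of the two comparison parameters it names: the number of tokens generated and pre-processing time. The regular expression, the tiny stop-word list, and the function names are assumptions for illustration.

```python
import re
import time
from collections import Counter

# Hypothetical stop-word list for illustration; real systems use much larger lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(document: str) -> Counter:
    """Lower-case, split on non-alphanumeric characters, drop stop words, count tokens."""
    tokens = TOKEN_RE.findall(document.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def measure(documents: list[str]) -> tuple[int, float]:
    """Return (total tokens generated, pre-processing time in seconds)."""
    start = time.perf_counter()
    total = sum(sum(preprocess(d).values()) for d in documents)
    return total, time.perf_counter() - start


print(preprocess("The index of the Internet is an index."))
print(measure(["The index of the Internet is an index.", "Token counts"]))
```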
Creation of Software Focusing on Patent Analysis (IRJET Journal)
The document describes a software system created by students to analyze patents by extracting problems and solutions through text mining techniques like stemming and classification. The system uses stemming algorithms to reduce words to their root forms and removes stop words to make the analysis more efficient. It identifies problem and solution indicators in patent texts and classifies the problems and solutions to help users better understand patent information. The students evaluated the system on a set of patents and aim to expand it to handle more patents and provide more advanced clustering and mapping of analysis results.
This document provides an overview of how TRIZ (Theory of Inventive Problem Solving) and TRM (Technology Road Mapping) can be integrated based on previous literature. It describes the understanding of TRIZ and TRM, including their individual strengths. Three modes of integration are proposed: 1) Applying TRIZ concepts to enhance the TRM process, 2) Applying TRM concepts to enhance the TRIZ innovation process, and 3) Applying TRIZ methodology to link successive roadmapping processes. Previous efforts that focus on enhancement of TRM with TRIZ techniques are also discussed.
Process Mining in Supply Chains: A Systematic Literature Review (IJECEIAES)
Performance analysis and continuous process improvement efforts are often supported by the construction of process models representing the interactions of the partners in the supply chain. This study was conducted to determine the state of the art in the process mining field, specifically in the context of cross-organizational processes. The Systematic Literature Review (SLR) method is used to review a collection of twenty-one papers, classified according to the Artifact framework of Hevner et al. and the Process Mining framework of Van der Aalst. In the reviewed papers, the authors used a variety of techniques to establish the event log, which is then used to perform the process mining analysis. Eight of the reviewed papers focus on the definition of concepts or measures. Five of the papers describe models and other abstractions that are used as a theoretical basis for process mining in the context of supply chains. The majority of the papers (twenty) describe some kind of informal method or formal algorithm to perform process mining analysis. Nine of the papers that propose a formal algorithm also present an accompanying software implementation. Eight papers discuss data preparation challenges, and twelve papers discuss process discovery techniques.
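Many process discovery techniques start from the directly-follows relation of an event log. The minimal sketch below assumes each event is a (case id, timestamp, activity) triple, for example a shipment tracked across supply-chain partners, and counts the relation. The log format and function name are hypothetical and are not taken from the reviewed papers.

```python
from collections import Counter, defaultdict


def directly_follows(event_log: list[tuple[str, int, str]]) -> Counter:
    """Count directly-follows pairs (a, b) from an event log.

    Each event is (case_id, timestamp, activity). Events are grouped by case,
    ordered by timestamp, and every consecutive pair of activities is counted.
    """
    traces: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for case_id, ts, activity in event_log:
        traces[case_id].append((ts, activity))
    pairs: Counter = Counter()
    for events in traces.values():
        activities = [a for _, a in sorted(events)]
        pairs.update(zip(activities, activities[1:]))
    return pairs


# Hypothetical cross-organizational log: two orders handled by different partners
log = [("o1", 1, "order"), ("o1", 2, "ship"), ("o2", 1, "order"),
       ("o1", 3, "invoice"), ("o2", 2, "ship")]
print(directly_follows(log))
```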
This summarizes an academic paper that proposes an automatic ontology creation method for classifying research papers. It uses text mining techniques like classification and clustering algorithms. It first builds a research ontology by extracting keywords and patterns from previous papers. It then uses a decision tree algorithm to classify new papers into disciplines defined in the ontology. The classified papers are then clustered based on similarities to group them. The method was tested on a dataset of 100 papers and achieved average precision of 85.7% for term-based and 89.3% for pattern-based keyword extraction.
Hot Topic Detection and Technology Trend Tracking for Patents utilizing Term Frequency and Proportional Document Frequency and Semantic Information
Khanh-Ly Nguyen¹, Byung-Joo Shin², Seong Joon Yoo³
Dept. of Computer Engineering
Sejong University
Seoul, Republic of Korea
khanhly4682@gmail.com¹, {bjshin², sjyoo³}@sejong.ac.kr
Abstract—This paper proposes a methodology for identifying
hot topics and tracking technology trends from the patent
domain. The methodology uses frequency information in
combination with the International Patent Classification (IPC) to
capture semantic information on word categorization, doing so in
a way that heretofore has not been employed for topic detection
and trend tracking. Term Frequency and Proportional Document
Frequency (TF*PDF) is employed as a means to detect hot topics
from patents, and IPCs are used to calculate semantic
importance of terms based on the IPCs where terms are
distributed. Aging Theory is also used to calculate the variation
of trends over time. Four types of trends including very stable
trends, stable trends, normal trends, and unstable trends are
defined and evaluated based on TF*PDF and TF*PDF combined
with Aging Theory. Experimental results show that for very stable
trends, the combination of TF*PDF and Aging Theory achieves a
Precision of 0.976; for stable trends and all trends, TF*PDF
achieves Precision values of 0.959 and 0.84, respectively. By
applying TF*PDF in consideration of semantic information, we
also present a new criterion for weighting hot topics and for
technology trend tracking.
Keywords—Technology forecast; Trend analysis; Patent
analysis; Topic detection; Hot term extraction
I. INTRODUCTION
The increasing number of patent applications and the
growing need to access patent information make patent
analysis vital in order to 1) analyze large amounts of patent
data that are expensive to process manually, 2) increase the
quality of the useful information generated, and 3) support
decision-making processes that eventually increase the quality
of patents. Patent intelligence is used to encourage the
development of innovative products, devise technology
strategies, and reveal legal/business insights amid the
technical transformation.
Various tools and techniques have been developed for use
in detecting trends and forecasting future developments from
news stories or patent documents. In the patent domain, these
techniques include keyword-based
approaches[1][2][3], Subject-Action-Object (SAO)-based
approaches[4][5], property-function-based
approaches[6][7][8], rule-based approaches[9][10][11][12],
semantic analysis-based approaches[13][14][15], etc.
Keyword-based approaches rely on the frequencies and
co-occurrences of keywords; as a result, they lack a
representation of the relationships among technological
concepts and require expert knowledge to predefine keywords.
SAO-based approaches extract SAO structures, which consist of
a subject, action, and object and represent technological
concepts in a properties/functions format. SAO approaches are
usually employed in TRIZ trend analysis (TRIZ is a Russian
acronym for the Theory of Inventive Problem Solving). TRIZ
trend analysis analyzes and categorizes patents into known
trend phases, and the results have been used to identify the
evolution of technologies or to seek further improvements of
specific products. However, the method depends on the
expertise and skills of TRIZ experts to identify specific
trends and trend phases manually, which may be expensive or
infeasible.
For news stories collected from websites such as Google
News, Reuters, Yahoo, etc., news topics are detected by
techniques proposed under Topic Detection and Tracking
(TDT). TDT is intended to identify topics by exploring and
organizing the content of textual materials, enabling pieces
of information to be grouped into manageable clusters, wherein
each cluster represents a single topic. References [16] and [17]
proposed methods for hot topic extraction based on TF*PDF
and an agglomerative clustering algorithm. Reference [18]
constructed a topic hierarchy by identifying burst periods
of features. Reference [19] detected topics by identifying
both aperiodic and periodic feature bursts.
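TF*PDF can be sketched as follows, using its original news-channel formulation. A term's weight is a sum over channels. Each channel contributes the term's frequency normalized by that channel's term-frequency norm, multiplied by an exponential of the fraction of the channel's documents that contain the term. This section does not say what the authors treat as a channel for patents. Grouping documents by publication year in the example is therefore an assumption, and `tf_pdf` is a hypothetical helper.

```python
import math
from collections import Counter


def tf_pdf(channels: dict[str, list[list[str]]]) -> dict[str, float]:
    """TF*PDF weight of every term.

    channels maps a channel id to its documents, each given as a list of terms.
    W_j = sum over channels c of |F_jc| * exp(n_jc / N_c), where F_jc is the
    frequency of term j in channel c, |F_jc| is F_jc divided by the Euclidean
    norm of all term frequencies in c, n_jc is the number of documents in c
    containing j, and N_c is the number of documents in c.
    """
    weights: Counter = Counter()
    for docs in channels.values():
        freq = Counter(term for doc in docs for term in doc)
        if not freq:
            continue  # skip channels with no terms
        doc_freq = Counter(term for doc in docs for term in set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


# Hypothetical channels: patent documents grouped by publication year
w = tf_pdf({"2014": [["battery", "electrode"], ["battery", "anode"]],
            "2015": [["battery", "sensor"]]})
print(sorted(w.items(), key=lambda kv: kv[1], reverse=True))
```

The exponential factor rewards terms that appear across many documents of a channel. This is what lets TF*PDF favor widely reported hot terms, where TF*IDF would down-weight them.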
Given the large number of news topics constantly being
created and updated, there is a concern about how to rank
those topics in terms of timeliness and importance. Generally,
topic ranking is determined by two factors: one is how
frequently and recently a topic is reported by websites; the
other is how much attention users pay to it [20]. The first rule
focuses on returning timely results, and the second one
considers larger topics more important. Each rule involves
only one aspect of the ranking problem. Other factors
must also be taken into consideration: (1) every news story of a
topic contributes to its importance, while the contribution
decays along the timeline; and (2) topics that attract more
users' attention should be ranked higher.
³Corresponding author: Seong Joon Yoo; E-mail address: sjyoo@sejong.ac.kr
978-1-4673-8796-5/16/$31.00 ©2016 IEEE, BigComp 2016
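The two factors listed above can be combined in a simple score. Each story contributes to its topic, that contribution decays over time, and topics that draw more attention rank higher. The sketch below assumes exponential decay with a hypothetical rate and a per-story attention value such as a view count. It illustrates the idea and is not the ranking model of [20] or of this paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Story:
    topic: str
    timestamp: float  # e.g. days since some epoch
    attention: float  # hypothetical user-attention signal, e.g. a view count


def rank_topics(stories: list[Story], now: float,
                decay_rate: float = 0.1) -> list[tuple[str, float]]:
    """Score each topic as the sum of its stories' attention, decayed exponentially by age."""
    scores: dict[str, float] = {}
    for s in stories:
        age = max(0.0, now - s.timestamp)
        contribution = s.attention * math.exp(-decay_rate * age)
        scores[s.topic] = scores.get(s.topic, 0.0) + contribution
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


stories = [Story("old", 0.0, 10.0), Story("new", 9.0, 5.0)]
print(rank_topics(stories, now=10.0))                  # recency outweighs size
print(rank_topics(stories, now=10.0, decay_rate=0.0))  # no decay: size wins
```

The decay rate sets the balance between the two rules. A rate of zero reduces the score to pure attention, and a large rate makes ranking almost purely about recency.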
Generally, current topic detection and trend tracking
research is mainly based on timeline and frequency
features [21][22][23][24]. However, most of this research
lacks a representation of the semantic relationships between
terms. Some studies [25][26][27][28] consider binary
relationships, such as relationships from patents and
relationships from a predefined trend database, which requires
knowledge of domain experts for classifying trends in advance.
In this paper, we propose a methodology for hot topic
detection and trend tracking that uses frequency information in
combination with the IPC to capture semantic information on
word categorization, which has not been employed in the state
of the art. To obtain semantically meaningful topics of interest,
we apply the TF*PDF algorithm, which allows hot topics to be
generated over time. Moreover, we exploit the IPCs to capture
the importance of semantic information on word categorization
as presented in multiple IPCs, from which hot topics are
detected. Trends are identified as the normalized weight of a
topic over time. Four types of trends are defined: very stable
trends, stable trends, normal trends, and unstable trends.
Experimental results were compared with the TF*IDF baseline
for very stable trends, stable trends, and all trends detected by
the system.
II. RELATED WORKS
A. Patent Analysis
Because patents play an important role in intellectual
property protection, there has been growing interest in patent
analysis, patent search, patent query formulation [29], and
trend identification [30][31] from patent documents. Reference
[29] extracted key terms for patent query formulation using
semantic patterns and a keyword dependency relation graph.
Key terms are defined as "Problem/Solution" terms and are
extracted through semantic patterns. Relationships among key
terms are considered in a term-weighting scheme, and important
terms are ranked on the basis of their weights. To identify
emerging topics and transitions between topics, many
techniques based on term frequency have been applied: a
timeline chart is created by textual analysis, and the
evolutionary process of an emerging technology is visualized as
an S-curve. In other studies, combining citation networks with
text-mining techniques has improved the reliability of detecting
chronological changes, because co-citation networks provide
closer connections among patents. Timeline visualization of
co-citation networks, with labels extracted by text-mining
techniques, has been used to detect and trace emerging trends.
In particular, the co-citation
clusters have been used to build patent maps that can be
used to analyze the numbers of patents filed by competing
companies over the years and thereby to visualize the
evolutionary process of an emerging technology. Reference
[32] used subject-action-object (SAO) structures to generate
patent maps for identifying technological competition trends.
Semantic similarity between patents is measured on the basis of
their SAO structures, and a patent similarity matrix is
constructed. The output is visualized as a dynamic patent map,
which is used to identify technological vacuums and
technological hotspots. Reference [33] identified promising
patents for technology transfer and used TRIZ evolution trends
to evaluate the technologies in patents. A patent is considered
to have high future value if it is relevant to important future
TRIZ trends. The patents are ranked by their similarity scores
and then classified. However, the classification of TRIZ trends
may not be applicable to all technological domains, and the
classification must be revised by domain experts with
knowledge of TRIZ trends. Moreover, [15] extracts
information related to the properties and functions of a product
by identifying binary relationships in the form of "adjective +
noun" and "verb + noun". The Stanford dependency parser is
used to identify all binary relationships from the titles and
abstracts of patents. A "reasons for jumps" rule base that
arranges trend-specific binary relationships for trend
identification is defined, whereupon the most likely trends and
trend phases are determined by measuring the sentence semantic
similarity between the binary relationships from patents and
those from the "reasons for jumps" rule base. If two or more
phases related to a trend are identified from a patent, the logic
of the trend map as currently developed chooses the more
evolved one. The final output depicts the identified
evolutionary phases, which can be used as input for technology
forecasting based on TRIZ trends. Additionally, [31] proposed
Patent Trend Change Mining techniques as a means to capture
changes in patent trends through metadata analysis without the
need for specialist knowledge. The approach includes a patent
indicator calculator that determines patent values based on
citation index, originality, generality, and technology cycle
time. Then, changes in patent trends are determined by
association rule mining, which computes the similarities and
differences of patent trends between two different times.
B. International Patent Classification
The International Patent Classification (IPC) is
administered by the World Intellectual Property Organization.
Patent documents relevant to a particular inventive concept are
classified by an examiner during the examination process.
Each patent document is classified into IPCs based on the
technical field of the invention and can be assigned more than
one IPC code. Each IPC code is divided into subclass, main
group, and subgroup levels. In this research, we use the data
under the main group H01M 04. Fig. 1 shows the hierarchy of
H01M 04, which contains patent documents belonging to the
electrodes category and includes six subgroups.
IPC code      Description
H01M 04       Electrodes (electrodes for electrolytic processes)
H01M 04/02    . Electrodes composed of, or comprising, active material
H01M 04/04    .. Processes of manufacture in general
H01M 04/06    .. Electrodes for primary cells
H01M 04/08    ... Processes of manufacture
H01M 04/10    ... of pressed electrodes with central core
H01M 04/12    .... of consumable metal or alloy electrodes (use of alloy compositions as active materials)
Fig. 1. The IPC H01M 04 hierarchy
The identification of technological trends is one of the
most important tasks in acquiring knowledge from patent
sources, so as to quickly understand the latest innovative
technologies in high-tech industries and to acquire
technologies for future use. However, no previous research has
targeted the detection of patent trends using IPC information.
In our work, we use IPC information in a practical way to
guide hot-term extraction within a patent corpus. Because
patent documents describe many technical trends, it is very
difficult to identify the major trends in each domain without
IPC information.
For example, in the following sample sentence, "drain" and
"flash amperage" are highly ranked by general term-weighting
schemes based on frequency information, such as TF*IDF:
"A high DRAIN rate, primary alkaline cell comprising a
negative electrode...capable of providing a FLASH
AMPERAGE greater than an average of..."
By using IPC information, however, we rank these terms
with higher weights based on the number of IPCs (e.g., H01M
04 or H01M 10) to which "drain" and "flash amperage"
belong.
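To make the IPC-based intuition concrete, the following minimal sketch counts, for each term, the distinct IPC main groups of the patents in which it occurs. The toy records and the helper `ipc_spread` are hypothetical; the weighting actually used in this work is TF*PDF computed over IPCs (Definition 1), not this raw count.

```python
from collections import defaultdict


def ipc_spread(patents):
    """Map each term to the set of IPC main groups of the patents containing it.

    `patents` is a list of (terms, ipc_main_groups) pairs (a hypothetical format).
    """
    spread = defaultdict(set)
    for terms, ipcs in patents:
        for term in set(terms):
            spread[term].update(ipcs)
    return spread


# Hypothetical toy records; multi-word terms are kept as single tokens.
patents = [
    (["drain", "flash amperage", "cell"], ["H01M 04", "H01M 10"]),
    (["drain", "separator"], ["H01M 02"]),
    (["flash amperage", "electrode"], ["H01M 04"]),
]
spread = ipc_spread(patents)
# Rank by number of IPCs (descending), breaking ties alphabetically.
ranking = sorted(spread, key=lambda t: (-len(spread[t]), t))
print([(t, len(spread[t])) for t in ranking[:3]])  # [('drain', 3), ('cell', 2), ('flash amperage', 2)]
```

A term spread over several IPCs ("drain") outranks one confined to fewer IPCs, which is the effect the IPC information is meant to capture.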
C. Hot Topic Detection and Trend Analysis
Automatic extraction of meaningful topics, which can help
detect topics of interest and facilitate the analysis of user
behavioral data, has been studied in various contexts.
Reference [18] used Latent Dirichlet Allocation (LDA) to
extract latent topics by modeling temporal trends on Twitter.
Reference [19] modeled topics from a text corpus using LDA
and a selective Zipf distribution to determine whether a topic
description is well formed. Reference [34] proposed a
framework using probabilistic inference to detect
objectionable text content that has been shown to be harmful
to Web users. For a given sentence, the probability value,
which indicates the likelihood of the sentence with respect to
the model, is calculated, and a mapping function then
transforms the probability value into a new indicator for
deciding the type of Web text content.
Reference [35] proposed a hierarchical topic extraction
algorithm based on topic grain computation. By modeling the
distribution of word document frequency as a Gaussian
mixture, the topic grain is defined from the parameters of the
Gaussian mixture, and feature words are selected for each grain
by an EM-like algorithm. A clustering algorithm is then used
to generate a multiple-grain hierarchical topic structure with
different subtopic descriptions. Reference [36] incorporated
topic transitions into topic detection and tracking on Reuters
and BBS websites. They employed a topic representation based
on the hidden Markov model and applied fuzzy k-means
clustering to find the most likely topic-transition sequence.
Reference [37] addressed the problem of extracting significant
words that are highly useful for summarizing and presenting
topics from a huge number of news articles. To extract
keywords, [37] used an unsupervised keyword extraction
technique called Table Term Frequency, which includes several
variants of the conventional TF-IDF model, and filtered the
keywords by cross-domain comparison.
Reference [20] introduced an automatic online news-topic
key-phrase extraction system. Topics are constructed and
updated online automatically with techniques to determine the
degree of burst of terms along with the aging theory. The
proposed system extracts keyword candidates from single
news stories, filters them with topic information, and then
combines them into phrase candidates using position
information. Finally, the phrases are ranked and the top ones
are selected as topic key phrases.
In TDT, a topic is defined as a seminal event or activity,
along with all directly related events and activities. A hot
topic is defined as a topic that appears frequently over a period
of time [17]. The hotness of a topic depends on two factors:
how often hot terms appear in a document and the number of
documents that contain those terms. Moreover, the hotness of
each topic evolves over time through a life cycle of birth,
growth, maturity, and death.
Some studies have applied aging theory or timeline analysis
to TDT and hot-topic extraction [16][17]; topic hierarchy
construction based on the identification of burst periods of
features [18]; topic sentence extraction along a timeline given a
query [38]; topic detection based on the identification of both
aperiodic and periodic feature bursts [19]; and finding the top
burst topics by identifying burst words [30]. These approaches
analyzed the characteristics of features from a fixed corpus
over the whole timeline. To identify topics in large sets of
documents, we must determine the key terms that sufficiently
describe the topics. A term-weighting scheme is used to capture
important or representative terms in the content of a document,
for example by calculating how terms are distributed within a
document or across a corpus [16][36][37].
The most common term-weighting scheme for processing
index terms is TF-IDF. Because the TF-IDF scheme
emphasizes the importance or uniqueness of each term, it favors
terms that occur in only a few of the documents in a corpus.
For hot-topic extraction, however, terms that appear in many of
the documents in a corpus must be identified. A different
term-weighting scheme, Term Frequency * Proportional
Document Frequency (TF*PDF), therefore assigns greater
weights to terms that occur frequently in many documents on
many channels and lower weights to others, so that important
terms are not suppressed when they appear in many
documents. Although TF*PDF captures the basic concept of a
hot topic, its weakness is that it does not consider variations in
the popularity of a topic over time. Therefore, [17] combined
TF*PDF and aging theory to extract hot terms from a data
corpus in consideration of the term life cycle. Aging theory
models a news-topic life span with the four stages of birth,
growth, decay, and death to reflect its popularity over time. To
track the life cycles of topics, [17] used the concept of an
energy function: the energy of an event increases when the
event becomes popular and decreases as its popularity declines.
Hence, aging theory is suitable for tracking variations in the
frequency of terms, which are critical to successful hot-topic
extraction.
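The contrast between the two schemes can be seen on a toy corpus. The sketch below is illustrative only: it treats the whole corpus as a single channel, uses corpus-level term frequency for both schemes, and takes log(N/df) as the IDF; the four documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: "electrode" occurs in every document.
docs = [
    ["electrode", "lithium", "alloy"],
    ["electrode", "lithium", "separator"],
    ["electrode", "lithium", "nickel"],
    ["electrode", "cathode"],
]
N = len(docs)
tf = Counter(term for doc in docs for term in doc)       # corpus-level term frequency
df = Counter(term for doc in docs for term in set(doc))  # document frequency
norm = math.sqrt(sum(f * f for f in tf.values()))        # frequency normalizer used by TF*PDF


def tf_idf(term):
    """Corpus-level TF-IDF: frequency times log(N / df)."""
    return tf[term] * math.log(N / df[term])


def tf_pdf(term):
    """Single-channel TF*PDF: normalized frequency times exp(df / N)."""
    return tf[term] / norm * math.exp(df[term] / N)


for term in ["electrode", "lithium", "nickel"]:
    print(f"{term:10s} tf-idf={tf_idf(term):.3f}  tf*pdf={tf_pdf(term):.3f}")
```

TF-IDF gives "electrode" a weight of zero because it appears in every document, while TF*PDF ranks it first; the orderings of "electrode", "lithium", and "nickel" are exactly reversed.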
A technology life cycle is usually represented as an S-curve
containing four stages: innovation, growth, maturity, and
decline. The innovation stage is when a technology is born
from a new technical method, or when related phrases appear in
a small number of patents and slowly increase. The growth
stage is when a technology has been recognized and gathers
strength as the number of patents increases over time. The
maturity stage is when recognition is high and stable, with a
rapid increase in the number of patents. The decline stage is
when the technology declines. We apply three functions from
[17] to calculate and update the energy of topics in every time
slot: getEnergy() calculates the nutrition that a topic receives
from a story; energyFunction() converts a topic's nutrition
value into an energy value; and energyDecay() carries out the
energy decrease in each time slot.
We explore hot-topic detection from patent documents using
TF*PDF. Moreover, we exploit the IPCs under which the terms
of a patent document are categorized. In this research, we
apply the hot-term detection algorithm (TF*PDF) and use IPC
information to identify hot topics in a patent data corpus.
III. SYSTEM ARCHITECTURE
The overall procedure for hot-term detection and
technological trend identification from patent documents
consists of several steps, including data crawling, keyword
extraction, hot-term detection, and trend analysis, as shown in
Fig. 2. First, raw patent data are collected from a U.S. patent
database and transformed into structured data. A
part-of-speech (POS) tagger is used for keyword extraction,
and common stop words are removed together with a list of
patent-specific stop words. Then, a candidate list of hot terms
is compiled using the TF*PDF algorithm and aging theory.
Finally, we analyze patent trends based on the detected hot
terms and evaluate the results.
Fig. 2. System Architecture
A. Hot Term Detection
Two stop-word lists are used. One contains common stop
words and the other contains the words that are common in
patent documents and irrelevant to patent content.
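The two-list filtering step can be sketched as follows. Both lists are hypothetical stand-ins, since the lists actually used are not given, and POS tagging is omitted; only the removal of common and patent-specific stop words is shown.

```python
import re

# Hypothetical stop-word lists; the lists actually used in this work are not given.
COMMON_STOP_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "for", "with", "is"}
PATENT_STOP_WORDS = {"invention", "said", "wherein", "thereof", "therein", "herein", "embodiment"}


def content_tokens(text):
    """Lower-case the text, split it into alphabetic tokens, and drop both kinds of stop words."""
    stop_words = COMMON_STOP_WORDS | PATENT_STOP_WORDS
    return [tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in stop_words]


abstract = ("A battery electrode structure of flat configuration comprises a cast mass "
            "of electrochemically active material, said mass having contained therein")
print(content_tokens(abstract))
```

Patent boilerplate such as "said" and "therein" is removed by the second list even though a general-purpose stop-word list would keep it.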
In the hot-term extraction process, two characteristics of a
term are considered: the frequency of the term in the
collection in question (Definition 1) and the variation of the
term over time (Definition 2). Term frequency is measured by
TF*PDF [17][39]. TF*PDF is considered more suitable for
topic detection than TF*IDF because the former assigns greater
weights to terms that occur frequently in many documents. We
adopt the TF*PDF scheme in this paper, and the top m terms
are chosen as the final terms for trend analysis. Phrases that do
not include any of the final terms are excluded.
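The top-m selection and the phrase filter described above can be sketched as two small functions. The weights and phrases are hypothetical, and breaking ties alphabetically is our own choice.

```python
def select_final_terms(weights, m):
    """Return the m highest-weighted terms (ties broken alphabetically)."""
    return set(sorted(weights, key=lambda t: (-weights[t], t))[:m])


def filter_phrases(phrases, final_terms):
    """Keep only phrases that contain at least one final term."""
    return [p for p in phrases if any(w in final_terms for w in p.split())]


weights = {"electrode": 2.0, "lithium": 1.2, "alloy": 0.9, "nozzle": 0.1}  # hypothetical
final = select_final_terms(weights, m=3)
print(filter_phrases(["lithium alloy", "spray nozzle", "negative electrode"], final))
# ['lithium alloy', 'negative electrode']
```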
Definition 1 (TF*PDF). Given a term j, its TF*PDF weight
is calculated following [39], with IPCs as channels, by (1)
and (2):
$$W_j = \sum_{c=1}^{|C|} |F_{jc}| \exp\left(\frac{n_{jc}}{N_c}\right) \qquad (1)$$

$$|F_{jc}| = \frac{F_{jc}}{\sqrt{\sum_{k=1}^{K} F_{kc}^{2}}} \qquad (2)$$

where $W_j$ is the TF*PDF value of term j, i.e., the sum of the
term weights gained from each IPC c; $|C|$ is the number of
IPCs; $F_{jc}$ is the frequency of term j in IPC c; $K$ is the
total number of terms in IPC c; $n_{jc}$ is the number of
documents in IPC c in which term j occurs; and $N_c$ is the
total number of documents in IPC c.
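A minimal sketch of (1) and (2) with IPC codes as channels follows. It assumes the tokens are already stop-word filtered and that a patent listed under several IPCs contributes to each of them; the text does not say how multi-IPC patents are handled, and the toy data and function name are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_pdf_by_ipc(patents):
    """TF*PDF weights W_j with IPCs as channels, following (1) and (2).

    patents: list of (tokens, ipc_codes); a patent counts toward every IPC it lists.
    """
    freq = defaultdict(Counter)     # F_jc: frequency of term j in IPC c
    docfreq = defaultdict(Counter)  # n_jc: documents of IPC c containing term j
    ndocs = Counter()               # N_c: number of documents in IPC c
    for tokens, ipcs in patents:
        for c in set(ipcs):
            ndocs[c] += 1
            freq[c].update(tokens)
            docfreq[c].update(set(tokens))
    weights = defaultdict(float)
    for c, counts in freq.items():
        norm = math.sqrt(sum(f * f for f in counts.values()))  # denominator of (2)
        for term, f in counts.items():
            # |F_jc| * exp(n_jc / N_c), summed over IPCs as in (1)
            weights[term] += (f / norm) * math.exp(docfreq[c][term] / ndocs[c])
    return dict(weights)


patents = [  # hypothetical toy data
    (["electrode", "lithium", "alloy"], ["H01M 04"]),
    (["electrode", "nickel"], ["H01M 04", "H01M 10"]),
    (["separator", "lithium"], ["H01M 10"]),
]
w = tf_pdf_by_ipc(patents)
print(sorted(w, key=lambda t: (-w[t], t))[:3])  # ['electrode', 'lithium', 'nickel']
```

"electrode" wins because it is frequent within H01M 04 and also appears in H01M 10, so it collects weight from both channels.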
Definition 2 (Term Life Cycle). Reference [17] defined a
term life cycle model to calculate how the value of each term
varies over its cycle of birth, growth, decay, and death.
This step is suitable for tracking variations in the frequency of
terms, which are critical to successful hot-topic extraction. We
apply three functions from [17] to calculate the energy of topics
in each time slot: getEnergy(), energyFunction(), and
getVariation().
getEnergy() calculates the energy that a term receives from
patents in a specific time slot. The energy $E_{t,s}$ of term t
measures how strongly t is associated with time slot s; it is
accumulated over all patent IPCs, as in (3). Hence, hot terms
are those that have high energy across the IPCs.
$$E_{t,s} = \sum_{c \in C} X^{2}_{t,c} \qquad (3)$$

where C is the set of IPCs and $X^{2}_{t,c}$ is the association
between term t and time slot s in IPC c, given by (4):
$$X^{2}_{t,c} = \frac{(A+B+C+D)(AD-BC)^{2}}{(A+B)(C+D)(A+C)(B+D)} \qquad (4)$$

For each term, we calculate the contingency table shown in
Table I:

TABLE I. CONTINGENCY TABLE
          s     ¬s
  t       A     B
  ¬t      C     D

where A is the number of patents that contain term t in time
slot s; B is the number of patents that contain term t in other
time slots; C is the number of patents that do not contain term
t in time slot s; and D is the number of patents that do not
contain term t in other time slots.
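Equations (3) and (4) can be sketched directly. The zero-denominator guard is our addition, since (4) is undefined when a row or column of the contingency table is empty; the counts in the example are hypothetical.

```python
def chi_square(a, b, c, d):
    """X^2 for the 2x2 contingency table of Table I, as in (4).

    Returns 0.0 when a marginal total is zero (an added guard; (4) is undefined there).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def get_energy(tables):
    """E_{t,s}: sum of X^2_{t,c} over all IPCs c, as in (3).

    `tables` maps each IPC code to the counts (A, B, C, D) for term t and slot s.
    """
    return sum(chi_square(*abcd) for abcd in tables.values())


# Hypothetical counts: in H01M 04, t occurs in 8 of the 10 patents of slot s but in
# only 2 of the 40 patents of other slots; in H01M 10 it shows no association with s.
tables = {"H01M 04": (8, 2, 2, 38), "H01M 10": (1, 1, 9, 9)}
print(get_energy(tables))  # 28.125
```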
energyFunction() converts a term's energy value into a life
support value. The life support value $\mathit{lifeSupport}_{t,s}$
of t at time slot s is calculated as the logarithm of the
accumulated energy $E_{t,s}$, as in (5):

$$\mathit{lifeSupport}_{t,s} = \ln\left(E_{t,s}\right) \qquad (5)$$
getVariation() calculates the variation of the life support
values of term t over time in the patent collection, as in (6):

$$V_t = \frac{1}{N} \sum_{s \in I} \left(\mathit{lifeSupport}_{t,s} - \overline{\mathit{lifeSupport}_t}\right)^{2} \qquad (6)$$

where N is the number of time slots in the given interval I;
$\mathit{lifeSupport}_{t,s}$ is the life support value in time slot s;
$\overline{\mathit{lifeSupport}_t}$ is the average life support value;
and $V_t$ is the variation in the life support values of t during I.
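Equations (5) and (6) amount to a natural logarithm followed by a population variance. The sketch below simply requires positive energies, because the text does not say how a slot with zero energy (where ln is undefined) is handled; `math.log` raises `ValueError` in that case.

```python
import math


def life_support(energy):
    """energyFunction(): lifeSupport_{t,s} = ln(E_{t,s}), as in (5); energy must be > 0."""
    return math.log(energy)


def get_variation(energies):
    """getVariation(): mean squared deviation of the life support values over N slots, as in (6)."""
    values = [life_support(e) for e in energies]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Energies e, e^2, e^3 give life support values 1, 2, 3, whose variance is 2/3.
print(round(get_variation([math.e, math.e ** 2, math.e ** 3]), 3))  # 0.667
```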
The overall weight of term t is obtained by multiplying its
TF*PDF value by its variation value, as in (7):

$$\mathit{weight}_t = \mathrm{TF{*}PDF}_t \times V_t \qquad (7)$$

Finally, the candidate terms are ranked by this combined
weight, and the top k terms are chosen as hot terms that reflect
the hot topics in the corpus.
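A short sketch of (7) and the top-k ranking follows, with hypothetical TF*PDF and variation values. It shows how the variation factor can reorder terms relative to TF*PDF alone.

```python
def combined_weights(tfpdf, variation):
    """weight_t = TF*PDF_t * V_t, as in (7); terms missing from either map are skipped."""
    return {t: tfpdf[t] * variation[t] for t in tfpdf.keys() & variation.keys()}


def top_k(weights, k):
    """Rank terms by weight (ties alphabetical) and keep the top k as hot terms."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


tfpdf = {"electrode": 2.9, "lithium": 1.4, "alloy": 0.6}      # hypothetical TF*PDF values
variation = {"electrode": 0.1, "lithium": 0.5, "alloy": 0.9}  # hypothetical V_t values
print(top_k(combined_weights(tfpdf, variation), k=2))  # ['lithium', 'alloy']
```

"electrode" has the highest TF*PDF value but the lowest variation, so the combined weight drops it out of the top two.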
B. Trend Detection
We divide the patent timeline into yearly time slots. In
each time slot s, a trend is represented by the normalized
weight of occurrences of term t from n documents.
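The text does not spell out how the trend weights are normalized. Under one plausible reading, stated here as an assumption, the sketch below divides the number of patents in a year that contain the term by the total number of patents in that year, giving 0 for years without patents. The helper `yearly_trend` and the data are hypothetical.

```python
from collections import Counter


def yearly_trend(patents, term, years):
    """Hypothetical normalization: share of a year's patents that contain `term`.

    patents: list of (year, tokens). Years with no patents get 0.0.
    """
    totals = Counter(year for year, _ in patents)
    hits = Counter(year for year, tokens in patents if term in tokens)
    return [hits[y] / totals[y] if totals[y] else 0.0 for y in years]


patents = [(1976, ["electrode", "alloy"]), (1976, ["lithium"]), (1977, ["electrode"])]
print(yearly_trend(patents, "electrode", range(1976, 1979)))  # [0.5, 1.0, 0.0]
```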
IV. EXPERIMENTS
For hot topic detection, we compare the topics detected by
three algorithms: Chi-square, TF*PDF, and the combination of
TF*PDF and aging theory, where IPC information is employed
within TF*PDF. For hot technology trend tracking, we define
four types of trends and eliminate insignificant ones. To
evaluate the significance of the trends, we choose TF*IDF as
the baseline, since TF*IDF is the most common
term-weighting technique in the data mining area.
A. Data Collection
The data collection contains 513 patent documents crawled
from the U.S. Patent and Trademark Office (USPTO) database.
The patent documents are selected from the battery domain, as
published from 1977 to the present. They belong to IPC
H01M 04, which contains patent documents in the electrodes
category and includes six subgroups (H01M 04/02, H01M
04/04, H01M 04/06, H01M 04/08, H01M 04/10, and H01M
04/12). Fig. 3 shows a sample input patent document.
United States Patent 4,016,339 Gray et al. April 5, 1977
Abstract A battery electrode structure of flat configuration comprises a
cast mass of electrochemically active material, said mass having
contained therein and exposed opposite surfaces thereof an open-mesh
electrically conductive structure adapted for connection to a battery
terminal. An open-mesh electrically conductive support member in the
mass and in contact with the exposed electrically conductive structure
maintains electrical conductivity throughout discharge to ensure
maximum use of the active material.
Current International Class: H01M 04/06 (20060101); H01M
04/58 (20060101); H01M 06/34 (20060101); H01M 06/30 (20060101);
H01M 04/70 (20060101); H01M 004/02 ()
Fig. 3. Sample input patent document
B. Data Preparation
1) Data Extraction
A patent document from the USPTO contains several
sections, including "Title", "Abstract", "Claims", and
"Description". The "Title" is too short to be useful for our
research. The "Claims" section is insufficient because it may
contain fewer technology phrases than other sections, or its
terms are already included in the "Abstract". The
"Description" is lengthy and includes subfields such as "Field
of the Invention", "Prior Art", "Summary of the Invention",
"Detailed Description of the Preferred Embodiment", and
"Brief Description of the Drawings"; it contains meaningful
terms for the problem/solution extraction method, as shown by
[29]. For these preliminary experiments, we use only the
"Abstract", which is a short, precise summary of the invention.
2) IPC Extraction
From each patent document, we extract the IPC information
under the "Current International Class" tag, as shown in Fig. 3.
The main-group codes are then extracted using a regular
expression and stored in a list of IPCs, which is later used for
term weighting with the TF*PDF algorithm.
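The regular-expression step might look like the sketch below. The pattern, the two-digit normalization of main groups (so that "004" becomes "04"), and the function name are our assumptions; the expression actually used is not given.

```python
import re

# Hypothetical pattern: subclass (e.g. "H01M"), main group, "/", subgroup.
IPC_PATTERN = re.compile(r"\b([A-H]\d{2}[A-Z])\s*(\d{1,4})/(\d{2,6})")


def extract_main_groups(ipc_field: str) -> list[str]:
    """Return the distinct IPC main groups in order of first appearance."""
    groups: list[str] = []
    for subclass, main_group, _subgroup in IPC_PATTERN.findall(ipc_field):
        code = f"{subclass} {int(main_group):02d}"  # normalize "004" -> "04"
        if code not in groups:
            groups.append(code)
    return groups


field = ("H01M 04/06 (20060101); H01M 04/58 (20060101); H01M 06/34 (20060101); "
         "H01M 06/30 (20060101); H01M 04/70 (20060101); H01M 004/02 ()")
print(extract_main_groups(field))  # ['H01M 04', 'H01M 06']
```

The version stamps such as "(20060101)" are ignored because the pattern requires a subclass code before the digits.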
3) Timestamps
To identify the technology life cycle, we split the whole
time span into one-year intervals. Patent documents were
crawled from 1976 to 2014, but no patent was filed in the
H01M 04 class in 2006. Therefore, the data cover a total of
30 years, from 1976 to 2005.
C. Hot Topic Extraction
We compare the hot terms extracted by TF*PDF, Chi-square,
and the combined weight. This experiment validates the
effectiveness of each term-weighting methodology in hot-term
detection. We then demonstrate how genuine hot topics can be
identified by ranking terms by their weights.
Tables II, III, and IV show the top 10 hot topics ranked by
TF*PDF, Chi-square, and the combined weight, respectively.
The hot terms extracted by each algorithm differ because terms
are weighted according to different factors, such as
pervasiveness, topicality, or variation over the life cycle. The
terms detected by Chi-square appear infrequently and cannot be
detected using frequency information. These rare terms may be
suitable for detecting innovative technologies in the patent
domain. In contrast, TF*PDF and the combined weight mainly
detect terms that are pervasive and topical, that is, important
terms that may be considered topics of the patent documents.
TABLE II. TOP 10 HOT TOPICS DETECTED BY TF*PDF
1976 1980 1985 1988 1990 1994 2000 2003 2005
atmosphere atmosphere atmosphere resistance reactant areas increase liquid dry
refractory refractory shear layers accumulator coated increases ion free
telescopic conjugated tab retain makes coat fuel liquids spray
free axis reinforce retaining make uncoated iron shapes dryer
spray deterioration amalgamated strong end injecting ion wet inventive
disturbing size precipitation smooth severed nozzle polysaccharide floc condition
orous end establish web lugs passing dehydration disposition telescopic
electron wash force separating band fabrication period polytetrafluoroethylene disturbing
electronic reduce globular product negligible fabric corrosive dimensions flex
constant crystal acetate end permeates mounting pasting ions expectancy
TABLE III. TOP 10 HOT TOPICS DETECTED BY CHI-SQUARE
1976 1980 1985 1988 1990 1994 2000 2003 2005
web silver layer body grids hydrogen nickel electrode electrode
metal cathode graphite dy battery mprising cathode electro rod
emulsion athode battery powder lead comprising athode rod edge
catalytic group conductor electrode plastic rising hydroxide cell assembly
support active deposition lithium grooves copper electrochemical battery oil
mu vanadium metal electrolyte spaced lithium manganese mprising tab
fibers battery film portions space oil dioxide comprising coiled
heating connection material portion power foil improved rising ss
heat connect gel carbon lugs conductive mno end metallic
electrode addition deposit battery plates electrodes lithium single metal
TABLE IV. TOP 10 HOT TOPICS DETECTED BY THE COMBINED WEIGHT
1976 1980 1985 1988 1990 1994 2000 2003 2005
form material electrode material battery layer nickel electrode electrode
layer oxide electro powder lead hydrogen material cells electro
la active layer body grids mprising lithium cell rod
electrolyte cathode battery dy power comprising improved ce lithium
material athode material electrode electrode rising electrode el alloy
ia silver al electro electro material electrochemical battery electrolyte
battery battery ia lithium rod lithium chemical mprising anode
high vanadium er cathode layer copper electrolyte comprising electrochemical
anode anode graphite athode plastic battery cathode rising chemical
web lithium ph electrolyte high electrode athode plate battery
improved alloy includes Electro material electro ring material layer
D. Trend Tracking and Analysis
In our experiments, we obtained a list of 3,211 topics. To
identify topics that could represent technical trends and to
eliminate insignificant ones, we define four types of trends that
satisfy the following criteria, where a decay value is a time slot
in which the trend value is 0:
Very stable trends: trends with no decay values over the
entire life cycle.
Stable trends: trends with fewer than three decay values.
Normal trends: trends that have at least two continuous
time spans of three years or more, with three to seven decay
values.
Unstable trends: trends that have only one or two
continuous time spans of at least three years and more than
seven decay values.
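The four criteria leave some details open, so the sketch below fixes one reading of them: a decay value is a year whose trend value is exactly 0, a "continuous time span" is a run of consecutive non-zero years, the range of three to seven is inclusive, and the types are checked in the order listed. The function name and the `None` result for trends that match no type (treated as insignificant) are our own choices.

```python
from itertools import groupby


def classify_trend(values):
    """Assign one of the four trend types from a sequence of yearly trend values.

    Assumed reading of the criteria: a decay value is a year with value 0, and a
    continuous time span is a run of consecutive non-zero years.
    Returns None for trends that fit no type.
    """
    decays = sum(1 for v in values if v == 0)
    runs = [len(list(g)) for nonzero, g in groupby(values, key=lambda v: v != 0) if nonzero]
    long_runs = sum(1 for r in runs if r >= 3)  # continuous spans of three years or more
    if decays == 0:
        return "very stable"
    if decays < 3:
        return "stable"
    if 3 <= decays <= 7 and long_runs >= 2:
        return "normal"
    if decays > 7 and 1 <= long_runs <= 2:
        return "unstable"
    return None


print(classify_trend([1] * 10 + [0] * 5 + [1] * 15))  # normal
```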
1) Trends by Chi-square
Fig. 4 shows four very stable trends detected by the
Chi-square algorithm. As shown in Fig. 4, "ion" is the hottest
technology developed for secondary batteries from 1976 to
2005.
Fig. 4. Very-Stable Trends by Chi-square
All normal trends detected by Chi-square from 1976 to
2005 are shown in Table 5.
TABLE V. NORMAL TRENDS BY CHI-SQUARE
hydrogen, portion, battery, include, anode, agent, high, electrodes,
electrochemical, electrolyte, mproved, gas, sheet, end, structure, electric,
surface, hydroxide, comprising, electrical, providing, current, salt, porous,
oxide, lithium, solution, substrate, including, alkaline, coat, cathode, mixing,
manufacturing, energy, active, alkali, density, face, plurality, powder, storage,
mixture, process, carbon, proper, ions, tab, contact, disc, compound, step,
method, area, chemical, matrix, part, ratio, cycle, article, composition,
charge, pen, igh, line, bind, fabric, improve, treat, car, polymer, dioxide,
sulfide, excellent, outer, voltage, element, forming, conductive, type,
improved, alloy, life, rate, nickel, solid, plate, mix, making, acid, orous,
athode
2) Trends by TF*PDF
The top 10 very stable trends detected by TF*PDF are
shown in Fig. 5.
Fig. 5. Top 10 Very Stable Trends by TF*PDF
Fig. 6 shows the top 10 very stable trends detected by the
combined weight from 1976 to 2005. These trends decayed
fewer than three times and then grew continuously.
"Material" is the hottest trend, receiving high attention in the
electrode domain over the period from 1976 to 2005. The
second very stable trend is "electrode". Although "electrode"
does not reach a higher peak than "electro" or "electrolyte", its
values do not fall in any year, unlike the others. The trend of
"ba" falls nearly to the bottom, although it has a very high peak
one year before decaying. In contrast, two other trends,
"electrolyte" and "da", are growing and are expected to be
among the hottest trends in the patent domain in the next few
years. The term "method" is not a technical term and has little
significance as a technological trend; it stays high throughout
only because it is used in almost all patent documents,
particularly in Problem and Solution expressions, as noted in
[29]. It will be excluded from our term list in further
experiments.
Fig. 6. Top 10 Very Stable Trends by combination of TF*PDF and Aging
Theory (1976~2005)
E. Evaluation
We evaluate the results for very stable trends, stable
trends, and all trends detected by the system, comparing the
TF*PDF algorithm and the combined weight with the baseline.
We do not show the evaluation for Chi-square because
Chi-square is not effective in detecting trends (Recall =
0.0851).
As shown in Table VI, the combined-weight algorithm
achieves the highest precision for very stable trends (0.976);
however, its recall is lower than that of the TF*PDF algorithm
by 0.064. TF*PDF is also more effective than the combined
weight for stable trends and achieves higher precision for all
trends. Thus, in the patent domain, the hot trends extracted by
TF*PDF using IPC information are not affected by their life
cycles over time.
TABLE VI. PRECISION AND RECALL OF TF*PDF AND THE COMBINED WEIGHT
                        Precision   Recall
Very stable trends
  TF*PDF                0.957       0.936
  Combined Weight       0.976       0.872
Stable trends
  TF*PDF                0.959       0.959
  Combined Weight       0.864       0.776
All trends
  TF*PDF                0.84        0.601
  Combined Weight       0.665       0.612
V. CONCLUSIONS
We have proposed a system for the extraction of hot topics
and the detection of hot trends from the patent domain within
a specific time period using TF*PDF and semantic information
about where terms are distributed (IPCs). Our work makes the
following contributions:
The automatic detection of hot technological topics
from the patent domain;
The use of semantic information for hot-topic
detection, which has not previously been done in the
state of the art;
The automatic tracking of hot trends from the patent
domain.
Our implementation is intended to detect hot topics and
track trends in terms of pervasiveness and topicality. We apply
the TF*PDF weighting algorithm to extract terms with
pervasiveness. To determine a term's topicality, we apply aging
theory to track changes in the term's life cycle. The
combination of TF*PDF and aging theory has been shown to
improve the quality of hot-topic extraction in news documents.
However, our research with patent documents shows that a
term's life cycle does not affect the topicality of hot topics in
the patent domain. This is because patent documents contain
technical terms distributed yearly or monthly, and there are
significantly rare terms that appear only a few times in the
entire corpus. In contrast, in news documents, terms are
distributed daily with very frequent changes; consequently,
variation plays an important role in the change of a term's life
cycle.
By utilizing IPC information, in which documents are
manually classified into specific categories by patent experts,
we automatically detect technological trends from the patent
domain in a specific time period. The terms extracted from
those documents therefore belong to specific categories, and
the importance of each term is evaluated based on its
importance in specific IPCs. This allows a hot topic to be
identified based on its importance in each IPC rather than
being affected by the variation in its life cycle.
For a huge number of patents with lengthy and difficult
technical terms, it is necessary to quickly identify the hottest
information about which technologies were invented with high
attention. The experimental results show that our approach
provides an effective methodology for hot-topic extraction and
technological trend detection from the patent domain. By
applying the hot-term detection algorithm using Term
Frequency-Proportional Document Frequency in consideration
of IPC information, we have shown an important new criterion
for weighting hot topics based on semantic categorization,
which has not previously been applied in the patent domain.
ACKNOWLEDGMENT
This research was supported by the MSIP (Ministry of
Science, ICT and Future Planning), Korea, under the Global IT
Talent support program (IITP-2014-H0905-14-1005) and the
Establishing IT Research Infrastructure Projects (I2221-14-1012)
supervised by the IITP (Institute for Information and
Communication Technology Promotion).
REFERENCES
[1] K. Börner, C. Chen, and K.W. Boyack, "Visualizing knowledge
domains," Annual Review of Information Science and Technology, vol.
37, pp. 179-255, 2003.
[2] Y. Ding, G.G. Chowdhury, and S. Foo, "Bibliometric cartography of
information retrieval research by using co-word analysis," Information
Processing and Management, vol. 37, no. 6, pp. 817-842, 2001.
[3] S. Lee, B. Yoon, and Y. Park, "An approach to discovering new
technology opportunities: keyword-based patent map approach,"
Technovation, vol. 29, no. 6-7, pp. 481-497, 2009.
[4] M. Moehrle, L. Walter, A. Geritz, and S. Muller, "Patent-based inventor
profiles as a basis for human resource decisions in research and
development," R&D Management, vol. 35, no. 5, pp. 513-524, 2005.
[5] J. Yoon and K. Kim, "Identifying rapidly evolving technological trends
for R&D planning using SAO-based semantic patent networks,"
Scientometrics, vol. 88, no. 1, pp. 313-331, 2013.
[6] S. Dewulf, "Directed variation: variation of properties for new or
improved function product DNA, a base for 'connect and develop',"
World Conference: TRIZ Future, 2006.
[7] D. Mann, Hands-on systematic innovation, Belgium: Creax press, 2002.
[8] J. Yoon and K. Kim, "An analysis of property-function based patent
networks for strategic R&D planning in fast-moving industries: The case
of silicon-based thin film solar cells," Expert Systems with Applications,
vol. 39, no. 9, pp. 7709-7717, 2012.
[9] M.L. Antonie and O.R. Zaiane, "Text document categorization by term
association," In Proceedings of the 2002 IEEE international conference
on data mining, 2002.
[10] X.Y. Chen, Y. Chen, L. Wang, and Y.F. Hu, "Text categorization based
on frequent patterns with term frequency," In Proceedings of 2004
international conference on machine learning and cybernetics, 2004.
[11] H. Han, E. Manavoglu, C. Giles, and H. Zha, "Rule-based word
clustering for text classification," In Proceedings of the 26th annual
international ACM SIGIR conference on research and development in
information retrieval, 2003.
[12] C. He and H.T. Loh, "Pattern-oriented associative rule-based patent
classification," Expert Systems with Applications, vol. 37, no. 3, 2010.
[13] I. Bergmann, D. Butzke, L. Walter, J.P. Fuerste, M.G. Moehrle, and V.A.
Erdmann, "Evaluating the risk of patent infringement by means of
semantic patent analysis: the case of DNA chips," R&D Management,
vol. 38, no. 5, pp. 550-562, 2008.
[14] T. Magerman, B.V. Looy, and X. Song, "Exploring the feasibility and
accuracy of Latent Semantic Analysis based text mining techniques to
detect similarity between patent documents and scientific publications,"
Scientometrics, vol. 82, no. 2, pp. 289-306, 2010.
[15] J. Yoon and K. Kim, "An automated method for identifying TRIZ
evolution trends from patents," Expert Systems with Applications, vol.
38, no. 12, pp. 15540-15548, 2011.
[16] C.C. Chen, Y.T. Chen, Y. Sun, and M.C. Chen, "Life cycle modeling of
news events using Aging Theory," In Proceedings of the 14th European
Conference on Machine Learning, pp. 47-59, 2003.
[17] K.Y. Chen, L. Luesukprasert, and S.T. Chou, "Hot topic extraction
based on timeline analysis and multidimensional sentence modeling,"
IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8,
2007.
[18] M.C. Yang and H.C. Rim, "Identifying interesting Twitter contents
using topical analysis," Expert Systems with Applications, vol. 41, no. 9,
pp. 4330-4336, 2014.
[19] J. Zeng, J. Duan, W. Cao, and C. Wu, "Topics modeling based on
selective Zipf distribution," Expert Systems with Applications, vol. 39,
no. 7, pp. 6541-6546, 2012.
[20] C. Wang, M. Zhang, L. Ru, and S. Ma, "An automatic online news topic
key phrase extraction system," IEEE/WIC/ACM International
Conference on Web Intelligence and Intelligent Agent Technology, 2008.
[21] Y. Chen, H. Amiri, Z. Li, and T. Chua, "Emerging topic detection for
organizations from microblogs," In Proceedings of the 36th international
ACM SIGIR conference on research and development in information
retrieval, pp. 43-52, 2013.
[22] L. Christiansen, T. Schimoler, R. Burke, and B. Mobasher, "Modeling
topic trends on the social web using temporal signatures," In
Proceedings of the twelfth international workshop on Web information
and data management, pp. 3-10, 2012.
[23] S. Lee, J. Lee, C. Park, and J. Lee, "Blog topic analysis using TF
smoothing and LDA," In Proceedings of the 7th International
Conference on Ubiquitous Information Management and
Communication, 2013.
[24] R. Long, H. Wang, Y. Chen, O. Jin, and Y. Yu, "Towards effective
event detection, tracking and summarization on Microblog data," Web-
Age Information Management (Lecture Notes in Computer Science),
vol. 6897, pp. 652-663, 2011.
[25] P. Erdi, K. Makovi, Z. Somogyvari, K. Strandburg, J. Tobochnik, P.
Volf, and L. Zalanyi, "Prediction of emerging technologies based on
analysis of the US patent citation network," Scientometrics, vol. 95, no.
1, pp. 225-242, 2013.
[26] C. Lee, B. Song, and Y. Park, "How to assess patent infringement risks:
a semantic patent claim analysis using dependency relationships,"
Technology Analysis & Strategic Management, vol. 25, no. 1, pp. 23-38,
2013.
[27] A.J. Trappey, C.V. Trappey, C. Wu, C.Y. Fan, and Y. Lin, "Intelligent
patent recommendation system for innovative design collaboration,"
Journal of Network and Computer Applications, vol. 36, no. 6, pp. 1441-
1450, 2013.
[28] J. Yoon and K. Kim, "TrendPerceptor: A property-function-based
technology intelligence system for identifying technological trends from
patents," Expert Systems with Applications, vol. 39, no. 3, pp. 2927-
2938, 2012.
[29] K.L. Nguyen and S.H. Myaeng, "Query enhancement for patent prior-art
search based on key-term dependency relationships and semantic tags,"
Lecture Notes in Computer Science, vol. 7356, pp. 28-42, 2012.
[30] Y.G. Kim, J.H. Suh, and S.C. Park, "Visualization of patent analysis for
emerging technology," Expert Systems with Applications, vol. 34, no. 3,
pp. 1804-1812, 2008.
[31] M.J. Shih, D.R. Liu, and M.L. Hsu, "Discovering competitive
intelligence by mining changes in patent trends," Expert Systems with
Applications, vol. 37, no. 4, pp. 2882-2890, 2010.
[32] J. Yoon, H. Park, and K. Kim, "Identifying technological competition
trends for R&D planning using dynamic patent maps: SAO-based
content analysis," Scientometrics, vol. 94, no. 1, pp. 313-331,
2013.
[33] H. Park, J.J. Ree, and K. Kim, "Identification of promising patents for
technology transfers using TRIZ evolution trends," Expert Systems with
Applications, vol. 40, no. 2, pp. 736-743, 2013.
[34] J. Duan and J. Zeng, "Web objectionable text content detection using
topic modeling technique," Expert Systems with Applications, vol. 40,
no. 15, pp. 6094-6104, 2013.
[35] J. Zeng, C. Wu, and W. Wang, "Multiple-grain hierarchical topic
extraction algorithm for text mining," Expert Systems with Applications,
vol. 37, no. 4, pp. 3202-3208, 2010.
[36] J.P. Zeng and S.Y. Zhang, "Incorporating topic transition in topic
detection and tracking algorithms," Expert Systems with Applications,
vol. 36, no. 1, pp. 227-232, 2009.
[37] S. Lee and H.J. Kim, "News Keyword Extraction for Topic Tracking,"
Networked Computing and Advanced Information Management (NCM),
2008.
[38] S.Y. Chen, T.T. Tseng, H.E. Ke, and C.T. Sun, "Social trend tracking by
time series based social tagging clustering," Expert Systems with
Applications, vol. 38, no. 10, pp. 12807-12817, 2011.
[39] K.K. Bun and M. Ishizuka, "Topic Extraction from News Archive Using
TF*PDF Algorithm," In Proceedings of the 3rd International Conference
on Web Information Systems Engineering, pp. 73-82, 2002.