SlideShare a Scribd company logo
1 of 6
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3509
Data Dimension Reduction for Clustering Semantic Documents using
SVD Fuzzy C-Mean (SVD-FCM)
Hsu-Kuang Chang
Associative Professor, Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The rapid growth of XML adoption has urged
for the need of a proper representation for semi-
structured documents, where the document semantic
structural information has to be taken into account so as
to support more precise document analysis. In order to
analyze the information represented in XML documents
efficiently, researches on XML document clustering are
actively in progress. The key issue is how to devise the
similarity measure between XML documents to be used for
clustering. In this paper, we introduce data dimension
reduction (DDR) based on the SVD factorization
(DDR/SVD) for the documents similarity calculating and
DDR SVD Fuzzy C-Mean (DDR SVD-FCM) After projecting
XML documents to the lower dimensional space obtained
from DDR, our proposed method fuzzy c-mean to execute
the document-analysis clustering algorithms (SVD-FCM).
DDR can substantially reduce the computing time and/or
memory requirement of a given document-analysis
clustering algorithm, especially when we need to run the
document-analysis algorithm many times for estimating
parameters or searching for a better solution.
Key Words: SVD, DDR, XML, SVD-FCM
1. INTRODUCTION
An XML document, which is semi-structured data, has a
hierarchical structure. Therefore, rather than using the
similarity measure of the general document clustering
techniques as it is, a new similarity measure which
considers the semantic and structural information of an
XML document must be investigated. However, some XML
clustering methods used the similarity measure which
only takes the structural information of XML documents
into account. Hwang proposes a clustering method which
extracts a typical structure of the maximum frequency
pattern n using PrefixSpan algorithm [1] on XML
documents [2, 3]. However, since such a typical structure
extracted from XML documents is not the only structure
which represents the XML document itself, it cannot be the
representative of the whole document corpus, since there
is an accuracy issue of similarity. Lian summarizes XML
documents into an S-graph which is a structural graph,
and proposes that the calculation method of the distance
between S-graphs is used for clustering [4]. However, they
do not consider semantic information on XML documents
since they only focus on structural information. Since
dimension reduction is one of the fundamental methods of
data analysis, there have been a great many studies on
effective and efficient dimension reduction algorithms.
There are linear dimension reduction algorithms including
principal component analysis (PCA) [5] and
multidimensional scaling (MDS) [6]. There are also
nonlinear dimensional reduction algorithms (NLDR)
including an Isomap [7], locally linear embedding (LLE)
[8], [9], Hessian LLE [10], Laplacian eigenmaps [11], local
tangent space alignment (LTSA) [12] and a distance
preserving dimension reduction based on a singular value
decomposition (DPDR/QR) [13]. These dimensions cover a
variety of areas such as biomedical image recognition,
biomedical text data mining, and biological data analysis.
The rest of this paper is organized as follows. In Section 2,
we introduce the prepared XML documents on a vector
space model. In Section 3, we show the DDR based on the
SVD decomposition and the DDR SVD-FCM. Section 4
presents the experimental results illustrating properties of
the proposed DDR methods. A summary is made in Section
5.
2. Preparation of Semantic-based XML Documents
In this section, we first introduce the pre-processing
steps for the incorporation of hierarchical information in
encoding the XML tree’s paths. This is based on the
preorder tree representation (PTR) [14] and will be
introduced after a brief review of how to generate an XML
tree from an XML document.
Chart-1 illustrates an example of structural summary. By
applying the phase of the nested reduction on tree T1, we
derived tree T2 where there are no nested nodes. By
applying the repetition reduction on tree T2, we derived
tree T3, which is the structural summary tree without
nested and repeated nodes. Once the trees have been
compacted using structural summaries, the nesting and
repetition are reduced.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3510
Chart -1: nested and repeated nodes extraction
Now the XML document is modeled as a XML tree T=(V,E)
where V={ v1, v2, ....} as a set of vertices and V1 ,
V2 , Evv ),( 21 as a set of edges. As an example,
Chart-2 depicts a sample XML tree containing some
information about the collection of books. The book
consists of intro tags, each comprising title, author and
date tags. Each author contains fname and lname, each
date includes year and month tags. Chart-2 left shows only
the first letter of each tag for simplicity.
Chart -2: Example of XML Document
The XML document has a hierarchical structure and this
structure is organized with tag paths, to represent
document characteristics which can predict the contents
of the XML document. Strictly speaking, this shows the
semantic structural characteristics of the XML document.
In this paper, we propose a new method for calculating the
similarity using all of the tag paths of the XML tree
representing the semantic structural information of the
XML document. From now on, a tag path is termed path
element. Table-1 shows path elements obtained from the
XML document in Chart-3. The PEL-i represents the
extracted path elements on the XML document tree from
the ith tree level to the leaf node. For example, the PEL-1
means the path element from the root level (level 1) to the
leaf node, and PEL-2 means the path element from the level
2 to the leaf node respectively.
Chart -3: XML Documents Example
Table -1: Path elements example
PEL-1 PEL-2 PEL-3 PEL-4
/B/I/T/D /I/T/D /T/D D
/B/I/A/F /I/A/F /A/F F
/B/I/A/L /I/A/L /A/L L
/M/I/A/L /I/T /T
/B/I/T/ /I/A /A
/B/I/A /I
/B/I/
/B
/M/I/T
/M
/M/I
3. Path Element of the Vector Space Model (PEVSM)
Vector model represents a document as a vector whose
elements are the weights of the path elements within the
document. To calculate the weight of each path element
within the document, a Term Frequency and IDF (Inverse
Document Frequency) method is used [15]. We define the
PESSW (Path Element Structural Semantic Weight) which
calculates the weight of the path element in an XML
document. The PESSW is PEWF (Path Element Weighted
Frequency) multiplied by the PEIDF (Path Element Inverse
Document Frequency). The PESSWij of ith path element in
the jth document is shown in equation (1). In this paper,
we use the PESSW and DPESSW interchange.
ijijij PEIDFPEWFPESSW  (1)
ijPEWF is shown in equation (2).
nijij
x
freqPEWF
1
 (2)
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3511
ijfreq is a frequency of j-th path element in a i-th
document and it is multiplied by level weight n
x
1
in order
to consider the semantic importance of a path element in a
document. X refers to the level number of the highest tag
of a tag path. The level number of the root tag is 1, and
that of a tag under the root tag is 2, and so on. N is a real
number larger than 1, and in this paper, 1 is chosen for the
value of n. PEIDFij is shown in equation (3). PEIDFij is
shown in equation (3).
j
ij
DF
N
PEIDF log (3)
where N is the total number of documents and DFj is the
number of documents in which the jth path element
appears. The PESSW is prudently calculated to correctly
reflect the structural semantic similarity. Table-2 shows
the PEWF, PEIDF, and PESSW on sample trees in Chart-3.
Table-2: An example of PTWF, PTIDF and PESSW
Path
Eleme
nt
PEWF PEIDF PESSW
do
c1
do
c2
do
c3
do
c1
do
c2
do
c3
do
c1
do
c2
do
c3/B/I/T
/D
1.0 0.0 0.0 1.1 0.0 0.0 1.1 0.0 0.0
/B/I/A
/F
1.0 1.0 0.0 0.4
1
0.4
1
0.0 0.4
1
0.4
1
0.0
/B/I/A
/L
1.0 1.0 0.0 0.4
1
0.4
1
0.0 0.4
1
0.4
1
0.0
/M/I/
A/L
0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1
/B/I/T
/
2.0 1.0 0.0 0.4
1
0.4
1
0.0 0.8
1
0.4
1
0.0
/B/I/A 1.0 1.0 0.0 0.4
1
0.4
1
0.0 0.4
1
0.4
1
0.0
/B/I/ 2.0 1.0 0.0 0.4
1
0.4
1
0.0 0.8
1
0.4
1
0.0
/B 1.0 1.0 0.0 0.4
1
0.4
1
0.0 0.4
1
0.4
1
0.0
/M/I/
T
0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1
/M 0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1
/M/I 0.0 0.0 2.0 0.0 0.0 1.1 0.0 0.0 2.2
/I/T/D 0.5 0.0 0.0 1.1
0
0.0 0.0 0.5 0.0 0.0
/I/A/F 0.5 0.5 0.0 0.4
1
0.4
1
0.0 0.2
1
0.2
1
0.0
/I/A/L 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0
/I/T 1.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0
/I/A 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0
/I 1.0 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0
/T/D 0.3
3
0.0 0.0 1.1 0.0 0.0 0.3
3
0.0 0.0
/A/F 0.3
3
0.3
3
0.0 0.4
1
0.4
1
0.0 0.1
4
0.1
4
0.0
/A/L 0.3
3
0.3
3
0.3
3
0.0 0.0 0.0 0.0 0.0 0.0
/T 0.6
7
0.3
3
0.3
3
0.0 0.0 0.0 0.0 0.0 0.0
/A 0.3
3
0.3
3
0.3
3
0.0 0.0 0.0 0.0 0.0 0.0
D 0.2
5
0.0 0.0 1.1 0.0 0.0 0.2
7
0.0 0.0
F 0.2
5
0.2
5
0.0 0.4
1
0.4
1
0.0 0.1 0.1 0.0
L 0.2
5
0.2
5
0.2
5
0.0 0.0 0.0 0.0 0.0 0.0
Let dx and dy be two vectors which represent an XML
document docx and docy. Cosine similarity is defined as
being the angle between two vectors and is quantified by
equation (4) and (5).
||||
cos
yx
T
yx
dd
dd


 , that is (4)










t
k
ky
t
k
kx
t
k
kykx
yx
T
yx
yx
ww
ww
dd
dd
docdocsim
1
2
1
2
1
||||
),( (5)
),...,,( 21 txxxx wwwd  , ),...,,( 21 tyyyy wwwd 
and ),...,,( 21 txxx www is weight of dx, ),...,,( 21 tyyy www
is weight of document of dy, and t is the total number of
path elements in dx, dy respectively [16].
3.1 Singular Value Decomposition (SVD) and DDR SVD
Using SVDLSI, the original path element document
matrix mxnPESSW is first decomposes into three matrices:
T
nxnmxnmxmmxn VSUPESSW 
where U and V contain orthonormal columns and S is
diagonal. By restricting the matrices U, V and S to their
first k <min(m,n) columns, one obtains the matrix
T
kxnkxkmxkmxnmxn VSUPESSWD ˆˆˆ)(ˆ 
, where Dˆ is the best square approximation of D by a
matrix of rank k. To deal with novel documents not
included in the path element-document matrix D, one can
project the novel document vector onto the “semantic
space” of dimension k and measure distance directly in the
semantic space. Thus, an XML document will eventually
be represented as a matrix,
kxm
xd  , with each column
being the projection of the element-specific feature vector
on the semantic space. We call this version of PESSW as
thin-SVD of the DPESSW in the subsequent sections. In this
paper, we focus on DDR based on the SVD (DDR/SVD) for
the sake of simple presentation.
The thin SVD of D is
T
VSUD 111 ,
where
nm
U 
1 has only n basis vectors,
nn
S 
1 is a
diagonal matrix, and
nn
V 
1 is an orthogonal matrix.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3512
According to the cosine similarities are preserved in the t-
dimensional space owing to
22222121
11
ˆˆ
ˆˆ)()(
),cos(
ji
j
T
i
ji
j
T
i
j
T
j
T
j
TT
i
T
ji
yy
yy
yy
yy
dUdU
dUdU
dd  .
We refer this method to as DDR SVD is based on the SVD.
3.2 Our proposed DDR SVD-FCM Algorithm
As described in the previous section from the DDR SVD
decomposition, we have the originated document vector
PESSW, document vector DU T
(refer to economic SVD
decomposition), document vector DQTˆ (refer to SVD
decomposition of rank PESSW), and document vector
DUT~
(refer to efficient low rank PESSW on SVD
decomposition) based on the XML documents, which is
taken as the SVD-FCM input data and then goes through
the clustering.







dUVctor
dUVctor
dUVector
PESSWVector
input
T
T
T
e
~
ˆ
if
F(I)=[f1,f2,…..,fc] , where  

N
j
ij
N
j
jiji
N
Pf
11
1
 .
This method developed by [17] and improved by [18] [19]
is frequently used in pattern recognition. It is based on the
minimization of the following objective function:
2
1 1
ji
N
i
C
j
m
ijm cdJ   
 , 1 ≤ m ≤∞
where m is any real number greater than 1, uij is the
degree of membership of di in the cluster j, di is the ith of d-
dimensional measured data, cj is the d-dimension center of
the cluster, and ||*|| is any norm expressing the
dissimilarity between any measured data and the center.
The objective function shown above with the update of
membership uij and the cluster centers cj by:













C
k
m
ki
ji
ij
cd
cd
1
1
2
1
 ,





 N
i
m
ij
N
i
i
m
ij
j
d
c
1
1


This iteration will stop when  
}{max )()1( k
ij
k
ijij ,
where  is a termination criterion between 0 and 1,
whereas k are the iteration steps.
4. Experiment Result
In the Chart-4, we show the occupied space (k) on the
variant document vectors DPESSW, DU T
, DUTˆ , and
DUT~
from 400, 800, 1200, 1600, and 2000 XMLs
separately. When comparing the DPESSW with the DUT~
on
2000 XMLs, we could save a lot of space, and importantly
unaffected the clustering result where we used the SVD-
FCM would be unaffected. In the Chart-5, we show CPU
executing time (ms) to run SVD-FCM on the variant
document vectors DPESSW , DU T
, DUTˆ , and DUT~
from
400, 800, 1200, 1600, and 2000 XMLs separately. When
comparing the DPESSW with the DUT~
on 2000 XMLs, we
could save a great deal of time, and importantly the
clustering result where we used the SVD-FCM still remains
the right result.
Chart-4: Space on the variant vector from different #
XMLs
Chart-5: CPU (ms) executing SVD-FCM on the variant
vector
SVD-FCM
Clusterin
g
Algorithm
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3513
In the Table-3, we show the percentage of the space saved
when comparing DU T
with DUTˆ , DU T
with
DUT~
, DUTˆ with DUT~
,on running SVD-FCM using 2000
XMLs and 1600 XMLs from 5 DTDs. We also show the
percentage of the CPU executing time when comparing
DU T
with DUT~
, DU T
and DUTˆ , DUTˆ with DUT~
,
on running SVD-FCM using 2000 XMLs and 1600 XMLs
from 5 DTDs. On comparing DU T
with DUTˆ , we find
that using DUTˆ instead of DU T
on 2000 XMLs for the
SVD-FCM saving 69% of space in k and 13% of CPU time in
ms. In the Table-4, we show the percentage of the space
saved and CPU time (ms) when comparing original
document vector DPESSW with DU T
, DUTˆ , and DUT~
on
running SVD-FCM using 2000 XMLs and 1600 XMLs from 5
DTDs. When comparing DPESSW with DUTˆ , we find that
using DUTˆ instead of DPESSW on 2000 XMLs for the SVD-
FCM saving 94% of space in k and 36% of CPU running
time in ms.
Table-3: Percentage of space and CPU saved in thin-SVD
and low rank thin-SVD
Space(k)/CPU(ns)
Variant Vectors
2000 XMLs 1600 XMLs
% saved
Space CPU
% saved
Space CPUDU T
~ DUTˆ 69% 13% 69% 14%
DU T
~ DUT~ 84% 57 % 84% 58%
DUTˆ ~ DUT~ 49% 50% 49% 51%
Table-4: Percentage of space and CPU saved in DPESSW /
thin-SVD
Space(k)/CPU(ns)
Variant Vectors
2000 XMLs 1600 XMLs
% saved
Space CPU
% saved
Space CPU
DPESSW ~ DU T 80% 26 % 80%
26%
DPESSW ~ DUTˆ 94% 36 % 94%
36%
DPESSW ~ DUT~ 97% 68% 97% 68
%
5. CONCLUSION
The original XML documents DN=[d1,d2,..,dN] are modeled
on the vector space model according to the path element
of each document, that is DPESSW (PESSW), then a SVD was
conducted on the DPESSW. We derived the DPESSW =USVT,
and then took the low-rank on the
T
kkkk VSUD ˆ upon
the k low-rank from rank (DPESSW) to efficient low rank of
DPESSW. We passed the 4 resulting vectors (DPESSW, DU T
,
DUTˆ and DUT~
) into the SVD-FCM clustering algorithm
to attain the clustering result. In terms of the clustering
results of the section experiment, we found the same
clustering result as from the variant DPESSW, DU T
, DUTˆ ,
and DUT~
vectors. From the practical experiment results,
we conclude that using the low-rank vector DUT~
instead
of the original document (PESSW), not only saved the
space on the input vector but also took less time to cluster
on the documents.
REFERENCES
[1] J. Pei, J. Han, B. M. Asi, H. Pinto, “PrefixSpan : Mining
Sequential Pattern efficiently by Prefix-Projected
Pattern Growth”, Int. Conf. Data Engineering(ICDE),
2001.
[2] J. H. Hwang, K. H. Ryu, XML A New XML clustering for
Structural Retrieval, International Conference on
Conceptual Modeling, 2004.
[3] Jwong Hee Hwang, Keun ho Ryu, Clustering and
Retrieval of XML documents by Structure,
Computational Science and Its Applications-ICCSA
2005.
[4] Wang Lian, David Wai-lok, An Efficient and Scalable
Algorithm for Clustering XML Documents by
Structure, IEEE Computer Society Technical
Committee on Data Engineer-ing , 2004.
[5] W. F. Massay, Principal components regression in
exploratory statistical research, J. Amer Statist. Assoc.,
vol. 60, pp. 234-246, 1965.
[6] W. S. Torgerson, Theory & Methods of Scaling. New
York: Wiley,1958.
[7] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A
global geometric framework for nonlinear
dimensionality reduction," Science, vol. 290, no. 5500,
pp. 2319-2323, 2000.
[8] S. T. Roweis and L. K. Saul, Nonlinear dimensionality
reduction by locally linear embedding Science, vol.
290, pp. 2323-2326, 2000.
[9] L. K. Saul and S. T. Roweis, "Think globally, fit locally:
Unsupervised learning of low dimensional manifolds,"
Journal of Machine Learning Research, vol. 4, pp. 119-
155, 2003.
[10] D. L. Donoho and C. E. Grimes, "Hessian eigenmaps:
locally embedding techniques for high-dimensional
data," Proc. Natl Acad. Sci. USA, vol. 100, pp. 5591-
5596, 2003.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3514
[11] M. Belkin and P. Niyogi, Laplacian eigenmaps for
dimensionality reduction and data representation,
Neural Computation, vol. 15, no. 6, pp. 1373-1396,
2003.
[12] Z. Zhang and H. Zha, "Principal manifolds and
nonlinear dimension reduction via tangent space
alignment," SIAM Journal of Scientific Computing, vol.
26, no. 1, pp. 313-338, 2004.
[13] Hyunsoo Kim, Haesun Park, and Hongyuan Zha,
Distance preserving dimension reduction using the QR
factorization or the Cholesky factorization,
Proceedings of the 2007 IEEE 7th International
Symposium on Bioinformatics & Bioengineering (BIBE
2007), vol I, pages 263-269.
[14] Theodore Dalamagas, Tao Cheng, Klaas Jan Winkel,
Timos Sellis, A Methodology for Clustering XML
Documents by Structure, Information Systems, 31(3):
187-228, 2006.
[15] Gao J. and Zhang J. (2005): Clustered SVD strategies in
latent semantic indexing.—Inf. Process. Manag., Vol.
41, No. 5, pp. 1051–1063.
[16] Berry M.W. and Shakhina A.P. (2005): Computing
sparse reduced-rank approximation to sparse matrices.
— ACM Trans. Math. Software, 2005, Vol. 31, No. 2, pp.
252–269.
[17] J. C. Dunn, Well Separated Clusters and Optimal Fuzzy
Partitions, Journal Cybern., 4, 1974,. 95-104.
[18] J. C. Bezdek, Numerical Taxonomy with Fuzzy Sets, J.
Math. Biol., 1, 1974, 57-71.
[19] J. C. Bezdek, Fuzzy Mathematics in Pattern
Classification, (Ph.D Thesis, Cornell University, 1973).

More Related Content

What's hot

Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase IJECEIAES
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniquesIJDKP
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMARIntroduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMARBHARATH KUMAR
 
The Database Environment Chapter 14
The Database Environment Chapter 14The Database Environment Chapter 14
The Database Environment Chapter 14Jeanie Arnoco
 
A New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured DocumentsA New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured DocumentsIJAEMSJORNAL
 
Introduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewIntroduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewPrabu U
 
Modified approach to transform arc from text to linear form text a preproces...
Modified approach to transform arc from text to linear form text  a preproces...Modified approach to transform arc from text to linear form text  a preproces...
Modified approach to transform arc from text to linear form text a preproces...sipij
 
Research methods module 5 msf
Research methods module 5 msfResearch methods module 5 msf
Research methods module 5 msfIndependent
 
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...Optimization of Surface Roughness Index for a Roller Burnishing Process Using...
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...paperpublications3
 
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGSXML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGSijdms
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique ofIJDKP
 
An extended database reverse engineering – a key for database forensic invest...
An extended database reverse engineering – a key for database forensic invest...An extended database reverse engineering – a key for database forensic invest...
An extended database reverse engineering – a key for database forensic invest...eSAT Publishing House
 
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...AM Publications
 
UNIT III NON LINEAR DATA STRUCTURES – TREES
UNIT III 	NON LINEAR DATA STRUCTURES – TREESUNIT III 	NON LINEAR DATA STRUCTURES – TREES
UNIT III NON LINEAR DATA STRUCTURES – TREESKathirvel Ayyaswamy
 
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTSINVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTScsandit
 
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSE
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSEGENERALIZATION AND INSTANTIATION FOR COMPONENT REUSE
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSESamira Sadaoui
 

What's hot (20)

Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniques
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMARIntroduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
Introduction of Data Structures and Algorithms by GOWRU BHARATH KUMAR
 
The Database Environment Chapter 14
The Database Environment Chapter 14The Database Environment Chapter 14
The Database Environment Chapter 14
 
A New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured DocumentsA New Structural Similarity Measure: Clustering of Multi-Structured Documents
A New Structural Similarity Measure: Clustering of Multi-Structured Documents
 
Introduction to DBMS and SQL Overview
Introduction to DBMS and SQL OverviewIntroduction to DBMS and SQL Overview
Introduction to DBMS and SQL Overview
 
Modified approach to transform arc from text to linear form text a preproces...
Modified approach to transform arc from text to linear form text  a preproces...Modified approach to transform arc from text to linear form text  a preproces...
Modified approach to transform arc from text to linear form text a preproces...
 
Research methods module 5 msf
Research methods module 5 msfResearch methods module 5 msf
Research methods module 5 msf
 
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...Optimization of Surface Roughness Index for a Roller Burnishing Process Using...
Optimization of Surface Roughness Index for a Roller Burnishing Process Using...
 
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGSXML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
XML COMPACTION IMPROVEMENTS BASED ON BINARY STRING ENCODINGS
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
An extended database reverse engineering – a key for database forensic invest...
An extended database reverse engineering – a key for database forensic invest...An extended database reverse engineering – a key for database forensic invest...
An extended database reverse engineering – a key for database forensic invest...
 
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...
Improvement in Traditional Set Partitioning in Hierarchical Trees (SPIHT) Alg...
 
Bo4301369372
Bo4301369372Bo4301369372
Bo4301369372
 
Ds
DsDs
Ds
 
UNIT III NON LINEAR DATA STRUCTURES – TREES
UNIT III 	NON LINEAR DATA STRUCTURES – TREESUNIT III 	NON LINEAR DATA STRUCTURES – TREES
UNIT III NON LINEAR DATA STRUCTURES – TREES
 
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTSINVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS
INVESTIGATING BINARY STRING ENCODING FOR COMPACT REPRESENTATION OF XML DOCUMENTS
 
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSE
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSEGENERALIZATION AND INSTANTIATION FOR COMPONENT REUSE
GENERALIZATION AND INSTANTIATION FOR COMPONENT REUSE
 
09.01 normalization
09.01 normalization09.01 normalization
09.01 normalization
 

Similar to IRJET- Data Dimension Reduction for Clustering Semantic Documents using SVD Fuzzy C-Mean (SVD-FCM)

Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLs
Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLsMethodology for Managing Dynamic Collections on Semantic Semi-Structured XMLs
Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLsIRJET Journal
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringIRJET Journal
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...IRJET Journal
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET Journal
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern MatchingIOSR Journals
 
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSSVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSijscmcj
 
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODELQUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODELIRJET Journal
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmIRJET Journal
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET Journal
 
The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...journalBEEI
 
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONA NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONsipij
 
The Evaluation Model of Garbage Classification System Based on AHP
The Evaluation Model of Garbage Classification System Based on AHPThe Evaluation Model of Garbage Classification System Based on AHP
The Evaluation Model of Garbage Classification System Based on AHPDr. Amarjeet Singh
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET Journal
 
Vol 15 No 3 - May 2015
Vol 15 No 3 - May 2015Vol 15 No 3 - May 2015
Vol 15 No 3 - May 2015ijcsbi
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONcscpconf
 
Data-Structure-original-QuantumSupply.pdf
Data-Structure-original-QuantumSupply.pdfData-Structure-original-QuantumSupply.pdf
Data-Structure-original-QuantumSupply.pdflehal93146
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGIJCI JOURNAL
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingKyong-Ha Lee
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONcsandit
 

Similar to IRJET- Data Dimension Reduction for Clustering Semantic Documents using SVD Fuzzy C-Mean (SVD-FCM) (20)

Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLs
Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLsMethodology for Managing Dynamic Collections on Semantic Semi-Structured XMLs
Methodology for Managing Dynamic Collections on Semantic Semi-Structured XMLs
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
 
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
Semantic Semi-Structured Documents of Least Edit Distance (LED) Calculation f...
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
 
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONSSVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
 
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODELQUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
QUICK GLANCE: UNSUPERVISED EXTRACTIVE SUMMARIZATION MODEL
 
Effective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch AlgorithmEffective Data Retrieval in XML using TreeMatch Algorithm
Effective Data Retrieval in XML using TreeMatch Algorithm
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
 
The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...The evaluation performance of letter-based technique on text steganography sy...
The evaluation performance of letter-based technique on text steganography sy...
 
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONA NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
 
The Evaluation Model of Garbage Classification System Based on AHP
The Evaluation Model of Garbage Classification System Based on AHPThe Evaluation Model of Garbage Classification System Based on AHP
The Evaluation Model of Garbage Classification System Based on AHP
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF Metric
 
Vol 15 No 3 - May 2015
Vol 15 No 3 - May 2015Vol 15 No 3 - May 2015
Vol 15 No 3 - May 2015
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
 
Data-Structure-original-QuantumSupply.pdf
Data-Structure-original-QuantumSupply.pdfData-Structure-original-QuantumSupply.pdf
Data-Structure-original-QuantumSupply.pdf
 
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSINGFEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
FEATURES MATCHING USING NATURAL LANGUAGE PROCESSING
 
Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query Processing
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATIONLATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
LATTICE-CELL : HYBRID APPROACH FOR TEXT CATEGORIZATION
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 

Recently uploaded (20)

Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 

IRJET- Data Dimension Reduction for Clustering Semantic Documents using SVD Fuzzy C-Mean (SVD-FCM)

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3509 Data Dimension Reduction for Clustering Semantic Documents using SVD Fuzzy C-Mean (SVD-FCM) Hsu-Kuang Chang Associative Professor, Department of Information Engineering, I-Shou University, Kaohsiung, Taiwan ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - The rapid growth of XML adoption has urged for the need of a proper representation for semi- structured documents, where the document semantic structural information has to be taken into account so as to support more precise document analysis. In order to analyze the information represented in XML documents efficiently, researches on XML document clustering are actively in progress. The key issue is how to devise the similarity measure between XML documents to be used for clustering. In this paper, we introduce data dimension reduction (DDR) based on the SVD factorization (DDR/SVD) for the documents similarity calculating and DDR SVD Fuzzy C-Mean (DDR SVD-FCM) After projecting XML documents to the lower dimensional space obtained from DDR, our proposed method fuzzy c-mean to execute the document-analysis clustering algorithms (SVD-FCM). DDR can substantially reduce the computing time and/or memory requirement of a given document-analysis clustering algorithm, especially when we need to run the document-analysis algorithm many times for estimating parameters or searching for a better solution. Key Words: SVD, DDR, XML, SVD-FCM 1. INTRODUCTION An XML document, which is semi-structured data, has a hierarchical structure. Therefore, rather than using the similarity measure of the general document clustering techniques as it is, a new similarity measure which considers the semantic and structural information of an XML document must be investigated. However, some XML clustering methods used the similarity measure which only takes the structural information of XML documents into account. Hwang proposes a clustering method which extracts a typical structure of the maximum frequency pattern n using PrefixSpan algorithm [1] on XML documents [2, 3]. However, since such a typical structure extracted from XML documents is not the only structure which represents the XML document itself, it cannot be the representative of the whole document corpus, since there is an accuracy issue of similarity. Lian summarizes XML documents into an S-graph which is a structural graph, and proposes that the calculation method of the distance between S-graphs is used for clustering [4]. However, they do not consider semantic information on XML documents since they only focus on structural information. Since dimension reduction is one of the fundamental methods of data analysis, there have been a great many studies on effective and efficient dimension reduction algorithms. There are linear dimension reduction algorithms including principal component analysis (PCA) [5] and multidimensional scaling (MDS) [6]. There are also nonlinear dimensional reduction algorithms (NLDR) including an Isomap [7], locally linear embedding (LLE) [8], [9], Hessian LLE [10], Laplacian eigenmaps [11], local tangent space alignment (LTSA) [12] and a distance preserving dimension reduction based on a singular value decomposition (DPDR/QR) [13]. These dimensions cover a variety of areas such as biomedical image recognition, biomedical text data mining, and biological data analysis. The rest of this paper is organized as follows. In Section 2, we introduce the prepared XML documents on a vector space model. In Section 3, we show the DDR based on the SVD decomposition and the DDR SVD-FCM. Section 4 presents the experimental results illustrating properties of the proposed DDR methods. A summary is made in Section 5. 2. Preparation of Semantic-based XML Documents In this section, we first introduce the pre-processing steps for the incorporation of hierarchical information in encoding the XML tree’s paths. This is based on the preorder tree representation (PTR) [14] and will be introduced after a brief review of how to generate an XML tree from an XML document. Chart-1 illustrates an example of structural summary. By applying the phase of the nested reduction on tree T1, we derived tree T2 where there are no nested nodes. By applying the repetition reduction on tree T2, we derived tree T3, which is the structural summary tree without nested and repeated nodes. Once the trees have been compacted using structural summaries, the nesting and repetition are reduced.
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3510 Chart -1: nested and repeated nodes extraction Now the XML document is modeled as a XML tree T=(V,E) where V={ v1, v2, ....} as a set of vertices and V1 , V2 , Evv ),( 21 as a set of edges. As an example, Chart-2 depicts a sample XML tree containing some information about the collection of books. The book consists of intro tags, each comprising title, author and date tags. Each author contains fname and lname, each date includes year and month tags. Chart-2 left shows only the first letter of each tag for simplicity. Chart -2: Example of XML Document The XML document has a hierarchical structure and this structure is organized with tag paths, to represent document characteristics which can predict the contents of the XML document. Strictly speaking, this shows the semantic structural characteristics of the XML document. In this paper, we propose a new method for calculating the similarity using all of the tag paths of the XML tree representing the semantic structural information of the XML document. From now on, a tag path is termed path element. Table-1 shows path elements obtained from the XML document in Chart-3. The PEL-i represents the extracted path elements on the XML document tree from the ith tree level to the leaf node. For example, the PEL-1 means the path element from the root level (level 1) to the leaf node, and PEL-2 means the path element from the level 2 to the leaf node respectively. Chart -3: XML Documents Example Table -1: Path elements example PEL-1 PEL-2 PEL-3 PEL-4 /B/I/T/D /I/T/D /T/D D /B/I/A/F /I/A/F /A/F F /B/I/A/L /I/A/L /A/L L /M/I/A/L /I/T /T /B/I/T/ /I/A /A /B/I/A /I /B/I/ /B /M/I/T /M /M/I 3. Path Element of the Vector Space Model (PEVSM) Vector model represents a document as a vector whose elements are the weights of the path elements within the document. To calculate the weight of each path element within the document, a Term Frequency and IDF (Inverse Document Frequency) method is used [15]. We define the PESSW (Path Element Structural Semantic Weight) which calculates the weight of the path element in an XML document. The PESSW is PEWF (Path Element Weighted Frequency) multiplied by the PEIDF (Path Element Inverse Document Frequency). The PESSWij of ith path element in the jth document is shown in equation (1). In this paper, we use the PESSW and DPESSW interchange. ijijij PEIDFPEWFPESSW  (1) ijPEWF is shown in equation (2). nijij x freqPEWF 1  (2)
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3511 ijfreq is a frequency of j-th path element in a i-th document and it is multiplied by level weight n x 1 in order to consider the semantic importance of a path element in a document. X refers to the level number of the highest tag of a tag path. The level number of the root tag is 1, and that of a tag under the root tag is 2, and so on. N is a real number larger than 1, and in this paper, 1 is chosen for the value of n. PEIDFij is shown in equation (3). PEIDFij is shown in equation (3). j ij DF N PEIDF log (3) where N is the total number of documents and DFj is the number of documents in which the jth path element appears. The PESSW is prudently calculated to correctly reflect the structural semantic similarity. Table-2 shows the PEWF, PEIDF, and PESSW on sample trees in Chart-3. Table-2: An example of PTWF, PTIDF and PESSW Path Eleme nt PEWF PEIDF PESSW do c1 do c2 do c3 do c1 do c2 do c3 do c1 do c2 do c3/B/I/T /D 1.0 0.0 0.0 1.1 0.0 0.0 1.1 0.0 0.0 /B/I/A /F 1.0 1.0 0.0 0.4 1 0.4 1 0.0 0.4 1 0.4 1 0.0 /B/I/A /L 1.0 1.0 0.0 0.4 1 0.4 1 0.0 0.4 1 0.4 1 0.0 /M/I/ A/L 0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1 /B/I/T / 2.0 1.0 0.0 0.4 1 0.4 1 0.0 0.8 1 0.4 1 0.0 /B/I/A 1.0 1.0 0.0 0.4 1 0.4 1 0.0 0.4 1 0.4 1 0.0 /B/I/ 2.0 1.0 0.0 0.4 1 0.4 1 0.0 0.8 1 0.4 1 0.0 /B 1.0 1.0 0.0 0.4 1 0.4 1 0.0 0.4 1 0.4 1 0.0 /M/I/ T 0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1 /M 0.0 0.0 1.0 0.0 0.0 1.1 0.0 0.0 1.1 /M/I 0.0 0.0 2.0 0.0 0.0 1.1 0.0 0.0 2.2 /I/T/D 0.5 0.0 0.0 1.1 0 0.0 0.0 0.5 0.0 0.0 /I/A/F 0.5 0.5 0.0 0.4 1 0.4 1 0.0 0.2 1 0.2 1 0.0 /I/A/L 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 /I/T 1.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 /I/A 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 /I 1.0 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0 /T/D 0.3 3 0.0 0.0 1.1 0.0 0.0 0.3 3 0.0 0.0 /A/F 0.3 3 0.3 3 0.0 0.4 1 0.4 1 0.0 0.1 4 0.1 4 0.0 /A/L 0.3 3 0.3 3 0.3 3 0.0 0.0 0.0 0.0 0.0 0.0 /T 0.6 7 0.3 3 0.3 3 0.0 0.0 0.0 0.0 0.0 0.0 /A 0.3 3 0.3 3 0.3 3 0.0 0.0 0.0 0.0 0.0 0.0 D 0.2 5 0.0 0.0 1.1 0.0 0.0 0.2 7 0.0 0.0 F 0.2 5 0.2 5 0.0 0.4 1 0.4 1 0.0 0.1 0.1 0.0 L 0.2 5 0.2 5 0.2 5 0.0 0.0 0.0 0.0 0.0 0.0 Let dx and dy be two vectors which represent an XML document docx and docy. Cosine similarity is defined as being the angle between two vectors and is quantified by equation (4) and (5). |||| cos yx T yx dd dd    , that is (4)           t k ky t k kx t k kykx yx T yx yx ww ww dd dd docdocsim 1 2 1 2 1 |||| ),( (5) ),...,,( 21 txxxx wwwd  , ),...,,( 21 tyyyy wwwd  and ),...,,( 21 txxx www is weight of dx, ),...,,( 21 tyyy www is weight of document of dy, and t is the total number of path elements in dx, dy respectively [16]. 3.1 Singular Value Decomposition (SVD) and DDR SVD Using SVDLSI, the original path element document matrix mxnPESSW is first decomposes into three matrices: T nxnmxnmxmmxn VSUPESSW  where U and V contain orthonormal columns and S is diagonal. By restricting the matrices U, V and S to their first k <min(m,n) columns, one obtains the matrix T kxnkxkmxkmxnmxn VSUPESSWD ˆˆˆ)(ˆ  , where Dˆ is the best square approximation of D by a matrix of rank k. To deal with novel documents not included in the path element-document matrix D, one can project the novel document vector onto the “semantic space” of dimension k and measure distance directly in the semantic space. Thus, an XML document will eventually be represented as a matrix, kxm xd  , with each column being the projection of the element-specific feature vector on the semantic space. We call this version of PESSW as thin-SVD of the DPESSW in the subsequent sections. In this paper, we focus on DDR based on the SVD (DDR/SVD) for the sake of simple presentation. The thin SVD of D is T VSUD 111 , where nm U  1 has only n basis vectors, nn S  1 is a diagonal matrix, and nn V  1 is an orthogonal matrix.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3512 According to the cosine similarities are preserved in the t- dimensional space owing to 22222121 11 ˆˆ ˆˆ)()( ),cos( ji j T i ji j T i j T j T j TT i T ji yy yy yy yy dUdU dUdU dd  . We refer this method to as DDR SVD is based on the SVD. 3.2 Our proposed DDR SVD-FCM Algorithm As described in the previous section from the DDR SVD decomposition, we have the originated document vector PESSW, document vector DU T (refer to economic SVD decomposition), document vector DQTˆ (refer to SVD decomposition of rank PESSW), and document vector DUT~ (refer to efficient low rank PESSW on SVD decomposition) based on the XML documents, which is taken as the SVD-FCM input data and then goes through the clustering.        dUVctor dUVctor dUVector PESSWVector input T T T e ~ ˆ if F(I)=[f1,f2,…..,fc] , where    N j ij N j jiji N Pf 11 1  . This method developed by [17] and improved by [18] [19] is frequently used in pattern recognition. It is based on the minimization of the following objective function: 2 1 1 ji N i C j m ijm cdJ     , 1 ≤ m ≤∞ where m is any real number greater than 1, uij is the degree of membership of di in the cluster j, di is the ith of d- dimensional measured data, cj is the d-dimension center of the cluster, and ||*|| is any norm expressing the dissimilarity between any measured data and the center. The objective function shown above with the update of membership uij and the cluster centers cj by:              C k m ki ji ij cd cd 1 1 2 1  ,       N i m ij N i i m ij j d c 1 1   This iteration will stop when   }{max )()1( k ij k ijij , where  is a termination criterion between 0 and 1, whereas k are the iteration steps. 4. Experiment Result In the Chart-4, we show the occupied space (k) on the variant document vectors DPESSW, DU T , DUTˆ , and DUT~ from 400, 800, 1200, 1600, and 2000 XMLs separately. When comparing the DPESSW with the DUT~ on 2000 XMLs, we could save a lot of space, and importantly unaffected the clustering result where we used the SVD- FCM would be unaffected. In the Chart-5, we show CPU executing time (ms) to run SVD-FCM on the variant document vectors DPESSW , DU T , DUTˆ , and DUT~ from 400, 800, 1200, 1600, and 2000 XMLs separately. When comparing the DPESSW with the DUT~ on 2000 XMLs, we could save a great deal of time, and importantly the clustering result where we used the SVD-FCM still remains the right result. Chart-4: Space on the variant vector from different # XMLs Chart-5: CPU (ms) executing SVD-FCM on the variant vector SVD-FCM Clusterin g Algorithm
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3513 In the Table-3, we show the percentage of the space saved when comparing DU T with DUTˆ , DU T with DUT~ , DUTˆ with DUT~ ,on running SVD-FCM using 2000 XMLs and 1600 XMLs from 5 DTDs. We also show the percentage of the CPU executing time when comparing DU T with DUT~ , DU T and DUTˆ , DUTˆ with DUT~ , on running SVD-FCM using 2000 XMLs and 1600 XMLs from 5 DTDs. On comparing DU T with DUTˆ , we find that using DUTˆ instead of DU T on 2000 XMLs for the SVD-FCM saving 69% of space in k and 13% of CPU time in ms. In the Table-4, we show the percentage of the space saved and CPU time (ms) when comparing original document vector DPESSW with DU T , DUTˆ , and DUT~ on running SVD-FCM using 2000 XMLs and 1600 XMLs from 5 DTDs. When comparing DPESSW with DUTˆ , we find that using DUTˆ instead of DPESSW on 2000 XMLs for the SVD- FCM saving 94% of space in k and 36% of CPU running time in ms. Table-3: Percentage of space and CPU saved in thin-SVD and low rank thin-SVD Space(k)/CPU(ns) Variant Vectors 2000 XMLs 1600 XMLs % saved Space CPU % saved Space CPUDU T ~ DUTˆ 69% 13% 69% 14% DU T ~ DUT~ 84% 57 % 84% 58% DUTˆ ~ DUT~ 49% 50% 49% 51% Table-4: Percentage of space and CPU saved in DPESSW / thin-SVD Space(k)/CPU(ns) Variant Vectors 2000 XMLs 1600 XMLs % saved Space CPU % saved Space CPU DPESSW ~ DU T 80% 26 % 80% 26% DPESSW ~ DUTˆ 94% 36 % 94% 36% DPESSW ~ DUT~ 97% 68% 97% 68 % 5. CONCLUSION The original XML documents DN=[d1,d2,..,dN] are modeled on the vector space model according to the path element of each document, that is DPESSW (PESSW), then a SVD was conducted on the DPESSW. We derived the DPESSW =USVT, and then took the low-rank on the T kkkk VSUD ˆ upon the k low-rank from rank (DPESSW) to efficient low rank of DPESSW. We passed the 4 resulting vectors (DPESSW, DU T , DUTˆ and DUT~ ) into the SVD-FCM clustering algorithm to attain the clustering result. In terms of the clustering results of the section experiment, we found the same clustering result as from the variant DPESSW, DU T , DUTˆ , and DUT~ vectors. From the practical experiment results, we conclude that using the low-rank vector DUT~ instead of the original document (PESSW), not only saved the space on the input vector but also took less time to cluster on the documents. REFERENCES [1] J. Pei, J. Han, B. M. Asi, H. Pinto, “PrefixSpan : Mining Sequential Pattern efficiently by Prefix-Projected Pattern Growth”, Int. Conf. Data Engineering(ICDE), 2001. [2] J. H. Hwang, K. H. Ryu, XML A New XML clustering for Structural Retrieval, International Conference on Conceptual Modeling, 2004. [3] Jwong Hee Hwang, Keun ho Ryu, Clustering and Retrieval of XML documents by Structure, Computational Science and Its Applications-ICCSA 2005. [4] Wang Lian, David Wai-lok, An Efficient and Scalable Algorithm for Clustering XML Documents by Structure, IEEE Computer Society Technical Committee on Data Engineer-ing , 2004. [5] W. F. Massay, Principal components regression in exploratory statistical research, J. Amer Statist. Assoc., vol. 60, pp. 234-246, 1965. [6] W. S. Torgerson, Theory & Methods of Scaling. New York: Wiley,1958. [7] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319-2323, 2000. [8] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding Science, vol. 290, pp. 2323-2326, 2000. [9] L. K. Saul and S. T. Roweis, "Think globally, fit locally: Unsupervised learning of low dimensional manifolds," Journal of Machine Learning Research, vol. 4, pp. 119- 155, 2003. [10] D. L. Donoho and C. E. Grimes, "Hessian eigenmaps: locally embedding techniques for high-dimensional data," Proc. Natl Acad. Sci. USA, vol. 100, pp. 5591- 5596, 2003.
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072 © 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 3514 [11] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, vol. 15, no. 6, pp. 1373-1396, 2003. [12] Z. Zhang and H. Zha, "Principal manifolds and nonlinear dimension reduction via tangent space alignment," SIAM Journal of Scientific Computing, vol. 26, no. 1, pp. 313-338, 2004. [13] Hyunsoo Kim, Haesun Park, and Hongyuan Zha, Distance preserving dimension reduction using the QR factorization or the Cholesky factorization, Proceedings of the 2007 IEEE 7th International Symposium on Bioinformatics & Bioengineering (BIBE 2007), vol I, pages 263-269. [14] Theodore Dalamagas, Tao Cheng, Klaas Jan Winkel, Timos Sellis, A Methodology for Clustering XML Documents by Structure, Information Systems, 31(3): 187-228, 2006. [15] Gao J. and Zhang J. (2005): Clustered SVD strategies in latent semantic indexing.—Inf. Process. Manag., Vol. 41, No. 5, pp. 1051–1063. [16] Berry M.W. and Shakhina A.P. (2005): Computing sparse reduced-rank approximation to sparse matrices. — ACM Trans. Math. Software, 2005, Vol. 31, No. 2, pp. 252–269. [17] J. C. Dunn, Well Separated Clusters and Optimal Fuzzy Partitions, Journal Cybern., 4, 1974,. 95-104. [18] J. C. Bezdek, Numerical Taxonomy with Fuzzy Sets, J. Math. Biol., 1, 1974, 57-71. [19] J. C. Bezdek, Fuzzy Mathematics in Pattern Classification, (Ph.D Thesis, Cornell University, 1973).