This document evaluates the reliability of using Wikipedia as a source for cross-lingual query translation between English and Portuguese in the medical domain. Experiments were conducted translating single-word, two-word, and three-word queries between the two languages using Wikipedia links. Results showed Wikipedia coverage of single-word queries was around 81% for English and 80% for Portuguese. For query translation, coverage was 60% from English to Portuguese and 88% from Portuguese to English for single-word queries. Coverage decreased as query length increased. The study demonstrates Wikipedia has potential for cross-lingual information retrieval but coverage varies with query complexity.
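The link-based translation step described above can be sketched as a lookup over an interlanguage-link table. The tiny in-memory table below is a hypothetical stand-in for the Wikipedia link data (the study's actual extraction is not reproduced); the coverage measure simply counts queries whose every term finds a translation.

```python
# Sketch of link-based query translation, assuming a pre-extracted
# interlanguage-link table (here a hypothetical in-memory dict mapping
# (language, title) to (language, title)).
INTERLANG_LINKS = {
    ("en", "fever"): ("pt", "febre"),
    ("en", "headache"): ("pt", "cefaleia"),
    ("pt", "febre"): ("en", "fever"),
}

def translate_query(terms, source_lang, link_table):
    """Translate each query term via interlanguage links; None if uncovered."""
    out = []
    for term in terms:
        target = link_table.get((source_lang, term.lower()))
        out.append(target[1] if target else None)
    return out

def coverage(queries, source_lang, link_table):
    """Fraction of queries whose every term has a link-based translation."""
    covered = sum(
        1 for q in queries
        if all(t is not None for t in translate_query(q, source_lang, link_table))
    )
    return covered / len(queries)
```

With this shape, multi-word queries naturally show lower coverage than single-word ones, since every term must be found, which mirrors the trend the study reports.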
A SURVEY ON CROSS LANGUAGE INFORMATION RETRIEVAL (IJCI Journal)
Nowadays, the number of Web users accessing information over the Internet is increasing day by day. A huge amount of information on the Internet is available in different languages and can be accessed by anybody at any time. Information Retrieval (IR) deals with finding useful information in large collections of unstructured, structured and semi-structured data. Information retrieval can be classified into several classes, such as monolingual information retrieval, cross language information retrieval (CLIR) and multilingual information retrieval (MLIR). In the current scenario, the diversity of information and language barriers are serious obstacles to communication and cultural exchange across the world. To overcome such barriers, cross language information retrieval systems are nowadays in strong demand. CLIR refers to information retrieval activities in which the query and the documents may be in different languages. This paper gives an overview of new application areas of CLIR and reviews the approaches used in CLIR research for query and document translation. Further, based on the available literature, a number of challenges and issues in CLIR are identified and discussed.
An information retrieval (IR) system aims to retrieve documents relevant to a user query, where the query is a set of keywords. Cross-language information retrieval (CLIR) is a retrieval process in which the user issues a query in one language to retrieve information in another language. The growing need for Internet users to access information expressed in languages other than their own has established CLIR as a major topic in IR.
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR (cscpconf)
Recent and continuing advances in online information systems are creating many opportunities, and also new problems, in information retrieval. Gathering information expressed in different natural languages is a difficult task that often requires substantial resources. Cross-language information retrieval (CLIR) is the retrieval of information in other languages for a query written in the user's native language. This paper deals with various classification techniques that can be used to solve the problems encountered in CLIR.
A Review on the Cross and Multilingual Information Retrieval (dannyijwest)
In this paper we explore some of the most important areas of information retrieval, in particular Cross-lingual Information Retrieval (CLIR) and Multilingual Information Retrieval (MLIR). CLIR deals with asking questions in one language and retrieving documents in a different language. MLIR deals with asking questions in one or more languages and retrieving documents in one or more different languages. With an increasingly globalized economy, the ability to find information in other languages is becoming a necessity. We also present the evaluation initiatives of the information retrieval domain. Finally, we present an overall review of the research work on Indian and foreign languages.
Developing an Arabic Plagiarism Detection Corpus (csandit)
A corpus is a collection of documents. It is a valuable resource in linguistics research for performing statistical analysis and testing hypotheses about different linguistic rules. An annotated corpus consists of documents or entities annotated with task-related labels such as part-of-speech tags, sentiment, etc. One such task is plagiarism detection, which seeks to identify whether a given document is plagiarized. This paper describes our efforts to build a plagiarism detection corpus for Arabic. The corpus consists of about 350 plagiarized/source document pairs and more than 250 documents in which no plagiarism was found. The plagiarized documents consist of assignments submitted by students. For each plagiarized document, the source document was located on the Web and downloaded for further investigation. We report corpus statistics, including the number of documents, sentences and tokens for each of the plagiarized and source categories.
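The corpus statistics mentioned above (documents, sentences, tokens) can be computed with a simple counting pass. This is a generic sketch: the naive punctuation-based sentence split and whitespace tokenization are assumptions, since the corpus's actual tooling is not described.

```python
# Minimal corpus-statistics sketch: counts documents, sentences, and
# whitespace tokens. Sentence splitting is a naive period/question/
# exclamation split, an assumption for illustration only.
import re

def corpus_stats(documents):
    n_sentences = 0
    n_tokens = 0
    for doc in documents:
        sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
        n_sentences += len(sentences)
        n_tokens += len(doc.split())
    return {"documents": len(documents),
            "sentences": n_sentences,
            "tokens": n_tokens}
```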
Marathi-English CLIR Using Detailed User Query and Unsupervised Corpus-Based WSD (IJERA Editor)
With the rapid growth of multilingual information on the Internet, Cross Language Information Retrieval (CLIR) is becoming a need of the day. It lets users query in their native language and retrieve information in any language. However, CLIR performance is poor compared to monolingual retrieval because of lexical ambiguity, mismatching of query terms and out-of-vocabulary words. In this paper, we propose an algorithm for improving the performance of a Marathi-English CLIR system. The system first finds possible translations of the input query in the target language, disambiguates them, and then submits the resulting English queries to a search engine for relevant document retrieval. The disambiguation is based on an unsupervised corpus-based method that uses an English dictionary as an additional resource. The experiment was performed on the FIRE 2011 (Forum for Information Retrieval Evaluation) dataset using the "Title" and "Description" fields as inputs. The experimental results show that the proposed approach improves the performance of the Marathi-English CLIR system with a good level of precision.
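The corpus-based disambiguation idea can be sketched as follows: among the candidate translations of each query term, prefer the combination whose words co-occur most often in a target-language corpus. The candidate lists and corpus below are illustrative placeholders, not the FIRE 2011 data or the paper's exact scoring function.

```python
# Hedged sketch of unsupervised corpus-based translation disambiguation:
# score each combination of candidate translations by how many corpus
# documents contain all of its words, and keep the best combination.
from itertools import product

def cooccurrence_score(combo, corpus_docs):
    """Count documents (word sets) containing every word of the combination."""
    return sum(1 for doc in corpus_docs if all(w in doc for w in combo))

def disambiguate(candidates, corpus_docs):
    """candidates: one list of candidate translations per query term."""
    best = max(product(*candidates),
               key=lambda combo: cooccurrence_score(combo, corpus_docs))
    return list(best)
```

A real system would normalize scores and fall back to dictionary frequency for out-of-vocabulary terms; this sketch only shows the co-occurrence selection step.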
The logic-based, machine-understandable framework of the Semantic Web often challenges naive users when they try to query ontology-based knowledge bases. Existing research efforts have approached this problem by introducing Natural Language (NL) interfaces to ontologies, which can construct SPARQL queries from NL user queries. However, most efforts were restricted to queries expressed in English, and they often benefited from the maturity of English NLP tools; little research has been done to support querying Arabic content on the Semantic Web with NL queries. This paper presents a domain-independent approach that translates Arabic NL queries to SPARQL by leveraging linguistic analysis. With special attention to Noun Phrases (NPs), our approach uses a language parser to extract NPs and their relations from Arabic parse trees and match them to the underlying ontology. It then uses knowledge in the ontology to group NPs into triple-based representations. A SPARQL query is finally generated by extracting targets and modifiers and interpreting them into SPARQL. The interpretation of advanced semantic features, including negation and conjunctive and disjunctive modifiers, is also supported. The approach was evaluated on two datasets consisting of OWL test data and queries, and the results confirm its feasibility for translating Arabic NL queries to SPARQL.
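The final generation step, turning triple-based representations into SPARQL, can be sketched as straightforward string assembly. Only this last step is shown; the NP extraction and ontology matching described above are far more involved, and the variable names here are illustrative.

```python
# Illustrative sketch of SPARQL generation from already-extracted triples.
# Names beginning with '?' are treated as SPARQL variables.
def triples_to_sparql(target_var, triples):
    """triples: iterable of (subject, predicate, object) strings."""
    patterns = " .\n  ".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"SELECT {target_var} WHERE {{\n  {patterns} .\n}}"
```

For example, a question asking for cities with a known population would yield a target variable plus two triple patterns over the ontology's terms.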
Information residing in relational databases and delimited file systems is inadequate for reuse and sharing over the web, because these file systems do not adhere to commonly agreed principles for maintaining data harmony. As a result, such resources suffer from a lack of uniformity, as well as heterogeneity and redundancy, throughout the web. Ontologies have been widely used to solve such problems, as they help in extracting knowledge from any information system. In this article, we focus on extracting concepts and their relations from a set of CSV files. These files serve as individual concepts and are grouped into a particular domain, forming a domain ontology. This domain ontology is then used to capture the CSV data and represent it in RDF format while retaining the links among files, i.e., concepts. Datatype and object properties are automatically detected from the header fields, which reduces the user involvement needed to generate mapping files. A detailed analysis was performed on Baseball tabular data, and the result shows a rich set of semantic information.
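The property-detection idea can be sketched with a simple heuristic: header fields that reference another file's key column become object properties, and the remaining fields become datatype properties. The heuristic and the sample headers are assumptions for illustration; the article's exact detection rules are not given here.

```python
# Sketch of datatype/object property detection from CSV headers, assuming
# a caller-supplied set of headers known to reference other files.
import csv
import io

def detect_properties(csv_text, foreign_keys):
    """Map each header to 'object' (cross-file reference) or 'datatype'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {h: ("object" if h in foreign_keys else "datatype")
            for h in reader.fieldnames}
```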
Ontology languages are used to model the semantics of concepts within a particular domain and the relationships between those concepts. The Semantic Web standard provides a number of modelling languages that differ in their level of expressivity and are organized in a Semantic Web Stack in such a way that each language level builds on the expressivity of the one below it. Several problems arise when one attempts to use independently developed ontologies, and adapting existing ontologies for new purposes requires that certain operations be performed on them. These operations are currently performed in a semi-automated manner. This paper seeks to model categorically the syntax and semantics of RDF ontologies as a step towards the formalization of ontological operations using category theory.
Hierarchical Fuzzy Rule Based Classification (IAETSD)
This document discusses a hierarchical fuzzy rule-based classification system using genetic rule selection to filter unwanted messages from online social networks. It aims to improve performance on imbalanced data sets by increasing granularity of fuzzy partitions at class boundaries. The system uses a neural network learning model and genetic algorithm for rule selection to build an accurate and compact fuzzy rule-based model. It analyzes challenges in classifying short texts from social media posts and reviews related work on content-based filtering and policy-based personalization for social networks. The document also discusses issues with imbalanced data sets and proposes oversampling the minority class using SMOTE (Synthetic Minority Over-sampling Technique) as a preprocessing step to address class imbalance problems.
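The SMOTE preprocessing step mentioned above can be sketched in a few lines: each synthetic minority sample is a random interpolation between a minority point and one of its nearest minority neighbours. This is a simplified pure-Python illustration; a real system would use an established implementation such as imbalanced-learn.

```python
# Minimal SMOTE-style oversampling sketch (simplified, illustrative).
import random

def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic points by interpolating between each chosen
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: _dist2(p, q))[:k]
        q = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(p, q)))
    return synthetic
```

Because every synthetic point lies on a segment between two minority points, the oversampled class stays inside the region the minority data already occupies.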
1) The document discusses various services for open access literature including institutional archives, metadata harvesting through Celestial, and citation analysis services like Citebase.
2) It describes how Citebase extracts references from texts and stores them in a structured database to enable citation linking and navigation between cited and citing articles.
3) Early download frequency data from arXiv.org is shown to correlate with longer-term citation frequency, indicating web impact may predict future citation impact.
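The reported download/citation relationship is naturally quantified with a rank correlation. The sketch below is a generic Spearman computation on made-up numbers, not the arXiv.org data, and it omits tie averaging for brevity.

```python
# Spearman rank correlation sketch (no tie handling, an assumption).
def rankdata(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = float(rank)
    return ranks

def spearman(x, y):
    """1 means identical rankings, -1 means exactly reversed rankings."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```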
Diacritic Oriented Arabic Information Retrieval System (CSCJournals)
Arabic language support in search engines and operating systems has improved in recent years. Searching the Internet in Arabic is now reliable and comparable to the excellent support for several other languages, including English. For text with diacritics, however, there are limitations; for this reason, most information retrieval (IR) systems remove diacritics from text and ignore them because of their complexity. Searching text with diacritics is important for some kinds of documents, such as religious books, some newspapers and children's stories. This research presents the design and development of a system that overcomes this problem by taking diacritics into account. The proposed system places the design complexity in the retrieval algorithm rather than in the information repository, which in this study is a database. The study also analyses the results and the performance: the results are promising, and the performance analysis suggests ways to enhance the design and increase performance, and shows that the system is reliable. The proposed system can be integrated into search engines, text editors and any information retrieval system that includes Arabic text. It was applied to a database of Hadeeth, a religious book containing the Prophet's actions and statements, and can be applied to any kind of data repository.
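One way to handle diacritics in retrieval, sketched below, is to keep an exact diacritic-sensitive match as the strongest signal and fall back to matching after stripping the harakat. The code is a generic illustration, not the paper's database design; the Unicode range covers the common Arabic diacritic code points.

```python
# Diacritic-aware matching sketch: exact match outranks a match that only
# succeeds after stripping Arabic diacritics (harakat, U+064B..U+0652).
import re

HARAKAT = re.compile("[\u064B-\u0652]")  # fathatan .. sukun

def strip_diacritics(text):
    return HARAKAT.sub("", text)

def match(query, document):
    """2 = exact (diacritic-sensitive) hit, 1 = hit after stripping
    diacritics from both sides, 0 = no hit."""
    if query in document:
        return 2
    if strip_diacritics(query) in strip_diacritics(document):
        return 1
    return 0
```

Ranking exact hits above stripped hits lets a fully vocalized query retrieve the precise reading while still finding unvocalized occurrences.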
Stellar Measurements with the New Intensity Formula (IOSR Journals)
In this paper a linear relationship in stellar optical spectra has been found by applying to optical light sources a spectroscopic method that makes it possible to organize atomic and ionic data. The method is based on a new intensity formula in optical emission spectroscopy (OES). As with the HR diagram, it appears possible to organize the luminosity of stars from different spectral classes, and from that organization to determine the temperature, density and mass of stars using the new intensity formula. These temperature, density and mass values agree well with literature values. It is also possible to determine the mean electron temperature of the optical layers (photospheres) of the stars, as it is for atoms in laboratory plasmas. The mean value of the ionization energies of the different elements of a star has proved to be very characteristic of each star. This paper also shows that the hydrogen Balmer absorption lines in the stars follow the new intensity formula.
Five bacterial isolates from the rhizosphere of tea plants in Barak Valley, Assam, India were tested for their ability to control the fungal pathogen Fusarium oxysporum. The isolates were identified as Bacillus species based on their morphological and biochemical traits. Three isolates showed moderate antagonistic activity against F. oxysporum in vitro, inhibiting its growth by 2.7 to 3.6 cm. All three isolates were found to produce volatile metabolites against the fungus but did not produce hydrogen cyanide. The study revealed biocontrol potential of certain Bacillus isolates from tea rhizosphere against F. oxysporum.
This document presents the modeling and simulation of a solar thermoelectric generator (TEG) using Matlab/Simulink. It discusses the basic principles of thermoelectricity and the categories of thermoelectric materials. Mathematical models of the thermal and electrical circuits of a TEG module are developed and used to simulate the current-voltage and current-power characteristics of a sample TEG module. The simulation results show that a maximum power output of 19 W can be achieved at an input temperature difference of 200 °C with 4.5% conversion efficiency.
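The electrical side of such a model reduces, in steady state, to a Seebeck voltage source behind an internal resistance. The sketch below shows that standard circuit, with maximum power transfer at a matched load; the parameter values in the test are illustrative, not those of the simulated module.

```python
# Steady-state TEG electrical sketch: open-circuit voltage from the
# Seebeck effect driving a resistive load through the module's internal
# resistance. All numeric values used with it are illustrative.
def teg_power(alpha, delta_t, r_internal, r_load):
    """alpha: Seebeck coefficient (V/K); delta_t: temperature difference (K)."""
    v_oc = alpha * delta_t             # open-circuit (Seebeck) voltage
    i = v_oc / (r_internal + r_load)   # load current
    return i * i * r_load              # power delivered to the load

def max_power(alpha, delta_t, r_internal):
    """Maximum power transfer occurs at r_load == r_internal."""
    v_oc = alpha * delta_t
    return v_oc * v_oc / (4 * r_internal)
```

Sweeping `r_load` with `teg_power` reproduces the familiar current-power curve whose peak `max_power` gives in closed form.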
Lean and Agile Manufacturing as productivity enhancement techniques - a compa... (IOSR Journals)
Lean and agile manufacturing are productivity enhancement techniques that aim to improve responsiveness to customers and reduce costs. Lean focuses on eliminating waste through continuous improvement processes, while agile emphasizes flexibility and nimbleness to respond quickly to changes. Both approaches seek to enhance value for customers. Key differences are that lean focuses more on efficiency and waste reduction within operations, while agile takes a more holistic view across the entire enterprise to thrive in uncertain environments through rapid adaptation.
This document discusses architectural and security management for grid computing. It begins by defining grid computing as an environment that enables sharing of distributed resources across organizations to achieve common goals. It then describes the key components of a grid, including computation resources, storage, communications, software/licenses, and special equipment. The document outlines a four-level grid architecture including a fabric level, core middleware level, user middleware level, and application level. It also discusses important aspects of grid computing such as resource balancing, reliability through distribution, parallel CPU capacity, and management of different projects. Finally, it emphasizes that security is a major concern for grid computing due to the open nature of sharing resources across organizational boundaries.
This document describes a proposed system for web extraction using a trinity construction and efficient algorithms. It begins with an abstract discussing how trinity characteristics can be used to automatically extract content from websites in a sequential tree structure. It then discusses the existing system which uses trinity tree and prefix/suffix sorting but has limitations. The proposed system introduces fuzzy logic for multi-perspective crawling across multiple websites, genetic algorithms to load extracted content into the trinity structure, and an ant colony optimization algorithm to obtain an effective structure and suggest optimized solutions. Key aspects of fuzzy logic, genetic algorithms, and ant colony optimization are also summarized.
This document reviews techniques for predicting epileptic seizures. It discusses how seizure prediction could improve quality of life for epilepsy patients by allowing them to avoid dangerous situations. The document outlines different types of seizures and classification systems. It reviews literature on using sensors and video analysis to detect seizures. Effective prediction is important for clinical management to increase treatment effectiveness and prevent injuries from unexpected seizures.
Antibacterial Application of Novel Mixed-Ligand Dithiocarbamate Complexes of ... (IOSR Journals)
The document summarizes the synthesis and characterization of nine novel mixed-ligand dithiocarbamate complexes of nickel(II) and their antibacterial properties. The complexes, of general formula [Ni(Sal)(Rdtc)], were prepared by reacting nickel chloride with salicylaldehyde and aryl dithiocarbamates and were characterized using various analytical techniques. In agar diffusion tests they exhibited broad-spectrum antibacterial activity against Escherichia coli, Staphylococcus aureus, Klebsiella oxytoca and Pseudomonas aeruginosa, suggesting their potential as broad-spectrum antibacterial agents.
The document presents a dynamic model for lifting railway tracks as a one-mass system. It derives the equation of motion for the model numerically using a Runge-Kutta integration scheme. The model accounts for parameters like mass, stiffness, damping, and lifting force. Numerical solutions show that damping coefficients affect the duration and quality of lifting, while lifting forces impact the lift value and duration. The dynamic model allows studying how system parameters influence behavior and stability during railway track lifting.
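A one-mass model with mass, damping, stiffness and a lifting force is the standard second-order equation m·x'' + c·x' + k·x = F, and the document's Runge-Kutta scheme can be sketched as classical RK4 over the state [displacement, velocity]. The parameter values in the test are illustrative, not the paper's track data.

```python
# One-mass lifting model m*x'' + c*x' + k*x = F integrated with classical
# fourth-order Runge-Kutta. State y = [displacement, velocity].
def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def simulate_lift(m, c, k, force, t_end, h=1e-3):
    """Integrate from rest and return the final displacement."""
    def f(t, y):
        x, v = y
        return [v, (force - c * v - k * x) / m]
    y, t = [0.0, 0.0], 0.0
    while t < t_end:
        y = rk4_step(f, t, y, h)
        t += h
    return y[0]
```

Varying `c` changes how quickly the lift settles and varying `force` changes the settled lift value F/k, which matches the parameter effects the document describes.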
A Comparative Study on Balance and Flexibility between Dancer and Non-Dancer ... (IOSR Journals)
This document summarizes a study that compares balance and flexibility between dancer girls and non-dancer girls. 30 dancer girls and 30 non-dancer girls were given tests to measure their balance (using the stork stand test) and flexibility (using the sit and reach test). The results showed that on average, dancer girls had better balance and flexibility scores compared to non-dancer girls, though the differences were not statistically significant. The study concludes that dance can have positive impacts on factors like balance and flexibility, and recommends that dance be included in physical education curriculums.
Design And Analysis Of Chain Outer Link By Using Composite Material (IOSR Journals)
This document summarizes the design and analysis of using a composite material for the outer link of a roller conveyor chain. It begins with an introduction to roller conveyor chains and their applications. It then describes the design process for the original outer link made of carbon steel, including hand calculations to determine the link dimensions. Finite element analysis was conducted on both the original design and a modified design using glass fiber-epoxy composite material. The results showed that the composite material link weighed less and experienced lower stresses than the original design, indicating that using a composite material can improve the design of the chain outer link.
OFDM PAPR Reduction by Shifting Two Null Subcarriers among Two Data Subcarriers (IOSR Journals)
This document proposes and evaluates methods for reducing the peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It discusses how null subcarriers, which are unused carriers with zero transmit energy, can be shifted or switched among data subcarriers to generate candidate signals with lower PAPR values. The proposed null subcarrier shifting and two-null-subcarrier switching methods are shown to achieve around 1.4 dB and 1.7 dB lower PAPR, respectively, compared to existing techniques. Simulation results also indicate that the bit error rate performance of the proposed null subcarrier shifting method is close to that of existing methods while achieving lower PAPR.
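The underlying mechanism can be sketched directly: the PAPR of an OFDM symbol is the peak power of its inverse-DFT time signal over its mean power, and moving the null subcarriers changes that time signal. The exhaustive search below is a toy stand-in for the paper's shifting/switching rules, which are not reproduced here, and it uses a direct IDFT rather than an FFT for clarity.

```python
# PAPR sketch for a small OFDM symbol (pure Python, direct IDFT).
import cmath
import math
from itertools import combinations

def idft(symbols):
    """Direct inverse DFT of one OFDM symbol (list of complex carriers)."""
    n = len(symbols)
    return [sum(s * cmath.exp(2j * cmath.pi * k * t / n)
                for k, s in enumerate(symbols)) / n
            for t in range(n)]

def papr_db(symbols):
    """Peak-to-average power ratio of the time-domain signal, in dB."""
    powers = [abs(v) ** 2 for v in idft(symbols)]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

def best_null_placement(data, n_total):
    """Exhaustively place the data carriers among n_total slots (the rest
    stay null); return (papr_db, symbols) of the lowest-PAPR candidate."""
    best = None
    for slots in combinations(range(n_total), len(data)):
        symbols = [0j] * n_total
        for slot, d in zip(slots, data):
            symbols[slot] = d
        p = papr_db(symbols)
        if best is None or p < best[0]:
            best = (p, symbols)
    return best
```

The proposed methods search only a handful of shifted placements instead of all of them, trading a smaller candidate set for lower complexity.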
Structure, Morphology & Infrared Spectroscopic Characterization of Ga (2x+2) ... (IOSR Journals)
This document summarizes a study that characterized the structure, morphology, and infrared spectroscopic properties of Ga(2x+2)NFe2(49-x)O3 ferrite synthesized using the sol gel technique. X-ray diffraction (XRD), scanning electron microscopy (SEM), and Fourier transform infrared spectroscopy (FTIR) were used to analyze the size, type of bonding, and surface properties between the compounds. The results agreed with previous literature. GaN was used as a dopant for x=1 and x=5 in the formula to introduce its semiconducting properties into the ferrite system and study the effects.
Synthesis, Characterization and Anti-inflammatory Activity of Novel Triazolod... (IOSR Journals)
This document describes the synthesis and characterization of novel triazolodiazepine derivatives and their evaluation for anti-inflammatory activity. Piperidin-4-ones were reacted to form diazepan-5-ones, which were then converted to thione derivatives using Lawesson's reagent. Reaction of the thiones with acetylhydrazide yielded the target triazolodiazepine compounds. The compounds were characterized using techniques such as FT-IR, NMR and mass spectrometry, and were then tested for anti-inflammatory activity in a carrageenan-induced paw edema assay in comparison with the standard drug indomethacin.
Leather Quality Estimation Using an Automated Machine Vision System (IOSR Journals)
This document describes a proposed machine vision system to automate the inspection and quality estimation of leather materials. Key steps of the proposed methodology include image acquisition, preprocessing, segmentation of defects, computation of defect features like location, area, perimeter, and a histogram analysis to estimate surface smoothness. These quantitative defect features would be compiled into a feature vector to objectively determine the quality of the leather in a standardized way. Related works on automated leather inspection and applications of machine vision in leather manufacturing processes are also reviewed. The proposed system aims to provide repeatable, consistent and time-efficient leather quality assessment compared to manual inspection.
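Two of the listed defect features, area and perimeter, can be sketched on a binary defect mask. Counting exposed pixel edges is one simple perimeter convention, an assumption here, since the paper's exact measures are not given.

```python
# Defect-feature sketch on a binary mask (1 = defect pixel).
def defect_area(mask):
    """Number of defect pixels."""
    return sum(sum(row) for row in mask)

def defect_perimeter(mask):
    """Count pixel edges exposed to background or the image border."""
    h, w = len(mask), len(mask[0])
    per = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                        per += 1
    return per
```

Features like these, stacked into a vector per defect, are what would feed the standardized quality score the system proposes.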
This document summarizes a paper that presents a novel method for determining the optimal location of Flexible AC Transmission System (FACTS) controllers in a multi-machine power system using a Fuzzy Controlled Genetic Algorithm (FCGA). The proposed algorithm aims to simultaneously optimize the location, type, and rated values of FACTS controllers while minimizing the overall system cost, which includes generation and investment costs. The algorithm is tested on IEEE 14-bus and 30-bus test systems, incorporating thyristor-controlled series compensator (TCSC) and unified power flow controller (UPFC) devices. Simulation results show the obtained solution is feasible and accurate for solving the optimal power flow problem.
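The optimization loop at the heart of such an approach can be sketched as a small elitist genetic algorithm over candidate device locations. Everything here is a toy stand-in: the fuzzy controller, the power-flow model, the IEEE test systems and the real cost function are not reproduced, and `cost` is an assumed black box.

```python
# Toy elitist GA sketch: a chromosome is a candidate device location,
# fitness is an assumed (caller-supplied) cost function to minimize.
import random

def genetic_minimize(cost, locations, generations=30, seed=1):
    rng = random.Random(seed)
    pop = list(locations)              # start from every candidate location
    half = len(pop) // 2
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[:half]         # elitism: best half always survives
        children = [rng.choice(locations) if rng.random() < 0.2  # mutation
                    else rng.choice(survivors)                   # selection
                    for _ in range(len(pop) - half)]
        pop = survivors + children
    return min(pop, key=cost)
```

The paper's fuzzy controller would sit on top of a loop like this, adapting the mutation and crossover rates as the search progresses, and the chromosome would also encode device type and rating.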
Ontology languages are used in modelling the semantics of concepts within a particular domain and the relationships between those concepts. The Semantic Web standard provides a number of modelling languages that differ in their level of expressivity and are organized in a Semantic Web Stack in such a way that each language level builds on the expressivity of the other. There are several problems when one attempts to use independently developed ontologies. When existing ontologies are adapted for new purposes it requires that certain operations are performed on them. These operations are currently performed in a semi-automated manner. This paper seeks to model categorically the syntax and semantics of RDF ontology as a step towards the formalization of ontological operations using category theory.
Iaetsd hierarchical fuzzy rule based classification (Iaetsd)
This document discusses a hierarchical fuzzy rule-based classification system using genetic rule selection to filter unwanted messages from online social networks. It aims to improve performance on imbalanced data sets by increasing granularity of fuzzy partitions at class boundaries. The system uses a neural network learning model and genetic algorithm for rule selection to build an accurate and compact fuzzy rule-based model. It analyzes challenges in classifying short texts from social media posts and reviews related work on content-based filtering and policy-based personalization for social networks. The document also discusses issues with imbalanced data sets and proposes oversampling the minority class using SMOTE (Synthetic Minority Over-sampling Technique) as a preprocessing step to address class imbalance problems.
1) The document discusses various services for open access literature including institutional archives, metadata harvesting through Celestial, and citation analysis services like Citebase.
2) It describes how Citebase extracts references from texts and stores them in a structured database to enable citation linking and navigation between cited and citing articles.
3) Early download frequency data from arXiv.org is shown to correlate with longer-term citation frequency, indicating web impact may predict future citation impact.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Diacritic Oriented Arabic Information Retrieval System (CSCJournals)
Arabic language support in search engines and operating systems has improved in recent years. Searching the Internet in Arabic is now reliable and comparable to the excellent support available for several other languages, including English. However, there are limitations for text with diacritics; because of its complexity, most information retrieval (IR) systems remove diacritics from text and ignore them. Searching text with diacritics is important for some kinds of documents, such as religious books, some newspapers and children's stories. This research presents the design and development of a system that overcomes this problem by taking diacritics into account. The proposed system places the design complexity in the retrieval algorithm rather than in the information repository, which in this study is a database. The study also analyses the results and the performance: results are promising, and the performance analysis suggests ways to enhance the design and increase performance. The proposed system can be integrated into search engines, text editors and any information retrieval system that handles Arabic text, and performance analysis shows that it is reliable. It was applied to a database of Hadeeth, a religious collection of the Prophet's actions and statements, but can be applied to any kind of data repository.
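As a contrast with the diacritic-aware approach above, the removal step that most IR systems apply can be sketched in a few lines: decompose the text to Unicode NFD and drop combining marks (Arabic harakat such as the fatha are combining characters). This is a generic illustration, not code from the paper.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks (e.g. Arabic harakat) after NFD decomposition."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )

# kaf+fatha, ta+fatha, ba+fatha -> bare consonant skeleton
assert strip_diacritics("كَتَبَ") == "كتب"
```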
Stellar Measurements with the New Intensity Formula (IOSR Journals)
In this paper a linear relationship in stellar optical spectra has been found using a spectroscopic method applied to optical light sources, with which atomic and ionic data can be organized. The method is based on a new intensity formula in optical emission spectroscopy (OES). As with the HR diagram, it appears possible to organize the luminosity of stars from different spectral classes, and from that organization to determine the temperature, density and mass of stars using the new intensity formula. These temperature, density and mass values agree well with literature values. It is also possible to determine the mean electron temperature of the optical layers (photospheres) of the stars, as is done for atoms in laboratory plasmas. The mean value of the ionization energies of the different elements of a star has been shown to be characteristic of each star. This paper also shows that the hydrogen Balmer absorption lines in stars follow the new intensity formula.
Five bacterial isolates from the rhizosphere of tea plants in Barak Valley, Assam, India were tested for their ability to control the fungal pathogen Fusarium oxysporum. The isolates were identified as Bacillus species based on their morphological and biochemical traits. Three isolates showed moderate antagonistic activity against F. oxysporum in vitro, inhibiting its growth by 2.7 to 3.6 cm. All three isolates were found to produce volatile metabolites against the fungus but did not produce hydrogen cyanide. The study revealed biocontrol potential of certain Bacillus isolates from tea rhizosphere against F. oxysporum.
This document presents the modeling and simulation of a solar thermoelectric generator (TEG) using Matlab/Simulink. It discusses the basic principles of thermoelectricity and the categories of thermoelectric materials. Mathematical models of the thermal and electrical circuits of a TEG module are developed and used to simulate the current-voltage and current-power characteristics of a sample module. The simulation results show that a maximum power output of 19 W can be achieved at an input temperature difference of 200 °C with 4.5% conversion efficiency.
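The electrical side of such a TEG model reduces to a Seebeck voltage source behind an internal resistance; a minimal sketch follows, where the Seebeck coefficient and resistance values are illustrative assumptions, not the paper's module data.

```python
# Toy TEG electrical model: V_oc = alpha * dT, P = I^2 * R_load,
# with I = V_oc / (R_int + R_load).
ALPHA = 0.05      # effective module Seebeck coefficient, V/K (assumed)
R_INT = 1.5       # module internal resistance, ohm (assumed)

def load_power(dT, r_load):
    """Power delivered to a resistive load at temperature difference dT."""
    v_oc = ALPHA * dT
    i = v_oc / (R_INT + r_load)
    return i * i * r_load

# maximum power transfer occurs when the load matches R_INT
p_match = load_power(200, R_INT)
assert p_match > load_power(200, 0.5 * R_INT)
assert p_match > load_power(200, 2.0 * R_INT)
```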
Lean and Agile Manufacturing as productivity enhancement techniques - a compa... (IOSR Journals)
Lean and agile manufacturing are productivity enhancement techniques that aim to improve responsiveness to customers and reduce costs. Lean focuses on eliminating waste through continuous improvement processes, while agile emphasizes flexibility and nimbleness to respond quickly to changes. Both approaches seek to enhance value for customers. Key differences are that lean focuses more on efficiency and waste reduction within operations, while agile takes a more holistic view across the entire enterprise to thrive in uncertain environments through rapid adaptation.
This document discusses architectural and security management for grid computing. It begins by defining grid computing as an environment that enables sharing of distributed resources across organizations to achieve common goals. It then describes the key components of a grid, including computation resources, storage, communications, software/licenses, and special equipment. The document outlines a four-level grid architecture including a fabric level, core middleware level, user middleware level, and application level. It also discusses important aspects of grid computing such as resource balancing, reliability through distribution, parallel CPU capacity, and management of different projects. Finally, it emphasizes that security is a major concern for grid computing due to the open nature of sharing resources across organizational boundaries.
This document describes a proposed system for web extraction using a trinity construction and efficient algorithms. It begins with an abstract discussing how trinity characteristics can be used to automatically extract content from websites in a sequential tree structure. It then discusses the existing system which uses trinity tree and prefix/suffix sorting but has limitations. The proposed system introduces fuzzy logic for multi-perspective crawling across multiple websites, genetic algorithms to load extracted content into the trinity structure, and an ant colony optimization algorithm to obtain an effective structure and suggest optimized solutions. Key aspects of fuzzy logic, genetic algorithms, and ant colony optimization are also summarized.
This document reviews techniques for predicting epileptic seizures. It discusses how seizure prediction could improve quality of life for epilepsy patients by allowing them to avoid dangerous situations. The document outlines different types of seizures and classification systems. It reviews literature on using sensors and video analysis to detect seizures. Effective prediction is important for clinical management to increase treatment effectiveness and prevent injuries from unexpected seizures.
Antibacterial Application of Novel Mixed-Ligand Dithiocarbamate Complexes of ... (IOSR Journals)
The document summarizes the synthesis and characterization of nine novel mixed-ligand dithiocarbamate complexes of nickel(II) and their antibacterial properties. The complexes were prepared by reacting nickel chloride with salicylaldehyde and aryl dithiocarbamates. They were characterized using various analytical techniques and had the general formula [Ni(Sal)(Rdtc)]. The complexes exhibited broad-spectrum antibacterial activity against Escherichia coli, Staphylococcus aureus, Klebsiella oxytoca and Pseudomonas aeruginosa in agar diffusion tests, suggesting their potential as broad-spectrum antibacterial agents.
The document presents a dynamic model for lifting railway tracks as a one-mass system. It derives the equation of motion for the model numerically using a Runge-Kutta integration scheme. The model accounts for parameters like mass, stiffness, damping, and lifting force. Numerical solutions show that damping coefficients affect the duration and quality of lifting, while lifting forces impact the lift value and duration. The dynamic model allows studying how system parameters influence behavior and stability during railway track lifting.
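The one-mass model described above amounts to m·x'' + c·x' + k·x = F; a minimal classic fourth-order Runge-Kutta integration of it might look as follows, with all parameter values chosen for illustration rather than taken from the paper.

```python
def rk4_step(f, t, y, h):
    """One classic RK4 step for y' = f(t, y), y a list of state variables."""
    k1 = f(t, y)
    k2 = f(t + h/2, [yi + h/2*ki for yi, ki in zip(y, k1)])
    k3 = f(t + h/2, [yi + h/2*ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h*ki for yi, ki in zip(y, k3)])
    return [yi + h/6*(a + 2*b + 2*c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

# illustrative lifting parameters: mass, damping, stiffness, lift force
m, c, k, F = 100.0, 500.0, 1e4, 2e3

def rhs(t, y):
    x, v = y                      # displacement and velocity of the mass
    return [v, (F - c*v - k*x) / m]

y, t, h = [0.0, 0.0], 0.0, 1e-3
for _ in range(5000):             # 5 s of simulated lifting
    y = rk4_step(rhs, t, y, h)
    t += h

# the damped system settles near the static lift value F/k
assert abs(y[0] - F/k) < 0.01
```

Varying `c` or `F` in this sketch shows qualitatively what the paper reports: damping shapes the duration and quality of the lift, while the lifting force sets the lift value.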
A Comparative Study on Balance and Flexibility between Dancer and Non-Dancer ... (IOSR Journals)
This document summarizes a study that compares balance and flexibility between dancer girls and non-dancer girls. 30 dancer girls and 30 non-dancer girls were given tests to measure their balance (using the stork stand test) and flexibility (using the sit and reach test). The results showed that on average, dancer girls had better balance and flexibility scores compared to non-dancer girls, though the differences were not statistically significant. The study concludes that dance can have positive impacts on factors like balance and flexibility, and recommends that dance be included in physical education curriculums.
Design And Analysis Of Chain Outer Link By Using Composite Material (IOSR Journals)
This document summarizes the design and analysis of using a composite material for the outer link of a roller conveyor chain. It begins with an introduction to roller conveyor chains and their applications. It then describes the design process for the original outer link made of carbon steel, including hand calculations to determine the link dimensions. Finite element analysis was conducted on both the original design and a modified design using glass fiber-epoxy composite material. The results showed that the composite material link weighed less and experienced lower stresses than the original design, indicating that using a composite material can improve the design of the chain outer link.
OFDM PAPR Reduction by Shifting Two Null Subcarriers among Two Data Subcarriers (IOSR Journals)
This document proposes and evaluates methods for reducing the peak-to-average power ratio (PAPR) in orthogonal frequency division multiplexing (OFDM) systems. It discusses how null subcarriers, which are unused carriers with zero transmit energy, can be shifted or switched among data subcarriers to generate candidate signals with lower PAPR values. The proposed null subcarrier shifting and two null subcarrier switching methods are shown to achieve around 1.4dB and 1.7dB lower PAPR respectively compared to existing techniques. Simulation results also indicate the bit error rate performance of the proposed null subcarrier shifting method is close to existing methods but with lower PAPR.
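The PAPR metric and the candidate-selection idea can be sketched as follows; this is a generic illustration of null-subcarrier placement, not the paper's exact shifting scheme.

```python
import numpy as np

def papr_db(subcarriers):
    """PAPR in dB: peak over mean power of the IFFT time-domain symbol."""
    x = np.fft.ifft(subcarriers)
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

rng = np.random.default_rng(1)
N = 64
data = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=N)  # QPSK on all carriers

# place two nulls at different subcarrier pairs to generate candidate
# signals; the transmitter would send the lowest-PAPR candidate
candidates = []
for i in range(0, 8):
    for j in range(8, 16):
        sym = data.copy()
        sym[[i, j]] = 0
        candidates.append(papr_db(sym))

assert min(candidates) <= candidates[0]  # selecting the best can only help
```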
Structure, Morphology & Infrared Spectroscopic Characterization of Ga (2x+2) ... (IOSR Journals)
This document summarizes a study that characterized the structure, morphology, and infrared spectroscopic properties of Ga(2x+2)NFe2(49-x)O3 ferrite synthesized using the sol gel technique. X-ray diffraction (XRD), scanning electron microscopy (SEM), and Fourier transform infrared spectroscopy (FTIR) were used to analyze the size, type of bonding, and surface properties between the compounds. The results agreed with previous literature. GaN was used as a dopant for x=1 and x=5 in the formula to introduce its semiconducting properties into the ferrite system and study the effects.
Synthesis, Characterization and Anti-inflammatory Activity of Novel Triazolod... (IOSR Journals)
This document describes the synthesis and characterization of novel triazolodiazepine derivatives and their evaluation for anti-inflammatory activity. Piperidin-4-ones were reacted to form diazepan-5-ones, which were then converted to thione derivatives using Lawesson's reagent. Reaction of the thiones with acetylhydrazide yielded the target triazolodiazepine compounds. The compounds were characterized using techniques such as FT-IR, NMR, and mass spectrometry. They were then tested for anti-inflammatory activity using a carrageenan-induced paw edema assay in comparison to the standard drug indomethacin.
Leather Quality Estimation Using an Automated Machine Vision System (IOSR Journals)
This document describes a proposed machine vision system to automate the inspection and quality estimation of leather materials. Key steps of the proposed methodology include image acquisition, preprocessing, segmentation of defects, computation of defect features like location, area, perimeter, and a histogram analysis to estimate surface smoothness. These quantitative defect features would be compiled into a feature vector to objectively determine the quality of the leather in a standardized way. Related works on automated leather inspection and applications of machine vision in leather manufacturing processes are also reviewed. The proposed system aims to provide repeatable, consistent and time-efficient leather quality assessment compared to manual inspection.
This document summarizes a paper that presents a novel method for determining the optimal location of Flexible AC Transmission System (FACTS) controllers in a multi-machine power system using a Fuzzy Controlled Genetic Algorithm (FCGA). The proposed algorithm aims to simultaneously optimize the location, type, and rated values of FACTS controllers while minimizing the overall system cost, which includes generation and investment costs. The algorithm is tested on IEEE 14-bus and 30-bus test systems, incorporating thyristor-controlled series compensator (TCSC) and unified power flow controller (UPFC) devices. Simulation results show the obtained solution is feasible and accurate for solving the optimal power flow problem.
Old wine in new wineskins: Revisiting counselling in traditional Ndebele and ... (IOSR Journals)
The document summarizes counselling in traditional Ndebele and Shona societies in Zimbabwe before the advent of western formal counselling. It discusses how counselling was an integral part of communities with various roles providing advice, such as close family friends, aunts, uncles, grandparents, traditional healers, and elders. Folklore and proverbs were also important means of imparting guidance. Counselling emphasized prevention and was provided informally and freely within a holistic community approach, rather than a professionalized, crisis-focused model as seen today. The document argues counselling is not new but an old practice now in a modernized form, representing "old wine in new wineskins."
1. The document examines a local Nigerian game called "tsorry checkerboard" and applies group theory concepts.
2) The game is played on a 3x3 board with each player having up to 3 pieces, and the goal is to line up all three of one's pieces horizontally, vertically, or diagonally.
3. The possible moves of each piece (vertical, horizontal, diagonal, or staying in place) form a Klein four-group, satisfying the group properties of closure, associativity, identity, and inverses. Therefore, group theory can be applied to model the game.
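The Klein four-group structure of the moves is easy to verify mechanically: encoding each move as a pair of bits (affects row, affects column), composition becomes bitwise XOR, and the group axioms can be checked exhaustively. A small sketch:

```python
from itertools import product

MOVES = {
    "stay":       0b00,
    "vertical":   0b10,  # changes row only
    "horizontal": 0b01,  # changes column only
    "diagonal":   0b11,  # changes both
}

def compose(a, b):
    """Composing two moves flips each direction an even or odd number of times."""
    return a ^ b

elems = list(MOVES.values())

# Closure
assert all(compose(a, b) in elems for a, b in product(elems, elems))
# Associativity
assert all(compose(compose(a, b), c) == compose(a, compose(b, c))
           for a, b, c in product(elems, repeat=3))
# Identity: "stay"
assert all(compose(MOVES["stay"], a) == a for a in elems)
# Every element is its own inverse
assert all(compose(a, a) == MOVES["stay"] for a in elems)
```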
Novel Way to Isolate Adipose Derive Mesenchymal Stem Cells & Its Future Clini... (IOSR Journals)
Abstract: Adipose-derived stem cells (ADSCs) were isolated from discarded human fat tissue obtained from C-sections, using our recently modified methods, in the Stem Cell & Regenerative Medicine Lab, VSBT. We developed two methods to isolate adipose-derived mesenchymal stem cells: enzyme digestion, and the use of phosphatidylcholine and deoxycholate. Surface protein expression was analyzed by flow cytometry to characterize the cell phenotype. The multi-lineage potential of the ADSCs was tested by differentiating the cells with an adipogenic inducer. ADSCs can be cultured in vitro for up to one month without passage. Flow cytometry analysis also showed that the ADSCs expressed high levels of the stem cell related surface marker CD105. ADSCs have strong proliferative ability, maintain their phenotype, and have strong multi-differentiation potential. The molecular basis of ADSC differentiation was studied using bioinformatics tools with the aim of identifying the key proteins involved in differentiation, so that they could be used as potential drug targets for the treatment of obesity. The key proteins involved were found to be PPARG and C/EBPα. The structures of the proteins were retrieved from MMDB (Molecular Modelling Database) and PDB (Protein Data Bank), respectively.
Keywords: Adipose-derived stem cells, Mesenchymal stem cells, Enzyme digestion, Phosphatidylcholine, Deoxycholate, PPARG, C/EBPα
Chemical Examination of Sandbox (Hura Crepitans) Seed: Amino Acid and Seed Pr... (IOSR Journals)
Abstract: The amino acid composition and seed protein solubility of sandbox (Hura crepitans) seeds were studied. The chemical scores (%) for the determined amino acids showed tryptophan, leucine, methionine and isoleucine, at 175.71, 175.00, 161.82 and 134.52 respectively, as the most abundant amino acids in that order, while lysine and phenylalanine, at 44.29 and 45.71 respectively, were the most limiting. The ratio of essential to non-essential amino acids in the seed was found to be 79:21. All determined amino acid values were higher than the FAO/WHO standard except for lysine, cysteine and phenylalanine, where lower values were obtained. Four solvents (0.1 M each of NaOH, Na2CO3, NaHCO3 and NaCl) were tested for their ability to dissolve the seed protein; of these, 0.1 M NaOH was the most effective compared with deionized distilled water. The protein was more soluble in alkaline than in acidic media: pH 4 gave the lowest protein solubility of 20% while pH 8 gave the highest of 65%, after which increasing the pH did not increase solubility and relative stability was established. The outcome of this work is a useful indication of how well protein isolates would perform when applied to food, and of the extent of protein denaturation due to chemical treatment.
MULTILINGUAL INFORMATION RETRIEVAL BASED ON KNOWLEDGE CREATION TECHNIQUES (ijcseit)
As information access across languages increases, so does the importance of systems that support query-based searching over multilingual content. Gathering information in different natural languages is a difficult task that requires huge resources such as databases and digital libraries. Cross-language information retrieval (CLIR) enables searching multilingual document collections using one's native language, and can be supported by different data mining techniques. This paper deals with various data mining techniques that can be used to solve the problems encountered in CLIR.
Towards optimize-ESA for text semantic similarity: A case study of biomedical... (IJECEIAES)
Explicit Semantic Analysis (ESA) is an approach to measuring the semantic relatedness between terms or documents based on their similarity to documents of a reference corpus, usually Wikipedia. ESA has received tremendous attention in natural language processing (NLP) and information retrieval. However, ESA relies on a huge Wikipedia index matrix: its interpretation step multiplies a large matrix by a term vector to produce a high-dimensional vector. Consequently, the interpretation and similarity steps of ESA are expensive, and its efficiency suffers from much unnecessary computation. This paper proposes an enhancement to ESA, called optimize-ESA, that reduces the dimensionality at the interpretation stage by computing semantic similarity within a specific domain. The experimental results show clearly that our method correlates much better with human judgement than the full ESA approach.
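The ESA interpretation step described above can be sketched with a toy term-concept matrix; the weights below are made up for illustration, whereas real ESA uses TF-IDF weights over Wikipedia articles.

```python
import numpy as np

# rows: terms, columns: reference concepts (toy weights, not real Wikipedia)
M = np.array([[0.9, 0.1, 0.0],   # "heart"
              [0.8, 0.2, 0.0],   # "cardiac"
              [0.0, 0.1, 0.9]])  # "contract"
vocab = {"heart": 0, "cardiac": 1, "contract": 2}

def interpret(text):
    """Sum the concept vectors of a text's terms and L2-normalize."""
    v = sum(M[vocab[w]] for w in text.split())
    return v / np.linalg.norm(v)

# relatedness is the cosine between interpretation vectors
sim_related = float(interpret("heart") @ interpret("cardiac"))
sim_unrelated = float(interpret("heart") @ interpret("contract"))
assert sim_related > sim_unrelated
```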
A hybrid English to Malayalam machine translator for Malayalam content creati... (IOSR Journals)
This document discusses a proposed system to integrate a hybrid English-to-Malayalam machine translator with wiki systems to automate content creation in Malayalam. The hybrid translator combines statistical machine translation and translation memory. The system is intended to address the lack of Malayalam content in popular wikis like Wikipedia by speeding up content creation processes and overcoming language barriers. An overview of related work on wiki-NLP integration and English-to-Malayalam translation is also provided.
EMPLOYING THE CATEGORIES OF WIKIPEDIA IN THE TASK OF AUTOMATIC DOCUMENTS CLUS... (IJCI JOURNAL)
In this paper we describe a new unsupervised algorithm for automatic document clustering with the aid of Wikipedia. Contrary to other related algorithms in the field, our algorithm utilizes only two aspects of Wikipedia, namely its category network and article titles; we do not use the inner content of Wikipedia articles or their inner or inter-article links. The implemented algorithm was evaluated in a document clustering experiment. Our findings indicate that the Wikipedia features utilized in our framework can give competitive results, especially when compared against other models in the literature which employ the inner content of Wikipedia articles.
Survey On Building A Database Driven Reverse Dictionary (Editor IJMTER)
Reverse dictionaries are widely used as reference works organized by concepts, phrases, or the definitions of words. This paper describes the many challenges inherent in building a reverse lexicon and maps the problem to the well-known conceptual similarity problem. Standard web search engines are basic versions of such a system; they take advantage of huge scale, which permits inferring general interest in documents from link information. This paper describes a basic study of a database-driven reverse dictionary using three large-scale datasets: person names, general English words, and biomedical concepts. It also analyzes difficulties arising in the use of documents produced by a reverse dictionary.
This document summarizes a research paper on cross language text retrieval. It discusses the different types of translations that can occur in cross language retrieval, including query translation, document translation, and various approaches like machine translation, dictionary-based, and corpus-based. It also outlines some common ranking methods used in information retrieval for cross language documents, such as Okapi BM25, language modeling, and TF-IDF. The paper provides an overview of the key components and challenges of cross language text retrieval systems.
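Of the ranking methods listed, Okapi BM25 is easy to illustrate; the sketch below uses the usual parameter choices k1 = 1.5 and b = 0.75 on toy tokenized documents.

```python
import math
from collections import Counter

docs = [
    "cross language retrieval translates the query".split(),
    "machine translation of documents".split(),
    "dictionary based query translation methods".split(),
]
N = len(docs)
avgdl = sum(len(d) for d in docs) / N
K1, B = 1.5, 0.75

def idf(term):
    """Smoothed inverse document frequency used by BM25."""
    df = sum(term in d for d in docs)
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(query, doc):
    """Sum of saturated, length-normalized term-frequency scores."""
    tf = Counter(doc)
    return sum(
        idf(t) * tf[t] * (K1 + 1) /
        (tf[t] + K1 * (1 - B + B * len(doc) / avgdl))
        for t in query.split()
    )

scores = [bm25("query translation", d) for d in docs]
assert scores[2] == max(scores)  # only doc 3 contains both query terms
```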
ABBREVIATION DICTIONARY FOR TWITTER HATE SPEECH (IJCI JOURNAL)
This document presents a study that compiled an abbreviation dictionary to normalize abbreviations used in hate speech tweets in English. The study:
1) Used a Python library and keywords from an annotated hate speech dataset to collect over 24,000 tweets containing abbreviations.
2) Manually reviewed the abbreviations from the tweets according to developed rules to determine their complete forms.
3) Compiled a dictionary of 300 abbreviations and their full forms to help normalize abbreviations in Twitter hate speech detection.
The study compared its methodology and results to previous work on abbreviation dictionaries and found it extracted more tweets and abbreviations than similar prior studies. The dictionary will be used to normalize hate speech data in future research.
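Once such a dictionary exists, normalization is a straightforward lookup-and-replace pass over each tweet; a minimal sketch with toy entries, not the study's 300-entry dictionary:

```python
import re

# hypothetical entries for illustration only
ABBREVIATIONS = {"idk": "I do not know", "smh": "shaking my head"}

def normalize(tweet):
    """Replace each known abbreviation with its complete form."""
    return re.sub(
        r"\b\w+\b",
        lambda m: ABBREVIATIONS.get(m.group(0).lower(), m.group(0)),
        tweet,
    )

assert normalize("idk why smh") == "I do not know why shaking my head"
```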
An Incremental Method For Meaning Elicitation Of A Domain Ontology (Audrey Britton)
This document describes MELIS (Meaning Elicitation and Lexical Integration System), a tool developed to support the annotation of data sources with conceptual and lexical information during the ontology generation process in MOMIS (Mediator envirOnment for Multiple Information Systems). MELIS improves upon MOMIS by enabling more automated annotation of data sources and by extracting relationships between lexical elements using lexical and domain knowledge to provide a richer semantics. The document outlines the MOMIS architecture integrated with MELIS and explains how MELIS supports key steps in the ontology creation process, including source annotation, common thesaurus generation, and relationship extraction.
Biomedical indexing and retrieval system based on language modeling approach (ijseajournal)
This document summarizes a research paper that proposes a biomedical indexing and retrieval system called BIOINSY. It uses a language modeling approach to select the best Medical Subject Headings (MeSH) descriptors for indexing medical articles from sources like PubMed. The system first preprocesses articles by splitting text, stemming words, and removing stop words. It then extracts terms using a hybrid linguistic and statistical approach; terms are weighted based on semantic relationships in MeSH, not just statistics. Descriptors are selected by disambiguating terms and estimating the probability that a descriptor was generated by the article's language model. Experiments showed the effectiveness of this conceptual indexing approach.
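The query-likelihood idea behind such descriptor selection can be sketched with Jelinek-Mercer smoothing against a background collection model; the smoothing weight and toy texts below are illustrative assumptions, and the paper's exact estimator may differ.

```python
import math
from collections import Counter

collection = "cell therapy gene cancer cell signal".split()  # background
article = "gene therapy targets cancer cell growth".split()
bg, art = Counter(collection), Counter(article)
LAM = 0.7  # interpolation weight between article and background models

def log_prob(terms):
    """Log-likelihood that the article's language model generates the terms."""
    return sum(
        math.log(LAM * art[t] / len(article) +
                 (1 - LAM) * bg[t] / len(collection))
        for t in terms.split()
    )

# the descriptor whose terms the article most likely generates ranks first
assert log_prob("gene therapy") > log_prob("cell signal")
```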
Great model: a model for the automatic generation of semantic relations betwee... (ijcsity)
The large available amount of non-structured texts that belong to different domains, such as healthcare (e.g. medical records), justice (e.g. laws, declarations), insurance (e.g. declarations), etc., increases the effort required for the analysis of information in a decision-making process. Different projects and tools have proposed strategies to reduce this complexity by classifying, summarizing or annotating the texts. Particularly, text summary strategies have proven to be very useful to provide a compact view of an original text. However, the available strategies to generate these summaries do not fit very well within domains that require taking into consideration the temporal dimension of the text (e.g. a recent piece of text in a medical record is more important than a previous one) and the profile of the person who requires the summary (e.g. the medical specialization). To cope with these limitations, this paper presents "GReAT", a model for automatic summary generation that relies on natural language processing and text mining techniques to extract the most relevant information from narrative texts and discover new information from the detection of related information. The GReAT model was implemented in software and validated in a health institution, where it has shown to be very useful to display a preview of the information in medical health records and to discover new facts and hypotheses within the information. Several tests were executed on the implemented software, covering functionality, usability and performance. In addition, precision and recall measures were applied to the results obtained through the implemented tool, as well as to the loss of information caused by providing a text shorter than the original.
QUrdPro: Query processing system for Urdu Language (IJERA Editor)
This document describes QUrdPro, a query processing system for the Urdu language. It proposes an ontology-based architecture that uses natural language processing to analyze user queries in Urdu, formulate queries based on the domain ontology, search documents to extract relevant answers, and return results to the user. The system aims to improve information retrieval for the Urdu language by leveraging ontologies and avoiding users having to sift through large amounts of unstructured text. It discusses related work on question answering systems and outlines the proposed architecture and four-phase process model of QUrdPro.
Multilingualism in Information Retrieval System (Ariel Hess)
The document discusses multilingualism in information retrieval systems. It examines challenges in creating systems that allow users to input queries and receive results in multiple languages. Several existing systems are described that have implemented multilingual features, such as translating queries or documents. However, challenges remain around accurate translation and handling diverse languages and regions. Future research areas discussed include developing better translation tools and testing systems on a wider range of languages and users.
This document provides an overview of information retrieval models, including vector space models, TF-IDF, Doc2Vec, and latent semantic analysis. It begins with basic concepts in information retrieval like document indexing and relevance scoring. Then it discusses vector space models and how documents and queries are represented as vectors. TF-IDF weighting is explained as assigning higher weight to rare terms. Doc2Vec is introduced as an extension of word2vec to learn document embeddings. Latent semantic analysis uses singular value decomposition to project documents to a latent semantic space. Implementation details and examples are provided for several models.
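The TF-IDF weighting mentioned above, in which rare terms receive higher weight, can be shown in a few lines:

```python
import math
from collections import Counter

docs = [
    "information retrieval ranks documents",
    "vector space models represent documents as vectors",
    "latent semantic analysis projects documents",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def idf(term):
    """log(N/df): zero for terms appearing in every document."""
    df = sum(term in doc for doc in tokenized)
    return math.log(N / df) if df else 0.0

def tfidf(doc):
    """Map each term of a tokenized document to its tf * idf weight."""
    tf = Counter(doc)
    return {t: tf[t] * idf(t) for t in tf}

weights = tfidf(tokenized[0])
assert weights["documents"] == 0.0   # appears in every document
assert weights["retrieval"] > 0      # appears in only one document
```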
An Improved Mining Of Biomedical Data From Web Documents Using Clustering (Kelly Lipiec)
This document summarizes a research paper that proposes an improved method for mining biomedical data from web documents using clustering. Specifically, it develops an optimized k-means clustering algorithm to group similar biomedical documents together based on identifying relevant terms using the Unified Medical Language System (UMLS). The approach aims to more efficiently retrieve relevant biomedical documents for users. It compares the proposed method to the original k-means algorithm and finds it achieves an average F-measure of 99.06%, indicating more accurate clustering of biomedical web documents.
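For reference, plain Lloyd's k-means on toy feature vectors looks as follows; the paper's optimized variant and its UMLS-based term selection are not reproduced here.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels

# two well-separated toy "document" groups
X = np.array([[0, 0], [0, 1], [1, 0], [9, 9], [9, 10], [10, 9]], float)
labels = kmeans(X, 2)
assert len(set(labels[:3])) == 1 and len(set(labels[3:])) == 1
assert labels[0] != labels[3]
```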
Linked Data Generation for the University Data From Legacy Database dannyijwest
Web was developed to share information among the users through internet as some hyperlinked documents.
If someone wants to collect some data from the web he has to search and crawl through the documents to
fulfil his needs. Concept of Linked Data creates a breakthrough at this stage by enabling the links within
data. So, besides the web of connected documents a new web developed both for humans and machines, i.e.,
the web of connected data, simply known as Linked Data Web. Since it is a very new domain, still a very
few works has been done, specially the publication of legacy data within a University domain as Linked
Data.
A Review on Evolution and Versioning of Ontology Based Information Systemsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document provides a review of existing approaches to ontology evolution and versioning. It begins by defining ontologies and discussing why evolution and versioning are needed as ontologies are used in information systems. It then outlines some existing solutions for ontology evolution and version management, noting different languages used to conceptualize ontologies. Challenges of ontology versioning and evolution are discussed. Increased usage of ontologies in different domains is reviewed. Finally, some available tools for ontology change management are mentioned.
Swoogle: Showcasing the Significance of Semantic SearchIDES Editor
The World Wide Web hosts vast repositories of
information. The retrieval of required information from the
Internet is a great challenge since computer applications
understand only the structure and layout of web pages and
they do not have access to their intended meaning. Semantic
web is an effort to enhance the Internet, so that computers
can process the information presented on WWW, interpret
and communicate with it, to help humans find required
essential knowledge. Application of Ontology is the
predominant approach helping the evolution of the Semantic
web. The aim of our work is to illustrate how Swoogle, a
semantic search engine, helps make computer and WWW
interoperable and more intelligent. In this paper, we discuss
issues related to traditional and semantic web searching. We
outline how an understanding of the semantics of the search
terms can be used to provide better results. The experimental
results establish that semantic search provides more focused
results than the traditional search.
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
This document discusses exploiting Wikipedia and Twitter for text mining applications. It explores using Wikipedia's category-article structure for text classification, subjectivity analysis, and keyword extraction. It evaluates classifying tweets as relevant/irrelevant to entities or brands and classifying tweets into topical dimensions like workplace or innovation. Features used include relatedness scores between tweet text and Wikipedia categories, topic modeling scores, and Twitter-specific features. Experimental results show the Wikipedia framework based on its category-article structure outperforms standard text mining techniques.
This document provides a technical review of secure banking using RSA and AES encryption methodologies. It discusses how RSA and AES are commonly used encryption standards for secure data transmission between ATMs and bank servers. The document first provides background on ATM security measures and risks of attacks. It then reviews related work analyzing encryption techniques. The document proposes using a one-time password in addition to a PIN for ATM authentication. It concludes that implementing encryption standards like RSA and AES can make transactions more secure and build trust in online banking.
This document analyzes the performance of various modulation schemes for achieving energy efficient communication over fading channels in wireless sensor networks. It finds that for long transmission distances, low-order modulations like BPSK are optimal due to their lower SNR requirements. However, as transmission distance decreases, higher-order modulations like 16-QAM and 64-QAM become more optimal since they can transmit more bits per symbol, outweighing their higher SNR needs. Simulations show lifetime extensions up to 550% are possible in short-range networks by using higher-order modulations instead of just BPSK. The optimal modulation depends on transmission distance and balancing the energy used by electronic components versus power amplifiers.
This document provides a review of mobility management techniques in vehicular ad hoc networks (VANETs). It discusses three modes of communication in VANETs: vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and hybrid vehicle (HV) communication. For each communication mode, different mobility management schemes are required due to their unique characteristics. The document also discusses mobility management challenges in VANETs and outlines some open research issues in improving mobility management for seamless communication in these dynamic networks.
This document provides a review of different techniques for segmenting brain MRI images to detect tumors. It compares the K-means and Fuzzy C-means clustering algorithms. K-means is an exclusive clustering algorithm that groups data points into distinct clusters, while Fuzzy C-means is an overlapping clustering algorithm that allows data points to belong to multiple clusters. The document finds that Fuzzy C-means requires more time for brain tumor detection compared to other methods like hierarchical clustering or K-means. It also reviews related work applying these clustering algorithms to segment brain MRI images.
1) The document simulates and compares the performance of AODV and DSDV routing protocols in a mobile ad hoc network under three conditions: when users are fixed, when users move towards the base station, and when users move away from the base station.
2) The results show that both protocols have higher packet delivery and lower packet loss when users are either fixed or moving towards the base station, since signal strength is better in those scenarios. Performance degrades when users move away from the base station due to weaker signals.
3) AODV generally has better performance than DSDV, with higher throughput and packet delivery rates observed across the different user mobility conditions.
This document describes the design and implementation of 4-bit QPSK and 256-bit QAM modulation techniques using MATLAB. It compares the two techniques based on SNR, BER, and efficiency. The key steps of implementing each technique in MATLAB are outlined, including generating random bits, modulation, adding noise, and measuring BER. Simulation results show scatter plots and eye diagrams of the modulated signals. A table compares the results, showing that 256-bit QAM provides better performance than 4-bit QPSK. The document concludes that QAM modulation is more effective for digital transmission systems.
The document proposes a hybrid technique using Anisotropic Scale Invariant Feature Transform (A-SIFT) and Robust Ensemble Support Vector Machine (RESVM) to accurately identify faces in images. A-SIFT improves upon traditional SIFT by applying anisotropic scaling to extract richer directional keypoints. Keypoints are processed with RESVM and hypothesis testing to increase accuracy above 95% by repeatedly reprocessing images until the threshold is met. The technique was tested on similar and different facial images and achieved better results than SIFT in retrieval time and reduced keypoints.
This document studies the effects of dielectric superstrate thickness on microstrip patch antenna parameters. Three types of probes-fed patch antennas (rectangular, circular, and square) were designed to operate at 2.4 GHz using Arlondiclad 880 substrate. The antennas were tested with and without an Arlondiclad 880 superstrate of varying thicknesses. It was found that adding a superstrate slightly degraded performance by lowering the resonant frequency and increasing return loss and VSWR, while decreasing bandwidth and gain. Specifically, increasing the superstrate thickness or dielectric constant resulted in greater changes to the antenna parameters.
This document describes a wireless environment monitoring system that utilizes soil energy as a sustainable power source for wireless sensors. The system uses a microbial fuel cell to generate electricity from the microbial activity in soil. Two microbial fuel cells were created using different soil types and various additives to produce different current and voltage outputs. An electronic circuit was designed on a printed circuit board with components like a microcontroller and ZigBee transceiver. Sensors for temperature and humidity were connected to the circuit to monitor the environment wirelessly. The system provides a low-cost way to power remote sensors without needing battery replacement and avoids the high costs of wiring a power source.
1) The document proposes a model for a frequency tunable inverted-F antenna that uses ferrite material.
2) The resonant frequency of the antenna can be significantly shifted from 2.41GHz to 3.15GHz, a 31% shift, by increasing the static magnetic field placed on the ferrite material.
3) Altering the permeability of the ferrite allows tuning of the antenna's resonant frequency without changing the physical dimensions, providing flexibility to operate over a wide frequency range.
This document summarizes a research paper that presents a speech enhancement method using stationary wavelet transform. The method first classifies speech into voiced, unvoiced, and silence regions based on short-time energy. It then applies different thresholding techniques to the wavelet coefficients of each region - modified hard thresholding for voiced speech, semi-soft thresholding for unvoiced speech, and setting coefficients to zero for silence. Experimental results using speech from the TIMIT database corrupted with white Gaussian noise at various SNR levels show improved performance over other popular denoising methods.
This document reviews the design of an energy-optimized wireless sensor node that encrypts data for transmission. It discusses how sensing schemes that group nodes into clusters and transmit aggregated data can reduce energy consumption compared to individual node transmissions. The proposed node design calculates the minimum transmission power needed based on received signal strength and uses a periodic sleep/wake cycle to optimize energy when not sensing or transmitting. It aims to encrypt data at both the node and network level to further optimize energy usage for wireless communication.
This document discusses group consumption modes. It analyzes factors that impact group consumption, including external environmental factors like technological developments enabling new forms of online and offline interactions, as well as internal motivational factors at both the group and individual level. The document then proposes that group consumption modes can be divided into four types based on two dimensions: vertical (group relationship intensity) and horizontal (consumption action period). These four types are instrument-oriented, information-oriented, enjoyment-oriented, and relationship-oriented consumption modes. Finally, the document notes that consumption modes are dynamic and can evolve over time.
The document summarizes a study of different microstrip patch antenna configurations with slotted ground planes. Three antenna designs were proposed and their performance evaluated through simulation: a conventional square patch, an elliptical patch, and a star-shaped patch. All antennas were mounted on an FR4 substrate. The effects of adding different slot patterns to the ground plane on resonance frequency, bandwidth, gain and efficiency were analyzed parametrically. Key findings were that reshaping the patch and adding slots increased bandwidth and shifted resonance frequency. The elliptical and star patches in particular performed better than the conventional design. Three antenna configurations were selected for fabrication and measurement based on the simulations: a conventional patch with a slot under the patch, an elliptical patch with slots
1) The document describes a study conducted to improve call drop rates in a GSM network through RF optimization.
2) Drive testing was performed before and after optimization using TEMS software to record network parameters like RxLevel, RxQuality, and events.
3) Analysis found call drops were occurring due to issues like handover failures between sectors, interference from adjacent channels, and overshooting due to antenna tilt.
4) Corrective actions taken included defining neighbors between sectors, adjusting frequencies to reduce interference, and lowering the mechanical tilt of an antenna.
5) Post-optimization drive testing showed improvements in RxLevel, RxQuality, and a reduction in dropped calls.
This document describes the design of an intelligent autonomous wheeled robot that uses RF transmission for communication. The robot has two modes - automatic mode where it can make its own decisions, and user control mode where a user can control it remotely. It is designed using a microcontroller and can perform tasks like object recognition using computer vision and color detection in MATLAB, as well as wall painting using pneumatic systems. The robot's movement is controlled by DC motors and it uses sensors like ultrasonic sensors and gas sensors to navigate autonomously. RF transmission allows communication between the robot and a remote control unit. The overall aim is to develop a low-cost robotic system for industrial applications like material handling.
This document reviews cryptography techniques to secure the Ad-hoc On-Demand Distance Vector (AODV) routing protocol in mobile ad-hoc networks. It discusses various types of attacks on AODV like impersonation, denial of service, eavesdropping, black hole attacks, wormhole attacks, and Sybil attacks. It then proposes using the RC6 cryptography algorithm to secure AODV by encrypting data packets and detecting and removing malicious nodes launching black hole attacks. Simulation results show that after applying RC6, the packet delivery ratio and throughput of AODV increase while delay decreases, improving the security and performance of the network under attack.
The document describes a proposed modification to the conventional Booth multiplier that aims to increase its speed by applying concepts from Vedic mathematics. Specifically, it utilizes the Urdhva Tiryakbhyam formula to generate all partial products concurrently rather than sequentially. The proposed 8x8 bit multiplier was coded in VHDL, simulated, and found to have a path delay 44.35% lower than a conventional Booth multiplier, demonstrating its potential for higher speed.
This document discusses image deblurring techniques. It begins by introducing image restoration and focusing on image deblurring. It then discusses challenges with image deblurring being an ill-posed problem. It reviews existing approaches to screen image deconvolution including estimating point spread functions and iteratively estimating blur kernels and sharp images. The document also discusses handling spatially variant blur and summarizes the relationship between the proposed method and previous work for different blur types. It proposes using color filters in the aperture to exploit parallax cues for segmentation and blur estimation. Finally, it proposes moving the image sensor circularly during exposure to prevent high frequency attenuation from motion blur.
This document describes modeling an adaptive controller for an aircraft roll control system using PID, fuzzy-PID, and genetic algorithm. It begins by introducing the aircraft roll control system and motivation for developing an adaptive controller to minimize errors from noisy analog sensor signals. It then provides the mathematical model of aircraft roll dynamics and describes modeling the real-time flight control system in MATLAB/Simulink. The document evaluates PID, fuzzy-PID, and PID-GA (genetic algorithm) controllers for aircraft roll control and finds that the PID-GA controller delivers the best performance.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 18, Issue 3, Ver. II (May-Jun. 2016), PP 16-22
www.iosrjournals.org
DOI: 10.9790/0661-1803021622
Can Wikipedia Be A Reliable Source For Translation? Testing Wikipedia Cross-Lingual Coverage of the Medical Domain
Eslam Amer 1, Mohamed Abd-Elfattah 2
1 Computer Science Dep., Faculty of Computers and Information, Benha University, Egypt
2 Information System Dep., Faculty of Computers and Information, Benha University, Egypt
2 Information System Dep., Faculty of Computers and Information, Islamic University, Al-Madinah al-Munawwarah, KSA
Abstract: This paper introduces Wiki-Transpose, a query translation system for cross-lingual information retrieval (CLIR). Wiki-Transpose relies only on Wikipedia as its source of translations. The main goal of this paper is to check the coverage ratio of Wikipedia against specialized queries related to the medical domain. Wiki-Transpose was evaluated using both English and Portuguese medical queries, which were mapped to English and Portuguese Wikipedia concepts. Experiments showed that the Wikipedia coverage ratio of queries is inversely proportional to query size: coverage of single-term English queries is about 81%, and about 80% for single-term Portuguese queries, and the ratio decreases as the number of terms in a query grows. For query translation, however, Wikipedia showed comparable performance in the two directions: around 60% coverage for single-term English–Portuguese queries, compared to 88% for single-term Portuguese–English queries. Translation coverage likewise decreases as the number of terms in a query increases.
Keywords: CLIR, query translation, Wiki-Transpose
I. Introduction
Retrieving relevant information from the constantly increasing amount of multilingual content available on the web is becoming a significant issue, and cross-lingual information retrieval (CLIR) is in increasingly critical demand. Because users can find it difficult to formulate a query in a non-native language, query translation has become the most widely used CLIR technique for accessing documents in a language different from that of the query.
CLIR can be handled using one of two approaches [1]: translating the query into the target language, or translating the documents into the source language and then searching with the original query. Historically, the bulk of CLIR systems have favored query translation, probably because it is more computationally economical [1-2].
The main approaches to query translation are lexicon-based, or the use of parallel corpora and machine translation (MT) [3]. Lexicon-based translation can achieve high accuracy on general text [2, 3]. However, accuracy drops when complex phrases introduce ambiguity. Moreover, the coverage of lexicons is a limiting factor, since they are difficult and costly to maintain; in particular, queries can refer to a great number of named entities and multi-word terms [1, 2, 4].
Nowadays, a new source of translations is Wikipedia, whose features can address both of these issues. Thanks to the voluntary contributions of millions of users, it offers better coverage of named entities and domain-specific terms [5]; from it, one can easily extract up-to-date multilingual dictionaries with near-optimal lexical coverage [6].
Each Wikipedia article has hyperlinks to versions of the same article in other languages. A number of researchers have exploited these cross-lingual associations to translate unknown terms, assuming that article names linked in this fashion are mutual translations [7-9].
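As a minimal illustration of this idea, the sketch below translates single terms by looking them up in a small hand-made table standing in for the interlanguage links extracted from Wikipedia article pairs. The table entries are hypothetical examples, not real dump data; a real system would harvest these pairs from a Wikipedia dump or the MediaWiki langlinks API.

```python
# Toy stand-in for interlanguage links harvested from Wikipedia:
# English article title -> Portuguese article title.
# In a real system these pairs would come from a Wikipedia dump
# or from the MediaWiki "langlinks" API.
EN_TO_PT_LINKS = {
    "fever": "febre",
    "asthma": "asma",
    "diabetes mellitus": "diabetes mellitus",
}

def translate_term(term, links):
    """Return the linked article title in the target language, or None
    when Wikipedia has no article (or no cross-lingual link) for the term."""
    return links.get(term.lower())

print(translate_term("Fever", EN_TO_PT_LINKS))  # febre
print(translate_term("cough", EN_TO_PT_LINKS))  # None (term not covered)
```

The `None` case is exactly what the coverage experiments later in the paper measure: how often a medical term fails to map to a linked article pair.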
Wikipedia articles are used here as representations of concepts. The internal structure of Wikipedia is used to translate terms: the cross-lingual links inside an article are traversed to obtain the corresponding translation of a query. Most Wikipedia articles contain cross-lingual links, i.e., links to articles about the same concept in a different language, and these links can be used to obtain translations. In this paper, Wiki-Transpose uses the structure of Wikipedia to map a user query to Wikipedia concept(s) and then, through the cross-lingual links, retrieves the translation of each concept.
In this paper, the medical domain is chosen to test the reliability of Wikipedia as a translation
resource. According to a recent survey, health is one of the most important and most searched topics on the
internet: one in three American adults has gone online to diagnose a medical condition that they or someone else
might have [10]. Patients are not the only users; physicians are active internet users as well. PubMed, which
indexes the biomedical literature, reports more than one hundred million users, of which two-thirds are medical
professionals [11].
Can Wikipedia Be A Reliable Source For Translation? Testing Wikipedia Cross-Lingual Coverage
DOI: 10.9790/0661-1803021622 www.iosrjournals.org 17 | Page
This paper introduces a model that performs query translation relying only on Wikipedia as a
translation resource. The goal of this research is to explore the possibilities of relying on Wikipedia for query
translation in CLIR.
Initially, a medical query Q(s) is mapped to Wikipedia concepts using the Wikipedia edition in source
language s. The corresponding concepts in target language t are then obtained via the available cross-lingual
links, and from these translations the query Q(t) is created.
The main question this paper raises is the coverage ratio of Wikipedia for query terms and the
existence of corresponding translations in the target language. This in turn raises two further challenges: how
can queries be mapped to Wikipedia concepts, and how can a query be created from the resulting Wikipedia
concepts?
Mapping a query to concepts is a difficult but crucial step. One of the difficulties here is also a major
problem in CLIR itself: word sense disambiguation. Our hypothesis is that the structure of Wikipedia (e.g.,
articles, internal links, etc.) makes it a useful resource for CLIR, able to handle the challenges mentioned
above; the experiments will show that it is a promising approach in the field.
The remainder of this paper is organized as follows. Section II reviews translation approaches to CLIR.
Section III describes the proposed approach in detail. Section IV presents experimental results, and Section V
concludes.
II. CLIR: Related Works
Cross-language information retrieval (CLIR) is considered an active sub-domain of information retrieval
(IR) that searches for documents and for information contained within those documents. However, the bulk of
IR systems are based on the bag-of-words (BOW) model, which considers neither sentence structure nor term
order [3]. Moreover, queries submitted to such systems are often short and cannot describe user needs in an
unambiguous and accurate way [1-3].
CLIR systems normally have two operational modes: query translation or document translation [1, 2].
Additionally, translation can be direct or indirect [1, 3]. Direct translation uses dictionaries, parallel corpora,
and machine translation algorithms to translate the source text, which usually requires handling issues such as
ambiguity and lack of coverage [12, 13]. Indirect translation, on the other hand, relies on an intermediary
placed between the source query and the target document: the query is translated into an intermediate language
(or several languages) to enable comparison with the target document [14].
CLIR has to deal with several challenges, for example out-of-vocabulary (OOV) words, named entity
recognition and translation, and word sense disambiguation (WSD) [3]. Due to ambiguity, generating a single
translation for a given query may not always be accurate; this is why a CLIR system may be enhanced by
including synonyms and query-related words [15].
A possible solution to this challenge is to integrate query expansion using dependent or independent
terms. In independent expansion, the semantic similarity between words must be measured; however, it is shown
in [16] that this often affects performance negatively.
In dependent expansion, documents retrieved by the initial query are used for expansion; query
expansion can then take place before or after translation, or in both modes simultaneously [17-18].
In the presence of ambiguity, multiple translations must be generated for a given query, which can be
achieved using a parallel corpus [19, 20]. The authors of [20] first retrieve the best-matching documents in the
source language and then select the frequently occurring words common to the retrieved documents; the final
query is expanded and enriched with these words. An advantage of the approach in [19] is that it achieves word
sense disambiguation automatically, as it relies on co-occurrence frequency statistics.
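The expansion scheme of [20] described above can be sketched as follows, with the retrieval step itself omitted and the top-ranked documents passed in as plain strings:

```python
from collections import Counter

def expand_query(query_terms, top_docs, k=3):
    """Pseudo-relevance feedback in the spirit of [20]: count the words
    that occur across the best-matching source-language documents and
    append the k most frequent ones to the query. Real retrieval,
    stop-word handling, and term weighting are all omitted here."""
    counts = Counter()
    for doc in top_docs:
        counts.update(w for w in doc.lower().split() if w not in query_terms)
    expansion = [w for w, _ in counts.most_common(k)]
    return list(query_terms) + expansion
```

Because the added words are those shared by the top documents, terms matching the intended sense of the query tend to dominate the counts, which is how co-occurrence statistics yield disambiguation as a side effect.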
In [21], the authors use Wikipedia itself to generate similar sentences across different languages; two
sentences are considered similar if they share some (or a large amount of) overlapping information.
A similar approach is applied in [22]; however, [21] differs in that it does not select new words for the
query but instead builds a relevance model for the query terms.
Recent works [23, 24] make use of the World Wide Web to mine multilingual anchor text or to use free
translations provided by Web page authors (e.g., technical terms followed by a translation in parentheses).
An extension of this web-based approach makes use of Wikipedia, the largest online, multilingual,
free-content encyclopedia, to which every user can contribute. Due to this user involvement, Wikipedia has
grown exponentially [25]. Because Wikipedia is managed by its users, its topic coverage depends on their
interests; however, according to [26], even rarely used topics are covered well by contributors.
Wikipedia can be viewed as a comparable corpus [21, 27], since its articles are represented in different
languages and connected through cross-lingual links. This feature makes it possible to treat Wikipedia as a
comparable corpus of sentences for finding similar sentences across languages.
Owing to these features and properties, research works have used Wikipedia as a semantic lexical
resource [28], for automatic word sense disambiguation (WSD) [29], and for the translation of vocabulary
words [30]. Many works use Wikipedia for query translation: [30-32] exploit the cross-lingual links available
in Wikipedia articles to translate terms.
In addition to the aforementioned approaches, [33] exploited the hyperlinks between Wikipedia articles
to identify the corresponding articles in different languages. However, this simple approach does not perform
deep mining of Wikipedia, and only a limited number of translation relations can be extracted.
Nevertheless, it is reported in [33] that while MT systems can provide reasonable translations for
general language expressions, they are often not sufficient for domain-specific phrases that contain personal
names, place names, technical terms, titles of artworks, etc.
The objective of the proposed work is to measure the reliability of using Wikipedia as a translator for
specific domains (here, the medical domain). The medical domain is chosen because it is used by both
professional and non-professional users; non-professional users turn to commercial search engines such as
Google and Yahoo to find information about diseases or their symptoms.
Portuguese is targeted because it is the native language of more than 200 million people.
This paper introduces a proposed system that relies only on Wikipedia to translate Portuguese medical
queries into English and English medical queries into Portuguese.
The main limitation that can obviously affect the results is the weaker coverage of Portuguese topics on
Wikipedia compared to English. For example, the index of English article titles in Wikipedia is about 650 MB
in size, compared to 60 MB for the index of Portuguese titles.
III. The Proposed Wiki-Transpose Approach
Wiki-Transpose (Figure 1) is used to test how reliably Wikipedia provides translation coverage for
Portuguese-to-English and English-to-Portuguese queries.
The user first enters a query, which goes through preprocessing steps to remove trivial words; the
processed query is then mapped to Wikipedia concept(s). By navigating the cross-lingual links found inside the
Wikipedia article(s) that represent those concepts, the corresponding target translations are extracted.
An advantage of Wiki-Transpose is that it allows the extraction of phrases from topics, since the titles
of Wikipedia articles are often phrases.
The proposed model has two main steps. The pre-translation step can be broken down into four separate
activities: tokenization, stop-word removal, stemming, and term expansion. The translation step then maps the
query in the source language to Wikipedia concepts and creates the final query in the target language from the
concepts found.
The difference between this approach and traditional ones is that it makes use of the internal structure
of Wikipedia, extracting text from internal links, in contrast to approaches based on dictionaries and parallel
corpora.
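A minimal sketch of the two steps, assuming hypothetical lookup tables `concept_index` (stemmed term to Wikipedia concept) and `cross_links` (concept to target-language title) standing in for the indexed titles and the article link structure:

```python
def preprocess(query, stopwords, stem):
    """Pre-translation step: tokenization, stop-word removal, stemming.
    Term expansion, the fourth activity, is only applied later when no
    exact Wikipedia match is found."""
    tokens = query.lower().split()                      # tokenization
    tokens = [t for t in tokens if t not in stopwords]  # stop-word removal
    return [stem(t) for t in tokens]                    # stemming

def translate(query, concept_index, cross_links, stopwords, stem):
    """Translation step: map the processed query to Wikipedia concepts,
    then follow each concept's cross-lingual link to the target title."""
    terms = preprocess(query, stopwords, stem)
    concepts = [concept_index.get(t) for t in terms]
    return [cross_links[c] for c in concepts if c in cross_links]
```

For example, with `concept_index = {"dna": "DNA"}` and `cross_links = {"DNA": "Ácido desoxirribonucleico"}`, the query "the DNA" yields the Portuguese title directly.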
Figure 1. Wiki-Transpose Workflow
Queries first go through preprocessing steps to remove unneeded words and characters. To improve
coverage and query effectiveness, the processed query is stemmed, as stemming tends to produce more
potentially relevant documents [22]. In our model we developed a Portuguese stemmer based on [34] to stem
Portuguese terms, and we use the Porter algorithm to stem English terms [35].
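For illustration, the fragment below shows a drastically simplified plural-reduction pass in the spirit of the Orengo-style Portuguese stemmer [34]; the real algorithm applies several ordered rule steps (plural, feminine, augmentative, noun, verb, vowel) with exception lists, all omitted here:

```python
# A toy plural-reduction pass only -- NOT the full stemmer of [34].
# Rules are (suffix, replacement), tried in order.
PLURAL_RULES = [
    ("ões", "ão"),   # corações -> coração
    ("ais", "al"),   # hospitais -> hospital
    ("res", "r"),    # doutores -> doutor
    ("s",   ""),     # sintomas -> sintoma
]

def reduce_plural(word):
    """Apply the first matching rule; the length guard avoids mangling
    very short words (the real algorithm uses per-rule minimum sizes)."""
    for suffix, replacement in PLURAL_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word
```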
The query is searched over the indexed English and Portuguese Wikipedia titles. A matching algorithm
was developed to return either an exact match or possible matches for the query term. If an exact match is
found, it is added. Otherwise, the stemmed query is used to search Wikipedia for at most 20 Wikipedia titles in
which the stemmed query occurs; the words accompanying the query in these titles are later used to expand the
original query.
The resulting Wikipedia titles are ranked according to their similarity to the original query, calculated
as the Levenshtein edit distance [36, 37] between the original query and the titles returned from Wikipedia.
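The ranking step can be sketched with the classic dynamic-programming edit distance [36, 37]; `rank_titles` below is a hypothetical helper that caps the candidate list at 20, matching the limit described above:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def rank_titles(query, titles, limit=20):
    """Order candidate Wikipedia titles by edit distance to the query,
    keeping at most `limit` of them."""
    return sorted(titles, key=lambda t: levenshtein(query.lower(), t.lower()))[:limit]
```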
The Wikipedia articles behind the titles produced in the previous step are then parsed to extract the
equivalent translation for the query.
As an example, the following line is the fragment that provides the Portuguese translation for the
English term DNA:
<a href="//pt.wikipedia.org/wiki/Ácido_desoxirribonucleico" title="Ácido desoxirribonucleico –
Portuguese" lang="pt" hreflang="pt">Português</a></li>
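A sketch of how such a fragment can be parsed with Python's standard `html.parser`, keyed on the `hreflang` attribute shown above; the target title is recovered from the last path segment of the link:

```python
from html.parser import HTMLParser

class CrossLingualLinkParser(HTMLParser):
    """Collect interlanguage links: anchors carrying a `hreflang`
    attribute, as found in a Wikipedia article's language sidebar."""
    def __init__(self):
        super().__init__()
        self.translations = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and "hreflang" in a and "href" in a:
            # The article title is the last path segment of the link.
            title = a["href"].rsplit("/", 1)[-1].replace("_", " ")
            self.translations[a["hreflang"]] = title

html_snippet = ('<li><a href="//pt.wikipedia.org/wiki/Ácido_desoxirribonucleico" '
                'title="Ácido desoxirribonucleico – Portuguese" lang="pt" '
                'hreflang="pt">Português</a></li>')
parser = CrossLingualLinkParser()
parser.feed(html_snippet)
# parser.translations now maps "pt" to "Ácido desoxirribonucleico"
```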
IV. Results And Discussion
Experiments were run over two medical datasets, one in Portuguese and the other in English.
Two questions are addressed:
- Can Wikipedia be a good information source covering a specific domain such as medical terms?
- Can Wikipedia be used as an effective tool for query translation?
The first experiment used the English Open Access, Collaborative Consumer Health Vocabulary
Initiative dataset1. The second experiment used a collection of Portuguese terms assessed by medical experts as
medical terms.
To evaluate the technique, queries were divided by their number of terms (grams) into 1-gram, 2-gram,
3-gram, 4-gram, and 5-gram queries, plus queries longer than five grams.
Tables 1 and 2 show the results of applying Wiki-Transpose to English and Portuguese queries of
different lengths, where n ranges from one to more than five terms per query.
The results show that when a query has few terms, the probability of finding an exactly matching
Wikipedia article is high; as the number of terms increases, that probability decreases. For example, the
experimental results in Table 1 show that for 1-word queries the exact-match coverage ratio is about 81%, but
as the number of terms in the query increases, the coverage ratio drops dramatically.
Table 1. English Wikipedia coverage for different query sizes

Query Terms   Found   Exact Match   Coverage Ratio
(1-word)      36394   29549         0.81
(2-words)     76930   31063         0.40
(3-words)     26082   5754          0.22

[Figure 2: English Wikipedia coverage for different query sizes (chart omitted)]

Results in Table 2 and Figure 3 show the Portuguese translation ratio for different English queries:
about 60% of the matched English Wikipedia articles contain an equivalent Portuguese translation. Although the
ratio decreases as the number of terms in the query grows, the decrease is not sharp.

1 http://consumerhealthvocab.org/
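As a sanity check, the coverage ratio in Table 1 is simply the number of exact matches divided by the number of terms found; recomputing it from the reported counts reproduces the table:

```python
# (found, exact match) counts as reported in Table 1.
table1 = {"1-word": (36394, 29549),
          "2-words": (76930, 31063),
          "3-words": (26082, 5754)}

# Coverage ratio = exact matches / terms found, rounded as in the paper.
ratios = {k: round(exact / found, 2) for k, (found, exact) in table1.items()}
# ratios reproduces the table: 0.81, 0.40, 0.22
```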
Table 2. Translation ratio of Wiki-Transpose for English-Portuguese medical terms

Query Terms   Found   English Wikipedia Exact Match   Translated (Portuguese)   Translation Ratio
(1-word)      36394   29549                           17542                     0.59
(2-words)     76930   31063                           16052                     0.52
(3-words)     26082   5754                            2771                      0.48

[Figure 3: Comparison of Wiki-Transpose for English-Portuguese medical terms (chart omitted)]

In comparison with the English Wikipedia coverage, the results in Table 3 and Figure 3 show that
Portuguese Wikipedia coverage for medical-term queries is slightly better than the English one. The same issue
occurs, however: as the number of terms in the query increases, the coverage probability decreases.

Table 3. Portuguese Wikipedia coverage for different query sizes

Query Terms   Found   Exact Match   Coverage Ratio
(1-word)      974     782           0.80
(2-words)     144     74            0.51
(3-words)     12      5             0.42

Translation into English of the Portuguese medical terms (Table 4) shows a remarkably better
translation ratio. This ratio was expected, as the number of active users on English Wikipedia is far greater than
on Portuguese Wikipedia (Table 5); given the large number of English Wikipedia users, an English translation
can be expected in almost every Portuguese article.

Table 4. Translation ratio of Wiki-Transpose for Portuguese-English medical terms

Query Terms   Found   Portuguese Wikipedia Exact Match   Translated (English)   Translation Ratio
(1-word)      974     782                                692                    0.88
(2-words)     144     74                                 74                     1.00
(3-words)     12      5                                  1                      0.20

Table 5. Comparison between English and Portuguese Wikipedia2

Language     # Total Articles   # Users      # Active Users
English      34,297,033         23,193,755   135,933
Portuguese   3,656,766          1,405,746    5,924

When there is no exact match, terms are searched for partial matches; in this type of search, every
Wikipedia article that partially contains the query is retrieved.
Table 6 lists the number of English query terms without an exact Wikipedia match and the number of
terms collected after expansion. For example, for 1-word queries, 4138 query terms have no exact Wikipedia
article, and expanding each of them yields 21236 articles in total. The same applies to the Portuguese terms
(Table 7).
The articles collected for each term will be used to expand the query term by adding related words that
are highly relevant to the original query. Query expansion will be applied in a future version of this work to test
its impact on translation.
Table 6. Expansion ratio of Wiki-Transpose for English medical terms

Query Terms   English Wikipedia   Query Terms to Expand   Expanded Terms   Expansion Rate
(1-word)      36394               4138                    21236            513%
(2-words)     76930               45841                   359540           748%
(3-words)     26082               20328                   194347           956%
(4-words)     4439                3772                    35508            941%
(5-words)     814                 760                     6632             872%
(>5-words)    396                 390                     3430             879%

Table 7. Expansion ratio of Wiki-Transpose for Portuguese medical terms

Query Terms   Portuguese Wikipedia   Terms to Expand   Expanded Terms   Expansion Rate
(1-word)      974                    97                927               955%
(2-words)     144                    70                919               1313%
(3-words)     12                     7                 122               1743%
English Wikipedia can thus serve as a reliable information source on which users can depend to find
information about a topic of interest.
The more expansion results there are, the more opportunity there is to search inside those articles, and
the higher the probability of finding new items related to the original search query. A critical question, however,
is how to form a concept cloud: a set of terms ranked by their relevancy to the original term with respect to the
term's context.
The concept cloud can be used to translate a query semantically, based on an understanding of the
terms related to it [38]. A weighted concept cloud would avoid the problem of term ambiguity, since the ranks
of the relations between concepts in a term's cloud differ depending on the term's context.
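One possible (hypothetical) reading of the weighted concept cloud is to score each related concept by its overlap with the query's context words, so that the same term yields a different cloud in a different context:

```python
def concept_cloud(term, related, context, k=5):
    """Illustrative sketch only: rank the concepts related to `term` by
    how many of their words co-occur with the query's context words,
    keeping the top k. A real implementation would use corpus-derived
    co-occurrence weights rather than raw word overlap."""
    def weight(concept):
        words = set(concept.lower().split())
        return len(words & set(context))
    return sorted(related, key=weight, reverse=True)[:k]
```

For example, given the ambiguous term "cold", a context containing "virus" ranks the illness-related concept above the weather-related one, which is the disambiguation behaviour described above.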
V. Conclusion And Future Work
In this paper, Wiki-Transpose is introduced as a query translation system that relies only on Wikipedia
as an information and translation source. Wiki-Transpose maps queries to Wikipedia concepts and through
following the cross-lingual links, the final translation is obtained.
It is possible to achieve good results relying on Wikipedia alone, making it a valuable alternative
translation source. Due to its scale, the structure of Wikipedia makes it very useful for CLIR; its concepts are
widely known and understandable, which makes it a natural first choice for users seeking information about a
subject. Experimental results showed a high coverage ratio for specialized scientific terms in Wikipedia: about
81% for single English terms and about 80% for single Portuguese terms.
Incorporating term-weighting methods could further refine the translation. It would also be interesting
to customize query expansion: since expansion can cause query drift, it may be better to weight the added
concepts and expand only those that are highly relevant and related. Incorporating a concept cloud should
enhance the performance of Wiki-Transpose; with added natural language processing techniques, especially
key-phrase extraction combined with intelligent machine-learning clustering mechanisms, the concept cloud
can be realized.
References
[1] D. Nguyen, A. Overwijk, C. Hau , R.B. Trieschnigg, D. Hiemstra, and F.M.G. Jong de. WikiTranslate: Query Translation for Cross-
lingual Information Retrieval using only Wikipedia. In Proceedings of CLEF , pages 58-65, 2009.
[2] Monika Sharma, Sudha Morwal. "A Survey on Cross Language Information Retrieval", International Journal of Advanced
Research in Computer and Communication Engineering, Vol. 4, Issue 2, February 2015.
[3] Le, Quoc V., and Tomas Mikolov. "Distributed representations of sentences and documents." Proceedings of the 31st International
Conference on Machine Learning , Beijing, China, 2014. JMLR: W&CP volume 32
[4] Benoît Gaillard, Malek Boualem, Olivier Collin. "Query Translation using Wikipedia-based resources for analysis and
disambiguation", Proc. EAMT 2010, 14th Annual Conference of the European Association for Machine Translation, 2010.
[5] Zesch, T., Gurevych, I., Mühlhäuser, M.: Analyzing and Accessing Wikipedia as a Lexical Semantic Resource. In: Data Structures
for Linguistic Resources and Applications, pp. 197–205 (2007).
[6] Sanderson, M.: Word sense disambiguation and information retrieval. In: Proceedings of the 17th annual international ACM
SIGIR conference on Research and development in information retrieval, pp. 142–151. Springer-Verlag New York, Inc., Dublin
(1994).
[7] JONES, G. J. F., FANTINO, F., NEWMAN, E., AND ZHANG, Y. 2008. Domain-Specific query translation for multilingual
information access using machine translation augmented with dictionaries mined from Wikipedia. In Proceedings of the 2nd
International Workshop on Cross Lingual Information Access - the Information Need of Multilingual Societies (CLIA 08). PP.34–
41.
[8] SU, C.-Y., LIN, T.-C., AND WU, S.-H. 2007. Using Wikipedia to translate OOV terms on MLIR. In Proceedings of the 6th NTCIR
Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering, and Cross-
Lingual Information Access. PP.109–115.
[9] M.-C., LI, M.-X., HSU, C.-C., AND WU, S.-H. 2010. Query expansion from Wikipedia and topic Web crawler on CLIR. In
Proceedings of the 8th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval,
Question Answering, and Cross-Lingual Information Access. PP.101–106.
[10] S. Fox and M. Duggan. Health online 2013. Technical report, The Pew Internet & American Life Project, January 2013.
[11] João R. M. Palotti, Allan Hanbury, Henning Müller. "Exploiting Health Related Features to Infer User Expertise in the Medical
Domain", 4th Web Search Click Data workshop, held in conjunction with WSDM 2014, New York, New York, USA.
[12] V. Lavrenko, M. Choquette, and W. B. Croft, "Cross-lingual relevance models," in Proceedings of the 25th annual international
ACM SIGIR conference on Research and development in information retrieval Tampere, Finland: ACM, 2002, pp. 175 - 182.
[13] P. Sheridan and J. P. Ballerini, "Experiments in multilingual information retrieval using the SPIDER system," in Proceedings of the
19th annual international ACM SIGIR conference on Research and development in information retrieval Zurich, Switzerland:
ACM, 1996, pp. 58 – 65
[14] D. ZHOU, M. TRURAN, T. BRAILSFORD, V. WADE, H. ASHMAN. “Translation Techniques in Cross-Language Information
Retrieval”. ACM Computing Surveys, Vol. 45, No. 1, Article 1, 2012.
[15] Kraaij, W., et al. "Embedding web-based statistical translation models in cross-language information retrieval". Journal of
Computational Linguistics, 29, 381-419 (2003).
[16] E. M. Voorhees, "Query expansion using lexical-semantic relations," in Proceedings of the 17th annual international ACM SIGIR
conference on Research and development in information retrieval, Dublin, Ireland: Springer-Verlag New York, Inc., 1994, pp. 61-69.
[17] P. McNamee and J. Mayfield, "Comparing cross-language query expansion techniques by degrading translation resources," in
Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Tampere, Finland: ACM, 2002.
[18] L. Ballesteros and W. B. Croft, "Resolving ambiguity for cross-language retrieval," in Proceedings of the 21st annual international
ACM SIGIR conference on Research and development in information retrieval Melbourne, Australia: ACM, 1998, pp. 64 – 71.
[19] V. Lavrenko, M. Choquette, and W. B. Croft, "Cross-lingual relevance models," in Proceedings of the 25th annual international
ACM SIGIR conference on Research and development in information retrieval Tampere, Finland: ACM, 2002, pp. 175 - 182.
[20] P. Sheridan and J. P. Ballerini, "Experiments in multilingual information retrieval using the SPIDER system," in Proceedings of the
19th annual international ACM SIGIR conference on Research and development in information retrieval Zurich, Switzerland:
ACM, 1996, pp. 58 – 65.
[21] S. F. Adafre and M. de Rijke, "Finding Similar Sentences across Multiple Languages in Wikipedia" in Proceedings of the 11th
Conference of the European Chapter of the Association for Computational Linguistics, 2006, pp. 62--69.
[22] FAUTSCH, C. AND SAVOY. “Algorithmic stemmers or morphological analysis? An evaluation”. J. Amer. Soc. Inf. Sci. Technol.
60, 1616–1624., 2009.
[23] CAO, G., GAO, J., AND NIE, J.-Y. 2007a. “A system to mine large-scale bilingual dictionaries from monolingual Web. In
Proceedings of the 11th Machine Translation Summit (MT Summit XI). 57–64. 2007.
[24] SHI, L. "Mining OOV translations from mixed-language Web pages for cross language information retrieval". In
Proceedings of the 32nd European Conference on Information Retrieval (ECIR 10), pp. 471-482, 2010.
[25] J. Voss, "Measuring Wikipedia," in the 10th International Conference of the International Society for Scientometrics and
Informatics, 2005, pp. 221-231.
[26] A. Halavais and D. Lackaff, "An Analysis of Topical Coverage of Wikipedia," Journal of Computer-Mediated Communication, pp.
429-440, 2008.
[27] M. Potthast, B. Stein, and M. Anderka, "A Wikipedia based Multilingual Retrieval Model," in 30th European Conference on
Information Retrieval, Glasgow, Scotland, 2008.
[28] T. Zesch, I. Gurevych, and M. Mühlhäuser, "Analyzing and Accessing Wikipedia as a Lexical Semantic Resource," Data Structures
for Linguistic Resources and Applications, pp. 197-205, 2007.
[29] R. Mihalcea, "Using Wikipedia for Automatic Word Sense Disambiguation," in the North American Chapter of the Association for
Computational Linguistics (NAACL 2007), Rochester, 2007.
[30] C.-Y. Su, T.-C. Lin, and W. Shih-Hung, "Using Wikipedia to Translate OOV Term on MLIR," in The 6th NTCIR Workshop
Tokyo, 2007.
[31] P. Schönhofen, A. Benczúr, I. Bíró, and K. Csalogány, "Performing Cross-Language Retrieval with Wikipedia," in CLEF 2007
Budapest, 2007.
[32] Nguyen, Dong, et al. "WikiTranslate: query translation for cross-lingual information retrieval using only Wikipedia." Evaluating
Systems for Multilingual and Multimodal Information Access. Springer Berlin Heidelberg, 2009. 58-65.
[33] Jones, G., Fantino, F., Newman, E., and Zhang, Y. (2008). “Domain-specific query translation for multilingual information access
using machine translation augmented with dictionaries mined from Wikipedia,” in Proceedings of Workshop on Cross Lingual
Information Access, pp. 34–41. 2008.
[34] V. M. Orengo, et al. "A study on the use of stemming for monolingual ad-hoc Portuguese information retrieval", in Evaluation of
Multilingual and Multi-modal Information Retrieval, 7th Workshop of the Cross-Language Evaluation Forum, pp. 91-98, 2007.
[35] Porter, Martin F. (1980); An Algorithm for Suffix Stripping, Program, 14(3): 130–137, 1980.
[36] Andrew T. Freeman, et.al. “Cross Linguistic Name Matching in English and Arabic: A “One to Many Mapping” Extension of the
Levenshtein Edit Distance Algorithm. Proceedings of the Human Language Technology Conference of the North American Chapter
of the ACL, pp. 471–478, New York, June 2006.
[37] Thierry Lavoie, Ettore Merlo. “An Accurate Estimation of the Levenshtein Distance Using Metric Trees and Manhattan Distance”,
Proceeding of IWSC pp.1-7, 2012
[38] Aliaa A.A. Youssif, Atef Z. Ghalwash, and Eslam Amer.” HSWS: Enhancing efficiency of web search engine via semantic web”.
ACM Proceeding of the 3rd International Conference on (MEDES’11), pp. 212-219, 2011.