Content-based retrieval offers higher prediction accuracy than tagging-based approaches. However, the complexity of its representation and classification stages results in higher computational overhead and slower processing. Conventional models leave the correlative nature of the feature data unexplored, treating all data features as a flat set of feature values when making a decision. Here, recurrent feature class attributes are observed and used to regroup features for action model prediction. This paper proposes a co-relative, bounded grouping approach for action model prediction in content-based multimedia retrieval (CBMR) applications. The co-relative recurrent feature mapping yields a faster retrieval process than conventional retrieval systems.
Personalized Multimedia Web Services in Peer to Peer Networks Using MPEG-7 an... (University of Piraeus)
The volume of multimedia information has increased in recent years, while new content delivery services enhanced with personalization functionalities are being provided to users. Several standards have been proposed for the representation and retrieval of multimedia content. This paper gives an overview of the available standards and technologies. Furthermore, a prototype semantic P2P architecture is presented that delivers personalized audio information. The metadata that support personalization are separated into two categories: metadata describing user preferences, stored at each user, and resource adaptation metadata, stored at the P2P network’s web services. The multimedia models MPEG-21 and MPEG-7 are used to describe metadata information, and the Web Ontology Language (OWL) is used to produce and manipulate ontological descriptions. SPARQL is used for querying the OWL ontologies. The MPEG Query Format (MPQF) is also used, providing a well-known framework for applying queries to the metadata and to the ontologies.
The huge volume of text documents available on the internet has made it difficult for specific users to find valuable information, so efficient applications that extract knowledge of interest from textual documents are vitally important. This paper addresses the problem of responding to user queries by fetching the most relevant documents from a clustered set of documents. For this purpose, a cluster-based information retrieval framework is proposed to design and develop a system for analysing and extracting useful patterns from text documents. In this approach, a pre-processing step is first performed to find frequent and high-utility patterns in the data set, and a Vector Space Model (VSM) is then used to represent the dataset. The system was implemented in two main phases. In phase 1, a clustering analysis process groups the documents into several clusters; in phase 2, an information retrieval process ranks the clusters against the user query and retrieves the relevant documents from the clusters deemed relevant to the query. The results are evaluated using Recall and Precision (P@5, P@10) of the retrieved results: P@5 was 0.660 and P@10 was 0.655.
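The ranking-and-evaluation idea described above can be sketched in a few lines. The toy corpus, the TF-IDF weighting, and all names below are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build simple TF-IDF vectors (term -> weight) for each document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    return [{t: c * math.log(1 + n / df[t]) for t, c in Counter(d.split()).items()}
            for d in docs]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

docs = ["data mining patterns", "text clustering model",
        "vector space retrieval", "cooking recipe ideas"]
vecs = tf_idf_vectors(docs)
query = {t: 1.0 for t in "vector space model".split()}   # toy user query
ranked = sorted(range(len(docs)), key=lambda i: cosine(query, vecs[i]),
                reverse=True)
```

With relevance judgments for the query, `precision_at_k(ranked, relevant, 5)` and `precision_at_k(ranked, relevant, 10)` give the P@5 and P@10 figures the abstract reports.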
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M... (cscpconf)
Materials have become a very important aspect of our daily life, and the search for better and new kinds of engineered materials has created opportunities for the information science and technology community to investigate the world of materials. This combination of materials science and information science is nowadays known as Materials Informatics. An Object-Oriented Database Model has been proposed for organizing advanced engineering materials datasets.
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED... (ijcsit)
3D reconstruction is a computer vision technique with a wide range of applications in areas like object recognition, city modelling, virtual reality, physical simulations, video games and special effects. Previously, performing a 3D reconstruction required specialized hardware; such systems were often very expensive and available only for industrial or research purposes. With the rising availability of high-quality, low-cost 3D sensors, it is now possible to design inexpensive complete 3D scanning systems. The objective of this work was to design an acquisition and processing system that can perform 3D scanning and reconstruction of objects seamlessly. In addition, the work aimed to make the 3D scanning process fully automated by building a turntable and integrating it with the software, so that the user can perform a full 3D scan with only a few button presses in our dedicated graphical user interface. Three main steps take the system from point-cloud acquisition to the finished reconstructed 3D model. First, the system acquires point cloud data of a person or object using an inexpensive camera sensor. Second, it aligns and converts the acquired point cloud data into a watertight mesh of good quality. Third, it exports the reconstructed model to a 3D printer to obtain a proper 3D print of the model.
Automated hierarchical classification of scanned documents using convolutiona... (IJECEIAES)
This research proposed automated hierarchical classification of scanned documents whose content contains unstructured text and special patterns (specific, short strings), using a convolutional neural network (CNN) and a regular expression method (REM). The research data are digital correspondence documents in PDF-image format from Pusat Data Teknologi dan Informasi (Technology and Information Data Center). The document hierarchy covers type of letter, type of manuscript letter, origin of letter and subject of letter. The research method consists of preprocessing, classification, and storage to a database. Preprocessing covers text extraction with the Tesseract optical character recognition (OCR) engine and formation of word-document vectors with Word2Vec. Hierarchical classification uses the CNN to classify 5 types of letters, and regular expressions to classify 4 types of manuscript letter, 15 origins of letter and 25 subjects of letter. The classified documents are stored in a Hive database within a Hadoop big data architecture. In total, 5200 documents are used: 4000 for training, 1000 for testing and 200 for classification prediction. Of the 200 new documents, 188 were classified correctly and 12 incorrectly, giving an automated hierarchical classification accuracy of 94%. As future work, content-based search over the classified scanned documents can be developed.
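The regular-expression stage of such a pipeline can be illustrated with a minimal sketch. The pattern set and labels below are hypothetical, since the paper's actual patterns are not given:

```python
import re

# Hypothetical manuscript-type patterns; the paper's real pattern set
# (4 manuscript types, 15 origins, 25 subjects) is not reproduced here.
MANUSCRIPT_PATTERNS = {
    "memo":       re.compile(r"\bnota dinas\b", re.IGNORECASE),
    "invitation": re.compile(r"\bundangan\b", re.IGNORECASE),
    "circular":   re.compile(r"\bsurat edaran\b", re.IGNORECASE),
}

def classify_manuscript(ocr_text):
    """Return the first manuscript type whose pattern matches, else 'unknown'."""
    for label, pattern in MANUSCRIPT_PATTERNS.items():
        if pattern.search(ocr_text):
            return label
    return "unknown"
```

The same idea extends to origin and subject: each hierarchy level gets its own pattern table applied to the OCR output.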
Web Mining Research Issues and Future Directions – A Survey (IOSR Journals)
This document summarizes research on web mining techniques. It begins with an abstract describing how web mining aims to extract useful information from vast amounts of unstructured web data. It then reviews various web mining techniques including web content mining, web structure mining, and web usage mining. The document surveys literature on pattern extraction techniques such as association rule mining, clustering, classification, and sequential pattern mining. It also discusses challenges in pre-processing web data and issues related to scaling up data mining algorithms for large web datasets. In closing, the document outlines future research directions in web mining including dealing with unstructured data and multimedia content.
A Review on Resource Discovery Strategies in Grid Computing (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind, peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
The Survey of Data Mining Applications And Feature Scope (IJCSEIT Journal)
In this paper we survey a variety of techniques, approaches and research areas that are helpful in, and mark out, the important field of data mining technologies. Many multinational companies and large organizations operate in different places across different countries, and each place of operation may generate large volumes of data. Corporate decision makers need access to all such sources to take strategic decisions. The data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of such strategic information systems is easily recognized; however, in today's business environment, efficiency or speed is not the only key to competitiveness. Huge volumes of data, on the order of terabytes to petabytes, have drastically changed the areas of science and engineering. To analyze and manage such data, and to make decisions from it, we need the techniques called data mining, which are transforming many fields. This paper presents a number of applications of data mining and also discusses its scope for further research.
The document describes the architecture of PIER, an Internet-scale query engine that is designed to operate at a massive scale across thousands or millions of nodes. PIER aims to strike a balance between traditional database approaches that emphasize strong consistency and the relaxed consistency models of Internet systems. It provides a full relational data model and query operators while distributing data and queries across nodes without regard for physical location. The architecture reflects the challenges of building a general-purpose query engine for this environment.
Producer Mobility Support Schemes for Named Data Networking: A Survey (IJECEIAES)
This document summarizes and analyzes different schemes for supporting producer mobility in Named Data Networking (NDN). It reviews four main approaches:
1. Mapping-Based Mobility Approach: these schemes depend on mapping servers, such as DNS, to map content identifiers to locations. They provide optimal routing but incur high signaling overhead and handoff latency.
2. Indirection-Based Mobility Approach: these schemes use home agents to tunnel interest packets to producers, introducing triangular routing but normal handoff delays.
3. Locator-Based Mobility Approach: these schemes separate identifiers from locators but can also introduce triangular routing.
4. Control/Data Plane-Based Approach: these schemes suffer from sub-optimal routing and high handoff delays.
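As a rough illustration of the mapping-based approach (scheme 1), the resolution service can be modelled as a table from content-name prefixes to the producer's current locator; the class and method names here are invented for the sketch:

```python
# Toy mapping server: after a handoff, the producer re-registers its
# prefix with a new locator; consumers resolve names before fetching.
# Each re-registration is the signaling overhead the survey mentions.
class MappingServer:
    def __init__(self):
        self.table = {}  # name prefix -> producer locator

    def register(self, prefix, locator):
        """Producer updates its locator (called on every handoff)."""
        self.table[prefix] = locator

    def resolve(self, name):
        """Longest-prefix match over registered prefixes."""
        best = ""
        for prefix in self.table:
            if name.startswith(prefix) and len(prefix) > len(best):
                best = prefix
        return self.table.get(best)

ms = MappingServer()
ms.register("/video/", "routerA")
ms.register("/video/", "routerB")  # producer moved: mapping updated
```

Routing after resolution is optimal (the consumer goes straight to `routerB`), but every producer movement costs a round of mapping updates, which is the overhead/latency trade-off noted above.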
Big Data in Bioinformatics & the Era of Cloud Computing (IOSR Journals)
This document discusses the challenges of big data in bioinformatics and how cloud computing can address them. It notes that high-throughput experiments are generating huge amounts of biological data from fields like genomics and proteomics. Storing and analyzing this "big data" requires massive computational resources that are costly for individual organizations. However, cloud computing provides elastic, on-demand access to storage and processing power at an affordable cost. This allows bioinformatics data to be securely stored and shared on the cloud to enable collaborative analysis and overcome issues of data transfer, storage limitations, and infrastructure maintenance.
This document discusses using Bayesian networks for predictive analysis and machine learning perspectives on data utilization. It provides an example of using Bayesian networks to accurately predict incident clearance time based on variables like type of incident, number of police/ambulance vehicles, number of injuries, and number of vehicles involved. The document also discusses applying Bayesian networks by collecting current situation data as evidence to perform inference on a constructed inference model.
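A minimal enumeration-based sketch of this kind of inference, with an invented two-variable structure and made-up conditional probabilities (none of the numbers come from the document), might look like:

```python
# Toy Bayesian network: Severity -> Injuries, Severity -> ClearanceTime.
# Observing evidence (injuries reported) updates the predicted
# probability of a long clearance time. All CPT values are invented.
P_sev = {"high": 0.2, "low": 0.8}               # prior P(severity)
P_inj_given_sev = {"high": 0.7, "low": 0.1}     # P(injuries | severity)
P_long_given_sev = {"high": 0.9, "low": 0.3}    # P(long clearance | severity)

def p_long_given_injuries():
    """P(long clearance | injuries) by enumerating over Severity."""
    num = sum(P_sev[s] * P_inj_given_sev[s] * P_long_given_sev[s]
              for s in P_sev)
    den = sum(P_sev[s] * P_inj_given_sev[s] for s in P_sev)
    return num / den
```

Here the prior P(long clearance) is 0.42, and conditioning on the injuries evidence raises it to about 0.68, which is exactly the "collect evidence, then infer" pattern the document describes.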
Comparative Study on Graph-based Information Retrieval: the Case of XML Document (IJAEMSJORNAL)
The processing of massive amounts of data has become indispensable, especially with the proliferation of big data. The volume of information available nowadays makes it difficult for the user to find relevant information in a vast collection of documents. As a result, exploiting vast document collections necessitates automated technologies that enable appropriate and effective retrieval. In this paper, we examine the state of the art of information retrieval (IR) in XML documents, and discuss works that have used graphs to represent documents in the context of IR, with particular attention to the relationships between the components of a graph.
This document discusses a system for integrating structured and unstructured data from heterogeneous sources and allowing querying of both data types. The system uses Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services supported by the Globus Toolkit to provide a data abstraction layer. This layer generates metadata from unstructured files to allow database operations on both structured and unstructured data. The system provides a unified interface for users to search and retrieve data from different sources in various formats.
DEVELOPING A KNOWLEDGE MANAGEMENT SPIRAL FOR THE LONG-TERM PRESERVATION SYSTE... (cscpconf)
This document discusses the development of a knowledge management system for long-term digital preservation on a semantic grid. It presents a conceptual model that integrates knowledge management principles with archival workflows. A logical architecture is described that realizes this model using a semantic datagrid infrastructure based on the OAIS reference model. The goal is to support flexible management of distributed digital archives while enabling new knowledge discovery and value creation through dynamic semantic annotations over time.
This document discusses multimedia data mining. It describes how multimedia data mining focuses on mining image, audio, and video data. Some key techniques discussed include similarity search to find similar multimedia objects, multidimensional analysis of multimedia data cubes, classification and prediction of multimedia data, and mining associations within and between multimedia objects.
An Approach for Managing Knowledge in Digital Forensics Examinations (CSCJournals)
Computers and digital devices are continuing to evolve in the areas of storage, processing power, memory, and features. Resultantly, digital forensic investigations are becoming more complex due to the increasing size of digital storage reaching gigabytes and terabytes. Due to this growth in disk storage, new approaches for managing the case details of a digital forensics investigation must be developed. In this paper, the importance of managing and reusing knowledge in digital forensic examinations is discussed, a modeling approach for managing knowledge is presented, and experimental results are presented that show how this modeling approach was used by law enforcement to manage the case details of a digital forensic examination.
Study and analysis of mobility, security, and caching issues in CCN (IJECEIAES)
This document summarizes a study that analyzed mobility, security, and caching issues in Content-Centric Networking (CCN). It discussed the basics of CCN including its key components like the Forwarding Information Base, Pending Interest Table, and Content Store. It then analyzed mobility management in CCN, comparing different schemes for handling consumer and producer mobility. It also discussed security schemes in CCN and reviewed different caching studies. The document aimed to identify open research challenges in these three important areas of CCN and discuss future trends for its large-scale deployment.
A Survey of Agent Based Pre-Processing and Knowledge Retrieval (IOSR Journals)
Abstract: Information retrieval is a major task in the present scenario, as the quantum of data is increasing at tremendous speed. Managing and mining knowledge for different users according to their interests is the goal of every organization, whether it deals with grid computing, business intelligence, distributed databases or any other field. To achieve this goal of extracting quality information from large databases, software agents have proved to be a strong pillar. Over the decades, researchers have implemented the concept of multi-agents to carry out the data mining process, focusing on its various steps. Among these, data pre-processing is the most sensitive and crucial step, as the quality of the knowledge to be retrieved depends entirely on the quality of the raw data. Many methods and tools are available to pre-process data in an automated fashion, using intelligent (self-learning) mobile agents effectively in distributed as well as centralized databases, but various quality factors still need attention to improve the quality of the retrieved knowledge. This article provides a review of the integration of these two emerging fields, software agents and the knowledge retrieval process, with a focus on the data pre-processing step.
Keywords: Data Mining, Multi Agents, Mobile Agents, Preprocessing, Software Agents
IRJET - Swift Retrieval of DNA Databases by Aggregating Queries (IRJET Journal)
This document summarizes a research paper that proposes a new method for securely sharing and querying genomic DNA sequences stored in the cloud without violating privacy. The method builds on existing frameworks by offering deterministic results with zero error probability, and a scheme that is twice as fast but uses twice the storage space, which is preferable given cloud storage pricing. The encoding of the data supports a richer set of query types beyond exact matching, including counting matches, logical OR matches, handling ambiguities, threshold queries, and concealing results from the decrypting server. Linear and logistic regression algorithms are used to analyze the data. The literature review discusses previous work on securely sharing genomic data and transforming protocols to ensure accountability without compromising privacy.
Security Aspects of the Information Centric Networks Model (CSCJournals)
With the development of the internet and the enormous growth of content over networks, researchers have been motivated to propose a new paradigm called Information Centric Networks (ICN). The key feature of the ICN model is that it is based on the content itself instead of the server hosting the content on the internet. This new model faces many challenges, such as content mobility, naming, replication, caching, communications, and security for protecting contents, customers, and providers. In this paper we focus on the ICN model and propose security solutions to protect the network elements, since security is based on the packet itself rather than being host-centric.
Encryption Based LSB Steganography Technique for Digital Images and Text Data (INFOGAIN PUBLICATION)
Digital steganography is the art and science of hiding communications; a steganographic system thus embeds secret data in public cover media so as not to arouse an eavesdropper’s suspicion. A steganographic system has two main aspects: steganographic capacity and imperceptibility. However, these two characteristics are at odds with each other. Furthermore, it is quite difficult to increase the steganographic capacity and simultaneously maintain the imperceptibility of a steganographic system. Additionally, there are still very limited methods of Steganography to be used with communication protocols, which represent unconventional but promising Steganography mediums. Digital image Steganography, as a method of secret communication, aims to convey a large amount of secret data, relatively to the size of cover image, between communicating parties. Additionally, it aims to avoid the suspicion of non-communicating parties to this kind of communication. Thus, this research addresses and proposes some methods to improve these fundamental aspects of digital image Steganography. Hence, some characteristics and properties of digital images have been employed to increase the steganographic capacity and enhance the stego image quality (imperceptibility). Here, the research aim is identified based on the established definition of the research problem and motivations. Unlike encryption, Steganography hides the very existence of secret information rather than hiding its meaning only. Image based Steganography is the most common system used since digital images are widely used over the Internet and Web. However, the capacity is mostly limited and restricted by the size of cover images. In addition, there is a tradeoff between both steganographic capacity and stego image quality. Therefore, increasing steganographic capacity and enhancing stego image quality are still challenges, and this is exactly our research main aim. 
To attain high steganographic capacity, novel steganography methods are proposed. The first method uses 8x8 non-overlapping blocks and a quantization table for DCT with compression. The second method incorporates the DWT technique, enhancing stego-image quality so that the hidden image can be recovered correctly. The last method uses LSB substitution to store images, with key-based security built in.
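A minimal sketch of plain LSB embedding, the baseline that the DCT/DWT and key-based methods above improve on, could look like this (pixel values and message are toy data; the paper's actual schemes are not reproduced):

```python
# Hide each bit of a secret message in the least significant bit of
# successive cover-image pixel bytes; the per-pixel change is at most 1,
# which is what keeps the stego image imperceptibly close to the cover.
def embed(pixels, message):
    bits = [(b >> i) & 1 for b in message for i in range(7, -1, -1)]
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # overwrite only the LSB
    return out

def extract(pixels, n_bytes):
    bits = [p & 1 for p in pixels[:n_bytes * 8]]
    return bytes(sum(bit << (7 - i) for i, bit in enumerate(bits[k*8:k*8+8]))
                 for k in range(n_bytes))

cover = [120] * 64            # 64 grayscale pixel bytes as a toy cover
stego = embed(cover, b"hi")   # 2 secret bytes -> 16 LSBs modified
```

This makes the capacity/imperceptibility trade-off concrete: one payload bit per cover byte, with a distortion of at most one intensity level per pixel.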
Analysis Model in the Cloud Optimization Consumption in Pricing the Internet ... (IJECEIAES)
Internet pricing is often a major problem in optimization. In this study, the internet pricing scheme focuses on optimizing bandwidth consumption, using a modification of the cloud model to find the optimal solution in the network. Cloud computing is a computational model in which resources such as networks, servers, storage and services are utilized over an internet connection. An internet service provider (ISP) requires appropriate pricing schemes in order to maximize revenue and provide quality of service (QoS) that satisfies internet users. The model is solved with the help of the LINGO software package to obtain an optimal solution and accurate results. The optimal solution obtained from the modified cloud model can be used by ISPs to maximize revenue and provide services in accordance with needs and requests.
Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science and statistics, with the overall goal of extracting information from a data set and transforming it into a comprehensible structure for further use. The process of digging through data to discover hidden connections and predict future trends has a long history; sometimes referred to as 'knowledge discovery in databases', the term data mining wasn't coined until the 1990s. What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power. Over the last decade, advances in processing power and speed have enabled us to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights. Rupashi Koul, "Overview of Data Mining", International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 4, Issue 4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd31368.pdf Paper URL: https://www.ijtsrd.com/engineering/computer-engineering/31368/overview-of-data-mining/rupashi-koul
The document discusses mining frequent items and item sets from data streams using fuzzy approaches. It describes objectives of mining frequent items from datasets in real-time using fuzzy sets and slices. This involves fetching relevant records, analyzing the data, searching for liked items using fuzzy slices, identifying frequently viewed item lists, making recommendations, and evaluating the results. Algorithms used for mining frequent items from data streams in a single or multiple pass are also reviewed.
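The fuzzy-support idea described above can be illustrated with a single-pass counter that accumulates fuzzy membership degrees instead of raw counts. This is a minimal sketch; the membership function (based on view duration) and the threshold are hypothetical, not taken from the document:

```python
from collections import defaultdict

def membership(duration_s):
    """Fuzzy membership of a view: 0 below 5 s, 1 above 60 s, linear between."""
    if duration_s <= 5:
        return 0.0
    if duration_s >= 60:
        return 1.0
    return (duration_s - 5) / 55.0

def fuzzy_frequent(stream, min_support):
    """Single-pass fuzzy support counting over (item, view_duration) events."""
    support = defaultdict(float)
    for item, duration in stream:
        support[item] += membership(duration)
    return {item for item, s in support.items() if s >= min_support}

# synthetic stream of (item, seconds viewed) events
stream = [("shoes", 70), ("shoes", 40), ("hat", 3), ("hat", 4), ("book", 60)]
frequent = fuzzy_frequent(stream, min_support=1.0)
```

Items viewed briefly contribute little support, so only genuinely "liked" items cross the threshold.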
Next generation big data analytics: state of the art - Nazrul Islam
This document reviews recent research on network big data, including different data types (online network data, mobile/IoT data, geography data, spatial temporal data, streaming/real-time data, and visual data), storage models, privacy and security issues, analysis methods, and applications. It discusses the trends in network big data moving from simple online social network data to include more data sources and data that has spatial, temporal, and real-time components. The challenges of analyzing large and diverse datasets from networks are also addressed.
This document summarizes a proposed method for text-based video retrieval. The method involves:
1) Extracting frames from videos and segmenting text regions within frames.
2) Recognizing characters using optical character recognition (OCR) and extracting color features.
3) Storing the text features and color features in a database.
4) Matching user-inputted text queries to the stored text features to retrieve matching videos. The proposed method aims to improve video indexing and retrieval accuracy compared to visual query methods.
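The matching step (4) can be sketched as a token-overlap ranking between the query and the stored OCR text. The index contents and scoring below are illustrative assumptions, not the paper's actual implementation:

```python
def tokenize(text):
    return set(text.lower().split())

# hypothetical index: video id -> OCR'd text extracted from its frames
ocr_index = {
    "vid1": "breaking news election results tonight",
    "vid2": "cooking pasta recipe tutorial",
    "vid3": "election debate highlights",
}

def retrieve(query, index):
    """Rank videos by overlap between query tokens and stored OCR tokens."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), vid) for vid, text in index.items()]
    return [vid for score, vid in sorted(scored, reverse=True) if score > 0]

results = retrieve("election results", ocr_index)
```

Videos sharing more recognized words with the query rank higher; videos with no overlap are dropped.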
Content based video retrieval using discrete cosine transform - nooriasukmaningtyas
A content-based video retrieval (CBVR) framework is built in this paper. One of the essential features of the video retrieval process and CBVR is colour value. The discrete cosine transform (DCT) is used to extract the query video's features, which are compared with the video features stored in our database. An average result of 0.6475 was obtained using the DCT after applying it to the database we created and collected, across all categories. The technique was applied to our video database of 100 videos, with 5 videos in each category.
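A minimal sketch of the DCT-based matching idea, assuming a 1-D DCT-II over a per-video colour signature and Euclidean distance for comparison; the signatures and database below are invented for illustration:

```python
import math

def dct_1d(x):
    # DCT-II: X[u] = sum_n x[n] * cos(pi/N * (n + 0.5) * u)
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * u) for n in range(N))
            for u in range(N)]

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# hypothetical per-video colour signatures
database = {
    "sports": [200, 180, 40, 30, 20, 10, 5, 2],
    "news":   [90, 95, 92, 91, 89, 90, 88, 92],
}
db_features = {vid: dct_1d(sig) for vid, sig in database.items()}

query = [198, 182, 41, 29, 21, 11, 4, 3]   # colour signature of the query video
q_feat = dct_1d(query)
best = min(db_features, key=lambda vid: distance(q_feat, db_features[vid]))
```

Because the DCT is linear, small differences in the colour signature stay small in the transform domain, so nearest-feature search returns the visually closest video.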
Precision face image retrieval by extracting the face features and comparing ... - prjpublications
This document describes a proposed method for improving content-based face image retrieval. The method uses two orthogonal techniques: attribute-enhanced sparse coding and attribute-embedded inverted indexing. Attribute-enhanced sparse coding exploits global features to construct semantic codewords offline. Attribute-embedded inverted indexing considers local query image features in a binary signature to efficiently retrieve images. By combining these techniques, the method reduces errors and achieves better face image extraction from databases compared to existing content-based retrieval systems. It works by extracting features from the query image, matching them to database images, and returning ranked results.
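The attribute-embedded indexing idea can be illustrated with a toy binary attribute signature used as an inverted-index key. The attribute set and gallery below are hypothetical, and this ignores the sparse-coding half of the method:

```python
from collections import defaultdict

ATTRS = ["male", "glasses", "smiling"]   # hypothetical binary facial attributes

def signature(attrs):
    """Binary attribute signature: one bit per attribute, in a fixed order."""
    return tuple(1 if a in attrs else 0 for a in ATTRS)

# offline: build an inverted index from signature to image ids
gallery = {
    "img1": {"male", "glasses"},
    "img2": {"smiling"},
    "img3": {"male", "glasses"},
}
index = defaultdict(list)
for img, attrs in gallery.items():
    index[signature(attrs)].append(img)

def query(attrs):
    """Online: only images sharing the query's attribute signature are fetched."""
    return index.get(signature(attrs), [])

hits = query({"male", "glasses"})
```

Filtering by the signature first means the expensive feature comparison only runs on a small candidate set.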
INTELLIGENT INFORMATION RETRIEVAL WITHIN DIGITAL LIBRARY USING DOMAIN ONTOLOGY - cscpconf
A digital library is a type of information retrieval (IR) system. Existing information retrieval methodologies generally have problems with keyword searching. We propose a model to solve this problem using a concept-based approach (ontology) and a metadata case base. The model consists of identifying domain concepts in the user's query and applying expansion to them. The system aims to improve the relevance of results retrieved from digital libraries by proposing conceptual query expansion for intelligent concept-based retrieval. We import the concept of ontology, exploiting its advantages of rich semantics and standardized concepts. A domain-specific ontology can raise information retrieval from the traditional keyword level to the knowledge (concept) level, changing the retrieval process from traditional keyword matching to semantic matching. One approach is query expansion using domain ontology; the other introduces a case-based similarity measure for metadata information retrieval using a Case-Based Reasoning (CBR) approach. Results show improvements over the classic method, over query expansion using a general-purpose ontology, and over a number of other approaches.
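A minimal sketch of conceptual query expansion over a toy domain ontology; the concepts and narrower terms below are invented for illustration, not the paper's ontology:

```python
# toy domain ontology: concept -> narrower terms / synonyms (hypothetical)
ontology = {
    "neural network": ["perceptron", "cnn", "rnn"],
    "database": ["relational database", "nosql"],
}

def expand_query(terms, onto):
    """Conceptual query expansion: add narrower/synonym terms for known concepts."""
    expanded = set(terms)
    for term in terms:
        expanded.update(onto.get(term, []))
    return expanded

q = expand_query(["neural network", "tuning"], ontology)
```

A document mentioning only "cnn" now matches a query for "neural network", which plain keyword matching would miss.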
The document describes the architecture of PIER, an Internet-scale query engine that is designed to operate at a massive scale across thousands or millions of nodes. PIER aims to strike a balance between traditional database approaches that emphasize strong consistency and the relaxed consistency models of Internet systems. It provides a full relational data model and query operators while distributing data and queries across nodes without regard for physical location. The architecture reflects the challenges of building a general-purpose query engine for this environment.
Producer Mobility Support Schemes for Named Data Networking: A Survey - IJECEIAES
This document summarizes and analyzes different schemes for supporting producer mobility in Named Data Networking (NDN). It reviews four main approaches:
1. Mapping-Based Mobility Approach schemes depend on mapping servers like DNS to map content identifiers to locations. They provide optimal routing but high signaling overhead and handoff latency.
2. Indirection-Based Mobility Approach schemes use home agents to tunnel interest packets to producers, introducing triangular routing but normal handoff delays.
3. Locator-Based Mobility Approach schemes separate identifiers from locators but can introduce triangular routing.
4. Control/Data Plane-Based Approach schemes have sub-optimal routing and high handoff delays.
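The mapping-based approach (1) can be sketched as a DNS-like resolution service that the producer updates after each handoff; the content names and locators below are hypothetical:

```python
class MappingServer:
    """DNS-like mapping server: content name prefix -> producer locator."""
    def __init__(self):
        self.table = {}

    def register(self, prefix, locator):
        """Producer signals its current locator (e.g. after a handoff)."""
        self.table[prefix] = locator

    def resolve(self, prefix):
        """Consumer resolves the locator before forwarding an Interest."""
        return self.table.get(prefix)

server = MappingServer()
server.register("/video/movie1", "AS-100")
before_handoff = server.resolve("/video/movie1")
server.register("/video/movie1", "AS-200")   # producer moved: mapping update
after_handoff = server.resolve("/video/movie1")
```

The extra resolve/register round trips are exactly the signaling overhead and handoff latency the survey attributes to this class of schemes.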
Big Data in Bioinformatics & the Era of Cloud Computing - IOSR Journals
This document discusses the challenges of big data in bioinformatics and how cloud computing can address them. It notes that high-throughput experiments are generating huge amounts of biological data from fields like genomics and proteomics. Storing and analyzing this "big data" requires massive computational resources that are costly for individual organizations. However, cloud computing provides elastic, on-demand access to storage and processing power at an affordable cost. This allows bioinformatics data to be securely stored and shared on the cloud to enable collaborative analysis and overcome issues of data transfer, storage limitations, and infrastructure maintenance.
This document discusses using Bayesian networks for predictive analysis and machine learning perspectives on data utilization. It provides an example of using Bayesian networks to accurately predict incident clearance time based on variables like type of incident, number of police/ambulance vehicles, number of injuries, and number of vehicles involved. The document also discusses applying Bayesian networks by collecting current situation data as evidence to perform inference on a constructed inference model.
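The incident-clearance example can be illustrated with a minimal two-node network and Bayes' rule; the probabilities below are invented for illustration, not the document's figures:

```python
# hypothetical CPTs for a two-node network: Injuries -> LongClearance
p_injuries = 0.3                 # prior P(injuries present)
p_long_given_injuries = 0.8      # P(long clearance | injuries)
p_long_given_no_injuries = 0.2   # P(long clearance | no injuries)

def posterior_injuries_given_long():
    """Bayes' rule: P(I | L) = P(L | I) P(I) / P(L), with evidence L observed."""
    joint_yes = p_long_given_injuries * p_injuries
    joint_no = p_long_given_no_injuries * (1 - p_injuries)
    return joint_yes / (joint_yes + joint_no)

post = posterior_injuries_given_long()
```

Observing a long clearance time raises the belief that injuries were involved from the 0.3 prior to about 0.63, which is the kind of evidence-driven inference the document describes.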
Comparative Study on Graph-based Information Retrieval: the Case of XML Document - IJAEMSJORNAL
The processing of massive amounts of data has become indispensable especially with the potential proliferation of big data. The volume of information available nowadays makes it difficult for the user to find relevant information in a vast collection of documents. As a result, the exploitation of vast document collections necessitates the implementation of automated technologies that enable appropriate and effective retrieval. In this paper, we will examine the state of the art of IR in XML documents. We will also discuss some works that have used graphs to represent documents in the context of IR. In the same vein, the relationships between the components of a graph are the center of our attention.
This document discusses a system for integrating structured and unstructured data from heterogeneous sources and allowing querying of both data types. The system uses Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services supported by the Globus Toolkit to provide a data abstraction layer. This layer generates metadata from unstructured files to allow database operations on both structured and unstructured data. The system provides a unified interface for users to search and retrieve data from different sources in various formats.
DEVELOPING A KNOWLEDGE MANAGEMENT SPIRAL FOR THE LONG-TERM PRESERVATION SYSTE... - cscpconf
This document discusses the development of a knowledge management system for long-term digital preservation on a semantic grid. It presents a conceptual model that integrates knowledge management principles with archival workflows. A logical architecture is described that realizes this model using a semantic datagrid infrastructure based on the OAIS reference model. The goal is to support flexible management of distributed digital archives while enabling new knowledge discovery and value creation through dynamic semantic annotations over time.
This document discusses multimedia data mining. It describes how multimedia data mining focuses on mining image, audio, and video data. Some key techniques discussed include similarity search to find similar multimedia objects, multidimensional analysis of multimedia data cubes, classification and prediction of multimedia data, and mining associations within and between multimedia objects.
An Approach for Managing Knowledge in Digital Forensics Examinations - CSCJournals
Computers and digital devices are continuing to evolve in the areas of storage, processing power, memory, and features. Resultantly, digital forensic investigations are becoming more complex due to the increasing size of digital storage reaching gigabytes and terabytes. Due to this growth in disk storage, new approaches for managing the case details of a digital forensics investigation must be developed. In this paper, the importance of managing and reusing knowledge in digital forensic examinations is discussed, a modeling approach for managing knowledge is presented, and experimental results are presented that show how this modeling approach was used by law enforcement to manage the case details of a digital forensic examination.
Study and analysis of mobility, security, and caching issues in CCN - IJECEIAES
This document summarizes a study that analyzed mobility, security, and caching issues in Content-Centric Networking (CCN). It discussed the basics of CCN including its key components like the Forwarding Information Base, Pending Interest Table, and Content Store. It then analyzed mobility management in CCN, comparing different schemes for handling consumer and producer mobility. It also discussed security schemes in CCN and reviewed different caching studies. The document aimed to identify open research challenges in these three important areas of CCN and discuss future trends for its large-scale deployment.
A Survey of Agent Based Pre-Processing and Knowledge Retrieval - IOSR Journals
Abstract: Information retrieval is a major task in the present scenario, as the quantum of data is increasing at tremendous speed. Managing and mining knowledge for different users according to their interests is the goal of every organization, whether it relates to grid computing, business intelligence, distributed databases, or any other field. To achieve this goal of extracting quality information from large databases, software agents have proved to be a strong pillar. Over the decades, researchers have implemented the concept of multi-agents to carry out the data mining process by focusing on its various steps, among which data pre-processing is found to be the most sensitive and crucial, as the quality of knowledge to be retrieved depends entirely on the quality of the raw data. Many methods and tools are available to pre-process data in an automated fashion using intelligent (self-learning) mobile agents in distributed as well as centralized databases, but various quality factors still require attention to improve the quality of the retrieved knowledge. This article provides a review of the integration of these two emerging fields, software agents and the knowledge retrieval process, with a focus on the data pre-processing step.
Keywords: Data Mining, Multi Agents, Mobile Agents, Preprocessing, Software Agents
IRJET - Swift Retrieval of DNA Databases by Aggregating Queries - IRJET Journal
This document summarizes a research paper that proposes a new method for securely sharing and querying genomic DNA sequences stored in the cloud without violating privacy. The method builds on existing frameworks by offering deterministic results with zero error probability, and a scheme that is twice as fast but uses twice the storage space, which is preferable given cloud storage pricing. The encoding of the data supports a richer set of query types beyond exact matching, including counting matches, logical OR matches, handling ambiguities, threshold queries, and concealing results from the decrypting server. Linear and logistic regression algorithms are used to analyze the data. The literature review discusses previous work on securely sharing genomic data and transforming protocols to ensure accountability without compromising privacy.
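The richer query types mentioned (counting matches, handling ambiguities) can be illustrated on plaintext sequences; this sketch ignores the encryption layer entirely and uses the IUPAC-style convention that 'N' in a pattern matches any base, with made-up data:

```python
def count_matches(seq, pattern):
    """Count occurrences of pattern in seq; 'N' in the pattern matches any base."""
    m = len(pattern)
    count = 0
    for i in range(len(seq) - m + 1):
        if all(p == "N" or p == s for p, s in zip(pattern, seq[i:i + m])):
            count += 1
    return count

hits = count_matches("ACGTACGT", "ACN")   # 'ACN' matches 'ACG' at two positions
```

In the paper's setting the same counting and wildcard semantics are evaluated over encrypted data; here the logic is shown in the clear.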
Security Aspects of the Information Centric Networks Model - CSCJournals
With the development of the internet and the enormous growth of content over networks, researchers have been motivated to propose a new paradigm called Information Centric Networks (ICN). The main feature of the ICN model is that it is based on the content itself, instead of the server hosting the content on the internet. This new model faces many challenges, such as content mobility, naming, replication, caching, communication, and the security issues of protecting content, customers, and providers. In this paper we focus on the ICN model and propose security solutions to protect the network elements, since security is based on the packet itself rather than being host-centric.
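The packet-level security idea, binding integrity to the content rather than to the host, can be sketched as follows. A keyed HMAC stands in here for the public-key content signatures a real ICN deployment would use; the key and names are hypothetical:

```python
import hashlib
import hmac

KEY = b"publisher-secret"   # hypothetical shared publisher key (stand-in for a keypair)

def make_packet(name, payload):
    """Bind the signature to the content itself (name + payload), not the host."""
    tag = hmac.new(KEY, name.encode() + payload, hashlib.sha256).hexdigest()
    return {"name": name, "payload": payload, "sig": tag}

def verify(packet):
    """Any cache or consumer can verify the packet, wherever it was fetched from."""
    expected = hmac.new(KEY, packet["name"].encode() + packet["payload"],
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["sig"])

pkt = make_packet("/news/today", b"content bytes")
ok = verify(pkt)
pkt["payload"] = b"tampered"
tampered_ok = verify(pkt)
```

Because the check depends only on the packet, a copy served from any in-network cache is as verifiable as one from the origin, which is the point of content-centric security.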
Encryption based LSB steganography technique for digital images and text data - INFOGAIN PUBLICATION
Digital steganography is the art and science of hiding communications; a steganographic system thus embeds secret data in public cover media so as not to arouse an eavesdropper’s suspicion. A steganographic system has two main aspects: steganographic capacity and imperceptibility. However, these two characteristics are at odds with each other. Furthermore, it is quite difficult to increase the steganographic capacity and simultaneously maintain the imperceptibility of a steganographic system. Additionally, there are still very limited methods of Steganography to be used with communication protocols, which represent unconventional but promising Steganography mediums. Digital image Steganography, as a method of secret communication, aims to convey a large amount of secret data, relatively to the size of cover image, between communicating parties. Additionally, it aims to avoid the suspicion of non-communicating parties to this kind of communication. Thus, this research addresses and proposes some methods to improve these fundamental aspects of digital image Steganography. Hence, some characteristics and properties of digital images have been employed to increase the steganographic capacity and enhance the stego image quality (imperceptibility). Here, the research aim is identified based on the established definition of the research problem and motivations. Unlike encryption, Steganography hides the very existence of secret information rather than hiding its meaning only. Image based Steganography is the most common system used since digital images are widely used over the Internet and Web. However, the capacity is mostly limited and restricted by the size of cover images. In addition, there is a tradeoff between both steganographic capacity and stego image quality. Therefore, increasing steganographic capacity and enhancing stego image quality are still challenges, and this is exactly our research main aim. 
To achieve high steganographic capacity, novel steganography methods were proposed. The first method is based on 8x8 non-overlapping blocks and a quantization table for DCT with compression. The second method incorporates the DWT technique, enhancing the quality of the stego image so that the hidden image can be recovered correctly. The last method uses LSB embedding to store images, with key-based security built in.
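The LSB component can be sketched as a classic least-significant-bit embed/extract over a list of pixel values. The cover data here is synthetic and the key-based encryption layer is omitted:

```python
def embed(pixels, message):
    """Embed message bits into the least-significant bit of successive pixels."""
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    assert len(bits) <= len(pixels), "cover too small for this message"
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit   # overwrite only the lowest bit
    return out

def extract(pixels, n_bytes):
    """Read back n_bytes worth of LSBs and reassemble the message."""
    bits = [p & 1 for p in pixels[:n_bytes * 8]]
    return bytes(sum(b << (7 - i) for i, b in enumerate(bits[k * 8:(k + 1) * 8]))
                 for k in range(n_bytes))

cover = list(range(100, 180))      # hypothetical 80 grayscale pixel values
stego = embed(cover, b"hi")
recovered = extract(stego, 2)
```

Each pixel changes by at most 1, which is why LSB embedding is imperceptible, and also why its capacity is capped at one bit per pixel of cover image.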
Analysis Model in the Cloud Optimization Consumption in Pricing the Internet ... - IJECEIAES
Research Inventy: International Journal of Engineering and Science is publis... - researchinventy
This document summarizes a research paper that proposes a novel approach for content-based image retrieval using wavelet transform and hierarchical neural networks. The paper describes how wavelet transforms are used to extract features from images, and a neural network is trained on these features to classify and retrieve similar images. The system was tested on a database of 450 images across different categories. Initial results found an accuracy of about 70% when querying images. The paper concludes that while initial results are promising, further research is needed to explore different wavelet functions, feature extraction techniques, and classification methods to improve accuracy.
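The wavelet feature-extraction stage can be illustrated with a single-level Haar transform and nearest-neighbour matching standing in for the paper's trained neural network; the image signatures below are made up:

```python
import math

def haar_step(signal):
    """One Haar level: pairwise averages (approximation) then differences (detail)."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    diff = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg + diff

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# hypothetical per-image brightness signatures (even length required)
db = {
    "sunset": [200, 190, 180, 170, 60, 50, 40, 30],
    "forest": [30, 35, 40, 45, 50, 55, 60, 65],
}
db_feat = {name: haar_step(sig) for name, sig in db.items()}

query = [198, 192, 178, 172, 58, 52, 42, 28]
best = min(db_feat, key=lambda n: dist(haar_step(query), db_feat[n]))
```

The averages capture coarse appearance and the differences capture detail; in the paper these wavelet coefficients are what the hierarchical neural network classifies.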
The advances of this technological era have resulted in an enormous pool of information, stored at multiple places globally and in multiple formats. This article presents a methodology for extracting video lectures delivered by experts in the domain of Computer Science using a Generalized Gamma Mixture Model. The feature extraction is based on DCT transformations. To build the model, the data set is pooled from YouTube video lectures in the domain of Computer Science. The outputs generated are evaluated using Precision and Recall.
Extraction and Retrieval of Web based Content in Web Engineering - IRJET Journal
The document discusses a proposed architecture for parallelizing natural language processing (NLP) operations and web content crawling using Apache Hadoop and MapReduce. The system extracts keywords and key phrases from online articles using NLP techniques like part-of-speech tagging in a Hadoop cluster. Evaluation of the system showed improved storage capacity, faster data processing, shorter search times and accurate information retrieval from large datasets stored in HBase.
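The map/reduce keyword-counting stage can be sketched in plain Python, with the mapper emitting (word, 1) pairs and the reducer summing them. The stopword list and articles are illustrative; a real deployment would run the same two phases on a Hadoop cluster:

```python
from collections import defaultdict
from itertools import chain

STOPWORDS = {"the", "a", "of", "and", "in"}

def mapper(article):
    """Map phase: emit (word, 1) for each non-stopword token."""
    return [(w, 1) for w in article.lower().split() if w not in STOPWORDS]

def reducer(pairs):
    """Reduce phase: sum counts per word across all mapper outputs."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

articles = ["the cat sat in the hat", "the cat and the dog"]
counts = reducer(chain.from_iterable(mapper(a) for a in articles))
top_keyword = max(counts, key=counts.get)
```

Because mappers are independent per article, the map phase parallelizes trivially across cluster nodes, which is where the speedup reported in the evaluation comes from.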
An Improved Annotation Based Summary Generation For Unstructured Data - Melinda Watson
This document discusses annotation-based summarization of unstructured data. It begins with an introduction to annotation and information retrieval. Current annotation processes cannot maintain modifications due to frequent document updates. The document then reviews literature on automatic text classification, applying annotations to linked open data sets, and using domain ontologies for automatic document annotation. Keywords, sentences and contexts are extracted from documents for annotation. Different annotation models are discussed. The goal is to develop an improved annotation approach for summarizing unstructured data that can handle frequent document changes.
Content-based image retrieval (CBIR) uses content features for retrieving and searching images in a given large database. Earlier, various hand-crafted feature descriptors were researched, based on visual cues such as shape, colour, and texture, to represent these images. Deep learning technologies have since been widely applied as an alternative to the feature-engineering approach that was dominant for over a decade: the features are learnt automatically from the data. This research work proposes an integrated dual deep convolutional neural network (IDD-CNN). IDD-CNN comprises two distinctive CNNs: the first CNN exploits the general features, and a further custom CNN is designed for exploiting the custom features. Moreover, a novel directed graph is designed comprising two blocks, a learning block and a memory block, which helps in finding the similarity among images. Since this research considers a large dataset, an optimal strategy is introduced for compact features. IDD-CNN is evaluated on two distinctive benchmark datasets, including the Oxford dataset, using mean average precision (mAP) metrics, and comparative analysis shows that IDD-CNN outperforms the other existing models.
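The mAP evaluation metric mentioned above can be computed as follows; the ranking and relevance sets are toy examples, not results from the paper:

```python
def average_precision(ranked, relevant):
    """AP: mean of precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """mAP: mean of per-query AP over (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# relevant items "a" and "c" appear at ranks 1 and 3: AP = (1/1 + 2/3) / 2
ap = average_precision(["a", "b", "c"], {"a", "c"})
```

mAP rewards rankings that place relevant images early, which is why it is the standard retrieval benchmark metric on datasets like Oxford.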
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr... - IJEACS
The huge amount of library data stored in our modern research and statistics centers is growing on a daily basis. These databases grow exponentially in size over time, and it becomes exceptionally difficult to understand the behavior of the data and interpret the relationships that exist between attributes. This exponential growth poses new organizational challenges: the conventional record management infrastructure can no longer give precise and detailed information about the behavior of data over time. There is confusion and fresh concern in selecting tools that can support and handle multi-dimensional big data visualization. Viewing all related data in a database at once is a problem that has attracted the interest of data professionals with machine learning skills. It is a lingering issue in the data industry because existing techniques cannot be used to remove or filter noise from relevant data and pad missing values in order to obtain the required information. The aim is to develop a stacked generalization model that combines the functionality of random forest and decision tree techniques for library database visualization. In this paper, random forest and decision tree techniques were employed to effectively visualize large amounts of school library data. The proposed system was implemented with a few lines of Python code to create visualizations that help users understand and interpret, at a glance, the behavior of data and its relationships. The model was trained and tested to learn and extract hidden patterns of data with a cross-validation test. It combined the functionalities of both models to form a stacked generalization model that performed better than the individual techniques.
The stacked model produced 95% accuracy, as did the RF with a 95% accuracy rate and an RMSE of 0.223600, in comparison with the DT, which recorded an 80.00% success rate and an RMSE of 0.15990.
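The stacking idea, combining random forest and decision tree outputs through a meta-learner, can be sketched with stand-in rule-based base models and a weighted vote. All rules, features, and weights below are hypothetical, not the paper's trained models:

```python
def base_rf(record):
    """Stand-in for a trained random forest's prediction (hypothetical rule)."""
    return 1 if record["pages"] > 300 else 0

def base_dt(record):
    """Stand-in for a trained decision tree's prediction (hypothetical rule)."""
    return 1 if record["year"] < 2000 else 0

def stacked(record, w_rf=0.7, w_dt=0.3):
    """Meta-learner: weighted vote over base-model outputs (weights hypothetical).
    In real stacking the weights are fitted on held-out base predictions."""
    score = w_rf * base_rf(record) + w_dt * base_dt(record)
    return 1 if score >= 0.5 else 0

book = {"pages": 450, "year": 2015}
prediction = stacked(book)
```

The meta-level combination is what lets the stacked model match or beat its strongest base learner, as reported in the accuracy figures above.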
Multivariate feature descriptor based CBIR model to query large image databases - IJARIIT
Content-based image retrieval (CBIR) applications have grown in popularity over the past decade with the exponential growth in image data volumes. Social networks have aggravated the size of image data on the internet: they enable everyone to upload images of their choice, which has led to the aggregation of millions of images in cyberspace. It is not possible to query these large image databases with ordinary methods; hence there was a strong requirement for a smart and intelligent method to discover similar images, which has been accomplished using machine learning methods. In this paper, a multivariate feature descriptor method is presented to extract the required and relevant information from large image databases. The proposed multivariate method involves image colour and texture for the purpose of matching images to the query image (also known as the reference image). The most closely matching entities are returned as the final results by the image extraction method. Four methods, involving three singular-feature models and one multivariate-feature model, have been implemented. The multivariate model has been found to be much more stable and returned the maximum accuracy.
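The multivariate colour-plus-texture descriptor can be sketched as a colour histogram concatenated with a crude texture statistic; the pixel data, bin count, and distance measure are illustrative assumptions:

```python
def colour_histogram(pixels, bins=4):
    """Normalized intensity histogram over 0..255 values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist]

def texture_energy(pixels):
    """Crude texture cue: mean absolute difference between neighbouring pixels."""
    diffs = [abs(a - b) for a, b in zip(pixels, pixels[1:])]
    return sum(diffs) / len(diffs)

def descriptor(pixels):
    """Multivariate descriptor: colour histogram concatenated with texture."""
    return colour_histogram(pixels) + [texture_energy(pixels) / 255]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

db = {"smooth": [100] * 16, "noisy": [0, 255] * 8}
feats = {name: descriptor(img) for name, img in db.items()}
query_feat = descriptor([98, 102, 99, 101] * 4)
best = min(feats, key=lambda name: dist(query_feat, feats[name]))
```

Combining both cues in one vector is what makes the descriptor "multivariate": two images with similar colours but different texture no longer collide.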
This document proposes a novel approach for detecting text in images and using the detected text as keywords to retrieve similar textual images from a database. The approach uses a text detection technique to find text regions in images, eliminates false positives, recognizes the text using OCR, and forms keywords using a neural language model. The detected keywords are then used to index and retrieve similar textual images from two benchmark datasets. Experimental results show the approach effectively retrieves similar textual images by exploiting the dominant text information in the images.
This document proposes a novel approach for detecting text in images and using the detected text as keywords to retrieve similar textual images from a database. The approach uses a text detection technique to find text regions in images, eliminates false positives, recognizes the text using OCR, and forms keywords using a neural language model. These keywords are then used to index and retrieve similar textual images based on the detected text. The experimental results on two benchmark datasets show this text-based approach is effective for retrieving textual images.
This document discusses optimizing content-based image retrieval in peer-to-peer systems. It summarizes previous work on content-based image retrieval using multi-instance queries in peer-to-peer networks. The authors propose two optimizations to previous work: 1) clustering peers to reduce search time, and 2) constructing a search index at cluster heads to avoid searching each peer. Their experiments show the proposed approach reduces search time compared to previous work, with some reduction in accuracy that improves as more nodes are added. The authors plan future work to analyze performance with different cluster sizes and representations for faster search.
This document summarizes a research paper on developing a feature-based product recommendation system. It begins by introducing recommender systems and their importance for e-commerce. It then describes how the proposed system takes basic product descriptions as input, recognizes features using association rule mining and k-nearest neighbor algorithms, and outputs recommended additional features to improve the product profile. The paper evaluates the system's performance on recommending antivirus software features. In under 3 sentences.
Text Mining with Automatic Annotation from Unstructured ContentIIRindia
This document summarizes text mining techniques for automatic annotation of unstructured content such as images, video and audio. It discusses using multimedia mining to extract structured information from multimedia data through techniques like automatic image annotation and video annotation. Automatic image annotation assigns keywords or tags to images using computer vision and machine learning. Video annotation creates text transcripts from spoken narration in videos and segments videos into clips linked to corresponding text. The document explores models for content-based multimedia retrieval based on visual features and annotations.
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
The document proposes a method for video retrieval based on genre recognition of a query video clip. It extracts regions of interest from frames of the query clip and videos in a database based on motion detection. Features are extracted from these regions and used for matching to recognize the genre. A tree pruning technique is employed to identify the genre of the query clip and retrieve similar genre videos from the database. The method segments objects, recognizes them, and uses tree pruning for genre recognition and retrieval. It was evaluated on a dataset containing sports, movies, and news genres and showed effectiveness in genre recognition and retrieval.
Query clip genre recognition using tree pruning technique for video retrievalIAEME Publication
The document proposes a method for video retrieval based on genre recognition of a query video clip. It extracts regions of interest from frames of the query clip and videos in a database. Features are extracted from these regions and used for matching via Euclidean distance. A tree pruning technique is employed to recognize the genre of the query clip and retrieve similar genre videos from the database. The method segments objects, extracts features, performs matching and genre recognition, and retrieves relevant videos in three or fewer sentences.
Similar to RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTION IN CBMR (20)
PyData London 2024: Mistakes were made (Dr. Rebecca Bilbro)Rebecca Bilbro
To honor ten years of PyData London, join Dr. Rebecca Bilbro as she takes us back in time to reflect on a little over ten years working as a data scientist. One of the many renegade PhDs who joined the fledgling field of data science of the 2010's, Rebecca will share lessons learned the hard way, often from watching data science projects go sideways and learning to fix broken things. Through the lens of these canon events, she'll identify some of the anti-patterns and red flags she's learned to steer around.
Do People Really Know Their Fertility Intentions? Correspondence between Sel...Xiao Xu
Fertility intention data from surveys often serve as a crucial component in modeling fertility behaviors. Yet, the persistent gap between stated intentions and actual fertility decisions, coupled with the prevalence of uncertain responses, has cast doubt on the overall utility of intentions and sparked controversies about their nature. In this study, we use survey data from a representative sample of Dutch women. With the help of open-ended questions (OEQs) on fertility and Natural Language Processing (NLP) methods, we are able to conduct an in-depth analysis of fertility narratives. Specifically, we annotate the (expert) perceived fertility intentions of respondents and compare them to their self-reported intentions from the survey. Through this analysis, we aim to reveal the disparities between self-reported intentions and the narratives. Furthermore, by applying neural topic modeling methods, we could uncover which topics and characteristics are more prevalent among respondents who exhibit a significant discrepancy between their stated intentions and their probable future behavior, as reflected in their narratives.
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTION IN CBMR
International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.7, No.5/6, November 2017
DOI: 10.5121/ijdkp.2017.7605
RECURRENT FEATURE GROUPING AND CLASSIFICATION MODEL FOR ACTION MODEL PREDICTION IN CBMR

Vinoda Reddy1, P. Suresh Varma2, A. Govardhan3

1 Associate Professor & HOD, CSE Dept., SIT, Gulbarga, Karnataka, India.
2 Principal, University College of Engineering, Adikavi Nannaya University, Rajahmundry, A.P., India.
3 Principal, University College of Engineering, JNTUH, Hyderabad, Telangana, India.
ABSTRACT:
Content-based retrieval offers higher prediction accuracy than tagging-based approaches. However, the complexity of its representation and classification stages results in lower processing accuracy and higher computation overhead. The correlative nature of the feature data is unexplored in conventional modeling, where all data features are taken as a single set of feature values to produce a decision. Here, the recurrent feature class attribute is exploited to regroup features for action model prediction. In this paper, a correlative, weight-bounded grouping approach is proposed for action model prediction in CBMR applications. The correlative recurrent feature mapping results in a faster retrieval process compared to conventional retrieval systems.
KEYWORDS:
Recurrent feature mapping, classification, action model prediction, CBMR.
1. INTRODUCTION
With improvements in data exchange, sharing data from one location to another has become easier, and updating remote databases has likewise improved. The larger the data collection, the higher the achievable retrieval accuracy. However, the retrieval process depends on the submitted query and on the tag information defined in the database. In this direction, content-based multimedia retrieval (CBMR) has emerged as a retrieval paradigm for multimedia data. Nowadays, large multimedia data sets such as audio, speech, text, web, image, video and combinations of several types are becoming increasingly available. These are mostly unstructured or semi-structured by nature, which makes it difficult for human beings to extract information without powerful tools. This drives the need to develop data mining techniques that can work on all kinds of data, such as documents, images, and signals. [1] surveys the current state of multimedia data mining and knowledge discovery: data mining efforts aimed at multimedia data, and current approaches and well-known techniques for mining multimedia data. [2] provides a survey of the state-of-the-art in the field of Visual Information
Retrieval (VIR) systems, particularly Content-Based Visual Information Retrieval (CBVIR)
systems. It presents the main concepts and system design issues, reviews many research
prototypes and commercial solutions currently available and points out promising research
directions in this area. Content-based information retrieval of multimedia data is a great and
attractive challenge which raises numerous research activities. As multimedia data become
ubiquitous in our daily lives, information retrieval systems have to adapt their retrieval
performance to different situations in order to efficiently satisfy the users’ information needs
anytime and anywhere.
To enhance the content-based multimedia information retrieval process, [3] proposes an efficient
similarity measure based on flexible feature representations. It also defines the similarity model for content-based information retrieval and shows its feasibility on real-world multimedia data.
Recently, information retrieval for text and multimedia content has become an important research
area. Content based retrieval in multimedia is a challenging problem since multimedia data needs
detailed interpretation from pixel values. [4] presents an overview of content-based retrieval along with the different strategies in terms of syntactic and semantic indexing for retrieval. The matching techniques used and learning methods employed are also analyzed, and key directions for future research are presented. Multimedia database management systems
can be seen as storage and retrieval systems, in which large volumes of media objects are created,
indexed, modified, searched and retrieved. There are two types of retrievals: navigation through
database to locate the desired data and database querying that finds the desired data associatively,
attribute-based, or content-based. Query processing in multimedia databases has the following
stages: query specification and refining, query processing (compiling, optimizing, and executing),
generation of relevancy ranked results, and relevance feedback. In content-based retrieval
systems, the collections of multimedia objects are stored as digitized representations. The queries
are characterized by fuzziness and ambiguity. Therefore, the user will not get from the beginning
what s/he wants. Metrics (precision, recall) and indexation and content-based retrieval techniques
can help, but the user is the one who will have the last word in appreciation of a query results.
Many multimedia applications, such as the World-Wide-Web (WWW), video-on-demand (VOD),
and digital library, require strong support of a video database system. In [6], a survey was done
on various approaches to content-based video data retrieval. Compared with traditional keyword-
based data retrieval, content-based retrieval provides a flexible and powerful way to retrieve
video data. It also discusses three approaches in depth; first, since video segmentation is an elementary task in video index construction, it describes an approach to detect shot changes for video segmentation based on the processing of MPEG-compressed video data. In [7], the current state-of-the-art in relevance feedback is discussed from a content-based image retrieval point of view, and a novel approach is recommended for the future. The large and growing
amount of digital data, and the development of the Internet highlight the need to develop
sophisticated access methods that provide more than just simple text-based queries. Many
programs have been developed with complex mathematics algorithms to allow the transformation
of image or audio data in a way that enhances searching accuracy. However, it becomes difficult when dealing with large sets of multimedia data. [8] proposes and demonstrates a Content-Based Multimedia Retrieval System (CBMRS), which includes both video and audio retrieval. The Content-Based Video Retrieval System (CBVRS) is based on DCT and clustering algorithms, while the audio retrieval system is based on Mel-Frequency Cepstral Coefficients (MFCCs), the Dynamic Time Warping (DTW) algorithm and the Nearest Neighbor (NN) rule.
With the development of multimedia data types and available bandwidth there is huge demand of
video retrieval systems, as users shift from text based retrieval systems to content based retrieval
systems.
Selection of extracted features plays an important role in content-based video retrieval, regardless of the video attributes under consideration. These features are intended for selecting, indexing and ranking according to their potential interest to the user. Good feature selection also allows the time and space costs of the retrieval process to be reduced. [9] reviews the features that can be extracted from video data for indexing and retrieval, along with similarity
measurement methods, and identifies current research issues in the area of content-based video retrieval systems. Image retrieval has been popular for several years, and there are different system designs for content-based image retrieval (CBIR). [10] proposes a novel CBIR system architecture that combines content-based image and color analysis with data mining techniques; it introduces a segmentation and grid module, a feature extraction module, K-means and k-nearest neighbor clustering algorithms, and a neighborhood color analysis module that recognizes the side of every grid of the image. The spatial, temporal, storage, retrieval, integration, and
presentation requirements of multimedia data differ significantly from those for traditional data.
A multimedia database management system provides for the efficient storage and manipulation of
multimedia data in all its varied forms. [11] looks into the basic nature of multimedia data, highlights the need for multimedia DBMSs, and discusses the requirements and issues necessary for developing such systems. [12] presents a query processing strategy for the content-based video
query language named CVQL. By CVQL, users can flexibly specify query predicates by the
spatial and temporal relationships of the content objects. The query processing strategy evaluates
the predicates and returns qualified videos or frames as results. A promising approach is based on using the audio layer of a lecture recording to obtain information about the lecture's contents: [13] presents an indexing method for computer science courses based on their existing recorded videos. The transcriptions from a speech-recognition engine (SRE) are sufficient to create a chain index for detailed browsing inside a lecture video. The index structure and the evaluation of the supplied keywords are presented. Content sharing networks, such as YouTube, contain traces of
both explicit online interactions (such as likes, comments, or subscriptions), as well as latent
interactions (such as quoting, or remixing, parts of a video). [14] proposes visual memes, or frequently re-posted short video segments, for detecting and monitoring such latent video interactions at scale. Visual memes are extracted with high accuracy by scalable detection algorithms, and are further augmented with text via a statistical model of latent topics. [14] models content interactions on YouTube with visual memes, defining several measures of influence and building predictive models for meme popularity. The main goal of [15] is to show the improvement of combining a textual pre-filtering with an image re-ranking in a multimedia information retrieval task. A three-step retrieval process and a well-selected combination of visual and textual techniques help the developed multimedia information retrieval system to overcome the semantic gap in a given query.
In [15], five different late semantic fusion approaches are discussed and evaluated in a realistic multimedia retrieval scenario such as the one provided by the publicly available ImageCLEF Wikipedia Collection. As digital video databases grow, so grows the problem of effectively navigating through them. [16] proposes a novel content-based video retrieval approach to searching such video databases, specifically those involving human actions, incorporating spatio-temporal localization. It outlines a novel, highly efficient localization model that first performs temporal localization based on histograms of evenly spaced time-slices, then spatial localization based on histograms of a 2-D spatial grid. It further argues that the retrieval model, based on the aforementioned localization followed by relevance ranking, results in a highly discriminative system, while remaining an order of magnitude faster than the state-of-the-art method. [17] proposes scalable solutions to learning query-specific distance functions by 1)
adopting a simple large-margin learning framework, 2) using the query-logs of text-based image
search engine to train distance functions used in content-based systems. Recent research shows
that multimedia resources in the wild are growing at a staggering rate. The rapid increase number
of multimedia resources has brought an urgent need to develop intelligent methods to organize
and process them. In [18], the semantic link network model is used for organizing multimedia
resources. A whole model for generating the association relation between multimedia resources
using semantic link network model is proposed. The definitions, modules, and mechanisms of the
semantic link network are used in the proposed method. The integration between the semantic
link network and multimedia resources provides a new prospect for organizing them with their
semantics. The semantic gap between low-level visual features and high-level semantics is a well-
known challenge in content-based multimedia information retrieval. With the rapid
popularization of social media, which allows users to assign tags to describe images and videos,
attention is naturally drawn to take advantage of these metadata in order to bridge the semantic
gap. [19] proposes a Sparse Linear Integration (SLI) model that focuses on integrating visual
content and its associated metadata, which are referred to as the content and the context
modalities respectively, for semantic concept retrieval. An optimization problem is formulated to
approximate an instance using a sparse linear combination of other instances. The difference
between them is considered as the error to minimize. [20] first presents a global image data model that supports both metadata and low-level descriptions of images and their salient objects, allowing multi-criteria image retrieval. It then presents an image data repository model that captures all the data described in the model and permits the integration of heterogeneous operations in a DBMS. In particular, content-based operations (content-based join and selection)
in combination with traditional ones can be carried out using our model. However, conventional approaches lack the correlative property of feature matching, resulting in lower processing accuracy and increased overhead. To address these limitations, this paper proposes a new recurrent feature mapping model. The rest of the paper is organized as follows: the CBMR model and its feature representation are outlined in Section 2; Section 3 outlines the feature regrouping approach; the suggested approach of recurrent feature regrouping is presented in Section 4; Section 5 illustrates the results obtained.
2. CBMR MODEL AND FEATURE REPRESENTATION [22]
For feature extraction in action model prediction, the multi-kernel dimensionality reduction technique developed in past work [22] is used. The proposed multimedia retrieval system consists of two phases. In the training phase, a database is built that stores the various multimedia features extracted from different video samples. In the test phase, the test video sample is processed for feature extraction, and an SVM classifier compares the training features retrieved from the database against the test features; the matching result determines the multimedia feature set for the test video.
The proposed multimedia retrieval system is divided into four operational phases:
1) pre-processing;
2) feature extraction using 2D Gabor filters and histograms;
3) feature vector dimensionality reduction using MLK-DR;
4) multimedia retrieval using an SVM classifier.
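Read end to end, the four phases form a simple pipeline. The sketch below is a hypothetical stand-in, not the authors' Matlab implementation: every function body (gray-scale by channel mean, truncation in place of MLK-DR, nearest-template matching in place of the SVM) is a simplification chosen only to make the data flow concrete.

```python
import numpy as np

def preprocess(frame, size=(64, 64)):
    """Phase 1: convert an RGB frame to gray scale and resize by index sampling."""
    gray = frame.mean(axis=2)                        # naive gray-scale conversion
    rows = np.linspace(0, gray.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size[1]).astype(int)
    return gray[np.ix_(rows, cols)]

def extract_features(frame, bins=16):
    """Phase 2: stand-in for the Gabor + histogram feature extractor."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / max(hist.sum(), 1)

def reduce_dimensions(features, k=8):
    """Phase 3: stand-in for MLK-DR; here we simply keep the first k components."""
    return features[:k]

def classify(features, class_templates):
    """Phase 4: stand-in for the SVM; the nearest stored class template wins."""
    return min(class_templates,
               key=lambda c: np.linalg.norm(class_templates[c] - features))
```

A frame flows through the four functions in order, ending in a class label; swapping any stand-in for the real component does not change the interfaces.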
The proposed approach is a multimedia video retrieval system used to estimate multimedia information from the dataset. Multimedia videos are acquired from database sources, and the retrieval system processes the video input into a multimedia video feature set. For a given multimedia video sample, the histogram feature is used for feature estimation, and multi-linear dimensionality reduction (ML-DR) is applied in association with a 2D Gabor filter. In the pre-processing phase, the input video data is read from the database and processed frame by frame: the samples are resized, cropped and converted to gray scale for feature representation. After pre-processing, the multimedia video goes through feature extraction. At this stage, the Gabor filter is applied at all considered orientations to capture all possible variations; here, it is applied at eight orientations and a histogram feature set is derived. In the next step, the dimensionality of the extracted features is reduced using the multi-linear approach. An SVM classifier then uses these feature vectors to compare and match against the classes in the database for classifying the multimedia information.
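A minimal sketch of the Gabor-plus-histogram extraction described above, assuming eight orientations and a 16-bin histogram per filter response; the kernel parameters (size, sigma, wavelength, aspect ratio) are illustrative choices, not values taken from the paper.

```python
import numpy as np

def gabor_kernel(theta, ksize=21, sigma=4.0, lam=10.0, gamma=0.5):
    """Real part of a 2-D Gabor kernel at orientation theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam))

def gabor_histogram_features(image, n_orientations=8, bins=16):
    """Filter at each orientation (FFT convolution), histogram each response."""
    feats = []
    for k in range(n_orientations):
        kern = gabor_kernel(np.pi * k / n_orientations)
        pad = np.zeros_like(image, dtype=float)     # embed kernel for FFT conv
        pad[:kern.shape[0], :kern.shape[1]] = kern
        resp = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(pad)))
        hist, _ = np.histogram(resp, bins=bins)
        feats.append(hist / resp.size)              # normalize per orientation
    return np.concatenate(feats)
```

The result is a 128-dimensional vector (8 orientations x 16 bins) that would then feed the dimensionality-reduction stage.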
The proposed algorithm for the system is as follows.
In the multi-linear dimensionality reduction (ML-DR) approach, the features are transformed into a multi-linear subspace that extracts the multi-dimensional features from the database. Unlike approaches that expand in a single linear dimension, ML-DR operates directly on two-mode data, processing it through multiple functional objectives. Video filtering with the Gabor filter followed by ML-DR dimensionality reduction yields the dimensionally reduced multimedia feature dataset, derived from the projection matrix.
3. FEATURE REGROUPING APPROACH
Co-clustering can be considered a simultaneous clustering of multiple properties of a dataset: it produces a set of clusters of the original data and also a set of clusters of their properties. Like other clustering algorithms, co-clustering defines a clustering criterion and then optimizes it, but it does so over rows and columns jointly. In a nutshell, co-clustering simultaneously finds subsets of the attributes of a data matrix and of their properties using a specified criterion. From the summarization point of view, co-clustering provides significant benefits. Basically, there are two types of co-clustering approaches: block co-clustering and information-theoretic co-clustering. Among these, the Information Theoretic Co-clustering (ITCC) approach was proved to be optimal [21]; it is based on Bregman divergence. ITCC tries to reduce the loss of information in the approximation of a data matrix X, and this loss can be minimized through a predefined distance called the Bregman divergence function. For a given co-clustering (R, C) and a matrix approximation scheme X̂, a class of random variables that stores the properties of the attribute data matrix X is defined. The objective function
Algorithm
Input: multimedia database, multimedia video test sample.
Output: the action class for the multimedia sample.
Step 1: Multimedia video data is read from the database.
Step 2: The video is resized to uniform dimensions.
Step 3: The color video is converted to gray scale.
Step 4: The gray-scale multimedia video is cropped to the region of interest.
Step 5: 2D Gabor filters and the histogram method are applied to the cropped video to extract the multimedia features.
Step 6: ML-DR is applied to the extracted features to reduce dimensionality.
Step 7: The training features available in the database are used for classification with the SVM classifier.
tries to reduce the loss in the approximation of X for a co-clustering (R, C). The Bregman information of X is defined by [21]:

I_φ(X) = E[d_φ(X, E[X])]   (1)
Here, the matrix approximation scheme X̂ is defined by the expected value, and the Bregman divergence is minimized for an optimal co-clustering as follows:

(R*, C*) = arg min(R,C) E[d_φ(X, X̂)]   (2)

Here, the Bregman divergence is directly related to the Euclidean distance and can be expressed as

d_φ(x₁, x₂) = ||x₁ − x₂||²   (3)
Here, the Bregman divergence is evaluated by simply calculating the Euclidean distance between the attribute magnitude values, which gives the difference between the magnitudes of the attribute values. Though this approach provides good diagnosis accuracy, it is almost equal to that of the sub-clustering approach, because it does not consider the divergence of attribute properties. If the co-clustering is done considering the divergence of attribute properties, the diagnosis accuracy can be increased further. The complete details of the proposed approach are given in the next section.
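For concreteness, the squared-Euclidean Bregman divergence of eq. (3) and the Bregman information of eq. (1) can be sketched as below. This is a simplified illustration assuming uniform probabilities over the rows of X, not the authors' implementation.

```python
import numpy as np

def bregman_euclidean(x1, x2):
    """Squared Euclidean distance, the Bregman divergence of phi(x) = ||x||^2."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return float(np.sum((x1 - x2) ** 2))

def bregman_information(X):
    """Mean divergence of the rows of X from their expectation, eq. (1),
    with each row weighted uniformly."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)                 # E[X] under uniform row weights
    return float(np.mean([bregman_euclidean(row, mu) for row in X]))
```

Minimizing the loss in eq. (2) amounts to keeping as much of this Bregman information as possible in the co-clustered approximation X̂.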
4. RECURRENT FEATURE GROUPING (RFG) AND CLASSIFICATION
In co-clustering the data from the selected sub-clusters of the dataset, the conventional co-clustering approach following ITCC was observed to be effective. However, in ITCC the clusters are formed based on the divergence factor of two observations (x₁, x₂). The divergence factor is computed using a Euclidean-distance-based approach, in which the two magnitude components are compared to obtain a distance factor; convergence is reached when the two observations satisfy a minimum divergence criterion. Although this results in an optimal co-clustering of the data, the relational property of the data elements is not explored: ITCC does not consider the nature of attribute variation for clustering. Though the data type attribute properly defines the nature of variation of the observed attribute, a purely Euclidean approach is not appropriate for clustering. Hence, in the proposed method, a weighted ITCC coding for co-clustering of data types is presented. The weighted co-clustering approach is defined by allocating a weight value to the distinct class attributes in each sub-cluster. Let wᵢ be the weight allocated to each sub-cluster Sᵢ, where the sub-clusters are derived from the MLKDR-selected feature set and clustered by the data type property into distinct classes of effect,

S = {S₁, S₂, …, S_N}   (4)

where the class attributes are defined as follows.
Table 1: Class label attributes for the distinct classes formed

Class type (Cᵢ)       Associated weight (wᵢ)
Class 1 (running)     1
Class 2               2
Class 3               3
Class 4               4
The weights were assigned based on the action effect of each class attribute; the class with the running condition is given weight 1. In the process of mining the dataset for information retrieval, the given query details are mined over the sub-cluster dataset based on the correlation feature set in the recurrent mapped feature sets, derived as outlined in the previous section. To optimize the mining performance, co-clustering approaches are introduced. The dataset is clustered into sub-clusters based on the Bregman divergence criterion d_φ, satisfying the convergence problem

arg min E[d_φ(X, X̂ᵢ)]   (5)

where X ∈ Sᵢ and X̂ᵢ is the new co-cluster formed for the sub-cluster Sᵢ. Here, d_φ is the divergence operator given by the Euclidean distance of the two observed elements (x₁, x₂):

d_φ(x₁, x₂) = ||x₁ − x₂||²   (6)
Clustering based on coefficient magnitudes alone produces co-clusters based only on the observed values; the relational property among the values and the severity of the class values are, however, not considered. As observed, continuous data types are more effective for decision making than discrete data types, so including a type factor and integrating the co-clusters based on the class attribute leads to finer co-clustering of the dataset. Following this approach, the weighted co-clustering computes an aggregative weight value over the class labels, given as

W_a = Σᵢ₌₁ʳ wᵢ   (7)

where wᵢ is the weight value assigned to a class attribute, for the r sub-classes under observation. While co-clustering the classes, the aggregated weight value is computed and a co-clustering limit of W_T/2 is set, where W_T is the total of the weight values assigned to the class attributes:

W_T = Σᵢ₌₁ᴺ wᵢ   (8)
where N is the number of classes.
The convergence problem is then defined by

arg min E[d_φ(X, X̂)]  subject to  W_a < W_T/2   (9)

The solution to the convergence problem is thus defined by two objectives: minimum estimated divergence based on Euclidean distance, subject to the condition that the aggregated weight value is lower than half of the total class weight value. The limit of W_T/2 is
considered, to have a co-clustering objective of minimum similar elements entering into a class,
without breaking the relational property among the observing attribute. The process of the
proposed weighted ITCC (W-ITCC) approach is illustrated as follow,
Let $x_1, x_2, x_3, x_4$ be sub-clusters derived from the selection approach of the action model selection process, each with a data-type attribute. Let the associated weights for the classes, $W_1, W_2, W_3, W_4$, be assigned the values {1, 2, 3, 4}. To apply the weighted co-clustering process to this dataset, the following steps are performed:
Step 1: The Bregman divergence for the two observed elements is computed using (6).
Step 2: For each divergence $d_\phi(x_1, x_2)$, the convergence criterion defined by (5) is computed.
Step 3: The aggregate weighted sum for the observed class is then computed using (7).
Step 4: The convergence criterion is then checked for the obtained aggregated weight value following (9).
Step 5: The above steps are repeated for all sub-classes to co-cluster.
In the considered case, the total weight obtained while co-clustering the elements of the entire sub-class is

$W_T = \sum_{i=1}^{N} w_i = 10.$
Now, for the clustering of the two sub-class pairs $(x_1, x_2)$ and $(x_1, x_4)$, the aggregate weight sums are obtained. Considering the convergence problem of (9), the limiting value is $10/4 = 2.5$.
In this case $W_{14} > W_T/N$, hence the pair is not clustered in the same group even though the divergence criterion is satisfied. This results in co-clustering only the sub-classes of highest similarity into one class. As can be observed, class 1 and class 4 are more dissimilar than classes 1 and 2. The allocated weight values govern the class relation, and the bounding limit on the total weight value prevents wrong clustering of the sub-classes. The operational algorithm for the suggested weighted co-clustering approach is illustrated below.
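A minimal sketch of the admission check in the steps above, assuming the squared-Euclidean Bregman divergence of (6) and a hypothetical divergence threshold `d_max` (the function names and feature vectors are illustrative assumptions, not part of the paper):

```python
def bregman_divergence(x1, x2):
    # Squared Euclidean distance between two observed elements, Eq. (6)
    return sum((a - b) ** 2 for a, b in zip(x1, x2))

def can_co_cluster(i, j, weights, x_i, x_j, d_max):
    """Admit a sub-class pair into one co-cluster only if the divergence
    criterion is met AND the aggregate class weight stays below W_T / N."""
    W_T = sum(weights)                 # total class weight, Eq. (8)
    limit = W_T / len(weights)         # W_T / N bound from Eq. (9)
    w_pair = weights[i] + weights[j]   # aggregate weight of the pair, Eq. (7)
    return bregman_divergence(x_i, x_j) <= d_max and w_pair < limit

weights = [1, 2, 3, 4]            # W1..W4 from the worked example
x1, x4 = [0.1, 0.2], [0.1, 0.3]   # hypothetical feature vectors
# W_14 = 1 + 4 = 5 exceeds 10/4 = 2.5, so the pair is rejected
# even though the divergence criterion is satisfied.
print(can_co_cluster(0, 3, weights, x1, x4, d_max=1.0))  # False
```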
5. EXPERIMENTAL RESULTS
The proposed system is developed with Matlab tools and tested on the Weizmann dataset [7]. A comparative analysis is carried out between PCA-based feature dimension reduction and the proposed MLK-DR, and the simulation evaluates the performance of the proposed approach. Features are extracted from the Weizmann dataset, for which the histogram is computed as MI-HIST [21]. Figure 1 illustrates the test dataset, derived from [22].
Fig 1: Dataset with (a) Bending (B), (b) Jumping (J), (c) Running (R) and (d) Walking (W) samples
The samples are captured at 180×144 resolution using a static camera with a homogeneous outdoor background. A processed test sample containing a running action is illustrated in Fig 2.
Fig 2: Test sample having Running action
The features obtained using MI-HIST [21] are presented in Table 2.

Table 2. Comparative analysis of HoG [22], HIST [23] and the proposed MI-HIST [21] for the running sample

Observation                          HoG [22]   HIST [23]   MI-HIST [21]
Original sample size                 478720     478720      478720
Redundant coefficients               436582     457581      464260
HIST features for motion components  42138      21139       14460
The MI-HIST histogram features are then used for dimensionality reduction, where PCA, LDA and MLK-DR are applied over them. In the PCA-based dimensionality reduction technique, the mean-normalized K principal features are selected from the extracted features. The selected features are used for the inter-class correlation, where LDA coding is applied to obtain the minimum feature count. However, PCA and LDA are limited to a two-dimensional coding process; to further reduce the feature dimensions, MLK-DR was implemented. To validate the observation, this section details the result analysis. The performance of the proposed approach was verified through accuracy, measured using the standard confusion metrics: True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN). The complete analysis was carried out on all classes of the action model. Here, TP denotes true-class samples correctly identified as true, and FP denotes false-class samples incorrectly identified as true. The accuracy at every stage was evaluated by the following expression,
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$   (10)
To compute the measured parameters, the following mathematical expressions are used.
The sensitivity is measured as the ratio of true positive (TP) to the sum of true positive (TP) and
false negative (FN).
$Sensitivity = \dfrac{TP}{TP + FN}$   (11)
The specificity is measured as the ratio of true negative (TN) to the sum of true negative (TN) and false positive (FP).

$Specificity = \dfrac{TN}{TN + FP}$   (12)
Precision is the ratio of TP to the sum of TP and FP, while recall is the ratio of TP to the sum of TP and FN. The following expressions give the recall and precision measurements,

$Recall = \dfrac{TP}{TP + FN}$   (13)

$Precision = \dfrac{TP}{TP + FP}$   (14)
The F-measure is the combined measure of precision and recall; it is also called the balanced F-score,

$F\text{-}measure = \dfrac{2 \times Recall \times Precision}{Recall + Precision}$   (15)
Table 3. Parametric evaluation of the developed system for processing efficiency

Input     Accuracy (%)
          PCA     LDA     MLK-DR   ITCC    RFG
Running   68.24   71.52   73.41    77.17   82.16
Walking   67.25   70.42   72.15    78.52   84.68
Bending   69.45   73.21   74.54    79.54   82.24
Jumping   71.25   73.54   76.32    79.12   80.64
Table 4. Performance of the developed approach in terms of computation overhead, convergence time, cluster convergence and classification performance
6. CONCLUSION
A new co-clustering logic based on the relational property of the data attributes and class values is presented for the mining process of action model analysis. The clustering operation uses the data-type property of continuous and discrete data. The sub-cluster formation results in optimal data clusters with recurrent and continuous feature selection, in consideration of the data-type property. The data-type property is observed to be a significant factor in the appropriate clustering of the dataset and yields higher classification accuracy. The proposed co-clustering over these sub-clusters further improves the accuracy by taking into account the class-type values assigned to each class level.
REFERENCES
[1] Manjunath T.N, Ravindra S Hegadi, Ravikumar.G.K, “A Survey on Multimedia Data Mining and Its
Relevance Today”, IJCSNS, 2010.
[2] Oge Marques and Borko Furht, “Content-Based Visual Information Retrieval”, Florida Atlantic
University, USA, 2002.
[3] Christian Beecks and Thomas Seidl, “Efficient Content-Based Information Retrieval: A New Similarity
Measure for Multimedia Data”, Symposium on Future Directions in Information Access, 2009.
[4] Ankush Mittal, “An Overview of Multimedia Content-Based Retrieval Strategies”, Informatica 2006.
[5] Cătălina Negoiţă, Monica Vlădoiu, “Querying and Information Retrieval in Multimedia Databases”,
Informatica, 2006.
[6] Arbee L. P. Chen, Chih-Chin Liu, and Tony C. T. Kuo, “Content-based Video Data Retrieval”, ROC,
1999.
[7] Sunitha Jeyasekhar and Sihem Mostefai, “Towards Effective Relevance Feedback Methods in Content-
Based Image Retrieval Systems”, International Journal of Innovation, Management and Technology,
2014.
[8] Yuk Ying Chung, Ka Yee Ng, Liwei Liu, “Design of a Content Based Multimedia Retrieval System”,
Int. Conf. on Circuits, Systems, Electronics, Control & Signal Processing, 2006.
[9] B. V. Patel and B. B. Meshram, “Content Based Video Retrieval Systems”, IJU, 2012.
[10] Ray-I Chang, Shu-Yu Lin, Jan-Ming Ho, Chi-Wen Fann, and Yu-Chun Wang, “A Novel Content
Based Image Retrieval System using K-means/KNN with Feature Extraction”, ComSIS, 2012.
[11] Donald A. Adjeroh, “Multimedia Database Management Requirements and Issues”, IEEE, 1997.
[12] Tony C. T. Kuo and Arbee L. P. Chen, “Content-Based Query Processing for Video Databases”,
IEEE 2000
[13] Stephan Repp, Andreas Groß, and Christoph Meinel, “Browsing within Lecture Videos Based on the
Chain Index of Speech Transcription”, IEEE, 2008.
[14] Lexing Xie, Apostol Natsev, “Tracking Large-Scale Video Remix in Real-World Events”, IEEE, 2013.
[15] Xaro Benavent, Ana Garcia-Serrano, “Multimedia Information Retrieval Based on Late Semantic
Fusion Approaches: Experiments on a Wikipedia Image Collection”, IEEE 2013.
[16] Ling Shao, “Efficient Search and Localization of Human Actions in Video Databases”, IEEE 2014.
[17] Yushi Jing, Michele Covell, “Learning Query-Specific Distance Functions for Large-Scale Web
Image Search”, IEEE 2013.
[18] Chuanping Hu, “Semantic Link Network-Based Model for Organizing Multimedia Big Data”, IEEE
2014.
[19] Qiusha Zhu, Mei-Ling Shyu, “Sparse Linear Integration of Content and Context Modalities for
Semantic Concept Retrieval”, IEEE 2014.
[20] Solomon Atnafu, Richard Chbeir, Lionel Brunie, “Content-Based and Metadata Retrieval in Medical
Image Database”, IEEE 2002.
[21] Trupti S. Atre, K. V. Metre, “MIRS: Text Based and Content Based Image Retrieval”, IJESIT, 2014.
[22] Vinoda Reddy, P. Suresh Varma, A. Govardhan, “Multi-linear kernel mapping for feature dimension
reduction in content based multimedia retrieval system”, International Journal of Multimedia and
Its Applications, Vol.8, No.2, 2016.