A Multidimensional Approach in Content-based Multimedia Information Retrieval System

Indra Budi, Zainal A. Hasibuan, Gema P. Mindara
Faculty of Computer Science, University of Indonesia, Depok, Indonesia
indra@cs.ui.ac.id

Albaar Rubhasy
Department of Computer System, STMIK Indonesia, Jakarta, Indonesia

Abstract- In this digital era, digital multimedia information is widely used, and its volume is growing very rapidly with the development of the Internet. Consequently, users demand more effective content-based multimedia information retrieval systems (CBMMIRS). The major challenge in this research area is that a multimedia document comprises more than one type of content (e.g., text, image, audio). To address this challenge, many works have focused on developing indexing techniques that can accommodate multiple representations of multimedia objects, known as object features. However, most experiments use only one kind of collection, for example a collection of WWW pages, a video collection, or an image collection. In this paper, we propose a multidimensional approach that accommodates semantic indexing of various multimedia contents across different multimedia collections, since different multimedia documents may share similar information. The architecture comprises three components: (1) a collection manager, which manages the multimedia document repository; (2) an indexer, which handles multimedia concept detection and indexing; and (3) a query processor, which handles queries and search results. Our hypothesis is that the more complete a document is (i.e., the more feature spaces it is indexed in), the more relevant it is, and the higher it should be ranked in the search results.

Keywords- CBMMIRS, multimedia information retrieval, multidimensional approach

I. INTRODUCTION

With the development of the Internet, the use of digital multimedia information (including audio, video, images, and graphics) is growing rapidly and plays an important role in modern life. Most multimedia files are published and distributed in various formats via social media on the Internet, for instance Facebook (http://www.facebook.com), Flickr (http://www.flickr.com), and YouTube (http://www.youtube.com). As a result, there is an explosion of digital multimedia objects, and users demand more efficient yet accurate content-based multimedia information retrieval systems (CBMMIRS).

Due to the large and varied digital multimedia collections, a text-based retrieval system is considered inefficient, given the amount of human labor it requires and its level of precision. Therefore, in the early 1980s, content-based information retrieval (CBIR) was introduced to overcome these disadvantages. However, by nature a multimedia document may consist of more than one type of content, for example text, images, video, and audio. Thus, in the late 1990s, a novel approach emerged that combines text-based and content-based retrieval methods in order to boost CBMMIRS performance. Many authors describe this technique as multimodal information retrieval, in which the system indexes and retrieves documents using various object representations or modalities, such as text, color, and texture. Nevertheless, in many papers, authors used only one type of multimedia collection, such as TRECVID for video, MIRFLICKR for images, and WIKIPEDIA-MM for World Wide Web (WWW) pages.

In this paper, we propose a multidimensional approach that accommodates heterogeneous multimedia collections and a variety of multimedia contents (i.e., textual, visual, and audio). The goal of this approach is to achieve completeness of information, meaning that the most relevant information should be available in many types of content. Although this approach may be fruitful, applying a large number of object features imposes a constraint: excessive use of object features in indexing may lead to poor performance due to the well-known "curse of dimensionality" problem. As the dimensionality of the feature space increases, the performance of indexing algorithms degrades. Research has shown that when the dimensionality is above 10, the performance is no better than a simple sequential scan.

This paper explores a multidimensional approach in CBMMIRS. The rest of the paper is organized as follows. Section II reviews work related to this paper. Section III focuses on the multidimensional approach in CBMMIRS, using high-dimensional feature spaces with various types of collections. Section IV concludes the paper and discusses future work.
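The dimensionality argument above can be made concrete with a small experiment. The following Python sketch is our own illustration; it is not part of the proposed system and does not reproduce the cited study. It draws random points in the unit hypercube and measures the relative contrast (d_max - d_min) / d_min between the farthest and the nearest point from a random query. As the dimension grows, the contrast shrinks toward zero, so "nearest" neighbors become hard to tell apart, which is one intuition for why index structures lose their advantage over a sequential scan. The number of points, the dimensions, and the seed are arbitrary assumptions.

```python
import numpy as np


def relative_contrast(dim: int, n_points: int = 1000, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for one random query against random points.

    Points and the query are drawn uniformly from the unit hypercube [0, 1]^dim.
    A value near 0 means all points are almost equally far from the query,
    so the nearest neighbor is barely distinguishable from the farthest one.
    """
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)  # Euclidean distances
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(f"dim={d:5d}  relative contrast={relative_contrast(d):.3f}")
```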
II. RELATED WORKS

The building block of a CBMMIRS comprises three essential processes: (1) multimedia feature extraction; (2) concept detection; and (3) indexing. Each of these processes is discussed in the following parts.

A. Multimedia Feature Extraction

Feature extraction is one of the major tasks that determine the performance of a CBMMIRS. Many techniques are available to generate representations of multimedia content, which may combine text, visual (i.e., image), and audio content. Below we break down a few state-of-the-art feature extraction techniques for these three types of multimedia content.

1) Textual Feature Extraction

The foundational text indexing scheme is the popular tf-idf scheme proposed by Salton and McGill. This technique chooses a basic vocabulary of "terms" or "words" and counts the number of occurrences of each term in each document. This term frequency count is then combined with an inverse document frequency count. As a result, the tf-idf scheme reduces documents of arbitrary length to fixed-length lists of numbers. However, the dimension reduction achieved by this scheme is considered insignificant. The most prominent approach to this issue is latent semantic indexing (LSI). LSI uses a singular value decomposition of the tf-idf term-by-document matrix X to identify a linear subspace in the space of tf-idf features that captures most of the variance in the collection. Later, a major breakthrough was introduced by Hofmann with the probabilistic LSI (pLSI) model. This approach models each word in a document as a sample from a mixture model, where the mixture components are multinomial random variables that can be viewed as representations of "topics". However, both models (LSI and pLSI) rely on the "bag of words" assumption that the order of words in a document can be ignored. To obtain a model that captures the exchangeability of both words and documents, the latent Dirichlet allocation (LDA) model was proposed. To date, this model is widely used in IR research.

2) Image Feature Extraction

There are many ways to convert an image into a feature vector. The traditional method uses the image histogram. This method was successfully implemented in large-scale galleries and museums in Europe. However, it discards all information about the spatial distribution of color, which reduces the efficiency of the signature; this has been its major flaw. Other techniques have since been studied, using color, texture, shape, and many other features. Nevertheless, most of them cannot overcome a challenging problem in image feature extraction: describing an image robustly even when the object is occluded, rotated, and so forth. As a result, such image features are no longer ideal for CBMMIRS. Currently, most recent works use the scale-invariant feature transform (SIFT), which is well grounded and has been successfully applied in many projects [11, 13]. SIFT detects interest points in an image and describes them, which yields more information than other feature-based methods. The detection relies on a few criteria: local contrast, local maxima/minima of certain functions (e.g., Laplacian, gradient), and thresholds over a curvature function (e.g., Harris, Hessian).

3) Audio Feature Extraction

Many works have focused on structured audio analysis, such as speech or music. Only a few systems have been proposed to analyze unstructured audio. One of the popular models is the mel-frequency cepstral coefficient (MFCC). MFCC features model the shape of the overall spectrum, which makes them well suited to modeling single sound sources. An environmental sound, on the other hand, comprises more than one sound source. To tackle this issue, the matching pursuit (MP) technique was proposed. MP provides an efficient way of selecting a small basis set that produces meaningful features as well as a flexible representation. It is potentially invariant to background noise and can capture characteristics of the signal where MFCC fails. This concludes our discussion of feature extraction for the three types of multimedia content. Next, we turn to audio-visual concept detection techniques.

B. Audio Visual Concept Detection

Multimedia concept detection is considered one way of reducing the semantic gap. Previous work provides an example of a detection model that links each topic with one or more visual concepts, known as the Visual Concept Detection Task (VCDT). However, most works have focused only on visual concepts, and few on audio-visual concept detection. One example that uses both visual and audio content is an approach to semantically detect concepts in a video collection. However, in that work audio is only classified into speech and instrumental music, rather than being used to detect environmental sounds. This issue needs to be explored more thoroughly in order to improve CBMMIRS understanding of the concepts present in a multimedia document.

C. Multimedia Concept-based Indexing

Multimedia concept-based (or semantic-based) indexing depends on the fusion of concepts, for which many works use kernel-based classifiers (e.g., support vector machines, SVM). Basically, two fusion strategies are available: early fusion and late fusion. In early fusion, features from the different modalities are first fused, and a search algorithm is then executed on the new fused representation. Late fusion, on the other hand, characterizes multimedia content by processing multiple features separately; with this scheme, the different rankings can be combined, which is referred to as data fusion or rank aggregation. Nonetheless, it is possible and promising to merge these two schemes, where early fusion operates on low- or intermediate-level features and late fusion merges the unimodal classification scores of high-level features.

III. THE PROPOSED MULTIDIMENSIONAL APPROACH

We observe that many works in the CBMMIR research area involve just one type of multimedia collection, for example a video or an image collection for a content-based video or image retrieval system, respectively. Here we propose a different approach that involves different kinds of features from various types of multimedia collections (e.g., video, image, WWW, and other types of collections) in order to achieve completeness of information. Inspired by previous work that uses three different components of documents to improve retrieval performance, we propose a similar multidimensional strategy applied to multimedia documents, which also have several types of components. The proposed multidimensional approach is depicted in Figure 1.

[Figure 1 components: User Interface (Multimedia Concept-based Query); Query Processor (Multimedia Concept-based Matching Process, Search Results); Indexer (Multimedia Concept-based Indexing, Multimedia Concept-Based Index, Multimedia Concept Detection Process, Training Dataset, Text/Image/Audio Feature Extraction); Collection Manager (Digital Object Repository, Metadata Repository, Video/Image/WWW/Other Collections).]
Figure 1. Proposed multidimensional approach in CBMMIRS

Fig. 1 shows the proposed multidimensional CBMMIRS architecture, adapted from earlier work. The system comprises three components:

• Collection Manager (CM): this component is in charge of collecting and managing multimedia documents from the various types of multimedia document collections that we aim to index, search, and retrieve through the Indexer and Query Processor. Documents from different types of collections, such as video, image, WWW, and other multimedia collections, are stored in a repository along with metadata that provide information about the documents. CM also includes an administrator user interface through which an administrator can manage the document collections.

• Indexer (IX): this component is responsible for generating and maintaining the data structures, called indexes, each representing one type of multimedia document feature (i.e., text, image, and audio), in order to provide search capabilities. IX uses the documents collected by CM for indexing. The indexing process involves a feature extraction method for each type of feature: (1) for text feature extraction, we suggest LDA; (2) for image feature extraction, we use SIFT; and (3) for audio feature extraction, we intend to use the MP technique. In our system design, the indexing process involves multimedia concept-based indexing, which depends on the robustness of the multimedia concept detection method.

• Query Processor (QP): this component is responsible for handling queries and search results. QP provides a user interface for multimedia concept-based queries. The concept-based query interface differs from search tools such as Google (http://www.google.com) in that it allows users to resolve the naming heterogeneity that occurs when the same concept is described using different terms.

The research issues that may arise in our work are as follows:

• Feature extraction techniques. The extraction techniques mentioned earlier, such as LDA, SIFT, and MP, are state-of-the-art feature extraction methods. Nevertheless, finding the "right combination" is one of the main problems. Which features of a multimedia object to choose, and which extraction technique to prefer for each feature in order to increase CBMMIRS performance, remains a challenging research question.

• Multimedia concept detection method. Much work has been done on automatically detecting multimedia concepts. However, generic concepts of multimedia objects, including audio-visual collections, have not been explored comprehensively. Standardized visual concepts are available, such as Wiki concepts and the Visual Concept Detection Task topics. In contrast, no standardized concepts for audio are in place. Hence, multimedia concept detection methods need to be explored further in order to accommodate the audio-visual features of a multimedia object.

• Multimedia concept-based matching process. In the matching process, we propose a different way of ranking retrieved documents. Our hypothesis is that the more complete a document is, i.e., the more feature spaces it is available in, the more relevant it is. Therefore, such documents should be weighted more heavily in order to raise their rank. This hypothesis has to be validated in experiments that will be performed in the next phase of this work.

• Multimedia concept-based query interface. As stated earlier, a concept-based query interface differs from general search tools. The issue in this research area is to minimize the ambiguity caused by different terms describing similar concepts.
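The text indexing techniques reviewed in Section II-A, whose probabilistic successor (LDA) is suggested for the Indexer, can be illustrated with a short NumPy sketch of tf-idf followed by LSI. It assumes the weighting tf x log(N/df), which is only one of several tf-idf variants, and a truncated singular value decomposition of the term-by-document matrix X; the toy corpus and the latent rank k = 2 are assumptions for illustration. pLSI and LDA themselves are not shown.

```python
import numpy as np


def tfidf_matrix(docs: list[list[str]]) -> tuple[np.ndarray, list[str]]:
    """Build a term-by-document tf-idf matrix X (rows = terms, columns = documents).

    Weighting: raw term count * log(N / df), one common tf-idf variant.
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            tf[index[term], j] += 1
    df = np.count_nonzero(tf, axis=1)  # number of documents containing each term
    idf = np.log(len(docs) / df)
    return tf * idf[:, None], vocab


def lsi_embed(X: np.ndarray, k: int) -> np.ndarray:
    """Project documents onto the top-k singular directions of X (LSI).

    Returns a (k x n_docs) matrix; each column is one document in the latent space.
    """
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    return np.diag(s[:k]) @ vt[:k, :]


if __name__ == "__main__":
    corpus = [
        "batik motif cloth pattern".split(),
        "batik cloth dye".split(),
        "gamelan music gong".split(),
        "gamelan gong audio recording".split(),
    ]
    X, vocab = tfidf_matrix(corpus)
    Z = lsi_embed(X, k=2)
    print("vocabulary size:", len(vocab), "-> latent dimensions:", Z.shape[0])
```

In the latent space, the two "batik" documents point in nearly the same direction and are nearly orthogonal to the "gamelan" documents, even though the vocabulary has ten dimensions.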
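The weakness of the global image histogram noted in Section II-A, namely that it discards the spatial distribution of color, is easy to demonstrate. This NumPy sketch quantizes each RGB channel into 4 levels (64 joint bins, an arbitrary choice) and compares histograms with histogram intersection; it is an illustration, not the method of any cited system. Two images containing the same pixels in different layouts receive identical histograms.

```python
import numpy as np


def color_histogram(img: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Global color histogram of an RGB uint8 image (H x W x 3).

    Each channel is quantized into `bins_per_channel` levels, giving
    bins_per_channel**3 joint color bins. The result is L1-normalized.
    """
    q = (img.astype(np.int64) * bins_per_channel) // 256  # per-channel bin in 0..b-1
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())


if __name__ == "__main__":
    # Image A: red left half, blue right half. Image B: same pixels, red top, blue bottom.
    a = np.zeros((8, 8, 3), dtype=np.uint8)
    a[:, :4] = (255, 0, 0)
    a[:, 4:] = (0, 0, 255)
    b = np.zeros((8, 8, 3), dtype=np.uint8)
    b[:4, :] = (255, 0, 0)
    b[4:, :] = (0, 0, 255)
    # The layouts differ, yet the histograms are identical.
    print("histogram intersection:", histogram_intersection(color_histogram(a), color_histogram(b)))
```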
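The matching pursuit technique that the Indexer intends to use for audio can be sketched as a greedy decomposition: at each step, pick the dictionary atom most correlated with the current residual, record its coefficient, and subtract its contribution. The sketch below assumes a small dictionary of unit-norm Gabor-like atoms (Gaussian-windowed cosines) built by a hypothetical helper `gabor_dictionary`; the scales, frequencies, and step size are arbitrary. The cited MP-based features are derived from the atoms that MP selects; this sketch only shows the decomposition step itself.

```python
import numpy as np


def gabor_dictionary(n: int, scales=(8, 32), freqs=(0.05, 0.1, 0.2), step: int = 8) -> np.ndarray:
    """Hypothetical helper: unit-norm Gaussian-windowed cosines (Gabor-like atoms).

    Returns an (n_atoms x n) matrix; each row is one atom of length n.
    """
    t = np.arange(n)
    atoms = []
    for s in scales:
        for f in freqs:
            for u in range(0, n, step):  # atom centre positions
                g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(2 * np.pi * f * t)
                norm = np.linalg.norm(g)
                if norm > 1e-8:
                    atoms.append(g / norm)
    return np.array(atoms)


def matching_pursuit(x: np.ndarray, D: np.ndarray, n_iter: int = 5):
    """Greedy MP: repeatedly select the atom with the largest |<residual, atom>|.

    D must have unit-norm rows. Returns (selected indices, coefficients, residual).
    """
    residual = x.astype(float).copy()
    indices, coeffs = [], []
    for _ in range(n_iter):
        corr = D @ residual  # inner product of the residual with every atom
        k = int(np.argmax(np.abs(corr)))
        indices.append(k)
        coeffs.append(float(corr[k]))
        residual = residual - corr[k] * D[k]  # remove the chosen component
    return indices, coeffs, residual


if __name__ == "__main__":
    n = 128
    D = gabor_dictionary(n)
    # Synthetic signal: two atoms plus a little noise.
    rng = np.random.default_rng(0)
    x = 3.0 * D[5] + 1.5 * D[40] + 0.05 * rng.standard_normal(n)
    idx, c, r = matching_pursuit(x, D, n_iter=3)
    print("selected atoms:", idx)
    print("residual energy ratio:", np.linalg.norm(r) ** 2 / np.linalg.norm(x) ** 2)
```

Because every atom has unit norm, each step removes exactly the squared coefficient from the residual energy, so a few iterations capture the dominant structure of the signal.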
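The difference between the two fusion strategies described in Section II-C can be shown with a small sketch over per-modality feature vectors. Early fusion concatenates the modality features and scores the document once; late fusion scores each modality separately and combines the unimodal scores, here with a weighted sum in the spirit of rank aggregation. Cosine similarity, the dictionary layout, and the equal weights are our assumptions; the SVM classifiers used in the cited works are not reproduced.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0


def early_fusion_score(query: dict, doc: dict) -> float:
    """Early (feature-level) fusion: concatenate modality features, then score once."""
    mods = sorted(query)  # fixed modality order, e.g. ["audio", "image", "text"]
    q = np.concatenate([query[m] for m in mods])
    d = np.concatenate([doc[m] for m in mods])
    return cosine(q, d)


def late_fusion_score(query: dict, doc: dict, weights: dict) -> float:
    """Late (decision-level) fusion: score each modality, then combine the scores."""
    return sum(weights[m] * cosine(query[m], doc[m]) for m in query)


if __name__ == "__main__":
    query = {"text": np.array([1.0, 0.0, 2.0]), "image": np.array([10.0, 10.0]), "audio": np.array([0.5])}
    doc = {"text": np.array([1.0, 0.0, 2.0]), "image": np.array([10.0, -10.0]), "audio": np.array([0.5])}
    w = {"text": 1 / 3, "image": 1 / 3, "audio": 1 / 3}
    print("early:", round(early_fusion_score(query, doc), 3))
    print("late: ", round(late_fusion_score(query, doc, w), 3))
```

Here the large, unnormalized image features dominate the concatenated vector, so early fusion reports low similarity even though two of the three modalities match exactly. L2-normalizing each block before concatenation is a common remedy; with cosine scoring and equal weights, it makes the two scores coincide, and the strategies diverge again once a learned model is trained on the fused vector.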
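One way to operationalize the matching hypothesis is to boost a document's base relevance score by the fraction of feature spaces (text, image, audio) in which it is indexed. The sketch below is only an illustration under our own assumptions: the multiplicative boost score x (1 + alpha x completeness), the parameter `alpha`, and the `Doc` structure are hypothetical, since the weighting scheme is left to future experiments.

```python
from dataclasses import dataclass, field

FEATURE_SPACES = ("text", "image", "audio")


@dataclass
class Doc:
    """Hypothetical retrieved document: an id, a base relevance score, and the
    feature spaces in which the document has been indexed."""
    doc_id: str
    base_score: float
    spaces: set = field(default_factory=set)


def completeness(doc: Doc) -> float:
    """Fraction of the known feature spaces in which the document is indexed."""
    return len(doc.spaces & set(FEATURE_SPACES)) / len(FEATURE_SPACES)


def rerank(docs: list, alpha: float = 0.5) -> list:
    """Sort by base_score * (1 + alpha * completeness), highest first.

    alpha = 0 reproduces the original ranking; larger alpha favours documents
    that are indexed in more feature spaces.
    """
    return sorted(docs, key=lambda d: d.base_score * (1 + alpha * completeness(d)), reverse=True)


if __name__ == "__main__":
    results = [
        Doc("web-page", 0.80, {"text"}),
        Doc("video", 0.70, {"text", "image", "audio"}),
        Doc("photo", 0.75, {"image"}),
    ]
    for d in rerank(results):
        print(d.doc_id, round(d.base_score * (1 + 0.5 * completeness(d)), 3))
```

With alpha = 0.5 the video, indexed in all three feature spaces, moves from last to first place; with alpha = 0 the original order by base score is kept. Whether such a boost improves retrieval is exactly what the planned experiments would need to show.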
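The naming heterogeneity that the concept-based query interface must resolve can be illustrated with a toy mapping from surface terms to concept identifiers, so that queries using different terms for the same concept retrieve the same documents. The vocabulary (including the Indonesian terms "mobil" and "anjing"), the concept identifiers, and the index below are made up for illustration; a real system would draw on a curated concept vocabulary, which, as noted above, is still missing for audio.

```python
# Hypothetical term -> concept vocabulary (several terms may name one concept).
TERM_TO_CONCEPT = {
    "car": "C:automobile", "automobile": "C:automobile", "mobil": "C:automobile",
    "dog": "C:dog", "puppy": "C:dog", "anjing": "C:dog",
}

# Hypothetical concept-based index: concept id -> ids of documents annotated with it.
CONCEPT_INDEX = {
    "C:automobile": {"video-12", "photo-3"},
    "C:dog": {"photo-7", "audio-2"},
}


def query_concepts(query: str) -> set:
    """Map each known query term to its concept; unknown terms are ignored."""
    return {TERM_TO_CONCEPT[t] for t in query.lower().split() if t in TERM_TO_CONCEPT}


def concept_search(query: str) -> set:
    """Return the documents annotated with any concept mentioned in the query."""
    hits = set()
    for concept in query_concepts(query):
        hits |= CONCEPT_INDEX.get(concept, set())
    return hits


if __name__ == "__main__":
    # Different surface terms for the same concept return the same documents.
    print(sorted(concept_search("automobile")) == sorted(concept_search("mobil")))
```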
IV. CONCLUSION AND FUTURE WORKS

This paper proposes a multidimensional approach in CBMMIRS that can accommodate various types of multimedia object features (i.e., text, image, and audio) across numerous multimedia document collections. In our design, the system comprises three components: (1) a collection manager, responsible for storing multimedia document collections; (2) an indexer, responsible for extracting and indexing document features so that users can search them; and (3) a query processor, responsible for managing queries and search results. We also identify several research issues in these three CBMMIRS components. Further experiments need to be conducted, not only to test retrieval performance but also to validate our hypothesis: the more complete a document is (i.e., the more feature spaces it is indexed in), the more relevant it is compared to documents indexed in only one feature space, and thus the higher it should be placed in the search results.

ACKNOWLEDGMENT

This paper was fully supported by the DRPM UI Research Grant under contract number 1198/SK/R/UI/2010 (research project on the Indonesian e-Cultural Heritage and Natural History Framework).

REFERENCES

M. J. Huiskes and M. S. Lew, "The MIR Flickr Retrieval Evaluation", MIR '08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, ACM, New York, USA, 2008, ISBN: 978-1-60558-312-9, DOI: 10.1145/1460096.1460104.

A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation Campaigns and TRECVid", MIR '06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, ACM, New York, USA, 2006, ISBN: 1-59593-495-2, DOI: 10.1145/1178677.1178722.

A. Popescu, T. Tsikrika, and J. Kludas, "Overview of the Wikipedia Retrieval Task at ImageCLEF 2010", Working Notes of the ImageCLEF 2010 Lab, Padua, Italy, 2010.

N. Rasiwasia, J. C. Pereira, E. Coviello, and G. Doyle, "A New Approach to Cross-Modal Multimedia Retrieval", Proceedings of the International Conference on Multimedia, October 25-29, 2010, ACM, New York, USA, ISBN: 978-1-60558-933-6, DOI: 10.1145/1873951.1873987.

R. Weber, H.-J. Schek, and S. Blott, "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces", Proceedings of the 24th VLDB Conference, New York, USA, 1998, pp. 194-205.

M. M. Rahman, B. C. Desai, and P. Bhattacharya, "A Feature Level Fusion in Similarity Matching to Content-based Image Retrieval", Information Fusion, 2006.

G. Salton and M. McGill, "Introduction to Modern Information Retrieval", McGraw-Hill, 1983.

S. Deerwester, S. Dumais, T. K. Landauer, G. Furnas, and R. Harshman, "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, 41(6):391-407, 1990.

T. Hofmann, "Probabilistic Latent Semantic Indexing", Proceedings of the Twenty-Second Annual International SIGIR Conference, 1999.

D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation", Journal of Machine Learning Research 3, 2003, pp. 993-1022.

P. H. Lewis, K. Martinez, F. S. Abas, M. F. A. Fauzi, S. C. Y. Chan, M. J. Addis, M. J. Boniface, P. Grimwood, A. Stevenson, C. Lahanier, and J. Stevenson, "An Integrated Content and Metadata Based Retrieval System for Art", IEEE Transactions on Image Processing, vol. 13, March 2004, pp. 302-313.

E. Valle, M. Cord, and S. Philipp-Foliguet, "Content-based Retrieval of Images for Cultural Institutions using Local Descriptors", Proceedings of Geometric Modelling and Imaging: New Trends (GMAI 2006), London, England, July 5-6, 2006, DOI: 10.1109/GMAI.2006.16.

M. Kampel, R. Huber-Mörk, and M. Zaharieva, "Image-Based Retrieval and Identification of Ancient Coins", IEEE Intelligent Systems, vol. 24, issue 2, March 2009, IEEE Educational Activities Department, Piscataway, NJ, USA, pp. 26-34, DOI: 10.1109/MIS.2009.29.

S. Chu, S. Narayanan, and C.-C. J. Kuo, "Environmental Sound Recognition Using MP-based Features", Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2008.

Z. Zhao and H. Glotin, "Concept Content Based Wikipedia WEB Image Retrieval using CLEF VCDT 2008".

M. Rautiainen, T. Seppänen, J. Penttilä, and J. Peltola, "Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification".

S. Ayache, G. Quénot, and J. Gensel, "Classifier Fusion for SVM-Based Multimedia Semantic Indexing".

Z. A. Hasibuan, "Multi Dimensions Concept-based Information Retrieval System", Proceedings of the ALLC/ACH 2000 Conference, Glasgow, UK, 2000.

Z. A. Hasibuan, A. Kurniawan, and R. Budiarto, "Multi-Format Concept-Based Information Retrieval using Data Grid", Journal of Advanced Computing and Applications, vol. 1, no. 1, 2009, pp. 1-11.