Cccnc using content-based filtering in a system of recommendation in the context of digital mobile interactive tv

Recommendation systems provide suggestions based on information about users' preferences. Recommender systems rely on information filtering to process information and make suggestions to users, and content-based filtering is a widely used approach to information filtering. Content-based filtering analyzes the correlation between the content of items and the user profile, suggesting relevant items and discarding irrelevant ones. Widely used on the Internet, recommendation systems are now being studied for use in Digital TV, and several studies already point in this direction. As on the Internet, recommendation systems can be used in Digital TV to recommend TV programs, advertising and electronic commerce. Thus, the items in the context of Digital TV may be programs, advertisements and products for sale. When content-based filtering is applied to program recommendation, for example, it should correlate the content of the programs with the user's preferences, which in this scenario are the types of programs the user has preferred to watch. This paper presents studies carried out with content-based filtering applied to Digital TV. The studies seek to observe and evaluate how some content-based filtering techniques can be used in recommendation systems in the context of Digital TV.


Using Content-Based Filtering in a System of Recommendation in the Context of Digital Mobile Interactive TV

Elaine Cecília Gatto, Sergio Donizetti Zorzo
IEEE Conference Publishing
Computer Science Department, Federal University of São Carlos (UFSCar). Highway Washington Luís, Km 235, PO Box 676, 13565-9. São Carlos, São Paulo, Brazil. {elaine_gatto, zorzo}@dc.ufscar.br

I. INTRODUCTION

The implementation of Digital TV in Brazil opens new markets to be explored. Technologies that have succeeded in the Web environment, for example, can be applied in the Digital TV domain and achieve the same success.

User interaction, whether through the remote control or the cell phone keyboard, will allow many applications to be carried over to this environment.

One area that has been extensively studied and has succeeded on the Web is personalization. There are several studies on recommendation systems for Digital TV, for example [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], among others.

Recommendation systems can contribute to a better use of Digital TV in residences, by groups or by individuals, or on a cell phone, for example. These systems can help the user choose a program, avoiding wasted time and suggesting programs that really interest the user. Moreover, recommendation systems can be applied to advertising on Digital TV, as well as to T-Commerce.

This work is organized as follows: Section 1 introduces the paper, Section 2 compares IDTV in residences with IDTV on cell phones, Section 3 presents related work, Section 4 discusses content-based filtering, Section 5 presents our recommendation system, Section 6 describes the characteristics of the households, the EPG, the user history and the methodology used for the tests, Section 7 discusses the results and Section 8 presents the conclusion.

II. COMPARING IDTV IN RESIDENCES AND IDTV FOR CELL PHONES

The use of IDTV on cell phones is expected to grow quickly once cell phones with IDTV are available to the population, since these devices increasingly outnumber television sets in Brazil. Some differences between IDTV for residences and IDTV for cell phones can therefore be noted.

The IDTV standard adopted in Brazil calls fixed devices such as set-top boxes full-seg, and devices such as cell phones, mini TVs and PDAs one-seg. In residences, IDTV is used by all residents, while on a cell phone it is normally used by only one user, the owner of the device.

Another difference is the size of the display. In residences, IDTV television sets have screens larger than 30", which allows more flexibility in developing, presenting and displaying content. Cell phone screens, however, are smaller than 10", so more development effort is needed to display content without cluttering the screen and confusing the user.

A distinctive characteristic of this environment is that IDTV on cell phones can be watched anywhere and at any time. On the other hand, viewing periods in residences can be longer than on cell phones, which are used in situations of waiting and commuting.

IDTV on cell phones can use the existing 2G/3G network architecture, and 4G in the future, as a return channel, making interactivity possible in this environment before it occurs in IDTV.

The middleware adopted in Brazil is a national technology called Ginga. Both the declarative portion (Ginga-NCL) and the imperative portion (Ginga-J) of the middleware are required for full-seg devices. For one-seg devices, only the declarative Ginga-NCL portion is required. There is a reference implementation of the middleware for full-seg devices. For one-seg devices, a reference implementation is not available yet, but groups such as PUC-RIO and UFES are working on its development (Symbian and Android). [11, 12, 13]

The users of these devices need special attention because of the current constraints of this environment, such as processing power, storage capacity and battery life.

III. RELATED WORK

There are many works on recommendation systems for IDTV on set-top boxes and, more recently, on portable devices. This section presents three recent works on recommendation systems for IDTV.

In [5] the recommendation system falls into the content-based filtering category and uses text mining. The system has a simple user interface and accepts natural-language text as input, as well as four values that reflect the user's preferences for comedy, action, horror and erotic content. First, the system extracts the text, then it searches the text for emotions and calculates the distances among themes. Finally, an index is calculated for each entry and a list of programs ordered by this index is returned.

In [3] the main aim of the system is to replace common content with personalized and adapted content that is more attractive to the user. To this end, the system supports TV reception through either broadcast or multimedia streaming. The system uses explicit collection (on first use the user must state preferences) and also implicit collection (the user's actions on the device are monitored, stored and sent to the server). The personalized content, chosen according to the preferences, is sent by the server to the user's portable device and stored there before being shown.

ZapTV [4], developed for the DVB-H standard, allows users to create their own content and offers value-added services such as multimodal access (Web and cell phones), a return channel, video notes, and personalized sharing and distribution of content. Besides the technology provided by DVB-H, ZapTV includes other technologies such as TV-Anytime and technologies that emerged from Web 2.0 and the Semantic Web. The main functionalities of ZapTV include a social network, personalized content broadcasting (implicit or explicit recommendation), planning of thematic channels (by age group, genre or specific theme), a client application and transmission of the electronic programming guide. ZapTV seeks to improve recommendation with an intelligent personalization mechanism that combines information filtering with semantic reasoning. It is based on the principles of participation and sharing among Web 2.0 users, so that the creation, sharing, classification and notes on content make the search for content easier.

Our recommendation system runs on the portable device. Including servers in the Brazilian IDTV architecture for cell phones is not necessary to provide recommendations, so remote communication is not needed either. This spares the user from paying for data traffic to receive recommendations or send data, and thus protects the privacy of the user's data.

IV. CONTENT-BASED FILTERING

Content-Based Filtering (CBF) uses content attributes to describe the items and then calculates their similarity. This approach does not depend on other users' evaluations of the items. CBF is an information retrieval technique that bases its predictions on the assumption that users' previous preferences are reliable indicators of future behavior. To formulate recommendations, a variety of algorithms have been proposed to evaluate the content of documents and find regularities. Some of these algorithms treat the task as a classification problem and others as a regression problem. Some of the problems and limitations found in systems using CBF are overspecialization, the new-user problem and limited content analysis. [7, 6, 14]

V. RECOMMENDER SYSTEM

Our recommendation system aims to simplify the IDTV user's routine through a simple interface that provides content matching the user's preferences without the user spending much time to find it.

The process starts when the user turns on the TV on the cell phone. The collected user history data are submitted to content-based information filtering in order to find the user's profile. The data resulting from this process are formatted. The user profile is stored in a database with the date and time of its generation. With the user profile updated, the EPG can be searched for compatible TV programs that will be broadcast around the current time, producing a list of these programs. This list is also stored in a database with the date and time of its generation.

The recommendations are presented to the user, and those the user requests are stored with the user history. While IDTV on the cell phone is turned on, all programs viewed by the user are stored in the database that holds the user history. This process is repeated every time the user turns on the TV.

The Ginga-NCL middleware has a layer of resident applications responsible for presentation, another layer, the common core, responsible for offering several services, and a last layer for the protocol stack.

The recommendation system is treated as an element of the Ginga-NCL architecture, in the Ginga Common Core, because it needs to keep using data locally and also to use the Tunner library (to obtain information about the tuned channels), ESInformation (to obtain information about the EIT table that generates the EPG) and the Context Manager (to obtain system information).

As the Ginga-NCL middleware is mandatory in Brazil for portable devices, the recommendation system was planned, designed and modeled according to the Brazilian rules for portable devices, thus meeting the needs of these devices. More details on our system can be found in [15].

VI. TESTS

For the tests we used data on TV viewing and program schedules. These data were provided by IBOPE, a privately held Brazilian multinational and a leading market research company in Latin America. For 67 years IBOPE has provided a wide range of information and studies on media, public opinion, voting intention, consumption, brands and market behavior. The following subsections detail the characteristics of these data and the tests. [16]

A. Characteristics of Residences

The data provided by IBOPE contain the EPG (TV programming), the users' viewing history (what the viewer watched) and socioeconomic information. All these data were separated and stored in a MySQL database. The data correspond to 15 days of programming and monitoring of six Brazilian households with free-to-air TV. These households were monitored every minute, and each individual was also monitored separately.

TABLE I. NUMBER OF INDIVIDUALS BY RESIDENCE

Residence    1  2  3  4  5  6
Individuals  2  3  3  2  2  3
TVs          1  1  2  2  1  2

B. Characteristics of the Data

The data used for these tests underwent a process of manual adjustment. Each of the algorithms used required pre-processing for correct use and analysis. Subsections C, D, E and F detail the composition of these data.

C. EPG

The EPG (Electronic Program Guide) is composed of 15 TXT files called programming files, one for each day (05/03/2008 to 19/03/2008), with a grid of 10 free-to-air TV stations, starting at 00:00:00 and ending at 05:59:00. After the files that make up the EPG were understood, the data were copied from the programming files into a BrOffice spreadsheet and unnecessary data were cleaned out.

We noticed some inconsistencies in the schedules, which were corrected immediately so that later analysis would not produce erroneous results. This was repeated for each of the 15 programming files, producing a single spreadsheet containing the entire 15 days of EPG.

The day of the week and the duration of each program were added to these data. At this step the EPG was still incomplete, since the genre and subgenre of each program were missing. We searched the official website of each station for the genre of the programs it broadcasts and then classified them according to the Brazilian standard ABNT NBR 15603-3:2007, Annex C, "genre descriptor in the content descriptor." [17]

A new table was created, identical to the EPG table but with added fields named after the genres, to be used with the cosine technique. These fields were populated with 0 or 1 depending on whether or not the program fits that genre, forming a matrix.

D. User History

Users' historical viewing data are needed to discover their preferences. In the digital TV context considered here, these data are collected and stored implicitly.

The tuning spreadsheets sent by IBOPE, which contain the user data, were modified so that content-based filtering techniques could be applied.

The data were then separated by household. Although some homes have more than one TV, there is no record of more than one TV being monitored at the same time in these households, so each household is treated as having only one TV.

The data were also formatted: date as yyyy-mm-dd, time as hh:mm:ss and TV as 00X. The resulting sheets were converted to CSV files, which were then loaded into MySQL. Information on the day of the week, the time of day and the viewing duration was also added to the user data.

E. Methodology

We simulated content-based filtering using two different techniques, Apriori and cosine, both with genre as the target attribute. For Apriori, we applied the settings shown in Figure 1.

We then started the simulation for Apriori. The process was the same for each household: the first day's data were used to generate recommendations for the second day; the first and second days' data together were used to generate recommendations for the third day; and so on.

First we opened the CSV file corresponding to household X and day 1. Next we converted some attributes from String to Nominal and others from Numeric to Nominal (by applying filters). We then ran Apriori and saved the output. With the data saved, it was possible to assess whether, on the following day, someone from the household watched any of the genres found by Apriori.

For the cosine, we first found and saved the profile, then calculated the cosine distance, the cosine and the norm, and finally counted the correct answers. The process is iterative and is performed for each day and each household.

F. Apriori and Cosine

Association algorithms identify associations between data records that are somehow related. The basic premise is that the presence of some items in a transaction is evidence of the presence of others in the same transaction, which makes it possible to determine which things are related.
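As a rough illustration of the association idea, the sketch below computes the two standard measures for a rule X ==> Y: support (the fraction of all transactions that contain both X and Y) and confidence (among the transactions that contain X, the fraction that also contain Y). It is a minimal sketch, not the configuration used in the tests; the `history` records and the item names ("night", "movie" and so on) are hypothetical and only stand in for viewing data.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], items: Transaction) -> float:
    """Fraction of all transactions that contain every item in `items`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction],
               antecedent: Transaction,
               consequent: Transaction) -> float:
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = sum(1 for t in transactions if antecedent <= t)
    if base == 0:
        return 0.0
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return both / base


# Hypothetical viewing records: each one pairs a time of day with a genre.
history: List[Transaction] = [
    frozenset({"night", "movie"}),
    frozenset({"night", "movie"}),
    frozenset({"night", "news"}),
    frozenset({"morning", "news"}),
]

rule_support = support(history, frozenset({"night", "movie"}))
rule_confidence = confidence(history, frozenset({"night"}), frozenset({"movie"}))
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Apriori itself goes further than this: it enumerates candidate itemsets level by level and keeps only those above a minimum support before deriving rules, which the sketch does not attempt.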
Figure 1. Parameters used in Apriori.

Association rules link objects in an attempt to expose patterns and trends. The discovery of associations should reveal both trivial and non-trivial associations.

The Apriori algorithm is often used to mine association rules. Apriori can work with a large number of attributes, generating various combinations of them and performing successive searches across the database while maintaining good performance in terms of processing time.

The algorithm tries to find all the relevant association rules between items, which have the form X (antecedent) ==> Y (consequent). If x% of the transactions that contain X also contain Y, then x% is the confidence factor of the rule. The support factor is the percentage of cases in which X and Y occur together over the total number of records (their frequency). [18]

The cosine is a similarity measure, a metric that can be applied to find out whether or not an item correlates with the user profile. Consider two binary vectors, x and y, in an n-dimensional space, where n is the number of items in each vector. The cosine between the vectors can then be calculated and taken as the similarity between the user profile and the user's history. The similarity is high when the cosine value is high: the closer to 1, the greater the similarity. [19]

VII. RESULTS

Excel spreadsheets were built for all households to compute the success percentage of each technique for each household. Given the number of recommendations generated, the basic formula for calculating the percentage of correct answers was used:

[Equation (1)]

Moreover, we also calculated the average percentage of correct answers over the number of recommendations generated, using the following formula:

[Equation (2)]

Figure 2 and Figure 3 show the charts of the average hit percentages calculated for each household for the cosine and Apriori techniques. Households 2, 4 and 6 had the best results for the cosine, and households 4 and 6 for Apriori. As can be seen in Figure 4, which compares the two techniques overall, the cosine outperformed Apriori.

Figure 2. Parameters used in Cosine.

Figure 3. Parameters used in Apriori.

Figure 4. Comparison between means of Apriori and Cosine.
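The cosine technique of Section VI can be sketched in a few lines. The example encodes each program as a 0/1 genre row, as in the genre table built from the EPG, builds a profile from the rows of watched programs, and ranks programs by the cosine between the profile and each row, that is, the dot product divided by the product of the two vector norms. This is a minimal sketch under assumptions, not the authors' implementation: the genre list, the program names, and the choice to build the profile by summing the watched rows are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical genre columns; the paper takes genres from ABNT NBR 15603.
GENRES = ["news", "movie", "sports", "soap_opera"]


def genre_row(program_genre: str) -> List[int]:
    """Encode one program as a 0/1 row with one column per genre."""
    return [1 if g == program_genre else 0 for g in GENRES]


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between x and y; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def build_profile(watched: List[Sequence[int]]) -> List[int]:
    """Assumption: the profile is the column-wise sum of the watched rows."""
    return [sum(col) for col in zip(*watched)]


def recommend(profile: Sequence[float],
              epg: Dict[str, Sequence[int]],
              top_n: int = 2) -> List[str]:
    """Return the top_n program names ordered by cosine with the profile."""
    ranked = sorted(epg, key=lambda name: cosine(profile, epg[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical history: two movies and one news program were watched.
watched = [genre_row("movie"), genre_row("movie"), genre_row("news")]
profile = build_profile(watched)

# Hypothetical EPG entries for the next day.
epg = {
    "Evening News": genre_row("news"),
    "Late Movie": genre_row("movie"),
    "Football": genre_row("sports"),
}

print(recommend(profile, epg))
```

Summing the rows weights genres by how often they were watched; a strictly binary profile (1 if the genre was ever watched) would match the 0/1 description more literally but produces more ties between programs.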
VIII. CONCLUSION

During the tests, we observed some peculiarities. Our system recommends content based on program genres, and our analysis was made according to that parameter. For the Apriori algorithm, the data are already collected in the correct format for use. For the cosine, the EPG needs to be converted to a matrix before the discovery of profiles and recommendations can start.

Apriori can mine only the user's viewing history, discovering the profile from the rules. To select the programs to be recommended, another technique must be used. The cosine can do both. However, Apriori can discover more features in the user history, for example, "the user is in front of the TV more often at night, enjoys watching movies, and watches TV more frequently on Mondays."

The cosine cannot find these features, but it can reach our goal. To find patterns similar to those given by association rules, more complex database queries are needed.

The output of Apriori must be processed to generate the correct user profile, i.e., the rules must be interpreted, which is somewhat cumbersome to implement. The cosine output is more readable and goes straight to the intended goal, so it can be used without post-processing.

The input to Apriori needs no treatment, since the data are used as they are collected. For the cosine, however, whenever the EPG is updated, the table containing the EPG matrix must be modified to match the new EPG, which is somewhat laborious.

We then simulated, with both techniques, the delivery and acceptance of recommendations by calculating the percentage of correct answers and generating charts. The genre profiles found by the two algorithms are similar. Although both techniques cover the needs of the system, the cosine is the one that can be put to better use.

On a desktop, as in our tests, the cosine calculation returns faster than the Apriori association rules. However, further studies of the processing cost of these algorithms on a cell phone with IDTV are not yet possible in Brazil. The time the whole recommendation process takes to complete varies according to the personalization technique used. In our tests and simulations, the cosine finished the process before Apriori.

The studies show that, although both algorithms meet our needs, of the two techniques the cosine is better suited to the recommendation system for portable interactive digital TV.

ACKNOWLEDGMENT

We thank IBOPE for providing real data on the electronic program guide and on viewer behavior from March 5, 2008 to March 19, 2008.

REFERENCES

[1] Avila, P. M. TV Recommender: Application Development Support of Recommendation for the Brazilian System of Digital TV. Dissertation. Graduate in Computer Science. Department of Computer Science. Federal University of Sao Carlos. 90 pages, 2010.
[2] Lucas, A. Customization for Digital TV using the strategy of Recommender System for multiuser environments. Dissertation. Graduate in Computer Science. Department of Computer Science. Federal University of Sao Carlos. 103 pages, 2009.
[3] Uribe, S. et al. Mobile TV Targeted Advertisement and Content Personalization. In 16th International Workshop Conference on Systems, Signals and Image Processing, Chalkida, Greece, 18-19/06/2009.
[4] Solla, A. G. et al. ZapTV: Personalized User-Generated Content for Handheld Devices in DVB-H Mobile Networks. In Proceedings 6th European Interactive TV Conference, p. 193-203, Salzburg, Austria, 03-04/07/2008.
[5] Bär, A. et al. A Lightweight Mobile TV Recommender: Towards a One-Click-to-Watch Experience. In Proceedings 6th European Interactive TV Conference, p. 142-147, Salzburg, Austria, 03-04/07/2008.
[6] Einarsson, O. P. Content Personalization for Mobile TV Combining Content-Based and Collaborative Filtering. Master Thesis. Center for Information and Communication Technologies. Technical University of Denmark. August 22, 2007.
[7] Chorianopoulos, K. Personalized and mobile digital TV applications. In Proceedings of the Multimedia Tools and Applications, p. 1-10, vol. 36, 27 January 2007.
[8] Choi, J. Y.; Koh, D.; Lee, J. Ex-ante simulation of mobile TV market based on consumers' preference data. In Proceedings of the Technological Forecasting & Social Change, p. 1043-1053, 2007.
[9] Yu, Z. et al. TV program recommendation for multiple viewers based on user profile merging. In Proceedings of the User Model User-Adap Inter, p. 63-82, 2006.
[10] Das, D. and ter Horst, H. Recommender Systems for TV. In Proceedings of 15th AAAI Conference, Madison, Wisconsin, July 1998.
[11] Ginga. Available at: <http://www.ginga.org.br/>, accessed January 6, 2010. http://www.ginga.org.br/
[12] Ginga-NCL. Available at: <http://www.ginga.org.br/>, accessed January 6, 2010. http://www.gingancl.org.br/
[13] Ginga-J. Available at: <http://www.ginga.org.br/>, accessed January 6, 2010. http://dev.openginga.org/
[14] Pazzani, M. J. A framework for Collaborative, Content-Based and Demographic Filtering. Artificial Intelligence Review, p. 393-408, December 1999.
[15] Gatto, Elaine C., Zorzo, Sergio D. Recommender System for Digital TV with Portable Interactive Brasileira. In 8th International Information and Telecommunication Technologies Symposium, December 09-11, 2009. Florianopolis, Santa Catarina, Brazil.
[16] IBOPE. Available at: <www.ibope.com.br>
[17] ABNT NBR 15603-2. Digital terrestrial television – Multiplexing and service information (SI) Part 2: Data structure and definition of basic information of SI.
[18] Witten, I. H.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, 525 pages, June 2005.
[19] Cristo, M. Sistemas de Recomendação, Métodos e Avaliação. 81 slides. 2009.
