ZieOok (‘AlsoSee’)
Building a generic recommendation framework for the cultural heritage field
by Siem Vaessen - managing partner @ Zimmerman & Zimmerman
Berlin Buzzwords 2011
About Images for the Future
  Preserving the audiovisual heritage of the Netherlands through conservation and digitization;
  Seven-year project, started in 2007, will end in 2014;
  Budget of €154 million;
  During the project, a total of 137,200 hours of video, 22,510 hours of film, 123,900 hours of audio, and 2.9 million photos from these archives will be restored, preserved, digitized, and disclosed through various services.
So what to do with all this data, besides digitization...
Current status
  + loads of interfaces, applications and tools built on top of this content
  More info @ http://imagesforthefuture.com/en/
Main purpose of ZieOok (‘AlsoSee’)
  “To create meaningful relations between assets and users by means of a recommendation engine” (June 2009)
  Build an API that functions entirely through REST calls on top of the Mahout/Hadoop setup;
  Develop a recommendation framework based on an existing framework;
  Develop an administrator dashboard: a central hub (GUI) for controlling the main components of the recommendation framework;
  Code developed within ZieOok needs to become open source.
The ‘market analysis’
  Identify a codebase that is suitable for the project;
  Make sure that codebase is sustainable.
Question: can a semantic correlation be established within the project?
  1. Lexicon- or ontology-based (connecting thesauri);
  2. A trust-network-based system built on the FOAF (Friend of a Friend) specification;
  3. Context-adaptable system that extracts additional information from the lexicon or the ontology.
Two frameworks identified
  “Duine Framework is a (collection of) software libraries that allows developers to create prediction engines.”
  Telematica Instituut/Novay / version 188.8.131.52 RC1 (17/2/09)
Apache Lucene Mahout (formerly known as Taste)
  At that time version 0.2;
  An Apache Software Foundation project;
  Apache License, version 2.0.
Choice made! And now for the actual work...
ZieOok Dashboard: central hub
  Grant content providers access to the Dashboard;
  Import and train collections of content providers;
  Create recommenders;
  Create templates for recommenders;
  Provide statistics;
  Provide an HTML widget for simple usage on blogs etc.;
  Set filters on recommendations (date limit, use subparts of collections only);
  Provide a REST API to build GUIs and recommendations.
Collections, users and ratings
Twofold way:
  1. Using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
     http://anyplace.org/OAI?verb=GetRecord&identifier=oai:arXiv.org:hep-th/9901001&metadataPrefix=oai_czp
     + collections are updated by the content provider;
     - no user information is stored in OAI, however; this takes a specific ZieOok job;
     + a variety of connectors is available (oai_dc, Dublin Core);
     - cold start problem: no information on ratings or users.
  2. Using the MovieLens format
     add collection file;
     add ratings file;
     add user file;
     + ‘ideal start’: all data available from collection, users and ratings;
     - static: updates need to arrive from the content platform itself, no harvesting mechanism available.
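Both import routes can be sketched in a few lines of Python. This is only an illustration, not ZieOok code: the function names are hypothetical, the base URL, identifier and metadata prefix are the example values from the slide, and the ratings file is assumed to use the `::`-separated `UserID::MovieID::Rating::Timestamp` layout of the MovieLens 1M data set (the slide does not say which MovieLens variant is meant).

```python
from urllib.parse import urlencode


def build_get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


def parse_ratings(lines):
    """Parse ratings lines, assuming 'user::item::rating::timestamp'."""
    ratings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, rating, timestamp = line.split("::")
        ratings.append((int(user_id), int(item_id), float(rating), int(timestamp)))
    return ratings


if __name__ == "__main__":
    print(build_get_record_url(
        "http://anyplace.org/OAI", "oai:arXiv.org:hep-th/9901001", "oai_czp"))
    print(parse_ratings(["1::1193::5::978300760"]))
```

Note that `urlencode` percent-encodes the `:` and `/` characters in the identifier, which an OAI-PMH repository decodes back to the original value.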
Recommendations
Two ways to render recommendations:
  1. Simple HTML widget
     ZieOok-created recommendation renders unstyled HTML:
     top 5 recommendations;
     like/dislike;
  2. Call on the ZieOok REST API
     Get full access through the ZieOok API to build custom recommenders:
     use REST calls to the ZieOok framework;
     import/analyse/train data;
     real-time.
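To give an idea of what "unstyled HTML, top 5, like/dislike" could look like, here is a minimal sketch. It is not the actual ZieOok widget: the function name, the `(item_id, title)` input shape and the markup (a plain list with two buttons per item) are assumptions made for illustration.

```python
from html import escape


def render_widget(recommendations, limit=5):
    """Render the top `limit` recommendations as an unstyled HTML list.

    `recommendations` is assumed to be a ranked list of (item_id, title)
    pairs; the markup below is hypothetical.
    """
    rows = []
    for item_id, title in recommendations[:limit]:
        safe_id = escape(str(item_id), quote=True)
        rows.append(
            '<li data-item="{id}">{title} '
            '<button name="like" value="{id}">like</button> '
            '<button name="dislike" value="{id}">dislike</button></li>'.format(
                id=safe_id, title=escape(title))
        )
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


if __name__ == "__main__":
    items = [(n, "Programme %d" % n) for n in range(1, 9)]
    print(render_widget(items))
```

Leaving out all styling keeps the widget usable on any blog or site, which can then apply its own CSS.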
Use case
Connect the Dutch broadcasting organisation (NPO) to ZieOok. (on-demand)
Front-end:
  Recommend items
  Rate items (like/dislike)
  See similar users & connect
Back-end (Dashboard):
  Set linear recommenders (between 16:00-18:00, 18:00-20:00, 20:00-00:00)
  Filters (limit date on content, or only show the categories ‘sports, news’ within the collection)
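The back-end settings of this use case can be sketched as follows. This is an assumption-laden illustration rather than the Dashboard's implementation: the slot table mirrors the three time windows on the slide, while the function names and the item fields (`category`, `date`) are hypothetical.

```python
from datetime import date, time

# Time windows from the slide; None as end means "until midnight".
SLOTS = [
    (time(16, 0), time(18, 0), "16:00-18:00"),
    (time(18, 0), time(20, 0), "18:00-20:00"),
    (time(20, 0), None, "20:00-00:00"),
]


def slot_for(now):
    """Return the label of the slot containing `now`, or None outside all slots."""
    for start, end, label in SLOTS:
        if now >= start and (end is None or now < end):
            return label
    return None


def apply_filters(items, categories=None, not_before=None):
    """Keep items in the allowed categories and not older than `not_before`."""
    kept = []
    for item in items:
        if categories is not None and item["category"] not in categories:
            continue
        if not_before is not None and item["date"] < not_before:
            continue
        kept.append(item)
    return kept


if __name__ == "__main__":
    catalogue = [
        {"id": 1, "category": "sports", "date": date(2011, 5, 30)},
        {"id": 2, "category": "drama", "date": date(2011, 6, 1)},
        {"id": 3, "category": "news", "date": date(2010, 1, 1)},
    ]
    print(slot_for(time(17, 30)))
    print(apply_filters(catalogue, {"sports", "news"}, date(2011, 1, 1)))
```

Slot boundaries are treated as half-open here (18:00 belongs to the 18:00-20:00 slot), which is a design choice of this sketch.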
Quality of recommendations
So what defines quality?
  Quality set by a gold standard;
  But also define non-quality, such as: Also see: X
  Currently an editorial process
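One possible way to compare recommendations against a gold standard is precision at k: the share of the top k recommended items that editors also marked as good. The slides do not say how ZieOok scores quality, so the metric choice, the function name and the example data are all assumptions.

```python
def precision_at_k(recommended, gold, k=5):
    """Fraction of the top-k recommended items that appear in the gold standard.

    `recommended` is a ranked list of item ids; `gold` is a set of item ids
    that editors judged to be good recommendations (hypothetical input).
    """
    top = recommended[:k]
    if not top:
        return 0.0
    hits = sum(1 for item in top if item in gold)
    return hits / len(top)


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e", "f"]
    editorial_gold = {"a", "c", "x"}
    print(precision_at_k(ranked, editorial_gold))
```

A matching list of editorially rejected items ("non-quality") could be scored the same way to count bad recommendations in the top k.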
Roadmap
Short term
  1. Bring ZieOok onstream: end of this month (June 2011);
  2. Release ZieOok REST API to the community (under discussion);
  3. Connect content platforms.
Long term
  1. Maintain the ZieOok cluster for a 3-year period;
  2. Hybrid recommender (recommend cross-platform);
  3. Identify risks in development and upgrades: Mahout API changes, Hadoop changes etc.
End of presentation / Q&A
by Siem Vaessen - managing partner @ Zimmerman & Zimmerman