Project ZieOok - Berlin Buzzwords 2011


Presentation given during Berlin Buzzwords 2011.

Talk on project ZieOok: building a generic recommendation platform on top of Mahout and Hadoop.

Published in: Technology, Education

  1. ZieOok (‘AlsoSee’): building a generic recommendation framework for the cultural heritage field. By Siem Vaessen, managing partner @ Zimmerman & Zimmerman. Berlin Buzzwords 2011.
  2. About Images for the Future: preserving the audiovisual heritage of the Netherlands through conservation and digitization. Seven-year project, started in 2007, will end in 2014. Budget of €154 million. During the project, a total of 137,200 hours of video, 22,510 hours of film, 123,900 hours of audio, and 2.9 million photos from these archives will be restored, preserved, digitized, and disclosed through various services. So what to do with all this data, besides digitization?
  3. Current status: loads of interfaces, applications and tools built on top of this content. More info @
  4. Main purpose of ZieOok (‘AlsoSee’): “To create meaningful relations between assets and users by means of a recommendation engine” (June 2009). Build an API that functions fully through REST calls on top of the Mahout/Hadoop setup; develop a recommendation framework based on an existing framework; develop an administrator dashboard, a central hub for controlling the main components of the recommendation framework (GUI); code developed within ZieOok needs to become open source.
  5. Long tail: bringing niche content to users.
  6. The ‘market analysis’: identify a codebase that is suitable for the project; make sure that codebase is sustainable. Question: can a semantic correlation be established within the project? (1) Lexicon- or ontology-based (connecting thesauri); (2) a trust-network-based system built on the FOAF (Friend of a Friend) specification; (3) a context-adaptable system that extracts additional information from the lexicon or the ontology. Two frameworks identified. “Duine Framework is a (collection of) software libraries that allows developers to create prediction engines.” Telematica Instituut/Novay, version RC1 (17/2/09).
  7. Apache Lucene Mahout (fka Taste): at that time version 0.2; an Apache Foundation project; version 2.0 of the Apache License. Choice made! And now for the actual work...
  8. Core concept ZieOok
  9. Technical architecture ZieOok
  10. Rails ‘front-end’ structure
  11. ZieOok datamodel: FOAF. Friend of a Friend specification.
      <foaf:Person>
        <foaf:gender />
        <foaf:age />
        <foaf:knows />
        <foaf:based_near />
        <foaf:made rdf:resource="some-rating-uri" />
      </foaf:Person>
      <zieook:rating>
        <foaf:maker rdf:resource="some-user-uri" />
        <foaf:Document rdf:resource="item-uri" />
        <rdf:DateTime />
        <zieook:value />
        <zieook:range />
        <zieook:source rdf:resource="source-uri" />
        <zieook:recom rdf:resource="recommender-uri" />
      </zieook:rating>
  12. ZieOok Dashboard: central hub. Grant access to the Dashboard for content providers; import and train collections of content providers; create recommenders; create templates for recommenders; provide statistics; provide an HTML widget for simple usage on blogs etc.; set filters on recommendations (date limit, use subparts of collections only); provide a REST API to build GUIs and recommendations.
  13. Collections, users and ratings. Twofold way:
      1. Using OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) 9901001&metadataPrefix=oai_czp
         + collections are updated by the content provider;
         - no user information stored in OAI, however; specific ZieOok job;
         + a variety of connectors is available (oai_dc, Dublin Core);
         - cold-start problem: no information on ratings or users.
      2. Using the MovieLens format: add collection file; add ratings file; add user file;
         + ‘ideal start’: all data available from collection, users and ratings;
         - static: updates need to arrive from the content platform itself, no harvesting mechanism available.
  14. Recommendations. Two ways to render recommendations:
      1. Simple HTML widget. A ZieOok-created recommendation renders unstyled HTML: top 5 recommendations; like/dislike.
      2. Call on the ZieOok REST API. Get full access to the ZieOok API to build custom recommenders: use REST calls to the ZieOok framework; import/analyse/train data; real-time.
  15. Use case: connect the Dutch Broadcasting Organisation (NPO) to ZieOok (on-demand).
      Front-end: recommend items; rate items (like/dislike); see similar users & connect.
      Back-end (Dashboard): set linear recommenders (between 16:00-18:00, 18:00-20:00, 20:00-00:00); filters (limit date on content, or only show the categories ‘sports, news’ within the collection).
  16. Quality of recommendations. So what defines quality? Quality is set by a gold standard; but also define non-quality, such as: Also see: X. Currently an editorial process.
  17. Roadmap.
      Short term:
      1. Bring ZieOok onstream: end of this month (June 2011);
      2. Release the ZieOok REST API to the community (under discussion);
      3. Connect content platforms.
      Long term:
      1. Maintain the ZieOok cluster for a 3-year period;
      2. Hybrid recommender (recommend cross-platform);
      3. Identify risks in development and upgrades: Mahout API changes, Hadoop changes, etc.
  18. End of presentation / Q&A. By Siem Vaessen, managing partner @ Zimmerman & Zimmerman.
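
Slide 11 describes a rating as a small RDF record that links a user (foaf:maker) to an item (foaf:Document). The sketch below shows one way such a record could be built and read back with Python's standard library. It covers only a subset of the fields on the slide. The RDF and FOAF namespace URIs are the published ones; the zieook namespace URI and both function names are assumptions made for illustration, since the slides do not give them.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for RDF and FOAF.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical: the slides never state the URI behind the zieook: prefix.
ZIEOOK = "http://example.org/zieook#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("zieook", ZIEOOK)


def build_rating(user_uri, item_uri, value, source_uri):
    """Return a <zieook:rating> element shaped like the one on slide 11."""
    resource = f"{{{RDF}}}resource"
    rating = ET.Element(f"{{{ZIEOOK}}}rating")
    ET.SubElement(rating, f"{{{FOAF}}}maker", {resource: user_uri})
    ET.SubElement(rating, f"{{{FOAF}}}Document", {resource: item_uri})
    ET.SubElement(rating, f"{{{ZIEOOK}}}value").text = str(value)
    ET.SubElement(rating, f"{{{ZIEOOK}}}source", {resource: source_uri})
    return rating


def read_rating(rating):
    """Extract (user URI, item URI, numeric value) from a rating element."""
    resource = f"{{{RDF}}}resource"
    user = rating.find(f"{{{FOAF}}}maker").get(resource)
    item = rating.find(f"{{{FOAF}}}Document").get(resource)
    value = float(rating.find(f"{{{ZIEOOK}}}value").text)
    return user, item, value
```

A (user, item, value) triple of this kind is the usual input shape for a collaborative-filtering recommender, which is presumably why the datamodel keeps the rating as its own record rather than as a property of the user.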
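
The OAI-PMH route on slide 13 comes down to requesting pages of records from a content provider and following resumption tokens until the list is exhausted. The sketch below builds the request URL and parses one response page. The `oai_czp` metadata prefix is taken from the slide; the base URL passed in by the caller and the function names are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url, metadata_prefix="oai_czp", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # resumptionToken is an exclusive argument in the protocol:
        # it replaces metadataPrefix and any other selection arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.iter(f"{{{OAI_NS}}}identifier")]
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    next_token = token.text if token is not None and token.text else None
    return ids, next_token
```

A harvester would call `list_records_url`, fetch the page, pass the body to `parse_page`, and repeat with the returned token until it is `None`. As the slide notes, this yields items only: ratings and users have to come from elsewhere.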
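
For the MovieLens route on slide 13, the ratings file is a plain delimited list of user, item, rating and timestamp. The sketch below assumes the public MovieLens layouts (`::` separators in the 1M dataset's ratings.dat, tabs in the 100K dataset's u.data); whether ZieOok accepted exactly these layouts is not stated in the slides. The helper `unrated_items` is a hypothetical addition that makes the cold-start problem concrete: items without any rating cannot be recommended by a purely rating-based recommender.

```python
from collections import defaultdict


def parse_ratings(lines, sep="::"):
    """Parse MovieLens-style lines of the form user<sep>item<sep>rating<sep>timestamp."""
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item, rating, _timestamp = line.split(sep)
        ratings[user][item] = float(rating)
    return dict(ratings)


def unrated_items(item_ids, ratings):
    """Return the items from item_ids that no user has rated yet."""
    rated = {item for per_user in ratings.values() for item in per_user}
    return [item for item in item_ids if item not in rated]
```

Usage: `parse_ratings(open("ratings.dat"))` for the `::` layout, or `parse_ratings(open("u.data"), sep="\t")` for the tab-separated one.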
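
The back-end settings in the use case on slide 15 (time-bound "linear" recommenders plus date and category filters) can be sketched as two small functions. The three time slots come from the slide. The recommender names, the item fields `category` and `date`, and the function names are hypothetical, since the slides do not show how the Dashboard stores these settings.

```python
from datetime import date, time

# Time slots from slide 15; the recommender names are hypothetical placeholders.
SLOTS = [
    (time(16, 0), time(18, 0), "late-afternoon"),
    (time(18, 0), time(20, 0), "early-evening"),
    (time(20, 0), time(0, 0), "late-evening"),
]


def recommender_for(now, default="default"):
    """Pick the recommender whose slot contains the given time of day."""
    for start, end, name in SLOTS:
        # An end of 00:00 means "until midnight", so that slot has no upper bound.
        if now >= start and (end == time(0, 0) or now < end):
            return name
    return default


def apply_filters(items, categories=None, not_before=None):
    """Keep items in the allowed categories that are dated on or after not_before."""
    kept = []
    for item in items:
        if categories is not None and item["category"] not in categories:
            continue
        if not_before is not None and item["date"] < not_before:
            continue
        kept.append(item)
    return kept
```

For example, `apply_filters(items, categories={"sports", "news"}, not_before=date(2011, 1, 1))` keeps only sports and news items dated 2011 or later.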