Strategic Scenarios in

     Digital contents


       Marco Brambilla et al.


 Politecnico di Milano, DEI and MIP
          Acer Academy
             May 2009


http://home.dei.polimi.it/mbrambil/
Agenda overview


   Information overload
   Evolution of contents
   Web 2.0
   Web 3.0
   Tools and technologies for managing information overload
1. Information overload
Introduction and motivation

    161 exabytes of information was created or replicated
     worldwide in 2006
    IDC estimates 6X growth by 2010 to 988 exabytes (a
     zetabyte) / year
    That‟s more than in the previous 5,000 years.

         – DATA from: Dr. Michael L. Brodie - Chief Scientist Verizon
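A quick arithmetic check of the growth figures above, as a minimal sketch. The only inputs are the 161 EB (2006) and 988 EB (2010) values quoted on the slide; the compound annual rate is derived here and is not a figure from the source.

```python
# Figures quoted on the slide (IDC, via M. L. Brodie).
EB_2006 = 161.0   # exabytes created or replicated in 2006
EB_2010 = 988.0   # exabytes per year forecast for 2010
YEARS = 2010 - 2006

growth_factor = EB_2010 / EB_2006               # overall multiple over four years
annual_rate = growth_factor ** (1 / YEARS) - 1  # implied compound annual growth
zettabytes_2010 = EB_2010 / 1000                # 1 ZB = 1000 EB (decimal prefixes)

print(f"growth factor: {growth_factor:.2f}x")
print(f"implied annual growth: {annual_rate:.0%}")
print(f"2010 volume: {zettabytes_2010:.3f} ZB")
```

The factor comes out slightly above 6, consistent with the "6X" claim, and 988 EB is just under one zettabyte.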
Where does content come from

   The largest source of data?  USERS
   YouTube Videos
       1.7 billion served / month
       1 million streams / day = 75 billion e-mails
   Facebook had [in 2007] …
       1.8 billion photos
       31 million active users
       100.000 new users / day
       1,800 applications
   MySpace, 185+ million registered users (Apr 2007), has…
       Images:
          – 1+ billion - millions uploaded / day - 150,000 requests / sec
       Songs:
          – 25 million - 250,000 concurrent streams
       Videos:
          – 60 TB - 60,000 uploaded / day - 15,000 concurrent streams
Quality of data

    (User Generated) Content is:
       25% original; 75% replicated
       25% from the workplace; 75% not
       95% unstructured and growing
    While enterprise data is 10-15% structured and decreasing
    Main challenges:
       How to make multimedia content available to search engines and
        search based applications?
       Exploiting multimedia content requires:
          – Acquiring it
          – (Re) Formatting it
          – Indexing it
          – Querying it
          – Transmitting it
          – Browsing it
Information overload effects on (our)
way of working




                      For knowledge workers
                      • Time is limited
                      • Processes overlap
                      • Knowledge is (often) artefact-
                        dependent
                      • Tools allow multiplicity of uses
                      • Need for several tools
                      • Relations with people take time
                      • Contexts mix and merge
Example: email (!!)




Working with information

    Types of information
       Usefulness
          – Active: ephemeral and working (“hot”)
          – Dormant: inactive, potentially useful (“cold”)
          – Not useful
          – Un-accessed
       Ownership: mine or not-mine
    Activities
       Acquisition of items to form a collection
       Organisation of items
       Maintenance of the collection (e.g. archiving items into long-
        term storage)
       Retrieval of items for reuse
Information (and choice) overload... on YouTube
Acquisition


     Differs between tools
        Manual (files), uncontrolled (e-mails)
        Push vs. pull
    Reasons for deciding how to store information
        Portability
        Number of access points
        Preservation of information in its current state
        Currency of information
        Context
        Reminding
        Ease of integration into existing structures
        Communication and information sharing
        Ease of maintenance
Organisation


    Categorisations are complex
       Folders vs. keywords
       Trees vs. webs
       Change over time
    Categorisations are local
       If two groups of people construct thesauri in a particular subject
        area, the overlap of index terms will only be 60%
       Two indexers using the same thesaurus on the same document
        use common index terms in only 30% of cases
       The output from two experienced database searchers has only
        40% overlap
       Experts' judgements of relevance concur in only 60% of cases
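To make the overlap percentages above concrete, here is a minimal sketch of one way to measure agreement between two indexers. The slide does not say which overlap measure the cited studies used; Jaccard overlap is an assumption, and the index terms are invented for illustration.

```python
def overlap(terms_a: set[str], terms_b: set[str]) -> float:
    """Jaccard overlap: shared terms divided by all distinct terms used."""
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)

# Hypothetical index terms assigned by two indexers to the same document.
indexer_1 = {"semantic web", "ontology", "tagging", "search"}
indexer_2 = {"ontology", "folksonomy", "search", "metadata", "web 2.0"}

print(f"{overlap(indexer_1, indexer_2):.0%}")
```

Here the two indexers share 2 of 7 distinct terms, so even careful indexing of the same document can produce low overlap.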
Maintenance


   Hardly any
      Occasional cleaning
      Extensive maintenance is related to major life changes (e.g. new
       job)
Retrieval

       Personal archives instead of corporate systems
       Need to start searching
           Not invented here: reinventing is more fun than reusing
           Asking is more difficult than sharing
       Social search: asking others
           Estimations of quality and relevance are best made by experts
            themselves
           It's the fastest and most efficient way
           Colleagues can give you feedback and help to sharpen your
            questions
           Consulting others is fun
       While searching systems
           Preference for location-based search
           Critical reminding function of file placement
           Lack of retrieval of archived files
2. Evolution of contents
Evolution of contents and technologies


 I. from static to dynamic
 II. from fixed to mobile
 III. from big to small
 IV. from local to global
 V. from vertical to horizontal
 VI. from sometimes-on to always-on
 VII. from wired to wireless
 VIII. from divergence to convergence




Content proliferation and classification


 Proliferation of
    blogs
    online video
    podcasting,
    other social media tools
 the definition of what constitutes 'web'/'non-web' content has
  become increasingly blurred




Pervasive and convergent digital content




Convergence of connectivity




3. Web 2.0
Strategic scenarios in digital content and digital business
Social- vs. Group- ware

 The basic model of 90's era collaboration (Lotus Notes):
all about the group.
     Information was managed in group-based repositories, then passed around for
      review, or published to intranet portals via customized apps. Information era
      workflows where people are first and foremost occupiers of roles, not individuals,
      and the materials being created are more closely aligned with groups than
      individuals.



 Web 2.0 social tools: MySpace, Facebook, LinkedIn
Social networks -- explicit ones or implicit ones in social media –
     are really organized around individuals and their networked self-expression. I am
      writing this blog post, and publishing it, personally. It is not the product of some
      workgroup. It is not an anonymous chunk of text on a corporate portal. My
      Facebook profile pulls traffic from my network of contacts, sources I find
      interesting, and the chance presence updates of my friends.
 See: http://www.stoweboyd.com/message/2007/01/in_the_time_of_.html
Doug Engelbart, 1968




   "The grand challenge is
    to boost the collective
       IQ of organizations
          and of society. "
Tim O’Reilly, 2006, on Web 2.0



           “The central principle
      behind the success of the
      giants born in the Web 1.0
       era who have survived to
            lead the Web 2.0 era
         appears to be this, that
        they have embraced the
             power of the web to
              harness collective
                    intelligence”
Web 2.0 is about The Social Web




 “Web 2.0 Is Much More
     About A Change In
People and Society Than
           Technology”
                 -Dion Hinchcliffe,
                     tech blogger

   1 billion people connect to the
    Internet
   100 million web sites
   over a third of adults in US
    have contributed content to the
    public Internet. - 18% of adults
    over 65
Tim Berners-Lee



                      “The Web isn’t about what
                    you can do with computers.
                   It’s people and, yes, they are
                       connected by computers.
                       But computer science, as
                  the study of what happens in
                    a computer, doesn’t tell you
                     about what happens on the
                                           Web.”
                                        NY Times, Nov 2, 2006
But what is “collective intelligence”
in the social web sense?

 intelligent collection?
    collaborative bookmarking, searching
 “database of intentions”
    clicking, rating, tagging, buying
 what we all know but hadn't got around to saying in public
  before
    blogs, wikis, discussion lists




                                            “database of intentions” – Tim O’Reilly
the wisdom of clouds?
“Collective Knowledge” Systems


 The capacity to provide useful information
 based on human contributions
 which gets better as more people participate.


 typically
    mix of structured, machine-readable data and unstructured data
     from human input
Collective Knowledge is Real


 FAQ-o-Sphere - self service Q&A forums
 Citizen Journalism – “We the Media”
 Product reviews for gadgets and hotels
 Collaborative filtering for books and music
 Amateur Academia
The timeline
Web 2.0

  The phrase "Web 2.0" can refer to one or more of the following:
   The transition of web sites from isolated information silos to sources of
    content and functionality, thus becoming computing platforms serving
    web applications to end-users


   A social phenomenon embracing an approach to generating and
    distributing Web content itself, characterized by open communication,
    decentralization of authority, freedom to share and re-use, and "the
    market as a conversation"


   Enhanced organization and categorization of content, emphasizing deep
    linking


   A rise in the economic value of the Web, possibly surpassing the impact
    of the dot-com boom of the late 1990s
Two main kinds


 PEOPLE FOCUS: The first kind of socializing is typified by
  "people focus" websites such as Bebo, Facebook, Myspace,
  and Xiaonei.
 HOBBY FOCUS: The second kind of socializing is typified by
  "hobby focus" websites such as Flickr, Kodak Gallery,
  and Photobucket.
Web 2.0 (see Wesch from YouTube
[LOCAL])


  Since social web applications are built to encourage communication
   between people, they typically emphasize some combination of the
   following social attributes:
 Identity: who are you?
 Reputation: what do people think you stand for?
 Presence: where are you?
 Relationships: who are you connected with? who do you trust?
 Groups: how do you organize your connections?
 Conversations: what do you discuss with others?
 Sharing: what content do you make available for others to interact with?
 Examples of social applications include Twitter, Facebook, Stumpedia,
  and Jaiku.
Keyword: sharing!
 Sharing...
         Useful   vs.   Not useful (!?) 
Sharing for the enterprise?

   (1) A teenager model?   (2) Always useful?
Community




Human Resource Management 2.0


 Social networks for the job market
       – To find and be found
       – To manage your online
         reputation
       – To research and
         reference check
       – To hire a superstar
       – To use your network to do your job better
       – To use your network to get a better job
                                                   http://www.linkedin.com/
Blog




        a user-generated website where entries are
         made in journal style and displayed in a
         reverse chronological order. The term
         "blog" is derived from "Web log." "Blog"
         can also be used as a verb, meaning to
         maintain or add content to a blog.
Wiki




        a website that allows the visitors themselves
         to easily add, remove, and otherwise edit
         and change available content, typically
         without the need for registration. This ease
         of interaction and operation makes a wiki an
         effective tool for mass collaborative
         authoring.
Best known wiki
Wiki vs. Blog


    A blog, or web log, shares writing and multimedia content in the form of
    “posts” (starting point entries) and “comments” (responses to the posts).
    While commenting, and even posting, are open to the members of the
    blog or the general public, no one is able to change a comment or post
    made by another. The usual format is post-comment-comment-comment,
    and so on. For this reason, blogs are often the vehicle of choice to
    express individual opinions.

    A wiki has a far more open structure and allows others to change what
    one person has written. This openness may trump individual opinion
    with group consensus.
Special purpose blogs: photos, music, ...




(Social) Tagging




 Term – a word or phrase that is recognizable by
  people and computers
 Document – a thing to be tagged, identifiable by a
  URI or a similar naming service
 Tagger – someone or thing doing the tagging, such
  as the user of an application
 Tagged – the assertion by Tagger that Document
  should be tagged with Term
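The four concepts above map naturally onto a small data structure. The following is a minimal sketch, not any particular system's schema: the class name, field names, example URIs and users are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagging:
    """One 'Tagged' assertion: the tagger says the document should carry the term."""
    term: str       # word or phrase recognizable by people and computers
    document: str   # URI of the thing being tagged
    tagger: str     # user (or program) making the assertion

# Hypothetical assertions collected by an application.
taggings = {
    Tagging("romantic", "http://example.org/hotel/1", "alice"),
    Tagging("romantic", "http://example.org/hotel/1", "bob"),
    Tagging("beach", "http://example.org/hotel/2", "alice"),
}

def documents_tagged(term: str) -> set[str]:
    """All documents that anyone has tagged with the term."""
    return {t.document for t in taggings if t.term == term}

def taggers_of(document: str, term: str) -> set[str]:
    """Everyone who asserted that the document should carry the term."""
    return {t.tagger for t in taggings if t.document == document and t.term == term}
```

Keeping the tagger in every record is what later allows per-user views, voting counts, and filtering by trusted contacts.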
Podcast




     A podcast is a media file that is
      distributed by subscription (paid or
      unpaid) over the Internet using
      syndication feeds, for playback on mobile
      devices and personal computers.
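As a minimal sketch of "distribution by syndication feed": a podcast client reads an RSS 2.0 feed and collects the media file URL carried in each item's enclosure element. The feed content and URLs below are invented; only the RSS element names (item, title, enclosure with url, length, type) come from the RSS 2.0 format.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, inlined so the example runs without network access.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Lectures</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.org/ep1.mp3" length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.org/ep2.mp3" length="2345678" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""

def list_episodes(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, media URL) for every item that carries an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            episodes.append((item.findtext("title", default=""), enclosure.get("url")))
    return episodes

for title, url in list_episodes(FEED):
    print(title, url)
```

A subscribing client would poll the feed periodically and download only the enclosures it has not seen before.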
Examples of Podcasts available



  iTunes Store
  NPR
  ArtsEdge
  Ed. Podcast
   Network
  SFMoMA
Blog with Podcasts & Wikis




   Several
    functions
    on the
    same
    platform
Gathering specific communities –
TappedIn
Collecting feedback – SurveyMonkey
                           SurveyMonkey.com
Tools. Example: collaboration and
sharing

 Webex
    Meeting center
    Training center
 Acquired by Cisco in 2007
 Integrated phone conferencing, VoIP, support for PowerPoint,
  Flash, audio, and video;
 Meeting recording and playback, One-click meeting access,
  scheduling, and IM applications, full compatibility, secure
  communications


 See http://www.sramanamitra.com/2007/03/15/cisco-acquires-
  webex-beefs-collaboration/

Trends and size


 Facebook growth: 700% from 2008 to 2009
 Twitter growth: 3,700%
 And unique visitors..
One big social application? Facebook
connect!

                        evolution of Facebook Platform enabling
                         you to integrate Facebook into your own
                         site.
                       You can add social context to your site:
                        Identity. Seamlessly connect the user's
                         Facebook account with your site
                        Friends. Bring a user's Facebook friends
                         into your site.
                        Social Distribution. Publish information
                         back into Facebook.
                        Privacy. Bring dynamic privacy to your
                         site.


       How scalable, reliable, open-minded?

Wouldn’t this be better?


                    But..




The Mash-up approach

  User-defined combination
   of services available on the
   web
  Graphical design
  Immediate execution
E.g.: airlines mash-up



                         Tracing of referral,
                         searches, and so on




      […]
SOA vs. Web 2.0

                      SOA   Web 2.0

        Planning


         Design


     Implementation


       Monitoring
Comparison ...

                    Web 2.0         SOA

                       SaaS    =    SaaS
 Web-based interoperability         Standards-based interoperability
                     (REST)         (SOAP, WSDL, UDDI)
  Application as a platform    =    Application as a platform
Pushes for unexpected reuse         Allows reuse
                        RIA         No UI
 Participatory architecture         Centralized governance
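To illustrate the interoperability row of the comparison, here is a minimal sketch of the same hypothetical "get item 42" request in both styles. The service URL, the GetItem operation and the service namespace are invented for illustration; only the SOAP 1.1 envelope namespace is a real identifier. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/shop"                   # hypothetical service namespace

def rest_request(item_id: int) -> tuple[str, str]:
    """REST style: the URL names the resource, the HTTP verb is the operation."""
    query = urlencode({"format": "xml"})
    return "GET", f"http://example.org/shop/items/{item_id}?{query}"

def soap_request(item_id: int) -> bytes:
    """SOAP style: the operation and its arguments travel inside an XML envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}GetItem")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}itemId").text = str(item_id)
    return ET.tostring(envelope, encoding="utf-8")

print(rest_request(42))
print(soap_request(42).decode("utf-8"))
```

The REST request can be issued from a browser address bar, which helps explain the "unexpected reuse" row; the SOAP message needs a contract (WSDL) to be interpreted, which fits the "centralized governance" row.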
… and complementarity




Source: Babak Hosseinzadeh, IBM
Short term challenge: Mash-up on SOA

             Mash-up               SOA
Mid-term: Web as a platform

   [Figure: two layered stacks side by side, listed top to bottom.
    The past:   [...] / Framework / API, API, ... / [...] / Operating System / Hardware
    The future: [...] / Framework / RSS, REST, SOAP / [...] / Web / Internet]
Example: eBay

  Services for
         shopping
         trading
  Publishes services
         REST interface
         SOAP interface
  Numbers [1]:
         4 billion requests/month
          (5.5 million/hour)
         25% of the offer only via
          Web Service
         25,000 registered developers
         1,900 known applications
 [1] http://blogs.zdnet.com/ITFacts/?p=10326
Example: Amazon

 Services for
     e-commerce
     on-line payment
     computing (EC2)
     storage (S3)
     human computing (MTurk)
     Queues (SQS)
 Success stories
     Ex 1, Jungle Disk: online back-up
      service
     Ex 2, ABACA: 99%-protection
      antispam
(NOT) Artificial intelligence: Mechanical
Turk!




4. Web 3.0
SOA provides great plumbing!
Web 2.0 provides great plumbing!




                   E. Della Valle @ CEFRIEL - Politecnico di Milano
Is plumbing enough?
How to manage complexity?

 [Figure: grid of service landscapes.
  Columns: "Few services" (a few services in a small company) and
           "Several services" (hundreds of services and processes in a big organization).
  Rows: "One company" and "Several enterprises".
  Labelled approaches: Mashup, Complex BPM, and an open "?".]
The problem is in the semantics!




“The problem is not in the plumbing,
 it is in the semantics ”
        Verizon Chief Scientist - M. L. Brodie
“Semantic heterogeneity remains the main obstacle to application
integration, an obstacle that Web Services alone will not remove.
Until someone finds a way to make applications understand each other,
the effects of Web Services will remain limited. When you pass a user's
data in a given format using a Web Service as the interface, the
receiving program still has to know what format it is in. You still
have to agree on the structure of each business object. So far, no one
has found a workable solution…”
                                          Oracle Chairman and CEO - Larry Ellison
Web 3.0


 Combining SOA + Social Web + Semantic Web
 I.e., Services + Folksonomies + Ontologies (or + Taxonomies)




Tim Berners-Lee, 2001




             “The Semantic Web is not a
          separate Web but an extension
             of the current one, in which
               information is given well-
                  defined meaning, better
         enabling computers and people
                 to work in cooperation.”

                             Scientific American, May 2001
Beyond Web 2.0 ...
 [Figure: three layers (Business Process / Integration / Services) on the
  "Web as a world scale platform". Labels: Buyer, Legacy, Comm.,
  3rd Party Shipment, with Mediators between them.]

 Given a BPM:
     Find the best set of services?
     Find the best datasource?
     Manage heterogeneous data/services?
 AT RUNTIME!
SOA + Web 2.0 = ?
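 [Figure: Web services architecture triangle. A service provider publishes its
  service description (WSDL) to discovery agencies (UDDI); a service requester
  discovers the description there and then interacts with the provider (SOAP).
  WSBPEL is shown alongside.]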
  source: http://www.w3.org/TR/2002/WD-ws-arch-20021114/
SOA Advantages
 [Chart: costs of different EAI approaches.
  Vertical axis: relative costs.
  Horizontal axis: Adoption, Deployment, Maintenance, Changes.
  Series: custom integration, proprietary EAI solutions, Web Services based
  EAI solutions, SOA based EAI solutions.]
                                        [source: ZapThink, http://www.zapthink.com/]
From vertical applications...

 Different IT solutions in each department



    Department 1           Department 2             Department N




                                              […]
… to service extraction …

 Rationalization of IT solutions
 Factorization and publication of common services


    Department 1            Department 2                   Department N




                                                     […]
… and process composition.

 For using internal subprocesses, but also processes of customers or providers.



 Client

 Department 1

 Department 2

Shared services

Outsourced services

Provider
“Ontology is overrated.”



   “[tags] are a radical break with previous categorization
    strategies”
   hierarchical, centrally controlled, taxonomic
    categorization has serious limitations
      e.g., Dewey Decimal System
   free-form, massively distributed tagging is resilient
    against several of these limitations




                                    http://shirky.com/writings/ontology_overrated.html
But...


 ontologies aren't taxonomies
 they are for sharing, not finding
 they enable cross-application aggregation and value-added
  services
Ontology of Folksonomy


 What would it look like to formalize an ontology for
  tag data?


 Functional Purpose: applications that use tag data from
  multiple systems
    tag search across multiple sites
    collaboratively filtered search
      – “find things using tags my buddies say match those tags”
    combine tags with structured query
      – “find all hotels in Spain tagged with ‘romantic’”

                                    http://tomgruber.org/writing/ontology-of-folksonomy.htm
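The two example queries above can be sketched in a few lines. This is an illustration under stated assumptions, not the design from the referenced article: the hotel records, tag assertions, user names and helper functions are all hypothetical.

```python
# Hypothetical structured records, keyed by document id.
hotels = {
    "h1": {"name": "Casa Azul", "country": "Spain"},
    "h2": {"name": "Hotel Lumiere", "country": "France"},
    "h3": {"name": "Posada del Mar", "country": "Spain"},
}

# Hypothetical tag assertions as (term, document, tagger) triples.
taggings = [
    ("romantic", "h1", "alice"),
    ("romantic", "h2", "bob"),
    ("romantic", "h3", "mallory"),
    ("budget", "h3", "alice"),
]

def tagged_with(term, by=None):
    """Documents carrying `term`; if `by` is given, only those taggers count."""
    return {doc for t, doc, who in taggings if t == term and (by is None or who in by)}

def hotels_in(country):
    """Structured query over the records."""
    return {hid for hid, rec in hotels.items() if rec["country"] == country}

# Combine tags with a structured query.
romantic_in_spain = hotels_in("Spain") & tagged_with("romantic")

# Collaboratively filtered search: trust only my buddies' tags.
romantic_per_buddies = tagged_with("romantic", by={"alice", "bob"})

print(sorted(romantic_in_spain))
print(sorted(romantic_per_buddies))
```

Both queries only work across sites if every site agrees on what a term, a document and a tagger are, which is the motivation for a shared tag ontology.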
Example: formal match,
semantic mismatch

 System A says a tag is a property of a document.
 System B says a tag is an assertion by an individual with an
  identity.
 Does it mean anything to combine the tag data from these two
  systems?
    “Precision without accuracy”
    “Statistical fantasy”
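A minimal sketch of the mismatch described above, with invented data. System A records a tag once per document; System B records one assertion per person. Merging them naively yields counts that look precise but mix two different meanings.

```python
from collections import Counter

# System A: a tag is a property of a document (no tagger recorded).
system_a = {"doc1": {"python"}, "doc2": {"python"}}

# System B: a tag is an assertion by an identified individual.
system_b = [
    ("doc1", "python", "alice"),
    ("doc1", "python", "bob"),
    ("doc1", "python", "carol"),
]

def naive_merge(a, b):
    """Count tags per (document, term) as if both systems meant the same thing."""
    counts = Counter()
    for doc, terms in a.items():
        for term in terms:
            counts[(doc, term)] += 1
    for doc, term, _tagger in b:
        counts[(doc, term)] += 1
    return counts

merged = naive_merge(system_a, system_b)
print(merged[("doc1", "python")], merged[("doc2", "python")])
```

The merged count for doc1 adds one document property to three personal votes. The data formats match, yet the resulting number has no clear interpretation.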
Engineering the tag ontology


 Working with tag community, identify core and non core
  agreements
 Use the process of ontology engineering to surface issues that
  need clarification
 Couple a proposed ontology with reference implementations or
  hosted APIs
Issues raised by ontological engineering

 is term identity invariant over case, whitespace, punctuation?
 are documents one-to-one with URI identities?
  (are alias URLs possible?)
 can tagging be asserted without human taggers?
 negation of tag assertions?
 tag polarity – “voting” for an assertion
 tag spaces – is the scope of tagging data a user community,
  application, namespace, or database?
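The first issue above (is term identity invariant over case, whitespace, punctuation?) can be made concrete with a small sketch. The normalization rules and example tags are assumptions chosen for illustration; a real tag ontology would have to pick and document its own rules.

```python
import string

def normalize(term: str, *, fold_case=True, strip_space=True, strip_punct=True) -> str:
    """Map a raw tag to a canonical key under the chosen identity rules."""
    if fold_case:
        term = term.casefold()
    if strip_punct:
        term = term.translate(str.maketrans("", "", string.punctuation))
    if strip_space:
        term = "".join(term.split())
    return term

raw = ["Web 2.0", "web2.0", "WEB-2.0", "web 20"]

# No invariance at all: every spelling is its own term.
strict = {normalize(t, fold_case=False, strip_space=False, strip_punct=False) for t in raw}

# Full invariance: all spellings collapse into one term.
loose = {normalize(t) for t in raw}

print(len(strict), sorted(loose))
```

Under the loose rules "web 20" is merged with "Web 2.0", which may or may not be intended. That is exactly the kind of agreement the ontology has to make explicit.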
Pivot Browsing – surfing unstructured
content along structured lines

 Structured data provides dimensions of a hypercube
    location
    author
    type
    date
    quality rating
 Travel researchers browse along any dimension.
 The key structured data is the destination hierarchy
    Contributors place their content into the destination hierarchy, and
     the other dimensions are automatic.
5. Tools and technologies
for managing information overload
Tools



        Information:
    The double-edged sword


 You want good
  information, not all
  information
    Information Retrieval
     /search
        – Multimedia IR
    RSS/Bloglines/Google
     Reader
    Social bookmarking
5.1. Multimedia Information Retrieval
Data in digital libraries

 TEXT: e-book, Word documents, Web pages, PDF, Blog,
  etc.

 Audio:
    Speech (broadcasting, podcasting, recording, etc.)
    Music (CD, MP3, etc.)

 Pictures: Personal photos, schemes, diagrams, etc.

 Video: sequence of images and audio (music and/or
  speech)

Challenge: How to make multimedia content available
  to search engines and search based applications?
Some user challenges…


 Precision & contextual relevancy
    aware of rights, user and information contexts
    personalization and recommendation


 Search must support multiple interaction patterns
    active searching, monitoring, browsing and “being aware”


 Trust and spam


 Ubiquity of access
MIR Application Areas


  Architecture, real estate, and interior design
      (e.g., searching for ideas)
  Broadcast media selection
      (e.g., radio and TV channel)
  Cultural services
      (history museums, art galleries, etc.)
  Digital libraries
      (e.g., musical dictionary, bio-medical imaging catalogues, film,
       video and radio archives)
  E-Commerce
      (e.g., personalized advertising, on-line catalogues)
  Education
      (e.g., repositories of multimedia courses)
  Home Entertainment
      (e.g., personal multimedia collections)
  Investigation services
      (e.g., human characteristics recognition, forensics)
  Journalism
      (e.g., searching speeches of a certain politician using his name,
       his voice or his face)
  Multimedia directory services
      (e.g., yellow pages, tourist information, GIS)
  Multimedia editing
      (e.g., personalized news service, media authoring)
  Remote sensing
      (e.g., cartography, ecology)
  Shopping
      (e.g., searching for clothes)
  Social
      (e.g., dating services)
  Surveillance
      (e.g., traffic control)
MIR: Query Examples

 Play a few notes on a keyboard and retrieve a list of
  musical pieces similar to the required tune, or images
  matching the notes in a certain way, e.g., in terms of
  emotions
 Draw a few lines on a screen and find a set of images
  containing similar graphics, logos, ideograms,...
 Define objects, including color patches or textures and
  retrieve examples among which you select the interesting
  objects to compose your design
 On a given set of multimedia objects, describe
  movements and relations between objects and so
  search for animations fulfilling the described temporal and
  spatial relations
 Describe actions and get a list of scenarios containing
  such actions
 Using an excerpt of Pavarotti’s voice, obtain a list of
  Pavarotti’s records, video clips where Pavarotti is singing
  and photographic material portraying Pavarotti
State of the art of MSE

 Image search
      www.tiltomo.com
      www.tineye.com
      www.pixsta.com
      www.picsearch.com
 Video search
      www.blinx.com
      www.clipta.com
      www.yovisto.com
 Music search
      www.midomi.com
      www.audiobaba.com
      http://www.bmat.com
 Enterprise MIR search
      www.autonomy.com
      www.pictron.com
      www.exalead.com
      www.fastsearch.com
Metadata?


  “Data about other data”
    They describe in a structured fashion properties
     of the data
      – E.g.: owner, creation and modification date,
        description, etc.


  Some metadata are implicitly available
    E.g.: file size, file name, etc.


  Others need to be manually provided or
   automatically extracted
The MIR reference architecture
Content Process




[Figure: Content Acquisition → Content Transformation → Content Indexing]
Content acquisition



 In MIR, content is acquired from many sources and
  in multiple ways:
   By crawling
   By user’s contribution
   By syndicated contribution from content aggregators
   Via broadcast capture (e.g., from air/cable/satellite
    broadcast, IPTV, Internet TV multicast, ..)
Content acquisition

 In text or Web search engines, content is a closed or open
  collection of documents
 Textual Web content is acquired by crawlers, which exploit link
  navigation

 In MIR, content is acquired from many sources, in a range of
  quality and value:
      Web cams, security apps
      (Video/Audio) Telephony and teleconferencing
      Industrial/Academic/Medical
      User Generated Content
      Public Access and Government Access
      Rushes, Raw Footage
      News
      Advertising
      TV Programming
      Feature Films

 [Figure: content categories plotted by VALUE against PRODUCTION COST; labels: WEB CAM, SECURITY; USER GENERATED; ENTERPRISE; BROADCAST TV; MOTION PICTURES]
Acquisition: (video) metadata sources & formats

 A content element may be accompanied by textual
  descriptions, which range in quantity and quality, from no
  description (e.g., web cam content) to multilingual high
  value data (closed captions and production metadata of
  motion pictures)
 Metadata may reside:
    Embedded within content (e.g., closed captions)
    In surrounding Web pages or links (e.g., HTML content, link
     anchors, etc)
    In domain-specific databases (e.g., IMDB for feature films)
    In ontologies:
     http://www.daml.org/ontologies/keyword.html


 [Figure: where metadata may reside; labels: ASSET PACKAGE, METADATA, MULTIPLEXED METADATA, MEDIA STREAMS, EXTERNAL METADATA]
Acquisition: (video) representative metadata
standards


 Standard           Body
 MPEG-7, MPEG-21    ISO/IEC (Int. Electrotechnical Comm.), Moving Picture Experts Group
 UPnP               Universal Plug and Play Forum
 MXF, MDD           SMPTE (Society of Motion Picture and Television Engineers)
 AAF                AMWA (Advanced Media Workflow Association)
 TV Anytime         ETSI (European Telecommunications Standards Institute)
 Timed Text         W3C, 3GPP
 RSS                Harvard
 Podcast            Apple
 Media RSS          Yahoo
Transformation dimensions: Digital video formats


 A digital video is a sequence of frames
 The Frame Aspect Ratio (FAR) defines the shape of each
  image (width divided by height), with 4:3 and 16:9 being the
  currently adopted values




 Pixel aspect ratio (PAR) describes how the width of pixels in a
  digital image compares to their height (rectangular pixel
  formats exist for analog TV compatibility).
 Frame rate: number of frames per second (24 and 25 are
  common, but also lower and higher values are used)
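
The two ratios above are linked: the frame aspect ratio equals the pixel-count ratio multiplied by the PAR. The worked example below is a sketch under stated assumptions: the 720x576 frame size and the 16:15 PAR are commonly quoted figures for standard-definition PAL-style video, not values taken from these slides, and `frame_aspect_ratio` is a hypothetical helper.

```python
from fractions import Fraction

def frame_aspect_ratio(width_px: int, height_px: int, par: Fraction) -> Fraction:
    """Frame aspect ratio = (pixel columns / pixel rows) * pixel aspect ratio."""
    return Fraction(width_px, height_px) * par

# Assumed example values: 720x576 stored pixels, each pixel 16:15 (slightly wide).
far = frame_aspect_ratio(720, 576, Fraction(16, 15))
print(far)  # 4/3
```

With square pixels (PAR 1:1) the same 720x576 grid would be displayed as 5:4, which is why rectangular pixels must be taken into account when transcoding.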
Transformation dimensions: compression

 Web media must be compressed, with lossy (but perceptually
  acceptable) transformations
 In video, compression works in two ways
    Intra-frame: an image is divided into blocks, whose content is
     “averaged”
    Inter-frame: a frame is represented differentially with respect to
     the preceding one, by encoding only the blocks that “have moved”
     and their motion vectors
    Example (MPEG compression)
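
The inter-frame idea can be sketched in a deliberately simplified form. The code below only stores the blocks that changed with respect to the previous frame; it performs no motion estimation (so no motion vectors) and no intra-frame averaging, and it is not the MPEG algorithm. All function names are hypothetical.

```python
def split_blocks(frame, size):
    """Yield ((row, col), block) for every size x size block of a 2-D pixel list."""
    for r in range(0, len(frame), size):
        for c in range(0, len(frame[0]), size):
            yield (r, c), [row[c:c + size] for row in frame[r:r + size]]

def encode_delta(prev, curr, size=2):
    """Keep only the blocks of `curr` that differ from the same block in `prev`."""
    prev_blocks = dict(split_blocks(prev, size))
    return {pos: blk for pos, blk in split_blocks(curr, size)
            if blk != prev_blocks[pos]}

def decode_delta(prev, delta, size=2):
    """Rebuild the current frame from the previous frame plus the changed blocks."""
    frame = [row[:] for row in prev]
    for (r, c), blk in delta.items():
        for i, row in enumerate(blk):
            frame[r + i][c:c + size] = row
    return frame

previous = [[0] * 4 for _ in range(4)]
current = [row[:] for row in previous]
current[3][2] = 9                      # a single pixel changes

delta = encode_delta(previous, current)
print(delta)                           # only the block at (2, 2) is stored
```

Here one block out of four is transmitted instead of the whole frame; real codecs go further by searching for where a block moved to and encoding only the displacement and a residual.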
Content Transformation: popular compression
standards

Standard            Typical bitrates        Applications
M-JPEG, JPEG2000    Up to 60 Mbit/s         Consumer electronics, video editing systems
DVCAM               25 Mbit/s               Consumer
MPEG-1              1.5 Mbit/s              CD-ROM multimedia
MPEG-2              4-20 Mbit/s             Broadcast TV, DVD
MPEG-4, H.264       300 kbit/s-12 Mbit/s    Mobile video, podcast, IPTV
H.261, H.263        64 kbit/s-1 Mbit/s      Video teleconferencing, telephony

Each standard has profiles that balance latency, complexity, error resilience
and bandwidth for a specific target application (e.g., file-based vs
transport-based consumption)
Content indexing

 In textual search engines, content needs little (lexical) analysis
  before indexing
    Index elements (words) are part of the content


 In MIR, content cannot be indexed directly
    Indexable metadata must be created from the input data
       – Low level features: concisely describe physical or perceptual properties
         of a media element (e.g., feature vectors)
       – High level features: domain concepts characterizing the content (e.g.,
         extracted objects and their properties, content categorizations, etc)


 In continuous media, extracted features must be related to the
  media segment that they characterize, both in space and time

 Feature extraction may require a change of medium, e.g.,
  speech to text transcription
Motivations for metadata generation

 Computers are not able to grasp the
  underlying meaning of multimedia
  content
    A computer is not able to understand that
     this picture represents a sunset
    Pixels and audio samples do not convey
     semantics, just binary data


 Metadata are used to produce
  representations that are manageable
  by computers
    E.g.: text or numbers
How to create multimedia annotations?

  Manually
     Expensive
       – It can take up to 10x the duration of the video
       – Problems in scaling to millions of content items
     Incomplete or inaccurate
       – People might not be able to holistically catch all the
         meanings associated with a multimedia object
     Difficult
       – Some content is tedious to describe with words
           - E.g., a melody without lyrics

  Automatically
     Good quality
       – Some technologies have a ~90% precision
     “Low” cost
Indexing: the core pipeline


 [Figure: Multimedia content (e.g., MPEG-2 video) → Content processing → Metadata (e.g., MPEG-7) → Metadata indexing → Indexes (e.g., inverted files). Content processing comprises Video processing (Segmentation, Image Analysis, Video Analysis) and Audio processing (Segmentation, Audio Analysis).]
Image/Text segmentation


 GOAL: identify the type of contents
  included in an image
   Text + pictures
   Image sections
Audio Segmentation

  GOAL: split an audio track according to
   contained information
     Music
     Speech
     Noise
    …
  Additional usage
     Identification and removal of ads
Video Segmentation

 Keyframe segmentation:
   segment a video track
    according to its keyframes
     – fixed-length temporal segments
 Shot detection:
   automated detection of
    transitions between shots
     – a shot is a series of interrelated
       consecutive pictures taken
       contiguously by a single camera
       and representing a continuous
       action in time and space.
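
Shot detection can be illustrated with histogram differencing, one common family of techniques; the slides do not say which method is meant, so this choice, the threshold, and the function names are assumptions. Frames are modelled as flat lists of 8-bit gray values of equal length.

```python
def histogram(frame, bins=4, max_value=256):
    """Count how many pixels fall into each brightness bin."""
    counts = [0] * bins
    for px in frame:
        counts[px * bins // max_value] += 1
    return counts

def shot_boundaries(frames, threshold=0.5):
    """Return indices i where frame i starts a new shot (hard cuts only)."""
    hists = [histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        # Dividing by twice the pixel count maps the difference into [0, 1].
        if diff / (2 * len(frames[i])) > threshold:
            cuts.append(i)
    return cuts

# Two dark frames followed by two bright frames: one cut, at frame 2.
frames = [[10] * 8, [12] * 8, [200] * 8, [205] * 8]
print(shot_boundaries(frames))  # [2]
```

Gradual transitions such as fades and dissolves spread the change over many frames and would need a different test, for example a difference accumulated over a window.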
Speaker identification

  GOAL: identify people participating in a
   discussion

   [Figure: audio track with speaker turns labelled ERIC, DAVID, JOHN]




  Additional usage:
    Vocal command execution
Word spotting

 GOAL: recognize spoken words belonging to a
  closed dictionary
   [Figure: spotted words “Call”, “Open”, “Bomb”]

 Additional usage:
   Spot blacklist words in spontaneous speech
     – E.g.: terrorist, attack,…
   dialing (e.g., “Call home”)
   call routing (e.g., “I would like to make a collect
    call”)
   Domotic appliance control
Speech to text


  GOAL: automatically recognize spoken words
   belonging to an open dictionary




  Example: quote_detection.avi



                             CREDITS: Thorsten Hermes@SSMT2006
Identification of audio events

 GOAL: automatically identify audio events of
  interest
    E.g.: shouts, gunshots, etc.


 Additional usage:
    Security applications


 Example: sound_events.avi



                             CREDITS: Thorsten Hermes@SSMT2006
Classification of music genre, mood, etc.

 GOAL: automatically classify the genre and
  mood of a song
    Rock, pop, Jazz, Blues, etc.
    Happy, aggressive, sad, melancholic, etc.



   [Figure: songs labelled “Rock” and “Dance!”]


 Additional usage:
    Automatic selection of songs for playlist
     composition
Images: low-level features

 GOAL: extract implicit characteristics of a
  picture
    luminosity
    orientations
    textures
    Color distribution
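
Two of the features listed above, luminosity and color distribution, can be computed directly from pixel values. This is a minimal sketch: the image is modelled as a list of (R, G, B) tuples, the luma weights are the widely used Rec. 601 coefficients (an assumption, since the slides name no formula), and the function names are hypothetical.

```python
def mean_luminosity(pixels):
    """Average luma of an image given as (R, G, B) tuples with 8-bit channels."""
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)

def color_histogram(pixels, bins=2):
    """Normalized joint RGB histogram with bins**3 cells (a simple feature vector)."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        cell = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[cell] += 1 / len(pixels)
    return hist

image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 255, 255)]
print(mean_luminosity(image))
print(color_histogram(image))
```

The histogram is a fixed-length vector regardless of image size, which is what makes it usable as an index entry and as input to similarity search.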
Images: Optical character recognition (OCR)


  OCR is a technique for
   translating images of typed or
   handwritten text into symbols
  Solved problem for typewritten
   text (99% accuracy)
  Commercial solutions for
   handwritten text (e.g., MS
   Tablet PC)
Image: face identification and recognition


 GOAL: recognize and identify
  faces in an image
 Usage examples:
   People counting
   Security applications
 Example: face_detection.avi




      CREDITS: Thorsten
     Hermes@SSMT2006
Image: concept detection

 Image analysis extracts low-level features from raw data
  (e.g., color histograms, color correlograms, color
  moments, co-occurrence texture matrices, edge
  direction histograms, etc.)
 Features can be used to build discrete classifiers, which
  may associate semantic concepts to images or regions
  thereof
 The MediaMill semantic search engine defines 491
  semantic concepts
    http://www.science.uva.nl/research/mediamill/demo
 Concepts can be detected also from text (e.g., from
  manual or automatic metadata) using NLP techniques
  (FAST text search engine recognizes entities like
  geographical locations, professions, names of persons,
  domain-specific technical concepts, etc)
Image: object identification

 GOAL: identify objects appearing in a picture
   Basketballs, cars, planes, players, etc.




   Also by example (unaware of position, scaling, etc)
     – objectByExample.mp4

                      CREDITS: http://www.youtube.com/user/GuoshenYu
Video OCR

 Video OCR has specific problems, due
  to low resolution, small text size, and
  interference with background
 Detection is normally done on the most
  representative image of an entire
  shot, rather than frame by frame
 Approach: filter for enhancing
  resolution + pattern matching for
  character identification
 Example: Virage ConTEXTract text
  extraction and recognition technology
  (recognizes text in real time)
Multimodal annotation fusion

 Media segmentation and concept extraction are
  probabilistic processes
 The result is characterized by a confidence value
 Significance can be enhanced by comparing the
  output of distinct techniques applied to the same
  or similar problems
 Examples:
   Media segmentation: shot detection + speaker’s turn
    identification
   Person recognition: voice identification + face
    detection
   Concept detection: image-based classification (e.g.,
    “outdoor”, “water”) + object extraction (e.g., “bird”,
    “boat”)
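
One simple way to combine the confidence values of several detectors is a noisy-OR rule. This is a sketch, not the fusion method of any system named in these slides: it assumes the detectors err independently, which rarely holds exactly, and `fuse_confidences` is a hypothetical name.

```python
def fuse_confidences(confidences):
    """Noisy-OR fusion: probability that at least one detector is right,
    assuming the detectors are independent."""
    miss = 1.0
    for c in confidences:
        miss *= 1.0 - c
    return 1.0 - miss

# Voice identification says "person X" with 0.7, face detection with 0.6.
print(fuse_confidences([0.7, 0.6]))  # about 0.88
```

Agreement between modalities raises the fused confidence above either input; a real system would also need a rule for the case where the modalities disagree.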
Overview of the query process
Content querying

 In textual search applications, queries are keywords or
  expressions thereof

 In MIR, search can take place
    By keyword
    By (mono-media) example (e.g., query by image, query
     by humming, query by song similarity)
    By (multi-media) example (e.g., query by video
     similarity)

 Query by example entails real time content processing

 MIR query processing naturally requires the interaction
  of multiple search engines (e.g., a text search engine
  for textual metadata and a content-based search
  engine for feature vectors)
Querying: modalities


 In MIR applications, search keywords match the manual
  or automatic metadata
 A complementary approach is to provide an example of
  the desired content and look for similar media elements
 Similarity is a medium-dependent, domain-dependent,
  and subjective criterion
 Can be computed on low-level features (e.g., image
  color histograms, music bpm) or on high level
  concepts/categorization (e.g., melancholic images,
  party music)
 Can be multimodal (e.g., video similarity)
 Querying may also consider context information (e.g.,
  the user’s geographical position or the access device)
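
Query by example on low-level features can be sketched as a nearest-neighbour ranking over feature vectors. Cosine similarity is used here as one possible measure; as noted above, similarity is medium- and domain-dependent, so the measure, the toy three-bin histograms, and the file names are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def query_by_example(example, collection, top_k=2):
    """Return the names of the top_k items most similar to the example vector."""
    ranked = sorted(collection.items(),
                    key=lambda kv: cosine_similarity(example, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical collection: one color histogram per image.
collection = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg":  [0.5, 0.2, 0.3],
}
print(query_by_example([0.6, 0.3, 0.1], collection))  # ['sunset.jpg', 'beach.jpg']
```

A linear scan like this does not scale to large collections, which is why the architecture relies on dedicated similarity indexes.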
Example query modalities and search types


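 [Figure: example query inputs (the keyword “amsterdam”; the structured query where[contains(“amsterdam”)] and topic[contains(“building”)]; the coordinates 52.37N 4.89E; an image; a song) pass through Query analysis / Federation to specialised engines and their indexes: Text search (inverted index), XML search (semantic index), Geo search (R-tree index), Image similarity search (similarity index), Music search (similarity index)]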
Faceted query

 When a media collection is
  large and its content
  unknown to the user,
  exposing part of the
  metadata can help
 This can be done by
  showing a compact
  representation of the
  categories of content
  (facets)
 A query can be restricted
  by selecting only the
  relevant facets
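
The two operations behind a faceted interface, summarising the collection per facet and restricting a query to selected facet values, can be sketched as follows. The items and facet names are made up for illustration.

```python
from collections import Counter

def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)

def restrict(items, **selected):
    """Keep only the items whose facets equal every selected value."""
    return [item for item in items
            if all(item.get(name) == value for name, value in selected.items())]

# Hypothetical media collection with two facets: type and location.
collection = [
    {"title": "Canal at dusk", "type": "image", "location": "Amsterdam"},
    {"title": "Harbour tour",  "type": "video", "location": "Amsterdam"},
    {"title": "Street organ",  "type": "audio", "location": "Utrecht"},
    {"title": "Tram ride",     "type": "video", "location": "Utrecht"},
]

print(facet_counts(collection, "type"))
print([i["title"] for i in restrict(collection, type="video", location="Amsterdam")])
```

Showing the counts next to each facet value tells the user what the collection contains before any query is typed.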
Querying: by keyword

 The keyword may match the manual metadata and/or the
  automatic metadata
 The match can be multimodal: in the audio, in a visual
  concept
Querying: by similarity – query interface
Content browsing

 In textual search engines,
  results are ranked linearly,
  browsed by navigating
  links, and read at a glance
 In MIR and similarity-
  based search applications,
  browsing results must
  consider multiple
  dimensions
    Relevance: where the
     result appears in the
     sequence of retrieved
     media elements
    Space: where the search
     has matched inside a
     spatially organized media
     element (e.g., an image)
    Time: when a match
     occurs in a linear media
     element
Browsing: timeline-based video access
References

 MPEG-7:
     MPEG-7 Overview
      http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
     Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS
      http://www.sims.berkeley.edu/academics/courses/is202/f03/
 RSS: http://www.rssboard.org/rss-specification
 Media RSS: http://search.yahoo.com/mrss
 MPEG: http://en.wikipedia.org/wiki/MPEG
 Shot detection:
  http://en.wikipedia.org/wiki/Shot_boundary_detection
References

 MediaMill:
  http://www.science.uva.nl/research/mediamill
 Similarity search
    www.midimi.com
    www.tiltomo.com
    http://tineye.com/
 Slides from the course “Archivi Multimediali e Data
  Mining”, Politecnico di Torino, Prof. Silvia Chiusano
 Slides and videos of the lectures given by Prof. Thorsten
  Hermes at the SSMS 2006 summer school
 PHAROS: http://www.pharos-audiovisual-search.eu/
5.2 RSS and readers
Acquisition: RSS and Media RSS


 RSS (Really Simple Syndication) describes a family of web feed
  formats used to publish frequently updated web resources (e.g.,
  news)
 An RSS feed includes full or summarized text, plus metadata
  such as publishing dates and authorship
 RSS formats are specified using XML
 RSS 2.0 is now “frozen”
 Media RSS was proposed by Yahoo as an RSS module that
  supplements the <enclosure> element capabilities of RSS 2.0 to
  allow for more robust media syndication.
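
A minimal sketch of how an acquisition component might read such a feed, using Python's standard `xml.etree.ElementTree`. The feed below is a made-up example with `example.com` URLs; only the element names (`item`, `title`, `pubDate`, `enclosure`, `media:content`) and the Media RSS namespace URI come from the RSS 2.0 and Media RSS specifications.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, written by hand for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>First clip</title>
      <link>http://example.com/clips/1</link>
      <pubDate>Mon, 04 May 2009 10:00:00 GMT</pubDate>
      <enclosure url="http://example.com/clips/1.mp3" length="12345" type="audio/mpeg"/>
      <media:content url="http://example.com/clips/1.mp4" type="video/mp4" duration="90"/>
    </item>
  </channel>
</rss>"""

MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

def parse_items(feed_xml):
    """Extract title, date and media URLs from every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        media = item.find("media:content", MEDIA_NS)
        items.append({
            "title": item.findtext("title"),
            "published": item.findtext("pubDate"),
            "enclosure_url": enclosure.get("url") if enclosure is not None else None,
            "media_url": media.get("url") if media is not None else None,
        })
    return items

print(parse_items(FEED))
```

The plain `<enclosure>` carries one file per item, while the namespaced `media:content` element can describe richer media properties such as duration.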
Acquisition: Example of RSS 2.0
Acquisition: Browser rendition of RSS
Acquisition: an example of Media RSS
Indexing: Media segmentation in MPEG-7
Bloglines: web content aggregator




Google reader




Social bookmarking

    Online shared catalogs of annotated bookmarks
    Dedicated sites are needed just to manage the
     complexity of the bookmark-sharing task




5.3 Personalization
Why Personalization?




    Personalization is an attempt to find the most relevant
     documents using information about user's goals,
     knowledge, preferences, navigation history, etc.
Same Query, Different Intent


    “Cancer”
    Different meanings
       “information about the astronomical/astrological sign of Cancer”
       “information about cancer treatments”
    Different intents
       “are there any new tests for cancer?”
       “information about cancer treatments”
Personalization Algorithms


 Standard IR

 [Figure: the User works with a Client, which sends a Query to the Server and receives Documents back]



 Related to relevance feedback
 Query expansion
 Result re-ranking
User Profile

    A user's profile is a collection of information about
     the user of the system.
    This information is used to get the user to more
     relevant information
Core vs. Extended User Profile

    Core profile
       contains information related to the user search goals and
        interests
    Extended profile
       contains information related to the user as a person, in order to
        understand or model the use that the person will make of the
        information retrieved
Who Maintains the Profile?




      Profile is provided and maintained by the
       user/administrator
         Sometimes the only choice
      The system constructs and updates the profile (automatic
       personalization)
      Collaborative - user and system
         User creates, system maintains
         User can influence and edit
         Does it help or not?
Adaptive Search




     Goals:
        Present documents (pages) that are most suitable for the
         individual user
     Methods:
        Employ user profiles representing short-term and/or long-
         term interests
        Rank and present search results taking both user query and
         user profile into account
Personalized Search: Benefits




      Resolving ambiguity
         The profile provides a context to the query in order to reduce
          ambiguity.
         Example: The profile of interests makes it possible to work out
          what a user who asks about “Berkeley” (“Pirates”, “Jaguar”)
          really wants
      Revealing hidden treasures
         The profile makes it possible to surface the most relevant
          documents, which could be hidden beyond the first results page
         Example: The owner of an iPhone searches for Google Android.
          Pages referring to both would be most interesting
Where to Apply Profiles ?




      The user profile can be applied in several ways:
         To modify the query itself (pre-processing)
         To change the usual way of retrieval
         To process results of a query (post-processing)
         To present document snippets
         Special case: adaptation for meta-search
Pre-Process: Query Expansion




     User profile is applied to add terms to the query
        Popular terms could be added to introduce context
        Similar terms could be added to resolve indexer-user
         mismatch
        Related terms could be added to resolve ambiguity
        Works with any IR model or search engine
Pre-Process: Relevance Feedback


    In this case the profile is used to “move” the query
    Imagine that:
       the documents,
       the query
       the user profile
     are represented by the same set of weighted index terms
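
When query and profile share the same weighted-term representation, “moving” the query can be sketched as a weighted sum of the two vectors. The formula below is in the spirit of Rocchio-style relevance feedback; the slides give no formula, so the combination rule, the `alpha` and `beta` weights, and the function name are assumptions.

```python
def move_query(query, profile, alpha=1.0, beta=0.5):
    """Shift the query vector toward the profile: q' = alpha*q + beta*p.
    Both vectors are sparse dicts mapping index terms to weights."""
    terms = set(query) | set(profile)
    return {t: alpha * query.get(t, 0.0) + beta * profile.get(t, 0.0)
            for t in terms}

# A user whose profile is dominated by cars issues the ambiguous query "jaguar".
query = {"jaguar": 1.0}
profile = {"car": 0.8, "jaguar": 0.2}
print(move_query(query, profile))
```

The moved query now carries weight on “car”, so documents about the vehicle score higher than documents about the animal.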
Post-Processing


   The user profile is used to organize the results of the
    retrieval process
      Present to the user the most interesting documents
      Filter out irrelevant documents
   Extended profile can be used effectively
   In this case the use of the profile adds an extra step to
    processing
   Similar to classic information filtering problem
   Typical way for adaptive Web IR
Post-Filter: Annotations




      The result could be relevant to the user in several aspects.
       Fusing this relevance with query relevance is error prone
       and leads to a loss of data
      Results are ranked by the query relevance, but annotated
       with visual cues reflecting other kinds of relevance
         User interests - Syskill and Webert, group interests -
          KnowledgeSea
Post-Filter: Re-Ranking




      Re-ranking is a typical approach for post-filtering
      Each document is rated according to its relevance
       (similarity) to the user or group profile
      This rating is fused with the relevance rating returned by
       the search engine
      The results are ranked by fused rating
         User model: WIFS, group model: I-Spy
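
The fusion step can be sketched as a linear combination of the two ratings. This is an illustration only, not the method used by WIFS or I-Spy; the mixing `weight`, the scores, and the document names are hypothetical.

```python
def rerank(results, profile_score, weight=0.5):
    """Fuse engine relevance with profile relevance and sort by the fused rating.
    `results` is a list of (doc_id, engine_score); `profile_score` maps a doc_id
    to its similarity with the user or group profile."""
    fused = [(doc, (1 - weight) * engine + weight * profile_score(doc))
             for doc, engine in results]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)

engine_results = [("a", 0.9), ("b", 0.8), ("c", 0.4)]
profile_similarity = {"a": 0.1, "b": 0.9, "c": 0.5}

ranking = rerank(engine_results, profile_similarity.get)
print([doc for doc, _ in ranking])  # ['b', 'a', 'c']
```

Document “b” overtakes “a” because it matches the profile far better, even though the engine ranked it second.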
Privacy related problems


    Web information retrieval research faces a challenge: the data
     required to perform evaluations, namely query logs and
     click-through data, is not readily available due to valid
     privacy concerns.
    Researchers can:
       Limit to small (and sometimes biased) samples of users,
        restricting somewhat the conclusions that can be drawn.
       Limit the usage of private data to local computation, exploiting
        personal data only in post-processing search results.
       Look for publicly available data that can be used to approximate
        query logs and click-through data (such as user bookmarks).




Tag Data and Personalized Information
Retrieval

 Recently it has been shown that the information contained in
  social bookmarking (tagging) systems may be useful for
  improving Web search.
 Using data from the social bookmarking site del.icio.us, it is
  possible to demonstrate how one can rate the quality of
  personalized retrieval results.
 A user's “bookmark history” can be used to improve search
  results via personalization.
    Analogously to studies involving implicit feedback mechanisms
     in IR, which have found that profiles based on the content of
     clicked URLs outperform those based on past queries alone,
     profiles based on the content of bookmarked URLs are
     generally superior to those based on tags alone.


Tag Data and Personalized Information
Retrieval

 Social bookmarking systems such as del.icio.us and
  Bibsonomy are a recent and popular phenomenon.
 Users label interesting web pages (or research articles) with
  primarily short and unstructured annotations in natural
  language called tags.
 These sites offer an alternative model for discovering
  information online.
    Rather than following the traditional model of submitting
     queries to a Web search engine, users can browse tags as though
     they were directories looking for popular pages that have been
     tagged by a number of different users. Since tags are chosen by
     users from an unrestricted vocabulary, these systems can be
     seen to provide consensus categorizations of interesting
     websites.

Tag Data and Personalized Information
Retrieval

 How can social bookmarking data be used to improve Web
  search?
 Can tag data be used to approximate actual user queries to a
  search engine?
 How can personalized IR systems be evaluated using information
  contained in social bookmarks (tag data)?
 Is there enough information in (i.e. a strong enough
  correlation between) the tags/bookmarks in a user's history in
  order to build a profile of the user that will be useful for
  personalizing search engine results?




Models for generating a profile of the
user

 We record the (time ordered) stream of webpages that have been
  bookmarked by a particular user
 The first simple profile involves counting the occurrences of
  terms in the tags of any of the known bookmarks.
 An obvious problem is that users often have multiple interests
  and their many bookmarks cover a range of topics. Thus some
  bookmarks may be completely unrelated to the nth bookmark
  (and thus the tags being used as the current query).




 The second source of information in the bookmarks is the
  content of the bookmarked pages themselves.
 One would expect, given the much larger vocabulary of Web
  pages compared to tag data, that content may prove more
  useful than tags. Indeed, content-based profiles are more
  useful than query-based ones.
 A user spends more time deliberating which pages to
  bookmark than deciding which search results to click on.
 Since a user will only bookmark sites that they find
  particularly useful or interesting, these documents should
  contain a lot of useful information about the user and the
  content of bookmarked documents is particularly useful for
  personalization.

 The previous profile is somewhat ad hoc in deciding which
  documents to include and which to leave out.
 In theory, we would like to include all documents that the user
  has bookmarked, but weight them according to their expected
  usefulness for resolving ambiguity in the current query.
 Our first attempt to estimate the distance between two
  bookmarks is to count the number of common terms in their
  respective sets of tags (more shared terms means a smaller
  distance)
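A small sketch of this weighting idea, assuming each past bookmark contributes its tags to the profile in proportion to the number of tags it shares with the current bookmark. The exact weighting scheme is an assumption; the slides only say that common terms are counted.

```python
def tag_overlap(tags_a, tags_b):
    """Number of tag terms two bookmarks have in common."""
    return len(set(tags_a) & set(tags_b))

def weighted_profile(history, current_tags):
    """Profile in which each past bookmark is weighted by its tag overlap
    with the current bookmark (assumed weighting, for illustration only)."""
    profile = {}
    for tags in history:
        weight = tag_overlap(tags, current_tags)
        for tag in tags:
            profile[tag] = profile.get(tag, 0) + weight
    return profile

# Hypothetical tag sets of earlier bookmarks and of the current one.
history = [["python", "programming"], ["cooking", "recipes"], ["python", "numpy"]]
profile = weighted_profile(history, ["python", "tutorial"])
print(profile)
```

Unrelated bookmarks (here the cooking one) receive weight 0 and no longer add noise to the profile.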




How do we use these profiles?

    In order to incorporate the user profile for personalized
     information retrieval, queries are expanded with terms from
     the profile, weighted appropriately.
    The number of expansion terms added to the query is
     limited, so as to limit the amount of noise and the total
     length of the expanded query.
    In particular, the K most frequent terms from the profile are
     added, and their weights are renormalized to account for the
     profile terms that were left out.
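The expansion step can be sketched as follows. The function name, the `profile_mass` parameter (the total weight given to expansion terms) and the sample profile are assumptions made for illustration; the slides do not specify how weights are set.

```python
from collections import Counter

def expand_query(query_terms, profile, k=3, profile_mass=0.5):
    """Add the k most frequent profile terms to the query.

    Original query terms get weight 1.0. The expansion terms share
    `profile_mass` in proportion to their counts, renormalized over the
    k terms that were kept (the remaining profile terms are dropped).
    """
    top = [(t, c) for t, c in profile.most_common() if t not in query_terms][:k]
    total = sum(count for _, count in top)
    expanded = {term: 1.0 for term in query_terms}
    if total > 0:
        for term, count in top:
            expanded[term] = profile_mass * count / total
    return expanded

# Hypothetical profile built from a user's bookmarks.
profile = Counter({"python": 5, "programming": 3, "cooking": 1, "numpy": 1})
expanded = expand_query(["python"], profile, k=2)
print(expanded)
```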




5.4 Recommendation systems
Introduction to Recommender Systems


      Systems for recommending items (e.g. books, movies,
       CDs, web pages, newsgroup messages) to users based on
       examples of their preferences.
     Objective:
        To propose objects fitting the user needs/wishes
        To sell services (site visits) or goods
     Many search engines and on-line stores provide
      recommendations (e.g. Amazon, CDNow).
     Recommenders have been shown to substantially increase
      clicks (and sales).
Book Recommender


     [Figure: rated books (Red Mars, Foundation, Jurassic Park,
      Lost World, 2001, Difference Engine) feed a Machine Learning
      component that builds a User Profile, which in turn produces
      recommendations (Neuromancer, 2010).]
Personalization


       Recommenders are instances of personalization software.
       Personalization concerns adapting to the individual
        needs, interests, and preferences of each user.
       Includes:
          Recommending
          Filtering
          Predicting (e.g. form or calendar appt. completion)
       From a business perspective, it is viewed as part of
        Customer Relationship Management (CRM).
Machine Learning and Personalization


     Machine Learning can allow learning a user
      model or profile of a particular user based on:
        Sample interaction
        Rated examples
        Similar user profiles
     This model or profile can then be used to:
        Recommend items
        Filter information
        Predict behavior
Types of recommendation systems


       1.Search-based recommendations
       2.Category-based recommendations
       3.Collaborative filtering
       4.Clustering
       5.Association rules
       6.Information filtering
       7.Classifiers
1. Search-based recommendations


      The visitor only types a search query
        « data mining customer »
     The system retrieves all the items that
      correspond to that query
        e.g. 6 books
     The system recommends some of these books
      based on general, non-personalized ranking
      (sales rank, popularity, etc.)
Search-based recommendations

     Pros:
        Simple to implement


     Cons:
        Not very powerful
        Which criteria to use to rank recommendations?
        Is it really « recommendations »?
        The user only gets what he asked for
2. Category-based recommendations

     Each item belongs to one category or more.
      Explicit / implicit choice:
         The customer selects a category of interest
          (refine search, opt-in for category-based
          recommendations, etc.).
           – « Subjects > Computers & Internet > Databases > Data
             Storage & Management > Data Mining »
         The system selects categories of interest on
          behalf of the customer, based on the current item
          viewed, past purchases, etc.
      Certain items (bestsellers, new items) are
       eventually recommended
Category-based recommendations

     Pros:
        Still simple to implement


     Cons:
        Again: not very powerful, which criteria to use to
         order recommendations? is it really
         « recommendations »?
        Capacity highly depends upon the kind of
         categories implemented
          – Too specific: not efficient
          – Not specific enough: no relevant recommendations
3. Collaborative filtering

      Collaborative filtering techniques « compare »
       customers, based on their previous purchases,
       to make recommendations to « similar »
       customers
      It’s also called « social » filtering
      Follow these steps:
         1.Find customers who are similar (« nearest
           neighbors ») in terms of tastes, preferences, past
           behaviors
        2.Aggregate weighted preferences of these
          neighbors
        3.Make recommendations based on these
          aggregated, weighted preferences (most
          preferred, unbought items)
Collaborative filtering

      Example: the system needs to make
       recommendations to customer C
                Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
   Customer A     X                          X
   Customer B              X        X                 X
   Customer C              X        X
   Customer D              X                                   X
   Customer E     X                                   X

      Customer B is very close to C (he has bought all
       the books C has bought). Book 5 is highly
       recommended
      Customer D is somewhat close. Book 6 is
       recommended to a lower extent
      Customers A and E are not similar at all.
       Weight=0
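A minimal sketch of the three steps on the purchase table above. Jaccard overlap is used as the similarity measure; that choice is an assumption, since the slides do not name a measure, but it gives weight 0 to A and E and ranks Book 5 above Book 6 as described.

```python
# Purchases from the table above: customer -> set of book numbers.
purchases = {
    "A": {1, 4},
    "B": {2, 3, 5},
    "C": {2, 3},
    "D": {2, 6},
    "E": {1, 5},
}

def jaccard(a, b):
    """Shared purchases divided by all purchases of either customer."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, purchases):
    """Score unbought books by the summed similarity of their buyers."""
    owned = purchases[target]
    scores = {}
    for other, items in purchases.items():
        if other == target:
            continue
        weight = jaccard(owned, items)          # step 1: neighbor weight
        for item in items - owned:              # step 2: aggregate preferences
            scores[item] = scores.get(item, 0.0) + weight
    ranked = [(item, s) for item, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda pair: -pair[1])  # step 3: rank

recommendations = recommend("C", purchases)
print(recommendations)
```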
Collaborative filtering

  Pros:
      Extremely powerful and efficient
      Very relevant recommendations
       The bigger the database and the more past behaviors
        recorded, the better the recommendations
  Cons:
      Difficult to implement, resource and time-consuming
      What about a new item that has never been
       purchased?
       Cannot be recommended
      What about a new customer who has never bought
       anything? Cannot be compared to other customers
         no items can be recommended
4. Clustering


      Another way to make recommendations based
       on past purchases of other customers is to
       cluster customers into categories
      Each cluster will be assigned « typical »
       preferences, based on preferences of customers
       who belong to the cluster
      Customers within each cluster will receive
       recommendations computed at the cluster level
Clustering


                Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
   Customer A     X                          X
   Customer B              X        X                 X
   Customer C              X        X
   Customer D              X                                   X
   Customer E     X                                   X

     Customers B, C and D are « clustered »
      together. Customers A and E are clustered into
      another separate group
     « Typical » preferences for CLUSTER are:
         Book 2, very high
         Book 3, high
         Books 5 and 6, may be recommended
         Books 1 and 4, not recommended at all
Clustering


                Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
   Customer A     X                          X
   Customer B              X        X                 X
   Customer C              X        X
   Customer D              X                                   X
   Customer E     X                                   X
   Customer F                       X                 X

     How does it work?
      Any customer classified as a
       member of CLUSTER will receive
       recommendations based on preferences of the
       group:
         Book 2 will be highly recommended to Customer F
         Book 6 will also be recommended to some extent
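A sketch of cluster-level recommendations on the table above. The cluster membership (B, C, D) is taken from the slides rather than computed by a clustering algorithm, and assigning Customer F to that cluster is likewise an assumption of this example.

```python
from collections import Counter

# Purchases from the table above: customer -> set of book numbers.
purchases = {
    "A": {1, 4},
    "B": {2, 3, 5},
    "C": {2, 3},
    "D": {2, 6},
    "E": {1, 5},
    "F": {3, 5},
}
cluster = ["B", "C", "D"]  # membership as given on the slide, not computed

def cluster_preferences(members, purchases):
    """'Typical' preferences: how many cluster members bought each book."""
    counts = Counter()
    for member in members:
        counts.update(purchases[member])
    return counts

def recommend_from_cluster(customer, members, purchases):
    """Books popular in the cluster that the customer does not own yet."""
    prefs = cluster_preferences(members, purchases)
    owned = purchases[customer]
    return [(book, n) for book, n in prefs.most_common() if book not in owned]

prefs = cluster_preferences(cluster, purchases)
for_f = recommend_from_cluster("F", cluster, purchases)
print(prefs, for_f)
```

The counts match the slide: Book 2 is bought by all three members, Book 3 by two, Books 5 and 6 by one each, and Books 1 and 4 by none.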
Clustering

     Problem: customers may belong to more than
      one cluster; clusters may overlap
     Predictions are then averaged across the
      clusters, weighted by participation
                    Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
       Customer A     X                          X
       Customer B              X        X                 X
       Customer C              X        X
       Customer D              X                                   X
       Customer E     X                                   X
       Customer F                       X                 X

Clustering

     Pros:
        Clustering techniques work on aggregated data:
         faster
        It can also be applied as a « first step » for
         shrinking the selection of relevant neighbors in a
         collaborative filtering algorithm


     Cons:
        Recommendations (per cluster) are less relevant
         than collaborative filtering (per individual)
5. Association rules

      Clustering works at a group (cluster) level
      Collaborative filtering works at the customer
       level
      Association rules work at the item level
Association rules

          Past purchases are transformed into
           relationships of common purchases

                          Book 1       Book 2       Book 3    Book 4     Book 5    Book 6
    Customer A              X                                   X
    Customer B                           X            X                    X
    Customer C                           X            X
    Customer D                           X                                           X
    Customer E              X                                              X
    Customer F                                        X                    X

                                      Also bought…
    Customers who bought…   Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
    Book 1                                               1        1
    Book 2                                      2                 1        1
    Book 3                             2                          2
    Book 4                    1
    Book 5                    1        1        2
    Book 6                             1
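The co-purchase counts in the matrix can be derived from the purchase table with a few lines. This is a sketch; the variable names are illustrative.

```python
from collections import Counter
from itertools import permutations

# Purchases from the table above: customer -> set of book numbers.
baskets = {
    "A": {1, 4},
    "B": {2, 3, 5},
    "C": {2, 3},
    "D": {2, 6},
    "E": {1, 5},
    "F": {3, 5},
}

# co_bought[(x, y)] = number of customers who bought both x and y.
co_bought = Counter()
for items in baskets.values():
    for x, y in permutations(sorted(items), 2):
        co_bought[(x, y)] += 1

for x in range(1, 7):
    print(f"Book {x}:", [co_bought[(x, y)] for y in range(1, 7)])
```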
Association rules

      These association rules are then used to make
       recommendations
      If a visitor has some interest in Book 5, he will
       be recommended to buy Book 3 as well
      Recommendations are constrained to some
       minimum levels of confidence
      What if recommendations can be made using
       more than one piece of information?
                   Recommendations are aggregated
                                      Also bought…
    Customers who bought…   Book 1   Book 2   Book 3   Book 4   Book 5   Book 6
    Book 1                                               1        1
    Book 2                                      2                 1        1
    Book 3                             2                          2
    Book 4                    1
    Book 5                    1        1        2
    Book 6                             1
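A sketch of rule-based recommendation with a minimum confidence. Confidence of "X, so also Y" is computed as (customers who bought X and Y) / (customers who bought X). Summing confidences to aggregate several interests and the 0.5 threshold are assumptions; the slides only state that recommendations are aggregated and constrained by a minimum confidence.

```python
from collections import Counter
from itertools import permutations

# One basket per customer (A to F), as in the purchase table.
baskets = [{1, 4}, {2, 3, 5}, {2, 3}, {2, 6}, {1, 5}, {3, 5}]

item_count = Counter(item for basket in baskets for item in basket)
pair_count = Counter(p for basket in baskets for p in permutations(basket, 2))

def confidence(x, y):
    """Share of the buyers of x who also bought y."""
    return pair_count[(x, y)] / item_count[x]

def recommend(interests, min_confidence=0.5):
    """Aggregate (sum) the confidences of all rules that pass the threshold."""
    scores = Counter()
    for x in interests:
        if item_count[x] == 0:
            continue  # unknown item: no rule can be derived from it
        for y in item_count:
            if y not in interests and confidence(x, y) >= min_confidence:
                scores[y] += confidence(x, y)
    return [item for item, _ in scores.most_common()]

print(recommend({5}))      # interest in Book 5 only
print(recommend({2, 5}))   # two pieces of information, aggregated
```

Lowering `min_confidence` lets weaker rules through, which is the temptation the cons list for this method warns about.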
Association rules

     Pros:
        Fast to implement
        Fast to execute
        Not much storage space required
        Not « individual » specific
        Very successful in broad applications for large
         populations, such as shelf layout in retail stores


     Cons:
        Not suitable if knowledge of preferences changes
         rapidly
        It is tempting not to apply restrictive
         confidence rules
             May lead to literally stupid recommendations
6. Information filtering


      Association rules compare items based on past
       purchases
      Information filtering compares items based on
       their content
      Also called « content-based filtering » or
       « content-based recommendations »
         Can exploit syntactical information on objects
          (features)
         But also semantic knowledge of objects
          (concepts/ontologies)
Information filtering

      What is the « content » of an item?


      It can be explicit « attributes » or
       « characteristics » of the item. For example for a
       film:
         Action / adventure
         Features Bruce Willis
         Year 1995


      It can also be « textual content » (title,
       description, table of contents, etc.)
         Several techniques exist to compute the distance
          between two textual documents
Information filtering


      How does it work?
         A textual document is scanned and parsed
         Word occurrences are counted (may be stemmed)
         Some words or « tokens » are not taken into
          account: rarely used words and « stop words »
         Each document is transformed into a normed
          TFIDF vector of size N (Term Frequency / Inverse
          Document Frequency).
         The distance between any pair of vectors is
          computed
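The pipeline above can be sketched as follows. This uses one common TF-IDF variant (raw term count times log(N / df), then L2 normalization); the stop-word list is a tiny illustrative one, stemming is omitted, and the three sample documents are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "and", "for", "of", "the", "to"}  # illustrative only

def tokenize(text):
    """Lower-case, split into words, and drop stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """One L2-normalized {term: weight} vector per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        raw = {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else raw)
    return vectors

def cosine(u, v):
    """Cosine similarity of two normalized sparse vectors (1 = identical)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

docs = ["Data mining for CRM", "Data mining your website", "Consumer behavior"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A distance can be taken as 1 minus the cosine similarity, so documents sharing rare terms end up closest.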
Information filtering

      An (unrealistic) example: how to compute
       recommendations between 8 books based only on their
       title?


      Books selected:
         Building data mining applications for CRM
         Accelerating Customer Relationships: Using CRM and
          Relationship Technologies
         Mastering Data Mining: The Art and Science of Customer
          Relationship Management
         Data Mining Your Website
         Introduction to marketing
         Consumer behavior
         marketing research, a handbook
         Customer knowledge management
COUNT
                building data       Accelerating    Mastering Data    Data Mining Your   Introduction to   consumer     marketing      customer
                   mining            Customer       Mining: The Art       Website           marketing       behavior   research, a    knowledge
               applications for    Relationships:   and Science of                                                      handbook     management
                     crm          Using CRM and       Customer
                                    Relationship     Relationship
                                   Technologies      Management
a                                                                                                                          1
accelerating                            1
and                                     1                 1
application           1
art                                                       1
behavior                                                                                                      1
building              1
consumer                                                                                                      1
crm                   1                 1
customer                                1                 1                                                                              1
data                  1                                   1                  1
for                   1
handbook                                                                                                                   1
introduction                                                                                   1
knowledge                                                                                                                                1
management                                                1                                                                              1
marketing                                                                                      1                           1
mastering                                                 1
mining                1                                   1                  1
of                                                        1
relationship                            2                 1
research                                                                                                                   1
science                                                   1
technology                              1
the                                                       1
to                                                                                             1
using                                   1
website                                                                      1
your                                                                         1
TFIDF Normed Vectors
                Building data     Accelerating      Mastering Data     Data Mining Your   Introduction to Consumer     Marketing     Customer
                mining            Customer          Mining: The Art    Website            marketing       behavior     research, a   knowledge
                applications      Relationships:    and Science of                                                     handbook      management
                for CRM           Using CRM and     Customer
                                  Relationship      Relationship
                                  Technologies      Management
a                   0.000             0.000             0.000              0.000             0.000          0.000        0.537         0.000
accelerating        0.000             0.432             0.000              0.000             0.000          0.000        0.000         0.000
and                 0.000             0.296             0.256              0.000             0.000          0.000        0.000         0.000
application         0.502             0.000             0.000              0.000             0.000          0.000        0.000         0.000
art                 0.000             0.000             0.374              0.000             0.000          0.000        0.000         0.000
behavior            0.000             0.000             0.000              0.000             0.000          0.707        0.000         0.000
building            0.502             0.000             0.000              0.000             0.000          0.000        0.000         0.000
consumer            0.000             0.000             0.000              0.000             0.000          0.707        0.000         0.000
crm                 0.344             0.296             0.000              0.000             0.000          0.000        0.000         0.000
customer            0.000             0.216             0.187              0.000             0.000          0.000        0.000         0.381
data                0.251             0.000             0.187              0.316             0.000          0.000        0.000         0.000
for                 0.502             0.000             0.000              0.000             0.000          0.000        0.000         0.000
handbook            0.000             0.000             0.000              0.000             0.000          0.000        0.537         0.000
introduction        0.000             0.000             0.000              0.000             0.636          0.000        0.000         0.000
knowledge           0.000             0.000             0.000              0.000             0.000          0.000        0.000         0.763
management          0.000             0.000             0.256              0.000             0.000          0.000        0.000         0.522
marketing           0.000             0.000             0.000              0.000             0.436          0.000        0.368         0.000
mastering           0.000             0.000             0.374              0.000             0.000          0.000        0.000         0.000
mining              0.251             0.000             0.187              0.316             0.000          0.000        0.000         0.000
of                  0.000             0.000             0.374              0.000             0.000          0.000        0.000         0.000
relationship        0.000             0.468             0.256              0.000             0.000          0.000        0.000         0.000
research            0.000             0.000             0.000              0.000             0.000          0.000        0.537         0.000
science             0.000             0.000             0.374              0.000             0.000          0.000        0.000         0.000
technology          0.000             0.432             0.000              0.000             0.000          0.000        0.000         0.000
the                 0.000             0.000             0.374              0.000             0.000          0.000        0.000         0.000
to                  0.000             0.000             0.000              0.000             0.636          0.000        0.000         0.000
using               0.000             0.432             0.000              0.000             0.000          0.000        0.000         0.000
website             0.000             0.000             0.000              0.632             0.000          0.000        0.000         0.000
your                0.000             0.000             0.000              0.632             0.000          0.000        0.000         0.000
Information filtering

      A customer is interested in the following book:
       « Building data mining applications for CRM »
      The system computes distances between this book and the
       7 others
      The « closest » books are recommended:
         #1: Data Mining Your Website
         #2: Accelerating Customer Relationships: Using CRM
          and Relationship Technologies
         #3: Mastering Data Mining: The Art and Science of
          Customer Relationship Management
         Not recommended: Introduction to marketing
         Not recommended: Consumer behavior
         Not recommended: marketing research, a handbook
         Not recommended: Customer knowledge
          management
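The ranking above can be reproduced with the sketch below. The slides do not state the weighting formula; the one used here, log2(1 + tf) × ln((N + 1) / df) followed by L2 normalization, is an inference that happens to match the table values to three decimals. The titles are entered already lower-cased and stemmed, following the rows of the COUNT table.

```python
import math
from collections import Counter

# Title -> tokens as they appear in the COUNT table (stemmed by hand).
titles = {
    "Building data mining applications for CRM":
        "building data mining application for crm",
    "Accelerating Customer Relationships: Using CRM and Relationship Technologies":
        "accelerating customer relationship using crm and relationship technology",
    "Mastering Data Mining: The Art and Science of Customer Relationship Management":
        "mastering data mining the art and science of customer relationship management",
    "Data Mining Your Website": "data mining your website",
    "Introduction to marketing": "introduction to marketing",
    "Consumer behavior": "consumer behavior",
    "marketing research, a handbook": "marketing research a handbook",
    "Customer knowledge management": "customer knowledge management",
}

def tfidf(tokens_by_doc):
    """Normalized vectors using the inferred (not documented) weighting."""
    n = len(tokens_by_doc)
    df = Counter(t for toks in tokens_by_doc.values() for t in set(toks))
    vectors = {}
    for name, toks in tokens_by_doc.items():
        raw = {t: math.log2(1 + c) * math.log((n + 1) / df[t])
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors[name] = {t: w / norm for t, w in raw.items()}
    return vectors

def similarity(u, v):
    """Cosine similarity of two normalized sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

vectors = tfidf({name: text.split() for name, text in titles.items()})
target = "Building data mining applications for CRM"
ranking = sorted(
    ((similarity(vectors[target], vec), name)
     for name, vec in vectors.items() if name != target),
    reverse=True,
)
for score, name in ranking:
    print(f"{score:.3f}  {name}")
```

Only the first three titles share any term with the target, so the remaining four get similarity 0 and are not recommended.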
Information filtering

      Pros:
         No need for past purchase history
         Not extremely difficult to implement


      Cons:
         « Static » recommendations
         Not efficient if content is not very informative
          e.g. information filtering is more suited to
          recommend technical books than novels or movies
7. Classifiers

      Classifiers are general computational models
      They may take as inputs:
         Vector of item features (action / adventure, Bruce
          Willis)
         Preferences of customers (like action / adventure)
         Relations among items
      They may give as outputs:
         Classification
         Rank
         Preference estimate
      The model can be a neural network, Bayesian network, rule
       induction model, etc.
      The classifier is trained using a training set
Classifiers

      Pros:
         Versatile
         Can be combined with other methods to improve
          accuracy of recommendations


      Cons:
         Need a relevant training set
Collaborative Filtering

      Maintain a database of many users’ ratings of a variety of
       items.
      For a given user, find other similar users whose ratings
       strongly correlate with the current user.
      Recommend items rated highly by these similar users, but
       not rated by the current user.
      Almost all existing commercial recommenders use this
       approach (e.g. Amazon).
Collaborative Filtering




      [Figure: the active user's ratings (A 9, B 3, C unrated, ..., Z 5)
       are matched by correlation against the user database; the
       best-matching user's ratings (A 10, B 4, C 8, ..., Z 1) are used
       to extract the recommendation, item C.]
Collaborative Filtering Method

     Weight all users with respect to similarity with
      the active user.
     Select a subset of the users (neighbors) to use
      as predictors.
     Normalize ratings and compute a prediction from
      a weighted combination of the selected
      neighbors’ ratings.
     Present items with highest predicted ratings as
      recommendations.
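A minimal sketch of these four steps, assuming Pearson correlation over co-rated items as the similarity weight and mean-centering as the normalization. Both are common choices but are not specified in the slides, and the ratings below are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "active": {"A": 9, "B": 3, "Z": 5},
    "u1": {"A": 10, "B": 4, "C": 8, "Z": 1},
    "u2": {"A": 5, "B": 3, "C": 2, "Z": 7},
    "u3": {"A": 6, "B": 4, "C": 5},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(r_a, r_u):
    """Correlation of two users over the items both have rated."""
    common = set(r_a) & set(r_u)
    if len(common) < 2:
        return 0.0
    m_a = mean(r_a[i] for i in common)
    m_u = mean(r_u[i] for i in common)
    num = sum((r_a[i] - m_a) * (r_u[i] - m_u) for i in common)
    den = math.sqrt(sum((r_a[i] - m_a) ** 2 for i in common)
                    * sum((r_u[i] - m_u) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, item, ratings, n_neighbors=2):
    """Active user's mean plus the weighted, mean-centered neighbor ratings."""
    r_a = ratings[active]
    candidates = [(pearson(r_a, r_u), u) for u, r_u in ratings.items()
                  if u != active and item in r_u]          # weight all users
    neighbors = sorted(candidates, reverse=True)[:n_neighbors]  # select subset
    num = sum(w * (ratings[u][item] - mean(ratings[u].values()))
              for w, u in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    base = mean(r_a.values())
    return base + num / den if den else base

prediction = predict("active", "C", ratings)
print(round(prediction, 3))
```

Note that `u3` reaches a correlation of 1.0 from only two co-rated items, which is exactly the situation significance weighting is meant to correct.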
Significance Weighting

     Important not to trust correlations based on very
      few co-rated items.
     Include significance weights, based on number of
      co-rated items.
        If no items are rated by both users, correlation is not meaningful
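One common heuristic, sketched here as an assumption since the slides give no formula, is to scale the correlation linearly with the number of co-rated items up to a cutoff. The cutoff of 50 is an illustrative default.

```python
def significance_weight(correlation, n_co_rated, cutoff=50):
    """Devalue correlations that rest on few co-rated items.

    With no co-rated items the correlation is not meaningful, so the
    weight is 0. From `cutoff` items on, the correlation is kept as is.
    """
    if n_co_rated == 0:
        return 0.0
    return correlation * min(n_co_rated, cutoff) / cutoff

print(significance_weight(1.0, 2))     # perfect correlation on 2 items
print(significance_weight(0.8, 100))   # well-supported correlation
```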
Neighbor Selection

     For a given active user, a, select correlated users
      to serve as source of predictions.
     Standard approach is to use the n most similar
      users, u, based on similarity weights w_{a,u}
     Alternate approach is to include all users whose
      similarity weight is above a given threshold.
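Both selection strategies can be sketched in a few lines. The weights below are hypothetical values of w_{a,u} for one active user.

```python
def top_n_neighbors(weights, n):
    """Standard approach: the n users with the highest similarity weight."""
    return sorted(weights, key=weights.get, reverse=True)[:n]

def threshold_neighbors(weights, threshold):
    """Alternate approach: every user whose weight exceeds the threshold."""
    return [u for u, w in weights.items() if w > threshold]

# Hypothetical similarity weights between the active user and others.
weights = {"u1": 0.79, "u2": 0.33, "u3": 0.04}
print(top_n_neighbors(weights, 2))
print(threshold_neighbors(weights, 0.5))
```

The top-n rule always returns neighbors even when none is very similar, while the threshold rule may return none at all.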
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business
Strategic scenarios in digital content and digital business

Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 

Recently uploaded (20)

UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5UiPath Studio Web workshop series - Day 5
UiPath Studio Web workshop series - Day 5
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization100+ ChatGPT Prompts for SEO Optimization
100+ ChatGPT Prompts for SEO Optimization
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?Governance in SharePoint Premium:What's in the box?
Governance in SharePoint Premium:What's in the box?
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 

Strategic scenarios in digital content and digital business

  • 1. Strategic Scenarios in Digital contents Marco Brambilla et al. Politecnico di Milano, DEI and MIP Acer Academy May 2009 http://home.dei.polimi.it/mbrambil/
  • 2. Agenda overview  Information overload  Evolution of contents  Web 2.0  Web 3.0  Tools and technologies for managing information overload
  • 4. Introduction and motivation  161 exabytes of information were created or replicated worldwide in 2006  IDC estimates 6X growth by 2010 to 988 exabytes (about a zettabyte) / year  That's more than in the previous 5,000 years. – DATA from: Dr. Michael L. Brodie - Chief Scientist Verizon
  • 5. Where does content come from  The largest source of data?  USERS  YouTube Videos  1.7 billion served / month  1 million streams / day = 75 billion e-mails  Facebook had [in 2007] …  1.8 billion photos  31 million active users  100,000 new users / day  1,800 applications  MySpace, 185+ million registered users (Apr 2007), has…  Images: – 1+ billion - Millions uploaded / day - 150,000 requests / sec  Songs: – 25 million - 250,000 concurrent streams  Videos: – 60 TB - 60,000 uploaded / day - 15,000 concurrent streams
  • 6. Quality of data  (User Generated) Content is:  25% original; 75% replicated  25% from the workplace; 75% not  95% unstructured and growing  While enterprise data is 10-15% structured and decreasing  Main challenges:  How to make multimedia content available to search engines and search based applications?  Exploiting multimedia content requires: – Acquiring it – (Re) Formatting it – Indexing it – Querying it – Transmitting it – Browsing it
  • 7. Information overload effects on (our) way of working For knowledge workers • Time is limited • Processes overlap • Knowledge is (often) artefact-dependent • Tools allow multiplicity of uses • Need for several tools • Relations with people take time • Contexts mix and merge
  • 9. Working with information  Types of information  Usefulness – Active: ephemeral and working (“hot”) – Dormant: inactive, potentially useful (“cold”) – Not useful – Un-accessed  Ownership: mine or not-mine  Activities  Acquisition of items to form a collection  Organisation of items  Maintenance of the collection (e.g. archiving items into long-term storage)  Retrieval of items for reuse  Information (and choice) overload... on YouTube
  • 10. Acquisition  Different between tools  Manual (files), uncontrolled (e-mails)  Push vs. pull  Reasons for deciding how to store information  Portability  Number of access points  Preservation of information in its current state  Currency of information  Context  Reminding  Ease of integration into existing structures  Communication and information sharing  Ease of maintenance
  • 11. Organisation  Categorisations are complex  Folders vs. keywords  Trees vs. webs  Change over time  Categorisations are local  If two groups of people construct thesauri in a particular subject area, the overlap of index terms will only be 60%  Two indexers using the same thesaurus on the same document use common index terms in only 30% of cases  The output from two experienced database searchers has only 40% overlap  Experts' judgements of relevance concur in only 60% of cases
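The overlap percentages quoted on the Organisation slide can be made concrete with a small calculation. The slide does not say how overlap was measured, so this sketch assumes a Jaccard-style ratio (shared terms divided by all distinct terms); the term sets are invented for illustration.

```python
def overlap(terms_a, terms_b):
    """Share of distinct index terms that both indexers used (Jaccard ratio)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty sets agree trivially
    return len(a & b) / len(a | b)


# Hypothetical index terms chosen by two groups for the same subject area.
group_one = {"web", "blog", "wiki", "tag"}
group_two = {"web", "blog", "wiki", "feed"}

ratio = overlap(group_one, group_two)  # 3 shared terms out of 5 distinct terms
```

With these invented sets the two groups agree on three of five distinct terms, which is the kind of partial agreement the slide describes.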
  • 12. Maintenance  Hardly any  Occasional cleaning  Extensive maintenance is related to major life changes (e.g. new job)
  • 13. Retrieval  Personal archives instead of corporate systems  Need to start searching  Not invented here: reinventing is more fun than reusing  Asking is more difficult than sharing  Social search: asking others  Estimations of quality and relevance are best made by experts themselves  It's the fastest and most efficient way  Colleagues can give you feedback and help to sharpen your questions  Consulting others is fun  While searching systems  Preference for location-based search  Critical reminding function of file placement  Lack of retrieval of archived files
  • 14. 2. Evolution of contents
  • 15. Evolution of contents and technologies  I. from static to dynamic  II. from fixed to mobile  III. from big to small  IV. from local to global  V. from vertical to horizontal  VI. from sometimes-on to always-on  VII. from wired to wireless  VIII. from divergence to convergence
  • 16. Content proliferation and classification  Proliferation of  blogs  online video  podcasting  other social media tools  the definition of what constitutes 'web'/'non-web' content has become increasingly blurred
  • 17. Pervasive and convergent digital content
  • 21. Social- vs. Group-ware  The basic model of 90's era collaboration (Lotus Notes): all about the group.  Information was managed in group-based repositories, then passed around for review, or published to intranet portals via customized apps. Information era workflows where people are first and foremost occupiers of roles, not individuals, and the materials being created are more closely aligned with groups than individuals.  Web 2.0 social tools: MySpace, Facebook, LinkedIn  Social networks -- explicit ones or implicit ones in social media -- are really organized around individuals and their networked self-expression. I am writing this blog post, and publishing it, personally. It is not the product of some workgroup. It is not an anonymous chunk of text on a corporate portal. My Facebook profile pulls traffic from my network of contacts, sources I find interesting, and the chance presence updates of my friends.  See: http://www.stoweboyd.com/message/2007/01/in_the_time_of_.html
  • 22. Doug Engelbart, 1968 "The grand challenge is to boost the collective IQ of organizations and of society."
  • 23. Tim O’Reilly, 2006, on Web 2.0 “The central principle behind the success of the giants born in the Web 1.0 era who have survived to lead the Web 2.0 era appears to be this, that they have embraced the power of the web to harness collective intelligence”
  • 24. Web 2.0 is about The Social Web “Web 2.0 Is Much More About A Change In People and Society Than Technology” -Dion Hinchcliffe, tech blogger  1 billion people connect to the Internet  100 million web sites  over a third of adults in US have contributed content to the public Internet. - 18% of adults over 65
  • 25. Tim Berners-Lee “The Web isn’t about what you can do with computers. It’s people and, yes, they are connected by computers. But computer science, as the study of what happens in a computer, doesn’t tell you about what happens on the Web.” NY Times, Nov 2, 2006
  • 26. But what is “collective intelligence” in the social web sense?  intelligent collection?  collaborative bookmarking, searching  “database of intentions”  clicking, rating, tagging, buying  what we all know but hadn't got around to saying in public before  blogs, wikis, discussion lists “database of intentions” – Tim O’Reilly
  • 27. the wisdom of clouds?
  • 28. “Collective Knowledge” Systems  The capacity to provide useful information  based on human contributions  which gets better as more people participate.  typically  mix of structured, machine-readable data and unstructured data from human input
  • 29. Collective Knowledge is Real  FAQ-o-Sphere - self service Q&A forums  Citizen Journalism – “We the Media”  Product reviews for gadgets and hotels  Collaborative filtering for books and music  Amateur Academia
  • 31. Web 2.0 The phrase "Web 2.0" can refer to one or more of the following:  The transition of web sites from isolated information silos to sources of content and functionality, thus becoming computing platforms serving web applications to end-users  A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use, and "the market as a conversation"  Enhanced organization and categorization of content, emphasizing deep linking  A rise in the economic value of the Web, possibly surpassing the impact of the dot-com boom of the late 1990s
  • 32. Two main kinds  PEOPLE FOCUS: The first kind of socializing is typified by "people focus" websites such as Bebo, Facebook, Myspace, and Xiaonei.  HOBBY FOCUS: The second kind of socializing is typified by "hobby focus" websites such as Flickr, Kodak Gallery, and Photobucket.
  • 33. Web 2.0 (see Wesch from YouTube [LOCAL]) Since social web applications are built to encourage communication between people, they typically emphasize some combination of the following social attributes:  Identity: who are you?  Reputation: what do people think you stand for?  Presence: where are you?  Relationships: who are you connected with? who do you trust?  Groups: how do you organize your connections?  Conversations: what do you discuss with others?  Sharing: what content do you make available for others to interact with?  Examples of social applications include Twitter, Facebook, Stumpedia, and Jaiku.
  • 34. Keyword: sharing!  Sharing...  Useful vs. Not useful (!?)
  • 35. Sharing for the enterprise? (1) A teenager model? (2) Always useful?
  • 36. Community
  • 37. Human Resource Management 2.0  Social networks for the job market – To find and be found – To manage your online reputation – To research and reference check – To hire a superstar – To use your network to do your job better – To use your network to get a better job http://www.linkedin.com/
  • 38. Blog  a user-generated website where entries are made in journal style and displayed in a reverse chronological order. The term "blog" is derived from "Web log." "Blog" can also be used as a verb, meaning to maintain or add content to a blog.
  • 39. Wiki  a website that allows the visitors themselves to easily add, remove, and otherwise edit and change available content, typically without the need for registration. This ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring.
  • 41. Wiki vs. Blog A blog, or web log, shares writing and multimedia content in the form of “posts” (starting point entries) and “comments” (responses to the posts). While commenting, and even posting, are open to the members of the blog or the general public, no one is able to change a comment or post made by another. The usual format is post-comment-comment-comment, and so on. For this reason, blogs are often the vehicle of choice to express individual opinions. A wiki has a far more open structure and allows others to change what one person has written. This openness may trump individual opinion with group consensus.
  • 42. Special purpose blogs: photos, music, ...
  • 43. (Social) Tagging  Term – a word or phrase that is recognizable by people and computers  Document – a thing to be tagged, identifiable by a URI or a similar naming service  Tagger – someone or thing doing the tagging, such as the user of an application  Tagged – the assertion by Tagger that Document should be tagged with Term
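The four-part tagging model above (Term, Document, Tagger, Tagged) maps naturally onto a small record type. The following is a minimal sketch, not a reference implementation: the class name `Tagging`, the helper functions, and the example.org URIs are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tagging:
    """One 'Tagged' assertion: a tagger claims a document carries a term."""
    tagger: str    # who (or what) made the assertion
    document: str  # URI identifying the tagged thing
    term: str      # the word or phrase used as the tag


# Hypothetical assertions collected by one application.
taggings = {
    Tagging("alice", "http://example.org/hotel/1", "romantic"),
    Tagging("bob", "http://example.org/hotel/1", "romantic"),
    Tagging("bob", "http://example.org/hotel/2", "budget"),
}


def taggers_for(document: str, term: str) -> set[str]:
    """Everyone who asserted that `document` should be tagged with `term`."""
    return {t.tagger for t in taggings
            if t.document == document and t.term == term}


def terms_for(document: str) -> set[str]:
    """All terms attached to `document`, regardless of who attached them."""
    return {t.term for t in taggings if t.document == document}
```

Keeping the tagger in every record preserves the "assertion by an individual" reading; dropping it, as `terms_for` does, reduces a tag to a plain property of the document.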
  • 44. Podcast  A podcast is a media file that is distributed by subscription (paid or unpaid) over the Internet using syndication feeds, for playback on mobile devices and personal computers.
  • 45. Examples of Podcasts available  iTunes Store  NPR  ArtsEdge  Ed. Podcast Network  SFMoMA
  • 46. Blog with Podcasts & Wikis  Several functions on the same platform
  • 48. Collecting feedback – SurveyMonkey SurveyMonkey.com
  • 49. Tools. Example: collaboration and sharing  Webex  Meeting center  Training center  Acquired by CISCO in 2007  Integrated phone conferencing, VoIP, support for PowerPoint, Flash, audio, and video;  Meeting recording and playback, One-click meeting access, scheduling, and IM applications, full compatibility, secure communications  See http://www.sramanamitra.com/2007/03/15/cisco-acquires-webex-beefs-collaboration/
  • 50. Trends and size  Facebook growth: 700% from 2008 to 2009  Twitter growth: 3,700%  And unique visitors..
  • 51. One big social application? Facebook connect!  evolution of Facebook Platform enabling you to integrate Facebook into your own site. You can add social context to your site:  Identity. Seamlessly connect the user's Facebook account with your site  Friends. Bring a user's Facebook friends into your site.  Social Distribution. Publish information back into Facebook.  Privacy. Bring dynamic privacy to your site. How scalable, reliable, open-minded?
  • 52. Wouldn’t this be better? But..
  • 53. The Mash-up approach  User-defined combination of services available on the web  Graphical design  Immediate execution
  • 54. E.g.: airlines mash-up Tracing of referral, searches, and so on […]
  • 55. SOA vs. Web 2.0 [Figure: SOA and Web 2.0 compared across Planning, Design, Implementation, Monitoring]
  • 56. Comparison (Web 2.0 vs. SOA)  SaaS = SaaS  Web-based interoperability (REST) vs. standard-based interoperability (SOAP, WSDL, UDDI)  Application as a platform = Application as a platform  Pushes for unexpected reuse vs. allows reuse  RIA vs. no UI  Participatory architecture vs. centralized governance
  • 57. … and complementarity Source: Babak Hosseinzadeh, IBM
  • 58. Short-term challenge: Mash-up on SOA [Figure labels: Mash-up, SOA]
  • 59. Mid-term: Web as a platform  The past: frameworks and APIs on top of the operating system and the hardware  The future: frameworks and APIs (RSS, REST, SOAP) on top of the Web and the Internet
  • 60. Example: eBay  Services for  shopping  trading  Publishes services  REST interface  SOAP interface  Numbers (source: http://blogs.zdnet.com/ITFacts/?p=10326):  4 billion requests/month (5.5 mln/h)  25% of the offer only via Web Service  25,000 registered developers  1,900 known applications
  • 61. Example: Amazon  Services for  e-commerce  on-line payment  computing (EC2)  storage (S3)  human computing (MTurk)  Queues (SQS)  Success stories  Ex 1, Jungle Disk: online back-up service  Ex 2, ABACA: 99%-protection antispam
  • 62. (NOT) Artificial intelligence: Mechanical Turk!
  • 64. SOA provides great plumbing!
  • 65. Web 2.0 provides great plumbing! E. Della Valle @ CEFRIEL - Politecnico di Milano
  • 67. How to manage complexity?  A few services in a small company  Hundreds of services and processes in a big organization [Figure: service landscape growing from few services in one company, to several services, to several enterprises; mashup versus complex BPM]
  • 68. The problem is in the semantics! “The problem is not in the plumbing, it is in the semantics” Verizon Chief Scientist - M. L. Brodie “Semantic heterogeneity remains the main obstacle to application integration, an obstacle that Web Services alone will not remove. Until someone finds a way to make applications understand each other, the effects of Web Services will remain limited. When you pass a user's data in a certain format using a Web Service as the interface, the receiving program still has to know what format the data is in. You still have to agree on the structure of each business object. So far nobody has found a workable solution…” Oracle Chairman and CEO - Larry Ellison
  • 69. Web 3.0  Combining SOA + Social Web + Semantic Web  I.e., Services + Folksonomies + Ontologies (or + Taxonomies)
  • 70. Tim Berners-Lee, 2001 “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Scientific American, May 2001
  • 71. Beyond Web 2.0 ...  Web as a world scale platform  Given a BPM: Find the best set of services? Find the best datasource? Manage not heterogeneous data/services? AT RUNTIME! [Figure: business process, integration layer with mediators, legacy systems, Comm., services, buyer, 3rd party shipment]
  • 72. SOA + Web 2.0 = ? [Figure: Web services architecture, in which a service provider publishes a service description (WSDL) to discovery agencies (UDDI), a service requester discovers it, and the two interact over SOAP; WSBPEL is also shown] source: http://www.w3.org/TR/2002/WD-ws-arch-20021114/
  • 73. SOA Advantages: costs of different EAI approaches [Chart: relative costs of custom integration, proprietary EAI solutions, Web Services based EAI solutions, and SOA based EAI solutions across adoption, deployment, maintenance, and changes] [source ZapThink http://www.zapthink.com/]
  • 74. From vertical applications...  Different IT solutions in each department [Figure: Department 1, Department 2, […], Department N]
  • 75. … to service extraction …  Rationalization of IT solutions  Factorization and publication of common services [Figure: Department 1, Department 2, […], Department N]
  • 76. … and process composition.  For using internal subprocesses, but also processes of customers or providers. [Figure: Client, Department 1, Department 2, shared services, outsourced services, Provider]
  • 77. “Ontology is overrated.”  “[tags] are a radical break with previous categorization strategies”  hierarchical, centrally controlled, taxonomic categorization has serious limitations  e.g., Dewey Decimal System  free-form, massively distributed tagging is resilient against several of these limitations http://shirky.com/writings/ontology_overrated.html
  • 78. But...  ontologies aren't taxonomies  they are for sharing, not finding  they enable cross-application aggregation and value-added services
  • 79. Ontology of Folksonomy  What would it look like to formalize an ontology for tag data?  Functional Purpose: applications that use tag data from multiple systems  tag search across multiple sites  collaboratively filtered search – “find things using tags my buddies say match those tags”  combine tags with structured query – “find all hotels in Spain tagged with ‘romantic’” http://tomgruber.org/writing/ontology-of-folksonomy.htm
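The three functional purposes listed for a tag ontology (cross-site tag search, collaboratively filtered search, and tags combined with a structured query) can be sketched over a toy data set. Everything here is hypothetical: the hotel records, site names, and tagger names are invented, and a real system would query separate services rather than in-memory lists.

```python
# Structured data: hypothetical hotel records keyed by identifier.
hotels = {
    "hotel-1": {"name": "Casa Azul", "country": "Spain"},
    "hotel-2": {"name": "Hotel Norte", "country": "Spain"},
    "hotel-3": {"name": "Le Petit", "country": "France"},
}

# Tag data pooled from several sites: (site, tagger, item, term).
tag_data = [
    ("site-a", "alice", "hotel-1", "romantic"),
    ("site-b", "bob", "hotel-1", "romantic"),
    ("site-b", "carol", "hotel-2", "business"),
    ("site-a", "dave", "hotel-3", "romantic"),
]


def find(country, term, buddies=None):
    """Hotels in `country` tagged with `term`, optionally only by `buddies`."""
    tagged = {
        item
        for _site, tagger, item, tag in tag_data
        if tag == term and (buddies is None or tagger in buddies)
    }
    # The structured part of the query filters on the country field.
    return sorted(h for h in tagged if hotels[h]["country"] == country)


spain_romantic = find("Spain", "romantic")
by_my_buddies = find("France", "romantic", buddies={"alice", "bob"})
```

The sketch only works because every site is assumed to mean the same thing by a tag record, which is exactly the agreement a shared ontology is meant to provide.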
  • 80. Example: formal match, semantic mismatch  System A says a tag is a property of a document.  System B says a tag is an assertion by an individual with an identity.  Does it mean anything to combine the tag data from these two systems?  “Precision without accuracy”  “Statistical fantasy”
  • 81. Engineering the tag ontology  Working with tag community, identify core and non core agreements  Use the process of ontology engineering to surface issues that need clarification  Couple a proposed ontology with reference implementations or hosted APIs
  • 82. Issues raised by ontological engineering  is term identity invariant over case, whitespace, punctuation?  are documents one-to-one with URI identities? (are alias URLs possible?)  can tagging be asserted without human taggers?  negation of tag assertions?  tag polarity – “voting” for an assertion  tag spaces – is the scope of tagging data a user community, application, namespace, or database?
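The first question above, whether term identity is invariant over case, whitespace, and punctuation, becomes concrete once a normalization rule is written down. This is one possible policy, sketched as an assumption; the slides do not prescribe any particular rule, and `normalize_term` is a name invented here.

```python
import string


def normalize_term(term: str) -> str:
    """Map a raw tag to a canonical key: ignore case, whitespace, punctuation."""
    lowered = term.casefold()
    kept = [ch for ch in lowered
            if ch not in string.punctuation and not ch.isspace()]
    return "".join(kept)


same_tag = normalize_term("Semantic-Web") == normalize_term("semantic web")
# The same policy also merges tags that people would consider different:
collision = normalize_term("C++") == normalize_term("C")
```

The second comparison shows why the question is an ontological one and not a mere implementation detail: a rule that unifies "Semantic-Web" and "semantic web" also collapses "C++" into "C".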
  • 83. Pivot Browsing – surfing unstructured content along structured lines  Structured data provides dimensions of a hypercube  location  author  type  date  quality rating  Travel researchers browse along any dimension.  The key structured data is the destination hierarchy  Contributors place their content into the destination hierarchy, and the other dimensions are automatic.
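Pivot browsing as described above amounts to filtering a collection along structured dimensions and counting what remains along another. The sketch below assumes a slash-separated destination hierarchy and invented sample items; the function names `pivot` and `facet_counts` are illustrative only.

```python
from collections import Counter

# Hypothetical contributions placed into a destination hierarchy.
items = [
    {"title": "Tapas crawl", "location": "Spain/Madrid",
     "author": "ana", "type": "review", "date": "2009-03"},
    {"title": "Gaudi walk", "location": "Spain/Barcelona",
     "author": "ben", "type": "photo", "date": "2009-04"},
    {"title": "Beach guide", "location": "Spain/Barcelona",
     "author": "ana", "type": "review", "date": "2009-04"},
]


def pivot(collection, **fixed):
    """Keep items matching every fixed dimension.

    `location` matches the given node and everything below it in the hierarchy.
    """
    def matches(item):
        for dimension, value in fixed.items():
            if dimension == "location":
                inside = (item[dimension] == value
                          or item[dimension].startswith(value + "/"))
                if not inside:
                    return False
            elif item[dimension] != value:
                return False
        return True

    return [item for item in collection if matches(item)]


def facet_counts(collection, dimension):
    """How many items fall under each value of one dimension."""
    return Counter(item[dimension] for item in collection)


barcelona_reviews = pivot(items, location="Spain/Barcelona", type="review")
ana_by_location = facet_counts(pivot(items, author="ana"), "location")
```

Fixing one dimension and then counting along another is the "surfing along structured lines" step: the user can switch from location to author to type without leaving the result set.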
  • 84. 5. Tools and technologies for managing information overload
  • 85. Tools Information: The double edged sword  You want good information, not all information  Information Retrieval /search – Multimedia IR  RSS/Bloglines/Google Reader  Social bookmarking
  • 87. Data in digital libraries  TEXT: e-book, Word documents, Web pages, PDF, Blog, etc.  Audio:  Speech (broadcasting, podcasting, recording, etc.)  Music (CD, MP3, etc.)  Pictures: Personal photos, schemes, diagrams, etc.  Video: sequence of images and audio (music and/or speech) Challenge: How to make multimedia content available to search engines and search based applications?
  • 88. Some user challenges…  Precision & contextual relevancy  aware of rights, user and information contexts  personalization and recommendation  Search must support multiple interaction patterns  active searching, monitoring, browsing and "being aware"  Trust and spam  Ubiquity of access
  • 89. MIR Application Areas  Architecture, real estate, and interior design (e.g., searching for ideas)  Broadcast media selection (e.g., radio and TV channel)  Cultural services (history museums, art galleries, etc.)  Digital libraries (e.g., musical dictionary, bio-medical imaging catalogues, film, video and radio archives)  E-Commerce (e.g., personalized advertising, on-line catalogues)  Education (e.g., repositories of multimedia courses)  Home Entertainment (e.g., personal multimedia collections)  Investigation services (e.g., human characteristics recognition, forensics)  Journalism (e.g., searching speeches of a certain politician using his name, his voice or his face)  Multimedia directory services (e.g., yellow pages, Tourist information, GIS)  Multimedia editing (e.g., personalized news service, media authoring)  Remote sensing (e.g., cartography, ecology)  Shopping (e.g., searching for clothes)  Social (e.g., dating services)  Surveillance (e.g., traffic control)
  • 90. MIR: Query Examples  Play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g., in terms of emotions  Draw a few lines on a screen and find a set of images containing similar graphics, logos, ideograms,...  Define objects, including color patches or textures and retrieve examples among which you select the interesting objects to compose your design  On a given set of multimedia objects, describe movements and relations between objects and so search for animations fulfilling the described temporal and spatial relations  Describe actions and get a list of scenarios containing such actions  Using an excerpt of Pavarotti’s voice, obtaining a list of Pavarotti’s records, video clips where Pavarotti is singing and photographic material portraying Pavarotti
  • 91. State of the art of MSE  Image search: www.tiltomo.com, www.tineye.com, www.pixsta.com, www.picsearch.com  Video search: www.blinx.com, www.clipta.com, www.yovisto.com  Music search: www.midomi.com, www.audiobaba.com, http://www.bmat.com  Enterprise MIR search: www.autonomy.com, www.pictron.com, www.exalead.com, www.fastsearch.com
  • 92. Metadata?  “Data about other data”  They describe in a structured fashion properties of the data – E.g.: owner, creation and modification date, description, etc.  Some metadata are implicitly available  E.g.: file size, file name, etc.  Others need to be manually provided or automatically extracted
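The distinction between implicitly available metadata and metadata that must be supplied can be illustrated with the file system. This is a minimal sketch: `implicit_metadata` and `describe` are names invented here, and the owner and description values are placeholders standing in for manually provided fields.

```python
import os
import tempfile
from datetime import datetime, timezone


def implicit_metadata(path):
    """Metadata the file system already holds for any file."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "file_size": info.st_size,  # bytes
        "modified": modified.isoformat(),
    }


def describe(path, **explicit):
    """Merge implicit metadata with fields a person or extractor provides."""
    record = implicit_metadata(path)
    record.update(explicit)
    return record


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
    handle.write(b"hello")
    demo_path = handle.name

record = describe(demo_path, owner="alice", description="demo file")
os.remove(demo_path)
```

Only the last two fields of `record` required any human or automated effort; the rest came for free, which is why implicit metadata is usually the first thing an indexing pipeline collects.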
  • 93. The MIR reference architecture
  • 94. Content Process: Content Acquisition, Content Transformation, Content Indexing
  • 95. Content acquisition  In MIR, content is acquired from many sources and in multiple ways:  By crawling  By user’s contribution  By syndicated contribution from content aggregators  Via broadcast capture (e.g., from air/cable/satellite broadcast, IPTV, Internet TV multicast, ..)
  • 96. Content acquisition  In text or Web search engines, content is a closed or open collection of documents  Textual Web content is acquired by crawlers, which exploit link navigation  In MIR, content is acquired from many sources, in a range of quality and value:  Web cams, security apps  (Video/Audio) Telephony and teleconferencing  Industrial/Academic/Medical  User Generated Content  Public Access and Government Access  Rushes, Raw Footage  News  Advertising  TV Programming  Feature Films [Figure: value versus production cost, rising from web cam and security footage through user generated, enterprise, and broadcast TV content to motion pictures]
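Acquisition by crawling, as mentioned for textual Web content, follows links outward from a seed page. The sketch below runs a breadth-first crawl over an in-memory link graph so that it needs no network access; the `LINKS` table and the example.org URLs are invented, and a real crawler would fetch and parse pages (and respect robots.txt) instead of reading a dictionary.

```python
from collections import deque

# Hypothetical link graph standing in for fetched and parsed pages.
LINKS = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/c"],
    "http://example.org/b": ["http://example.org/"],
    "http://example.org/c": [],
}


def crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first, following each link at most once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for target in fetch_links(url):
            if target not in seen:  # avoid loops such as b -> /
                seen.add(target)
                queue.append(target)
    return order


visited = crawl("http://example.org/", lambda url: LINKS.get(url, []))
```

Passing `fetch_links` as a parameter keeps the traversal separate from the acquisition channel, which matters in MIR where the same queue could be fed by feeds or user uploads as well as by links.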
  • 97. Acquisition: (video) metadata sources & formats  Content element may be accompanied by textual descriptions, which range in quantity and quality, from no description (e.g., web cam content) to multilingual high value data (closed captions and production metadata of motion pictures)  Metadata may reside:  Embedded within content (e.g., close captions)  In surrounding Web pages or links (e.g., HTML content, link anchors, etc)  In domain-specific databases (e.g., IMDB for feature films)  In ontologies: http://www.daml.org/ontologies/keyword.html ASSET PACKAGE METADATA METADATA METADATA MULTIPLEXED METADATA MEDIA STREAMS EXTERNAL METADATA
  • 98. Acquisition: (video) representative metadata standards (standard: body)  MPEG-7, MPEG-21: ISO/IEC (Int. Electrotechnical Comm.), Motion Picture Expert Group  UPnP: Universal Plug and Play forum  MXF, MDD: SMPTE (Society of Motion Picture and Television Engineers)  AAF: AMWA (Advanced Media Workflow Association)  TV Anytime: ETSI (European Telecommunication Standards Institute)  Timed Text: W3C, 3GPP  RSS: Harvard  Podcast: Apple  Media RSS: Yahoo
  • 99. Transformation dimensions: Digital video formats  A digital video is a sequence of frames  The Frame Aspect Ratio (FAR) defines the shape of each image (width divided by height), with 4:3 and 16:9 being the currently adopted values  Pixel aspect ratio (PAR) describes how the width of pixels in a digital image compares to their height (rectangular pixel formats exist for analog TV compatibility).  Frame rate: number of frames per second (24 and 25 are common, but lower and higher values are also used)
  • 100. Transformation dimensions: compression  Web media must be compressed, with lossy (but perceptually acceptable) transformations  In video, compression works in two ways  Intra-frame: an image is divided into blocks, whose content is “averaged”  Inter-frame: a frame is represented differentially with respect to the preceding one, by encoding only the blocks that “have moved” and their motion vectors  Example (MPEG compression)
  • 101. Content Transformation: popular compression standards (standard: typical bitrates; applications)  M-JPEG, JPEG2000: up to 60 Mbit/sec; consumer electronics, video editing systems  DVCAM: 25M; consumer  MPEG-1: 1.5M; CD-ROM multimedia  MPEG-2: 4-20M; broadcast TV, DVD  MPEG-4, H.264: 300K-12M; mobile video, podcast, IPTV  H.261, H.263: 64k-1M; video teleconferencing, telephony  Each standard has profiles that balance latency, complexity, error resilience and bandwidth for a target application (e.g., file-based vs transport-based fruition)
  • 102. Content indexing  In textual search engines, content needs little (lexical) analysis before indexing  Index elements (words) are part of the content  In MIR, content cannot be indexed directly  Indexable metadata must be created from the input data – Low level features: concisely describe physical or perceptual properties of a media element (e.g., feature vectors) – High level features: domain concepts characterizing the content (e.g., extracted objects and their properties, content categorizations, etc.)  In continuous media, extracted features must be related to the media segment that they characterize, both in space and time  Feature extraction may require a change of medium, e.g., speech to text transcription
  • 103. Motivations for metadata generation  Computers are not able to catch the underlying meaning of multimedia content  A computer is not able to understand that this picture represents a sunset  Pixels and audio samples do not convey semantics, just binary values  Metadata are used to produce representations that are manageable by computers  E.g.: text or numbers
  • 104. How to create multimedia annotations?  Manually  Expensive – It can take up to 10x the duration of the video – Problems in scaling to millions of contents  Incomplete or inaccurate – People might not be able to holistically catch all the meanings associated with a multimedia object  Difficult – Some contents are tedious to describe with words - E.g., a melody without lyrics  Automatically  Good quality – Some technologies have a ~90% precision  “Low” cost
  • 105. Indexing: the core pipeline [Figure: multimedia content (e.g., MPEG-2 video) → content processing → metadata (e.g., MPEG-7) → indexing → indexes (e.g., inverted files); content processing comprises video processing (segmentation, image analysis, video analysis) and audio processing (segmentation, audio analysis)]
  • 106. Image/Text segmentation  GOAL: identify the type of contents included in an image  Text + pictures  Image sections
  • 107. Audio Segmentation  GOAL: split an audio track according to contained information  Music  Speech  Noise …  Additional usage  Identification and removal of ads
  • 108. Video Segmentation  Keyframe segmentation:  segment a video track according to its keyframes – fixed-length temporal segments  Shot detection:  automated detection of transitions between shots – a shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space.
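A minimal sketch of shot detection, assuming one common approach that the slide itself does not spell out: compare the gray-level histograms of consecutive frames and report a hard cut when they differ by more than a threshold. The frame representation (a flat list of pixel values), the number of bins and the threshold are illustrative assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized gray-level histogram of a frame given as a flat list of pixels."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot (hard cuts only)."""
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # L1 distance between consecutive histograms, always within [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy "video": three dark frames followed by three bright frames.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, dark, bright, bright, bright]
print(shot_boundaries(frames))
```

Gradual transitions (fades, dissolves) would need a more tolerant comparison over a window of frames; this sketch only catches abrupt cuts.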
  • 109. Speaker identification  GOAL: identify people participating in a discussion (e.g., ERIC, DAVID, JOHN)  Additional usage:  Vocal command execution
  • 110. Word spotting  GOAL: recognize spoken words belonging to a closed dictionary (e.g., “Call”, “Open”, “Bomb”)  Additional usage:  Spot blacklist words in spontaneous speech – E.g.: terrorist, attack, …  Dialing (e.g., “Call home”)  Call routing (e.g., “I would like to make a collect call”)  Domotic appliance control
  • 111. Speech to text  GOAL: automatically recognize spoken words belonging to an open dictionary  Example: quote_detection.avi CREDITS: Thorsten Hermes@SSMT2006
  • 112. Identification of audio events  GOAL: automatically identify audio events of interest  E.g.: shouts, gunshots, etc.  Additional usage:  Security applications  Example: sound_events.avi CREDITS: Thorsten Hermes@SSMT2006
  • 113. Classification of music genre, mood, etc.  GOAL: automatically classify the genre and mood of a song  Rock, pop, jazz, blues, dance, etc.  Happy, aggressive, sad, melancholic, etc.  Additional usage:  Automatic selection of songs for playlist composition
  • 114. Images: low-level features  GOAL: extract implicit characteristics of a picture  luminosity  orientations  textures  Color distribution
  • 115. Images: Optical character recognition (OCR)  OCR is a technique for translating images of typed or handwritten text into symbols  Solved problem for typewritten text (99% accuracy)  Commercial solutions for handwritten text (e.g., MS Tablet PC)
  • 116. Image: face identification and recognition  GOAL: recognize and identify faces in an image  Usage examples:  People counting  Security applications  Example: face_detection.avi CREDITS: Thorsten Hermes@SSMT2006
  • 117. Image: concept detection  Image analysis extracts low level features from raw data (e.g., color histograms, color correlograms, color moments, co-occurrence texture matrices, edge direction histograms, etc.)  Features can be used to build discrete classifiers, which may associate semantic concepts to images or regions thereof  The MediaMill semantic search engine defines 491 semantic concepts  http://www.science.uva.nl/research/mediamill/demo  Concepts can also be detected from text (e.g., from manual or automatic metadata) using NLP techniques (the FAST text search engine recognizes entities like geographical locations, professions, names of persons, domain-specific technical concepts, etc.)
  • 118. Image: object identification  GOAL: identify objects appearing in a picture  Basketballs, cars, planes, players, etc.  Also by example (unaware of position, scaling, etc.) – objectByExample.mp4 CREDITS: http://www.youtube.com/user/GuoshenYu
  • 119. Video OCR  Video OCR has specific problems, due to low resolution, small text size, and interference with the background  Detection is normally done on the most representative image of an entire shot, rather than frame by frame  Approach: filter for enhancing resolution + pattern matching for character identification  Example: Virage ConTEXTract text extraction and recognition technology (recognizes text in real time)
  • 120. Multimodal annotation fusion  Media segmentation and concept extraction are probabilistic processes  The result is characterized by a confidence value  Significance can be enhanced by comparing the output of distinct techniques applied to the same or similar problems  Examples:  Media segmentation: shot detection + speaker’s turn identification  Person recognition: voice identification + face detection  Concept detection: image-based classification (e.g., “outdoor” & “water”) + object extraction (e.g., “bird”, “boat”)
  • 121. Overview of the query process
  • 122. Content querying  In textual search applications, queries are keywords or expressions thereof  In MIR, search can take place  By keyword  By (mono-media) example (e.g., query by image, query by humming, query by song similarity)  By (multi-media) example (e.g., query by video similarity)  Query by example entails real time content processing  MIR query processing naturally requires the interaction of multiple search engines (e.g., a text search engine for textual metadata and a content-based search engine for feature vectors)
  • 123. Querying: modalities  In MIR applications, search keywords match the manual or automatic metadata  A complementary approach is to provide an example of the desired content and look for similar media elements  Similarity is a medium-dependent, domain-dependent, and subjective criterion  Can be computed on low-level features (e.g., image color histograms, music bpm) or on high-level concepts/categorization (e.g., melancholic images, party music)  Can be multimodal (e.g., video similarity)  Querying may also consider context information (e.g., the user’s geographical position or the access device)
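  • 124. Example query modalities and search types [Figure: example queries (the structured query where[contains(“amsterdam”)] and topic[contains(“building”)], the coordinates 52.37N 4.89E, the keyword “amsterdam”, an image, a song) pass through query analysis and federation to specialized engines (music search, text search, image similarity search, XML search, geo search), each backed by an index (inverted index, similarity index, semantic index, R-tree index)]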
  • 125. Faceted query  When a media collection is large and its content unknown to the user, exposing part of the metadata can help  This can be done by showing a compact representation of the categories of content (facets)  A query can be restricted by selecting only the relevant facets
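A small sketch of the faceted approach, under the assumption that every media element carries a few metadata fields. The collection, the facet names (`medium`, `topic`) and the helper functions are hypothetical and only illustrate the idea of counting categories and restricting a query to selected facets.

```python
from collections import Counter

# Hypothetical media collection; each field is a facet that can be exposed.
items = [
    {"title": "Canal at dusk", "medium": "image", "topic": "travel"},
    {"title": "Harbour time-lapse", "medium": "video", "topic": "travel"},
    {"title": "Street jazz trio", "medium": "audio", "topic": "music"},
    {"title": "Festival highlights", "medium": "video", "topic": "music"},
]


def facet_counts(collection, facet):
    """Compact representation of one facet: category -> number of items."""
    return Counter(item[facet] for item in collection)


def restrict(collection, **selected):
    """Keep only the items matching every selected facet value."""
    return [item for item in collection
            if all(item[facet] == value for facet, value in selected.items())]


print(facet_counts(items, "medium"))
videos = restrict(items, medium="video")
print(facet_counts(videos, "topic"))
```

Showing the counts next to each category tells the user in advance how much a selection will narrow the result set.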
  • 126. Querying: by keyword  The keyword may match the manual metadata and/or the automatic metadata  The match can be multimodal: in the audio, in a visual concept
  • 127. Querying: by similarity – query interface
  • 128. Content browsing  In textual search engines, results are ranked linearly, browsed by navigating links, and read at a glance  In MIR and similarity-based search applications, browsing results must consider multiple dimensions  Relevance: where the result appears in the sequence of retrieved media elements  Space: where the search has matched inside a spatially organized media element (e.g., an image)  Time: when a match occurs in a linear media element
  • 130. References  MPEG-7:  MPEG-7 Overview http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm  Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS http://www.sims.berkeley.edu/academics/courses/is202/f03/  RSS: http://www.rssboard.org/rss-specification  Media RSS: http://search.yahoo.com/mrss  MPEG: http://en.wikipedia.org/wiki/MPEG  Shot detection: http://en.wikipedia.org/wiki/Shot_boundary_detection
  • 131. References  MediaMill: http://www.science.uva.nl/research/mediamill  Similarity search  www.midomi.com  www.tiltomo.com  http://tineye.com/  Slides of the course “Archivi Multimediali e Data Mining”, Politecnico di Torino, Prof. Silvia Chiusano  Slides and videos of the lectures given by Prof. Thorsten Hermes at the SSMS 2006 summer school  PHAROS: http://www.pharos-audiovisual-search.eu/
  • 132. 5.2 RSS and readers
  • 133. Acquisition: RSS and Media RSS  RSS (Really Simple Syndication) describes a family of web feed formats used to publish frequently updated web resources (e.g., news)  An RSS feed includes full or summarized text, plus metadata such as publishing dates and authorship  RSS formats are specified using XML  RSS 2.0 now “frozen”  Media RSS proposed by Yahoo as an RSS module that supplements the <enclosure> element capabilities of RSS 2.0 to allow for more robust media syndication.
  • 136. Acquisition: an example of Media RSS
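The example shown on the original slide is not part of this transcript. As an illustration only, the sketch below embeds a small hand-written Media RSS feed and reads its media elements with Python's standard XML parser. The feed content and the example.com URLs are invented; the `media:` namespace URI and the `media:content`, `media:title`, `media:description` and `media:thumbnail` elements come from the Media RSS specification.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Illustrative feed, not taken from the slides.
FEED = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>Example video channel</title>
    <link>http://www.example.com/</link>
    <description>Illustrative Media RSS feed</description>
    <item>
      <title>Sunset over the harbour</title>
      <link>http://www.example.com/videos/1</link>
      <media:content url="http://www.example.com/media/sunset.mp4"
                     type="video/mp4" medium="video" duration="95"/>
      <media:title>Sunset over the harbour</media:title>
      <media:description>Time-lapse of a sunset</media:description>
      <media:thumbnail url="http://www.example.com/media/sunset.jpg"/>
    </item>
  </channel>
</rss>"""


def media_entries(feed_xml):
    """Extract title, media URL, MIME type, duration and thumbnail per item."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        content = item.find("media:content", MEDIA_NS)
        thumbnail = item.find("media:thumbnail", MEDIA_NS)
        entries.append({
            "title": item.findtext("title"),
            "url": content.get("url"),
            "type": content.get("type"),
            "duration": int(content.get("duration")),
            "thumbnail": thumbnail.get("url"),
        })
    return entries


entries = media_entries(FEED)
print(entries[0]["url"], entries[0]["duration"])
```

Compared with the plain RSS 2.0 `<enclosure>` element, the `media:` elements carry per-file details (medium, duration, thumbnail) that an acquisition component can index directly.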
  • 138. Bloglines: web content aggregator
  • 140. Social bookmarking  Online shared catalogs of annotated bookmarks  Dedicated sites are needed even just to manage the complexity of the bookmark sharing task
  • 142. Why Personalization?  Personalization is an attempt to find the most relevant documents using information about the user's goals, knowledge, preferences, navigation history, etc.
  • 143. Same Query, Different Intent  “Cancer”  Different meanings  “Information about the astronomical/astrological sign of cancer”  “Information about cancer treatments”  Different intents  “Are there any new tests for cancer?”  “Information about cancer treatments”
  • 144. Personalization Algorithms  Standard IR [Figure: user, client, query, server, document]  Related to relevance feedback  Query expansion  Result re-ranking
  • 145. User Profile  A user's profile is a collection of information about the user of the system.  This information is used to get the user to more relevant information
  • 146. Core vs. Extended User Profile  Core profile  contains information related to the user search goals and interests  Extended profile  contains information related to the user as a person in order to understand or model the use that a person will make with the information retrieved
  • 147. Who Maintains the Profile?  Profile is provided and maintained by the user/administrator  Sometimes the only choice  The system constructs and updates the profile (automatic personalization)  Collaborative - user and system  User creates, system maintains  User can influence and edit  Does it help or not?
  • 148. Adaptive Search  Goals:  Present documents (pages) that are most suitable for the individual user  Methods:  Employ user profiles representing short-term and/or long-term interests  Rank and present search results taking both user query and user profile into account
  • 149. Personalized Search: Benefits  Resolving ambiguity  The profile provides a context to the query in order to reduce ambiguity.  Example: the profile of interests makes it possible to distinguish what a user who asked about “Berkeley” (or “Pirates”, “Jaguar”) really wants  Revealing hidden treasures  The profile helps surface the most relevant documents, which could otherwise be hidden beyond the top results page  Example: the owner of an iPhone searches for Google Android. Pages referring to both would be most interesting
  • 150. Where to Apply Profiles ?  The user profile can be applied in several ways:  To modify the query itself (pre-processing)  To change the usual way of retrieval  To process results of a query (post-processing)  To present document snippets  Special case: adaptation for meta-search
  • 151. Pre-Process: Query Expansion  User profile is applied to add terms to the query  Popular terms could be added to introduce context  Similar terms could be added to resolve indexer-user mismatch  Related terms could be added to resolve ambiguity  Works with any IR model or search engine
  • 152. Pre-Process: Relevance Feedback  In this case the profile is used to “move” the query  Imagine that:  the documents,  the query  the user profile are represented by the same set of weighted index terms
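One way to picture “moving” the query is a weighted sum of the query vector and the profile vector, in the spirit of Rocchio-style relevance feedback. This is an assumption used for illustration, as the slide does not give a formula; the weights `alpha` and `beta` and the term weights are hypothetical.

```python
def move_query(query, profile, alpha=1.0, beta=0.5):
    """Shift a weighted-term query towards the user profile.

    Both arguments map index terms to weights; alpha and beta are
    illustrative mixing weights, not values taken from the slides.
    """
    terms = set(query) | set(profile)
    return {t: alpha * query.get(t, 0.0) + beta * profile.get(t, 0.0)
            for t in terms}


query = {"jaguar": 1.0}
profile = {"car": 0.8, "engine": 0.6, "jaguar": 0.2}
moved = move_query(query, profile)
for term, weight in sorted(moved.items()):
    print(term, round(weight, 2))
```

Because documents, query and profile share the same term space, the moved query can be matched against documents with the usual similarity measure.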
  • 153. Post-Processing  The user profile is used to organize the results of the retrieval process  Present to the user the most interesting documents  Filter out irrelevant documents  Extended profile can be used effectively  In this case the use of the profile adds an extra step to processing  Similar to classic information filtering problem  Typical way for adaptive Web IR
  • 154. Post-Filter: Annotations  The result could be relevant to the user in several aspects. Fusing this relevance with query relevance is error prone and leads to a loss of data  Results are ranked by the query relevance, but annotated with visual cues reflecting other kinds of relevance  User interests - Syskill and Webert, group interests - KnowledgeSea
  • 155. Post-Filter: Re-Ranking  Re-ranking is a typical approach for post-filtering  Each document is rated according to its relevance (similarity) to the user or group profile  This rating is fused with the relevance rating returned by the search engine  The results are ranked by fused rating  User model: WIFS, group model: I-Spy
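A minimal sketch of re-ranking, assuming the fusion is a simple linear combination of the two ratings (the slide does not say how the ratings are fused). The document names, scores and the `weight` parameter are hypothetical.

```python
def rerank(results, profile_score, weight=0.5):
    """Fuse search-engine relevance with profile relevance and re-sort.

    results: list of (document, engine_score) pairs.
    profile_score: function mapping a document to its profile similarity.
    weight: share of the fused rating given to the profile (assumed value).
    """
    fused = [(doc, (1 - weight) * engine + weight * profile_score(doc))
             for doc, engine in results]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


# Hypothetical results for an iPhone owner searching for "Google Android".
results = [("iphone-review", 0.9), ("android-vs-iphone", 0.7), ("android-sdk", 0.6)]
profile = {"iphone-review": 0.2, "android-vs-iphone": 0.9, "android-sdk": 0.3}

ranking = rerank(results, lambda doc: profile.get(doc, 0.0))
print([doc for doc, _ in ranking])
```

With `weight=0` the original engine order is preserved, so the parameter controls how strongly the profile is allowed to reorder results.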
  • 156. Privacy related problems  Web Information Retrieval faces a challenge: the data required to perform evaluations, namely query logs and click-through data, is not readily available due to valid privacy concerns.  Researchers can:  Limit themselves to small (and sometimes biased) samples of users, which somewhat restricts the conclusions that can be drawn.  Limit the usage of private data to local computation, exploiting personal data only when post-processing search results.  Look for publicly available data that can be used to approximate query logs and click-through data (such as user bookmarks).
  • 157. Tag Data and Personalized Information Retrieval  Recently it has been shown that the information contained in social bookmarking (tagging) systems may be useful for improving Web search.  Using data from the social bookmarking site del.icio.us, it is possible to demonstrate how one can rate the quality of personalized retrieval results.  A user's “bookmark history” can be used to improve search results via personalization.  Analogously to studies involving implicit feedback mechanisms in IR, which have found that profiles based on the content of clicked URLs outperform those based on past queries alone, profiles based on the content of bookmarked URLs are generally superior to those based on tags alone.
  • 158. Tag Data and Personalized Information Retrieval  Social bookmarking systems such as del.icio.us and Bibsonomy are a recent and popular phenomenon.  Users label interesting web pages (or research articles) with primarily short and unstructured annotations in natural language called tags.  These sites offer an alternative model for discovering information online.  Rather than following the traditional model of submitting queries to a Web search engine, users can browse tags as though they were directories, looking for popular pages that have been tagged by a number of different users. Since tags are chosen by users from an unrestricted vocabulary, these systems can be seen to provide consensus categorizations of interesting websites.
  • 159. Tag Data and Personalized Information Retrieval  How can social bookmarking data be used to improve Web search?  Can tag data be used to approximate actual user queries to a search engine?  How can personalized IR systems be evaluated using information contained in social bookmarks (tag data)?  Is there enough information in (i.e., a strong enough correlation between) the tags/bookmarks in a user's history to build a profile of the user that will be useful for personalizing search engine results?
  • 160. Models for generating a profile of the user  We record the (time ordered) stream of webpages that have been bookmarked by a particular user  The first simple profile involves counting the occurrences of terms in the tags of any of the known bookmarks.  An obvious problem is that users often have multiple interests and their many bookmarks cover a range of topics. Thus some bookmarks may be completely unrelated to the nth bookmark (and thus the tags being used as the current query).
  • 161.  The second source of information in the bookmarks is the content of the bookmarked pages themselves.  One would expect, given the much larger vocabulary of Web pages compared to tag data, that content may prove more useful than tags. Indeed, content-based profiles are more useful than query-based ones.  A user spends more time deliberating which pages to bookmark than deciding which search results to click on.  Since a user will only bookmark sites that they find particularly useful or interesting, these documents should contain a lot of useful information about the user, and the content of bookmarked documents is particularly useful for personalization.
  • 162.  The previous profile is somewhat ad hoc in its decision about which documents to include and which not to include.  In theory, we would like to include all documents that the user has bookmarked, but weight them according to their expected usefulness for resolving ambiguity in the current query.  Our first attempt to estimate the distance between two bookmarks is to count the number of common terms in their respective sets of tags
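The weighting idea can be sketched as follows, assuming that each past bookmark contributes its tag terms to the profile in proportion to the number of tags it shares with the current bookmark. The bookmark history, the tags and the choice to build the profile from tag terms (rather than page content) are illustrative assumptions.

```python
from collections import Counter


def common_tags(tags_a, tags_b):
    """Closeness of two bookmarks: number of tag terms they share."""
    return len(tags_a & tags_b)


def weighted_profile(history, current_tags):
    """Term profile in which each past bookmark is weighted by its
    tag overlap with the current bookmark (used as the query)."""
    profile = Counter()
    for tags in history:
        weight = common_tags(tags, current_tags)
        if weight == 0:
            continue  # unrelated bookmark: contributes nothing
        for tag in tags:
            profile[tag] += weight
    return profile


# Hypothetical bookmark history, one tag set per bookmarked page.
history = [
    {"python", "tutorial"},
    {"python", "web", "django"},
    {"recipes", "pasta"},
]
current = {"python", "django"}

profile = weighted_profile(history, current)
print(profile.most_common())
```

Bookmarks on unrelated topics (here the recipe page) drop out, which addresses the multiple-interests problem of the simple counting profile.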
  • 163. How do we use these profiles?  In order to incorporate the user profile for personalized information retrieval, queries are expanded with terms from the profile, weighting them appropriately.  The number of expansion terms to be added to the query is limited so as to limit the amount of noise and the total length of the expanded query.  In particular, the K most frequent terms from the profile are added, and the weights are normalized to account for the missing terms.
  • 165. Introduction to Recommender Systems  Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.  Objective:  To propose objects fitting the user needs/wishes  To sell services (site visits) or goods  Many search engines and on-line stores provide recommendations (e.g. Amazon, CDNow).  Recommenders have been shown to substantially increase clicks (and sales).
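A small sketch of the expansion step, assuming that original query terms keep weight 1 and that the K added terms share a fixed amount of weight in proportion to their profile frequency. The `profile_mass` parameter, the profile counts and the query are hypothetical; the slide only states that K terms are added and the weights normalized.

```python
from collections import Counter


def expand_query(query_terms, profile, k=3, profile_mass=0.5):
    """Add the k most frequent profile terms to the query.

    The weights of the added terms are renormalized over the k kept terms
    so that they sum to profile_mass (an assumed parameter).
    """
    top = [(t, c) for t, c in profile.most_common() if t not in query_terms][:k]
    total = sum(count for _, count in top)
    expanded = {term: 1.0 for term in query_terms}
    for term, count in top:
        expanded[term] = profile_mass * count / total
    return expanded


profile = Counter({"python": 5, "django": 3, "web": 2, "pasta": 1})
expanded = expand_query(["python", "orm"], profile, k=2)
print(expanded)
```

Limiting the expansion to k terms keeps rare, noisy profile terms (here "pasta") out of the query.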
  • 166. Book Recommender [Figure: rated books (Red Mars, Foundation, Jurassic Park, Lost World, 2001, 2010, Neuromancer, Difference Engine) feed a machine learning component that builds the user profile]
  • 167. Personalization  Recommenders are instances of personalization software.  Personalization concerns adapting to the individual needs, interests, and preferences of each user.  Includes:  Recommending  Filtering  Predicting (e.g. form or calendar appt. completion)  From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
  • 168. Machine Learning and Personalization  Machine Learning can allow learning a user model or profile of a particular user based on:  Sample interaction  Rated examples  Similar user profiles  This model or profile can then be used to:  Recommend items  Filter information  Predict behavior
  • 169. Types of recommendation systems 1. Search-based recommendations 2. Category-based recommendations 3. Collaborative filtering 4. Clustering 5. Association rules 6. Information filtering 7. Classifiers
  • 170. 1. Search-based recommendations  The visitor only types a search query  « data mining customer »  The system retrieves all the items that correspond to that query  e.g. 6 books  The system recommends some of these books based on general, non-personalized ranking (sales rank, popularity, etc.)
  • 171. Search-based recommendations  Pros:  Simple to implement  Cons:  Not very powerful  Which criteria to use to rank recommendations?  Is it really « recommendations »?  The user only gets what he asked for
  • 172. 2. Category-based recommendations  Each item belongs to one or more categories.  Explicit / implicit choice:  The customer selects a category of interest (refine search, opt-in for category-based recommendations, etc.). – « Subjects > Computers & Internet > Databases > Data Storage & Management > Data Mining »  The system selects categories of interest on behalf of the customer, based on the current item viewed, past purchases, etc.  Certain items (bestsellers, new items) are eventually recommended
  • 173. Category-based recommendations  Pros:  Still simple to implement  Cons:  Again: not very powerful, which criteria to use to order recommendations? is it really « recommendations »?  Capacity highly depends upon the kind of categories implemented – Too specific: not efficient – Not specific enough: no relevant recommendations
  • 174. 3. Collaborative filtering  Collaborative filtering techniques « compare » customers, based on their previous purchases, to make recommendations to « similar » customers  It’s also called « social » filtering  Follow these steps: 1. Find customers who are similar (« nearest neighbors ») in terms of tastes, preferences, past behaviors 2. Aggregate weighted preferences of these neighbors 3. Make recommendations based on these aggregated, weighted preferences (most preferred, unbought items)
  • 175. Collaborative filtering  Example: the system needs to make recommendations to customer C  Purchases: – Customer A: Book 1, Book 4 – Customer B: Book 2, Book 3, Book 5 – Customer C: Book 2, Book 3 – Customer D: Book 2, Book 6 – Customer E: Book 1, Book 5  Customer B is very close to C (he has bought all the books C has bought). Book 5 is highly recommended  Customer D is somewhat close. Book 6 is recommended to a lower extent  Customers A and E are not similar at all. Weight=0
  • 176. Collaborative filtering  Pros:  Extremely powerful and efficient  Very relevant recommendations  The bigger the database and the richer the record of past behaviors, the better the recommendations  Cons:  Difficult to implement, resource and time-consuming  What about a new item that has never been purchased? It cannot be recommended  What about a new customer who has never bought anything? He cannot be compared to other customers, so no items can be recommended
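The three steps can be sketched on this example. Two assumptions are made: the purchase sets below are reconstructed so that they agree with the statements on the slide and with the co-purchase counts shown later in the deck, and the similarity measure (share of the target customer's purchases that the neighbor also made) is one simple choice among many.

```python
# Purchase sets reconstructed from the slides (an assumption, see above).
purchases = {
    "A": {"Book 1", "Book 4"},
    "B": {"Book 2", "Book 3", "Book 5"},
    "C": {"Book 2", "Book 3"},
    "D": {"Book 2", "Book 6"},
    "E": {"Book 1", "Book 5"},
}


def similarity(target_books, other_books):
    """Step 1: share of the target's purchases also made by the other customer."""
    return len(target_books & other_books) / len(target_books)


def recommend(customer, purchases):
    """Steps 2 and 3: aggregate neighbor purchases weighted by similarity,
    then rank the books the customer has not bought yet."""
    owned = purchases[customer]
    scores = {}
    for name, books in purchases.items():
        if name == customer:
            continue
        weight = similarity(owned, books)
        for book in books - owned:
            scores[book] = scores.get(book, 0.0) + weight
    ranked = [(book, score) for book, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


print(recommend("C", purchases))
```

Customers A and E get weight 0 and therefore contribute nothing, B contributes Book 5 with full weight and D contributes Book 6 with half weight, matching the ordering described on the slide.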
  • 177. 4. Clustering  Another way to make recommendations based on past purchases of other customers is to cluster customers into categories  Each cluster will be assigned « typical » preferences, based on preferences of customers who belong to the cluster  Customers within each cluster will receive recommendations computed at the cluster level
  • 178. Clustering  Purchases: – Customer A: Book 1, Book 4 – Customer B: Book 2, Book 3, Book 5 – Customer C: Book 2, Book 3 – Customer D: Book 2, Book 6 – Customer E: Book 1, Book 5  Customers B, C and D are « clustered » together. Customers A and E are clustered into another separate group  « Typical » preferences for CLUSTER are:  Book 2, very high  Book 3, high  Books 5 and 6, may be recommended  Books 1 and 4, not recommended at all
  • 179. Clustering  Purchases: – Customer A: Book 1, Book 4 – Customer B: Book 2, Book 3, Book 5 – Customer C: Book 2, Book 3 – Customer D: Book 2, Book 6 – Customer E: Book 1, Book 5 – Customer F: Book 3, Book 5  How does it work?  Any customer classified as a member of CLUSTER will receive recommendations based on the preferences of the group:  Book 2 will be highly recommended to Customer F  Book 6 will also be recommended to some extent
  • 180. Clustering  Problem: customers may belong to more than one cluster; clusters may overlap  Predictions are then averaged across the clusters, weighted by participation  [Figure: the purchase table of Customers A to F shown twice, with different overlapping clusters highlighted]
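A sketch of cluster-level recommendation for this example. Cluster membership is taken as given (how Customer F gets assigned to CLUSTER is not described on the slide), the purchase sets are reconstructed from the deck's statements and co-purchase counts, and the preference score (share of cluster members who bought a book) is an assumed choice.

```python
# Purchase sets reconstructed from the slides (an assumption).
purchases = {
    "A": {"Book 1", "Book 4"},
    "B": {"Book 2", "Book 3", "Book 5"},
    "C": {"Book 2", "Book 3"},
    "D": {"Book 2", "Book 6"},
    "E": {"Book 1", "Book 5"},
    "F": {"Book 3", "Book 5"},
}
cluster = ["B", "C", "D"]  # CLUSTER as described on the slide


def cluster_preferences(members, purchases):
    """Typical preferences: share of cluster members who bought each book."""
    counts = {}
    for member in members:
        for book in purchases[member]:
            counts[book] = counts.get(book, 0) + 1
    return {book: n / len(members) for book, n in counts.items()}


def recommend_from_cluster(customer, members, purchases):
    """Recommend cluster favourites that the customer does not own yet."""
    preferences = cluster_preferences(members, purchases)
    owned = purchases[customer]
    candidates = [(b, p) for b, p in preferences.items() if b not in owned]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


preferences = cluster_preferences(cluster, purchases)
for_f = recommend_from_cluster("F", cluster, purchases)
print(for_f)
```

The recommendation is computed once per cluster rather than once per customer, which is where the speed advantage over per-customer collaborative filtering comes from.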
  • 181. Clustering  Pros:  Clustering techniques work on aggregated data: faster  It can also be applied as a « first step » for shrinking the selection of relevant neighbors in a collaborative filtering algorithm  Cons:  Recommendations (per cluster) are less relevant than collaborative filtering (per individual)
  • 182. 5. Association rules  Clustering works at a group (cluster) level  Collaborative filtering works at the customer level  Association rules work at the item level
  • 183. Association rules  Past purchases are transformed into relationships of common purchases  Purchases: – Customer A: Book 1, Book 4 – Customer B: Book 2, Book 3, Book 5 – Customer C: Book 2, Book 3 – Customer D: Book 2, Book 6 – Customer E: Book 1, Book 5 – Customer F: Book 3, Book 5  Customers who bought… also bought… (number of customers): – Book 1: Book 4 (1), Book 5 (1) – Book 2: Book 3 (2), Book 5 (1), Book 6 (1) – Book 3: Book 2 (2), Book 5 (2) – Book 4: Book 1 (1) – Book 5: Book 1 (1), Book 2 (1), Book 3 (2) – Book 6: Book 2 (1)
  • 184. Association rules  These association rules are then used to make recommendations  If a visitor has some interest in Book 5, he will be recommended to buy Book 3 as well  Recommendations are constrained to some minimum levels of confidence  What if recommendations can be made using more than one piece of information?  Recommendations are aggregated  Customers who bought… also bought… (number of customers): – Book 1: Book 4 (1), Book 5 (1) – Book 2: Book 3 (2), Book 5 (1), Book 6 (1) – Book 3: Book 2 (2), Book 5 (2) – Book 4: Book 1 (1) – Book 5: Book 1 (1), Book 2 (1), Book 3 (2) – Book 6: Book 2 (1)
  • 185. Association rules  Pros:  Fast to implement  Fast to execute  Not much storage space required  Not « individual » specific  Very successful in broad applications for large populations, such as shelf layout in retail stores  Cons:  Not suitable if preferences change rapidly  It is tempting not to apply restrictive confidence rules, which may lead to literally stupid recommendations
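The co-purchase counts and a confidence threshold can be sketched as below. The purchase sets are an assumption, reconstructed so that they reproduce the co-purchase counts on the slide; confidence is taken here as the share of buyers of one book who also bought the other, and the 0.5 threshold is illustrative.

```python
from collections import Counter
from itertools import permutations

# Purchase sets reconstructed from the slides (an assumption).
purchases = {
    "A": {"Book 1", "Book 4"},
    "B": {"Book 2", "Book 3", "Book 5"},
    "C": {"Book 2", "Book 3"},
    "D": {"Book 2", "Book 6"},
    "E": {"Book 1", "Book 5"},
    "F": {"Book 3", "Book 5"},
}


def co_purchases(purchases):
    """Count, for each ordered pair (bought, also bought), the customers with both."""
    counts = Counter()
    for books in purchases.values():
        for bought, also in permutations(sorted(books), 2):
            counts[(bought, also)] += 1
    return counts


def rules_for(book, counts, purchases, min_confidence=0.5):
    """Books to recommend to someone interested in `book`, with confidence."""
    buyers = sum(1 for books in purchases.values() if book in books)
    rules = []
    for (bought, also), n in counts.items():
        if bought == book and n / buyers >= min_confidence:
            rules.append((also, n / buyers))
    return sorted(rules, key=lambda pair: (-pair[1], pair[0]))


counts = co_purchases(purchases)
print(counts[("Book 5", "Book 3")])
print(rules_for("Book 5", counts, purchases))
```

Lowering `min_confidence` lets weaker rules through (for example Book 5 to Book 1, supported by a single customer), which is the risk named among the cons of this technique.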
  • 186. 6. Information filtering  Association rules compare items based on past purchases  Information filtering compares items based on their content  Also called « content-based filtering » or « content-based recommendations »  Can exploit syntactical information on objects (features)  But also semantic knowledge of objects (concepts/ontologies)
  • 187. Information filtering  What is the « content » of an item?  It can be explicit « attributes » or « characteristics » of the item. For example for a film:  Action / adventure  Features Bruce Willis  Year 1995  It can also be « textual content » (title, description, table of contents, etc.)  Several techniques exist to compute the distance between two textual documents
  • 188. Information filtering  How does it work?  A textual document is scanned and parsed  Word occurrences are counted (words may be stemmed)  Several words or « tokens » are not taken into account: rarely used words or « stop words »  Each document is transformed into a normed TF-IDF (Term Frequency / Inverse Document Frequency) vector of size N  The distance between any pair of vectors is computed
  • 189. Information filtering  An (unrealistic) example: how to compute recommendations between 8 books based only on their title?  Books selected:  Building data mining applications for CRM  Accelerating Customer Relationships: Using CRM and Relationship Technologies  Mastering Data Mining: The Art and Science of Customer Relationship Management  Data Mining Your Website  Introduction to marketing  Consumer behavior  marketing research, a handbook  Customer knowledge management
  • 190. COUNT (occurrences of each stemmed term per title; B1 = Building data mining applications for CRM, B2 = Accelerating Customer Relationships: Using CRM and Relationship Technologies, B3 = Mastering Data Mining: The Art and Science of Customer Relationship Management, B4 = Data Mining Your Website, B5 = Introduction to marketing, B6 = Consumer behavior, B7 = marketing research, a handbook, B8 = Customer knowledge management; cells not listed are empty)  a: B7=1  accelerating: B2=1  and: B2=1, B3=1  application: B1=1  art: B3=1  behavior: B6=1  building: B1=1  consumer: B6=1  crm: B1=1, B2=1  customer: B2=1, B3=1, B8=1  data: B1=1, B3=1, B4=1  for: B1=1  handbook: B7=1  introduction: B5=1  knowledge: B8=1  management: B3=1, B8=1  marketing: B5=1, B7=1  mastering: B3=1  mining: B1=1, B3=1, B4=1  of: B3=1  relationship: B2=2, B3=1  research: B7=1  science: B3=1  technology: B2=1  the: B3=1  to: B5=1  using: B2=1  website: B4=1  your: B4=1
  • 191. TFIDF Normed Vectors (weight of each stemmed term per title; B1 = Building data mining applications for CRM, B2 = Accelerating Customer Relationships: Using CRM and Relationship Technologies, B3 = Mastering Data Mining: The Art and Science of Customer Relationship Management, B4 = Data Mining Your Website, B5 = Introduction to marketing, B6 = Consumer behavior, B7 = marketing research, a handbook, B8 = Customer knowledge management; all entries not listed are 0.000)  a: B7=0.537  accelerating: B2=0.432  and: B2=0.296, B3=0.256  application: B1=0.502  art: B3=0.374  behavior: B6=0.707  building: B1=0.502  consumer: B6=0.707  crm: B1=0.344, B2=0.296  customer: B2=0.216, B3=0.187, B8=0.381  data: B1=0.251, B3=0.187, B4=0.316  for: B1=0.502  handbook: B7=0.537  introduction: B5=0.636  knowledge: B8=0.763  management: B3=0.256, B8=0.522  marketing: B5=0.436, B7=0.368  mastering: B3=0.374  mining: B1=0.251, B3=0.187, B4=0.316  of: B3=0.374  relationship: B2=0.468, B3=0.256  research: B7=0.537  science: B3=0.374  technology: B2=0.432  the: B3=0.374  to: B5=0.636  using: B2=0.432  website: B4=0.632  your: B4=0.632
  • 192. Information filtering
      - A customer is interested in the following book: « Building data mining applications for CRM »
      - The system computes distances between this book and the 7 others
      - The « closest » books are recommended:
          - #1: Data Mining Your Website
          - #2: Accelerating Customer Relationships: Using CRM and Relationship Technologies
          - #3: Mastering Data Mining: The Art and Science of Customer Relationship Management
      - Not recommended: Introduction to marketing
      - Not recommended: Consumer behavior
      - Not recommended: Marketing research, a handbook
      - Not recommended: Customer knowledge management
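The ranking on this slide can be reproduced from the normed TF-IDF weights of slide 191. The sketch below is a minimal illustration, not necessarily the exact procedure used for the slides: it assumes "closest" means highest cosine similarity, which for unit-length vectors is simply the dot product. The function names `dot` and `recommend` are illustrative.

```python
# Nonzero TF-IDF weights per title, copied from the "TFIDF Normed Vectors" slide.
BOOKS = {
    "Building data mining applications for CRM": {
        "application": 0.502, "building": 0.502, "crm": 0.344,
        "data": 0.251, "for": 0.502, "mining": 0.251},
    "Accelerating Customer Relationships": {
        "accelerating": 0.432, "and": 0.296, "crm": 0.296, "customer": 0.216,
        "relationship": 0.468, "technology": 0.432, "using": 0.432},
    "Mastering Data Mining": {
        "and": 0.256, "art": 0.374, "customer": 0.187, "data": 0.187,
        "management": 0.256, "mastering": 0.374, "mining": 0.187, "of": 0.374,
        "relationship": 0.256, "science": 0.374, "the": 0.374},
    "Data Mining Your Website": {
        "data": 0.316, "mining": 0.316, "website": 0.632, "your": 0.632},
    "Introduction to marketing": {
        "introduction": 0.636, "marketing": 0.436, "to": 0.636},
    "Consumer behavior": {"behavior": 0.707, "consumer": 0.707},
    "Marketing research, a handbook": {
        "a": 0.537, "handbook": 0.537, "marketing": 0.368, "research": 0.537},
    "Customer knowledge management": {
        "customer": 0.381, "knowledge": 0.763, "management": 0.522},
}


def dot(u, v):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    return sum(weight * v[term] for term, weight in u.items() if term in v)


def recommend(target, books, k=3):
    """Return up to k titles most similar to `target`, skipping zero overlap."""
    scores = {title: dot(books[target], vec)
              for title, vec in books.items() if title != target}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(title, score) for title, score in ranked[:k] if score > 0]


ranked = recommend("Building data mining applications for CRM", BOOKS)
for title, score in ranked:
    print(f"{score:.3f}  {title}")
# Order: Data Mining Your Website, Accelerating Customer Relationships,
#        Mastering Data Mining
```

The four "not recommended" titles share no term with the target book, so their similarity is exactly zero and they are filtered out.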
  • 193. Information filtering
      - Pros:
          - No need for past purchase history
          - Not extremely difficult to implement
      - Cons:
          - « Static » recommendations
          - Not effective if the content is not very informative; e.g. information filtering is better suited to recommending technical books than novels or movies
  • 194. 7. Classifiers
      - Classifiers are general computational models
      - They may take as inputs:
          - Vector of item features (action / adventure, Bruce Willis)
          - Preferences of customers (like action / adventure)
          - Relations among items
      - They may give as outputs:
          - Classification
          - Rank
          - Preference estimate
      - The model can be a neural network, a Bayesian network, a rule induction model, etc.
      - The classifier is trained using a training set
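As one possible illustration of a trained classifier that maps item features to a preference estimate, the sketch below uses a small Bernoulli naive Bayes model (a very simple Bayesian network). The choice of model, the training data, the feature names, and the function names are all assumptions made for the example; the slides do not prescribe any of them.

```python
import math
from collections import defaultdict


def train_naive_bayes(examples):
    """Count labels and feature occurrences from (feature_set, label) pairs."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        for feature in features:
            feature_counts[label][feature] += 1
            vocabulary.add(feature)
    return label_counts, feature_counts, vocabulary


def predict_proba(model, features):
    """Return {label: probability} for an item described by a feature set."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        for feature in vocabulary:
            # Laplace-smoothed probability that the feature is present
            p = (feature_counts[label][feature] + 1) / (n + 2)
            score += math.log(p if feature in features else 1 - p)
        log_scores[label] = score
    # Convert log scores to normalized probabilities
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}


# Hypothetical training set: item features and one customer's reaction.
training_set = [
    ({"action", "bruce_willis"}, "like"),
    ({"action", "adventure"}, "like"),
    ({"romance"}, "dislike"),
    ({"romance", "drama"}, "dislike"),
]
model = train_naive_bayes(training_set)
proba = predict_proba(model, {"action"})
prediction = max(proba, key=proba.get)  # classification output
print(prediction, round(proba["like"], 3))  # preference estimate for "like"
```

The same probabilities can serve all three outputs listed on the slide: the most probable label is the classification, the probability of "like" is a preference estimate, and sorting items by that probability gives a rank.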
  • 195. Classifiers
      - Pros:
          - Versatile
          - Can be combined with other methods to improve accuracy of recommendations
      - Cons:
          - Need a relevant training set
  • 196. Collaborative Filtering
      - Maintain a database of many users’ ratings of a variety of items.
      - For a given user, find other similar users whose ratings strongly correlate with the current user.
      - Recommend items rated highly by these similar users, but not rated by the current user.
      - Almost all existing commercial recommenders use this approach (e.g. Amazon).
  • 197. Collaborative Filtering [Figure: the ratings of the active user are compared against the user database (correlation match); the ratings of the matched users are then used to extract recommendations for the active user.]
  • 198. Collaborative Filtering Method
      - Weight all users with respect to similarity with the active user.
      - Select a subset of the users (neighbors) to use as predictors.
      - Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings.
      - Present items with highest predicted ratings as recommendations.
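The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: similarity is the Pearson correlation over co-rated items, ratings are normalized by subtracting each user's mean rating, and only positively correlated users are kept as neighbors. The rating data and all function names are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(ra, ru):
    """Pearson correlation over items rated by both users (0.0 if undefined)."""
    common = set(ra) & set(ru)
    if len(common) < 2:
        return 0.0
    ma = mean([ra[i] for i in common])
    mu = mean([ru[i] for i in common])
    num = sum((ra[i] - ma) * (ru[i] - mu) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((ru[i] - mu) ** 2 for i in common))
    return num / den if den else 0.0


def predict(active, item, ratings, n_neighbors=2):
    """Predict the active user's rating of `item`, or None if no neighbor helps."""
    ra = ratings[active]
    # Step 1: weight every other user who rated the item by similarity.
    weights = {u: pearson(ra, ru) for u, ru in ratings.items()
               if u != active and item in ru}
    # Step 2: keep the n most similar, positively correlated users.
    neighbors = sorted((u for u in weights if weights[u] > 0),
                       key=lambda u: weights[u], reverse=True)[:n_neighbors]
    if not neighbors:
        return None
    # Step 3: weighted combination of mean-centered neighbor ratings.
    num = sum(weights[u] * (ratings[u][item] - mean(list(ratings[u].values())))
              for u in neighbors)
    den = sum(abs(weights[u]) for u in neighbors)
    return mean(list(ra.values())) + num / den


def recommend(active, ratings, k=3, n_neighbors=2):
    """Step 4: rank items the active user has not rated by predicted rating."""
    unseen = {i for u, ru in ratings.items() if u != active for i in ru}
    unseen -= set(ratings[active])
    preds = {i: predict(active, i, ratings, n_neighbors) for i in unseen}
    preds = {i: p for i, p in preds.items() if p is not None}
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical ratings on a 1-10 scale.
ratings = {
    "active": {"A": 9, "B": 3, "Z": 5},
    "u1": {"A": 10, "B": 4, "C": 8, "Z": 6},
    "u2": {"A": 2, "B": 9, "C": 1, "Z": 5},
    "u3": {"A": 8, "B": 2, "C": 9, "D": 3, "Z": 6},
}
recs = recommend("active", ratings)
for item, score in recs:
    print(item, round(score, 2))
```

In this toy data, user u2 rates items almost opposite to the active user, so its negative weight excludes it from the neighborhood; item C is recommended ahead of item D.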
  • 199. Significance Weighting
      - It is important not to trust correlations based on very few co-rated items.
      - Include significance weights, based on the number of co-rated items.
      - If no items are rated by both users, the correlation is not meaningful.
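One simple way to apply a significance weight is to scale the correlation linearly down when two users share fewer than some cutoff number of co-rated items. The sketch below assumes this linear scheme; the `cutoff` value of 50 and the function name are assumptions for illustration and should be tuned for the data at hand.

```python
def significance_weight(correlation, n_corated, cutoff=50):
    """Devalue a correlation that rests on fewer than `cutoff` co-rated items.

    With no co-rated items the correlation is not meaningful, so the
    weight is 0. At or above the cutoff the correlation is used unchanged.
    """
    if n_corated <= 0:
        return 0.0
    return correlation * min(n_corated, cutoff) / cutoff


print(significance_weight(0.9, 5))    # heavily discounted
print(significance_weight(0.9, 80))   # unchanged
print(significance_weight(0.9, 0))    # not meaningful
```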
  • 200. Neighbor Selection
      - For a given active user, a, select correlated users to serve as the source of predictions.
      - The standard approach is to use the n most similar users, u, based on similarity weights, w_{a,u}.
      - An alternate approach is to include all users whose similarity weight is above a given threshold.
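The two selection strategies can be compared side by side. In this sketch the similarity weights are hypothetical values, and the function names are illustrative.

```python
def top_n_neighbors(weights, n):
    """Standard approach: the n users with the highest similarity weight."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


def threshold_neighbors(weights, threshold):
    """Alternate approach: every user whose weight is above the threshold."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return [user for user in ranked if weights[user] > threshold]


# Hypothetical similarity weights w_{a,u} for one active user a.
weights = {"u1": 0.93, "u2": -0.40, "u3": 0.55, "u4": 0.12}
print(top_n_neighbors(weights, 2))
print(threshold_neighbors(weights, 0.1))
```

A fixed n gives every active user the same neighborhood size, while a threshold lets the neighborhood shrink or grow with the number of genuinely similar users (and may leave it empty).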

Editor's Notes

  1. There have been many definitions for IR in the last decades… we just report
  2. There have been many definitions for IR in the last decades… we just report
  3. User-centric interfaces
     Cloud services should be accessible through simple and pervasive methods. Cloud computing adopts the concept of utility computing: users obtain and employ computing platforms in computing Clouds as easily as they access a traditional public utility. In detail, Cloud services have the following features:
       - Cloud interfaces do not force users to change their working habits and environments.
       - The Cloud client software that must be installed locally is lightweight.
       - Cloud interfaces are location independent and can be accessed through well-established interfaces such as Web services frameworks and Internet browsers.
     Autonomous system
     The computing Cloud is an autonomous system and is managed transparently to users. Hardware, software and data inside Clouds can be automatically reconfigured, orchestrated and consolidated to present a single platform image, which is finally rendered to users.
     Scalability and flexibility
     Scalability and flexibility are the most important features driving the emergence of Cloud computing. Cloud services and computing platforms offered by computing Clouds can be scaled across various concerns, such as geographical locations, hardware performance and software configurations. The computing platform should be flexible enough to adapt to the various requirements of a potentially large number of users.
  4. Software or an application is hosted as a service and provided to customers across the Internet. This mode eliminates the need to install and run the application on the customer’s local computers. SaaS therefore alleviates the customer’s burden of software maintenance and reduces the expense of software purchases through on-demand pricing.
     An early example of SaaS is the Application Service Provider (ASP). The ASP approach provides subscriptions to software that is hosted or delivered over the Internet. Microsoft’s “Software + Service” is another example: a combination of local software and Internet services interacting with one another. Google’s Chrome browser suggests an interesting SaaS scenario: a new desktop could be offered, through which applications can be delivered (either locally or remotely) in addition to the traditional Web browsing experience.
  5. The Google App Engine is an interesting example of PaaS. It enables users to build Web applications with Google’s APIs and SDKs on the same scalable systems that power Google’s own applications.
  6. ITaaS is a highly disruptive concept for enterprise users, who have less to gain and more to lose by outsourcing IT.
       - Cloud service providers trying to serve this space must implement enterprise-class capabilities at multiple levels, both in the network and at the end points.
       - Key business and technical challenges include cost, security, performance, business resiliency, interoperability, and data migration.
     Cloud computing is still in early development. Market researchers, financial analysts, and business leaders all want to assess its potential markets and business impact. According to IDC, a market research firm that recently surveyed IT executives, CIOs, and other business leaders, IT spending on cloud services will reach US$42 billion by 2012. However, as with any disruptive technology and transitional business model, there is no definitive assessment of cloud computing’s market opportunity. We believe its long-term business impact could be even larger.