Strategic scenarios in digital content and digital business
  • 6,608 views
This lesson was given in May 2009 at MIP, Politecnico di Milano. The audience included members of the Acer academy program.

Rights on reused content are maintained by respective owners.

See further information on my activity at:
http://home.dei.polimi.it/mbrambil/
and:
http://twitter.com/marcobrambi


  • There have been many definitions for IR in the last decades… we just report
  • User-centric interfaces. Cloud services should be accessible through simple and pervasive methods. Cloud computing adopts the concept of utility computing: users obtain and employ computing platforms in computing clouds as easily as they access a traditional public utility. In detail, cloud services enjoy the following features: the cloud interfaces do not force users to change their working habits and environments; the cloud client software required to be installed locally is lightweight; cloud interfaces are location independent and can be accessed through well-established interfaces such as the Web services framework and the Internet browser. Autonomous system. The computing cloud is an autonomous system, managed transparently to users. Hardware, software, and data inside clouds can be automatically reconfigured, orchestrated, and consolidated to present a single platform image, finally rendered to users. Scalability and flexibility. Scalability and flexibility are the most important features driving the emergence of cloud computing. Cloud services and computing platforms offered by computing clouds can be scaled across various concerns, such as geographical location, hardware performance, and software configuration. The computing platform should be flexible enough to adapt to the requirements of a potentially large number of users.
  • Software or an application is hosted as a service and provided to customers across the Internet. This mode eliminates the need to install and run the application on the customer's local computers. SaaS therefore relieves the customer of the burden of software maintenance and reduces the expense of software purchases through on-demand pricing. An early example of SaaS is the Application Service Provider (ASP). The ASP approach provides subscriptions to software that is hosted or delivered over the Internet. Microsoft's "Software + Services" shows another example: a combination of local software and Internet services interacting with one another. Google's Chrome browser suggests an interesting SaaS scenario: a new desktop could be offered, through which applications can be delivered (either locally or remotely) in addition to the traditional Web browsing experience.
  • The Google App Engine is an interesting example of PaaS (Platform as a Service). It enables users to build Web applications with Google's APIs and SDKs on top of the same scalable systems that power Google's own applications.
  • ITaaS is a highly disruptive concept for enterprise users, who have less to gain and more to lose by outsourcing IT. Cloud service providers trying to serve this space must implement enterprise-class capabilities at multiple levels, both in the network and at the end points. Key business and technical challenges include cost, security, performance, business resiliency, interoperability, and data migration. Cloud computing is still in early development. Market researchers, financial analysts, and business leaders all want to assess its potential markets and business impact. According to IDC, a market research firm that recently surveyed IT executives, CIOs, and other business leaders, IT spending on cloud services will reach US$42 billion by 2012. However, as with any disruptive technology and transitional business model, there is no definitive assessment of cloud computing's market opportunity. We believe its long-term business impact could be even larger.

Transcript

  • 1. Strategic Scenarios in Digital contents Marco Brambilla et al. Politecnico di Milano, DEI and MIP Acer Academy May 2009 http://home.dei.polimi.it/mbrambil/
  • 2. Agenda overview  Information overload  Evolution of contents  Web 2.0  Web 3.0  Tools and technologies for managing information overload
  • 3. 1. Information overload
  • 4. Introduction and motivation  161 exabytes of information was created or replicated worldwide in 2006  IDC estimates 6X growth by 2010 to 988 exabytes (a zettabyte) / year  That's more than in the previous 5,000 years. – DATA from: Dr. Michael L. Brodie - Chief Scientist Verizon
  • 5. Where does content come from  The largest source of data?  USERS  YouTube Videos  1.7 billion served / month  1 million streams / day = 75 billion e-mails  Facebook had [in 2007] …  1.8 billion photos  31 million active users  100,000 new users / day  1,800 applications  MySpace, 185+ million registered users (Apr 2007), has…  Images: – 1+ billion - Millions uploaded / day - 150,000 requests / sec  Songs: – 25 million - 250,000 concurrent streams  Videos: – 60 TB - 60,000 uploaded / day - 15,000 concurrent streams
  • 6. Quality of data  (User Generated) Content is:  25% original; 75% replicated  25% from the workplace; 75% not  95% unstructured and growing  While enterprise data is 10-15% structured and decreasing  Main challenges:  How to make multimedia content available to search engines and search based applications?  Exploiting multimedia content requires: – Acquiring it – (Re) Formatting it – Indexing it – Querying it – Transmitting it – Browsing it
  • 7. Information overload effects on (our) way of working For knowledge workers • Time is limited • Processes overlap • Knowledge is (often) artefact- dependent • Tools allow multiplicity of uses • Need for several tools • Relations with people take time • Contexts mix and merge
  • 8. Example: email (!!)
  • 9. Working with information  Types of information  Usefulness – Active: ephemeral and working (“hot”) – Dormant: inactive, potentially useful (“cold”) – Not useful – Un-accessed  Ownership: mine or not-mine  Activities  Acquisition of items to form a collection  Organisation of items  Maintenance of the collection (e.g. archiving items into long- term storage)  Retrieval of items for reuse  Information (and choice) overload.. On YOUTUBE
  • 10. Acquisition  Different between tools  Manual (files), uncontrolled (e-mails)  Push vs. pull  Reasons for deciding how to store information  Portability  Number of access points  Preservation of information in its current state  Currency of information  Context  Reminding  Ease of integration into existing structures  Communication and information sharing  Ease of maintenance
  • 11. Organisation  Categorisations are complex  Folders vs. keywords  Trees vs. webs  Change over time  Categorisations are local  If two groups of people construct thesauri in a particular subject area, the overlap of index terms will only be 60%  Two indexers using the same thesaurus on the same document use common index terms in only 30% of cases  The output from two experienced database searchers has only 40% overlap  Experts' judgements of relevance concur in only 60% of cases
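The overlap figures above can be made concrete with a simple vocabulary-overlap measure; the two indexer term sets below are hypothetical, chosen only to illustrate the computation.

```python
def term_overlap(a, b):
    """Fraction of shared terms, relative to the smaller vocabulary."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Two hypothetical indexers describing the same document with a shared thesaurus:
indexer_1 = {"web", "social", "tagging", "folksonomy", "metadata"}
indexer_2 = {"web", "semantic", "ontology", "tagging", "annotation"}

print(term_overlap(indexer_1, indexer_2))  # 0.4 -> only 40% of terms agree
```

A 0.4 result for two five-term vocabularies sharing two terms matches the low-agreement pattern the slide reports for real indexers.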
  • 12. Maintenance  Hardly any  Occasional cleaning  Extensive maintenance is related to major life changes (e.g. new job)
  • 13. Retrieval  Personal archives instead of corporate systems  Need to start searching  Not invented here: reinventing is more fun than reusing  Asking is more difficult than sharing  Social search: asking others  Estimations of quality and relevance are best made by experts themselves  It's the fastest and most efficient way  Colleagues can give you feedback and help sharpen your questions  Consulting others is fun  While searching systems  Preference for location-based search  Critical reminding function of file placement  Lack of retrieval of archived files
  • 14. 2. Evolution of contents
  • 15. Evolution of contents and technologies  I. from static to dynamic  II. from fixed to mobile  III. from big to small  IV. from local to global  V. from vertical to horizontal  VI. from sometimes-on to always-on  VII. from wired to wireless  VIII. from divergence to convergence
  • 16. Content proliferation and classification  Proliferation of  blogs  online video  podcasting  other social media tools  the definition of what constitutes "web"/"non-web" content has become increasingly blurred
  • 17. Pervasive and convergent digital content
  • 18. Convergence of connectivity
  • 19. 3. Web 2.0
  • 20. Social- vs. Group- ware  The basic model of 90's era collaboration (Lotus Notes) was all about the group: information was managed in group-based repositories, then passed around for review, or published to intranet portals via customized apps. In these information-era workflows, people are first and foremost occupiers of roles, not individuals, and the materials being created are more closely aligned with groups than with individuals.  Web 2.0 social tools (MySpace, Facebook, LinkedIn): social networks -- explicit ones, or implicit ones in social media -- are really organized around individuals and their networked self-expression. I am writing this blog post, and publishing it, personally. It is not the product of some workgroup. It is not an anonymous chunk of text on a corporate portal. My Facebook profile pulls traffic from my network of contacts, sources I find interesting, and the chance presence updates of my friends.  See: http://www.stoweboyd.com/message/2007/01/in_the_time_of_.html
  • 21. Doug Engelbart, 1968 "The grand challenge is to boost the collective IQ of organizations and of society. "
  • 22. Tim O’Reilly, 2006, on Web 2.0 “The central principle behind the success of the giants born in the Web 1.0 era who have survived to lead the Web 2.0 era appears to be this, that they have embraced the power of the web to harness collective intelligence”
  • 23. Web 2.0 is about The Social Web “Web 2.0 Is Much More About A Change In People and Society Than Technology” -Dion Hinchcliffe, tech blogger  1 billion people connect to the Internet  100 million web sites  over a third of adults in US have contributed content to the public Internet. - 18% of adults over 65
  • 24. Tim Berners-Lee “The Web isn’t about what you can do with computers. It’s people and, yes, they are connected by computers. But computer science, as the study of what happens in a computer, doesn’t tell you about what happens on the Web.” NY Times, Nov 2, 2006
  • 25. But what is "collective intelligence" in the social web sense?  intelligent collection?  collaborative bookmarking, searching  "database of intentions"  clicking, rating, tagging, buying  what we all know but hadn't got around to saying in public before  blogs, wikis, discussion lists "database of intentions" – Tim O'Reilly
  • 26. the wisdom of clouds?
  • 27. “Collective Knowledge” Systems  The capacity to provide useful information  based on human contributions  which gets better as more people participate.  typically  mix of structured, machine-readable data and unstructured data from human input
  • 28. Collective Knowledge is Real  FAQ-o-Sphere - self service Q&A forums  Citizen Journalism – “We the Media”  Product reviews for gadgets and hotels  Collaborative filtering for books and music  Amateur Academia
  • 29. The timeline
  • 30. Web 2.0 The phrase "Web 2.0" can refer to one or more of the following:  The transition of web sites from isolated information silos to sources of content and functionality, thus becoming computing platforms serving web applications to end-users  A social phenomenon embracing an approach to generating and distributing Web content itself, characterized by open communication, decentralization of authority, freedom to share and re-use, and "the market as a conversation”  Enhanced organization and categorization of content, emphasizing deep linking  A rise in the economic value of the Web, possibly surpassing the impact of the dot-com boom of the late 1990s
  • 31. Two main kinds  PEOPLE FOCUS: the first kind of socializing is typified by "people focus" websites such as Bebo, Facebook, MySpace, and Xiaonei.  HOBBY FOCUS: the second kind of socializing is typified by "hobby focus" websites such as Flickr, Kodak Gallery, and Photobucket
  • 32. Web 2.0 (see Wesch from YouTube [LOCAL]) Since social web applications are built to encourage communication between people, they typically emphasize some combination of the following social attributes:  Identity: who are you?  Reputation: what do people think you stand for?  Presence: where are you?  Relationships: who are you connected with? who do you trust?  Groups: how do you organize your connections?  Conversations: what do you discuss with others?  Sharing: what content do you make available for others to interact with?  Examples of social applications include Twitter, Facebook, Stumpedia, and Jaiku.
  • 33. Keyword: sharing!  Sharing...  Useful vs. Not useful (!?) 
  • 34. Sharing for the enterprise? (1) A teenager model? (2) Always useful?
  • 35. Community
  • 36. Human Resource Management 2.0  Social networks for the job market – To find and be found – To manage your online reputation – To research and reference check – To hire a superstar – To use your network to do your job better – To use your network to get a better job http://www.linkedin.com/
  • 37. Blog  a user-generated website where entries are made in journal style and displayed in a reverse chronological order. The term "blog" is derived from "Web log." "Blog" can also be used as a verb, meaning to maintain or add content to a blog.
  • 38. Wiki  a website that allows the visitors themselves to easily add, remove, and otherwise edit and change available content, typically without the need for registration. This ease of interaction and operation makes a wiki an effective tool for mass collaborative authoring.
  • 39. Best known wiki
  • 40. Wiki vs. Blog A blog, or web log, shares writing and multimedia content in the form of "posts" (starting point entries) and "comments" (responses to the posts). While commenting, and even posting, are open to the members of the blog or the general public, no one is able to change a comment or post made by another. The usual format is post-comment-comment-comment, and so on. For this reason, blogs are often the vehicle of choice to express individual opinions. A wiki has a far more open structure and allows others to change what one person has written. This openness may trump individual opinion with group consensus.
  • 41. Special purpose blogs: photos, music, ...
  • 42. (Social) Tagging  Term – a word or phrase that is recognizable by people and computers  Document – a thing to be tagged, identifiable by a URI or a similar naming service  Tagger – someone or thing doing the tagging, such as the user of an application  Tagged – the assertion by Tagger that Document should be tagged with Term
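The four-part tagging model above maps naturally onto a small data structure. This is a sketch in the slide's own terminology; the URIs and user names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TagAssertion:
    """Tagged: the assertion by Tagger that Document should be tagged with Term."""
    term: str      # a word or phrase recognizable by people and computers
    document: str  # the thing being tagged, identified by a URI
    tagger: str    # who (or what) is doing the tagging

# Hypothetical assertions from two users about the same document:
tags = [
    TagAssertion("romantic", "http://example.com/hotel/42", "alice"),
    TagAssertion("romantic", "http://example.com/hotel/42", "bob"),
    TagAssertion("spain",    "http://example.com/hotel/42", "alice"),
]

# Collaboratively filtered search: documents my buddies tagged "romantic".
buddies = {"bob"}
hits = sorted({t.document for t in tags if t.term == "romantic" and t.tagger in buddies})
print(hits)  # ['http://example.com/hotel/42']
```

Making the tagger an explicit part of the assertion, rather than treating a tag as a bare property of a document, is what later enables buddy-filtered queries like the one above.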
  • 43. Podcast  A podcast is a media file that is distributed by subscription (paid or unpaid) over the Internet using syndication feeds, for playback on mobile devices and personal computers.
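Syndication-feed delivery can be sketched with the standard library alone; the RSS snippet below is a minimal, made-up feed carrying one episode as an `enclosure`.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 podcast feed.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="12345"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)
# Each <item> is an episode; the <enclosure> points to the media file to download.
episodes = [(item.findtext("title"), item.find("enclosure").get("url"))
            for item in root.iter("item")]
print(episodes)  # [('Episode 1', 'http://example.com/ep1.mp3')]
```

A podcast client does essentially this on a schedule: fetch the feed, compare enclosures against what it has already downloaded, and pull the new media files.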
  • 44. Examples of Podcasts available  iTunes Store  NPR  ArtsEdge  Ed. Podcast Network  SFMoMA
  • 45. Blog with Podcasts & Wikis  Several functions on the same platform
  • 46. Gathering specific communities – TappedIn
  • 47. Collecting feedback – SurveyMonkey (SurveyMonkey.com)
  • 48. Tools. Example: collaboration and sharing  Webex  Meeting center  Training center  Acquired by CISCO in 2007  Integrated phone conferencing, VoIP, support for PowerPoint, Flash, audio, and video;  Meeting recording and playback, one-click meeting access, scheduling, and IM applications, full compatibility, secure communications  See http://www.sramanamitra.com/2007/03/15/cisco-acquires-webex-beefs-collaboration/
  • 49. Trends and size  Facebook growth: 700% from 2008 to 2009  Twitter growth: 3,700%  And unique visitors..
  • 50. One big social application? Facebook connect!  evolution of Facebook Platform enabling you to integrate Facebook into your own site. You can add social context to your site:  Identity. Seamlessly connect the user's Facebook account with your site  Friends. Bring a user's Facebook friends into your site.  Social Distribution. Publish information back into Facebook.  Privacy. Bring dynamic privacy to your site. How scalable, reliable, open-minded?
  • 51. Wouldn't this be better? But..
  • 52. The Mash-up approach  User-defined combination of services available on the web  Graphical design  Immediate execution
  • 53. E.g.: airlines mash-up Tracing of referral, searches, and so on […]
  • 54. SOA vs. Web 2.0 [Diagram: SOA and Web 2.0 compared across the lifecycle: planning, design, implementation, monitoring]
  • 55. Comparison ... Web 2.0 vs. SOA:  SaaS = SaaS  Web-based interoperability (REST) vs. standard-based interoperability (SOAP, WSDL, UDDI)  Application as a platform = application as a platform  Pushes for unexpected reuse vs. allows reuse  RIA vs. no UI  Participatory architecture vs. centralized governance
  • 56. … and complementarity Source: Babak Hosseinzadeh, IBM
  • 57. Short term challenge: Mash-up on SOA
  • 58. Mid-term: Web as a platform  The past: applications built through frameworks and APIs on top of an operating system and hardware  The future: applications built on APIs, RSS, REST, and SOAP services on top of the Web and the Internet
  • 59. Example: eBay  Services for  shopping  trading  Publishes services  REST interface  SOAP interface  Numbers:  4 billion requests/month (5.5 mln/h)  25% of the offer only via Web Service  25,000 registered developers  1,900 known applications (source: http://blogs.zdnet.com/ITFacts/?p=10326)
  • 60. Example: Amazon  Services for  e-commerce  on-line payment  computing (EC2)  storage (S3)  human computing (MTurk)  queues (SQS)  Success stories  Ex 1, Jungle Disk: online back-up service  Ex 2, ABACA: 99%-protection antispam
  • 61. (NOT) Artificial intelligence: Mechanical Turk !
  • 62. 4. Web 3.0
  • 63. SOA provides great plumbing!
  • 64. Web 2.0 provides great plumbing! E. Della Valle @ CEFRIEL - Politecnico di Milano
  • 65. Is plumbing enough?
  • 66. How to manage complexity?  A few services in a small company  Hundreds of services and processes in a big organization [Diagram: from a few services in one company, to several services, to several enterprises, composed via mashups and complex BPM]
  • 67. The problem is in the semantics! "The problem is not in the plumbing, it is in the semantics" – Verizon Chief Scientist M. L. Brodie. "Semantic heterogeneity remains the main obstacle to application integration, an obstacle that Web Services alone will not solve. Until someone finds a way to make applications understand each other, the effects of Web Services will remain limited. When a user's data is passed in a certain format using a Web Service as an interface, the receiving program still has to know what format the data is in. The parties still need to agree on the structure of each business object. So far, nobody has found a workable solution…" – Oracle Chairman and CEO Larry Ellison
  • 68. Web 3.0  Combining SOA + Social Web + Semantic Web  I.e., Services + Folksonomies + Ontologies (or + Taxonomies)
  • 69. Tim Berners-Lee, 2001 “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well- defined meaning, better enabling computers and people to work in cooperation.” Scientific American, May 2001
  • 70. Beyond Web 2.0 ... Given a BPM: find the best set of services? Find the best data source? Manage heterogeneous data/services? [Diagram: legacy systems, communication services, buyers, and third-party shipment integrated through mediators, at runtime, on the Web as a world-scale platform]
  • 71. SOA + Web 2.0 = ? [Diagram: the classic SOA triangle. A service provider publishes its service description (WSDL) to discovery agencies (UDDI); a service requester discovers the description and interacts with the provider via SOAP; WS-BPEL composes services.] source: http://www.w3.org/TR/2002/WD-ws-arch-20021114/
  • 72. SOA Advantages [Chart: relative costs of different EAI approaches – custom integration, proprietary EAI solutions, Web-Services-based EAI solutions, SOA-based EAI solutions – across adoption, deployment, maintenance, and changes] [source ZapThink http://www.zapthink.com/]
  • 73. From vertical applications...  Different IT solutions in each department Department 1 Department 2 Department N […]
  • 74. … to service extraction …  Rationalization of IT solutions  Factorization and publication of common services Department 1 Department 2 Department N […]
  • 75. … and process composition.  For using internal subprocesses, but also processes of customers or providers. Client Department 1 Department 2 Shared services Outsourced services Provider
  • 76. “Ontology is overrated.”  “[tags] are a radical break with previous categorization strategies”  hierarchical, centrally controlled, taxonomic categorization has serious limitations  e.g., Dewey Decimal System  free-form, massively distributed tagging is resilient against several of these limitations http://shirky.com/writings/ontology_overrated.html
  • 77. But...  ontologies aren't taxonomies  they are for sharing, not finding  they enable cross-application aggregation and value-added services
  • 78. Ontology of Folksonomy  What would it look like to formalize an ontology for tag data?  Functional Purpose: applications that use tag data from multiple systems  tag search across multiple sites  collaboratively filtered search – “find things using tags my buddies say match those tags”  combine tags with structured query – “find all hotels in Spain tagged with “romantic” http://tomgruber.org/writing/ontology-of-folksonomy.htm
  • 79. Example: formal match, semantic mismatch  System A says a tag is a property of a document.  System B says a tag is an assertion by an individual with an identity.  Does it mean anything to combine the tag data from these two systems?  “Precision without accuracy”  “Statistical fantasy”
  • 80. Engineering the tag ontology  Working with tag community, identify core and non core agreements  Use the process of ontology engineering to surface issues that need clarification  Couple a proposed ontology with reference implementations or hosted APIs
  • 81. Issues raised by ontological engineering  is term identity invariant over case, whitespace, punctuation?  are documents one-to-one with URI identities? (are alias URLs possible?)  can tagging be asserted without human taggers?  negation of tag assertions?  tag polarity – “voting” for an assertion  tag spaces – is the scope of tagging data a user community, application, namespace, or database?
  • 82. Pivot Browsing – surfing unstructured content along structured lines  Structured data provides dimensions of a hypercube  location  author  type  date  quality rating  Travel researchers browse along any dimension.  The key structured data is the destination hierarchy  Contributors place their content into the destination hierarchy, and the other dimensions are automatic.
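Pivot browsing reduces to filtering a collection along any structured dimension of the hypercube; the records and dimension values below are invented for illustration.

```python
# Hypothetical travel-content records with structured dimensions.
items = [
    {"title": "Tapas crawl", "location": "Spain", "type": "article", "rating": 4},
    {"title": "Alps hike",   "location": "Italy", "type": "video",   "rating": 5},
    {"title": "Madrid stay", "location": "Spain", "type": "review",  "rating": 5},
]

def pivot(collection, **dims):
    """Browse along any dimension (location, author, type, ...) by fixing its value."""
    return [item for item in collection
            if all(item.get(k) == v for k, v in dims.items())]

print([i["title"] for i in pivot(items, location="Spain")])            # both Spain items
print([i["title"] for i in pivot(items, location="Spain", rating=5)])  # ['Madrid stay']
```

Because contributors place content into the destination hierarchy, the `location` dimension comes for free, and every other structured field becomes a browsable axis with no extra effort.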
  • 83. 5. Tools and technologies for managing information overload
  • 84. Tools Information: The double edged sword  You want good information, not all information  Information Retrieval /search – Multimedia IR  RSS/Bloglines/Google Reader  Social bookmarking
  • 85. 5.1. Multimedia Information Retrieval
  • 86. Data in digital libraries  TEXT: e-book, Word documents, Web pages, PDF, Blog, etc.  Audio:  Speech (broadcasting, podcasting, recording, etc.)  Music (CD, MP3, etc.)  Pictures: Personal photos, schemes, diagrams, etc.  Video: sequence of images and audio (music and/or speech) Challenge: How to make multimedia content available to search engines and search based applications?
  • 87. Some user challenges…  Precision & contextual relevancy  aware of rights, user and information contexts  personalization and recommendation  Search must support multiple interaction patterns  active searching, monitoring, browsing and "being aware"  Trust and spam  Ubiquity of access
  • 88. MIR Application Areas  Architecture, real estate, and interior design (e.g., searching for ideas)  Broadcast media selection (e.g., radio and TV channel)  Cultural services (history museums, art galleries, etc.)  Digital libraries (e.g., musical dictionary, bio-medical imaging catalogues, film, video and radio archives)  E-Commerce (e.g., personalized advertising, on-line catalogues)  Education (e.g., repositories of multimedia courses)  Home Entertainment (e.g., personal multimedia collections)  Investigation services (e.g., human characteristics recognition, forensics)  Journalism (e.g., searching speeches of a certain politician using his name, his voice or his face)  Multimedia directory services (e.g., yellow pages, tourist information, GIS)  Multimedia editing (e.g., personalized news service, media authoring)  Remote sensing (e.g., cartography, ecology)  Shopping (e.g., searching for clothes)  Social (e.g., dating services)  Surveillance (e.g., traffic control)
  • 89. MIR: Query Examples  Play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g., in terms of emotions  Draw a few lines on a screen and find a set of images containing similar graphics, logos, ideograms, ...  Define objects, including color patches or textures, and retrieve examples among which you select the interesting objects to compose your design  On a given set of multimedia objects, describe movements and relations between objects and so search for animations fulfilling the described temporal and spatial relations  Describe actions and get a list of scenarios containing such actions  Using an excerpt of Pavarotti's voice, obtain a list of Pavarotti's records, video clips where Pavarotti is singing, and photographic material portraying Pavarotti
  • 90. State-of-the-art of MSE  Image search  www.tiltomo.com  www.tineye.com  www.pixsta.com  www.picsearch.com  Video search  www.blinx.com  www.clipta.com  www.yovisto.com  Music search  www.midomi.com  www.audiobaba.com  http://www.bmat.com  Enterprise MIR search  www.autonomy.com  www.pictron.com  www.exalead.com  www.fastsearch.com
  • 91. Metadata?  "Data about other data"  They describe properties of the data in a structured fashion – E.g.: owner, creation and modification date, description, etc.  Some metadata are implicitly available  E.g.: file size, file name, etc.  Others need to be manually provided or automatically extracted
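The "implicitly available" metadata mentioned above can be read straight from the filesystem; the snippet writes a temporary file so it is self-contained.

```python
import os
import tempfile
from datetime import datetime

def implicit_metadata(path):
    """Metadata the filesystem already provides: name, size, modification date."""
    st = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime).isoformat(),
    }

# Create a throwaway file to inspect:
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as f:
    f.write(b"hello")
    path = f.name

meta = implicit_metadata(path)
print(meta["size_bytes"])  # 5
os.remove(path)
```

Descriptive metadata (owner, description, subject) is not in the filesystem and is exactly the part that must be provided manually or extracted automatically.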
  • 92. The MIR reference architecture
  • 93. Content Process: Content Acquisition → Content Transformation → Content Indexing
  • 94. Content acquisition  In MIR, content is acquired from many sources and in multiple ways:  By crawling  By user's contribution  By syndicated contribution from content aggregators  Via broadcast capture (e.g., from air/cable/satellite broadcast, IPTV, Internet TV multicast, ..)
  • 95. Content acquisition  In text or Web search engines, content is a closed or open collection of documents  Textual Web content is acquired by crawlers, which exploit link navigation  In MIR, content is acquired from many sources, in a range of quality and value:  Web cams, security apps  (Video/Audio) Telephony and teleconferencing  Industrial/Academic/Medical  User Generated Content  Public Access and Government Access  Rushes, raw footage  News  Advertising  TV programming  Feature films [Chart: content value vs. production cost, rising from web cam/security and user-generated content through enterprise and broadcast TV up to motion pictures]
  • 96. Acquisition: (video) metadata sources & formats  Content elements may be accompanied by textual descriptions, which range in quantity and quality from no description (e.g., web cam content) to multilingual high-value data (closed captions and production metadata of motion pictures)  Metadata may reside:  Embedded within content (e.g., closed captions)  In surrounding Web pages or links (e.g., HTML content, link anchors, etc.)  In domain-specific databases (e.g., IMDB for feature films)  In ontologies: http://www.daml.org/ontologies/keyword.html [Diagram: an asset package multiplexes media streams with embedded metadata, alongside external metadata]
  • 97. Acquisition: (video) representative metadata standards  MPEG-7, MPEG-21 – ISO/IEC (International Electrotechnical Commission), Motion Picture Expert Group  UPnP – Universal Plug and Play Forum  MXF, MDD – SMPTE (Society of Motion Picture and Television Engineers)  AAF – AMWA (Advanced Media Workflow Association)  TV Anytime – ETSI (European Telecommunications Standards Institute)  Timed Text – W3C, 3GPP  RSS – Harvard  Podcast – Apple  Media RSS – Yahoo
  • 98. Transformation dimensions: Digital video formats  A digital video is a sequence of frames  The Frame Aspect Ratio (FAR) defines the shape of each image (width divided by height), with 4:3 and 16:9 being the currently adopted values  The Pixel Aspect Ratio (PAR) describes how the width of pixels in a digital image compares to their height (rectangular-pixel formats exist for analog TV compatibility)  Frame rate: number of frames per second (24 and 25 are common, but lower and higher values are also used)
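The frame size and frame rate above determine the raw (uncompressed) bitrate, which motivates the compression discussed next; the helper assumes 24 bits per pixel, and the PAL frame size is just an example.

```python
def raw_video_bitrate_mbps(width, height, frame_rate, bits_per_pixel=24):
    """Uncompressed bitrate in Mbit/s for a given frame size and frame rate."""
    return width * height * bits_per_pixel * frame_rate / 1_000_000

# A 4:3 PAL standard-definition stream at 25 frames per second:
print(raw_video_bitrate_mbps(720, 576, 25))  # 248.832 Mbit/s uncompressed
```

Roughly 250 Mbit/s of raw video against the few Mbit/s budgets in the compression table that follows shows why lossy coding is unavoidable for Web delivery.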
  • 99. Transformation dimensions: compression  Web media must be compressed, with lossy (but perceptually acceptable) transformations  In video, compression works in two ways  Intra-frame: an image is divided into blocks, whose content is "averaged"  Inter-frame: a frame is represented differentially with respect to the preceding one, by encoding only the blocks that "have moved" and their motion vectors  Example (MPEG compression)
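Inter-frame coding can be illustrated with a toy pixel-level delta; real codecs operate on blocks with motion vectors, which this sketch deliberately omits.

```python
def encode_delta(prev, cur):
    """Toy inter-frame step: keep only the (position, value) pairs that changed."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, cur)) if p != v]

def decode_delta(prev, delta):
    """Rebuild the current frame from the previous one plus the delta."""
    frame = list(prev)
    for i, v in delta:
        frame[i] = v
    return frame

prev_frame = [10, 10, 10, 10]
cur_frame  = [10, 99, 10, 12]

delta = encode_delta(prev_frame, cur_frame)
print(delta)                                         # [(1, 99), (3, 12)]
print(decode_delta(prev_frame, delta) == cur_frame)  # True
```

When consecutive frames are similar, the delta is far smaller than the frame itself, which is the whole point of inter-frame compression.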
  • 100. Content Transformation: popular compression standards  M-JPEG, JPEG2000 (up to 60 Mbit/s): consumer electronics, video editing systems  DVCAM (25 Mbit/s): consumer  MPEG-1 (1.5 Mbit/s): CD-ROM multimedia  MPEG-2 (4-20 Mbit/s): broadcast TV, DVD  MPEG-4 (300 Kbit/s - 12 Mbit/s): mobile video, podcast, IPTV  H.264, H.261, H.263 (64 Kbit/s - 1 Mbit/s): video teleconferencing, telephony Each standard has profiles that balance latency, complexity, error resilience, and bandwidth, specifically for a target application (e.g., file-based vs. transport-based fruition)
  • 101. Content indexing  In textual search engines, content needs little (lexical) analysis before indexing  Index elements (words) are part of the content  In MIR, content cannot be indexed directly  Indexable metadata must be created from the input data – Low-level features: concisely describe physical or perceptual properties of a media element (e.g., feature vectors) – High-level features: domain concepts characterizing the content (e.g., extracted objects and their properties, content categorizations, etc.)  In continuous media, extracted features must be related to the media segment that they characterize, both in space and time  Feature extraction may require a change of medium, e.g., speech-to-text transcription
  • 102. Motivations for metadata generation  Computers are not able to catch the underlying meaning of multimedia content  A computer is not able to understand that a picture represents a sunset  Pixels and audio samples do not convey semantics, just binary data  Metadata are used to produce representations that are manageable by computers  E.g.: text or numbers
  • 103. How to create multimedia annotations?  Manually  Expensive – It can take up to 10x the duration of the video – Problems in scaling to millions of content items  Incomplete or inaccurate – People might not be able to holistically catch all the meanings associated with a multimedia object  Difficult – Some content is tedious to describe with words – E.g., a melody without lyrics  Automatically  Good quality – Some technologies have ~90% precision  “Low” cost
  • 104. Indexing: the core pipeline  Multimedia content (e.g., MPEG-2 video) goes through content processing – video processing (segmentation, image analysis, video analysis) and audio processing (segmentation, audio analysis) – which produces metadata (e.g., MPEG-7); metadata indexing then builds the indexes (e.g., inverted files)
  • 105. Image/Text segmentation  GOAL: identify the type of contents included in an image  Text + pictures  Image sections
  • 106. Audio Segmentation  GOAL: split an audio track according to contained information  Music  Speech  Noise …  Additional usage  Identification and removal of ads
  • 107. Video Segmentation  Keyframe segmentation:  segment a video track according to its keyframes – fixed-length temporal segments  Shot detection:  automated detection of transitions between shots – a shot is a series of interrelated consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space.
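A common baseline for shot detection is to flag a boundary wherever the histogram difference between consecutive frames exceeds a threshold. A toy sketch on grayscale frames (the 0.5 threshold is arbitrary; real detectors also handle gradual transitions):

```python
from collections import Counter

def shot_boundaries(frames, threshold=0.5):
    """Naive shot detection: flag a cut where the normalized histogram
    difference between consecutive frames exceeds the threshold."""
    cuts = []
    for i in range(1, len(frames)):
        h1, h2 = Counter(frames[i - 1]), Counter(frames[i])
        bins = set(h1) | set(h2)
        diff = sum(abs(h1[b] - h2[b]) for b in bins)
        if diff / (2 * len(frames[i])) > threshold:
            cuts.append(i)
    return cuts

# Two "shots": three dark frames followed by three bright frames
frames = [[0, 0, 0, 0]] * 3 + [[9, 9, 9, 9]] * 3
assert shot_boundaries(frames) == [3]
```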
  • 108. Speaker identification  GOAL: identify the people participating in a discussion (e.g., labelling speaker turns as ERIC, DAVID, JOHN)  Additional usage:  Vocal command execution
  • 109. Word spotting  GOAL: recognize spoken words belonging to a closed dictionary (e.g., “call”, “open”, “bomb”)  Additional usage:  Spotting blacklisted words in spontaneous speech – E.g.: terrorist, attack, …  Dialing (e.g., "Call home”)  Call routing (e.g., "I would like to make a collect call”)  Domotic appliance control
  • 110. Speech to text  GOAL: automatically recognize spoken words belonging to an open dictionary  Example: quote_detection.avi CREDITS: Thorsten Hermes@SSMT2006
  • 111. Identification of audio events  GOAL: automatically identify audio events of interest  E.g.: shouts, gunshots, etc.  Additional usage:  Security applications  Example: sound_events.avi CREDITS: Thorsten Hermes@SSMT2006
  • 112. Classification of music genre, mood, etc.  GOAL: automatically classify the genre and mood of a song  Rock, pop, jazz, blues, etc.  Happy, aggressive, sad, melancholic, etc.  Additional usage:  Automatic selection of songs for playlist composition
  • 113. Images: low-level features  GOAL: extract implicit characteristics of a picture  luminosity  orientations  textures  Color distribution
  • 114. Images: Optical character recognition (OCR)  OCR is a technique for translating images of typed or handwritten text into symbols  Solved problem for typewritten text (99% accuracy)  Commercial solutions for handwritten text (e.g., MS Tablet PC)
  • 115. Image: face identification and recognition  GOAL: recognize and identify faces in an image  Usage examples:  People counting  Security applications  Example: face_detection.avi CREDITS: Thorsten Hermes@SSMT2006
  • 116. Image: concept detection  Image analysis extracts low-level features from raw data (e.g., color histograms, color correlograms, color moments, co-occurrence texture matrices, edge direction histograms, etc.)  Features can be used to build discrete classifiers, which may associate semantic concepts to images or regions thereof  The MediaMill semantic search engine defines 491 semantic concepts  http://www.science.uva.nl/research/mediamill/demo  Concepts can also be detected from text (e.g., from manual or automatic metadata) using NLP techniques (the FAST text search engine recognizes entities like geographical locations, professions, names of persons, domain-specific technical concepts, etc.)
  • 117. Image: object identification  GOAL: identify objects appearing in a picture  Basketballs, cars, planes, players, etc.  Also by example (invariant to position, scaling, etc.) – objectByExample.mp4 CREDITS: http://www.youtube.com/user/GuoshenYu
  • 118. Video OCR  Video OCR has specific problems, due to low resolution, small text size, and interference with the background  Detection is normally done on the most representative image of an entire shot, rather than frame by frame  Approach: filtering to enhance resolution + pattern matching for character identification  Example: Virage ConTEXTract text extraction and recognition technology (recognizes text in real time)
  • 119. Multimodal annotation fusion  Media segmentation and concept extraction are probabilistic processes  The result is characterized by a confidence value  Significance can be enhanced by comparing the output of distinct techniques applied to the same or similar problems  Examples:  Media segmentation: shot detection + speaker’s turn identification  Person recognition: voice identification + face detection  Concept detection: image based classification (e.g., “outdoor” & “water” + object extraction: “bird”, “boat”)
  • 120. Overview of the query process
  • 121. Content querying  In textual search applications, queries are keywords or expressions thereof  In MIR, search can take place  By keyword  By (mono-media) example (e.g., query by image, query by humming, query by song similarity)  By (multi-media) example (e.g., query by video similarity)  Query by example entails real time content processing  MIR query processing naturally requires the interaction of multiple search engines (e.g., a text search engine for textual metadata and a content-based search engine for feature vectors)
  • 122. Querying: modalities  In MIR applications, search keywords match the manual or automatic metadata  A complementary approach is to provide an example of the desired content and look for similar media elements  Similarity is a medium-dependent, domain-dependent, and subjective criterion  It can be computed on low-level features (e.g., image color histograms, music bpm) or on high-level concepts/categorizations (e.g., melancholic images, party music)  It can be multimodal (e.g., video similarity)  Querying may also consider context information (e.g., the user’s geographical position or the access device)
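Similarity over low-level features often reduces to a vector distance. A minimal sketch using cosine similarity on hypothetical three-bin color histograms (the file names and bin values are illustrative):

```python
import math

def cosine_similarity(u, v):
    """Similarity between two low-level feature vectors
    (e.g. color histograms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

query_hist = [0.7, 0.2, 0.1]          # a mostly-red query image
candidates = {"sunset.jpg": [0.8, 0.1, 0.1],
              "forest.jpg": [0.1, 0.8, 0.1]}
best = max(candidates, key=lambda k: cosine_similarity(query_hist, candidates[k]))
assert best == "sunset.jpg"
```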
  • 123. Example query modalities and search types  Example query: where[contains(“amsterdam”)] and topic[contains(“building”)], possibly combined with geographic coordinates (52.37N 4.89E), an example image, or an example song  Query analysis and federation route the query to the appropriate engines: text search (inverted index), XML search (semantic index), image similarity search (similarity index), music search (similarity index), geo search (R-tree index)
  • 124. Faceted query  When a media collection is large and its content unknown to the user, exposing part of the metadata can help  This can be done by showing a compact representation of the categories of content (facets)  A query can be restricted by selecting only the relevant facets
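A faceted view can be sketched as counting the values of a metadata field and filtering on the selected one (the item structure and field names are illustrative):

```python
from collections import Counter

def facet_counts(items, facet):
    """Compact view of a metadata facet: each value with its item count."""
    return Counter(item[facet] for item in items)

def restrict(items, facet, value):
    """Restrict the result set to items matching the selected facet value."""
    return [item for item in items if item[facet] == value]

videos = [{"title": "goal", "genre": "sport"},
          {"title": "derby", "genre": "sport"},
          {"title": "news", "genre": "news"}]
assert facet_counts(videos, "genre") == {"sport": 2, "news": 1}
assert [v["title"] for v in restrict(videos, "genre", "sport")] == ["goal", "derby"]
```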
  • 125. Querying: by keyword  The keyword may match the manual metadata and/or the automatic metadata  The match can be multimodal: in the audio, in a visual concept
  • 126. Querying: by similarity – query interface
  • 127. Content browsing  In textual search engines, results are ranked linearly, browsed by navigating links, and read at a glance  In MIR and similarity-based search applications, browsing results must consider multiple dimensions  Relevance: where the result appears in the sequence of retrieved media elements  Space: where the search has matched inside a spatially organized media element (e.g., an image)  Time: when a match occurs in a linear media element
  • 128. Browsing: timeline-based video access
  • 129. References  MPEG-7:  MPEG-7 Overview http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm  Prof. Ray Larson & Prof. Marc Davis, UC Berkeley SIMS http://www.sims.berkeley.edu/academics/courses/is202/f03/  RSS: http://www.rssboard.org/rss-specification  MEDIA RSS: http://search.yahoo.com/mrss  MPEG: http://en.wikipedia.org/wiki/MPEG  Shot detection: http://en.wikipedia.org/wiki/Shot_boundary_detection
  • 130. References  MediaMill: http://www.science.uva.nl/research/mediamill  Similarity search  www.midomi.com  www.tiltomo.com  http://tineye.com/  Slides from the course “Archivi Multimediali e Data Mining”, Politecnico di Torino, Prof. Silvia Chiusano  Slides and videos of the lectures given by Prof. Thorsten Hermes at the SSMS 2006 summer school  PHAROS: http://www.pharos-audiovisual-search.eu/
  • 131. 5.2 RSS and readers
  • 132. Acquisition: RSS and Media RSS  RSS (Really Simple Syndication) describes a family of web feed formats used to publish frequently updated web resources (e.g., news)  An RSS feed includes full or summarized text, plus metadata such as publishing dates and authorship  RSS formats are specified using XML  RSS 2.0 is now “frozen”  Media RSS was proposed by Yahoo as an RSS module that supplements the <enclosure> element capabilities of RSS 2.0 to allow for more robust media syndication.
  • 133. Acquisition: Example of RSS 2.0
  • 134. Acquisition: Browser rendition of RSS
  • 135. Acquisition: an example of Media RSS
  • 136. Indexing: Media segmentation in MPEG-7
  • 137. Bloglines: web content aggregator
  • 138. Google Reader
  • 139. Social bookmarking  Online shared catalogs of annotated bookmarks  Ad-hoc sites exist for managing the complexity of the bookmark sharing task
  • 140. 5.3 Personalization
  • 141. Why Personalization?  Personalization is an attempt to find the most relevant documents using information about the user's goals, knowledge, preferences, navigation history, etc.
  • 142. Same Query, Different Intent  “Cancer”  Different meanings  “Information about the astronomical/astrological sign of Cancer”  “Information about cancer treatments”  Different intents  “Are there any new tests for cancer?”  “Information about cancer treatments”
  • 143. Personalization Algorithms  Standard IR: the client submits the user's query to the server, which returns documents  Related to relevance feedback  Query expansion  Result re-ranking
  • 144. User Profile  A user's profile is a collection of information about the user of the system.  This information is used to lead the user to more relevant information
  • 145. Core vs. Extended User Profile  Core profile  contains information related to the user's search goals and interests  Extended profile  contains information related to the user as a person, in order to understand or model the use that the person will make of the information retrieved
  • 146. Who Maintains the Profile?  Profile is provided and maintained by the user/administrator  Sometimes the only choice  The system constructs and updates the profile (automatic personalization)  Collaborative - user and system  User creates, system maintains  User can influence and edit  Does it help or not?
  • 147. Adaptive Search  Goals:  Present documents (pages) that are most suitable for the individual user  Methods:  Employ user profiles representing short-term and/or long- term interests  Rank and present search results taking both user query and user profile into account
  • 148. Personalized Search: Benefits  Resolving ambiguity  The profile provides a context for the query in order to reduce ambiguity.  Example: the profile of interests allows distinguishing what a user asking about “Berkeley” (“Pirates”, “Jaguar”) really wants  Revealing hidden treasures  The profile allows bringing to the surface the most relevant documents, which could be hidden beyond the top results page  Example: the owner of an iPhone searches for Google Android; pages referring to both would be most interesting
  • 149. Where to Apply Profiles ?  The user profile can be applied in several ways:  To modify the query itself (pre-processing)  To change the usual way of retrieval  To process results of a query (post-processing)  To present document snippets  Special case: adaptation for meta-search
  • 150. Pre-Process: Query Expansion  User profile is applied to add terms to the query  Popular terms could be added to introduce context  Similar terms could be added to resolve indexer-user mismatch  Related terms could be added to resolve ambiguity  Works with any IR model or search engine
  • 151. Pre-Process: Relevance Feedback  In this case the profile is used to “move” the query  Imagine that:  the documents,  the query,  and the user profile are represented by the same set of weighted index terms
  • 152. Post-Processing  The user profile is used to organize the results of the retrieval process  Present to the user the most interesting documents  Filter out irrelevant documents  Extended profile can be used effectively  In this case the use of the profile adds an extra step to processing  Similar to classic information filtering problem  Typical way for adaptive Web IR
  • 153. Post-Filter: Annotations  The result could be relevant to the user in several respects. Fusing this relevance with query relevance is error-prone and leads to a loss of information  Results are ranked by query relevance, but annotated with visual cues reflecting other kinds of relevance  User interests: Syskill and Webert; group interests: KnowledgeSea
  • 154. Post-Filter: Re-Ranking  Re-ranking is a typical approach for post-filtering  Each document is rated according to its relevance (similarity) to the user or group profile  This rating is fused with the relevance rating returned by the search engine  The results are ranked by fused rating  User model: WIFS, group model: I-Spy
  • 155. Privacy related problems  Web information retrieval faces a challenge: the data required to perform evaluations, namely query logs and click-through data, is not readily available due to valid privacy concerns.  Researchers can:  Limit themselves to small (and sometimes biased) samples of users, restricting somewhat the conclusions that can be drawn.  Limit the usage of private data to local computation, exploiting personal data only to post-process search results.  Look for publicly available data that can be used to approximate query logs and click-through data (such as user bookmarks).
  • 156. Tag Data and Personalized Information Retrieval  Recently it has been shown that the information contained in social bookmarking (tagging) systems may be useful for improving Web search.  Using data from the social bookmarking site del.icio.us, it is possible to demonstrate how one can rate the quality of personalized retrieval results.  A user's “bookmark history” can be used to improve search results via personalization.  Analogously to studies involving implicit feedback mechanisms in IR, which have found that profiles based on the content of clicked URLs outperform those based on past queries alone, profiles based on the content of bookmarked URLs are generally superior to those based on tags alone.
  • 157. Tag Data and Personalized Information Retrieval  Social bookmarking systems such as del.icio.us and BibSonomy are a recent and popular phenomenon.  Users label interesting web pages (or research articles) with primarily short and unstructured natural-language annotations called tags.  These sites offer an alternative model for discovering information online.  Rather than following the traditional model of submitting queries to a Web search engine, users can browse tags as though they were directories, looking for popular pages that have been tagged by a number of different users. Since tags are chosen by users from an unrestricted vocabulary, these systems can be seen to provide consensus categorizations of interesting websites.
  • 158. Tag Data and Personalized Information Retrieval  How can social bookmarking data be used to improve Web search?  Can tag data be used to approximate actual user queries to a search engine?  How can personalized IR systems be evaluated using the information contained in social bookmarks (tag data)?  Is there enough information in (i.e., a strong enough correlation between) the tags/bookmarks in a user's history to build a profile of the user that will be useful for personalizing search engine results?
  • 159. Models for generating a profile of the user  We record the (time-ordered) stream of webpages that have been bookmarked by a particular user  The first, simple profile involves counting the occurrences of terms in the tags of any of the known bookmarks.  An obvious problem is that users often have multiple interests and their many bookmarks cover a range of topics. Thus some bookmarks may be completely unrelated to the nth bookmark (and thus to the tags being used as the current query).
  • 160.  The second source of information in the bookmarks is the content of the bookmarked pages themselves.  One would expect, given the much larger vocabulary of Web pages compared to tag data, that content may prove more useful than tags. Indeed, content-based profiles are more useful than query-based ones.  A user spends more time deliberating over which pages to bookmark than deciding which search results to click on.  Since a user will only bookmark sites that they find particularly useful or interesting, these documents should contain a lot of useful information about the user, and the content of bookmarked documents is particularly useful for personalization.
  • 161.  The previous profile is somewhat ad hoc in its decision about which documents to include and which not to include.  In theory, we would like to include all documents that the user has bookmarked, but weight them according to their expected usefulness for resolving ambiguity in the current query.  Our first attempt to estimate the distance between two bookmarks is to count the number of common terms in their respective sets of tags
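Counting common tags as a bookmark-to-bookmark distance can be sketched directly (the bookmark structure and URLs are illustrative, not the paper's data format):

```python
def tag_overlap(bookmark_a, bookmark_b):
    """Closeness of two bookmarks = number of tags they share."""
    return len(set(bookmark_a["tags"]) & set(bookmark_b["tags"]))

current = {"url": "http://example.org/ml", "tags": ["machine", "learning", "python"]}
old1 = {"url": "http://example.org/ai", "tags": ["machine", "learning"]}
old2 = {"url": "http://example.org/cooking", "tags": ["recipes"]}
assert tag_overlap(current, old1) == 2   # two shared tags -> related topic
assert tag_overlap(current, old2) == 0   # no shared tags -> unrelated
```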
  • 162. How do we use these profiles?  In order to incorporate the user profile into personalized information retrieval, queries are expanded with terms from the profile, weighting them appropriately.  The number of expansion terms added to the query is limited, so as to limit the amount of noise and the total length of the expanded query.  In particular, the K most frequent terms from the profile are added, and the weights are normalized to account for the missing terms.
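This expansion step can be sketched as follows; K, the profile term list, and the normalization scheme are illustrative simplifications of what a real system would use:

```python
from collections import Counter

def expand_query(query_terms, profile_terms, k=3):
    """Append the K most frequent profile terms to the query, with
    weights normalized to sum to 1 over the selected terms."""
    top = Counter(profile_terms).most_common(k)
    total = sum(count for _, count in top) or 1
    expansion = {term: count / total
                 for term, count in top if term not in query_terms}
    return list(query_terms), expansion

profile = ["python", "python", "search", "python", "search", "jazz"]
query, extra = expand_query(["indexing"], profile, k=2)
assert query == ["indexing"]
assert extra == {"python": 3 / 5, "search": 2 / 5}
```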
  • 163. 5.4 Recommendation systems
  • 164. Introduction to Recommender Systems  Systems for recommending items (e.g. books, movies, CDs, web pages, newsgroup messages) to users based on examples of their preferences.  Objectives:  To propose objects fitting the user's needs/wishes  To sell services (site visits) or goods  Many search engines and on-line stores provide recommendations (e.g. Amazon, CDNow).  Recommenders have been shown to substantially increase clicks (and sales).
  • 165. Book Recommender  (Diagram: a machine learning component takes the user's ratings of books such as Red Mars, Foundation, Jurassic Park, Lost World, 2001 and 2010, builds a user profile, and recommends, e.g., Neuromancer and Difference Engine)
  • 166. Personalization  Recommenders are instances of personalization software.  Personalization concerns adapting to the individual needs, interests, and preferences of each user.  Includes:  Recommending  Filtering  Predicting (e.g. form or calendar appt. completion)  From a business perspective, it is viewed as part of Customer Relationship Management (CRM).
  • 167. Machine Learning and Personalization  Machine Learning can allow learning a user model or profile of a particular user based on:  Sample interaction  Rated examples  Similar user profiles  This model or profile can then be used to:  Recommend items  Filter information  Predict behavior
  • 168. Types of recommendation systems 1.Search-based recommendations 2.Category-based recommendations 3.Collaborative filtering 4.Clustering 5.Association rules 6.Information filtering 7.Classifiers
  • 169. 1. Search-based recommendations  The visitor simply types a search query  « data mining customer »  The system retrieves all the items that correspond to that query  e.g. 6 books  The system recommends some of these books based on a general, non-personalized ranking (sales rank, popularity, etc.)
  • 170. Search-based recommendations  Pros:  Simple to implement  Cons:  Not very powerful  Which criteria should be used to rank recommendations?  Are these really « recommendations »?  The user only gets what he asked for
  • 171. 2. Category-based recommendations  Each item belongs to one or more categories.  Explicit / implicit choice:  The customer selects a category of interest (refine search, opt-in for category-based recommendations, etc.). – « Subjects > Computers & Internet > Databases > Data Storage & Management > Data Mining »  The system selects categories of interest on behalf of the customer, based on the current item viewed, past purchases, etc.  Certain items (bestsellers, new items) are eventually recommended
  • 172. Category-based recommendations  Pros:  Still simple to implement  Cons:  Again: not very powerful; which criteria should be used to order recommendations? are these really « recommendations »?  Effectiveness highly depends on the kind of categories implemented – Too specific: not efficient – Not specific enough: no relevant recommendations
  • 173. 3. Collaborative filtering  Collaborative filtering techniques « compare » customers, based on their previous purchases, to make recommendations to « similar » customers  It is also called « social » filtering  Follow these steps: 1. Find customers who are similar (« nearest neighbors ») in terms of tastes, preferences and past behavior 2. Aggregate the weighted preferences of these neighbors 3. Make recommendations based on these aggregated, weighted preferences (most preferred, unbought items)
  • 174. Collaborative filtering  Example: the system needs to make recommendations to customer C  (Purchase matrix over Books 1–6: B bought all the books C bought, plus Book 5; D shares one book with C and also bought Book 6; A and E bought none of C's books)  Customer B is very close to C (he has bought all the books C has bought): Book 5 is highly recommended  Customer D is somewhat close: Book 6 is recommended to a lesser extent  Customers A and E are not similar at all: weight = 0
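The neighbor-weighting logic of this example can be sketched in a few lines; the concrete purchase sets below are one assignment consistent with the slide's narrative, not the exact matrix:

```python
purchases = {                     # assumed purchase matrix (illustrative)
    "A": {"Book1", "Book4"},
    "B": {"Book2", "Book3", "Book5"},
    "C": {"Book2", "Book3"},
    "D": {"Book3", "Book6"},
    "E": {"Book1", "Book4"},
}

def recommend(target, purchases):
    """Score each unbought book by the purchase overlap between the
    target customer and every other customer (a crude neighbor weight)."""
    mine = purchases[target]
    scores = {}
    for other, theirs in purchases.items():
        if other == target:
            continue
        weight = len(mine & theirs)          # similarity = common purchases
        for book in theirs - mine:
            scores[book] = scores.get(book, 0) + weight
    return sorted(scores, key=scores.get, reverse=True)

ranked = recommend("C", purchases)
assert ranked[:2] == ["Book5", "Book6"]   # B's Book 5 first, then D's Book 6
```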
  • 175. Collaborative filtering  Pros:  Extremely powerful and efficient  Very relevant recommendations  (1) The bigger the database and (2) the more past behavior available, the better the recommendations  Cons:  Difficult to implement, resource- and time-consuming  What about a new item that has never been purchased? It cannot be recommended  What about a new customer who has never bought anything? He cannot be compared to other customers, so no items can be recommended
  • 176. 4. Clustering  Another way to make recommendations based on past purchases of other customers is to cluster customers into categories  Each cluster will be assigned « typical » preferences, based on preferences of customers who belong to the cluster  Customers within each cluster will receive recommendations computed at the cluster level
  • 177. Clustering  (Same purchase matrix over Books 1–6 as in the collaborative filtering example)  Customers B, C and D are « clustered » together; customers A and E are clustered into another, separate group  « Typical » preferences for the cluster are:  Book 2, very high  Book 3, high  Books 5 and 6, may be recommended  Books 1 and 4, not recommended at all
  • 178. Clustering  (The same purchase matrix, extended with a new Customer F)  How does it work?  Any customer classified as a member of the cluster will receive recommendations based on the preferences of the group:  Book 2 will be highly recommended to Customer F  Book 6 will also be recommended to some extent
  • 179. Clustering  Problem: customers may belong to more than one cluster; clusters may overlap  Predictions are then averaged across the clusters, weighted by participation  (Diagram: the purchase matrix shown twice, highlighting two overlapping clusters of customers)
  • 180. Clustering  Pros:  Clustering techniques work on aggregated data: faster  It can also be applied as a « first step » for shrinking the selection of relevant neighbors in a collaborative filtering algorithm  Cons:  Recommendations (per cluster) are less relevant than collaborative filtering (per individual)
  • 181. 5. Association rules  Clustering works at a group (cluster) level  Collaborative filtering works at the customer level  Association rules work at the item level
  • 182. Association rules  Past purchases are transformed into relationships of common purchases  (Table: the purchase matrix is turned into a symmetric « customers who bought X also bought Y » count matrix over Books 1–6; e.g., Books 3 and 5 were bought together by 2 customers)
  • 183. Association rules  These association rules are then used to make recommendations  If a visitor shows some interest in Book 5, he will be recommended Book 3 as well  Recommendations are constrained to some minimum level of confidence  What if recommendations can be made using more than one piece of information?  Recommendations are aggregated
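Building the « also bought » counts from purchase baskets can be sketched as follows (the baskets are illustrative, not the exact matrix above):

```python
from itertools import combinations
from collections import Counter

baskets = [{"Book1", "Book2"}, {"Book2", "Book3", "Book5"},
           {"Book2", "Book3"}, {"Book3", "Book5"},
           {"Book1", "Book4"}, {"Book5", "Book6"}]

# "Customers who bought X also bought Y" = co-purchase counts over baskets
also_bought = Counter()
for basket in baskets:
    for x, y in combinations(sorted(basket), 2):
        also_bought[(x, y)] += 1
        also_bought[(y, x)] += 1

assert also_bought[("Book3", "Book5")] == 2   # co-bought in two baskets
assert also_bought[("Book5", "Book3")] == 2   # the relation is symmetric
```

A recommender would then rank, for a viewed item X, the items Y with the highest co-purchase count, subject to a minimum confidence.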
  • 184. Association rules  Pros:  Fast to implement  Fast to execute  Not much storage space required  Not « individual »-specific  Very successful in broad applications for large populations, such as shelf layout in retail stores  Cons:  Not suitable if knowledge of preferences changes rapidly  It is tempting not to apply restrictive confidence rules, which may lead to literally stupid recommendations
  • 185. 6. Information filtering  Association rules compare items based on past purchases  Information filtering compares items based on their content  Also called « content-based filtering » or « content-based recommendations »  Can exploit syntactical information about objects (features)  But also semantic knowledge of objects (concepts/ontologies)
  • 186. Information filtering  What is the « content » of an item?  It can be explicit « attributes » or « characteristics » of the item. For example for a film:  Action / adventure  Feature Bruce Willis  Year 1995  It can also be « textual content » (title, description, table of content, etc.)  Several techniques exist to compute the distance between two textual documents
  • 187. Information filtering  How does it work?  A textual document is scanned and parsed  Word occurrences are counted (words may be stemmed)  Several words or « tokens » are not taken into account: rarely used words or « stop words »  Each document is transformed into a normed TF-IDF vector of size N (Term Frequency / Inverse Document Frequency)  The distance between any pair of vectors is computed
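The TF-IDF pipeline above can be sketched end to end (tokenization, stemming and stop-word removal are omitted, and the documents are illustrative):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF vectors."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: tf[t] * math.log(n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

docs = [["data", "mining", "crm"], ["data", "mining", "website"],
        ["marketing", "research"]]
vecs = tfidf_vectors(docs)
sim = lambda u, v: sum(u[t] * v.get(t, 0.0) for t in u)
# The two data-mining titles are closer to each other than to the marketing one
assert sim(vecs[0], vecs[1]) > sim(vecs[0], vecs[2])
```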
  • 188. Information filtering  An (unrealistic) example: how to compute recommendations between 8 books based only on their title?  Books selected:  Building data mining applications for CRM  Accelerating Customer Relationships: Using CRM and Relationship Technologies  Mastering Data Mining: The Art and Science of Customer Relationship Management  Data Mining Your Website  Introduction to marketing  Consumer behavior  marketing research, a handbook  Customer knowledge management
  • 189. (Table: term-by-title occurrence counts for the 8 book titles; rows are the title terms — a, accelerating, and, application, art, behavior, building, consumer, crm, customer, data, for, handbook, introduction, knowledge, management, marketing, mastering, mining, of, relationship, research, science, technology, the, to, using, website, your — columns are the 8 books, and cells are the number of occurrences of each term in each title)
  • 190. TFIDF Normed Vectors  (Table: the same term-by-title matrix after TF-IDF weighting and normalization to unit length; terms appearing in a single title, such as « handbook » or « behavior », receive high weights (0.537, 0.707), while terms shared across many titles, such as « data » and « mining », receive low weights (0.187–0.316))
  • 191. Information filtering  A customer is interested in the following book: « Building data mining applications for CRM »  The system computes the distances between this book and the 7 others  The « closest » books are recommended:  #1: Data Mining Your Website  #2: Accelerating Customer Relationships: Using CRM and Relationship Technologies  #3: Mastering Data Mining: The Art and Science of Customer Relationship Management  Not recommended: Introduction to marketing  Not recommended: Consumer behavior  Not recommended: marketing research, a handbook  Not recommended: Customer knowledge management
  • 192. Information filtering  Pros:  No need for past purchase history  Not extremely difficult to implement  Cons:  « Static » recommendations  Not efficient if content is not very informative, e.g. information filtering is more suited to recommending technical books than novels or movies
  • 193. 7. Classifiers  Classifiers are general computational models  They may take as inputs:  Vector of item features (action / adventure, Bruce Willis)  Preferences of customers (likes action / adventure)  Relations among items  They may give as outputs:  Classification  Rank  Preference estimate  The classifier can be a neural network, Bayesian network, rule induction model, etc.  The classifier is trained using a training set
  • 194. Classifiers  Pros:  Versatile  Can be combined with other methods to improve accuracy of recommendations  Cons:  Need a relevant training set
  • 195. Collaborative Filtering  Maintain a database of many users’ ratings of a variety of items.  For a given user, find other similar users whose ratings strongly correlate with the current user.  Recommend items rated highly by these similar users, but not rated by the current user.  Almost all existing commercial recommenders use this approach (e.g. Amazon).
  • 196. Collaborative Filtering  [Slide diagram: a database of many users' item ratings (items A…Z); the active user's ratings (e.g. A=9, B=3, Z=5) are matched by correlation against all users in the database, and recommendations are extracted from the best-matching users' ratings of items the active user has not rated.]
  • 197. Collaborative Filtering Method  Weight all users with respect to similarity with the active user.  Select a subset of the users (neighbors) to use as predictors.  Normalize ratings and compute a prediction from a weighted combination of the selected neighbors’ ratings.  Present items with highest predicted ratings as recommendations.
  • 198. Significance Weighting  Important not to trust correlations based on very few co-rated items.  Include significance weights, based on number of co-rated items.  If no items are rated by both users, correlation is not meaningful
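A common way to implement this is to scale the correlation linearly with the number of co-rated items up to a cutoff. A minimal sketch — the linear ramp and the cutoff of 50 are illustrative assumptions, not taken from the slides:

```python
def significance_weight(correlation, n_common, cutoff=50):
    """Devalue a correlation that is based on few co-rated items.

    Below the cutoff the weight scales linearly with the number of
    co-rated items; at or above it the correlation is used unchanged.
    """
    if n_common == 0:
        return 0.0  # no co-rated items: the correlation is meaningless
    return correlation * min(n_common, cutoff) / cutoff
```

For example, a perfect correlation based on only 25 co-rated items would be halved, while one based on 200 items would pass through untouched.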
  • 199. Neighbor Selection  For a given active user, a, select correlated users to serve as source of predictions.  Standard approach is to use the n most similar users, u, based on similarity weights, wa,u  Alternate approach is to include all users whose similarity weight is above a given threshold.
  • 200. Rating Prediction  Predict a rating, for each item, for active user, by using the n selected neighbor users  To account for users different ratings levels, base predictions on differences from a user’s average rating  To avoid bias from optimistic/pessimistic users  Weight users’ ratings contribution by their similarity to the active user.
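The last three slides combine into a short user-based prediction routine: Pearson-correlate the active user against the others, then predict from a similarity-weighted average of the neighbours' offsets from their own mean ratings. A minimal sketch with invented toy ratings:

```python
import math

def pearson(ra, rb):
    """Pearson correlation of two users' rating dicts over co-rated items."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0  # not enough co-rated items for a meaningful correlation
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    da = math.sqrt(sum((ra[i] - ma) ** 2 for i in common))
    db = math.sqrt(sum((rb[i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0

def predict(active, others, item):
    """Predict the active user's rating from neighbours' mean-offsets."""
    mean_a = sum(active.values()) / len(active)
    num = den = 0.0
    for ratings in others:
        if item not in ratings:
            continue
        w = pearson(active, ratings)
        mean_u = sum(ratings.values()) / len(ratings)
        num += w * (ratings[item] - mean_u)  # offset from the neighbour's mean
        den += abs(w)
    return mean_a + num / den if den else mean_a

active = {"A": 9, "B": 3, "C": 7}
others = [{"A": 9, "B": 3, "C": 8, "Z": 5},   # similar taste
          {"A": 2, "B": 8, "Z": 10}]          # opposite taste
```

Anchoring predictions to each user's own mean is what neutralizes the optimistic/pessimistic-rater bias mentioned on the slide.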
  • 201. Problems with Collaborative Filtering  Cold Start: There needs to be enough other users already in the system to find a match.  Sparsity: If there are many items to be recommended, even if there are many users, the user/ratings matrix is sparse, and it is hard to find users that have rated the same items.  First Rater: Cannot recommend an item that has not been previously rated.  New items  Esoteric/niche items  Popularity Bias: Cannot recommend items to someone with unique tastes.  Tends to recommend popular items.
  • 202. Content-Based Recommending  Recommendations are based on information on the content of items rather than on other users’ opinions.  Uses a machine learning algorithm to induce a profile of the user’s preferences from examples, based on a featural description of content.
  • 203. Advantages of Content-Based Approach  No need for data on other users.  No cold-start or sparsity problems.  Able to recommend to users with unique tastes.  Able to recommend new and unpopular items  No first-rater problem.  Can provide explanations of recommended items by listing content-features that caused an item to be recommended.
  • 204. Disadvantages of Content-Based Method  Requires content that can be encoded as meaningful features.  Users’ tastes must be represented as a learnable function of these content features.  Unable to exploit quality judgments of other users.  Unless these are somehow included in the content features.
  • 205. Example: a Book Recommending Agent  Content-based recommender for books using information about titles extracted from Amazon.  Uses information extraction from the web to organize text into fields:  Author  Title  Editorial Reviews  Customer Comments  Subject terms  Related authors  Related titles
  • 206. Overview  [Slide diagram: Amazon pages feed an information-extraction step that populates a database; rated examples from the database train a machine learner, producing a user profile; a predictor applies the profile to generate a ranked list of recommendations.]
  • 207. Sample Amazon Page
  • 208. Content Information  Extracted information is used to form “bags of words” for the following slots:  Author  Title  Description (reviews and comments)  Subjects  Related Titles  Related Authors
  • 209. User ratings  User rates selected titles on a 1 to 10 scale.  A text-categorization algorithm is used to learn a profile from these rated examples.  Rating 6–10: Positive  Rating 1–5: Negative  The learned profile is used to rank all other books as recommendations, based on the computed posterior probability that they are positive.  The user can also provide explicit positive/negative keywords, which are used as priors to bias the role of these features in categorization.
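One way such a text-categorization profile can be learned is with a naive Bayes classifier over the rated examples, scoring new books by the log-odds of the positive class. This is a minimal sketch with invented training data, not the actual system's implementation:

```python
import math
from collections import Counter

def train(examples):
    """examples: (tokens, rating) pairs; a rating of 6-10 counts as positive."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for tokens, rating in examples:
        label = rating >= 6
        counts[label].update(tokens)
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def score(model, tokens):
    """Log-odds of the positive class, with Laplace (add-one) smoothing."""
    counts, docs, vocab = model
    s = math.log((docs[True] + 1) / (docs[False] + 1))  # smoothed class prior
    tot = {c: sum(counts[c].values()) + len(vocab) for c in (True, False)}
    for t in tokens:
        s += math.log((counts[True][t] + 1) / tot[True])
        s -= math.log((counts[False][t] + 1) / tot[False])
    return s

# Invented rated examples: two positive (>= 6) titles, one negative.
profile = train([("data mining for crm".split(), 9),
                 ("customer relationship handbook".split(), 7),
                 ("romance novel collection".split(), 2)])
```

Ranking candidate books by this score, a data-mining title ends up above a romance title, which is the behaviour the slide describes.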
  • 210. Combining Content and Collaboration  Content-based and collaborative methods have complementary strengths and weaknesses.  Combine methods to obtain the best of both.  Various hybrid approaches:  Apply both methods and combine recommendations.  Use collaborative data as content.  Use content-based predictor as another collaborator.  Use content-based predictor to complete collaborative data.
  • 211. Combined Recommending – Metrics  Mean Absolute Error (MAE)  Compares numerical predictions with user ratings  ROC sensitivity  How well predictions help users select high-quality items  Ratings ≥ 4 considered “good”; < 4 considered “bad”
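Both metrics are simple to compute; a sketch (the ≥ 4 "good" threshold follows the slide, and "ROC sensitivity" is taken here as the true-positive rate at that threshold):

```python
def mae(predicted, actual):
    """Mean Absolute Error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def sensitivity(predicted, actual, threshold=4):
    """Fraction of truly 'good' items (rating >= threshold) that the
    recommender also predicts as good, i.e. the true-positive rate."""
    good = [(p, a) for p, a in zip(predicted, actual) if a >= threshold]
    hits = sum(1 for p, a in good if p >= threshold)
    return hits / len(good) if good else 0.0
```

MAE rewards numerically close predictions, while sensitivity only cares whether good items are surfaced, which is why the slides report both.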
  • 212. Combined Recommending – Typical experiment results I  [Bar chart: MAE of the CF, Content, Naïve, and CBCF algorithms, roughly in the 0.92–1.06 range.]  CBCF is significantly better than CF (4% improvement, p < 0.001)
  • 213. Combined Recommending – Typical experiment results II  [Bar chart: ROC-4 sensitivity of the CF, Content, Naïve, and CBCF algorithms, roughly in the 0.58–0.68 range.]  CBCF outperforms the rest (5% improvement over CF)
  • 214. Conclusions  Recommending and personalization are important approaches to combating information overload.  Machine learning is an important part of systems for these tasks.  Collaborative filtering has problems.  Content-based methods address these problems (but have problems of their own).  Advanced content-based filtering can leverage semantic models (ontologies).  Integrating both is best.
  • 215. Strategic Scenarios in Digital business
  • 216. Enterprise 3.0
  • 217. Agenda  Virtualisation – The Story so far  Future Directions for the Datacenter  What is Cloud Computing  Introduction to MapReduce  Introduction to Ajax  Cloud and Privacy Issues
  • 218. Virtualisation – The Story so far
  • 219. Virtualisation 221
  • 220. Virtualisation 222
  • 221. Virtualisation 223
  • 222. The Worldwide Server Market  [Slide charts, source: IDC Server Tracker.  2008 unit shipments: 8M total server units, of which 7.6M (95%) x86 servers.  2008 factory revenue: $54.4B total, of which $28.7B (53%) from x86 servers.]  30M x86 servers in data centers today; average server utilization = 5–10%
  • 223. Migration from Physical to Virtual Consolidation Management: migration from physical to virtual machines 225
  • 224. Scalability It is possible to move Virtual Machines, without interrupting the applications running inside 226
  • 225. Automatic Scalability  It is possible to automatically balance workloads according to set limits and guarantees
  • 226. High Availability  Servers and applications are protected against component and system failure
  • 227. Future Directions for the Datacenter
  • 228. Future Directions for the Datacenter  Virtual datacenter  Remote desktop  Cloud computing 232
  • 229. From Virtual Infrastructure to Virtual Datacenter  [Slide diagram: applications (Windows, Linux, future workloads) run on a virtual-datacenter layer providing application availability, security, scalability, and management, over on-premise infrastructure (compute, storage, network) federated with an off-premise cloud datacenter.]
  • 230. Desktops That Follow the User  [Slide diagram: the user's environment, applications, data, and OS are decoupled from the device via client virtualization, so the same desktop follows the user across thin clients, PCs, and mobile devices.]
  • 231. The Desktop Dilemma  IT needs to bring down costs, but…  Thick or thin?  Mobile or not?  Windows or Mac?  End users want their desktop to follow them:  want info to be accessible anywhere, anytime  don’t want info locked up in a single device  but don’t want to give up richness
  • 232. User Experience  [Slide diagram: desktop delivery across connectivity tiers — WAN (high latency, low bandwidth), LAN (high speed), and LOCAL (full 3D / multimedia) — delivered via client virtualization as a rich, portable “productive desktop” with an optimal media experience.]
  • 233. Centralized Management Provisioning  Deploy dozens of virtual desktops quickly and with lower storage requirements Image Updating  Update VDI desktops and virtualized laptops from a single master image Policy Enforcement  Centralized security policies of virtualized laptops 237
  • 234. Off-premise Cloud migration  Elastic capacity: migration from the local (on-premise) cloud to the off-premise cloud
  • 235. What is Cloud Computing
  • 236. The Problem Overwhelming complexity >70% of IT budgets just to keep the lights on <30% of IT budgets goes to innovation and competitive advantage 240
  • 237. The Goal IT Efficiency as a Service Control (Internally or Externally Provisioned) Choice 241
  • 238. What is Cloud Computing?  Coined in late 2007  A hot topic due to its ability to offer flexible dynamic IT infrastructures, QoS-guaranteed computing environments, and configurable software services  Different flavors ...  for application and IT users, it’s IT as a service (ITaaS) – that is, delivery of computing, storage, and applications over the Internet from centralized data centers  for Internet application developers, it’s an Internet-scale software development platform and runtime environment  for infrastructure providers and administrators, it’s the massive, distributed data center infrastructure connected by IP networks
  • 239. Why Is Cloud Computing Distinct?  User-centric interfaces  On-demand service provisioning  QoS guaranteed offer  Autonomous System  Scalability and flexibility 243
  • 240. Cloud computing ingredients A mix of ...  Large scale problems  Large data centers  Three-layer architecture  Highly-interactive applications  Programming model 244
  • 241. 1. Large Scale Problems  Characteristics:  Definitely data-intensive  May also be processing intensive  Examples:  Web 2.0 applications  Batch processing (e.g., billing cycles)  Data mining applications (e.g., campaign management)  Crawling, indexing, searching, mining the Web 245
  • 242. 2. Large Data Centers  Scale problems?  Throw more machines at it!  Clear trend:  centralization of computing resources in large data centers  Important Issues:  Redundancy  Efficiency  Utilization  Management 246
  • 243. 2. Large Data Centers 247 Maximilien Brice, © CERN
  • 244. 3. Three-layer architecture “Why do it yourself if you can pay someone to do it for you?”  Utility computing (IaaS)  Why buy machines when you can rent cycles?  Examples: Amazon‟s EC2, GoGrid, AppNexus  Platform as a Service (PaaS)  Give me nice API and take care of the implementation  Example: MapReduce, Hadoop  Software as a Service (SaaS)  Just run it for me!  Example: Gmail 248
  • 245. Example Products For Each Layer 249
  • 246. Hardware as a Service (HaaS)  Hardware as a Service was coined in 2006  As a result of rapid advances in hardware virtualization, IT automation, and usage metering and pricing, users can buy IT hardware, or even an entire data center, as a pay-as-you-go subscription service  HaaS is flexible, scalable, and manageable to meet users’ needs
  • 247. Software as a Service (SaaS)  Software or an application is hosted as a service and provided to customers across the Internet.  This mode eliminates the need to install and run the application on the customer’s local computers  An early example of SaaS is Salesforce.com
  • 248. Data as a Service (DaaS)  Data in various formats and from multiple sources could be accessed via services by users on the network.  Examples:  Amazon Simple Storage Service (S3): a simple Web services interface that can be used to store and retrieve data from anywhere on the Web  Google Docs 252
  • 249. Virtual Storage Provisioning  [Slide diagram: thin-provisioned virtual machine disks (e.g. 20GB, 40GB, and 100GB virtual disks on a shared datastore) consume only the amount of physical space actually in use.]  Significantly improves storage utilization  Eliminates the need to over-provision virtual disks  Reduces storage costs by up to 50%
  • 250. Infrastructure as a Service (IaaS)  Building on HaaS, SaaS, and DaaS, cloud computing can additionally deliver Infrastructure as a Service (IaaS)  Users can thus subscribe on demand to their preferred computing infrastructures, specifying hardware configuration, software installation, and data access requirements
  • 251. Relationship Between Services 255
  • 252. 4. Highly-interactive applications  What is the nature of software applications?  From the desktop to the browser  SaaS == Web-based applications  Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications?  AJAX (asynchronous JavaScript and XML)  “Front-end” of cloud computing 256
  • 253. 5. Programming model  Users bring data and applications into the computing cloud  Cloud programming models have been proposed for users  Examples:  MapReduce (by Google)  Hadoop (by Apache)  Back-end software technology for cloud computing  Batch-oriented processing of large data
  • 254. Public Clouds  Large scale infrastructure available on a rental basis  Operating System virtualization provides CPU isolation  “Roll-your-own” network provisioning provides network isolation  Locally specific storage abstractions  Fully customer self-service  Service Level Agreements (SLAs) are advertised  Requests are accepted and resources granted via web services  Customers access resources remotely via the Internet  Accountability is e-commerce based  Web-based transaction  “Pay-as-you-go” and flat-rate subscription 258
  • 255. Cloud Mythologies  Cloud computing infrastructure is just a web service interface to operating system virtualization.  “I’m running VMware in my data center – I’m running a private cloud.”  Cloud computing imposes a significant performance penalty over “bare metal” provisioning.  “I won’t be able to run a private cloud because my users will not tolerate the performance hit.”  Clouds and Grids are equivalent.  “In the mid 1990s, the term grid was coined to describe technologies that would allow consumers to obtain computing power on demand.”
  • 256. The Path to IT as a Service (ITaaS)  [Slide diagram: from today's datacenter to cloud computing — a trusted private cloud spanning internal and external clouds, with dynamic app loads and federation & choice driven by standards.]  Efficient • Reliable • Flexible • Secure • Dynamic
  • 257. Cloud Delivers CapEx and OpEx Savings  [Slide diagram: compute, storage, and network optimizations — thin provisioning and volume grow for storage; distributed virtual switches, including third-party ones, for networking; power management for compute.]  High consolidation ratios  Most efficient use of hardware resources  Low operational overhead
  • 258. Cloud Delivers Green IT and Optimization  Consolidates workloads onto fewer servers when the cluster needs fewer resources  Places unneeded servers in standby mode  Brings servers back online as workload needs increase  Minimizes power consumption while guaranteeing service levels, with no disruption or downtime to virtual machines
  • 259. Rolling Out a New Business Service  [Slide diagram: an application deployed to the internal cloud with required availability, security, and performance SLAs.]  Rolling out a service becomes a matter of specifying the required SLAs at the lowest TCO
  • 260. Future Proof IT…  Internal cloud: owned and operated by IT  External cloud: rented by IT  Unlock new market-based economies of scale, service, and innovation beyond what exists today
  • 261. Choice of Providers  Choice from a rich ecosystem of clouds:  High-end SLA clouds  Disaster Recovery as a Service  Test/Dev as a Service
  • 262. Challenges  ITaaS is a highly disruptive concept for enterprise users  Key business and technical challenges include cost, security, performance, business resiliency, interoperability, and data migration  According to IDC, IT spending on cloud services will reach US$42 billion by 2012 266
  • 263. Further Research Aspects  Collaboration applications  chat, instant messaging, Internet phone calling, ...  Application and data integration across clouds  leverage the available EAI, EII, and ESB technologies  Multimedia transmission and data mining  Transmitting bulky multimedia data across the network will continue to be a challenge, and it needs further research to speed up cloud computing  Service Management  Problems of discovering and composing services 267
  • 264. Introduction to MapReduce
  • 265. What is MapReduce?  It all boils down to…  Divide-and-conquer  Throwing more hardware at the problem
  • 266. Divide and Conquer  [Slide diagram: the “work” is partitioned into w1, w2, w3; workers produce partial results r1, r2, r3; the partial results are combined into the final “result”.]
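This partition / map / combine pipeline is exactly the shape of the classic MapReduce word count. A minimal single-process sketch using only Python's standard library (real MapReduce runs the map and reduce phases on distributed workers):

```python
from collections import Counter
from functools import reduce

def map_phase(chunk):
    """Map: emit per-word counts for one partition of the input."""
    return Counter(chunk.split())

def reduce_phase(left, right):
    """Reduce: merge two partial results into one."""
    left.update(right)
    return left

# "Work" partitioned into chunks, as in the w1/w2/w3 diagram.
work = ["the quick brown fox", "the lazy dog", "the fox"]
partials = [map_phase(w) for w in work]            # one map task per chunk
result = reduce(reduce_phase, partials, Counter()) # combine into the "result"
```

Because each map task touches only its own chunk and the merge is associative, the same code parallelizes cleanly across threads, cores, or machines — the point of the next slide.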
  • 267. Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different (virtual) machines in a distributed system 271
  • 268. On Amazon With EC2  [Slide diagram: you interact with your Hadoop cluster running on EC2.]  0. Allocate Hadoop cluster  1. Move data to cluster  2. Develop MapReduce application  3. Submit MapReduce application  4. Move data out of cluster
  • 269. Introduction to Ajax
  • 270. Ajax  Ajax refers to a number of technologies:  Asynchronous JavaScript and XML: An approach for building interactive Web applications From “old-school” Web applications to Ajax… 274
  • 271. “Old-School” Web Applications  [Slide diagram: (1) the user does something; (2) the browser sends a request to the web server; (3) the server-side systems (web server, backend, database) generate a web page in response to the request; (4) the response is returned; (5) the browser replaces its view with the data sent from the server.]
  • 272. “Old-School” characteristics  User-driven:  Things only happen when the user does something (e.g., clicks on a link or button)  Views defined by URLs:  You can bookmark something and come back to it; use the forward/backward button  Simple user interaction model:  Not that many things you can do in browser  Synchronous Interaction:  System responses are synchronized with user-driven events 276
  • 273. From “Old-School” to Ajax Ajax intermediates between the interface and the server. browser server-side systems request Interface Ajax Web backend “engine” server database response data interaction management management 277
  • 274. Ajax: Things to watch out for!  Hype  Application development/maintenance cost  Browser incompatibilities  Many different approaches and tools  For many things, lack of agreed-on best practices  Behavior is not “Web-like”  Standard things often don’t work correctly (e.g., browser “back” button, bookmarks)  Usability issues for users with disabilities  Security issues
  • 275. The next frontier? Interactive Web applications  [Slide diagram: the browser's interface talks through an Ajax “engine” via HTTP to server-side systems (Apache and MySQL for interaction management), backed by a Hadoop cluster (MapReduce, HDFS) for backend batch processing.]
  • 276. Facebook architecture  Caching servers: 15 million requests per second, 95% handled by memcache (15 TB of RAM).  Database layer: 800 eight-core Linux servers running MySQL (40 TB user data).
  • 277. Cloud and Privacy Issues
  • 278. Cloud Computing: Public Perception  “Use of Cloud Computing Applications and Services”  Study conducted by the Pew Research Center  April–May 2008: sample of 1,553 Internet users  Users may not know the term, but it’s already here…
  • 279. Cloud computing activities  Percent of Internet users who do the following (All / ages 18–29):  Use webmail services 56% / 77%  Store personal photos online 34% / 50%  Use online applications 29% / 39%  Store personal videos online 7% / 14%  Pay to store computer files online 5% / 9%  Back up hard drive to an online site 5% / 7%
  • 280. Reasons people use cloud applications (Major reason / Minor reason / Not a reason):  Easy and convenient 51% / 23% / 23%  Ubiquitous access 41% / 25% / 32%  Easily shared 39% / 28% / 29%  Won’t lose information 34% / 23% / 23%
  • 281. Cloud Computing Issues  Reliability  Security, Privacy, and Anonymity  Access and Usage 286
  • 282. Reliability  Accuracy of results?  Consequences of failure?  Who bears the risks?  Liability for losses?  Corruption of data? 287
  • 283. Security, Privacy, Anonymity  Need for protection, particularly sensitive information  Personal (photos, emails, medical information)  Corporate (client accounts, internal memos, etc.)  Scientific (experimental data, results, etc.)  Technology isn‟t enough! 288
  • 284. Security, Privacy, Anonymity  Spying by other users of the cloud service  Monitoring by the cloud provider  Quality control  Data mining  Government surveillance  Cross-national issues 289
  • 285. Access and Usage  Intellectual Property  Licenses  Export and data sharing prohibitions 290
  • 286. Who’s driving policy today?  Telecommunications policy and law  What is a cloud provider (legally speaking)? … closest to a telco  Contractual law  Often unilaterally dictated by the cloud provider
  • 288. “you acknowledge that you bear sole responsibility for adequate security, protection and backup of your content”
  • 288. Possible Futures  Governmental regulations?  Laws do not keep pace with technology  Let the market handle it?  Service differentiation  Danger of monopolies  Combination of the two? 293
  • 289. 7. BPM and impact on ICT
  • 290. Enterprise 3.0 – some reference readings  http://www.zoliblog.com/2007/08/13/enterprise-30-where-is-it- headed-interesting-panel-with-the-wrong-title/  http://blogs.zdnet.com/SAAS/?p=76&tag=rbxccnbzd1  http://www.fastforwardblog.com/2007/04/19/what-is-enterprise- 30-heres-a-good-definition/  http://www.readwriteweb.com/archives/enterprise_30.php  Extended enterprise:  How WebEx has done an excellent job capitalizing on the Extended Enterprise opportunity: http://www.sramanamitra.com/2007/02/21/saas-crm-the-extended- enterprise/  How Salesforce.com (Nasdaq:CRM) takes advantage of the Extended Enterprise trend.