Content Management, Metadata and Semantic Web

Keynote given at NetObjectDays conference, Erfurt, September 11, 2001.

One of the earliest keynotes discussing commercial semantic web technologies and semantic web applications (including semantic search, semantic targeting, and semantic content management). Prof. Sheth started a Semantic Web company, Taalee, Inc., in 1999 (its product was the MediaAnywhere A/V search engine). Taalee merged to become Voquette in 2001 (product called SCORE), then Semagix in 2004 (product called Semagix Freedom), and then Fortent in 2006 (products included Know Your Customers). Additional details can be found in U.S. Patent #6311194, 30 Oct. 2001 (filed 2000).

Note: the commercial system used the term "WorldModel" because, at the time, business customers were not yet warm to "Ontology"; the concept and intent are the same. More recent information is at http://knoesis.org

  • 04/29/10 Taalee Proprietary & Confidential. Do not copy or distribute.
  • Companies in the categorization field: Autonomy, Metacode (bought by Interwoven), Semio, Inxight, etc.
    • Typical strategies employed by the competition: statistical, AI, parsing, NLP, rules-based, collaborative filtering
    • Result: partial success in categorization
    • A document is placed in a node solely on the basis of the above strategies, without using the metadata that describes it (the basis of semantics)
    • Resulting classification: rigid/static/ambiguous/fuzzy
    • Captures only standard physical metadata (source, date, length, etc.), which is often useless for categorization
  • Taalee performs categorization by giving importance to semantic metadata extracted from any document
    • Strategies employed by Taalee: knowledge-based, statistical, rules-based, and AI techniques
    • Result: Complete success in categorization!
    • Precise category or categories identified for classifying a document
    • Resulting classification: flexible/dynamic/unambiguous/crisp
    • Value-added metadata generated to capture the context/gist of the document
    • Metadata => great potential for Automated Content Enrichment (ACE)
    • Classifying into or mapping to other taxonomies is possible
    • Promises to greatly enhance the current functioning of Content Manager and Syndication Software/Service
  • Why? What is its use?

Transcript

  • 1. Content Management, Metadata & Semantic Web
    • Keynote Address, Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001
    • Amit Sheth, CTO/SrVP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com]
    • Director, Large Scale Distributed Information Systems Lab, University Of Georgia (lsdis.cs.uga.edu) [email_address]
    Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.
  • 2. Agenda
    • What is Traditional Content Management
    • New Content Management Challenges faced by Enterprises
      • Semantic Content Management
    • Metadata
      • Metadata Descriptions and Standards
      • (Automated) Metadata Creation/Extraction/Tagging
      • Metadata Usage/Applications
    • Semantics (and Semantic Web)
      • Current and Future
  • 3. Traditional Content Management: Core Objectives and Features
    • Primary Objective: Effectively create, manage and publish internal content, with
    • Existing content creation applications (MS-Office, Notes) and provide some new capabilities (Speech to text)
    • (Basic, Syntactic) metadata
    • Workflow or lifecycle support (from author to Web publication or distribution)
    • Versioning and Rollback
    • (Keyword-based/Syntactical) Search and Personalization
    • Internal Distribution
    • Web publishing
    Content Creation and Editing; Content Management; Content Personalization and Services; Content Delivery
  • 4. Technology/Product Provider Landscape
    • Traditional Content Management Companies
      • Interwoven, Vignette, Broadvision, Enprise, Documentum, Open Market
    • Three of several upcoming companies focusing on metadata, semantics and/or semantic web
      • Applied Semantics, Voquette (Taalee), Ontoprise
      • See http://business.semanticweb.org for more
  • 5. Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)
    • “If a new bond comes into inventory, then we should get a message, an alert... and be able to refine to say that I only have California, Oregon and Washington clients...”
    • “In the month of July, I received 95 e-mails from my subscriptions. These e-mails included 61 that had 143 attachments that had 67 more attachments. In total therefore, I received almost 400 documents including 5 different types (HTML, PDF, Word, Rich Media, …). Even with this volume, I had subscribed to only 10 categories in the Equities area. There are a total of 26 Equity Subscription areas and a total of 166 categories to which a user can subscribe across all Product Areas.”
    Professional users of a traditional Content Management Product/Solution
  • 6. Enterprise Content Management – sample user requirements (from a large Financial Svcs Company)
    • The real question is, "Which sales ideas may have significant relevance to my book of business?" For example, an earnings warning on an equity rated Hold or Lower and not owned by any of my clients may not be of high relevance to me. Ideally, a relevance analysis would:
      • Greatly reduce the volume of Product Area Ideas sent to every FA, hopefully to perhaps 10% to 20% or less of today's volume, with ideas that are potentially actionable for that FA and his/her client
      • Result in FAs reading and evaluating the Product Area Ideas, taking appropriate actions, and generating sales because the Product Area Ideas would be relevant
      • Result in customer satisfaction because clients would understand FAs are paying attention to their needs and developing focused ideas
    Professional users of a traditional Content Management Product/Solution
  • 7. Enterprise Content Management – sample product requirements (from a large Financial Svcs Company)
    • “Content generation is a more complex and probably costly problem to solve ... we reportedly create about 9 million messages a month for field delivery. On average, this would mean 1,000 messages per month per ‘big user’ or perhaps only 500 to 600 per ‘little user’. … I strongly believe an analysis is in order of the nature and necessity of generated content, the establishment of content generation standards, the movement towards development and implementation of a relevance engine, …”
    Director (Product Management) of a large company that uses a leading Content Management Product
  • 8. New Enterprise Content Management Challenges
    • More variety and complexity
      • More formats (MPEG, PDF, MS Office, WM, Real, AVI, etc)
      • More types (Docs, Images -> Audio, Video, Variety of text-structured, unstructured)
      • More sources (internal, extranet, internet, feeds)
    • Information Overload
      • Too much data, precious little information (Relevance)
    • Creating Value from Content
      • How to Distribute the right content to the right people as needed? (Personalization -- book of business)
      • Customized delivery for different consumption options (mobile/desktop, devices)
      • Insight, Decision Making (Actionable)
  • 9. New Enterprise Content Management Technical Challenges
    • Aggregation
      • Feed handlers/Agents that understand content representation and media semantics
      • Push-pull, Web-DB-Files, Structured-Semi-structured-Unstructured data of different types
    • Homogenization and Enhancement
      • Enterprise-wide common view
        • Domain model, taxonomy/classification, metadata standards
      • Semantic Metadata– created automatically if possible
    • Semantic Applications
      • Search, personalization, directory, alerts, etc. using metadata and semantics (semantic association and correlation), for improved relevance, intelligent personalization, customization
  • 10. Creating and Serving Metadata to Power the Life-cycle of Content Applications
    • "A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
    • “Metadata increases content value in each step of content value chain.” - Amit Sheth
    • Produce, Aggregate: Where is the content? Whose is it?
    • Catalog/Index: What is this content about?
    • Integrate, Syndicate: What other content is it related to?
    • Personalize: What is the right content for this user?
    • Interactive Marketing: What is the best way to monetize this interaction?
    • Back End: Semantic Metadata
    • Delivery: Broadcast, Wireline, Wireless, Interactive TV
  • 11. A Metadata Classification
    • Data (Heterogeneous Types/Media)
    • Content Independent Metadata (creation-date, location, type-of-sensor...)
    • Content Dependent Metadata (size, max colors, rows, columns...)
    • Direct Content Based Metadata (inverted lists, document vectors, LSI)
    • Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
    • Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
    • Ontologies, Classifications, Domain Models
    • User
    • More Semantics for Relevance to tackle Information Overload!!
  • 12. Semantics
    • “meaning or relationship of meanings, or relating to meaning” (Webster)
    • is concerned with the relationship between the linguistic symbols and their meaning or real-world objects
    • meaning and use of data (Information System)
    • Example: Palm -> Company, Product, Technology, Tree Name, part of location (Palm Springs, Palm Beach)
    • Semantics, Ontologies (Domain Models), Metamodels, Metadata, Content/Data
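The "Palm" example above can be sketched as a small sense-selection routine. This is only an illustrative sketch, not the method used in any product named in the talk: the cue words per sense are hypothetical, the "Technology" sense is left out for brevity, and real systems would draw senses from an ontology or knowledge base.

```python
# Hypothetical sense inventory for the ambiguous term "Palm"; the cue words
# are illustrative assumptions, not taken from the talk.
SENSES = {
    "Company": {"shares", "earnings", "ceo", "stock"},
    "Product": {"handheld", "pilot", "device", "stylus"},
    "Tree Name": {"tree", "leaves", "coconut", "garden"},
    "Location": {"springs", "beach", "florida", "california"},
}


def disambiguate(term_context: str) -> str:
    """Pick the sense whose cue words overlap most with the context words."""
    words = set(term_context.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue word at all, refuse to guess.
    return best if scores[best] > 0 else "Unknown"
```

For instance, `disambiguate("Palm shares fell after earnings")` selects the company sense, while a context with no cue words yields "Unknown".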
  • 13. Semantics: The Next Step in the Web’s Evolution
    • “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)
    • A Content Management centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.
  • 14. Next Generation: Semantic Content Management
  • 15. Organizing Content
    • Different and Related Objectives: Search, Browse, Summarization, Association/Relationships
    • Indexing
    • Clustering
    • Classification
    • Controlled Vocabulary, Reference Data/ Dictionary/Thesaurus
    • Metadata
    • Knowledge Base (Entities/Objects and Relationships)
  • 16. Traditional Text Categorization
    • Most traditional Content Management Products support categorization of unstructured content
    • Inputs: Article Feed, Customer Training Set
    • Technique: Statistical/AI Techniques
    • Steps: Classify, Place in a taxonomy, Routing/Distribution
    • Output: Classification of Article 4715
    • Standard Metadata for Article 4715: Feed Source: iSyndicate; Posted Date: 11/20/2000
  • 17. Voquette/Taalee’s Categorization & Automatic Metadata Creation
    • Inputs: Article Feed, Customer Training Set & KB, Taalee Training Set & KB
    • Technique: Knowledge-base & Statistical/AI Techniques (Semantic Engine™, Metadata Catalog)
    • Steps: Classify, Place in a taxonomy, Map to another taxonomy, Routing/Distribution
    • Output: Classification of Article 4715; Precise Personalization/Syndication/Filtering; Automated Content Enrichment (ACE)
    • Taxonomy nodes shown: FTE (Company Analysis, Conference Calls, Earnings, Stock Analysis); ENT (Company Analysis, Conference Calls, Earnings, Stock Analysis); NYSE (Member Companies, Market News, IPOs)
    • Standard metadata for Article 4715: Feed Source: iSyndicate; Posted Date: 11/20/2000
    • Semantic metadata for Article 4715: Company Name: France Telecom, Equant; Ticker Symbol: FTE, ENT; Exchange: NYSE; Topic: Company News
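The difference between standard and semantic metadata for Article 4715 can be illustrated with a minimal sketch of knowledge-base lookup. It assumes a toy in-memory knowledge base and plain substring matching; the article text and the function name `enrich` are hypothetical, and this is not a description of how the Semantic Engine itself works.

```python
# Toy knowledge base; the entries mirror the Article 4715 example
# (company name -> ticker symbol and exchange).
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(article_text: str, standard_metadata: dict) -> dict:
    """Return the standard metadata plus semantic metadata found via the KB."""
    # Naive entity spotting: a company counts as mentioned if its name occurs.
    companies = [name for name in KNOWLEDGE_BASE if name in article_text]
    enriched = dict(standard_metadata)
    enriched["Company Name"] = companies
    enriched["Ticker Symbol"] = [KNOWLEDGE_BASE[c]["ticker"] for c in companies]
    enriched["Exchange"] = sorted({KNOWLEDGE_BASE[c]["exchange"] for c in companies})
    return enriched


# Hypothetical article text, used only to exercise the sketch.
article = "France Telecom agreed to a deal involving Equant, the companies said."
metadata = enrich(article, {"Feed Source": "iSyndicate", "Posted Date": "11/20/2000"})
```

The ticker symbols and exchange never appear in the text; they come from the knowledge base, which is the point of the enrichment step.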
  • 18. Technologies for Organizing Content
    • Information Retrieval/Document Indexing
    • TF-IDF/statistical, Clustering, LSI
    • Statistical learning/AI: Machine learning, Bayesian, Markov Chains, Neural Network
    • Lexical, Natural language
    • Thesaurus, Reference data, Domain models ( Ontology )
    • Information Extractors
    • Reasoning/Inferencing: Logic based, Knowledge-based, Rule processing and
    • The most powerful solutions combine several of these techniques to address more of the objectives
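Of the techniques listed above, TF-IDF is the simplest to show concretely. The sketch below uses one common variant (term frequency normalized by document length, natural-log inverse document frequency); the three sample documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for each term of each tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented sample documents.
docs = [
    "cisco earnings beat estimates".split(),
    "lucent earnings miss estimates".split(),
    "ravens beat steelers".split(),
]
scores = tf_idf(docs)
```

A term that appears in only one document ("cisco") receives a higher weight than one shared across documents ("earnings"), which is why the statistic is useful for indexing but says nothing about what the term means.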
  • 19. Multiple competing standards!
    • Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
    • Kansas State FGDC Metadata Model
      • Theme keywords: digital line graph, hydrography, transportation...
      • Title: Dakota Aquifer
      • Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
      • Direct Spatial Reference Method: Vector
      • Horizontal Coordinate System Definition: Universal Transverse Mercator
      • ...
    • UDK Metadata Model
      • Search terms: digital line graph, hydrography, transportation...
      • Topic: Dakota Aquifer
      • Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
      • Measuring Techniques: Vector
      • Co-ordinate System: Universal Transverse Mercator
      • ...
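The FGDC and UDK tag names above describe the same data, so one way to homogenize them is a per-model mapping onto a shared vocabulary. In this sketch the source tag names come from the slide, while the common field names (`keywords`, `title`, `url`, and so on) and the function `homogenize` are hypothetical.

```python
# Mapping from model-specific tag names to a common vocabulary. The common
# names on the right are assumptions; the source tags come from the slide.
TAG_MAP = {
    "FGDC": {
        "Theme keywords": "keywords",
        "Title": "title",
        "Online linkage": "url",
        "Direct Spatial Reference Method": "spatial_method",
        "Horizontal Coordinate System Definition": "coordinate_system",
    },
    "UDK": {
        "Search terms": "keywords",
        "Topic": "title",
        "Adress Id": "url",
        "Measuring Techniques": "spatial_method",
        "Co-ordinate System": "coordinate_system",
    },
}


def homogenize(model: str, record: dict) -> dict:
    """Rename a record's tags to the common vocabulary, dropping unknown tags."""
    mapping = TAG_MAP[model]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


fgdc = {"Title": "Dakota Aquifer", "Direct Spatial Reference Method": "Vector"}
udk = {"Topic": "Dakota Aquifer", "Measuring Techniques": "Vector"}
```

After mapping, both records compare equal, so downstream search and personalization can work against a single schema.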
  • 20. Basis for Semantics
    • A. Facts/Concepts/Terms/Entities
    • Dictionary, Thesaurus, Reference Data, Vocabulary
    • B. Facts with Relationships
    • Taxonomy/(Categories), Ontology
    • Domain Modeling (e.g., Golf = golfer, tournament name, golf course, event)
    • Knowledge Base
  • 21. Ontology
    • Standardizes meaning, description, representation of involved concepts/terms/attributes
    • Captures the semantics involved via domain characteristics, resulting in semantic metadata
    • “Ontological Commitment” forms basis for knowledge sharing and reuse
    • Ontology provides semantic underpinning.
  • 22. An Ontology
    • Hierarchies: Disaster; Natural Disaster, Man-made Disaster; Volcano, Earthquake, NuclearTest
    • Terms/Concepts (Attributes): eventDate, description, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield
    • Functional Dependencies (FDs): site => latitude, longitude
    • Domain Rules: bodyWaveMagnitude > 0, bodyWaveMagnitude < 10; magnitude > 0, magnitude < 10
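The four ingredients of the disaster ontology (hierarchy, attributes, a functional dependency, and domain rules) can be approximated with ordinary classes. This is a rough sketch under stated assumptions: the flattened slide does not say which attribute belongs to which concept, so the attribute placement below is a guess, and the site coordinates are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Disaster:
    # Attribute placement is an assumption; the slide only lists the names.
    event_date: str
    description: str
    site: str


@dataclass
class NaturalDisaster(Disaster):
    number_of_deaths: int = 0


@dataclass
class Earthquake(NaturalDisaster):
    magnitude: float = 1.0

    def __post_init__(self):
        # Domain rule from the slide: magnitude > 0 and magnitude < 10.
        if not 0 < self.magnitude < 10:
            raise ValueError(f"magnitude out of range: {self.magnitude}")


# Functional dependency site => latitude, longitude, kept as a lookup table.
# The entry is a placeholder, not real data.
SITE_COORDINATES = {"Example Site": (0.0, 0.0)}


def coordinates(disaster: Disaster) -> tuple:
    """Resolve latitude and longitude from the site, per the dependency."""
    return SITE_COORDINATES[disaster.site]
```

Constructing `Earthquake("1997-08-01", "example", "Example Site", magnitude=12)` raises `ValueError`, which is how a domain rule turns into a validation check on extracted metadata.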
  • 23. Controlled Vocabularies/ Classifications/Taxonomies/Ontologies
    • WordNet
    • Cyc
    • The Medical Subject Headings (MeSH): NLM's controlled vocabulary used for indexing articles, for cataloging books and other holdings, and for searching MeSH-indexed databases, including MEDLINE. MeSH terminology provides a consistent way to retrieve information that may use different terminology for the same concepts. Year 2000 MeSH includes more than 19,000 main headings, 110,000 Supplementary Concept Records (formerly Supplementary Chemical Records), and an entry vocabulary of over 300,000 terms.
  • 24. Open Directory Project (ODP): Classification/Taxonomy & Directory
  • 25. Metadata Specifications (MetaModels)
    • Domain Independent: Dublin Core, RDF, DAML+OIL
    • Frameworks/Infrastructures: XCM, XMI
    • Function Specific: ICE (Syndication)
    • Domain (Application) Specific: MARC (Library), FGDC and UDK (Geographic), PRISM (Publishing), FXML (Financial Transactions), RIXML (Buy-Sell Research/Financial Services), IMS Learning Resource (Distance Learning), NewsML (News exchange), …
    • Media Specific: MPEGx, VoiceXML
  • 26. Types of Specs and Standards (or MetaModels)
    • Domain Independent: (MCF), RDF, (MOF), DublinCore
    • Media Specific: MPEG4, MPEG7, VoiceXML
    • Domain/Industry Specific (metamodels): MARC (Library), FGDC and UDK (Geographic), NewsML (News), PRISM (Publishing), RIXML (Buy-Sell Research/Financial Services)
    • Application Specific: ICE (Syndication), IMS Learning Resource (Distance Learning)
    • Exchange/Sharing: XCM, XMI
    • Orthogonal/(Other): RDFS, namespaces, ontologies, domain models, (DAML, OIL)
  • 27. Dublin Core Metadata Initiative
    • Simple element set designed for resource description
    • International, inter-discipline, W3C community consensus
    • “Semantic” interface among resource description communities (very limited form of semantics)
    Source: www.desire.org
  • 28. Dublin Core RDF
    • <xml>
    • <?namespace href = "http://w3.org/rdf-schema" as = "RDF">
    • <?namespace href = "http://metadata.net/DC" as = "DC">
    • <RDF:Abbreviated>
    • <RDF:Assertion RDF:HREF = "http://www.mysite.com/mydoc.html"
    • DC:Title = "I've Never Metadata I've Never Liked"
    • DC:Creator = "Mary Crystal"
    • DC:Subject = "Metadata, Dublin Core, Stuff"/>
    • </RDF:Abbreviated>
    • </xml>
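The slide shows an early draft RDF syntax. As a hedged illustration, the same three Dublin Core statements can be generated in the later W3C RDF/XML form with Python's standard library. The namespace URIs below are the standard RDF and Dublin Core element-set namespaces rather than the ones on the slide, and the function name `dublin_core_rdf` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard namespaces (not the draft ones shown on the slide).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_rdf(resource_url: str, title: str, creator: str, subject: str) -> str:
    """Serialize three Dublin Core properties of a resource as RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_url}
    )
    for tag, value in (("title", title), ("creator", creator), ("subject", subject)):
        ET.SubElement(description, f"{{{DC_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = dublin_core_rdf(
    "http://www.mysite.com/mydoc.html",
    "I've Never Metadata I've Never Liked",
    "Mary Crystal",
    "Metadata, Dublin Core, Stuff",
)
```

The resulting string is well-formed XML that can be parsed back with `ET.fromstring`, unlike the draft fragment above.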
  • 29. NewsML
    • The content provider supplies NewsML-packaged media content to the operator. The content can be categorized as current events, finance, sport, etc. (but no standard is specified) and updated hourly.
    • The operator receives NewsML data from the content provider. The content server automatically pushes updated news articles to all news service subscribers.
    • Consumers sign up for the news service directly on the device. When using the news service, the user browses through the categories and reads the news articles.
    • The news articles are presented in a continuous flow (one after the other) without end-user interaction.
    Source: http://www.mediabricks.com
  • 30. NewsML
    • Content-descriptive metadata:
    • <HeadLine>Seattle attacked by Godzilla-like creature, Microsoft closes HQ</HeadLine>
    • <DateLine>Seattle, Was., Aug 30, 2009 /AthensWire via COMTEX/ --</DateLine>
    • <CopyrightLine>Copyright (C) 2009 AthensWire. All rights reserved.</CopyrightLine>
    • Administrative metadata:
    • <Provider><Party FormalName="Comtex" /></Provider>
    • <Source><Party FormalName="AthensWire" /></Source>
    • Rights metadata:
    • <CopyrightDate>2009</CopyrightDate>
    • Descriptive metadata:
    • <Language FormalName="en" />
    • <Property FormalName="Location" Value="Seattle, Washington, United States, North America" />
    • <Property FormalName="PublicCompany" Vocabulary="urn:newsml:comtexnews.net:20010201:DomesticPublicCompanies:1">
    • <Property FormalName="CompanyName" Value="Microsoft Corp." />
    • <Property FormalName="StockSymbol" Value="MSFT" />
    • <Property FormalName="StockExchange" Value="Nasdaq" />
    • </Property>
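To show how an application might consume the descriptive metadata above, here is a minimal sketch that reads the `Property` elements into a dictionary. The wrapper element and the fragment are assumptions built from the slide's example (the `Vocabulary` attribute is omitted), not a complete or validated NewsML document.

```python
import xml.etree.ElementTree as ET

# Fragment assembled from the slide's example; the wrapper element is an
# assumption added so that the fragment has a single root.
FRAGMENT = """
<DescriptiveMetadata>
  <Language FormalName="en" />
  <Property FormalName="Location" Value="Seattle, Washington, United States, North America" />
  <Property FormalName="PublicCompany">
    <Property FormalName="CompanyName" Value="Microsoft Corp." />
    <Property FormalName="StockSymbol" Value="MSFT" />
    <Property FormalName="StockExchange" Value="Nasdaq" />
  </Property>
</DescriptiveMetadata>
"""


def extract_properties(xml_text: str) -> dict:
    """Collect FormalName -> Value for every Property that carries a Value."""
    root = ET.fromstring(xml_text)
    return {
        prop.get("FormalName"): prop.get("Value")
        for prop in root.iter("Property")  # iter() also visits nested properties
        if prop.get("Value") is not None
    }


properties = extract_properties(FRAGMENT)
```

Grouping properties such as `PublicCompany` carry no value of their own, so only their nested leaves end up in the dictionary.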
  • 31. RIXML
    • Financial metadata for Buy/Sell sides
    • Highly domain-specific
    • Schema (see next slide) [from UserGuide, p. 31]
    • Example: MorningCall.xml
  • 32. RIXML Schema
  • 33. Metadata Creation and Semanticization
    • Automatic Content Classification/Categorization
    • Metadata Creation/Extraction: Types of metadata created
    Semantic Engine and WorldModel are trademarks of Taalee, Inc. Metadata Extraction is a patented technology of Taalee, Inc.
  • 34. Content Handling/Ingest
    • Infrastructure/Exchange
    • Feed Handlers
    • Crawlers/Screen Scrapers/Bots
    • Software Agents
    • Centralized, Distributed, or Mobile/Migratory
  • 35. Information Extraction for Metadata Creation
    • Key challenge: Create/extract as much (semantic) metadata automatically as possible
    • Sources: WWW, Enterprise Repositories; Nexis, UPI, AP; Feeds/Documents; Digital Maps; Digital Audios; Digital Videos; Digital Images; Data Stores
    • METADATA EXTRACTORS
  • 36. Extracting a Text Document: Syntactic approach
    • Sample document: INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT NATIONAL PREPAREDNESS LEVEL II CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection. SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues. CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop-up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning.
    • LAYOUT rule: Date => day month int ‘,’ int
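The layout rule `Date => day month int ',' int` maps naturally onto a regular expression. The following is a minimal sketch of such a syntactic extractor; the pattern, the function name `extract_date`, and the output field names are assumptions, and the rule deliberately knows nothing about what the date means in the report.

```python
import re

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Date => day month int ',' int
DATE_RULE = re.compile(
    rf"\b({DAYS})\s+({MONTHS})\s+(\d{{1,2}})\s*,\s*(\d{{4}})\b"
)


def extract_date(text: str):
    """Return the first date matching the layout rule, or None."""
    match = DATE_RULE.search(text)
    if match is None:
        return None
    day, month, day_of_month, year = match.groups()
    return {
        "day": day,
        "month": month,
        "day_of_month": int(day_of_month),
        "year": int(year),
    }


report = "INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT"
extracted = extract_date(report)
```

Applied to the report header, the rule picks out Friday, August 1, 1997; text without a matching layout yields `None`.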
  • 37. Extraction Agent Web Page Enhanced Metadata Asset Taalee Extraction and Knowledgebase Enhancement
  • 38. Automatic Categorization & Metadata Tagging (unstructured text/transcript of A/V)
    • Shown: Video Segment with Associated Text; Segment Description; Auto Categorization; Semantic Metadata
    • Transcript: ABSOLUTE CONTROL OF THE SENATE IS STILL IN QUESTION. AS OF TONIGHT, THE REPUBLICANS HAVE 50 SENATE SEATS AND THE DEMOCRATS 49. IN WASHINGTON STATE, THE SENATE RACE REMAINS TOO CLOSE TO CALL. IF THE DEMOCRATIC CHALLENGER UNSEATS THE REPUBLICAN INCUMBENT THE SENATE WILL BE EVENLY DIVIDED. IN MISSOURI, REPUBLICAN SENATOR JOHN ASHCROFT SAYS HE WILL NOT CHALLENGE HIS LOSS TO GOVERNOR MEL CARNAHAN WHO DIED IN A CRASH THREE WEEKS AGO. GOVERNOR CARNAHAN'S WIFE IS EXPECTED TO TAKE HIS PLACE. IN THE HIGHEST PROFILE SENATE EVENT OF THE NIGHT, HILLARY CLINTON WON THE NEW YORK SENATE SEAT. SHE IS THE FIRST FIRST LADY TO RUN MUCH LESS WIN.
  • 39. Automatic Categorization & Metadata Tagging (Web page): Video with Editorialized Text on the Web; Auto Categorization; Semantic Metadata
  • 40. Automatic Categorization & Metadata Tagging (Feed): Text from Bloomberg; Auto Categorization; Semantic Metadata
  • 41. Taalee Metadata on Football Assets: crawler-provided text for indexing vs. agent-provided semantic metadata
    • Rich Media Reference Page: Baltimore 31, Pit 24, http://www.nfl.com
      • Text: Quandry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
    • Taalee semantic metadata
      • League: Professional
      • Teams: Ravens, Steelers
      • Score: Bal 31, Pit 24
      • Players: Quandry Ismail, Tony Banks
      • Event: Touchdown
      • Produced by: NFL.com
      • Posted date: 2/02/2000
    • Metadata from Typical Cataloging of Football Assets (Virage search on “football touchdown”)
      • Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
      • Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
  • 42. One Approach to Extending Traditional CM: Voquette’s Semantic Engine Technology
    • Content sources (push/pull): Feeds (proprietary formats, standards-based, NewsML), Corporate Repositories, Web Sites
    • Aggregation & Metadata Extraction: Information Extraction Agents
    • Knowledge Management (Knowledge Base, Domain Model, Metadata): Dynamic KB, Custom WorldModel, Relevant Metadata Enhancement
    • Traditional Content Management: Content and Metadata
    • Front End: Agent, Portal
    • Voquette Semantic Applications: Search, Personalization, Alerts, Notifications, Custom “research” applications
  • 43. Taalee/Voquette Semantic Platform Architecture
    • Content of all format, media, push/pull:
    • Web sites/pages: static, dynamic
    • Content Feeds (unstructured, semistructured/docs, tagged/XML)
    • Corporate Repositories/databases
    • Homogenization/integration:
    • with taxonomy (categorization)
    • contextually relevant metadata
    • with respect to the domain model, automatically generated from content and inferenced
    © Taalee Inc.
  • 44. Semantic Content for the End-User
    • Extractor Agents: content which does contain the words the user asked for
    • + Value-added Metadata: content which does not contain the words the user asked for, but is about what he asked for
    • + Semantic Associations: content the user did not think to ask for, but which he needs to know
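The three kinds of content on this slide (keyword hits, hits through value-added metadata, and hits through semantic associations) can be sketched as a three-tier search. Everything in the example is illustrative: the `Asset` class, the `search` function, the sample assets, and the single association between two teams are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    title: str
    text: str
    metadata: dict = field(default_factory=dict)


def search(query: str, assets: list, related: dict) -> dict:
    """Split results into keyword, metadata, and association tiers."""
    q = query.lower()
    # Tier 1: the words the user asked for appear in the content itself.
    keyword = [a for a in assets if q in a.text.lower()]
    # Tier 2: the words are absent from the text but present in metadata.
    by_metadata = [
        a for a in assets
        if a not in keyword and q in (str(v).lower() for v in a.metadata.values())
    ]
    # Tier 3: content about entities associated with the query.
    neighbours = {name.lower() for name in related.get(query, [])}
    associated = [
        a for a in assets
        if a not in keyword and a not in by_metadata
        and any(str(v).lower() in neighbours for v in a.metadata.values())
    ]
    return {"keyword": keyword, "metadata": by_metadata, "associated": associated}


# Illustrative assets and one illustrative association.
assets = [
    Asset("Falcons win", "The Falcons won on Sunday."),
    Asset("Anderson touchdown", "Jamal Anderson ran for a touchdown.",
          {"Team": "Falcons", "League": "NFL"}),
    Asset("Seahawks preview", "Seattle prepares for the playoffs.",
          {"Team": "Seahawks", "League": "NFL"}),
    Asset("Stock report", "Cisco shares rose.", {"Company": "Cisco"}),
]
results = search("Falcons", assets, {"Falcons": ["Seahawks"]})
```

A plain keyword engine would return only the first asset; the other two tiers depend entirely on metadata that was added during extraction and enrichment.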
  • 45. Metadata and Semantic Technology enabled Applications
  • 46. Taalee’s Semantic Search
    • Highly customizable, precise and freshest A/V search
    • Context and Domain Specific Attributes
    • Uniform Metadata for Content from Multiple Sources; can be sorted by any field
    • Delightful, relevant information, exceptional targeting opportunity
  • 47. Creating a Web of related information What can a context do?
  • 48. Example (test on http://directory.mediaanywhere.com)
    • Search for company ‘Commerce One’
    • Links to news on companies that compete against Commerce One
    • Links to news on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba)
    • Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically
  • 49. What else can a context do? (a commercial perspective) Semantic Enrichment Semantic Targeting
  • 50. Semantic/Interactive Targeting
    • Precisely targeted through the use of Structured Metadata and integration from multiple sources
    • Targeted links: Buy Al Pacino Videos, Buy Russell Crowe Videos, Buy Christopher Plummer Videos, Buy Diane Venora Videos, Buy Philip Baker Hall Videos, Buy The Insider Video
  • 51. Example 1 – Snapshots (“Jamal Anderson”)
    • Search for ‘Jamal Anderson’ in ‘Football’
    • Click on first result for Jamal Anderson
    • View metadata. Note that Team name and League name are also included in the metadata
    • View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
  • 52. Example 2 – Snapshots (“Gary Sheffield”)
    • Search for ‘Gary Sheffield’ in ‘Baseball’
    • Click on first result for Gary Sheffield
    • View metadata. Note that Team name and League name are also included in the metadata
    • View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
  • 53. Semantic Web – Intelligent Content (supported by Taalee Semantic Engine)
    • Intelligent Content = What you asked for + What you need to know!
    • Centered on a COMPANY, with related: Stock News, Industry News, Technology, Products, Competition, SEC, EPA Regulations
    • Relationship labels: COMPANIES in Same or Related INDUSTRY; COMPANIES in INDUSTRY with Competing PRODUCTS; Impacting INDUSTRY or Filed By COMPANY; Important to INDUSTRY or COMPANY
  • 54. Semantic Application – Equity Dashboard
    • Focused relevant content organized by topic (semantic categorization)
    • Automatic Content Aggregation from multiple content providers and feeds
    • Related news not specifically asked for (Semantic Associations)
    • Competitive research inferred automatically
    • Automatic 3rd party content integration
  • 55. Metadata centric Content Management Architecture
    • Sources: Internal Source 1, Research, Internal Source 2, External feeds/Web (e.g. Reuters); Extractor Agent 1, Extractor Agent 2, Extractor Agent 3
    • Components: Voquette Metabase, World Model, Semantic Engine, Third-party Content Mgmt and Syndication, Semantic Application (ASP/Enterprise hosted)
    • Metadata exchange: XCM-compliant metadata, XML or other format
    • Flow:
      • 1. Cisco story from Source 1 passed on to add semantic associations
      • 2. Consults Knowledge Base for Cisco’s competition
      • 3. Returns result: Lucent is a competitor of Cisco
      • 4. Lucent story from external feeds picked for publishing as “semantically related” to Cisco story and passed on to Dashboard
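The four-step Cisco/Lucent flow can be traced in a few lines. This sketch assumes a knowledge base reduced to a single competitor relation and stories represented as plain dictionaries; the names `COMPETITORS` and `related_stories` and the IBM story are hypothetical additions for illustration.

```python
# Knowledge base of competitor relationships; the single fact comes from the
# slide ("Lucent is a competitor of Cisco").
COMPETITORS = {"Cisco": ["Lucent"]}


def related_stories(story: dict, feed: list) -> list:
    """Steps 2-4: look up competitors, then pick feed stories about them."""
    rivals = set(COMPETITORS.get(story["company"], []))
    return [item for item in feed if item["company"] in rivals]


# Step 1: a Cisco story arrives from an internal source.
cisco_story = {"company": "Cisco", "headline": "Story on Cisco",
               "source": "Internal Source 1"}
external_feed = [
    {"company": "Lucent", "headline": "Story on Lucent", "source": "External feed"},
    {"company": "IBM", "headline": "Story on IBM", "source": "External feed"},
]
# The dashboard shows the requested story plus semantically related ones.
dashboard = [cisco_story] + related_stories(cisco_story, external_feed)
```

The Lucent story is selected although neither story mentions the other company; the link exists only in the knowledge base.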
  • 56. Wireless Application of Semantic Metadata and Automatic Content Enrichment
    • Clicking on the link for Cisco Analyst Calls displays a listing sorted by date.
    • Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., Analyst Call focuses on the source and analyst name or company.
    • The icons denote additional metadata, such as “Strong Buy” by H&Q Analyst.
    • Menus shown: MyStocks, News, Sports, Music, MyMedia; My Stocks: CSCO, NT, IBM, Market; CSCO: Analyst Call, Conf Call, Earnings; CSCO Analysis: 11/08 ON24 Payne, 11/07 ON24 H&Q, 11/06 CBS Langlesis
  • 57. Metadata’s role in emerging iTV infrastructure
    • Pipeline: MPEG Encoder, Create Scene Description Tree, MPEG-2/4/7 Enhanced Digital Cable Video, MPEG Decoder, Retrieve Scene Description Track
    • Scene Description Tree: Node = AVO Object; “NSF Playoff” Node with Enhanced XML Description from the Voquette/Taalee Semantic Engine
    • Object Content Information (OCI) for the metadata-rich, value-added “NSF Playoff” node:
      • Produced by: Fox Sports
      • Creation Date: 12/05/2000
      • League: NFL
      • Teams: Seattle Seahawks, Atlanta Falcons
      • Players: John Kitna
      • Coaches: Mike Holmgren, Dan Reeves
      • Location: Atlanta
    • GREAT USER EXPERIENCE
    • Channel sales through Video Server Vendors, Video App Servers, and Broadcasters
    • License metadata decoder and semantic applications to device makers
  • 58. Metadata for Automatic Content Enrichment: Interactive Television
    • This segment has embedded or referenced metadata that is used by a personalization application to show only the stocks that the user is interested in.
    • This screen is customizable with an interactivity feature using metadata, such as whether there is a new Conference Call video on CSCO.
    • Part of the screen can be automatically customized to show conference-call-specific information, including transcript, participation, etc., all of which are relevant metadata.
    • The Conference Call itself can have embedded metadata to support personalization and interactivity.
  • 59. Semantic Technology Features
    • Unstructured Text Content
    • Semi-Structured Content
    • Structured Content
    • Audio/Video Content with associated text (transcript, journalist notes)
    • Create a Customized "World Model" (Taxonomy Tree with customized domain attributes)
    • Automatically homogenize content feed tags
    • Automatically categorize unstructured text
    • Automatically create tags based on text itself
    • Create and maintain a Customized Knowledge Base for any domain
    • Automatically enhance content tags based on information beyond text
    • Build contextually relevant custom research applications
    • Contextual Search (an order of magnitude better than keyword-based search)
    • Support push or pull delivery/ingestion of content
    • Personalization/Alerts/Notifications
    • Real Time Indexing (stories indexed for search/personalization within a minute)
    • Provide the user with relevant information not explicitly asked for (Semantic Associations)
  • 60. Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.
  • 61. Resources/References
    • RDF: www.w3.org/TR/REC-rdf-syntax/
    • ICE: www.icestandard.org
    • Meta Object Facility (MOF) Specification, Version 1.3, September 27, 1999: http://cgi.omg.org/cgi-bin/doc?ad/99-09-05
    • XML Metadata Interchange (XMI) Specification, Version 1.1, October 25, 1999: http://cgi.omg.org/cgi-bin/doc?ad/9910-02 http://cgi.omg.org/cgi-bin/doc?ad/99-10-03
    • DAML: www.daml.org
    • NEWSML: newsshowcase.reuters.com
    • PRISM: www.prismstandard.org/techdev/prismspec1.asp
    • RIXML: www.rixml.org
    • XCM: www.vignette.com
    • OIL: www.ontoknowledge.org/oil
    • SEMANTICWEB: www.semanticweb.org , business.semanticweb.org
    • VOICEXML: www.voicexml.org
    • MPEG7: www.darmstadt.gmd.de/mobile/MPEG7/
    • Taalee: www.taalee.com
    • Applied Semantics: www.appliedsemantics.com
    • Ontoprise: www.ontoprise.com
  • 62.
    • Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media, Amit Sheth & Wolfgang Klas, Eds., McGraw Hill, ISBN: 0-07-057735-8, 1998.
    • Information Brokering, Vipul Kashyap & Amit Sheth, Kluwer Academic Publishers, 2001.
    • Voquette Semantic Technology White Paper.
    • Mysteries of Metadata, Speaker – Amit Sheth, Workshop at Content World 2001.
    • Infoquilt Project, LSDIS lab.
    • http://www.taalee.com
    • http://lsdis.cs.uga.edu/~amit