Content Management, Metadata and Semantic Web

Amit Sheth, Founding Director, Artificial Intelligence Institute at University of South Carolina
Content Management, Metadata & Semantic Web
Keynote Address, Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001
Amit Sheth
CTO/Sr. VP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com]
Director, Large Scale Distributed Information Systems Lab, University of Georgia (lsdis.cs.uga.edu)
Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.
Agenda
Traditional Content Management: Core Objectives and Features. Content Creation and Edition; Content Management; Content Personalization and Services; Content Delivery
Technology/Product Provider Landscape
Enterprise Content Management – sample user requirements (from a large Financial Svcs Company). Professional users of a traditional Content Management Product/Solution
Enterprise Content Management – sample product requirements (from a large Financial Svcs Company). Director (Product Management) of a large company that uses a leading Content Management Product
New Enterprise Content Management Challenges
New Enterprise Content Management Technical Challenges
Creating and Serving Metadata to Power the Life-cycle of Content Applications
"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
“Metadata increases content value in each step of content value chain.” - Amit Sheth
Back End: Semantic Metadata
Produce, Aggregate: Where is the content? Whose is it?
Catalog/Index: What is this content about?
Integrate, Syndicate: What other content is it related to?
Personalize: What is the right content for this user?
Interactive Marketing: What is the best way to monetize this interaction?
Delivery: Broadcast, Wireline, Wireless, Interactive TV
A Metadata Classification
Data (Heterogeneous Types/Media)
Content Independent Metadata (creation-date, location, type-of-sensor...)
Content Dependent Metadata (size, max colors, rows, columns...)
Direct Content Based Metadata (inverted lists, document vectors, LSI)
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
Domain Specific Metadata (area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies)
Ontologies, Classifications, Domain Models
User
More Semantics for Relevance to tackle Information Overload!!
Semantics
Semantics: The Next Step in the Web’s Evolution. “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it. . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999) A Content Management centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.
Next Generation: Semantic Content Management
Organizing Content
Traditional Text Categorization. Most traditional Content Management products support categorization of unstructured content. Flow: customer feed -> Article 4715 -> classify (Statistical/AI techniques, customer training set) -> place in a taxonomy -> routing/distribution. Standard metadata for Article 4715: Feed Source: iSyndicate; Posted Date: 11/20/2000
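The statistical classification step on this slide can be illustrated with a minimal sketch. It assumes a naive Bayes classifier, which is only one of many statistical techniques and is not named in the talk; the training sentences and the category names used as labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train(training_set):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in training_set:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest naive Bayes log-score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical customer training set; texts and labels are illustrative only.
TRAINING_SET = [
    ("quarterly earnings beat analyst estimates", "Earnings"),
    ("company reports earnings and revenue growth", "Earnings"),
    ("shares begin trading after initial public offering", "IPOs"),
    ("startup files for initial public offering", "IPOs"),
]

model = train(TRAINING_SET)
label = classify("telecom company posts strong quarterly earnings", *model)
print(label)
```

A classifier of this kind yields only a category label; it does not produce the company, ticker, or exchange attributes discussed on the following slide.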
Voquette/Taalee’s Categorization & Automatic Metadata Creation: Automated Content Enrichment (ACE). Flow: feed -> Article 4715 -> classify (Knowledge-base & Statistical/AI techniques; customer training set & KB, Taalee training set & KB) -> place in a taxonomy, map to another taxonomy -> Semantic Engine™ metadata catalog -> routing/distribution with precise personalization/syndication/filtering. Taxonomy excerpt: FTE and ENT (each with Company Analysis, Conference Calls, Earnings, Stock Analysis); NYSE (Member Companies, Market News, IPOs). Article 4715 metadata: Standard metadata: Feed Source: iSyndicate; Posted Date: 11/20/2000. Semantic metadata: Company Name: France Telecom, Equant; Ticker Symbol: FTE, ENT; Exchange: NYSE; Topic: Company News
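The difference between standard and semantic metadata on this slide can be sketched as a knowledge-base lookup. This is an illustrative assumption, not Taalee's actual Semantic Engine: the `enrich` function, the substring matching, and the sample article text are hypothetical, while the company, ticker, exchange, and topic values are the ones shown for Article 4715.

```python
# Tiny stand-in for a knowledge base; the entries mirror the slide's example.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(text, standard_metadata, topic, kb=KNOWLEDGE_BASE):
    """Add semantic metadata for every known company mentioned in the text.

    Matching is a naive substring test; a real extractor would need
    entity disambiguation, which is out of scope for this sketch.
    """
    companies = [name for name in kb if name in text]
    semantic_metadata = {
        "Company Name": companies,
        "Ticker Symbol": [kb[c]["ticker"] for c in companies],
        "Exchange": sorted({kb[c]["exchange"] for c in companies}),
        "Topic": topic,
    }
    return {**standard_metadata, **semantic_metadata}


# Hypothetical article text; only the metadata values come from the slide.
sample_text = "France Telecom and Equant announced a deal today."
enriched = enrich(
    sample_text,
    {"Feed Source": "iSyndicate", "Posted Date": "11/20/2000"},
    topic="Company News",
)
for field, value in enriched.items():
    print(f"{field}: {value}")
```

The ticker and exchange never appear in the sample text; they come from the knowledge base, which is the point of the enrichment step.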
Technologies for Organizing Content
Multiple competing standards! Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain.
Kansas State FGDC Metadata Model: Theme keywords: digital line graph, hydrography, transportation...; Title: Dakota Aquifer; Online linkage: http://gisdasc.kgs.ukans.edu/dasc/; Direct Spatial Reference Method: Vector; Horizontal Coordinate System Definition: Universal Transverse Mercator; ...
UDK Metadata Model: Search terms: digital line graph, hydrography, transportation...; Topic: Dakota Aquifer; Adress Id: http://gisdasc.kgs.ukans.edu/dasc/; Measuring Techniques: Vector; Co-ordinate System: Universal Transverse Mercator; ...
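One way to cope with different tag names for the same data is a per-model mapping onto shared field names. The sketch below assumes such a mapping; the common field names (`keywords`, `title`, `url`, `spatial_method`, `coordinate_system`) and the `normalize` helper are hypothetical, while the FGDC and UDK tag names and values are taken from the slide.

```python
# Map each model's tag names onto shared (hypothetical) field names.
TAG_MAPS = {
    "FGDC": {
        "Theme keywords": "keywords",
        "Title": "title",
        "Online linkage": "url",
        "Direct Spatial Reference Method": "spatial_method",
        "Horizontal Coordinate System Definition": "coordinate_system",
    },
    "UDK": {
        "Search terms": "keywords",
        "Topic": "title",
        "Adress Id": "url",  # tag spelled as on the slide
        "Measuring Techniques": "spatial_method",
        "Co-ordinate System": "coordinate_system",
    },
}


def normalize(record, model):
    """Rename a record's tags to the shared field names, dropping unknown tags."""
    mapping = TAG_MAPS[model]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


FGDC_RECORD = {
    "Theme keywords": ["digital line graph", "hydrography", "transportation"],
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
UDK_RECORD = {
    "Search terms": ["digital line graph", "hydrography", "transportation"],
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

same = normalize(FGDC_RECORD, "FGDC") == normalize(UDK_RECORD, "UDK")
print(same)
```

A flat rename table handles only naming differences; models that differ in structure or in the meaning of a field need the ontology-based approach covered later in the talk.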
Basis for Semantics
Ontology
An Ontology
Terms/Concepts (Attributes): Disaster (eventDate, description, site, latitude, longitude); damage, numberOfDeaths, damagePhoto; magnitude, bodyWaveMagnitude, conductedBy, explosiveYield
Hierarchies: Disaster > Natural Disaster, Man-made Disaster; Natural Disaster > Volcano, Earthquake; Man-made Disaster > NuclearTest
Functional Dependencies (FDs): site => latitude, longitude
Domain Rules: magnitude > 0, magnitude < 10; bodyWaveMagnitude > 0, bodyWaveMagnitude < 10
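The three ingredients of this ontology (hierarchies, domain rules, functional dependencies) can be sketched as plain data plus small checks. This is an assumed encoding, not a standard ontology language; the parent links are inferred from the slide's layout, and the helper names `is_a`, `violations`, and `satisfies_site_fd` are hypothetical.

```python
# Concept hierarchy as child -> parent links (inferred from the slide).
PARENT = {
    "NaturalDisaster": "Disaster",
    "ManMadeDisaster": "Disaster",
    "Volcano": "NaturalDisaster",
    "Earthquake": "NaturalDisaster",
    "NuclearTest": "ManMadeDisaster",
}

# Domain rules from the slide: both magnitudes lie strictly between 0 and 10.
DOMAIN_RULES = {
    "magnitude": lambda value: 0 < value < 10,
    "bodyWaveMagnitude": lambda value: 0 < value < 10,
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def violations(record):
    """List the attributes of a record that break a domain rule."""
    return [attr for attr, rule in DOMAIN_RULES.items()
            if attr in record and not rule(record[attr])]


def satisfies_site_fd(records):
    """Check the FD site => latitude, longitude across a set of records."""
    seen = {}
    for record in records:
        coords = (record["latitude"], record["longitude"])
        if seen.setdefault(record["site"], coords) != coords:
            return False
    return True


print(is_a("Earthquake", "Disaster"))
print(violations({"magnitude": 12.0, "bodyWaveMagnitude": 5.1}))
```

Checks like these let an extractor reject implausible values (a magnitude of 12, two coordinates for one site) before they reach the metadata catalog.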
Controlled Vocabularies/Classifications/Taxonomies/Ontologies
Open Directory Project (ODP): Classification/Taxonomy & Directory
Metadata Specifications (MetaModels)
Domain Independent: Dublin Core, RDF, DAML+OIL
Frameworks/Infrastructures: XCM, XMI
Function Specific: ICE (Syndication)
Domain (Application) Specific: MARC (Library), FGDC and UDK (Geographic), PRISM (Publishing), FXML (Financial Transactions), RIXML (Buy-Sell Research/Financial Services), IMS Learning Resource (Distance Learning), ...
Media Specific: MPEGx, VoiceXML
NewsML (News exchange)
Types of Specs and Standards (or MetaModels)
Dublin Core Metadata Initiative (Source: www.desire.org)
Dublin Core RDF
NewsML (Source: http://www.mediabricks.com). The content provider supplies NewsML-packaged media content to the operator. The content can be categorized as current events, finance, sport, etc. (but no standard is specified) and updated hourly. The operator receives NewsML data from the content provider. The content server automatically pushes updated news articles to all news service subscribers. Consumers sign up for the news service directly on the device. When using the news service, the user browses through the categories and reads the news articles. The news articles are presented in a continuous flow (one after the other) without end-user interaction.
NewsML
RIXML
RIXML Schema
Metadata Creation and Semanticization. Semantic Engine and WorldModel are trademarks of Taalee, Inc. Metadata Extraction is a patented technology of Taalee, Inc.
Content Handling/Ingest
Information Extraction for Metadata Creation. Key challenge: create/extract as much (semantic) metadata automatically as possible. Sources feeding the METADATA EXTRACTORS: WWW, enterprise repositories, digital maps, Nexis, UPI, AP, feeds/documents, digital audio, data stores, digital videos, digital images, ...
Extracting a Text Document: Syntactic approach. Sample document: INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT NATIONAL PREPAREDNESS LEVEL II CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection. SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues. CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning. LAYOUT: Date => day month int ‘,’ int
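The layout rule `Date => day month int ',' int` can be sketched as a regular expression. This is an illustrative reading of the rule, not the extractor used in the talk; treating `day` as a weekday name and `month` as an English month name is an assumption based on the sample report header, and `extract_date` is a hypothetical helper.

```python
import re

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Date => day month int ',' int
DATE_RULE = re.compile(rf"\b({DAYS})\s+({MONTHS})\s+(\d+)\s*,\s*(\d+)\b")


def extract_date(text):
    """Return the first date matching the layout rule, or None if absent."""
    match = DATE_RULE.search(text)
    if match is None:
        return None
    day, month, day_of_month, year = match.groups()
    return {
        "day": day,
        "month": month,
        "day_of_month": int(day_of_month),
        "year": int(year),
    }


header = "INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT"
print(extract_date(header))
```

A purely syntactic rule like this finds the date but says nothing about what it dates (report, fire start, containment), which is where the semantic approach in the following slides comes in.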
Taalee Extraction and Knowledgebase Enhancement (Web Page, Extraction Agent, Enhanced Metadata Asset)
Automatic Categorization & Metadata Tagging (unstructured text/transcript of A/V). Video Segment with Associated Text; Segment Description; Semantic Metadata; Auto Categorization. Transcript: ABSOLUTE CONTROL OF THE SENATE IS STILL IN QUESTION. AS OF TONIGHT, THE REPUBLICANS HAVE 50 SENATE SEATS AND THE DEMOCRATS 49. IN WASHINGTON STATE, THE SENATE RACE REMAINS TOO CLOSE TO CALL. IF THE DEMOCRATIC CHALLENGER UNSEATS THE REPUBLICAN INCUMBENT THE SENATE WILL BE EVENLY DIVIDED. IN MISSOURI, REPUBLICAN SENATOR JOHN ASHCROFT SAYS HE WILL NOT CHALLENGE HIS LOSS TO GOVERNOR MEL CARNAHAN WHO DIED IN A CRASH THREE WEEKS AGO. GOVERNOR CARNAHAN'S WIFE IS EXPECTED TO TAKE HIS PLACE. IN THE HIGHEST PROFILE SENATE EVENT OF THE NIGHT, HILLARY CLINTON WON THE NEW YORK SENATE SEAT. SHE IS THE FIRST FIRST LADY TO RUN MUCH LESS WIN.
Automatic Categorization & Metadata Tagging (Web page). Video with Editorialized Text on the Web; Auto Categorization; Semantic Metadata
Automatic Categorization & Metadata Tagging (Feed). Text from Bloomberg; Auto Categorization; Semantic Metadata
Crawler provided text for indexing vs. Agent provided semantic metadata
Taalee Metadata on Football Assets
Rich Media Reference Page (http://www.nfl.com): Baltimore 31, Pit 24. Quandry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
League: Professional
Teams: Ravens, Steelers
Score: Bal 31, Pit 24
Players: Quandry Ismail, Tony Banks
Event: Touchdown
Produced by: NFL.com
Posted date: 2/02/2000
Metadata from Typical Cataloging of Football Assets (Virage search on "football touchdown")
Jimmy Smith Interview Part Seven. Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
Brian Griese Interview Part Four. Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
One Approach to Extending Traditional CM: Voquette’s Semantic Engine Technology. Content sources: feeds (proprietary formats, standards-based, NewsML), corporate repositories, Web sites. Aggregation & Metadata Extraction: push and pull Information Extraction Agents. Knowledge Management (Knowledge Base, Domain Model, Metadata): dynamic KB, custom WorldModel, relevant metadata enhancement. Traditional Content Management: content, metadata. Front End: portal and Voquette Semantic Applications (search, personalization, alerts, notifications, custom “research” applications).
Taalee/Voquette Semantic Platform Architecture. © Taalee Inc.
Semantic Content for the End-User = content which does contain the words the user asked for + content which does not contain the words the user asked for, but is about what he asked for (Value-added Metadata, from Extractor Agents) + content the user did not think to ask for, but which he needs to know (Semantic Associations).
Metadata and Semantic Technology enabled Applications
Taalee’s Semantic Search Highly customizable, precise and freshest A/V search Context and Domain Specific Attributes Uniform Metadata for Content from Multiple  Sources, Can be sorted by any field Delightful, relevant information, exceptional targeting opportunity
Creating a Web of related information What can a context do?
Example (test on http://directory.mediaanywhere.com): Search for company ‘Commerce One’. Results include links to news on companies that compete against Commerce One and links to news on companies Commerce One competes against (to view news on Ariba, click on the link for Ariba). Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically.
What else can a context do? (a commercial perspective) Semantic Enrichment Semantic Targeting
Semantic/Interactive Targeting Precisely targeted through the use of Structured Metadata and integration from multiple sources Buy  Al Pacino  Videos Buy  Russell Crowe  Videos Buy  Christopher Plummer  Videos Buy  Diane Venora  Videos Buy  Philip Baker Hall  Videos Buy  The Insider  Video
Example 1 – Snapshots (“Jamal Anderson”): Search for ‘Jamal Anderson’ in ‘Football’. Click on first result for Jamal Anderson. View metadata; note that Team name and League name are also included in the metadata. View the original source HTML page and verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
Example 2 – Snapshots (“Gary Sheffield”): Search for ‘Gary Sheffield’ in ‘Baseball’. Click on first result for Gary Sheffield. View metadata; note that Team name and League name are also included in the metadata. View the original source HTML page and verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
Semantic Web – Intelligent Content (supported by Taalee Semantic Engine) Related Stock  News Industry News Technology  Products COMPANY EPA Regulations Competition COMPANIES in Same or Related INDUSTRY COMPANIES  in INDUSTRY with Competing  PRODUCTS Impacting INDUSTRY or Filed By COMPANY Important to INDUSTRY or COMPANY SEC Intelligent Content = What You Asked for + What you need to know!
Semantic Application – Equity Dashboard. Focused relevant content organized by topic (semantic categorization). Automatic content aggregation from multiple content providers and feeds. Related news not specifically asked for (Semantic Associations). Competitive research inferred automatically. Automatic 3rd party content integration.
Metadata centric Content Management Architecture
Components: Internal Source 1 (Research), Internal Source 2, external feeds/Web (e.g. Reuters); Extractor Agents 1, 2, 3; Voquette Metabase; World Model; Semantic Engine; third-party Content Mgmt and Syndication; Semantic Application (ASP/Enterprise hosted). Metadata format: XCM-compliant metadata, XML or other format.
(1) Cisco story (Story on Cisco) from Source 1 passed on to add semantic associations.
(2) Consults Knowledge Base for Cisco’s competition.
(3) Returns result: Lucent is a competitor of Cisco.
(4) Lucent story (Story on Lucent) from external feeds picked for publishing as “semantically related” to Cisco story and passed on to Dashboard.
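The four numbered steps on this slide (a Cisco story arrives, the Knowledge Base is consulted for competitors, Lucent is returned, a Lucent story is selected as semantically related) can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary-based knowledge base, the story records, the IBM story used as a non-matching control, and the `semantically_related` helper are all hypothetical.

```python
# Stand-in knowledge base holding the competitor relationship from the slide.
COMPETITORS = {
    "Cisco": {"Lucent"},
    "Lucent": {"Cisco"},
}


def semantically_related(story, candidate_stories, competitors=COMPETITORS):
    """Pick candidate stories about competitors of the story's company."""
    # Steps 2 and 3: consult the knowledge base for the company's competition.
    rivals = competitors.get(story["company"], set())
    # Step 4: keep stories about those rivals for publishing as related.
    return [s for s in candidate_stories if s["company"] in rivals]


# Step 1: a Cisco story from an internal source is passed on.
cisco_story = {"company": "Cisco", "source": "Internal Source 1"}

# Hypothetical external feed; the IBM story is a control with no stated relation.
external_feed = [
    {"company": "Lucent", "source": "External feed"},
    {"company": "IBM", "source": "External feed"},
]

related = semantically_related(cisco_story, external_feed)
print(related)
```

The Lucent story is selected even though it never mentions Cisco, because the link comes from the knowledge base and not from keywords in the text.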
Wireless Application of Semantic Metadata and Automatic Content Enrichment. Clicking on the link for Cisco Analyst Calls displays a listing sorted by date. Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., Analyst Call focuses on the source and analyst name or company. The icons denote additional metadata, such as “Strong Buy” by H&Q Analyst. Screens: menu (MyStocks, News, Sports, Music, MyMedia); My Stocks (CSCO, NT, IBM, Market); CSCO (Analyst Call, Conf Call, Earnings); CSCO Analysis listing (11/08 ON24 Payne; 11/07 ON24 H&Q; 11/06 CBS Langlesis).
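Semantic filtering for a small screen can be sketched as choosing, per content type, which metadata fields to show and trimming the result to the available width. The field list for Analyst Call follows the slide (date, source, analyst name or company); the `render_listing` helper, the `max_chars` limit, and the `rating` values other than the H&Q "Strong Buy" are assumptions for illustration.

```python
# Which metadata fields to display for each content type (assumed table).
DISPLAY_FIELDS = {
    "Analyst Call": ["date", "source", "analyst"],
}


def render_listing(items, content_type, max_chars):
    """Return one display line per item, newest first, cut to the screen width.

    Dates are MM/DD strings assumed to fall in the same year, so plain
    string comparison orders them correctly.
    """
    fields = DISPLAY_FIELDS[content_type]
    rows = sorted(items, key=lambda item: item["date"], reverse=True)
    return [" ".join(item[f] for f in fields)[:max_chars] for item in rows]


# Listing entries from the slide; `rating` is extra metadata that is not
# displayed in the line itself (the slide shows it as an icon).
analyst_calls = [
    {"date": "11/06", "source": "CBS", "analyst": "Langlesis", "rating": None},
    {"date": "11/08", "source": "ON24", "analyst": "Payne", "rating": None},
    {"date": "11/07", "source": "ON24", "analyst": "H&Q", "rating": "Strong Buy"},
]

lines = render_listing(analyst_calls, "Analyst Call", max_chars=20)
for line in lines:
    print(line)
```

Because the metadata is structured, the same records could feed a wider display by extending the field list instead of re-authoring the content.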
Metadata’s role in emerging iTV infrastructure. Enhanced Digital Cable Video (MPEG-2/4/7): MPEG Encoder, Create Scene Description Tree; MPEG Decoder, Retrieve Scene Description Track. Scene Description Tree: Node = AVO Object; “NSF Playoff” node with Object Content Information (OCI) and Enhanced XML Description from the Voquette/Taalee Semantic Engine, giving a metadata-rich value-added node. GREAT USER EXPERIENCE. Channel sales through Video Server Vendors, Video App Servers, and Broadcasters. License metadata decoder and semantic applications to device makers.
Metadata for Automatic Content Enrichment: Interactive Television. This segment has embedded or referenced metadata that is used by a personalization application to show only the stocks that the user is interested in. This screen is customizable with an interactivity feature using metadata such as whether there is a new Conference Call video on CSCO. Part of the screen can be automatically customized to show conference-call-specific information, including transcript, participation, etc., all of which are relevant metadata. The Conference Call itself can have embedded metadata to support personalization and interactivity.
Semantic Technology Features
Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.
1 of 62

Recommended

Taxonomies And Search Aiim Mn by
Taxonomies And Search Aiim MnTaxonomies And Search Aiim Mn
Taxonomies And Search Aiim MnAIIM Minnesota
4.2K views112 slides
#SPSVancouver 2016 - The importance of metadata by
#SPSVancouver 2016 - The importance of metadata#SPSVancouver 2016 - The importance of metadata
#SPSVancouver 2016 - The importance of metadataVincent Biret
786 views44 slides
Taxonomies and Metadata in Information Architecture by
Taxonomies and Metadata in Information ArchitectureTaxonomies and Metadata in Information Architecture
Taxonomies and Metadata in Information ArchitectureAccess Innovations, Inc.
16K views64 slides
Taxonomies and Metadata by
Taxonomies and MetadataTaxonomies and Metadata
Taxonomies and MetadataAravind Sesagiri Raamkumar
4.7K views29 slides
Six Ways to Simplify Metadata Management by
Six Ways to Simplify Metadata ManagementSix Ways to Simplify Metadata Management
Six Ways to Simplify Metadata ManagementEnterprise Knowledge
2K views12 slides
How to Use Site Search to Drive Conversions and Create Customers by
How to Use Site Search to Drive Conversions and Create CustomersHow to Use Site Search to Drive Conversions and Create Customers
How to Use Site Search to Drive Conversions and Create CustomersEarley Information Science
596 views40 slides

More Related Content

What's hot

Taxonomy And Metadata by
Taxonomy And MetadataTaxonomy And Metadata
Taxonomy And MetadataDavid Champeau
11K views21 slides
Semantic Technology in Publishing & Finance by
Semantic Technology in Publishing & FinanceSemantic Technology in Publishing & Finance
Semantic Technology in Publishing & FinanceVladimir Alexiev, PhD, PMP
1.4K views50 slides
Taxonomy 101 by
Taxonomy 101Taxonomy 101
Taxonomy 101Theresa Putkey
1.3K views46 slides
Looking Under the Hood -- Australia SharePoint Conference by
Looking Under the Hood -- Australia SharePoint ConferenceLooking Under the Hood -- Australia SharePoint Conference
Looking Under the Hood -- Australia SharePoint ConferenceChristian Buckley
720 views52 slides
Taxonomy and Metadata Demystified by
Taxonomy and Metadata DemystifiedTaxonomy and Metadata Demystified
Taxonomy and Metadata DemystifiedFindwise
2.5K views18 slides

What's hot(20)

Looking Under the Hood -- Australia SharePoint Conference by Christian Buckley
Looking Under the Hood -- Australia SharePoint ConferenceLooking Under the Hood -- Australia SharePoint Conference
Looking Under the Hood -- Australia SharePoint Conference
Christian Buckley720 views
Taxonomy and Metadata Demystified by Findwise
Taxonomy and Metadata DemystifiedTaxonomy and Metadata Demystified
Taxonomy and Metadata Demystified
Findwise2.5K views
Building An XML Publishing System With DITA by Scott Abel
Building An XML Publishing System With DITABuilding An XML Publishing System With DITA
Building An XML Publishing System With DITA
Scott Abel1.8K views
Five creative search solutions using text analytics by Enterprise Knowledge
Five creative search solutions using text analyticsFive creative search solutions using text analytics
Five creative search solutions using text analytics
Kbee Spaces Financial Services by atolomei
Kbee Spaces Financial ServicesKbee Spaces Financial Services
Kbee Spaces Financial Services
atolomei265 views
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” by VOGIN-academie
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
VOGIN-academie185 views
Successful Content Management Through Taxonomy And Metadata Design by sarakirsten
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
sarakirsten13.6K views
Managing Electronic Resources for Public Libraries, Part 1 by ALATechSource
Managing Electronic Resources for Public Libraries, Part 1Managing Electronic Resources for Public Libraries, Part 1
Managing Electronic Resources for Public Libraries, Part 1
ALATechSource2.1K views
Managing Electronic Resources for Public Libraries: Part 2 by ALATechSource
Managing Electronic Resources for Public Libraries: Part 2Managing Electronic Resources for Public Libraries: Part 2
Managing Electronic Resources for Public Libraries: Part 2
ALATechSource1.4K views
Semantics in Financial Services -David Newman by Peter Berger
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David Newman
Peter Berger3.2K views
An Overview of Dow Jones' Use of Semantic Technologies by Christine Connors
An Overview of Dow Jones' Use of Semantic TechnologiesAn Overview of Dow Jones' Use of Semantic Technologies
An Overview of Dow Jones' Use of Semantic Technologies
Christine Connors1.4K views
Semantic Applications for Financial Services by DavidSNewman
Semantic Applications for Financial ServicesSemantic Applications for Financial Services
Semantic Applications for Financial Services
DavidSNewman2.7K views
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ... by martingarland
Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...Webinar:  Does the SharePoint 2010 Term Store Seem Like Alphabet Soup?  Find ...
Webinar: Does the SharePoint 2010 Term Store Seem Like Alphabet Soup? Find ...
martingarland731 views

Similar to Content Management, Metadata and Semantic Web

SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY by
SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITYSEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY
SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITYAmit Sheth
952 views43 slides
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” by
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”voginip
1.5K views32 slides
Taxonomy and seo sla 05-06-10(jc) by
Taxonomy and seo   sla 05-06-10(jc)Taxonomy and seo   sla 05-06-10(jc)
Taxonomy and seo sla 05-06-10(jc)Earley Information Science
2.6K views59 slides
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating... by
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...
Relationships at the Heart of Semantic Web: Modeling, Discovering, Validating...Artificial Intelligence Institute at UofSC
549 views61 slides
Content management by
Content managementContent management
Content managementRajendra Babu
533 views30 slides
Structuring Serendipitous Collaboration by
Structuring Serendipitous CollaborationStructuring Serendipitous Collaboration
Structuring Serendipitous CollaborationNick Inglis
51 views38 slides

Similar to Content Management, Metadata and Semantic Web(20)

SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY by Amit Sheth
SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITYSEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY
SEMANTIC CONTENT MANAGEMENT FOR ENTERPRISES AND NATIONAL SECURITY
Amit Sheth952 views
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery” by voginip
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
voginip1.5K views
Structuring Serendipitous Collaboration by Nick Inglis
Structuring Serendipitous CollaborationStructuring Serendipitous Collaboration
Structuring Serendipitous Collaboration
Nick Inglis51 views
Semantic Web in Action: Ontology-driven information search, integration and a... by Amit Sheth
Semantic Web in Action: Ontology-driven information search, integration and a...Semantic Web in Action: Ontology-driven information search, integration and a...
Semantic Web in Action: Ontology-driven information search, integration and a...
Amit Sheth2.8K views
Hid content management systems by dhiraj.gaur
Hid content management systemsHid content management systems
Hid content management systems
dhiraj.gaur869 views
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M... by Paul Wlodarczyk
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...
Paul Wlodarczyk2.7K views
IWMW 2002: The Value of Metadata and How to Realise It by IWMW
IWMW 2002: The Value of Metadata and How to Realise ItIWMW 2002: The Value of Metadata and How to Realise It
IWMW 2002: The Value of Metadata and How to Realise It
IWMW 311 views

Recently uploaded

Gross Anatomy of the Liver by
Gross Anatomy of the LiverGross Anatomy of the Liver
Gross Anatomy of the Liverobaje godwin sunday
89 views12 slides
The Future of Micro-credentials: Is Small Really Beautiful? by
The Future of Micro-credentials:  Is Small Really Beautiful?The Future of Micro-credentials:  Is Small Really Beautiful?
The Future of Micro-credentials: Is Small Really Beautiful?Mark Brown
102 views35 slides
Nelson_RecordStore.pdf by
Nelson_RecordStore.pdfNelson_RecordStore.pdf
Nelson_RecordStore.pdfBrynNelson5
50 views10 slides
MIXING OF PHARMACEUTICALS.pptx by
MIXING OF PHARMACEUTICALS.pptxMIXING OF PHARMACEUTICALS.pptx
MIXING OF PHARMACEUTICALS.pptxAnupkumar Sharma
125 views35 slides
Meet the Bible by
Meet the BibleMeet the Bible
Meet the BibleSteve Thomason
81 views80 slides
NodeJS and ExpressJS.pdf by

Content Management, Metadata and Semantic Web

  • 1. Content Management, Metadata & Semantic Web
     Keynote Address, Net.ObjectDAYS 2001, Erfurt, Germany, September 11, 2001
     Amit Sheth
     CTO/SrVP, Voquette (www.voquette.com) [formerly Founder/CEO, Taalee, www.taalee.com]
     Director, Large Scale Distributed Information Systems Lab, University of Georgia (lsdis.cs.uga.edu)
     [email_address]
     Metadata Extraction is a patent-pending technology of Taalee, Inc. Semantic Engine and WorldModel are trademarks of Taalee, Inc.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. Creating and Serving Metadata to Power the Life-cycle of Content Applications
     "A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV
     "Metadata increases content value in each step of content value chain." - Amit Sheth
     Back End / Semantic Metadata
     Produce, Aggregate: Where is the content? Whose is it?
     Catalog/Index: What is this content about?
     Integrate, Syndicate: What other content is it related to?
     Personalize: What is the right content for this user?
     Interactive Marketing: What is the best way to monetize this interaction?
     Broadcast, Wireline, Wireless, Interactive TV
  • 11. A Metadata Classification
     Data (Heterogeneous Types/Media)
     Content Independent Metadata (creation-date, location, type-of-sensor...)
     Content Dependent Metadata (size, max colors, rows, columns...)
     Direct Content Based Metadata (inverted lists, document vectors, LSI)
     Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)
     Domain Specific Metadata: area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies
     Ontologies, Classifications, Domain Models
     User
     More Semantics for Relevance to tackle Information Overload!!
  • 12.
  • 13. Semantics: The Next Step in the Web’s Evolution
     “The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what the data means to process it . . . . Imagine what computers can understand when there is a vast tangle of interconnected terms and data that can automatically be followed.” (Tim Berners-Lee, Weaving the Web, 1999)
     A Content Management centric definition of Semantic Web: The concept that Web-accessible content can be organized and utilized semantically, rather than through syntactic and structural methods.
  • 14. Next Generation: Semantic Content Management
  • 15.
  • 16. Traditional Text Categorization
     Most traditional Content Management products support categorization of unstructured content.
     Diagram labels: feed; Customer; Article Feed 4715; Statistical/AI Techniques; Customer Training Set; Classification of Article 4715; Classify; Place in a taxonomy; Routing/Distribution
     Standard Metadata: Feed Source: iSyndicate; Posted Date: 11/20/2000
  • 17. Voquette/Taalee’s Categorization & Automatic Metadata Creation
     Diagram labels: feed; Article Feed 4715; Knowledge-base & Statistical/AI Techniques; Customer Training Set & KB; Taalee Training Set & KB; Classification of Article 4715; Classify; Place in a taxonomy; Map to another taxonomy; Routing/Distribution; Metadata Catalog; Semantic Engine™; Precise Personalization/Syndication/Filtering; Automated Content Enrichment (ACE)
     Taxonomy nodes:
       FTE: Company Analysis, Conference Calls, Earnings, Stock Analysis
       ENT: Company Analysis, Conference Calls, Earnings, Stock Analysis
       NYSE: Member Companies, Market News, IPOs
     Article 4715 Metadata
       Standard metadata: Feed Source: iSyndicate; Posted Date: 11/20/2000
       Semantic metadata: Company Name: France Telecom, Equant; Ticker Symbol: FTE, ENT; Exchange: NYSE; Topic: Company News
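The enrichment step on this slide, where standard feed metadata gains company, ticker and exchange fields, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `KNOWLEDGE_BASE` dictionary, the `enrich` function and the sample article text are hypothetical, and the matching is a plain substring lookup, not Taalee's Semantic Engine. The company, ticker and exchange values are the ones shown on the slide.

```python
# Hypothetical in-memory knowledge base: company name -> known facts.
# The values are the ones listed on the slide.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(standard_metadata, text):
    """Return the standard metadata plus semantic metadata for companies named in text."""
    # Naive substring match; a real extractor would be far more careful.
    found = [name for name in KNOWLEDGE_BASE if name in text]
    enriched = dict(standard_metadata)
    enriched["company_name"] = found
    enriched["ticker_symbol"] = [KNOWLEDGE_BASE[name]["ticker"] for name in found]
    enriched["exchange"] = sorted({KNOWLEDGE_BASE[name]["exchange"] for name in found})
    return enriched


article = enrich(
    {"feed_source": "iSyndicate", "posted_date": "11/20/2000"},
    "France Telecom and Equant were in the news today.",  # placeholder text, not the real article
)
```

The point of the sketch is that the ticker and exchange never need to appear in the article itself; they come from the knowledge base once a company name is recognized.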
  • 18.
  • 19. Multiple competing standards!
     Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
     Kansas State FGDC Metadata Model
       Theme keywords: digital line graph, hydrography, transportation...
       Title: Dakota Aquifer
       Online linkage: http://gisdasc.kgs.ukans.edu/dasc/
       Direct Spatial Reference Method: Vector
       Horizontal Coordinate System Definition: Universal Transverse Mercator
       ...
     UDK Metadata Model
       Search terms: digital line graph, hydrography, transportation...
       Topic: Dakota Aquifer
       Adress Id: http://gisdasc.kgs.ukans.edu/dasc/
       Measuring Techniques: Vector
       Co-ordinate System: Universal Transverse Mercator
       ...
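One common way to cope with two models that name the same data differently is a crosswalk table that maps each model's tags to shared field names. The sketch below assumes the tag pairs line up as the slide shows them; the shared field names (`keywords`, `title`, and so on), the `CROSSWALK` table and the `normalize` function are invented for illustration and are not part of FGDC or UDK.

```python
# Hypothetical crosswalk: model name -> {model tag name: shared field name}.
CROSSWALK = {
    "FGDC": {
        "Theme keywords": "keywords",
        "Title": "title",
        "Online linkage": "url",
        "Direct Spatial Reference Method": "spatial_method",
        "Horizontal Coordinate System Definition": "coordinate_system",
    },
    "UDK": {
        "Search terms": "keywords",
        "Topic": "title",
        "Adress Id": "url",  # tag spelled as on the slide
        "Measuring Techniques": "spatial_method",
        "Co-ordinate System": "coordinate_system",
    },
}


def normalize(model, record):
    """Rename a record's tags to the shared field names, dropping unmapped tags."""
    mapping = CROSSWALK[model]
    return {mapping[tag]: value for tag, value in record.items() if tag in mapping}


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Online linkage": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Topic": "Dakota Aquifer",
    "Adress Id": "http://gisdasc.kgs.ukans.edu/dasc/",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}
```

After normalization the two records compare equal, which is what lets a catalog treat them as descriptions of the same dataset.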
  • 20.
  • 21.
  • 22. An Ontology
     Hierarchies: Disaster; Natural Disaster (Volcano, Earthquake); Man-made Disaster (NuclearTest)
     Terms/Concepts (Attributes): eventDate, description, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy, explosiveYield
     Functional Dependencies (FDs): site => latitude, longitude
     Domain Rules: bodyWaveMagnitude > 0; bodyWaveMagnitude < 10; magnitude > 0; magnitude < 10
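The domain rules on this slide are simple range constraints, so they can be checked mechanically against extracted metadata. The following is a minimal sketch assuming instances are plain dictionaries of attribute values; the `DOMAIN_RULES` table and the `violations` function are hypothetical names, while the bounds themselves are the ones stated on the slide.

```python
# Domain rules from the slide, written as exclusive (low, high) bounds.
DOMAIN_RULES = {
    "magnitude": (0, 10),
    "bodyWaveMagnitude": (0, 10),
}


def violations(instance):
    """Return the attributes of an instance whose values break a domain rule."""
    bad = []
    for attribute, (low, high) in DOMAIN_RULES.items():
        value = instance.get(attribute)
        # Attributes that are absent are not checked.
        if value is not None and not (low < value < high):
            bad.append(attribute)
    return bad
```

For example, `violations({"magnitude": 12})` flags `magnitude`, which is the kind of check that can catch a bad extraction before it reaches the metadata catalog.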
  • 23.
  • 24. Open Directory Project (ODP): Classification/Taxonomy & Directory
  • 25. Metadata Specifications (MetaModels)
     Domain Independent: Dublin Core, RDF, DAML+OIL
     Frameworks/Infrastructures: XCM, XMI
     Function Specific: ICE (Syndication)
     Domain (Application) Specific: MARC (Library), FGDC and UDK (Geographic), PRISM (Publishing), FXML (Financial Transactions), RIXML (Buy-Sell Research/Financial Services), IMS Learning Resource (Distance Learning), ...
     Media Specific: MPEGx, VoiceXML
     NewsML (News exchange)
  • 26.
  • 27.
  • 28.
  • 29. NewsML (Source: http://www.mediabricks.com)
     The content provider supplies NewsML packaged media content to the operator. The content can be categorized as current events, finance, sport, etc. (but no standard is specified) and updated hourly.
     The operator receives NewsML data from the content provider. The content server automatically pushes updated news articles to all news service subscribers.
     Consumers sign up for the news service directly on the device. When using the news service, the user browses through the categories and reads the news articles. The news articles are presented in a continuous flow (one after the other) without end-user interaction.
  • 30.
  • 31.
  • 33.
  • 34.
  • 35. Information Extraction for Metadata Creation
     Key challenge: Create/extract as much (semantic) metadata automatically as possible
     METADATA EXTRACTORS
     Sources: WWW, Enterprise Repositories; Nexis, UPI, AP; Feeds/Documents; Digital Maps; Digital Audios; Digital Videos; Digital Images; Data Stores
  • 36. Extracting a Text Document: Syntactic approach
     INCIDENT MANAGEMENT SITUATION REPORT
     Friday August 1, 1997 - 0530 MDT
     NATIONAL PREPAREDNESS LEVEL II
     CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been staffed for structure protection.
     SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is 35% contained, while protection of the historic cabin continues.
     CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop up where the fire burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend, depending on the results of infrared scanning.
     LAYOUT
     Date => day month int ‘,’ int
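The layout rule `Date => day month int ‘,’ int` reads naturally as a pattern over the text. A rough sketch of that idea with a regular expression is shown below; the `DATE_RULE` pattern and `extract_date` function are illustrative assumptions about how such a rule might be applied, not the extractor used in the talk.

```python
import re

# Alternatives for the "day" and "month" tokens of the layout rule.
DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Date => day month int ',' int
DATE_RULE = re.compile(rf"\b({DAYS})\s+({MONTHS})\s+(\d{{1,2}})\s*,\s*(\d{{4}})\b")


def extract_date(text):
    """Return the first date matching the layout rule as a dict, or None."""
    match = DATE_RULE.search(text)
    if match is None:
        return None
    day, month, day_of_month, year = match.groups()
    return {
        "day": day,
        "month": month,
        "day_of_month": int(day_of_month),
        "year": int(year),
    }


report_header = "INCIDENT MANAGEMENT SITUATION REPORT Friday August 1, 1997 - 0530 MDT"
date = extract_date(report_header)
```

A purely syntactic rule like this finds the date reliably but knows nothing about what the report is about, which is the limitation the later slides on semantic metadata address.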
  • 37. Extraction Agent Web Page Enhanced Metadata Asset Taalee Extraction and Knowledgebase Enhancement
  • 38. Automatic Categorization & Metadata Tagging (unstructured text/transcript of A/V)
     Video Segment with Associated Text:
     ABSOLUTE CONTROL OF THE SENATE IS STILL IN QUESTION. AS OF TONIGHT, THE REPUBLICANS HAVE 50 SENATE SEATS AND THE DEMOCRATS 49. IN WASHINGTON STATE, THE SENATE RACE REMAINS TOO CLOSE TO CALL. IF THE DEMOCRATIC CHALLENGER UNSEATS THE REPUBLICAN INCUMBENT THE SENATE WILL BE EVENLY DIVIDED. IN MISSOURI, REPUBLICAN SENATOR JOHN ASHCROFT SAYS HE WILL NOT CHALLENGE HIS LOSS TO GOVERNOR MEL CARNAHAN WHO DIED IN A CRASH THREE WEEKS AGO. GOVERNOR CARNAHAN'S WIFE IS EXPECTED TO TAKE HIS PLACE. IN THE HIGHEST PROFILE SENATE EVENT OF THE NIGHT, HILLARY CLINTON WON THE NEW YORK SENATE SEAT. SHE IS THE FIRST FIRST LADY TO RUN MUCH LESS WIN.
     Callouts: Segment Description; Semantic Metadata; Auto Categorization
  • 39. Video with Editorialized Text on the Web Automatic Categorization & Metadata Tagging (Web page) Auto Categorization Semantic Metadata
  • 40. Automatic Categorization & Metadata Tagging (Feed) Text From Bloomberg Auto Categorization Semantic Metadata
  • 41. Crawler provided text for indexing vs Agent provided semantic metadata
     Taalee Metadata on Football Assets
       Rich Media Reference Page: Baltimore 31, Pit 24 (http://www.nfl.com)
       Qadry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
       League: Professional
       Teams: Ravens, Steelers
       Score: Bal 31, Pit 24
       Players: Qadry Ismail, Tony Banks
       Event: Touchdown
       Produced by: NFL.com
       Posted date: 2/02/2000
     Metadata from Typical Cataloging of Football Assets (Virage search on football touchdown)
       Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
       Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
  • 42. One Approach to Extending Traditional CM: Voquette’s Semantic Engine Technology
     Sources: Feeds (proprietary formats, standards-based, NewsML); Corporate Repositories; Web Sites
     Aggregation & Metadata Extraction: Information Extraction Agents (Push, Pull); Relevant Metadata
     Knowledge Management (Knowledge Base, Domain Model, Metadata): Dynamic KB; Custom WorldModel; Enhancement
     Traditional Content Management: Content, Metadata
     Front End: Portal; Voquette Semantic Applications (Search, Personalization, Alerts, Notifications, Custom “research” applications)
  • 43.
  • 44. Semantic Content
     Extractor Agents: Content which does contain the words the user asked for
     + Value-added Metadata: Content which does not contain the words the user asked for, but is about what he asked for
     + Semantic Associations: Content the user did not think to ask for, but which he needs to know
     = Semantic Content for the End-User
  • 45. Metadata and Semantic Technology enabled Applications
  • 46. Taalee’s Semantic Search Highly customizable, precise and freshest A/V search Context and Domain Specific Attributes Uniform Metadata for Content from Multiple Sources, Can be sorted by any field Delightful, relevant information, exceptional targeting opportunity
  • 47. Creating a Web of related information What can a context do?
  • 48. Example (test on http://directory.mediaanywhere.com ) Search for company ‘Commerce One’ Links to news on companies that compete against Commerce One Links to news on companies Commerce One competes against (To view news on Ariba, click on the link for Ariba) Crucial news on Commerce One’s competitors (Ariba) can be accessed easily and automatically
  • 49. What else can a context do? (a commercial perspective) Semantic Enrichment Semantic Targeting
  • 50. Semantic/Interactive Targeting Precisely targeted through the use of Structured Metadata and integration from multiple sources Buy Al Pacino Videos Buy Russell Crowe Videos Buy Christopher Plummer Videos Buy Diane Venora Videos Buy Philip Baker Hall Videos Buy The Insider Video
  • 51. Example 1 – Snapshots (“Jamal Anderson”)
     Search for ‘Jamal Anderson’ in ‘Football’.
     Click on first result for Jamal Anderson.
     View metadata. Note that Team name and League name are also included in the metadata.
     View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
  • 52. Example 2 – Snapshots (“Gary Sheffield”)
     Search for ‘Gary Sheffield’ in ‘Baseball’.
     Click on first result for Gary Sheffield.
     View metadata. Note that Team name and League name are also included in the metadata.
     View the original source HTML page. Verify that the source page contains no mention of Team name and League name. They were Taalee’s value-additions to the metadata to facilitate easier search.
  • 53. Semantic Web – Intelligent Content (supported by Taalee Semantic Engine) Related Stock News Industry News Technology Products COMPANY EPA Regulations Competition COMPANIES in Same or Related INDUSTRY COMPANIES in INDUSTRY with Competing PRODUCTS Impacting INDUSTRY or Filed By COMPANY Important to INDUSTRY or COMPANY SEC Intelligent Content = What You Asked for + What you need to know!
  • 54. Semantic Application – Equity Dashboard
     Focused relevant content organized by topic (semantic categorization)
     Automatic Content Aggregation from multiple content providers and feeds
     Related news not specifically asked for (Semantic Associations)
     Competitive research inferred automatically
     Automatic 3rd party content integration
  • 55. Metadata centric Content Management Architecture
     Components: Internal Source 1; Research; Internal Source 2; External feeds/Web (e.g. Reuters); Extractor Agent 1; Extractor Agent 2; Extractor Agent 3; Voquette Metabase; World Model; Semantic Engine; Third-party Content Mgmt and Syndication; Semantic Application (ASP/Enterprise hosted); XCM-compliant metadata, XML or other format
     Stories in the example: Story on Cisco; Story on Lucent
     1. Cisco story from Source 1 passed on to add semantic associations
     2. Consults Knowledge Base for Cisco’s competition
     3. Returns result: Lucent is a competitor of Cisco
     4. Lucent story from external feeds picked for publishing as “semantically related” to Cisco story, then passed on to Dashboard
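The four numbered steps on this slide amount to a lookup in the knowledge base followed by a filter over the external feed. A minimal sketch under stated assumptions: the `COMPETITORS` table, the `related_stories` function and the story dictionaries are hypothetical stand-ins for the Knowledge Base and Metabase, and the IBM story is a made-up placeholder included only to show that unrelated stories are left out.

```python
# Hypothetical knowledge base fragment: the slide states that Lucent competes with Cisco.
COMPETITORS = {"Cisco": ["Lucent"]}


def related_stories(story, external_feed):
    """Return feed stories about competitors of the companies named in story."""
    related = []
    for company in story["companies"]:
        # Steps 2 and 3: consult the knowledge base for the company's competition.
        for rival in COMPETITORS.get(company, []):
            # Step 4: pick stories about that competitor from the external feed.
            related.extend(s for s in external_feed if rival in s["companies"])
    return related


# Step 1: a Cisco story arrives from an internal source.
cisco_story = {"title": "Story on Cisco", "companies": ["Cisco"]}
feed = [
    {"title": "Story on Lucent", "companies": ["Lucent"]},
    {"title": "Story on IBM", "companies": ["IBM"]},  # placeholder, not from the slide
]
dashboard = [cisco_story] + related_stories(cisco_story, feed)
```

The Lucent story never mentions Cisco; it reaches the dashboard only because the knowledge base records the competitor relationship.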
  • 56. Wireless Application of Semantic Metadata and Automatic Content Enrichment
     Clicking on the link for Cisco Analyst Calls displays a listing sorted by date.
     Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., Analyst Call focuses on the source and analyst name or company.
     The icons denote additional metadata, such as “Strong Buy” by H&Q Analyst.
     Screen text: MyStocks, News, Sports, Music, MyMedia; My Stocks: CSCO, NT, IBM, Market; CSCO: Analyst Call, Conf Call, Earnings; CSCO Analysis: 11/08 ON24 Payne, 11/07 ON24 H&Q, 11/06 CBS Langlesis
  • 57.
  • 58. Metadata for Automatic Content Enrichment: Interactive Television
     This segment has embedded or referenced metadata that is used by the personalization application to show only the stocks that the user is interested in.
     This screen is customizable with an interactivity feature using metadata such as whether there is a new Conference Call video on CSCO.
     Part of the screen can be automatically customized to show conference call specific information, including transcript, participation, etc., all of which are relevant metadata.
     The Conference Call itself can have embedded metadata to support personalization and interactivity.
  • 59.
  • 60. Along with the evolution of metadata and semantic technologies enabling the next generation of the Web, Content Management has entered the next generation of Enhanced Content Management.
  • 61.
  • 62.

Editor's Notes

  1. 04/29/10 Taalee Proprietary & Confidential. Do not copy or distribute.
  3. Companies in categorization field: Autonomy, Metacode (bought by Interwoven), Semio, Inxight, etc.
     Typical strategies employed by competition: Statistical/AI/Parsing/NLP/Rules-based/Collaborative Filtering
     Result: Partial success in categorization
     Placement of a document in a node is based solely on the above strategies (nothing to do with the metadata describing it, which is the basis behind semantics)
     Resulting classification: rigid/static/ambiguous/fuzzy
     Captures only standard physical metadata (source, date, length etc.), which is often useless for categorization purposes
  4. Taalee performs categorization by giving importance to semantic metadata extracted from any document
     Strategies employed by Taalee: Knowledge-based/Statistical/Rules-based/AI techniques
     Result: Complete success in categorization!
     Precise category/categories chalked out for classifying document
     Resulting classification: flexible/dynamic/unambiguous/crisp
     Value-added metadata produced to bring out the context/gist of the document
     Metadata => Great potential for Automated Content Enrichment (ACE)
     Classifying into or mapping to other taxonomies possible
     Promises to greatly enhance the current functioning of Content Manager and Syndication Software/Service
  5. Why? What is its use?