Recent Trends in Semantic Search Technologies
1. Peter Mika | Yahoo! Research, Spain
pmika@yahoo-inc.com
Thanh Tran | Semsolute, Germany
Tran@semsolute.com
Semantic Search on the Rise
2. About the speakers
Peter Mika
Senior Research Scientist
Head of Semantic Search group at Yahoo! Labs
Expertise: Semantic Search, Web Object Retrieval, Natural Language Processing
Tran Duc Thanh
CEO of Semsolute, Semantic Search Technologies Company
Served as Assistant Professor for Karlsruhe Institute of Technology and Stanford University
Expertise: Semantic Search, Semantic / Linked Data Management
3. Agenda
Why Semantic Search
What is Semantic Search
Innovative Semantic Search Applications
Behind the Scene
Questions
5. Why Semantic Search? I.
“We are at the beginning of search.” (Marissa Mayer)
Solved large classes of queries, e.g. navigational
Remaining queries are hard, not solvable by brute force, and require deep understanding of the world and human cognition, e.g.
Ambiguous searches: paris hilton
Imprecise or overly precise searches
Searches for descriptions: 34 year old computer scientist living in barcelona
Background knowledge and metadata can help to address poorly solved queries
Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.
6. Why Semantic Search? II.
The Semantic Web is now a reality
Large amounts of data published in RDF
Linked Data
Metadata in HTML
Facebook's Open Graph Protocol
Schema.org
Casual users
Don't know SPARQL
Unaware of the schema of the data
Searching data instead of or in addition to searching documents
Enable innovative search applications / tasks
8. Semantic Search: Using Semantic Models for Search
Semantic search is a retrieval paradigm that
Exploits the semantics of the data or explicit background knowledge to understand user intent and the meaning of content
Incorporates the intent of the query and the meaning of content into the search process (semantic models)
9. Semantic Search: Different Kinds / Different Uses of Semantic Models
Wide range of semantic search systems
Employ different semantic models, possibly at different steps of the search process and in order to support different tasks
Query formulation
Query processing / understanding
Ranking
Result presentation
Result / query refinement
10. Semantic models
Semantics is concerned with the meaning of the resources made available for search
Various representations of meaning
Word-level models: models of relationships among words
Taxonomies, thesauri, dictionaries of entity names
Inference along linguistic relations, e.g. broader/narrower terms
Concept-level models: models of relationships among objects
Ontologies capture entities in the world and their relationships
Inference along domain-specific relations
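As an illustration of inference along broader/narrower relations in a word-level model, here is a minimal Python sketch of query expansion. The thesaurus contents and the function name `expand_query` are hypothetical and only serve the example; they are not taken from the talk.

```python
# Hypothetical toy thesaurus: each term maps to its directly narrower terms.
NARROWER = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "convertible"],
}


def expand_query(term, narrower=NARROWER):
    """Return the term plus all transitively narrower terms, breadth-first."""
    expanded = []
    queue = [term]
    while queue:
        current = queue.pop(0)
        if current in expanded:
            continue  # guard against cycles or repeated terms
        expanded.append(current)
        queue.extend(narrower.get(current, []))
    return expanded
```

A search for "vehicle" could then also match documents that only mention "sedan". A concept-level model would follow domain-specific relations between objects instead of relations between words.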
11. Graph-based Conceptual Models
Core of W3C standards for knowledge representation and data exchange: RDF, OWL
Large amount of data / knowledge on the Web available as graphs
Linked Data: hundreds of interconnected datasets capturing domain-independent and domain-specific knowledge
Metadata in HTML
RDFa, microdata, Facebook's OGP
Private graphs
Google's Knowledge Graph
Facebook Graph
Yahoo's Knowledge Base (talk yesterday)
Microsoft's Satori
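To make the graph view concrete, the sketch below stores RDF-style (subject, predicate, object) triples in Python and answers simple triple patterns. The `ex:` identifiers and the helper names `build_graph` and `match` are assumptions made for illustration; a real system would use an RDF library or a triple store.

```python
from collections import defaultdict

# Hypothetical example triples; the "ex:" names are illustrative only.
TRIPLES = [
    ("ex:TheRock", "rdf:type", "ex:Movie"),
    ("ex:TheRock", "ex:director", "ex:MichaelBay"),
    ("ex:MichaelBay", "rdf:type", "ex:Person"),
]


def build_graph(triples):
    """Index triples as an adjacency list: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
```

For example, `match(TRIPLES, p="rdf:type")` lists every typed entity, which is the simplest form of the pattern matching that SPARQL generalizes.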
13. Where can you find Linked Data?
Downloads
DBpedia data dumps
SPARQL access
LOD cache by OpenLink: 51 billion triples
Keyword search
Sindice by SindiceTech
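For SPARQL access, a request is typically an HTTP GET with the query passed as a URL parameter. The sketch below only builds such a URL and does not send it. The endpoint address, the `format` parameter, and the function name `build_sparql_url` are assumptions for illustration; check the documentation of the endpoint you actually use.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed endpoint for illustration; substitute the service you actually query.
ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_url(resource_uri, limit=10, endpoint=ENDPOINT):
    """Build a GET URL that asks for all properties of one resource."""
    query = (
        "SELECT ?property ?value WHERE { "
        f"<{resource_uri}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )
    # "format" is an endpoint-specific convenience parameter (an assumption here);
    # the standard alternative is an HTTP Accept header.
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"
```

Casual users cannot be expected to write the query inside this URL, which is why keyword search over the same data is the third access path listed on the slide.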
14. Google Knowledge Graph
Start with Freebase's database, which had 12 million entities
As of June 2012, Knowledge Graph has 500 million entities and over 3.5 billion relationships between those entities
Prioritize properties based on what users were most
15. Facebook's Open Graph Protocol
The 'Like' button provides publishers with a way to promote their content on Facebook and build communities
Shows up in profiles and news feed
Site owners can later reach users who have liked an object
Facebook Graph API allows 3rd party developers to access the data
Open Graph Protocol is an RDFa-based format that allows publishers to describe the object that the user 'Likes'
16. Facebook's Open Graph Protocol
RDF vocabulary to be used in conjunction with RDFa
Simplify the work of developers by restricting the freedom in RDFa
Activities, Businesses, Groups, Organizations, People, Places, Products and Entertainment
Only HTML <head> accepted
http://opengraphprotocol.org/
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" /> …
</head> ...
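A consumer of this markup only needs to read the `og:` properties from the `<meta>` elements. The following Python sketch does that with the standard library's `html.parser`. The class name `OpenGraphParser`, the function `extract_open_graph`, and the trimmed `SAMPLE` page are assumptions for illustration, not Facebook's implementation.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta property=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and "content" in attr_map:
            self.properties[prop] = attr_map["content"]


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in the given HTML."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Trimmed version of the slide's example page, used only for demonstration.
SAMPLE = """<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
</head>
</html>"""
```

Calling `extract_open_graph(SAMPLE)` yields the title, type and URL as a flat dictionary, which reflects how the protocol restricts RDFa to simple property/value pairs.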
17. Semantic Web markup: schema.org
Agreement on a shared set of schemas for common types of web content
Use a single format to communicate the same information to all three search engines
Bing, Google, and Yahoo! (June, 2011), Yandex (Nov, 2011)
Microdata and RDFa support
Schemas for most common web content
Business listings, images/video, recipes, reviews, products, jobs…
Community
public-vocabs@w3.org
19. Current state of metadata on the Web
Analysis of the Bing/Yahoo! Search Crawl
US crawl, January, 2012
31% of webpages, 5% of domains contain some metadata
P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
WebDataCommons.org
Data extracted from a public crawl (commoncrawl.org)
February, 2012 results show 11% of URLs with metadata compared to 5% in 2009/2010 data
7.3 billion triples available for download
H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012
Large increase in RDFa and microdata adoption compared to microformats
20. Where can you find HTML metadata?
Web Data Commons
Glimmer: glimmer.research.yahoo.com
Online index of the schema.org data in Web Data Commons
22. Innovative Semantic Search Applications
Entity search: entity/entities as results
Factual search: direct answers, facts (about entities)
Relational search: complex relationships between entities
Semantic auto-completion: suggesting queries based on the intent of the provided inputs
Results aggregation / analysis / prediction: apply computational models
Semantic log analysis: understanding user behavior in terms of objects
Semantic profiling: recommendations based on particular interests
Semantic context: contextual model of users / interests
Support for complex tasks, e.g. booking a vacation using a combination of services
Conversational search
31. Contextual (pervasive, ambient) search
Yahoo! Connected TV: widget engine embedded into the TV
Yahoo! IntoNow: recognize audio and show related content
32. Interactive Voice Search
Siri
Question-Answering
Variety of backend sources
including Wolfram Alpha and
various Yahoo! services
Task completion
E.g. schedule an event
34. Conversational Search
Parlance EU project
Complex dialogs around a set of objects
Restaurant
Area
Price range
Type of cuisine
Complete system
Automated Speech Recognition (ASR)
Spoken Language Understanding (SLU)
Interaction Management
Knowledge Base
Natural Language Generation (NLG)
Text-to-Speech (TTS)
Video
Commercial alternatives from Nuance
36. Main Technological Building Blocks
Query Interpretation
Spelling Correction
Query Segmentation
Entity Recognition
Query Intent Interpretation for Semantic Auto-Completion
Ranking
Entity Ranking
Relationship Ranking
Aggregation
Result Fusion
Rank / Score Aggregation
Result Presentation
Summary Generation
Visualization
37. Semsolute’s Building Blocks - Keyword / Key Phrase Interpretation
Diagram labels: Entity; example query “address company san francisco”
Semantic entity index
Inverted index for entities / triples
Return entities / entities’ relationships as results to keys
Semantic entity ranking
Structured language model: one language model for every attribute
Returns entities’ LMs that most likely generate the keywords, i.e. the entity descriptions that best match
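The two ideas on this slide, an inverted index over entity descriptions and a ranking with one language model per attribute, can be sketched as follows. This is an illustrative toy, not Semsolute's implementation: the triples, the equal attribute weights, and the Jelinek-Mercer smoothing parameter lam are all assumptions, and query terms that never occur in the data are simply skipped.

```python
import math
from collections import Counter

# Hypothetical toy data: (entity, attribute, literal value) triples.
TRIPLES = [
    ("acme", "name", "Acme Company"),
    ("acme", "address", "1 Market Street San Francisco"),
    ("bolt", "name", "Bolt Company"),
    ("bolt", "address", "5 Main Street New York"),
    ("sf", "name", "San Francisco"),
]
WEIGHTS = {"name": 0.5, "address": 0.5}  # assumed attribute weights, summing to 1


def tokenize(text):
    return text.lower().split()


def build_index(triples):
    postings = {}           # term -> set of entity ids (the inverted index)
    fields = {}             # entity -> attribute -> Counter of terms
    collection = Counter()  # term counts over all values (background model)
    for entity, attribute, value in triples:
        terms = tokenize(value)
        collection.update(terms)
        fields.setdefault(entity, {}).setdefault(attribute, Counter()).update(terms)
        for term in terms:
            postings.setdefault(term, set()).add(entity)
    return postings, fields, collection


def log_likelihood(query_terms, entity_fields, collection, weights, lam=0.5):
    """log P(query | entity) under a weighted mixture of per-attribute LMs."""
    total = sum(collection.values())
    score = 0.0
    for term in query_terms:
        background = collection[term] / total
        if background == 0:
            continue  # term never seen in the data: skipped in this sketch
        p = 0.0
        for attribute, weight in weights.items():
            counts = entity_fields.get(attribute, Counter())
            length = sum(counts.values())
            ml = counts[term] / length if length else 0.0
            p += weight * ((1 - lam) * ml + lam * background)
        score += math.log(p)
    return score


def search(query, postings, fields, collection, weights=WEIGHTS):
    terms = tokenize(query)
    # Candidate entities come from the inverted index.
    candidates = set().union(*(postings.get(t, set()) for t in terms))
    ranked = [(log_likelihood(terms, fields[e], collection, weights), e)
              for e in candidates]
    return [e for _, e in sorted(ranked, reverse=True)]


if __name__ == "__main__":
    postings, fields, collection = build_index(TRIPLES)
    print(search("address company san francisco", postings, fields, collection))
```

In this toy data the keyword "address" matches no literal value, so it does not affect the score; on the later slides such keywords are matched against relationships and structure instead.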
38. Semsolute’s Building Blocks – Semantic Graph Construction
Diagram labels: Relationships / Structure; Entity; example query “address company san francisco”
Offline component: query-independent schema graph
Reuse schema
Pseudo-schema construction: all possible connections between classes of entities, e.g. friendships between users
Online component: query-specific keyword matching elements
Connect keyword matching elements / entities to the classes they belong to
39. Semsolute’s Building Blocks – Graph Exploration
Diagram labels: Relationships / Structure; Entity; example query “address company san francisco”
Top-k graph exploration
Shortest-path based algorithm that finds top-k graphs connecting keyword matching elements
Top-k graph ranking
Language model based
Aggregated model that combines the LMs of entities matching the keywords
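A much simplified sketch of shortest-path based exploration is shown below. It is not the top-k procedure used in the system: it runs a breadth-first search from every keyword-matching element over an undirected, unweighted schema graph, treats each node reachable from all keywords as a candidate connecting root, and ranks candidates by total path length instead of by a language model. The schema edges and the mapping of keywords to the classes Address, Company and City are assumptions made for the example.

```python
import heapq
from collections import deque

# Hypothetical schema graph: (class, relationship, class).
SCHEMA_EDGES = [
    ("Company", "address", "Address"),
    ("Company", "locatedIn", "City"),
    ("Person", "worksFor", "Company"),
    ("Person", "livesIn", "City"),
]


def build_adjacency(edges):
    """Undirected adjacency sets; edge labels are ignored in this sketch."""
    adj = {}
    for source, _, target in edges:
        adj.setdefault(source, set()).add(target)
        adj.setdefault(target, set()).add(source)
    return adj


def shortest_paths(adj, start):
    """Breadth-first search: map each reachable node to one shortest path from start."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(adj.get(node, ())):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths


def top_k_connections(adj, keyword_nodes, k):
    """Return up to k (cost, root, nodes) tuples, cheapest first."""
    per_keyword = [shortest_paths(adj, node) for node in keyword_nodes]
    candidates = []
    for root in adj:
        if all(root in paths for paths in per_keyword):
            cost = sum(len(paths[root]) - 1 for paths in per_keyword)
            nodes = set().union(*(paths[root] for paths in per_keyword))
            candidates.append((cost, root, sorted(nodes)))
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    adjacency = build_adjacency(SCHEMA_EDGES)
    # Assumed keyword matches: "address" -> Address, "company" -> Company,
    # "san francisco" -> an entity of class City.
    for result in top_k_connections(adjacency, ["Address", "Company", "City"], k=2):
        print(result)
```

Different roots can yield the same set of nodes, so a fuller implementation would also deduplicate the resulting graphs.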
40. Semsolute’s Building Blocks – Query Generation & Processing
Diagram labels: Triple; Relationships / Structure; Entity; question “Address of companies located in San Francisco?”; example query “address company san francisco”
Graph to query mapping
Translation rules that map top ranked graphs to structured queries (SQL, SPARQL)
Translation rules that map structured queries to natural language questions
Graph matching
Triple index: cover index supporting different triple patterns
Various join implementations
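As a rough illustration of both steps, the sketch below renders a connecting graph as a SPARQL query string and evaluates the same triple patterns against a small in-memory triple index with a nested-loop join. Everything here is an assumption for illustration: the example.org prefix, the property names locatedIn and address, the toy triples, and the choice of three index permutations (SPO, POS, OSP); the slides do not specify the actual index layout or join algorithms.

```python
def to_sparql(select_vars, patterns, prefix="http://example.org/"):
    """Render triple patterns as a SPARQL SELECT query (hypothetical prefix)."""
    lines = ["PREFIX : <{}>".format(prefix),
             "SELECT {} WHERE {{".format(" ".join(select_vars))]
    lines += ["  {} {} {} .".format(*pattern) for pattern in patterns]
    lines.append("}")
    return "\n".join(lines)


class TripleIndex:
    """Three nested-dict permutations (SPO, POS, OSP) over the same triples."""

    def __init__(self, triples):
        self.spo, self.pos, self.osp = {}, {}, {}
        for s, p, o in triples:
            self.spo.setdefault(s, {}).setdefault(p, set()).add(o)
            self.pos.setdefault(p, {}).setdefault(o, set()).add(s)
            self.osp.setdefault(o, {}).setdefault(s, set()).add(p)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            return [(s, p2, o2)
                    for p2, objects in self.spo.get(s, {}).items() if p in (None, p2)
                    for o2 in objects if o in (None, o2)]
        if p is not None:
            return [(s2, p, o2)
                    for o2, subjects in self.pos.get(p, {}).items() if o in (None, o2)
                    for s2 in subjects]
        if o is not None:
            return [(s2, p2, o)
                    for s2, predicates in self.osp.get(o, {}).items()
                    for p2 in predicates]
        return [(s2, p2, o2)
                for s2, by_predicate in self.spo.items()
                for p2, objects in by_predicate.items()
                for o2 in objects]


def addresses_of_companies_in(index, city):
    """Nested-loop join of (?c locatedIn city) with (?c address ?a)."""
    results = []
    for company, _, _ in index.match(p="locatedIn", o=city):
        for _, _, address in index.match(s=company, p="address"):
            results.append((company, address))
    return results


# Hypothetical toy data.
TOY_TRIPLES = [
    ("acme", "locatedIn", "San Francisco"),
    ("acme", "address", "1 Market Street"),
    ("bolt", "locatedIn", "New York"),
    ("bolt", "address", "5 Main Street"),
]

if __name__ == "__main__":
    print(to_sparql(["?a"], [("?c", ":locatedIn", '"San Francisco"'),
                             ("?c", ":address", "?a")]))
    print(addresses_of_companies_in(TripleIndex(TOY_TRIPLES), "San Francisco"))
```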
41. Yahoo! Spark: Entity Recommendation in
Search
Different use cases in Web Search
Some users are short on time
Need direct answers
Query expansion, question-answering, information boxes, rich
results…
Other users want to explore
Long term interests such as sports, celebrities, movies and music
Long running tasks such as travel planning
Spark is a search assistance tool for exploration
Recommend related entities given the user’s current
query
Based on explicit relations in a Knowledge Base
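The idea of recommending related entities from explicit knowledge-base relations can be sketched in a few lines. This is only an illustration and not how Spark works internally: candidates are the direct neighbors of the query entity in a tiny knowledge base, and they are ordered by a single assumed feature, the number of (hypothetical) search sessions in which they co-occur with the query entity.

```python
from collections import Counter

# Small knowledge base of explicit relations about the 1996 film "The Rock".
KNOWLEDGE_BASE = [
    ("The Rock", "starring", "Sean Connery"),
    ("The Rock", "starring", "Nicolas Cage"),
    ("The Rock", "director", "Michael Bay"),
]

# Hypothetical query-log sessions, each a list of entities searched together.
SESSIONS = [
    ["The Rock", "Nicolas Cage"],
    ["The Rock", "Nicolas Cage", "Michael Bay"],
    ["The Rock", "Sean Connery"],
    ["Sean Connery", "Michael Bay"],
]


def recommend(entity, knowledge_base, sessions, k=3):
    """Return up to k related entities, most frequently co-searched first."""
    related = {o for s, _, o in knowledge_base if s == entity}
    related |= {s for s, _, o in knowledge_base if o == entity}
    co_occurrence = Counter()
    for session in sessions:
        if entity in session:
            co_occurrence.update(e for e in set(session) if e in related)
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(related, key=lambda e: (-co_occurrence[e], e))[:k]


if __name__ == "__main__":
    print(recommend("The Rock", KNOWLEDGE_BASE, SESSIONS))
```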
46. Spark challenges
Interpretation and disambiguation
Obama and Toyota are places in Japan, but maybe
the user is not looking for them
The popularity of “obama” is not a sign of the
popularity of a Japanese town
Ranking
“Release Me” by Engelbert Humperdinck should
rank higher than “Lesbian Seagull”, which only
appeared on the soundtrack of a Beavis and
Butt-Head episode
Editorial relevance vs. what people click
Large-scale data processing and ML
Knowledge Base built from Wikipedia, Yahoo!
data, Web extraction
Feature extraction from query logs, Flickr and Twitter
data
Diagram labels: Entity graph; Data preprocessing; Feature extraction; Model learning; Feature sources; Editorial judgements; Datapack; Ranking model; Ranking and disambiguation; Entity data; Features
47. Contact
Peter Mika
pmika@yahoo-inc.com
@pmika
Tran Duc Thanh
thanh.tran@semsolute.com
49. Resources
Detailed information
Peter Mika. Entity Search on the Web, Keynote at Web of
Linked Entities WS
Peter Mika, Thanh Tran. Semantic search tutorial
SemTech2012
Books
Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern
Information Retrieval. ACM Press. 2011
Survey papers
Thanh Tran, Peter Mika. Survey of Semantic Search
Approaches. Under submission, 2012.
Conferences and workshops
ISWC, ESWC, WWW, SIGIR, CIKM, SemTech
Semantic Search workshop series
Exploiting Semantic Annotations in Information Retrieval
(ESAIR)
Entity-oriented Search (EOS) workshop
Web of Linked Entities (WoLE) workshop
Editor's Notes
Mobile: Google interactive voice search (conversation), Siri (Peter)
Facebook’s Graph Search (Thanh)
Knowledge Graph (infoboxes)... entity search (“tom cruise actor”) to list/category queries (“tom cruise spouses”) to question-answering (“tom cruise height”) (Thanh)
Spark (Yahoo!): related entity recommendation (Peter)
Thanh’s search engine: auto-complete based on the schema/data, entity search to relational search using Yago data (Thanh)
Glimmer: RDF search engine (Peter)
Semantic search can be seen as a retrieval paradigm that is centered on the use of semantics. If a system incorporates the semantics entailed by the query and (or) the resources into the matching process, it essentially performs semantic search.
Facebook invited, but continues to pursue OGP
We implemented the search paradigms and integrated them as separate search modules into a demonstrator system of the Information Workbench, which has been developed as a showcase for interaction with the Web of data. In particular, keyword search is implemented according to the design and technologies employed by standard Semantic Web search engines. Like Sindice and FalconS, we use an inverted index to store and retrieve RDF resources based on terms. Faceted search also uses the inverted index and is implemented based on the techniques discussed in [25]. Result completion is based on recent work discussed for the TASTIER system [8]. For computing join graphs, we use the top-k procedure elaborated in [9]. This technique is also used for computing top-k interpretations, i.e. to support query completion. We choose to display the top-6 queries and the top-25 results, respectively.