This document discusses semantic markup with schema.org as a way to help search engines understand web pages. It describes how schema.org was created as a collaborative effort by major search engines to define a shared set of schemas. This lets publishers mark up their content once, in a consistent way, so that it can be understood by different search engines and applications. The document outlines how schema.org has grown significantly in adoption and detail over time. It also discusses how schema.org builds on Semantic Web standards, and how it can describe the actions that can be taken on a website to help users complete tasks.
Semantic mark-up with schema.org: helping search engines understand the Web
1. Semantic markup with schema.org: helping search engines understand the Web
Presented by Peter Mika, Director of Research, Yahoo Labs | March 26, 2015
5. What can we do?
Improve Information Retrieval
› Harder and harder given the same data
• Exploited term-based relevance models, hyperlink structure and interaction data
• Combination of features using machine learning
• Heavy investment in computational power
– real-time indexing, instant search, datacenters and edge services
Improve the Web
› Make the Web more searchable?
6. The Semantic Web (2001-)
Part of Tim Berners-Lee’s original proposal for the Web
Beginning of a research community
› Formal ontology
› Logical reasoning
› Agents, web services
Rough start in deployment
› Misplaced expectations
› Lack of adoption
7. The Semantic Web, May 2001
“At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.”
(The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
Misplaced expectations?
8. Lack of adoption
Standardization ahead of adoption
› URI, RDF, RDF/XML, RDFa, JSON-LD,
OWL, RIF, SPARQL, OWL-S, POWDER …
Chicken-and-egg problem
› No users/use cases, hence no data
› No data, because no users/use cases
By 2007, some modest progress
› Metadata in HTML: microformats
› Linked Data: simplifying the stack
9. Microsearch internal prototype (2007)
Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)
Conferences he plans to attend and his vacations from his homepage, plus bio events from LinkedIn
Geolocation
10. Yahoo SearchMonkey (2008)
1. Extract structured data
› Semantic Web markup
• Example:
<span property="vcard:city">Santa Clara</span>
<span property="vcard:region">CA</span>
› Information Extraction
2. Presentation
› Fixed presentation templates
• One template per object type
› Applications
• Third-party modules to display data (SearchMonkey)
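The extraction step can be sketched with Python's standard-library `html.parser`. This is a simplified illustration, not a full RDFa processor. It assumes each element carrying a `property` attribute contains plain text only, and it keeps prefixes such as `vcard:` as literal strings instead of resolving them against a declared vocabulary. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collects (property, text) pairs from elements with a `property` attribute.

    Simplifications (assumptions of this sketch): nested markup inside a
    property element is not supported, and prefixes are not resolved.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name of the element being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a property element.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


snippet = ('<span property="vcard:city">Santa Clara</span> '
           '<span property="vcard:region">CA</span>')
print(extract_properties(snippet))
# [('vcard:city', 'Santa Clara'), ('vcard:region', 'CA')]
```

A production extractor would rely on a conforming RDFa parser. The point here is only that explicit markup turns "find the city on this page" into a lookup rather than an information-extraction problem.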
11. Effectiveness of enhanced results
Explicit user feedback
› Side-by-side editorial evaluation (A/B testing)
• Editors are shown a traditional search result and enhanced result for the same page
• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
Implicit user feedback
› Click-through rate analysis
• Long dwell time limit of 100s (Ciemiewicz et al. 2010)
• 15% increase in ‘good’ clicks
› User interaction model
• Enhanced results lead users to relevant documents (IV) even though they are less likely to be clicked than textual results (III)
• Enhanced results effectively reduce bad clicks!
See
› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734
12. Other applications of enhanced results
Google Rich Snippets - June, 2009
› Faceted search for recipes - Feb, 2011
Bing tiles – Feb, 2011
Facebook’s Like button and the Open Graph Protocol (2010)
› Shows up in profiles and news feed
› Site owners can later reach users who have liked an object
Twitter cards (2012)
› More visual/interactive tweets
14. Not just web pages: markup in email
Google Now
Yahoo Search/Mail
Microsoft Cortana
15. Problem!
Each of these applications requires different markup
› Different schemas and syntax
What’s a publisher to do?
› Mark up the same content differently for every consumer
• Time consuming
• Error prone
16. schema.org
Collaborative effort sponsored by large consumers of Web data
› Bing, Google, and Yahoo! as initial founders (June, 2011)
› Yandex joins schema.org in Nov, 2011
Agreement on a shared set of schemas for the Web
› Available at schema.org in HTML and machine-readable formats
› Free to use under W3C Royalty Free terms
21. schema.org structure
Classes
› Each class has a label and a description
› Classes form a class hierarchy
• Multiple inheritance allowed but rare (a class with two super-classes)
Properties
› Each property has a label and description
› Properties have domains and ranges, and may have inverse properties
Datatypes
› Boolean, Date, DateTime etc.
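The class/property structure can be illustrated with a tiny, hand-picked excerpt of the vocabulary. This is a sketch, not the real schema.org data. LocalBusiness is used as the multiple-inheritance example (a subclass of both Organization and Place), and a property's domain is read loosely as "may be used on any listed class or its subclasses". The dictionaries and function names are hypothetical, and the domain sets are deliberately incomplete.

```python
# Illustrative excerpt only; the full vocabulary is published at schema.org.
# Each class maps to its direct superclasses; LocalBusiness has two.
SUPERCLASSES = {
    "Thing": [],
    "Person": ["Thing"],
    "Organization": ["Thing"],
    "Place": ["Thing"],
    "LocalBusiness": ["Organization", "Place"],
    "FoodEstablishment": ["LocalBusiness"],
    "Restaurant": ["FoodEstablishment"],
}

# Property -> classes listed in its domain (a subset, for illustration).
PROPERTY_DOMAINS = {
    "address": {"Person", "Organization", "Place"},
    "servesCuisine": {"FoodEstablishment"},
}


def ancestors(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUPERCLASSES[current])
    return found


def property_applies(prop, cls):
    """True if prop's domain contains cls or one of its superclasses."""
    return bool(PROPERTY_DOMAINS[prop] & ancestors(cls))


print(sorted(ancestors("Restaurant")))
print(property_applies("address", "Restaurant"))    # True, inherited via LocalBusiness
print(property_applies("servesCuisine", "Person"))  # False
```

The traversal keeps a `found` set so that a class reached through two superclasses (as with LocalBusiness) is visited only once.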
22. schema.org usage in practice
Depends on the skillset of the publisher
› Instances are rarely given an identifier, or are identified by the URL of the webpage
› schema.org consumers (validators etc.) are tolerant to mistakes
• e.g. accept text even when an object is required
Driven by applications
› Publishers often provide the minimal information required in a particular context
› Validators (Bing, Google, Yandex) validate different subsets
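As an illustration of what "tolerant to mistakes" can mean in practice, here is one possible lenient policy, sketched under stated assumptions. When a property that expects an object is given plain text, the consumer wraps the text as the `name` of an object of an assumed type. This is not how any particular validator works. The `EXPECTED_OBJECT` mapping and the `normalize` function are hypothetical. Because schema.org allows `author` to be a Person or an Organization, the sketch simply assumes Person.

```python
# Hypothetical policy: the object type to assume when a property that
# expects an object is given plain text instead.
EXPECTED_OBJECT = {"author": "Person", "publisher": "Organization"}


def normalize(item):
    """Return a copy of item in which plain-text values of object-valued
    properties are wrapped as {"@type": <expected type>, "name": <text>}."""
    fixed = dict(item)  # shallow copy; the input is left unchanged
    for prop, expected_type in EXPECTED_OBJECT.items():
        value = fixed.get(prop)
        if isinstance(value, str):
            fixed[prop] = {"@type": expected_type, "name": value}
    return fixed


article = {"@type": "Article", "headline": "Hello", "author": "Jane Doe"}
print(normalize(article)["author"])  # {'@type': 'Person', 'name': 'Jane Doe'}
```

Values that are already objects pass through untouched, so well-formed markup is unaffected by the lenient path.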
23. schema.org statistics
R.V. Guha: Light at the end of the tunnel (ISWC 2013 keynote)
› Over 15% of all pages now have schema.org markup
› Over 5 million sites, over 25 billion entity references
› In other words
• Same order of magnitude as the web
See also
› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
• Based on Bing US corpus
• 31% of webpages, 5% of domains contain some metadata
› WebDataCommons
• Based on CommonCrawl Nov 2013
• 26% of webpages, 14% of domains contain some metadata
24. schema.org process
Process
› Initial release
• Group of experts harmonizing existing vocabularies
› Regular updates based on public discussion
• Fixes
• Extensions
• Deprecation
– almost never
Tooling
› Website (App Engine)
• Open Source
› GitHub
29. schema.org and web standards
schema.org builds on Semantic Web standards
› RDFa, JSON-LD, HTML5 microdata
Not a standardization effort in the classical sense
› Continuously evolving ontology
› Huge scope (‘everything on the Web’)
› Shallow depth compared to more targeted efforts
More specialized discussions typically at more targeted forums
› e.g. W3C Community Groups
Large enumerations and/or rapidly changing knowledge maintained elsewhere
› e.g. PlaceOfWorship
› BuddhistTemple, CatholicChurch, Church, HinduTemple, Mosque, Synagogue …
› Meanwhile over at Wikipedia:
• https://en.wikipedia.org/wiki/Place_of_worship
• https://www.wikidata.org/wiki/Q1370598
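To make the point about syntaxes concrete, the sketch below serializes one flat entity in two of the syntaxes listed on the slide: JSON-LD and HTML5 microdata. Assumptions: entities are flat, with string values only, and the function names are hypothetical. The city/region example from the SearchMonkey slide is recast here with schema.org's PostalAddress properties.

```python
import json
from html import escape


def to_jsonld(entity):
    """Serialize a flat entity as a JSON-LD <script> element."""
    doc = {"@context": "http://schema.org", **entity}
    # Escape "</" so that a value cannot close the <script> element early;
    # "\/" is a valid JSON escape for "/".
    return ('<script type="application/ld+json">'
            + json.dumps(doc).replace("</", "<\\/")
            + "</script>")


def to_microdata(entity):
    """Serialize the same flat entity as HTML5 microdata."""
    props = "".join(
        f'<span itemprop="{escape(name)}">{escape(value)}</span>'
        for name, value in entity.items()
        if name != "@type"
    )
    return (f'<div itemscope itemtype="http://schema.org/{escape(entity["@type"])}">'
            + props + "</div>")


address = {"@type": "PostalAddress", "addressLocality": "Santa Clara", "addressRegion": "CA"}
print(to_jsonld(address))
print(to_microdata(address))
```

Both outputs carry the same schema.org statements. The choice between them is a publishing convenience: JSON-LD keeps the data in a separate block, while microdata (like RDFa) annotates the visible HTML in place.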
35. Task completion
We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› We need to understand what the available actions are
Schema.org Actions
› Describe what actions can be taken on a page/email
› See blog post and overview article
36. Actions
Schema.org v1.2 (April, 2014)
› See blog post and overview article for detail.
› and public-vocabs threads for even more details.
38. Actions example
Here is a Product and a potential action (Buy):
{
  "@context": "http://schema.org",
  "@type": "Product",
  "url": "http://example.com/products/ipod",
  "potentialAction": {
    "@type": "BuyAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://example.com/products/ipod/buy",
      "encodingType": "application/ld+json",
      "contentType": "application/ld+json"
    },
    "result": {
      "@type": "Order",
      "url-output": "required",
      "confirmationNumber-output": "required",
      "orderNumber-output": "required",
      "orderStatus-output": "required"
    }
  }
}
After POSTing the request to the EntryPoint, here is your completed action:
{
  "@context": "http://schema.org",
  "@type": "BuyAction",
  "actionStatus": "CompletedActionStatus",
  "object": "https://example.com/products/ipod",
  "result": {
    "@type": "Order",
    "url": "http://example.com/orders/1199334",
    "confirmationNumber": "1ABBCDDF23234",
    "orderNumber": "1199334",
    "orderStatus": "PROCESSING"
  }
}
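A client consuming this example might check that the completed action actually returns every output that the potential action marked as required. The sketch below makes that check under stated assumptions. It relies on the `-output` annotation convention shown in the example and only looks for a `required` token in the annotation value. The objects are trimmed copies of the two JSON documents in the example, with the EntryPoint omitted. The function names are hypothetical, and the sketch does not perform the POST itself.

```python
def required_outputs(product):
    """Names of properties annotated "<name>-output": "required" in the
    result template of the product's potentialAction."""
    template = product["potentialAction"]["result"]
    suffix = "-output"
    return {
        key[: -len(suffix)]
        for key, value in template.items()
        # The annotation value may hold several space-separated tokens;
        # only the "required" token is checked here.
        if key.endswith(suffix) and "required" in str(value).split()
    }


def missing_outputs(product, completed):
    """Required output properties absent from the completed action's result."""
    return required_outputs(product) - set(completed.get("result", {}))


product = {
    "@type": "Product",
    "url": "http://example.com/products/ipod",
    "potentialAction": {
        "@type": "BuyAction",
        "result": {
            "@type": "Order",
            "url-output": "required",
            "confirmationNumber-output": "required",
            "orderNumber-output": "required",
            "orderStatus-output": "required",
        },
    },
}
completed = {
    "@type": "BuyAction",
    "actionStatus": "CompletedActionStatus",
    "result": {
        "@type": "Order",
        "url": "http://example.com/orders/1199334",
        "confirmationNumber": "1ABBCDDF23234",
        "orderNumber": "1199334",
        "orderStatus": "PROCESSING",
    },
}
print(missing_outputs(product, completed))  # set()
```

An empty set means the completed Order carries everything the potential action promised. A non-empty set names the missing fields, which a client could report instead of silently accepting a partial result.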
41. Q&A
Many thanks to
› The schema.org group and the many contributors to schema.org
› Dan Brickley
Get involved
› Join the discussion at public-vocabs@w3.org
› File a bug, fork a schema, track releases at GitHub
Contact me
› pmika@yahoo-inc.com
› @pmika
› http://www.slideshare.net/pmika/