Semantic Web Identity is the condition in which search engines understand the existence and nature of entities. Most academic organizations are not very well understood by search engines, and Wikipedia and Wikidata can help search engines understand an organization’s business, which helps drive users to the organization. A key concept in this presentation: what are the risks in not participating in how your institution’s identity is shaped on the web?
My User is a Machine: Semantic Web Identity for Academic Organizations
1. My User is a Machine: Semantic Web Identity for Academic Organizations
Kenning Arlitsch, PhD, MLIS
Dean of the Library
ALA Conversation Starter
June 24, 2018
2. We Think Our Users Are Human…
• But machines mediate human information seeking behavior
• We don’t feed the machines well enough
3. Semantic Web Identity (SWI)
• The condition in which Internet search engines recognize the existence and nature of entities (e.g., academic organizations)
• Happens when a search engine has gathered enough verifiable facts about an entity for a formal display of that entity in the search engine results page (SERP)
• Knowledge Graph Card is an indicator of SWI
11. ARL: The (Marketing) Problem of Names
• 125 ARL member libraries
• Every library has a primary (official) name
• http://www.arl.org/membership/list-of-arl-members
• 94 libraries also have alternate names
• Example:
• Yale University Library = primary name
• Sterling Memorial Library = alternate name
12. Table plot showing that ARL library alternate names (column 1, orange rows) were more likely to display an accurate Knowledge Card (KC) (column 2, green rows)
13. Records for ARL members in Knowledge Bases
Knowledge Base            Primary (% of 125)    Alternate (% of 94)    Total (% of 219)
Google My Business        22%                   43%                    31%
Google Plus (verified)    18%                   20%                    19%
Wikipedia (w/infobox)     24%                   28%                    26%
DBpedia                   24%                   41%                    32%
Wikidata                  21%                   39%                    29%
24. Broader Impact
• More than the tactical approach
• Strategy of delivering machine-comprehensible data to search engines (SE)
• That’s where the users are
• Traditional outreach/marketing practices don't work on the Semantic Web (SW)
• Inconsistent use of names
• Lack of explicit “same as” declarations for machine comprehension
• Opportunities
• Develop cohesive marketing strategies and consistent processes
• Expand skill sets of library faculty and staff
• Offer SWI services to campus constituents
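One common way to make an explicit "same as" declaration is schema.org markup embedded in the organization's web page. The sketch below is only an illustration in Python, not a method described in the presentation: the library name, alternate name, URLs, and Wikidata identifier are all hypothetical placeholders, and `to_json_ld_script` is a helper invented for this example. It ties a primary name and an alternate name to the same entity and points to matching Wikipedia and Wikidata records.

```python
import json

# All names, URLs, and identifiers below are hypothetical placeholders.
library = {
    "@context": "https://schema.org",
    "@type": "Library",
    "name": "Example University Library",           # primary (official) name
    "alternateName": ["Example Memorial Library"],  # alternate name for the same entity
    "url": "https://library.example.edu/",
    "parentOrganization": {
        "@type": "CollegeOrUniversity",
        "name": "Example University",
    },
    # Explicit "same as" declarations linking to records in open knowledge bases.
    "sameAs": [
        "https://en.wikipedia.org/wiki/Example_University_Library",
        "https://www.wikidata.org/entity/Q00000000",
    ],
}


def to_json_ld_script(entity: dict) -> str:
    """Wrap an entity description in the script tag used to embed JSON-LD in HTML."""
    payload = json.dumps(entity, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_json_ld_script(library))
```

Keeping both names in one record, rather than describing them on separate pages, is what lets a machine treat them as one thing instead of two unrelated strings.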
Editor's Notes
Knowledge Graph Card is a product of Google's Knowledge Graph, a graph database that gathers information and helps inform search engine results. KC began appearing in 2012.
Builds on SEO research I have been conducting with a colleague since 2009.
There is evidence in the literature that Google draws from a variety of sources to populate its Knowledge Graph. Some of these sources are proprietary (Google My Business, or GMB, and Google Plus, or G+) and some are open knowledge bases in the Linked Open Data (LOD) cloud.
GMB – recommended by Google itself
Wikipedia – Google lists Wikipedia as the source of the textual description field on the KC
DBpedia – represents Wikipedia information as linked data
Wikidata – Google historically drew much information from its own Freebase, but chose to migrate that data into Wikidata to support this LOD source.
I generated table plots from R to help visualize some of the findings. A table plot compares two or more columns of spreadsheet data. Each row of the table plot displays a library name.
In this table plot it is easy to see the proportion of primary library names (blue rows) and alternate names (orange rows). It is also easy to see that the alternate library names were more likely to display an accurate KC, as indicated by the green rows in the second column.
This table shows that the percentage of library primary and alternate names that have records or profiles in the knowledge bases is low, although it's a bit higher for the alternate names. This fits the pattern of alternate names being more likely to display KC.
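The Total column in the slide 13 table is a weighted combination of the other two columns, because there are 125 primary names and 94 alternate names (219 in all). A minimal Python sketch of that arithmetic follows. It works only from the rounded percentages on the slide, so its estimates can differ from the reported totals by up to about a point. The function and variable names are illustrative.

```python
PRIMARY_N = 125    # ARL primary (official) library names
ALTERNATE_N = 94   # ARL alternate library names

# (primary %, alternate %, reported total %) as shown on slide 13
reported = {
    "Google My Business": (22, 43, 31),
    "Google Plus (verified)": (18, 20, 19),
    "Wikipedia (w/infobox)": (24, 28, 26),
    "DBpedia": (24, 41, 32),
    "Wikidata": (21, 39, 29),
}


def combined_percent(primary_pct: float, alternate_pct: float) -> float:
    """Estimate the share of all 219 names that have a record."""
    primary_count = primary_pct / 100 * PRIMARY_N
    alternate_count = alternate_pct / 100 * ALTERNATE_N
    return 100 * (primary_count + alternate_count) / (PRIMARY_N + ALTERNATE_N)


for name, (primary_pct, alternate_pct, total_pct) in reported.items():
    estimate = combined_percent(primary_pct, alternate_pct)
    print(f"{name}: estimated {estimate:.1f}%, reported {total_pct}%")
```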
Search results are not the only goal, but search results are:
a highly visible goal
easy to communicate to partners – clear to see shortcomings before and benefits after
clear to emphasize that SWI Service is not promising search ranking results
Rather aiming to populate linked data sources with content that facilitates a more accurate and authoritative machine comprehension of the entity, which in turn increases the visibility of the entity for human consumption/action
Pre-SWI Service screenshot - understood as a string, but not a thing
Sparse, Ambiguous, Loosely Related Search Results
Discrete Data - functioning within standalone systems that do not readily communicate
Buried Treasure - some information is discoverable, much information is hidden
(In)Visibility - ambiguous entity with sparse data
ENCYCLOPEDIC INFORMATION
These case studies provide examples of two service models
In both cases MSU Library is providing expertise in writing for Wikipedia, creating Wikipedia entries, and curating entries
JJCBE –
Greater personnel/time resources
Highly collaborative Wikipedia drafting/editing process
Template from MSU Library
JJCBE writes
JJCBE shares
MSU Library edits
JJCBE revises
JJCBE re-shares
MSU Library edits
Etc. until ready for Wikipedia
MSU Library
Honors College –
Fewer personnel/time resources
Independent Wikipedia drafting/editing process
Template from MSU Library
MSU Library writes, edits, publishes
DATABASE INFORMATION
STRUCTURED DATA STATEMENTS
We could wait for a bot to generate the Wikidata record, but there may be significant lag time. Wikidata records have demonstrated benefits, so we decided to create the Wikidata record ourselves upon publication of the Wikipedia entry, which is more efficient and expedient.
Statements about the entity with thorough referencing.
Essentially creating connections among various nodes in the LOD universe.
DBpedia - extracts structured content from the information created in Wikipedia
Hub for connecting various datasets.
Cannot create content within DBpedia, but it is important to monitor what content is or is not within the DBpedia dataset (that is, what is being extracted from Wikipedia) for download or interlinking with other databases
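One way to monitor what DBpedia has extracted is to list the properties it holds for an entity through its public SPARQL endpoint. The sketch below only builds the query text and does not send it. It assumes the usual DBpedia convention that a resource is named after its English Wikipedia article title with spaces replaced by underscores. The article title is a hypothetical placeholder, both function names are invented for this example, and titles with special characters may need extra encoding that is not attempted here.

```python
DBPEDIA_SPARQL_ENDPOINT = "https://dbpedia.org/sparql"  # public query endpoint


def dbpedia_resource_uri(wikipedia_title: str) -> str:
    """Build a DBpedia resource URI from an English Wikipedia article title.

    Simplification: only spaces are handled (converted to underscores).
    """
    return "http://dbpedia.org/resource/" + wikipedia_title.strip().replace(" ", "_")


def property_listing_query(wikipedia_title: str, limit: int = 200) -> str:
    """Return a SPARQL query listing the property/value pairs held for the entity."""
    uri = dbpedia_resource_uri(wikipedia_title)
    return (
        "SELECT ?property ?value WHERE {\n"
        f"  <{uri}> ?property ?value .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Hypothetical article title; substitute the title of the real Wikipedia entry.
print(property_listing_query("Example University Library"))
```

The printed query can be pasted into the endpoint's web form. Running it periodically and comparing the results over time would show which Wikipedia infobox facts have, or have not, made it into the DBpedia dataset.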
Search Results -- Not the only goal, but certainly a clear (and highly visible) indicator of success.
Very robust and mature Knowledge Cards for JJCBE and Honors
Name
Images
Geolocation
Action Buttons (website, directions, call, etc.)
Wikipedia snippet
Address
Phone
Contextual Information (founded, academic staff, affiliation, etc.)
Reviews (from Google+ and Facebook)
Alternative contextual search topics
After Semantic Web Optimization
Web Location with sitemapping
Physical Location
Pertinent business information
And for that matter, the type of business
Description of entity pulled in from Wikipedia
Verified Social Media profiles
oh, and some not-yet-verified (by Google) Social Media profiles
link to the full Wikipedia entry
and, just in case you mistyped or are looking for something else but couldn’t quite recall the name, here are some contextually appropriate entities you may be searching for
The suggestions provided here are not based upon browser history but rather on the relationships between the various entities. In short, linked data and machine learning.
Interoperability - understood and operates within various systems
Discoverability - readily findable
Visibility - data and entity both prominently visible and available
Authority - knowing the entity we are looking at is the entity we are searching for
Use this data to create quarterly reports to share with partners.
Suite of web-based tools to measure web traffic and user activities.
Also developing methods for measuring correlation between SWI Service and metrics of strategic importance to MSU partners.
When comparing data from one week during June 2016 (AFTER SWI) to data from the same period in 2015 (BEFORE SWI), all analytical measures indicate the Semantic Web Identity Enhancement Project has produced favorable website activity for JJCBE. We see similar data for other partners involved in the project.
SESSIONS + 20%
Total number of Sessions within the date range. A session is the period of time a user is actively engaged with your website, app, etc. All usage data (Screen Views, Events, Ecommerce, etc.) is associated with a session.
USERS + 25%
Users that have had at least one session within the selected date range. Includes both new and returning users.
PAGEVIEWS + 63%
Pageviews is the total number of pages viewed. Repeated views of a single page are counted.
PAGES / SESSION + 35%
Pages/Session (Average Page Depth) is the average number of pages viewed during a session. Repeated views of a single page are counted.
AVG SESSION DURATION + 18%
The average length of a Session.
BOUNCE RATE - 2%
Bounce Rate is the percentage of single-page visits (i.e. visits in which the person left your site from the entrance page without interacting with the page).
% NEW SESSIONS + 2%
An estimate of the percentage of first time visits.
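These figures can be cross-checked against each other. Pages/Session is Pageviews divided by Sessions, so its change should follow from the changes in those two measures. A minimal Python sketch of that check, using only the rounded percentages reported above (the function name is illustrative):

```python
def ratio_change_percent(numerator_change_pct: float, denominator_change_pct: float) -> float:
    """Percent change of a ratio, given the percent changes of its two parts."""
    growth = (1 + numerator_change_pct / 100) / (1 + denominator_change_pct / 100)
    return (growth - 1) * 100


# Pageviews rose 63% and Sessions rose 20%, so Pages/Session = Pageviews / Sessions
# should rise by roughly (1.63 / 1.20 - 1) * 100 percent.
pages_per_session_change = ratio_change_percent(63, 20)
print(f"Implied change in Pages/Session: {pages_per_session_change:+.1f}%")
```

The implied change lands within a point of the reported +35%, a gap consistent with the inputs themselves being rounded.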
Technical solutions to this problem are interesting, but only viewing the problem of Semantic Web Identity from a technical standpoint misses the greater point. Again, the appearance of a KC is simply an indicator that search engines understand the existence and nature of the entity in question. Librarians have been slow to engage in the Semantic Web and we are suffering the consequences. However, the situation is ripe with opportunities.