As a vendor to content producers of all kinds, shapes, and sizes, here is what I want you to believe…
Data Harmony introduces the Deprecated status with the 3.14 release. Deprecated terms are used within thesauri to denote outdated or regressed terminology. Examples include historical geographic states (e.g. “Soviet Union”), outdated medical terminology (e.g. “Bright’s Disease,” formerly used to classify “Nephritis” of the kidneys), and outdated expressions that the taxonomy team wishes to retain as terms rather than remove and re-add as synonyms. Indexers and archivists may want to preserve the outdated terminology for historical or legacy purposes, so we added an option to include these terms yet limit the way they affect the indexing of future content.
Functionally, the Deprecated status acts similarly to the Candidate status. Both have dedicated views in Thesaurus Master so that taxonomists can “resolve” the term within the vocabulary. Both have MAI options to ignore or include the terms in the MAI suggested-term postings, and both can be changed with the click of a button in the term record pane.
Finally, we added support for importing and exporting terms with the Deprecated status. Terms with a Deprecated status, however, will not have identity rules created for them on import. Additionally, once a term is changed to Deprecated, users will not be allowed to edit or add rules that include a USE statement for the deprecated term. These restrictions prompt taxonomy developers and managers to resolve the future indexing of concepts that previously relied on the outdated vocabulary and must now direct to a new term for automated indexing.
#1 on the upper left shows the radio button to change a term’s status in the thesaurus. To change the status, simply click Deprecated and the term record is saved as Deprecated.
#2 displays the Deprecated Terms view, listing each term with the deprecated status. This will allow users to sort through a list of deprecated terms and resolve each of their rules to redirect to another term or to remove a rule outright.
#3 shows the validation step in which a user attempts to save or add a rule which includes a deprecated concept. MAI will instruct the user that saving a rule including a deprecated term is invalid.
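The save-time restriction described above can be pictured with a small sketch. This is not Data Harmony code: the `Term` and `Rule` classes, the `validate_rule` function, and the status strings are hypothetical names chosen for illustration. The only behavior assumed is the one stated in the text, namely that a rule whose USE statement names a Deprecated term is rejected.

```python
from dataclasses import dataclass, field

# Hypothetical status values; the product's internal names may differ.
ACCEPTED = "Accepted"
DEPRECATED = "Deprecated"


@dataclass
class Term:
    label: str
    status: str = ACCEPTED


@dataclass
class Rule:
    text_to_match: str
    use_terms: list = field(default_factory=list)  # labels named in the USE statement


def validate_rule(rule, thesaurus):
    """Return a list of error messages; an empty list means the rule may be saved."""
    errors = []
    for label in rule.use_terms:
        term = thesaurus.get(label)
        if term is not None and term.status == DEPRECATED:
            errors.append(f"USE {label}: term is Deprecated; redirect the rule to a current term")
    return errors


thesaurus = {
    "Nephritis": Term("Nephritis"),
    "Bright's Disease": Term("Bright's Disease", status=DEPRECATED),
}

ok_rule = Rule("bright's disease", use_terms=["Nephritis"])
bad_rule = Rule("bright's disease", use_terms=["Bright's Disease"])

print(validate_rule(ok_rule, thesaurus))   # no errors: rule points at a current term
print(validate_rule(bad_rule, thesaurus))  # one error: rule points at a Deprecated term
```

In this sketch the outdated wording still matches in the text, but the rule must direct to the current term, which mirrors the intent of the restriction.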
We have added multiple enhancements and rolled several smaller API calls into the suggestTerms API. This gives us and our users more streamlined options for calling on the Data Harmony suite to index content.
Data Harmony now fully supports JSON calls alongside XML output.
Weighting, or boosting, can now be performed at the section level of a document; it applies a multiplier to the weights of the suggested terms returned. For instance, terms discovered in the title would be weighted more heavily than terms in the body if the title's weighting were raised.
Suggest Terms can specify the maximum limit of suggested terms returned without reconfiguring the project from the Admin module.
Fullpath can specify the ideal-full-path of the descriptor as it appears in the taxonomy.
Highlighting options provide inline tagging of the descriptors within the text.
Here’s an example of how the boosting would appear in an API.
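A minimal sketch of section-level boosting follows, under the assumption that each section's term hits are multiplied by that section's boost and then summed. The function name `boosted_scores`, the section names, the hit counts, and the scoring formula are all illustrative assumptions, not the suggestTerms implementation.

```python
from collections import defaultdict


def boosted_scores(section_hits, boosts, default_boost=1.0):
    """Sum per-section hit counts for each term, scaled by the section's boost."""
    scores = defaultdict(float)
    for section, hits in section_hits.items():
        multiplier = boosts.get(section, default_boost)
        for term, count in hits.items():
            scores[term] += count * multiplier
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hit counts for one document.
hits = {
    "title": {"Nephritis": 1},
    "body": {"Nephritis": 1, "Kidneys": 3},
}

print(boosted_scores(hits, boosts={}))              # no boosting: Kidneys ranks first
print(boosted_scores(hits, boosts={"title": 3.0}))  # title weighted 3x: Nephritis ranks first
```

With no boost, the body term with the most hits leads; once the title is boosted, the term that appears in the title overtakes it.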
Special Character Extensions - Special characters such as single quotes, ampersands, and greater-than and less-than symbols have not been allowed in the MAIstro syntax. They are, however, very important in chemical nomenclature and in place names such as Washington, D.C. AI now allows import of most special characters. Apostrophes representing possession are now recognized by the MAI parser. MAI will now correctly parse terms, mainly entity names, containing multiple special characters, including parentheses, commas, and periods. We wrote a best-practices section in the DH User Guide to deal with the variation in the use of periods and commas. As long as neither periods nor commas are followed by whitespace, MAI will correctly parse the text-to-match. Where they are followed by a space, please see the section recommending a change to the padding characters setting in the Data Harmony Administration Module.
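The period-and-comma guideline can be checked mechanically before import. The sketch below is an illustration rather than the MAI parser: `needs_padding_review` is a hypothetical helper that flags a term when a period or comma is followed by whitespace, which is the case the User Guide section addresses.

```python
import re

# A period or comma immediately followed by whitespace.
_PUNCT_THEN_SPACE = re.compile(r"[.,]\s")


def needs_padding_review(term):
    """True if the term contains a period or comma followed by whitespace."""
    return _PUNCT_THEN_SPACE.search(term) is not None


for term in ["Washington, D.C.", "1,2-dichloroethane", "Bright's Disease"]:
    print(term, "->", needs_padding_review(term))
```

Terms flagged this way are candidates for the padding characters setting; the others fall under the case the text says MAI parses correctly.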
Data Harmony Update 2020 final
Data Harmony Update
• Who we are
• What we do
• Introducing Data Harmony 3.14
• New Features
• Introducing Data Harmony 4.0
• New Features
• New Products
Access Innovations, Inc.
What do we do?
Leveraging your content
A Brief History of Access Innovations, Inc.
• Founded in October, 1978 in Margie’s kitchen with 6 original
• Jay Ven Eman hired as employee #1!
• Building bibliographic databases by aggregating information from secondary publishers
• First commercial installation of Apple computers in 1980
Mission and Vision
• To maximize customer information assets, their creation, capture, distribution, and reuse
• Achieve and maintain technical and professional leadership in software and services for content creators
• Closely held
• Financed by
• Sweat and Persistence
• Good Cash Flow and Management
• Since 1978
Marjorie M.K. Hlava
Jay Ven Eman
Woman Owned Small Business
• Metadata Creation and Enhancement
• Semantic Enrichment
• Controlled Vocabulary Development
• Database Design and Construction
• Text, Image, and Database Markup
• Data Capture and Conversion
• Abstracting and Indexing
• Training sets
• Medicinal Plant Names Service (MPNS)
Database Services - 3
• Applications development
• Data Harmony Hosting Environment
• Search – Lucene and Solr
• Search Harmony interface
• Web services layer
• Link to user experience or user interface
• Web calls
• API setup and linking
Database Services - 4
• Analytics from semantics
• Business Intelligence (BI)
• Visualizations for decision makers
• Coverage analytics
• Term mining
• Image indexing
• Fate prediction
• SciGen – No Bad Submissions (No B.S.)
• Data Harmony
• XIS (XML Intranet System)®
• M.A.I.® (Machine Aided Indexer)
• Thesaurus Master ®
• Built for our use starting in 1987
• Visual Basic, C++, Java, Web hosted
• Aid to the editorial and indexing processes
• Alleviate the clerical aspects
• Speed the tagging process
• Guarantee accuracy, consistency, and depth of indexing
• Two patents – 21 granted claims
• Platform independent
• Runs in proprietary "browser"; uses Java in Operating System, not
• APIs, Web services to interact with other apps
• TCP/IP over intra and internets
• SSL option included
• JSON option for API returns
• WebStart or installation app to simplify client installation
• GlassFish and Tomcat for web app extensions
Data Harmony Suite - Main Modules
•XML Intranet System
•Administrative configuration module
•“The Data Harmony Suite”
• Machine Aided Indexing (M.A.I.)
• Semantic, syntactic, morphological, etc. layer
• Rule Builder for users
• Concept Extractor for text
• Statistics for Machine Learning
• Use in automatic, batch, or assisted mode
• Thesaurus Master
• For creating taxonomies, thesauri, ontologies, and authority files
• Thesaurus Master and M.A.I. combined
• A bunch more modules!
•Daily Blog – Melody Smith and the rest of Heather
•3 + items per day
•5 days a week
•Launched in June 2010
• The Human Genome Project lists 22,300 genes
• There are an average of 19 synonyms per gene name
• Bringing these together to auto index to the preferred
• Auto API call to the TaxoGene
• Licensed at $3895 per Year
• 2000 taxonomies listed
• Open access and deposit
• Reuse or update instead of build from scratch
Access Integrity (Ai2)
• Medical Claims Compliance
• Automatic ICD-10 suggestions
• Rules bases for
• Accurate, deep, consistent coding
• Making medical billing efficient
• Based on the patient encounter / physicians notes
3.14 v 1058+
• This means 1058 revisions and improvements since v 3.13
• Lots of little improvements
• A few big new features
• Most increases are in managed services
• New status for thesaurus terms
• Additional view added for terms with deprecated statuses
• Used for legacy indexing
• Rule saves disabled (cannot create new rules for Deprecated Terms)
• Import Options (no default identity rule built on import)
• Projects created prior to 3.14 will not display deprecated terms unless one line in the project configuration file is changed.
• Added ability to import and export terms with deprecated status.
• Setting in Admin module for choosing to skip deprecated terms during M.A.I. (“yes” to skip is the default setting)
1. Deprecated Term Status in the Term Record Pane
2. Deprecated Terms view – Produces an alphabetical listing of all terms with the Deprecated status. Functions similarly to the Candidate Terms view.
3. Saving or changing a rule with a Deprecated term within a USE statement will produce an error, signaling the editor to resolve the term in the rule base or refrain from editing the current rule.
• Can choose to index with deprecated terms as though their statuses were
• A new "Deprecated View" is now listed in the View options (under Candidate
• A term is switched to "deprecated" with a simple click. If it has rules, the editor will
pop up and ask the editor to handle them (either delete or edit to remove the
• If a rule contains a deprecated term it will not validate.
• When importing a new term as deprecated it won't automatically add a new
"identity rule" as we do with other "regular" terms.
• Added support for import and export.
New XIS Applet - MAI-rerun on re-index
• New XIS app declared within the schema to update MAI on all records
Suggested Terms API changes
Format (JSON or XML)
•Weights of terms can be “boosted” depending on
•Number of terms returned
•Allows Full path indexing
New DH APIs and Enhancements
Added multiple options to the suggestTerms API
1. Format (JSON or XML)
2. Boost Weighting of Terms
4. Use fields (to return with MAI terms)
6. Highlight (inlineTagging)
7. Capture (save received data or no)
8. SaveToXis (xisProject, xisDocset, xisUser)
9. Specify maximum number of returns
Added Logging API for every MAI call
example of suggestTerms
"format" : "XML",
"weight" : 3,
"batchLimit" : 1000,
"fields" : [
"saveToXis" : true,
"fullpath" : true,
"hilite" : false,
"xisProject" : "PLOSfilter",
"xisDocset" : "records",
"xisUser" : "editor"}
suggestTerms Weighting (Boost)
By changing the boost value for multiple
fields, we see the MAI suggested returns
in the output are skewed higher towards
terms that appear in highly boosted
sections such as article titles.
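The option names from the suggestTerms example can be assembled into a request body as in the sketch below. Only the key names and the sample values for format, weight, batchLimit, and the XIS settings come from the slide. The `fields` values ("title", "abstract") are placeholders because the original list is not shown, the helper name `build_suggest_terms_options` is hypothetical, and how the body is sent to the server is deliberately left out.

```python
import json


def build_suggest_terms_options(fields, fmt="JSON", weight=1, batch_limit=1000,
                                full_path=False, highlight=False, xis=None):
    """Assemble suggestTerms options using the key names shown on the slide."""
    if fmt not in ("JSON", "XML"):
        raise ValueError("format must be 'JSON' or 'XML'")
    options = {
        "format": fmt,
        "weight": weight,
        "batchLimit": batch_limit,
        "fields": list(fields),
        "saveToXis": xis is not None,
        "fullpath": full_path,
        "hilite": highlight,
    }
    if xis is not None:
        # Assumption: the XIS keys are only needed when saveToXis is true.
        # xis is a (project, docset, user) tuple.
        options["xisProject"], options["xisDocset"], options["xisUser"] = xis
    return options


options = build_suggest_terms_options(
    fields=["title", "abstract"],  # placeholder field names
    fmt="XML",
    weight=3,
    full_path=True,
    xis=("PLOSfilter", "records", "editor"),
)
print(json.dumps(options, indent=2))
```

Keeping the options in one dictionary makes it easy to switch between the XML and JSON formats without touching the rest of the call.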
Special Character Extensions
• Single quotes, ampersands, greater than and less than symbols, etc.
• Formerly not allowed in the MAIstro syntax
• AI now allows import of most special characters
• Apostrophes representing possession are now recognized by the MAI parser.
• MAI will now correctly parse terms, mainly entity names, containing multiple special characters, including parentheses, commas, and periods.
‘ ” & < >
• Wrote a best practices section in the DH User Guide
• As long as periods or commas are not followed by whitespace, MAI will correctly parse the text-to-match.
• Where they are followed by a space, please see the section recommending a change to the padding characters setting in the Data Harmony Administration Module.
•Track how often the MAI server is called with an API
• IP addresses
DH 4.0 – the Dashboard
• Thesaurus Master
• Project Information
• Swift Summ
•New customers welcome
•Need an Upgrade? – see Heather or Jay
UI upgrade coming!
Image: Courtesy AACR and EJPress
Add a box:
• Five Rule bases
• Identifies taxonomic concepts
• Controversial topics
• Suspect science
• Endangered species
• Bad call lines
• Clinical trials
• XIS powers a pre submission filtering application
• Used to help editors quickly review records
• Retains SciGen Analysis and other metadata information
Medicinal Plant Names Service
•From the Royal Botanic Gardens, Kew
•Nearly 28,000 Medicinal Plants
• Full records
• 14.7 synonyms - average
• Know the right name and the actual use
•Offered on subscription as an API call for your data
Knowledge Organization Systems for Commerce
• NKOS, Linked data, academic apps, etc.
• But what about the things businesses use?
• Commerce apps
• Thin data
• Coded lists
• Need words and inferences
• Many applications in commerce
• Enabling search
• Enabling transactions
• Enabling purchase
• Use case
• How to index / tag everything
• On an online “store” site, like Amazon, eBay, Walmart, Home Depot, B&H
• Or in-store to enable search on a kiosk
• Or for purchase of services and supplies on a corporate website
• Map to UNSPSC or eCl@ss for corporate transactions
• UNSPSC (United Nations Standard Products and Services Code)
“Ink jet printer”
Other code sets
(Walmart, Target, etc.)
Brick and Mortar
Self-improving workflow that can improve speed and accuracy.
Effective implementation of the master taxonomy
• A well maintained master taxonomy has multiple uses which can increase value
A Knowledge Graph?
Or does it have to be RDF triples?
Certainly could be converted
Thesaurus Master with Knowledge
URL Linking enabling a deeper ontological
understanding of your metadata
Knowledge Graph Linking
• Thesaurus Master will now link to outside knowledge stores
• Mayo Clinic
• Also allow arbitrary knowledge stores
• In-house wikis
The Power of Knowledge Graphs
• The taxonomic motivation for knowledge graphs
• Mainly describes real world entities and their interrelations, organized in a
• Defines possible classes and relations of entities in a schema
• Allows for potentially interrelating arbitrary entities with each other
• Covers nearly all topical domains
• Use-case motivations
• Named-entity disambiguation
• SPARQL Query integration
• Automated NLP algorithms that read text changes in the graph and produce structured knowledge extracted from that text.
• Truth maintenance for all inferred knowledge, regardless of source, so that revisions keep the graph consistent with itself.
• Knowledge graph integration will include API Integration
• Allow access to graph relationships
• SPARQL Queries
• Truth relationships
• NLP (MAI) access to the graphs
• Subgraph associations as well
• When this is useful for an organization
• Curation of the knowledge store
• Semantic Extract, Transform, and Load
• On Demand Load
• Custom Views
• Enhanced search in the taxonomy
• Custom term inferences
• Rule refinement