Data Harmony Update 2020 final

Margie Hlava discusses the latest with Data Harmony, the most advanced human-assisted AI platform on the market.

Published in: Marketing

  1. 1. Data Harmony Update February 2020
  2. 2. Update • Who we are • What we do • Introducing Data Harmony 3.14 • New Features • Introducing Data Harmony 4.0 • New Features • New Products
  3. 3. Access Innovations, Inc. What do we do? Leveraging your content semantically
  4. 4. A Brief History of Access Innovations, Inc. • Founded in October, 1978 in Margie’s kitchen with 6 original partners • Jay Ven Eman hired as employee #1! • Building bibliographic databases by aggregating information from secondary publishers • First commercial installation of Apple computers in 1980
  5. 5. Mission and Vision • MISSION: • To maximize customer information assets, their creation, capture, distribution, and reuse • VISION: • Achieve and maintain technical and professional leadership in software and services for content creators
  6. 6. Corporate Information • Closely held • Financed by • Sweat and Persistence • Good Cash Flow and Management • Since 1978 • Marjorie M.K. Hlava • Jay Ven Eman • Joanna Ginter • Woman Owned Small Business
  7. 7. Some of Our Current Clients IOP
  8. 8. Our Services • Metadata Creation and Enhancement • Semantic Enrichment • Controlled Vocabulary Development • Database Design and Construction • Text, Image, and Database Markup • Data Capture and Conversion • Abstracting and Indexing • Training sets • Medicinal Plant Names Service (MPNS)
  9. 9. Database Services • Database Design • Consulting • DTD / Metadata Schemas • Workflow Analysis / Project Scheduling • Editorial Services • Metadata capture and creation • Tagging – XML, SGML • Abstracting / Indexing • Author disambiguation • Semantic Enrichment
  10. 10. Database Services - 2 • Taxonomy Construction • Thesaurus • Vocabulary • Ontology • Data Linking (linked data) • Authority Files – pick lists • Rule Bases • Semantic Enrichment - automatically • Data Format Conversion • Database Applications • Retrospective metadata tagging • Author disambiguation
  11. 11. Database Services - 3 • Applications development • Data Harmony Hosting Environment • Search – Lucene and Solr • Search Harmony interface • Web services layer • Link to user experience or user interface • Web calls • API setup and linking
  12. 12. Database Services - 4 • Analytics from semantics • Business Intelligence (BI) • Visualizations for decision makers • Coverage analytics • Term mining • Image indexing • Fate prediction • SciGen – No Bad Submissions (No B.S.) • www.accessinn.com
  13. 13. Our Software • Data Harmony • XIS (XML Intranet System)® • M.A.I.® (Machine Aided Indexer) • Thesaurus Master® • MAIstro™ • Data Harmony Suite
  14. 14. Extension Modules – 4.0 • Extension Modules • Search Harmony • DiscoverENT • SentiScore • TopiCluster • SwiftSumm
  15. 15. Managed Services • Inline Tagging • Search Harmony • Semantic Fingerprinting • Smart Submit • TaxoGene • MAIChem • SciGen Detection • Access Integrity – Medical Coding • MeSH Rule Base • NewsIndexer Rule Base • MPNS Tagging • E-commerce mapping
  16. 16. Data Harmony • Built for our use starting in 1987 • Visual Basic C++ Java Web hosted • Aid to the editorial and indexing processes • Alleviate the clerical aspects • Speed the tagging process • Guarantee accuracy, consistency, and depth of indexing • Two patents – 21 granted claims
  17. 17. Data Harmony • Java • Platform independent • Runs in proprietary "browser"; uses Java in Operating System, not browser applets • APIs, Web services to interact with other apps • XML • TCP/IP over intranets and the internet • SSL option included • JSON option for API returns • WebStart or installation app to simplify client installation • GlassFish and Tomcat for web app extensions www.dataharmony.com
  18. 18. Data Harmony Suite - Main Modules •M.A.I. •Thesaurus Master •XIS •XML Intranet System •Administrative configuration module •“The Data Harmony Suite”
  19. 19. Full multilingual display
  20. 20. Data Harmony • Machine Aided Indexing (M.A.I.) • Semantic, syntactic, morphological, etc. layer • Rule Builder for users • Concept Extractor for text • Statistics for Machine Learning • Use in automatic, batch, or assisted mode • Thesaurus Master • For creating taxonomies, thesauri, ontologies, and authority files • MAIstro • Thesaurus Master and M.A.I. combined • AND • A bunch more modules!
  21. 21. TaxoDiary •Daily Blog – Melody Smith and the rest of Heather Kotula’s team •Weekly Feature •3 + items per day •5 days a week •Big archive •Launched in June 2010
  22. 22. TaxoGene • The Human Genome Project lists 22,300 genes • There is an average of 19 synonyms per gene name • Bringing these together to auto-index to the preferred name • Auto API call to TaxoGene • Licensed at $3,895 per year
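The synonym-to-preferred-name step described on this slide can be pictured with a small lookup sketch. This is only an illustration: the two genes and their aliases are a tiny sample, and `SYNONYMS`, `to_preferred`, and `index_text` are hypothetical names, not part of the TaxoGene service or its API.

```python
# Illustrative sample only; a real table would cover thousands of genes.
SYNONYMS = {
    "TP53": ["p53", "LFS1"],
    "ERBB2": ["HER2", "NEU"],
}

# Invert the table into a case-insensitive lookup: any name -> preferred name.
LOOKUP = {}
for preferred, aliases in SYNONYMS.items():
    LOOKUP[preferred.lower()] = preferred
    for alias in aliases:
        LOOKUP[alias.lower()] = preferred


def to_preferred(name):
    """Return the preferred gene name, or None if the name is unknown."""
    return LOOKUP.get(name.strip().lower())


def index_text(text):
    """Return the sorted set of preferred names for gene tokens found in text."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    return sorted({p for p in map(to_preferred, tokens) if p})


print(index_text("Mutations in p53 and HER2, also NEU."))
```

Collapsing every alias onto one preferred name is what lets differently worded records be retrieved under a single index term.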
  23. 23. TaxoBank • 2000 taxonomies listed • Open access and deposit • Terms of use included • Reuse or update instead of build from scratch
  24. 24. Access Integrity (Ai2) • Medical Claims Compliance • Automatic ICD-10 suggestions • Rules bases for • CPT • HCPCS • ICD-10 • Accurate, deep, consistent coding • Making medical billing efficient • Based on the patient encounter / physicians notes
  25. 25. New Releases - 2020 • 3.14 • Deprecated terms • Uber API • 4.0 • DiscoverEnt • SentiScore • TOPiCluster • TermSpy • Swift Summ • Smart Submit • TaxoGene • Kew MPNS Service • E-commerce mapping • Knowledge Graph Linking
  26. 26. New for 3.14 SOFTWARE UPDATE
  27. 27. 3.14 v 1058+ • This means 1058 revisions and improvements since v 3.13 • Lots of little improvements • A few big new features • Most increases are in managed services
  28. 28. Deprecated Terms • New status for thesaurus terms • Additional view added for terms with deprecated statuses • Behavior • Used for legacy indexing • Rule saves disabled (cannot create new rules for Deprecated Terms) • Import Options (no default identity rule built on import) • Projects prior to 3.14 will not display deprecated terms unless one line in the project configuration file is changed. • Added ability to import and export terms with deprecated status. • Setting in Admin module for choosing to skip deprecated terms during M.A.I. (“yes” to skip is the default setting)
  29. 29. 1. Deprecated Term Status in the Term Record Pane 2. Deprecated Terms view – Produces an alphabetical listing of all terms with deprecated status. Functions similarly to Candidate Terms view. 3. Saving or changing a rule with a Deprecated term within a USE statement will produce an error, signaling the editor to resolve the term in the rule base or refrain from editing the current rule
  30. 30. Deprecated Terms • Can choose to index with deprecated terms as though their statuses were Candidate • A new "Deprecated View" is now listed in the View options (under Candidate Terms option). • A term is switched to "deprecated" with a simple click. If it has rules, the rule editor will pop up and ask the editor to handle them (either delete or edit to remove the term). • If a rule contains a deprecated term it will not validate. • When importing a new term as deprecated it won't automatically add a new "identity rule" as we do with other "regular" terms. • Added support for import and export.
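The deprecated-term behavior on these slides (no identity rule on import, rules naming a deprecated term fail validation, deprecated terms skipped during indexing by default) can be modeled roughly as follows. This is a minimal sketch under assumed data structures; `Term`, `import_term`, `validate_rule`, and `suggest` are hypothetical names and do not reflect Data Harmony's internals.

```python
from dataclasses import dataclass


@dataclass
class Term:
    label: str
    status: str = "accepted"  # assumed statuses: accepted, candidate, deprecated


def import_term(label, status, thesaurus, rules):
    """Add a term; only non-deprecated terms get a default identity rule."""
    thesaurus[label] = Term(label, status)
    if status != "deprecated":
        rules[label] = [label]  # identity rule: matching the label suggests the label


def validate_rule(use_terms, thesaurus):
    """A rule whose USE statement names a deprecated term does not validate."""
    return all(thesaurus[t].status != "deprecated" for t in use_terms)


def suggest(matched_terms, thesaurus, skip_deprecated=True):
    """Drop deprecated terms from suggestions unless the skip setting is off."""
    return [
        t for t in matched_terms
        if not (skip_deprecated and thesaurus[t].status == "deprecated")
    ]
```

Setting `skip_deprecated=False` corresponds to the option of indexing with deprecated terms, which the slides describe as useful for legacy indexing.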
  31. 31. New XIS Applet - MAI-rerun on re-index • New XIS app declared within the schema to update MAI on all records when re-indexed
  32. 32. Suggested Terms API changes Format (JSON or XML) •XML Changes level •Weights of terms can be “boosted” depending on the field •Number of terms returned •Allows Full path indexing
  33. 33. New DH APIs and Enhancements • Added multiple options to the suggestTerms API 1. Format (JSON or XML) 2. Boost Weighting of Terms 3. BatchLimit 4. Use fields (to return with MAI terms) 5. Fullpath 6. Highlight (inlineTagging) 7. Capture (save received data or not) 8. SaveToXis (xisProject, xisDocset, xisUser) 9. Specify maximum number of returns • Added Logging API for every MAI call • Example of suggestTerms options: { "format" : "XML", "weight" : 3, "batchLimit" : 1000, "fields" : [ "BT", "NT", "RT" ], "saveToXis" : true, "fullpath" : true, "hilite" : false, "xisProject" : "PLOSfilter", "xisDocset" : "records", "xisUser" : "editor" }
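A client could assemble the options object from this slide along the following lines. The key names are taken from the slide's example; the helper `build_suggest_terms_options`, its default values, and the validation are assumptions for illustration, and no endpoint or transport is shown because the slides do not specify one.

```python
import json

ALLOWED_FORMATS = {"JSON", "XML"}


def build_suggest_terms_options(fmt="JSON", weight=1, batch_limit=1000,
                                fields=("BT", "NT", "RT"), fullpath=False,
                                hilite=False, xis=None):
    """Assemble an options object using the key names shown on the slide.

    xis is an optional (project, docset, user) tuple; defaults are assumed.
    """
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    options = {
        "format": fmt,
        "weight": weight,
        "batchLimit": batch_limit,
        "fields": list(fields),
        "fullpath": fullpath,
        "hilite": hilite,
        "saveToXis": xis is not None,
    }
    if xis is not None:
        project, docset, user = xis
        options.update(xisProject=project, xisDocset=docset, xisUser=user)
    return options


payload = json.dumps(
    build_suggest_terms_options(fmt="XML", weight=3, fullpath=True,
                                xis=("PLOSfilter", "records", "editor")),
    indent=2,
)
print(payload)
```

Tying `saveToXis` to the presence of the XIS settings keeps the payload from requesting a save without a project, docset, and user to save to.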
  34. 34. suggestTerms Weighting (Boost) By changing the boost value for multiple fields, we see the MAI suggested returns in the output are skewed higher towards terms that appear in highly boosted sections such as article titles. { "boosts": [ { "type": "xpath", "value": "/doc/section-title/title", "boost": 5 }, { "type": "regex", "value": "<abstract>.*?</abstract>" , "boost": 2 }, { "type": "xmlTag", "value": "footer", "boost": -10 } ] }
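To make the effect of the boost values concrete, here is a toy scorer that reads a `boosts` list shaped like the one on the slide. The scoring formula (one point per occurrence anywhere, plus the boost value for each occurrence inside a boosted region) is purely an assumption for illustration and is not how M.A.I. computes weights; `boosted_regions`, `score`, and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def boosted_regions(xml_text, boosts):
    """Yield (text, boost) for every region selected by a boost entry."""
    root = ET.fromstring(xml_text)
    for entry in boosts:
        kind, value, boost = entry["type"], entry["value"], entry["boost"]
        if kind == "xpath":
            # ElementTree accepts only relative paths: rewrite "/doc/a/b" as "./a/b".
            prefix = "/" + root.tag
            if value == prefix or value.startswith(prefix + "/"):
                value = "." + value[len(prefix):]
            for elem in root.findall(value):
                yield "".join(elem.itertext()), boost
        elif kind == "regex":
            for match in re.finditer(value, xml_text, re.DOTALL):
                # Strip tags so that only the text content is counted.
                yield re.sub(r"<[^>]+>", " ", match.group(0)), boost
        elif kind == "xmlTag":
            for elem in root.iter(value):
                yield "".join(elem.itertext()), boost


def score(term, xml_text, boosts):
    """Toy score: base count over all text plus boost per hit in boosted regions."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    root = ET.fromstring(xml_text)
    total = len(pattern.findall("".join(root.itertext())))
    for text, boost in boosted_regions(xml_text, boosts):
        total += boost * len(pattern.findall(text))
    return total


DOC = ("<doc><section-title><title>Inkjet printers</title></section-title>"
       "<abstract>We compare inkjet and laser devices.</abstract>"
       "<footer>Laser news</footer></doc>")
BOOSTS = [
    {"type": "xpath", "value": "/doc/section-title/title", "boost": 5},
    {"type": "regex", "value": "<abstract>.*?</abstract>", "boost": 2},
    {"type": "xmlTag", "value": "footer", "boost": -10},
]

print(score("inkjet", DOC, BOOSTS), score("laser", DOC, BOOSTS))
```

In this sample, a term in the title outranks one that appears mainly in the footer, which is the skew the slide describes.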
  35. 35. Special Character Extensions • Single quotes, ampersands, greater than and less than symbols, etc. • Formerly not allowed in the MAIstro syntax • AI now allows import of most special characters • Apostrophes representing possession are now recognized by the MAI parser. • MAI will now correctly parse terms, mainly entity names, containing multiple special characters including parentheses, commas, and periods. ‘ ” & < >
  36. 36. Washington, D.C. • Wrote a best practices section in the DH User Guide • Periods or commas are followed by whitespace • MAI will correctly parse the text-to-match. • Where they are followed by a space, please see the section recommending changing the padding characters setting in the Data Harmony Administration Module.
  37. 37. Logging API •Track how often the MAI server is called with an API •Dates • Timestamps • IP addresses
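The kind of record such a log keeps (date, timestamp, IP address per call) and the call counts it supports might look like this. It is a sketch only: `CallLog` and its methods are hypothetical, the IP addresses come from the reserved documentation range, and nothing here reflects the actual Logging API's storage or output format.

```python
from collections import Counter
from datetime import datetime, timezone


class CallLog:
    """In-memory record of API calls: timestamp, date, caller IP, endpoint."""

    def __init__(self):
        self.entries = []

    def record(self, ip, endpoint, when=None):
        """Store one call; defaults to the current UTC time."""
        when = when or datetime.now(timezone.utc)
        self.entries.append({
            "timestamp": when.isoformat(),
            "date": when.date().isoformat(),
            "ip": ip,
            "endpoint": endpoint,
        })

    def calls_per_day(self):
        """Count calls grouped by calendar date."""
        return Counter(e["date"] for e in self.entries)

    def calls_per_ip(self):
        """Count calls grouped by caller IP address."""
        return Counter(e["ip"] for e in self.entries)
```

Usage would be one `record(...)` per incoming call, with the two counters answering how often the server is called and by whom.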
  38. 38. DH 4.0 – the Dashboard • Thesaurus Master • MAI • XIS • Project Information • Admin • Support • DiscoverEnt • SentiScore • TOPiCluster • TermSpy • Swift Summ
  39. 39. Dashboard splash screen
  40. 40. Thesaurus Master
  41. 41. XIS
  42. 42. DiscoverENT
  43. 43. SentiScore
  44. 44. TopiCluster
  45. 45. Term Spy
  46. 46. SwiftSumm
  47. 47. UI upgrade coming! • New customers welcome • Need an upgrade? See Heather or Jay
  48. 48. Smart Submit • Image: Courtesy AACR and EJPress • Add a box: “Suggest New terms”
  49. 49. Smart Submit • Five Rule bases • Identifies taxonomic concepts • Controversial topics • Suspect science • Endangered species • Bad call lines • Clinical trials • XIS powers a pre-submission filtering application • Used to help editors quickly review records • Retains SciGen Analysis and other metadata information
  50. 50. Medicinal Plant Names Service • From the Royal Botanic Gardens, Kew • Nearly 28,000 medicinal plants • Full records • 14.7 synonyms on average • Know the right name and the actual use • Offered on subscription as an API call for your data
  51. 51. Knowledge Organization Systems for Commerce • NKOS, Linked data, academic apps, etc. • But what about the things businesses use? • Commerce apps • Thin data • Coded lists • Need words and inferences • Many applications in commerce • Enabling search • Enabling transactions • Enabling purchase
  52. 52. E-Commerce transactions • Use case • How to index / tag everything • On an online “store” site, like Amazon, eBay, Walmart, Home Depot, B&H Photo • Or in-store to enable search on a kiosk • Or for purchase of services and supplies on a corporate website • Map to UNSPSC or eCl@ss for corporate transactions • UNSPSC (United Nations Standard Products and Services Code)
  53. 53. KOS Platform • Code 101011 “Inkjet Printers” • Product Code Sets • UNSPSC “Computer printers” 43212104 • Eclass “Ink jet printer” 19140103 • eBay “Printers, Computer” 171961 • eBay “Printers, Inkjet” 745677 • Other code sets • Brick and Mortar Retailers • Local Stores • Large Retailers (Walmart, Target, etc.) • eCommerce Retailers • Federal Agencies • USAID • NASA • Others
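The mapping on this slide amounts to a crosswalk from one platform code to equivalent entries in several product code sets. A minimal sketch of that crosswalk, using only the codes shown on the slide, follows; the dictionary layout and the helpers `map_code` and `reverse_lookup` are assumptions for illustration, not the product's data model.

```python
# Crosswalk built from the codes on the slide; the structure is assumed.
CROSSWALK = {
    "101011": {
        "label": "Inkjet Printers",
        "mappings": {
            "UNSPSC": [("43212104", "Computer printers")],
            "Eclass": [("19140103", "Ink jet printer")],
            "eBay": [("171961", "Printers, Computer"),
                     ("745677", "Printers, Inkjet")],
        },
    },
}


def map_code(platform_code, scheme):
    """Return the (code, label) pairs for a platform code in a target scheme."""
    entry = CROSSWALK.get(platform_code)
    return entry["mappings"].get(scheme, []) if entry else []


def reverse_lookup(scheme, code):
    """Return platform codes whose mappings include the given scheme code."""
    return [
        platform_code
        for platform_code, entry in CROSSWALK.items()
        if any(c == code for c, _ in entry["mappings"].get(scheme, []))
    ]


print(map_code("101011", "UNSPSC"))
print(reverse_lookup("eBay", "745677"))
```

Keeping a list per scheme allows one platform concept to map to more than one target category, as the two eBay entries on the slide suggest.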
  54. 54. What Next? A self-improving workflow that can improve speed and accuracy.
  55. 55. What Next? Effective implementation of the master taxonomy • A well-maintained master taxonomy has multiple uses which can increase value including…
  56. 56. KOS Platform • Code 101011 “Inkjet Printers” • Product Code Sets • UNSPSC “Computer printers” 43212104 • Eclass “Ink jet printer” 19140103 • eBay “Printers, Computer” 171961 • eBay “Printers, Inkjet” 745677 • Other code sets • Brick and Mortar Retailers • Local Stores • Large Retailers (Walmart, Target, etc.) • eCommerce Retailers • Federal Agencies • USAID • NASA • Others • A Knowledge Graph? Or does it have to be RDF triples? It certainly could be converted.
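The remark that the mapping "could be converted" to triples can be sketched as below. The prefixes (`kos:`, `unspsc:`, and so on) are made up for illustration, and using the SKOS properties `skos:prefLabel` and `skos:closeMatch` as predicates is one plausible modeling choice rather than anything stated in the slides; `to_triples` is a hypothetical helper.

```python
def to_triples(platform_code, label, mappings):
    """Turn one crosswalk entry into (subject, predicate, object) triples.

    mappings is a list of (scheme, code, label) tuples.
    """
    subject = f"kos:{platform_code}"
    triples = [(subject, "skos:prefLabel", label)]
    for scheme, code, target_label in mappings:
        target = f"{scheme.lower()}:{code}"
        triples.append((subject, "skos:closeMatch", target))
        triples.append((target, "skos:prefLabel", target_label))
    return triples


triples = to_triples(
    "101011",
    "Inkjet Printers",
    [
        ("UNSPSC", "43212104", "Computer printers"),
        ("Eclass", "19140103", "Ink jet printer"),
    ],
)
for triple in triples:
    print(triple)
```

Each mapped code becomes a node of its own, so further statements about it (broader categories, other labels) can be attached later without changing the crosswalk entry.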
  57. 57. Coming Soon!
  58. 58. Thesaurus Master with Knowledge Graphs • URL linking, a.k.a. Knowledge Graph Linking, enabling a deeper ontological understanding of your metadata
  59. 59. Knowledge Graph • Thesaurus Master will now link to outside knowledge stores • Wikipedia • DBpedia • WebMD • Mayo Clinic • Also allow arbitrary knowledge stores • In-house wikis • Databases • Etc…
  60. 60. The Power of Knowledge Graphs • The taxonomic motivation for knowledge graphs • Mainly describes real world entities and their interrelations, organized in a graph • Defines possible classes and relations of entities in a schema • Allows for potentially interrelating arbitrary entities with each other • Covers nearly all topical domains • Use-case motivations • Named-entity disambiguation • SPARQL Query integration • Automated NLP algorithms that read text changes in the graph and produce structured knowledge extracted from that text. • Truth maintenance for all inferred knowledge, regardless of source, so that revisions to the graph keep it consistent with itself.
  61. 61. API Support • Knowledge graph integration will include API Integration • Allow access to graph relationships • SPARQL Queries • Truth relationships • NLP (MAI) access to the graphs • Subgraph associations as well • When this is useful for an organization • Curation of the knowledge store • Semantic Extract, Transform, and Load • On Demand Load • Custom Views • Enhanced search in the taxonomy • Custom term inferences • Rule refinement
  62. 62. New Releases - 2020 • 3.14 • Deprecated terms • Uber API • 4.0 • DiscoverEnt • SentiScore • TOPiCluster • TermSpy • Swift Summ • Smart Submit • TaxoGene • Kew MPNS Service • E-commerce mapping • Knowledge Graph Linking
