F.A.I.R. Data with Knowledge Graphs & AI
Strategy, processes & practices, and tools
Who is speaking today?
James Morris
Senior Information
Scientist
Smartlogic
Fredric Landqvist
Senior Information
Architect
TietoEvry/Findwise
A few housekeeping items
• This webinar is in broadcast mode; all
participants are muted
• Please put your questions in the
GoToWebinar panel and we’ll answer
as many as we can in the Q & A
session
• This broadcast is being recorded –
replay information will be sent to all
registrants following the broadcast
Agenda
• The quest for quality data and F.A.I.R principles
• Available standards and opportunities
• Mitigation opportunities – keeping the human in the loop
• How ontologies, semantics and enterprise knowledge graphs
provide a connection to business context
• Q&A
Data Governance
The quest for quality data assets!
F.A.I.R
5-Star
Knowledge Graphs
Alphabet Soup
Vocabularies
http://flic.kr/p/zCyMp
Standards
Knowledge
Models
Cherry Picking
http://flic.kr/p/8Xtus2
Organising
Automatic for the people
Digital Assistants (AI)
Machine Learning
NLP
Federated Learning
…
Societal Data Challenges
Pathogen
Infrastructure
Environment
Health Data
The connected patient
Pathogen
• Patient, Profession,
Politics, Provision,
Provenance, Process,
Participation & Patterns
• Problem: Data chasm!
Smart Society Data
• Smart Cities, Smart
Building, next
generation
engineering, IoT,
infrastructure
• Standards: BRICK,
BOT, OSLC,
RealEstateCore,
SAREF…
Digital Blue Economy
Ocean Data Factory
© 2021 SMARTLOGIC SEMAPHORE INC.
F.A.I.R. principles in Context
FINDABLE
ACCESSIBLE
INTEROPERABLE
RE-USABLE
→ FAIR in the context of Information
Architecture
→ FAIR in the context of Librarianship
→ FAIR in the context of Smartlogic experience…
o GXP and research organizations
o Library/Informatics teams within
organizations
→ FAIR in the context of Knowledge Graphs and
Semantic AI
F.A.I.R. in information architecture
https://xd.adobe.com/ideas/process/information-architecture/information-architecture-users-content-context/ (retrieved 25-Apr-2021)
F.A.I.R. in the Librarians’ Context
https://www.mhpbooks.com/library-of-congress-quiz-for-librarians-and-also-regular-people/moby-card-catalog/
F.A.I.R. in the Librarians’ Context
+
+
Then
Now
= F.A.I.R.
RDA/Objectives and Principles/Rev/3, 1 July 2009
“The (meta)data should enable the user to…”
FIND
IDENTIFY
SELECT
OBTAIN
UNDERSTAND
The 5 Laws of Library Science
1. Books Are For Use
2. Every Reader His/Her Book
3. Every Book Its Reader
4. Save The Time Of The Reader
5. The Library Is A Growing Organism
S. R. Ranganathan, 1931.
© Vikas Kamat
The 5 FAIR Laws of Data Science?
1. Data is for use by researchers*
2. Every researcher* their data
3. Every data object its researcher*
4. Save the time of the researcher*
5. The world of research data is a growing organism
*or their machine agents
https://www.develandoo.com/blog/do-robots-read/
F.A.I.R. and the Semantic Web?
https://5stardata.info/en/
F.A.I.R.
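The 5-star open data scheme referenced on this slide is cumulative: each star presupposes the ones before it. A minimal sketch of that scoring follows. The `Dataset` class and its field names are hypothetical, introduced only for illustration, and are not part of any tool mentioned in the webinar.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    open_licence: bool         # 1 star: on the web under an open licence
    structured: bool           # 2 stars: machine-readable structured data
    open_format: bool          # 3 stars: non-proprietary format such as CSV
    uses_uris: bool            # 4 stars: things are identified with URIs
    links_to_other_data: bool  # 5 stars: links out to other data for context


def five_star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first criterion that is not met."""
    criteria = [
        ds.open_licence,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars
```

A dataset that links to other data but does not use URIs would still stop at three stars, which is the point of the cumulative reading.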
F.A.I.R. in the context of Smartlogic clients
https://www.ideagen.com/thought-leadership/blog/how-to-meet-all-9-alcoa-principles-with-our-document-module
ALCOA+ data integrity principles
→ #1: Attributable: The person who performs a
data-related task must be identifiable as the
person who performed that task.
→ #2: Legible: Data should be readable and
understandable, with a clear picture of the
step/event sequence that data has passed
through.
→ #3: Contemporaneous:
Data activity should be time stamped with a
record of when it took place.
→ #4: Original: Every originally captured piece of
data must be retained, rather than replaced or
deleted.
→ #5: Accurate: Data should be inputted, stored
and maintained with precision and validity.
https://www.ideagen.com/thought-leadership/blog/how-to-meet-all-9-alcoa-principles-with-our-document-module
→ #6: Complete: Data features a trackable audit
trail to prove that nothing has been deleted or
lost.
→ #7: Consistent: Data should display consistently,
wherever it is accessed from within your
document management system.
→ #8: Enduring: Records and information should
be accessible and readable during the entire
period in which they might be needed...
potentially decades after recording!
→ #9: Available: Documents and records should be
accessible in a readable format to all applicable
personnel responsible for their review or
operational processes. External users should also
be provided access for inspection/review where
necessary.
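Several of these principles (Attributable, Contemporaneous, Original, Complete) can be illustrated with an append-only audit trail. The sketch below is an assumption-laden illustration, not a description of any vendor's document module: the `AuditTrail` and `AuditEntry` names, the hash-chaining approach, and the example values are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Tuple


def _digest(user: str, action: str, value: str, timestamp: str, prev_hash: str) -> str:
    """Hash the entry together with the previous hash so gaps become detectable."""
    payload = json.dumps([user, action, value, timestamp, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditEntry:
    user: str        # Attributable: who performed the task
    action: str
    value: str       # Original: earlier values are never overwritten
    timestamp: str   # Contemporaneous: recorded when the action happens
    prev_hash: str
    entry_hash: str


class AuditTrail:
    """Append-only log: corrections are new entries, never edits or deletions."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def record(self, user: str, action: str, value: str) -> AuditEntry:
        if not user:
            raise ValueError("every entry must be attributable to a user")
        timestamp = datetime.now(timezone.utc).isoformat()
        prev_hash = self._entries[-1].entry_hash if self._entries else ""
        entry = AuditEntry(user, action, value, timestamp, prev_hash,
                           _digest(user, action, value, timestamp, prev_hash))
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[AuditEntry, ...]:
        return tuple(self._entries)

    def is_complete(self) -> bool:
        """Complete: verify the hash chain to show nothing was removed or altered."""
        prev = ""
        for e in self._entries:
            expected = _digest(e.user, e.action, e.value, e.timestamp, prev)
            if e.prev_hash != prev or e.entry_hash != expected:
                return False
            prev = e.entry_hash
        return True
```

Recording a correction as a second entry keeps the originally captured value available alongside the corrected one.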
ALCOA+ and F.A.I.R.
For primary purpose of data:
GXP of product development, approval, and
distribution
For secondary use of data:
Research, data science, collaborations, new
opportunities
F.A.I.R. and Smartlogic Life Sciences Clients
FINDABLE
ACCESSIBLE
INTEROPERABLE
RE-USABLE
● Build a FAIR culture
○ Reference it in strategic plans
○ Meet people where they are
○ Capitalize on the current momentum
● Provide the right infrastructure to be FAIR
● Provide tools that integrate easily with their existing systems
● Make it easy to choose the right metadata:
○ Don’t overcomplicate the vocabularies
○ Map to vocabularies already in use
○ Provide assistance with choosing the right values
○ Automatically assign values when possible.
● Support the process with services
● Make it easy to do the right thing!
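The points about mapping to vocabularies already in use and automatically assigning values can be sketched as a simple label lookup. This is a deliberately naive illustration and not how Semaphore classifies content: the vocabulary, the `ex:` identifiers, and the function names are invented for the example, and a real system would handle ambiguity, morphology and context.

```python
import re
from typing import Dict, List

# Hypothetical controlled vocabulary: one preferred label plus alternative
# labels per concept, in the spirit of SKOS prefLabel / altLabel.
VOCABULARY = {
    "ex:Aspirin": {"pref": "Aspirin", "alt": ["acetylsalicylic acid", "ASA"]},
    "ex:Paracetamol": {"pref": "Paracetamol", "alt": ["acetaminophen"]},
}


def build_label_index(vocabulary: Dict[str, dict]) -> Dict[str, str]:
    """Map every label (case-folded) to the concept it belongs to."""
    index = {}
    for concept_id, labels in vocabulary.items():
        for label in [labels["pref"], *labels["alt"]]:
            index[label.casefold()] = concept_id
    return index


def suggest_concepts(text: str, index: Dict[str, str]) -> List[str]:
    """Return the concepts whose labels occur as whole words in the text."""
    lowered = text.casefold()
    found = set()
    for label, concept_id in index.items():
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.add(concept_id)
    return sorted(found)
```

Because alternative labels resolve to the same concept, users can keep writing the terms they already use while the metadata stays consistent.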
Knowledge Graphs are…
• Trending: used for knowledge representation and reasoning by leaders
like Facebook, Google, Microsoft and any organization dependent on rapidly
changing, interconnected data.
• “Schema-less”: new data and types can be easily incorporated as there is no
formal structure to which the information must comply. Being non-relational,
information is stored as a series of nodes and edges or simple
constructs of subject-predicate-object (triples).
• Dynamic: a growing semantic network of facts about things that can be used
for data integration, knowledge discovery, and in-depth analysis.
• Intelligent? A successful graph combines data from many different sources,
allowing for new connections to be made, inferred or tested. Ontologies or
other knowledge models help make those connections and pose hypotheses.
• Chaotic: data in a graph does not automatically form connections, which can
lead to user frustration. Planning, intention, and a focus on semantics are
needed.
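The subject-predicate-object idea can be made concrete with a toy in-memory triple store. This is only a sketch under simple assumptions: the `TripleStore` class, the `ex:` identifiers and the example facts are hypothetical, and a production graph would use an RDF store or graph database instead.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy store: facts are plain triples, so no table schema is declared."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )
```

Adding a fact with a predicate that has never been used before requires no migration, which is what "schema-less" means in practice; it is also why the "chaotic" caveat applies when predicates are not governed by a shared model.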
Semaphore in a nutshell
Semaphore delivers these capabilities at enterprise scale
Build and manage semantic models
Simplify the ingestion, development and customization
Enrich, extract and harmonize
• Enrich information assets with complete, consistent and precise metadata
• Extract critical facts, entities and relationships for further processing
• Harmonize different information sources for unified access
Apply semantics to your business problem
• Enable knowledge discovery
• Support investigative analytics
• Automate manual processes for higher precision
Semaphore and Knowledge Graphs
DATA FROM DOCUMENTS
DATA FROM DATABASES
ONTOLOGIES AND KNOWLEDGE MODELS
Discovering Knowledge with Graph Traversal
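One way to picture discovery by traversal is a breadth-first search that follows edges from one entity until it reaches another, returning the chain of facts that connects them. The sketch below assumes a graph given as a list of (subject, predicate, object) triples; the `find_path` function and the example facts are hypothetical and are not taken from the slide's figure.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def find_path(triples: Iterable[Triple], start: str, goal: str) -> Optional[List[Triple]]:
    """Breadth-first search along edge direction; returns the shortest chain of
    triples linking start to goal, or None if no such chain exists."""
    outgoing: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbour in outgoing.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, predicate, neighbour)]))
    return None
```

The returned path doubles as an explanation: each hop is a fact that a person can inspect, which helps keep the human in the loop.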
Knowledge Graphs in a Wider Context.
Reference Data
Semantic Data Extraction
Semantically-Enhanced Data Catalogue
Metadata Hub
Subjective Enrichment (Aboutness)
Semantically-Enhanced Analytics
Enterprise Knowledge Graph
Semaphore Semantic AI Services, Integrations and Capabilities
Semantic Search & Discovery
Q & A
Thank you for attending
We’ll take questions now
Fredric Landqvist & James R Morris
Findwise/TietoEvry & Smartlogic
References and links
• F.A.I.R. Data principles and Open PHACTS (open pharma space)
• FORCE11 - Future of Research Communications and e-Scholarship
• Open Standards, Knowledge Models and Vocabularies:
• W3C RDF, SKOS, OWL, SHACL
• Health Data: HL7/FHIR, MeSH, SNOMED CT, UMLS, ICD-11, OMG and more
• Smart Cities, Buildings and Services: BRICK, BOT, SAREF, OSLC
• Information Science: Resource Description and Access (RDA), ALCOA+
• Blog post-series on FAIR Data, Knowledge Graphs, AI and more
• Findwise, TietoEvry and Smartlogic
Thank You
