This document proposes turning unstructured text data from newspaper and magazine archives into "Smart Data" through the use of conceptual models and metadata. Smart Data would have embedded descriptions allowing software to understand and interact with the underlying data. The proposal describes designing Smart Data based on existing metadata standards for museums and publications, and creating a system that can extract facts and answers from the text archives through queries against the Smart Data.
FactMiners & PRImA's "Turning Text Soup into Smart Data" - The Goal: Smart Data
1. Goal: Smart Data
From “readable” to “computable”
FactMiners & PRImA’s
Knight News Challenge Entry
“Turn Text Soup into Smart Data in
Newspaper & Magazine Archives”
A self-running video slideshow.
One slide every 15 seconds.
Pause as needed.
2. Q: What is Smart Data?
• A: Smart Data is self-descriptive
data that can “carry on a conversation”
with Smart Programs to support
access, editing, and visualization of
the data itself.
FactMiners & PRImA: Knight News Challenge – “Turning Text Soup into Smart Data in Newspaper & Magazine Archives”
The “actual” data of the database
To access the “actual” data of the database,
Smart Programs “talk” to an embedded
“database about the database” (AKA a metamodel)
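The "conversation" between a Smart Program and an embedded metamodel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FactMiners implementation: the `smart_db` layout, the `Article` type, and the `describe` and `fetch` helpers are hypothetical names invented for this example.

```python
# Hypothetical Smart Data store: the "actual" data travels together with
# a metamodel ("database about the database") that describes it.
smart_db = {
    "metamodel": {
        "Article": {
            "fields": {"title": "str", "page": "int"},
            "label_field": "title",
        }
    },
    "data": {
        "Article": [
            {"title": "Example headline", "page": 12},
            {"title": "Another headline", "page": 30},
        ]
    },
}


def describe(db, type_name):
    """Ask the metamodel which fields a record type has."""
    return sorted(db["metamodel"][type_name]["fields"])


def fetch(db, type_name):
    """Read records using only what the metamodel says about them."""
    spec = db["metamodel"][type_name]
    label = spec["label_field"]
    return [record[label] for record in db["data"][type_name]]


fields = describe(smart_db, "Article")
labels = fetch(smart_db, "Article")
print(fields)
print(labels)
```

The program never hard-codes the field names; it asks the metamodel first, which is the point of the pattern.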
3. Q: What does Smart Data look like?
• A: Smart Data includes BOTH the
complex document structure
of the source AND the underlying
conceptual model of the source
content.
4. Q: What can Smart Data do?
• A: Turn expensive, time-
consuming, labor-intensive
research studies into “Just ask!”
queries
• Good for things like:
• How did local reporting of race
relations impact public policy in
Indiana in the 1950s?
• Did advertising or editorial
coverage account for the
popularity of programs in the
Softalk Bestseller lists?
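A "Just ask!" query of the second kind might look like the following sketch. It assumes facts have already been mined into simple (subject, relation, object) triples; the relation names, the `ask` helper, and every value in `facts` are made-up placeholders, not data from the Softalk archive.

```python
# Toy fact store: (subject, relation, object) triples.
# All values are invented placeholders, not facts mined from Softalk.
facts = [
    ("Program A", "advertised_in", "Issue 1"),
    ("Program A", "reviewed_in", "Issue 1"),
    ("Program A", "bestseller_in", "Issue 2"),
    ("Program B", "advertised_in", "Issue 1"),
    ("Program B", "bestseller_in", "Issue 2"),
    ("Program C", "reviewed_in", "Issue 2"),
]


def ask(facts, relation):
    """Return the set of subjects that have the given relation."""
    return {subject for subject, rel, _ in facts if rel == relation}


# Which bestsellers were advertised, and which got editorial coverage?
bestsellers = ask(facts, "bestseller_in")
advertised_bestsellers = sorted(bestsellers & ask(facts, "advertised_in"))
reviewed_bestsellers = sorted(bestsellers & ask(facts, "reviewed_in"))
print(advertised_bestsellers)
print(reviewed_bestsellers)
```

A real study would need far more than set intersections, but the sketch shows how a research question becomes a query once the facts are computable.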
5. Q: How “smart” is our Smart Data design?
• We spent a year researching
museum informatics and
prototyping Smart Data designs.
• Our software architecture is based
on CIDOC-CRM (Conceptual
Reference Model for Museums)
microservice workflows and
PRESSoo, the ISSN.org
metamodel for serial publications
Winter, 2013
Spring, 2014
Fall, 2014
Summer, 2015
Neo4j GraphGist Challenge,
a 1st place for Metamodel
Subgraph domain model
Semi-finals Ashoka/LEGO
“Re-imagine Learning” Challenge.
#MW2014 FactMiners demo.
Introduced to #cidocCRM.
Museum Computer Network
Emerging Professional Scholarship.
#MCN2014 paper & demo.
“Massively Addressable Text” published
in peer-reviewed CODE|WORDS.
#HILT2015 Crowdsourcing Course
DPLA Community Reps.
Internet Archive Content Partner.
ICOM #cidocCRM SIG member.
Incorporate PRESSoo into design.
Begin PRImA Collaboration.
6. Q: How “open” is our Smart Data design?
• Using a metamodel
subgraph design
pattern to embed and pass
info about data and its access
and transformation is
technology neutral &
future-proof.
Without Smart Data:
Database
10 Load X
20 Print X
30 Goto 10
Domain knowledge written
into task-specific programs
With Smart Data:
Metamodel statically stored
within #TEI header section of
source documents (standard text files)
<teiHeader>
<metamodel />
<structure />
<content />
Any “smart” DB
For dynamic Linked Open Data access,
DB need only have import &
ability to represent data structures
read from metamodel header.
10 Load metamodel
20 Configure editors
30 Do stuff…
“Smart” program in
any language
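The three "smart" program steps (load metamodel, configure editors, do stuff) can be sketched in Python with the standard library's XML parser. This is an illustration only: nesting `<structure>` and `<content>` inside `<metamodel>`, the `<element>` and `<concept>` children, and the `load_metamodel` and `configure_editors` helpers are assumptions made for the example and are not part of the TEI standard. The sample also omits the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document: the elements inside <metamodel> are
# invented for this sketch and are not defined by the TEI standard.
SOURCE = """
<TEI>
  <teiHeader>
    <metamodel>
      <structure><element name="Page"/><element name="Article"/></structure>
      <content><concept name="Person"/><concept name="Product"/></content>
    </metamodel>
  </teiHeader>
  <text>...</text>
</TEI>
"""


def load_metamodel(xml_text):
    """Step 10: read the metamodel embedded in the document header."""
    root = ET.fromstring(xml_text)
    meta = root.find("./teiHeader/metamodel")
    return {
        "structure": [e.get("name") for e in meta.findall("./structure/element")],
        "content": [c.get("name") for c in meta.findall("./content/concept")],
    }


def configure_editors(metamodel):
    """Step 20: derive one editor label per type the metamodel describes."""
    names = metamodel["structure"] + metamodel["content"]
    return {name: f"{name}Editor" for name in names}


metamodel = load_metamodel(SOURCE.strip())
editors = configure_editors(metamodel)
print(editors)  # Step 30: "do stuff" with the configured editors
```

Because the program learns its types from the header, the same code would work unchanged on a document whose metamodel lists different structures and concepts.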
7. We have a design to “tame” Text Soup and
unlock “facts” in archive data.
• An innovative design combining international standards
for conceptual modeling of museum collections and serial
publications (cidocCRM and PRESSoo) with a “self-descriptive”
software/database design pattern provides the
foundation for mining Smart Data from Text Soup.
• In the next slideshow, we describe our design for the
technology to “fact-mine” Smart Data from
newspaper & magazine digital archives…
8. FactMiners & PRImA:
Our Knight News Challenge Entry
• “Turn Text Soup into Smart Data in
Newspaper & Magazine Archives”
https://goo.gl/99Vn5M
• Team
• Jim Salmons, FactMiners
• Timlynn Babitsky, FactMiners
• Apostolos Antonacopoulos, PRImA
• Christian Clausner, PRImA