Zarnegar "Supporting AI: Best Practices for Content Delivery Platforms"

SILVERCHAIR
Supporting AI: Best Practices for
Content Delivery Platforms
Jake Zarnegar
Chief Business Development Officer, Silverchair

SILVERCHAIR
Orientation to Today’s Talks

SILVERCHAIR
Our Work in Context
Understanding and improving our world
The Big Loop: Research and Education
Understanding and improving research communication and education effectiveness
The Little Loop: Publishing
We play a key role in multiple learning and improvement loops
Hypothesize
Do something
Generate data
Analyze/Interpret data
Incorporate learning

SILVERCHAIR
Embracing the New is Not Optional
• It is our responsibility to identify and apply promising new technology advancements to the “loops” we serve.
• Examples:
• Internet content distribution (!)
• Streaming video
• Dataset/computational code sharing
• Adaptive learning algorithms
• Emerging TDM and AI content analysis techniques

SILVERCHAIR
But… Caution is Warranted
• New technologies take a while to be “fit to purpose” in particular problem areas
• Hat tip to Flavius Chis, my Silverchair colleague (who is a Romanian living in America) who used this chart to help me understand our fundamental difference in orientation to new things.

SILVERCHAIR
Much like “the cloud,” “big data,” and “machine learning” before it, the term
“artificial intelligence” has been hijacked by marketers and advertising copywriters.
If the hype leaves you asking “What is A.I., really?,” don’t worry, you’re not alone. I
asked various experts to define the term and got different answers. The only thing
they all seem to agree on is that artificial intelligence is a set of technologies that
try to imitate or augment human intelligence.
To me, the emphasis is on augmentation, in which intelligent software helps us
interact and deal with the increasingly digital world we live in.
Om Malik, The New Yorker
http://www.newyorker.com/business/currency/the-hype-and-hope-of-artificial-intelligence

SILVERCHAIR
How Do AI Tools Augment Human Intelligence?

SILVERCHAIR
[Figure: Bloom's taxonomy of cognitive levels (Bloom et al., 1956)]

SILVERCHAIR
• Software is far superior to
human experts at this level
of cognition
• Does not require AI methods

SILVERCHAIR
TDM/AI Methods are New Tools in An Expert’s Toolbox
Knowledge Discovery in Databases:
• Traditional knowledge discovery: “Insights are surfaced by highly-educated human specialists finding, reading, and synthesizing the right few pieces of information”
• TDM/AI-assisted knowledge discovery: “Insights are surfaced by specialist-directed algorithms that perform higher-order cognitive analyses on thousands, millions, or billions of pieces of information”

SILVERCHAIR
So We Can Just Sit Back and Watch the Magic Happen, Right?
http://news.mit.edu/2020/artificial-intelligence-identifies-new-antibiotic-0220

SILVERCHAIR
Not Quite…
• Online systems were initially designed to help human experts with traditional knowledge discovery:
• Filtering/finding relevant content (bring it directly to them or the online venues where they work)
• Assessing it quickly (visual display of titles, abstracts, tables/charts, conclusions)
• Understanding it deeply (full narrative “story” of the research, visuals, audio/video supplementation, citations to related research, PDFs to take away and read later, etc.)

SILVERCHAIR
New AI Tools = New Requirements for Content Platforms
To enable researchers (and publishers) to get the most value out of emerging TDM/AI tools, we need to add purpose-built content and data interfaces to our online content platforms, and keep evolving them.

SILVERCHAIR
Two Major Datasets
Content
• Research papers, datasets,
news, whitepapers, education
materials, courseware, etc.
• “Big Loop”
Interactions / Usage Activity
• Data generated from activity on
the platform (myriad basic and
complex measures).
• “Little Loop”
A modern online platform is a streaming content and analytics server

SILVERCHAIR
But how do TDM/AI projects actually work?

SILVERCHAIR
Knowledge Discovery in Databases (KDD) [1] is divided in four main phases: domain
exploration, data preparation, data mining, and interpretation of results.
1. The first phase is responsible for understanding the problem and what data will be
used in the knowledge discovery process.
2. The next phase selects, cleans, and transforms the data to a format that is
suitable for a specific data mining algorithm.
3. In the third phase, the chosen data mining algorithm performs some intelligent
techniques to discover patterns that can be of potential use.
4. The last phase is responsible for manipulating the extracted patterns to generate
interpretable knowledge for humans…

SILVERCHAIR
…Most of the research carried out in this area focus on the data mining phase, which
uses artificial intelligence algorithms like decision trees, artificial neural networks,
evolutionary computation, among others [2] to discover knowledge. On the other
hand, the data preparation phase, responsible for integration, cleaning, and
transformation of data, has not been the subject of much research. In fact, Pyle [3]
argues that “data preparation consumes 60 to 90% of the time needed to mine data
– and contributes 75 to 90% to the mining project’s success.”
From: Paulo M. Goncalves Jr. and Roberto S. M. Barros, "Automating Data Preprocessing with DMPML and
KDDML," 10th IEEE/ACIS International Conference on Computer and Information Science, 2011, DOI:
10.1109/ICIS.2011.23.
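The four KDD phases quoted above can be sketched as a toy pipeline. This is only an illustration under stated assumptions: the corpus, the function names, and the document-frequency "mining" step are all hypothetical stand-ins, not anything taken from the cited paper.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real project would pull content from a platform.
RAW_DOCS = [
    "<p>Antibiotic resistance in   E. coli</p>",
    "<p>New antibiotic candidates found by deep learning</p>",
    "<p>Deep learning for protein folding</p>",
]

def explore_domain(docs):
    """Phase 1: decide which data is in scope (here: every document)."""
    return list(docs)

def prepare(docs):
    """Phase 2: strip markup, lowercase, and tokenize each document."""
    cleaned = []
    for doc in docs:
        text = re.sub(r"<[^>]+>", " ", doc)   # drop HTML-like tags
        cleaned.append(re.findall(r"[a-z]+", text.lower()))
    return cleaned

def mine(token_lists):
    """Phase 3: stand-in mining step that counts how many documents use each term."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))
    return counts

def interpret(counts, min_docs=2):
    """Phase 4: report terms shared by at least min_docs documents, sorted."""
    return sorted(term for term, n in counts.items() if n >= min_docs)

def run_pipeline(docs):
    return interpret(mine(prepare(explore_domain(docs))))

print(run_pipeline(RAW_DOCS))
```

Even in this toy, the preparation phase is the part that depends on how the content was delivered (markup, spacing, casing), which is the point of the Pyle quote.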

SILVERCHAIR
“I downloaded 2TB of Arxiv content last week but I can’t bring myself
to open it and start working on analyzing it because I know I have at
least 6 months of painstaking content cleanup & preparation ahead of
me before I can begin.”
--Mike M., Data Scientist, Fast Forward Labs

SILVERCHAIR
5 Platform Best Practices to Support AI Projects

SILVERCHAIR
1. Accept That You Don’t Have Everything They Need
• Most projects select content from multiple sources and add/mix with internal proprietary data
• Even Facebook ingests third-party data to supplement their massive consumer behavior database. (You’re not Facebook.)
• If your data is too hard to extract/work with, you may be bypassed for easier alternatives
• Adopting this humble orientation will help you think less about your preferences for format and delivery, and more about shared standards

SILVERCHAIR
2. Provide Consistent Content/Data Tagging
• Publishing (especially research publishing) has a huge head start here, with shared XML standards and many universal data structures
• Note that JSON format is preferred by many AI content analysis tools and you may want to offer this format as well
• It is best if all of your content (including historical) is delivered in one consistent format with consistent use of tagging structures
• For example: the title, authors, abstract, where the conclusions are in the paper, etc.
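As a minimal sketch of offering JSON alongside XML, the snippet below maps a tagged article to a flat JSON record using only the Python standard library. The element names (`article`, `authors/author`, `section type="conclusions"`) are assumptions made up for this example, not a real publishing schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified article markup (not a real publishing DTD).
SAMPLE_XML = """
<article>
  <title>Example Title</title>
  <authors>
    <author>A. Researcher</author>
    <author>B. Scholar</author>
  </authors>
  <abstract>Short abstract text.</abstract>
  <section type="conclusions">We conclude something.</section>
</article>
"""

def article_to_record(xml_text):
    """Map consistently tagged XML to a dict with the same keys for every article."""
    root = ET.fromstring(xml_text.strip())
    conclusions = root.find("section[@type='conclusions']")
    return {
        "title": root.findtext("title", default=""),
        "authors": [a.text or "" for a in root.findall("authors/author")],
        "abstract": root.findtext("abstract", default=""),
        "conclusions": conclusions.text if conclusions is not None else "",
    }

record = article_to_record(SAMPLE_XML)
print(json.dumps(record, indent=2))
```

The mapping only works this simply because the tagging is consistent; if historical content used different structures, each variant would need its own handling.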

SILVERCHAIR
3. Provide a Programmatic Interface
• Just letting bots crawl your site is not a programmatic interface. Web page coding (primarily HTML, JS, CSS) is heavyweight -- built for visual layout and interactions (traditional use case), although it can be enhanced with embedded metadata.
• Best practice is to produce a separate programmatic interface that is more lightweight and designed for concurrent, high-volume transactions.
Of the Silverchair Platform’s last 500 million sessions, 380 million (76%) were programmatic.

SILVERCHAIR
API Examples to Examine
• Commercial Publisher Platforms:
https://www.elsevier.com/solutions/sciencedirect/support/api
• Aggregators:
https://www.dimensions.ai/dimensions-apis/
• Independent Platforms:
https://www.silverchair.com/news/silverchair-releases-analysis-and-content-service/
• Rights/Licensing Organizations:
https://www.copyright.com/business/xmlformining/

SILVERCHAIR
4. Embrace Takeout & Delivery (vs. Eat-In)
• Per our research and interviews, data scientists overwhelmingly want to take large amounts of content away and ingest it into their own systems (rather than use your platform as a transactional dependency).
• Some APIs limit the amount that can be downloaded/taken away at one time, applying old-school limits to the new techniques
• However, protection/vetting is necessary: this is a new type of relationship
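One common way to package content for "takeout" is a JSON Lines file (one JSON record per line), which can be streamed and ingested in bulk. The sketch below is an assumption about what such an export could look like; the function names and record fields are hypothetical, and vetting/access control is deliberately left out.

```python
import io
import json

# Hypothetical records; field names are illustrative only.
SAMPLE_RECORDS = [
    {"id": "a1", "title": "First article", "year": 2019},
    {"id": "a2", "title": "Second article", "year": 2020},
]

def export_jsonl(records, stream):
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count

def load_jsonl(stream):
    """Read records back, as a data scientist's ingest step might."""
    return [json.loads(line) for line in stream if line.strip()]

# Round trip through an in-memory buffer instead of a real file or HTTP response.
buffer = io.StringIO()
written = export_jsonl(SAMPLE_RECORDS, buffer)
buffer.seek(0)
restored = load_jsonl(buffer)
print(written, restored == SAMPLE_RECORDS)
```

Because each line is independent, a consumer can process an export of any size without loading it all into memory.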

SILVERCHAIR
5. Enable Complex Selection & Filtering
• In a seeming paradox, “some” of your data can be more valuable than “all” of your data
• If you have data hooks and API tools for complex selection and filtering, the AI or data scientist won’t need to develop tools of their own in their project
• Recently a licensing agent told me that they signed a deal for more money for a specifically narrowed portion of a dataset than they quoted for the whole dataset (not per object, more money overall!)
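A selection hook of this kind can be sketched as a filter over tagged records. The parameters (`subjects`, `year_from`, `year_to`, `article_type`) and the record fields are hypothetical; a real platform would expose whatever metadata it tags consistently.

```python
# Hypothetical catalog; field names are illustrative only.
CATALOG = [
    {"id": "a1", "subjects": ["oncology"], "year": 2015, "type": "research"},
    {"id": "a2", "subjects": ["oncology", "imaging"], "year": 2019, "type": "review"},
    {"id": "a3", "subjects": ["cardiology"], "year": 2020, "type": "research"},
]

def select(records, subjects=None, year_from=None, year_to=None, article_type=None):
    """Return records matching every criterion that was supplied."""
    wanted = set(subjects or [])
    selected = []
    for record in records:
        # Subject filter: keep records sharing at least one wanted subject.
        if wanted and not wanted & set(record.get("subjects", [])):
            continue
        year = record.get("year")
        if year_from is not None and (year is None or year < year_from):
            continue
        if year_to is not None and (year is None or year > year_to):
            continue
        if article_type is not None and record.get("type") != article_type:
            continue
        selected.append(record)
    return selected

recent_oncology = select(CATALOG, subjects=["oncology"], year_from=2018)
print([r["id"] for r in recent_oncology])
```

Running the selection on the platform side means the customer receives only the narrowed subset, rather than downloading everything and filtering it themselves.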

SILVERCHAIR
New Business Possibilities

SILVERCHAIR
Remember the Business Need
“…data preparation consumes 60 to 90% of the time needed to mine data – and contributes 75 to 90% to the mining project’s success”
From: Paulo M. Goncalves Jr. and Roberto S. M. Barros, "Automating Data Preprocessing with DMPML and KDDML," 10th IEEE/ACIS International Conference on Computer and Information Science, 2011, DOI: 10.1109/ICIS.2011.23.

SILVERCHAIR
Consider Feeding AI Tools as a New Product
• Full-text normalized JSON (or XML)
• Separate product/subscription for sale
• Enrich it further for complex filtering/selection or sell it “as is”
• Separate delivery mechanism (no human interface) but can piggyback on existing content workflows
• Accesses a new class of customer with deep pockets (AI creators or implementers)
• Requires new vetting/legal agreements

SILVERCHAIR
Or Create Your Own New Services Using AI
• Deliver new on-site content analysis tools for end users using a combination of
semantics & new text mining/AI tools (many of which are free to use from
Google/Amazon/Microsoft/others)
• Experiments, experiments, experiments (recommendation, auto-summarization,
analogues, image analysis, sentiment, prediction, etc.)

SILVERCHAIR
Thank You
Jake Zarnegar
Chief Business Development Officer, Silverchair
