NFAIS Forethought:
Artificial Intelligence #2.
20th May 2020
Stuart Maxwell
COO, Scholarly iQ
Lessons learned from developing a
predictive analytics data model
1. About SiQ
2. Smarter Topic Analysis
Objective
Data
Findings
Actions
3. Lessons Learned
Leaders in usage reporting since 2002
Authorized COUNTER R5 vendor for ALL Platform, Database, Title and Item reports
Trusted 3rd party maintaining COUNTER compliance every year since 2003
Fully independent, flexible and client-focused solutions including SQL, Hadoop, Hive, Redshift, IBM,
Tableau, Cognos, QlikView and others
Delivering platform-independent reporting on HighWire, Silverchair, IDM, Safari, ingenta,
PubFactory etc as well as publisher-specific custom platforms
Innovating new uses and benefits from usage data such as the integration of SiQ’s PSI Metrics for
OA Usage Reporting and Denials Reporting
Topic Health Monitor
Topic Health Monitor
Objective – Aid editorial decisions and customer communications
Why – Usage data is provided for Titles, Databases, Books, Articles etc, but could it be made
more intelligible in terms of subjects, topics and concepts?
How – Integrate trusted, industry-standard quantitative performance metrics over
time with descriptive taxonomic data by DOI/URI (see the join sketch below)
Outcome – THM flags and predicts which topics are significantly increasing or
decreasing in usage
Future Objectives – Segment by further available dimensions (Institution, Geo, Title, Funder etc)
Feed performance data into predictive models
Integrate with SiQ’s PSI Metrics OA usage reporting to qualify open access
topic usage models
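A minimal sketch of the integration step described under "How", assuming the COUNTER item-level usage and the publisher taxonomy arrive as simple extracts keyed by DOI (pandas and the column names are illustrative assumptions, not SiQ's actual pipeline):

    import pandas as pd

    # COUNTER item-level usage: one row per DOI per month (illustrative columns)
    usage = pd.read_csv("item_usage.csv")        # doi, month, downloads
    # Publisher taxonomy: one row per DOI per topic (illustrative columns)
    taxonomy = pd.read_csv("taxonomy.csv")       # doi, topic

    # Join quantitative usage with descriptive taxonomic data by DOI,
    # then aggregate to topic-level monthly downloads.
    topic_usage = (
        usage.merge(taxonomy, on="doi", how="inner")
             .groupby(["topic", "month"], as_index=False)["downloads"]
             .sum()
    )

Note that an inner join silently drops DOIs that carry usage but have no topic assignment, which is exactly the coverage gap flagged later under outputs.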
Data
COUNTER Compliant Usage – Downloads per article per month over time
Publisher Taxonomic data – Topic identifiers by DOI
Calculation –
relative change = (normalized downloads for month − normalized monthly average for year) / (normalized monthly average for year)
Map Outliers Above and Below Zero Change
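A minimal sketch of the relative change calculation and the outlier mapping, continuing from the topic-level table above. The slides do not specify the normalization or the outlier rule, so normalizing by article count per topic and flagging with a z-score threshold are assumptions:

    import pandas as pd

    # Illustrative columns: topic, month ("2020-01"), downloads, n_articles
    topic_usage = pd.read_csv("topic_usage.csv")

    # Assumed normalization: downloads per article carrying the topic, so large
    # and small topics are comparable.
    topic_usage["norm_downloads"] = topic_usage["downloads"] / topic_usage["n_articles"]

    # relative change = (normalized downloads for month - normalized monthly
    # average for year) / normalized monthly average for year
    topic_usage["year"] = topic_usage["month"].str[:4]
    yearly_avg = topic_usage.groupby(["topic", "year"])["norm_downloads"].transform("mean")
    topic_usage["relative_change"] = (topic_usage["norm_downloads"] - yearly_avg) / yearly_avg

    # Map outliers above and below zero change (assumed rule: |z| > 2 within each month).
    by_month = topic_usage.groupby("month")["relative_change"]
    z = (topic_usage["relative_change"] - by_month.transform("mean")) / by_month.transform("std")
    topic_usage["outlier"] = z.abs() > 2
    increasing = topic_usage[topic_usage["outlier"] & (topic_usage["relative_change"] > 0)]
    decreasing = topic_usage[topic_usage["outlier"] & (topic_usage["relative_change"] < 0)]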
Topic Health Monitor – decreasing
Data Insights
Topic Health Monitor – increasing
Data Insights
Topic Health Monitor – enhanced reporting and analysis
[Chart: monthly topic usage, January to November]
Topic Health Monitor – segmentation by Account
Account Health Monitor
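The Account Health Monitor view can be read as the same calculation with one extra grouping dimension; a hedged sketch, assuming the usage rows also carry an institutional account identifier (column names illustrative):

    import pandas as pd

    # Illustrative columns: account, topic, month, norm_downloads
    acct = pd.read_csv("topic_account_usage.csv")
    acct["year"] = acct["month"].str[:4]

    # Re-run the relative change per (account, topic) so each account gets its
    # own topic health view.
    yearly_avg = acct.groupby(["account", "topic", "year"])["norm_downloads"].transform("mean")
    acct["relative_change"] = (acct["norm_downloads"] - yearly_avg) / yearly_avg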
Topic Health Monitor – outputs
Showed the rate of change in topic usage, with speed and direction of travel, and with greater
intelligibility than reporting by Titles, Articles, Books etc
Showed quantitative performance metrics feeding ongoing reporting and analytics, including
trajectory and predictive models
Showed segmentation and custom analysis for targeted questions, such as topic usage
over time for particular clients
Identified opportunities for gap analysis between usage, search terms, content acquisition etc
Identified opportunities for topic clustering for recommendations as well as behavioural
customer segmentation (a clustering sketch follows below)
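The clustering opportunity named in the last item is only sketched here as one possible approach: k-means over each topic's monthly usage profile, using scikit-learn, with an arbitrary cluster count (an assumption for illustration, not the method actually used):

    import pandas as pd
    from sklearn.cluster import KMeans

    # Illustrative columns: topic, month, norm_downloads
    topic_usage = pd.read_csv("topic_usage.csv")

    # One row per topic, one column per month: the topic's usage profile over time.
    profiles = topic_usage.pivot_table(
        index="topic", columns="month", values="norm_downloads", fill_value=0
    )

    # Group topics with similar trajectories; k=5 is arbitrary and would need
    # validation before driving recommendations or customer segmentation.
    labels = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(profiles)
    clusters = pd.Series(labels, index=profiles.index, name="cluster")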
Topic Health Monitor – outputs
BUT – identified significant and meaningful skew in the taxonomic data
Missing or erroneous taxonomic data across the content carrying usage meant that reporting,
analysis and data models were NOT reliable (see the coverage check below)
So explored solutions to obtain more complete subject, topic and concept data to integrate into
the THM
UNSILO partnership with SiQ to mine concepts associated with DOI/URI directly from content
Returned to the test phase with joint customer data
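The skew finding is, at bottom, a coverage check between the content carrying usage and the taxonomy; a minimal sketch of such a check, reusing the illustrative tables from earlier:

    import pandas as pd

    usage = pd.read_csv("item_usage.csv")      # doi, month, downloads (illustrative)
    taxonomy = pd.read_csv("taxonomy.csv")     # doi, topic (illustrative)

    # DOIs that have usage but no topic assignment: the gap that made topic-level
    # reporting, analysis and data models unreliable.
    used_dois = set(usage["doi"])
    tagged_dois = set(taxonomy["doi"])
    untagged = used_dois - tagged_dois
    coverage = 1 - len(untagged) / len(used_dois)
    print(f"{coverage:.1%} of DOIs with usage have at least one topic assigned")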
Lessons Learned – know your:
Objectives – Set clear, contextual reasons for doing it
Questions – What answers do I need to find and what are the boundaries of what these
answers might tell me?
Data – What data do I need to answer these questions? Where does the underlying data come
from? Can different sources work together? What are the metrics that are showing value for us?
Where does this data come from? Do I have comprehensive enough data or are there gaps/skews?
Can it be replicated and standardised?
Compliance – Can this data be used this way and what governance should be in place?
Actions – Understand the underlying reasons for results before taking action
Topic Health Monitor – next steps
Improved harvesting of content/concept data –
Integration with SiQ PSI Metrics OA Usage Reporting –
Enhanced, Integrated Topic Health Monitor –
Predictive modelling, recommendations, ???
Stuart Maxwell
COO, Scholarly iQ
stuart.maxwell@scholarlyiq.com
Phone: +44 (0) 7580 723230
www.scholarlyiq.com
