Baker Tilly refers to Baker Tilly Virchow Krause, LLP,
an independently owned and managed member of Baker Tilly International. © 2012 Baker Tilly Virchow Krause, LLP
Baker Tilly
Management Consulting
Realizing Business Value from Unstructured Data
THE VALUE OF UNSTRUCTURED DATA ANALYTICS
There is a tremendous opportunity to gain a competitive advantage by analyzing unstructured data
Industries continue to struggle with integrating unstructured analytics into their business models. It is
time-consuming to identify all of the relevant data sources and technically challenging to load the data into an
analytics environment, where additional processing must occur before the data can be analyzed.
Leveraging a Broad Variety of Data:
Companies must be able to transform and parse data from multiple sources and in multiple formats:
databases, text files, scientific devices, transactions, and even social media postings. End users also need easy,
consistent access to all of this data to create a 360-degree view of their customers, their products, or their
brand.
The Value of Unstructured Data Analytics
Unlocking the True Potential of Big Data
The Value of Unstructured Data Analytics
Mapping data sources to use-cases
Business use-cases that can benefit from an analysis of unstructured data include:
• Clinical Trial Development Analysis: Expedite the analysis of patient diary and Patient Reported Outcome data to reduce
time-to-market (and potentially uncover unanticipated benefits in early-stage trials)
• Clinical Trial PRO Development: Analyzing publicly available discussion forum data can accelerate the development of
Patient Reported Outcome measures and streamline the FDA’s protocol review process
• Active Market Surveillance (Pharmacovigilance): Are patients using and experiencing your product in a manner that is
consistent with your Clinical Trial data?
• Market Intelligence: Understanding how your customers are describing their experiences with specific medications can
inform market positioning and facilitate targeted messaging
• Labelling Claim Expansion: Are there unanticipated applications and benefits that are being articulated by customers
that can be used to inform programmatic expansion of an existing compound?
Data Sources Include:
Clinical Research
- PubMed
- www.clinicaltrials.gov
- FDA.gov
Patient Support Sites
- patientslikeme.com
- dailystrength.org
- askapatient.com
Social Media Platforms
- Reddit
- Twitter
- Facebook
Internal
- Clinical Trial Data (e.g., Patient Diaries)
- Call Center Notes
- Documents
The abundance of data provides tremendous opportunity…
And an overwhelming amount of data points:
We can help separate meaningful from meaningless
MAPPING DATA SOURCES TO USE CASES
The Value of Unstructured Data Analytics
Availability of Data
Relevant data is readily available:
CASE STUDY: PHARMACOVIGILANCE
Case-Study: Pharmacovigilance
Data Source: Reddit
• 234M Unique Users
• 853,824 Subreddits
• 11,464 Active Communities
• 217 Countries
• 8 Billion Page Views Monthly
• 13+ Minutes Spent on Average
Sample Use-Case: Pharmacovigilance
Data Source: www.reddit.com
Analyzing all of the post titles can yield value…
But analyzing the conversations people are having, along with associated metadata such as post date and
number of comments, can be far more powerful
The Challenges with Analyzing Externally Sourced Unstructured Data:
• There are thousands of posts and tens of thousands (or more) of comments
• Without technology and a methodical text mining process, gaining insight would require manual review
and data collection. Mining insights from the data manually would take on the order of months, making
it difficult to impact business decisions
• To source the data we wrote a Python script to crawl the site and scrape
the data
• We ran a query on Reddit, using ‘Lipitor’ as the search term and analyzed
the results using Python and Oracle Big Data Discovery
• The following are some visualizations and insights we were able to glean
from the data.
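The sourcing step described above could be sketched roughly as follows. This is not the script used for the case study. It is a minimal illustration that assumes Reddit's public JSON search endpoint (`https://www.reddit.com/search.json`) and its standard listing layout. The function names, the selected fields, and the user-agent string are our own illustrative choices.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public JSON search endpoint; the deck does not say which
# endpoint or library the original script used.
SEARCH_URL = "https://www.reddit.com/search.json"


def build_search_url(term, limit=100, after=None):
    """Build the URL for one page of search results."""
    params = {"q": term, "limit": limit}
    if after:
        params["after"] = after  # pagination cursor returned with the previous page
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_listing(payload):
    """Flatten a listing payload into post records plus the next-page cursor."""
    data = payload.get("data", {})
    posts = []
    for child in data.get("children", []):
        post = child.get("data", {})
        posts.append({
            "title": post.get("title", ""),
            "subreddit": post.get("subreddit", ""),
            "created_utc": post.get("created_utc"),
            "num_comments": post.get("num_comments", 0),
            "permalink": post.get("permalink", ""),
        })
    return posts, data.get("after")


def fetch_page(term, after=None, user_agent="unstructured-data-demo/0.1"):
    """Download and parse one page of results (requires network access)."""
    request = urllib.request.Request(
        build_search_url(term, after=after),
        headers={"User-Agent": user_agent},  # hypothetical user-agent string
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_listing(json.load(response))
```

Calling `fetch_page("Lipitor")` repeatedly, passing the returned cursor back in as `after`, would walk through the result pages. Keeping the parsing separate from the download makes the record format easy to check without network access.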
We are able to view a quick top-line summary of the data set and KPIs:
And a distribution of where posts have been submitted
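In the case study these views were produced in Oracle Big Data Discovery. The same kind of top-line summary can be approximated in a few lines of Python. The sketch below assumes each post is a dictionary with `subreddit` and `num_comments` keys. That record shape, the KPI names, and the function name are illustrative assumptions, not the tool's actual output.

```python
from collections import Counter


def summarize_posts(posts):
    """Compute top-line KPIs and the distribution of posts by subreddit.

    Assumes each post is a dict with 'subreddit' and 'num_comments' keys
    (a hypothetical record shape chosen for this sketch).
    """
    total_posts = len(posts)
    total_comments = sum(p["num_comments"] for p in posts)
    by_subreddit = Counter(p["subreddit"] for p in posts)
    return {
        "posts": total_posts,
        "comments": total_comments,
        "avg_comments_per_post": total_comments / total_posts if total_posts else 0.0,
        "subreddits": len(by_subreddit),
        "top_subreddits": by_subreddit.most_common(5),
    }
```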
Symptoms that people discuss, buried in the comments section of the posts, have been tagged,
aggregated and visualized in a Tag Cloud:
And we can see how the volume of comments about the symptoms has changed over time:
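A simple way to approximate this tagging step is dictionary-based matching. The sketch below is not the text analytics method used in the case study. It assumes a small, hypothetical symptom lexicon and comment records with `body` and `created_utc` keys. It counts how many comments mention each term (the input to a tag cloud) and how many symptom-bearing comments appear per month.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Hypothetical lexicon for illustration only; a real project would use a
# curated medical vocabulary rather than a hand-written list.
SYMPTOM_TERMS = ["muscle pain", "fatigue", "memory loss", "headache"]

_SYMPTOM_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SYMPTOM_TERMS) + r")\b",
    re.IGNORECASE,
)


def tag_symptoms(text):
    """Return the sorted, de-duplicated symptom terms mentioned in a text."""
    return sorted({match.lower() for match in _SYMPTOM_PATTERN.findall(text)})


def symptom_counts(comments):
    """Count the number of comments mentioning each symptom term."""
    counts = Counter()
    for comment in comments:
        counts.update(tag_symptoms(comment["body"]))
    return counts


def monthly_symptom_volume(comments):
    """Count symptom-bearing comments per calendar month (UTC)."""
    volume = Counter()
    for comment in comments:
        if tag_symptoms(comment["body"]):
            created = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
            volume[created.strftime("%Y-%m")] += 1
    return dict(sorted(volume.items()))
```

Exact-phrase matching misses misspellings and paraphrases, which is one reason a methodical text mining process matters for data like this.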
We are able to see distribution of comments by location…
And limit our analysis to a geographic location of interest. Our summary data updates automatically based on
this refinement:
We can set up alerts that tell us when Pfizer products are mentioned:
And configure the alerts to show us the terms that were used to flag them:
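The alerting idea can be illustrated with a short watchlist matcher. This is a sketch under stated assumptions, not the alert feature of the tool shown in the deck. The watchlist entries, the comment record shape (`id` and `body` keys), and the function name are illustrative.

```python
import re

# Illustrative watchlist; a real deployment would load the full product list.
WATCHLIST = ["Lipitor", "atorvastatin"]


def build_alerts(comments, watchlist=WATCHLIST):
    """Flag comments that mention a watched term and record which terms matched.

    Assumes each comment is a dict with 'id' and 'body' keys (hypothetical shape).
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in watchlist
    }
    alerts = []
    for comment in comments:
        matched = [term for term, pattern in patterns.items()
                   if pattern.search(comment["body"])]
        if matched:
            alerts.append({
                "id": comment["id"],
                "matched_terms": matched,  # the terms that triggered the alert
                "excerpt": comment["body"][:80],
            })
    return alerts
```

Returning the matched terms alongside each alert mirrors the configuration described above, where reviewers can see why a comment was flagged.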
Users have complete visibility into the source data and finding key words and phrases is facilitated by
powerful search technology:
Summary
• Robust publicly available unstructured data provides opportunities to inform multiple use-cases, including:
- Pharmacovigilance
- Competitor Analysis
- Market Research
- Expedited Clinical Trial End-Point Development
• For most companies these data points represent a difficult ‘aspirational’ data source for inclusion in Business Processes
• Barriers include:
- Identifying the relevant publicly available data sources
- Technical challenges associated with sourcing the data
- Methodology/Technical approach to generating insights (Text Analytics)
- Integrating insights into Business Processes
• Baker Tilly can help!
Proposed next steps
• Custom Demo Development
- Conduct a half-day onsite Discovery working session
- Define a high-value use-case for the demo
- Identify 2-3 high-value unstructured sources for inclusion in the demo
- Develop 4-5 visualizations to demonstrate value and surface insights
The Value of Unstructured Data Analytics
Summary & Proposed Next Steps
Interested in learning more? Contact Andrew Malinow, PhD

Life Science Analytics


Editor's Notes

  • #4 https://www.informatica.com/content/dam/informatica-com/global/amer/us/collateral/executive-brief/big-data-pharmaceutical-industry_ebook_2341.pdf