Text Mining for Competitive Intelligence
OSFair 2017
Manuel Noya
September 7, 2017, Athens
Who I am
Researcher turned entrepreneur:
BSc Chemical Engineering (USC, Spain) and MSc Materials Science (UPM, Spain).
Know-how: technology scouting, competitive intelligence, NPD (new product
development), early-stage startups and innovation strategy.
Researcher for NEOKER (Spain), an award-winning spin-off from USC (Spain) now scaling up in China, and for SRI International (CA), with over 3 years of R&D projects in materials science at SRI International (Menlo Park, CA).
Cofounded Linknovate.com in May 2012 in Palo Alto (CA) and went through the Stanford University accelerator program (StartX) in 2013.
The company provides clients such as BMW AG and REPSOL with competitive
intelligence software tools.
CEO since 2014; Linknovate is now a 7-person company growing internationally.
Our credentials
We founded Linknovate in 2012 at Stanford University, CA.
We have been awarded 3 EU projects (grants) by the European Commission (EC) and have gone through 2 of the most prestigious startup accelerators in the world.
Some of our satisfied clients:
Practice today
Problem today
Experts funnel: "Biometric Sensors"
826 experts identified in over 460 organizations worldwide
Over 560 organizations
• 826 identified
• 504 contacted
• 158 positive answers
• 66 detailed info
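The slide gives only the stage counts, but the funnel's conversion rates follow directly from them. A minimal Python sketch that derives them (the function name `conversion_rates` is illustrative, not part of any Linknovate tool):

```python
# Stage counts taken from the "Biometric Sensors" experts funnel.
FUNNEL = [
    ("identified", 826),
    ("contacted", 504),
    ("positive answers", 158),
    ("detailed info", 66),
]


def conversion_rates(stages):
    """Return (stage, count, % of previous stage, % of first stage) for each stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, 100.0 * count / previous, 100.0 * count / first))
        previous = count
    return rows


if __name__ == "__main__":
    for name, count, step, overall in conversion_rates(FUNNEL):
        print(f"{name:<17}{count:>5}{step:>8.1f}%{overall:>8.1f}%")
```

On these figures, about 61.0% of the identified experts were contacted, 31.3% of those contacted answered positively, and 41.8% of the positive respondents provided detailed information, i.e. roughly 8.0% of the initial 826.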
Use Case: "Insects as an Alternative Source of Proteins"
User-generated data (detailed, fresh, assessing willingness). Expert profile: Aspire Food Group
1. Technology maturity (2 aspects)
• Insects as a protein source: TRL 8 or higher. Already on the market.
• Industrial processing of insects: TRL 7. Less than 1.5 years to market.
2. Fresh user-generated data: "We focus on bringing modern agricultural techniques to the farming of insects. Currently our focus is primarily on the production of insects as food and as an ingredient."
"We have been focused on producing insects for food for over three years. With regard to TRL, we have products on the market, so arguably TRL8-9."
3. Willingness: "Joint tech development / R&D collaboration; Partnerships - e.g. grant applications like NSF, DOE or EU projects; Consider an investment (equity or other). We're open to discussing opportunities."
4. Companies/groups recommended: "(…) the only North American company that we have seen any credible results from is Tiny Farms, though there are indications that at least one of the established CPG companies has [begun] making significant investments in R&D. Looking further afield, there are a number of companies in Europe (particularly France and the Netherlands) that seem sophisticated."
Email: gm@aspirefg.com
Use Case: "Biometric Sensors"
User-generated data (classified in categories). Companies' visualization
[Figure: universities, research centers and companies working on biometric sensors, grouped by technology category (camera-based, wearables, others) and plotted by relevance against TRL, from applied research and validated proof of concept up to prototypes in laboratory and higher-level environments.]
Market - Clients
Text Mining [unstructured text]. Insights
Ranking of Active Entities Worldwide
Ranking of Active Entities Worldwide (II)
Works on systems that reduce irrelevant information during searches.
Works on text-mining-based bioassay neighboring analysis, either as a standalone tool or as a complement to the PubChem bioassay neighboring process, to enable efficient integration of assay results and to generate hypotheses about the bioactivities of the tested reagents.
Ranking of Active Entities Worldwide (III)
Study investigating the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. Methods used: two biomedical concept normalization systems, MetaMap and Peregrine, each with and without a rule-based NLP module.
Interested in the automated extraction of useful biomedical information from unstructured text, especially in named entity recognition and relationship extraction as fundamental approaches relevant to systems biology.
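To illustrate what dictionary-based concept normalization does, and how a rule-based NLP step can act as an adjunct to it, here is a toy Python sketch. It is not MetaMap or Peregrine: the dictionary entries, the `CONCEPT:*` identifiers and the single plural-stripping rule are all hypothetical.

```python
import re

# Hypothetical concept dictionary: surface form -> concept ID.
# Real systems rely on large curated thesauri; these entries are invented.
DICTIONARY = {
    "heart attack": "CONCEPT:MI",
    "myocardial infarction": "CONCEPT:MI",
    "high blood pressure": "CONCEPT:HTN",
    "hypertension": "CONCEPT:HTN",
    "aspirin": "CONCEPT:ASA",
}
# Longest dictionary entry, in tokens, bounds the lookup window.
MAX_LEN = max(len(term.split()) for term in DICTIONARY)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def singularize(token):
    """Naive rule: strip a trailing plural 's' from tokens longer than 3 chars (not '-ss')."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(text, use_rules=False):
    """Greedy longest-match dictionary lookup; returns (matched term, concept ID) pairs."""
    tokens = tokenize(text)
    if use_rules:
        tokens = [singularize(t) for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                found.append((phrase, DICTIONARY[phrase]))
                i += n
                break
        else:
            i += 1  # no match starting here; move on one token
    return found
```

With `use_rules=False`, "heart attacks" is missed because only the singular form is in the dictionary; with the rule enabled it is mapped to `CONCEPT:MI`, the kind of effect that comparing a normalizer with and without an NLP module is meant to expose.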
Insights from Linknovate.com
Academic sources dominate over 'industrial signals'.
Are we still waiting for the 'great' stuff to land?
Algorithmics dominates over 'Document Classification', 'Ontologies', 'Feature Analysis', 'Rule Extraction', etc.
Insights from Linknovate.com (II)
High academic activity, with universities as the most involved organisations.
Highlight: small and medium-sized companies are the most active in 2017…
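Statements such as "universities are the most involved" and "small and medium-sized companies are the most active in 2017" come down to counting documents per organization type, overall and per year. A minimal Python sketch, assuming hypothetical (organization, type, year) records; it is not Linknovate's actual data or ranking method:

```python
from collections import Counter

# Hypothetical document records: (organization, organization type, year).
# In practice these would be mined from publications, grants, patents, news, etc.
RECORDS = [
    ("Univ A", "university", 2016),
    ("Univ A", "university", 2017),
    ("Univ B", "university", 2016),
    ("Univ B", "university", 2017),
    ("Lab C", "research center", 2017),
    ("SME D", "small-medium company", 2017),
    ("SME D", "small-medium company", 2017),
    ("SME E", "small-medium company", 2017),
    ("Corp F", "large company", 2016),
]


def activity_by_type(records, year=None):
    """Count documents per organization type, optionally restricted to one year."""
    return Counter(org_type for _, org_type, y in records if year is None or y == year)


def rank_entities(records):
    """Rank individual organizations by document count, most active first."""
    return Counter(org for org, _, _ in records).most_common()


if __name__ == "__main__":
    print(activity_by_type(RECORDS).most_common())
    print(activity_by_type(RECORDS, year=2017).most_common())
    print(rank_entities(RECORDS))
```

With these made-up records, universities lead overall (4 documents) while small-medium companies lead in 2017 (3 documents), which is the shape of the two insights above.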
Global Data Comparison
Comparing complementary search engines: results show similar trends across all 3 engines and data sources.
Sources
• Linknovate (publications, conf proceedings, grants, patent apps, news, web monitoring)
• PubMed (publications, conf proceedings)
• Google Patents (patents)
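One simple way to check a claim such as "similar trends in all 3 engines" is to correlate the yearly result counts returned by each source. The Python sketch below uses hypothetical counts (the slide does not give the underlying numbers) and Pearson correlation, which is an illustrative choice of measure rather than necessarily the one behind this comparison.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical yearly result counts for the same query (same years, same order).
YEARLY_COUNTS = {
    "Linknovate": [120, 150, 190, 240, 300],
    "PubMed": [80, 95, 130, 160, 210],
    "Google Patents": [30, 36, 45, 60, 72],
}


def trend_similarity(series):
    """Pairwise Pearson correlation between every pair of sources."""
    return {(a, b): pearson(series[a], series[b]) for a, b in combinations(series, 2)}


if __name__ == "__main__":
    for pair, r in trend_similarity(YEARLY_COUNTS).items():
        print(pair, round(r, 3))
```

Correlation compares the shape of the trends, not their magnitude, which suits sources of very different sizes such as a patent database and a publication index.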
Text Mining [unstructured text]. Academic Key-players
Stanford University – U.S.A.
https://www.stanford.edu/
References in this topic:
• Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining.
• A Framework for the Automatic Extraction of Rules from Online Text. This paper presents a general-purpose framework for acquiring more complex relationships from text and then encoding this knowledge as rules.
• Learning the Structure of Biomedical Relationships from Unstructured Text. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure.
Stanford University Network
Elon University – U.S.A.
https://www.elon.edu/home/
CRI: RUI: CI-EN: Infrastructure to Enable Mining and
Analysis of Open Source Software Engineering
Artifacts
Awarded an NSF grant in 2014.
This NSF CRI-supported Research at Undergraduate Institutions (RUI) project will integrate, expand and enhance several distinct data sources currently used by three research communities: those who study Free, Libre, and Open Source Software (FLOSS), the larger empirical software engineering research community, and researchers engaged in data mining and text mining.
Old Dominion University – U.S.A.
https://www.odu.edu/
Social media competitive analysis and text mining: A case study in the pizza industry
Collaboration with James Madison University on a publication about competitive analysis of the content that the customers of three companies generate on social networks.
This paper describes an in-depth case study which applies text mining to analyze unstructured text content on Facebook and Twitter sites of the three largest pizza chains: Pizza Hut, Domino's Pizza and Papa John's Pizza. The results reveal the value of social media competitive analysis and the power of text mining as an effective technique to extract business value from the vast amount of available social media data.
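The core idea of such a study, comparing what customers say about competing brands, can be illustrated with a minimal term-frequency sketch in Python. This is a simplification and not the method of the cited paper; the brand labels, posts and stop-word list are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (a real analysis would use a fuller one).
STOPWORDS = {"the", "a", "and", "is", "was", "my", "i", "to", "of", "it", "for", "so", "on"}

# Hypothetical customer posts per brand.
BRAND_POSTS = {
    "Brand A": ["Delivery was late again", "late delivery and cold pizza", "Great deal on delivery"],
    "Brand B": ["Love the new crust", "crust was perfect", "new menu is great"],
}


def top_terms(posts, k=3):
    """Return the k most frequent non-stop-word terms across a brand's posts."""
    counts = Counter(
        word
        for post in posts
        for word in re.findall(r"[a-z']+", post.lower())
        if word not in STOPWORDS
    )
    return counts.most_common(k)


if __name__ == "__main__":
    for brand, posts in BRAND_POSTS.items():
        print(brand, top_terms(posts))
```

Even this crude view separates the brands: complaints about delivery dominate Brand A, while product terms ("new", "crust") dominate Brand B.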
University of Illinois at Urbana - Champaign – U.S.A.
http://illinois.edu/
SocialCube: A Text Cube Framework for Analyzing Social Media Data
In this paper we introduce a text cube architecture designed to organize social media data in multiple dimensions and hierarchies for efficient information query and visualization from multiple perspectives.
Collaboration with Cornell University and the company Intelligent Automation, Inc. on a publication about studying social and cultural behaviors through content generated by social network users.
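To make the text-cube idea concrete, the Python sketch below aggregates hypothetical posts along two dimension hierarchies (location: city to country; time: day to month) and "rolls up" simply by choosing a coarser level. It is a toy in-memory illustration, not the SocialCube implementation; the posts and the `LEVELS` table are invented.

```python
from collections import defaultdict

# Hypothetical social media posts: (city, country, date "YYYY-MM-DD", text).
POSTS = [
    ("Athens", "Greece", "2017-09-07", "great talk on text mining"),
    ("Athens", "Greece", "2017-09-08", "open science fair day two"),
    ("Madrid", "Spain", "2017-09-07", "text mining workshop slides"),
    ("Vigo", "Spain", "2017-08-30", "reading about text mining"),
]

# Each level of a dimension hierarchy maps a post to a cell coordinate.
LEVELS = {
    "city": lambda p: p[0],
    "country": lambda p: p[1],
    "day": lambda p: p[2],
    "month": lambda p: p[2][:7],
}


def cube(posts, dims, keyword=None):
    """Count posts (optionally only those containing keyword) per cell of the chosen levels."""
    cells = defaultdict(int)
    for post in posts:
        if keyword is None or keyword in post[3]:
            cells[tuple(LEVELS[d](post) for d in dims)] += 1
    return dict(cells)


if __name__ == "__main__":
    print(cube(POSTS, ["city", "day"], keyword="text mining"))      # fine-grained view
    print(cube(POSTS, ["country", "month"], keyword="text mining"))  # rolled-up view
```

A real text cube would precompute and index these aggregates and support richer text measures, but the roll-up and slice operations have the same shape.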
Text Mining [unstructured text]. Industry Key-players
IBM – U.S.A.
https://www.ibm.com/us-en/
Towards comprehensive longitudinal healthcare data capture
Collaboration with Wright State University on a publication about the use of text mining on unstructured clinical texts.
In this work, we explore a pattern-based approach for extracting Smoker Semantic Types (SST) from unstructured clinical notes.
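A pattern-based extractor of this kind can be sketched with regular expressions. The Python example below is a hypothetical simplification: the three status labels and the patterns are invented for illustration and are not the Smoker Semantic Types or rules from the IBM and Wright State University work.

```python
import re

# Hypothetical patterns, checked in order: the first matching label wins,
# so negations and past-tense forms are tested before current-smoker cues.
PATTERNS = [
    ("non-smoker", re.compile(r"\b(never smoked|non-?smoker|denies (tobacco|smoking))\b", re.I)),
    ("former smoker", re.compile(r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b", re.I)),
    ("current smoker", re.compile(r"\b(current smoker|smokes|smoking \d+ (packs?|cigarettes))\b", re.I)),
]


def smoking_status(note):
    """Return the first label whose pattern matches the clinical note, else 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(note):
            return label
    return "unknown"


if __name__ == "__main__":
    for note in ["Patient is a non-smoker.", "Quit smoking in 2010.", "Smokes 1 pack per day."]:
        print(note, "->", smoking_status(note))
```

Pattern order is the main design decision here: without it, a phrase such as "quit smoking" could be misread by a looser current-smoker rule.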
IBM Network
Linguamatics – U.K.
https://www.linguamatics.com/
Linguamatics: deploying innovative NLP text mining software for high-value knowledge discovery and decision support.
KBSI (Knowledge Based Systems, Inc.) – U.S.A.
https://www.kbsi.com/
SOME PARTNERS
Amenity Analytics – U.S.A.
http://www.amenityanalytics.com/
$7.6M raised in August 2017
INVESTORS
Amenity Analytics provides a next-generation text-mining AI platform: a leading-edge text analytics platform that allows customers to identify actionable signals in unstructured data.
AYLIEN – Ireland
http://aylien.com/
$1.14M raised in total ($580k in March 2016)
Aylien is an artificial intelligence startup that focuses on creating technologies that help machines understand humans better. The firm provides text analysis and news APIs that allow users to make sense of human-generated content at scale. It also provides a range of content analysis solutions to developers, data scientists, marketers and academics.
INVESTORS
Lexalytics – U.S.A.
https://www.lexalytics.com/
Lexalytics transforms global conversations into meaningful and actionable insights. Their leading text analysis platforms process billions of pieces of unstructured data, translating thoughts and feelings into profitable decisions for their customers. Lexalytics helps companies implement vital feedback and monitoring programs that create an ongoing dialogue with their customers.
Bitext – U.S.A.
https://www.bitext.com/
$900k raised in January 2015
Bitext develops multilingual analytics technology in 30 languages. The company takes a linguistics-based approach to text analysis, using linguistic knowledge as its scientific foundation.
Text & Data Mining for Competitive Intelligence
manuel@linknovate.com | skype: manu_noia
Thank you!
