OSFair2017 Workshop | Text mining


Manuel Noya talks about the science-industry relationship driven by competitive intelligence and how to surf emerging technologies

Workshop title: TDM unlocking a goldmine of information

Training overview:
Text and Data Mining (TDM) is a natural ‘next step’ in open science. It can lead to new and unexpected discoveries and increase the impact of publications and repositories. This workshop showcases examples of successful TDM and infrastructural solutions for researchers. We will also discuss what is needed to make the most of these infrastructures and how publishers and repositories can open up their content.


Published in: Science
OSFair2017 Workshop | Text mining

  1. Competitive intelligence. Text Mining. OSFair 2017. Manuel Noya. Sept 7, 2017. Athens
  2. Who I am. Researcher turned entrepreneur: BSc Chemical Engineering (USC, Spain) and MSc Materials Science (UPM, Spain). Know-how: technology scouting, competitive intelligence, NPD (new product development), early-stage startups and innovation strategy. Researcher for NEOKER (Spain), an award-winning spin-off from USC (Spain) now scaling up in China, and for SRI International (Menlo Park, CA), working on R&D projects in materials science for over 3 years. Cofounded Linknovate in May 2012 in Palo Alto (CA) and went through the Stanford University accelerator program (StartX) in 2013. The company provides clients such as BMW AG and REPSOL with competitive intelligence software tools. CEO since 2014, now a 7-person company growing internationally.
  3. Our credentials. We founded Linknovate in 2012 at Stanford University, CA. We have been awarded 3 EU projects (grants) by the EC and have gone through 2 of the most prestigious startup accelerators in the world. Some of our satisfied clients:
  4. Practice today. Problem today.
  5. Experts funnel, “Biometric Sensors”: 826 identified, 504 contacted, 158 positive answers, 66 detailed info. Over 560 organizations. 826 experts identified in over 460 organizations worldwide.
  6. Aspire Food Group
  7. User case: “Insects as an Alternative Source of Proteins”. User-generated data (detailed, fresh, assessing willingness). Expert profile. Email:
     1. Technology maturity (2 aspects). Insects as protein source: TRL 8 or higher, already on the market. Industrial processing of insects: TRL 7, less than 1.5 years from reaching the market.
     2. Fresh user-generated data: “We focus on bringing modern agricultural techniques to the farming of insects. Currently our focus is primarily on the production of insects as food and as an ingredient.” “We have been focused on producing insects for food for over three years. With regard to TRL, we have products on the market, so arguably TRL8-9.”
     3. Willingness: “Joint tech development / R&D collaboration; Partnerships - e.g. grant applications like NSF, DOE or EU projects; Consider an investment (equity or other). We're open to discussing opportunities.”
     4. Companies/groups recommended: “(…) the only North American company that we have seen any credible results from is Tiny Farms, though there are indications that at least one of the established CPG companies has begun making significant investments in R&D. Looking further afield, there are a number of companies in Europe (particularly France and the Netherlands) that seem sophisticated.”
  8. User case: “Biometric sensors”. User-generated data (classified in categories). Companies' visualization. Chart: universities, research centers, companies and others, grouped by category (camera-based, wearables) and plotted by relevance against TRL (applied research, validated proof of concept, prototype in lab environment, later prototype stages).
  9. Market - Clients
  10. Text Mining [unstructured text]. Insights
  11. Ranking of Active Entities Worldwide
  12. Ranking of Active Entities Worldwide (II). Working on systems for reducing irrelevant information during searches. Works on text-mining-based bioassay neighboring analysis, as a standalone tool or as a complement to the PubChem bioassay neighboring process, to enable efficient integration of assay results and generate hypotheses for the discovery of bioactivities of the tested reagents.
  13. Ranking of Active Entities Worldwide (III). Study investigating the usefulness of natural language processing (NLP) as an adjunct to dictionary-based concept normalization. Methods used: two biomedical concept normalization systems, MetaMap and Peregrine, with and without the use of a rule-based NLP module. Interested in automated extraction of useful biomedical information from unstructured text, especially in the importance of named entity recognition and relationship extraction as fundamental approaches that are relevant to systems biology.
  14. Insights from. Academic sources dominate over ‘industrial signals’. We may still be waiting for the ‘great’ stuff to land? Algorithmia over ‘Document Classification’, ‘Ontologies’, ‘Feature Analysis’, ‘Rule Extraction’, etc.
  15. Insights from (II). High academic activity, primarily with universities as the most involved organisations. Highlight: small-medium size companies are the most active in 2017…
  16. Global Data Comparison. Comparing search engines with complementary results. Results show similar trends in all 3 engines and data sources. Sources: Linknovate (publications, conf proceedings, grants, patent apps, news, web monitoring); PubMed (publications, conf proceedings); Google Patents (patents).
  17. Text Mining [unstructured text]. Academic Key-players
  18. Stanford University – U.S.A. References in this topic: (1) “Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art”. This article provides an overview of recent advances in pharmacovigilance driven by the application of text mining. (2) “A Framework for the Automatic Extraction of Rules from Online Text”. This paper presents a general-purpose framework for acquiring more complex relationships from text and then encoding this knowledge as rules. (3) “Learning the Structure of Biomedical Relationships from Unstructured Text”. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure.
  19. Stanford University Network
  20. Elon University – U.S.A. “CRI: RUI: CI-EN: Infrastructure to Enable Mining and Analysis of Open Source Software Engineering Artifacts”. Awarded an NSF grant in 2014. This NSF CRI supported Research at Undergraduate Institutions (RUI) project will integrate, expand and enhance several distinct data sources currently used by three research communities: those who study Free, Libre, and Open Source Software (FLOSS), the larger empirical software engineering research community, and researchers engaged in data mining and text mining.
  21. Old Dominion University – U.S.A. “Social media competitive analysis and text mining: A case study in the pizza industry”. Collaboration with James Madison University on a publication about the competitive analysis that three companies carry out on the content their customers generate on social networks. This paper describes an in-depth case study which applies text mining to analyze unstructured text content on Facebook and Twitter sites of the three largest pizza chains: Pizza Hut, Domino's Pizza and Papa John's Pizza. The results reveal the value of social media competitive analysis and the power of text mining as an effective technique to extract business value from the vast amount of available social media data.
  22. University of Illinois at Urbana-Champaign – U.S.A. “SocialCube: A Text Cube Framework for Analyzing Social Media Data”. Collaboration with Cornell University and the company Intelligent Automation, Inc. on a publication about studying social and cultural behaviors through the content generated by users of social networks. In this paper we introduce a text cube architecture designed to organize social media data in multiple dimensions and hierarchies for efficient information query and visualization from multiple perspectives.
  23. Text Mining [unstructured text]. Industry Key-players
  24. IBM – U.S.A. “Towards comprehensive longitudinal healthcare data capture”. Collaboration with Wright State University on a publication about the use of text mining on unstructured clinical texts. In this work, therefore, we explore a pattern-based approach for extracting Smoker Semantic Types (SST) from unstructured clinical notes.
  25. IBM Network
  26. Linguamatics – U.K. Linguamatics: deploying innovative NLP text mining software for high-value knowledge discovery and decision support.
  27. KBSI (Knowledge Based Systems, Inc.) – U.S.A. Some partners:
  28. Amenity Analytics – U.S.A. $7.6M of raised funds in August 2017. Amenity Analytics provides a next-generation text-mining AI platform: a leading-edge text analytics platform that allows customers to identify actionable signals from unstructured data. Investors:
  29. AYLIEN – Ireland. $1.14M of raised funds; $580k in March 2016. Aylien is an artificial intelligence startup that focuses on creating technologies that help machines understand humans better. The firm provides text analysis and news APIs that allow users to make sense of human-generated content at scale. They also provide a range of content analysis solutions to developers, data scientists, marketers and academics. Investors:
  30. Lexalytics – U.S.A. Lexalytics transforms global conversations into meaningful and actionable insights. Their leading text analysis platforms process billions of pieces of unstructured data, translating thoughts and feelings into profitable decisions for their customers. Lexalytics helps companies implement vital feedback and monitoring programs that create an ongoing dialogue with their customers.
  31. Bitext – U.S.A. $900k of raised funds in January 2015. Bitext develops multilingual analytics technology in 30 languages. The company takes a linguistics-based approach to text analysis, using linguistic knowledge as its scientific base.
  32. Text & Data Mining for Competitive Intelligence | skype: manu_noia. Thank you!
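
The experts funnel on slide 5 can be read as a series of conversion rates. Below is a minimal Python sketch, assuming the four counts on that slide are successive stages of a single funnel (the slide does not say so explicitly). `FUNNEL` and `conversion_rates` are illustrative names, not part of any tool mentioned in the talk.

```python
# Stage counts as listed on the "Experts funnel" slide (slide 5).
# Assumption: each stage is a subset of the previous one.
FUNNEL = [
    ("identified", 826),
    ("contacted", 504),
    ("positive answers", 158),
    ("detailed info", 66),
]


def conversion_rates(stages):
    """Return (name, count, share_of_previous_stage, share_of_first_stage) per stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows


if __name__ == "__main__":
    # Print one row per stage: name, count, step rate, overall rate.
    for name, count, step, overall in conversion_rates(FUNNEL):
        print(f"{name:<17}{count:>5}{step:>9.1%}{overall:>9.1%}")
```

With these counts, roughly 61% of identified experts were contacted, 31% of those contacted answered positively, and 42% of the positive answers came with detailed information, which is about 8% of the experts originally identified.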