Data Analytics and Industry-Academic Partnerships: An Irish Perspective


Published on

Business Intelligence Workshop (BIWO 2014) / Santiago, Chile / 14th August 2014

Published in: Technology


  1. 1. John Breslin - NUI Galway - @johnbreslin Data Analytics and Industry-Academic Partnerships: An Irish Perspective
  2. 2. First: Ireland and Chile! • John/Juan Garland, governor of Valdivia; Ambrosio O’Higgins (from Sligo), Chile governor and founder of first transcontinental postal service; his son Bernardo O’Higgins, first president of independent Chile; John/Juan McKenna and Thomond O’Brien, Chilean-Irish independence fighters; grandson Benjamin McKenna, writer and liberal politician; George O’Brien, founder of the Chilean navy; Patricio Lynch, Chilean-Irish naval hero (grandfather from Galway) and ancestor of Che Guevara; drill head used in rescue of Atacama miners was made in Ireland (County Clare) • for-chile.html (a colleague of mine from NUI Galway wrote this)
  3. 3. GALWAY TECH MAP (European Microcity of the Future) 1. Menlo Castle (origin of Menlo Park in Silicon Valley, California) 2. Computer Museum of Ireland (at DERI) 3. NUI Galway (where Stoney, namer of the “electron”, was a prof.) 4. Java’s (Zachary Quinto, AKA Spock, waited on tables here) 5. Claddagh (birthplace of the Claddagh Ring, and Angel from Buffy!) 6. Ignite TTO Business Innovation Centre 7. Galway Technology Centre 8. Innovation in Business Centre (at GMIT) 9. Marine Institute. Also marked: OnePageCRM, Insight, Ex Ordo. @johnbreslin @technologyvo @startupgalw #upgalway #gaillimhabu v1.201405271
  4. 4. NUI Galway introduction
  5. 5. NUI Galway in brief •  Established in 1845: •  One of Ireland’s seven universities •  105 hectare campus (260 acres) •  120 links with universities around the world •  17,300 students: •  12,500 undergraduates, 3,600 postgraduates, 1,200 other •  2,541 staff: •  1,078 academics, 1,015 admin and support, 448 research •  90,000 alumni in over a hundred countries
  6. 6. Famous alumni • Alice Perry, first female graduate engineer in the world, 1906 • Michael O’Shaughnessy, a Civil Engineering graduate from the University in the 1880s, was San Francisco Chief Engineer, and commissioned the Golden Gate Bridge • Honorary degrees to Nelson Mandela, Hillary Clinton • TV and movie star Martin Sheen (The West Wing’s President Bartlet) studied here in 2006/2007
  7. 7. The College of Engineering and Informatics • Includes the largest School of Engineering in Ireland (finished 2011, 14,000 square metres, €43 million) • Information Technology, Electrical and Electronic Engineering, Biomedical Engineering, Mechanical Engineering, Civil Engineering, DERI (now Insight)
  8. 8. Insight Centre for Data Analytics Incorporating DERI (Digital Enterprise Research Institute) at NUI Galway
  10. 10. 1 exabyte = 10^18 bytes = 1,000,000,000,000,000,000 bytes
  11. 11. 1 exabyte = 10^18 bytes: 20,000 x all of the printed material in the US Library of Congress. Or all of the words spoken by humans. Ever!
  12. 12. 1 exabyte = 10^18 bytes. But we now create this much information every 6 hours!
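As a rough back-of-the-envelope check of what that rate means, here is a small sketch. It assumes the decimal definition of an exabyte (10^18 bytes) and a steady rate of one exabyte every 6 hours, as stated on the slide; the variable names are illustrative only.

```python
# Assumptions: decimal exabyte (10**18 bytes) and a constant rate of
# one exabyte every 6 hours, taken from the slide.
EXABYTE_BYTES = 10**18
SECONDS_PER_HOUR = 3600

bytes_per_second = EXABYTE_BYTES / (6 * SECONDS_PER_HOUR)
exabytes_per_day = 24 / 6

print(f"{bytes_per_second / 1e12:.1f} terabytes per second")
print(f"{exabytes_per_day:.0f} exabytes per day")
```

Under these assumptions the rate works out to tens of terabytes every second.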
  13. 13. 1 exabyte = 10^18 bytes. Volume. Velocity. Variation. +Veracity.
  14. 14. A PARADIGM SHIFT [Diagram: algorithm and data]
  15. 15. Politics Epidemiology Sport
  16. 16. FROM PATTERNS TO PREDICTIONS From simple inertial sensing… to longitudinal gait analysis. Detecting gait-cycle deviations ⇒ falls prediction. Predicting functional decline.
  18. 18. TURNING DATA INTO DECISIONS Using sensor data to optimise forestry resources during harvesting. Stem volume prediction, yield management etc. Remote sensing + autonomous cutter control.
  19. 19. TURNING DATA INTO KNOWLEDGE Creating a network of knowledge. Data ⇒ Semantics. Semantics ⇒ Discovery, e.g. discovering links between drugs, genes, and diseases.
  20. 20. MINING PATTERNS FROM REAL-TIME SOURCES Physical shockwaves travel at 4.8 km/sec but knowledge of the earthquake travelled at 70 km/sec to Galway. 14:42 Earthquake strikes. 14:43 First tweet from @Bacanalnica in nearby Managua. 14:44 120 secs later the first tweets are posted.
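The comparison on this slide can be checked with simple arithmetic. The sketch below uses only the three figures from the slide (4.8 km/sec, 70 km/sec, 120 secs). The distance is not stated in the talk; it is an assumption derived here by multiplying the quoted information speed by the quoted delay.

```python
# Figures quoted on the slide.
SHOCKWAVE_KM_PER_S = 4.8
NEWS_KM_PER_S = 70.0
NEWS_DELAY_S = 120

# Assumption: the distance implied by "70 km/sec" over "120 secs".
implied_distance_km = NEWS_KM_PER_S * NEWS_DELAY_S

# Time a physical shockwave would need to cover the same distance.
shockwave_travel_s = implied_distance_km / SHOCKWAVE_KM_PER_S

print(f"Implied distance: {implied_distance_km:.0f} km")
print(f"Shockwave travel time: {shockwave_travel_s / 60:.0f} minutes")
```

On these numbers the news covers in two minutes what the physical wave would need roughly half an hour to cover.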
  21. 21. Insight Centre: Future Projects, Legacy Projects
  22. 22. 28
  23. 23. Peracton (DERI spin out)
  24. 24. SindiceTech (spin out)
  25. 25. Case studies: finding insights in business data 1.  Finding expertise and content 2.  Holistic energy management 3.  AYLIEN text analytics 4.  Social analytics for recommendation and communities
  26. 26. 1. Expertise and content • Saffron extracts knowledge from text, with business applications in expert finding, community detection, recommender systems, and enterprise search, e.g.: 1. Ecommerce system with Kennys Bookshop to analyse book descriptions and reviews to extract a fine-grained book topic categorisation for use in book recommendation to customers 2. EnRG entity relatedness for applications in semantic search (EnRG is built over a large matrix on Wikipedia and using the DBpedia ontology)
  27. 27. Kennys Bookshop
  28. 28. 2. Holistic energy management Managing energy related to: •  Office IT •  Data centres •  Facilities •  Business travel •  Daily commutes Keep in mind business context: •  Energy expended •  Finances required •  Resource allocation •  Human resources •  Asset management
  29. 29. Making smart buildings smarter (have air conditioning at CO2 peaks)
  30. 30. Energy management software can be unintuitive/difficult to use
  31. 31. We can do better!
  32. 32. More challenges •  Technology and data interoperability: data scattered among different systems, multiple incompatible technologies make it difficult to use •  Interpreting dynamic and static data: sensors, ERP, BMS, assets databases •  Need to proactively identify efficiency opportunities •  Empowering actions and including users in the loop •  Understanding of direct and indirect impacts of activities •  Embedding impacts within business processes •  Engaging users
  33. 33. Linked energy intelligence [Architecture diagram. Applications: Energy Analysis Model, Complex Events, Situation Awareness Apps, Energy and Sustainability Dashboards, Decision Support Systems. Linked Data Support Services: Entity Management Service, Data Catalog, Complex Event Processing Engine, Provenance, Search & Query. Sources: five Adapters. Labels: Energy saving applications, Energy awareness, Semantic event processing, Collaborative data management, Cloud of energy data, Linked sensor middleware, Resource Description Framework (RDF), Semantic sensor networks, Constrained application protocol (CoAP).]
  34. 34. You…
  35. 35. …are part of a team…
  36. 36. …sharing a ‘shell’
  37. 37. 3. AYLIEN text analytics • AYLIEN is based in Dublin, backed by SOS Ventures • 7 employees, started as B2C, switched to B2B in 2014 • Vision to “extract reality from data” (information retrieval domain) • Research collaboration with NUI Galway through John (sentiment analysis on large-scale social media)
  38. 38. Text Analysis API (TAA) ●  A package of easy-to-use tools for extracting information and insights from any text ●  Language detection ●  Supported Languages (EN, DE, PT, ES, IT, FR) ●  168 customers ●  Academic, ad intelligence and brand protection, sentiment analysis/opinion mining, PR and media, CRMs, education, psychology/interest graph ●  Endpoints: o  Extraction o  Classification o  Summarisation o  Concept/entity extraction o  Hashtag suggestion o  Sentiment analysis
  39. 39. How is TAA used? o  Three major methods for deploying text analytics services: API (via the “cloud”), on-premises deployments, other integrations o  TAA is mainly provided using an API/subscription model (monthly) via Mashape or 3scale o  Additional integrations with Google Spreadsheets and other platforms (Telerik, Azuqua) o  In future: on-premises deployment, subscription (yearly), custom solutions (bespoke)
  40. 40. Under the TAA hood… •  Based on Machine Learning (ML) techniques (supervised, unsupervised and semi-supervised) •  Extraction: useful for scraping text, media and metadata from web pages •  Annotate text, media and metadata in a training set •  Extract a set of heuristic rules and use them to extract text, media and metadata •  Summarisation: Extracting n-best key sentences from a document, based on heuristics and a sentence similarity matrix (initially), learning over time •  Classification and document-level sentiment analysis: assigning a label to any piece of text (“sports”, “technology”, “positive”, “negative”) •  Create word vectors from an annotated dataset •  Train a classifier, use it to predict future classes for a new instance •  Similar to a spam filter •  Concept extraction: Find what is mentioned in a document and disambiguate them based on contextual clues e.g. Apple is mentioned, how do we find out if it’s the fruit or the company?
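The classification steps on this slide (create word vectors from an annotated dataset, train a classifier, predict a class for a new instance) can be sketched in a few lines. This is a minimal illustration using a bag-of-words Naive Bayes model, the technique behind many simple spam filters. It is not AYLIEN's implementation: the function names, the toy training sentences, and the choice of Naive Bayes are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Very simple tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable label under Naive Bayes with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical annotated dataset, invented for this sketch.
TRAINING = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("the new phone has a faster chip", "technology"),
    ("the software update fixes a bug", "technology"),
]

model = train(TRAINING)
print(predict(model, "the striker scored a goal in the match"))
print(predict(model, "the phone software has a bug"))
```

The same structure covers document-level sentiment analysis if the labels are replaced with “positive” and “negative”; a production system would use far more training data and a stronger model.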
  41. 41. TAA market (SMEs) • Segments: SMEs, enterprises • Size: “many times the $2bn forecast” • US, UK, Germany, Spain, India • See “Text Analytics 2014: User Perspectives on Solutions and Providers” • Market: natural language processing [related markets: machine learning, text mining] • SME segment: AlchemyAPI, Semantria (Lexalytics), Textalytics, Fluxifi • Enterprise segment: SAS, IBM, Lexalytics • Main target: SMEs • Differentiators: feature-richness, quality, price, progression • how-5-natural-language-processing-apis- stack/analysis/2014/07/28
  42. 42. 4. Social analytics •  Some applications include cross-domain recommendations, community detection and evolution monitoring •  SemStim (Cisco) •  Whassapi (Volvo Ocean Race) •  SociaLens (ROBUST)
  43. 43. SemStim A spreading activation based algorithm tuned for RDF- based social semantic networks
  44. 44. How does it work? Input: user profile with DBpedia URIs from multiple source domains. Background knowledge: DBpedia. Cross-domain recommendation algorithm using DBpedia as background knowledge.
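To make the idea of spreading activation concrete, here is a generic sketch over a small graph. It is not the SemStim algorithm itself, whose details are not given in these slides: the decay factor, the number of steps, the graph, and all node names (the `ex:` identifiers stand in for DBpedia URIs) are hypothetical choices for illustration.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along outgoing edges.

    graph: dict mapping a node to a list of neighbouring nodes.
    seeds: dict mapping a node to its initial activation.
    At each step a node passes decay * value, split evenly, to its neighbours.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = decay * value / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Hypothetical background-knowledge graph (placeholder identifiers).
GRAPH = {
    "ex:BandA": ["ex:CityX", "ex:GenreRock"],
    "ex:BandB": ["ex:CityX", "ex:GenreRock"],
    "ex:CityX": ["ex:MuseumP", "ex:UniversityQ"],
    "ex:GenreRock": ["ex:VenueR"],
}

# User profile from a source domain (music) and candidates in a target domain.
profile = {"ex:BandA": 1.0, "ex:BandB": 1.0}
target_items = {"ex:MuseumP", "ex:UniversityQ", "ex:VenueR"}

activation = spread_activation(GRAPH, profile)
ranked = sorted(target_items, key=lambda n: activation.get(n, 0.0), reverse=True)
print(ranked[0], activation[ranked[0]])
```

Items in the target domain that are reached by more activation from the user's profile rank higher, which is the intuition behind using a shared knowledge graph for cross-domain recommendation.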
  45. 45. Advansse
  46. 46. Whassappi: real-time event and topic detection [Architecture diagram. Components: Twitter Stream, Filter, Tweets Classifier, Entities Extraction, Analytics, GraphDB, RelationalDB, Mobile App. Data: Raw Tweets; Relevant Tweets; Initial Tweets & Communities Data; Tweets, Topics & Communities Relationships; Networks Slices; Ranked Communities, Users and Labels. Signal: Updated Communities Data. Feedback: Users to Follow. Data flow types: Streaming Flow, On-demand Flow, System Triggered Flow.]
  47. 47. SociaLens: insights into enterprise communities
  48. 48. Thanks! Any questions?