
Knowledge Will Propel Machine Understanding of Big Data



CCKS Keynote, August 2017
SEAS Summer School, July 2017
The CCKS conference had over 500 attendees.


  1. Amit Sheth, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio. Knowledge Will Propel Machine Understanding of Big Data. Keynote at the China Conference on Knowledge Graph and Semantic Computing, Chengdu, China, 26-29 August 2017. Invited talk at the Summer School on Learning in Data Science: Models, Algorithms and Tools, Ahmedabad, 17 July 2017. Colloquium at Fraunhofer-Berlin, 23 Aug 2017.
  2. Machine Intelligence: we will interpret it much more broadly than Google’s “all aspects of machine learning.” We define it as machines (any system) performing similarly to (nearly emulating) human intelligence. For this talk, our focus is limited to (big) data/content, especially how machines will “understand” data, signals, and observations so that they can make (or help make) timely, good (evidence-based) decisions and take action.
  3. About how the brain works and brain-inspired computing
  4. The Brain: Inspiration for Intelligent Processing • The astounding bandwidth of your senses is 11 million bits of information every second. • In conscious activities like reading, the human brain distills approximately 40 bits of information per second, and does so efficiently and at scale. What if we could automate the interpretation of data?
  5. A Big Challenge and Opportunity in Recent Times (credit: Looi Consulting) • In 2008, the data generated exceeded the storage available. Less than 0.5% of data gets analyzed. • Vast variety of data: text > images > A/V > genome sequencing > IoT. • Of all the data generated, which data is relevant, and why? Which data should be analyzed? Which data can offer insight? Who cares about what data? How do we get the attention of a human decision maker? What we need is intelligent processing to get actionable, smart data. Dimensions of the challenge: scale of data, analysis of data, different forms of data, and uncertainty of data.
  6. Smart Data (term first used in 2004; redefined in 2013): Smart data makes sense out of big data. How do we solve problems with real-world complexity: gather vast amounts of data and diverse knowledge, and come up with intelligent decisions and timely actions? Smart data provides value by harnessing the challenges posed by the volume, velocity, variety, and veracity of big data, in turn providing actionable information and improving decision making.
  7. Levels of Abstraction (Intellego, SSN Ontology), from lowest to highest:
      • Raw data [in TEXT], e.g., a number: “150”
      • Interpreted data (deductive) [in RDF], e.g., a label: Systolic blood pressure of 150 mmHg
      • Interpreted data (deductive) [in OWL], e.g., a threshold: Elevated Blood Pressure
      • Interpreted data (abductive) [in OWL], e.g., a diagnosis: Hyperthyroidism
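The abstraction ladder above can be sketched in a few lines of Python. This is a minimal illustration, not the Intellego implementation: the 140 mmHg threshold, the `EXPLAINS` table, and all function names are assumptions chosen for the example, and the abductive step simply returns disorders whose known manifestations cover every observed finding.

```python
from dataclasses import dataclass

# Illustrative threshold (an assumption for this sketch, not a clinical rule).
ELEVATED_SYSTOLIC_MMHG = 140

# Hypothetical abductive knowledge: which findings each disorder can explain.
EXPLAINS = {
    "Hyperthyroidism": {"Elevated Blood Pressure", "Tachycardia"},
    "Anxiety": {"Tachycardia"},
}


@dataclass
class Observation:
    prop: str
    value: float
    unit: str


def label(raw: str) -> Observation:
    """Raw data ("150") -> labeled observation (the RDF-level label)."""
    return Observation("systolic blood pressure", float(raw), "mmHg")


def abstract(obs: Observation) -> set:
    """Deduction: apply a threshold rule to get a qualitative finding."""
    findings = set()
    if obs.prop == "systolic blood pressure" and obs.value >= ELEVATED_SYSTOLIC_MMHG:
        findings.add("Elevated Blood Pressure")
    return findings


def explain(findings: set) -> list:
    """Abduction: disorders whose known manifestations cover every finding."""
    if not findings:
        return []
    return sorted(d for d, covers in EXPLAINS.items() if findings <= covers)


obs = label("150")
findings = abstract(obs)
print(findings, explain(findings))
```

Each level only consumes the output of the level below it, which is the point of the slide: the diagnosis is never computed from the raw string directly.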
  8. Intelligence: Knowledge/Experience + Reasoning
  9. On Reasoning: another talk!
  11. Today’s focus is on how computers can better “understand” diverse, multimodal data, with an emphasis on the role knowledge plays, often complementing or enhancing ML and NLP techniques, in the contextual “understanding” of data to help solve the problem for which the data is potentially relevant. This encompasses information extraction and semantic annotation.
  12. Creation & Use of Knowledge (~2000)
  13. Short detour: it is becoming easier to find or create relevant knowledge for a given application • Existence of large knowledge bases • Ability to search for and find relevant knowledge bases [WI’13] • Ability to extract a relevant subset [IEEE Big Data’16] • Ability to enrich knowledge by deriving new concepts and new facts [BIBM’12]. Knowledge graphs already play influential roles in many applications involving big data, starting with search [15 years of search & knowledge graphs].
  14. Knowledge Graphs Become Prominent • Linked Open Data: > 9,960 datasets, > 149 B triples • 38.3 M entities and 8.8 B facts • Google Knowledge Graph: 570 M entities and 18 B facts • LinkedIn knowledge graph
  15. Domain-specific knowledge extraction from LOD. Need book-related information? • Filter the relevant datasets from Linked Open Data: Project Gutenberg (books), DBpedia (books, countries, drugs), DBTropes (books, movies, games) • Extract the relevant portion of each dataset: book-specific DBpedia, book-specific DBTropes
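A minimal sketch of the two extraction steps (filter relevant datasets, then extract the relevant portion of each), assuming an in-memory catalog and toy triples. The catalog, the `dbr:`/`dbo:` triples, and the hop-based neighborhood expansion are illustrative assumptions, not the method from the cited IEEE Big Data’16 work.

```python
# Hypothetical dataset catalog: topics each LOD dataset covers.
DATASETS = {
    "Project Gutenberg": {"books"},
    "DBpedia": {"books", "countries", "drugs"},
    "DBTropes": {"books", "movies", "games"},
    "ExampleGeoData": {"countries"},  # hypothetical non-book dataset
}

# Toy triples standing in for one dataset's content (illustrative data).
TRIPLES = [
    ("dbr:Moby-Dick", "rdf:type", "dbo:Book"),
    ("dbr:Moby-Dick", "dbo:author", "dbr:Herman_Melville"),
    ("dbr:Herman_Melville", "dbo:birthPlace", "dbr:New_York_City"),
    ("dbr:France", "rdf:type", "dbo:Country"),
    ("dbr:France", "dbo:capital", "dbr:Paris"),
    ("dbr:Aspirin", "rdf:type", "dbo:Drug"),
]


def relevant_datasets(catalog, topic):
    """Step 1: keep only datasets that cover the topic."""
    return sorted(name for name, topics in catalog.items() if topic in topics)


def domain_subset(triples, domain_type, hops=1):
    """Step 2: keep triples reachable within `hops` steps of entities typed `domain_type`."""
    frontier = {s for s, p, o in triples if p == "rdf:type" and o == domain_type}
    seen, selected = set(frontier), set()
    for _ in range(hops):
        step = {t for t in triples if t[0] in frontier and t not in selected}
        selected |= step
        # Objects of the selected triples become the next frontier.
        frontier = {o for _, _, o in step} - seen
        seen |= frontier
    return selected


print(relevant_datasets(DATASETS, "books"))
for triple in sorted(domain_subset(TRIPLES, "dbo:Book", hops=2)):
    print(triple)
```

The `hops` parameter trades recall for size: one hop keeps only facts about books, two hops also pulls in facts about their authors.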
  16. Ability to enrich knowledge graphs. Initial knowledge graph on disorders and symptoms: disorders Atrial fibrillation, Hypertension, Diabetes; symptoms Fatigue, Syncope, Weight loss, Chest pain, Discomfort in chest, Dizzy, Shortness of breath, Nausea, Vomiting, Headache, Cough, Weight gain. Patient notes mention: Atrial fibrillation, Hypertension, Diabetes, Chest pain, Weight gain, Discomfort in chest, Cough, Headache, Edema, Shortness of breath. The initial knowledge base does not know about edema. According to the patient notes, can edema be a symptom of any of the disorders mentioned?
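One simple way to surface the edema question is to find concepts in the notes that the knowledge graph lacks and count how often each co-occurs with known disorders. The sketch below assumes the notes are already annotated into concept sets; the notes themselves, the `min_support` cutoff, and the co-occurrence heuristic are hypothetical stand-ins, not the enrichment method of the BIBM’12 paper, and a co-occurrence count is only a candidate for expert review, not medical evidence.

```python
from collections import Counter

# Initial knowledge graph vocabulary, taken from the slide.
KNOWN_DISORDERS = {"atrial fibrillation", "hypertension", "diabetes"}
KNOWN_SYMPTOMS = {
    "fatigue", "syncope", "weight loss", "chest pain", "discomfort in chest",
    "dizzy", "shortness of breath", "nausea", "vomiting", "headache",
    "cough", "weight gain",
}

# Hypothetical patient notes, already reduced to sets of annotated concepts.
NOTES = [
    {"atrial fibrillation", "edema", "shortness of breath"},
    {"hypertension", "edema", "headache"},
    {"atrial fibrillation", "edema", "fatigue"},
    {"diabetes", "weight gain"},
]


def unknown_terms(notes, kg_terms):
    """Concepts that appear in the notes but not in the knowledge graph."""
    return set().union(*notes) - kg_terms


def candidate_links(notes, term, disorders, min_support=2):
    """Count notes in which `term` co-occurs with each known disorder."""
    counts = Counter()
    for note in notes:
        if term in note:
            counts.update(d for d in disorders if d in note)
    return {d: n for d, n in counts.items() if n >= min_support}


new_terms = unknown_terms(NOTES, KNOWN_DISORDERS | KNOWN_SYMPTOMS)
for term in sorted(new_terms):
    print(term, candidate_links(NOTES, term, KNOWN_DISORDERS))
```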
  17. Knowledge plays an indispensable role in deeper understanding of content, especially in these situations: I. Large amounts of training data are unavailable. II. The objects to be recognized are complex, such as implicit entities and highly subjective content. III. Applications need to use complementary or related data in multiple modalities/media.
  18. Challenging Examples/Applications: I. Implicit entity recognition and linking II. Understanding and analyzing drug abuse-related discussions on web forums III. Understanding city traffic dynamics using sensor and textual observations IV. Emoji similarity and sense disambiguation
  19. Implicit Entity Recognition and Linking. References: Sujan Perera, Pablo N. Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad Thirunarayan. Implicit Entity Linking in Tweets. Extended Semantic Web Conference. Heraklion, Crete, Greece: Springer; 2016. p. 118-132. Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher Heid, Greg Mott. Implicit Entity Recognition in Clinical Documents. 4th Joint Conference on Lexical and Computational Semantics (*SEM 2015). Denver, CO: Association for Computational Linguistics; 2015. p. 228-238.
  20. Implicit Entity Recognition and Linking, in relation to named entity recognition, relationship extraction, entity linking, and implicit information extraction
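To make “implicit” concrete: a text can refer to an entity without naming it, e.g., a hypothetical tweet describing a Sandra Bullock movie about an astronaut lost in space. The sketch below ranks candidate entities by bag-of-words cosine similarity between the text and short entity descriptions drawn from background knowledge. The descriptions, stopword list, and 0.3 threshold are assumptions for illustration; the cited papers build richer entity models (e.g., from UMLS definitions, or Wikipedia and Twitter data).

```python
import math
import re
from collections import Counter

STOP = {"a", "an", "the", "in", "of", "and", "is", "to", "as", "on",
        "about", "new", "looks"}

# Hypothetical entity descriptions standing in for knowledge-base content.
ENTITY_MODELS = {
    "Gravity (film)": "science fiction film starring Sandra Bullock as an astronaut lost in space",
    "The Martian (film)": "film about an astronaut stranded on Mars, starring Matt Damon",
    "Speed (film)": "action film starring Sandra Bullock and Keanu Reeves aboard a bus",
}


def bag(text):
    """Lowercased bag of words without stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP)


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_implicit(text, models=ENTITY_MODELS, threshold=0.3):
    """Return the best-matching entity, or None if nothing is similar enough."""
    t = bag(text)
    scores = {entity: cosine(t, bag(desc)) for entity, desc in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(link_implicit("New Sandra Bullock astronaut lost in space movie looks absolutely terrifying"))
```

Note that this only ranks candidates; deciding whether a text contains an implicit mention at all is a separate step that the sketch does not attempt.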
  24. Understanding and Analyzing Drug Abuse-Related Discussions on Web Forums. Reference: Delroy Cameron, Gary A. Smith, Raminta Daniulaityte, Amit P. Sheth, Drashti Dave, Lu Chen, Gaurish Anand, Robert Carlson, Kera Z. Watkins, Russel Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology Using Social Media. Journal of Biomedical Informatics 46(6); 2013. p. 985-997.
  25. Example forum post: “I was sent home with 5 x 2 mg Suboxones. I also got a bunch of phenobarbital (I took all 180 mg and it didn't do shit except make me a walking zombie for 2 days). I waited 24 hours after my last 2 mg dose of Suboxone and tried injecting 4 mg of the bupe. It gave me a bad headache, for hours, and I almost vomited. I could feel the bupe working but overall the experience sucked. Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now.”
      • Entity identification with the Drug Abuse Ontology (DAO: 83 classes, 37 properties): Suboxone and Subutex are subClassOf Buprenorphine; Buprenorphine has_slang_term “bupe” and “bupey”. (33:1 Buprenorphine, 24:1 Loperamide)
      • Sentiment extraction: negative (“experience sucked”, “didn’t do shit”, “bad headache”); positive (“feel pretty damn good”, “feel great”).
      • Codes and the triples (subject-predicate-object) derived from them:
        Suboxone used by injection, negative experience → Suboxone injection-causes-Cephalalgia
        Suboxone used by injection, amount → Suboxone injection-dosage amount-2mg
        Suboxone used by injection, positive experience → Suboxone injection-has_side_effect-Euphoria
      • Diverse data types extracted: entities, relationships, sentiments, dosage, pronouns, intervals, and route of administration.
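The entity-identification step can be sketched as lexicon lookup followed by a walk up the ontology’s class hierarchy. The lexicon and `SUBCLASS_OF` entries below are a hypothetical fragment based on our reading of the slide’s DAO excerpt; the real DAO and the PREDOSE pipeline are far larger.

```python
import re

# Hypothetical lexicon fragment: surface form (lowercased) -> ontology class.
LEXICON = {
    "buprenorphine": "Buprenorphine",
    "bupe": "Buprenorphine",    # slang term
    "bupey": "Buprenorphine",   # slang term
    "suboxone": "Suboxone",
    "suboxones": "Suboxone",
    "subutex": "Subutex",
}

# Hypothetical subclass edges, as read from the slide's DAO fragment.
SUBCLASS_OF = {"Suboxone": "Buprenorphine", "Subutex": "Buprenorphine"}


def ancestors(cls):
    """Walk subClassOf edges up to the root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def identify_entities(text):
    """Return (surface form, class, superclasses) for each lexicon hit."""
    hits = []
    for token in re.findall(r"[A-Za-z]+", text):
        cls = LEXICON.get(token.lower())
        if cls:
            hits.append((token, cls, ancestors(cls)))
    return hits


post = ("I waited 24 hours after my last 2 mg dose of Suboxone "
        "and tried injecting 4 mg of the bupe.")
for hit in identify_entities(post):
    print(hit)
```

Normalizing “bupe” to Buprenorphine is what lets a slang mention and a brand-name mention be counted as the same substance downstream.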
  26. PREDOSE: Smarter Data through Shared Context and Data Integration. Resources: ontology, lexicon, lexico-ontology, rule-based grammar. Extracted types, with examples:
      • ENTITIES: Suboxone, Kratom, Heroin
      • TRIPLES: Suboxone-CAUSE-Cephalalgia
      • EMOTION: disgusted, amazed, irritated
      • INTENSITY: more than, a, few of
      • PRONOUN: I, me, mine, my
      • SENTIMENT: Im glad, turn out bad, weird
      • DRUG-FORM: ointment, tablet, pill, film
      • ROUTE OF ADM.: smoke, inject, snort, sniff
      • SIDE EFFECT: itching, blisters, flushing, shaking hands, difficulty breathing
      • DOSAGE: <AMT><UNIT> (e.g., 5mg, 2-3 tabs)
      • FREQUENCY: <AMT><FREQ_IND><PERIOD> (e.g., 5 times a week)
      • INTERVAL: <PERIOD_IND><PERIOD> (e.g., several years)
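The rule-based grammar patterns above (DOSAGE, FREQUENCY, INTERVAL) map naturally onto regular expressions. This sketch assumes what each slot can contain (the unit list, the frequency indicators, the period words); PREDOSE’s actual grammar is not given on the slide, so treat these patterns as illustrative.

```python
import re

# Hypothetical token classes; the slide names the slots but not their contents.
AMT = r"\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?"      # 5, 2.5, 2-3
UNIT = r"(?:mg|mcg|g|ml|tabs?|pills?|strips?)"
FREQ_IND = r"(?:times?|x)\s+(?:a|an|per|every)"
PERIOD = r"(?:hour|day|week|month|year)s?"
PERIOD_IND = r"(?:several|few|many|couple\s+of|\d+)"

PATTERNS = {
    # DOSAGE: <AMT><UNIT>
    "DOSAGE": re.compile(rf"\b{AMT}\s*{UNIT}\b", re.IGNORECASE),
    # FREQUENCY: <AMT><FREQ_IND><PERIOD>
    "FREQUENCY": re.compile(rf"\b{AMT}\s+{FREQ_IND}\s+{PERIOD}\b", re.IGNORECASE),
    # INTERVAL: <PERIOD_IND><PERIOD>
    "INTERVAL": re.compile(rf"\b{PERIOD_IND}\s+{PERIOD}\b", re.IGNORECASE),
}


def extract(text):
    """Return every match of every slot pattern, keyed by slot name."""
    return {slot: [m.group(0) for m in rx.finditer(text)] for slot, rx in PATTERNS.items()}


print(extract("I took 2-3 tabs, then 5mg, about 5 times a week for several years."))
```

Keeping `PERIOD_IND` free of bare articles avoids a conflict in which “a week” inside “5 times a week” would also be tagged as an INTERVAL.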
  27. Understanding City Traffic Using Sensor and Textual Observations. References: Pramod Anantharam, Krishnaprasad Thirunarayan, Surendra Marupudi, Amit Sheth, Tanvi Banerjee. Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations. 30th AAAI Conference on Artificial Intelligence (AAAI-16). Phoenix, Arizona; 2016. Pramod Anantharam, Krishnaprasad Thirunarayan, Amit Sheth. Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases. 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013) at the SIAM International Conference on Data Mining (SDM 2013). Austin, Texas; 2013.
  28. Severity of the Traffic Problem: By 2001, over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India, 2001). Figures: modes of transportation in Indian cities; the Texas Transportation Institute (TTI) congestion report for the United States (2011, 2030). Sources: 1. The Crisis of Public Transport in India. 2. IBM Smarter Traffic.
  29. Questions Asked Daily • What time to start? • What route to take? • What is the reason for the traffic? • Wait for some time or re-route?
  30. Complementary Data Sources
  31. Challenge: Non-linearity in Traffic Dynamics. Multiple events with varying influence interact with each other.
  32. Learning Context-specific LDS Models. Input: speed/travel-time time series data from a link.
      Step 1: Index the data for each link by day of week (Monday-Sunday) and hour of day (1-24), using traffic domain knowledge for a piece-wise linear approximation.
      Step 2: Find the “typical” dynamics by computing the mean time series for each day of week and hour of day and choosing the medoid.
      Step 3: Learn LDS parameters for the medoid for each hour of day (24 hours) and each day of week (7 days), resulting in 24 × 7 = 168 models per link: LDS(1,1), LDS(1,2), ..., LDS(1,24) through LDS(7,1), LDS(7,2), ..., LDS(7,24), where LDS(d_i, h_j) is the model for day d_i and hour h_j.
      Total models learned: 2,534 links × 168 models per link = 425,712.
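Steps 1 and 2 plus the model count can be sketched as follows. The sketch assumes each reading is a `(day_of_week, hour, series)` tuple and uses the standard medoid (the member with the smallest total Euclidean distance to the others); the slide does not specify the distance measure, and Step 3 (fitting LDS parameters) is omitted.

```python
import math
from collections import defaultdict


def index_by_context(readings):
    """Step 1: group (day_of_week, hour, series) readings by (day, hour)."""
    groups = defaultdict(list)
    for day, hour, series in readings:
        groups[(day, hour)].append(series)
    return groups


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def medoid(series_list):
    """The member with minimal total distance to all others.

    Unlike the mean, the medoid is an actually observed series, so it
    is less distorted by a few atypical days.
    """
    return min(series_list, key=lambda s: sum(euclid(s, t) for t in series_list))


def typical_dynamics(readings):
    """Step 2: one typical series per (day, hour) context."""
    return {ctx: medoid(group) for ctx, group in index_by_context(readings).items()}


MODELS_PER_LINK = 7 * 24          # one LDS per (day of week, hour of day)
TOTAL_MODELS = 2534 * MODELS_PER_LINK
print(MODELS_PER_LINK, TOTAL_MODELS)
```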
  33. Tagging Anomalies with LDS Models. Input: speed and travel-time observations from a link.
      Training phase: for each hour of observed data (d_i, h_j), compute the log likelihood under LDS(d_i, h_j), giving Lik(1,1), ..., Lik(1,24) through Lik(7,1), ..., Lik(7,24); the minimum and maximum log-likelihood values are obtained from the five-number summary.
      Tagging phase: compute the log likelihood of new observations for hour (d_i, h_j) under LDS(d_i, h_j) and tag the hour as anomalous when its log likelihood falls outside the learned range (below the minimum likelihood).
      Output: anomalies.
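The tagging logic can be sketched with a scalar linear dynamical system whose log likelihood is computed by the Kalman filter. The parameters, the toy windows, and the rule “anomalous if below the minimum training log likelihood” are assumptions for illustration; the original system learns parameters per link, day, and hour rather than fixing them by hand.

```python
import math
import statistics


def lds_loglik(ys, a, c, q, r, m0=0.0, p0=1.0):
    """Log likelihood of a 1-D series under a scalar LDS, via the Kalman filter.

    State:       x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = c * x_t + v_t,      v_t ~ N(0, r)
    Prior on the initial state: N(m0, p0).
    """
    m, p, ll = m0, p0, 0.0
    for y in ys:
        m_pred, p_pred = a * m, a * a * p + q          # predict
        s = c * c * p_pred + r                          # innovation variance
        e = y - c * m_pred                              # innovation
        ll += -0.5 * (math.log(2 * math.pi * s) + e * e / s)
        k = p_pred * c / s                              # Kalman gain
        m, p = m_pred + k * e, (1 - k * c) * p_pred     # update
    return ll


def five_number_summary(values):
    q1, med, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


def tag_anomalies(train_windows, test_windows, params):
    """Flag test windows whose log likelihood is below the training minimum."""
    train_ll = [lds_loglik(w, **params) for w in train_windows]
    low = five_number_summary(train_ll)[0]
    return [lds_loglik(w, **params) < low for w in test_windows]


PARAMS = {"a": 1.0, "c": 1.0, "q": 0.1, "r": 0.5, "m0": 50.0, "p0": 1.0}
TRAIN = [[50, 51, 49, 50], [49, 50, 50, 51], [51, 50, 49, 50]]
TEST = [[50, 51, 49, 50], [50, 30, 25, 20]]
print(tag_anomalies(TRAIN, TEST, PARAMS))
```

The sudden drop from about 50 to 20 in the second test window produces large innovations relative to the innovation variance, which drives its log likelihood far below anything seen in training.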
  34. Hourly Traffic Dynamics Over a Day
  35. Traffic Data: Possible Explanation. Most drivers tend to go 5 km/h over the posted speed limit. Relatively few drivers go more than 10 km/h over it. At certain times of day drivers are forced below the speed limit, e.g., in rush-hour traffic. Do these histograms resemble any probability distribution?
  36. Twitter as a Source of City Events, spanning public safety, urban planning, government & agency administration, energy & water, environment, transportation, social programs, healthcare, and education
  37. Extracting City Events from Textual Data. Reference: Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, Amit Sheth. Extracting City Traffic Events from Social Streams. ACM Transactions on Intelligent Systems and Technology 6(4), Article 43 (July 2015), 27 pages. DOI: 10.1145/2717317. Example of a sequence-labeled tweet: Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O
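Sequence labels like those above follow a BIO scheme (B = begin, I = inside, O = outside) and are decoded into spans before events can be built from them. A generic decoder is sketched below; the tokens and the `EVENT` tag in the usage example are hypothetical, and the lenient handling of a stray `I-` tag is a design choice, not something the slide specifies.

```python
def bio_spans(tokens, tags):
    """Decode parallel token/BIO-tag lists into (label, text) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)              # continue the open span
            continue
        if current:                          # any other tag closes the open span
            spans.append((label, " ".join(current)))
        current, label = [], None
        if tag.startswith(("B-", "I-")):     # lenient: a stray I- also opens a span
            current, label = [tok], tag[2:]
    if current:
        spans.append((label, " ".join(current)))
    return spans


# Hypothetical tokens and tags, for illustration only.
tokens = ["Accident", "on", "I-75", "near", "Dayton", "Mall"]
tags = ["B-EVENT", "O", "B-LOCATION", "O", "B-LOCATION", "I-LOCATION"]
print(bio_spans(tokens, tags))
```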
  38. Textual Events from Tweets vs. Complementary Events (e.g., traffic incident; road construction)
  39. Textual Events from Tweets vs. Corroborative Events (e.g., fog, visibility-air-quality; fog)
  40. Textual Events from Tweets vs. Timeliness (e.g., concert)
  41. Understanding: Semantic Annotation of Sensor + Textual Data Utilizing Background Knowledge. Example: an overturned truck. Domain knowledge comes in the form of a traffic vocabulary and of traffic flow knowledge synthesized from sensor data, and the sensor anomaly is explained-by the textual event. Horizontal operator: relating/mapping data from different modalities to a concept (theme) within a spatio-temporal context; the spatial context even includes what slow traffic means for the type of road.
  42. How does traffic analysis capture the complexity of the real world? This example demonstrates the use of: • Multimodal data streams (types of events from text, signatures from sensor data). • Multiple sources of declarative knowledge/ontologies. • Semantic annotations and enrichments. • A rich representation (PGM): learned probabilistic models improved using declarative knowledge. • A statistical approach to create normalcy models and understand anomalies using historical data, with anomalies explained using extracted events. • Declarative knowledge used to approximate nonlinear models with a collection of linear dynamical systems. • Actionable information.
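Explaining an anomaly with extracted events requires, at minimum, matching them in space and time. The sketch below assumes point locations and fixed thresholds (2 km, 1 hour), both hypothetical; the system described above reasons with PGMs and domain knowledge rather than a simple window filter, and the coordinates and events in the usage example are invented.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Anomaly:
    link_id: str
    lat: float
    lon: float
    start: datetime


@dataclass
class Event:
    kind: str
    lat: float
    lon: float
    time: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def explain_anomaly(anomaly, events, max_km=2.0, window=timedelta(hours=1)):
    """Return the kinds of events close to the anomaly in both space and time."""
    return [
        e.kind for e in events
        if haversine_km(anomaly.lat, anomaly.lon, e.lat, e.lon) <= max_km
        and abs(e.time - anomaly.start) <= window
    ]


# Hypothetical anomaly and tweet-derived events.
anomaly = Anomaly("link-42", 37.770, -122.420, datetime(2013, 5, 1, 8, 0))
events = [
    Event("accident", 37.775, -122.418, datetime(2013, 5, 1, 7, 45)),
    Event("concert", 37.800, -122.270, datetime(2013, 5, 1, 8, 0)),
    Event("road construction", 37.770, -122.420, datetime(2013, 5, 1, 12, 0)),
]
print(explain_anomaly(anomaly, events))
```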
  43. Emoji Similarity and Sense Disambiguation. References: Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA; 2016. Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: An Open Service and API for Emoji Sense Discovery. 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017. Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017.
  44. • 6B messages with emoji are exchanged every day!
  45. Understanding Emoji Meanings • The ability to automatically process, derive meaning from, and interpret text fused with emoji will be essential to understanding emoji. • Knowledge bases that capture emoji meanings can play a vital role in representing, contextually disambiguating, and converting emoji into text. • They can help leverage existing NLP techniques for processing and better understanding emoji.
  46. EmojiNet: A machine-readable emoji sense inventory. Creation of EmojiNet, representing each emoji as a nonuple (9-tuple)
  47. Emoji Sense Disambiguation: “The ability to identify the meaning of an emoji in the context of a message in a computational manner.” Emoji are used in social media with multiple senses. Currently there is no labeled dataset that can be used to solve emoji sense disambiguation in a supervised learning setting.
  48. Tackling Emoji Sense Disambiguation • Use the Simplified Lesk algorithm to disambiguate emoji senses
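Simplified Lesk picks the sense whose gloss (and examples) shares the most words with the context. A minimal sketch follows; the two senses and their glosses are hypothetical and not taken from EmojiNet, the stopword list is an assumption, and ties go to the first listed sense, which is conventionally the most frequent one.

```python
import re

STOPWORDS = {"a", "an", "the", "to", "and", "in", "with", "my", "your",
             "she", "after", "for", "of", "is", "today"}


def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose signature overlaps most with the context."""
    ctx = content_words(context)
    best, best_overlap = None, -1
    for sense, signature in senses.items():
        overlap = len(ctx & content_words(signature))
        if overlap > best_overlap:   # strict '>' keeps the first sense on ties
            best, best_overlap = sense, overlap
    return best


# Hypothetical senses for the folded-hands emoji, listed most frequent first.
FOLDED_HANDS = {
    "pray (verb)": "to pray to god, ask for a blessing, hope and faith",
    "high five (noun)": "celebrate a win with a friend by slapping hands",
}

print(simplified_lesk("Won the match today with my best friend, let's celebrate!", FOLDED_HANDS))
```

Because no labeled emoji sense dataset exists, an unsupervised, knowledge-based method like this can use sense inventories directly without any training data.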
  49. Emoji Similarity: “Given two or more emoji, how do we calculate the semantic similarity between them in a computational manner?” Figure: the top-5 emoji pairs with the highest inter-annotator agreement for each ordinal value from 0 to 4, for two questions. Q1 asked about the equivalence of the two emoji and Q2 about their relatedness. Ordinal values 0 and 4 represent the lowest and highest relatedness/equivalence, respectively.
  50. Using EmojiNet to Measure Emoji Similarity • Different types of emoji meanings extracted from EmojiNet are used to model the meaning of an emoji (more details on …)
  51. Using EmojiNet to Measure Emoji Similarity • We combine the distributional semantics of words (learned via word embeddings) with emoji definitions in EmojiNet (external knowledge) to model emoji embeddings. • Our emoji embedding models outperform previous emoji embedding models (based purely on distributional semantics) by ~10% in a benchmark sentiment analysis task.
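A simple way to combine word embeddings with EmojiNet-style definitions is to average the vectors of an emoji’s definition words and compare emoji by cosine similarity. Everything in the sketch (the 3-dimensional vectors, the definition word lists, the emoji names) is hypothetical, and plain averaging is an assumption rather than the exact model from the cited work.

```python
import math

# Hypothetical tiny word vectors; real systems use pretrained embeddings.
WORD_VECS = {
    "laugh": [0.9, 0.1, 0.0],
    "joy": [0.8, 0.2, 0.1],
    "happy": [0.7, 0.3, 0.0],
    "tears": [0.3, 0.1, 0.9],
    "sad": [0.1, 0.0, 0.95],
    "cry": [0.2, 0.05, 0.9],
}

# Hypothetical definition words per emoji (stand-ins for EmojiNet senses).
EMOJI_DEFS = {
    "face_with_tears_of_joy": ["laugh", "joy", "tears"],
    "grinning_face": ["happy", "joy", "laugh"],
    "crying_face": ["sad", "cry", "tears"],
}


def embed(emoji):
    """Average the vectors of the emoji's known definition words."""
    vecs = [WORD_VECS[w] for w in EMOJI_DEFS[emoji] if w in WORD_VECS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def emoji_similarity(e1, e2):
    u, v = embed(e1), embed(e2)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


print(emoji_similarity("face_with_tears_of_joy", "grinning_face"))
print(emoji_similarity("grinning_face", "crying_face"))
```

Grounding the vectors in definitions means two emoji with related meanings end up close even if they rarely appear in the same contexts.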
  52. Knowledge-based Approaches and the Resulting Improvements
      Problem domain | Use of knowledge/knowledge bases | Problems we could solve that could not be solved (well) without knowledge
      Implicit entity linking | Adapted UMLS definitions for identifying medical entities; Wikipedia and Twitter data for identifying Twitter entities | Was not solved before
      Understanding drug abuse-related discussions | Application of the Drug Abuse Ontology along with slang term dictionaries and grammar | Not solved well at all
      Traffic data analysis | Statistical knowledge extraction; ontologies for Twitter event extraction | Multimodal data stream correlation and explanation was virtually impossible
      Emoji similarity and sense disambiguation | Generation and application of EmojiNet | Emoji interpretation solved much better
  53. Take away: “Data alone is not enough.” Consider combining data-centric/bottom-up/statistical learning with knowledge-based/top-down techniques • to improve understanding of simpler content • to understand complex content and concepts • to understand heterogeneous/multimodal content
  54. Cognitive Computing, Semantic Computing, Perceptual Computing. Key contributors and collaborators for this talk: Sanjaya Wijeratne, Dr. T.K. Prasad, Sujan Perera. Thanks to the NSF and NIH. Thank you!
  55. My group and some of the Kno.e.sis faculty
  56. Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University