Session 2.1: Ontological representation of the telecom domain for advanced AI applications

Talk at SEMANTiCS 2017
www.semantics.cc

  1. 1. Ontological representation of the telecom domain for advanced AI applications Felix Burkhardt, Joachim Stegmann: Deutsche Telekom AG Till Plumbaum, Christian Sauer: DAI (Distributed Artificial Intelligence) Labor Tilman Becker, Michael Feld: DFKI (German Research Center for AI)
  2. 2. Creating an ontology in the telecom domain: Overview • Research and development project together with scientific partners • DAI-Labor (Distributed Artificial Intelligence Laboratory), Berlin • DFKI (German Research Center for AI), Saarbrücken
  3. 3. Contents • Motivation • Ontology creation: data sources; manual creation of the upper ontology; retrieving suggestions for new concepts, synonyms and relations; comparing traditional machine learning with DNNs • Ontology storage and maintenance • Natural language interface: translating natural-language queries to SPARQL; answer generation • Application examples: audio-based semantic search; chatbot integration • Summary and outlook
  4. 4. Motivation • Statistical data mining is successful but has its limits • Rule-based systems can be used when training data is sparse • Data-based and rule-based approaches can be combined when rules are learned from data • Modeling semantic knowledge explicitly can help knowledge retrieval applications • One example is disambiguation for question answering in "chatbots", i.e. automatic dialog systems
  5. 5. Motivation: Why are we developing our own ontology? • An ontology is deeply connected to the company's knowledge and internal data and can exist in a vendor-independent format • Separating the ontology from the supplier reduces the dependence on a single supplier • One common ontology of a company's domain can be updated from common data sources and reused by different applications
  6. 6. Data sources • Several data sources are harvested: • Forum posts • Official website • XML files (product specs) • Chat logs
  7. 7. Manual creation of the upper ontology • A general ontology for a domain as big as the telecom domain is a challenge • It needs to cover different areas, such as sales, infrastructure, customer support and many more • Two design decisions were made: 1. Concentrate on one area after another, not all at once, starting with the customer support use case 2. Create an ontology/taxonomy for each area; relations between different areas are added when needed • Each area is created by area experts and ontology experts [Figure: early version of the ontology showing the broad concepts included]
  8. 8. Retrieving suggestions for new concepts • After the manual creation of the upper structure, we used crawler techniques to gather more concepts automatically • Different sources were used: Telekom Hilft forum, Telekom product website, XML data • For each source we created a specialized crawler • New sub-concepts, e.g. from the Telekom product website: new VR hardware as part of the general Home concept • New attributes from product data, e.g. from XML: 5G as a new transmission speed
  9. 9. Retrieving suggestions for new concepts • New concepts retrieved for the Home (Zuhause) concept include, for example, sub-concepts like EntertainTV or Geräte (Devices) • These sub-concepts are then populated with devices and device information (also automated by crawling the relevant sources) [Figure: concept Zuhause with automatically added sub-concepts] [Figure: sub-concept WLANundRouter with automatically added entities]
  10. 10. Retrieving suggestions for new synonyms • Customers tend to use different words for the same thing (synonyms) • Our ontology should cover all those different words • This is important, e.g., in a search use case • To retrieve the synonyms (and also misspellings) we used shallow neural nets (word2vec) and fastText to learn from a big corpus of user conversations • The corpus is the Telekom Hilft forum with over 2 million text snippets [Figure: word2vec and fastText deliver different views on a concept]
  11. 11. Retrieving suggestions for new synonyms • The ML approach allows us to get information for as yet unknown devices • Word2Vec based on Deeplearning4j, skip-gram version • Word2Vec operates on words, fastText on characters, so fastText finds similar terms even for unknown words
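To illustrate why a character-based model can relate a misspelling or an unseen word to a known term, here is a minimal sketch. It is not fastText and not the authors' code: it only compares character trigram sets with Jaccard overlap, and the function names and example words are assumptions made for illustration.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, padded with boundary markers."""
    padded = f"<{word.lower()}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two words."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_candidates(query, vocabulary, n=3):
    """Sort vocabulary terms by descending n-gram similarity to the query."""
    return sorted(vocabulary,
                  key=lambda term: ngram_similarity(query, term, n),
                  reverse=True)


# Hypothetical vocabulary; "ruoter" is a misspelling that a purely
# word-based lookup would treat as an unknown token.
vocabulary = ["vertrag", "router", "tarif"]
best_match = rank_candidates("ruoter", vocabulary)[0]
print(best_match)
```

A word-based model has no vector for a token it never saw, whereas shared character n-grams still give the misspelling some overlap with the intended term. fastText learns embeddings for such n-grams instead of counting them, so this sketch shows only the intuition.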
  12. 12. Comparing traditional machine learning with DNNs • We investigated topic classification of the forum posts • Compared "classical machine learning" with deep neural nets • Both reached 55% accuracy and 83% "one in three" accuracy • Also investigated subclustering with a DNN (4 subclusters per category) [Figure: comparison of the two approaches; labels: Apache Lucene preprocessing, NER / disambiguation, Multinomial Naive Bayes, Deep Temporal Convolutional Neural Network]
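The two figures on this slide can be read as top-1 and top-3 accuracy, assuming "one in three" means that the correct topic is among the three highest-ranked predictions (the slide does not define the term). The sketch below shows how both numbers would be computed under that assumption; the posts, topics and function name are made up for illustration.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=1):
    """Fraction of samples whose true label is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked topic predictions for four forum posts (best guess first).
predictions = [
    ["router", "tariff", "tv"],
    ["tariff", "router", "billing"],
    ["tv", "billing", "router"],
    ["billing", "tv", "tariff"],
]
labels = ["router", "router", "router", "mobile"]

top1 = top_k_accuracy(predictions, labels, k=1)
top3 = top_k_accuracy(predictions, labels, k=3)
print(f"top-1: {top1:.2f}, top-3: {top3:.2f}")
```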
  13. 13. Adding arbitrary relations • Not done automatically yet • Term candidates from natural language harvesting might of course become alt-labels for relations • For now, relations are added manually, derived from application use cases
  14. 14. Ontology storage and maintenance • Started with Protégé and have now switched to PoolParty • Scalability • Interfaces • NLP integration • Maintenance
  15. 15. Translating natural-language queries to SPARQL • Design time: the NL Model Generator builds example sentences from question templates and ontology entities • Runtime: the Nuance Mix model processes the input sentence. The extracted parameters are converted into a SPARQL query and executed on the ontology. Results are converted back to text. [Figure labels: Static Generation; Ontology; Q&A Templates (Classification); NL Model Generator; Entities List; NL Model (Nuance .trsx file); Cloud Upload / Download; Dialog Runtime; User Input (Text/Speech), e.g. "Which smartphones have a changeable battery?"; NLU via Mix API (JSON); Query Generation (SPARQL query for response); Facts; Answer Generation; Chatbot Output, e.g. "The iPhone7 and the Samsung Galaxy S7"]
  16. 16. NL model example • The Nuance Mix web interface allows the definition of intents and parameters • For each intent, several example sentences should be provided • Here, the intents are the different Q&A templates (question types)
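The design-time step, building example sentences from question templates and ontology entities, could look roughly like the following sketch. The template wording, slot names and entity lists are hypothetical; the actual generator and the Nuance .trsx output format are not shown in the slides and are not reproduced here.

```python
from itertools import product


def generate_examples(template, slots):
    """Expand one question template with every combination of slot values."""
    names = list(slots)
    sentences = []
    for combo in product(*(slots[name] for name in names)):
        sentences.append(template.format(**dict(zip(names, combo))))
    return sentences


# Hypothetical template and entity lists; in the described system these would
# come from the Q&A template database and from the ontology.
template = "Which {subject1} have a {object1} {predicate1}?"
slots = {
    "subject1": ["smartphones", "tablets"],
    "predicate1": ["battery"],
    "object1": ["changeable"],
}

examples = generate_examples(template, slots)
for sentence in examples:
    print(sentence)
```

Each generated sentence would then serve as a training example for the intent that corresponds to its template.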
  17. 17. Answer generation • Upper part: the Q&A template database defines what the SPARQL queries look like for different linguistic question structures • Lower part: an example question is executed, and the intermediate and final results are shown
     Q&A template list (intent ID, SPARQL template, parameter names):
     SCOWWS_SELECT ?s WHERE {?s rdf:type/rdfs:subClassOf* dtag:%s. ?s dtag:%s ?v. FILTER(?v = "%s").}_subject1_predicate1_object1
     SCOWWI_SELECT ?s WHERE {?s rdf:type/rdfs:subClassOf* dtag:%s. ?s dtag:%s ?v. FILTER(REGEX(str(?v), "%s")).}_subject1_predicate1_object1
     …
     SCOWWS: question by subject where subject = class; WITHOUT computation; WITH predicate; WITH object (string)
     Ask question (via text or speech input): "Which smartphones have a changeable battery"
     Evaluation using the Mix.nlu model via WebSocket (NLU service): 1) extract intent + concepts
     JSON: "interpretations": [{"action": {"intent": {"value": "SCOWWS"}}, "concepts": {"object1": [{"literal": "changeable", "value": "changeable"}], "predicate1": [{"literal": "battery", "value": "battery"}], "subject1": [{"literal": "Smartphones", "value": "Smartphone"}]}, "literal": "Which smartphones have a changeable battery"}]
     Reading question classification: intent ID + SPARQL template + parameters
     2) Based on the intent ID, create the final SPARQL query via the SPARQL template
     SPARQL query: SELECT ?s WHERE {?s rdf:type/rdfs:subClassOf* dtag:Smartphone. ?s dtag:battery ?v. FILTER(?v = "changeable").}
     Running the query on the ontology (via Jena)
     Answer: dtag:Alcatel2051silver, dtag:SamsungGalaxyJ52016black, dtag:SamsungGalaxyXcover3SMG389Fsilver
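Step 2 on this slide, turning the NLU result into a SPARQL query through the template of the detected intent, can be sketched as follows. The two templates and the parameter order are taken from the slide; the dictionary layout, the function name, the value check and the repaired JSON nesting around "intent" are assumptions, and the query is only built here, not executed against Jena.

```python
import json
import re

# Intent ID -> (SPARQL template, parameter order), modeled on the slide's list.
TEMPLATES = {
    "SCOWWS": (
        'SELECT ?s WHERE {?s rdf:type/rdfs:subClassOf* dtag:%s. '
        '?s dtag:%s ?v. FILTER(?v = "%s").}',
        ("subject1", "predicate1", "object1"),
    ),
    "SCOWWI": (
        'SELECT ?s WHERE {?s rdf:type/rdfs:subClassOf* dtag:%s. '
        '?s dtag:%s ?v. FILTER(REGEX(str(?v), "%s")).}',
        ("subject1", "predicate1", "object1"),
    ),
}

# Assumption: only plain identifiers are allowed as slot values, so that
# user input cannot change the structure of the query.
SAFE_VALUE = re.compile(r"[A-Za-z0-9_]+")


def build_query(interpretation):
    """Fill the SPARQL template of the detected intent with the extracted concepts."""
    intent = interpretation["action"]["intent"]["value"]
    template, order = TEMPLATES[intent]
    values = []
    for name in order:
        value = interpretation["concepts"][name][0]["value"]
        if not SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsafe value for {name}: {value!r}")
        values.append(value)
    return template % tuple(values)


# NLU result as shown on the slide (nesting of "intent" reconstructed).
nlu_json = """
{"interpretations": [{
  "action": {"intent": {"value": "SCOWWS"}},
  "concepts": {
    "object1": [{"literal": "changeable", "value": "changeable"}],
    "predicate1": [{"literal": "battery", "value": "battery"}],
    "subject1": [{"literal": "Smartphones", "value": "Smartphone"}]},
  "literal": "Which smartphones have a changeable battery"}]}
"""
interpretation = json.loads(nlu_json)["interpretations"][0]
query = build_query(interpretation)
print(query)
```

The prefixes rdf, rdfs and dtag would still have to be declared before the query is sent to a SPARQL engine; the slide omits them as well.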
  18. 18. Application examples: audio-based semantic search • Human agents are supported by intelligent content suggestions • Project highlights: • Intelligent Q&A is highly appreciated by DT agents • The full export of product XML data provides a good precondition for AI-based answer generation • Expansion of the AI approach to support DT social media agents with recommended answers for several customer requests (Facebook & Twitter) • Learnings: • System performance depends on the quality of the data material (structured vs. unstructured) • Social media data are highly unstructured; time is needed for manual preprocessing, and fully automated clustering and preprocessing are still under development
  19. 19. Application examples: audio-based semantic search
  20. 20. Application examples: chatbot integration
  21. 21. Summary and outlook • We created an RDF-based ontology in the telecom domain to support our AI activities, drawing on several in-domain data sources such as product descriptions, chat logs and help forum posts • On the one hand, the ontology will be queried directly by SPARQL queries that are derived from natural language searches • On the other hand, it is the semantic basis for a variety of other applications such as semantic search, agent content assistance, virtual digital assistant, social media mining and intelligent chatbot • The advantages of maintaining our own centralized ontology are manifold: by storing the knowledge in an open standard format, we strengthen our independence from proprietary technology, can keep parts of the data private and on-site, and can re-use the data more easily
  22. 22. Thanks • Felix.Burkhardt@telekom.de
