
AI for Human Communication


AI for human communication is about recognition, parsing, understanding, and generating natural language. The concept of natural language is evolving. A key focus is the analysis, interpretation, and generation of verbal and written language. Other language focus areas include haptic, sonic, and visual language, data, and interaction.


AI for Human Communication

  1. 1. This content included for educational purposes. This research deck is a précis of information from the Forrester Digital Transformation Conference in May 2017. It compiles selected copy and visuals from conference presentations and recent Forrester research reports. [Cover diagram: AI FOR HUMAN COMMUNICATION — Artificial Intelligence; Machine Learning; Deep Learning; Symbolic Reasoning; Formal Language Processing; Data; Natural Language Processing: NLP | NLU | NLG; Human Communication — Interaction: dialog, gesture, emotion, haptic; Audible Language: speech, sound; Visual Language: 2D/3D/4D; Written Language: verbal, text] 1
  2. 2. This content included for educational purposes. 2 • Lawrence Mills Davis is founder and managing director of Project10X, a research consultancy known for forward-looking industry studies; multi-company innovation and market development programs; and business solution strategy consulting. Mills brings 30 years' experience as an industry analyst, business consultant, computer scientist, and entrepreneur. He is the author of more than 50 reports, whitepapers, articles, and industry studies. • Mills researches artificial intelligence technologies and their applications across industries, including cognitive computing, machine learning (ML), deep learning (DL), predictive analytics, symbolic AI reasoning, expert systems (ES), natural language processing (NLP), conversational UI, intelligent assistance (IA), robotic process automation (RPA), and autonomous multi-agent systems. • For clients seeking to exploit transformative opportunities presented by the rapidly evolving capabilities of artificial intelligence, Mills brings a depth and breadth of expertise to help leaders realize their goals. More than narrow specialization, he brings perspective that combines understanding of business, technology, and creativity. Mills fills roles that include industry research, venture development, and solution envisioning. Lawrence Mills Davis Managing Director Project10X mdavis@project10x.com 202-667-6400
  3. 3. This content included for educational purposes. SECTIONS 1. AI for human communication 2. AI for natural language summarization 3. AI for natural language generation 4. AI technology evolution 3
  4. 4. AI FOR HUMAN COMMUNICATION
  5. 5. AI for human communication is about recognition, parsing, understanding, and generating natural language. The concept of natural language is evolving. Human communication encompasses visual language and conversational interaction as well as text. 5This content included for educational purposes.
  6. 6. This content included for educational purposes. 6 Overview of AI for human communication • Natural language processing (NLP) is the confluence of artificial intelligence (AI) and linguistics. • A key focus is the analysis, interpretation, and generation of verbal and written language. • Other language focus areas include audible & visual language, data, and interaction. • Formal programming languages enable computers to process natural language and other types of data. • Symbolic reasoning employs rules and logic to frame arguments, make inferences, and draw conclusions. • Machine learning (ML) is an area of AI and NLP that solves problems using statistical techniques, large data sets, and probabilistic reasoning. • Deep learning (DL) is a type of machine learning that uses layered artificial neural networks. [Diagram: AI for human communication — Artificial Intelligence; Machine Learning; Deep Learning; Symbolic Reasoning; Formal Language Processing; Data; Natural Language Processing: NLP | NLU | NLG; Interaction: dialog, gesture, emotion, haptic; Audible Language: speech, sound; Visual Language: 2D/3D/4D; Written Language: verbal, text]
  7. 7. NATURAL LANGUAGE PROCESSING 7This content included for educational purposes.
  8. 8. This content included for educational purposes. 8 nat·u·ral lan·guage proc·ess·ing /ˈnaCH(ə)rəl//ˈlaNGɡwij//ˈpräˌsesˌiNG/ Natural language is spoken or written speech. English, Chinese, Spanish, and Arabic are examples of natural language. A formal language such as mathematics, symbolic logic, or a computer language isn't. Natural language processing recognizes the sequence of words spoken by a person or another computer, understands the syntax or grammar of the words (i.e., does a syntactical analysis), and then extracts the meaning of the words. Some meaning can be derived from a sequence of words taken out of context (i.e., by semantic analysis). Much more of the meaning depends on the context in which the words are spoken (e.g., who spoke them, under what circumstances, with what tone, and what else was said, particularly before the words), which requires a pragmatic analysis to extract meaning in context. Natural language technology processes queries, answers questions, finds information, and connects users with various services to accomplish tasks. What is natural language processing? NLP
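To make the syntactic-analysis step above concrete, here is a minimal sketch using the NLTK library (not referenced in the deck; shown only as one common open-source toolkit). It tokenizes a sentence and tags each token's part of speech, assuming the required NLTK data packages have been downloaded.

    import nltk

    # One-time downloads of the tokenizer and tagger models
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    sentence = "Natural language technology answers questions and connects users with services."

    # Syntactic analysis: split the sentence into word tokens...
    tokens = nltk.word_tokenize(sentence)
    # ...and label each token with its part of speech (noun, verb, adjective, ...)
    tagged = nltk.pos_tag(tokens)
    print(tagged)  # e.g. [('Natural', 'JJ'), ('language', 'NN'), ...]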
  9. 9. Aoccdrnig to a rseearch taem at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht the frist and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. 9This content included for educational purposes.
  10. 10. This content included for educational purposes. How natural language interpretation & natural language generation happens 10
  11. 11. This content included for educational purposes. Text analytics 11 Text mining is the discovery by computer of new, previously unknown information, by automatically extracting it from different written resources. A key element is the linking together of the extracted information to form new facts or new hypotheses to be explored further by more conventional means of experimentation. Text analytics is the investigation of concepts, connections, patterns, correlations, and trends discovered in written sources. Text analytics examines linguistic structure and applies statistical, semantic, and machine-learning techniques to discern entities (names, dates, places, terms) and their attributes as well as relationships, concepts, and even sentiments. It extracts these 'features' to databases or semantic stores for further analysis, automates classification and processing of source documents, and exploits visualization for exploratory analysis. IM messages, email, call center logs, customer service survey results, claims forms, corporate documents, blogs, message boards, and websites are providing companies with enormous quantities of unstructured data — data that is information-rich but typically difficult to get at in a usable way. Text analytics goes beyond search to turn documents and messages into data. It extends Business Intelligence (BI) and data mining and brings analytical power to content management. Together, these complementary technologies have the potential to turn knowledge management into knowledge analytics.
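As a concrete illustration of the feature extraction described above, the sketch below uses the spaCy library (an assumption; the slide names no specific tool) to pull named entities out of a short customer-service note. It assumes the small English model en_core_web_sm is installed.

    import spacy

    # Load a small pretrained English pipeline
    # (assumes: pip install spacy && python -m spacy download en_core_web_sm)
    nlp = spacy.load("en_core_web_sm")

    note = "On March 3rd, Maria Lopez emailed Acme Corp from Chicago about her insurance claim."
    doc = nlp(note)

    # Extract entity 'features' (names, dates, places) that could be loaded into a database
    for ent in doc.ents:
        print(ent.text, ent.label_)   # e.g. "March 3rd DATE", "Maria Lopez PERSON", "Chicago GPE"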
  12. 12. NATURAL LANGUAGE UNDERSTANDING 12This content included for educational purposes.
  13. 13. This content included for educational purposes. Speech I/O vs NLP vs NLU 13 [Diagram: tasks arranged across Speech I/O, NLP, and NLU — automatic speech recognition (ASR), text-to-speech (TTS), syntactic parsing, part-of-speech tagging (POS), named entity recognition (NER), machine translation, text categorization, semantic parsing, relation extraction, sentiment analysis, coreference resolution, paraphrase & natural language inference, summarization, question answering (QA), dialogue agents]
  14. 14. This content included for educational purposes. Natural language understanding (NLU) Natural language understanding (NLU) involves mapping a given natural language input into useful representations, and analyzing different aspects of the language. NLU is critical to making AI happen. But language is more than words, and NLU involves more than lots of math to facilitate search for matching words. Language understanding requires dealing with ideas, allusions, and inferences, with implicit but critical connections to ongoing goals and plans. To develop models of NLU effectively, we must begin with limited domains in which the range of knowledge needed is well enough understood that natural language can be interpreted within the right context. One example is mentoring in massively delivered educational systems. If we want better-educated students, we need to offer them hundreds of different experiences to choose from instead of a mandated curriculum. A main obstacle to doing that now is the lack of expert teachers. We can build experiential learning based on simulations and virtual reality, enabling students to pursue their own interests and eliminating the “one size fits all” curriculum. To make this happen, expertise must be captured from people and brought in to guide students at their time of need. A good teacher (and a good parent) can do that, but they cannot always be available. A kid in Kansas who wants to be an aerospace engineer should get to try out designing airplanes. But a mentor would be needed. We can build AI mentors in limited domains, so it would be possible for a student anywhere to learn to do anything, because the AI mentor would understand what a user was trying to accomplish within the domain and what they are perhaps struggling with. The student could ask questions and expect good answers tailored to the student’s needs, because the AI/NLU mentor would know exactly what the student was trying to do: it has a perfect model of the world in which the student is working, the relevant expertise needed, and the mistakes students often make. NLU gets much easier when there is deep domain knowledge available. Source: Roger C. Schank 14
  15. 15. This content included for educational purposes. Machine reading & comprehension AI machine learning is being developed to understand social media, news trends, stock prices and trades, and other data sources that might impact enterprise decisions. 15
  16. 16. This content included for educational purposes. Example queries of the future 16 • Which of these eye images shows symptoms of diabetic retinopathy? • Please fetch me a cup of tea from the kitchen. • Describe this video in Spanish. • Find me documents related to reinforcement learning for robotics and summarize them in German. Source: Google
  17. 17. This content included for educational purposes. 17 Source: Narrative Science Explainable AI (XAI) New machine-learning systems will have the ability to explain their rationale, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future. State-of-the-art human-computer interface techniques will translate models into understandable and useful explanation dialogues for the end user. Source: DARPA [Diagram: new learning process — training data → explainable model → explanation interface; example explanation: “This is a cat: it has fur, whiskers, and claws; it has this feature”; user outcome: “I understand why/why not; I know when it will succeed/fail”]
  18. 18. VISUAL LANGUAGE 18This content included for educational purposes.
  19. 19. This content included for educational purposes. Source: Robert Horn Visual Language The integration of words, images, and shapes into a single communication unit. • Words are essential to visual language. They give conceptual shape, and supply the capacity to name, define, and classify elements, and to discuss abstractions. • Images are what we first think of when we think of visual language. But, without words and/or shapes, images are only conventional visual art. • Shapes differ from images. They are more abstract. We combine them with words to form diagramming systems. Shapes and their integration with words and/or images are an essential part of visual language. 19
  20. 20. This content included for educational purposes. 20 Source: Narrative Science Source: Robert Horn Visual language is being created by the merger of vocabularies from many, widely different fields
  21. 21. This content included for educational purposes. Toward understanding diagrams using recurrent networks and deep learning 21 Source: AI2 [Figure: diagrams are rich and diverse — the top row depicts inter-class variability of visual illustrations; the bottom row shows intra-class variation for the water cycle category.] [Figure: architecture for inferring DPGs from diagrams — a stacked LSTM network with fully connected layers scores candidate relationship feature vectors and exploits global constraints such as overlap, coverage, and layout to select a subset of relations amongst thousands of candidates to construct a diagram parse graph.] [Figure: sample question-answering results — the left column is the diagram, the second column shows the answer chosen, and the third column shows the nodes and edges in the DPG that Dqa-Net decided to attend to (indicated by red highlights).] Diagrams represent complex concepts, relationships and events, often when it would be difficult to portray the same information with natural images. Diagram Parse Graphs (DPG) model the structure of diagrams. RNN+LSTM-based syntactic parsing of diagrams learns to infer DPGs. Adding a DPG-based attention model enables semantic interpretation and reasoning for diagram question answering.
  22. 22. This content included for educational purposes. 22 Computer vision • The ability of computers to identify objects, scenes, and activities in unconstrained (that is, naturalistic) visual environments. • Computer vision has been transformed by the rise of deep learning. • The confluence of large-scale computing, especially on GPUs, the availability of large datasets, especially via the internet, and refinements of neural network algorithms has led to dramatic improvements. • Computers are able to perform some (narrowly defined) visual classification tasks better than people. A current research focus is automatic image and video captioning.
  23. 23. This content included for educational purposes. Image annotation and captioning using deep learning — example generated captions: “a man riding a motorcycle on a city street”; “a plate of food with meat and vegetables” 23
  24. 24. This content included for educational purposes. Video question-answering 24
  25. 25. AI FOR NATURAL LANGUAGE SUMMARIZATION
  26. 26. This content included for educational purposes. • The goal of automated summarization is to produce a shorter version of a source text by preserving the meaning and the key contents of the original. A well written summary reduces the amount of cognitive work needed to digest large amounts of text. • Automatic summarization is part of artificial intelligence, natural language processing, machine learning, deep learning, data mining and information retrieval. • Document summarization tries to create a representative extract or abstract of the entire document, by finding or generating the most informative sentences. • Image summarization tries to find the most representative and important (i.e. salient) images and generates explanatory captions of still or moving scenes, including objects, events, emotions, etc. 26 Automatic summarization
  27. 27. This content included for educational purposes. 27 [Diagram: Artificial Intelligence — Natural Language Processing — Machine Learning — Deep Learning] • Natural Language Processing (NLP) is the confluence of Artificial Intelligence (AI) and linguistics. A key focus is analysis and interpretation of written language. • Machine Learning (ML) is an area of AI and NLP that uses large data sets and statistical techniques for problem solving. • Deep Learning (DL) is a type of machine learning that uses neural networks (including Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN)) to process natural language and other types of data.
  28. 28. This content included for educational purposes. 28 Summarization classification [Diagram: input document → summarization → output document; classification dimensions include purpose, source size (single-document vs. multi-document), specificity (domain-specific vs. general), form, audience (generic vs. query-oriented), usage, expansiveness (indicative vs. informative), derivation, conventionality, background vs. just-the-news, extract vs. abstract, partiality (neutral vs. evaluative), fixed vs. floating scale, and genre] Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Genres of summary include: • Single-document vs. multi-document source — based on one text vs. fuses together many texts. E.g., for multi-document summaries we may want one summary with common information, or similarities and differences among documents, or support and opposition to specific ideas and concepts. • Generic vs. query-oriented — provides author’s view vs. reflects user’s interest. • Indicative vs. informative — what’s it about (quick categorization) vs. substitute for reading it (content processing). • Background vs. just-the-news — assumes reader’s prior knowledge is poor vs. up-to-date. • Extract vs. abstract — lists fragments of text vs. re-phrases content coherently.
  29. 29. This content included for educational purposes. 29 Extractive vs. Abstractive summarization [Diagram: extractive — select a subset of words and output them in the best order; abstractive — encode the source into a hidden state and decode it into a new text sequence]
  30. 30. This content included for educational purposes. 30 Automatic summarization machine [Diagram: inputs (IN) — query, document, multiple documents; outputs (OUT) at compression levels of 100%, 50%, and 10% — long, brief, very brief, headline; output types — extract vs. abstract, indicative vs. informative, generic vs. query-oriented, background vs. just-the-news; extracted summaries and abstracted summaries are produced via computable models such as frames/templates, probabilistic models, knowledge graphs, and internal states]
  31. 31. This content included for educational purposes. 31 Text summarization approaches [Diagram: approaches to producing a summary of a text document — extraction techniques (statistics foundation; surface approach), abstraction techniques (linguistic and mathematical foundation; semantic approach), general techniques, graph-based techniques, and combined extraction + abstraction techniques; specific methods shown include keyword, title word, distance, cue phrases, sentence position, lexical chains, clustering, non-negative matrix factorization, machine learning, neural networks, fuzzy logic, and Wikipedia (k-base)]
  32. 32. This content included for educational purposes. Automated summarization using statistical heuristics 32 [Diagram: source documents → document-term matrix (DTM: documents × terms) → extracted sentence summary] Steps: determine vocabulary, term frequency, and most important words; vectorize sentences by word frequency; score sentences by frequency of most important words; select best-scoring sentences.
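A minimal sketch of this frequency-heuristic pipeline in plain Python (illustrative only; the slide does not prescribe an implementation). It scores each sentence by the frequency of the words it contains and keeps the top-scoring sentences.

    import re
    from collections import Counter

    def summarize(text, num_sentences=2):
        # Split into sentences and words (a crude segmenter, for illustration)
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        words = re.findall(r"[a-z']+", text.lower())
        # Determine vocabulary and term frequency (the "most important words")
        freq = Counter(words)
        # Score each sentence by the average frequency of its words
        scores = []
        for s in sentences:
            tokens = re.findall(r"[a-z']+", s.lower())
            scores.append(sum(freq[t] for t in tokens) / (len(tokens) or 1))
        # Select the best-scoring sentences, preserving original order
        top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
        return " ".join(sentences[i] for i in sorted(top))

    print(summarize("Text analytics turns documents into data. Summaries reduce reading work. "
                    "Frequency heuristics score sentences by common words."))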
  33. 33. This content included for educational purposes. 33 Extractive summarization process [Diagram: input document(s) → pre-processing (normalizer, segmenter, stemmer, stop-word eliminator) → list of sentences and list of pre-processed words for each sentence → processing (clustering, learning, scoring; list of clusters, P(f|C), sentence scores, summary size) → extraction and reordering of the highest-scored sentences → summary] • Preprocessing reads and cleans up data (including stop-word removal, numbers, punctuation, short words, stemming, lemmatization), and builds the document-term matrix. • Processing vectorizes and scores sentences, which may entail heuristic, statistical, linguistic, graph-based, and machine learning methods. • Extraction selects, orders, and stitches together the highest-scoring sentences, and presents the summary.
  34. 34. This content included for educational purposes. Automated summarization using topic modeling 34 [Diagram: source documents → document-term matrix (DTM: documents × terms) → LDA → topic-modeled extracted sentence summary] Steps: input training data; build document-term matrix and preprocess; train using LDA to learn topics; vectorize using LDA to determine which topics occur in each sentence as well as the weighted distribution of topics across all documents; score sentences by how much they are dominated by the most dominant topic; select the highest-scoring sentence or sentences; output the topic-modeled extracted sentence summary.
  35. 35. This content included for educational purposes. 35 Topic modeling [Diagram: documents (observed) — latent topics — words (observed); topic modeling approaches try to model relationships between observed words and documents by a set of latent topics] • A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. • Topic modeling is used to discover hidden semantic structures in a text body. • Documents are about several topics at the same time. Topics are associated with different words. Topics in the documents are expressed through the words that are used. • Latent topics are the “link” between the documents and words. Topics explain why certain words appear in a given document.
  36. 36. This content included for educational purposes. Document-Term Matrix • The document-term matrix (DTM) describes the frequency of terms that occur in a collection of documents and is the foundation on which all topic modeling methods work. • Preprocessing steps are pretty much the same for all of the topic modeling algorithms: - Bag-of-words (BOW) approaches are used, since the DTM does not contain ordering information. - Punctuation, numbers, and short, rare and uninformative words are typically removed. - Stemming and lemmatization may also be applied. 36
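A minimal sketch of building a document-term matrix with scikit-learn (an assumed tool choice, recent versions). CountVectorizer applies the bag-of-words assumptions described above: word order is discarded and only term counts per document are kept.

    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "The dog chased the cat.",
        "Dogs and cats make good pets.",
        "Stock prices rose sharply today.",
    ]

    # Build the DTM: rows are documents, columns are vocabulary terms, cells are counts.
    # stop_words="english" drops uninformative words, mirroring the preprocessing above.
    vectorizer = CountVectorizer(lowercase=True, stop_words="english")
    dtm = vectorizer.fit_transform(docs)

    print(vectorizer.get_feature_names_out())  # the vocabulary (columns)
    print(dtm.toarray())                       # the document-term counts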
  37. 37. This content included for educational purposes. Semantic relatedness and TF-IDF 37 [Diagram: semantic analysis — TF-IDF dimension reduction] • A key preprocessing step is to reduce the high-dimensional term vector space to a low-dimensional ‘latent’ topic space. • Two words co-occurring in a text: - signal that they are related - document frequency determines the strength of the signal - co-occurrence index • TF: Term Frequency — terms occurring more frequently in a document are more important • IDF: Inverted Document Frequency — terms in fewer documents are more specific • TF * IDF indicates the importance of a term relative to the document
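A minimal TF-IDF sketch with scikit-learn's TfidfVectorizer (an assumed tool choice), which weights each term by term frequency times inverted document frequency so that terms concentrated in few documents score higher.

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "rainfall was heavy and temperatures were low",
        "temperatures were above average this month",
        "the earnings report beat analyst expectations",
    ]

    # TF * IDF: terms frequent in this document but rare across documents get the largest weights
    tfidf = TfidfVectorizer()
    weights = tfidf.fit_transform(docs)

    terms = tfidf.get_feature_names_out()
    # Show the highest-weighted (most document-specific) terms for the first document
    row = weights[0].toarray()[0]
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(top)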
  38. 38. This content included for educational purposes. 38 Probabilistic topic model What is a topic? A list of probabilities for each of the possible words in a vocabulary. Example topic: • dog: 5% • cat: 5% • house: 3% • hamster: 2% • turtle: 1% • calculus: 0.000001% • analytics: 0.000001% • .......
  39. 39. This content included for educational purposes. Convolutional neural network architecture for sentence classification 39 This diagram illustrates a convolutional neural network (CNN) architecture for sentence classification. • It shows three filter region sizes: 2, 3 and 4, each of which has 2 filters. • Every filter performs convolution on the sentence matrix and generates (variable-length) feature maps. • Next, 1-max pooling is performed over each map, i.e., the largest number from each feature map is recorded. Thus a univariate feature vector is generated from all six maps, and these 6 features are concatenated to form a feature vector for the penultimate layer. • The final softmax layer then receives this feature vector as input and uses it to classify the sentence; here we assume binary classification and hence depict two possible output states. Source: Zhang, Y., & Wallace, B. (2015). A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification.
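A compact PyTorch sketch of this architecture (an illustrative re-implementation under assumed dimensions, not the authors' code): an embedding layer, parallel convolutions with region sizes 2, 3, and 4 (two filters each), 1-max pooling over each feature map, concatenation into a six-element feature vector, and a final softmax layer for binary classification.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SentenceCNN(nn.Module):
        def __init__(self, vocab_size=1000, embed_dim=50, num_classes=2,
                     region_sizes=(2, 3, 4), filters_per_size=2):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            # One Conv1d per region size; each produces `filters_per_size` feature maps
            self.convs = nn.ModuleList(
                nn.Conv1d(embed_dim, filters_per_size, kernel_size=k) for k in region_sizes
            )
            # 3 region sizes x 2 filters = 6 concatenated features feed the softmax layer
            self.fc = nn.Linear(filters_per_size * len(region_sizes), num_classes)

        def forward(self, token_ids):                    # token_ids: (batch, seq_len)
            x = self.embedding(token_ids)                # (batch, seq_len, embed_dim)
            x = x.transpose(1, 2)                        # (batch, embed_dim, seq_len) for Conv1d
            pooled = []
            for conv in self.convs:
                fmap = F.relu(conv(x))                   # (batch, filters, seq_len - k + 1)
                pooled.append(fmap.max(dim=2).values)    # 1-max pooling over each feature map
            features = torch.cat(pooled, dim=1)          # (batch, 6) penultimate feature vector
            return F.log_softmax(self.fc(features), dim=1)

    # Usage: a batch of two "sentences" of 10 token ids each
    model = SentenceCNN()
    dummy = torch.randint(0, 1000, (2, 10))
    print(model(dummy).shape)   # torch.Size([2, 2])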
  40. 40. This content included for educational purposes. Topic modeling algorithms 40 Several different topic modeling algorithms: • LSA — Latent semantic analysis finds smaller (lower-rank) matrices that closely approximate the DTM. • pLSA — Probabilistic LSA finds topic-word and topic-document associations that best match the dataset and a specified number of topics (K). • LDA — Latent Dirichlet Allocation finds topic-word and topic-document associations that best match the dataset and a specified number of topics that come from a Dirichlet distribution with given Dirichlet priors. • Other advanced topic modeling algorithms — several are covered briefly later, including CTM, DTM, HTM, RTM, STM, and sLDA.
  41. 41. This content included for educational purposes. Latent semantic analysis 41 [Diagram: the terms × documents matrix is factored into a terms × topics matrix, a diagonal topic-importance matrix, and a topics × documents matrix] • LSA is a technique of distributional semantics for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. • LSA finds smaller (lower-rank) matrices that closely approximate the document-term matrix by picking the highest assignments for each word to topic, and each topic to document, and dropping the ones not of interest. • The contexts in which a certain word exists or does not exist determine the similarity of the documents.
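A minimal LSA sketch (assumed tooling: scikit-learn): TF-IDF vectors are factored with truncated SVD into a low-rank topic space, approximating the document-term matrix as the slide describes.

    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [
        "dogs and cats are popular pets",
        "my cat chased the neighbour's dog",
        "interest rates and stock markets fell",
        "the market reacted to the rate decision",
    ]

    # Term weights, then a rank-2 approximation: each document becomes a 2-topic vector
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    lsa = TruncatedSVD(n_components=2, random_state=0)
    doc_topics = lsa.fit_transform(tfidf)

    print(doc_topics)   # documents about pets vs. markets land in different regions of topic space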
  42. 42. This content included for educational purposes. 42 Latent Dirichlet Allocation • Latent Dirichlet Allocation (LDA) is an unsupervised, probabilistic, text clustering algorithm. • LDA finds topic-word and topic-document associations that best match the dataset and a specified number of topics that come from a Dirichlet distribution with given Dirichlet priors. • LDA defines a generative model that can be used to model how documents are generated given a set of topics and the words in the topics. • The LDA model is built as follows: 1. Estimate topics as a product of observed words 2. Use (1) to estimate document topic proportions 3. Evaluate the corpus based on the distributions suggested in (1) & (2) 4. Use (3) to improve the topic estimations in (1) 5. Reiterate until the best fit is found.
  43. 43. This content included for educational purposes. 43 Source: Andrius Knispelis, ISSUU LATENT DIRICHLET ALLOCATION A topic model developed by David Blei, Andrew Ng and Michael Jordan in 2003. It tells us what topics are present in any given document by observing all the words in it and producing a topic distribution. [Plate diagram: α — a parameter that sets the prior on the per-document topic distributions; β — a parameter that sets the prior on the per-topic word distributions; Θ — the topic distribution for document i; Z — the topic for the j’th word in document i; W — the observed words in document i; N words, M documents] [Diagram: words and documents (e.g., tfidf.mm, wordids.txt) → document-term matrix → topics → topic model (model.lda)]
  44. 44. This content included for educational purposes. Understanding LDA alpha and beta parameters 44 In practice, a high alpha-value will lead to documents being more similar in terms of what topics they contain. A high beta-value will similarly lead to topics being more similar in terms of what words they contain. α β Impact on content A high beta-value means that each topic is likely to contain a mixture of most of the words, and not any word specifically. A low value means that a topic may contain a mixture of just a few of the words. A high alpha-value means that each document is likely to contain a mixture of most of the topics, and not any single topic specifically. A low alpha value puts less such constraints on documents and means that it is more likely that a document may contain mixture of just a few, or even only one, of the topics.
  45. 45. This content included for educational purposes. 45 Source: Andrius Knispelis, ISSUU LDA process: preprocess the data — The text corpus depends on the application domain. It should be contextualised, since the window of context will determine what words are considered to be related. The only observable features for the model are words. Experiment with various stoplists to make sure only the right ones are getting in. The training corpus can be different from the documents it will be scored on. A good all-purpose utility corpus is Wikipedia. train the model — The key parameter is the number of topics. Again, this depends on the domain. Other parameters are alpha and beta. You can leave them aside to begin with and only tune them later. A good place to start is gensim, a free Python library. score it on new documents — The goal of the model is not to label documents, but rather to give them a unique fingerprint so that they can be compared to each other in a humanlike fashion. evaluate the performance — Evaluation depends on the application. Use Jensen-Shannon distance as the similarity metric. Evaluation should show whether the model captures the right aspects compared to a human. It will also show what distance threshold is still perceived as similar enough. Use perplexity to see if your model is representative of the documents you’re scoring it on.
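A minimal gensim sketch of this process (gensim is named on the slide; the corpus, stoplist, parameter values, and topic count here are illustrative assumptions): preprocess to a bag-of-words corpus, train an LDA model with a chosen number of topics, then score a new document to get its topic fingerprint.

    from gensim import corpora, models

    docs = [
        "the cat sat with the dog in the garden",
        "dogs and cats are common household pets",
        "the market fell as interest rates climbed",
        "investors watched stock prices and rates",
    ]
    stoplist = {"the", "with", "in", "and", "are", "as"}

    # Preprocess: tokenize, apply a stoplist, build the dictionary and bag-of-words corpus
    texts = [[w for w in d.lower().split() if w not in stoplist] for d in docs]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # Train the model: the key parameter is the number of topics; alpha/eta can be tuned later
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2,
                          alpha="auto", passes=10, random_state=0)

    # Score a new document: its topic distribution is a reusable "fingerprint"
    new_doc = dictionary.doc2bow("cat and dog prices".lower().split())
    print(lda.get_document_topics(new_doc))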
  46. 46. This content included for educational purposes. LDA topic modeling process 46 [Diagram: preprocessing (tokenization, lemmatization, stop-word removal) → dictionaries and bag-of-words → vector space model → LDA → topics and their words, with tuning parameters] Step 1: Select β • The term distribution β is determined for each topic by β ∼ Dirichlet(δ). Step 2: Select α • The proportions θ of the topic distribution for the document w are determined by θ ∼ Dirichlet(α). Step 3: Iterate • For each of the N words wi: - (a) Choose a topic zi ∼ Multinomial(θ). - (b) Choose a word wi from a multinomial probability distribution conditioned on the topic zi: p(wi|zi, β). * β is the term distribution of topics and contains the probability of a word occurring in a given topic. * The process is purely based on frequency and co-occurrence of words. • Pass through the LDA algorithm and evaluate. • Create the document-term matrix, dictionaries, and corpus of bag-of-words. • Clean documents of as much noise as possible, for example: - Lowercase all the text - Replace all special characters and do n-gram tokenizing - Lemmatize: reduce words to their root form, e.g., “reviews” and “reviewing” to “review” - Remove numbers (e.g., “2017”) and remove HTML tags and symbols
  47. 47. This content included for educational purposes. • Correlated topic model — CTM allows topics to be correlated, leading to better prediction, which is more robust to overfitting. • Dynamic topic model — DTM models how each individual topic changes over time. • Supervised LDA — sLDA associates an external variable with each document, which defines a one-to-one correspondence between latent topics and user tags. • Relational topic model — RTM predicts which documents a new document is likely to be linked to. (E.g., tracking activities on Facebook in order to predict a reaction to an advertisement.) • Hierarchical topic model — HTM draws the relationship between one topic and another (which LDA does not) and indicates the level of abstraction of a topic (which CTM correlation does not). • Structural topic model — STM provides fast, transparent, replicable analyses that require few a priori assumptions about the texts under study. STM includes covariates of interest. Unlike LDA, topics can be correlated and each document has its own prior distribution over topics, defined by covariate X rather than sharing a mean, allowing word use within a topic to vary by covariate U. 47 Advanced 
 topic modeling techniques
  48. 48. This content included for educational purposes. Topic modeling is a form of lossy compression because it expresses a document as a vector where each element can be thought of as the weight of that topic in that document. Each element of the vector has interpretable meaning. This makes topic modeling a powerful technique to apply in many more contexts than text summarization. For example: • A preprocessing step to generate features for arbitrary text classification tasks • A way to visualize and explore a corpus by grouping and linking similar documents • A solution to the cold-start problem that plagues collaborative filtering • Applied to non-text data, including images, genetic information, and click-through data. 48 Other uses of topic modeling
  49. 49. This content included for educational purposes. 49 Query-focused multi-document summarization [Diagram: input docs → sentence segmentation → all sentences from documents → sentence simplification → all sentences plus simplified versions → content selection / sentence extraction (LLR, MMR), guided by the query → extracted sentences → information ordering → sentence realization → summary] • Multi-document summarization aims to capture the important information of a set of documents related to the same topic and present it in a brief, representative, and pertinent summary. • Query-driven summarization encodes criteria as search specs. The user needs only certain types of information (e.g., “I know what I want — don’t confuse me with drivel!”). The system processes the specs top-down to filter or analyze text portions. Templates or frames order information and shape the presentation of the summary.
  50. 50. This content included for educational purposes. 50 Automatic summarization using sentence-level vectorization and recurrent neural networks This diagram depicts three approaches to automatic summarization that start with sentence-level vectorization of input using an off-the-shelf language model (Skip-thoughts) and then process source data using feedforward and recurrent neural network configurations to generate increasingly coherent extractive and abstractive summaries of the source documents. [Diagram: language model, training data, and source data → sentence vectors → (1) feedforward neural network → extracted sentence summary; (2) recurrent neural network with LSTM → more coherent extractive summary; (3) encoder-decoder RNN with LSTM and attention → abstractive summary]
  51. 51. This content included for educational purposes. 51 Semantic hashing Semantic hashing uses a deep autoencoder as a hash function that maps documents to a small number of binary variables (memory addresses) in such a way that semantically similar documents are located at nearby addresses. Learn to map documents into a small number of semantic binary codes; retrieve similar documents stored at the nearby addresses with no search at all. [Diagram: document → semantic hashing function → semantic hashing address space, with semantically similar documents clustered together, e.g., European Community, Energy Markets, Accounts/Earnings]
  52. 52. This content included for educational purposes. 52 Word embeddings A word’s meaning is embedded by the surrounding words. Word2vec is a two-layer neural net for pre-processing text. Its input is a text corpus. Its outputs are word embeddings — a set of feature vectors for words in that corpus. Word vectors are positioned in the space so that words that share common contexts (word(s) preceding and/or following) are located in close proximity to each other. One of two model architectures is used to produce the word embedding distribution: continuous bag-of-words (CBOW) or continuous skip-gram. • With CBOW, the model predicts the current word from a window of surrounding context words without considering word order. • With skip-gram, the model uses the current word to predict the surrounding window of context words, and weighs nearby words more heavily than more distant context words. Word2vec embedding captures subtle syntactic and semantic structure in the text corpus and can be used to map similarities, analogies and compositionality.
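A minimal word2vec sketch with gensim (an assumed tool choice; the corpus and parameters are illustrative, and the parameter names assume gensim 4.x). sg=1 selects the skip-gram architecture described above; sg=0 would select CBOW.

    from gensim.models import Word2Vec

    # A toy corpus: each sentence is a list of tokens (real corpora are far larger)
    sentences = [
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "rug"],
        ["cats", "and", "dogs", "are", "pets"],
        ["stocks", "and", "bonds", "are", "investments"],
    ]

    # Train skip-gram embeddings: each word becomes a 50-dimensional feature vector
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=200, seed=0)

    # Words sharing contexts end up close together in the vector space
    print(model.wv.most_similar("cat", topn=3))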
  53. 53. This content included for educational purposes. 53 Skip-thoughts In contiguous text, nearby sentences provide rich semantic and contextual information. The Skip-thought model extends the skip-gram structure used in word2vec. It is trained to reconstruct the surrounding sentences and to map sentences that share syntactic and semantic properties to similar vectors. Learned sentence vectors are highly generic, and can be reused for many different tasks by learning an additional mapping, such as a classification layer. The Skip-thought model attempts to predict the preceding sentence (in red) and the subsequent sentence (in green), given a source sentence (in grey).
  54. 54. This content included for educational purposes. 54 Feedforward neural network Source: A Beginner’s Guide to Recurrent Networks and LSTMs Neural networks • A neural network is a system composed of many simple processing elements operating in parallel, which can acquire, store, and utilize experiential knowledge from data. • Input examples are fed to the network and transformed into an output — for example, mapping raw data to categories by recognizing patterns that signal that an input image should be labeled “cat” or “elephant.” • Feedforward neural networks move information straight through (never touching a given node twice). Once trained, the neural network has no notion of order in time. It only considers the current example it has been exposed to, nothing before that.
  55. 55. This content included for educational purposes. 55Source: A Beginner’s Guide to Recurrent Networks and LSTMs Simple Recurrent Neural Network architecture model Recurrent neural network (RNN) • A recurrent neural network (RNN) can give itself feedback from past experiences. It maintains a hidden state that changes as it sees different inputs. Like short-term memory, this enables answers based on both current input and past experience. • RNNs are distinguished from feedforward networks by having this feedback loop. Recurrent networks take as their input not just the current input example they see, but also what they perceived one step back in time. RNNs have two sources of input, the present and the recent past, which combine to determine how they respond to new data.
  56. 56. This content included for educational purposes. 56Source: A Beginner’s Guide to Recurrent Networks and LSTMs Long short term memory (LSTM) • Long Short Term Memory (LSTM) empowers a RNN with longer- term recall. This allows the model to make more context-aware predictions. • LSTM has gates that act as differentiable RAM memory. Access to memory cells is guarded by “read”, “write” and “erase” gates. • Starting from the bottom of the diagram, the triple arrows show where information flows into the cell at multiple points. That combination of present input and past cell state is fed into the cell itself, and also to each of its three gates, which will decide how the input will be handled. • The black dots are the gates themselves, which determine respectively whether to let new input in, erase the present cell state, and/or let that state impact the network’s output at the present time step. S_c is the current state of the memory cell, and g_y_in is the current input to it. Remember that each gate can be open or shut, and they will recombine their open and shut states at each step. The cell can forget its state, or not; be written to, or not; and be read from, or not, at each time step, and those flows are represented here.
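A minimal PyTorch sketch (the framework choice is an assumption) showing the recurrence the last three slides describe: an LSTM consumes a sequence step by step, carrying a hidden state and gated cell state forward so that each output depends on the current input and the recent past.

    import torch
    import torch.nn as nn

    # A single-layer LSTM: 8-dimensional inputs, 16-dimensional hidden/cell state
    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

    sequence = torch.randn(1, 5, 8)          # batch of 1, sequence of 5 time steps, 8 features each
    outputs, (h_n, c_n) = lstm(sequence)     # outputs: the hidden state at every step

    print(outputs.shape)   # torch.Size([1, 5, 16]) -- one hidden state per time step
    print(h_n.shape)       # torch.Size([1, 1, 16]) -- final hidden state (the "memory" carried forward)
    print(c_n.shape)       # torch.Size([1, 1, 16]) -- final cell state guarded by the LSTM gates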
  57. 57. This content included for educational purposes. 57 Abstractive text summarization Abstractive text summarization is a two-step process: • A sequence of text is encoded into some kind of internal representation. • This internal representation is then used to guide the decoding process back into the summary sequence, which may express ideas using words and phrases not found in the source. State-of-the-art architectures use recurrent neural networks for both the encoding and the decoding step, often with attention over the input during decoding as additional help. [Diagram: source document → encoder → internal representation → decoder → summary]
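A minimal sketch of running an encoder-decoder abstractive summarizer. The deck predates it, but the Hugging Face transformers library (an assumption, not a tool named in the deck) packages pretrained sequence-to-sequence models behind a one-line pipeline; the model name below is illustrative.

    from transformers import pipeline

    # Load a pretrained encoder-decoder summarization model (downloads weights on first use)
    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    article = (
        "State Sen. Stewart Greenleaf discussed his proposed human trafficking bill "
        "at Calvary Baptist Church in Willow Grove on Thursday night, answering "
        "questions from residents about enforcement and victim support."
    )

    # The encoder maps the source text to an internal representation;
    # the decoder generates a shorter sequence that may use new words and phrasing.
    print(summarizer(article, max_length=30, min_length=10, do_sample=False)[0]["summary_text"])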
  58. 58. This content included for educational purposes. • Training data — (Hi) RNN summarizers have the most extensive data requirements that include language models (such as word2vec and skip-thoughts) for the vectorization/ embedding step, and a large sampling of training documents. Depending on choice of algorithm(s), training documents may also need corresponding summaries. • Domain expertise — (Low) RNN summarizers generally demand less domain specific expertise or hand-crafted linguistic features to develop. Abstractive summarization architectures exist that combine RNNs and probabilistic models to cast the summarization task as a neural machine translation problem, where the models, trained on a large amount of data, learn the alignments between the input text and the target summary through an attention encoder- decoder paradigm enhanced with prior knowledge, such as linguistic features. • Computational cost — (Hi-to-very hi) RNNs require large amounts of preprocessing, and a large (post-training) static shared global state. Computations are best done on a GPU configuration. • Interpretability — (Low) RNN summarizers do not provide simple answers to the why of sentence selection and summary generation. Intermediate embeddings (and internal states) are not easily understandable in a global sense. 58 Google NMT, arxiv.org/abs/1609.08144 RNN sequence-to-sequence language translation — Chinese to English Sequence-to-sequence language translation All variants of encoder-decoder architecture share a common goal: encoding source inputs into fixed-length vector representations, and then feeding such vectors through a “narrow passage” to decode into a target output. The narrow passage forces the network to pick and abstract a small number of important features and builds the connection between a source and a target.
  59. 59. This content included for educational purposes. Deep learning for abstractive text summarization 59 Sentence compression with LSTMs — Example: Input: State Sen. Stewart Greenleaf discusses his proposed human trafficking bill at Calvary Baptist Church in Willow Grove Thursday night. Output: Stewart Greenleaf discusses his human trafficking bill. Source: Lukasz Kaiser, Google Brain • If we cast the summarization task as a sequence-to-sequence neural machine translation problem, the models, trained on a large amount of data, learn the alignments between the input text and the target summary through an attention encoder-decoder paradigm. • The encoder is a recurrent neural network (RNN) with long short-term memory (LSTM) that reads one token at a time from the input source and returns a fixed-size vector representing the input text. • The decoder is another RNN that generates words for the summary, conditioned on the vector representation returned by the first network. • We can also increase summary quality by integrating prior relational semantic knowledge into RNNs, in order to jointly learn word and knowledge embeddings by exploiting knowledge bases and lexical thesauri.
  60. 60. This content included for educational purposes. 60 Sentence A: I saw Joe’s dog, which was running in the garden. Sentence B: The dog was chasing a cat. Summary: Joe’s dog was chasing a cat in the garden. Source: Liu F., Flanigan J., Thomson S., Sadeh N., Smith N.A. 
 Toward Abstractive Summarization Using Semantic Representations. NAACL 2015 Prior semantic knowledge • Abstractive summarization can be enhanced through integration of a semantic representation from which a summary is generated
  61. 61. This content included for educational purposes. Knowledge Representation and Reasoning 61 Knowledge representation and reasoning concerns: • What any agent—human, animal, electronic, mechanical—needs to know to behave intelligently • What computational mechanisms allow this knowledge to be manipulated Symbolic methods • Declarative languages (logic) • Imperative languages — C, C++, Java, etc. • Hybrid languages (Prolog) • Rules — theorem provers, expert systems • Frames — case-based reasoning, model-based reasoning • Semantic networks, ontologies • Facts, propositions Symbolic methods can find information by inference and can explain their answers. Non-symbolic methods • Neural networks — knowledge encoded in the weights of the neural network, for embeddings, thought vectors • Genetic algorithms • Graphical models — Bayesian reasoning • Support vectors Neural knowledge representation is mainly about perception; the issue is a lack of common sense (there is a lot of inference involved in everyday human reasoning).
  62. 62. AI FOR NATURAL LANGUAGE GENERATION
  63. 63. This content included for educational purposes. OVERVIEW 63 • Natural language generation — the process by which thought is rendered into language. Computers are learning to “speak our language” in multiple ways, for example: data-to- language, text-to-language, vision-to-language, sound-to- language, and interaction-to-language. AI for human communication is about recognizing, parsing, understanding, and generating natural language. NLG converts some kind of data into human language. Most often this means generating text from structured data. However, the current state of play is broader. To set the stage, we identify four broad classes of AI for language generation with examples. • How data-to-text natural language generation works — This section overviews the process by which data is ingested and analyzed to determine facts; then facts get reasoned over to infer a conceptual outline and a communication plan; and an intelligent narrative is generated from the facts and the plan. • Symbolic and statistical approaches to NLG — Historically, there are two broad technical approaches to NLG—symbolic reasoning and statistical learning: - Symbolic approaches apply classical AI and involve hand- crafted lexicons, knowledge, logic, and rules-based reasoning. We overview the architecture most commonly used. - Statistical learning approaches to NLG have emerged in recent years. They involve machine learning, deep learning, and probabilistic reasoning, and incorporate techniques being developed for computer vision, speech recognition and synthesis, gaming, and robotics.
  64. 64. Natural language generation (NLG) is the process by which 
 thought is rendered into language. 64 David McDonald, Brandeis University This content included for educational purposes.
  65. 65. Natural language generation (NLG) is the conversion of 
 some kind of data into human language. 65This content included for educational purposes.
  66. 66. FOUR CATEGORIES OF NLG 66This content included for educational purposes.
  67. 67. This content included for educational purposes. Data-to-text applications analyze and convert incoming (non-linguistic) data into a generated language. One way is by filling gaps in a predefined template text. Examples of this sort of "robo journalism" include: • Sports reports, such as soccer, baseball, basketball • Virtual ‘newspapers’ from sensor data • Textual descriptions of the day-to-day lives of birds based on satellite data • Weather reports • Financial reports such as earnings reports • Summaries of patient information in clinical contexts • Interactive information about cultural artifacts, for example in a museum context • Text intended to persuade or motivate behavior modification. 67 Data-to-language generation
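A minimal sketch of the template-filling approach mentioned above (illustrative only; commercial data-to-text systems are far more sophisticated): structured weather data is analyzed for a simple derived fact and slotted into predefined text.

    def weather_report(city, high_c, low_c, rain_mm):
        # Analyze the (non-linguistic) data to derive a simple fact
        outlook = "a wet day" if rain_mm > 5 else "a mostly dry day"
        # Fill the gaps in a predefined template text
        return (f"{city} can expect {outlook}, with a high of {high_c}°C, "
                f"a low of {low_c}°C, and {rain_mm} mm of rain.")

    print(weather_report("Oslo", high_c=4, low_c=-2, rain_mm=12))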
  68. 68. This content included for educational purposes. Text-to-text applications take existing texts as their input, then automatically produce a new, coherent text or summary as output. Examples include: • Fusion and summarization of related sentences or texts to make them more concise • Simplification of complex texts, for example to make them more accessible for low-literacy readers • Automatic spelling, grammar and text correction • Automatic generation of peer reviews for scientific papers • Generation of paraphrases of input sentences • Automatic generation of questions, for educational and other purposes. 68 Text-to-language 
 generation
  69. 69. This content included for educational purposes. Vision-to-text applications convert incoming visual data from computer vision into generated text descriptions or answers to questions. Examples include: • Automatic captions for photographs • Automatic scene descriptions from video • Automatic generation of answers to questions based on understanding and interpretation of a diagram. 69 Vision-to-language generation
  70. 70. This content included for educational purposes. Sound-to-text applications convert incoming auditory data from microphones into a generated text. Examples include: • Automatic speech recognition • Automatic recognition of audible signals and alerts. 70 Sound-to-language generation
  71. 71. HOW DATA-TO-TEXT NLG WORKS 71This content included for educational purposes.
  72. 72. This content included for educational purposes. First, determine communication purpose and requirements 72 [Diagram: communication intent feeds communication planning — drawing on context, domain & topic expertise, audience, linguistic knowledge, and content & data — which drives natural language generation: document planning, micro-planning, surface realization, delivery, interaction, and learning] CONSIDERATIONS: • Communication purpose • Scope • Constraints • Key questions • Answer form(s) • Hypotheses • Strategy • Data exploration • Evidence • Inference • Simulation & testing • Conclusions • Messages • Styling • Delivery • Interaction • Confidence
  73. 73. This content included for educational purposes. Steps to transform data into language 73 Source: Narrative Science [Diagram: DATA → ANALYZE → FACTS → INFER → CONCEPTUAL OUTLINE → GENERATE → INTELLIGENT NARRATIVE] Analyze data to determine facts. Reason over facts to infer a conceptual outline; order concepts into a communication plan. Generate an intelligent narrative from the facts according to the plan.
  74. 74. This content included for educational purposes. Self-service NLG example — Upload data 74 Source: Automated Insights Data
  75. 75. This content included for educational purposes. Self-service NLG example — Design article 75 Template Source: Automated Insights
  76. 76. This content included for educational purposes. Self-service NLG example — Generate narratives 76 Source: Automated Insights Narrative
  77. 77. SYMBOLIC AND STATISTICAL APPROACHES TO NLG 77This content included for educational purposes.
  78. 78. This content included for educational purposes. 78 Source: Jonathan Mugan, CEO, DeepGrammar Two technology paths to 
 natural language generation The symbolic path involves hard-coding our world into computers. We manually create representations by building groups and creating relationships between them. We use these representations to build a model of how the world works. The sub-symbolic, or statistical, path has computers learn from text using neural networks. It begins by representing words as vectors, then whole sentences as vectors, and then moves to using vectors to answer arbitrary questions. The key is creating algorithms that allow computers to learn from rich sensory experience that is similar to our own.
  79. 79. SYMBOLIC NLG 79This content included for educational purposes.
  80. 80. This content included for educational purposes. 1. Morphological Level: Morphemes are the smallest units of meaning within words, and this level deals with morphemes in their role as the parts that make up words. 2. Lexical Level: This level of speech analysis examines how the parts of words (morphemes) combine to make words and how slight differences can dramatically change the meaning of the final word. 3. Syntactic Level: This level focuses on text at the sentence level. Syntax revolves around the idea that in most languages the meaning of a sentence is dependent on word order and dependency. 4. Semantic Level: Semantics focuses on how the context of words within a sentence helps determine the meaning of words on an individual level. 5. Discourse Level: How sentences relate to one another. Sentence order and arrangement can affect the meaning of the sentences. 6. Pragmatic Level: Bases the meaning of words or sentences on situational awareness and world knowledge — basically, what meaning is most likely and would make the most sense. 80 How symbolic NLP interprets language 
 (six level stack)
  81. 81. This content included for educational purposes. 1. Content determination: Deciding which information to include in the text under construction, 2. Text/document structuring: Determining in which order information will be presented in the text, 3. Sentence aggregation: Deciding which information to present in individual sentences, 4. Lexicalization: Finding the right words and phrases to express information, 5. Referring expression generation: Selecting the words and phrases to identify domain objects, 6. Linguistic realization: Combining all words and phrases into well-formed sentences. 81 Source: Reiter and Dale Natural language generation tasks
  82. 82. This content included for educational purposes. Natural language generation tasks 82 DOCUMENT PLANNING Content determination Decides what information will appear in the output text. This depends on what the communication goal is, who the audience is, what sort of input information is available in the first place, and other constraints such as allowed text length. Text/document structuring Decides how chunks of content should be grouped in a document, how to relate these groups to each other, and in what order they should appear. For instance, to describe last month's weather, one might talk first about temperature, then rainfall. Alternatively, one might start off generally talking about the weather and then provide specific weather events that occurred during the month. MICRO-PLANNING Sentence aggregation Decides how the structures created by document planning should map onto linguistic structures such as sentences and paragraphs. For instance, two ideas can be expressed in two sentences or in one: The month was cooler than average. The month was drier than average. vs. The month was cooler and drier than average. Lexicalization Decides what specific words should be used to express the content. For example, choosing from a lexicon the actual nouns, verbs, adjectives and adverbs to appear in the text. Also, choosing particular syntactic structures. For example, one could say 'the car owned by Mary' or use the phrase 'Mary's car'. Referring expression generation Decides which expressions should be used to refer to entities (both concrete and abstract); it is possible to refer to the same entity in many ways. For example, the month Barack Obama was first elected President of the United States can be referred to as: • November 2008 • November • The month Obama was elected • it SURFACE REALIZATION Linguistic realization Uses grammar rules (about morphology and syntax) to convert abstract representations of sentences into actual text. Realization techniques include template completion, hand-coded grammar-based realization, and filtering using a probabilistic grammar trained on large corpora of candidate text passages. Structure realization Converts abstract structures such as paragraphs and sentences into mark-up symbols which are used to display the text.
  83. 83. This content included for educational purposes. Rule-based modular pipeline architecture for natural language generation 83 Source: Reiter and Dale [Diagram: communication goals and a knowledge source feed (1) document planning (content determination, text structuring) → text plan → (2) micro-planning (sentence aggregation, lexicalization, referring expression generation) → sentence plan → (3) surface realization (linguistic realization) → text]
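A toy sketch of this three-stage pipeline in Python (a simplification under assumed data; real NLG engines encode far richer rulesets): document planning picks and orders messages, micro-planning aggregates them into a sentence plan, and surface realization renders grammatical text.

    data = {"month": "July", "temp_delta": -1.8, "rain_delta": -12.5}

    def document_planning(d):
        # Content determination + structuring: decide which messages to express, and in what order
        messages = []
        if d["temp_delta"] < 0:
            messages.append(("cooler", abs(d["temp_delta"]), "degrees"))
        if d["rain_delta"] < 0:
            messages.append(("drier", abs(d["rain_delta"]), "mm of rain"))
        return messages

    def micro_planning(messages):
        # Aggregation + lexicalization: pack both messages into a single sentence plan
        phrases = [f"{word} than average ({value:g} {unit} below normal)"
                   for word, value, unit in messages]
        return " and ".join(phrases)

    def surface_realization(d, sentence_plan):
        # Linguistic realization: produce a well-formed sentence with correct punctuation
        return f"{d['month']} was {sentence_plan}."

    print(surface_realization(data, micro_planning(document_planning(data))))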
  84. 84. This content included for educational purposes. Natural Language Generation Natural language generation (NLG) is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation, and involves: • Text planning − It includes retrieving the relevant content from the knowledge base. • Sentence planning − It includes choosing required words, forming meaningful phrases, and setting the tone of the sentence. • Text realization − It is mapping the sentence plan into sentence (or visualization) structure, followed by text-to-speech processing and/or visualization rendering. • The output may be provided in any natural language, such as English, French, Chinese or Tagalog, and may be combined with graphical elements to provide a multimodal presentation. • For example, the log files of technical monitoring devices can be analyzed for unexpected events and transformed into alert-driven messages; or numerical time-series data from hospital patient monitors can be rendered as hand-over reports describing trends and events for medical staff starting a new shift. 84
  85. 85. This content included for educational purposes. 85 NLG rulesets (Source: Arria). The core NLG engine is configured with three layers of rules: a core engine ruleset, a vertical ruleset, and a client ruleset.
• Core ruleset — general purpose rules used in almost every application of the NLG engine. These capture knowledge about data processing and linguistic communication in general, independent of the particular domain of application.
• Vertical ruleset — rules encoding knowledge about the specific industry vertical or domain in which the NLG engine is being used. Industry vertical rulesets are constantly being refined via ongoing development, embodying knowledge about data processing and linguistic communication which is common to different clients in the same vertical.
• Client ruleset — rules that are specific to the client for whom the NLG engine is being configured. These rules embody the particular expertise in data processing and linguistic communication that are unique to a client application.
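One way to picture this layering is a simple precedence merge: client rules override vertical rules, which override core rules. This is a hypothetical sketch only; Arria's actual rule engine and rule format are not public, and the rule names below are invented.

# Hypothetical sketch of layered rulesets: client rules override vertical
# rules, which override core rules. Not Arria's actual rule format.

CORE_RULES = {"decimal_places": 2, "tone": "neutral"}
VERTICAL_RULES = {"decimal_places": 1, "units": "barrels/day"}   # e.g. an oil & gas vertical
CLIENT_RULES = {"tone": "formal"}                                # one client's preference

def effective_rules(*layers):
    """Later layers take precedence over earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

rules = effective_rules(CORE_RULES, VERTICAL_RULES, CLIENT_RULES)
print(rules)  # {'decimal_places': 1, 'tone': 'formal', 'units': 'barrels/day'}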
  86. 86. This content included for educational purposes. 86 Source: Narrative Science. Example architecture for realtime data storytelling. The Arria NLG Engine combines data analytics and computational linguistics, enabling it to convert large and diverse datasets into meaningful natural language narratives. Source: Arria
The pipeline moves from raw data through facts, messages, a document plan, and sentence plans to surface text:
• DATA ANALYSIS processes the data to extract the key facts that it contains.
• DATA INTERPRETATION makes sense of the data, particularly from the point of view of what information can be communicated.
• DOCUMENT PLANNING takes the messages derived from the data and works out how to best structure the information they contain into a narrative.
• MICROPLANNING works out how to package the information into sentences to maximise fluency and coherence.
• SURFACE REALISATION ensures that the meanings expressed in the sentences are conveyed using correct grammar, word choice, morphology and punctuation.
• DATA can be ingested from a wide variety of data sources, both structured and unstructured.
• NARRATIVE can be output in a variety of formats (HTML, PDF, Word, etc.), combined with graphics as appropriate, or delivered as speech.
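A compressed sketch of the data-to-narrative chain described above, written in plain Python. The sales series, the chosen facts, and the wording are invented for illustration; a production engine exposes far richer interfaces at every stage.

# Compressed sketch of the data analysis -> interpretation -> narrative chain.
# The sales data and wording are invented for illustration.

sales = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 160}

def analyze(data):
    # Data analysis: extract key facts.
    best = max(data, key=data.get)
    return {"best_month": best, "best_value": data[best], "total": sum(data.values())}

def interpret(facts):
    # Data interpretation: decide which facts are worth communicating.
    return [f"Total sales reached {facts['total']} units",
            f"the strongest month was {facts['best_month']} with {facts['best_value']} units"]

def realize(messages):
    # Document planning + realization, collapsed into one templated step.
    return messages[0] + ", and " + messages[1] + "."

print(realize(interpret(analyze(sales))))
# -> "Total sales reached 543 units, and the strongest month was Apr with 160 units."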
  87. 87. STATISTICAL NLG This content included for educational purposes.
  88. 88. This content included for educational purposes. Symbolic vs. Statistical NLG 88 Symbolic approaches apply classical AI and involve preprocessing, hand-crafted lexicons, knowledge, logic, and rule-based reasoning. Statistical approaches involve training datasets, vectorization, embeddings, machine learning, deep learning, and probabilistic reasoning. (Figure: classical NLP vs. deep learning-based NLP pipelines.)
  89. 89. This content included for educational purposes. Summarization, and algorithms that make text quantifiable, allow us to derive insights from large amounts of unstructured text data. Unstructured text has been slower to yield to the kinds of analysis that many businesses are starting to take for granted, but we are beginning to gain the ability to do remarkable things with it. The use of neural networks and deep learning for text makes it possible to build models that go beyond counting words to representing the concepts and meaning in text quantitatively. The examples that follow start simple and build up to the breakthrough capabilities realized by applying sentence embeddings and recurrent neural networks to capture the semantic meaning of text.
Machine Learning is a type of Artificial Intelligence that provides computers with the ability to learn without being explicitly programmed. It provides various techniques that can learn from and make predictions on data. In the training phase, labeled data and a machine learning algorithm produce a learned model; in the prediction phase, new data and the learned model produce a prediction. 89 Source: Narrative Science. Machine learning. Source: Lukas Masuch
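In code, the train-then-predict cycle can be illustrated with a minimal scikit-learn sketch; the toy dataset and feature values below are invented for illustration.

# Minimal "learn from labeled data, then predict" sketch with scikit-learn.
# The toy dataset is invented for illustration.
from sklearn.linear_model import LogisticRegression

X_train = [[1, 0], [2, 1], [8, 9], [9, 8]]   # labeled training data (features)
y_train = [0, 0, 1, 1]                       # labels

model = LogisticRegression()                 # the learning algorithm
model.fit(X_train, y_train)                  # training -> learned model

print(model.predict([[1.5, 0.5], [8.5, 8.5]]))  # prediction on new data -> [0 1]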
  90. 90. This content included for educational purposes. Deep Learning Architecture: A deep neural network consists of a hierarchy of layers, whereby each layer transforms the input data into more abstract representations (e.g. edge -> nose -> face). The output layer combines those features to make predictions. 90 Source: Narrative Science. Deep learning. Source: Lukas Masuch
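A minimal sketch of such a layer hierarchy in PyTorch (assuming the torch package is installed). The layer sizes are arbitrary and the input is random, so it only illustrates the structure, not a trained model.

# Minimal sketch of a deep network as a hierarchy of layers (PyTorch).
# Layer sizes are arbitrary; each layer maps its input to a more abstract
# representation, and the final layer produces the prediction.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # low-level features (e.g. edges)
    nn.Linear(256, 64), nn.ReLU(),    # mid-level features (e.g. parts)
    nn.Linear(64, 10),                # output layer: class scores
)

x = torch.randn(1, 784)               # one fake input example
print(model(x).shape)                 # torch.Size([1, 10])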
  91. 91. This content included for educational purposes. 91 Source: Narrative Science. Why deep learning for NLP?
  92. 92. This content included for educational purposes. 92 Source: Narrative Science. Ten applications of deep learning for natural language processing
  93. 93. This content included for educational purposes. Deep Learning in NLP: Syntax Parsing. SyntaxNet (Parsey McParseface) tags each word with a part-of-speech tag and determines the syntactic relationships between words in the sentence, with 94% accuracy compared to human performance of 96%. 93 Source: Narrative Science. Deep learning can be used to parse the syntax of natural language sentences. Source: Lukas Masuch
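SyntaxNet itself is a TensorFlow model; the same kind of part-of-speech and dependency analysis can be sketched with spaCy, assuming spaCy and its small English model (en_core_web_sm) are installed.

# Part-of-speech tags and dependency relations with spaCy (not SyntaxNet
# itself, but the same kind of analysis). Assumes: pip install spacy
# and python -m spacy download en_core_web_sm.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The month was cooler and drier than average.")

for token in doc:
    # word, part-of-speech tag, dependency label, and syntactic head
    print(f"{token.text:10} {token.pos_:6} {token.dep_:10} -> {token.head.text}")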
  94. 94. This content included for educational purposes. Deep Learning in NLP: Generating Text. To train the RNN, insert characters sequentially and predict the probabilities of the next letter. Backpropagate the error and update the RNN's weights to increase the confidence of the correct letter (green) and decrease the confidence of all other letters (red). Trained on structured Wikipedia markdown, the network learns to spell English words completely from scratch and to copy general syntactic structures. 94 Source: Narrative Science. Deep learning networks can learn to spell correctly and generate texts with appropriate syntactic structures. Source: Lukas Masuch
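A minimal character-level language-model training loop in PyTorch, in the spirit of the char-RNN described above. The training text is a tiny toy string, so the model merely memorizes it, but the loop (feed characters, predict the next one, backpropagate) is the same as at scale.

# Minimal character-level language model training sketch (PyTorch).
import torch
import torch.nn as nn

text = "hello world. hello world. "
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h)            # logits over the next character

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

ids = torch.tensor([stoi[c] for c in text]).unsqueeze(0)  # shape (1, seq_len)
x, y = ids[:, :-1], ids[:, 1:]                            # inputs / next-char targets

for step in range(200):
    logits = model(x)                                     # (1, seq_len-1, vocab)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()                                       # backpropagate the error
    opt.step()                                            # update the RNN's weights
print(f"final loss: {loss.item():.3f}")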
  95. 95. This content included for educational purposes. Sequence-to-sequence NLG 95 Seq2seq architectures preprocess input data into vector embeddings at the character, word, sentence, paragraph, or document level. The input is processed sequentially through LSTM recurrent neural networks to produce an overall encoding (a "thought vector") for the data of interest. The thought vector is then decoded, step by step, by a series of LSTM RNN cells to generate the most probable language output.
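A skeleton of the encoder-decoder data flow in PyTorch, forward pass only. The network is untrained and the source token ids are random, so the output is meaningless; the point is how the encoder's final state acts as the thought vector that the decoder unrolls from. The start-of-sequence id of 0 is an assumption of this sketch.

# Skeleton of a seq2seq encoder-decoder (PyTorch), forward pass only.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 100, 32, 64
embed = nn.Embedding(VOCAB, EMB)
encoder = nn.LSTM(EMB, HID, batch_first=True)
decoder = nn.LSTM(EMB, HID, batch_first=True)
out_head = nn.Linear(HID, VOCAB)

src = torch.randint(0, VOCAB, (1, 7))            # fake source token ids
_, thought = encoder(embed(src))                 # (h_n, c_n): the "thought vector"

token = torch.zeros(1, 1, dtype=torch.long)      # start-of-sequence id (assumed to be 0)
state = thought
generated = []
for _ in range(5):                               # generate 5 output tokens greedily
    out, state = decoder(embed(token), state)
    token = out_head(out).argmax(dim=-1)         # most probable next token
    generated.append(token.item())
print(generated)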
  96. 96. This content included for educational purposes. Deep Learning in NLP: Generating Text. To generate text, we feed a character into the trained RNN and get a distribution over what characters are likely to come next (red = likely). We sample from this distribution and feed the sample right back in to get the next letter. The highlighted neuron gets very excited (green = excited, blue = not excited) when the RNN is inside the [[ ]] markdown environment and turns off outside of it; the RNN is likely using this neuron to remember whether it is inside a URL or not. 96 Source: Narrative Science. How deep learning generates text from character-level embeddings. Source: Lukas Masuch
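The sampling loop itself can be sketched as follows in PyTorch. The tiny LSTM here is untrained, so the generated string is random; with a trained character-level model the identical loop produces readable text.

# Sketch of the sampling loop: feed a character in, get a distribution over
# next characters, sample one, feed the sample back in.
import torch
import torch.nn as nn

chars = list("abcdefghijklmnopqrstuvwxyz .")
stoi = {c: i for i, c in enumerate(chars)}
embed = nn.Embedding(len(chars), 16)
lstm = nn.LSTM(16, 32, batch_first=True)
head = nn.Linear(32, len(chars))

token = torch.tensor([[stoi["h"]]])          # seed character
state = None
out_text = "h"
for _ in range(40):
    h, state = lstm(embed(token), state)
    probs = torch.softmax(head(h[:, -1]), dim=-1)     # distribution over next chars
    token = torch.multinomial(probs, num_samples=1)   # sample instead of taking argmax
    out_text += chars[token.item()]
print(out_text)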
  97. 97. This content included for educational purposes. Deep Learning in NLP: Neural Machine Translation. Google's Neural Machine Translation system, based on a deep LSTM seq2seq architecture, reduces translation errors by 55% to 85% compared to traditional approaches. 97 Source: Narrative Science. Deep learning neural machine translations are increasingly accurate. Source: Lukas Masuch
  98. 98. This content included for educational purposes. Deep Learning in NLP: Neural Conversational Model. Neural chatbot: predicts the next sentence given the previous sentences in a conversation. The LSTM remembers facts, understands context, and performs common sense reasoning in the trained domain. Human: what is the purpose of life ? Machine: to serve the greater good . Human: what is the purpose of living ? Machine: to live forever Human: what is the purpose of existence? Machine: to find out what happens when we get to the planet earth Human: where are you now ? Machine: i ’m in the middle of nowhere . Human: what is the purpose of dying ? Machine: to have a life Human: what is the purpose of being intelligent ? Machine: to find out what it is. Human: what is the purpose of emotions ? Machine: I don ’t know . Philosophical Q&A – Network trained with OpenSubtitles (movie subtitles) 98 Source: Narrative Science. Deep learning LSTM neural networks are being used to generate human-machine conversations. Source: Lukas Masuch
  99. 99. This content included for educational purposes. 99 Source: Narrative Science. Deep learning for storytelling
  100. 100. This content included for educational purposes. Summarization, and algorithms to make text quantifiable, allow us to derive insights from Large amounts of unstructured text data. Unstructured text has been slower to yield to the kinds of analysis that many businesses are starting to take for granted. We are beginning to gain the ability to do remarkable things with unstructured text data. First, the use of neural networks and deep learning for text offers the ability to build models that go beyond just counting words to actually representing the concepts and meaning in text quantitatively. These examples start simple and eventually demonstrate the breakthrough capabilities realized by the application of sentence embedding and recurrent neural networks to capturing the semantic meaning of text. 100 Source: NarraFve Science Toward multi-modal deep learning and language generation
  101. 101. AI TECHNOLOGY EVOLUTION
  102. 102. This content included for educational purposes. AI technology directions for 
 human-machine communication 
 and language generation •Evolution from hand-crafted knowledge and rules-based symbolic systems, and statistical learning and probabilistic inferencing systems, to contextual adaption systems that surpass limitations these earlier waves of AI. •Towards explainable AI, embedded continuous machine learning, automatic generation of whole-system causal models, and human- machine symbiosis. •Dedicated AI hardware providing 100X to 1000X increase in computational power. 102 This content included for educational purposes.
  103. 103. This content included for educational purposes. Artificial Intelligence is a programmed ability to process information 103 Source: DARPA perceive rich, complex and subtle information learn within an environment abstract to create new meanings reason to plan and to decide perceiving learning abstracting reasoning Intelligence scale
  104. 104. This content included for educational purposes. Three waves of AI technology 104 Contextual adaptation Engineers create systems that construct explanatory models for classes of real-world phenomena AI systems learn and reason as they encounter new tasks and situations Natural communication among machines and people Engineers create sets of rules to represent knowledge in well defined domains AI systems reason over narrowly defined problems No learning capability and poor handling of uncertainty Engineers create statistical models for specific problem domains and train them on 
 big data AI systems have nuanced classification and prediction capabilities No contextual capability and minimal reasoning ability Handcrafted knowledge Perceiving Learning Abstracting Reasoning Perceiving Learning Abstracting Reasoning Perceiving Learning Abstracting Reasoning Statistical learning Source: DARPA New research is shaping this waveStill advancing and solving hard problems Amazingly effective, but has fundamental limitations
  105. 105. This content included for educational purposes. Some third wave AI technologies 105 Explainable AI Embedded machine learning Continuous learning Automatic whole-system causal models Human-machine symbiosis Source: DARPA
  106. 106. When it comes to different types of natural language goals, like text summarization vs. question-answering vs. explanation of business intelligence, it seems likely a single platform will be able to solve them all in coming years. That is, we won’t see dramatically different technologies for each type of problem. Today, many natural language problems can be reframed as machine translation problems, and use similar approaches to solve them. Tomorrow’s NLG will fuse symbolic and statistical AI approaches in a third-wave synthesis. 106This content included for educational purposes.
  107. 107. www.Project10x.com
