This document discusses machine translation and its role in facilitating multilingual communication on social media. It begins by defining machine translation and describing the two main approaches of rule-based and statistical machine translation. It then examines how social media platforms like Facebook, Twitter, and Google+ are incorporating machine translation to allow users to communicate across languages. Some key points include:
- Machine translation on social media allows quick translations of posts and comments, helping people with shared interests but different languages to connect.
- Major platforms support translation between over 50 languages, the most ever in human history.
- While translation accuracy remains a challenge, user feedback can help improve results over time.
- Machine translation helps companies market globally by creating
A REVIEW ON THE PROGRESS OF NATURAL LANGUAGE PROCESSING IN INDIAJoe Osborn
This document summarizes a research paper on the progress of natural language processing (NLP) in India. It discusses some of the key challenges in NLP research in India, including the complexity of languages, ambiguity, and lack of annotated corpora. It also outlines several potential benefits and applications of NLP in India, such as bridging the digital divide, enabling e-governance services in local languages, machine translation, and powering local language search engines and content. Overall, the document provides an overview of the state of NLP research in India and its prospects for the future.
The Rise Of ChatGPT_ Advancements In AI-Language Model Technology.pdfLucas Lagone
ChatGPT is the Most Recent AI language model technology that is taking the world by storm. Learn how it works, & potential impact on the future of language communication.
Original source: https://www.nevinainfotech.com/blog/advancements-in-ai-language-model-technology/
Jaap van der Meer, Director of TAUS, shares a compilation of the feedback on the Big Idea as well as a complete overview of new TAUS features and services and new partnerships.
The document summarizes a student project on speech recognition using Python. It includes 4 literature review papers on topics related to speech recognition, natural language processing, and machine learning approaches. It also includes a problem statement, methodology, comparisons table of the papers, conclusions, and proposes future work such as integrating speech APIs and creating a mobile app. The project uses Python and Tkinter to create a GUI-based speech recognition system that converts speech to text and vice versa.
Summer Research Project (Anusaaraka) ReportAnwar Jameel
This document discusses Anusaaraka, a machine translation tool being developed to translate between English and Hindi. It uses principles from Panini's grammar to map word groups and constructions between the languages. Where differences exist, extra notation is added to preserve source language information. The output is presented in layers to show the translation process. It aims to bridge the language barrier by allowing users to access text in their preferred Indian language.
Key features include faithfully representing the source text, reversibility of the translation process through layered output, and transparency by allowing users to trace the translation steps. It was developed by combining traditional Indian linguistic principles with modern technologies.
Script to Sentiment : on future of Language TechnologyMysore latestJaganadh Gopinadhan
The document discusses developments in the field of human language technology (HLT) and its future applications. It notes that HLT is no longer confined to academia and is becoming integrated into information and communication technology products and services in daily life. The document provides an overview of developments in text and speech processing, including machine translation systems and spell checkers. It also discusses the role of open source tools and frameworks in advancing research in HLT, particularly for Indian languages.
This document discusses machine translation and its role in facilitating multilingual communication on social media. It begins by defining machine translation and describing the two main approaches of rule-based and statistical machine translation. It then examines how social media platforms like Facebook, Twitter, and Google+ are incorporating machine translation to allow users to communicate across languages. Some key points include:
- Machine translation on social media allows quick translations of posts and comments, helping people with shared interests but different languages to connect.
- Major platforms support translation between over 50 languages, the most ever in human history.
- While translation accuracy remains a challenge, user feedback can help improve results over time.
- Machine translation helps companies market globally by creating
A REVIEW ON THE PROGRESS OF NATURAL LANGUAGE PROCESSING IN INDIAJoe Osborn
This document summarizes a research paper on the progress of natural language processing (NLP) in India. It discusses some of the key challenges in NLP research in India, including the complexity of languages, ambiguity, and lack of annotated corpora. It also outlines several potential benefits and applications of NLP in India, such as bridging the digital divide, enabling e-governance services in local languages, machine translation, and powering local language search engines and content. Overall, the document provides an overview of the state of NLP research in India and its prospects for the future.
The Rise Of ChatGPT_ Advancements In AI-Language Model Technology.pdfLucas Lagone
ChatGPT is the Most Recent AI language model technology that is taking the world by storm. Learn how it works, & potential impact on the future of language communication.
Original source: https://www.nevinainfotech.com/blog/advancements-in-ai-language-model-technology/
Jaap van der Meer, Director of TAUS, shares a compilation of the feedback on the Big Idea as well as a complete overview of new TAUS features and services and new partnerships.
The document summarizes a student project on speech recognition using Python. It includes 4 literature review papers on topics related to speech recognition, natural language processing, and machine learning approaches. It also includes a problem statement, methodology, comparisons table of the papers, conclusions, and proposes future work such as integrating speech APIs and creating a mobile app. The project uses Python and Tkinter to create a GUI-based speech recognition system that converts speech to text and vice versa.
Summer Research Project (Anusaaraka) ReportAnwar Jameel
This document discusses Anusaaraka, a machine translation tool being developed to translate between English and Hindi. It uses principles from Panini's grammar to map word groups and constructions between the languages. Where differences exist, extra notation is added to preserve source language information. The output is presented in layers to show the translation process. It aims to bridge the language barrier by allowing users to access text in their preferred Indian language.
Key features include faithfully representing the source text, reversibility of the translation process through layered output, and transparency by allowing users to trace the translation steps. It was developed by combining traditional Indian linguistic principles with modern technologies.
Script to Sentiment : on future of Language TechnologyMysore latestJaganadh Gopinadhan
The document discusses developments in the field of human language technology (HLT) and its future applications. It notes that HLT is no longer confined to academia and is becoming integrated into information and communication technology products and services in daily life. The document provides an overview of developments in text and speech processing, including machine translation systems and spell checkers. It also discusses the role of open source tools and frameworks in advancing research in HLT, particularly for Indian languages.
Introduction:
In the ever-evolving landscape of artificial intelligence, speech recognition datasets have become a cornerstone for technological advancements. The meticulous curation and expansion of these datasets play a pivotal role in propelling innovations in speech recognition, virtual assistants, and various AI applications.
Understanding Speech Recognition Datasets:
Speech recognition datasets involve the systematic collection of spoken words and phrases to facilitate the training and improvement of AI models. These datasets go beyond mere transcription, capturing the nuances of human speech and enabling machines to interpret and understand language with increasing accuracy.
Applications of Speech Recognition Datasets:
Enhancing Speech Recognition Technology:
Delve into how speech recognition datasets contribute to the refinement of speech-to-text technology, enabling applications in transcription services, voice-activated commands, and more.
Empowering Virtual Assistants:
Explore the role of speech recognition datasets in training virtual assistants like Siri, Google Assistant, and others, making them more adept at understanding and responding to user queries.
Fostering Multilingual Capabilities:
Discuss how diverse speech datasets contribute to the development of multilingual speech recognition systems, breaking language barriers and fostering inclusivity.
Improving Accessibility Services:
Highlight the impact of speech recognition datasets on the development of accessibility features, such as voice-controlled interfaces for individuals with disabilities.
Challenges in Speech Recognition Dataset Creation:
Despite their significance, the creation of speech recognition datasets is not without challenges. Variations in accents, background noise, and the need for comprehensive datasets present hurdles that must be overcome to ensure the efficacy of AI models.
Recent Advancements and Innovations:
Explore recent advancements in speech recognition dataset creation techniques, including the use of deep learning, transfer learning, and crowdsourcing to enhance dataset quality and diversity.
Future Directions:
Discuss the potential future developments in speech recognition datasets, considering ongoing efforts to address challenges and the emergence of new technologies. Consider the role of privacy-conscious approaches and ethical considerations in dataset creation.
Conclusion:
In conclusion, the evolution of speech recognition datasets is instrumental in shaping the trajectory of artificial intelligence. The continuous refinement and expansion of these datasets not only drive improvements in existing applications but also pave the way for novel uses in the ever-expanding realm of AI technology. As we navigate the future, the role of speech recognition datasets remains pivotal in realizing the full potential of voice-driven interactions and intelligent systems.
The document discusses the global text-to-speech market, which converts written text to synthesized speech. It is transforming industries by enhancing accessibility for those with disabilities and bridging the gap between written and auditory content. Major players in the TTS market provide solutions that offer enhanced accessibility, multilingual support, and integration across customer service, education, and other industries. Emerging trends include voice cloning, emotional speech, edge computing, and AI-powered platforms.
Conversational AI Agents have become mainstream today due to significant advancements in the methods required to build accurate models, such as machine learning and deep learning, and, secondly, because they are seen as a natural fit in a wide range of domains, such as healthcare, e-commerce, customer service, tourism, and education, that rely heavily on natural language conversations in day-to-day operations. This rapid increase in demand has been matched by an equally rapid rate of research and development, with new products being introduced on a daily basis.
Learn More:https://bit.ly/3tBkT81
Contact Us:
Website: https://www.phdassistance.com/
UK: +44 7537144372
India No:+91-9176966446
Email: info@phdassistance.com
This document summarizes a research paper on developing a Marathi speech recognition system to enable users to search for information on government policies and schemes in India. It describes the challenges of speech recognition including variability between speakers. The proposed system uses Mel frequency cepstral coefficients (MFCC) and Gammatone filters for feature extraction to build robust speaker-dependent and speaker-independent models. Accuracy rates are reported for the models in clean and noisy conditions. The system is implemented with a user interface to allow voice searches of text documents containing policy information stored on the local device. Evaluation of the system shows it can accurately recognize Marathi speech and retrieve relevant documents.
The document discusses using technology resources to increase access to legal services for clients with limited English proficiency (LEP). It provides an overview of relevant laws and considerations for using websites, hotlines, videos and other technologies to deliver language access services and information to LEP communities. Examples highlighted include the LawHelp.org website and its LiveHelp chat feature, as well as technology grants from the Legal Services Corporation that funded projects like kiosks and statewide legal websites.
Demystifying Natural Language Processing: A Beginner’s Guidecyberprosocial
In today’s digital age, the realm of technology constantly pushes boundaries, paving the way for revolutionary advancements. Among these breakthroughs, one particularly fascinating field gaining momentum is Natural Language Processing (NLP). It refers to the ability of computers to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant. This article aims to shed light on the intricacies of NLP, its applications, and its significance in various sectors.
1 CELI – Language and Information Gennaio 2014
2 We develop software solutions based on (NLP) Natural Language Processing
3 CELI’s offices, Countries in which we operate, Years of experience, People, Active customers, Business lines
4 Partners in Academia, Research projects, Published scientific papers
Close relationship with scientific community
5 From 1999 to 2013
6 Clients: semantic solutions, Speech Technology, Blogmeter
7 NLP solutions
8 NLP technology: Comprehensive suite of multilingual components and resource
9 Linguistic processing and annotation
10 From text to Knowledge
11 Meaningful intelligence from unstructured information
12 Speech technology: Comprehensive suite of multilingual components and resources for text processing in Voice application (Text To Speech)
13 Contribution to TTS development:Consulting and technologies
14 Semantic solutions
15 Semantic Search: Enterprise Semantic Search solution for document system and knowledge management systems
16 Linked Data for Semantic Search: Creation-ReUse of multilingual ontologies,Linking to LOD resources,Deploying LOD
17 Linked (Open) Data for Enterprise Search
18 Semantic Search Platform
19 Customer Voice Analytics: Automatic classification of customer surveys (answers to open questions) and verbatim (customer cases or call transcriptios)
20-21 Multilingual management of verbatim coding
22 Product lines (Blogmeter, Crosslibrary)
23 Social Media Monitoring, Analytics & Management Tools per Aziende & Agenzie.
24 Blogmeter: Leader in Italia nella social media intelligence,Tecnologie d’avanguardia per la social intelligence
25 Digital Humanities e Scuola Digitale
26 Leggere i classici usando il digitale
27 I Promessi sposi e Pinocchio
28 Grazie per l’attenzione!
29 Vittorio Di Tomaso ditomaso@celi.it
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
This document summarizes a research paper that proposes using a tree-based pipeline optimization tool (TPOT) to improve sentiment classification of dialectal Arabic texts. The paper provides background on sentiment analysis and challenges in analyzing informal Arabic texts. It then discusses related work applying TPOT and AutoML techniques to optimize machine learning for various tasks. The proposed approach uses TPOT for sentiment analysis of three Arabic dialect datasets to automatically optimize hyperparameters and improve over similar prior work.
J of Consumer Behaviour - 2023 - Sattarapu - Tomeito or Tomahto Exploring co...ThiruVenkat6
This study examines how consumers with different English accents engage with and react to artificially intelligent interactive voice assistants (AIIVAs). The researchers conducted qualitative interviews and quantitative modeling to understand consumer behavior. The findings suggest consumers have positive, negative, and rational reactions to accent issues when interacting with AIIVAs. This provides insights for technology developers to better understand diverse accents when training speech recognition algorithms.
A Chatbot For Medical Purpose Using Deep LearningMartha Brown
This document describes a medical chatbot created using deep learning techniques. The chatbot is designed to help rural and low-income patients in India access healthcare by understanding their symptoms and providing self-care advice or recommending medications. The chatbot uses natural language processing and deep learning models like convolutional neural networks to understand user inputs and generate appropriate responses. It was created to address issues with healthcare access and affordability in India.
An Intelligent Career Counselling Bot A System for CounsellingIRJET Journal
This document describes the development of an intelligent career counseling chatbot. The chatbot uses natural language processing and artificial intelligence algorithms to analyze users' career-related questions and respond with relevant answers from its knowledge base. It allows users to ask open-ended career questions without a specific format. The chatbot's responses aim to simulate a conversation with a real career counselor. It helps users choose careers that match their interests and capabilities. The chatbot's processing involves matching user inputs to patterns in its knowledge base to determine an appropriate response. It has the potential to help many users receive career advice without requiring an in-person counselor.
IRJET - A Robust Sign Language and Hand Gesture Recognition System using Conv...IRJET Journal
This document presents a robust sign language and hand gesture recognition system using convolutional neural networks. The system captures image frames and processes them through various neural network layers to classify hand gestures into letters, numbers, or other symbols. It segments the hand from images using color thresholds and edge detection. The images then undergo preprocessing like resizing before being classified by the CNN. The CNN is trained on a large dataset to accurately recognize gestures in sign language and provide text output to bridge communication between deaf and non-signing individuals. The system achieved good results classifying several alphabet letters but could be expanded to recognize word combinations.
Improving performance of english hindi cross language information retrieval u...ijnlc
The main issue in Cross Language Information Retrieval (CLIR) is the poor performance of retrieval in
terms of average precision when compared to monolingual retrieval performance. The main reasons
behind poor performance of CLIR are mismatching of query terms, lexical ambiguity and un-translated
query terms. The existing problems of CLIR are needed to be addressed in order to increase the
performance of the CLIR system. In this paper, we are putting our effort to solve the given problem by
proposed an algorithm for improving the performance of English-Hindi CLIR system. We used all possible
combination of Hindi translated query using transliteration of English query terms and choosing the best
query among them for retrieval of documents. The experiment is performed on FIRE 2010 (Forum of
Information Retrieval Evaluation) datasets. The experimental result show that the proposed approach gives
better performance of English-Hindi CLIR system and also helps in overcoming existing problems and
outperforms the existing English-Hindi CLIR system in terms of average precision.
A Survey on Speech Recognition with Language Specificationijtsrd
As a cross disciplinary, speech recognition is entirely based on the speech as the survey object. Speech recognition allows the machine to convert the speech signal into text or commands via the process of identification and understanding. Speech recognition involves in various fields of physiology, psychology, linguistics, computer science and signal processing, and is even related to the person’s body language, and its goal is to achieve natural language communication between man and machine. The speech recognition technology is gradually becoming the key technology of the IT man machine interface. This paper describes the development of speech recognition technology and its basic principles, methods, reviewed the classification of speech recognition systems, speech recognition approaches and voice recognition technology, analyzed the problems faced by the speech recognition. Dr. Preeti Savant | Lakshmi Sandhya H "A Survey on Speech Recognition with Language Specification" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-3 , April 2022, URL: https://www.ijtsrd.com/papers/ijtsrd49370.pdf Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/49370/a-survey-on-speech-recognition-with-language-specification/dr-preeti-savant
Look who's talking - Christopher Wilson (University of Oslo)mysociety
This was presented by Christopher Wilson from the University of Oslo at the Impacts of Civic Technology Conference (TICTeC 2017) in Florence on 25th April. You can find out more information about the conference here: http://tictec.mysociety.org
The Power of Natural Language Processing (NLP) | Enterprise WiredEnterprise Wired
This comprehensive guide delves into the intricacies of Natural Language Processing, exploring its foundational concepts, applications across diverse industries, challenges, and the cutting-edge advancements shaping the future of this dynamic field.
Speech Recognition Application for the Speech Impaired using the Android-base...TELKOMNIKA JOURNAL
Those who are speech impaired (tunawicara in the Indonesian language) suffer from
abnormalities in their delivery (articulation) of the language as well their voice in normal speech, resulting
in difficulty in communicating verbally within their environment. Therefore, an application is required that
can help and facilitate conversations for communication. In this research, the authors have developed a
speech recognition application that can recognise speech of the speech impaired, and can translate into
text form with input in the form of sound detected on a smartphone. By using the Google Cloud Speech
Application Programming Interface (API), this allows converting audio to text, and it is also user-friendly to
use such APIs. The Google Cloud Speech API integrates with Google Cloud Storage for data storage.
Although research into speech recognition to text has been widely practiced, this research try to develop
speech recognition, specially for speech impaired's speech, as well as perform a likelihood calculation to
see the factor of tone, pronunciation, and speech speed in speech recognition. The test was conducted by
mentioning the digits 1 through 10. The experimental results showed that the recognition rate for the
speech impaired is about 80%, while the recognition rate for normal speech is 100%.
EVALUATION OF CHATBOT TECHNOLOGY: THE CASE OF GREECEkevig
In recent years, the field of Artificial Intelligence has made significant strides, particularly in advancing
chatbots through Natural Language Processing (NLP) technology. Recently, however, there has been
intense competition in this industry, with major tech companies consistently introducing new and improved
solutions. Nevertheless, the Greek context introduces several unique challenges and obstacles to the
adoption of modern solutions, owing to the distinctiveness and relative rarity of the Greek language, as
well as the limited financial resources of the Greek economy. The objective of this study is to assess the
performance of chatbots in terms of the quality of their responses, including relevance, naturalness,
coherence, accuracy, and vocabulary, and to gauge user experience and satisfaction. An additional
objective is to acquire a comprehensive comparative overview of chatbot performance in Greece, both on a
per-question basis and when comparing related questions. The chosen method for evaluation is a guided
interview employing closed-type questions. The intention is to gather structured and quantified data in an
area where the typical internet user may not possess extensive familiarity or prior experience with such
evaluations. Conclusions were drawn for each question, enabling a focused and comparative assessment of
solution quality, identification of potential trends, and verification of response consistency.
Evaluation of Chatbot Technology: The Case of Greecekevig
In recent years, the field of Artificial Intelligence has made significant strides, particularly in advancing chatbots through Natural Language Processing (NLP) technology. Recently, however, there has been intense competition in this industry, with major tech companies consistently introducing new and improved solutions. Nevertheless, the Greek context introduces several unique challenges and obstacles to the adoption of modern solutions, owing to the distinctiveness and relative rarity of the Greek language, as well as the limited financial resources of the Greek economy. The objective of this study is to assess the performance of chatbots in terms of the quality of their responses, including relevance, naturalness, coherence, accuracy, and vocabulary, and to gauge user experience and satisfaction. An additional objective is to acquire a comprehensive comparative overview of chatbot performance in Greece, both on a per-question basis and when comparing related questions. The chosen method for evaluation is a guided interview employing closed-type questions. The intention is to gather structured and quantified data in an area where the typical internet user may not possess extensive familiarity or prior experience with such evaluations. Conclusions were drawn for each question, enabling a focused and comparative assessment of solution quality, identification of potential trends, and verification of response consistency.
Computational linguistics is an interdisciplinary field between linguistics and computer science that deals with computational modeling of human language. It has both theoretical and applied components. Theoretical CL develops formal models of linguistic knowledge and implements them as computer programs to better understand language faculties. Applied CL focuses on practical applications like natural language interfaces and machine translation to improve human-computer interaction. Computational linguistics combines ambitious goals like full language understanding with realistic current applications.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Introduction:
In the ever-evolving landscape of artificial intelligence, speech recognition datasets have become a cornerstone for technological advancements. The meticulous curation and expansion of these datasets play a pivotal role in propelling innovations in speech recognition, virtual assistants, and various AI applications.
Understanding Speech Recognition Datasets:
Speech recognition datasets involve the systematic collection of spoken words and phrases to facilitate the training and improvement of AI models. These datasets go beyond mere transcription, capturing the nuances of human speech and enabling machines to interpret and understand language with increasing accuracy.
Applications of Speech Recognition Datasets:
Enhancing Speech Recognition Technology:
Delve into how speech recognition datasets contribute to the refinement of speech-to-text technology, enabling applications in transcription services, voice-activated commands, and more.
Empowering Virtual Assistants:
Explore the role of speech recognition datasets in training virtual assistants like Siri, Google Assistant, and others, making them more adept at understanding and responding to user queries.
Fostering Multilingual Capabilities:
Discuss how diverse speech datasets contribute to the development of multilingual speech recognition systems, breaking language barriers and fostering inclusivity.
Improving Accessibility Services:
Highlight the impact of speech recognition datasets on the development of accessibility features, such as voice-controlled interfaces for individuals with disabilities.
Challenges in Speech Recognition Dataset Creation:
Despite their significance, the creation of speech recognition datasets is not without challenges. Variations in accents, background noise, and the need for comprehensive datasets present hurdles that must be overcome to ensure the efficacy of AI models.
Recent Advancements and Innovations:
Explore recent advancements in speech recognition dataset creation techniques, including the use of deep learning, transfer learning, and crowdsourcing to enhance dataset quality and diversity.
Future Directions:
Discuss the potential future developments in speech recognition datasets, considering ongoing efforts to address challenges and the emergence of new technologies. Consider the role of privacy-conscious approaches and ethical considerations in dataset creation.
Conclusion:
In conclusion, the evolution of speech recognition datasets is instrumental in shaping the trajectory of artificial intelligence. The continuous refinement and expansion of these datasets not only drive improvements in existing applications but also pave the way for novel uses in the ever-expanding realm of AI technology. As we navigate the future, the role of speech recognition datasets remains pivotal in realizing the full potential of voice-driven interactions and intelligent systems.
The document discusses the global text-to-speech market, which converts written text to synthesized speech. It is transforming industries by enhancing accessibility for those with disabilities and bridging the gap between written and auditory content. Major players in the TTS market provide solutions that offer enhanced accessibility, multilingual support, and integration across customer service, education, and other industries. Emerging trends include voice cloning, emotional speech, edge computing, and AI-powered platforms.
Conversational AI Agents have become mainstream today due to significant advancements in the methods required to build accurate models, such as machine learning and deep learning, and, secondly, because they are seen as a natural fit in a wide range of domains, such as healthcare, e-commerce, customer service, tourism, and education, that rely heavily on natural language conversations in day-to-day operations. This rapid increase in demand has been matched by an equally rapid rate of research and development, with new products being introduced on a daily basis.
Learn More:https://bit.ly/3tBkT81
Contact Us:
Website: https://www.phdassistance.com/
UK: +44 7537144372
India No:+91-9176966446
Email: info@phdassistance.com
This document summarizes a research paper on developing a Marathi speech recognition system to enable users to search for information on government policies and schemes in India. It describes the challenges of speech recognition including variability between speakers. The proposed system uses Mel frequency cepstral coefficients (MFCC) and Gammatone filters for feature extraction to build robust speaker-dependent and speaker-independent models. Accuracy rates are reported for the models in clean and noisy conditions. The system is implemented with a user interface to allow voice searches of text documents containing policy information stored on the local device. Evaluation of the system shows it can accurately recognize Marathi speech and retrieve relevant documents.
The document discusses using technology resources to increase access to legal services for clients with limited English proficiency (LEP). It provides an overview of relevant laws and considerations for using websites, hotlines, videos and other technologies to deliver language access services and information to LEP communities. Examples highlighted include the LawHelp.org website and its LiveHelp chat feature, as well as technology grants from the Legal Services Corporation that funded projects like kiosks and statewide legal websites.
Demystifying Natural Language Processing: A Beginner’s Guidecyberprosocial
In today’s digital age, the realm of technology constantly pushes boundaries, paving the way for revolutionary advancements. Among these breakthroughs, one particularly fascinating field gaining momentum is Natural Language Processing (NLP). It refers to the ability of computers to understand, interpret, and generate human language in a way that is both meaningful and contextually relevant. This article aims to shed light on the intricacies of NLP, its applications, and its significance in various sectors.
1 CELI – Language and Information Gennaio 2014
2 We develop software solutions based on (NLP) Natural Language Processing
3 CELI’s offices, Countries in which we operate, Years of experience, People, Active customers, Business lines
4 Partners in Academia, Research projects, Published scientific papers
Close relationship with scientific community
5 From 1999 to 2013
6 Clients: semantic solutions, Speech Technology, Blogmeter
7 NLP solutions
8 NLP technology: Comprehensive suite of multilingual components and resource
9 Linguistic processing and annotation
10 From text to Knowledge
11 Meaningful intelligence from unstructured information
12 Speech technology: Comprehensive suite of multilingual components and resources for text processing in Voice application (Text To Speech)
13 Contribution to TTS development:Consulting and technologies
14 Semantic solutions
15 Semantic Search: Enterprise Semantic Search solution for document system and knowledge management systems
16 Linked Data for Semantic Search: Creation-ReUse of multilingual ontologies,Linking to LOD resources,Deploying LOD
17 Linked (Open) Data for Enterprise Search
18 Semantic Search Platform
19 Customer Voice Analytics: Automatic classification of customer surveys (answers to open questions) and verbatim (customer cases or call transcriptios)
20-21 Multilingual management of verbatim coding
22 Product lines (Blogmeter, Crosslibrary)
23 Social Media Monitoring, Analytics & Management Tools per Aziende & Agenzie.
24 Blogmeter: Leader in Italia nella social media intelligence,Tecnologie d’avanguardia per la social intelligence
25 Digital Humanities e Scuola Digitale
26 Leggere i classici usando il digitale
27 I Promessi sposi e Pinocchio
28 Grazie per l’attenzione!
29 Vittorio Di Tomaso ditomaso@celi.it
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
This document summarizes a research paper that proposes using a tree-based pipeline optimization tool (TPOT) to improve sentiment classification of dialectal Arabic texts. The paper provides background on sentiment analysis and challenges in analyzing informal Arabic texts. It then discusses related work applying TPOT and AutoML techniques to optimize machine learning for various tasks. The proposed approach uses TPOT for sentiment analysis of three Arabic dialect datasets to automatically optimize hyperparameters and improve over similar prior work.
J of Consumer Behaviour - 2023 - Sattarapu - Tomeito or Tomahto Exploring co...ThiruVenkat6
This study examines how consumers with different English accents engage with and react to artificially intelligent interactive voice assistants (AIIVAs). The researchers conducted qualitative interviews and quantitative modeling to understand consumer behavior. The findings suggest consumers have positive, negative, and rational reactions to accent issues when interacting with AIIVAs. This provides insights for technology developers to better understand diverse accents when training speech recognition algorithms.
A Chatbot For Medical Purpose Using Deep LearningMartha Brown
This document describes a medical chatbot created using deep learning techniques. The chatbot is designed to help rural and low-income patients in India access healthcare by understanding their symptoms and providing self-care advice or recommending medications. The chatbot uses natural language processing and deep learning models like convolutional neural networks to understand user inputs and generate appropriate responses. It was created to address issues with healthcare access and affordability in India.
An Intelligent Career Counselling Bot A System for CounsellingIRJET Journal
This document describes the development of an intelligent career counseling chatbot. The chatbot uses natural language processing and artificial intelligence algorithms to analyze users' career-related questions and respond with relevant answers from its knowledge base. It allows users to ask open-ended career questions without a specific format. The chatbot's responses aim to simulate a conversation with a real career counselor. It helps users choose careers that match their interests and capabilities. The chatbot's processing involves matching user inputs to patterns in its knowledge base to determine an appropriate response. It has the potential to help many users receive career advice without requiring an in-person counselor.
IRJET - A Robust Sign Language and Hand Gesture Recognition System using Conv...IRJET Journal
This document presents a robust sign language and hand gesture recognition system using convolutional neural networks. The system captures image frames and processes them through various neural network layers to classify hand gestures into letters, numbers, or other symbols. It segments the hand from images using color thresholds and edge detection. The images then undergo preprocessing like resizing before being classified by the CNN. The CNN is trained on a large dataset to accurately recognize gestures in sign language and provide text output to bridge communication between deaf and non-signing individuals. The system achieved good results classifying several alphabet letters but could be expanded to recognize word combinations.
Improving performance of english hindi cross language information retrieval u...ijnlc
The main issue in Cross Language Information Retrieval (CLIR) is the poor performance of retrieval in
terms of average precision when compared to monolingual retrieval performance. The main reasons
behind poor performance of CLIR are mismatching of query terms, lexical ambiguity and un-translated
query terms. The existing problems of CLIR are needed to be addressed in order to increase the
performance of the CLIR system. In this paper, we are putting our effort to solve the given problem by
proposed an algorithm for improving the performance of English-Hindi CLIR system. We used all possible
combination of Hindi translated query using transliteration of English query terms and choosing the best
query among them for retrieval of documents. The experiment is performed on FIRE 2010 (Forum of
Information Retrieval Evaluation) datasets. The experimental result show that the proposed approach gives
better performance of English-Hindi CLIR system and also helps in overcoming existing problems and
outperforms the existing English-Hindi CLIR system in terms of average precision.
A Survey on Speech Recognition with Language Specificationijtsrd
As a cross disciplinary, speech recognition is entirely based on the speech as the survey object. Speech recognition allows the machine to convert the speech signal into text or commands via the process of identification and understanding. Speech recognition involves in various fields of physiology, psychology, linguistics, computer science and signal processing, and is even related to the person’s body language, and its goal is to achieve natural language communication between man and machine. The speech recognition technology is gradually becoming the key technology of the IT man machine interface. This paper describes the development of speech recognition technology and its basic principles, methods, reviewed the classification of speech recognition systems, speech recognition approaches and voice recognition technology, analyzed the problems faced by the speech recognition. Dr. Preeti Savant | Lakshmi Sandhya H "A Survey on Speech Recognition with Language Specification" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-6 | Issue-3 , April 2022, URL: https://www.ijtsrd.com/papers/ijtsrd49370.pdf Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/49370/a-survey-on-speech-recognition-with-language-specification/dr-preeti-savant
Look who's talking - Christopher Wilson (University of Oslo)mysociety
This was presented by Christopher Wilson from the University of Oslo at the Impacts of Civic Technology Conference (TICTeC 2017) in Florence on 25th April. You can find out more information about the conference here: http://tictec.mysociety.org
The Power of Natural Language Processing (NLP) | Enterprise WiredEnterprise Wired
This comprehensive guide delves into the intricacies of Natural Language Processing, exploring its foundational concepts, applications across diverse industries, challenges, and the cutting-edge advancements shaping the future of this dynamic field.
Speech Recognition Application for the Speech Impaired using the Android-base...TELKOMNIKA JOURNAL
Those who are speech impaired (tunawicara in the Indonesian language) suffer from
abnormalities in their delivery (articulation) of the language as well their voice in normal speech, resulting
in difficulty in communicating verbally within their environment. Therefore, an application is required that
can help and facilitate conversations for communication. In this research, the authors have developed a
speech recognition application that can recognise speech of the speech impaired, and can translate into
text form with input in the form of sound detected on a smartphone. By using the Google Cloud Speech
Application Programming Interface (API), this allows converting audio to text, and it is also user-friendly to
use such APIs. The Google Cloud Speech API integrates with Google Cloud Storage for data storage.
Although research into speech recognition to text has been widely practiced, this research try to develop
speech recognition, specially for speech impaired's speech, as well as perform a likelihood calculation to
see the factor of tone, pronunciation, and speech speed in speech recognition. The test was conducted by
mentioning the digits 1 through 10. The experimental results showed that the recognition rate for the
speech impaired is about 80%, while the recognition rate for normal speech is 100%.
EVALUATION OF CHATBOT TECHNOLOGY: THE CASE OF GREECEkevig
In recent years, the field of Artificial Intelligence has made significant strides, particularly in advancing
chatbots through Natural Language Processing (NLP) technology. Recently, however, there has been
intense competition in this industry, with major tech companies consistently introducing new and improved
solutions. Nevertheless, the Greek context introduces several unique challenges and obstacles to the
adoption of modern solutions, owing to the distinctiveness and relative rarity of the Greek language, as
well as the limited financial resources of the Greek economy. The objective of this study is to assess the
performance of chatbots in terms of the quality of their responses, including relevance, naturalness,
coherence, accuracy, and vocabulary, and to gauge user experience and satisfaction. An additional
objective is to acquire a comprehensive comparative overview of chatbot performance in Greece, both on a
per-question basis and when comparing related questions. The chosen method for evaluation is a guided
interview employing closed-type questions. The intention is to gather structured and quantified data in an
area where the typical internet user may not possess extensive familiarity or prior experience with such
evaluations. Conclusions were drawn for each question, enabling a focused and comparative assessment of
solution quality, identification of potential trends, and verification of response consistency.
Evaluation of Chatbot Technology: The Case of Greecekevig
In recent years, the field of Artificial Intelligence has made significant strides, particularly in advancing chatbots through Natural Language Processing (NLP) technology. Recently, however, there has been intense competition in this industry, with major tech companies consistently introducing new and improved solutions. Nevertheless, the Greek context introduces several unique challenges and obstacles to the adoption of modern solutions, owing to the distinctiveness and relative rarity of the Greek language, as well as the limited financial resources of the Greek economy. The objective of this study is to assess the performance of chatbots in terms of the quality of their responses, including relevance, naturalness, coherence, accuracy, and vocabulary, and to gauge user experience and satisfaction. An additional objective is to acquire a comprehensive comparative overview of chatbot performance in Greece, both on a per-question basis and when comparing related questions. The chosen method for evaluation is a guided interview employing closed-type questions. The intention is to gather structured and quantified data in an area where the typical internet user may not possess extensive familiarity or prior experience with such evaluations. Conclusions were drawn for each question, enabling a focused and comparative assessment of solution quality, identification of potential trends, and verification of response consistency.
Computational linguistics is an interdisciplinary field between linguistics and computer science that deals with computational modeling of human language. It has both theoretical and applied components. Theoretical CL develops formal models of linguistic knowledge and implements them as computer programs to better understand language faculties. Applied CL focuses on practical applications like natural language interfaces and machine translation to improve human-computer interaction. Computational linguistics combines ambitious goals like full language understanding with realistic current applications.
Similar to Lec-Introduction-Natural Language processing (20)
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Lec-Introduction-Natural Language processing
1. Asif Ekbal
Dept. of Computer Science and Engineering
IIT Patna, Patna, India
Email: asif@iitp.ac.in, asif.ekbal@gmail.com
Introduction to Natural Language
Processing
2. Morphology
POS tagging
Chunking
Parsing
Semantics
Discourse and Coreference
Increased
Complexity
Of
Processing
Algorithm
Problem
Language
Hindi
Marathi
English
French
Morph
Analysis
Part of Speech
Tagging
Parsing
Semantics
HMM
MEMM
NLP
Trinity
2
CRF
RNN
3. Multilinguality: Indian situation
Major streams
Indo European
Dravidian
SinoTibetan
Austro-Asiatic
Some languages are ranked
within 20 in in the world in
terms of the populations
speaking them
Hindi : 4th (~350 milion)
Bangla: 5h (~230 million)
Marathi 10th (~84 million)
4. Language Technology or Natural Language
Processing: Background & Relevance in
Indian Scenario
5. Background: Indian Context
India is a multi-lingual country with great linguistic and cultural
diversities
22 official languages mentioned in the Indian constitution
However, Census of India in 2001 reported-
122 major languages
1,599 other regional languages
2,371 scripts
30 languages are spoken by more than one million native
speakers
122 are spoken by more than 10,000 people
20% understand English
80% cannot understand
6. Background
Phenomenal growth in the number of internet users, social media
(Facebook,Twitter etc.)
Increasing tendency of using Indian language contents for
exchanging information
Digital divide cannot be tackled unless citizens are given
flexibility in communicating in their own languages
Language Technology or Natural Language Processing (NLP)
that deals with developing theories and techniques for effective
communication in human languages play an important role
towards creating this digital society
7. TDIL: MeiTY, Govt. of India
Technology Development for Indian Languages (TDIL)
Programme
Objective:
developing Information ProcessingTools andTechniques to facilitate
human-machine interaction without language barrier;
creating and accessing multilingual knowledge resources; and
integrating them to develop innovative user products and services
8. TDIL: Some major initiatives
Development of English to Indian Language Machine Translation
(Anuvadaksh):
English to Hindi/Marathi/Bangla/Oriya/Tamil/Urdu/Gujrati/Bodo
Development of English to Indian Language Machine Translation
System with Angla-Bharti Technology: English to
Bangla/Punjabi/Malaylam/Urdu/Hindi/Telugu
Development of Indian Language to Indian Language Machine
Translation System (Sampark)- 18 pairs of languages
-Hindi to Bengali, Bengali to Hindi, Marathi to Hindi, Hindi to Marathi, Hindi to
Punjabi, Punjabi to Hindi, Hindi to Tamil, Tamil to Hindi, Hindi to Kannada, Kannada
to Hindi, Hindi to Telugu, Telugu to Hindi, Hindi to Urdu, Urdu-Hindi, Malaylam to
Tamil,Tamil to Malaylam,Tamil toTelugu,Telugu toTamil
9. TDIL: Some major initiatives
Development of Cross-Lingual Information Access (CLIA)
Assamese, Bengali, Hindi, Oriya, Punjabi,Tamil,Telugu, Marathi
Development of Robust Document Analysis & Recognition System
for Indian Languages (OCR)-14 languages
Assamese, Bengali, Devanagri, Gujrati, Gurumukhi, Kannada,
Malaylam, Manipuri, Marathi, Oriya,Tamil,Telugu,Tibetan, Urdu
Development ofText to Speech System in Indian Languages
Development of Automatic Speech Recognition System in Indian
Languages
Development of Hindi to English MachineTranslation in Judicial Domain
12. Govt. Portal: MyGov.in
Citizen-centric platform empowers people to connect with
the Government & contribute towards good governance
Unique first of its kind participatory governance initiative
involving the common citizen at large
Idea is to bring the government closer to the common man by
the use of online platform creating an interface for healthy
exchange of ideas and views involving the common citizen
and experts
Ultimate goal is to contribute to the social and economic
transformation of India
Was launched on July 26, 2014 by the Hon’ble PM
13. Govt. Portal: MyGov.in
This has been more than successful in keeping the citizens
engaged on important policy issues and governance, be it
Clean Ganga, Girl Child Education, Skill Development
and Healthy India to name a few
Has become a key part of the policy and decision making
process of the country
Platform has been able
to provide the citizens a voice in the governance process of the country
and
create grounds for the citizens to become stakeholders not only in policy
formulation and recommendation but also implementation through
actionable tasks
14. Govt. Portal: MyGov.in
Major attributes: Discussion,Tasks,Talks, Polls and Blogs on
various groups based on the diverse governance and public policy
issues
Has more than 1.78 Million users who contribute their
ideas through discussions and also participate through the
various earmarked tasks
Platform gets more than 10,000 posts per weeks on various
issues
Feedbacks are analyzed and put together as suggestions for the
concerned departments which are responsible to transform them
into actionable agenda
15. Infeasible to mine the most relevant information from this
huge data
Needs a method for automated analysis of this data
Demands sophisticated NLP and ML techniques to
build these
16. Code-mixing
Code-mixing refers to the mixing of two or more languages or
language varieties in speech/text
Kolkata toVaranasi ka kya distance hai
16
Entity English
Hindi
17. Code-Mixing in MyGov.in: Few Examples
Sir ji aapka ye abhiyan acha ha isse naye bharat ka nirman hoga maine
apne school ke student ke sath milkar hospital ki safai ki and jagrukta
rali nikali jisse log gandagi kam failaye.
Aaj her school main swachta abhiyan honi chye we do it
india ko clean rakhne ke lie gandgi karne walo pe penalty lagani chahiye
jo kaam das sal me hoga penalty lagane ke bad wo kuch hi dino me ho
jaega
Modi sir swachh bharat m aapke bjp poltician photo click krawane k liye
safai krte h sathinye neta sirf pik click krte h bs.
Our School also participated in Clean India Campaign . The students of
class XII cleaned a Park and a Basket Ball area .
18. Why to Analyse?
Public opinions play important roles for the betterment of human
lives
Huge volumes and varieties of user-generated contents and user
interaction networks constitute new opportunities for
understanding social behavior
Understanding deep feeling of public can help government to
anticipate deep social changes and adapt to population
expectations
Discipline known as Opinion Mining or Sentiment Analysis
19. NLP: Projected Growth
Growing in an exponential manner
Expected to touch the market of $16 billion in 2021
With compound growth rate of 16% annually
Reasons behind this growth
Rising of the Chatbots
Urge of discovering the customer insights
Transfer of technology of messaging from manual to automated
Translation of contents, and
many other tasks which are required to be automated and involve
language/Speech at some point
Etc.
Major Industries:Amazon,Google,Microsoft,Facebook,IBM etc.
20. NLP: Evolution
Evolving from human-computer interaction to human-
computer conversation
The first critical part of NLP Advancements – Biometrics
The second critical part of NLP advancements–Humanoid
Robotics
21. NLP: In Governance
NLP techniques for the delivery to the common people and to
decrease the interaction gap between the citizen and the
Government
Uses of NLP in GovernmentWebsites
Making e-governance related information to be available in multiple
languages
Natural Language Generation in e-Governance
Chatbot
E.g. farmer can not read or write, but with the multilingual support
and NLP generation, s/he can communicate the query in any language
and get it resolved
22. NLP: In Business, Healthcare
Sentiment Analysis: Analyzing public opinion
Email Filters: Filtering out irrelevant emails
Voice Recognition: Developing smart voice-driven services
Information Extraction
NLP in Healthcare
main concern and priority in nowadays the healthcare system is to provide
better and 24/7 EHR experience
Voice-support systems, Predictive systems, Prescriptive analytics)
NLP in Healthcare
can be used to reduce the communication and interaction gap between
Healthcare technologies (such as patient portals which contain health records of
a patient) and patients
Patients can interact in his/her own language
Easier for a patient to understand health status
23. NLP: In Healthcare
Increasing the dimension of high quality of care
Healthcare reports generally contain parameters which require proper
attention
Use of NLP can provide significant relief in case of calculating the
measure of inpatient care and monitoring the clinical guidelines
Identification of the patients which require Improved Care
Coordination
Automated detection of cancer, detection of the root causes related to
any substance disorder are some of the examples
24. NLP: In Finance
Credit Scoring Method
Estimate risk factor of giving loan with the past histories
E.g. Lenddo EFL (with 115 employees), a Singapore-based company
developed a software called Lenddo Score which uses machine
learning and NLP to assess and calculate an individual’s
creditworthiness.
Document search
Nuance Communications based in Massachusetts developed software
known as Nuance Document Finance Solution, which is used to aid
financial services companies in automatizing the documentation
process
Fraud detection in banking
Stock market prediction- based on sentiment
25. NLP: In Other domains
National Security
Sentiment in Cross-border languages
Hate Speech, Radicalization
NLP in Recruitment
searching the appropriate applications from the data, and it also can
be used for selecting the best applications from the data available
26. Natural Language Processing (NLP)
26
NLP is the branch of computer science focused on developing
systems that allow computers to communicate with people using
everyday language
Related to Computational Linguistics
Also concerns how computational methods can aid the
understanding of human language
27. Perspectives of NLP: Areas of AI and their
inter-dependencies
Search
Vision
Planning
Machine
Learning
Knowledge
Representation
Logic
Expert
Systems
Robotics
NLP
28. Evaluation Challenges
28
Message Understanding Conference (MUC): Information Extraction
(http://www.cs.nyu.edu/cs/faculty/grishman/muc6.html)
Text Retrieval Conference (TREC): Information Retrieval
(http://trec.nist.gov/)
Document Understanding Conference (DUC): Summarization
(http://duc.nist.gov/duc2003/call.html)
Automatic Content Extraction (ACE): Information Extraction
(http://www.itl.nist.gov./iad/894.01/tests/ace/2004/)
Evaluation exercises on Semantic Evaluation (SemEval): WSD, Coreferences
etc. (http://en.wikipedia.org/wiki/SemEval)
Cross Language Evaluation Forum (CLEF): Cross-lingual Information retrieval
(http://www.clef-initiative.eu//)
Recognising Textual Entailment Challenge (RTE): Textual entailment
(http://www.pascal-network.org/Challenges/RTE/ )
29. Evaluation Challenges
29
Morpho Challenge: unsupervised segmentation of words into morphemes
(http://www.cis.hut.fi/morphochallenge2005/)
Web People Search Evaluation Challenges (WePS): Information Extraction
(http://nlp.uned.es/weps/weps-2/)
CoNLL challenges: Chunking, Named Entity extraction etc.
(http://www.cnts.ua.ac.be/conll/)
Text Analysis Conference (TAC): Entailment etc.
(http://pascallin.ecs.soton.ac.uk/Challenges/RTE/)
BioCreative challenges: Biomedical text mining (http://biocreative.sourceforge.net/)
Biomedical information extraction challenges
JNLPBA (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/ERtask/report.html)
BioNLP 2009 (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/)
BioNLP 2011 (http://2011.bionlp-st.org/)
BioNLP 2013 , 2014, 2015, 2016 etc.
SemEval: Sentiment, Emotion, Question-Answering etc.
30. Allied Disciplines
Philosophy Semantics, Meaning of “meaning”, Logic
(syllogism)
Linguistics Study of Syntax, Lexicon, Lexical Semantics etc.
Probability and Statistics Corpus Linguistics, Testing of Hypotheses,
System Evaluation
Cognitive Science Computational Models of Language Processing,
Language Acquisition
Psychology Behavioristic insights into Language Processing,
Psychological Models
Brain Science Language Processing Areas in Brain
Physics Information Theory, Entropy, Random Fields
Computer Sc. & Engg. Systems for NLP
32. What is NLP?
Branch of AI
2 Goals
Science Goal: Understand the way language operates
Engineering Goal: Build systems that analyse and generate language;
reduce the man-machine gap
33. Two Views of NLP
33
1. Classical View
2. Statistical/Machine Learning View
34. The famous Turing Test: Language based Interaction
(Computing Machinery and Intelligence:1950)
Machine
Human
Test conductor
Can the test conductor find out which is the machine and which the human
35. Natural Languages vs. Computer Languages
35
Ambiguity is the primary difference between natural and
computer languages
Formal programming languages are designed to be
unambiguous, i.e. they can be defined by a grammar that
produces a unique parse (in general) for each sentence in the
language
Programming languages are also designed for efficient
(deterministic) parsing, i.e. they are deterministic context-
free languages (DCFLs)
A sentence in a DCFL can be parsed in O(n) time where n is the
length of the string
36. NLP architecture and stages of processing-
ambiguity at every stage
Phonetics and phonology
Morphology
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Pragmatics
Discourse
36
37. Phonetics
Processing of speech
Challenges
Homophones: bank (finance) vs. bank (river bank)
Near Homophones: maatraa vs. maatra (Hin)
Word Boundary
aajaayenge (aa jaayenge (will come) or aaj aayenge (will come today)
I got [ua]plate
Phrase boundary
PhD students are especially exhorted to attend as such seminars are integral to one's post-graduate
education
Disfluency: ah,um,ahem etc.
The best part of my job is … well … the best part of my job is the responsibility.
38. Word Segmentation
Breaking a string of characters (graphemes) into a sequence of
words
In some written languages (e.g. Chinese) words are not
separated by spaces
Even in English, characters other than white-space can be used
to separate words [e.g. , ; . - : ( ) ]
Examples from English URLs:
jumptheshark.com jump the shark .com
myspace.com/pluckerswingbar
myspace .com pluckers wing bar
myspace .com plucker swing bar
39. Morphological Analysis
Morphology is the field of linguistics that studies the internal structure
of words (Wikipedia)
A morpheme is the smallest linguistic unit that has semantic meaning
(Wikipedia)
e.g. “carry”, “pre”, “ed”, “ly”, “s”
Morphological analysis is the task of segmenting a word into its
morphemes:
carried carry + ed (past tense)
independently in + (depend + ent) + ly
Googlers (Google + er) + s (plural)
unlockable un + (lock + able) ?
(un + lock) + able ?
40. Morphology
Word formation rules from root words
Nouns: Plural (boy-boys);Gender marking (czar-czarina)
Verbs: Tense (stretch-stretched); Aspect (e.g. perfective sit-had sat);
Modality (e.g. request khaanaa khaaiie)
Crucial first step in NLP
Languages rich in morphology: e.g., Dravidian, Hungarian, Turkish,
Indian languages
Languages poor in morphology: Chinese, English
Languages with rich morphology have the advantage of easier
processing at higher stages of processing
A task of interest to computer science: Finite State Machines for
Word Morphology
41. Lexical Analysis
Essentially refers to dictionary access and obtaining the
properties of the word
e.g.dog
noun (lexical property)
take-’s’-in-plural (morph property)
animate (semantic property)
4-legged (-do-)
carnivore (-do)
Challenge: Lexical or word sense disambiguation
42. Lexical Disambiguation
First step: Part of Speech Disambiguation
Dog as a noun (animal)
Dog as a verb (to pursue or to go after)
Sense Disambiguation
Dog (as animal)
Dog (as a very detestable person)
Needs word relationships in a context
The chair emphasized the need for adult education
Very common in day to day communications
Satellite ChannelAd: Watch what you want,when you want (two senses of watch)
Watch: wrist watch/watching something
43. Technological developments bring in new terms,
additional meanings/nuances for existing terms
Justify as in justify the right margin (word processing context)
Xeroxed:a new verb
DigitalTrace:a new expression
Communifaking:pretending to talk on mobile when you are actually
not
Discomgooglation:anxiety/discomfort at not being able to access
internet
Helicopter Parenting: over parenting
44. Ambiguity of Multiwords
The grandfather kicked the bucket after suffering from cancer.
This job is a piece of cake
Put the sweater on
He is the dark horse of the match
Google Translations of above sentences:
दादा क
ैं सर से पीड़ित होने क
े बाद बाल्टी लात मारी.
इस काम क
े क
े क का एक टुक़िा है.
स्वेटर पर रखो.
वह मैच क
े अंधेरे घो़िा है.
44
45. Ambiguity of Named Entities
Bengali: চঞ্চল সরকার বাড়িতে আতে
English: Government is restless at home. (*)
Chanchal Sarkar is at home
Amsterdam airport:“Baby Changing Room”
Hindi: दैडनक दबंग दुडनया
English: Daily domineering world
Actually name of a Hindi newspaper in Indore
High degree of overlap between NEs and MWEs
Treat differently - transliterate do not translate
45
47. Part of Speech (PoS) Tagging
Annotate each word in a sentence with a PoS
Useful for subsequent syntactic parsing and word sense
disambiguation
I ate the spaghetti with meatballs.
Pro V Det N Prep N
John saw the saw and decided to take it to the table.
PN V Det N Con V Part V Pro Prep Det N
48. Phrase Chunking
Find all non-recursive noun phrases (NPs) and verb phrases (VPs)
in a sentence
[NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
[NP He ] [VP reckons ] [NP the current account deficit ] [VP will
narrow ] [PP to ] [NP only # 1.8 billion ] [PP in ] [NP September ]
50. Parsing Strategy
Driven by grammar
S-> NPVP
NP-> N | PRON
VP->V NP |V PP
N-> Mangoes
PRON-> I
V-> like
51. Challenges in Syntactic Processing:
Structural Ambiguity
Scope
1.The old men and women were taken to safe locations
(old men and women) vs. ((old men) and women)
2.No smoking areas will allow Hookas inside
Preposition Phrase Attachment
I saw the boy with a telescope
(who has the telescope?)
I saw the mountain with a telescope
(world knowledge: mountain cannot be an instrument of seeing)
I saw the boy with the pony-tail
(world knowledge: pony-tail cannot be an instrument of seeing)
Very ubiquitous: newspaper headline “20 years later,BMC pays father 20 lakhs
for causing son’s death”
52. Structural Ambiguity…
Overheard
I did not know my PDA had a phone for 3 months
An actual sentence in the newspaper
The camera man shot the man with the gun when he was near
Tendulkar
(Times of India, 26/2/08) Aid for kins of cops killed in terrorist
attacks
53. Headache for Parsing: Garden Path sentences
Garden Pathing: A garden path sentence is a grammatically
correct sentence that starts in such a way that the readers' most likely
interpretation will be incorrect
The horse raced past the garden fell The horse – (that was) raced past
the garden – fell
The old man the boatThe boat (is manned) by the old
Twin Bomb Strike in Baghdad kill 25 (Times of India 05/09/07) (Twin
Bomb Strike) in Baghdad kill 25
55. Semantic Analysis
Representation in terms of
Predicate calculus/Semantic Nets/Frames/Conceptual Dependencies and Scripts
John gave a book to Mary
Give: action,Agent: John, Object: Book, Recipient: Mary
Challenge: ambiguity in semantic role labeling
(Eng)Visiting aunts can be a nuisance
(Hin) aapko mujhe mithaai khilaanii padegii (ambiguous in Marathi and Bengali
too)
Aapnaake aamake misti khoaate hobe
56. Word Sense Disambiguation (WSD)
56
Words in natural language usually have a fair number of different
possible meanings
Ravi has a strong interest in computer science
Ravi pays a large amount of interest on his credit card
For many tasks (question answering, translation), the proper sense of
each ambiguous word in a sentence must be determined
57. Semantic Role Labeling (SRL)
57
For each clause, determine the semantic role played by each noun
phrase that is an argument to the verb
agent patient source destination instrument
John drove Mary fromAustin to Dallas in hisToyota
The hammer broke the window
Also referred to a “case role analysis,” “thematic analysis,” and
“shallow semantic parsing”
58. Textual Entailment
Determine whether one natural language sentence entails
(implies) another under an ordinary interpretation
59. Textual Entailment Problems:
from PASCAL Challenge
TEXT HYPOTHESIS
ENTAIL
MENT
Eyeing the huge market potential, currently
led by Google, Yahoo took over search
company Overture Services Inc last year.
Yahoo bought Overture. TRUE
Microsoft's rival Sun Microsystems Inc.
bought Star Office last month and plans to
boost its development as a Web-based
device running over the Net on personal
computers and Internet appliances.
Microsoft bought Star Office. FALSE
The National Institute for Psychobiology in
Israel was established in May 1971 as the
Israel Center for Psychobiology by Prof.
Joel.
Israel was established in May
1971.
FALSE
Since its formation in 1948, Israel fought
many wars with neighboring Arab
countries.
Israel was established in
1948.
TRUE
61. Pragmatics
Very hard problem
Model user intention
Tourist (in a hurry,checking out of the hotel,motioning to the service boy):Boy,
go upstairs and see if my sandals are under the divan.Do not be late.I just have
15 minutes to catch the train.
Boy (running upstairs and coming back panting):yes sir,they are there.
World knowledge
WHY INDIA NEEDS A SECOND OCTOBER? (ToI, 2/10/07)
62. Discourse
Processing of sequence of sentences
Mother to John:
John go to school. It is open today. Should you bunk? Father will be very angry.
Ambiguity of open
bunk what?
Why will the father be angry?
Complex chain of reasoning and application of world knowledge
Ambiguity of father
father as parent
or
father as headmaster
63. Anaphora Resolution/ Co-Reference
Determine which phrases in a document refer to the same
underlying entity
John put the carrot on the plate and ate it.
Bush started the war in Iraq. But the president needed the
consent of Congress.
Some cases require difficult reasoning.
Today was Jack's birthday. Penny and Janet went to the store.They were
going to get presents. Janet decided to get a kite. "Don't do that," said
Penny. "Jack has a kite. He will make you take it back."
65. 66
Information Extraction (IE)
Identify phrases in language that refer to specific types of
entities and relations in text
Named entity recognition is task of identifying names of
people, places, organizations, etc. in text
people organizations places
Michael Dell is the CEO of Dell Computer Corporation and
lives inAustinTexas.
Relation extraction identifies specific relations between
entities.
Michael Dell is the CEO of Dell Computer Corporation and
lives inAustinTexas.
66. Question Answering
Directly answer natural language questions based on information
presented in a corpora of textual documents (e.g. the web)
When was Barack Obama born? (factoid)
August 4, 1961
Who was president when Barack Obama was born?
John F. Kennedy
How many presidents have there been since Barack Obama was
born?
9
67. Text Summarization
Produce a short summary of a longer document or article
Article: With a split decision in the final two primaries and a flurry of superdelegate
endorsements, Sen. Barack Obama sealed the Democratic presidential nomination last
night after a grueling and history-making campaign against Sen. Hillary Rodham Clinton
that will make him the firstAfrican American to head a major-party ticket. Before a
chanting and cheering audience in St. Paul, Minn., the first-term senator from Illinois
savored what once seemed an unlikely outcome to the Democratic race with a nod to the
marathon that was ending and to what will be another hard-fought battle, against Sen.
John McCain, the presumptive Republican nominee….
Summary: Senator Barack Obama was declared the presumptive Democratic
presidential nominee.
68. Sentiment Analysis
69
Sentiment analysis
Extract subjective information usually from a set of documents, often
using online reviews to determine "polarity" about specific objects
especially useful for identifying trends of public opinion in the social
media, for the purpose of marketing
69. Machine Translation (MT)
Translate a sentence from one natural language to another.
Hasta la vista, bebé
Until we see each other again, baby.
70. Ambiguity Resolution is Required for Translation
71
Syntactic and semantic ambiguities must be properly resolved
for correct translation:
“John plays the guitar.” → “John toca la guitarra.”
“John plays soccer.” → “John juega el fútbol.”
An apocryphal story is that an early MT system gave the
following results when translating from English to Russian and
then back to English:
“The spirit is willing but the flesh is weak.” “The liquor is good
but the meat is spoiled.”
“Out of sight, out of mind.” “Invisible idiot.”
71. Resolving Ambiguity
72
Choosing the correct interpretation of linguistic utterances
requires knowledge of:
Syntax
An agent is typically the subject of the verb
Semantics
Michael and Ellen are names of people
Austin is the name of a city (and of a person)
Toyota is a car company and Prius is a brand of car
Pragmatics
World knowledge
Credit cards require users to pay financial interest
Agents must be animate and a hammer is not animate
72. Manual Knowledge Acquisition
73
Traditional, “rationalist” approaches to language processing
require human specialists to specify and formalize the
required knowledge
Manual knowledge engineering is difficult, time-consuming,
and error prone
“ Rules ” in language have numerous exceptions and
irregularities
“All grammars leak.”: Edward Sapir (1921)
Manually developed systems were expensive to develop and
their abilities were limited and “brittle” (not robust)
73. Automatic Learning Approach
74
Use machine learning methods to automatically acquire the
required knowledge from appropriately annotated text
corpora
Variously referred to as the “corpus based,” “statistical,”
or “empirical” approach
Statistical learning methods were first applied to speech
recognition in the late 1970’s and became the dominant
approach in the 1980’s
During the 1990 ’ s, the statistical training approach
expanded and came to dominate almost all areas of NLP
75. Early History: 1950’s
Shannon (the father of information theory) explored
probabilistic models of natural language (1951)
Chomsky (the extremely influential linguist) developed
formal models of syntax, i.e. finite state and context-free
grammars (1956)
First computational parser developed at U Penn as a cascade
of finite-state transducers (Joshi, 1961; Harris, 1962)
Bayesian methods developed for optical character recognition
(OCR) (Bledsoe & Browning, 1959).
76. History: 1960’s
Work at MIT AI lab on question answering (BASEBALL) and
dialog (ELIZA)
Semantic network models of language for question answering
(Simmons, 1965).
First electronic corpus collected, Brown corpus, 1 million
words (Kucera and Francis, 1967)
Bayesian methods used to identify document authorship (The
Federalist papers) (Mosteller &Wallace, 1964)
77. History: 1970’s
“Natural language understanding” systems developed that
tried to support deeper semantic interpretation
SHRDLU (Winograd, 1972) performs tasks in the “blocks world”
based on NL instruction
Schank et al. (1972, 1977) developed systems for conceptual
representation of language and for understanding short stories using
hand-coded knowledge of scripts, plans, and goals.
Prolog programming language developed to support logic-
based parsing (Colmeraurer, 1975).
Initial development of hidden Markov models (HMMs) for
statistical speech recognition (Baker, 1975; Jelinek, 1976).
78. History: 1980’s
Development of more complex (mildly context sensitive)
grammatical formalisms, e.g. unification grammar, tree-adjoning
grammar etc
Symbolic work on discourse processing and NL generation.
Initial use of statistical (HMM) methods for syntactic analysis
(POS tagging) (Church, 1988).
79. History: 1990’s
Rise of statistical methods and empirical evaluation causes a
“scientific revolution” in the field
Initial annotated corpora developed for training and testing
systems for POS tagging, parsing, WSD, information
extraction, MT, etc.
First statistical machine translation systems developed at
IBM for Canadian Hansards corpus (Brown et al., 1990)
First robust statistical parsers developed (Magerman, 1995;
Collins, 1996; Charniak, 1997)
First systems for robust information extraction developed
(e.g. MUC competitions)
80. History: 2000’s
Increased use of a variety of ML methods, SVMs, logistic
regression (i.e. max-ent), CRF’s, etc.
Continued developed of corpora and competitions on shared
data.
TREC Q/A
SENSEVAL/SEMEVAL
CONLL SharedTasks (NER, SRL…)
Increased emphasis on unsupervised, semi-supervised, and
active learning as alternatives to purely supervised learning.
Shifted focus to semantic tasks such asWSD and SRL.
81. History: 2000 onwards
82
Information extraction from social networks
Information retrieval
Cross-lingual information access
MachineTranslation (statistical, hybrid etc.)
Biomedical text mining
Discourse processing
82. Machine Learning
Machine learning: how to acquire a model on the basis of data /
experience?
Learning parameters (e.g. probabilities)
Learning structure (e.g. BN graphs)
Learning hidden concepts (e.g. clustering)
83. Machine Learning
Unsupervised Learning
No feedback from teacher; detect patterns
Reinforcement Learning
Feedback consists of rewards/punishment
Supervised Learning
Examples of correct answers are given
Discrete answers: Classification
Continuous answers: Regression
84. Supervised Machine Learning
(c)
(a) (b) (d)
x x x x
f(x) f(x) f(x) f(x)
Given a training set:
(x1, y1), (x2, y2), (x3, y3), … (xn, yn)
Where each yi was generated by an unknown y = f (x),
Discover a function h that approximates the true function f
85. Example: Spam Filter
Input: x = email
Output: y = “spam” or “ham”
Setup:
Get a large collection of example
emails, each labeled “spam” or
“ham”
Note: someone has to hand label all
this data!
Want to learn to predict labels of new,
future emails
Features: The attributes used to make
the ham / spam decision
Words: FREE!
Text Patterns: $dd, CAPS
Non-text: SenderInContacts
…
86. Example: Digit Recognition
Input: x = images (pixel grids)
Output: y = a digit 0-9
Setup:
Get a large collection of example images, each
labeled with a digit
Note: someone has to hand label all this data!
Want to learn to predict labels of new, future digit
images
Features: The attributes used to make the digit
decision
Pixels: (6,8)=ON
Shape Patterns: NumComponents,AspectRatio,
NumLoops
…
87. How to Learn
Data: labeled instances, e.g. emails marked spam/ham
Training set
Held out (validation) set
Test set
Features: attribute-value pairs which characterize each x
Experimentation cycle
Learn parameters (e.g. model probabilities) on training set
Tune hyperparameters on held-out set
Compute accuracy on test set
Very important: never “peek” at the test set!
Evaluation
Accuracy: fraction of instances predicted correctly
Overfitting and generalization
Want a classifier which does well on test data
Overfitting: fitting the training data very closely, but not
generalizing well to test data
89. More Text Classification Examples
Many search engine functionalities use classification
Assigning labels to documents or web-pages:
Labels are most often topics such asYahoo-categories
"finance," "sports," "news>world>asia>business"
Labels may be genres (or, categories)
"editorials" "movie-reviews" "news”
Labels may be opinion on a person/product
“like”,“hate”,“neutral”
Labels may be domain-specific
"interesting-to-me" :"not-interesting-to-me”
language identification:English,French,Chinese,…
search vertical:about Linux versus not
“link spam” :“not link spam”
91
90. Classification Methods: History
Manual classification
Used by the originalYahoo! Directory
Looksmart, about.com, ODP, PubMed
Very accurate when job is done by experts
Consistent when the problem size and team is small
Difficult and expensive to scale
Means we need automatic classification methods for big problems
92
91. Classification Methods: History
Automatic classification
Hand-coded rule-based systems
One technique used by Reuters, CIA, etc.
It’s what GoogleAlerts is doing
Widely deployed in government and enterprise
Companies provide “IDE” (integrated development environment) for writing
such rules
E.g., assign category if document contains a given boolean combination of words
Standing queries: Commercial systems have complex query languages (everything
in IR query languages +score accumulators)
Accuracy is often very high if a rule has been carefully refined over time by a
subject expert
Building and maintaining these rules is expensive
Rules could vary with the change of domain
93
92. Classification Methods: History
Supervised learning of a document-label assignment function
Many systems partly rely on machine learning (Autonomy, Microsoft,
Enkata,Yahoo!, Google News, …)
k-Nearest Neighbors (simple, powerful)
Naive Bayes (simple, common method)
Support-vector machines (new, more powerful)
… plus many other methods
Requirement: requires hand-classified training data
But data can be built up (and refined) by amateurs
Many commercial systems use a mixture of methods
94
93. NLP and ML: From Past to Present
NLP based systems have enabled wide-range of applications
Google’s powerful search engines, Google’s MT
Alexa etc.
Amazon Comprehend Medical services
CognitiveAnalytics and NLP, Spam detection, NLP in Recruitment
SentimentAnalysis, Hate Speech detection, Fake News detection
Shallow ML algorithms (corresponds to Statistical NLP)
Used extensively (HMM, MaxEnt, CRF, SVM, Logistic Regression
etc.)
Requires handcrafting of features
Time-consuming
Curse of dimensionality (because of joint modeling of language
models)
94. NLP and ML: From Past to Present
Deep Learning algorithms
No feature engineering
Success of distributed representations (Neural language models)
Some recent developments
The rise of distributed representations (e.g., Word2vec, GLOVE,
ELMO, BERT etc)
Convolutional, recurrent, recursive neural networks, Transformer,
Reinforcement learning
Unsupervised sentence representation learning
Combining deep learning models with memory-augmenting
strategies
Explainable AI
95. •Subfield of learning representations of data
•Exceptionally effective at learning patterns
•Deep learning algorithms attempt to learn (multiple levels of) representations
by using a hierarchy of multiple layers
•If you provide the system tons of information, it begins to understand it
and respond in useful ways
Deep Learning (DL)
https://www.xenonstack.com/blog/static/public/uploads/media/machine-learning-vs-deep-learning.png
96. o Manually designed features are often over-specified, incomplete and take a
long time to design and validate
o Learned Features are easy to adapt, fast to learn
o Deep learning provides a very flexible, (almost?) universal, learnable
framework for representing world, visual and linguistic information
o Can learn both unsupervised and supervised
o Effective end-to-end learning
o Utilize large amounts of training data
Why is DL useful?
In ~2010 DL started
outperforming other ML
techniques
first in speech and vision, then NLP
97. News: March 27, 2019
Yoshua Bengio, Geoffrey Hinton, and Yann LeCun
received the
Turing Award-2018 (equivalent to Nobel Prize of
Computing)
for Modern AI (specifically for deep learning research)
Bengio- University ofToronoto and Google
Hinton- University of Montreal
LeCun- Facebook’s chiefAI scientist and a professor at NYU
99. Books etc.
Main Text(s):
Natural Language Understanding: James Allan
Speech and NLP: Jurafsky and Martin
Foundations of Statistical NLP: Manning and Schutze
Other References:
NLP a Paninian Perspective: Bharati, Cahitanya and Sangal
Statistical NLP: Charniak
Journals
Computational Linguistics, Natural Language Engineering, AI, AI
Magazine, IEEE SMC
Conferences
ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT,
ICON, SIGIR, WWW, ICML, ECML