Natural Language Processing and Its Application
Natural Language Processing (NLP) is a field of research concerned with interpreting natural
language text or speech in order to build useful applications for the public. NLP researchers
aim to gather knowledge about how human beings understand and use language, so that suitable
tools and techniques can be developed to make computer systems understand and manipulate
natural languages to perform desired tasks. Jing et al. (2012, page 670) state that text
information is an important form of digital information and that text encryption is one of the
effective means of information security. Applications of Natural Language Processing span a
number of fields of study, including machine translation, natural language text processing,
user interfaces, artificial intelligence, and speech recognition, among many others. Natural
Language Processing is a field of research that explores how computers are used to understand
and manipulate data, to find linguistic structure, and to perform machine translation.
Computers understand and manipulate data through information extraction, which relies on named
entities and the relations between them. Names and lists are generally used to recognize or
identify an element, and Natural Language Processing tries to find relations between these names
or list items, which are called ‘named entities’. “The process of entity recognition refers to
the combined task of finding spans of text that constitute proper names and then classifying the
entities being referred to according to their type” (Shapiro 2015, page 902). For example,
relations of this kind can be seen at work in Google search auto-completion. When a user starts
typing the word ‘Health’, Google automatically suggests phrases like ‘Health Systems’ or
‘Health Systems nearby’. Conceptually, the search string (health) is matched against a list of
known continuations (like systems, nearby) to provide suggestions to the user.
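The two steps in the quoted definition, finding spans and then classifying them, can be
illustrated with a minimal Python sketch. It is not how any production system works: the
GAZETTEER lookup table, the capitalization rule, and the function name are all assumptions made
for illustration only.

```python
import re

# Hypothetical gazetteer: a tiny hand-made lookup of known names and their types.
GAZETTEER = {
    "Google": "ORGANIZATION",
    "Microsoft": "ORGANIZATION",
    "New York": "LOCATION",
    "Ada Lovelace": "PERSON",
}

# Assumed rule: a candidate span is a run of capitalized words separated by single spaces.
SPAN_PATTERN = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def recognize_entities(text):
    """Step 1: find candidate spans. Step 2: classify each span by gazetteer lookup."""
    entities = []
    for match in SPAN_PATTERN.finditer(text):
        span = match.group()
        entities.append((span, GAZETTEER.get(span, "UNKNOWN")))
    return entities


print(recognize_entities("Ada Lovelace visited Google in New York with Grace."))
# [('Ada Lovelace', 'PERSON'), ('Google', 'ORGANIZATION'),
#  ('New York', 'LOCATION'), ('Grace', 'UNKNOWN')]
```

A rule this simple breaks easily: a capitalized word at the start of a sentence merges with a
following name into one span, which is one reason practical systems rely on statistical models
rather than fixed patterns.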
Another way of establishing named-entity relationships is ‘chunking’. Chunking parses and labels
sentence segments to establish syntactic correlations between noun or verb phrases. Each word in
a chunk is assigned a tag that marks whether it stands at the beginning or in the middle of the
chunk, so that named entities can be identified easily from the tags. Chunking can be done in two
directions: chunking up and chunking down. Presenting data in an abstract manner is referred to
as chunking up; presenting a detailed description is referred to as chunking down. For example,
when searching for a building or a city, chunking up gives only brief information about it,
whereas details such as when the building was constructed, who built it, and how the city got
its name are examples of chunking down.
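As a rough illustration of chunk tags, the Python sketch below groups part-of-speech tagged words
into noun phrases and labels each word as beginning a chunk (B-NP), continuing one (I-NP), or
outside any chunk (O). The phrase pattern (optional determiner, adjectives, then nouns), the
Penn-Treebank-style tag names, and the function name are assumptions chosen for the example, and
the input is assumed to be tagged already.

```python
def chunk_noun_phrases(tagged_tokens):
    """Label each (word, pos) pair with B-NP, I-NP, or O.

    Assumed pattern for a noun phrase: an optional determiner (DT),
    any number of adjectives (JJ), then one or more nouns (NN*).
    """
    n = len(tagged_tokens)
    labels = ["O"] * n
    i = 0
    while i < n:
        j = i
        if tagged_tokens[j][1] == "DT":
            j += 1
        while j < n and tagged_tokens[j][1] == "JJ":
            j += 1
        noun_start = j
        while j < n and tagged_tokens[j][1].startswith("NN"):
            j += 1
        if j > noun_start:
            # At least one noun was found, so accept tokens i..j-1 as a chunk.
            labels[i] = "B-NP"
            for k in range(i + 1, j):
                labels[k] = "I-NP"
            i = j
        else:
            i += 1
    return [(word, label) for (word, _), label in zip(tagged_tokens, labels)]


sentence = [("the", "DT"), ("old", "JJ"), ("library", "NN"),
            ("opened", "VBD"), ("in", "IN"), ("Boston", "NNP")]
for word, label in chunk_noun_phrases(sentence):
    print(word, label)
```

Here "the old library" forms one chunk and "Boston" another, while "opened" and "in" fall outside
any noun phrase.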
Text data can be manipulated to extract required information using the Natural Language
Processing technique called text processing. It covers extracting, automatically indexing, and
abstracting data to produce text in a format the user can understand. The output of text
processing helps web developers design a website with accurate information: raw text goes in,
and processed, indexed text comes out. After indexing, web developers can connect this
information to search strings. For example, when a customer looks for information about a school
on a website, the processed data allows all the information related to that school to be
displayed in indexed form, which helps the customer browse it or find what they need easily.
Overall, the goal of text processing in Natural Language Processing is to understand complex
sentence structures and break them down into simple, unambiguous sentences in order to establish
relationships between named entities.
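The indexing step can be sketched in Python as a small inverted index that maps each word to the
pages containing it. This is only one possible reading of "automatic indexing"; the page names,
page texts, and function names below are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical website content, used only to exercise the sketch.
pages = {
    "admissions": "Lincoln School admissions open in May",
    "sports": "Lincoln School sports day",
    "news": "City council news",
}
index = build_index(pages)
print(search(index, "lincoln school"))  # ['admissions', 'sports']
```

Because the index is built once, each search only intersects small sets of page names instead of
rescanning every page.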
Linguistic structure can be identified through part-of-speech parsing and tagging. Natural
Language Processing uses the following techniques to extract meaning from the text of a spoken
language. The first is identifying parts of speech and correlating them: a given sentence is
broken down into its smallest lexical parts, and the part of speech of each word is identified.
These sentence breakdown structures help to derive meaningful sentences. Phonetics, in contrast,
deals with pronunciation when identifying voice inputs. Morphology is another technique; it deals
with the smallest parts of words, identifying their meaning and how that meaning is modified by
adding suffixes and prefixes. For example, Microsoft Word uses part-of-speech based text
recognition to identify words and point out errors in the user's text. Another process for
identifying linguistic structure is ‘tagging’. In this process, frequently used words and nouns
in a given list are marked as keywords called tags, and all the words associated with a keyword
are then tagged together. According to Yang et al. (2013, page 709), keyword assignment is
important in developing a mechanism that enables professionals to quickly identify specific
terms in contracts. For example, ‘movie’ is a keyword that is tagged with ‘watching’ as an
associated word. These tagging technologies are very useful to companies for marketing their
products. A commercial example of tagging is Google AdWords, where web services identify
user-specific details such as gender, location, and interests; based on this information,
AdWords matches the user to products that are useful to them, such as places to visit,
restaurants, and shopping malls near the user. Tagging can thus provide the information a user
wants from just the basic details of that user. Overall, Natural Language Processing has the
knowledge needed to understand what humans say and to interpret that data in order to provide
valuable information to users.
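The keyword tagging described in this section can be approximated with a short Python sketch:
the most frequent content words become tags, and every word that appears in the same sentence as
a tag is recorded as associated with it. The stop-word list, the sentence-level notion of
"associated", and the function names are assumptions for illustration, not the method used by
any of the systems mentioned.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "are", "i", "we", "to", "and", "of", "in", "on"}


def content_words(sentence):
    """Lowercase the sentence, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def tag_keywords(sentences, top_n=1):
    """Pick the top_n most frequent words as tags and collect co-occurring words."""
    frequency = Counter()
    for sentence in sentences:
        frequency.update(content_words(sentence))
    tags = [word for word, _ in frequency.most_common(top_n)]

    associated = defaultdict(set)
    for sentence in sentences:
        words = set(content_words(sentence))
        for tag in tags:
            if tag in words:
                associated[tag] |= words - {tag}
    return {tag: sorted(associated[tag]) for tag in tags}


sentences = ["We are watching a movie", "The movie is long", "Watching the movie tonight"]
print(tag_keywords(sentences))
# {'movie': ['long', 'tonight', 'watching']}
```

With these three sentences, "movie" is the most frequent word and "watching" appears among its
associated words, mirroring the example in the text.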
Machine translation relies on recognizing patterns in data and making predictions from them.
It is an area of research that studies how machines understand and interpret human language.
The Natural Language Processing technique of speech recognition is used alongside it: computers
are equipped with powerful algorithms to understand human language and respond to it. Shahbaz
et al. (2012, page 80) explain that the main idea behind extracting important information is to
generate web queries that return the valid values customers require. The query processing
systems on mobile devices, specifically Google Now, Apple's Siri, and Microsoft Cortana, are
good examples of speech processing. When a user opens an iPhone and asks, “What is the current
weather?”, the system responds almost immediately with the weather report. Siri (speech
recognition software), designed to understand human language, translates the speech into text
and sends the processed text to a weather forecast service. After extracting the weather report
in textual form, Siri converts it back to speech and gives the answer to the user in very little
time. Machine translation is challenging because translation is not only from speech to text but
also from language to language. For example, many websites are available today in different
languages such as English, Chinese, Spanish, and French, and websites can be designed to identify
the current language and provide an English translation of it. Another example is Google
Translate, where any given text can be converted into another desired language and back again.
Although machine translation does not always produce perfect output, it can stand in for human
translators in many situations. The advantages of automatic translation and user-friendly mobile
applications make machine translation an emerging and encouraging field of research.
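A toy Python sketch can show why language-to-language translation is hard. It translates word by
word using a small lookup table, which is far simpler than what real translation services do;
the English-to-Spanish LEXICON and the function name are hypothetical and exist only for this
example.

```python
# Hypothetical English-to-Spanish lexicon; real systems learn such mappings from data.
LEXICON = {
    "the": "la",
    "house": "casa",
    "is": "es",
    "big": "grande",
    "white": "blanca",
}


def translate_word_by_word(sentence, lexicon=LEXICON):
    """Replace each known word with its lexicon entry; keep unknown words unchanged."""
    translated = []
    for word in sentence.lower().split():
        translated.append(lexicon.get(word, word))
    return " ".join(translated)


print(translate_word_by_word("the house is big"))  # la casa es grande
print(translate_word_by_word("the white house"))   # la blanca casa
```

The first sentence comes out correctly, but the second should read "la casa blanca", because
Spanish usually places the adjective after the noun. Word-by-word substitution cannot capture
such differences in word order, which is one reason machine translation output is not always
perfect.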
At their core, the techniques of Natural Language Processing are designed to deal with the
important issues in understanding and manipulating user data. Constructing programs that
identify natural language mainly involves three major problems: the first is processing and
manipulating user text, the second is representing the meaning of the linguistic input, and the
third is making it machine readable. Natural Language Processing addresses all of these problems
through techniques for understanding and manipulating data, finding linguistic structure, and
performing machine translation.
References
Jing, X., Fei, H., Hao, Y., & Li, Z. (2012). Text encryption algorithm based on natural language
processing. IEEE Computer Society, 12, 670-672. doi: 10.1109/MINES.2012.216
Shahbaz, M., McMinn, P., & Stevenson, M. (2012). Automated discovery of valid test string
from the web using dynamic regular expression collation and natural language
processing. IEEE Computer Society, 12, 79-88. doi: 10.1109/QSIC.2012.15
Shapiro, S.C., & Schlead, D.R. (2015). Use of background knowledge in natural language
understanding for information fusion. 18th International Conference on Information
Fusion, 15, 901-907.
Yang, D., Leber, C., Tari, L., Crapo, A., Messencer, R., Gustafson, S., & Chandramouli, A.
(2013). A natural language processing and semantic based system for contract analysis.
IEEE Computer Society, 13, 707-712. doi: 10.1109/ICIAI.2013.109
