www.leewayhertz.com/named-entity-recognition/
Named Entity Recognition (NER): Unveiling the value in
unstructured text
LeewayHertz
[Figure: NER pipeline. Unstructured text ("E1A gene expression induces susceptibility to killing by NK cells.") passes through sentence segmentation, tokenisation, PoS tagging, feature extraction, and a recognition model to produce structured text with labeled entities: "E1A gene" (DNA region) and "NK cells" (CellType).]
In our digitally interconnected world, the immense generation of textual data has become a staple of daily
life. From social media updates and news articles to emails and other sources, we contribute to a vast
repository of information every day. Yet, the true value within this data often remains locked away due to
its unstructured format, demanding sophisticated techniques for processing. Within Natural Language
Processing (NLP), Named Entity Recognition (NER) stands out as a critical tool for gleaning meaningful
insights from this unstructured textual data by skillfully identifying and categorizing named entities. NLP
allows machines to comprehend, interpret, and interact with human language, thus narrowing the divide
between humans and computers. According to MarketsandMarkets, the NLP market was valued at $18.9
billion in 2023 and is projected to reach $68.1 billion by 2028, a CAGR of 29.3%.
At its essence, named entity recognition acts as a vital process for detecting and classifying named
entities within texts, revealing their significance and facilitating a more profound level of analysis. These
entities span various categories, including people, organizations, locations, dates, and other contextual
indicators. Through the identification and extraction of these entities, NER converts a sea of unstructured
text into structured information. By clarifying the identities and classifications of named entities, NER lays
the groundwork for detailed analysis, empowering individuals and organizations to make well-informed
decisions and unearth the hidden treasures within the textual landscape.
Join us as we explore the nuances of named entity recognition, demystifying its fundamental principles
and operations to gain a full appreciation of its capabilities and the intricacies of its application.
What is Named Entity Recognition (NER)?
Key components of named entity recognition
The working mechanism of named entity recognition
An overview of named entity recognition methodologies
NLP models used for named entity recognition
Named entity recognition methods
How to perform named entity recognition using Python?
Use cases of named entity recognition
What is Named Entity Recognition (NER)?
NER is a process used in Natural Language Processing (NLP) where a computer program analyzes text
to identify and extract important pieces of information, such as names of people, places, organizations,
dates, and more. Employing NER allows a computer program to automatically recognize and categorize
these specific pieces of information within the text. This is especially useful when dealing with large
volumes of text, where manually identifying and organizing such entities would be both time-consuming
and prone to errors.
NER involves two key tasks, both crucial for effectively processing text and extracting valuable
information. The first task is identifying significant words and phrases, particularly proper nouns, within
the text. This step requires precisely locating and annotating these words to mark them as named
entities.
Once the named entities are identified, the second task of NER, classification, begins. In this stage, the
recognized entities are sorted into predetermined categories based on their nature. These categories can
include personal names, organizations (such as companies, government bodies, and committees),
locations (ranging from cities to countries and rivers), and temporal expressions indicating specific dates
or times.
Consider the sentence: “Apple Inc. is planning to open a new store in New York City next month.”
In this sentence, “Apple Inc.” is a named entity referring to an organization, while “New York City” is a
named entity representing a location.
The first task of NER is to identify these proper names or phrases within the text. Here, “Apple Inc.” and
“New York City” are the identified named entities.
The second task involves classifying these named entities into predefined categories. In our example,
“Apple Inc.” would be categorized under organizations, and “New York City” would fall under the category
of locations.
NER efficiently extracts and classifies these specific entities from the sentence, enabling further analysis
or information retrieval based on the identified named entities.
Key components of named entity recognition
In Natural Language Processing, a model designed for NER comprises several essential components,
which include:
Tokenization: The text is divided into individual tokens, which are typically words or punctuation
marks. Tokenization helps in creating a structured representation of the text.
Part-of-speech tagging: Each token is labeled with its corresponding part of speech, such as
noun, verb, adjective, etc. This step provides grammatical context and aids in understanding the
syntactic structure of the text.
Chunking: Tokens are grouped into “chunks” based on their part-of-speech tags. Chunking allows
for identifying and extracting meaningful phrases or entities from the text.
Named entity recognition: This component is responsible for identifying named entities, such as
names of people, organizations, locations, dates, and other specific entities. It involves classifying
these entities into predefined categories or types.
Entity disambiguation: In situations where multiple entities share the same name in the text,
entity disambiguation is performed to determine the correct meaning of the named entity. This
process considers the surrounding context and additional information to resolve any ambiguities.
These components are foundational for NER and contribute to the model’s ability to process and
understand text at a level that is useful for practical applications.
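The tokenization component can be illustrated with a minimal sketch. It assumes a simple regular expression is good enough for the example sentence; the `tokenize` helper and its pattern are hypothetical and far cruder than the tokenizers shipped with NLP libraries (for instance, it splits "Inc." into two tokens).

```python
import re

# Assumption: a token is either a run of word characters or a single
# punctuation character. Real tokenizers handle abbreviations, URLs, etc.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_PATTERN.finditer(text)]

tokens = tokenize("Apple Inc. is planning to open a new store in New York City next month.")
print([tok for tok, _, _ in tokens])
```

Keeping the character offsets alongside each token makes it possible for later components to map a recognized entity back to its exact position in the original text.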
The working mechanism of named entity recognition
NER systems typically follow a two-step process:
Boundary detection
Entity classification
Boundary detection
The first step in Named Entity Recognition (NER) is to figure out where each named entity starts and
ends in the text. This means identifying the beginning and ending points of entities, like names of people
or places. While capital letters can give us clues, especially in English, where proper nouns are usually
capitalized, NER systems usually use more advanced machine learning algorithms. These algorithms
look at a wider range of language features, not just capitalization and punctuation, to identify entities.
For example, in the sentence “John lives in New York, and he works for IBM.”, an NER system would
identify “John,” “New York,” and “IBM” as named entities. The system recognizes “John” as a person,
“New York” as a location, and “IBM” as an organization without necessarily dividing the text into separate
sentences for this step.
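To make the capitalization clue concrete, here is a minimal sketch of a boundary detector that treats every maximal run of capitalized words as a candidate entity. The `capitalized_spans` function is a hypothetical illustration of the heuristic only, not of the machine learning approach that production NER systems use.

```python
import re

def capitalized_spans(text):
    """Find maximal runs of capitalized words as candidate entity boundaries."""
    spans, current = [], []
    for tok in re.finditer(r"\w+|[^\w\s]", text):
        if tok.group()[0].isupper():
            current.append(tok)          # extend the current run
        else:
            if current:                  # a run just ended: record its boundaries
                spans.append((current[0].start(), current[-1].end()))
            current = []
    if current:                          # run that reaches the end of the text
        spans.append((current[0].start(), current[-1].end()))
    return [(start, end, text[start:end]) for start, end in spans]

sentence = "John lives in New York, and he works for IBM."
for start, end, surface in capitalized_spans(sentence):
    print(start, end, surface)
```

The heuristic finds "John", "New York", and "IBM" here, but it would also flag any ordinary capitalized word at the start of a sentence, which is one reason learned models look at a wider range of features.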
Entity classification
Entity classification is a pivotal step in NER, where the system categorizes words or phrases into
predefined types such as location, people, organization, event, time, and so on, using machine learning
techniques.
Here is how it happens:
Feature extraction: NER systems analyze the text to extract various features that aid in classifying
entities. These features may include the word itself, its part-of-speech tag, the surrounding words,
and broader context. Such linguistic features are crucial for capturing the nuances that inform the
entity’s category.
Training and classification: To prepare for classification, NER models are trained on datasets
where human annotators have manually labeled entities. During training, the model discerns
patterns that it uses to predict entity types in new texts. Common algorithms for NER include
Conditional Random Fields (CRF) and Hidden Markov Models (HMM).
Throughout training, models learn to recognize patterns and cues. For instance, a capitalized word
followed by “Inc.” or “Co.” is likely an organization, while phrases like “born,” “lives in,” or “from”
often signal a person’s name or location.
Prediction: With training complete, the NER model is equipped to classify entities in unseen texts.
It assesses the text, assigns a category to each detected named entity and outputs a list of labeled
entities.
In the sentence “John lives in New York, and he works for IBM.”, an NER system would classify “John” as
a person, “New York” as a location, and “IBM” as an organization.
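The feature extraction step can be sketched as a function that turns one token and its neighbours into a dictionary of features. The feature names below are assumptions chosen for illustration; a part-of-speech feature is left out because it would require a tagger.

```python
def token_features(tokens, i):
    """Build a feature dict for tokens[i] from the word and its neighbours."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),    # e.g. "John"
        "word.isupper": word.isupper(),    # e.g. "IBM"
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
        # Surrounding words, with sentinels at the sequence edges.
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = ["John", "lives", "in", "New", "York", ",", "and", "he", "works", "for", "IBM", "."]
print(token_features(tokens, 10))
```

A sequence model such as a CRF would receive one such dictionary per token and learn which combinations of features point to each entity type.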
NER systems can achieve high accuracy but may struggle with ambiguous entities,
misspellings, or rare names absent from the training data. Regular updates and retraining with new
data can help improve the performance of the NER model over time.
[Figure: NER workflow. The input sentence "Barack Obama, the 44th President of USA, was born in Honolulu, Hawaii." passes through pre-processing, feature extraction, and classification; the named entity extraction output tags spans as Person or Location.]
An overview of named entity recognition methodologies
There are several approaches to NER, each with its own methodology and level of complexity. Here are
the most common ones:
Rule-based systems
Rule-based systems rely on hand-crafted rules written by domain experts.
These rules can be based on patterns in the text, lexical information, or syntactic structure. While rules
can be very effective in some domains, they can be challenging to develop and maintain, and they often
do not generalize well to new domains or languages.
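As a rough illustration, a rule-based recognizer can be as small as a list of labeled regular expressions. The two rules and the example sentence below are hypothetical; a real rule set would be much larger and tuned to its domain.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical hand-written rules: (label, compiled pattern).
RULES = [
    # One or more capitalized words followed by a company suffix.
    ("ORG", re.compile(r"\b(?:[A-Z][\w&]*\s)+(?:Inc|Co|Corp|Ltd)\.")),
    # A month name followed by a four-digit year.
    ("DATE", re.compile(r"\b(?:" + MONTHS + r")\s\d{4}\b")),
]

def rule_based_ner(text):
    """Apply every rule and return (start, end, surface, label), sorted by position."""
    found = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.end(), m.group(), label))
    return sorted(found)

print(rule_based_ner("Acme Widgets Inc. was founded in November 1995."))
```

Each new entity type or naming convention needs another hand-written rule, which shows why such systems are hard to maintain and to port to new domains.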
Statistical models
Statistical models for named entity recognition operate on the premise that named entities can be
differentiated from other words in the text based on their surrounding context. Hidden Markov models
(HMMs), maximum entropy (Maxent) models, and support vector machines (SVMs) are common
statistical approaches used in NER. These models learn from labeled training data, capturing the
statistical patterns and dependencies between named entities and their associated words. However, a
major challenge is the need for a large amount of annotated training data, which can be time-consuming
and costly to obtain. Techniques like data augmentation, transfer learning, and semi-supervised learning
are employed to mitigate this. Although deep learning models have shown remarkable advancements in
NER, they require significant computational resources and extensive labeled data for training.
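The idea of learning from labeled data can be shown with a deliberately simple baseline. The sketch below is not an HMM, MaxEnt model, or SVM: it only counts which tag each token received most often in a tiny, made-up training set and ignores context entirely. The function names and the data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(labeled_sentences):
    """Learn the most frequent tag for every token seen in training."""
    counts = defaultdict(Counter)
    for sentence in labeled_sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}

def tag(model, tokens, default="O"):
    """Tag tokens; unseen tokens fall back to the 'outside' tag."""
    return [(tok, model.get(tok, default)) for tok in tokens]

# Tiny hypothetical training set of (token, tag) pairs.
train = [
    [("John", "PER"), ("works", "O"), ("for", "O"), ("IBM", "ORG")],
    [("IBM", "ORG"), ("hired", "O"), ("Mary", "PER")],
]
model = train_unigram_tagger(train)
print(tag(model, ["Mary", "joined", "IBM"]))
```

Any name that never appeared in training is tagged "O", which illustrates why these approaches depend so heavily on the amount of annotated data and why context-aware models perform better.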
Hybrid systems
In a hybrid NER system, different techniques are used in conjunction with each other to enhance
overall performance. For example, a hybrid approach may combine rule-based methods with
statistical models: rules capture regular, well-defined patterns, while statistical or machine learning
models recognize more complex and diverse named entities. These models learn patterns and features
from annotated training data, enabling them to generalize to unseen text.
ML-based approach
The ML approach in NER involves training models to automatically recognize and classify named entities
in text using machine learning techniques. This approach relies on the ability of machine learning
algorithms to learn patterns and make predictions based on labeled training data.
In the ML approach, the first step is to prepare a labeled dataset where named entities are manually
annotated. This dataset consists of text examples along with the corresponding entity labels. Features
are then extracted from the text, which captures important characteristics of the words and their context.
These features can include the surrounding words, part-of-speech tags, syntactic dependencies, or other
linguistic attributes.
NLP models used for named entity recognition
Various approaches can be used for named entity recognition, but two of the most common ones are:
1. Maximum Entropy Markov Model (MEMM), and
2. Conditional Random Fields (CRF)
MEMM
MEMM is a discriminative model used in NER. It calculates the conditional probability, which is the
likelihood of a sequence of tags given a sequence of words. This enables MEMM to differentiate among
potential tag sequences by selecting the one with the highest probability.
The MEMM model constructs a probability distribution that incorporates various features, which can be
either manually crafted or learned during training. The goal is to find the distribution with maximum
entropy that still meets the constraints set by these features, allowing the inclusion of diverse
characteristics like capitalization, punctuation, and suffixes.
MEMM is adept at handling a wide range of non-independent features, meaning it can model complex
dependencies within the data. However, it is subject to the ‘label bias problem,’ where the transition
probabilities are normalized at each state, leading to potential biases. For instance, if a state has a single
outgoing transition, the model will inevitably select it, regardless of the subsequent observation.
Consider a character-level MEMM that distinguishes the words “rib” and “rob”, with a separate chain of
states for each word. After ‘r’ is observed, both paths may carry the same probability. On each path, the
next state has only one outgoing transition, so per-state normalization gives that transition probability 1
whether the following character is ‘i’ or ‘o’. The observation that should separate the two words is
effectively ignored, and the model favors whichever path was more frequent in the training data.
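The effect of per-state normalization can be checked numerically. The sketch below assumes the local model is a softmax over the successors of a single state, and the scores are hypothetical values standing in for how well the observation fits each transition.

```python
import math

def local_distribution(scores):
    """Softmax over the successors of ONE state (per-state normalization)."""
    total = sum(math.exp(s) for s in scores.values())
    return {state: math.exp(s) / total for state, s in scores.items()}

# A state with a single successor: the score changes with the observation,
# but the normalized probability cannot.
matching_obs = local_distribution({"next": 5.0})    # observation fits well
clashing_obs = local_distribution({"next": -5.0})   # observation fits badly
print(matching_obs, clashing_obs)

# With two successors, the observation does shift probability mass.
two_successors = local_distribution({"a": 2.0, "b": 0.0})
print(two_successors)
```

Both single-successor cases yield probability 1 for the only available transition, so evidence from the observation is discarded at that step. This is the label bias problem in miniature.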
MEMM’s advantages include its versatility across different languages and domains, its efficiency with
large datasets, and its quick processing capability. Given suitable features, such as capitalization, it can
identify sequences of capitalized words in the text and classify them as named entities, although it
requires careful feature selection to perform optimally.
CRF
CRF focuses on modeling the conditional probability distribution of the hidden variables (labels) given the
observed variables (input features). This means that CRFs are discriminative models as they directly
model the relationship between the observed and hidden variables without explicitly modeling their joint
distribution.
To capture the dependencies and patterns in the data, CRFs use manually defined feature functions.
These feature functions describe certain properties or characteristics of the observed variables and their
relationships to the hidden variables. In the context of sequence labeling tasks like part-of-speech (POS)
tagging, these feature functions often depend on the position of words in the sequence and the
surrounding words.
For example, a feature function could check whether the current word is the first word of the sequence
and whether the sequence ends in a question mark, suggesting that the word opens a question. Another feature
function could examine whether the current word is a noun and the previous word is also a noun,
capturing the pattern of consecutive nouns. Similarly, a feature function might identify if the current word
is a pronoun and the next word is a verb, indicating a potential subject-verb relationship.
The feature functions can be designed based on domain knowledge and task-specific requirements. By
defining these feature functions, we establish the connections between the observed and hidden
variables. The weights of the feature functions are learned during the training of the CRF, allowing the
model to assign importance to different features for making predictions.
CRFs rely on manually defined feature functions to capture relevant information from the observed
variables to model the conditional distribution of the hidden variables given the observations. This
enables them to effectively address sequence labeling tasks by considering the dependencies and
patterns within the data. CRFs are trained on labeled data and learn to predict named entity labels based
on the contextual information of words. They are effective because they capture dependencies between
words and labels, making them a valuable tool for named entity recognition tasks.
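Feature functions of this kind can be written down directly. The sketch below assumes binary feature functions over the previous label, the current label, and the word sequence, with hypothetical hand-picked weights; in a trained CRF the weights are learned from labeled data.

```python
def f_noun_noun(prev_label, label, words, i):
    """Fires when two consecutive words are both labeled as nouns."""
    return 1.0 if prev_label == "NOUN" and label == "NOUN" else 0.0

def f_capitalized_noun(prev_label, label, words, i):
    """Fires when a capitalized word is labeled as a noun."""
    return 1.0 if words[i][0].isupper() and label == "NOUN" else 0.0

def f_pron_then_verb(prev_label, label, words, i):
    """Fires when a pronoun is followed by a verb."""
    return 1.0 if prev_label == "PRON" and label == "VERB" else 0.0

# Hypothetical weights, one per feature function.
WEIGHTED_FEATURES = [(1.5, f_noun_noun), (0.8, f_capitalized_noun), (2.0, f_pron_then_verb)]

def sequence_score(words, labels):
    """Unnormalized score: sum of weight * feature over all positions."""
    score = 0.0
    for i, label in enumerate(labels):
        prev_label = labels[i - 1] if i > 0 else "<START>"
        for weight, feature in WEIGHTED_FEATURES:
            score += weight * feature(prev_label, label, words, i)
    return score

words = ["She", "likes", "New", "York"]
good = sequence_score(words, ["PRON", "VERB", "NOUN", "NOUN"])
bad = sequence_score(words, ["NOUN", "NOUN", "VERB", "PRON"])
print(good, bad)
```

A CRF turns such scores into probabilities by exponentiating them and normalizing over all possible label sequences for the sentence, rather than state by state, so the plausible labeling ends up with the higher probability.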
Named entity recognition methods
The named entity recognition methods include:
Ontology-based NER
Ontology-based NER is a knowledge-based approach that uses curated collections of words, terms, and
their relationships to recognize entities in text. The granularity of an ontology directly influences the
breadth and precision of the outcomes in named entity recognition. For example, a free encyclopedia
would require a high-level ontology to capture and structure a wide range of information. In contrast, a
company in the medical science field would need a more detailed ontology to handle the complexities of
medical terminologies.
Ontologies play a vital role in natural language processing by facilitating semantic understanding and
knowledge representation. The process begins with ontology construction, where concepts, relationships,
and properties relevant to the domain are identified and defined. Knowledge acquisition techniques are
then used to populate the ontology with information extracted from text corpora or structured data
sources. Ontology alignment allows for the integration of multiple ontologies, ensuring interoperability.
Semantic annotation involves mapping text or data to ontology concepts, enabling advanced search and
retrieval. Ontologies also support semantic reasoning, allowing for the inference of new knowledge based
on existing ontology relationships.
In question-answering and dialogue systems, ontologies enhance understanding and enable more
accurate responses. Furthermore, ontologies serve as a foundational knowledge representation for
various NLP applications, empowering information extraction, text summarization, machine translation,
sentiment analysis, and more. Therefore, ontologies in NLP provide a structured and standardized
framework for organizing and processing domain-specific knowledge.
Ontology-based NER is similar to machine learning approaches because it can identify known terms and
concepts in unstructured or semi-structured text. However, it also relies on updates to stay current. As
new terms and concepts emerge or existing ones change, the ontology must be updated to ensure
accurate recognition.
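At its simplest, ontology-based recognition is a lookup of known terms. The sketch below assumes a miniature, made-up medical ontology stored as a dictionary and uses greedy longest-match lookup over lowercased tokens; real ontologies also encode relationships between concepts, which this example ignores.

```python
# Hypothetical miniature ontology: surface term -> concept type.
ONTOLOGY = {
    "aspirin": "Medication",
    "insulin": "Medication",
    "diabetes": "Condition",
    "type 2 diabetes": "Condition",
}

def ontology_ner(text, ontology=ONTOLOGY):
    """Greedy longest-match lookup of ontology terms over whitespace tokens."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    max_len = max(len(term.split()) for term in ontology)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in ontology:
                entities.append((candidate, ontology[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return entities

print(ontology_ner("The patient with type 2 diabetes was given insulin, not aspirin."))
```

Because recognition depends entirely on the dictionary, a term that is missing from the ontology is never found, which is why the ontology must be kept up to date.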
Deep learning NER
Deep learning NER is often more accurate than ontology-based methods because it learns word relationships
through word embeddings. These embeddings are vector representations that encapsulate both
semantic and syntactic word relationships.
The deep learning approach to NER involves several steps:
Data preparation: A dataset with labeled examples is prepared.
Word embedding: Words are transformed into embeddings that capture nuanced meanings.
Model training: A deep learning model, attentive to word order and context, is trained on this data.
Evaluation and tuning: The model’s predictions are evaluated, and its accuracy is refined.
Prediction: The trained model can then identify named entities in new texts.
Deep learning’s strength in NER lies in its capacity to learn and recognize intricate patterns
autonomously. It offers the advantage of identifying entities that may not exist in an ontology, having been
trained on diverse language data. Deep learning NER is versatile, automating repetitive tasks, thus
saving researchers valuable time.
While deep learning models for NER demonstrate enhanced linguistic understanding, they are data-
hungry, requiring extensive labeled datasets and significant computational power. Despite these
demands, their automated learning prowess renders them highly efficient in extracting named entities
from vast, unstructured texts.
How to perform named entity recognition using Python?
In this section, we delve into NER, a crucial aspect of NLP. We will showcase the significance of NER
using examples, first with spaCy, a renowned NLP library. Demonstrations include extracting entities from
general and scientific texts. Additionally, we highlight the application of NER in web scraping, illustrating
how it can be employed to extract valuable information from a news article. This section underscores the
versatile utility of NER in uncovering meaningful entities across various contexts and data sources. Let’s
understand in detail:
NER using spaCy
spaCy is a powerful open-source library for NLP that offers a range of functionalities, including built-in
methods for NER. It provides a fast statistical entity recognition system, making it an efficient choice for
NER tasks.
Using spaCy for NER is straightforward. Training on custom data may be necessary for specific business
needs, but the pre-trained spaCy models generally perform well on various types of text data.
To get started, import the spaCy library and load a pre-trained model. Here’s an example
code snippet to illustrate the process:
import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")
Next, we define the sample text to test.
raw_text = (
    "LeewayHertz, During our 15 years in the industry, we have designed and "
    "developed platforms for startups and enterprises. Our award-winning work "
    "generates billions in revenue and is trusted by millions of users."
)
text1 = NER(raw_text)
Now, we print the data and the corresponding label/category of each named entity detected in the
processed text using spaCy.
for word in text1.ents:
    print(word.text, word.label_)
The output:
LeewayHertz ORG
our 15 years DATE
billions CARDINAL
millions CARDINAL
Now, we have extracted all the named entities from the given text. We can utilize the following method if
we encounter any difficulties in determining the specific type of a particular named entity.
spacy.explain("ORG")
Output: Companies, agencies, institutions, etc.
Next, we render a visualization that highlights the named entities directly in the text.
displacy.render(text1, style="ent", jupyter=True)
LeewayHertz ORG, During our 15 years DATE in the industry, we have designed and developed
platforms for startups and enterprises. Our award-winning work generates billions CARDINAL in revenue
and is trusted by millions CARDINAL of users.
Let us try the same tasks with a text containing more named entities.
raw_text2 = (
    "The ISO mission resulted from a proposal made to ESA in 1979. "
    "After a number of studies ISO was selected in 1983 as the next new start in "
    "the ESA Scientific Programme. Following a Call for Experiment and Mission "
    "Scientist Proposals, the scientific instruments were selected in mid 1985. "
    "The two spectrometers (SWS, LWS), a camera (ISOCAM) and an imaging "
    "photo-polarimeter (ISOPHOT) jointly covered wavelengths from 2.5 to around 240 "
    "microns with spatial resolutions ranging from 1.5 arcseconds (at the shortest "
    "wavelengths) to 90 arcseconds (at the longer wavelengths). The satellite "
    "design and main development phases started in 1986 and 1988, respectively. "
    "ISO was launched perfectly in November 1995 by an Ariane 44P vehicle."
)
text2 = NER(raw_text2)
for word in text2.ents:
    print(word.text, word.label_)
The output:
ISO ORG
ESA ORG
1979 DATE
ISO ORG
1983 DATE
the ESA Scientific Programme ORG
mid 1985 DATE
two CARDINAL
SWS ORG
LWS ORG
2.5 CARDINAL
1.5 CARDINAL
90 CARDINAL
1986 DATE
1988 DATE
ISO ORG
November 1995 DATE
Here, we get more types of named entities. Let us identify what type they are.
spacy.explain("DATE")
Output: Absolute or relative dates or periods
spacy.explain("CARDINAL")
Output: Numerals that do not fall under another type
Now, we analyze the text as a whole in the form of a visual.
displacy.render(text2, style="ent", jupyter=True)
Output
The ISO ORG mission resulted from a proposal made to ESA ORG in 1979 DATE . After a number of
studies ISO ORG was selected in 1983 DATE as the next new start in the ESA Scientific Programme
ORG . Following a Call for Experiment and Mission Scientist Proposals, the scientific instruments were
selected in mid 1985 DATE . The two CARDINAL spectrometers ( SWS ORG , LWS ORG ), a camera
(ISOCAM) and an imaging photo-polarimeter (ISOPHOT) jointly covered wavelengths from 2.5
CARDINAL to around 240 microns with spatial resolutions ranging from 1.5 CARDINAL arcseconds (at
the shortest wavelengths) to 90 CARDINAL arcseconds (at the longer wavelengths). The satellite design
and main development phases started in 1986 DATE and 1988 DATE , respectively. ISO ORG was
launched perfectly in November 1995 DATE by an Ariane 44P vehicle.
We will utilize the Python package BeautifulSoup for web scraping to gather data from a news article and
then perform NER on the extracted text data.
from bs4 import BeautifulSoup
import requests
import re
Now, we will use the URL of the news article:
URL = "https://www.zeebiz.com/markets/currency/news-us-dollar-rate-index-news-inr-yen-two-week-high-as-data-boosts-fed-hike-expectations-jerome-powell-242235"
html_content = requests.get(URL).text
soup = BeautifulSoup(html_content, "lxml")
Now, we extract the text of the page body.
body = soup.body.text
Before running NER, the text should be cleaned using regex. Let us first look at a slice of the raw text.
body[1000:1500]
ws »\n \nCurrency News\n\n\n\n\n\nDollar index hits two-week high as data boosts Fed hike
expectations\nUS dollar rate index news:\xa0The U.S. dollar index climbed to a two-week high on
Thursday after economic data showed the labor market remained on a solid footing, giving the Federal
Reserve a possible cushion to continue raising interest rates.\n\n\n\n\n\n\nView in App\n\n\n US dollar
rate index news: The U.S. dollar index climbed to a two-week high on Thursday after economic data
showed the labor market
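A regex-based cleaning step for scraped text might look like the following minimal sketch. The `clean_text` helper and the shortened sample string are assumptions for illustration; they only normalize whitespace and do not remove navigation text or other page furniture.

```python
import re

def clean_text(raw):
    """Replace non-breaking spaces and collapse whitespace runs into single spaces."""
    text = raw.replace("\xa0", " ")      # non-breaking space -> regular space
    text = re.sub(r"\s+", " ", text)     # newlines, tabs, repeated spaces -> one space
    return text.strip()

# Shortened, hypothetical sample resembling scraped page text.
sample = ("Currency News\n\n\n\n\n\nDollar index hits two-week high\n"
          "US dollar rate index news:\xa0The U.S. dollar index climbed")
print(clean_text(sample))
```

The cleaned string could then be passed to the model in place of the raw one, for example `NER(clean_text(body))`, so that stray line breaks do not split or distort entity spans.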
Proceeding with NER:
text3 = NER(body)
displacy.render(text3, style="ent", jupyter=True)
Use cases of named entity recognition
NER has use cases across many domains and industries. Some of the most common include:
Information extraction: NER is widely used to extract valuable information from unstructured text, such
as news articles, research papers, and social media posts. By identifying and classifying named entities
like people, organizations, locations, and dates, NER helps understand the key entities mentioned in the
text.
Document organization and search: NER plays a crucial role in organizing and indexing documents for
efficient information retrieval. By identifying and tagging named entities, documents can be categorized
and searched based on specific entities, making it easier to find relevant information.
Social media analysis: NER is used in social media monitoring and sentiment analysis. It extracts
mentions of brands, products, and people from social media posts and comments, allowing
companies to understand public opinions and trends.
Recommendation systems: NER can be employed in recommendation systems to understand user
preferences and interests. Personalized recommendations can be generated by recognizing entities like
movie titles, books, or music artists in user reviews or interactions.
Healthcare and medical records: In the medical domain, NER is used to extract information from
medical records, such as patient names, medical conditions, treatments, and medications. It aids in
organizing medical data and supporting clinical decision-making.
Chatbots and virtual assistants: NER is essential in natural language processing systems, including
chatbots and virtual assistants. It helps understand user queries and extract relevant entities to provide
accurate responses.
Language translation: NER is used in machine translation systems to identify named entities in the
source language and ensure their proper translation into the target language.
Event detection and news summarization: NER can be applied to identify events and key entities
mentioned in news articles, enabling automatic news summarization and event tracking.
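To make the document organization and search use case concrete, here is a minimal sketch of an entity index. It assumes the entities have already been extracted for each document. The document ids, entity lists, and the helper names `build_entity_index` and `search` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_entity_index(doc_entities):
    """Map each (label, entity) pair to the set of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, label in entities:
            # Lowercase the key so "IBM" and "ibm" land in the same bucket.
            index[(label, text.lower())].add(doc_id)
    return index


def search(index, label, text):
    """Return the sorted ids of documents that mention the entity."""
    return sorted(index.get((label, text.lower()), set()))


# Hypothetical NER output for three documents.
doc_entities = {
    "doc1": [("John", "PERSON"), ("New York", "LOC"), ("IBM", "ORG")],
    "doc2": [("Apple Inc.", "ORG"), ("New York City", "LOC")],
    "doc3": [("IBM", "ORG"), ("Honolulu", "LOC")],
}
index = build_entity_index(doc_entities)
hits = search(index, "ORG", "ibm")
print(hits)
```

Keying the index on the label as well as the text keeps, say, an organization and a person with the same name apart. A production system would typically also normalize name variants, which this sketch does not attempt.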
NER is a versatile tool for extracting valuable information from unstructured text, enabling
various applications that enhance data analysis, decision-making, and user experiences in diverse
domains.
Endnote
Named entity recognition emerges as a pivotal pillar within the realm of natural language processing,
wielding the power to unlock the latent treasures embedded within vast oceans of textual data. With its
ability to identify and categorize named entities, NER bestows structure and context upon
unstructured text, empowering machines to comprehend and interact with human language more
effectively. As NER continues to evolve with advancements in machine learning and linguistic
methodologies, its applications across industries are boundless, significantly impacting how we interpret,
analyze, and extract meaningful insights from the written word. From aiding sentiment analysis to
streamlining information retrieval and powering intelligent systems, NER remains an indispensable tool in
harnessing the true potential of language in the age of data-driven decision-making.
NER helps transform texts into actionable insights. Unleash the power of your data with LeewayHertz’s
NER solutions.

leewayhertz.com-Named Entity Recognition NER Unveiling the value in unstructured text.pdf

  • 1. 1/17 www.leewayhertz.com /named-entity-recognition/ Named Entity Recognition (NER): Unveiling the value in unstructured text LeewayHertz Structured Text Classification E1A gene expression induces susceptibility to killing by NK cells. PoS Tagging Sentence Segmentation Tokenisation Features Processing Extraction Model Recognition Module E1A gene (D N Region) expression induces susceptibility to killing by NK cells (CellType) Unstructured Text In our digitally interconnected world, the immense generation of textual data has become a staple of daily life. From social media updates and news articles to emails and other sources, we contribute to a vast repository of information every day. Yet, the true value within this data often remains locked away due to its unstructured format, demanding sophisticated techniques for processing. Within Natural Language Processing (NLP), Named Entity Recognition (NER) stands out as a critical tool for gleaning meaningful insights from this unstructured textual data by skillfully identifying and categorizing named entities. NLP allows machines to comprehend, interpret, and interact with human language, thus narrowing the divide between humans and computers. According to Markets and Markets, the NLP market size reached $18.9 billion in 2023 and is expected to experience significant growth, aiming to hit $68.1 billion by 2028 with a CAGR of 29.3%. At its essence, named entity recognition acts as a vital process for detecting and classifying named entities within texts, revealing their significance and facilitating a more profound level of analysis. These entities span various categories, including people, organizations, locations, dates, and other contextual indicators. Through the identification and extraction of these entities, NER converts a sea of unstructured text into structured information. 
By clarifying the identities and classifications of named entities, NER lays the groundwork for detailed analysis, empowering individuals and organizations to make well-informed decisions and unearth the hidden treasures within the textual landscape.
  • 2. 2/17 Join us as we explore the nuances of named entity recognition, demystifying its fundamental principles and operations to gain a full appreciation of its capabilities and the intricacies of its application. What is Named Entity Recognition (NER)? Key components of named entity recognition The working mechanism of named entity recognition An overview of named entity recognition methodologies NLP models used for named entity recognition Named entity recognition methods How to perform named entity recognition using Python? Use cases of named entity recognition What is Named Entity Recognition (NER)? NER is a process used in Natural Language Processing (NLP) where a computer program analyzes text to identify and extract important pieces of information, such as names of people, places, organizations, dates, and more. Employing NER allows a computer program to automatically recognize and categorize these specific pieces of information within the text. This is especially useful when dealing with large volumes of text, where manually identifying and organizing such entities would be both time-consuming and prone to errors. NER involves two key tasks, both crucial for effectively processing text and extracting valuable information. The first task is identifying significant words and phrases, particularly proper nouns, within the text. This step requires precisely locating and annotating these words to mark them as named entities. Once the named entities are identified, the second task of NER, classification, begins. In this stage, the recognized entities are sorted into predetermined categories based on their nature. These categories can include personal names, organizations (such as companies, government bodies, and committees), locations (ranging from cities to countries and rivers), and temporal expressions indicating specific dates or times. Consider the sentence: “Apple Inc. 
is planning to open a new store in New York City next month.” In this sentence, “Apple Inc.” is a named entity referring to an organization, while “New York City” is a named entity representing a location. The first task of NER is to identify these proper names or phrases within the text. Here, “Apple Inc.” and “New York City” are the identified named entities. The second task involves classifying these named entities into predefined categories. In our example, “Apple Inc.” would be categorized under organizations, and “New York City” would fall under the category of locations. NER efficiently extracts and classifies these specific entities from the sentence, enabling further analysis or information retrieval based on the identified named entities.
  • 3. 3/17 Key components of named entity recognition In Natural Language Processing, a model designed for NER comprises several essential components, which include: Tokenization: The text is divided into individual tokens, which are typically words or punctuation marks. Tokenization helps in creating a structured representation of the text. Part-of-speech tagging: Each token is labeled with its corresponding part of speech, such as noun, verb, adjective, etc. This step provides grammatical context and aids in understanding the syntactic structure of the text. Chunking: Tokens are grouped into “chunks” based on their part-of-speech tags. Chunking allows for identifying and extracting meaningful phrases or entities from the text. Named entity recognition: This component is responsible for identifying named entities, such as names of people, organizations, locations, dates, and other specific entities. It involves classifying these entities into predefined categories or types. Entity disambiguation: In situations where multiple entities share the same name in the text, entity disambiguation is performed to determine the correct meaning of the named entity. This process considers the surrounding context and additional information to resolve any ambiguities. These components are foundational for NER and contribute to the model’s ability to process and understand text at a level that is useful for practical applications. The working mechanism of named entity recognition NER systems typically follow a two-step process: Boundary detection Entity classification Boundary detection The first step in Named Entity Recognition (NER) is to figure out where each named entity starts and ends in the text. This means identifying the beginning and ending points of entities, like names of people or places. While capital letters can give us clues, especially in English, where proper nouns are usually capitalized, NER systems usually use more advanced machine learning algorithms. 
These algorithms look at a wider range of language features, not just capitalization and punctuation, to identify entities. For example, in the sentence “John lives in New York, and he works for IBM.”, an NER system would identify “John,” “New York,” and “IBM” as named entities. The system recognizes “John” as a person, “New York” as a location, and “IBM” as an organization without necessarily dividing the text into separate sentences for this step. Entity classification Entity classification is a pivotal step in NER, where the system categorizes words or phrases into predefined types such as location, people, organization, event, time, and so on, using machine learning techniques.
  • 4. 4/17 Here is how it happens: Feature extraction: NER systems analyze the text to extract various features that aid in classifying entities. These features may include the word itself, its part-of-speech tag, the surrounding words, and broader context. Such linguistic features are crucial for capturing the nuances that inform the entity’s category. Training and classification: To prepare for classification, NER models are trained on datasets where human annotators have manually labeled entities. During training, the model discerns patterns that it uses to predict entity types in new texts. Common algorithms for NER include Conditional Random Fields (CRF) and Hidden Markov Models (HMM). Throughout training, models learn to recognize patterns and cues. For instance, a capitalized word followed by “Inc.” or “Co.” is likely an organization, while phrases like “born,” “lives in,” or “from” often signal a person’s name or location. Prediction: With training complete, the NER model is equipped to classify entities in unseen texts. It assesses the text, assigns a category to each detected named entity and outputs a list of labeled entities. In the sentence “John lives in New York, and he works for IBM.”, an NER system would classify “John” as a person, “New York” as a location, and “IBM” as an organization. NER systems can achieve high accuracy but may encounter challenges in ambiguous entities, misspellings, or rare names not present in the training data. Regular updates and retraining with new data can help improve the performance of the NER model over time. Input Output Pre-process Feature Extraction Classification Barack Obama The 44th President of USA, Was Born In Honolulu, Hawaii. Barack Obama The 44th President of USA, Was Born In Honolulu, Hawaii. Named Entity Extraction Barack Obama The 44th President of USA, Was Born In Honolulu, Hawaii. (Person) (Location) (Location) LeewayHertz An overview of named entity recognition methodologies
  • 5. 5/17 There are several approaches to NER, each with its own methodology and level of complexity. Here are the most common ones: Rule-based systems Rule-based systems are usually based on hand-crafted rules written by persons with domain expertise. These rules can be based on patterns in the text, lexical information, or syntactic structure. While rules can be very effective in some domains, they can be challenging to develop and maintain, and they often do not generalize well to new domains or languages. Statistical models Statistical models for named entity recognition operate on the premise that named entities can be differentiated from other words in the text based on their surrounding context. Hidden Markov models (HMMs), maximum entropy (Maxent) models, and support vector machines (SVMs) are common statistical approaches used in NER. These models learn from labeled training data, capturing the statistical patterns and dependencies between named entities and their associated words. However, a major challenge is the need for a large amount of annotated training data, which can be time-consuming and costly to obtain. Techniques like data augmentation, transfer learning, and semi-supervised learning are employed to mitigate this. Although deep learning models have shown remarkable advancements in NER, they require significant computational resources and extensive labeled data for training. Hybrid systems In a hybrid NER system, different techniques can be used in conjunction with each other to enhance the overall performance. For example, a hybrid approach may involve combining rule-based methods with statistical models. Statistical or machine learning models are utilized to recognize more complex and diverse named entities. These models can learn patterns and features from annotated training data, enabling them to generalize well to unseen text. 
ML-based approach The ML approach in NER involves training models to automatically recognize and classify named entities in text using machine learning techniques. This approach relies on the ability of machine learning algorithms to learn patterns and make predictions based on labeled training data. In the ML approach, the first step is to prepare a labeled dataset where named entities are manually annotated. This dataset consists of text examples along with the corresponding entity labels. Features are then extracted from the text, which captures important characteristics of the words and their context. These features can include the surrounding words, part-of-speech tags, syntactic dependencies, or other linguistic attributes. NLP models used for named entity recognition Various approaches can be used for named entity recognition, but two of the most common ones are: 1. Maximum Entropy Markov Model (MEMM), and 2. Conditional Random Fields (CRF)
MEMM

MEMM is a discriminative model used in NER. It calculates the conditional probability of a sequence of tags given a sequence of words, which lets it compare candidate tag sequences and select the one with the highest probability.

The MEMM model constructs a probability distribution that incorporates various features, which can be either manually crafted or learned during training. The goal is to find the distribution with maximum entropy that still meets the constraints set by these features, allowing the inclusion of diverse characteristics like capitalization, punctuation, and suffixes.

MEMM is adept at handling a wide range of non-independent features, meaning it can model complex dependencies within the data. However, it is subject to the ‘label bias problem’: because transition probabilities are normalized separately at each state, states with few outgoing transitions are unduly favored. In the extreme case, a state with a single outgoing transition passes all of its probability to the next state, regardless of the observation.

Consider a character-level MEMM that must distinguish “rib” from “rob”. After ‘r’ is observed, the paths for “rib” and “rob” might have the same probability. If the next state on each path has only one outgoing transition, each path passes on its full probability whether the observed character is ‘i’ or ‘o’, so the observation cannot change the relative scores of the two paths. The same happens when ‘b’ appears, and the model ends up favoring whichever path was more probable at the outset, perpetuating the bias.

MEMM’s advantages include its versatility across different languages and domains, its efficiency with large datasets, and its quick processing capability. With features such as capitalization, it can flag sequences of capitalized words as candidate named entities, although it requires careful feature selection to perform optimally.

CRF

CRF focuses on modeling the conditional probability distribution of the hidden variables (labels) given the observed variables (input features).
This means that CRFs are discriminative models: they directly model the relationship between the observed and hidden variables without explicitly modeling their joint distribution. To capture the dependencies and patterns in the data, CRFs use manually defined feature functions. These feature functions describe certain properties of the observed variables and their relationships to the hidden variables. In sequence labeling tasks like part-of-speech (POS) tagging, these feature functions often depend on the position of words in the sequence and the surrounding words. For example, a feature function could check whether the sequence ends with a question mark and the current word is the first word of the sequence, a combination that suggests a verb opening a question. Another feature function could examine whether the current word is a noun and the previous word is also a noun, capturing the pattern of consecutive nouns. Similarly, a feature function might identify whether the current word is a pronoun and the next word is a verb, indicating a potential subject-verb relationship. The feature functions can be designed based on domain knowledge and task-specific requirements. By defining these feature functions, we establish the connections between the observed and hidden
variables. The weights of the feature functions are learned during the training of the CRF, allowing the model to assign importance to different features for making predictions. Together, the feature functions and their learned weights model the conditional distribution of the hidden variables given the observations, which enables CRFs to address sequence labeling tasks by considering the dependencies and patterns within the data. CRFs are trained on labeled data and learn to predict named entity labels based on the contextual information of words. They are effective because they capture dependencies between words and labels, making them a valuable tool for named entity recognition tasks.

Named entity recognition methods

The named entity recognition methods include:

Ontology-based NER

Ontology-based NER is a knowledge-based approach that relies on collections of words, terms, and their relationships to recognize entities in text. The granularity of an ontology directly influences the breadth and precision of the outcomes in named entity recognition. For example, a free encyclopedia would require a high-level ontology to capture and structure a wide range of information. In contrast, a company in the medical science field would need a more detailed ontology to handle the complexities of medical terminology.

Ontologies play a vital role in natural language processing by facilitating semantic understanding and knowledge representation. The process begins with ontology construction, where concepts, relationships, and properties relevant to the domain are identified and defined. Knowledge acquisition techniques are then used to populate the ontology with information extracted from text corpora or structured data sources. Ontology alignment allows for the integration of multiple ontologies, ensuring interoperability.
Semantic annotation involves mapping text or data to ontology concepts, enabling advanced search and retrieval. Ontologies also support semantic reasoning, allowing for the inference of new knowledge based on existing ontology relationships. In question-answering and dialogue systems, ontologies enhance understanding and enable more accurate responses. Furthermore, ontologies serve as a foundational knowledge representation for various NLP applications, including information extraction, text summarization, machine translation, and sentiment analysis. In short, ontologies provide a structured and standardized framework for organizing and processing domain-specific knowledge in NLP.

Ontology-based NER is similar to machine learning approaches in that it can identify known terms and concepts in unstructured or semi-structured text. However, it depends on regular updates to stay current: as new terms and concepts emerge or existing ones change, the ontology must be updated to ensure accurate recognition.

Deep learning NER

Deep learning can raise NER accuracy beyond ontology-based methods by discerning word relationships through word embeddings. These embeddings are specialized representations that encapsulate both
semantic and syntactic word relationships. The deep learning approach to NER involves several steps:

Data preparation: A dataset with labeled examples is prepared.
Word embedding: Words are transformed into embeddings that capture nuanced meanings.
Model training: A deep learning model, attentive to word order and context, is trained on this data.
Evaluation and tuning: The model’s predictions are evaluated, and its accuracy is refined.
Prediction: The trained model can then identify named entities in new texts.

Deep learning’s strength in NER lies in its capacity to learn and recognize intricate patterns autonomously. Because it is trained on diverse language data, it can identify entities that may not exist in an ontology. Deep learning NER is versatile and automates repetitive tasks, saving researchers valuable time.

While deep learning models for NER demonstrate enhanced linguistic understanding, they are data-hungry, requiring extensive labeled datasets and significant computational power. Despite these demands, their ability to learn automatically makes them highly efficient at extracting named entities from vast, unstructured texts.

How to perform named entity recognition using Python?

In this section, we walk through NER in practice using spaCy, a renowned NLP library. The demonstrations cover extracting entities from general and scientific texts. We also show how NER can be combined with web scraping to extract valuable information from a news article. Together, these examples show how NER uncovers meaningful entities across various contexts and data sources.

NER using spaCy

spaCy is a powerful open-source library for NLP that offers a range of functionalities, including built-in methods for NER.
It provides a fast statistical entity recognition system, making it an efficient choice for NER tasks. Using spaCy for NER is straightforward: although training on custom data may be necessary for specific business needs, the pre-trained spaCy models generally perform well on various types of text data.

To get started, import the spaCy library and load a spaCy model. Here’s an example code snippet to illustrate the process:

    import spacy
    from spacy import displacy
    # The model must be installed first: python -m spacy download en_core_web_sm
    NER = spacy.load("en_core_web_sm")

Now, we enter the sample text that we will be testing.

    raw_text = "LeewayHertz, During our 15 years in the industry, we have designed and developed platforms for startups and enterprises. Our award-winning work generates billions in revenue and is trusted by millions of users."

    # Run the pipeline on the text; the result is a spaCy Doc object
    text1 = NER(raw_text)

Now, we print the text and the corresponding label/category of each named entity detected in the processed text.

    for word in text1.ents:
        print(word.text, word.label_)
The output:

    LeewayHertz ORG
    our 15 years DATE
    billions CARDINAL
    millions CARDINAL

We have now extracted all the named entities from the given text. If the meaning of a particular entity label is unclear, spacy.explain returns a short description of it.

    spacy.explain("ORG")

Output: Companies, agencies, institutions, etc.

Next, we render a visual that shows the named entities directly in the text.

    displacy.render(text1, style="ent", jupyter=True)

[LeewayHertz ORG], During [our 15 years DATE] in the industry, we have designed and developed platforms for startups and enterprises. Our award-winning work generates [billions CARDINAL] in revenue and is trusted by [millions CARDINAL] of users.

Let us try the same tasks with a text containing more named entities.
    raw_text2 = "The ISO mission resulted from a proposal made to ESA in 1979. After a number of studies ISO was selected in 1983 as the next new start in the ESA Scientific Programme. Following a Call for Experiment and Mission Scientist Proposals, the scientific instruments were selected in mid 1985. The two spectrometers (SWS, LWS), a camera (ISOCAM) and an imaging photo-polarimeter (ISOPHOT) jointly covered wavelengths from 2.5 to around 240 microns with spatial resolutions ranging from 1.5 arcseconds (at the shortest wavelengths) to 90 arcseconds (at the longer wavelengths). The satellite design and main development phases started in 1986 and 1988, respectively. ISO was launched perfectly in November 1995 by an Ariane 44P vehicle."

    text2 = NER(raw_text2)
    for word in text2.ents:
        print(word.text, word.label_)

The output:

    ISO ORG
    ESA ORG
    1979 DATE
    ISO ORG
    1983 DATE
    the ESA Scientific Programme ORG
    mid 1985 DATE
    two CARDINAL
    SWS ORG
    LWS ORG
    2.5 CARDINAL
    1.5 CARDINAL
    90 CARDINAL
    1986 DATE
    1988 DATE
    ISO ORG
    November 1995 DATE

Here, we get more types of named entities. Let us identify what type they are.

    spacy.explain("DATE")

Output: Absolute or relative dates or periods

    spacy.explain("CARDINAL")
Output: Numerals that do not fall under another type

Now, we analyze the text as a whole in the form of a visual.

    displacy.render(text2, style="ent", jupyter=True)

Output:

The [ISO ORG] mission resulted from a proposal made to [ESA ORG] in [1979 DATE]. After a number of studies [ISO ORG] was selected in [1983 DATE] as the next new start in [the ESA Scientific Programme ORG]. Following a Call for Experiment and Mission Scientist Proposals, the scientific instruments were selected in [mid 1985 DATE]. The [two CARDINAL] spectrometers ([SWS ORG], [LWS ORG]), a camera (ISOCAM) and an imaging photo-polarimeter (ISOPHOT) jointly covered wavelengths from [2.5 CARDINAL] to around 240 microns with spatial resolutions ranging from [1.5 CARDINAL] arcseconds (at the shortest wavelengths) to [90 CARDINAL] arcseconds (at the longer wavelengths). The satellite design and main development phases started in [1986 DATE] and [1988 DATE], respectively. [ISO ORG] was launched perfectly in [November 1995 DATE] by an Ariane 44P vehicle.

We will utilize the Python package BeautifulSoup for web scraping to gather data from a news article and then perform NER on the extracted text data.

    from bs4 import BeautifulSoup
    import requests
    import re

Now, we will use the URL of the news article:
    URL = "https://www.zeebiz.com/markets/currency/news-us-dollar-rate-index-news-inr-yen-two-week-high-as-data-boosts-fed-hike-expectations-jerome-powell-242235"

    # Download the page and parse it (the "lxml" parser must be installed)
    html_content = requests.get(URL).text
    soup = BeautifulSoup(html_content, "lxml")

Now, we move to the body content:

    body = soup.body.text

The extracted text is still raw and would need cleaning with regex before serious use. Let us have a look at a slice of it.

    body[1000:1500]
    ws »\n \nCurrency News\n\n\n\n\n\nDollar index hits two-week high as data boosts Fed hike expectations\nUS dollar rate index news:\xa0The U.S. dollar index climbed to a two-week high on Thursday after economic data showed the labor market remained on a solid footing, giving the Federal Reserve a possible cushion to continue raising interest rates.\n\n\n\n\n\n\nView in App\n\n\n US dollar rate index news: The U.S. dollar index climbed to a two-week high on Thursday after economic data showed the labor market

Proceeding with NER:

    text3 = NER(body)
    displacy.render(text3, style="ent", jupyter=True)

Use cases of named entity recognition

NER has various use cases across different domains and industries.
Some of the common use cases of NER include:
Information extraction: NER is widely used to extract valuable information from unstructured text, such as news articles, research papers, and social media posts. By identifying and classifying named entities like people, organizations, locations, and dates, NER helps understand the key entities mentioned in the text.

Document organization and search: NER plays a crucial role in organizing and indexing documents for efficient information retrieval. By identifying and tagging named entities, documents can be categorized and searched based on specific entities, making it easier to find relevant information.

Social media analysis: NER is used in social media monitoring and sentiment analysis. It helps in extracting mentions of brands, products, and people in social media posts and comments, allowing companies to understand public opinions and trends.

Recommendation systems: NER can be employed in recommendation systems to understand user preferences and interests. Personalized recommendations can be generated by recognizing entities like movie titles, books, or music artists in user reviews or interactions.

Healthcare and medical records: In the medical domain, NER is used to extract information from medical records, such as patient names, medical conditions, treatments, and medications. It aids in organizing medical data and supporting clinical decision-making.

Chatbots and virtual assistants: NER is essential in natural language processing systems, including chatbots and virtual assistants. It helps understand user queries and extract relevant entities to provide accurate responses.

Language translation: NER is used in machine translation systems to identify named entities in the source language and ensure their proper translation into the target language.

Event detection and news summarization: NER can be applied to identify events and key entities mentioned in news articles, enabling automatic news summarization and event tracking.
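As one small illustration of the document organization and search use case listed above, the sketch below builds an inverted index from named entities to the documents that mention them. It assumes the entities have already been extracted by an NER step (for example, as text and label pairs like those printed earlier); the function names `build_entity_index` and `docs_mentioning` and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

EntityKey = Tuple[str, str]  # (lowercased entity text, entity label)


def build_entity_index(
    docs: Dict[str, Iterable[Tuple[str, str]]]
) -> Dict[EntityKey, Set[str]]:
    """Map each (entity text, label) pair to the set of document IDs containing it."""
    index: Dict[EntityKey, Set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for text, label in entities:
            index[(text.lower(), label)].add(doc_id)
    return dict(index)


def docs_mentioning(index: Dict[EntityKey, Set[str]], text: str, label: str) -> Set[str]:
    """Look up the documents that mention a given entity (case-insensitive)."""
    return index.get((text.lower(), label), set())


# Entities as an NER step might return them; document IDs are made up for the example.
extracted = {
    "company_intro": [("LeewayHertz", "ORG"), ("our 15 years", "DATE")],
    "iso_mission": [("ISO", "ORG"), ("ESA", "ORG"), ("1979", "DATE"), ("ISO", "ORG")],
}

index = build_entity_index(extracted)
print(docs_mentioning(index, "ESA", "ORG"))
```

Keying the index on both text and label keeps, say, an organization and a product with the same name apart; whether that distinction matters depends on the application.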
NER is a versatile tool for extracting valuable information from unstructured text, enabling various applications that enhance data analysis, decision-making, and user experiences in diverse domains.

Endnote

Named entity recognition is a pivotal part of natural language processing, unlocking the value hidden in vast amounts of textual data. By identifying and categorizing named entities, NER gives structure and context to unstructured text, enabling machines to comprehend and interact with human language more effectively. As NER continues to evolve with advancements in machine learning and linguistic methodologies, its applications across industries keep expanding, significantly impacting how we interpret, analyze, and extract meaningful insights from the written word. From aiding sentiment analysis to streamlining information retrieval and powering intelligent systems, NER remains an indispensable tool for harnessing the potential of language in the age of data-driven decision-making.
NER helps transform texts into actionable insights. Unleash the power of your data with LeewayHertz’s NER solutions.