SlideShare a Scribd company logo
1 of 15
Download to read offline
Python is a simple yet powerful programming language
with excellent functionality for processing linguistic
data.
We chose Python as the implementation language for
NLTK because it has shallow learning curve, its syntax
and semantics are transparent and it has good string-
handling functionality.
As a scripting language, Python facilitates interactive
exploration.
As a dynamic language, Python permits attributes to
be added to objects on the fly, and permits variables
to be typed dynamically facilitating rapid development.
As an object-oriented language, Python permits data
and methods to be encapsulated and re-used easily.
Python comes with an extensive standard library,
including components for graphical programming,
numerical processing, and web data processing.
Python is heavily used in industry, scientific research
and education around the word.
Python is often praised for the way it facilities
productivity, quality, and maintainability of software.
NLTK defines a basic infrastructure can be used to build NLP
programs in Python.
NLTK was designed with six requirements in mind:
i. Simplicity ii. Consistency iii. Extensibility iv. Modularity
v. Well-documented vi. NLTK organization
It provides: basic classes for representing data relevant to
natural language processing: standard interfaces for
performing tasks such tokenization, tagging and parsing;
standard implementations for each task which can be
combined to solve complex problems; and extensive
documentation including tutorial and reference
documentation.
Simplicity: We have tried to provide an intuitive and appealing
framework along with substantial building blocks, for
students to gain a practical knowledge of NLP without
getting bogged down in the tedious house-keeping usually
associated with processing annotated language data.
We have provided software distributions for several
platforms, along with platform-specific instructions, to
make the toolkit easy to install.
We have made a significant effort to ensure that all the
data structures and interfaces are consistent,
making it easy to carry out a variety of tasks using a
uniform framework.
The toolkit easily accommodates new components,
whether those components replicate or extend
existing functionality.
Moreover, the toolkit is organized so that it is usually
obvious where extensions would fit into the toolkit’s
infrastructure.
The interaction between different components of the
toolkit uses simple, well-defined interfaces.
It is possible to complete individual projects using small
parts of the toolkit.
This allows students to learn how to use the toolkit
incrementally throughout a course. Modularity also
makes it easier to change and extend the toolkit.
The toolkit comes with substantial documentation,
including nomenclature, data structures, and
implementations.
NLTK is organized into a collection of task-specific packages.
Each package is a combination of data structures for
representing a particular kind of information such as trees,
and implementations of standard algorithms involving those
structures such as parsers.
This approach is a standard feature of object-oriented design,
in which components encapsulate both the resources and
methods needed to accomplish particular task.
A tokeniser is a program which takes input text and divides it
into “tokens”.
This is often a useful first step, because it means one can then
count the number of words in a text, or count the number of
different words in a text, or extract all the words that occur
exactly 3 times in a Text, etc.
Unix has some facilities which allow you to do this
tokenisation. We start with tr. This “translates” characters.
Typical usage is as follows:
tr 'a-z' 'A-Z' < inputfile > outputfile
tr ’aiou’ e < inputfile > outputfile
tr -c 'A-Za-z' '012' <inputfile> outputfile
See data inten-ling
Typing in Malayalam
Any question?

More Related Content

Similar to 2 why python for nlp

ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ijnlc
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...kevig
 
Real time text stream processing - a dynamic and distributed nlp pipeline
Real time text stream  processing - a dynamic and distributed nlp pipelineReal time text stream  processing - a dynamic and distributed nlp pipeline
Real time text stream processing - a dynamic and distributed nlp pipelineConference Papers
 
Requirment
RequirmentRequirment
Requirmentstat
 
Untitled document (12).pdf
Untitled document (12).pdfUntitled document (12).pdf
Untitled document (12).pdfcollinscafe
 
Mastering Python Programming.pdf
Mastering Python Programming.pdfMastering Python Programming.pdf
Mastering Python Programming.pdfKajal Digital
 
Stat Tech Reportv1
Stat Tech Reportv1Stat Tech Reportv1
Stat Tech Reportv1stat
 
Advantage of Phyton Language for Development.pdf
Advantage of Phyton Language for Development.pdfAdvantage of Phyton Language for Development.pdf
Advantage of Phyton Language for Development.pdfvegasystemsusa
 
Study of Various Tools for Data Science
Study of Various Tools for Data ScienceStudy of Various Tools for Data Science
Study of Various Tools for Data ScienceIRJET Journal
 
Python Essentials A Quick Guide for Beginners
Python Essentials A Quick Guide for BeginnersPython Essentials A Quick Guide for Beginners
Python Essentials A Quick Guide for BeginnersPRIYASAGIG
 
Why to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxWhy to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxHGLLearn
 
Text Mining: open Source Tokenization Tools � An Analysis
Text Mining: open Source Tokenization Tools � An AnalysisText Mining: open Source Tokenization Tools � An Analysis
Text Mining: open Source Tokenization Tools � An Analysisaciijournal
 
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSIS
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSISTEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSIS
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSISaciijournal
 
Text mining open source tokenization
Text mining open source tokenizationText mining open source tokenization
Text mining open source tokenizationaciijournal
 
Python-Mastering-the-Language-of-Data-Science.pptx
Python-Mastering-the-Language-of-Data-Science.pptxPython-Mastering-the-Language-of-Data-Science.pptx
Python-Mastering-the-Language-of-Data-Science.pptxdmdHaneef
 

Similar to 2 why python for nlp (20)

ResearchPaper
ResearchPaperResearchPaper
ResearchPaper
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION ALGORITHM FOR TEXT TO GRAPH CONVERSION
ALGORITHM FOR TEXT TO GRAPH CONVERSION
 
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
ALGORITHM FOR TEXT TO GRAPH CONVERSION AND SUMMARIZING USING NLP: A NEW APPRO...
 
Toolboxes for data scientists
Toolboxes for data scientistsToolboxes for data scientists
Toolboxes for data scientists
 
Real time text stream processing - a dynamic and distributed nlp pipeline
Real time text stream  processing - a dynamic and distributed nlp pipelineReal time text stream  processing - a dynamic and distributed nlp pipeline
Real time text stream processing - a dynamic and distributed nlp pipeline
 
NEURAL NETWORK BOT
NEURAL NETWORK BOTNEURAL NETWORK BOT
NEURAL NETWORK BOT
 
Requirment
RequirmentRequirment
Requirment
 
Untitled document (12).pdf
Untitled document (12).pdfUntitled document (12).pdf
Untitled document (12).pdf
 
Mastering Python Programming.pdf
Mastering Python Programming.pdfMastering Python Programming.pdf
Mastering Python Programming.pdf
 
Stat Tech Reportv1
Stat Tech Reportv1Stat Tech Reportv1
Stat Tech Reportv1
 
PB.docx
PB.docxPB.docx
PB.docx
 
Advantage of Phyton Language for Development.pdf
Advantage of Phyton Language for Development.pdfAdvantage of Phyton Language for Development.pdf
Advantage of Phyton Language for Development.pdf
 
Study of Various Tools for Data Science
Study of Various Tools for Data ScienceStudy of Various Tools for Data Science
Study of Various Tools for Data Science
 
Python Essentials A Quick Guide for Beginners
Python Essentials A Quick Guide for BeginnersPython Essentials A Quick Guide for Beginners
Python Essentials A Quick Guide for Beginners
 
Why to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxWhy to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptx
 
Python Course In Chandigarh
Python Course In ChandigarhPython Course In Chandigarh
Python Course In Chandigarh
 
Text Mining: open Source Tokenization Tools � An Analysis
Text Mining: open Source Tokenization Tools � An AnalysisText Mining: open Source Tokenization Tools � An Analysis
Text Mining: open Source Tokenization Tools � An Analysis
 
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSIS
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSISTEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSIS
TEXT MINING: OPEN SOURCE TOKENIZATION TOOLS – AN ANALYSIS
 
Text mining open source tokenization
Text mining open source tokenizationText mining open source tokenization
Text mining open source tokenization
 
Python-Mastering-the-Language-of-Data-Science.pptx
Python-Mastering-the-Language-of-Data-Science.pptxPython-Mastering-the-Language-of-Data-Science.pptx
Python-Mastering-the-Language-of-Data-Science.pptx
 

More from ThennarasuSakkan

11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)ThennarasuSakkan
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)ThennarasuSakkan
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introductionThennarasuSakkan
 
6 shallow parsing introduction
6 shallow parsing introduction6 shallow parsing introduction
6 shallow parsing introductionThennarasuSakkan
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpusThennarasuSakkan
 
5 relevance of annotated corpus
5 relevance of annotated corpus5 relevance of annotated corpus
5 relevance of annotated corpusThennarasuSakkan
 
4 salient features of corpus
4 salient features of corpus4 salient features of corpus
4 salient features of corpusThennarasuSakkan
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introductionThennarasuSakkan
 

More from ThennarasuSakkan (9)

11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)11 terms in corpus linguistics1 (1)
11 terms in corpus linguistics1 (1)
 
11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)11 terms in Corpus Linguistics1 (2)
11 terms in Corpus Linguistics1 (2)
 
8 issues in pos tagging
8 issues in pos tagging8 issues in pos tagging
8 issues in pos tagging
 
7 probability and statistics an introduction
7 probability and statistics an introduction7 probability and statistics an introduction
7 probability and statistics an introduction
 
6 shallow parsing introduction
6 shallow parsing introduction6 shallow parsing introduction
6 shallow parsing introduction
 
5a use of annotated corpus
5a use of annotated corpus5a use of annotated corpus
5a use of annotated corpus
 
5 relevance of annotated corpus
5 relevance of annotated corpus5 relevance of annotated corpus
5 relevance of annotated corpus
 
4 salient features of corpus
4 salient features of corpus4 salient features of corpus
4 salient features of corpus
 
1 computational linguistics an introduction
1 computational linguistics   an introduction1 computational linguistics   an introduction
1 computational linguistics an introduction
 

Recently uploaded

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 

Recently uploaded (20)

URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 

2 why python for nlp

  • 1.
  • 2. Python is a simple yet powerful programming language with excellent functionality for processing linguistic data. We chose Python as the implementation language for NLTK because it has shallow learning curve, its syntax and semantics are transparent and it has good string- handling functionality.
  • 3. As a scripting language, Python facilitates interactive exploration. As a dynamic language, Python permits attributes to be added to objects on the fly, and permits variables to be typed dynamically facilitating rapid development. As an object-oriented language, Python permits data and methods to be encapsulated and re-used easily.
  • 4. Python comes with an extensive standard library, including components for graphical programming, numerical processing, and web data processing. Python is heavily used in industry, scientific research and education around the word. Python is often praised for the way it facilities productivity, quality, and maintainability of software.
  • 5. NLTK defines a basic infrastructure can be used to build NLP programs in Python. NLTK was designed with six requirements in mind: i. Simplicity ii. Consistency iii. Extensibility iv. Modularity v. Well-documented vi. NLTK organization It provides: basic classes for representing data relevant to natural language processing: standard interfaces for performing tasks such tokenization, tagging and parsing; standard implementations for each task which can be combined to solve complex problems; and extensive documentation including tutorial and reference documentation.
  • 6. Simplicity: We have tried to provide an intuitive and appealing framework along with substantial building blocks, for students to gain a practical knowledge of NLP without getting bogged down in the tedious house-keeping usually associated with processing annotated language data. We have provided software distributions for several platforms, along with platform-specific instructions, to make the toolkit easy to install.
  • 7. We have made a significant effort to ensure that all the data structures and interfaces are consistent, making it easy to carry out a variety of tasks using a uniform framework.
  • 8. The toolkit easily accommodates new components, whether those components replicate or extend existing functionality. Moreover, the toolkit is organized so that it is usually obvious where extensions would fit into the toolkit’s infrastructure.
  • 9. The interaction between different components of the toolkit uses simple, well-defined interfaces. It is possible to complete individual projects using small parts of the toolkit. This allows students to learn how to use the toolkit incrementally throughout a course. Modularity also makes it easier to change and extend the toolkit.
  • 10. The toolkit comes with substantial documentation, including nomenclature, data structures, and implementations.
  • 11. NLTK is organized into a collection of task-specific packages. Each package is a combination of data structures for representing a particular kind of information such as trees, and implementations of standard algorithms involving those structures such as parsers. This approach is a standard feature of object-oriented design, in which components encapsulate both the resources and methods needed to accomplish particular task.
  • 12. A tokeniser is a program which takes input text and divides it into “tokens”. This is often a useful first step, because it means one can then count the number of words in a text, or count the number of different words in a text, or extract all the words that occur exactly 3 times in a Text, etc. Unix has some facilities which allow you to do this tokenisation. We start with tr. This “translates” characters. Typical usage is as follows:
  • 13. tr 'a-z' 'A-Z' < inputfile > outputfile tr ’aiou’ e < inputfile > outputfile tr -c 'A-Za-z' '012' <inputfile> outputfile