Hidden Markov Model Toolkit (HTK) Tutorial
http://www.redicals.com
What is HTK?
HTK is the “Hidden Markov Model Toolkit” developed by the
Cambridge University Engineering Department. This toolkit aims at
building and manipulating Hidden Markov Models (HMMs).
HTK is primarily used for speech recognition research although it has
been used for numerous other applications including speech
synthesis, character recognition and DNA sequencing. HTK consists of
a set of library modules and tools available in C source form. It is
available as a free download, along with complete documentation.
1.1 HTK Construction steps
The main construction steps are the following:
- Creation of a training database: Each element of the vocabulary is
recorded several times, and labelled with the corresponding word.
- Acoustical analysis: The training waveforms are converted into
a series of coefficient vectors.
- Definition of the models: A prototype Hidden Markov Model
(HMM) is defined for each element of the task vocabulary.
- Training of the models: Each HMM is initialised and trained with
the training data.
- Definition of the task: The grammar of the recogniser (what can be
recognised) is defined.
- Evaluation: The performance of the recogniser can be evaluated on
a corpus of test data.
How to Install HTK on Windows
STEP 1:- Register with HTK and download the HTK Toolkit.
STEP 2:- Extract the Windows HTK binaries: right-click the
htk-3.x-windows-binary.zip file, select 'Extract All' from the
right-click menu, and follow the steps in the extraction wizard
to extract the zip file to your HTK directory.
STEP 3:- After extraction, you will get two directories,
"bin.win32" and "bin", which contain the HTK commands and
other data-processing commands, respectively.
STEP 4:- Now you need to add these two directories to the
system path.
STEP 5:- Go to Start >> type “cmd”. Now add the paths of
these two directories, for example:
set path=%path%;c:\htk\bin.win32;c:\htk\bin
NOTE :- I created an HTK folder on the C drive and stored
both directories in it, i.e. c:\htk\bin and
c:\htk\bin.win32
STEP 6:- To test the HTK toolkit, you can run any tool such as
HVite or HCopy from the command prompt.
How to Install HTK on Linux
STEP 1 :- Download the HTK Toolkit and extract it
STEP 2 :- cd htk
STEP 3 :- ./configure --prefix=/tmp --without-x --disable-hslab
STEP 4 :- make all
STEP 5 :- sudo make install
STEP 6 :- To test the HTK toolkit, you can run any tool such as
HVite or HCopy from the terminal. With --prefix=/tmp the tools are
installed in /tmp/bin, so that directory must be on your PATH.
How to Install HTK on Mac OS X
STEP 1 :- Download the HTK Toolkit
STEP 2 :- Open Terminal and extract it: tar zxf HTK-3.4.1.tar.gz
STEP 3 :- cd htk
STEP 4 :- export CPPFLAGS=-UPHNALG
STEP 5 :- chmod +x configure
STEP 6 :- ./configure --without-x --disable-hslab
STEP 7 :- make all
STEP 8 :- sudo make install
STEP 9 :- To test the HTK toolkit, you can run any tool such as
HVite or HCopy from the terminal.
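The final "test" step on each platform amounts to checking that the HTK executables can be found on the search path. A minimal sketch of that check follows; the list of tool names is taken from this tutorial, and the helper functions are hypothetical names introduced only for illustration.

```python
import shutil

# Tool names mentioned in this tutorial; extend the list as needed.
HTK_TOOLS = ["HCopy", "HInit", "HRest", "HERest", "HVite", "HParse", "HResults"]


def find_htk_tools(tools=HTK_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(found):
    """Return the names of the tools whose lookup failed."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_htk_tools()
    for name, path in found.items():
        print(f"{name:10s} {path or 'NOT FOUND'}")
```

If any tool is reported as not found, the directories from the installation steps have most likely not been added to the path of the current shell session.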
1. Data creation
HCopy: feature extraction
HList: file information
HLEd: label creation (outputs a Master Label File, MLF)
2. Learning
MakeProtoHMMSet: determines the topology of the initial model
HInit: initial learning of an HMM from the corresponding cut-out phoneme segments (Segmental K-Means)
HRest: learning of the HMM following HInit (Baum-Welch, increase the number of mixtures)
3. Recognition
HVite: recognition by the Viterbi algorithm
HBuild: generation of the word network (sub-networks can also be generated)
HParse: conversion of grammar notation (EBNF, extended Backus-Naur Form) to a word network
HDMan: dictionary management tool
4. Analysis
HResults: calculation of the recognition rate
HTK Tools
HCopy
This program copies one or more data files to a designated output file, optionally
converting the data into a parameterised form. While the source files can be in any
supported format, the output format is always HTK. Hence, this program is used to
convert data files in other formats to the HTK format. If label-related options are
given, each source data file must have an associated label file, and a target label
file is created. HCopy can also be used to convert the parameter kind of a file, for
example from WAVEFORM to MFCC, depending on the configuration options. Conversions
must be specified via a configuration file.
USE :- HCopy -C mfcc13.cfg ..waveFiles1001-10a.wav outputfeature001-10a.fea
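The tutorial does not show the contents of mfcc13.cfg. The sketch below assumes a typical MFCC setup (12 cepstral coefficients plus C0, giving 13 values per frame, with a 25 ms window and a 10 ms frame shift expressed in HTK's 100 ns units). All values and the helper function names are illustrative assumptions, not the author's actual configuration.

```python
# Hypothetical MFCC settings; the tutorial does not list mfcc13.cfg.
MFCC_CONFIG = {
    "SOURCEFORMAT": "WAV",     # input files are WAV
    "TARGETKIND": "MFCC_0",    # MFCCs plus the zeroth cepstral coefficient
    "TARGETRATE": "100000.0",  # frame shift: 10 ms in 100 ns units
    "WINDOWSIZE": "250000.0",  # window length: 25 ms in 100 ns units
    "USEHAMMING": "T",
    "PREEMCOEF": "0.97",
    "NUMCHANS": "26",
    "NUMCEPS": "12",
}


def render_config(settings):
    """Render settings as 'NAME = value' lines, the HTK config file layout."""
    return "".join(f"{name} = {value}\n" for name, value in settings.items())


def hcopy_command(config_path, source, target):
    """Build the HCopy argument list: config file, source file, target file."""
    return ["HCopy", "-C", config_path, source, target]


if __name__ == "__main__":
    print(render_config(MFCC_CONFIG))
    # File names here are placeholders, not the paths used in the tutorial.
    print(" ".join(hcopy_command("mfcc13.cfg", "input.wav", "output.fea")))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to subprocess.run.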
HInit
HInit is used to provide initial estimates for the parameters of a single HMM using a
set of observation sequences. It works by repeatedly using Viterbi alignment to
segment the training observations and then recomputing the parameters by pooling
the vectors in each segment. For mixture Gaussians, each vector in each segment is
aligned with the component with the highest likelihood. HInit can be used to provide
initial estimates of whole word models in which case the observation sequences are
realisations of the corresponding vocabulary word. Alternatively, HInit can be used
to generate initial estimates of HMMs for phoneme-based speech recognition.
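To make the "segment, then pool" idea concrete, here is a deliberately simplified sketch. It assumes scalar observations and replaces Viterbi alignment with a uniform split of each sequence across the states, which is only a starting point for the iterative procedure described above. It is not HInit's implementation, and the function names are hypothetical.

```python
def uniform_segments(length, num_states):
    """Split range(length) into num_states contiguous, near-equal slices."""
    bounds = [i * length // num_states for i in range(num_states + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_states)]


def pooled_means(sequences, num_states):
    """Pool the observations assigned to each state and return per-state means.

    Assumes every sequence has at least num_states observations, so that
    no state ends up with an empty pool.
    """
    pools = [[] for _ in range(num_states)]
    for seq in sequences:
        segments = uniform_segments(len(seq), num_states)
        for state, (start, end) in enumerate(segments):
            pools[state].extend(seq[start:end])
    return [sum(pool) / len(pool) for pool in pools]


if __name__ == "__main__":
    training = [[0, 0, 5, 5, 9, 9], [1, 1, 4, 4, 10, 10]]
    print(pooled_means(training, num_states=3))
```

In a fuller version, the uniform split would be replaced by a Viterbi alignment against the current model, and the pooling step would be repeated until the segmentation stops changing.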
HLEd
This program is a simple editor for manipulating label files. Typical examples of its
use might be to merge a sequence of labels into a single composite label or to
expand a set of labels into a context-sensitive set. HLEd works by reading in a list of
editing commands from an edit script file and then making an edited copy of one or
more label files.
HSLab
HSLab is an interactive label editor for manipulating speech label files. An example
of using HSLab would be to load a sampled waveform file, determine the boundaries
of the speech units of interest and assign labels to them. Alternatively, an existing
label file can be loaded and edited by changing current label boundaries, deleting
labels, and creating new labels.
HParse
The HParse program generates word level lattice files (for use with e.g. HVite) from
a text file syntax description containing a set of rewrite rules based on extended
Backus-Naur Form (EBNF). The EBNF rules are used to generate an internal
representation of the corresponding finite-state network where HParse network
nodes represent the words in the network, and are connected via sets of links. This
HParse network is then converted to an HTK word-level lattice.
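As an illustration of the grammar notation, the sketch below generates a small grammar in which a variable lists alternative words and angle brackets mean "one or more repetitions". The vocabulary, the SENT-START and SENT-END markers, and the file names are assumptions chosen for the example; the helper functions are hypothetical.

```python
def digit_grammar(words):
    """Return an HParse-style grammar accepting one or more of the given words."""
    alternatives = " | ".join(words)
    return (
        f"$digit = {alternatives};\n"
        "( SENT-START <$digit> SENT-END )\n"
    )


def hparse_command(grammar_path, lattice_path):
    """Build the HParse argument list: grammar file in, word lattice out."""
    return ["HParse", grammar_path, lattice_path]


if __name__ == "__main__":
    print(digit_grammar(["ONE", "TWO", "THREE"]))
    # "gram" and "wdnet" are placeholder file names.
    print(" ".join(hparse_command("gram", "wdnet")))
```

The resulting lattice file is what a recogniser such as HVite would then load as its word network.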
HERest
This program is used to perform a single re-estimation of the parameters of a set of
HMMs, or linear transforms, using an embedded training version of the Baum-Welch
algorithm. Training data consists of one or more utterances each of which has a
transcription in the form of a standard label file (segment boundaries are ignored).
For each training utterance, a composite model is effectively synthesised by
concatenating the phoneme models given by the transcription. Each phone model
has the same set of accumulators allocated to it as are used in HRest but in HERest
they are updated simultaneously by performing a standard Baum-Welch pass over
each training utterance using the composite model. HERest is intended to operate on
HMMs with initial parameter values estimated by HInit/HRest. HERest supports
multiple mixture Gaussians, discrete and tied-mixture HMMs, multiple data streams,
parameter tying within and between models, and full or diagonal covariance
matrices.
HResults
HResults is the HTK performance analysis tool. It reads in a set of label files
(typically output from a recognition tool such as HVite) and compares them with the
corresponding reference transcription files. A typical summary for speech
recognition output looks like this:
------------------------ Overall Results --------------------------
SENT: %Correct=86.67 [H=52, S=8, N=60]
WORD: %Corr=86.67, Acc=86.67 [H=52, D=0, S=8, I=0, N=60]
H is the number of correct labels, D is the number of deletions, S is the number of
substitutions, I is the number of insertions,
and N is the total number of labels in the defining transcription files. The percentage
of labels correctly recognised is given by
%Correct = H / N × 100%
and the accuracy is computed by
Accuracy = (H − I) / N × 100%
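Using the definitions of H, I, and N above, the two scores can be recomputed from the counts in the sample output. This is a small worked example: percent correct is H divided by N, and accuracy additionally subtracts insertions from H before dividing. The function names are illustrative.

```python
def percent_correct(h, n):
    """Percentage of reference labels recognised correctly: H / N * 100."""
    return 100.0 * h / n


def percent_accuracy(h, i, n):
    """Accuracy, which also penalises insertions: (H - I) / N * 100."""
    return 100.0 * (h - i) / n


if __name__ == "__main__":
    # Counts from the WORD line of the sample output: H=52, D=0, S=8, I=0, N=60.
    h, d, s, i, n = 52, 0, 8, 0, 60
    assert h == n - d - s  # correct labels are those neither deleted nor substituted
    print(f"%Corr={percent_correct(h, n):.2f}, Acc={percent_accuracy(h, i, n):.2f}")
```

Because there are no insertions in this sample, the two figures coincide; with I > 0 the accuracy would fall below the percentage correct.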
