Text and Speech Analysis

Natural Language Processing
Politecnico di Milano, Polo di Como
Prof. Licia Sbattella
Student: Lorenzo Monni Sau
Student ID: 771378
Academic year 2012/2013
Assignment: Text & Speech Analysis

Table of contents

1. Introduction: Goals of the Assignment and Tools Used
2. Choice of the Dialogue and Text-to-Speech Alignment with SPPAS
3. Editing the Dialogue Tiers in Praat and Writing a Script for Processing
4. POS Tagging
5. Semantic Analysis with JWNL
6. Results and Main Statistics
7. Conclusions
8. Appendix: Lines of Code

1. Introduction: Goals of the Assignment and Tools Used

The objective of this work is to provide a complete analysis of a piece of conversation, covering the following aspects:
• phonological features of the dialogue and a brief statistical analysis;
• a subdivision into dialogue acts using the DAMSL model;
• the POS tagging of the dialogue;
• a brief semantic analysis;
• a graphical representation of the results.

Given these goals, the first step was the choice of a suitable dialogue for the analysis. The audio file of the dialogue, together with its written transcription, was given as input to SPPAS (Automatic Phonetic Annotation of Speech), a tool that aligns audio with text and also offers tokenization and phonetization features.

SPPAS produced the text aligned with the audio file, and this output was used as input to Praat, a tool for capturing audio features of speech such as pitch, intensity and formants. The alignment was manually edited in Praat to obtain the best match between transcription and audio, and a Praat script was then written to attach audio features and further annotations to each word in the .txt file.

The POS tagging part of the project was carried out with the Stanford University POS Tagger. After this phase the .txt file looked like a table with audio, dialogue and syntactic features associated with each word of the conversation.

The last part of the project was the semantic analysis of the dialogue, which used the JWNL Java library to query the WordNet lexical database. The graphical results were produced by importing the final .txt file into Microsoft Excel.

2. Choice of the Dialogue and Text-to-Speech Alignment with SPPAS

Choosing a suitable dialogue for the analysis was probably the hardest step of the assignment, because of the constraints imposed by the limited processing capabilities of SPPAS. My first idea was to use an artistically relevant dialogue, so I started with an excerpt from the film Eyes Wide Shut by Stanley Kubrick and tried to get the best possible alignment. SPPAS (version 1.4.8) does not perform well with:
• audio files longer than 2 minutes;
• film excerpts, which usually have considerable background noise;
• realistic and natural dialogues, because of overlapping voices, non-word phonemes and other imperfections.

The dialogue between Bill and Victor had all three characteristics, so it was almost impossible to obtain an acceptable alignment, even with subsequent editing in Praat. I tried to remove some noise and isolate the speech parts of the audio file using a simple MATLAB script (see the appendix for the code), but it did not work.

The second attempt was a dialogue from the Italian film Il Divo by Paolo Sorrentino, in which the speech seemed clearer and more fluid than in the previous one. SPPAS also supports dialogues in Italian. Unfortunately this audio file showed the same drawbacks as the previous one, even though I also tried to process the audio file in shorter fragments, as you can see in the folder.

The last attempt was a linear English educational dialogue between two girls, which worked really well in SPPAS. Despite its simplicity and linear interaction, it had a good level of emotional speech and was expressive enough for the purpose of the assignment. To enable a correct alignment with SPPAS, I also put hashes in the .txt file to mark the pauses in the dialogue.

This is another limitation of SPPAS: without the silences marked in the .txt file it could not provide a precise alignment. The resulting files are in the project folder “SPPAS Processing”.
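
The noise-attenuation attempt mentioned in this section can also be restated in Java, as a minimal sketch rather than a port of the appendix script. The class name `NoiseGate`, the method name and the sample values are illustrative assumptions. Unlike the MATLAB loop, this variant tests each window on the unmodified signal and scales every quiet sample exactly once, so overlapping windows do not compound the attenuation.

```java
final class NoiseGate {

    /**
     * Returns a copy of y in which every sample that lies inside at least one
     * "quiet" window (peak absolute value below threshold) is multiplied by
     * atten exactly once. Samples outside quiet windows are left unchanged.
     */
    static double[] attenuateQuiet(double[] y, int winLen, double threshold, double atten) {
        boolean[] quiet = new boolean[y.length];
        // Slide a window of winLen samples over the original signal
        for (int n = 0; n + winLen <= y.length; n++) {
            double peak = 0.0;
            for (int m = n; m < n + winLen; m++) {
                peak = Math.max(peak, Math.abs(y[m]));
            }
            if (peak < threshold) {
                for (int m = n; m < n + winLen; m++) {
                    quiet[m] = true;
                }
            }
        }
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            out[i] = quiet[i] ? y[i] * atten : y[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical samples: low-level noise followed by two loud speech samples
        double[] y = {0.01, 0.02, 0.01, 0.9, 0.8, 0.01};
        double[] cleaned = attenuateQuiet(y, 2, 0.1, 0.5);
        System.out.println(java.util.Arrays.toString(cleaned));
    }
}
```

A gate of this kind only helps when the noise is clearly quieter than the speech, which is the same condition stated in the comment of the original script.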

3. Editing the Dialogue Tiers in Praat and Writing a Script for Processing

Since the alignment produced by SPPAS was not precise, further editing in Praat was needed, moving boundaries and tokens to the right positions where necessary. The results of this editing were saved in the TextGrid file “dialogue-flat-phon_palign”, in the folder “Editing in Praat”. Two more tiers were added to the TextGrid file, one indicating the class of dialogue act (following the dialogue act classification proposed in the DAMSL model) and one indicating the speaker. The final TextGrid file contained the following tiers:
• PhonAlign Tier;
• PhnTokAlign Tier;
• TokensAlign Tier;
• DialogueAct Tier;
• Speaker.

In the next phase I moved from the Praat editor view to the Praat scripting language, to extract the required audio features associated with each word token in the dialogue. The Praat script “features.praat” takes the Wave file and the TextGrid file as input and produces a .txt file which shows:
• Word token;
• Mean pitch of the token;
• Mean intensity of the token;
• DialogueAct;
• Speaker.

The results were saved in the .txt file “conversation-audio” in the folder “Editing in Praat”.
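
To show how a table of this kind could be consumed by later steps, here is a minimal Java sketch that parses one row into its five fields. It assumes the columns are separated by runs of spaces and that dialogue act labels contain no spaces; the class name `FeatureRow` and the sample row are hypothetical and not taken from the project files.

```java
final class FeatureRow {
    final String token;
    final double meanPitchHz;
    final double meanIntensityDb;
    final String dialogueAct;
    final String speaker;

    private FeatureRow(String token, double meanPitchHz, double meanIntensityDb,
                       String dialogueAct, String speaker) {
        this.token = token;
        this.meanPitchHz = meanPitchHz;
        this.meanIntensityDb = meanIntensityDb;
        this.dialogueAct = dialogueAct;
        this.speaker = speaker;
    }

    /** Parses one row: token, mean pitch, mean intensity, dialogue act, speaker. */
    static FeatureRow parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("expected 5 columns, got " + f.length);
        }
        return new FeatureRow(f[0], number(f[1]), number(f[2]), f[3], f[4]);
    }

    /** Non-numeric entries (for example an undefined pitch) are mapped to NaN. */
    private static double number(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    public static void main(String[] args) {
        // Hypothetical row in the layout described above
        FeatureRow r = parse("hello          212.34  65.20  Info-Request        Amanda");
        System.out.println(r.token + " / " + r.speaker + " / " + r.meanPitchHz + " Hz");
    }
}
```

Mapping unparseable numbers to NaN instead of failing keeps unvoiced tokens in the table while making them easy to skip in later statistics.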

4. POS Tagging

The part-of-speech tag of each word in the dialogue was produced with the Stanford POS Tagger (version 3.2.0). The result of the tagging operation was stored in the file “conversation-tagged.txt”. A pretrained model was used to assign part-of-speech tags to the unlabeled text; the adopted model was “wsj-0-18-left3words-distsim”, included in the Stanford POS Tagger package. After the tagging I noticed some mistakes by the tagger, e.g. some nouns were recognized as verbs and vice versa, but the majority of words had the right tag.

5. Semantic Analysis with JWNL

JWNL is a Java API (Application Programming Interface) for accessing and querying the WordNet database. In this context JWNL was used to find the domain of each word token. I used version 2.0 of WordNet, version 1.4 of JWNL, and Eclipse as IDE with the Java 1.7 SDK and JRE 7 (Java Runtime Environment). To find the domain of each token I used the CATEGORY pointer type, and for the cases in which no related domain was found I wrote a function which recursively searches for the root hypernym. The Java project reads the .txt input file “conversation-tagged” in the folder “POS tagging” and writes the .txt file “dialogue-audio-pos-domains” as output.

One issue in this operation was that the CATEGORY pointer did not work for many tokens, and the recursive search for hypernyms returned base classes like “entity” or “abstraction”, which are too general for a semantic domain search. The final results of all the processing are stored in the Excel file “Dialogue Data” and in the flat .txt file “dialogue-audio-pos-domains-def”.

6. Results and Main Statistics

The data of the dialogue analysis were all imported into the Excel file “Dialogue Data”, which includes four different sheets:
• General Data: table with all fields and values;
• Speaker Pitch-Intensity: pitch and intensity data and graphics;
• Dialogue Acts: analysis of dialogue acts;
• Domains: analysis of domains.

Non-word utterances were not taken into account in the analysis, since the conversation contains only one non-word token.

[Figure: Pitch Trend By Speaker. Pitch (Hz) against token number, one series each for Amanda and Karen.]

[Figure: Intensity Trend By Speaker. Intensity (dB) against token number, one series each for Amanda and Karen.]
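
The per-speaker averages behind charts of this kind were computed in Excel. As a minimal sketch of the same calculation in Java, the method below averages a feature (pitch or intensity) per speaker and skips undefined values. The class name `SpeakerStats` and the numbers in `main` are made up for illustration and are not the measured data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpeakerStats {

    /**
     * Returns the mean of values grouped by speaker, ignoring NaN entries.
     * speakers[i] is the speaker of token i, values[i] its pitch or intensity.
     */
    static Map<String, Double> meanBySpeaker(String[] speakers, double[] values) {
        if (speakers.length != values.length) {
            throw new IllegalArgumentException("speakers and values differ in length");
        }
        // speaker -> {running sum, count}
        Map<String, double[]> acc = new LinkedHashMap<>();
        for (int i = 0; i < speakers.length; i++) {
            if (Double.isNaN(values[i])) {
                continue; // undefined measurement, e.g. an unvoiced token
            }
            double[] a = acc.get(speakers[i]);
            if (a == null) {
                a = new double[2];
                acc.put(speakers[i], a);
            }
            a[0] += values[i];
            a[1] += 1;
        }
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }

    public static void main(String[] args) {
        // Made-up pitch values in Hz, one per token
        String[] speakers = {"Amanda", "Karen", "Amanda", "Karen"};
        double[] pitch = {200.0, 300.0, 220.0, Double.NaN};
        System.out.println(meanBySpeaker(speakers, pitch));
    }
}
```

A speaker whose values are all undefined does not appear in the result, which avoids a division by zero.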

7. Conclusions

Because of the difficulties in SPPAS processing, the chosen dialogue is a very simple type of conversation, so the DAMSL analysis and the domain analysis did not yield significant results. The topic of the conversation is general, so there is no particular trend in the semantic domains of the word tokens. The conversation is evenly distributed: the two speakers have almost the same number of tokens.

The conversation shows slight variations in pitch, and the fundamental frequency of Amanda's voice is quite different from Karen's, reflecting the different timbre of the two speakers, although both always stay within the range of common female values. In the average pitch results there is a significant outlier associated with Amanda's expression “on friday”: the values of 97 and 107 Hz seem rather unrealistic for a female voice. The average intensity of the tokens shows that the volume of the dialogue remains constant during the conversation: there is no soft speech and the two speakers talk at the same volume (only 2 dB of difference).

The Praat analysis is probably the most reliable one, together with the POS tagging, whereas the analysis carried out with JWNL shows evident limits in recognizing the correct domains of speech. Most of the domains found are clearly wrong for this kind of dialogue; the reason is that knowledge of the context in which a word token occurs is needed to reach the right semantic domain.

The conversation between Amanda and Karen is a Q & A conversation, so it is no surprise that a high percentage of dialogue acts fall into the Answer and Info-Request types.

More pleasant expressions seem to have higher pitch and intensity, whereas action-directives, open-options and offers show a lower pitch and sometimes a lower intensity. This suggests that a speaker making a proposal probably wants to convey modesty and avoid the impression of imposing it.

8. Appendix: Lines of Code

MATLAB CODE

function [y_n] = remove_noise(y, win_len, mean_val, atten)
% This function performs a background noise attenuation, provided that the
% loudness difference between noise and original signal is high enough.
% y        = signal with noise
% win_len  = frame length used to calculate the noise impact
% mean_val = threshold which discriminates between noise and signal
% atten    = attenuation value to cut noise
for n = 1:(length(y)-win_len)
    if (sum(abs(y(n:(n+win_len-1)))) < mean_val*win_len && max(abs(y(n:n+win_len-1))) < mean_val)
        for m = n:n+win_len-1
            y(m) = y(m)*atten;
        end
    end
end
y_n = y;
end

PRAAT CODE

##### Script to extract features for each token #####
## print columns of the table ##
echo Token          MeanPitch  Intens.  DialogueAct         Speaker
select all
# sound file & TextGrid file to be analyzed #
s = selected("Sound")
tg = selected("TextGrid")
select tg
numIntervals = Get number of intervals... 3
### calculate Pitch and Intensity of Speech ###
select s
To Pitch... 0.0 75 600
select s
To Intensity... 75 0.0
plus Pitch dialogue-flat
pitch = selected ("Pitch")
intensity = selected("Intensity")
space$ = " "
for cont from 1 to numIntervals
    select TextGrid dialogue-flat-phon_palign
    token$ = Get label of interval... 3 cont
    tstart = Get starting point... 3 cont
    tend = Get end point... 3 cont
    dialogueActNum = Get interval at time... 4 tstart+0.01
    dialogueAct$ = Get label of interval... 4 dialogueActNum
    speakerNum = Get interval at time... 5 tstart+0.01
    speaker$ = Get label of interval... 5 speakerNum
    # for each non-silence token extract mean pitch & mean intensity #
    if !startsWith (token$, "#")
        select pitch
        pitchMean = Get mean... tstart tend Hertz
        select intensity
        intensityMean = Get mean... tstart tend dB
        ### configure layout: pad the token to 15 characters ###
        lenStr = length(token$)
        spaceNum = 15 - lenStr
        print 'token$'
        for lung from 1 to spaceNum
            print 'space$'
        endfor
        # the original column spacing was lost; two spaces are used here
        print 'pitchMean:2''space$''space$''intensityMean:2''space$''space$'
        ### configure layout: pad the dialogue act to 20 characters ###
        lenStr2 = length(dialogueAct$)
        spaceNum2 = 20 - lenStr2
        print 'dialogueAct$'
        for lung from 1 to spaceNum2
            print 'space$'
        endfor
        print 'speaker$'
        printline
    endif
endfor
### Save data in txt file ###
appendFile ("conversation-audio.txt", info$ ())

JWNL CODE

package wordnet;

import java.io.*;

import net.didion.jwnl.JWNL;
import net.didion.jwnl.JWNLException;
import net.didion.jwnl.JWNLRuntimeException;
import net.didion.jwnl.data.IndexWord;
import net.didion.jwnl.data.POS;
import net.didion.jwnl.data.Pointer;
import net.didion.jwnl.data.PointerType;
import net.didion.jwnl.data.Synset;
import net.didion.jwnl.data.Word;
import net.didion.jwnl.dictionary.Dictionary;

public class WordSem {

    public static void main(String[] args) throws JWNLException, IOException, JWNLRuntimeException {
        // Initialize JWNL with the properties file that points to the dictionary files
        JWNL.initialize(new FileInputStream("file_properties.xml"));
        // After initialization, get the Dictionary object that can be queried
        Dictionary wordnet = Dictionary.getInstance();

        // Source file with the POS-tagged words to be searched on WordNet
        String read_path = "D:\\Ultimo semestre\\Natural Language Processing\\ASSIGNMENT\\conversation\\POS tagging\\conversation-tagged.txt";
        BufferedReader br = new BufferedReader(new FileReader(read_path));

        // Output file with one "Token POS Domain" line for each token
        String write_path = "D:\\Ultimo semestre\\Natural Language Processing\\ASSIGNMENT\\conversation\\dialogue-audio-pos-domains.txt";
        FileWriter file_write = new FileWriter(new File(write_path));

        String read_linea;       // line read from the source file
        String wordn;            // word token from the source file
        String word_POS;         // POS tag from the source file
        POS wnPOS;               // POS tag in WordNet format
        String strdomain;        // domain string related to the word token

        // While there are lines in the source file, take word token and POS tag
        while ((read_linea = br.readLine()) != null) {
            // "_" is the separator between word and tag in the source file
            String[] splits = read_linea.split("_");
            if (splits.length < 2) continue; // skip lines without a tag
            wordn = splits[0];
            System.out.println(wordn);
            word_POS = splits[1];
            System.out.println(word_POS);

            // begin the line of the output txt file
            StringBuilder write_appnd = new StringBuilder();
            write_appnd.append(wordn).append(" ").append(word_POS).append(" ");

            // translate from POS tag to WordNet word type
            wnPOS = getWordNetPOS(word_POS);

            // WordNet analysis: check for word domains and for hypernyms
            if (wnPOS != null) {
                // An IndexWord is a single word and part of speech
                IndexWord w = wordnet.lookupIndexWord(wnPOS, wordn);
                if (w != null) {
                    Synset[] senses = w.getSenses();
                    for (int i = 0; i < senses.length; i++) {
                        // CATEGORY is the pointer type for the domains
                        Pointer[] domain = senses[i].getPointers(PointerType.CATEGORY);
                        for (int l = 0; l < domain.length; l++) {
                            // obtain the synset of the domain and then an associated word string
                            Word rootWord = domain[l].getTargetSynset().getWord(0);
                            strdomain = rootWord.getLemma();
                            // a space keeps consecutive domains apart in the output
                            write_appnd.append(strdomain).append(" ");
                        }
                    }
                    // get to the root hypernym
                    if (wnPOS == POS.NOUN) {
                        strdomain = getRootHypernym(w);
                        write_appnd.append(strdomain);
                    }
                }
            }
            // finish the line, then move on to the next one
            write_appnd.append("\r\n");
            file_write.write(write_appnd.toString());
        }
        file_write.close();
        br.close();
    }

    // translate from POS tag to WordNet word type
    public static POS getWordNetPOS(String wPOS) {
        POS wordNetPos;
        switch (wPOS) {
            case "NN": case "NNS": case "NNP":
                wordNetPos = POS.NOUN;
                break;
            case "VB": case "VBD": case "VBG": case "VBN": case "VBP": case "VBZ":
                wordNetPos = POS.VERB;
                break;
            case "JJ": case "JJR": case "JJS":
                wordNetPos = POS.ADJECTIVE;
                break;
            case "RB": case "RBR": case "RBS":
                wordNetPos = POS.ADVERB;
                break;
            default:
                wordNetPos = null;
        }
        return wordNetPos;
    }

    // search for the root hypernym, following the first sense only
    public static String getRootHypernym(IndexWord synsetw) throws JWNLException {
        Synset[] senses = synsetw.getSenses();
        if (senses.length == 0) return "";
        Synset syndomain = senses[0];
        Pointer[] domain = syndomain.getPointers(PointerType.HYPERNYM);
        // climb the first hypernym link until a synset without hypernyms is reached;
        // a sense with no hypernym at all yields its own first lemma instead of failing
        while (domain.length > 0) {
            syndomain = domain[0].getTargetSynset();
            domain = syndomain.getPointers(PointerType.HYPERNYM);
        }
        Word rootWord = syndomain.getWord(0);
        String stringdomain = rootWord.getLemma();
        System.out.println(stringdomain);
        return stringdomain;
    }
}
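
The root-hypernym search used in the JWNL code can be illustrated without WordNet installed. The following is a minimal sketch on a toy in-memory hierarchy: the class name `HypernymWalk`, the map and the word chain are hypothetical stand-ins for the JWNL pointer calls, not real WordNet data. It returns the whole path instead of only the root, and it guards against cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HypernymWalk {

    /**
     * Follows the hypernym link of each word until a word without a hypernym
     * is reached. Returns the visited words in order; the last one is the root.
     */
    static List<String> pathToRoot(String start, Map<String, String> hypernymOf) {
        List<String> path = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = start;
        // seen.add returns false on a repeated word, which stops a cyclic walk
        while (current != null && seen.add(current)) {
            path.add(current);
            current = hypernymOf.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy hierarchy, invented for this example
        Map<String, String> hypernymOf = new HashMap<>();
        hypernymOf.put("friday", "weekday");
        hypernymOf.put("weekday", "day");
        hypernymOf.put("day", "abstraction");
        hypernymOf.put("abstraction", "entity");

        List<String> path = pathToRoot("friday", hypernymOf);
        System.out.println("path: " + path);
        System.out.println("root: " + path.get(path.size() - 1));
    }
}
```

Keeping the full path makes it possible to pick an ancestor a few levels below the root, which may be less general than labels such as “entity”. This is only a possible direction and was not part of the project.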
