LIT (Lexicon of the Italian Television) is a project conceived by the Accademia della Crusca, the leading research institution on the Italian language, in collaboration with CLIEO (Center for theoretical and historical Linguistics: Italian, European and Oriental languages), with the aim of studying the frequencies of the Italian lexicon used in television content; it targets the specific sector of web applications for linguistic research. The corpus of transcriptions consists of approximately 170 hours of randomly sampled television recordings broadcast by the national broadcaster RAI (Italian Radio Television) during 2006.
Realization of natural language interfaces using...
The document discusses research on using lazy functional programming (LFP) to build natural language interfaces (NLIs). LFP involves delaying evaluation of function arguments until needed. Over 45 researchers have investigated using LFP for NLI design and implementation due to similarities between some linguistic theories and LFP theories. The research has resulted in over 60 papers on using LFP for natural language processing tasks like syntactic and semantic analysis. The paper provides a comprehensive survey of this research area at the intersection of computer science and computational linguistics.
ATAR: Attention-based LSTM for Arabizi transliteration
A non-standard romanization of Arabic script, known as Arabizi, is widely used in Arabic online and SMS/chat communities. However, since state-of-the-art tools and applications for Arabic NLP expect Arabic to be written in Arabic script, handling content written in Arabizi requires special attention, either by building customized tools or by transliterating it into Arabic script. The latter approach is the more common one, and this work presents two significant contributions in this direction. The first is to collect and publicly release the first large-scale “Arabizi to Arabic script” parallel corpus, focusing on the Jordanian dialect and consisting of more than 25k pairs carefully created and inspected by native speakers to ensure the highest quality. Second, we present ATAR, an ATtention-based LSTM model for ARabizi transliteration. Training and testing this model on our dataset yields an impressive accuracy (79%) and BLEU score (88.49).
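As a rough illustration of the reported metric (not the authors' exact evaluation setup), transliteration output can be scored with character-level BLEU using NLTK; the example pairs below are invented placeholders, not taken from the ATAR corpus:

```python
# Hypothetical example: scoring Arabizi-to-Arabic transliteration output with
# character-level BLEU via NLTK. The (gold, predicted) pairs are invented.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def char_bleu(reference: str, hypothesis: str) -> float:
    """BLEU over character tokens; smoothing avoids zero scores on short strings."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([list(reference)], list(hypothesis), smoothing_function=smooth)

pairs = [("مرحبا", "مرحبا"), ("كيف حالك", "كيف هالك")]  # (gold, predicted)
scores = [char_bleu(ref, hyp) for ref, hyp in pairs]
print(f"mean character-level BLEU: {sum(scores) / len(scores):.3f}")
```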
Ontology Integration and Interoperability (OntoIOp) – Part 1: The Distributed...
The document discusses the Distributed Ontology Language (DOL), a proposed standard being developed by ISO for expressing heterogeneous ontologies and links between ontologies. DOL aims to achieve semantic integration and interoperability across knowledge representations. It will have a formal semantics and support multiple serialization formats. The standard is being developed to facilitate communication and reduce complexity for applications involving multiple ontologies.
Muhammad Adil Raja is a researcher interested in machine learning and its applications. He holds a PhD from the University of Limerick and has worked as a postdoctoral researcher at Orange Labs. His research focuses on developing machine learning models for tasks like speech quality estimation, network impairment characterization, and computational neuroscience. He has extensive experience developing machine learning software and has authored several research proposals applying machine learning.
An Intersemiotic Translation of Normative Utterances to Machine Language
This document discusses translating normative utterances expressed in natural language to machine language through programming. It begins by providing context on intersemiotic translation and programming languages. It then distinguishes between regulative and constitutive rules, and explores how each could be translated to Java code. For regulative rules, which regulate pre-existing behaviors, an example is given of coding a rule that turns on a light when motion is detected. For constitutive rules, which define new types of behaviors, an example examines how an object-oriented programming approach could represent real-world devices and their functions through defined classes and properties.
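A minimal sketch of the motion-triggered-light example follows, written in Python rather than the paper's Java, with invented class names:

```python
# Minimal Python analogue of the paper's regulative-rule example:
# "when motion is detected, turn the light on". Class and method names
# are illustrative inventions, not the paper's actual code.
class Light:
    def __init__(self) -> None:
        self.on = False

    def turn_on(self) -> None:
        self.on = True

class MotionSensor:
    def __init__(self, light: Light) -> None:
        self.light = light

    def detect_motion(self) -> None:
        # The regulative rule: a pre-existing behavior (switching the
        # light) is regulated by the sensed condition.
        self.light.turn_on()

room_light = Light()
sensor = MotionSensor(room_light)
sensor.detect_motion()
assert room_light.on
```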
Mediterranean Arabic Language and Speech Technology Resources
The MEDAR survey collected responses from 57 players involved in human language technologies and language resources for Arabic. The responses came from a variety of countries, with the highest numbers coming from Egypt (11), Morocco (10), and West Bank & Gaza Strip (9). The majority of respondents (36) answered on behalf of institutions, while 17 answered as independent experts. The survey gathered information on the respondents' profiles, language resources, needs for resources, and market information to understand the current state of language technologies for Arabic.
This research paper provides an overview of software engineering and object-oriented design and programming. It discusses the history and evolution of software engineering, from early frameworks and methodologies to more modern agile and experimental approaches. Object-oriented concepts like classes, objects, inheritance and polymorphism are explained. Key object-oriented programming languages like Java, C++ and Python are also covered. The paper then focuses on constructs in Java, including classes, objects, methods, data types and control flow statements.
This document provides a summary of Satanjeev Banerjee's education, work experience, areas of research interest, publications, software projects, and references. It details his PhD studies in language technologies at Carnegie Mellon University, as well as his master's degrees from CMU and the University of Minnesota Duluth. His work experience includes research assistantships at CMU working on topics like speech summarization and meeting understanding. He has numerous publications in these areas and has developed software like the SmartNotes meeting recording system and the METEOR machine translation evaluation metric.
The document discusses the field of computational linguistics, defining it as the scientific study of language from a computational perspective. It involves providing computational models of linguistic phenomena and using computational techniques and linguistic theories to solve problems in natural language processing. Computational linguistics aims to automatically process and understand natural language by constructing computer programs. The field has its roots in the 1940s-1950s with the development of code breaking machines and computers. Major conferences and journals in the field are associated with the Association for Computational Linguistics.
This document provides an overview of the EXCITEMENT project, which aims to develop an open platform for multi-lingual textual inference. It involves academic and industrial partners working on two main goals: (1) developing algorithms and resources for multi-lingual textual inference, and (2) applying inference techniques to analyze customer interactions in multiple languages and channels. The work is organized into several work packages focusing on requirements, data collection, architecture, platform development, evaluation, and dissemination over a three year period.
This paper presents an audio personalization framework for mobile devices. The multimedia models MPEG-21 and MPEG-7 are used to describe metadata information, and the metadata supporting personalization are stored on each device. The Web Ontology Language (OWL) is used to produce and manipulate the corresponding ontological descriptions. The process is distributed according to the MapReduce framework and implemented on the Android platform. It determines a hierarchical system structure consisting of Master and Worker devices; the Master retrieves a list of audio tracks matching specific criteria using SPARQL queries.
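A minimal sketch of the Master's SPARQL-based track selection, using rdflib with an invented vocabulary standing in for the paper's MPEG-7/OWL descriptions:

```python
# Sketch of SPARQL-based track selection with rdflib. The vocabulary
# (ex:AudioTrack, ex:genre) is an invented stand-in for the paper's
# MPEG-7/OWL metadata descriptions.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/audio#")
g = Graph()
g.add((EX.track1, RDF.type, EX.AudioTrack))
g.add((EX.track1, EX.genre, Literal("jazz")))
g.add((EX.track2, RDF.type, EX.AudioTrack))
g.add((EX.track2, EX.genre, Literal("rock")))

query = """
PREFIX ex: <http://example.org/audio#>
SELECT ?track WHERE {
    ?track a ex:AudioTrack ;
           ex:genre "jazz" .
}
"""
for row in g.query(query):
    print(row.track)  # prints the URIs of matching tracks
```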
This document describes a chatbot project that was developed to answer queries from UT-Dallas students. The chatbot uses natural language processing and a domain-specific knowledge base about UT-Dallas and its library to analyze user queries and generate relevant responses. It was implemented as an Android application using speech recognition and contains modules for tokenization, sentiment analysis, and pattern matching to understand and respond to queries. The document outlines the architecture, knowledge representation, matching algorithm, and provides examples of conversations with the chatbot.
The document provides an overview of the course "Statistical Methods in Computational Linguistics" which covers topics such as basic probability theory, n-gram language modeling, information theory, machine learning techniques for part-of-speech tagging, and statistical machine translation. It discusses the history and reasons for the rise of empirical methods in natural language processing, including the availability of large corpora and computing resources. The course will use Python and its NLTK library for programming exercises and possibly WEKA for small machine learning experiments.
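As a toy illustration of the n-gram modeling such a course covers (not course material itself), a bigram model with add-one smoothing can be built in a few lines of Python:

```python
# Toy bigram language model with add-one (Laplace) smoothing, illustrating
# the kind of n-gram estimation covered by the course.
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigrams)

def p_bigram(w1: str, w2: str) -> float:
    """P(w2 | w1) with add-one smoothing over the vocabulary."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

print(f"P(cat | the) = {p_bigram('the', 'cat'):.3f}")  # (2+1)/(3+6) = 0.333
```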
The main principles of text-to-speech synthesis system
This document discusses text-to-speech synthesis systems. It provides background on the history and development of such systems over three generations from 1962 to the present. It describes some of the main challenges in developing speech synthesis for different languages. The document then focuses on specifics of the Azerbaijani language and outlines the approach used in the text-to-speech synthesis system developed by the authors, which combines concatenative synthesis and formant synthesis methods.
The document provides an overview of computational linguistics and its various applications. It defines computational linguistics as the intersection between linguistics and computer science concerned with computational aspects of human language. Some key applications include developing software for tasks like grammar correction, word sense disambiguation, automatic translation, and more. Large linguistic corpora and techniques like part-of-speech tagging, parsing, and machine learning have allowed computational linguistics to make advances in natural language processing.
A NOVEL APPROACH OF CLASSIFICATION TECHNIQUES FOR CLIR
Recent and continuing advances in online information systems are creating many opportunities and also new problems in information retrieval. Gathering information in different natural languages is a difficult task that often requires huge resources. Cross-language information retrieval (CLIR) is the retrieval of information in other languages for a query written in the user's native language. This paper deals with various classification techniques that can be used for solving the problems encountered in CLIR.
This document provides an overview of natural language processing (NLP) and its history, current applications, and future potential. It discusses how NLP systems work to understand human language by parsing sentences and querying databases. While NLP systems have improved greatly in recent decades, challenges remain around syntactic variation, inputs that exceed a system's rules or concepts, and fully achieving speech recognition or language translation. The future of NLP looks promising for more advanced text analysis, generation, and translation capabilities.
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
In this paper, phoneme sequences are used as language information to perform code-switched language identification (LID). With a one-pass recognition system, the spoken sounds are converted into phonetically arranged sequences. The acoustic models are robust enough to handle multiple languages by emulating multiple hidden Markov models (HMMs). To determine the phoneme similarity among our target languages, we report two methods of phoneme mapping. Statistical phoneme-based bigram language models (LMs) are integrated into speech decoding to eliminate possible phone mismatches. A supervised support vector machine (SVM) learns to recognize the phonetic information of mixed-language speech based on recognized phone sequences. As the back-end decision is taken by the SVM, the likelihood scores of segments with monolingual phone occurrence are used to classify language identity. The speech corpus covers Sepedi and English, languages that are often mixed. Our system is evaluated by measuring the ASR performance and the LID performance separately. The systems obtained a promising ASR accuracy with a data-driven phone merging approach modelled using 16 Gaussian mixtures per state, and achieved acceptable ASR and LID accuracy on both code-switched and monolingual speech segments.
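A hedged sketch of the SVM back-end described above, using scikit-learn over phone n-gram counts; the phone strings and labels are invented placeholders, not the Sepedi-English corpus:

```python
# Sketch of an SVM language-identification back-end over recognized phone
# sequences, in the spirit of the paper's classifier. Data is invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Recognized phone sequences (space-separated phones) with language labels.
phone_seqs = ["p e d i l e", "th e k ae t", "d i l e p e", "k ae t s ae t"]
labels = ["sepedi", "english", "sepedi", "english"]

# Phone unigrams and bigrams stand in for phonotactic features.
clf = make_pipeline(
    CountVectorizer(analyzer="word", ngram_range=(1, 2), token_pattern=r"\S+"),
    LinearSVC(),
)
clf.fit(phone_seqs, labels)
print(clf.predict(["p e d i l e"]))  # predicts 'sepedi' for this training example
```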
Review of plagiarism detection and control & copyrights in India
Plagiarism software has been in use for almost a decade to detect "theft of intellectual property". However, in today's digital world, easy access to the web, large databases, and telecommunication in general has turned plagiarism into a serious problem for publishers, researchers, and educational institutions. The first part of this paper discusses different plagiarism detection techniques, such as text-based, citation-based, and shape-based, compares them with respect to their features and performance, and gives an overview of the plagiarism detection methods used for text documents. The second half of the paper is fully dedicated to copyrights in India.
SOFIA - A Smart-M3 lab course: approach and design style to support student p...
This document describes a lab course on smart spaces and ambient intelligence that was included in a postgraduate degree program. The course asked student groups to propose, develop, and demonstrate simple smart environment applications using Smart-M3, an open source interoperability platform. The course design was developed around the preliminary results of the European Project SOFIA. Students were introduced to key concepts and tools like semantic data representation using RDF and OWL ontologies. They then developed projects using a provided design style and application development process.
BOOTSTRAPPING METHOD FOR DEVELOPING PART-OF-SPEECH TAGGED CORPUS IN LOW RESOU...
Most languages, especially in Africa, have few or no established part-of-speech (POS) tagged corpora. However, a POS tagged corpus is essential for natural language processing (NLP) to support advanced research such as machine translation, speech recognition, etc. Even where there is no POS tagged corpus, there are some languages for which parallel texts are available online. The task of POS tagging a new language corpus with a new tagset usually faces a bootstrapping problem at the initial stages of the annotation process. The unavailability of automatic taggers to help the human annotator makes it appear infeasible to quickly produce adequate amounts of POS tagged corpus for advanced NLP research and for training the taggers. In this paper, we demonstrate the efficacy of a POS annotation method that employs two automatic approaches to assist POS tagged corpus creation for a language new to NLP. The two approaches are cross-lingual and monolingual POS tag projection. We used the cross-lingual approach to automatically create an initial ‘errorful’ tagged corpus for a target language via word alignment; the resources for creating this are derived from a source language rich in NLP resources. The monolingual method is applied to clean the noise induced by the alignment process and to transform the source language tags into the target language tags. We used English and Igbo as our case study, which is possible because parallel texts exist between English and Igbo and the source language, English, has NLP resources available. The results of the experiment show a steady improvement in accuracy and rate of tag transformation, with score ranges of 6.13% to 83.79% and 8.67% to 98.37% respectively. The rate of tag transformation measures the rate at which source language tags are translated to target language tags.
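A minimal illustration of the cross-lingual projection step, with an invented sentence pair and alignment in place of the paper's English-Igbo data:

```python
# Minimal illustration of cross-lingual POS projection through word
# alignments. The sentence pair and alignment are invented placeholders.
english_tags = [("the", "DET"), ("house", "NOUN"), ("is", "VERB"), ("big", "ADJ")]
igbo_tokens = ["ụlọ", "ahụ", "buru", "ibu"]

# Word alignment: (source index, target index) pairs, e.g. from an aligner.
alignment = [(1, 0), (0, 1), (2, 2), (3, 3)]

projected = ["UNK"] * len(igbo_tokens)
for src, tgt in alignment:
    # Project the source-language tag onto the aligned target word.
    projected[tgt] = english_tags[src][1]

print(list(zip(igbo_tokens, projected)))
# [('ụlọ', 'NOUN'), ('ahụ', 'DET'), ('buru', 'VERB'), ('ibu', 'ADJ')]
```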
Computational linguistics originated in the 1950s with a focus on machine translation. It aims to use computers to process and understand human language. There are two main goals - theoretical, to better understand how humans use language; and practical, to build tools like machine translation. It draws from linguistics, computer science, psychology, and logic. Some key applications include spelling/grammar checkers, style checkers, information retrieval systems, and computer-assisted language learning tools. However, natural language processing poses challenges due to issues with phonology, morphology, syntax, semantics, and pragmatics.
The document discusses the field of computational linguistics. It defines computational linguistics as the scientific study of language from a computational perspective, involving linguists, computer scientists, and others. The history of computational linguistics is closely tied to the development of digital computers and early applications included machine translation. Computational linguistics research includes areas like speech recognition, natural language processing, and machine translation.
Development and evaluation of a web based question answering system for arabi...
Question Answering (QA) systems are gaining great importance due to the increasing amount of web content and the high demand for digital information that regular information retrieval techniques cannot satisfy. A question answering system enables users to have a natural language dialog with the machine, which is required for virtually all emerging online service systems on the Internet. The need for such systems is higher in the context of the Arabic language. This is because of the scarcity of Arabic QA systems, which can be attributed to the great challenges they present to the research community, including the particularities of Arabic, such as short vowels, absence of capital letters, complex morphology, etc. In this paper, we report the design and implementation of an Arabic web-based question answering system, which we called “JAWEB”, the Arabic word for the verb “answer”. Unlike other Arabic question-answering systems, JAWEB is a web-based application, so it can be accessed at any time and from anywhere. Evaluating JAWEB showed that it gives the correct answer with 100% recall and 80% precision on average. When compared to ask.com, the well-established web-based QA system, JAWEB provided 15-20% higher recall. These promising results give clear evidence that JAWEB has great potential as a QA platform and is much needed by Arabic-speaking Internet users across the world.
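For reference, recall and precision figures like those above are computed as follows; the answer sets are invented placeholders:

```python
# How precision and recall figures like JAWEB's are computed.
def precision_recall(returned: set, relevant: set) -> tuple[float, float]:
    correct = len(returned & relevant)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

returned_answers = {"a1", "a2", "a3", "a4", "a5"}  # system output
gold_answers = {"a1", "a2", "a3", "a4"}            # reference answers
p, r = precision_recall(returned_answers, gold_answers)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=1.00
```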
Text mining is the process of extracting interesting and non-trivial knowledge or information from unstructured text data. It is a multidisciplinary field which draws on data mining, machine learning, information retrieval, computational linguistics, and statistics. Important text mining processes are information extraction, information retrieval, natural language processing, text classification, content analysis, and text clustering. All these processes require a preprocessing step to be completed before performing their intended task. Pre-processing significantly reduces the size of the input text documents; the actions involved in this step are sentence boundary determination, language-specific stop-word elimination, tokenization, and stemming. Among these, the most essential action is tokenization, which divides the textual information into individual words. Many open source tools are available for performing tokenization. The main objective of this work is to analyze the performance of seven open source tokenization tools: Nlpdotnet Tokenizer, Mila Tokenizer, NLTK Word Tokenize, TextBlob Word Tokenize, MBSP Word Tokenize, Pattern Word Tokenize, and Word Tokenization with Python NLTK. Based on the results, we observed that the Nlpdotnet Tokenizer performs better than the other tools.
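A small sketch of the comparison idea, running two of the surveyed tokenizers on the same sentence (assuming NLTK and TextBlob are installed):

```python
# Comparing two of the surveyed tokenizers on one sentence. NLTK's
# word_tokenize needs the 'punkt' models downloaded; TextBlob exposes
# tokens through its .words property. Differences in clitic handling
# ("isn't") are the kind of behavior such a survey compares.
import nltk
from nltk.tokenize import word_tokenize
from textblob import TextBlob

nltk.download("punkt", quiet=True)

sentence = "Text mining isn't trivial: tokenization comes first."
print("NLTK:    ", word_tokenize(sentence))
print("TextBlob:", TextBlob(sentence).words)
```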
This document provides an outline and details of a student internship project on text-to-speech conversion using the Python programming language. The project was conducted at iPEC Solutions, which provides AI training and services. The student designed a text-to-speech system using tools including Praat, Audacity, and WaveSurfer. The system converts text to speech by extracting phonetic components, matching them to inventory items, and generating acoustic signals for output. The project aimed to help those with communication difficulties through improved accessibility of text-to-speech technology.
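As a minimal sketch of a text-to-speech call in Python (using the pyttsx3 library as a stand-in; the project itself used Praat, Audacity, and WaveSurfer for its unit inventory, not this library):

```python
# Minimal offline text-to-speech call with pyttsx3, illustrating the kind
# of pipeline the internship project builds. This library is a stand-in,
# not the project's actual toolchain.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)   # speaking rate in words per minute
engine.say("Text to speech improves accessibility.")
engine.runAndWait()               # block until playback finishes
```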
The document is a seminar report on text-to-speech conversion submitted by Nadiminti Saroja Kumar to the Department of Electrical Engineering at Gandhi Institute of Engineering and Technology. It includes an abstract, table of contents, acknowledgements, and introduction on text-to-speech conversion and the history and development of speech synthesis techniques. The report aims to map the current state of speech synthesis technology and methods in preparation for a larger audiovisual speech synthesis project planned from 1998-2000.
This document discusses computational linguistics, including its origins, main application areas, and approaches. Computational linguistics originated from efforts in the 1950s to use computers for machine translation between languages. It aims to develop natural language processing applications like machine translation, speech recognition, and grammar checking. Research employs various approaches including developmental, structural, production-focused, and comprehension-focused methods. The field involves both computer science and linguistics.
The document summarizes a project to incorporate subject areas into the Apertium machine translation system used by the Universitat Oberta de Catalunya (UOC) virtual university. The project aimed to improve translation quality by adding subject area filters to disambiguate terms. It involved analyzing UOC subject areas, creating dictionaries of terms tagged with subject areas, and implementing subject filters in the interface. The filters allow users to select a subject area to generate translations tailored to that field's terminology. The project benefits UOC administrative staff, linguists, teachers, students and other users by producing more accurate translations for different domains.
This document summarizes the Monnet project, which aims to provide multilingual access to structured knowledge through an ontology-lexicon model called lemon. The project has several partners from both industry and research. Lemon represents lexical information relative to ontologies to support tasks like label translation and information extraction across languages. It draws on standards like the Lexical Markup Framework and relates ontologies to multilingual lexicons through a core path of lexicon, lexical entry, sense, reference, form, and representation. Future work includes applying these techniques to domains like financial reporting.
Public annual report for the MOLTO project for year 2011. MOLTO is funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement FP7-ICT-247914.
VIDEO OBJECTS DESCRIPTION IN HINDI TEXT LANGUAGE
Video activity recognition has become a dynamic area of research in recent years. This paper presents a data-driven approach that turns video content into a textual description in the Hindi language. The method combines the output of modern object detection with "real-world" knowledge to select the most suitable subject-verb-object (SVO) triplet for describing a video. Using this triplet-selection technique, a video is tagged by the trainer with a subject, verb, and object, and this data is mined to improve the results of test-time video description through activity and object identification. Unlike previous approaches, this method can annotate arbitrary videos without requiring the collection and annotation of a similar training video corpus. The proposed work provides an initial, basic text description in the Hindi language, producing simple words and sentence formation; the main challenge in this work is to extract grammatically correct and expressive Hindi text describing the video content.
Modeling of Speech Synthesis of Standard Arabic Using an Expert System
This document describes an expert system for speech synthesis of Standard Arabic text. It involves two main stages: 1) creation of a sound database and 2) text-to-speech transformation. The transformation process involves phonetic orthographic transcription of the text and then generating voice signals corresponding to the transcribed phonetic sequence. The expert system uses a knowledge base containing sound data and rewriting rules. It transcribes text using graphemes as basic units and then concatenates sound units from the database to synthesize speech. Tests achieved a 96% success rate in pronouncing sentences correctly. Future work aims to improve prosody and develop fully automatic signal segmentation.
The Evaluation of a Code-Switched Sepedi-English Automatic Speech Recognition...
Speech technology is a field that encompasses various techniques and tools used to enable machines to interact with speech, such as automatic speech recognition (ASR), spoken dialog systems, and others, allowing a device to capture spoken words through a microphone from a human speaker. End-to-end approaches such as Connectionist Temporal Classification (CTC) and attention-based methods are the most used for the development of ASR systems. However, these techniques have commonly been applied to high-resourced languages with large amounts of speech data for training and evaluation, leaving low-resource languages relatively underdeveloped. While the CTC method has been successfully used for other languages, its effectiveness for the Sepedi language remains uncertain. In this study, we present the evaluation of a Sepedi-English code-switched automatic speech recognition system. This end-to-end system was developed using the Sepedi Prompted Code Switching corpus and the CTC approach. The performance of the system was evaluated using both the NCHLT Sepedi test corpus and the Sepedi Prompted Code Switching corpus. The model produced the lowest WER of 41.9%; however, it faced challenges in recognizing Sepedi-only text.
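Word error rate, the metric reported above, is the word-level edit distance divided by the reference length; a self-contained sketch with invented sentences:

```python
# Word error rate (WER): Levenshtein distance over words divided by the
# number of reference words. The example sentences are invented.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Edit distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(f"WER = {wer('ke a leboga thank you', 'ke leboga thank you'):.2f}")  # 0.20
```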
In the realm of artificial intelligence (AI), speech recognition has emerged as a transformative technology, enabling machines to understand and interpret human speech with remarkable accuracy. At the heart of this technological revolution lies the availability and quality of speech recognition datasets, which serve as the building blocks for training robust and efficient speech recognition models.
A speech recognition dataset is a curated collection of audio recordings paired with their corresponding transcriptions or labels. These datasets are essential for training machine learning models to recognize and comprehend spoken language across various accents, dialects, and environmental conditions. The quality and diversity of these datasets directly impact the performance and generalisation capabilities of speech recognition systems.
The importance of high-quality speech recognition datasets cannot be overstated. They facilitate the development of more accurate and robust speech recognition models by providing ample training data for machine learning algorithms. Moreover, they enable researchers and developers to address challenges such as speaker variability, background noise, and linguistic nuances, thus enhancing the overall performance of speech recognition systems.
One of the key challenges in building speech recognition datasets is the acquisition of diverse and representative audio data. This often involves recording a large number of speakers from different demographic backgrounds, geographic regions, and language proficiency levels. Additionally, the audio recordings must capture a wide range of speaking styles, contexts, and environmental conditions to ensure the robustness and versatility of the dataset.
Another crucial aspect of speech recognition datasets is the accuracy and consistency of the transcriptions or labels. Manual transcription of audio data is a labor-intensive process that requires linguistic expertise and meticulous attention to detail. To ensure the reliability of the dataset, transcriptions must be verified and validated by multiple annotators to minimise errors and inconsistencies.
The availability of open-source speech recognition datasets has played a significant role in advancing research and innovation in the field of AI speech technology. Projects such as the LibriSpeech dataset, CommonVoice dataset, and Google's Speech Commands dataset have provided researchers and developers with access to large-scale, annotated audio datasets, fostering collaboration and accelerating progress in speech recognition research.
Furthermore, initiatives aimed at crowdsourcing speech data, such as Mozilla's Common Voice project, have democratised the process of dataset creation by enabling volunteers from around the world to contribute their voice recordings. This approach not only helps to diversify the dataset but also empowers individuals to participate in the development of AI technologies that directly impact their lives.
An Open Online Dictionary for Endangered Uralic Languages
This document describes an online dictionary for endangered Uralic languages built using MediaWiki. The dictionary synchronizes edits made in XML dictionaries and in the MediaWiki system. It incorporates new data like combined dictionary entries and semantic databases. The dictionary interface allows browsing cognate relations and meaning groups. The system aims to make lexicographic resources more accessible while integrating community contributions.
Efficient Intralingual Text To Speech Web Podcasting And Recording
This document describes a web browser application that converts text to speech. The key features are:
1. The browser can open different file formats (e.g. doc, pdf) and read the text aloud, reducing reading effort.
2. It includes a text-to-speech converter, recorder to save audio, and image-based history with timestamps.
3. The project aims to combine online content browsing with text-to-speech in a single application, addressing limitations of separate browser and text converter tools.
IRJET- Text to Speech Synthesis for Hindi Language using Festival Framework
This document describes a text-to-speech synthesis system for the Hindi language developed using the Festival framework. The system takes Hindi text as input and outputs synthesized speech. It uses a syllable-based concatenative approach where Hindi words are segmented into syllables which are then matched to recorded audio files and concatenated to generate speech. Challenges in developing text-to-speech for Hindi include accurate pronunciation rules and producing natural prosody. The system aims to improve the naturalness of synthesized Hindi speech output.
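The concatenation step can be sketched with Python's standard-library wave module; the syllable file names are invented placeholders for a recorded unit inventory:

```python
# Sketch of the concatenation step in syllable-based synthesis: joining
# pre-recorded unit WAV files. File names are invented placeholders.
import wave

def concatenate_wavs(unit_files: list[str], out_path: str) -> None:
    """Append the audio frames of each unit file into one output WAV."""
    with wave.open(unit_files[0], "rb") as first:
        params = first.getparams()  # all units must share the same format
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for path in unit_files:
            with wave.open(path, "rb") as unit:
                out.writeframes(unit.readframes(unit.getnframes()))

# e.g. syllables of a Hindi word, each a recorded unit:
concatenate_wavs(["na.wav", "ma.wav", "ste.wav"], "namaste.wav")
```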
Acceptance Testing Of A Spoken Language Translation System
This document describes an acceptance test procedure for evaluating a spoken language translation system between Catalan and Spanish. The procedure consisted of two independent tests: (1) an utterance-oriented test to evaluate how speech impacts communication by comparing performance of text vs. speech inputs and outputs, and (2) a task-oriented test to evaluate if users could complete predefined tasks with the system. Twelve subjects participated in the tests, with eight familiar with the technology and four unfamiliar. The results showed that speech-to-speech translation was not fully effective but improving, and there were no significant differences between familiar and unfamiliar users. This evaluation considered performance at both the utterance and task levels.
Multilingual Information and Retrieval Systems, Technology and Applications
Dr. Ulrich Kampffmeyer, PROJECT CONSULT, Hamburg, Germany (www.PROJECT-CONSULT.com)
IMC Congress, Brussels 1993

Abstract
This paper on multilingual information and retrieval systems with optical mass storage describes the technical principles of software design. The different layers and modules, from the user interface via transformation modules, thesaurus modules, and fulltext interpretation to database management, are explained in detail. Two examples of multilingual document imaging systems are presented: wfBase and HEMIS.

Contents
1. The Importance of Multilingual Software Systems With Optical Storage Media for the European Economic Region
2. Software Design
2.1. Structural and Other Requirements for Multilingual Software
2.2. User Interface and Application
2.3. Transformation Modules
2.4. Selection Lists
2.5. Thesauri
2.6. Fulltext Translation
3. Sample Applications
3.1. wfBase
3.2. HEMIS
4. Outlook and Summary
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...
This document summarizes a research paper that proposes using a tree-based pipeline optimization tool (TPOT) to improve sentiment classification of dialectal Arabic texts. The paper provides background on sentiment analysis and challenges in analyzing informal Arabic texts. It then discusses related work applying TPOT and AutoML techniques to optimize machine learning for various tasks. The proposed approach uses TPOT for sentiment analysis of three Arabic dialect datasets to automatically optimize hyperparameters and improve over similar prior work.
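A hedged sketch of a TPOT run on a text-classification task analogous to the paper's setup; the sentences and labels below are invented, and the actual datasets are Arabic dialect corpora:

```python
# Sketch of applying TPOT (tree-based pipeline optimization) to sentiment
# classification. Features are TF-IDF vectors over invented sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from tpot import TPOTClassifier

texts = ["great product", "awful service", "love it", "worst ever"] * 10
labels = [1, 0, 1, 0] * 10

X = TfidfVectorizer().fit_transform(texts).toarray()
tpot = TPOTClassifier(generations=2, population_size=10,
                      random_state=42, verbosity=0)
tpot.fit(X, labels)                           # evolve candidate pipelines
tpot.export("best_sentiment_pipeline.py")     # write the best pipeline as code
```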
A FRAMEWORK FOR BUILDING A MULTILINGUAL INDUSTRIAL ONTOLOGY: METHODOLOGY AND ...
As Web 3.0 blooms, ontologies augment the semantic Web with semi-structured knowledge. Industrial ontologies can help in improving online commercial communication and marketing. In addition, conceptualizing enterprise knowledge can improve information retrieval for industrial applications. Having ontologies combine multiple languages can help in delivering knowledge to a broad sector of Internet users, and multilingual ontologies can also help in commercial transactions. This research paper provides a framework model for building industrial multilingual ontologies, comprising Corpus Determination, Filtering, Analysis, Ontology Building, and Ontology Evaluation. It also addresses factors to be considered when modeling multilingual ontologies. A case study for building a bilingual English-Arabic ontology for smart phones is presented; the ontology was illustrated using an ontology editor and visualization tool and consists of 67 classes and 18 instances presented in both Arabic and English. In addition, applications for using the ontology are presented, along with future research directions for the built industrial ontology.
A FRAMEWORK FOR BUILDING A MULTILINGUAL INDUSTRIAL ONTOLOGY: METHODOLOGY AND...dannyijwest
As Web 3.0 is blooming, ontologies augment semantic Web with semi–structured knowledge. Industrial
ontologies can help in improving online commercial communication and marketing. In addition,
conceptualizing the enterprise knowledge can improve information retrieval for industrial applications.
Having ontologies combine multiple languages can help in delivering the knowledge to a broad sector of
Internet users. In addition, multi-lingual ontologies can also help in commercial transactions. This research
paper provides a framework model for building industrial multilingual ontologies which include Corpus
Determination, Filtering, Analysis, Ontology Building, and Ontology Evaluation
DEVELOPMENT OF AN INTEGRATED TOOL THAT SUMMARRIZE AND PRODUCE THE SIGN LANGUA...IJITCA Journal
This paper presents an intelligent tool to summarize Arabic articles and translate the summary to Arabic
sign language based on avatar technology. This tool was designed for improving web accessibility and help
deaf people for acquiring more benefits. Of the most important problems in this task, is that the deaf people
were facing many difficulties in reading online articles because of their limited reading skills. To solve this
problem the proposed tool includes a summarizer that summaries the articles, so deaf people will read the
summary instead of reading the whole article .Also the tool use avatar technology to translate the summary
to sing language
This document describes two interactive video search and browsing systems - a web application using the Rich Internet Application (RIA) paradigm, and a multi-touch collaborative application. Both systems use the same ontology-based video search engine, which allows semantic searching and browsing of video collections. The web application provides query expansion and interactive search interfaces, while the multi-touch application enables collaborative browsing and organization of video search results. The systems aim to provide responsive and intuitive interfaces for searching and exploring video archives.
An information system to access contemporary archives of art: Cavalcaselle, V...Andrea Ferracani
The research project aims at the development of a digital library for handwritten
documents of artistic and literary culture in XIX and XX centuries. The goal is to provide access
to contemporary archives of documents related to well-known art historians: Giovan Battista
Cavalcaselle, Adolfo Venturi, Ugo Ojetti, Giulio Carlo Argan and Cesare Brandi.
This document describes Sirio, an ontology-based video search engine that allows users to search video content using different interfaces like free text, natural language, or graphical concept composition. The system uses an automatically generated ontology to encode semantic relationships between concepts and exploits this structure to expand queries. It provides responsive search across different video domains through a web-based Rich Internet Application interface without requiring installation.
This document describes a natural interaction system developed for an archaeological exhibition on the Shawbak castle site in Jordan. The system uses a multi-touch tabletop to allow visitors to intuitively explore multimedia content about the site organized by time period and level of detail. Users can select different eras and zoom levels to view related texts, images and videos. The interface was designed to provide easy access to content and require minimal cognitive effort. It was installed at an exhibition in Italy and received positive feedback from visitors.
This document describes a web-based tool that allows semantic browsing of video collections using multimedia ontologies. The tool allows users to browse video collections based on concepts, concept relations, and concept clouds. It uses a rich internet application interface for fast responsiveness. The tool accesses videos and related content from sources like YouTube, Flickr, and Twitter through the ontologies. It provides a graphical interface to explore concept relationships and directly access related video clips.
This document describes natural interaction systems implemented at the Mont'Alfonso Fortress in Italy. Users can interact with multimedia content through hand gestures detected by infrared cameras. An interactive table allows users to browse informational cards about the fortress's history through hand movements. Interactive walls and floor activate related video projections when motion is detected. The goal is to create an intuitive experience for users to learn about the fortress restoration through natural human-computer interaction without wearables or training.
MediaPick is a system that allows users to search and organize multimedia content like videos through natural interactions on a multi-touch tabletop interface. It uses an ontology and semantic search to retrieve relevant video results based on selected concepts. The interface allows users to explore concepts, view search results, play videos, and organize results by filtering, grouping, and dragging videos. The system was designed for professionals who work with large multimedia archives like journalists and archivists.
1) Arneb is a web-based video annotation tool that allows multiple human annotators to collaboratively annotate videos with structured semantic annotations using ontologies.
2) The tool was developed for the EU VidiVideo project to generate ground truth annotations for training automatic video annotation systems.
3) Annotations can be exported in MPEG-7 and OWL ontology formats to provide interoperability, and the tool has been used to annotate over 25,000 annotations of broadcast video by professional archivists.
The document describes an integrated web system for ontology-based video search and annotation consisting of three components: the Orione ontology-based search engine, the Sirio search interface, and the Pan web-based video annotation tool. The system provides an environment for collaborative video annotation and retrieval, supporting different query types and interfaces for both technical and non-technical users. It allows searching of videos from different domains using ontologies and expanding queries through semantic relationships and reasoning.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Things to Consider When Choosing a Website Developer for your Website | FODUUFODUU
Choosing the right website developer is crucial for your business. This article covers essential factors to consider, including experience, portfolio, technical skills, communication, pricing, reputation & reviews, cost and budget considerations and post-launch support. Make an informed decision to ensure your website meets your business goals.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfTechgropse Pvt.Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
CAKE: Sharing Slices of Confidential Data on BlockchainClaudio Di Ciccio
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
LIT: transcription, annotation, search and visualization tools for the Lexicon of the Italian Television

Thomas M. Alisi · Alberto Del Bimbo · Andrea Ferracani · Tiberio Uricchio · Ervin Hoxha · Besmir Bregasi

© Springer Science+Business Media, LLC 2010
Abstract LIT (Lexicon of the Italian Television) is a project conceived by the Accademia
della Crusca, the leading research institution on the Italian language, in collaboration with
CLIEO (Center for theoretical and historical Linguistics: Italian, European and Oriental
languages), with the aim of studying frequencies of the Italian vocabulary used in
television. Approximately 170 hours of random television recordings acquired from the
national broadcaster RAI (Italian Radio Television) during the year 2006 have been used to
create the corpus of transcriptions. The principal outcome of the project is the design and
implementation of an interactive system which combines a web-based video transcription
and annotation tool, a full featured search engine, and a web application for data
visualization with text-video syncing. Furthermore, the project is currently under
deployment as a module of the larger national research funding FIRB 2009 VIVIT (Fondo
di Investimento per la Ricerca di Base, Vivi l'Italiano), which will integrate its achievements
and results within a semantic web infrastructure.
Keywords Video transcription · Annotation · Streaming · Multimedia indexing · Retrieval · Semantic web · User experience design
Multimed Tools Appl, DOI 10.1007/s11042-010-0610-3

T. M. Alisi (*) · A. Del Bimbo · A. Ferracani · T. Uricchio · E. Hoxha · B. Bregasi
Media Integration and Communication Center, University of Florence, Florence, Italy
e-mail: thomasalisi@gmail.com
A. Del Bimbo
e-mail: delbimbo@dsi.unifi.it
A. Ferracani
e-mail: andrea.ferracani@unifi.it
T. Uricchio
e-mail: tiberio.uricchio@gmail.com
E. Hoxha
e-mail: ervin.hoxha@gmail.com
B. Bregasi
e-mail: besmir.bregasi@gmail.com
1 Background
A video is made up of audio and visual information, and providing direct access to content
through video metadata is essential for the integration and study of multimedia materials,
either in traditional desktop or rich internet applications. For this reason, annotated corpora
are fundamental components of applications designed for linguistic research. The
exploitation of research tasks on corpora of speech data relies at least on a basic
transcription, while further linguistic analyses usually require that the transcription is
aligned with the speech. These basic tools and functionalities are very important to achieve
results in the field of computational linguistics [6].
Annotation tools for audio-video materials, both manual and automatic, have proliferated in recent years across different research fields (Praat [19], Transcriber [24], Anvil [4], ILSP [13], the NITE Workbench [18] and many others [1]), but the scattered set of domains in which they were involved and the lack of truly recognized standards did not help the stabilization of emerging technologies. Moreover, the rapid evolution of applications from traditional desktop to rich internet and networked environments makes further development and deployment of research products that could have a twist towards technology transfer, and thus maturation, even more difficult.
The research effort made towards the definition of standards has produced numerous results, and usually each of them applies to its specific domain. The common approach in the research and development of tools for metadata handling is to maintain decent compatibility with XML-based standards: performance is usually maximized with strictly proprietary applications and storage systems, while interoperability is guaranteed through the use of well-known interchange protocols, such as SOAP/REST [21] web services (both based on HTTP, the application-layer protocol running on top of the Transmission Control Protocol), or RDF/SPARQL [20] endpoints for semantic web applications.
The underlying structure of annotation tools is thus delegated to protocols and procedures whose fortune and acceptance may vary, depending on research and technology trends. One of the most reliable and widespread projects for linguistic annotation of the early 2000s is the Architecture and Tools for Linguistic Analysis Systems (ATLAS) [8] which, as stated on the official website, "addresses an array of applications [that] needs spanning corpus construction, evaluation infrastructure, and multi-modal visualization. The principal goal of ATLAS is to provide powerful abstractions over annotation tools and formats in order to maximize flexibility and extensibility". The project delivered an annotation tool which relied on its proprietary XML-based format (Atlas Interchange Format, AIF) and was intended to become a widespread standard, but its support was subsequently dismissed and parts of the project were merged into the forthcoming technologies of automated recognition developed by the Multimodal Information Group. Several other projects have been developed in the field of language annotation, but most of them are either discontinued or based on old technologies and proprietary formats [15].
The gap in terms of standard definitions was progressively filled by two well-known projects, TEI and MPEG-7, which were able to deliver open standards and spread them across different contexts thanks to their versatility and ease of use. The Text Encoding Initiative (TEI) [5] is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics. TEI is an XML vocabulary defined by an XML Schema which suggests a substantial implementation of the structural and functional formalisms of literary texts in the structural organization of the elements of a markup language, focusing on the description of the logical and functional structures of documents.
MPEG-7 [22], formally named “Multimedia Content Description Interface”, is a
standard for describing the multimedia content data that supports some degree of
interpretation of the information meaning, which can be passed onto, or accessed by, a
device or a computer code. MPEG-7 is not aimed at any one application in particular;
rather, the elements that MPEG-7 standardizes support as broad a range of applications as
possible.
The two standards have reached wide diffusion and have been adopted by a large number of products and projects: MPEG-7 is designed mainly for the description of the properties and features of multimedia files, while TEI focuses strictly on linguistic issues. For this reason an application that deals with specific aspects of language analysis is more likely to find a friendly development environment in the TEI standard, while providing mappings for MPEG-7 conversions remains important in order to be compatible with such a large part of the research world.
Even assuming that an application uses either TEI or MPEG-7 as a data interchange format, evidence shows that research trends have recently moved towards automatic annotation of video content and speech recognition for automatic subtitling, both techniques finding their principal outcomes in commercial environments. This trend is reflected in the number of relevant publications and projects in the field of automated annotation and recognition, but unfortunately these important results in multimedia are progressively reducing the availability of updated resources in the linguistic research field of study.
Video annotation and speech recognition, in conjunction with other technologies like natural language understanding, are indispensable for multimedia indexing and search; nonetheless, all research fields related to computational linguistics usually require a fine-grained identification of paragraph levels and a definition of corresponding metadata descriptors, outcomes that are hardly achieved through automatic image and speech analysis. Speech recognition is error prone and its accuracy depends dramatically on the quality of acoustic conditions and the type of data sets: it can be used for library indexing and retrieval in specific contexts (some experiments show that data retrieval from transcripts of spoken documents can be just 3–10% worse than information retrieval on perfectly human-transcribed data [11]), but it is not mature enough to meet the specific needs of linguistic research. In addition, multilingual speech recognition raises the issue of training the detectors (e.g. the Sphinx-III continuous speech recognition system developed at Carnegie Mellon) on acoustic and lexicon data models of different languages or dialects in order to achieve higher performance, thus increasing the overall complexity of systems and algorithms due to the analysis of the phonology, morphology, syntax and prosody of specific languages.
Recent experiments have shown that, in addition to an accurate speech transcription, it is possible to extract some specific information such as: the gender of the speaker, a rough categorization of the type of acoustic environment (e.g. music, noise, clean, etc.), the beginning and the end of news stories, the top-level story topics and a rough speaker clustering [3]. Unfortunately these results still present high error rates, and these tools are not capable of achieving good performance except in scenarios with little variability.
Extracted data based on image features, color histograms or optical flow analysis have recently been achieving encouraging results and can help in building language models from visual features [3], and lately the traditional machine learning techniques have been reinforced with semantic web algorithms, as in [7]. The American National Institute of Standards and Technology (NIST), which also currently hosts the evolution of ATLAS with its spawned sub-projects, has even established the TREC conference series, whose goal is to encourage research in information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. Still, automatic systems supported by image analysis cannot reach the level of precision required by linguistic research at the price/performance ratio provided by manual transcription.
Another important aspect considered during the development of the project presented in this paper was the availability of online annotation and transcription tools: YouTube has recently added new features for video annotation and transcription. Users can send invitations for shared annotations and add three types of information: speech bubbles, notes and spotlights. YouTube also allows users to upload captions, subtitles and video transcription files, even if this feature is still in beta and only for the English language. Although very useful, the system is still too rough and generic for the specific domain of linguistic research, since it does not allow the necessary customizations for specific contexts and user-defined taxonomies.
For these reasons this paper presents a set of tools where up-to-date techniques for rich internet applications are integrated with video streaming, content indexing and multimedia retrieval, targeting the specific sector of web applications for linguistic research. The project is part of a series of activities defined together with the Accademia della Crusca and CLIEO for the design and implementation of a set of integrated tools for linguistic annotation handling. The Accademia is a leading institution in the field of research on the Italian language: presently, a principal activity is the support of scientific research and the training of new researchers in Italian linguistics and philology through its Centres and in cooperation with Universities. CLIEO (Center for theoretical and historical Linguistics: Italian, European and Oriental languages) was founded to bring together, in a single research and higher-education entity, different institutions previously active in Florence in the field of Linguistics: University structures (Department of Italian Studies; Department of Middle Age and Renaissance Studies; Department of Linguistics; Inter-University Center for the Geolinguistic Study of Proverbs), the Accademia della Crusca, the Opera del Vocabolario Italiano (OVI, a CNR Institute), and the Institute of Legal Information Theory and Techniques (ITTIG, a CNR Institute).
2 Architecture
Since LIT is designed to satisfy specific requirements of its users, a use case scenario is the first step in the creation of a correctly targeted application: two classes of users were identified for the system, transcriptionists and researchers, and Fig. 1 reports a simplified version of the tasks they have to undertake.
Expanding each task into a complete use case scenario (where each action is described in detail, e.g. type of connection, subtasks with detailed sequence diagrams, etc.) leads to the definition of specifications and requirements, where the key factors are:
- Large use of standards. Using widespread standards for storing annotated data is a key factor in allowing interoperability with other systems and, if needed, in simplifying the creation of software adaptors for sharing data.
- User experience design (UXD) [10, 17]. User experience highlights the experiential, affective and meaningful aspects of Human-Computer Interaction (HCI), but also covers people's perception of practical aspects such as utility, ease of use and efficiency. The use case scenario for LIT was initially developed taking into account the pool of researchers and users targeted for the system, considering that they mainly have a linguistic background and have to deal with delicate and repetitive tasks: a typical scenario for scientists and technicians. Some aspects and interactions have thus been designed to minimize users' cognitive effort when approaching a complex interface.
- Rich internet application (RIA). A RIA is a web application executed by a browser plug-in, providing functions and interactions usually associated with desktop applications. The key advantages of using the RIA paradigm for deploying LIT are centralization and the availability of advanced functionalities through a web browser, such as: asynchronous communication, improved performance due to local processing on the client, drag and drop, transitions, sliders, interactive grids and tables, client-side form checking, stacked or tabbed collapsible panes, and modal and non-modal dialog boxes. The use case scenario defined for the project clearly shows that users have remote access to the system, hence the video database should allow distributed access: these requirements are enough to determine a RIA environment, adopting a "deploy once, run anywhere" strategy.
- Modularity. LIT is designed to be part of a bigger picture, where results coming from annotation have to be analyzed in a semantic web environment; in this sense, it has to be easily plugged into a larger context, and thus provide a set of endpoints where common protocols are used.
The design of the system focused on effectiveness and efficiency in performing the tasks required by the specific application domain: usability was constantly evaluated by comparing how targeted users felt while performing specific tasks with how well the technical requirements were met. The system thus offers friendly interfaces, optimized to minimize the learning curve for targeted users, both for annotation and retrieval: easy to learn and responsive, they allow users to rapidly annotate and search multimedia material without specific knowledge of technical jargon or an engineering background. Thanks to the exploitation of the RIA potential, users can annotate a transcription of a television broadcast using only a graphical interface, without the need to manually edit XML files or specific tags. A large amount of information can be presented using RIA components (Flex allows easy interface design, providing an XML-based development platform), and data are shown graphically and can be manipulated interactively. Every task is accomplished on demand and in real time, without multiple steps or page reloads, thus reducing server load.

Fig. 1 Use case scenarios (transcriptionist: accesses from different locations, performs repetitive tasks, verifies well-formedness of a TEI transcription; researcher: accesses with low-spec equipment, performs complex queries, needs to analyze stats on a simple and light page, accesses the full archive of video materials)
The project presents two different interfaces: a search engine, based on classical textual input forms, and a multimedia interface, used both for data visualization and annotation (the latter functionality being activated after authentication); the whole system relies on a backend implemented to handle TEI transcriptions and provide the necessary indexing and search functions. The interface for multimedia data handling, thanks to the use of the RIA paradigm, is multi-platform (Windows, Linux, Mac), runs on any browser equipped with a standard Adobe Flash player [14], conforms to the general hardware requirements of the targeted audience (thus avoiding possible performance issues) and has extended support for transcriptions and annotations in any encoding format (ANSI, UTF-8, UTF-16, ASCII, etc.). Using a multi-encoding enabled environment means that the indexing and search functions can be easily extended to support all kinds of languages and characters (the system currently works with all western languages, with special functionalities defined for accented characters).
The requirements and specifications described above have been taken into account and are detailed in the following sections, with regard to three main actions:
1. definition of the interchange format, conforming to a standard schema;
2. learnability, memorability and satisfaction of an annotation and visualization web
interface;
3. efficiency of a fully featured indexing and search engine.
2.1 Standard in the interchange format
Since LIT focuses more on linguistic issues than on a general multimedia handling approach, the data interchange format chosen is XML-TEI, the international standard mentioned above. TEI can be used for describing humanistic and historical documents in digital format: its key features are that it is XML-based, portable and customizable. Moreover, TEI has already been used for another project funded by the Accademia and thus presents an interesting opportunity for the integration of data provided by different contexts, which will be taken into account during the integration of the system into a larger research project funded by the Italian ministry of education and research, the FIRB 2009 VIVIT.
The LIT project is the first web-based tool that allows video annotations in this format: since it has to deal with its own definition of contexts in annotations, using a standard which allows schema customizations is mandatory, hence TEI represents a perfect choice. The revision chosen for the annotation system is the latest, P5; section 8 of the TEI guidelines specifically defines the structure of speech transcriptions (a complete reference can be found at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TS.html). TEI offers great flexibility when defining the structure of a text transcription and, assuming the basic XML schema is respected, specific elements and attributes for the definition of a proprietary structure can be defined: an online tool can then be used for compliance verification.
A TEI document is composed of blocks where global metadata definitions are generally placed at the beginning of the document inside a <teiHeader> element, while a subsequent set of one or more <TEI> elements defines the content itself, made of local metadata instances. For this reason, the structure of a document can be schematized as in Fig. 2.
The standard is extensible with regard to the definition of the taxonomy used, which can be modified without affecting the backward compatibility of records in the database. The categories, defined as identifiers in the general teiHeader block, are then referenced in the specific local instances of TEI blocks by elements defining a specific recording present in the annotation. The categories and sub-categories defined in LIT are:
- advertising
- fiction (sub-categories: TV film, short series, series, serial)
- entertainment (sub-categories: variety show, game show, reality show, humour and satire)
- information (sub-categories: news, reportage, in-depth report, live report)
- scientific and cultural (sub-categories: documentaries, magazines)
- talk shows (sub-categories: political, cultural, sports)

The cinema category is missing entirely because, being programmatically prepared for a specific distribution, it does not present the evidence of language evolution that makes the rest of the footage interesting for research.
The recording element contains all the data useful for the identification of a specific
broadcast, including the name of the broadcaster, date and time, transmission and category.
The most important part of a single TEI block is the set of utterances, which clearly defines
the transcription of the recording and is accompanied by additional definitions of
production details and references to cue-points defined in the overall timeline in the
teiHeader global definitions. The details used for the specification of an utterance are:
- type of communication (can be: monologue, dialogue)
- type of speech (can be: improvisation, programmed, executed)
- gender of speaker (can be: male, female)
- type of speaker (can be: professional, non-professional)
- speech technique (can be: on scene, voice-over)

Fig. 2 Structure of a TEI document:

<teiCorpus>
  <teiHeader>
    <!-- global definitions: timeline, list of people, taxonomy, etc. -->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!-- local instances: recordings, utterances, etc. -->
    </teiHeader>
  </TEI>
  <!-- other TEI blocks -->
</teiCorpus>
A single utterance in a TEI transcription thus appears as follows:

<u who="#Person49" start="#t1" end="#t2" comtype="dialogo" spokentype="esecutivo" env="esterno" voice="incampo">che c'è Rex ? che succede ?</u>

where the three attribute parameters "who", "start" and "end" are defined as references to identifiers declared in the global header of the file as:

<timeline origin="" unit="s" id="timeline1">
  <when id="t1" absolute="00:00:02:665"/>
  <when id="t2" absolute="00:00:05:174"/>
</timeline>

and

<person id="Person49">
  <persName>
    <forename>pm1</forename>
  </persName>
  <sex>Maschio</sex>
  <provenance>interno</provenance>
</person>
As the example clearly shows, TEI allows specific annotation needs to be defined in terms of attributes and elements of the XML schema, which are then validated through an online tool. Its flexibility is a strong plus for the adoption of the standard in the linguistic field of research, and the schema generated for the transcription becomes an important building block of the search engine, where optimized XML-based solutions can be effectively used for information retrieval, as shown in the following sections.
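As a minimal sketch of how such a compliance check could also be automated locally (the project itself relies on an online tool, so this is an alternative, not the authors' method), the JDK's standard javax.xml.validation API can verify a transcription against the customized schema. The schema and document file names below are hypothetical, and an XSD flavour of the customized TEI schema is assumed:

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class TeiValidator {
    public static void main(String[] args) throws Exception {
        // Compile the customized TEI schema (file name is an assumption)
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new File("tei_lit.xsd"));

        // Validate one transmission transcription; validate() throws a
        // SAXException on the first compliance error it encounters
        Validator validator = schema.newValidator();
        validator.validate(new StreamSource(new File("transmission.xml")));
        System.out.println("transmission.xml conforms to the schema");
    }
}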
2.2 Annotation and visualization
The term "linguistic annotation" refers to any analytical or descriptive note that can be applied to linguistic data collected in the form of textual data. There are essentially two methods for the annotation of linguistic corpora, automatic and manual, the latter being more widespread mainly where the formal correctness of contents is a major requirement. In recent years there has been a growing interest in statistical language processing: these kinds of systems need a training set provided by manual annotators, and results provided by manual annotations usually work as ground truth for the performance evaluation of automatic systems. Furthermore, automatic systems normally present some limitations and a high error rate in text-to-speech alignment due to speakers' voice characteristics, poor signal-to-noise ratio and overlapping speakers' voices. While it is fairly easy to automatically obtain time-aligned transcriptions, it is far more difficult, if not impossible, to determine and identify other crucial characteristics such as:
- single utterances, due to speech ellipses, overlaps, disfluencies et alia;
- the speaker's identity and gender;
- the characteristics of her/his pronunciation;
- the environment in which the characters are speaking;
- the exact punctuation, capitalization and prosodic features of the text from a linguistic point of view [12].
For these reasons the choice made for LIT focused on a manual annotation system, in order to deliver and manage speech-to-text transcriptions far more accurately than automated systems allow.
It has already been outlined how annotation tools should focus on specific classes of annotation problems, in order to make the process of annotation more efficient. Such issues in fact have great influence on the design of the specialized tools used to manually create annotations. From this point of view it is useless, if not impossible, to develop a tool that covers all the specific needs of different annotations, but it is necessary to design a system that effectively complies with the specific requirements of a domain [9]. If correctly annotated, a corpus of multimedia materials can be reused by other subjects and research projects independently of the specific use case scenario, hence LIT is implemented as a specialized tool that makes use of a data model based on a standard interchange format. Developing a manual annotator was therefore a decision dictated by the specific needs of linguistic research. Even if this decision led to a fairly high cost in terms of human resources (several linguists transcribed and annotated all the multimedia materials hosted by LIT over a period of approximately 18 months), it was still acceptable given the foreseen outcomes, mainly the accuracy of results, the novelty of a web-based architecture for annotation, and the usability of the system for non-expert users. Using a semi-automated speech transcription system was out of scope for LIT, since this important step was almost completed during the early phases of the project. Furthermore, a set of syntactic, prosodic and phonological metadata was manually added to the transcriptions, thus making the use of automated software impossible.
The "Collected Requirements for annotation tools", defined by the ISLE Natural Interactivity and Multimodality Working Group report [16], have been taken into account for the usability and functionality evaluations of LIT. The tool was designed according to the points outlined by the Group:
- Portability: it can be used on different platforms.
- Source code: it is web based and its source code can be opened to the research community.
- Flexible architecture: adding new components can easily extend it.
- Three-layered structure: the user interface layer is separated from the logic layer and the data representation layer. Each layer can be changed independently.
- I/O flexibility: the annotation schema is available and the output format is compatible with other tools.
- Robustness and stability: the tool has been tested by developers, then extensively used by transcriptionists; it is robust, stable and can be used in real time.
- Audio/video interface: it provides an easy-to-use method to view and play audio/video segments. It supports large files and controls for streaming multimedia material.
- Flexibility in the coding scheme: the scheme is extensible. The tool allows for custom taxonomies.
- Easy-to-use interface: the interface is intuitive and easy to use. Users are always informed of available actions through proper feedback, letting them understand how it works simply by using it. Similarly, the interface implements concepts familiar to targeted users, and provides interactions typical of a desktop application.
- Learnability: it is easy to learn and annotations can be performed with very few actions. Learnability has been addressed according to several principles: predictability (users should be able to foresee how the system responds to actions), synthesizability (users should be able to understand which actions have led to the current state), consistency (same or similar components have to look alike and respond similarly to user input), generalizability (annotation tasks should be grouped together when possible and respond to the same principles) and familiarity (annotation tasks in the web application domain should correspond to annotation activities in the real world).
- Attractiveness: it is attractive to the user and uses the modern Rich Internet Applications paradigm. The application has a clean and minimal design, where users can 'play' with it, running and scrubbing videos, displaying frames in which sentences are pronounced and searching for their favourite television characters.
- Transcription support: it is based on speech transcription.
- Marking: it allows the marking of segments of any length and also of overlapping or discontinuous fragments.
- Meta-data: it supports metadata referring to related annotations.
- Annotation process: it supports selection-based annotation with appropriate tags.
- Visualization: the annotated information is visible for all annotation elements in real time and in the form of text.
- Documentation: there is a user manual.
- Querying, extraction: a search engine is integrated with the tool.
- Data analysis: it supports the estimated actual duration of speech.
The system consists of two views:
- an interface to browse the corpus and view the selected videos, along with their transcription and metadata;
- an interface to create and edit video annotations on the transcription source files.
The browsing interface (Fig. 3) shows the video collection present in the model. Users can select a video and play it immediately, and read the associated metadata and speech transcription in sync. Each record in the list of videos provides a link to the raw annotation in XML-TEI format. The annotation can be opened directly inside the browser and saved on the local system. Subtitles are displayed at the bottom of the video, while segments in the transcription area are automatically highlighted during playback and metadata are updated accordingly. When the text-to-speech alignment is completed through annotation activities, users can select a unit of text inside the transcription area and the video cue-point is aligned accordingly; conversely, scrolling the trigger on an annotated video segment highlights the corresponding segment of text.

Fig. 3 Browse interface
The annotation interface (Fig. 4) is accessed by transcriptionists after authentication, and allows the association between transcriptions and the corresponding video sequences. Annotators can set, using the tools provided by the graphical user interface, the cue points of speech on the video sequences and assign them an annotation without prior knowledge of the format used. The tool provides functionalities for the definition of metadata at different levels, or multiple "layers": features can be assigned to the document as a whole, to individual transmissions, to speakers in the transmissions and to each single segment of the transcription.
The interaction on the interface is based on the metaphor of the accordion, in order to guide users through all the necessary steps, built on three subsequent panels, each corresponding to a specific layer of the annotation (Transmission, People and Transcription). The goal of using this metaphor is to logically distribute information through different levels and avoid the stress that usually emerges when users are exposed to an excessive cognitive load: the accordion switches between different views by gradually sliding panels so as to cover the area displayed by the current view. This gives the user a smooth and clean look and feel during the navigation process.
- The first panel (Transmission) allows annotators to define metadata for the "transmission" layer (title, broadcaster, date, time of screening, and a category from the custom taxonomy tree representation).
- In the second panel (People) users can define metadata for the "people" layer (identity, gender, type of speaker and a colour used to highlight utterances within the transcription area).
- In the third panel (Transcription) the text of the transcription is either pasted or directly typed in, and then utterances and their specifications are assigned to different segments of the video.
Fig. 4 Annotation interface

The utterance metadata, associated with individual segments, is defined by selecting a line in the Transcription panel and clicking the button labelled "edit utterance". The action opens a specific annotation panel for the definition of the specifications: the identity of the speaker, the type of speech (improvisation, programmed, executed), the speech technique (on scene, voice-over), the type of communication (monologue, dialogue), and the start and end points of the segment (in milliseconds).
The annotation panel also provides an additional trigger to control the video playback: users can move through the video sequence with fine-grained precision and set a decimal value to control the size of the step used for moving the cue-point forward and backward; this allows rapid and accurate identification of the key frames of the segments uttered by speakers. At this stage the user manages the non-trivial task of text-to-speech alignment, establishing one-to-one relations between units of text and units of speech.
Annotations can be modified or deleted in case of error; the browsing view is notified and automatically updated after each modification. When modifications are saved, the XML-TEI file of the transmission is automatically generated with the structure described in the previous section, and is ready for indexing.
Both the annotation and the browse interfaces are developed in Adobe Flex and ActionScript 3. The videos displayed in the system are streamed over the RTMP protocol (Real-Time Messaging Protocol), using the free developer edition of Adobe Media Server.
2.3 Indexing and search engine
The problem of efficiently indexing and storing a large amount of XML data can be solved in several ways: the main problem for the LIT project was to find a method that could obtain good performance while processing data, and provide the level of granularity for data access required by the engine specifications.
In a simple environment, developed for untrained users, most of the search functions implemented here would be completely useless and would severely degrade performance. In the case of LIT, the users targeted for the project set the requirements which, in some cases, are radically different from the kind of functions that a common search engine usually provides. This aspect represents an important novelty in the effort of the scientific community towards the definition of standard tools for linguistic analysis. The main features are described in the following list.
- Case sensitivity. Can be switched on and off by users.
- Accented characters. Characters like "à", "è", "é", "ò", etc. are very frequent in the Italian language, so it is fundamental that queries can define the sensitivity to accented characters, in order to trace the evolution of the use of particular forms.
- Frequency analysis of root-form expansion. It is of fundamental importance for linguistic research to analyze the frequency of use of some words, and results are usually compared using a fixed root of a word to see how many times each possible expansion of that root is used.
- Wildcard characters. Querying with a "?" to specify a generic character and a "*" to specify a generic sequence of characters is very common in most search engines.
- Ordered or unordered word sequences with defined distance. A query for an "exact phrase" search is very common; the option of specifying the distance between each pair of words in a sequence is, however, very rare.
- Fine-grained specification of the context defined for single utterances.
The use of an XML-native database for the implementation of the functions described initially appeared to be the best solution for index creation and for the implementation of the search functions; there were nonetheless some critical issues that made this solution infeasible, such as, among others, the complexity of creating structured queries which had to deal with case sensitivity and unordered sequences. This led to the definition of an object-oriented mapping of the TEI structure in use, with its defined custom fields, and the subsequent adoption of an open source engine for the storage, indexing and retrieval of the data objects. The whole architecture is developed in Java.
The library used for XML-TEI parsing is Apache Digester, which provides serial access to XML files in a way quite similar to the standard Java SAX parser, with the additional strength of reading elements and attributes from the source file and instantiating them as objects with properties in the Java virtual machine, thus unleashing the power of object-oriented programming for data handling. This allows the definition of a mapping between TEI elements and objects as described by the UML class diagram in Fig. 5.
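A minimal sketch of how such a rule-based Digester mapping could look is given below; this is not the project's actual code, and the element path, bean names and file name are assumptions for illustration:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.digester.Digester;

// Minimal beans standing in for the mapping of Fig. 5 (names are assumptions)
class Utterance {
    private String who, start, end, comtype, spokentype, text;
    public void setWho(String v) { who = v; }
    public void setStart(String v) { start = v; }
    public void setEnd(String v) { end = v; }
    public void setComtype(String v) { comtype = v; }
    public void setSpokentype(String v) { spokentype = v; }
    public void setText(String v) { text = v; }
}

class Corpus {
    private final List<Utterance> utterances = new ArrayList<Utterance>();
    public void addUtterance(Utterance u) { utterances.add(u); }
}

public class TeiDigester {
    public static void main(String[] args) throws Exception {
        Digester digester = new Digester();
        digester.setValidating(false);

        // The root element becomes the top-level object
        digester.addObjectCreate("teiCorpus", Corpus.class);

        // Each <u> element becomes an Utterance bean (path is an assumption)
        String uPath = "teiCorpus/TEI/text/body/u";
        digester.addObjectCreate(uPath, Utterance.class);
        // XML attributes (who, start, end, comtype, ...) -> bean properties
        digester.addSetProperties(uPath);
        // Element text content -> "text" property
        digester.addBeanPropertySetter(uPath, "text");
        // Attach the finished bean to the corpus
        digester.addSetNext(uPath, "addUtterance");

        Corpus corpus = (Corpus) digester.parse(new File("transmission.xml"));
    }
}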
The next step in the implementation is to store the objects and make them accessible for retrieval: in order to maximize performance and implement all the search functionalities, LIT needs to rely on an established and consolidated indexing and querying system rather than implement a classic data persistence mechanism. The solution adopted is therefore based on an open source indexing and search engine: Apache Lucene. This engine has an indexing method based on data fragments and stores the documents on the filesystem, while integer offsets are used to reference the words contained in them. The indexing algorithms are based on a fundamental data fragment: the document. Each document can be defined independently and can contain a set of different fields. Each field contains the instance of a property defined in the diagram of Fig. 5 and can be used either as a search parameter or as a fragment of content which will be displayed in the list of results after a query has been performed. There are 23 fields implemented for each Lucene document in LIT, described as follows:
1. file full path and name, used for reference;
2. file extension: as the engine could index different types of documents, it is used as a
constrain for selecting just the XML-TEI sources in this specific application;
3. last update: can be used as reference to selectively update the index;
4. transcription: the utterance as written in the TEI file, with notes removed;
5. transcription, lower case: same utterance, lower case to perform case insensitive search,
since it is more efficient to save an extra field in the database for lower case search, using
little extra disk space, rather than performing functions related to case sensitiveness;
Fig. 5 XML to object mappings
Multimed Tools Appl
14. 6. transcription, full text with notes: from time to time some notes are included in the
utterance by the transcriptionists, and this text must not appear in the statistical data of
the corpora;
7. recording title: name of the transmission;
8. recording author: name of the broadcaster;
9. recording category: one of the categories for transmission classification defined in the
taxonomy (see section 2.1);
10. recording date, text: date of the recording, in human readable form;
11. recording date, sortable: used for date range queries;
12. recording time: recordings are extracted from afternoon and evening television slots;
13. recording duration: several recordings can appear inside a single extracted hour of
broadcast;
14. recording type: can be audio or video, for further development;
15. video ID: refers to the video file handled by the streaming server;
16. time start: beginning of an utterance, in milliseconds;
17. time end: end point of an utterance, in milliseconds;
18. utterance, type of communication: can be either monologue or dialogue;
19. utterance, type of speech: can be improvisation, programmed or executed;
20. utterance, type of speaker: can be either professional or non-professional;
21. person, gender: can be male or female;
22. person, speech technique: can be on scene or voice-over;
23. person ID and name.
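A minimal indexing sketch, assuming the Lucene 3.x field API and hypothetical field names (the actual identifiers are not documented here), shows how a few of these 23 fields could be attached to a document:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class UtteranceIndexerSketch {

    // Adds one Lucene document per utterance; only 4 of the 23 fields are shown.
    public static void indexUtterance(IndexWriter writer, String transcription,
                                      String category, String sortableDate)
            throws Exception {
        Document doc = new Document();
        // Field 4: the utterance as transcribed, tokenized for full-text search.
        doc.add(new Field("transcription", transcription,
                          Field.Store.YES, Field.Index.ANALYZED));
        // Field 5: lower-cased copy; a little extra disk space buys
        // case-insensitive search without query-time case folding.
        doc.add(new Field("transcription_lc", transcription.toLowerCase(),
                          Field.Store.NO, Field.Index.ANALYZED));
        // Field 9: taxonomy category, indexed as a single token so it can filter results.
        doc.add(new Field("category", category,
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Field 11: sortable date, e.g. "20060315", usable in range queries.
        doc.add(new Field("date_sortable", sortableDate,
                          Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
}
```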
When a query is performed, the Lucene API accesses the specified repository and searches
the indexes, returning a set of hits ordered by relevance. It should be clear at this point
that the trickiest task for the application is constructing a meaningful query, so as to
provide significant results. Lucene provides a query construction kit based on the
BooleanQuery class, which allows query clauses coming from the different search fields of
the interface to be added programmatically and composed logically. Each clause is added to
the complete query using the specific class that best expresses the search field it comes
from (e.g. a MultiFieldQueryParser is used for the "all these words" input field, while a
SpanNearQuery is used to define word sequences with ranges). After the boolean query is
composed, a set of filtering parameters is added to constrain the query, so that results
conform to what was specified in the interface. Data access is handled by thread-safe Java
objects, and searches reduce to integer additions and subtractions, resulting in a very
high performance framework even when handling large amounts of data. Results are sent back
to the interface in a proprietary format so that data can be correctly rendered on the
frontend.
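A sketch of this composition step, again assuming a Lucene 3.x style API and the hypothetical field names used above, could look as follows:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.util.Version;

public class QueryBuilderSketch {

    public static Query build() throws Exception {
        BooleanQuery query = new BooleanQuery();
        // "All these words" field: parsed over the full-text field(s).
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_30,
                new String[] { "transcription_lc" },
                new StandardAnalyzer(Version.LUCENE_30));
        query.add(parser.parse("televisione"), BooleanClause.Occur.MUST);
        // "Free sequence" with a distance: the two words must occur
        // within 3 positions of each other, in the given order.
        SpanQuery[] words = {
            new SpanTermQuery(new Term("transcription_lc", "buona")),
            new SpanTermQuery(new Term("transcription_lc", "sera"))
        };
        query.add(new SpanNearQuery(words, 3, true), BooleanClause.Occur.MUST);
        // A constraining clause from the advanced panel, e.g. the taxonomy category.
        query.add(new TermQuery(new Term("category", "news")), BooleanClause.Occur.MUST);
        return query;
    }
}
```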
The search interface (Fig. 6) is based on standard text input fields. It provides a JSP
frontend to the search functions defined for the engine, mapping the HTML input elements
onto the Lucene query syntax. The interface recalls a common "advanced search" form,
providing all the boolean combinations usually found in search engines and, for this
reason, making users comfortable with the basic features. Notably, some uncommon features
appear among the fields, such as:
- the "free sequence" field, with options for defining it as exact, ordered or unordered;
- the "distance" parameter, by which free sequences can be required to appear within
specified ranges inside a single utterance;
- the date range parameter.
Advanced search features are shown inside dedicated panels (Fig. 7), which can be expanded
if necessary. These panels provide all the options for specifying the constraints of a query, as
defined for the XML-TEI custom fields used in LIT. The extended parameters allow the user
to (see the sketch after this list):
- set the case sensitivity of a query;
- expand wildcard ("jolly") characters present in the query to word roots;
- constrain the search to specific categories defined in the taxonomy;
- select specific parameters for utterances, such as type of speech (improvisation,
programmed, executed), speech technique (on scene, voice-over), type of communication
(monologue, dialogue), and speaker gender and type (professional, non-professional).
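As a sketch of how two of these options might translate into Lucene queries (field names hypothetical, as before): the case-sensitivity switch can simply select which indexed field is searched, while wildcard characters map onto Lucene's WildcardQuery, which expands the pattern against the indexed terms.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class AdvancedOptionsSketch {

    // Routes the user input to the case-preserving or lower-cased field,
    // and lets "*" expand a word root, e.g. "cant*" -> canta, cantare, ...
    public static Query wordQuery(String userInput, boolean caseSensitive) {
        String field = caseSensitive ? "transcription" : "transcription_lc";
        String text  = caseSensitive ? userInput : userInput.toLowerCase();
        return new WildcardQuery(new Term(field, text));
    }
}
```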
When a query is executed, results are presented with a header that gives a quick glance at
their distribution across categories and broadcasters (Fig. 8).
Fig. 6 Search interface
Fig. 7 Advanced search fields
The complete list of results is shown just beneath the stats. Each result contains the
recording data (title, broadcaster, timing and category) and a preview of the utterance with
its corresponding metadata (Fig. 9).
Selecting the title of a recording directly opens the visualization tool, with the referred
utterance automatically marked up (as in Fig. 3). The two modules of the system, search
engine and annotation tool, are in fact connected using HTTP GET, so that the user can
select a text result from the results view provided by the search engine and open the
transcription, with the corresponding video sequence, in the first view of the annotation tool.
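For illustration only, such a link could be a plain URL carrying the identifiers the annotation tool needs; the path and parameter names below are invented, since the actual interface contract between the two modules is not documented here:

```java
public class AnnotationLinkSketch {

    // Hypothetical link builder: videoId and start/end offsets identify the
    // video sequence and the utterance to be marked up in the annotation tool.
    public static String annotationLink(String videoId, long startMs, long endMs) {
        return "http://deckard.micc.unifi.it:8080/annotation"  // hypothetical path
             + "?video=" + videoId
             + "&start=" + startMs
             + "&end=" + endMs;
    }
}
```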
Fig. 8 Stat summary
Fig. 9 Utterance metadata
3 Results and developments
This paper has shown the results of a collaborative work between computer engineers and social
scientists, aimed at the development of a speech transcription and annotation tool for
linguistic research. Such tools are of great use in computational linguistic
methodologies, primarily because they allow the systematic management of large amounts of
data. It is therefore essential for such tools to be open to the online community and to
use advanced, extensible data interchange formats. The purpose of LIT is to provide
linguistic researchers with technical means that improve the efficiency of annotating,
searching and analyzing multimedia data, as well as of accessing, visualizing and sharing
materials. The tool provides a specialized, domain-oriented architecture that
significantly increases researchers' productivity and is currently running on MICC
servers at http://deckard.micc.unifi.it:8080/litsearch/, where the complete corpus of
annotated XML-TEI files can be downloaded as well.
The system contains 168 hours of RAI (Radio Televisione Italiana) broadcasts, aired
during the year 2006. All the annotations were created by researchers of the Accademia
della Crusca while LIT was under development, in late 2009. They helped to improve the
software by reporting bugs and problems, proposing tips and suggestions, and contributing
proactively to the definition of the initial set of specifications: the application was thus
developed in an Agile programming framework [2] in order to allow maximum flexibility and
seamless integration of features while the system was already running with limited
functionality, progressively achieving an efficient, usable and ergonomic interface.
The system stores approximately 20,000 utterances, and using Lucene for search and
retrieval raises no performance issues. As demonstrated in the testing environment, the
optimized indexes scale easily up to 2 million documents without significant loss of
performance, providing results in a few milliseconds even for complex queries. Both
the Java application (Tomcat 6) and the streaming server run on a single Ubuntu virtual
machine on VMware ESXi with 2 dedicated 64-bit cores and 4 GB of RAM.
The tool currently supports content-based annotation, but a new release is under
development and will be included in the VIVIT project [23]. Upcoming features will support
additional types of annotation, such as structural annotation (syntactic and rhetorical
structures), signal-spectrographic analysis, and automatic suggestion of metadata and
semantic structures. In particular, the system will be extended to allow term suggestion
functionalities for the semi-automatic definition of ontologies. Ontologies can provide a
framework for the semantic classification of concepts, through which instances of each
class can be inferred, using RDF as a standard model for data interchange on the Web. RDF
has features that facilitate data merging even if the underlying schemas differ, and it
specifically supports the evolution of schemas over time without requiring all the data
consumers to be changed. Queries over this data can be expressed in SPARQL, the W3C query
language for RDF [20], which works across diverse data sources, whether the data is stored
natively as RDF or viewed as RDF via middleware. This approach allows machine-learning
techniques to be used to help users build the concept classification and to refine search
results by automatically adding related concepts.
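As a minimal sketch of what such a query could look like from Java, assuming Apache Jena (3.x packages) as the RDF library and an invented concept vocabulary based on SKOS:

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RelatedConceptsSketch {

    // Fetches concepts related to a given one, so they can be added to a search.
    public static void printRelated(String conceptUri) {
        Model model = ModelFactory.createDefaultModel();
        model.read("concepts.rdf"); // hypothetical RDF dump of the concept taxonomy
        String sparql =
            "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
            "SELECT ?related WHERE { <" + conceptUri + "> skos:related ?related }";
        Query query = QueryFactory.create(sparql);
        QueryExecution exec = QueryExecutionFactory.create(query, model);
        try {
            ResultSet results = exec.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("related"));
            }
        } finally {
            exec.close();
        }
    }
}
```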
Acknowledgments The LIT project was initially funded by CLIEO, the "Center for theoretical and
historical Linguistics: Italian, European and Oriental languages", in collaboration with the Accademia della
Crusca, the leading research institution on the Italian language, and we owe a debt of gratitude to Prof.
Nicoletta Maraschio, who allowed us to kick off this research. Most of the computer engineering work done at
the Media Integration and Communication Center received continuous feedback from researchers of the
Accademia and would not have been possible without the precious support of Marco Biffi and Vera Gheno.
Luckily enough, the initial financial support of CLIEO has been extended with a 3-year project funded by the
Italian Ministry of Education, University and Research, which will allow semantic web functionalities to be
integrated.
Besmir Bregasi, Ervin Hoxha and Tiberio Uricchio are three M.Eng. students who barely knew how
complicated things can get when projects are delivered by the Media Integration and Communication
Center, but they had the chance to learn a lot while working on their B.Eng. dissertation assignments.
The freshness of young minds is always a plus when their efforts can be focused and welded with senior experience.
References
1. A complete list is available at http://annotation.exmaralda.org/index.php/Tools
2. Agile software development refers to a group of software development methodologies based on iterative
development, where requirements and solutions evolve through collaboration between self-organizing
cross-functional teams. http://agilemanifesto.org/
3. Amaral R, Meinedo H, Caseiro D, Trancoso I, Neto JP (2006) Automatic vs. Manual Topic
Segmentation and Indexation in Broadcast News. In: Proc. of the IV Jornadas en Tecnologia del Habla
4. Anvil: http://www.anvil-software.de/ for more details
5. At the time of this writing, the definitions of the TEI standard DTD are those of rev. 5, released Nov. 2007:
http://www.tei-c.org/Guidelines/P5/
6. Bender EM, Langendoen DT (2010) Computational linguistics in support of Linguistic Theory.
Linguistic Issues in Language Technology 3(2):1–31
7. Bertini M, Cucchiara R, Del Bimbo A, Grana C, Serra G, Torniai C, Vezzani R (2009) Dynamic
pictorially enriched ontologies for video digital libraries. IEEE Multimedia (MMUL)
8. Bits of memory for the ATLAS project definition: http://xml.coverpages.org/atlasAnnotation.html and
the official website of the Multimodal Information Group, the research board which currently hosts parts
of the project: http://www.nist.gov/itl/iad/mig/
9. Dybkjaer L, Berman S, Kipp M, Olsen MW, Pirelli V, Reithinger N, Soria C (2001) Survey of existing tools,
standards and user needs for annotation of natural interaction and multimodal data. Technical report, January
10. Garrett J (2002) Elements of user experience: user-centered design for the web. New Riders Press, USA
11. Hauptmann AG, Jin R, Ng TD (2003) Video retrieval using speech and image information. In Storage
and retrieval for multimedia databases 2003. EI’03 Electronic Imaging, pp 148–159
12. Moniz H, Batista F, Meinedo H, Abad A, Trancoso I, Mata da Silva AI, Mamede NJ (2010)
Prosodically-based automatic segmentation and punctuation. In: Speech Prosody 2010, ISCA, Chicago,
USA, May
13. ILSP: http://www.ilsp.gr for more details
14. In terms of diffusion, the Flash player is supported with over 95% market share:
http://www.statowl.com/custom_ria_market_penetration.php
15. It is fairly difficult to find a complete and updated reference of annotation tools and applications. The
following, dated 2001, even if a little bit old, is almost complete and can be useful as a starting point:
http://www.ldc.upenn.edu/annotation/
16. Kristoffersen S (2008) Learnability and robustness of user interfaces: towards a formal analysis of
usability design principles. In: Proceedings of the ICSOFT 2008: 3rd International Conference on
Software and Data Technologies. vol. SE/GSDCA/MUSE: Institute for Systems and Technologies of
Information, Control and Communication, pp 261–268
17. Kuniavsky M (2003) Observing the user experience—A practitioner’s guide to user research. Morgan
Kaufmann Publishers, Elsevier Science, USA
18. NITE: http://www.dfki.de/nite/ for more details
19. Praat: http://www.fon.hum.uva.nl/praat/ for more details
20. RDF standard official website: http://www.w3.org/RDF/ and SPARQL recommendation for RDF
queries: http://www.w3.org/TR/rdf-sparql-query/
21. SOAP protocol standard definition at W3C: http://www.w3.org/TR/soap/ and definition of REST web
services: http://en.wikipedia.org/wiki/Representational_State_Transfer
22. The MPEG-7 standard definition and overview: http://mpeg.chiariglione.org/standards/mpeg-7/mpeg-7.htm
23. The VIVIT (Vivi l’italiano) project is funded by the FIRB, the Investment Fund for Basic Research of
the Italian Ministry of University and Research and has the objective of creating an integrated digital
archive of teaching materials, texts and iconographic documents and media for knowledge dissemination
of Italian language and cultural history
24. Transcriber: http://trans.sourceforge.net/en/presentation.php for more details
Thomas M. Alisi is self-employed in the digital media industry. He worked for a long time as a project
manager and researcher at the Media Integration and Communication Center, where he also took a Ph.D. in
Computer Engineering, Multimedia and TLC and became skilled in the semantic web and interactive
environments. Then he discovered that the time had come to speak of many other things, but he still
collaborates with the University, where he occasionally manages to write a paper and a project proposal.
Alberto Del Bimbo is Professor of Computer Engineering at the Faculty of Engineering, University of
Firenze. His research interests address the analysis and interpretation of images and video and their applications,
with particular interest in content-based retrieval in visual and multimedia digital archives, advanced
videosurveillance and target tracking, and natural man-machine interaction assisted by computer vision.
Andrea Ferracani works as a researcher in the Visual Information and Media Lab at the Media
Integration and Communication Centre, University of Florence. His research interests focus on web
applications, collective intelligence and the semantic web. He is principal lecturer for "Web languages and
programming" at the Master in Multimedia Content Design and for "Editoria Multimediale" (multimedia
publishing) at the Faculty of Architecture of the University of Florence. When evening comes, he plays
classical guitar, perfectly aware of the fact that mastering an instrument is an endless challenge.