This document provides a curriculum vitae for Ido Dagan. It outlines his personal details, education history, employment history, teaching experience, research interests, and publications. Ido Dagan received his PhD in Computer Science from Technion - Israel Institute of Technology in 1992. He has held various academic and industrial positions, including Vice President of Technology at LingoMotors and Founding CTO of FocusEngine. His research interests include natural language processing with a focus on empirical and machine learning methods. He has numerous publications in journals, books, and conferences.
April 2020 most read article in control theory & computer controlling ijctcm
Based on the sense definitions of words available in the Bengali WordNet, an attempt is made to classify Bengali sentences automatically into different groups according to their underlying senses. The input sentences are collected from 50 different categories of the Bengali text corpus developed in the TDIL project of the Govt. of India, while information about the different senses of a particular ambiguous lexical item is collected from the Bengali WordNet. On an experimental basis, we have used the Naive Bayes probabilistic model as the sentence classifier. We have applied the algorithm to 1747 sentences that contain a particular Bengali lexical item which, because of its ambiguous nature, can trigger different senses and so give the sentences different meanings. In our experiment we achieved around 84% accuracy on sense classification over the total input sentences. We analyzed the residual sentences that were not classified correctly and found that, in many cases, wrong syntactic structures and insufficient semantic information are the main hurdles in the semantic classification of sentences. The practical relevance of this study lies in automatic text classification, machine learning, information extraction, and word sense disambiguation.
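The Naive Bayes sense classification described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the English toy sentences, the sense labels "finance" and "river", the add-one (Laplace) smoothing, and the function names train_naive_bayes and classify are all assumptions introduced here, since the abstract gives no implementation details.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count senses and per-sense word frequencies.

    examples: list of (tokens, sense) pairs, where tokens is a list of words
    from a sentence containing the ambiguous word.
    """
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab

def classify(model, tokens):
    """Return the sense with the highest log posterior for the sentence."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                # add-one smoothing (an assumption, not from the abstract)
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy data: two senses of the English word "bank".
TOY_EXAMPLES = [
    ("he deposited money in the bank".split(), "finance"),
    ("the bank approved the loan".split(), "finance"),
    ("they sat on the river bank".split(), "river"),
    ("the boat reached the muddy bank".split(), "river"),
]
model = train_naive_bayes(TOY_EXAMPLES)
print(classify(model, "the bank gave a loan".split()))
```

In a setting like the one in the abstract, the sense labels would presumably come from the WordNet sense inventory and the tokens from the corpus sentences; the toy data above only shows the mechanics.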
Ronald Vance Bjarnason is a PhD candidate at Oregon State University seeking summer work from June to September 2003. He has a background in programming and research interests in relational learning, Bayesian learning, reinforcement learning, game theory, and multi-agent systems. His current anticipated dissertation topic is on relational reinforcement learning on the semantic web.
This document provides a biography and bibliography for Dr. Sakinat Oluwabukonla Folorunso Nee Tijani, a lecturer in the Department of Mathematical Sciences at Olabisi Onabanjo University in Nigeria. It outlines her education history, qualifications, professional experience, publications, memberships, and areas of research interest.
This document provides a curriculum vitae for Nicola Polettini, including biographical information, educational history, certificates, professional activities, languages, computer skills, military position, research interests, and publications. It details that Nicola Polettini is currently a PhD student studying Information and Communication Technology at the University of Trento, and lists his educational background and work experience in machine learning and document classification.
This document provides an overview of Debopriyo Roy's research portfolio from 2011. It outlines his areas of focus which include document design practices, procedural visual design, usability testing processes, statistical analysis of web interactions, cognitive and behavioral frameworks, online collaboration/interface design, the technical writing market in India, and the tools/interfaces used for research. It also lists selected publications, research projects/funding, accomplishments, and research initiatives.
Ali Akbar Dehkhoda, the prominent lexicographer, describes a person who has difficulty in grasping knowledge as someone who “Cannot understand something without knowing all its details.” If the knowledge a person needs is in a language other than that person's mother tongue, access to it will be made harder by the person's lack of mastery of the second language. Any project that can monitor knowledge sources written in English and convert them into the user's language by employing a simple, understandable model can serve as a knowledge-based project with a world view regarding text simplification. This article creates a knowledge system, investigates some algorithms for analyzing the contents of complex texts, and presents solutions for changing such texts into simple and understandable ones. Texts are analyzed automatically and their ambiguous points are identified by software, but it is the author or human agent who decides whether to remove the ambiguities or correct the texts.
Wear blue sapphire to pacify your life during shani mahadasha bluesa
In Vedic astrology and Hindu mythology, blue sapphire is believed to be the most effective gemstone for calming Saturn, or Shani Deva. Its association with Shani is so strong that ancient texts mention it as Shanipriya, which means “Beloved of Shani”; in Hindi it is also called Neelam. It is said to be the fastest-acting gemstone in the world, although expert astrologers say that only hardworking people benefit from it. It is considered a great help to people who are under Shani Mahadasha, Saade Saati, or Shani Dhaiya, because the period of Shani's anger in one's horoscope is held to be the most difficult in one's life. On top of that, Saturn is described as the slowest-moving planet of all, which makes the problem worse. Transparent blue sapphire from Ceylon with VVS-quality inclusions is said to give the best results. If the stone suits the wearer, it is said to bestow immense wealth, great health, and unmatched success that grows manifold.
This document provides a bibliography of references related to information extraction and retrieval. It lists over 80 sources including journal articles, conference proceedings, book chapters, tutorials, and websites covering topics like text summarization, cross-language retrieval, information extraction techniques and systems, probabilistic models in IR, and using natural language processing for IR tasks. The references were compiled to provide background reading for a tutorial on information extraction and retrieval.
Deep generative and discriminative models for speech recognition. The document outlines the history of speech recognition models including early neural networks, hidden dynamic models, and deep belief networks. It describes how deep learning entered speech recognition around 2009 through the collaboration of Microsoft Research and academics. This led to replacing generative models with discriminative deep neural networks which achieved large error reductions. The talk outlines further innovations in deep learning for speech including context-dependent models and better optimization techniques.
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION IJCI JOURNAL
Text classification has become a basic task for many purposes, and much research has been done to develop automatic text classification for the majority of national and international languages. However, automated text classification systems for local languages are still needed. The main purpose of this study is to establish a novel automatic classification system for Pashto text. To this end, we collected Pashto documents and constructed a dataset. The study compares several statistical and neural network machine learning models, including DistilBERT-base-multilingual-cased, Multilayer Perceptron (MLP), Support Vector Machine, K Nearest Neighbor, decision tree, Gaussian naïve Bayes, multinomial naïve Bayes, random forest, and logistic regression, to discover the most effective approach. It also evaluates two feature extraction methods: bag of words and Term Frequency Inverse Document Frequency (TFIDF). The study obtained an average testing accuracy of 94% using the MLP classification algorithm and the TFIDF feature extraction method in single-label multi-class classification. Similarly, MLP+TFIDF showed the best result by F1-measure, at 0.81. Experiments with pre-trained language representation models (such as DistilBERT) for classifying Pashto texts show that a language-specific tokenizer is needed to obtain reasonable results.
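As a rough illustration of the TFIDF feature extraction mentioned in this abstract, the sketch below computes one common variant of the weighting with the standard library only. The abstract does not say which variant or toolkit the authors used, so the formula (term frequency normalized by document length, multiplied by the log of the inverse document frequency), the function name tfidf, and the toy English documents are assumptions made here for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc / doc length) * log(N / document frequency)
    This is one common TF-IDF variant; libraries often add smoothing.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical toy corpus; a real pipeline would tokenize Pashto text.
toy_docs = [
    "sports match today".split(),
    "election results today".split(),
]
vectors = tfidf(toy_docs)
print(vectors[0])
```

Terms that occur in every document (here "today") get weight zero, so a downstream classifier such as an MLP sees only the terms that help separate the classes.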
New research articles 2020 october issue international journal of multimedi...ijma
The International Journal of Multimedia & Its Applications (IJMA) is a bimonthly, open access, peer-reviewed journal that publishes articles contributing new results in all areas of multimedia and its applications. The journal focuses on all technical and practical aspects of multimedia and its applications. Its goal is to bring together researchers and practitioners from academia and industry to focus on understanding recent developments in this arena and on establishing new collaborations in these areas.
TUNING DARI SPEECH CLASSIFICATION EMPLOYING DEEP NEURAL NETWORKS IJCI JOURNAL
Recently, many researchers have focused on building and improving speech recognition systems to facilitate and enhance human-computer interaction. Today, Automatic Speech Recognition (ASR) systems have become an important and common tool in applications ranging from games to translation systems and robots. However, there is still a need for research on speech recognition systems for low-resource languages. This article deals with isolated word recognition for the Dari language, using the Mel-frequency cepstral coefficients (MFCCs) feature extraction method and three different deep neural networks, namely a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Multilayer Perceptron (MLP), as well as two hybrid models of CNN and RNN. We evaluate our models on our built-in isolated Dari words corpus, which consists of 1000 utterances for 20 short Dari terms. This study obtained an average accuracy of 98.365%.
The document discusses a study that developed a computerized material to help English learners acquire and use English phrasal verbs. The study found:
1) The software enhanced learners' acquisition of target phrasal verbs, especially their ability to understand figurative meanings.
2) Learners performed better on phrasal verbs containing "break" compared to those containing "bring" or "come" after using the software.
3) The software improved learners' comprehension of phrasal verbs on immediate post-tests and on delayed tests taken one week later, indicating its effects lasted over time.
This document provides an overview of swarm intelligence and various swarm intelligence algorithms. It defines swarm intelligence as the collective behavior of decentralized, self-organized systems, both natural and artificial. Examples of swarm intelligence in nature include ant colonies, bird flocking, fish schooling, and bacterial growth. Several swarm intelligence algorithms are described, including particle swarm optimization, ant colony optimization, artificial bee colony, bacterial foraging optimization, and gravitational search algorithm. These algorithms were inspired by behaviors observed in swarms in nature.
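To make one of the listed algorithms concrete, here is a minimal sketch of particle swarm optimization in its basic global-best form. It is not taken from the summarized document: the parameter values (inertia w, acceleration coefficients c1 and c2, swarm size, iteration count), the fixed random seed, and the sphere test function are illustrative assumptions.

```python
import random

def pso(objective, dim, bounds, n_particles=20, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

Each particle is steered only by its own best position and the swarm's best position, which mirrors the decentralized, self-organized behavior the overview describes; positions are not clamped to the bounds in this sketch.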
Exploring the Order of Precedence when Using Contextual Dimensions for Mobile...Periquest Ltd
This document summarizes a study exploring the order of precedence of contextual dimensions for delivering mobile information to students. It involved delivering information via RSS and Twitter and surveying students about contexts like location, time of day, and activity. Preliminary results showed students prioritized location over other contexts. The ongoing study will analyze dimensions in more realistic situations and develop mobile apps for additional contexts.
This document summarizes the agenda and key topics for a CIS 890 project final presentation on topics modelling with LDA. The presentation will cover LDA modelling, HMMLDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It will discuss topic modelling approaches like LDA, discriminative vs generative methods, and limitations of bag-of-words assumptions.
This document provides a summary of Satanjeev Banerjee's education, work experience, areas of research interest, publications, software projects, and references. It details his PhD studies in language technologies at Carnegie Mellon University, as well as his master's degrees from CMU and the University of Minnesota Duluth. His work experience includes research assistantships at CMU working on topics like speech summarization and meeting understanding. He has numerous publications in these areas and has developed software like the SmartNotes meeting recording system and the METEOR machine translation evaluation metric.
The document discusses the relationship between neurodiversity, specifically autism, and software development. It provides empirical evidence that autism occurs more frequently in families of engineers and scientists and that mothers of autistic children are more likely to work in technical fields. It explores historical studies of programmers that found traits common in autistic individuals, such as a preference for hierarchical decomposition and opportunistic problem-solving approaches. The document examines cognitive aspects of programming, like chunking, beacons, and schema, that play to the strengths of autistic thinking.
An Exploration of scientific literature using Natural Language Processing Theodore J. LaGrow
The document describes a study that used natural language processing to analyze scientific literature on specific topics and identify trends in technologies and methods used. The researchers extracted text from the first 100 articles returned for searches on Hawkes processes, galaxy evolution, T-cell receptor genomes, and natural language processing. NLP was used to identify noun phrases relating to predefined interesting words, and word clouds were generated to visualize frequencies. Results showed technologies and methods relevant to each topic. The researchers aim to continue improving the software to better connect researchers with useful tools.
A Questionnaire Developed For Conducting Fieldwork On Endangered And Indigeno...Martha Brown
This document presents a questionnaire for conducting fieldwork on endangered and indigenous languages in India. The questionnaire was developed through discussions with linguists and is designed to create dictionaries and basic grammars for documented languages. It includes sections on details of language experts, language vitality, diversity and attitudes, word and sentence lists, anthropological questions, and demographic profiling. The goal is to document languages in a standardized yet flexible way while balancing academic and community needs. Picture books and videos are used to elicit unique linguistic aspects for each language.
In recent decades, the Neo-Darwinian Synthesis has been quietly expanded to embrace the evolution of complex systems (living and non-living) and the information on which they are based (e.g., Adami 2011; Mayfield 2013). The expanded theoretical framework is especially appropriate—perhaps essential—for understanding the evolution of modern humans, who represent major changes in the way that information is stored, transmitted, translated, and manipulated (Maynard Smith and Szathmáry 1995). Modern humans may be distinguished from earlier forms of Homo by an enhanced faculty for manipulation of information (i.e., computation) that permits generation of a potentially infinite variety of combinations of hierarchically-organized units of information. This faculty is most commonly manifest in the computations that underlie spoken and unspoken language (Hauser et al. 2002), which may be considered a form of information technology. Spoken or imagined words are “material symbols” (Clark 2008) manipulated in the brain to facilitate complex computation in a manner analogous to the beads of an abacus.
If technology is viewed as a form of computation (i.e., manipulation of objects and materials), this faculty also is evident in the artifacts produced by modern humans, which exhibit an increasingly complex, hierarchical organization with a potentially infinite variety of combinatorial possibilities. Because the acquisition of syntactic language requires a lengthy "critical period" of exposure during childhood, the computational complexity of language appears to be linked to the significantly delayed maturation of the modern human brain (which is only 25% of its adult volume at birth). Greenfield (1991) found that the manipulation of objects exhibits increasing complexity (i.e., more hierarchical levels of organization) during childhood and noted overlap in areas of the brain activated for language and object manipulation. The enhanced faculty for manipulation of information and objects (i.e., increased computational complexity) found in modern humans is thus plausibly tied to the delayed growth of the brain and extended childhood, which begins to evolve after about 0.5 million years ago, but apparently is not comparable to that of living people until after 0.2 million years ago (Smith et al. 2007; Smith et al. 2010). The evolution of enhanced computational complexity in modern humans transformed existing systems of communication and technology, yielding an open-ended syntactic form of language and potentially infinite variety of hierarchically structured artifacts. Modern humans created new forms of information, including visual art (analog) and notation (digital), and colonized most terrestrial habitats on Earth by designing their own adaptive “traits” (e.g., tailored clothing) based on complex technological computations.
Natural Language Processing is a computational approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that analyzes, understands, and generates the languages that humans use naturally, so that people can address computers in those languages.
Deep Learning - What's the buzz all about Debdoot Sheet
Deep learning is a genre of machine learning algorithms that attempt to solve tasks by learning abstraction in data following a stratified description paradigm using non-linear transformation architectures.
Put simply, suppose you want a machine to recognize some Mr. X with Mt. E in the background; this is a stratified, or hierarchical, recognition task. At the base of the recognition pyramid are kernels that can discriminate flats, lines, curves, sharp angles, and color; higher up are kernels that use this information to discriminate body parts, trees, natural scenery, clouds, etc.; higher still are kernels that use this knowledge to recognize humans, animals, mountains, etc.; above those, kernels learn to recognize Mr. X and Mt. E; and finally the lexical synthesizer module at the apex says that Mr. X is standing in front of Mt. E. Deep learning is all about how to make machines synthesize this hierarchical logic and also learn these representative kernels by themselves.
Deep learning has been used extensively to solve these kinds of problems efficiently, in areas including handwritten character recognition (NYU, U Toronto), speech recognition (Microsoft, Google Voice), lexical ordered speech synthesis (Google Voice, iPhone Siri), object and poster recognition (Cortica), image retrieval (Baidu), content filtering (Youtube, Metacafe, Twitter), product visibility tracking (GazeMetrix), and computational medical imaging (IITKgp).
This talk will focus on the buzz around this topic and on how well the claims behind that buzz hold up.
TUNING DARI SPEECH CLASSIFICATION EMPLOYING DEEP NEURAL NETWORKS kevig
Recently, many researchers have focused on building and improving speech recognition systems to facilitate and enhance human-computer interaction. Today, Automatic Speech Recognition (ASR) system has become an important and common tool from games to translation systems, robots, and so on. However, there is still a need for research on speech recognition systems for low-resource languages. This article deals with the recognition of a separate word for Dari language, using Mel-frequency cepstral coefficients (MFCCs) feature extraction method and three different deep neural networks including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Multilayer Perceptron (MLP). We evaluate our models on our built-in isolated Dari words corpus that consists of 1000 utterances for 20 short Dari terms. This study obtained the impressive result of 98.365% average accuracy.
This document outlines the structure and contents of a literature review for a study exploring the effectiveness of using computer games for vocabulary learning. It discusses strategies for vocabulary learning, students' perceptions of using computer games, and the effectiveness of computer games compared to traditional memorization methods. The review covers incidental vs intentional learning, students' implicit evaluations of different learning methods, and how multimedia and engagement in computer games can facilitate vocabulary acquisition.
This document provides a list of references related to conversation analysis and ethnomethodology. It includes over 30 sources from the 1960s to the 2000s that examine the analysis of natural conversation and institutional talk, the use of language in legal and medical settings, and the study of social interaction and the organization of conversation. The references cover foundational works in conversation analysis and ethnomethodology as well as more recent applications of these approaches.
Este documento analiza el modelo de negocio de YouTube. Explica que YouTube y otros sitios de video online representan un nuevo modelo de negocio para contenidos audiovisuales debido al cambio en los hábitos de consumo causado por las nuevas tecnologías. Describe cómo YouTube aprovecha la participación de los usuarios para mejorar continuamente y atraer una audiencia diferente a la de los medios tradicionales.
This document provides a bibliography of references related to information extraction and retrieval. It lists over 80 sources including journal articles, conference proceedings, book chapters, tutorials, and websites covering topics like text summarization, cross-language retrieval, information extraction techniques and systems, probabilistic models in IR, and using natural language processing for IR tasks. The references were compiled to provide background reading for a tutorial on information extraction and retrieval.
Deep generative and discriminative models for speech recognition. The document outlines the history of speech recognition models including early neural networks, hidden dynamic models, and deep belief networks. It describes how deep learning entered speech recognition around 2009 through the collaboration of Microsoft Research and academics. This led to replacing generative models with discriminative deep neural networks which achieved large error reductions. The talk outlines further innovations in deep learning for speech including context-dependent models and better optimization techniques.
TUNING LANGUAGE PROCESSING APPROACHES FOR PASHTO TEXTS CLASSIFICATION IJCI JOURNAL
Nowadays, text classification for different purposes becomes a basic task for concerned people. Hence, much research has been done to develop automatic text classification for the majority of national and international languages. However, the need for an automated text classification system for local languages is felt. The main purpose of this study is to establish a novel automatic classification system of Pashto text. In order to follow this up, we established a collection of Pashto documents and constructed the dataset. In addition, this study includes several models containing statistical techniques and neural network neural machine learning including DistilBERT-base-multilingual-cased, Multilayer Perceptron, Support Vector Machine, K Nearest Neighbor, decision tree, Gaussian naïve Bayes, multinomial naïve Bayes, random forest, and logistic regression to discover the most effective approach. Moreover, this investigation evaluates two different feature extraction methods including bag of words, and Term Frequency Inverse Document Frequency. Subsequently, this research obtained an average testing accuracy rate of 94% using the MLP classification algorithm and TFIDF feature extraction method in single label multi-class classification. Similarly, MLP+TFIDF with F1-measure of 0.81 showed the best result. Experiments on the use of pre-trained language representation models (such as DistilBERT) for classifying Pashto texts show that we need a specific tokenizer for a particular language to obtain reasonable results.
New research articles 2020 october issue international journal of multimedi...ijma
The International Journal of Multimedia & Its Applications (IJMA) is a bi monthly open access peer-reviewed journal that publishes articles which contribute new results in all areas of the Multimedia & its applications. The journal focuses on all technical and practical aspects of Multimedia and its applications. The goal of this journal is to bring together researchers and practitioners from academia and industry to focus on understanding recent developments this arena, and establishing new collaborations in these areas.
TUNING DARI SPEECH CLASSIFICATION EMPLOYING DEEP NEURAL NETWORKSIJCI JOURNAL
Recently, many researchers have focused on building and improving speech recognition systems to facilitate and enhance human-computer interaction. Today, Automatic Speech Recognition (ASR) system has become an important and common tool from games to translation systems, robots, and so on. However, there is still a need for research on speech recognition systems for low-resource languages. This article deals with the recognition of a separate word for Dari language, using Mel-frequency cepstral coefficients (MFCCs) feature extraction method and three different deep neural networks including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Multilayer Perceptron (MLP), and two hybrid models of CNN and RNN. We evaluate our models on our built-in isolated Dari words corpus that consists of 1000 utterances for 20 short Dari terms. This study obtained the impressive result of 98.365% average accuracy.
The document discusses a study that developed a computerized material to help English learners acquire and use English phrasal verbs. The study found:
1) The software enhanced learners' acquisition of target phrasal verbs, especially their ability to understand figurative meanings.
2) Learners performed better on phrasal verbs containing "break" compared to those containing "bring" or "come" after using the software.
3) The software improved learners' comprehension of phrasal verbs on immediate post-tests and on delayed tests taken one week later, indicating its effects lasted over time.
This document provides an overview of swarm intelligence and various swarm intelligence algorithms. It defines swarm intelligence as the collective behavior of decentralized, self-organized systems, both natural and artificial. Examples of swarm intelligence in nature include ant colonies, bird flocking, fish schooling, and bacterial growth. Several swarm intelligence algorithms are described, including particle swarm optimization, ant colony optimization, artificial bee colony, bacterial foraging optimization, and gravitational search algorithm. These algorithms were inspired by behaviors observed in swarms in nature.
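Of the algorithms listed above, particle swarm optimization is the simplest to sketch. The following minimal version uses the common global-best formulation; the function name `pso`, the parameter values (inertia 0.7, acceleration coefficients 1.5), the search bound, and the sphere test function are illustrative assumptions and are not taken from the summarized document.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5, bound=5.0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    # Random initial positions inside [-bound, bound]; zero initial velocities.
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value.
    best_pos = [p[:] for p in pos]
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

solution, value = pso(sphere, dim=2)
```

No particle has a global view of the objective; the swarm converges only because each particle shares its best-known position, which mirrors the decentralized, self-organized behavior described above.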
Exploring the Order of Precedence when Using Contextual Dimensions for Mobile...Periquest Ltd
This document summarizes a study exploring the order of precedence of contextual dimensions for delivering mobile information to students. It involved delivering information via RSS and Twitter and surveying students about contexts like location, time of day, and activity. Preliminary results showed students prioritized location over other contexts. The ongoing study will analyze dimensions in more realistic situations and develop mobile apps for additional contexts.
This document summarizes the agenda and key topics for a CIS 890 project final presentation on topics modelling with LDA. The presentation will cover LDA modelling, HMMLDA modelling, LDA with collocations modelling, and experimental results on the NIPS collection. It will discuss topic modelling approaches like LDA, discriminative vs generative methods, and limitations of bag-of-words assumptions.
This document provides a summary of Satanjeev Banerjee's education, work experience, areas of research interest, publications, software projects, and references. It details his PhD studies in language technologies at Carnegie Mellon University, as well as his master's degrees from CMU and the University of Minnesota Duluth. His work experience includes research assistantships at CMU working on topics like speech summarization and meeting understanding. He has numerous publications in these areas and has developed software like the SmartNotes meeting recording system and the METEOR machine translation evaluation metric.
The document discusses the relationship between neurodiversity, specifically autism, and software development. It provides empirical evidence that autism occurs more frequently in families of engineers and scientists and that mothers of autistic children are more likely to work in technical fields. It explores historical studies of programmers that found traits common in autistic individuals, such as a preference for hierarchical decomposition and opportunistic problem-solving approaches. The document examines cognitive aspects of programming, like chunking, beacons, and schema, that play to the strengths of autistic thinking.
An-Exploration-of-scientific-literature-using-Natural-Language-ProcessingTheodore J. LaGrow
The document describes a study that used natural language processing to analyze scientific literature on specific topics and identify trends in technologies and methods used. The researchers extracted text from the first 100 articles returned for searches on Hawkes processes, galaxy evolution, T-cell receptor genomes, and natural language processing. NLP was used to identify noun phrases relating to predefined interesting words, and word clouds were generated to visualize frequencies. Results showed technologies and methods relevant to each topic. The researchers aim to continue improving the software to better connect researchers with useful tools.
A Questionnaire Developed For Conducting Fieldwork On Endangered And Indigeno...Martha Brown
This document presents a questionnaire for conducting fieldwork on endangered and indigenous languages in India. The questionnaire was developed through discussions with linguists and is designed to create dictionaries and basic grammars for documented languages. It includes sections on details of language experts, language vitality, diversity and attitudes, word and sentence lists, anthropological questions, and demographic profiling. The goal is to document languages in a standardized yet flexible way while balancing academic and community needs. Picture books and videos are used to elicit unique linguistic aspects for each language.
In recent decades, the Neo-Darwinian Synthesis has been quietly expanded to embrace the evolution of complex systems (living and non-living) and the information on which they are based (e.g., Adami 2011; Mayfield 2013). The expanded theoretical framework is especially appropriate—perhaps essential—for understanding the evolution of modern humans, who represent major changes in the way that information is stored, transmitted, translated, and manipulated (Maynard Smith and Szathmáry 1995). Modern humans may be distinguished from earlier forms of Homo by an enhanced faculty for manipulation of information (i.e., computation) that permits generation of a potentially infinite variety of combinations of hierarchically-organized units of information. This faculty is most commonly manifest in the computations that underlie spoken and unspoken language (Hauser et al. 2002), which may be considered a form of information technology. Spoken or imagined words are “material symbols” (Clark 2008) manipulated in the brain to facilitate complex computation in a manner analogous to the beads of an abacus.
If technology is viewed as a form of computation (i.e., manipulation of objects and materials), this faculty also is evident in the artifacts produced by modern humans, which exhibit an increasingly complex, hierarchical organization with a potentially infinite variety of combinatorial possibilities. Because the acquisition of syntactic language requires a lengthy "critical period" of exposure during childhood, the computational complexity of language appears to be linked to the significantly delayed maturation of the modern human brain (which is only 25% of its adult volume at birth). Greenfield (1991) found that the manipulation of objects exhibits increasing complexity (i.e., more hierarchical levels of organization) during childhood and noted overlap in areas of the brain activated for language and object manipulation. The enhanced faculty for manipulation of information and objects (i.e., increased computational complexity) found in modern humans is thus plausibly tied to the delayed growth of the brain and extended childhood, which begins to evolve after about 0.5 million years ago, but apparently is not comparable to that of living people until after 0.2 million years ago (Smith et al. 2007; Smith et al. 2010). The evolution of enhanced computational complexity in modern humans transformed existing systems of communication and technology, yielding an open-ended syntactic form of language and potentially infinite variety of hierarchically structured artifacts. Modern humans created new forms of information, including visual art (analog) and notation (digital), and colonized most terrestrial habitats on Earth by designing their own adaptive “traits” (e.g., tailored clothing) based on complex technological computations.
Natural Language Processing is a computerized approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that analyzes, understands, and generates the languages that humans naturally use to address computers.
Deep Learning - What's the buzz all aboutDebdoot Sheet
Deep learning is a genre of machine learning algorithms that attempt to solve tasks by learning abstraction in data following a stratified description paradigm using non-linear transformation architectures.
When put in simple terms, say you want to make the machine recognize some Mr. X with Mt. E in the background, then this task is a stratified or hierarchical recognition task. At the base of the recognition pyramid would be kernels which can discriminate flats, lines, curves, sharp angles, color; higher up will be kernels which use this information to discriminate body parts, trees, natural scenery, clouds, etc.; higher up will use this knowledge to recognize humans, animals, mountains, etc.; and higher up will learn to recognize Mr. X and Mt. E and finally the apex lexical synthesizer module would say that Mr. X is standing in front of Mt. E. Deep learning is all about how you make machines synthesize this hierarchical logic and also learn these representative kernels all by itself.
Deep learning has been extensively used to efficiently solve these kinds of problems from handwritten character recognition (NYU, U Toronto), speech recognition (Microsoft, Google Voice), lexical ordered speech synthesis (Google Voice, iPhone Siri), object and poster recognition (Cortica), image retrieval (Baidu), content filtering (Youtube, Metacafe, Twitter), product visibility tracking (GazeMetrix), computational medical imaging (IITKgp).
This talk will focus on the buzz around this topic and on how well that buzz holds up against the claims it makes.
This document outlines the structure and contents of a literature review for a study exploring the effectiveness of using computer games for vocabulary learning. It discusses strategies for vocabulary learning, students' perceptions of using computer games, and the effectiveness of computer games compared to traditional memorization methods. The review covers incidental vs intentional learning, students' implicit evaluations of different learning methods, and how multimedia and engagement in computer games can facilitate vocabulary acquisition.
This document provides a list of references related to conversation analysis and ethnomethodology. It includes over 30 sources from the 1960s to the 2000s that examine the analysis of natural conversation and institutional talk, the use of language in legal and medical settings, and the study of social interaction and the organization of conversation. The references cover foundational works in conversation analysis and ethnomethodology as well as more recent applications of these approaches.
This document analyzes YouTube's business model. It explains that YouTube and other online video sites represent a new business model for audiovisual content, driven by the change in consumption habits brought about by new technologies. It describes how YouTube leverages user participation to improve continuously and to attract an audience different from that of traditional media.
The defense was successful in portraying Michael Jackson favorably to the jury in several ways:
1) They dressed Jackson in ornate costumes that conveyed images of purity, innocence, and humility.
2) Jackson was shown entering the courtroom as if on a red carpet, emphasizing his celebrity status.
3) Jackson appeared vulnerable, childlike, and in declining health during the trial, eliciting sympathy from jurors.
4) Defense attorney Tom Mesereau effectively presented a coherent narrative of Jackson as a victim and portrayed Neverland as a place of refuge, undermining the prosecution's arguments.
Michael Jackson was born in 1958 in Gary, Indiana and rose to fame in the 1960s as the lead singer of The Jackson 5, topping music charts in the 1970s. As a solo artist in the 1980s, his album Thriller broke music records. In the 1990s and 2000s, Jackson faced several legal issues related to child abuse allegations while continuing to release music. He married Lisa Marie Presley and Debbie Rowe and had two children before his death in 2009.
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
This document appears to be a list of popular books from various authors. It includes over 150 book titles across many genres such as fiction, non-fiction, memoirs, and novels. The books cover a wide range of topics from politics to cooking to autobiographies.
The prosecution lost the Michael Jackson trial due to several key mistakes and weaknesses in their case:
1) The lead prosecutor, Thomas Sneddon, was too personally invested in the case against Jackson, having pursued him for over a decade without success.
2) Sneddon's opening statement was disorganized and weak, failing to effectively outline the prosecution's case.
3) The accuser's mother was not credible and damaged the prosecution's case through her erratic testimony, history of lies and con artist behavior.
4) Many prosecution witnesses were not credible due to prior lawsuits against Jackson, debts owed to him, or having been fired by him. Several witnesses even took the Fifth Amendment.
Here are three examples of public relations from around the world:
1. The UK government's "Be Clear on Cancer" campaign which aims to raise awareness of cancer symptoms and encourage early diagnosis.
2. Samsung's global brand marketing and sponsorship activities which aim to increase brand awareness and favorability of Samsung products worldwide.
3. The Brazilian government's efforts to improve its international image and relations with other countries through strategic communication and diplomacy.
The three most important functions of public relations are:
1. Media relations because the media is how most organizations reach their key audiences. Strong media relationships are crucial.
2. Writing, because written communication is at the core of public relations and how most information is
Michael Jackson Please Wait... provides biographical information about Michael Jackson including his birthdate, birthplace, parents, height, interests, idols, favorite foods, films, and more. It discusses his background, career highlights including influential albums like Thriller, and films he appeared in such as The Wiz and Moonwalker. The document contains photos and details about Jackson's life and illustrious music career.
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
The document discusses the process of manufacturing celebrity and its negative byproducts. It argues that celebrities are rarely the best in their individual pursuits like singing, dancing, etc. but become famous due to being products of a system controlled by wealthy elites. This system stifles opportunities for worthy artists and creates feudalism. The document also asserts that manufactured celebrities should not be viewed as role models due to behaviors like drug abuse and narcissism that result from the celebrity-making process.
Michael Jackson was a child star who rose to fame with the Jackson 5 in the late 1960s and early 1970s. As a solo artist in the 1970s and 1980s, he had immense commercial success with albums like Off the Wall, Thriller, and Bad, which featured hit singles and groundbreaking music videos. However, his career and public image were plagued by controversies related to allegations of child sexual abuse in the 1990s and 2000s. He continued recording and performing but faced ongoing media scrutiny into his private life until his death in 2009.
Social Networks: Twitter Facebook SL - Slide 1butest
The document discusses using social networking tools like Twitter and Facebook in K-12 education. Twitter allows students and teachers to share short updates and can be used to give parents a window into classroom activities. Facebook allows targeted advertising that could be used to promote educational activities. Both tools could help facilitate communication between schools and communities if used properly while managing privacy and security concerns.
Facebook has over 300 million active users who log on daily, and allows brands to create public profile pages to interact with users. Pages are for brands and organizations only, while groups can be made by any user about any topic. Pages do not show admin names and have no limits on fans, while groups display admin names and are limited to 5,000 members. Content on pages should aim to provoke action from subscribers and establish a regular posting schedule using a conversational tone.
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
Hare Chevrolet is a car dealership located in Noblesville, Indiana that has successfully used social media platforms like Twitter, Facebook, and YouTube to create a positive brand image. They invest significant time interacting directly with customers online to foster a sense of community rather than overtly advertising. As a result, Hare Chevrolet has built a large, engaged audience on social media and serves as a model for how brands can use online presences strategically.
Welcome to the Dougherty County Public Library's Facebook and ...butest
This document provides instructions for signing up for Facebook and Twitter accounts. It outlines the sign up process for both platforms, including filling out forms with name, email, password and other details. It describes how the platforms will then search for friends and suggest people to connect with. It also explains how to search for and follow the Dougherty County Public Library page on both Facebook and Twitter once signed up. The document concludes by thanking participants and providing a contact for any additional questions.
Paragon Software announces the release of Paragon NTFS for Mac OS X 8.0, which provides full read and write access to NTFS partitions on Macs. It is the fastest NTFS driver on the market, achieving speeds comparable to native Mac file systems. Paragon NTFS for Mac 8.0 fully supports the latest Mac OS X Snow Leopard operating system in 64-bit mode and allows easy transfer of files between Windows and Mac partitions without additional hardware or software.
This document provides compatibility information for Olympus digital products used with Macintosh OS X. It lists various digital cameras, photo printers, voice recorders, and accessories along with their connection type and any notes on compatibility. Some products require booting into OS 9.1 for software compatibility or do not support devices that need a serial port. Drivers and software are available for download from Olympus and other websites for many products to enable use with OS X.
To use printers managed by the university's Information Technology Services (ITS), students and faculty must install the ITS Remote Printing software on their Mac OS X computer. This allows them to add network printers, log in with their ITS account credentials, and print documents while being charged per page to funds in their pre-paid ITS account. The document provides step-by-step instructions for installing the software, adding a network printer, and printing to that printer from any internet connection on or off campus. It also explains the pay-in-advance printing payment system and how to check printing charges.
The document provides an overview of the Mac OS X user interface for beginners, including descriptions of the desktop, login screen, desktop elements like the dock and hard disk, and how to perform common tasks like opening files and folders. It also addresses frequently asked questions for Windows users switching to Mac OS X, such as where documents are stored, how to save or find documents, and what the equivalent of the C: drive is in Mac OS X. The document concludes with sections on file management tasks like creating and deleting folders, organizing files within applications, using Spotlight search, and an overview of the Dashboard feature.
This document provides a checklist for securing Mac OS X version 10.5, focusing on hardening the operating system, securing user accounts and administrator accounts, enabling file encryption and permissions, implementing intrusion detection, and maintaining password security. It describes the Unix infrastructure and security framework that Mac OS X is built on, leveraging open source software and following the Common Data Security Architecture model. The checklist can be used to audit a system or harden it against security threats.
This document summarizes a course on web design that was piloted in the summer of 2003. The course was a 3 credit course that met 4 times a week for lectures and labs. It covered topics such as XHTML, CSS, JavaScript, Photoshop, and building a basic website. 18 students from various majors enrolled. Student and instructor evaluations found the course to be very successful overall, though some improvements were suggested like ensuring proper software and pairing programming/non-programming students. The document also discusses implications of incorporating web design material into existing computer science curriculums.
Ido Dagan - Curriculum Vitae
Personal Data
Date of birth: September 7, 1960
Place of birth: Israel
Military service: 1978-1983
Address: 26a Shivtei Israel St., Ramat Hasharon 47267, Israel.
Home phone: +972-(0)3-5472410
Cell phone: +972-(0)54-395336
Email: dagan@cs.biu.ac.il
Education
1992: Ph.D. Computer Science, Technion – Israel Institute of Technology.
Thesis topic: Multilingual statistical methods for natural language disambiguation.
Supervisor: Prof. Alon Itai.
1986: B.Sc. Computer Science, Summa Cum Laude, Technion – Israel Institute of
Technology. On the Technion President’s List of Excellence 1984, 1985, 1986.
Employment
2002 – present: Vice President of Technology, LingoMotors (continued employment after
FocusEngine acquisition).
1998 - 2001: Founder, Chief Technology Officer and Director, FocusEngine (until acquired
by LingoMotors).
Comment: During my industrial employment I continued my academic activity with Bar Ilan
at a lower profile, mostly supervising graduate students and publishing papers with them, while
conducting some international activity (details below).
1996 - 1998: Visiting Lecturer, Dept. of Mathematics and Computer Science, Bar Ilan
University.
1994 - 1996: Post Doctoral Research Fellow, Dept. of Mathematics and Computer
Science, Bar Ilan University.
1992 - 1994: Member of Technical Staff, AT&T Bell Laboratories, Research Division.
1990 - 1991: Research Fellow, IBM Israel Scientific Center, Haifa.
1984 - 1986: CAD Software Engineer, INTEL Israel Ltd.
1978 - 1983: Israel Defense Forces. Software development and system analysis.
Teaching Experience
Teaching courses and seminars at Bar Ilan University: Algorithms 1, Natural Language
Processing (NLP), Empirical Methods for NLP, Information Retrieval.
Research Interests
Natural Language Processing (NLP):
• Empirical Natural Language Processing
• Machine learning methods for NLP
• Robust corpus-based semantic-level processing
• Applications for textual information access and extraction (such as information
extraction, question answering, text categorization), and multi-lingual applications
Publications
Journal Articles
1. Dagan, Ido, Martin C. Golumbic and Ron Y. Pinter. Trapezoid graphs and their
coloring, Discrete Applied Mathematics, 1988, Vol. 21, pp. 35-46.
2. Dagan, Ido and Alon Itai. Set expression based inheritance system, Annals of
Mathematics and Artificial Intelligence, 1991, Vol. 4(3-4), pp. 269-280.
3. Dagan, Ido and Alon Itai. Word sense disambiguation using a second language
monolingual corpus, Computational Linguistics, 1994, Vol. 20(4), pp. 563-596.
4. Dagan, Ido, John Justeson, Shalom Lappin, Herbert Leass and Amnon Ribak. Syntax
and lexical statistics in anaphora resolution, Applied Artificial Intelligence, 1995, Vol.
9, pp. 633-644.
5. Dagan, Ido, Shaul Marcus and Shaul Markovitch. Contextual word similarity and
estimation from sparse data, Computer Speech and Language, 1995, Vol. 9, pp.
123-152.
6. Dagan, Ido and Kenneth Church. Termight: Coordinating man and machine in
bilingual terminology acquisition, Machine Translation, 1997, Vol. 12(1-2), pp.
89-107.
7. Feldman, Ronen, Ido Dagan and Haym Hirsh. Mining text using keyword
distributions, Journal of Intelligent Information Systems, 1998, Vol. 10(3), pp.
281-300.
8. Dagan, Ido, Lillian Lee and Fernando Pereira. Similarity-based models of
cooccurrence probabilities, Machine Learning, 1999, Vol. 34(1-3) special issue on
Natural Language Learning, pp. 43-69.
9. Argamon, Shlomo, Ido Dagan and Yuval Krymolowski. A memory based approach
to learning shallow natural language patterns, Journal of Experimental and
Theoretical AI (JETAI), 1999, Vol. 11, pp. 369-390.
10. Argamon-Engelson, Shlomo and Ido Dagan. Committee-Based Sample Selection for
Probabilistic Classifiers, Journal of Artificial Intelligence Research (JAIR), 1999,
Vol. 11, pp. 335-360.
11. Marx, Zvika and Ido Dagan. Conceptual mapping through keyword coupled
clustering. Mind and Society: a Special Issue on Commonsense and Scientific
Reasoning, 2002, forthcoming (27 pages).
12. Marx, Zvika, Ido Dagan, Joachim M. Buhmann and Eli Shamir. Coupled clustering:
a method for detecting structural correspondence, Journal of Machine Learning
Research, 2002, forthcoming (29 pages).
Refereed Articles in Books
Comment: Four of the articles below (Nos. 1, 2, 4, 5) appear in refereed article collections
dedicated to original research results in specific areas, which were published as books
(similar to journal special issues). The fifth article (No. 3) is a refereed invited chapter in the
Handbook of Natural Language Processing.
1. Engelson, Sean and Ido Dagan. Sample selection in natural language learning, in S.
Wermter, E. Riloff and G. Scheler (Eds.), Connectionist, Statistical and Symbolic
Approaches to Learning for Natural Language Processing, Springer, 1996, pp.
230-245.
2. Dagan, Ido, Kenneth Church and William Gale. Robust bilingual word alignment for
machine aided translation, in S. Armstrong, K. Church, P. Isabelle, S. Manzi, E.
Tzoukermann and D. Yarowsky (Eds.), Natural Language Processing Using Very
Large Corpora, Kluwer Academic Publishers, 1999, pp. 209-224.
3. Dagan, Ido. Contextual Word Similarity, in Rob Dale, Hermann Moisl and Harold
Somers (Eds.), Handbook of Natural Language Processing, Marcel Dekker Inc, 2000,
Chapter 19, pp. 459-476.
4. Choueka, Yaacov, Ehud S. Conley and Ido Dagan. A comprehensive bilingual word
alignment system: application to disparate languages - Hebrew and English, in J.
Veronis (Ed.), Parallel Text Processing, Kluwer Academic Publishers, 2000, pp. 69–
96.
5. Dagan, Ido and Yuval Krymolowski. Compositional memory-based partial parsing, in
R. Bod, R. Scha and K. Sima'an (Eds.), Data-Oriented Parsing, CSLI Publications,
2002, forthcoming (20 pages).
Papers at Refereed Conferences and Workshops
1. Dagan, Ido and Alon Itai. Automatic Acquisition of Constraints for the Resolution of
Anaphora References and Syntactic Ambiguities, in Proceedings of COLING, 1990,
pp. 330-332.
2. Dagan, Ido and Alon Itai. A Statistical Filter for Resolving Pronoun References, in Y.
A. Feldman and A. Bruckstein (Eds.), Artificial Intelligence and Computer Vision,
Elsevier Science Publishers B.V., 1991, pp. 125-135 (Proceedings of the 7th Israeli
Symposium on Artificial Intelligence and Computer Vision, 1990).
3. Dagan, Ido, Alon Itai and Ulrike Schwall. Two languages are more informative than
one, in Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 1991, pp. 130-137.
(Extended version appears in journal article 3)
4. Dagan, Ido. Lexical disambiguation: Information sources and their statistical
realization, in Proceedings of the Annual Meeting of the Association for
Computational Linguistics (ACL) (Student Session), 1991, pp. 341-342.
5. Rackow, Ulrike, Ido Dagan and Ulrike Schwall. Automatic translation of noun
compounds, in Proceedings of COLING, 1992, pp. 1249-1253.
6. Dagan, Ido, Shaul Marcus and Shaul Markovitch. Contextual word similarity and
estimation from sparse data, in Proceedings of the Annual Meeting of the Association
for Computational Linguistics (ACL), 1993, pp. 164-171.
(Extended version appears in journal article 5)
7. Dagan, Ido, Kenneth Church and William Gale. Robust bilingual word alignment for
machine aided translation, in Proceedings of the Workshop on Very Large Corpora
(WVLC), 1993, pp. 1-8.
(Extended version appears in book article 2)
8. Dagan, Ido, John Justeson, Shalom Lappin, Herbert Leass and Amnon Ribak. Syntax
and lexical statistics in anaphora resolution, Bar-Ilan Symposium on Foundations of
AI, 1993.
(Extended version included in journal article 4)
9. Dagan, Ido, Fernando Pereira and Lillian Lee. Similarity-based estimation of word
cooccurrence probabilities, in Proceedings of the Annual Meeting of the Association
for Computational Linguistics (ACL), 1994, pp. 272-278.
(Extended version included in journal article 8)
10. Dagan, Ido and Kenneth Church. Termight: Identifying and translating technical
terminology, in Proceedings of the 4th Conference on Applied Natural Language
Processing (ANLP), 1994, pp. 34-40.
(Extended version appears in journal article 6)
11. Dagan, Ido and Sean Engelson. Committee-based sampling for training probabilistic
classifiers, in Proceedings of the Twelfth International Conference on Machine
Learning (ICML), 1995.
(Extended version included in journal article 10)
12. Dagan, Ido and Sean Engelson. Selective sampling in natural language learning, in
Proceedings of the IJCAI Workshop on New Approaches to Learning for Natural
Language Processing, 1995, pp. 41-48.
(Extended version appears in book article 1)
13. Feldman, Ronen and Ido Dagan. KDT - Knowledge Discovery in Texts, in
Proceedings of the First International Conference on Knowledge Discovery (KDD),
1995, pp. 112-117.
(Extended version included in journal article 7)
14. Feldman, Ronen and Ido Dagan. Knowledge Discovery in Textual Databases, in
Proceedings of the ECML Workshop in Knowledge Discovery, 1995.
15. Engelson, Sean and Ido Dagan. Minimizing Manual Annotation Cost in Supervised
Training from Corpora, in Proceedings of the Annual Meeting of the Association for
Computational Linguistics (ACL), 1996, pp. 319-326.
(Extended version included in journal article 10)
16. Dagan, Ido, Ronen Feldman and Haym Hirsh. Keyword-Based Browsing and
Analysis of Large Document Sets, in Proceedings of The Fifth Annual Symposium on
Document Analysis and Information Retrieval (SDAIR), 1996, pp. 191-208.
(Extended version included in journal article 7)
17. Feldman, Ronen, Ido Dagan and Willi Kloesgen. Efficient algorithms for mining and
manipulating associations in texts, in Proceedings of the Thirteenth European
Meeting on Cybernetics and Systems Research (EMCSR), 1996.
18. Dagan, Ido, Lillian Lee and Fernando Pereira. Similarity-based methods for word
sense disambiguation, in Proceedings of the Annual Meeting of the Association for
Computational Linguistics (ACL), 1997, pp. 56-63.
(Extended version included in journal article 8)
19. Dagan, Ido, Yael Karov and Dan Roth. Mistake-driven learning in text categorization,
in Proceedings of Second Conference on Empirical Methods in Natural Language
Processing (EMNLP-2), 1997.
20. Yamazaki, Takefumi and Ido Dagan. Mistake-driven learning with thesaurus for text
categorization, in Proceedings of the Natural Language Pacific Rim Symposium
(NLPRS-97), 1997.
21. Argamon, Shlomo, Ido Dagan and Yuval Krymolowski. Memory-based learning of
shallow natural language patterns, in Proceedings of the Annual Meeting of the
Association for Computational Linguistics (ACL), 1998.
(Extended version appears in journal article 9)
22. Marx, Zvi, Ido Dagan and Eli Shamir. Detecting Sub-Topic Correspondence through
Bipartite Term Clustering, in Proceedings of the ACL-1999 Workshop on
Unsupervised Learning in Natural Language Processing, 1999, pp. 45-51.
(Extended version included in journal article 11)
23. Krymolowski, Yuval and Ido Dagan. Compositional Memory-Based Partial Parsing,
in Proceedings of the Annual Meeting of the Association for Computational
Linguistics (ACL), 2000, pp. 45-52.
(Extended version appears in book article 5)
24. Marx, Zvika, Ido Dagan and Joachim M. Buhmann. Coupled Clustering: a method for
detecting structural correspondence, in Proceedings of the Eighteenth International
Conference on Machine Learning (ICML), 2001, pp. 353-360.
(Extended version appears in journal article 12)
25. Marx, Zvika, Ido Dagan and Eli Shamir. Cross-component clustering for template
induction, in Proceedings of the ICML Workshop on Text Learning (TextML), 2002,
pp. 66-75.
26. Dagan, Ido, Zvika Marx and Eli Shamir. Cross-dataset clustering: revealing
corresponding themes across multiple corpora, in Proceedings of the Sixth
Conference on Natural Language Learning (CoNLL), 2002, pp. 15-21.
Patents
1. Glossary construction tool, with co-inventor Kenneth Church, at AT&T. U.S. Patent
No. 5,850,561, filed September 23, 1994, issued December 15, 1998.
Co-inventor of the following three patent applications at FocusEngine, in the areas of text
categorization and its combination with other information access methods:
2. U.S. Patent Application No. 09/512,252, filed February 24, 2000
3. U.S. Patent Application No. 09/690,307, filed October 17, 2000
4. U.S. Patent Application No. 60/275,839, filed March 14, 2001
International Professional Activities
Journal Editorial Boards
1. Editorial Board of the Computational Linguistics (CL) journal, 1995 – 1997.
2. Editorial Board of the Machine Translation (MT) journal, 1999 – present.
Program Chairing
• Program co-chair of the Fourth ACL SIGDAT International Workshop on Very Large
Corpora (WVLC-4), Copenhagen, 1996.
International Conference Program Committees
1. ANLP 1994, ACL Conference on Applied Natural Language Processing.
2. ACL 1995, Annual Meeting of the Association for Computational Linguistics.
3. BISFAI 1995, Fourth Bar-Ilan Symposium on Foundations of Artificial Intelligence.
4. TMI 1997, International Conference on Theoretical and Methodological Issues in
Machine Translation.
5. WVLC 1997, Fifth ACL SIGDAT International Workshop on Very Large Corpora.
6. BISFAI 1997, Fifth Bar-Ilan Symposium on Foundations of Artificial Intelligence.
7. AAAI 1998, The Fifteenth National Conference on Artificial Intelligence (NLP track).
8. COLING/ACL 1998, Joint conference for COLING and the Annual Meeting of the
Association for Computational Linguistics.
9. COLING/ACL 1998 Student Session.
10. Computerm 1998, International Workshop on Computational Terminology.
11. ANLP 1999, ACL Conference on Applied Natural Language Processing.
12. ACL 2000, Annual Meeting of the Association for Computational Linguistics.
13. ANLP 2000, ACL Conference on Applied Natural Language Processing.
14. NAACL 2000, North-American Chapter of the ACL.
15. ACL 2001, Annual Meeting of the Association for Computational Linguistics.
16. NAACL 2001, North-American Chapter of the ACL.
17. ACL 2002, Annual Meeting of the Association for Computational Linguistics.
18. EMNLP 2002, Empirical Methods in Natural Language Processing.
19. COLING 2002.
20. Computerm 2002, International Workshop on Computational Terminology.
21. ACL 2003, Annual Meeting of the Association for Computational Linguistics (Area:
Machine Learning for Natural Language).
Reviewing
• Reviewing empirical NLP article submissions for the following journals:
a. Computational Linguistics
b. Machine Translation
c. Journal of Artificial Intelligence Research
d. Machine Learning
e. Natural Language Engineering
f. Annals of Mathematics and Artificial Intelligence
g. Information Processing and Management.
• Reviewing research grant proposals in the NLP area for:
a. Israel Science Foundation (ISF)
b. US-Israel Binational Science Foundation (BSF)
c. German-Israeli Foundation for Scientific Research and Development (GIF)
d. Israel Ministry of Science.
Invited Summer School Courses
1. Statistical Methods for Natural Language Processing.
ESSLLI – European Summer School on Language, Logic and Information, Lisbon,
Portugal, 1993.
2. Statistical Machine Translation.
ELSNET European Summer School on Language and Speech Communication: Corpus-
Based Methods, Utrecht, Holland, 1994.
3. Multilingual Corpus Processing.
ELSNET European Summer School on Language and Speech Communication:
Multilinguality in Speech and Language Processing, Edinburgh, Scotland, 1995.
4. Lexical Statistical Methods for Natural Language Processing.
ESSLLI – European Summer School on Language, Logic and Information, Saarbrücken,
Germany, 1998.
5. Text Mining.
ELSNET European Summer School on Language and Speech Communication: Text and
Speech Triggered Information Access, Xios, Greece, 2000.
Invited Talks and Panels
1. Panelist at the meeting of the European Expert Advisory Group on Language Engineering
Standards (EAGLES), Madrid, 1997. Topic: Bilingual alignment and lexicon acquisition.
2. Invited talk at the SPARKLE (Shallow PARsing and Knowledge Extraction for Language
Engineering) European project review, Pisa, 1998. Topic: Automatic thesaurus
construction.
3. Invited talk at the Bolzano (Italy) Workshop on Corpus-based Terminology, 1998. Topic:
Automated corpus-based acquisition of bilingual terminology.
4. Invited talk at TALN, Annual Meeting of the French Natural Language Processing
Association, 1999. Topic: Vector models in language processing.
5. Invited talk at the TELRI European Seminar (Trans-European Language Resources
Infrastructure), 1999. Topic: Automatic acquisition of multi-lingual resources.
Conference Tutorials
1. Bilingual Word Alignment and Lexicon Construction.
At the Annual Meeting of the Association for Computational Linguistics (ACL), 1996.
2. Bilingual Word Alignment and Lexicon Construction.
At the International Conference on Computational Linguistics (COLING), 1996.
3. Lexical Statistical Methods for Natural Language Processing.
At the joint COLING-ACL conference, 1998.
EACL Advisory Board
• Advisory Board of the European Chapter of the Association for Computational
Linguistics (EACL), 2003-2004.
Advising the EACL president and other officers on various issues, such as events,
projects and academia-industry collaboration.
Research Grants
1. Ido Dagan and Alon Itai. Statistical Methods for Disambiguation in Natural
Languages. Grant number 120-741 of the Israel Council for Research and
Development, 1988-1992.
2. Ido Dagan, Yaacov Choueka and Sean Engelson. Alignment of parallel bilingual
texts: Handling disparate languages and rich morphology and applying local syntactic
constraints. Grant number 488/95-1 of the Israel Science Foundation (ISF),
1995-1998.
3. Yaacov Choueka, Ido Dagan, Tomi Klein, Ariel Frank and Michael Elhadad. Taming
the information highway: Infrastructures and prototypes for intelligent textual
information handling, with special attention to Hebrew. Grant of the Israeli Ministry
of Science and the Arts for Scientific and Technological Infrastructure, 1995-1998.
4. Ronen Feldman, Ido Dagan, Willi Kloesgen and Stefan Wrobel. Generic
environment and high-level language for knowledge discovery in texts. Grant of the
German-Israeli Foundation for Scientific Research and Development (GIF),
1996-1999.
5. Ido Dagan, Ronen Feldman, Beatrice Daille and Yves Kodratoff. Term level text mining:
representations and algorithms. Grant of AFIRST – French-Israeli Scientific
Cooperation, 1996-1998.
6. Ido Dagan. Similarity and analogy in structured textual information processing. Grant
of the Israel Science Foundation (ISF), 1998-2001.
Supervising Graduate Students
M.Sc. Thesis
1.Shlomit Hazan (with Dr. Ronen Feldman): Discovery and clustering of association rules
in large data bases. 1997.
2.Erez Lotan: Automatic construction of a statistical thesaurus. 1998.
3.Alex Avramovitch: An Internet Crawler for automatic corpus and thesaurus construction.
1998.
4.Shelly Katz (with Dr. Ariel Frank): Intelligent information filtering within information
harvesting in the Internet. 1998.
5.Roman Mitnitsky: A personal search agent for Internet users. 1998.
6.Michal Finkelstein-Landau: Term-based summarization and knowledge discovery in
texts. 1999.
7.Marina Risher: Automatic query generation. 2001.
8.Ehud Conley: Seq_align: A parsing-independent bilingual sequence alignment algorithm.
2002.
9.Odelia Dayan: Automatic classification of text entities by machine learning methods.
(Thesis submission expected in early 2003)
Ph.D. Thesis
10.Zvika Marx (with Prof. Eli Shamir): Structure Based Computational Aspects of
Similarity and Analogy in Natural Language. (Towards completion – thesis submission
expected at the end of 2002)
11.Yuval Krymolowski (with Prof. Amihood Amir): Partial Parsing using Memory-Based
Sequence Learning. (Towards completion – thesis submission expected in early 2003)
12.Oren Glickman (with Prof. Moshe Koppel): Generic Shallow Semantic Inference based
on Automatic Knowledge Acquisition. (Ph.D. research proposal approved in 2002)
13.Maayan Gefet (with Dr. Dror Feitelson): Automatic construction of ontology from text.
(Ph.D. research proposal to be submitted by the end of 2002)