Computational Linguistics (Part II) .. FCIS'13 - ASU


Computational Linguistics Summary For FCIS'13- ASU Part (II)

Published in: Education, Technology

  1. 1. 1
  2. 2. - What is Computational Linguistics
     - Approaches to the Study of Computational Linguistics: Developmental, Structural, Production, Comprehension
     - Internet Linguistics
     - What is Internet Linguistics
     - Internet Linguistics Perspectives: Sociolinguistic, Educational, Stylistic, Applied
       • Multilingualism • Language Change • Conversation Discourse • Stylistic Diffusion • Metalanguage
     - Linguistic Future of the Internet
     - Translation Memory
     By: AbdoHelal
  3. 3. Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective.
     The field brings together language experts and computer scientists, and draws on the work of: 1- Linguists, 2- Mathematicians, 3- Computer scientists, 4- Experts in artificial intelligence, 5- Logicians, 6- Cognitive scientists, 7- Cognitive psychologists, 8- Psycholinguists.
     It has a theoretical component, which takes up issues in theoretical linguistics and cognitive science, and an applied component, which focuses on the practical outcome of modeling human language use.
     Computational linguistics originated with efforts in the United States in the 1950s.
  4. 4. Computational linguistics began as a new field of study devoted to developing algorithms and software for intelligently processing language data.
     Artificial intelligence emerged as a field in the mid-1950s and grew through the 1960s.
     The areas of language involved include:
     - Morphology: the grammar of word forms
     - Syntax: the grammar of sentence structure
     - Semantics: the study of meaning
     - Lexicon: the vocabulary, that is, words and their dictionary meanings
     - Pragmatics: the use of language in context
     Research within the scope of computational linguistics is carried out in computational linguistics departments. Some research aims to create working speech or text processing systems; other research aims to create systems that allow human-machine interaction.
     Conversational agents: programs meant for human-machine communication.
  5. 5. Developmental approach: examines language acquisition and development.
     Difficulties:
     1- Language takes a long time to learn.
     2- Learners are given only positive evidence (examples of correct forms), and this alone is insufficient.
     Language can be learned more efficiently when simple input is presented first and complexity is added incrementally.
     Contributions of the developmental approach:
     1- Neural networks.
     2- Robotic systems used to test linguistic theories: these robots are able to acquire functioning word-to-meaning mappings without needing grammatical structure.
     3- Prediction of future changes in language, and insight into the evolutionary history of modern-day languages.
  6. 6. Structural approach: one of the most important requirements for studying linguistic structure is the availability of large linguistic corpora.
     Penn Treebank: one of the most cited linguistic corpora, containing over 4.5 million words of American English. This corpus has been annotated for part-of-speech information.
     Contributions of the structural approach:
     1- It gives computational linguists a framework for working out hypotheses that further the understanding of language in several ways.
     2- It allows for the discovery and implementation of similarity recognition between pairs of text utterances.
     Structural data is available not only for English but also for other languages, such as Japanese.
     Computational methods allow scientists to parse large amounts of data reliably and efficiently, creating the possibility of discoveries that other approaches could not make.
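The "similarity recognition between pairs of text utterances" mentioned in slide 6 can be illustrated with a very simple measure. The sketch below is only one possible illustration, not the method used in the work the slide summarizes: it assumes word-overlap (Jaccard) similarity, and the names `tokenize` and `jaccard_similarity` are made up for this example.

```python
import re


def tokenize(utterance: str) -> set[str]:
    """Lowercase the utterance and return its set of distinct word tokens."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Shared distinct words divided by all distinct words (0.0 to 1.0)."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # assumption: two empty utterances count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # The two utterances share 4 of 6 distinct words.
    print(jaccard_similarity("The cat sat on the mat", "The cat lay on the mat"))
```

Word overlap ignores word order and meaning, so it is only a baseline; corpus-based work usually relies on richer representations.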
  7. 7. Production approach: a very complex approach, as it deals with all the skills a person needs to speak a language fluently.
     Comprehension is only half the battle of communication; the other half is how a system produces language.
     Alan Turing proposed the possibility that machines might one day be able to think. In 1950 he proposed an "imitation game" in which a human subject has two text-only conversations, one with a human and another with a machine attempting to respond as a human. If the subject cannot tell the machine from the human, it may be concluded that the machine is capable of thinking. Today this test is called the "Turing test".
     The ELIZA program is one of the earliest and best-known examples of a computer program designed to converse naturally with humans. It was developed by Joseph Weizenbaum at MIT in 1966.
     In an effort to improve computer translation, several methods have been compared, including: 1- Hidden Markov models, 2- Smoothing techniques, along with specific refinements of these for verb translation.
     Work in the production approach has also gone into making computers produce language in a more naturalistic manner, which makes human-computer interaction much more natural.
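To make the ELIZA idea from slide 7 concrete, here is a minimal sketch of a pattern-matching responder in the same spirit. It is not Weizenbaum's original script: the rules, the `respond` function and the default reply are all hypothetical, and the pronoun reflection that ELIZA performed (turning "my" into "your", for instance) is left out for brevity.

```python
import re

# Hypothetical rules: (pattern, reply template). Captured text is echoed back.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT_REPLY = "Please tell me more."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY  # no rule matched


if __name__ == "__main__":
    for line in ["I need a holiday.", "I am tired", "Hello there"]:
        print(line, "->", respond(line))
```

The program has no understanding of the input; it only rearranges matched text, which is why such systems can seem conversational for a short while and then break down.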
  8. 8. Comprehension approach: much of the focus of modern computational linguistics is on comprehension.
     Bayesian statistics were applied to the task of character recognition, as illustrated by Bledsoe and Browning in 1959, and were also applied to language analysis, including the work of Mosteller and Wallace in 1963.
     LUNAR is a system developed for NASA to answer written questions about the geological analysis of lunar rocks returned by the Apollo missions.
     Modeling language as a signal was achieved with the use of hidden Markov models, as detailed by Rabiner in 1989.
     Applications of the comprehension approach: 1- Topic identification, 2- Improved search engines, 3- Automated customer service, 4- Online education.
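Slide 8 mentions Bayesian statistics and topic identification. The sketch below shows how the two can fit together, assuming a naive Bayes classifier over words with add-one smoothing. The class name `NaiveBayesTopics` and the tiny training set are invented for illustration and are not taken from any of the studies cited on the slide.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopics:
    """Word-based naive Bayes topic classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts = defaultdict(Counter)  # topic -> word frequencies
        self.doc_counts = Counter()              # topic -> number of documents
        self.vocabulary = set()

    def train(self, text: str, topic: str) -> None:
        words = text.lower().split()
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocabulary.update(words)

    def classify(self, text: str) -> str:
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic in self.doc_counts:
            # log P(topic) + sum of log P(word | topic)
            score = math.log(self.doc_counts[topic] / total_docs)
            topic_total = sum(self.word_counts[topic].values())
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[topic][word] + 1
                score += math.log(count / (topic_total + len(self.vocabulary)))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


if __name__ == "__main__":
    model = NaiveBayesTopics()
    model.train("rocket moon orbit launch", "space")
    model.train("moon rock sample apollo", "space")
    model.train("goal match team score", "sport")
    model.train("team player league goal", "sport")
    print(model.classify("moon rock orbit"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.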
  9. 9. Internet linguistics is a subdomain of linguistics advocated by David Crystal.
     It studies the new language styles and forms that have arisen under the influence of the Internet and other new media, such as SMS text messaging, and more broadly human-computer interaction (HCI), computer-mediated communication (CMC) and Internet-mediated communication (IMC).
     Contribution of Internet linguistics: studying the emerging language of the Internet helps improve conceptual organization, translation and web usability, which benefits both linguists and web users.
     The four main perspectives of Internet linguistics are: Sociolinguistic, Educational, Stylistic, and Applied.
  10. 10. Sociolinguistic perspective: deals with how society views the impact of Internet development on language.
     The Internet changed the way people communicate and created new platforms with far-reaching social impact.
     Means of social communication: SMS, e-mail, chat groups, virtual worlds, and the Web.
     - Personal influence: CMC such as SMS text messaging and e-mailing has greatly enhanced instantaneous communication, for example on devices such as the BlackBerry and iPhone.
     - Influence on education: in schools it is common for students and educators to be given personalized e-mail accounts for communication and interaction, and classroom discussions are increasingly brought onto the Internet in the form of discussion forums.
     - Professional influence: it is common for companies to have their computers and laptops connected to the Internet, which facilitates internal and external communication. Mobile communication devices such as smartphones are increasingly making their way into the corporate world.
  11. 11. Themes studied under the sociolinguistic perspective:
     - Multilingualism: looks at the status of the various languages on the Internet.
     - Language change: explores how language has changed over time, with emphasis on Internet lingo.
     - Conversational discourse: explores changes in patterns of social interaction and communicative practice on the Internet.
     - Stylistic diffusion: studies the spread of Internet jargon and related linguistic forms into common usage.
     - Metalanguage and folk linguistics: looks at the way these linguistic forms and changes on the Internet are labeled and discussed.
  12. 12. Educational perspective: examines the impact of the Internet on formal language use.
     The rapid spread of Internet use has brought new features, such as:
     - An increase in the use of informal written language.
     - Inconsistent writing styles and the use of new abbreviations in Internet chats and SMS.
     - Technological limits on message length, which contributed to the rise of new abbreviations such as acronyms, for example "LOL" (laughing out loud), "GTG" (got to go) and "OMG" (oh my God).
     Disadvantages of Internet use:
     - Informal language and incorrect word choices in academic and formal situations, such as the casual word "guy" or the choice of "preclude" instead of "precede".
     - Use of abbreviations in academic work, such as "u" for "you" and "2" for "to".
     Advantages of Internet use:
     - The Internet offers potential benefits for language learners through its communication channels (e-mail, discussion forums, chat messengers, blogs and so on).
     - IMC allows greater interaction between language learners and native speakers of the language, providing better error correction and more opportunities to learn the standard language, and allowing learners to pick up special skills such as negotiation and persuasion.
  13. 13. Stylistic perspective: examines how the Internet and its related technologies have encouraged new and different forms of creativity in language.
     This new mode of language is interesting to study because it is a mixture of spoken and written language. Traditional writing is static compared with the dynamic nature of language on the Internet, where words can appear in different colors and font sizes on the screen.
     This new mode of language also contains elements not found in natural languages, for example the concept of framing found in e-mails and discussion forums.
     - Mobile phones (cell phones): have expressive potential beyond their basic communicative functions. The 160-character limit imposed on text messages has motivated users to exercise their linguistic creativity to overcome it. Cell phones have also created a new literary genre, the cell phone novel.
     - Blogs: blogging has brought about new ways of writing diaries. From a linguistic perspective, the language used in blogs is published for the world to see without undergoing a formal editing process. Blogs have become so popular that they have expanded beyond written blogs, with the emergence of photoblogs, videoblogs, audioblogs and mobile blogs.
     - Virtual worlds: provide insight into how users adapt their natural language for communication within these new media. CMC strategies used include capitalization of words such as "EMPHASIS", creative use of punctuation such as "??!?!?!", and the use of symbols such as asterisks to enclose words, as in "*Stress*". Virtual worlds are good tools for language learning among younger learners, who already see such places as a "place to learn and play".
  14. 14. Stylistic perspective (continued):
     - E-mail: one of the most popular Internet-related technologies, e-mail has expanded the stylistics of language in many ways. It is a hybrid of speech and writing in terms of format, grammar and style. E-mail is rapidly replacing traditional letter writing because of its convenience, speed and spontaneity.
     - Instant messaging: has developed its own acronyms and short forms. Instant messaging differs from e-mail and chat groups because it allows participants to interact in real time while conversing in private. It also shows more stylistic variation, because there can be a very wide age gap between participants.
  15. 15. Applied perspective: views the linguistic exploitation of the Internet in terms of its communicative capabilities, both the good and the bad.
     The Internet is a platform where speakers of minority and endangered languages seek to revive the use of their languages and to create awareness. It gives these languages opportunities to make progress in two important regards: 1- Language documentation, 2- Language revitalization.
     Language documentation:
     - The Internet facilitates language documentation.
     - Digital archives of media help preserve language documentation and allow global dissemination through the Internet.
     - Publicity about endangered languages has helped spur worldwide interest in linguistic documentation.
     - The Hans Rausing Endangered Languages Project (HRELP) seeks, among other things, to document endangered languages and to preserve and disseminate documentation materials.
     - Language Newsletter provides news and articles about topics in endangered languages.
  16. 16. Language revitalization:
     - The Internet facilitates language revitalization.
     - Virtual environments (e-mail, chats, instant messaging) have helped bridge the distance between communicators.
     - E-mail facilitates language revitalization in the sense that speakers of minority languages who have moved to a place where their native language is not spoken can use the Internet to communicate with family and friends, and so keep using their native language.
     - Leoki ("powerful voice"): a system developed in Hawaii whose content, interface and menus are entirely in the Hawaiian language.
     - Another use of the Internet is having students of minority languages write about their native cultures in their native language for a distant audience, in an attempt to preserve their language and culture.
  17. 17. Linguistic future of the Internet:
     - People will alter their language use to suit the dimensions of the communication medium.
     - The growing number of Internet users brings differences in cultural background, habits and language onto the Web.
     - The Internet is on its way to becoming a more diverse, multilingual web.
     - The interaction between English and other languages will be an important topic of study.
     - Minority languages will be promoted. However, they will also be affected by the majority languages.
     - Speakers of minority languages will be encouraged to learn the majority languages in order to access more resources.
     - The future of minority languages is in danger because of the spread of the Internet.
  18. 18. Translation memory (TM): a database that stores previously translated segments in order to aid human translators.
     - Each source text and its corresponding translation are stored together in "translation units".
     - Individual words are handled by "terminology bases".
     - Software programs that use TMs are called translation memory managers (TMM).
     - TM is used in CAT tools, word-processing programs and terminology management systems.
     - Many companies producing multilingual documentation use TM systems.
     How TMs work:
     1- The source text is broken into segments.
     2- The system looks for matches between these segments and the source half of previously stored translation units.
     3- Matching pairs are presented as translation candidates.
     4- The translator accepts a candidate, replaces it with a fresh translation, or modifies it to match the source.
     5- The database is saved.
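The steps in slide 18 can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular CAT tool works: segmentation is a naive split on sentence-ending punctuation, matching uses the standard-library `difflib.SequenceMatcher` ratio as a fuzzy score, and the names `split_segments`, `TranslationMemory` and the 0.75 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def split_segments(text: str) -> list[str]:
    """Step 1: break text into segments after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


class TranslationMemory:
    def __init__(self) -> None:
        self.units: dict[str, str] = {}  # source segment -> stored translation

    def add(self, source: str, target: str) -> None:
        """Step 5: store a translation unit for later reuse."""
        self.units[source] = target

    def candidates(self, segment: str, threshold: float = 0.75):
        """Steps 2-3: compare against stored sources, best match first."""
        scored = []
        for source, target in self.units.items():
            score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
            if score >= threshold:
                scored.append((score, source, target))
        return sorted(scored, reverse=True)


if __name__ == "__main__":
    tm = TranslationMemory()
    tm.add("Press the red button.", "Appuyez sur le bouton rouge.")
    for segment in split_segments("Press the green button. Close the door quietly."):
        # Step 4 is left to the translator: accept, edit, or translate afresh.
        print(segment, "->", tm.candidates(segment))
```

Only the source side is compared, so a near match still has to be checked and edited by the translator before it is saved back.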
  19. 19. Typical TMs only search for text in the source segment.
     Segments for which no match is found have to be translated manually and are then saved in the database.
     TMs work best on highly repetitive texts, such as technical manuals.
     Main benefits:
     1- Ensuring that the document is completely translated.
     2- Ensuring consistency, including common definitions and terminology.
     3- Allowing documents in various formats to be translated.
     4- Accelerating the overall translation process.
     5- Reducing time and cost.
  20. 20. Main obstacles:
     1- Recycling translations loses an important principle of translation: "taking the message from the text".
     2- Not all file types are supported.
     3- TMs do not help with texts that lack repetition.
     4- The quality of the translated text is not guaranteed.
     5- The text is handled sentence by sentence, rather than as a whole meaning.
     6- The software is expensive, and the cheaper the software, the fewer features it offers.
     The rest of these obstacles are covered in the book.
  21. 21. Special thanks to: Farah El-Mowaled. Created by: AbdoHelal