Invited talk at Processing ROmanian in Multilingual, Interoperational and Scalable Environments (PROMISE 2010) on how to port the QALL-ME framework to a new language
The document describes Smart-M3, an open source platform for building distributed applications using shared information. It discusses key concepts of Smart-M3 including lightweight implementation, accommodating existing systems, and emergent applications. The architecture uses Knowledge Processors that contribute and consume content according to ontologies, and Semantic Information Brokers that manage triples of information. The document provides examples of deploying Smart-M3 and its open source implementation components for Knowledge Processors and Semantic Information Brokers.
NYK Logistics provides a range of supply chain and logistics services including transportation, warehousing, and international services. The document outlines NYK's values of integrity, innovation, and intensity. It also describes NYK's medium-term management plan from 2008-2010 and various supply chain solutions and capabilities including visibility systems, transportation services, and specialized services for projects, international shipping, and Mexico.
The document describes how stock markets work using an allegory about villagers catching monkeys. A man buys monkeys from villagers for increasing prices, until his assistant sells the monkeys back to the villagers for a higher price and disappears with the money. This story illustrates how stock prices can rise and fall based on supply and demand, and how investors can lose money if they buy into a bubble. It then provides an overview of key concepts in equity markets like bull/bear markets, primary/secondary markets, and market indices.
Major brands face challenges with social media including generating direct revenue and dealing with large volumes of customer feedback and content. While early social media practices worked well for small businesses and allowed basic interactions, the landscape has changed dramatically with over 600 million Facebook users and increased expectations from marketing professionals. Best practices that used to work, like targeted Facebook ads and live chat events, need to evolve as consumers and the social ecosystem change. Moving forward, companies must focus on being human-centered in their social approach by prioritizing customer needs over brands and allowing experimentation.
Jean Fares is a fashion designer from Lebanon known for his elegant mixes of color. His designs have been worn by celebrities on red carpets for events like the Oscars, Golden Globes, and NAACP Image Awards. Fares aims to communicate Middle Eastern culture and values through his fashion while pushing boundaries from the Middle East to the global stage.
Gather - a wide range of vegan dishes that excited and pleased the taste buds! (Heena Modi)
Gather is a restaurant located at 2200 Oxford Street in Berkeley, California with the zip code 94704. Its website is gatherrestaurant.com where customers can find more information about the restaurant.
The document summarizes the life cycle of a broiler hen from hatching to slaughter. Newly hatched chicks are quickly transported on conveyor belts and crammed into small spaces to grow rapidly to meet demand for meat. However, their legs cannot support their fast-growing bodies. Workers roughly grab the fully-grown hens and cram them into crates, causing injuries, before transporting them to slaughter where machines stun, decapitate, and scald them while still alive at times due to inaccuracies.
I spent a wonderful vacation on the island of Zanzibar off the coast of Tanzania in August 2010. The island has beautiful beaches and a rich history that make it a popular tourist destination. During my trip I enjoyed relaxing on the beaches and learning about the island's culture.
Developing Cocoa Applications with macRuby (Brendan Lim)
This document provides an outline for a presentation on developing Cocoa applications with MacRuby. Some key points include:
- MacRuby allows developing desktop applications on Mac OS X using the Ruby language while still leveraging the Cocoa frameworks.
- It provides a way to write Ruby code that interacts directly with Objective-C and Cocoa with no translation layer, unlike RubyCocoa.
- Examples are shown of how basic Ruby constructs like strings and arrays map directly to their Objective-C counterparts like NSString and NSMutableArray in MacRuby.
- Tools like Xcode, Interface Builder, and Instruments can still be used for MacRuby application development. The HotCocoa library provides a simpler way to build user
LinkedIn is a professional networking platform launched in 2003 with over 48 million members worldwide. It allows users to create profiles, connect with colleagues and professionals in their fields, and expand their networks. The basic service is free for users, while business accounts offer additional features for a fee. LinkedIn targets affluent professionals globally, including job seekers and those wanting to stay connected with contacts.
The document discusses differences between programming languages. It notes that while there are thousands of programming languages, few become widely used. It explores various aspects that are used to compare programming languages, including whether they are object-oriented, use static or dynamic typing, support generic classes and inheritance. The document provides details on these concepts and how they differ between languages.
The document discusses fashion designer Jean Fares and his fashion house Jean Fares Couture. It provides details about Fares' background and philosophy, describes Jean Fares Couture's collections and ready-to-wear lines. It also lists many famous Hollywood stars and celebrities who have worn Jean Fares Couture designs, praising the brand's innovative couture gowns.
The document describes a proposed collaborative knowledge management (CKM) system powered by DatMobil and SpinCNet. The CKM system would allow users to send search requests that are analyzed, parsed, and delivered to multiple search locations. Results would then be received, processed, and made available to the original requester. The system is intended to securely, reliably, and automatically deliver matching information and documents in an auditable fashion across disconnected environments.
This document lists various sights and landmarks found in the state of Kansas, including the state flag and seal, counties, a hand dug well in Greensburg, botanical gardens in Wichita, the Cosmosphere in Hutchinson, Fort Larned, Mount Sunflower, an Oregon Trail marker, Pawnee Rock, Fort Scott, and the Kansas Vietnam Memorial. It also mentions wagon trails.
The document provides instructions on basic router configuration and commands. It covers topics like:
- Connecting to a router via terminal emulation programs and setting terminal settings
- Gathering information about the router and its interfaces
- Configuring basic settings like hostname, passwords, and descriptions
- Viewing, saving and managing configurations
- Recovering lost passwords by changing the configuration register
The document discusses the services provided by a software quality assurance company. It offers on-demand testing services, rapid assessments to evaluate clients' QA processes, and consulting to help clients reduce costs and improve quality. Clients have benefited from increased sales, reduced testing cycles and costs, and retaining QA knowledge and experts. The company aims to help clients maximize their QA investments.
This document contains descriptions of artworks created by Zoe Bent between 2007 and 2008. It includes sketches of celebrities, drawings of statues from museums in Europe, a charcoal still life, an oil painting based on another work, photographs taken during travels in London and on planes, and an action shot of the artist's sister selected for a student art show. The pieces demonstrate Bent's interest in mythology, skill in rendering light and shadows, and ability to capture moments in time through photographs.
The Prem ni Parab project was started to provide a good education to children in India in a loving way without fear, ensuring they learn basic numeracy and literacy. The project individualizes learning, makes good use of resources, and increases school attendance. It has been very successful in the schools it has worked with, with the government now requesting it expand to more schools to help reform education. The leader of the project assured ongoing support for originally involved schools as the project grows.
1. 532 people originally went to see the Harry Potter movie.
2. 67 people left halfway through because they didn't like it.
3. There were 213 kids in the audience.
The question asks how many people saw the whole film.
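The word problem summarized above reduces to a single subtraction, since the number of children in the audience does not affect how many people stayed to the end. A minimal sketch in Python, with variable names chosen here purely for illustration:

```python
# Figures taken from the problem statement.
total_audience = 532   # people who originally went to see the film
left_halfway = 67      # people who left halfway through
children = 213         # given in the problem, but not needed for the answer

# Everyone who did not leave saw the whole film.
saw_whole_film = total_audience - left_halfway
print(saw_whole_film)  # 465
```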
The document expresses regret for any flaws or statements in its content that are not aligned with Jain teachings. It apologizes for any offense caused knowingly or unknowingly through its statements. The document can be shared privately with friends if deemed appropriate, but is only for non-commercial, private use.
Warren Buffett, the second richest man in the world, who has donated $31 billion to charity, emphasizes living simply and investing for the long term. He still lives in the same house he purchased over 50 years ago, drives his own car, and does not have an entourage. His meeting with Bill Gates was supposed to last 30 minutes but ended up lasting 10 hours, with Gates becoming a devoted follower of Buffett's philosophy. Buffett advises focusing on bettering yourself through education and saving, avoiding unnecessary purchases and debt, and spending on others who are truly in need.
1) The document discusses the fear factors involved in remote-based development projects when Western initiatives meet Eastern cultures.
2) It identifies the dominant fear as losing strategic control over core operations, products, talent, and intellectual property.
3) To handle the fear factor, the presentation recommends creating social awareness, effective retention strategies, understanding different business models, adequate planning, and ensuring information security.
The Way Out Cafe is located at 3188 Mission St in San Francisco, between Valencia St and Powers Ave in the Bernal Heights and Mission neighborhoods. It can be reached at (415) 240-2743.
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition (Zachary S. Brown)
This document provides an overview of automatic speech recognition systems and their components. It discusses:
- The main components of ASR systems including preprocessing/feature extraction, acoustic models, and language models.
- Unique challenges of working with speech data like data volume, quality, and annotation.
- Common modeling approaches: historically hidden Markov models and n-gram language models, and more recently end-to-end neural networks such as Deep Speech, wav2vec, Conformer and Whisper.
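To make the n-gram language models mentioned in the list above more concrete, here is a minimal sketch of a bigram model estimated by simple counting (maximum likelihood, no smoothing). It is not taken from the presentation: the toy corpus, the function names and the `<s>`/`</s>` boundary markers are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count context words and bigrams over whitespace-tokenized sentences."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def bigram_prob(contexts, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen context."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Toy corpus, invented for this example.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
contexts, bigrams = train_bigram(corpus)
p_cat_given_the = bigram_prob(contexts, bigrams, "the", "cat")
```

In this toy corpus "the" is followed by "cat" in two of its three occurrences, so the estimate is 2/3. A real system would add smoothing so that unseen word pairs do not receive zero probability.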
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana... (Dr. Haxel Consult)
Advances in text mining, analytics and machine learning are transforming our applications and making them ever more powerful, yet most applications and platforms are designed to deal with a single (normalized) language. Hence, as our applications and platforms are increasingly required to ingest international content, the challenge becomes finding ways to normalize content to a single language without compromising quality. An extension of this question is how we define quality in this context and what, if any, by-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively and will show what current emerging technology offers, what to expect and what not to expect and provide an introductory practical guide to handling localization in the context of data mining and analytics.
This document provides an overview of using data mining and text mining techniques to analyze patent documents. It discusses text mining processes like preprocessing, transformation, and feature selection. It also demonstrates various visualizations that can be used, including word clouds, hierarchical clustering, and contour plots. Finally, it compares the R and KNIME tools for performing text mining and data analysis on patent documents.
IRJET - Language Linguist using Image Processing on Intelligent Transport Sys... (IRJET Journal)
This document summarizes a research paper that proposes a system to detect text in images of traffic signs, extract the text, and translate it to English. The system uses convolutional neural networks (CNNs) to detect text areas and recurrent neural networks (RNNs) to translate the extracted text. The goal is to help travelers understand traffic signs written in unfamiliar languages like Spanish or French by automatically translating the text in images to English. The system performs three steps: 1) detect text areas in images of signs, 2) extract the words from the detected text regions, and 3) translate the extracted text to English for the user.
The document summarizes a workshop on applying federated authentication standards like SAML to the GEOSS system. It introduces the COBWEB project and its goals of integrating crowdsourced environmental data. The workshop covered previous work using SAML, related work in GEOSS, and COBWEB's initial plans to pilot federated authentication for accessing data from multiple sources. Attendees were encouraged to participate in future COBWEB authentication activities.
Localize your business - Software Localization Services LocServ (Softengi)
LocServ is a localization department of Softengi that provides internationalization and localization services. They have over 15 years of experience in localization and a team of over 100 people. Their services include internationalization consulting and development, localization of software, websites, and documentation, and ongoing localization management.
LocServ - presentation of great localization and internationalization services (LocServ)
This document provides an overview of consulting services for internationalization, localization, and localization management. It describes assessments to analyze technical requirements, costs, and localization readiness. It also outlines services for internationalization development and testing, software and website localization and translation, and localization testing. The goal is to help clients expand their product or service into global markets.
traffic sign detection using deep learning.pptx (brijeshbs2)
This document discusses methods for traffic sign detection and classification using neural networks. It describes gathering and labeling a large dataset of street images containing various traffic signs and objects. A YOLO algorithm is used to detect regions of interest within images for classification by an R-CNN neural network. Results are evaluated based on accuracy and types of failures, such as false positives. Future work involves improving the dataset size and quality to increase detection accuracy.
Plone at Harvard School of Engineering and Applied Sciences (Jazkarta, Inc.)
The Harvard School of Engineering and Applied Sciences (SEAS) wanted to launch a dynamic network of websites that attracts prospective students and promotes academic activities both internally and externally. SEAS engaged Jazkarta, a Boston-based open source technology consultancy specializing in Plone, on a project to build a set of websites that achieve these goals. Jazkarta redesigned SEAS' existing public website, constructed an intranet site that allows SEAS to provide up-to-date information to their community of faculty, staff and students, and developed a facility for deploying faculty and lab subsites within the site infrastructure.
Mike Trachtman, Project Manager at Jazkarta, will present a case study of the project that covers development processes, designing highly available and scalable Plone site architectures, integrating/creating Plone components to satisfy functional requirements of .EDU websites, and repeatable deployment of customized Plone software solutions.
This paper proposes DaViT, a vision transformer architecture that uses both spatial and channel attention to efficiently capture global context. Spatial attention performs local interactions across spatial locations while channel attention captures global representations by attending to all spatial positions across channels. Together, they complement each other to achieve state-of-the-art performance on image classification, semantic segmentation, and object detection tasks, with linear computational complexity scaling to high-resolution inputs.
Mobile Multi-domain Search over Structured Web Data (AtakanAral)
Text-based web search, which is primarily designed for personal computers, can be enhanced and optimized when moving to mobile devices. New web search methods may let the user conduct a search without being hampered by the limitations of the device. Moreover, appropriate solutions may also exploit the advantages of such devices. This paper summarizes new trends and technologies in search, especially multi-domain and exploratory search, and demonstrates how they can best be applied to mobile environments.
http://link.springer.com/chapter/10.1007/978-3-642-34213-4_7
Linking Services and Linked Data: Keynote for AIMSA 2012 (John Domingue)
An overview of the approach, principles and technologies supporting how services and Linked Data can be combined to support the creation of applications.
This document discusses the ATLAS project, which aims to create language resources for automatic text-to-Italian Sign Language (LIS) translation. The project involves developing a translation model, technical platform, and corpus of over 3,000 annotated signs. The goal is to provide deaf individuals access to information through virtual sign language characters across different devices. Key objectives include performing feasible translation retrieval and setting signs properly in virtual space.
Adaptive streaming for immersive communication (Silvia Rossi)
Virtual reality (VR) endows any user with a sense of full immersion within a virtual environment. Omnidirectional or spherical video content is the new multimedia format that provides this immersive sensation. The viewer is placed at the centre of the sphere and dynamically changes the displayed portion of the spherical content, known as the viewport.
This new spherical multimedia format and this new interactive way of consuming the content have created novel challenges:
• a large volume of data to store, deliver and display,
• ultra-low-delay constraints over bandwidth-limited resources,
• uncertainty about which portion of the content the user will display.
Using Deep Learning at Scale - Guhan Suriyanarayanan and Adi Oltean, Microsoft (Guhan Suriyanarayanan)
Presentation at GTC2019 by Guhan Suriyanarayanan and Adi Oltean, describing some of the key techniques and trade-offs we use at Microsoft Bing to ship deep learning models at large scale.
Denovolab (www.denovolab.com) is a very high-performance SIP switching solution, suitable for call centers, wholesale termination, and carrier services.
The document discusses the importance of using consistent language and terminology when communicating about geospatial data and spatial data infrastructures. It notes that as data volumes increase, having a shared understanding of concepts and their meanings through formal definitions and ontologies becomes crucial for enabling data integration and transformation while maintaining quality. Rules-based approaches can help provide this semantic consistency needed for different stakeholders to effectively discuss and work with spatial data.
Semantically-aware Networks and Services for Training and Knowledge Managemen...Gilbert Paquette
This document discusses semantically-aware networks and services for training and knowledge management. It describes software developed at CICE/LICEF for building ontologies and semantically referencing resources to enable semantic search and personalized recommendations. The TELOS system uses competency descriptors and comparison methods to power rules-based recommender agents that are integrated into learning scenarios to provide adaptive assistance to users. Future work is aimed at experimental validation, improving group recommendations, automation, and integrating other recommendation methods.
Similar to Porting the QALL-ME framework to Romanian (20)
Tutorial given at RANLP 2015 in Hissar, Bulgaria
Recent years have seen lots of changes in the field of computational linguistics, most of them due to the widespread use of the Internet and the benefits and problems it brings. The first part of this tutorial will discuss these changes and will focus on crowdsourcing and how it influenced the creation of annotated data.
Annotation of data employed to train and test NLP methods used to be the task of language experts who had a good understanding of the linguistic phenomena to be tackled. Given that a large number of people now have access to the Internet, crowdsourcing has become an alternative way of obtaining annotated data. The core idea of crowdsourcing is that it is possible to design tasks that can be completed by non-experts and that the outputs of these tasks can be combined to obtain high-quality linguistic annotation, which would normally be produced by experts. Examples of how crowdsourcing was employed in computational linguistics will be given.
Big data is another trend in computational linguistics as researchers rely on more and more data for improving the results of a method. The second part of the tutorial will introduce the MapReduce programming model and show how it was used in processing language. Combined with processing larger quantities of data, the field of computational linguistics has applied deep learning to various tasks successfully, improving their accuracy. An introduction to deep learning will be provided, followed by examples of how it was applied to tasks such as learning semantic representations, sentiment analysis and machine translation evaluation.
From TREC to Watson: is open domain question answering a solved problem?Constantin Orasan
The document summarizes a presentation on question answering systems. It begins by providing context on information overload and defining question answering. It then discusses the evolution of QA systems from early databases to today's open-domain systems. The presentation focuses on IBM's Watson system, providing an overview of its unprecedented ability to answer open-domain questions as well as the massive resources required for its development. It concludes by arguing that open-domain QA remains unsolved and that closed-domain, interactive QA may be more practical for real-world applications.
The role of linguistic information for shallow language processingConstantin Orasan
The document discusses shallow language processing and summarization. It argues that while deep language understanding is limited, shallow methods can be improved by adding linguistic information. As an example, it shows how term frequency, anaphora resolution, discourse cues and genetic algorithms can select extractive summaries that better match human abstracts, without requiring full text comprehension.
What is Computer-Aided Summarisation and does it really work?Constantin Orasan
Computer-aided summarization (CAS) uses automatic methods to identify important information in documents, which humans can then edit to produce summaries. An evaluation of a CAS tool called CAST found that it reduced the time professional summarizers needed to produce summaries by 20% on average without significantly affecting summary quality. User feedback indicated the tool was most useful for identifying related sentences to include.
The document discusses automatic summarization and related disciplines. It defines summarization as the condensation of a source text into a shorter version by selecting key information. Automatic summarization involves producing summaries computationally. Related fields include automatic classification, keyword extraction, information retrieval, information extraction, and question answering, which all aim to organize and understand information from text.
The MESSAGE project aims to:
1) Develop tools to rapidly disseminate reliable emergency messages across Europe.
2) Ensure messages are comprehensible to facilitate response.
3) Propose making available a controlled language editing tool to allow quick and accurate editing of alerts.
Annotation of anaphora and coreference for automatic processingConstantin Orasan
This document discusses annotation of anaphora and coreference in corpora for computational linguistics. It covers several annotation schemes including MUC, which aimed to achieve high inter-annotator agreement by focusing on coreference between noun phrases. The NP4E corpus aimed to develop guidelines for annotating both noun phrase and event coreference in newspaper articles. Annotation is a time-consuming process that requires concentration to identify mentions and relations accurately. Guidelines must be clear and consistent to help annotators agree on how to mark up texts.
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapitolTechU
Slides from a Capitol Technology University webinar held June 20, 2024. The webinar featured Dr. Donovan Wright, presenting on the Department of Defense Digital Transformation.
How to Manage Reception Report in Odoo 17Celine George
A business may deal with both sales and purchases occasionally. They buy things from vendors and then sell them to their customers. Such dealings can be confusing at times. Because multiple clients may inquire about the same product at the same time, after purchasing those products, customers must be assigned to them. Odoo has a tool called Reception Report that can be used to complete this assignment. By enabling this, a reception report comes automatically after confirming a receipt, from which we can assign products to orders.
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
1. Porting the QALL-ME framework to Romanian
Constantin Orăsan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
29th March 2010
3. Structure of the presentation
1 Introduction
2 The QALL-ME project
3 Multilingual information access in QALL-ME
4 Conclusions
4. Need to access information
• as the Internet develops, more and more
information becomes available
• this information is in many languages
• fields from computational linguistics such as automatic
summarisation, question answering, text mining, etc. can help
people deal with information
6. Question answering (QA)
• Question answering aims at identifying the answer to a
question in a large collection of documents
• the information provided by QA is more focused than
that returned by information retrieval
• the output can be the exact answer or a text snippet which
contains the answer
• the field took off with the introduction of the QA
track at TREC, and cross-lingual QA with CLEF
9. Types of QA systems
• open-domain QA systems: can answer any question from any
collection
+ can potentially answer any question
- very low accuracy (especially in cross-lingual settings)
• canned QA systems: rely on a very large repository of
questions for which the answer is known
+ very little processing necessary
- limited to the answers in the database
• closed-domain QA systems: are built for very specific domains
and exploit expert knowledge in them
+ very high accuracy
- can require extensive language processing and are limited to one
domain
11. Purpose of the presentation
• briefly present the QALL-ME project
• show how it was adapted to answer questions in Romanian
about movies
13. The QALL-ME project
• QALL-ME = Question Answering Learning technologies in a
multiLingual and Multimodal Environment
• EU-funded project part of FP6
• 7 partners:
• FBK-irst, Italy
• University of Wolverhampton, UK
• University of Alicante, Spain
• DFKI, Germany
• Comdata, Italy
• UbiEST, Italy
• WayCom, Italy
• Web page: http://qallme.fbk.eu
14. The QALL-ME project
• aimed at establishing a shared infrastructure for multilingual
and multimodal QA in the domain of tourism
• In the QALL-ME system
• users ask natural language questions in several languages (both
as text and as speech) using a variety of input devices
(e.g. mobile phones), and
• the system returns a list of specific answers formatted in the most
appropriate modality: short texts, maps, videos or
pictures.
15. [Figure: QALL-ME architecture. A central QA planner connects the English, German, Spanish and Italian answer extractors. Other labelled components: local information sources, semantic representation, service provider, question type ontology, answer type ontology, speech recognizers, dialog models.]
16. Main outputs of the project
• an ontology for the domain of tourism
• entailment based QA framework
• the QALL-ME benchmark
• an entailment framework
(all accessible from the project’s web page:
http://qallme.fbk.eu)
17. The ontology
• A domain-specific ontology for the tourism domain was
developed and shared among all the partners.
• The ontology was used to serve as:
• bridge between different languages
• communication language between different components of the
system
• The ontology was linked to domain independent ontologies
such as MultiWordNet and Sumo
• For more information see (Ou et al., 2008)
18. Design of the ontology
• Analysis of data from content providers
• Analysis of user requirements
• Inspired by similar ontologies:
• Harmonise and eTourism: focus on static information (e.g.
accommodation and events/activities)
• Similar to eTourism in that it is written in OWL rather than RDFS
• but wider coverage
• Introspection
19. The ontology
• Main classes: Country, Destination, Site (i.e.
Accommodation, Attraction, Gastro, and Infrastructure),
Transportation, EventContent and Event
• Element classes: Facility, Room, PersonOrganization,
Language, and Currency
• Attribute classes: Contact, Location, Period and Price.
• Element and attribute classes cannot exist independently and
have to be attached to other main or element classes
20. [Figure: fragment of the ontology covering cinemas and movies. Classes shown: Site, Cinema, CinemaRoom, SiteFacility, RoomFacility, Contact, PostalAddress, GPSCoordinate, DirectionLocation, Event, MovieShow, EventContent, Movie, Director, Producer, Star, Writer, Price, TicketPrice, Currency, Period, TimePeriod, DatePeriod, DateTimePeriod. Properties shown: subClassOf, hasGPSCoordinate, hasPostalAddress, hasContact, hasSiteFacility, hasRoom, hasRoomFacility, isInSite, hasPrice, hasCurrency, hasPeriod, hasEventContent, hasTimePeriod, hasDatePeriod, hasDirector, hasProducer, hasStar, hasWriter, priceType, priceValue, name, description, certificate, synopsis, genre, startTime, endTime, startDate, endDate.]
21. The ontology
• Encoded using OWL DL, since it has more expressive power
than OWL Lite and has more efficient reasoning support than
OWL Full
• Used Protege-OWL as the editor and RacerPro7 as the
reasoner
• The ontology contains
• 122 classes (concepts),
• 55 datatype properties and
• 52 object properties which indicate the relationships among
the 122 classes.
• 15 top-level classes.
• The class hierarchy has a maximum depth of 4.
22. The QALL-ME framework
• is an architecture skeleton for multilingual QA systems for
closed domains
• designed in such a way that it allows fast development of
closed domain QA systems
• freely available from http://qallme.sourceforge.net/
• is based on a Service Oriented Architecture (SOA) which is
realised using web services
• relies on textual entailment recognisers
23. Web services
1 Context providers: are used to anchor questions in space
and time
2 Annotators: Currently three types of annotators are
available:
• named entity annotators which identify names of cinemas,
movies, persons, etc.
• term annotators which identify hotel facilities, movie genres
and other domain-specific terminology
• temporal annotators that are used to recognise and normalise
temporal expressions in user questions
3 Entailment engine: determines whether a user question
entails a retrieval procedure
4 Query generator: which relies on an entailment engine to
generate a query to extract the answer.
5 Answer pool: retrieves the answers from a database.
24. Context providers
• are used to anchor a question in space and time
• return the current position and time
• used by the presentation module when maps are displayed
• used by temporal processing to normalise temporal expressions
• determine which services are used in a cross-lingual scenario
• the context can be static or obtained from a mobile phone
25. Named entity and term annotators
• named entity recogniser = identifies names of hotels, movies,
persons, etc.
• term annotator = identifies domain specific terms such as
hotel facilities, movie genres, etc.
• the entities and terms are known, so the task is reduced to a
database look up
• Gazetteers are the main source for determining the entities
• The annotation module needs to determine the canonical form
of an entity
• the matching algorithm is greedy and combines character-based
similarity with a modified TF*IDF
• does not allow overlapping annotations, and there are few ambiguities
26. Named entity and term annotators
• Annotates both standard and non-standard entities: cinema,
movie, location, genre, certificate
• Needs to deal with noisy input:
• misspelt words/input from ASR engines/SMS input e.g.
becaming Jane, becoming Jade
• free word order (Will Smith / Smith, Will)
• equivalent strings (saw III / three / 3; Smith, Will / Smith,
W.)
• Needs to deal with questions in mixed languages
• Needs to deal with ambiguous entities
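The lookup of noisy mentions against a gazetteer can be sketched in Python as below. This is only an illustration: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the character-based similarity, leaves out the modified TF*IDF weighting entirely, and the gazetteer entries, the threshold and the function names are assumptions, not part of the QALL-ME code.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer holding the canonical forms of known movie titles.
GAZETTEER = ["Becoming Jane", "Saw III", "I Am Legend"]


def normalise(text):
    """Lowercase the text and sort its tokens so that word order is ignored."""
    tokens = text.lower().replace(",", " ").split()
    return " ".join(sorted(tokens))


def canonical_form(mention, gazetteer=GAZETTEER, threshold=0.8):
    """Return the gazetteer entry most similar to the mention, or None.

    The threshold is an assumed value; a real system would tune it.
    """
    best_entry, best_score = None, 0.0
    for entry in gazetteer:
        score = SequenceMatcher(None, normalise(mention), normalise(entry)).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    return best_entry if best_score >= threshold else None
```

With these assumptions, a misspelt mention such as "becaming Jane" and a reordered one such as "Jane, Becoming" both resolve to the same canonical entry, while a string unrelated to every entry returns None.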
27. Temporal annotator
• questions from the domain of tourism contain a large number
of temporal expressions
• we use a simplified version of the tagger implemented by
Puşcaşu (2004)
• the simplification was done to reduce the processing time
(Varga, Puşcaşu, and Orăsan, 2009)
• identifies both self-contained temporal expressions (TEs) and
indexical/under-specified TEs
• uses the TIMEX2 standard
• the output is used by the TIMEX2SPARQL service to restrict the
extracted answers
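Normalising an indexical expression against the time supplied by a context provider can be sketched as follows. The lexicon, the function name and the example date are assumptions made for illustration; the actual tagger is rule-based and covers far more expressions. The part-of-day suffixes (TNI for night, TMO for morning) follow the TIMEX2 value conventions.

```python
from datetime import datetime, timedelta

# Small illustrative lexicon (not the project's grammar): each expression
# maps to a day offset and a TIMEX2 part-of-day suffix.
INDEXICALS = {
    "today": (0, ""),
    "tonight": (0, "TNI"),
    "tomorrow": (1, ""),
    "tomorrow morning": (1, "TMO"),
}


def normalise_timex(expression, context_time):
    """Return a TIMEX2-style value for a known indexical expression, else None."""
    key = expression.lower().strip()
    if key not in INDEXICALS:
        return None
    offset, suffix = INDEXICALS[key]
    day = context_time.date() + timedelta(days=offset)
    return day.isoformat() + suffix


# The context provider is assumed to report this moment as "now".
now = datetime(2010, 3, 29, 15, 0)
tonight = normalise_timex("tonight", now)
```

The same question asked on a different day yields a different value, which is why the temporal annotator depends on the context provider.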
28. Entailment engine
• often closed-domain QA systems transform a question to a
Prolog fact or SQL query
• often this solution works only partially due to language
variability
• in QALL-ME this problem is solved using textual entailment
• the entailment engine determines whether two questions entail
the same meaning so they share the same retrieval procedure:
• T is the input question
• H is a textual pattern stored in a repository
• textual patterns have SPARQL retrieval procedures
• we calculate the similarity between the two sentences to determine
whether an entailment relation holds between them
29. Query generation service
• produces a SPARQL query that can be used to answer the
question
• has a list of question templates with their associated SPARQL
queries
• relies on the entailment engine to determine which of the
question patterns entail the same meaning as the user
question
• fills in the slots of the question patterns
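Filling the slots of a retrieval procedure can be sketched with Python's `string.Template`. The query below is hypothetical: the `ex:` prefix and the `isInDestination` property are invented for the example, the other property names are borrowed from the ontology diagram without any claim that the project's queries use them this way, and the temporal restriction is left out for brevity.

```python
from string import Template

# Hypothetical retrieval procedure for "What movies are on in [DESTINATION]?".
# The namespace and the graph shape are assumptions, not the project's query.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/tourism#>
SELECT DISTINCT ?title WHERE {
  ?show ex:hasEventContent ?movie ;
        ex:isInSite ?cinema .
  ?movie ex:name ?title .
  ?cinema ex:isInDestination ?dest .
  ?dest ex:name "$DESTINATION" .
}""")


def fill_slots(template, slot_values):
    """Substitute slot values into the template, escaping quotes and backslashes."""
    escaped = {
        name: value.replace("\\", "\\\\").replace('"', '\\"')
        for name, value in slot_values.items()
    }
    # substitute() raises KeyError if a slot in the template has no value.
    return template.substitute(escaped)


query = fill_slots(QUERY_TEMPLATE, {"DESTINATION": "Wolverhampton"})
```

Using `substitute` rather than `safe_substitute` is deliberate: a query with an unfilled slot should fail early instead of being sent to the answer pool.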
31. Example
User question (T): What movie can I see tonight in
Wolverhampton? → What movie can I see [TIMEX] in
[DESTINATION]?
List of patterns (H):
• Who is the director of [MOVIE]?
• Where can I see [MOVIE] [TIMEX]?
• What movies are on in [DESTINATION] [TIMEX]?
• What is the address of [CINEMA]?
• ...
Select the retrieval procedure associated with the matching pattern,
instantiated as: What movies are on in Wolverhampton tonight?
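The selection step in this example can be sketched as follows. It is a deliberately simple stand-in for the entailment engine, under two assumptions that are not taken from the slides: a pattern is a candidate only if it has exactly the same slot types as the annotated question, and candidates are ranked by Jaccard overlap of their words. All function names are hypothetical.

```python
import re

PATTERNS = [
    "Who is the director of [MOVIE]?",
    "Where can I see [MOVIE] [TIMEX]?",
    "What movies are on in [DESTINATION] [TIMEX]?",
    "What is the address of [CINEMA]?",
]

SLOT = re.compile(r"\[([A-Z]+)\]")


def slots(text):
    """Return the set of slot types, e.g. {'TIMEX', 'DESTINATION'}."""
    return set(SLOT.findall(text))


def words(text):
    """Return the lowercased words of the text, ignoring slot markers."""
    return set(re.findall(r"[a-z]+", SLOT.sub(" ", text).lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def select_pattern(question, patterns=PATTERNS):
    """Pick the pattern with the same slot types and the highest word overlap."""
    candidates = [p for p in patterns if slots(p) == slots(question)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: jaccard(words(question), words(p)))


best = select_pattern("What movie can I see [TIMEX] in [DESTINATION]?")
```

The slot filter matters here: on word overlap alone, "Where can I see [MOVIE] [TIMEX]?" would score higher than the intended pattern, even though it asks for a different kind of answer.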
32. Answer Pool service
• takes the SPARQL query generated by the query generator
and extracts the answer
• SPARQL is a query language for accessing RDF graphs, developed by the
W3C RDF Data Access Working Group
• SPARQL provides interoperability between languages
34. Cross-lingual QA
• the QALL-ME tourism prototype is designed to allow both
monolingual and cross-lingual QA
• relevant web services are activated depending on the source
and target language
• user scenario: a Romanian tourist in the UK who wants to find out
more about the movies in Wolverhampton
36. Prototype for Romanian
• we wanted to find out how long it takes to develop a demo for
Romanian
• components had to be adapted:
• named entity and term annotators had to be trained on a
different list of entities
• a simple temporal annotator was implemented on the basis of
the English one
• the language-independent, similarity-based entailment engine was reused
• the question patterns were translated to Romanian
• the answer pool did not require any change
• the whole process took under one week
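The division of labour described above, with some components shared and others supplied per language, can be pictured with the sketch below. The registry, its keys and the descriptive strings are purely illustrative and do not reflect how the framework actually configures its web services.

```python
# Components that need no change when a new language is added.
SHARED = {
    "entailment_engine": "language-independent similarity engine",
    "answer_pool": "SPARQL answer pool",
}

# Components that have to be supplied for each language.
PER_LANGUAGE = {
    "en": {
        "annotators": "English gazetteers",
        "temporal_annotator": "English temporal annotator",
        "question_patterns": "English question patterns",
    },
    "ro": {
        "annotators": "Romanian gazetteers",
        "temporal_annotator": "Romanian temporal annotator",
        "question_patterns": "Romanian question patterns",
    },
}


def services_for(language):
    """Return the full set of components used for questions in a language."""
    if language not in PER_LANGUAGE:
        raise KeyError(f"no components registered for language {language!r}")
    return {**SHARED, **PER_LANGUAGE[language]}


romanian = services_for("ro")
```

In this picture, porting to a new language amounts to adding one entry to the per-language table, which is consistent with the short development time reported.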
39. Conclusions
• multilinguality is a very important issue for the QALL-ME
project
• the ontology constitutes the bridge between languages
• the QALL-ME framework can be used to quickly develop
prototypes for other languages
42. Ou, Shiyan, Viktor Pekar, Constantin Orăsan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May 28-30.
Puşcaşu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26-28.
Varga, Andrea, Georgiana Puşcaşu, and Constantin Orăsan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29-32, Cluj-Napoca, Romania, July 2-4.