Context-based Web search has become an important research area, and many strategies have been proposed to reflect contextual information in search queries. Despite the success of some of these proposals, they still have serious limitations due to their inability to bridge the terminology gap between the user's context description and the vocabulary of the relevant documents. This paper presents a quantitative technique to learn vocabularies useful for describing the theme of a context under analysis. The enriched vocabulary allows the formulation of search queries that identify resources with higher precision than those identified using the initial vocabulary. Rigorous experimentation leads us to conclude that the proposed technique is superior to a baseline and to other well-known query reformulation techniques.
Learning Better Context Characterizations: An Intelligent Information Retriev..., by Carlos Lorenzetti
This paper proposes an incremental method that an intelligent system can use to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. From these terms, a set of queries is built and submitted to a search engine. New documents and terms are then used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one when constructing queries to retrieve relevant material.
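The refinement loop described above can be sketched in a few lines. This is only an illustrative skeleton: the scoring scheme (plain term frequency over retrieved documents), the parameter names, and the `search` callable are stand-ins, not the paper's actual method.

```python
# Illustrative sketch of an incremental vocabulary-refinement loop.
# `search` is assumed to map a list of query terms to a list of
# documents, each given as a list of lowercase tokens.
from collections import Counter

def refine_vocabulary(seed_terms, search, rounds=3, query_size=3, vocab_size=10):
    """Iteratively grow a topic vocabulary from retrieved documents."""
    vocabulary = Counter({t: 1.0 for t in seed_terms})
    for _ in range(rounds):
        # Build a query from the currently highest-weighted terms.
        query = [t for t, _ in vocabulary.most_common(query_size)]
        # Fold the terms of the retrieved documents back into the vocabulary.
        for doc in search(query):
            vocabulary.update(doc)
        # Keep only the strongest terms for the next round.
        vocabulary = Counter(dict(vocabulary.most_common(vocab_size)))
    return [t for t, _ in vocabulary.most_common(vocab_size)]
```

With a toy corpus, seeding the loop with a single term pulls in co-occurring terms from the documents the seed retrieves, which is the intuition behind the paper's approach.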
K-repeating Substrings: a String-Algorithmic Approach to Privacy-Preserving P..., by Yusuke Matsubara
De-identifying textual data is an important task for publishing and sharing data among researchers while protecting the privacy of the individuals referenced therein. While supervised learning approaches have been applied successfully to this task in the clinical domain, existing methods are hard to transfer to other domains and languages because preparing the necessary linguistic resources takes considerable cost and time. This paper presents an efficient unsupervised algorithm to detect all substrings occurring fewer than $k$ times in the input string, based on the assumption that such rare sequences are likely to contain sensitive information, such as names of people or rare diseases, that may identify individuals. The proposed algorithm runs in time linear in the input size, both asymptotically and empirically, when $k$ is a constant. Empirical evaluation on the \emph{i2b2} (Informatics for Integrating Biology and the Bedside) dataset shows the effectiveness of the algorithm in comparison to baselines that use simple word frequencies.
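The notion of a "rare substring" can be illustrated with a brute-force version of the idea. The paper's algorithm achieves linear time with string-algorithmic techniques; this quadratic-time sketch only demonstrates what it means for a substring to occur fewer than $k$ times.

```python
# Naive sketch: find the shortest substrings occurring fewer than k
# times in the input. This is O(n^2) and only illustrates the concept;
# the paper's actual algorithm runs in linear time for constant k.
from collections import Counter

def rare_substrings(text, k):
    """Return the shortest substrings occurring fewer than k times."""
    n = len(text)
    for length in range(1, n + 1):
        counts = Counter(text[i:i + length] for i in range(n - length + 1))
        rare = sorted(s for s, c in counts.items() if c < k)
        if rare:
            return rare
    return []
```

In a de-identification setting, spans of text containing such rare substrings would be the candidates for redaction.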
THE EFFECT OF STICK FIGURE ON STUDENTS’ VOCABULARY ENRICHMENT, by Reny Eka Sari
This document discusses a study on the effect of using stick figures to teach vocabulary to elementary school students. It describes how most elementary students have difficulty remembering and translating words between their first and target languages. The study aims to see if using stick figures can help students better remember and understand word meanings, by focusing on job and hobby vocabulary. It will use a pre-test post-test control group design to test the null hypothesis that stick figures have no effect on vocabulary enrichment. The population is 2nd grade students at a school in Cibinong, with a sample of 20% randomly selected from each class.
Using Internet Resources and Digital Technology in Language Teaching, by Víctor González
AGIS presentation in Hannover, February 2009.
Web 2.0 is an innovative educational tool that is changing how teaching is done in the 21st century. Its potential is limitless, and so is its social and behavioral power. Creativity, participation and collaboration are only some of the key elements of Web 2.0; there are more.
This presentation will explore the possibilities that the internet offers a language teacher, from web quests to online exercises and other powerful resources. We will also look at some simple, useful digital video tools and podcasts that students can both use and create in class.
The document discusses various memory and study aids for vocabulary, including visual aids like drawings, diagrams, charts and graphic organizers; word associations; and flashcards. It recommends creating flashcards with the word, part of speech, definition, and example sentence on both sides. The document provides tips for effective use of flashcards and guidelines for studying vocabulary, such as studying in short sessions throughout the day and using active learning strategies like making up example sentences.
The document summarizes a study on the effects of small reading groups on student engagement and motivation, particularly for English as an Additional Language (EAL) students. It describes the school demographics, main setting of reading groups, literature review on relevant topics, focus students and data collection methods. Key findings indicate that classroom arrangement, technology, hands-on learning and understanding vocabulary can motivate students, while cultural and parental influences impact learning. The conclusion discusses implications for engaging students intrinsically and extrinsically in reading groups through diversity and parental involvement.
This document provides instructions for using Microsoft Excel. It explains how to open Excel, what spreadsheets are used for, how to enter and edit data in cells, common errors like disabled macros and how to fix them, and who to contact for help.
This document discusses vocabulary enrichment services that could be provided through the LoCloud project. It introduces web services and vocabulary standards like SKOS that could be used to build shared multilingual vocabularies. Examples of existing vocabulary management tools are also presented that could serve as a model for the experimental LoCloud application to enable local institutions to collaborate on vocabularies for local history and archaeology. The outcomes of a previous workshop are summarized, including suggestions for importing existing open vocabularies and guidelines for using the new shared vocabulary tool.
CTL (Contextual Teaching and Learning), by Sary Nieman
CTL is called a contextual approach because it is a concept of learning in which teachers associate lesson content with real-world situations and encourage students to connect the knowledge they hold with its application in their lives as members of the community.
Series 16 - Attachment 4 - Momin Chetamani - Gujarati, by Satpanth Dharm
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
This document discusses how the speed of light can be calculated from verses in the Quran. It describes a physicist's calculation showing that the speed of light (C) is equal to 299792.5 km/sec based on the distance traveled by the moon in one sidereal month. This value matches modern measurements of the speed of light, providing evidence that the Quran contained knowledge of modern physics centuries before its discovery.
This document discusses facts and myths about asthma. It begins by stating that asthma is not "all in the mind" but emotional triggers can cause flare-ups. It also notes that while asthma symptoms may become inactive in teenage years for some children, it cannot be outgrown. The document emphasizes that asthma cannot be cured but can be controlled with medical treatment and underscores the seriousness of the condition. It confirms several triggers of asthma attacks and notes that medications used to treat asthma are not habit-forming or addictive. Overall, the document provides information to distinguish true and false statements about the nature, causes, and treatment of asthma.
The document provides an executive summary of the Generation Peace Youth Network General Assembly held from February 23-25, 2012. It summarizes the activities and programs conducted by GenPeace from 2010-2011. It also outlines the members' assessment of GenPeace's work and recommendations for future activities and programs from 2012-2014. Key topics discussed at the assembly included updates on the Mindanao and NDF peace processes, presentations on institutional development and capacity building, and the planning of GenPeace's advocacy campaigns and projects for peace in the coming years.
The document presents a weekly schedule that includes the activities of getting up, having breakfast, playing, eating, having a snack, and sleeping for each day of the week, Monday through Saturday.
Find-A-Church.org is a directory that receives 300,000 monthly visitors seeking information on congregations. The site allows churches to welcome visitors, display ministries and activities, introduce members, and link to their website. Churches can easily update their profile by providing contact information, details about services and schedules, and descriptions of ministries to help potential congregants learn more about their community.
The document offers inspiration for university students to stay motivated in their studies even while busy running a business. It suggests changing one's mindset, finding a spiritual motivation, and being confident in one's creative ideas so that business and study can proceed side by side.
This document discusses the transition from Blackboard 8 to GCU Learn (Blackboard 9.1) at Glasgow Caledonian University. It highlights several key differences and new features in GCU Learn, including the inclusion of social media tools like blogs, wikis and podcasts. GCU Learn also allows for the integration of external media from sites like YouTube, Flickr and SlideShare. Additionally, GCU Learn introduces Communities which allow groups to share documents, participate in social media, and communicate online, as well as a content system to easily collect, share and discover learning objects within GCU Learn. The overall goal of the transition is to provide a richer learning environment for students and staff through these enhanced tools and resources.
The European Union is concerned about the environmental impact of disposable plastic and plans to ban items such as straws, cutlery, and plates by 2021. The ban aims to reduce plastic pollution in the oceans and to promote more sustainable alternatives. EU countries will have to find eco-friendly substitutes for these single-use plastic items.
Kevin Suttle is a channel developer for litl and has contributed to various tech publications and projects. He discusses the importance of considering context when developing applications, noting that context includes more than just screen size. Context refers to the user's surroundings, capabilities of their device, time sensitivity, social interactions and more. He advocates developing "contextual applications" or "contextual experiences" that account for these various contexts rather than focusing only on devices or screen sizes.
A presentation on my early work on the Mastro system. Some of this research is now part of the ontop system, some evolved into more optimised forms (also in ontop).
LLMs in Production: Tooling, Process, and Team Structure, by Aggregage
Join Dr. Greg Loughnane and Chris Alexiuk in this exciting webinar to learn all about the tooling, processes, and team structure you need to build and operate performant, reliable, and scalable production-grade LLM applications!
This document proposes improvements to the StackOverflow website by modeling question and answer data in Elasticsearch. It describes input question and answer data containing fields like post_id, tags, and user_id. It models this data in Elasticsearch indexes with types and documents corresponding to tables and rows. It shows how to query this data, such as calculating the probability a question is answered within 10 minutes based on its tags. It also models user-tag relationships and computes tag popularity. Finally, it describes a data pipeline to sample historical data, compute probabilities, and construct a graph of users and the types of questions they answer.
This document proposes improvements to the StackOverflow website by modeling question and tag data using Elasticsearch. It describes input data formats for questions and answers, and models the data with indexes, types and documents. It shows how to build tag graphs to calculate tag similarities and recommend related tags to users. Queries are proposed to calculate the probability of questions being answered within a time period based on tags, using stratified sampling. The document introduces the author and provides an overview of the data pipeline and modeling approach.
This document proposes using Elasticsearch to model and analyze data from StackOverflow. It describes modeling questions and answers as documents with fields like question_id, tags, answer_time. Tags would be indexed to analyze relationships between tags and recommend related tags to users. Graph structures and equations are proposed to calculate the similarity between tags based on the number of users who answered questions with both tags and the maximum weight of paths connecting tags. Stratifying sampling by month and tags is suggested to estimate the probability a question with a tag will be answered within 10 minutes.
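The stratified probability estimate described in the Elasticsearch summaries above can be sketched in plain Python. The field names (`tags`, `month`, `answered_in_10m`) are illustrative stand-ins, not the actual StackOverflow schema, and a real system would run this as an aggregation over the index rather than in memory.

```python
# Toy version of the estimate: P(answered within 10 minutes | tag),
# stratified by month so no single month dominates the average.
from collections import defaultdict

def answer_probability(questions, tag):
    """Average the per-month answer rates for questions carrying `tag`."""
    per_month = defaultdict(lambda: [0, 0])  # month -> [answered, total]
    for q in questions:
        if tag in q["tags"]:
            stats = per_month[q["month"]]
            stats[0] += q["answered_in_10m"]
            stats[1] += 1
    if not per_month:
        return 0.0
    rates = [answered / total for answered, total in per_month.values()]
    return sum(rates) / len(rates)
```

Averaging per-stratum rates, rather than pooling all questions, is the point of the stratified sampling mentioned above: it keeps high-volume months from swamping the estimate.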
Techniques For Deep Query Understanding, by Abhay Prakash
The document summarizes techniques for deep query understanding in search systems. It discusses query understanding, which involves understanding a user's information need from their query. This allows for query correction, suggestion, expansion, classification and semantic tagging. Query correction reformulates ill-formed queries. Query suggestion provides similar queries. Query expansion adds synonyms to broaden results. Query classification determines the intent or topic of the query. Semantic tagging identifies entities in the query. The document outlines various models for these techniques, including using contextual information and graph representations of search logs.
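The query-expansion step mentioned above can be illustrated with a minimal sketch. The synonym table here is a hand-made stand-in for what a real system would mine from search logs or a thesaurus.

```python
# Minimal query expansion: broaden a query by adding known synonyms
# for each term, preserving order and avoiding duplicates.
def expand_query(query, synonyms):
    """Return the query terms plus any synonyms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

A production system would weight expanded terms lower than the user's original terms so that expansion broadens recall without distorting ranking.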
Reflected Intelligence - Lucene/Solr as a self-learning data system: Presente..., by Lucidworks
Trey Grainger gave a presentation about using Lucene/Solr as a self-learning data system through the concept of "reflected intelligence". The presentation covered topics like basic keyword search, taxonomies/entity extraction, query intent, and relevancy tuning. It proposed that by leveraging previous user data and interactions, new data and interactions could be better interpreted to continuously improve the system.
Reflected Intelligence: Lucene/Solr as a self-learning data system, by Trey Grainger
What if your search engine could automatically tune its own domain-specific relevancy model? What if it could learn the important phrases and topics within your domain, automatically identify alternate spellings (synonyms, acronyms, and related phrases) and disambiguate multiple meanings of those phrases, learn the conceptual relationships embedded within your documents, and even use machine-learned ranking to discover the relative importance of different features and then automatically optimize its own ranking algorithms for your domain?
In this presentation, you’ll learn how to do just that: how to evolve Lucene/Solr implementations into self-learning data systems that accept user queries, deliver relevance-ranked results, and automatically learn from your users’ subsequent interactions to continually deliver a more relevant experience for each keyword, category, and group of users.
Such a self-learning system leverages reflected intelligence to consistently improve its understanding of the content (documents and queries), the context of specific users, and the relevance signals present in the collective feedback from every prior user interaction with the system. Come learn how to move beyond manual relevancy tuning and toward a closed-loop system leveraging both the embedded meaning within your content and the wisdom of the crowds to automatically generate search relevancy algorithms optimized for your domain.
This document discusses a demonstration of named entity recognition using a conditional random fields classifier trained on Wikipedia data. The goal is to develop an accurate NER classifier and demonstrate its utility for search. Key points made include that the CRF classifier was trained on word properties from Wikipedia data and performed similarly to the Stanford classifier on a validation set, with a small improvement on miscellaneous entities.
Dynamic Search Using Semantics & Statistics, by Paul Hofmann
This presentation shows 3 applications of successfully combining semantics and statistics for text mining and interactive search.
1) We predict the Lehman bankruptcy using statistical topic modeling, SAP Business Objects entity extraction and associative memories (powered by Saffron Technologies).
2) We semi-automatically handle service requests at Cisco using knowledge extraction and knowledge reuse.
3) We discover user intent for interactive retrieval. User intent is defined as a latent state, whose observations are the reformulated query sequence and the retrieved documents, together with the positive or negative feedback provided by the user. A demo shows how a user’s intent is recognized in health care search.
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb..., by Trey Grainger
Search engines frequently miss the mark when it comes to understanding user intent. This talk will walk through some of the key building blocks necessary to turn a search engine into a dynamically-learning "intent engine", able to interpret and search on meaning, not just keywords. We will walk through CareerBuilder's semantic search architecture, including semantic autocomplete, query and document interpretation, probabilistic query parsing, automatic taxonomy discovery, keyword disambiguation, and personalization based upon user context/behavior. We will also see how to leverage an inverted index (Lucene/Solr) as a knowledge graph that can be used as a dynamic ontology to extract phrases, understand and weight the semantic relationships between those phrases and known entities, and expand the query to include those additional conceptual relationships.
As an example, most search engines completely miss the mark at parsing a query like (Senior Java Developer Portland, OR Hadoop). We will show how to dynamically understand that "senior" designates an experience level, that "java developer" is a job title related to "software engineering", that "portland, or" is a city with a specific geographical boundary (as opposed to a keyword followed by a boolean operator), and that "hadoop" is the skill "Apache Hadoop", which is also related to other terms like "hbase", "hive", and "map/reduce". We will discuss how to train the search engine to parse the query into this intended understanding and how to reflect this understanding to the end user to provide an insightful, augmented search experience.
Topics: Semantic Search, Apache Solr, Finite State Transducers, Probabilistic Query Parsing, Bayes Theorem, Augmented Search, Recommendations, Query Disambiguation, NLP, Knowledge Graphs
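The query-parsing example discussed above can be sketched with a greedy longest-match tagger. The tiny entity dictionary is made up for illustration; the talk describes a probabilistic parser backed by taxonomies and finite state transducers, not a hard-coded lookup.

```python
# Illustrative greedy tagger for the example query
# "Senior Java Developer Portland, OR Hadoop".
ENTITIES = {
    "senior": "experience_level",
    "java developer": "job_title",
    "portland, or": "city",
    "hadoop": "skill",
}

def parse_query(query):
    """Greedily tag the longest known phrase at each position."""
    tokens = query.lower().split()
    tagged, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try longest span first
            phrase = " ".join(tokens[i:j])
            if phrase in ENTITIES:
                tagged.append((phrase, ENTITIES[phrase]))
                i = j
                break
        else:
            tagged.append((tokens[i], "keyword"))
            i += 1
    return tagged
```

Note how longest-match-first resolves the ambiguity the talk highlights: "portland, or" is tagged as a city rather than a keyword followed by a boolean OR operator.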
This document proposes improvements to StackOverflow.com by helping users tag their questions. It presents a data modeling approach using Elasticsearch to model question and answer data from StackOverflow. Graphs are constructed from user tags to determine tag similarity and recommend related tags to users. Queries are defined to calculate the probability of questions being answered within a given time based on tags, and to determine tag similarity based on the relationships between tags in the user graph.
The Relevance of the Apache Solr Semantic Knowledge Graph - Trey Grainger
The Semantic Knowledge Graph is an Apache Solr plugin that can be used to discover and rank the relationships between any arbitrary queries or terms within the search index. It is a relevancy swiss army knife, able to discover related terms and concepts, disambiguate different meanings of terms given their context, cleanup noise in datasets, discover previously unknown relationships between entities across documents and fields, rank lists of keywords based upon conceptual cohesion to reduce noise, summarize documents by extracting their most significant terms, generate recommendations and personalized search, and power numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. This talk will walk you through how to setup and use this plugin in concert with other open source tools (probabilistic query parser, SolrTextTagger for entity extraction) to parse, interpret, and much more correctly model the true intent of user searches than traditional keyword-based search approaches.
The document discusses Modware, an object-oriented Perl interface for querying and updating the Chado database schema. It provides semantically sensible classes and methods that encapsulate Chado's business rules for easier and more efficient development. An example demonstrates storing a gene with exons in Chado using Modware and generating a web page to display the gene details.
This document provides an overview of the topics covered in a QTP (Quick Test Professional) training syllabus, including:
- QTP's recording and identification logic, object identification configuration, object repository, data tables, actions, environment variables, checkpoints, synchronization, debugging, recovery scenarios, parameterization, and VBScript basics.
It also covers working with web tables, databases, Microsoft Excel, Internet Explorer and Firefox, and creating automation frameworks using VBScript and a modular, data-driven, keyword-driven or hybrid approach.
This document provides an outline for a complete Java training course. It covers topics such as Java programming fundamentals, object-oriented programming, packages, exception handling, arrays, strings, collections, advanced Java topics like J2EE, servlets, JSP, databases, Hibernate, and Struts. It also includes contact information for the training provider.
Workflow Provenance: From Modelling to Reporting - Rayhan Ferdous
This document provides an overview of workflow provenance and proposes a programming model and system architecture for collecting and querying workflow provenance data at scale. It begins by defining provenance and its importance for big data analytics. It then classifies different types of provenance queries and proposes a taxonomy. The document outlines a programming model using object-oriented programming and domain-specific languages to automate provenance logging. It proposes parsing logs into a graph database to support fundamental provenance queries and data visualization. Finally, it discusses scaling the system and conducting further research through user studies and query optimization.
The document summarizes three papers from the SIGIR 2011 workshop on query representation and understanding.
The first paper analyzes temporal queries using web snippets and query logs to identify queries with implicit temporal intent. It finds that web snippets contain more temporal evidence than query logs.
The second paper analyzes the complex network structure of search queries, finding they exhibit a kernel-periphery structure like natural language with popular and rare query segments.
The third paper investigates query refinement through topic analysis and learning with personalization. It generates topics from query logs, builds user profiles, and uses the topics and profiles to score candidate query refinements.
The Semantic Web - This time... it's Personal - Mark Wilkinson
My presentation on SADI, SHARE, CardioSHARE, and the new iConsent project. Presented to the faculty and students at Stanford Medical Informatics, Palo Alto, USA. May 14th, 2010.
How do we make the semantic web, and medical research, more personal (both for the researcher and for the patient)? I present some ideas we're exploring.
Evgeniy Bobrov, "Powered by OSS. Scalable stream processing and analysis of b..." - Fwdays
Open-source technologies such as Microsoft Orleans and ElasticSearch are key elements of YouScan's architecture. In this talk I will describe how they help us cope with the constantly growing volumes of social network data, and the evolution of YouScan's architecture.
Similar to Tuning Topical Queries through Context Vocabulary Enrichment: A Corpus-Based Approach (20)
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Best 20 SEO Techniques To Improve Website Visibility In SERP - Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of the CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Infrastructure Challenges in Scaling RAG with Custom AI models - Zilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf - Paige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability as the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0! - SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Driving Business Innovation: Latest Generative AI Advancements & Success Story - Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Tuning Topical Queries through Context Vocabulary Enrichment: A Corpus-Based Approach
1. Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based Approach. Carlos M. Lorenzetti [email_address], Ana G. Maguitman [email_address]. Universidad Nacional del Sur, Av. L.N. Alem 1253, Bahía Blanca, Argentina. Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento, Laboratorio de Investigación y Desarrollo en Inteligencia Artificial. CONICET, AGENCIA.
19. Topic Descriptors. Topic: Java Virtual Machine (initial context). Term descriptive power in a topic of a document, with occurrences in the initial context in parentheses: jdk (0) 0.014, jvm (0) 0.032, province (0) 0.040, island (0) 0.040, coffee (0) 0.055, programming (3) 0.064, language (1) 0.089, virtual (1) 0.124, machine (2) 0.158, java (4) 0.385.
20. Topic Discriminators. Topic: Java Virtual Machine (initial context). Term discriminating power in a topic of a document, with occurrences in the initial context in parentheses: province (0) 0.385, island (0) 0.385, coffee (0) 0.385, java (4) 0.493, language (1) 0.517, machine (2) 0.524, programming (3) 0.566, virtual (1) 0.566, jdk (0) 0.848, jvm (0) 0.848.
21. Proposed Algorithm (diagram): the context terms w1 ... wm are weighted in two lists, DESCRIPTORS and DISCRIMINATORS; a roulette selection over the weighted terms builds queries (query 01 ... query n), whose results (result 01 ... result n) feed new terms and weights back into the context.
24. Evaluation – N. Novelty-driven similarity per iteration (maximum, average and minimum curves) for the topic Top/Computers/Open_Source/Software during the query formulation and retrieval process and context updates. Mean N (95% CI): 0.0661 [0.0618; 0.0704] at the 1st iteration; 0.5970 [0.5866; 0.6073] at the best iteration.
25. Evaluation – N. Mean novelty-driven similarity (95% CI): Baseline 0.087 [0.0822; 0.0924]; Bo1-DFR 0.075 [0.0710; 0.0803]; Incremental 0.597 [0.5866; 0.6073].
29. Thank you! CONICET, AGENCIA. Laboratorio de Investigación y Desarrollo en Inteligencia Artificial, lidia.cs.uns.edu.ar. Universidad Nacional del Sur, Bahía Blanca, www.uns.edu.ar
Editor's Notes
Context-based search is the process of seeking information related to a user’s thematic context. Meaningful automatic context-based search can only be achieved if the semantics of the terms in the context under analysis is reflected in the search queries. For example, if a user is searching using their own words ...
He or she could find a lot of topics related to this search.
An information request is usually initiated or generated within a task. For example, if the user is editing or reading a document on a specific topic, they may wish to explore new material related to that topic. Topical queries can be formed using small sets of terms from the user's context.
And in this way we could disambiguate a query that belongs to more than one topic.
Query tuning is usually achieved by replacing or extending the terms of a query, or by adjusting the weights of a query vector. Relevance feedback is a query refinement mechanism used to tune queries based on the relevance assessments of the query's results. A driving hypothesis for relevance feedback methods is that it may be difficult to formulate a good query when the collection of documents isn't known in advance, but it's easy to judge particular documents, and so it makes sense to engage in an iterative query refinement process. A typical relevance feedback scenario involves the following steps: (1) a query is formulated; (2) the system returns an initial set of results; (3) a relevance assessment on the returned results is issued (the relevance feedback); (4) the system computes a better representation of the information needs based on this feedback; (5) the system returns a revised set of results. Depending on the level of automation of step 3 we can distinguish three forms of feedback. Supervised feedback requires explicit feedback, typically obtained from users who indicate the relevance of each retrieved document. Unsupervised feedback applies blind relevance feedback, typically assuming that the top k documents returned by a search process are relevant. In semi-supervised feedback, the relevance of a document is inferred by the system: a common approach is to monitor user behavior (e.g., which documents are selected for viewing, or the time spent viewing a document); provided that the information-seeking process is performed within a thematic context, another automatic way to infer the relevance of a document is by computing its similarity to the user's current context. We'll use the latter in this work.
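The semi-supervised loop described in this note can be sketched as follows. This is only an illustration: `search` stands in for a real search-engine call, and the 0.5 relevance threshold and the additive query update are assumptions for the sketch, not the paper's exact method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two {term: weight} bag-of-words vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def feedback_loop(context, search, rounds=3, threshold=0.5):
    query = dict(context)                    # step 1: query built from the context
    for _ in range(rounds):
        results = search(query)              # step 2: retrieve results
        relevant = [d for d in results       # step 3: infer relevance by
                    if cosine(context, d) >= threshold]  # similarity to the context
        if not relevant:
            break
        for doc in relevant:                 # step 4: refine the query with
            for term, w in doc.items():      # terms from relevant documents
                query[term] = query.get(term, 0.0) + w / len(relevant)
    return query                             # step 5: the revised query

# Toy run: the placeholder `search` returns two fixed result pages.
docs = [{"java": 2.0, "jvm": 1.0}, {"coffee": 3.0}]
learned = feedback_loop({"java": 1.0, "machine": 1.0}, lambda q: docs)
print(sorted(learned))  # → ['java', 'jvm', 'machine']
```

The off-topic page about coffee never crosses the similarity threshold, so only the on-topic page contributes new query terms.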
Much work has addressed the problem of computing the informativeness of a term across a corpus, and a good deal of research has focused on computing the descriptive and discriminating power of a term in a document with respect to a corpus. All this work, however, has been done on a predefined collection of documents and independently of a thematic context. In a previous work we proposed to study the descriptive and discriminating power of a term based on its distribution across the topics of the pages returned by a search engine. To distinguish between topic descriptors and discriminators, we argue that good topic descriptors can be found by looking for terms that occur often in documents related to the given topic. On the other hand, good topic discriminators can be found by looking for terms that occur only in documents related to the given topic. Both topic descriptors and discriminators are important as query terms.
Because topic descriptors occur often in relevant pages, using them as query terms may improve recall.
Similarly, good topic discriminators occur primarily in relevant pages, and therefore using them as query terms may improve precision.
Now we'll see a simple example to assess the potential of these kinds of terms.
For example, if we choose the topic Java Virtual Machine, we could take the following words in our context:
So, intuitively, and in line with the definition, we could say that good descriptors would be words such as Java, Machine or Virtual, and …
Good discriminators would be: JVM and JDK.
More precisely, we'll see a practical example. As this slide shows, we build a matrix of documents against terms. Our initial context is the first column of that matrix, and the next columns are the pages that we could obtain through a search engine by making queries with the initial context's terms. By definition, each cell of the matrix represents the occurrences of a term in a document. In this example we have four pages, two of them about the island and the coffee, and the rest about Java as a programming language.
We define the descriptive power of a term in a document with this expression, and we can see the values of the terms. Note that the values of the terms that don't belong to the initial context are zero.
We also define the discriminating power of a term in a document with this other expression, and see the results. As our objective is to learn the user's needs, instead of extracting the descriptors and discriminators of documents (like the user context) we need to find topic descriptors and discriminators of the user context. This term identification needs an incremental method that identifies which documents are similar to the user context. So, we need …
A document comparison criterion, and we choose cosine similarity. It is the simplest way to compare documents and the most common method in IR. We don't explain this method here, but with this criterion we define the notion of topic: a topic will be a group of documents with a high cosine similarity.
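The comparison criterion can be sketched in a few lines. The term weights below are raw counts, an assumption for illustration, since the slides do not fix a weighting scheme:

```python
import math

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two bag-of-words vectors given as {term: weight} dicts."""
    dot = sum(w * doc_b.get(t, 0.0) for t, w in doc_a.items())
    norm_a = math.sqrt(sum(w * w for w in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up context and page in the spirit of the Java example
context = {"java": 4, "virtual": 1, "machine": 2, "programming": 3, "language": 1}
page = {"java": 2, "jvm": 3, "jdk": 1, "machine": 1}
print(cosine_similarity(context, page))
```

Under the topic notion above, the pages whose pairwise cosine similarity is high would be grouped into one topic.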
Using the previous definitions, we define the term descriptive power in a topic of a document using this equation. We see again the weights reached by every term, and we note that java and machine are good topic descriptors, as we mentioned before.
We also define the notion of term discriminating power in a topic of a document, and we note one of the most important things: terms like JVM and JDK, which don't belong to the initial context, are excellent topic discriminators, as we anticipated. Incremental search methods are useful for collecting information from diverse information sources. The incremental identification of context-specific terms can guide the search process through huge repositories of potentially useful material, helping to filter irrelevant content.
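The exact expressions are elided in these notes, so the sketch below uses simple stand-in proxies consistent with the verbal definitions: descriptive power as a term's average relative frequency within the topic's documents, and discriminating power as the fraction of the term's corpus-wide occurrences that fall inside the topic. The toy corpus is hypothetical, in the spirit of the Java example:

```python
def descriptive_power(term, topic_docs):
    """Average relative frequency of `term` within the topic's documents
    (descriptors occur often in on-topic documents)."""
    total = 0.0
    for doc in topic_docs:
        length = sum(doc.values())
        if length:
            total += doc.get(term, 0) / length
    return total / len(topic_docs)

def discriminating_power(term, topic_docs, all_docs):
    """Fraction of `term`'s corpus-wide occurrences that fall inside the
    topic's documents (discriminators occur mostly there)."""
    in_topic = sum(d.get(term, 0) for d in topic_docs)
    overall = sum(d.get(term, 0) for d in all_docs)
    return in_topic / overall if overall else 0.0

# Hypothetical toy corpus: two pages about the programming language,
# two about the island and the coffee.
lang_docs = [
    {"java": 4, "programming": 3, "machine": 2, "virtual": 1, "jvm": 2, "jdk": 1},
    {"java": 3, "language": 2, "programming": 1, "jvm": 1},
]
other_docs = [
    {"java": 2, "island": 3, "province": 1},
    {"java": 1, "coffee": 4, "island": 1},
]
all_docs = lang_docs + other_docs

for t in ("java", "jvm", "island"):
    print(t,
          round(descriptive_power(t, lang_docs), 3),
          round(discriminating_power(t, lang_docs, all_docs), 3))
```

As in the slides, "java" comes out as the strongest descriptor of the language topic, while "jvm", which is absent from the initial context, is the sharper discriminator: all of its occurrences fall in on-topic pages.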
Our proposal is to approximate the terms' descriptive and discriminating power for the thematic context under analysis with the purpose of generating good queries. Our approach adapts the typical relevance feedback mechanism to account for a thematic context as follows. First, we extract terms from the user context. With these terms we build queries and the system returns an initial set of results. Simultaneously, the descriptor and discriminator lists are built from the obtained results and the context. These steps are repeated until no improvements are observed. Then, the context characterization is updated with the newly learned material and the process starts again. The system monitors the effectiveness achieved at each iteration, and we use novelty-driven similarity as an estimation of the retrieval effectiveness (we'll explain it later). If after a number of trials the retrieval effectiveness has not crossed a given threshold (that is, no significant improvements are observed after a certain number of trials), the system is forced to explore new potentially useful regions of the vocabulary landscape; this can be regarded as a vocabulary leap, which can be thought of as a significant transformation (typically an improvement) of the context characterization. Step 1: A query is formulated based on C. Step 2: The system returns an initial set of results. Step 3: Repeat for at least v iterations or until no improvements are registered. Step 3.1: A relevance assessment on the returned results is issued based on C. Step 3.2: After a certain number of trials, and depending on the relevance assessments, the system computes a better representation of the thematic context (phase change). Step 3.3: The system formulates new queries and returns a revised set of results. In order to learn better characterizations of the thematic context, the system undergoes a series of phases.
At the end of each phase, the context characterization is updated with the newly learned material. Each phase evolves through a sequence of trials, where each trial consists of the formulation of a set of queries, the analysis of the retrieved results, the adjustment of the terms' weights, and the discovery of new potentially useful terms. To form queries during phase i we implemented a roulette selection mechanism, where the probability of choosing a particular term t to form a query is proportional to (weight of the term at phase i). Roulette selection is a technique typically used by genetic algorithms to choose potentially useful solutions for recombination, where the fitness level is used to associate a probability of selection. This approach results in a non-deterministic exploration of the term space that favors the fittest terms. The system monitors the effectiveness achieved at each iteration; in our approach we use novelty-driven similarity as an estimation of the retrieval effectiveness (we'll explain it later). If after a number of trials the retrieval effectiveness has not crossed a given threshold (i.e., no significant improvements are observed after a certain number of trials), the system forces a phase change to explore new potentially useful regions of the vocabulary landscape. A phase change can be regarded as a vocabulary leap, which can be thought of as a significant transformation (typically an improvement) of the context characterization.
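The roulette selection step can be sketched as follows. The weights are illustrative (taken from the descriptor/discriminator example), and drawing distinct terms without replacement is an assumption of this sketch; the notes do not say whether a term may be picked twice for one query:

```python
import random

def roulette_select(weights, k):
    """Pick k distinct terms; each draw is proportional to the term's weight
    (fitness-proportionate selection, as in genetic algorithms)."""
    pool = dict(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        total = sum(pool.values())
        spin = random.uniform(0, total)   # spin the roulette wheel
        acc = 0.0
        for term, w in pool.items():
            acc += w
            if acc >= spin:               # wheel stops on this term
                chosen.append(term)
                del pool[term]            # draw without replacement
                break
    return chosen

# Illustrative weights only (scores from the Java example)
weights = {"java": 0.385, "machine": 0.158, "virtual": 0.124,
           "jvm": 0.848, "jdk": 0.848, "coffee": 0.055}
print(roulette_select(weights, 3))
```

High-weight terms like "jvm" and "jdk" dominate the draws, but low-weight terms keep a small chance of being picked, which is what gives the exploration its non-deterministic character.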
Now, we'll see an evaluation of the proposed solution.
We compare the proposed method against two other methods. The first is a baseline that submits queries directly from the thematic context and doesn't apply any refinement mechanism. The second method used for comparison is the Bo1-DFR method, which is based on the well-known Rocchio method. To perform our tests we used 448 topics from the Open Directory Project (ODP), and a number of constraints were imposed on this selection with the purpose of ensuring the quality of our test set: the topics were selected from the third level of the ODP taxonomy, the minimum size for each selected topic was 100 pages, and the language was restricted to English. For each topic we collected all of its URLs as well as those in its subtopics; the total number of collected pages was more than 350,000. In our tests we used the ODP description of each selected topic to create an initial context description. In order to compare the implemented methods we used three measures of query performance. Novelty-driven similarity is based on cosine similarity but disregards the terms that form the query, overcoming the bias introduced by those terms and favoring the exploration of new material. Precision measures the fraction of retrieved documents which are known to be relevant; the relevant set for each analyzed topic was the collection of its URLs as well as those in its subtopics. And semantic precision is a measure that considers inter-topic similarity, because other topics in the ontology could be semantically similar (and therefore partially relevant) to the topic of the given context; to compute it we used a semantic similarity measure for generalized ontologies proposed in a previous work.
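Novelty-driven similarity, as described here, is plain cosine similarity computed after discarding the query terms from both vectors, so a result is not rewarded merely for echoing the query back. A small sketch, with made-up vectors and weights:

```python
import math

def cosine(a, b):
    """Cosine similarity between two {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def novelty_similarity(context, result_doc, query_terms):
    """Cosine similarity between context and result, disregarding the terms
    that formed the query."""
    c = {t: w for t, w in context.items() if t not in query_terms}
    d = {t: w for t, w in result_doc.items() if t not in query_terms}
    return cosine(c, d)

# Illustrative data: the result echoes the query term "java" heavily
ctx = {"java": 2.0, "virtual": 1.0, "machine": 1.0}
doc = {"java": 3.0, "jvm": 1.0, "machine": 1.0}
print(cosine(ctx, doc), novelty_similarity(ctx, doc, {"java"}))
```

In this example the plain cosine score is inflated by the shared query term, while the novelty-driven score reflects only the overlap in new material.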
As an example of the algorithm's behaviour, we show here its evolution on a representative topic. We see in this chart the evolution of the novelty-driven similarity and its behaviour at the different execution steps.
Now, in this chart we see the novelty-driven similarity again, but this time in a comparative chart. Each of the 448 topics is represented by a point. It's interesting to note that for all the tested cases the incremental method was superior to the other two methods.
Here, we see the comparison for the Precision metric. In this case the incremental method was strictly superior for 66.96% of the evaluated topics.
Finally, we see the Semantic Precision metric. Here the incremental method was superior in 65.18% of the topics.
The vocabulary problem is a main challenge in human-system communication. In this work we propose a solution to the semantic sensitivity problem, that is, the limitation that arises when documents with a similar context but a different term vocabulary are not associated, resulting in a false negative match. Our method operates by incrementally learning better vocabularies from a large external corpus such as the Web. We have shown that by implementing an incremental context refinement method we can perform better than a baseline method, which submits queries directly from the initial context, and than the Bo1-DFR method, which doesn't refine queries based on context. This points to the usefulness of simultaneously taking advantage of the terms in the current thematic context and an external corpus to learn better vocabularies and to automatically tune queries.