Nowadays, successful applications are those whose features captivate and engage users. Using an interactive news retrieval system as a use case, in this paper we study the effect of timeline and named-entity components on user engagement. This contrasts with previous studies, where the importance of these components was studied from a retrieval-effectiveness point of view. Our experimental results show significant improvements in user engagement when the named-entity and timeline components were installed. Further, we investigate whether we can predict user-centred metrics from users' interaction with the system. Results show that we can successfully learn a model that predicts all dimensions of user engagement and whether users will like the system. These findings could steer systems towards a more personalised user experience, tailored to the user's preferences.
Influence of Timeline and Named-entity Components on User Engagement
1. Influence of Timeline and Named-entity Components on User Engagement
Yashar Moshfeghi [1], Michael Matthews [2], Roi Blanco [2], Joemon M. Jose [1]
[1] School of Computing Science, University of Glasgow, Glasgow, UK
[2] Yahoo! Labs, Barcelona, Spain
Yashar.Moshfeghi@glasgow.ac.uk
ECIR 2013, Moscow, Russia
2. Outline
• User Engagement
• Prediction of User-centred metrics
• Evaluation Methodology
• Results
• Conclusions
6. Research Question
• We aim to answer the following research question:
– "Can timeline and named-entity components improve user engagement in the context of a news retrieval system?"
7. User Engagement
• A multi-faceted concept: emotional, cognitive and behavioural
• Subjective measures (O'Brien and Toms): focused attention, aesthetics, perceived usability, endurability, novelty, involvement
• Objective measures: Subjective Perception of Time
8. An increase of information-rich user experiences in the search realm (logged interaction data)
• Prediction of user preferences for web search results
• Prediction of user-centred metrics of an IIR system
• Build search applications in which the layout and elements displayed adapt to the needs of the user or context
9. The News System Anatomy
[System screenshot showing: query submission, retrieved results, the timeline component, and the named-entity component]
10. Experimental Methodology
• Design
– A 'within-subjects' design was used in this study.
• The independent variable:
– the system (with two levels: baseline, enriched),
– controlled by showing the timeline and named-entity components (enriched) or hiding them (baseline).
• The dependent variables:
– (i) user engagement (involvement, novelty, endurability, usability, aesthetics, attention)
– (ii) system preference
11. Experimental Methodology - Task
• We used a simulated information need situation.
• The simulated task was defined as follows:
– "Imagine you are reading today's news events and one of them is very important or interesting to you, and you want to learn more. Find as much relevant news information as possible so that you can construct an overall (big) picture of the event and also cover the important parts of it."
12. Experimental Methodology - Task
• The search task was presented twice to each participant with different search topics.
13. • Advantages:
– Reduced monetary cost
– Ease of engaging a large number of users in the study
• Disadvantages:
– Low-quality data; the challenge, in turn, is to improve and assure data quality
– Need for techniques to minimise spammers, multiple-account workers, and lazy workers
14. • A multiple-response technique was used for our questionnaire
– known to be very effective and cost-efficient for improving data quality
• Browser cookies were used to guard against multiple-account workers
• To avoid spammers (as recommended in the literature):
– population screening based on location (United States)
– HIT approval rate greater than 95%
• To reduce attrition, demographic questions were put at the beginning of the experimental procedure
15. Experimental Methodology - Procedure
• Participants were instructed that the experiment would take approximately 60 minutes to complete
• They were informed that they could only participate in this study once
• Payment for study completion was $5 (the total cost of the evaluation was $510)
• Each participant had to complete two search tasks, one for each level of the independent variable (i.e. the baseline and enriched systems)
17. Experimental Methodology
• We considered the six dimensions introduced by O'Brien et al.:
– focused attention, aesthetics, perceived usability, endurability, novelty, and involvement
• The different dimensions were measured through a number of forced-choice questions
• Responses used a 5-point scale (strongly disagree to strongly agree):
– "Based on this news retrieval experience, please indicate whether you agree or disagree with each statement."
• In total, each post-search questionnaire asked 31 questions related to user engagement
– adapted from O'Brien et al.
– with their assignment to participants randomised
18. Experimental Methodology
• Pilot Studies:
– We ran three pilot studies using 10 participants.
– Other changes consisted of:
• modifications to the questionnaires to clarify questions,
• modifications to the system to improve logging capabilities,
• improvements to the training video.
– After the final pilot, we determined that:
• the participants were able to complete the user study without problems, and
• the system was correctly logging the interaction data.
19. Results Analysis – Data Preprocessing
• To ensure the availability of relevant documents, two evaluators manually calculated Precision@1, 5, and 10 for all the topics over a set of queries issued by the participants.
– Precision@1, 5 and 10 were 0.85, 0.84, and 0.86, respectively.
– The judges had very high inter-annotator agreement, with Kappa > 0.9.
– This indicates that the queries the users issued had good coverage and that the ranking was accurate enough. (A sketch of these checks follows.)
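To make the preprocessing checks concrete, here is a minimal sketch. The ranked list and the relevance judgements are hypothetical placeholders; only the metric definitions (Precision@k and Cohen's kappa) are standard, and this is not the authors' code.

```python
# Minimal sketch of the relevance checks above. The ranking and judgements
# are hypothetical; Precision@k and Cohen's kappa follow their standard
# definitions.
from sklearn.metrics import cohen_kappa_score

def precision_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of the top-k retrieved documents judged relevant."""
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_doc_ids) / k

# Hypothetical ranking for one participant query, and one evaluator's
# set of relevant documents for the topic.
ranking = ["d3", "d7", "d1", "d9", "d4", "d8", "d2", "d5", "d6", "d0"]
relevant = {"d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7", "d9"}

for k in (1, 5, 10):
    print(f"P@{k} = {precision_at_k(ranking, relevant, k):.2f}")

# Inter-annotator agreement between the two evaluators' binary judgements
# over the same pool of (query, document) pairs.
judge_a = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
judge_b = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(f"kappa = {cohen_kappa_score(judge_a, judge_b):.2f}")
```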
20. Results Analysis – Data Preprocessing
• 63 out of 92 users successfully completed the study.
• The split by condition was relatively even, with 47% of participants in one system-ordering group and 53% in the other.
• We removed (a sketch of these filtering rules follows the list):
– incomplete surveys
– participants who repeated the study
– participants who completed the survey incorrectly (based on the task conditions):
• they had to visit at least three relevant documents for a given topic, and
• the issued queries had to be related to the selected topic
– suspect attempts, identified by checking for:
• extremely short task durations
• comments repeated verbatim across multiple open-ended questions
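A minimal sketch of these filtering rules, assuming a hypothetical per-task log table; every column name below (participant, completed, attempts, relevant_visits, query_on_topic, duration_s, comments) is an illustrative stand-in, not the study's actual schema, and the 120-second threshold is invented for the example.

```python
# Minimal sketch of the participant-filtering rules above; all column
# names and thresholds are hypothetical.
import pandas as pd

sessions = pd.DataFrame({
    "participant":     ["p1", "p1", "p2", "p2", "p3", "p3"],
    "completed":       [True, True, True, False, True, True],
    "attempts":        [1, 1, 1, 1, 2, 2],
    "relevant_visits": [4, 3, 5, 0, 6, 4],
    "query_on_topic":  [True, True, True, True, True, True],
    "duration_s":      [1400, 1600, 90, 80, 1500, 1450],
    "comments":        ["good", "ok", "fine", "fine", "nice", "nice"],
})

# Drop incomplete surveys and repeated participation.
clean = sessions[sessions["completed"] & (sessions["attempts"] == 1)]

# Enforce the task conditions: at least three relevant documents visited
# and queries related to the selected topic.
clean = clean[(clean["relevant_visits"] >= 3) & clean["query_on_topic"]]

# Flag suspect attempts: extremely short task durations, or comments
# repeated verbatim across a participant's open-ended answers.
too_fast = clean["duration_s"] < 120
repeated = clean.duplicated(subset=["participant", "comments"], keep=False)
clean = clean[~(too_fast | repeated)]
print(clean)   # only p1's two sessions survive in this toy example
```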
21. Results Analysis – Demographic Info.
• 126 search sessions were successfully carried out by the 63 participants.
• The 63 participants:
– female = 46%, male = 54%, prefer not to say = 0%
– were mainly under the age of 41 (84%), with the largest group between the ages of 24-29 (33.3%)
• Participants had:
– a high school diploma or equivalent (11.11%),
– an associate degree (15.87%),
– a graduate degree (11.11%),
– a bachelor's degree (31.7%), or
– some college education (30.15%)
• They were:
– primarily employed by a company or organisation (39.68%),
– though there were also self-employed participants (22.22%),
– students (11.11%), and
– not employed (26.98%)
23. Results Analysis
• We did not find any statistically significant difference between the two systems for the Subjective Perception of Time metric
– mean ± standard deviation: 10.03 ± 5.22 for the baseline system and 10.12 ± 4.95 for the enriched system
24. Results Analysis - System Preference
• The exit questionnaire posed the question:
– "Please select the system you preferred. (answer: 1: First System, 2: Second System)"
• Overall, 76% of the participants preferred the enriched system over the baseline system.
25. Prediction of User-centred Metrics:
• Demographic features:
– participants' age, gender, education, and occupation
• Search-habit features:
– the number of years they have used web search and online news systems,
– how frequently they engage in different news search intentions, such as browsing, navigating, searching, etc.,
– the news domains they are interested in
• Interaction features (derived from log information):
– the total time spent on each component and to complete a task,
– the numbers of clicks, retrieved documents, and queries,
– the number of times they used the previous/next buttons and other system functionality
26. Prediction of User-centred Metrics:
• As prediction targets we chose:
– the System Preference question, and
– all the user engagement dimensions.
• For the System Preference question, we have a binary class: "−1" indicates the participant did not prefer the enriched system, and "+1" otherwise.
• For the user engagement dimensions:
– we used the final value calculated by aggregating all the questions related to each dimension,
– and transformed the value for each dimension to binary by mapping 4-5 to "+1" and everything else to "−1" (a label-construction sketch follows).
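A minimal label-construction sketch. The Likert answers and the mean aggregation are assumptions (the slide does not state the aggregation function); only the "4-5 maps to +1, otherwise −1" rule comes from the slide.

```python
# Minimal sketch of the label construction above. The responses and the
# mean aggregation are assumptions; the 4-5 -> +1, else -1 rule is taken
# from the slide.
import numpy as np

# Hypothetical 5-point Likert answers for one participant, grouped by
# engagement dimension.
answers = {
    "involvement": [5, 4, 5],
    "novelty":     [3, 4, 2],
    "aesthetics":  [4, 4, 5, 4],
}

def dimension_label(scores):
    """Aggregate a dimension's questions, then binarise: 4-5 -> +1, else -1."""
    final_value = np.mean(scores)   # one plausible aggregation
    return 1 if final_value >= 4 else -1

labels = {dim: dimension_label(s) for dim, s in answers.items()}
print(labels)   # {'involvement': 1, 'novelty': -1, 'aesthetics': 1}
```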
27. Prediction of User-centred Metrics:
• We learned a model to discriminate between the two classes using SVMs trained with a polynomial kernel,
– which, based on our analysis, outperformed the other SVM kernels (linear and radial-basis) in the majority of cases.
• We also tried other models, such as Bayesian logistic regression and decision trees, but they underperformed with respect to the SVMs.
28. Prediction of User-centred Metrics:
• Classification performance was averaged over the 63 participants of the study, using 10-fold cross-validation.
• The results indicate that:
– for all the user engagement dimensions (excluding focused attention), the combination of all features leads to the best prediction accuracy;
– for the System Preference question, user-system interaction features determine the participants' preference of a system with high accuracy (over 87%).
(A sketch of this classification setup follows.)
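A minimal sketch of the classification setup from slides 27-28, written with scikit-learn rather than the authors' original implementation. The feature matrix and labels are random placeholders for the real demographic, search-habit, and interaction features and the binarised targets, so the printed accuracies are meaningless; only the kernel comparison and the 10-fold cross-validation mirror the slides.

```python
# Minimal sketch of the setup on slides 27-28 (scikit-learn, not the
# authors' code). X and y are random placeholders for the real features
# and binarised labels.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(63, 20))      # hypothetical: 63 participants, 20 features
y = rng.choice([-1, 1], size=63)   # hypothetical binarised target

# Compare the kernels mentioned on slide 27; the paper reports the
# polynomial kernel winning in the majority of cases.
for kernel in ("poly", "linear", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    acc = cross_val_score(clf, X, y, cv=10).mean()  # 10-fold CV, as on slide 28
    print(f"{kernel:>6}: mean accuracy = {acc:.3f}")
```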
29. Summary
• Given the competitiveness of the market on the web, applications nowadays are designed to be both efficient and engaging.
• Thus, a new line of research is to identify system features that steer user engagement.
• This work studies the interplay between user engagement and the retrieval of named entities and time, in an interactive search scenario.
• We devised an experimental setup that exposed our participants to two news systems, one with timeline and named-entity components and one without.
• Participants performed two search tasks, and user engagement was analysed through questionnaires.
30. Conclusions
• Overall findings, based on user questionnaires, show that substantial user engagement improvements can be achieved by integrating time and entity information into the system.
• Further analysis of the results shows that the majority of participants preferred the enriched system over the baseline system.
• We also investigated the hypothesis that user-centred metrics can be predicted in an IIR scenario given the participants' demographics and search habits, and/or their interaction with the system.
• The results obtained across all the user engagement dimensions, as well as the System Preference question, supported our hypothesis.
• As future work, we will continue to study how user interactions can be leveraged to predict satisfaction measures, and possibly build interfaces that adapt based on user interaction patterns.
31. Acknowledgement: This work was partially supported by the EU FP7 LiMoSINe project (288024). This work was performed while interning at the Yahoo! Research lab in Barcelona.
Editor's Notes
“how and why people develop a relationship with technology and integrate it into their lives.”
Thus, a new line of research is to identify system features that steer user engagement, which has become a key concept in designing user-centred web applications.
Given the ubiquity of choices on the web and the competitiveness of the market, applications nowadays are designed to be not only efficient, effective, or satisfying, but also engaging.
There has been great attention paid to retrieving named entities and to using the time dimension for retrieval.
Those approaches are evaluated exclusively within a Cranfield-style paradigm, with little or no attention to user input, context and interaction.
However, it is difficult to correlate user engagement with traditional retrieval metrics such as MAP.
This problem is exacerbated when the user has to cope with content-rich user interfaces that include different sources of evidence and information nuggets of a different nature.
This work studies the interplay between user engagement and retrieval of named-entities and time, in an interactive search scenario.
User engagement is a multi-faceted concept associated with the emotional, cognitive and behavioural connection of a user with a technological resource at any point during the interaction period [1].
O’Brien and Toms defined a model characterising the key indicative dimensions of user engagement:
focused attention, aesthetics, perceived usability, endurability, novelty, involvement.
These factors elaborate the notion of user engagement across its emotional, cognitive and behavioural aspects.
Subjective and objective measures have been proposed to evaluate user engagement [1], the former being considered the most suitable for evaluation.
We use the subjective measures proposed by O’Brien et al. [3].
Objective measures include subjective perception of time (SPT) and information retrieval metrics among others.
SPT is calculated by asking participants to estimate the time taken to complete their searching task, which is compared with the actual time [1].
Given the increase of information-rich user experiences in the search realm, we leverage the large amount of logged interaction data.
Prediction of user preferences for web search results based on user interaction with the system has been studied previously.
In this work, we try to predict user-centred metrics of an IIR system rather than user preferences for its search results.
Our positive findings could steer research into building search applications in which the layout and elements displayed adapt to the needs of the user or context.
To provide a use case for our investigation, we experiment with a news search system, which encourages interaction due to the information overload problem associated with the news domain. One way to facilitate user interaction in such scenarios is to develop new methods of accessing such electronic resources.
For this purpose, we carefully varied the components of a news retrieval system page. We experimented with showing the timeline and named-entity components (enriched) or hiding them (baseline), while keeping everything else fixed, and tested whether adding these components can help improve user engagement.
To study the predictability of the user centred metrics, we repeat our interactive experiments at two different points in time, with a tightly controlled setting.
As an outcome of those experiments, we conclude that user-centred metrics can be predicted with high accuracy, given that the users' interaction with the system, their demographics, and their search habits are provided as input.
We introduced a short cover story that helped us describe to our participants the source of their information need, the environment of the situation, and the problem to be solved.
This facilitated a better understanding of the search objective and, in addition, introduced a layer of realism, while preserving well-defined relevance criteria.
We prepared a number of search topics that covered a variety of contexts, from entertainment and sport to crime and political issues, in order to capture participants' interests as well as possible.
We make use of Amazon's Mechanical Turk (M-Turk) as our crowdsourcing platform.
Particular attention was paid in our experimental design to help motivate participants to respond honestly to the self-report questions and take the tasks seriously.
- though they would be given 120 minutes between the time they accepted and submitted the HIT assignment.
-------
- they would not be paid if they had participated in any of the previous pilot studies.
-------
- Given the findings of Mason and Watts, we expect the increase in wage only to change the rate at which incoming workers accept the HITs, and not to affect their performance.
- The total cost of the evaluation was $510 including the cost of the pilot studies and some of the rejected participants, which we consider to be cost-effective.
-------
- The order in which each participant was introduced to the systems was randomised to soften any bias, e.g. the effect of task and/or fatigue.
- Subsequently, participants were assigned to one of two systems (baseline or enriched) by clicking the link to the external survey.
-------
At the beginning of the experiment, the participants were introduced to an entry questionnaire covering demographic information and previous experience with online news, in particular browsing and search habits, to estimate their familiarity with news retrieval systems and their related tasks.
At the beginning of each task, the participants completed a pre-search questionnaire, to understand why a particular topic was selected.
At the end of each task, the participants completed a post-search questionnaire, to elicit the subject's viewpoint on all user engagement dimensions.
Finally, an exit questionnaire was introduced at the end of the study. In this questionnaire we gathered information about the user study in general: which system and task they preferred and why, and their general comments.
For example, involvement was measured by adapting three questions from [3]:
(1) I was really drawn into my news search task.
(2) I felt involved in this news search task.
(3) This news search experience was fun.
In each iteration, a number of changes were made to the system based on feedback from the pilot study.
For example, for each dimension we computed Cronbach's alpha to evaluate the reliability of the questions adopted for that dimension.
We finalised the questions of each dimension by confirming that their Cronbach's alpha value was > 0.8.
Cronbach's alpha is used as a measure of the internal consistency of a psychometric test score for a sample of subjects. (A sketch of this computation follows.)
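A minimal sketch of the reliability check, assuming a hypothetical response matrix (rows are participants, columns are the Likert items of one dimension); the alpha formula itself is the standard definition.

```python
# Minimal sketch of the Cronbach's alpha check above. The response matrix
# is hypothetical; the formula is the standard definition:
# alpha = k/(k-1) * (1 - sum(item variances) / variance of the sum score).
import numpy as np

def cronbach_alpha(items):
    """items: (n_participants, n_items) array of Likert responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses to the three 'involvement' questions.
involvement = [[5, 4, 5], [4, 4, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(f"alpha = {cronbach_alpha(involvement):.2f}")  # kept only if > 0.8
```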
This is further explained by the fact that the topics were timely and most news providers included in the index contained articles related to them.
after either abandoning it part-way through or having completed it once before.
Figure 2 shows the box plots for the user engagement analysis, for the two systems (baseline and enriched), based on the post-study questionnaire.
Over the data gathered from the 63 participants, each box plot reports five important pieces of information: the minimum, the first quartile, the median, the third quartile, and the maximum.
We performed a paired Wilcoxon Mann-Whitney test between the measures obtained for the enriched system and the baseline system for each user, to check the significance of the difference.
We use (*) and (**) to denote that a dimension's results differed from the baseline at confidence levels p < 0.05 and p < 0.01, respectively. (A sketch of this test follows.)
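A minimal sketch of this significance test, implementing the paired comparison with scipy's Wilcoxon signed-rank test; the per-participant scores are random placeholders, not the study's data.

```python
# Minimal sketch of the paired significance test above. The scores are
# random placeholders; scipy's wilcoxon() performs the paired
# (signed-rank) test.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
baseline = rng.uniform(2.0, 4.0, size=63)             # hypothetical scores
enriched = baseline + rng.uniform(0.0, 1.0, size=63)  # hypothetical paired scores

stat, p = wilcoxon(enriched, baseline)
stars = "**" if p < 0.01 else "*" if p < 0.05 else ""
print(f"W = {stat:.1f}, p = {p:.2e} {stars}")
```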
As shown in Figure 2, the enriched system has a better median and/or mean and a lower variance than the baseline system across all dimensions.
This shows that substantial user engagement improvements can be achieved by integrating time and entity information into the system.
The findings also show that participants are significantly more engaged, both cognitively (considering endurability and involvement) and emotionally (considering aesthetics and novelty), when the time and entity dimensions of the information space are provided (i.e. the enriched system).
(which we refer to as System Preference)
We investigate whether user engagement, and in a more general sense user-centred metrics, can be predicted given the participants' demographic and search-habit information, and/or their interaction with the system,
taken from the exit and post-search questionnaires, respectively.
Remarkably, the machine-learned model is able to predict all of the user and system metrics with low error.
Given these positive findings, it is possible to move towards personalised search applications in which the layout and elements displayed adapt to the needs of the user or context, which in turn increases users' engagement as well as their preference for the system.