A talk summarizing the main lessons from the CWI participation in the 2014 TREC Contextual Suggestion track. If you want to suggest tourist locations, use tourist sources. If you want reproducible research results, map these into ClueWeb12 first.
Slides presented at the TREC conference about our participation in the Contextual Suggestion track. We applied tourist domain knowledge inferred from the public Web to filter documents from the ClueWeb12 collection (a crawl from the Web).
2. Contextual Suggestions
Given a user profile and a context, make suggestions
AKA Context-aware Recommendation, zero-query Information Retrieval, …
3. “Entertain me”
Recommend “things to do”, where
User profile consists of opinions about attractions
Context consists of a specific geo-location
7. TREC Contextual Suggestions (1/3)
Given a user profile
70 – 100 POIs represented by a title, description and URL (situated in Chicago / Santa Fe)
Rated on a scale 0 – 4 (see the loading sketch after the examples)
125, Adler Planetarium & Astronomy Museum, "Interactive exhibits & high-tech sky shows entertain stargazers -- lakefront views are a bonus.", http://www.adlerplanetarium.org/
131, Lincoln Park Zoo, "Lincoln Park Zoo is a free 35-acre zoo located in Lincoln Park in Chicago, Illinois. The zoo was founded in 1868, making it one of the oldest zoos in the U.S. It is also one of a few free admission zoos in the United States.", http://www.lpzoo.org/
700, 125, 4, 4
700, 131, 0, 1
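Below the examples, a minimal Python sketch of how such profile ratings could be loaded and split into the positive and negative preference sets used later for ranking. The column order of the rating rows and the like/dislike thresholds are assumptions for illustration, not the official track format.

    import csv
    from collections import defaultdict

    def load_profiles(ratings_csv):
        """Split each profile's rated POIs into liked and disliked sets.

        Assumed row layout: profile_id, attraction_id, rating_1, rating_2
        (e.g. the description and website ratings, both on the 0-4 scale)."""
        positive, negative = defaultdict(set), defaultdict(set)
        with open(ratings_csv, newline="") as f:
            for profile_id, poi_id, *ratings in csv.reader(f):
                avg = sum(int(r) for r in ratings) / len(ratings)
                if avg >= 3:      # assumed threshold for "liked"
                    positive[profile_id.strip()].add(poi_id.strip())
                elif avg <= 1:    # assumed threshold for "disliked"
                    negative[profile_id.strip()].add(poi_id.strip())
        return positive, negative

    # With the two rating rows above, POI 125 would end up in profile 700's
    # positive set and POI 131 in its negative set.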
8. TREC Contextual Suggestions (2/3)
… and a context
Corresponding to a metropolitan area in the USA, e.g., 109, Kalamazoo, MI
9. TREC Contextual Suggestions (3/3)
Suggest Web pages / snippets
From the Open Web, or from ClueWeb
700, 109 ,1,"About KIA History Kalamazoo
Institute of Arts KIA History","The Kalamazoo Institute
of Arts is a nonprofit art museum and school. Since , the
institute has offered art classes and free admission
programming, including exhibitions, lectures, events,
activities and a permanent collection. The KIAs mission
is to cultivate the creation and appreciation of the visual
arts for the communities",clueweb12-1811wb-14-09165
10. Approach
For a given location, select candidate web pages from ClueWeb
Rank the candidates based on their cosine similarity to the POIs in the user profile (separated into a positive and a negative profile); see the sketch below
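A minimal sketch of this ranking step, assuming TF-IDF bag-of-words representations and a simple subtraction to combine the scores against the positive and negative profile parts; the weighting actually used in the submitted runs may differ.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_candidates(candidate_texts, liked_poi_texts, disliked_poi_texts):
        """Order candidates by similarity to liked POIs minus similarity to disliked POIs."""
        vectorizer = TfidfVectorizer(stop_words="english")
        docs = vectorizer.fit_transform(candidate_texts)
        pos = vectorizer.transform(liked_poi_texts)
        neg = vectorizer.transform(disliked_poi_texts)
        # Mean cosine similarity of each candidate to the two profile parts.
        score = cosine_similarity(docs, pos).mean(axis=1) - cosine_similarity(docs, neg).mean(axis=1)
        return [(i, float(score[i])) for i in score.argsort()[::-1]]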
11. Snippet Generation
Generate POI title:
Extract <title> or <header> tags
Generate personalized POI description:
Extract <description> tag
Break documents into sentences, ranked on their similarity with the user profile
Concatenate until 512 bytes are reached (sketch below)
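A minimal sketch of the sentence-based description generation, assuming a naive sentence splitter and the same TF-IDF machinery as above; profile_vector stands for, e.g., the centroid of the liked-POI vectors, and the <title>/<description> extraction steps are omitted.

    import re
    from sklearn.metrics.pairwise import cosine_similarity

    def personalized_description(body_text, vectorizer, profile_vector, max_bytes=512):
        """Concatenate the sentences most similar to the user profile, up to max_bytes."""
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", body_text) if s]
        sims = cosine_similarity(vectorizer.transform(sentences), profile_vector).ravel()
        snippet = ""
        for _, sentence in sorted(zip(sims, sentences), reverse=True):
            candidate = (snippet + " " + sentence).strip()
            if len(candidate.encode("utf-8")) > max_bytes:
                break
            snippet = candidate
        return snippet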
12. Candidate selection
In 2013, the CWI ClueWeb-based run ranked far below all other (Open Web) runs
A few issues related to evaluation; see our ECIR 2014 short paper
But also, the commercial Open Web search engines (Google, Bing or Yahoo!) return much better candidates for queries derived from the context than we did
13. Geo-Filtering
Require an exact mention of the given context
Format: {City, ST}, e.g., Miami, FL
Exclude documents that mention multiple contexts (filter sketch below)
E.g., a Wikipedia page listing the cities in the state of Florida
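A minimal sketch of both geo-filtering rules; the exact-match pattern is an assumption.

    import re

    def mentions(text, city_state):
        """True if the text contains an exact "City, ST" mention."""
        city, state = [part.strip() for part in city_state.split(",")]
        return re.search(rf"\b{re.escape(city)},\s*{re.escape(state)}\b", text) is not None

    def geo_filter(doc_text, context, all_contexts):
        """Keep a document only if it mentions the given context and no other context."""
        if not mentions(doc_text, context):
            return False
        return not any(mentions(doc_text, other) for other in all_contexts if other != context)

    # Example: geo_filter(text, "Miami, FL", ["Miami, FL", "Kalamazoo, MI"])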
14. Domain Knowledge (1/2)
Point-of-Interest heuristic:
POIs will be represented on the major tourist information sites
{yelp, tripadvisor, wikitravel, zagat, expedia, orbitz, and travel.yahoo}
Extract the ClueWeb12 documents from these domains (TouristListFiltered)
E.g., http://www.zagat.com/miami
Expand with the outlinks also contained in ClueWeb12 (TouristOutlinksFiltered); see the sketch below
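A minimal sketch of the two tourist-site filters; the (doc_id, url, outlink_urls) record structure is hypothetical, and the full domain names are an interpretation of the abbreviated list on the slide.

    from urllib.parse import urlparse

    # Interpretation of the slide's list of major tourist information sites.
    TOURIST_DOMAINS = ("yelp.com", "tripadvisor.com", "wikitravel.org", "zagat.com",
                       "expedia.com", "orbitz.com", "travel.yahoo.com")

    def on_tourist_domain(url):
        host = urlparse(url).netloc.lower()
        return any(host == d or host.endswith("." + d) for d in TOURIST_DOMAINS)

    def tourist_filter(clueweb_docs):
        """clueweb_docs: iterable of (doc_id, url, outlink_urls) records.

        Returns the doc ids hosted on tourist domains (TouristListFiltered) and the URLs
        they link to, to be matched back against ClueWeb12 for TouristOutlinksFiltered."""
        tourist_list, outlinks = set(), set()
        for doc_id, url, outlink_urls in clueweb_docs:
            if on_tourist_domain(url):
                tourist_list.add(doc_id)
                outlinks.update(outlink_urls)
        return tourist_list, outlinks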
15. Domain Knowledge (2/2)
Use the Foursquare API to identify the URLs of POIs for the given context
50 attractions per context
Format: attraction name, URL
e.g., Cortés Restaurant, http://cortesrestaurant.com (Miami, FL)
If the POI has no corresponding URL, use the Google API with a query combining the Foursquare POI name and the context, e.g., "Cortés Restaurant Miami, FL"
Extract any document from ClueWeb12 whose host matches the (1,454 unique) hosts of the URLs identified (AttractionFiltered); see the sketch below
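A minimal sketch of the attraction-based filter; fetch_venues and search_web are placeholders for the Foursquare venue lookup (50 results per context) and the Google fallback query, so no specific API signatures are assumed.

    from urllib.parse import urlparse

    def attraction_hosts(context, fetch_venues, search_web):
        """Collect the hosts of attraction websites for one context.

        fetch_venues(context) -> list of (name, url_or_None), e.g. up to 50 Foursquare venues.
        search_web(query) -> best-matching URL, e.g. for the query "<attraction name> <context>"."""
        hosts = set()
        for name, url in fetch_venues(context):
            if not url:                      # venue without a website listed
                url = search_web(f"{name} {context}")
            if url:
                hosts.add(urlparse(url).netloc.lower())
        return hosts

    def attraction_filter(clueweb_docs, hosts):
        """AttractionFiltered: keep (doc_id, url) pairs whose host matches an attraction host."""
        return {doc_id for doc_id, url in clueweb_docs if urlparse(url).netloc.lower() in hosts}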
20. Decompose metrics
Ignore geo-relevance: the two runs have almost similar performance in the desc and doc dimensions
Geo-relevance only: TouristFiltered is more geographically appropriate
21. Type of domain knowledge
TouristFiltered consists of three parts:
TouristListFiltered (TLF)
TouristOutlinksFiltered (TOF)
AttractionFiltered (AF)
Foursquare gives the most significant improvement in performance
22. Conclusions
Domain knowledge about sites that are more likely to offer attractions leads to better suggestions
The best results were obtained when identifying attractions through specialized services such as Foursquare
23. Next Steps
Improve our recommendation algorithm, e.g., weighted candidate selection
Understand the remaining difference with Open Web-based results
Our ClueWeb results are reproducible but not yet as good