FIRE 2022

Forum for Information Retrieval Evaluation

Indian Statistical Institute, Kolkata

9th-13th December

Retrieval and Generative Approaches to Entity Linking

Nicola Cancedda, Meta, UK

Entity Linking is a fundamental language processing step with many direct applications, but it is also more than that: it is the crucial capability of intelligent agents to ground referring expressions into concepts and objects in an extra-linguistic representation of the world. Modern Entity Linking systems cast the problem as either retrieval or generation, with complementary advantages and disadvantages. In this presentation I will share some challenges faced in developing large-scale entity linkers capable of handling many languages under realistic conditions.
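
To make the retrieval casting concrete, the sketch below is a purely illustrative toy: a bi-encoder-style linker that embeds the mention in context and every candidate entity description into the same space and links to the nearest entity. The entity names, descriptions, and the bag-of-words "encoder" are invented placeholders, not the speaker's system, which uses trained neural encoders.

```python
# Toy sketch of the "retrieval" casting of entity linking: encode the mention
# in context and every candidate entity description into the same vector
# space, then link to the highest-scoring entity. The bag-of-words encoder
# below is only a stand-in for a trained neural bi-encoder.
import re
from collections import Counter

def encode(text: str) -> Counter:
    """Toy encoder: a bag-of-words vector (real systems use neural encoders)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def dot(u: Counter, v: Counter) -> int:
    return sum(u[w] * v[w] for w in u)

# Hypothetical knowledge-base entries (placeholders for illustration only).
entities = {
    "Paris_(France)": "Paris, the capital city of France",
    "Paris_(Texas)": "Paris, a city in Texas, United States",
}
entity_vecs = {eid: encode(desc) for eid, desc in entities.items()}

mention = "She flew to Paris, the capital of France, to see the Eiffel Tower"
m = encode(mention)
best = max(entity_vecs, key=lambda eid: dot(m, entity_vecs[eid]))
print(best)  # Paris_(France) for this toy example
```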


Triggering advances in lifelogging. An Overview of the NTCIR Lifelog campaign

Frank Hopfgartner, Universität Koblenz-Landau, Germany

In recent years, various software and hardware tools have entered the consumer market which enable users to log data about their lives on a continuous basis. Popular examples include self-tracking devices or apps such as Fitbit or Garmin that allow users to keep track of their physical activities or to monitor their biometrics. The process of gathering such multi-modal data from multiple sources is also referred to as lifelogging. Due to the constant stream of data being captured, lifelogging can result in personal archives that are too large for manual organization. Consequently, automated approaches to handle such data are needed. However, due to privacy concerns, advances in the field have been limited by the lack of shared test collections. Aiming to promote further research on novel approaches to multi-modal personal data analytics and retrieval, we organized a comparative benchmarking exercise, Lifelog, that ran between 2015 and 2022 as part of the evaluation conference NTCIR. Several Lifelog datasets were released, and participants could work on various sub-tasks to tackle different challenges related to lifelog retrieval. In this keynote presentation, I will give an overview of these sub-tasks and reflect on lessons learned.


Understanding How Dimension Reduction Tools Work

Cynthia Rudin, Duke University, USA

Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMap have demonstrated impressive visualization performance on many real-world datasets. They are useful for understanding data and trustworthy decision-making, particularly for biological data. One tension that has always faced these methods is the trade-off between preservation of global structure and preservation of local structure: past methods can handle one or the other, but not both. In this work, our main goal is to understand what aspects of DR methods are important for preserving both local and global structure: it is difficult to design a better method without a true understanding of the choices we make in our algorithms and their empirical impact on the lower-dimensional embeddings they produce. Towards the goal of local structure preservation, we provide several useful design principles for DR loss functions based on our new understanding of the mechanisms behind successful DR methods. Towards the goal of global structure preservation, our analysis illuminates that the choice of which components to preserve is important. We leverage these insights to design a new algorithm for DR, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure. Our work provides several unexpected insights into which design choices to make, and which to avoid, when constructing DR algorithms.
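
A minimal usage sketch of PaCMAP, assuming the open-source pacmap Python package; the random data and the parameter names/values shown (n_neighbors, MN_ratio, FP_ratio) follow that package's documentation and are placeholders rather than recommendations from the talk.

```python
# Minimal sketch: 2-D embedding with the open-source `pacmap` package
# (https://github.com/YingfanWang/PaCMAP). Data and parameter values are
# placeholders; treat the exact settings as assumptions, not prescriptions.
import numpy as np
import pacmap

X = np.random.default_rng(0).normal(size=(1000, 50))  # stand-in for real data

reducer = pacmap.PaCMAP(
    n_components=2,   # target dimensionality for visualisation
    n_neighbors=10,   # near pairs: drive local structure preservation
    MN_ratio=0.5,     # mid-near pairs: help preserve global structure
    FP_ratio=2.0,     # further pairs: repulsion to avoid crowding
)
X_2d = reducer.fit_transform(X, init="pca")
print(X_2d.shape)  # (1000, 2)
```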


Searching for better representations to transfer NLP models across related languages and dialects

Serge Sharoff, University of Leeds, UK

Some languages have very few NLP resources, while many of them are closely related to better-resourced languages. We can explore various methods for exploiting the typological links between related languages. This implies having a representation shared between these languages, so that an NLP model trained on a better-resourced language can be applied to lesser-resourced languages. In particular, it is possible to build a cross-lingual word embedding representation shared across related languages by combining alignment methods with a lexical similarity measure based on the Weighted Levenshtein Distance (WLD). As word embeddings have been largely replaced with contextual embedding models such as BERT, the next step is to combine WLD with the multilingual alignment obtained through weight sharing.
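
As a concrete illustration of the lexical similarity measure mentioned above, the sketch below implements a generic weighted Levenshtein distance in which substitutions between characters considered close across two related orthographies cost less than arbitrary substitutions; the cost table is invented for the example and is not the speaker's.

```python
# Illustrative sketch (not the speaker's implementation): a weighted
# Levenshtein distance where substitutions between characters deemed "close"
# across two related languages are penalised less than arbitrary ones.
def weighted_levenshtein(a: str, b: str, sub_cost, ins_del_cost: float = 1.0) -> float:
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * ins_del_cost
    for j in range(1, n + 1):
        d[0][j] = j * ins_del_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
            d[i][j] = min(
                d[i - 1][j] + ins_del_cost,   # deletion
                d[i][j - 1] + ins_del_cost,   # insertion
                d[i - 1][j - 1] + cost,       # (weighted) substitution
            )
    return d[m][n]

# Hypothetical cost table: pairs that often alternate between related orthographies.
CHEAP_PAIRS = {frozenset(p) for p in [("c", "k"), ("v", "w"), ("i", "y")]}
cost = lambda x, y: 0.3 if frozenset((x, y)) in CHEAP_PAIRS else 1.0
print(weighted_levenshtein("cat", "kat", cost))  # 0.3
```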


Towards better evaluation of open domain dialogue

Yvette Graham, Trinity College, Ireland

Evaluation of open-domain dialogue systems is highly challenging, and the development of better techniques is highlighted time and again as desperately needed. Despite substantial efforts to carry out reliable live evaluation of systems in recent competitions, annotations have been abandoned and reported as too unreliable to yield sensible results. This is a serious problem, since automatic metrics are known to provide no real indication of what may or may not be a high-quality conversation. Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, this talk presents the successful development of a human evaluation that is highly reliable while still remaining feasible and low cost. Self-replication experiments reveal almost perfectly repeatable results with a correlation of r = 0.969. Because appropriate methods of statistical significance testing have been lacking, the likelihood that apparent improvements to systems occur simply by chance is rarely taken into account in dialogue evaluation; the evaluation presented here facilitates the application of standard tests. Highly reliable evaluation methods then provide new insight into system performance, and this talk includes a comparison of state-of-the-art models (i) with and without personas, to measure the contribution of personas to conversation quality, as well as (ii) on prescribed versus freely chosen topics. Interestingly, with respect to personas, results indicate that personas do not positively contribute to conversation quality, a surprising result that will hopefully inspire discussion within the dialogue community.
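
The two statistics mentioned above, repeatability across replication runs and significance of system differences, can be computed with standard tools; the sketch below uses SciPy on invented placeholder scores, purely to illustrate the kind of analysis such an evaluation facilitates.

```python
# Illustrative sketch with invented numbers (not results from the talk):
# checking repeatability of a human evaluation and applying a standard
# paired significance test to per-conversation quality ratings.
from scipy import stats

# Repeatability: correlation between two independent replication runs of the
# same human evaluation (system-level scores).
run_1 = [0.71, 0.64, 0.80, 0.55, 0.69, 0.77]
run_2 = [0.70, 0.66, 0.78, 0.57, 0.68, 0.79]
r, p = stats.pearsonr(run_1, run_2)
print(f"self-replication correlation r = {r:.3f} (p = {p:.3g})")

# Significance: one standard paired test over ratings of two systems judged
# on the same conversations.
sys_a = [4, 5, 3, 4, 4, 5, 2, 4, 5, 4]
sys_b = [3, 4, 3, 4, 3, 4, 3, 3, 4, 4]
t_stat, p_val = stats.ttest_rel(sys_a, sys_b)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_val:.3f}")
```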


Domain-Specific Knowledge Graphs - Tasks and Challenges for Next Generation Information Systems

Ralf Krestel, ZBW - Leibniz Information Centre for Economics & Kiel University, Germany

Knowledge graphs (KGs) are increasingly replacing traditional information systems, such as relational database systems. Due to the high diversity of application areas, a one-size-fits-all knowledge graph is not (yet) on the horizon. But domain-specific solutions are showing good results in information retrieval, closed-domain question answering, health applications, customer support systems, and in the fashion and legal domains. In this talk, we will not only explore the challenges and pitfalls of creating a domain-specific knowledge graph for the art domain, but also shed some light on general issues related to knowledge graphs, from named entity recognition to relation extraction and KG embeddings.


Re-Thinking Re-Ranking

Sean MacAvaney, University of Glasgow, UK

Re-ranking pipelines, wherein an initial pool of documents is re-scored by a more expensive (but more accurate) retrieval function, are commonplace in search systems. A common critique of this "telescoping" approach is that documents missed by the first stage are lost. But does this need to be the case? In this talk, I cover a family of smarter re-ranking algorithms that efficiently pull in new documents to score from the corpus that are similar to the best ones seen so far. Remarkably, the approach exceeds the performance of expensive dense retrieval approaches when using only lexical signals.
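
One way such an algorithm can be organised (a conceptual sketch, not necessarily the speaker's exact method) is to alternate between scoring documents from the first-stage pool and unscored corpus-graph neighbours of the best documents scored so far; neighbours() and expensive_score() below are assumed inputs supplied by the caller.

```python
# Conceptual sketch of adaptive re-ranking under a scoring budget.
# `neighbours(doc_id)` is a precomputed corpus similarity graph and
# `expensive_score(query, doc_id)` the costly re-ranker; both are assumptions.
import heapq

def adaptive_rerank(query, initial_pool, neighbours, expensive_score, budget=100, batch=16):
    scored = {}                  # doc_id -> expensive score
    pool = list(initial_pool)    # first-stage ranking, best first
    frontier = []                # max-heap of (negated source score, neighbour id)
    take_from_pool = True
    while len(scored) < budget and (pool or frontier):
        # Alternate between the first-stage pool and the graph frontier.
        source = pool if (take_from_pool and pool) or not frontier else frontier
        batch_ids = []
        while source and len(batch_ids) < min(batch, budget - len(scored)):
            doc = source.pop(0) if source is pool else heapq.heappop(source)[1]
            if doc not in scored and doc not in batch_ids:
                batch_ids.append(doc)
        for doc in batch_ids:
            scored[doc] = expensive_score(query, doc)
            # Pull in unscored corpus neighbours of newly scored documents.
            for nb in neighbours(doc):
                if nb not in scored:
                    heapq.heappush(frontier, (-scored[doc], nb))
        take_from_pool = not take_from_pool
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```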


Bibliometric-enhanced Information Retrieval: Connecting IR, NLP with Scientometrics

Philipp Mayr-Schlegel, GESIS - Leibniz-Institute for the Social Sciences, Germany

The Bibliometric-enhanced Information Retrieval (BIR) workshop series tackles issues related to academic search, at the crossroads between Information Retrieval, Natural Language Processing, and Scientometrics. Searching for scientific information is a long-lived information need. In the early 1960s, Salton was already striving to enhance information retrieval by including clues inferred from bibliographic citations. The development of citation indexes pioneered by Garfield proved determinant for such a research endeavour at the crossroads between the nascent fields of Information Retrieval (IR) and Bibliometrics. The pioneers who established these fields in Information Science, such as Salton and Garfield, were followed by scientists who specialised in one of these, leading to the two loosely connected fields we know today. The purpose of the BIR workshop series, founded in 2014, is to tighten up the link between IR and Bibliometrics. We strive to bring together the ‘retrievalists’ and ‘citationists’ active in both academia and industry who are developing search engines and recommender systems such as ArnetMiner, CiteSeerX, Google Scholar, Microsoft Academic Search, and Semantic Scholar, just to name a few. Bibliometric-enhanced IR systems must deal with the multifaceted nature of scientific information by searching for or recommending academic papers, patents, venues (i.e., conferences or journals), authors, experts (e.g., peer reviewers), references (to be cited to support an argument), and datasets. The underlying models harness relevance signals from keywords provided by authors, topics extracted from the full texts, co-authorship networks, citation networks, and various classification schemes of science. The presentation at FIRE 2022 will reflect on the achievements of the BIR workshop series and present selected studies from the multiple workshops and special issues on BIR (Cabanac et al., 2020a,b; Mayr et al., 2018; Mayr & Scharnhorst, 2015).


Search Results Diversification for Effective Fair Ranking

Graham McDonald, University of Glasgow, UK

Providing users with relevant search results has traditionally been the primary focus of information retrieval research. However, focusing on relevance alone can lead to undesirable side effects. For example, small differences in the documents’ predicted relevance scores can result in large differences in the exposure that the relevant documents receive, i.e., the likelihood that the documents will be seen by searchers. Therefore, to mitigate such potential exposure bias, there is growing interest in developing fair ranking techniques that try to ensure search results are not dominated by, for example, particular information sources. In this talk, I will discuss some of our work on casting the fair ranking problem as a search results diversification task across a number of fairness groups, where groups represent demographics or characteristics that we wish to be fair to. Our work shows that leveraging search results diversification can be an effective strategy for increasing the fairness of exposure that groups receive. However, diversification does not by itself provide an out-of-the-box solution to the fair ranking problem.
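
As an illustration of the casting described above, the sketch below adapts an xQuAD-style greedy diversification loop to fairness groups: each document's score combines predicted relevance with how much it covers groups that the current ranking has not yet exposed. The relevance and group-membership inputs are assumed, and the speaker's actual method may differ.

```python
# Illustrative sketch: greedy, xQuAD-style diversification over fairness
# groups (group importance is treated as uniform here for simplicity).
# `relevance[d]` is a predicted relevance score and `group_prob[d][g]` the
# probability that document d belongs to fairness group g; both are assumed.
def fairness_diversify(candidates, relevance, group_prob, groups, k=10, lam=0.5):
    selected = []
    uncovered = {g: 1.0 for g in groups}   # prob. that g is not yet covered
    remaining = set(candidates)
    while remaining and len(selected) < k:
        def score(d):
            diversity = sum(group_prob[d].get(g, 0.0) * uncovered[g] for g in groups)
            return (1 - lam) * relevance[d] + lam * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
        for g in groups:
            # Selecting `best` reduces how "uncovered" its groups remain.
            uncovered[g] *= 1.0 - group_prob[best].get(g, 0.0)
    return selected
```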


Responsible Conversational AI: Trusted, Safe, and Bias-free

Verena Rieser, Heriot Watt University, UK

With continued progress in deep learning, there is increasing activity in learning dialogue behaviour from data, also known as “Conversational AI”. In this talk I will survey my research showing that, in order to responsibly apply deep language generation in the context of user-facing Conversational AI, we need to ensure that system outputs can be trusted and are safe to use, and that system design is bias-free. I will highlight several methods developed in my research team that are starting to address these issues, including reducing ‘hallucinations’ for task-based systems, safety-critical issues for open-domain chatbots, and the often-overlooked problem of anthropomorphic design. I will conclude with the lessons learnt and upcoming challenges.