FORUM FOR INFORMATION RETRIEVAL EVALUATION

(FIRE 2016)

ISI, Kolkata

7-10 December

Anatomy of Search Engine Performances
by Nicola Ferro, University of Padua, Italy

Component-based evaluation, i.e. the ability to assess the impact of the different components in the pipeline of an Information Retrieval (IR) system and to understand their interaction, is a long-standing challenge, as pointed out early on by Robertson in 1982: “if we want to decide between alternative indexing strategies for example, we must use these strategies as part of a complete information retrieval system, and examine its overall performance (with each of the alternatives) directly”. We propose a methodology, based on the General Linear Mixed Model (GLMM) and ANalysis Of VAriance (ANOVA), to address this issue and to estimate the effects of the different components of an IR system, thus giving us better insight into what system variance and system effects are. In particular, the proposed methodology allows us to break down the system effect into the contributions of stop lists, stemmers or n-grams, and IR models, as well as to study their interaction.
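
To make the idea of breaking the system effect into component contributions more concrete, here is a minimal Python sketch. It is not the GLMM methodology of the talk: it assumes a simplified, balanced, fully crossed fixed-effects design with one effectiveness score (e.g. average precision) per (topic, stemmer, IR model) cell, and reports each source's share of the total sum of squares (eta squared). The function name `anova_components` and all factor levels and scores in the example are hypothetical.

```python
from itertools import product
from statistics import mean


def anova_components(scores):
    """Share of total variance (eta squared) per source of variation.

    scores maps (topic, stemmer, model) -> effectiveness score.
    Assumes a balanced, fully crossed design with one score per cell
    and non-constant scores (otherwise the total sum of squares is 0).
    """
    topics = sorted({k[0] for k in scores})
    stemmers = sorted({k[1] for k in scores})
    models = sorted({k[2] for k in scores})
    n_t, n_s, n_m = len(topics), len(stemmers), len(models)
    grand = mean(scores.values())

    def marginal(pos, level):
        # Mean score over all cells where factor `pos` takes `level`.
        return mean(v for k, v in scores.items() if k[pos] == level)

    m_t = {t: marginal(0, t) for t in topics}
    m_s = {s: marginal(1, s) for s in stemmers}
    m_m = {m: marginal(2, m) for m in models}
    # Mean over topics for each (stemmer, model) pair, i.e. each "system".
    m_sm = {(s, m): mean(scores[(t, s, m)] for t in topics)
            for s, m in product(stemmers, models)}

    ss_total = sum((v - grand) ** 2 for v in scores.values())
    ss = {
        "topic": n_s * n_m * sum((x - grand) ** 2 for x in m_t.values()),
        "stemmer": n_t * n_m * sum((x - grand) ** 2 for x in m_s.values()),
        "model": n_t * n_s * sum((x - grand) ** 2 for x in m_m.values()),
        "stemmer:model": n_t * sum(
            (m_sm[(s, m)] - m_s[s] - m_m[m] + grand) ** 2
            for s, m in product(stemmers, models)),
    }
    # Remaining variance, including topic-by-component interactions.
    ss["residual"] = ss_total - sum(ss.values())
    return {source: value / ss_total for source, value in ss.items()}


# Hypothetical, purely additive scores: no stemmer/model interaction.
topic_eff = {"t1": 0.0, "t2": 0.2}
stem_eff = {"none": 0.0, "porter": 0.1}
model_eff = {"bm25": 0.1, "tfidf": 0.0}
scores = {(t, s, m): 0.3 + topic_eff[t] + stem_eff[s] + model_eff[m]
          for t, s, m in product(topic_eff, stem_eff, model_eff)}
for source, share in anova_components(scores).items():
    print(f"{source:>14}: {share:.3f}")
```

With these additive toy scores, topics account for about two thirds of the variance, the stemmer and the IR model for about one sixth each, and the interaction and residual shares are zero up to rounding; with real runs, a non-zero stemmer:model share would signal that the two components interact.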

Today's NLP is highly machine-learning oriented. Huge volumes of text in electronic form are available for processing, and it is hoped that the data will reveal the underlying regularities. One typically postulates a distribution, applies maximum likelihood or maximum entropy methods to find its parameters, and uses the learnt distribution to predict linguistic phenomena. In this picture of NLP, linguistics seems to have no role. But, as has been observed repeatedly, prediction accuracy is reaching saturation. Since learning algorithms need features, insight into language is crucial for better feature engineering. In this talk we will look at NLP tasks at various levels of processing and show how computation transitions gradually from being highly data driven to knowledge driven. Deep Learning, a development symptomatic of NLP-ML synergy/discord, offers hope for large-scale language processing, but grapples with the fundamental question of what word embeddings are after all. We will use a number of applications to bring out the need for better synergy between ML and linguistics in high-performance NLP systems.

Text Mining and its Evaluation: Finding the Perspective of the Domain Expert
by Thomas Mandl, Information Science, University of Hildesheim

The evaluation of Information Retrieval services has always demanded more attention when systems were intended for use by experts. Search is increasingly seen as a commodity that is taken for granted, and systems need to support complex work tasks, for example, mining for relationships and trends based on advanced text mining technologies. Especially in the technical domain, their evaluation requires expert input, which is hard and expensive to obtain. Approaches to evaluating patent retrieval at CLEF-IP and patent mining at NTCIR are reviewed. The presentation points out some shortcomings of the methods applied so far and discusses future options.

Using semantic information to improve IR systems
by Paulo Quaresma, Universidade de Évora, Portugal

In order to improve existing information retrieval systems, there is a need to take into account the semantic information conveyed in the texts. In this talk, different approaches to this problem will be discussed, with the aim of identifying the main issues and some possible solutions. Special focus will be given to a deep, linguistically based approach developed in the Computer Science Department of the University of Évora, Portugal. In this approach, sentences are parsed and represented as DRSs (Discourse Representation Structures). These structures are then transformed into graphs, and distance metrics between the graphs are calculated. The overall idea behind this approach is that graph distance metrics are a good way of modelling the semantic distance between sentences. This approach has already been applied with promising results to several NLP tasks, such as text IR, text classification, and sentence similarity, and some of the results obtained will be presented in the talk.
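
The abstract does not say which graph distance metrics are used. As one simple, assumed stand-in, the Python sketch below represents each sentence graph as a set of labelled edges (head, relation, dependent) and takes the Jaccard distance between the two edge sets. The DRS-to-graph conversion is not shown, and the function name `graph_distance` and the edge labels in the example are hypothetical.

```python
def graph_distance(g1, g2):
    """Jaccard distance between two graphs given as sets of labelled edges.

    Each edge is a (head, relation, dependent) triple; 0.0 means identical
    edge sets, 1.0 means no shared edges. This is an illustrative metric,
    not necessarily the one used in the talk.
    """
    union = g1 | g2
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(g1 & g2) / len(union)


# Hypothetical graphs for "John reads a book" and "Mary reads a book".
john = {("read", "agent", "john"), ("read", "theme", "book")}
mary = {("read", "agent", "mary"), ("read", "theme", "book")}
print(round(graph_distance(john, mary), 3))  # 0.667
```

The shared "theme" edge keeps the two sentences relatively close, while the differing agents push them apart; a metric that also weights relation types or node similarity would refine this picture.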

Web search has been the focus of research for decades (actually, just two decades really). There are, however, many more areas of IR that offer challenging problems, perhaps with less of a chance for fame and (citation) fortune but also with less competition from thousands of highly paid engineers. Site search, for example, has attracted much less attention but is equally challenging. In fact, what makes site search (as well as intranet and enterprise search) even more interesting is that it shares some common problems with general Web search but also poses a good number of additional problems that need to be addressed to make searching a Web site no longer a waste of time. Similarly, providing search support appears to attract far more attention than helping users explore a document collection. Finally, personalising the search experience has been shown to offer huge potential, but treating individual users as part of a cohort is interesting too and less commonly explored.

I will illustrate how the access log files collected on a Web site can be used to do all of that and, as a bonus, provide a practical use case for combining natural language processing and information retrieval techniques.

Commercial providers of information access systems (such as Amazon or Google) usually evaluate the performance of their algorithms by observing how many of their customers interact with different instances of their services. Unfortunately, due to the lack of access to users, university-based research is struggling to catch up with this large-scale evaluation methodology. In this talk, I will introduce the news recommendation evaluation lab NewsREEL, a benchmarking campaign that aims to address this growing "evaluation gap" between academia and industry. NewsREEL is the first instance of a living lab in which researchers gain access to the infrastructure and user base of an information access service provider to evaluate their algorithms. I will also highlight similar evaluation campaigns that have recently been introduced.

Deep Learning for Information Retrieval: Models, Progress, and Opportunities
by Matt Lease, School of Information, University of Texas at Austin

A "third wave" of Neural Network (NN) approaches, popularly referred to as "deep learning", has swept over speech recognition, computer vision, and natural language processing, leaving new standards of machine learning performance in its wake. This deep learning wave has recently begun to swell in Information Retrieval (IR) as well, and while Neural IR has yet to achieve the level of success deep learning has achieved in other areas, the recent surge of interest and work on Neural IR suggests that this state of affairs may be changing quickly. In this talk, I will survey the past and present landscape of Neural IR research, paying special attention to the use of learned representations of queries and documents (i.e., neural embeddings). I will highlight the successes of Neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research. For further reading, please see this very recent paper: https://arxiv.org/abs/1611.06792.
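
To illustrate what learned representations of queries and documents can offer, here is a minimal Python sketch of one simple embedding-based ranker: each text is represented by the average of its word vectors, and documents are ranked by cosine similarity to the query. The two-dimensional vectors are made up for the example (real embeddings are learned from data and are much larger), and the names `embed`, `cosine`, and `rank` are hypothetical; this is a baseline, not any particular Neural IR model from the talk.

```python
import math


def embed(text, vectors):
    """Average the vectors of known words; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(component) / len(known) for component in zip(*known)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, docs, vectors):
    """Return document ids sorted by decreasing query-document cosine."""
    q = embed(query, vectors)
    scores = {}
    for doc_id, text in docs.items():
        d = embed(text, vectors)
        scores[doc_id] = cosine(q, d) if q is not None and d is not None else 0.0
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-made vectors (hypothetical, not learned).
vectors = {"car": [1.0, 0.0], "automobile": [0.9, 0.1],
           "banana": [0.0, 1.0], "fruit": [0.1, 0.9]}
docs = {"d1": "banana fruit", "d2": "car dealers"}
print(rank("automobile", docs, vectors))  # ['d2', 'd1']
```

Here d2 ranks first even though it shares no term with the query, which is the kind of vocabulary-mismatch case where learned representations are expected to help over exact term matching.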

Individual tasks carried out within benchmarking initiatives enable direct comparison of alternative approaches to tackling shared research challenges, and ideally promote new research ideas and foster communities of researchers interested in common or related scientific topics. When a task has a clear, predefined use case, it can straightforwardly adopt a well-established framework and methodology, for example, an ad hoc information retrieval task adopting the standard Cranfield paradigm. On the other hand, for new and emerging tasks that pose more complex challenges in terms of use scenarios or dataset design, developing a new task is far from a straightforward process. This presentation summarises the reflections of the organisers of the MediaEval Search and Hyperlinking task on its development, from its origins and evolution as a task at the MediaEval benchmarking campaign (2011-2014) to its current instantiation as a task at the NIST TRECVid benchmark (since 2015). The presentation introduces the motivations and details of the task and highlights the challenges encountered in its development over a number of annual iterations, the solutions found so far, and the maintenance of a vision for the ongoing advancement of the task's ambition.

The Knowledge Base Population (KBP) track started in 2009 at NIST's Text Analysis Conference (TAC) with the goal of promoting the development of systems that can populate a knowledge base with structured information about entities as found in a collection of unstructured text. Building a complex structure such as a knowledge base poses challenges not just for system developers but also for evaluation organizers. This presentation describes the TAC approach to developing open community evaluations for Knowledge Base Population, highlights some lessons learned since KBP was first introduced, and outlines possible future directions for KBP at TAC.

Text Quantification: Current Research and Future Challenges
by Fabrizio Sebastiani, Qatar Computing Research Institute, Qatar Foundation

In recent years it has been pointed out that, in a number of applications involving text classification, the final goal is not determining which class (or classes) individual unlabelled instances belong to, but estimating the relative frequency (or “prevalence”) of each class in the unlabelled data. The task of estimating class prevalence via supervised learning is known as “quantification”. While still a little-known task, quantification has lately seen increased interest, due to its key role in areas such as market research, social science, political science, and epidemiology. Performing quantification by classifying each unlabelled instance and then counting the instances that have been attributed to the class (the “classify and count” method) usually leads to suboptimal quantification accuracy; as a result, developing methods (and evaluation measures) that address quantification as a task in its own right is an important research goal. In this talk I will discuss some recent (and still ongoing) research efforts in quantification, and outline challenges that lie ahead.
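
To show why “classify and count” (CC) can be biased, here is a minimal Python sketch contrasting it with one well-known correction from the quantification literature, adjusted classify and count (ACC), which rescales the raw count using the classifier's true and false positive rates. The abstract does not say which methods the talk covers; ACC is included only as an illustration, the function names are hypothetical, and the rates in the example are assumed to have been estimated beforehand (e.g. on held-out labelled data).

```python
def classify_and_count(predictions):
    """CC: fraction of unlabelled items the classifier labels positive (1)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """ACC: correct CC using the classifier's true/false positive rates.

    Since E[CC] = p * tpr + (1 - p) * fpr, solving for the prevalence p
    gives (CC - fpr) / (tpr - fpr), clipped to the valid range [0, 1].
    """
    cc = classify_and_count(predictions)
    if tpr == fpr:
        return cc  # the correction is undefined; fall back to CC
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


# Hypothetical setting: true prevalence 0.2, tpr = 0.8, fpr = 0.1, so
# about 0.2 * 0.8 + 0.8 * 0.1 = 0.24 of 100 items are predicted positive.
predictions = [1] * 24 + [0] * 76
print(classify_and_count(predictions))                               # 0.24
print(round(adjusted_classify_and_count(predictions, 0.8, 0.1), 3))  # 0.2
```

CC overestimates the prevalence here (0.24 instead of 0.2) because false positives from the large negative class outweigh the missed positives; ACC removes that bias in expectation, although on small samples its estimate can still be noisy.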

Contextual Intelligence to enrich Enterprise Business Intelligence
by Lipika Dey, Principal Scientist, TCS Innovation Lab

In this talk we will explore how contextual intelligence gathered from internal and external sources is changing traditional BI concepts within an enterprise. We will discuss a framework for doing this, using a machine learning platform to perform advanced text analytics.