FIRE 2015

Forum for Information Retrieval Evaluation

DAIICT, Gandhinagar

4 - 6 December

Many of the world’s languages are still disadvantaged and under-resourced in that they are not fully researched and lack the basic tools that would make it possible for people to use their mother tongue in education and other democratic arenas, and to use computers or access the Internet in their own language. The talk will describe hands-on experiences with efficiently creating computational linguistic resources for under-resourced languages, drawing examples from projects on harvesting big Web corpora and developing language processing resources, tools and techniques for the languages of Ethiopia.
Progress in information retrieval requires us to quantify the performance of search engines and other information access systems. In this talk, I will motivate and describe a novel framework for information retrieval evaluation, called time-biased gain. Time-biased gain unifies and generalizes many traditional effectiveness measures (e.g., NDCG) while accommodating aspects of user behavior not captured by these measures. By using time as a basis for calibration against actual user data, time-biased gain can reflect aspects of the search process that directly impact user experience. I will present an instantiation of time-biased gain, applicable to systems where users judge the quality of their experience by the amount of time well spent. Rather than the single number produced by traditional effectiveness measures, time-biased gain models user variability and produces a distribution of gain on a per-query basis, allowing us to accommodate different types of user behavior and increasing the realism of the results.
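The idea of discounting gain by elapsed time, and of reporting a per-query distribution rather than a single number, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the exponential decay with a half-life, the exponentially distributed time per document, and the function names `decay`, `time_biased_gain` and `simulate_gain_distribution` are all hypothetical.

```python
import math
import random


def decay(t, half_life):
    """Assumed discount: the value of a result halves every `half_life` seconds."""
    return math.exp(-t * math.log(2) / half_life)


def time_biased_gain(gains, arrival_times, half_life):
    """Sum each document's gain, discounted by the time at which the user reaches it."""
    return sum(g * decay(t, half_life) for g, t in zip(gains, arrival_times))


def simulate_gain_distribution(gains, mean_seconds_per_doc, half_life,
                               n_users=1000, seed=0):
    """Simulate users with varying reading speed; return one gain value per user.

    Assumption: the time spent on each document is exponentially distributed
    with the given mean. A real instantiation would calibrate this on user data.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_users):
        elapsed = 0.0
        arrivals = []
        for _ in gains:
            arrivals.append(elapsed)  # time at which this rank is reached
            elapsed += rng.expovariate(1.0 / mean_seconds_per_doc)
        results.append(time_biased_gain(gains, arrivals, half_life))
    return results


# One query: relevant documents at ranks 1 and 3 (gain 1), rank 2 is non-relevant.
distribution = simulate_gain_distribution([1.0, 0.0, 1.0],
                                          mean_seconds_per_doc=5.0,
                                          half_life=10.0, n_users=200)
print(min(distribution), sum(distribution) / len(distribution), max(distribution))
```

The output is a spread of values for a single query, one per simulated user, rather than one score. Slower simulated users reach the third document later and therefore collect less gain from it.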
If the goal of a search engine is to make the user happy, then it matters a great deal who that user is. Over the last decade, we have been working with lawyers to learn what it takes to make them happy when they are searching for evidence in civil litigation, a problem known as e-discovery. Unlike Web searchers, lawyers routinely seek high recall, which drives us to think a bit differently about evaluation. Unlike many other search problems with which we have experience, in e-discovery lawyers have both things they want to find and things they want not to be found (so-called “privileged content”). That too drives us to think a bit differently about evaluation. Unlike some of our users, lawyers are not content just to know that we have made our best effort – they want to know just how well we have done. That also drives us to think a bit differently about evaluation. Unlike many of our users, lawyers are sometimes willing to invest hundreds of hours in a single search. Even that makes us think a little bit differently about evaluation. We still don’t understand lawyers completely, but we now do know enough about evaluation to be able to tell when we have made them happy. In this talk I will tell you what we have learned about that.
The early years of research on semantic annotation focused almost exclusively on the “supply” end: how to turn unstructured information into machine-readable, “semantically” enriched data by relying on modern Web languages, user tagging and annotation, emerging robust NLP tools, and an ever-growing volume of linked data. Recent years have seen an increasing focus on the “demand” end: what types of annotation and structure are actually useful, and for what sort of users, applications and use cases? This talk will give an overview of some of the current developments and directions, and some of the challenges and barriers to success ahead.
The ubiquitous use of social media is presenting interesting opportunities and challenges. Users express their feelings and opinions towards different topics and issues, allowing for interesting social studies pertaining to people’s attitudes, feelings, and behaviors. Many recent studies have looked at applying retrieval, classification, and quantification methods to enable large-scale social studies. Such studies can overcome the need for laborious surveys and field studies. This talk will show some examples of such studies that highlight the use of retrieval and classification techniques to learn user leanings and to conduct predictive studies. The talk will also address some of the underlying technologies and challenges involved in this work, including NLP on informal language, online classification, automatic discovery of polarization, geo-tagging, and information verification.
In order to improve existing information retrieval systems, there is a need to take into account the semantic information conveyed in the texts. In this talk, different approaches to this problem will be discussed, aiming to identify the main issues and some possible solutions. A special focus will be placed on a deep linguistic approach developed in the Computer Science Department of the University of Évora, Portugal. In this approach, sentences are parsed and represented by Discourse Representation Structures (DRS). These structures are then transformed into graphs, and distance metrics between these graphs are calculated. The overall idea behind this approach is that graph distance metrics are a good way of modelling the semantic distance between sentences. This approach has already been applied with promising results to several NLP tasks, such as text IR systems, text classification, and sentence similarity, and some of the results obtained will be presented in the talk.
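As a rough illustration of using a graph distance as a proxy for sentence distance, here is a small Python sketch. It is not the Évora method: the graphs are written by hand rather than derived from a DRS parser, the edge labels (`instance`, `agent`) are made up, and the Jaccard distance over labelled edges is simply one easy-to-read metric chosen for the example. The function name `edge_jaccard_distance` is hypothetical.

```python
def edge_jaccard_distance(graph_a, graph_b):
    """Jaccard distance between two graphs given as sets of labelled edges.

    Each edge is a (source, label, target) triple. The result is 0.0 for
    identical edge sets and 1.0 for edge sets with nothing in common.
    """
    union = graph_a | graph_b
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(graph_a & graph_b) / len(union)


# Hand-built, simplified graphs standing in for the structures of two sentences.
# "A man sleeps."
sleeps = {
    ("x", "instance", "man"),
    ("e", "instance", "sleep"),
    ("e", "agent", "x"),
}
# "A man walks."
walks = {
    ("x", "instance", "man"),
    ("e", "instance", "walk"),
    ("e", "agent", "x"),
}

print(edge_jaccard_distance(sleeps, walks))
```

The two sentences share the participant and the agent relation and differ only in the event, so two of the four distinct edges overlap. A metric used in practice would also need to align variables across sentences, which this sketch sidesteps by reusing the same variable names.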
In this talk, we will look at what it means to be doing Big Data research in industry. We will ask why Big Data is critical to the success of companies, and also when it is not needed. We will look at a selection of problems from the e-commerce space that are well served by standard machine learning constructs. We will take a breadth-first view of around a dozen problems from e-commerce, some of which are applicable to the larger area of consumer internet. Time permitting, we will also discuss the emerging field of deep learning (aka deep neural networks) and its applicability to Big Data.
Modern organizations collect many different kinds of data about the activities, work and events related to their employees. HR analytics attempts to discover novel and actionable insights and patterns from such employee-related historical databases and document repositories, in order to solve HR-specific problems and meet HR business goals. In this talk I will give a short overview of some of the research initiatives in TCS related to HR analytics.