LRE - Laboratoire de Recherche de l'ÉPITA

A benchmark for toxic comment classification on civil comments dataset

Corentin Duchêne · Henri Jamet · Pierre Guillaume · Réda Dehak

Actes de l’atelier gestion et analyse des données spatiales et temporelles

Nidà Meddouri · Aurélie Leborgne · Loic Salmon

Peripheral nervous system responses to food stimuli: Analysis using data science approaches

Maelle Moranges · Marc Plantevit · Moustafa Bensafi

In the field of food, as in other fields, measuring emotional responses to foods and their sensory properties is a major challenge. In the present protocol, we propose a step-by-step procedure that allows a physiological description of odors, aromas, and their hedonic properties. The method, rooted in subgroup discovery, belongs to the field of data science, and more specifically data mining. It is still rarely used in the field of food and relies on descriptive modeling of emotions from human physiological responses.

CosySEL: Improving SAT solving using local symmetries

S. Saouli · Souheib Baarir · C. Dutheillet · J. Devriendt

Many satisfiability problems exhibit symmetry properties. Thus, the development of symmetry exploitation techniques seems a natural way to try to improve the efficiency of solvers by preventing them from exploring isomorphic parts of the search space. These techniques can be classified into two categories: dynamic and static symmetry breaking. Static approaches have often appeared to be more effective than dynamic ones. But although these approaches can be considered complementary, very few works have tried to combine them. In this paper, we present a new tool, CosySEL, that implements a composition of the static Effective Symmetry Breaking Predicates (ESBP) technique with the dynamic Symmetric Explanation Learning (SEL). ESBP exploits symmetries to prune the search tree and SEL uses symmetries to speed up the tree traversal. These two accelerations are complementary and their combination was made possible by the introduction of local symmetries. We conduct our experiments on instances drawn from the last ten SAT competitions and the results show that our tool outperforms the existing tools on highly symmetrical problems.

Heart segmentation and evaluation of fibrosis

Zhou Zhao

Introduction to the special issue on distributed hybrid systems

Alessandro Abate · Uli Fahrenberg · Martin Fränzle

This special issue contains seven papers within the broad subject of Distributed Hybrid Systems, that is, systems combining hybrid discrete-continuous state spaces with elements of concurrency and logical or spatial distribution. It follows up on several workshops on the same theme which were held between 2017 and 2019 and organized by the editors of this volume. The first of these workshops was held in Aalborg, Denmark, in August 2017 and associated with the MFCS conference. It featured invited talks by Alessandro Abate, Martin Fränzle, Kim G. Larsen, Martin Raussen, and Rafael Wisniewski. The second workshop was held in Palaiseau, France, in July 2018, with invited talks by Luc Jaulin, Thao Dang, Lisbeth Fajstrup, Emmanuel Ledinot, and André Platzer. The third workshop was held in Amsterdam, The Netherlands, in August 2019, associated with the CONCUR conference. It featured a special theme on distributed robotics and had invited talks by Majid Zamani, Hervé de Forges, and Xavier Urbain. The vision and purpose of the DHS workshops was to connect researchers working in real-time systems, hybrid systems, control theory, formal verification, distributed computing, and concurrency theory, in order to advance the subject of distributed hybrid systems. Such systems are abundant and often safety-critical, but ensuring their correct functioning can in general be challenging. The investigation of their dynamics by analysis tools from the aforementioned domains remains fragmentary, providing the rationale behind the workshops: it was conceived that convergence and interaction of theories, methods, and tools from these different areas was needed in order to advance the subject.

Learning diversity attributes in multi-session recommendations

Nassim Bouarour · Idir Benouaret · Sihem Amer-Yahia

Diversity in recommendation has been studied extensively. It has been shown that maximizing diversity subject to constrained relevance yields high user engagement over time. Existing work largely relies on setting some attributes that are used to craft an item similarity function and diversify results. In this paper, we examine the question of learning diversity attributes. That is particularly important when users receive recommendations over multiple sessions. We devise two main approaches to look for the best diversity attribute in each session: the first is a generalization of traditional diversity algorithms and the second is based on reinforcement learning. We implement both approaches and run extensive experiments on a semi-synthetic dataset. Our results demonstrate that learning diversity attributes yields a higher overall diversity than traditional diversity algorithms. We also find that training policies using reinforcement learning is more efficient in terms of response time, in particular for high dimensional data.
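
To make the role of the diversity attribute concrete, the following is a minimal sketch of a traditional greedy (max-min) diversification step under a relevance constraint. It is not the authors' algorithm: the function names, the 0/1 attribute distance, the relevance threshold, and the toy items are all assumptions chosen for illustration.

```python
def attribute_distance(a, b, attribute):
    """Hypothetical 0/1 distance: do two items differ on the chosen attribute?"""
    return 0.0 if a[attribute] == b[attribute] else 1.0


def diversify(items, k, attribute, min_relevance):
    """Greedily pick up to k items with relevance >= min_relevance,
    preferring items that differ from those already picked on `attribute`."""
    candidates = [it for it in items if it["relevance"] >= min_relevance]
    # Sorting by relevance means ties in diversity go to the more relevant item.
    candidates.sort(key=lambda it: it["relevance"], reverse=True)
    selected = []
    while candidates and len(selected) < k:
        # Max-min rule: score = distance to the closest already-selected item.
        best = max(
            candidates,
            key=lambda it: min(
                (attribute_distance(it, s, attribute) for s in selected),
                default=1.0,
            ),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy catalogue (invented data, for illustration only).
items = [
    {"title": "A", "genre": "jazz", "year": 1990, "relevance": 0.9},
    {"title": "B", "genre": "jazz", "year": 2005, "relevance": 0.8},
    {"title": "C", "genre": "rock", "year": 1990, "relevance": 0.7},
    {"title": "D", "genre": "pop", "year": 2005, "relevance": 0.2},
]

by_genre = [it["title"] for it in diversify(items, 2, "genre", 0.5)]
by_year = [it["title"] for it in diversify(items, 2, "year", 0.5)]
print(by_genre, by_year)
```

On this toy input, diversifying on `genre` selects A and C, while diversifying on `year` selects A and B. The selected list therefore depends on which attribute is fixed, which is the choice the abstract proposes to learn per session.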

Trie-based output itemset sampling

Lamine Diop · Cheikh Talibouya Diop · Arnaud Giacometti · Dominique Li · Arnaud Soulet

Pattern sampling algorithms produce interesting patterns with a probability proportional to a given utility measure. Utility changes need quick re-preprocessing when sampling patterns from large databases. In this context, existing sampling techniques require storing all data in memory, which is costly. To tackle these issues, this work enriches D. Knuth’s trie structure, avoiding 1) the need to access the database to sample since patterns are drawn directly from the enriched trie and 2) the necessity to reprocess the whole dataset when the utility changes. We define the trie of occurrences that our first algorithm TPSpace (Trie-based Pattern Space) uses to materialize all of the database patterns. Factorizing transaction prefixes compresses the transactional database. TPSampling (Trie-based Pattern Sampling), our second algorithm, draws patterns from a trie of occurrences under a length-based utility measure. Experiments show that TPSampling produces thousands of patterns in seconds.
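
The claim that factorizing transaction prefixes compresses the database can be illustrated with a plain prefix trie. The sketch below is not TPSpace or the authors' trie of occurrences; the `TrieNode` class, the helper names, the sorted item order, and the toy transactions are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One item on a shared transaction prefix (hypothetical structure)."""
    count: int = 0  # number of transactions passing through this node
    children: dict = field(default_factory=dict)


def build_prefix_trie(transactions):
    """Insert each transaction, with its items sorted, into a prefix trie."""
    root = TrieNode()
    for transaction in transactions:
        node = root
        node.count += 1
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, TrieNode())
            node.count += 1
    return root


def count_nodes(node):
    """Number of item nodes stored below `node` (the root is excluded)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Toy database (invented data, for illustration only).
transactions = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b"],
    ["b", "c"],
]

trie = build_prefix_trie(transactions)
stored_items = sum(len(set(t)) for t in transactions)
print(stored_items, count_nodes(trie))  # prints "10 6"
```

Here the four transactions hold 10 items in total, but the trie needs only 6 item nodes because the prefix `a, b` is stored once and shared by three transactions. The per-node counts keep the information needed to recover how many transactions contain each prefix.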

The cost of dynamism in static languages for image processing

Baptiste Esteban · Edwin Carlinet · Guillaume Tochon · Didier Verna

Generic programming is a powerful paradigm abstracting data structures and algorithms to improve their reusability, as long as they respect a given interface. Coupled with a performance-driven language, it is a paradigm of choice for scientific libraries where the implementation of manipulated objects may change depending on their use case, or for performance purposes. In those performance-driven languages, genericity is often implemented statically to perform some optimization. This does not fit well with the dynamism needed to handle objects which may only be known at runtime. Thus, in this article, we evaluate a model that couples static genericity with a dynamic model based on type erasure in the context of image processing. Its cost is assessed by comparing the performance of the implementation of some common image processing algorithms in C++ and Rust, two performance-driven languages supporting some form of genericity. Finally, we demonstrate that compile-time knowledge of some specific information is critical for performance, and also that the runtime overhead depends on the algorithmic scheme in use.

Higher-dimensional timed and hybrid automata

Uli Fahrenberg

We introduce a new formalism of higher-dimensional timed automata, based on Pratt and van Glabbeek’s higher-dimensional automata and Alur and Dill’s timed automata. We prove that their reachability is PSPACE-complete and can be decided using zone-based algorithms. We also extend the setting to higher-dimensional hybrid automata. The interest of our formalism is in modeling systems which exhibit both real-time behavior and concurrency. Other existing formalisms for real-time modeling identify concurrency and interleaving, which, as we shall argue, is problematic.