Publications

Type-checking of heterogeneous sequences in Common Lisp

Jim Newton · Akim Demaille · Didier Verna

We introduce the abstract concept of rational type expression and show its relationship to rational language theory. We further present a concrete syntax, regular type expression, and a Common Lisp implementation thereof which allows the programmer to declaratively express the types of heterogeneous sequences in a way which is natural in the Common Lisp language. The implementation uses techniques well known and well founded in rational language theory, in particular the use of the Brzozowski derivative and deterministic automata to reach a solution which can match a sequence in linear time. We illustrate the concept with several motivating examples, and finally explain many details of its implementation.
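The role of the Brzozowski derivative in matching can be illustrated with a minimal sketch. It is written in Python rather than Common Lisp, and every name in it (`typ`, `cat`, `alt`, `star`, `derive`, `matches`) is hypothetical and not taken from the authors' implementation. It also differs from the approach described above in one important way: it derives the expression with respect to each element at run time, instead of compiling the expression to a deterministic automaton ahead of time, so it does not carry the linear-time guarantee.

```python
# Regular type expressions as nested tuples (an assumed, illustrative encoding).
EMPTY = ("empty",)  # matches no sequence at all
EPS = ("eps",)      # matches only the empty sequence


def typ(t):
    """Match exactly one element that is an instance of the Python type t."""
    return ("type", t)


def cat(a, b):
    """Concatenation, with the usual simplifications."""
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)


def alt(a, b):
    """Alternation, with the usual simplifications."""
    if a == EMPTY:
        return b
    if b == EMPTY or a == b:
        return a
    return ("or", a, b)


def star(a):
    """Zero or more repetitions."""
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)


def nullable(r):
    """True if r matches the empty sequence."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "type"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # tag == "or"


def derive(r, x):
    """Brzozowski derivative of r with respect to the element x."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "type":
        return EPS if isinstance(x, r[1]) else EMPTY
    if tag == "or":
        return alt(derive(r[1], x), derive(r[2], x))
    if tag == "star":
        return cat(derive(r[1], x), r)
    # tag == "cat"
    head = cat(derive(r[1], x), r[2])
    return alt(head, derive(r[2], x)) if nullable(r[1]) else head


def matches(r, seq):
    """Derive by each element in turn; accept if the residual is nullable."""
    for x in seq:
        r = derive(r, x)
        if r == EMPTY:
            return False
    return nullable(r)


# A sequence of alternating strings and integers, e.g. a property-list-like layout.
pairs = star(cat(typ(str), typ(int)))
```

With this encoding, `matches(pairs, ["a", 1, "b", 2])` accepts, while `matches(pairs, ["a", 1, "b"])` rejects because the final string is not followed by an integer.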

Towards the rectification of highly distorted texts

Stefania Calarasanu · Séverine Dubuisson · Jonathan Fabrizio

A frequent challenge for many Text Understanding Systems is to tackle the variety of text characteristics in born-digital and natural scene images to which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images, but despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. Indeed, most OCRs are not designed to handle text strings in perspective but rather expect horizontal texts in a parallel-frontal plane to provide a correct transcription. In this paper, we propose a rectification procedure that can correct highly distorted texts, subject to rotation, shearing and perspective deformations. The method is based on an accurate estimation of the quadrangle bounding the deformed text in order to compute a homography to transform this quadrangle (and its content) into a horizontal rectangle. The rectification is validated on the dataset proposed during the ICDAR 2015 Competition on Scene Text Rectification.
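The homography step can be sketched as follows, assuming the four corners of the bounding quadrangle are already known; how the paper estimates that quadrangle is not covered here. The function names (`solve_linear`, `homography`, `apply_homography`) and the sample coordinates are hypothetical. The sketch fixes the bottom-right entry of the matrix to 1 and solves the resulting 8x8 linear system with plain Gauss-Jordan elimination, which is one standard way to obtain a homography from four point correspondences, not necessarily the one used by the authors.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix mapping four source points onto four destination points.

    Each correspondence (x, y) -> (u, v) contributes two linear equations in
    the eight unknown entries; the ninth entry is fixed to 1.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a point through h, dividing by the projective coordinate."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


# Hypothetical corners of a distorted text region, listed clockwise from the
# top-left, and the horizontal rectangle they should be mapped onto.
quad = [(10.0, 20.0), (110.0, 40.0), (100.0, 90.0), (5.0, 60.0)]
rect = [(0.0, 0.0), (200.0, 0.0), (200.0, 50.0), (0.0, 50.0)]
H = homography(quad, rect)
```

Rectifying an actual image would additionally require resampling every pixel of the target rectangle through the inverse mapping, which is left out of this sketch.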

What is a good evaluation protocol for text localization systems? Concerns, arguments, comparisons and solutions

Stefania Calarasanu · Jonathan Fabrizio · Séverine Dubuisson

A trustworthy protocol is essential for evaluating a text detection algorithm in order, first, to measure its efficiency and adjust its parameters and, second, to compare its performance with that of other algorithms. However, current protocols do not give precise enough evaluations because they use coarse evaluation metrics and deal with inconsistent matchings between the output of detection algorithms and the ground truth, both often limited to rectangular shapes. In this paper, we propose a new evaluation protocol, named EvaLTex, that solves some of the current problems associated with classical metrics and matching strategies. Our system deals with different kinds of annotations and detection shapes. It also considers different kinds of granularity between detections and ground truth objects and hence provides more realistic and accurate evaluation measures. We use this protocol to evaluate text detection algorithms and highlight some key examples that show that the provided scores are more relevant than those of currently used evaluation protocols.

TextCatcher: A method to detect curved and challenging text in natural scenes

Jonathan Fabrizio · Myriam Robert-Seidowsky · Séverine Dubuisson · Stefania Calarasanu · Raphaël Boissel

In this paper, we propose a text detection algorithm which is hybrid and multi-scale. First, it relies on a connected component-based approach: after the segmentation of the image, a classification step using a new wavelet descriptor spots the letters. A new graph model and its traversal procedure allow candidate text areas to be formed. Second, a texture-based approach discards the false positives. Finally, the detected text areas are precisely cut out and a new binarization step is introduced. The main advantage of our method is that it makes few assumptions. Thus, “challenging texts” such as multi-sized, multi-colored, multi-oriented or curved text can be localized. The efficiency of TextCatcher has been validated on three different datasets: two come from the ICDAR competition, and the third one contains photographs we have taken of various daily life texts. We present both qualitative and quantitative results.

Efficient dynamic type checking of heterogeneous sequences

Jim Newton

This report provides detailed background on our development of the rational type expression, its concrete syntax (the regular type expression), and a Common Lisp implementation which allows the programmer to declaratively express the types of heterogeneous sequences in a way which is natural in the Common Lisp language. We present a brief theoretical background in rational language theory, which facilitates the development of rational type expressions, in particular the use of the Brzozowski derivative and deterministic automata to arrive at a solution which can match a sequence in linear time. We illustrate the concept with several motivating examples, and finally explain many details of its implementation.

Improvement of a text detection chain and the proposition of a new evaluation protocol for text detection algorithms

Stefania Calarasanu

The objective of this thesis is twofold. On the one hand, it aims to propose a more accurate evaluation protocol for text detection systems that solves some of the existing problems in this area. On the other hand, it focuses on the design of a text rectification procedure used for the correction of highly deformed texts. Text detection systems have gained significant importance in recent years. The growing number of approaches proposed in the literature requires rigorous performance evaluation and ranking. In the context of text detection, an evaluation protocol relies on three elements: a reliable text reference, a set of matching rules deciding the relationship between the ground truth and the detections, and finally a set of metrics that produce intuitive scores. The few existing evaluation protocols often lack accuracy, either due to inconsistent matching procedures that provide unfair scores or due to unrepresentative metrics. Despite these issues, researchers continue to use these protocols to evaluate their work. In this Ph.D. thesis, we propose a new evaluation protocol for text detection algorithms that tackles most of the drawbacks faced by currently used evaluation methods. This work is focused on three main contributions: firstly, we introduce a complex text reference representation that does not constrain text detectors to adopt a specific detection granularity level or annotation representation; secondly, we propose a set of matching rules capable of evaluating any type of scenario that can occur between a text reference and a detection; and finally, we show how we can analyze a set of detection results, not only through a set of metrics, but also through an intuitive visual representation. We use this protocol to evaluate different text detectors and then compare the results with those provided by alternative evaluation methods.
A frequent challenge for many Text Understanding Systems is to tackle the variety of text characteristics in born-digital and natural scene images to which current OCRs are not well adapted. For example, texts in perspective are frequently present in real-world images because the camera capture angle is not normal to the plane containing text regions. Despite the ability of some detectors to accurately localize such text objects, the recognition stage fails most of the time. Indeed, most OCRs are not designed to handle text strings in perspective but rather expect horizontal texts in a parallel-frontal plane to provide a correct transcription. All these aspects, together with the proposal of a very challenging dataset, motivated us to propose a rectification procedure capable of correcting highly distorted texts.

MToS: A tree of shapes for multivariate images

Edwin Carlinet · Thierry Géraud

The Tree of Shapes (ToS) is a morphological tree that provides a high-level hierarchical representation of the image suitable for many image processing tasks. When dealing with color images, one cannot use the ToS because its definition is ill-formed on multivariate data. Common workarounds, such as marginal processing or imposing a total order on data, are not satisfactory and yield many problems (color artifacts, loss of invariances...). In this paper, we highlight the need for a self-dual and contrast invariant representation of the image and provide a method that builds a single ToS by merging the shapes computed marginally and preserving the most important properties of the ToS. This method does not try to impose an arbitrary total ordering on values but uses only the inclusion relationship between shapes, and the merging strategy works in a shape space. Finally, we show the relevance of our method and our structure through several applications involving color and multispectral image analysis.

SAT-based minimization of deterministic $\omega$-automata

Souheib Baarir · Alexandre Duret-Lutz

We describe a tool that takes as input a deterministic $\omega$-automaton with any acceptance condition, and synthesizes an equivalent $\omega$-automaton with another arbitrary acceptance condition and a given number of states, if such an automaton exists. This tool, which relies on a SAT-based encoding of the problem, can be used to provide minimal $\omega$-automata equivalent to given properties, for different acceptance conditions.

A tree of shapes for multivariate images

Edwin Carlinet

Nowadays, the demand for multi-scale and region-based analysis in many computer vision and pattern recognition applications is obvious. No one would consider a pixel-based approach as a good candidate to solve such problems. To meet this need, the Mathematical Morphology (MM) framework has supplied region-based hierarchical representations of images such as the Tree of Shapes (ToS). The ToS represents the image in terms of a tree of the inclusion of its level-lines. The ToS is thus self-dual and contrast-change invariant, which makes it well-adapted for high-level image processing. Yet, it is only defined on grayscale images, and most attempts to extend it to multivariate images, e.g. by imposing an “arbitrary” total ordering, are not satisfactory. In this dissertation, we present the Multivariate Tree of Shapes (MToS) as a novel approach to extend the grayscale ToS to multivariate images. This representation is a mix of the ToSs computed marginally on each channel of the image; it aims at merging the marginal shapes in a “sensible” way by preserving the maximum number of inclusions. The proposed method has theoretical foundations expressing the ToS in terms of a topographic map of the curvilinear total variation computed from the image border, which has allowed its extension to multivariate data. In addition, the MToS features properties similar to those of the grayscale ToS, the most important one being its invariance to any marginal change of contrast and any marginal inversion of contrast (a kind of “self-duality” in the multidimensional case). As the need for efficient image processing techniques is obvious given the ever larger amount of data to process, we propose an efficient algorithm that can build the MToS in quasi-linear time w.r.t. the number of pixels and quadratic time w.r.t. the number of channels. We also propose tree-based processing algorithms to demonstrate in practice that the MToS is a versatile, easy-to-use, and efficient structure.
Finally, to validate the soundness of our approach, we propose some experiments testing the robustness of the structure to non-relevant components (e.g. with noise or with low dynamics) and we show that such defects do not affect the overall structure of the MToS. In addition, we propose many real-world applications using the MToS. Many of them are just slight modifications of methods employing the “regular” ToS, adapted to our new structure. For example, we successfully use the MToS for image filtering, image simplification, image segmentation, image classification and object detection. From these applications, we show that the MToS generally outperforms its ToS-based counterpart, demonstrating the potential of our approach.

Morphological object picking based on the color tree of shapes

Edwin Carlinet · Thierry Géraud

The Tree of Shapes is a self-dual and contrast invariant morphological tree that provides a high-level hierarchical representation of images, suitable for many image processing tasks. Despite its power and simplicity, it is still under-exploited in pattern recognition and computer vision. In this paper, we show that both interactive and automatic image segmentation can be achieved with a few simple tree-processing steps. To that end, we rely on the recently defined “Color Tree of Shapes”. We propose a method for interactive segmentation that does not involve any statistical learning, yet yields results that compete with state-of-the-art approaches. We further extend this algorithm to unsupervised segmentation and give some results. Although they are preliminary, they highlight the potential of such an approach that works in the shape space.