Publications

Use of machine learning and infrared spectra for rheological characterization and application to the apricot

Xavier F. Cadet · Ophélie Lo-Thong · Sylvie Bureau · Réda Dehak · Miloud Bessafi

State-of-the-art speaker recognition for telephone and video speech: The JHU-MIT submission for NIST SRE18

Jesús Villalba · Nanxin Chen · David Snyder · Daniel Garcia-Romero · Alan McCree · Gregory Sell · Jonas Borgstrom · Fred Richardson · Suwon Shon · François Grondin · Réda Dehak · Leibny Paola García-Perera · Daniel Povey · Pedro A. Torres-Carrasquillo · Sanjeev Khudanpur · Najim Dehak

Taking into account inclusion and adjacency information in morphological hierarchical representations, with application to the extraction of text in natural images and videos

Lê Duy Huỳnh

The inclusion and adjacency relationships between image regions usually carry contextual information. The latter is widely used, since it tells how regions are arranged in images. The former is usually not taken into account, although it parallels the object-background relationship. The mathematical morphology framework provides several hierarchical image representations. They include the Tree of Shapes (ToS), which encodes the inclusion of level lines, and the hierarchies of segmentation (e.g., alpha-tree, BPT), which are useful in the analysis of the adjacency relationship. In this work, we take advantage of both inclusion and adjacency information in these representations for computer vision applications. We introduce the spatial alignment graph w.r.t. inclusion, constructed by adding a new adjacency relationship to the nodes of the ToS. In a simple ToS, such as our Tree of Shapes of Laplacian sign, which encodes the inclusion of Morphological Laplacian 0-crossings, this graph reduces to a disconnected graph in which each connected component is a semantic group. In other cases, e.g., the classic ToS, the spatial alignment graph is more complex. To address this issue, we extend shape-spaces morphology. Our extension has two primary results: 1) it allows the manipulation of any graph of shapes; 2) it allows any tree filtering strategy proposed by the connected operators framework. With this extension, the spatial alignment graph can be analyzed with the help of an alpha-tree. We demonstrate our method on the application of text detection. Experimental results show the efficiency and effectiveness of our method, which makes it appealing for mobile applications.
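As a rough illustration of how a spatial alignment relation yields semantic groups as connected components, here is a toy Python sketch. The `aligned` predicate over bounding boxes is hypothetical (the thesis works on shapes from the ToS, not raw boxes); the grouping step is plain union-find:

```python
def aligned(a, b):
    # Hypothetical alignment predicate: boxes of similar height whose
    # vertical spans overlap (a crude stand-in for text-line alignment).
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = a, b
    overlap = min(ay1, by1) - max(ay0, by0)
    return overlap > 0 and abs((ay1 - ay0) - (by1 - by0)) <= 2

def components(shapes):
    """Group shapes into connected components of the alignment relation;
    each component is one candidate semantic group (e.g., a text line)."""
    parent = list(range(len(shapes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(shapes)):
        for j in range(i + 1, len(shapes)):
            if aligned(shapes[i], shapes[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(shapes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

For instance, three boxes sitting on one line form a single group while an isolated box elsewhere forms its own.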

High throughput automated detection of axial malformations in fish embryo

Diane Genest · Élodie Puybareau · Jean Cousty · Marc Léonard · Hugues Talbot · Noémie De Croze

Fish embryo models are widely used as screening tools to assess the efficacy and/or toxicity of chemicals. This assessment involves analysing embryo morphological abnormalities. In this article, we propose a multi-scale pipeline for the automated classification of fish embryos (Medaka: Oryzias latipes) based on the presence or absence of spine malformations. The proposed pipeline relies on the acquisition of 2D images of fish embryos, on feature extraction based on mathematical morphology operators, and on machine learning classification. After image acquisition, segmentation tools are used to focus on the embryo before several morphological features are analysed. A machine learning approach is then applied to these features to automatically classify embryos according to the detection of axial malformations. We built and validated our learning model on 1,459 images with a 10-fold cross-validation, by comparison with the gold standard of 3D observations performed under a microscope by a trained operator. Our pipeline yields correct classification in 85% of the cases included in the database, which is similar to the success rate of a trained human operator working on 2D images. Indeed, most of the errors are due to the inherent limitations of 2D images compared to 3D observations. The key benefit of our approach is the low computational cost of the image analysis pipeline, which makes high-throughput analysis possible.
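The evaluation protocol can be sketched as follows, with synthetic data and a trivial one-feature threshold classifier standing in for the paper's morphological features and learning model (illustrative only, pure standard library):

```python
import random

random.seed(0)

# Synthetic stand-in data: one hypothetical "axial curvature" score per
# embryo; label 0 = normal, 1 = malformed. The real pipeline extracts
# several morphological features from the segmented embryo.
data = ([(random.gauss(0.2, 0.1), 0) for _ in range(100)]
        + [(random.gauss(0.8, 0.1), 1) for _ in range(100)])
random.shuffle(data)

def train(samples):
    """Fit a trivial threshold classifier: midpoint of the two class means."""
    m0 = sum(x for x, y in samples if y == 0) / sum(1 for _, y in samples if y == 0)
    m1 = sum(x for x, y in samples if y == 1) / sum(1 for _, y in samples if y == 1)
    return (m0 + m1) / 2

# 10-fold cross-validation, mirroring the paper's evaluation protocol:
# train on 9 folds, measure accuracy on the held-out fold.
k, fold = 10, len(data) // 10
accuracies = []
for i in range(k):
    held_out = data[i * fold:(i + 1) * fold]
    train_set = data[:i * fold] + data[(i + 1) * fold:]
    threshold = train(train_set)
    correct = sum(1 for x, y in held_out if (x >= threshold) == (y == 1))
    accuracies.append(correct / len(held_out))

mean_accuracy = sum(accuracies) / k
```

The per-fold accuracies are then averaged, exactly as the 85% figure in the paper is obtained over its 10 folds.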

Representing and computing with types in dynamically typed languages

Jim Newton

In this report, we present code generation techniques related to run-time type checking of heterogeneous sequences. Traditional regular expressions can be used to recognize well-defined sets of character strings called rational languages, or sometimes regular languages. Newton et al. present an extension whereby a dynamic programming language may recognize a well-defined set of heterogeneous sequences, such as lists and vectors. As with the analogous string-matching regular expression theory, matching these regular type expressions can also be achieved by using a finite state machine (a deterministic finite automaton, DFA). Constructing such a DFA can be time consuming. The approach we chose uses meta-programming to intervene at compile time, generating efficient functions specific to each DFA, and allowing the compiler to further optimize the functions if possible. The functions are made available for use at run-time. Without this use of meta-programming, the program might otherwise be forced to construct the DFA at run-time; the excessively high cost of such a construction would likely far outweigh the time needed to match a string against the expression. Our technique hooks into the Common Lisp type system via the DEFTYPE macro. The first time the compiler encounters a relevant type specifier, the appropriate DFA is created, which may be an Ω(2^n) operation, and from it specific low-level code is generated to match that specific expression. Thereafter, when the type specifier is encountered again, the same pre-generated function can be used. The generated code has Θ(n) complexity at run-time. A complication of this approach, which we explain in this report, is that to build the DFA we must calculate a disjoint type decomposition, which is time consuming and also leads to sub-optimal use of TYPECASE in machine-generated code. To handle this complication, we use our own macro OPTIMIZED-TYPECASE in our machine-generated code. Uses of this macro are also implicitly expanded at compile time. Our macro expansion uses BDDs (Binary Decision Diagrams) to compile OPTIMIZED-TYPECASE into low-level code, maintaining the TYPECASE semantics but eliminating redundant type checks. In the report we also describe an extension of BDDs to accommodate subtyping in the Common Lisp type system, as well as an in-depth analysis of worst-case BDD sizes.
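The redundant-check elimination behind OPTIMIZED-TYPECASE can be illustrated outside Common Lisp. In this hypothetical Python sketch (not the report's implementation), a naive clause-by-clause dispatch re-tests a predicate already known to be false, while the decision-diagram form tests each type predicate at most once:

```python
def naive_dispatch(x):
    """Naive typecase over clauses ((or int str) -> "A", int -> "B")."""
    # clause 1: (or int str)
    if isinstance(x, int) or isinstance(x, str):
        return "A"
    # clause 2: int -- re-tests isinstance(x, int), already known to be False
    if isinstance(x, int):
        return "B"
    return "default"

def bdd_dispatch(x):
    """Decision-diagram form: each predicate evaluated at most once."""
    if isinstance(x, int):   # test int exactly once
        return "A"
    if isinstance(x, str):   # test str exactly once
        return "A"
    return "default"         # clause 2 is unreachable, pruned away entirely
```

Both functions compute the same result, but the second never repeats a type check, which is the semantics-preserving optimization the BDD expansion performs.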

Recognizing heterogeneous sequences by rational type expression

Jim Newton · Didier Verna

We summarize a technique for writing functions that recognize types of heterogeneous sequences in Common Lisp. The technique employs sequence recognition functions, generated at compile time and evaluated at run-time. It extends the Common Lisp type system, exploiting the theory of rational languages, Binary Decision Diagrams, and the Turing-complete macro facility of Common Lisp. The resulting system uses meta-programming to move an exponential-complexity operation from run-time to compile-time, leaving a highly optimized linear-complexity operation for run-time.
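A minimal sketch of the idea, in Python rather than Common Lisp, with a single hard-coded expression standing in for the general automaton construction: the costly DFA build is cached on first use, so every subsequent match of a rational type expression runs in linear time over the sequence.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_dfa(expr):
    """Stand-in for the expensive compile-time DFA construction.
    Only the hypothetical expression "(int str)*" (alternating ints and
    strings, starting with an int) is supported in this sketch.
    States: 0 = expect int (accepting), 1 = expect str."""
    assert expr == "(int str)*"
    transitions = {0: {int: 1}, 1: {str: 0}}
    accepting = {0}
    return transitions, accepting

def matches(expr, seq):
    """Linear-time recognition of a heterogeneous sequence against a
    rational type expression, using the cached DFA."""
    transitions, accepting = compile_dfa(expr)  # built once, then cached
    state = 0
    for item in seq:
        state = transitions[state].get(type(item))
        if state is None:
            return False
    return state in accepting
```

The `lru_cache` plays the role of the compile-time memoization: the exponential construction happens once per expression, while `matches` itself stays linear.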

The tree of shapes turned into a max-tree: A simple and efficient linear algorithm

Edwin Carlinet · Thierry Géraud · Sébastien Crozet

The Tree of Shapes (ToS) is a morphological tree-based representation of an image encoding the inclusion of its level lines. It features many invariances to image changes, which makes it well-suited to many applications in image processing and pattern recognition. In this paper, we propose a way of turning the ToS computation into a Max-Tree computation. The latter has been widely studied, and many efficient algorithms (including parallel ones) have been developed. Furthermore, we develop a specific optimization to speed up the common 2D case. The result is a simple and efficient algorithm, running in linear time with a low memory footprint, that outperforms other current algorithms. For Reproducible Research purposes, we distribute our code as free software.
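The union-find scheme underlying efficient Max-Tree algorithms can be sketched on a 1D signal (illustrative Python, not the paper's optimized 2D implementation):

```python
def max_tree_1d(values):
    """Union-find Max-Tree construction for a 1D image (neighbors: left/right).
    Returns a parent array: parent[i] == i marks the root; otherwise the
    parent pixel has a gray level <= values[i] (the Max-Tree property).
    Sketch of the classical scheme, without the canonicalization pass."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # increasing gray level
    parent = [-1] * n
    zpar = [-1] * n                                    # union-find parents

    def find(x):
        while zpar[x] != x:
            zpar[x] = zpar[zpar[x]]                    # path halving
            x = zpar[x]
        return x

    for i in reversed(order):                          # decreasing gray level
        parent[i] = zpar[i] = i
        for j in (i - 1, i + 1):                       # visit processed neighbors
            if 0 <= j < n and zpar[j] != -1:
                r = find(j)
                if r != i:
                    parent[r] = i                      # attach component under i
                    zpar[r] = i
    return parent
```

On the profile [1, 3, 2, 3, 1], the two peaks (value 3) become separate components that merge at level 2, which in turn hangs below the level-1 root, mirroring the threshold decomposition of the signal.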

Real-time document detection in smartphone videos

Élodie Puybareau · Thierry Géraud

Smartphones are increasingly used to capture photos of all kinds of important documents in many different situations, yielding new image processing needs. One of these is the ability to detect documents in real time in a smartphone's video stream while being robust to classical defects such as low contrast, fuzzy images, flares, shadows, etc. This feature is useful to help users capture their documents in the best conditions and to guide the capture (evaluating appropriate distance, centering, and tilt). In this paper we propose a solution to detect documents in real time, making very few assumptions about their content and background. The method is based on morphological operators, which contrasts with classical line detectors or gradient-based thresholds. The use of such invariant operators makes our method robust to the defects encountered in video streams and suitable for real-time document detection on smartphones.
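One basic morphological operator of this kind is the gradient (dilation minus erosion), which responds on region boundaries regardless of contrast polarity. A minimal NumPy sketch (illustrative only, not the paper's pipeline):

```python
import numpy as np

def dilate3x3(img):
    """Gray-level dilation by a 3x3 square (edge padding)."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    return np.max(np.stack([p[i:i + h, j:j + w]
                            for i in range(3) for j in range(3)]), axis=0)

def erode3x3(img):
    """Gray-level erosion by a 3x3 square (edge padding)."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    return np.min(np.stack([p[i:i + h, j:j + w]
                            for i in range(3) for j in range(3)]), axis=0)

def morphological_gradient(img):
    """Dilation minus erosion: fires on region boundaries only,
    and stays zero in flat areas, even under weak contrast."""
    return dilate3x3(img) - erode3x3(img)

# A low-contrast "document" (bright rectangle) on a dark background:
page = np.zeros((10, 10))
page[3:7, 3:7] = 0.3
grad = morphological_gradient(page)
```

Because the gradient is contrast-invariant in location (it peaks on the border wherever a level change occurs), it is better suited than a fixed global threshold when lighting varies across video frames.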

Segmentation of gliomas and prediction of patient overall survival: A simple and fast procedure

Élodie Puybareau · Guillaume Tochon · Joseph Chazalon · Jonathan Fabrizio

In this paper, we propose a fast automatic method that segments gliomas without any manual assistance, using a fully convolutional network (FCN) and transfer learning. From this segmentation, we predict the patient's overall survival using only the results of the segmentation and a home-made atlas. The FCN is the base network of VGG-16, pretrained on ImageNet for natural image classification and fine-tuned on the training dataset of the MICCAI 2018 BraTS Challenge. It relies on the "pseudo-3D" method published at ICIP 2017, which allows segmenting objects from 2D color images that carry the 3D information of MRI volumes. For each nth slice of the volume to segment, we consider three images, corresponding to the (n-1)th, nth, and (n+1)th slices of the original volume. These three gray-level 2D images are assembled to form a 2D RGB color image (one image per channel). This image is the input of the FCN, which produces a 2D segmentation of the nth slice. We process all slices, then stack the results to form the 3D output segmentation. With such a technique, the segmentation of a 3D volume takes only a few seconds. The prediction is based on Random Forests and has the advantage of not depending on the acquisition modality, making it robust to inter-database variability.
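The slice-assembly step of the pseudo-3D method can be sketched in a few lines of NumPy (illustrative; clamping at the volume borders is one possible convention, not necessarily the paper's):

```python
import numpy as np

def pseudo_3d_input(volume, n):
    """Assemble the 2D 'RGB' network input for slice n from the gray-level
    slices n-1, n and n+1 of a 3D volume (depth x height x width).
    Border slices are clamped (an assumption of this sketch)."""
    depth = volume.shape[0]
    idx = (max(n - 1, 0), n, min(n + 1, depth - 1))
    # One neighboring slice per color channel -> height x width x 3 image.
    return np.stack([volume[i] for i in idx], axis=-1)
```

Running this for every n and stacking the per-slice 2D segmentations reconstructs the 3D output volume described above.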