Publications

Comparison between factor analysis and GMM support vector machines for speaker verification

Najim Dehak · Réda Dehak · Patrick Kenny · Pierre Dumouchel

We present a comparison between speaker verification systems based on factor analysis modeling and support vector machines using GMM supervectors as features. All systems used the same acoustic features and were trained and tested on the same data sets. We tested two types of kernel (one linear, the other non-linear) for the GMM support vector machines. The results show that factor analysis using speaker factors gives the best results on the core condition of the NIST 2006 speaker recognition evaluation. The difference is particularly marked on the English-language subset. Fusion of all systems gave an equal error rate of 4.2% (all trials) and 3.2% (English trials only).
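The score-level fusion mentioned above can be sketched as a small logistic-regression fuser. This is a minimal illustrative stand-in, not the evaluation's actual fusion setup: the gradient-descent loop, learning rate, and toy scores are all assumptions.

```python
import math

def train_fusion(scores, labels, lr=0.1, iters=2000):
    """Toy logistic-regression score fusion: learn per-system weights
    and a bias so that the fused score separates target trials (label 1)
    from impostor trials (label 0). `scores` holds one score vector per
    trial, with one entry per subsystem."""
    n_sys = len(scores[0])
    w = [0.0] * n_sys
    b = 0.0
    for _ in range(iters):
        gw = [0.0] * n_sys
        gb = 0.0
        for x, y in zip(scores, labels):
            # sigmoid of the current fused score
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - y
            for i in range(n_sys):
                gw[i] += err * x[i]
            gb += err
        # average-gradient descent step
        w = [wi - lr * gi / len(scores) for wi, gi in zip(w, gw)]
        b -= lr * gb / len(scores)
    return w, b

def fuse(x, w, b):
    """Fused score for one trial: a weighted sum of subsystem scores."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

In practice the fused scores are then thresholded to trade off false alarms against misses, which is how figures such as the 4.2% equal error rate above are read off a DET curve.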

Kernel combination for SVM speaker verification

Réda Dehak · Najim Dehak · Patrick Kenny · Pierre Dumouchel

We present a new approach to constructing the kernels used to build support vector machines for speaker verification. The idea is to construct new kernels as linear combinations of several base kernels, such as the GLDS and GMM supervector kernels. In this kernel combination, the weights are speaker-dependent, rather than the universal weights of score-level fusion, and no extra data are needed to estimate them. An experiment on the NIST 2006 speaker recognition evaluation dataset (all trials) was carried out using three different kernel functions (the GLDS kernel and the linear and Gaussian GMM supervector kernels). We compared our kernel combination to the optimal linear score fusion obtained using logistic regression; this optimal score fusion was trained on the same test data. The kernel combination technique achieved an equal error rate of $\simeq 5.9\%$, better than the optimal score fusion system ($\simeq 6.0\%$).
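The core operation above, forming a new kernel as a weighted sum of base kernels, can be sketched directly on Gram matrices. The function name and the equal weights in the example are illustrative; the paper's point is that the weights would be estimated per speaker.

```python
import numpy as np

def combine_kernels(grams, weights):
    """Linearly combine base kernel Gram matrices.

    grams   -- list of (n, n) Gram matrices, e.g. one each for the GLDS,
               linear GMM supervector and Gaussian GMM supervector kernels
    weights -- combination weights, one per base kernel (speaker-dependent
               in the approach described above)

    A non-negative combination of positive semi-definite kernels is again
    a valid (positive semi-definite) kernel, so the result can be fed to
    any SVM trainer that accepts a precomputed kernel.
    """
    weights = np.asarray(weights, dtype=float)
    if weights.shape[0] != len(grams):
        raise ValueError("one weight per base kernel is required")
    return sum(w * K for w, K in zip(weights, grams))
```

With non-negative weights the combined matrix stays positive semi-definite, which is what makes it usable as an SVM kernel without further checks.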

Semantics driven disambiguation: A comparison of different approaches

Akim Demaille · Renaud Durlin · Nicolas Pierron · Benoît Sigoure

Keywords: Transformers, context-free grammar, attribute grammar, Stratego, ASF, SDF, disambiguation, parsing, program transformation, term rewriting

Context-sensitive languages such as C or C++ can be parsed using a context-free but ambiguous grammar, which requires another stage, disambiguation, to select the single parse tree that complies with the language’s semantic rules. Naturally, large and complex languages induce large and complex disambiguation stages. If, in addition, the parser should be extensible, for instance to enable the embedding of domain-specific languages, the disambiguation techniques should feature traditional software-engineering qualities: modularity, extensibility, scalability and expressiveness. We evaluate three approaches to writing disambiguation filters for SDF grammars: algebraic equations with ASF, rewrite rules with programmable traversals for Stratego, and attribute grammars with Transformers, our system. To this end we introduce a highly ambiguous language. Its “standard” grammar exhibits ambiguities inspired by those found in the C and C++ standard grammars. To evaluate modularity, the grammar is layered: it starts with a small core language, and several layers add new features, new production rules, and new ambiguities.
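The kind of rule such a disambiguation filter encodes can be illustrated on the classic `a * b;` ambiguity (pointer declaration vs. multiplication). The sketch below is a toy, not any of the three systems compared: trees are plain tuples and the symbol table is a dictionary.

```python
def disambiguate(forest, declared_types):
    """Toy semantics-driven disambiguation filter.

    forest         -- all candidate parse trees for one statement, each a
                      (construct-kind, identifier) pair
    declared_types -- symbol table mapping identifiers to "type" or
                      "variable"

    Keeps only the trees consistent with the symbol table, mimicking how
    a filter over an ambiguous context-free parse selects the single
    semantically valid tree.
    """
    def consistent(tree):
        kind, name = tree
        if kind == "pointer-decl":          # `a * b;` read as a declaration
            return declared_types.get(name) == "type"
        if kind == "product":               # `a * b;` read as an expression
            return declared_types.get(name) == "variable"
        return True
    return [t for t in forest if consistent(t)]
```

Real filters differ mainly in how this predicate is expressed: as algebraic equations, as rewrite rules with traversals, or as attribute computations over the parse forest.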

The role of speaker factors in the NIST extended data task

Patrick Kenny · Najim Dehak · Réda Dehak · Vishwa Gupta · Pierre Dumouchel

We tested factor analysis models having various numbers of speaker factors on the core condition and the extended data condition of the 2006 NIST speaker recognition evaluation. In order to ensure strict disjointness between training and test sets, the factor analysis models were trained without using any of the data made available for the 2005 evaluation. The factor analysis training set consisted primarily of Switchboard data and so was to some degree mismatched with the 2006 test data (drawn from the Mixer collection). Consequently, our initial results were not as good as those submitted for the 2006 evaluation. However, we found that we could compensate for this by a simple modification to our score normalization strategy, namely by using 1000 z-norm utterances in zt-norm. Our purpose in varying the number of speaker factors was to evaluate the eigenvoice MAP and classical MAP components of the inter-speaker variability model in factor analysis. We found that on the core condition (i.e. 2–3 minutes of enrollment data), only the eigenvoice MAP component plays a useful role. On the other hand, on the extended data condition (i.e. 15–20 minutes of enrollment data), both the classical MAP component and the eigenvoice MAP component proved to be useful, provided that the number of speaker factors was limited. Our best result on the extended data condition (all trials) was an equal error rate of 2.2% and a detection cost of 0.011.

Algorithme de calcul de l’arbre des composantes avec applications à la reconnaissance des formes en imagerie satellitaire (An algorithm for computing the component tree, with applications to pattern recognition in satellite imagery)

Anthony Baillard · Christophe Berger · Emmanuel Bertin · Thierry Géraud · Roland Levillain · Nicolas Widynski

In this paper, a new algorithm to compute the component tree is presented. Compared to the state of the art, this algorithm does not use excessive memory and works efficiently on images whose values are highly quantized, as well as on images with floating-point values. We also describe how it can be applied to astronomical data to identify relevant objects.

Effective component tree computation with application to pattern recognition in astronomical imaging

Christophe Berger · Thierry Géraud · Roland Levillain · Nicolas Widynski · Anthony Baillard · Emmanuel Bertin

In this paper, a new algorithm to compute the component tree is presented. Compared to the state of the art, this algorithm does not use excessive memory and works efficiently on images whose values are highly quantized, as well as on images with floating-point values. We also describe how it can be applied to astronomical data to identify relevant objects.
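A union-find construction in the spirit of this family of algorithms can be sketched on a 1-D signal: pixels are processed by decreasing value, so only a sort is needed and floating-point values work exactly like quantized ones. This is a minimal sketch under those assumptions; the paper's algorithm works on 2-D images and includes refinements (e.g. tree canonicalization) omitted here.

```python
def max_tree_1d(values):
    """Build the max-tree of a 1-D signal with union-find.

    Returns (parent, order): `parent[p]` is the pixel whose component
    encloses p's component (the root satisfies parent[r] == r), and
    `order` lists pixels by decreasing value.
    """
    n = len(values)
    parent = [-1] * n          # component-tree parent links
    uf = list(range(n))        # union-find forest over pixels

    def find(x):
        while uf[x] != x:
            uf[x] = uf[uf[x]]  # path halving
            x = uf[x]
        return x

    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    processed = [False] * n
    for p in order:
        parent[p] = p
        processed[p] = True
        for q in (p - 1, p + 1):            # 1-D neighbourhood
            if 0 <= q < n and processed[q]:
                r = find(q)                 # root of the neighbour's component
                if r != p:
                    parent[r] = p           # attach brighter component under p
                    uf[r] = p
    return parent, order
```

For `[1, 3, 2]` this yields the expected nesting: the peak at index 1 sits inside the component at level 2, which sits inside the whole domain rooted at the global minimum.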

Web services at TERAPIX

Olivier Ricou · Anthony Baillard · Emmanuel Bertin · Frederic Magnard · Chiara Marmo · Yannick Mellier

We present an implementation of V.O.-compliant web services built around software tools developed at the TERAPIX centre. These services make it possible to run, from a remote site, several pipeline tasks dedicated to astronomical data processing on the TERAPIX cluster, including the latest EFIGI morphological analysis tool.

Linear and non linear kernel GMM SuperVector machines for speaker verification

Réda Dehak · Najim Dehak · Patrick Kenny · Pierre Dumouchel

This paper presents a comparison between support vector machine (SVM) speaker verification systems based on linear and non-linear kernels defined in the GMM supervector space. We describe how these kernel functions are related, and we show how the nuisance attribute projection (NAP) technique can be used with both kernels to deal with the session variability problem. We demonstrate the importance of GMM model normalization (M-Norm), especially for the non-linear kernel. All our experiments were performed on the core condition of the NIST 2006 speaker recognition evaluation (all trials). Our best results (an equal error rate of 6.3%) were obtained using NAP and GMM model normalization with the non-linear kernel.
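A common way to define the linear kernel in GMM supervector space is the KL-divergence-based inner product over UBM-scaled component means; an RBF kernel over the same scaled space is one simple non-linear counterpart. The sketch below shows that standard construction; it is an assumption that these match the paper's exact kernel definitions.

```python
import numpy as np

def linear_supervector_kernel(mu_a, mu_b, weights, sigma2):
    """KL-divergence-based linear kernel between two GMM supervectors.

    mu_a, mu_b -- (C, D) arrays of MAP-adapted component means
    weights    -- (C,) UBM mixture weights
    sigma2     -- (C, D) diagonal UBM covariances

    K(a, b) = sum_c w_c * mu_a[c]^T diag(sigma2[c])^-1 mu_b[c],
    i.e. a plain dot product after scaling each mean by
    sqrt(w_c) / sqrt(sigma2[c]).
    """
    scaled_a = np.sqrt(weights)[:, None] * mu_a / np.sqrt(sigma2)
    scaled_b = np.sqrt(weights)[:, None] * mu_b / np.sqrt(sigma2)
    return float(np.sum(scaled_a * scaled_b))

def gaussian_supervector_kernel(mu_a, mu_b, weights, sigma2, gamma=1.0):
    """An RBF (non-linear) kernel on the same scaled supervector space:
    exp(-gamma * ||a - b||^2), where the squared distance reuses the
    linear kernel's scaling."""
    d2 = linear_supervector_kernel(mu_a - mu_b, mu_a - mu_b, weights, sigma2)
    return float(np.exp(-gamma * d2))
```

Because both kernels operate on the same scaled supervectors, session-compensation transforms such as NAP can be applied once in that space and reused by either kernel, which is the relationship the paper exploits.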

Probabilistic abstraction for model checking: An approach based on property testing

Sophie Laplante · Richard Lassaigne · Frédéric Magniez · Sylvain Peyronnet · Michel Rougemont

The goal of model checking is to verify the correctness of a given program, on all its inputs. The main obstacle, in many cases, is the intractably large size of the program’s transition system. Property testing is a randomized method to verify whether some fixed property holds on individual inputs, by looking at a small random part of that input. We join the strengths of both approaches by introducing a new notion of probabilistic abstraction, and by extending the framework of model checking to include the use of these abstractions. Our abstractions map transition systems associated with large graphs to small transition systems associated with small random subgraphs. This reduces the original transition system to a family of small, even constant-size, transition systems. We prove that with high probability, “sufficiently” incorrect programs will be rejected ($\varepsilon$-robustness). We also prove that under a certain condition (exactness), correct programs will never be rejected (soundness). Our work applies to programs for graph properties such as bipartiteness, $k$-colorability, or any $\exists\forall$ first-order graph properties. Our main contribution is to show how to apply the ideas of property testing to syntactic programs for such properties. We give a concrete example of an abstraction for a program for bipartiteness. Finally, we show that the relaxation of the test alone does not yield transition systems small enough to use the standard model checking method. More specifically, we prove, using methods from communication complexity, that the OBDD size remains exponential for approximate bipartiteness.

Local reasoning in fuzzy attribute graphs for optimizing sequential segmentation

Geoffroy Fouquier · Jamal Atif · Isabelle Bloch

Spatial relations play a crucial role in model-based image recognition and interpretation, due to their stability compared to many other image appearance characteristics. Graphs are well adapted to representing such information. Sequential methods for knowledge-based recognition of structures require defining the order in which the structures are to be recognized. We propose to address this problem of order definition by developing algorithms that automatically deduce sequential segmentation paths from fuzzy spatial attribute graphs. As an illustration, these algorithms are applied to brain image understanding.
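One simple way to deduce a segmentation order from a relation graph is greedy: always segment next the structure linked to the already-recognized set by the strongest relation. The sketch below is a toy under that assumption, with scalar edge weights standing in for fuzzy relation attributes; it is not the paper's algorithm, and the structure names are hypothetical.

```python
def segmentation_order(graph, seeds):
    """Greedily deduce a sequential segmentation path.

    graph -- dict mapping undirected edges (u, v) to a weight in [0, 1],
             a scalar stand-in for the reliability of the fuzzy spatial
             relation between structures u and v
    seeds -- structures already recognized (the starting point of the path)

    At each step, the unrecognized structure with the strongest link to
    the recognized set is segmented next, so each segmentation is guided
    by the most reliable available spatial relation.
    """
    recognized = list(seeds)
    remaining = {v for edge in graph for v in edge} - set(seeds)
    while remaining:
        weight, nxt = max(
            ((graph.get((u, v)) or graph.get((v, u)) or 0.0, v)
             for u in recognized for v in remaining),
            key=lambda t: t[0],
        )
        recognized.append(nxt)
        remaining.discard(nxt)
    return recognized
```

In the fuzzy setting, the scalar weight would be replaced by a measure derived from the fuzzy relation (e.g. its precision or saliency), but the ordering principle is the same.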