Research Interests: Information Retrieval, Bioacoustics, Pattern Recognition, Support Vector Machines, Speech Recognition, Categorization, Neural Networks, Databases, Performance Evaluation, Hidden Markov Models, Recurrent Neural Networks, Pure Data, Indexing, and Feature Extraction
We describe and evaluate our toolkit openBliSSART (open-source Blind Source Separation for Audio Recognition Tasks), a C++ framework and toolbox that we have successfully used in a variety of research on blind audio source separation and feature extraction. To our knowledge, it provides the first open-source implementation of a widely applicable algorithmic framework based on non-negative matrix factorization (NMF), including several preprocessing, factorization, and signal reconstruction algorithms for monaural signals. Apart from blind source separation using supervised and unsupervised NMF, we show how the framework supports the increasingly popular NMF-based audio feature extraction methods. Furthermore, we point out a numerical optimization for NMF and show that real-time NMF source separation on a desktop PC is feasible with our implementation. We conclude with an evaluation of our toolkit on supervised speaker separation, demonstrating how our algorithmic framework allows tuning the real-time factor to the desired perceptual quality.
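The NMF core of such a toolkit can be illustrated with a minimal sketch. The code below uses the generic Lee-Seung multiplicative updates for the Euclidean cost in plain NumPy; it is a textbook variant for illustration, not openBliSSART's actual C++ implementation, and the function name and parameters are assumptions.

```python
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    """Factorize a non-negative matrix V (m x n) into W (m x r) and H (r x n)
    using Lee-Seung multiplicative updates for the Euclidean cost.
    Illustrative sketch only, not openBliSSART's implementation."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H

# Toy "magnitude spectrogram": 64 frequency bins, 100 frames
V = np.abs(np.random.default_rng(1).standard_normal((64, 100)))
W, H = nmf(V, r=8)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```

For supervised separation, W would be fixed to basis spectra pre-trained per speaker and only H updated; the sources are then reconstructed from the per-speaker partial products W_s @ H_s.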
This paper introduces a novel graphical model architecture for robust and vocabulary-independent keyword spotting which does not require the training of an explicit garbage model. We show how a graphical model structure for phoneme recognition can be extended to a keyword spotter that is robust with respect to phoneme recognition errors. We use a hidden garbage variable together with the concept of switching parents to model keywords as well as arbitrary speech. This implies that keywords can be added to the vocabulary without having to re-train the model. Thus, the design of our model architecture is optimised to reliably detect keywords rather than to decode keyword phoneme sequences as arbitrary speech, while offering a parameter to adjust the operating point on the receiver operating characteristic curve. Experiments on the TIMIT corpus reveal that our graphical model outperforms a comparable hidden Markov model based keyword spotter that uses conventional garbage modelling.
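The role of the garbage score as an adjustable operating point can be illustrated with a much-simplified sketch: a dynamic-programming alignment of the keyword's phoneme sequence against frame-wise phoneme log-posteriors, where a constant per-frame garbage log-probability plays the part of the hidden garbage variable. This is a toy illustration under stated assumptions, not the paper's switching-parent graphical model; the function name and interface are invented for the example.

```python
import numpy as np

def spot_keyword(log_post, keyword, garbage_logp):
    """Toy keyword spotter: best monotonic alignment of a phoneme
    sequence `keyword` (list of phone indices) against frame-wise
    phoneme log-posteriors `log_post` (T x num_phones).
    `garbage_logp` is the per-frame garbage score; raising it makes
    the spotter stricter, moving the ROC operating point."""
    T = log_post.shape[0]
    K = len(keyword)
    D = np.full((T, K), -np.inf)        # D[t, k]: best score ending at frame t, phone k
    D[0, 0] = log_post[0, keyword[0]]
    for t in range(1, T):
        for k in range(K):
            stay = D[t - 1, k]                          # remain in current phone
            move = D[t - 1, k - 1] if k > 0 else -np.inf  # advance to next phone
            D[t, k] = max(stay, move) + log_post[t, keyword[k]]
    score = D[-1, -1] / T               # length-normalized keyword score
    return score > garbage_logp, score
```

In the actual model, garbage is a competing hypothesis decoded jointly with the keyword rather than a fixed threshold, which is what removes the need for a trained garbage model.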
This paper discusses innovative techniques to automatically estimate a user's emotional state by analyzing the speech signal and haptic interaction on a touch-screen or via mouse. Knowledge of a user's emotion permits adaptive strategies striving for a more natural and robust interaction. We classify seven emotional states: surprise, joy, anger, fear, disgust, sadness, and a neutral user state. The user's emotion is extracted by a parallel stochastic analysis of their spoken and haptic machine interactions while understanding ...
In this paper, we investigate acoustic features which differentiate the two speech registers neutral and intimate within different constellations of speakers and addressees. Three types of speakers are considered: mothers addressing their own children or an unknown adult, women with no children addressing an imaginary child or an imaginary adult, and children addressing a pet robot using both intimate and neutral speech. We use a large, systematically generated feature vector, upsampling, and support vector machines (SVM) and random forests (RF) for learning. Results are reported for extensive speaker-independent test runs using PCA-SFFS vs. SVM-SFFS for feature ranking. Classification performance and the most relevant feature types are discussed in detail.
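The upsampling step mentioned above, balancing class sizes before classifier training, can be sketched as follows in plain NumPy. The function name and details are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def upsample(X, y, rng=None):
    """Balance classes by randomly repeating samples of smaller classes
    until every class matches the majority-class count, a common step
    before SVM/RF training on imbalanced data. Illustrative sketch."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        ci = np.flatnonzero(y == c)
        idx.append(ci)                       # keep all original samples
        extra = target - len(ci)
        if extra > 0:                        # repeat random minority samples
            idx.append(rng.choice(ci, size=extra, replace=True))
    idx = np.concatenate(idx)
    rng.shuffle(idx)
    return X[idx], y[idx]
```

Upsampling only the training partition (never the test set) keeps the evaluation unbiased while preventing the classifier from collapsing onto the majority register.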
This paper investigates the automatic recognition of emotion from spoken words by vector space modeling vs. string kernels, which have not yet been investigated in this respect. Apart from the spoken content directly, we integrate part-of-speech and higher semantic tagging in our analyses. As opposed to most works in the field, we evaluate the performance with an ASR engine in the loop. Extensive experiments are run on the FAU Aibo Emotion Corpus of 4k spontaneous emotional child-robot interactions and show surprisingly low performance degradation with real ASR compared to transcription-based emotion recognition. Overall, bag-of-words modeling dominates over all other modeling forms based on the spoken content.
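The bag-of-words representation that dominated in these experiments can be sketched as a minimal term-frequency vector space model. This is a bare-bones variant for illustration; the paper's actual feature set (weighting, tagging integration) may differ, and the example words are invented.

```python
from collections import Counter

def bag_of_words(utterances):
    """Map word-level transcriptions (or ASR output) to term-frequency
    vectors over a shared vocabulary: the basic vector space model.
    Minimal illustrative sketch."""
    vocab = sorted({w for u in utterances for w in u.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for u in utterances:
        counts = Counter(u.split())
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

vocab, vecs = bag_of_words(["no no aibo", "aibo stop"])
print(vocab)   # shared vocabulary, sorted
print(vecs)    # one term-frequency vector per utterance
```

Because the representation discards word order, it degrades gracefully under ASR errors: a misrecognized word perturbs a single count rather than invalidating a whole sequence, which is consistent with the low degradation observed with real ASR in the loop.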
Automatically recognizing human emotions from spontaneous and non-prototypical real-life data is currently one of the most challenging tasks in the field of affective computing. This article presents our recent advances in assessing dimensional representations of emotion, such as arousal, expectation, power, and valence, in an audiovisual human-computer interaction scenario. Building on previous studies which demonstrate that long-range context modeling tends to increase the accuracy of emotion recognition, we propose ...
Minimally invasive surgery demands utmost precision and reliability in camera control to prevent any harm to the patient during operations. We therefore introduce a robot-driven camera that can be controlled either manually by a joystick, or by speech to ensure free hands and feet and reduced cognitive workload for the surgeon. Speech control is chosen as a simple, yet highly robust command-and-control application. However, due to high stress and partial fatigue, emotional factors can play a life-deciding role in the operational ...
More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research in line with its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview on where ...
