Yazar "Verkholyak, Oxana Vladimirovna" seçeneğine göre listele
Listeleniyor 1 - 3 / 3
Sayfa Başına Sonuç
Sıralama seçenekleri
Öğe Hierarchical Two-Level Modelling of Emotional States in Spoken Dialog Systems(IEEE, 2019) Verkholyak, Oxana Vladimirovna; Fedotov, Dmitrii; Kaya, Heysem; Zhang, Yang; Karpov, Alexey A.Emotions occur in complex social interactions, and thus processing of isolated utterances may not be sufficient to grasp the nature of underlying emotional states. Dialog speech provides useful information about context that explains nuances of emotions and their transitions. Context can be defined on different levels; this paper proposes a hierarchical context modelling approach based on RNN-LSTM architecture, which models acoustical context on the frame level and partner's emotional context on the dialog level. The method is proved effective together with cross-corpus training setup and domain adaptation technique in a set of speaker independent cross-validation experiments on IEMOCAP corpus for three levels of activation and valence classification. As a result, the state-of-the-art on this corpus is advanced for both dimensions using only acoustic modality.Öğe LSTM based Cross-corpus and Cross-task Acoustic Emotion Recognition(Isca-Int Speech Communication Assoc, 2018) Kaya, Heysem; Fedotov, Dmitrii; Yesilkanat, Ali; Verkholyak, Oxana Vladimirovna; Zhang, Yang; Karpov, Alexey A.Acoustic emotion recognition is a popular and central research direction in paralinguistic analysis, due its relation to a wide range of affective states/traits and manifold applications. Developing highly generalizable models still remains as a challenge for researchers and engineers, because of multitude of nuisance factors. To assert generalization, deployed models need to handle spontaneous speech recorded under different acoustic conditions compared to the training set. This requires that the models are tested for cross-corpus robustness. In this work, we first investigate the suitability of Long-Short-Term-Memory (LSTM) models trained with time- and space-continuously annotated affective primitives for cross-corpus acoustic emotion recognition. We next employ an effective approach to use the frame level valence and arousal predictions of LSTM models for utterance level affect classification and apply this approach on the ComParE 2018 challenge corpora. The proposed method alone gives motivating results both on development and test set of the Self-Assessed Affect Sub-Challenge. On the development set, the cross-corpus prediction based method gives a boost to performance when fused with top components of the baseline system. Results indicate the suitability of the proposed method for both time-continuous and utterance level cross-corpus acoustic emotion recognition tasks.Öğe Modeling short-term and long-term dependencies of the speech signal for paralinguistic emotion classification(St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 2019) Verkholyak, Oxana Vladimirovna; Kaya, Heysem; Karpov, Alexey A.Recently, Speech Emotion Recognition (SER) has become an important research topic of affective computing. It is a difficult problem, where some of the greatest challenges lie in the feature selection and representation tasks. A good feature representation should be able to reflect global trends as well as temporal structure of the signal, since emotions naturally evolve in time; it has become possible with the advent of Recurrent Neural Networks (RNN), which are actively used today for various sequence modeling tasks. This paper proposes a hybrid approach to feature representation, which combines traditionally engineered statistical features with Long Short-Term Memory (LSTM) sequence representation in order to take advantage of both short-term and long-term acoustic characteristics of the signal, therefore capturing not only the general trends but also temporal structure of the signal. The evaluation of the proposed method is done on three publicly available acted emotional speech corpora in three different languages, namely RUSLANA (Russian speech), BUEMODB (Turkish speech) and EMODB (German speech). Compared to the traditional approach, the results of our experiments show an absolute improvement of 2.3% and 2.8% for two out of three databases, and a comparative performance on the third. Therefore, provided enough training data, the proposed method proves effective in modelling emotional content of speech utterances. © 2019 St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences. All rights reserved.