Yazar "Kaya, Heysem" seçeneğine göre listele
Listeleniyor 1 - 20 / 25
Item: Automatic detection of meniscal area in the knee MR images (IEEE, 2016)
Saygılı, Ahmet; Kaya, Heysem; Albayrak, Songül
Nowadays, computer-aided medical systems have become widespread. These systems assist scientists in the medical field with diagnosis and treatment. In the same vein, this study automatically detects the medial meniscus in knee MR images. The knee MR images used in the study were obtained from the Osteoarthritis Initiative; 75% of the images were used for training, while the remainder was used for testing. The attributes used in training and testing were obtained with the Histogram of Oriented Gradients (HOG) method. A regression approach was used in training, and correlation and mean squared error values were computed for patches of different sizes. The maximum correlation value obtained is about 91%. The objective in later stages is to accelerate the current system, minimizing the time to treatment, and to provide a functional decision support system.

Item: AVEC 2018 Workshop and Challenge: Bipolar Disorder and Cross-Cultural Affect Recognition (Assoc Computing Machinery, 2018)
Ringeval, Fabien; Schuller, Bjoern; Valstar, Michel; Cowie, Roddy; Kaya, Heysem; Schmitt, Maximilian; Pantic, Maja
The Audio/Visual Emotion Challenge and Workshop (AVEC 2018), on bipolar disorder and cross-cultural affect recognition, is the eighth competition event aimed at comparing multimedia processing and machine learning methods for automatic audiovisual health and emotion analysis, with all participants competing strictly under the same conditions. The goal of the Challenge is to provide a common benchmark test set for multimodal information processing and to bring together the health and emotion recognition communities, as well as the audiovisual processing communities, to compare the relative merits of various approaches to health and emotion recognition from real-life data. This paper presents the major novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline systems on the three proposed tasks: bipolar disorder classification, cross-cultural dimensional emotion recognition, and emotional label generation from individual ratings.

Item: BOUN-NKU in MediaEval 2017 Emotional Impact of Movies Task (CEUR-WS, 2017)
Karslioglu, N.; Timar, Y.; Salah, Albert Ali; Kaya, Heysem
In this paper, we present our approach for the Emotional Impact of Movies task of the MediaEval 2017 Challenge, which involves multimodal fusion for predicting arousal and valence for movie clips. Our system has two pipelines. In the first, we extracted audio/visual features and used a combination of PCA, Fisher vector encoding, feature selection, and extreme learning machine classifiers. In the second, we focused on the classifiers rather than on feature selection.
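As a concrete illustration of the HOG-plus-regression pipeline described in the meniscus-detection entry above, the following minimal Python sketch extracts HOG descriptors from image patches and fits a regressor using the paper's 75/25 split. The patch size, the ridge regressor, and the placeholder data are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: HOG features from image patches fed to a regressor.
# Data loading, patch extraction and the regressor choice are assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def hog_features(patches):
    # One HOG descriptor per grayscale patch (here 64x64 pixels).
    return np.array([
        hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for p in patches
    ])

patches = np.random.rand(100, 64, 64)   # placeholder for real knee-MR patches
targets = np.random.rand(100)           # placeholder meniscal-area labels

X = hog_features(patches)
X_tr, X_te, y_tr, y_te = train_test_split(X, targets, train_size=0.75)  # 75/25 split as in the paper
model = Ridge().fit(X_tr, y_tr)
corr = np.corrcoef(model.predict(X_te), y_te)[0, 1]  # correlation, the reported metric
```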
Item: Combining Deep Facial and Ambient Features for First Impression Estimation (Springer International Publishing Ag, 2016)
Gürpınar, Furkan; Kaya, Heysem; Salah, Albert Ali
First impressions influence the behavior of people towards a newly encountered person or a human-like agent. Apart from the physical characteristics of the encountered face, the emotional expressions displayed on it, as well as ambient information, affect these impressions. In this work, we propose an approach to predict the first impressions people will have of a given video depicting a face within a context. We employ pre-trained Deep Convolutional Neural Networks to extract facial expressions as well as ambient information. After video modeling, the visual features that represent facial expression and scene are combined and fed to a Kernel Extreme Learning Machine regressor. The proposed system is evaluated on the ChaLearn Challenge Dataset on First Impression Recognition, where the prediction targets are the Big Five personality trait labels for each video. Our system achieved an accuracy of 90.94% on the sequestered test set, 0.36 percentage points below the top system in the competition.

Item: Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition: Challenges and Mixup (Springer Verlag, 2018)
Fedotov, Dmitrii; Kaya, Heysem; Karpov, Alexey
Recently, the focus of research in the field of affective computing has shifted to spontaneous interactions and time-continuous annotations. Such data enlarge the possibilities for real-world emotion recognition in the wild, but also introduce new challenges. Affective computing is a research area where data collection is neither a trivial nor a cheap task; it would therefore be rational to use all the data available. However, due to the subjective nature of emotions and differences in cultural and linguistic features as well as environmental conditions, combining affective speech data is not a straightforward process. In this paper, we analyze the difficulties of automatic emotion recognition in a time-continuous, dimensional scenario using data from the RECOLA, SEMAINE and CreativeIT databases. We propose to employ a simple but effective strategy called "mixup" to overcome the gap in feature-target and target-target covariance structures across corpora. We showcase the performance of our system in three different cross-corpus experimental setups: single-corpus training, two-corpora training, and training on augmented (mixed-up) data. Findings show that the prediction behavior of trained models heavily depends on the covariance structure of the training corpus, and that mixup is very effective in improving the cross-corpus acoustic emotion recognition performance of context-dependent LSTM models.

Item: Efficient and effective strategies for cross-corpus acoustic emotion recognition (Elsevier, 2018)
Kaya, Heysem; Karpov, Alexey A.
An important research direction in speech technology is robust cross-corpus and cross-language emotion recognition. In this paper, we propose computationally efficient and effective feature normalization strategies for the challenging task of cross-corpus acoustic emotion recognition. In particular, we deploy a cascaded normalization approach, combining linear speaker-level, nonlinear value-level and feature-vector-level normalization to minimize speaker- and corpus-related effects as well as to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on five corpora representing five languages from different families, namely Danish, English, German, Russian and Turkish. Using a standard set of suprasegmental features, the proposed normalization strategies show superior performance compared to benchmark normalization approaches commonly used in the literature.
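The "mixup" strategy in the context-modeling entry above augments training data with convex combinations of feature/target pairs. A minimal sketch of that idea for acoustic features with continuous emotion targets might look as follows; the alpha value and array shapes are illustrative assumptions.

```python
import numpy as np

def mixup(X, y, alpha=0.2, n_out=None):
    """Generate synthetic samples as convex combinations of random pairs."""
    n = len(X) if n_out is None else n_out
    lam = np.random.beta(alpha, alpha, size=n)               # mixing coefficients
    i = np.random.randint(0, len(X), size=n)                 # first partner of each pair
    j = np.random.randint(0, len(X), size=n)                 # second partner
    X_mix = lam[:, None] * X[i] + (1 - lam[:, None]) * X[j]  # mixed acoustic features
    y_mix = lam * y[i] + (1 - lam) * y[j]                    # mixed valence/arousal targets
    return X_mix, y_mix

# Example: augment 1000 placeholder feature vectors with 500 mixed-up samples.
X, y = np.random.rand(1000, 88), np.random.rand(1000)
X_mix, y_mix = mixup(X, y, n_out=500)
X_aug, y_aug = np.vstack([X, X_mix]), np.concatenate([y, y_mix])
```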
Item: Emotion, age, and gender classification in children's speech by humans and machines (Academic Press Ltd-Elsevier Science Ltd, 2017)
Kaya, Heysem; Salah, Albert Ali; Karpov, Alexey A.; Frolova, Olga; Grigorev, Aleksey; Lyakso, Elena
In this article, we present the first child emotional speech corpus in Russian, called EmoChildRu, collected from children aged 3 to 7 years. The base corpus includes over 20 K recordings (approx. 30 h) collected from 120 children. Audio recordings were carried out in three controlled settings, creating different emotional states in the children: playing with a standard set of toys; repeating words from a toy parrot in a game-store setting; and watching a cartoon and retelling the story. The corpus is designed to study how emotional state is reflected in the characteristics of voice and speech, and to support studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of adult listeners' analysis of the child speech, questionnaires, and annotations for gender and age in months. We also provide several baselines, comparing human and machine estimation on this corpus for prediction of age, gender and comfort state. While the acoustics-based automatic systems show higher performance in age estimation, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination.

Item: Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips (Assoc Computing Machinery, 2018)
Timar, Yasemin; Karslıoğlu, Nihan; Kaya, Heysem; Salah, Albert Ali
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audiovisual features extracted from videos via machine learning approaches to estimate the affective responses of the viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency cepstral coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.

Item: Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks (Isca-Int Speech Communication Assoc, 2016)
Kaya, Heysem; Karpov, Alexey A.
The field of computational paralinguistics is rapidly growing and is of interest in various application domains, ranging from biomedical engineering to forensics. The INTERSPEECH ComParE challenge series has a field-leading role, introducing novel problems with a common benchmark protocol for comparability. In this work, we tackle all three ComParE 2016 Challenge corpora (Native Language, Sincerity and Deception), benefiting from multi-level normalization of features followed by fast and robust kernel learning methods. Moreover, we employ computer-vision-inspired low-level descriptor representation methods such as Fisher vector encoding. After nonlinear preprocessing, the obtained Fisher vectors are kernelized and mapped to target variables by classifiers based on Kernel Extreme Learning Machines and Partial Least Squares regression. We finally combine the predictions of models trained on the popularly used functionals-based descriptor encoding (openSMILE features) with those obtained from the Fisher vector encoding. In the preliminary experiments, our approach significantly outperformed the baseline systems for the Native Language and Sincerity sub-challenges on both the development and test sets.
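Several entries in this list, including the ComParE fusion paper just above, rely on Fisher vector encoding of frame-level acoustic descriptors. A simplified sketch of that encoding under a diagonal-covariance GMM (gradients with respect to means and variances, with the usual power and L2 normalizations) is given below; the number of components and the descriptor dimensionality are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frames, gmm):
    """Encode (T, D) frame descriptors as a Fisher vector under a diagonal GMM."""
    Q = gmm.predict_proba(frames)                    # (T, K) soft assignments
    T = frames.shape[0]
    mu, var, w = gmm.means_, gmm.covariances_, gmm.weights_
    diff = (frames[:, None, :] - mu) / np.sqrt(var)  # standardized residuals (T, K, D)
    g_mu = (Q[:, :, None] * diff).sum(0) / (T * np.sqrt(w)[:, None])
    g_var = (Q[:, :, None] * (diff ** 2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    fv = np.hstack([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalization

# Fit the background GMM on pooled low-level descriptor frames (placeholder data).
train_frames = np.random.rand(5000, 20)
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(train_frames)
fv = fisher_vector(train_frames[:300], gmm)          # one utterance's encoding
```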
Item: Hierarchical Two-Level Modelling of Emotional States in Spoken Dialog Systems (IEEE, 2019)
Verkholyak, Oxana Vladimirovna; Fedotov, Dmitrii; Kaya, Heysem; Zhang, Yang; Karpov, Alexey A.
Emotions occur in complex social interactions, so processing isolated utterances may not be sufficient to grasp the nature of the underlying emotional states. Dialog speech provides useful information about context that explains nuances of emotions and their transitions. Context can be defined on different levels; this paper proposes a hierarchical context modelling approach based on an RNN-LSTM architecture, which models acoustic context on the frame level and the partner's emotional context on the dialog level. The method is shown to be effective, together with a cross-corpus training setup and a domain adaptation technique, in a set of speaker-independent cross-validation experiments on the IEMOCAP corpus for three-level activation and valence classification. As a result, the state of the art on this corpus is advanced for both dimensions using only the acoustic modality.

Item: Introducing Weighted Kernel Classifiers for Handling Imbalanced Paralinguistic Corpora: Snoring, Addressee and Cold (Isca-Int Speech Communication Assoc, 2017)
Kaya, Heysem; Karpov, Alexey A.
The field of paralinguistics is growing rapidly, with a wide range of applications that go beyond the recognition of emotions, laughter and personality. The research flourishes in multiple directions, such as signal representation and classification, addressing the issues of the domain. Apart from noise robustness, an important issue with real-life data is its imbalanced nature: some classes of states/traits are under-represented. Combined with the high dimensionality of the feature vectors used in state-of-the-art analysis systems, this issue poses the threat of over-fitting. While the kernel trick can be employed to handle the dimensionality issue, regular classifiers inherently aim to minimize the misclassification error and are hence biased towards the majority class. A solution to this problem is over-sampling of the minority class(es). However, this brings increased memory/computational costs while not providing any new information to the classifier. In this work, we propose a new weighting scheme on instances of the original dataset, employing the Weighted Kernel Extreme Learning Machine and, inspired by it, introducing a Weighted Partial Least Squares Regression based classifier. The proposed methods are applied to all three INTERSPEECH ComParE 2017 challenge corpora, giving better or competitive results compared to the challenge baselines.
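The weighted-kernel-classifier entry above builds on the Weighted Kernel Extreme Learning Machine, in which a diagonal weight matrix counteracts majority-class bias in the least-squares solution. A minimal sketch of that family of classifiers follows, using simple inverse-class-frequency weights; the paper's own weighting scheme is not reproduced here, and the linear kernel and C value are illustrative.

```python
import numpy as np

def weighted_kelm_fit(K, Y, y_labels, C=10.0):
    """Weighted Kernel ELM: solve (I/C + W K) beta = W Y with class-balanced W."""
    counts = np.bincount(y_labels)      # inverse-class-frequency weights so each
    w = 1.0 / counts[y_labels]          # class contributes equally to the loss
    W = np.diag(w)
    n = K.shape[0]
    return np.linalg.solve(np.eye(n) / C + W @ K, W @ Y)

def weighted_kelm_predict(K_test, beta):
    # K_test holds kernel values between test and training samples.
    return np.argmax(K_test @ beta, axis=1)

# Example with a linear kernel and one-hot targets (placeholder data).
X = np.random.rand(200, 50)
y = np.random.randint(0, 3, 200)
Y = np.eye(3)[y]                        # one-hot encoding of class labels
beta = weighted_kelm_fit(X @ X.T, Y, y)
pred = weighted_kelm_predict(X @ X.T, beta)
```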
Item: Kernel ELM and CNN based Facial Age Estimation (IEEE, 2016)
Gürpınar, Furkan; Kaya, Heysem; Dibeklioglu, Hamdi; Salah, Albert Ali
We propose a two-level system for apparent age estimation from facial images. Our system first classifies samples into overlapping age groups. Within each group, the apparent age is estimated with local regressors, whose outputs are then fused for the final estimate. We use a deformable-parts-model based face detector and features from a pre-trained deep convolutional network. Kernel extreme learning machines are used for classification. We evaluate our system on the ChaLearn Looking at People 2016 Apparent Age Estimation challenge dataset and report a 0.3740 normal score on the sequestered test set.

Item: LSTM based Cross-corpus and Cross-task Acoustic Emotion Recognition (Isca-Int Speech Communication Assoc, 2018)
Kaya, Heysem; Fedotov, Dmitrii; Yesilkanat, Ali; Verkholyak, Oxana Vladimirovna; Zhang, Yang; Karpov, Alexey A.
Acoustic emotion recognition is a popular and central research direction in paralinguistic analysis, due to its relation to a wide range of affective states/traits and its manifold applications. Developing highly generalizable models remains a challenge for researchers and engineers because of a multitude of nuisance factors. To assess generalization, deployed models need to handle spontaneous speech recorded under acoustic conditions different from those of the training set. This requires that the models be tested for cross-corpus robustness. In this work, we first investigate the suitability of Long Short-Term Memory (LSTM) models trained with time- and space-continuously annotated affective primitives for cross-corpus acoustic emotion recognition. We then employ an effective approach that uses the frame-level valence and arousal predictions of LSTM models for utterance-level affect classification, and apply this approach to the ComParE 2018 challenge corpora. The proposed method alone gives motivating results on both the development and test sets of the Self-Assessed Affect Sub-Challenge. On the development set, the cross-corpus prediction based method boosts performance when fused with the top components of the baseline system. The results indicate the suitability of the proposed method for both time-continuous and utterance-level cross-corpus acoustic emotion recognition tasks.

Item: Modeling short-term and long-term dependencies of the speech signal for paralinguistic emotion classification (St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, 2019)
Verkholyak, Oxana Vladimirovna; Kaya, Heysem; Karpov, Alexey A.
Recently, Speech Emotion Recognition (SER) has become an important research topic in affective computing. It is a difficult problem, where some of the greatest challenges lie in feature selection and representation. A good feature representation should reflect global trends as well as the temporal structure of the signal, since emotions naturally evolve in time; this has become possible with the advent of Recurrent Neural Networks (RNNs), which are actively used today for various sequence modeling tasks. This paper proposes a hybrid approach to feature representation, which combines traditionally engineered statistical features with Long Short-Term Memory (LSTM) sequence representations in order to take advantage of both the short-term and the long-term acoustic characteristics of the signal, thereby capturing not only the general trends but also the temporal structure of the signal. The proposed method is evaluated on three publicly available acted emotional speech corpora in three different languages, namely RUSLANA (Russian), BUEMODB (Turkish) and EMODB (German). Compared to the traditional approach, our experiments show an absolute improvement of 2.3% and 2.8% for two of the three databases, and comparable performance on the third. Therefore, provided enough training data, the proposed method proves effective in modelling the emotional content of speech utterances.
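Two of the entries above run LSTM models over frame-level acoustics and then summarize the outputs at the utterance level. A sketch of that two-stage idea in Keras might look like the following; the layer sizes, sequence length, descriptor dimensionality, and the mean/std functionals are illustrative assumptions rather than the papers' exact configurations.

```python
import numpy as np
import tensorflow as tf

frame_dim, seq_len = 40, 300                 # e.g. 40 acoustic LLDs per frame
lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len, frame_dim)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(2)),  # per-frame valence, arousal
])
lstm.compile(optimizer="adam", loss="mse")   # train on time-continuous annotations

def utterance_features(model, X):
    """Summarize frame-level predictions into utterance-level functionals."""
    pred = model.predict(X)                  # (N, seq_len, 2) frame-level predictions
    return np.concatenate([pred.mean(axis=1), pred.std(axis=1)], axis=1)  # (N, 4)
```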
Item: Movie Emotion Estimation with Multimodal Fusion and Synthetic Data Generation (IEEE, 2019)
Karslıoğlu, Nihan; Kaya, Heysem; Salah, Albert Ali
In this work, we propose a method for automatic emotion recognition from movie clips. This problem has applications in the indexing and retrieval of large movie and video collections, summarization of visual content, selection of emotion-invoking materials, and so on. Our approach aims to estimate valence and arousal values automatically. We extract audio and visual features and summarize them via functionals, PCA, and Fisher vector encoding approaches. We use feature selection based on canonical correlation analysis. For classification, we use extreme learning machines and support vector machines. We test our approach on the LIRIS-ACCEDE database with ground-truth annotations. The class imbalance problem is addressed by generating synthetic data. By fusing the best features at the score and feature levels, we obtain good results on this problem, especially for valence prediction.

Item: Multi-modal Score Fusion and Decision Trees for Explainable Automatic Job Candidate Screening from Video CVs (IEEE, 2017)
Kaya, Heysem; Gürpınar, Furkan; Salah, Albert Ali
We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face and scene features are first computed from an input video CV, using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable that indicates whether the subject will be invited to the interview. The base learners are stacked into an ensemble of decision trees to produce the outputs of the quantitative stage, and a single decision tree, combined with a rule-based algorithm, produces interview-decision explanations based on the quantitative results. The proposed system ranked first in both the quantitative and qualitative stages of the CVPR 2017 ChaLearn Job Candidate Screening Coopetition.

Item: Multimodal Fusion of Audio, Scene, and Face Features for First Impression Estimation (IEEE Computer Soc, 2016)
Gürpınar, Furkan; Kaya, Heysem; Salah, Albert Ali
Affective computing, particularly emotion and personality trait recognition, is of increasing interest in many research disciplines. The interplay of emotion and personality shows itself in the first impression left on other people. Moreover, ambient information, e.g. the environment and objects surrounding the subject, also affects these impressions. In this work, we employ pre-trained Deep Convolutional Neural Networks to extract facial emotion and ambient information from images for predicting apparent personality. We also investigate the Local Gabor Binary Patterns from Three Orthogonal Planes video descriptor and acoustic features extracted via the popularly used openSMILE tool. We then classify the features using a Kernel Extreme Learning Machine and fuse their predictions. The proposed system is applied to the ChaLearn Challenge on First Impression Recognition, achieving the winning test set accuracy of 0.913, averaged over the Big Five personality traits.
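A recurring step in the three fusion entries above is score-level (late) fusion of per-modality predictions. The sketch below picks fusion weights by grid search on a validation set; the weight grid and the mean-squared-error criterion are illustrative assumptions, not the papers' exact procedures.

```python
import itertools
import numpy as np

def fuse(scores, w):
    """Weighted average of per-modality score arrays."""
    return sum(wi * s for wi, s in zip(w, scores)) / sum(w)

def pick_weights(val_scores, val_labels, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Exhaustively search fusion weights that minimize validation MSE."""
    best, best_err = None, np.inf
    for w in itertools.product(grid, repeat=len(val_scores)):
        if sum(w) == 0:
            continue                         # skip the degenerate all-zero combination
        err = np.mean((fuse(val_scores, w) - val_labels) ** 2)
        if err < best_err:
            best, best_err = w, err
    return best

# Example: audio, face and scene predictions for 50 validation clips (placeholders).
val_scores = [np.random.rand(50) for _ in range(3)]
val_labels = np.random.rand(50)
w = pick_weights(val_scores, val_labels)     # weights reused on the test-set scores
```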
Item: Oak Leaf Classification: An Analysis of Features and Classifiers (IEEE, 2019)
Kaya, Heysem; Keklik, İlhan; Ensari, Tolga; Alkan, Fatih; Biricik, Yağmur
Automatic classification of trees from their leaves is a popular computer vision/machine learning task with important applications in monitoring forest wealth. While the final aim is an application capable of visual signal processing and classification, in this paper we present a new oak leaf dataset and preliminary results for the classification of 8 types of oak trees. The novelties include a comparative analysis of a small set of hand-crafted geometric features and popularly used high-dimensional appearance features, such as Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG). We further compare the commonly used Support Vector Machine (SVM) classifier with a recently popular, fast and robust learner called the Extreme Learning Machine (ELM). Results indicate that a small set of geometric features reaches an accuracy of 75%, while high-dimensional appearance features can boost performance up to 92%.

Item: Potential audio treatment predictors for bipolar mania (Wiley, 2018)
Çiftçi, Elvan; Kaya, Heysem; Güleç, H.; Salah, Albert Ali
[No Abstract Available]

Item: Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS (Tubitak Scientific & Technical Research Council Turkey, 2019)
Kaya, Heysem; Tüfekçi, Pınar; Uzun, Erdinc
Predictive emission monitoring systems (PEMS) are important tools for validating and backing up the costly continuous emission monitoring systems used in gas-turbine-based power plants. Their implementation relies on the availability of appropriate and ecologically valid data. In this paper, we introduce a novel PEMS dataset collected over five years from a gas turbine for the predictive modeling of CO and NOx emissions. We analyze the data using a recent machine learning paradigm and present useful insights about emission predictions. Furthermore, we present a benchmark experimental procedure to ensure the comparability of future works on the data.
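Extreme Learning Machines recur throughout this list, from the paralinguistics entries to the gas-turbine PEMS paper above. For reference, a minimal single-hidden-layer ELM regressor can be sketched as follows; the hidden-layer size, tanh activation, ridge term, and placeholder data are typical choices for illustration, not those of any specific paper here.

```python
import numpy as np

class ELMRegressor:
    """Basic ELM: random hidden layer, regularized least-squares output weights."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, reg=1e-3):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))  # random input weights, never trained
        self.b = self.rng.normal(size=self.n_hidden)       # random biases
        H = np.tanh(X @ self.W + self.b)                   # hidden-layer activations
        # Closed-form ridge solution for the output weights.
        self.beta = np.linalg.solve(H.T @ H + reg * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Example in the spirit of the PEMS entry: ambient/turbine variables to NOx values.
X, y = np.random.rand(500, 9), np.random.rand(500)         # placeholder sensor data
model = ELMRegressor(n_hidden=200).fit(X, y)
pred = model.predict(X)
```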