Yazar "Salah, Albert Ali" seçeneğine göre listele
Now showing items 1 - 14 of 14
Item 1. BOUN-NKU in MediaEval 2017 Emotional Impact of Movies Task (CEUR-WS, 2017). Karslioglu, N.; Timar, Y.; Salah, Albert Ali; Kaya, Heysem.
In this paper, we present our approach for the Emotional Impact of Movies task of the MediaEval 2017 Challenge, involving multimodal fusion for predicting arousal and valence for movie clips. In our system, we have two pipelines. In the first one, we extracted audio/visual features, and used a combination of PCA, Fisher vector encoding, feature selection, and extreme learning machine classifiers. In the second one, we focused on the classifiers, rather than on feature selection.

Item 2. Combining Deep Facial and Ambient Features for First Impression Estimation (Springer International Publishing AG, 2016). Gürpınar, Furkan; Kaya, Heysem; Salah, Albert Ali.
First impressions influence the behavior of people towards a newly encountered person or a human-like agent. Apart from the physical characteristics of the encountered face, the emotional expressions displayed on it, as well as ambient information, affect these impressions. In this work, we propose an approach to predict the first impressions people will have for a given video depicting a face within a context. We employ pre-trained Deep Convolutional Neural Networks to extract facial expressions, as well as ambient information. After video modeling, visual features that represent facial expression and scene are combined and fed to a Kernel Extreme Learning Machine regressor. The proposed system is evaluated on the ChaLearn Challenge Dataset on First Impression Recognition, where the classification target is the Big Five personality trait labels for each video. Our system achieved an accuracy of 90.94% on the sequestered test set, 0.36% points below the top system in the competition.

Item 3. Emotion, age, and gender classification in children's speech by humans and machines (Academic Press / Elsevier Science, 2017). Kaya, Heysem; Salah, Albert Ali; Karpov, Alexey A.; Frolova, Olga; Grigorev, Aleksey; Lyakso, Elena.
In this article, we present the first child emotional speech corpus in Russian, called EmoChildRu, collected from 3 to 7 years old children. The base corpus includes over 20K recordings (approx. 30 h), collected from 120 children. Audio recordings are carried out in three controlled settings by creating different emotional states for children: playing with a standard set of toys; repetition of words from a toy-parrot in a game store setting; watching a cartoon and retelling of the story, respectively. This corpus is designed to study the reflection of the emotional state in the characteristics of voice and speech and for studies of the formation of emotional states in ontogenesis. A portion of the corpus is annotated for three emotional states (comfort, discomfort, neutral). Additional data include the results of the adult listeners' analysis of child speech, questionnaires, as well as annotation for gender and age in months. We also provide several baselines, comparing human and machine estimation on this corpus for prediction of age, gender and comfort state. While in age estimation the acoustics-based automatic systems show higher performance, they do not reach human perception levels in comfort state and gender classification. The comparative results indicate the importance and necessity of developing further linguistic models for discrimination.
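Several entries in this listing (e.g. Items 2, 5, and 8) rely on Kernel Extreme Learning Machine (ELM) regression or classification over pre-computed deep and hand-crafted features. The following is a minimal sketch of a kernel ELM regressor for intuition only; it is not the authors' code, and the RBF kernel together with the regularization constant C are illustrative assumptions.

# Minimal kernel Extreme Learning Machine (ELM) regressor sketch.
# Not the authors' implementation; kernel and C are illustrative choices.
import numpy as np

def rbf_kernel(A, B, gamma=1e-3):
    # Pairwise squared Euclidean distances -> Gaussian (RBF) kernel matrix.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

class KernelELMRegressor:
    def __init__(self, C=10.0, gamma=1e-3):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form kernel ELM solution: beta = (K + I/C)^{-1} y
        self.beta = np.linalg.solve(K + np.eye(K.shape[0]) / self.C, y)
        return self

    def predict(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta

In practice such a regressor would be trained per modality (audio, face, scene) on video-level feature vectors, with predictions fused afterwards.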
Item 4. Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips (Association for Computing Machinery, 2018). Timar, Yasemin; Karslıoğlu, Nihan; Kaya, Heysem; Salah, Albert Ali.
Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audio-visual features extracted from videos via machine learning approaches to estimate the affective responses of the viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency Cepstral Coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.

Item 5. Kernel ELM and CNN based Facial Age Estimation (IEEE, 2016). Gürpınar, Furkan; Kaya, Heysem; Dibeklioglu, Hamdi; Salah, Albert Ali.
We propose a two-level system for apparent age estimation from facial images. Our system first classifies samples into overlapping age groups. Within each group, the apparent age is estimated with local regressors, whose outputs are then fused for the final estimate. We use a deformable parts model based face detector, and features from a pre-trained deep convolutional network. Kernel extreme learning machines are used for classification. We evaluate our system on the ChaLearn Looking at People 2016 - Apparent Age Estimation challenge dataset, and report a 0.3740 normal score on the sequestered test set.

Item 6. Movie Emotion Estimation with Multimodal Fusion and Synthetic Data Generation (IEEE, 2019). Karslıoğlu, Nihan; Kaya, Heysem; Salah, Albert Ali.
In this work, we propose a method for automatic emotion recognition from movie clips. This problem has applications in indexing and retrieval of large movie and video collections, summarization of visual content, selection of emotion-invoking materials, and such. Our approach aims to estimate valence and arousal values automatically. We extract audio and visual features, and summarize them via functionals, PCA, and Fisher vector encoding approaches. We used feature selection based on canonical correlation analysis. For classification, we used extreme learning machine and support vector machine classifiers. We tested our approach on the LIRIS-ACCEDE database with ground truth annotations. The class imbalance problem was addressed by generating synthetic data. By fusing the best features at score and feature level, we obtain good results on this problem, especially for valence prediction.

Item 7. Multi-modal Score Fusion and Decision Trees for Explainable Automatic Job Candidate Screening from Video CVs (IEEE, 2017). Kaya, Heysem; Gürpınar, Furkan; Salah, Albert Ali.
We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face and scene features are first computed from an input video CV, using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable that predicts whether the subject will be invited to the interview. The base learners are stacked to an ensemble of decision trees to produce the outputs of the quantitative stage, and a single decision tree, combined with a rule-based algorithm, produces interview decision explanations based on the quantitative results. The proposed system ranks first in both the quantitative and qualitative stages of the CVPR 2017 ChaLearn Job Candidate Screening Coopetition.
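Items 1 and 6 summarize frame-level audio/visual features with Fisher vector encoding before feeding them to classifiers. The sketch below is a simplified Fisher vector encoder that keeps only the gradients with respect to the GMM means (the full formulation also includes variance gradients); the component count, normalization steps, and usage lines are illustrative assumptions rather than the papers' exact configuration.

# Simplified Fisher vector encoding of frame-level features (mean gradients only).
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(frames, gmm):
    # frames: (T, D) matrix of frame features; gmm: fitted diagonal-covariance GMM.
    T = frames.shape[0]
    post = gmm.predict_proba(frames)                              # (T, K) responsibilities
    diff = (frames[:, None, :] - gmm.means_[None]) / np.sqrt(gmm.covariances_)
    fv = (post[:, :, None] * diff).sum(0)                         # (K, D) mean gradients
    fv /= (T * np.sqrt(gmm.weights_)[:, None])
    fv = fv.ravel()
    # Power and L2 normalization, commonly applied to Fisher vectors.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-8)

# Assumed usage: fit a diagonal-covariance GMM on pooled training frames,
# then encode each clip into a fixed-length vector.
# gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(all_frames)
# clip_fv = fisher_vector_means(clip_frames, gmm)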
Item 8. Multimodal Fusion of Audio, Scene, and Face Features for First Impression Estimation (IEEE Computer Society, 2016). Gürpınar, Furkan; Kaya, Heysem; Salah, Albert Ali.
Affective computing, particularly emotion and personality trait recognition, is of increasing interest in many research disciplines. The interplay of emotion and personality shows itself in the first impression left on other people. Moreover, ambient information, e.g. the environment and objects surrounding the subject, also affects these impressions. In this work, we employ pre-trained Deep Convolutional Neural Networks to extract facial emotion and ambient information from images for predicting apparent personality. We also investigate the Local Gabor Binary Patterns from Three Orthogonal Planes video descriptor and acoustic features extracted via the widely used openSMILE tool. We subsequently propose classifying features using a Kernel Extreme Learning Machine and fusing their predictions. The proposed system is applied to the ChaLearn Challenge on First Impression Recognition, achieving the winning test set accuracy of 0.913, averaged over the Big Five personality traits.

Item 9. Potential audio treatment predictors for bipolar mania (Wiley, 2018). Çiftçi, Elvan; Kaya, Heysem; Güleç, H.; Salah, Albert Ali.
[No abstract available]

Item 10. Predicting depression and emotions in the cross-roads of cultures, para-linguistics, and non-linguistics (Association for Computing Machinery, 2019). Kaya, Heysem; Fedotov, D.; Dresvyanskiy, D.; Doyran, M.; Mamontov, D.; Markitantov, M.; Salah, Albert Ali.
Cross-language, cross-cultural emotion recognition and accurate prediction of affective disorders are two of the major challenges in affective computing today. In this work, we compare several systems for the Detecting Depression with AI Sub-challenge (DDS) and the Cross-cultural Emotion Sub-challenge (CES), published as part of the Audio-Visual Emotion Challenge (AVEC) 2019. For both sub-challenges, we build on the baselines while introducing our own features and regression models. For the DDS challenge, where ASR transcripts are provided by the organizers, we propose simple linguistic and word-duration features. These ASR transcript-based features are shown to outperform the state-of-the-art audio-visual features for this task, reaching a test set Concordance Correlation Coefficient (CCC) of 0.344, compared with a challenge baseline of 0.120. Our results show that non-verbal parts of the signal are important for the detection of depression, and combining this with linguistic information produces the best results. For CES, the proposed systems using unsupervised feature adaptation outperform the challenge baselines on emotional primitives, reaching test set CCC performances of 0.466 and 0.499 for arousal and valence, respectively.
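Item 10 reports results in terms of the Concordance Correlation Coefficient (CCC), the standard evaluation metric of the AVEC challenges for continuous affect and depression prediction. For reference, a direct implementation of the definition (a generic helper, not taken from the paper):

# Lin's Concordance Correlation Coefficient between ground truth and predictions.
import numpy as np

def ccc(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    # CCC = 2*cov / (var_true + var_pred + (mean difference)^2)
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)

Unlike plain Pearson correlation, CCC also penalizes systematic shifts and scale differences between predictions and ground truth, which is why it is preferred for dimensional affect tasks.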
Item 11. Robust Acoustic Emotion Recognition Based on Cascaded Normalization and Extreme Learning Machines (Springer International Publishing AG, 2016). Kaya, Heysem; Karpov, Alexey A.; Salah, Albert Ali.
One of the challenges in speech emotion recognition is robust and speaker-independent emotion recognition. In this paper, we take a cascaded normalization approach, combining linear speaker-level, non-linear value-level and feature-vector-level normalization to minimize speaker-related effects and to maximize class separability with linear kernel classifiers. We use extreme learning machine classifiers on a four-class (i.e. joy, anger, sadness, neutral) problem. We show the efficacy of our proposed method on the recently collected Turkish Emotional Speech Database.

Item 12. The Turkish Audio-Visual Bipolar Disorder Corpus (IEEE, 2018). Ciftci, Elvan; Kaya, Heysem; Güleç, Hüseyin; Salah, Albert Ali.
This paper introduces a new audio-visual Bipolar Disorder (BD) corpus for the affective computing and psychiatric communities. The corpus is annotated for BD state, as well as Young Mania Rating Scale (YMRS) scores, by psychiatrists. The paper also presents an audio-visual pipeline for BD state classification. The investigated features include functionals of appearance descriptors extracted from fine-tuned Deep Convolutional Neural Networks (DCNN), geometric features obtained using tracked facial landmarks, as well as acoustic features extracted via the openSMILE tool. Furthermore, acoustics-based emotion models are trained on a Turkish emotional database and emotion predictions are cast on the utterances of the BD corpus. The affective scores/predictions are investigated with linear regression and correlation analyses against YMRS declines to give insights about BD, which is directly linked with emotional lability, i.e., quick changes in affect.

Item 13. Video-based emotion recognition in the wild (Academic Press / Elsevier Science, 2019). Salah, Albert Ali; Kaya, Heysem; Gürpınar, Furkan.
[No abstract available]

Item 14. Video-based emotion recognition in the wild using deep transfer learning and score fusion (Elsevier, 2017). Kaya, Heysem; Gürpınar, Furkan; Salah, Albert Ali.
Multimodal recognition of affective states is a difficult problem, unless the recording conditions are carefully controlled. For recognition in the wild, large variances in face pose and illumination, cluttered backgrounds, occlusions, audio and video noise, as well as issues with subtle cues of expression are some of the issues to target. In this paper, we describe a multimodal approach for video-based emotion recognition in the wild. We propose using summarizing functionals of complementary visual descriptors for video modeling. These features include deep convolutional neural network (CNN) based features obtained via transfer learning, for which we illustrate the importance of flexible registration and fine-tuning. Our approach combines audio and visual features with least squares regression based classifiers and weighted score level fusion. We report state-of-the-art results on the EmotiW Challenge for in-the-wild facial expression recognition. Our approach scales to other problems, and ranked top in the ChaLearn-LAP First Impressions Challenge 2016 from video clips collected in the wild.
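Item 11 describes a cascaded normalization scheme: linear speaker-level, non-linear value-level, and feature-vector-level normalization applied in sequence before linear-kernel classification. The sketch below illustrates the general shape of such a cascade; the concrete choices (z-scoring per speaker, tanh squashing, L2 vector normalization) are assumptions for illustration and not necessarily the exact functions used in the paper.

# Illustrative cascade of speaker-level, value-level and vector-level normalization.
import numpy as np

def cascaded_normalize(X, speaker_ids):
    # X: (N, D) acoustic feature matrix; speaker_ids: length-N array of speaker labels.
    X = np.asarray(X, dtype=float).copy()
    speaker_ids = np.asarray(speaker_ids)
    # 1) Speaker-level linear normalization (z-score within each speaker).
    for spk in np.unique(speaker_ids):
        idx = speaker_ids == spk
        mu, sd = X[idx].mean(0), X[idx].std(0) + 1e-8
        X[idx] = (X[idx] - mu) / sd
    # 2) Value-level non-linear normalization (squash remaining outliers).
    X = np.tanh(X)
    # 3) Feature-vector-level normalization (unit L2 norm per instance).
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-8
    return X

The intent of the cascade is to remove speaker-dependent offsets and scales first, limit the influence of outlying values second, and finally place all instances on a comparable scale so that linear-kernel classifiers (such as ELMs) can separate the emotion classes.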