Title: Feature Selection and Multimodal Fusion for Estimating Emotions Evoked by Movie Clips
Authors: Timar, Yasemin; Karslıoğlu, Nihan; Kaya, Heysem; Salah, Albert Ali
Date Accessioned: 2022-05-11
Date Available: 2022-05-11
Date Issued: 2018
ISBN: 978-1-4503-5046-4
DOI: 10.1145/3206025.3206074
URI: https://doi.org/10.1145/3206025.3206074
Handle: https://hdl.handle.net/20.500.11776/6108
Conference: 8th ACM International Conference on Multimedia Retrieval (ACM ICMR), June 11-14, 2018, Yokohama, Japan
Type: Conference Object
Pages: 405-412
Language: en
Rights: info:eu-repo/semantics/closedAccess
Keywords: Affective computing; multimodal interaction; emotion estimation; audio-visual features; movie analysis; face analysis; Extreme Learning Machine
WOS: WOS:000461145900055

Abstract: Perceptual understanding of media content has many applications, including content-based retrieval, marketing, content optimization, psychological assessment, and affect-based learning. In this paper, we model audio-visual features extracted from videos via machine learning approaches to estimate the affective responses of the viewers. We use the LIRIS-ACCEDE dataset and the MediaEval 2017 Challenge setting to evaluate the proposed methods. This dataset is composed of movies of professional or amateur origin, annotated with viewers' arousal, valence, and fear scores. We extract a number of audio features, such as Mel-frequency Cepstral Coefficients, and visual features, such as dense SIFT, hue-saturation histograms, and features from a deep neural network trained for object recognition. We contrast two different approaches in the paper, and report experiments with different fusion and smoothing strategies. We demonstrate the benefit of feature selection and multimodal fusion for estimating affective responses to movie segments.
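Note: The sketch below is only an illustration of the kind of pipeline the abstract describes (audio/visual feature extraction, feature selection, and feature-level fusion for affect regression). It is not the authors' implementation; the function names, feature dimensions, and the choice of librosa, NumPy, and scikit-learn are assumptions made for the example.

```python
# Hypothetical sketch: clip-level audio/visual descriptors, early fusion,
# feature selection, and a simple regressor for an affect score (e.g. arousal).
import numpy as np
import librosa
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def audio_features(wav_path):
    """Mean-pooled MFCCs over a clip (MFCCs are one of the audio features named in the abstract)."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)  # 20-dim clip-level descriptor

def visual_features(hsv_frames):
    """Hue-saturation histogram averaged over sampled frames (hsv_frames: list of HSV image arrays)."""
    hists = [np.histogram2d(f[..., 0].ravel(), f[..., 1].ravel(), bins=(8, 8))[0].ravel()
             for f in hsv_frames]
    return np.mean(hists, axis=0)  # 64-dim clip-level descriptor

def early_fusion(audio_vec, visual_vec):
    """Feature-level (early) fusion: concatenate the per-modality descriptors."""
    return np.concatenate([audio_vec, visual_vec])

# Feature selection followed by a linear regressor; fit on fused descriptors X
# and continuous affect annotations y (arousal, valence, or fear scores).
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_regression, k=32),
                      Ridge(alpha=1.0))
# model.fit(X_train, y_train); predictions = model.predict(X_test)
```

A late-fusion variant would instead train one regressor per modality and combine their predictions (e.g. by weighted averaging); the abstract's reference to different fusion strategies covers both styles, but the specific combination rule used in the paper is not stated here.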