Multimodal hypergraph network with contrastive learning for sentiment analysis

Huang, J.; Jiang, K.; Pu, Y.; Zhao, Z.; Yang, Q.; Gu, J.; Xu, D.

Neurocomputing 627: 129566

2025


ISSN/ISBN: 0925-2312
DOI: 10.1016/j.neucom.2025.129566
Accession: 095430684


Summary
Multimodal Sentiment Analysis (MSA) uses multimodal information, such as text, audio, and visual signals, to determine a subject's affective tendencies. Although many recent studies have adopted graph-based techniques for MSA, they have yet to fully explore the sentiment interactions both among the time steps within a single modality and across modalities. To address the limitations that arise when unimodal hypergraphs are mined for affective relationships in isolation, this paper proposes a multimodal hypergraph network based on contrastive learning. The network builds its hypergraph by treating the sequential time steps of all three modalities as a single set of nodes, with the aim of exploring the multidimensional affective relationships across uni-, bi-, and tri-modal combinations. Specifically, the initial hypergraph structure is first generated with a correlation-based construction method to ensure that the constructed hypergraph is effective. Supervised and unsupervised contrastive learning methods are then designed to optimize both feature learning and the structure of the multimodal hypergraph, adaptively and simultaneously capturing the relationships among the time-series nodes of the uni-, bi-, and tri-modal combinations. Extensive comparative and validation experiments on the English CMU-MOSI and CMU-MOSEI datasets and the Chinese CH-SIMS dataset demonstrate the advantages and effectiveness of the proposed method.
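
The summary does not give the details of the correlation-based construction, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the three modalities have already been projected to a shared feature size, that correlation means the Pearson correlation between node feature vectors, and that each node spawns one hyperedge containing itself and its k most correlated nodes. The names build_incidence, modalities_per_edge, and the parameter k are hypothetical.

```python
import numpy as np


def build_incidence(text, audio, visual, k=3):
    """Build a node-by-hyperedge incidence matrix from three modality sequences.

    Each input has shape (T_m, d): T_m time steps and a shared feature size d
    (an assumption; real features would need projecting to a common size).
    Returns (incidence, modality_ids).
    """
    # Every time step of every modality becomes one node in a single node set.
    nodes = np.concatenate([text, audio, visual], axis=0)
    modality_ids = np.concatenate([
        np.full(len(text), 0),
        np.full(len(audio), 1),
        np.full(len(visual), 2),
    ])
    n = nodes.shape[0]

    # Pearson correlation between the feature vectors of every pair of nodes.
    # Assumes no node has constant features, which would produce NaN entries.
    corr = np.corrcoef(nodes)

    # Rows index nodes, columns index hyperedges (one hyperedge per node).
    incidence = np.zeros((n, n), dtype=np.float32)
    for e in range(n):
        scores = corr[e].copy()
        scores[e] = -np.inf                      # exclude the node itself from the ranking
        neighbours = np.argsort(-scores)[:k]     # k most correlated other nodes
        incidence[neighbours, e] = 1.0
        incidence[e, e] = 1.0                    # the centre node always belongs to its edge
    return incidence, modality_ids


def modalities_per_edge(incidence, modality_ids):
    """Count distinct modalities in each hyperedge (1 = uni, 2 = bi, 3 = tri)."""
    return np.array([
        len(np.unique(modality_ids[incidence[:, e] > 0]))
        for e in range(incidence.shape[1])
    ])


rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))     # 4 text time steps
audio = rng.normal(size=(5, 8))    # 5 audio time steps
visual = rng.normal(size=(6, 8))   # 6 visual time steps

H, mod = build_incidence(text, audio, visual, k=3)
counts = modalities_per_edge(H, mod)
```

Because all time steps share one node set, a single hyperedge can join nodes from one, two, or three modalities, which is what allows uni-, bi-, and tri-modal relationships to sit in the same structure. In the paper this initial structure is then refined by contrastive learning, a step the sketch does not attempt.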