In an increasingly complex world, users must process ever more information, which makes decision-making all the more demanding. Understanding cognitive load (i.e., the mental burden placed on an individual’s cognitive resources during task performance) is essential in this context, particularly in high-stakes environments where multiple tasks must be managed simultaneously. (Read what this entails in autonomous driving scenarios, for example.) But what should the data foundation and the underlying machine learning models look like to effectively assess cognitive load?
The ability of multimodal machine learning models to assess cognitive load is frequently examined in current research. By integrating various data modalities, such as physiological signals and behavioral data, these models present promising approaches for evaluating cognitive load. However, a common issue is that these models are often assessed only in the scenarios for which they were trained, limiting their robustness in real-world applications.
Cognitive load assessment in varying settings
Fraunhofer IIS, specifically the Medical Data Analysis Group, tackles this research gap by examining the performance of these models under varying data distributions that may arise during deployment. The emphasis is on the models’ ability to manage scenarios that deviate from the training data.
To achieve this, we reanalyzed our own recorded data on cognitive load (learn more about ADABase), which includes two distinct scenarios: an n-back test that assesses working memory performance and a close-to-real-world driving simulation that poses complex simultaneous monitoring tasks (see Oppelt et al. 2023 [1]).
In this study, we selected various classical and deep learning architectures to evaluate their prediction accuracy and uncertainty estimates. The findings indicate that late fusion of models – where predictions from individual modalities are combined – produces more stable classification results. This approach enhances robustness compared to feature-based fusion methods, which involve combining the features from various modalities (such as physiological signals and behavioral data) into a single feature vector. Feature-based fusion methods often suffer from biases, as they rely heavily on specific features that may not generalize well across different scenarios or datasets.
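To make the distinction concrete, here is a minimal sketch contrasting the two fusion strategies. The feature arrays, modality names, and classifier choice are illustrative placeholders, not the exact pipeline used in the study:

```python
# Sketch: feature-based (early) fusion vs. late fusion of per-modality models.
# Feature arrays and classifier are placeholders, not the study's actual setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
ecg_features = rng.normal(size=(n, 16))   # e.g., heart-rate-variability features
eye_features = rng.normal(size=(n, 8))    # e.g., pupil diameter, fixation statistics
labels = rng.integers(0, 2, size=n)       # low vs. high cognitive load

# Feature-based (early) fusion: concatenate all modalities into one feature vector
early_model = LogisticRegression(max_iter=1000).fit(
    np.hstack([ecg_features, eye_features]), labels
)

# Late fusion: train one model per modality and combine their predictions
ecg_model = LogisticRegression(max_iter=1000).fit(ecg_features, labels)
eye_model = LogisticRegression(max_iter=1000).fit(eye_features, labels)

def late_fusion_proba(ecg_x, eye_x):
    """Average the per-modality class probabilities (unweighted late fusion)."""
    return (ecg_model.predict_proba(ecg_x) + eye_model.predict_proba(eye_x)) / 2
```

Because the late-fusion combination happens at the prediction level, a modality whose feature distribution shifts at deployment degrades only its own contribution rather than distorting a single shared feature space.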

Illustration of two scenarios in the field of cognitive load detection: the driving simulator (blue) and the n-back psychological baseline test (green). Models should not only exhibit strong performance in a single application but also demonstrate effective generalization across different contexts.
Should I stay or should I go? Why uncertainty can be an asset
A key conclusion of the study is that a model’s ability to quantify uncertainties is essential for its trustworthiness. Well-calibrated uncertainty estimates allow models to make accurate predictions while also recognizing their own uncertainties. This, in turn, helps minimize misjudgments and enhances system interactivity. For instance, in high-uncertainty situations, a model could propose alternative actions or prompt the user to indicate their current cognitive load, thereby enhancing the system’s adaptability and leading to better outcomes in high-stakes scenarios.
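One simple way to operationalize such behavior is an uncertainty-based deferral rule: if the predictive entropy of the fused class probabilities exceeds a threshold, the system defers (for example, by asking the user about their current load) instead of acting on the prediction. The threshold value and the decision labels below are hypothetical, chosen purely for illustration:

```python
# Sketch of an uncertainty-aware decision rule based on predictive entropy.
# Threshold and labels are illustrative assumptions, not values from the study.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> float:
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-(probs * np.log(probs)).sum())

def decide(probs: np.ndarray, threshold: float = 0.6) -> str:
    if predictive_entropy(probs) > threshold:
        return "defer"            # e.g., prompt the user or propose an alternative action
    return "high_load" if probs[1] > 0.5 else "low_load"

print(decide(np.array([0.95, 0.05])))  # confident prediction -> "low_load"
print(decide(np.array([0.55, 0.45])))  # near-uniform probabilities -> "defer"
```

The rule is only as good as the calibration of the underlying probabilities: an overconfident model will rarely defer, even when it is wrong.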
The analysis of the results also reveals significant variations in the performance of individual modalities between the n-back test and the monitoring tasks in the driving simulation. While eye tracking performed best in the n-back test, other physiological signals, such as the electrocardiogram (ECG), showed superior performance in the driving simulation. These findings underscore the necessity of developing robust models that can operate reliably under dynamic and unpredictable conditions.
Navigating data variations: The linchpin of multimodal machine learning models
Our research highlights the critical importance of robustness and trustworthiness in the development of future systems for estimating cognitive load, revealing the strengths and weaknesses of multimodal machine learning models in handling data variations (see Foltyn et al. 2024 [2]). To further enhance the robustness of these systems, we recommend utilizing diverse datasets and investigating various types of data shifts. After all, the combination of robustness and precise uncertainty estimation can substantially enhance the development of trustworthy systems for cognitive load assessment and other applications.
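A lightweight way to start investigating such data shifts is a cross-scenario check: train on data from one scenario (e.g., the n-back test) and evaluate on the other (the driving simulation). The function below is a minimal sketch of this idea; variable names and the classifier are placeholders, not the study’s exact evaluation protocol:

```python
# Sketch of a cross-scenario robustness check: fit in one scenario,
# then compare in-domain accuracy with accuracy under the shifted scenario.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def cross_scenario_eval(X_train, y_train, X_shifted, y_shifted):
    """Train in one scenario, test under the other scenario's distribution."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return {
        "in_domain_accuracy": accuracy_score(y_train, model.predict(X_train)),
        "shifted_accuracy": accuracy_score(y_shifted, model.predict(X_shifted)),
    }
```

A large gap between the two numbers is a warning sign that the model leans on scenario-specific features rather than generalizable indicators of cognitive load.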
References
[1] Oppelt, M.P.; Foltyn, A.; Deuschel, J.; Lang, N.R.; Holzer, N.; Eskofier, B.M.; Yang, S.H. (2023) ADABase: A Multimodal Dataset for Cognitive Load Estimation. Sensors 2023, 23, 340. doi: 10.3390/s23010340 https://www.mdpi.com/1424-8220/23/1/340
[2] Foltyn, A.; Deuschel, J.; Lang-Richter, N.R.; Holzer, N.; Oppelt, M.P. (2024) Evaluating the robustness of multimodal task load estimation models. Front. Comput. Sci. 6:1371181. doi: 10.3389/fcomp.2024.1371181 https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2024.1371181/full
Image copyright (featured image): Fraunhofer IIS