Affective computing has gained momentum in recent years due to the increasing recognition of the importance of emotional intelligence in human-computer interaction and user-experience design. Jaspar Pahl, a computer scientist and affective computing enthusiast, says: "Data is the most challenging aspect of affective computing." He and his team develop automated methods to visualize emotional responses in human-machine interactions using multimodal data. We took a deep dive with him on the significance of high-quality data in affective computing.
Jaspar, I just heard in a podcast on IoT that data is the most challenging aspect of the field. Why is that?
Jaspar Pahl: In affective computing, we do not have many datasets because they are very difficult to generate. We need to bring study subjects into different emotional states and then measure the physiological signals. Afterwards, the data is labeled and categorized into specific emotions. Compared to, let's say, classifying images for an image classification algorithm, the effort to generate data for affective computing is a lot higher – especially when you want high-quality emotion labels.
For us, data generation is the biggest challenge
The shortage of data for affective computing also motivated us to make the ADAbase Dataset available for the research community. ADAbase is a dataset we created for cognitive load assessment in autonomous driving environments.
Can you walk us through how you generate a dataset in more detail?
Sure. In our lab, we have an exposure cabin for emotion analysis called EmotionAI Box. Our study subjects are equipped with wearables and other sensors and tracked with a camera-based monitoring system during analysis inside the box. Both the sensors and the monitoring system are continuously measuring physiological and behavioral signals while the subjects are presented with stimuli that are supposed to trigger emotional responses. These stimuli can be images or video sequences. Each dataset we generate is basically a synchronized version of what’s happening inside the EmotionAI Box: It includes the time stamp when the stimulus is shown and which stimulus is shown, the subject’s reactions to the stimulus (meaning the different physiological and behavioral measurements), and any additional contextual information.
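The synchronized record Jaspar describes could be sketched roughly as follows. This is an illustrative data structure only, not the team's actual format; all field names (stimulus ID, heart rate, skin conductance, gaze) are assumptions standing in for whichever physiological and behavioral signals the sensors actually capture.

```python
from dataclasses import dataclass, field


@dataclass
class TrialRecord:
    """One synchronized sample from an exposure session (illustrative sketch)."""
    timestamp_s: float             # when the stimulus was shown, relative to session start
    stimulus_id: str               # which image or video sequence was presented
    heart_rate_bpm: float          # example physiological measurement
    skin_conductance_us: float     # example physiological measurement (microsiemens)
    gaze_xy: tuple                 # example behavioral measurement from the camera system
    context: dict = field(default_factory=dict)  # any additional contextual information
    emotion_label: str = ""        # assigned during post-hoc labeling


# Example: one record captured 12.5 s into a session
sample = TrialRecord(
    timestamp_s=12.5,
    stimulus_id="video_042",
    heart_rate_bpm=78.0,
    skin_conductance_us=4.2,
    gaze_xy=(0.48, 0.55),
    context={"session": "S01"},
    emotion_label="surprise",
)
```

A full dataset would then be a time-ordered list of such records per subject, which is what makes later alignment of stimulus onsets with sensor streams straightforward.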
When we generate data for applications in the automotive sector, we use a driving simulator rather than the EmotionAI Box – just like in the study on cognitive load estimation in autonomous vehicles. It has a similar sensor and monitoring setup to our exposure cabin, but also has a car seat, a steering wheel, and a multi-monitor setup to create an immersive, close-to-real-life driving experience. The data acquisition process in the driving simulator is pretty much the same as in the EmotionAI Box: The participants are wired up and presented with different stimuli or asked to complete different tasks, while we monitor everything and label the different emotions or affective states.
Biases within the data will lead to biased algorithms.
Another core issue – as with all data used to train AIs – seems to be data quality, especially when it comes to the balance of the dataset. Can you elaborate on that?
We need balanced datasets to get optimal results with our algorithms. Balanced in our case means not only that the emotions are shown with comparable frequency, but also that we have a proportional representation of different ethnicities, genders, and age groups to avoid generating biased or limited datasets. Not taking these factors into account can lead to algorithms that will ultimately reproduce biases inside the data. Having a balanced dataset is a big challenge, though, because you simply need a lot of data.
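The balance criteria Jaspar names – comparable emotion frequencies plus proportional demographic representation – can be audited with a simple per-attribute frequency report. This is a minimal sketch under the assumption that each sample carries `emotion`, `gender`, `age_group`, and `ethnicity` fields; the attribute names and the toy records are illustrative, not from the actual ADAbase dataset.

```python
from collections import Counter


def balance_report(records, keys=("emotion", "gender", "age_group", "ethnicity")):
    """For each attribute, return the share of every observed value."""
    report = {}
    for key in keys:
        counts = Counter(r[key] for r in records)
        total = sum(counts.values())
        report[key] = {value: count / total for value, count in counts.items()}
    return report


# Toy example: emotions are balanced, but ethnicity "A" dominates
records = [
    {"emotion": "joy",  "gender": "f", "age_group": "18-30", "ethnicity": "A"},
    {"emotion": "joy",  "gender": "m", "age_group": "31-50", "ethnicity": "B"},
    {"emotion": "fear", "gender": "f", "age_group": "18-30", "ethnicity": "A"},
    {"emotion": "fear", "gender": "m", "age_group": "51+",   "ethnicity": "A"},
]

report = balance_report(records)
# report["emotion"]   -> {"joy": 0.5, "fear": 0.5}   (balanced)
# report["ethnicity"] -> {"A": 0.75, "B": 0.25}      (skewed)
```

A report like this flags skew early, before it silently propagates into a trained model.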
So how do you solve the problem?
As of now, we cannot balance datasets 100%. That's why an important aspect of our research is bias research. We try to find ways to overcome this issue with robust statistical methods. For instance, if our experimental group includes only one individual from a particular ethnicity, we train our algorithm using a larger proportion of data from this individual compared to datasets from test subjects of other ethnicities. This approach helps mitigate biases and consequently enhances the performance of our algorithms.
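One common statistical method in the spirit of what Jaspar describes is inverse-frequency sample weighting, where data from underrepresented groups counts proportionally more during training. This is a generic sketch of that technique, not the team's actual implementation; the group labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(group_labels):
    """Weight each sample inversely to its group's frequency, so every
    group contributes the same total weight to the training loss."""
    counts = Counter(group_labels)
    n_groups = len(counts)
    total = len(group_labels)
    # Each group's weights sum to total / n_groups
    return [total / (n_groups * counts[g]) for g in group_labels]


# Example: a single subject from group "C" among mostly "A" and "B"
groups = ["A", "A", "A", "B", "B", "C"]
weights = inverse_frequency_weights(groups)
# -> [0.667, 0.667, 0.667, 1.0, 1.0, 2.0]  (the lone "C" sample is upweighted)
```

These weights can then be passed as per-sample weights to most training frameworks, so the loss treats each group as if it were equally represented.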
Balanced data, rigorous testing and validation of the algorithms, data privacy — those are the major challenges we need to address.
To sum it up, algorithms for affective computing are only as good as the data they are trained with. Can you highlight some other considerations you need to address when developing them?
We already covered the aspect of biased datasets, but this is actually just one of the methodological considerations when developing robust algorithms. We also need rigorous testing and validation of these algorithms to make sure they work across different contexts and applications, for different demographics and subgroups.
Another issue we haven’t touched on yet is privacy and the whole discussion on potential misuse of sensitive information and affective computing technologies in general. The analysis of facial expressions and bio-signals, and the labeling of emotions, involve highly sensitive data. Therefore, we always have to take ethical considerations into account and be transparent about our data protection protocols.
Sounds like that’s a discussion for another day – we’ll be looking forward to that. Thank you so much, Jaspar, for sharing these insights with us.
Image Copyright: Fraunhofer IIS