Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

The use of robotic prosthetic limbs is becoming more widespread. Even relatively simple limb replacements are associated with an increase in quality of life, and one can only imagine how much further that quality of life could be improved by high-precision, intelligent grasp and gesture prediction technologies. This is the main aim of a scientific study recently published as a pre-print on arXiv.

Image credit: Mehrshad Zandigohar et al, arXiv:2104.03893

What is this research about?

The researchers behind this work devised a multimodal data fusion method to better predict the intent of a user controlling a robotic hand prosthesis. The work covers both the collection of a dataset and the development of a novel method that combines two different grasp detection modes. The dataset consists of first-person video, gaze, and dynamic EMG data. This data is first classified and segmented using each modality independently (dynamic EMG and visual grasp detection), and the results are then compared with the multimodal fusion of the two, with the aim of achieving better robustness and accuracy.

Why was this research conducted?

The aim of this research was to help amputees, especially lower arm amputees. According to the statistics cited in the paper, approximately 1.6 million people in the United States were living with the loss of a limb in 2005, and most of them preferred a prosthetic limb as a replacement. Undoubtedly, bionic prosthetics hold vast potential to improve the quality of life of their users.

What are the limitations of the existing bionic models?

Robotic prosthetics or bionic arms are usually fitted to patients with the promise of enabling object manipulation in day-to-day activities, but current methods have limitations. Bionic prosthetics are currently based on physiological methods like EEG (electroencephalography) and EMG (electromyography). Because these signals are physiological, they are affected by factors such as muscle fatigue, electromagnetic interference, unexpected shifts of electrodes, motion artifacts, and variation in the impedance of the skin-electrode junction over time. Visual evidence is also affected by factors like lighting, occlusion, and the change in the apparent shape of an object depending on the angle from which it is viewed. In short, current models are susceptible to error because their inputs are altered by a number of intrinsic and extrinsic factors.

How were the experiments conducted?

Experimental data was collected from five healthy subjects (four male and one female), all of whom gave their full consent. All the subjects were right-handed, and only the dominant hand was studied in this experiment.

An MVC (Maximum Voluntary Contraction) test was conducted on all the involved muscles at the beginning of the session. The subjects then performed a series of pre-designed motions while data was collected using EMG electrodes and eye-tracking equipment.
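MVC recordings are commonly used to put EMG amplitudes from different muscles and subjects on a comparable scale. The article does not say how the authors used theirs, so the following is only a minimal sketch of that common practice, not the paper's pipeline. The function names, the non-overlapping RMS window, and the sample values are all assumptions made for illustration.

```python
import math


def rms_envelope(signal, window):
    """RMS amplitude over consecutive, non-overlapping windows of samples."""
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    envelope = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        envelope.append(math.sqrt(sum(x * x for x in chunk) / window))
    return envelope


def normalize_to_mvc(task_signal, mvc_signal, window):
    """Express a task recording as a fraction of the peak MVC amplitude.

    Assumption: the reference level is the largest RMS window found in the
    MVC recording of the same muscle.
    """
    mvc_envelope = rms_envelope(mvc_signal, window)
    if not mvc_envelope:
        raise ValueError("MVC recording is shorter than one window")
    mvc_peak = max(mvc_envelope)
    if mvc_peak == 0:
        raise ValueError("MVC recording contains no activity")
    return [value / mvc_peak for value in rms_envelope(task_signal, window)]


# Made-up samples for one muscle: the task contraction is half as strong
# as the maximum voluntary contraction.
mvc_recording = [2.0, -2.0, 2.0, -2.0]
task_recording = [1.0, -1.0, 1.0, -1.0]
print(normalize_to_mvc(task_recording, mvc_recording, window=2))
```

A value near 1.0 means the muscle is working close to its recorded maximum, which makes recordings from different electrodes easier to compare.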

How is the Multimodal Fusion method an upgrade to the existing modes?

This novel method aims to combine the strengths of EMG and visual evidence while reducing the weaknesses of each. The scientists presented a ‘Bayesian evidence fusion’ framework using neural network models. They then analyzed performance as a function of time as the user’s hand approaches and grasps the object in front of it.

The collected data and the resulting data processing model demonstrate that a multimodal fusion system is more accurate than either of the grasp classification modes used on its own.
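To make the idea of Bayesian evidence fusion more concrete, here is a minimal sketch of one textbook way to combine the outputs of two classifiers. It assumes the EMG and vision evidence are conditionally independent given the grasp type, in which case the fused posterior is proportional to p(c | EMG) · p(c | vision) / p(c). This is an illustration of the general principle, not the authors' implementation. The function name, the grasp labels, and the probabilities are hypothetical.

```python
def fuse_posteriors(emg_posterior, vision_posterior, prior=None):
    """Fuse two per-class posteriors under a conditional-independence assumption.

    Each argument is a list of class probabilities in the same class order.
    If no prior is given, a uniform prior over the classes is assumed.
    """
    n_classes = len(emg_posterior)
    if n_classes == 0 or len(vision_posterior) != n_classes:
        raise ValueError("both posteriors must cover the same non-empty class set")
    if prior is None:
        prior = [1.0 / n_classes] * n_classes
    if len(prior) != n_classes or any(p <= 0 for p in prior):
        raise ValueError("prior must give a positive probability to every class")

    # Unnormalized fused score per class: p(c | emg) * p(c | vision) / p(c)
    scores = [e * v / p for e, v, p in zip(emg_posterior, vision_posterior, prior)]
    total = sum(scores)
    if total == 0:
        raise ValueError("the two classifiers assign zero probability to every class")
    return [score / total for score in scores]


# Hypothetical grasp classes and classifier outputs, for illustration only.
grasp_types = ["power", "pinch", "open palm"]
emg_output = [0.6, 0.3, 0.1]
vision_output = [0.5, 0.4, 0.1]

fused = fuse_posteriors(emg_output, vision_output)
best = grasp_types[fused.index(max(fused))]
print(best, [round(p, 3) for p in fused])
```

When both classifiers lean toward the same grasp, the fused estimate is more confident than either input. When they disagree, the more confident modality dominates, which is one way the two sources can compensate for each other.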

What are the limitations of this method?

As mentioned above, this method fuses physiological and visual evidence and relies on the two being complementary for optimum performance.

While the arm is at rest, the camera sees the object of interest clearly, so the visual classifier is the more accurate of the two. When the arm is active and the subject reaches out to an object, the EMG signal becomes the more informative source. The fusion-based method outperformed the individual classifiers in all scenarios, achieving a total grasp classification accuracy of 95.3%.

Compared to existing methods, the only apparent drawback of this technique is that more computational power is needed to process all the data. From a practical perspective, however, this is a limitation that can be resolved with the current state of computational technology.

What is the future scope of this research?

This research could potentially spark the next generation of bio-prosthetics. With a comprehensive multidisciplinary integration of neural networks, programming, and biomechanics, it could be the key to helping amputees across the world. A new generation of smart prostheses could feel like a part of the amputee’s own body while also providing seamless movement and useful real-life functionality.

Source: Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif, Sezen Yagmur Gunay, Mariusz P. Furmanek, Mathew Yarossi, Paolo Bonato, Cagdas Onal, Taskin Padir, Deniz Erdogmus, Gunar Schirner, “Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control”, arXiv pre-print 2104.03893 (2021)

