MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records

Nowadays, hospitals record health data in digital format. Deep learning can use these records to predict diagnoses, suggest treatments, or model correlations between medical events.

However, an electronic health record consists of different types of information: continuous features such as lab tests, categorical features such as diagnosis codes, and free-text clinical notes. For now, models that process such multimodal data are designed by hand and are task-specific.

 Image credit: Amanda Mills, USCDCP, CC0 Public Domain via Pixnio

A recent study proposes a novel way to optimize strategies for fusing multimodal data in health records. It searches jointly for an independent architecture for each modality and for the fusion strategy that combines those architectures at the right representation level. The experiments show that the suggested approach outperforms established unimodal neural architecture search methods on tasks such as diagnosis code prediction.
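To make "fusing at the right representation level" concrete, here is a minimal sketch of two hand-written fusion choices. It is not the architecture or search space from the study: the function names, the plain dense layers, and the toy feature sizes are all assumptions made for illustration. Early fusion concatenates the raw modality features before any modeling, while late fusion first encodes each modality separately and only then combines the results.

```python
def linear(x, weights):
    """Apply a bias-free dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def early_fusion(modalities, shared_weights):
    """Concatenate raw modality features, then apply one shared layer."""
    fused = [v for features in modalities for v in features]
    return linear(fused, shared_weights)


def late_fusion(modalities, per_modality_weights, head_weights):
    """Encode each modality with its own layer, then fuse the encodings."""
    encoded = [linear(m, w) for m, w in zip(modalities, per_modality_weights)]
    fused = [v for e in encoded for v in e]
    return linear(fused, head_weights)


# Hypothetical toy inputs: one lab value and one code indicator.
labs, codes = [1.0], [2.0]
early = early_fusion([labs, codes], [[1.0, 1.0]])
late = late_fusion([labs, codes], [[[2.0]], [[3.0]]], [[1.0, 1.0]])
```

A search method in the spirit of the study would treat the choice between such fusion points, and the layers used for each modality, as decisions to be optimized rather than fixed by hand.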

One important challenge of applying deep learning to electronic health records (EHR) is the complexity of their multimodal structure. EHR usually contains a mixture of structured (codes) and unstructured (free-text) data with sparse and irregular longitudinal features — all of which doctors utilize when making decisions. In the deep learning regime, determining how different modality representations should be fused together is a difficult problem, which is often addressed by handcrafted modeling and intuition. In this work, we extend state-of-the-art neural architecture search (NAS) methods and propose MUltimodal Fusion Architecture SeArch (MUFASA) to simultaneously search across multimodal fusion strategies and modality-specific architectures for the first time. We demonstrate empirically that our MUFASA method outperforms established unimodal NAS on public EHR data with comparable computation costs. In addition, MUFASA produces architectures that outperform Transformer and Evolved Transformer. Compared with these baselines on CCS diagnosis code prediction, our discovered models improve top-5 recall from 0.88 to 0.91 and demonstrate the ability to generalize to other EHR tasks. Studying our top architecture in depth, we provide empirical evidence that MUFASA’s improvements are derived from its ability to both customize modeling for each data modality and find effective fusion strategies.
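The abstract reports results as top-5 recall. The sketch below shows one common way to compute such a metric, assuming each example has a single true label and that recall counts how often this label appears among the five highest-scoring predictions. The exact definition used in the paper may differ, and the function name `top_k_recall` is hypothetical.

```python
def top_k_recall(scores, true_labels, k=5):
    """Fraction of examples whose true label is among the k top-scoring classes.

    scores: list of per-class score lists, one per example.
    true_labels: list of class indices, one per example (single-label assumption).
    """
    hits = 0
    for row, label in zip(scores, true_labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with three classes and two patients.
toy_scores = [[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]]
toy_labels = [1, 2]
recall_at_1 = top_k_recall(toy_scores, toy_labels, k=1)
recall_at_2 = top_k_recall(toy_scores, toy_labels, k=2)
```

In the toy data, the second patient's true class is ranked second, so it counts as a hit only once k is at least 2.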

Research paper: Xu, Z., So, D. R., and Dai, A. M., “MUFASA: Multimodal Fusion Architecture Search for Electronic Health Records”, 2021. Link:

