Medical Visual Question Answering

VQA system for X-ray and MRI scans using NLP and Computer Vision — 90% test accuracy

Duration: February 2021 – May 2021

Overview

This project tackled Medical Visual Question Answering (VQA) — given a medical image (X-ray or MRI scan of the brain, kidney, or lungs) and a natural language question, the system produces a relevant answer by attending jointly over the image and question.

Problem Statement

Medical imaging generates vast amounts of diagnostic data, but interpreting it requires specialized expertise. A VQA system can assist clinicians by answering specific questions about scan findings, serving as a useful second-opinion tool in medical workflows.

Approach

  • Extracted visual features from medical images using a CNN backbone, producing spatial feature maps.
  • Encoded question text using NLP techniques (tokenization and embedding).
  • Applied multi-modal attention to fuse visual and textual representations, enabling the model to focus on image regions relevant to the question.
  • Built and trained end-to-end using TensorFlow.
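The pipeline above can be sketched in TensorFlow/Keras as a single functional model. This is a minimal illustration, not the project's actual code: the backbone choice (MobileNetV2), the shared 256-dimensional embedding size, the LSTM question encoder, and the vocabulary/answer-set sizes are all assumptions made for the sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical hyperparameters -- the project does not specify these.
VOCAB_SIZE = 5000    # assumed question vocabulary size
MAX_Q_LEN = 20       # assumed maximum question length (tokens)
NUM_ANSWERS = 100    # assumed size of the candidate answer set

def build_vqa_model():
    # Visual branch: CNN backbone producing a spatial feature map,
    # flattened into a set of image-region vectors.
    image_in = layers.Input(shape=(224, 224, 3), name="image")
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights=None, input_shape=(224, 224, 3))
    feat_map = backbone(image_in)                    # (7, 7, 1280)
    regions = layers.Reshape((49, 1280))(feat_map)   # 49 spatial regions
    regions = layers.Dense(256)(regions)             # project to shared dim

    # Text branch: tokenized question -> embedding -> recurrent encoding.
    question_in = layers.Input(shape=(MAX_Q_LEN,), dtype="int32",
                               name="question")
    q_emb = layers.Embedding(VOCAB_SIZE, 256)(question_in)
    q_vec = layers.LSTM(256)(q_emb)                  # question summary vector

    # Multi-modal attention: the question vector acts as the query and
    # attends over the image regions, focusing on question-relevant areas.
    q_query = layers.Reshape((1, 256))(q_vec)
    attended = layers.Attention()([q_query, regions])  # (1, 256)
    attended = layers.Reshape((256,))(attended)

    # Fuse attended visual features with the question and classify.
    fused = layers.Concatenate()([attended, q_vec])
    answer = layers.Dense(NUM_ANSWERS, activation="softmax")(fused)
    return tf.keras.Model([image_in, question_in], answer)
```

Answer prediction is framed here as classification over a fixed answer set, a common simplification in VQA; training end-to-end then reduces to `model.compile(...)` with a cross-entropy loss and `model.fit(...)` on (image, question, answer) triples.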

Results

  • Achieved approximately 90% accuracy on the test dataset.
  • The model generalized well across different scan types (brain, kidney, lungs) and varied question formats.

Skills & Tools

Python · TensorFlow · Computer Vision · NLP · Multi-modal Attention · Medical Imaging

View on GitHub