Medical Visual Question Answering

VQA system for X-ray and MRI scans using NLP and Computer Vision — 90% test accuracy

Duration: February 2021 – May 2021

Overview

This project tackled Medical Visual Question Answering (VQA) — given a medical image (X-ray or MRI scan of the brain, kidney, or lungs) and a natural language question, the system produces a relevant answer by attending jointly over the image and question.

Problem Statement

Medical imaging generates vast amounts of diagnostic data, but interpreting it requires specialized expertise. A VQA system can assist clinicians by answering specific questions about scan findings, serving as a useful second-opinion tool in medical workflows.

Approach

  • Extracted visual features from medical images using a CNN backbone, producing spatial feature maps.
  • Encoded question text using NLP techniques (tokenization and embedding).
  • Applied multi-modal attention to fuse visual and textual representations, enabling the model to focus on image regions relevant to the question.
  • Built and trained end-to-end using TensorFlow.
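The pipeline above can be sketched in TensorFlow/Keras as a single functional model. This is a minimal illustration, not the project's actual code: the backbone choice (MobileNetV2), the shared 256-dimensional embedding size, the LSTM question encoder, and the vocabulary/answer-set sizes are all assumptions made for the sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical hyperparameters -- the project does not specify these.
VOCAB_SIZE = 5000    # assumed question vocabulary size
MAX_Q_LEN = 20       # assumed maximum question length (tokens)
NUM_ANSWERS = 100    # assumed size of the candidate answer set

def build_vqa_model():
    # Visual branch: CNN backbone producing a spatial feature map,
    # flattened into a set of image-region vectors.
    image_in = layers.Input(shape=(224, 224, 3), name="image")
    backbone = tf.keras.applications.MobileNetV2(
        include_top=False, weights=None, input_shape=(224, 224, 3))
    feat_map = backbone(image_in)                    # (7, 7, 1280)
    regions = layers.Reshape((49, 1280))(feat_map)   # 49 spatial regions
    regions = layers.Dense(256)(regions)             # project to shared dim

    # Text branch: tokenized question -> embedding -> recurrent encoding.
    question_in = layers.Input(shape=(MAX_Q_LEN,), dtype="int32",
                               name="question")
    q_emb = layers.Embedding(VOCAB_SIZE, 256)(question_in)
    q_vec = layers.LSTM(256)(q_emb)                  # question summary vector

    # Multi-modal attention: the question vector acts as the query and
    # attends over the image regions, focusing on question-relevant areas.
    q_query = layers.Reshape((1, 256))(q_vec)
    attended = layers.Attention()([q_query, regions])  # (1, 256)
    attended = layers.Reshape((256,))(attended)

    # Fuse attended visual features with the question and classify.
    fused = layers.Concatenate()([attended, q_vec])
    answer = layers.Dense(NUM_ANSWERS, activation="softmax")(fused)
    return tf.keras.Model([image_in, question_in], answer)
```

Answer prediction is framed here as classification over a fixed answer set, a common simplification in VQA; training end-to-end then reduces to `model.compile(...)` with a cross-entropy loss and `model.fit(...)` on (image, question, answer) triples.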

Results

  • Achieved approximately 90% accuracy on the test dataset.
  • The model generalized well across different scan types (brain, kidney, lungs) and varied question formats.

Skills & Tools

Python · TensorFlow · Computer Vision · NLP · Multi-modal Attention · Medical Imaging

View on GitHub