
Liver and Hepatic Vessel Segmentation with Attention-ResUNet

2025-01-05 · 12 min read

Medical image segmentation has been a hot topic for decades due to its usefulness in diagnosis and treatment planning. Historically, segmentation relied on manual or semi-automated methods—thresholding, region-growing, active contours—requiring significant human intervention. With deep learning, we've shifted to data-driven approaches that are faster and more accurate.

The liver reveals a lot about a patient's health on CT and MRI scans. Automated segmentation significantly enhances the diagnostic process, making it less dependent on manual intervention. In this post, I'll walk through the Attention Residual UNet architecture I built to segment not just the liver, but also hepatic vessels and tumors.

Try the Live Demo →

Why Not Just Thresholding?

Traditional approaches like thresholding work by marking regions based on pixel intensity values. It's fast and simple—if pixels are above threshold T, they're in; otherwise, out. The problem? Medical images don't have consistent intensity values across tissues. The liver in a CT scan can have similar Hounsfield units to surrounding tissue, making simple thresholding useless.
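To make that concrete, here's what thresholding amounts to in code (a minimal NumPy sketch; the HU window below is an illustrative assumption, not a tuned value):

import numpy as np

def threshold_segment(ct_slice: np.ndarray, low: float = 40.0, high: float = 140.0) -> np.ndarray:
    """Naive 'segmentation': keep pixels whose Hounsfield units fall in [low, high].
    Fails in practice because neighboring soft tissue overlaps the same HU range."""
    return (ct_slice >= low) & (ct_slice <= high)

On a real abdominal scan, the resulting mask bleeds into every organ that shares that intensity window, which is exactly the failure mode described above.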

CNNs changed the game by learning spatial hierarchies automatically through convolutional layers that act as feature detectors. Max pooling, convolutions, and activation functions let the network learn patterns that were impossible to hand-engineer.

The UNet Foundation

UNet's U-shaped design has an encoder path that captures features through successive convolutions and pooling, and a decoder path that upsamples these features to reconstruct spatial details. The key innovation is skip connections that bridge encoder and decoder layers—preserving fine details essential for precise segmentation.

D_i = f(E_i) + g(D_{i+1})

Skip connection formulation: encoder output E_i combines with the upsampled decoder feature D_{i+1}

This setup helps retain spatial information that would otherwise be lost during downsampling—critical when segmenting fine structures like hepatic vessels that are only a few pixels wide.
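In code, one decoder stage is compact. Here's a minimal PyTorch sketch; concatenation is the classic UNet way to combine E_i with the upsampled D_{i+1}, and the formula above abstracts over the exact combination (my actual blocks differ in detail):

import torch
import torch.nn as nn

class DecoderStage(nn.Module):
    """Upsample the deeper decoder feature, merge it with the matching
    encoder feature via the skip connection, then refine with a convolution."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, d_deeper: torch.Tensor, e_skip: torch.Tensor) -> torch.Tensor:
        d = self.up(d_deeper)              # g(D_{i+1}): learned 2× upsampling
        d = torch.cat([e_skip, d], dim=1)  # skip connection restores detail from E_i
        return self.conv(d)                # f(...): fuse into D_i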

Adding Residual Connections

The vanishing gradient problem kills deep networks. ResNet's solution: add the input directly to the output of a layer, letting the model learn "residuals" instead of absolute mappings.

y = x + F(x, {W_i})

In my implementation, residual blocks replace standard convolutional blocks in both encoder and decoder paths. Each block has two conv layers with batch normalization and ReLU, followed by the addition with the shortcut. This lets gradients flow through the shortcut path, stabilizing training even in deeper networks.
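A sketch of such a block in PyTorch (layer hyperparameters here are illustrative, not copied from my code):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv-BN layers plus a shortcut: y = x + F(x, {W_i}).
    A 1×1 conv projects the shortcut when the channel counts differ."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (
            nn.Identity() if in_ch == out_ch
            else nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gradients flow through the shortcut even if the body saturates.
        return torch.relu(self.body(x) + self.shortcut(x))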

Attention Gates: Focus on What Matters

Here's where it gets interesting. Attention mechanisms highlight relevant regions while suppressing irrelevant ones. In my model, spatial attention layers sit at each decoder stage, filtering features from the encoder based on their relevance to the target region.

The attention coefficient α is computed from query (Q), key (K), and value (V) matrices:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V

The softmax normalizes the dot products of Q and K, ensuring high values correspond to high attention areas. The output modulates V, focusing the network's resources on informative regions—like liver boundaries instead of empty background.
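For reference, that formula in code (a generic sketch of scaled dot-product attention, not this model's gate):

import torch

def scaled_dot_product_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Attention(Q, K, V) = softmax(QKᵀ / √d_k) · V"""
    d_k = q.size(-1)
    weights = torch.softmax(q @ k.transpose(-2, -1) / d_k**0.5, dim=-1)
    return weights @ v

The gates in this model use the additive form of this idea instead: the encoder skip connection x plays the role of the value, and a gating signal g from the decoder decides its relevance, as the diagram below shows.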

   Skip Connection x            Gating Signal g
    (from encoder)               (from decoder)
          │                            │
          ▼                            ▼
     ┌─────────┐                  ┌─────────┐
     │   W_x   │  1×1 conv        │   W_g   │  1×1 conv
     └────┬────┘                  └────┬────┘
          │                            │
          └──────────────┬─────────────┘
                         ▼
        ADD → ReLU → ψ (1×1 conv) → Sigmoid → α
                         │
                         ▼
                x_out = α ⊙ x  (element-wise multiplication)
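In PyTorch, that gate comes out to a few lines. A sketch assuming x and g already share spatial dimensions; real implementations typically stride over x or upsample g to align them first:

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate, as in the diagram above."""

    def __init__(self, x_ch: int, g_ch: int, inter_ch: int):
        super().__init__()
        self.w_x = nn.Conv2d(x_ch, inter_ch, kernel_size=1)  # W_x: project skip connection
        self.w_g = nn.Conv2d(g_ch, inter_ch, kernel_size=1)  # W_g: project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)     # ψ: collapse to one attention map

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        a = torch.relu(self.w_x(x) + self.w_g(g))  # ADD → ReLU
        alpha = torch.sigmoid(self.psi(a))         # ψ → Sigmoid → α in [0, 1]
        return x * alpha                           # x_out = α ⊙ x, broadcast over channels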

Dataset: Medical Segmentation Decathlon

I used a subset of the Medical Segmentation Decathlon dataset—54 contrast-enhanced, labeled CT images for liver segmentation. The split: 43 training, 11 validation, with no image overlap. Each image contains 70-300 slices.

Computational tradeoff: I filtered out images with >300 slices. High slice counts dramatically increase training time but provide limited advantage for this task. The vessel dataset was filtered to <80 slices, resulting in 216 images (172 train, 44 validation).

For the liver model, I used 8 base features. For hepatic vessels, 16 base features—the vessel dataset is smaller, so more features fit in memory. Features double with each encoder downsampling layer.
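That doubling makes the per-level channel widths easy to compute (the depth of 4 below is an illustrative assumption):

# Channel widths per encoder level: base features double at each downsampling.
def encoder_channels(base: int, depth: int = 4) -> list[int]:
    return [base * 2**i for i in range(depth)]

print(encoder_channels(8))   # liver model:  [8, 16, 32, 64]
print(encoder_channels(16))  # vessel model: [16, 32, 64, 128]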

Training Setup

Parameter                 Value
Batch Size                32
Initial Learning Rate     1 × 10⁻⁴
Optimizer                 Adam
LR Scheduler              ReduceLROnPlateau (factor=0.5, patience=3)
Early Stopping            Patience = 5 epochs
Liver Model Epochs        25 (auto-stopped)
Vessel Model Epochs       18 (auto-stopped)

The early stopping mechanism is simple but effective: if validation loss doesn't improve for 5 consecutive epochs, training stops. This avoids overfitting while letting occasional loss bumps pass without premature termination.
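A minimal sketch of that logic (not the verbatim training-loop code):

class EarlyStopping:
    """Stop when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0               # improvement resets the counter
        else:
            self.bad_epochs += 1              # a loss bump is tolerated...
        return self.bad_epochs >= self.patience  # ...until patience runs out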

Loss Function: Dice-Cross Entropy

I used a combined Dice-CE loss to balance foreground and background accuracy. The Dice coefficient measures overlap between predicted and ground truth masks:

Dice = 2 × |P ∩ T| / (|P| + |T|)

where P is the predicted mask and T is ground truth. Score ranges from 0 to 1—higher is better. A threshold of 0.5 converts predicted probabilities to binary values. Smoothing terms in numerator and denominator avoid division by zero.
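A sketch of the combined loss (the equal Dice/CE weighting and smooth=1.0 are illustrative assumptions):

import torch
import torch.nn.functional as F

def dice_ce_loss(logits: torch.Tensor, target: torch.Tensor, smooth: float = 1.0) -> torch.Tensor:
    """Combined soft-Dice + binary cross-entropy loss.
    logits: raw model output; target: binary ground-truth mask of the same shape."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    # Smoothing terms in numerator and denominator avoid division by zero on empty masks.
    dice = (2.0 * inter + smooth) / (probs.sum() + target.sum() + smooth)
    ce = F.binary_cross_entropy_with_logits(logits, target.float())
    return (1.0 - dice) + ce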

Results

Task                   Structure          Dice Score
Liver Segmentation     Liver              0.88
Liver Segmentation     Tumor              0.73
Vessel Segmentation    Hepatic Vessels    0.83
Vessel Segmentation    Tumor              0.66

The training and validation loss curves show healthy convergence with no signs of overfitting. The model performs well on the large anatomical structures; the smaller, more variable tumor regions score lower but remain respectable.

Potential Improvements

Class imbalance is always a challenge in medical segmentation, since background dominates every slice; tackling it more directly, beyond the combined Dice-CE loss, is the most promising way to push these results higher.

Try It Yourself

I've deployed the model as a live demo. You can upload NIfTI files or use the synthetic phantom generator to see the segmentation in action across axial, sagittal, and coronal views.

Launch CT Segmentation Demo →

The demo runs on CPU (no GPU on the VPS), so expect a few seconds for segmentation. Toggle liver, vessel, and tumor overlays independently to see what the model detects.

Code

The full implementation—model architecture, training scripts, and inference code—is on GitHub:

github.com/AbdallahAbou/Attention_Res_Unet_CT_Liver_Segmentation