The introduction of the Segment Anything Model (SAM) has paved the way for numerous semantic segmentation applications. For several tasks, quantifying the uncertainty of SAM is of particular interest. However, the ambiguous nature of the class-agnostic foundation model SAM challenges current uncertainty quantification (UQ) approaches. This paper presents a theoretically motivated uncertainty quantification model based on a Bayesian entropy formulation jointly respecting aleatoric, epistemic, and the newly introduced task uncertainty. We use this formulation to train USAM, a lightweight post-hoc UQ method. Our model traces the root of uncertainty back to under-parameterised models, insufficient prompts or image ambiguities. Our proposed deterministic USAM demonstrates superior predictive capabilities on the SA-V, MOSE, ADE20k, DAVIS, and COCO datasets, offering a computationally cheap and easy-to-use UQ alternative that can support user-prompting, enhance semi-supervised pipelines, or balance the tradeoff between accuracy and cost efficiency.
This project page guides you through the main contributions of our work, provides an overview of the method, and presents the evaluation results by showing tables and figures from the original work. For more details, please refer to our paper.
Figure 2: The SAM framework with our USAM extension and the Bayesian entropy approximation used to quantify uncertainty. Starting from the bottom left, an image x_I is the input. A user defines one or more coordinate prompts x_P to specify the desired segmentation task a. The image and prompt are encoded into embeddings, concatenated with random embeddings, and fed into the mask decoder, which applies attention, MLPs, and upsampling layers. The SAM framework estimates three candidate masks that address different tasks. Additionally, SAM estimates the IoU between the ground truth of a and each mask in \hat{A}, denoted as SamScore. We extract and concatenate the mask and confidence tokens to train MLPs that estimate the expected predictive, epistemic, task, and prompt uncertainty. The process to calculate the entropy-based uncertainty is visualized in blue: multiple prompts are chosen by the user, augmentations t are applied to the input, and models of different sizes theta are used to apply variational inference. The first row shows the gap between the simple and cheap predictions of the Tiny model and the refined predictions of the Large model. The second row shows the gap between using a single point coordinate as prompt and the refined predictions obtained with a prompt that consists of a dense set of coordinates. Finally, the third row shows the gap between the mask selected by the SamScore and the best mask in \hat{A} with respect to the ground truth.
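The blue branch of Figure 2 averages predictions over sampled augmentations t, prompts x_P, and model sizes theta. As a rough illustration only, the sketch below shows the standard Monte Carlo split of predictive entropy into an expected-entropy term and a mutual-information (disagreement) term for binary masks. It assumes each sample is a per-pixel foreground probability map; the function names are hypothetical, and this is a generic decomposition rather than the paper's exact H_Y, H_A, H_X_P, or H_Theta estimators.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Per-pixel entropy (in nats) of Bernoulli foreground probabilities."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def mc_entropy_decomposition(probs):
    """Split predictive entropy for a stack of sampled predictions.

    probs: array of shape (S, H, W) with foreground probabilities from S
    sampled predictions (e.g. different augmentations, prompts or model sizes).
    Returns pixel-averaged (total, expected, mutual_information).
    """
    probs = np.asarray(probs, dtype=np.float64)
    mean_p = probs.mean(axis=0)                   # MC estimate of the predictive distribution
    total = binary_entropy(mean_p)                # entropy of the averaged prediction
    expected = binary_entropy(probs).mean(axis=0) # average entropy of the individual samples
    mutual_info = total - expected                # part explained by disagreement between samples
    return float(total.mean()), float(expected.mean()), float(mutual_info.mean())
```

If all samples predict 0.5 everywhere, the total entropy is ln 2 and the mutual information is zero (the samples agree but are each unsure). If one sample predicts 1 and another 0, the total entropy is still ln 2, but almost all of it is mutual information.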
Figure 3: Training objectives of our MLPs. They estimate the gap between simple and cheap (left) and refined (right) predictions.
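A minimal sketch of such a regression setup follows. It assumes the gap is measured as the difference in ground-truth IoU between the refined and the cheap prediction, and that the MLP input is four 256-dimensional mask tokens plus one IoU token, as in SAM's mask decoder. The names `mask_iou`, `gap_target`, and `GapMLP` are hypothetical, only the forward pass and loss are shown, and the actual training would use an autodiff framework.

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two boolean masks; defined as 1.0 when both are empty."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter / union)

def gap_target(cheap_mask, refined_mask, gt_mask):
    """Assumed regression target: IoU gained by the refined prediction."""
    return mask_iou(refined_mask, gt_mask) - mask_iou(cheap_mask, gt_mask)

class GapMLP:
    """Tiny two-layer MLP on concatenated mask and IoU tokens (forward pass only)."""

    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, in_dim ** -0.5, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, hidden ** -0.5, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, mask_tokens, iou_token):
        # mask_tokens: (B, num_tokens, dim), iou_token: (B, dim)
        x = np.concatenate([mask_tokens.reshape(mask_tokens.shape[0], -1), iou_token], axis=1)
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return (h @ self.w2 + self.b2)[:, 0]        # one gap estimate per sample

def mse(pred, target):
    """Mean squared error between predicted and target gaps."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```

One such head per gap (model, prompt, task) would mirror the three rows of training objectives; whether the paper shares layers between heads is not stated here.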
Figure 5: Performance gain when improving predictions selected by UQ on the COCO dataset. We evaluate the SamScore, the mask entropy H_Std, the Bayesian entropy approximations H_Y, H_A, H_X_P, and H_Theta, and our USAM estimates Delta*_A, Delta*_X_P, and Delta*_Theta. The dashed line denotes an oracle estimation. From left to right, the first plot shows the improvement when a ratio of the most uncertain predictions of the Tiny model is replaced with those of the Large model. The second plot shows the improvement when refined prompts are used for the most uncertain samples. The third and fourth plots show the improvement when the best mask in \hat{A} is selected, ignoring the SamScore, and when a ratio of the most uncertain predictions is replaced by the ground truth, respectively. Our MLPs are consistently superior to all other methods or on par with the Bayesian approximation.
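The protocol behind Figure 5 ranks samples by an uncertainty score and refines the most uncertain fraction. A minimal sketch, assuming per-sample quality is a score such as IoU with the ground truth and that the plotted improvement is the change in the dataset mean; the function names are hypothetical. The oracle is emulated by ranking with the true gain instead of an uncertainty score.

```python
import numpy as np

def replacement_gain_curve(uncertainty, cheap_quality, refined_quality, ratios):
    """Mean quality gain when the most uncertain fraction of samples is refined.

    uncertainty:     (N,) scores, higher means less trustworthy
    cheap_quality:   (N,) e.g. IoU of the cheap prediction with the ground truth
    refined_quality: (N,) quality after refinement (larger model, denser prompt, ...)
    ratios:          fractions of samples to replace, in [0, 1]
    """
    order = np.argsort(-np.asarray(uncertainty, dtype=np.float64), kind="stable")  # most uncertain first
    gain = np.asarray(refined_quality, dtype=np.float64) - np.asarray(cheap_quality, dtype=np.float64)
    n = len(order)
    curve = []
    for r in ratios:
        k = int(round(r * n))                    # number of samples that get refined
        curve.append(gain[order[:k]].sum() / n)  # improvement of the dataset mean
    return np.array(curve)

def oracle_curve(cheap_quality, refined_quality, ratios):
    """Upper bound: rank samples by their true gain instead of an uncertainty score."""
    gain = np.asarray(refined_quality, dtype=np.float64) - np.asarray(cheap_quality, dtype=np.float64)
    return replacement_gain_curve(gain, cheap_quality, refined_quality, ratios)
```

For the fourth panel, `refined_quality` would simply be an array of ones, since replacing a prediction with its ground truth yields an IoU of 1.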
Table 7: Runtime of SAM with and without UQ methods on a regular image, measured on an NVIDIA RTX 3050 Ti. Entropy is calculated on SAM's logit map, |T| and |X_P| denote the number of image and prompt augmentations used for MC sampling, and USAM includes the computation of all our proposed MLPs. Compared with all other UQ methods, USAM is faster and easier to implement.
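As an illustration of the entropy baseline, the following is a minimal sketch of a mean per-pixel binary entropy over a mask logit map. It assumes the logits are pre-sigmoid foreground scores and that the map is summarised by its mean; the function name is hypothetical, and other aggregations are possible.

```python
import numpy as np

def logit_map_entropy(logits):
    """Mean per-pixel binary entropy (nats) of a mask logit map."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-sigmoids: log p = -log(1 + e^-z), log(1 - p) = -log(1 + e^z)
    log_p = -np.logaddexp(0.0, -logits)
    log_q = -np.logaddexp(0.0, logits)
    p, q = np.exp(log_p), np.exp(log_q)
    entropy = -(p * log_p + q * log_q)
    return float(entropy.mean())
```

This needs only the single logit map SAM already produces, whereas MC sampling requires an additional forward pass for every sampled augmentation or prompt, which is why its cost grows with |T| and |X_P|.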
Table 5: Pearson correlation between different UQ measures on the COCO validation dataset using the Large SAM model. IoU_GT denotes the actual intersection over union between SAM's prediction and the ground truth.
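A correlation matrix like the one in Table 5 can be computed from per-sample scores with NumPy's `np.corrcoef`. In this minimal sketch, the helper name is hypothetical, and any dictionary keys (e.g. `"SamScore"`, `"IoU_GT"`) and values a reader passes in are placeholders, not data from the paper.

```python
import numpy as np

def pearson_matrix(measures):
    """Pairwise Pearson correlation between named per-sample measures.

    measures: dict mapping a measure name to a sequence of per-sample values,
    all of the same length. Returns (names, correlation_matrix).
    """
    names = list(measures)
    data = np.vstack([np.asarray(measures[n], dtype=np.float64) for n in names])
    return names, np.corrcoef(data)  # rows are variables, columns are samples
```

Since a higher uncertainty should typically coincide with a lower IoU_GT, a useful uncertainty measure would be expected to correlate negatively with IoU_GT, while the SamScore, which estimates the IoU, would be expected to correlate positively.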
Table 6: Token ablation. The UQ performance of USAM when removing mask or IoU tokens from the MLP input on the COCO dataset, measured in relative AUC as in the main experiments.
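Relative AUC is not defined in this section. The sketch below assumes it is the area under a method's gain curve divided by the area under the oracle's curve, both integrated over the replacement ratio with the trapezoidal rule. This definition is an assumption, the paper may normalise differently (for instance, relative to a random ranking), and the function names are hypothetical.

```python
import numpy as np

def trapezoid_area(x, y):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

def relative_auc(ratios, method_curve, oracle_curve):
    """Assumed relative AUC: method area divided by oracle area."""
    oracle_area = trapezoid_area(ratios, oracle_curve)
    if oracle_area == 0.0:
        raise ValueError("oracle curve has zero area")
    return trapezoid_area(ratios, method_curve) / oracle_area
```

Under this assumption, a value of 1 means the uncertainty ranking matches the oracle, and lower values mean less of the achievable gain is recovered.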
Blue masks are the ground truth, and green masks are the predictions of the Tiny SAM model. Delta_A, Delta_X_P, and Delta*_Theta are the uncertainty estimates of USAM for the task, prompt, and model uncertainty, respectively.
If you use this code in your research, please cite the following paper:
@inproceedings{
kaiser2025uncertainsam,
title={Uncertain{SAM}: Fast and Efficient Uncertainty Quantification of the Segment Anything Model},
author={Timo Kaiser and Thomas Norrenbrock and Bodo Rosenhahn},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=G3j3kq7rSC}
}