SegFormer (b5-sized) encoder pre-trained-only
SegFormer encoder pre-trained on ImageNet-1k. It was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Disclaimer: The team releasing SegFormer did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the two are fine-tuned together on a downstream dataset.
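To make "hierarchical" concrete, the sketch below computes the feature-map shape each encoder stage would produce for a given input size. The stage settings (kernel sizes 7/3/3/3, strides 4/2/2/2, padding of kernel // 2, channel widths 64/128/320/512) are assumptions taken from the commonly published b5 configuration, not from this card, and the helper names are hypothetical.

```python
# Assumed b5 encoder stage settings (not stated in this model card).
PATCH_SIZES = (7, 3, 3, 3)          # kernel size of each overlapping patch embedding
STRIDES = (4, 2, 2, 2)              # downsampling factor of each stage
HIDDEN_SIZES = (64, 128, 320, 512)  # channel width of each stage


def conv_out(size, kernel, stride):
    """Output length of a strided convolution with padding = kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def stage_shapes(height, width):
    """Return (channels, height, width) for each of the four encoder stages."""
    shapes = []
    for kernel, stride, channels in zip(PATCH_SIZES, STRIDES, HIDDEN_SIZES):
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        shapes.append((channels, height, width))
    return shapes


for index, (channels, h, w) in enumerate(stage_shapes(512, 512), start=1):
    print(f"stage {index}: channels={channels}, height={h}, width={w}")
```

Under these assumptions, a 512x512 input yields feature maps at 1/4, 1/8, 1/16 and 1/32 of the input resolution, which is the multi-scale output a decode head would consume.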

This repository contains only the pre-trained hierarchical Transformer encoder, so it is intended as a starting point for fine-tuning rather than for direct use on segmentation.

Intended uses & limitations
You can use the model as a starting point for fine-tuning on semantic segmentation. See the model hub to look for fine-tuned versions on a task that interests you.
https://huggingface.co/nvidia/mit-b5
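As a rough illustration of the fine-tuning use described above, the sketch below loads this checkpoint into a segmentation model with the Hugging Face transformers library. It assumes transformers and PyTorch are installed; the helper names and the example class names are hypothetical, and the decode head is newly initialized, so the model needs training before its predictions mean anything.

```python
def make_label_maps(class_names):
    """Build the id <-> label dictionaries for a list of class names."""
    id2label = dict(enumerate(class_names))
    label2id = {name: index for index, name in id2label.items()}
    return id2label, label2id


def load_for_finetuning(class_names, checkpoint="nvidia/mit-b5"):
    """Load the pre-trained encoder and attach an untrained decode head.

    Downloads the checkpoint from the Hugging Face Hub on first use.
    """
    # Imported here so the label helper above works without the heavy dependency.
    from transformers import SegformerForSemanticSegmentation

    id2label, label2id = make_label_maps(class_names)
    return SegformerForSemanticSegmentation.from_pretrained(
        checkpoint,
        num_labels=len(class_names),
        id2label=id2label,
        label2id=label2id,
    )
```

Calling load_for_finetuning(["background", "road", "car"]) would return a model whose encoder weights come from this repository; the library typically warns that the decode head weights are newly initialized, which is expected here.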
