Dataset Card for MedTrinity-25M
MedTrinity-25M is a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. These enriched annotations encompass both global textual information (disease/lesion type, modality, region-specific descriptions, and inter-regional relationships) and detailed local annotations for regions of interest (ROIs), including bounding boxes and segmentation masks. Compared to existing datasets, MedTrinity-25M provides the most enriched annotations, supporting a comprehensive range of multimodal tasks such as captioning and report generation, as well as vision-centric tasks like classification and segmentation. The dataset can support large-scale pre-training of multimodal medical AI models, contributing to the development of future foundation models in the medical domain.
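
To make the two annotation levels described above concrete, here is a minimal Python sketch of how one annotated image could be represented in memory. Every class and field name (`RegionOfInterest`, `AnnotatedImage`, `bbox`, `mask_path`, and so on) is a hypothetical choice for illustration, as are the sample values; this is not the dataset's actual schema or column layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# NOTE: hypothetical schema for illustration only; the names below are
# assumptions and do not reflect the dataset's real column names.
@dataclass
class RegionOfInterest:
    """Local annotation: one region of interest inside an image."""
    bbox: Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
    description: str                  # region-specific text
    mask_path: Optional[str] = None   # optional path to a segmentation mask

    def area(self) -> int:
        """Bounding-box area in pixels (0 for a degenerate box)."""
        x_min, y_min, x_max, y_max = self.bbox
        return max(0, x_max - x_min) * max(0, y_max - y_min)


@dataclass
class AnnotatedImage:
    """Global annotation for one image, plus its local ROIs."""
    image_path: str
    modality: str                     # e.g. "CT" or "MRI"
    disease: str                      # disease/lesion type
    caption: str                      # global textual description
    rois: List[RegionOfInterest] = field(default_factory=list)

    def largest_roi(self) -> Optional[RegionOfInterest]:
        """Return the ROI with the largest bounding box, or None if there are none."""
        return max(self.rois, key=lambda r: r.area(), default=None)


# Made-up sample record, not taken from the dataset.
sample = AnnotatedImage(
    image_path="example.png",
    modality="CT",
    disease="example lesion",
    caption="Illustrative caption relating the two regions.",
    rois=[
        RegionOfInterest(bbox=(10, 20, 60, 80), description="primary region"),
        RegionOfInterest(bbox=(0, 0, 10, 10), description="secondary region"),
    ],
)
```

Keeping the global fields on the image and the local fields on each ROI mirrors the split between global textual information and local annotations; a caption-style task would read `caption`, while a detection or segmentation task would iterate over `rois`.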

Homepage: https://github.com/yunfeixie233/MedTrinity-25M
Paper: https://arxiv.org/abs/2408.02900
GitHub Repo: https://github.com/UCSC-VLAA/MedTrinity-25M