πŸ“š Trained on a large dataset of 558k Arabic triplets translated from the AllNLI triplet dataset: Omartificial-Intelligence-Space/Arabic-NLi-Triplet
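
For reference, the triplets can be loaded straight from the Hub with πŸ€— `datasets`; a minimal sketch (the `train` split name is an assumption, check the dataset card):

```python
from datasets import load_dataset

# Anchor / positive / negative triplets translated from AllNLI
# (split name "train" is an assumption; see the dataset card)
dataset = load_dataset("Omartificial-Intelligence-Space/Arabic-NLi-Triplet", split="train")
print(dataset[0])
```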

6️⃣ 6 different base models: AraBERT, MarBERT, LaBSE, MiniLM, paraphrase-multilingual-mpnet-base, mpnet-base, ranging from 109M to 471M parameters.
πŸͺ† Trained with a Matryoshka loss, allowing you to truncate embeddings with minimal performance loss: smaller embeddings are faster to compare (see the snippet below).
πŸ“ˆ Outperforms all commonly used multilingual models like intfloat/multilingual-e5-large, sentence-transformers/paraphrase-multilingual-mpnet-base-v2, and sentence-transformers/LaBSE.
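
For example, here is a minimal sketch of using one of the models with Matryoshka truncation (assumes `sentence-transformers` >= 2.7 for the `truncate_dim` argument; the 256-dim size and the example sentences are just illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load one of the Matryoshka models and truncate embeddings to 256 dims
# (smaller embeddings are faster to compare, at a small quality cost)
model = SentenceTransformer(
    "Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka",
    truncate_dim=256,
)

sentences = [
    "Ψ§Ω„Ω‚Ψ·Ψ© ΨͺΨ¬Ω„Ψ³ ΨΉΩ„Ω‰ Ψ§Ω„Ψ³Ψ¬Ψ§Ψ―Ψ©",  # "The cat sits on the rug"
    "Ω‚Ψ·Ψ© ΨͺΨ³ΨͺΨ±ΩŠΨ­ ΩΩˆΩ‚ Ψ§Ω„Ψ¨Ψ³Ψ§Ψ·",  # "A cat rests on the carpet"
    "Ψ§Ω„Ψ·Ω‚Ψ³ مشمس Ψ§Ω„ΩŠΩˆΩ…",  # "The weather is sunny today"
]

embeddings = model.encode(sentences)           # shape: (3, 256)
print(cos_sim(embeddings[0], embeddings[1:]))  # similar pair scores higher than the unrelated one
```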

Check them out here:
- Omartificial-Intelligence-Space/Arabic-mpnet-base-all-nli-triplet
- Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabic-labse-Matryoshka
- Omartificial-Intelligence-Space/Marbert-all-nli-triplet-Matryoshka
- Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet

Or the collection with all of them: Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e


My personal favourite is likely Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka: a very efficient 135M parameters & it scores #1 on mteb/leaderboard.