Extracting your HTML webpages to markdown is now possible end-to-end with a simple LLM! 🚀
Jina just released Reader-LM, a model that handles the whole pipeline of extracting markdown from HTML webpages.
A while ago, Jina released a completely code-based, deterministic program to do this extraction, based on heuristics: e.g., "if the text is in a <p> tag, keep it, but if it's hidden behind another one, remove it".
🤔 But they received complaints from readers: some found the output too detailed, others not detailed enough, depending on the page.
➡️ So they decided: maybe heuristics were not enough. Instead, they tried to train an LLM to do the complete extraction. This LLM does not need to be very strong, but it should handle a very long context: it's a challenging, "shallow-but-wide" architecture.
Technical insights:
2️⃣ models: Reader-LM-0.5B and 1.5B
⚙️ Two stages of training: first, short and simple HTML to get the basics, then ramp up to longer and harder HTML, up to 128k tokens
🔁 Use contrastive search for decoding: this empirically reduces "repeating output" issues
➡️ Their models beat much larger models at HTML extraction 🔥
🤗 Weights available on HF (sadly CC-BY-NC license):
jinaai/reader-lm-1.5b
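For the curious, here's a minimal sketch of running it with contrastive search via transformers (standard generate API; Jina's reference snippet may differ slightly):

```python
# Sketch: HTML -> markdown with Reader-LM, decoded via contrastive search.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jinaai/reader-lm-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

html = "<html><body><h1>Hello</h1><p>World</p></body></html>"
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": html}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Contrastive search = penalty_alpha > 0 combined with a small top_k.
outputs = model.generate(inputs, max_new_tokens=1024, penalty_alpha=0.6, top_k=4)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```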
Hugging Face presents FineVideo 🎥! Unlocking the next generation of video understanding 🚀
🤯 3,400 hours of annotated Creative Commons videos with rich character descriptions, scene splits, mood, and content descriptions per scene, as well as QA pairs 🔥
@mfarre processed over 2M YouTube-CC videos to make this incredibly powerful selection.
Very psyched to fine-tune Idefics on this dataset. ⚡️
Explore the videos:
HuggingFaceFV/FineVideo-Explorer
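To poke at the data directly, here's a small streaming sketch with datasets (the dataset id "HuggingFaceFV/finevideo" is an assumption based on the org name):

```python
# Sketch: stream a few FineVideo samples without downloading the whole dataset.
from datasets import load_dataset

ds = load_dataset("HuggingFaceFV/finevideo", split="train", streaming=True)
for sample in ds.take(3):
    print(sample.keys())  # e.g. the video bytes plus the JSON annotations
```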
OpenAI finally reveals "o1": crazy chain-of-thought-tuned model >> GPT-4o 🔥
OpenAI had hinted at a mysterious "project strawberry" for a long time: they published this new model called "o1" 1 hour ago, and the performance is just mind-blowing.
🤯 Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
🤯 Beats human-PhD-level accuracy by 8% on GPQA, a benchmark of hard science problems where the previous best was Claude 3.5 Sonnet with 59.4%
🤯 Scores 78.2% on the vision benchmark MMMU, making it the first model competitive with human experts
🤯 GPT-4o scored 60% on MATH → o1 scores 95%
How did they pull this off? Sadly, OpenAI keeps increasing their performance in "making cryptic AF reports that don't reveal any real info", so here are some excerpts:
💬 "o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes."
And of course, they decided to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it's to protect users:
💬 "We also do not want to make an unaligned chain of thought directly visible to users."
They're right, it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.
🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement they show certainly looks like the ones you see with agents.
This model will be immediately released for ChatGPT and some "trusted API users".
Let's start cooking to release the same thing in 6 months! 🚀
I believe Hugging Face should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, Gradio themes... but everything has slowed down, and there are no more Hugging Face events.
📢 The three-hop (💡 aspect + 🤔 opinion + 🧠 reason) Chain-of-Thought concept, combined with an LLM, is a decent approach for reasoning about the emotions of participants in textual dialogues.
Delighted to share a tutorial video that walks you through:
✅ The proper application of LLMs towards implicit IR
✅ Ways to align different information types (causes and states) within the same LLM
✅ Launching your LLM in Google Colab, capable of emotion extraction for characters in dialogues 🧪
🎥: https://www.youtube.com/watch?v=vRVDQa7vfkU
Project: https://github.com/nicolay-r/THOR-ECAC
Paper: https://aclanthology.org/2024.semeval-1.4/
Model card:
nicolay-r/flan-t5-emotion-cause-thor-base
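Here's a quick sketch of querying the model as a plain text2text pipeline (the prompt below is invented for illustration; the THOR-ECAC repo has the exact three-hop templates):

```python
# Sketch: emotion-cause inference with the flan-t5 THOR model as a seq2seq pipeline.
from transformers import pipeline

pipe = pipeline(
    "text2text-generation",
    model="nicolay-r/flan-t5-emotion-cause-thor-base",
)
# Hypothetical prompt; the real templates follow the three-hop CoT scheme.
out = pipe("Alice: I finally got the job! | Which emotion does this cause for Bob?")
print(out[0]["generated_text"])
```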
The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data 🤗
The training code, dataset, and model weights are open and freely available on HF, and training ran on an H100 provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer 🦥
Link to the base model:
louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1
Link to the instruct model:
louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct
Link to the dataset:
louisbrulenaudet/Romulus-cpt-fr
Please note that these models have not been aligned to produce usable text as they stand, and will certainly need to be fine-tuned for the desired tasks in order to produce satisfactory results.
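If you want to build on top of the base model, a minimal loading sketch with standard transformers (adjust dtype/device for your hardware):

```python
# Sketch: loading Romulus for inference or as a starting point for fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Selon l'article 1134 du Code civil,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```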
> Want to know how much an API LLM call costs you?
I've just made this Space that gets you the API price for any LLM call, for nearly all inference providers out there!
This is based on a comment by @victor under my HF Post a few months back, and leverages BerriAI's data for LLM prices.
Check it out here 👇
m-ric/text_to_dollars
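Under the hood it's simple arithmetic: token counts times per-million-token prices. A sketch with made-up placeholder prices (the real numbers come from BerriAI's litellm data):

```python
# Sketch: per-call cost from per-token prices. Prices below are hypothetical.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00}  # USD per 1M tokens

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of one API call."""
    return (input_tokens * PRICE_PER_MILLION["input"]
            + output_tokens * PRICE_PER_MILLION["output"]) / 1_000_000

print(f"${call_cost(1_500, 800):.4f}")  # e.g. 1.5k input / 800 output tokens
```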
Auto-regressive LMs have ruled, but encoder-based architectures like GLiNER are proving to be just as powerful for information extraction while offering better efficiency and interpretability. 🚀✨
Past encoder backbones were limited by small pre-training datasets and old techniques, but with innovations like LLM2Vec, we've transformed decoders into high-performing encoders! 🔄💡
What's New?
🔹 Converted Llama & Qwen decoders into advanced encoders
🔹 Improved the GLiNER architecture to work with rotary positional encoding (RoPE)
🔹 New GLiNER (zero-shot NER) & GLiClass (zero-shot classification) models
🔥 Check it out:
New models:
knowledgator/llm2encoder-66d1c76e3c8270397efc5b5e
GLiNER package: https://github.com/urchade/GLiNER
GLiClass package: https://github.com/Knowledgator/GLiClass
📝 Read our blog for more insights, and stay tuned for what's next!
https://medium.com/@knowledgrator/llm2encoders-e7d90b9f5966
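For a feel of the GLiNER API, here's a minimal zero-shot NER sketch (shown with an earlier public checkpoint; swap in any model from the new collection above):

```python
# Sketch: zero-shot NER with the gliner package (pip install gliner).
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")
text = "Ada Lovelace worked with Charles Babbage in London."
labels = ["person", "location"]

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["text"]} -> {ent["label"]} ({ent["score"]:.2f})')
```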
Free research tip:
Get used to writing the first draft of your paper in markdown using VS Code's Jupyter notebook extension - it lets you do quick sanity checks with code and maths - an absolute AAA experience :)
made an image similarity demo to test out the
mistral-community/pixtral-12b-240910
model.
If anyone knows how to generate captions with it, please do let me know x 😄
here's the demo:
Tonic/Pixtral
hope you like it 🤗
What if we asked the AI what it thought of our Hugging Face profile? 😹
I've released a new Space capable of doing it... watch out, it hits hard! 🔥
Try it now ➡️
enzostvs/hugger-roaster
Share your roast below 👇
If you are interested in deep reinforcement learning, find below my ICML paper on how we can detect adversaries in deep reinforcement learning:
Paper: Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
Link: https://proceedings.mlr.press/v202/korkmaz23a.html
Arcee releases SuperNova, a better fine-tune of Llama-3.1-70B!
2️⃣ versions: 70B and 8B
🧠 Trained by distilling logits from Llama-3.1-405B
🗜️ Used a clever compression method to reduce the dataset from 2.9 petabytes down to 50 GB (they may share it in a paper)
⚠️ Not all benchmarks improved: GPQA and MUSR go down a bit
🤗 8B weights are available on HF (not the 70B)
Read their blog post 👉 https://blog.arcee.ai/arcee-supernova-training-pipeline-and-model-composition/
Model weights (8B) 👇
arcee-ai/Llama-3.1-SuperNova-Lite
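For context, this is what generic logit distillation looks like; a standard KL-divergence sketch, not Arcee's actual pipeline:

```python
# Sketch: vanilla logit distillation via temperature-softened KL divergence.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    # Soften both distributions with temperature T, then match them with KL.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

student = torch.randn(4, 32000)  # (batch, vocab) dummy logits
teacher = torch.randn(4, 32000)
print(distillation_loss(student, teacher))
```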
🎉 Sentence Transformers v3.1 is out! Featuring a hard negatives mining utility to get better models out of your data, a new strong loss function, training with streaming datasets, custom modules, bug fixes, small additions, and docs changes. Here are the details:
✅ Hard Negatives Mining Utility: Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. They're difficult for a model to distinguish from the correct answer, often resulting in a stronger model after training.
📉 New loss function: This loss function works very well for symmetric tasks (e.g. clustering, classification, finding similar texts/paraphrases) and a bit less so for asymmetric tasks (e.g. question-answer retrieval).
💾 Streaming datasets: You can now train with datasets.IterableDataset, which doesn't require downloading the full dataset to disk before training. As simple as passing "streaming=True" to "datasets.load_dataset".
🧩 Custom Modules: Model authors can now customize a lot more of the components that make up Sentence Transformer models, allowing for a lot more flexibility (e.g. multi-modal, model-specific quirks, etc.)
✨ New arguments to several methods: encode_multi_process gets a progress bar, push_to_hub can now be done to different branches, and CrossEncoders can be downloaded to specific cache directories.
🐛 Bug fixes: Too many to name here, check out the release notes!
📖 Documentation: A particular focus on clarifying the batch samplers in the Package Reference this release.
Check out the full release notes here ⭐️: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.0
I'm very excited to hear your feedback, and I'm looking forward to the future changes that I have planned, such as ONNX inference! I'm also open to suggestions for new features: feel free to send me your ideas.
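Here's a small sketch of the new mining utility on a Hub dataset (argument names beyond the first two are from memory, so double-check the docs):

```python
# Sketch: mine hard negatives for (query, answer) pairs with v3.1's utility.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import mine_hard_negatives

model = SentenceTransformer("all-MiniLM-L6-v2")
dataset = load_dataset("sentence-transformers/natural-questions", split="train")

# Returns a dataset of (anchor, positive, negative) triplets.
hard = mine_hard_negatives(dataset, model, num_negatives=5)
print(hard[0])
```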
Please check the Open Source AI Network: we mapped the top 500 HF users
based on their followers' profiles.
The map can be found here:
bunkalab/mapping_the_OS_community
Finally tried Kotaemon, an open-source RAG tool for document chat!
With local models, it's free and private. Perfect for journalists and researchers.
I put Kotaemon to the test with the EPA's Greenhouse Gas Inventory. It accurately answered questions on the CO2 percentage of 2022 emissions and compared 2022 vs. 2021 data.
🚀 Kotaemon's no-code interface makes it user-friendly.
- Use your own models or APIs from OpenAI or Cohere
- Great documentation & easy installation
- Multimodal capabilities + reranking
- View sources, navigate docs & create graphRAG
⭐ Kotaemon is gaining traction with 11.3k GitHub stars
Try the online demo:
cin-model/kotaemon-demo
GitHub: https://github.com/Cinnamon/kotaemon
Docs: https://cinnamon.github.io/kotaemon/usage/
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.
Whisper large-v3 has the same architecture as the previous large and large-v2 models, except for the following minor differences:
The spectrogram input uses 128 Mel frequency bins instead of 80
A new language token for Cantonese
The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. The model was trained for 2.0 epochs over this mixture dataset.
The large-v3 model shows improved performance over a wide variety of languages, showing 10% to 20% reduction of errors compared to Whisper large-v2. For more details on the different checkpoints available, refer to the section Model details.
Disclaimer: Content for this model card has partly been written by the ๐ค Hugging Face team, and partly copied and pasted from the original model card.
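A minimal transcription sketch with the transformers ASR pipeline:

```python
# Sketch: zero-shot transcription with Whisper large-v3.
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device=0,  # GPU index; use -1 (or omit) for CPU
)
result = pipe("audio.mp3")  # path or URL to any audio file
print(result["text"])
```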
MiniCPM3-4B is the 3rd generation of the MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and is comparable with many recent 7B-9B models.
Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function calling, along with a code interpreter. Please refer to Advanced Features for usage guidelines.
MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can theoretically handle infinite context without requiring a huge amount of memory.
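A minimal chat sketch; the repo ships custom modeling code, hence trust_remote_code=True (the repo id openbmb/MiniCPM3-4B is assumed):

```python
# Sketch: basic chat with MiniCPM3-4B via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize MapReduce in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=64)[0]))
```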
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.
Key Features
Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro].
Competitive prompt following, matching the performance of closed-source alternatives.
Trained using guidance distillation, making FLUX.1 [dev] more efficient.
Open weights to drive new scientific research, and empower artists to develop innovative workflows.
Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.
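A minimal text-to-image sketch with diffusers' FluxPipeline:

```python
# Sketch: text-to-image with FLUX.1 [dev] (pip install diffusers).
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps the 12B model fit on smaller GPUs

image = pipe(
    "a cat holding a sign that says hello world",
    guidance_scale=3.5,
    num_inference_steps=50,
).images[0]
image.save("flux_dev.png")
```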