Share and discover more about AI with social posts from the community.
๐—˜๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ป๐—ด ๐˜†๐—ผ๐˜‚๐—ฟ ๐—›๐—ง๐— ๐—Ÿ ๐˜„๐—ฒ๐—ฏ๐—ฝ๐—ฎ๐—ด๐—ฒ๐˜€ ๐˜๐—ผ ๐—บ๐—ฎ๐—ฟ๐—ธ๐—ฑ๐—ผ๐˜„๐—ป ๐—ถ๐˜€ ๐—ป๐—ผ๐˜„ ๐—ฝ๐—ผ๐˜€๐˜€๐—ถ๐—ฏ๐—น๐—ฒ ๐—ฒ๐—ป๐—ฑ-๐˜๐—ผ-๐—ฒ๐—ป๐—ฑ ๐˜„๐—ถ๐˜๐—ต ๐—ฎ ๐˜€๐—ถ๐—บ๐—ฝ๐—น๐—ฒ ๐—Ÿ๐—Ÿ๐— ! ๐Ÿ‘

Jina just released Reader-LM, that handles the whole pipeline of extracting markdown from HTML webpages.

A while ago, Jina had released a completely code-based deterministic program to do this extraction, based on some heuristics: e.g., “if the text is in a <p> tag, keep it, but if it’s hidden behind another, remove it”.
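
To make the "heuristics" idea concrete, here is a minimal sketch of such a rule-based extractor. It is not Jina's actual program: the class name `ParagraphExtractor`, the helper `html_to_paragraphs`, and the specific rules (keep `<p>` text, drop anything under a `hidden` attribute or `display:none` style) are assumptions chosen for illustration, using only Python's standard library.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Toy heuristic: keep text inside <p>, drop anything under a hidden element."""

    # Elements that never get a closing tag, so they must not be pushed on the stack.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []        # one (tag, is_hidden) entry per open element
        self.paragraphs = []   # extracted paragraph texts
        self._buffer = None    # collects text while inside a visible <p>

    def _inside_hidden(self):
        return any(hidden for _, hidden in self.stack)

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "hidden" in attrs or "display:none" in style
        self.stack.append((tag, hidden))
        if tag == "p" and not self._inside_hidden():
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None
        # Pop back to the most recent matching open tag, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self._buffer is not None and not self._inside_hidden():
            self._buffer.append(data)


def html_to_paragraphs(html):
    """Return the visible paragraphs of `html`, separated by blank lines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Every new page layout tends to need another rule like these, which is the kind of brittleness that motivates replacing the rules with a trained model.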

🤔 But they received complaints from readers: some found it too detailed, others not detailed enough, depending on the pages.

➡️ So they decided maybe heuristics were not enough: instead, they tried to train an LLM to do the complete extraction. This LLM does not need to be very strong, but it should handle a very long context: it’s a challenging, “shallow-but-wide” architecture.

๐—ง๐—ฒ๐—ฐ๐—ต๐—ป๐—ถ๐—ฐ๐—ฎ๐—น ๐—ถ๐—ป๐˜€๐—ถ๐—ด๐—ต๐˜๐˜€:
2๏ธโƒฃ models: Reader-LM-0.5B and 1.5B
โš™๏ธ Two stages of training: first, short and simple HTML to get the basics, then ramp up to longer and harder HTML up to 128k tokens
๐Ÿ”Ž Use contrastive search for decoding: this empirically reduces โ€œrepeating outputโ€ issues
โžก๏ธ Their models beat much larger models at HTML extraction ๐Ÿ”ฅ
๐Ÿค— Weights available on HF (sadly cc-by-nc license):
jinaai/reader-lm-1.5b
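
As a rough illustration of why contrastive search helps with repetition, here is a toy sketch of its selection rule: among the top-k candidate tokens, pick the one that balances model probability against similarity to what was already generated. The function names, the 2-d "hidden states" and the default `alpha` are illustrative assumptions, not Reader-LM's actual decoding code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_select(candidates, context_states, alpha=0.6):
    """Pick one token among the top-k candidates.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    alpha: weight of the degeneration penalty (0 falls back to greedy choice).
    """
    def score(candidate):
        _, prob, hidden = candidate
        # Penalty: how close this candidate is to anything already in the context.
        penalty = max((cosine(hidden, h) for h in context_states), default=0.0)
        return (1 - alpha) * prob - alpha * penalty

    return max(candidates, key=score)[0]
```

With `alpha=0` the most probable token always wins; raising `alpha` pushes the choice away from tokens whose representation duplicates the existing context, which is where repeated output comes from.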
Hugging Face presents FineVideo 🎥! Unlocking the next generation of Video understanding 🚀

🤯 3400 hours of annotated Creative Commons videos with rich character descriptions, scene splits, mood, and content descriptions per scene as well as QA pairs.
🔥
@mfarre processed over 2M videos of YouTube-CC to make this incredibly powerful selection.

Very psyched to fine-tune idefics on this dataset. ⚡️
Explore the videos:
HuggingFaceFV/FineVideo-Explorer
๐Ž๐ฉ๐ž๐ง๐€๐ˆ ๐Ÿ๐ข๐ง๐š๐ฅ๐ฅ๐ฒ ๐ซ๐ž๐ฏ๐ž๐š๐ฅ๐ฌ โ€œ๐Ÿ“โ€: ๐œ๐ซ๐š๐ณ๐ฒ ๐œ๐ก๐š๐ข๐ง-๐จ๐Ÿ-๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ-๐ญ๐ฎ๐ง๐ž๐ ๐ฆ๐จ๐๐ž๐ฅ >> ๐†๐๐“-๐Ÿ’๐จ ๐Ÿ’ฅ

OpenAI had hinted at a mysterious โ€œproject strawberryโ€ for a long time: ๐˜๐—ต๐—ฒ๐˜† ๐—ฝ๐˜‚๐—ฏ๐—น๐—ถ๐˜€๐—ต๐—ฒ๐—ฑ ๐˜๐—ต๐—ถ๐˜€ ๐—ป๐—ฒ๐˜„ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐—ฐ๐—ฎ๐—น๐—น๐—ฒ๐—ฑ โ€œ๐—ผ๐Ÿญโ€ ๐Ÿญ๐—ต๐—ผ๐˜‚๐—ฟ ๐—ฎ๐—ด๐—ผ, ๐—ฎ๐—ป๐—ฑ ๐˜๐—ต๐—ฒ ๐—ฝ๐—ฒ๐—ฟ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐—ถ๐˜€ ๐—ท๐˜‚๐˜€๐˜ ๐—บ๐—ถ๐—ป๐—ฑ-๐—ฏ๐—น๐—ผ๐˜„๐—ถ๐—ป๐—ด.

๐Ÿคฏ Ranks among the top 500 students in the US in a qualifier for the USA Math Olympiad
๐Ÿคฏ Beats human-PhD-level accuracy by 8% on GPQA, hard science problems benchmark where the previous best was Claude 3.5 Sonnet with 59.4.
๐Ÿคฏ Scores 78.2% on vision benchmark MMMU, making it the first model competitive w/ human experts
๐Ÿคฏ GPT-4o on MATH scored 60% โ‡’ o1 scores 95%

How did they pull this off? Sadly OpenAI keeps increasing their performance in “making cryptic AF reports to not reveal any real info”, so here are excerpts:

💬 “o1 uses a chain of thought when attempting to solve a problem. Through reinforcement learning, o1 learns to hone its chain of thought and refine the strategies it uses. It learns to recognize and correct its mistakes.”

And of course, they decide to hide the content of this precious Chain-of-Thought. Would it be for maximum profit? Of course not, you awful capitalist, it’s to protect users:

💬 “We also do not want to make an unaligned chain of thought directly visible to users.”

They’re right, it would certainly have hurt my feelings to see the internals of this model tearing apart math problems.

🤔 I suspect it could be not only CoT, but also some agentic behaviour where the model can just call a code executor. The kind of score improvement they show certainly looks like the ones you see with agents.

This model will be immediately released for ChatGPT and some “trusted API users”.

Let’s start cooking to release the same thing in 6 months! 🚀
I believe Hugging Face should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, gradio themes, but it turns out everything slowed down. There are no more Hugging Face events.
📢 The Three-hop (💡aspect + 🤔opinion + 🧠reason) Chain-of-Thought concept combined with an LLM is a decent approach for reasoning about the emotions of participants in textual dialogues.
Delighted to share the tutorial video, which covers:
✅ The proper application of LLMs to implicit IR
✅ Ways of aligning different information types (causes and states) within the same LLM
✅ Launching your LLM in Google Colab for extracting characters' emotions in dialogues 🧪

🎥: https://www.youtube.com/watch?v=vRVDQa7vfkU

Project: https://github.com/nicolay-r/THOR-ECAC
Paper: https://aclanthology.org/2024.semeval-1.4/
Model card:
nicolay-r/flan-t5-emotion-cause-thor-base
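
For readers who want a feel for the three-hop idea, here is a minimal sketch of chaining three prompts, each one feeding on the previous answer. The prompt wording, the `three_hop_emotion` name and the `llm` callable (any function mapping a prompt string to an answer string) are assumptions for illustration; the actual prompts and pipeline live in the THOR-ECAC project linked above.

```python
def three_hop_emotion(llm, dialogue, utterance):
    """Chain three prompts: aspect -> opinion -> emotion.

    `llm` is any callable taking a prompt string and returning a string.
    """
    # Hop 1: what is the utterance about?
    aspect = llm(
        f"Dialogue:\n{dialogue}\n\n"
        f'Which aspect is being discussed in the utterance "{utterance}"?'
    )
    # Hop 2: what does the speaker think about that aspect?
    opinion = llm(
        f"Dialogue:\n{dialogue}\n\n"
        f'The utterance "{utterance}" is about {aspect}. '
        "What opinion does the speaker hold about it?"
    )
    # Hop 3: which emotion follows from the aspect and the opinion?
    emotion = llm(
        f"Dialogue:\n{dialogue}\n\n"
        f'The utterance "{utterance}" is about {aspect}, '
        f"and the speaker's opinion is: {opinion}. "
        "Which emotion does the speaker feel, and why?"
    )
    return {"aspect": aspect, "opinion": opinion, "emotion": emotion}
```

Any text-generation backend can be plugged in as `llm`, for example a thin wrapper around a locally loaded model.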
The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data 🤗

The training code, dataset and model weights are open and freely available on HF, and the training was run on H100 GPUs provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer 🦥

Link to the base model:
louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1


Link to the instruct model:
louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct


Link to the dataset:
louisbrulenaudet/Romulus-cpt-fr


Please note that these models have not been aligned for the production of usable texts as they stand, and will certainly need to be refined for the desired tasks in order to produce satisfactory results.

https://cdn-uploads.huggingface.co/production/uploads/6459fa0f5b3111fbe83286e1/n_KKbhGEDZg-2NMBu3OGo.jpeg
> ๐—ช๐—ฎ๐—ป๐˜ ๐˜๐—ผ ๐—ธ๐—ป๐—ผ๐˜„ ๐—ต๐—ผ๐˜„ ๐—บ๐˜‚๐—ฐ๐—ต ๐—ฎ๐—ป ๐—”๐—ฃ๐—œ ๐—Ÿ๐—Ÿ๐—  ๐—ฐ๐—ฎ๐—น๐—น ๐—ฐ๐—ผ๐˜€๐˜๐˜€ ๐˜†๐—ผ๐˜‚?

I've just made this Space that gets you the API price for any LLM call, for nearly all inference providers out there!

This is based on a comment by @victor under my HF Post a few months back, and leverages BerriAI's data for LLM prices.

Check it out here 👉
m-ric/text_to_dollars
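
The arithmetic behind such a price lookup is simple enough to sketch. This is not the Space's code: the function name `llm_call_cost` and the per-token prices in the usage line are made-up placeholders, and real prices have to come from a provider's price list.

```python
def llm_call_cost(input_tokens, output_tokens, input_price_per_token, output_price_per_token):
    """Cost of one LLM call, in the currency the prices are expressed in.

    Providers usually bill prompt (input) and completion (output) tokens
    at different rates, so the two are priced separately.
    """
    return input_tokens * input_price_per_token + output_tokens * output_price_per_token
```

For example, with hypothetical prices of 2e-6 per input token and 6e-6 per output token, `llm_call_cost(1000, 500, 2e-6, 6e-6)` comes to roughly 0.005.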
Auto-regressive LMs have ruled, but encoder-based architectures like GLiNER are proving to be just as powerful for information extraction while offering better efficiency and interpretability. 🔍✨

Past encoder backbones were limited by small pre-training datasets and old techniques, but with innovations like LLM2Vec, we've transformed decoders into high-performing encoders! 🔄💡

What’s New?
🔹Converted Llama & Qwen decoders to advanced encoders
🔹Improved GLiNER architecture to be able to work with rotary positional encoding
🔹New GLiNER (zero-shot NER) & GLiClass (zero-shot classification) models

🔥 Check it out:

New models:
knowledgator/llm2encoder-66d1c76e3c8270397efc5b5e


GLiNER package: https://github.com/urchade/GLiNER

GLiClass package: https://github.com/Knowledgator/GLiClass

💻 Read our blog for more insights, and stay tuned for what’s next!
https://medium.com/@knowledgrator/llm2encoders-e7d90b9f5966
Free research tip:
Get used to writing the first draft of your paper in markdown using VS Code’s Jupyter notebook extension - it lets you do quick sanity checks with code and maths - an absolute AAA experience :)
made an image similarity demo to test out the
mistral-community/pixtral-12b-240910
model.

If anyone knows how to generate captions with it, please do let me know x 🚀

here's the demo :
Tonic/Pixtral


hope you like it 🤗
What if we asked the AI what it thought of our Hugging Face profile? 👹
I've released a new Space capable of doing it... watch out, it hits hard! 🥊

Try it now ➡️
enzostvs/hugger-roaster


Share your roast below 👇
If you are interested in deep reinforcement learning, find below my ICML paper on how we can detect adversaries in deep reinforcement learning:

Paper: Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions
Link: https://proceedings.mlr.press/v202/korkmaz23a.html
๐—”๐—ฟ๐—ฐ๐—ฒ๐—ฒ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ๐˜€ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ก๐—ผ๐˜ƒ๐—ฎ, ๐—ฏ๐—ฒ๐˜๐˜๐—ฒ๐—ฟ ๐—ณ๐—ถ๐—ป๐—ฒ-๐˜๐˜‚๐—ป๐—ฒ ๐—ผ๐—ณ ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ-๐Ÿณ๐Ÿฌ๐—•!

2๏ธโƒฃ versions: 70B and 8B
๐Ÿง  Trained by distilling logits from Llama-3.1-405B
๐Ÿฅ Used a clever compression method to reduce dataset weight from 2.9 Petabytes down to 50GB (may share it in a paper)
โš™๏ธ Not all benchmarks are improved: GPQA and MUSR go down a slight bit
๐Ÿค— 8B weights are available on HF (not the 70B)

Read their blog post 👉 https://blog.arcee.ai/arcee-supernova-training-pipeline-and-model-composition/
Model weights (8B) 👉
arcee-ai/Llama-3.1-SuperNova-Lite
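
"Distilling logits" generally means training the student to match the teacher's full output distribution rather than only its top answer. The sketch below shows the textbook formulation, a KL divergence between temperature-softened distributions. It is an assumption that Arcee's pipeline resembles this, and the function names and default temperature are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) for one token position.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's low-probability alternatives.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. Storing full teacher logits for every token is what makes such datasets enormous, hence the interest in compressing them.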
🚀 Sentence Transformers v3.1 is out! Featuring a hard negatives mining utility to get better models out of your data, a new strong loss function, training with streaming datasets, custom modules, bug fixes, small additions and docs changes. Here are the details:

⛏ Hard Negatives Mining Utility: Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. They're difficult for a model to distinguish from the correct answer, often resulting in a stronger model after training.
📉 New loss function: This loss function works very well for symmetric tasks (e.g. clustering, classification, finding similar texts/paraphrases) and a bit less so for asymmetric tasks (e.g. question-answer retrieval).
💾 Streaming datasets: You can now train with the datasets.IterableDataset, which doesn't require downloading the full dataset to disk before training. As simple as "streaming=True" in your "datasets.load_dataset".
🧩 Custom Modules: Model authors can now customize a lot more of the components that make up Sentence Transformer models, allowing for a lot more flexibility (e.g. multi-modal, model-specific quirks, etc.)
✨ New arguments to several methods: encode_multi_process gets a progress bar, push_to_hub can now be done to different branches, and CrossEncoders can be downloaded to specific cache directories.
🐛 Bug fixes: Too many to name here, check out the release notes!
📝 Documentation: A particular focus on clarifying the batch samplers in the Package Reference this release.

Check out the full release notes here ⏭️: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.0

I'm very excited to hear your feedback, and I'm looking forward to the future changes that I have planned, such as ONNX inference! I'm also open to suggestions for new features: feel free to send me your ideas.
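
To illustrate what hard negative mining does conceptually, here is a small standalone sketch on toy vectors. It is not the Sentence Transformers utility: the name `pick_hard_negatives`, its parameters and the hand-written embeddings are assumptions for illustration. In practice the embeddings would come from a model and the library's own utility would do the mining.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_hard_negatives(anchor, positive, candidates, num_negatives=2, margin=0.0):
    """Return the names of the candidates most similar to the anchor.

    candidates: dict mapping a name to its embedding vector.
    Candidates that score at or above the positive (minus `margin`) are skipped,
    since they are likely unlabeled positives rather than true negatives.
    """
    ceiling = cosine(anchor, positive) - margin
    scored = [(cosine(anchor, vec), name) for name, vec in candidates.items()]
    kept = [(sim, name) for sim, name in scored if sim < ceiling]
    kept.sort(reverse=True)  # most similar (hardest) first
    return [name for _, name in kept[:num_negatives]]
```

The ceiling check is the design choice that matters most: without it, near-duplicates of the correct answer get mined as negatives and teach the model the wrong thing.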
Please check the Open Source AI Network: we mapped the top 500 HF users
based on their followers' profiles.

The map can be found here:
bunkalab/mapping_the_OS_community
Finally tried Kotaemon, an open-source RAG tool for document chat!

With local models, it's free and private. Perfect for journalists and researchers.

I put Kotaemon to the test with EPA's Greenhouse Gas Inventory. It accurately answered questions on the CO2 percentage in 2022 emissions and compared 2022 vs 2021 data.

🛠 Kotaemon's no-code interface makes it user-friendly.
- Use your own models or APIs from OpenAI or Cohere
- Great documentation & easy installation
- Multimodal capabilities + reranking
- View sources, navigate docs & create graphRAG

🌟 Kotaemon is gaining traction with 11.3k GitHub stars

Try the online demo:
cin-model/kotaemon-demo

GitHub: https://github.com/Cinnamon/kotaemon
Docs: https://cinnamon.github.io/kotaemon/usage/
Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting.

Whisper large-v3 has the same architecture as the previous large and large-v2 models, except for the following minor differences:

- The spectrogram input uses 128 Mel frequency bins instead of 80
- A new language token for Cantonese

The Whisper large-v3 model was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. The model was trained for 2.0 epochs over this mixture dataset.

The large-v3 model shows improved performance over a wide variety of languages, showing 10% to 20% reduction of errors compared to Whisper large-v2. For more details on the different checkpoints available, refer to the section Model details.

Disclaimer: Content for this model card has partly been written by the 🤗 Hugging Face team, and partly copied and pasted from the original model card.
MiniCPM3-4B is the 3rd generation of MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models.

Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set to enable more general usage. MiniCPM3-4B supports function call, along with code interpreter. Please refer to Advanced Features for usage guidelines.

MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can theoretically handle infinite context without requiring a huge amount of memory.
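
The general map-reduce pattern for long inputs can be sketched in a few lines: split the input into chunks that fit the context window, process each chunk independently, then merge the partial results. This is only the generic pattern, not the LLMxMapReduce framework itself. The function names are assumptions, and in a real setup `map_fn` and `reduce_fn` would be model calls.

```python
def split_into_chunks(tokens, window):
    """Split a token list into consecutive chunks of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def map_reduce(tokens, window, map_fn, reduce_fn):
    """Apply `map_fn` to each chunk, then combine the partial results with `reduce_fn`.

    Only one chunk has to fit in the context window at a time, so memory use
    depends on `window`, not on the total input length.
    """
    partials = [map_fn(chunk) for chunk in split_into_chunks(tokens, window)]
    return reduce_fn(partials)
```

For instance, `map_fn` could ask the model to extract the facts relevant to a question from one chunk, and `reduce_fn` could ask it to answer from the collected facts.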
FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. For more information, please read our blog post.

Key Features
- Cutting-edge output quality, second only to our state-of-the-art model FLUX.1 [pro].
- Competitive prompt following, matching the performance of closed source alternatives.
- Trained using guidance distillation, making FLUX.1 [dev] more efficient.
- Open weights to drive new scientific research, and empower artists to develop innovative workflows.
- Generated outputs can be used for personal, scientific, and commercial purposes as described in the FLUX.1 [dev] Non-Commercial License.