Share and discover more about AI with social posts from the Hugging Face community.
📣 Ai2 is releasing OLMoE!
OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters, and OLMoE is 100% open source: model, code, and datasets!
📄 Paper: https://arxiv.org/abs/2409.02060
🤗 Model: allenai/OLMoE-1B-7B-0924-Instruct
💾 Datasets: allenai/OLMoE-mix-0924
🙋 Demo: vilarin/OLMoE (https://huggingface.co/spaces/vilarin/OLMoE)
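A quick way to try it locally: a minimal sketch, assuming a recent transformers release with native OLMoE support (check the model card for exact requirements):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Chat-style prompt; only ~1B of the 7B parameters are active per token thanks to MoE routing.
messages = [{"role": "user", "content": "Explain mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```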
The new version of Enigma, our code-instruct specialist, is out now:
- ValiantLabs/Llama3.1-8B-Enigma is trained on code-instruct and general chat data.
- The updated code-instruct dataset is available now as well: sequelbox/Tachibana (https://huggingface.co/datasets/sequelbox/Tachibana)
More to come soon!
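Since Enigma is a Llama 3.1 8B fine-tune, the standard transformers chat pipeline should apply; a minimal sketch (illustrative, not from the model card):

```python
from transformers import pipeline

generate = pipeline("text-generation", model="ValiantLabs/Llama3.1-8B-Enigma", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}]
result = generate(messages, max_new_tokens=256)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply
```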
🥳 Transformers Agents now supports multi-agent systems!
Multi-agent systems were introduced in Microsoft's framework Autogen. It simply means having several agents working together to solve your task instead of only one: this paradigm empirically yields better performance on most benchmarks. The reason is conceptually simple: for many tasks, rather than using a do-it-all system, you would prefer to specialize units on sub-tasks. Here, having agents with separate tool sets and memories enables efficient specialization.
You can now easily build hierarchical multi-agent systems with transformers.agents (not in a release yet, so use the dev version).
To do so, encapsulate the agent in a ManagedAgent object. This object takes the arguments agent, name, and description; the name and description are then embedded in the manager agent's system prompt to let it know how to call the managed agent, just as we do for tools.
See the example in the image and the minimal sketch below! We'll keep building on this paradigm in the upcoming weeks.
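For those reading without the image, a minimal sketch based on the dev-version API (class and argument names follow the agents docs and may still evolve):

```python
from transformers.agents import ReactCodeAgent, HfApiEngine, ManagedAgent
from transformers.agents.search import DuckDuckGoSearchTool

llm_engine = HfApiEngine()  # defaults to a hosted chat model on the Hub

# A specialized worker agent with its own tool set.
web_agent = ReactCodeAgent(tools=[DuckDuckGoSearchTool()], llm_engine=llm_engine)

# Wrap it: the name and description get embedded in the manager's system prompt.
managed_web_agent = ManagedAgent(
    agent=web_agent,
    name="web_search",
    description="Runs web searches for you. Give it your query as an argument.",
)

# The manager can now call `web_search` just like a tool.
manager_agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, managed_agents=[managed_web_agent])
manager_agent.run("Who is the CEO of Hugging Face?")
```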
Read more in the docs 👉 https://github.com/huggingface/transformers/blob/main/docs/source/en/agents_advanced.md
Check out an advanced multi-agent system that tops the GAIA leaderboard 👉 https://github.com/aymeric-roucher/GAIA/blob/main/gaia_multiagent.py
My tool-calling playgrounds repo has been updated again to include FLUX.1-schnell and FLUX.1-dev image generation. This functionality is similar to using DALL·E 3 via the @ decorator in ChatGPT. Once the function is selected, the model will either extract or improve your prompt (depending on how you ask).
I have also included two notebooks that cover different ways to access Flux for your specific use case. The first covers how to access Flux via LitServe from Lightning AI. LitServe is a bare-bones inference engine with a focus on modularity rather than raw performance. It supports text-generation models as well as image generation, which is great for some use cases, but it does not provide the caching mechanisms of a dedicated image-generation solution.
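To give a flavor of the LitServe route, here is a condensed sketch of serving Flux behind it (the repo's notebook has the full version; the model id and response encoding are illustrative):

```python
import base64, io
import litserve as ls
import torch
from diffusers import FluxPipeline

class FluxAPI(ls.LitAPI):
    def setup(self, device):
        # Load the pipeline once per worker; schnell needs only a few steps.
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
        ).to(device)

    def decode_request(self, request):
        return request["prompt"]

    def predict(self, prompt):
        return self.pipe(prompt, num_inference_steps=4).images[0]

    def encode_response(self, image):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return {"image": base64.b64encode(buf.getvalue()).decode()}

if __name__ == "__main__":
    ls.LitServer(FluxAPI(), accelerator="auto").run(port=8000)
```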
Since dedicated caching mechanisms are so crucial to performance, I also included an example of how to integrate SwarmUI/ComfyUI, to leverage dedicated infrastructure that may already be running as part of your tech stack. The result is a Llama 3.1 agent capable of using specific ComfyUI JSON configs and many different settings.
Lastly, I tested the response times for each over a small batch of requests to simulate a speed test.
It quickly becomes clear how efficient caching mechanisms can greatly reduce generation time, even in a scenario where another model is called. An average 4.5-second response time is not bad at all when you consider that an 8B model is calling a 12B-parameter model for a secondary generation.
Repo: https://github.com/tdolan21/tool-calling-playground
LitServe: https://github.com/Lightning-AI/LitServe
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
Just wrapped up a deep dive into the latest lecture on building LLMs, such as ChatGPT, from the Stanford CS229 course. Here are my top takeaways:
- Understanding the Components: LLMs like ChatGPT, Claude, and others are more than just neural networks; they are a complex blend of architecture, training loss, data evaluation, and systems. Knowing how these components work together is key to improving and scaling these models.
- Scaling Matters: Performance improves predictably with more data, bigger models, and greater computational power. However, balancing these factors is crucial to avoid overfitting and resource waste.
- Data is King: LLMs are trained on trillions of tokens scraped from the internet, but the quality of this data matters immensely. Rigorous filtering and deduplication processes are essential to maintaining data integrity.
- Pre-Training vs. Post-Training: While pre-training equips the model with general knowledge, post-training (like RLHF) fine-tunes it to follow human-like responses, reducing toxic outputs and improving alignment with human values.
- Reinforcement Learning from Human Feedback (RLHF): This technique allows LLMs to maximize outputs that align with human preferences, making models more reliable and accurate.
- Why It Matters: Understanding these processes not only helps us appreciate the complexity behind our everyday AI tools but also highlights the challenges and opportunities in the ever-evolving field of AI.
Whether you're in tech, data science, or just AI-curious, staying updated on these advancements is crucial. LLMs are not just transforming industries; they're redefining the future of human-computer interaction!
I just realized this was almost 2 hours long...
Link: https://www.youtube.com/watch?v=9vM4p9NN0Ts
Meta Platforms to use social media posts from Europe to train AI
Meta will train its large language models using content that people in the European Union have chosen to share publicly on its platforms such as Instagram and Facebook. PHOTO: REUTERS
Facebook owner Meta Platforms plans to start incorporating social media content from Europe to train its generative artificial intelligence models, the company said on Monday (Jun 10).
Meta will train its Llama large language models using content that people in the European Union have chosen to share publicly on its platforms such as Instagram and Facebook, it said in a blog post.
The shift appears to bring the company's approach in Europe roughly in line with how it treats the data it feeds into its AI models from elsewhere around the world, despite earlier caution due to stringent EU privacy and transparency regulations.
Meta's top policy executive told Reuters in an interview in September that it uses public Facebook and Instagram posts to train its Llama models, while excluding private posts and messages shared only with friends.
As of April, when the company started releasing the latest versions of Llama, Meta was "still working on the right way to do this in Europe," its chief product officer told Reuters at the time.
The social media giant said last month that it would start notifying Facebook and Instagram users in the European region and the United Kingdom about how it uses public information shared on Meta's services to develop and improve AI.
Source: https://www.businesstimes.com.sg/companies-markets/telcos-media-tech/meta-platforms-use-social-media-posts-europe-train-ai
Chinese and US scientists create AI model to help develop new drugs
Victoria Bela
Published: 6:30pm, 26 Aug 2024
Scientists in China and the United States say they have developed a new artificial intelligence (AI) model that could help overcome some major challenges to drug development and discovery.
The model, called ActFound, outperforms competing models while bypassing challenges to using machine learning in bioactivity prediction, according to a paper published in Nature Machine Intelligence.
"Bioactivity encompasses various properties of compounds, such as their interaction with targets, impact on biological systems and therapeutic effects," said the researchers from Peking University, the University of Washington and AI tech firm INF Technology Shanghai.
The main challenges to using machine learning include limited data labelling and incompatibility between assays, the tests that measure the activity or potency of drugs.
The model not only outperforms competing AI models but also performs as well as free-energy perturbation (FEP), a traditional computational method.
Although FEP calculations have a high level of accuracy, the team warned that they "require extensive computational resources that are often not affordable for large-scale applications".
Such methods also often rely on hard-to-obtain three-dimensional protein structures, which require expensive equipment and extensive laboratory procedures to determine.
NVIDIA Launches NIM Microservices for Generative AI in Japan, Taiwan
Nations around the world are pursuing sovereign AI to produce artificial intelligence using their own computing infrastructure, data, workforce and business networks to ensure AI systems align with local values, laws and interests.
In support of these efforts, NVIDIA today announced the availability of four new NVIDIA NIM microservices that enable developers to more easily build and deploy high-performing generative AI applications.
The microservices support popular community models tailored to meet regional needs. They enhance user interactions through accurate understanding and improved responses based on local languages and cultural heritage.
In the Asia-Pacific region alone, generative AI software revenue is expected to reach $48 billion by 2030, up from $5 billion this year, according to ABI Research.
Llama-3-Swallow-70B, trained on Japanese data, and Llama-3-Taiwan-70B, trained on Mandarin data, are regional language models that provide a deeper understanding of local laws, regulations and other customs.
The RakutenAI 7B family of models, built on Mistral-7B, was trained on English and Japanese datasets, and is available as two different NIM microservices for Chat and Instruct. Rakuten's foundation and instruct models have achieved leading scores among open Japanese large language models, landing the top average score in the LM Evaluation Harness benchmark carried out from January to March 2024.
Training a large language model (LLM) on regional languages enhances the effectiveness of its outputs by ensuring more accurate and nuanced communication, as it better understands and reflects cultural and linguistic subtleties.
The models offer leading performance for Japanese and Mandarin language understanding, regional legal tasks, question-answering, and language translation and summarization compared with base LLMs like Llama 3.
Nations worldwide, from Singapore, the United Arab Emirates, South Korea and Sweden to France, Italy and India, are investing in sovereign AI infrastructure.
The new NIM microservices allow businesses, government agencies and universities to host native LLMs in their own environments, enabling developers to build advanced copilots, chatbots and AI assistants.
Source: https://blogs.nvidia.com/blog/nim-microservices-generative-ai/
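Since NIM microservices expose an OpenAI-compatible endpoint, a self-hosted deployment can be queried with the standard openai client. A minimal sketch (the local URL and model id are assumptions for illustration, not from the article):

```python
from openai import OpenAI

# Point the client at your own NIM deployment (hypothetical local endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="llama-3-swallow-70b",  # illustrative id for the Japanese regional model
    messages=[{"role": "user", "content": "日本の祝日について教えてください。"}],  # a question in Japanese
    max_tokens=128,
)
print(response.choices[0].message.content)
```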
Goldman, Nomura tap Meta Llama AI models
In the 18 months since launch, the mostly free, open source Llama models have seen nearly 350 million downloads and have been taken up by several major firms, including in financial services.
In a progress report, Meta says that Goldman Sachs' GS AI Platform allows the bank's engineers to use Llama models for various use cases, including information extraction from documents.
Meanwhile, Nomura uses Llama on AWS to achieve faster innovation, transparency, bias guardrails, and performance across text summarisation, code generation, log analysis, and document processing.
Meta has ploughed billions of dollars into AI but is taking a different approach to rivals such as OpenAI with its open source model.
In a July letter, Mark Zuckerberg argued that open source AI is good for Meta because it prevents the firm getting locked into a competitor's closed ecosystem.
In addition, he wrote: "The bottom line is that open source AI represents the world's best shot at harnessing this technology to create the greatest economic opportunity and security for everyone."
Source: https://www.finextra.com/newsarticle/44650/goldman-nomura-tap-meta-llama-ai-models
AI "tiger" MiniMax launches text-to-video-generating model to rival OpenAI's Sora
Xinmei Shen
Published: 7:00pm, 2 Sep 2024
Chinese artificial intelligence (AI) start-up MiniMax has launched video-01, its new text-to-video-generating model, heating up competition with other mainland tech firms that look to catch up with the advances made by OpenAI's Sora.
MiniMax, known as one of China's AI "tigers" along with Zhipu AI, Baichuan and Moonshot AI, made video-01 available to the public via its website after unveiling the new tool at the company's first developer conference in Shanghai on Saturday.
Video-01 enables a user to input a text description to create a video that is up to six seconds in length. The process from the text prompt to generating a video takes about two minutes.
MiniMax founder and chief executive Yan Junjie said at the event that video-01 is the first iteration of the firm's video-generating tool. He pointed out that future updates will enable users to generate videos from images and to edit these videos, according to local media reports.
Qwen2-VL-7B-Instruct
Introduction
We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
What's New in Qwen2-VL?
Key Enhancements:
SoTA understanding of images of various resolutions and ratios: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
Agent that can operate your mobile devices, robots, etc.: with its complex reasoning and decision-making abilities, Qwen2-VL can be integrated with devices like mobile phones and robots for automatic operation based on the visual environment and text instructions.
Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
Model Architecture Updates:
Naive Dynamic Resolution: Unlike before, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, offering a more human-like visual processing experience.
Multimodal Rotary Position Embedding (M-ROPE): Decomposes positional embedding into parts to capture 1D textual, 2D visual, and 3D video positional information, enhancing its multimodal processing capabilities.
We have three models with 2, 7 and 72 billion parameters. This repo contains the instruction-tuned 7B Qwen2-VL model. For more information, visit our Blog and GitHub: https://github.com/QwenLM/Qwen2-VL
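A minimal inference sketch, adapted from the common transformers multimodal pattern rather than copied verbatim from the model card (requires a transformers version with Qwen2-VL support):

```python
import requests
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

image = Image.open(requests.get("https://example.com/demo.jpg", stream=True).raw)  # any image URL

messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```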
🤖 The AI Scientist: an agentic, fully-automated research pipeline for under $15 per paper
Researchers have just created an AI system that can conduct entire research projects from start to finish, potentially revolutionizing how scientific discoveries are made.
It doesn't just assist with specific tasks - it automates the entire research process, from generating ideas to writing and reviewing papers.
It can: 1) brainstorm novel research directions, 2) write and execute code for experiments, visualize results, and gather references, and even 3) write up the findings in full academic paper format!
And it can do all this for under $15 per paper! 🤯
Key insights:
🧠 Generates novel research ideas across multiple topics (e.g. diffusion modeling, transformers, learning dynamics aka "grokking")
👨‍💻 Uses the open-source coding assistant Aider to implement ideas and run experiments. This is especially important since this agentic assistant can iterate when it fails somewhere.
📊 Visualizes results and plans follow-up experiments (up to 5 rounds)
✍️ Writes full academic papers, including finding references using the Semantic Search API
🕵️ Runs a simulated peer-review process to evaluate paper quality
💰 Total cost per paper is under $15. This system can generate "hundreds of interesting, medium-quality papers" in just a week!
⚠️ Still not ready to fill conferences with papers:
- Ideas generated in one domain tend to be repetitive across different runs, and even across different language models
- Does not use vision capabilities to fix visual issues in plots
- Models occasionally hallucinate entire results tables
- Only a few of the generated papers would actually meet the threshold for acceptance at a top AI conference
👉 Read their paper:
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery (2408.06292) https://huggingface.co/papers/2408.06292
Hey everyone 🤗!
Check out this awesome new model for object segmentation: finegrain/finegrain-object-cutter.
We (finegrain) have trained this new model in partnership with Nfinite, using some of their synthetic data, and the resulting model is incredibly accurate.
It's all open source under the MIT license (finegrain/finegrain-box-segmenter), complete with a test set tailored for e-commerce (finegrain/finegrain-product-masks-lite). Have fun experimenting with it!
🙋‍♂️ Hey there folks,
InkubaLM has been trained from scratch using 1.9 billion tokens of data for five African languages, along with English and French data, totaling 2.4 billion tokens. It is capable of understanding and generating content in five African languages: Swahili, Yoruba, Hausa, isiZulu and isiXhosa, as well as English and French.
Model: lelapa/InkubaLM-0.4B
Demo: Tonic/Inkuba-0.4B
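A minimal loading sketch for the curious (assumption: the checkpoint may ship custom modeling code, hence trust_remote_code=True; check the model card for the recommended usage):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lelapa/InkubaLM-0.4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Swahili prompt: "The weather today is"
inputs = tokenizer("Hali ya hewa leo ni", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```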
Spent a few minutes building an alternative to Character AI on top of Llama 3.1 405B through SambaNova's super-fast inference API.
Space: kz919/Persona-AI
API referral link: https://sambanova.ai/fast-api?api_ref=907266
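The API is OpenAI-compatible, so calling the 405B model looks roughly like this (a sketch; the base URL and model id are assumptions based on SambaNova's docs, so double-check them):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",  # assumed endpoint; verify in the docs
    api_key="YOUR_SAMBANOVA_API_KEY",
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-405B-Instruct",  # assumed model id
    messages=[
        {"role": "system", "content": "You are Sherlock Holmes. Stay in character."},
        {"role": "user", "content": "What can you deduce about me from this message?"},
    ],
)
print(response.choices[0].message.content)
```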
I started training a public LoRA style (two separate trainings, each on 4x A6000).
I'm experimenting with captions vs. no captions, so we will see which yields the best results for style training on FLUX.
Captions were generated with my multi-GPU batch Joycaption app.
I am showing 5 examples of what Joycaption generates on FLUX dev. The left images are the original style images from the dataset.
I used my multi-GPU Joycaption APP (used 8x A6000 for ultra fast captioning) : https://www.patreon.com/posts/110613301
I used my Gradio batch caption editor to edit some words and add activation token as ohwx 3d render : https://www.patreon.com/posts/108992085
The no-caption dataset uses only "ohwx 3d render" as the caption.
I am using my newest 4x_GPU_Rank_1_SLOW_Better_Quality.json on 4x A6000 GPUs and training 500 epochs on 114 images: https://www.patreon.com/posts/110879657
Total step count is 500 * 114 / 4 (4x GPU, batch size 1) = 14,250.
It is currently on track to take 37 hours if I don't terminate early.
I will save a checkpoint once every 25 epochs.
Full Windows Kohya LoRA training tutorial : https://youtu.be/nySGu12Y05k
I am still editing the full cloud tutorial.
Hopefully will share trained LoRA on Hugging Face and CivitAI along with full dataset including captions.
I got permission to share the dataset, but it can't be used commercially.
Also, I will hopefully share the full workflow on the CivitAI and Hugging Face LoRA pages.
# Excited to Share: New LLM Tokenization Tool - Convert Text to Tokens and Vice Versa! 🚀
I've just developed a powerful tool for anyone working with Language Models (LLMs) or diving into Natural Language Processing (NLP).
🚀 Introducing the LLM Tokenization tool: convert text to tokens and vice versa!
Key Features:
- Convert text to tokens and token IDs
- Reverse engineer: convert token IDs back to text
- Support for popular models: Llama 3 (will add more models iteratively)
- User-friendly Gradio interface for easy interaction
Whether you're debugging your NLP pipeline, exploring how different models tokenize text, or just curious about the inner workings of LLMs, this tool is for you!
👩‍💻 Tech Stack:
- Python
- Gradio for the web interface
- Hugging Face Transformers for tokenization
The application is deployed on Hugging Face Spaces as a Gradio app.
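Under the hood, the round-trip boils down to a few tokenizer calls; a minimal sketch (the Llama 3 repo is gated, but any tokenizer id you have access to works the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "Hello, world!"
ids = tokenizer.encode(text)                   # text -> token IDs
tokens = tokenizer.convert_ids_to_tokens(ids)  # token IDs -> token strings
roundtrip = tokenizer.decode(ids)              # token IDs -> text

print(ids)        # token IDs, starting with the BOS token
print(tokens)     # subword pieces (BPE tokenizers mark spaces, e.g. with 'Ġ')
print(roundtrip)  # reproduces the original text (plus any special tokens)
```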
👉 Try it out: https://lnkd.in/g6R5z9k2
#NLP #MachineLearning #AI #PythonDevelopment #OpenSourceAI
New Release: Major TOM Digital Elevation Model Expansion 🌍
Dataset: Major-TOM/Core-DEM
Today, together with the European Space Agency (ESA) and Adobe Research, we release a global expansion of Major TOM with GLO-30 DEM data.
You can now instantly access nearly 2M Major TOM samples with elevation data to build your next AI model for EO.
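A hedged sketch of pulling samples with the datasets library (streaming avoids a full download; column names vary by subset, so inspect a sample first):

```python
from datasets import load_dataset

ds = load_dataset("Major-TOM/Core-DEM", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # inspect the available columns before building your pipeline
```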
👉 Browse the data in our usual viewer app: Major-TOM/MajorTOM-Core-Viewer
Fantastic work championed by Paul Borne--Pons (@NewtNewt) 👏
My First Community Article! Selective Fine-tuning with Spectrum 🎯
Full walkthrough on how to get started with Spectrum and TRL for efficient fine-tuning.
👉 https://huggingface.co/blog/anakin87/spectrum
---
Looking to fine-tune Language Models efficiently and save on computational resources?
One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top.
It's quite effective and uses less GPU memory than full fine-tuning.
However, QLoRA applies Low-Rank Adaptation uniformly across the entire model.
What if we could identify the most informative layers and only fine-tune those? 🤔
This is exactly what Spectrum does!
🔬 Spectrum analyzes the weight matrices of all layers in a language model and calculates a signal-to-noise ratio (SNR) for each one.
(It uses Random Matrix Theory and Marchenko-Pastur distribution to distinguish signal from noise.)
🎯 Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).
You can then freeze the rest of the model and focus training on the chosen layers, as in the sketch below.
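A hedged sketch of that freezing step (the layer-name patterns are made up for illustration; Spectrum actually writes the selected layers to a YAML file you would read in):

```python
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

# Illustrative patterns; in practice these come from Spectrum's output file.
selected = [r".*layers\.3\.mlp\.down_proj.*", r".*layers\.7\.self_attn\.o_proj.*"]

# Freeze everything except the selected layers, then train as usual (e.g. with TRL's SFTTrainer).
for name, param in model.named_parameters():
    param.requires_grad = any(re.match(pattern, name) for pattern in selected)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters")
```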
📊 Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...
---
For a practical guide, check out the article above.
The Forward-Forward Algorithm 🤖
FFA replaces the forward and backward passes of backpropagation with two forward passes: one with positive (real) data and another with negative data. Each layer has its own objective function: to increase or decrease a "goodness" metric. The positive pass uses real data and adjusts weights to increase "goodness" in every hidden layer; the negative pass does the opposite.
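Here is a compact PyTorch sketch of one Forward-Forward layer, simplified from the linked notebook (threshold and learning rate are illustrative):

```python
import torch
import torch.nn.functional as F
from torch import nn

class FFLayer(nn.Linear):
    """One fully connected layer trained locally with the Forward-Forward rule."""

    def __init__(self, in_features, out_features, threshold=2.0, lr=0.03):
        super().__init__(in_features, out_features)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so the previous layer's goodness cannot leak through.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-4)
        return F.relu(super().forward(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness on positive (real) data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness on negative data
        # Push positive goodness above the threshold and negative goodness below it.
        loss = F.softplus(torch.cat([self.threshold - g_pos, g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach outputs: each layer learns locally, with no global backprop.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```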
I must say, reading and implementing a godfather paper feels quite fulfilling :)
Thank you, Prof. Geoffrey Hinton.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/mnist_the_forward_forward_algor