Share and discover more about AI with social posts from the Hugging Face community.
🔥 Where scaling laws are taking us: by 2028, AI clusters will reach the power consumption of entire countries
Reminder: "scaling laws" are empirical laws saying that if you keep multiplying your compute by x10, your models will mechanically keep getting better and better.
To give you an idea, GPT-3 can barely write sentences, and GPT-4, which used only x15 its amount of compute, already sounds much smarter than some of my friends (although it's not really, or at least I haven't tested them side-by-side). So you can imagine how far a x100 over GPT-4 can take us.
👉 As a result, tech titans are racing to build the biggest models, and for this they need gigantic training clusters.
The picture below shows the growth of training compute: it is increasing at a steady exponential rate of x10 every 2 years. So let's take this progress a bit further:
- 2022: GPT-4 starts training: 10^26 FLOP, at a cost of $100M
- 2024: today, companies start training on much larger clusters, like the "super AI cluster" of Elon Musk's xAI: 10^27 FLOP, $1B
- 2026: by then, clusters will require 1GW, i.e. around the full power generated by a nuclear reactor
- 2028: we reach cluster prices around $100 billion, using 10GW, more than the most powerful power stations currently in use in the US. This last size seems crazy, but Microsoft and OpenAI are already planning one.
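The bullet points above are just the x10-every-2-years trend compounded forward; here is a quick sketch (the 2022 starting point is the post's own figure, and costs scaling in lockstep with compute is an assumption for illustration):

```python
import math

# Compound the x10-every-2-years trend from the post's 2022 baseline
# (1e26 FLOP, $100M). Cost scaling in lockstep with compute is an
# assumption made for illustration only.
def extrapolate(year, base_year=2022, base_flop=1e26, base_cost=1e8):
    steps = (year - base_year) / 2  # one x10 step every 2 years
    return base_flop * 10**steps, base_cost * 10**steps

for year in (2022, 2024, 2026, 2028):
    flop, cost = extrapolate(year)
    print(year, f"{flop:.0e} FLOP", f"${cost:.0e}")
```

Running this reproduces the 2028 figure in the list: roughly 10^29 FLOP at a ~$100B price tag.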
Will AI clusters actually reach these crazy sizes, where they consume as much power as entire countries?
➡️ Three key ingredients of training might be a roadblock to scaling up:
💸 Money: but it's very unlikely, given the potential market size for AGI, that investors will lose interest.
⚡ Energy supply at a specific location
📚 Training data: we're already using 15 trillion tokens for Llama-3.1, when the Internet has something like 60 trillion.
🤔 I'd be curious to hear your thoughts: do you think we'll race all the way there?
How do I access Llama 3.1 70B in my Space?
This doesn't seem to work; can someone help me with working code?
from transformers import AutoConfig, AutoModelForCausalLM  # AutoModelForCausalLM import was missing

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-70B", revision="main")
# Recent transformers versions validate the llama3 rope_scaling dict and expect
# the extra keys below; the values shown are Llama 3.1's defaults.
config.rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-70B", config=config, token=True  # use_auth_token is deprecated
)
I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need indexing with image-text pairs, just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 directly feed images as-is to a vision language model, with no processing to text!
I used the ColPali implementation from the new Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
The timm leaderboard (timm/leaderboard) has been updated with the ability to select different hardware benchmark sets: RTX4090, RTX3090, two different CPUs, along with some NCHW/NHWC layout and torch.compile (dynamo) variations.
Also worth pointing out, there are three rather newish 'test' models that you'll see at the top of any samples/sec comparison:
* test_vit (timm/test_vit.r160_in1k)
* test_efficientnet (timm/test_efficientnet.r160_in1k)
* test_byobnet (timm/test_byobnet.r160_in1k, a mix of resnet, darknet, effnet/regnet-like blocks)
They are < 0.5M params, insanely fast, and originally intended for unit testing with real weights. They have awful ImageNet top-1; it's rare for anyone to bother training a model this small on ImageNet (the classifier is roughly 30-70% of the param count!). However, they are FAST on very limited hardware, and you can fine-tune them well on small data. Could be the model you're looking for?
Decided to check how many weights in a 70B F32 model would be squashed when converted to F16 (spoiler: it's shockingly few).
The reason for this comparison is that it should represent the same percentage of squashing as bf16 to fp16, since bf16 covers the same dynamic range as f32.
Had Claude make me a script, ran it on the new Reflection-70B, and these are the results:
Total weights: 70553706496
Fully representable: 70530215524
Squashed: 23490972
Percentage squashed: 0.03%
0.03%!!!!
A couple of things to note: this uses a roundtrip of F32 -> F16 -> F32 and then torch.isclose to account for rounding errors that come up by the very nature of extremely precise numbers, but it uses VERY small tolerances (rtol=1e-5, atol=1e-8).
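The check itself fits in a few lines; here is a NumPy sketch of the same roundtrip (the original script used torch.isclose, but the logic is identical):

```python
import numpy as np

def count_squashed(weights, rtol=1e-5, atol=1e-8):
    # Roundtrip F32 -> F16 -> F32, then count values that moved
    # beyond the (very tight) tolerances.
    roundtrip = weights.astype(np.float16).astype(np.float32)
    close = np.isclose(weights, roundtrip, rtol=rtol, atol=atol)
    return int((~close).sum())

# Exactly representable values survive; out-of-range ones (F16 max is
# 65504) overflow to inf and get counted as squashed.
w = np.array([1.0, 0.5, 70000.0], dtype=np.float32)
print(count_squashed(w), "of", w.size, "squashed")
```

On a real checkpoint you would run this per tensor over the safetensors shards and sum the counts.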
This is also examining EVERY weight that was stored at F32, and for most layers I was somewhere between 0% and 0.03% of weights being squashed, no major outliers.
Overall, I feel even safer converting to F16 for llama.cpp; the extremely small number of weights that fall outside the F16 range are likely so small that they don't actually play a role in the model's final output at inference anyway.
🎤 Introducing Hugging Face's Multilingual Speech-to-Speech! 🤗
💬 Our modular, cross-platform pipeline for running GPT4o-like experiences on device can now seamlessly switch languages mid-conversation, with an imperceptible 100ms delay.
🌟 Building on an amazing early reception, with 2,600 stars on GitHub 🌟
🌍 We are expanding the library to support multiple languages
🔥 Try it out with a flag: --language fr
🤯 Or don't set the flag and let the system detect the language
💡 What feature should we add next? https://cdn-uploads.huggingface.co/production/uploads/65d66b494bbd0d92b641cdbb/WbpkWi8OlJGXnL1kzmcqK.mp4
@victor Sorry for the repetitiveness.
I'm not sure if Posts is the right place to report such an error, but it seems to be a server error unrelated to the Zero GPU Space error the other day, and I don't know where else to report it.
Since this morning, I have been getting a strange error when running inference from a Space on Gradio 3.x.
Yntec (https://huggingface.co/Yntec) discovered it, but he doesn't have a Pro subscription, so I am reporting it on his behalf.
The error can be reproduced in the Spaces below. (Note: 1girl and other common prompts will show cached output, so experiment with unusual prompts.)
Thank you in advance.
John6666/blitz_diffusion_error
John6666/GPU-stresser-t2i-error
A few weeks ago, we uploaded the MERIT Dataset 🎒📃🏆 into Hugging Face 🤗!
Now, we are excited to share the MERIT Dataset paper via arXiv! 📄🚀
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts (2409.00447)
The MERIT Dataset is a fully synthetic, labeled dataset created for training and benchmarking LLMs on Visually Rich Document Understanding tasks. It is also designed to help detect biases and improve interpretability in LLMs, where we are actively working. ๐ง๐จ
MERIT contains synthetically rendered students' academic transcripts from different schools, in English and Spanish. We plan to expand the dataset into different contexts (synthetic medical/insurance documents, synthetic IDs, etc.). Want to collaborate? Do you have any feedback? 📧
Resources:
- Dataset: de-Rodrigo/merit
- Code and generation pipeline: https://github.com/nachoDRT/MERIT-Dataset
PS: We are grateful to Hugging Face 🤗 for providing the fantastic tools and resources we find on the platform and, more specifically, to @nielsr for sharing the fine-tuning/inference scripts we have used in our benchmark.
I am integrating Azure Cosmos DB, the database system that backs GPT conversations, into my workflow, and experimenting with new patterns to accelerate dataset evolution for evaluation and training of AI.
While initially using it for research prompts and research outputs using my GPT-4o client here which can interface and search ArXiv, I am excited to try out some new features specifically for AI at scale. Research on memory augmentation is shown.
awacke1/GPT-4o-omni-text-audio-image-video
awacke1/AzureCosmosDBUI https://huggingface.co/spaces/awacke1/GPT-4o-omni-text-audio-image-video
📣 Ai2 releasing OLMoE!
OLMoE-1B-7B-Instruct is a Mixture-of-Experts LLM with 1B active and 7B total parameters, and OLMoE is 100% open-source in model, code base, and datasets!
📄 Paper: https://arxiv.org/abs/2409.02060
🤗 Model: allenai/OLMoE-1B-7B-0924-Instruct
💾 Datasets: allenai/OLMoE-mix-0924
🙋 Demo: vilarin/OLMoE https://huggingface.co/spaces/vilarin/OLMoE
The new version of Enigma, our code-instruct specialist, is out now:
- ValiantLabs/Llama3.1-8B-Enigma is trained on code-instruct and general chat data.
- The updated code-instruct dataset is available now as well: sequelbox/Tachibana
More to come soon!
https://huggingface.co/datasets/sequelbox/Tachibana
🥳 Transformers Agents now supports multi-agent systems!
Multi-agent systems were introduced in Microsoft's framework Autogen. It simply means having several agents working together to solve your task instead of only one: this paradigm empirically yields better performance on most benchmarks. The reason for this better performance is conceptually simple: for many tasks, rather than using a do-it-all system, you would prefer to specialize units on sub-tasks. Here, having agents with separate tool sets and memories allows us to achieve efficient specialization.
You can now easily build hierarchical multi-agent systems with transformers.agents (not released yet, so use the dev version).
To do so, encapsulate the agent in a ManagedAgent object. This object needs the arguments agent, name, and description, which will then be embedded in the manager agent's system prompt to let it know how to call this managed agent, as we also do for tools.
See the example in the image! We'll keep building on this paradigm in the upcoming weeks 🚀
Read more in the docs 👉 https://github.com/huggingface/transformers/blob/main/docs/source/en/agents_advanced.md
Check out an advanced multi-agent system that tops the GAIA leaderboard 👉 https://github.com/aymeric-roucher/GAIA/blob/main/gaia_multiagent.py
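To make the delegation pattern concrete, here is a toy sketch of the idea (illustrative only, not the actual transformers.agents API: all class and method names below are made up):

```python
# Toy illustration of hierarchical multi-agent delegation: the manager
# embeds each managed agent's name/description in its system prompt,
# exactly like a tool description, and routes sub-tasks to it.
class ManagedAgent:
    def __init__(self, agent, name, description):
        self.agent = agent
        self.name = name
        self.description = description

    def __call__(self, request):
        return self.agent.run(request)  # delegate the sub-task

class ManagerAgent:
    def __init__(self, managed_agents):
        self.team = {m.name: m for m in managed_agents}

    def system_prompt(self):
        lines = ["You can call these team members like tools:"]
        lines += [f"- {m.name}: {m.description}" for m in self.team.values()]
        return "\n".join(lines)

    def delegate(self, name, request):
        return self.team[name](request)

class EchoAgent:  # stand-in for a real web-search agent
    def run(self, request):
        return f"searched: {request}"

manager = ManagerAgent([ManagedAgent(EchoAgent(), "web_search",
                                     "Runs web searches for you.")])
print(manager.system_prompt())
print(manager.delegate("web_search", "latest GAIA leaderboard"))
```

In the real library, the manager's LLM reads those embedded descriptions and decides itself when to call each managed agent.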
My tool-calling playground repo has been updated again to include flux1-schnell or flux1-dev image generation. This functionality is similar to using DALL-E 3 via the @ decorator in ChatGPT. Once the function is selected, the model will either extract or improve your prompt (depending on how you ask).
I have also included 2 notebooks that cover different ways to access Flux for your specific use case. The first method covers how to access Flux via LitServe from Lightning AI. LitServe is a bare-bones inference engine with a focus on modularity rather than raw performance. LitServe supports text generation models as well as image generation, which is great for some use cases, but it does not provide the caching mechanisms of a dedicated image generation solution.
Since dedicated caching mechanisms are so crucial to performance, I also included an example of how to integrate SwarmUI/ComfyUI to utilize a more dedicated infrastructure that may already be running as part of your tech stack. The result is a Llama-3.1 model capable of utilizing specific ComfyUI JSON configs and many different settings.
Lastly, I tested the response times for each over a small batch request to simulate a speed test.
It quickly becomes clear how efficient caching mechanisms can greatly reduce generation time, even in a scenario where another model is called. An average 4.5-second response time is not bad at all when you consider that an 8B model is calling a 12B-parameter model for a secondary generation.
Repo: https://github.com/tdolan21/tool-calling-playground
LitServe: https://github.com/Lightning-AI/LitServe
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
Just wrapped up a deep dive into the latest lecture on building LLMs, such as ChatGPT, from @Stanford CS229 course. Here are my top takeaways:
🔹 Understanding the Components: LLMs like ChatGPT, Claude, and others are more than just neural networks; they are a complex blend of architecture, training loss, data evaluation, and systems. Knowing how these components work together is key to improving and scaling these models.
🔹 Scaling Matters: Performance improves predictably with more data, bigger models, and greater computational power. However, balancing these factors is crucial to avoid overfitting and resource waste.
🔹 Data is King: LLMs are trained on trillions of tokens scraped from the internet, but the quality of this data matters immensely. Rigorous filtering and deduplication processes are essential to maintaining data integrity.
🔹 Pre-Training vs. Post-Training: While pre-training equips the model with general knowledge, post-training (like RLHF) fine-tunes it to follow human-like responses, reducing toxic outputs and improving alignment with human values.
🔹 Reinforcement Learning from Human Feedback (RLHF): This technique allows LLMs to maximize outputs that align with human preferences, making models more reliable and accurate.
💡 Why It Matters: Understanding these processes not only helps us appreciate the complexity behind our everyday AI tools but also highlights the challenges and opportunities in the ever-evolving field of AI.
Whether youโre in tech, data science, or just AI-curious, staying updated on these advancements is crucial. LLMs are not just transforming industries; theyโre redefining the future of human-computer interaction!
I just realized this was almost 2 hours long...
Link: https://www.youtube.com/watch?v=9vM4p9NN0Ts
Meta Platforms to use social media posts from Europe to train AI
Meta will train its large language models using content that people in the European Union have chosen to share publicly on its platforms such as Instagram, Facebook. PHOTO: REUTERS
FACEBOOK owner Meta Platforms plans to start incorporating social media content from Europe to train its generative artificial intelligence models, the company said on Monday (Jun 10).
Meta will train its Llama large language models using content that people in the European Union have chosen to share publicly on its platforms such as Instagram and Facebook, it said in a blog post.
The shift appears to bring the companyโs approach in Europe roughly in line with how it treats the data it feeds into its AI models from elsewhere around the world, despite earlier caution due to stringent EU privacy and transparency regulations.
Metaโs top policy executive told Reuters in an interview in September that it uses public Facebook and Instagram posts to train its Llama models, while excluding private posts and messages shared only with friends.
As of April, when the company started releasing the latest versions of Llama, Meta was โstill working on the right way to do this in Europe,โ its chief product officer told Reuters at the time.
The social media giant said last month that it would start notifying Facebook and Instagram users in the European region and the United Kingdom about how it uses public information shared on Meta's services to develop and improve AI.
https://www.businesstimes.com.sg/companies-markets/telcos-media-tech/meta-platforms-use-social-media-posts-europe-train-ai
Chinese and US scientists create AI model to help develop new drugs
Victoria Bela
Published: 6:30pm, 26 Aug 2024
Scientists in China and the United States say they have developed a new artificial intelligence (AI) model that could help overcome some major challenges to drug development and discovery.
The model, called ActFound, outperforms competing models while bypassing challenges to using machine learning in bioactivity prediction, according to a paper published in Nature Machine Intelligence.
โBioactivity encompasses various properties of compounds, such as their interaction with targets, impact on biological systems and therapeutic effects,โ said the researchers from Peking University, the University of Washington and AI tech firm INF Technology Shanghai.
The main challenges to using machine learning include limited data labelling and incompatibility between assays, the tests that measure the activity or potency of drugs.
The model not only outperforms competing AI models, but also functions as well as free-energy perturbation (FEP) โ a traditional computational method.
Although FEP calculations have a high level of accuracy, the team warned that they โrequire extensive computational resources that are often not affordable for large-scale applicationsโ.
Such methods often rely on three-dimensional protein structures, which can only be obtained using expensive equipment and extensive laboratory procedures.
NVIDIA Launches NIM Microservices for Generative AI in Japan, Taiwan
Nations around the world are pursuing sovereign AI to produce artificial intelligence using their own computing infrastructure, data, workforce and business networks to ensure AI systems align with local values, laws and interests.
In support of these efforts, NVIDIA today announced the availability of four new NVIDIA NIM microservices that enable developers to more easily build and deploy high-performing generative AI applications.
The microservices support popular community models tailored to meet regional needs. They enhance user interactions through accurate understanding and improved responses based on local languages and cultural heritage.
In the Asia-Pacific region alone, generative AI software revenue is expected to reach $48 billion by 2030 โ up from $5 billion this year, according to ABI Research.
Llama-3-Swallow-70B, trained on Japanese data, and Llama-3-Taiwan-70B, trained on Mandarin data, are regional language models that provide a deeper understanding of local laws, regulations and other customs.
The RakutenAI 7B family of models, built on Mistral-7B, was trained on English and Japanese datasets and is available as two different NIM microservices, for Chat and Instruct. Rakutenโs foundation and instruct models have achieved leading scores among open Japanese large language models, landing the top average score in the LM Evaluation Harness benchmark carried out from January to March 2024.
Training a large language model (LLM) on regional languages enhances the effectiveness of its outputs by ensuring more accurate and nuanced communication, as it better understands and reflects cultural and linguistic subtleties.
The models offer leading performance for Japanese and Mandarin language understanding, regional legal tasks, question-answering, and language translation and summarization compared with base LLMs like Llama 3.
Nations worldwide โ from Singapore, the United Arab Emirates, South Korea and Sweden to France, Italy and India โ are investing in sovereign AI infrastructure.
The new NIM microservices allow businesses, government agencies and universities to host native LLMs in their own environments, enabling developers to build advanced copilots, chatbots and AI assistants.
https://blogs.nvidia.com/blog/nim-microservices-generative-ai/
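As a rough illustration of what "hosting native LLMs in their own environments" looks like in practice: NIM microservices expose an OpenAI-compatible REST API, so a developer can query a locally deployed model over HTTP. The sketch below is a minimal example under stated assumptions; the endpoint URL, port and model name are illustrative placeholders, not details from the article.

```python
# Sketch: querying a self-hosted NIM microservice over its
# OpenAI-compatible chat-completions endpoint (stdlib only).
# The endpoint and model name are assumed, not from the article.
import json
from urllib import request

NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local deployment


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def ask(model: str, prompt: str) -> str:
    """POST the request to the microservice and return the reply text."""
    payload = json.dumps(build_chat_request(model, prompt)).encode("utf-8")
    req = request.Request(
        NIM_ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires a running NIM container serving a regional model,
    # e.g. one of the Llama-3-Swallow / Llama-3-Taiwan variants.
    print(ask("llama-3-swallow-70b", "What is the capital of Japan?"))
```

Because the API shape matches OpenAI's, existing client code can usually be pointed at the self-hosted endpoint by changing only the base URL.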
NEW Goldman, Nomura tap Meta Llama AI models
In the 18 months since launch, the mostly free open source Llama models have seen nearly 350 million downloads and been taken up by several major firms, including in financial services.
In a progress report, Meta says that Goldman Sachs' GS AI Platform allows the bank's engineers to use Llama models for various use cases, including information extraction from documents.
Meanwhile, Nomura uses Llama on AWS to achieve faster innovation, transparency, bias guardrails, and performance across text summarisation, code generation, log analysis, and document processing.
Meta has ploughed billions of dollars into AI but is taking a different approach to rivals such as OpenAI with its open source model.
In a July letter, Mark Zuckerberg argued that open source AI is good for Meta because it prevents the firm getting locked into a competitor's closed ecosystem.
In addition, he wrote: "The bottom line is that open source AI represents the worldโs best shot at harnessing this technology to create the greatest economic opportunity and security for everyone."
https://www.finextra.com/newsarticle/44650/goldman-nomura-tap-meta-llama-ai-models
AI โtigerโ MiniMax launches text-to-video-generating model to rival OpenAIโs Sora
Xinmei Shen
Published: 7:00pm, 2 Sep 2024
Chinese artificial intelligence (AI) start-up MiniMax has launched video-01, its new text-to-video-generating model, heating up competition with other mainland tech firms that look to catch up with the advances made by OpenAIโs Sora.
MiniMax โ known as one of Chinaโs AI โtigersโ along with Zhipu AI, Baichuan and Moonshot AI โ made video-01 available to the public via its website after unveiling the new tool at the companyโs first developer conference in Shanghai on Saturday.
Video-01 enables a user to input a text description to create a video that is up to six seconds in length. The process from the text prompt to generating a video takes about two minutes.
MiniMax founder and chief executive Yan Junjie said at the event that video-01 is the first iteration of the firmโs video-generating tool. He pointed out that future updates will enable users to generate videos from images and to edit these videos, according to local media reports.