Share and discover more about AI with social posts from the community.
Prompt caching with
@AnthropicAI


Production-ready LLM applications often involve long, static instructions in every prompt. Anthropic's new prompt caching feature reduces latency by up to 80% and costs by up to 90% on such prompts.

Try it out in LangChain today!

Python: langchain-anthropic==0.1.23
JS: @langchain/anthropic@0.2.15

Anthropic announcement: https://anthropic.com/news/prompt-caching
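
A minimal sketch of marking a long, static system prompt as cacheable with langchain-anthropic; the model name, beta header value, and default_headers usage are assumptions to verify against the Anthropic and LangChain docs:

```python
# Hedged sketch: cache a long, static system prompt with langchain-anthropic.
# The beta header value and model name below are assumptions to verify.
from langchain_anthropic import ChatAnthropic

long_static_instructions = "(several thousand tokens of static instructions...)"

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20240620",
    default_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": long_static_instructions,
                # Mark this block as cacheable so repeated calls reuse it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    },
    {"role": "user", "content": "Summarize the instructions above in one sentence."},
]

response = llm.invoke(messages)
print(response.content)
```
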
Even with preference alignment, LLMs can be enticed into harmful behavior via adversarial prompts 😈.

🚨 Breaking: our theoretical findings confirm:
LLM alignment is fundamentally limited!

More details on the framework, statistical bounds, and phenomenal defense results 👇🏻
mistral-7b-instruct-v0.1-awq
Beta
Model ID: @hf/thebloke/mistral-7b-instruct-v0.1-awq

Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant.

Properties
Task Type: Text Generation

Use the Playground
Try out this model with the Workers AI Model Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.
https://playground.ai.cloudflare.com/?model=@hf/thebloke/mistral-7b-instruct-v0.1-awq
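
Outside the Playground, a minimal sketch of calling this model through the Workers AI REST API; the account ID and API token are placeholders, and the exact request and response schema should be checked against the Workers AI docs:

```python
# Hedged sketch: run the model via the Workers AI REST API.
# ACCOUNT_ID and API_TOKEN are placeholders you must supply yourself.
import requests

ACCOUNT_ID = "your-account-id"
API_TOKEN = "your-api-token"
MODEL = "@hf/thebloke/mistral-7b-instruct-v0.1-awq"

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Explain AWQ quantization in one sentence."}]},
)
resp.raise_for_status()
print(resp.json()["result"]["response"])
```
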
hermes-2-pro-mistral-7b
Beta Function calling
Model ID: @hf/nousresearch/hermes-2-pro-mistral-7b

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

Properties
Task Type: Text Generation

Use the Playground
Try out this model with the Workers AI Model Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the Model Playground
https://playground.ai.cloudflare.com/?model=@hf/nousresearch/hermes-2-pro-mistral-7b
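
Because this model supports function calling, a rough sketch of passing a tool definition through the same REST API follows; the tools schema and the tool_calls response field are assumptions to verify against the Workers AI docs:

```python
# Hedged sketch: function calling with hermes-2-pro via the Workers AI REST API.
# The "tools" schema and the "tool_calls" response field are assumptions.
import requests

ACCOUNT_ID = "your-account-id"   # placeholder
API_TOKEN = "your-api-token"     # placeholder
MODEL = "@hf/nousresearch/hermes-2-pro-mistral-7b"

payload = {
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [
        {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["result"].get("tool_calls"))
```
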
Cloudflare WARP client allows you to protect corporate devices by securely and privately sending traffic from those devices to Cloudflare’s global network, where Cloudflare Gateway can apply advanced web filtering. The WARP client also makes it possible to apply advanced Zero Trust policies that check for a device’s health before it connects to corporate applications.

Downloading and deploying the WARP client to your devices enhances the protection Cloudflare Zero Trust can provide to your users and data, wherever they are.

Here are a few ways in which the WARP client provides in-depth protection for your organization:

WARP lets you enforce security policies anywhere.
With the WARP client deployed in the Gateway with WARP mode, Gateway policies are not location-dependent — they can be enforced anywhere.

WARP lets you enforce HTTP filtering and user-based policies.
Download and install the WARP client to enable Gateway features such as Anti-Virus scanning, HTTP filtering, Browser Isolation, and identity-based policies.

WARP lets you have in-depth, application-specific insights.
With WARP installed on your corporate devices, you can populate the Zero Trust Shadow IT Discovery page with visibility down to the application and user level. This makes it easy to discover, analyze, and take action on any shadow IT your users may be using every day.

WARP allows you to build rich device posture rules.
The WARP client provides advanced Zero Trust protection by making it possible to check for device posture. By setting up device posture checks, you can build Zero Trust policies that check for a device’s location, disk encryption status, OS version, and more.
https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/
Build serverless applications and deploy instantly across the globe for exceptional performance, reliability, and scale.

Available on all plans
Cloudflare Workers provides a serverless execution environment that allows you to create new applications or augment existing ones without configuring or maintaining infrastructure.

Cloudflare Workers runs on Cloudflare’s global network in hundreds of cities worldwide, offering both Free and Paid plans.
https://developers.cloudflare.com/workers/
Build real-time serverless video, audio and data applications.

Cloudflare Calls is infrastructure for real-time audio/video/data applications. It allows you to build real-time apps without worrying about scaling or regions. It can act as a selective forwarding unit (WebRTC SFU), as a fanout delivery system for broadcasting (WebRTC CDN) or anything in between.

Cloudflare Calls runs on Cloudflare’s global cloud network in hundreds of cities worldwide.
https://developers.cloudflare.com/calls/
Cloudflare for Platforms
Cloudflare’s offering for SaaS businesses.

Extend Cloudflare’s security, reliability, and performance services to your customers with Cloudflare for Platforms. With Cloudflare for SaaS and Workers for Platforms, your customers can build the custom logic they need right into your application.

Products
Cloudflare for SaaS
Cloudflare for SaaS allows you to extend the security and performance benefits of Cloudflare’s network to your customers via their own custom or vanity domains.

Use Cloudflare for SaaS
Workers for Platforms
Workers for Platforms helps you deploy serverless functions programmatically on behalf of your customers.

Use Workers for Platforms
https://developers.cloudflare.com/cloudflare-for-platforms/
The Cloudflare China Network is a package of selected Cloudflare performance and security products running on data centers located in mainland China and operated by Cloudflare’s partner JD Cloud.

The data centers cover the most populated regions in China. Combining Cloudflare’s technological leadership and JD Cloud’s local operations expertise, the Cloudflare China Network is designed to meet the needs for secure, reliable, and fast-performing content delivery in China. You can use the same configurations that you use with Cloudflare everywhere else in the world and with the same dashboard experience.

Main features
The Cloudflare China Network provides:

A single solution for both performance improvement and security services such as WAF, DDoS protection, and bot management.
A unified experience for managing network traffic and security posture. You can manage all configurations on the same dashboard.
The same customer support capabilities as Cloudflare’s global network. You may also have access to premium service and local language support.
https://developers.cloudflare.com/china-network/
Speed up your online experience with Cloudflare’s public DNS resolver.

Available on all plans
1.1.1.1 is Cloudflare’s public DNS resolver. It offers a fast and private way to browse the Internet. DNS resolvers translate domains like cloudflare.com into the IP addresses necessary to reach the website (like 104.16.123.96).

Unlike most DNS resolvers, 1.1.1.1 does not sell user data to advertisers. 1.1.1.1 has also been measured to be the fastest DNS resolver available — it is deployed in hundreds of cities worldwide, and has access to the addresses of millions of domain names on the same servers it runs on.

1.1.1.1 is completely free. Setting it up takes minutes and requires no special software.
https://developers.cloudflare.com/1.1.1.1/
Dataset containing synthetically generated (by GPT-3.5 and GPT-4) short stories that only use a small vocabulary.

Described in the following paper: https://arxiv.org/abs/2305.07759.

The models referred to in the paper were trained on TinyStories-train.txt (the file tinystories-valid.txt can be used for validation loss). These models can be found on Huggingface, at roneneldan/TinyStories-1M/3M/8M/28M/33M/1Layer-21M.

Additional resources: tinystories_all_data.tar.gz - contains a superset of the stories together with metadata and the prompt that was used to create each story.

TinyStoriesV2-GPT4-train.txt is a new version of the dataset based on generations by GPT-4 only (the original dataset also has generations by GPT-3.5, which are of lesser quality). It contains, as a subset, all the GPT-4-generated examples in TinyStories.txt, but is significantly larger.

Evaluation_prompts.yaml: List of prompts used to evaluate our models (see paper).
https://huggingface.co/datasets/roneneldan/TinyStories
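
A minimal sketch of loading the dataset with the Hugging Face datasets library; the split and column names below are assumptions based on the dataset card:

```python
# Hedged sketch: load TinyStories from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("roneneldan/TinyStories")
print(ds)                      # available splits and row counts
print(ds["train"][0]["text"])  # each example is a short story in a "text" column (assumed)
```
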
Dataset Card for Alpaca-Cleaned
Repository: https://github.com/gururise/AlpacaDataCleaned
Dataset Description
This is a cleaned version of the original Alpaca Dataset released by Stanford. The following issues have been identified in the original release and fixed in this dataset:

Hallucinations: Many instructions in the original dataset referenced data on the internet, which just caused GPT-3 to hallucinate an answer.
https://huggingface.co/datasets/yahma/alpaca-cleaned
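
For reference, a quick sketch of loading the cleaned dataset and inspecting one record; the instruction/input/output columns follow the standard Alpaca format:

```python
# Hedged sketch: load alpaca-cleaned and inspect one record.
from datasets import load_dataset

ds = load_dataset("yahma/alpaca-cleaned", split="train")
example = ds[0]
print(example["instruction"])
print(example["input"])   # may be empty when no extra context is needed
print(example["output"])
```
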
Simplified image-based implementation that trains on a CPU with live preview support - very satisfying to watch :)

I-JEPA is the image-based version of JEPA (Joint-Embedding Predictive Architecture, an alternative to autoregressive LLM architectures) pioneered by Professor Yann LeCun.

At a high level, I-JEPA predicts image segment representations (Target) based on representations of other segments within the same image (Context). It consists of three key components: a context encoder, a target encoder, and a predictor.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/mnist_ijepa.ipynb
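
To make the idea concrete, here is a heavily simplified, hypothetical sketch of one I-JEPA training step; the encoder and predictor modules, shapes, and loss choice are placeholders, not the notebook's actual code:

```python
# Hedged, simplified sketch of one I-JEPA training step.
# context_encoder, target_encoder, and predictor are placeholder nn.Modules.
import torch
import torch.nn.functional as F

def ijepa_step(context_encoder, target_encoder, predictor,
               patches, context_idx, target_idx):
    """patches: (batch, num_patches, dim) patch embeddings of one image."""
    # 1. Encode only the visible Context patches.
    ctx = context_encoder(patches[:, context_idx])

    # 2. Target representations come from a frozen (EMA) copy of the encoder
    #    applied to the full image, then sliced to the masked Target blocks.
    with torch.no_grad():
        tgt = target_encoder(patches)[:, target_idx]

    # 3. The predictor maps Context representations (plus Target positions)
    #    to predictions of the Target representations.
    pred = predictor(ctx, target_idx)

    # 4. Regression loss in representation space -- no pixel reconstruction.
    return F.smooth_l1_loss(pred, tgt)

@torch.no_grad()
def ema_update(target_encoder, context_encoder, momentum=0.996):
    # The target encoder tracks the context encoder via an exponential moving average.
    for t, c in zip(target_encoder.parameters(), context_encoder.parameters()):
        t.mul_(momentum).add_(c, alpha=1 - momentum)
```
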
Hugging Face Bolsters AI Infrastructure With XetHub Acquisition
Hugging Face, a leading platform for open-source machine learning projects, has made a strategic acquisition of XetHub, a Seattle-based startup specializing in file management for artificial intelligence projects. This move aims to significantly enhance Hugging Face's AI storage capabilities, enabling developers to work with larger models and datasets more efficiently.

XetHub was founded by Yucheng Low, Ajit Banerjee and Rajat Arya, who previously worked at Apple, where they built and scaled Apple's internal ML infrastructure. The founders have a strong background in machine learning and data management, with Yucheng Low having co-founded Turi, a transformative ML/AI company acquired by Apple in 2016.

The startup has successfully raised $7.5 million in seed financing led by Seattle-based venture capital firm Madrona Ventures.
https://www.forbes.com/sites/janakirammsv/2024/08/12/hugging-face-bolsters-ai-infrastructure-with-xethub-acquisition/
Hugging Face acquires Seattle data storage startup XetHub


Hugging Face has acquired XetHub, a data storage and collaboration startup founded by former Apple engineers that helped developers streamline the process of building machine learning and artificial intelligence applications.

“Together we share a vision of democratizing AI to enable everyone to host, share, and build models and datasets,” XetHub CEO Yucheng Low wrote on LinkedIn. “At Hugging Face, we will continue to pursue this vision, integrating our technologies into the Hugging Face Hub to create the future of AI collaboration.”

The deal reflects the demand for data storage and compute needs driven by the AI boom, and is the latest in a string of smaller AI startups — and executives from those startups — getting gobbled up by larger companies.

New York-based Hugging Face offers a number of developer tools to help companies test, store, and run large-scale AI models that require substantial compute and storage capabilities. It raised a $235 million Series D round a year ago and acquired another developer tool startup called Argilla for $10 million in June.

Hugging Face isn’t sharing terms of the XetHub deal but says it’s the company’s largest acquisition to date. XetHub will add 14 employees to Hugging Face’s workforce.

Low co-founded Seattle-based XetHub in 2021. He previously worked at Turi, a Seattle machine learning startup that was acquired in 2016 by Apple, where he then spent nearly five years. Low is a graduate of Carnegie Mellon University, where he earned a PhD in machine learning.

XetHub co-founder Rajat Arya also worked at Turi and Apple. He previously worked at Amazon Web Services and Microsoft.

The startup’s third co-founder, Ajit Banerjee, is a former senior software architect at Apple who co-founded a Seattle job interviewing and matching startup called TalentWorks.

“When Amazon recommends a product, or Gmail auto-suggests an email reply, or Apple’s FaceID unlocks a screen: these are examples of intelligent applications,” Arya told GeekWire last year. “Up until now, this AI-powered functionality has been limited to those big players, but increasingly we’re going to see every business adopt AI.”

XetHub raised a $7.5 million round last year from Madrona Ventures.
https://www.geekwire.com/2024/hugging-face-acquires-seattle-data-storage-startup-xethub/
Hugging Face acquires XetHub from ex-Apple researchers for large AI model hosting
Hugging Face today announced it has acquired Seattle-based XetHub, a collaborative development platform founded by former Apple researchers to help machine learning teams work more efficiently with large datasets and models.

While the exact value of the deal remains undisclosed, CEO Clem Delangue said in an interview with Forbes that this is the largest acquisition the company has made thus far.
https://venturebeat.com/ai/hugging-face-acquires-xethub-from-ex-apple-researchers-for-large-ai-model-hosting/
NVIDIA NIM Now Available on Hugging Face with Inference-as-a-Service
Hugging Face has announced the launch of an inference-as-a-service capability powered by NVIDIA NIM. This new service will provide developers easy access to NVIDIA-accelerated inference for popular AI models.

The new service allows developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from NVIDIA NIM microservices running on NVIDIA DGX Cloud. This will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production.

The Hugging Face inference-as-a-service on NVIDIA DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment. The NVIDIA DGX Cloud platform is purpose-built for generative AI and provides scalable GPU resources that support every step of AI development, from prototype to production.

To use the service, users must have access to an Enterprise Hub organization and a fine-grained token for authentication. The NVIDIA NIM endpoints for supported generative AI models can be found on the model page of the Hugging Face Hub.
https://www.infoq.com/news/2024/08/nvidia-nim-huggingface/
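
Per the announcement, access is token-gated through the Hub; a rough sketch of calling a NIM-backed endpoint with an OpenAI-compatible client follows, where the base URL is a placeholder to verify on the model page and the token must be a fine-grained Hugging Face token:

```python
# Hedged sketch: chat completion against a NIM-backed endpoint on DGX Cloud.
# The base_url below is a placeholder -- take the exact endpoint from the
# model page on the Hugging Face Hub; an Enterprise Hub fine-grained token is required.
from openai import OpenAI

client = OpenAI(
    base_url="https://huggingface.co/api/integrations/dgx/v1",  # placeholder, verify on the Hub
    api_key="hf_xxx",  # fine-grained Hugging Face token
)

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # example model from the Llama 3 family
    messages=[{"role": "user", "content": "Say hello from DGX Cloud."}],
    max_tokens=64,
)
print(completion.choices[0].message.content)
```
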
Serverless Inference API has shorter context length than the model?
I tried Llama 3.1 70B with the Hugging Face Serverless Inference API but got an error with 20k tokens, even though the model has a 128k context length. Does Hugging Face limit the context length on top of the model, and is there any workaround for this?
Need Help Integrating black-forest-labs/FLUX.1-dev Text-to-Image Model in Next.js App
I'm trying to build a Next.js app using the black-forest-labs/FLUX.1-dev text-to-image model, but I've been struggling to get it working for the past few days. I've tried using the Next.js AI SDK and the HfInference library, but I'm not sure how to properly integrate them. Has anyone had experience with this or could offer some guidance? Any help would be greatly appreciated!
Difficulties to deal with HuggingFace transformers
Hi,

I am currently working with Hugging Face's transformers library. It is somewhat convenient to load models. I am not a troll. But the deeper I go, the more difficulties arise, and I get the impression that the API is not well designed.

It allows the same option to be set in various places, and it is not documented how they interact. For instance, there seems to be no uniform way to handle special tokens such as EOS. One can set these tokens 1. in the model, 2. in the tokenizer, and 3. in the pipeline. It is unclear to me how exactly these options interact, and the documentation does not say anything about it either. Sometimes parameters are just ignored, and the library does not warn you about it. For instance, the tokenizer parameter "add_eos_token" seems to have no effect in some cases, and I am not the only one with this issue (https://github.com/huggingface/transformers/issues/30947).
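
For illustration, a rough sketch of the three places an EOS token can end up being configured; the model ID is just an example, and which setting actually takes effect can vary by model:

```python
# Hedged sketch: three places an EOS token can be configured in transformers.
# Which one actually takes effect can depend on the model.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "mistralai/Mistral-7B-Instruct-v0.1"  # example model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# 1. On the tokenizer
tokenizer.eos_token = "</s>"

# 2. On the model's generation config
model.generation_config.eos_token_id = tokenizer.eos_token_id

# 3. At the pipeline / generate call
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
out = pipe("Hello", max_new_tokens=20, eos_token_id=tokenizer.eos_token_id)
print(out[0]["generated_text"])
```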

It seems to depend strongly on the model where and how you actually configure options, what effect they have, and which settings work at all. This somewhat contradicts the purpose of the API: it wants to make it easy to switch from one model to another, giving the impression that everything is controlled by just the model ID. But when you go deeper, it turns out that many small things have to be tailored to the model (even when restricted to a certain class, such as generative text LLMs). A look into the source code of the transformers library confirms that it makes distinctions depending on the model ID; that is, internally the library seems to exploit knowledge about the different models. That's not what one expects from a platform that claims to work with arbitrary models.

Anyone having thoughts like this?