Serverless Inference API has shorter context length than the model?
I tried Llama 3.1 70B through the Hugging Face Serverless Inference API, but a request with roughly 20k input tokens failed with an error, even though the model supports a 128k context length. Does Hugging Face cap the context length on top of the model's own limit, and is there a workaround?
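For reference, here is a minimal sketch of the kind of call that produces the error. The model id, client usage, and prompt construction are my assumptions; the original request may have been made differently:

```python
from huggingface_hub import InferenceClient

# Hypothetical repro: a long prompt sent to the Serverless Inference API.
# The model id below is an assumption; substitute the one you actually used.
client = InferenceClient(model="meta-llama/Llama-3.1-70B-Instruct")

# Pad out a prompt to roughly 20k tokens (repetition is just for illustration).
long_prompt = "lorem ipsum " * 10_000

# This call fails with a validation error about the input exceeding the
# API's configured maximum input length, even though the model itself
# advertises a 128k context window.
output = client.text_generation(long_prompt, max_new_tokens=256)
print(output)
```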