Understanding the json format response with HF's Serverless Inference API 🤗

As it stands, there seems to be an inconsistency with the OpenAI documentation when it comes to requesting a JSON response format through the InferenceClient chat completion API.

After investigating the InferenceClient source code, I am sharing the official solution, which relies on a JSON Schema. This constrains the structure of the response and simplifies parsing in an automated pipeline for extracting metadata or other information:

from huggingface_hub import InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {
        "role": "user",
        "content": "I saw a puppy a cat and a raccoon during my bike ride in the park. What did I see and when?",
    },
]

# The Serverless Inference API expects a JSON Schema under the "value" key,
# with "type" set to "json" (not "json_object" as with the OpenAI client).
response_format = {
    "type": "json",
    "value": {
        "properties": {
            "location": {"type": "string"},
            "activity": {"type": "string"},
            "animals_seen": {"type": "integer", "minimum": 1, "maximum": 5},
            "animals": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["location", "activity", "animals_seen", "animals"],
    },
}

response = client.chat_completion(
    messages=messages,
    response_format=response_format,
    max_tokens=500,
)

print(response.choices[0].message.content)
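
Because the output is constrained by the schema, the message content can be parsed straight into a Python dictionary. A minimal sketch, reusing the response object from above (the example values in the comments are purely illustrative):

import json

# The constrained output is a JSON string matching the schema above,
# so it parses directly into a Python dictionary.
data = json.loads(response.choices[0].message.content)

print(data["location"])       # e.g. "park"
print(data["activity"])       # e.g. "bike ride"
print(data["animals_seen"])   # an integer between 1 and 5
print(data["animals"])        # e.g. ["puppy", "cat", "raccoon"]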

As a reminder, JSON mode is activated with the OpenAI client as follows:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=[...],  # the messages must mention "JSON" for json_object mode to be accepted
    response_format={"type": "json_object"},
)

One question remains open, however, and perhaps the community can answer it: an incompatibility seems to persist when generating lists of dictionaries, and for now, producing flat dictionaries appears to be the only option that works.
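
For concreteness, the failing case corresponds to a schema of the following shape, where the array items are objects rather than plain strings (the field names here are hypothetical, purely for illustration):

response_format = {
    "type": "json",
    "value": {
        "properties": {
            # An array of objects (a "list of dictionaries") rather than an
            # array of plain strings -- the case that currently seems to fail.
            "animals": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "species": {"type": "string"},
                    },
                    "required": ["name", "species"],
                },
            },
        },
        "required": ["animals"],
    },
}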