Model Card for deberta-v3-base-prompt-injection-v2
This model is a fine-tuned version of microsoft/deberta-v3-base specifically developed to detect and classify prompt injection attacks which can manipulate language models into producing unintended outputs.
Introduction
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The deberta-v3-base-prompt-injection-v2 model is designed to enhance security in language model applications by detecting these malicious interventions.
Model Details
Fine-tuned by: Protect AI
Model type: deberta-v3-base
Language(s) (NLP): English
License: Apache License 2.0
Finetuned from model: microsoft/deberta-v3-base
Intended Uses
This model classifies inputs into benign (0) and injection-detected (1).
Limitations
deberta-v3-base-prompt-injection-v2 is highly accurate in identifying prompt injections in English. It does not detect jailbreak attacks or handle non-English prompts, which may limit its applicability in diverse linguistic environments or against advanced adversarial techniques.
Additionally, we do not recommend using this scanner for system prompts, as it produces false-positives.
https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2
This model is a fine-tuned version of microsoft/deberta-v3-base specifically developed to detect and classify prompt injection attacks which can manipulate language models into producing unintended outputs.
Introduction
Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses. The deberta-v3-base-prompt-injection-v2 model is designed to enhance security in language model applications by detecting these malicious interventions.
Model Details
Fine-tuned by: Protect AI
Model type: deberta-v3-base
Language(s) (NLP): English
License: Apache License 2.0
Finetuned from model: microsoft/deberta-v3-base
Intended Uses
This model classifies inputs into benign (0) and injection-detected (1).
Limitations
deberta-v3-base-prompt-injection-v2 is highly accurate in identifying prompt injections in English. It does not detect jailbreak attacks or handle non-English prompts, which may limit its applicability in diverse linguistic environments or against advanced adversarial techniques.
Additionally, we do not recommend using this scanner for system prompts, as it produces false-positives.
https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2