AWS Certified Cloud Practitioner

Get started today

Ultimate access to all questions.

Deep dive into the quiz with AI chat providers.

We prepare a focused prompt with your quiz and certificate details so each AI can offer a more tailored, in-depth explanation.

Which prompting attack directly exposes the configured behavior of a large language model (LLM)?

Exam-Like

Community

RRitesh

Last updated: December 8, 2025 at 19:13

Prompted persona switches

Exploiting friendliness and trust

Ignoring the prompt template

Extracting the prompt template

Explanation:

Option D is CORRECT because extracting the prompt template involves crafting inputs to directly reveal the underlying instructions or configurations of the LLM, such as system-level prompts or hidden instructions. This type of attack directly exposes the model's configured behavior, potentially revealing sensitive or proprietary information.

Explanation:

Extracting the prompt template is a specific type of prompt injection attack where an adversary crafts inputs designed to make the LLM reveal its underlying prompt template, system instructions, or configuration details. This directly exposes the model's configured behavior because:

Direct Exposure of Configuration: The attack aims to reveal the exact instructions, rules, and constraints that have been programmed into the LLM
System-Level Prompt Revelation: Many LLMs have hidden system prompts that define their behavior, role, and limitations - extracting these reveals the core configuration
Proprietary Information Disclosure: Prompt templates often contain proprietary logic, business rules, or sensitive instructions that should remain confidential
Behavioral Understanding: By extracting the prompt template, attackers gain deep insight into how the model is designed to behave in various scenarios

Why other options are incorrect:

A. Prompted persona switches: This involves making the LLM adopt different personas or roles, but doesn't necessarily expose the underlying configuration
B. Exploiting friendliness and trust: This leverages the model's programmed helpfulness to extract information, but focuses on content extraction rather than configuration exposure
C. Ignoring the prompt template: This refers to bypassing or overriding the intended instructions, but doesn't involve extracting or revealing the template itself

Security Implications:

Confidentiality Breach: Exposes proprietary prompt engineering work
Attack Surface Expansion: Revealed configurations can be used to craft more sophisticated attacks
Behavioral Manipulation: Understanding the exact configuration enables precise manipulation of model responses

Powered ByGPT-5.2

Comments

Loading comments...