Self Prompting

Self-prompting injects a short, internal prompt and then deterministically constrains the model’s next-token choices using a strategy. It’s a single-question controller that (a) writes the prompt, (b) gates logits according to a compiled strategy, and (c) signals completion and optional cleanup.

Concept

  • Prompt: text or tokens injected into the stream to “ask yourself a question”.
  • Strategy: a compiled constraint (e.g., ChoicesStrat, UntilStrat, CharsStrat, ListStrat) that provides allowed/disallowed token sets per step.
  • Controller: maintains per-request state, produces actions (ForceTokens, AdjustedLogits, Backtrack), and exposes the answer tokens.

Lifecycle

Given a SelfPrompt(prompt, strategy, ...) instance:

  • Prefilled

    • Compile strategy with the runtime tokenizer.
    • Tokenize prompt; reset state (prompt_emitted=false, outstanding_forced=0, completed=false).
  • ForwardPass

    • If a backtrack was scheduled on completion, emit it now and mark complete.
    • If prompt not emitted yet: emit ForceTokens(prompt_tokens) and set outstanding_forced.
    • If still flushing forced tokens: noop until Added(forced=True) decrements the counter.
    • If a completion suffix is pending: emit ForceTokens(suffix_tokens) (optional).
    • Otherwise, compute allowed/disallowed sets from the strategy and emit AdjustedLogits(masked_logits); optionally set token_temp=0 for argmax sampling.
  • Added

    • If tokens were forced: decrement outstanding_forced; do not mutate answer state.
    • Else: strategy.step(token), append token to answer_tokens.
    • When strategy.is_complete becomes true:
      • Optionally stage a completion suffix (e.g., a newline) if not handled by the strategy.
      • If erase mode is active, schedule a Backtrack to run on the next ForwardPass.
      • Otherwise, mark completed=true and continue with noop.
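The ForwardPass priority order above can be sketched as a small decision function. All names here (forward_pass_action, the state keys) are illustrative, not the SDK's API; per-request state is modeled as a plain dict:

```python
# Minimal sketch of the ForwardPass decision order:
# scheduled backtrack > prompt emission > forced-token flush >
# completion suffix > strategy-driven logit masking.
# Illustrative only; the real controller's internals may differ.

def forward_pass_action(state):
    """Return (action_name, payload) for the next ForwardPass."""
    if state.get("backtrack_scheduled"):
        state["backtrack_scheduled"] = False
        state["completed"] = True
        return ("Backtrack", state.get("backtrack_n"))
    if not state["prompt_emitted"]:
        state["prompt_emitted"] = True
        state["outstanding_forced"] = len(state["prompt_tokens"])
        return ("ForceTokens", state["prompt_tokens"])
    if state["outstanding_forced"] > 0:
        return ("Noop", None)  # wait for Added(forced=True) to drain the count
    if state.get("suffix_pending"):
        state["suffix_pending"] = False
        state["outstanding_forced"] = len(state["suffix_tokens"])
        return ("ForceTokens", state["suffix_tokens"])
    return ("AdjustedLogits", None)  # see Logit Masking below
```

Note that forcing the prompt and flushing it are separate steps: the controller emits ForceTokens once, then noops until every Added(forced=True) callback has decremented the counter.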

State Model (per request)

  • compiled: compiled strategy instance
  • strat_state: runtime state object from strategy.start(...)
  • prompt_tokens: tokenized prompt
  • prompt_emitted: whether we have forced the prompt
  • outstanding_forced: how many forced tokens we’re still consuming
  • completed: whether the self-prompt has finished
  • answer_tokens: collected non-forced tokens emitted while strategy ran
  • suffix_tokens / suffix_pending: optional completion suffix bookkeeping
  • backtrack_n / backtrack_reinject / backtrack_scheduled: erase scheduling
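Collected into one container, the per-request fields might look like the following hypothetical dataclass; the real SDK may store them differently:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-request state container mirroring the fields listed above.
@dataclass
class RequestState:
    compiled: object = None                  # compiled strategy instance
    strat_state: object = None               # runtime state from strategy.start(...)
    prompt_tokens: list[int] = field(default_factory=list)
    prompt_emitted: bool = False
    outstanding_forced: int = 0
    completed: bool = False
    answer_tokens: list[int] = field(default_factory=list)
    suffix_tokens: list[int] = field(default_factory=list)
    suffix_pending: bool = False
    backtrack_n: int = 0
    backtrack_reinject: Optional[list[int]] = None
    backtrack_scheduled: bool = False
```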

Logit Masking

On each ForwardPass (after the prompt is flushed and no suffix is pending):

  • Ask the strategy for allowed and disallowed sets.
  • If both are empty: noop (no constraints that step).
  • Else build a masked logits tensor by applying mask_value (default -1e9):
    • If allowed is non-empty: mask everything except allowed (and also apply any explicit disallowed).
    • If allowed is empty but disallowed is non-empty: mask just disallowed.
  • Emit AdjustedLogits(masked, token_temp=0.0) for argmax, or omit token_temp to keep normal sampling.
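A minimal sketch of this masking rule, assuming token ids index a flat logits list; mask_logits and MASK_VALUE are illustrative names, not the SDK API:

```python
MASK_VALUE = -1e9  # default mask value per the section above

def mask_logits(logits, allowed=frozenset(), disallowed=frozenset()):
    """Return a masked copy of logits, or None when no constraint applies."""
    if not allowed and not disallowed:
        return None  # noop: no constraints this step
    masked = list(logits)
    for tok in range(len(masked)):
        if allowed and tok not in allowed:
            # allowed is non-empty: mask everything outside it
            masked[tok] = MASK_VALUE
        elif tok in disallowed:
            # also apply any explicit disallowed tokens
            masked[tok] = MASK_VALUE
    return masked
```

Passing token_temp=0.0 alongside the masked logits would then make sampling pick the argmax among the surviving tokens.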

Completion & Erase

When the strategy completes:

  • Completion suffix

    • If the strategy did not already append a terminator and a suffix is configured, stage suffix_tokens so the next ForwardPass can force them.
    • If using ListStrat(end_with=...), the strategy consumes the suffix; avoid double-inserting a suffix in that case.
  • Erase modes

    • NONE: keep prompt and answer in the output.
    • PROMPT: remove just the prompt and reinject the answer tokens.
    • ALL: remove both prompt and answer (useful for “classify then hide”).

Erase is implemented by scheduling a Backtrack(n, reinject?) for the next ForwardPass, where n = len(prompt_tokens) + len(answer_tokens) + len(suffix_tokens), and reinject is answer_tokens when the mode is PROMPT, otherwise None.
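That bookkeeping can be sketched as follows. EraseMode is redefined locally so the snippet is self-contained, and plan_backtrack is an illustrative name, not the SDK API:

```python
from enum import Enum

class EraseMode(Enum):  # local stand-in for quote_mod_sdk.self_prompt.EraseMode
    NONE = "none"
    PROMPT = "prompt"
    ALL = "all"

def plan_backtrack(mode, prompt_tokens, answer_tokens, suffix_tokens):
    """Return (n, reinject) for the Backtrack scheduled on completion,
    or None when nothing should be erased."""
    if mode is EraseMode.NONE:
        return None
    n = len(prompt_tokens) + len(answer_tokens) + len(suffix_tokens)
    # PROMPT keeps the answer visible by reinjecting it after the erase.
    reinject = list(answer_tokens) if mode is EraseMode.PROMPT else None
    return (n, reinject)
```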

Dynamic Choices

For ChoicesStrat (and ListStrat containing ChoicesStrat elements) the available responses can be updated on the fly:

# Update a flat choices strategy
sp.refresh_responses(["Yes", "No"], request_id)
 
# Update a nested list’s element choices by index
sp.refresh_responses(["A", "B", "C"], request_id=request_id, idx=0)

When responses change, the compiled state for that request is cleared and recompiled on the next event.

API Summary

  • handle_prefilled(event, tokenizer) -> None
  • handle_forward_pass(event, actions, tokenizer) -> ModAction
  • handle_added(event, actions, tokenizer) -> None
  • is_complete(request_id) -> bool
  • answer_tokens(request_id) -> list[int] | None
  • refresh_responses(responses, request_id?, idx?) -> None
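A hypothetical dispatcher that routes runtime events to these handlers might look like this; the handle_* method names come from the summary above, while the event class names and return conventions are assumptions:

```python
# Illustrative event routing; assumes event classes named
# Prefilled, ForwardPass, and Added exist in the host runtime.

def route(sp, event, actions, tokenizer):
    """Dispatch one runtime event to the SelfPrompt handlers.
    Only ForwardPass produces an action to return to the runtime."""
    kind = type(event).__name__
    if kind == "Prefilled":
        sp.handle_prefilled(event, tokenizer)
    elif kind == "ForwardPass":
        return sp.handle_forward_pass(event, actions, tokenizer)
    elif kind == "Added":
        sp.handle_added(event, actions, tokenizer)
    return None
```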

Usage Examples

Classification

from quote_mod_sdk.self_prompt import SelfPrompt, EraseMode
from quote_mod_sdk.strategies.strategy_constructor import ChoicesStrat
 
sp = SelfPrompt(
  prompt={"text": " Choose: yes/no "},
  strategy=ChoicesStrat(["yes", "no"]),
  erase=EraseMode.ALL,  # hide prompt + answer after decision
)
 
# In your mod: route Prefilled/ForwardPass/Added to sp.*
# When sp.is_complete(req): decode sp.answer_tokens(req) to get the answer

Extract Until Tag

from quote_mod_sdk.self_prompt import SelfPrompt
from quote_mod_sdk.strategies.strategy_constructor import UntilStrat
from quote_mod_sdk.strategies.primitives import UntilEndType
 
sp = SelfPrompt(
  prompt={"text": " Wrap in <answer>...</answer> and stop: "},
  strategy=UntilStrat("<answer>", UntilEndType.TAG, "</answer>"),
)
 
# After completion, strip tags from decoded answer if desired

List of Choices with Terminator

from quote_mod_sdk.self_prompt import SelfPrompt
from quote_mod_sdk.strategies.strategy_constructor import ListStrat, ChoicesStrat
 
sp = SelfPrompt(
  prompt={"text": " Pick up to 3 colors: "},
  strategy=ListStrat(
    elements=ChoicesStrat(["red", "green", "blue"]),
    open="[", close="]", sep=", ", wrap='"', end_with="\n",
    min=1, max=3,
  ),
)

See the Strategies page for supported types and configuration details.