Bring-Your-Own-Endpoint (BYOE)¶

Specsmith ships first-class support for self-hosted OpenAI-v1-compatible LLM servers (vLLM, llama.cpp server, LM Studio, TGI, text-generation-webui, …). Every endpoint you register can be selected per session via --endpoint <id> on specsmith run, chat, and serve (PR-2).

Quick start¶

Register a vLLM running on your LAN:

specsmith endpoints add \
  --id home-vllm \
  --name "Home vLLM" \
  --base-url http://10.0.0.4:8000/v1 \
  --default-model Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 \
  --auth none \
  --set-default

specsmith endpoints test home-vllm

Once the test reports ok, run an agent against it:

specsmith run --endpoint home-vllm "summarise the last commit"

Storage layout¶

All endpoints live in ~/.specsmith/endpoints.json (override with SPECSMITH_HOME). The on-disk schema is versioned:

{
  "schema_version": 1,
  "default_endpoint_id": "home-vllm",
  "endpoints": [
    {
      "id": "home-vllm",
      "name": "Home vLLM",
      "base_url": "http://10.0.0.4:8000/v1",
      "auth": {"kind": "bearer-keyring",
               "keyring_service": "specsmith",
               "keyring_user": "endpoint:home-vllm"},
      "default_model": "Qwen/Qwen2.5-Coder-32B",
      "verify_tls": true,
      "tags": ["local", "coder"],
      "created_at": "2026-05-01T11:30:17Z"
    }
  ]
}

The file is written chmod 600 on POSIX. Token bytes for the inline strategy are the only secret material that ever lands in this file — the keyring and env-var strategies leave it secret-free.

Auth strategies¶

Kind	Where the token lives	When to use
`none`	nowhere — request is unauthenticated	trusted LAN, open vLLM dev box
`bearer-inline`	`endpoints.json` (plaintext, `chmod 600`)	quick scratch setups where keyring is unavailable
`bearer-env`	the env var name you specify (`--token-env FOO`)	CI / containers / 12-factor deploys
`bearer-keyring`	OS keyring, indexed by `(service, user)` (default)	desktop / laptop installs (default)

The list --json output redacts inline tokens to "***". The CLI never logs token bytes to terminal output.

Health checks¶

specsmith endpoints test home-vllm --json
specsmith endpoints models home-vllm --json

test calls <base_url>/models with the resolved bearer token, prints the latency in milliseconds, and reports up to 5 model ids. models returns the full list.

If the endpoint does not expose /v1/models, test will still return a clear error message — set default_model manually and rely on the session-level model dropdown instead.

CLI reference¶

Command	Notes
`specsmith endpoints add`	Register a new endpoint. `--auth bearer-keyring` (default) prompts for the secret without echo.
`specsmith endpoints list [--json]`	Tabular by default, JSON for IDE consumers. Tokens are redacted.
`specsmith endpoints remove <id> [--purge-keyring]`	Remove the entry; pass `--purge-keyring` to also delete the saved token.
`specsmith endpoints default <id>`	Promote an existing endpoint to the default.
`specsmith endpoints test [<id>] [--timeout 5]`	Probe `/v1/models`. Exits 1 on failure.
`specsmith endpoints models [<id>]`	List every model the endpoint advertises.

Security notes¶

The store path is chmod 600 on POSIX where supported.
verify_tls: false is opt-in (--no-verify-tls); otherwise the CLI verifies the certificate chain. Disabling it for an https endpoint is documented per-endpoint in the on-disk JSON so a drift audit can spot insecure configurations.
auth.kind == bearer-inline is functional but not recommended. Prefer bearer-keyring when the OS keyring is available; otherwise use bearer-env and inject the secret through your shell or container environment.

Roadmap¶

PR-2 (this milestone): wires --endpoint <id> into run, chat, and serve, plus a new _run_openai_compat provider driver.
PR-4: 0.8.0 release notes + tag.