Download
Get a prebuilt binary from the latest GitHub Release for macOS, Windows, or Linux.
Native local inference
Run Bonsai-8B from your terminal with a small native CLI. Bonsai downloads the GGUF model on first use, caches it locally, and streams generated text directly to standard output.
```console
$ ./bonsai "Explain GitHub Actions in one sentence."
Downloading Bonsai-8B.gguf on first run...
GitHub Actions is a CI/CD automation service that runs workflows from your repository when events such as pushes, pull requests, or releases happen.

$ terraform plan | ./bonsai "Summarize this Terraform plan."
The plan updates networking resources, leaves existing compute instances unchanged, and does not destroy any infrastructure.
```
Download the prebuilt binary for your platform, extract it, and run it with a prompt. The model is downloaded on the first run and reused from the local cache afterwards.

1. Download the archive for macOS, Windows, or Linux from the latest GitHub Release.
2. Unpack the archive and place the executable wherever you keep command-line tools.
3. Pass a prompt as an argument, or pipe input through standard input for summarization and review workflows.
On macOS or Linux:

```sh
./bonsai "Explain GitHub Actions in one sentence."
```

On Windows:

```powershell
.\bonsai.exe "Explain GitHub Actions in one sentence."
```
Bonsai works both as a direct prompt tool and as a Unix-style filter: redirect a file or pipe another command's output (logs, Terraform plans, text files) into it, and pass an instruction as the prompt.
```sh
# Read input from a file
./bonsai "Summarize this text in Japanese." < terraform_plan_result.log

# Read input from another command's output
terraform plan | ./bonsai "Summarize the following Terraform plan in Japanese."
```
Tune the model path, sampling behavior, context size, cache location, and Hugging Face authentication from the command line.
| Flag | Default | Description |
|---|---|---|
| `-model-path` | auto | Local path to the GGUF model. |
| `-model-repo` | `prism-ml/Bonsai-8B-gguf` | Hugging Face model repository. |
| `-model-file` | `Bonsai-8B.gguf` | GGUF filename inside the repository. |
| `-cache-dir` | OS user cache directory | Root directory for downloaded models. |
| `-context-size` | `4096` | Context window size, in tokens. |
| `-max-tokens` | `64` | Maximum number of tokens to generate. |
| `-threads` | `runtime.NumCPU()/2` | Number of CPU threads (minimum of one). |
| `-temperature` | `0.5` | Sampling temperature. |
| `-top-k` | `20` | Top-k sampling: keep only the k most likely tokens. |
| `-top-p` | `0.9` | Top-p (nucleus) sampling: keep the smallest set of tokens whose cumulative probability reaches p. |
| `-repeat-penalty` | `1.0` | Penalty for repeated tokens; `1.0` applies no penalty. |
| `-seed` | `0` | Random seed for sampling. |
| `-raw-prompt` | `false` | Send the prompt without applying the model's chat template. |
| `-hf-token` | empty | Hugging Face access token for gated models. |
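To show how a flag set like this is typically declared in Go, here is a minimal sketch using the standard library's `flag` package. It is an assumption that Bonsai uses `flag` (the single-dash names and the `runtime.NumCPU()/2` default suggest Go, but the real source may differ). The `config` struct, its field names and types, and the empty string standing in for `auto` are all hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"runtime"
)

// config holds the documented options; field names and types are illustrative.
type config struct {
	ModelPath     string
	ModelRepo     string
	ModelFile     string
	CacheDir      string
	ContextSize   int
	MaxTokens     int
	Threads       int
	Temperature   float64
	TopK          int
	TopP          float64
	RepeatPenalty float64
	Seed          int
	RawPrompt     bool
	HFToken       string
	PromptArgs    []string // positional arguments left after the flags
}

// defaultThreads mirrors the documented default: half the CPUs, at least one.
func defaultThreads(numCPU int) int {
	if n := numCPU / 2; n >= 1 {
		return n
	}
	return 1
}

func parseFlags(args []string) (config, error) {
	cacheDefault, err := os.UserCacheDir()
	if err != nil {
		cacheDefault = "" // assumption: empty when no cache directory is known
	}
	var c config
	fs := flag.NewFlagSet("bonsai", flag.ContinueOnError)
	fs.StringVar(&c.ModelPath, "model-path", "", "local path to the GGUF model (empty: resolve automatically)")
	fs.StringVar(&c.ModelRepo, "model-repo", "prism-ml/Bonsai-8B-gguf", "Hugging Face model repository")
	fs.StringVar(&c.ModelFile, "model-file", "Bonsai-8B.gguf", "GGUF filename inside the repository")
	fs.StringVar(&c.CacheDir, "cache-dir", cacheDefault, "root directory for downloaded models")
	fs.IntVar(&c.ContextSize, "context-size", 4096, "context window size")
	fs.IntVar(&c.MaxTokens, "max-tokens", 64, "maximum number of generated tokens")
	fs.IntVar(&c.Threads, "threads", defaultThreads(runtime.NumCPU()), "number of CPU threads")
	fs.Float64Var(&c.Temperature, "temperature", 0.5, "sampling temperature")
	fs.IntVar(&c.TopK, "top-k", 20, "top-k sampler setting")
	fs.Float64Var(&c.TopP, "top-p", 0.9, "top-p sampler setting")
	fs.Float64Var(&c.RepeatPenalty, "repeat-penalty", 1.0, "repeat penalty")
	fs.IntVar(&c.Seed, "seed", 0, "random seed")
	fs.BoolVar(&c.RawPrompt, "raw-prompt", false, "send the prompt without the chat template")
	fs.StringVar(&c.HFToken, "hf-token", "", "Hugging Face token for gated models")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	if c.Threads < 1 {
		c.Threads = 1 // keep at least one thread, as documented
	}
	c.PromptArgs = fs.Args()
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		os.Exit(2) // the FlagSet has already printed the error and usage
	}
	fmt.Printf("threads=%d context=%d max-tokens=%d prompt=%q\n",
		c.Threads, c.ContextSize, c.MaxTokens, c.PromptArgs)
}
```

Go's `flag` package stops parsing at the first non-flag argument, so under this sketch options must come before the prompt, for example `./bonsai -max-tokens 128 "Summarize this Terraform plan."`.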
Build Bonsai CLI from source with Go, CMake, a C/C++ toolchain, and the llama.cpp submodule checked out.
```sh
git clone --recurse-submodules https://github.com/pluswing/bonsai-cli.git
cd bonsai-cli
./scripts/build-bonsai.sh
```
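To build and run the `bonsai-cli-test` service with Docker Compose, set the context size, token limit, and prompt through environment variables. `--abort-on-container-exit` stops all containers once one exits, and `--exit-code-from bonsai-cli-test` makes the command return that service's exit code.

```sh
BONSAI_CONTEXT_SIZE=4096 \
BONSAI_MAX_TOKENS=24 \
BONSAI_PROMPT="Reply in one short sentence about CPU-only inference." \
docker compose up --build --abort-on-container-exit \
  --exit-code-from bonsai-cli-test
```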