OpenCode on DigitalOcean Serverless Inference: Using DO-Hosted Models as an Agentic Coding Backend with the Inference Router

Jun 30, 2026 11:43 PM

Frontier models such as Claude Opus 4.8 can drive agentic coding across languages from Python to C++ to GDScript, but at production scale their per-token cost compounds fast. DigitalOcean’s Inference Router addresses this directly: it routes routine work to smaller open-source models and escalates to frontier models only when a task demands it. To measure the quality on a real multi-file codebase, we built a complete Godot 4 game, PK Shootout, entirely through OpenCode. In this build, OpenCode completed 596 routed tasks across 83 assistant turns. The project used about 4.1M tokens and cost about $8.25 through the router. Most tasks routed to MiMo V2.5 and GLM-5.2, and only 2 of 596 tasks required fallback. The same build would have cost an estimated $123 on a frontier model alone, assuming the same number of tokens.

PK Shootout was developed over the course of a few hours using OpenCode. To power the models, we connected OpenCode to DigitalOcean’s Serverless Inference through its OAuth login, then created our own Inference Router to serve as the backend. With those two pieces in place, we built a complete Godot 4 penalty-shootout game (aim, power meter, keeper AI, force simulation, sudden death, end screen, and restart) without ever naming a model in our workflow. The router picked the model per task. But the more useful finding wasn’t the savings: it was discovering that the router doesn’t always run the models you configured. It makes its own cost-driven choices, and what actually ran surprised us.

This article walks through each step we took to create the game, from connecting OpenCode to hosting on App Platform. Follow along for a detailed breakdown of where things went well, where we had to step in, and the time and cost the development process actually incurred.

Connecting OpenCode to DigitalOcean

This was one of the simplest parts of the process. We logged into our DigitalOcean account, opened the Inference Router section, and created a router named game-designer with four distinct routes:

  • design: the most heavily used route during the coding and design phases. It runs GLM-5.2 as its primary model, backed by Deepseek V4 Pro, Kimi K2.6, and MiMo V2.5 Pro.
  • repo-qa: tuned for simple repository question answering. Pool: Gemma 4, MiMo V2.5, Arcee Trinity Large Thinking (Public Preview), and Qwen 3.5 397B A17B.
  • chore: trivial, low-stakes edits that need no architectural reasoning: commit messages, docstrings, inline comments, README boilerplate, .gitignore entries, variable renames, formatting. Pool: Qwen3 Coder Flash, Ministral 3 14B, Gemma 4, and MiMo V2.5, defaulting to Gemma 4.
  • debug: diagnoses runtime errors, stack traces, failing builds, and unexpected behavior, reasoning step by step to find and fix bugs in existing code. Pool: GLM-5.2, Deepseek 3.2, MiniMax M2.5 (Public Preview), and Kimi K2.6.
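To keep the four routes straight, it can help to write them down as plain data. The sketch below is only a local notation for the pools listed above, not the Inference Router's actual configuration schema: the Route class, its field names, and the routes_listing helper are hypothetical, and the primary is left as None where the list names no default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    """Hypothetical local record of one route; not the router's real schema."""
    name: str
    pool: tuple[str, ...]
    primary: Optional[str] = None  # None where the article names no default


ROUTES = (
    Route("design", ("GLM-5.2", "Deepseek V4 Pro", "Kimi K2.6", "MiMo V2.5 Pro"),
          primary="GLM-5.2"),
    Route("repo-qa", ("Gemma 4", "MiMo V2.5", "Arcee Trinity Large Thinking",
                      "Qwen 3.5 397B A17B")),
    Route("chore", ("Qwen3 Coder Flash", "Ministral 3 14B", "Gemma 4", "MiMo V2.5"),
          primary="Gemma 4"),
    Route("debug", ("GLM-5.2", "Deepseek 3.2", "MiniMax M2.5", "Kimi K2.6"),
          primary="GLM-5.2"),
)


def routes_listing(model: str) -> list[str]:
    """Names of the routes whose pool contains exactly this model."""
    return [route.name for route in ROUTES if model in route.pool]


print(routes_listing("MiMo V2.5"))  # ['repo-qa', 'chore']
print(routes_listing("GLM-5.2"))    # ['design', 'debug']
```

Writing the pools out this way makes it easy to see which routes a given model can serve at all, which is useful context when the observed traffic differs from the configured defaults.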

This router handled 100% of the game’s development. Connecting it to OpenCode took one short sequence: install OpenCode in your terminal, run /connect, choose Login with DigitalOcean (the OAuth path: this is the option that exposes your routers), and authorize. Run /models and your routers appear prefixed with router:. We selected router:game-designer and started building.

From there the loop is ordinary agentic development: describe a change in natural language, let the agent read the relevant files, propose edits, and apply them. What’s different is underneath: for each request the router decides which route and model should handle it, with no model name anywhere in our prompts. The rest of this article examines what we learned from leaning on that setup to build PK Shootout end to end.

Which DO-hosted models can actually drive a coding agent

The honest version of this question isn’t “which model did we pick?” We never picked one. It’s “which models did the router assign to which kinds of work, and how did each hold up in that role?” That reframing matters, because it’s the whole premise of a router-backed workflow: you describe the task, and routing decides the model.


Across the build period, the router resolved 596 tasks into five buckets: chore at 335 tasks (56.2%), design at 207 (34.7%), debug at 41 (6.9%), repo-qa at 11 (1.8%), and 2 fallbacks (0.3%). The shape is what you’d expect from real development: a long tail of small edits, a solid core of design work, a thinner band of genuine debugging, and almost no pure repo Q&A.
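The percentages follow directly from the task counts. As a quick check, using only the counts quoted above:

```python
# Task counts per bucket, as reported for the build.
counts = {"chore": 335, "design": 207, "debug": 41, "repo-qa": 11, "fallback": 2}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)   # 596
print(shares)  # {'chore': 56.2, 'design': 34.7, 'debug': 6.9, 'repo-qa': 1.8, 'fallback': 0.3}
```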

Map that onto the model distribution and the story gets more interesting than the config alone would suggest. Three models carried essentially all the traffic: MiMo V2.5 at about 56%, GLM-5.2 at about 42%, and Gemma 4 at a thin ~2%. The many other models we listed in each route’s pool (Deepseek, Kimi, Qwen, Ministral, and so on) never had to come off the bench. In practice, two workhorses did the job.

GLM-5.2 behaved exactly as we’d configured it. Its ~42% share lines up almost perfectly with the two routes where it’s the primary model: design (34.7%) plus debug (6.9%) comes to ~41.6%. That’s a clean signal that the design and debug routing did what we designed it to do, and that GLM-5.2 was a capable enough agent to be trusted with both the architectural work and the step-by-step bug hunts: the two places where weak instruction-following would have shown up fastest.


The chore route is where configuration and reality diverged, and it’s worth being upfront about. Chore was our largest bucket at 56% of all tasks, yet Gemma 4, its configured default, served only ~2% of calls, while MiMo V2.5, which isn’t the stated default for any route, absorbed ~56%. This is because MiMo V2.5 was effectively cheaper, leading the router to pick it over Gemma 4. In other words, the router’s effective choice for low-stakes edits was MiMo V2.5, not the Gemma 4 we’d nominally set. For a founder evaluating this approach, that’s the useful caveat: routes express intent, but the model that actually answers is the router’s call, and it’s worth watching your distribution to confirm what’s actually running.
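One way to watch the distribution is to compare the share each model would get if every route's stated primary served all of that route's tasks against the share actually observed. The sketch below does that with the numbers from this build. The function names and the 5-point tolerance are illustrative choices, the observed shares are the rounded figures quoted above, and repo-qa is left out of the expectation because no primary is stated for it.

```python
# Task counts per route and the primaries stated in the route descriptions.
TASKS = {"chore": 335, "design": 207, "debug": 41, "repo-qa": 11, "fallback": 2}
PRIMARY = {"design": "GLM-5.2", "debug": "GLM-5.2", "chore": "Gemma 4"}

# Approximate observed share of calls per model, in percent.
OBSERVED = {"MiMo V2.5": 56.0, "GLM-5.2": 42.0, "Gemma 4": 2.0}


def expected_shares(tasks, primary):
    """Percent of all tasks each model would serve if primaries always answered."""
    total = sum(tasks.values())
    shares = {}
    for route, model in primary.items():
        shares[model] = shares.get(model, 0.0) + 100 * tasks[route] / total
    return shares


def drift(expected, observed, tolerance=5.0):
    """Models whose observed share differs from expectation by more than tolerance."""
    flagged = {}
    for model in sorted(set(expected) | set(observed)):
        delta = observed.get(model, 0.0) - expected.get(model, 0.0)
        if abs(delta) > tolerance:
            flagged[model] = round(delta, 1)
    return flagged


print(drift(expected_shares(TASKS, PRIMARY), OBSERVED))
# {'Gemma 4': -54.2, 'MiMo V2.5': 56.0}
```

GLM-5.2 does not appear in the output because its observed share is within half a point of what the design and debug routes predict; the two flagged models are exactly the chore-route surprise described above.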

What none of these models had trouble with was the agentic protocol itself. Throughout the session, the backing models reliably chained tool calls (reading files, grepping for stale references, applying targeted str_replace-style edits) rather than just dumping prose and hoping. They performed multi-file consistency checks (verifying that a scene file, its script, and the calling code all agreed after a change) and reasoned correctly about non-obvious control flow, including the ordering of an await inside an async match-end function and whether a state was set before the function suspended. That’s the real bar for “can this model drive a coding agent,” and it’s a higher bar than benchmark scores: it’s not whether the model knows GDScript, but whether it can operate the tools, hold the repo in its head, and not break things it isn’t looking at. The router’s two workhorses cleared that bar.

What it cost and how long it took

A working game is the headline, but the numbers underneath it are what make the workflow legible.

The build took 83 assistant turns against 17 user turns to reach a complete, documented game: a roughly five-to-one ratio of agent work to human input. Seventeen prompts carried the build from an empty Godot project to aim, power meter, keeper AI, force simulation, sudden death, an end screen, a restart flow, and a full README.

Summed across all 83 turns, the agent spent about 74.5 minutes of cumulative generation time: the model actually working, not counting the time we spent reading, testing in Godot, or deciding what to ask next. Individual turns ranged widely, from a 2.1-second one-line fix to a 529.6-second pass that wrote the full README. That spread reflects how routed work actually distributes: a long tail of quick edits punctuated by a handful of heavy, multi-file generations. In wall-clock terms, PK Shootout came together in a few hours, about an hour and a quarter of which was the model generating.

The router adds one cost that a single hardcoded model doesn’t: the time it takes to decide where each request should go. That overhead is small and measurable. Router resolution latency held at about 0.27–0.30 seconds across the period, with game-designer measured at 0.276s on June 25. At about a quarter-second per request, it’s the literal price of not naming a model yourself. The latency curve stays flat across the two-day window and only ticks up at the very end, so routing didn’t degrade as the project grew.
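To put the timing figures side by side, here is a small back-of-the-envelope calculation. It assumes one router resolution per routed task at the 0.276s measurement, which is a simplification: the article reports the latency per request and does not break it down per task.

```python
ASSISTANT_TURNS = 83
USER_TURNS = 17
GENERATION_MINUTES = 74.5
ROUTED_TASKS = 596
RESOLUTION_SECONDS = 0.276  # game-designer, measured on June 25

generation_seconds = GENERATION_MINUTES * 60

turn_ratio = ASSISTANT_TURNS / USER_TURNS
avg_turn_seconds = generation_seconds / ASSISTANT_TURNS
# Assumption: exactly one resolution per routed task.
routing_seconds = ROUTED_TASKS * RESOLUTION_SECONDS
routing_share = routing_seconds / generation_seconds

print(f"assistant:user turn ratio  {turn_ratio:.2f}")        # 4.88
print(f"average generation per turn {avg_turn_seconds:.1f}s")  # 53.9s
print(f"total routing overhead      {routing_seconds:.1f}s")   # 164.5s
print(f"overhead vs generation time {routing_share:.1%}")      # 3.7%
```

Under that assumption, routing adds under three minutes across the whole build, a few percent on top of the time the models spent generating.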

Because the bulk of traffic resolved to lower-cost models, MiMo and GLM rather than a frontier API, the total spend came in well under what the same 596 tasks would have cost on an all-frontier setup. In real numbers, it took about 4.1M tokens to do everything: building the app, designing the plan, writing the README file, and answering a few questions about what was done. The calls to MiMo V2.5 totalled about 0.65 USD, the GLM-5.2 calls added up to about 7.56 USD, and the calls to Gemma 4 cost about 0.04 USD, for a total of about 8.25 USD. Compare that with an estimated 18.04 USD if we had run GLM-5.2 alone without the router, or 123.00 USD for the same number of tokens on a frontier model like ChatGPT 5.5. While these larger models could likely do the job more efficiently and in fewer tokens than the routed setup, that wouldn’t be enough to offset their much higher per-token cost. Routing the expensive work to expensive models and leaving everything else on cheaper ones is the whole economic argument for this workflow, and the task distribution is what delivers it.
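The comparison is simple arithmetic on the figures above. The blended per-million-token rates in this sketch are derived by dividing each total by the roughly 4.1M tokens used; they are not quoted list prices, and they mix input and output tokens.

```python
# Spend per model through the router, in USD.
ROUTER_COST_USD = {"MiMo V2.5": 0.65, "GLM-5.2": 7.56, "Gemma 4": 0.04}
GLM_ONLY_USD = 18.04     # estimate: GLM-5.2 alone, same token count
FRONTIER_USD = 123.00    # estimate: frontier model, same token count
TOKENS_MILLIONS = 4.1    # approximate total tokens, in millions

router_total = round(sum(ROUTER_COST_USD.values()), 2)

print(f"router total: {router_total:.2f} USD")                         # 8.25 USD
print(f"GLM-5.2 alone costs {GLM_ONLY_USD / router_total:.1f}x more")  # 2.2x
print(f"frontier costs {FRONTIER_USD / router_total:.1f}x more")       # 14.9x

# Derived blended rates in USD per million tokens (not list prices).
for label, cost in [("router", router_total),
                    ("GLM-5.2 alone", GLM_ONLY_USD),
                    ("frontier", FRONTIER_USD)]:
    print(f"{label}: {cost / TOKENS_MILLIONS:.2f} USD per 1M tokens")
```

GLM-5.2 accounts for most of the routed spend (7.56 of 8.25 USD) despite serving about 42% of calls, so the saving comes from keeping the majority of calls on the much cheaper MiMo V2.5.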

Where we had to step in

A walk-through that only shows the parts that worked isn’t worth much. The router carried the bulk of PK Shootout on its own, but three moments needed a human in the loop, and they’re the most instructive part of the build. Two were the agent doing its job well; one was a limitation worth knowing before you trust this workflow with anything load-bearing.

The first was a real bug, and the agent handled it cleanly. After an early build, the game looked correct but didn’t respond: pressing the left mouse button produced no power bar and no shot. We described the symptom in plain language and asked the agent to find the cause. It read the input handler, traced the event flow, and worked out that input was never reaching _unhandled_input at all. The culprit was the full-screen Background node, a ColorRect that, as a Control in Godot 4, defaults to MOUSE_FILTER_STOP and silently consumes mouse events before they propagate. The fix was a one-line change to set the background’s mouse filter to ignore. What’s notable is that we never pointed at the cause; we only reported the symptom. The agent diagnosed the root cause itself, which is exactly the kind of step-by-step reasoning the debug route is meant to provide.

The second moment was friction, but the instructive kind: it was our fault, not the model’s. We asked to move the goal “to about 1/3 of the way down the screen.” The agent read that as one-third from the top and moved the goal upward, the opposite of what we pictured. We told it as much, then clarified with “1/3 from the bottom of the screen,” and it landed the goal where we wanted on the next try. The lesson isn’t that the model failed; it’s that natural-language spatial instructions are ambiguous, and the cost of that ambiguity is a couple of extra turns. A precise instruction the first time would have saved both. This is the normal texture of agentic development: the human’s job shifts from writing code to writing unambiguous intent, and you get better at it as you go.
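The ambiguity is easy to see once the two readings are written as coordinates. The sketch below assumes a 2D viewport whose origin is at the top-left with y increasing downward, and a hypothetical height of 720 pixels; neither detail comes from the PK Shootout project itself.

```python
from fractions import Fraction


def y_from_top(fraction: Fraction, screen_height: int) -> float:
    """Position measured downward from the top edge."""
    return float(fraction * screen_height)


def y_from_bottom(fraction: Fraction, screen_height: int) -> float:
    """Position measured upward from the bottom edge."""
    return float(screen_height - fraction * screen_height)


SCREEN_HEIGHT = 720  # hypothetical viewport height in pixels
third = Fraction(1, 3)

print(y_from_top(third, SCREEN_HEIGHT))     # 240.0, the reading the agent chose
print(y_from_bottom(third, SCREEN_HEIGHT))  # 480.0, the reading we meant
```

The same phrase lands the goal a third of the screen apart depending on the reference edge, which is why naming the edge (or giving a coordinate) in the prompt removes the extra turns.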

The third moment is the one to take seriously, because it exposes the real limit of a sub-frontier backend. When we asked the agent to write a command for generating image assets through DigitalOcean’s Serverless Inference, it didn’t know the current API. In its reasoning it floated several plausible-looking endpoint URLs, none of them correct, before doing the right thing: it flagged its own uncertainty, asked us for the actual endpoint and token rather than committing to a guess, and produced a generic OpenAI-compatible script with the specifics left as placeholders. The hedge is genuinely to its credit. But the underlying fact is the important one. The model did not reliably know current, external API details, and if we’d taken its first guess at face value we’d have shipped a broken command. We supplied the real configuration, which is documented in the final section of this article.
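The "placeholders instead of guesses" pattern can be enforced in code so that a script fails loudly rather than calling an invented endpoint. This is a minimal sketch of that idea, not the script the agent produced; the variable names IMAGE_ENDPOINT_URL and MODEL_ACCESS_KEY and the angle-bracket placeholder convention are hypothetical.

```python
import os
from typing import Mapping, Optional


def require_setting(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a configured value, refusing empty or <PLACEHOLDER>-style entries."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value or (value.startswith("<") and value.endswith(">")):
        raise RuntimeError(f"{name} is unset or still a placeholder; supply the real value")
    return value


# Hypothetical settings: the endpoint is still a placeholder, the key is filled in.
demo_env = {"IMAGE_ENDPOINT_URL": "<ENDPOINT_URL>", "MODEL_ACCESS_KEY": "example-key"}

print(require_setting("MODEL_ACCESS_KEY", demo_env))  # example-key
try:
    require_setting("IMAGE_ENDPOINT_URL", demo_env)
except RuntimeError as err:
    print(err)  # IMAGE_ENDPOINT_URL is unset or still a placeholder; supply the real value
```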

Taken together, these three moments draw a clean line. The agent is strong at reasoning over code it can see: diagnosing the mouse bug from the symptom alone, applying targeted fixes, and recovering gracefully from an ambiguous instruction. It is weak exactly where any model is weak without retrieval: authoritative knowledge of current, external systems it can’t inspect. Knowing which side of that line a task falls on is most of what it takes to use this workflow well.

When this workflow makes sense (and when to reach for a frontier API)

By the end of the build we had enough evidence to form a defensible opinion about where a router-backed, DO-hosted setup belongs and where it doesn’t.

Reach for it when the work is mostly routine. Our task split was telling: about 56% chore and 35% design, with only a thin band of genuine debugging and almost no pure repository Q&A. That is what ordinary development actually looks like, a long tail of small, well-scoped edits over a steady core of design work, and it’s exactly the shape a router is built to exploit. Cheap tasks resolve to cheap models, the expensive ones escalate only when the route calls for it, and you never pay frontier prices for a docstring or a variable rename. The cost numbers bear this out directly: about 8.25 USD for the full build against an estimated 18.04 USD on GLM-5.2 alone and far more on a frontier model. When the bulk of your work is implementation rather than novel reasoning, how the router operates is the whole point.

Reach for a frontier API when the task depends on accurate, current knowledge of external systems, or on long-horizon architectural reasoning that has to hold together across many steps. The clearest signal we hit was the moment the backend invented a DigitalOcean endpoint it didn’t actually know. A sub-frontier model operating without retrieval will confidently fill gaps in its knowledge of current APIs and SDKs, and that’s exactly the failure mode that costs you when you least expect it. The boundary is simple to state: routine implementation, yes; authoritative external facts, verify or escalate. If a task hinges on a detail the model can’t inspect in your repo, either give it the facts yourself or route it to a model you trust to have them.

Underneath both cases is the same tradeoff, and it’s a favorable one. You give up a little speed (the router added about a quarter-second of resolution latency per request) and you accept the occasional fallback (2 across 596 tasks, when a route’s primary didn’t return usable output and the next model in the pool stepped in). In exchange, you never hardcode a model, never overpay for trivial work, and get a built-in safety net when a single response goes bad. For the kind of workload PK Shootout represents, that’s a trade worth making.
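The fallback behavior described here, where the next model in the pool steps in when the primary returns nothing usable, can be illustrated with a toy loop. This is a sketch of the general pattern only: the article does not document how the Inference Router implements fallback internally, and first_usable, fake_call, and the "non-empty text" test for usable output are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def first_usable(pool: Sequence[str],
                 call: Callable[[str], Optional[str]]) -> tuple[str, str]:
    """Try each model in pool order; return the first (model, output) that is usable."""
    for model in pool:
        try:
            output = call(model)
        except Exception:
            continue  # treat an error like an unusable response and move on
        if output and output.strip():
            return model, output
    raise RuntimeError("no model in the pool returned usable output")


def fake_call(model: str) -> Optional[str]:
    """Stand-in for a real inference call: the primary returns an empty response."""
    return "" if model == "GLM-5.2" else f"patch from {model}"


DESIGN_POOL = ["GLM-5.2", "Deepseek V4 Pro", "Kimi K2.6", "MiMo V2.5 Pro"]
print(first_usable(DESIGN_POOL, fake_call))
# ('Deepseek V4 Pro', 'patch from Deepseek V4 Pro')
```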

When a router-backed backend is the right call

PK Shootout went from an empty Godot project to a complete, playable penalty-shootout game (aim, power meter, keeper AI, force simulation, sudden death, an end screen, and restart) in a few hours, built entirely through OpenCode on DigitalOcean’s Serverless Inference, with an Inference Router choosing the model for each one of 596 tasks and a model name never once appearing in our prompts.

The result is a clear picture of what this workflow is good for. The agent reasoned capably over code it could see: it diagnosed a mouse-input bug from the symptom alone, applied targeted fixes across multiple files, and recovered gracefully from ambiguous instructions. The router kept cheap work on cheap models and reserved the capable ones for design and debugging, delivering the full build for about 8.25 USD. Its limits were equally clear, and they fall where any model’s do without retrieval: authoritative knowledge of current, external systems it can’t inspect. Knowing which side of that line a task sits on is most of what it takes to use the setup well.

For routine, well-scoped development (which is most development) a router-backed, DO-hosted backend is a genuinely practical way to build, and a markedly cheaper one than reaching for a frontier API by default.


Everything in this build is reproducible with a DigitalOcean account, OpenCode, and a few minutes of setup. Here’s the exact configuration for using OpenCode with DigitalOcean:

  1. Create a model access key. In the DigitalOcean Control Panel, open Inference → Serverless Inference, then on the Get Started tab click Create a Model Access Key. A single key covers every model and every modality (text, image, audio), and the same key works for both foundation models and routers. Save it somewhere safe: you’ll need it for any non-OAuth access.
  2. Build the router (optional but recommended). In the Control Panel, open the Inference Router section and create a router. We named ours game-designer and gave it four routes (design, repo-qa, chore, and debug), each with a primary model and a fallback pool, as described earlier in this article. The router is what lets you describe a task and never name a model; you can skip this and call foundation models directly, but then routing is on you.
  3. Connect OpenCode. Install OpenCode, launch it (TUI, desktop, or web), and run:
/connect

Search for DigitalOcean and choose Login with DigitalOcean. This is the OAuth path, and it’s the one that matters: it requests the genai:read and inference:query scopes and pulls in both your foundation models and your routers. The alternative, Paste Model Access Key, authenticates fine but will not surface your routers, so if you’ve built one, use OAuth. After authorizing in the browser, run:

/models

Your routers appear prefixed with router:. We selected router:game-designer and started building. That’s the whole setup.

Calling the endpoint directly

If you’d rather drive Serverless Inference from your own code instead of through OpenCode, it’s OpenAI-compatible. Point any OpenAI client at the base URL shown here and pass your model access key as a bearer token:

import os

from openai import OpenAI

# The client reads the model access key from the MODEL_ACCESS_KEY environment variable.
client = OpenAI(
    base_url="https://inference.do-ai.run/v1/",
    api_key=os.getenv("MODEL_ACCESS_KEY"),
)

resp = client.chat.completions.create(
    model="game-designer",  # a router name, or a foundation model id
    messages=[{"role": "user", "content": "Summarize this repo's architecture."}],
)
print(resp.choices[0].message.content)

The same base URL, key, and request shape work from cURL, the Gradient Python SDK, LangChain, or LlamaIndex: swap the backend and your existing code runs unchanged.
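To make "the same request shape" concrete without any SDK, the sketch below builds the equivalent raw HTTP request using only the Python standard library. It assumes the endpoint follows the usual OpenAI-compatible layout, where chat requests are posted to chat/completions under the base URL; that path is inferred from the OpenAI compatibility described above rather than confirmed here, and build_chat_request is a hypothetical helper. The request is constructed but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://inference.do-ai.run/v1/"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "chat/completions",  # assumed OpenAI-compatible path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "game-designer",
    "Summarize this repo's architecture.",
    os.getenv("MODEL_ACCESS_KEY", "<MODEL_ACCESS_KEY>"),
)
print(req.get_method(), req.full_url)
# POST https://inference.do-ai.run/v1/chat/completions
```

Sending it would be a matter of passing req to urllib.request.urlopen and decoding the JSON response, once a real key is set in MODEL_ACCESS_KEY.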

  • Serverless Inference docs
  • Using coding agents with DigitalOcean (covers OpenCode specifically)
  • Inference API reference
  • Creating and managing model access keys
  • The PK Shootout repository

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
