Every AI agent eventually runs into the same structural problem: the model can reason, but it cannot act without tools. Someone has to run those tools (fetch the search results, query the database, call the API), and that someone is usually your code.
Most teams build this the same way. The model returns a tool call. Your code catches it, runs the tool, formats the result, and sends it back. Repeat until the model has what it needs to answer. The loop works, but it means your team owns the full tool layer: connections, credentials, retry logic, error handling, and observability. None of that is your product; it is just the behind-the-scenes infrastructure that keeps it working.
There is an alternative: move tool execution into the inference layer itself, so the tools run as part of the API call rather than between API calls.
DigitalOcean’s Server-Side Tools for Inference Engine does this. This article explains what that shift actually changes, the architecture, the latency profile, and the cases where it does not make sense, so you can decide whether it fits your use case.
Key Takeaways
- Your code becomes simpler, but you still need to think about failures, performance, and retry safety. DigitalOcean handles tool execution, but those responsibilities stay yours.
- Client-side is better for development; server-side is better for production. Client-side gives you full visibility and local debugging. Server-side removes infrastructure overhead once you are ready to ship.
- Cold starts are still your responsibility. DigitalOcean manages the connection, but if your MCP server goes cold, the startup time still shows up in your response. Keep it warm.
- Tool failure and agent failure are different problems. A tool timing out is an infrastructure issue. The model answering poorly despite working tools is a prompt issue. Track them separately.
- Wire up tracing before you need it. Once something breaks in production, you will want to know exactly which tool call caused it. Set up the Agent Tracing API from the start.
- MCP servers must be publicly reachable for server-side use. If your tools live on a private network, use client-side MCP instead.
- Tool Search matters once you pass 20 to 30 tool definitions. Below that, loading all tools on every request is fine. Above it, lazy-loading with Tool Search reduces input token costs on every turn.
Server-Side Tools for DigitalOcean Inference Engine let you add tool execution directly into inference requests. You use your existing Model Access Key, with no new credentials and no new API surface. The tools are available through Serverless Inference and Dedicated Inference.
You can connect five types of external capabilities to an inference request today:
1. Web Search, powered by Exa. Real-time web search backed by Exa’s neural search index. The model decides when to search, runs the query, and uses the results in its response. You control how many searches run per request (max_uses: 1 to 5) and how many results each search returns (max_results: 1 to 10). Priced at $10 per 1,000 requests.
2. Web Fetch, powered by Exa. Fetches and extracts content from specific URLs during inference. Exa’s extraction returns clean, parsed text rather than raw HTML, which reduces the tokens the model has to process. No extra charge beyond standard token costs.
3. Knowledge Base Retrieval. Lets the model query your private data. You provide a knowledge base ID, and the API retrieves relevant content and includes it in the response automatically.
4. Customer-owned MCP Servers. Connects the model to any remote Model Context Protocol server you operate. You pass the server URL and a bearer token in the request. DigitalOcean handles the MCP connection, tool discovery, and execution. You control which tools from your server the model can call using the allowed_tools parameter.
5. Tool Search (Anthropic and OpenAI models). Once you have more than about 20 to 30 tool definitions, loading all of them on every request adds meaningful input token cost. At 50 or more (common in agents that connect to multiple internal systems, each exposing several tools), the overhead can run to hundreds of tokens per request. Multiplied across thousands of requests per day, that cost compounds quickly. Tool Search solves this by lazy-loading tool definitions. Tools marked with defer_loading: true are only loaded into context when the model needs them, not on every request.
For Anthropic models, this works via the Messages API using a search tool (tool_search_tool_regex_20251119 for pattern matching or tool_search_tool_bm25_20251119 for natural language queries). For OpenAI models, it works via the Responses API with GPT-5.4+ using type: "tool_search".
```python
# Tool Search with Anthropic (Messages API)
import anthropic

client = anthropic.Anthropic(
    base_url="https://inference.do-ai.run/v1",
    api_key="your-model-access-key",
)

response = client.messages.create(
    model="anthropic-claude-opus-4.8",
    max_tokens=2048,
    messages=[{"role": "user", "content": "What is the weather in zip code 94107?"}],
    tools=[
        # The search tool loads immediately
        {
            "type": "tool_search_tool_regex_20251119",
            "name": "tool_search_tool_regex",
        },
        # These tools only load when the model searches for them
        {
            "name": "get_weather_by_zip",
            "description": "Return current weather conditions for a US zip code.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "zip_code": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["zip_code"],
            },
            "defer_loading": True,
        },
        {
            "name": "search_files",
            "description": "Search through files in the workspace",
            "input_schema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
            "defer_loading": True,
        },
    ],
)
```

The model searches for the tool it needs, loads only that definition, then calls it. This matters most once you are past 20 to 30 tools; above that threshold, the input token savings on every request start to add up.
```python
# Standard web search
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.do-ai.run/v1",
    api_key="your-model-access-key",
)

response = client.chat.completions.create(
    model="openai-gpt-4o",
    messages=[{"role": "user", "content": "What changed in DigitalOcean pricing this month?"}],
    tools=[{"type": "web_search", "max_uses": 3, "max_results": 5}],
)

# One request, one response
print(response.choices[0].message.content)
```

Without server-side tools, this same task takes several steps: send the message to the model, get back a tool call instruction, run the search yourself, then send the results back for the final answer. With server-side tools, all of that happens inside a single API call.
Use Cases
Research Agents
A research agent that can search the web and fetch pages can answer questions that need real-time information: current pricing, recent product changes, news that happened after the model’s training cutoff, or anything that needs multiple sources combined.
How it works with Server-Side Tools:
```python
# Reuses the OpenAI client created in the web search example above
response = client.chat.completions.create(
    model="openai-gpt-4o",
    messages=[{
        "role": "user",
        "content": "Summarize the key changes to Kubernetes networking in the past 6 months and their implications for teams running microservices.",
    }],
    tools=[{
        "type": "web_search",
        "max_uses": 5,
        "max_results": 5,
    }],
)
```

The model decides when to search, what to search for, how many times, and when it has enough information to write the summary. Your application code stays the same regardless of how many searches the agent runs internally.
Without server-side tools, you would implement the loop yourself: catch the tool call, call a search API, append results to the conversation, call the model again, and repeat until the model produces a final answer.
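For contrast, the client-side loop described above can be sketched roughly as follows. This is a minimal illustration, not DigitalOcean's or any SDK's implementation: `run_tool_loop`, `fake_model`, and `fake_search` are hypothetical names, the message shapes are simplified, and the model is replaced by a stub so the example runs without a network.

```python
from typing import Callable


def run_tool_loop(call_model: Callable[[list], dict],
                  tools: dict[str, Callable[..., str]],
                  messages: list,
                  max_turns: int = 5) -> str:
    """Client-side loop: call the model, run any requested tool, repeat."""
    for _ in range(max_turns):
        reply = call_model(messages)
        tool_call = reply.get("tool_call")
        if tool_call is None:
            return reply["content"]  # the model produced a final answer
        # Run the tool ourselves and append the result to the conversation.
        result = tools[tool_call["name"]](**tool_call["arguments"])
        messages.append({"role": "assistant", "tool_call": tool_call})
        messages.append({"role": "tool", "name": tool_call["name"], "content": result})
    raise RuntimeError("model did not produce a final answer within max_turns")


# Hypothetical stand-ins: a stub model that asks for one search and then
# answers, and a stub search tool. Real code would call an inference API
# and a search provider here.
def fake_model(messages: list) -> dict:
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_call": {"name": "web_search",
                              "arguments": {"query": "kubernetes networking"}}}
    return {"content": f"Summary based on: {tool_results[-1]['content']}"}


def fake_search(query: str) -> str:
    return f"results for '{query}'"


answer = run_tool_loop(
    fake_model,
    {"web_search": fake_search},
    [{"role": "user", "content": "Summarize recent changes."}],
)
print(answer)
```

Everything inside `run_tool_loop` (dispatch, message bookkeeping, the turn limit) is code you own in the client-side model; with server-side tools that loop runs inside the inference request instead.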
AI Apps Connecting to Internal Systems via MCP
Teams that have already built MCP servers for internal tools (project management, scheduling, CRM, internal APIs) can connect them to any inference request without changing their application code.
```python
response = client.chat.completions.create(
    model="openai-gpt-4o",
    messages=[{
        "role": "user",
        "content": "Pull the open support tickets for our enterprise tier and draft a weekly digest for the team.",
    }],
    tools=[{
        "type": "mcp",
        "server_label": "internal-crm",
        "server_url": "https://tools.yourcompany.com/mcp",
        "authorization": "Bearer your-mcp-token",
        "allowed_tools": ["list-tickets", "get-ticket-details", "get-customer-info"],
    }],
)
```

The allowed_tools list controls what the model is allowed to do on your MCP server. If you leave it out, the model can call every tool your server exposes, so it is good practice to list only what the current task needs.
Your MCP server needs to be publicly reachable because the inference layer connects from DigitalOcean’s network, not from where your application runs. If your server currently only accepts internal traffic, you will need to expose it.
Multi-Step Agents Combining Tools
More complex agents combine multiple tool types in a single request. A due diligence agent might web-search for recent news about a company, fetch its pricing page, and query an internal knowledge base for your existing relationship, all in one inference call.
```python
response = client.chat.completions.create(
    model="openai-gpt-4o",
    messages=[{
        "role": "user",
        "content": "We're evaluating Acme Corp as a vendor. What's their current product, recent news, and do we have any history with them?",
    }],
    tools=[
        {
            "type": "web_search",
            "max_uses": 3,
            "max_results": 5,
        },
        {
            "type": "knowledge_base_retrieval",
            "knowledge_base_id": "vendor-history-kb",
        },
    ],
)
```

The model decides which tools to use and in what order based on the task in the prompt.
The Latency Tradeoff
Client-side execution means your application code runs the tool. You send the user’s message to the model. The model replies with “I need to search for X.” Your code runs the search. You send the result back. The model gives the final answer. Everything runs in your own process, so it is fast, but you write and maintain every step.
Server-side execution means DigitalOcean runs the tool for you. You send one request, and the tool call, execution, and result all happen before your response comes back. The cost is that a network call to an external service now happens inside the request, and that time adds to your total response time.
Client-side tool execution: what the numbers look like
When your code runs the tool, the time depends on what that tool does:
- A function running in your own process (input validation, string formatting): under 1ms
- An MCP server running on the same machine, connected over stdio: 0.91 to 1.10ms per call. This is a measured mean from a study of production MCP deployments
- An HTTP call to an external API: 50 to 300ms, depending on the service
You see every millisecond of this in your logs.
Server-side tool execution: what the numbers look like
When DigitalOcean calls a tool on your behalf, the time that call takes is added to your response time. Two tools have meaningfully different latency profiles.
Web search: DO’s web search is powered by Exa, which maintains a pre-built neural index rather than crawling the web on every request. In a 2026 benchmark across 15 search API providers, Exa was specifically measured and fell in the index-based tier: under 400ms at median with little variance between median and p95. The same benchmark found that real-time SERP APIs, which fetch live results on every request, ran at 600 to 700ms at median with much higher variance, and p95 exceeding 5 seconds for several providers. That tail risk applies to real-time crawlers, not to the index-based backend DO uses.
Customer-owned MCP servers: The risk here is cold starts. When an MCP server has not handled a request recently, it has to restart before it can respond. For a Node.js-based MCP server, loading the runtime and modules alone can take over 2 seconds. If your server is not kept running between requests, the first call in a new conversation pays that startup cost every time.

When tool latency adds up
Each tool call adds time to the total response. One tool call is usually fine. The problem is when an agent makes several tool calls in a row: each one stacks on top of the last.
For example, a research agent that runs 3 web searches in sequence using DO’s Exa-backed search: at under 400ms median per call, 3 sequential searches add up to about 1.2 seconds of search time before the model generates a response.
That is the typical case for index-based search. For comparison, real-time search APIs run at 600 to 700ms median per call, and at p95, several providers in the Proxyway 2026 benchmark exceeded 5 seconds per call, pushing 3 sequential calls past 15 seconds. The tail risk depends on which backend you use. Tool calls stack, and each one you add increases the chance of hitting the slow tail.
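The arithmetic behind those figures can be checked with a short sketch. The per-call numbers are the ones quoted above; the function names are hypothetical, and the tail calculation assumes each call independently has a 5% chance of landing above p95, which is a simplification rather than a measured property of any provider.

```python
def sequential_latency_ms(per_call_ms: float, calls: int) -> float:
    """Total tool time when calls run one after another."""
    return per_call_ms * calls


# Figures quoted in this article (per call, in milliseconds).
INDEX_MEDIAN_MS = 400    # upper bound for index-based search at median
REALTIME_P95_MS = 5000   # p95 exceeded by several real-time providers

index_total = sequential_latency_ms(INDEX_MEDIAN_MS, 3)
realtime_tail_total = sequential_latency_ms(REALTIME_P95_MS, 3)

print(index_total)          # 3 x 400ms
print(realtime_tail_total)  # 3 x 5000ms


def chance_of_hitting_tail(calls: int, tail_probability: float = 0.05) -> float:
    """Probability that at least one of `calls` calls lands above p95,
    assuming the calls are independent (an assumption, not a measurement)."""
    return 1 - (1 - tail_probability) ** calls


print(round(chance_of_hitting_tail(3), 3))
```

Under that independence assumption, three sequential calls have roughly a 14% chance of including at least one tail-latency call, compared with 5% for a single call.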
If your agent is doing research, answering complex questions, or running in the background, a few extra seconds per task is not a problem. But if you are building a voice assistant or a real-time app where users expect a response in under a second, every tool call matters. In those cases, client-side tools or faster single-step tasks are a better fit.

When tool execution moves server-side, you no longer manage the connection lifecycle, retries, or scaling for those tools. Keep these 3 considerations in mind before starting with server-side tools.
1. Write explicit instructions for tool failures
With client-side tools, a failure throws an exception and your code catches it. With server-side tools, if a web search returns no useful results, the model continues reasoning with whatever it got. It will not necessarily tell you the search came up empty.
The fix is simple: write it into your system prompt. “If the knowledge base returns no relevant results, say so and ask the user to rephrase,” gives the model a clear path instead of letting it guess. Explicit instructions for the empty-result case make your agent predictably useful, not just usually useful.
2. Design MCP write operations to be safe to retry
Read tools (web search, web fetch, knowledge base retrieval) are safe to retry. A request times out, you try again, nothing changes.
Write operations through customer-owned MCP servers need more care. If a request times out, you do not know whether the action executed. The right pattern is to assign each write operation a unique request ID that your MCP server uses to deduplicate retries. This gives you safe retries without double-writes.
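One way the request-ID pattern could look on the server side is sketched below. `TicketStore` and `create_ticket` are hypothetical names for illustration, and the in-memory dictionary stands in for whatever durable store a real MCP server would use; this is not part of the MCP specification or DigitalOcean's API.

```python
import uuid


class TicketStore:
    """Hypothetical MCP-side write handler that deduplicates by request ID."""

    def __init__(self) -> None:
        self.tickets: list[dict] = []
        self._seen: dict[str, dict] = {}  # request_id -> result of the first write

    def create_ticket(self, request_id: str, title: str) -> dict:
        # A retry with the same request_id returns the original result
        # instead of writing a second ticket.
        if request_id in self._seen:
            return self._seen[request_id]
        ticket = {"id": len(self.tickets) + 1, "title": title}
        self.tickets.append(ticket)
        self._seen[request_id] = ticket
        return ticket


store = TicketStore()
request_id = str(uuid.uuid4())  # generated once per logical write, reused on retry

first = store.create_ticket(request_id, "Renew Acme contract")
retry = store.create_ticket(request_id, "Renew Acme contract")  # e.g. after a timeout

print(first == retry, len(store.tickets))
```

The important design choice is that the ID is created once per logical operation and reused across retries; generating a fresh ID on each attempt would defeat the deduplication.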
3. Know which tool layer failed
DO-managed tools and your MCP server have separate failure paths. If all tools stop working, that points to the managed infrastructure layer. If web search keeps working but MCP calls fail, that points to your server.
Build your alerting to distinguish between these 2 cases. “Tool failure” as a single alert category is hard to act on. “DO-managed tools down” vs. “MCP server unreachable” tells you exactly where to look.
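A rough sketch of that distinction as alert-routing logic follows. The function and category names are hypothetical, and `web_fetch` as a tool type name is an assumption (the other type names appear in the request examples earlier); the rules simply encode the two cases described above.

```python
# Tool types executed by the managed layer. "web_fetch" is an assumed name.
MANAGED_TOOLS = {"web_search", "web_fetch", "knowledge_base_retrieval"}


def classify_outage(failing: set[str], configured: set[str]) -> str:
    """Map the set of currently failing tool types to an alert category."""
    if not failing:
        return "ok"
    if failing == configured:
        # Everything is failing: points to the managed infrastructure layer.
        return "do_managed_tools_down"
    if failing.isdisjoint(MANAGED_TOOLS):
        # Managed tools still work, only customer-owned tools fail.
        return "mcp_server_unreachable"
    return "partial_managed_failure"


configured = {"web_search", "knowledge_base_retrieval", "mcp"}

print(classify_outage({"mcp"}, configured))
print(classify_outage(set(configured), configured))
```

Routing on categories like these lets the MCP alert page the team that owns the server, while a managed-layer alert can point at a provider status check instead.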
When to Use Client-Side vs Server-Side
Use server-side tools when:
You do not want to manage tool infrastructure. The connection lifecycle, credential management, retries, and scaling for web search and retrieval are real engineering work. Moving it to a managed layer makes sense when that overhead is higher than the loss of direct control.
Credentials should not be in your application code. Web APIs, internal systems, and third-party services often require secrets. Server-side tools let you keep credentials closer to the tools that use them rather than routing them through your application.
Multiple agents or team members need access to the same tools. Client-side tools are scoped to where your code runs. Server-side tools are available to any agent making requests to your inference endpoint with no duplicated setup.
You need web search or real-time retrieval. Web search and knowledge base retrieval would otherwise require wiring a search provider yourself and writing the loop to handle tool calls. Server-side tools handle that for you.
You are already using DigitalOcean Inference. If you have a Model Access Key, enabling server-side tools is a single additional field in your existing API call.
Use client-side tools when:
You are in development and need full visibility. Client-side execution is transparent. You can log every tool call, inspect payloads, set breakpoints, and reproduce failures locally. Server-side execution requires the tracing API to get equivalent visibility.
Tools are fast, local, and do not make network calls. Functions that run in-memory (schema validation, string parsing, calculations) add sub-millisecond overhead client-side. Moving them server-side adds a network round-trip for no benefit.
Your latency budget has no room. If you are building a voice assistant where users expect a response in under 400ms and model inference alone takes 300ms, every tool call’s round-trip matters.
Tool logic depends on your application state. Tools that depend on local session data, in-memory state, or database connections your server holds are harder to expose as MCP servers. Keep them client-side.
You need control over retry behavior. Client-side, you decide exactly what happens when a tool fails: retry, use a fallback, or propagate a specific error to the user. Server-side retry behavior is determined by the inference layer.
The practical path
Server-side is the right default when managing tool infrastructure is the problem, which it usually is in production. Client-side is the right default when you need to see exactly what is happening, which is almost always true in development.
Build with client-side tools first. Add observability until you understand the failure modes. Move to server-side for production deployment when the infrastructure overhead exceeds the debugging benefit.
How This Compares to LangChain and OpenAI Function Calling
If you have built agents with LangChain or the OpenAI API, here is how the approaches differ.
LangChain tools are client-side by design. You define tools as Python functions, LangChain wraps them in a loop, and you run the whole process in your own code. You get maximum visibility and control. The cost is that you own the infrastructure: connections, retries, error handling, and credential management are your responsibility. LangChain does not manage tool execution for you; it structures the loop you write.
OpenAI’s Responses API built-in tools (web search, code interpreter, file search) are the closest architectural match to DO’s server-side tools. OpenAI manages execution inside the inference request; you declare the tools and get one response. But there is a real architectural consequence to how OpenAI’s tools are built: they are tied to OpenAI’s model layer. If you build a research agent on GPT-4o with built-in web search today, and later want to run that same agent on Claude, you have to rebuild the tool layer from scratch, because OpenAI’s built-in tools only work with OpenAI models. DO’s server-side tools keep the tool layer separate from the model. You configure your tools once and can run them with any model in DO’s catalog. Swapping the underlying model does not require changing how your tools are set up. You can also connect customer-owned MCP servers in addition to the built-in tools.
DO’s server-side tools are closer to the OpenAI model than to LangChain in terms of how they work: managed execution, single-request flow. But they extend that model to customer-owned infrastructure via MCP. The trade-off is the same in all 3 cases: managed execution means less direct visibility into what happened inside the tool call. You trade observability for less infrastructure to manage.
For teams already running LangChain in production, server-side tools reduce the infrastructure you maintain but do not replace the agent logic you have already built. For teams evaluating which approach to start with, server-side tools make the most sense when you want to ship without owning a tool layer.
Server-Side MCP vs Client-Side MCP
MCP (Model Context Protocol) is a standard way for an AI model to connect to external tools and data sources. Think of it as a plugin system: you build a small server that exposes your internal tools, like a database lookup or a CRM search, and the model can call those tools during a conversation.
For example: you have an internal API that looks up customer contracts. You wrap it in an MCP server. When a user asks “show me the open contracts for Acme Corp,” the model calls your tool directly instead of you hardcoding that logic into the prompt.
The key question is who manages that connection.
Client-side MCP
With client-side MCP, your application code manages the connection to the MCP server directly. The model tells your code “I need to call this tool,” your code makes the call, gets the result, and sends it back to the model.
What the latency looks like:
- If the MCP server runs on the same machine as your app, the connection is direct and fast at 0.91 to 1.10ms per call. This is the best-case scenario.
- If the MCP server is a separate service your app connects to over the network (remote, but already connected), a warm call takes 2 to 3ms, a typical HTTP round-trip to a nearby server.
- If the MCP server has not been used in a while and needs to start up fresh (cold start), the first call can take over 2,000ms (more than 2 seconds) just to initialize before it does anything useful.
You can log every tool call, see exactly what was sent and received, set your own timeouts, and handle failures however you want. If something breaks, you can reproduce it locally.
Example: You are building a coding assistant that can check whether a function exists in your codebase. You have an MCP server running locally that indexes your repo. When a developer asks “does this utility already exist?”, the model calls your local MCP server, which searches the index and returns the result in about 1ms. Everything runs on your machine. No external network call, no credentials to manage, and you can see exactly what the model asked for and what came back.
Server-side MCP
With server-side MCP, you do not manage the connection. You tell DigitalOcean’s inference layer where your MCP server lives and what tools it is allowed to use. When the model needs a tool, DO connects to your server, makes the call, and includes the result in the response, all without your application code doing anything in between.
What the latency looks like:
A warm HTTP call to a remote MCP server takes 2 to 3ms, a typical baseline for a low-latency HTTP connection to a nearby server. But real-world MCP traffic does not stay warm. Research across production MCP deployments shows a median tool call latency of 320ms, a p95 of 1,840ms, and a p99 of 6,200ms. The long tail is almost entirely cold starts. When a Node.js MCP server starts cold, module loading alone can take over 2,000ms before the first tool call is even made.
If your server is not pre-warmed or kept alive between sessions, the first call in a new conversation pays that startup cost. At p99, that is over 6 seconds before your agent does anything useful. The p99 is not a property of MCP itself; it reflects cold-start outliers across a wide range of server implementations in the study. A well-maintained server that stays warm between requests will run much closer to the 2 to 3ms warm baseline. The long tail is a cold-start problem, not an MCP problem. Keep your MCP servers running and connections alive.
Example: Same coding assistant, but now deployed and used by your whole team. Instead of each developer running a local MCP server, you expose one shared MCP server over the internet. That server stays warm because it is handling requests continuously. Calls come in at 2 to 3ms instead of 2,000ms+. Each inference request includes the URL and a token. DO’s infrastructure handles the connection, and your team’s agents all use the same tool without anyone having to set up a local server.
```json
{
  "type": "mcp",
  "server_label": "my-internal-api",
  "server_url": "https://api.mycompany.com/mcp",
  "authorization": "Bearer your-secret",
  "allowed_tools": ["get-contract", "list-open-items", "search-customers"]
}
```

How they compare:
- Who does the work. With client-side MCP, your application code opens the connection, calls the tool, waits for the result, and sends it back to the model. With server-side MCP, that work moves to DigitalOcean’s infrastructure. Your code sends one request; everything else happens on their side.
- Where your credentials live. Client-side, your API keys and secrets sit inside your application (environment variables, a secrets manager, or config files). Server-side, you pass the credentials directly in the inference request over HTTPS. Both approaches work; the right choice depends on how your team manages secrets.
- How much you can see. Client-side gives you full visibility: every tool call is in your own logs, you can see exactly what was sent and what came back, and you can debug locally. Server-side, tool calls happen inside the inference layer, so you need to use the tracing API to get that same level of detail.
- Who is responsible for slow starts. Cold starts are still your problem either way. DigitalOcean manages the connection from its side, but if your MCP server has not received a request in a while and needs to restart, that startup time (often 2,000ms or more for Node.js servers) still shows up in your response latency. With a shared server-side setup, traffic from multiple agents keeps the server warm automatically.
- Whether your server needs to be on the internet. Client-side MCP connects from wherever your application runs: your private network, your local machine, or an internal server. Server-side MCP connects from DigitalOcean’s network, which means your MCP server must have a public URL. If your tools sit behind a firewall or in a private network, client-side is the only option without additional networking work.
- Whether multiple agents can share the same tools. Client-side tools are tied to the machine or process running your code. If 3 different agents need the same tool, each needs its own connection. Server-side tools are available to any agent that uses your inference endpoint with no duplicated setup.
- What happens when a tool fails. Client-side, you write the logic: retry, fall back to a different data source, or return a specific error to the user. Server-side, the inference layer handles retries with its own defaults. For most production use cases, the defaults are fine. But if your agent needs precise control over what happens after a tool fails, client-side gives you that.
When your tools run inside your own code, debugging is straightforward: something fails, you see the error, you fix it. When tools run inside the inference layer, you lose that direct view. Three things are worth setting up before you ship.
1. Wire up tracing from the start
With client-side tools, every tool call is in your own logs. With server-side tools, that detail lives inside DigitalOcean’s infrastructure. Use the Agent Tracing API to see what each tool call cost. Without it, a slow response could be the model, a web search, or a cold MCP server, and you will not know which.
2. Track tool latency separately from total response time
Overall response time is too broad a metric for agent workloads. Track how long each individual tool call takes, how many tool calls each request makes, and how often a tool returns nothing useful. DigitalOcean’s Control Panel gives you aggregate inference metrics; per-tool breakdown requires the tracing API or your own instrumentation.
3. Separate tool failures from agent failures
“The agent did not work” can mean 2 different things: the tool failed (search timed out, knowledge base returned nothing), or the tool worked but the model still gave the wrong answer. These need different fixes. One is an infrastructure problem; the other is a prompt problem. Tracking them as one metric makes both harder to diagnose.
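As a rough illustration of keeping those two failure types apart, the sketch below buckets request outcomes into separate counters. `RequestOutcome` and `bucket` are hypothetical names, and how you decide `tool_ok` and `answer_ok` (trace data, evals, user feedback) is left as an assumption outside the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RequestOutcome:
    tool_ok: bool    # did every tool call return something usable?
    answer_ok: bool  # did the final answer pass your own quality check?


def bucket(outcome: RequestOutcome) -> str:
    """Assign each request to exactly one metric bucket."""
    if not outcome.tool_ok:
        return "tool_failure"   # infrastructure problem
    if not outcome.answer_ok:
        return "agent_failure"  # prompt problem: tools worked, answer did not
    return "success"


# Hypothetical sample of four requests.
outcomes = [
    RequestOutcome(tool_ok=True, answer_ok=True),
    RequestOutcome(tool_ok=False, answer_ok=False),
    RequestOutcome(tool_ok=True, answer_ok=False),
    RequestOutcome(tool_ok=True, answer_ok=True),
]

counts = Counter(bucket(o) for o in outcomes)
print(dict(counts))
```

Checking `tool_ok` first means a bad answer caused by a failed tool is counted as an infrastructure problem, so the prompt-problem bucket only contains requests where the tools did their job.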
FAQs
- What tools can I use server-side today? Web search, web fetch, knowledge base retrieval, customer-owned MCP servers, and Tool Search (lazy-loading for Anthropic and OpenAI models).
- Does it cost extra to use server-side tools? Knowledge base retrieval, MCP, and Tool Search do not add cost beyond standard inference pricing. Web search is charged at $10 per 1,000 requests.
- Can I mix client-side and server-side tools in the same agent? Yes. You can pass server-side tool definitions in the inference request and still handle other tools in your own code. They work independently.
- Does my MCP server need to be on the public internet? Yes, for server-side MCP. DigitalOcean’s inference layer connects to your server from its own network, so your server needs a public URL. If that is not possible, use client-side MCP instead.
- What happens if a server-side tool fails mid-request? The model continues with whatever partial result it got. It will not always tell you the tool returned nothing useful. Write explicit instructions in your system prompt for what to do when a tool returns no results.
- How do I debug slow responses when tools run server-side? Use the Agent Tracing API. It shows you the time spent per tool call so you can tell whether the model or a specific tool is the bottleneck.
- What is Tool Search and when should I use it? Tool Search lazy-loads tool definitions. Tools marked defer_loading: true are only loaded when the model needs them, not on every request. Use it when your agent has many tools and you want to reduce input token usage.
- When should I stick with client-side tools? When you are still in development and need to debug easily, when your tools run on a private network, or when your latency budget is very tight and every network round-trip matters.
The Shift Worth Understanding
Moving tool execution server-side changes where the complexity lives. The infrastructure you would otherwise own (connections, credentials, retries, scaling) moves to a managed layer. Your agent code gets smaller, and the operational surface your team is responsible for shrinks.
But the hard parts of building agents do not move with it. Designing for tool failures, keeping MCP servers warm, separating tool errors from model errors, knowing what to instrument: those stay your job. The difference is that you are doing that work on top of stable infrastructure instead of while also keeping the plumbing running.
These tradeoffs are the same regardless of which provider you use. The architecture decision does not change based on where you run inference.
If you are deciding right now, here is the short version: use server-side tools if your main problem is infrastructure overhead, such as credentials, connections, retries, and scaling. Use client-side tools if your main problem is visibility or control. And if you are not sure yet, start client-side, understand the failure modes, then move server-side when the infrastructure cost outweighs the debugging benefit.
References
DigitalOcean Resources
- How to Use Server-Side Tools (DigitalOcean Docs)
- Inference Features (DigitalOcean Docs)
- Inference Pricing (DigitalOcean Docs)
- Agent Tracing Data (DigitalOcean Docs)
MCP Benchmarks
- MCP Bridge: A Lightweight, LLM-Agnostic RESTful Proxy for Model Context Protocol Servers (arXiv)
Exa Benchmarks
- Search APIs in 2026: Overview & Benchmarks (Proxyway)
- Introducing Exa Instant (Exa Blog)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.