Using Local Ollama as an OpenAI-Compatible Backend for Immersive Translate Chrome Plugin: Debugging Notes and Final Setup

Background

I was running Ollama on a remote Linux machine with the qwen3.5:9b model, and I wanted to use it from the Immersive Translate browser extension through an OpenAI-compatible API.

The goal was straightforward:

Run Ollama on a remote server
Access it from a browser extension through an OpenAI-style API
Avoid exposing the raw LAN port if possible
Disable thinking to reduce translation latency

The final working endpoint looked like this:

https://<your-tailnet-node>.ts.net/v1/chat/completions

What I Was Trying to Do at First

The first thing I checked was whether Ollama was actually running correctly on the remote machine and whether inference was using the GPU.

I ran:

ollama ps

And saw:

PROCESSOR    100% GPU

That confirmed qwen3.5:9b was running fully on GPU rather than falling back to CPU.

After that, I tried to point Immersive Translate directly at Ollama’s OpenAI-compatible endpoint:

http://<server-lan-ip>:11434/v1/chat/completions

Problem 1: Ollama Was Only Reachable Locally on the Server

On the server itself, this worked:

curl http://127.0.0.1:11434/api/tags

But from my own machine, this initially failed:

curl http://<server-lan-ip>:11434/api/tags

Why It Happened

Ollama was only listening on the loopback interface:

ss -lntp | grep 11434

It showed:

127.0.0.1:11434

That meant:

The server itself could access Ollama
Other devices on the LAN could not

How I Fixed It

I added environment variables to ollama.service through systemd:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"

Then I restarted the service.

Why This Fix Made Sense

OLLAMA_HOST=0.0.0.0:11434 makes Ollama listen on the LAN
OLLAMA_ORIGINS=* allows cross-origin access from browser-based clients

After that, this worked from my own machine:

curl http://<server-lan-ip>:11434/api/tags

Problem 2: The Browser Extension Still Failed with `Failed to fetch`

Even though curl was working, Immersive Translate still failed when I clicked “Test Service”.

In the browser console, I saw:

TypeError: Failed to fetch

In the Network panel, the target request was:

http://<server-lan-ip>:11434/v1/chat/completions

What I Suspected at First

At first I considered a few possibilities:

The OpenAI-compatible URL path was wrong
The API key was configured incorrectly
The model was simply too slow
CORS was not configured correctly

What the Investigation Actually Showed

I tested the CORS preflight manually:

curl -i -X OPTIONS 'http://<server-lan-ip>:11434/v1/chat/completions' \
  -H 'Origin: chrome-extension://test' \
  -H 'Access-Control-Request-Method: POST' \
  -H 'Access-Control-Request-Headers: authorization,content-type,api-key'

The response included:

Access-Control-Allow-Origin: chrome-extension://test

But it did not allow the api-key request header.

The browser extension was actually sending:

Api-Key: ollama
Authorization: Bearer ollama
Content-Type: application/json

Root Cause

This was a classic browser CORS preflight failure:

The extension wanted to send Api-Key
The server did not allow Api-Key in Access-Control-Allow-Headers
The browser blocked the request before it was actually sent

How I Fixed It

I removed the API key from the extension configuration so it would stop sending the Api-Key header.

Why This Fix Made Sense

The problem was not the API key value. The problem was the header name itself. Removing that header was the most direct client-side workaround.

Problem 3: Even Without API Key, Direct LAN Access Was Still Unstable

Even after removing Api-Key, requests to <server-lan-ip> were still unreliable and sometimes failed with:

Failed to fetch

In some cases, it looked like the browser blocked the request before the server ever produced a real response.

Why I Thought This Was Happening

At that point, this no longer looked like a simple CORS issue. It looked more like the browser being stricter about requests to private LAN addresses.

The key points were:

<server-lan-ip> is an RFC1918 private address
Browser extensions still operate inside the browser’s networking security model
Accessing a private IP is often more problematic than accessing localhost

How I Tested a Better Path First: SSH Tunneling

I used an SSH host alias from ~/.ssh/config and created a local port forward:

ssh -N -L 11434:127.0.0.1:11434 <your-ssh-host-alias>

Then I changed the extension endpoint to:

http://127.0.0.1:11434/v1/chat/completions

Why This Worked Better

From the browser’s point of view, it was now talking to a local service:

Browser extension -> 127.0.0.1 -> SSH tunnel -> remote 127.0.0.1:11434

The browser was no longer directly dealing with a private LAN IP, and compatibility became much better.

This path worked reliably.

Problem 4: The Model Was Still “Thinking” Too Much and Translation Was Slow

Once the API path worked, another issue became obvious: responses included a large reasoning field, which made translation slower than I wanted.

For example, the response looked like this:

"message": {
  "role": "assistant",
  "content": "Hello, world",
  "reasoning": "Thinking Process: ..."
}

What I Thought at First

I initially considered:

The system prompt might be too long
The temperature might be too high
The model itself might simply be slow

These can affect latency, but they were not the core issue here.

What Was Actually Happening

qwen3.5:9b was using thinking, and the browser extension was not sending any parameter to disable it.

The actual request payload captured in the browser was:

{
  "model": "qwen3.5:9b",
  "temperature": 0,
  "messages": [
    { "role": "system", "content": "..." },
    { "role": "user", "content": "..." }
  ]
}

It did not include:

"think": false

And it also did not include:

"reasoning_effort": "none"

How I Verified the Problem Was on the Client Side

I sent the request manually with curl:

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [{"role":"user","content":"Hello"}],
    "think": false
  }'

That response came back without any reasoning field.

So the conclusion was clear:

The server supported disabling thinking
The extension was simply not sending the required parameter

Final Client-Side Fix

Based on Immersive Translate’s advanced configuration options, I added a model-specific override under openai-custom:

"modelsOverrides": [
  {
    "models": ["qwen3.5:9b"],
    "bodyConfigs": {
      "think": false
    }
  }
]

This made the extension include the disable-thinking parameter in the actual request body, and latency improved noticeably.

Why This Was the Right Fix

This solved the problem at the client layer:

No extra proxy was needed on the server
No request rewriting layer was needed in front of Ollama
The behavior was scoped to a specific model

Final Setup: Tailscale + Domain + HTTPS

The SSH tunnel worked, but it had two obvious drawbacks:

I had to keep the tunnel alive
It was not elegant as a long-term setup

So I moved to a better final setup:

Both machines joined the same Tailscale tailnet
I enabled MagicDNS
I enabled HTTPS certificates
I used tailscale serve on the remote machine

The command was:

sudo tailscale serve --bg --https=443 http://127.0.0.1:11434

This gave me a stable URL:

https://<your-tailnet-node>.ts.net/

The extension now uses:

https://<your-tailnet-node>.ts.net/v1/chat/completions

Why This Ended Up Being the Best Solution

It solved several problems at once:

No more dependency on a private LAN IP

The browser sees a standard HTTPS domain rather than a raw private IP.

No more SSH tunnel dependency

There is no need to keep ssh -L running manually.

Ollama can keep listening only on localhost

The service still stays on:

127.0.0.1:11434

So Ollama does not need to be exposed directly to the entire LAN.

Better fit for browser and extension expectations

HTTPS + domain is generally more compatible than private IP + custom port.

Persistent background setup

With:

sudo tailscale serve --bg --https=443 http://127.0.0.1:11434

The service stays available in the background without requiring an interactive terminal session.

What Tailscale Is Doing Under the Hood

The final request path looks roughly like this:

Immersive Translate
-> https://<your-tailnet-node>.ts.net/v1/chat/completions
-> Tailscale Serve
-> http://127.0.0.1:11434
-> Ollama

Tailscale provides several pieces here:

A WireGuard-based encrypted overlay network
MagicDNS for stable device names
HTTPS certificates
tailscale serve to proxy tailnet HTTPS traffic to a local service

So the browser ends up talking to a standard HTTPS service, not a bare port on a private LAN IP.

Key Lessons From This Debugging Session

1. `curl` working does not mean a browser extension will work

curl is not subject to browser security policies. Browser extensions still go through CORS, preflight behavior, and local-network-related restrictions.

2. Separate server-side failures from browser-side failures early

If local curl works, remote curl works, and the extension still fails, the browser layer is the first thing to suspect.

3. The actual request payload matters more than the configuration file

A setting is only real if it shows up in the captured request payload. Browser DevTools are the source of truth.

4. `localhost` and `192.168.x.x` are not equivalent from the browser’s point of view

They may both be “local” in a human sense, but the browser does not treat them the same way.

5. Standard access patterns are better long-term

If something is meant to be consumed by a browser or browser extension, these tend to win:

Stable domain names
HTTPS
Avoiding direct exposure of raw backend ports

That is usually more robust than patching around a private IP setup.

Final Configuration Summary

Server Side

Ollama runs on a remote Linux machine
Model: qwen3.5:9b
Ollama listens on: 127.0.0.1:11434
Tailscale Serve endpoint: https://<your-tailnet-node>.ts.net

Extension Side

API URL:

https://<your-tailnet-node>.ts.net/v1/chat/completions

Model:

qwen3.5:9b

Disable thinking:

"modelsOverrides": [
  {
    "models": ["qwen3.5:9b"],
    "bodyConfigs": {
      "think": false
    }
  }
]

Conclusion

This was not really about “how to make a model run.” It was about how to make a browser extension talk to a remotely hosted local-model service in a way that is stable, low-latency, and maintainable.

The hard part turned out not to be the model itself, but browser security rules, API compatibility, and deployment shape.

The final choice, Tailscale + HTTPS + domain + localhost-bound Ollama, worked well because it provided:

Stability
Better security
No raw LAN port exposure
Better browser compatibility
Lower maintenance overhead

If I had to do the same kind of setup again for a browser extension talking to a home or remote LLM server, I would start with this architecture instead of exposing a private IP first and fixing issues one by one.

Using Local Ollama as an OpenAI-Compatible Backend for Immersive Translate Chrome Plugin: Debugging Notes and Final Setup

Background

What I Was Trying to Do at First

Problem 1: Ollama Was Only Reachable Locally on the Server

Why It Happened

How I Fixed It

Why This Fix Made Sense

Problem 2: The Browser Extension Still Failed with Failed to fetch

What I Suspected at First

What the Investigation Actually Showed

Root Cause

How I Fixed It

Why This Fix Made Sense

Problem 3: Even Without API Key, Direct LAN Access Was Still Unstable

Why I Thought This Was Happening

How I Tested a Better Path First: SSH Tunneling

Why This Worked Better

Problem 4: The Model Was Still “Thinking” Too Much and Translation Was Slow

What I Thought at First

What Was Actually Happening

How I Verified the Problem Was on the Client Side

Final Client-Side Fix

Why This Was the Right Fix

Final Setup: Tailscale + Domain + HTTPS

Why This Ended Up Being the Best Solution

What Tailscale Is Doing Under the Hood

Key Lessons From This Debugging Session

1. curl working does not mean a browser extension will work

2. Separate server-side failures from browser-side failures early

3. The actual request payload matters more than the configuration file

4. localhost and 192.168.x.x are not equivalent from the browser’s point of view

5. Standard access patterns are better long-term

Final Configuration Summary

Server Side

Extension Side

Conclusion

Problem 2: The Browser Extension Still Failed with `Failed to fetch`

1. `curl` working does not mean a browser extension will work

4. `localhost` and `192.168.x.x` are not equivalent from the browser’s point of view