I Built a TCP-to-gRPC Proxy Because Monolithic Game Servers Can't Scale
I wanted to run a private server for an old game. Not a popular one — niche, mostly forgotten, kept alive by a small community. The server software was a monolith that nobody had touched in years. It spoke raw TCP, kept all session state in memory, and fell over the moment you pushed it past a few hundred concurrent players.
The obvious answer was to scale horizontally. The actual answer was: you can’t. A TCP connection is bound to the OS process that accepted it. You cannot migrate an established connection to another machine, and even handing the socket to a different process on the same host takes fd-passing machinery the monolith was never built for. Add a second instance and you’ve solved nothing — existing sessions are stranded on the first one.
That’s where PersistentParadoxPlex (PPlex) came from. For the full origin story and design rationale, see the original deep dive. The project page has the configuration reference and Protobuf contract.
The Constraint
This is not a software limitation you can patch around — it is a kernel-level boundary. A monolithic TCP server that holds session state in memory (game servers, trading systems, legacy enterprise backends) cannot be horizontally scaled without breaking every live connection.
The usual answer is “rewrite it as microservices.” That is often not an option — the codebase is old, undocumented, owned by a third party, or the binary is all you have. I had the source but not the time or appetite for a full rewrite of something that already worked.
What PPlex Does Instead
PPlex sits in front of the existing server and takes ownership of all TCP socket handling:
```
before PPlex:

[client] ---TCP---> [monolith]

after PPlex:

[client] ---TCP---> [PPlex] ---gRPC---> [worker 1]
                            ---gRPC---> [worker 2]
                            ---gRPC---> [worker 3]
```

Clients still speak raw TCP — no client changes required. PPlex accepts the connection, assigns a UUIDv4 to it, and routes incoming bytes to one upstream worker via gRPC bidirectional streaming. The UUID travels with every message so upstream workers always know which client they are serving.
Upstream workers are stateless. They receive normalized byte payloads with a client UUID, run business logic, and reply. They can be written in any gRPC-capable language (Go, Java, Python, Node.js) and scaled independently via Kubernetes or any container scheduler.
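Viewed from the worker side, that statelessness reduces each message to a pure function over a (uuid, payload) pair. A minimal Python sketch of the idea — the gRPC stream plumbing is omitted and all names here are hypothetical, not part of the PPlex contract:

```python
def handle_message(client_uuid: str, payload: bytes) -> tuple[str, bytes]:
    """Run business logic on one normalized message and address the
    reply back to the same client. No per-connection state is kept:
    everything the worker needs arrives with the message itself."""
    reply = payload.upper()       # stand-in for real business logic
    return client_uuid, reply     # the UUID echoes back so PPlex can route
```

Because the function closes over nothing, any number of workers can run it side by side and PPlex can hand consecutive messages from the same client to different workers without breaking anything.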
The gRPC Contract
Upstream workers implement a single service:
```
service UpstreamPeerService {
  rpc bidirectionalStreaming(stream InputStreamRequest) returns (stream OutputStreamRequest) {}
  rpc ready(google.protobuf.Empty) returns (ReadyResult);
  rpc live(stream google.protobuf.Empty) returns (stream LiveResult);
}
```

The bidirectionalStreaming RPC is the main data path. PPlex opens one long-lived stream per registered upstream peer and multiplexes all routed client sessions over it. The ready and live RPCs are health probes — PPlex checks readiness before routing traffic to a new upstream and checks liveness on a configurable heartbeat interval.
The payload in both directions is raw bytes. PPlex never inspects the application protocol — it is completely transparent to whatever the clients and servers are speaking. That transparency was intentional: the whole point is to not care about the custom protocol running inside the pipe.
Session Routing
PPlex maintains two maps in its runtime actor:
- downstream peers: UUID → async channel to the client’s write task
- upstream peers: peer ID → gRPC channel handle
When a downstream message arrives, the runtime picks the next upstream via round-robin and forwards { client_uuid, payload } on that peer’s gRPC stream. When an upstream sends a response targeting a specific client_uuid, the runtime looks up the UUID in the downstream map and writes the bytes back to the TCP socket.
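Those two lookups can be sketched without the async machinery. A simplified Python model of the runtime's maps, with the channels and stream handles stood in by plain callables — all names here are illustrative, not PPlex's actual types:

```python
class Router:
    """Toy model of PPlex's two routing maps."""

    def __init__(self):
        self.downstream = {}  # client UUID -> write fn (the client's write task)
        self.upstream = {}    # peer ID -> send fn (the peer's gRPC stream)
        self._next = 0        # round-robin cursor

    def route_downstream(self, client_uuid: str, payload: bytes) -> None:
        # Pick the next upstream peer round-robin and forward the
        # (client_uuid, payload) pair on its stream.
        peers = list(self.upstream)
        peer_id = peers[self._next % len(peers)]
        self._next += 1
        self.upstream[peer_id](client_uuid, payload)

    def route_upstream(self, client_uuid: str, payload: bytes) -> None:
        # A response names its client; write the bytes back to that socket.
        self.downstream[client_uuid](payload)
```

The real runtime does this inside an actor with async channels, but the routing decision itself is exactly this pair of dictionary lookups.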
Broadcast responses (BROADCAST_ALL, BROADCAST_ACTIVE, BROADCAST_NOT_ACTIVE) fan out to all matching downstream peers in parallel.
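The matching rules for those three broadcast modes can be sketched directly. This runs sequentially for clarity (the runtime fans out in parallel), and the representation of "active" clients as a set is an assumption of this sketch:

```python
def broadcast(downstream, payload: bytes, mode: str, active: set) -> None:
    """Fan a broadcast response out to matching downstream peers.

    downstream: client UUID -> write fn for that client's socket
    active:     UUIDs currently considered active (sketch assumption)
    """
    for cid, write in downstream.items():
        if mode == "BROADCAST_ALL":
            write(payload)
        elif mode == "BROADCAST_ACTIVE" and cid in active:
            write(payload)
        elif mode == "BROADCAST_NOT_ACTIVE" and cid not in active:
            write(payload)
```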
Dynamic Upstream Registration
New upstream workers register through the HTTP management API at runtime:
```
curl -X POST http://localhost:8000/upstream/add \
  -H 'Content-Type: application/json' \
  -d '{"host":"10.0.1.5","port":45888,"alive_timeout":30,"ready_timeout":5}'
```

PPlex runs the readiness probe, and if the upstream passes, starts routing new client sessions to it. No restart required. This is the mechanism that makes horizontal scaling work — Kubernetes pods spin up, register with PPlex, and immediately start receiving load.
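A worker could make that registration call itself at startup. A Python sketch of the same request, assuming only the endpoint and JSON fields shown above — the function names are illustrative:

```python
import json
import urllib.request

def registration_body(host: str, port: int,
                      alive_timeout: int = 30, ready_timeout: int = 5) -> dict:
    # Same fields as the curl example above.
    return {"host": host, "port": port,
            "alive_timeout": alive_timeout, "ready_timeout": ready_timeout}

def register_upstream(host: str, port: int,
                      api: str = "http://localhost:8000") -> bytes:
    # POST the new worker to PPlex's management endpoint; PPlex then
    # runs the readiness probe before routing sessions to it.
    req = urllib.request.Request(
        f"{api}/upstream/add",
        data=json.dumps(registration_body(host, port)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

In a Kubernetes deployment this would live in the pod's startup path, so new replicas announce themselves the moment they are ready.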
Use Cases
Legacy game servers. The original motivation. Community servers running old game protocols that cannot be modified, needing to handle more concurrent players than a single machine allows.
Migration without downtime. Stand up new backend instances behind PPlex, register them, and drain the old ones. Existing clients stay connected throughout. PPlex handles the gradual cutover.
Protocol bridging. Any system where clients must speak raw TCP but upstream processing benefits from the gRPC ecosystem — observability, load balancing, multi-language workers, Kubernetes-native health checks.
Stateless computation over stateful connections. Business logic that is naturally stateless (authentication checks, game turn processing, message routing) but sits behind a long-lived TCP session. Move the stateless logic to a worker pool, leave the session management to PPlex.
What PPlex Does Not Do
PPlex is not a general-purpose API gateway. It does not understand application-layer protocols, does not terminate TLS, and does not provide request/response logging at the payload level (only connection-level events). It is deliberately minimal — one job, done well.
Full reference: