# PersistentParadoxPlex
Making monolithic TCP servers horizontally scalable without touching their source
TCP proxy written in Rust that demultiplexes long-lived client connections to a pool of stateless gRPC upstream workers — making old monolithic TCP servers cloud-scalable without touching their source.
## Why I built this
I kept running into the same wall: game servers and legacy TCP applications that couldn’t scale because they were never designed to. You have a monolithic process that owns every socket it accepts. When it runs out of CPU, you buy a bigger machine. When it crashes, every connected client drops. There’s no path to horizontal scaling that doesn’t involve a full rewrite — and the codebases I was dealing with were old, undocumented, or flat-out owned by someone else.
The typical architecture looks like this:
```
[client] ---TCP---> [monolithic server]
                      |- I/O handling
                      |- session state in memory
                      |- business logic
                      |- database access
```

Everything is coupled. I/O, session tracking, business logic, database access — all in one process. The longer it runs, the harder it is to touch.
So instead of rewriting the server, I built a proxy that sits in front of it and handles the scaling problem transparently.
## The idea: demultiplex TCP into gRPC
PersistentParadoxPlex (PPlex) accepts all downstream client connections using the same protocol they already speak — raw TCP. Internally, it bridges each connection to a pool of upstream workers via gRPC bidirectional streaming. The upstream workers implement a simple Protobuf contract, handle business logic, and send responses back. They’re stateless, cloud-deployable, and can be written in any language with gRPC support.
```
[client] ---TCP---> [PPlex] ---gRPC---> [upstream worker A]
                            ---gRPC---> [upstream worker B]
                            ---gRPC---> [upstream worker C]
```

PPlex owns everything networking-related: UUID assignment per connection, connection lifecycle, round-robin routing, health checks, and graceful failover. Upstream workers only deal with business logic. You can scale the worker pool independently, swap workers without downtime, and deploy them to Kubernetes without any awareness of TCP sockets.
## Architecture
I wrote PPlex in Rust using Tokio for async I/O. Each accepted TCP connection spawns async read and write tasks. A central runtime actor holds two maps: downstream UUIDs to their event channels, and upstream peer IDs to their gRPC channels. Routing is a round-robin scheduler over registered upstream peers. When a downstream message arrives, the runtime picks the next upstream in rotation and forwards the payload over the existing bidirectional gRPC stream — no new connection opened per message.
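As a minimal sketch of that routing state (the type and field names here are illustrative stand-ins, not PPlex's actual internals), the runtime actor's two maps and the round-robin selection look roughly like:

```rust
use std::collections::HashMap;

// Illustrative model of the runtime actor's state: upstream peers kept in
// registration order, a rotation cursor, and a map of downstream UUIDs to
// their event channels (modeled as plain strings here).
struct Router {
    upstreams: Vec<String>,              // registered upstream peer IDs
    next: usize,                         // round-robin cursor
    downstream: HashMap<String, String>, // downstream UUID -> event channel label
}

impl Router {
    fn new() -> Self {
        Router { upstreams: Vec::new(), next: 0, downstream: HashMap::new() }
    }

    fn register_upstream(&mut self, peer_id: &str) {
        self.upstreams.push(peer_id.to_string());
    }

    fn track_downstream(&mut self, uuid: &str, channel: &str) {
        self.downstream.insert(uuid.to_string(), channel.to_string());
    }

    // Pick the next upstream in rotation for an incoming downstream message;
    // the payload is then forwarded over that peer's existing gRPC stream.
    fn route(&mut self) -> Option<&str> {
        if self.upstreams.is_empty() {
            return None;
        }
        let idx = self.next % self.upstreams.len();
        self.next = self.next.wrapping_add(1);
        Some(self.upstreams[idx].as_str())
    }
}
```

The key property is that `route` never opens a connection; it only selects which already-open stream gets the next message.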
The HTTP management API (built with Rocket) lets you register new upstream peers at runtime without restarting PPlex.
```
TCP listener (port 7999)
        |
        v
[downstream peer] <-- spawns read/write async tasks
        |
        v
[runtime actor] -- round-robin --> [upstream peer pool]
        |                            |-- gRPC stream to worker A
        |                            |-- gRPC stream to worker B
        v
[HTTP management API (port 8000)] -- POST /upstream/add
```

## The Protobuf contract
This is the full interface upstream workers need to implement. It lives in misc/grpc/proto/upstream.proto:
```protobuf
service UpstreamPeerService {
  // main data channel: downstream bytes in, responses out
  rpc bidirectionalStreaming(stream InputStreamRequest) returns (stream OutputStreamRequest) {}

  // readiness probe -- called before routing traffic to this peer
  rpc ready(google.protobuf.Empty) returns (ReadyResult);

  // liveness probe -- called on a heartbeat interval
  rpc live(stream google.protobuf.Empty) returns (stream LiveResult);
}

// downstream -> upstream: bytes from a specific client
message InputStreamRequest {
  string time = 1;
  string client_uuid = 2;
  bytes payload = 3;
}

// upstream -> downstream: bytes to a specific client or broadcast
message OutputStreamRequest {
  string time = 1;
  enum Broadcast {
    BROADCAST_ALL = 0;
    BROADCAST_ACTIVE = 1;
    BROADCAST_NOT_ACTIVE = 2;
  }
  oneof target {
    string client_uuid = 2;
    Broadcast broadcast = 3;
  }
  bytes payload = 4;
}
```

The client_uuid field is the key abstraction. PPlex assigns a UUIDv4 to every TCP connection at accept time and stamps it on every message. Upstream workers use it to route responses back to the right client without ever touching a raw socket. They just read a UUID, run logic, and write back a UUID. Everything else is PPlex’s problem.
The broadcast targeting in OutputStreamRequest is useful for game-server scenarios: you can push a message to all active connections, all inactive ones, or a specific client — all without the upstream worker knowing anything about connection state.
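A small model of how that oneof target could resolve against the proxy's connection table; the Rust types below are hypothetical stand-ins for the generated Protobuf ones, not PPlex's real code:

```rust
use std::collections::HashMap;

// Stand-ins for the generated OutputStreamRequest target types.
#[derive(Clone, Copy)]
enum Broadcast { All, Active, NotActive }

enum Target {
    ClientUuid(String),
    Broadcast(Broadcast),
}

// Resolve a target against the proxy's connection table (uuid -> is_active)
// into the list of downstream UUIDs that should receive the payload.
fn resolve(target: &Target, conns: &HashMap<String, bool>) -> Vec<String> {
    let mut out: Vec<String> = match target {
        // Specific client: deliver only if that UUID is still connected.
        Target::ClientUuid(uuid) => conns
            .contains_key(uuid)
            .then(|| uuid.clone())
            .into_iter()
            .collect(),
        // Broadcast: filter the table by the requested activity state.
        Target::Broadcast(b) => conns
            .iter()
            .filter(|(_, active)| match b {
                Broadcast::All => true,
                Broadcast::Active => **active,
                Broadcast::NotActive => !**active,
            })
            .map(|(uuid, _)| uuid.clone())
            .collect(),
    };
    out.sort(); // deterministic order for the example
    out
}
```

The upstream worker never sees this table; it only names a target, and the proxy side decides which sockets that maps to.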
## Configuration
```toml
[server]
host = "0.0.0.0"
port = 7999           # TCP listener

[[upstream]]
host = "127.0.0.1"
port = 45888          # gRPC upstream target
alive_timeout = 30    # health check interval (seconds)
ready_timeout = 5     # readiness probe timeout

[management_server]
host = "0.0.0.0"
port = 8000           # HTTP management API
```

Additional upstream workers can be added at runtime via POST /upstream/add without downtime.
## What PPlex handles
| Capability | Detail |
|---|---|
| TCP demultiplexing | accepts raw TCP, fans out to gRPC pool |
| UUID session tracking | each downstream connection gets a stable identifier |
| Round-robin load balancing | fair distribution across all registered upstreams |
| Health checks | ready and live probes before routing and on heartbeat |
| Dynamic upstream registration | HTTP API, no restart needed |
| Broadcast targeting | send to one client, all active, or all inactive |
| Multi-language upstream | any gRPC-capable language works as an upstream worker |
## What I learned building it
The hardest problem was task lifecycle management. There’s a meaningful difference between a slow client and a dead connection — “no bytes for N seconds” versus “connection reset by peer”. Handling that wrong causes task accumulation over time: async tasks pile up waiting on sockets that are never going to send another byte. I ended up using read/write deadlines with per-connection activity tracking to time out genuinely dead connections while keeping valid but quiet sessions alive.
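A stripped-down sketch of that per-connection activity tracking (assumed design, std-only; the real code runs inside Tokio tasks with actual deadlines):

```rust
use std::time::{Duration, Instant};

// Hypothetical per-connection tracker: every read or write refreshes
// last_activity, and a periodic reaper checks it against the idle deadline.
struct Activity {
    last_activity: Instant,
    idle_deadline: Duration,
}

impl Activity {
    fn new(idle_deadline: Duration) -> Self {
        Activity { last_activity: Instant::now(), idle_deadline }
    }

    // Called from the read/write tasks whenever bytes actually move.
    fn touch(&mut self) {
        self.last_activity = Instant::now();
    }

    // Called by the reaper: true only once the connection has been silent
    // longer than the configured deadline, so a quiet-but-alive session
    // that keeps touching the tracker is never reaped.
    fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) > self.idle_deadline
    }
}
```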
The gRPC bidirectional streaming model maps surprisingly well to the TCP session model. One long-lived stream per upstream peer multiplexes all the downstream sessions routed to it. This avoids the overhead of opening a new gRPC call per client message, which would have been a significant bottleneck at scale.
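The fan-out step of that multiplexing can be sketched like this: frames arriving on one upstream stream each carry a client UUID, and the proxy looks the UUID up to find the per-connection write channel. Names are illustrative, and std channels stand in for Tokio's:

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Deliver frames from a single upstream stream to their downstream
// connections. Each frame is (client_uuid, payload); sessions maps a UUID
// to the write channel of that connection's task. Returns the number of
// frames actually delivered; frames for unknown UUIDs (client already
// disconnected) are silently dropped.
fn demux(
    frames: Vec<(String, Vec<u8>)>,
    sessions: &HashMap<String, mpsc::Sender<Vec<u8>>>,
) -> usize {
    let mut delivered = 0;
    for (uuid, payload) in frames {
        if let Some(tx) = sessions.get(&uuid) {
            if tx.send(payload).is_ok() {
                delivered += 1;
            }
        }
    }
    delivered
}
```

Because the UUID does the demultiplexing, one stream per upstream peer is enough no matter how many downstream sessions are routed to it.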
One architectural limit worth being honest about: PPlex is not fully stateless. Once a TCP connection is established, it lives on the specific PPlex instance that accepted it. For cases where you need multiple PPlex instances, I recommend putting round-robin DNS in front rather than trying to use VRRP/CARP — the sockets are bound to the accepting instance for their lifetime. If your upstream logic requires per-session state shared across workers, that’s a problem for Redis or a similar external store, not for PPlex itself.
## Running it
```shell
# build
cargo build

# run with trace logging
RUST_LOG=trace cargo run

# or Docker
docker run -p 7999:7999 tanguc/persistentparadoxplex

# connect a test client
nc 127.0.0.1 7999
```

For development, the reference upstream server is a Go gRPC implementation at tanguc/golang-grpc-server. To regenerate Rust files from the proto definition:

```shell
./generate_rs_from_proto.sh && cargo build
```

PPlex is also installable via Homebrew and pre-built binaries are available on the releases page.