# PersistentParadoxPlex
Making monolithic TCP servers horizontally scalable without touching their source
TCP proxy written in Rust that demultiplexes long-lived client connections to a pool of stateless gRPC upstream workers — making old monolithic TCP servers cloud-scalable without touching their source.
## Why I built this
I kept running into the same wall: game servers and legacy TCP applications that couldn’t scale because they were never designed to. You have a monolithic process that owns every socket it accepts. When it runs out of CPU, you buy a bigger machine. When it crashes, every connected client drops. There’s no path to horizontal scaling that doesn’t involve a full rewrite — and the codebases I was dealing with were old, undocumented, or flat-out owned by someone else.
The typical architecture looks like this:
```
[client] ---TCP---> [monolithic server]
                      |- I/O handling
                      |- session state in memory
                      |- business logic
                      |- database access
```

Everything is coupled. I/O, session tracking, business logic, database access — all in one process. The longer it runs, the harder it is to touch.
So instead of rewriting the server, I built a proxy that sits in front of it and handles the scaling problem transparently.
## The idea: demultiplex TCP into gRPC
PersistentParadoxPlex (PPlex) accepts all downstream client connections using the same protocol they already speak — raw TCP. Internally, it bridges each connection to a pool of upstream workers via gRPC bidirectional streaming. The upstream workers implement a simple Protobuf contract, handle business logic, and send responses back. They’re stateless, cloud-deployable, and can be written in any language with gRPC support.
```
[client] ---TCP---> [PPlex] ---gRPC---> [upstream worker A]
                            ---gRPC---> [upstream worker B]
                            ---gRPC---> [upstream worker C]
```

PPlex owns everything networking-related: UUID assignment per connection, connection lifecycle, round-robin routing, health checks, and graceful failover. Upstream workers only deal with business logic. You can scale the worker pool independently, swap workers without downtime, and deploy them to Kubernetes without any awareness of TCP sockets.
## Architecture
I wrote PPlex in Rust using Tokio for async I/O. Each accepted TCP connection spawns async read and write tasks. A central runtime actor holds two maps: downstream UUIDs to their event channels, and upstream peer IDs to their gRPC channels. Routing is a round-robin scheduler over registered upstream peers. When a downstream message arrives, the runtime picks the next upstream in rotation and forwards the payload over the existing bidirectional gRPC stream — no new connection opened per message.
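As a minimal sketch of that routing state (the type and field names here are illustrative stand-ins, not PPlex's actual internals), the runtime actor's two maps and the round-robin selection look roughly like:

```rust
use std::collections::HashMap;

// Illustrative model of the runtime actor's state: upstream peers kept in
// registration order, a rotation cursor, and a map of downstream UUIDs to
// their event channels (modeled as plain strings here).
struct Router {
    upstreams: Vec<String>,              // registered upstream peer IDs
    next: usize,                         // round-robin cursor
    downstream: HashMap<String, String>, // downstream UUID -> event channel label
}

impl Router {
    fn new() -> Self {
        Router { upstreams: Vec::new(), next: 0, downstream: HashMap::new() }
    }

    fn register_upstream(&mut self, peer_id: &str) {
        self.upstreams.push(peer_id.to_string());
    }

    fn track_downstream(&mut self, uuid: &str, channel: &str) {
        self.downstream.insert(uuid.to_string(), channel.to_string());
    }

    // Pick the next upstream in rotation for an incoming downstream message;
    // the payload is then forwarded over that peer's existing gRPC stream.
    fn route(&mut self) -> Option<&str> {
        if self.upstreams.is_empty() {
            return None;
        }
        let idx = self.next % self.upstreams.len();
        self.next = self.next.wrapping_add(1);
        Some(self.upstreams[idx].as_str())
    }
}
```

The key property is that `route` never opens a connection; it only selects which already-open stream gets the next message.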
The HTTP management API (built with Rocket) lets you register new upstream peers at runtime without restarting PPlex.
```
TCP listener (port 7999)
        |
        v
[downstream peer] <-- spawns read/write async tasks
        |
        v
[runtime actor] -- round-robin --> [upstream peer pool]
        |                            |-- gRPC stream to worker A
        |                            |-- gRPC stream to worker B
        v
[HTTP management API (port 8000)] -- POST /upstream/add
```

## The Protobuf contract
This is the full interface upstream workers need to implement. It lives in misc/grpc/proto/upstream.proto:
```protobuf
service UpstreamPeerService {
  // main data channel: downstream bytes in, responses out
  rpc bidirectionalStreaming(stream InputStreamRequest) returns (stream OutputStreamRequest) {}

  // readiness probe -- called before routing traffic to this peer
  rpc ready(google.protobuf.Empty) returns (ReadyResult);

  // liveness probe -- called on a heartbeat interval
  rpc live(stream google.protobuf.Empty) returns (stream LiveResult);
}

// downstream -> upstream: bytes from a specific client
message InputStreamRequest {
  string time = 1;
  string client_uuid = 2;
  bytes payload = 3;
}

// upstream -> downstream: bytes to a specific client or broadcast
message OutputStreamRequest {
  string time = 1;
  enum Broadcast {
    BROADCAST_ALL = 0;
    BROADCAST_ACTIVE = 1;
    BROADCAST_NOT_ACTIVE = 2;
  }
  oneof target {
    string client_uuid = 2;
    Broadcast broadcast = 3;
  }
  bytes payload = 4;
}
```

The client_uuid field is the key abstraction. PPlex assigns a UUIDv4 to every TCP connection at accept time and stamps it on every message. Upstream workers use it to route responses back to the right client without ever touching a raw socket. They just read a UUID, run logic, and write back a UUID. Everything else is PPlex’s problem.
The broadcast targeting in OutputStreamRequest is useful for game-server scenarios: you can push a message to all active connections, all inactive ones, or a specific client — all without the upstream worker knowing anything about connection state.
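A small model of how that oneof target could resolve against the proxy's connection table; the Rust types below are hypothetical stand-ins for the generated Protobuf ones, not PPlex's real code:

```rust
use std::collections::HashMap;

// Stand-ins for the generated OutputStreamRequest target types.
#[derive(Clone, Copy)]
enum Broadcast { All, Active, NotActive }

enum Target {
    ClientUuid(String),
    Broadcast(Broadcast),
}

// Resolve a target against the proxy's connection table (uuid -> is_active)
// into the list of downstream UUIDs that should receive the payload.
fn resolve(target: &Target, conns: &HashMap<String, bool>) -> Vec<String> {
    let mut out: Vec<String> = match target {
        // Specific client: deliver only if that UUID is still connected.
        Target::ClientUuid(uuid) => conns
            .contains_key(uuid)
            .then(|| uuid.clone())
            .into_iter()
            .collect(),
        // Broadcast: filter the table by the requested activity state.
        Target::Broadcast(b) => conns
            .iter()
            .filter(|(_, active)| match b {
                Broadcast::All => true,
                Broadcast::Active => **active,
                Broadcast::NotActive => !**active,
            })
            .map(|(uuid, _)| uuid.clone())
            .collect(),
    };
    out.sort(); // deterministic order for the example
    out
}
```

The upstream worker never sees this table; it only names a target, and the proxy side decides which sockets that maps to.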
## Configuration
```toml
[server]
host = "0.0.0.0"
port = 7999           # TCP listener

[[upstream]]
host = "127.0.0.1"
port = 45888          # gRPC upstream target
alive_timeout = 30    # health check interval (seconds)
ready_timeout = 5     # readiness probe timeout

[management_server]
host = "0.0.0.0"
port = 8000           # HTTP management API
```

Additional upstream workers can be added at runtime via POST /upstream/add without downtime.
## What PPlex handles
| Capability | Detail |
|---|---|
| TCP demultiplexing | accepts raw TCP, fans out to gRPC pool |
| UUID session tracking | each downstream connection gets a stable identifier |
| Round-robin load balancing | fair distribution across all registered upstreams |
| Health checks | ready and live probes before routing and on heartbeat |
| Dynamic upstream registration | HTTP API, no restart needed |
| Broadcast targeting | send to one client, all active, or all inactive |
| Multi-language upstream | any gRPC-capable language works as an upstream worker |
## What I learned building it
The hardest problem was task lifecycle management. There’s a meaningful difference between a slow client and a dead connection — “no bytes for N seconds” versus “connection reset by peer”. Handling that wrong causes task accumulation over time: async tasks pile up waiting on sockets that are never going to send another byte. I ended up using read/write deadlines with per-connection activity tracking to time out genuinely dead connections while keeping valid but quiet sessions alive.
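A stripped-down sketch of that per-connection activity tracking (assumed design, std-only; the real code runs inside Tokio tasks with actual deadlines):

```rust
use std::time::{Duration, Instant};

// Hypothetical per-connection tracker: every read or write refreshes
// last_activity, and a periodic reaper checks it against the idle deadline.
struct Activity {
    last_activity: Instant,
    idle_deadline: Duration,
}

impl Activity {
    fn new(idle_deadline: Duration) -> Self {
        Activity { last_activity: Instant::now(), idle_deadline }
    }

    // Called from the read/write tasks whenever bytes actually move.
    fn touch(&mut self) {
        self.last_activity = Instant::now();
    }

    // Called by the reaper: true only once the connection has been silent
    // longer than the configured deadline, so a quiet-but-alive session
    // that keeps touching the tracker is never reaped.
    fn is_expired(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_activity) > self.idle_deadline
    }
}
```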
The gRPC bidirectional streaming model maps surprisingly well to the TCP session model. One long-lived stream per upstream peer multiplexes all the downstream sessions routed to it. This avoids the overhead of opening a new gRPC call per client message, which would have been a significant bottleneck at scale.
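The fan-out step of that multiplexing can be sketched like this: frames arriving on one upstream stream each carry a client UUID, and the proxy looks the UUID up to find the per-connection write channel. Names are illustrative, and std channels stand in for Tokio's:

```rust
use std::collections::HashMap;
use std::sync::mpsc;

// Deliver frames from a single upstream stream to their downstream
// connections. Each frame is (client_uuid, payload); sessions maps a UUID
// to the write channel of that connection's task. Returns the number of
// frames actually delivered; frames for unknown UUIDs (client already
// disconnected) are silently dropped.
fn demux(
    frames: Vec<(String, Vec<u8>)>,
    sessions: &HashMap<String, mpsc::Sender<Vec<u8>>>,
) -> usize {
    let mut delivered = 0;
    for (uuid, payload) in frames {
        if let Some(tx) = sessions.get(&uuid) {
            if tx.send(payload).is_ok() {
                delivered += 1;
            }
        }
    }
    delivered
}
```

Because the UUID does the demultiplexing, one stream per upstream peer is enough no matter how many downstream sessions are routed to it.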
One architectural limit worth being honest about: PPlex is not fully stateless. Once a TCP connection is established, it lives on the specific PPlex instance that accepted it. For cases where you need multiple PPlex instances, I recommend putting round-robin DNS in front rather than trying to use VRRP/CARP — the sockets are bound to the accepting instance for their lifetime. If your upstream logic requires per-session state shared across workers, that’s a problem for Redis or a similar external store, not for PPlex itself.
## Running it
```shell
# build
cargo build

# run with trace logging
RUST_LOG=trace cargo run

# or Docker
docker run -p 7999:7999 tanguc/persistentparadoxplex

# connect a test client
nc 127.0.0.1 7999
```

For development, the reference upstream server is a Go gRPC implementation at tanguc/golang-grpc-server. To regenerate Rust files from the proto definition:

```shell
./generate_rs_from_proto.sh && cargo build
```

PPlex is also installable via Homebrew and pre-built binaries are available on the releases page.