Taking a Monolithic TCP Game Server to the Cloud — The Hard Way
A while back I started playing with a niche game — old, no longer in active development, kept alive by a small community running unofficial servers. The game was composed of two parts: a client written in an ancient stack, and a server that communicated with it over TCP. The protocol was custom, undocumented, and nobody was changing it.
I wanted to run a community server. The existing server software was a monolith. It worked, but it could only handle so many concurrent connections before it fell apart. The natural instinct was: scale it out. Run more instances. The problem is you can’t. Not without breaking every live connection.
That constraint led to PersistentParadoxPlex.
The problem with monolithic TCP servers
TCP connections are bound to the OS process that accepted them. Short of OS-specific tricks like file-descriptor passing, a socket is not a value you can hand to another process — and it can never be shipped across a network. A monolithic TCP server that holds session state in memory cannot be horizontally scaled — not without breaking every session that’s already alive.
That’s not just a game server problem. It’s trading platforms, legacy enterprise systems, any server that owns the full stack: I/O, business logic, database access, and the TCP connection itself. All in one process, all entangled.
The typical failure modes as load increases:
- Bottleneck as the single process runs out of CPU or memory
- Connection handling complexity: timeouts, errors, socket limits
- Thread synchronization issues if the server is multithreaded
- A single crash takes everything down
- When you need more capacity, you replace hardware — with a maintenance window
These aren’t design flaws from the 90s that nobody thought about. They’re the natural consequence of building something that works at one scale and never having to think past it.
Why monoliths don’t fit the cloud
The cloud promise is horizontal scaling: when demand grows, add instances. When demand drops, remove them. No hardware swaps, no maintenance windows. But that model requires your application components to be stateless, or at least to externalize their state somewhere shared.
A monolithic TCP server fails this requirement immediately. The connection is the state. You can’t spread connections across instances because the instance that accepted a socket owns it for that socket’s lifetime.
Throwing more hardware at it (vertical scaling) buys time but doesn’t fix the architecture. You’re still one crash away from dropping every live connection.
The path to microservices
The right direction is splitting the monolith apart — extracting business logic into stateless services that can run in parallel and scale independently. This is well-understood. The hard part is the transition when you can’t rewrite everything at once, or when the server binary isn’t yours to modify.
What if the networking layer could be extracted without touching the business logic? The business logic doesn’t care about socket management. It cares about receiving a request from a client, doing something, and sending a response. The client identity can be passed as data.
That’s the insight behind PPP: separate the TCP socket management completely from the business logic. A dedicated proxy handles all client connections and forwards the payloads upstream — with a client identifier attached.
Enter PersistentParadoxPlex
I started this project — PersistentParadoxPlex (PPP). The name breaks down like this:
- Persistent — identify and keep TCP sockets alive, manage their full lifecycle
- Paradox — multiplexing on both sides of the pipe simultaneously
- Plex — multiplexing
The design goals from the start: be fast, be resilient, be easy to operate. I wanted something that a small team could drop in front of an existing server and immediately start routing traffic, without a migration project.
Assumptions and constraints
Assumption 1: The internal team will keep TCP/IP for client connections — safe, ordered, what the clients already speak.
Assumption 2: The custom protocol running inside the TCP pipe should be left alone. PPP doesn’t need to parse it. Whatever bytes the client sends, PPP forwards them opaquely.
Assumption 3: The upstream communication needs to be fast, resilient, and able to carry enough context for stateless workers. The answer was gRPC — high performance, bi-directional streaming, cross-language support, Protobuf for the contract.
The Protobuf contract
Below is PPP’s Protobuf definition that upstream servers must implement:
```protobuf
message InputStreamRequest {
  string time = 1;
  string client_uuid = 2;
  bytes payload = 3;
}

message OutputStreamRequest {
  string time = 1;
  enum Broadcast {
    BROADCAST_ALL = 0;
    BROADCAST_ACTIVE = 1;
    BROADCAST_NOT_ACTIVE = 2;
  }
  oneof target {
    string client_uuid = 2;
    Broadcast broadcast = 3;
  }
  bytes payload = 4;
}

service UpstreamPeerService {
  rpc bidirectionalStreaming(stream InputStreamRequest) returns (stream OutputStreamRequest) {}
  rpc ready(google.protobuf.Empty) returns (ReadyResult);
  rpc live(stream google.protobuf.Empty) returns (stream LiveResult);
}
```
The constraints this captures:
- Safe transmission and ordering for downstream TCP clients — that’s the existing TCP connection
- The internal protocol inside the pipe is unchanged and opaque — `bytes payload`
- Upstream workers get full context: which client sent this (`client_uuid`) and what they sent (`payload`)
- Responses can target a specific client or broadcast to groups — one worker’s response can go to every connected session
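To make the `oneof target` semantics concrete, here is a small Python sketch — plain dictionaries standing in for the generated Protobuf messages, with all function names my own — showing how an upstream worker might shape a targeted response versus a broadcast:

```python
from datetime import datetime, timezone

# Mirror the Broadcast enum values from the contract
BROADCAST_ALL = 0
BROADCAST_ACTIVE = 1
BROADCAST_NOT_ACTIVE = 2

def targeted_response(client_uuid: str, payload: bytes) -> dict:
    # Models OutputStreamRequest with the `client_uuid` branch of `oneof target`
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "client_uuid": client_uuid,
        "payload": payload,
    }

def broadcast_response(scope: int, payload: bytes) -> dict:
    # Models OutputStreamRequest with the `broadcast` branch of `oneof target`
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        "broadcast": scope,
        "payload": payload,
    }

# A oneof means exactly one of the two target fields is set, never both
r1 = targeted_response("3fa85f64-0000-4000-8000-000000000000", b"\x01\x02")
r2 = broadcast_response(BROADCAST_ALL, b"server restarting soon")
assert "broadcast" not in r1 and "client_uuid" not in r2
```

In the real generated code, setting one branch of the oneof clears the other; the assertions above only model that exclusivity.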
Architecture overview
PPP manages client identity through UUIDv4 — one per accepted TCP connection. Once a downstream socket is identified, PPP’s job is to forward the context to upstream workers via gRPC bidirectional streaming. Multiple upstream workers can be registered. PPP distributes incoming traffic across them via round-robin.
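The two mechanisms — one UUIDv4 per accepted connection, round-robin over registered upstreams — can be sketched in a few lines of Python. This is an illustration of the idea, not PPP’s actual implementation:

```python
import uuid

class Proxy:
    def __init__(self):
        self.sessions = {}   # client_uuid -> downstream connection handle
        self.upstreams = []  # registered upstream workers
        self._next = 0

    def accept(self, conn) -> str:
        # One UUIDv4 per accepted TCP connection identifies the session
        client_uuid = str(uuid.uuid4())
        self.sessions[client_uuid] = conn
        return client_uuid

    def pick_upstream(self):
        # Plain round-robin over whatever workers are currently registered
        upstream = self.upstreams[self._next % len(self.upstreams)]
        self._next += 1
        return upstream

proxy = Proxy()
proxy.upstreams = ["n1-a", "n1-b", "n1-c"]
uid = proxy.accept(conn="fake-socket")
print([proxy.pick_upstream() for _ in range(4)])  # ['n1-a', 'n1-b', 'n1-c', 'n1-a']
```

Because the modulo is taken against the current pool size, workers added or removed at runtime are picked up on the next rotation — which is what makes dynamic registration cheap.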
The “glue” between worker instances for shared state is typically a database — something with high concurrency capacity. PPP itself stays stateless about application state; it only tracks connection lifecycle.
Real-world example
Say the internal team has split their monolithic server into three service types: N1, N2, N3 — each handling a different area of business logic. They want to manage 5000 concurrent users, spread across 50 instances of service type N1.
With PPP in front:
- Implement the Protobuf contract in N1
- Open a gRPC port on each N1 instance
- Focus entirely on business logic — no socket management
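With socket management gone, an N1 handler reduces to a pure function over (client identity, payload bytes). A hedged Python sketch — the function name is mine, and the real service would sit behind the gRPC stream generated from the contract:

```python
def handle(client_uuid: str, payload: bytes) -> bytes:
    # All the worker sees is who sent what: no sockets, no timeouts,
    # no connection lifecycle. Here, a trivial echo tagged with the sender.
    return client_uuid.encode() + b":" + payload

assert handle("u-42", b"ping") == b"u-42:ping"
```

Everything connection-related — accepting, identifying, and keeping the TCP session alive — stays on PPP’s side of the gRPC stream.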
When instances scale from 50 to 150, they register with PPP via the management API:
```shell
curl -X POST http://localhost:8000/upstream/add \
  -H 'Content-Type: application/json' \
  -d '{"host":"10.0.1.5","port":45888,"alive_timeout":30,"ready_timeout":5}'
```
PPP runs the readiness probe, and if the upstream passes, starts routing new sessions to it immediately. No restart. Kubernetes pods spin up, register, and start receiving load. The internal team doesn’t handle any of the networking — PPP manages connection lifecycle, client identification, and traffic distribution.
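The register-then-probe flow is simple enough to model. A hypothetical Python sketch — the probe callback and field names are assumptions for illustration, not taken from PPP’s code:

```python
def register_upstream(pool: list, upstream: dict, ready_probe) -> bool:
    # Add the upstream to the routing pool only if its readiness
    # probe answers within the configured timeout
    if ready_probe(upstream["host"], upstream["port"],
                   timeout=upstream["ready_timeout"]):
        pool.append(upstream)
        return True
    return False

pool = []
new = {"host": "10.0.1.5", "port": 45888, "alive_timeout": 30, "ready_timeout": 5}
ok = register_upstream(pool, new, ready_probe=lambda h, p, timeout: True)
assert ok and pool[0]["host"] == "10.0.1.5"
```

A failed probe leaves the pool untouched, so a misconfigured pod never receives a session.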
What PPP provides
- Load balancing (Round Robin) across all registered upstream workers
- Downstream TCP socket lifecycle — sticky, long-lived connections managed entirely by PPP
- Client identification and context switching via UUIDv4 — upstream workers always know which client they’re serving
- Dynamic upstream registration — workers register and deregister at runtime with no restart
- Health probing — readiness and liveness checks before and during routing
- Broadcast routing — fan out a single response to all connected clients or a subset
PPP is not a magic migration tool. It is the networking layer extracted from the monolith — one specific job, done reliably. It removes the biggest obstacle to horizontal scaling without requiring a rewrite of the business logic.
The project: github.com/tanguc/PersistentParadoxPlex
For a focused technical reference on the gRPC contract and session routing internals, see the architecture deep dive. The project page has the configuration reference.