Skip to main content

Command Palette

Search for a command to run...

Building a Production-Grade PostgreSQL HA Cluster on AWS EC2

Stop Relying on RDS — Here's How to Do It Right with Patroni, etcd, and HAProxy

Updated
11 min read
Building a Production-Grade PostgreSQL HA Cluster on AWS EC2
S
Hey, I am Sagar Thosar — and I have spent 13 years in the trenches of enterprise Oracle database administration so you do not have to figure it out alone. I started as an Assistant Oracle DBA at a training firm in Pune, deployed to client sites, managing databases I barely understood and learning everything the hard way — through production incidents, 2 AM calls, and the kind of pressure that either breaks you or makes you very, very good. Thirteen years later, I have managed 150+ database estates at companies like Equinix, BNY Mellon, and Oracle India. I have patched Exadata clusters, architected GoldenGate replication for zero-downtime migrations, designed RIRA frameworks that eliminated 300+ recurring incidents per month, and — most recently — built AWS Bedrock Agentic AI systems that autonomously handle database operations that used to wake me up at night. The industry changed. I changed with it. And I wish someone had been writing honestly about that journey while I was living it. That is why this blog exists. Here you will find: 🔧 Deep Oracle DBA content — Exadata internals, RAC architecture, GoldenGate replication patterns, Data Guard configurations, performance tuning war stories. Real scenarios from production environments, not sanitised textbook examples. ☁️ Cloud database engineering — Oracle to AWS migrations, self-service provisioning with Terraform and GitHub Actions, AWS RDS architecture decisions, and what the documentation never tells you about moving enterprise Oracle to the cloud. 🤖 Intelligent database automation — How to use AWS Bedrock, Python, and GitLab CI/CD to eliminate manual DBA work. The future of database operations is not more DBAs doing the same things faster. It is smarter systems doing routine things autonomously while DBAs focus on architecture. 📓 Notebook to blog — I have been accumulating handwritten notes across years of reading, production incidents, and experiments. Many of these blogs start as scribbled observations in notebooks. If a concept changed how I think about databases, it ends up here. I write for DBAs who are tired of firefighting and want to build systems that prevent fires. For cloud engineers who inherited Oracle environments and need to understand what they are dealing with. For architects who need to make sense of how traditional database operations translate into modern cloud infrastructure. I do not write to impress. I write to be useful. If that sounds like your kind of content — follow along. And if you have a question, a problem, or a topic you want me to cover — reach out. Thirteen years of Oracle DBA experience is most valuable when it is shared. Connect with me: 🔗 LinkedIn 💻 GitHub 📧 sagarthosar.careers@gmail.com

If you're a junior DBA, chances are you've heard this advice more than once: "Just use RDS Multi-AZ."

And don't get me wrong, RDS is a fantastic service. I use it, I recommend it, and for a lot of workloads it's absolutely the right choice.

But here's the thing.

There are plenty of real-world scenarios where self-managed PostgreSQL on EC2 makes a lot more sense. Oracle migration projects, custom PostgreSQL extensions, specific OS-level requirements, or simply situations where you need more flexibility and cost optimisation beyond the 64TB RDS limit.

What surprises me is that most tutorials stop at a two-node streaming replication setup and call it High Availability.

It isn't.

That's replication.

True High Availability means automatic failover, automatic node rejoin, and seamless connection routing without someone logging in to fix things manually.

In this article, I'll show you how we build a production-grade PostgreSQL HA cluster on AWS using three battle-tested open-source tools: Patroni, etcd, and HAProxy.


What We're Building

Four EC2 instances. Each with a specific job. Each with its own EBS volume properly sized for its workload. Let me walk you through every decision.


The Tool Stack — And Why Each Tool

Patroni — Your HA Brain

Patroni is a Python-based HA manager that runs as an agent on each PostgreSQL node. It does three things:

  1. Monitors local PostgreSQL health every few seconds

  2. Manages leader election via etcd

  3. Exposes a REST API so external tools know who is primary

The REST API is the clever part. When HAProxy wants to know if a node is primary, it doesn't connect to PostgreSQL and run SQL. It just calls GET /primary on port 8008. Patroni responds with HTTP 200 if primary, 503 if replica. Simple, fast, no database connection overhead.

etcd — The Single Source of Truth

etcd is a distributed key-value store built on the Raft consensus algorithm. You probably know it as the brain of Kubernetes. Here, we use it for one purpose: storing the leader key.

Patroni on pg-node1 → writes "I am leader" to etcd (every 10 seconds)
Patroni on pg-node2 → reads etcd → sees pg-node1 is leader → stays standby

pg-node1 crashes → stops updating leader key
etcd TTL expires after 30 seconds
Patroni on pg-node2 → no leader detected → runs election → wins
→ promotes itself to primary → writes "I am leader" to etcd

No split-brain. No ambiguity. One source of truth.

HAProxy — Your Smart Traffic Router

This is where junior DBAs often ask: "Why not just use a CNAME or Elastic IP pointing to the primary?"

Here's why that doesn't work:

  • After failover, the primary changes. CNAME/EIP still points to old node.

  • Someone has to manually remap it .

  • Elastic IP remapping costs money per operation.

  • You cannot split reads and writes with a simple IP.

HAProxy solves all of this. It polls Patroni's REST API every 2 seconds. The moment it detects the primary changed, it reroutes traffic automatically. Your application connection string never changes. And crucially — you get two ports:

  • Port 5000 → always routes to whoever is primary (writes)

  • Port 5001 → always routes to whoever is replica (reads)

This read/write split alone can reduce your primary's load by 40-60% on read-heavy workloads.


Infrastructure Design Decisions

Before we touch a single command, let me explain the storage decisions — because they matter.

EBS Volume Strategy

We use separate EBS volumes for database data, not the root volume. Here's why:

If PostgreSQL fills up the root volume, the OS crashes. If it fills a separate data volume, PostgreSQL stops but the OS survives — you can SSH in and fix it.

For PostgreSQL nodes, we use io2 (provisioned IOPS):

PostgreSQL data: 20GB io2 at 3000 IOPS

Why io2 and not gp3? PostgreSQL WAL writes are sequential and latency-sensitive. io2 guarantees your IOPS are always available — no bursting, no surprises during peak load. For a Telecom or financial workload where WAL streaming replication must stay in sync, this matters.

For etcd, gp3 is sufficient:

etcd data: 10GB gp3

etcd writes are small. gp3 baseline of 3000 IOPS is more than enough for consensus operations.

Security Group Design

All cluster-internal ports are restricted to the security group itself — not open to the internet:

Port 5432 → pg-ha-sg (self)  ← only cluster nodes can connect to PostgreSQL
Port 8008 → pg-ha-sg (self)  ← Patroni REST API, internal only
Port 2379 → pg-ha-sg (self)  ← etcd client, internal only
Port 2380 → pg-ha-sg (self)  ← etcd peer, internal only
Port 5000 → application SG   ← HAProxy RW, application tier only
Port 5001 → application SG   ← HAProxy RO, application tier only
Port 22   → bastion SG       ← SSH via bastion only (not direct)

In production, your database nodes should have no public IPs. They live in private subnets, accessible only via a bastion host or AWS Systems Manager Session Manager.


Building the Cluster — Step by Step

Step 1: etcd First, Always

etcd must be running before Patroni starts. This is not optional — Patroni needs somewhere to register itself on startup.

sudo apt install -y etcd-server etcd-client

sudo tee /etc/default/etcd > /dev/null << 'EOF'
ETCD_NAME="etcd-node"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://172.31.30.123:2379"
ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://172.31.30.123:2380"
ETCD_INITIAL_CLUSTER="etcd-node=http://172.31.30.123:2380"
ETCD_INITIAL_CLUSTER_TOKEN="pg-ha-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_ENABLE_V2="true"
EOF

sudo systemctl start etcd

Verify it's healthy:

etcdctl endpoint health
etcdctl put test-key "hello" && etcdctl get test-key

Ubuntu 24.04 gotcha: The package is etcd-server not etcd. Many tutorials use sudo apt install etcd — this will fail on Ubuntu 24.04.

Step 2: PostgreSQL + Patroni on Node 1

Install PostgreSQL but immediately stop and disable it. Patroni will control when and how PostgreSQL starts — you should never start it manually again.

sudo apt install -y postgresql-16
sudo systemctl stop postgresql
sudo systemctl disable postgresql

pip3 install patroni[etcd] psycopg2-binary --break-system-packages

The Patroni config is where most of the magic lives. Key sections to understand:

scope: pg-ha-cluster       # Cluster name — must match on all nodes
name: pg-node1             # This node's unique name

etcd:
  host: 172.31.30.123:2379  # etcd address

bootstrap:
  dcs:
    ttl: 30                # Leader key TTL in seconds
    loop_wait: 10          # How often Patroni checks health

The ttl: 30 and loop_wait: 10 relationship is important. The TTL must be longer than loop_wait — otherwise a healthy primary might lose its leader key between updates. 30 seconds TTL with 10 second loop means three missed updates before failover triggers. This gives you resilience against transient network blips.

  pg_hba:
    - host replication replicator 172.31.0.0/16 md5
    - host all all 0.0.0.0/0 md5

Patroni generates pg_hba.conf automatically. You never touch it manually. On failover, every node has the correct pg_hba.conf because Patroni manages it — not you.

Step 3: The Golden AMI Pattern

This is something production teams do that tutorial writers skip. After configuring pg-node1 fully:

AWS Console → EC2 → pg-node1 → Actions → Image and templates → Create image
Name: pg-ha-base-image
No reboot: ✅ (keeps pg-node1 running)

Now launch pg-node2 from this AMI. It comes up with PostgreSQL 16 and Patroni 4.1.3 already installed. You only need to change three lines in the Patroni config — the node name and IP addresses.

For a 10-node cluster, this saves hours of installation time. More importantly, every node is guaranteed to be identical — no "I forgot to install X on node 7" surprises.

Step 4: HAProxy Configuration

The config is straightforward once you understand what each line does:

listen pg_primary
    bind *:5000
    option httpchk GET /primary      ← call Patroni REST API
    http-check expect status 200     ← 200 = this node is primary
    default-server inter 2s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node1 172.31.20.153:5432 check port 8008
    server pg-node2 172.31.28.170:5432 check port 8008

inter 2s — check every 2 seconds. This is how quickly HAProxy detects a failover.

fall 3 — three failed checks before marking a server down. This prevents false failovers on transient network issues.

rise 2 — two successful checks before marking a server back up. Prevents flapping.

on-marked-down shutdown-sessions — immediately kill existing connections to a downed server. Don't let applications hang on dead connections.


The Proof — Failover in Action

Here's what actually happened when we tested this in our lab:

Before failover:

| Member   | Role    | State     | TL | Lag |
| pg-node1 | Replica | streaming |  1 |   0 |
| pg-node2 | Leader  | running   |  1 |     |

We stopped Patroni on pg-node2 (the primary).

~30 seconds later:

| Member   | Role   | State   | TL |
| pg-node1 | Leader | running |  2 |

pg-node1 automatically promoted. Timeline advanced from 1 to 2.

We restarted Patroni on pg-node2.

~30 seconds after restart:

| Member   | Role    | State     | TL | Lag |
| pg-node1 | Leader  | running   |  2 |     |
| pg-node2 | Replica | streaming |  2 |   0 |

pg-node2 automatically rejoined as replica. Patroni used pg_rewind to sync it with the new primary's timeline. Lag = 0.

Zero manual intervention. Zero application changes required.


Verifying the Health Check Mechanism

Don't trust what you can't verify. Run this from your HAProxy node:

# pg-node2 is primary — should return 200
curl -s http://172.31.28.170:8008/primary | python3 -m json.tool

# pg-node1 is replica — should return 503
curl -s http://172.31.20.153:8008/primary | python3 -m json.tool

The response body from Patroni tells you everything:

{
  "role": "primary",
  "state": "running",
  "replication": [{
    "application_name": "pg-node1",
    "state": "streaming",
    "sync_state": "async"
  }]
}

This is how HAProxy knows where to send traffic. No SQL queries, no database connections — just HTTP status codes. Elegant and fast.


Common Mistakes Junior DBAs Make

Mistake 1: Starting PostgreSQL manually instead of letting Patroni manage it

If you sudo systemctl start postgresql on a Patroni-managed node, you create a PostgreSQL instance outside Patroni's control. This causes split-brain. Always start Patroni, never PostgreSQL directly.

Mistake 2: Using the same node name in Patroni config

Every node must have a unique name in patroni.yml. If two nodes have the same name, they'll fight over the leader key.

Mistake 3: Connecting applications directly to node IPs

After failover, the primary IP changes. Applications pointing to 172.31.20.153:5432 will be pointing to a replica after failover. Always connect to HAProxy. Always.

Mistake 4: Using a single etcd node in production

We used one etcd node for simplicity in this lab. Production requires 3 or 5 etcd nodes for true consensus — a single etcd node is a single point of failure for your entire cluster's leader election.


Production Considerations

This lab uses public IPs for simplicity. In production:

  • Database nodes go in private subnets — no public IPs

  • Access via bastion host or AWS Systems Manager Session Manager

  • NAT Gateway for outbound internet access (package updates)

  • 3 etcd nodes for HA consensus (not 1)

  • io2 storage confirmed for WAL-heavy workloads

  • AWS CloudWatch monitoring on all nodes

  • Patroni watchdog configured to prevent split-brain on network partition


Cost Reality Check (Lab based)

For a production setup with this architecture:

Component Instance Cost/month
pg-node1 t3.medium ~$30
pg-node2 t3.medium ~$30
etcd-node t3.small ~$15
haproxy-node t3.small ~$15
EBS volumes ~100GB total ~$12
Total ~$102/month

Compare this to RDS Multi-AZ db.t3.medium ($120+/month for instance alone, plus storage, plus Multi-AZ surcharge). For the same cost you get more control, no storage limits, and the ability to install any PostgreSQL extension you need.


Final Thoughts

Building your own HA PostgreSQL cluster on EC2 is not "reinventing the wheel." It's making a deliberate architectural choice for workloads that need it. When you understand each component's role — etcd for consensus, Patroni for orchestration, HAProxy for routing — the whole system becomes transparent and debuggable.

More importantly, you can explain exactly what happens during a failover, because you built it. And in my experience, that depth of understanding is exactly what separates senior DBAs from everyone else.

Build it. Break it. Fix it. Then you'll own it.


Stack used: PostgreSQL 16.14, Patroni 4.1.3, etcd 3.4.30, HAProxy 2.8.16, Ubuntu 24.04 LTS, AWS ap-south-1