Hardening a cloud server

Spin up a fresh VPS, wait an hour, and the auth log already has thousands of brute-force attempts from across the internet. Every server-hardening guide says roughly the same things — here's what each one actually stops, and where the rules are theater.

Security intro May 25, 2026

Why it exists

Spin up a fresh VPS on any cloud, give it a public IP, leave the default SSH port open, walk away for an hour, then come back and read /var/log/auth.log. You’ll find thousands of failed login attempts — root, admin, ubuntu, oracle, pi, git — from IP addresses all over the world. Nobody knew your server existed an hour ago. Nobody is targeting you. The entire public IPv4 space is being continuously port-scanned by botnets, and the moment your address answered on port 22, it joined a work queue.

That’s the world a hardening guide is written for. The advice every guide repeats — keys-only SSH, no root login, close ports you don’t need, patch automatically, ship logs off-box — isn’t ritual; each line is aimed at a specific failure mode that has happened often enough to be the boring default attack. The point of this post is to walk through those rules and, for each one, say what it actually stops so you can tell the useful ones from the cargo-culted ones.

Why it matters now

Cloud servers are a commodity. A $5/month droplet runs side-project APIs, small-business sites, Mastodon instances, home-lab tunnels, and increasingly self-hosted LLM frontends. The default cloud image is built to boot, not to be exposed to the open internet. Defaults vary by distro and provider (I won’t claim a specific Ubuntu minimal image’s full security posture without re-checking it), but the general pattern is consistent: no host firewall configured, SSH open to the world with only the provider-injected key in the way, every application you install left for you to keep patched, and a public IP that will be probed within minutes. The provider hands you a host; the rest is your problem.

Meanwhile the attacker side has gotten cheaper, not more sophisticated. Most server compromises in the wild are still opportunistic — credential stuffing, exposed Redis/MongoDB/Elasticsearch instances, unpatched CVEs weeks or months old, leaked AWS keys committed to public GitHub repos. The 2017 “MongoDB ransomware” wave that hit tens of thousands of databases worked not because the attackers were clever but because so many instances were exposed to the public internet with access control disabled. (MongoDB’s packaged installs bind to localhost by default; the failure mode was operators who had explicitly bound them to a public interface and skipped auth, often on the assumption that the network around them was trusted.) The same shape of mistake still ships today, just on different services.

The short answer

server hardening = shrink the attack surface + remove the easy wins + make compromise loud

Three goals, in that order. Shrink the attack surface means fewer ports, fewer services, fewer accounts that could be attacked at all. Remove the easy wins means the things that are exposed shouldn’t fall to a guessed password or a known CVE. Make compromise loud means when one of the first two fails — and eventually one will — you find out within minutes instead of finding out from your hosting bill.

Every concrete rule below slots into one of those three buckets.

How it works

Before you SSH in: the cloud account is the real perimeter

The first hardening step is one most server-focused guides skip, because they assume you’re already inside the VM.

Turn on MFA for the cloud console, and don’t keep long-lived root API keys on your laptop. Why: if someone phishes your AWS/GCP/Hetzner login, none of the in-VM hardening matters. They can snapshot your disk, spin up a new VM with your snapshot mounted, and read whatever they want, or just rotate the SSH key and walk in the front door. The host operating system is downstream of the cloud account; protect the upstream first. MFA with a hardware key beats TOTP beats SMS, in that order. A TOTP code is also “something you have,” but a phishing page can relay it in real time; see passkeys for why a hardware key, which is bound to the real site’s origin, is the rung that actually stops credential phishing.

Use the cloud firewall (security group / network ACL), not just in-VM ufw/iptables. Why: a cloud firewall filters packets before they ever reach the VM’s kernel. If an attacker gets root inside your VM and disables ufw, the cloud rules still apply. And if you misconfigure ufw itself (forgetting ufw enable, opening a port you only meant to allow from your office IP, switching the forward policy to ACCEPT for a VPN or container setup), the cloud firewall is a second fence. The in-VM firewall is fine to keep, but as defense in depth, not as the only layer.

Have a backup that isn’t writable from the server. Why: the hardening rules below are all about preventing compromise; backups are the only thing that helps after compromise. The constraint is “not writable from the server”: a backup that the compromised root user can rm -rf is no backup. Provider-side snapshots with a separate account, or restic/borg to an append-only / immutable bucket, are the usual answers.

Lock the front door: SSH

The single port every hardening guide spends the most ink on, because it’s also the one every botnet spends the most ink on.

Disable password authentication; allow public-key only. Set PasswordAuthentication no and KbdInteractiveAuthentication no in /etc/ssh/sshd_config (older configs still set the deprecated alias ChallengeResponseAuthentication no; same effect, but current OpenSSH prefers the new name), then reload the SSH service. Why: the entire botnet-scanning industry is built on guessing passwords. A 2048-bit RSA or Ed25519 key is, for practical purposes, unguessable: the search space dwarfs anything a brute-forcer can throw at it. The instant you turn off passwords, the thousands of Failed password for root lines in auth.log drop to zero, because the server stops even offering that authentication method. The bots keep connecting, but each attempt now ends before any password is tested.

Disable root login over SSH. Set PermitRootLogin no. Why: if root login is allowed, the attacker knows the username for free — root exists on every Linux box. They only need to guess one thing (the credential), not two (the username and the credential). Forcing a named user plus sudo also gives you per-person attribution: sudo logs each elevated command to auth.log with the original user attached, so a post-incident review can tell which admin ran what. SSH still logs direct root logins too, but if multiple people share that root credential the trail stops at “root did it.”
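Editing sshd_config is only half the job; what matters is the configuration sshd actually ends up with. `sshd -T` (run as root) prints the effective value of every keyword, in lowercase, after all Include files are resolved. That matters because on recent Ubuntu images sshd_config begins with an Include of /etc/ssh/sshd_config.d/*.conf, and for most keywords sshd keeps the first value it reads, so a drop-in (such as one written by cloud-init) can quietly override what you set further down. The sketch below assumes that `sshd -T` output format; the function names and the expected-values table are illustrative, not part of OpenSSH.

```python
import subprocess

# Values this post recommends. Keys are lowercase because `sshd -T` prints them that way.
EXPECTED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
    "permitrootlogin": "no",
}


def read_effective_config() -> str:
    """Run `sshd -T` (needs root); raises CalledProcessError if the config doesn't parse."""
    return subprocess.run(["sshd", "-T"], capture_output=True, text=True, check=True).stdout


def parse_sshd_t(output: str) -> dict[str, str]:
    """Turn lines of the form 'keyword value...' into {keyword: value}."""
    effective: dict[str, str] = {}
    for line in output.splitlines():
        keyword, _, value = line.strip().partition(" ")
        if keyword:
            # Keep the first occurrence; multi-valued keywords aren't audited here.
            effective.setdefault(keyword.lower(), value.strip())
    return effective


def audit(effective: dict[str, str]) -> list[str]:
    """Return one message per directive whose effective value differs from EXPECTED."""
    problems = []
    for keyword, want in EXPECTED.items():
        got = effective.get(keyword)
        if got is None:
            problems.append(f"{keyword}: not reported by this sshd version; check by hand")
        elif got != want:
            problems.append(f"{keyword} is {got!r}, expected {want!r}")
    return problems
```

On the server, `audit(parse_sshd_t(read_effective_config()))` returns an empty list when all three settings are in effect. Running it after every config change (and after cloud-init has run) catches the drop-in override case.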

Don’t bother changing the SSH port for “security.” Why: moving SSH from 22 to 2222 doesn’t stop a determined attacker — port scans take seconds. It does dramatically cut the volume of script-kiddie noise in your logs, which makes real anomalies easier to see. So move it if you want a quieter auth.log, not because it’s a security control. Security through obscurity isn’t worthless, but don’t count it as a layer.

Be skeptical of fail2ban as a security control on keys-only SSH. fail2ban watches log files for failed logins and temporarily blocks the source IP. Why the skepticism: with password auth disabled, those failed attempts can’t succeed anyway, so banning the IP doesn’t change the security posture much. It does cut log volume and CPU spent on doomed scans, which is a real operational win — just don’t count it as a defensive layer it isn’t. Where fail2ban earns its keep as security is in front of services that still authenticate with passwords (a database you exposed by mistake, a web admin panel, an SMTP relay).
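To make the mechanism concrete, here is a simplified sketch of what an sshd jail does conceptually: count failed logins per source IP inside a sliding time window and ban any IP that reaches a threshold. It is not fail2ban's code. The regex assumes OpenSSH's `Failed password for ... from <ip> port ...` log wording, timestamps arrive as plain seconds, and `maxretry` / `findtime` merely borrow fail2ban's terminology.

```python
import re
from collections import defaultdict, deque

# Matches e.g. "Failed password for root from 203.0.113.7 port 51234 ssh2"
# and "Failed password for invalid user admin from 203.0.113.7 port 51234 ssh2".
FAILED = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def ips_to_ban(events, maxretry=5, findtime=600):
    """events: iterable of (unix_seconds, log_line), in time order per IP.

    Returns the set of source IPs that reached `maxretry` failures within
    some `findtime`-second window.
    """
    recent = defaultdict(deque)  # ip -> timestamps of failures still inside the window
    banned = set()
    for ts, line in events:
        match = FAILED.search(line)
        if not match:
            continue
        window = recent[match.group(1)]
        window.append(ts)
        # Forget failures that have aged out of the window.
        while window and ts - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            banned.add(match.group(1))
    return banned
```

On a keys-only server the Failed password lines this counts stop appearing, which is exactly why the ban adds little security there.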

Shrink the attack surface: close, don’t just guard

Default-deny inbound on the cloud firewall; explicitly open only what you need. Why: a service the internet can’t reach can’t be exploited from the internet, whatever CVEs it carries. Every port you open is a service that has to stay patched for as long as the server exists. The cheapest security work is the work you don’t have to do because the attack never reaches a listener. Start from “nothing open except SSH from my IPs,” then open exactly what the app needs.
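The default-deny model is small enough to sketch with Python's standard `ipaddress` module: a connection is allowed only if some explicit rule matches its destination port and source address, and anything that matches nothing falls through to deny. The rule list, the office range (a documentation CIDR), and the `allowed` function are illustrative; real cloud firewalls also match on protocol and direction.

```python
import ipaddress
from typing import NamedTuple


class Rule(NamedTuple):
    port: int
    source: str  # CIDR allowed to connect, e.g. "198.51.100.0/24"


# Hypothetical starting policy: SSH from the office range only, HTTPS from anywhere.
RULES = [
    Rule(22, "198.51.100.0/24"),
    Rule(443, "0.0.0.0/0"),
]


def allowed(port: int, src_ip: str, rules=RULES) -> bool:
    """Default deny: True only when an explicit rule matches port and source."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        # `in` is simply False when the address and network are different IP versions.
        if rule.port == port and addr in ipaddress.ip_network(rule.source):
            return True
    return False  # nothing matched, so the packet is dropped
```

Note what is absent: there is no rule that says "block Redis." Port 6379 is closed because nobody opened it, which is the property you want.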

Don’t bind databases or caches to a public interface. Bind Postgres, Redis, MongoDB, Elasticsearch, Memcached, etc. to 127.0.0.1 or to a private network, not to 0.0.0.0. Why: this is the failure mode behind almost every “tens of thousands of MongoDB/Elasticsearch/Redis instances ransomed” headline of the last decade. Defaults vary by product — some bind to localhost out of the box, some require auth, some don’t — but the common mistake is the same shape: an operator binds the service to a public interface because they need remote access, and either skips auth or sets a weak one because “it’s only used internally.” The robust fix isn’t “set a strong password,” it’s “don’t let the internet talk to it at all” — SSH-tunnel, VPN, or private-network access only.
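One way to catch this mistake on a running host is to list listening TCP sockets with `ss -tlnH` (from iproute2; `-H` drops the header line) and flag well-known database ports bound to a wildcard or a non-private address. The parser below assumes ss's default column layout, with the local address:port in the fourth column, and a hand-picked table of default ports; treat it as a sketch rather than an inventory tool. Note that Python's `is_private` also treats the documentation ranges as private.

```python
import ipaddress
import subprocess

# Default ports for the services named above (plus MySQL); adjust for your stack.
DB_PORTS = {3306: "MySQL", 5432: "PostgreSQL", 6379: "Redis",
            9200: "Elasticsearch", 11211: "Memcached", 27017: "MongoDB"}


def current_listeners() -> str:
    """Return the text output of `ss -tlnH` (listening TCP sockets, numeric, no header)."""
    return subprocess.run(["ss", "-tlnH"], capture_output=True, text=True, check=True).stdout


def risky_listeners(ss_output: str) -> list[str]:
    """Flag database ports bound to all interfaces or to a public address."""
    findings = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        host, _, port = fields[3].rpartition(":")
        # "127.0.0.53%lo" -> "127.0.0.53", "[::]" -> "::"
        host = host.split("%")[0].strip("[]")
        if not port.isdigit() or int(port) not in DB_PORTS:
            continue
        service = DB_PORTS[int(port)]
        if host in ("*", "0.0.0.0", "::"):
            findings.append(f"{service} on port {port} listens on all interfaces")
            continue
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            findings.append(f"{service} on port {port}: unrecognised address {host!r}")
            continue
        if not (addr.is_loopback or addr.is_private):
            findings.append(f"{service} on port {port} listens on public address {host}")
    return findings
```

An empty result from `risky_listeners(current_listeners())` doesn't prove safety (a service on a non-default port slips through), but a non-empty one is almost always worth fixing immediately.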

Remove packages you aren’t using. A default Ubuntu install includes things you almost certainly don’t need on a single-purpose server — snapd, cloud-init modules, sometimes an MTA. Why: every installed package is in your patch surface even if it isn’t running. A CVE in a binary you never invoke can still be a privilege-escalation primitive for an attacker who got in through a different door. “Smaller image” isn’t aesthetic; it’s fewer things to track.

Run application services as non-root, with systemd sandboxing. Set User=, Group=, and the Protect* / Private* directives in the unit file (ProtectSystem=strict, ProtectHome=true, PrivateTmp=true, NoNewPrivileges=true, etc.). Why: if your web app gets popped via SQL injection or a deserialization bug, the blast radius is whatever that process can touch. Least privilege turns “RCE in my app” from “root on the host” into “a sandboxed process that can write to its own directories and little else.”
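Here is a sketch of such a unit, rendered by a small Python helper so the same hardening block can be stamped onto every service. The app name `myapp`, its `ExecStart` path, and the assumption that a `myapp` system user already exists are all hypothetical. `StateDirectory=` is an addition beyond the directives listed above: `ProtectSystem=strict` makes the filesystem read-only for the service, so it needs an explicitly writable place (here /var/lib/myapp) to keep state.

```python
# The sandboxing directives named in the post.
SANDBOX = {
    "NoNewPrivileges": "true",  # setuid binaries can't grant extra privileges
    "ProtectSystem": "strict",  # filesystem mounted read-only for this service
    "ProtectHome": "true",      # /home, /root and /run/user made inaccessible
    "PrivateTmp": "true",       # service gets its own private /tmp and /var/tmp
}


def render_unit(name: str, exec_start: str, user: str) -> str:
    """Return the text of a hardened systemd service unit for a hypothetical app."""
    service = {
        "User": user,
        "Group": user,
        "ExecStart": exec_start,
        "StateDirectory": name,  # writable /var/lib/<name> despite ProtectSystem=strict
        "Restart": "on-failure",
        **SANDBOX,
    }
    lines = ["[Unit]", f"Description={name} (sandboxed)", "", "[Service]"]
    lines += [f"{key}={value}" for key, value in service.items()]
    lines += ["", "[Install]", "WantedBy=multi-user.target", ""]
    return "\n".join(lines)
```

After writing `render_unit("myapp", "/opt/myapp/bin/server", "myapp")` to /etc/systemd/system/myapp.service and running `systemctl daemon-reload`, `systemd-analyze security myapp.service` scores how exposed the unit still is, which is a useful way to decide which further directives are worth adding.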

Make patching automatic

Enable unattended security upgrades. On Debian/Ubuntu, install and configure unattended-upgrades to apply the ${distro_id}:${distro_codename}-security channel automatically. Why: a large share of the CVEs that actually get exploited in the wild — see CISA’s “Known Exploited Vulnerabilities” catalog for the running list — have had public patches for weeks, months, or years. The bottleneck usually isn’t the patch; it’s the human who has to remember to apply it. Automating security backports (specifically security, not arbitrary upgrades) is one of the highest return-on-effort hardening steps available, because distro maintainers already do the careful work of backporting fixes without changing behaviour.
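On Debian and Ubuntu the on/off switch usually lives in /etc/apt/apt.conf.d/20auto-upgrades as two APT::Periodic settings, whose values are intervals in days ("1" means daily, "0" disables). A check like the sketch below can run from config management to confirm both are on. It assumes the simple one-setting-per-line form of that file and looks at that one file only; full apt.conf syntax (nested braces, overrides in other files) is not parsed.

```python
import re

# Matches uncommented lines like:  APT::Periodic::Unattended-Upgrade "1";
SETTING = re.compile(r'^\s*(APT::Periodic::[A-Za-z-]+)\s+"([^"]*)"\s*;')

REQUIRED = ("APT::Periodic::Update-Package-Lists", "APT::Periodic::Unattended-Upgrade")


def auto_upgrades_enabled(conf_text: str) -> dict[str, bool]:
    """For each required setting, report whether it is present with a non-zero interval."""
    values = {}
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match:
            values[match.group(1)] = match.group(2)  # later lines override earlier ones
    return {key: values.get(key, "0") not in ("0", "") for key in REQUIRED}
```

Commented-out lines (starting with //) don't match the pattern, so a setting that was disabled by commenting it out correctly reports as off.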

Reboot when the kernel updates. Set Unattended-Upgrade::Automatic-Reboot "true" together with an Unattended-Upgrade::Automatic-Reboot-Time in a sensible window, and use needrestart to restart services still running old libraries and to flag an outdated running kernel. Why: a new kernel image on disk that the system never boots into isn’t actually running. Kernel CVEs (privilege escalations, especially) are common, and for most fixes the way you start running the patched kernel is a reboot. Live-patching services (Canonical Livepatch, kpatch, Ksplice) can cover a subset of fixes without one, but they are usually tied to a vendor subscription and don’t cover everything, so assume reboots unless you’ve explicitly bought your way out of them.
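A rough way to see whether a host is still running an older kernel than the newest one installed is to compare `platform.release()` with the directory names under /lib/modules, where each installed kernel keeps its module tree. The version comparison below just compares the integers embedded in the release string, which works for Ubuntu-style names such as 6.8.0-45-generic but is an assumption, not a general kernel-version parser. On Ubuntu the same signal typically also shows up as a /var/run/reboot-required file.

```python
import os
import platform
import re
from typing import Optional


def version_key(release: str) -> tuple[int, ...]:
    """Crude sort key: the integers in a release string (6.8.0-45-generic -> (6, 8, 0, 45))."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def pending_kernel(running: str, installed: list[str]) -> Optional[str]:
    """Return the newest installed kernel release if it isn't the running one, else None."""
    if not installed:
        return None
    newest = max(installed, key=version_key)
    return None if newest == running else newest


def check_this_host(modules_dir: str = "/lib/modules") -> Optional[str]:
    """Compare the running kernel with every kernel that has a module tree installed."""
    return pending_kernel(platform.release(), os.listdir(modules_dir))
```

Comparing integer tuples rather than strings is the point of `version_key`: as strings, "5.15.0-100" sorts before "5.15.0-99".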

Pin major versions; auto-apply only security backports. Why: distro security updates aim to fix the bug without changing behaviour — that’s the whole point of a backport — so they rarely cause regressions in practice. Automatic major version bumps are a different story and regularly do break things; that’s where “the server updated overnight and now my app won’t start” stories come from. Keep the two separate.

Make compromise loud

This is the bucket most guides under-cover, and it’s the one that matters most once the first two layers eventually fail.

Ship logs off the box. Forward journald/auth.log/syslog to a separate account or service (a managed log service, an S3 bucket the VM can write to but not read or delete from, another VM in another account). Why: a common move after getting root is to clobber the local logs: echo > /var/log/auth.log, journalctl --vacuum-time=1s, or just rm. Logs that live on the compromised box can be tampered with by whoever compromised it. Logs in a separate trust boundary are the artifact you’ll actually use in incident response. The cost is small; the value the day you need it is enormous.
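In production this is normally a job for rsyslog's remote forwarding, systemd-journal-upload, or a provider's logging agent rather than custom code. Purely to show the shape of it, here is a minimal forwarder built on the standard library's `logging.handlers.SysLogHandler`, which sends each line as a UDP syslog message tagged with the auth facility. The collector hostname is hypothetical, and plain UDP is neither reliable nor encrypted; a real setup would use TCP with TLS to a collector in a separate account.

```python
import logging
import logging.handlers


def make_forwarder(host: str, port: int = 514) -> logging.Logger:
    """Build a logger whose records go to a remote syslog collector over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_AUTH,  # tag lines as auth messages
    )
    logger = logging.getLogger(f"forward.{host}.{port}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also echo to the local root logger
    logger.addHandler(handler)
    return logger


def forward_lines(lines, logger: logging.Logger) -> int:
    """Send each non-empty log line off-box; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            sent += 1
    return sent
```

For example, `forward_lines(open("/var/log/auth.log"), make_forwarder("logs.example.internal"))` ships a one-off snapshot; a real forwarder would follow the file (or the journal) continuously, which is exactly what the dedicated tools already do.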

Watch for new binaries in the usual drop spots. World-writable directories like /tmp, /var/tmp, and /dev/shm, plus /usr/local/bin, are the obvious places for a payload that arrived via a web-app exploit to land; there’s no reason your own apps should be writing executables there. A systemd timer that periodically diffs a manifest (or a tool such as AIDE, osquery, or Wazuh) catches the boring case. Why: the median “my server got hacked” story isn’t nation-state malware; it’s an XMRig miner running off a binary the attacker dropped after exploiting a public web app. The signature is “a process you didn’t install showed up.” That’s easy to detect if you ever look, and almost no one does until the CPU bill arrives.
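A minimal version of that manifest diff, using only the Python standard library: record the SHA-256 of every executable regular file under the watched directories, then report anything new or changed on the next run. The directory list mirrors the paragraph above; where the baseline is stored (ideally somewhere the attacker can't rewrite, such as off-box) is left out of the sketch.

```python
import hashlib
import os
import stat

WATCHED = ["/tmp", "/var/tmp", "/dev/shm", "/usr/local/bin"]


def snapshot(dirs=WATCHED) -> dict[str, str]:
    """Map path -> sha256 for every executable regular file under `dirs`."""
    manifest = {}
    for top in dirs:
        for root, _subdirs, files in os.walk(top):  # missing dirs simply yield nothing
            for name in files:
                path = os.path.join(root, name)
                try:
                    st = os.lstat(path)
                    if not stat.S_ISREG(st.st_mode) or not st.st_mode & 0o111:
                        continue  # skip symlinks, sockets, and files with no exec bit
                    with open(path, "rb") as fh:
                        manifest[path] = hashlib.sha256(fh.read()).hexdigest()
                except OSError:
                    continue  # vanished or unreadable; acceptable for a sketch
    return manifest


def changes(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """List executables that are new, or whose contents changed, since the baseline."""
    return sorted(path for path, digest in current.items() if baseline.get(path) != digest)
```

Hashing rather than just listing names means a legitimate binary that gets silently replaced also shows up in `changes`, not only brand-new files.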

Alert on outbound traffic to unusual destinations. Why: most compromises become visible at the egress, not the ingress. The attacker’s payload phones home to a command-and-control server, or the miner connects to a mining pool, or your AWS keys start being used from an IP you’ve never seen. A baseline of “this server talks to these three IPs” plus an alert when it doesn’t is a surprisingly effective last line of detection. Cloud-native flow logs (VPC Flow Logs on AWS, or the equivalent on other providers) are the cheap way to get it.
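Here is a sketch of that baseline check over AWS VPC Flow Logs in the default (version 2) record format, whose space-separated fields are `version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status`. The instance address and the allowlisted networks are hypothetical; a real deployment would run over records delivered to CloudWatch Logs or S3 and raise an alert rather than return a set.

```python
import ipaddress

# Hypothetical baseline: the only destinations this server is expected to reach.
BASELINE = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/16",      # its own VPC (database, internal services)
    "203.0.113.10/32",  # e.g. one upstream API
)]


def unexpected_egress(records, instance_ip: str) -> set[tuple[str, int]]:
    """Return (dstaddr, dstport) pairs for accepted outbound flows outside BASELINE."""
    me = ipaddress.ip_address(instance_ip)
    hits = set()
    for record in records:
        f = record.split()
        # Version 2 layout: f[3]=srcaddr f[4]=dstaddr f[6]=dstport f[12]=action
        if len(f) < 14 or f[0] != "2" or f[3] == "-":
            continue  # header line, another format, or a NODATA/SKIPDATA record
        if ipaddress.ip_address(f[3]) != me or f[12] != "ACCEPT":
            continue  # not outbound from this instance, or already rejected
        dst = ipaddress.ip_address(f[4])
        if not any(dst in net for net in BASELINE):
            hits.add((f[4], int(f[6])))
    return hits
```

The design choice is to alert on anything outside a short allowlist rather than match known-bad IPs: the miner's pool or the C2 server is, almost by definition, not on any list you have yet.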

Show the seams

A hardening guide that doesn’t tell you where it lies isn’t being honest. The honest version:

OS hardening protects you against opportunistic attacks, not targeted ones. Most servers that get compromised get compromised through an application bug (SQL injection, RCE in a dependency, leaked secrets in a public repo, an unpatched CMS) — not through the OS. The rules above shrink the easy attack surface; they do little against an attacker who already has your AWS keys or a 0-day in your Rails app.
“Defense in depth” can become “checkbox theater” fast. Layering a WAF, an IDS, a SIEM, a CASB, and a bastion host on top of a poorly patched application is more reassuring than it is protective. Spend on the layers that actually fire — keys, firewall, patches, logs — before you spend on the ones that look good in an audit document.
A bastion host only helps if you also maintain the bastion. Routing all SSH through a single jump host concentrates auditability and is worthwhile, but the bastion is now your single point of failure; if it isn’t patched and key-rotated, you’ve turned a flat attack surface into one with a clearly labelled front door.
Container hardening is a different post. Most of the rules above assume a long-lived VM. Cattle-not-pets workloads (ephemeral containers, immutable images, orchestrated rollouts) shift the problem — patching is “redeploy with a newer base image,” and SSH into a container is itself a smell. The threat model rhymes; the controls don’t always map one-to-one.
I don’t have a confident, current figure for what fraction of cloud compromises trace to leaked credentials vs. unpatched CVEs vs. app bugs. The general industry reports (Verizon DBIR, Mandiant M-Trends) point at credentials and exposed services as the leading causes year after year, but the exact mix shifts and I’d rather name the gap than quote a number I’d have to re-verify.

Defense in depth — defense in depth = assume each layer will fail; stack independent layers so the next one catches it — the meta-principle behind all of the above; the trap is that “more layers” can become a substitute for “layers that work.”
Principle of least privilege — least privilege = every actor gets exactly the permissions it needs, no more — sudo over root, non-root services, scoped IAM roles instead of root API keys.
Zero trust — zero trust ≈ "the network isn't a security boundary; authenticate every request" — the modern reframing of “don’t trust the LAN,” responsible for the shift from VPN-and-flat-network to per-service auth.
Bastion host — bastion = single auditable jump server that all SSH goes through — concentrates the SSH attack surface so you have one thing to patch and watch instead of dozens.
CVE — CVE = a public identifier for a specific known vulnerability — the unit of “what you’re patching against” and what unattended-upgrades is closing.
Security through obscurity — STO ≈ relying on attackers not knowing your setup — not a layer of defense; sometimes a useful signal-to-noise reducer (e.g. moving SSH off port 22 to quiet the logs).

Going deeper

The OpenSSH sshd_config(5) man page — the primary source for what every directive in this post actually does, straight from the OpenSSH project.
Mozilla’s OpenSSH guidelines — the best curated secondary, for “given all of those directives, here’s a sane modern profile and the reasoning behind each choice.”
The annual Verizon Data Breach Investigations Report — the antidote to building a hardening posture around the wrong threat model; if you want to know what actually compromises servers in a given year, this is where the numbers are.