Hardening a cloud server
Spin up a fresh VPS, wait an hour, and the auth log already has thousands of brute-force attempts from across the internet. Every server-hardening guide says roughly the same things — here's what each one actually stops, and where the rules are theater.
Why it exists
Spin up a fresh VPS on any cloud, give it a public IP, leave the default SSH port open, walk away for an hour, then come back and read /var/log/auth.log. You’ll find thousands of failed login attempts (root, admin, ubuntu, oracle, pi, git) from IP addresses all over the world. Nobody knew your server existed an hour ago. Nobody is targeting you. The entire public IPv4 space is being continuously port-scanned by botnets, and the moment your address answered on port 22, it joined a work queue.
That’s the world a hardening guide is written for. The advice every guide repeats — keys-only SSH, no root login, close ports you don’t need, patch automatically, ship logs off-box — isn’t ritual; each line is aimed at a specific failure mode that has happened often enough to be the boring default attack. The point of this post is to walk through those rules and, for each one, say what it actually stops so you can tell the useful ones from the cargo-culted ones.
Why it matters now
Cloud servers are a commodity. A $5/month droplet runs side-project APIs, small-business sites, Mastodon instances, home-lab tunnels, and increasingly self-hosted LLM frontends. The default cloud image is built to boot, not to be exposed to the open internet. Defaults vary by distro and provider — I won’t claim a specific Ubuntu minimal image’s full security posture without re-checking it — but the general pattern is consistent: no host firewall configured, only the SSH key the provider injected, the application stack you install is your responsibility to keep patched, and the public IP you were given will be probed within minutes. The provider hands you a host; the rest is your problem.
Meanwhile the attacker side has gotten cheaper, not more sophisticated. Most server compromises in the wild are still opportunistic — credential stuffing, exposed Redis/MongoDB/Elasticsearch instances, unpatched CVEs weeks or months old, leaked AWS keys committed to public GitHub repos. The 2017 “MongoDB ransomware” wave that hit tens of thousands of databases worked not because the attackers were clever but because so many instances were exposed to the public internet with access control disabled. (MongoDB’s packaged installs bind to localhost by default; the failure mode was operators who had explicitly bound them to a public interface and skipped auth, often on the assumption that the network around them was trusted.) The same shape of mistake still ships today, just on different services.
The short answer
server hardening = shrink the attack surface + remove the easy wins + make compromise loud
Three goals, in that order. Shrink the attack surface means fewer ports, fewer services, fewer accounts that could be attacked at all. Remove the easy wins means the things that are exposed shouldn’t fall to a guessed password or a known CVE. Make compromise loud means when one of the first two fails — and eventually one will — you find out within minutes instead of finding out from your hosting bill.
Every concrete rule below slots into one of those three buckets.
How it works
Before you SSH in: the cloud account is the real perimeter
The first hardening step is one most server-focused guides skip, because they assume you’re already inside the VM.
Turn on MFA for the cloud console, and don’t keep long-lived root API keys on your laptop. Why: if someone phishes your AWS/GCP/Hetzner login, none of the in-VM hardening matters — they can snapshot your disk, spin up a new VM with your snapshot mounted, and read whatever they want, or just rotate the SSH key and walk in the front door. The host operating system is downstream of the cloud account; protect the upstream first. MFA with a hardware key beats TOTP beats SMS, in that order — see passkeys for why “something you have” is the rung that actually stops credential phishing.
Use the cloud firewall (security group / network ACL), not just
in-VM ufw/iptables. Why: a cloud firewall filters packets
before they ever reach the VM’s kernel. If a service inside your VM
gets compromised and disables ufw, the cloud rules still apply. And
if you misconfigure ufw itself — forgetting ufw enable, opening a
port you only meant to allow from your office IP, leaving the default
forward policy as ACCEPT — the cloud firewall is a second fence.
The in-VM firewall is fine to keep, but as defense in depth, not as
the only layer.
Have a backup that isn’t writeable from the server. Why: the
hardening rules below are all about preventing compromise; backups are
the only thing that helps after compromise. The constraint is “not
writeable from the server” — a backup that the compromised root user
can rm -rf is no backup. Provider-side snapshots with a separate
account, or restic/borg to an append-only / immutable bucket, are the
usual answers.
Lock the front door: SSH
The single port every hardening guide spends the most ink on, because it’s also the one every botnet spends the most effort on.
Disable password authentication; allow public-key only. Set
PasswordAuthentication no and KbdInteractiveAuthentication no in
/etc/ssh/sshd_config (older configs still set the deprecated alias
ChallengeResponseAuthentication no — same effect, current OpenSSH
prefers the new name). Why: the entire botnet-scanning industry is
built on guessing passwords. An Ed25519 key, or an RSA key of 2048 bits
or more, is for practical purposes unguessable: the search space dwarfs
anything a brute-forcer can throw at it. The instant you turn off
passwords, the thousands of Failed password for root lines in auth.log
stop, because the server no longer even offers that authentication
method. (Scanners will still connect; they just get dropped before
they can authenticate.)
Disable root login over SSH. Set PermitRootLogin no. Why: if root
login is allowed, the attacker knows the username for free — root
exists on every Linux box. They only need to guess one thing (the
credential), not two (the username and the credential). Forcing a
named user plus sudo also gives you per-person attribution: sudo logs
each elevated command
to auth.log with the original user attached, so a post-incident
review can tell which admin ran what. SSH still logs direct root
logins too, but if multiple people share that root credential the trail
stops at “root did it.”
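Here is a minimal sketch of how you might check a config file for the two rules above. It assumes the upstream OpenSSH defaults for absent directives (your distro may ship different ones), ignores Include files, and only looks at global settings before the first Match block. The names WANTED, effective_settings, and audit are mine, not OpenSSH's. On a real host, running sshd -T as root prints the effective configuration and is the more reliable check.

```python
# Values this post recommends for the audited directives.
WANTED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
    "permitrootlogin": "no",
}

# Assumption: upstream OpenSSH defaults, used when a directive is absent.
DEFAULTS = {
    "passwordauthentication": "yes",
    "kbdinteractiveauthentication": "yes",
    "permitrootlogin": "prohibit-password",
}

# Deprecated alias mentioned above, mapped to its current name.
ALIASES = {"challengeresponseauthentication": "kbdinteractiveauthentication"}


def effective_settings(config_text):
    """Return the global value of each audited directive.

    sshd uses the first value it sees for a keyword, so later
    duplicates are ignored here as well.
    """
    seen = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.replace("=", " ", 1).split(None, 1)
        if len(parts) != 2:
            continue
        key = parts[0].lower()
        value = parts[1].strip().strip('"').lower()
        if key == "match":
            break  # everything after this is a conditional override
        key = ALIASES.get(key, key)
        if key in WANTED and key not in seen:
            seen[key] = value
    return {key: seen.get(key, DEFAULTS[key]) for key in WANTED}


def audit(config_text):
    """List every audited directive whose effective value is not the wanted one."""
    return [
        f"{key} is {value!r}, want {WANTED[key]!r}"
        for key, value in effective_settings(config_text).items()
        if value != WANTED[key]
    ]


if __name__ == "__main__":
    sample = "PasswordAuthentication no\n#PermitRootLogin yes\n"
    for finding in audit(sample):
        print(finding)
```

An empty list from audit means the three directives are set the way this section recommends, at least in the one file you fed it.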
Don’t bother changing the SSH port for “security.” Why: moving
SSH from 22 to 2222 doesn’t stop a determined attacker — port scans
take seconds. It does dramatically cut the volume of script-kiddie
noise in your logs, which makes real anomalies easier to see. So move
it if you want a quieter auth.log, not because it’s a security
control.
Security through obscurity isn’t worthless, but don’t count it as a
layer.
Be skeptical of fail2ban as a security control on keys-only SSH.
fail2ban watches log files for failed logins and temporarily blocks
the source IP. Why the skepticism: with password auth disabled, those
failed attempts can’t succeed anyway, so banning the IP doesn’t change
the security posture much. It does cut log volume and CPU spent on
doomed scans, which is a real operational win — just don’t count it as
a defensive layer it isn’t. Where fail2ban earns its keep as
security is in front of services that still authenticate with
passwords (a database you exposed by mistake, a web admin panel, an
SMTP relay).
Shrink the attack surface: close, don’t just guard
Default-deny inbound on the cloud firewall; explicitly open only what you need. Why: a service the internet can’t reach can’t be exploited from the internet, regardless of whatever CVEs it carries. Every port you open is a service that has to stay patched for as long as the server exists. The cheapest security work is the work you don’t have to do because the attack never reaches a listener. Start from “nothing open except SSH from my IPs,” then open exactly what the app needs.
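To make "default-deny" concrete, here is a toy model of how an allow-list firewall decides. It is a sketch of the logic, not any provider's API: AllowRule, is_allowed, and the addresses are all made up for illustration (198.51.100.7 is a documentation-range placeholder for "my office IP").

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    port: int
    source: str  # CIDR block allowed to reach this port


def is_allowed(src_ip, dst_port, rules):
    """Default deny: a packet passes only if some rule explicitly allows it."""
    src = ipaddress.ip_address(src_ip)
    for rule in rules:
        net = ipaddress.ip_network(rule.source)
        if rule.port == dst_port and src.version == net.version and src in net:
            return True
    return False  # no matching rule means the packet is dropped


# Hypothetical rule set: SSH from one address, HTTPS from anywhere.
RULES = [
    AllowRule(port=22, source="198.51.100.7/32"),
    AllowRule(port=443, source="0.0.0.0/0"),
]

if __name__ == "__main__":
    for src, port in [("198.51.100.7", 22), ("8.8.8.8", 22), ("8.8.8.8", 6379)]:
        verdict = "allow" if is_allowed(src, port, RULES) else "deny"
        print(f"{src} -> port {port}: {verdict}")
```

The thing to notice is the last line of is_allowed: there is no rule for Redis on 6379, so it is denied without anyone having to remember to block it.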
Don’t bind databases or caches to a public interface. Bind Postgres,
Redis, MongoDB, Elasticsearch, Memcached, etc. to 127.0.0.1 or to a
private network, not to 0.0.0.0. Why: this is the failure mode
behind almost every “tens of thousands of MongoDB/Elasticsearch/Redis
instances ransomed” headline of the last decade. Defaults vary by
product — some bind to localhost out of the box, some require auth,
some don’t — but the common mistake is the same shape: an operator
binds the service to a public interface because they need remote access,
and either skips auth or sets a weak one because “it’s only used
internally.” The robust fix isn’t “set a strong password,” it’s
“don’t let the internet talk to it at all” — SSH-tunnel, VPN, or
private-network access only.
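A quick way to reason about a bind address is to classify it. The sketch below uses Python's standard ipaddress module; the function names and the sample listener list are hypothetical, and on a real host you would feed it the addresses reported by a tool such as ss. It only tells you where a service is listening, not whether a firewall in front of it saves you.

```python
import ipaddress


def classify_bind(address):
    """Classify a listen address by how far it is reachable."""
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:   # 0.0.0.0 or :: means every interface
        return "all-interfaces"
    if ip.is_loopback:      # 127.0.0.0/8 or ::1
        return "loopback"
    if ip.is_private:       # RFC 1918 and similar non-routable ranges
        return "private"
    return "public"


def risky_listeners(listeners):
    """Return the (name, address, port) entries reachable beyond a private network."""
    return [
        (name, addr, port)
        for name, addr, port in listeners
        if classify_bind(addr) in ("all-interfaces", "public")
    ]


if __name__ == "__main__":
    # Hypothetical listeners, as (service, bind address, port).
    sample = [
        ("redis", "0.0.0.0", 6379),
        ("postgres", "127.0.0.1", 5432),
        ("mongodb", "10.0.0.5", 27017),
    ]
    for entry in risky_listeners(sample):
        print("exposed:", entry)
```

The unspecified check comes first on purpose: Python's is_private also returns true for 0.0.0.0, which is exactly the address you want flagged.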
Remove packages you aren’t using. A default Ubuntu install includes
things you almost certainly don’t need on a single-purpose server —
snapd, cloud-init modules, sometimes an MTA. Why: every installed
package is in your patch surface even if it isn’t running. A CVE in a
binary you never invoke can still be a privilege-escalation primitive
for an attacker who got in through a different door. “Smaller image”
isn’t aesthetic; it’s fewer things to track.
Run application services as non-root, with systemd sandboxing. Set
User=, Group=, and the Protect* / Private* directives in the
unit file (ProtectSystem=strict, ProtectHome=true, PrivateTmp=true,
NoNewPrivileges=true, etc.). Why: if your web app gets popped via
SQL injection or a deserialization bug, the blast radius is whatever
that process can touch.
Least privilege turns “RCE in my app” from “root on the host” into “a
sandboxed process that can read its own working directory and not much
else.”
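If you want to lint your own unit files for the directives named above, something like this sketch would do. It is a deliberately naive parser (no drop-in files, no line continuations), and WANTED_DIRECTIVES, service_settings, and sandbox_gaps are names I made up. For a far more thorough report, systemd ships systemd-analyze security, which scores a unit's sandboxing.

```python
# Directives from the paragraph above and the values this post suggests.
WANTED_DIRECTIVES = {
    "ProtectSystem": "strict",
    "ProtectHome": "true",
    "PrivateTmp": "true",
    "NoNewPrivileges": "true",
}

# systemd accepts several spellings for a true boolean.
TRUE_SPELLINGS = {"yes", "true", "1", "on"}


def service_settings(unit_text):
    """Collect key=value pairs from the [Service] section (last value wins)."""
    settings, section = {}, None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def sandbox_gaps(unit_text):
    """Return a list of the hardening settings that are missing or weaker."""
    settings = service_settings(unit_text)
    gaps = []
    if settings.get("User", "root") in ("", "root", "0"):
        gaps.append("User= is unset or root")
    for key, want in WANTED_DIRECTIVES.items():
        have = settings.get(key, "").lower()
        if have in TRUE_SPELLINGS:
            have = "true"
        if have != want:
            gaps.append(f"{key}={want} is missing")
    return gaps


if __name__ == "__main__":
    # Hypothetical unit for an app called "myapp".
    unit = "[Service]\nUser=myapp\nExecStart=/usr/local/bin/myapp\nPrivateTmp=yes\n"
    for gap in sandbox_gaps(unit):
        print(gap)
```

Treat the output as a to-do list, not a verdict: ProtectSystem=strict in particular usually needs a matching ReadWritePaths= before the service will start cleanly.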
Make patching automatic
Enable unattended security upgrades. On Debian/Ubuntu, install and
configure unattended-upgrades to apply the ${distro_id}:${distro_codename}-security
channel automatically. Why: a large share of the CVEs that actually
get exploited in the wild — see CISA’s “Known Exploited Vulnerabilities”
catalog for the running list — have had public patches for weeks,
months, or years. The bottleneck usually isn’t the patch; it’s the
human who has to remember to apply it. Automating security backports
(specifically security, not arbitrary upgrades) is one of the highest
return-on-effort hardening steps available, because distro maintainers
already do the careful work of backporting fixes without changing
behaviour.
Reboot when the kernel updates. Either let needrestart flag the pending
reboot, or set Unattended-Upgrade::Automatic-Reboot "true" in
unattended-upgrades with a sensible time window. Why: a new kernel
image on disk that the system never boots into isn’t actually running.
Kernel CVEs (privilege escalations, especially) are common, and for
most fixes the way you start running the patched kernel is a reboot.
Live-patching services (Canonical Livepatch, kpatch, Ksplice) can cover
a subset of fixes without one, but they are usually tied to a vendor
subscription (some with a limited free tier) and don’t cover
everything. Assume reboots unless you’ve explicitly set up a way out of
them.
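If you would rather decide on reboots yourself than automate them, you at least want to know one is pending. On Debian and Ubuntu, package hooks create /var/run/reboot-required (and a .pkgs file listing what asked for it); the sketch below just reads those. The function name is mine, and other distro families signal this differently, so treat the path as a Debian/Ubuntu assumption.

```python
from pathlib import Path


def pending_reboot(flag=Path("/var/run/reboot-required")):
    """Return (needed, packages) based on the Debian/Ubuntu reboot flag file."""
    flag = Path(flag)
    if not flag.exists():
        return False, []
    # The sibling ".pkgs" file lists the packages that requested the reboot.
    pkgs_file = flag.with_name(flag.name + ".pkgs")
    packages = pkgs_file.read_text().split() if pkgs_file.exists() else []
    return True, sorted(set(packages))


if __name__ == "__main__":
    needed, packages = pending_reboot()
    if needed:
        print("reboot pending for:", ", ".join(packages) or "(unknown packages)")
    else:
        print("no reboot pending")
```

Wire the result into whatever already pages you; a pending reboot that sits for weeks is the same unpatched kernel with extra steps.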
Pin major versions; auto-apply only security backports. Why: distro security updates aim to fix the bug without changing behaviour — that’s the whole point of a backport — so they rarely cause regressions in practice. Automatic major version bumps are a different story and regularly do break things; that’s where “the server updated overnight and now my app won’t start” stories come from. Keep the two separate.
Make compromise loud
This is the bucket most guides under-cover, and it’s the one that matters most once the first two layers eventually fail.
Ship logs off the box. Forward journald/auth.log/syslog
to a separate account or service (a managed log service, an S3 bucket
the VM can write but not read, another VM in another account). Why:
a common move after getting root is to clobber the local logs —
echo > /var/log/auth.log, journalctl --vacuum-time=1s, or just
rm. Logs that live on the compromised box can be tampered with by
whoever compromised it. Logs in a separate trust boundary are the
artifact you’ll actually use in incident response. The cost is small;
the value the day you need it is enormous.
Watch for new binaries in the usual drop spots. World-writable
directories like /tmp, /var/tmp, and /dev/shm, plus /usr/local/bin,
are the obvious places for a payload that arrived via a web-app exploit
to land — there’s no reason your own apps should be writing executables
there. A systemd timer that periodically diffs a manifest
(or a small tool like osquery, aide, or wazuh) catches the
boring case. Why: the median “my server got hacked” story isn’t
nation-state malware; it’s an XMRig miner running off a binary the
attacker dropped after exploiting a public web app. The signature
is “a process you didn’t install showed up.” That’s easy to detect
if you ever look — and almost no one does, until the CPU bill
arrives.
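The "diff a manifest" idea fits in a few lines. This is a rough sketch, not a replacement for aide or osquery: it hashes executable regular files under the watched directories and reports anything new or changed since the last manifest. The function names and the WATCHED list are my own choices, and a real deployment would store the baseline somewhere the server can't rewrite.

```python
import hashlib
import os
import stat
from pathlib import Path

# The drop spots named above; adjust for your own layout.
WATCHED = ["/tmp", "/var/tmp", "/dev/shm", "/usr/local/bin"]


def executable_manifest(directories):
    """Map path -> SHA-256 for every executable regular file under the directories."""
    manifest = {}
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = Path(root) / name
                try:
                    mode = path.lstat().st_mode
                    # Skip symlinks, sockets, and files with no execute bit.
                    if not stat.S_ISREG(mode) or not mode & 0o111:
                        continue
                    manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
                except OSError:
                    continue  # unreadable or vanished mid-scan
    return manifest


def diff_manifests(old, new):
    """Return (added, changed) paths between two manifests."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed


if __name__ == "__main__":
    baseline = {}  # in practice, load the previous run's manifest
    added, changed = diff_manifests(baseline, executable_manifest(WATCHED))
    for path in added:
        print("new executable:", path)
    for path in changed:
        print("changed executable:", path)
```

Run it from a timer and alert on any non-empty diff; the first run against an empty baseline will list everything, which is a useful inventory in itself.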
Alert on outbound traffic to unusual destinations. Why: most compromises become visible at the egress, not the ingress. The attacker’s payload phones home to a command-and-control server, or the miner connects to a mining pool, or your AWS keys start being used from an IP you’ve never seen. A baseline of “this server talks to these three IPs” plus an alert when it talks to anything else is a surprisingly effective last line of detection. Cloud-native flow logs (VPC Flow Logs, equivalent on other providers) are the cheap way to get it.
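The baseline check itself is simple once you have flow records in hand. Here is a sketch that assumes you have already reduced your flow logs to (destination IP, destination port) pairs; the function name, the record shape, and the sample addresses are all illustrative, and parsing a specific provider's flow log format is left out.

```python
import ipaddress


def unexpected_destinations(flows, baseline_cidrs):
    """Count outbound flows whose destination falls outside the baseline networks."""
    allowed = [ipaddress.ip_network(cidr) for cidr in baseline_cidrs]
    unexpected = {}
    for dst_ip, dst_port in flows:
        ip = ipaddress.ip_address(dst_ip)
        if any(ip.version == net.version and ip in net for net in allowed):
            continue  # destination is part of the known-good baseline
        key = (dst_ip, dst_port)
        unexpected[key] = unexpected.get(key, 0) + 1
    return unexpected


if __name__ == "__main__":
    # Hypothetical baseline: a private database subnet and one API endpoint.
    baseline = ["10.0.1.0/24", "198.51.100.20/32"]
    flows = [
        ("10.0.1.5", 5432),
        ("198.51.100.20", 443),
        ("8.8.8.8", 3333),
        ("8.8.8.8", 3333),
    ]
    for (ip, port), count in unexpected_destinations(flows, baseline).items():
        print(f"unexpected egress to {ip}:{port} ({count} flows)")
```

The hard part in practice is the baseline, not the code: package mirrors, DNS, and NTP all need to be in it before the alert is quiet enough to trust.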
Show the seams
A hardening guide that doesn’t tell you where it lies isn’t being honest. The honest version:
- OS hardening protects you against opportunistic attacks, not targeted ones. Most servers that get compromised get compromised through an application bug (SQL injection, RCE in a dependency, leaked secrets in a public repo, an unpatched CMS) — not through the OS. The rules above shrink the easy attack surface; they do little against an attacker who already has your AWS keys or a 0-day in your Rails app.
- “Defense in depth” can become “checkbox theater” fast. Layering a WAF, an IDS, a SIEM, a CASB, and a bastion host on top of a poorly patched application is more reassuring than it is protective. Spend on the layers that actually fire — keys, firewall, patches, logs — before you spend on the ones that look good in an audit document.
- A bastion host only helps if you also rotate the bastion. Routing all SSH through a single jump host concentrates auditability and is worthwhile, but the bastion is now your single point of failure; if it isn’t patched and key-rotated, you’ve turned a flat attack surface into one with a clearly labelled front door.
- Container hardening is a different post. Most of the rules above assume a long-lived VM. Cattle-not-pets workloads (ephemeral containers, immutable images, orchestrated rollouts) shift the problem — patching is “redeploy with a newer base image,” and SSH into a container is itself a smell. The threat model rhymes; the controls don’t always map one-to-one.
- I don’t have a confident, current figure for what fraction of cloud compromises trace to leaked credentials vs. unpatched CVEs vs. app bugs. The general industry reports (Verizon DBIR, Mandiant M-Trends) point at credentials and exposed services as the leading causes year after year, but the exact mix shifts and I’d rather name the gap than quote a number I’d have to re-verify.
Famous related terms
- Defense in depth: defense in depth = assume each layer will fail; stack independent layers so the next one catches it. The meta-principle behind all of the above; the trap is that “more layers” can become a substitute for “layers that work.”
- Principle of least privilege: least privilege = every actor gets exactly the permissions it needs, no more. sudo over root, non-root services, scoped IAM roles instead of root API keys.
- Zero trust: zero trust ≈ "the network isn't a security boundary; authenticate every request". The modern reframing of “don’t trust the LAN,” responsible for the shift from VPN-and-flat-network to per-service auth.
- Bastion host: bastion = single auditable jump server that all SSH goes through. Concentrates the SSH attack surface so you have one thing to patch and watch instead of dozens.
- CVE: CVE = a public identifier for a specific known vulnerability. The unit of “what you’re patching against” and what unattended-upgrades is closing.
- Security through obscurity: STO ≈ relying on attackers not knowing your setup. Not a layer of defense; sometimes a useful signal-to-noise reducer (e.g. moving SSH off port 22 to quiet the logs).
Going deeper
- The OpenSSH sshd_config(5) man page: the primary source for what every directive in this post actually does, straight from the OpenSSH project.
- Mozilla’s OpenSSH guidelines: the best curated secondary, for “given all of those directives, here’s a sane modern profile and the reasoning behind each choice.”
- The annual Verizon Data Breach Investigations Report — the antidote to building a hardening posture around the wrong threat model; if you want to know what actually compromises servers in a given year, this is where the numbers are.