DNS Deep Dive
The Domain Name System is the phone book of the internet — except it's also the routing slip, the trust anchor for TLS, the email policy file, and a few other things. Almost every outage that ends with "we never figured out what was wrong" has DNS somewhere in the root cause. This guide is the bigger-picture mental model that lets you read a dig output, predict resolver behaviour, and avoid the classic apex-CNAME / glue / TTL traps.
We'll start with the moving parts (resolvers vs authoritatives), walk a real query end-to-end, tour the record types and the anatomy of a dig answer, then look at DNSSEC, DoH, DoT, and the operational pitfalls that bite people in production.
Resolvers vs Authoritative Servers
The single most useful distinction in DNS is between the two server roles. Many products do both at once, but the roles are conceptually separate:
- Authoritative server: holds the master copy of a zone (example.com) and answers questions about names within that zone. It does not chase referrals or cache other people's data. Examples: NS1, Route 53, Cloudflare Authoritative, BIND running in primary mode.
- Recursive resolver (a.k.a. caching resolver): has no zones of its own. Clients ask it questions; it walks the DNS tree to find answers, caches them according to TTL, and returns the result. Examples: 1.1.1.1, 8.8.8.8, your ISP's resolver, the systemd-resolved / unbound / dnsmasq instance on your laptop.
- Stub resolver: the tiny client built into your OS. It just asks one configured recursive resolver and trusts the answer.
When you type a hostname in a browser, the chain is stub → recursive → authoritative(s). Cache hits short-circuit the path.
Recursive vs Iterative Queries
DNS uses two different question modes. They look similar from the outside but they're handled very differently.
- Recursive query: "Please give me the final answer, do whatever it takes." Stub → recursive resolver.
- Iterative query: "Tell me what you know; if you don't have the answer, point me at someone who's closer." Recursive resolver → roots → TLDs → authoritatives.
Authoritative servers don't do recursion. If you ask a.gtld-servers.net for www.example.com, you'll get a referral to the example.com nameservers, not the answer.
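On the wire, the difference between the two modes is a single header bit, RD (recursion desired). Here is a minimal Python sketch that builds a query by hand with the standard library. The function name build_query is made up for this illustration, and the sketch skips EDNS and everything else a real client would add.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit header flags field


def build_query(name: str, qtype: int = 1, recursive: bool = True, query_id: int = 0) -> bytes:
    """Return a wire-format DNS query for `name` (qtype 1 = A, class IN)."""
    flags = RD_FLAG if recursive else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".") if label
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    return header + question


recursive_q = build_query("www.example.com", recursive=True)   # what a stub sends
iterative_q = build_query("www.example.com", recursive=False)  # what a resolver sends upstream
print(recursive_q[2:4].hex(), iterative_q[2:4].hex())
```

The two messages are identical apart from that one bit. dig's +norec option clears it, which is how you ask a server what it knows without asking it to chase anything.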
End-to-end query for www.example.com (cold cache)
Your laptop                 Recursive resolver (1.1.1.1)    Authoritative tree
───────────                 ──────────────────────────      ──────────────────
1. www.example.com? ──────▶ Cache miss. Start at the root.
                            │
                            │ 2. www.example.com?
                            ├─────────────────────────────▶ Root server
                            │                               "I don't know, but the
                            │                                .com TLD servers are
                            │                                a.gtld-servers.net, ..."
                            │◀─────────────────────────────
                            │
                            │ 3. www.example.com?
                            ├─────────────────────────────▶ .com TLD server
                            │                               "I don't know, but
                            │                                example.com is served
                            │                                by a.iana-servers.net,
                            │                                b.iana-servers.net"
                            │◀─────────────────────────────
                            │
                            │ 4. www.example.com?
                            ├─────────────────────────────▶ example.com auth
                            │                               "www.example.com IN A
                            │                                93.184.216.34
                            │                                TTL 86400"
                            │◀─────────────────────────────
                            │
                            │ Cache the answer until TTL expires.
5. 93.184.216.34 ◀───────── Send the final A record to the stub.
The root and TLD servers each return one referral, and the authoritative server returns the answer. The recursive resolver is the only party that walks the whole tree.
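The walk above can be sketched as a toy simulation. Nothing here touches the network: the three "servers" are plain dictionaries, and the server labels (root, com-tld, example-auth) are invented for the illustration. The point is only the shape of the loop a recursive resolver runs.

```python
# Toy in-memory delegation tree mirroring the walkthrough. Each server either
# refers a query to a server closer to the name or holds the final answer.
SERVERS = {
    "root": {"referrals": {"com": "com-tld"}, "answers": {}},
    "com-tld": {"referrals": {"example.com": "example-auth"}, "answers": {}},
    "example-auth": {
        "referrals": {},
        "answers": {"www.example.com": ("93.184.216.34", 86400)},
    },
}


def ask(server: str, name: str):
    """One iterative query: return ("answer", data), ("referral", next) or ("nxdomain", None)."""
    data = SERVERS[server]
    if name in data["answers"]:
        return "answer", data["answers"][name]
    for zone, next_server in data["referrals"].items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "nxdomain", None


def resolve(name: str):
    """Start at the root and follow referrals until something other than a referral comes back."""
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = ask(server, name)
        if kind != "referral":
            return kind, value, path
        server = value


kind, (address, ttl), path = resolve("www.example.com")
print(path, address, ttl)
```

Each server answers exactly one question and never contacts another server; only resolve() sees the whole path.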
The root zone and the root hints
There are 13 logical root server identities (A through M, a.root-servers.net .. m.root-servers.net) operated by 12 organisations. Each "letter" is actually hundreds of physical servers fronted by anycast. Every recursive resolver ships with a root hints file listing their IP addresses; dig . NS against any of them returns the authoritative copy of that list.
DNS Record Types You'll Actually Use
The protocol defines dozens of record types; in practice eleven of them cover almost everything:
- A: IPv4 address. Example: example.com IN A 93.184.216.34
- AAAA: IPv6 address (pronounced "quad-A"). Example: example.com IN AAAA 2606:2800:220:1::248:1893
- CNAME: canonical name; an alias from one name to another. Must not coexist with other record types on the same name (RFC 1034 §3.6.2). That's why you can't put a CNAME at the zone apex (example.com): the apex already carries SOA and NS records.
- NS: delegates a (sub)domain to the listed nameserver hostnames.
- SOA: Start of Authority. One per zone, lists the primary master, contact email (with the first . standing in for @), serial number, and the refresh / retry / expire / minimum TTL fields.
- MX: mail exchanger, with a preference number (lower = more preferred). Example: example.com IN MX 10 mail.example.com.
- TXT: free-form text. Now overloaded for SPF, DKIM, DMARC, domain-verification challenges, and a lot more.
- SRV: service location. Holds priority, weight, port and target. Example: _sip._tcp.example.com 86400 IN SRV 10 60 5060 sipserver.example.com.
- CAA: Certificate Authority Authorisation (RFC 8659); restricts which CAs may issue certs for the domain. Issuance-time check.
- PTR: reverse pointer. Lives under in-addr.arpa (IPv4) or ip6.arpa (IPv6). Used for reverse DNS lookups.
- ALIAS / ANAME: vendor-specific "CNAME-flattening" records (not standardised). Useful when you really want CNAME-like behaviour at the apex.
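PTR names are the one entry in this list whose owner name you rarely type by hand, so it helps to see how they are built. A small sketch using Python's standard ipaddress module; the wrapper name ptr_name is ours.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the fully qualified owner name that a PTR lookup for `address` queries."""
    # reverse_pointer reverses the octets (IPv4) or nibbles (IPv6) and
    # appends the in-addr.arpa / ip6.arpa suffix.
    return ipaddress.ip_address(address).reverse_pointer + "."


v4 = ptr_name("93.184.216.34")
v6 = ptr_name("2001:db8::1")
print(v4)
print(v6)
```

An IPv4 address becomes four reversed octets; an IPv6 address becomes 32 reversed hex nibbles, which is why nobody writes those by hand.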
TTLs and Caching
Every DNS record carries a TTL (Time To Live) in seconds — a hint to resolvers about how long they may cache the answer. The TTL is set by the zone operator. Resolvers are required to honour it as an upper bound; they may cache for less (and some clients ignore it entirely, which is its own problem).
- Long TTLs (1 day +) — fewer queries to your authoritatives, faster lookups for end users, but slower change propagation.
- Short TTLs (60–300s) — fast failover and changes, but the load on your authoritatives goes up dramatically.
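- Negative caching (NXDOMAIN and "no such record type" responses) is governed by the SOA record: the negative TTL is the lower of the SOA record's own TTL and its minimum field (RFC 2308). Resolvers commonly cap it further; RFC 2308 notes that values above one day have proven problematic.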
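To make the caching rules concrete, here is a minimal sketch of a TTL-driven cache plus the RFC 2308 rule for negative answers (the lower of the SOA record's TTL and its minimum field). The class and function names are illustrative, and the clock is injectable only so the example can fake the passage of time; real resolvers are considerably more involved.

```python
import time


class TtlCache:
    """Minimal resolver-style cache: an entry is served until its TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the example can fake time
        self._entries = {}       # (name, rtype) -> (value, expiry time)

    def put(self, name, rtype, value, ttl):
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name, rtype):
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(name, rtype)]   # expired: caller must look it up again
            return None
        return value


def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """How long a negative answer may be cached, per RFC 2308."""
    return min(soa_ttl, soa_minimum)


now = [0.0]                                    # fake clock, in seconds
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "A", "93.184.216.34", ttl=300)
now[0] = 299
hit = cache.get("www.example.com", "A")        # still inside the TTL
now[0] = 300
miss = cache.get("www.example.com", "A")       # TTL elapsed
print(hit, miss, negative_ttl(soa_ttl=3600, soa_minimum=900))
```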
If you're about to do a planned migration, lower the TTL at least one full old-TTL period ahead of the cutover (days ahead, for a one-day TTL) so that entries cached under the old TTL have expired by the time you flip. Once the new value has propagated, raise the TTL back up.
Reading dig Output
dig (Domain Information Groper) ships with BIND-utils. We prefer it over nslookup because it shows you the raw protocol fields and is scriptable. Here is a default query and what each block means.
dig www.example.com
$ dig www.example.com
; <<>> DiG 9.18.18 <<>> www.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.example.com.          IN    A
;; ANSWER SECTION:
www.example.com.    86400  IN    A    93.184.216.34
;; Query time: 12 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Sat May 23 16:00:00 UTC 2026
;; MSG SIZE rcvd: 60
Section by section:
- HEADER: opcode is almost always QUERY. status is the rcode: NOERROR, NXDOMAIN (name doesn't exist), SERVFAIL (resolver gave up), REFUSED (server won't answer).
- flags: qr = response, rd = recursion desired, ra = recursion available, aa = authoritative answer, ad = authentic data (DNSSEC-validated), cd = checking disabled.
- QUESTION: the question we asked, echoed back.
- ANSWER: the records that answer the question. The number left of IN is the TTL in seconds.
- AUTHORITY: the NS records for the zone holding the answer (often omitted from cached responses).
- ADDITIONAL: extra records the server thought we'd want next: glue, OPT pseudo-record for EDNS, etc.
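Everything dig prints in the HEADER and flags lines comes from the fixed 12-byte header at the start of every DNS message. The sketch below decodes one by hand; parse_header is a name made up for this example, and it only knows the four rcodes listed above (the 4-bit header rcode, without the EDNS extension).

```python
import struct

FLAG_BITS = {  # bit masks within the 16-bit flags field
    "qr": 0x8000, "aa": 0x0400, "tc": 0x0200, "rd": 0x0100,
    "ra": 0x0080, "ad": 0x0020, "cd": 0x0010,
}
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS header into the fields dig prints."""
    msg_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0xF
    return {
        "id": msg_id,
        "opcode": (flags >> 11) & 0xF,               # 0 = QUERY
        "status": RCODES.get(rcode, str(rcode)),     # unknown codes stay numeric
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
        "counts": {"QUERY": qd, "ANSWER": an, "AUTHORITY": ns, "ADDITIONAL": ar},
    }


# Header bytes matching the sample dig output: id 12345, flags "qr rd ra".
sample = struct.pack("!HHHHHH", 12345, 0x8180, 1, 1, 0, 1)
header = parse_header(sample)
print(header)
```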
Practical dig recipes
Common dig invocations
# Just the answer value, nothing else
dig +short www.example.com
# Specific record type
dig MX example.com
dig AAAA example.com
dig TXT _dmarc.example.com
# Ask a specific server (bypass your configured resolver)
dig @1.1.1.1 www.example.com
dig @ns1.example.com SOA example.com
# Trace the full delegation from the root down (iterative walk)
dig +trace www.example.com
# Show DNSSEC RRs and the AD flag
dig +dnssec example.com
# Reverse lookup
dig -x 93.184.216.34
dig -x 2606:2800:220:1::248:1893
# Useful diagnostics
dig +noall +answer +stats www.example.com   # quieter output
dig +tcp www.example.com                    # force TCP (e.g. large responses)
+trace is invaluable when you suspect a delegation problem. It starts at the root, follows referrals, and prints the records it gets at each step — so you can see where the chain breaks.
DNSSEC at a Glance
DNSSEC (DNS Security Extensions, RFCs 4033–4035) doesn't encrypt DNS. It signs DNS records so a resolver can verify they weren't tampered with on the wire or in cache. There are a handful of new record types and a chain of trust that runs from the root down.
- DNSKEY: the zone's public key(s). Two flavours: KSK (Key Signing Key) and ZSK (Zone Signing Key).
- RRSIG: a signature over an RRset. Ordinary RRsets are signed with the ZSK; the DNSKEY RRset itself is signed with the KSK. There's one RRSIG per RRset per signing key.
- DS: Delegation Signer record. Lives in the parent zone, contains a hash of the child's KSK. This is what extends the chain across delegation boundaries.
- NSEC / NSEC3: authenticated denial-of-existence proofs (so an attacker can't say "there's no record" when there is one).
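The "hash of the child's KSK" in a DS record is not a hash of the key bytes alone. Per RFC 4034 it is computed over the owner name in wire format followed by the DNSKEY RDATA. A sketch with Python's hashlib, assuming digest type 2 (SHA-256); the key bytes are a placeholder, not a real key, and the helper names are ours.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Canonical wire form: lowercase labels, each length-prefixed, zero-terminated."""
    labels = [label for label in name.lower().split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """DNSKEY RDATA layout: flags (2 bytes), protocol (1), algorithm (1), public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def ds_digest_sha256(owner: str, rdata: bytes) -> str:
    """DS digest type 2: SHA-256 over the owner name followed by the DNSKEY RDATA."""
    return hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()


# Placeholder key material, NOT a real key. Flags 257 marks a KSK (256 would be a ZSK);
# algorithm 13 is ECDSA P-256 with SHA-256, whose public keys are 64 bytes long.
fake_ksk = dnskey_rdata(flags=257, protocol=3, algorithm=13, public_key=bytes(range(64)))
digest = ds_digest_sha256("example.com.", fake_ksk)
print(digest)
```

Because the owner name is part of the input, the same key published under a different zone name yields a different DS digest.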
The chain in plain English
.                      Root has a DNSKEY. Resolvers ship with its hash (trust anchor).
└── com                Root signs com's DS record.
    └── example.com    .com signs example.com's DS record.
        └── www        example.com signs the A record for www.
A resolver validates from the top:
1. Root DNSKEY matches the trust anchor on disk? OK.
2. Root's RRSIG over the DS RRset for com validates with the root DNSKEY? OK.
3. com DNSKEY's hash matches that DS? OK.
4. com's RRSIG over the DS RRset for example.com validates with the com DNSKEY? OK.
5. example.com DNSKEY's hash matches that DS? OK.
6. example.com RRSIG over (www IN A 93.184.216.34) validates with that key? OK.
→ Set the AD flag in the response. Done.
If any link breaks (expired RRSIG, mismatched DS, wrong key), a validating resolver returns SERVFAIL. That's why a DNSSEC key rollover gone wrong looks like "the entire domain is down". The classic operational rule: always pre-publish new keys, never let signatures expire.
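"Never let signatures expire" is easy to monitor, because every RRSIG prints its inception and expiration as YYYYMMDDHHmmSS timestamps in UTC. A small sketch of such a check follows. The timestamps are hypothetical, and the three-day warning window is an arbitrary assumption; pick one that suits your re-signing interval.

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """RRSIG inception/expiration are printed as YYYYMMDDHHmmSS in UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def signature_status(inception: str, expiration: str, now: datetime,
                     warn_before: timedelta = timedelta(days=3)) -> str:
    """Classify a signature as valid, expiring-soon, expired or not-yet-valid."""
    start, end = parse_rrsig_time(inception), parse_rrsig_time(expiration)
    if now < start:
        return "not-yet-valid"
    if now >= end:
        return "expired"
    if end - now <= warn_before:
        return "expiring-soon"
    return "valid"


# Hypothetical timestamps, in the form they take on an RRSIG line of `dig +dnssec`.
now = datetime(2026, 5, 23, 16, 0, tzinfo=timezone.utc)
healthy = signature_status("20260510000000", "20260609000000", now)
closing = signature_status("20260510000000", "20260525000000", now)
stale = signature_status("20260401000000", "20260501000000", now)
print(healthy, closing, stale)
```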
DNS-over-TLS and DNS-over-HTTPS
Plain DNS travels over UDP/TCP port 53 with no encryption or authentication. Anyone on path can read your queries or modify the answers. Two standardised transports fix this on the stub-to-resolver hop:
- DNS-over-TLS (DoT): RFC 7858. DNS messages in a TLS-wrapped TCP connection on port 853. Looks like a distinct service; easy to block at network boundaries if you want to.
- DNS-over-HTTPS (DoH): RFC 8484. DNS messages in HTTPS POSTs (or GETs of base64url-encoded queries) to a templated URL like https://cloudflare-dns.com/dns-query on port 443. Indistinguishable from regular web traffic, which is the point and also the controversy.
Both authenticate the resolver (you know you're talking to 1.1.1.1 and not someone wedged in front of it) and encrypt the queries. Neither hides the final destination IP from the network, and neither replaces DNSSEC — they complement it. systemd-resolved, unbound, stubby, and most modern browsers can be configured to use DoT or DoH.
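The GET form of DoH is simple enough to build by hand: take the ordinary wire-format query, base64url-encode it without padding, and put it in the dns parameter. The sketch below only constructs the URL and sends nothing; the helper names are invented for the example.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Wire-format query with ID 0, which RFC 8484 recommends for cache friendliness."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)   # RD set, one question
    labels = [label for label in name.split(".") if label]
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)      # QTYPE, QCLASS=IN


def doh_get_url(template: str, name: str, qtype: int = 1) -> str:
    """Encode the query as unpadded base64url in the `dns` query parameter."""
    encoded = base64.urlsafe_b64encode(build_query(name, qtype)).rstrip(b"=")
    return f"{template}?dns={encoded.decode('ascii')}"


url = doh_get_url("https://cloudflare-dns.com/dns-query", "www.example.com")
print(url)
```

A real request would also send an Accept: application/dns-message header, and the response body is a normal DNS message in wire format.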
kdig (knot-dnsutils) makes DoT/DoH easy to test
# DoT to Cloudflare
kdig +tls @1.1.1.1 www.example.com
# DoH to Cloudflare
kdig +https @cloudflare-dns.com www.example.com
# DNS-over-QUIC (DoQ, RFC 9250), newest of the three
kdig +quic @dns.adguard.com www.example.com
Common Operational Pitfalls
CNAME at the apex
You can't put a CNAME at example.com: the apex already carries SOA and NS records, and a CNAME forbids any other record type on the same name. Symptoms: email breaks (no MX), the zone refuses to load, or resolvers return SERVFAIL. Workarounds: an A/AAAA record pointing at the same upstream, a vendor "ALIAS/ANAME" flattening record, or moving to a service that returns the upstream's A records directly.
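This rule is mechanical enough to lint before you publish a zone. A minimal sketch, assuming the zone has already been reduced to (owner name, record type) pairs; the function name is ours, the zone contents are hypothetical, and the sketch ignores the DNSSEC record types that are allowed to sit beside a CNAME.

```python
from collections import defaultdict


def find_cname_conflicts(records):
    """Return names that hold a CNAME together with any other record type.

    `records` is an iterable of (owner_name, record_type) pairs.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(
        name for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone contents.
zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),      # illegal: the apex already has SOA and NS
    ("www.example.com.", "CNAME"),  # fine: nothing else lives at www
    ("mail.example.com.", "A"),
]
conflicts = find_cname_conflicts(zone)
print(conflicts)
```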
Missing or incorrect glue records
If example.com is delegated to ns1.example.com (a nameserver inside the zone it's serving), the parent zone must publish a glue A/AAAA record alongside the NS delegation. Without it, resolvers chicken-and-egg themselves: they need to query ns1.example.com to find example.com, but they need example.com to learn ns1.example.com's address. dig +trace reveals missing glue immediately.
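Whether a nameserver needs glue is a pure string question: is its name at or below the zone it serves? The sketch below applies that test to a delegation and reports in-zone nameservers for which the parent's referral carries no address. The function names and the sample delegation are invented for illustration; 192.0.2.53 is a documentation address.

```python
def needs_glue(zone: str, nameserver: str) -> bool:
    """True when the nameserver's name is at or below the zone it serves."""
    zone = zone.lower().rstrip(".")
    nameserver = nameserver.lower().rstrip(".")
    return nameserver == zone or nameserver.endswith("." + zone)


def missing_glue(zone, nameservers, glue):
    """List in-zone nameservers for which the parent publishes no address.

    `glue` maps nameserver name -> list of A/AAAA addresses seen in the referral.
    """
    known = {name.lower().rstrip(".") for name, addrs in glue.items() if addrs}
    return [
        ns for ns in nameservers
        if needs_glue(zone, ns) and ns.lower().rstrip(".") not in known
    ]


# Hypothetical delegation: ns1 has glue, ns2 does not, ns.example.net is out of zone.
problems = missing_glue(
    "example.com",
    ["ns1.example.com.", "ns2.example.com.", "ns.example.net."],
    {"ns1.example.com.": ["192.0.2.53"]},
)
print(problems)
```

The out-of-zone nameserver is not reported because its address can be found by resolving example.net independently.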
Stale TTLs and "we changed it hours ago"
If you forgot to drop the TTL before a migration, every resolver downstream of you is entitled to hold the old answer for up to the original TTL. Forcing a flush on someone else's resolver is not within your power; you can only wait. For planned migrations, lower the TTL to 60–300s and keep it there for at least one full previous-TTL period before making the change.
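The timing is simple arithmetic, but it is worth writing down before a migration. A record cached one second before you lower the TTL can live for the full old TTL, so the short TTL has to be published at least that long before the cutover. In the sketch below, the function name, the one-hour safety margin, and the cutover date are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def ttl_lowering_deadline(cutover: datetime, old_ttl: int, safety_margin: int = 3600) -> datetime:
    """Latest moment to publish the short TTL so caches have aged out the old one by cutover."""
    return cutover - timedelta(seconds=old_ttl + safety_margin)


cutover = datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc)   # hypothetical cutover time
deadline = ttl_lowering_deadline(cutover, old_ttl=86400)    # old TTL of one day
print(deadline.isoformat())
```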
"It works from my laptop"
Resolvers differ. Your laptop might be hitting systemd-resolved, which is fronting your VPN's resolver, which is forwarding to your corporate resolver, which is forwarding to 8.8.8.8. Each hop has independent caching. Always reproduce DNS issues from multiple resolvers — dig @1.1.1.1, dig @8.8.8.8, dig @9.9.9.9, and at least one resolver on the same network as the user — before declaring a fix.
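If you do this often, it helps to group resolvers by the answer they give so the odd one out is obvious. The sketch below is deliberately offline: the lookup function is passed in by the caller, and the example feeds it canned data in place of real queries. The addresses 192.0.2.10 and 192.0.2.20 are documentation addresses, not real answers.

```python
def compare_resolvers(query, resolvers, name, rtype="A"):
    """Group resolvers by the set of values they return for one question.

    `query(resolver, name, rtype)` is supplied by the caller and returns an
    iterable of record values. More than one group means the caches disagree.
    """
    groups = {}
    for resolver in resolvers:
        answer = frozenset(query(resolver, name, rtype))
        groups.setdefault(answer, []).append(resolver)
    return groups


# Stand-in for real lookups: canned answers keyed by resolver address.
CANNED = {
    "1.1.1.1": ["192.0.2.20"],
    "8.8.8.8": ["192.0.2.20"],
    "9.9.9.9": ["192.0.2.10"],   # still serving the old record
}
groups = compare_resolvers(lambda r, n, t: CANNED[r], CANNED, "www.example.com")
for answer, who in groups.items():
    print(sorted(answer), who)
```

In practice the query argument would wrap a dig invocation or a DNS library call against the named resolver.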
Forgetting AAAA
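Dual-stack clients send AAAA queries as well as A queries. If your apex has an A record but no AAAA, users on IPv6-only networks can reach you only through a translation layer such as NAT64/DNS64; if it has an AAAA record that was overlooked during a migration, clients that try IPv6 first (as Happy Eyeballs does) can land on the old host. Either way, users see an inconsistent experience. Check both records when investigating.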
Round-robin is not load balancing
Returning multiple A records typically rotates their order from query to query, but it offers no health checking, no weighting, and no session affinity. For real traffic shaping use a load balancer, geo-DNS, or a service-specific solution.
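A toy model shows the problem. The class below rotates the answer order on every query, which is roughly what round-robin DNS amounts to, and it has no way of knowing that one of the hosts is down. The class name and addresses are made up; the addresses come from the 192.0.2.0/24 documentation range.

```python
from collections import deque


class RoundRobinZone:
    """Rotates the answer order on every query, with no idea which hosts are up."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def query(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)     # the next query leads with the next address
        return answer


# Pretend 192.0.2.2 has crashed. The zone keeps handing it out regardless.
zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
down = {"192.0.2.2"}
first_choices = [zone.query()[0] for _ in range(6)]
failed = sum(1 for address in first_choices if address in down)
print(first_choices, failed)
```

One client in three is sent to the dead host first, and nothing in DNS will notice until someone edits the zone.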
Quick Reference
DNS commands you'll keep using
# Inspect your stub configuration
cat /etc/resolv.conf
resolvectl status                             # systemd-resolved
# Quick lookups
dig +short A example.com
dig +short AAAA example.com
dig +short MX example.com
dig +short TXT example.com
dig +short NS example.com
dig +short SOA example.com
# Authoritative answer (skip the resolver)
dig @ns1.example.com SOA example.com +norec
# DNSSEC validation status
dig +dnssec +cd example.com                   # cd = checking disabled (compare)
dig +dnssec example.com                       # ad flag indicates validation success
# Reverse lookups
dig -x 8.8.8.8
dig -x 2001:db8::1
# Print only certain sections
dig +noall +answer example.com
dig +noall +authority example.com
If you internalise three things from this guide, make them: (1) resolvers chase, authoritatives just answer; (2) every record has a TTL you signed up to honour; (3) when something looks wrong, ask multiple resolvers — DNS is a distributed cache, and "wrong" is often "stale somewhere".