When DNS quietly becomes the bottleneck
In the beginning, DNS is invisible. A few servers, a few applications, a handful of users. Queries are fast, outages are rare, and nobody thinks about the resolver path because it “just works.” Then the environment grows. More sites. More VLANs. More SaaS dependencies. More internal services with short TTLs. Suddenly DNS is no longer a background detail—it becomes a shared dependency that can slow everything down, leak information, or become an easy pivot point for attackers.
In enterprise networks and ISP environments, a secure DNS cache is not about convenience. It is about control: controlling where queries go, controlling who can ask, controlling what gets logged, and controlling how failures behave. In this guide, we are going to build a secure BIND9 caching resolver on RHEL with an internal, enterprise-first posture—no reliance on public resolvers, and no “open resolver” risk.
Prerequisites and assumptions
Before we touch a command, we need to be explicit about the environment we are designing for. DNS is foundational; small assumptions become big incidents later.
- Platform: RHEL (supported enterprise deployment). The steps assume a modern RHEL release with dnf, systemd, and firewalld.
- System state: A clean or well-maintained server with no other DNS service bound to port 53. If another resolver is already running, we must stop it before BIND can listen.
- Privileges: We need root access (direct root shell or sudo). All commands below assume we are running as root. If we are using sudo, we should prefix commands accordingly.
- Network design: This is an internal, enterprise DNS cache. We will restrict recursion to known internal networks and avoid becoming an open resolver.
- Upstream DNS: We will use internal enterprise upstream resolvers (for example, corporate DNS forwarders, ISP core resolvers, or authoritative internal resolvers). We will not use public resolvers.
- Firewall: We will explicitly allow DNS (TCP/UDP 53) only from internal networks. We will not expose this service to the internet.
- Persistence: Configuration must survive reboots, services must be enabled, and logs must be available for operations.
Step 1: Confirm the server identity and network facts
Before installing anything, we will capture the server’s IP addressing and default route. This matters because we will bind BIND to the correct interfaces and build firewall rules that match our internal networks.
hostnamectl
ip -br addr
ip route show default
We have now confirmed the hostname, active interfaces, and the default route. This gives us the interface names and IP ranges we will reference later.
Step 2: Install BIND9 and supporting tools
Now we will install BIND (named) and a few utilities that help us validate configuration and test resolution. We are doing this early so we can validate each change with real queries.
dnf -y install bind bind-utils policycoreutils-python-utils
BIND and its utilities are now installed. The bind-utils package provides tools like dig and named-checkconf, and the SELinux utilities help us inspect and adjust policy safely if needed.
Step 3: Define our internal networks and upstream enterprise resolvers
We are going to set variables for internal networks and upstream resolvers. This keeps the configuration consistent and reduces copy/paste drift. Because every environment is different, we will first print the server’s current IPs so we can choose the correct internal CIDRs.
ip -br addr | awk '{print $1, $3}'
We now have a quick view of interface-to-IP mappings. Next, we will set shell variables for internal networks and upstream resolvers. These upstream resolvers must be internal enterprise DNS servers (for example, corporate forwarders in core networks or data centers).
INTERNAL_NET_1="10.0.0.0/8"
INTERNAL_NET_2="172.16.0.0/12"
INTERNAL_NET_3="192.168.0.0/16"
UPSTREAM_DNS_1="10.10.10.10"
UPSTREAM_DNS_2="10.10.10.11"
We have now defined internal client ranges and upstream enterprise resolvers. In the next step, we will enforce these boundaries in BIND so recursion is only available to internal networks.
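Before we commit to forwarding, it is worth confirming that each upstream actually answers queries from this server. A minimal check, assuming the variables above are set in the current shell and that example.com is resolvable through the enterprise upstreams:
dig @${UPSTREAM_DNS_1} example.com A +time=3 +tries=1 +noall +answer +stats
dig @${UPSTREAM_DNS_2} example.com A +time=3 +tries=1 +noall +answer +stats
If either upstream does not answer here, fix reachability first; nothing we configure in BIND will compensate for an unreachable forwarder.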
Step 4: Configure BIND as a secure caching resolver
Now we will configure named to behave like an enterprise caching resolver with controlled recursion, controlled listening, and safer defaults. We will also enable DNSSEC validation for integrity, and we will avoid exposing version details.
Before changing anything, we will back up the existing configuration. This gives us a clean rollback path.
cp -a /etc/named.conf /etc/named.conf.bak.$(date +%F_%H%M%S)
We have created a timestamped backup of /etc/named.conf. Next, we will write a complete, production-oriented configuration file.
cat > /etc/named.conf <<'EOF'
options {
directory "/var/named";
dump-file "/var/named/data/cache_dump.db";
statistics-file "/var/named/data/named_stats.txt";
memstatistics-file "/var/named/data/named_mem_stats.txt";
recursing-file "/var/named/data/named.recursing";
secroots-file "/var/named/data/named.secroots";
/*
* Enterprise posture:
* - We provide recursion only to internal networks.
* - We do not become an open resolver.
* - We keep behavior predictable and observable.
*/
listen-on port 53 { 127.0.0.1; any; };
listen-on-v6 port 53 { ::1; };
allow-query { any; };
recursion yes;
/*
* Restrict recursion and cache access to internal networks only.
* This is the core control that prevents open resolver exposure.
*/
allow-recursion {
127.0.0.1;
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
allow-query-cache {
127.0.0.1;
10.0.0.0/8;
172.16.0.0/12;
192.168.0.0/16;
};
/*
* Forwarding to internal enterprise resolvers.
* We avoid public resolvers by design.
*/
forward only;
forwarders {
10.10.10.10;
10.10.10.11;
};
/*
* Security hardening.
*/
dnssec-validation yes;
auth-nxdomain no;
minimal-responses yes;
version "not disclosed";
/*
* Logging is handled via systemd/journald by default on RHEL,
* but we keep named's internal files in /var/named/data.
*/
};
logging {
channel default_debug {
file "data/named.run";
severity dynamic;
};
};
zone "." IN {
type hint;
file "named.ca";
};
include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
EOF
We have replaced /etc/named.conf with a controlled caching-resolver configuration. Recursion and cache access are restricted to RFC1918 ranges and localhost, and forwarding is locked to internal upstream resolvers. We also enabled DNSSEC validation and reduced response verbosity.
Align the configuration with our environment variables
The configuration above includes example internal networks and upstream resolvers. Now we will safely apply our shell variables to the file so the running configuration matches our environment. We are doing this with explicit substitutions to keep the change deterministic.
sed -i \
  -e "s|10.0.0.0/8|${INTERNAL_NET_1}|g" \
  -e "s|172.16.0.0/12|${INTERNAL_NET_2}|g" \
  -e "s|192.168.0.0/16|${INTERNAL_NET_3}|g" \
  -e "s|10.10.10.10|${UPSTREAM_DNS_1}|g" \
  -e "s|10.10.10.11|${UPSTREAM_DNS_2}|g" \
  /etc/named.conf
We have now applied our internal network ranges and upstream enterprise resolvers into /etc/named.conf. Next, we will validate the configuration before starting the service.
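If we want to see exactly what the substitutions changed, we can compare the live file against the timestamped backup we created earlier. A quick sketch, assuming only one backup exists so the glob expands to a single file:
diff -u /etc/named.conf.bak.* /etc/named.conf
An empty diff means the example values already matched our environment; otherwise the output shows precisely which networks and forwarders were rewritten.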
Step 5: Validate BIND configuration before starting
We will validate syntax and referenced files. This prevents the most common failure mode: a service that refuses to start because of a small configuration error.
named-checkconf -z /etc/named.conf
If the command returned no output and exited successfully, the configuration is syntactically valid and zone references are consistent. If it printed errors, we should fix them before moving on.
Step 6: Enable and start named, then verify it is listening
Now we will enable the service so it persists across reboots, start it, and confirm it is actually bound to port 53. We are verifying at the socket level because “active” is not the same as “reachable.”
systemctl enable --now named
systemctl status named --no-pager
named is now enabled and started. Next, we will confirm it is listening on DNS ports.
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'
We have verified UDP and TCP listeners on port 53. If we do not see named bound to port 53, we likely have a port conflict or a startup failure, which we will address in troubleshooting.
Step 7: Configure firewalld for internal-only DNS access
Now we will enforce network-level control. Even though BIND is configured to restrict recursion, the firewall is our second line of defense. We will allow DNS only from internal networks and keep the service closed to untrusted sources.
First, we will confirm firewalld is running.
systemctl enable --now firewalld
systemctl status firewalld --no-pager
Firewalld is now enabled and running. Next, we will identify the active zone and the interface attached to it so we apply rules in the correct place.
firewall-cmd --get-active-zones
DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
echo "Default zone: ${DEFAULT_ZONE}"
firewall-cmd --zone="${DEFAULT_ZONE}" --list-interfaces
We now know which zone is active and which interfaces are in it. Next, we will add rich rules to allow DNS from internal networks only. We are using rich rules because they allow source CIDR restrictions cleanly.
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_1} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_1} port port=53 protocol=tcp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_2} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_2} port port=53 protocol=tcp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_3} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_3} port port=53 protocol=tcp accept"
firewall-cmd --reload
We have now allowed DNS traffic from internal networks to this server over both UDP and TCP. We did not open DNS broadly as a service, which helps prevent accidental exposure if the server ever becomes reachable from outside.
Next, we will verify the rules are present.
firewall-cmd --zone="${DEFAULT_ZONE}" --list-rich-rules
We should see rich rules permitting TCP/UDP 53 from the internal CIDRs. If they are missing, the reload may have failed or the zone selection may be wrong.
Step 8: SELinux considerations on RHEL
On RHEL, SELinux is part of the security model, not an obstacle to work around. For a standard caching resolver using default paths and ports, SELinux should allow named to run without custom policy. We will confirm SELinux mode and check for denials only if something fails.
getenforce
sestatus
We have confirmed SELinux status. If named is running and queries work, we do not need to change SELinux. If named fails unexpectedly, we will inspect audit logs in troubleshooting.
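If we ever move named to a non-default port, or simply want to confirm what SELinux already permits, the policycoreutils-python-utils package we installed earlier provides semanage. A read-only check (it changes no policy) looks like this:
semanage port -l | grep -w dns_port_t
The output lists the TCP and UDP ports the dns_port_t type covers, which is why a standard port 53 deployment needs no SELinux changes.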
Step 9: Verify resolution and caching behavior
Now we will test the resolver locally first. Local testing removes network variables and confirms that BIND is functioning as a caching forwarder to our internal upstream resolvers.
We will query a well-known domain and confirm we get an answer. Then we will repeat the query to observe improved response time, which indicates caching is working.
dig @127.0.0.1 example.com A +noall +answer +stats
dig @127.0.0.1 example.com A +noall +answer +stats
If the first query succeeds and the second query shows reduced query time, caching is functioning. If queries fail, the most likely causes are upstream reachability, firewall rules, or recursion restrictions.
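If we want direct evidence that the answer is sitting in the cache, we can ask named to dump its cache to the dump-file we configured and look for the name. This is a sketch that assumes the default rndc control channel is working; the RHEL bind packaging normally generates /etc/rndc.key for this on first start.
rndc dumpdb -cache
grep -i "example.com" /var/named/data/cache_dump.db | head -n 10
Seeing the record in the dump confirms the improved query time really is a cache hit and not just a faster upstream response.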
Verify recursion is restricted to internal networks
We also need to confirm we did not accidentally create an open resolver. The cleanest operational check is to ensure recursion is only allowed from internal networks. From a host outside the allowed CIDRs, recursion should be refused. From an internal host, it should succeed.
On the DNS server itself, we can confirm the access control lists are present in the active configuration by reviewing the file and ensuring it matches our internal CIDRs.
grep -nE 'allow-recursion|allow-query-cache|forwarders' /etc/named.conf
We have confirmed the key control points are present in the configuration. For a full validation, we should run a query from an internal client and, separately, ensure that untrusted networks cannot reach port 53 due to firewall policy.
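From an internal client, a query against this server's internal address should return NOERROR with the "ra" (recursion available) flag set; from a host outside the allowed CIDRs it should come back REFUSED. A sketch of the internal-client check, where the address below is a placeholder we must replace with this resolver's real internal IP:
DNS_SERVER_IP="10.10.20.53"   # placeholder: this resolver's internal address
dig @${DNS_SERVER_IP} example.com A +noall +comments +answer | grep -E 'status:|flags:'
Running the same query from a non-allowed network and seeing REFUSED (or a firewall timeout) is the confirmation that we have not built an open resolver.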
Step 10: Make the server use itself for DNS safely
In many environments, we want the DNS cache server to use itself for name resolution. We will do this carefully to avoid locking ourselves out during remote sessions. The safest approach is to confirm local resolution works first (we already did), then update the system resolver configuration.
On RHEL, NetworkManager often manages /etc/resolv.conf. We will first check what is currently in use.
readlink -f /etc/resolv.conf
cat /etc/resolv.conf
We now know whether /etc/resolv.conf is managed and what nameservers are configured. Next, we will set the system to use 127.0.0.1 as the primary resolver via NetworkManager, which persists across reboots.
First, we will list active connections and pick the one in use.
nmcli -t -f NAME,UUID,DEVICE connection show --active
Now we will store the active connection name in a variable and apply DNS settings. This keeps the commands copy/paste safe.
CONN_NAME=$(nmcli -t -f NAME connection show --active | head -n1)
echo "Using connection: ${CONN_NAME}"
nmcli connection modify "${CONN_NAME}" ipv4.ignore-auto-dns yes
nmcli connection modify "${CONN_NAME}" ipv4.dns "127.0.0.1"
nmcli connection up "${CONN_NAME}"
The active connection is now configured to use the local BIND instance for DNS and to ignore automatically provided DNS servers. Next, we will verify the effective resolver configuration.
cat /etc/resolv.conf
dig example.com A +noall +answer +stats
We have confirmed the system resolver points to localhost and that name resolution still works. This change persists across reboots because it is stored in the NetworkManager connection profile.
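If this connection also learns DNS servers over IPv6 (router advertisements or DHCPv6), those entries can reappear in /etc/resolv.conf alongside 127.0.0.1. A sketch of pointing IPv6 resolution at the local instance as well, assuming the same connection profile and that named is listening on ::1 as configured above:
nmcli connection modify "${CONN_NAME}" ipv6.ignore-auto-dns yes
nmcli connection modify "${CONN_NAME}" ipv6.dns "::1"
nmcli connection up "${CONN_NAME}"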
Operational checks we should keep in our runbook
In production, we want a small set of checks that quickly answer: “Is DNS up, is it reachable, and is it behaving securely?” These commands are safe to run repeatedly.
systemctl is-active named
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
firewall-cmd --get-active-zones
firewall-cmd --zone="$(firewall-cmd --get-default-zone)" --list-rich-rules
dig @127.0.0.1 example.com A +noall +answer +stats
We have a compact operational snapshot: service state, listening sockets, firewall posture, and a real query test.
Troubleshooting
Symptom: named fails to start
Likely causes: syntax error in /etc/named.conf, missing included files, or port 53 already in use.
First, we will check service logs and validate configuration again.
systemctl status named --no-pager
journalctl -u named --no-pager -n 200
named-checkconf -z /etc/named.conf
If named-checkconf reports an error, we should fix the referenced line. If logs mention “address already in use,” we need to find what is bound to port 53.
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'
Once the conflicting service is stopped or reconfigured, restarting named should succeed.
Symptom: dig to 127.0.0.1 works, but clients cannot resolve
Likely causes: firewall rules not applied to the correct zone, clients not in allowed CIDRs, or routing/VLAN path issues.
We will confirm firewall rules and ensure they match the internal networks.
DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
firewall-cmd --zone="${DEFAULT_ZONE}" --list-rich-rules
firewall-cmd --get-active-zones
If the interface is not in the expected zone, we should move it to the correct zone or apply rules to the active zone. After adjusting, we reload firewalld and retest from a client.
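If the rules look correct but clients still time out, a packet capture on the server tells us whether queries are arriving at all. This is a sketch that assumes tcpdump is installed (it is not among the packages we installed earlier); press Ctrl-C to stop:
tcpdump -ni any port 53
If no client packets arrive, the problem is upstream of this server (routing, VLANs, or the firewall zone). If packets arrive but clients still get no useful answer, the problem is inside BIND's access controls.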
Symptom: queries return SERVFAIL
Likely causes: upstream resolvers unreachable, upstream refusing recursion, or DNSSEC validation failures due to upstream manipulation or broken chains.
We will first confirm upstream reachability on port 53.
UPSTREAM_DNS_1="10.10.10.10"
UPSTREAM_DNS_2="10.10.10.11"
timeout 3 bash -c "cat < /dev/null > /dev/tcp/${UPSTREAM_DNS_1}/53" && echo "TCP 53 reachable: ${UPSTREAM_DNS_1}" || echo "TCP 53 not reachable: ${UPSTREAM_DNS_1}"
timeout 3 bash -c "cat < /dev/null > /dev/tcp/${UPSTREAM_DNS_2}/53" && echo "TCP 53 reachable: ${UPSTREAM_DNS_2}" || echo "TCP 53 not reachable: ${UPSTREAM_DNS_2}"
If upstream TCP/53 is not reachable, we need to fix routing or upstream firewall policy. If upstream is reachable, we will query upstream directly to confirm it answers.
dig @${UPSTREAM_DNS_1} example.com A +noall +answer +stats
dig @${UPSTREAM_DNS_2} example.com A +noall +answer +stats
If upstream answers but BIND returns SERVFAIL, we will inspect logs for DNSSEC-related messages and consider whether upstream is compatible with DNSSEC validation in our path.
journalctl -u named --no-pager -n 200 | grep -iE 'dnssec|validat|servfail'
If logs indicate DNSSEC validation issues caused by upstream behavior, we should correct upstream DNS integrity rather than disabling validation. In tightly controlled enterprise environments, DNSSEC validation is part of the trust model.
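A quick way to separate "upstream is broken" from "DNSSEC validation is failing" is to repeat the query with checking disabled. If the +cd query succeeds while the normal query returns SERVFAIL, validation is the failing step; delv (also shipped in bind-utils) can then show more detail about the validation chain:
dig @127.0.0.1 example.com A +cd +noall +answer +comments
delv @127.0.0.1 example.com A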
Symptom: clients get REFUSED
Likely causes: client source IP not included in allow-recursion or allow-query-cache, or NAT makes clients appear from an unexpected range.
We will confirm the internal CIDRs in the configuration and reload named after any change.
grep -n "allow-recursion" -A20 /etc/named.conf
grep -n "allow-query-cache" -A20 /etc/named.conf
systemctl reload named
systemctl status named --no-pager
After reload, clients in the allowed ranges should be able to recurse. If NAT is involved, we must allow the post-NAT source range, not the original client range.
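If we are unsure what source address clients actually present (NAT can be surprising), named can log each query together with the source it saw. A sketch using the rndc query-log toggle, assuming the default rndc control channel is available: turn logging on, reproduce one failing client query, inspect the journal, then turn logging off again because query logging is noisy.
rndc querylog on
journalctl -u named --no-pager -n 100 | grep -i "query:"
rndc querylog off
The logged source address is the one that must fall inside allow-recursion and allow-query-cache.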
Symptom: SELinux denials prevent named from working
Likely causes: non-standard paths, custom logging locations, or unexpected file contexts.
We will check for recent AVC denials related to named.
ausearch -m avc -ts recent | tail -n 50
journalctl --no-pager | grep -i avc | tail -n 50
If denials exist, we should correct file locations and contexts to match RHEL expectations rather than weakening SELinux. For standard deployments using default paths, this is uncommon.
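If the denials point at files carrying unexpected contexts (for example after restoring files from a backup or copying configs from another host), resetting the contexts to the policy defaults is the correct fix rather than loosening SELinux:
restorecon -Rv /etc/named.conf /var/named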
Common mistakes
Mistake: Accidentally creating an open resolver
Symptom: external networks can query and receive recursive answers.
Fix: ensure allow-recursion and allow-query-cache are restricted to internal CIDRs and localhost, and ensure firewalld only allows port 53 from internal networks. Then reload named and firewalld.
named-checkconf /etc/named.conf
systemctl reload named
firewall-cmd --reload
We have revalidated configuration and reloaded both the DNS service and firewall policy.
Mistake: Forwarders set to the wrong upstream DNS
Symptom: dig returns timeouts or SERVFAIL, and logs show forwarding failures.
Fix: update the forwarders list to internal enterprise resolvers that are reachable from this server, then reload named and retest.
grep -n "forwarders" -A5 /etc/named.conf
systemctl reload named
dig @127.0.0.1 example.com A +noall +answer +stats
We have confirmed the forwarder configuration and validated resolution through the local cache.
Mistake: Firewall rules applied to the wrong zone
Symptom: local queries work, but internal clients time out.
Fix: identify the active zone for the interface and apply rich rules to that zone, then reload.
firewall-cmd --get-active-zones
firewall-cmd --get-default-zone
firewall-cmd --reload
We have confirmed zone state and reloaded policy. If the interface is not in the expected zone, we must correct zone assignment before rules will take effect.
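If the interface turns out to sit in a different zone than the one holding our rich rules, we can move it. A sketch, where the interface name is a placeholder for the internal-facing interface we identified in Step 1:
DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
IFACE="eth0"   # placeholder: replace with the actual internal-facing interface
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --change-interface="${IFACE}"
firewall-cmd --reload
firewall-cmd --get-active-zones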
Mistake: Another service is already bound to port 53
Symptom: named will not start, and logs mention “address already in use.”
Fix: identify the process bound to port 53, stop or reconfigure it, then start named.
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'
systemctl restart named
systemctl status named --no-pager
We have identified the conflict and restarted named. Once port 53 is free, BIND should bind successfully.
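Common culprits for a port 53 conflict are systemd-resolved's stub listener and dnsmasq (often pulled in by libvirt). If the socket output above names one of these and nothing on this host depends on it, a sketch of clearing the conflict, where the unit name is a placeholder for whatever we actually identified:
CONFLICTING_UNIT="systemd-resolved"   # placeholder: the service holding port 53 on this host
systemctl disable --now "${CONFLICTING_UNIT}"
systemctl restart named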
How we at NIILAA look at this
This setup is not impressive because it is complex. It is impressive because it is controlled. Every component is intentional. Every configuration has a reason. This is how infrastructure should scale — quietly, predictably, and without drama.
At NIILAA, we help organizations design, deploy, secure, and maintain internal and enterprise DNS platforms that hold up under real growth: segmented networks, multi-site routing, compliance constraints, and operational realities. We focus on resolver architecture, access control, logging strategy, change management, and long-term maintainability—so DNS stays boring, even when everything else is moving.
- Website: https://www.niilaa.com
- Email: [email protected]
- LinkedIn: https://www.linkedin.com/company/niilaa
- Facebook: https://www.facebook.com/niilaa.llc