Build a Secure DNS Cache Server Using BIND9

When DNS quietly becomes the bottleneck

In the beginning, DNS is invisible. A few servers, a few applications, a handful of users. Queries are fast, outages are rare, and nobody thinks about the resolver path because it “just works.” Then the environment grows. More sites. More VLANs. More SaaS dependencies. More internal services with short TTLs. Suddenly DNS is no longer a background detail—it becomes a shared dependency that can slow everything down, leak information, or become an easy pivot point for attackers.

In enterprise networks and ISP environments, a secure DNS cache is not about convenience. It is about control: controlling where queries go, controlling who can ask, controlling what gets logged, and controlling how failures behave. In this guide, we are going to build a secure BIND9 caching resolver on RHEL with an internal, enterprise-first posture—no reliance on public resolvers, and no “open resolver” risk.

Prerequisites and assumptions

Before we touch a command, we need to be explicit about the environment we are designing for. DNS is foundational; small assumptions become big incidents later.

  • Platform: RHEL (supported enterprise deployment). The steps assume a modern RHEL release with dnf, systemd, and firewalld.
  • System state: A clean or well-maintained server with no other DNS service bound to port 53. If another resolver is already running, we must stop it before BIND can listen (see the quick pre-check after this list).
  • Privileges: We need root access (direct root shell or sudo). All commands below assume we are running as root. If we are using sudo, we should prefix commands accordingly.
  • Network design: This is an internal & enterprise DNS cache. We will restrict recursion to known internal networks and avoid becoming an open resolver.
  • Upstream DNS: We will use internal enterprise upstream resolvers (for example, corporate DNS forwarders, ISP core resolvers, or authoritative internal resolvers). We will not use public resolvers.
  • Firewall: We will explicitly allow DNS (TCP/UDP 53) only from internal networks. We will not expose this service to the internet.
  • Persistence: Configuration must survive reboots, services must be enabled, and logs must be available for operations.
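
Before we continue, a quick pre-check (a minimal sketch using the same ss filters we rely on later) confirms that nothing is already bound to port 53:

ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'

If any process shows up bound to port 53 here, we stop or reconfigure it before starting named in Step 6; otherwise named will fail to start with "address already in use."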

Step 1: Confirm the server identity and network facts

Before installing anything, we will capture the server’s IP addressing and default route. This matters because we will bind BIND to the correct interfaces and build firewall rules that match our internal networks.

hostnamectl
ip -br addr
ip route show default

We have now confirmed the hostname, active interfaces, and the default route. This gives us the interface names and IP ranges we will reference later.

Step 2: Install BIND9 and supporting tools

Now we will install BIND (named) and a few utilities that help us validate configuration and test resolution. We are doing this early so we can validate each change with real queries.

dnf -y install bind bind-utils policycoreutils-python-utils

BIND and its utilities are now installed. The bind-utils package provides tools like dig and named-checkconf, and the SELinux utilities help us inspect and adjust policy safely if needed.

Step 3: Define our internal networks and upstream enterprise resolvers

We are going to set variables for internal networks and upstream resolvers. This keeps the configuration consistent and reduces copy/paste drift. Because every environment is different, we will first print the server’s current IPs so we can choose the correct internal CIDRs.

ip -br addr | awk '{print $1, $3}'

We now have a quick view of interface-to-IP mappings. Next, we will set shell variables for internal networks and upstream resolvers. These upstream resolvers must be internal enterprise DNS servers (for example, corporate forwarders in core networks or data centers).

INTERNAL_NET_1="10.0.0.0/8"
INTERNAL_NET_2="172.16.0.0/12"
INTERNAL_NET_3="192.168.0.0/16"

UPSTREAM_DNS_1="10.10.10.10"
UPSTREAM_DNS_2="10.10.10.11"

We have now defined internal client ranges and upstream enterprise resolvers. In the next step, we will enforce these boundaries in BIND so recursion is only available to internal networks.

Step 4: Configure BIND as a secure caching resolver

Now we will configure named to behave like an enterprise caching resolver with controlled recursion, controlled listening, and safer defaults. We will also enable DNSSEC validation for integrity, and we will avoid exposing version details.

Before changing anything, we will back up the existing configuration. This gives us a clean rollback path.

cp -a /etc/named.conf /etc/named.conf.bak.$(date +%F_%H%M%S)

We have created a timestamped backup of /etc/named.conf. Next, we will write a complete, production-oriented configuration file.

cat > /etc/named.conf <<'EOF'
options {
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        recursing-file  "/var/named/data/named.recursing";
        secroots-file   "/var/named/data/named.secroots";

        /*
         * Enterprise posture:
         * - We provide recursion only to internal networks.
         * - We do not become an open resolver.
         * - We keep behavior predictable and observable.
         */

        listen-on port 53 { 127.0.0.1; any; };
        listen-on-v6 port 53 { ::1; };

        allow-query { any; };

        recursion yes;

        /*
         * Restrict recursion and cache access to internal networks only.
         * This is the core control that prevents open resolver exposure.
         */
        allow-recursion {
                127.0.0.1;
                10.0.0.0/8;
                172.16.0.0/12;
                192.168.0.0/16;
        };

        allow-query-cache {
                127.0.0.1;
                10.0.0.0/8;
                172.16.0.0/12;
                192.168.0.0/16;
        };

        /*
         * Forwarding to internal enterprise resolvers.
         * We avoid public resolvers by design.
         */
        forward only;
        forwarders {
                10.10.10.10;
                10.10.10.11;
        };

        /*
         * Security hardening.
         */
        dnssec-validation yes;
        auth-nxdomain no;
        minimal-responses yes;
        version "not disclosed";

        /*
         * Logging is handled via systemd/journald by default on RHEL,
         * but we keep named's internal files in /var/named/data.
         */
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
EOF

We have replaced /etc/named.conf with a controlled caching-resolver configuration. Recursion and cache access are restricted to RFC1918 ranges and localhost, and forwarding is locked to internal upstream resolvers. We also enabled DNSSEC validation and reduced response verbosity.

Align the configuration with our environment variables

The configuration above includes example internal networks and upstream resolvers. Now we will safely apply our shell variables to the file so the running configuration matches our environment. We are doing this with explicit substitutions to keep the change deterministic.

sed -i \
  -e "s|10.0.0.0/8|${INTERNAL_NET_1}|g" \
  -e "s|172.16.0.0/12|${INTERNAL_NET_2}|g" \
  -e "s|192.168.0.0/16|${INTERNAL_NET_3}|g" \
  -e "s|10.10.10.10|${UPSTREAM_DNS_1}|g" \
  -e "s|10.10.10.11|${UPSTREAM_DNS_2}|g" \
  /etc/named.conf

We have now applied our internal network ranges and upstream enterprise resolvers into /etc/named.conf. Next, we will validate the configuration before starting the service.
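
As an optional sanity check, we can confirm the substituted values actually landed in the file:

grep -nE -A6 'allow-recursion|allow-query-cache|forwarders' /etc/named.conf

The output should show our internal CIDRs and upstream resolver IPs rather than the example values.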

Step 5: Validate BIND configuration before starting

We will validate syntax and referenced files. This prevents the most common failure mode: a service that refuses to start because of a small configuration error.

named-checkconf -z /etc/named.conf

If the command returned no output and exited successfully, the configuration is syntactically valid and zone references are consistent. If it printed errors, we should fix them before moving on.

Step 6: Enable and start named, then verify it is listening

Now we will enable the service so it persists across reboots, start it, and confirm it is actually bound to port 53. We are verifying at the socket level because “active” is not the same as “reachable.”

systemctl enable --now named
systemctl status named --no-pager

named is now enabled and started. Next, we will confirm it is listening on DNS ports.

ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'

We have verified UDP and TCP listeners on port 53. If we do not see named bound to port 53, we likely have a port conflict or a startup failure, which we will address in troubleshooting.

Step 7: Configure firewalld for internal-only DNS access

Now we will enforce network-level control. Even though BIND is configured to restrict recursion, the firewall is our second line of defense. We will allow DNS only from internal networks and keep the service closed to untrusted sources.

First, we will confirm firewalld is running.

systemctl enable --now firewalld
systemctl status firewalld --no-pager

Firewalld is now enabled and running. Next, we will identify the active zone and the interface attached to it so we apply rules in the correct place.

firewall-cmd --get-active-zones
DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
echo "Default zone: ${DEFAULT_ZONE}"
firewall-cmd --zone="${DEFAULT_ZONE}" --list-interfaces

We now know which zone is active and which interfaces are in it. Next, we will add rich rules to allow DNS from internal networks only. We are using rich rules because they allow source CIDR restrictions cleanly.

firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_1} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_1} port port=53 protocol=tcp accept"

firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_2} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_2} port port=53 protocol=tcp accept"

firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_3} port port=53 protocol=udp accept"
firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --add-rich-rule="rule family=ipv4 source address=${INTERNAL_NET_3} port port=53 protocol=tcp accept"

firewall-cmd --reload

We have now allowed DNS traffic from internal networks to this server over both UDP and TCP. We did not open DNS broadly as a service, which helps prevent accidental exposure if the server ever becomes reachable from outside.

Next, we will verify the rules are present.

firewall-cmd --zone="${DEFAULT_ZONE}" --list-rich-rules

We should see rich rules permitting TCP/UDP 53 from the internal CIDRs. If they are missing, the reload may have failed or the zone selection may be wrong.
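
If we want to double-check which zone governs a specific interface, we can ask firewalld directly. The interface name below (eth0) is a placeholder; we should use the name reported by ip -br addr in Step 1.

firewall-cmd --get-zone-of-interface=eth0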

Step 8: SELinux considerations on RHEL

On RHEL, SELinux is part of the security model, not an obstacle to work around. For a standard caching resolver using default paths and ports, SELinux should allow named to run without custom policy. We will confirm SELinux mode and check for denials only if something fails.

getenforce
sestatus

We have confirmed SELinux status. If named is running and queries work, we do not need to change SELinux. If named fails unexpectedly, we will inspect audit logs in troubleshooting.
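
As an optional diagnostic, we can confirm that port 53 carries the expected SELinux port label using semanage (provided by the policycoreutils-python-utils package we installed in Step 2):

semanage port -l | grep -w dns_port_t

We should see TCP and UDP 53 listed under dns_port_t, which is what allows named to bind to the port under the default policy.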

Step 9: Verify resolution and caching behavior

Now we will test the resolver locally first. Local testing removes network variables and confirms that BIND is functioning as a caching forwarder to our internal upstream resolvers.

We will query a well-known domain and confirm we get an answer. Then we will repeat the query to observe improved response time, which indicates caching is working.

dig @127.0.0.1 example.com A +noall +answer +stats
dig @127.0.0.1 example.com A +noall +answer +stats

If the first query succeeds and the second query shows reduced query time, caching is functioning. If queries fail, the most likely causes are upstream reachability, firewall rules, or recursion restrictions.
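
For a more direct look at the cache, we can dump it to the file we defined in named.conf (/var/named/data/cache_dump.db) and search for the record. This sketch assumes the default rndc key that the RHEL named service generates on first start is in place:

rndc dumpdb -cache
grep -i "example.com" /var/named/data/cache_dump.db | head -n 5

Seeing the name in the dump confirms the answer is being served from cache.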

Verify recursion is restricted to internal networks

We also need to confirm we did not accidentally create an open resolver. The cleanest operational check is to ensure recursion is only allowed from internal networks. From a host outside the allowed CIDRs, recursion should be refused. From an internal host, it should succeed.

On the DNS server itself, we can confirm the access control lists are present in the active configuration by reviewing the file and ensuring it matches our internal CIDRs.

grep -nE 'allow-recursion|allow-query-cache|forwarders' /etc/named.conf

We have confirmed the key control points are present in the configuration. For a full validation, we should run a query from an internal client and, separately, ensure that untrusted networks cannot reach port 53 due to firewall policy.
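
As a sketch of the client-side check, we can run dig against this server's internal IP from another host. The address below (192.0.2.53) is a placeholder for our resolver's IP; from an allowed internal client we expect NOERROR with an answer, while from a host outside the allowed CIDRs we expect REFUSED, or a timeout if the firewall drops the traffic first.

dig @192.0.2.53 example.com A +noall +answer +comments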

Step 10: Make the server use itself for DNS safely

In many environments, we want the DNS cache server to use itself for name resolution. We will do this carefully to avoid locking ourselves out during remote sessions. The safest approach is to confirm local resolution works first (we already did), then update the system resolver configuration.

On RHEL, NetworkManager often manages /etc/resolv.conf. We will first check what is currently in use.

readlink -f /etc/resolv.conf
cat /etc/resolv.conf

We now know whether /etc/resolv.conf is managed and what nameservers are configured. Next, we will set the system to use 127.0.0.1 as the primary resolver via NetworkManager, which persists across reboots.

First, we will list active connections and pick the one in use.

nmcli -t -f NAME,UUID,DEVICE connection show --active

Now we will store the active connection name in a variable and apply DNS settings. This keeps the commands copy/paste safe.

CONN_NAME=$(nmcli -t -f NAME connection show --active | head -n1)
echo "Using connection: ${CONN_NAME}"

nmcli connection modify "${CONN_NAME}" ipv4.ignore-auto-dns yes
nmcli connection modify "${CONN_NAME}" ipv4.dns "127.0.0.1"
nmcli connection up "${CONN_NAME}"

The active connection is now configured to use the local BIND instance for DNS and to ignore automatically provided DNS servers. Next, we will verify the effective resolver configuration.

cat /etc/resolv.conf
dig example.com A +noall +answer +stats

We have confirmed the system resolver points to localhost and that name resolution still works. This change persists across reboots because it is stored in the NetworkManager connection profile.
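
If we want to confirm what is stored in the connection profile itself (rather than the generated resolv.conf), nmcli can print the relevant fields directly:

nmcli -g ipv4.dns,ipv4.ignore-auto-dns connection show "${CONN_NAME}"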

Operational checks we should keep in our runbook

In production, we want a small set of checks that quickly answer: “Is DNS up, is it reachable, and is it behaving securely?” These commands are safe to run repeatedly.

systemctl is-active named
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
firewall-cmd --get-active-zones
firewall-cmd --zone="$(firewall-cmd --get-default-zone)" --list-rich-rules
dig @127.0.0.1 example.com A +noall +answer +stats

We have a compact operational snapshot: service state, listening sockets, firewall posture, and a real query test.
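
If we prefer to keep this as a single script in the runbook, a minimal sketch could look like the following (the file name and test domain are our own choices, not requirements):

#!/usr/bin/env bash
# dns-health-check.sh - quick operational snapshot for the BIND cache
set -u
echo "== service state =="
systemctl is-active named
echo "== port 53 listeners =="
ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
echo "== firewall rich rules =="
firewall-cmd --zone="$(firewall-cmd --get-default-zone)" --list-rich-rules
echo "== test query =="
dig @127.0.0.1 example.com A +noall +answer +stats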

Troubleshooting

Symptom: named fails to start

Likely causes: syntax error in /etc/named.conf, missing included files, or port 53 already in use.

First, we will check service logs and validate configuration again.

systemctl status named --no-pager
journalctl -u named --no-pager -n 200
named-checkconf -z /etc/named.conf

If named-checkconf reports an error, we should fix the referenced line. If logs mention “address already in use,” we need to find what is bound to port 53.

ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'

Once the conflicting service is stopped or reconfigured, restarting named should succeed.

Symptom: dig to 127.0.0.1 works, but clients cannot resolve

Likely causes: firewall rules not applied to the correct zone, clients not in allowed CIDRs, or routing/VLAN path issues.

We will confirm firewall rules and ensure they match the internal networks.

DEFAULT_ZONE=$(firewall-cmd --get-default-zone)
firewall-cmd --zone="${DEFAULT_ZONE}" --list-rich-rules
firewall-cmd --get-active-zones

If the interface is not in the expected zone, we should move it to the correct zone or apply rules to the active zone. After adjusting, we reload firewalld and retest from a client.
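
If the interface does turn out to be in the wrong zone, moving it is a small, explicit change. The interface name below (eth0) is a placeholder for the name we captured in Step 1:

firewall-cmd --permanent --zone="${DEFAULT_ZONE}" --change-interface=eth0
firewall-cmd --reload
firewall-cmd --zone="${DEFAULT_ZONE}" --list-interfaces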

Symptom: queries return SERVFAIL

Likely causes: upstream resolvers unreachable, upstream refusing recursion, or DNSSEC validation failures due to upstream manipulation or broken chains.

We will first confirm upstream reachability on port 53.

UPSTREAM_DNS_1="10.10.10.10"
UPSTREAM_DNS_2="10.10.10.11"

timeout 3 bash -c "cat < /dev/null > /dev/tcp/${UPSTREAM_DNS_1}/53" && echo "TCP 53 reachable: ${UPSTREAM_DNS_1}" || echo "TCP 53 not reachable: ${UPSTREAM_DNS_1}"
timeout 3 bash -c "cat < /dev/null > /dev/tcp/${UPSTREAM_DNS_2}/53" && echo "TCP 53 reachable: ${UPSTREAM_DNS_2}" || echo "TCP 53 not reachable: ${UPSTREAM_DNS_2}"

If upstream TCP/53 is not reachable, we need to fix routing or upstream firewall policy. If upstream is reachable, we will query upstream directly to confirm it answers.

dig @${UPSTREAM_DNS_1} example.com A +noall +answer +stats
dig @${UPSTREAM_DNS_2} example.com A +noall +answer +stats

If upstream answers but BIND returns SERVFAIL, we will inspect logs for DNSSEC-related messages and consider whether upstream is compatible with DNSSEC validation in our path.

journalctl -u named --no-pager -n 200 | grep -iE 'dnssec|valid'

If logs indicate DNSSEC validation issues caused by upstream behavior, we should correct upstream DNS integrity rather than disabling validation. In tightly controlled enterprise environments, DNSSEC validation is part of the trust model.
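
To separate validation failures from plain forwarding failures, we can compare a normal query against one with DNSSEC checking disabled (+cd). If the +cd query succeeds while the normal one returns SERVFAIL, validation is the failing step and the upstream chain is what needs fixing:

dig @127.0.0.1 example.com A +noall +answer +comments
dig @127.0.0.1 example.com A +cd +noall +answer +comments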

Symptom: clients get REFUSED

Likely causes: client source IP not included in allow-recursion or allow-query-cache, or NAT makes clients appear from an unexpected range.

We will confirm the internal CIDRs in the configuration and reload named after any change.

grep -n "allow-recursion" -A20 /etc/named.conf
grep -n "allow-query-cache" -A20 /etc/named.conf
systemctl reload named
systemctl status named --no-pager

After reload, clients in the allowed ranges should be able to recurse. If NAT is involved, we must allow the post-NAT source range, not the original client range.
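
To see exactly which source addresses clients arrive from after NAT, we can capture a few DNS packets on the server. This assumes tcpdump is available (dnf -y install tcpdump) and that eth0 is replaced with our actual interface name:

tcpdump -ni eth0 -c 20 udp port 53

The source IPs in the capture are the ranges that must appear in allow-recursion and allow-query-cache.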

Symptom: SELinux denials prevent named from working

Likely causes: non-standard paths, custom logging locations, or unexpected file contexts.

We will check for recent AVC denials related to named.

ausearch -m avc -ts recent | tail -n 50
journalctl --no-pager | grep -i avc | tail -n 50

If denials exist, we should correct file locations and contexts to match RHEL expectations rather than weakening SELinux. For standard deployments using default paths, this is uncommon.
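
If a denial points at a mislabeled file under a standard path, restoring the default contexts is usually enough. This is a sketch for the default locations used in this guide:

restorecon -Rv /var/named /etc/named.conf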

Common mistakes

Mistake: Accidentally creating an open resolver

Symptom: external networks can query and receive recursive answers.

Fix: ensure allow-recursion and allow-query-cache are restricted to internal CIDRs and localhost, and ensure firewalld only allows port 53 from internal networks. Then reload named and firewalld.

named-checkconf /etc/named.conf
systemctl reload named
firewall-cmd --reload

We have revalidated configuration and reloaded both the DNS service and firewall policy.

Mistake: Forwarders set to the wrong upstream DNS

Symptom: dig returns timeouts or SERVFAIL, and logs show forwarding failures.

Fix: update the forwarders list to internal enterprise resolvers that are reachable from this server, then reload named and retest.

grep -n "forwarders" -A5 /etc/named.conf
systemctl reload named
dig @127.0.0.1 example.com A +noall +answer +stats

We have confirmed the forwarder configuration and validated resolution through the local cache.

Mistake: Firewall rules applied to the wrong zone

Symptom: local queries work, but internal clients time out.

Fix: identify the active zone for the interface and apply rich rules to that zone, then reload.

firewall-cmd --get-active-zones
firewall-cmd --get-default-zone
firewall-cmd --reload

We have confirmed zone state and reloaded policy. If the interface is not in the expected zone, we must correct zone assignment before rules will take effect.

Mistake: Another service is already bound to port 53

Symptom: named will not start, and logs mention “address already in use.”

Fix: identify the process bound to port 53, stop or reconfigure it, then start named.

ss -lunp | awk 'NR==1 || /:53[[:space:]]/ {print}'
ss -ltnp | awk 'NR==1 || /:53[[:space:]]/ {print}'
systemctl restart named
systemctl status named --no-pager

We have identified the conflict and restarted named. Once port 53 is free, BIND should bind successfully.

How we at NIILAA look at this

This setup is not impressive because it is complex. It is impressive because it is controlled. Every component is intentional. Every configuration has a reason. This is how infrastructure should scale — quietly, predictably, and without drama.

At NIILAA, we help organizations design, deploy, secure, and maintain internal and enterprise DNS platforms that hold up under real growth: segmented networks, multi-site routing, compliance constraints, and operational realities. We focus on resolver architecture, access control, logging strategy, change management, and long-term maintainability—so DNS stays boring, even when everything else is moving.
