Cloudflare is one of the biggest providers of content delivery network services in the world. On July 2, 2019, the company experienced a nearly complete service outage that affected all of its customers and lasted approximately half an hour. This unprecedented event was not the result of an attack but of a mistake in the configuration of the web application firewall (WAF). It also highlighted the danger associated with using certain types of regular expression engines.
Praise for Complete Transparency
Despite making a grave mistake, Cloudflare deserves praise for maintaining complete transparency. They set the example for how a serious enterprise should treat its customers in the case of a major outage, regardless of whether the outage was caused by a security incident or by human error.
The entire incident was described on the Cloudflare blog with complete details. The cause of the incident was a seemingly improbable combination of minor issues that together led to a major problem. In summary, a fatal configuration update to the web application firewall led to the saturation of resources. The fatal update made it to production and Cloudflare was unable to revert it immediately.
Fatal Regular Expression
Cloudflare is well known for their web application firewall, which is the first line of defense against zero-day attacks (but cannot be treated as the ultimate security solution). A web application firewall analyzes traffic before it reaches the web server, looks for patterns, and then blocks requests that match potential attack payloads. Like many WAFs, Cloudflare relies on regular expressions to build filtering rules.
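The following Python sketch illustrates the general idea of regex-based filtering described above. It is not Cloudflare's implementation: the rule patterns and the is_blocked function are hypothetical and deliberately simplistic.

```python
import re

# Hypothetical rules for illustration only; real WAF rule sets are far
# larger and more carefully tuned.
RULES = [
    re.compile(r"<script\b", re.IGNORECASE),           # naive script injection check
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),  # naive SQL injection check
]


def is_blocked(payload: str) -> bool:
    """Return True if any rule pattern occurs anywhere in the payload."""
    return any(rule.search(payload) for rule in RULES)


if __name__ == "__main__":
    for sample in ("id=42&sort=asc", "id=1 UNION SELECT password FROM users"):
        print(sample, "->", "blocked" if is_blocked(sample) else "allowed")
```

Every request is checked against every rule, so the cost of a single slow pattern is paid on all traffic.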
Regular expressions provide a powerful means to filter information for potential threats but they have their downsides. The engines that process them might require quite a lot of processing power to consider all possibilities. This was exactly what brought down Cloudflare. The engine that Cloudflare employs for their WAF uses a process called backtracking. Unfortunately, in the case of certain types of regular expressions, this process becomes extremely resource-intensive.
To see how a simple regular expression may cause resource exhaustion, you can use the Perl Regexp::Debugger module to test the /.*.*=.*;/ regular expression against the string x=xxxxxxxxxxxxxxxxxxxx. Increase the number of x’s to see how drastically the number of steps grows due to backtracking. An x= followed by 20 x’s already takes 5,353 steps to process!
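If Perl is not at hand, the growth can also be observed with a toy backtracking matcher. The Python sketch below is a simplified model, not the engine Cloudflare or Perl uses: count_steps and the token format are made up for this illustration, and its step counts will not equal the figures reported by Regexp::Debugger. It only shows that the work grows faster than the input.

```python
def count_steps(tokens, text):
    """Run a naive backtracking match anchored at the start of text.

    tokens is a list of ("star",) for a greedy '.*' or ("lit", ch) for a
    literal character. Returns (matched, steps), where steps counts every
    attempt the matcher makes, including the ones it later abandons.
    """
    steps = 0

    def attempt(token_index, pos):
        nonlocal steps
        steps += 1
        if token_index == len(tokens):
            return True  # every token has been satisfied
        token = tokens[token_index]
        if token[0] == "star":
            # Greedy: take as much as possible, then give characters back
            # one at a time whenever the rest of the pattern fails.
            for end in range(len(text), pos - 1, -1):
                if attempt(token_index + 1, end):
                    return True
            return False
        if pos < len(text) and text[pos] == token[1]:
            return attempt(token_index + 1, pos + 1)
        return False

    matched = attempt(0, 0)
    return matched, steps


# Token form of the pattern .*.*=.*;
PATTERN = [("star",), ("star",), ("lit", "="), ("star",), ("lit", ";")]

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        matched, steps = count_steps(PATTERN, "x=" + "x" * n)
        print(f"{n:>2} x's after '=': matched={matched}, steps={steps}")
```

Because the input contains no semicolon, the matcher has to try every way of splitting the text between the two leading .* tokens before it can give up, which is why doubling the number of x’s more than doubles the step count.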
There are three major web security lessons to be learned based on the Cloudflare outage:
- Several minor mistakes or omissions may come together to form a major problem. For example, Cloudflare team members could not authenticate to their own internal control panel because their credentials were revoked due to infrequent use.
- Web application firewalls must be configured carefully and must not be treated as the ultimate security solution. WAFs affect production systems directly, so if a WAF malfunctions, your customers may lose access to your systems.
- Regular expressions may cause Denial of Service conditions. If your web application processes regular expressions, you also need to make sure that a malicious payload cannot cause a resource overload through mechanisms such as backtracking.
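As one possible way to act on the last lesson, the Python sketch below combines two common precautions: capping the input length and rewriting the pattern so that adjacent wildcards cannot compete for the same characters. The names MAX_INPUT_LENGTH and matches_safely, and the 4096 limit, are assumptions made for this illustration. The rewritten pattern agrees with the original on single-line input only, since [^=] also matches newlines while . does not by default.

```python
import re

# Hypothetical limit; choose a value that fits the data you expect.
MAX_INPUT_LENGTH = 4096

# Pattern from the example above: adjacent '.*' groups can split the same
# characters in many ways, which causes heavy backtracking on failure.
RISKY = re.compile(r".*.*=.*;")

# Rewritten with negated character classes so each part has only one
# place to stop: at the first '=' and then at the first ';'.
SAFER = re.compile(r"[^=]*=[^;]*;")


def matches_safely(text: str) -> bool:
    """Apply the safer pattern, refusing oversized input up front."""
    if len(text) > MAX_INPUT_LENGTH:
        raise ValueError("input exceeds MAX_INPUT_LENGTH")
    return SAFER.match(text) is not None


if __name__ == "__main__":
    for sample in ("x=1;", "x=" + "x" * 20, "a=b=c;"):
        print(repr(sample), "->", matches_safely(sample))
```

Other options include enforcing a timeout around regex evaluation or using an engine that does not backtrack; which one fits depends on the application.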