The very real threat of information disclosure through inadvertent exposure of sensitive files has been a constant source of woe for corporations and individuals alike. Despite the potential for serious repercussions, including legal ones, many webmasters, administrators and developers have struggled for years to contain this common issue. This article explores various manifestations of the problem, gives readers a glance at the modi operandi of real-world attackers trying to exploit it, and provides guidance on how to protect a website against file-based information leakage.
One of the most common examples of backup files exposing sensitive information may be that of the backup copy of a .php file. A server administrator planning to modify a configuration file, such as wp-config.php, may choose to create a backup copy with a similar name first – in this example, wp-config.bak. Although clearly not best practice, this exact behaviour can be observed in the wild on a regular basis.
While the original configuration file, with its .php extension, will be passed through the server’s PHP interpreter, the same cannot necessarily be said about the backup copy. Unless configured otherwise, many popular web servers will simply deliver a file with the extension .bak as is, exposing the .php file’s source code, configuration options, and – in the case of an actual WordPress configuration file – database credentials.
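To make the naming problem concrete, the following Python sketch lists a few backup names that a file such as wp-config.php might plausibly end up with. The suffix lists and the function name backup_name_candidates are illustrative assumptions rather than an exhaustive catalogue. Administrators could use a list like this to check their own document roots.

```python
from pathlib import PurePosixPath
from typing import List

# Illustrative, non-exhaustive naming conventions (assumptions, not a standard).
REPLACED_EXTENSIONS = [".bak", ".old", ".orig"]
APPENDED_SUFFIXES = [".bak", ".old", ".orig", "~"]


def backup_name_candidates(filename: str) -> List[str]:
    """Return plausible backup names for the given file name."""
    path = PurePosixPath(filename)
    candidates = []

    # Extension replaced: wp-config.php -> wp-config.bak
    if path.suffix:
        for extension in REPLACED_EXTENSIONS:
            candidates.append(str(path.with_suffix(extension)))

    # Suffix appended: wp-config.php -> wp-config.php.bak, wp-config.php~
    for suffix in APPENDED_SUFFIXES:
        candidates.append(filename + suffix)

    return candidates


print(backup_name_candidates("wp-config.php"))
```

None of the generated names end in .php, which is why a server with default settings would typically return such files as plain content instead of executing them.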
When temporary files cause permanent damage
While it’s certainly easy to blame the exposure of such files on human error alone, many similar cases have resulted from software making far-reaching decisions, such as the creation of temporary or backup files, on behalf of its users – often without their knowledge, let alone consent. A typical example is a text editor quietly creating backup copies of currently edited files in the same directory under an easily guessable file name, often by simply appending the tilde character (~) to the original file name. Even though most text editors tend to delete these files once they are deemed unnecessary, such functionality alone must not be relied upon given what’s at stake. Ultimately, the responsibility for the timely removal of sensitive files remains with the user.
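Since editors cannot be relied upon to clean up after themselves, a periodic sweep of the web root can help. The sketch below walks a directory tree and reports files whose names match a handful of editor leftover patterns. The pattern list, the function name find_editor_leftovers and the example path /var/www/html are assumptions that would need adapting to the editors and layout actually in use.

```python
import fnmatch
import os
from typing import Iterator

# Hypothetical patterns for editor leftovers: tilde backups, Vim swap files,
# and Emacs auto-save and lock files. Extend as needed.
TEMP_FILE_PATTERNS = ["*~", "*.swp", "*.swo", "#*#", ".#*"]


def find_editor_leftovers(root: str) -> Iterator[str]:
    """Yield paths below root whose file names look like editor leftovers."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in TEMP_FILE_PATTERNS):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    # Assumed document root; replace with the directory served by your web server.
    for path in find_editor_leftovers("/var/www/html"):
        print(path)
```

A script like this only reports candidates. Whether a given file can be safely deleted remains a decision for the person responsible for the site.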
Another common source of unintended information disclosure is versioning tools such as Git, which are used by novice and seasoned developers all over the world. Partial and even whole Git repositories have repeatedly found their way onto publicly accessible web servers and, as a result, into the hands of malicious actors.
While a proper and mature deployment cycle should not allow such data to reach a production system in the first place, experience has shown over and over again that a single mistake by programmers and administrators, who often find themselves working under relentless pressure, can be sufficient for entire .git directories to slip through and expose their contents to an audience far larger than anticipated.
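One way to catch such slips before they reach production is to check the deployment directory for version control metadata. The following sketch assumes the files to be published sit in a local directory. The names VCS_DIR_NAMES and find_vcs_directories are hypothetical, and the inclusion of .svn and .hg alongside .git is an assumption about other tools a team might use.

```python
import os
from typing import List

# Directory names used by common version control tools (assumed set; adjust as needed).
VCS_DIR_NAMES = {".git", ".svn", ".hg"}


def find_vcs_directories(root: str) -> List[str]:
    """Return paths of version control metadata directories found below root."""
    hits = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Iterate over a copy so matching entries can be pruned from the walk.
        for name in list(dirnames):
            if name in VCS_DIR_NAMES:
                hits.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # do not descend into the metadata itself
    return hits
```

A deployment script could call find_vcs_directories on the build output and abort if the returned list is not empty.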
Passive reconnaissance, active exploitation
Both malicious hackers and legitimate security researchers have been known to develop, distribute and employ tools crafted for the sole purpose of locating and extracting sensitive information from forgotten or unintentionally shared files. The resulting dangers are greatly exacerbated by the relative ease with which these tools can be used to discover file-based information disclosure.
Some of the resulting attacks do not rely on actively requesting sensitive files from the targeted web server, but instead make use of more passive forms of reconnaissance. This includes, but is not limited to, abuse of the various little-known operators supported by freely available search engines such as Google, Bing and Yandex. One such technique is the combined use of the site: and ext: operators, as shown in the following example:
This search query <https://www.google.com/search?q=site:testphp.vulnweb.com+ext:bak&hl=en&filter=0> will yield a list of files ending in “.bak”, an extension commonly associated with backup files, while restricting search results to the domain of the target, in this case testphp.vulnweb.com.
At the time of writing, the search results for this query consist of a link to a copy of the site’s index.php file, exposing its source code, followed by a copy of a common WordPress configuration file, exposing sensitive information such as database credentials.
To further expand on this example, readers will note the small triangle right next to the URI of each search result.
Clicking on it will open a menu offering access to a “Cached” version of the file <http://webcache.googleusercontent.com/search?q=cache:XaK4yx2VnxYJ:testphp.vulnweb.com/index.bak+&cd=1&hl=en&ct=clnk&gl=mt>, allowing for entirely passive access to its contents without leaving evidence of access in the log files of the targeted web server. This enables attackers to extract potentially sensitive information without having to connect to the server at all, leaving site administrators and breach investigators none the wiser.
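The combined site: and ext: query discussed in this section can also be assembled programmatically, for instance by administrators who want to review what search engines have indexed for their own domains. The sketch below only builds the search URL and does not send any request. The function name build_backup_search_url is hypothetical, and the hl and filter parameters are simply carried over from the example query above.

```python
from urllib.parse import urlencode


def build_backup_search_url(domain: str, extension: str = "bak") -> str:
    """Build a Google search URL combining the site: and ext: operators."""
    query = f"site:{domain} ext:{extension}"
    params = {"q": query, "hl": "en", "filter": "0"}
    # Keep ':' unescaped so the operators stay readable; spaces become '+'.
    return "https://www.google.com/search?" + urlencode(params, safe=":")


print(build_backup_search_url("testphp.vulnweb.com"))
```

Passing a different extension, such as "old" or "sql", would cover other file types, although which extensions are worth checking depends on the site in question.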
Beating the bots: Why early detection matters
Search engine caching adds another layer of complexity to the remediation of such issues. Not only must these files be identified and removed from the system exposing them – often a daunting task, considering their creation may have been unintended in the first place – but they also have to be deleted from various caches, including those of search engines. In practice, exposed passwords and similarly sensitive data have to be considered known to third parties, leading to time-intensive and costly follow-up investigations of other systems potentially affected by the leaked information.
Fortunately, Acunetix offers various checks for multiple variants of this common vulnerability class. This includes, but is certainly not limited to, the discovery of backup files, backup copies of both files and directories, temporary files, versioning and source control system data, and also more exotic causes of information disclosure such as PHP coredumps and phpMyAdmin SQL exports.