Back to Basics: Directory Traversal

Fastly Security Research Team, Fastly

Senior Security Researcher, Fastly

August 22, 2023

Back to Basics: A Primer
This is the second in a series of posts where we’ll cover the basics of different types of web application vulnerabilities and attacks. In each post, we’ll provide in-depth descriptions of the vulnerabilities, show how these vulnerabilities are exploited, discuss real-world examples, and discuss how to prevent them.

What is Directory Traversal?

Directory traversal, also known as “path traversal,” is a web application vulnerability that enables attackers to access unintended files on the underlying filesystem. Depending on how and where the traversal occurs, this could include reading arbitrary files on the web server or allowing write access to arbitrary files. This access could enable attackers to read sensitive data or files, modify application data, or take full control of the web server.

Traversal vulnerabilities are typically described based on whether they enable reading files or writing files. We’ll demonstrate the impact each can have in the following subsections.

Directory Traversal Allowing Arbitrary File Reads

Consider an application that allows a user to store photos and later retrieve them with a GET request using a filename parameter to specify which file to retrieve. If the application does not include protections against directory traversal and builds the file path using the provided filename parameter, it could be possible to retrieve arbitrary files from the underlying web server. Figure 1 shows this sequence in practice:
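As a rough code-level view of the same flow, the following Python sketch models the vulnerable pattern. It is not taken from any real application: the storage directory /var/www/app/photos and the function name build_photo_path are assumptions made for illustration, and posixpath.normpath is used only to show where the operating system would end up resolving the path (ignoring symlinks).

```python
import posixpath

# Hypothetical storage directory for uploaded photos (an assumption for this sketch).
PHOTO_DIR = "/var/www/app/photos"


def build_photo_path(filename: str) -> str:
    """Vulnerable: joins the user-supplied filename onto the base directory unchecked."""
    joined = posixpath.join(PHOTO_DIR, filename)
    # normpath collapses the ../ sequences the same way path resolution would.
    return posixpath.normpath(joined)


# A benign request stays inside the photo directory.
benign = build_photo_path("cat.jpg")

# A traversal payload climbs out of the photo directory to a system file.
malicious = build_photo_path("../../../../etc/passwd")
```

Here the benign filename resolves under the photo directory, while the traversal payload resolves to a file that has nothing to do with photos.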

There are still some limitations, even if the application is vulnerable to traversal as previously described. The attacker is limited by the permissions of the application, so applying a model of least privilege that restricts the application's access on the web server also restricts which files the attacker can reach through the vulnerability. This is one reason why traversal payloads frequently use /etc/passwd as their target: the file needs to be readable by all users.

There are other ways to sandbox and limit application access, which we’ll discuss further in the prevention section.

Directory Traversal Allowing Arbitrary File Writes

Consider this same photo-storing application, but it now allows the user to name each of the photos they’re storing. When saving the file to disk, the application uses the provided name to build the file path for the photo file. Unless there are sufficient protections, an attacker can add traversal sequences (e.g., ../) to the provided name, controlling which directory the file is stored in.

While this may seem innocuous, since it’s just storing a photo, there are ways to further the exploitation. If the file type is not validated (e.g., JPEG, PNG), an attacker could upload any file type, leading to several different exploitation scenarios. Some possible exploitation examples in this scenario:

- Adding the attacker’s public key to a user’s authorized_keys file (e.g., /root/.ssh/authorized_keys) to gain persistent access
- Overwriting application files to modify application behavior
- Uploading a web shell within the web root
- Causing a denial of service by overwriting needed system files
- Uploading executable files (e.g., malware, ransomware)

Depending on the access level of the application, the impact of an arbitrary file write from a directory traversal can be devastating.
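A minimal sketch of how a user-supplied photo name decides where a file gets written. The upload directory, the web root location, and both function names are hypothetical; no file is actually written, the code only computes the destination path.

```python
import posixpath

# Hypothetical upload directory (an assumption for this sketch).
UPLOAD_DIR = "/var/www/app/photos"


def destination_for(photo_name: str) -> str:
    """Vulnerable: the user-chosen name is joined onto the upload directory unchecked."""
    return posixpath.normpath(posixpath.join(UPLOAD_DIR, photo_name))


def stays_in_upload_dir(photo_name: str) -> bool:
    """Report whether the computed destination is still under the upload directory."""
    return destination_for(photo_name).startswith(UPLOAD_DIR + "/")


# An ordinary name is stored where the application intends.
intended = destination_for("holiday.jpg")

# Traversal sequences let the name choose the directory instead.
ssh_keys = destination_for("../../../../root/.ssh/authorized_keys")
web_shell = destination_for("../public/shell.php")  # /var/www/app/public is a made-up web root
```

Whether such a write succeeds still depends on the permissions the application runs with, which is why the access level matters so much.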

Directory Traversal in the Wild

CVE-2023-2825: Directory Traversal in GitLab

In GitLab version 16.0.0, there’s a directory traversal vulnerability that allows for arbitrary file reads. Uploading a file as an attachment to an Issue in a default GitLab installation causes GitLab to store the file 10 directories deep, in the pattern shown below:

/var/opt/gitlab/gitlab-rails/uploads/@hashed/<directory>/<directory>/<directory>/<directory>/<filename>

After uploading, GitLab also provides an endpoint to retrieve the uploaded file at:

/<repo-name>/uploads/<file-id>/<filename>

In a request to that endpoint, GitLab does not sanitize or validate the filename parameter, which permits a directory traversal attack. To exploit the traversal, the repository must be nested within at least 5 groups, and the number of groups directly correlates with the number of directories you can traverse using the vulnerability.

In a standard installation, this means you need to nest the repository in 11 groups to be able to reach the root of the filesystem. In this scenario, an attack payload to retrieve the /etc/passwd file could look like the following:

GET /Group-1/Group-2/Group-3/Group-4/Group-5/Group-6/Group-7/Group-8/Group-9/Group-10/Group-11/<repo-name>/uploads/<file-id>/..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2f..%2fetc%2fpasswd
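The path arithmetic behind a payload like this can be sketched in Python. This models only URL decoding and path resolution. It is not GitLab's code and says nothing about how GitLab routes the request. The hashed directory names (aa/bb/cc/dd) and the function names are made up for the example, and the depth of 11 is illustrative.

```python
import posixpath
from urllib.parse import unquote

# Hypothetical upload directory, 10 levels deep as in the storage pattern above.
UPLOAD_DIR = "/var/opt/gitlab/gitlab-rails/uploads/@hashed/aa/bb/cc/dd"


def encode_traversal(depth: int, target: str) -> str:
    """Build a filename of `depth` ../ sequences plus the target, with / encoded as %2f."""
    return ("../" * depth + target.lstrip("/")).replace("/", "%2f")


def resolve(encoded_filename: str) -> str:
    """Decode the filename and resolve it relative to the upload directory."""
    decoded = unquote(encoded_filename)
    return posixpath.normpath(posixpath.join(UPLOAD_DIR, decoded))


payload = encode_traversal(11, "/etc/passwd")
resolved = resolve(payload)

# With too few ../ sequences the path never reaches the filesystem root.
too_shallow = resolve(encode_traversal(9, "/etc/passwd"))
```

The second call shows why the traversal depth matters: nine sequences from a directory ten levels deep stop one level short of the root.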

Unauthenticated attackers can only exploit the vulnerability if a public repository fulfills the nested groups requirement. This is unlikely, making exploitation more likely to come from an authenticated user with privileges to create nested groups and repositories to satisfy the exploitation requirements. A full exploit chain can first create the needed groups and repository, upload a file, and then exploit the traversal to read arbitrary files, as seen in the PoC here.

CVE-2022-48362: Directory Traversal in ManageEngine Desktop Central

In ManageEngine Desktop Central builds prior to 10.1.2127.1, a directory traversal in file upload functionality allowed for arbitrary file writes by manipulating the “computerName” parameter (or a few others) to include traversal sequences.

At the time of its discovery, this vulnerability was being actively exploited and combined with an authentication bypass (CVE-2021-44515) to enable remote code execution. For brevity, we’ll focus just on the directory traversal portion of the exploitation, as it enables a file write and bypasses loose validation.

In a function named “doPost”, Desktop Central handles several parameters as part of a file upload, including “computerName”, “filename”, and others. Two issues lead to a successful directory traversal that enables file writes:

- Only the filename parameter is checked for traversal sequences. Other parameters such as “domainName”, “computerName”, or “customerId” are used to build the absolute path for the file, but are not checked for traversal sequences. The “computerName” is the last parameter used in the concatenated string, so it is the ideal place to enter a traversal sequence.
- The file upload permits files with extensions of zip, 7z, and gz. At first glance, this may seem safe. However, because Desktop Central is a Java application, and JAR files are built on the zip format, this enables remote code execution. By uploading a zip file to the C:\Program Files\DesktopCentral_Server\lib directory and forcing a restart, a savvy attacker can overwrite class files in the application to include their own code and gain code execution.
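The incomplete check described above can be modeled in a few lines. This is a Python sketch of the flaw, not the product's actual Java code. The upload root, the parameter order before “computerName”, and the function name are assumptions, and ntpath is used so the Windows-style paths behave the same on any platform.

```python
import ntpath

# Hypothetical upload root; the real directory layout is not given in the text above.
UPLOAD_ROOT = r"C:\Program Files\DesktopCentral_Server\uploads"
ALLOWED_EXTENSIONS = {".zip", ".7z", ".gz"}


def build_upload_path(customer_id: str, domain_name: str,
                      computer_name: str, filename: str) -> str:
    """Models the flaw: only `filename` is checked for traversal sequences."""
    if ".." in filename or "/" in filename or "\\" in filename:
        raise ValueError("traversal sequence in filename")
    if ntpath.splitext(filename)[1].lower() not in ALLOWED_EXTENSIONS:
        raise ValueError("extension not allowed")
    # The remaining parameters are concatenated into the path without any check.
    joined = ntpath.join(UPLOAD_ROOT, customer_id, domain_name, computer_name, filename)
    return ntpath.normpath(joined)


# The filename passes validation, but computer_name redirects the write.
escaped = build_upload_path("1", "EXAMPLE", r"..\..\..\lib", "patch.zip")
```

Validating one parameter while trusting its neighbors leaves the path as attacker-controlled as if nothing had been checked.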

This vulnerability highlights the severe impact a traversal vulnerability can enable, while also showing how an incomplete prevention can still enable directory traversal.

Directory Traversal as Seen by WAFs

Directory traversal is one of the most commonly observed attack techniques, which we’ve highlighted in our Network Effect Threat Report from the second quarter of 2023.

This could be for several reasons, including the severity of the impact if successful, or the size of payload lists that attackers and scanners may use. For example, a snippet from one of the PayloadsAllTheThings directory traversal lists attempts to read the same file with varying depths of traversal, as shown below:

../../../../../../../../../etc/passwd
../../../../../../../../etc/passwd
../../../../../../../etc/passwd
../../../../../../etc/passwd
../../../../../etc/passwd
../../../../etc/passwd
../../../etc/passwd

In most cases, an attacker likely does not know where the application is located on the filesystem when testing for a traversal vulnerability. Path resolution won’t go past the operating system root (i.e., /), so extra “../” sequences are harmless and longer sequences help make sure the root is reached. However, there could be other limitations, such as on the input length, that make shorter sequences valuable to test as well in case longer ones are blocked or break application functionality.
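To make the depth argument concrete, here is a small Python sketch that generates a list like the one above and checks which entries reach the target from a given starting directory. The application directory /srv/www/app/uploads and the function names are assumptions. A real attacker would not know the directory, which is the reason for trying several depths.

```python
import posixpath

# Hypothetical application directory, 4 levels below the filesystem root.
APP_DIR = "/srv/www/app/uploads"


def depth_payloads(target: str, min_depth: int, max_depth: int) -> list[str]:
    """Generate payloads from the deepest traversal down to the shallowest."""
    return ["../" * depth + target for depth in range(max_depth, min_depth - 1, -1)]


def resolve_from_app_dir(payload: str) -> str:
    """Resolve a payload relative to the application directory."""
    return posixpath.normpath(posixpath.join(APP_DIR, payload))


payloads = depth_payloads("etc/passwd", 3, 9)

# Every payload with at least 4 ../ sequences reaches /etc/passwd; extras are ignored at the root.
hits = [p for p in payloads if resolve_from_app_dir(p) == "/etc/passwd"]
```

Only the shallowest payload misses here, because three sequences are not enough to climb out of a directory four levels deep.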

Other attack types, such as Cross-Site Scripting (XSS), can use payload polyglots that combine multiple techniques into a single payload. Traversal payloads generally can’t be combined this way, since each one encodes a specific depth, target file, and encoding, which leads attackers and scanners to send many more payloads to test for traversal than when testing for XSS. Our example file from PayloadsAllTheThings contains 140 payloads, compared to their XSS_Polyglots file which has 16. The largest payload file for traversal in PayloadsAllTheThings has over 21,000 entries, compared to the largest for XSS at over 600, SQLi at over 400 (though one should assume this would be larger in practice if the database type is unknown, since multiple types would need to be tested), and OS command injection at over 400.

However, the severity of the impact alone makes directory traversal a valuable target for attackers, as demonstrated in the above discussion of CVE-2022-48362, which enabled remote code execution and was known to be exploited at the time of discovery.

Preventing Directory Traversal Vulnerabilities

There are several strategies that can be used to prevent traversal vulnerabilities, including design changes to prevent building file paths with user input, strict validation of input, using path canonicalization, and limiting application access. We’ll cover each of these below.

Prevent building file paths with user input

Rather than using user input to build the file path, consider using a corresponding file id or name to reference the file. Then, map file ids to their corresponding storage path. When a user requests this file or uploads a file, they can only control its id, preventing the user from ever accessing the contents used to build the file path. This removes the possibility of traversal entirely, as the user’s input is no longer used to build the file path of the retrieved file.
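A minimal sketch of the id-to-path mapping, assuming an in-memory dictionary in place of a real database table. The class name PhotoStore, its methods, and the /srv/photos directory are hypothetical.

```python
import uuid


class PhotoStore:
    """Maps opaque file ids to server-chosen storage paths (in memory for brevity)."""

    def __init__(self, base_dir: str) -> None:
        self._base_dir = base_dir
        self._paths: dict[str, str] = {}

    def register(self, extension: str = "jpg") -> str:
        """Create an id and a storage path; `extension` is chosen by the server, not the user."""
        file_id = uuid.uuid4().hex
        self._paths[file_id] = f"{self._base_dir}/{file_id}.{extension}"
        return file_id

    def resolve(self, file_id: str) -> str:
        """Look up the stored path; unknown ids, including traversal strings, are rejected."""
        try:
            return self._paths[file_id]
        except KeyError:
            raise LookupError("unknown file id") from None


store = PhotoStore("/srv/photos")
photo_id = store.register()
photo_path = store.resolve(photo_id)
```

Because the user-supplied value is only ever used as a dictionary key, a string such as ../../etc/passwd is just an id that does not exist.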

Strict validation of user input

Strict validation is different from sanitization. Rather than attempting to remove traversal sequences (e.g., ../), which can be bypassed in innumerable ways, validate that only expected content is in user input and reject anything that does not pass the strict validation. Examples of validation that can help prevent directory traversal:

- Validating a filename only contains alphanumeric values
- Validating only a single . character is in the provided filename
- Validating unwanted characters are not included in the provided input (e.g., /, \)
- Validating the file type of an uploaded file (not by extension, which can be easily bypassed)
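The checks above could look roughly like this in Python. The length limits in the pattern and the function names are assumptions. The leading-bytes check is only a first signal of file type, not a full parser for either image format.

```python
import re

# ASCII letters and digits, exactly one dot, then a short alphanumeric extension.
# The length limits (64 and 8) are arbitrary choices for this sketch.
FILENAME_RE = re.compile(r"[A-Za-z0-9]{1,64}\.[A-Za-z0-9]{1,8}")

# Leading bytes ("magic numbers") of the PNG and JPEG formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def is_valid_filename(name: str) -> bool:
    """Accept only names matching the whole allowlist pattern; reject everything else."""
    return FILENAME_RE.fullmatch(name) is not None


def detect_image_type(data: bytes):
    """Return 'png' or 'jpeg' based on the file's leading bytes, or None if unrecognized."""
    for kind, magic in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

Using fullmatch against an allowlist means slashes, backslashes, extra dots, and trailing newlines are all rejected without listing each bad character.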

Use path canonicalization language features for strict validation

Path canonicalization resolves a file path to its actual absolute path, resolving symlinks, ../ sequences, and other symbolic content. After getting a canonical path, you can verify that it still starts with your expected base directory (e.g., uploads/photos/), itself in canonical form and including the trailing separator so that a sibling directory such as uploads/photos-old/ does not pass the check. Some examples of functions to get a canonical path are:

- Java: getCanonicalPath
- PHP: realpath
- C: realpath
- ASP.NET: GetFullPath
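Python is not in the list above, but its os.path.realpath plays the same role, so the canonicalize-then-verify step can be sketched with it. The function name resolve_within is hypothetical. os.path.commonpath compares whole path components, which avoids the pitfall of a plain string-prefix check matching a sibling directory.

```python
import os


def resolve_within(base_dir: str, user_path: str) -> str:
    """Return the canonical path for user_path, or raise if it leaves base_dir."""
    base = os.path.realpath(base_dir)
    # realpath resolves symlinks and ../ sequences; an absolute user_path replaces base in join.
    candidate = os.path.realpath(os.path.join(base, user_path))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate
```

A caller would open only the returned path, for example resolve_within("uploads/photos", requested_name), and treat ValueError as a rejected request.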

Limit Application Access

In general, an application should only have access to the files and directories it needs in order to function properly. This helps in some cases of directory traversal by limiting the impact of the vulnerability if one is discovered. For example, a web application should never run as the root user, and should ideally be restricted to only the files it needs to serve the application.
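One small piece of this can be enforced by the application itself at startup. The sketch below refuses to continue when running as root on a Unix-like system. The function name refuse_root and its optional euid parameter are hypothetical. It complements filesystem permissions and sandboxing set up outside the application, and does not replace them.

```python
import os


def refuse_root(euid=None) -> None:
    """Raise if the effective user id is 0 (root)."""
    if euid is None:
        euid = os.geteuid()  # available on Unix-like systems only
    if euid == 0:
        raise RuntimeError("refusing to run as root; use a dedicated low-privilege user")
```

Calling refuse_root() before the server starts listening turns an accidental root deployment into an immediate, visible failure.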

References and Further Reading

Further reading:
- CWE-22
  - https://cwe.mitre.org/data/definitions/22.html
- Portswigger Web Security Academy
  - https://portswigger.net/web-security/file-path-traversal
Examples: