Misconfigured CORS, Stealing User Data From The Alexa 1M


I found a significant number of sites in the Alexa top 1 million that are misconfigured in a way that allows user data to be stolen.

Cross Origin Resource Sharing (CORS) lets websites share resources across origins by way of HTTP headers. The average case without CORS looks something like this. A page on example1.com requests the web root of example2.com. The user agent does not let the page read the response because that would violate Same Origin Policy.

If example2.com wanted to share with example1.com using CORS, the same request would look like this. Example1 sends an Origin header, and Example2 responds with an Access-Control-Allow-Origin header, which allows example1.com to do whatever it wants with the data. This is a pretty basic configuration and there are lots of additional things to consider in the real world: allowed request methods, whether or not credentials can be part of the request, and more.
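To make the exchange concrete, here is a minimal Go sketch of how example2.com might share with exactly one origin. The handler name, the `http://` scheme, and the response body are assumptions made for illustration; the request is replayed in memory, so nothing touches the network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// allowedOrigin is the single origin example2.com agrees to share with.
// The http:// scheme is an assumption for this sketch.
const allowedOrigin = "http://example1.com"

// shareWithExample1 adds the CORS response header only for the trusted origin.
func shareWithExample1(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Origin") == allowedOrigin {
		w.Header().Set("Access-Control-Allow-Origin", allowedOrigin)
		// The response differs per Origin, so caches must key on it.
		w.Header().Add("Vary", "Origin")
	}
	fmt.Fprintln(w, "hello from example2.com")
}

// corsHeaderFor replays a request with the given Origin against the handler
// in memory and returns the Access-Control-Allow-Origin value it answers with.
func corsHeaderFor(origin string) string {
	req := httptest.NewRequest(http.MethodGet, "http://example2.com/", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	shareWithExample1(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Printf("example1.com -> %q\n", corsHeaderFor("http://example1.com"))
	fmt.Printf("evil.com     -> %q\n", corsHeaderFor("http://evil.com"))
}
```

Only the trusted origin gets the header back; any other origin gets a response without it, and the browser keeps the calling page from reading the body.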

The Hypothesis

Same Origin Policy is the bedrock of web application security. Misconfiguring CORS headers has massive consequences, as we'll see. I hypothesized that a significant number of sites had CORS headers configured incorrectly, allowing user data to be stolen by malicious web pages.

There are two key requirements for a CORS misconfiguration to allow a malicious web page to steal user data. First, the victim service must return an Access-Control-Allow-Origin header that matches the Origin request header it receives. A wildcard, "*", is not useful for our purposes because browsers do not allow it to be combined with credentialed requests, and credentials are required to steal user-specific data. Second, the victim service must return an Access-Control-Allow-Credentials response header set to true. With these two response headers in place, an attack would look like this (on paper).

The Attack

I scanned the Alexa top 1m with a bash script and cURL. I specified a specially crafted Origin header.

Instead of sending the Origin header of evil.com, I sent the Origin header of $domain.evil.com. This turns out to trick a lot of websites into reflecting my Origin header and giving me the Access-Control-Allow-Origin header I needed. Lots of websites only checked whether their domain name appeared somewhere within the Origin header. This check isn't sufficient.
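A short Go sketch shows why an "is my domain in there somewhere" check fails. Both function names, the allowlist contents, and the `https://` scheme are assumptions for illustration; real sites implement this check in many different ways.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveAllow mimics the insufficient check: the site's own domain merely has
// to appear somewhere in the Origin header.
func naiveAllow(origin, siteDomain string) bool {
	return strings.Contains(origin, siteDomain)
}

// strictAllow accepts only origins that exactly match an allowlist entry.
func strictAllow(origin string, allowlist map[string]bool) bool {
	return allowlist[origin]
}

func main() {
	site := "example.com"
	allowlist := map[string]bool{
		"https://example.com":     true,
		"https://app.example.com": true, // hypothetical trusted subdomain
	}

	crafted := "https://" + site + ".evil.com" // the $domain.evil.com trick

	fmt.Println("naive check accepts crafted origin: ", naiveAllow(crafted, site))
	fmt.Println("strict check accepts crafted origin:", strictAllow(crafted, allowlist))
	fmt.Println("strict check accepts real origin:   ", strictAllow("https://example.com", allowlist))
}
```

An exact-match allowlist is the simplest safe design. Anything based on substrings or prefixes has to account for attacker-controlled labels on either side of the trusted name.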

I have no way to verify how much this little piece of trickery moved the numbers, but when making PoC web pages by hand I did notice that many websites, like flipboard, required the extra step of using their domain as the subdomain.
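For readers who prefer code to prose, a single probe of this kind can be sketched in Go. The scan itself used bash and cURL, so this is only an illustration, not the script that was run. The helper names, the `http://` scheme, and requesting the web root are all assumptions. The sketch builds the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
)

// attackerDomain is the domain the attacker controls.
const attackerDomain = "evil.com"

// craftedOrigin places the target's domain in front of the attacker's domain,
// e.g. example.com -> http://example.com.evil.com (scheme is an assumption).
func craftedOrigin(domain string) string {
	return "http://" + domain + "." + attackerDomain
}

// newProbe builds (but does not send) a GET request for the target's web root
// carrying the crafted Origin header.
func newProbe(domain string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, "http://"+domain+"/", nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Origin", craftedOrigin(domain))
	return req, nil
}

func main() {
	req, err := newProbe("example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL, "Origin:", req.Header.Get("Origin"))
}
```

Sending the request with an `http.Client` and inspecting the response headers would complete the probe; only run that against sites you are permitted to test.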

Here's what my shell script looked like.

Raw logs that have to be grepped are not really searchable, or understandable enough to pass along. Instead I wrote a tiny golang program, referred to above as respirator, that crudely reads in HTTP response headers and turns them into JSON if the response indicates a site that might be misconfigured. That way, at the end of my scan I had a big dump of sites to look at.

Here is a list of sites from the Alexa top 1 million that return both a reflected Origin header and an Access-Control-Allow-Credentials header set to true. Note that I added the domain name to each JSON entry of HTTP response headers as "x-hostname", but I added an extra dot to the end, which was a bug, not a feature.

The Conclusion & Open Questions

The conclusion is that many sites might have misconfigured CORS headers, which is a complete breakdown of the bedrock of web application security, Same Origin Policy. The specification for CORS and these headers is publicly available, but this is an inherently confusing topic with a lot of moving parts. Even I don't completely understand all the ins and outs of Same Origin Policy and CORS. There are lots of details in RFCs that get brushed over when using a technology.

This type of thing makes me wonder if the people writing RFCs should pass on making feature-rich RFCs in favor of dead simple RFCs that are less likely to lead RFC consumers into mistakes. Looking at newer RFCs like HPKP and HSTS, this isn't happening, as these RFCs provide a lot of opportunity for foot-shooting.

One strange pattern I noticed is that many of the possibly misconfigured websites, roughly 10% of them, also responded with a "Via: vegur" header. I did not investigate this further.

Just to note, this is not the only CORS configuration that is bad. Several people I talked to about this noted that internal APIs are oftentimes the worst CORS abusers. An unauthenticated API that replies with "Access-Control-Allow-Origin: *" means that an attacker who learns the internal domain name through a leaked URL can use employees of the company to pivot to internal assets. This is similar to drive-by router attacks that use CSRF. It also isn't necessarily a bad thing to reply with Access-Control-Allow-Origin headers.

Finally, I'll leave you with this. Security is contextual. Not all of the sites in the dump are misconfigured, but it's a safe bet that a bunch of them are. Traditional sites that require a username and password with this configuration are almost certainly misconfigured. Static pages displaying `man pages` or something might be misconfigured, but those don't matter.

ejj, Feb 2016