So, in previous posts, we saw what XSS is and how the attacks are executed. Keep in mind the nasty things like img tags that simply execute what is in their "src" attribute without any thought to whether or not there is a real image being returned. If a hacker puts inline script or a URI to an offsite script into the src attribute, then the script will be returned and executed in your browser.
Ow! So what can be done? Is all lost? No, of course not... settle down. First of all, the bad img tag needs to be on your site somehow. YOU won't put one there, so how else can it get there? Well, if you allow comments on your press releases or on photos on your site, or you allow users to supply input that gets stored or processed for redisplay on your site (comments, posts, discussion groups, user content, profiles, etc.), then you have to block XSS from happening.
Whitelist Data Entry
Go to those points of data entry into your system, and don't forget things like URLs that you process. Consider whitelisting all input. What is whitelisting? Well, you know that a blacklist is something you don't want to be on because it keeps you out of fun places. Anyone not on the list is allowed in -- but in our case, what if you forget to add the next new method of attack that comes out? And did you catch every way that the attack could be encoded to hide it and beat your filters? A whitelist is the list of acceptable input for a point of entry. Everything else is disallowed -- this protects you now AND in the future. And the maitre d' doesn't accept bribes.
Think about your site's contact form that asks for name and address info. Why accept 4000 characters in the city field? Why accept special characters in a zip/postal code field? Whitelist acceptable input for those fields.
- Limit your input
- Strongly type input where possible
- Do NOT rely on input constraints to save you
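A whitelist check for contact-form fields like these might look something like the minimal sketch below. It is written in Python rather than ASP.NET, purely to illustrate the idea. The field names, length limits, and patterns are assumptions made up for this example (the zip pattern only covers US-style codes), not rules from any particular library.

```python
import re

# Hypothetical whitelist rules: each field gets a pattern describing the
# ONLY input we are willing to accept, including its maximum length.
FIELD_RULES = {
    # Letters, spaces, periods, apostrophes, and hyphens; 1 to 50 characters.
    "city": re.compile(r"[A-Za-z .'\-]{1,50}"),
    # US-style ZIP: five digits, optionally followed by a hyphen and four more.
    "zip": re.compile(r"\d{5}(-\d{4})?"),
}


def is_allowed(field: str, value: str) -> bool:
    """Return True only if the whole value matches the field's whitelist."""
    rule = FIELD_RULES.get(field)
    if rule is None:
        # Unknown fields are rejected: anything not on the list stays out.
        return False
    return rule.fullmatch(value) is not None


print(is_allowed("city", "St. Paul"))                   # True
print(is_allowed("zip", "55101-1234"))                  # True
print(is_allowed("zip", "<script>alert(1)</script>"))   # False
print(is_allowed("city", "x" * 4000))                   # False
```

Note that the check describes what IS acceptable instead of trying to list every bad thing, which is the whole point of a whitelist.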
What To Do If The Whitelist Idea Isn't Possible
Another reason to not rely on input validation is that in some cases your whitelist is going to be too broad to stop some things from happening and/or you can't whitelist easily. Let's say you have a piece of code that looks at the querystring and spits some things out to the screen based on it:
This page is going to get data based on the region querystring variable, and then use that variable as the title of the graph of data. Sure, not a best-practice website right there, but you sometimes have to work with other people's code, dontcha? So the code grabs the region variable and pulls data, showing a message "No data found" if there is an error or if there is no data found. The logic then takes the region variable and puts it in a label control on the screen. Easy enough! Weeeelllll, let's put some script in the querystring and see what we get!
Clearly, this "region" will not return any data and will probably return an error of some sort. But what about when we take the variable and put it into a label control? Can you be sure it won't be put to the browser "as is" and executed as script?
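To make the risk concrete, here is a small sketch of what happens when a querystring value is dropped into the page unchanged. It uses Python as a stand-in for the page logic. The URL, the render_title helper, and the script payload are all made up for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical request URL: the "region" value carries a script tag
# (percent-encoded, the way it would travel in a querystring).
url = "http://example.com/report?region=%3Cscript%3Ealert(1)%3C%2Fscript%3E"


def render_title(region: str) -> str:
    """Naive rendering: the raw value goes straight into the markup."""
    return "<h1>Data for " + region + "</h1>"


# parse_qs undoes the percent-encoding, much as a web framework would
# before handing the value to your page code.
region = parse_qs(urlparse(url).query)["region"][0]
html_out = render_title(region)
print(html_out)
```

The markup that comes out contains a live script tag, and the browser receiving it has no way to tell that tag apart from markup the page author wrote.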
What if the web page takes in a comment about the website and upon postback says "Thank you for your feedback. The following has been sent to our staff: your comment here."? And you want to let people put HTML in the field so they can bold their nasty words and up the font size when they are yelling at you. What if someone puts the above script tag in that field? You get the idea. People can and will misuse your forms to put script in there. Will the script attack you? Maybe. Could be setting up shop to use your server as a member of a botnet... or opening a gate to distributing malware to attack Land's End. Who knows? Anyway, whitelisting won't help you in some cases.
Encode Your Output To The Screen
One of the best (and only) ways to protect your web pages is to encode all output to your browser. Assume it has been tampered with or is user-generated (we never trust users) and put the proper encoding on all of it. Hey, let the input come in decoded and nasty, and then process it normally, but then encode it before you put it back to the screen. Like with the region example above, who cares if you try to query the database for a region with a script name? (It DOES matter, but we will cover this when we talk about SQL Injection.) So the database returns no records. Great! But if your page takes the input and does a Response.Write() or some other method of putting the raw input to the screen, you need to encode it.
To encode it, you may think to use
Whatever you use, encoded script won't run. Period. Doesn't matter what the pesky script would have done, it becomes plain old text that is output as is to the browser rather than executed.
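As a rough illustration of why encoded script is harmless, the sketch below HTML-encodes a value before writing it out. It uses Python's standard html.escape as a stand-in for whichever encoder you choose in ASP.NET. The render_label helper and the payload are hypothetical.

```python
import html


def render_label(value: str) -> str:
    """Encode the untrusted value, then place it in the markup."""
    # html.escape turns &, <, >, and both quote characters into entities.
    return "<span>" + html.escape(value) + "</span>"


payload = "<script>alert('xss')</script>"
print(render_label(payload))
# <span>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</span>
```

The browser displays the entities as the literal characters the user typed, so the visitor sees the text of the script tag on the page and nothing executes.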
ASP.NET's Default Protection Against XSS
The nice thing for all of us is that ASP.NET 1.1+ has built-in protection against XSS by looking for script tags or other dangerous things. If someone tries to put script in there, including ways where they try to obfuscate what they are doing, the ASP.NET engine sees it and gives you an error message like the following:
A potentially dangerous Request.QueryString value was detected from the client (txt="<SCRIPT SRC=http://h...").
This is turned on by default. Great, right? Well, we lovely developers hit a message like that and wonder "How can I turn that off so I can allow <b> tags in my content?" We see, "Oh! It is just a page-level attribute!" and turn it off. Sigh. The attribute is ValidateRequest and it is set to True by default. Sure, set it to False and then tags are allowed through... but hey, wake up! Tags are allowed through! You've just given hackers an open door. Find other ways to allow <b> tags. Which is better, a user complaining they can't bold something or a user complaining their social security number was stolen off your site? Your call.
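One such "other way" is to encode everything first and then selectively restore a tiny whitelist of harmless tags. The sketch below shows that idea in Python. It is not an ASP.NET API, and the function name and the choice of allowed tags are assumptions made for illustration.

```python
import html
import re

# Only these bare tags are restored; tags with attributes never match.
ALLOWED_TAGS = ("b", "i")

# Matches an ENCODED opening or closing tag such as &lt;b&gt; or &lt;/i&gt;.
_ENCODED_TAG = re.compile(
    r"&lt;(/?)(" + "|".join(ALLOWED_TAGS) + r")&gt;", re.IGNORECASE
)


def encode_allowing_basic_tags(text: str) -> str:
    """HTML-encode all input, then re-enable only whitelisted bare tags."""
    encoded = html.escape(text)
    return _ENCODED_TAG.sub(
        lambda m: "<" + m.group(1) + m.group(2).lower() + ">", encoded
    )


print(encode_allowing_basic_tags("<b>LOUD</b> <script>alert(1)</script>"))
# <b>LOUD</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the encoding happens first, anything that is not an exact match for a whitelisted tag (including a b tag carrying an onclick attribute) stays encoded and is shown as plain text.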
Remember that the default protection is only on the request so you still need to encode the response.
Also
Another reason for going only after the output to the browser is that the browser is where these scripts run. You do have to consider how your data is being used by other people/systems, however. Web input can go to an internal reporting system and that can be as big a problem.
One thing you might not have considered is error logging. Be careful! Don't throw the raw data put into your fields to your event log without encoding it! Think about it -- you drop the script into the event log and there is some internal PHP page your system admins have put in place to view errors called from your site. BAM...script runs in THEIR browser and it might be silent and behind the scenes. Your site is fully protected, your input is validated, your output is encoded, you don't allow scripts to affect your processing, etc. But you dropped the script into a data store (event log) that can be used by other groups who don't have all that security in place. In my opinion, your system admins need to be better about security...they don't have an excuse either. Just like you. But don't give someone a lit bomb either. You are the one in control of the data so you are responsible for it...and you are the one that should be held responsible for the script running on their machines. You handed them a lit bomb. Don't blame them for not disarming it.
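Encoding before logging can be as simple as the sketch below. It uses Python's standard logging module in place of the event log, writing to an in-memory stream so the result is easy to inspect. The logger name and the log_bad_input helper are hypothetical.

```python
import html
import io
import logging

# Hypothetical error logger; a real application would write to a file or
# event log instead of an in-memory stream.
log_stream = io.StringIO()
logger = logging.getLogger("site.errors")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def log_bad_input(field: str, raw_value: str) -> None:
    """Record rejected input with the untrusted value HTML-encoded."""
    logger.error("Invalid value for %s: %s", field, html.escape(raw_value))


log_bad_input("region", "<script src=http://example.com/x.js></script>")
print(log_stream.getvalue())
```

Whoever views the log in a browser later sees the script as text, even if their viewer page does no encoding of its own.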
Summary
That is really all there is to it. Blur your eyes on all my explanation and it boils down to three things:
- Validate and whitelist all input but don't rely on these alone
- Encode all output to the screen, plus to any data store that might have consumers of the data that don't/can't encode their output.
- Don't turn off default ASP.NET protection of ValidateRequest unless you plan on writing the code you need to whitelist and protect your input sources.
And doing all of these is easy. Use good validators from ASP.NET's toolbox, and use the Microsoft Anti-XSS Library to encode your output.
The next and last post on XSS will give you links and ideas on where to learn the details of how you perform the actual protection in your code.