We just rolled out a big update to our system aimed at preventing XSS (cross-site scripting) attacks. For those who don't know, an XSS attack can happen whenever you let users send text to your server and then display that text on a page. Since version 1.1, ASP.NET has provided some built-in protection against XSS (request validation), but it is very limited. I disabled it as soon as it came out, because it just throws an ugly error message at any user who tries to put something as small as <b> in their text.
Microsoft's page framework, with ViewState and postbacks, provides some protection as well, since it makes the variables harder to tamper with. But I never liked that framework, so I did it the old-fashioned way: an edit.aspx page that submits its form to save.aspx using POST. This is how I get the variables:
string name = Request.Form["name"];
int id = int.Parse(Request.Form["id"]);
bool isTrue = (Request.Form["isTrue"] == "1");
The problem with this is that I had to clean every variable to prevent XSS, using something like my Util.ClearHtml(string) method:
string name = Util.ClearHtml(Request.Form["name"]);
int id = int.Parse(Util.ClearHtml(Request.Form["id"]));
bool isTrue = (Util.ClearHtml(Request.Form["isTrue"]) == "1");
That is a bit too much code for me, and since I'm lazy I almost never did it. On top of that, if somebody sent id=abc, int.Parse would throw a FormatException and the page would crash. This didn't bother me much, since it only happened when something spooky was going on, but it's ugly. And the ClearHtml method didn't do a very good job of cleaning possible XSS attacks out of the text anyway.
So here is my solution. I created an XssClear class with the following methods.
For getting values from a form submitted using POST:
public string F(string name); // returns null if the value is null, string.Empty if it fails to convert; the string is Trim()-ed
public int FInt(string name); // returns 0 if it fails to convert
public double FDouble(string name); // returns 0 if it fails to convert
public decimal FDecimal(string name); // returns 0 if it fails to convert
public bool FBool(string name); // returns false if it fails to convert
public DateTime FDateTime(string name); // returns DateTime.MinValue if it fails to convert
public string[] FArray(string name); // returns null if it fails to convert or the value is ""; "," is the default separator
public string[] FIntArray(string name); // returns null if it fails to convert or the value is ""; "," is the default separator
public XssClearResult FHtml(string name); // used when the user is allowed to insert HTML
For getting values sent using GET (the QueryString):
public string QS(string name);
public int QSInt(string name);
public double QSDouble(string name);
public decimal QSDecimal(string name);
public bool QSBool(string name);
public DateTime QSDateTime(string name);
public string[] QSArray(string name);
public string[] QSIntArray(string name);
public XssClearResult QSHtml(string name);
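To make the conversion rules above concrete, here is a minimal sketch of a few of these helpers, written against System.Collections.Specialized.NameValueCollection (the type of Request.Form and Request.QueryString) so it runs outside a web page. The class name FormReader, the extra form parameter and the rule that FBool accepts "1" or "true" are assumptions for illustration; this is not the XssClear source.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

// Hypothetical helper (not the XssClear source): a sketch of the conversion
// rules listed above, reading from a NameValueCollection such as Request.Form.
public static class FormReader
{
    // Trimmed string, or null when the key was not sent at all.
    public static string F(NameValueCollection form, string name)
    {
        string value = form[name];
        return value == null ? null : value.Trim();
    }

    // 0 when the value is missing or not an integer; TryParse never throws here.
    public static int FInt(NameValueCollection form, string name)
    {
        int result;
        return int.TryParse(F(form, name), NumberStyles.Integer, CultureInfo.InvariantCulture, out result) ? result : 0;
    }

    // Assumption: "1" or "true" (any case) means true; everything else is false.
    public static bool FBool(NameValueCollection form, string name)
    {
        string value = F(form, name);
        return value == "1" || string.Equals(value, "true", StringComparison.OrdinalIgnoreCase);
    }

    // DateTime.MinValue when the value is missing or can't be parsed.
    public static DateTime FDateTime(NameValueCollection form, string name)
    {
        DateTime result;
        return DateTime.TryParse(F(form, name), CultureInfo.InvariantCulture, DateTimeStyles.None, out result) ? result : DateTime.MinValue;
    }

    // null for a missing or empty value; otherwise the trimmed parts.
    public static string[] FArray(NameValueCollection form, string name, char separator = ',')
    {
        string value = F(form, name);
        if (string.IsNullOrEmpty(value)) return null;
        string[] parts = value.Split(separator);
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }
}
```

The key change from the original snippet is int.TryParse instead of int.Parse: id=abc now yields 0 instead of a FormatException. NameValueCollection also joins repeated keys with commas, which makes "," a natural default separator for FArray.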
All my pages inherit from a base page, BarnalandPage (an old name), which exposes these methods. So when I want to get a variable from the QueryString, this is how it's done:
string name = QS("name");
int id = QSInt("id");
bool isTrue = QSBool("isTrue");
This saves a lot of pain; errors in our system dropped dramatically within the first hours of running it like this.
To clean text of XSS attacks I do a few things. I don't have to think about the basic types (int, bool, DateTime, double, decimal): if the class can't convert a value, the default value is returned, so no markup can get through them. Plain text (F and QS) and HTML (FHtml and QSHtml) are the real trouble.
This is the method that cleans plain text, where HTML is not allowed:
private string Get(string key, string value, string def) {
if (value == null) return def;
//Let's check if we have already retrieved this key before
if (content.ContainsKey(key)) return (string) content[key];
//AntiXss is a library from Microsoft, seems to work fine
string s = AntiXss.HtmlEncode(Ingig.Util.ClearHTML(value)).Trim();
content.Add(key, s);
return s;
}
The AntiXss library seems to do its job well; it prevents every attack listed on the XSS cheat sheet.
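The caching-plus-encoding idea in Get can be tried in isolation. This sketch assumes System.Net.WebUtility.HtmlEncode as a stand-in for AntiXss.HtmlEncode, uses a Dictionary in place of the content field, and skips the Util.ClearHTML step, whose code isn't shown; the class name TextCleaner is hypothetical.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical stand-in for the Get method above (not the original code):
// encode a value once per key and cache the result.
public class TextCleaner
{
    // Plays the role of the 'content' field in XssClear.
    private readonly Dictionary<string, string> content = new Dictionary<string, string>();

    public string Get(string key, string value, string def)
    {
        if (value == null) return def;
        string cached;
        if (content.TryGetValue(key, out cached)) return cached;
        // WebUtility.HtmlEncode turns characters such as < > & and " into entities,
        // so any markup is displayed as text instead of being interpreted.
        string s = WebUtility.HtmlEncode(value).Trim();
        content.Add(key, s);
        return s;
    }
}
```

Because the cache is keyed by name only, the first result for a key wins; that is what makes calling FHtml("text") twice, as in the examples further down, cheap.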
The next job is to remove any possible XSS attack from HTML text. This is more complicated, since there are many possibilities, as you can see on the cheat sheet.
For this we use the method GetHtml(string key, string value), which looks like this:
public class XssClear {
....
....
....
XssClearResult result;
string pattern;
public XssClearResult GetHtml(string key, string value) {
if (value == null) return null;
if (content.ContainsKey(key)) return (XssClearResult) content[key];
result = new XssClearResult();
// change every decimal entity (&#NN;) to its character
// change every hex entity (&#xNN;, also zero-padded) and URL-encoded value (%NN) to its character
// this also turns encoded tabs, line feeds and carriage returns into real ones, which are removed below
// turn the US-ASCII trick ¼script¾ into <script> so the patterns below catch it
// remove /* ... */ comments
// remove tab | \r | \f (removing \n is disabled)
value = Regex.Replace(value, @"(&\#[0-9]{1,20};?)", new MatchEvaluator(XssClear.DecimalToString), RegexOptions.IgnoreCase);
value = Regex.Replace(value, @"(&\#x0*[0-9a-f]{1,6};?)|(%[0-9a-f]{2};?)", new MatchEvaluator(XssClear.HexToString), RegexOptions.IgnoreCase);
value = Regex.Replace(value, @"¼(/?script)¾", "<$1>", RegexOptions.IgnoreCase);
value = Regex.Replace(value, @"/\*[^*]*\*/", "", RegexOptions.IgnoreCase);
// value = Regex.Replace(value, @"(\n)*", "");
value = Regex.Replace(value, @"[\t\r\f]+", "");
DataTable dt;
//I have the database table XssRegex, which contains list of regex that prevent xss attacks, let's load it from db if not cached
if (HttpContext.Current.Cache["XssRegexList"] == null) {
//This uses my DbConnection class which connects to the database and saves alot of pain
DbConnection db = new DbConnection();
db.Sql = "SELECT regex FROM XssRegex";
dt = db.Query().Table;
HttpContext.Current.Cache.Insert("XssRegexList", dt);
} else {
dt = (DataTable) HttpContext.Current.Cache["XssRegexList"];
}
//Now we run through the regex and remove any thing that is a possible xss attack
//If we find a possible attack we remove it and add it into our error list
//and report what pattern it was that caused the error
Regex regex;
MatchCollection matches;
foreach (DataRow dr in dt.Rows) {
regex = new Regex(dr["regex"].ToString(), RegexOptions.IgnoreCase);
matches = regex.Matches(value);
if (matches.Count > 0) {
for (int i=0;i<matches.Count;i++) {
// encode the match first, then wrap it, so user text can never turn into markup
result.Errors.Add("<span class=\"red\">" + AntiXss.HtmlEncode(matches[i].Value) + "</span>");
result.Patterns.Add(AntiXss.HtmlEncode(dr["regex"].ToString()));
}
value = regex.Replace(value, "");
}
}
pattern = "<[a-z*^on]*\\s*([a-z^on]*=\"?[a-z]*\"?)|(?<Event>on[a-z]*[^=]*=[^ >]*)|";
pattern += "<[a-z*^on]*\\s*([a-z^on]*=\"?[a-z]*\"?)|(?<Event>seeksegmenttime[^=]*=[^ >]*)|";
pattern += "<[a-z*^on]*\\s*([a-z^on]*=\"?[a-z]*\"?)|(?<Event>fscommand[^=]*=[^ >]*)|";
pattern += "<[a-z]*\\s*[a-z]*=\"?([a-z^on]*=\"?[a-z]*\"?)|(?<Event>(mocha|livescript)[^:]*:[^ \">]*)\"?";
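// only this pattern is used: it overwrites the one built above, and RemoveOnEvent checks the attribute names itself
pattern = "<[a-z*^on]*\\s*((?<Attributes>[a-z]*=?\"?[^\">]*\"?)\\s*)*";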
regex = new Regex(pattern, RegexOptions.IgnoreCase);
matches = regex.Matches(value);
for (int i=0;i<matches.Count;i++) {
value = regex.Replace(value, new MatchEvaluator(RemoveOnEvent));
}
pattern = "<embed(\\s*\\w*\\s+|(\\w*=\"((?<Always>always)|[^\" ]*)\"))*";
regex = new Regex(pattern, RegexOptions.IgnoreCase);
matches = regex.Matches(value);
for (int i=0;i<matches.Count;i++) {
value = regex.Replace(value, new MatchEvaluator(RemoveAlwaysInScriptAccess));
}
result.Text = value.Trim();
content.Add(key, result);
return result;
}
//This is for removing AllowScriptAccess in embed tags
public string RemoveAlwaysInScriptAccess(Match m) {
// no "always" found: keep the matched text unchanged
if (m.Groups["Always"].Value.Trim() == "") return m.Value;
// encode the whole match first, then highlight the encoded "always", so user text can't become markup
result.Errors.Add(AntiXss.HtmlEncode(m.Value).Replace(AntiXss.HtmlEncode(m.Groups["Always"].Value), "<span class=\"red\">" + AntiXss.HtmlEncode(m.Groups["Always"].Value) + "</span>"));
result.Patterns.Add(AntiXss.HtmlEncode(pattern));
return m.Value.Replace(m.Groups["Always"].Value, "no");
}
public string RemoveOnEvent(Match m) {
string txt = m.Value;
for (int i=0;i<m.Groups["Attributes"].Captures.Count;i++) {
string temp = m.Groups["Attributes"].Captures[i].Value.ToLower();
if (temp.StartsWith("on") || temp.StartsWith("seeksegmenttime") || temp.StartsWith("fscommand") || temp.StartsWith("mocha") || temp.StartsWith("livescript")) {
// encode the whole match first, then highlight the encoded attribute, so user text can't become markup
result.Errors.Add(AntiXss.HtmlEncode(m.Value).Replace(AntiXss.HtmlEncode(m.Groups["Attributes"].Captures[i].Value), "<span class=\"red\">" + AntiXss.HtmlEncode(m.Groups["Attributes"].Captures[i].Value) + "</span>"));
result.Patterns.Add(AntiXss.HtmlEncode(pattern));
txt = txt.Replace(m.Groups["Attributes"].Captures[i].Value, "");
}
}
return txt;
}
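/* One trick for XSS is to write everything as decimal or hex values. These two methods (DecimalToString and HexToString)
change all decimal and hex values back into regular text; values that aren't valid characters are left as they are instead of throwing */
public static string DecimalToString(Match m) {
int code;
// TryParse rejects numbers too large for an int instead of throwing an OverflowException
if (!int.TryParse(m.Value.Substring(2).TrimEnd(';'), out code)) return m.Value;
return IsCodePoint(code) ? char.ConvertFromUtf32(code) : m.Value;
}
public static string HexToString(Match m) {
// strip the "&#x" / "&#X" or "%" prefix, the optional ";" and any leading zeros
string hex = (m.Value.StartsWith("%") ? m.Value.Substring(1) : m.Value.Substring(3)).TrimEnd(';').TrimStart('0');
int code = hex.Length == 0 ? 0 : Convert.ToInt32(hex, 16);
return IsCodePoint(code) ? char.ConvertFromUtf32(code) : m.Value;
}
// char.ConvertFromUtf32 throws for values above 0x10FFFF and for surrogates
private static bool IsCodePoint(int code) {
return code >= 0 && code <= 0x10FFFF && (code < 0xD800 || code > 0xDFFF);
}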
}
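To show the order of operations in GetHtml outside of ASP.NET, here is a much-reduced, self-contained sketch: decode numeric entities first (tolerating leading zeros and leaving invalid code points alone), then run each pattern from a list, recording and removing every match. The class name HtmlBlacklistSketch and its three patterns are hypothetical examples, not rows from the XssRegex table, and %NN decoding is left out for brevity.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical, much-reduced model of GetHtml (not the original code):
// decode numeric entities, then apply a list of "bad pattern" regexes,
// recording and removing every match.
public static class HtmlBlacklistSketch
{
    // &#60; / &#x3C; with optional ";" and optional leading zeros (&#x0000003C;).
    private static readonly Regex NumericEntity = new Regex(
        @"&#(?:x0*(?<hex>[0-9a-f]{1,6})|0*(?<dec>[0-9]{1,7}));?",
        RegexOptions.IgnoreCase);

    // Example patterns only; the real list is loaded from the XssRegex table.
    private static readonly Regex[] Patterns =
    {
        new Regex(@"<\s*/?\s*script[^>]*>", RegexOptions.IgnoreCase),
        new Regex(@"\s+on[a-z]+\s*=\s*(""[^""]*""|'[^']*'|[^\s>]+)", RegexOptions.IgnoreCase),
        new Regex(@"javascript\s*:", RegexOptions.IgnoreCase),
    };

    public static string Decode(string value)
    {
        return NumericEntity.Replace(value, m =>
        {
            int code = m.Groups["hex"].Success
                ? int.Parse(m.Groups["hex"].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture)
                : int.Parse(m.Groups["dec"].Value, CultureInfo.InvariantCulture);
            // Out-of-range values and lone surrogates stay as they are instead of throwing.
            bool valid = code <= 0x10FFFF && (code < 0xD800 || code > 0xDFFF);
            return valid ? char.ConvertFromUtf32(code) : m.Value;
        });
    }

    public static string Clean(string value, List<string> errors)
    {
        string text = Decode(value);
        foreach (Regex pattern in Patterns)
        {
            foreach (Match m in pattern.Matches(text))
            {
                errors.Add(m.Value); // what was removed, like result.Errors
            }
            text = pattern.Replace(text, "");
        }
        return text.Trim();
    }
}
```

Decoding first is the important design choice: without it, onerror=&#97;lert(1) would not look like script to the patterns. Like any blacklist, it only catches what its patterns describe.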
The result is returned in an XssClearResult:
public class XssClearResult {
private ArrayList errors;
private ArrayList pattern;
private string text;
public XssClearResult() {
this.errors = new ArrayList();
this.pattern = new ArrayList();
}
public ArrayList Errors {
get {return errors;}
}
public ArrayList Patterns {
get {return pattern;}
set {pattern = value;}
}
public string Text {
get {return text;}
set {text = value;}
}
}
So when I want to let a user send HTML over the wire, I simply do:
string text = FHtml("text").Text;
If I want to list all the possible XSS attacks the user inserted, I do:
ArrayList al = FHtml("text").Errors;
for (int i=0;i<al.Count;i++) {
Response.Write(al[i]);
}
As far as I can see, this prevents "any" known XSS attack, which is nice, because they used to be everywhere in our system. Why the quotes around "any"? Well, I don't do checks for the old Netscape 4 browsers.
There are some assumptions that I make:
- strings are always trimmed,
- number variables (int, double, decimal) return 0 if they fail to convert,
- boolean variables return false if they fail to convert,
- DateTime returns DateTime.MinValue if it fails to convert,
- if the value to be converted to an array is empty, null is returned.
You can still change the default value: in one case I needed FInt("id") to return -1 as the default, so the overload FInt("id", -1) was added.
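A default-value overload is a small addition on top of the TryParse approach. This sketch again assumes a NameValueCollection parameter and uses the hypothetical class name FormReaderDefaults; the actual FInt(name, def) lives on the base page and may look different.

```csharp
using System.Collections.Specialized;
using System.Globalization;

// Hypothetical sketch of an FInt(name, default) overload (not the original code).
public static class FormReaderDefaults
{
    // Returns defaultValue when the key is missing or the value is not an integer.
    public static int FInt(NameValueCollection form, string name, int defaultValue)
    {
        string value = form[name];
        int result;
        if (value != null && int.TryParse(value.Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out result))
        {
            return result;
        }
        return defaultValue;
    }

    // The one-argument version simply delegates with 0 as the default.
    public static int FInt(NameValueCollection form, string name)
    {
        return FInt(form, name, 0);
    }
}
```

Overloading keeps every existing FInt("id") call unchanged while letting the rare page that needs a sentinel such as -1 ask for it.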
You can download the XssClear class here. It includes XssClear, XssClearResult and a txt file with the regexes from the database. I hope you like it and can use it.