Broken Thoughts

Techknowledge

.Net Dojo: Data Validation

Validating data is an essential part of any application that takes user input. Without validation, users can enter whatever they feel like, which is especially bad when you need data in a specific format, such as an e-mail address.

To make things easy for myself, I keep all my common validation methods in one library, and I validate user input with regular expressions. While I do my validation on the backend, it is common practice to use validators to do much of the validation on the client. This saves the user from a postback to the server only to get an error back. I agree that this is a good practice, but for every JavaScript validation I do, I also validate on the server. The reason is that a user can download your HTML to their computer, strip out the validation, and post to your server. I tested this myself when working on websites for a certain polling and market research company. Because of this flaw, we made it standard practice to use only required field validators for client-side validation and to run all other validation on the backend.
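As a rough illustration of validating everything on the server regardless of what the client already checked, here is a minimal Python sketch. The rule table, the field names, and the `validate_form` function are hypothetical and are not taken from the library described above.

```python
import re

# Hypothetical rule table: field name -> (required, pattern or None).
RULES = {
    "name": (True, None),
    "age": (True, re.compile(r"[0-9]+")),
    "nickname": (False, re.compile(r"[a-zA-Z0-9_]+")),
}


def validate_form(fields):
    """Return a dict of field name -> error message; an empty dict means valid."""
    errors = {}
    for name, (required, pattern) in RULES.items():
        value = (fields.get(name) or "").strip()
        if not value:
            # Missing or blank: only an error when the field is required.
            if required:
                errors[name] = "required"
            continue
        # fullmatch requires the whole value to match the pattern.
        if pattern is not None and not pattern.fullmatch(value):
            errors[name] = "invalid format"
    return errors
```

The server runs this on every post, so a request sent with the client-side checks stripped out is rejected the same way.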

The most difficult validation of all is the e-mail validation. Over the years I have used several regular expressions, including at one time splitting the e-mail address on the @ and running two separate validations - one on the prefix and the other on the domain.
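The split approach can be sketched as follows in Python. The two patterns are assumptions that mirror the character sets discussed in this article; they are not the expressions I used at the time.

```python
import re

# Assumed patterns: prefix of alphanumerics, "_", "-", "."; domain of
# dot-separated labels ending in a label made of letters or hyphens.
PREFIX = re.compile(r"[a-zA-Z0-9_\-.]+")
DOMAIN = re.compile(r"([a-zA-Z0-9\-]+\.)+[a-zA-Z\-]+")


def is_valid_email_split(address):
    """Validate the prefix and the domain separately."""
    # rpartition splits on the last "@"; sep is "" when there is no "@".
    prefix, sep, domain = address.rpartition("@")
    if not sep:
        return False
    return bool(PREFIX.fullmatch(prefix)) and bool(DOMAIN.fullmatch(domain))
```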

When I worked at a popular market research company, we had difficulty building correct e-mail validation. The expression below isn’t the one we used. Another option we considered (one that I proposed) was to do a DNS lookup on the domain to make sure it exists. While this can be a very effective way of making sure e-mail addresses are valid, the company decided it didn’t want to look up the domain every time someone signed up on our sites. For one, you have to wait for the response to come back. We opted instead for validation via regular expressions and called it “good enough”.
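For reference, a domain lookup along those lines might look like the Python sketch below. It is only an approximation: it uses a plain address lookup through `socket.getaddrinfo` rather than a mail (MX) record query, and the `domain_resolves` function and its injectable `resolver` parameter are hypothetical names added so the check can be tried without a network.

```python
import socket


def domain_resolves(domain, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one address for the domain."""
    try:
        # This call blocks until the lookup answers or fails, which is the
        # waiting cost mentioned above.
        return len(resolver(domain, None)) > 0
    except (socket.gaierror, UnicodeError):
        # Lookup failure or a malformed host name. Note that a DNS outage
        # ends up here as well, so a False result is not proof that the
        # domain does not exist.
        return False
```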

Account functions, such as password resets, are typically done through e-mail, so if a user enters an invalid e-mail address, they will have to contact some sort of tech support or make a new account under a valid address. Another option for validating e-mails is the old “account validation” workflow, where a user must click a link sent to them in an e-mail. This can also be very effective in making sure users only register with valid e-mail addresses. Personally, the issue I see with validating e-mail by sending test messages or doing DNS lookups is that sometimes servers go down. I don’t want to inconvenience my users by telling them their normally valid e-mail is invalid.
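The “account validation” workflow depends on a token embedded in the e-mailed link. One possible way to produce and check such a token is sketched below in Python. The `SECRET_KEY` value and the `make_token` and `confirm` functions are hypothetical, and a real system would probably also want the token to expire.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it must never be sent to the client.
SECRET_KEY = b"replace-with-a-real-secret"


def make_token(email):
    """Derive the token that goes into the confirmation link for this address."""
    message = email.lower().encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def confirm(email, token):
    """Return True when the token from the clicked link matches the address."""
    expected = make_token(email).encode("ascii")
    # compare_digest compares in constant time to avoid leaking the token.
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

The account is only marked as confirmed once `confirm` returns True for the address and the token taken from the link.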

This regular expression, however, is the holy grail of e-mail validation. It allows the part after the @ to be either a domain name or an IP address, and limits each octet of the IP address to 0-255. The only limitation of this particular expression concerns the prefix, which can *technically* be almost anything in a real address. If you own your own mail server, your mail address could be !#%#$%!(*&)(*$@mydomain.com if you so desired, but very few websites would let you register using it. This regex lets the prefix contain alphanumerics, as well as underscores (_), dashes (-), and periods (.). This covers the vast majority of the e-mail addresses users would have when registering for your website.

E-mail address

^([a-zA-Z0-9_\-\.])+@(((25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])\.(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9]))|((([a-zA-Z0-9\-])+\.)+([a-zA-Z\-])+))$
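Here is a hedged Python sketch of applying an expression of this shape. The octet sub-pattern is written out as explicit ranges (250-255, 200-249, and 0-199 with optional leading zeros), which is my assumption about the intended 0-255 limit. The constant names and the `is_valid_email` function are hypothetical.

```python
import re

# One IP octet, 0-255: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"
IP_ADDRESS = OCTET + r"\." + OCTET + r"\." + OCTET + r"\." + OCTET
# Dot-separated labels; the final label may only contain letters or hyphens.
DOMAIN = r"(([a-zA-Z0-9\-])+\.)+([a-zA-Z\-])+"
EMAIL = re.compile(
    r"([a-zA-Z0-9_\-\.])+@((" + IP_ADDRESS + r")|(" + DOMAIN + r"))"
)


def is_valid_email(address):
    """Return True when the whole string matches the e-mail expression."""
    # fullmatch anchors both ends, so the ^ and $ anchors are not needed
    # and a trailing newline is rejected.
    return EMAIL.fullmatch(address) is not None
```

For example, `is_valid_email("john.doe@example.com")` and `is_valid_email("john_doe-1@192.168.1.1")` are accepted, while `is_valid_email("user@256.1.1.1")` is rejected.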

Another commonly needed validation is numeric validation. I split it into three pieces: integers, decimals, and numeric (which accepts both).

Numeric

^([-][0-9]*[.][0-9]+|[.][0-9]+|[-.][0-9]+|[0-9]*[.][0-9]+|[-][0-9]+|[0-9]+)$

Integer

^([-][0-9]+|[0-9]+)$

Decimal

^([-][0-9]*[.][0-9]+|[.][0-9]+|[0-9]*[.][0-9]+)$
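The three numeric checks could be wrapped as small helper functions, sketched here in Python. The function names are hypothetical. The sketch assumes that a decimal must contain a decimal point and that numeric means "integer or decimal".

```python
import re

# Integer: optional leading minus, then one or more digits.
INTEGER = re.compile(r"([-][0-9]+|[0-9]+)")
# Decimal (assumed): optional minus, optional digits, a point, then digits.
DECIMAL = re.compile(r"[-]?[0-9]*[.][0-9]+")


def is_integer(text):
    return INTEGER.fullmatch(text) is not None


def is_decimal(text):
    return DECIMAL.fullmatch(text) is not None


def is_numeric(text):
    # Numeric accepts anything that is either an integer or a decimal.
    return is_integer(text) or is_decimal(text)
```

With these helpers, "-7" is an integer and numeric but not a decimal, ".5" and "-.5" are decimals, and "5." or "1e5" fail all three checks.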

By adding server-side validation instead of relying only on JavaScript validation, you help protect your data from potential SQL injection and other harmful input. Client-side validation is great, but it is not secure, and is at best a mild deterrent. Err on the side of caution, take the slight performance hit, and double-check all your user input on the server. You’ll be glad you did if users ever try to abuse your system.

Special thanks to Colin for pointing out flaws with the expressions in the first version of this article.

February 5, 2008 - Posted by Broken Bokken | .Net

1 Comment »

  1. Excelent Work
    Saves time to code java script

    Comment by vitthal | February 7, 2008
