SigParser parses millions of emails each month, so we’ve got a lot of experience and data on what email addresses look like. In this guide we’ll cover practical ways to validate email addresses. We’ll use regexes for the validation, and we’ll also cover some other strategies.
High Level Thoughts
It is incredibly difficult to build a good regex to handle all the validation scenarios.
For example, these are all valid email addresses:
- !@mydomain.net
- firstname.lastname@example.org
- email@example.com
- "test"@example.com
- ip@[IPv6:2002:DB8::1]
- nodot@xx
So you should decide how restrictive you need to be in your matching. If this is for a user signing up on your website and you’re going to email them a validation code, then you might not need to be too strict.
Super Basic Regex
Sometimes it’s good enough to check that someone typed an @ symbol. Every one of the examples above contains one, so this regex will match all of them. The @ sign is a super simple way to do some easy validation. Don’t overlook this as a valid option. You could also throw in some length validation, which is discussed below.
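As a minimal sketch of this check in Python, assuming the regex is nothing more than a search for a literal @ character (the function name `has_at_sign` is made up for illustration):

```python
import re

# Assumed pattern: a single literal "@" anywhere in the input.
AT_SIGN = re.compile(r"@")


def has_at_sign(value: str) -> bool:
    """Return True if the input contains at least one @ character."""
    return AT_SIGN.search(value) is not None


print(has_at_sign("nodot@xx"))        # True
print(has_at_sign("not-an-address"))  # False
```

A plain `"@" in value` test would behave the same way. The regex form is shown only to stay in step with the rest of this guide.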
Matches firstname.lastname@example.org but won’t match m@.com or .@..com.
This regex ensures the user typed at least one character before the @ and one after. It will match every one of the examples above, and it is fast. That matters, because one problem with more complex regex solutions is that they aren’t always fast.
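Here is a rough Python sketch of a check with that behavior. The pattern `.+@.+` is an assumption chosen to fit the description (at least one character on each side of the @), and the name `looks_like_email` is hypothetical:

```python
import re

# Assumed pattern: one or more characters, an @, then one or more characters.
BASIC_EMAIL = re.compile(r".+@.+")

EXAMPLES = [
    "!@mydomain.net",
    "firstname.lastname@example.org",
    "email@example.com",
    '"test"@example.com',
    "ip@[IPv6:2002:DB8::1]",
    "nodot@xx",
]


def looks_like_email(value: str) -> bool:
    """Return True if there is at least one character on each side of an @."""
    return BASIC_EMAIL.fullmatch(value) is not None


# Every example address from the list above passes.
for address in EXAMPLES:
    print(address, looks_like_email(address))

print(looks_like_email("@example.com"))  # False: nothing before the @
print(looks_like_email("user@"))         # False: nothing after the @
```

This assumed pattern does not look at what the surrounding characters are, so an input such as m@.com would also pass. That is the trade-off for being this permissive.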
This validation pattern is from the ASP.NET code base. It is complex, and some addresses like !@example.com won’t match it, but it will match most email addresses. You could use it, but why risk rejecting a valid email address?
Validating on length isn’t a bad idea either.
Here is some data on what we’ve seen for email address lengths for real people when parsing emails:
- The average email address is 22 characters long
- Email addresses are almost never shorter than 7 characters in the real world
- 99.99859% of email addresses are 8 characters or longer
If you built your email validation rule to validate that an email address is at least 7 characters, that would be a pretty good rule.
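A length rule like this takes only a few lines. The sketch below uses Python; the names `MIN_EMAIL_LENGTH` and `is_plausible_length` are hypothetical, and trimming surrounding whitespace before measuring is an added assumption:

```python
MIN_EMAIL_LENGTH = 7  # threshold suggested by the length data above


def is_plausible_length(value: str, minimum: int = MIN_EMAIL_LENGTH) -> bool:
    """Return True if the trimmed input has at least `minimum` characters."""
    # Assumption: leading/trailing whitespace should not count toward the length.
    return len(value.strip()) >= minimum


print(is_plausible_length("a@b.co"))             # False: 6 characters
print(is_plausible_length("email@example.com"))  # True: 17 characters
```

Since this only measures length, it is probably best paired with one of the @ checks rather than used on its own.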
If you’re curious, the average marketing/spam email address is 30 characters long.
Other Validation Strategies
Often you’re not validating an email address to see whether it is real, but to see whether it belongs to a human. One way to detect this is to look for some common patterns. For example, an automated sender’s email address will often have noreply somewhere in it.
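One possible way to express that pattern check in Python is sketched below. Only noreply comes from the text above; the spelling variants (no-reply, do-not-reply and so on) and the name `looks_automated` are assumptions added for illustration, so adjust the list to whatever you actually see in your own data:

```python
import re

# "noreply" comes from the text; the separator and "do not reply" variants
# are assumed additions.
AUTOMATED_PATTERN = re.compile(
    r"no[-_.]?reply|do[-_.]?not[-_.]?reply",
    re.IGNORECASE,
)


def looks_automated(address: str) -> bool:
    """Return True if the address contains a noreply-style marker anywhere."""
    return AUTOMATED_PATTERN.search(address) is not None


print(looks_automated("noreply@example.com"))   # True
print(looks_automated("jane.doe@example.com"))  # False
```

The search covers the whole address rather than just the part before the @, since the marker can show up somewhere other than the start.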