Email Address Validation
© 2004 Rick Kelly
www.crooit.com

Preface

The example OPAL (Paradox® 9) code presented in this article is available as a download here. After downloading it into the folder of your choice, make that folder :WORK: and run the included script for a demonstration.

Introduction

An interesting aspect of email addresses is that there is no established set of rules for validating them. In developing an OPAL-based validator, there is a balancing act between how strict and how permissive the validation process should be. The approach taken here is not the only methodology that could be applied, but the process outlined does have a solid technical foundation in the published RFC standards for the SMTP and POP3 mail protocols.

Note that the validation checks syntax only. A logical follow-up would be to connect to the domain's mail server (found through its MX record via DNS) for additional validity checking. That connection takes time, so the developer has to weigh the application's requirements against the trade-offs before deploying it.

Email Address Syntax

The basic email address syntax structure is:

`<account@domain>`

The enclosing <> pair is optional, does not affect the general syntax, and is stripped out if found. A leading account, delimited by an @ character, precedes a domain. Both an account and a domain are required for a valid email address.

Account

After reviewing RFC 822 (http://www.faqs.org/rfcs/rfc822.html), it seems that the account portion can potentially contain a wide range of character values. Although RFC 822 does define some syntax rules for it, our validator only ensures that some account is present and that it is terminated by the last @ character found. This means that the account itself could contain @, and the account/domain separation has to be designed with that in mind.

The first step for our validator is to separate the account and domain portions using the last @ separator found.
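Before looking at the OPAL version, here is a minimal Python sketch of the same separation rule, handy for experimenting outside Paradox. It is an illustration, not part of the article's download. The function name separate_account_and_domain is hypothetical, and str.rpartition() plays the role of the reversed searchEx() search. The returned error codes follow the ones cmSeparateAccountAndDomain uses later in this article. Python's strip() removes all surrounding white space, which may differ slightly from OPAL's lTrim()/rTrim().

```python
def separate_account_and_domain(address: str) -> tuple[str, str, int]:
    """Split an email address at its LAST '@' (hypothetical helper).

    Returns (account, domain, error). error is 0 on success, otherwise:
      1 = no address, 2 = no '@', 3 = account missing, 4 = domain missing.
    """
    address = address.strip()
    if not address or address == "<>":
        return "", "", 1
    if "@" not in address:
        return "", "", 2
    # Remove an optional enclosing <...> pair, then trim again.
    if address.startswith("<") and address.endswith(">"):
        address = address[1:-1].strip()
    # rpartition() splits at the rightmost '@', so the account may contain '@'.
    account, _, domain = address.rpartition("@")
    if not account:
        return "", "", 3
    if not domain:
        return "", "", 4
    return account, domain, 0


print(separate_account_and_domain(" <odd@name@example.com> "))
# ('odd@name', 'example.com', 0)
```

Splitting at the rightmost @ in one call is the same effect the OPAL code gets by reversing the string, so an account such as odd@name survives intact.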
The OPAL methods breakApart() and searchEx() will be the main agents. The first obstacle is locating the last @ character. Since searchEx() scans from left to right, finding the last occurrence would seem to require a repetitive loop. Rather than loop, we reverse the entire email address so that a single searchEx() call locates the correct @ character. In effect, after reversal we are searching from right to left, which is exactly what we want in this case. Along the way, we also check for a missing or empty account and/or domain.

A generic string reversal procedure might look like:

```
Proc cmReverseString(var stInput String) String
;
; This function takes an input string and reverses it
;
var
    stOutput String
    liIndex  LongInt
    liSize   LongInt
endVar

    stOutput = blank()
    liSize = stInput.sizeEx()
    switch
        case liSize > 0 :
            for liIndex from liSize to 1 step -1
                stOutput = stOutput + stInput.substr(liIndex,1)
            endFor
    endSwitch
    return stOutput
endProc
```

One additional feature of the validator is that it returns LongInt error codes, which can be used to pinpoint the problem and build custom error messages.

Now that we have a string reversal procedure, the extraction and separation of the account and domain looks like this:

```
Proc cmSeparateAccountAndDomain(
    var stEmailAddress String,
    var stEmailAccount String,
    var stEmailDomain  String,
    var liError        LongInt) Logical
;
; Given an email address, separate and return the
; account and domain portions.
;
; The leading account is separated from the domain portion
; by the rightmost @ character.
;
var
    loReturn   Logical
    stAny      String
    liPosition LongInt
endVar

    loReturn = False
    liError = 0
    stEmailAccount = blank()
    stEmailDomain = blank()
    ;
    ; Strip leading and trailing white space
    ;
    stEmailAddress = stEmailAddress.rTrim()
    stEmailAddress = stEmailAddress.lTrim()
    switch
        ;
        ; Missing email address
        ;
        case stEmailAddress.isBlank() = True or stEmailAddress = "<>" :
            liError = 1
        ;
        ; @ account/domain separator found?
        ;
        case stEmailAddress.searchEx("@") = 0 :
            liError = 2
        otherwise :
            ;
            ; The address may be enclosed by < and >. If so, they
            ; are removed along with any leading or trailing
            ; white space inside them.
            ;
            switch
                case stEmailAddress.substr(1,1) = "<" and
                     stEmailAddress.substr(stEmailAddress.sizeEx(),1) = ">" :
                    stEmailAddress = stEmailAddress.substr(2,stEmailAddress.sizeEx() - 2)
                    stEmailAddress = stEmailAddress.rTrim()
                    stEmailAddress = stEmailAddress.lTrim()
            endSwitch
            ;
            ; To find the position of the last @ character, reverse
            ; the string and locate the first @ character. A match at
            ; position p in the reversed string is at position
            ; n - p + 1 in the original string of length n.
            ;
            stAny = cmReverseString(stEmailAddress)
            liPosition = stAny.searchEx("@")
            liPosition = stAny.sizeEx() - liPosition + 1
            switch
                ;
                ; If the @ character is at the beginning or end of the
                ; address, the address is invalid
                ;
                case liPosition = 1 :
                    liError = 3
                case liPosition = stAny.sizeEx() :
                    liError = 4
                otherwise :
                    stEmailAccount = stEmailAddress.substr(1,liPosition - 1)
                    stEmailDomain = stEmailAddress.substr(liPosition + 1,
                                        stEmailAddress.sizeEx() - liPosition)
                    loReturn = True
            endSwitch
    endSwitch
    return loReturn
endProc
```

Domain

At this point, we have separated the account and domain components and verified that both are present. Any leading or trailing white space and any enclosing <> characters have been removed.

The domain is where most of the validation work is performed. The domain consists of one or more sub-domains separated by single dots (.). Each sub-domain must contain one or more of the allowable characters 0-9, a-z and dash (-). The rightmost sub-domain is treated as the top level domain (TLD).

Domain syntax:

`subdomain.subdomain.TLD`

Each "subdomain" shown above is optional; only the TLD is required.
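The domain grammar above can be sketched in Python as follows. This is only an illustration, and the function name split_domain is hypothetical. Like the OPAL code, it lower-cases the domain first. It then rejects any empty label or any label that uses characters outside a-z, 0-9 and dash, and returns the sub-domains together with the candidate TLD. Checking that TLD against a list is a separate step, described next.

```python
import re

# One label: one or more of a-z, 0-9 and '-' (the article's character set).
LABEL = re.compile(r"[a-z0-9-]+")


def split_domain(domain: str) -> tuple[list[str], str]:
    """Split 'subdomain.subdomain.TLD' into ([sub-domains], TLD).

    Raises ValueError when a label is empty (e.g. '..') or uses
    characters outside a-z, 0-9 and '-'.
    """
    labels = domain.lower().split(".")
    for label in labels:
        if not LABEL.fullmatch(label):
            raise ValueError(f"invalid label {label!r} in {domain!r}")
    # The rightmost label is treated as the top level domain.
    return labels[:-1], labels[-1]


print(split_domain("Mail.Example.COM"))  # (['mail', 'example'], 'com')
```

A domain with no dots, such as mydomain, comes back with an empty sub-domain list. That is the case the required/optional TLD choice has to handle.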
The Internet Corporation for Assigned Names and Numbers (ICANN) maintains a list of TLDs at:

http://www.icann.org/tlds/

To maximize flexibility, our validator stores the TLDs it accepts in a standard Windows profile (*.ini) file. That file can be customized and maintained separately for each application:

```
[TLDS]
LastTLDId=257
1=ac
2=ad
...
257=biz
```

The TLDs are loaded into an Array of type String using the following procedures:

```
Type
    arString = Array[] String
endType

Proc cmReadINI(stINIFile String, stSection String, stKey String) String
    return readProfileString(stINIFile, stSection, stKey)
endProc

Proc cmRetrieveTLDValidators(var arTLD arString, stTLDFileName String)
;
; Parse the TLD.ini profile file for valid TLDs
; to validate against
;
var
    liTotalTLD LongInt
    liTLDID    LongInt
    stSection  String
    stTLD      String
endVar

    arTLD.empty()
    stSection = "TLDS"
    ;
    ; A missing or non-numeric LastTLDId loads no TLDs
    ;
    try
        liTotalTLD = longInt(cmReadINI(stTLDFileName,stSection,"LastTLDId"))
    onFail
        liTotalTLD = 0
        errorClear()
    endTry
    switch
        case liTotalTLD > 0 :
            for liTLDID from 1 to liTotalTLD
                stTLD = lower(cmReadINI(stTLDFileName,stSection,strval(liTLDID)))
                ;
                ; Skip blank or missing entries
                ;
                switch
                    case stTLD.isBlank() = False :
                        arTLD.addLast(stTLD)
                endSwitch
            endFor
    endSwitch
endProc
```

Each sub-domain and the TLD are also scanned to confirm that only the characters 0-9, a-z and dash (-) are present:

```
Proc cmDomainValidCharacters(var stDomain String) Logical
;
; Check that a domain or sub domain contains only the
; characters a-z, 0-9 or -
;
var
    loReturn Logical
    liIndex  LongInt
    stChar   String
endVar

    loReturn = True
    for liIndex from 1 to stDomain.sizeEx()
        stChar = stDomain.substr(liIndex,1)
        switch
            case (stChar >= "a" and stChar <= "z") or
                 (stChar >= "0" and stChar <= "9") or
                 stChar = "-" :
                ; Allowed character, keep scanning
            otherwise :
                loReturn = False
                quitLoop
        endSwitch
    endFor
    return loReturn
endProc
```

Our validator supports two modes. In one, a TLD is required. In the other, TLD validation is optional. For example, myname@mydomain is a valid email address when TLD validation is optional.
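The profile layout and the two modes can be prototyped in Python with the standard configparser module. This sketch is an illustration under assumptions, not the article's code, and load_tlds() and tld_accepted() are hypothetical names. load_tlds() follows the LastTLDId plus numbered-keys convention shown above. tld_accepted() skips the TLD lookup only for a single-name domain when TLD validation is optional.

```python
import configparser


def load_tlds(ini_text: str, section: str = "TLDS") -> set[str]:
    """Read TLDs stored as LastTLDId=N plus keys 1..N (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return set()
    try:
        last_id = int(parser.get(section, "LastTLDId"))
    except (configparser.NoOptionError, ValueError):
        last_id = 0  # like the OPAL try/onFail: a bad count loads nothing
    tlds = set()
    for tld_id in range(1, last_id + 1):
        tld = parser.get(section, str(tld_id), fallback="").strip().lower()
        if tld:  # blank or missing entries are skipped
            tlds.add(tld)
    return tlds


def tld_accepted(domain: str, tlds: set[str], tld_required: bool) -> bool:
    """Check the rightmost label of an already lower-cased domain."""
    labels = domain.split(".")
    if len(labels) == 1 and not tld_required:
        return True  # e.g. myname@mydomain on an internal network
    return len(labels) > 1 and labels[-1] in tlds


print(sorted(load_tlds("[TLDS]\nLastTLDId=2\n1=AC\n2=biz\n")))  # ['ac', 'biz']
```

In this sketch, tld_required=False corresponds to the optional mode: a bare name such as mydomain is accepted without consulting the TLD list.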
Making TLD validation optional is useful when internal email networks assume and append a TLD. Our domain validation procedure looks like this:

```
Proc cmDomainValidation(
    var stEmailDomain    String,
    var arSubDomains     arString,
    var stTopLevelDomain String,
    var arTLD            arString,
    var liError          LongInt,
    loSubDomainsRequired Logical) Logical
;
; Validate the domain portion of an email address
;
; Rules are:
;
;   1. The domain is split into one or more sub domains
;   2. Sub domains are always separated by a period ('.')
;   3. The sub domain valid character set is a-z, 0-9 and -
;   4. Each sub domain must be at least one character in size
;   5. The last sub domain might be a top level domain name
;
var
    loReturn       Logical
    stDomain       String
    arDomains      arString
    liIndex        LongInt
    stAny          String
    liTotalDomains LongInt
endVar

    loReturn = True
    arSubDomains.empty()
    stTopLevelDomain = blank()
    ;
    ; Check if the domain ends with a period
    ;
    switch
        case stEmailDomain.substr(stEmailDomain.sizeEx(),1) = "." :
            loReturn = False
            liError = 8
        otherwise :
            ;
            ; Put the raw domain portion in lower case to allow
            ; easier validation
            ;
            stDomain = stEmailDomain.lower()
            stDomain.breakApart(arDomains,".")
            liTotalDomains = arDomains.size()
            for liIndex from 1 to liTotalDomains
                stAny = arDomains[liIndex]
                switch
                    ;
                    ; An empty string means there were two or more
                    ; consecutive periods in the domain portion
                    ;
                    case stAny.isBlank() = True :
                        loReturn = False
                        liError = 5
                        quitLoop
                    ;
                    ; Validate domain character set values
                    ;
                    case cmDomainValidCharacters(stAny) = False :
                        loReturn = False
                        liError = 6
                        quitLoop
                    ;
                    ; Treat the last sub domain as a possible
                    ; top level domain
                    ;
                    case liIndex = liTotalDomains :
                        stTopLevelDomain = stAny
                    ;
                    ; Save the sub domain
                    ;
                    otherwise :
                        arSubDomains.addLast(stAny)
                endSwitch
            endFor
            ;
            ; Validate against the TLD list
            ;
            switch
                case loReturn = False :
                case arSubDomains.size() = 0 and loSubDomainsRequired = True :
                    loReturn = False
                    liError = 7
                ;
                ; No sub domains and none required (e.g. myname@mydomain):
                ; accept the single name without a TLD lookup
                ;
                case arSubDomains.size() = 0 :
                    liError = 0
                otherwise :
                    loReturn = arTLD.contains(stTopLevelDomain)
                    liError = iif(loReturn = True,0,8)
            endSwitch
    endSwitch
    return loReturn
endProc
```

With all the individual validation procedures defined, here is an example of how they might be used. First, a method that turns an error code into a description:

```
method ValidationErrorDescription(var liError LongInt) String
;
; Return validation error description
;
var
    stErrorMessage String
endVar

    switch
        case liError = 1 :
            stErrorMessage = "No email address was found."
        case liError = 2 or liError = 4 :
            stErrorMessage = "Email address domain is missing."
        case liError = 3 :
            stErrorMessage = "Email address account is missing."
        case liError = 5 :
            stErrorMessage = "Email address domain cannot be blank."
        case liError = 6 :
            stErrorMessage = "Email address domain can only use a-z, 0-9 and -."
        case liError = 7 :
            stErrorMessage = "Email address sub domain is missing."
        case liError = 8 :
            stErrorMessage = "Email address top level domain is invalid."
        otherwise :
            stErrorMessage = "Unknown (" + strval(liError) + ") email address validation error"
    endSwitch
    return stErrorMessage
endMethod
```

Putting everything together (the sample calls EmailAddressValidation(), which is not listed in the article, so an assumed minimal version is included):

```
;
; Assumed wrapper: EmailAddressValidation() is not listed in this
; article. This minimal version runs cmSeparateAccountAndDomain()
; and then cmDomainValidation().
;
Proc EmailAddressValidation(
    var stEmailAddress   String,
    var stEmailAccount   String,
    var arSubDomains     arString,
    var stTopLevelDomain String,
    var arTLD            arString,
    var liError          LongInt,
    loSubDomainsRequired Logical) Logical
var
    stEmailDomain String
endVar

    switch
        case cmSeparateAccountAndDomain(stEmailAddress, stEmailAccount,
                                        stEmailDomain, liError) = False :
            return False
    endSwitch
    return cmDomainValidation(stEmailDomain, arSubDomains, stTopLevelDomain,
                              arTLD, liError, loSubDomainsRequired)
endProc

;
; Example usage
;
var
    stEmailAddress   String
    stEmailAccount   String
    arSubDomains     arString
    stTopLevelDomain String
    arTLD            arString
    liError          LongInt
endVar

    cmRetrieveTLDValidators(arTLD, ":WORK:TLD.ini")
    stEmailAddress = "myname@myisp.net"
    switch
        case EmailAddressValidation(stEmailAddress, stEmailAccount, arSubDomains,
                 stTopLevelDomain, arTLD, liError, True) = False :
            msgStop("Email address validation failed",
                    stEmailAddress + "\n\n" + ValidationErrorDescription(liError))
        otherwise :
            msgInfo(stEmailAddress, "Email address format validated successfully")
    endSwitch
```

Conclusion

We now have procedures that provide basic syntax validation of email addresses. Use the parts that work for you, and if you add to or improve what is shown here, share it with the rest of us.

From my Paradox toolbox to yours!

Rick Kelly
The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation. Paradox is a registered trademark of Corel Corporation.

Modified: 09 Jun 2004

Copyright © 2001-2004 Paradox Community. All rights reserved. Company and product names are trademarks or registered trademarks of their respective companies. Authors hold the copyrights to their own works. Please contact the author of any article for details.