Paradox Community

Email Address Validation
© 2004 Rick Kelly
www.crooit.com

Preface

The example OPAL (Paradox® 9) presented in this article is available as a download here. After downloading into the folder of your choice, make that folder :WORK: and run the included script for a demonstration.


Introduction

An interesting aspect of email addresses is that there is no single established set of rules for validating them. In developing an OPAL-based validator, there is a balancing act between how strict and how permissive the validation process will be. The approach taken here is not the only one that could be applied, but it has a solid technical foundation in the published RFC standards for the SMTP and POP3 mail protocols. Note that the validation covers syntax only. A logical follow-up would be to connect to the domain's mail server (located through its DNS MX record) for additional validity checking. That connection takes time, so the developer must weigh requirements and trade-offs when deciding whether to deploy it.


Email Address Syntax

The basic email address syntax structure is:

<account@domain>

The bounding <> pair is optional and does not affect the general syntax; if found, it is stripped out.

A leading account, delimited by a @ character, precedes a domain. An account and domain are both required for a valid email address.

Account

After a review of RFC 822 (http://www.faqs.org/rfcs/rfc822.html), it seems that the account portion can contain a wide range of character values. Although there are some rules covering its syntax, our validator will only ensure that an account is present and that it is terminated by the last @ character found. This means that the account itself could contain @, and the procedure that separates the account from the domain has to allow for that.

The first step for our validator is to separate the account and domain portions at the last @ separator found. The OPAL methods breakApart() and searchEx() will be the main agents. The first obstacle is locating the last @ character. Since searchEx() scans from left to right, it might seem that some sort of repetitive loop is necessary. Rather than loop, we reverse the entire email address so that a single searchEx() call locates the correct @ character. In effect, after reversal, we are searching from right to left, which is exactly what we want in this case. Along the way, we will check for missing or empty account and/or domain segments.

A generic string reversal procedure might look like:
Proc cmReverseString(var stInput String) String
;
; This function takes an input string and reverses it
;
var
  stOutput String
  liIndex  LongInt
  liSize   LongInt
endVar
  stOutput = blank()
  liSize = stInput.sizeEx()
  switch
    case liSize > 0 :
      for liIndex from liSize to 1 step -1
        stOutput = stOutput
          + stInput.substr(liIndex,1)
      endFor
  endSwitch
  return stOutput
endProc 
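For readers who want to try the reverse-and-search idea outside Paradox, here is a minimal sketch in Python rather than OPAL. The function name last_at_position is hypothetical and is not part of the article's download. It mirrors the arithmetic used later in cmSeparateAccountAndDomain: find the first @ in the reversed string, then convert that back to a 1-based position in the original.

```python
def last_at_position(address: str) -> int:
    """Return the 1-based position of the last '@', or 0 if there is none."""
    reversed_address = address[::-1]
    # str.find is 0-based and returns -1 when absent, so +1 gives a
    # 1-based position with 0 meaning "not found" (like searchEx).
    first_in_reversed = reversed_address.find("@") + 1
    if first_in_reversed == 0:
        return 0
    # Convert the position in the reversed string back to the original.
    return len(address) - first_in_reversed + 1


print(last_at_position("odd@name@example.com"))  # position of the second @
```

In Python the same answer is available directly from address.rfind("@") + 1; the reversal is shown only because it parallels the OPAL technique.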
One additional feature in the validator will be the return of error codes of the LongInt Type that can be used to pinpoint the problem and build custom error messages.

Now that we have a string reversal procedure, the extraction and separation of the account and domain looks like:
Proc cmSeparateAccountAndDomain(
  var stEmailAddress String,
  var stEmailAccount String,
  var stEmailDomain String,
  var liError LongInt) Logical
;
; Given an email address, separate and return the
; account and domain portions.
;
; The leading account is separated from the domain portion
; by the rightmost @ character.
;
var
  loReturn   Logical
  stAny      String
  liPosition LongInt
endVar
  loReturn = False
  stEmailAccount = blank()
  stEmailDomain = blank()
;
; Strip leading and trailing white space
;
  stEmailAddress = stEmailAddress.rTrim()
  stEmailAddress = stEmailAddress.lTrim()
  switch
;
; Missing email address
;
    case stEmailAddress.isBlank() = True or
         stEmailAddress = "<>" :
      liError = 1
;
; @ account/domain separator found?
;
    case stEmailAddress.searchEx("@") = 0 :
      liError = 2
    otherwise :
;
; The address may be encapsulated by < and > and
; those will be removed if found and any leading
; or trailing white space removed.
;
      switch
        case stEmailAddress.substr(1,1) = "<" and
             stEmailAddress.substr(stEmailAddress.sizeEx(),1) = ">" :
          stEmailAddress = stEmailAddress.substr(2,stEmailAddress.sizeEx() - 2)
          stEmailAddress = stEmailAddress.rTrim()
          stEmailAddress = stEmailAddress.lTrim()
      endSwitch
;
; To determine the position of the last @ character, we
; will reverse the string, locate the first @ character
; and calculate the position.
;
      stAny = cmReverseString(stEmailAddress)
      liPosition = stAny.searchEx("@")
      liPosition = stAny.sizeEx() - liPosition + 1
      switch
;
; If the @ character is at the end or beginning of the address,
; the address is invalid
;
        case liPosition = 1 :
          liError = 3
        case liPosition = stAny.sizeEx() :
          liError = 4
        otherwise :
          stEmailAccount = stEmailAddress.substr(1,liPosition - 1)
          stEmailDomain = stEmailAddress.substr(liPosition + 1,
                          stEmailAddress.sizeEx() - liPosition)
          loReturn = True
      endSwitch
  endSwitch
  return loReturn
endProc
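The same separation logic can be sketched in Python (not OPAL) for quick experimentation. The function name and the tuple return shape are assumptions made for illustration; the error codes 1 through 4 follow the OPAL procedure above.

```python
def separate_account_and_domain(address: str):
    """Return (ok, account, domain, error) using the article's codes 1-4."""
    address = address.strip()
    if address == "" or address == "<>":
        return False, "", "", 1          # no email address at all
    if "@" not in address:
        return False, "", "", 2          # no account/domain separator
    if address.startswith("<") and address.endswith(">"):
        address = address[1:-1].strip()  # drop the optional <> pair
    # rpartition splits at the LAST '@', so the account may contain '@'.
    account, _, domain = address.rpartition("@")
    if account == "":
        return False, "", "", 3          # '@' is the first character
    if domain == "":
        return False, "", "", 4          # '@' is the last character
    return True, account, domain, 0


print(separate_account_and_domain("  <odd@name@example.com> "))
```

As in the OPAL version, an address consisting of a lone @ reports error 3, because the missing account is checked first.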
Domain

At this point, we have separated the account and domain address components and validated that both are available. Any leading or trailing white space and/or encapsulating <> characters have been removed. The domain is where a majority of the validation work is performed.

The domain consists of one or more sub-domains separated by single dots (.). Each sub-domain must contain one or more of the allowable characters 0-9, a-z and dash (-). The rightmost sub-domain is considered to be the top level domain (TLD).

Domain Syntax:

subdomain.subdomain.TLD

Each "subdomain" shown above is optional; only the TLD is required.

The Internet Corporation for Assigned Names and Numbers (ICANN) maintains a list of TLDs at:

http://www.icann.org/tlds/

To maximize flexibility for our validator, we will store the TLDs to validate against in a standard Windows profile (*.ini) file, which can be customized and maintained separately for each application.

[TLDS]
LastTLDId=257
1=ac
2=ad
...
257=biz

The TLDs are loaded into an Array of type String using the following procedures.
Type
 arString  = Array[] String
endType

Proc cmRetrieveTLDValidators(var arTLD arString,
                            stTLDFileName String)
;
; Parse the TLD.ini profile file for valid TLD's
; to validate against
;
var
  liTotalTLD LongInt
  liTLDID    LongInt
  stSection  String
  stTLD      String
endVar
  arTLD.empty()
  stSection = "TLDS"
  try
    liTotalTLD = longInt(cmReadINI(stTLDFileName,stSection,"LastTLDId"))
  onFail
    liTotalTLD = 0
    errorClear()
  endTry
  switch
    case liTotalTLD > 0 :
      for liTLDID from 1 to liTotalTLD
        stTLD = lower(cmReadINI(stTLDFileName,stSection,strval(liTLDID)))
        switch
          case stTLD.isBlank() = False :
            arTLD.addLast(stTLD)
        endSwitch
      endFor
  endSwitch
endProc
Proc cmReadINI(stINIFile String,stSection String,stKey String) String
  return readProfileString(stINIFile, stSection, stKey)
endProc
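A comparable loader can be sketched in Python (not OPAL) with the standard configparser module. The function name load_tlds is hypothetical, and the sketch takes the profile contents as text instead of a file name so that it runs on its own. Like the OPAL procedure, it treats a missing or non-numeric LastTLDId as "no TLDs" and skips blank entries.

```python
import configparser


def load_tlds(ini_text: str) -> list:
    """Return the lower-cased TLDs listed in the [TLDS] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section("TLDS"):
        return []
    try:
        # Option names are case-insensitive by default in configparser.
        total = parser.getint("TLDS", "LastTLDId", fallback=0)
    except ValueError:
        total = 0  # LastTLDId is present but not a number
    tlds = []
    for tld_id in range(1, total + 1):
        value = parser.get("TLDS", str(tld_id), fallback="").strip().lower()
        if value:
            tlds.append(value)
    return tlds


sample = "[TLDS]\nLastTLDId=3\n1=com\n2=NET\n3=\n"
print(load_tlds(sample))
```

Numbering gaps are tolerated in the same way as in the OPAL loop: an id with no entry, or with an empty value, simply contributes nothing.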
Each sub-domain and TLD will also be scanned to see that only the characters 0-9, a-z and dash (-) are found.
Proc cmDomainValidCharacters(var stDomain String) Logical
;
; Check that a domain or sub domain contains only the
; characters a-z, 0-9 or -
;
var
  loReturn Logical
  liIndex  LongInt
  stChar   String
endVar
  loReturn = True
  for liIndex from 1 to stDomain.sizeEx()
    stChar = stDomain.substr(liIndex,1)
    switch
      case (stChar >= "a" and stChar <= "z") or
           (stChar >= "0" and stChar <= "9") or
           stChar = "-" :
      otherwise :
        liIndex = stDomain.sizeEx()
        loReturn = False
    endSwitch
  endFor
  return loReturn
endProc
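The character check is simple enough to express as a set membership test. The following is a Python sketch (not OPAL) with a hypothetical function name. Like the OPAL procedure, it expects input that has already been lower-cased, and it reports an empty string as valid because there is no character to reject.

```python
ALLOWED_DOMAIN_CHARACTERS = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def domain_valid_characters(label: str) -> bool:
    """True when every character of label is a-z, 0-9 or '-'."""
    return all(ch in ALLOWED_DOMAIN_CHARACTERS for ch in label)


print(domain_valid_characters("my-isp9"))
print(domain_valid_characters("my_isp"))
```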
Our validator also supports two modes: one where a TLD is required, and one where TLD validation is optional, so that an address such as myname@mydomain is considered valid. This flexibility is useful when internal email networks assume and append a TLD. The mode is selected by the loSubDomainsRequired argument of the procedure below.

Our domain validation procedure looks like this:
Proc cmDomainValidation(
  var stEmailDomain String,
  var arSubDomains arString,
  var stTopLevelDomain String,
  var arTLD arString,
  var liError LongInt,
  loSubDomainsRequired Logical) Logical
;
; Validate the domain portion of an email address
;
; Rules are:
;
; 1. The domain is split into one or more sub-domains
; 2. Sub domains are always separated by a period ('.')
; 3. Sub domain valid character set is a-z, 0-9 and -
; 4. Each sub domain must be at least one character in size
; 5. The last sub domain might be a top level domain name
;
var
  loReturn       Logical
  stDomain       String
  arDomains      arString
  liIndex        LongInt
  stAny          String
  liTotalDomains LongInt
endVar
  loReturn = True
  arSubDomains.empty()
  stTopLevelDomain = blank()
;
; Check if the domain ends with a period
;
  switch
    case stEmailDomain.substr(stEmailDomain.sizeEx(),1) = "." :
      loReturn = False
      liError = 8
    otherwise :
;
; Put raw domain address portion in lower case to allow
; easier validation
;
      stDomain = stEmailDomain.lower()
      stDomain.breakApart(arDomains,".")
      liTotalDomains = arDomains.size()
      for liIndex from 1 to liTotalDomains
        stAny = arDomains[liIndex]
        switch
;
; If an empty string is found, there must have
; been two or more consecutive periods in the
; domain address portion
;
          case stAny.isBlank() = True :
            liIndex = liTotalDomains
            loReturn = False
            liError = 5
;
; Validate domain character set values
;
          case cmDomainValidCharacters(stAny) = False :
            liIndex = liTotalDomains
            loReturn = False
            liError = 6
;
; If this is last possible sub domain, treat it as
; a possible top level
;
          case liIndex = liTotalDomains :
            stTopLevelDomain = stAny
;
; Save sub domain
;
          otherwise :
            arSubDomains.addLast(stAny)
        endSwitch
      endFor
;
; Validate against TLD list
;
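      switch
        case loReturn = False :
        case arSubDomains.size() = 0 and loSubDomainsRequired = True :
          loReturn = False
          liError = 7
;
; A bare domain (no sub domains) is accepted as is when
; sub domains are optional; there is no TLD to look up
;
        case arSubDomains.size() = 0 :
          liError = 0
        otherwise :
          loReturn = arTLD.contains(stTopLevelDomain)
          liError = iif(loReturn = True,0,8)
      endSwitch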
  endSwitch
  return loReturn
endProc
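The domain rules can also be sketched in Python (not OPAL). The function name and return shape are assumptions made for illustration, and the error codes 5 through 8 follow the OPAL procedure. Where the prose and the listing could be read differently, this sketch follows the prose: when sub-domains are not required and none are present, the bare name is accepted without a TLD list lookup. Unlike the OPAL version, it returns empty results on failure instead of partially filled ones.

```python
def validate_domain(domain, known_tlds, sub_domains_required=True):
    """Return (ok, sub_domains, tld, error) using the article's codes 5-8."""
    allowed = set("abcdefghijklmnopqrstuvwxyz0123456789-")
    if domain.endswith("."):
        return False, [], "", 8              # trailing period
    labels = domain.lower().split(".")
    for label in labels:
        if label == "":
            return False, [], "", 5          # consecutive or leading period
        if not all(ch in allowed for ch in label):
            return False, [], "", 6          # character outside a-z, 0-9, -
    sub_domains, tld = labels[:-1], labels[-1]
    if not sub_domains:
        if sub_domains_required:
            return False, [], tld, 7         # only one label was supplied
        return True, [], tld, 0              # assumption: bare name accepted
    if tld not in known_tlds:
        return False, sub_domains, tld, 8    # TLD not in the list
    return True, sub_domains, tld, 0


print(validate_domain("mail.MyISP.net", {"net", "com"}))
```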
Now that we have all the individual validation procedures defined, here is an example of how they might be used.
method ValidationErrorDescription(var liError LongInt) String
;
; Return validation error description
;
var
  stErrorMessage  String
endVar
  switch
    case liError = 1 :
      stErrorMessage = "No email address was found."
    case liError = 2 or liError = 4 :
      stErrorMessage = "Email address domain is missing."
    case liError = 3 :
      stErrorMessage = "Email address account is missing."
    case liError = 5 :
      stErrorMessage = "Email address domain cannot be blank."
    case liError = 6 :
      stErrorMessage = "Email address domain can only use a-z, 0-9 and -."
    case liError = 7 :
      stErrorMessage = "Email address sub domain is missing."
    case liError = 8 :
      stErrorMessage = "Email address top level domain is invalid."
    otherwise :
      stErrorMessage = "Unknown ("
                     + strval(liError)
                     + ") email address validation error"
  endSwitch
  return stErrorMessage
endMethod
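Putting everything together (EmailAddressValidation is not listed in this article; it is assumed to be a wrapper that calls cmSeparateAccountAndDomain and then cmDomainValidation):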
var
  stEmailAddress    String
  stEmailAccount    String
  stEmailDomain     String
  arSubDomains      arString
  stTopLevelDomain  String
  arTLD             arString
  liError           LongInt
endVar
  cmRetrieveTLDValidators(arTLD,":WORK:TLD.ini")
  stEmailAddress = "myname@myisp.net"
  switch
    case EmailAddressValidation(stEmailAddress,
                                stEmailAccount,
                                arSubDomains,
                                stTopLevelDomain,
                                arTLD,
                                liError,
                                True) = False :
      msgStop("Email address validation failed",
               stEmailAddress +
               "\n\n" +
               ValidationErrorDescription(liError))
    otherwise :
      msgInfo(stEmailAddress,"Email address format validated successfully")
  endSwitch
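EmailAddressValidation is called above but not listed in the article. As a rough guide to what such a wrapper might do, here is a Python sketch (not OPAL) that chains a split stage and a domain stage and stops at the first failure. Every name in it is hypothetical, and the two demo stages are deliberately tiny stand-ins so the block runs on its own; they do not apply the full rules described earlier.

```python
def email_address_validation(address, split_stage, domain_stage):
    """Run the split stage, then the domain stage.

    Returns (ok, account, sub_domains, tld, error).
    """
    ok, account, domain, error = split_stage(address)
    if not ok:
        return False, "", [], "", error
    ok, sub_domains, tld, error = domain_stage(domain)
    if not ok:
        return False, account, [], "", error
    return True, account, sub_domains, tld, 0


# Stand-in stages (assumptions for the demo, not the article's procedures).
def demo_split(address):
    account, at, domain = address.strip().rpartition("@")
    if at == "":
        return False, "", "", 2
    if account == "":
        return False, "", "", 3
    if domain == "":
        return False, "", "", 4
    return True, account, domain, 0


def demo_domain(domain):
    labels = domain.lower().split(".")
    if len(labels) < 2:
        return False, [], "", 7
    if labels[-1] not in {"net", "com"}:
        return False, [], "", 8
    return True, labels[:-1], labels[-1], 0


print(email_address_validation("myname@myisp.net", demo_split, demo_domain))
```

Passing the stages in as arguments keeps the wrapper independent of how each stage is implemented, which makes it easy to swap in stricter or looser rules.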

Conclusion

We now have methods that provide basic syntax validation of email addresses. Use the parts that work for you. If you add to or improve on what is shown here, share it with the rest of us.

From my Paradox toolbox to yours!

Rick Kelly




 The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation.
 Paradox is a registered trademark of Corel Corporation.


 Modified: 09 Jun 2004


 Copyright © 2001- 2004 Paradox Community. All rights reserved. 
 Company and product names are trademarks or registered trademarks of their respective companies. 
 Authors hold the copyrights to their own works. Please contact the author of any article for details.