Paradox Community

Manipulating Strings with ObjectPAL
© 2002 Al Breveleri

Extensive thanks to Liz for her tireless editing and excellent suggestions.


1. General Considerations

You can operate on strings stored in string variables or in text files. The tool collections available for these two locations are different. Neither location is best for all operations and applications.

1.1. String and File, Match and AdvMatch

Many developers are initially confused by the names of the three pattern matching methods, and by the different effects that similar patterns have with each of them. The three methods are (1) string.match(), (2) string.advMatch(), and (3) textstream.advMatch(). They share some similarities, but each should be learned and used separately.

Following are some notes on these methods. This is an adjunct to the online help, which you should read first.

1.1.1. string.match( <patternString> [ , <receiverVars> ] ) logical

Pattern elements: Only wildcards '..' and '@' may be used.

Match extent: The entire pattern must match the entire string.

Extraction assignment: The part of the string matched by each wildcard is copied to a receiver variable.
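
The whole-string rule is easy to trip over, so here is a minimal sketch of it. The pushButton wrapper, the variable names, and the sample text are illustrative assumptions, and the results noted in the comments are what the rules above predict rather than captured output.

```objectpal
method pushButton(var eventInfo Event)
var
  psSUBJ, psREST  String
endVar
psSUBJ = "Paradox Community"
; no wildcard: the pattern covers only part of the subject,
; so this test is expected to fail
if psSUBJ'match( "Paradox" ) then
  msgInfo( "match", "unexpected: partial pattern matched" )
endif
; '..' absorbs the rest of the subject and hands it to psREST
if psSUBJ'match( "Paradox ..", psREST ) then
  msgInfo( "match", psREST )   ; expected to show "Community"
endif
endMethod
```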

1.1.2. string.advMatch( <patternString> [ , <receiverVars> ] ) logical

Pattern elements: A general expression pattern syntax is used. (Textstream.advMatch() also uses this syntax.)

Match extent: The entire string is examined. The entire pattern must match some part (or all) of the string. String.advMatch() finds the last (rightmost) match but tries to match as much subject as possible. Syntax elements are provided to force matching from the beginning or to the end of the string.

Extraction assignment: The syntax allows a pattern to be constructed of groups. The part of the string matched by each group is copied to a receiver variable.

1.1.3. textstream.advMatch( <bgnPosnLongint>, <endPosnLongint>, <patternString> ) logical

Pattern elements: A general expression pattern syntax is used. (String.advMatch() also uses this syntax.)

Match extent: The file is examined from a specified starting position to the end of the file. The entire pattern must match some part (or all) of the file. TextStream.advMatch() finds the first (leftmost) match but tries to match as much subject as possible.

Extraction assignment: A successful match returns the beginning and ending file positions of the matched string. You must construct any required extraction using textstream.setPosition() and textstream.readChars().

1.2. Breakapart Forever

An excellent way to break a text record into fields, or a sentence into words, is with the breakApart() method. Use this where a string can be divided into a list of substrings in an obvious way. BreakApart() puts the substrings into a string array.

BreakApart() discards the separator characters found in the subject. The destination array is filled with the strings found between but not including the separator characters.

BreakApart() exhibits one peculiar behavior. If the subject string ends with a separator character, this is taken to be a terminator rather than a separator, and no corresponding final element is generated in the target array. This can be quite a nuisance when, for example, parsing a comma-separated variable-length text record. If the last field is blank, the text line ends with a comma, and the resulting array gets no blank element for the last field.

An efficient way to deal with this behavior is to append a copy of the separator character to the end of the subject string before applying breakApart(). If the subject originally ended with a separator, appending another will force a final element. If the subject did not end with a separator, appending one will have no effect. I use this construct:
  breakApart( <SubjectString>+<SeparatorChar>, <TargetArray>, <SeparatorChar> )
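
As a minimal sketch of the construct, the following compares a plain call with a padded call on a record whose last field is blank. The pushButton wrapper, the variable names, and the sample record are illustrative assumptions, and the element counts in the comments are what the behavior described above predicts.

```objectpal
method pushButton(var eventInfo Event)
var
  psLINE        String
  paRAW, paFIX  Array[] String
endVar
; a comma-separated record whose last field is blank
psLINE = "Smith,John,"
; plain call: the trailing comma is taken as a terminator
breakApart( psLINE, paRAW, "," )
; padded call: the appended comma becomes the terminator instead
breakApart( psLINE + ",", paFIX, "," )
; paRAW is expected to hold 2 elements, and paFIX 3 (the last one blank)
msgInfo( "breakApart", "raw: " + strVal(paRAW.size()) + "   padded: " + strVal(paFIX.size()) )
endMethod
```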

A restriction on breakApart() is that breaks must be at a single character. You cannot specify a multicharacter substring as a break criterion. A multicharacter criterion argument will be taken as a set of break characters, and the string will be broken at every instance of each character.

With a multicharacter break criterion argument, when appending the separator character to the end of the subject string before applying breakApart(), you must be careful not to append the entire criterion string. Use substr() to extract its first character. Of course, to use substr() on an argument string, you must ensure that the string is not blank. You can take advantage of the fact that the default separator is a space:
  sTemp=iif(<SeparatorChars>=blank()," ",<SeparatorChars>)
  breakApart( <SubjectString>+substr(sTemp,1,1), <TargetArray>, sTemp )

Another restriction is that there is no corresponding mendTogether() method. You can assemble a string from an array of strings only by repeated concatenation, which is a slow operation. Some techniques for speeding this up are given below.
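
For reference, the plain repeated-concatenation approach can be sketched as a small helper. The name MEND_TOGETHER and its parameters are hypothetical, chosen only for illustration; this is the slow baseline, not one of the faster techniques.

```objectpal
proc MEND_TOGETHER( var aaSRC Array[] String, const asSEP String ) String
; return the elements of <aaSRC> joined into one string,
; with <asSEP> placed between consecutive elements
var
  psOUT  String
  piIDX  LongInt
endvar
psOUT = blank()
for piIDX from 1 to aaSRC.size()
  ; a separator goes before every element except the first
  if piIDX>1 then
    psOUT = psOUT + asSEP
  endif
  psOUT = psOUT + aaSRC[piIDX]
endfor
return psOUT
endproc
```

Each pass builds a new, longer string from the previous one, which is why this approach slows down as the array grows.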


2. Searching In A String

2.1. Searching for a Substring

When searching for a simple substring, the best tool to use is the aptly named string.search() method, which returns the starting position of the first occurrence of a substring within a string. Since legitimate string positions begin with 1 in ObjectPAL, string.search() can indicate failure by returning a zero, and no separate success flag is needed.
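
A minimal sketch of testing the return value follows. The pushButton wrapper, the variable names, and the sample text are illustrative assumptions.

```objectpal
method pushButton(var eventInfo Event)
var
  psSUBJ  String
  piLOC   LongInt
endVar
psSUBJ = "Paradox Community"
; "Comm" begins at character 9 of the subject
piLOC = psSUBJ'search( "Comm" )
if piLOC=0 then
  msgInfo( "search", "not found" )
else
  ; show everything from the match to the end of the subject
  msgInfo( "search", substr( psSUBJ, piLOC, size(psSUBJ)-piLOC+1 ) )
endif
endMethod
```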

Here is an example of a simple but slow way to replace all occurrences of a substring in a string. It is meant only to show the use of string.search() and is definitely not the best technique for the replacement task.

Listing 1: Replacing all occurrences of <asOLD> with <asNEW> in the general case, where <asOLD> might be more than one character and <asOLD> might be a substring of <asNEW>
proc REPL_STR_EXIG( const asXXX string, const asOLD string, const asNEW string ) string
; return <asXXX> with every instance of <asOLD> replaced
; by <asNEW>. Replacement is effected by copying the string
; one piece at a time from psEND to psBGN, where a 'piece'
; is a section of text between matches.
var
  psBGN, psEND  string
  piSIZ, piLOC  longint
endvar
; record length of asOLD to simplify the code and reduce RTL calls
piSIZ = size(asOLD)
; preset the source and destination string buffers
psBGN = blank()
psEND = asXXX
; loop to copy and replace all instances
while true
  ; attempt to find next match of asOLD in source buffer
  piLOC = psEND'search(asOLD)
  ; if there is no next match then exit the loop
  if piLOC=0 then quitloop endif
  ; copy any text before the match into the destination buffer
  if piLOC>1 then
    psBGN = psBGN + substr(psEND,1,piLOC-1)
  endif
  ; copy a replacement into the destination buffer
  psBGN = psBGN + asNEW
  ; string.search() always starts from the beginning of
  ; the subject string, so we need to remove any text
  ; before the match, and the match itself, from the
  ; source buffer, before trying for the next match
  if (piLOC+piSIZ)>size(psEND) then
    psEND = blank()
  else
    psEND = substr(psEND, piLOC+piSIZ, size(psEND)-piLOC-piSIZ+1)
  endif
endwhile
; when no more matches are found, the remainder of the text is in psEND
return psBGN + psEND
endproc
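
A short usage sketch follows, assuming the procedure is declared where the calling method can see it (for example, in the same object's Proc window). The pushButton wrapper and the sample arguments are illustrative; the expected result comes from tracing the listing by hand.

```objectpal
method pushButton(var eventInfo Event)
var
  psOUT  String
endVar
; asOLD ("-") is contained in asNEW ("--"), the case the listing allows for
psOUT = REPL_STR_EXIG( "a-b-c", "-", "--" )
msgInfo( "REPL_STR_EXIG", psOUT )   ; a hand trace gives "a--b--c"
endMethod
```

Because each replacement is copied into psBGN and never searched again, the inserted "--" text cannot be matched a second time.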


2.2. Searching for a Pattern

Both string.match() and string.advMatch() will search for patterns. The pattern construction grammar is quite different between the two, as is the rule for assigning matched data to receiver variables. AdvMatch() can be made to do anything that match() can do. The only reason for using match() is that, for the things it can do, its patterns are simpler than the corresponding advMatch() patterns.

Rather than try to retain two pattern grammars in my head, I decided to just use advMatch() for all string pattern matching.

You might decide differently. Here are two statements, one for each method, that test the last line of a U.S. mailing address for the 'city, state zipcode' pattern and recover the three parts if found.

With string.match():
  if psSUBJ'match( ".., .. ..", psCITY, psST, psZIP) then
    ; psCITY has City part
    ; psST has State abbreviation part
    ; psZIP has ZipCode part
  else
    ; it's not a postal address last line
  endif

With string.advMatch():
  if psSUBJ'advMatch( "(..), (..) (..)", psCITY, psST, psZIP) then
    ; psCITY has City part
    ; psST has State abbreviation part
    ; psZIP has ZipCode part
  else
    ; it's not a postal address last line
  endif

Note how much simpler the first pattern is.

However, suppose you then decide to test for an address last line and simultaneously extract the state and zipcode parts into one variable. The match() pattern can be modified to do one or the other, but not both. The advMatch() statement, however, can simply be changed to

  psSUBJ'advMatch( "(..), (.. ..)", psCITY, psPOST)

with psPOST getting the state abbreviation and zip code parts.


Part 3: Parsing Grammatically



 The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation.
 Paradox is a registered trademark of Corel Corporation.


 Modified: 15 May 2003


 Copyright © 2001-2003 Paradox Community. All rights reserved.
 Company and product names are trademarks or registered trademarks of their respective companies. 
 Authors hold the copyrights to their own works. Please contact the author of any article for details.