Manipulating Strings with ObjectPAL

© 2002 Al Breveleri

Extensive thanks to Liz for her tireless editing and excellent suggestions.

1. General Considerations

You can operate on strings stored in string variables or in text files. The tools available for these two locations are different, and neither location is best for all operations and applications.

1.1. String and File, Match and AdvMatch

Many developers are initially confused by the names given to the three pattern-matching methods, and by the different effects that similar patterns have when used with each of them. The three methods are (1) string.match(), (2) string.advMatch(), and (3) textstream.advMatch(). There are some similarities among them, but the methods should be learned and used separately.

Following are some notes on these methods. They are an adjunct to the online help, which you should read first.

1.1.1. string.match( <patternString> [ , <receiverVars> ] ) logical

Pattern elements: Only the wildcards '..' and '@' may be used.

1.1.2. string.advMatch( <patternString> [ , <receiverVars> ] ) logical

Pattern elements: A general expression pattern syntax is used. (textstream.advMatch() also uses this syntax.)

1.1.3. textstream.advMatch( <bgnPosnLongint>, <endPosnLongint>, <patternString> ) logical

Pattern elements: A general expression pattern syntax is used. (string.advMatch() also uses this syntax.)

1.2. Breakapart Forever

An excellent way to break a text record into fields, or a sentence into words, is with the breakApart() method. Use it wherever a string can be divided into a list of substrings in an obvious way.

BreakApart() puts the substrings into a string array and discards the separator characters found in the subject. The destination array is filled with the strings found between, but not including, the separator characters.

BreakApart() exhibits one peculiar behavior.
If the subject string ends with a separator character, that character is taken to be a terminator rather than a separator, and no corresponding final element is generated in the target array. This can be quite a nuisance when, for example, parsing a comma-separated, variable-length text record: if the last field is blank, the text line ends with a comma, and the resulting array gets no blank element for the last field.

An efficient way to deal with this behavior is to append a copy of the separator character to the end of the subject string before applying breakApart(). If the subject originally ended with a separator, appending another forces a final (blank) element. If the subject did not end with a separator, appending one has no effect. I use this construct:

    breakApart( <SubjectString>+<SeparatorChar>, <TargetArray>, <SeparatorChar> )

A restriction on breakApart() is that breaks must be at a single character. You cannot specify a multicharacter substring as a break criterion. A multicharacter criterion argument is taken as a set of break characters, and the string is broken at every instance of each character.

With a multicharacter break criterion argument, be careful to append only one separator character to the end of the subject string, not the entire criterion string; appending the whole string would put several separators at the end and could produce spurious blank elements. Use substr() to extract the first character. Of course, to use substr() on an argument string, you must ensure that the string is not blank. You can take advantage of the fact that the default separator is a space:

    sTemp = iif(<SeparatorChars>=blank(), " ", <SeparatorChars>)
    breakApart( <SubjectString>+substr(sTemp,1,1), <TargetArray>, sTemp )

Another restriction is that there is no corresponding mendTogether() method. You can assemble a string from an array of strings only by repetitive concatenation, which is a slow operation. Some techniques for speeding this up are given below.

2. Searching In A String

2.1. Searching for a Substring

When searching for a simple substring, the best tool to use is the eponymous string.search() method, which returns the starting position of the first occurrence of a substring within a string. Since legitimate string positions begin with 1 in ObjectPAL, string.search() can indicate failure by returning zero, and no separate success flag is needed.

Here is an example of a simple but slow way to replace all occurrences of a substring in a string. It is meant only to show the use of string.search(); it is definitely not the best technique for the replacement task.

Listing 1: Replacing all occurrences of <asOLD> with <asNEW> in the general case, where <asOLD> might be more than one character and <asOLD> might be a substring of <asNEW>
    proc REPL_STR_EXIG( const asXXX string, const asOLD string, const asNEW string ) string
    ; return <asXXX> with every instance of <asOLD> replaced by <asNEW>.
    ; Replacement is effected by copying the string one piece at a time
    ; from psEND to psBGN, where a 'piece' is a section of text between matches.
    var
        psBGN, psEND string
        piSIZ, piLOC longint
    endvar
        ; an empty asOLD cannot be meaningfully replaced, so return the
        ; subject unchanged rather than risk a loop that never advances
        if asOLD = blank() then
            return asXXX
        endif
        ; record the length of asOLD to simplify the code and reduce RTL calls
        piSIZ = size(asOLD)
        ; preset the source and destination string buffers
        psBGN = blank()
        psEND = asXXX
        ; loop to copy and replace all instances
        while true
            ; attempt to find the next match of asOLD in the source buffer
            piLOC = psEND'search(asOLD)
            ; if there is no next match then exit the loop
            if piLOC = 0 then
                quitloop
            endif
            ; copy any text before the match into the destination buffer
            if piLOC > 1 then
                psBGN = psBGN + substr(psEND, 1, piLOC-1)
            endif
            ; copy a replacement into the destination buffer
            psBGN = psBGN + asNEW
            ; string.search() always starts from the beginning of the
            ; subject string, so remove any text before the match, and
            ; the match itself, from the source buffer before trying
            ; for the next match
            if (piLOC+piSIZ) > size(psEND) then
                psEND = blank()
            else
                psEND = substr(psEND, piLOC+piSIZ, size(psEND)-piLOC-piSIZ+1)
            endif
        endwhile
        ; when no more matches are found, the remainder of the text is in psEND
        return psBGN + psEND
    endproc

2.2. Searching for a Pattern

Both string.match() and string.advMatch() will search for patterns. The pattern construction grammar is quite different between the two, as is the rule for assigning matched data to receiver variables.

AdvMatch() can be made to do anything that match() can do. The only reason for using match() is that, for the things it can do, its patterns are simpler than the corresponding advMatch() patterns. Rather than trying to keep two pattern grammars in my head, I decided to use advMatch() for all string pattern matching. You might decide differently.
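The claim that advMatch() can do anything match() can do amounts to saying that every match() wildcard has a general-expression equivalent. As a rough illustration only, the Python sketch below translates a match()-style pattern into a Python regular expression. The helper names match_pattern_to_regex and match_like are hypothetical, and the wildcard meanings are assumptions inferred from this article ('..' matches any run of characters and fills one receiver; '@' matches one character and is not captured). Python's re grammar is not ObjectPAL's advMatch() grammar, and case handling is ignored here.

```python
import re


def match_pattern_to_regex(pattern: str) -> str:
    """Translate a match()-style pattern into a Python regex (hypothetical helper).

    Assumed wildcard meanings (not taken from the Paradox help):
      '..' -> any run of characters, captured for one receiver: '(.*)'
      '@'  -> exactly one character, not captured:             '.'
    Every other character is matched literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("..", i):
            out.append("(.*)")
            i += 2
        elif pattern[i] == "@":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def match_like(subject: str, pattern: str):
    """Return the captured pieces as a tuple, or None if the whole subject does not match."""
    m = re.fullmatch(match_pattern_to_regex(pattern), subject)
    return m.groups() if m else None
```

For example, match_like("Springfield, IL 62701", ".., .. ..") yields the three pieces that the three receiver variables would get, and a subject with no comma yields None.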
Here is an example of two statements that test the last line of a USA mail address for the 'city, state zipcode' pattern, and recover the three parts if found.

With string.match():

    if psSUBJ'match( ".., .. ..", psCITY, psST, psZIP) then
        ; psCITY has City part
        ; psST has State abbreviation part
        ; psZIP has ZipCode part
    else
        ; it's not a postal address last line
    endif

With string.advMatch():

    if psSUBJ'advMatch( "(..), (..) (..)", psCITY, psST, psZIP) then
        ; psCITY has City part
        ; psST has State abbreviation part
        ; psZIP has ZipCode part
    else
        ; it's not a postal address last line
    endif

Note how much simpler the first pattern is. However, suppose you then decide to test for an address last line and simultaneously extract the state and zipcode parts into one variable. The match() pattern can be modified to do one or the other, but not both. The advMatch() statement, however, can simply be changed to

    psSUBJ'advMatch( "(..), (.. ..)", psCITY, psPOST)

with psPOST getting the state abbreviation and zip code parts.
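The point of the advMatch() example is that parentheses, not wildcards, decide what lands in each receiver. A minimal Python model of the two patterns, assuming ObjectPAL's '..' behaves like Python's greedy '.*' (the two grammars are not identical), shows that moving one pair of parentheses is the only change needed. The function names are hypothetical.

```python
import re

# Python analogues of "(..), (..) (..)" and "(..), (.. ..)".
# Assumption: ObjectPAL's '..' behaves like Python's greedy '.*'.
THREE_PARTS = re.compile(r"(.*), (.*) (.*)")
TWO_PARTS = re.compile(r"(.*), (.* .*)")


def parse_last_line(line: str):
    """Return (city, state, zip), or None if the line lacks the pattern."""
    m = THREE_PARTS.fullmatch(line)
    return m.groups() if m else None


def parse_city_and_post(line: str):
    """Return (city, 'STATE ZIP'), mirroring the two-receiver advMatch() form."""
    m = TWO_PARTS.fullmatch(line)
    return m.groups() if m else None
```

Because the city group must be followed by ", ", a city name containing spaces, such as "Salt Lake City", is still captured whole.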
The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation. Paradox is a registered trademark of Corel Corporation.

Modified: 15 May 2003

Copyright © 2001-2003 Paradox Community. All rights reserved. Company and product names are trademarks or registered trademarks of their respective companies. Authors hold the copyrights to their own works. Please contact the author of any article for details.