Manipulating Strings with ObjectPAL
© 2002 Al Breveleri



4. Replacing Parts

One of the most common string operations is to replace every instance of a specified substring with a substitute substring. Here are several approaches to this delightful practice.

4.1. Working In a String Variable

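This algorithm is the fastest, but it never terminates if the string represented by 'XXX' is a substring of that represented by 'YYY': each replacement puts a new match back into the string, so the loop keeps running and the string keeps growing until the program fails.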

Listing 10: Replacing all occurrences of the constant string 'XXX' with the constant string 'YYY'

proc REPL_XXX_YYY( const asSUBJ string ) string
var
  psTEMP, psBGN, psEND   string
endvar
  psTEMP = asSUBJ
  while psTEMP'advMatch("^(..)XXX(..)$",psBGN,psEND)
    psTEMP = psBGN + "YYY" + psEND
  endwhile
  return psTEMP
endproc

This algorithm works correctly even if 'X' is a substring of 'YYY', but the match string can be only one character long.

Listing 11: Replacing all occurrences of the constant string 'X' with the constant string 'YYY', where 'X' is one character

proc REPL_X_YYY( const asSUBJ string ) string
var
  prTOK    array [] string
  psTEMP   string
  II       longint
endvar
  breakApart(asSUBJ+"X",prTOK,"X")
  psTEMP = prTOK[1]
  for II from 2 to size(prTOK)
    psTEMP = psTEMP + "YYY" + prTOK[II]
  endfor
  return psTEMP
endproc

The fastest way to replace all matches in the general case is with a recursive procedure.

Listing 12: Replacing all occurrences of the constant string 'XXX' with the constant string 'YYYXXXZZZ' for the general case, where 'XXX' is more than one character and 'XXX' is a substring of 'YYYXXXZZZ'

proc REPL_XXX_YYYXXXZZZ( const asSUBJ string ) string
var
  psBGN, psEND   string
endvar
  if asSUBJ'advMatch("^(..)XXX(..)$",psBGN,psEND) then
    return REPL_XXX_YYYXXXZZZ(psBGN) + "YYYXXXZZZ" + REPL_XXX_YYYXXXZZZ(psEND)
  else
    return asSUBJ
  endif
endproc

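Only the general case can accept arbitrary strings as match and replacement arguments. Because the match string is spliced into the advMatch pattern, it must not contain characters that advMatch treats as pattern operators (such as '..', '^', '$', or parentheses) unless they are meant as operators.

Listing 13: Replacing all occurrences of <asOLD> with <asNEW> in the general case, where <asOLD> might be more than one character and <asOLD> might be a substring of <asNEW>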

proc REPL_STR_1( const asSUBJ string, const asOLD string, const asNEW string ) string
var
  psBGN, psEND   string
endvar
  if asSUBJ'advMatch("^(..)"+asOLD+"(..)$", psBGN,psEND) then
    return REPL_STR_1(psBGN,asOLD,asNEW) + asNEW + REPL_STR_1(psEND,asOLD,asNEW)
  else
    return asSUBJ
  endif
endproc
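
A call to the procedure above might look like the following minimal sketch. The procedure name REPL_DEMO and the sample strings are hypothetical, chosen only for illustration; the match string here deliberately contains no advMatch pattern operators.

```objectpal
; Hypothetical demonstration of REPL_STR_1 (Listing 13).
; REPL_DEMO and the sample text are illustrative assumptions.
proc REPL_DEMO()
var
  psOUT   string
endvar
  ; replace every ", " with "; "
  psOUT = REPL_STR_1("one, two, three", ", ", "; ")
  msgInfo("REPL_STR_1", psOUT)   ; shows: one; two; three
endproc
```

Whichever occurrence advMatch happens to find first, the two recursive calls handle the remaining occurrences on either side of it, so the result does not depend on the matching order.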

Both execution time and stack space can be saved by not passing the unvarying 'asOLD' and 'asNEW' parameters to each recursive call.

Listing 14: Slightly faster version of above

; requires global var definitions:
;var
;  gsPATT, gsREPL   string
;endvar
proc REPL_STR_2( const asSUBJ string, const asOLD string, const asNEW string ) string
  gsPATT = "^(..)"+asOLD+"(..)$"
  gsREPL = asNEW
  return REPL_RCRS( asSUBJ )
endproc

proc REPL_RCRS( const asTEMP string ) string
var
  psBGN, psEND   string
endvar
  if asTEMP'advMatch(gsPATT,psBGN,psEND) then
    return REPL_RCRS(psBGN) + gsREPL + REPL_RCRS(psEND)
  else
    return asTEMP
  endif
endproc

4.2. Working In a TextStream File

Replacement in a textstream file can only be effected by copying the file, except in the very special case (not discussed here) where the match string and the replacement string are exactly the same length.

Listing 15: General case substring replacer for textstream (copies gtsSRC to gtsDST)

; assuming gtsSRC has been opened globally
; as the input textstream and gtsDST has been created
; globally as the output textstream
proc REPL_FILE( const asOLD string, const asNEW string )
var
  psBFFR                        string
  piANCHOR, piFNDBGN, piFNDEND  longint
  piREMAINING                   longint
endvar
  piFNDEND = 1   ; start from beginning of input file
  while true
    piANCHOR = piFNDEND
    piFNDBGN = piANCHOR
    if not gtsSRC'advMatch(piFNDBGN,piFNDEND,asOLD) then
      ; If a match is found, advMatch sets piFNDBGN to
      ; point to the first char of the match and piFNDEND
      ; to point to the first char after it.  If a match
      ; is not found, the next two statements set piFNDBGN
      ; and piFNDEND to point to the first char after the
      ; end of the file.
      piFNDBGN = size(gtsSRC)+1
      piFNDEND = piFNDBGN
    endif
    ; Now, piANCHOR, piFNDBGN, and piFNDEND can be
    ; compared to determine what was found, and what
    ; action should be taken:
    ; case               piFNDBGN     piFNDEND      action
    ; ----------         ----------   ----------    ----------
    ; no text, no match  = piANCHOR   = piFNDBGN    quit (end of file)
    ; no text but match  = piANCHOR   end of match  process match
    ; text but no match  end of text  = piFNDBGN    process unmatched text
    ; text and match     end of text  end of match  process text and match
    ; ----------         ----------   ----------    ----------
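    ; quit if piFNDBGN=piANCHOR and piFNDEND=piFNDBGN
    ; (piANCHOR <= piFNDBGN <= piFNDEND always holds here,
    ; so a single comparison covers both conditions)
    if piFNDEND=piANCHOR then quitloop endif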

    if piFNDBGN<>piANCHOR then
      ; unmatched text found
      ; copy the unmatched text
      gtsSRC'setPosition(piANCHOR)
      piREMAINING = piFNDBGN-piANCHOR
      while piREMAINING>0
        gtsSRC'readChars(psBFFR, int(min(piREMAINING,32767)))
        gtsDST'writeString(psBFFR)
        piREMAINING = piREMAINING-32767
      endwhile
    endif
    if piFNDEND<>piFNDBGN then    ; match found
      ; replace the matched text
      gtsDST'writeString(asNEW)
    endif
  endwhile
endproc
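
A driver for the procedure above could be sketched as follows. It assumes gtsSRC and gtsDST are declared globally as TextStream variables, as the listing requires; the procedure name REPL_FILE_DEMO, its parameters, and the literal match and replacement strings are hypothetical.

```objectpal
; Hypothetical driver for REPL_FILE (Listing 15).
; Assumes the global declaration:
;var
;  gtsSRC, gtsDST   textStream
;endvar
proc REPL_FILE_DEMO( const asSRCNAME string, const asDSTNAME string ) logical
  ; open the input file for reading
  if not gtsSRC'open(asSRCNAME, "r") then
    return false
  endif
  ; create the output file; release the input if that fails
  if not gtsDST'create(asDSTNAME) then
    gtsSRC'close()
    return false
  endif
  REPL_FILE("XXX", "YYY")   ; copy gtsSRC to gtsDST, replacing as it goes
  gtsSRC'close()
  gtsDST'close()
  return true
endproc
```

Because the replacement is done by copying, the source file is left untouched; renaming the output over the original, if that is wanted, is a separate step.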


5. Building Long Strings

5.1. Building Into a String Variable

Listing 16: Concatenating the contents of a string array gasTOKENS[]

; assuming gasTOKENS is a string array to be
; concatenated together
proc CONCAT_TOKENS_1() string
var
  psRESULT   string
  II         longint
endvar
  psRESULT = blank()
  for II from 1 to size(gasTOKENS)
    psRESULT = psRESULT + gasTOKENS[II]
  endfor
  return psRESULT
endproc

The ObjectPAL RTL string package apparently creates or keeps a pool of 4KB string buffers for operations such as substring extraction and string concatenation. Operations involving a string longer than 4KB require one or more extra trips to the Windows global dynamic memory manager, which is not very efficient. The actual limit seems to be 4095 chars, probably because one position in each buffer is reserved.

The following procedure concatenates the smaller strings into a temporary buffer, and appends the buffer to the longer result string only when adding the next item would have pushed the buffer over 4095 chars anyway.

Listing 17: This is faster in the general case, where the ultimate result may be longer than 4095 chars

; assuming gasTOKENS is a string array to be
; concatenated together
proc CONCAT_TOKENS_2() string
var
  psRESULT, psBUFFER          string
  piBFFRSIZE, piITEMSIZE, II  longint
endvar
  psRESULT = blank()
  psBUFFER = blank()
  piBFFRSIZE = 0
  for II from 1 to size(gasTOKENS)
    piITEMSIZE = size(gasTOKENS[II])
    if (piBFFRSIZE+piITEMSIZE)>4095 then
      if piITEMSIZE>4095 then
        psRESULT = psRESULT + psBUFFER + gasTOKENS[II]
        psBUFFER = blank()
        piBFFRSIZE = 0
      else
        psRESULT = psRESULT + psBUFFER
        psBUFFER = gasTOKENS[II]
        piBFFRSIZE = piITEMSIZE
      endif
    else
      psBUFFER = psBUFFER + gasTOKENS[II]
      piBFFRSIZE = piBFFRSIZE+piITEMSIZE
    endif
  endfor
  return psRESULT + psBUFFER
endproc

Listing 18: Similar to above but adds a specified separator string

; assuming gasTOKENS is a string array to be
; concatenated together
proc MENDTOGETHER( const asSEP string ) string
var  
  psRESULT, psBUFFER                     string
  piBFFRSIZE, piSEPSIZE, piITEMSIZE, II  longint
endvar
  psRESULT = blank()
  psBUFFER = blank()
  piBFFRSIZE = 0
  piSEPSIZE = size(asSEP)
  if size(gasTOKENS)>0 then
    psBUFFER = gasTOKENS[1]
    piBFFRSIZE = size(psBUFFER)   ; keep the size counter in step with the buffer
  endif
  for II from 2 to size(gasTOKENS)
    piITEMSIZE = size(gasTOKENS[II])
    if (piBFFRSIZE+piSEPSIZE+piITEMSIZE)>4095 then
      if (piSEPSIZE+piITEMSIZE)>4095 then
        if piITEMSIZE>4095 then
          psRESULT = psRESULT+psBUFFER+asSEP+gasTOKENS[II]
          psBUFFER = blank()
          piBFFRSIZE = 0
        else
          psRESULT = psRESULT + psBUFFER + asSEP
          psBUFFER = gasTOKENS[II]
          piBFFRSIZE = piITEMSIZE
        endif
      else
        psRESULT = psRESULT + psBUFFER
        psBUFFER = asSEP + gasTOKENS[II]
        piBFFRSIZE = piSEPSIZE+piITEMSIZE
      endif
    else
      psBUFFER = psBUFFER + asSEP + gasTOKENS[II]
      piBFFRSIZE = piBFFRSIZE+piSEPSIZE+piITEMSIZE
    endif
  endfor
  return psRESULT + psBUFFER
endproc
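
The two procedures above can be exercised with a short sketch like this one. It assumes gasTOKENS is declared globally as a resizable string array; the procedure name CONCAT_DEMO and the sample tokens are hypothetical.

```objectpal
; Hypothetical demonstration of CONCAT_TOKENS_2 (Listing 17)
; and MENDTOGETHER (Listing 18).
; Assumes the global declaration:
;var
;  gasTOKENS   array [] string
;endvar
proc CONCAT_DEMO()
  gasTOKENS'empty()            ; start from an empty array
  gasTOKENS'addLast("alpha")
  gasTOKENS'addLast("beta")
  gasTOKENS'addLast("gamma")
  msgInfo("CONCAT_TOKENS_2", CONCAT_TOKENS_2())   ; shows: alphabetagamma
  msgInfo("MENDTOGETHER", MENDTOGETHER(", "))     ; shows: alpha, beta, gamma
endproc
```

With tokens this short, everything stays in the small buffer and the long result string is touched only once, in the final return statement; the buffering pays off only when the combined length passes 4095 chars.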

5.2. Building Into a Textstream File

If your processor speed, RAM size, and disk size are roughly in balance, or if you have increased your file buffer pool size, it is faster to build a long string in a text file using textstream than it is to build it in memory using a string variable.

If you are constructing a long string that will end up in a file anyway, try to design your program to create the file directly. This is the usual case when emitting dynamic web pages.

The syntax for writeString() allows you to cite an array in its argument list. This will write out all the elements in the array, but each one will be written on a separate line. To fully control the concatenation of the elements, you must iterate over the array.

Listing 19: Concatenating the contents of a string array gasTOKENS[] directly into a text file

; assuming gasTOKENS is a string array to be
; concatenated together, and gtsDST has been opened
; globally as the output textstream
proc CONCAT_TOKENS_WRITE()
var II longint endvar
  for II from 1 to size(gasTOKENS)
    gtsDST'writeString( gasTOKENS[II] )
  endfor
endproc

Use this code if you want to add a separator character between the elements.

Listing 20: Similar to above but adds a specified separator string

; assuming gasTOKENS is a string array to be
; concatenated together, and gtsDST has been opened
; globally as the output textstream
proc MENDTOGETHER_WRITE( const asSEP string )
var II longint endvar
  ; guard against an empty array, where gasTOKENS[1] does not exist
  if size(gasTOKENS)>0 then
    gtsDST'writeString( gasTOKENS[1] )
  endif
  for II from 2 to size(gasTOKENS)
    gtsDST'writeString( asSEP, gasTOKENS[II] )
  endfor
endproc



 Modified: 15 May 2003