Base64 Encoding
© 2003 Rick Kelly
www.crooit.com

Preface

A library and test script of all OPAL methods presented is available here.

Overview

Base64 is a reversible encoding method that converts 8-bit data into 7-bit ASCII text. Each group of three bytes of the original data is divided into four 6-bit blocks, and each block is represented by one 7-bit ASCII character. Three input bytes therefore always become four output characters, which enlarges the data by about one third. Base64 is used to transmit non-text files over Internet email and is the usual encoding for mail attachments. The base64 alphabet contains 64 characters plus an equal sign ("="), which is used as padding in the last encoding block when the input runs out before three full bytes are available.
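The one-third growth can be made concrete with a small Python sketch. It is written in Python rather than OPAL so that it can be checked against Python's standard `base64` module. `ALPHABET` and `encoded_length` are illustrative names and are not part of the article's OPAL library. The alphabet is assumed to be the standard one given later in cnBase64EncodingTable, and the length counts "=" padding but not CR/LF line breaks.

```python
import base64

# Standard base64 alphabet; a character's position is its 6-bit value (0..63).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encoded_length(n_bytes: int) -> int:
    """Characters produced for n_bytes of input, counting '=' padding but
    not line breaks: every started group of 3 bytes yields 4 characters."""
    return 4 * ((n_bytes + 2) // 3)

# Cross-check the formula against Python's standard library encoder.
for n in (0, 1, 2, 3, 4, 57):
    sample = bytes(range(n))
    assert len(base64.b64encode(sample)) == encoded_length(n)
    print(n, "bytes ->", encoded_length(n), "characters")
```

Note that 57 input bytes produce exactly 76 characters. This matches the OPAL encoder, which writes a CR/LF after every 19 four-character blocks.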
As an example of the encoding process, let's take a word familiar to all serious OPAL adherents:

Liz

ANSI codes: 76 (L), 105 (i), 122 (z), or hex codes 4C, 69, 7A
In binary form this looks like: 01001100 01101001 01111010
Re-grouping these 24 bits into 4 groups of 6 bits yields: 010011 000110 100101 111010
or, in hex (padding each 6-bit group with two extra zeros on the left): 13, 06, 25, 3A
or, in decimal: 19, 6, 37, 58
Using the decimal values as indices into our base64 alphabet gives us: TGl6

Special processing is required when fewer than 24 bits are available at the end of the String Type being encoded, because base64 encoding always produces groups of 4 characters. When fewer than 24 input bits are available, zero bits are added (on the right) to complete the last 6-bit group, and the end of the encoding result is padded with the '=' character. Since base64 encoding is always performed on groups of three 8-bit bytes, only the following cases can result:

(1) The final input group contains a full 24 bits: the final output unit is 4 characters with no "=" padding.
(2) The final input group contains only 8 bits: the final output unit is 2 characters followed by two "=" padding characters.
(3) The final input group contains only 16 bits: the final output unit is 3 characters followed by one "=" padding character.
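The Liz arithmetic and the padding rule can be sketched in Python. The hypothetical `encode_quantum` helper packs up to three bytes into one 24-bit integer and cuts it into 6-bit groups, which mirrors the re-grouping described above. It is an illustration only, not part of the article's library; the OPAL procs later in the article use per-character shifts instead.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_quantum(chunk: bytes) -> str:
    """Encode one input group of 1, 2 or 3 bytes as 4 base64 characters,
    using '=' padding when the group is 1 or 2 bytes long."""
    if not 1 <= len(chunk) <= 3:
        raise ValueError("an input group holds 1 to 3 bytes")
    # Add zero bytes on the right so there are always 24 bits to split.
    padded = chunk + bytes(3 - len(chunk))
    bits24 = (padded[0] << 16) | (padded[1] << 8) | padded[2]
    # Cut the 24 bits into four 6-bit groups, leftmost group first.
    indices = [(bits24 >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    chars = "".join(ALPHABET[i] for i in indices)
    # 3 bytes -> 4 data characters, 2 bytes -> 3, 1 byte -> 2; the rest is '='.
    data_chars = len(chunk) + 1
    return chars[:data_chars] + "=" * (4 - data_chars)

print(encode_quantum(b"Liz"))                         # TGl6
print(encode_quantum(b"Ton") + encode_quantum(b"y"))  # VG9ueQ==
```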
As an example, let us take another well-known word, Tony, and walk through its base64 encoding. Using the same steps previously presented, the first input encoding group is:

Ton, which encodes to VG9u

The second input encoding group is the single character:

y

Following the rule in (2) above:

y = x'79' or 01111001
Re-grouping these 8 bits into two 6-bit groups and padding the second group on the right with zeros looks like: 011110 010000
or, in hex (padding each 6-bit group with two extra zeros on the left): 1E, 10
or, in decimal: 30, 16
Using the decimal values as indices into our base64 alphabet and padding with two "=" characters yields: eQ==

The total base64 encoded value for Tony is: VG9ueQ==

As one can see, base64 encoding always produces 4 output characters for every 3 input characters (counting padding), an increase of approximately 33%. Decoding is straightforward and is the exact reverse of the encoding process, taking care to ignore any characters not contained in our base64 alphabet.

One final note about the encoding process: MIME requires the encoded output to be represented in lines of no more than 76 characters each, so a CR/LF must be inserted into the encoded output stream at least every 76 characters.

OPAL Methods and Procedures

Encoding

Breaking three 8-bit groups into four 6-bit groups involves shifting bits around. The basic steps are:

Step 1: Use the first 6 bits of the first character.
Step 2: Use the last 2 bits of the first character plus the first 4 bits of the second character.
Step 3: Use the last 4 bits of the second character plus the first 2 bits of the third character.
Step 4: Use the last 6 bits of the third character.
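The four basic encoding steps can be sketched in Python as a hypothetical `encode_block` function. It is not the article's OPAL code. Python's `>>` and `&` stand in for the divisions and bit-clearing described in the walkthrough that follows.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_block(c1: int, c2: int, c3: int) -> str:
    """Encode three byte values (0-255) as four base64 characters."""
    # Step 1: first 6 bits of c1 (shift right 2, i.e. integer-divide by 4).
    i1 = (c1 >> 2) & 0x3F
    # Step 2: last 2 bits of c1 moved left 4, plus first 4 bits of c2.
    i2 = ((c1 & 0x03) << 4) | ((c2 >> 4) & 0x0F)
    # Step 3: last 4 bits of c2 moved left 2, plus first 2 bits of c3.
    i3 = ((c2 & 0x0F) << 2) | ((c3 >> 6) & 0x03)
    # Step 4: last 6 bits of c3.
    i4 = c3 & 0x3F
    return ALPHABET[i1] + ALPHABET[i2] + ALPHABET[i3] + ALPHABET[i4]

# Liz = 76, 105, 122 -> indices 19, 6, 37, 58
print(encode_block(76, 105, 122))  # TGl6
```

Using `|` or `+` to join the two parts gives the same result, because the two parts never occupy the same bit positions. The OPAL procs rely on addition for this reason.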
To isolate the first 6 bits of the first character, we shift the bits two positions to the right (divide by 4) and then force the two leftmost bits to zero just to be safe. For example (assuming integer arithmetic), using Liz as the input block as previously discussed:

Input: 01001100 or decimal 76
76 / 4 = 19 or 00010011

To use the last 2 bits of the first character plus the first 4 bits of the second character:
Input: 01001100 01101001 or decimal 76 and 105
Clearing the first 6 bits of the first character gives us 00000000 or 0; shifting left four bits (multiplying by 16) still gives 0
105 / 16 = 6; ignore remainder
0 + 6 = 6 or 00000110

To use the last 4 bits of the second character plus the first 2 bits of the third character:
Input: 01001100 01101001 01111010 or decimal 76, 105 and 122
Turning off the first 4 bits of the second character yields 00001001 or 9; shifting left 2 positions gives 9 * 4 = 36
122 / 64 = 1; ignore remainder
36 + 1 = 37 or 00100101

To use the last 6 bits of the third character, we only need to turn off the two leftmost bits. For example:

Input: 01001100 01101001 01111010 or decimal 76, 105 and 122
Turning off the first two bits of the third character yields: 00111010 or 58

Using the four decimal values (19, 6, 37, 58) as indices into our base64 alphabet gives us: TGl6

Decoding

Base64 decoding is done by essentially reversing the encoding procedure.

Step 1: Look up each base64 character in our alphabet. Using TGl6 from our encoding example above:

T = 19, G = 6, l = 37, 6 = 58

Converted to binary, these 4 groups are: 00010011 00000110 00100101 00111010

Combining these 4 groups into three 8-bit groups involves shifting bits around. Each character is treated as being 6 bits in size; the first two bits of each character are ignored. For our example, assume the bits in each character are numbered 1-8 from left to right. The basic steps are:

To combine bits 3-8 of the first character with bits 3-4 of the second character:
Input: 00010011 00000110 00100101 00111010
Shifting 00010011 (x'13' or decimal 19) left two bits using multiplication by 4: 19 * 4 = 76
Shifting 00000110 (x'06' or decimal 6) right four bits using division by 16: 6 / 16 = 0; ignore remainder
Adding the two results together: 76 + 0 = 76

To combine the last 4 bits of the second character with bits 3-6 of the third character:
Input: 00010011 00000110 00100101 00111010
Keeping the last 4 bits of 00000110 (x'06' or decimal 6), which leaves it unchanged, and shifting left four bits using multiplication by 16: 6 * 16 = 96
Shifting 00100101 (x'25' or decimal 37) right two bits using division by 4: 37 / 4 = 9; ignore remainder
Adding the two results together: 96 + 9 = 105

To combine the last two bits of the third character with bits 3-8 of the fourth character:
Input: 00010011 00000110 00100101 00111010
Shifting 00100101 (x'25' or decimal 37) left six bits using multiplication by 64: 37 * 64 = 2368 (x'0940'); dropping the high-order byte (x'09') yields x'40' or 64
Adding in the fourth character (00111010, x'3A' or decimal 58): 64 + 58 = 122

Taking our three results of 76, 105 and 122 and applying the OPAL chr() function, we end up with the final result chr(76) + chr(105) + chr(122), or Liz.

Following are the OPAL procs that implement the core base64 encode and decode steps just reviewed. Assume that stBase64EncodingTable = cnBase64EncodingTable.

    Const
       cnBase64Null          = "="
       cnBase64Invalid       = -1
       cnBase64EncodingTable = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    endConst

    Proc cmBase64DecodeBlock(stBase64Block String,
                             var siChar1 SmallInt,
                             var siChar2 SmallInt,
                             var siChar3 SmallInt)
    ;
    ; Given a 4 character base64 block, return 3 character decode.
    ; An output character stays cnBase64Invalid unless both base64
    ; characters that contribute bits to it are in the alphabet, so
    ; "=" padding never produces extra characters.
    ;
    var
       siBase64Char1 SmallInt
       siBase64Char2 SmallInt
       siBase64Char3 SmallInt
       siBase64Char4 SmallInt
    endVar
       siChar1 = cnBase64Invalid
       siChar2 = cnBase64Invalid
       siChar3 = cnBase64Invalid
       ;
       ; Alphabet position minus 1 gives the 0-63 value; a character
       ; not found in the alphabet (search() = 0) gives cnBase64Invalid
       ;
       siBase64Char1 = stBase64EncodingTable.search(stBase64Block.substr(1,1)) - 1
       siBase64Char2 = stBase64EncodingTable.search(stBase64Block.substr(2,1)) - 1
       siBase64Char3 = stBase64EncodingTable.search(stBase64Block.substr(3,1)) - 1
       siBase64Char4 = stBase64EncodingTable.search(stBase64Block.substr(4,1)) - 1
       ;
       ; Bits 3-8 of first character + bits 3-4 of second character
       ;
       switch
          case (siBase64Char1 <> cnBase64Invalid) and (siBase64Char2 <> cnBase64Invalid) :
             siChar1 = cmShiftLeft2(siBase64Char1) + cmBitAnd(cmShiftRight4(siBase64Char2),3)
       endSwitch
       ;
       ; Bits 5-8 of second character + bits 3-6 of third character
       ;
       switch
          case (siBase64Char2 <> cnBase64Invalid) and (siBase64Char3 <> cnBase64Invalid) :
             siChar2 = cmShiftLeft4(cmBitAnd(siBase64Char2,15)) + cmBitAnd(cmShiftRight2(siBase64Char3),15)
       endSwitch
       ;
       ; Bits 7-8 of third character + bits 3-8 of fourth character
       ;
       switch
          case (siBase64Char3 <> cnBase64Invalid) and (siBase64Char4 <> cnBase64Invalid) :
             siChar3 = cmShiftLeft6(cmBitAnd(siBase64Char3,3)) + siBase64Char4
       endSwitch
    endProc

    Proc cmBase64EncodeBlock(siChar1 SmallInt,
                             siChar2 SmallInt,
                             siChar3 SmallInt) String
    ;
    ; Given a three character block, return 4 character base64 encode
    ;
    var
       siIndex1 SmallInt
       siIndex2 SmallInt
       siIndex3 SmallInt
       siIndex4 SmallInt
    endVar
       ;
       ; Use first 6 bits of first character
       ;
       siIndex1 = cmBitAnd(cmShiftRight2(siChar1),63)   ;63 = 0x3f (0011 1111)
       ;
       ; Use last 2 bits of first character +
       ; first 4 bits of second character
       ;
       siIndex2 = cmShiftLeft4(cmBitAnd(siChar1,3)) + cmBitAnd(cmShiftRight4(siChar2),15)   ;15 = 0x0f (0000 1111)
       ;
       ; Use last 4 bits of second character +
       ; first 2 bits of third character
       ;
       siIndex3 = cmShiftLeft2(cmBitAnd(siChar2,15)) + cmBitAnd(cmShiftRight6(siChar3),3)
       ;
       ; Use last 6 bits of third character
       ;
       siIndex4 = cmBitAnd(siChar3,63)

       return stBase64EncodingTable.substr(siIndex1 + 1,1) +
              stBase64EncodingTable.substr(siIndex2 + 1,1) +
              stBase64EncodingTable.substr(siIndex3 + 1,1) +
              stBase64EncodingTable.substr(siIndex4 + 1,1)
    endProc

    Proc cmBitAnd(siAny SmallInt, siMask SmallInt) SmallInt
    ;
    ; Apply bitAnd mask siMask to siAny
    ;
       return siAny.bitAnd(siMask)
    endProc

    Proc cmShiftRight2(siAny SmallInt) SmallInt
    ;
    ; Shift value right two bits
    ;
       return siAny / 4
    endProc

    Proc cmShiftRight4(siAny SmallInt) SmallInt
    ;
    ; Shift value right four bits
    ;
       return siAny / 16
    endProc

    Proc cmShiftRight6(siAny SmallInt) SmallInt
    ;
    ; Shift value right six bits
    ;
       return siAny / 64
    endProc

    Proc cmShiftLeft2(siAny SmallInt) SmallInt
    ;
    ; Shift value left two bits
    ;
       return siAny * 4
    endProc

    Proc cmShiftLeft4(siAny SmallInt) SmallInt
    ;
    ; Shift value left four bits
    ;
       return siAny * 16
    endProc

    Proc cmShiftLeft6(siAny SmallInt) SmallInt
    ;
    ; Shift value left six bits
    ;
       return siAny * 64
    endProc

To use these basic, low-level procedures, we need two methods that feed them encoding and decoding blocks. Assume that stCRLF = chr(13) + chr(10).
    method Base64Encode(stAny String) String
    ;
    ; Base 64 Encode a String Type
    ;
    var
       stEncoded       String
       liTotalBlocks   LongInt
       liInputSize     LongInt
       siOddSize       SmallInt
       liIndex         LongInt
       stLastBlock     String
       siChar1         SmallInt
       siChar2         SmallInt
       siChar3         SmallInt
       liBlocksWritten LongInt
    endVar
       ;
       ; Initialize local variables
       ;
       liInputSize = stAny.sizeEx()
       stEncoded = blank()
       liBlocksWritten = 0
       ;
       ; Calculate number of encoding blocks to process
       ;
       liTotalBlocks = liInputSize / 3
       ;
       ; Determine if last block is < 3 characters in size
       ;
       siOddSize = smallInt(liInputSize.mod(3))
       ;
       ; Encode each 3 byte block to 4 base64 characters, adding
       ; a CR/LF after every 19 blocks (76 base64 characters).
       ;
       for liIndex from 1 to liInputSize - siOddSize step 3
          liBlocksWritten = liBlocksWritten + 1
          stEncoded = stEncoded +
                      cmBase64EncodeBlock(ansiCode(stAny.substr(liIndex,1)),
                                          ansiCode(stAny.substr(liIndex + 1,1)),
                                          ansiCode(stAny.substr(liIndex + 2,1))) +
                      iif(liBlocksWritten.mod(19) = 0,stCRLF,"")
       endFor
       ;
       ; Check for odd size last block
       ;
       liBlocksWritten = liBlocksWritten + 1
       switch
          case siOddSize = 0 :
          case siOddSize = 1 :
             stLastBlock = cmBase64EncodeBlock(ansiCode(stAny.substr(liInputSize,1)),0,0)
             stEncoded = stEncoded + stLastBlock.substr(1,2) + cnBase64Null + cnBase64Null +
                         iif(liBlocksWritten.mod(19) = 0,stCRLF,"")
          otherwise :
             stLastBlock = cmBase64EncodeBlock(ansiCode(stAny.substr(liInputSize - 1,1)),
                                               ansiCode(stAny.substr(liInputSize,1)),0)
             stEncoded = stEncoded + stLastBlock.substr(1,3) + cnBase64Null +
                         iif(liBlocksWritten.mod(19) = 0,stCRLF,"")
       endSwitch
       return stEncoded
    endMethod

    method Base64Decode(stEncoded String) String
    ;
    ; Decode a Base 64 to String Type
    ;
    var
       stDecoded     String
       liIndex       LongInt
       liLine        LongInt
       siChar1       SmallInt
       siChar2       SmallInt
       siChar3       SmallInt
       arEncodeBlock Array[] String
    endVar
       ;
       ; Initialize local variables
       ;
       stDecoded = blank()
       ;
       ; Break into substrings based on CR/LF
       ;
       stEncoded.breakApart(arEncodeBlock,stCRLF)
       ;
       ; Loop through substrings and decode
       ;
       for liLine from 1 to arEncodeBlock.size()
          stEncoded = arEncodeBlock[liLine]
          switch
             case isBlank(stEncoded) = True : loop
          endSwitch
          for liIndex from 1 to stEncoded.sizeEx() step 4
             cmBase64DecodeBlock(stEncoded.substr(liIndex,4),siChar1,siChar2,siChar3)
             stDecoded = stDecoded +
                         iif(siChar1 <> cnBase64Invalid,chr(siChar1),"") +
                         iif(siChar2 <> cnBase64Invalid,chr(siChar2),"") +
                         iif(siChar3 <> cnBase64Invalid,chr(siChar3),"")
          endFor
       endFor
       return stDecoded
    endMethod

Conclusion

We now have methods that support encoding and decoding using base64. The methods and procedures presented only deal with String Types; a nice project would be to develop supporting methods for any type of input, including files. I'll leave this to you, the reader, to explore and have fun with.
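The two OPAL methods can be cross-checked with a Python sketch that applies the same block rules to arbitrary bytes. This is also one possible starting point for handling input other than String Types. The names `encode`, `decode`, `INDEX`, `BLOCKS_PER_LINE` and `CRLF` are hypothetical. The 19-blocks-per-line rule and the skipping of non-alphabet characters follow the article. The comparison against Python's `base64.b64encode` is only there as a test.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
BLOCKS_PER_LINE = 19  # 19 blocks x 4 characters = 76-character lines
CRLF = "\r\n"

def encode(data: bytes) -> str:
    """Base64-encode any bytes, inserting CR/LF after every 19 blocks."""
    blocks = []
    for pos in range(0, len(data), 3):
        chunk = data[pos:pos + 3]
        c1, c2, c3 = chunk + bytes(3 - len(chunk))  # zero-fill a short tail
        indices = (
            c1 >> 2,
            ((c1 & 0x03) << 4) | (c2 >> 4),
            ((c2 & 0x0F) << 2) | (c3 >> 6),
            c3 & 0x3F,
        )
        data_chars = len(chunk) + 1  # 4, 3 or 2 characters carry data
        text = "".join(ALPHABET[i] for i in indices[:data_chars])
        blocks.append(text + "=" * (4 - data_chars))
    lines = ["".join(blocks[i:i + BLOCKS_PER_LINE])
             for i in range(0, len(blocks), BLOCKS_PER_LINE)]
    return CRLF.join(lines)

def decode(text: str) -> bytes:
    """Decode base64 text, skipping CR/LF, '=' and any other character
    that is not in the alphabet."""
    sextets = [INDEX[ch] for ch in text if ch in INDEX]
    out = bytearray()
    for pos in range(0, len(sextets), 4):
        g = sextets[pos:pos + 4]
        # Emit a byte only when both characters contributing to it exist.
        if len(g) >= 2:
            out.append((g[0] << 2) | (g[1] >> 4))
        if len(g) >= 3:
            out.append(((g[1] & 0x0F) << 4) | (g[2] >> 2))
        if len(g) == 4:
            out.append(((g[2] & 0x03) << 6) | g[3])
    return bytes(out)

sample = bytes(range(256)) * 2  # 512 bytes, several output lines
encoded = encode(sample)
assert encoded.replace(CRLF, "") == base64.b64encode(sample).decode("ascii")
assert all(len(line) <= 76 for line in encoded.split(CRLF))
assert decode(encoded) == sample
print(encode(b"Tony"), decode("TGl6"))  # VG9ueQ== b'Liz'
```

Because `decode` gathers the alphabet characters from the whole string before grouping them, it assumes a single encoded value whose "=" padding appears only at the end. Under that assumption it matches decoding line by line in 4-character blocks, since every full line is a multiple of 4 characters.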
The information provided on this Web site is not in any way sponsored or endorsed by Corel Corporation. Paradox is a registered trademark of Corel Corporation.

Modified: 19 Jul 2003
Copyright © 2001-2003 Paradox Community. All rights reserved. Company and product names are trademarks or registered trademarks of their respective companies. Authors hold the copyrights to their own works. Please contact the author of any article for details.
|