Using Compression with CryptoSys

Compressing data before encrypting reduces the redundancy in the plaintext that an attacker could exploit, and it usually shortens the message to be transmitted. CryptoSys API now includes compression (deflate) and decompression (inflate) functions based on the zlib library by Jean-loup Gailly and Mark Adler.

However, using compression adds to the complexity of the operation. The major issue is that you need to know the uncompressed length of your data before you inflate it, so that you can allocate an output buffer of the right size. That means the length must be stored with the compressed data.

We suggest here a simple method: store the compressed data in a small packet with a fixed header before encryption. An example in Visual Basic shows how a plaintext message is compressed, stored in the packet and encrypted ready for transmission or storage. The example then shows how to extract the plaintext from the ciphertext.

Compressed Data Packet Format

The compressed data is preceded by a 6-byte header. The format is as follows:
  0   1   2   3   4   5 
+---+---+---+---+---+---+
|  SIG  | CSIZE | USIZE |
+---+---+---+---+---+---+
+======================================+
|...CSIZE bytes of compressed data...  |
+======================================+

SIG is the two-byte signature. We use the values 0x5a and 0x41 (the ASCII characters 'Z' and 'A'), but you can use any pair of bytes you wish. The signature guards against accidentally interpreting the wrong data as a packet, and it identifies the format of the data that follows. We could, for example, use "ZB" as the signature for a different format.

CSIZE is the size of the compressed data in bytes represented in a two-byte value in big-endian format, i.e. the first byte is the most significant.

USIZE is the size of the original uncompressed data in bytes represented in a two-byte value in big-endian format.

For example, the number 3 is stored as 00 03 and the number 7400 as 1C E8. Two bytes limit each size to 65,535 bytes. If you need to store larger quantities of data in one packet, you could widen the size fields to four bytes each. The two-byte format just keeps our example simple.
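
The two-byte encoding can be wrapped in a pair of small helper routines. The following is a minimal sketch in Visual Basic. The names PutUInt16BE and GetUInt16BE are our own illustrations and are not part of the CryptoSys API, and the arrays are assumed to be zero-based.

```vb
' Hypothetical helpers (not part of the CryptoSys API)

' Store nValue (0 to 65535) as two bytes, most significant byte first,
' at position nOffset in abData
Public Sub PutUInt16BE(abData() As Byte, ByVal nOffset As Long, ByVal nValue As Long)
    If nValue < 0 Or nValue > 65535 Then
        Err.Raise 6     ' Overflow: value does not fit in two bytes
    End If
    abData(nOffset) = (nValue \ &H100&) And &HFF&
    abData(nOffset + 1) = nValue And &HFF&
End Sub

' Read a two-byte big-endian value starting at position nOffset
Public Function GetUInt16BE(abData() As Byte, ByVal nOffset As Long) As Long
    ' &H100& is a Long constant, so the product is computed as a Long
    GetUInt16BE = abData(nOffset) * &H100& + abData(nOffset + 1)
End Function
```

Calling PutUInt16BE with the value 7400 sets the two bytes to 1C E8, and GetUInt16BE on those two bytes returns 7400 again.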

Example code in Visual Basic

    Dim abPlain() As Byte
    Dim abCompressed() As Byte
    Dim abPacket() As Byte
    Dim abCipher() As Byte
    Dim abKey() As Byte
    Dim nCompLen As Long
    Dim nUncompLen As Long
    Dim nPacketLen As Long
    Dim nPad As Long
    Dim nLen As Long
    Dim i As Long
    Dim strPlain As String
    Dim strBack As String
    Dim lngRet As Long
    
    ' ENCRYPTION WITH COMPRESSION
    
    ' Get the plaintext message
    strPlain = "hello, hello, hello. This is a 'hello world' message " & _
        "for the world, repeat, for the world."
    ' Convert to an array of bytes
    abPlain = StrConv(strPlain, vbFromUnicode)
    nUncompLen = UBound(abPlain) + 1
    ' Find required length by calling with zero output length value
    nCompLen = ZLIB_Deflate(0, 0, abPlain(0), nUncompLen)
    ReDim abCompressed(nCompLen - 1)
    ' Now compress plaintext
    Call ZLIB_Deflate(abCompressed(0), nCompLen, abPlain(0), nUncompLen)
    ' Create a compressed data packet
    nPacketLen = nCompLen + 6
    ReDim abPacket(nPacketLen - 1)
    ' Store header values in packet
    abPacket(0) = Asc("Z")
    abPacket(1) = Asc("A")
    abPacket(2) = (nCompLen \ &H100) And &HFF
    abPacket(3) = nCompLen And &HFF
    abPacket(4) = (nUncompLen \ &H100) And &HFF
    abPacket(5) = nUncompLen And &HFF
    ' Copy compressed data into packet
    ' (NB there are Win32 API memory functions that do this much faster)
    For i = 0 To nCompLen - 1
        abPacket(i + 6) = abCompressed(i)
    Next
    
    ' Now we encrypt the compressed data packet using DES
    ' First we must pad the packet so the length is a multiple of 8
    nPad = 8 - (nPacketLen Mod 8)
    ReDim Preserve abPacket(nPacketLen + nPad - 1)
    For i = 0 To nPad - 1
        abPacket(nPacketLen + i) = nPad
    Next
    ' Set the key (this is just a sample, OK)
    ReDim abKey(7)
    For i = 0 To 7
        abKey(i) = i Xor &HFF
    Next
    
    ReDim abCipher(UBound(abPacket))
        
    lngRet = DES_Bytes(abCipher(0), abPacket(0), nPacketLen + nPad, abKey(0), True)

    ' The encrypted data in abCipher is transmitted or stored
    ' ............
    
    ' DECRYPTION AND DECOMPRESSION
    ' We show how to recover the plaintext
    ' given just the data in abCipher and the key
    ' deriving all the other necessary values from the data itself.
    nLen = UBound(abCipher) + 1
    ReDim abPacket(nLen - 1)
    
    ' Decrypt using the key we already know
    lngRet = DES_Bytes(abPacket(0), abCipher(0), nLen, abKey(0), False)
    
    ' Check the value of the final padding byte, which must be 1 to 8
    If abPacket(nLen - 1) < 1 Or abPacket(nLen - 1) > 8 Then
        MsgBox "Invalid padding byte"
        Exit Function
    End If
    ' Check we have the header bytes we are expecting
    If abPacket(0) <> Asc("Z") Or abPacket(1) <> Asc("A") Then
        MsgBox "Invalid packet header signature"
        Exit Function
    End If
    ' Extract the lengths from the header
    ' (&H100& is a Long constant, which avoids Integer overflow for sizes above 32767)
    nCompLen = abPacket(2) * &H100& + abPacket(3)
    nUncompLen = abPacket(4) * &H100& + abPacket(5)
    ' perhaps check reasonableness of lengths here...
    
    ' Uncompress the compressed data, skipping the packet header
    ReDim abPlain(nUncompLen - 1)
    lngRet = ZLIB_Inflate(abPlain(0), nUncompLen, abPacket(6), nCompLen)
    
    ' Convert back to a string
    strBack = StrConv(abPlain, vbUnicode)
    Debug.Print strBack
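
The checks made after decryption can be gathered into one routine that also covers the "reasonableness" test suggested in the code comment. The following is a sketch only. The function name PacketIsValid is hypothetical, and it assumes a zero-based array, the 6-byte header and the 1-to-8 padding bytes used in this example.

```vb
' Hypothetical helper: returns True if the decrypted, still-padded
' packet in abPacket is internally consistent
Public Function PacketIsValid(abPacket() As Byte) As Boolean
    Dim nLen As Long
    Dim nPad As Long
    Dim nCompLen As Long
    Dim nUncompLen As Long
    Dim i As Long

    PacketIsValid = False
    nLen = UBound(abPacket) + 1
    ' The padded packet must be a whole number of 8-byte blocks
    If nLen < 8 Or (nLen Mod 8) <> 0 Then Exit Function

    ' Every padding byte must hold the pad length, which is 1 to 8
    nPad = abPacket(nLen - 1)
    If nPad < 1 Or nPad > 8 Then Exit Function
    For i = nLen - nPad To nLen - 1
        If abPacket(i) <> nPad Then Exit Function
    Next

    ' Signature bytes 'Z' and 'A'
    If abPacket(0) <> Asc("Z") Or abPacket(1) <> Asc("A") Then Exit Function

    ' Header, compressed data and padding must account for every byte
    nCompLen = abPacket(2) * &H100& + abPacket(3)
    nUncompLen = abPacket(4) * &H100& + abPacket(5)
    If nCompLen < 1 Or nUncompLen < 1 Then Exit Function
    If 6 + nCompLen + nPad <> nLen Then Exit Function

    PacketIsValid = True
End Function
```

With such a routine, the separate padding and signature tests in the example could be replaced by a single test of PacketIsValid(abPacket) before the lengths are extracted.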

The original 90 characters of plaintext data displayed in hexdump format are:

000000  68 65 6c 6c 6f 2c 20 68 65 6c 6c 6f 2c 20 68 65  hello, hello, he
000010  6c 6c 6f 2e 20 54 68 69 73 20 69 73 20 61 20 27  llo. This is a '
000020  68 65 6c 6c 6f 20 77 6f 72 6c 64 27 20 6d 65 73  hello world' mes
000030  73 61 67 65 20 66 6f 72 20 74 68 65 20 77 6f 72  sage for the wor
000040  6c 64 2c 20 72 65 70 65 61 74 2c 20 66 6f 72 20  ld, repeat, for 
000050  74 68 65 20 77 6f 72 6c 64 2e                    the world.      

These 90 bytes of plaintext are compressed by the ZLIB deflate function to 68 bytes:

000000  78 9c cb 48 cd c9 c9 d7 51 c8 40 a2 f4 14 42 32  x..H....Q.@...B2
000010  32 8b 15 80 28 51 41 1d 2c a2 50 9e 5f 94 93 a2  2...(QA.,.P._...
000020  ae 90 9b 5a 5c 9c 98 9e aa 90 96 5f a4 50 92 91  ...Z\......_.P..
000030  0a 11 d6 51 28 4a 2d 48 4d 2c d1 41 15 d6 03 00  ...Q(J-HM,.A....
000040  86 d1 1f 4e                                      ...N            

Note how compression alone makes our original plaintext quite unreadable at first glance, although it provides no secrecy: anyone can inflate the data again.

To create our packet, we prepend our 6-byte header to make the 74-byte compressed data packet before encryption. Note our signature characters 'Z' (0x5a) and 'A' (0x41) at the start, and remember that decimal 68 = 0x0044 and decimal 90 = 0x005a.

000000  5a 41 00 44 00 5a 78 9c cb 48 cd c9 c9 d7 51 c8  ZA.D.Zx..H....Q.
000010  40 a2 f4 14 42 32 32 8b 15 80 28 51 41 1d 2c a2  @...B22...(QA.,.
000020  50 9e 5f 94 93 a2 ae 90 9b 5a 5c 9c 98 9e aa 90  P._......Z\.....
000030  96 5f a4 50 92 91 0a 11 d6 51 28 4a 2d 48 4d 2c  ._.P.....Q(J-HM,
000040  d1 41 15 d6 03 00 86 d1 1f 4e                    .A.......N      
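
The example code copies the compressed bytes into the packet one at a time and notes that Win32 memory functions are faster. As a sketch of that alternative, the routine below builds the packet with the Win32 RtlMoveMemory function, conventionally declared in VB6 under the alias CopyMemory. The function name BuildPacket is our own, the declaration shown is for 32-bit VB6, and the input array is assumed to be zero-based and non-empty.

```vb
' Module-level declaration of the Win32 memory-copy function (32-bit VB6)
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
    (ByRef Destination As Any, ByRef Source As Any, ByVal Length As Long)

' Hypothetical helper: returns header + compressed data as one byte array
Public Function BuildPacket(abCompressed() As Byte, ByVal nUncompLen As Long) As Byte()
    Dim abPacket() As Byte
    Dim nCompLen As Long

    nCompLen = UBound(abCompressed) + 1
    ' Both sizes must fit in the two-byte header fields
    If nCompLen > 65535 Or nUncompLen < 0 Or nUncompLen > 65535 Then
        Err.Raise 6     ' Overflow
    End If

    ReDim abPacket(nCompLen + 6 - 1)
    ' Signature
    abPacket(0) = Asc("Z")
    abPacket(1) = Asc("A")
    ' CSIZE and USIZE in big-endian order
    abPacket(2) = (nCompLen \ &H100&) And &HFF&
    abPacket(3) = nCompLen And &HFF&
    abPacket(4) = (nUncompLen \ &H100&) And &HFF&
    abPacket(5) = nUncompLen And &HFF&
    ' Copy all the compressed bytes in one call instead of a loop
    CopyMemory abPacket(6), abCompressed(0), nCompLen

    BuildPacket = abPacket
End Function
```

Given the 68 compressed bytes above and an uncompressed length of 90, this returns a 74-byte array beginning 5a 41 00 44 00 5a.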

Before encryption, we need to pad the packet with another 6 bytes, each of value 0x06, to bring its size up to the next multiple of 8 bytes, i.e. 80 bytes:

000000  5a 41 00 44 00 5a 78 9c cb 48 cd c9 c9 d7 51 c8  ZA.D.Zx..H....Q.
000010  40 a2 f4 14 42 32 32 8b 15 80 28 51 41 1d 2c a2  @...B22...(QA.,.
000020  50 9e 5f 94 93 a2 ae 90 9b 5a 5c 9c 98 9e aa 90  P._......Z\.....
000030  96 5f a4 50 92 91 0a 11 d6 51 28 4a 2d 48 4d 2c  ._.P.....Q(J-HM,
000040  d1 41 15 d6 03 00 86 d1 1f 4e 06 06 06 06 06 06  .A.......N......
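
The number of padding bytes follows from the packet length: 74 Mod 8 = 2, so 8 - 2 = 6 bytes are added, each with the value 6. A packet whose length is already a multiple of 8 receives a full block of eight padding bytes, so the receiver can always find and remove the padding. A small sketch of this calculation, using a hypothetical function name:

```vb
' Hypothetical helper: number of padding bytes (1 to 8) needed to bring
' nDataLen up to the next multiple of the 8-byte DES block size
Public Function PadLength(ByVal nDataLen As Long) As Long
    PadLength = 8 - (nDataLen Mod 8)
End Function
```

Here PadLength(74) returns 6 and PadLength(80) returns 8.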

This input block is encrypted using DES in ECB mode with the key 0xfffefdfcfbfaf9f8 to produce the final 80 bytes of ciphertext for transmission:

000000  d2 70 73 5c 86 87 21 95 73 64 54 11 e3 3a 19 50  .ps\..!.sdT..:.P
000010  d9 e1 c8 e9 a6 a5 a4 81 56 f4 df 4a 80 b6 1d 6a  ........V..J...j
000020  ad f3 48 15 68 2f a9 7a 61 12 d1 6a 31 3c 3b b5  ..H.h/.za..j1<;.
000030  64 24 00 c9 81 c2 6c b1 00 75 1c a0 8b 22 d3 a6  d$....l..u..."..
000040  77 02 a5 88 9e a5 06 75 65 ce 75 fd 84 08 8f 71  w......ue.u....q

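To anyone without the key, this data appears to be just random bytes, and recovering the original plaintext message means breaking the cipher. Note that DES is used here to keep the example short: its 56-bit key is within reach of a brute-force search, so a stronger block cipher is the better choice for real data, with the padding adjusted to that cipher's block size.

For interest, in base64 format this ciphertext becomes: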

0nBzXIaHIZVzZFQR4zoZUNnhyOmmpaSBVvTfSoC2HWqt80gVaC+pem
ES0WoxPDu1ZCQAyYHCbLEAdRygiyLTpncCpYiepQZ1Zc51/YQIj3E=
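
Decryption and inflation are essentially the reverse of this process: decrypt the ciphertext, check the padding and the signature, read the two sizes from the header, and inflate the compressed data into a buffer of USIZE bytes.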


Copyright © 2003-6 DI Management Services Pty Ltd. <www.cryptosys.net>   <www.di-mgt.com.au>