Today I overheard two colleagues discussing one of my favorite subjects: encryption. They were puzzled that encrypting data (with a normal block cipher) worked perfectly in ECB mode, but not in CBC mode. So, this all leads up to the question: what are ECB and CBC? And when should you use them? Although this post has some PHP code in it, it applies to any other language as well.
Let’s start off with the basics of block cipher encryption. In order to encrypt data, we need to feed two pieces of information into the encryption function: the message and the key. The message in this case could be anything: a string, binary data, numbers, a file. It doesn’t matter. The key is the secret that makes it (almost) impossible to decrypt the encrypted data if you don’t have it. But with it, decrypting is easy (just like the correct key makes it a lot easier to open a lock).
The actual encryption method we will use is not really that important, as long as it is a block cipher. Now, block ciphers are algorithms that use one single key for both encryption and decryption (also known as symmetric ciphers). Another property is that they act on a whole block of data, instead of just a single bit or byte at a time (which is what stream ciphers do).
If you have a large message (a 1MB image for instance), it has to be split up into smaller blocks of exactly the block size of your cipher. Note that the block size is a property of the algorithm and does not depend on the key length: Blowfish works on blocks of 64 bits (or 8 bytes) whatever key length you pick, and AES always works on blocks of 128 bits (16 bytes), whether the key is 128, 192 or 256 bits long.
Your message will not always be an exact multiple of the block size, but it should still be possible to encrypt a 9-byte message with a 64-bit block cipher. In order to make this work, we can apply a padding scheme to fill up the last block. However, depending on the operation mode, padding might not be needed.
+--------+--------+
| BLOCK1 | BLOCK2 |
+--------+--------+
|12345678|9PPPPPPP|
+--------+--------+
As you can see, the last block (block 2) has some extra padding (P) which has to be stripped off by the decryption method. Normally, this is all taken care of by the encryption/decryption routines you use.
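To make the padding step a bit more concrete, here is a minimal sketch in Python. It assumes a PKCS#7-style scheme (append N bytes that each have the value N), which is one common choice and not necessarily what your library does; the names `pad`, `unpad` and `split_blocks` are made up for this illustration.

```python
BLOCK_SIZE = 8  # bytes, i.e. a 64-bit block


def pad(message, block_size=BLOCK_SIZE):
    """Append n bytes of value n so the length becomes a multiple of block_size."""
    n = block_size - len(message) % block_size
    return message + bytes([n]) * n


def unpad(padded):
    """Strip the padding again; the last byte tells how many bytes to drop."""
    n = padded[-1]
    if n < 1 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into consecutive chunks of block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


padded = pad(b"123456789")
blocks = split_blocks(padded)
print(blocks)          # [b'12345678', b'9\x07\x07\x07\x07\x07\x07\x07']
print(unpad(padded))   # b'123456789'
```

Note that with this scheme a message that already fills its last block gets one complete extra block of padding, so the decrypting side can always tell padding and data apart.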
We only discuss two operation modes here, but there are more. A good explanation of these modes can be found on Wikipedia. The operation mode specifies how blocks “interconnect” with each other, and every mode has its advantages and disadvantages. We will talk only about ECB and CBC, since these are the most commonly used. Other modes, like CFB or OFB, make a block cipher act like a stream cipher, but we don’t discuss them here.
ECB stands for Electronic CodeBook and is the easiest mode. Every block is encrypted on its own and the encrypted blocks are simply concatenated, so it couldn’t be simpler. However, this results in some issues:
First of all, every block of data is encrypted with only the message and key as input. Suppose you encrypt the text “HELLOYOU” with a 64-bit block size algorithm (like Blowfish). This would fit perfectly inside one block since it’s 8 bytes (or 64 bits) long. Now if this text is repeated 10 times, your encrypted data will consist of 10 times the same encrypted output. Let’s see an example in PHP:
This would result in something like this:
3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD 3F 89 AD 58 3C C8 21 CD
This encryption is deterministic, since the same input always results in the same output. Without knowing the actual message, we can tell that it consists of the same 8-byte block repeated 10 times. It’s the way ECB works and it has some advantages and disadvantages:
First of all, you can see patterns in your encrypted data really easily, just like we have shown in the example. Secondly, it’s very easy for somebody to move block #2 to block #1, which results in a completely different plaintext (well, not in this case, since every block has the same text) that will still be decrypted correctly. This has some severe consequences: suppose you encrypt a user cookie with an ID. This ID is encrypted so the user cannot edit it, but it tells your application which user is logged in. Now suppose an attacker shuffles the blocks around: the user ID has effectively been changed, even though it’s encrypted.
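Both weaknesses can be reproduced with a short sketch. This one is in Python and, to stay self-contained, uses a toy 64-bit Feistel cipher built on SHA-256 instead of Blowfish. That toy cipher is purely an assumption for this illustration and must not be used for real encryption; all function names are made up, and the input length is assumed to be a multiple of the block size.

```python
import hashlib

BLOCK = 8  # block size in bytes (64 bits, the same as Blowfish)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(half, key, rnd):
    # Toy round function: an assumption for this sketch, NOT a real cipher.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def encrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, round_fn(right, key, rnd))
    return left + right


def decrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, round_fn(left, key, rnd)), left
    return left + right


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(data, key):
    # ECB: every block is encrypted on its own, then simply concatenated.
    return b"".join(encrypt_block(b, key) for b in split_blocks(data))


def ecb_decrypt(data, key):
    return b"".join(decrypt_block(b, key) for b in split_blocks(data))


key = b"my secret key"

# 1. Patterns: ten identical plaintext blocks give ten identical ciphertext blocks.
ciphertext = ecb_encrypt(b"HELLOYOU" * 10, key)
distinct = len(set(split_blocks(ciphertext)))
print(distinct)  # 1

# 2. Reordering: swapping two ciphertext blocks still decrypts without complaint.
ct = ecb_encrypt(b"USER0001USER0002", key)
swapped = ct[BLOCK:] + ct[:BLOCK]
print(ecb_decrypt(swapped, key))  # b'USER0002USER0001'
```

Nobody needed the key to swap those two blocks, which is exactly the cookie problem described above.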
An upside to this operation mode is that it’s relatively error-tolerant. If something happens to the data in one block, only that block gets corrupted. All other blocks remain intact, as shown in the next example:
And the output:
You see, only the second block is corrupt. The others can be decrypted perfectly. This might not sound useful in these internet days, where TCP guarantees delivery of (correct) data, but on some systems you must be prepared for data corruption and error correction.
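The same experiment is easy to repeat. Below is a minimal sketch in Python that flips a single bit in the second ciphertext block and then decrypts everything. As an assumption for this illustration it uses a toy 64-bit Feistel cipher built on SHA-256 as a stand-in for a real block cipher (not secure, made-up function names).

```python
import hashlib

BLOCK = 8  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(half, key, rnd):
    # Toy round function: an assumption for this sketch, NOT a real cipher.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def encrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, round_fn(right, key, rnd))
    return left + right


def decrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, round_fn(left, key, rnd)), left
    return left + right


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


key = b"my secret key"
plain_blocks = [b"BLOCK-01", b"BLOCK-02", b"BLOCK-03"]

# ECB encryption: each block on its own.
ciphertext = bytearray(b"".join(encrypt_block(b, key) for b in plain_blocks))

# Simulate a transmission error: flip one bit in the second block.
ciphertext[BLOCK] ^= 0x01

recovered = [decrypt_block(b, key) for b in split_blocks(bytes(ciphertext))]
damaged = [i for i, b in enumerate(recovered) if b != plain_blocks[i]]
print(damaged)       # [1]  (only the second block is affected)
print(recovered[0])  # b'BLOCK-01'
print(recovered[2])  # b'BLOCK-03'
```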
Another (big) advantage, is that you can encrypt or decrypt multiple blocks in parallel. For instance, it’s ok to encrypt block 10 first, and afterwards block 1. You only have to make sure that all blocks will be placed in the correct order at the end. This makes it easier for multicore or multiprocessor systems to encrypt pieces of the same file simultaneously.
However, when using ECB mode for encryption, the advantages do not outweigh the disadvantages. If you currently use ECB as your encryption operation mode, I seriously suggest that you take a look at the next mode: CBC.
CBC or Cipher Block Chaining is a completely different way of connecting blocks together. Instead of just processing each block separately, every plaintext block is XOR’ed with the encrypted previous block before it is encrypted itself. This effectively means that every block depends on the output of the previous block.
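Written down as formulas, the chaining looks like this. The symbols are my own notation for this illustration: P_i is plaintext block i, C_i is ciphertext block i, E_K and D_K are the block cipher's encrypt and decrypt operations with key K, and C_0 is the starting value (the IV that comes up further on).

```latex
\begin{aligned}
C_0 &= IV \\
C_i &= E_K(P_i \oplus C_{i-1}), \qquad i = 1, 2, \dots \\
P_i &= D_K(C_i) \oplus C_{i-1}
\end{aligned}
```

Note that decrypting block i only needs two ciphertext blocks, C_i and C_{i-1}, and never an earlier plaintext block.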
Let’s take a look at an example:
As you can see, we have introduced a second piece of information called the IV or initialization vector. We will talk about this a bit later on. The output of this would be:
B1 2C 46 D5 E1 73 E3 52 22 1F BA 57 F1 83 F3 4A 63 2F 21 37 4D 9E 93 55 40 BE AA C9 58 2F 5A 5D FA 84 60 45 9C 99 AB 6F C5 71 70 52 61 4A DA E8 21 00 0F 93 35 6C AC 45 EA C4 6E 3C EA 50 83 A7 FF 1A 28 9F 7C 69 49 ED EF 88 CA 25 F6 F2 98 1C
As you can see, the repetitions are gone, even though you are encrypting the same message 10 times. The advantage is clear: you can no longer spot repeated plaintext by comparing the encrypted blocks with each other. However, it has some consequences:
First: parallel encryption is not possible. We need to encrypt block 1 before we can encrypt block 2, etc. (Decryption can still be done in parallel, since it only needs ciphertext blocks.) Another disadvantage is that an error in one block will result in failure of not only that block, but of the next block as well. This is because the encrypted block is also used as input for the next block. My programmer’s instinct would suggest that EVERY block after that error would also be in error (since we now have an erroneous input into the next block), but this is not the case. Decrypting a block only needs that block’s ciphertext and the previous block’s ciphertext, never the previous plaintext, so as soon as two intact ciphertext blocks follow each other, the XOR operation produces the correct plaintext again. This is called “limited error-propagation”.
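Here is a minimal sketch in Python of both effects: the repetitions disappear, and a damaged ciphertext block ruins exactly two plaintext blocks. As an assumption for this illustration, the block cipher is a toy 64-bit Feistel construction built on SHA-256 (not Blowfish and not secure), the function names are made up, and the input length is assumed to be a multiple of the block size.

```python
import hashlib
import os

BLOCK = 8  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(half, key, rnd):
    # Toy round function: an assumption for this sketch, NOT a real cipher.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def encrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, round_fn(right, key, rnd))
    return left + right


def decrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, round_fn(left, key, rnd)), left
    return left + right


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def cbc_encrypt(data, key, iv):
    out, prev = [], iv
    for block in split_blocks(data):
        prev = encrypt_block(xor(block, prev), key)  # chain on previous ciphertext
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data, key, iv):
    out, prev = [], iv
    for block in split_blocks(data):
        out.append(xor(decrypt_block(block, key), prev))
        prev = block  # only ciphertext is carried forward
    return b"".join(out)


key = b"my secret key"
iv = os.urandom(BLOCK)
plaintext = b"HELLOYOU" * 10
ciphertext = cbc_encrypt(plaintext, key, iv)

# Number of distinct ciphertext blocks; identical ones would need a 64-bit collision.
print(len(set(split_blocks(ciphertext))))

# Flip one bit in ciphertext block #3 (index 2) and decrypt.
damaged_ct = bytearray(ciphertext)
damaged_ct[2 * BLOCK] ^= 0x01
recovered = split_blocks(cbc_decrypt(bytes(damaged_ct), key, iv))
bad = [i for i, b in enumerate(recovered) if b != b"HELLOYOU"]
print(bad)  # [2, 3]
```

Block index 2 comes out as garbage, block index 3 has only the one flipped bit mirrored into it, and everything from index 4 onwards is fine again.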
Remember the extra $iv variable we had to add when we use the CBC mode? As said before, CBC uses the encrypted output of block #N as additional input for block #N+1. This works fine for all blocks, but what would be the additional input for the first block? Somehow we must “jumpstart” the CBC routines with initial information and that’s exactly what the $iv variable does. It’s the input data that block #1 uses.
This means a few things:
- We need to use the SAME IV FOR BOTH ENCRYPTION AND DECRYPTION. Logical, but often overlooked.
- If the $iv used in decryption is not the same as the one we used during encryption, we get garbled output for block #1, but this is “fixed” from block #2 onwards. This means that garbled text at the start of a decrypted message means 9 out of 10 times that your IV is wrong.
- The $iv MUST be the same size as the block. So for a 64-bit block, we need a 64-bit IV. PHP has the mcrypt_enc_get_iv_size() function to determine the size of the IV.
- The $iv should be randomized for each encryption. It should not be a constant, because a constant IV would make the encryption deterministic again.
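The points above can be checked with a small sketch in Python. As an assumption for this illustration, it uses a toy 64-bit Feistel cipher built on SHA-256 in place of a real block cipher (not secure), and all function names are made up.

```python
import hashlib
import os

BLOCK = 8  # block size in bytes, so the IV is 8 bytes as well


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def round_fn(half, key, rnd):
    # Toy round function: an assumption for this sketch, NOT a real cipher.
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:4]


def encrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in range(4):
        left, right = right, xor(left, round_fn(right, key, rnd))
    return left + right


def decrypt_block(block, key):
    left, right = block[:4], block[4:]
    for rnd in reversed(range(4)):
        left, right = xor(right, round_fn(left, key, rnd)), left
    return left + right


def cbc_encrypt(data, key, iv):
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        prev = encrypt_block(xor(data[i:i + BLOCK], prev), key)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(data, key, iv):
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out.append(xor(decrypt_block(block, key), prev))
        prev = block
    return b"".join(out)


key = b"my secret key"
msg = b"HELLOYOU" * 3
iv = os.urandom(BLOCK)
ct = cbc_encrypt(msg, key, iv)

# Wrong IV: only the first block is garbled, the rest decrypts fine.
wrong_iv = xor(iv, b"\xff" * BLOCK)  # guaranteed to differ from iv
garbled = cbc_decrypt(ct, key, wrong_iv)
print(garbled[:BLOCK] == msg[:BLOCK])  # False
print(garbled[BLOCK:] == msg[BLOCK:])  # True

# Constant IV: same message + same key + same IV gives the same ciphertext.
fixed_iv = bytes(BLOCK)
print(cbc_encrypt(msg, key, fixed_iv) == cbc_encrypt(msg, key, fixed_iv))  # True

# Fresh random IV per encryption: the two ciphertexts are expected to differ.
ct_a = cbc_encrypt(msg, key, os.urandom(BLOCK))
ct_b = cbc_encrypt(msg, key, os.urandom(BLOCK))
print(ct_a == ct_b)
```

Since decryption needs the exact IV that was used for encryption, the IV has to travel along with the ciphertext in some way; it does not have to be kept secret like the key.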
Playing around with the various operation modes shows that there are many things you need to consider. Don’t expect that everything is safe just because you encrypt your data. Without proper knowledge, this means absolutely nothing. Having said this, a rule of thumb that seems to work for most people: don’t use ECB, but always use CBC with random IVs. And of course, always use the best cipher algorithm that is available.