Decoding TLS with PHP.
As a proof of concept I wanted to see how far I could get decoding some TLS data on the client side. Obviously, this is a very complex matter: TLS looks deceptively simple, but it isn’t. To make matters worse, PHP isn’t making things easy for us either.
Capturing TLS data
It’s not easy to decrypt TLS data if you’re not affiliated with a 3-letter organization. After all, that’s the whole purpose of TLS: to encrypt and authenticate data. But since we want to decode data at the client endpoint, things are a little easier for us. I’m using a somewhat complex setup to get our data, mostly because it was already in place for other purposes. Hopefully you can just tag along, as it isn’t really hard to try it yourself.
First of all, you’ll need a browser (Chrome or Firefox are fine, Safari and IE aren’t!) and a website to connect to. In this case I’m using github.com. Before we start our browser, we have to do a neat trick that lets us see some internal TLS information from the browser. You’ll need to set an environment variable called SSLKEYLOGFILE to the path of a file where the browser will store all kinds of data. Under OSX, this is done by entering the following in your console:
launchctl setenv SSLKEYLOGFILE /tmp/sslkeylogfile.txt
This makes the browser log information about SSL keys and secrets in that file so we can use it later. Next, we will use wireshark to do some serious TLS debugging. Before we actually start with that, we need to set some preferences. In wireshark, go to your preferences and select SSL from the protocol list. Here we will set the ssl debug file. That will store some additional SSL debug information (we don’t really need it, but it makes it much easier to figure out what’s going on; be careful though, this file will become VERY large in a short time!). The most important setting is the (pre)-master-secret file. You will need to set this to the path you used for SSLKEYLOGFILE. This allows wireshark to use your browser’s data to automatically decrypt TLS data, so you can look into your encrypted TLS traffic and see the HTTP, or whatever you have running beneath.
From this point on, you can start a new wireshark capture on your outbound interface. Next, go to your web browser and open https://github.com/favicon.ico. I use this URL so the browser only does one fetch: when fetching the index page, your browser will automatically open many connections, and we want to isolate just one. This is probably the easiest way.
In wireshark you will see a lot of data, but you can easily filter this by entering SSL in the filter bar. This will filter out everything that is not SSL. If you still see too much, you might have other applications open that use SSL, like twitter etc. I turn all my applications off so they don’t interfere, but you could also have wireshark filter on the github.com ip address only. In the end, I’ve got a short list containing only the TLS records of this one connection.
You will see that wireshark automatically decrypts the HTTP records that were encrypted by TLS (this is because of the SSLKEYLOGFILE). Obviously our goal is to achieve the same thing in PHP. But you need some of the data from wireshark to get to that point.
Gathering all the necessary fields
Because this is a proof of concept, I’m not doing the actual TLS connection. I will be decrypting some pre-captured data instead. There are 3 important items we need: 2 of them are available directly from the line, but one isn’t. The two that are available are called the client_random and the server_random. These are 2 random values generated by the client and the server respectively, and sent over the line unencrypted. We use them as input for generating some keys we need later.
The 3rd value is something called a pre-master-secret. This is some data that should be kept VERY safe and is sent encrypted from the client to the server (in the so-called client-key-exchange record). The way this is encrypted depends on the cipher-suite that is used. Normally, the client sends a list of the cipher-suites it supports. The server checks this list, chooses a cipher-suite and sends its choice back to the client. So it’s the server that ultimately decides which cipher-suite will be used for communication. This is important for us, as encryption, decryption, authentication and integrity all depend on the selected cipher-suite. Some of these suites are strong, others are weak, and some of them are even broken and should not be used.
Funnily enough, github.com doesn’t use the strongest cipher-suite that most browsers offer. There are a few reasons for this: stronger cipher suites mean more complex encryption/decryption, and thus more computing power. Also, some “stronger” suites are vulnerable to certain current attacks, while some weaker suites aren’t. So it’s kind of a trade-off. For now, it’s safe to say that github.com uses the TLS_RSA_WITH_RC4_128_SHA suite and that’s ok for now, although this will likely change in the (near) future.
Inside wireshark, we can see the list of cipher suites we are sending in the client-hello record (the strongest and preferred suites are at the top). The one that we will be using can be found in the server-hello record. These two records are also where our client_random and server_random can be found.
Inside the client-key-exchange record, we can find our pre-master-secret in encrypted form. Since the cipher-suite we use is TLS_RSA_*, RSA is used to encrypt the pre-master-secret (no-one but the server at github.com can decrypt this data!).
But for our purposes, we need to have this pre-master-secret as well. Here is where our debug files come into action. The SSLKEYLOGFILE actually stores these pre-master secrets. Inside this file, there are lines starting with RSA. The number that follows is the first part of the encrypted key (not ALL of it!), and the long number after that is the actual pre-master secret. All we need to do is find the encrypted part, and fetch the matching pre-master secret.
In my case, my encrypted key starts with 04294e1aab81, and in my SSLKEYLOGFILE this matches up with the actual pre-master-secret of 03034f855727c944e11c5d74490ce62b550db46a96b32a7be76d68342dcc7fe9c87026090ebb99245f3ffd13f9ed185a.
Ok, so it’s not so secret anymore.. :(
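Looking up the matching line by hand works fine, but it can also be scripted. Below is a minimal sketch of such a lookup. The function name find_pre_master_secret and the sample lines are made up for illustration; a real run would read the file you pointed SSLKEYLOGFILE at.

```php
<?php
// Hypothetical helper: find the pre-master secret that belongs to an
// encrypted key, given the text of a key log file.
// $encrypted_prefix is the start of the encrypted key as seen in wireshark.
function find_pre_master_secret($keylog, $encrypted_prefix)
{
    foreach (preg_split('/\R/', $keylog) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        // Lines of interest look like: RSA <encrypted part> <pre-master secret>
        if (count($fields) !== 3 || $fields[0] !== 'RSA') {
            continue;
        }
        if (stripos($fields[1], $encrypted_prefix) === 0) {
            return hex2bin($fields[2]); // binary form, 48 bytes
        }
    }
    return null; // nothing matched
}

// Made-up sample data, only here to show the call. Real use would be
// something like file_get_contents('/tmp/sslkeylogfile.txt').
$sample = "# comment line\n"
        . "RSA 0011223344556677 " . str_repeat('ab', 48) . "\n"
        . "CLIENT_RANDOM " . str_repeat('01', 32) . " " . str_repeat('cd', 48) . "\n";

$pre_master_secret = find_pre_master_secret($sample, '00112233');
echo strlen($pre_master_secret), " bytes\n";
```

Matching on a prefix rather than the whole field means the few bytes you copy from wireshark are enough to find the line.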
Now we have our 3 main components: client_random, server_random and pre-master-secret. Let’s get decoding!
Decoding TLS_RSA_WITH_RC4_128_SHA
We already know that we are using the cipher suite TLS_RSA_WITH_RC4_128_SHA. This means that the pre-master-secret is sent over via RSA. It also means that the actual data encryption is done through RC4 (that is the commercial name; the “free” version is called ARCFOUR, which gives the EXACT same results). The 128 means that we are using a secret key that is 128 bits (16 bytes) long, and the SHA part means we are using SHA1 for checking the integrity of our data.
From pre-master-secret to master-secret
First we need to convert our pre-master secret to a master-secret. If you look closer inside your SSLKEYLOGFILE, you will see lines starting with CLIENT_RANDOM. These lines contain a client-random value and a master-secret, so it’s pretty easy to find the correct master-secret. However, we will compute it manually, which is possible with the data we’ve already gathered.
The most important part of generating such a master-secret is the PRF (pseudo-random function). This PRF has changed over the different versions of TLS: TLS 1.0 and TLS 1.1 use a fairly complex PRF, but luckily TLS 1.2 uses a much simpler one. Since the chrome browser talks TLS 1.2 (and fortunately, github too), we can use the TLS 1.2 PRF.
The master secret is created by feeding the PRF the pre-master secret (in binary form), the literal string 'master secret', and the concatenation of the client_random and server_random. We ask for 48 bytes of output, because the master_secret must be 48 bytes long.
The prf_tls12 function is pretty easy too. It just calls the p_hash function with sha256 as the hash. For the cipher suite we use here, the TLS 1.2 PRF is based on sha256 no matter which cipher or MAC is used for the communication itself (some newer cipher suites explicitly define a different PRF hash, such as sha384).
Next up, the p_hash function. This looks a bit complex, but basically it generates each piece of output based on the previous round. It is explained in http://tools.ietf.org/html/rfc5246#section-5, the actual RFC for TLS 1.2. But it boils down to this:
P_SHA256(secret, seed) = HMAC_SHA256(secret, A(1) + seed) +
                         HMAC_SHA256(secret, A(2) + seed) +
                         HMAC_SHA256(secret, A(3) + seed) + ...

A() is defined as:
    A(0) = seed
    A(i) = HMAC_SHA256(secret, A(i-1))
Depending on how much data we need, we have to run this multiple times. Since sha256 returns a hash of 32 bytes (256 bits), and we need 48 bytes for our master-secret, we must run two rounds of p_hash so we have 64 bytes. Then we return the first 48 bytes and just ignore the last 16 bytes. In every round, we use the A() value of the previous round as the input for the next one.
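To make the PRF steps a bit more concrete, here is a minimal sketch of p_hash and prf_tls12 written directly from the RFC definition above. It is not necessarily how the code in the repository looks. The pre-master secret is the one found earlier; the two random values are placeholders and need to be replaced with the ones from your own capture.

```php
<?php
// P_hash from RFC 5246 section 5: keep chaining HMACs until we have
// at least $length bytes, then cut off what we don't need.
function p_hash($algo, $secret, $seed, $length)
{
    $output = '';
    $a = $seed; // A(0) = seed
    while (strlen($output) < $length) {
        $a = hash_hmac($algo, $a, $secret, true);               // A(i) = HMAC(secret, A(i-1))
        $output .= hash_hmac($algo, $a . $seed, $secret, true); // HMAC(secret, A(i) + seed)
    }
    return substr($output, 0, $length);
}

// TLS 1.2 PRF with sha256: PRF(secret, label, seed) = P_SHA256(secret, label + seed)
function prf_tls12($secret, $label, $seed, $length)
{
    return p_hash('sha256', $secret, $label . $seed, $length);
}

// Pre-master secret as found in the key log file, converted to binary.
$pre_master_secret = hex2bin(
    '03034f855727c944e11c5d74490ce62b550db46a96b32a7be76d68342dcc7fe9'
    . 'c87026090ebb99245f3ffd13f9ed185a'
);

// Placeholders: use the values from the client-hello and server-hello records.
$client_random = str_repeat("\x01", 32);
$server_random = str_repeat("\x02", 32);

$master_secret = prf_tls12($pre_master_secret, 'master secret', $client_random . $server_random, 48);
echo bin2hex($master_secret), "\n";
```

With the real random values filled in, the printed hex string should be the same as the master-secret on the matching CLIENT_RANDOM line.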
After this, we have our actual master-secret. You can verify that it is correct by checking it against the master-secret you can find in the SSLKEYLOGFILE.
Partitioning our master-secret
But a master secret isn’t a key yet. For this, we must repeat the same trick again, but with different data and values. It looks pretty much the same as what we did while generating the master-secret, except that we now use the actual master-secret as the secret (the pre-master secret is not used anymore and should officially be discarded and cleared from memory). The label we use is 'key expansion', and something that is easily overlooked: the server-random and client-random are reversed in this phase.
We ask for 72 bytes this time, and this value depends on the given cipher suite. The data that comes out, the $key_buffer, is used for 6 different variables: a client_mac, a server_mac, a client_key, a server_key, a client_iv, and a server_iv. Since we use TLS_RSA_WITH_RC4_128_SHA, the macs are 20 bytes, the keys are 16 bytes and the IVs are 0 bytes, taking us to 72 bytes in total.
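The key expansion step can be sketched as follows. The PRF is repeated in compact form so the snippet runs on its own, the sizes are the ones for TLS_RSA_WITH_RC4_128_SHA, and the master secret and random values are placeholders rather than real captured data.

```php
<?php
// TLS 1.2 PRF with sha256 (P_SHA256 from RFC 5246 section 5), compact form.
function prf_tls12($secret, $label, $seed, $length)
{
    $seed = $label . $seed;
    $output = '';
    $a = $seed;
    while (strlen($output) < $length) {
        $a = hash_hmac('sha256', $a, $secret, true);
        $output .= hash_hmac('sha256', $a . $seed, $secret, true);
    }
    return substr($output, 0, $length);
}

// Sizes in bytes for TLS_RSA_WITH_RC4_128_SHA.
$mac_size = 20; // SHA1 based mac
$key_size = 16; // 128 bit RC4 key
$iv_size  = 0;  // a stream cipher has no IV

// Client and server each get a mac, a key and an iv.
$key_block_size = 2 * ($mac_size + $key_size + $iv_size);

// Placeholders: use your own master secret and random values here.
$master_secret = str_repeat("\x0a", 48);
$client_random = str_repeat("\x01", 32);
$server_random = str_repeat("\x02", 32);

// Note the order: server_random first, then client_random.
$key_buffer = prf_tls12($master_secret, 'key expansion', $server_random . $client_random, $key_block_size);
echo strlen($key_buffer), " bytes of key material\n";
```

Computing the size from the three suite parameters keeps the 72 from being a magic number if you ever try another cipher suite.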
Next we “partition” the key_buffer into our variables, in the order listed above.
The reason that the IVs are empty is that RC4 is a stream cipher and does not use IVs. Notice that we have both client and server keys, macs and ivs. This is because communication from the client to the server uses different values than communication from the server to the client. If somebody cracks your client_key, they can only decode messages from the client to the server, not the communication from the server to the client.
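A minimal sketch of the partitioning could look like this. The function name partition_key_buffer and the array keys are my own naming for this example, and the key buffer is filled with the bytes 0 to 71 so you can easily see which slice ends up where.

```php
<?php
// Hypothetical helper: cut the key buffer into its six parts. The order is
// fixed: both macs first, then both keys, then both ivs.
function partition_key_buffer($key_buffer, $mac_size, $key_size, $iv_size)
{
    $sizes = array(
        'client_mac' => $mac_size,
        'server_mac' => $mac_size,
        'client_key' => $key_size,
        'server_key' => $key_size,
        'client_iv'  => $iv_size,
        'server_iv'  => $iv_size,
    );

    $parts = array();
    $offset = 0;
    foreach ($sizes as $name => $size) {
        // The cast keeps zero-length parts as '' on older PHP versions.
        $parts[$name] = (string) substr($key_buffer, $offset, $size);
        $offset += $size;
    }
    return $parts;
}

// Dummy 72 byte buffer (bytes 0x00 up to 0x47) instead of real key material.
$key_buffer = implode('', array_map('chr', range(0, 71)));

$keys = partition_key_buffer($key_buffer, 20, 16, 0);
foreach ($keys as $name => $value) {
    echo str_pad($name, 11), bin2hex($value), "\n";
}
```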
Decoding our data
Finally, we can decode our data!
But there is a big problem when dealing with RC4 in PHP. This cipher is available through the mcrypt extension. The downside of this library is that it does not behave the way TLS assumes. Initializing the cipher with our key puts it in a certain state, and decoding data changes this state. TLS assumes that the next data it encrypts/decrypts is processed with this new state. However, PHP’s mcrypt always “resets” the state. As a result, you can decrypt (or encrypt) the first TLS packet without problems, but not the packets after it.
For this reason, I’ve decided not to use the standard RC4 implementation through mcrypt. Instead, I will be using a custom RC4 class that I’ve written, loosely based on an RC4 implementation found in the php documentation. Fortunately RC4 is pretty simple in how it works, and this class saves the state so it can be used for TLS purposes.
To use it we just construct it with the secret key, and use the decrypt/encrypt methods (decrypt and encrypt do the same thing: encrypting something twice from the same starting state returns the original text, like rot13 and such).
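For illustration, a stateful RC4 class could be sketched like this. The class name Rc4Stream is made up for this example and the class in the repository may well look different; the point is only that the permutation and the two counters live in the object, so every call continues where the previous one stopped.

```php
<?php
// Sketch of an RC4 cipher that keeps its state between calls.
class Rc4Stream
{
    private $s;     // the 256 byte permutation
    private $i = 0; // stream position counters
    private $j = 0;

    public function __construct($key)
    {
        // Key scheduling: shuffle the permutation based on the key.
        $this->s = range(0, 255);
        $key_length = strlen($key);
        $j = 0;
        for ($i = 0; $i < 256; $i++) {
            $j = ($j + $this->s[$i] + ord($key[$i % $key_length])) % 256;
            $tmp = $this->s[$i];
            $this->s[$i] = $this->s[$j];
            $this->s[$j] = $tmp;
        }
    }

    // XOR the data with the next bytes of the key stream.
    private function process($data)
    {
        $out = '';
        $length = strlen($data);
        for ($n = 0; $n < $length; $n++) {
            $this->i = ($this->i + 1) % 256;
            $this->j = ($this->j + $this->s[$this->i]) % 256;
            $tmp = $this->s[$this->i];
            $this->s[$this->i] = $this->s[$this->j];
            $this->s[$this->j] = $tmp;
            $k = $this->s[($this->s[$this->i] + $this->s[$this->j]) % 256];
            $out .= chr(ord($data[$n]) ^ $k);
        }
        return $out;
    }

    public function encrypt($data) { return $this->process($data); }
    public function decrypt($data) { return $this->process($data); }
}

// Well-known RC4 test vector: key "Key", plaintext "Plaintext".
$cipher = new Rc4Stream('Key');
echo bin2hex($cipher->encrypt('Plaintext')), "\n"; // bbf316e8d940af0ad3
```

Because the state is kept, encrypting 'Plain' and then 'text' on one instance gives the same bytes as encrypting 'Plaintext' in one go, which is exactly the behaviour TLS relies on.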
Armed with our new class, we can do some decoding. The input is just the encrypted packets from the TLS connection. Decrypting them should output some hex numbers, but also a GET /favicon.ico HTTP/1.1 request. The second packet is something the server has sent to us, and thus we must use our server_cipher instance to decrypt it. Notice that I strip away some data at the end of each message (20 bytes). This is the MAC, which we can use to verify our message. It’s 20 bytes because our cipher suite uses a 20 byte mac (SHA).
Also make sure that when you decode TLS data, you process the packets in the order in which they arrive. You CANNOT decode the first HTTP packet without decoding the finished packet first (remember that RC4 keeps its internal state between encrypting/decrypting calls).
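The effect of the packet order is easy to demonstrate with a small simulation. This sketch does not use captured data: it encrypts two made-up client records with a placeholder key and a dummy 20 byte mac, then decrypts them again. The rc4_init and rc4_apply functions are a compact, function-style RC4 written for this example only.

```php
<?php
// Compact stateful RC4: the state array survives between calls.
function rc4_init($key)
{
    $s = range(0, 255);
    for ($i = 0, $j = 0; $i < 256; $i++) {
        $j = ($j + $s[$i] + ord($key[$i % strlen($key)])) % 256;
        $tmp = $s[$i]; $s[$i] = $s[$j]; $s[$j] = $tmp;
    }
    return array('s' => $s, 'i' => 0, 'j' => 0);
}

function rc4_apply(array &$st, $data)
{
    $out = '';
    for ($n = 0, $len = strlen($data); $n < $len; $n++) {
        $st['i'] = ($st['i'] + 1) % 256;
        $st['j'] = ($st['j'] + $st['s'][$st['i']]) % 256;
        $tmp = $st['s'][$st['i']];
        $st['s'][$st['i']] = $st['s'][$st['j']];
        $st['s'][$st['j']] = $tmp;
        $out .= chr(ord($data[$n]) ^ $st['s'][($st['s'][$st['i']] + $st['s'][$st['j']]) % 256]);
    }
    return $out;
}

$mac_size   = 20;                     // SHA1 based mac
$client_key = str_repeat("\x11", 16); // placeholder key

// Simulate the client: a finished message first, then the HTTP request.
$sender  = rc4_init($client_key);
$records = array();
foreach (array('finished-handshake-data', "GET /favicon.ico HTTP/1.1\r\n") as $plaintext) {
    $dummy_mac = str_repeat("\x00", $mac_size); // stands in for the real mac
    $records[] = rc4_apply($sender, $plaintext . $dummy_mac);
}

// Decode in arrival order with ONE cipher state, stripping the mac each time.
$receiver = rc4_init($client_key);
$decoded  = array();
foreach ($records as $record) {
    $decoded[] = substr(rc4_apply($receiver, $record), 0, -$mac_size);
}
echo $decoded[1];

// Skipping the finished record leaves the key stream at the wrong position.
$fresh   = rc4_init($client_key);
$garbage = substr(rc4_apply($fresh, $records[1]), 0, -$mac_size);
echo bin2hex($garbage), "\n";
```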
Verifying message integrity
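TLS also allows us to verify the integrity of a sent message. This uses the part of the message we stripped away (the 20 bytes). This number depends on the MAC function the cipher suite is using (SHA(1) in our case, which is 20 bytes). Verifying our message is not hard, but has some issues in PHP.
What we need to check is whether the mac that was received matches an HMAC-SHA1, keyed with $key, over the following fields in this order: the 64 bit sequence number, the 8 bit content type, the 16 bit TLS version, the 16 bit message length, and the message itself.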
$key is the client_write_mac or server_write_mac, depending on which message we want to verify. The sequence number is a number that increments every time we decode a packet. The 8 bit content type tells us what kind of TLS packet we have (normally this is 23, which means it’s an application data packet, which holds our encrypted HTTP data). The TLS version for TLS1.2 must be 0x0303, and the message length is just the regular strlen() of the message.
This check_mac is very much tied to TLS1.2 and SHA1, but as a proof-of-concept it works. One problem that PHP has is that it cannot pack 64 bit numbers. For now we work around this by using 32 bit numbers and hoping we will not run out of numbers. For a decent implementation, you probably need some bit-shifting and bit-masking to store the sequence number as binary data.
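A minimal sketch of such a check, including the bit-shifting for the sequence number, is shown below. It assumes a 64 bit PHP build (so an integer can hold the full sequence number) and PHP 5.6 or newer for hash_equals. The helper name tls12_record_mac, the key and the message are placeholders made up for this example.

```php
<?php
// Hypothetical helper: the mac over one TLS 1.2 record, HMAC-SHA1 only.
function tls12_record_mac($key, $seq, $content_type, $message)
{
    // 64 bit sequence number as two big-endian 32 bit halves
    // (assumes a 64 bit PHP build).
    $header = pack('NN', ($seq >> 32) & 0xFFFFFFFF, $seq & 0xFFFFFFFF)
            . pack('C', $content_type)     // 8 bit content type
            . pack('n', 0x0303)            // 16 bit TLS 1.2 version
            . pack('n', strlen($message)); // 16 bit message length

    return hash_hmac('sha1', $header . $message, $key, true);
}

function check_mac($key, $seq, $content_type, $message, $received_mac)
{
    // hash_equals compares in constant time.
    return hash_equals(tls12_record_mac($key, $seq, $content_type, $message), $received_mac);
}

// Placeholder values, not real captured data.
$key     = str_repeat("\x22", 20); // would be client_mac or server_mac
$message = "GET /favicon.ico HTTP/1.1\r\n";
$seq     = 1;  // assuming the finished message used sequence number 0
$type    = 23; // application data

$mac = tls12_record_mac($key, $seq, $type, $message);
var_dump(check_mac($key, $seq, $type, $message, $mac));     // bool(true)
var_dump(check_mac($key, $seq + 1, $type, $message, $mac)); // bool(false)
```

Splitting the number into two 32 bit halves sidesteps the need for a 64 bit pack format, and the same approach would work for a sequence number kept as two separate integers on a 32 bit build.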
Source
This should be enough to get you going and to decode some TLS data. I’ve put everything together a bit more neatly in a git repository. Check out the code at: https://github.com/jaytaph/TLS-decoder