greg.ewing@canterbury.ac.nz

http://www.cosc.canterbury.ac.nz/greg.ewing

March 2010

I had occasion to want to write a program to make some changes to some files that were protected by a CRC. The program that created the files was closed-source and too big for disassembly to be practical, so I wondered whether it would be possible to work out the polynomial and other parameters being used to generate the CRC from examining examples of the files.

After a rather long and interesting journey, I did find a way. I'm writing this essay to document the techniques that I developed, in case anyone else finds them useful.

The relevant parameters can be summarised as follows. Here I'm using the names that the Python program pycrc uses for them.

- Width - the number of bits in the final CRC result.
- Poly - the polynomial being used to generate the CRC, expressed as a bit string. (I will omit the leading 1 bit that is always understood to be present.)
- ReflectIn (boolean) - Whether to reverse the input bytes before applying the algorithm to them. (This is equivalent to processing the input bytes LSB-first instead of MSB-first.)
- XorIn - the initial value of the CRC register.
- ReflectOut (boolean) - whether to reverse the bits of the CRC before presenting it as the final result. (This is equivalent to mirroring the algorithm so that the register is shifted right instead of left.)
- XorOut - a value to be exclusive-ored with the final CRC value.

There were two bytes in the header that changed in an apparently random fashion, which I surmised was a 16-bit CRC. I was assuming it was a CRC rather than some other kind of checksum, because the application would report a "CRC mismatch" when attempting to open a modified file.

Since this was on Windows, it also seemed a fair bet that the CRC would be stored little-endian, and this later proved to be correct.

Another thing I needed to know was how much of the file the CRC covered. Most of the header appeared to be unused, and I found that I could change the last byte of the header without the application complaining. However, if I changed the last byte in the schema, it complained of a CRC mismatch. I concluded that the CRC covered everything in the schema area, including the pad bytes. (This turned out to be almost correct, but not quite.)

I found a program, pycrc, that among other things allows you to calculate a CRC for supplied data given the above parameters. The first thing I did was to try all the standard 16-bit CRC models that pycrc knows about, none of which worked.

Next I considered trying a brute-force search of all possible 16-bit polynomials. This seemed feasible, although not quite as straightforward as it might appear at first, because of the XorIn and XorOut parameters. Without knowing them, I wouldn't be able to tell when I'd found the right polynomial, and trying all possible combinations of Poly+XorIn+XorOut would obviously not be practical.

Fortunately, there are some properties of CRC algorithms that provide a way of getting around this difficulty.

We can also treat the initial value of the CRC register (the XorIn parameter) as generating a component of its own that gets xored into the final CRC. So we can express the CRC of a message in terms of three components:

C = T^{n}I + D + F

where + represents exclusive-or, and

I = the initial register contents (XorIn)

F = the final xor value (XorOut)

n = the length of the message

D = the homogeneous CRC of the data in the message

T^{i} x = the result of applying i shift-xor steps to register contents x with zero data

F = the final xor value (XorOut)

n = the length of the message

D = the homogeneous CRC of the data in the message

T

Here I'm borrowing a term from linear algebra and using "homogeneous CRC" to mean a CRC calculated using XorIn = XorOut = 0.

Now suppose we take two messages of the same length and exclusive-or them together. The exclusive-or of their CRCs is given by

C_{1} + C_{2} | = | (T^{n}I + D_{1} + F) + (T^{n}I + D_{2} + F) |

= | D_{1} + D_{2} |

The I and F terms have cancelled out, leaving us with just the homogeneous CRC of the exclusive-ored data. So we can ignore the XorIn and XorOut parameters initially, as long as we confine ourselves to studying this kind of message, which I will call a difference message.

Encouraged by this, I constructed another message with a different length and did the same thing again. I got some more hits. But unfortunately, none of them matched any of the ones I got the first time.

At this point, I was stuck. I didn't have any more parameters to vary, so I was beginning to doubt that I was dealing with a standard CRC algorithm at all. If I wasn't, it seemed unlikely that I would ever be able to figure it out.

Then I got into a conversation with Patrick Maupin, who suggested a test that might help to clarify whether it was a true CRC or not. Due to the superposition principle, if changing a message by xoring it with a bit pattern B1 causes its CRC to change by C1, and another bit pattern B2 causes the CRC to change by C2, then xoring the message with (B1 xor B2) should change the CRC by (C1 xor C2). If that doesn't happen, the algorithm can't be an ordinary CRC algorithm.

I constructed some suitable test messages, and found that this property did seem to hold. So there was hope that it could be a standard CRC algorithm, or something very similar to it.

I also began to wonder what I could learn by studying what happens to the CRC when individual bits of the file are changed. At first I thought that the CRC corresponding to a single 1 bit might be a rotation of the polynomial, but it turns out to be more complicated than that. However, after some more thinking along those lines, I came up with a possible plan of attack.

When the 1 bit arrives, the polynomial gets loaded into the register. After that, it gets transformed by the shift-xor operation once for each remaining 0 bit in the message, with the leading bit of the register determining whether to xor the polynomial in or not.

So the final CRC value will be equal to the polynomial transformed by n shift-xor cycles, where n depends on the position of the 1 bit in the message.

Now consider two CRC values obtained from two 1-bit messages, where the 1 bits are in adjacent positions. The resulting CRCs will differ by just one shift-xor cycle. To be precise, if C1 corresponds to the message with a 1 in position i, and C2 corresponds to the message with a 1 in position i+1, then C1 is derived from applying one shift-xor cycle to C2. (If this seems backwards, it's because the further the 1 bit is from the end of the message, the more shift-xor cycles get applied to the CRC.)

There are two possibilities. If the leading bit of C2 (the one about to be shifted out) is 0, then C1 will be equal to C2 shifted by one place. If it is 1, then C2 will be equal to C1 shifted one place and xored with the polynomial.

So all we need to do is find a C1 and C2 such that the leading bit of C2 is 1, shift C2 by one place, and xor it with C1. The result will be equal to the polynomial!

The byte values I came up with, and their corresponding CRC values (after byte swapping to correct for little-endianness) were as follows:

Bytes CRC

----- ----

02 00 763c

04 00 ec78

08 00 98f3

10 00 71e5

20 00 e3ca

40 00 8797

80 00 4f2d

00 01 9e5a

00 02 7cb7

00 04 f96e

00 08 b2df

00 10 25bd

----- ----

02 00 763c

04 00 ec78

08 00 98f3

10 00 71e5

20 00 e3ca

40 00 8797

80 00 4f2d

00 01 9e5a

00 02 7cb7

00 04 f96e

00 08 b2df

00 10 25bd

Working upwards through the list of CRCs, it is apparent that whenever the LSB of a CRC is 0, the preceding one is obtained by simply right-shifting it, which is consistent with the theory outlined above.

Now let's see if we can extract the polynomial. Applying the shift-xor operation to adjacent CRC values gives the following results, shown in blue:

02 00 763c

0000

04 00 ec78

a001

08 00 98f3

a001

10 00 71e5

0000

20 00 e3ca

a001

40 00 8797

a001

80 00 4f2d

0000

00 01 9e5a

a001

00 02 7cb7

0000

00 04 f96e

a001

00 08 b2df

a001

00 10 25bd

0000

04 00 ec78

a001

08 00 98f3

a001

10 00 71e5

0000

20 00 e3ca

a001

40 00 8797

a001

80 00 4f2d

0000

00 01 9e5a

a001

00 02 7cb7

0000

00 04 f96e

a001

00 08 b2df

a001

00 10 25bd

From this it is clear that the polynomial is 0xa001.

A few other things are also apparent. The shifting direction indicates that the ReflectOut parameter should be True, since shifting to the right is equivalent to using the canonical left-shifting version of the algorithm with the polynomial 0x8005 and then reflecting the resulting CRC. It is notable that 0x8005 is one of the standard 16-bit polynomials -- the one that pycrc calls "crc-16".

The fact that the 1 bits proceed from right to left within each byte as we go down the sequence indicates that bits are being processed LSB-first. In terms of the model parameters, this means ReflectIn = True.

I had now established all except two of the CRC algorithm parameters. I was making considerable progress!

The first thing I did was go back and try pycrc again, in case I had made a mistake of some kind the first time around. Armed with knowledge of the polynomial, I should have been able to run pycrc over one of my difference messages and obtain the correct CRC for it. But, it still didn't work.

I was convinced by now that I was dealing with a very standard CRC algorithm, so this failure was perplexing. Thinking about what could cause it, it occurred to me that my assumption about what region of the file was covered by the CRC might be wrong. Perhaps not all of the pad bytes were included, in which case I was going too far and ending up with the wrong CRC.

Trying to think how I could tell how far I should be going, I came up with the following idea. Start by initialising the register with the polynomial -- this corresponds to the state just after encountering the 1 in a 1-bit difference message. Then run the algorithm and count the number of steps required before the known CRC value is reached. Assuming it was eventually reached, that would tell me how many 0 bits following the 1 were included in the CRC.

The test file I used had a 1 in the byte at offset 0xab. Running the algorithm told me that a further 0x358 zero bytes were needed to reach the correct CRC, meaning that the last byte was at offset 0x403... which was 4 bytes past the end of the file!

So an extra 4 bytes from somewhere were being included in the CRC. To check this, I added another 4 zero bytes to the end of my difference file and ran pycrc over it. This time, I got the correct checksum.

Previously I had assumed that the range of checked bytes was contiguous, but now I retracted that assumption and went back to investigate the header area more thoroughly. Poking around in it, I found that apart from the CRC itself, there were three other bytes that would provoke a CRC mismatch if I changed them.

One of them had a constant value of 0x04 in all the files I looked at. The other two bytes appeared to contain the length of the schema portion of the file, excluding the header.

My first thought was that perhaps these were three of the bytes making up the extra data, or that the extra data was derived from them in some way. However, I realised that there was another likely explanation for the CRC depending on the length bytes. If the application was using these bytes to tell how much schema data to read, then changing them would cause it to calculate a CRC for the wrong amount of data. So the contents of the extra bytes needn't have anything to do with the length bytes at all.

There was another curious thing about the extra bytes. The fact that I was able to successfully calculate CRCs for my difference files meant that the unseen contents of these bytes must have been the same for both of the files that went into each difference file. Otherwise, they would have differed by more than one bit, and my technique for deducing the polynomial wouldn't have worked.

This suggested that the extra bytes might be constant, or at least they might depend only on the length of the file and not the data in it. However, it might only have been a coincidence, because I had only been working with files having very small differences, and I might just not have happened to change anything that would trigger a difference in the extra bytes.

It would be fortuitous if the extra bytes were constant, because since they appear at the end of the data being checked, their effect on the CRC depends only on their contents and not on the length of the file. If they were constant, their actual contents wouldn't matter, because I would be able to assume they were zero and compensate for their effect by making an adjustment to the XorOut parameter.

In any case, I had gone about as far as I could go using difference files, and it was time to start working with the original files.

The XorOut values that I obtained this way were not 0 or 0xffff, but this was to be expected, because the actual contents of the extra bytes may not have been zero. However, all of the files of a given length seemed to require the same XorOut value, providing more support for the idea that the extra bytes are the same for all files of a particular length.

To further check this, I created some more test files with very different contents. The hypothesis still seemed to hold up -- the XorOut values required for a particular length was always the same.

I needed different XorOut values for different lengths, however. This could have been because the extra bytes varied somehow according to the length of the file, but it could also have been because a non-standard XorIn was being used. This would get transformed in different ways as it passed through different numbers of shift-xor steps and change the resulting CRC.

I decided to stick with the hypothesis that the extra bytes were constant, and try to find an XorIn value consistent with it. This meant finding a single pair of XorIn/XorOut values that would work across files of different sizes.

Now, I could have used brute force again here. There are only 2**16 possibilities, because choosing an XorIn for some file uniquely determines the XorOut needed, and then it's just a matter of trying the same combination on another file and seeing if it works.

However, I wanted to see if I could come up with a more direct approach, partly for the challenge, and partly because it would give me a technique that could be applied to larger CRCs for which exhaustive searching would not be feasible.

Let n1 and n2 be the lengths of the messages. The exclusive-or of their CRCs will be

C_{1} + C_{2} | = | (T^{n1}I + D_{1} + F) + (T^{n2}I + D_{2} + F) |

= | T^{n1}I + T^{n2}I + (D_{1} + D_{2}) |

Notice that F has cancelled out again, but not I. We can further rearrange the equation to give

T^{n1}I + T^{n2}I | = | C_{1} + C_{2} + D_{1} + D_{2} |

or

(T^{n1}+ T^{n2}) I | = | K |

where K = C

Adding T

In particular, we can represent T using a matrix, and calculate Tx using matrix multiplication with mod-2 arithmetic.

To calculate T, we decompose x into a sum of basis elements b

Having got T, we can calculate M = T

When I tried to do this, the elimination algorithm got stuck, unable to find a pivot element for the last row. Examining the matrix revealed that it had ended up with an all-zero row. When this happens, it is an indication that the equation has more than one solution.

The final state of the matrix looked like this:

0: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

2: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1

3: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

4: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

5: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

6: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

7: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

8: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

9: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

10: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

11: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

12: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1

13: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

1: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

2: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1

3: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

4: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

5: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

6: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1

7: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

8: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0

9: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

10: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

11: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0

12: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1

13: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0

14: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1

15: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This corresponds to the following set of equations for the bits of I:

b_{0} + b_{15} | = | 0 | b_{8} | = | 0 | |

b_{1} | = | 1 | b_{9} | = | 0 | |

b_{2} | = | 1 | b_{10} | = | 0 | |

b_{3} | = | 0 | b_{11} | = | 0 | |

b_{4} | = | 0 | b_{12} | = | 1 | |

b_{5} | = | 0 | b_{13} | = | 0 | |

b_{6} | = | 1 | b_{14} + b_{15} | = | 1 | |

b_{7} | = | 0 | _{} |

By inspection, there are two bit patterns that satisfy these constraints, 0x5046 and 0x9047. Trying them on the relevant files revealed the corresponding XorOut values required:

XorIn | XorOut | |

5046 | 3e64 | |

5047 | fe65 |

I tried these values on the rest of my test files, and both combinations worked for all of them, regardless of the file size.

An Aside It's interesting to note that the difference between these two XorIn values (0xc001) is also the difference between the corresponding XorOut values. It turns out that this is not a coincidence, because 0xc001 is a fixed point of the shift-xor operation: (0xc001 >> 1) ^ 0xa001 == 0xc001. So if you change the initial register value by 0xc001, you change its value on all subsequent steps by 0xc001 also. In terms of linear algebra, the value e = 0xc001 is an eigenvector of the matrix T with eigenvalue 1, i.e. T e = e. This also means that T ^{n} e = e for any n. So (T^{n1} + T^{n2}) (x + e) = (T^{n1} + T^{n2}) x + e + e = (T^{n1} + T^{n2}) x.Consequently, if X is a solution of (T ^{n1} + T^{n2}) x = K, then X + e must also be a solution. |

I never found out exactly what was going on with the extra bytes, or why such strange-looking XorIn/XorOut values were needed, but I had found an algorithm which seemed to be good enough, and I was satisfied with that.

So if I ever need to figure out a 64-bit CRC, I'll be able to do it without needing a Blue Gene. And now that you've read this, you will too!