The Illegal Prime

Hello, reader. An offering, for your perusal:

It’s unbelievable today, but there was a time when the government classed crypto as a munition and made it illegal for anyone to export or use it on national security grounds. Get that? We used to have illegal math in this country.

The National Security Agency were the real movers behind the ban. They had a crypto standard that they said was strong enough for bankers and their customers to use, but not so strong that the mafia would be able to keep its books secret from them. The standard, DES-56, was said to be practically unbreakable. Then one of EFF’s millionaire co-founders built a $250,000 DES-56 cracker that could break the cipher in two hours.

Still the NSA argued that it should be able to keep American citizens from possessing secrets it couldn’t pry into. Then EFF dealt its death-blow. In 1995, they represented a Berkeley mathematics grad student called Dan Bernstein in court. Bernstein had written a crypto tutorial that contained computer code that could be used to make a cipher stronger than DES-56. Millions of times stronger. As far as the NSA was concerned, that made his article into a weapon, and therefore unpublishable.

Well, it may be hard to get a judge to understand crypto and what it means, but it turned out that the average Appeals Court judge isn’t real enthusiastic about telling grad students what kind of articles they’re allowed to write. The crypto wars ended with a victory for the good guys when the 9th Circuit Appellate Division Court ruled that code was a form of expression protected under the First Amendment — “Congress shall make no law abridging the freedom of speech.” If you’ve ever bought something on the Internet, or sent a secret message, or checked your bank-balance, you used crypto that EFF legalized. Good thing, too: the NSA just isn’t that smart. Anything they know how to crack, you can be sure that terrorists and mobsters can get around too.

From the amazing (and edifying!) Little Brother, by Cory Doctorow. But no, this little burst of text isn’t about this (scary) modern-day dystopian novel, or the EFF, or crypto, or security. It’s about a 1401-digit prime number that showed up in my feeds via Everything2: a mystery in a puzzle in a mystery. It’s about Unix, and about peeling back layers. This prime number is illegal. I’m going to paste it anyway, then tell you why:


What is it for?

When written in base 16 (hexadecimal), this 1401-digit prime number found by Phil Carmody forms a gzip file of the C source code that decrypts the DVD movie encryption scheme (DeCSS). This prime number is illegal in every country where the DMCA applies.


If you’re like me, you’re firing up a terminal right now to unfurl the layers one after another, driven by burning curiosity to see what the seed contains. If you’re not… well, aren’t you a little curious? Of course you are.

Let’s get on a Unix shell and start peeling.

Getting the number

If you select all of the digits in the number above using your mouse, you’ll pull in a lot of newline (and/or carriage return) characters, even if you try copying the HTML source. If you’re going to deal with HTML source anyway, let’s make this easy on ourselves.

I read about it on Everything2, and you can use this URL to access the number from there, or you could use the URL to this page itself.

Now we pull down the HTML so we can easily parse the number:

$ curl -s

That should spew out the HTML containing this number. If ‘curl’ is not your thing, use wget (or your browser) to save the page to a file, and replace the curl command above with ‘cat yourfilename’.

Next, let’s find this block of text. It’s wrapped in either a <blockquote> or a <pre> tag, so we use that in a regular expression. There may be more than one such section on the page, so we use a dirty hack in the pattern: we put the first few digits of the number into it, so the other blocks don’t match. (Since awk matches line by line, this also assumes the opening tag, the digits and the closing tag all sit on one line of HTML.)

$ curl -s | awk '/(<pre>|<blockquote>).*4856.+(<\/pre>|<\/blockquote>)/ {print}'

We could have used grep instead of awk, but we’ll get to that in a moment. If the command works, you should see the illegal prime on screen with a bunch of <br> and other tags mixed in. Let’s get rid of those, shall we?

$ curl -s | awk '/(<pre>|<blockquote>).*4856.+(<\/pre>|<\/blockquote>)/ {gsub(/[^[:digit:]]/,""); print}' 

gsub does a global substitution, replacing every character that is not a digit, /[^[:digit:]]/, with the empty string. This should print just the number on the screen. Things will get very unwieldy soon, so we’ll store this number (actually a string) in a variable. In Bash, command substitution does the job:

$ prime=$(curl -s | awk '/(<pre>|<blockquote>).*4856.+(<\/pre>|<\/blockquote>)/ \
{gsub(/[^[:digit:]]/,""); print}')

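Note the ‘\’ followed by a newline, splitting the command onto two lines. Because it sits inside awk’s single-quoted program, it’s awk, not the shell, that treats it as a line continuation. If you type the whole thing on one line, leave both out.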
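To see what the awk filter does without touching the network, here’s a small sketch run against a made-up scrap of HTML. The sample lines and the strip_to_digits name are hypothetical, not taken from the real page. It also shows tr -cd '[:digit:]', which deletes every character outside the digit class, as a shorter way to do the stripping once the right block is isolated.

```shell
# Hypothetical stand-in for the page: the prime's <pre> block on one line,
# plus another preformatted block that must NOT match the pattern.
sample='<pre>48561234<br>5678<br>90</pre>
<pre>12 34 not the prime</pre>'

# The filter from the text: only a line with an opening tag, "4856" and a
# closing tag matches; gsub then deletes every non-digit character.
printf '%s\n' "$sample" |
  awk '/(<pre>|<blockquote>).*4856.+(<\/pre>|<\/blockquote>)/ {gsub(/[^[:digit:]]/,""); print}'
# prints: 48561234567890

# Hypothetical helper doing the stripping with tr instead:
# -c complements the set and -d deletes, so only digits survive.
strip_to_digits() {
  tr -cd '[:digit:]'
}

printf '%s\n' '<pre>48561234<br>5678<br>90</pre>' | strip_to_digits
echo   # tr also deletes the trailing newline, so add one back
```

Either way you end up with one long line of digits, which is exactly what the prime=$(...) assignment captures.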

Converting it to hex

To see $prime in hex, we use bc, the handy command line calculator:

$ echo "obase=16; $prime" | bc

This should print out the number in hex. Here are the first few bytes:


Oops. bc wraps long numbers: it ends each output line with a backslash and a newline, splitting the text across multiple lines. Let’s get rid of both using tr, the character translation utility:

$ echo "obase=16; $prime" | bc | tr -d '\\\n'

There are three \’s in the set we’re deleting. Inside single quotes the shell leaves them alone, so tr receives all three: it reads \\ as one literal backslash and \n as a newline. It’s tr doing the escaping here, not the shell. Phew.
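A quick way to convince yourself the tr incantation works is to try it on a fake bc-style line (the digits below are made up). printf turns \\ into a backslash and \n into a newline, mimicking bc’s wrapping, and od -c makes the invisible characters visible. As an aside, GNU bc’s manual documents a BC_LINE_LENGTH environment variable, where 0 disables the wrapping in recent versions; other bc implementations may not honour that, so the tr step remains the portable fix.

```shell
# Fake bc output: a backslash and a newline where bc would wrap a long number.
wrapped=$(printf '1F8B08\\\n0800\n')

# Show the hidden characters: od -c prints a '\\' and a '\n' in the middle.
printf '%s\n' "$wrapped" | od -c

# tr sees the set \\ (a literal backslash) and \n (a newline) and deletes both.
printf '%s\n' "$wrapped" | tr -d '\\\n'
echo   # prints: 1F8B080800 (the echo just ends the line)
```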

As a check, here’s what the gzip specification says about the header of any gzip file:

The first two bytes have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139 (0x8b, \213), to identify the file as being in gzip format.

1F8B, indeed. We’re almost there.
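Two cross-checks, as a sketch. Read as a single two-byte number, ID1 and ID2 make 31 × 256 + 139 = 8075, so converting 8075 to hex (with printf here, or with bc and obase=16) gives 1F8B. And any stream that really comes out of gzip begins with the bytes 1f 8b, which od can show; the text being compressed is arbitrary. looks_like_gzip is a hypothetical helper name, not a standard tool.

```shell
# ID1 = 31 and ID2 = 139 side by side form the 16-bit value 31*256 + 139.
magic=$(( 31 * 256 + 139 ))
echo "$magic"                 # 8075
printf '%X\n' "$magic"        # 1F8B, the same digits bc prints with obase=16

# The first two bytes of real gzip output as hex (-N2 reads only two bytes).
printf 'any text at all\n' | gzip -c | od -An -tx1 -N2   # shows: 1f 8b

# Hypothetical check on the hex string produced by the bc | tr step.
looks_like_gzip() {
  case $1 in
    1F8B*|1f8b*) return 0 ;;
    *) return 1 ;;
  esac
}
looks_like_gzip 1F8B0808 && echo "starts with the gzip magic"
```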

Dumping it into a binary file

The above stream can’t be unzipped! This bewildered me for a few minutes. I suppose this is what happens when your mental model of how a computer stores data is as badly broken as mine.

Of course it can’t be unzipped. It’s still just ASCII text: a stream of characters that spell out hex digits, not the bytes those digits describe. We need to write the bytes themselves into a binary file.

How do we do that? Sure, you can write some code in C to do it. (Scan as hex and use putchar().)
But you know what they say about Unix: If you need to do it, it’s been done. Hmm. Let’s search for a bit:

$ apropos --and hex dump
hd (1)               - ASCII, decimal, hexadecimal, octal dump
hexdump (1)          - ASCII, decimal, hexadecimal, octal dump
xxd (1)              - make a hexdump or do the reverse.

Slowly I realize how true this adage is. Enter xxd. Some thinking reveals that we need a reverse hex dump. A hex dump prints the contents of a binary file as a stream of ASCII-encoded hex characters; we’re trying to do the opposite. After reading the man page, we construct the reverse hex dump (-r for reverse, -p for plain hex with no offsets or columns):

$ echo "obase=16; $prime" | bc | tr -d '\\\n' | xxd -r -p

If all goes well, we should see a stream of unprintable characters (no, not that type) desperately trying to print themselves on the screen.
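If xxd isn’t installed (it ships with Vim, so minimal systems may lack it), the C idea from earlier, read two hex digits and putchar() the byte, fits in a few lines of Bash. This is a sketch: hex2bin is a made-up name, it relies on Bash’s builtin printf understanding \xHH escapes (plain POSIX sh doesn’t promise that), and it assumes an even number of hex digits with no whitespace. od plays the part of the forward hex dump, to show the round trip.

```shell
# hex2bin: hypothetical helper, a poor man's `xxd -r -p` for Bash.
# Takes a string of hex digits as its first argument and writes the
# corresponding raw bytes to stdout, two digits per byte.
hex2bin() {
  local hex=$1 i
  for (( i = 0; i < ${#hex}; i += 2 )); do
    # "\xHH" in Bash's printf format string is emitted as the single byte 0xHH.
    printf "\\x${hex:i:2}"
  done
}

# Reverse with hex2bin, forward again with od: the bytes survive the trip.
hex2bin 1F8B0808 | od -An -tx1    # shows: 1f 8b 08 08
hex2bin 48656C6C6F; echo          # prints: Hello
```

With the real number you’d feed it the whole tr output, e.g. hex2bin "$(echo "obase=16; $prime" | bc | tr -d '\\\n')", in place of xxd -r -p.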

On to the source

The rest is trivial. We have the gzipped stream pouring out of ‘xxd’, so let’s pipe it into ‘zcat’:

$ echo "obase=16; $prime" | bc | tr -d '\\\n' | xxd -r -p | zcat

And that’s it. If ‘zcat’ doesn’t work for some reason, try ‘gunzip -c’ (or ‘gzip -dc’).
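Before pointing the whole pipeline at the real prime, you can rehearse it on harmless data. This sketch gzips a line of made-up C, turns the compressed bytes into one long uppercase hex string like the one bc | tr produced (od -v stops od from abbreviating repeated lines), then runs the same xxd -r -p and decompression tail. It assumes gzip and xxd are installed; gzip -dc does the same job as zcat and gunzip -c.

```shell
# A harmless stand-in for the hidden source file.
src='int main(void) { return 0; }'

# Compress it, then dump every byte as one long line of uppercase hex.
hex=$(printf '%s\n' "$src" | gzip -c | od -An -v -tx1 | tr -d ' \n' | tr 'a-f' 'A-F')
echo "$hex" | cut -c1-4           # 1F8B: the gzip magic again

# Same tail as the real pipeline: hex text -> raw bytes -> decompressed text.
printf '%s' "$hex" | xxd -r -p | gzip -dc
```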

The illegal source code hidden in the illegal prime should be floating on your screen now.

What, you thought I’d put it here? It’s illegal.

Probably more illegal than printing the number, anyway. If you skipped to the end hoping to glance at the code, here’s an unreadable composite command for you instead. (Again, note the line-breaking ‘\’ inside awk’s program, added to keep this webpage from breaking.) Hide your eyes, they who fear line noise:

$ curl -s | awk '/(<pre>|<blockquote>).*4856.*(<\/pre>|<\/blockquote>)/ \
{gsub(/[^[:digit:]]/,""); print "obase=16;" $0}' | bc | tr -d '\\\n' | xxd -r -p | zcat

Or, you know, you could just read the code on Wikipedia. I didn’t. I don’t know anything about crypto, but I expected a certain kind of joy in deciphering this simple code, something I could actually hope to do. I wasn’t disappointed.

The wormhole

Now, the DMCA isn’t at odds with the opening excerpt: it’s all right to write code that does absolutely anything, but under the DMCA it’s illegal to use it to break copy-protection measures, or to distribute tools built for that purpose.

Except for one little thing, though. There is still illegal math.

A request, then: Someone explain that C source to me!

Further reading:

  • Little Brother, as described by Cory Doctorow: “This book is meant to be part of the conversation about what an information society means: does it mean total control, or unheard-of liberty? It’s not just a noun, it’s a verb, it’s something you do.”
    This book is licensed under Creative Commons, and is free to download. It’s also a cracker of a novel.

  • Regular expressions are awesome. I cannot stress this enough. If you want to see a whimsical but fantastic use of regexes, try this.

  • Phil Carmody’s Titanic Primes. See for yourself.

  • The gzip specification: Just to see what goes into making a standard. Why did they pick 1F8B as the gzip header, for instance?

  • The Unix utilities awk, bc, tr and curl. Of these, only tr actually ships in GNU coreutils (awk, bc and curl are separate packages), but the whole coreutils pack is awesome: more functionality than you’ll ever need, in a terse, powerful set of commands.

  • Hex Dump Code Golf, a game played by the Stack Overflow community where the objective is to write a reverse hex dump in the language of your choice in the fewest chars possible. Perl won.

Old comments from the w0rdpress days

Traums said:
9 August, 2010 at 3:14 am

Although I doubt there is much in the fact that that number happened to be a prime. zcat seemed to detect and ignore some ‘trailing garbage’. Besides, the number can be altered drastically by changing variable names and adding comments to the *.c file. I don’t know how ergodically that number can be made to vary, and whether it is ‘sufficiently easy’ to touch other nearest primes. Does the decrease in prime-number density flatten after some big number? Must summon Neels in on this one.

There is a self-referential GEB-esque philosophy to the illegality of math (or any string of symbols). The statement “If one prints out all the digits of π, one will eventually encounter the entire Windows 7 source code” might be as profound as it is profoundly useless. But I’d wager that with appropriate choice of block chaining or cypher feedback rules, one can make out the first n digits of π to be the Windows 7 source code.

Crypto munitions brought back memories of

The decryption code itself demands another blogpost. But now we’ll have to buy a commercial DVD and break the law :)
