CueCat Output Decode and Decipherment

CueCat Output Decode and Decipherment
http://oilcan.org/cuecat/base64.html

Stephen Satchell, Thursday, September 21, 2000.
Copyright 2000 Stephen Satchell -- all rights reserved. Anyone is free to link directly to this essay from any Web site. Reproduction, but not for profit, is hereby granted as long as this
Copyright notice remains attached. Any commercial use of the information of this paper requires that appropriate publishing rights be secured from the author.
Table of Contents
1.
2.
3.
4.
5.
6.
7.
8.
9.
Goal of this paper

Genesis: What CueCat Returns in the Raw
Analyzing the Coding
Cryptoanalysis, using known-plaintext attack
Checking Our Crypto-Work
The Final Two Code Positions
Putting It All Together
Conclusions
So What's Next? (humor)
Goal
There have been claims that some people have gone to extreme lengths to determine the output scheme for the Digital Convergence CueCat. This essay
intends to show that the determination of the output of the CueCat is simple and straightforward, and requires a minimum of effort and cryptoanalysis.
The other purpose of this essay is to show that the intent of the encoding scheme is not to "protect" the output from being misused, but instead the output was
designed to be placed into a URL with a minimum of post-processing on the part of the program that sends barcode output to a central site.
Genesis: What CueCat Returns in the Raw

The method used by the Cuecat barcode wand to return information from scanned barcodes is fairly simple. First, the keyboard version of the wand emits
standard IBM PC AT keyboard codes, which is interpreted by the keyboard controller in the IBM-compatible personal computer and the software driver in
the BIOS or operating system (MS-DOS, Windows, OpenBSD, BeOS, whatever). The keyboard codes generate keystrokes that correspond to keyboard
1 of 9
15/07/2012 11:29 PM
characters.
Here are some examples of the output of the barcode wand:
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.CNf7.edCMmYSMlwmqiJCGkYyVlW. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.aabI.F39-FN5+Fx19. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3rZDNzWENP6. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v6ENzWENP6. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v7CxzWENP6. <enter>
As is easy to inspect, the code consists of the "header", a period, a series of graphic characters, a second period, a much shorter series of graphic characters, a
third period, a series of graphic characters, a fourth period, and an ENTER code.
First conclusion: the period characters delimit fields of data. So let's break the data into the obvious fields and see what we get:
C3nZC3nZC3nZCNzYDNPYC3nX
CNf7
edCMmYSMlwmqiJCGkYyVlW
aabI
F39-FN5+Fx192
cGf2
ENr7CNz0DxjZD3rZDNzWENP6
cGf2
ENr7CNz0DxjZD3v6ENzWENP6
cGf2
ENr7CNz0DxjZD3v7CxzWENP6
The first column of data is constant. When different barcode wands are tried, though, this number changes, so it's obvious that this represents some sort of
identifier. The second and third fields change; the second field is a fixed-length field, while the third field varies. We don't show the barcodes that were
scanned, but the first one is a base-128 representation of the author's name, the second is a "test" barcode in Q-fur format, and the third is the ISBN
"BookLan" barcode of the author's book Linux IP Stacks Commentary. The fourth barcode is particularly interesting in that it is the BookLan scan of a sister
book in the Coriolis series, Linux Core Kernel Commendary -- interesting because the ISBN codes are exactly one digit apart. Finally, the last barcode is of
another companion book, Apache Server Commenary , which is one down from the IP Stacks number. In other words, we have an interesting bracketing of
numbers.
Analyzing the Coding

The following table shows the correspondence between the ISBN numbers and the scan codes in the third column:
2 of 9
Title
ISBN
Wand coding
Apache Server Commentary
978157610468253999
ENr7CNz0DxjZD3v7CxzWENP6
15/07/2012 11:29 PM
Linux Core Kernel Commentary
978157610469953999
ENr7CNz0DxjZD3v6ENzWENP6
Linux IP Stacks Commentary
978157610470553999
ENr7CNz0DxjZD3rZDNzWENP6
Now, let's eliminate all the "common" information and see what changes from code to code:
Title
ISBN delta
Wand coding delta
Apache Server
682
v7Cx
Linux Core Kernel
699
v6EN
Linux IP Stacks
705
rZDN
Let's look at the bit patterns. We'll take the bits of the expected data and group them into six bits, the standard for base-64 conversion. Here is what we get:
682
001101 100011 100000 110010
13 35 32 50
v 7 C x
699
001101 100011 100100 111001
13 35 36 57
v 6 E N
705
001101 110011 000000 110101
13 51 0 53
r Z D N
Cryptoanalysis, using known-plaintext attack

OK, so we have come this far, and can't establish a one-to-one correspondence with code positions and characters in the wand report. Now it's time to do
some more experimentation. To do this, we need a way of generating our own barcodes. There are plenty of barcode generators out there; the GNU Barcode
system for Linux works pretty well, and there are demo systems that run under windows. Using any one of those, we now generate a Code 39 (3-of-9)
barcode in which we code a string of zeros, and scan it to obtain this result:
.C3nZC3nZC3nZCNzYDNPYC3nX.ahb6.C3nZC3nZC3nZC3m.
One thing pops out very quickly: the pattern "C3nZ" repeated three times, corresponding to the three sets of three zeros in our test barcode. We also see this
3 of 9
15/07/2012 11:29 PM
in the first field, which suggests that the field also contains zeros in it -- for this particular wand, there appears to be at least nine leading zeros. So let's start
looking at bit patterns again:
000
00110000 00110000 00110000
001100 000011 000000 110000
12 3 0 48
C 3 n Z
Whatever scrambling is being done, it has to be done after the conversion from base-64 coding to 8-bit data. Whatever code is being used doesn't use any
form of cypher streaming, because identical input appears to generate identical output.
The generation of base-64 code sets usually follows a preset pattern. For example, the classic uuencode(1) program used the decode table
`!"#$%&'()*+,-./
0123456789:;<=>?
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
Which is the ASCII collating sequence (except that the space is replaced by the backquote). Most variants of the base-64 scheme tend to use regular
sequences for their translate tables. The most common alternative for base-64 make use of the upper-case letters, lower-case letters, and two special
characters to make up the alphabet. So let's assume that the coding follows this pattern, so we have an initial table that looks like this:
ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789??
Now let's see what we get when we take our "zero sequence" and apply it to our data.
C 3 n Z
2 55 39 25
000010 110111 100111 010101
00001011 01111001 11010101
Nope, that didn't work; the three resulting characters are not the same. Let's reverse the capital and lower-case letters and try again.
abcdefghijklmnop
qrstuvwxyzABCDEF
GHIJKLMNOPQRSTUV
WXYZ0123456789??
4 of 9
15/07/2012 11:29 PM
C 3 n Z
28 55 13 51
011100 110111 001101 110011
01110011 01110011 01110011
Eureka! We have a constant pattern. When we subtract out the desired value of the character '0', or 0011000, we get a delta pattern of 01000011 or the ASCII
character 'C'. Now the question is, how is this delta applied? Best guess is that we use an exclusive-OR operation on each character, because that way there
is no end-around wrap problem.
Checking Our Crypto-Work

Now that it's established that we have a base-64 code, we need to recalculate the deltas so that three source digits correspond to four code digits, and that we
maintain block order. So the revised delta table looks like this.
Title
ISBN delta
Wand coding delta
Apache Server
468253
D3v7CxzW
Linux Core Kernel
469953
D3v6ENzW
Linux IP Stacks
470553
D3rZDNzW
Given our table of a..zA..Z0..9??, let's see if we can see the data come out correctly:
468253
00110100 00110110 00111000 00110010 00110101 00110011
01000011 01000011 01000011 01000011 01000011 01000011 pattern
01110111 01110101 01111011 01110001 01110110 01110000 XOR
011101 110111 010101 111011 011100 010111 011001 110000
29 55 21 59 28 23 25 48
D 3 v 7 C x z W
And we have a match. I leave the other two ISBN codes to you, the student, as an exercise.
The Final Two Code Positions

There seems to be some confusion about positions 62 and 63 of the base-64 conversion table. Comparisons of output from multiple units shows that there
are differences from unit to unit. So far, it appears that position 62 is represented by a plus sign (+) or a comma (,) depending on the version of wand you
have. Position 63 is most frequently represented by a minus sign (-), but there has been a report that position 63 may be represented by a forward-slash (/).
5 of 9
15/07/2012 11:29 PM
The best way to find out what's going on is to open this test document (PDF), print it on a 600-dpi laser printer, and (using the driver of your choice) swipe
the barcode. See for yourself what characters come back -- within the code report you will see the representation for both positions 62 and 63. If you see
something other that what's described on the page, let me know and I'll update this information.
Putting It All Together

The proof of the concept is to code up a decoder and try it. So here is the source of a C-language filter that implements all the stuff above. It has been tested
under Linux and in the Windows environment using the Borland C++ compiler. Here are the results of the five scans we documented at the beginning of this
paper:
.C3nZC3nZC3nZCNzYDNPYC3nX.CNf7.edCMmYSMlwmqiJCGkYyVlW.
000000000151591002 128 Stephen\x20Satchell\x00
This barcode was created using the B-Coder program. It is a Code-128 encoding of the author's name. The trailing null is an artifact of the program that
signifies that the "extra" data at the end of the barcode consisted of nulls, and therefore is of no consequence.
.C3nZC3nZC3nZCNzYDNPYC3nX.aabI.F39-FN5+Fx19.
000000000151591002 CC! <<<===>>>
This particular barcode was created using the Azalea Software package, which is a "Cue" with the special encoding seen here. The purpose of the code is to
investigate the 62/63 problem described above. This encoding is guaranteed to exercise that portion of the base-64 table.
.C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3rZDNzWENP6.
000000000151591002 IB5 978157610470553999
.C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v6ENzWENP6.
000000000151591002 IB5 978157610469953999
.C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v7CxzWENP6.
000000000151591002 IB5 978157610468253999
These three barcode scans are the ISBN numbers from the three Coriolis books. Comparing the output of the barcode wand with the source material, we see
that the code itself is correct.
In the process, we learned that the particular barcode wand used to decode output has a serial number of 151,591,002. Because I doubt that 151 million of
these things have been shipped, I supposed that the last three digits may indicate a code revision level within the barcode wand itself. As we see additional
6 of 9
15/07/2012 11:29 PM
samples of barcode wands, we may or may not be able to justify this speculation.
The second field reveals itself to identify the particular symbology detected by the barcode wand when scanning the field. The capabilities page shows all the
symbology fields that have been confirmed by the author; others have reported additional report values as well that have not been confirmed and added to the
list.
Scanning several dozen different barcodes with human-readable values, including proprietary cues, shows that the code developed to the guidelines described
here works fine. This completes the analysis of the output of the CueCat.
Conclusions
The choice of character code set used with the CueCat is obvious on its face when you compare the codeset with the character set allowed in HTML forms
reply strings. First, let's review how HTML forms work.
When a forms-capable browser is asked to submit form data to the HTTP server, it does it in one of two ways. Using the GET method, the form data
becomes part of the URL itself. Using the POST method, the data is placed in a datastream. In either case, the data is in a very specific format. In general,
the data itself is defined in BNF as follows:
<stream>
::= <element>
| <stream> & <element>
<element> ::= <name> = <value>
Within each and , spaces are converted to plus signs, and any occurrence of an ampersand, equal sign, percent sign, or plus sign must be escapes using the
form "%XX" where the "X"s are hexidecimal values.
When building a GET method URL, there are a couple of additional considerations. For example, it's a good idea to avoid the slash, blackslash, and colon
characters because the presence of these characters in astonishing (to software) positions can screw up HTTP proxy servers. Avoiding the ampersand and
equal characters eliminate the need to post-process the CatCue output to generate escape sequences. The plus sign is problematical, because an unadorned
plus sign is supposed to be interpreted as a space code...but the CGI script or program can handle that one fairly easily. The comma character can be
problematic for the server programmer when CGI libraries are used that impose a meaning on comma.
The changes in the codes in positions 62 and 63 reflect obvious corrections to fix problems with HTML transmission. Originally the codes used in this
position were comma and slash. In other CueCats, these positions are now represented by plus and minus
In other words, the use of a base-64 strategy was to ease the programming of the client support software.
The reason for the XOR dodge is considerably more opaque. Perhaps it was a misguided attempt at trying to bring the output of the CueCat under the Digital
7 of 9
15/07/2012 11:29 PM
Millenium Copyright Act, but such an attempt fails on its face because the information being "protected" does not fall under the classification of information
that can be copyrighted. The idea that a denial-of-service attack can be averted is doomed to fail because there is nothing that prevents, for example, a replay
attack.
From a theft-of-service perspective, the encryption doesn't make sense. The barcodes are unique to the CueCat, being a non-standard configuration of Code
128 symbology, so in theory no other scanner on the planet can read the damn things, let alone make sense of them. Yes, the encoding scheme using XOR
would prevent a casual third party from submitting codes for steering.
In short, I'm puzzled by what Digital Convergence was trying to do.
(Humor)
So What's Next?
Now I'm beginning to suspect that we will see future 'Cats with different 62/63 reports. What are possible additional 62/63 pairs? (Aren't you sorry you
asked?)
( ) For LISPers
< > Why HTML people didn't think of these I don't know
[ ] I so love array indexes
{ } For the blockheads in the group
\ ` Hey, these are slanted just like 'CatCues!
@ ! Wasn't this an organization at SF conventions?
^ ~ Not Not joke for C/Java programmers
& | For bit-ty C'ers
? : For Java programmers wanting to make up their mind
% = The ultimate in breaking HTML form strings
8 of 9
15/07/2012 11:29 PM
Back to Main CueCat Principles of Operations page.
9 of 9
15/07/2012 11:29 PM

CueCat Output Decode and Decipherment

Încărcat de

Informații document

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

CueCat Output Decode and Decipherment

Încărcat de

Drepturi de autor:

Formate disponibile

CueCat Output Decode and Decipherment