go to end of the page
sample illustrations
parameters of Pless's system

1998 Summer Program: Coding and Cryptography Week 1: Codes, Design, and Cryptography

Talk abstract:

Some Connections Between Coding Theory and Cryptography

Vera Pless, University of Illinois - Chicago

This talk will explore some common areas between coding theory and cryptography and attempt to evaluate their status and impact on these disciplines. Views on this vary widely. One answer is that both draw on common subject areas of mathematics, such as number theory, finite fields, algebraic geometry, combinatorial designs and sequences.

We review some specific long standing connections including the public key McEliece scheme, cyclic codes and linear feed-back shift registers, MDS codes and threshold schemes, and wiretappers with limited resourses and generalized Hamming weights. We give all the relevant definitions.

Same Basis-Information Theory

C.E. Shannon - A Mathematical Theory of Information (1949), Communication Theory and Secrecy Systems (1949).

Development time different-Coding theory started in 50's, (unclassified) mathematical Cryptography started in 70's.

Both are concerned with communicating digitally encoded information.

Goals are very different-Coding - reliability, want to reveal message in presence of noise,

Cryptography -

  1. Privacy - conceal message (from eavesdropper),
  2. Signature - proof for receiver of message that message came from a certain party,
  3. Integrity - proof for receiver that message was not changed by a third party.

Ciphersystems-block or stream, Codes- block or convolutional codes,

Block cases have been analyzed more, others harder to analyze, used more(?).

Stream ciphers are inspired by the one-time pad.

This cipher has perfect security, with the obvious disadvantage-need unlimited amount of key.

This suggest: to encrypt the plaintex- add " pseudo-random" sequence generated by a deterministic algorithm.

To decrypt the ciphertext, subtract the sequence.

Any such sequence is ultimately periodic, so closely related to cyclic codes.

Properties a pseudo-random sequence should have:
  1. period should be very large (say 10^50),
  2. sequence should be easy to generate,
  3. knowing a portion of plaintext should not enable the cryptanalyst to reproduce the whole sequence.

We represent messages in binary - as sequences of 0's and 1's

Coding - add redundancy so can get original message 1001 110

An [n,k] binary code C is a vector space of n-tuples of 0's and 1's of dimension k.

Hence code can be described by a basis called a generator matrix.

C is cyclic if whenever a vector is in C, so are all of its cyclic shifts.

Example: n=7,

illustration 1

back to top
go to end of the page


Can get the same set of codewords from the output of a 3-stage linear feed-back shift register, LFSR.

illustration 2


characteristic polynomial:

1+x+x3

Initial state: contents of s0, s1, s2. If initial state is 100 out put is 1001011100...

If start with 100, get 100 1011 which them repeats.

The characteristic polynomial of this LFSR is 1+x+x3 which is primitive.

This is so in general, an n-stage LFSR has maximal period 2 n-1 if and only if its characteristic polynomial is primitive.

S. Golomb (1967) gave reasonable conditions for a binary sequence to be pseudo-random:

  1. about same number of 0's and 1's
  2. half the runs have length 1, a quarter have length 2, an eighth ...
  3. condition on inner product.

Cyclic codes give answer.



Generate sequence by a linear feedback shift register.

Vectors in a cyclic code C have these properties if the generator polynomial of the code (characteristic polynomial of the LFSR) is primitive.

Want larger periods for cryptographic uses.

Can get periods as large as you want.

If n=23 have many sequences which are periodic of period 2{sup>23 -1. These can be found in tables (for coding purposes). Can get even larger periods by combining.

So have Golomb properties 1) and 2) for pseudo-random sequences. Unfortunately can show (due to linearity) that 2n consecutive bits determines the whole sequence ! Property 3) does not hold.

Just 46 consecutive bits for the sequence of length 223 -1 gives the coefficients of the primitive polynomial hence the key LFSR.

This is too nice to give up.

Use other devices to provide non-linearity.

I did this (1977).

I used an easily available device called a J-K flip-flop.

It has 2 binary inputs and one binary output.

Parameters of Pless's system
back to top
go to end of the page

Each pair of S.R.'s have periods which are relatively prime. Can show that this implies that the final sequence has period = to the product of all the periods namely 1.519x1029.

The number of keys is 2.435x1051.

This particular scheme was broken by Rubin (1979).

Used the fact that even though the output of a J-K flip-flop, fed by two binary sequences, is a sequence of independent and uniformly distributed random variable (as Wanted), this output exhibits a strong correlation with either input sequence.

Maybe not breakable if use L.F.S.R.'s with larger periods or used multiplexers or other non-linear schemes instead of J.K. flip-flops.

These possibilities not discussed in literature.

Possibly new insights into algebraic theory of convolutional codes can be applied to stream ciphers.

"Analysis and Design of Stream Ciphers", Rainer Rueppel, 1986.

Every sequence generated by a deterministic machine is eventually periodic.

There is a shortest L.F.S.R. which generates any given sequence. If know this, can generate rest of sequence.

The length of this shortest LFSR is called the linear complexity of the sequence.

This is the most fundamental parameter for estimating the unpredictability of a key stream. This together with a requirement of uniform statistics good basis for determining randomness.

The linear complexity of a sequence can be determined by the Berlekamp-Massey decoding algorithm (for BCH codes).

For public-key cryptography want to use problems known to be hard (but with a trap-door).

Decoding a random code: Known to be NP-complete.

McEliece scheme (1978): Is based on the fact that one can decode large Goppa codes, also many exist. Need to disguise code used.

Will illustrate with Hamming [7,4,3] single error-correcting code:


G =

Public G' = SGP, S invertible 4x4, P permutation matrix.

Alice sends mG' + e, m length 4 message, e weight one 7-vector, the single error.

G gen. matrix of Goppa code. The public 524x1024 G' used for encoding is SGP where S is invertible and P is perm. matrix. Alice now adds a weight 50 random error to her encoded (and hence expanded) message. Difficult to decode a random code. Bob can by returning to Goppa.

Bob applies P-1 and decodes. He then applies S-1 to get message.

McEliece scheme uses 50 error-correcting Goppa codes of length 1024. Many such. Need to find degree 50 irreducible polynomials (probabilistically).

If these positions are also independent, then can find the message by Gaussian elimination. Know if correct if distance from original is < 51.

If close to rank 524 can also guess.

Probability is high that positions are independent. Expected workload about 280.7.

Possible Attacks:

  1. Guess S and P. Too many.
  2. Find closest codeword by comparing with 2524 words.
  3. Decode by syndrome decoding. 2500 comparisons.
  4. Select 524 random positions. Hope they are free of errors.

None are feasible.

Last seems to be least work.

If chosen k positions do not work choose others using knowledge gained.

This has stimulated work in decoding a random code particularly by information sets.

McEliece scheme not broken, but not used.

Not good for digital signatures.

Seems to be generally unwieldy.

Attempts to shorten length and turn it into a conventional scheme seem to have been broken.

Secret Sharing: Have n people and want a minimum of k of them to be there before secret is know.

M.D.S. code: every k positions are information set: [n, k, n-k+1] code

Ex. Hexacode: [6,3,4] GF(4) code

Secret: codeword, each person gets one coordinate, need 3 people to find codeword.

3 or more positions determine codeword uniquely, not fewer.

However, this is a single error correcting code so if we have the coordinates of 5 people, one can lie and we can still determine the secret and identify the malicious liar.

Theorem: (McEliece, Sarwate 1981)

If C is a k-dim. MDS code of length n can give people coordinates of a codeword (the secret).

If k+2t or more participants share their coordinates and t of these values are incorrect, then the secret can be recovered and the liars identified.

If k+2t-1 or fewer share their values and t of these are incorrect, then the secret cannot be recovered correctly and each codeword is equally likely.

Possible Difficulty: Only know the existence of MDS codes for certain lengths.

All of these are mainly contributions of coding to cryptography.

Other way: Effort to analyze the wire tap channel (which uses codes for cryptographic purposes) has led to the very interesting study of generalized Hamming weights.

(V.Wei-1991)

" Coding theory at work in cryptology and vice versa" by Henk C.A. van Tilborg to appear in Handbook in Coding Theory, edited by Pless and Huffman - Elsevier Publishing Co., 1998

Both subjects draw heavily on certain areas: finite fields, designs number theory, sequences important in both.

Institute for Mathematics and Its Applications home page
Back to Workshop Schedule
Back to top