Encryption‎ > ‎

Ciphers and cryptography

Substitution cipher

The substitution cipher involves one piece of data being replaced with another. The data could be a letter. For example, in a message the letter a is replaced by the letter d, the letter b replaced by the letter e, the letter c replaced by the letter f, and so on. Each letter is replaced by the one three to the right of it in the alphabet. When letter x is reached it is transposed to the letter a. In effect the alphabet has been 'wrapped around'. The table below shows the arrangement of plaintext and ciphertext characters using a shift of three.

plaintextabcdefghijklmnopqrstuvwxyz
ciphertext equivalentdefghijklmnopqrstuvwxyzabc

Using this cipher the word hello in plaintext would become khoor in ciphertext. This particular cipher, that involves a shift of three to the right, is known as the Caesar cipher after Julius Caesar who used it to communicate with his generals. The enciphering key Ke is said to equal 3 and the decrypting key Kd is said to equal -3. Or:

Ke = 3
K= -3

Kd is -3 because to decipher a letter the third one on its left has to be substituted. For instance in khoor the letter third on the left of k is h. In other words 'going' to the right is a positive shift and 'going to the left' is negative. Deciphering is a reversal of the enciphering process.

In an alphabet of 26 letters there are 25 possible shifts to the right, starting at 1 (a->b, b->c, etc.) and going through to 25 (a->z, b->a, etc.). The key space for this particular cipher method is said to be 25. This is a low number and it would be reasonably easy to crack such an encrypted message just by trying out various shifts until some intelligible text appears. In fact, a print out of all possible shifts could be generated in seconds using the right computer program. As a rule of thumb the greater the key space the harder a cipher is to crack. The keyspace could be increased by including capital letters (add on 26), punctuation symbols, numbers, spaces and other alphanumeric keyboard characters.

Also, the letters in the English language appear at a particular frequency  in any non-specialised passage of text. For instance, the letter t occurs much more frequently on average than the letter z. The graph below shows relative frequencies. As can be seen, e is the most common, followed by t, then a. The least common are q and z.


The analysis of ciphertext in order to determine the original plaintext is called Frequency analysis. Again a computer program can use this knowledge of frequencies of letter occurrence to analyse ciphertext and come up with a pretty good estimation of the original plaintext.

Plaintext encryption using a simple shift substitution system is quite predictable. It is possible to use Modulo arithmetic to make the shift system much more unpredictable and stronger. To learn more follow this link: modulo encryption

There are broadly speaking two types of cryptography: Symmetric-key cryptography and Public-key cryptography. The sections below discuss them.


Symmetric-key cryptography

The substitution method above is an example of symmetric-key encryption because to use it the sender and receiver have to know the the encryption or decryption key (Ke or Kd), depending on who is sending and who is receiving.

To quote this section of the Wikipedia page on encryption (edited):

Symmetric-key cryptography refers to encryption methods in which both the sender and receiver share the same key (or, less commonly, in which their keys are different, but related in an easily computable way). This was the only kind of encryption publicly known until June 1976 when a paper on public-key encryption was published.

The modern study of symmetric-key ciphers relates mainly to the study of block ciphers and stream ciphers and to their applications. Block ciphers take as input a block of plaintext and a key, and output a block of ciphertext of the same size. Since messages are almost always longer than a single block, some method of knitting together successive blocks is required.

The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are block cipher designs which have been designated cryptography standards by the US government. DES, especially its more secure triple-DES variant, is used across a wide range of applications, from ATM encryption to e-mail privacy and secure remote access. Many other block ciphers have been designed and released.

Stream ciphers, in contrast to the 'block' type, create an arbitrarily long stream of key material, which is combined with the plaintext bit-by-bit or character-by-character. In a stream cipher, the output stream is created based on a hidden internal state which changes as the cipher operates.

Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length as input, and output a short, fixed length hash which can be used in (for example) a digital signature. For good hash functions, an attacker cannot find two messages that produce the same hash. MD5 and SHA-1 and SHA-2 are popular standards. 

SHA-1 outputs a 40 character string no matter what the length of input. For example the password aaaaaa encrypted using SHA-1 using the phpMyAdmin program becomes f7a9e24777ec23212c54d7a350bc5bea5477fdbb. SHA-1 is what is called a one way function. A value encrypted by it can not be decrypted as their is no known inverse function.

PlaintextSHA-1 encrypted
  aaaaaa


  f7a9e24777ec23212c54d7a350bc5bea5477fdbb



The following text encrypted using SHA-1 produces the 40 character string 9093b444de8e236f492284b8fb6e0a9fc8cbfac1

Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length as input, and output a short, fixed length hash which can be used in (for example) a digital signature. For good hash functions, an attacker cannot find two messages that produce the same hash. MD4 is a long-used hash function which is now broken; MD5, a strengthened variant of MD4, is also widely used but broken in practice. The U.S. National Security Agency developed the Secure Hash Algorithm series of MD5-like hash functions: SHA-0 was a flawed algorithm that the agency withdrew; SHA-1 is widely deployed and more secure than MD5, but cryptanalysts have identified attacks against it; the SHA-2 family improves on SHA-1, but it isn't yet widely deployed, and the U.S. standards authority thought it "prudent" from a security perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash algorithm toolkit."[20] Thus, a hash function design competition is underway and meant to select a new U.S. national standard, to be called SHA-3, by 2012.

Thus SHA-1 can be used to create a digital signature for any piece of text or file. The integrity of a downloaded file can be verified by encrypted it with the SHA-1 algorithm and comparing it with its digital signature.


Public-key (asymmetric) cryptography

Symmetric encryption suffers from a major flaw. That is the need for both parties involved in a transaction to know the keys involved. These must be transmitted securely in some fashion. Asymmetric methods get around this problem.

From the HowStuffWorks web page on encryption here is quote that sums up public-key encryption well:

Also known as asymmetric-key encryption, public-key encryption uses two different keys at once -- a combination of a private key and a public key. The private key is known only to your computer, while the public key is given by your computer to any computer that wants to communicate securely with it. To decode an encrypted message, a computer must use the public key, provided by the originating computer, and its own private key. Although a message sent from one computer to another won't be secure since the public key used for encryption is published and available to anyone, anyone who picks it up can't read it without the private key. The key pair is based on prime numbers (numbers that only have divisors of itself and one, such as 2, 3, 5, 7, 11 and so on) of long length. This makes the system extremely secure, because there is essentially an infinite number of prime numbers available, meaning there are nearly infinite possibilities for keys. One very popular public-key encryption program is Pretty Good Privacy (PGP), which allows you to encrypt almost anything.

The graphic at the top of the introduction page to this section illustrates the process quite well. It comes from the following page that has useful information on public-key encryption and digital certificates: Internet Encryption

History

Public-key encryption was made possible in 1976 through a protocol (set of rules) published in a paper by Diffie and Hellman. See picture at left. They invented the Diffie-Hellman exchange protocol, though it was subsequently realised that others had previously developed the algorithms involved earlier in the 1970s. A famous encryption system based on the the protocol was RSA, named for its creators Ronald Rivest, Adi Shamir, and Len Adleman.




Public-key cryptography can be used to implement digital signature schemes. This is where digital content can be verified through its correspondence to the digital signature that was created from it when it was original. If the content has been changed, a recalculating of the signature will produce a version that does not match the original signature. From Wikipedia: 

"Public-key algorithms are most often based on the computational complexity of "hard" problems, often from number theory. For example, the hardness of RSA is related to the integer factorization problem". 

The integer factorization problem is "the breaking down of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer. When the numbers are very large, no efficient integer factorization algorithm is known; an effort concluded in 2009 by several researchers factored a 232-digit number (RSA-768) utilizing hundreds of machines over a span of 2 years". Non-trivial divisors are in effect prime numbers (a number that can only be divided evenly by itself and 1), and large prime numbers are notoriously hard to find.

CS Unplugged Activities

  1. A precursor to public-key encryption: Tourist Town

  2. Public-key encryption at Kid Krypto


Additional resources
From the Moodle course "Living in a Computing World" by Dr. Jody Paul.




Comments