Showing posts with label aes. Show all posts
Showing posts with label aes. Show all posts

Sunday, March 6, 2011

AES ECB basic implementation

The AES algorithm can be implemented following strictly the AES round steps described in my previous post or by using pre-computed lookup tables. This post describes an AES implementation which follows the AES round steps and is based on existing source code developed by karl malbrain. It is written in a very modular way, making it ideal to get to know the algorithm without getting confused by the usage of look up tables. However, this version is aimed to understand the process and has no focus at all in performance optimization. As well the implementation is used to have a starting point to demonstrate how different levels of performance benefits can be achieved by using a modified version of the algorithm. From now on we will refer to this version as AES-B which stands for AES Basic implementation.

Let’s first describe a high level pseudo code of the algorithm:

- Generate AES expanded key
- XOR input data with encryption key
- For Rounds = 1 To NrRounds
- - SubBytes operation
- - ShifRows operations
- - If Rounds < NrRounds
- - - MixColumns Operation
- - XOR with round key k[Rounds]
- copy result to output

As we can see on the previous pseudo code the AES-B algorithm follows the AES round steps described. This makes it easier to understand its functionality. First, we generate the expanded key or key schedule which basically takes the cipher key and performs the key schedule operation to determine the number of round keys necessary to encrypt the data. The number of round keys necessary to encrypt one block of information is related to the key length. In this example we are using a 128 bits key; therefore it will require in total 11 round keys, 1 for the initial round, 9 for standard rounds and 1 for the final round. After we expand the key, the input data is XORed which is basically the AddRoundKey operation for the initial round, followed by the execution of SubBytes, ShiftRows and MixCulumns operations. Note that these 3 operations will be executed only for the first 9 main rounds and when we reach the 10th round we exclude the MixColumns operation as described on previous posts.


Now let’s take a look of how to implement AES in C, if you are going to copy this code make sure you download Malbrain's implementation and reference it from your application.

// GLOBAL CONSTANTS
// --------------------------------------
//File name that contains the data to be encrypted
static const char *filename = "input.txt";

// INCLUDES
// --------------------------------------
#include <time.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

// DEFINES
// --------------------------------------
// Amount of encryption tasks to perform
#define ITERATIONS 1

// AES Block size is 128 bits or 16 Bytes
#define AES_BLOCK_SIZE 16

// Amount of columns in the state. AES defines always 4 because it
//always encrypts 128 bit blocks. However, Rijndael can define more
#define STATE_COLUMNS 4

// Amount of AES rounds to execute depending on the key length
#define ROUNDS 10

// 1024 Bytes is the default start value when checking throughputs
#define START_VALUE 1024

//Assert used to compare encryption outputs against NIST Test Vectors
#define assert(x) if(!(x)){
printf ("\nGeneric assert " #x " has thrown an error\n");
}

double cpuEncrypt(char* in, int in_len, char* key, char* out)
{
int i;
unsigned char expkey[4 * STATE_COLUMNS * (ROUNDS + 1)];
clock_t start, end;

//Expand the key and store in key
ExpandKey (key, expkey);

start = clock();

for ( i = 0; i < in_len; i+=16 )
Encrypt( in+i, expkey, out+i );

end = clock();

return (end-start) / (double)CLOCKS_PER_SEC;
}

static void test_vectors()
{
int i = 0;
//NIST Test Vectors AES 128
//The following link contains the test vectors used below to guarantee
//that the functionality of the algorithm hast not been modified
//http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf
unsigned char testPlainText[] = {0x50, 0x68, 0x12, 0xA4, 0x5F, 0x08,
0xC8, 0x89, 0xB9, 0x7F, 0x59, 0x80, 0x03, 0x8B, 0x83, 0x59};
unsigned char testKeyText[] = {0x00, 0x01, 0x02, 0x03, 0x05, 0x06,
0x07, 0x08, 0x0A, 0x0B, 0x0C, 0x0D, 0x0F, 0x10, 0x11, 0x12};
unsigned char testCipherText[] = {0xD8, 0xF5, 0x32, 0x53, 0x82, 0x89,
0xEF, 0x7D, 0x06, 0xB5, 0x06, 0xA4, 0xFD, 0x5B, 0xE9, 0xC9};

unsigned char out[16] = {0x0};

//Display plaintext to the user
printf("\n PlaintText: ");
for ( i = 0; i < AES_BLOCK_SIZE; i++)
printf("%x", testPlainText[i]);
printf("\n");

//Display key to the user
printf("\ Key: ");
for ( i = 0; i < AES_BLOCK_SIZE; i++)
printf("%x", testKeyText[i]);
printf("\n");

//Display ciphertext to the user
printf("\ CipherText: ");
for ( i = 0; i < AES_BLOCK_SIZE; i++)
printf("%x", testCipherText[i]);
printf("\n");

// Test encryption with OpenSSL
cpuEncrypt (testPlainText, sizeof(testPlainText), testKeyText, out);

//Display encrypted data
printf("\n CPU Encryption: ");
for (i = 0; i < AES_BLOCK_SIZE; i++)
printf("%x", out[i]);

//Assert that the encrypted output is the same as the
//NIST testCipherText vector
assert (memcmp (out, testCipherText, 16) == 0);
}

static double check_throughtputs ()
{
//Determines the amount of data to encrypt
int i, size;
size_t bytes_read;
double timer;

//Pointer to the file were the data to be process is located
FILE *file_input;

//Pointer to the input buffer, will hold the data read from the file
char *input_data;

//Pointer to the output buffer which will hold the encrypted data
char *output_data;
int fileSize;

//AES 128 Bit Key
unsigned char ckey[] = {0x00, 0x01, 0x02, 0x03, 0x05,
0x06, 0x07, 0x08, 0x0A, 0x0B, 0x0C, 0x0D, 0x0F, 0x10, 0x11, 0x12};

// Read the file into memory
file_input = fopen (filename, "rb");
fseek(file_input, 0, SEEK_END);
fileSize = ftell(file_input);
fseek(file_input, 0, SEEK_SET);

//Allocate memory for the inputa and output
//data depending on the file size
input_data = (char*)calloc (fileSize, sizeof(char));
output_data = (char*)calloc (fileSize, sizeof(char));

// It is important to know that the last 128 bit block
// to be encrypted can contain less than 128 bits to a
// for this test purpose input lenght is always aligned
// multiple of AES block size otherwise padding has to be implemented
bytes_read = fread (input_data, sizeof(char), fileSize, file_input);
fclose (file_input);
bytes_read = (int)(ceil (bytes_read / 16.0) * 16);

printf("\n%16s%16s%20s\n", "Total KB", "Seconds", "Throughput Mbps");

//Increment the data size to be encrpted by 2 starting with until
//the end of the file is reached
for ( size = START_VALUE; size <= fileSize ; size *= 2 )
{
timer = 0.0;
for ( i = 0; i < ITERATIONS; ++i ){
timer += cpuEncrypt (input_data, size, ckey, output_data);
}

printf ("%16i%16.3f%16.3f\n", size/1024, timer,
(size*8.0/(1024*1024))/ timer);
}

printf ("\t\n--------------------------------------------------\n");

//Free resources
free (input_data);
free (output_data);

return timer;
}

static void run_cipher(){
printf ("\nAES 128 Bit key");
printf ("\t\nNIST Test Vectors: ");
printf ("\t\n--------------------------------------------------");
test_vectors();
printf ("\n Test status: Passed\n");

printf ("\t\nAES on CPU Throughputs: ");
printf ("\t\n--------------------------------------------------");
check_throughtputs();
}

int main(int argc, char *argv[]){
run_cipher();
system("PAUSE");
return 0;
}
First, we execute a correctness test, it is performed to ensure that the algorithm’s functionality was not altered. This test takes a known plaintext and key, and gives back a known cipher text taken from the test vectors provided by the National Institute of Standards and Technology (NIST). After correctness is verified, the a 256MB file is loaded, the encryption takes place and the performance is evaluated. Remember that to be able to encrypt undetermined file sizes, it is necessary to split the data into 128 bit chunks so the data can be processed; this happens in the cpuEncryptmethod. For a better understanding, we encrypt different amount of data; starting with 1KB and duplicating the amount per iteration until the whole size of the file is reached 262144KB (256MB). Each iteration prints the time it takes to process its respective amount of data so we can have an overview of how heavy the operation executing might be. The following picture shows the results obtained when running the application:

Throughputs obtained when encrypting a 256MB file on AES-B [click to enlarge]

As it can be seen, the encryption test outputs the expected cipher text proposed by NIST, so we can be sure that the algorithm’s functionality is correct. The output shows a linear behavior, executing in average 45Mbps; this is due to the intensive computation the processor has to go through to encrypt the data. Remember this algorithm does not make heavy usage of pre-calculated look up tables to skip arithmetic, so nearly every single value required through the AES process has to be calculated, hence consuming all the CPU resources and eventually creating a bottleneck that won’t let the CPU process at a faster rate.


Typically applications that require encryption/decryption processing need a lot of computing power and processes can take hours to complete, slow down applications or require databases to be offline, and many times this is unacceptable for 24x7 high available systems. In that way encryption/decryption processes must be done as fast as possible to avoid unwanted downtimes. OpenSSL is a cryptographic framework which provides different cipher implementations, among them AES. OpenSSL implements and improved version of the AES algorithm which relies on pre-calculated look up tables to skip arithmetic and relief sthe processor of many of its intensive operations improving considerably the performance obtained when encrypting/decrypting data. In future posts I will be showing how to encrypt using the AES algorithm implemented on AES, as well as comparing the throughputs achieved when using this library.

Saturday, January 29, 2011

Modes of Operation


After explaining the basis of how the AES block cipher works now we’re moving into modes of operations. Basically, a block cipher by itself allows encrypting only a data block of the cipher's block length (in AES 128 bits). Although it might happen that the plaintext to be encrypted has exactly 128 bits, this is not always the case and the plaintext normally exceeds by far 128 bits. As plaintexts can be of any length they have to be broken into blocks before the encryption process takes place.

The National Institute for Standards and Technology has defined five confidentiality modes of operation for the AES block cipher, with different characteristics. The modes of operation defined for AES are ECB (Electronic Code Book), CBC (Cipher Block Chaining), CFB (Cipher FeedBack), OFB (Output FeedBack), and CTR (Counter).

Since the idea of these posts is to demonstrate the benefits of running AES in parallel I will explain only ECB and CTR modes of operations, which are the ones that might benefit the most when running in parallel due to the lack of dependencies between the data blocks they encrypt; being like that ideal candidates to be implemented on parallel processors such as a Graphic Processing Unit or GPU (we’ll come back to that later..).

Electronic Code Book

ECB is one of the simplest modes of operations. To encrypt a plaintext the forward AES cipher function is applied. The plaintext is divided into blocks each of which is, encrypted independently using a key. In ECB decryption, the inverse cipher function is as well applied directly and independently to each block of ciphertext.


One important consideration is the fact that in ECB if a similar data pattern exists and the same key is used, then the plain text will generate the same cipher text (as would happen when enciphering a file with repeated 16 bytes blocks), which is a major leak of secret information and can be exploited by cryptanalytic attacks. In the next picture it can be seen the pattern that ECB exposes when encrypting an image, this was mainly the reason why other modes of operation were designed, among them CTR. If this property is not desired, this mode of operation should not be considered.



Counter CTR

One way to hide data patterns is to provide some randomization for each block. All the modes of operation apart from ECB require an initialization vector (IV). The IV is used to provide a unique cipher text if the same key is re-used. In CTR a set of input blocks called counters are encrypted using the key, producing output blocks called keystreams, which are used to perform an XOR operation with the plaintext blocks. The sequence of counters must have the property that each block in the sequence is different than the rest, in other words all counters must be distinct. In CTR encryption, the block cipher encryption function is called on each counter block. Afterwards, the resulting output is XORed with the respective plaintext block to generate the ciphertext. While decrypting, the block cipher encryption function is invoked on each counter block. The resulting output will be then XORed with the respective ciphertext block in order to recover the plaintext block.

Summary

In these first three posts I cover the very basis of AES and modes of operations, which for our purposes are going to be enough to start with our first coding examples. Let’s make a small review!!

A brief historical remark about the Advanced Encryption Standard, followed by an overview of its functionality was provided. It was explained how input data is mapped to an intermediate matrix called the state in where all AES operations are going to take place; as well as how the state maps to the output array. The four steps contained on the AES rounds: AddRoundKey, SubBytes, ShiftRows and MixColumns were explained. The amount of rounds strongly depends on the key length and in order to compute them it is necessary to expand the key; the process of expanding the key was exemplified as well. Moreover, modes of encryption were introduced and two parallel in nature modes of operations named ECB and CTR were covered.

The first AES examples I will be presenting soon are going to run on the CPU; we’re going to create a C program that makes usage of the OpenSSL library which is the most common used cryptographic framework. The idea is to create a (hopefully short!) tutorial of how to implement AES using OpenSSL which it is kind of painful due to the lack of documentation on the web.

Sunday, November 7, 2010

AES Round Steps

The AES algorithm encrypts data by executing 4 different operations; modifying gradually the state matrix until the ciphertext is obtained. The operations are: AddRoundKey, SubBytes, ShiftRows, MixColumns and an extra operation called the KeySchdule which is in charge to expand the original key for each AES round. This operations are going to be executed N times depending on the key length.

AddRoundKey: This transformation applies a bitwise XOR between the elements of the State array and the elements of the RoundKey. The RoundKey is obtained from the cipher key by computing the KeySchedule which will be explained later. The State and the RoundKey are of the same size where “a” is the current state, “k” the round key and “b” is the next state. AddRoundKey is denoted by:

SubBytes: This step provides non linearity byte substitution; it operates independently on each byte of the state matrix using a look up table, called the Rijndael S-box. The design ofthe S-box is thought to be resistant to attacks made by differential and linear cryptanalysis and algebraic manipulation, and has been extensively analyzed during the competition to verify its security. It is constructed by the following two transformations:

  1. Compute the multiplicative inverse in the finite field GF(2^8)
  2. Apply a bitwise affine transformation over GF(2), which is a polynomial multiplication by a constant matrix using the inversed byte, followed by a XOR operation with a constant byte

The constant matrix and the constant byte are provided by the AES Standard as described in FIPS 197. On image 2 we can see the constant matrix and the constant byte, where b0….b7 is the multiplicative inverse as a vector

After doing the previous steps we will obtain the Rijndael S-Box. Image 4 shows the S-Box generated; afterwards, each byte of the state table is updated. For example if state(i,j)={3D} the new value is determined by the intersection of the row with value 3 and the column with value D, providing an output of state'(i,j)=27.

ShiftRows: This operation shifts each row of the state cyclically to the left depending on the row index. First row is not shifted, second row is shifted one position, third row two positions and finally the fourth row is shifted three positions to the left.


MixColumns: The MixColumns step along with the ShiftRows step is the primary source of diffusion in AES. MixColumns performs a column by column transformation, treating each column of the state as polynomials over GF (2^8). Each polynomial or column is then multiplied with a fixed polynomial:

this latter polynomial is expressed as a fixed matrix, the operation can be written as matrix multiplication as follows:

Each column of the state is then multiplied by the fixed polynomial expressed as a matrix and and the next state is obtained

KeySchedule: The AES algorithm takes the cipher key and performs a key schedule also referred to as a key expansion to generate keys for each AES round. The number of round keys necessary to encrypt one block of information is related to the key length because this will determine the number of rounds. For example, a key length of 128 bits will require in total 11 round keys, 1 for the initial round, 9 for standard rounds and 1 for the final round. The key schedule is therefore a method to extend the cipher key, where it can be seen as an array of 32 bit columns numbered from 0 to 43. The i-th column of the cipher key k matrix is denoted by Wi. The first four columns are filled with the given cipher key


Columns in position that are multiple of 4 are calculated by:

  • Applying a rotation operation and SubBytes transformation to the previous column W(i-1)
  • Adding this result to the column 4 positions earlier W(i-4) plus a round constant Rcon.

The reminding 3 columns are calculated by adding the previous column W(i-1) with the column 4 positions earlier W(i-4)

Wednesday, November 3, 2010

Advanced Encryption Standard Algorithm

In the next days I will be writing about the AES algorithm, so I am going to start with a brief historical introduction and basic description of the algorithm before I move into more technical stuff.

In January 1997, the US National Institute for Standards and Technology (NIST) announced a competition to develop a new encryption algorithm to replace the old Data Encryption Standard DES. This latter algorithm was urged to be replaced due to computational power improvements making of DES a weak cipher. Some computers at the time were capable to break DES inspecting its key space of 2^56 keys by brute force.

The new encryption standard developed, was the Advanced Encryption Standard or AES becoming the United States encryption standard for sensitive information defined in Federal Information Processing Standard (FIPS 197). Unlike the design of DES, AES was designed as an open and public competition and many companies and researchers around the world participated in the contest . In total fifteen new algorithms were submitted and a second round narrow the choice to five of these. Among them was the Rijndael algorithm which in 2001 was announced to be the winner cipher. Rijndael was designed by two Belgium cryptographers; Joan Daemen and Vincent Rijmen from which the algorithm receives its name.

AES is a symmetric algorithm based on the original Rijndael algorithm. It encrypts in blocks of 128 bits of data called plaintext, and transforms it into an encrypted new block of the same length called ciphertext. The difference between Rijnadael and AES is that Rijndael encrypts blocks of data of 128, 192 and also 256 bits and AES only encrypts blocks of 128 bits. In order to encrypt AES makes usage of a key which is going to be secret and shared between the encryption/decryption parties. The key was designed to have sizes of 128-bits, 192-bits or 256-bits length.


Image taken from a very cool AES animation found in: http://www.formaestudio.com/rijndaelinspector

The algorithm performs a number of rounds transforming the plaintext depending on the key length. For example, when using a 128 bit key the algorithm executes 10 rounds, when using 192 bit keys 12 rounds and finally when using a 256 bit key 14 rounds are executed. Through the process the 128 bit block of plaintext will go to what we are going to call as the AES encryption process illustrated in the image above, and the key will go through a process called the KeyExpansion or sometimes referred to as the KeySchedule (Generate expanded keys for the different AES rounds). The 128 bit block of plaintext will be processed in an intermediate state which “can” be seen as a 4x4 matrix holding 16 bytes (is not always a real matrix and can be a vector).

If we look back to the AES process illustrated this refers particularly to AES-128 bit key which executes a total of 10 rounds. As we can see, AES consists of only 4 operations: SubBytes, ShiftRows, MixColumns and AddRoundKey. It is important to note that the very last round differs from the rest of rounds by skipping the MixColumns process. Before the plaintext is transformed to ciphertext it has to pass through all these operations transforming gradually the plaintext stored on the state matrix into encrypted text which will result into the final cipher text.

This is an introductory and general explanation of the AES algorithm. In future posts I will be explaining how the different AES operations work. However, if in the meantime you feel interested in getting to know more about the AES steps I really recommend checking out this AES encryption animation (http://www.formaestudio.com/rijndaelinspector) it gives a very simple understanding of how AES works without digging into complex mathemtical explanations. Though, if that is what you need “The Design of Rijndael” or the “AES specification” might be more useful.