Huffman coding is one of the basic compression methods, that have proven useful in image and video compression standards. When applying Huffman encoding technique on an Image, the source symbols can be either pixel intensities of the Image, or the output of an intensity mapping function.
Prerequisites : Huffman Coding | File Handling
The first step of Huffman coding technique is to reduce the input image to a ordered histogram, where the probability of occurrence of a certain pixel intensity value is as
prob_pixel = numpix/totalnum
where numpix is the number of occurrence of a pixel with a certain intensity value and totalnum is the total number of pixels in the input Image.
Let us take a 8 X 8 Image
The pixel intensity values are :
This image contains 46 distinct pixel intensity values, hence we will have 46 unique Huffman code words.
It is evident that, not all pixel intensity values may be present in the image and hence will not have non-zero probability of occurrence.
From here on, the pixel intensity values in the input Image will be addressed as leaf nodes.
Now, there are 2 essential steps to build a Huffman Tree :
- Build a Huffman Tree :
- Combine the two lowest probability leaf nodes into a new node.
- Replace the two leaf nodes by the new node and sort the nodes according to the new probability values.
- Continue the steps (a) and (b) until we get a single node with probability value 1.0. We will call this node as root
- Backtrack from the root, assigning ‘0’ or ‘1’ to each intermediate node, till we reach the leaf nodes
In this example, we will assign ‘0’ to the left child node and ‘1’ to the right one.
Now, let’s look into the implementation :
Step 1 :
Read the Image into a 2D array(image)
If the Image is in .bmp format, then the Image can be read into the 2D array, by using this code given in this link here.
int i, j; char filename[] = "Input_Image.bmp" ; int data = 0, offset, bpp = 0, width, height; long bmpsize = 0, bmpdataoff = 0; int ** image; int temp = 0; // Reading the BMP File FILE * image_file; image_file = fopen (filename, "rb" ); if (image_file == NULL) { printf ( "Error Opening File!!" ); exit (1); } else { // Set file position of the // stream to the beginning // Contains file signature // or ID "BM" offset = 0; // Set offset to 2, which // contains size of BMP File offset = 2; fseek (image_file, offset, SEEK_SET); // Getting size of BMP File fread (&bmpsize, 4, 1, image_file); // Getting offset where the // pixel arrray starts // Since the information // is at offset 10 from // the start, as given // in BMP Header offset = 10; fseek (image_file, offset, SEEK_SET); // Bitmap data offset fread (&bmpdataoff, 4, 1, image_file); // Getting height and width of the image // Width is stored at offset 18 and height // at offset 22, each of 4 bytes fseek (image_file, 18, SEEK_SET); fread (&width, 4, 1, image_file); fread (&height, 4, 1, image_file); // Number of bits per pixel fseek (image_file, 2, SEEK_CUR); fread (&bpp, 2, 1, image_file); // Setting offset to start of pixel data fseek (image_file, bmpdataoff, SEEK_SET); // Creating Image array image = ( int **) malloc (height * sizeof ( int *)); for (i = 0; i < height; i++) { image[i] = ( int *) malloc (width * sizeof ( int )); } // int image[height][width] // can also be done // Number of bytes in the // Image pixel array int numbytes = (bmpsize - bmpdataoff) / 3; // Reading the BMP File // into Image Array for (i = 0; i < height; i++) { for (j = 0; j < width; j++) { fread (&temp, 3, 1, image_file); // the Image is a // 24-bit BMP Image temp = temp & 0x0000FF; image[i][j] = temp; } } } |
Create a Histogram of the pixel intensity values present in the Image
// Creating the Histogram int hist[256]; for (i = 0; i < 256; i++) hist[i] = 0; for (i = 0; i < height; i++) for (j = 0; j < width; j++) hist[image[i][j]] += 1; |
Find the number of pixel intensity values having non-zero probability of occurrence
Since, the values of pixel intensities range from 0 to 255, and not all pixel intensity values may be present in the image (as evident from the histogram and also the image matrix) and hence will not have non-zero probability of occurrence. Also another purpose this step serves, is that the number of pixel intensity values having non-zero probability values will give us the number of leaf nodes in the Image.
// Finding number of // non-zero occurences int nodes = 0; for (i = 0; i < 256; i++) { if (hist[i] != 0) nodes += 1; } |
Calculating the maximum length of Huffman code words
As shown by Y.S.Abu-Mostafa and R.J.McEliece in their paper “Maximal codeword lengths in Huffman codes”, that, If , then in any efficient prefix code for a source whose least probability is p, the longest codeword length is at most K & If
, there exists a source whose smallest probability is p, and which has a Huffman code whose longest word has length K. If
, there exists such a source for which every optimal code has a longest word of length K.
Here, is the
Fibonacci number.
Gallager [1] noted that every Huffman tree is efficient, but in fact it is easy to see more generally that every optimal tree is efficient
Fibonacci Series is : 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …
In our example, lowest probability(p) is 0.015625
Hence,
1/p = 64
For K = 9, F(K+2) = F(11) = 55 F(K+3) = F(12) = 89
Therefore,
1/F(K+3) < p < 1/F(K+2) Hence optimal length of code is K=9
// Calculating max length // of Huffman code word i = 0; while ((1 / p) < fib(i)) i++; int maxcodelen = i - 3; |
// Function for getting Fibonacci // numbers defined outside main int fib( int n) { if (n <= 1) return n; return fib(n - 1) + fib(n - 2); } |
Step 2
Define a struct which will contain the pixel intensity values(pix), their corresponding probabilities(freq), the pointer to the left(*left) and right(*right) child nodes and also the string array for the Huffman code word(code).
These structs is defined inside main(), so as to use the maximum length of code(maxcodelen) to declare the code array field of the struct pixfreq
// Defining Structures pixfreq struct pixfreq { int pix; float freq; struct pixfreq *left, *right; char code[maxcodelen]; }; |
Step 3
Define another Struct which will contain the pixel intensity values(pix), their corresponding probabilities(freq) and an additional field, which will be used for storing the position of new generated nodes(arrloc).
// Defining Structures huffcode struct huffcode { int pix, arrloc; float freq; }; |
Step 4
Declaring an array of structs. Each element of the array corresponds to a node in the Huffman Tree.
// Declaring structs struct pixfreq* pix_freq; struct huffcode* huffcodes; |
Why use two struct arrays?
Initially, the struct array pix_freq, as well as the struct array huffcodes will only contain the information of all the leaf nodes in the Huffman Tree.
The struct array pix_freq will be used to store all the nodes of the Huffman Tree and the array huffcodes will be used as the updated (and sorted) tree.
Remember that, only huffcodes will be sorted in each iteration, and not pix_freq
The new nodes created by combining two nodes of lowest frequency, in each iteration, will be appended to the end of the pix_freq array, and also to huffcodes array.
But the array huffcodes will be sorted again according to the probability of occurrence, after the new node is added to it.
The position of the new node in the array pix_freq will be stored in the arrloc field of the struct huffcode.
The arrloc field will be used when assigning the pointer to the left and right child of a new node.
Step 4 continued…
Now, if there are N number of leaf nodes, the total number of nodes in the whole Huffman Tree will be equal to 2N-1
And after two nodes are combined and replaced by the new parent node, the number of nodes decreases by 1 at each iteration. Hence, it is sufficient to have a length of nodes for the array huffcodes, which will be used as the updated and sorted Huffman nodes.
int totalnodes = 2 * nodes - 1; pix_freq = ( struct pixfreq*) malloc ( sizeof ( struct pixfreq) * totalnodes); huffcodes = ( struct huffcode*) malloc ( sizeof ( struct huffcode) * nodes); |
Step 5
Initialize the two arrays pix_freq and huffcodes with information of the leaf nodes.
j = 0; int totpix = height * width; float tempprob; for (i = 0; i < 256; i++) { if (hist[i] != 0) { // pixel intensity value huffcodes[j].pix = i; pix_freq[j].pix = i; // location of the node // in the pix_freq array huffcodes[j].arrloc = j; // probability of occurrence tempprob = ( float )hist[i] / ( float )totpix; pix_freq[j].freq = tempprob; huffcodes[j].freq = tempprob; // Declaring the child of // leaf node as NULL pointer pix_freq[j].left = NULL; pix_freq[j].right = NULL; // initializing the code // word as end of line pix_freq[j].code[0] = ' |