Encoding, in computers, can be defined as the process of transmitting or storing sequence of characters efficiently. There are two types of encoding schemes
Fixed-Length encoding – Every character is assigned a binary code using same number of bits. Thus, a string like “aabacdad” can require 64 bits (8 bytes) for storage or transmission, assuming that each character uses 8 bits.
Variable- Length encoding – As opposed to Fixed-length encoding, this scheme uses variable number of bits for encoding the characters depending on their frequency in the given text.
Huffman Encoding- can be used for finding the unique representation for variable length encoding. Developed by David Huffman in 1951, this technique is the basis for all data compression and encoding schemes. It assign binary codes to characters depending on how frequently they occur in the given text.
- Using Huffman algorithm, the characters used in string “aabacdad” can be encoded as
- 0 gets decoded to ‘a’
- 110 gets decoded to ‘b’
- 111 gets decoded to ‘c’
- 10 gets decoded to ‘d’
Comparative Analysis of various Encoding schemes
Assume the string to be encoded is “aabacdad”
- Using default encoding (i.e. ASCII encoding, using 8 bits for each character
- Here decoding is not required as ASCII representation is universally known and unique
- ==> Number of bits used for encoding = 8 * 8 = 64 bits
- Uniform encoding (with less number of bits )
- Here we find 4 characters appears in the string, so, 2 bits are sufficient to represent these characters. Therefore, we need to send bits for actual character along with the coding scheme (let say a=00, b=01, c=11, d=10).
- ==> total bits required = decoding scheme + encoded message
- decoding scheme => [Ascii bits for 4 characters + 2 bit representation for 4 characters]
- encoded message => [2 bits for each character in the string ]
- ==> (8*4 + 4*2) +( 8*2) = (32 +8)+ 16 =56 bits
- Variable encoding =decoding scheme + decoded message
- Here we find 4 characters appears in the string with different frequency, so, variable number of bits are required to represent each characters. Therefore, we also need to send bits for actual character along with the coding scheme (Using Huffman we find a=0, b=110, c=111, d=10 ).
- ==> total bits required = decoding scheme + encoded message
- decoding scheme => [Ascii bits for 4 characters + bit representation for 4 characters]
- encoded message => [ a*4+b*1+c*1+d*2 bits (for characters used in the string) ]
- ==> (8*4 + 9) +( 1*4+3*1+3*1+2*2) = (32 +9)+ (14) =55 bits
NOTE – The real benefit is realized with larger size strings.
Click the link to watch lecture on HUFFMAN ENCODING .
You are invited to raise queries by writing your question in the box given below