Interactive Programming Education Platform

Logic Practice

Introduction

Data compression is a technique for conserving space by encoding symbols based on their probabilities to attain a concise representation. The Shannon-Fano algorithm, developed by Claude Shannon and Robert Fano in the 1940s, stands as one of the earliest effective methods for achieving lossless data compression.

Problem Statement:

We need to generate Shannon-Fano codes for individual symbols based on their probabilities, which are essential for optimizing message encoding and decoding processes. This guide focuses on the implementation of Shannon-Fano coding in C++ specifically addressing the decoding aspect.

History:

Shannon-Fano encoding stands as a significant achievement in the evolution of data compression, a concept pioneered by Claude E. Shannon and Robert M. Fano during the 1940s. Widely acknowledged as a foundational figure in information theory, Claude Shannon was an esteemed American mathematician, electrical engineer, and cryptanalyst, whereas Robert Fano was recognized as an Italian-American computer scientist and engineer.

The Shannon-Fano technique was one of the initial endeavors to achieve data compression without loss by assigning varying length codes to symbols depending on their probabilities. Essentially, it includes organizing symbols by their probabilities, then iteratively dividing symbol sets into two subsets with binary codes allocated in a way that symbols with higher probabilities receive shorter codes.

Shannon-Fano encoding stood out as a significant progression in data compression techniques. Nonetheless, its dominance was short-lived as Huffman encoding soon outperformed it in effectiveness. The innovative approach devised by David A. Huffman in 1952 employed a binary tree to allocate variable-length codes according to symbol frequencies, ultimately delivering superior compression efficiency compared to Shannon-Fano coding.

However, Shannon-Fano encoding has been superseded by algorithms like arithmetic coding or Huffman coding which followed it. Despite this, it continues to stand out as a significant accomplishment in data compression. It established the groundwork for more sophisticated methods and played a role in enhancing the theoretical comprehension of information theory.

Features of Shannon-Fano:

There are several features of Shannon-Fano in C++. Some main features of the Shannon-Fano algorithm are as follows:

✅ Simple and Easy to Understand: The Shannon-Fano algorithm is quite simple and intuitive compared to other complex compressions like arithmetic or Huffman encoding.
✅ Moderate Compression Level: Although it may not achieve high level compressions like other methods do, but Shannon Fano still offers reasonable rates especially when dealing with certain types of data sets.
✅ Not Externally Structured: Unlike building binary trees requirement for Huffman codding, Shannon Fanno can be implemented using only code tables without any extra structures.
✅ Adaptive Coding: The basic Shannon-Fano algorithm is non-adaptive but can be made variable by considering the symbols dynamicity that compresses certain types of data.
✅ Historical Significance: Shannon-Fano coding method is an important development in compression history as it was among the pioneering lossless techniques thus paving way for other more complex methods used in this field currently.

Approach:

✅ Input: We take encoded message and Shannon-Fano code table as our input.
✅ Decoding Process: In order to decode the message each bit of encoded bits are read one by one while traversing through shannon fano tree until corresponding symbol is found. This step is repeated until whole message gets decoded.
✅ Output: After that, we show decoded message to user.

Example 1:

Let's consider a scenario to demonstrate the Shannon-Fano decoding process in the C++ programming language.

Example


#include <iostream>
#include <map>
#include <string>

using namespace std;

// Structure to represent a node in the Shannon-Fano tree
struct Node {
    char symbol;
    Node* left;
    Node* right;
};

// Function to decode the message using the Shannon-Fano code table
string decodeMessage(string encodedMessage, map<string, char>& codeTable) {
    string decodedMessage = "";
    string currentCode = "";
    
    for (char bit : encodedMessage) {
        currentCode += bit;
        if (codeTable.find(currentCode) != codeTable.end()) {
            decodedMessage += codeTable[currentCode];
            currentCode = "";
        }
    }
    
    return decodedMessage;
}

// Main function to demonstrate Shannon-Fano decoding
int main() {
    // Example Shannon-Fano code table
    map<string, char> codeTable = {
        {"0", 'A'},
        {"10", 'B'},
        {"110", 'C'},
        {"111", 'D'}
    };

    // Example encoded message
    string encodedMessage = "0110110110";

    // Decode the message
    string decodedMessage = decodeMessage(encodedMessage, codeTable);

    // Display the decoded message
    cout << "Decoded Message: " << decodedMessage << endl;

    return 0;
}
#include <iostream>
#include <map>
#include <string>

using namespace std;

// Structure to represent a node in the Shannon-Fano tree
struct Node {
    char symbol;
    Node* left;
    Node* right;
};

// Function to decode the message using the Shannon-Fano code table
string decodeMessage(string encodedMessage, map<string, char>& codeTable) {
    string decodedMessage = "";
    string currentCode = "";
    
    for (char bit : encodedMessage) {
        currentCode += bit;
        if (codeTable.find(currentCode) != codeTable.end()) {
            decodedMessage += codeTable[currentCode];
            currentCode = "";
        }
    }
    
    return decodedMessage;
}

// Main function to demonstrate Shannon-Fano decoding
int main() {
    // Example Shannon-Fano code table
    map<string, char> codeTable = {
        {"0", 'A'},
        {"10", 'B'},
        {"110", 'C'},
        {"111", 'D'}
    };

    // Example encoded message
    string encodedMessage = "0110110110";

    // Decode the message
    string decodedMessage = decodeMessage(encodedMessage, codeTable);

    // Display the decoded message
    cout << "Decoded Message: " << decodedMessage << endl;

    return 0;
}

Output:

Output


Decoded Message: ACCC

Explanation:

✅ Struct Node: It defines a structure, which represents node in shanon fano tree where each node has symbol (char) and pointers towards its left child and right child.
✅ decodeMessage: It takes encoded message and shannon fano code table as input then returns decoded message. For every bit in encoded message, this function goes through building up current code. Whenever current code matches code from codetable, corresponding symbol is appended with decoded message plus reset current code.
✅ Main Function:
✅ Create a Shannon-Fano code table example using map<string, char> . Binary codes serve as keys and their corresponding symbols are values.
✅ Define an encoded message example as a string.
✅ Call decodeMessage with the coded message and the code table in order to decode it.
✅ After that, print out the decoded message.

For Example:

It is the procedure of decoding the encoded message using the provided code table that we will now explore:

Encoded Message: "0110110110"

Decode Process:

"0" → "A"

"1" → Not found in code table

"11" → "B"

"0" → "A"

"1" → Not found in code table

"10" → "B"

"1" → Not found in code table

"10" → "B"

"1" → Not found in code table

"0" → "A"

Decoded Message: "ABBABA"

Complexity Analysis:

Time Complexity:

✅ Building the code table: The time complexity of building the code table depends on how we build it. If all symbols and their codes are iterated through when creating this map, it should take O(n) where n represents number of symbols.
✅ Decoding the message: In order to decode messages, we look at each bit once until we find matching symbol from our dictionary so it has O(m) time complexity where m is length of encoded message.
✅ Thus, overall time complexity for decoding process becomes O(n + m) where n represents number of symbols and m stands for length of encoded message.

Space Complexity:

✅ The code table's space complexity is O(n) where n denotes the number of symbols because it stores each symbol together with their corresponding codes.
✅ Variables and temporary storage use additional space, which is O(1) for this implementation.
✅ The overall space complexity becomes O(n) due to the code table.

Example 2:

Let's consider a different instance to demonstrate Shannon-Fano decoding within a C++ context.

Example


#include <iostream>
#include <vector>
#include <algorithm>
#include <map>

using namespace std;

// Structure to represent a symbol and its probability
struct SymbolProb {
    char symbol;
    double probability;
};

// Function to recursively generate Shannon-Fano codes
void generateCodes(vector<SymbolProb>& symbols, int start, int end, string code, map<char, string>& codeTable) {
    if (start == end) {
        codeTable[symbols[start].symbol] = code;
        return;
    }

    double sum = 0;
    for (int i = start; i <= end; ++i) {
        sum += symbols[i].probability;
    }

    double halfSum = 0;
    int splitIndex = start;
    for (int i = start; i <= end; ++i) {
        halfSum += symbols[i].probability;
        if (halfSum >= sum / 2) {
            splitIndex = i;
            break;
        }
    }

    generateCodes(symbols, start, splitIndex, code + "0", codeTable);
    generateCodes(symbols, splitIndex + 1, end, code + "1", codeTable);
}

// Function to encode a message using Shannon-Fano codes
string encodeMessage(string message, map<char, string>& codeTable) {
    string encodedMessage = "";
    for (char c : message) {
        encodedMessage += codeTable[c];
    }
    return encodedMessage;
}

// Function to decode a message using Shannon-Fano codes
string decodeMessage(string encodedMessage, map<string, char>& codeTable) {
    string decodedMessage = "";
    string currentCode = "";
    
    for (char bit : encodedMessage) {
        currentCode += bit;
        if (codeTable.find(currentCode) != codeTable.end()) {
            decodedMessage += codeTable[currentCode];
            currentCode = "";
        }
    }
    
    return decodedMessage;
}

int main() {
    // Example symbols and their probabilities
    vector<SymbolProb> symbols = {
        {'A', 0.25},
        {'B', 0.15},
        {'C', 0.2},
        {'D', 0.4}
    };

    // Sort symbols by probability in descending order
    sort(symbols.begin(), symbols.end(), [](const SymbolProb& a, const SymbolProb& b) {
        return a.probability > b.probability;
    });

    // Generate Shannon-Fano codes
    map<char, string> codeTable;
    generateCodes(symbols, 0, symbols.size() - 1, "", codeTable);

    // Print the generated codes
    cout << "Generated Codes:" << endl;
    for (const auto& entry : codeTable) {
        cout << entry.first << ": " << entry.second << endl;
    }

    // Example message to encode
    string message = "ABCD";

    // Encode the message
    string encodedMessage = encodeMessage(message, codeTable);
    cout << "Encoded Message: " << encodedMessage << endl;

    // Decode the message
    map<string, char> reverseCodeTable;
    for (const auto& entry : codeTable) {
        reverseCodeTable[entry.second] = entry.first;
    }
    string decodedMessage = decodeMessage(encodedMessage, reverseCodeTable);
    cout << "Decoded Message: " << decodedMessage << endl;

    return 0;
}

Output:

Output


Generated Codes:
A: 01
B: 11
C: 10
D: 00
Encoded Message: 01111000
Decoded Message: ABCD

Explanation:

✅ struct SymbolProb: This structure represents a symbol (char) and its associated probability (double). It is used to store information about each symbol in the message.
✅ generateCodes Function: This function generates Shannon-Fano codes for symbols recursively depending on their probabilities.
✅ Symbols: A vector of SymbolProb structures has symbols and their probabilities.
✅ start, end: These parameters define the range of indices in the symbols vector to process.
✅ Code: This parameter holds the current Shannon-Fano code being generated.
✅ Message: The input message to be encoded.
✅ codeTable: Map containing Shannon-Fano codes for each symbol.
✅ decodeMessage Function: This function decodes an encoded message provided with Shannon-Fano codes.
✅ encodedMessage: The encoded message to be decoded.
✅ codeTable: A map containing Shannon-Fano codes mapped to their respective symbols, used for decoding.

Main Function (main)

✅ A vector symbols is initialized with symbols and their corresponding probabilities.
✅ For example: {('A', 0.25), ('B', 0.15), ('C', 0.2), ('D', 0.4)}.
✅ The symbols vector is sorted in descending order based on probabilities using std::sort with a lambda function.
✅ The generateCodes function is called to generate Shannon-Fano codes for the sorted symbols.
✅ The codes generated are kept in codeTable map.
✅ Generated Shannon-Fano codes stored in codeTable map are displayed.
✅ It shows how to encode and decode a sample message ("ABCD").
✅ encodeMessage method encodes the message using codeTable.
✅ After that, creates reverse mapping (reverseCodeTable) from codes to symbols for decoding purposes.
✅ Using decodeMessage and reverseCodeTable, decodes the encoded message .

Complexity Analysis:

Time Complexity

✅ Sorting Symbols: std::sort is used for sorting the symbols according to probabilities has a time complexity of O(nlogn) where n is number of symbols because it uses efficient comparison based sorting algorithm like Quicksort or Merge Sort usually.
✅ Generating Codes (generateCodes Function): This function is recursively called to partition symbols based on their probabilities. The process time complexity depends on the recursion tree structure and the number of symbols. If each partition splits into two as evenly as possible, the time complexity becomes O(nlogn) in worst case.
✅ Encoding a Message (encodeMessage Function): In order to encode a message one has to iterate through all characters in input message and substitute them with corresponding Shannon-Fano code. It takes O(m) where m is length of message operations.
✅ Decoding a Message (decodeMessage Function): In order for an encoded messages to be read again they must also go bit by bit or code segment by code segment through every part of themselves matching these bits against symbols using the table given for codes provided. This takes O(k) where k is length of encoded messages operation time.
✅ std::sort is used for sorting the symbols according to probabilities has a time complexity of O(nlogn) where n is number of symbols because it uses efficient comparison based sorting algorithm like Quicksort or Merge Sort usually.
✅ This function is recursively called to partition symbols based on their probabilities. The process time complexity depends on the recursion tree structure and the number of symbols.
✅ If each partition splits into two as evenly as possible, the time complexity becomes O(nlogn) in worst case.
✅ In order to encode a message one has to iterate through all characters in input message and substitute them with corresponding Shannon-Fano code. It takes O(m) where m is length of message operations.
✅ In order for an encoded messages to be read again they must also go bit by bit or code segment by code segment through every part of themselves matching these bits against symbols using the table given for codes provided. This takes O(k) where k is length of encoded messages operation time.

Space Complexity

✅ Symbol Storage: Vector symbols holds data about every symbol along with its probability. For this storage, a required n number of symbols is required, which gives us the following space complexity: O(n).
✅ Code Table (codeTable and reverseCodeTable): The code table and reverse code table maps Shannon-Fano codes with symbols. On the other hand, it can be vice versa too. The number of unique symbols and their respective lengths determine space complexity for these maps. In worst cases scenario where every symbol has its own unique code of maximum length possible then space complexity will be O(n), where n is number of symbols. But usually in real life situations with average symbol distribution the space complexity would have been lesser.
✅ Recursive Call Stack: The GenerateCodes function uses recursion to partition symbols, which consumes call stack space. The maximum recursion depth depends on the number of symbols and how their probabilities are distributed. During recursive calls, best case space complexity for call stack is O(logn) (if partitions are balanced) while worst case is O(n) (when they are unbalanced).
✅ Vector symbols holds data about every symbol along with its probability. For this storage, a required n number of symbols is required, which gives us the following space complexity: O(n).
✅ The code table and reverse code table maps Shannon-Fano codes with symbols. On the other hand, it can be vice versa too. The number of unique symbols and their respective lengths determine space complexity for these maps.
✅ In worst cases scenario where every symbol has its own unique code of maximum length possible then space complexity will be O(n), where n is number of symbols. But usually in real life situations with average symbol distribution the space complexity would have been lesser.
✅ The GenerateCodes function uses recursion to partition symbols, which consumes call stack space. The maximum recursion depth depends on the number of symbols and how their probabilities are distributed.
✅ During recursive calls, best case space complexity for call stack is O(logn) (if partitions are balanced) while worst case is O(n) (when they are unbalanced).

Advantages of Shannon-Fano Coding:

Several advantages of Shannon-Fano Coding are as follows:

✅ Efficient Compression: Shannon-Fano coding is capable of achieving efficient compression ratios, especially for data with non-uniform probability distributions. It can lead to smaller file sizes and reduced storage requirements.
✅ Simple Implementation: The Shannon-Fano algorithm is relatively straightforward to implement compared to more complex compression techniques like Huffman coding. This simplicity can make it an attractive choice for applications where a lightweight compression algorithm is sufficient.
✅ Decoding Speed: Shannon-Fano decoding can be faster than other algorithms like Huffman coding in certain scenarios. It can be advantageous for real-time applications or systems where decoding speed is critical.
✅ Low Memory Overhead: Shannon-Fano decoding typically requires less memory overhead compared to some other compression techniques. It can be beneficial for systems with limited memory resources.
✅ Lossless Compression: Shannon-Fano decoding ensures lossless compression, meaning that the original data can be perfectly reconstructed after compression and decompression. It is important for applications where data integrity is paramount.
✅ Adaptability: The Shannon-Fano algorithm can be adapted to handle various data types and sizes. It can be customized to suit specific compression requirements based on the characteristics of the input data.

Disadvantages of Shannon-Fano Coding:

Several disadvantages of Shannon-Fano Coding are as follows:

✅ Suboptimal Compression Ratios: Compared to more advanced techniques like Arithmetic coding or Lempel-Ziv-Welch (LZW) compression, Shannon-Fano coding can sometimes produce suboptimal compression ratios. It means that the compressed data may not be as small as it could be with other algorithms.
✅ Static Codebook: Shannon-Fano coding uses a static codebook that is generated based on the probabilities of symbols in the input data. This static nature can result in inefficiencies, especially when dealing with data that has varying probability distributions over time or across different segments of data.
✅ Complexity with Large Alphabet Sizes: As the size of the alphabet (number of distinct symbols) increases, Shannon-Fano decoding becomes less efficient. Managing a large codebook with many symbols can lead to increased memory usage and slower decoding performance.
✅ Encoding and Decoding Overhead: The process of generating the Shannon-Fano codebook and performing encoding/decoding can introduce additional computational overhead. This overhead may be more pronounced for large datasets or in systems with limited processing power.
✅ Not Adaptive: Unlike adaptive algorithms such as Adaptive Huffman coding or Adaptive Arithmetic coding, Shannon-Fano coding does not adapt to changes in the input data stream. This lack of adaptability can result in less efficient compression for dynamic data streams.
✅ Potential Loss of Information: While Shannon-Fano decoding is a lossless compression technique, it is possible to encounter scenarios where the compression process inadvertently loses some information, especially if the probabilities of symbols are not accurately estimated during codebook generation.

Conclusion:

This tutorial discussed the Shannon-Fano decoding technique and provided a demonstration of a basic C++ program that applies it. Shannon-Fano encoding is a fundamental concept in data compression, so understanding how to decipher this method can be valuable in a wide range of applications involving data compression and encoding in different formats.

Shannon Fano Decoding In C++

Introduction

Problem Statement:

History:

Features of Shannon-Fano:

Approach:

Example 1:

Explanation:

For Example:

Complexity Analysis:

Time Complexity:

Space Complexity:

Example 2:

Explanation:

Complexity Analysis:

Time Complexity

Space Complexity

Advantages of Shannon-Fano Coding:

Disadvantages of Shannon-Fano Coding:

Conclusion:

Input Required