Interactive Programming Education Platform

Logic Practice

A fusion tree is a sophisticated data structure commonly employed for managing sorted sets or associative arrays. Michael Fredman and Dan Willard presented it in 1990 to enhance search speed through the utilization of bitwise parallelism and word-level computations within the processor. This creative strategy enables fusion trees to handle dynamic set operations with greater efficiency compared to conventional structures such as balanced binary search trees.

The main idea behind fusion trees is to decrease the tree's height and consequently lessen the number of comparisons needed when performing search operations. Conventional balanced trees like AVL or Red-Black trees typically have a height proportional to log n, where n represents the total elements in the tree. Fusion trees enhance this by minimizing the effective height to n, where w signifies the machine's word size (usually 32 or 64 bits). This optimization is accomplished through a blend of methods, such as bitwise parallelism, approximate range searching, and the identification of significant bits.

Key Techniques

Utilizing Bitwise Parallelism: Fusion trees take advantage of the processor's capability to execute operations on multiple bits in parallel. This enables the tree to compare groups of bits together, rather than evaluating each bit individually, thereby making use of the entire word size of the machine.

Approximate Interval Query: Fusion trees utilize approximate range searching to pinpoint the location of a key within a specific subset of tree nodes. This strategy aids in minimizing the number of comparisons by emphasizing the crucial bits that distinguish the keys within the specified set.

In fusion trees, keys are condensed by capturing the most variable bits known as significant bits. This compression minimizes the key size and enhances the efficiency of comparison tasks.

Word-level Parallelism: Employing word-level parallelism enables the utilization of a solitary machine word to accommodate numerous data pieces, thereby enhancing the tree's capacity to process larger amounts of data simultaneously.

Approach-1: Using Bit extraction and compression.

Bit manipulation and data compression are essential strategies applied in fusion trees to enhance search efficiency. These methods enable the data structure to exploit bitwise concurrency, diminishing the quantity of comparisons required by concentrating on the most relevant bits within the keys.

Bit Extraction

Bit extraction is the process of isolating the high-order bits (significant bits) from the keys that are stored within the fusion tree. These particular bits are selected due to their capacity to distinctly distinguish between various keys.

Identifying Key Variations: To start, analyze the array of keys and pinpoint the specific bit locations where the keys exhibit the greatest differences. These particular positions hold significance as they play a crucial role in differentiating one key from another.

Selecting Important Bits: After determining the specific positions of the variable bits, a portion of these bits is designated as the significant ones. The quantity of significant bits chosen depends on the word size, denoted as w, of the machine and the branching factor, B, of the tree. Generally, the number of significant bits is proportional to O(log B).

Example: Let's examine a limited collection of values: {12, 15, 7, 9}. Represented in binary format, these values are:

12: 1100

15: 1111

7: 0111

9: 1001

The binary digits at indexes (starting from the least significant bit) 0, 1, 2, and 3 differ across these access keys. Assuming we designate positions 1, 2, and 3 as significant.

Bit Compression

Bit compression entails generating a compressed version of individual keys by retaining only the essential bits. This process decreases the size of the keys, enabling quicker comparisons when conducting search activities.

Compress Key

To condense a key, retrieve the bits located at the important positions and combine them to generate a reduced binary value. This process involves utilizing bitwise operations like shifts and logical ANDs.

Example:

For the key 12 (1100), the important bits are located at positions 1, 2, and 3. When isolating these bits, we obtain the value 100.

For the key 15 (binary 1111), the notable bits are: 111.

For the key 7 (0111), the important bits are: 111.

For the key 9 (1001), the important bits are: 001.

These condensed keys are subsequently utilized for comparison within the tree.

Program:

Example


#include <iostream>
#include <vector>
#include <algorithm>
const int WORD_SIZE = 32; // Assuming 32-bit machine
// Helper function to extract significant bits
std::vector<int> extractSignificantBits(const std::vector<int>& keys) {
    int maxKey = *std::max_element(keys.begin(), keys.end());
    int bitLength = 0;
    while (maxKey > 0) {
        bitLength++;
        maxKey >>= 1;
    }
    std::vector<int> significantBits;
    for (int i = 0; i < bitLength; i++) {
        significantBits.push_back(i);
    }
    return significantBits;
}
// Helper function to compress a key using significant bits
int compressKey(int key, const std::vector<int>& significantBits) {
    int compressedKey = 0;
    for (int bit : significantBits) {
        compressedKey = (compressedKey << 1) | ((key >> bit) & 1);
    }
    return compressedKey;
}
class FusionTreeNode {
public:
    std::vector<int> keys; // Original keys
    std::vector<int> compressedKeys; // Compressed keys
    std::vector<FusionTreeNode*> children;
    std::vector<int> significantBits;
    FusionTreeNode() = default;
    //Method to compress all keys in the node
    void compressKeys() {
        compressedKeys.clear();
        for (int key : keys) {
            compressedKeys.push_back(compressKey(key, significantBits));
        }
    }
    //Method to insert a key into the node
    void insert(int key) {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        keys.insert(it, key);
        compressKeys();
        // Split node if necessary
        if (keys.size() > WORD_SIZE) {
            splitNode();
        }
    }
    //Method to split a node
    void splitNode() {
        int midIndex = keys.size() / 2;
        FusionTreeNode* leftChild = new FusionTreeNode();
        FusionTreeNode* rightChild = new FusionTreeNode(); 
        // Distribute keys to left and right children
        leftChild->keys.assign(keys.begin(), keys.begin() + midIndex);
        rightChild->keys.assign(keys.begin() + midIndex, keys.end());
        // Compress keys for left and right children
        leftChild->compressKeys();
        rightChild->compressKeys();
        // Clear keys in the current node and add children
        keys.clear();
        children.push_back(leftChild);
        children.push_back(rightChild);
    }
    //Method to search for a key in the node
    bool search(int key) {
        int compressedKey = compressKey(key, significantBits);
        auto it = std::lower_bound(compressedKeys.begin(), compressedKeys.end(), compressedKey); 
        if (it != compressedKeys.end() && *it == compressedKey) {
            // Further check the actual key to confirm the match
            return std::find(keys.begin(), keys.end(), key) != keys.end();
        }
        // Search in the corresponding child
        int index = it - compressedKeys.begin();
        if (index < children.size()) {
            return children[index]->search(key);
        }
        return false;
    }
    //Method to delete a key from the node
    void deleteKey(int key) {
        auto it = std::find(keys.begin(), keys.end(), key);
        if (it != keys.end()) {
            keys.erase(it);
            compressKeys();
        }
        // Handle rebalancing if necessary
    }
};
class FusionTree {
private:
    FusionTreeNode* root;
public:
    FusionTree(const std::vector<int>& keys) {
        root = new FusionTreeNode();
        root->keys = keys;
        root->significantBits = extractSignificantBits(keys);
        root->compressKeys();
    }
    //Method to insert a key into the fusion tree
    void insert(int key) {
        root->insert(key);
    }
    //Method to search for a key in the fusion tree
    bool search(int key) {
        return root->search(key);
    }
    //Method to delete a key from the fusion tree
    void deleteKey(int key) {
        root->deleteKey(key);
    }
};
int main() {
    // Sample keys
    std::vector<int> keys = {12, 15, 7, 9};
    // Create a fusion tree
    FusionTree fusionTree(keys);
    // Search for a key
    int searchKey = 15;
    std::cout << "Searching for key " << searchKey << ": " << (fusionTree.search(searchKey) ? "Found" : "Not found") << std::endl;
    // Insert a new key
    int newKey = 20;
    fusionTree.insert(newKey);
    std::cout << "Inserted key " << newKey << std::endl;
    // Delete a key
    int deleteKey = 7;
    fusionTree.deleteKey(deleteKey);
    std::cout << "Deleted key " << deleteKey << std::endl;
    // Search again for the deleted key
    std::cout << "Searching for key " << deleteKey << ": " << (fusionTree.search(deleteKey) ? "Found" : "Not found") << std::endl;
    return 0;
}

Output:

Output


Searching for key 15: Not found
Inserted key 20
Deleted key 7
Searching for key 7: Not found

Explanation:

✅ FusionTreeNode Class:

This class defines a node within the fusion tree data structure. It includes arrays for holding unaltered keys, compacted keys, descendant nodes, and important bit information.

CompressKeys: The compressKeys function compresses the keys saved within the node by utilizing important bits. It loops through each key, isolates the significant bits, and generates compressed keys.

The insert method adds a fresh key to the node by locating the correct insertion point through binary search. When the node surpasses a set threshold (WORD_SIZE), it divides into two offspring nodes.

The SplitNode function divides a node into two child nodes when it surpasses the limit set by the WORD_SIZE constant. It then redistributes the keys among the child nodes and compresses them as needed.

The search method is responsible for locating a specific key within the node. It minimizes the search key by utilizing important bits and then executes a binary search on these condensed keys. Upon finding the key, it validates the match with the initial keys.

✅ FusionTree Class:

This class embodies the fusion tree structure and acts as the main access point for carrying out operations.

Initialize the fusion tree by providing a collection of keys to the constructor. This process involves the creation of the main node, allocation of keys to it, identification and extraction of important bits, and compression of the keys.

Delegates the task of inserting keys to the main root node.

Search: Delegates key search to the root node.

Deletion of keys is outsourced to the root node within the DeleteKey function.

✅ Core Functionality:

Generate a fusion tree containing example keys to illustrate key functionalities like searching, adding, and removing. Display the outcomes of these actions.

The code demonstrates functions for extracting important bits from keys and compressing keys based on these significant bits.

Node Division: Whenever a node surpasses a specific size threshold, it divides into two offspring nodes to uphold equilibrium and optimize performance.

Enhancing Search Performance: Fusion trees improve search functionality through the utilization of compact keys and binary search.

✅ Enhanced Scalability and Performance:

Fusion trees provide effective search functionalities, particularly in situations involving extensive datasets. By leveraging important bits and splitting nodes, these trees maintain equilibrium, leading to enhanced operational efficiency.

Complexity Analysis:

Time Complexity:

Search Operation:

When using a fusion tree, the search process mainly consists of comparing the search key with the condensed keys stored in the nodes.

The time complexity is O(logn), with n representing the overall key count in the fusion tree. This efficiency stems from the binary search operation conducted on the condensed keys in every node.

Insertion and Deletion:

Performing insertion and deletion tasks within fusion trees involves identifying the accurate location for the key and potentially dividing nodes to ensure equilibrium is upheld.

The time complexity is amortized O(log^2 n), with 'n' representing the total count of keys.

Node splitting might be initiated by insertion or deletion actions when a node surpasses a predefined size threshold (WORD_SIZE). The process of splitting a node can be completed in O(logn) time. Even though this event occurs infrequently and is spread out over numerous operations, the total complexity stays at O(log^2 n) in an amortized scenario.

Space Complexity:

Memory Usage:

Fusion trees need memory allocation for storing the keys, compressed keys, important bits, and node configurations.

Space complexity: O(nlogn), where n represents the overall count of keys.

Each key is saved with its compressed form, necessitating extra storage space. Furthermore, each node retains references to its child nodes, adding to the overall space consumption.

Node Splitting:

Node splitting happens when a node reaches the maximum size limit (WORD_SIZE).

Space complexity: O(logn) extra space is required as a result of generating new nodes while performing the split operation.

When a node undergoes a split operation, it generates two fresh child nodes, with each holding a portion of keys from the original node. This division results in an expansion of the total space complexity.

Significant Bits:

The critical components dictate the extent of compression attained in the fusion tree.

Space complexity: O(logB), where B represents the branching factor of the tree structure.

The choice of significant bits selected impacts the compression ratio and therefore affects the amount of space needed to store compressed keys.

Approach-2: Buffered Node Splitting Approach

The Buffered Node Splitting Technique is an innovative improvement to the conventional fusion tree data structure. Its primary objective is to boost efficiency by reducing the need for frequent node splitting actions. Fusion trees are well-known for their effective organization and retrieval of ordered data, employing strategies like bitwise parallelism and compression. Nevertheless, the excessive splitting of nodes can pose a challenge in situations where datasets undergo rapid changes.

In this strategy, every fusion tree node contains a buffer that serves as a provisional key storage area. When adding a new key, the buffer is initially inspected to see if there is room. If the buffer is not full, the key is placed there. Once the buffer is at capacity, a portion of the keys is moved to the primary key collection. If the primary list surpasses its limit, a split process is initiated, taking into account the contents of the buffer before proceeding.

Buffered node splitting minimizes the frequency of splitting operations and the related overhead by postponing the splitting process until the primary key list reaches a substantial capacity. This enhancement enables fusion trees to uphold their effectiveness and adaptability, rendering them well-suited for demanding tasks that involve extensive and constantly changing data collections.

Program:

Example


#include <iostream>
#include <vector>
#include <algorithm>
const int MAX_NODE_SIZE = 4; // Maximum number of keys in a node
const int BUFFER_SIZE = 2;   // Maximum number of keys in the buffer
class FusionTreeNode {
public:
    std::vector<int> keys;   // Main key list
    std::vector<int> buffer; // Buffer for temporarily storing keys
    FusionTreeNode* leftChild;
    FusionTreeNode* rightChild;
    FusionTreeNode() : leftChild(nullptr), rightChild(nullptr) {}
    // Insert a key into the node
    void insert(int key) {
        if (buffer.size() < BUFFER_SIZE) {
            buffer.push_back(key);
        } else {
            keys.insert(keys.end(), buffer.begin(), buffer.end());
            buffer.clear();
            if (keys.size() > MAX_NODE_SIZE) {
                splitNode();
            }
        }
    }
    // Split the node into two child nodes
    void splitNode() {
        leftChild = new FusionTreeNode();
        rightChild = new FusionTreeNode();
        int midIndex = keys.size() / 2;
        // Distribute keys between left and right child nodes
        leftChild->keys.assign(keys.begin(), keys.begin() + midIndex);
        rightChild->keys.assign(keys.begin() + midIndex, keys.end());
        // Clear the main key list
        keys.clear();
        // Move buffer contents to the main key list of the left child
        leftChild->keys.insert(leftChild->keys.end(), buffer.begin(), buffer.end());
        buffer.clear();
    }
    // Inorder traversal to print keys
    void inorder() {
        if (leftChild != nullptr) {
            leftChild->inorder();
        }
        for (int key : keys) {
            std::cout << key << " ";
        }
        if (rightChild != nullptr) {
            rightChild->inorder();
        }
    }
};
class FusionTree {
private:
    FusionTreeNode* root;
public:
    FusionTree() : root(nullptr) {}
    // Insert a key into the fusion tree
    void insert(int key) {
        if (root == nullptr) {
            root = new FusionTreeNode();
        }
        root->insert(key);
    }
    // Inorder traversal to print keys
    void printKeys() {
        if (root != nullptr) {
            root->inorder();
        }
    }
};
int main() {
    // Create a fusion tree
    FusionTree fusionTree;
    // Insert keys into the fusion tree
    fusionTree.insert(10);
    fusionTree.insert(5);
    fusionTree.insert(15);
    fusionTree.insert(3);
    fusionTree.insert(7);
    fusionTree.insert(12);
    fusionTree.insert(18);
    // Print keys in the fusion tree
    std::cout << "Keys in the fusion tree: ";
    fusionTree.printKeys();
    std::cout << std::endl;
    return 0;
}

Output:

Output


Keys in the fusion tree: 10 5 3 7

Explanation:

The given code demonstrates the Buffered Node Splitting technique within a fusion tree data structure through the use of C++. This method aims to boost the fusion tree's efficiency by minimizing the need for frequent node splitting, ultimately improving its overall performance.

✅ FusionTreeNode Class:

This particular class serves as a representation of a node within the fusion tree data structure. It manages arrays to hold key values and a temporary storage area for keys while they are being inserted.

The insert function adds a new key to the node. When the buffer is not at capacity, the key is placed into the buffer. As the buffer fills up, keys are transferred to the primary key list. If the primary key list surpasses its maximum size, the node undergoes a split operation, resulting in two child nodes.

The splitNode function partitions the node into two child nodes, redistributing keys between them to ensure equilibrium.

✅ FusionTree Class:

This class is responsible for overseeing the fusion tree arrangement. It holds a reference to the primary node. The addKey function integrates a fresh key into the fusion tree by assigning the insertion task to the primary node. The displayKeys operation executes an inorder scan of the fusion tree to showcase all keys in a sorted manner.

Main Function:

It instantiates a FusionTree object. By utilizing the insert function, multiple keys are added to the fusion tree. Subsequently, the printKeys function is invoked to display all keys stored within the fusion tree.

Buffered Node Splitting:

The primary characteristic of this method involves incorporating a buffer within every node. This buffer serves as a temporary storage for keys when performing insertions. The division of nodes is postponed until the primary key list reaches a specific threshold, decreasing the occurrence of split operations and enhancing overall efficiency.

Complexity Analysis:

Time Complexity:

Insertion Operation:

When adding a key to a fusion tree node, the computational complexity is determined by the quantity of existing keys within the node and the dimensions of the buffer.

In an optimal situation, if the buffer has free capacity, adding a key is a constant time operation denoted as O(1) since it only requires appending the key to the buffer. Conversely, in the least favorable circumstance where the buffer is at maximum capacity leading to a split, the insertion operation is O(logn) where n represents the count of keys in the primary key list.

In general, the estimated time complexity for insertion can be seen as O(logn) on an amortized basis, factoring in both cases and the occurrence of splits.

Node Splitting Operation:

Node splitting takes place when the primary key list surpasses the maximum node size due to the addition of keys from the buffer. The complexity of splitting a node is O(n), where n represents the total number of keys in the primary key list. This process entails redistributing keys between two child nodes. Nonetheless, the splitting of nodes is delayed until the primary key list grows to a specific size, thereby decreasing the occurrence of split operations and spreading out their expenses over time.

Traversal Operation (Printing Keys):

Traversing the fusion tree to display all keys requires an inorder traversal. The time complexity of performing an inorder traversal is O(n), with n representing the overall quantity of keys in the fusion tree. This is due to the fact that each individual key is accessed exactly once. The complexity of the traversal remains consistent regardless of the presence of buffering mechanisms or splitting operations.

Space Complexity:

Memory Usage:

The space efficiency of the fusion tree is mainly determined by the quantity of nodes and the keys they contain. Every node in a fusion tree requires memory for holding keys, buffers, and references to child nodes.

The space requirement is O(n), where n represents the overall count of keys held within the fusion tree. Moreover, the inclusion of buffers and pointers adds an extra layer of overhead, further impacting the space complexity.

Buffer Size:

The buffer's space complexity is O(1), remaining constant regardless of the number of keys in the fusion tree. While the buffer size and maximum node size impact memory needs, they do not have a substantial effect on the overall space complexity.

In brief, the Buffered node-splitting technique within a fusion tree provides effective time and space efficiencies for essential operations. While occasional splits may occur, the insertion and traversal tasks consistently exhibit O(logn) and O(n) complexities, respectively. The spatial complexity is O(n), mainly dependent on the quantity of keys housed within the fusion tree. In general, this strategy enhances the fusion tree's performance, rendering it well-suited for scenarios involving sizable and ever-changing datasets.

Fusion Tree In C++

Key Techniques

Approach-1: Using Bit extraction and compression.

Bit Extraction

Example:

Approach-2: Buffered Node Splitting Approach

Program:

Input Required