Count Min Sketch In C++ - C++ Programming Tutorial
C++ Course / Miscellaneous / Count Min Sketch In C++

Count Min Sketch In C++

BLUF: Mastering Count Min Sketch In C++ is a critical step in becoming a proficient C++ developer. This lesson provides a deep dive into the syntax, performance considerations, and real-world applications of this concept.
Key Performance Insight: Count Min Sketch In C++

C++ is renowned for its efficiency. Learn how Count Min Sketch In C++ enables low-level control and high-performance computing in the tutorial below.

Introduction:

The Count-Min Sketch is a probabilistic technique employed for estimating counts in significant data streams with approximate accuracy. It effectively determines the occurrence rate of items within a data stream, all while conserving memory resources.

Essentially, the Count-Min Sketch is comprised of a two-dimensional array of counters. Hash functions are utilized to assign elements from the data stream to specific locations in this array. It is common practice to employ multiple hash functions, with each one identifying a distinct column within the array. Upon the arrival of an element in the stream, the counters corresponding to it in every row are incremented.

The primary concept driving the Count-Min Sketch is the utilization of several hash functions and the upkeep of numerous counters per item. This approach acknowledges that collisions are likely to occur, yet the smallest value among all counters associated with a specific item typically offers a reliable estimate of its actual frequency. Users have the flexibility to regulate the balance between precision and memory consumption by modifying the dimensions of the array.

Program:

Example

#include <iostream>
#include <vector>
#include <functional>
#include <climits>
class CountMinSketch {
private:
    int width;
    int depth;
    std::vector<std::vector<int>> counts;
    std::vector<std::hash<int>> hash_functions;
public:
    CountMinSketch(int w, int d) : width(w), depth(d), counts(d, std::vector<int>(w, 0)) {
        for (int i = 0; i < d; ++i) {
            hash_functions.push_back(std::hash<int>());
        }
    }
    void update(int item, int count) {
        for (int i = 0; i < depth; ++i) {
            int hash_val = hash_functions[i](item);
            int index = hash_val % width;
            counts[i][index] += count;
        }
    }
    int query(int item) {
        int min_count = INT_MAX;
        for (int i = 0; i < depth; ++i) {
            int hash_val = hash_functions[i](item);
            int index = hash_val % width;
            min_count = std::min(min_count, counts[i][index]);
        }
        return min_count;
    }
};
int main() {
    // Example usage
    int width = 100;
    int depth = 5;
    CountMinSketch sketch(width, depth);
    // Update sketch with counts
    sketch.update(42, 5);
    sketch.update(55, 3);
    sketch.update(42, 2);
    // Query sketch for counts
    std::cout << "Count of item 42: " << sketch.query(42) << std::endl;  // Output: 7
    std::cout << "Count of item 55: " << sketch.query(55) << std::endl;  // Output: 3
    return 0;
}

Output:

Output

Count of item 42: 7
Count of item 55: 3

Explanation:

Private Attributes: The code introduces a C++ class called CountMinSketch, which encapsulates the functionality of the Count-Min Sketch data structure.

The dimensions of width and depth in the Count-Min Sketch determine both its memory utilization and precision.

counts: This vector in two dimensions represents the collection of counters within the sketch.

hashing functions: This array contains occurrences of the std::hash<int> method, employed for generating hash values of elements.

  • Initializer:

The constructor sets the values for the width, depth, and counts array.

It further fills the hash_functions vector with the specified number of hash functions.

  • Update Procedure:

The update function requires an item along with its quantity as parameters.

It hashes the item with each hash function and then proceeds to update the relevant counters within the count's array.

  • Query Strategy:

The query function receives an item as an argument and provides an approximation of its quantity.

It hashes the item with every hash function and then selects the lowest count from all the hash functions.

  • Primary Function:

Within the main function, a CountMinSketch object is instantiated with a defined width and depth configuration.

The sketch has been revised to include quantities for a pair of items: 42 and 55.

The query function is invoked to calculate the quantities of these elements, and the outcomes are displayed.

Complexity analysis:

Time Complexity:

Update Operation:

The time complexity of refreshing the sketch with the count of an item is O(d), where ```

include <iostream>

include <vector>

include <functional>

include <climits>

class CountMinSketch {

private:

int width;

int depth;

std::vector<std::vector<int>> counts;

std::vector<std::hash<int>> hash_functions;

public:

CountMinSketch(int w, int d) : width(w), depth(d), counts(d, std::vector<int>(w, 0)) {

for (int i = 0; i < d; ++i) {

hashfunctions.pushback(std::hash<int>);

}

}

void update(int item, int count) {

for (int i = 0; i < depth; ++i) {

int hashval = hashfunctionsi;

int index = hash_val % width;

countsi += count;

}

}

int query(int item) {

int mincount = INTMAX;

for (int i = 0; i < depth; ++i) {

int hashval = hashfunctionsi;

int index = hash_val % width;

mincount = std::min(mincount, countsi);

}

return min_count;

}

};

int main {

// Example usage

int width = 100;

int depth = 5;

CountMinSketch sketch(width, depth);

// Update sketch with counts

sketch.update(42, 5);

sketch.update(55, 3);

sketch.update(42, 2);

// Query sketch for counts

std::cout << "Count of item 42: " << sketch.query(42) << std::endl; // Output: 7

std::cout << "Count of item 55: " << sketch.query(55) << std::endl; // Output: 3

return 0;

}

Example


d is the depth of the sketch.

This is due to the fact that, for every hash function (of which there are d), updating requires a constant-time action to increase the relevant counter.

Query Operation:

The time complexity of retrieving the item count from the sketch is also O(d).

Similar to the process of updating, each hash function in the query requires accessing the respective counter and identifying the minimum value.

Space Complexity:

The amount of space required by the Count-Min Sketch is influenced by the dimensions of its data structure, which include:

- An allocation of O(w×d) space for the two-dimensional counts array, with w representing the width (counters per row) and d representing the depth (hash functions count).

- An overhead of O(d) space to hold the hash functions within the hash_functions vector.

Input Required

This code uses input(). Please provide values below:

Logic Practice
Install Logic Practice
Add to home screen for a faster app-like experience