Cache Friendly Code In C++ - C++ Programming Tutorial
C++ Course / Miscellaneous / Cache Friendly Code In C++

Cache Friendly Code In C++

BLUF: Mastering Cache Friendly Code In C++ is a critical step in becoming a proficient C++ developer. This lesson provides a deep dive into the syntax, performance considerations, and real-world applications of this concept.
Key Performance Insight: Cache Friendly Code In C++

C++ is renowned for its efficiency. Learn how Cache Friendly Code In C++ enables low-level control and high-performance computing in the tutorial below.

In this guide, we will explore cache-conscious programming in C++ along with its functionality and a variety of illustrations.

What is the Cache-friendly code in C++?

Writing code that optimizes memory access patterns to leverage the CPU cache effectively, a small, rapid memory designed to store commonly accessed data, is known as "cache-friendly code." By implementing efficient cache management strategies, software operations experience enhanced speed due to the quicker access to data compared to fetching it from RAM (primary memory). Effective cache management in C++ involves organizing our data and algorithms in close proximity by clustering related or frequently used data together in memory. This arrangement minimizes cache misses, which result from crucial elements not being present in the cache.

The reason cache friendliness matters:

Memory functions at a notably reduced speed compared to modern systems, which can lead to performance bottlenecks when a CPU needs to pause for data retrieval from RAM. Enhancing code and data structures to maximize cache utilization can substantially boost programming performance.

Example 1:

Let's consider a scenario to demonstrate Cache-friendly programming in C++.

Example

#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int size = 10000;

    // Dynamically allocate 2D array using vector of vectors
    std::vector<std::vector<int>> matrix(size, std::vector<int>(size, 0));

    // Cache-friendly: Accessing by row
    auto start = std::chrono::high_resolution_clock::now();

    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            matrix[row][col] += 1;  // Row-major access (contiguous in memory)
        }
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = end - start;
    std::cout << "Cache-friendly row-major time: " << diff.count() << " seconds\n";

    return 0;
}

Output:

Output

Cache-friendly row-major time: 0.484802 seconds

Explanation:

In this instance, the outer loop handles iterations across rows, while the inner loop manages iterations along columns. This results in sequential memory access following a row-major order, enhancing efficiency and maximizing cache utilization.

Example 2:

Let's consider another example to demonstrate the concept of Cache-friendly code in C++.

Example

#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int size = 10000;

    // Dynamically allocate 2D array using vector of vectors
    std::vector<std::vector<int>> matrix(size, std::vector<int>(size, 0));

    // Cache-unfriendly: Accessing by column
    auto start = std::chrono::high_resolution_clock::now();

    for (int col = 0; col < size; col++) {
        for (int row = 0; row < size; row++) {
            matrix[row][col] += 1;  // Column-major access (non-contiguous in memory)
        }
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = end - start;
    std::cout << "Cache-unfriendly column-major time: " << diff.count() << " seconds\n";

    return 0;
}

Output:

Output

Cache-unfriendly column-major time: 0.96013 seconds

Explanation:

This particular 2D array demonstrates poor cache efficiency as it is accessed column by column, resulting in non-contiguous memory access. Given that C++ stores 2D arrays in row-major order, accessing elements in a column-major fashion frequently results in cache misses.

Example 3:

Let's consider another scenario to demonstrate the concept of Cache-friendly code in C++.

Example

#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int size = 10000;

    // Dynamically allocate 2D array using vector of vectors
    std::vector<std::vector<int>> matrix(size, std::vector<int>(size, 0));

    // Cache-friendly: Accessing by row
    auto start = std::chrono::high_resolution_clock::now();

    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            matrix[row][col] += 1;  // Row-major access (contiguous in memory)
        }
    }

    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> diff = end - start;
    std::cout << "Cache-friendly row-major time: " << diff.count() << " seconds\n";

    return 0;
}

Output:

Output

Cache-friendly row-major time: 0.587543 seconds

Explanation:

  • Allocating the Matrix: Because of the size of the matrix, it is best to dynamically allocate it using std::vector to avoid stack overflow issues.
  • Memory Access Optimization: Use row-major loops for memory access because iteration first over rows and then columns exploits data localization similarly done in C++. Enhancing memory continuity localization reduces cache missing that quickens the process.
  • Conclusion:

In summary, creating code that is optimized for cache usage is a crucial concept in improving performance, particularly when dealing with large data structures such as 2D arrays in C++. Cache-friendly programming focuses on maximizing cache hits while ensuring efficient and timely data retrieval through the utilization of both temporal and spatial locality in memory access. This approach can significantly boost cache effectiveness by leveraging row-based memory organization to facilitate sequential access. By structuring code to access elements in a way that accelerates cache line population, it minimizes the need for frequent trips to slower main memory. Conversely, cache-unfriendly practices like column-major access can lead to suboptimal performance due to increased cache misses and irregular memory accesses.

Input Required

This code uses input(). Please provide values below:

Logic Practice
Install Logic Practice
Add to home screen for a faster app-like experience