In this article, we will discuss cache-friendly code in C++, how it works, and several examples.
What is cache-friendly code in C++?
Code that organizes its memory access patterns to make the most of the CPU cache, a small, fast memory intended to hold frequently requested data, is referred to as "cache-friendly code". With proper cache usage, programs run much faster, because retrieving data from the cache is far quicker than accessing it from RAM (main memory). Cache-friendly C++ code places related or frequently accessed data near each other in memory so that cache misses, which occur when needed data is absent from the cache, are kept to a minimum.
Why cache friendliness matters:
On contemporary systems, main memory operates significantly more slowly than the CPU. Performance bottlenecks occur when the CPU must wait for data to load from RAM. Program performance can be greatly increased by organizing code and data structures to use the cache as efficiently as possible.
Example 1:
Let us take an example to illustrate the Cache-friendly code in C++.
#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int size = 10000;

    // Dynamically allocate the 2D array using a vector of vectors
    std::vector<std::vector<int>> matrix(size, std::vector<int>(size, 0));

    // Cache-friendly: accessing by row
    auto start = std::chrono::high_resolution_clock::now();
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            matrix[row][col] += 1; // Row-major access (contiguous in memory)
        }
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> diff = end - start;
    std::cout << "Cache-friendly row-major time: " << diff.count() << " seconds\n";
    return 0;
}
Output:
Cache-friendly row-major time: 0.484802 seconds
Explanation:
In this example, the outer loop iterates over rows and the inner loop over columns. The result is contiguous, row-major memory access, which maximizes cache usage.
Example 2:
Let us take another example, this time illustrating cache-unfriendly code in C++.
#include <iostream>
#include <chrono>
#include <vector>

int main() {
    const int size = 10000;

    // Dynamically allocate the 2D array using a vector of vectors
    std::vector<std::vector<int>> matrix(size, std::vector<int>(size, 0));

    // Cache-unfriendly: accessing by column
    auto start = std::chrono::high_resolution_clock::now();
    for (int col = 0; col < size; col++) {
        for (int row = 0; row < size; row++) {
            matrix[row][col] += 1; // Column-major access (non-contiguous in memory)
        }
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> diff = end - start;
    std::cout << "Cache-unfriendly column-major time: " << diff.count() << " seconds\n";
    return 0;
}
Output:
Cache-unfriendly column-major time: 0.96013 seconds
Explanation:
This example is less cache-friendly because the 2D array is accessed column by column, a non-contiguous memory access pattern. Since 2D arrays in C++ are stored in row-major order, accessing elements in column-major order frequently leads to cache misses.
Key points:
- Allocating the matrix: Because of its size, the matrix is best allocated dynamically using std::vector to avoid stack overflow issues.
- Memory access order: Iterate over rows in the outer loop and columns in the inner loop so that the access pattern matches C++'s row-major storage. Preserving this memory contiguity reduces cache misses and speeds up the loop.
Conclusion:
In conclusion, cache-friendly code is an important concept in performance optimization, especially with large data structures such as 2D arrays in C++. Cache-friendly programming exploits both temporal and spatial locality to maximize cache hits. Accessing a row-major array row by row lets the CPU make full use of each cache line it fetches, so it does not have to return to slower main memory as often. Cache-unfriendly patterns such as column-major access, by contrast, cause frequent cache misses and scattered memory accesses, which degrade performance.