The Standard Template Library (STL) in Contemporary C++ provides a wide range of algorithms that operate on fundamental sequences like vectors, arrays, and lists. These algorithms interact with various iterators and are structured as template functions. Designed with the principles of generic programming, these algorithms heavily rely on iterators to hide the complexities of the data structures beneath.
In the realm of STL algorithms, the execution policy defines the manner in which the algorithm can operate concurrently. This feature was introduced in C++17 to empower developers to leverage parallelism in certain algorithms when processing large datasets. The three execution policies are as follows:
1. Sequential execution:
This guideline, std::execution::seq (Sequential Execution), guarantees this. If no specific policy is given, the default execution policy is applied. This policy ensures that the algorithm will be executed in a single thread and that the order of execution will correspond to the sequence of the input iterators.
Syntax:
It has the following syntax:
std::sort(std::execution::seq, begin(myVector), end(myVector));
Example:
Let's consider an example to demonstrate the step-by-step execution in C++:
#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>
int main()
{
std::vector<int> v = { 7, 9, 8, 1, 1 };
std::sort(std::execution::seq, v.begin(), v.end());
for (auto i : v)
std::cout << i << " ";
return 0;
}
Output:
2. Standard::Execution::Par (Parallel Execution):
This guideline allows for concurrent algorithm processing. The decision to employ threads, vectorization, or any alternative parallelization method is at the discretion of the implementation; the C++ Standard Library does not dictate the approach to achieving parallelism.
Syntax:
It has the following syntax:
std::sort(std::execution::par, begin(myVector), end(myVector));
Example:
Let's consider an example to demonstrate concurrent execution in C++:
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>
int main()
{
std::vector<int> v1 = { 1, 2, 3, 4, 5 };
std::vector<int> v2(5);
std::transform(std::execution::par, v1.begin(),
v1.end(), v2.begin(),
[](int x) { return x * x; });
for (int i : v2) {
std::cout << i << " ";
}
return 0;
}
- Parallel Unsequenced Execution, also known as std::execution::par_unseq, functions as a policy similar to std::execution::par, yet it allows for the vectorization of the algorithm. This indicates that besides running concurrently, the algorithm can also undergo vectorization to enhance performance at a per-iteration level.
Syntax:
It has the following syntax:
std::sort(std::execution::par_unseq, begin(myVector), end(myVector));
Example:
Let's consider a scenario to demonstrate concurrent unordered execution in C++:
#include <algorithm>
#include <iostream>
#include <vector>
#include <execution>
int main()
{
std::vector<int> v = { 1, 2, 3, 4, 5 };
std::for_each(std::execution::par_unseq, v.begin(),v.end(),
[](int x) { std::cout << x << " "; });
return 0;
}
Output:
It is important to bear in mind that not every algorithm is capable of leveraging parallel execution with the utilization of these execution policies. The degree of performance enhancement will differ based on the inherent characteristics of the algorithm, the data volume, and the hardware structure. Prior to opting for a parallel execution policy, it is advisable to conduct profiling and measurement of performance, as synchronization and other elements could potentially introduce overhead resulting from parallel execution.
Example:
Here is a basic illustration showcasing the utilization of std::execution::par with std::for_each:
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>
int main() {
std::vector<int> numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
std::for_each(std::execution::par, numbers.begin(), numbers.end(), [](int& num) {
num *= 2;
});
for (const auto& num : numbers) {
std::cout << num << " ";
}
return 0;
}
Output:
Explanation:
In this instance, the lambda expression is employed to process each element of the vector concurrently, effectively doubling its value.
Performance Comparison Between Execution Policies:
A basic C++ program can be employed to analyze the performance discrepancies among the execution policies, as illustrated below:
// C++ Program to evaluate the performance of the four
// execution policies
#include <chrono>
#include <execution>
#include <iostream>
#include <vector>
// Function to calculate the execution time of different
// execution policies
void execTime(auto policy_type, std::vector<int>& num,
std::string pType_name)
{
auto start_time= std::chrono::high_resolution_clock::now();
long long sum = 0;
// finding sum of each element in the vector
std::for_each(policy_type, num.begin(), num.end(),
[&](int n) { sum += n; });
auto end_time= std::chrono::high_resolution_clock::now();
auto taken_time = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count();
// printing execution time
std::cout << pType_name
<< " execution time: " << taken_time<< "ms\n";
}
int main()
{
// Creating large vector of int
int size = 9999999;
std::vector<int> num(size);
// initializing vector
for (int i = 0; i < size; i++) {
num[i] = i;
}
// execution time
execTime(std::execution::seq, num, "Sequenced");
execTime(std::execution::unseq, num, "Unsequenced");
execTime(std::execution::par, num, "Parallel");
execTime(std::execution::par_unseq, num,"Parallel Unsequenced");
return 0;
}
Output:
Explanation:
The unsequencedpolicy is known for its rapid execution, particularly in cases involving vectorization. Following this, the parallelpolicy is implemented, with the parallelunsequencedpolicy following closely behind. Your execution strategy adhered to the intended sequence.
It is crucial to remember that not all algorithms are compatible with every execution policy. Depending on the chosen execution policy, certain algorithms may exhibit varying performance. Selecting the most suitable execution approach that aligns with the task's demands and the available hardware is paramount. Additionally, experimenting with different strategies is essential to identify the optimal one for a specific task.
Benefits of Execution Policy of STL Algorithms in Modern C++
Developers can now manage the parallel execution of specific algorithms due to the addition of execution policies in STL algorithms in C++17. This feature has several advantages:
- Performance Gains from Parallelism: The main advantage is using parallelism to boost multi-core processor performance. Large datasets can be processed more quickly by parallelizing some algorithms to handle data concurrently.
- Enhanced Responsiveness: Applications handling big datasets may benefit from parallel execution, which can enhance responsiveness. Algorithms can execute and respond faster by splitting the workload among several threads or cores.
- Scalability: By utilizing available hardware resources, execution policies make applications more scalable. Achieving optimal performance depends more on parallel execution as processors' core counts rise.
- Flexibility and Portability: Depending on their application's parameters and their algorithms' properties, developers can select the best execution strategy. Because of its adaptability, optimization can be done to meet the unique requirements of various scenarios.
- Fine-grained Control: The amount of parallelism can be managed finely using execution policies. Depending on the algorithm and its requirements, developers can select between sequential execution (std::execution::seq) , parallel execution (std::execution::par) , and parallel unsequenced execution (std::execution::par_unseq) .
- Compatibility with Current Code: Sequential execution by default functions flawlessly for existing code that uses STL algorithms without defining an execution policy. As a result, backward compatibility is guaranteed, and developers can progressively implement parallelism as needed.
- Performance Profiling: The ability to switch between execution policies makes performance profiling and tuning easier. When developers can test various policies and gauge their effect on performance, it becomes simpler to locate bottlenecks and optimize crucial code segments.
- Concise and Readable Code: Using execution policies in STL algorithms enables developers to express parallelism concisely and readably despite the underlying complexity of parallel execution. The algorithm calls are still known, and behaviour is controlled by the execution policy acting as a modifier.
It is crucial to bear in mind that parallelization does not uniformly improve the performance of all algorithms, and the effectiveness of parallel execution is influenced by factors such as the algorithm's characteristics, scale, and hardware design. It is recommended to leverage performance profiling and thoughtful analysis to identify the optimal execution policies for a specific scenario.