In this article, C++ programs compute the Pearson correlation coefficient from two sets of floating-point values, one for variable X and one for variable Y. In the second example, the user supplies these values.
The Pearson correlation coefficient, denoted by r, measures the strength and direction of the linear relationship between two variables. Its value always lies between -1 and 1:
- r = 1 indicates a perfect positive linear relationship,
- r = -1 indicates a perfect negative linear relationship, and
- r = 0 indicates no linear relationship.
Steps involved in implementing the Pearson correlation coefficient in C++:
- Compute the mean of each variable.
- Compute the covariance of the two variables, which measures how they vary together.
- Compute the standard deviation of each variable.
- Divide the covariance by the product of the two standard deviations: r = cov(X, Y) / (σX × σY).
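Written out in full, using the population (divide-by-n) definitions that both programs below use, the steps are:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
$$

$$
\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
$$

$$
\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \sigma_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The factors of 1/n cancel in the final ratio. Using the sample (divide-by-(n - 1)) versions of both the covariance and the standard deviations therefore gives the same r.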
Example 1:
Let's consider a scenario to demonstrate the Pearson Correlation Coefficient in C++.
#include <iostream>
#include <vector>
#include <cmath>
#include <cstddef>

// Arithmetic mean of the values in data
double calculateMean(const std::vector<double>& data) {
    double sum = 0.0;
    for (double value : data) {
        sum += value;
    }
    return sum / data.size();
}

// Population covariance: average product of the deviations from the means
double calculateCovariance(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size()) {
        std::cerr << "Error: Size mismatch between x and y vectors\n";
        return 0.0;
    }
    double xMean = calculateMean(x);
    double yMean = calculateMean(y);
    double covariance = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        covariance += (x[i] - xMean) * (y[i] - yMean);
    }
    return covariance / x.size();
}

// Population standard deviation of data around a precomputed mean
double calculateStandardDeviation(const std::vector<double>& data, double mean) {
    double variance = 0.0;
    for (double value : data) {
        variance += std::pow(value - mean, 2);
    }
    return std::sqrt(variance / data.size());
}

// r = cov(X, Y) / (sigma_X * sigma_Y)
double calculatePearsonCorrelation(const std::vector<double>& x, const std::vector<double>& y) {
    double covariance = calculateCovariance(x, y);
    double xStdDev = calculateStandardDeviation(x, calculateMean(x));
    double yStdDev = calculateStandardDeviation(y, calculateMean(y));
    return covariance / (xStdDev * yStdDev);
}

int main() {
    // y = 2x, so the relationship is perfectly linear and positive
    std::vector<double> x = {1, 2, 3, 4, 5};
    std::vector<double> y = {2, 4, 6, 8, 10};
    double correlation = calculatePearsonCorrelation(x, y);
    std::cout << "Pearson Correlation Coefficient: " << correlation << std::endl;
    return 0;
}
Output:
Pearson Correlation Coefficient: 1
Example 2:
Let's consider another instance to demonstrate the Pearson Correlation Coefficient in C++.
#include <iostream>
#include <cmath>

// Function to calculate the mean of an array
double calculateMean(const double *arr, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += *(arr + i);
    }
    return sum / n;
}

// Function to calculate the covariance of two arrays
double calculateCovariance(const double *arr1, const double *arr2, int n) {
    double mean1 = calculateMean(arr1, n);
    double mean2 = calculateMean(arr2, n);
    double covariance = 0.0;
    for (int i = 0; i < n; ++i) {
        covariance += (*(arr1 + i) - mean1) * (*(arr2 + i) - mean2);
    }
    return covariance / n;
}

// Function to calculate the standard deviation of an array
double calculateStandardDeviation(const double *arr, int n, double mean) {
    double variance = 0.0;
    for (int i = 0; i < n; ++i) {
        variance += std::pow(*(arr + i) - mean, 2);
    }
    return std::sqrt(variance / n);
}

// Function to calculate the Pearson correlation coefficient
double calculatePearsonCorrelation(const double *arr1, const double *arr2, int n) {
    double covariance = calculateCovariance(arr1, arr2, n);
    double mean1 = calculateMean(arr1, n);
    double mean2 = calculateMean(arr2, n);
    double stdDev1 = calculateStandardDeviation(arr1, n, mean1);
    double stdDev2 = calculateStandardDeviation(arr2, n, mean2);
    return covariance / (stdDev1 * stdDev2);
}

int main() {
    int n;
    std::cout << "Enter the number of elements in the arrays: ";
    std::cin >> n;
    double *x = new double[n];
    double *y = new double[n];
    std::cout << "Enter the elements of the first array:\n";
    for (int i = 0; i < n; ++i) {
        std::cin >> x[i];
    }
    std::cout << "Enter the elements of the second array:\n";
    for (int i = 0; i < n; ++i) {
        std::cin >> y[i];
    }
    double correlation = calculatePearsonCorrelation(x, y, n);
    std::cout << "Pearson Correlation Coefficient: " << correlation << std::endl;
    // Free dynamically allocated memory
    delete[] x;
    delete[] y;
    return 0;
}
Output:
Enter the number of elements in the arrays: 5
Enter the elements of the first array:
1 2 5 6 8
Enter the elements of the second array:
12 232 45 61 76
Pearson Correlation Coefficient: -0.202537
===============================================================
Enter the number of elements in the arrays: 5
Enter the elements of the first array:
1 2 5 6 8
Enter the elements of the second array:
12 23 45 61 76
Pearson Correlation Coefficient: 0.995226
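The second run can be checked by hand with the same definitions. Because the 1/n factors cancel, only sums of deviations are needed:

$$
\bar{x} = \frac{1 + 2 + 5 + 6 + 8}{5} = 4.4, \qquad \bar{y} = \frac{12 + 23 + 45 + 61 + 76}{5} = 43.4
$$

$$
\sum_{i=1}^{5} (x_i - \bar{x})(y_i - \bar{y}) = 302.2, \qquad \sum_{i=1}^{5} (x_i - \bar{x})^2 = 33.2, \qquad \sum_{i=1}^{5} (y_i - \bar{y})^2 = 2777.2
$$

$$
r = \frac{302.2}{\sqrt{33.2 \times 2777.2}} = \frac{302.2}{\sqrt{92203.04}} \approx \frac{302.2}{303.65} \approx 0.9952
$$

Comparing the two runs also shows how sensitive r is to a single outlier. Changing one value of Y from 23 to 232 moves r from about 0.995 to about -0.203.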
Conclusion:
In summary, the Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. As the C++ programs above show, computing it involves finding the means of the two variables, their covariance, and their standard deviations. The coefficient is only meaningful under certain assumptions, such as a linear relationship between the variables and the absence of outliers in the data. Apart from the size check in Example 1, the code does little error handling. It does not guard against division by zero, which occurs when no data is supplied or when one variable is constant. It also does not guard against invalid input sizes, such as a non-positive element count in Example 2, which can cause runtime failures. Despite these limitations, the code gives a basic introduction to calculating correlations in C++. A possible improvement would be to add more robust error checking and input validation to make it reliable in practical applications.