Pearson Correlation Coefficient In C++

The C++ programs in this article take two equally sized sets of numeric values, representing the variables X and Y, and calculate the Pearson correlation coefficient between them.

The Pearson correlation coefficient, denoted by the symbol r, measures the strength and direction of the linear relationship between two variables. Its value always lies between -1 and 1, inclusive.

✅ Perfect positive linear relationships are shown by r = 1,
✅ Perfect negative linear relationships are shown by r = −1, and
✅ No linear relationships are shown by r = 0.

Steps involved in implementing the Pearson correlation coefficient in C++:

✅ Find the means of the two variables.
✅ Compute the covariance between the two variables.
✅ Calculate the standard deviation of each variable.
✅ Divide the covariance by the product of the two standard deviations: r = cov(X, Y) / (σX × σY).
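Writing out the covariance and the standard deviations with the population divisor n (as the programs below do) shows that the 1/n factors cancel, so r can be computed directly from sums of deviations from the means. The same cancellation happens with the sample divisor n − 1, provided it is used consistently in all three quantities.

```latex
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}
  = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}
```

For the data in Example 1 (x̄ = 3, ȳ = 6), the sum of the deviation products is 20 and the sums of squared deviations are 10 and 40, so r = 20 / √(10 × 40) = 20 / 20 = 1.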

Example 1:

Let us take an example to illustrate the Pearson Correlation Coefficient in C++.

#include <iostream>
#include <vector>
#include <cmath>

// Arithmetic mean of the values in data (data must not be empty)
double calculateMean(const std::vector<double>& data) {
    double sum = 0.0;
    for (double value : data) {
        sum += value;
    }
    return sum / data.size();
}

// Population covariance: the average product of the deviations from the means
double calculateCovariance(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size()) {
        std::cerr << "Error: Size mismatch between x and y vectors\n";
        // Return NaN so that a size mismatch cannot be mistaken for r = 0
        return std::nan("");
    }
    
    double xMean = calculateMean(x);
    double yMean = calculateMean(y);
    
    double covariance = 0.0;
    for (size_t i = 0; i < x.size(); ++i) {
        covariance += (x[i] - xMean) * (y[i] - yMean);
    }
    return covariance / x.size();
}

// Population standard deviation (divides by n, matching the covariance above)
double calculateStandardDeviation(const std::vector<double>& data, double mean) {
    double variance = 0.0;
    for (double value : data) {
        variance += std::pow(value - mean, 2);
    }
    return std::sqrt(variance / data.size());
}

// r = cov(X, Y) / (stdDev(X) * stdDev(Y)); the result is undefined
// if either variable has a standard deviation of zero
double calculatePearsonCorrelation(const std::vector<double>& x, const std::vector<double>& y) {
    double covariance = calculateCovariance(x, y);
    double xStdDev = calculateStandardDeviation(x, calculateMean(x));
    double yStdDev = calculateStandardDeviation(y, calculateMean(y));
    
    return covariance / (xStdDev * yStdDev);
}

int main() {
    std::vector<double> x = {1, 2, 3, 4, 5};
    std::vector<double> y = {2, 4, 6, 8, 10};
    
    double correlation = calculatePearsonCorrelation(x, y);
    std::cout << "Pearson Correlation Coefficient: " << correlation << std::endl;
    
    return 0;
}

Output:

Pearson Correlation Coefficient: 1

Example 2:

Let us take another example to illustrate the Pearson Correlation Coefficient in C++.

#include <iostream>
#include <cmath>

// Function to calculate the mean of an array of n values
double calculateMean(const double *arr, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += arr[i];
    }
    return sum / n;
}

// Function to calculate the (population) covariance of two arrays
double calculateCovariance(const double *arr1, const double *arr2, int n) {
    double mean1 = calculateMean(arr1, n);
    double mean2 = calculateMean(arr2, n);
    
    double covariance = 0.0;
    for (int i = 0; i < n; ++i) {
        covariance += (arr1[i] - mean1) * (arr2[i] - mean2);
    }
    return covariance / n;
}

// Function to calculate the (population) standard deviation of an array
double calculateStandardDeviation(const double *arr, int n, double mean) {
    double variance = 0.0;
    for (int i = 0; i < n; ++i) {
        variance += std::pow(arr[i] - mean, 2);
    }
    return std::sqrt(variance / n);
}

// Function to calculate the Pearson correlation coefficient:
// r = covariance / (stdDev1 * stdDev2), undefined if either standard deviation is 0
double calculatePearsonCorrelation(const double *arr1, const double *arr2, int n) {
    double covariance = calculateCovariance(arr1, arr2, n);
    double mean1 = calculateMean(arr1, n);
    double mean2 = calculateMean(arr2, n);
    
    double stdDev1 = calculateStandardDeviation(arr1, n, mean1);
    double stdDev2 = calculateStandardDeviation(arr2, n, mean2);
    
    return covariance / (stdDev1 * stdDev2);
}

int main() {
    int n;
    std::cout << "Enter the number of elements in the arrays: ";
    // Reject non-numeric or non-positive sizes before allocating
    if (!(std::cin >> n) || n <= 0) {
        std::cerr << "Error: the number of elements must be a positive integer\n";
        return 1;
    }

    // Allocate both arrays on the heap (released with delete[] below)
    double *x = new double[n];
    double *y = new double[n];

    std::cout << "Enter the elements of the first array:\n";
    for (int i = 0; i < n; ++i) {
        std::cin >> x[i];
    }

    std::cout << "Enter the elements of the second array:\n";
    for (int i = 0; i < n; ++i) {
        std::cin >> y[i];
    }

    double correlation = calculatePearsonCorrelation(x, y, n);

    std::cout << "Pearson Correlation Coefficient: " << correlation << std::endl;

    // Free dynamically allocated memory
    delete[] x;
    delete[] y;

    return 0;
}

Output:

Enter the number of elements in the arrays: 5
Enter the elements of the first array:
1 2 5 6 8
Enter the elements of the second array:
12 232 45 61 76 
Pearson Correlation Coefficient: -0.202537
===============================================================
Enter the number of elements in the arrays: 5
Enter the elements of the first array:
1 2 5 6 8
Enter the elements of the second array:
12 23 45 61 76
Pearson Correlation Coefficient: 0.995226

Conclusion:

In conclusion, the Pearson correlation coefficient quantifies both the strength and the direction of the linear relationship between two variables. As the C++ code above shows, the calculation involves finding the mean of each variable, the covariance of the two variables, and their standard deviations, and then dividing the covariance by the product of the standard deviations. The coefficient is only meaningful under certain assumptions: the relationship between the variables should be linear, the data should be numerical, and there should be no strong outliers. The programs perform only minimal error checking; in particular, if either variable has a standard deviation of zero (all of its values are identical), the denominator is zero and the coefficient is undefined. Despite these limitations, the code gives a basic understanding of how correlations can be computed in C++. Adding more robust error checking and input validation would make it more dependable in practical use.
