In this guide, we will explore the implementation of a neural network in C++ along with a practical illustration.

What is a Neural Network?

A neural network is a computational model loosely inspired by the structure and behavior of neurons in the brain. Like a biological nervous system, it receives input data, processes it, and passes the result on.

Layers of Neural Network

A neural network is comprised of three distinct types of layers: the input layer, the hidden layer, and the output layer.

Input layer:

A model has a single input layer. It receives the features of each sample in the dataset and passes them on to the hidden layers of the network.

Hidden layer:

A network can have one or more hidden layers. Each hidden layer computes weighted sums of the values it receives and applies an activation function to them. In a network with three hidden layers, for example, the inputs go to the first hidden layer, its outputs (activations) go to the second, the second layer's outputs go to the third, and the third layer's outputs go to the output layer. The weights themselves are not passed between layers; they stay with each layer and are adjusted only during training.

Output layer:

Once the processing is complete, the information is accessible at the output layer.
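
The flow of data through one layer can be sketched as follows. This is a minimal illustration, not part of the program later in this guide: the function name denseSigmoidLayer and the use of bias terms are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one fully connected layer with a sigmoid activation.
// weights[j][k] connects input k to neuron j; biases[j] is the bias of neuron j.
std::vector<double> denseSigmoidLayer(const std::vector<double>& input,
                                      const std::vector<std::vector<double>>& weights,
                                      const std::vector<double>& biases) {
    std::vector<double> activations(weights.size());
    for (std::size_t j = 0; j < weights.size(); ++j) {
        double weightedSum = biases[j];
        for (std::size_t k = 0; k < input.size(); ++k) {
            weightedSum += weights[j][k] * input[k];
        }
        // The activation, not the weight, is what the next layer receives.
        activations[j] = 1.0 / (1.0 + std::exp(-weightedSum));
    }
    return activations;
}
```

Stacking layers then amounts to feeding the vector returned by one call into the next call as its input.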

Basics of the Neural Networks:

Neurons and layers: Each neuron functions similarly to a biological neuron. There are input, hidden, and output layers in neural networks.

Activation functions: These mathematical functions introduce non-linearity into the model, which enables it to learn intricate patterns in the data. Common examples are the sigmoid function and ReLU. ReLU and its variants, such as Leaky ReLU and ELU, are frequently preferred for hidden layers owing to their simplicity and effectiveness across a wide range of scenarios. Sigmoid and softmax are typically used in the output layer, chosen according to the problem: sigmoid for binary classification and softmax for multi-class classification.
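
As a rough sketch, the activation functions named above could be written in C++ like this. The function names and the default Leaky ReLU slope of 0.01 are choices made for this example only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Squashes any real number into the interval (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Passes positive values through and clips negative values to zero.
double relu(double x) { return std::max(0.0, x); }

// Like ReLU, but keeps a small slope for negative inputs (slope is an assumed default).
double leakyRelu(double x, double slope = 0.01) { return x > 0.0 ? x : slope * x; }

// Turns a vector of scores into probabilities that sum to 1.
// Subtracting the maximum score first avoids overflow in exp().
std::vector<double> softmax(const std::vector<double>& logits) {
    std::vector<double> result(logits.size());
    if (logits.empty()) return result;
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        result[i] = std::exp(logits[i] - maxLogit);
        sum += result[i];
    }
    for (double& value : result) value /= sum;
    return result;
}
```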

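Forward propagation and backpropagation: During forward propagation, the input is passed through the network layer by layer, using the current weights, to produce a prediction; the weights are not changed in this step. During backpropagation, the prediction error is propagated backwards through the network to compute the gradient of the loss with respect to each weight, and the weights are then adjusted in the direction that reduces the error.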
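
To make the two phases concrete, here is a minimal sketch for the simplest possible case: a single linear neuron with one weight and no bias. The function names and the squared-error loss are assumptions chosen for the illustration.

```cpp
// Forward pass for a single linear neuron without bias: prediction = weight * input.
double forward(double weight, double input) { return weight * input; }

// One gradient descent step on the loss L = 0.5 * (prediction - target)^2.
// The gradient is dL/dweight = (prediction - target) * input.
double gradientDescentStep(double weight, double input, double target, double learningRate) {
    const double prediction = forward(weight, input);       // forward propagation
    const double gradient = (prediction - target) * input;  // backward step
    return weight - learningRate * gradient;                // weight update
}
```

Repeating the step many times with a small learning rate moves the weight toward the value that makes the prediction match the target.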

Loss functions: Also called cost functions or objective functions, these measure the disparity between predicted values and true values. Common examples include Mean Squared Error, Binary Cross-Entropy Loss, Categorical Cross-Entropy Loss, and Hinge Loss. Choosing a loss function that matches the task matters because it directly shapes what the model learns: Mean Squared Error is typical for regression, while the cross-entropy losses are typical for classification.
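
As one example of a classification loss, binary cross-entropy might be sketched as below. The function name and the clamping constant epsilon are assumptions for this example; the sketch also assumes both vectors are non-empty and of equal length.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Binary cross-entropy averaged over all samples.
// predicted[i] is a probability in [0, 1]; labels[i] is 0.0 or 1.0.
double binaryCrossEntropy(const std::vector<double>& predicted,
                          const std::vector<double>& labels) {
    const double epsilon = 1e-12;  // keeps log() away from log(0)
    double total = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double p = std::min(std::max(predicted[i], epsilon), 1.0 - epsilon);
        total += -(labels[i] * std::log(p) + (1.0 - labels[i]) * std::log(1.0 - p));
    }
    return total / static_cast<double>(predicted.size());
}
```

A confident wrong prediction is penalized much more heavily than a confident correct one, which is what makes this loss suitable for probability outputs.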

Optimizers: Optimizers determine how the weights are updated from the gradients. Examples include gradient descent, stochastic gradient descent, Adagrad, RMSprop, and Adam. The choice of optimizer affects how quickly and how reliably training converges. Adam is currently one of the most widely used defaults, although no single optimizer performs best on every problem.

Weight initialization: It is essential to initialize weights appropriately to avoid issues like vanishing or exploding gradients. Two common methods are Xavier initialization (also known as Glorot initialization) and He initialization.
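
A minimal sketch of Xavier (Glorot) uniform initialization follows. It draws each weight uniformly from [-limit, limit) with limit = sqrt(6 / (fanIn + fanOut)). The function name xavierUniform and the explicit seed parameter are assumptions for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Returns a fanOut x fanIn weight matrix filled with Xavier/Glorot uniform values.
// fanIn: number of inputs to the layer; fanOut: number of neurons in the layer.
std::vector<std::vector<double>> xavierUniform(std::size_t fanIn, std::size_t fanOut,
                                               unsigned seed) {
    const double limit = std::sqrt(6.0 / static_cast<double>(fanIn + fanOut));
    std::mt19937 generator(seed);
    std::uniform_real_distribution<double> distribution(-limit, limit);
    std::vector<std::vector<double>> weights(fanOut, std::vector<double>(fanIn));
    for (auto& row : weights) {
        for (double& w : row) {
            w = distribution(generator);
        }
    }
    return weights;
}
```

Scaling the range by the layer sizes keeps the spread of the activations roughly constant from layer to layer, which is the motivation behind this scheme.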

Regularization methods: These techniques address overfitting. The two primary forms are L1 regularization, as used in Lasso regression, and L2 regularization, as used in Ridge regression. Other approaches include early stopping (often implemented through training callbacks) and data augmentation, which reduces overfitting by enlarging the effective training set. Underfitting, by contrast, is usually addressed by increasing the model's capacity or training for longer.

Validation and Testing:

Training dataset: The portion of the data used to fit the network, that is, to adjust its weights.

Validation dataset: A distinct subset of the data that is not used to update the weights. Its primary purpose is to assess the model's performance during training, for example to tune hyperparameters or to decide when to stop.

Test dataset: A further held-out portion of the data, used only once training and tuning with the training and validation datasets are finished, to assess the performance of the final model.
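
One possible way to produce the three subsets is to shuffle the sample indices and cut the shuffled list into three parts. In the sketch below, the struct DatasetSplit, the function splitIndices, and the fraction parameters are all hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Index lists for the three subsets (hypothetical helper type).
struct DatasetSplit {
    std::vector<std::size_t> train;
    std::vector<std::size_t> validation;
    std::vector<std::size_t> test;
};

// Shuffles the indices 0..sampleCount-1 and splits them by the given fractions.
// Whatever remains after the training and validation parts becomes the test set.
DatasetSplit splitIndices(std::size_t sampleCount, double trainFraction,
                          double validationFraction, unsigned seed) {
    std::vector<std::size_t> indices(sampleCount);
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::mt19937 generator(seed);
    std::shuffle(indices.begin(), indices.end(), generator);

    const std::size_t trainCount =
        static_cast<std::size_t>(std::llround(sampleCount * trainFraction));
    const std::size_t validationCount =
        static_cast<std::size_t>(std::llround(sampleCount * validationFraction));
    const std::size_t trainEnd = std::min(trainCount, sampleCount);
    const std::size_t validationEnd = std::min(trainEnd + validationCount, sampleCount);

    DatasetSplit split;
    split.train.assign(indices.begin(), indices.begin() + trainEnd);
    split.validation.assign(indices.begin() + trainEnd, indices.begin() + validationEnd);
    split.test.assign(indices.begin() + validationEnd, indices.end());
    return split;
}
```

For example, calling splitIndices(100, 0.7, 0.15, 1) yields 70 training, 15 validation, and 15 test indices with no index shared between subsets.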

Example:

Let's consider a scenario to demonstrate the neural network in C++.

#include <iostream>
#include <vector>
#include <cmath>
#include <cstdlib>
#include <ctime>
using namespace std;
class NeuralNetwork {
public:
    NeuralNetwork(int inputSize, int hiddenSize, int outputSize);
    void train(vector<vector<double>>& inputs, vector<double>& targets, int epochs, double learningRate);
    double predict(vector<double>& input);
private:
    vector<vector<double>> weights_input_hidden; // weights_input_hidden[j][k]: weight from input k to hidden neuron j
    vector<double> weights_hidden_output; // weights_hidden_output[j]: weight from hidden neuron j to the single output neuron
    double sigmoid(double x); // activation function applied to each hidden neuron's weighted sum
    double sigmoidDerivative(double x); // derivative of the sigmoid, given the sigmoid's output; needed for backpropagation
};
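NeuralNetwork::NeuralNetwork(int inputSize, int hiddenSize, int outputSize) {
    srand(static_cast<unsigned>(time(0)));
    // Initialize every weight to a uniform random value in [-1, 1)
    for (int i = 0; i < hiddenSize; ++i) {
        vector<double> inputHiddenWeights;
        for (int j = 0; j < inputSize; ++j) {
            inputHiddenWeights.push_back((rand() % 2000 - 1000) / 1000.0);
        }
        weights_input_hidden.push_back(inputHiddenWeights);
    }
    // One weight per hidden neuron, feeding the single output neuron.
    // train() and predict() index this vector by hidden neuron, so it needs
    // hiddenSize entries; the network produces one output value (outputSize == 1).
    (void)outputSize;
    for (int i = 0; i < hiddenSize; ++i) {
        weights_hidden_output.push_back((rand() % 2000 - 1000) / 1000.0);
    }
}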
double NeuralNetwork::sigmoid(double x) {
    // Sigmoid activation: maps any real number into (0, 1)
    return 1.0 / (1.0 + exp(-x));
}

double NeuralNetwork::sigmoidDerivative(double x) {
    // Here x is the sigmoid's output s = sigmoid(z), not the raw input z,
    // so the derivative s * (1 - s) is computed directly from it
    return x * (1.0 - x);
}
void NeuralNetwork::train(vector<vector<double>>& inputs, vector<double>& targets, int epochs, double learningRate) {
    for (int epoch = 0; epoch < epochs; ++epoch) {
        for (size_t i = 0; i < inputs.size(); ++i) {
            const vector<double>& input = inputs[i];
            double target = targets[i];
            // Forward pass: sigmoid of each hidden neuron's weighted sum
            vector<double> hiddenOutput(weights_input_hidden.size());
            for (size_t j = 0; j < weights_input_hidden.size(); ++j) {
                double weightedSum = 0.0;
                for (size_t k = 0; k < input.size(); ++k) {
                    weightedSum += input[k] * weights_input_hidden[j][k];
                }
                hiddenOutput[j] = sigmoid(weightedSum);
            }
            // Linear output: weighted sum of the hidden outputs
            double output = 0.0;
            for (size_t j = 0; j < weights_hidden_output.size(); ++j) {
                output += hiddenOutput[j] * weights_hidden_output[j];
            }
            // Backpropagation: attribute the output error to each hidden neuron
            // (computed before the hidden-to-output weights are changed)
            double outputError = target - output;
            vector<double> hiddenErrors(hiddenOutput.size());
            for (size_t j = 0; j < hiddenErrors.size(); ++j) {
                hiddenErrors[j] = outputError * weights_hidden_output[j];
            }
            // Update the hidden-to-output weights
            for (size_t j = 0; j < weights_hidden_output.size(); ++j) {
                weights_hidden_output[j] += learningRate * outputError * hiddenOutput[j];
            }
            // Update the input-to-hidden weights
            for (size_t j = 0; j < weights_input_hidden.size(); ++j) {
                for (size_t k = 0; k < input.size(); ++k) {
                    weights_input_hidden[j][k] += learningRate * hiddenErrors[j] * sigmoidDerivative(hiddenOutput[j]) * input[k];
                }
            }
        }
    }
}
double NeuralNetwork::predict(vector<double>& input) {
    vector<double> hiddenOutput(weights_input_hidden.size());
    for (size_t i = 0; i < hiddenOutput.size(); ++i) {
        double weightedSum = 0.0;
        for (size_t j = 0; j < input.size(); ++j) {
            weightedSum += input[j] * weights_input_hidden[i][j];
        }
        hiddenOutput[i] = sigmoid(weightedSum);
    }
    double output = 0.0;
    for (size_t i = 0; i < weights_hidden_output.size(); ++i) {
        output += hiddenOutput[i] * weights_hidden_output[i];
    }
    return output;
}
double calculateMeanSquaredError(vector<double>& predictedOutputs, vector<double>& actualOutputs) {
    if (predictedOutputs.size() != actualOutputs.size()) {
        cerr << "Error: Predicted and actual outputs must have the same size." << endl;
        return -1.0;
    }
    double sumSquaredError = 0.0;
    for (size_t i = 0; i < predictedOutputs.size(); ++i) {
        double error = predictedOutputs[i] - actualOutputs[i];
        sumSquaredError += error * error;
    }

    return sumSquaredError / static_cast<double>(predictedOutputs.size());
}
int main() {
    // Training dataset: 500 points with 2 input features and 1 output value
    vector<vector<double>> inputs;
    vector<double> outputs;
    for (int i = 0; i < 500; ++i) {
        // Generate random input features between 0 and 1
        double input1 = static_cast<double>(rand()) / RAND_MAX;
        double input2 = static_cast<double>(rand()) / RAND_MAX;
        // Output follows a linear relationship: 2 * input1 + 3 * input2 + random noise
        double output = 2 * input1 + 3 * input2 + 0.1 * static_cast<double>(rand()) / RAND_MAX;
        // Store the data point in the dataset
        inputs.push_back({input1, input2});
        outputs.push_back(output);
    }

    // Test dataset: 99 points generated the same way
    vector<vector<double>> testInputs;
    vector<double> testOutputs;
    for (int i = 501; i < 600; ++i) {
        // Generate random input features between 0 and 1
        double testinput1 = static_cast<double>(rand()) / RAND_MAX;
        double testinput2 = static_cast<double>(rand()) / RAND_MAX;

        // Same linear relationship as the training data, plus random noise
        double testoutput = 2 * testinput1 + 3 * testinput2 + 0.1 * static_cast<double>(rand()) / RAND_MAX;
        // Store the data point in the test dataset
        testInputs.push_back({testinput1, testinput2});
        testOutputs.push_back(testoutput);
    }
    // Creating and training the neural network
    NeuralNetwork neuralNetwork(2, 16, 1); // 2 input features, 16 hidden neurons, 1 output neuron
    neuralNetwork.train(inputs, outputs, 500, 0.01);
    vector<double> predictedOutputs;
    for (size_t i = 0; i < testInputs.size(); ++i) {
        double predictedOutput = neuralNetwork.predict(testInputs[i]);
        predictedOutputs.push_back(predictedOutput);
        cout << "Test Data #" << i + 1 << ", Predicted Output: " << predictedOutput << ", Actual Output: " << testOutputs[i] << endl;
    }
    // Calculate Mean Squared Error
    double mse = calculateMeanSquaredError(predictedOutputs, testOutputs);
    cout << "Mean Squared Error (MSE): " << mse << endl;

    // Sample input for prediction
    vector<double> userInput = {100, 100}; // Change these values to your desired input
    // Predict using the trained neural network
    double predictedOutput = neuralNetwork.predict(userInput);
    cout << "Predicted Output: " << predictedOutput << endl;
    return 0;
}

Explanation:

The program is easier to follow when it is broken down function by function, since it combines several techniques that are central to neural networks.

Layout of the program:

NeuralNetwork Class:

Constructor:

Create a new instance of NeuralNetwork with specified input size, hidden layer size, and output size.

Activation Function:

double NeuralNetwork::sigmoid(double x)

Activation Function Derivative:

double NeuralNetwork::sigmoidDerivative(double x)

Training Method:

void NeuralNetwork::train(vector<vector<double>>& inputs, vector<double>& targets, int epochs, double learningRate)

Prediction Method:

double NeuralNetwork::predict(vector<double>& input)

Helper Function:

Mean Squared Error Calculation:

double calculateMeanSquaredError(vector<double>& predictedOutputs, vector<double>& actualOutputs)

Main Function:

int main()

NeuralNetwork Constructor:

The constructor creates the NeuralNetwork object from three parameters: inputSize, hiddenSize, and outputSize. It fills the weight containers with random values in the range [-1, 1), which establishes the layout of the network and readies it for training and prediction.

Sigmoid function:

It is the activation function: it computes the sigmoid of the input value x, mapping any real number into the interval between 0 and 1. The formula for the sigmoid function is:

Sigmoid(x) = 1 / (1 + e^(-x))

Parameter: double x, the input value for which the sigmoid is computed.

Returns: double, the result of applying the sigmoid function to the input.

The primary purpose of this function is to add non-linearity to the network.

It is invoked in the forward pass, both during training and during prediction. Backpropagation uses sigmoidDerivative instead.

sigmoidDerivative function:

The sigmoidDerivative function calculates the derivative of the sigmoid activation function. The derivative of the sigmoid function σ(x) is given by:

σ′(x) = σ(x) × (1 - σ(x))

Note that the code passes in the already-activated hidden output, so the parameter double x of sigmoidDerivative is σ(x) itself and the function simply returns x * (1.0 - x).
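
The derivative formula can be checked numerically. The sketch below compares the closed form against a central finite-difference approximation; the function names sigmoidDerivativeFromOutput and numericalSigmoidDerivative and the step size h are assumptions for this example.

```cpp
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Closed form, expressed in terms of the sigmoid's output s = sigmoid(x).
double sigmoidDerivativeFromOutput(double s) { return s * (1.0 - s); }

// Central finite-difference approximation of the derivative at x.
double numericalSigmoidDerivative(double x, double h = 1e-5) {
    return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h);
}
```

For any x, sigmoidDerivativeFromOutput(sigmoid(x)) and numericalSigmoidDerivative(x) should agree to several decimal places. The derivative is largest at x = 0, where it equals 0.25.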

train function:

Function parameters:

inputs: A 2D vector holding the input data used for training, one row per data point.

targets: A vector that holds the target value associated with each data point.

epochs: The number of passes the training process makes over the whole dataset.

learningRate: The size of the adjustment applied to the weights at every update.

Return type:

It has a void return type: it returns no value and instead alters the internal state (the weights) of the neural network object.

Variables used in the function:

int epoch: The current epoch of the training process.

size_t i: The index of the data point currently used for training.

size_t j: The loop counter for iterating over hidden-layer neurons.

size_t k: The loop counter for iterating over input features.

double weightedSum: A temporary variable that stores the sum of weighted inputs for a neuron.

double outputError: The difference between the target value and the predicted output (target - output).

vector<double> hiddenOutput: The output values produced by the hidden-layer neurons for the current input.

vector<double> hiddenErrors: The errors attributed to the hidden-layer neurons, which backpropagation needs.

Calculations and steps present in this method:

Forward propagation:

For every input data point, the algorithm calculates the weighted sum for each hidden neuron and applies the sigmoid activation to it. The output is then the weighted sum of the hidden outputs, with no activation on the output neuron.

Backpropagation:

The output error (target minus output) is propagated backwards: each hidden neuron's error is the output error multiplied by the weight connecting that neuron to the output.

Weight update:

This step follows the gradient descent principle. The hidden-to-output weights are adjusted using the output error and the hidden outputs, and the input-to-hidden weights are adjusted using the hidden errors, the sigmoid derivative, and the inputs, each scaled by the learning rate.

Iterations:

Training is organized into epochs. In each epoch the model processes the complete dataset once more, updating its weights after every data point.
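
The steps of the train function can be summarized as equations. The notation is introduced here for illustration and does not appear in the code: x_k is input feature k, w_{jk} stands for weights_input_hidden[j][k], v_j for weights_hidden_output[j], h_j for hiddenOutput[j], y for the target, and η for learningRate.

```latex
\begin{aligned}
h_j &= \sigma\Big(\sum_k w_{jk}\, x_k\Big), &
\hat{y} &= \sum_j v_j\, h_j, &
e &= y - \hat{y}, \\
v_j &\leftarrow v_j + \eta\, e\, h_j, &
w_{jk} &\leftarrow w_{jk} + \eta\, e\, v_j\, h_j\,(1 - h_j)\, x_k .
\end{aligned}
```

Both updates are gradient descent steps on the loss e^2 / 2. In the update of w_{jk}, v_j is the value from before its own update, because the code computes hiddenErrors before it changes weights_hidden_output.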

Predict function:

Parameters used in the function:

input: A vector containing the input features for which a prediction is made.

Return type:

It returns a double, the network's prediction for the given input.

Variables present in the function:

The hiddenOutput variable stores the results from the hidden-layer neurons after the sigmoid activation has been applied. The weightedSum and output variables hold the intermediate sum and the final prediction.

Calculation: For every neuron in the hidden layer (the number of neurons equals the size of weights_input_hidden):

Iterate over the input features and compute the weighted sum (weightedSum) by multiplying each input feature by its corresponding hidden-layer weight.

Apply the sigmoid activation function to the weighted sum to obtain the neuron's output.

Store the result of the sigmoid function in the hiddenOutput vector.

Finally, the prediction is the sum of the hidden outputs, each multiplied by its weight in weights_hidden_output.

calculateMeanSquaredError function:

The parameters passed to the function are:

predictedOutputs: A vector of the output values predicted by the neural network.

actualOutputs: A vector of the true values for the same inputs.

Return type:

It returns a double, the mean of the squared differences between the predicted and actual outputs.

Calculations:

The function first verifies that the predictedOutputs and actualOutputs vectors are of equal size. If they differ, an error message is written to the standard error stream (cerr) and the function returns -1.0 to signal the error.

Otherwise it sums the squared differences between the predicted and actual values and divides by the number of values. The resulting mean squared error indicates how close the network's predictions are to the true values; lower is better.

main function:

This function is divided into five parts:

✅ data generation
✅ test data generation
✅ neural network initialization and training
✅ testing the trained neural networks
✅ prediction using user input

Data Generation:

This section fills two vectors: inputs, a two-dimensional vector, and outputs, a one-dimensional vector. Each data point has two input features, drawn at random between 0 and 1, and one output value computed as 2 * input1 + 3 * input2 plus a small amount of random noise. The training set comprises 500 such data points. A separate test set of 99 data points is generated in the same way and stored in testInputs and testOutputs.

Neural Network Initialization and Training:

A neural network with 2 input features, 16 hidden neurons, and 1 output neuron is created. Training is started by calling the train method with four arguments: the training inputs, the training outputs, the number of epochs (500), and the learning rate (0.01).

Testing the Trained Neural Network:

The neural network that has been trained is assessed using the test dataset. Each data point in the test set is utilized to make predictions through the prediction method. Subsequently, both the predicted and actual outputs are displayed for every test data point. The Mean Squared Error (MSE) is then computed by comparing the predicted outputs with the actual outputs in order to assess the effectiveness of the model.

Prediction using user input:

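The last part predicts the output for a sample input that is hard-coded as { 100, 100 }; change these values in the source to try other inputs. The network produces a prediction from the trained weights and prints it. Note that { 100, 100 } lies far outside the range of 0 to 1 used for training, so the prediction for it should not be expected to follow the linear relationship.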

Conclusion:

The structure of the neural network includes an initial layer with 2 input features, followed by a hidden layer containing 16 neurons, and finally an output layer with a single neuron. Training of this model involves a dataset consisting of 500 data points, where each data point is defined by 2 input features and 1 output value. During the training process, the network utilizes backpropagation to update its weights and reduce the difference between the predicted outputs and the true values.

Following the completion of training, the network's effectiveness is assessed using a distinct test set consisting of 99 data points. The evaluation utilizes the Mean Squared Error (MSE) metric to gauge the precision of the forecasts. MSE assesses the neural network's ability to capture the fundamental patterns in the data, where decreased MSE values signify higher accuracy in predictions.

It's crucial to understand that various aspects play a role in determining the efficiency of the neural network. The initial random assignment of weights can influence how well the training algorithm converges, while the caliber and quantity of the dataset have a substantial effect on the model's capacity to generalize to unfamiliar data. Furthermore, the quantity of training epochs and the learning rate are pivotal hyperparameters that affect the network's overall performance. Tweaking these parameters and exploring diverse network structures can boost the precision of predictions.

Neural networks are flexible tools suited to a wide range of practical problems, including image recognition, natural language processing, and financial forecasting. This implementation covers the basics of neural networks and can be extended and tuned to handle more complex tasks and datasets.

Neural Network In C++