mbrtoc32 in C/C++

In this article, you will learn about the mbrtoc32 function in C++ with its syntax, parameters, and examples.

The mbrtoc32 function, declared in <uchar.h> (<cuchar> in C++), converts a multibyte character sequence into a single 32-bit character of type char32_t. It is especially helpful when working with encodings such as UTF-8, in which one character may occupy several bytes.

Syntax:

It has the following syntax:


size_t mbrtoc32(char32_t *pc32, const char *s, size_t n, mbstate_t *ps);

Parameters

  • pc32: A pointer to the char32_t object where the converted character is stored.
  • s: A pointer to the multibyte character sequence to be converted.
  • n: The maximum number of bytes of s to examine for the conversion.
  • ps: A pointer to an mbstate_t object that holds the conversion state. The state carries information between calls when a multibyte sequence spans several calls to mbrtoc32. If ps is a null pointer, an internal state object is used.
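
Return Value

  • 0 if the converted character is the null character, for example when 's' points to a null byte.
  • static_cast<size_t>(-1) if the bytes do not form a valid multibyte character; errno is set to EILSEQ.
  • static_cast<size_t>(-2) if the next n bytes form a valid but incomplete multibyte character; nothing is stored through pc32.
  • static_cast<size_t>(-3) if the stored character comes from input consumed by an earlier call, so no bytes were consumed this time.
  • Otherwise, the number of bytes consumed from the input multibyte sequence (between 1 and n).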

Key Points of mbrtoc32:

The main points to keep in mind when using mbrtoc32 are as follows:

  1. Locale Dependency

The behavior of mbrtoc32 depends on the current locale, specifically its LC_CTYPE category. Different locales can use different character encodings, which changes how the input bytes are interpreted.

  2. Error Handling

mbrtoc32 detects invalid multibyte sequences. When it encounters one, it returns static_cast<size_t>(-1), signalling a conversion error.

  3. Stateful Conversion

Stateful conversion is achieved through the mbstate_t argument. Each call updates the state, so a multibyte character that is split across several calls can be completed by passing the same state object again.

  4. UTF-8 and Unicode Support

When the locale's encoding is UTF-8, mbrtoc32 translates UTF-8 multibyte sequences into the corresponding Unicode code points, represented as char32_t. This covers the full Unicode range, including characters outside the Basic Multilingual Plane.

  5. Multibyte Character Handling

A single character may be represented by several bytes. mbrtoc32 converts one such character per call and reports how many bytes it consumed, so the caller can advance through the input correctly.

Example:

Let us take an example to demonstrate the mbrtoc32 function in C++:


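#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <uchar.h>
#include <wchar.h>
using namespace std;
int main(void)
{
 char32_t hold;
 char str[] = "";
 mbstate_t arr{};
 size_t len; // mbrtoc32 returns an unsigned size_t
 // convert the first multibyte character of str
 len = mbrtoc32(&hold, str, MB_CUR_MAX, &arr);
 // failures are reported as (size_t)-1 or (size_t)-2, never as a negative value
 if (len == static_cast<size_t>(-1) || len == static_cast<size_t>(-2))
 {
  perror("conversion failed");
  exit(EXIT_FAILURE);
 }
 cout << "The String is: " << str << endl;
 cout << "The Length is: " << len << endl;
 // print the converted character's value in hexadecimal
 printf("32-bit character = 0x%08lx\n", static_cast<unsigned long>(hold));
 return 0;
}

Output:

The String is: 
The Length is: 0
32-bit character = 0x00000000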

Explanation:

  1. Header Files

The required header files, including <cstdio>, <cstdlib>, <iostream>, <uchar.h>, and <wchar.h>, are included in the code.

  2. Namespace

Standard C++ identifiers can be used without the std:: prefix by bringing the full std namespace into scope with the using namespace std; statement.

  3. Variable Declarations
  • char32_t hold;: It declares a variable hold of type char32_t to store the converted character.
  • char str[] = "";: It declares a character array str with an empty string as its initial value.
  • mbstate_t arr{};: It declares the mbstate_t variable arr and initializes it to the initial conversion state with {}.
  4. Function Call - mbrtoc32
  • len = mbrtoc32(&hold, str, MB_CUR_MAX, &arr);: It calls the mbrtoc32 function to convert the first multibyte character in str to a char32_t value. MB_CUR_MAX specifies the maximum number of bytes in a multibyte character in the current locale. The result is stored in hold, and the return value (normally the number of bytes consumed) is stored in len.
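  5. Error checking

It checks whether the conversion failed. Note that mbrtoc32 returns an unsigned size_t, so failures are reported as static_cast<size_t>(-1) or static_cast<size_t>(-2) rather than as a negative number. On failure, the program prints an error message with perror and exits with an error code.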

  6. Output

std::cout is used to print the original string str and the number of bytes consumed (len).

printf is used to print the numeric value of the converted 32-bit character.

  7. Statement of Return

main returns 0 (implicitly, if no return statement is written), which denotes that the program has executed successfully.

Because the supplied str is an empty string, mbrtoc32 reads only the terminating null byte: it returns 0 and stores the null character (U+0000) in hold, so the reported length is 0. To see an actual conversion, supply a non-empty multibyte sequence.
