In this article, you will learn about the mbrtoc32 function in C++ with its syntax, parameters, and examples.
The mbrtoc32 function, declared in <uchar.h> (<cuchar> in C++), converts a single multibyte character from a narrow character sequence into a 32-bit character of type char32_t. It is especially helpful when working with character encodings like UTF-8, where a single character can occupy several bytes.
Syntax:
It has the following syntax:
size_t mbrtoc32(char32_t *pc32, const char *s, size_t n, mbstate_t *ps);
Parameters
- pc32: A pointer to the char32_t object where the converted character is stored.
- s: A pointer to the multibyte character sequence to be converted.
- n: The maximum number of bytes of s that the function may examine.
- ps: A pointer to an mbstate_t object that holds the conversion state. The state allows a multibyte character that is split across several calls to mbrtoc32 to be converted correctly.
Return Value
- It returns 0 if the converted character is the null character, for example when 's' points to a null byte.
- If 's' points to an invalid multibyte sequence, it returns static_cast<size_t>(-1) and sets errno to EILSEQ.
- If the next n bytes form an incomplete but potentially valid multibyte character, it returns static_cast<size_t>(-2).
- Otherwise, it returns the number of bytes consumed from the input multibyte sequence.
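The different return values can be told apart with a few comparisons. The following is a minimal sketch, not a definitive implementation: describe_result and convert_and_describe are hypothetical helper names introduced only for illustration. Besides the error value static_cast<size_t>(-1), the C standard also defines static_cast<size_t>(-2) for an incomplete sequence and static_cast<size_t>(-3) for a character stored without consuming any input, so the sketch checks for those as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>

// Hypothetical helper: turns the return value of mbrtoc32 into a description.
std::string describe_result(std::size_t rc)
{
    if (rc == static_cast<std::size_t>(-1)) return "invalid sequence";
    if (rc == static_cast<std::size_t>(-2)) return "incomplete sequence";
    if (rc == static_cast<std::size_t>(-3)) return "stored without consuming input";
    if (rc == 0) return "null character";
    return "consumed " + std::to_string(rc) + " byte(s)";
}

// Hypothetical helper: converts the first multibyte character found in the
// first n bytes of s, starting from the initial conversion state.
std::string convert_and_describe(const char* s, std::size_t n)
{
    assert(s != nullptr);
    std::mbstate_t state{};   // initial conversion state
    char32_t c32 = 0;
    return describe_result(std::mbrtoc32(&c32, s, n, &state));
}
```

For example, convert_and_describe("A", 1) reports one consumed byte, while convert_and_describe("", 1) reports the null character.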
Key Points of mbrtoc32:
The main points to keep in mind about mbrtoc32 are as follows:
- Locale Dependency
The behavior of mbrtoc32 depends on the current locale. The LC_CTYPE category of the locale determines which multibyte encoding the function expects, so the same bytes can convert differently under different locales.
- Error Handling
The function detects invalid multibyte sequences. If an invalid sequence is encountered, it returns static_cast<size_t>(-1), signalling an error in the conversion.
- Stateful Conversion
Stateful conversion is achieved through the mbstate_t argument. Each call updates the state, so a later call can continue where the previous one stopped.
- UTF-8 and Unicode Support
When the locale's encoding is UTF-8, mbrtoc32 translates UTF-8 multibyte sequences into the matching Unicode code points, represented by char32_t. This makes it easier to work with a wide range of characters, including those that fall outside the Basic Multilingual Plane.
- Multibyte Character Handling
A single character may be represented by several bytes, and this function handles such multibyte characters. It converts the character and reports the number of bytes that were consumed.
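The points above can be combined in a loop that decodes a whole string, one multibyte character per call. This is only a sketch under stated assumptions: decode_all is a hypothetical helper name, and the input is assumed to be a null-terminated string in the encoding of the current LC_CTYPE locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cuchar>
#include <cwchar>
#include <vector>

// Hypothetical helper: decodes every multibyte character of s and appends
// the resulting char32_t values to out. Returns false if an invalid or
// incomplete sequence is found.
bool decode_all(const char* s, std::vector<char32_t>& out)
{
    assert(s != nullptr);
    const std::size_t invalid    = static_cast<std::size_t>(-1);
    const std::size_t incomplete = static_cast<std::size_t>(-2);
    const std::size_t no_input   = static_cast<std::size_t>(-3);

    std::mbstate_t state{};                 // one state shared by all calls
    const char* end = s + std::strlen(s);   // excludes the terminating null
    while (s < end) {
        char32_t c32 = 0;
        std::size_t rc = std::mbrtoc32(&c32, s,
                                       static_cast<std::size_t>(end - s),
                                       &state);
        if (rc == invalid || rc == incomplete)
            return false;                   // stop at the first bad sequence
        out.push_back(c32);
        if (rc != no_input)
            s += rc;                        // advance by the bytes consumed
    }
    return true;
}
```

A C++ program starts in the "C" locale, so a call such as std::setlocale(LC_ALL, "") from <clocale> is normally needed before decoding non-ASCII text. Plain ASCII input such as "abc" decodes to the three code points of 'a', 'b', and 'c'.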
Example:
Let us take an example to demonstrate the mbrtoc32 function in C++:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <uchar.h>
#include <wchar.h>
using namespace std;
int main(void)
{
    char32_t hold;
    char str[] = "";
    mbstate_t arr{};
    size_t len;
    // convert the first multibyte character of str
    len = mbrtoc32(&hold, str, MB_CUR_MAX, &arr);
    // (size_t)-1 means an invalid sequence, (size_t)-2 an incomplete one
    if (len == (size_t)-1 || len == (size_t)-2)
    {
        perror("conversion failed");
        exit(EXIT_FAILURE);
    }
    cout << "The String is: " << str << endl;
    cout << "The Length is: " << len << endl;
    // print the converted character as a hexadecimal value
    printf("32-bit character = 0x%02x\n", (unsigned)hold);
    return 0;
}
Output:
The String is:
The Length is: 0
32-bit character = 0x00
Explanation:
- Header Files
The required header files, including <cstdio>, <cstdlib>, <iostream>, <uchar.h>, and <wchar.h>, are included in the code.
- Namespace
The using namespace std; statement brings the full std namespace into scope, so standard C++ identifiers can be used without the std:: prefix.
- Variable Declarations
- char32_t hold;: It declares a variable hold of type char32_t to store the converted character.
- char str[] = "";: It declares a character array str with an empty string as its initial value.
- mbstate_t arr{};: It declares the mbstate_t variable arr and initializes it to the initial conversion state with {}.
- Function Call - mbrtoc32
- len = mbrtoc32(&hold, str, MB_CUR_MAX, &arr);: It calls the mbrtoc32 function to convert the first multibyte character of str to a char32_t character. MB_CUR_MAX is the maximum number of bytes in a multibyte character in the current locale. The converted character is stored in hold, and the return value (the number of bytes consumed) is stored in len.
- Error checking
It checks whether the conversion failed. If so, it prints an error message using perror and exits with a non-zero status.
- Output
It uses std::cout to output the original string str and the value of len.
It uses printf to print the numeric value of the 32-bit character.
- Statement of Return
return 0;: It denotes that the program has been successfully executed. In C++, main returns 0 implicitly if this statement is omitted.
The output shows a length of 0 because the supplied str is an empty string: mbrtoc32 converts the terminating null byte, stores the null character in hold, and returns 0. To watch a real conversion, you may supply a non-empty multibyte sequence.
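A non-empty multibyte sequence could be converted along the following lines. This is a hedged sketch rather than a definitive implementation: select_utf8_locale and first_code_point are hypothetical helper names, and the locale names tried ("C.UTF-8", "en_US.UTF-8", "en_US.utf8") are assumptions that may not exist on every system.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstring>
#include <cuchar>
#include <cwchar>

// Hypothetical helper: tries a few common UTF-8 locale names. Which names
// are installed is system-dependent, so the list is only an assumption.
bool select_utf8_locale()
{
    const char* names[] = {"C.UTF-8", "en_US.UTF-8", "en_US.utf8"};
    for (const char* name : names)
        if (std::setlocale(LC_CTYPE, name) != nullptr)
            return true;
    return false;
}

// Hypothetical helper: converts the first multibyte character of s into out.
// Returns whatever mbrtoc32 returns: the number of bytes consumed, 0 for the
// null character, or (size_t)-1 / (size_t)-2 on failure.
std::size_t first_code_point(const char* s, char32_t& out)
{
    assert(s != nullptr);
    std::mbstate_t state{};   // initial conversion state
    // strlen(s) + 1 lets the function see the terminating null byte too
    return std::mbrtoc32(&out, s, std::strlen(s) + 1, &state);
}
```

If select_utf8_locale() succeeds, calling first_code_point("\xE2\x82\xAC", c) with the three UTF-8 bytes of the euro sign is expected to return 3 and store U+20AC in c. For a plain ASCII string such as "A", the function returns 1 in any locale.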