Difference Between ASCII And Unicode In C

What is ASCII in C?

ASCII stands for American Standard Code for Information Interchange. It is one of the earliest character encoding standards used in computing. It is a 7-bit code that defines 128 characters: the English letters, digits, punctuation marks, and a set of control characters. Its simplicity made it well suited to early communication systems, programming languages, and data transmission.

ASCII gives text a standard representation across devices, but it covers only unaccented English letters and cannot represent diacritics or non-Latin scripts. The various 8-bit "extended ASCII" code pages raise the limit to 256 characters, which is still far too few for multilingual text.

Syntax:

A character constant can be stored in a char, and its numeric code is obtained by converting it to int:

Example

char variable = 'A';         // ASCII character
int ascii_value = variable;  // gets ASCII code (65)

Sample ASCII codes:

'A' → 65

'a' → 97

'0' → 48

C Example for ASCII

Let's take an example to demonstrate ASCII in C.

Example

#include <stdio.h>

int main(void) {
    char ch = 'A';

    // %c prints the character, %d prints its integer code
    printf("Character: %c\nASCII Value: %d\n", ch, ch);
    return 0;
}

Output:

Character: A
ASCII Value: 65

Explanation:

In this example, the variable ch stores the character 'A'. In C, a char holds a small integer, and on virtually all current systems that integer is the character's ASCII code (the C standard itself leaves the execution character set to the implementation). The ASCII value of 'A' is 65, so printf displays the character A for %c and the integer 65 for %d.
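Because a char is just an integer code, ordinary arithmetic works on characters. The following is a minimal sketch that assumes an ASCII-compatible character set; the helper names digit_value and letter_index are illustrative and not part of the C library.

```c
/* Hypothetical helper: converts '0'..'9' to 0..9, or returns -1. */
int digit_value(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';   /* '0' is 48 in ASCII, so '7' - '0' is 55 - 48 */
    }
    return -1;
}

/* Hypothetical helper: 0 for 'A' up to 25 for 'Z', or -1 (assumes ASCII). */
int letter_index(char c) {
    if (c >= 'A' && c <= 'Z') {
        return c - 'A';
    }
    return -1;
}
```

For instance, digit_value('7') yields 7 and letter_index('C') yields 2, because the digits and the uppercase letters each occupy a contiguous run of codes.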

What is Unicode in C?

Unicode is a universal character encoding standard designed to represent text in all languages and scripts. It was created to overcome the limitations of ASCII and to provide a single, uniform encoding scheme. Unicode assigns a unique code point to each character, independent of platform, language, or software.

Its code space has room for 1,114,112 code points (U+0000 to U+10FFFF), of which well over 100,000 are assigned so far. These cover scripts such as Latin, Devanagari, Chinese, and Arabic, as well as emoji and mathematical symbols. Code points are stored using one of several encoding forms (UTF-8, UTF-16, or UTF-32). UTF-8 is the most widely used on the Internet because it is compact for English text and backward compatible with ASCII. Unicode is the standard way to represent text in modern software.

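Unicode Syntax in UTF-8/UTF-16/UTF-32:

C offers several character types and literal prefixes for Unicode text:

Example

char utf8_text[] = "नमस्ते";      // UTF-8 bytes, if the source file is saved as UTF-8
wchar_t uni_char = L'अ';          // wide character (encoding is implementation-defined)
char16_t uni_char16 = u'अ';       // UTF-16 code unit (C11, needs <uchar.h>)
char32_t emoji = U'\U0001F600';   // UTF-32 code point (C11, needs <uchar.h>); U+1F600 is used here as a sample emoji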

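Escape sequence syntax (universal character names, available since C99):

Example

\uXXXX     →  code point written with exactly four hex digits (U+0000 to U+FFFF)
\UXXXXXXXX →  code point written with exactly eight hex digits (required above U+FFFF)

The \u form is also used by Java and JavaScript, and the \U form by Python and C++.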
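The escape forms above can be tried out in a short sketch. It assumes a C11 compiler that provides <uchar.h> and encodes char32_t as UTF-32 (as GCC and Clang on Linux do); the constant and function names are illustrative only.

```c
#include <uchar.h>   /* char32_t (C11) */

/* \u takes exactly four hex digits, \U exactly eight. */
static const char32_t DEVANAGARI_A  = U'\u0905';      /* U+0905 */
static const char32_t GRINNING_FACE = U'\U0001F600';  /* U+1F600 */

/* Hypothetical helper: returns the numeric code point held in a char32_t. */
unsigned long code_point_value(char32_t c) {
    return (unsigned long)c;
}
```

Under those assumptions, code_point_value(DEVANAGARI_A) is 0x0905 and code_point_value(GRINNING_FACE) is 0x1F600, so the escape spells out the code point itself, not its UTF-8 or UTF-16 bytes.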

C Example for Unicode

Let's take an example to illustrate the use of Unicode in the C programming language.

Example

#include <stdio.h>

int main(void) {
    // Hindi text; stored as UTF-8 bytes when the source file is saved as UTF-8
    char utf8_char[] = "नमस्ते";

    printf("Unicode UTF-8 String: %s\n", utf8_char);
    return 0;
}

Output:

Unicode UTF-8 String: नमस्ते

Explanation:

In this example, we print a Hindi word encoded as UTF-8. UTF-8 is a variable-length encoding: each code point takes one to four bytes, which lets it represent complex scripts and characters beyond the ASCII range. printf simply writes those bytes, so the text displays correctly only if the source file is saved as UTF-8 and the terminal interprets its input as UTF-8. Unlike ASCII, Unicode covers the characters of many world languages, which enables multilingual text processing across modern systems, devices, and applications.
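One consequence of the variable-length encoding is that the byte count of a UTF-8 string differs from its number of code points. The sketch below counts code points by skipping continuation bytes; it assumes the input is valid UTF-8, and the names utf8_code_points and NAMASTE are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes, which always have the bit pattern 10xxxxxx.
   Assumes the input is valid UTF-8. */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* The same Hindi word written as explicit UTF-8 bytes, so the result
   does not depend on how the source file is saved. */
static const char NAMASTE[] =
    "\xE0\xA4\xA8" "\xE0\xA4\xAE" "\xE0\xA4\xB8"
    "\xE0\xA5\x8D" "\xE0\xA4\xA4" "\xE0\xA5\x87";
```

Here strlen(NAMASTE) reports 18 bytes, while utf8_code_points(NAMASTE) reports 6 code points, because each Devanagari code point occupies three bytes in UTF-8.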

Key Differences Between ASCII and Unicode in C

The main differences between ASCII and Unicode are as follows:

| Feature | ASCII | Unicode |
| --- | --- | --- |
| Full Form | American Standard Code for Information Interchange | Not an acronym; defined by the Unicode Standard, a universal character encoding standard |
| Bits Used | 7 bits (128 characters); 8-bit extensions give 256 characters | Depends on the encoding form: UTF-8 (8 to 32 bits), UTF-16 (16 or 32 bits), UTF-32 (32 bits) |
| Character Set Size | 128 (256 with 8-bit extensions) | 1,114,112 possible code points |
| Languages Supported | English only | All major world languages (e.g., Hindi, Chinese, Arabic) |
| Emoji Support | No | Yes |
| Compatibility | Simple, lightweight | Backward compatible with ASCII (the first 128 code points are identical) |
| Usage | Older systems, basic text files | Modern applications, websites, smartphones |
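The "8 to 32 bits" entry for UTF-8 can be made concrete with a small encoder. This is a sketch of the standard UTF-8 bit layout, not a library API; the function name utf8_encode is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if cp is not a valid
   Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {                         /* ASCII range: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                        /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                       /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return 0;                         /* surrogates are not characters */
        }
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With this sketch, 'A' (U+0041) encodes to the single byte 41, 'अ' (U+0905) to the three bytes E0 A4 85, and U+1F600 to the four bytes F0 9F 98 80. The one-byte case is exactly ASCII, which is why UTF-8 is backward compatible with it.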

C Example for ASCII and Unicode

Let's take an example to demonstrate the ASCII and Unicode in the C programming language.

Example

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    // Use the environment's locale (e.g. a UTF-8 locale) so that
    // wide characters can be converted on output
    setlocale(LC_ALL, "");

    // ASCII Example
    // wprintf is used here too: a stream should not mix byte-oriented
    // (printf) and wide-oriented (wprintf) output
    char ascii_char = 'z';
    wprintf(L"ASCII Character: %c\n", ascii_char);
    wprintf(L"ASCII Value: %d\n\n", ascii_char);

    // Unicode Example (Wide Character)
    wchar_t unicode_char = L'अ';             // single Unicode character
    wchar_t unicode_text[] = L"こんにちは";  // Japanese text

    wprintf(L"Unicode Character: %lc\n", (wint_t)unicode_char);
    wprintf(L"Unicode Text: %ls\n", unicode_text);

    return 0;
}

Output:

ASCII Character: z
ASCII Value: 122

Unicode Character: अ
Unicode Text: こんにちは

Explanation:

In this example, we demonstrate how C handles both ASCII and Unicode characters. The program first calls setlocale so that wide characters can be converted to the terminal's encoding on output. It then prints the ASCII character 'z' along with its code 122, which shows how a single-byte char holds a basic English symbol. Finally, it uses wchar_t and wprintf to display the Unicode character 'अ' and the Japanese string "こんにちは". Correct output depends on the environment: the compiler must read the source file in the encoding it was saved in, and the terminal must use a Unicode-capable locale such as UTF-8.

ASCII Table

The ASCII table maps the numeric values 0 to 127 to characters, which gives text a consistent representation across computer systems. Its 128 characters fall into sections, each serving a different purpose.

The codes 0-31 and 127 are control characters. They are non-printable and are used for functions such as new line, tab, backspace, and device control. The printable characters run from 32 to 126 and comprise the space, punctuation and symbols, digits, uppercase letters, and lowercase letters, including the brackets and braces that programming languages rely on.

Related characters are stored in contiguous, ordered ranges, such as digits (48-57), uppercase letters (65-90), and lowercase letters (97-122), which makes the table easy to learn and to use in programs. ASCII remains the foundation of text encoding, data processing, and communication in computer systems, including modern systems that extend it with Unicode.

| Decimal Range | Hex Range | Characters | Category | Description |
| --- | --- | --- | --- | --- |
| 0-31 | 00-1F | Non-printable | Control Characters | System-level control codes like NUL, TAB, LF, CR |
| 32 | 20 | (space) | Whitespace | Blank space used for text separation |
| 33-47 | 21-2F | ! " # $ % & ' ( ) * + , - . / | Symbols | Basic punctuation and operators |
| 48-57 | 30-39 | 0-9 | Digits | Numeric characters |
| 58-64 | 3A-40 | : ; < = > ? @ | Punctuation & Symbols | Relational operators and punctuation |
| 65-90 | 41-5A | A-Z | Uppercase Letters | Capital English alphabet |
| 91-96 | 5B-60 | [ \ ] ^ _ ` | Brackets & Modifiers | Useful for coding and formatting |
| 97-122 | 61-7A | a-z | Lowercase Letters | Small English alphabet |
| 123-126 | 7B-7E | { \| } ~ | Extended symbols | Braces, pipe, tilde |
| 127 | 7F | DEL | Control Character | Delete / non-printable |
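The ranges in the table translate directly into range checks. The sketch below groups the symbol rows into one category for brevity; the enum and the function ascii_category are hypothetical names, and in real code the <ctype.h> functions (isdigit, isupper, iscntrl, and so on) are the usual choice.

```c
/* Hypothetical categories mirroring the rows of the ASCII table. */
enum ascii_category {
    ASCII_NOT_ASCII,
    ASCII_CONTROL,
    ASCII_SPACE,
    ASCII_DIGIT,
    ASCII_UPPERCASE,
    ASCII_LOWERCASE,
    ASCII_SYMBOL
};

/* Returns the table section that a numeric code belongs to. */
enum ascii_category ascii_category(int code) {
    if (code < 0 || code > 127)     return ASCII_NOT_ASCII;
    if (code <= 31 || code == 127)  return ASCII_CONTROL;
    if (code == 32)                 return ASCII_SPACE;
    if (code >= 48 && code <= 57)   return ASCII_DIGIT;
    if (code >= 65 && code <= 90)   return ASCII_UPPERCASE;
    if (code >= 97 && code <= 122)  return ASCII_LOWERCASE;
    return ASCII_SYMBOL;            /* 33-47, 58-64, 91-96, 123-126 */
}
```

For example, code 10 (line feed) falls in ASCII_CONTROL, code 65 in ASCII_UPPERCASE, and code 124 (the pipe) in ASCII_SYMBOL.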

Conclusion

In conclusion, ASCII and Unicode are the two character encoding standards that C programmers encounter most often. Each has its own purpose and limitations. ASCII is simple, lightweight, and suitable for basic English text, but it cannot represent multilingual text. In contrast, Unicode is a universal encoding system capable of representing characters from virtually all of the world's languages, symbols, and scripts.

By using wide characters (wchar_t) and functions like wprintf together with setlocale, or by storing UTF-8 text in ordinary char arrays, C programs can process international text effectively. Overall, ASCII is sufficient for small, English-only applications, while Unicode is the right choice for modern, international, and multilingual software.

ASCII vs Unicode FAQs

1) What is ASCII, and why is it important?

ASCII (American Standard Code for Information Interchange) is a standard that assigns numeric codes to letters, digits, symbols, and control characters. It is the basic character set on which computers, programming languages, file formats, and communication protocols are built.

2) How many characters are in ASCII in C?

There are 128 characters in ASCII (codes 0-127): 33 control characters (0-31 and 127) and 95 printable characters (32-126), which include the space, digits, uppercase letters, lowercase letters, and special symbols.

3) What is the difference in the ASCII values of upper and lower case in C?

The uppercase letters A through Z have the codes 65-90, and the lowercase letters a through z have the codes 97-122. The difference between the lowercase and uppercase form of any letter is always 32, which makes case conversion easy to implement.
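The fixed offset of 32 can be used directly, as in the sketch below. It assumes an ASCII-compatible character set, and the helper names are illustrative; portable code would normally call tolower and toupper from <ctype.h> instead.

```c
/* Hypothetical helper: 'A'..'Z' become 'a'..'z' by adding 32,
   all other characters are returned unchanged. Assumes ASCII. */
char to_lower_ascii(char c) {
    if (c >= 'A' && c <= 'Z') {
        return (char)(c + ('a' - 'A'));   /* 'a' - 'A' is 32 in ASCII */
    }
    return c;
}

/* Hypothetical helper: 'a'..'z' become 'A'..'Z' by subtracting 32. */
char to_upper_ascii(char c) {
    if (c >= 'a' && c <= 'z') {
        return (char)(c - ('a' - 'A'));
    }
    return c;
}
```

For example, to_lower_ascii('G') gives 'g' (71 + 32 = 103) and to_upper_ascii('z') gives 'Z' (122 - 32 = 90), while digits and symbols pass through unchanged.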

4) What are control characters in the ASCII code in C?

The control characters are the codes 0 to 31 and 127. They are non-printable and are used for tasks such as newline (10), tab (9), carriage return (13), backspace (8), and escape (27). They matter for text formatting and for communication with devices.

5) Is ASCII still in use today, or has it been replaced by Unicode?

ASCII is still in use today, and it is the basis for Unicode: the first 128 Unicode code points are identical to the ASCII codes. Unicode defines many characters beyond the ASCII set, but ASCII remains very important in programming, data formats, network protocols, and legacy systems.
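Since the first 128 code points coincide, any pure ASCII text is also valid UTF-8 with the same bytes. A minimal check for this, with the illustrative name is_pure_ascii, might look like the following sketch.

```c
#include <stdbool.h>

/* Returns true if every byte of the NUL-terminated string is below 0x80,
   i.e. the text is pure ASCII and therefore identical in UTF-8. */
bool is_pure_ascii(const char *s) {
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s >= 0x80) {
            return false;   /* a byte with the high bit set is not ASCII */
        }
    }
    return true;
}
```

The check returns true for "Hello, C!" and false for the UTF-8 bytes of 'अ' ("\xE0\xA4\x85"), all of which have the high bit set.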
