What is Character Encoding?

How Do Input Devices Convert Text to Binary?

This conversion relies on two things: character encoding and a number system.

Character encoding is the process of assigning a unique number to each character.

A number system is then used to write that assigned number as 0s and 1s.

What is ASCII?

In the 1960s, ASCII (American Standard Code for Information Interchange) was developed as a standard character table that handles character encoding.

  • Count the characters we use in day-to-day life:
    • special symbols: !@#$%^&*()
    • English letters: A-Z, a-z
    • digits and math notation: 0-9, +, >
  • You will find that the total is less than 128, so we need enough bits to tell 128 characters apart.
  • The number of characters that n bits can uniquely identify is given by the formula 2^n.
  • How many characters can we uniquely identify?
    • For 1-bit, 2^1 = 2 characters
    • For 2-bit, 2^2 = 4 characters
    • For 3-bit, 2^3 = 8 characters
    • For 4-bit, 2^4 = 16 characters
    • For 5-bit, 2^5 = 32 characters
    • For 6-bit, 2^6 = 64 characters
    • For 7-bit, 2^7 = 128 characters
    • For 8-bit, 2^8 = 256 characters
    • and it goes on.
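
The 2^n rule above can be checked with a short sketch. The function names `distinct_values` and `bits_needed` are hypothetical helpers made up for this illustration; they are not part of any standard.

```python
# Hypothetical helpers for illustration only.

def distinct_values(bits: int) -> int:
    """Return how many different values can be written with `bits` bits (2^n)."""
    return 2 ** bits


def bits_needed(count: int) -> int:
    """Return the smallest number of bits that can label `count` distinct items."""
    # (count - 1) is the largest label when counting from 0;
    # bit_length() tells how many bits that label needs.
    return max(1, (count - 1).bit_length())


for n in range(1, 9):
    print(f"For {n}-bit, 2^{n} = {distinct_values(n)} characters")

print("Bits needed for 128 characters:", bits_needed(128))
```

Running it reproduces the list above and shows that 7 bits are enough for 128 characters.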
ASCII Table Example

As an example, here is how the ASCII table maps a few characters. For full details, refer to the wiki or the table here.

Character Decimal (base-10) Binary (base-2)
A 65 01000001
B 66 01000010
--- --- ---
a 97 01100001
b 98 01100010
--- --- ---
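
The rows of the table can be reproduced with Python's built-in `ord` and `format`. This is a minimal sketch; `ascii_row` is a hypothetical helper name chosen for this example.

```python
# Hypothetical helper for illustration only.

def ascii_row(ch: str) -> tuple[str, int, str]:
    """Return (character, decimal code, 8-digit binary string) for an ASCII character."""
    code = ord(ch)                      # the number assigned to the character
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return ch, code, format(code, "08b")


print("Character Decimal Binary")
for ch in "ABab":
    char, decimal, binary = ascii_row(ch)
    print(char, decimal, binary)
```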

In the early days of computers, hardware had limitations:

  • The designers of ASCII chose to use only 7 bits for character representation, which left the eighth bit of a byte free for error checking, to detect whether any of the 7 data bits were corrupted during transmission.

As the hardware evolved and improved:

  • The need for this error checking decreased, and the eighth bit came to be used for other purposes, such as supporting extra characters.
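
One way the eighth bit can detect a transmission error is a parity bit. The sketch below assumes even parity (the total number of 1 bits is kept even); the text above does not say which scheme was used, and the helper names `add_even_parity` and `looks_intact` are hypothetical.

```python
# Hypothetical helpers for illustration; even parity is an assumption,
# the text only says the eighth bit was used for error checking.

def add_even_parity(code: int) -> int:
    """Store an even-parity bit in the eighth (top) bit of a 7-bit code."""
    if not 0 <= code < 128:
        raise ValueError("expected a 7-bit code (0-127)")
    ones = bin(code).count("1")
    parity_bit = ones % 2              # 1 only when the count of 1s is odd
    return (parity_bit << 7) | code


def looks_intact(byte: int) -> bool:
    """Return True when the 8-bit value still has an even number of 1s."""
    return bin(byte).count("1") % 2 == 0


sent = add_even_parity(ord("C"))       # 'C' is 67, which has three 1 bits
damaged = sent ^ 0b00000100            # flip one bit to simulate an error
print(format(sent, "08b"), looks_intact(sent))
print(format(damaged, "08b"), looks_intact(damaged))
```

A single flipped bit makes the count of 1s odd, so the receiver can tell something went wrong, although it cannot tell which bit changed.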
What is a Number System?

To convert characters to binary, computers rely on a number system: a set of symbols and rules for writing numbers. The unique number assigned to each character is written in base 2.

Here are the commonly used number systems:
Base Digits used to represent Description
2 (binary) 0, 1 Used in digital systems and computer programming
8 (octal) 0, 1, 2, 3, 4, 5, 6, 7 Less common, but still used
10 (decimal) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 Most commonly used in everyday life
16 (hexadecimal) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F Used in computer programming and memory addressing
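
The same number looks different in each base. As a small sketch, Python's built-in `format` and `int` can convert between them; `in_bases` is a hypothetical helper name used only here.

```python
# Hypothetical helper for illustration only.

def in_bases(n: int) -> dict[str, str]:
    """Write the same non-negative integer in the four common bases."""
    return {
        "binary": format(n, "b"),
        "octal": format(n, "o"),
        "decimal": format(n, "d"),
        "hexadecimal": format(n, "X"),
    }


for base, digits in in_bases(65).items():
    print(f"{base}: {digits}")

# int() with an explicit base converts the digits back to the same number.
print(int("1000001", 2), int("101", 8), int("41", 16))
```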
Let's see an example

Convert the character ‘A’ to ‘01000001’ using ASCII encoding.

  • Step 1: The ASCII code for ‘A’ is 65 (in decimal or base-10 format).
  • Step 2: The computer then converts 65 (in decimal or base-10 format) to 01000001 (in binary or base-2 format).
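
The two steps can be sketched in Python. Step 2 is written out by hand as repeated division by 2 so the conversion is visible; `to_binary` and `char_to_binary` are hypothetical helper names, and real programs would normally just call `format(n, "08b")`.

```python
# Hypothetical helpers for illustration only.

def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))      # the remainder is the next binary digit
        n //= 2                        # move on to the next higher digit
    # Remainders come out lowest digit first, so reverse them,
    # then pad with leading zeros to the requested width.
    return "".join(reversed(digits)).rjust(width, "0")


def char_to_binary(ch: str) -> str:
    """Step 1: look up the character's number. Step 2: write it in base 2."""
    code = ord(ch)
    return to_binary(code)


print(char_to_binary("A"))
```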
What is Unicode?

ASCII had limitations:

  • ASCII supports only a limited set of characters, mainly those used in the English language.
  • ASCII does not cover all languages in the world.

As computers became more global, a need arose to represent characters and symbols from many languages in a standard way.

  • Unicode is a universal character encoding standard that aims to represent the characters used in languages worldwide (Latin, Arabic, Chinese, etc.), along with symbols, mathematical notations, emojis, and more.
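
In Unicode, the number assigned to a character is called a code point, usually written as U+ followed by hexadecimal digits. The sketch below looks up a few code points with Python's built-in `ord` and the standard `unicodedata` module; `code_point` is a hypothetical helper name, and the sample characters are arbitrary choices.

```python
import unicodedata

# Hypothetical helper for illustration only.

def code_point(ch: str) -> str:
    """Return the Unicode code point of a character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# A Latin letter, an Arabic letter, a Chinese character, and an emoji,
# written as escapes so the output stays plain ASCII.
samples = ["A", "\u0639", "\u4e2d", "\U0001F600"]

for ch in samples:
    print(code_point(ch), ord(ch), unicodedata.name(ch))
```

The ASCII characters keep the same numbers in Unicode, so ‘A’ is still 65.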
UTF-8 and UTF-16 Table Example
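  • UTF-8 uses 8-bit units and represents each character with 1 to 4 of them (8 to 32 bits), so it can encode every Unicode character, not just 2^8 = 256. ASCII characters take a single 8-bit unit.
  • UTF-16 uses 16-bit units and represents each character with 1 or 2 of them (16 or 32 bits). A single unit covers 2^16 = 65,536 values; characters beyond that range use a pair of units.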
Character Decimal (base-10) UTF-8 (base-2) UTF-16 (base-2)
A 65 01000001 00000000 01000001
B 66 01000010 00000000 01000010
--- --- --- ---
a 97 01100001 00000000 01100001
b 98 01100010 00000000 01100010
--- --- --- ---
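
The table rows, and the fact that some characters need more than one unit, can be checked with Python's `str.encode`. This is a minimal sketch: the helper names are hypothetical, and the big-endian variant `utf-16-be` is chosen so that the bytes match the table and no byte order mark is added.

```python
# Hypothetical helpers for illustration only.

def to_bits(data: bytes) -> str:
    """Show each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)


def utf8_bits(ch: str) -> str:
    return to_bits(ch.encode("utf-8"))


def utf16_bits(ch: str) -> str:
    # "utf-16-be" = big-endian byte order, without a byte order mark.
    return to_bits(ch.encode("utf-16-be"))


# 'A', the euro sign (U+20AC), and an emoji (U+1F600).
for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", "UTF-8:", utf8_bits(ch), "| UTF-16:", utf16_bits(ch))
```

‘A’ takes 1 byte in UTF-8 and 2 bytes in UTF-16, while the emoji takes 4 bytes in both.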
UTF-32 Table Example
  • UTF-32 uses a fixed 32 bits to represent every character. 32 bits could hold 2^32 = 4,294,967,296 values, but Unicode only defines code points up to U+10FFFF (1,114,112 in total).
Character Decimal (base-10) UTF-32 (base-2)
A 65 00000000 00000000 00000000 01000001
B 66 00000000 00000000 00000000 01000010
--- --- ---
a 97 00000000 00000000 00000000 01100001
b 98 00000000 00000000 00000000 01100010
--- --- ---
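
The fixed width of UTF-32 can be seen the same way. In this sketch `utf32_bits` is a hypothetical helper name, `utf-32-be` is chosen so that the bytes match the table, and `sys.maxunicode` is Python's own constant for the largest code point it accepts.

```python
import sys

# Hypothetical helper for illustration only.

def utf32_bits(ch: str) -> str:
    """Encode one character as UTF-32 (big-endian) and show the bits."""
    data = ch.encode("utf-32-be")      # big-endian, no byte order mark
    return " ".join(format(b, "08b") for b in data)


# Every character takes exactly 4 bytes, whether it is 'A' or an emoji.
for ch in ["A", "a", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-32-be")), "bytes:", utf32_bits(ch))

# The largest code point is far below the 32-bit limit.
print("Largest code point:", hex(sys.maxunicode))
print("Total code points:", sys.maxunicode + 1)
```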
Note
  • All this conversion is done automatically. We don’t have to do it.
  • You just need to know how it works in brief. That’s all.