Chapter 2: Data Representation
COMPUTER CODES
Introduction
Have you ever wondered how the keys on the computer
keyboard, which are in a human-recognisable form, are
interpreted by the computer system? This section briefly
discusses how text is interpreted by the computer.
We have learnt in the previous chapter that a
computer understands only the binary language of 0s
and 1s. Therefore, when a key on the keyboard is pressed, it
is internally mapped to a unique code, which is further
converted to binary.
Example 2.1 When the key ‘A’ is pressed (Figure 2.1), it
is internally mapped to the decimal value 65 (its code value),
which is then converted to its equivalent binary value
for the computer to understand.
Similarly, when we press the letter ‘अ’ on a Hindi keyboard, it is internally mapped to the hexadecimal value 0905, whose binary equivalent is 0000100100000101.
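These mappings can be verified with a few lines of code. The following minimal Python sketch (Python is used here only for illustration) applies the built-in function ord() to obtain the code value of a character and format() to express it in binary:

# Obtain the code value of a character and show its binary form.
code_A = ord('A')
print(code_A)                       # 65
print(format(code_A, '07b'))        # 1000001 (7-bit binary)

code_dev_a = ord('अ')
print(format(code_dev_a, '04X'))    # 0905 (hexadecimal)
print(format(code_dev_a, '016b'))   # 0000100100000101 (16-bit binary)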
What is encoding?
Ans: The mechanism of converting
data into an equivalent code using a specific coding scheme is called encoding.
It is natural to ask why the code value 65 is used
for the key ‘A’ and not any other value. The value itself
is not special; what matters is that it is fixed by an agreed
standard, so that all computers interpret it in the same way.
Some of the well-known encoding
schemes are described in the following sections.
American Standard Code for Information
Interchange (ASCII)
In the early 1960s, computers had no way of
communicating with each other because they
represented the keys of the keyboard in different ways.
Hence, the need for a common standard was realised to
overcome this shortcoming. Thus, the encoding scheme
ASCII was developed to standardise character
representation. ASCII is still among the most commonly
used coding schemes.
Initially, ASCII used 7 bits to represent characters.
Recall that there are only two binary digits (0 and 1).
Therefore, the total number of different characters on the
English keyboard that can be encoded by the 7-bit ASCII
code is 2⁷ = 128. Table 2.1 shows some printable
characters of the ASCII code. However, ASCII can encode
the character set of the English language only.
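For instance, the following Python snippet (again, used purely as an illustration) encodes an English string with the 'ascii' codec and shows that a character outside the English character set cannot be encoded:

# Encode an English string using the 7-bit ASCII scheme.
text = 'CAB'
print(list(text.encode('ascii')))   # [67, 65, 66], one code value per character

# Characters of other scripts are not representable in ASCII.
try:
    'अ'.encode('ascii')
except UnicodeEncodeError as error:
    print('Cannot encode in ASCII:', error)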
Indian Script Code for Information Interchange
(ISCII)
In order to facilitate the use of Indian languages on
computers, a common standard for coding Indian scripts,
called ISCII, was developed in India during the mid 1980s.
It is an 8-bit code representation for Indian languages,
which means it can represent 2⁸ = 256 characters. It
retains all 128 ASCII codes and uses the remaining
128 codes for the additional Indian language character set.
These additional codes, in the upper region (160–255),
have been assigned to the ‘aksharas’ of the languages.
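Standard Python does not provide a built-in ISCII codec, so the sketch below only illustrates the 8-bit layout described above; the helper function classify_iscii_code() is a hypothetical name introduced for this example:

def classify_iscii_code(code):
    # Classify an 8-bit ISCII code value by the region it falls in.
    if not 0 <= code <= 255:
        raise ValueError('ISCII is an 8-bit code: values must be 0 to 255')
    if code <= 127:
        return 'ASCII region (retained from 7-bit ASCII)'
    if code >= 160:
        return 'upper region (160-255), assigned to aksharas'
    return 'codes 128-159 (not assigned to aksharas)'

print(classify_iscii_code(65))      # ASCII region (retained from 7-bit ASCII)
print(classify_iscii_code(200))     # upper region (160-255), assigned to aksharas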
UNICODE
There were many encoding schemes for the character
sets of different languages. However, they were not
compatible with each other, as each of them
represented characters in its own way. Hence, text
created using one encoding scheme was not recognised
by another machine using a different encoding scheme.
Therefore, a standard called UNICODE was
developed to incorporate all the characters of every
written language of the world. UNICODE provides a
unique number for every character, irrespective of the
device (server, desktop, mobile), operating system
(Linux, Windows, iOS) or software application (different browsers, text editors, etc.). Commonly used UNICODE
encodings are UTF-8, UTF-16 and UTF-32. UNICODE is a superset
of ASCII, and the values 0–127 represent the same characters
as in ASCII. Unicode characters for the Devanagari script
are shown in Table 2.3. Each cell of the table contains a
character along with its equivalent hexadecimal value.
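To see how the different UNICODE encodings represent the same character, consider this short Python sketch (str.encode() is used purely for illustration); it prints the bytes that UTF-8, UTF-16 and UTF-32 produce for the character ‘अ’ (U+0905):

# One character, one unique Unicode number, three encodings.
ch = 'अ'
print(hex(ord(ch)))                     # 0x905, the Unicode code point

# Big-endian variants are chosen to avoid the byte-order mark.
print(ch.encode('utf-8').hex(' '))      # e0 a4 85    (3 bytes)
print(ch.encode('utf-16-be').hex(' '))  # 09 05       (2 bytes)
print(ch.encode('utf-32-be').hex(' '))  # 00 00 09 05 (4 bytes)

Note that UTF-8 stores each ASCII character in a single byte, which is one reason it is so widely used.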