Chapter 2: Data Representation
COMPUTER CODES
Introduction
Have you ever wondered how the keys on the computer
keyboard, which are in a human-recognisable form, are
interpreted by the computer system? This section briefly
discusses how text is interpreted by the computer.
We have learnt in the previous chapter that a
computer understands only the binary language of 0s
and 1s. Therefore, when a key on the keyboard is pressed, it
is internally mapped to a unique code, which is further
converted to binary.
Example 2.1 When the key ‘A’ is pressed (Figure 2.1), it
is internally mapped to the decimal value 65 (its code value),
which is then converted to its equivalent binary value
for the computer to understand.
Similarly, when we press the letter ‘अ’ on a Hindi keyboard, it is internally mapped to the hexadecimal value 0905, whose binary equivalent is 0000100100000101.
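These mappings can be verified with a few lines of code. The following minimal Python sketch (Python is used here only for illustration) applies the built-in function ord() to obtain the code value of a character and format() to express it in binary:

# Obtain the code value of a character and show its binary form.
code_A = ord('A')
print(code_A)                       # 65
print(format(code_A, '07b'))        # 1000001 (7-bit binary)

code_dev_a = ord('अ')
print(format(code_dev_a, '04X'))    # 0905 (hexadecimal)
print(format(code_dev_a, '016b'))   # 0000100100000101 (16-bit binary)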
What is encoding?
Ans: The mechanism of converting
data into an equivalent code using a specific coding scheme is called encoding.
It is natural to ask why the code value 65 is used
for the key ‘A’ and not any other value. The value itself
is not special; what matters is that it is fixed by an agreed
standard, so that all computers interpret it in the same way.
Some of the well-known encoding
schemes are described in the following sections.
American Standard Code for Information
Interchange (ASCII)
In the early 1960s, computers had no way of
communicating with each other because they
represented the keys of the keyboard in different ways.
Hence, the need for a common standard was realised to
overcome this shortcoming. Thus, the encoding scheme
ASCII was developed to standardise character
representation. ASCII is still among the most commonly
used coding schemes.
Initially, ASCII used 7 bits to represent characters.
Recall that there are only two binary digits (0 and 1).
Therefore, the total number of different characters on the
English keyboard that can be encoded by the 7-bit ASCII
code is 2⁷ = 128. Table 2.1 shows some printable
characters of the ASCII code. However, ASCII can encode
the character set of the English language only.
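For instance, the following Python snippet (again, used purely as an illustration) encodes an English string with the 'ascii' codec and shows that a character outside the English character set cannot be encoded:

# Encode an English string using the 7-bit ASCII scheme.
text = 'CAB'
print(list(text.encode('ascii')))   # [67, 65, 66], one code value per character

# Characters of other scripts are not representable in ASCII.
try:
    'अ'.encode('ascii')
except UnicodeEncodeError as error:
    print('Cannot encode in ASCII:', error)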
Indian Script Code for Information Interchange
(ISCII)
In order to facilitate the use of Indian languages on
computers, a common standard for coding Indian scripts,
called ISCII, was developed in India during the mid 1980s.
It is an 8-bit code representation for Indian languages,
which means it can represent 2⁸ = 256 characters. It
retains all 128 ASCII codes and uses the remaining
128 codes for the additional Indian language character set.
These additional codes, in the upper region (160–255),
have been assigned to the ‘aksharas’ of the languages.
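Standard Python does not provide a built-in ISCII codec, so the sketch below only illustrates the 8-bit layout described above; the helper function classify_iscii_code() is a hypothetical name introduced for this example:

def classify_iscii_code(code):
    # Classify an 8-bit ISCII code value by the region it falls in.
    if not 0 <= code <= 255:
        raise ValueError('ISCII is an 8-bit code: values must be 0 to 255')
    if code <= 127:
        return 'ASCII region (retained from 7-bit ASCII)'
    if code >= 160:
        return 'upper region (160-255), assigned to aksharas'
    return 'codes 128-159 (not assigned to aksharas)'

print(classify_iscii_code(65))      # ASCII region (retained from 7-bit ASCII)
print(classify_iscii_code(200))     # upper region (160-255), assigned to aksharas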
UNICODE
There were many encoding schemes for the character
sets of different languages. However, they were not
compatible with each other, as each of them
represented characters in its own way. Hence, text
created using one encoding scheme was not recognised
by another machine using a different encoding scheme.
Therefore, a standard called UNICODE was
developed to incorporate all the characters of every
written language of the world. UNICODE provides a
unique number for every character, irrespective of the
device (server, desktop, mobile), operating system
(Linux, Windows, iOS) or software application (different browsers, text editors, etc.). Commonly used UNICODE
encodings are UTF-8, UTF-16 and UTF-32. UNICODE is a superset
of ASCII, and the values 0–127 represent the same characters
as in ASCII. Unicode characters for the Devanagari script
are shown in Table 2.3. Each cell of the table contains a
character along with its equivalent hexadecimal value.
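To see how the different UNICODE encodings represent the same character, consider this short Python sketch (str.encode() is used purely for illustration); it prints the bytes that UTF-8, UTF-16 and UTF-32 produce for the character ‘अ’ (U+0905):

# One character, one unique Unicode number, three encodings.
ch = 'अ'
print(hex(ord(ch)))                     # 0x905, the Unicode code point

# Big-endian variants are chosen to avoid the byte-order mark.
print(ch.encode('utf-8').hex(' '))      # e0 a4 85    (3 bytes)
print(ch.encode('utf-16-be').hex(' '))  # 09 05       (2 bytes)
print(ch.encode('utf-32-be').hex(' '))  # 00 00 09 05 (4 bytes)

Note that UTF-8 stores each ASCII character in a single byte, which is one reason it is so widely used.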