Data Formats used by Computers
3. Format must be appropriate
• The internal representation must be
appropriate for the type of processing to
take place (e.g., text, images, sound)
4. Rules/Conventions
• Proprietary formats
– Unique to a product or company
– E.g., Microsoft Word, Corel WordPerfect, IBM Lotus Notes
• Standards
– Evolve two ways:
• Proprietary formats become de facto standards (e.g., Adobe
PostScript, Apple QuickTime)
• A committee is struck to solve a problem (e.g., the Moving Picture
Experts Group, MPEG)
5. Standards Organizations
• ISO – International Organization for Standardization
• CSA – Canadian Standards Association
• ANSI – American National Standards Institute
• IEEE – Institute of Electrical and Electronics Engineers
• Etc.
6. Examples of Standards
Type of Data            Standards
Alphanumeric            ASCII, EBCDIC, Unicode
Image                   JPEG, GIF, PCX, TIFF
Motion picture          MPEG-2, QuickTime
Sound                   Sound Blaster, WAV, AU
Outline graphics/fonts  PostScript, TrueType, PDF
7. Why Standards?
• Standards are “arbitrary”
• They exist because they are
– Convenient
– Efficient
– Flexible
– Appropriate
– Etc.
8. Alphanumeric Data
• Problem: Distinguishing between the number 123
(one hundred and twenty-three) and the characters
“123” (one, two, three)
• Four standards for representing letters (alpha) and
numbers
– BCD – Binary-coded decimal
– ASCII – American standard code for information
interchange
– EBCDIC – Extended binary-coded decimal interchange
code
– Unicode
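The distinction between the number 123 and the characters “123” can be seen by looking at the stored bits. The following is a minimal sketch, assuming the number fits in one byte and the text is stored as ASCII; the variable names are illustrative only.

```python
# The number 123 stored as a binary integer versus the text "123"
# stored as three ASCII character codes.
number = 123
text = "123"

# One byte holding the value one hundred and twenty-three.
number_bits = format(number, "08b")

# Three bytes, one per character: '1', '2', '3'.
text_bits = [format(b, "08b") for b in text.encode("ascii")]

print(number_bits)  # 01111011
print(text_bits)    # ['00110001', '00110010', '00110011']
```

Converting between the two forms is an explicit step, for example `int("123")` or `str(123)`.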
14. Codes and Characters
• Each character is coded as a byte
• Most common coding system is ASCII
(Pronounced ass-key)
• ASCII = American National Standard Code
for Information Interchange
• Defined in ANSI document X3.4-1977
15. ASCII Features
• 7-bit code
• 8th bit is unused (or used for a parity bit)
• 2^7 = 128 codes
• Two general types of codes:
– 95 are “Graphic” codes (displayable on a
console)
– 33 are “Control” codes (control features of the
console or communications channel)
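The counts above can be checked with a short script. This sketch assumes that the graphic codes are 20 through 7E hexadecimal (space through tilde) and that every other 7-bit code, including DEL, is a control code.

```python
# Classify all 128 seven-bit codes as graphic or control.
# Assumption: graphic codes run from 0x20 (space) to 0x7E (~);
# the rest (0x00-0x1F and 0x7F, DEL) are control codes.
total = 2 ** 7
graphic = [c for c in range(total) if 0x20 <= c <= 0x7E]
control = [c for c in range(total) if c not in graphic]

print(total, len(graphic), len(control))  # 128 95 33
```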
16. ASCII Chart
      000   001   010   011   100   101   110   111
0000  NULL  DLE   SP    0     @     P     `     p
0001  SOH   DC1   !     1     A     Q     a     q
0010  STX   DC2   "     2     B     R     b     r
0011  ETX   DC3   #     3     C     S     c     s
0100  EOT   DC4   $     4     D     T     d     t
0101  ENQ   NAK   %     5     E     U     e     u
0110  ACK   SYN   &     6     F     V     f     v
0111  BEL   ETB   '     7     G     W     g     w
1000  BS    CAN   (     8     H     X     h     x
1001  HT    EM    )     9     I     Y     i     y
1010  LF    SUB   *     :     J     Z     j     z
1011  VT    ESC   +     ;     K     [     k     {
1100  FF    FS    ,     <     L     \     l     |
1101  CR    GS    -     =     M     ]     m     }
1110  SO    RS    .     >     N     ^     n     ~
1111  SI    US    /     ?     O     _     o     DEL
(SP = space)
18. ASCII Chart: Bit Order
• Most significant bit: leftmost bit of the column heading
• Least significant bit: rightmost bit of the row label
19. ASCII Chart: Example
• e.g., ‘a’ = 1100001 (column 110, row 0001)
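Reading the chart amounts to splitting a 7-bit code into its upper three bits (the column) and lower four bits (the row). The sketch below illustrates this; the function name `chart_position` is hypothetical and not part of any standard.

```python
def chart_position(ch):
    """Return (column_bits, row_bits) for a 7-bit ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    column = code >> 4   # upper 3 bits select the column
    row = code & 0x0F    # lower 4 bits select the row
    return format(column, "03b"), format(row, "04b")

print(format(ord("a"), "07b"))  # 1100001
print(chart_position("a"))      # ('110', '0001')
```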
20. 95 Graphic codes
• Columns 010 through 111 of the ASCII chart, except DEL
21. 33 Control codes
• Columns 000 and 001 of the ASCII chart, plus DEL
22. Alphabetic codes
• Upper case A–Z: columns 100 and 101 of the ASCII chart
• Lower case a–z: columns 110 and 111 of the ASCII chart
23. Numeric codes
• Digits 0–9: column 011 of the ASCII chart, rows 0000 through 1001
24. Punctuation, etc.
• The graphic codes that are neither letters nor digits
26. Common Control Codes
Code  Hex  Meaning
CR    0D   carriage return
LF    0A   line feed
HT    09   horizontal tab
DEL   7F   delete
NULL  00   null
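The hexadecimal values listed for these control codes can be confirmed directly. This is a small sketch that uses Python's string escapes for the five codes; the dictionary name `common_controls` is illustrative.

```python
# Map each control-code name from the slide to the character it denotes.
common_controls = {
    "CR": "\r",      # carriage return
    "LF": "\n",      # line feed
    "HT": "\t",      # horizontal tab
    "DEL": "\x7f",   # delete
    "NULL": "\x00",  # null
}

# Two-digit hexadecimal code for each name.
hex_codes = {name: format(ord(ch), "02X") for name, ch in common_controls.items()}

for name, code in hex_codes.items():
    print(f"{name:<4} {code}")
```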
28. Terminology
• Learn the names of the special symbols
– [ ] brackets
– { } braces
– ( ) parentheses
– @ commercial ‘at’ sign
– & ampersand
– ~ tilde
30. Escape Sequences
• Extend the capability of the ASCII code set
• For controlling terminals and formatting output
• Defined by ANSI in documents X3.41-1974 and
X3.64-1977
• The escape code is ESC = 1B₁₆
• An escape sequence begins with two codes:
    ESC    [
    1B₁₆   5B₁₆
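The two-code introducer can be built and inspected in a few lines. The sketch below is illustrative: the helper `control_sequence` is hypothetical, and the bold on/off sequences (parameters 1 and 0 with final character `m`) are chosen as a familiar example whose effect depends on the terminal honoring them.

```python
ESC = "\x1b"      # escape code, 1B hexadecimal
CSI = ESC + "["   # the two-code introducer: 1B, 5B hexadecimal

def control_sequence(params, final):
    """Build an escape sequence: introducer, parameter text, final character."""
    return CSI + params + final

bold_on = control_sequence("1", "m")   # assumed example: select bold
bold_off = control_sequence("0", "m")  # assumed example: reset attributes

# Show the codes rather than sending them to the terminal.
print([format(b, "02X") for b in CSI.encode("ascii")])  # ['1B', '5B']
print(repr(bold_on + "hello" + bold_off))
```

Printing `bold_on + "hello" + bold_off` without `repr` on a terminal that supports these sequences would typically display the word in bold.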
36. Unicode Version 2.1
• 1998
• Improves on version 2.0
• Includes the Euro sign (20AC₁₆ = €)
• From the standard:
…contains 38,887 distinct coded characters derived
from the supported scripts. These characters cover the
principal written languages of the Americas, Europe,
the Middle East, Africa, India, Asia, and Pacifica.
http://www.unicode.org
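As a quick check of the Euro sign's code, the following sketch looks up the character at 20AC hexadecimal and confirms that it lies outside the 7-bit ASCII range. The variable names are illustrative.

```python
euro = chr(0x20AC)  # the character at code 20AC hexadecimal

code_hex = format(ord(euro), "04X")
fits_in_ascii = ord(euro) <= 0x7F  # 7-bit ASCII stops at 7F hexadecimal

print(code_hex)       # 20AC
print(fits_in_ascii)  # False
```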
37. Keyboard Input
• Key (“scan”) codes are converted to ASCII
• ASCII code sent to host computer
• Received by the host as a “stream” of data
• Stored in buffer
• Processed
• Etc.
pp. 69
38. Shift Key
• Inhibits bit 5 in the ASCII code

Key(s)      ASCII code       Character
            6 5 4 3 2 1 0
a           1 1 0 0 0 0 1    a
Shift + a   1 0 0 0 0 0 1    A
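The effect of the Shift key on a letter can be sketched as a bit mask that clears bit 5. The function name `shift` is hypothetical, and the sketch applies only to the letters a–z; other keys do not follow this rule.

```python
BIT5 = 0b0100000  # bit 5 of the 7-bit code

def shift(ch):
    """Clear bit 5 of a lower-case letter's ASCII code (a-z only)."""
    if not "a" <= ch <= "z":
        raise ValueError("this sketch only handles the letters a-z")
    return chr(ord(ch) & ~BIT5)

print(format(ord("a"), "07b"))         # 1100001
print(format(ord(shift("a")), "07b"))  # 1000001
print(shift("a"))                      # A
```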
39. Control Key
• Inhibits bits 5 & 6 in the ASCII code

Key(s)     ASCII code       Character
           6 5 4 3 2 1 0
c          1 1 0 0 0 1 1    c
Ctrl + c   0 0 0 0 0 1 1    ETX (control code)
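The Control key can be sketched the same way, as a mask that clears bits 5 and 6 and keeps the low five bits. The function name `ctrl` is hypothetical.

```python
LOW_FIVE_BITS = 0b0011111  # keeps bits 0-4, clears bits 5 and 6

def ctrl(ch):
    """Return the 7-bit code produced by clearing bits 5 and 6."""
    return ord(ch) & LOW_FIVE_BITS

print(format(ord("c"), "07b"))   # 1100011
print(format(ctrl("c"), "07b"))  # 0000011
print(ctrl("c") == 0x03)         # True: 03 hexadecimal is ETX
```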
40. Other Input
• OCR – optical character recognition
• Bar code readers
• Voice/audio input
• Punched cards
• Images / objects
• Pointing devices
pp. 69-86
43. Bar Codes
• An automatic identification (Auto ID)
technology that streamlines identification
and data collection
• See
http://www.digital.net/barcoder/barcode.html
45. Voice/audio Input
• Input device: microphone
• Audio input is “digitized” and stored
• Processed in two ways
– As is (no recognition)
– Recognized and converted to alphanumeric data
(ASCII)
(Figure: microphone input → Digitize → 10110010…)
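Digitizing can be sketched as sampling a signal at regular intervals and rounding each sample to one of a fixed number of levels. The example below is an illustration only: the 440 Hz test tone, the 8000 samples-per-second rate, the 8-bit sample size, and the function name `digitize` are all assumptions, not values from the slides.

```python
import math

def digitize(signal, sample_rate, num_samples, bits=8):
    """Sample signal(t), with values in [-1, 1], and quantize to unsigned integers."""
    levels = 2 ** bits
    samples = []
    for i in range(num_samples):
        t = i / sample_rate                        # time of this sample in seconds
        value = max(-1.0, min(1.0, signal(t)))     # clamp to the expected range
        samples.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return samples

def tone(t):
    """Assumed test input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

samples = digitize(tone, sample_rate=8000, num_samples=8)
print([format(s, "08b") for s in samples])
```

Each printed item is one stored sample; a recognizer would take such samples as its input, while "as is" processing stores or plays them back unchanged.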
51. Objects
• Images made of geometrically definable
shapes
• Offer efficiency, flexibility, small size, etc.
53. Pointing Devices
• Originally used for specifying coordinates
(x, y) for graphical input
• Today used as general-purpose devices for
“graphical user interfaces” (GUIs)