The table here shows all the standard 7-bit ASCII-coded characters in decimal, octal, and hexadecimal form. This set is sometimes referred to as the "low-8" bit characters, since the highest bit in 8-bit representations is a zero.
The other possible 8-bit characters beyond 127 are not part of ASCII, and can vary depending on system or language settings. Many of the schemes devised for handling more characters have 7-bit ASCII as a subset. Examples of this include UTF-8 and ISO 8859-1.
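The subset relationship can be checked directly. The following is a minimal sketch using Python's built-in codecs; the variable names are only illustrative. It encodes all 128 ASCII code points under three encodings and then shows one character beyond 127 where the encodings diverge.

```python
# Every 7-bit ASCII code point encodes to the same single byte
# under ASCII, UTF-8 and ISO 8859-1 (Latin-1).
ascii_text = "".join(chr(code) for code in range(128))

as_ascii = ascii_text.encode("ascii")
as_utf8 = ascii_text.encode("utf-8")
as_latin1 = ascii_text.encode("iso-8859-1")

print(as_ascii == as_utf8 == as_latin1)  # True

# Beyond 127 the encodings no longer agree.
e_acute = "\u00e9"  # e with acute accent, code point 233
print(e_acute.encode("iso-8859-1"))  # b'\xe9'
print(e_acute.encode("utf-8"))       # b'\xc3\xa9'
```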
On Fedora Linux systems, the default language and character set settings are defined by the LANG= entry in the file /etc/sysconfig/i18n (newer, systemd-based releases read it from /etc/locale.conf instead).
Char  Dec  Oct   Hex  | Char  Dec  Oct   Hex  | Char  Dec  Oct   Hex  | Char  Dec  Oct   Hex
---------------------------------------------------------------------------------------------
(nul)   0  0000  0x00 | (sp)   32  0040  0x20 | @      64  0100  0x40 | `      96  0140  0x60
(soh)   1  0001  0x01 | !      33  0041  0x21 | A      65  0101  0x41 | a      97  0141  0x61
(stx)   2  0002  0x02 | "      34  0042  0x22 | B      66  0102  0x42 | b      98  0142  0x62
(etx)   3  0003  0x03 | #      35  0043  0x23 | C      67  0103  0x43 | c      99  0143  0x63
(eot)   4  0004  0x04 | $      36  0044  0x24 | D      68  0104  0x44 | d     100  0144  0x64
(enq)   5  0005  0x05 | %      37  0045  0x25 | E      69  0105  0x45 | e     101  0145  0x65
(ack)   6  0006  0x06 | &      38  0046  0x26 | F      70  0106  0x46 | f     102  0146  0x66
(bel)   7  0007  0x07 | '      39  0047  0x27 | G      71  0107  0x47 | g     103  0147  0x67
(bs)    8  0010  0x08 | (      40  0050  0x28 | H      72  0110  0x48 | h     104  0150  0x68
(ht)    9  0011  0x09 | )      41  0051  0x29 | I      73  0111  0x49 | i     105  0151  0x69
(nl)   10  0012  0x0a | *      42  0052  0x2a | J      74  0112  0x4a | j     106  0152  0x6a
(vt)   11  0013  0x0b | +      43  0053  0x2b | K      75  0113  0x4b | k     107  0153  0x6b
(ff)   12  0014  0x0c | ,      44  0054  0x2c | L      76  0114  0x4c | l     108  0154  0x6c
(cr)   13  0015  0x0d | -      45  0055  0x2d | M      77  0115  0x4d | m     109  0155  0x6d
(so)   14  0016  0x0e | .      46  0056  0x2e | N      78  0116  0x4e | n     110  0156  0x6e
(si)   15  0017  0x0f | /      47  0057  0x2f | O      79  0117  0x4f | o     111  0157  0x6f
(dle)  16  0020  0x10 | 0      48  0060  0x30 | P      80  0120  0x50 | p     112  0160  0x70
(dc1)  17  0021  0x11 | 1      49  0061  0x31 | Q      81  0121  0x51 | q     113  0161  0x71
(dc2)  18  0022  0x12 | 2      50  0062  0x32 | R      82  0122  0x52 | r     114  0162  0x72
(dc3)  19  0023  0x13 | 3      51  0063  0x33 | S      83  0123  0x53 | s     115  0163  0x73
(dc4)  20  0024  0x14 | 4      52  0064  0x34 | T      84  0124  0x54 | t     116  0164  0x74
(nak)  21  0025  0x15 | 5      53  0065  0x35 | U      85  0125  0x55 | u     117  0165  0x75
(syn)  22  0026  0x16 | 6      54  0066  0x36 | V      86  0126  0x56 | v     118  0166  0x76
(etb)  23  0027  0x17 | 7      55  0067  0x37 | W      87  0127  0x57 | w     119  0167  0x77
(can)  24  0030  0x18 | 8      56  0070  0x38 | X      88  0130  0x58 | x     120  0170  0x78
(em)   25  0031  0x19 | 9      57  0071  0x39 | Y      89  0131  0x59 | y     121  0171  0x79
(sub)  26  0032  0x1a | :      58  0072  0x3a | Z      90  0132  0x5a | z     122  0172  0x7a
(esc)  27  0033  0x1b | ;      59  0073  0x3b | [      91  0133  0x5b | {     123  0173  0x7b
(fs)   28  0034  0x1c | <      60  0074  0x3c | \      92  0134  0x5c | |     124  0174  0x7c
(gs)   29  0035  0x1d | =      61  0075  0x3d | ]      93  0135  0x5d | }     125  0175  0x7d
(rs)   30  0036  0x1e | >      62  0076  0x3e | ^      94  0136  0x5e | ~     126  0176  0x7e
(us)   31  0037  0x1f | ?      63  0077  0x3f | _      95  0137  0x5f | (del) 127  0177  0x7f
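The three numeric columns are just the same code written in different bases. As a small sketch, the hypothetical helper `ascii_row` below (not part of any library) formats a code the way the chart does: decimal, four-digit octal, and hexadecimal with a `0x` prefix.

```python
def ascii_row(code: int) -> str:
    """Format one character code as 'dec oct hex', matching the chart's columns."""
    return f"{code:3d} {code:04o} 0x{code:02x}"

# Print the numeric columns for nul, nl, 'A' and del.
for code in (0, 10, 65, 127):
    print(ascii_row(code))
```

Going the other way, `ord("A")` gives 65 and `chr(0x41)` gives `"A"`.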
The various abbreviations are discussed below. The usual convention for representing the control characters in print is a caret (^) followed by the corresponding character two columns away, in the range 0x3f--0x5f. The ^ usually reads as "control-"; for example, ^A (soh or 0x01) is called "Control-A".
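"Two columns away" in the chart means a distance of 64 (0x40), so the caret convention amounts to flipping a single bit. The sketch below illustrates this; the function names `caret_notation` and `control_code` are made up for the example.

```python
def caret_notation(code: int) -> str:
    """Return the caret form (such as '^A') of a control character code."""
    if code < 0x20 or code == 0x7F:
        # Flipping bit 0x40 moves the code two chart columns over.
        return "^" + chr(code ^ 0x40)
    raise ValueError(f"{code:#04x} is not a control character")

def control_code(symbol: str) -> int:
    """Return the control code for the character written after the caret."""
    return ord(symbol.upper()) ^ 0x40

print(caret_notation(0x01))  # ^A  (soh)
print(caret_notation(0x1B))  # ^[  (esc)
print(caret_notation(0x7F))  # ^?  (del)
print(control_code("c"))     # 3   (etx)
```

The same XOR covers del as well: 0x7f flips to 0x3f, the question mark, which is why the range starts at 0x3f.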
Name | Caret | Description |
nul | ^@ | This used to correspond to unmarked paper tape and was thus ignored on reading, at least before the beginning of the data or after the end of it. It has now become mostly used for terminating variable-length strings of characters, the role that the etx code once had. |
soh | ^A | Start of header. Originally used to indicate that the following characters constituted some kind of header block. Now it is just known as "Control-A". |
stx | ^B | Start of text. Indicated that the following characters were some kind of text. In some text processing systems, such as WordStar, it would indicate bold text, probably through the mnemonic device of Control-B for Bold. |
etx | ^C | End of text. This would indicate the end of text blocks. Nowadays, this function is served by the nul, to terminate strings of text in C programs. Instead, the Control-C command is used to break out of programs in CP/M and MS-DOS, and many Unix systems also are configured to use this character through the stty intr setting. |
eot | ^D | End of transmission. The original usage remains in some UNIX shells, where "control-D" is used to signify the end of input. Of course, the stty eof setting can be changed, but this is the traditional default. |
enq | ^E | Enquiry, "Are you ready?" meant to be used for some kind of handshaking, but nowadays, this is just "control-E". The reply would be ack for yes and nak for no. |
ack | ^F | Acknowledge, "Yes I am ready", part of the same kind of handshaking, but nowadays, this is just "control-F". |
bel | ^G | This would ring a bell on the receiving teletype. More recent hardware tends to beep instead, as mechanical bells have been superseded by beeper circuitry. |
bs | ^H | This backspaces, moves the printing head or equivalent one position back. Often used as an erase character, as the backwards movement also indicates that the character is overwritten. This is also where the ^H convention of "retracting" something said^H^H^H^H^H written online comes from. |
ht | ^I | This is the horizontal tab character. It is commonly seen in program source to indent the code in a uniform manner. The problem is that everyone wants a different number of positions for their code, and the default of 8 characters is usually too big. Tab characters can also be found as field separators in text-form data files; this works as long as no tab characters occur in the text itself. The us character that was originally meant for this job doesn't seem to be used for it at all. |
nl | ^J | This is the New-line, or Line-Feed (lf) character. Its original job was to move the paper one line down, and together with cr it would complete the transition to a new line. Unix systems have dispensed with cr; there, a single nl separates lines. |
vt | ^K | Vertical tab. Its handling varies a lot. Some systems used it to move up a line, the opposite direction of nl. Others don't do anything with it. |
ff | ^L | Form-feed usually tells printers to eject the page now being printed and start a new page. Terminals may clear the screen. In text files, these characters may also be interpreted as page breaks. This is often shown as ^L. |
cr | ^M | On teletypes, the carriage return moved the carriage back to the beginning of the line. The following line-feed would then advance one line. Many operating systems' text files still follow this convention, requiring two characters between each line of text. Seeing that this wasn't really necessary, some other operating systems decided that they could do well without one or the other of these: Unix systems only use nl and old Apple systems only used cr. |
so | ^N | Shift out. This would allow use of different color ribbon or alternate character sets for the following text. |
si | ^O | Shift in. This would go back to the regular character set, or select the black ribbon or do whatever else would be needed to undo the changes in the so. |
dle | ^P | Data Link Escape. Originally a variation on the escape-char idea, that the following characters would be interpreted as out-of-band messaging of some sort. By analogy with "control-B for Bold", the control-P for Printer would cause screen output to be redirected to an attached printer as well. On PR1ME systems this would cause the program to break, like Control-C usually does elsewhere. |
dc1 | ^Q | Device Control 1, also known as XON, is used with software handshaking to restart output that was stopped with XOFF. |
dc2 | ^R | Device Control 2. Once meant to turn on the paper tape puncher, it doesn't seem to actually control anything in most other implementations. |
dc3 | ^S | Device Control 3 or XOFF is used with software handshaking to stop the stream of output, presumably so that some device or reader can catch up. This stream is restarted with XON. |
dc4 | ^T | Device Control 4. Once meant to turn off the paper tape puncher, it doesn't seem to actually control anything in most other implementations. |
nak | ^U | Negative Acknowledge, "No I am not ready", part of the same kind of handshaking as used with enq and ack, but nowadays, this is often the stty kill input command Control-U, that wipes out the entire typed-in line and allows starting anew. By the analogy of Control-B for Bold, Control-U for Underline has also been seen in text-processing programs such as WordStar. |
syn | ^V | Synchronous idle. These characters were ignored; they were just there to keep things moving on a synchronous link in place of any real data. |
etb | ^W | End of transmission block. The following data may be a checksum or something similar. |
can | ^X | Cancel, ignore the information or text just sent. |
em | ^Y | End of Medium, this indicates that there is nothing to follow on this tape. |
sub | ^Z | Substitute. Meant to stand in for a character that was found to be invalid or in error. It is better known as Control-Z: CP/M and MS-DOS systems used ^Z to indicate the end of a file, though the original meanings of em (above) or fs (below) seem to be much closer to this. |
esc | ^[ | Escape. Used with ANSI control of terminals and printers, to indicate that the following characters have some special meaning to the device and are not part of the text. It is also used in the vi editor to go back to command mode. |
fs | ^\ | File Separator. This would have been the original End-of-file character code that should have been used instead of ^Z. |
gs | ^] | Group Separator. The ^] (control-right-bracket) is used for temporarily escaping from telnet sessions. No idea how to type this on a European keyboard, however... |
rs | ^^ | Record separator. Not used as such, seems everyone uses their variety of line-break instead. Like us, this would have simplified things when wanting to have line-breaks as part of the data elements. |
us | ^_ | Unit Separator, or field-separator, for the units or data-elements inside records. It's just unfortunate that we seem to always want to use commas, semicolons, tabs, and suchlike as field-separators instead, with predictable and complicating consequences when wanting any of these characters as part of the actual data. |
sp | | This is the interword space. It sits right at the boundary between the control and printable characters. Like ht, nl and ff, it is considered a non-printing separating character, but it also appears as the most common and very important printing character in most texts. |
del | ^? | On paper tapes, this character was all holes, and thus served to delete any mistaken character that would be present. Thus, it was to be ignored in a data stream. Current usage is as an alternative to the back-space for erasing single characters, but it is interesting to note the difference between the character meaning "perform the deletion", and the character on the paper tape actually being the result of a deletion. |
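The rs and us entries above suggest how field data could be stored without quoting or escaping. The following is only an illustration of that idea, not an established file format; `dump_records` and `load_records` are hypothetical names. Because the separators are control characters, commas, tabs, semicolons and even line breaks can appear inside the fields unharmed.

```python
US = "\x1f"  # unit separator: between fields of one record
RS = "\x1e"  # record separator: between records

def dump_records(records: list[list[str]]) -> str:
    """Join fields with us and records with rs; no quoting is needed."""
    return RS.join(US.join(fields) for fields in records)

def load_records(blob: str) -> list[list[str]]:
    """Split a blob produced by dump_records back into records and fields."""
    return [record.split(US) for record in blob.split(RS)]

rows = [
    ["Smith, John", "line one\nline two"],  # comma and newline in the data
    ["a\tb", "x;y"],                        # tab and semicolon in the data
]
blob = dump_records(rows)
print(load_records(blob) == rows)  # True
```

The scheme only holds as long as the data itself never contains 0x1e or 0x1f, which is the same caveat that applies to tabs or commas, just far less likely to bite.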