A few words about encodings
Anyone who already knows all about this can skip the article; for everyone else, I will explain how the various encodings came about and the web-design problems associated with them.
A bit of history
An encoding is a character table in which each letter of the alphabet (as well as each digit and special character) is assigned its own unique number, the character code.
Only half of the table is standardized: the so-called ASCII code, the first 128 characters, which include the letters of the Latin alphabet. There is never any problem with these. The second half of the table (256 characters in total, by the number of states one byte can take) is reserved for national characters, and in each country this part is different. But only in Russia did they manage to come up with as many as five different encodings. "Different" means that the same character corresponds to a different numeric code in each of them, so if we determine the encoding of a text incorrectly, we will see completely unreadable gibberish.
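To make this concrete, here is a small sketch in modern Python (an illustration of mine, not something from the article's era): the very same Cyrillic letter is assigned a different byte value in each of the five Russian encodings discussed below.

```python
# Encode one Cyrillic letter under each of the five Russian encodings.
# The byte values differ, which is exactly why a wrong guess about the
# encoding turns a text into unreadable gibberish.
letter = "Ж"  # one and the same Cyrillic capital letter everywhere
for codec in ("koi8_r", "cp866", "mac_cyrillic", "cp1251", "iso8859_5"):
    code = letter.encode(codec)[0]
    print(f"{codec:>12}: 0x{code:02X}")
```

Decoding works the same way in reverse: bytes produced with one of these codecs, when interpreted through another, come out as a completely different character.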
The encodings arose historically. The first widely used Russian encoding was called KOI-8. It was devised when the UNIX system was being adapted to the Russian language. That was back in the seventies, before the advent of desktop computers, and to this day KOI-8 is considered the main encoding on UNIX.
Then the first personal computers appeared, and the triumphant march of DOS began. Instead of using the already-invented encoding, Microsoft decided to make its own, incompatible with anything else. So the DOS encoding (also known as code page 866) appeared. It introduced, among other things, special characters for drawing frames, which were widely used in programs written for DOS, for example in Norton Commander.
In parallel with IBM-compatible PCs, Macintosh computers were developing as well. Although their share in Russia is very small, the need for Russification existed there too, and, of course, yet another encoding was invented: MAC.
Time passed, and in 1990 Microsoft released the first successful versions of Windows, 3.0–3.11, and with them support for national languages. And the same trick was pulled as with DOS: for reasons unknown, none of the previously existing encodings was supported (unlike in OS/2, which adopted the DOS encoding as its standard); instead, a new Win encoding (code page 1251) was proposed. De facto, it has become the most widespread encoding in Russia.
And finally, the fifth encoding is associated not with any particular company but with attempts to standardize encodings at the level of the entire planet. This was handled by ISO, the International Organization for Standardization. And guess what they did with the Russian language? Instead of adopting any of the above as the "standard Russian" encoding, they came up with yet another one (!) and gave it the long, unpronounceable name ISO-8859-5. Of course, it too was incompatible with everything else. Today this encoding is used practically nowhere; it seems to survive only in Oracle databases. At least I have never seen a text in this encoding. Nevertheless, it is supported by all browsers.
Work is now underway on a new universal encoding (UNICODE), which is supposed to fit all the languages of the world into a single code table. Then there will definitely be no problems. For this, 2 bytes were allocated per character, so the maximum number of characters in the table grew to 65,535. But a great deal of time will pass before everyone switches to UNICODE.
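As a present-day Python sketch (again my illustration, not something from the article's time): under UNICODE a character has one and the same code number on every platform, and the two-byte representation mentioned above corresponds to what is now called UTF-16.

```python
letter = "Ж"
# The Unicode code point is the same regardless of platform or country.
print(hex(ord(letter)))            # 0x416
# The 2-byte (big-endian) representation of that code point.
print(letter.encode("utf-16-be"))  # b'\x04\x16'
```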
Web Design & Encodings
Now, what do all these encodings have to do with web design? The problem lies in the web servers and the browsers. Both components have to communicate in one language and one encoding; only then will the browser understand what the server is sending it.
On the server side, we must configure the system so that it announces in advance which encoding the page will be sent in. The browser, in turn, must accept this message and tune itself to the corresponding display. If everything is done correctly, there are no problems. But reality makes its corrections: an incorrect web-server configuration can lead to the server announcing that the page is in the Win-1251 encoding while actually sending it in KOI-8. The browser, of course, gets confused, because it cannot determine the encoding of a page on its own. It simply follows the server's instructions and, accordingly, displays the page incorrectly.
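Here is a minimal sketch of this negotiation using Python's standard library (my illustration; the article does not prescribe any particular server software). The server announces the encoding in the Content-Type header, and the client, standing in for a browser, decodes the body using exactly that announcement:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Win1251Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The bytes on the wire are cp1251, and the header says so.
        # If the header claimed KOI-8 instead, the client below would
        # confidently decode the page into gibberish.
        body = "<html><body>Привет!</body></html>".encode("cp1251")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=windows-1251")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Win1251Handler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    announced = resp.headers.get_content_charset()  # what the server claimed
    page = resp.read().decode(announced)            # trust it, as browsers do

print(announced)
server.shutdown()
```

The client here never inspects the bytes themselves; it relies entirely on the server's announcement, which is precisely why a wrong announcement produces a wrongly displayed page.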
There is a way to specify the page encoding not on the server but directly in the HTML code. To do this, a special form of the META tag with the charset parameter is used, which sets the desired encoding. For example, for pages written in the Win-1251 encoding, the corresponding code looks like this:
<meta http-equiv="content-type" content="text/html; charset=Windows-1251">