The end of us-ascii on the Internet? - Discussion groups at eGospodarka.pl

Group: pl.internet.polip

Date: 2008-04-01 18:55:40
Subject: The end of us-ascii on the Internet?
From: "Andrzej P. Wozniak" <u...@p...onet.pl.invalid>
I don't know whether anyone has noticed the publication of new RFCs on
abandoning us-ascii on the Internet, so here is a summary.

1. RFC 5198
Unicode Format for Network Interchange
Category: Standards Track

This document proposes to establish "Net-Unicode" as a new
standardized text transmission form for the Internet, to serve as an
internationalized alternative for NVT ASCII when specified in new --
and, where appropriate, updated -- protocols. UTF-8 [RFC3629] is
chosen for the coding because it has good compatibility properties
with ASCII and for other reasons discussed in the existing IETF
character set policy [RFC2277].

In short: we replace us-ascii with utf-8, use only CRLF as the line ending,
and throw out tab, backspace, etc. - dull and obvious stuff. The full text is
available here: http://www.rfc-editor.org/rfc/rfc5198.txt
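
As a rough illustration of the summary above, here is a minimal Python sketch of
what producing such text might look like. The function name `to_net_unicode` is
made up for this example. Dropping every control character other than the line
ending is a simplification of "tab, backspace, etc.", and the NFC normalization
step comes from my reading of RFC 5198 rather than from the summary, so treat
both as assumptions and check the RFC before relying on them.

```python
import unicodedata


def to_net_unicode(text):
    """Return text as UTF-8 bytes with CRLF line endings and no other controls.

    Hypothetical helper, not an official implementation of RFC 5198.
    """
    # Assumption: normalize to NFC so equivalent sequences encode identically.
    text = unicodedata.normalize("NFC", text)
    # Fold every existing line-ending style into "\n" first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop all control characters (category Cc: tab, backspace, ...) except "\n".
    cleaned = "".join(
        ch for ch in text if ch == "\n" or unicodedata.category(ch) != "Cc"
    )
    # Emit CRLF as the only line ending, then encode as UTF-8.
    return cleaned.replace("\n", "\r\n").encode("utf-8")


print(to_net_unicode("a\tb\nc\r\nd\re\x08"))
```

Silently deleting characters is the bluntest possible choice; a real sender might
prefer to reject such input or replace tabs with spaces.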

The second RFC is more interesting. It concerns only domain names, but it also
means abandoning utf-8.

2. RFC 5242
A Generalized Unified Character Code: Western European and CJK Sections
Category: Informational

Many issues have been identified with the use of general-purpose
character sets for internationalized domain names and similar
purposes. This memo specifies a fully unified coded character set
for scripts based on Latin, Greek, Cyrillic, and Chinese characters.

There are four important principles in this work:

1. If it looks alike, it is alike. The number of base characters
and marks should be minimized. Glyphs are more important than
character abstractions.

2. If it is the same thing, it is the same thing. Two symbols that
have the same semantic meaning in all contexts should be encoded
in a way that allows their identity to be discovered by removing
modifiers, rather than having to resort to external equivalence
tables.

3. For simplicity, when a character form can be evaluated on the
basis of either serif or sanserif fonts, the sanserif font is
always preferred.

4. The use of combining characters and modifiers is preferred to
adding more base characters.

Based on these principles, it becomes obvious that:

o Ligatures, digraphs, and final forms are constructed with special
modifiers so that relationships to basic forms are obvious.

o Symbols consisting of multiple marks are always constructed from
combining characters and positional modifiers; thus, the "i"
character is constructed from the vertical line symbol followed by
a combining dot above. Similarly "f" is composed of a centered
vertical line, a right hook in the top position, and an
appropriately-positioned composing hyphen.

This document draws strongly from the design and terminology of
Unicode [Unicode] but represents a radically different approach.
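
For comparison, present-day Unicode already uses the "base character plus
combining mark" idea for some characters, though not for "i" or "f" as the
excerpt above proposes. A small Python sketch using the standard `unicodedata`
module shows the difference; the helper name `decompose` is made up for this
example.

```python
import unicodedata


def decompose(ch):
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]


# U+00E9 splits into a base letter and a combining accent.
print(decompose("\u00e9"))
# "i" has no canonical decomposition in Unicode and stays one code point.
print(decompose("i"))
```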

I can't be bothered to elaborate, so I recommend reading it yourself:
http://www.rfc-editor.org/rfc/rfc5242.txt

--
Andrzej P. Woźniak u...@p...onet.pl (swap z<->h in the address)
...an accidental admin often doesn't know/isn't familiar with/can't do/
isn't in the habit of doing certain things. I know this from my own experience.
And that comes back to bite later. The admin himself included.
-- Mariusz Kruk on pl.comp.os.advocacy