Localization and Internationalization

From FSCI Wiki

Now the whole earth had one language and few words. And as men migrated from the east, they found a plain in the land of Shinar and settled there. And they said to one another, "Come, let us make bricks, and burn them thoroughly." And they had brick for stone, and bitumen for mortar. Then they said, "Come, let us build ourselves a city, and a tower with its top in the heavens, and let us make a name for ourselves, lest we be scattered abroad upon the face of the whole earth."

And the Lord came down to see the city and the tower, which the sons of men had built. And the Lord said, "Behold, they are one people, and they have all one language; and this is only the beginning of what they will do; and nothing that they propose to do will now be impossible for them. Come, let us go down, and there confuse their language, that they may not understand one another's speech"...

-Genesis

Rules of Thumb for Software Internationalization

Internationalized software must be easy to port to other locales. A locale defines a language and its specific cultural conventions. The process of adjusting internationalized software to a particular locale is called localization (commonly abbreviated as L10N). You can think of software internationalization as a prerequisite for localization. Localization consists of more than just translating the user interface. Consider North America and Britain, for instance. Seemingly, they use the same language. However, these locales differ not only in spelling (program vs. programme, realize vs. realise, color vs. colour, etc.) but also in certain cultural conventions such as date formatting (in Europe the date format is DDMMYYYY whereas in North America it's MMDDYYYY), currency, and measurement system. Other locales exhibit additional cultural differences. In Germany and other European countries, the decimal separator (also called the radix character) is a comma, e.g., 10,5. In North America, the separator is called the "decimal point" and, as the name suggests, it is written as a period: 10.5.
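The date and decimal conventions described above can be sketched in a few lines of Python. This is only an illustration: the CONVENTIONS table, its keys and the two helper functions are hypothetical names written by hand for this example, and a real application would take these values from the platform's locale data instead.

```python
from datetime import date

# Hypothetical, hand-written convention table (illustration only).
CONVENTIONS = {
    "en_US": {"date_order": ("month", "day", "year"), "date_sep": "/", "decimal_sep": "."},
    "en_GB": {"date_order": ("day", "month", "year"), "date_sep": "/", "decimal_sep": "."},
    "de_DE": {"date_order": ("day", "month", "year"), "date_sep": ".", "decimal_sep": ","},
}


def format_date(d: date, locale_id: str) -> str:
    """Arrange day, month and year in the order the locale expects."""
    conv = CONVENTIONS[locale_id]
    parts = {"day": f"{d.day:02d}", "month": f"{d.month:02d}", "year": f"{d.year:04d}"}
    return conv["date_sep"].join(parts[name] for name in conv["date_order"])


def format_decimal(value: float, locale_id: str, places: int = 1) -> str:
    """Format a number, then swap in the locale's decimal separator."""
    text = f"{value:.{places}f}"
    return text.replace(".", CONVENTIONS[locale_id]["decimal_sep"])


for loc in CONVENTIONS:
    print(loc, format_date(date(2001, 3, 4), loc), format_decimal(10.5, loc))
```

The same date, 4 March 2001, comes out with the month first for en_US and the day first for the other two entries, which is exactly the ambiguity that makes hard-coded date formats risky.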

Locales also use different character encodings (7-bit ASCII, EBCDIC, Unicode) and scripts (Latin, Hebrew, Cyrillic). There are a few basic guidelines to follow in order to ensure easy software localization:

  • Avoid hard-coded literal text in your code. Instead, use string tables or environment variables.
  • Use wide characters instead of narrow characters. C and C++ support the wchar_t datatype. C++ also provides wide string and stream types such as std::wstring and std::wcout.
  • Compose decimal numbers and dates from dynamic lexical units. Such lexical units are strings that contain a locale-specific representation of a fraction sign, currency symbol, and date format separators.
  • Don't assume anything about text directionality. Semitic languages such as Arabic and Hebrew are written right-to-left, as opposed to European languages. Consequently, menus, frames and pages are aligned differently in such languages. Some East Asian languages can also be written vertically, top to bottom.
  • Be ready to deal with several calendars. Other calendars such as the Muslim, Chinese and Hebrew calendars may be used in addition to the Gregorian calendar in certain locales.
  • Avoid any assumption about religious matters and holidays. For example, in non-Christian countries, December 25 is usually an ordinary business day and so is Sunday. An internationalized banking system should be ready to process transactions from foreign branches on Sunday, for example.
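The first guideline above (string tables instead of hard-coded text) might look roughly like this in Python. STRING_TABLES, DEFAULT_LOCALE and translate are hypothetical names made up for this sketch; in a real program the tables would normally be loaded from external resource files so that translators never touch the source code.

```python
# Hypothetical string tables keyed by locale (illustration only).
STRING_TABLES = {
    "en": {"greeting": "Hello, {name}!", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo, {name}!"},
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale_id: str, **params: str) -> str:
    """Look up `key` for `locale_id`, falling back to the default locale."""
    table = STRING_TABLES.get(locale_id, {})
    template = table.get(key)
    if template is None:
        # Missing translation or unknown locale: use the default table.
        template = STRING_TABLES[DEFAULT_LOCALE][key]
    return template.format(**params)


print(translate("greeting", "de", name="Anna"))
print(translate("farewell", "de"))
```

The fallback to a default table is a design choice of this sketch: an untranslated message is shown in the default language rather than causing an error.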

Locale

A locale denotes a specific language along with its conventions, such as date, currency, calendar and number formats. It also includes the following:

  1. Names of the months
  2. Days of the week
  3. First day of the week
  4. Collation sequencing (Sort order)
  5. Time Zone information
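As a rough illustration of the items listed above, a locale can be modelled as a small record. The LocaleInfo class and its field names are assumptions made for this sketch, and the two sample locales are filled in by hand; the collation rule shown is a simplified alphabet-order comparison, not a full collation algorithm.

```python
from dataclasses import dataclass

MONDAY_FIRST = ("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")


@dataclass(frozen=True)
class LocaleInfo:
    """Hypothetical container for a few locale properties."""
    language: str
    day_names: tuple        # always stored Monday-first
    first_day_of_week: int  # index into day_names
    alphabet: str           # collation order of the letters

    def week(self) -> tuple:
        """Day names rotated so the locale's first day comes first."""
        i = self.first_day_of_week
        return self.day_names[i:] + self.day_names[:i]

    def sort_key(self, word: str) -> list:
        """Rank each letter by its position in the locale's alphabet."""
        rank = {ch: n for n, ch in enumerate(self.alphabet)}
        return [rank.get(ch, len(rank)) for ch in word.lower()]


EN_US = LocaleInfo("en", MONDAY_FIRST, 6, "abcdefghijklmnopqrstuvwxyz")
SV_SE = LocaleInfo("sv", MONDAY_FIRST, 0, "abcdefghijklmnopqrstuvwxyzåäö")

words = ["ärta", "åka", "zebra"]
print(sorted(words))                      # plain code point order
print(sorted(words, key=SV_SE.sort_key))  # Swedish alphabet order
```

Sorting by code point puts "ärta" before "åka", while the Swedish alphabet places å before ä, so the two results differ. This is why the sort order has to be part of the locale.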

Writing Systems

A writing system, or script, is not a language; it is a means of conveying information through written language. Writing systems can be classified as follows.

Script Type

Alphabetic: Individual units for writing are composed of consonants, and in some cases vowels. When combined they spell out words phonetically. Eg: Indic, Arabic, Latin, Greek etc.

Syllabic: The individual units for writing are composed of syllables. Eg: Japanese kana and Korean Hangul

Ideographic: A writing system which uses pictures or symbols to represent words. Eg: Chinese

Context dependent Glyph Shaping

Positional: The shape of the character changes depending on its position in the word. Eg: Arabic, Greek.

Ligatures: Characters combine to form a different shape when they appear next to one another. In Indic scripts ligatures are mandatory.

Cursive: The letters are joined while writing. Arabic is an example; printed English is not.

Text Direction

Left to right: Text is written left to right horizontally. Eg: Indic, English

Bidirectional: Examples are Arabic and Hebrew. Text is written right to left, while numbers and Latin words are written left to right.

Vertical: Chinese and Japanese text can be written vertically.
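The direction of each character is recorded in the Unicode character database, and Python's standard unicodedata module exposes it. The sketch below uses that module to show why Arabic and Hebrew text with embedded digits is bidirectional; the helper names bidi_classes and has_rtl are made up for this example.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def has_rtl(text: str) -> bool:
    """True if any character is strongly right-to-left (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# Latin letter, Hebrew letter, Arabic letter, European digit
for ch in ["a", "\u05d0", "\u0627", "7"]:
    print(repr(ch), unicodedata.bidirectional(ch))

print(has_rtl("abc 123"))
print(has_rtl("\u05e9\u05dc\u05d5\u05dd 123"))
```

Latin letters are class L (left to right), Hebrew letters are R, Arabic letters are AL and European digits are EN. A rendering engine combines these classes to decide the display order; this sketch only inspects them.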

Other Characteristics

Diacritics: Special marks used for accents, tones, and vowels, or to uniquely identify a character. In some writing systems such as Indic and Thai, diacritics can span multiple characters.

Word separator: Most languages use a space as the word separator. Exceptions are Chinese, Thai, and Japanese.

Punctuation: Marks are inconsistent across writing systems
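Two of the characteristics above are easy to observe with Python's standard library. The sketch below decomposes accented Latin letters into a base letter plus combining marks, and shows that splitting on spaces does not find word boundaries in Chinese. The function strip_diacritics is a hypothetical helper for illustration; removing diacritics is lossy and is not a correct operation for many languages.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, then drop characters with a combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("café"))
print(len(unicodedata.normalize("NFD", "é")))  # base letter + combining accent

# Splitting on whitespace works for English but not for Chinese.
print(len("I love Beijing".split()))
print(len("我爱北京".split()))
```

The Chinese sentence comes back as a single item because it contains no spaces. Finding its word boundaries requires dictionary-based or statistical segmentation, which is outside the scope of this sketch.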

A detailed description of the above writing systems can be found on the Wikipedia page on Writing Systems.

Unicode

Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a character repertoire, an encoding methodology and set of standard character encodings, a set of code charts for visual reference, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and rules for normalization, decomposition, collation and rendering.

The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language and modern operating systems.
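The difference between a Unicode code point and its UTF encodings can be seen directly in Python, whose strings are sequences of code points. The describe function below is a hypothetical helper written for this illustration.

```python
def describe(ch: str) -> dict:
    """Report the code point of a character and its encoded sizes."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-le")),
    }


# Latin letter, accented Latin letter, Malayalam letter, emoji
for ch in ["A", "é", "അ", "😀"]:
    print(ch, describe(ch))
```

Each character has exactly one code point, but UTF-8 uses between one and four bytes for it, and UTF-16 uses two or four. Code that assumes one byte per character therefore breaks on most non-English text.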

More details at

  1. Wikipedia page on Unicode
  2. Unicode.org

Internationalized Resource Identifiers

Internationalized Resource Identifiers (IRIs) are also known as multilingual Web addresses. Currently, Web addresses are typically expressed using Uniform Resource Identifiers (URIs). This restricts Web addresses to a small number of characters: basically, just upper and lower case letters of the English alphabet, European numerals and a small number of symbols. Recent developments enable you to add non-ASCII characters to Web addresses.
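Software that only understands URIs can still handle such an address by mapping it to ASCII first. The sketch below does this with Python's standard library: the host name is converted with the built-in "idna" codec and the remaining parts are percent-encoded as UTF-8. The function name iri_to_uri is an assumption of this example, and the sketch deliberately handles only simple addresses with a host and no port or user information.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map a simple IRI to an ASCII-only URI (no port or userinfo)."""
    parts = urlsplit(iri)
    # Host names use the ASCII-compatible "xn--" form, not percent-encoding.
    host = parts.hostname.encode("idna").decode("ascii")
    # Other components are percent-encoded as UTF-8; "%" is kept so that
    # already escaped sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://bücher.example/wiki/café"))
```

Note that the two halves of the address are treated differently: the domain name becomes xn--bcher-kva.example, while the path becomes /wiki/caf%C3%A9.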

Detailed information is available from An Introduction to Multilingual Web Addresses

Input Methods

Input methods are applications or software components that convert users' keystrokes into symbols, characters or words.

An input method editor (IME) is a program or operating system component that allows computer users to enter characters and symbols not found on their keyboard. This, for instance, allows the user of a Western keyboard to input Chinese, Japanese, Korean and Indic characters.
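The core idea, turning keystroke sequences into characters that are not on the keyboard, can be sketched as a table lookup with longest-match conversion. KEY_TABLE and convert are hypothetical names, and the table covers only a handful of Japanese kana for illustration. Real input methods are far more elaborate: they typically show candidates, let the user choose among them and use dictionaries.

```python
# Hypothetical keystroke table: Latin key sequences to a few hiragana.
KEY_TABLE = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く",
    "sa": "さ", "su": "す", "shi": "し",
}


def convert(keystrokes: str) -> str:
    """Greedy conversion: always consume the longest matching sequence."""
    longest = max(len(key) for key in KEY_TABLE)
    out = []
    i = 0
    while i < len(keystrokes):
        for size in range(min(longest, len(keystrokes) - i), 0, -1):
            chunk = keystrokes[i:i + size]
            if chunk in KEY_TABLE:
                out.append(KEY_TABLE[chunk])
                i += size
                break
        else:
            # No entry starts here: pass the keystroke through unchanged.
            out.append(keystrokes[i])
            i += 1
    return "".join(out)


print(convert("sushi"))
print(convert("kaki"))
```

Trying the longest sequence first matters: "shi" must be matched as one unit before any shorter prefix is considered.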

This is intended as a non-exhaustive list of input methods for UNIX platforms.


Name      | Languages supported                                        | Implementations supported
----------+------------------------------------------------------------+------------------------------------------------------
SCIM      | Multiple languages, including CJK                          | GTK+, Qt and XIM
uim       | Multiple languages, including CJK                          | GTK+, Qt, XIM, Leim, Tty (Unix) and TSM (Mac OS X)
xcin      | Mainly traditional Chinese; adapted for simplified Chinese | XIM
InputKing | Traditional Chinese and simplified Chinese                 | Browser based
im-ja     | Japanese                                                   | GTK+ and XIM
kinput2   | Japanese                                                   | XIM, kinput2 protocol
ami       | Korean                                                     | XIM
imhangul  | Korean                                                     | GTK+
Nabi      | Korean                                                     | XIM
qimhangul | Korean                                                     | Qt
xvnkb     | Vietnamese                                                 | XIM
x-unikey  | Vietnamese                                                 | XIM

Source: Wikipedia page on Input Method Editor

Appendix

ISO codes for languages

Refer to http://www.unicode.org/unicode/onlinedat/languages.html

Unicode Ranges

Refer to the Unicode charts at http://unicode.org/charts/

References

  • Java Internationalization, Andrew Deitsch and David Czarnecki, O'Reilly, First Edition, 2001, pp. 1-15

Related Links

Indian Localization Efforts