Localization and Internationalization
Revision as of 16:48, 7 July 2010
Now the whole earth had one language and few words. And as men migrated from the east, they found a plain in the land of Shinar and settled there. And they said to one another, "Come, let us make bricks, and burn them thoroughly." And they had brick for stone, and bitumen for mortar. Then they said, "Come, let us build ourselves a city, and a tower with its top in the heavens, and let us make a name for ourselves, lest we be scattered abroad upon the face of the whole earth."
And the Lord came down to see the city and the tower, which the sons of men had built. And the Lord said, "Behold, they are one people, and they have all one language; and this is only the beginning of what they will do; and nothing that they propose to do will now be impossible for them. Come, let us go down, and there confuse their language, that they may not understand one another's speech"...
Culturally Biased Wrong Assumptions
- All letters are between A and Z: This does not hold for non-English words, which may use accented letters or entirely different scripts.
- All scripts contain upper and lower case letters: Chinese, Indic, Korean, and Japanese scripts do not have the concept of case.
- Words are separated by spaces: Chinese, Japanese, and Thai are written without spaces between words.
- Punctuation is the same everywhere: English ends a question with ?. Spanish uses the same sign but also opens the question with an inverted one (¿).
- Text is written left to right: Arabic and Hebrew are bidirectional. Mongolian is written vertically, with columns running from left to right. Chinese can be written horizontally from left to right, or vertically in columns running from right to left.
- All calendar systems are Gregorian: Thailand, for example, officially uses the Buddhist calendar.
- Characters are eight bits: An 8-bit character representation cannot hold all the characters in the world. Unicode is normally used instead; its code points need up to 21 bits and are stored in encodings such as UTF-8 or UTF-16.
- Words contain consonants and vowels: Arabic and Hebrew are normally written without most vowels.
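A few of these assumptions can be checked directly. The following is a minimal Python sketch, not a complete test of any of them; the helper names `is_ascii_word` and `space_separated_words` are made up for illustration, and the sample strings are arbitrary.

```python
import re

# Assumption: "all letters are between A and Z".
ASCII_WORD = re.compile(r"^[A-Za-z]+$")

def is_ascii_word(word):
    """Return True only if every character is an unaccented Latin letter."""
    return bool(ASCII_WORD.match(word))

def space_separated_words(text):
    """Naive word splitting that assumes words are separated by spaces."""
    return text.split()

# The naive A-Z check rejects a word that Unicode considers fully alphabetic.
print(is_ascii_word("naive"), is_ascii_word("naïve"), "naïve".isalpha())

# Assumption: "every script has upper and lower case".
# Chinese characters are unchanged by case conversion.
print("中文".upper() == "中文")

# Case conversion is not always one character to one character either.
print("straße".upper())

# Assumption: "words are separated by spaces".
# This Thai phrase is two words written without a space between them.
print(len(space_separated_words("ภาษาไทย")))
```

Code that validates names with `[A-Za-z]`, or that counts words by splitting on spaces, will therefore give wrong results for a large part of the world's text.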
Rules of Thumb for Software Internationalization
Internationalized software must be easy to port to other locales. A locale defines a language and its specific cultural conventions. The process of adjusting internationalized software to a particular locale is called localization (commonly abbreviated L10N). You can think of software internationalization as a prerequisite for localization. Localization consists of more than just translating the user interface. Consider North America and Britain, for instance. Seemingly, they use the same language. However, these locales differ not only in spelling (program vs. programme, realize vs. realise, color vs. colour, etc.) but also in certain cultural conventions such as date formatting (in Europe the date format is DDMMYYYY whereas in North America it is MMDDYYYY), currency, and measurement system. Other locales exhibit additional cultural differences. In Germany and other European countries, the decimal separator (also called the radix character) is a comma, e.g., 10,5. In North America it is called the "decimal point" and, as the name suggests, it is written as a period: 10.5.
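The date and decimal conventions mentioned above can be sketched in a few lines of Python. The `CONVENTIONS` table, the locale identifiers used as keys, and the two helper functions are assumptions made for this illustration; real software would take such data from the operating system or from a locale database rather than hard-coding it.

```python
from datetime import date

# Hypothetical, hand-written convention table (illustration only).
CONVENTIONS = {
    "en_US": {"date": "{m:02d}/{d:02d}/{y:04d}", "decimal": "."},
    "en_GB": {"date": "{d:02d}/{m:02d}/{y:04d}", "decimal": "."},
    "de_DE": {"date": "{d:02d}.{m:02d}.{y:04d}", "decimal": ","},
}

def format_date(value, locale_id):
    """Format a date in the day/month order of the given locale."""
    pattern = CONVENTIONS[locale_id]["date"]
    return pattern.format(d=value.day, m=value.month, y=value.year)

def format_decimal(value, locale_id, places=1):
    """Format a number using the decimal separator of the given locale."""
    text = f"{value:.{places}f}"
    return text.replace(".", CONVENTIONS[locale_id]["decimal"])

independence_day = date(2010, 7, 4)
for locale_id in CONVENTIONS:
    print(locale_id, format_date(independence_day, locale_id),
          format_decimal(10.5, locale_id))
```

The same date prints as 07/04/2010 for `en_US` and 04/07/2010 for `en_GB`, which is exactly the kind of ambiguity that makes hard-coded date formats dangerous.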
Locales also use different character code sets (7-bit ASCII, EBCDIC, Unicode) and fonts for different scripts (Latin, Hebrew, Cyrillic). There are a few basic guidelines to follow in order to ensure easy software localization:
- Avoid hard-coded literal text in your code. Instead, use string tables or environment variables.
- Use wide characters instead of narrow characters. C and C++ support the wchar_t datatype.
- Compose decimal numbers and dates from dynamic lexical units. Such lexical units are strings that contain a locale-specific representation of a fraction sign, currency symbol, and date format separators.
- Don't assume anything about text directionality. Semitic languages such as Arabic and Hebrew are written right-to-left, as opposed to European languages. Consequently, menus, frames and pages are aligned differently in such languages. Some Asian languages can be written vertically, from top to bottom.
- Be ready to deal with several calendars. Other calendars such as the Muslim, Chinese and Hebrew calendars may be used in addition to the Gregorian calendar in certain locales.
- Avoid any assumption about religious matters and holidays. For example, in non-Christian countries, December 25 is usually an ordinary business day and so is Sunday. An internationalized banking system should be ready to process transactions from foreign branches on Sunday, for example.
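The first guideline, keeping literal text out of the code, might look like the sketch below. The `STRING_TABLE` contents, the `translate` function, and the choice of English as the fallback language are all assumptions for illustration; production code would more likely load the tables from resource files (Python, for instance, ships a `gettext` module for this purpose).

```python
# Hypothetical string table keyed by language, then by message identifier.
STRING_TABLE = {
    "en": {"greeting": "Hello, {name}!", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo, {name}!"},
}
DEFAULT_LANGUAGE = "en"

def translate(key, language, **params):
    """Look up a message, falling back to the default language if missing."""
    table = STRING_TABLE.get(language, {})
    template = table.get(key) or STRING_TABLE[DEFAULT_LANGUAGE][key]
    return template.format(**params)

print(translate("greeting", "de", name="Anna"))
print(translate("farewell", "de"))   # no German entry, so English is used
```

Because the program only ever refers to message identifiers such as `greeting`, adding a language means adding a table, not editing the code.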
Locale
A locale denotes a specific language along with its conventions for dates, currency, calendar, number format, etc. It also includes the following:
- Names of the months
- Days of the week
- First day of the week
- Collation sequencing (Sort order)
- Time Zone information
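One way to picture a locale is as a record that bundles these items together. The sketch below is only an illustration: the `LocaleInfo` class, its field names, and the sample values are hypothetical, and only a few of the items listed above are modelled.

```python
import calendar
from dataclasses import dataclass

@dataclass(frozen=True)
class LocaleInfo:
    """Hypothetical record holding a small subset of locale data."""
    language: str
    day_names: tuple        # always stored Monday first
    first_weekday: int      # 0 = Monday ... 6 = Sunday, as in the calendar module
    decimal_separator: str
    time_zone: str

EN_US = LocaleInfo("en", ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"),
                   6, ".", "America/New_York")
DE_DE = LocaleInfo("de", ("Mo", "Di", "Mi", "Do", "Fr", "Sa", "So"),
                   0, ",", "Europe/Berlin")

def week_header(info):
    """Return the day names in the order a calendar for this locale shows them."""
    cal = calendar.Calendar(firstweekday=info.first_weekday)
    return [info.day_names[day] for day in cal.iterweekdays()]

print(week_header(EN_US))
print(week_header(DE_DE))
```

The two headers contain the same seven days but start on different ones, which is why the first day of the week has to come from the locale rather than from the code.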
Writing Systems
A writing system, or script, is not a language; it is a means of conveying information through written language. Writing systems can be classified as follows.
Script Type
Alphabetic: Individual units for writing are composed of consonants, and in some cases vowels. When combined they spell out words phonetically. Eg: Indic, Arabic, Latin, Greek etc.
Syllabic: The individual units for writing are composed of syllables. Eg: Japanese kana and Korean Hangul
Ideographic: A writing system which uses pictures or symbols to represent words. Eg: Chinese
Context dependent Glyph Shaping
Positional: The shape of the character changes depending on its position in the word. Eg: Arabic, Greek.
Ligatures: Characters combine to form a different shape when they appear next to one another. In Indic scripts ligatures are mandatory.
Cursive: The letters are joined while writing. Arabic is an example, but English is not of this kind.
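Positional shaping and ligatures are normally handled by the font and text rendering engine, but traces of them are visible in Unicode itself. The sketch below, which uses only Python's standard `unicodedata` module, looks at the legacy "presentation form" code points for the Arabic letter beh, at the Latin "fi" ligature, and at the Greek final sigma. The function name `positional_forms_of_beh` is made up for this example.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the abstract letter
PRESENTATION_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

def positional_forms_of_beh():
    """Return (name, maps back to BEH?) for each positional presentation form."""
    return [
        (unicodedata.name(form), unicodedata.normalize("NFKC", form) == BEH)
        for form in PRESENTATION_FORMS
    ]

for name, same_letter in positional_forms_of_beh():
    print(name, same_letter)

# A ligature: one code point that stands for two letters drawn together.
print(unicodedata.normalize("NFKC", "\uFB01"))

# Greek sigma takes a different shape at the end of a word.
print("ΟΔΟΣ".lower())
```

Applications should store the abstract letters and leave the choice of shape to the renderer; the presentation forms exist mainly for compatibility with older encodings.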
Text Direction
Left to right: Text is written left to right horizontally. Eg: Indic, English
Bidirectional: Examples are Arabic and Hebrew. Text is written right to left while numbers and Latin words are written left to right.
Vertical: Chinese and Japanese text can be written vertically.
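Every Unicode character carries a bidirectional class that rendering engines use to lay out mixed-direction text. The sketch below reads that class with Python's `unicodedata.bidirectional` and guesses a paragraph's base direction from its first strongly directional character. This is a simplified heuristic, not the full Unicode Bidirectional Algorithm, and `base_direction` is a name invented for the example.

```python
import unicodedata

def base_direction(text, default="ltr"):
    """Guess paragraph direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return default                       # only digits, spaces, punctuation, ...

for sample in ["a", "\u05D0", "\u0627", "1", "\u0661"]:
    print(repr(sample), unicodedata.bidirectional(sample))

print(base_direction("שלום world"))
print(base_direction("Hello שלום"))
```

Digits are not strongly directional, which is how numbers inside Arabic or Hebrew text can still run left to right.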
Other Characteristics
Diacritics: Special marks used for accents, tones, and vowels, or to uniquely identify a character. In some writing systems such as Indic and Thai, diacritics can span multiple characters.
Word separator: Most languages use a space as the word separator. Exceptions are Chinese, Thai, and Japanese.
Punctuation: Marks are inconsistent across writing systems
A detailed description of above writing systems can be found at Wikipedia page on Writing Systems
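Diacritics have a practical consequence for software: the same visible text can be stored either as one precomposed character or as a base letter followed by a combining mark. The sketch below shows the two forms with Python's `unicodedata` module; `same_text` and `combining_marks` are hypothetical helper names.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def combining_marks(text):
    """List the names of the combining marks found in the text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]

print(composed == decomposed)          # raw comparison of code points
print(same_text(composed, decomposed)) # comparison after normalization
print(len(composed), len(decomposed))
print(combining_marks(decomposed))
```

Comparing, searching, or measuring the length of text without normalizing it first can therefore give different answers for strings that look identical on screen.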
Unicode
Unicode is an industry standard designed to allow text and symbols from all of the writing systems of the world to be consistently represented and manipulated by computers. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a character repertoire, an encoding methodology and set of standard character encodings, a set of code charts for visual reference, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and rules for normalization, decomposition, collation and rendering.
The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language and modern operating systems.
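The difference between a character's code point and its encoded form can be seen in a short Python sketch. The sample characters are arbitrary and `encoded_sizes` is a made-up helper; it reports how many bytes each character needs in two of the UTF schemes.

```python
def encoded_sizes(ch):
    """Return (code point label, bytes in UTF-8, bytes in UTF-16) for one character."""
    label = f"U+{ord(ch):04X}"
    return label, len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in ["A", "é", "中", "😀"]:
    print(ch, *encoded_sizes(ch))
```

A character outside the Basic Multilingual Plane, such as the last sample, takes four bytes in both encodings, so neither 8 nor 16 bits per character is enough in general.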
More details at
Internationalized Resource Identifiers
Internationalized Resource Identifiers (IRIs) are also known as Multilingual Web Addresses. Currently Web addresses are typically expressed using Uniform Resource Identifiers, or URIs. This restricts Web addresses to a small number of characters: basically, just upper and lower case letters of the English alphabet, European numerals and a small number of symbols. Recent developments enable you to add non-ASCII characters to Web addresses.
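Software that only understands URIs can still carry an IRI if the non-ASCII parts are converted first: the host name to its ASCII ("xn--") form and the remaining parts to percent-encoded UTF-8. The sketch below does this with Python's standard library. The function name `iri_to_uri` and the example address are assumptions; the sketch expects an absolute address with a host, ignores user-info, and relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Convert an absolute IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    # Host name: convert each label to its ASCII-compatible form.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Other components: percent-encode the UTF-8 bytes, keeping existing escapes.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))

print(iri_to_uri("http://bücher.example/wiki/日本語"))
```

The result contains only characters allowed in a URI, so it can be passed to tools that predate IRIs while still identifying the same resource.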
Detailed information is available from An Introduction to Multilingual Web Addresses
Input Methods
Input methods are applications or software components that convert a user's keystrokes into symbols, characters or words.
An input method editor (IME) is a program or operating system component that allows computer users to enter characters and symbols not found on their keyboard. This, for instance, allows the user of a Western keyboard to input Chinese, Japanese, Korean and Indic characters.
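The core idea of an input method, turning a sequence of keystrokes into characters that are not on the keyboard, can be shown with a toy converter from Latin keystrokes to Japanese hiragana. Everything here is a simplification made up for illustration: the table holds only a handful of syllables, and a real IME adds candidate selection, dictionaries, and conversion to kanji.

```python
# Tiny, incomplete keystroke table (illustration only).
ROMAJI_TO_HIRAGANA = {
    "a": "あ", "i": "い", "ka": "か", "ko": "こ",
    "n": "ん", "ni": "に", "chi": "ち", "ha": "は", "wa": "わ",
}
MAX_KEY_LENGTH = max(len(key) for key in ROMAJI_TO_HIRAGANA)

def convert(keystrokes):
    """Convert keystrokes to hiragana using greedy longest-match lookup."""
    output = []
    i = 0
    while i < len(keystrokes):
        # Try the longest possible key first, then shorter ones.
        for size in range(min(MAX_KEY_LENGTH, len(keystrokes) - i), 0, -1):
            chunk = keystrokes[i:i + size]
            if chunk in ROMAJI_TO_HIRAGANA:
                output.append(ROMAJI_TO_HIRAGANA[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the keystroke through unchanged.
            output.append(keystrokes[i])
            i += 1
    return "".join(output)

print(convert("konnichiha"))
```

Typing the ten keys `konnichiha` yields the five characters こんにちは, which is how a Western keyboard can be used to enter text in a script with far more characters than keys.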
This is intended as a non-exhaustive list of input methods for UNIX platforms.
Name | Languages supported | Implementations supported |
---|---|---|
SCIM | Multiple languages, including CJK | GTK+ , Qt and XIM |
uim | Multiple languages, including CJK | GTK+, Qt, XIM, Leim, Tty (Unix) and TSM (Mac OS X) |
xcin | Mainly for traditional Chinese; adapted for use for simplified Chinese. | XIM |
InputKing | Traditional Chinese and simplified Chinese. | Browser based. |
im-ja | Japanese | GTK+ and XIM |
kinput2 | Japanese | XIM, kinput2 protocol |
ami | Korean | XIM |
imhangul | Korean | GTK+ |
Nabi | Korean | XIM |
qimhangul | Korean | Qt |
xvnkb | Vietnamese | XIM |
x-unikey | Vietnamese | XIM |
Source: Wikipedia page on Input Method Editor
Appendix
ISO codes for languages
Refer to http://www.unicode.org/unicode/onlinedat/languages.html
Unicode Ranges
Refer to the Unicode charts at http://unicode.org/charts/
References
- Java Internationalization, Andrew Deitsch and David Czarnecki, O'Reilly, First Edition, 2001, pp. 1-15
Related Links
- FOSS Localization
- Gnome L10N Guidelines for Developers by Christian Rose
- Date and time notation by country
- List of languages by name
Indian Localization Efforts
- IndLinux - Indic Localization project
- മലയാളം (Malayalam) - FOSS Malayalam Community
- ಕನ್ನಡ (Kannada) - Kannada FOSS Team
- తెలుగు (Telugu) - Telugu FOSS Team