Localization and Internationalization



Users rarely ask for internationalized software explicitly, but they expect software to follow their local conventions correctly.
'''Gettext'''
* gettext is a tool used for runtime internationalisation.
* Internationalisation with gettext proceeds through the following phases:
*# Preparing the source code for internationalisation
*# Extracting the translatable strings
*# Translating the extracted strings
*# Compiling the translations
*# Retrieving the translations at run time


==Localization==
==Globalization==
The term globalization (G11N) is often used synonymously with internationalization, but it usually encompasses both internationalization and localization. It is a process that involves design, implementation and localization.
== The Importance of Localization ==
Currently, people who want to use computers must first learn English. In a country with low [[w:Literacy|literacy rates]], this blocks access to [[w:Information technology|information and communications technologies]] (ICTs), especially for the rural poor and women who do not have equal access to education. Even after having learnt English, users must pay hundreds of dollars to license foreign software, or resort to widespread illegal copying of software, in order to gain access to ICTs. In short, access to information technology is one of the keys to development, and localized FOSS applications remain a crucial missing link in communications infrastructure.
[[w:Localization|Localization]] brings the following benefits:
# Significantly reduces the amount of training necessary to empower end-users to use a computer system.
# Facilitates the introduction of computer technology in [[w:Small and medium enterprises|Small and Medium Enterprises]] (SMEs).
# Opens the way for the development of computer systems for a country's national, provincial and district level administration that will allow civil servants to work entirely in the local language and manage databases of local language names and data.
# Facilitates the decentralization of data at provincial and district levels. The same applies to utility companies (electricity, water, telephone), who will develop local language databases, thereby reducing costs and giving better service to citizens.
# Allows citizens to communicate through e-mail in their own language.
# Empowers local software development companies to work for the administration, the public sector and private companies.
# Provides the local design industry with good fonts.
# Helps universities train more software engineers.
The beneficiaries of this multi-stakeholder project are:
# Directly, all local computer users, who will have easier access to the use of computers as they will not have to learn English first.
# Indirectly, through improvements in governance using native computer systems, all local citizens in the quality of their dealings with the administration.
# The local government who will have the opportunity to develop databases and applications in the local language. Sufficient technology and empowered local development companies will be available. The government will also have the tool to coordinate applications among similar administrations (e.g., provinces), so that IT-based improvements in governance can be made at the lowest possible cost.
# The [[w:Software industry|software industry]]. The government's use of standards-compliant computer technology encourages software companies to start developing compatible computer systems that will be used by the different bodies of the administration, thereby creating a stable software industry in the country. Once this expertise is developed (using FOSS), these companies will be empowered to undertake similar projects for foreign companies at extremely competitive prices, facilitating sales beyond the local market.
Source: http://en.wikibooks.org/wiki/FOSS_Localization/Introduction


==Culturally Biased wrong Assumptions==
*'''Words are separated by spaces''': Thai, Chinese and Japanese do not put spaces between words.
*'''Punctuation is the same everywhere''': English ends a question with "?". Spanish additionally opens the question with the same sign turned upside down (¿).
*'''Text is written left to right''': Arabic and Hebrew are bidirectional. Traditional Mongolian is written vertically, in columns running from left to right. Chinese can be written horizontally from left to right or vertically in columns running from right to left.
*'''All calendar systems are Gregorian''': Thailand, for example, officially uses the Buddhist calendar.
*'''Characters are eight bits wide''': An 8-bit character representation cannot hold all the characters in the world. Unicode, usually encoded as UTF-8 or UTF-16, is used instead.
*'''Words contain consonants and vowels''': Arabic and Hebrew are commonly written without most vowels.


==Rules of Thumb for Software Internationalization==
Internationalized software must be easy to port to other locales. A locale defines a language and specific cultural conventions. The process of adjusting internationalized software to a particular locale is called localization (commonly abbreviated L10N), so internationalization is a prerequisite for localization.

Localization consists of more than translating the user interface. Consider North America and Britain, which seemingly use the same language. These locales differ not only in spelling (program vs. programme, realize vs. realise, color vs. colour) but also in cultural conventions such as date formatting (DDMMYYYY in Europe, MMDDYYYY in North America), currency and the measurement system. Other locales exhibit additional differences. In Germany and other European countries the decimal separator (also called the radix point) is a comma, e.g. 10,5. In North America it is a "decimal point", written 10.5.
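The date and decimal-separator differences described above can be illustrated with a small sketch. The convention table below is hand-written for this example and covers only three locales; the names <code>CONVENTIONS</code>, <code>format_number</code> and <code>format_date</code> are hypothetical. Real software would normally take this data from the operating system or from a locale database rather than hard-coding it.

```python
from datetime import date

# Hypothetical, hand-written convention table used only for illustration.
CONVENTIONS = {
    "en_US": {"decimal": ".", "thousands": ",", "date": "%m/%d/%Y"},
    "en_GB": {"decimal": ".", "thousands": ",", "date": "%d/%m/%Y"},
    "de_DE": {"decimal": ",", "thousands": ".", "date": "%d.%m.%Y"},
}


def format_number(value, locale_name, digits=2):
    """Format a number with the separators of the given locale."""
    conv = CONVENTIONS[locale_name]
    text = f"{value:,.{digits}f}"  # always "1,234.50" style at this point
    # Swap the separators through a placeholder so they do not collide.
    return (
        text.replace(",", "\0")
        .replace(".", conv["decimal"])
        .replace("\0", conv["thousands"])
    )


def format_date(day, locale_name):
    """Format a date with the day/month order of the given locale."""
    return day.strftime(CONVENTIONS[locale_name]["date"])


for name in CONVENTIONS:
    print(name, format_number(1234.5, name), format_date(date(2024, 3, 31), name))
```

Keeping the conventions in data rather than in code is the point of the sketch: supporting another locale means adding a table entry, not changing the formatting logic.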




==Locale==
A locale denotes a specific language together with its conventions, such as date, currency, calendar and number formats. It also includes the following:
# Names of the months
# Days of the week


===Script Type===
'''Alphabetic''':
Individual writing units are consonants and, in some cases, vowels; combined, they spell out words phonetically. Examples: Indic scripts, Arabic, Latin, Greek.

'''Syllabic''':
Individual writing units represent syllables. Examples: Japanese kana and Korean Hangul.

'''Ideographic''':
The writing system uses pictures or symbols to represent words. Example: Chinese.


===Context dependent Glyph Shaping===
'''Positional''':
The shape of a character changes depending on its position in the word. Examples: Arabic, Greek.

'''Ligatures''':
Characters combine to form a different shape when they appear next to one another. In Indic scripts ligatures are mandatory.

'''Cursive''':
The letters are joined while writing. Arabic is an example; English, as normally printed, is not.
===Text Direction===
'''Left to right''':
Text is written horizontally from left to right. Examples: Indic scripts, English.

'''Bidirectional''':
Text is written right to left, while numbers and Latin words embedded in it are written left to right. Examples: Arabic and Hebrew.

'''Vertical''':
Chinese and Japanese text can be written vertically.
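
The direction of each character is recorded in the Unicode character database, which is what makes bidirectional layout possible. The sketch below only inspects that property with Python's standard <code>unicodedata</code> module; it does not perform the reordering itself. The helper name <code>bidi_classes</code> and the sample string are assumptions made for the example.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidirectional class) pairs for non-space characters."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]


# Two Hebrew letters (alef, bet), a space, then the digits 1 and 2.
sample = "\u05d0\u05d1 12"
for ch, cls in bidi_classes(sample):
    # "R" marks right-to-left letters, "EN" marks European digits.
    print(f"U+{ord(ch):04X} {cls}")
```

Latin letters report the class <code>L</code>, Hebrew letters <code>R</code>, Arabic letters <code>AL</code> and European digits <code>EN</code>, which is why digits inside Hebrew or Arabic text still run left to right.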
===Other Characteristics===
'''Diacritics''':
Special marks used for accents, tones and vowels, or to uniquely identify a character. In some writing systems, such as Indic and Thai, diacritics can span multiple characters.
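
One practical consequence of diacritics is that the same visible text can be stored in more than one way. The following sketch, using Python's standard <code>unicodedata</code> module, compares a precomposed letter with its base-letter-plus-combining-mark form. The helper names <code>same_text</code> and <code>strip_marks</code> are hypothetical, and stripping marks is shown only as an illustration; it is not appropriate for scripts where the marks carry essential meaning.

```python
import unicodedata

composed = "\u00e9"      # "é" stored as one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(text):
    """Remove combining marks by decomposing and dropping them."""
    parts = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in parts if not unicodedata.combining(ch))


print(composed == decomposed)           # raw comparison of code points
print(same_text(composed, decomposed))  # comparison after normalization
print(strip_marks("caf\u00e9"))
```

Normalizing before comparing or searching avoids treating visually identical strings as different.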

'''Word separator''':
Most languages use a space as the word separator. Exceptions include Chinese, Thai and Japanese.

'''Punctuation''':
Punctuation marks are not consistent across writing systems.
'''A detailed description of the writing systems above can be found on the [[wikipedia:writing system|Wikipedia page on Writing Systems]].'''
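
As a small illustration of the word-separator point above, the sketch below counts words by splitting on whitespace, which is a common but culturally biased shortcut. The function name <code>naive_word_count</code> and the sample sentences are assumptions made for the example.

```python
def naive_word_count(text):
    """Count words by splitting on whitespace (works only for some scripts)."""
    return len(text.split())


english = "Localization matters"
# A Japanese sentence of several words, written without any spaces.
japanese = "\u79c1\u306f\u5b66\u751f\u3067\u3059"

print(naive_word_count(english))
print(naive_word_count(japanese))
```

The Japanese sentence is reported as a single word because it contains no spaces. Correct segmentation for such languages needs dictionary-based or rule-based word breaking rather than a split on whitespace.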


==Unicode==

'''Unicode''' is an industry standard designed to allow text and symbols from all of the [[wikipedia:writing systems |writing systems]] of the world to be consistently represented and manipulated by computers. Developed in tandem with the Universal Character Set standard and published in book form as ''The Unicode Standard'', Unicode consists of a character repertoire, an encoding methodology and set of standard character encodings, a set of code charts for visual reference, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and rules for normalization, decomposition, [[wikipedia:collation|collation]] and rendering.

The Unicode Consortium, the non-profit organization that coordinates Unicode's development, has the ambitious goal of eventually replacing existing character encoding schemes with Unicode and its standard Unicode Transformation Format (UTF) schemes, as many of the existing schemes are limited in size and scope and are incompatible with multilingual environments. Unicode's success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including XML, the Java programming language and modern operating systems.
 
More details at
#[[wikipedia:Unicode | Wikipedia page on Unicode]]
#[http://unicode.org Unicode.org]
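
The distinction between a character's code point and its encoded form can be seen with a short sketch using Python's standard library. The helper name <code>describe</code> and the three sample characters are assumptions chosen for the example.

```python
import unicodedata


def describe(ch):
    """Report the code point, Unicode name and encoded sizes of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-le")),
    }


# A Latin letter, the euro sign, and an emoji outside the 16-bit range.
for ch in ("A", "\u20ac", "\U0001F600"):
    print(describe(ch))
```

The same character takes a different number of bytes depending on the UTF scheme, and characters beyond U+FFFF do not fit in a single 16-bit unit, so code should not assume a fixed width per character.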
 
==Internationalized Resource Identifiers==
 
Internationalized Resource Identifiers (IRIs) are also known as multilingual Web addresses.
Web addresses are typically expressed using Uniform Resource Identifiers (URIs). This restricts them to a small set of characters: basically, the upper and lower case letters of the English alphabet, European numerals and a small number of symbols. Recent developments make it possible to use non-ASCII characters in Web addresses.
 
Detailed information is available from [http://www.w3.org/International/articles/idn-and-iri/ An Introduction to Multilingual Web Addresses]
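
A rough sketch of how a multilingual address can be mapped onto the restricted URI character set is shown below, using only Python's standard library. It is a simplification under stated assumptions: the function name <code>iri_to_uri</code> and the address <code>bücher.example</code> are made up for the example, ports and user information are ignored, and the built-in <code>idna</code> codec implements the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Convert an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)
    # Host names are converted to their ASCII "xn--" (Punycode) form.
    host = parts.hostname.encode("idna").decode("ascii")
    # Other components are UTF-8 encoded and then percent-escaped.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://b\u00fccher.example/caf\u00e9"))
```

The host name and the rest of the address are handled differently: the domain is rewritten with Punycode, while path, query and fragment use percent-encoding of UTF-8 bytes.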


==Input Methods==
Input methods are applications or software components that convert a user's keystrokes into symbols, characters or words.
An '''input method editor''' ('''IME''') is a program or operating system component that allows computer users to enter characters and symbols not found on their [[wikipedia:Keyboard layout|keyboard]]. This, for instance, allows the user of a Western keyboard to input [[wikipedia:Chinese character|Chinese]], [[wikipedia:Japanese character|Japanese]], [[wikipedia:Hangul|Korean]] and [[wikipedia:Indic_script|Indic]] characters.
This is intended as a non-exhaustive list of [[wikipedia:input method editor|input method]]s for UNIX platforms.
{|class="wikitable"
! Name !! Languages supported !! Implementations supported
|-
|[[wikipedia:SCIM|SCIM]]
|Multiple languages, including CJK
|GTK+ , Qt and XIM
|-
|[[wikipedia:uim|uim]]
|Multiple languages, including CJK
|GTK+, Qt, XIM, Leim, Tty (Unix) and TSM (Mac OS X)
|-
|[http://xcin.linux.org.tw/ xcin]
|Mainly for traditional Chinese; adapted for use for simplified Chinese.
|XIM
|-
|[http://www.inputking.com/ InputKing]
|Traditional Chinese and simplified Chinese.
|Browser based.
|-
|[http://im-ja.sourceforge.net/ im-ja]
|Japanese
|GTK+ and XIM
|-
|kinput2
|Japanese
|XIM, kinput2 protocol
|-
|ami
|Korean
|XIM
|-
|[http://kldp.net/projects/imhangul/ imhangul]
|Korean
|GTK+
|-
|[http://nabi.kldp.net/document_en.html Nabi]
|Korean
|XIM
|-
|[http://kldp.net/projects/qimhangul/ qimhangul]
|Korean
|Qt
|-
|[http://xvnkb.sourceforge.net/ xvnkb]
|Vietnamese
|XIM
|-
|[http://www.unikey.org/linux.php x-unikey]
|Vietnamese
|XIM
|}
Source: [[wikipedia:input method editor|wikipedia page on Input Method Editor]]


==Appendix==
===ISO codes for languages===
See http://www.unicode.org/unicode/onlinedat/languages.html
===Unicode Ranges===
See the Unicode charts at http://unicode.org/charts/
==References==
* ''Java Internationalization'', Andrew Deitsch and David Czarnecki, O'Reilly, 1st edition, 2001, pp. 1-15


==Related Links==
*[http://en.wikibooks.org/wiki/FOSS_Localization FOSS Localization]
*[http://developer.gnome.org/doc/tutorials/gnome-i18n/developer.html Gnome L10N Guidelines for Developers by Christian Rose]
*[[wikipedia:Date and time notation by country|Date and time notation by country]]
*[[wikipedia:List of languages by name|List of languages by name]]
*[[ಕನ್ನಡ| ಕನ್ನಡ (Kannada)]] - Kannada FOSS Team
*[[తెలుగు|తెలుగు (Telugu)]] - Telugu FOSS Team
  Feel free to edit this page and share your knowledge/experience in this subject - [[User:Santhosh|Santhosh Thottingal]]