                                                                              
                                                                              
A Transliteration Keyboard Configuration with Tamil Unicode Characters

M.A.C.M. Raafi* and H. M. Nasir#

*Department of Mathematical Sciences, South Eastern University of Sri Lanka
E-mail: raafim@seu.ac.lk
#Department of Mathematics, University of Peradeniya, Sri Lanka
E-mail: nasirh@pdn.ac.lk
            
5th IEEE International Conference on Advanced Computing & Communication Technologies [ICACCT-2011] ISBN 81-87885-03-3

Abstract
Keyboard configurations for typing are available for many languages and for data processing tasks. The common keyboard in use today is the QWERTY keyboard, whose layout is designed for typing the English alphabet and numerals. Typing in other languages needs configurations that remap the QWERTY keys to fit those languages. Such configurations often run into difficulties because these languages have much larger character sets than English. To solve this issue, a transliteration keyboard configuration is to be considered. Transliteration is a method by which one can read a text of one language in the writing system of another language. In this paper, we discuss the development of a phonetic transliteration keyboard configuration for the Tamil language using Unicode encoding.

Introduction
Input devices are used to enter data and commands into the computer system for data processing work. One of the commonly used input devices is the keyboard, which consists of letters, numerals and other special characters.

There are different types of keyboard system available in the computing environment. The standard keyboard is known as the QWERTY keyboard. This keyboard is specially designed to type English letters and related symbols. For other languages, such as Asian languages, the QWERTY keyboard is inconvenient.

Entering Asian language characters using the QWERTY keyboard is impossible without a convenient configuration that maps the English keys of the keyboard. Even with such a configuration mapping, typing the letters of the language is difficult, because one has to memorize or be familiar with the keyboard mapping in the configuration.

Despite these limitations, transliteration is to be considered for typing texts, to the benefit of end users. Transliteration is the process by which one reads and pronounces the words and sentences of one language using the letters and special symbols of another language. It is helpful in situations where one does not know the script of a language but knows how to speak and understand the language [1].

For example, one of the Asian languages, Tamil, can be introduced to English-literate Tamils and non-Tamils with a transliteration scheme. There are 247 characters in Tamil: 12 vowels, 18 consonants, 216 compound letters and one Aayitha character. In Tamil word-processors the large number of compound letters is obtained by sequential keying of the corresponding consonant and vowel. For example, the keystroke for the consonant k (க்) followed by the vowel i leads to the appearance of the compound character ki (கி).

Keyboard layouts of this kind have been called "phonetic". Tamil transliteration is a phonetic keyboard system. Thus, in the transliteration program the Tamil word for father (அப்பா) is written as appA (or appaa), and mother (அம்மா) as ammA (or ammaa).

Our transliteration system normally offers the following advantages.
    1.  User-friendly keystrokes; users type easily in a more familiar way.
    2.  No need to memorize the whole key mapping of the keyboard.
    3.  A new person coming from some other language can type easily.
    4.  We do not need to change the font each time to type special characters and symbols such as / , : < > | ) ( * & ^ % $ # @ ! ~ + ? etc.
    5.  By introducing Unicode:
            a.  The text can be displayed everywhere.
            b.  The language does not matter.
    6.  The font does not matter.
    7.  Incorrect word forms are corrected.

Encoding Systems
An encoding scheme is a necessary part of the configuration of a keyboard layout for the transliteration program. The encoding is the system by which the characters in a set are represented in binary form in a file. In computers and in data transmission between them, i.e. in digital data processing and transfer, data is as a rule represented internally as octets. Octets are often called bytes, but in principle octet is a more definite concept than byte. An octet consists of eight bits [6].

Tamil Character encodings
In Tamil, the forms of some of the letters differ from one to another for the same vowel sound. This is the reason for the inclusion of a high number of letters in the Tamil keyboards
designed so far. Tamil is a language in which, in addition to the basic vowels (uyir) and consonants (mei), the compound (uyirmei) characters all have unique glyph forms. Some popular Tamil font encoding schemes are TSCII, TAM, TAB, ISCII and Unicode.

TSCII
The first and most popular one is the Tamil Standard Code for Information Interchange (TSCII), a glyph-based, 8-bit bilingual encoding. It uses a unique set of glyphs. The usual lower ASCII set, that is, Roman letters with standard punctuation marks, occupies the first 128 slots (0-127), and the Tamil glyphs occupy the upper ASCII segment, slots 128-255.

TAM and TAB
TAM is a monolingual encoding scheme (TAmil Monolingual), whereas TAB is a bilingual encoding scheme (TAmil Bilingual). They were proposed by the Tamil Nadu Government. TAM is of limited use in an OS environment.

ISCII
The Indian Standard Code for Information Interchange (ISCII) is an 8-bit, single-byte umbrella standard, defined in such a way that all Indian languages can be treated using one single character encoding scheme. ISCII is a bilingual character encoding scheme (not glyph-based). Roman characters and punctuation marks as defined in standard lower ASCII take up the first half of the character set (the first 128 slots). Characters for Indic languages are allocated to the upper slots (128-255) [5].

Unicode
Unicode is an international standard for multilingual word-processing that covers the world's common writing systems. As described in [6], it was designed as a two-byte scheme in which each character is represented by a 2-byte number from 0 to 65535; later versions of the standard extend the code space beyond this range. Each number represents a unique character used in at least one of the world's languages. There is exactly one number per character, and exactly one character per number. The two-byte range alone provides over 65000 slots, enough to handle more than 50 of the world's languages simultaneously. Along with other Asian languages, Tamil has been assigned specific slots from U+0B80 to U+0BFF (in decimal, from 2944 to 3071; 128 locations) in this multilingual standard [6].

Unicode encodes only the basic vowel and consonant characters, plus a set of modifiers to represent the cases where a vowel/consonant pair appears as a combination (uyirmei) in Tamil. A Unicode file stores textual information solely at this "character" level. It does not care about the actual form of the glyphs. Rendering the glyphs that correspond to the stored characters is left to software.

Once we get beyond the ASCII world, there are many different native encodings for different languages and operating systems. Converting between all of these is easiest with a central "common point", and that is Unicode.

Technically, Unicode is used wherever the characters used are all drawn from the Unicode set; in other words, just about everywhere. Systems that use ASCII are also using Unicode, since Unicode contains the ASCII set and gives its characters the same code points they had in ASCII [6].

Unicode Code Charts
The Unicode code charts present the characters of the Unicode Standard. Characters are organized into related groups called blocks. In the Unicode Standard, character blocks generally contain characters from a single script. In many cases, a script is fully represented in its character block. There are, however, important exceptions, most notably in the area of punctuation characters.

Literature Review
Transliteration of Asian language input is a subject of recent research. During the past several years, different methods have been introduced to prepare Indian language documents by entering the text through specific transliteration schemes. Data entry through transliteration is quite close to a phonetic mapping of Indian language characters to the letters of the Roman alphabet.

The earliest and most widely used transliteration scheme is known as the Library of Congress Transliteration Scheme. It uses Roman letters with diacritics (horizontal bars or circles added above or below the letters) to represent the letters of Indian languages. Diacritical marks added to a letter or symbol show its pronunciation, accent, etc., typically indicating that a phonetic value is different from the unmarked state. The scheme is very general in scope and hence can be used for almost all world languages. Established Tamil research centers all around the world are aware of this scheme, and most of them implement it as such without modifications [5].

ADAMI, one of the early Tamil word-processors for MS-DOS PCs, was produced by Dr. K. Srinivasan of Canada in the early eighties and released in 1984 to recast such transliterated text into Tamil. The Tamil text is typed using a plain ASCII transliteration scheme. Upon compiling and executing the linked macro, the romanized text page is recast on screen in equivalent Tamil. One needs to return to the romanized text mode to make corrections, if any. A more recent version of this software, called THIRU, uses a split screen: the roman text being typed in the bottom half of the screen is continuously recast in Tamil in the upper half. ADHAWIN is another recent implementation of the same software for Windows-based PCs [5].

The Murasu and Anjal word-processing packages are widely used in Malaysian, Singaporean and Tamil newspapers and magazines. These packages belong to the group of "romanized input and interpreted output" tools. The 'inaimathi' and related font faces used in these packages are of the 8-bit bilingual type. The first 128 slots (0-127) are filled by roman characters as in basic ASCII, and the Tamil characters occupy the upper ASCII slots (128-255). By invoking the keyboard editor it is possible to access either of these two blocks. In the Tamil typing mode, the roman keystrokes and their relative sequence are continuously interpreted to present the equivalent Tamil characters on screen. Thus we can type 'kathai' to get the equivalent Tamil word 'கதை' [8].

Keyboard Configuration Program
There are a number of computer programs used to develop
transliteration keyboard configuration software, such as Keyman, C, C++ and Java. In our work we take Keyman as the keyboard configuration program. Keyman is a keyboard management utility that makes it practical to input many different languages. It fully supports Unicode and allows us to create our own keyboard layouts. It interprets and translates input from the computer keyboard according to a set of rules called a keyboard. These rules are stored in a keyboard file. It includes features such as an on-screen keyboard and phonetic and visual-order input methods.

Keyman includes full support for Unicode. It supports input and output of any of the thousands of characters defined in Unicode. There are two applications included in Keyman Developer: TIKE and KMComp. TIKE, the Tavultesoft Integrated Keyboard Editor, is a complete environment for designing, developing, testing, and packaging our keyboards for distribution.

KMComp, the command-line compiler, is a simple tool that lets us compile keyboards, packages, and installers from the command line. This is useful if we want to use batch builds or Make files.

Keyboard File
The keyboard file is the most important component of a keyboard configuration. It contains the set of rules that represent the particular keyboard. To create a new keyboard, we need to create a keyboard file. There are two ways to create a keyboard file:

The Keyboard Wizard
It gives us a simple interface to quickly create a keyboard using a visual representation of a computer keyboard. We can drag and drop characters from a character map, and create ANSI and Unicode keyboard layouts. We cannot access most of the program's more powerful features from the Keyboard Wizard, but it is useful to get us started on our design. We can convert keyboards created in the Keyboard Wizard to standard program source files in TIKE.

The Keyboard Language
It provides the flexibility that is needed to write keyboards with complex character management, including constraints, dead keys, post-entry parsing, virtual key management (accessing any key on the keyboard), and other features.

A keyboard file is divided into two sections: the header and the rules. The header section defines the name of the keyboard, its bitmap, and other general settings. The rules define how the keyboard responds to keystrokes from the user, and are divided into groups.

The keyboard header is the first part of a keyboard; it consists of statements that help Keyman identify the keyboard and set default options for it. Each statement in the header must be on a separate line and is usually written in capital letters. The body of the keyboard is the other important part: it determines the behavior of the keyboard. The body consists of groups, which in turn contain one or more rules that define the responses of the keyboard to certain keystrokes.

Methodology
The keyboard program interprets and translates input from the computer keyboard according to a set of rules called a keyboard. Transliteration of Tamil has to fit the need for Tamil to be recognized as a language that, like English, can be typed with a 26-letter keyboard. It is the plan of our work to develop simple methods to use Tamil in the computer and to introduce Tamil through transliteration.

The Tamil language has 247 characters: 12 vowels (uyir) and one aayitham letter, 18 consonants (mei), and the compound (uyirmei) letters derived from these. Tamil is one of the Indian languages in which many of the compound (uyirmei) letters have a complex geometric structure (glyph) of their own.

There are 18 Mei letters (consonants) and 216 Uyir-Mei letters. A Mei character is written with the pulli sign ( ◌் ), encoded in Unicode as the virama U+0BCD. The Uyir-Mei letters are created by combining the 12 Uyir letters with the 18 Mei characters (12 x 18 = 216).

There are also 13 numeral characters used in Tamil. These numerals are not much used by people now, but they were used in earlier times. They are as follows:

    Tamil:  ௦   ௧   ௨   ௩   ௪   ௫   ௬   ௭   ௮   ௯   ௰    ௱     ௲
    Value:  0   1   2   3   4   5   6   7   8   9   10   100   1000

Choosing the mapping for characters
We define the output characters to be produced by the keyboard. We select the appropriate keystrokes from the QWERTY keyboard to map to the output characters. Some keystrokes are used to represent output characters while some keys are not. Keystrokes that do not represent any output are called dead keys. Dead keys produce null output.

Analyzing the Keystrokes and Assigning Keystroke
We want to analyze how to create all the Tamil characters using this limited number of codes. Some characters have their own Unicode numbers, so they can be assigned directly, while other characters do not have their own Unicode numbers. The latter have to be produced by combining two or more other Unicode characters. A key, or a sequence of keystrokes, is assigned to each character or combination of characters that represents a Tamil character. One or more keystrokes can be used to represent a character.

The 247 letters in the Tamil alphabet are the product of 31 basic Tamil letters. 18 English letters have a similar sound connection with 18 Tamil letters. It is only the 13 remaining Tamil letters that need a 'sound connection' with English. We can make the 'sound connection', that is, devise the new connections, by allocating letters that are in use, either in combination or singly, as follows:

    + "a" > U+0B85                அ
    U+0B85 + "a" > U+0B86         ஆ
    + "A" > U+0B86                ஆ
    + "i" > U+0B87                இ
    U+0B87 + "i" > U+0B88         ஈ
    + "I" > U+0B88                ஈ

Conclusion
The use of the Tamil language in computers enters a new era with the emergence of the Unicode standard, which is supported by more modern platforms and applications. These days, most Tamil websites support Unicode, and typography-related techniques are also switching to the new standard.

This paper is useful to people who are interested in developing their own transliteration software to type words and sentences for their word processing work and for World Wide Web applications, easily, using a QWERTY keyboard.

This study also provides solutions for some existing problems with Tamil typography. Many non-Unicode Tamil fonts with stylish glyphs are available at present. Using such fonts in documents can give a great appearance. But because of the unfamiliar keyboard mappings of these fonts, they are not widely used for typing Tamil. It is possible to fit these stylish fonts to a familiar keyboard configuration mapping, with the support of a keyboard configuration environment. Then we can use them with our keyboard configuration.

It is also possible to extend this keyboard configuration to other platforms like Linux, Mac OS, Solaris, etc., as these already support Unicode. The only thing to be done is to set up a keyboard layout in each operating system's native format.

Appendix
Some Typing Examples.
    naan or nAn              நான்
    avan                     அவன்
    manithan                 மனிதன்
    paadasaalai              பாடசாலை
    paLkaLaikazakam          பல்கலைகழகம்

References

    [1]  Acharya, Multilingual Computing for Literacy and Education, SDL, IIT Madras, India, http://acharya.iitm.ac.in/acharya.html, 2005.
    [2]  Addison-Wesley Pub Co, The Unicode Standard 3.0 (www.unicode.org), 1998.
    [3]  Elengo, Tamil 99 Keyboard Layout, www.cadgraf.com, 2000.
    [4]  Ilakkuvanar, S., Tholkappiyam in English.
    [5]  Kalyanasundaram, K., An Overview of Different Tools for Word-Processing of Tamil and a Proposal Towards Standardisation, Institute of Physical Chemistry, Swiss Federal Inst. of Technology, 1997.
    [6]  Mark Pilgrim, "Python and Unicode", http://diveintophython.org/toc/index.html.
    [7]  Muguntharaj, Tamil-TSCIIANJAL, 1998.
    [8]  Muthu Nedumaran, Murasu Anjal, 2000.
    [9]  Ramalingam Shanmugalingam, jAzhan, Transliteration of Tamil to English for the Information Technology, 2002.
    [10] Samaranayake, V. K., Nandasara, S. T., Dissanayake, J. B., Weerasinghe, A. R., Wijayawardhana, H., An Introduction to UNICODE for Sinhala Characters, University of Colombo School of Computing, 2003.
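
Illustrative sketch
The keystroke rules under Methodology and the typing examples in the Appendix can be roughly illustrated in Python. This is only a minimal sketch, not the Keyman keyboard described in the paper: the key tables cover a handful of letters, the names (VOWELS, CONSONANTS, match_key, transliterate) are hypothetical, and the choice of which Tamil letter a key such as "n" or "d" produces is an assumption inferred from the Appendix examples. It shows the general idea that a consonant key emits the base letter plus the pulli (U+0BCD), and a following vowel key replaces the pulli with the vowel sign.

```python
# Minimal sketch of phonetic transliteration into Tamil Unicode.
# All tables are partial and hypothetical; they are not the authors' mapping.

TAMIL_BLOCK = range(0x0B80, 0x0C00)  # U+0B80..U+0BFF, 128 code points
PULLI = "\u0BCD"                     # Tamil sign virama (pulli)

# key -> (independent vowel, dependent vowel sign).
# "a" is inherent in a consonant letter, so its sign is empty.
VOWELS = {
    "a": ("\u0B85", ""),
    "aa": ("\u0B86", "\u0BBE"), "A": ("\u0B86", "\u0BBE"),
    "i": ("\u0B87", "\u0BBF"),
    "ii": ("\u0B88", "\u0BC0"), "I": ("\u0B88", "\u0BC0"),
    "ai": ("\u0B90", "\u0BC8"),
}

# key -> base consonant letter (assumed key choices).
CONSONANTS = {
    "k": "\u0B95", "s": "\u0B9A", "d": "\u0B9F", "th": "\u0BA4",
    "n": "\u0BA9", "p": "\u0BAA", "m": "\u0BAE", "l": "\u0BB2",
    "v": "\u0BB5",
}
WORD_INITIAL_N = "\u0BA8"  # assumption: "n" at the start of a word


def match_key(text, i):
    """Return the longest key (two letters, then one) starting at i."""
    for size in (2, 1):
        key = text[i:i + size]
        if key in VOWELS or key in CONSONANTS:
            return key
    return None


def transliterate(text):
    """Convert romanized input to Tamil, one key at a time."""
    out = []
    i = 0
    while i < len(text):
        key = match_key(text, i)
        if key is None:
            out.append(text[i])  # pass unknown characters through
            i += 1
            continue
        if key in VOWELS:
            independent, sign = VOWELS[key]
            if out and out[-1] == PULLI:
                out[-1] = sign   # consonant + vowel: pulli becomes the sign
            else:
                out.append(independent)
        else:
            at_word_start = i == 0 or text[i - 1].isspace()
            if key == "n" and at_word_start:
                base = WORD_INITIAL_N
            else:
                base = CONSONANTS[key]
            out.extend([base, PULLI])
        i += len(key)
    return "".join(out)


if __name__ == "__main__":
    for word in ("ki", "appA", "avan", "naan", "manithan", "kathai"):
        print(word, transliterate(word))
```

Matching the longest key first lets two-letter keys such as "aa", "ai" and "th" take priority over their single-letter prefixes, which plays the same role as the context rules of the form U+0B85 + "a" > U+0B86 in the keyboard file.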