Recent Developments in Text, Speech Corpora & Tools for Indian Languages

Country Report – India
O-COCOSDA 2009, Urumqi, China, 11 August, 2009

Dr. S. S. Agrawal,
Advisor, C-DAC, Noida
Executive Director, KIIT, Gurgaon, India
Email: ssagrawal@cdacnoida.in, ss_agrawal@hotmail.com

Speech Corpora O-C09 - India
(i)    500 Hindi sentences – 40 speakers (2 utterances), 4 age groups, S/N > 50 dB – A-STAR project – CDAC Noida.
(ii)   1250 phonetically rich Hindi sentences by one male speaker – phonetically labelled using the HTK toolkit (manually corrected) for developing Hindi TTS – CDAC Noida.
(iii)  1500 hrs of speech corpora in Telugu, Kannada & Indian English – IIIT, Hyd. (LDC-IL).
(iv)   50 hrs of annotated speech corpora for six Indian languages:
       Hindi, Marathi, Punjabi – by CDAC, Noida
       Bengali, Assamese & Manipuri – by CDAC Kolkata
       Tamil, Telugu, Malayalam and Kannada are under development – by CDAC, Thir.
(v)    Speech database from 300 persons in each of 14 languages, ranging from 39 hrs for Urdu to 159 hrs for Hindi (along with metadata information) – CIIL / LDC-IL.
(vi)   50k phonetically rich Hindi sentences; transcribed Hindi speech database for a very large no. of speakers – TIFR.
(vii)  Multi-channel, multi-lingual database for 100 speakers in contemporary/non-contemporary situations – for applications in language- and text-independent speaker recognition – CFSL.
(viii) Database for dialectal variations, domain-specific applications, emotional variations, telephone / mobile phone speech etc. – KIIT, DRDO, IPU.

Text Corpora O-C09 - India
(1)    Tagged corpus of 200k words in Hindi, Punjabi, Urdu, Bengali, Marathi, Tamil, Telugu, Malayalam – Consortium project (MCIT/GOI) – IIT(B), IIT(Kharagpur), IIIT(Hyd) and other Univ.
(2)    Parallel corpus of 15k sentences – consortium members.
(3)    Transcription of 3000 pages of parallel text in 5 languages – Telugu, Hindi, Tamil, Kannada and Indian English – CIIL / LDC-IL
(4)    Text database: PB words, PB sentences, connected text, dates, command, control words, proper nouns, names, most frequent words (1000), forms, function words, new domain words etc. (14 Indian languages) – CIIL / LDC-IL

Standardization
1.     Standardization of the phonetic alphabet of Indian languages – IPA-level standardization – 3 Indian languages: Hindi, Bengali and Assamese (electropalatogram based) – CDAC, Kolkata / DIT (MCIT)
2.     Signal-to-symbol transformation model. Symbols: phoneme-like units / more than phoneme-like units – IIIT (Hyd.)
3.     (i)   Speech Application Programming Interface (SAPI) – Microsoft
       (ii)  Speech Synthesis Markup Language (SSML) – W3C
       (iii) Speech Recognition Grammar Specification (SRGS) – W3C
       (iv)  Semantic Interpretation for Speech Recognition (SISR) – W3C

Speech Recognition O-C09 - India
(i)    Language models for Tamil speech recognition – Anna University.
(ii)   Large vocabulary speech recognition system for Telugu, Tamil, Marathi and Hindi – Anna University, H.P. Labs, IIT(M).
(iii)  (a) Speaker-independent Hindi CSR based on > 65000 words, 90% accuracy – IBM India Research Lab.
       (b) Telephone speech recognition system for Hindi – IBM India Research Lab (based on adaptation of the IBM ViaVoice speech recognition system).
(iv)   Speech-to-text system for Hindi – Shruti-lekhan – prototype – CDAC Pune
(v)    Manner-based, lexically driven Bengali speech recognition system – CDAC Kolkata

Speech Synthesis / Text to Speech O-C09 - India
1.     Festival and HMM based TTS for Hindi – CDAC, Noida
2.     Festival framework based TTS for Tamil – Anna University
3.     Festival based TTS for Hindi for Nokia – IIIT, Hyderabad
4.     Festival based TTS for Telugu for Bhrigus – IIIT, Hyderabad
5.     Festival based TTS for domain-specific applications – MSIT/KIIT
6.     TTS voices in four languages: Telugu, Hindi, Kannada and Tamil – IIIT, Hyd – (DIT/MCIT).
7.     Vaachak: Concatenative TTS for Hindi; work going on for Indian English, to be followed by other Indian languages – SAPI compliant – Prologix Software
8.     Hindi Vani: TTS for Hindi based on Klatt's formant synthesizer – α version released – CEERI/DIT (GOI)
9.     Bangla Vani: Concatenative synthesizer for Bangla and Nepali (ESNOLA based) – CDAC Kolkata / DIT (GOI)
10.    Subhasini: TTS for Malayalam based on diphone concatenation – supports ISCII, ISFOC & UNICODE – CDAC, Thiruvananthapuram

Tools O-C09 - India
(i)    Semi-automatic tools for developing speech corpora (5 levels of annotation): phoneme, syllable, word, phrase and POS – standard format.
(ii)   Pronunciation dictionaries in 12 Indian languages (user-friendly displays) – CIIL / LDC-IL
(iii)  Algorithm for automatic syllabification of speech units – CDAC, Noida, Thiruvananthapuram
(iv)   Processing of laughter speech – IIIT (Hyd.)
(v)    - Carnatic music information retrieval system for musical characteristics, singers, instruments, emotion, ragas, talam etc. – Anna Univ.
       - Screen reading facility TTS system – IIIT Hyderabad
       - Summarization