Recent Developments in Text, Speech Corpora & Tools for Indian Languages
Country Report – India
O-COCOSDA 2009, Urumqi, China, 11 August, 2009

Dr. S. S. Agrawal
Advisor, C-DAC, Noida
Executive Director, KIIT, Gurgaon, India
Email: ssagrawal@cdacnoida.in, ss_agrawal@hotmail.com

Speech Corpora O-C09 - India

(i) 500 Hindi sentences – 40 speakers (2 utterances), 4 age groups, S/N > 50 dB – A STAR project – CDAC Noida.
(ii) 1250 phonetically rich Hindi sentences by one male speaker – phonetically labelled using the HTK toolkit (manually corrected) for developing Hindi TTS – CDAC Noida.
(iii) 1500 hrs of speech corpora in Telugu, Kannada & Indian English – IIIT, Hyd. (LDC-IL).
(iv) 50 hrs of annotated speech corpora for six Indian languages:
     Hindi, Marathi, Punjabi – by CDAC, Noida
     Bengali, Assamese & Manipuri – by CDAC Kolkata
     Tamil, Telugu, Malayalam and Kannada are under development – by CDAC, Thiruvananthapuram
(v) Speech database from 300 persons in each of 14 languages, ranging from 39 hrs for Urdu to 159 hrs for Hindi (along with metadata information) – CIIL / LDC-IL.
(vi) 50k phonetically rich Hindi sentences; transcribed Hindi speech database for a very large number of speakers – TIFR.
(vii) Multi-channel, multi-lingual database for 100 speakers in contemporary/non-contemporary situations – for applications in language- and text-independent speaker recognition – CFSL.
(viii) Database for dialectal variations, domain-specific applications, emotional variations, telephone / mobile phone speech, etc. – KIIT, DRDO, IPU.

Text Corpora

(1) Tagged corpus of 200k words in Hindi, Punjabi, Urdu, Bengali, Marathi, Tamil, Telugu, Malayalam – Consortium project (MCIT/GOI) – IIT(B), IIT(Kharagpur), IIIT(Hyd) and other Univ.
(2) Parallel corpus of 15k sentences – consortium members.
(3) Transcription of 3000 pages of parallel text in 5 languages – Telugu, Hindi, Tamil, Kannada and Indian English – CIIL / LDC-IL.
(4) Text database: PB words, PB sentences, connected text, dates, command and control words, proper nouns, names, most frequent words (1000), forms, function words, new domain words, etc. (14 Indian languages) – CIIL / LDC-IL.

Speech Recognition

(i) Language models for Tamil speech recognition – Anna University.
(ii) Large-vocabulary speech recognition system for Telugu, Tamil, Marathi and Hindi – Anna University, H.P. Labs, IIT(M).
(iii) (a) Speaker-independent Hindi CSR based on > 65,000 words, 90% accuracy – IBM India Research Lab.
      (b) Telephone speech recognition system for Hindi – IBM India Research Lab. (based on adaptation of the IBM ViaVoice speech recognition system)
(iv) Speech-to-text system for Hindi – Shruti-lekhan – prototype – CDAC Pune.
(v) Manner-based, lexically driven Bengali speech recognition system – CDAC Kolkata.

Standardization

1. Standardization of the phonetic alphabet of Indian languages – IPA-level standardization – 3 Indian languages: Hindi, Bengali and Assamese (electropalatogram based) – CDAC, Kolkata / DIT (MCIT)
2. Signal-to-symbol transformation model. Symbols – phoneme-like units / more than phoneme-like units – IIIT (Hyd.)
3. (i) Speech Application Program Interface (SAPI) – Microsoft
   (ii) Speech Synthesis Markup Language (SSML) – W3C
   (iii) Speech Recognition Grammar Specification (SRGS) – W3C
   (iv) Semantic Interpretation for Speech Recognition (SISR) – W3C

Speech Synthesis / Text to Speech

1. Festival and HMM based TTS for Hindi – CDAC, Noida
2. Festival framework based TTS for Tamil – Anna University
3. Festival based TTS for Hindi for Nokia – IIIT, Hyderabad
4. Festival based TTS for Telugu for Bhrigus – IIIT, Hyderabad
5. Festival based TTS for domain-specific applications – MSIT/KIIT
6. TTS voices in four languages: Telugu, Hindi, Kannada and Tamil – IIIT, Hyd – (DIT/MCIT).
7. Vaachak: concatenative TTS for Hindi, work going on for Indian English, to be followed by other Indian languages – SAPI compliant – Prologix Software
8. Hindi Vani: TTS for Hindi based on Klatt's formant synthesizer – α version released – CEERI/DIT (GOI)
9. Bangla Vani: concatenative synthesizer for Bangla and Nepali (ESNOLA based) – CDAC Kolkata / DIT (GOI)
10. Subhasini: TTS for Malayalam, based on diphone concatenation – supports ISCII, ISFOC & UNICODE – CDAC, Thiruvananthapuram

Tools

(i) Semi-automatic tools for developing speech corpora (5 levels of annotation): phoneme, syllable, word, phrase and POS – standard format.
(ii) Pronunciation dictionaries in 12 Indian languages (user-friendly displays) – CIIL / LDC-IL
(iii) Algorithm for automatic syllabification of speech units – CDAC, Noida, Thiruvananthapuram
(iv) Processing of laughter speech – IIIT (Hyd.)
(v) - Carnatic Music Information Retrieval System for musical characteristics, singers, instruments, emotion, ragas, talam, etc. – Anna Univ.
    - Screen reading facility TTS system – IIIT Hyderabad
    - Summarization