ISSN 2319 – 1953
International Journal of Scientific Research in Computer Science Applications and Management Studies, Volume 8, Issue 1 (January 2019), www.ijsrcsams.com

An Efficient English To Hindi Translator

Dhawal Jain, Aditi Jadhav, Ateeq Ansari, Aditi Raut
(Department of Computer Engineering, St. John College Of Engineering & Management, Palghar, Maharashtra, India)
jaindhawal05@gmail.com, jadhavadi25@gmail.com, ateeqnsr8@gmail.com, aditir@sjcet.co.in

Abstract— Machine translation pertains to the translation of one natural language into another by automated computing. The primary objective is to bridge the language gap between people, communities, or countries that speak different languages. India is a multilingual country; different states have different territorial languages, but not all Indians are polyglots. There are 18 constitutional languages and ten prominent scripts. The majority of Indians, especially remote villagers, do not understand, read, or write English, so an efficient language translator is needed. Machine translation systems that translate text from one language into another will help build an enlightened Indian society without language barriers. With English being a universal language and Hindi the language used by the majority of Indians, we propose an English to Hindi machine translation system based on a recurrent neural network (RNN), long short-term memory (LSTM), and an attention mechanism.

Keywords— RNN, LSTM, Attention mechanism.

I. INTRODUCTION

Machine translation has been under development since the 1940s. A machine translation system translates text or speech from one natural language into another. Machine translation is needed to convert documents or text from other commonly known languages into our native language; it overcomes lingual barriers, and natural language processing (NLP) is the field of computer science that strives to fill this gap. Neural machine translation requires minimal domain knowledge and is conceptually simple: a large neural network is trained and can generate very long word sequences. Unlike standard machine translation systems, the model does not explicitly store large phrase tables and language models. The first successful demonstration of a machine translation system was carried out by Georgetown University in collaboration with IBM in 1954. The importance of machine translation arises from the socio-political significance of translation in communities where more than one language is spoken.

Hindi is a widely spoken language and the principal official language of India, whereas English is spoken worldwide and is hence an internationally well-known language. English was introduced in India as a spoken language during the British period. Thus, both English and Hindi are major, widely used languages, and there is a need to build a translator to convert one into the other. Here we study English to Hindi translation. Awareness has grown in India of using regional languages such as Hindi for government documents and other purposes. In this context, it has become essential to create a machine translation system that can translate English into several regional languages. Moreover, many websites are in English and are of little use to rural people, who do not know English and are therefore unable to understand the information given on those sites. Hence, a translator is needed that can convert English into Hindi, which the people can easily understand.

II. LITERATURE REVIEW

The paper [1] focuses on rule-based machine translation based on corpus management and a multilingual database. The system architecture comprises a parser and morphological tools that analyse the grammar of the source language and then transform it into the target language. The method suggested in [1] requires a deep understanding of the grammatical structure of both the source and the target language.

Statistical machine translation is driven by statistics, with the underlying idea coming from information theory: the translation is chosen according to a probability distribution. The method suggested in [2] uses the Bayes decision rule and statistical decision theory to minimize errors. The approach discussed in that paper faces a word alignment problem between phrases and a language modelling problem.

In [3], a hybrid mechanism, i.e., a combination of rule-based and statistical machine translation, is used for the conversion. The architecture comprises a splitter, a parser, a declension tagger, sentence rules, reordering, a lexical dictionary, and a translator. The source language is first passed through the splitter, which divides each sentence into words, and the parser then analyses the syntactic and semantic structure. The declension tagger inflects nouns, adjectives, and pronouns to indicate number (singular or plural), case, and gender. Reordering is then performed, and lexical rules translate the source language into the target language.

The paper [4] is based on neural machine translation. The architecture discussed there comprises an encoder, a decoder, residual connections, etc., and the approach is based on modelling the conditional probability of translating a source sentence into a target sentence. The concept of an attention mechanism is also used. This approach provides a more accurate translation.

III. METHODOLOGY

A. Architecture Diagram:
The system consists of the following modules:
1. Encoder-Decoder Model
2. LSTM
3. Attention Mechanism

Fig. 1. Architecture Diagram

B. Encoder-Decoder Model:
The encoder-decoder model is a way of organizing recurrent neural networks (RNNs) to tackle sequence-to-sequence prediction problems in which the number of input and output time steps differs. The model was built for machine translation tasks such as translating sentences from English to Hindi. It involves two sub-models:
Encoder: an RNN that reads the entire source sequence into a fixed-length encoding.
Decoder: an RNN that uses the encoded input sequence and decodes it to produce the target sequence.
Fig. 2 shows the relationship between the encoder and decoder models.

Fig. 2. Encoder-Decoder Model

The LSTM recurrent neural network is used as both the encoder and the decoder. The encoder output represents the source sequence and is used to begin the decoding process, which is conditioned on the words already produced as output so far. The hidden state of the encoder at the final time step of the input is used to initialize the state of the decoder.
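As a concrete illustration of this arrangement, the sketch below builds a minimal encoder-decoder translator with TensorFlow/Keras. It is a sketch, not the paper's actual implementation: the vocabulary sizes, embedding dimension, 256-unit LSTM width, and the use of teacher forcing (feeding the Hindi sentence shifted by one position into the decoder) are all assumptions made for the example.

    # Minimal encoder-decoder sketch (assumed hyperparameters, not the paper's).
    from tensorflow.keras import layers, Model

    SRC_VOCAB, TGT_VOCAB, EMB_DIM, HIDDEN = 5000, 7000, 128, 256

    # Encoder: reads the English token sequence and keeps only its final LSTM states.
    enc_inputs = layers.Input(shape=(None,), dtype="int32", name="english_tokens")
    enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_inputs)
    _, state_h, state_c = layers.LSTM(HIDDEN, return_state=True)(enc_emb)

    # Decoder: generates Hindi tokens, initialised with the encoder's final states.
    dec_inputs = layers.Input(shape=(None,), dtype="int32", name="hindi_tokens_shifted")
    dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_inputs)
    dec_seq, _, _ = layers.LSTM(HIDDEN, return_sequences=True,
                                return_state=True)(dec_emb, initial_state=[state_h, state_c])
    dec_probs = layers.Dense(TGT_VOCAB, activation="softmax")(dec_seq)

    model = Model([enc_inputs, dec_inputs], dec_probs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

Given integer-encoded sentence pairs, such a model would be fitted with model.fit([english, hindi_in], hindi_out, epochs=300), matching the 300 epochs reported in the Implementation subsection; at inference time the decoder is run one step at a time, feeding back its own predictions.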
C. Long Short-term Memory:
Long short-term memory units are units of a recurrent neural network; an RNN composed of LSTM units is often called an LSTM network. The cell remembers values over arbitrary time intervals, and three gates regulate the flow of information into and out of the cell. There are several architectures of LSTM units. An LSTM cell takes an input and stores it for some period of time, which is equivalent to applying the identity function; because the derivative of the identity function is constant, the gradient does not vanish when an LSTM network is trained with backpropagation through time. The activation function of the LSTM gates is usually the logistic function. A typical architecture comprises a memory cell, an input gate, an output gate, and a forget gate.
Input Gate: responsible for adding information to the cell state.
Forget Gate: responsible for removing information from the cell state.
Output Gate: produces the output.

Fig. 3. Long Short Term Memory
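To make the gate computations concrete, the following numpy sketch performs a single LSTM cell step using the standard formulation (sigmoid gates and a tanh candidate update). The weight layout and the toy sizes are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        """One LSTM time step. x: input vector; h_prev, c_prev: previous hidden/cell state.
        W, U, b hold the input weights, recurrent weights and biases for each gate."""
        i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])   # input gate: what to add
        f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])   # forget gate: what to discard
        o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])   # output gate: what to expose
        g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])   # candidate cell update
        c = f * c_prev + i * g                               # new cell state
        h = o * np.tanh(c)                                   # new hidden state
        return h, c

    # Toy usage: input size 3, hidden size 5, random parameters.
    rng = np.random.default_rng(0)
    W = {k: rng.normal(size=(5, 3)) for k in "ifog"}
    U = {k: rng.normal(size=(5, 5)) for k in "ifog"}
    b = {k: np.zeros(5) for k in "ifog"}
    h, c = lstm_step(rng.normal(size=3), np.zeros(5), np.zeros(5), W, U, b)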
D. Attention Mechanism:
The encoder-decoder model is an end-to-end model that performs well on challenging sequence-to-sequence prediction problems such as machine translation, but it appears to be limited on very long sequences. The reason for this is the fixed-length encoding of the source sequence.

Attention is a mechanism that provides a richer encoding of the source sequence from which to build a context vector that can then be used by the decoder. The attention mechanism allows the model to learn which encoded words in the source sentence to pay attention to, and to what degree, during the prediction of each word in the target sentence. The hidden state of the encoder is kept for every input time step rather than only for the final step of the source sequence, and a context vector is built specifically for each output word in the target sentence. First, each hidden state value from the encoder is scored using a neural network; the scores are then normalized into a probability distribution over the encoder's hidden states. Finally, these probabilities are used to compute a weighted sum of the encoder hidden states, producing the context vector to be used in the decoder.

Fig. 4. Attention Mechanism
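The score-normalize-sum procedure described above can be illustrated in a few lines of numpy. The dot-product scoring used here, and the toy dimensions, are assumptions made for the example; the paper does not specify the scoring network it uses.

    import numpy as np

    def attention_context(encoder_states, decoder_state):
        """encoder_states: (src_len, hidden); decoder_state: (hidden,)."""
        scores = encoder_states @ decoder_state      # one score per source position
        weights = np.exp(scores - scores.max())      # numerically stable softmax
        weights /= weights.sum()                     # attention probabilities
        context = weights @ encoder_states           # weighted sum of encoder states
        return context, weights

    # Toy usage: 6 source words, hidden size 4.
    rng = np.random.default_rng(1)
    context, weights = attention_context(rng.normal(size=(6, 4)), rng.normal(size=4))

The resulting context vector is then combined with the decoder state to predict the next target word, which is what lets the model attend to different source words at each output step.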
E. Implementation:
The project converts English text into its Hindi version. The input can be an English document or a text file, and after processing we obtain the output as Hindi text.
Training Phase: In this phase, we trained on English-Hindi bilingual data for 300 epochs. The training data includes English sentences and words along with their corresponding Hindi translations.
Testing: In the testing phase, we tested various inputs in the form of PDF, DOC, and other formats. After training for 300 epochs, we achieved an accuracy of 90 to 95%, and most of the input sentences yield a correct output.

IV. RESULTS

We successfully tested our proposed framework on more than twenty individual sentences covering different perspectives. The following examples illustrate the output for a given input:

Fig. 5. Input textbox
Fig. 6. Output textbox

Other results are shown in the table below.

Sr. No. | Input (English) | Output (Hindi)
1 | You're kidding! | मज़ाक कर रहे हो!
2 | Is there a cafe? | यहाँ कैफे है क्या?
3 | Come if you can. | अंदर आ जाओ।
4 | Make a better translation of the sentence that you are translating. Do not let translations into other languages influence you. | आप जिस वाक्य का अनुवाद कर रहे हैं, उसी का अच्छी तरह से अनुवाद करें। दूसरी भाषाओं के अनुवादों से प्रभावित न होने दें।

The graph in Fig. 6 shows the accuracy of the implemented system; the x-axis shows the epoch and the y-axis shows the accuracy.

V. CONCLUSIONS

In this paper, we built an English to Hindi translator using an RNN and experimented with long short-term memory (LSTM) and an attention mechanism. Using the attention mechanism together with LSTM makes correct translation into the target language possible. In this project, we added a feature that allows the document to be translated to be uploaded directly, which reduces typing time. To make the translation process more efficient, new rules can be added to the system.

ACKNOWLEDGMENT

We thank our guide, Ms. Aditi Raut, who has extended valuable guidance and help through the various stages of the development of this project. Her valuable suggestions were of immense help throughout the project work. We also convey our sincere regards to our respected principal, Dr. G.V. Mulgund, and Head of Department, Dr. G.A. Walikar, for their valuable support.

REFERENCES

[1] Shachi Mall and Umesh Jaiswal. 2013. Developing a system for machine translation from Hindi to English. In 2013 4th International Conference on Computer and Communication Technology (ICCCT).
[2] A. R. Babhulgaonkar and S. V. Bharad. 2017. Statistical machine translation. In 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM), October 5-6, 2017, Aurangabad, India.
[3] Jayshree Nair, Amrutha Krishnan, and Deetha R. 2016. An efficient English to Hindi machine translation using a hybrid mechanism. In 2016 International Conference on Advances in Computing, Communications, and Informatics (ICACCI), September 21-24, 2016, Jaipur, India.
[4] Karthik Revanuru, Kaushik Turlapaty, and Shrisha Rao. 2017. Neural machine translation of Indian languages. In Compute '17: 10th Annual ACM India Compute Conference, November 16-18, 2017, Bhopal, India.
[5] Brenda Reyes Ayala and Jiangping Chen. 2017. A machine learning approach to evaluating translation quality. IEEE, 2017.
[6] Hybrid machine translation for English to Marathi: A research evaluation in machine translation. March 2016.
[7] Kamala Kant Yadav and Umesh Chandra Jaiswal. 2017. A survey paper on performance improvement of word alignment in English to Hindi translation system. In 2017 International Conference on Intelligent Computing and Control (I2C2).
[8] Pankaj Kumar, Sheetal Srivastava, and Monica Joshi. 2015. Syntax directed translator for English to Hindi language. In 2015 IEEE International Conference on Research in Computational Intelligence and Communication Networks.
[9] Brian Sam Thomas, Rajat Dogra, Bhaskar Dixit, and Aditi Raut. 2018. Automatic image and video colourisation using deep learning. In 2018 International Conference on Smart City and Emerging Technology (ICSCET), Mumbai, India.