Point to the Expression: Solving Algebraic Word Problems using the Expression-Pointer Transformer Model

Bugeun Kim, Kyung Seo Ki, Donggeon Lee, Gahgene Gweon
Department of Transdisciplinary Studies, Seoul National University, Seoul, Republic of Korea
{cd4209,kskee88,lsw5835,ggweon}@snu.ac.kr

Abstract

Solving algebraic word problems has recently emerged as an important natural language processing task. To solve algebraic word problems, recent studies suggested neural models that generate solution equations using 'Op (operator/operand)' tokens as the unit of input/output. However, such neural models suffer from two issues: expression fragmentation and operand-context separation. To address these two issues, we propose a pure neural model, the Expression-Pointer Transformer (EPT), which uses (1) 'Expression' tokens and (2) operand-context pointers when generating solution equations. The performance of the EPT model is tested on three datasets: ALG514, DRAW-1K, and MAWPS. Compared to the state-of-the-art (SoTA) models, the EPT model achieves comparable accuracy on each of the three datasets: 81.3% on ALG514, 59.5% on DRAW-1K, and 84.5% on MAWPS. The contribution of this paper is two-fold: (1) we propose a pure neural model, EPT, which can address the expression fragmentation and operand-context separation issues; (2) the fully automatic EPT model, which does not use hand-crafted features, yields performance comparable to existing models that use hand-crafted features, and outperforms existing pure neural models by up to 40%.

1 Introduction

Solving algebraic word problems has recently become an important research task in that automatically generating solution equations requires understanding natural language. Table 1 shows a sample algebraic word problem, along with the corresponding solution equations that are used to generate answers for the problem.

Problem   | One number is eight more than twice another and their sum is 20. What are their numbers?
Numbers   | 1 ('one'), 8 ('eight'), 2 ('twice'), 20
Equations | x_0 - 2x_1 = 8, x_0 + x_1 = 20
Answers   | (16, 4)

Table 1: A sample algebraic word problem

To solve such problems with deep learning technology, researchers have recently suggested neural models that generate solution equations automatically (Huang et al., 2018; Amini et al., 2019; Chiang and Chen, 2019; Wang et al., 2019). However, the suggested neural models showed a fairly large performance gap compared to existing state-of-the-art models based on hand-crafted features on popular algebraic word problem datasets such as ALG514 (44.5% for a pure neural model vs. 83.0% when using hand-crafted features) (Huang et al., 2018; Upadhyay and Chang, 2016). To address this large performance gap, in this study we propose a larger unit of input/output (I/O) token, called the "Expression", for a pure neural model. Figure 1 illustrates the conventionally used "Op (operator/operand)" tokens versus our newly proposed "Expression" token.

To improve the performance of pure neural models that solve algebraic word problems, we identified two issues that can be addressed using Expression tokens, as shown in Figure 1: (1) expression fragmentation and (2) operand-context separation. First, the expression fragmentation issue is the segmentation of an expression tree, which represents the computational structure of the equations used to generate a solution. This issue arises when Op tokens, rather than the whole expression tree, are used as the input/output unit of a problem-solving model. For example, as shown in Figure 1 (a), using Op tokens as input to a problem-solving model disassembles a tree structure into operators ("×") and operands ("x1" and "2"). Meanwhile, we propose using the "Expression" token (×(x1, 2)), which can explicitly capture a tree structure as a whole, as shown in Figure 1 (c).

Figure 1: Illustration using the word problem in Table 1 for the (a) expression fragmentation issue, (b) operand-context separation issue, and (c) our solution for these two issues.

The second issue, operand-context separation, is the disconnection between an operand and the number in the problem text that is associated with the operand. This issue arises when a problem-solving model substitutes a number stated in an algebraic word problem with an abstract symbol for generalization. As shown in Figure 1 (b), when using Op tokens, the number 8 is changed into an abstract symbol 'N1'. Meanwhile, when using an Expression token, the number 8 is not transformed into a symbol; rather, a pointer is made to the location where the number 8 occurs in the algebraic word problem. Therefore, using such an "operand-context pointer" enables a model to access contextual information about the number directly, as shown in Figure 1 (c); thus, the operand-context separation issue can be addressed.
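To make the two token schemes concrete, the following minimal Python sketch (our own illustrative encoding, not the paper's or the repository's data format) contrasts how the first equation of Table 1, x_0 - 2x_1 = 8, could be written as Op tokens versus Expression tokens whose number operands point back into the problem text.

```python
# Illustrative only: our own simplified encoding, not the format used by EPT or prior work.
problem = "One number is eight more than twice another and their sum is 20."

# Op-token view (Figure 1 (a)/(b)): the expression tree is flattened into individual
# operator/operand tokens, and numbers are abstracted into symbols in order of
# appearance (1 -> N0, 8 -> N1, 2 -> N2, 20 -> N3), losing their textual context.
op_tokens = ["x0", "-", "N2", "*", "x1", "=", "N1"]

# Expression-token view (Figure 1 (c)): each token is a whole operation, and number
# operands point back into the problem text (here, by character offset of the word).
expression_tokens = [
    ("VAR",),                                             # R0: create variable x0
    ("VAR",),                                             # R1: create variable x1
    ("*", ("num", problem.index("twice")), ("expr", 1)),  # R2: 2 * x1; '2' points at "twice"
    ("-", ("expr", 0), ("expr", 2)),                      # R3: x0 - 2*x1
    ("=", ("expr", 3), ("num", problem.index("eight"))),  # R4: x0 - 2*x1 = 8; '8' points at "eight"
]
```

Each Expression entry corresponds to one node of the expression tree, and its operands refer either to earlier Expression results or to positions in the problem text; Section 3 describes the tokenization actually used by EPT.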
In this paper, we propose a pure neural model called the Expression-Pointer Transformer (EPT) to address the two issues above. The contribution of this paper is two-fold:

1. We propose a pure neural model, the Expression-Pointer Transformer (EPT), which can address the expression fragmentation and operand-context separation issues.

2. The EPT model is the first pure neural model to show accuracy comparable to the existing state-of-the-art models, which use hand-crafted features. Compared to the state-of-the-art pure neural models, the EPT achieves better performance by about 40%.

In the rest of the paper, we introduce existing approaches to solving algebraic word problems in Section 2. Next, Section 3 introduces our proposed model, EPT, and Section 4 reports the experimental settings. Then, in Section 5, the results of two studies are presented: Section 5.1 presents a performance comparison between EPT and existing SoTA models, and Section 5.2 presents an ablation study examining the effects of Expression tokens and of applying operand-context pointers. Finally, in Section 6, a conclusion is presented with possible future directions for our work.

2 Related work

Our goal is to design a pure neural model that generates equations using 'Expression' tokens to solve algebraic word problems. Early attempts at solving algebraic word problems noted the importance of Expressions in building models with hand-crafted features (Kushman et al., 2014; Roy et al., 2015; Roy and Roth, 2015; Zhou et al., 2015; Upadhyay et al., 2016). However, recent neural models have only utilized 'Op (operator/operand)' tokens (Wang et al., 2017; Amini et al., 2019; Chiang and Chen, 2019; Huang et al., 2018; Wang et al., 2019), resulting in two issues: (1) the expression fragmentation issue and (2) the operand-context separation issue. In the remainder of this section, we present existing methods for tackling each of these two issues.

To address the expression fragmentation issue, researchers have tried to reflect relational information between operators and operands either by using a two-step procedure or a single step with sequence-to-sequence models. Earlier attempts predicted operators and their operands using a two-step procedure. Such early models selected operators first by classifying a predefined template (Kushman et al., 2014; Zhou et al., 2015; Upadhyay et al., 2016); then, in the second step, operands were applied to the template selected in the first step. Other models selected operands first before constructing expression trees with operators in the second step (Roy et al., 2015; Roy and Roth, 2015). However, such two-step procedures of these early attempts can be performed via a single-step procedure with neural models. Specifically, recent attempts have utilized sequence-to-sequence (seq2seq) models as a single-step procedure to learn the implicit relationship between operators and operands (Amini et al., 2019; Chiang and Chen, 2019; Wang et al., 2019). For example, to capture the operator-operand relationship, Chiang and Chen (2019) constructed a seq2seq model that used push/pop actions on a stack for generating operator/operand tokens. Similarly, Amini et al. (2019) built a seq2seq model to generate an operator token right after producing the required operand tokens. However, although these seq2seq approaches consider relational information about operands when generating operators, they still do not address the lack of relational information about operators when generating operands. On the other hand, by using Expression tokens, our model can consider relational information when generating both operators and operands.

Secondly, there have been efforts to address the operand-context separation issue. To utilize the contextual information of an operand token, researchers built hand-crafted features that capture the semantic content of a word, such as the unit of a given number (Roy and Roth, 2015; Koncel-Kedziorski et al., 2015; Zhou et al., 2015; Upadhyay et al., 2016; Roy and Roth, 2017) or the dependency relationships between numbers (Kushman et al., 2014; Zhou et al., 2015; Upadhyay et al., 2016). However, devising hand-crafted input features is time-consuming and requires domain expertise. Therefore, recent approaches have employed distributed representations and neural models to learn the numeric context of operands automatically (Wang et al., 2017; Huang et al., 2018; Chiang and Chen, 2019; Amini et al., 2019). For example, Huang et al. (2018) used a pointer-generator network that can point to the context of a number in a given math problem. Although Huang's model can address the operand-context separation issue using pointers, their pure neural model did not yield performance comparable to the state-of-the-art model using hand-crafted features (44.5% vs. 83.0%). In this paper, we propose that by including additional pointers that utilize the contextual information of operands and neighboring Expression tokens, the performance of pure neural models can be improved.

3 EPT: Expression-Pointer Transformer

Figure 2 shows the proposed Expression-Pointer Transformer (EPT)¹ model, which adopts the encoder-decoder architecture of a Transformer model (Vaswani et al., 2017). The EPT utilizes the ALBERT model (Lan et al., 2019), a pretrained language model, as the encoder. The encoder input is the tokenized words of the given word problem, and the encoder output is the encoder's hidden-state vectors, which denote the numeric contexts of the given problem.

Figure 2: The architecture of the Expression-Pointer Transformer (EPT), in which two ideas are applied: (1) Expression tokens and (2) operand-context pointers.

¹ The code is available at https://github.com/snucclab/ept.
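As a rough picture of this encoder-decoder wiring, the following is a minimal sketch only, not the released implementation; it assumes PyTorch and the Hugging Face transformers library, and the module names, layer count, head count, and zero-filled decoder inputs are placeholder assumptions.

```python
# Minimal sketch of the EPT encoder-decoder wiring described above (illustrative only;
# see https://github.com/snucclab/ept for the released implementation).
import torch
import torch.nn as nn
from transformers import AlbertModel, AutoTokenizer

class EPTSkeleton(nn.Module):
    def __init__(self, pretrained="albert-base-v2", num_decoder_layers=6):
        super().__init__()
        # Pretrained ALBERT encoder: turns the tokenized word problem into hidden-state vectors.
        self.encoder = AlbertModel.from_pretrained(pretrained)
        hidden = self.encoder.config.hidden_size  # 768 for albert-base-v2
        # Transformer decoder; EPT modifies its input vectors and output layer (Section 3.1 onward).
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_decoder_layers)

    def forward(self, input_ids, attention_mask, expression_inputs):
        # expression_inputs: (batch, num_expressions, hidden) input vectors v_i.
        memory = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Returns decoder hidden states d_i; the operator/operand prediction heads and the
        # causal target mask needed for autoregressive decoding are omitted in this sketch.
        return self.decoder(tgt=expression_inputs, memory=memory)

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
batch = tokenizer(["One number is eight more than twice another and their sum is 20."],
                  return_tensors="pt")
model = EPTSkeleton()
d = model(batch["input_ids"], batch["attention_mask"],
          expression_inputs=torch.zeros(1, 8, model.encoder.config.hidden_size))
print(d.shape)  # torch.Size([1, 8, 768])
```

The parts that make EPT distinctive, namely how the decoder inputs and outputs are built from Expression tokens and operand-context pointers, are described in the following subsections.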
After obtaining the encoder's hidden-state vectors from the ALBERT encoder, the Transformer decoder generates 'Expression' tokens. The two decoder inputs are Expression tokens and the ALBERT encoder's hidden-state vectors, which are used as memories. For the given example problem, the input is the list of 8 Expression tokens shown in Table 2. We included three special commands in the list: VAR (generate a variable), BEGIN (start an equation), and END (gather all equations). Following the order specified in the list of Table 2, the EPT receives one input Expression at a time. For the i-th Expression input, the model computes an input vector v_i. The EPT's decoder then transforms this input vector to a decoder hidden-state vector d_i. Finally, the EPT predicts the next Expression token by generating the next operator and operands simultaneously.

Input position | Output index | Operator (f_i) | Operand 0 (a_i0) | Operand 1 (a_i1) | Meaning
0              | -            | BEGIN          | -                | -                | (Start an equation)
1              | R_0          | VAR            | -                | -                | (Generate variable x_0)
2              | R_1          | VAR            | -                | -                | (Generate variable x_1)
3              | R_2          | ×              | 2                | R_1              | 2x_1
4              | R_3          | -              | R_0              | R_2              | x_0 - 2x_1
5              | R_4          | =              | R_3              | 8                | x_0 - 2x_1 = 8
6              | R_5          | +              | R_0              | R_1              | x_0 + x_1
7              | R_6          | =              | R_5              | 20               | x_0 + x_1 = 20
-              | R_7          | END            | -                | -                | (Gather all equations)

Table 2: The Expression token sequence for x_0 - 2x_1 = 8 and x_0 + x_1 = 20
Third, the result of the prior Expression EPTcomputestheoperator embedding vector fi as token is an Expression generated before the ith in Vaswani et al. (2017)’s setting: Expression (e.g., R ). To compute a of a result, 0 ij 3 weutilize the positional encoding as follows : f =LN (c E (f )+PE(i)), (2) i f f f i a =LN (c u +PE(k)), (5) ij a a expr where E∗(·) indicates a look-up table for embed- 2Whentwoormoretokensformanumberintheproblem, ding vectors, c denotes a scalar parameter, and weaveraged all related hidden-state vectors. ∗ 3Since we want to sustain simultaneous decoding, which LN (·) and PE(·) represent layer normalization ∗ is one of the strengths in the Transformer, we use PE(k) for (Ba et al., 2016) and positional encoding (Vaswani the kth prior Expression, although it is possible to use decoder et al., 2017), respectively. hidden state dk. 3771