130x Filetype PDF File size 0.40 MB Source: www.stroustrup.com
Open Pattern Matching for C++ Yuriy Solodkyy Gabriel Dos Reis Bjarne Stroustrup Texas A&MUniversity College Station, Texas, USA {yuriys,gdr,bs}@cse.tamu.edu Abstract 1.1 Summary Pattern matching is an abstraction mechanism that can greatly sim- We present functional-style pattern matching for C++ built as an plify source code. We present functional-style pattern matching for ISOC++11library. Our solution: C++ implemented as a library, called Mach71. All the patterns are • is open to the introduction of new patterns into the library, while user-definable, can be stored in variables, passed among functions, not making any assumptions about existing ones. and allow the use of class hierarchies. As an example, we imple- • is type safe: inappropriate applications of patterns to subjects ment commonpatterns used in functional languages. are compile-time errors. Our approach to pattern matching is based on compile-time • Makes patterns first-class citizens in the language (§3.1). composition of pattern objects through concepts. This is superior • is non-intrusive, so that it can be retroactively applied to exist- (in terms of performance and expressiveness) to approaches based ing types (§3.2). on run-time composition of polymorphic pattern objects. In partic- • provides a unified syntax for various encodings of extensible ular, our solution allows mapping functional code based on pattern hierarchical datatypes in C++. matching directly into C++ and produces code that is only a few • provides an alternative interpretation of the controversial n+k percent slower than hand-optimized C++ code. patterns (in line with that of constructor patterns), leaving the The library uses an efficient type switch construct, further ex- choice of exact semantics to the user (§3.3). tending it to multiple scrutinees and general patterns. We compare • supports a limited form of views (§3.4). the performance of pattern matching to that of double dispatch and • generalizes open type switch to multiple scrutinees and enables open multi-methods in C++. patterns in case clauses (§3.5). Categories and Subject Descriptors D.1.5 [Programming tech- • demonstratesthatcompile-timecompositionofpatternsthrough niques]:Object-orientedProgramming; D.3.3[ProgrammingLan- conceptsissuperiortorun-timecompositionofpatternsthrough guages]: Language Constructs and Features polymorphic interfaces in terms of performance, expressive- ness, and static type checking (§4.1). General Terms Languages, Design Our library sets a standard for the performance, extensibility, Keywords Pattern Matching, C++ brevity, clarity, and usefulness of any language solution for pat- tern matching. It provides full functionality, so we can experiment 1. Introduction with the use of pattern matching in C++ and compare it to existing alternatives. Our solution requires only current support of C++11 Pattern matching is an abstraction mechanism popularized by without any additional tool support. the functional programming community, most notably ML [12], OCaml [21], and Haskell [15], and recently adopted by several 2. Pattern Matching in C++ multi-paradigm and object-oriented programming languages such The object analyzed through pattern matching is commonly called as Scala [30], F# [7], and dialects of C++[22, 29]. The expressive the scrutinee or subject, while its static type is commonly called powerofpatternmatchinghasbeencitedasthenumberonereason the subject type. Consider for example the following definition of for choosing a functional language for a task [6, 25, 28]. factorial in Mach7: This paper presents functional-style pattern matching for C++. To allow experimentation and to be able to use production-quality int factorial(int n) { toolchains (in particular, compilers and optimizers), we imple- unsigned short m; mented our matching facilities as a C++ library. Match(n) { Case(0) return 1; 1 The library is available at http://parasol.tamu.edu/mach7/ Case(m) return m∗factorial(m−1); Case( ) throw std::invalid argument(”factorial”); Permission to make digital or hard copies of all or part of this work for personal or }} EndMatch classroom use is granted without fee provided that copies are not made or distributed The subject n is passed as an argument to the Match statement for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM and is then analyzed through Case clauses that list various pat- mustbehonored.Abstractingwithcreditispermitted.Tocopyotherwise,orrepublish, terns. In the first-fit strategy typically adopted by functional lan- to post on servers or to redistribute to lists, requires prior specific permission and/or a guages, the matching proceeds in sequential order while the pat- fee. Request permissions from permissions@acm.org. terns guarding their respective clauses are rejected. Eventually, the GPCE’13, October 27–28, 2013, Indianapolis, Indiana, USA. statement guarded by the first accepted pattern is executed or the Copyright © 2013 ACM 978-1-4503-2373-4/13/10...$15.00. http://dx.doi.org/10.1145/2517208.2517222 control reaches the end of the Match statement. Thevalue0inthefirstcaseclauseisanexampleofavaluepat- This == is an example of a binary method: an operation that re- tern. It will match only when the subject n is 0. The variable m in quires both arguments to have the same type [3]. In each of the the second case clause is an example of a variable pattern. It will case clauses, we check that both subjects are of the same dynamic bind to any value that can be represented by its type. The name in type using a constructor pattern. We then decompose both subjects the last case clause refers to the common instance of the wildcard into components and compare them for equality with the help of a pattern.Value,variable,andwildcardpatternsaretypicallyreferred variable pattern and an equivalence combinator (‘+’) applied to it. to as primitive patterns. The list of primitive patterns is often ex- Theuseofanequivalencecombinatorturnsabindinguseofavari- tendedwithapredicatepattern(e.g.asseeninScheme[49]),which able pattern into a non-binding use of that variable’s current value allows the use of any unary predicate or nullary member-predicate as a value pattern. We chose to overload unary + because in C++ it as a pattern: e.g. Case(even) ... (assuming bool even(int);) or turns an l-value into an r-value, which has a similar semantics here. Case([](int m) { return mˆm−1; }) ... for λ-expressions. In general, a pattern combinator is an operation on patterns The predicate pattern is a use of a predicate as a pattern and to produce a new pattern. Other typical pattern combinators, sup- should not be confused with a guard, which is a predicate attached ported by many languages, are conjunction, disjunction and nega- to a pattern that may make use of the variables bound in it. The tion combinators, which all have an intuitive Boolean interpreta- result of the guard’s evaluation will determine whether the case tion. We add a few non-standard combinators to Mach7 that reflect clause and the body associated with it will be accepted or rejected. the specifics of C++, e.g. the presence of pointers and references. Guards gives rise to guard patterns, which in Mach7 are expres- The equality operator on λ-terms demonstrates both nesting of sions of the form P|=E, where P is a pattern and E is its guard. patterns and relational matching. The variable pattern was nested Pattern matching is closely related to algebraic data types. within an equivalence pattern, which in turn was nested inside In ML and Haskell, an Algebraic Data Type is a data type each a constructor pattern. The matching was also relational because of whose values are picked from a disjoint sum of data types, we could relate the state of two subjects. Both aspects are even called variants. Each variant is a product type marked with a better demonstrated in the following well-known functional so- unique symbolic constant called a constructor. Each construc- lution to balancing red-black trees with pattern matching due to tor provides a convenient way of creating a value of its vari- Chris Okasaki [32, §3.3] implemented in Mach7: ant type as well as discriminating among variants through pat- class T{enum color{black,red} col; T∗ left; K key; T∗ right;}; tern matching. In particular, given an algebraic data type D = T∗ balance(T::color clr, T∗ left, const K& key, T∗ right) { C T ,...,T S⋯SC T ,...,T anexpressionoftheform 1 11 1m1 k k1 kmk const T::color B = T::black, R = T::red; Cx ,...,x inanon-pattern-matchingcontextiscalledavalue i 1 mi var⟨T∗⟩ a, b, c, d; var⟨K&⟩ x, y, z; T::color col; constructor andreferstoavalueoftypeD createdviatheconstruc- tor C and its arguments x ,...,x . The same expression in the Match(clr, left, key, right) { i 1 mi Case(B, C⟨T⟩(R, C⟨T⟩(R, a, x, b), y, c), z, d) ... pattern-matching context is called a constructor pattern and is used Case(B, C⟨T⟩(R, a, x, C⟨T⟩(R, b, y, c)), z, d) ... to check whether the subject is of type D and was created with the Case(B, a, x, C⟨T⟩(R, C⟨T⟩(R, b, y, c), z, d)) ... constructor Ci. If so, it matches the actual values it was constructed with against the nested patterns xj. Case(B, a, x, C⟨T⟩(R, b, y, C⟨T⟩(R, c, z, d))) ... C++ does not directly support algebraic data types. However, Case(col, a, x, b) return new T{col, a, x, b}; such types can be encoded in the language in a number of ways. } EndMatch Common object-oriented encodings employ an abstract class to } represent the algebraic data type and derived classes to represent The... in the first four case clauses above stands for variants. Consider for example the following representation of the return new T{R, new T{B,a,x,b}, y, new T{B,c,z,d}};. terms of the λ-calculus in C++: To demonstrate the openness of the library, we implemented struct Term { virtual ∼Term() {} }; numerous specialized patterns that often appear in practice and struct Var : Term { std:: string name; }; are even built into some languages. For example, the following struct Abs : Term { Var& var; Term& body; }; combination of regular-expression and one-of patterns can be used struct App : Term { Term& func; Term& arg; }; to recognize a toll-free phone number. C++ allows a class to have several constructors, but it does not rex(”([0−9]+)−([0−9]+)−([0−9]+)”,any({800,888,877}),n,m) allow overloading the meaning of construction for use in pat- tern matching. This is why in Mach7 we have to be slightly The regular-expression pattern takes a C++11 regular expression more explicit about constructor patterns, which take the form andanarbitrary numberofsub-patterns. It uses matching groups to C⟨T ⟩(P ,...,P ), where T is the name of the user-defined matchagainst the sub-patterns. A one-of pattern takes an initializer i 1 mi i type we are decomposing and P ,...,P are patterns that will list with a set of values and checks that the subject matches at least 1 mi be matched against members of T . ‘C’ was chosen to abbreviate one of them. The variables n and m are integers, and the values i “Constructor pattern” or “Case class” as its use resembles the use of the last two parts of the pattern will be assigned to them. The of case classes in Scala [30]. For example, we can write a com- parsing is generic and will work with any data type that can be read plete (with the exception of bindings discussed in §3.2) recursive from an input stream; this is a common idiom in C++. Should we implementation of testing for the equality of two lambda terms as: alsoneedtheexactareacode,wecanmixinavariablepatternusing bool operator==(const Term& left, const Term& right) { the conjunction combinator: a && any(...). var⟨const std:: string &⟩ s; var⟨const Term&⟩ x,y; 3. Implementation Match(left , right ) { Case(C⟨Var⟩(s) , C⟨Var⟩(+s) ) return true; The traditional object-oriented approach to implementing first- Case(C⟨Abs⟩(x,y), C⟨Abs⟩(+x,+y)) return true; class patterns is based on run-time compositions through inter- Case(C⟨App⟩(x,y), C⟨App⟩(+x,+y)) return true; faces. This “patterns as objects” approach has been explored in Otherwise() return false ; several different languages [11, 14, 34, 47]. Implementations differ } EndMatch in where bindings are stored and what is returned as a result, but in } its most basic form it consists of the pattern interface with a virtual function match that accepts a subject and returns whether it was template⟨typename P, typename S⟩ accepted or rejected. This approach is open to new patterns and using AcceptedType = P::accepted type for⟨S⟩::type; pattern combinators, but a mismatch in the type of the subject and The wildcard pattern defines accepted type for to be an identity the type accepted by the pattern can only be detected at run-time. function, while variable and value patterns define it to be their Furthermore, it implies significant run-time overhead (§4.1). underlying type. The constructor pattern’s accepted type is the type 3.1 Patterns as Expression Templates it decomposes, which is typically different from the subject type. PatternsinMach7arealsorepresentedasobjects;however,theyare Mach7 employs an efficient type switch [41] under the hood to composed at compile time, based on C++ concepts. Concept is the convert subject type to accepted type. C++ community’s long-established term for a set of requirements Guards, n+k patterns, the equivalence combinator, and po- for template parameters. Concepts were not included in C++11, tentially some new user-defined patterns depend on capturing but techniques for emulating them with enable if [18] have been in the structure (term) of lazily-evaluated expressions. All such use for a while. enable if provides the ability to include or exclude expressions are objects of some type E that must satisfy the certain class or function declarations from the compiler’s consid- LazyExpressionconstraint: eration based on conditions defined by arbitrary metafunctions. template ⟨typename E⟩ constexpr bool LazyExpression() { To avoid the verbosity of enable if, in this work we use the no- return Copyable⟨E⟩ // E must also be Copyable tation for template constraints – a simpler version of concepts [42]. &&is expression⟨E⟩::value // this is semantic constraint TheMach7 implementation emulates these constraints. &&requires (E e) { // syntactic requirements: There are two main constraints on which the entire library is ResultType⟨E⟩; // associated result type built: Pattern and LazyExpression. ResultType⟨E⟩ == { eval(e) };// eval(E)→ result type template ⟨typename P⟩ constexpr bool Pattern() { }; }ResultType⟨E⟩ { e }; // conversion to result type return Copyable⟨P⟩ // P must also be Copyable template⟨typename E⟩ using ResultType = E::result type; &&is pattern⟨P⟩::value // this is a semantic constraint &&requires (typename S, P p, S s) {// syntactic reqs: The constraint is, again, semantic, and the classes claiming to bool = { p(s) }; // usable as a predicate on S satisfy it must assert it through the is expression⟨E⟩ trait. The AcceptedType⟨P,S⟩; // has this type function template alias ResultType⟨E⟩ is defined to return the expres- }; } sion’s associated type result type, which defines the type of the ThePatternconstraintistheanalogofthepatterninterfacefrom result of a lazily-evaluated expression. Any class satisfying the the patterns as objects solution. Objects of any class P satisfying LazyExpression constraint must also provide an implementation this constraint are patterns and can be composed with any other of the function eval that evaluates the result of the expression. Con- patterns in the library as well as be used in the Match statement. version to the result type should call eval on the object in order Patterns can be passed as arguments of a function, so they must to allow the use of lazily-evaluated expressions in the contexts be Copyable. Implementation of pattern combinators requires the wheretheir eagerly-evaluated value is expected, e.g. a non-pattern- library to overload certain operators on all the types satisfying the matching context of the right-hand side of the Case clause. Patternconstraint.Toavoidoverloadingtheseoperatorsfortypes Our implementation of the variable pattern var⟨T⟩ satisfies the that satisfy the requirements accidentally, the Patternconstraintis PatternandLazyExpressionconstraints as follows: a semantic constraint, which means that classes claiming to satisfy template ⟨Regular T⟩ struct var { it have to state that explicitly by specializing the is pattern⟨P⟩ template ⟨typename⟩ trait. The constraint also introduces some syntactic requirements, struct accepted type for { typedef T type; }; described by the requires clause. In particular, because patterns bool operator()(const T& t) const // exact match are predicates on their subject type, they require presence of an { m value = t; return true; } application operator that checks whether a pattern matches a given template ⟨Regular S⟩ subject. Unlike the patterns as objects approach, the Pattern bool operator()(const S& s) const // with conversion constraint does not impose any restrictions on the subject type S. { m value = s; return m value == s; } Patterns like the wildcard pattern will leave the S type completely typedef T result type; // type when used in expression unrestricted, while other patterns may require it to satisfy certain friend const result type& eval(const var& v) // eager eval constraints, model a given concept, inherit from a certain type, etc. { return v.m value; } The application operator will typically return a value of type bool operator result type() const { return eval(∗this); } indicating whether the pattern is accepted on a given subject or mutable T m value; // value bound during matching rejected. }; Most of the patterns are applicable only to subjects of a given template⟨Regular T⟩struct is pattern⟨var⟨T⟩⟩:true type{}; expected type or types convertible to it. This is the case, for ex- template⟨Regular T⟩struct is expression⟨var⟨T⟩⟩:true type{}; ample, with value and variable patterns, where the expected type is the type of the underlying value, as well as with the construc- For semantic or efficiency reasons a pattern may have several over- tor pattern, where the expected type is the type being decomposed. loads of the application operator. In the example, the first alter- Some patterns, however, do not have a single expected type and native is used when no conversion is required; thus, the variable may work with subjects of many unrelated types. A wildcard pat- pattern is guaranteed to be accepted. The second may involve tern, for example, can accept values of any type without involv- a (possibly-narrowing) conversion, which is why we check that ing a conversion. To account for this, the Pattern constraint re- the values compare as equal after assignment. Similarly, for type quires the presence of a type alias AcceptedType, which given a checking reasons, accepted type for may (and typically will) pro- pattern of type P and a subject of type S returns an expected type vide several partial or full specializations to limit the set of accept- AcceptedType⟨P,S⟩ that will accept subjects of type S with no or able subjects. For example, the address combinator can only be ap- a minimumofconversions. By default, the alias is defined in terms plied to subjects of pointer types, so its implementation will report of a nested type function accepted type for, as follows: a compile-time error when applied to any non-pointer type. To capture the structure of an expression, the library employs a given class hierarchy (e.g. by its designer) and can be reused ev- a commonly-used technique called “expression templates” [45, erywhere. This is also true for parameterized classes (§3.4). 46]. In general, an expression template is an algebraic structure 3.3 Algebraic Decomposition ⟨Σζ,f1,f2,...⟩ defined over the set Σζ = τ S τ ⊧ ζ of all the types τ modeling a given concept ζ. The operations fi allow one to Traditional approaches to generalizing n+k patterns treat match- compose new types modeling the concept ζ out of existing types. ing a pattern fx,y against a value r as solving an equation In this sense, the types of all lazy expressions in Mach7 stem from fx,y = r [33]. This interpretation is well-defined when there a set of a few (possibly-parameterized) basic types like var⟨T⟩ and are zero or one solutions, but alternative interpretations are possi- value⟨T⟩ (which both model LazyExpression) by applying type ble when there are multiple solutions. Instead of discussing which functors like plus and minus to them. Every type in the resulting interpretation is the most general or appropriate, we look at n+k family then has a function eval defined on it that returns a value patterns as a notational decomposition of mathematical objects. of the associated type result type. Similarly, the types of all the The elements of the notation are associated with sub-components patterns stem from a set of a few (possibly-parameterized) patterns of the matched mathematical entity, which effectively lets us de- like wildcard, var⟨T⟩, value⟨T⟩,C⟨T⟩etc.byapplyingtothempat- compose it into parts. The structure of the expression tree used in tern combinators such as conjunction, disjunction, equivalence, the notation is an analog of a constructor symbol in structural de- addressetc. The user is allowed to extend both algebras with either composition, while its leaves are placeholders for parameters to basic expressions and patterns or with functors and combinators. be matched against or inferred from the mathematical object in The sets ΣLazyExpression and ΣPattern have a non-empty question. In essence, algebraic decomposition is to mathematical intersection, which slightly complicates matters. The basic types objects what structural decomposition is to algebraic data types. var⟨T⟩ and value⟨T⟩ belong to both of those sets, and so do some While the analogy is somewhat ad-hoc, it resembles the situation of the combinators, e.g. conjunction. Since we can only have with operator overloading: you do not strictly need it, but it is so one overloaded operator&& for a given combination of argument convenient it is virtually impossible not to have it. We demonstrate types, we have to state conditionally whether the requirements of this alternative interpretation of the n+k patterns with examples. Pattern,LazyExpression,orbotharesatisfiedinagiveninstan- • Anexpression n~misoftenusedtodecomposearationalnum- tiation of conjunction⟨T ,T ⟩, depending on what combination of 1 2 ber into numerator and denominator. these concepts the argument types T and T model. Concepts, un- • 1 2 An expression of the form 3q + r can be used to obtain the like interfaces, allow modeling such behavior without multiplying quotient and remainder of dividing by 3. When r is a constant, implementations or introducing dependencies. it can also be used to check membership in a congruence class. 3.2 Structural Decomposition • The Euler notation a + bi, with i being the imaginary unit, is used to decompose a complex number into real and imaginary Mach7’sconstructor patterns C⟨T⟩(P ,...,P ) requires the library iφ 1 n parts. Similarly, expressions rcosφ+isinφ and re are used to know which member of class T should be used as the subject to decompose it into polar form. to P , which should be matched against P , etc. In functional lan- • A 2D line can be decomposed with the slope-intercept form 1 2 guages supporting algebraic data types, such decomposition is un- mX+c,the linear equation form aX + bY = c, or the two- ambiguous as each variant has only one constructor, which is thus points form Y −y x −x = y −y X −x . 0 1 0 1 0 0 also used as a deconstructor [2, 13] to define the decomposition • An object representing a polynomial can be decomposed for a of that type through pattern matching. In C++, a class may have specific degree: a , a X1 +a , a X2 +a X1 +a , etc. 0 1 0 2 1 0 several constructors, so we must be explicit about a class’ decom- • An element of a vector space can be decomposed along some position. We specify that by specializing the library template class sub-spaces of interest. For example a 2D vector can be matched bindings. Here are the definitions that are required in order to be against 0,0, aX, bY , or aX+bY to separate the general case able to decompose the lambda terms we introduced in §2: from cases when one or both components of the vector are 0. template⟨⟩class bindings⟨Var⟩{Members(Var::name);}; The expressions i, X, and Y in those examples are not variables, template⟨⟩class bindings⟨Abs⟩{Members(Abs::var,Abs::body);}; but rather are named constants of some dedicated type that allows template⟨⟩class bindings⟨App⟩{Members(App::func,App::arg);}; the expression to be generically decomposed into orthogonal parts. Thelinear equation and two-point forms for decomposing lines The variadic macro Members simply expands each of its argu- already include an equality sign, so it is hard to give them seman- mentsintothefollowingdefinition,demonstratedhereonApp::func: tics in an equational approach. In our library that equality sign is static decltype(&App::func) member1(){return &App::func;} not different from any other operator, like + or ∗, and is only used to capture the structure of the expression, while the exact semantics Each such function returns a pointer-to-member that should be of matching against that expression is given by the user. This flexi- boundinpositioni. The library applies them to the subject in order bility allows us to generically encode many of the interesting cases to obtain subjects for the sub-patterns P ,...,P . Note that binding of the equational approach. The following example, written with 1 n use of Mach7, defines a function for fast computation of Fibonacci definitions made this way are non-intrusive since the original class numbers by using generalized n+k patterns: definition is not touched. The binding definitions also respect en- capsulation since only the public members of the target type will int fib(int n) { be accessible from within a specialization of bindings. Members var⟨int⟩ m; do not have to be data members only, which can be inaccessible, Match(n) { but any of the following three categories: Case(any({1,2})) return 1; • a data member of the target type T Case(2∗m) return sqr(fib(m+1)) − sqr(fib(m−1)); • a nullary member function of the target type T Case(2∗m+1) return sqr(fib(m+1)) + sqr(fib(m)); • a unary external function taking the target type T by pointer, } EndMatch // sqr(x) = x∗x reference, or value. } Unfortunately, C++ does not yet provide sufficient compile-time introspection capabilities to let the library generate bindings im- The Mach7 library already takes care of capturing the structure plicitly. These bindings, however, only need to be written once for of lazy expressions (i.e. terms). To implement the semantics of
no reviews yet
Please Login to review.