jagomart
digital resources
picture1_Programming Pdf 183925 | Programming Perls


 136x       Filetype PDF       File size 0.44 MB       Source: frank.itlab.us


File: Programming Pdf 183925 | Programming Perls
programming by an bentley pearls self describing data you just spent three cpu hours running a simula title efficient string matching tion to forecast your company s financial future and ...

icon picture PDF Filetype PDF | Posted on 01 Feb 2023 | 2 years ago
Partial capture of text on file.
                                                                                                 programming 
             by ]an Bentley                                                                      pearls 
             SELF-DESCRIBING DATA 
             You  just  spent  three  CPU hours  running  a simula-                               %title           Efficient          string       matching: 
             tion  to forecast  your  company’s  financial                 future,  and                            an  aid      to    bibliographic             search 
             your  boss asks you  to interpret  the  output:                                      %journal         Communications              of   the     ACM 
                                                                                                   Xvolume          18 
             Scenario         1:       3.2%     -12.0%           1.1%                              %number         6 
             Scenario         2:      12.7%         0.0%         8.6%                              %month          June 
             Scenario         3:       1.6%       -8.3%          9.2%                              %year            1975 
                                                                                                   %pages           333-340 
             Hmmm.                                                                                Blank  lines  separate  entries  in  the  file.  A  line  that 
                 You  dig  through  the  program  to find  the  meaning                           begins  with  a percent  sign  contains  an identifying 
             of each output  variable.  Good news-Scenario                         2              term  followed  by  arbitrary  text.  Text  may  be con- 
             paints  a rosy  picture  for  the  next  fiscal  year.  Now  all                     tinued  on subsequent  lines  that  do not  start  with 
             you  have  to do is uncover  the  assumptions  of each.                              a percent  sign. 
             Oops-the         disaster  in  Scenario  1 is your  company’s                            The  lines  in  the  bibliography           file  are name-value 
             current  strategy,  doomed  to failure.  What  did  Sce-                             pairs:  each  line  contains  the  name  of an attribute 
             nario  2 do that  was so effective?  Back to the  pro-                               followed  by  its  value.  The  names  and  the  values  are 
             gram, trying  to discover  which  input  files  each  one                            sufficiently      self-describing       that  I  don’t  need  to elabo- 
             reads. . . .                                                                          rate further  on them.  This  format  is particularly                   well 
                 Every  programmer  knows  the  frustration                   of trying           suited  to bibliographies           and  other  complex  data 
             to decipher  mysterious  data. The  first  two  sections  of                         models.  It  supports  missing  attributes  (books have  no 
             this  column  discuss  two  techniques  for  embedding                               volume  number  and journals  have  no city),  multiple 
             descriptions  in  data files.  The  third  section  then                             attributes  (such  as authors),  and  an arbitrary  order  of 
             applies  both  methods  to a concrete  problem.                                      fields  (one need  not  remember  whether  the  volume 
                                                                                                  number  comes before  or after  the  month). 
             Name-Value  Pairs                                                                        Name-value  pairs  are useful  in  many  databases. 
             Many  document  production  systems support  biblio-                                 One might,  for instance,  describe  the  aircraft  carrier 
             graphic  references  in  a form  something  like:                                    USS Nimitz  in  a database of naval  vessels with  these 
                                                                                                  pairs: 
             Xtitle           The  Art       of    Computer        Programming,                    name                    Nimitz 
                              Volume       3:    Sorting        and  Searching                     class                   CVN 
              Xauthor         D.  E.  Knuth                                                        number                  68 
              %pub            Addison-Wesley                                                       displacement            81600 
              %city           Reading,         Mass.                                               length                  1040 
              %year            1973                                                                beam                    134 
              Xauthor         A.    V.   Aho                                                       draft                   36.5 
              %author         M.  J.     Corasick                                                  flightdeck              252 
                                                                                                   speed                   30 
                                                                                                   officers                447 
              0  1967  ACM  OOOl-0782/87/0600-0479   75a                                           enlisted                5176 
             June 1987  Volume  30  Number  6                                                                                                   Communications  of the ACM              479 
               Programming  Pearls 
                           Such  a record  could  be used for  input,  storage, and                                           format.  The  regular  structure  supports  the  tabular 
                           output.  A  user  could  prepare  a record  for  entry  into                                       format,  but  observe  that  the  header  line  is another 
                           the  database using  a standard  text  editor.  The  data-                                         kind  of self-description                embedded  in  the  data. 
                           base system  could  store records  in  exadtly  this  form                                             Name-value  pairs  are a handy  way  to give  input 
                           (we’ll  soon see a representation                      that  is more  space                        to any  program.  (They  are perhaps  the  tiniest  of the 
                           efficient).  The  same record  could  be included  in  the                                         “little    languages”  described  in  the  September  1986 
                           answer  to the  query  “What  ships  have  a displace-                                             column.)  They  can help  meet the  criteria  that 
                           ment  of more  than  75,000  tons?”                                                                Kernighan  and Plauger  propose  in  Chapter  5 of their 
                               Name-value  pairs  offer  several  advantages  for this                                        Elements of Progrumming Style (2nd  ed., McGraw-Hill, 
                           hypothetical           application.         A single  format  can be                               1978): 
                           used for  reading,  storing,  and  writing                       records;  this                       Use mnemonic  input  and output.  Make  input  easy to 
                           simplifies  life  for  user  and  implementer                        alike.  The                      prepare  (and to prepare  correctly).  Echo the  input  and 
                           application         is inherently          variable-format             because                        any  defaults  onto  the  output;  make  the  outpui  self- 
                           different  ships  have  different  attributes:  submarines                                            explanatory. 
                           have  no flight  decks  and  aircraft  carriers  have  no                                              Name-value  pairs  are useful  in  code far removed 
                           submerged  depth.  Unfortunately,                         the  example  does                       from  input/output.               Suppose we  wanted  to provide  a 
                           not  document  the  units  in  which  the  various  quanti-                                        subroutine  that  adds a ship  to a database. Most  lan- 
                           ties  are expressed;  we’ll  return  to that  shortly.                                             guages denote  the  (formal)  name  of a parameter  by 
                               Some database systems store records  on mass                                                   its  position  in  the  parameter  list.  This  mechanism 
                           memory  in  exactly  the  form  shown  above.  This  for-                                          leads to remarkably  clumsy  calls: 
                           mat  makes it  particularly                 easy to add new  fields  to 
                           records  in  an existing  database. The  name-value  for- 
                           mat  can be quite  space efficient,  especially  com-                                              addship("Nimitz",                      "CVN",         "68", 
                           pared  to fixed-format               records  that  have  many  fields,                                             81600,          1040,        134,      36.5, 
                           most of which  are usually  empty.  If storage is criti-                                                            447,       5176,,,30,,,252,,,,) 
                           cal,  however,  then  the  database  could  be squeezed 
                           to a compressed  format:                                                                           The  missing  parameters  denote  fields  not  present  in 
                                                                                                                              this  record.  Is 30  the  speed in  knots  or the  draft  in 
                                                                                                                              feet? A  little  discipline             in  commenting  helps  unravel 
                                                                                                                              the  mess: 
                          Each field  begins  with  a two-character                           name and                        addshipc         "Nimitz",               #  name 
                          ends with  a vertical  bar.  The  input  and  the  stored                                                            " CVN"  ,               #  class 
                          formats  are connected  by  a data dictionary,                              which                                    "68"     ,              #  number 
                          might  start:                                                                                                        81600,                  #  disp 
                                                                                                                                                1040,                  #  length 
                          ABBR  NAME                             UNITS                                                                              .  . . 1 
                          na         name                        text 
                          cl         class                       text                                                         Many  programming  languages  support  named 
                          nu         number                      text                                                         parameters,  which  make  the  job  easier: 
                          di         displacement                tons 
                           le        length                      feet 
                          be         beam                        feet                                                        addshipcname                 =  "Nimitz", 
                          dr         draft                       feet                                                                          class        =  "CVN", 
                          fl         flightdeck                  feet                                                                          number         =  "68", 
                          sP         speed                       knots                                                                         disp       =  81600, 
                          of         officers                    personnel                                                                     length         =  1040, 
                          en         enlisted                    personnel                                                                          . . . 1 
                          In  this  dictionary           the  abbreviations             are always  the 
                          first  two  characters  of the  name;  that  may  not  hold                                        Even  if  your  language  doesn’t  have  named  param- 
                          in  general.  Readers offended  by  hypocrisy  may  com-                                           eters, you  can  usually  simulate  them  with  a few 
                          plain  that  the  above  data is not  in  a name-value                                             routines: 
              480         Communications  of  the ACM                                                                                                                    lune  1987       Volume  30  Number  6 
                                                                                                                                                                                                                                                                               Programming  Pearls 
                    shipstart                                                                                                                                           “When  looking  at the  result  of a computation                                                                 like 
                    ship&r                (name,               "Nimitz")                                                                                          this,  it  is  helpful                    to have  an ‘audit  trail’  of the  vari- 
                    shipstr               (class,                 "CVN"  )                                                                                        ous command  lines  and  data files  encountered.  We 
                    shipstr               (number,                    "68"        )                                                                               therefore  built  a mechanism  for  ‘commenting’  the 
                    shipnum(disp,                              81600)                                                                                             files  with  various  annotations  so that  when  we  re- 
                    shipnum(length,                                   1040)                                                                                       view  the  output,  everything                                        is there  on  one  display 
                           . . .                                                                                                                                  or piece  of paper. 
                    shipendc  )                                                                                                                                         “We use several  types  of comments.  An  ‘audit 
                                                                                                                                                                  trail’  line  identifies                        a data file  or a command-line 
                    The  name variables  name,  class,                                                     number,  etc., are                                     transformation.                        A ‘dictionary’                      line  names the  attri- 
                    assigned unique  integers.                                                                                                                    butes  in  each  column  of the  output.  A  ‘frame  separa- 
                                                                                                                                                                  tor’  sets apart  a group  of sequential  records  associ- 
                                                                                                                                                                  ated with  a common  event.  A  ‘note’  allows  us to 
                    Provenances  in  Programming                                                                                                                  place  our  remarks  in  the  file.  All  comments  begin 
                    The  provenance  of a museum  piece  lists  the  origin                                                                                       with  an exclamation                                mark  and the  type  of the  com- 
                    or source  of the  object.  Antiques  are worth  more                                                                                         ment;  other  lines  are passed through  untouched  and 
                    when  they  have  a provenance  (this  chair  was built                                                                                       processed as data. Thus  the  output  of the  above 
                    in  such-and-such,                            then  purchased  by  so-and-so,                                                                 pipeline  might  look  like: 
                    etc.). You  might  think  of a provenance  as a pedigree 
                    for  a nonliving                      object. 
                          The  idea  of a provenance  is old  hat  to many  pro-                                                                                  ltrail              sim.events                        -k        1.5         -1       3 
                    grammers.  Some software  shops insist  that  the                                                                                             ltrail              sample                -t        .Ol 
                    provenance  of a program  be kept  in  the  source  code                                                                                      ltrail              bins            -t        .Ol 
                    itself:  in  addition  to other  documentation                                                         in  a mod-                             Idict            bin-bottom-value                                     item-count 
                    ule,  the  provenance  gives  the  history  of the  code                                                                                     0.00                     72 
                    (who  changed  what  when,  and why).  The  prove-                                                                                           0.01                     138 
                    nance  of a data file  is often  kept  in  an associated  file                                                                               0.02                     121 
                    (a transaction  log, for  instance).  Frank  Starmer,  a                                                                                               .  .  . 
                    Professor in  the  Departments  of Computer  Science                                                                                          lnote            there              is       a  cluster                     around               0.75 
                    and Medicine  at Duke  University,                                                   tells  how  his  pro-                                    lframe 
                    grams produce  data files  that  contain  their  own 
                    provenances: 
                          “We constantly  face the  problem  of keeping  track                                                                                    All  programs  in  our  library  automatically                                                         copy  exist- 
                    of our  manipulations                                of data. We typically                               explore                              ing  comments  from  their  input  onto  their  output, 
                    data sets by  setting  up  a UNIX@ pipeline  like                                                                                             and additionally                          add a new  t r a i  1 comment  to doc- 
                                                                                                                                                                  ument  their  own  action.  Programs  that  reformat  data 
                    sim.events                        -k        1.5        -1  3            I                                                                     (such  as bins)                       add a diet                    comment  to describe  the 
                    sample                -t        .Ol         I                                                                                                 new  format. 
                    bins                                                                                                                                               “We’ve  done  this  in  order  to survive.  This  disci- 
                                                                                                                                                                  pline  aids in  making  both  input  and  output  data files 
                    The  first  program  is a simulation                                              with  the  two  pa-                                         self-documenting.                            Many  other  people  have  built 
                    rameters  k and  1 (set in  this  example  to 1.5 and  3)’                                                                                    similar  mechanisms;  wherever  possible,  I have  cop- 
                    The  vertical  bar  at the  end  of the  first  line  pipes  the                                                                              ied  their  enhancements  rather  than  having  to figure 
                    output  into  the  second  program.  That  program  sam-                                                                                      out  new  ones myself.” 
                    ples the  data at the  designated  frequency,  and  in                                                                                             Tom Duff  of Bell  Labs uses a similar  strategy  in  a 
                    turn  pipes  its  output  to the  third  program,  which                                                                                      system for  processing  pictures.  He has developed  a 
                    chops the  input  into  bins  (suitable  for  graphical                                                                                       large  suite  of UNIX  programs  that  perform  transfor- 
                    display  as a histogram).                                                                                                                     mations  on pictures.  A picture  file  consists  of text 
                                                                                                                                                                  lines  listing  the  commands  that  made  the  picture 
                                                                                                                                                                  (terminated  by  a blank  line)  followed  by  the  picture 
                                                                                                                                                                  itself  (represented  in  binary).  The  prelude  provides  a 
                    UNIX  is a registered  trademark  of AT&T  Bell  Laboratories.                                                                               provenance  of the  picture.  Before  Duff  started  this 
                    ’  Note  that  the  two  parameters  are set by a simple  name-value  mechanism                                                              practice  he would  sometimes  find  himself  with  a 
                    -/.    B.                                                                                                                                    wonderful                  picture  and  no idea  of what  transforma- 
                    ]une  1987  Volume  30  Number  6                                                                                                                                                                                        Communications  of the ACM                                        481 
             Programming  Pearls 
                      tions  produced  it;  now  he can  reconstruct  any                            camps            4772 
                      picture  from  its provenance.                                                 swaps            4676 
                         Duff  implements  the  provenances  in  a single                            cpu              0.1083 
                      library   routine  that  all  programs  call  as they  begin 
                      execution.  The  routine  copies  the  old  command                               Given  the  sort  routines  and  other  procedures  to do 
                      lines  to the  output  and  then  writes  the  command                         the  real  work,  the  control  program  is easy to build. 
                      line  of the  current  program.                                                Its  main  loop  is sketched  in  Program  1: the  code 
                                                                                                     reads each input  line,  copies it  to the  output,  and 
                      A Sorting  Lab                                                                 processes the  name-value  pair.  The  s imul.ate                (  ) 
                      To make the  above ideas more  concrete,  we’ll  apply                         routine  performs  the  experiment  and  writes  the  out- 
                      them  to a problem  described  in  the  July  1985 col-                        put  variables  in  name-value  format;  it  is called  at 
                      umn:  building      “scaffolding”      to  experiment  with  sort              each blank  line  and  also at the  end  of the  file. 
                      routines.  That  column  contains  code for  several  sort 
                      routines  and  a few  small  programs  to exercise  them. 
                      This  section  will  sketch  a better  interface  to the  rou-                          loop 
                      tines.  The  input  and  output  are both  expressed  in                                      read     input     line     into     string      S 
                      name-value  pairs,  and  the  output  contains  a com-                                         if   end  of  file       then     break 
                      plete  description      of the  input  (its provenance).                                       if   S  =  *I II  then   simulated) 
                         Experiments  on sorting  algorithms  involve  adjust-                                      write      S  on  output 
                      ing  various  parameters,  executing  the  specified  rou-                                     Fl   =  first     field      in   S 
                      tine,  then  reporting  key  attributes  of the  computa-                                      F2  =  second       field      in   S 
                      tion.  The  precise  operations  to be performed  can be                                       if   Fl=    "n"   then 
                      specified  by  a sequence  of name-value  pairs.  Thus                                               N  =  F2 
                      the  input  file  to the  sorting  lab  might  look  like  this:                               else    if   Fl    =  "alg"     then 
                                                                                                                            if   F2  =  "insert"         then 
                      n                 100000                                                                                    alg    =  1 
                      input            identical                                                                            else    if   F2  =  "heap"        then 
                      alg              quick                                                                                      alg    =  2 
                      cutoff            10                                                                                  else    if   F2  =  "quick"         then 
                      partition        random                                                                                     alg    =  3 
                      seed             379                                                                                  else    error("bad         alg") 
                                                                                                                     else    if   Fl    =  "input"       then 
                                                                                                                            .  . . 
                      In  this  example  the  problem  size,  n, is set to  100,000.                          simulatet       1 
                      The  input  array  is initialized        with  identical 
                      elements  (other  options  might  include  random,                                      PROGRAM 1.  A Loop to Process Name-Value Pairs 
                      sorted,or        reversed        elements).     The  sortingalgo- 
                      rithm  in  this  experiment        is  quit   k  for  quicksort; 
                      insert      (for  insertionsort)     and  heap      (forheapsort) 
                      might  also be available.  The  cutoff              and  parti      - 
                      t  ion  names specify  further  parameters  in  quit               k -            This  structure  is useful  for  many  simulation             pro- 
                      sort.                                                                          grams. The  program  is easy to build  and  easy to use. 
                         The  input  to the  simulation         program  is a sequence               Its output  can  be read by  a human  and  can  also be 
                      of experiments  in  the  above  format,  separated  by                         fed to later  programs  for  statistical  analysis.  Because 
                      blank.  lines.  Its output  is in  exactly  the  same format                   all  the  input  variables  (which  together  provide  a 
                      of name-value  pairs,  separated  by  blank  lines.  The                       provenance  of the  experiment)  appear  with  the  out- 
                      first  part  of an output  record  contains  the  original                     put  variables,  any  particular        experiment  can be re- 
                      input  description,  which  gives  the  provenance  of                         peated  and studied  in  detail.  The  variable  format 
                      each  experiment.  The  input  is followed  by  three  ad-                     allows  additional       input  and  output  parameters  to 
                      ditional   attributes:    camps  records  the  number  of                      be added to future  simulations  without  having  to 
                      comparisons  made,  swaps  counts  swaps, and  cpu                             restructure  previous  data. Problem  8 shows  how  the 
                      records  the  run  time  of the  procedure.  Thus  an out-                     basic  structure  can be gracefully  extended  to per- 
                      put  record  might  end with  the  fields:                                     forming  sets of experiments. 
            482       Communications  of  the ACM                                                                                       June 1987  Volume  30  Number  6 
The words contained in this file might help you see if this file matches what you are looking for:

...Programming by an bentley pearls self describing data you just spent three cpu hours running a simula title efficient string matching tion to forecast your company s financial future and aid bibliographic search boss asks interpret the output journal communications of acm xvolume scenario number month june year pages hmmm blank lines separate entries in file line that dig through program find meaning begins with percent sign contains identifying each variable good news term followed arbitrary text may be con paints rosy picture for next fiscal now all tinued on subsequent do not start have is uncover assumptions oops disaster bibliography are name value current strategy doomed failure what did sce pairs attribute nario was so effective back pro its names values gram trying discover which input files one sufficiently i don t need elabo reads rate further them this format particularly well every programmer knows frustration suited bibliographies other complex decipher mysterious first tw...

no reviews yet
Please Login to review.