README for Perl script chunklink.pl

by Sabine Buchholz
ILK
Computational Linguistics
Tilburg University
The Netherlands
2/2/2000

Introduction

The Perl script chunklink.pl converts parsed sentences (e.g. Penn Treebank II files) into a common format containing (at least) the same information as the original files. The common format has a line of information for each word, which indicates the chunk to which it belongs, its function (for head words), relevant head word, grammatical structure hierarchy, and trace information (where relevant). The main purpose of distributing the script is to facilitate easy and standard handling of grammatical information. Currently, researchers have their own scripts for extracting the particular information of interest. Therefore, there may be variations in the definitions, or bugs introduced when one tries to reconstruct an extracted concept from the description in a paper. Moreover, various formats have different emphasis. Currently, the script processes input in Penn Treebank II format. While POS information is readily available for each word in this format, extracting chunk information and resolving links and traces is a demanding task. The chunklink script produces this information in a line-per-word format, with parameters which allow flexibility of the line structure and other details. We used the chunklink format as the basis for creating training and testing material for our cascaded shallow parser. This ensures that the definition of what constitutes a chunk is the same in both components of the parser, i.e. the chunker and the grammatical relations finder.

Example

Input

Consider the following Penn Treebank II sentence from wsj_0002.mrg:
( (S 
    (NP-SBJ-1 
      (NP (NNP Rudolph) (NNP Agnew) )
      (, ,) 
      (UCP 
        (ADJP 
          (NP (CD 55) (NNS years) )
          (JJ old) )
        (CC and) 
        (NP 
          (NP (JJ former) (NN chairman) )
          (PP (IN of) 
            (NP (NNP Consolidated) (NNP Gold) (NNP Fields) (NNP PLC) ))))
      (, ,) )
    (VP (VBD was) 
      (VP (VBN named) 
        (S 
          (NP-SBJ (-NONE- *-1) )
          (NP-PRD 
            (NP (DT a) (JJ nonexecutive) (NN director) )
            (PP (IN of) 
              (NP (DT this) (JJ British) (JJ industrial) (NN conglomerate) ))))))
    (. .) ))

Output

From command: chunklink.pl /cdrom/treebank/combined/wsj/00/wsj_0002.mrg | more

The first two lines of the output describe the options. In our example only the default option settings were used. When writing code for processing the data, these lines should be consulted in order to determine the meaning of the fields and tags. Since the script is open for changes, each modification of the format should be reflected in these opening lines. This ensures both backward and forward compatibility of utilities based on this format.

#arguments: IOB tag: Begin, word numbering: file
#columns: file_id sent_id word_id iob_inner pos word function heads head_ids iob_chain trace-function trace-type trace-head_ids
 0002  1  0 B-NP    NNP   Rudolph         NOFUNC          Agnew             1 B-S/B-NP/B-NP
 0002  1  1 I-NP    NNP   Agnew           NP-SBJ          named            16 I-S/I-NP/I-NP         NP    *    16
 0002  1  2 O       COMMA COMMA           NOFUNC          Agnew             1 I-S/I-NP
 0002  1  3 B-NP    CD    55              NOFUNC          years             4 I-S/I-NP/B-UCP/B-ADJP/B-NP
 0002  1  4 I-NP    NNS   years           NP              old               5 I-S/I-NP/I-UCP/I-ADJP/I-NP
 0002  1  5 B-ADJP  JJ    old             ADJP/UCP        Agnew             1 I-S/I-NP/I-UCP/I-ADJP
 0002  1  6 O       CC    and             NOFUNC          old/chairman      5/8 I-S/I-NP/I-UCP
 0002  1  7 B-NP    JJ    former          NOFUNC          chairman          8 I-S/I-NP/I-UCP/B-NP/B-NP
 0002  1  8 I-NP    NN    chairman        NP/UCP          Agnew             1 I-S/I-NP/I-UCP/I-NP/I-NP
 0002  1  9 B-PP    IN    of              PP              chairman          8 I-S/I-NP/I-UCP/I-NP/B-PP
 0002  1 10 B-NP    NNP   Consolidated    NOFUNC          PLC              13 I-S/I-NP/I-UCP/I-NP/I-PP/B-NP
 0002  1 11 I-NP    NNP   Gold            NOFUNC          PLC              13 I-S/I-NP/I-UCP/I-NP/I-PP/I-NP
 0002  1 12 I-NP    NNP   Fields          NOFUNC          PLC              13 I-S/I-NP/I-UCP/I-NP/I-PP/I-NP
 0002  1 13 I-NP    NNP   PLC             NP              of                9 I-S/I-NP/I-UCP/I-NP/I-PP/I-NP
 0002  1 14 O       COMMA COMMA           NOFUNC          Agnew             1 I-S/I-NP
 0002  1 15 B-VP    VBD   was             NOFUNC          named            16 I-S/B-VP
 0002  1 16 I-VP    VBN   named           VP/S            named            16 I-S/I-VP
 0002  1 17 B-NP    DT    a               NOFUNC          director         19 I-S/I-VP/B-NP/B-NP
 0002  1 18 I-NP    JJ    nonexecutive    NOFUNC          director         19 I-S/I-VP/I-NP/I-NP
 0002  1 19 I-NP    NN    director        NP-PRD          named            16 I-S/I-VP/I-NP/I-NP
 0002  1 20 B-PP    IN    of              PP              director         19 I-S/I-VP/I-NP/B-PP
 0002  1 21 B-NP    DT    this            NOFUNC          conglomerate     24 I-S/I-VP/I-NP/I-PP/B-NP
 0002  1 22 I-NP    JJ    British         NOFUNC          conglomerate     24 I-S/I-VP/I-NP/I-PP/I-NP
 0002  1 23 I-NP    JJ    industrial      NOFUNC          conglomerate     24 I-S/I-VP/I-NP/I-PP/I-NP
 0002  1 24 I-NP    NN    conglomerate    NP              of               20 I-S/I-VP/I-NP/I-PP/I-NP
 0002  1 25 O       .     .               NOFUNC          named            16 I-S

Columns

Lines starting with a hash (#) contain special information. By default, the script produces 13 columns (the last 3 columns are mostly empty). By using options, we could have suppressed all columns we are not interested in. The line starting with "#columns:" lists the columns that are printed. The content of the columns is as follows:
  1. File number
  2. Sentence number
  3. Word number
  4. Chunk tag information: This information is not explicitly present in the treebank files; the Method section below explains how the script computes it. The "O" tag means that the word in this line is outside of any chunk. The "I-XP" tag means that this word is inside an XP chunk. "B-XP" by default means that the word is at the beginning of an XP chunk. Thus, for example, "Rudolph Agnew" is an NP chunk, and "was named" is a VP chunk. (By using the -B option, the meaning of "B" can be changed or other tags added, cf. Options. The current setting is indicated in the line starting with "#arguments: IOB tag:".)
  5. Part-of-speech (POS) tag (extracted directly from the treebank)
  6. Word (extracted directly from the treebank)
  7. Grammatical function/relation of the chunk or word. The last word in each chunk is its head, and its function is the function of the whole chunk. For example, the NP chunk "Rudolph Agnew" is the subject (NP-SBJ). The words in a chunk that are not its head have "NOFUNC" as their function.
  8. The head word of the chunk towards which the chunk has the mentioned relation. Thus "Rudolph Agnew" is the subject of the VP chunk "was named". Non-heads have a "NOFUNC" relation towards the head of the chunk.
  9. The number of the head (as specified in column 8). As the same word may appear more than once in the same sentence, the head word might not be a unique indicator. Programs will mostly use column 9, whereas humans might prefer column 8.
  10. So-called IOB-chain, noting the syntactic categories of all the constituents on the path from the root node to this leaf node of the tree. "I-XP" and "B-XP" tags have the same meaning for constituents as in column four for chunks.
  11. Trace function: The treebank trees may contain traces. These are indicated by "-NONE-" instead of a part-of-speech tag and might be coreferenced to another constituent in the tree through an index ("-1"). In our example, the subject (NP-SBJ) of the predicative clause (S) under "named" is coreferent with the overall subject. Note that before the tree is converted to the word-based format, certain pruning operations take place. (These are controlled by flags set in the script, cf. Technical details.) One of these operations involves pruning the inner VP, another one pruning the S-clause under "named" and redefining the S-clause subject to be the object of the matrix verb. Thus effectively, (part of) the chunklink format represents the (sub)tree:
          (NP-SBJ-1 (NP (NNP Rudolph) (NNP Agnew) )
            ...
          (VP (VBD was) (VBN named) 
              (NP (-NONE- *-1) )
              (NP-PRD 
                (NP (DT a) (JJ nonexecutive) (NN director) )
                (PP (IN of) 
                  (NP (DT this) (JJ British) (JJ industrial) (NN conglomerate) ))))
    
    The trace function of "Rudolph Agnew" is thus object.
  12. Trace type ("*" in our example).
  13. Trace head number: "Rudolph Agnew" is the object of "was named" (head number 16).
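
The column description above can be turned into a small reader. The following is a minimal Python sketch, not part of chunklink.pl: the function name parse_chunklink is hypothetical, and it assumes that fields never contain whitespace and that only trailing columns (the trace columns) can be empty. It takes the column names from the "#columns:" line rather than hard-coding them, so it keeps working when options suppress columns.

```python
def parse_chunklink(lines):
    """Turn chunklink output lines into a list of dicts keyed by column name."""
    columns = []
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#columns:"):
            # The header names the columns that were actually printed.
            columns = line[len("#columns:"):].split()
            continue
        if line.startswith("#"):
            continue  # other special lines, e.g. "#arguments:"
        fields = line.split()
        # Assumption: only trailing columns (the trace columns) can be empty.
        fields += [None] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, fields)))
    return rows


sample = """\
#arguments: IOB tag: Begin, word numbering: file
#columns: file_id sent_id word_id iob_inner pos word function heads head_ids iob_chain trace-function trace-type trace-head_ids
 0002  1  0 B-NP    NNP   Rudolph         NOFUNC          Agnew             1 B-S/B-NP/B-NP
 0002  1  1 I-NP    NNP   Agnew           NP-SBJ          named            16 I-S/I-NP/I-NP         NP    *    16
"""
rows = parse_chunklink(sample.splitlines())
print(rows[1]["word"], rows[1]["function"], rows[1]["heads"])
```

Keying the rows by the names in the header line follows the advice given under Output: code that reads the data should consult the opening lines instead of assuming fixed column positions.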

Options

This text is displayed if the script is called without any arguments:
call as:                                                                                      
  chunklink.pl  /cdrom/treebank/combined/wsj/0?/wsj_0???.mrg | more                  
                                                                                              
options:                                                                                      
 -s : Place a '# sentence ID' line before the word-list of each sentence                      
      instead of at the lines of the individual words.                                        
      The sentence ID is file/number, e.g., 0001/01.                                          
                                                                                              
 -ns : Enumerate the words inside a sentence, instead of number in the file                   
                                                                                              
 -B  : which sort of IOB tags to output; I tags are always inside a chunk, O tags are outside 
       possible values are: Begin (the default): B tags for word at the beginning of a chunk  
                            End:                 E tags for word at the end of a chunk        
                            BeginEndCombined:    B tags for word at the beginning of a chunk  
                                                 E tags for word at the end of a chunk        
                                                 C tags for single word chunks                
                            Between:             B tags for words that are at the beginning   
                                                 of a chunk and the previous chunk had the    
                                                 same syntactic category                      
                                    Attention! The last option applies only to the simple     
                                    IOB tag column (e.g. 'I-NP'), not to the IOB chain column 
                                    (e.g. 'I-S/I-S/I-NP/I-NP/I-PP/B-NP'). If 'Between', the   
                                    latter column gets the default representation 'Begin'.    
                                                                                              
 -N  : suppress word number in output                                                         
 -p  :     ...  POS tag ...                                                                   
 -f  :          function                                                                      
 -h  :          head word                                                                     
 -H  :          head number                                                                   
 -i  :          IOB tag                                                                       
 -c  :          IOB tag chain                                                                 
 -t  :          trace information                                                             
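
As an illustration of the default Begin setting of -B, the following Python sketch groups words into chunks from the simple IOB tag column. It is not code from the script; the function name chunks_from_iob is hypothetical, and the other -B settings (End, BeginEndCombined, Between) would need different logic.

```python
def chunks_from_iob(tagged_words):
    """Group (word, iob_tag) pairs into (category, [words]) chunks.

    Assumes the default 'Begin' scheme: B-XP starts a chunk, I-XP
    continues it, and O marks a word outside any chunk.
    """
    chunks = []
    current = None  # the chunk being built, or None outside any chunk
    for word, tag in tagged_words:
        if tag == "O":
            current = None
            continue
        prefix, _, category = tag.partition("-")
        if prefix == "B" or current is None or current[0] != category:
            current = (category, [word])
            chunks.append(current)
        else:
            current[1].append(word)
    return chunks


# Words 0 to 5 of the example sentence, with their iob_inner tags.
example = [("Rudolph", "B-NP"), ("Agnew", "I-NP"), ("COMMA", "O"),
           ("55", "B-NP"), ("years", "I-NP"), ("old", "B-ADJP")]
print(chunks_from_iob(example))
```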

Method

The information from the treebank is processed one sentence at a time.

While reading a sentence, the script creates one object data structure for each node in the tree. Depending on the type (word terminal, trace terminal, or non-terminal), the objects have different features, e.g. POS, word, function, kind of trace. Coindexing information and parent-child relations between nodes are represented as pointers between objects.

In the pruning step, nodes fulfilling certain conditions are pruned by redirecting pointers.

Next, the head child of each constituent node is determined. This information is not explicitly present in the treebank. The script makes use of (declarative) lists of which constituent or POS can be a head of which other constituent. Because of annotation errors in the treebank, these lists are not as concise as one might wish. Following the head paths, the tree is then "lexicalized", i.e. (a pointer to) the head word is copied up from head child to parent.
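
The lexicalization step can be pictured with a short Python sketch. It is a simplification under stated assumptions, not the script's implementation: each non-terminal is a dict with a hypothetical "head" key holding the index of its single head child (the script allows several heads, e.g. in coordination, and stores references rather than words), and the names lex_head and attachments are made up for this example.

```python
def lex_head(node):
    """Percolate the head word up along the head line."""
    if "word" in node:
        return node["word"]  # a terminal is its own lexical head
    return lex_head(node["children"][node["head"]])


def attachments(node, governor=None):
    """Yield (word, head word) pairs for every terminal below node."""
    if governor is None:
        governor = lex_head(node)  # in this sketch the root head points to itself
    if "word" in node:
        yield node["word"], governor
        return
    own = lex_head(node)
    for i, child in enumerate(node["children"]):
        # The head daughter inherits the mother's attachment; the other
        # daughters attach to the mother's lexical head.
        yield from attachments(child, governor if i == node["head"] else own)


tree = {"label": "S", "head": 1, "children": [
    {"label": "NP-SBJ", "head": 1, "children": [
        {"word": "Rudolph"}, {"word": "Agnew"}]},
    {"label": "VP", "head": 1, "children": [
        {"word": "was"}, {"word": "named"}]},
]}
print(list(attachments(tree)))
```

On this reduced version of the example sentence, the pairs agree with the heads column of the sample output: "Rudolph" points to "Agnew", and "Agnew", "was" and "named" point to "named".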

Then the leaf nodes of the tree are collected in a flat list. During this step, the functions are copied down the tree from parent to head child.

If the options require the IOB-chain, traces are pruned from the tree and the IOB-chain information is collected from root to leaves.
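
A minimal Python sketch of how such an IOB-chain can be collected, assuming a tree of (label, children) tuples in which a terminal is (POS, word) and traces have already been removed. The names iob_chains and iob_chain_column are hypothetical; the rule used here (B for the first word of a constituent, I otherwise, with function tags and indices stripped from the label) is read off the example output rather than taken from the script.

```python
def iob_chains(node):
    """Return [(word, [tags from this node down to the word])]."""
    label, children = node
    if isinstance(children, str):
        return [(children, [])]  # terminal: (POS, word)
    category = label.split("-")[0]  # NP-SBJ-1 -> NP
    leaves = [leaf for child in children for leaf in iob_chains(child)]
    # The first word of a constituent gets B-, all later words get I-.
    return [(word, [("B-" if i == 0 else "I-") + category] + chain)
            for i, (word, chain) in enumerate(leaves)]


def iob_chain_column(tree):
    """Join each chain with '/' as in the iob_chain column."""
    return [(word, "/".join(chain)) for word, chain in iob_chains(tree)]


tree = ("S", [
    ("NP-SBJ-1", [("NNP", "Rudolph"), ("NNP", "Agnew")]),
    ("VP", [("VBD", "was"), ("VBN", "named")]),
])
print(iob_chain_column(tree))
```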

Next, the chunk tag information is computed for all the objects in the flat list by giving the words up to and including the head of any XP constituent the tag I-XP. B tags are then introduced where necessary (see "chunks" under Technical details).

Finally, the information contained in the features of these objects is printed in columns.

Technical details

This text was written at an earlier stage of the program, but should still be useful in general.

The file chunklink_23-8-99.pl consists of four parts:

Initialize

Main

Subs

The calling structure is as follows:
initialize -> head_medium

main |-> start_read -> read_sentence (recursive) -> |-> 'terminal'->new()
     |                                              |-> 'non_terminal'->new()
     |                                              |-> 'trace'->new()
     |-> prune -> head_of
     |-> lexicalize (recursive)
     |-> flatten (recursive)
     |-> chunks
     |-> print_flatten

"start_read" and recursive calls of "read_sentence" read a sentence from the input file and store the information in object data structures. The nodes of the parse tree are represented by objects of the types 'terminal', 'non_terminal' or 'trace', the arcs are represented by references to objects. Some special characters are replaced (, -> COMMA). The words are numbered consecutively. Default feature values are inserted. See "nodes.pm" for the definition of the objects, their features and the initial values. The reference to the root of the tree is stored in $result. See below for more details.

"prune" in fact performs two separate actions. First, some parts of the tree are simplified by deleting a node and attaching its daughters to its mother (splice). Second, the head daughter(s) of each node are determined and marked (in feature {head_comp}). This part uses the definition loaded in the beginning, about which terminal (restrictions on part-of-speech) or which other constituent (restrictions on syntactic category) may be the head of which constituent. See below for more details.

"lexicalize" recursively determines the lexical head to which a constituent attaches. Initially, the lexical head of a terminal is the terminal itself. Next, the lexical head of a constituent is the lexical head(s) of its head daughter(s). Thus lexical heads percolate upwards along the head line. Finally, the lexical head of a mother is copied to all its daughters. Lexical heads are stored as (lists of) references in the feature {lex_head}.

"flatten" introduces preliminary IOB-tags (only Is) on terminals (feature {iob_tag}), copies syntactic function information (e.g. NP-SBJ, PP-LOC) from mother to head daughter (i.e. down the tree to the lexical head of a constituent, feature {function}) and then collects only the references to terminals in an array (@flattened).

"chunks" refines the IOB-tags (feature {iob_tag}) by introducing B-tags if necessary and handles the special case of possessive NPs.

"print_flatten" prints the information in the final format.

Visualize

These subs only serve to check what happens in each of the above processing steps. "print_parse_tree" prints the internal representation of the parse tree ($result), "print_list" is for printing @flattened. "print_lex_head" and "print_trace" are auxiliary functions.
Idiosyncrasies of my treatment of the information in the treebank can be found in "head_medium" (my definition of what is a head), "prune" (which constituents to throw away, e.g. QPs, ADJPs inside NPs, how VP-chunks are defined, treatment of coordination) and in "chunks" (treatment of possessive NPs).


6/9/1999

start_read: Skips (blank) lines until it finds a line starting with an opening bracket (=the beginning of a new sentence). Consumes that opening bracket from the input and sets the $depth-variable to 1. Then "read_sentence" is called with as second argument $depth-1, i.e. zero.

read_sentence: Repeatedly consumes pieces of the input. Whenever an opening bracket is read, $depth is increased; whenever a closing bracket is read, $depth is decreased. If $depth is equal to the depth given as second argument to "read_sentence", the end of the constituent has been found and a reference to it is returned. There are basically four different kinds of pieces of input that "read_sentence" consumes:

Things get a little more complicated than these four basic cases because of traces and things coreferent with traces (=fillers). Traces are terminal nodes, e.g. "(-NONE- *T*-1)", coreferences to traces are attached to non-terminals, e.g. "(WHNP-1 ...)". Traces can refer backward, e.g.
              (NP 
                (NP (NNS symptoms) )
                (SBAR 
                  (WHNP-1 (WDT that) )
                  (S 
                    (NP-SBJ (-NONE- *T*-1) )
                    (VP (VBP show) 
                      (PRT (RP up) )
                      (ADVP-TMP 
                        (NP (NNS decades) )
                        (JJ later) )))))
which is the normal case, or forward, e.g.
  (S 
    (S-ADV 
      (NP-SBJ (-NONE- *-1) )
      (NP-PRD (DT No) (NNS dummies) ))
    (, ,) 
    (NP-SBJ-1 (DT the) (NNS drivers) )
    (VP (VBD pointed) 
      (PRT (RP out) ) 
      ... ) )
When the terminal node that was read is a trace, an object of type "trace" (instead of "terminal") is created. The value of its "reference" feature is a reference to the filler. If the filler has not yet been found (forward reference), a reference to the trace object is stored in hash %tracerefs. If a filler is found before the corresponding trace, its reference is stored in hash %corefs. Every object of type "terminal" or "non-terminal" has a feature "trace", whose value is a reference to the trace if the constituent was coreferenced to a trace, and undefined otherwise.
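
The bookkeeping for backward and forward references can be sketched in Python as follows. This is an illustration, not a port of "read_sentence": nodes are plain dicts, the names tokenize, read_node, read_tree and traces are hypothetical, and the two dictionaries corefs and tracerefs only mirror the role of the hashes %corefs and %tracerefs. The input is assumed to start at the labelled root, i.e. after the outer bracket that "start_read" consumes.

```python
import re


def tokenize(text):
    """Split a bracketed tree into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def read_node(tokens, pos, corefs, tracerefs):
    """Read one constituent starting at tokens[pos] == '('."""
    label = tokens[pos + 1]
    pos += 2
    node = {"label": label, "children": [], "word": None, "filler": None}
    if tokens[pos] != "(":  # terminal: (POS word)
        node["word"] = tokens[pos]
        pos += 1
        index = re.search(r"-(\d+)$", node["word"])
        if label == "-NONE-" and index:
            if index.group(1) in corefs:  # backward reference
                node["filler"] = corefs[index.group(1)]
            else:  # forward reference: resolve when the filler is read
                tracerefs.setdefault(index.group(1), []).append(node)
    else:
        index = re.search(r"-(\d+)$", label)
        if index:  # this non-terminal is a filler
            corefs[index.group(1)] = node
            for trace in tracerefs.pop(index.group(1), []):
                trace["filler"] = node
        while tokens[pos] == "(":
            child, pos = read_node(tokens, pos, corefs, tracerefs)
            node["children"].append(child)
    assert tokens[pos] == ")"
    return node, pos + 1


def read_tree(text):
    node, _ = read_node(tokenize(text), 0, {}, {})
    return node


def traces(node):
    """Yield all trace terminals below node."""
    if node["label"] == "-NONE-":
        yield node
    for child in node["children"]:
        yield from traces(child)


backward = read_tree("(SBAR (WHNP-1 (WDT that) ) (S (NP-SBJ (-NONE- *T*-1) ) (VP (VBP show) )))")
forward = read_tree("(S (S-ADV (NP-SBJ (-NONE- *-1) ) (NP-PRD (DT No) (NNS dummies) )) (, ,) (NP-SBJ-1 (DT the) (NNS drivers) ) (VP (VBD pointed) ))")
print(next(traces(backward))["filler"]["label"])
print(next(traces(forward))["filler"]["label"])
```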

prune: Recursively descends through the tree and checks the daughters of each non-terminal node from left to right. If a daughter is itself a non-terminal, the sub checks whether any of several pruning patterns applies and, if so, prunes the daughter, e.g. (NP ... (ADJP D1 D2 D3) ...) -> (NP ... D1 D2 D3 ...). ADVPs inside VPs are a special case: they are not pruned directly, but their position is remembered in array @advps so that they can be pruned later. In order to determine later which daughter is the head daughter of the constituent, records are kept of all the possible lexical and non-lexical heads of a constituent (@lastnonref, @sub_xps). In addition, it is recorded whether a coordinating conjunction (phrase) was found ($cc). After all the daughters have been processed, the head daughter is determined and marked, and ADVPs inside VPs are pruned. The function is then called recursively for all the daughters.

There are two different sorts of pruning patterns. One erases the internal structure of chunks (e.g. QP, NX) or removes chunks that do not count as chunks when they are inside other chunks (ADJP in NP, ADVP in ADJP, ADVP in VP). The other changes the chunk borders themselves, e.g. "(VP would (VP have (VP wanted (VP to (VP come)))))" is one chunk because all the inner VPs are pruned.


24/9/99

The initialize-part of chunklink now contains a new sub part, that specifies which constituents are to be pruned. Here are examples of pruning:

If $prune_always{'NAC'}=1 is set:

              (NP (DT the) 
                (NAC-LOC (NNP West) (NNP Groton) 
                  (, ,)
                  (NNP Mass.) 
                  (, ,)
                  )
                (NN paper) (NN factory) )
becomes
              (NP (DT the) 
                (NNP West) (NNP Groton) 
                (, ,)
                (NNP Mass.) 
                (, ,)
                (NN paper) (NN factory) )
I'm not sure whether NACs only appear inside NPs. Anyway, they are pruned regardless of what their mother category is.

If $prune_always{'QP'}=1 is set:


                        (ADVP-TMP 
                          (NP 
                            (QP (RBR more) (IN than) (CD 30) )
                            (NNS years) )
                          (IN ago) )
becomes
                        (ADVP-TMP 
                          (NP 
                            (RBR more) (IN than) (CD 30)
                            (NNS years) )
                          (IN ago) )
I'm not sure whether QPs only appear inside NPs. Anyway, they are pruned regardless of what their mother category is.

If $prune_always{'NX'}=1 is set:

          (NP 
            (NP (JJ former) 
              (NX 
                (NX (NN president) )
                (CC and) 
                (NX (NN chief) (VBG operating) (NN officer) )))
            (PP (IN of) 
              (NP (NNPS Toys) (`` ``) (NNP R) ('' '') (NNP Us) (NNP Inc.) )))
becomes
          (NP 
            (NP (JJ former) 
              (NN president)
              (CC and) 
              (NN chief) (VBG operating) (NN officer) )
            (PP (IN of) 
              (NP (NNPS Toys) (`` ``) (NNP R) ('' '') (NNP Us) (NNP Inc.) )))
According to the bracketing guidelines, NXs may only appear inside NPs.

If $prune_always{'X'}=1 is set:

                                (NP-PRD 
                                  (NP 
                                    (ADJP (RB exceptionally) (JJ good) )
                                    (NNS returns) )
                                  (X 
                                    (PP (IN in) )))
becomes
                                (NP-PRD 
                                  (NP 
                                    (ADJP (RB exceptionally) (JJ good) )
                                    (NNS returns) )
                                  (PP (IN in) ))
X is a kind of garbage category. Not much can be said about it.
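
The unconditional pruning shown in the NAC, QP, NX and X examples amounts to splicing a node's daughters into its mother. A minimal Python sketch, assuming (label, children) tuples with terminals written as (POS, word); the function name splice_always and the set prune_always are stand-ins for illustration and do not reproduce the script's code.

```python
def splice_always(node, prune_always):
    """Replace every daughter whose category is in prune_always by its own daughters."""
    label, children = node
    if isinstance(children, str):
        return node  # terminal: (POS, word)
    result = []
    for child in children:
        child = splice_always(child, prune_always)
        child_label, grandchildren = child
        is_nonterminal = not isinstance(grandchildren, str)
        # Function tags are ignored when matching: NAC-LOC counts as NAC.
        if is_nonterminal and child_label.split("-")[0] in prune_always:
            result.extend(grandchildren)
        else:
            result.append(child)
    return (label, result)


prune_always = {"NAC", "QP", "NX", "X"}
np = ("NP", [
    ("QP", [("RBR", "more"), ("IN", "than"), ("CD", "30")]),
    ("NNS", "years"),
])
print(splice_always(np, prune_always))
```

Because daughters are processed before they are checked, nested cases such as an NX inside an NX collapse in a single pass.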

If $prune_if_infrontof_head{'NP'}{'ADJP'}=1 is set:

      (VP (VBZ is) 
        (ADJP-PRD (RB unusually) (JJ resilient) ))
stays the same because ADJP is not directly under NP but under VP.
    (NP-SBJ 
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,) 
      (ADJP 
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
stays the same because ADJP is under NP, but after the NP's head "Vinken".

      (NP 
        (NP (DT the) (NN unit) )
        (PP (IN of) 
          (NP 
            (ADJP (JJ New) (JJ York-based) )
            (NNP Loews) (NNP Corp.) )))
becomes
      (NP 
        (NP (DT the) (NN unit) )
        (PP (IN of) 
          (NP 
            (JJ New) (JJ York-based)
            (NNP Loews) (NNP Corp.) )))
because ADJP is under NP and in front of the NP's head "Corp.".

If $prune_if_infrontof_head{'NP'}{'UCP'}=1 is set:

        (S 
          (UCP 
            (PP-TMP (IN after) 
              (NP (DT the) (NNS charges) ))
            (, ,) 
            (CC and)
            (S 
              (NP-SBJ (-NONE- *-1) )
              (`` ``) 
              (VP (VBG assuming) 
                (NP 
                  (NP (DT no) (JJ dramatic) (NN fluctuation) )
                  (PP-LOC ... )))))
          (, ,) 
          (NP-SBJ-1 (DT the) (NN company) )
          (VP (VBZ expects) 
            (S ... )))
stays the same because UCP is not under NP. This may not be very useful...

	
                    (PP-CLR (IN of) 
                      (NP (DT a) 
                        (UCP (NN state) (CC or) (JJ local) )
                        (NN utility) ))
becomes
	
                    (PP-CLR (IN of) 
                      (NP (DT a) 
                        (NN state) (CC or) (JJ local)
                        (NN utility) ))
because UCP is under NP and in front of the head.

If $prune_if_infrontof_head{'WHNP'}{'WHADJP'}=1 is set:

	
      (S 
        (NP-SBJ (DT that) )
        (VP (VBZ 's) 
          (SBAR-PRD 
            (WHADJP-2 (WRB how) (JJ bad) )
            (S 
              (NP-SBJ (PRP it) )
              (VP (VBZ is) 
                (ADJP-PRD (-NONE- *T*-2) )))))))
stays the same because WHADJP is not under WHNP, but under SBAR.

              (WHNP-2 
                (WHADJP (WRB how) (JJ many) )
                (NNS warrants) 
                (CC and)
                (NNS options) )
becomes
              (WHNP-2 
                (WRB how) (JJ many)
                (NNS warrants) 
                (CC and)
                (NNS options) )
because WHADJP is under WHNP and in front of the heads "warrants" and "options".

If $prune_if_infrontof_head{'ADJP'}{'ADVP'}=1 is set:

	
                    (VP (VBP show) 
                      (PRT (RP up) )
                      (ADVP-TMP 
                        (NP (NNS decades) )
                        (JJ later) ))
stays the same because ADVP is not under ADJP, but under VP.
	
      (ADJP-PRD 
        (ADJP 
          (ADVP (RB almost) (RB entirely) )
          (JJ Western) )
        (, ,) 
        (ADJP (RB especially) (JJ American) )))
becomes
	
      (ADJP-PRD 
        (ADJP 
          (RB almost) (RB entirely)
          (JJ Western) )
        (, ,) 
        (ADJP (RB especially) (JJ American) )))
because ADVP is under ADJP and in front of ADJP's head "Western".
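
The condition behind the $prune_if_infrontof_head examples can be sketched in Python as below. The sketch assumes that the position of the head daughter is already known and passed in as head_index (the script determines it itself) and that there is a single head, so the coordinated WHNP case is not covered. The function name splice_before_head and the nested dict rules are hypothetical; rules only imitates the shape of the Perl hash.

```python
def splice_before_head(label, children, head_index, rules):
    """Splice daughters listed in rules[parent] if they precede the head daughter."""
    allowed = rules.get(label.split("-")[0], {})
    result = []
    for i, child in enumerate(children):
        child_label, grandchildren = child
        is_nonterminal = not isinstance(grandchildren, str)
        if (i < head_index and is_nonterminal
                and allowed.get(child_label.split("-")[0])):
            result.extend(grandchildren)  # in front of the head: prune
        else:
            result.append(child)  # head, after the head, or not listed
    return (label, result)


rules = {"NP": {"ADJP": 1, "UCP": 1}, "ADJP": {"ADVP": 1}}

loews = [("ADJP", [("JJ", "New"), ("JJ", "York-based")]),
         ("NNP", "Loews"), ("NNP", "Corp.")]
vinken = [("NP", [("NNP", "Pierre"), ("NNP", "Vinken")]), (",", ","),
          ("ADJP", [("NP", [("CD", "61"), ("NNS", "years")]), ("JJ", "old")]),
          (",", ",")]
print(splice_before_head("NP", loews, 2, rules))
print(splice_before_head("NP-SBJ", vinken, 0, rules))
```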

The following set of pruning flags is more specific and more complicated. In addition, they interact with one another, i.e. pruning of one sort may make the conditions for another sort of pruning come true (see prune_advp_in_vp_flag). Most of them lay the basis for the VP-chunks I want.
Let's have a look at a really complicated example:
If $prune_s_in_vp_non_empty_subject_flag=1,
$prune_s_in_vp_empty_subject_flag=1 and
$prune_vp_in_vp_flag=1 are set:

	
        (S 
          (NP-SBJ-1 (PRP they) )
          (VP (VBP have) (RB n't) 
            (VP (VBN decided) 
              (SBAR (IN whether) 
                (S 
                  (NP-SBJ-2 (-NONE- *-1) )
                  (VP (TO to) 
                    (VP (VB try) 
                      (S 
                        (NP-SBJ (-NONE- *-2) )
                        (VP (TO to) 
                          (VP (VB force) 
                            (S 
                              (NP-SBJ (DT the) (NN company) )
                              (VP (TO to) 
                                (VP (VB go) 
                                  (PRT (RP through) )
                                  (PP-CLR (IN with) 
                                    (NP (DT the) (NNS contracts) )))))))))))))))
becomes
        (S 
          (NP-SBJ-1 (PRP they) )
          (VP (VBP have) (RB n't) 
            (VBN decided) 
            (SBAR (IN whether) 
              (S 
                (NP-SBJ-2 (-NONE- *-1) )
                (VP (TO to) 
                  (VB try)
                  (TO to) 
                  (VB force) 
                  (S 
                    (NP-SBJ (DT the) (NN company) )
                    (VP (TO to) 
                      (VB go) 
                      (PRT (RP through) )
                      (PP-CLR (IN with) 
                        (NP (DT the) (NNS contracts) )))))))))))
    (VP (VBP make) 
      (S 
        (NP-SBJ (PRP them) )
        (ADJP-PRD (JJ fearful) )))
becomes
    (VP (VBP make) 
      (NP (PRP them) )
      (ADJP-PRD (JJ fearful) ))
This is done by "sub prune_s_in_vp_non_empty_subject_condition". Note that the function of "them" is changed from subject (of the predicate "fearful") to object (of "make").

      (NP-SBJ-1 (RBR More) (JJ common) (NN chrysotile) (NNS fibers) )
      (VP 
        (VP (VBP are) 
          (ADJP-PRD (JJ curly) ))
        (CC and) 
        (VP (VBP are) 
          (VP 
            (ADVP-MNR (RBR more) (RB easily) )
            (VBN rejected) )))
becomes
      (NP-SBJ-1 (RBR More) (JJ common) (NN chrysotile) (NNS fibers) )
      (VP 
        (VP (VBP are) 
          (ADJP-PRD (JJ curly) ))
        (CC and) 
        (VP (VBP are) 
          (RBR more) (RB easily)
          (VBN rejected) ))
In the case of VP coordination, we don't want the inner VPs to be pruned. This is achieved by the "sub verbs_or_adverbs_in_front". The second VP in this example is not pruned because there is no verb or adverb in front of it (inside the first VP). The third VP is not pruned because there are constituents other than verbs or adverbs in front of it (i.e. a VP and a CC). The fourth VP is pruned because there is only a verb in front of it ("are"). The adverbial phrase is pruned, too, see below.
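
A possible reading of this check, as a Python sketch. It reuses the name verbs_or_adverbs_in_front from the text, but the body is a guess based on the examples in this README, not the script's code. In particular, which labels count as "verb or adverb" (POS tags starting with VB or RB, plus MD and TO, and ADVP constituents) is an assumption.

```python
def is_verb_or_adverb(node):
    """Assumed test: verbal/adverbial POS tags, or an ADVP constituent."""
    label, children = node
    category = label.split("-")[0]
    if isinstance(children, str):  # terminal: label is a POS tag
        return (category.startswith("VB") or category.startswith("RB")
                or category in {"MD", "TO"})
    return category == "ADVP"


def verbs_or_adverbs_in_front(children, index):
    """True if children[index] is preceded by at least one sister, all verbs or adverbs."""
    front = children[:index]
    return bool(front) and all(is_verb_or_adverb(c) for c in front)


vp_curly = ("VP", [("VBP", "are"), ("ADJP-PRD", [("JJ", "curly")])])
vp_inner = ("VP", [("ADVP-MNR", [("RBR", "more"), ("RB", "easily")]),
                   ("VBN", "rejected")])
vp_rejected = ("VP", [("VBP", "are"), vp_inner])
coordination = ("VP", [vp_curly, ("CC", "and"), vp_rejected])

print(verbs_or_adverbs_in_front(coordination[1], 0))  # second VP of the example
print(verbs_or_adverbs_in_front(coordination[1], 2))  # third VP
print(verbs_or_adverbs_in_front(vp_rejected[1], 1))   # fourth VP
```

Requiring at least one sister in front is what keeps the first conjunct of a coordination from being pruned.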

      (NP-SBJ 
        (NP 
          (NP (DT A) (NN form) )
          (PP (IN of) 
            (NP (NN asbestos) )))
        (RRC 
          (ADVP-TMP (RB once) )
          (VP (VBN used) 
            (NP (-NONE- *) )
            (S-CLR 
              (NP-SBJ (-NONE- *) )
              (VP (TO to) 
                (VP (VB make) 
                  (NP (NNP Kent) (NN cigarette) (NNS filters) )))))))
In this example, only the third VP is pruned (not shown). The first trace prevents S-CLR from being pruned: it does not have only verbs and/or adverbs in front of it (see "sub prune_s_in_vp_condition"). Therefore, also the second VP cannot be pruned (it is not directly under VP). Thus we end up with two VP-chunks: "used" and "to make" instead of one ("used to make"), and that is exactly how it should be. Note that the structure of sentences like "I used to live here" is different and would result in one VP-chunk.

If $prune_advp_in_vp_flag=1 is set:

                    (VP (VBP show) 
                      (PRT (RP up) )
                      (ADVP-TMP 
                        (NP (NNS decades) )
                        (JJ later) ))
stays the same because, although ADVP is in VP, it is after the (only) verb "show".
    (VP (VBP are) 
      (ADVP-TMP (RB currently) )
      (VP (VBG yielding) 
        (NP 
          (QP (RB well) (IN over) (CD 9) )
          (NN %) )))
becomes
    (VP (VBP are) 
      (RB currently)
      (VBG yielding) 
      (NP 
        (QP (RB well) (IN over) (CD 9) )
        (NN %) ))
if VPs in VPs are pruned, too, because after pruning the inner VP, ADVP is still in a VP but in front of the last verb.