FoLiA library

This tutorial introduces the FoLiA Python library, part of PyNLPl. The library provides an Application Programming Interface (API) for reading, creating and manipulating FoLiA XML documents.

Before reading this document, it is highly recommended that you first read the FoLiA documentation itself and familiarise yourself with the format and its underlying paradigm.

Reading FoLiA

Loading a document

Any script that uses FoLiA starts with the import:

from pynlpl.formats import folia

Subsequently, a document can be read from file and into memory as follows:

doc = folia.Document(file="/path/to/document.xml")

This returns an instance that holds the entire document.

Once you have loaded a document, all data is available for you to read and manipulate as you see fit. We will first illustrate some simple use cases:

Printing text

You may want to simply print all (plain) text contained in the document, which is as easy as:

print doc

Alternatively, you can obtain a string representation of all text:

text_u = unicode(doc) #unicode instance
text = str(doc) #UTF-8 encoded

For any subelement of the document, you can obtain its text in the same fashion.

Index

A document instance has an index which you can use to grab any of its subelements by ID. Querying the index works much like using a Python dictionary:

word = doc['example.p.3.s.5.w.1']
print word

Obtaining lists of elements

Usually you do not know in advance the ID of the element you want, or you want multiple elements. There are some methods of iterating over certain elements using the FoLiA library.

For example, you can iterate over all words:

for word in doc.words():
    print word

That, however, gives you one big iteration over words without any boundaries. You are more likely to want the words within each sentence. So we first iterate over all sentences, then over the words therein:

for sentence in doc.sentences():
    for word in sentence.words():
        print word

Or including paragraphs, assuming the document has them:

for paragraph in doc.paragraphs():
    for sentence in paragraph.sentences():
        for word in sentence.words():
            print word

You can also use this method to obtain a specific word, by passing an index parameter:

word = sentence.words(3) #retrieves the fourth word

If you want to iterate over all of the child elements of a certain element, regardless of what class they are, you can simply do so as follows:

for element in doc:
    if isinstance(element, folia.Sentence):
        print "this is a sentence"
    else:
        print "this is something else"

If applied recursively this allows you in principle to traverse the entire element tree.
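
The recursive traversal described above can be sketched in plain Python. The Node class below is a hypothetical stand-in for a FoLiA element, not part of the library; it only assumes that iterating over an element yields its direct children, as in the loop above. Real FoLiA elements may also hold items that are not elements, which this sketch does not model.

```python
# Hypothetical stand-in for a FoLiA element: it carries a label and child
# nodes, and iterating over it yields its direct children.
class Node(object):
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)

    def __iter__(self):
        return iter(self.children)


def walk(element, depth=0):
    """Yield (depth, element) for the element and all descendants, depth-first."""
    yield depth, element
    for child in element:
        for item in walk(child, depth + 1):
            yield item


tree = Node("text",
            Node("paragraph",
                 Node("sentence", Node("word"), Node("word"))))

visited = [(depth, node.label) for depth, node in walk(tree)]
for depth, label in visited:
    # Indent each label by its depth in the tree
    print("  " * depth + label)
```

The generator keeps the traversal lazy, so a caller can stop early without visiting the rest of the tree.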

Select method

There is a generic method available on all elements to select child elements of any desired class. This method is by default applied recursively. Internally, the paragraphs(), words() and sentences() methods seen above are simply shortcuts that make use of the select method:

sentence = doc['example.p.3.s.5']
words = sentence.select(folia.Word)
for word in words:
    print word

Note that the select method is recursive by default; set the third argument to False to make it non-recursive. The second argument can be used to restrict matches to a specific set.
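
To make the role of the three arguments concrete, here is a minimal sketch of how such a select method could behave. The Element, Sentence, Word and Pos classes are hypothetical stand-ins written for this illustration; this is not the library's implementation, and it leaves out details such as the ignore list.

```python
class Element(object):
    """Hypothetical stand-in for a FoLiA element with a set and children."""

    def __init__(self, *children, **kwargs):
        self.set = kwargs.get("set")
        self.children = list(children)

    def select(self, Class, set=None, recursive=True):
        # The parameter is named `set` to mirror the documented signature,
        # even though it shadows the Python builtin inside this method.
        found = []
        for child in self.children:
            if isinstance(child, Class) and (set is None or child.set == set):
                found.append(child)
            if recursive:
                found.extend(child.select(Class, set, recursive))
        return found


class Sentence(Element):
    pass


class Word(Element):
    pass


class Pos(Element):
    pass


sentence = Sentence(
    Word(Pos(set="CGN"), Pos(set="brown-tagset")),
    Word(Pos(set="CGN")),
)

all_pos = sentence.select(Pos)                  # recursive, any set
cgn_pos = sentence.select(Pos, "CGN")           # recursive, restricted to one set
direct_pos = sentence.select(Pos, None, False)  # direct children only
print("%d %d %d" % (len(all_pos), len(cgn_pos), len(direct_pos)))
```

In the non-recursive call nothing is found, because the Pos elements sit below the words rather than directly below the sentence.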

Common attributes

As you know, the FoLiA paradigm introduces sets, classes, annotators with annotator types, and confidence values. These attributes are easily accessible on any element that has them:

  • element.id (string)
  • element.set (string)
  • element.cls (string). Since class is already a reserved keyword in Python, the library consistently uses cls.
  • element.annotator (string)
  • element.annotatortype (set to folia.AnnotatorType.MANUAL or folia.AnnotatorType.AUTO)
  • element.confidence (float)

Attributes that are not available for certain elements, or not set, default to None.

Annotations

FoLiA is of course a format for linguistic annotation, so let’s look at how to obtain annotations. This can be done using annotations() or annotation(), which are very similar to the select method, except that they raise an exception when no such annotation is found. The difference between annotation() and annotations() is that the former grabs only one annotation and raises an exception if there are several between which it can’t disambiguate:

for word in doc.words():
    try:
        pos = word.annotation(folia.PosAnnotation, 'CGN')
        lemma = word.annotation(folia.LemmaAnnotation)
        print "Word: ", word
        print "ID: ", word.id
        print "PoS-tag: " , pos.cls
        print "PoS Annotator: ", pos.annotator
        print "Lemma-tag: " , lemma.cls
    except folia.NoSuchAnnotation:
        print "No PoS or Lemma annotation"

Note that the second argument of annotation(), annotations() or select() can be used to restrict your selection to a certain set. In the above example we restrict ourselves to Part-of-Speech tags in the CGN set.
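
The relationship between annotations() and annotation() can be sketched as follows. All classes here are hypothetical stand-ins, and the sketch makes one simplifying assumption: annotation() just returns the first match, so the library's handling of several competing matches is not modelled.

```python
class NoSuchAnnotation(Exception):
    """Stand-in for folia.NoSuchAnnotation."""


class Annotation(object):
    def __init__(self, cls, set=None):
        self.cls = cls
        self.set = set


class PosAnnotation(Annotation):
    pass


class LemmaAnnotation(Annotation):
    pass


class Word(object):
    def __init__(self, text, *annotations):
        self.text = text
        self._annotations = list(annotations)

    def annotations(self, Class, set=None):
        """Return all matching annotations, or raise if there are none."""
        found = [a for a in self._annotations
                 if isinstance(a, Class) and (set is None or a.set == set)]
        if not found:
            raise NoSuchAnnotation()
        return found

    def annotation(self, Class, set=None):
        """Return a single matching annotation (here: simply the first)."""
        return self.annotations(Class, set)[0]


word = Word("test", PosAnnotation("n", set="CGN"))
pos = word.annotation(PosAnnotation, "CGN")

try:
    word.annotation(LemmaAnnotation)
    has_lemma = True
except NoSuchAnnotation:
    has_lemma = False

print(pos.cls)
print(has_lemma)
```

Because a missing annotation raises rather than returning None, callers wrap lookups in try/except, as the tutorial loop above does.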

Span Annotation

(to be written still)

Subtoken Annotation

(to be written still)

Searching in a FoLiA document

(Yet to be written)

Editing FoLiA

Creating a new document

Creating a new FoLiA document, rather than loading an existing one from file, can be done by explicitly providing an ID for the new document in the constructor:

doc = folia.Document(id='example')

Adding structure

Assuming we begin with an empty document, we should first add a Text element. Then we can append paragraphs, sentences, or other structural elements. The append() method is always used to append new children to an element:

text = doc.append(folia.Text)
paragraph = text.append(folia.Paragraph)
sentence = paragraph.append(folia.Sentence)
sentence.append(folia.Word, 'This')
sentence.append(folia.Word, 'is')
sentence.append(folia.Word, 'a')
sentence.append(folia.Word, 'test')
sentence.append(folia.Word, '.')

Adding annotations

Adding annotations, or any elements for that matter, is done using the append method. Let’s build on the previous example:

#First we grab the fourth word, 'test', from the sentence
word = sentence.words(3)

#Add Part-of-Speech tag
word.append(folia.PosAnnotation, set='brown-tagset',cls='n')

#Add lemma
word.append(folia.LemmaAnnotation, cls='test')

Note that in the above examples, the append() method takes a class as its first argument, followed by keyword arguments that are passed on to the class’s constructor.

A second way of using append() is by simply passing a child element and constructing it prior to appending. The following is equivalent to the above example:

#First we grab the fourth word, 'test', from the sentence
word = sentence.words(3)

#Add Part-of-Speech tag
word.append( folia.PosAnnotation(doc, set='brown-tagset',cls='n') )

#Add lemma
word.append( folia.LemmaAnnotation(doc , cls='test') )

The append method always returns that which was appended.

In the above example we first instantiate a PosAnnotation and a LemmaAnnotation. Instantiation of any element follows this pattern:

Class(document, *children, **kwargs)

The common attributes are set using equally named keyword arguments:

  • id=
  • cls=
  • set=
  • annotator=
  • annotatortype=
  • confidence=

Not all attributes are allowed for all elements, and certain attributes are required for certain elements. ValueError exceptions will be raised when these constraints are not met.

Instead of setting id, you can also set the keyword argument generate_id_in and pass it another element; an ID will then be generated automatically, based on the ID of the element passed. When you use the first method of appending, instantiation with generate_id_in takes place automatically behind the scenes when applicable and when id is not explicitly set.
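
As an illustration of what a generated ID might look like, the sketch below derives IDs of the form parent ID, element tag, sequence number. That scheme is an assumption inferred from IDs such as example.p.3.s.5.w.1 used earlier in this tutorial; IdGenerator is a hypothetical helper, not the library's generate_id().

```python
class IdGenerator(object):
    """Hypothetical helper that hands out IDs of the form <parent id>.<tag>.<n>."""

    def __init__(self):
        # One counter per (parent id, tag) pair
        self.counters = {}

    def generate_id(self, parent_id, tag):
        key = (parent_id, tag)
        self.counters[key] = self.counters.get(key, 0) + 1
        return "%s.%s.%d" % (parent_id, tag, self.counters[key])


gen = IdGenerator()
sentence_id = gen.generate_id("example.p.1", "s")
first_word = gen.generate_id(sentence_id, "w")
second_word = gen.generate_id(sentence_id, "w")

print(sentence_id)
print(first_word)
print(second_word)
```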

Any extra non-keyword arguments should be FoLiA elements and will be appended as the contents of the element, i.e. its children or subelements. Instead of using non-keyword arguments, you can also use the keyword argument contents and pass a list. This shortcut exists merely for convenience: Python requires all non-keyword arguments to come before the keyword arguments, which is often aesthetically unpleasing for our purposes. An example of this use case is shown in the next section.
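
The instantiation pattern Class(document, *children, **kwargs) can be sketched with a small stand-in class. The Element class below is hypothetical and handles only a few keyword arguments; the keyword for passing children as a list is spelled contents here, as in the span annotation example in the next section.

```python
class Element(object):
    """Hypothetical stand-in following the Class(document, *children, **kwargs) pattern."""

    def __init__(self, doc, *children, **kwargs):
        self.doc = doc
        self.cls = kwargs.pop("cls", None)
        self.set = kwargs.pop("set", None)
        contents = kwargs.pop("contents", [])
        if kwargs:
            # Anything left over is not an attribute this sketch knows about
            raise ValueError("Unexpected keyword arguments: " + ", ".join(sorted(kwargs)))
        # Positional children and the contents list end up in the same place
        self.children = list(children) + list(contents)


doc = object()  # placeholder for a document instance
det = Element(doc, cls="det")
noun = Element(doc, cls="n")

positional = Element(doc, det, noun, cls="np")
by_keyword = Element(doc, cls="np", contents=[det, noun])
same_children = positional.children == by_keyword.children
print(same_children)
```

With contents, the class and other attributes can be written before the (often long) list of children, which tends to read better in nested structures.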

Adding span annotation

Adding span annotation is easy with the FoLiA library, notwithstanding the fact that there’s more to it than adding token annotation.

As you know, span annotation uses stand-off annotation embedded in annotation layers. These layers are in turn embedded at the sentence level. In the following example we first create a sentence and then add a syntax parse:

sentence = text.append(folia.Sentence)
sentence.append(folia.Word, 'The',id='example.s.1.w.1')
sentence.append(folia.Word, 'boy',id='example.s.1.w.2')
sentence.append(folia.Word, 'pets',id='example.s.1.w.3')
sentence.append(folia.Word, 'the',id='example.s.1.w.4')
sentence.append(folia.Word, 'cat',id='example.s.1.w.5')
sentence.append(folia.Word, '.', id='example.s.1.w.6')

#Adding Syntax Layer
layer = sentence.append(folia.SyntaxLayer)

#Adding Syntactic Units
layer.append(
    folia.SyntacticUnit(doc, cls='s', contents=[
        folia.SyntacticUnit(doc, cls='np', contents=[
            folia.SyntacticUnit(doc, doc['example.s.1.w.1'], cls='det'),
            folia.SyntacticUnit(doc, doc['example.s.1.w.2'], cls='n'),
        ]),
        folia.SyntacticUnit(doc, cls='vp', contents=[
            folia.SyntacticUnit(doc, doc['example.s.1.w.3'], cls='v'),
            folia.SyntacticUnit(doc, cls='np', contents=[
                folia.SyntacticUnit(doc, doc['example.s.1.w.4'], cls='det'),
                folia.SyntacticUnit(doc, doc['example.s.1.w.5'], cls='n'),
            ]),
        ]),
        folia.SyntacticUnit(doc, doc['example.s.1.w.6'], cls='fin')
    ])
)

To make references to the words, we simply pass the word instances and use the document’s index to obtain them. Note also that passing a list using the keyword argument contents is wholly equivalent to passing the non-keyword arguments separately.

Adding subtoken annotation

(Yet to be written)

Corrections

API Reference

class pynlpl.formats.folia.AbstractAnnotation(doc, *args, **kwargs)
class pynlpl.formats.folia.AbstractAnnotationLayer(doc, *args, **kwargs)
Annotation layers for Span Annotation are derived from this abstract base class
class pynlpl.formats.folia.AbstractCorrectionChild(doc, *args, **kwargs)
class pynlpl.formats.folia.AbstractDefinition
class pynlpl.formats.folia.AbstractElement(doc, *args, **kwargs)
classmethod addable(Class, parent, set=None, raiseexceptions=True)

Tests whether a new element of this class can be added to the parent. Returns a boolean or raises ValueError exceptions (unless set to ignore)!

This will use OCCURRENCES, but may be overridden for more customised behaviour.

This method is mostly for internal use.

ancestors(Class=None)
append(child, *args, **kwargs)

Append a child element. Returns the added element

Arguments:
  • child - Instance or class

If an instance is passed as first argument, it will be appended. If a class derived from AbstractElement is passed as first argument, an instance will first be created and then appended.

Keyword arguments:
  • alternative= - If set to True, the element will be made into an alternative.

Generic example, passing a pre-generated instance:

word.append( folia.LemmaAnnotation(doc,  cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL ) )

Generic example, passing a class to be generated:

word.append( folia.LemmaAnnotation, cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL )

Generic example, setting text with a class:

word.append( "house", cls='original' )
copy()
Make a deep copy
deepvalidation()
description()
Obtain the description associated with the element; raises NoDescription if there is none
feat(subset)

Obtain the feature value of the specific subset.

Example:

sense = word.annotation(folia.Sense)
synset = sense.feat('synset')        
classmethod findreplacables(Class, parent, set=None, **kwargs)
Find replaceable elements. Auxiliary function used by replace(). Can be overridden for more fine-grained control. Mostly for internal use.
hastext(cls='current')
Does this element have text (of the specified class)?
insert(index, child, *args, **kwargs)

Insert a child element at specified index. Returns the added element

If an instance is passed as first argument, it will be inserted. If a class derived from AbstractElement is passed as first argument, an instance will first be created and then inserted.

Arguments:
  • index
  • child - Instance or class
Keyword arguments:
  • alternative= - If set to True, the element will be made into an alternative.
  • corrected= - Used only when passing strings to be made into TextContent elements.

Generic example, passing a pre-generated instance:

word.insert( 3, folia.LemmaAnnotation(doc,  cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL ) )

Generic example, passing a class to be generated:

word.insert( 3, folia.LemmaAnnotation, cls="house", annotator="proycon", annotatortype=folia.AnnotatorType.MANUAL )

Generic example, setting text of a specific correctionlevel:

word.insert( 3, "house", corrected=folia.TextCorrectionLevel.CORRECTED )
items(founditems=[])
Returns a depth-first flat list of all items below this element (not limited to AbstractElement)
originaltext()
Alias for retrieving the original uncorrected text
overridetextdelimiter()
May return a customised text delimiter that overrides the default text delimiter set by the parent. Defaults to None (do not override). Mostly for internal use.
classmethod parsexml(Class, node, doc)

Internal class method used for turning an XML element into an instance of the Class.

Args:
  • node - XML Element
  • doc - Document
Returns:
An instance of the current Class.
postappend()

This method will be called after an element is added to another. It can do extra checks and if necessary raise exceptions to prevent addition. By default makes sure the right document is associated.

This method is mostly for internal use.

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
remove(child)
Removes the child element
replace(child, *args, **kwargs)

Appends a child element like append(), but replaces any existing child element of the same type and set. If no such child element exists, this will act the same as append()

Keyword arguments:
  • alternative - If set to True, the replaced element will be made into an alternative. Simply use append() if you want the added element to be an alternative.

See append() for more information.

resolveword(id)
select(Class, set=None, recursive=True, ignorelist=['Original', 'Suggestion', 'Alternative'], node=None)

Select child elements of the specified class.

A further restriction can be made based on set. Whether or not to apply recursively (by default enabled) can also be configured, optionally with a list of elements never to recurse into.

Arguments:
  • Class: The class to select; any Python class subclassed off AbstractElement
  • set: The set to match against, only elements pertaining to this set will be returned. If set to None (default), all elements regardless of set will be returned.
  • recursive: Select recursively? Descending into child elements? Boolean defaulting to True.
  • ignorelist: A list of Classes (subclassed off AbstractElement) not to recurse into. It is common not to want to recurse into folia.Alternative, folia.Suggestion, and folia.Original, as elements contained in these are never authoritative.
  • node: Reserved for internal usage, used in recursion.
Returns:
A list of elements (instances)

Example:

text.select(folia.Sense, 'cornetto', True, [folia.Original, folia.Suggestion, folia.Alternative] )        
setdocument(doc)
Associate a document with this element
settext(text, cls='current')
Set the text for this element (and class)
stricttext(cls='current')
Get the text strictly associated with this element (of the specified class). Does not recurse into children, with the sole exception of Correction/New
text(cls='current')
Get the text associated with this element (of the specified class), will always be a unicode instance. If no text is directly associated with the element, it will be obtained from the children. If that doesn’t result in any text either, a NoSuchText exception will be raised.
textcontent(cls='current')

Get the text explicitly associated with this element (of the specified class). Returns the TextContent instance rather than the actual text. Raises NoSuchText exception if not found.

Unlike text(), this method does not recurse into child elements (with the sole exception of the Correction/New element), and it returns the TextContent instance rather than the actual text!

xml(attribs=None, elements=None, skipchildren=False)
Return an XML Element for this element and all its children.
xmlstring(pretty_print=False)
Return a string with XML presentation for this element and all its children.
xselect(Class, recursive=True, node=None)
Same as select(), but this is a generator instead of returning a list
class pynlpl.formats.folia.AbstractExtendedTokenAnnotation(doc, *args, **kwargs)
class pynlpl.formats.folia.AbstractSpanAnnotation(doc, *args, **kwargs)

Abstract element, all span annotation elements are derived from this class

append(child, *args, **kwargs)
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.AbstractStructureElement(doc, *args, **kwargs)
append(child, *args, **kwargs)
See AbstractElement.append()
paragraphs(index=None)

Returns a list of Paragraph elements found (recursively) under this element.

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the list of all
resolveword(id)
sentences(index=None)

Returns a list of Sentence elements found (recursively) under this element

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the list of all
words(index=None)

Returns a list of Word elements found (recursively) under this element.

Arguments:
  • index: If set to an integer, will retrieve and return the n’th element (starting at 0) instead of returning the list of all
class pynlpl.formats.folia.AbstractSubtokenAnnotation(doc, *args, **kwargs)
Abstract element, all subtoken annotation elements are derived from this class
class pynlpl.formats.folia.AbstractSubtokenAnnotationLayer(doc, *args, **kwargs)
Annotation layers for Subtoken Annotation are derived from this abstract base class
class pynlpl.formats.folia.AbstractTokenAnnotation(doc, *args, **kwargs)

Abstract element, all token annotation elements are derived from this class

append(child, *args, **kwargs)
See AbstractElement.append()
class pynlpl.formats.folia.ActorFeature(doc, *args, **kwargs)
Actor feature, to be used within Event
class pynlpl.formats.folia.AlignReference(doc, *args, **kwargs)
class pynlpl.formats.folia.AllowCorrections
correct(**kwargs)
Apply a correction (TODO: documentation to be written still)
class pynlpl.formats.folia.AllowGenerateID
generate_id(cls)
class pynlpl.formats.folia.AllowTokenAnnotation

Elements that allow token annotation (including extended annotation) must inherit from this class

alternatives(Class=None, set=None)

Obtain a list of alternatives, either all or only of a specific annotation type, and possibly restrained also by set.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation). Or set to None to select all alternatives regardless of what type they are.
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
List of Alternative elements
annotation(type, set=None)
Will return a single annotation (even if there are multiple). Raises a NoSuchAnnotation exception if none was found
annotations(Class, set=None)

Obtain annotations. Very similar to select() but raises an error if the annotation was not found.

Arguments:
  • Class - The Class you want to retrieve (e.g. PosAnnotation)
  • set - The set you want to retrieve (defaults to None, which selects regardless of set)
Returns:
A list of elements
Raises:
NoSuchAnnotation if the specified annotation does not exist.
annotationsold(annotationtype=None)
Generator yielding all annotations of a certain type. Raises a NoSuchAnnotation exception if none was found.
hasannotation(Class, set=None)
Returns an integer indicating whether such an annotation exists, and if so, how many. See annotations() for a description of the parameters.
class pynlpl.formats.folia.Alternative(doc, *args, **kwargs)
Element grouping alternative token annotation(s). Multiple alternative elements may occur, each denoting a different alternative. Elements grouped inside an alternative block are considered dependent.
class pynlpl.formats.folia.AlternativeLayers(doc, *args, **kwargs)
Element grouping alternative subtoken annotation(s). Multiple altlayers elements may occur, each denoting a different alternative. Elements grouped inside an alternative block are considered dependent.
class pynlpl.formats.folia.AnnotationType
class pynlpl.formats.folia.AnnotatorType
class pynlpl.formats.folia.Attrib
class pynlpl.formats.folia.BegindatetimeFeature(doc, *args, **kwargs)
Begindatetime feature, to be used within Event
class pynlpl.formats.folia.Caption(doc, *args, **kwargs)
Element used for captions for figures or tables, contains sentences
class pynlpl.formats.folia.Chunk(doc, *args, **kwargs)
Chunk element, span annotation element to be used in ChunkingLayer
class pynlpl.formats.folia.ChunkingLayer(doc, *args, **kwargs)
Chunking Layer: Annotation layer for Chunk span annotation elements
class pynlpl.formats.folia.ClassDefinition(id, type, label, constraints=[])
classmethod parsexml(Class, node, constraintindex)
class pynlpl.formats.folia.ConstraintDefinition(id, restrictions={}, exceptions={})
classmethod parsexml(Class, node, constraintindex)
class pynlpl.formats.folia.Content(doc, *args, **kwargs)
classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.Corpus(corpusdir, extension='xml', restrict_to_collection='', conditionf=<function <lambda>>, ignoreerrors=False, **kwargs)
A corpus of various FoLiA documents. Yields a Document on each iteration. Suitable for sequential processing.
class pynlpl.formats.folia.CorpusFiles(corpusdir, extension='xml', restrict_to_collection='', conditionf=<function <lambda>>, ignoreerrors=False, **kwargs)
A corpus of various FoLiA documents. Yields the filenames on each iteration.
class pynlpl.formats.folia.CorpusProcessor(corpusdir, function, threads=None, extension='xml', restrict_to_collection='', conditionf=<function <lambda>>, maxtasksperchild=100, preindex=False, ordered=True, chunksize=1)

Processes a corpus of various FoLiA documents using parallel processing. Calls a user-defined function with the three-tuple (filename, args, kwargs) for each file in the corpus. The user-defined function is itself responsible for instantiating a FoLiA document! args and kwargs, as received by the custom function, are set through the run() method, which yields the result of the custom function on each iteration.
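
The calling convention described above can be illustrated with a sequential stand-in. The functions process_corpus and file_size below are hypothetical and use only the standard library; the sketch shows how a user-defined function receives and unpacks the three-tuple, and does not reproduce the parallel processing of the real class.

```python
import os
import shutil
import tempfile


def process_corpus(corpusdir, function, args=(), kwargs=None, extension="xml"):
    """Sequential stand-in: call function((filename, args, kwargs)) for each file."""
    kwargs = kwargs or {}
    for dirpath, dirnames, filenames in os.walk(corpusdir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.endswith("." + extension):
                yield function((os.path.join(dirpath, name), args, kwargs))


def file_size(task):
    # The user-defined function unpacks the three-tuple itself; a real one
    # would load the FoLiA document from `filename` at this point.
    filename, args, kwargs = task
    return os.path.basename(filename), os.path.getsize(filename)


corpusdir = tempfile.mkdtemp()
try:
    samples = [("a.xml", "<FoLiA/>"), ("b.xml", "<FoLiA></FoLiA>"), ("notes.txt", "skip me")]
    for name, body in samples:
        with open(os.path.join(corpusdir, name), "w") as handle:
            handle.write(body)
    # Results are yielded one by one, in the order the files are visited
    results = list(process_corpus(corpusdir, file_size))
finally:
    shutil.rmtree(corpusdir)

print(results)
```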

execute()
run(*args, **kwargs)
class pynlpl.formats.folia.Correction(doc, *args, **kwargs)
current(index=None)
hascurrent()
hasnew()
hasoriginal()
hassuggestions()
new(index=None)
original(index=None)
select(cls, set=None, recursive=True, ignorelist=[], node=None)
Select on Correction only descends in either “NEW” or “CURRENT” branch
suggestions(index=None)
text(cls='current')
textcontent(cls='current')

Get the text explicitly associated with this element (of the specified class). Returns the TextContent instance rather than the actual text. Raises NoSuchText exception if not found.

Unlike text(), this method does not recurse into child elements (with the sole exception of the Correction/New element), and it returns the TextContent instance rather than the actual text!

class pynlpl.formats.folia.Current(doc, *args, **kwargs)
classmethod addable(Class, parent, set=None, raiseexceptions=True)
exception pynlpl.formats.folia.DeepValidationError
class pynlpl.formats.folia.DependenciesLayer(doc, *args, **kwargs)
Dependencies Layer: Annotation layer for Dependency span annotation elements. For dependency entities.
class pynlpl.formats.folia.Dependency(doc, *args, **kwargs)
dependent()
Returns the dependent of the dependency relation. Instance of DependencyDependent
head()
Returns the head of the dependency relation. Instance of DependencyHead
class pynlpl.formats.folia.DependencyDependent(doc, *args, **kwargs)
class pynlpl.formats.folia.DependencyHead(doc, *args, **kwargs)
class pynlpl.formats.folia.Description(doc, *args, **kwargs)

Description is an element that can be used to associate a description with almost any other FoLiA element

classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.Division(doc, *args, **kwargs)

Structure element representing some kind of division. Divisions may be nested at will, and may include almost all kinds of other structure elements.

head()
class pynlpl.formats.folia.Document(*args, **kwargs)

This is the FoLiA Document; all elements have to be associated with a FoLiA document. Besides holding elements, the document holds metadata, including declarations, and an index of all IDs.

append(text)

Add a text to the document:

Example 1:

doc.append(folia.Text)

Example 2:

doc.append( folia.Text(doc, id='example.text') )
create(Class, *args, **kwargs)
Create an element associated with this Document. This method may be obsolete and removed later.
date(value=None)
No arguments: get the document’s date from metadata. With an argument: set the document’s date in metadata.
declare(annotationtype, set, **kwargs)
declared(annotationtype, set)
defaultannotator(annotationtype, set=None)
defaultannotatortype(annotationtype, set=None)
defaultset(annotationtype)
findwords(*args, **kwargs)
items()
Returns a depth-first flat list of all items in the document
language(value=None)
No arguments: get the document’s language (ISO-639-3) from metadata. With an argument: set the document’s language (ISO-639-3) in metadata.
license(value=None)
No arguments: get the document’s license from metadata. With an argument: set the document’s license in metadata.
load(filename)
Load a FoLiA or D-Coi XML file
paragraphs(index=None)

Return a list of all paragraphs found in the document.

If an index is specified, return the n’th paragraph only (starting at 0)

parsemetadata(node)
parsexml(node, ParentClass=None)
Main XML parser, will invoke class-specific XML parsers. For internal use.
parsexmldeclarations(node)
publisher(value=None)
No arguments: get the document’s publisher from metadata. With an argument: set the document’s publisher in metadata.
save(filename=None)

Save the document to FoLiA XML.

Arguments:
  • filename=: The filename to save to. If not set (None), saves to the same file as loaded from.
select(Class, set=None)
sentences(index=None)

Return a list of all sentences found in the document, except for sentences in quotes.

If an index is specified, return the n’th sentence only (starting at 0)

setcmdi(filename)
setimdi(node)
text()
Returns the text of the entire document (returns a unicode instance)
title(value=None)
No arguments: get the document’s title from metadata. With an argument: set the document’s title in metadata.
words(index=None)

Return a list of all active words found in the document. Does not descend into annotation layers, alternatives, originals, suggestions.

If an index is specified, return the n’th word only (starting at 0)

xml()
xmldeclarations()
xmlmetadata()
xmlstring()
xpath(query)
Run an XPath expression and parse the resulting elements. Don’t forget to use the FoLiA namespace in your expressions, using folia: or the short form f:
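
The sketch below shows why the namespace prefix matters. It uses the standard library's ElementTree on a small hand-written sample rather than this method, and it assumes the FoLiA namespace URI is http://ilk.uvt.nl/folia and that the prefix f is mapped to it explicitly.

```python
import xml.etree.ElementTree as ET

# Assumed FoLiA namespace URI
FOLIA_NS = "http://ilk.uvt.nl/folia"

# Small hand-written sample document, for illustration only
xml_source = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Hello</t></w>
      <w xml:id="example.s.1.w.2"><t>world</t></w>
    </s>
  </text>
</FoLiA>"""

root = ET.fromstring(xml_source)
namespaces = {"f": FOLIA_NS}

# With the prefix, the word text elements are found
words = [t.text for t in root.findall(".//f:w/f:t", namespaces)]

# Without the prefix, nothing matches, because every tag is namespaced
unprefixed = root.findall(".//w")

print(words)
print(len(unprefixed))
```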
class pynlpl.formats.folia.DomainAnnotation(doc, *args, **kwargs)
Domain annotation: an extended token annotation element
exception pynlpl.formats.folia.DuplicateAnnotationError
exception pynlpl.formats.folia.DuplicateIDError
class pynlpl.formats.folia.EnddatetimeFeature(doc, *args, **kwargs)
Enddatetime feature, to be used within Event
class pynlpl.formats.folia.EntitiesLayer(doc, *args, **kwargs)
Entities Layer: Annotation layer for Entity span annotation elements. For named entities.
class pynlpl.formats.folia.Entity(doc, *args, **kwargs)
Entity element, for named entities, span annotation element to be used in EntitiesLayer
class pynlpl.formats.folia.ErrorDetection(doc, *args, **kwargs)
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.Event(doc, *args, **kwargs)
class pynlpl.formats.folia.Feature(doc, *args, **kwargs)

Feature elements can be used to associate subsets and subclasses with almost any annotation element

classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
xml()
class pynlpl.formats.folia.Figure(doc, *args, **kwargs)

Element for the representation of a graphical figure. Structure element.

caption()
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.Gap(doc, *args, **kwargs)

Gap element. Represents skipped portions of the text. Contains Content and Desc elements

content()
class pynlpl.formats.folia.Head(doc, *args, **kwargs)
Head element. A structure element. Acts as the header/title of a division. There may be one per division. Contains sentences.
class pynlpl.formats.folia.HeadFeature(doc, *args, **kwargs)
Synset feature, to be used within PosAnnotation
class pynlpl.formats.folia.Label(doc, *args, **kwargs)
Element used for labels. Mostly used within list items. Contains words.
class pynlpl.formats.folia.LemmaAnnotation(doc, *args, **kwargs)
Lemma annotation: a token annotation element
class pynlpl.formats.folia.Linebreak(doc, *args, **kwargs)
Line break element, signals a line break
class pynlpl.formats.folia.List(doc, *args, **kwargs)
Element for enumeration/itemisation. Structure element. Contains ListItem elements.
class pynlpl.formats.folia.ListItem(doc, *args, **kwargs)
Single element in a List. Structure element. Contained within List element.
exception pynlpl.formats.folia.MalformedXMLError
class pynlpl.formats.folia.MetaDataType
class pynlpl.formats.folia.Mode
exception pynlpl.formats.folia.ModeError
class pynlpl.formats.folia.Morpheme(doc, *args, **kwargs)
Morpheme element, represents one morpheme in morphological analysis, subtoken annotation element to be used in MorphologyLayer
class pynlpl.formats.folia.MorphologyLayer(doc, *args, **kwargs)
Morphology Layer: Annotation layer for Morpheme subtoken annotation elements. For morphological analysis.
class pynlpl.formats.folia.NativeMetaData(*args, **kwargs)
items()
class pynlpl.formats.folia.New(doc, *args, **kwargs)
classmethod addable(Class, parent, set=None, raiseexceptions=True)
exception pynlpl.formats.folia.NoDefaultError
exception pynlpl.formats.folia.NoDescription
exception pynlpl.formats.folia.NoSuchAnnotation
exception pynlpl.formats.folia.NoSuchText
class pynlpl.formats.folia.Original(doc, *args, **kwargs)
classmethod addable(Class, parent, set=None, raiseexceptions=True)
class pynlpl.formats.folia.Paragraph(doc, *args, **kwargs)
Paragraph element. A structure element. Represents a paragraph and holds all its sentences (and possibly other structure such as Whitespace and Quotes).
class pynlpl.formats.folia.Pattern(*args, **kwargs)
resolve(size, distribution)
Resolve a variable sized pattern to all patterns of a certain fixed size
variablesize()
variablewildcards()
class pynlpl.formats.folia.PhonAnnotation(doc, *args, **kwargs)
Phonetic annotation: a token annotation element
class pynlpl.formats.folia.PosAnnotation(doc, *args, **kwargs)
Part-of-Speech annotation: a token annotation element
class pynlpl.formats.folia.Query(files, expression)
An XPath query on one or more FoLiA documents
class pynlpl.formats.folia.Quote(doc, *args, **kwargs)

Quote: a structure element. For quotes/citations. May hold words or sentences.

append(child, *args, **kwargs)
resolveword(id)
class pynlpl.formats.folia.Reader(filename, target, bypassleak=False)
Streaming FoLiA reader. The reader allows you to read a FoLiA Document without holding the whole tree structure in memory. The document will be read and the elements you seek returned as they are found.
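As a rough sketch of how such a streaming reader might be consumed: the helper below only assumes that the reader can be iterated and yields one target element at a time, as the description above suggests. The name count_elements is hypothetical and not part of the library, and the file path in the commented usage is a placeholder.

```python
def count_elements(reader):
    """Count the elements yielded by a streaming reader.

    Assumption: `reader` is any iterable that yields one element at a
    time, for instance folia.Reader(filename, target). Elements are
    consumed one by one and never collected in a list.
    """
    count = 0
    for _element in reader:
        count += 1
    return count

# Hypothetical usage with the real library (not executed here):
#   from pynlpl.formats import folia
#   reader = folia.Reader("/path/to/document.xml", folia.Word)
#   total = count_elements(reader)
```

Because nothing is stored between iterations, this pattern keeps the benefit of streaming; building a list from the reader would hold every yielded element in memory again.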
class pynlpl.formats.folia.RegExp(regexp)
class pynlpl.formats.folia.SenseAnnotation(doc, *args, **kwargs)
Sense annotation: a token annotation element
class pynlpl.formats.folia.Sentence(doc, *args, **kwargs)

Sentence element. A structure element. Represents a sentence and holds all its words (and possibly other structure elements such as Linebreak, Whitespace and Quote).

corrections()
Are there corrections in this sentence?
correctwords(originalwords, newwords, **kwargs)
Generic correction method for words. You most likely want to use the helper methods splitword(), mergewords(), deleteword() and insertword() instead.
deleteword(word, **kwargs)
TODO: Write documentation
division()
Obtain the division this sentence is a part of (None otherwise)
insertword(newword, prevword, **kwargs)
mergewords(newword, *originalwords, **kwargs)
TODO: Write documentation
paragraph()
Obtain the paragraph this sentence is a part of (None otherwise)
resolveword(id)
splitword(originalword, *newwords, **kwargs)
TODO: Write documentation
class pynlpl.formats.folia.SetDefinition(id, classes=[], subsets=[], constraintindex={})
classmethod parsexml(Class, node)
testclass()
testsubclass(subset, subclass)
exception pynlpl.formats.folia.SetDefinitionError
class pynlpl.formats.folia.SetType
class pynlpl.formats.folia.SubentitiesLayer(doc, *args, **kwargs)
Subentities Layer: Annotation layer for Subentity subtoken annotation elements. For named entities within a single token.
class pynlpl.formats.folia.Subentity(doc, *args, **kwargs)
Subentity element, for named entities within a single token, subtoken annotation element to be used in SubentitiesLayer
class pynlpl.formats.folia.SubjectivityAnnotation(doc, *args, **kwargs)
Subjectivity annotation: a token annotation element
class pynlpl.formats.folia.SubsetDefinition(id, type, classes=[], subsets=[])
parsexml(Class, node, constraintindex={})
class pynlpl.formats.folia.Suggestion(doc, *args, **kwargs)
class pynlpl.formats.folia.SynsetFeature(doc, *args, **kwargs)
Synset feature, to be used within SenseAnnotation
class pynlpl.formats.folia.SyntacticUnit(doc, *args, **kwargs)
Syntactic Unit, span annotation element to be used in SyntaxLayer
class pynlpl.formats.folia.SyntaxLayer(doc, *args, **kwargs)
Syntax Layer: Annotation layer for SyntacticUnit span annotation elements
class pynlpl.formats.folia.Text(doc, *args, **kwargs)
A full text. This is a high-level element (not to be confused with TextContent!). This element may contain divisions, paragraphs, sentences, etc.
class pynlpl.formats.folia.TextContent(doc, *args, **kwargs)

Text content element (t), holds text to be associated with whatever element the text content element is a child of.

Text content elements have an associated correction level, indicating whether the text they hold is in a pre-corrected or post-corrected state. There can be only one of each level. Text content elements on structure elements like Paragraph and Sentence are by definition untokenised. Only at Word level and deeper are they by definition tokenised.

Text content elements can specify offsets that refer to text at a higher parent level. Use the following keyword arguments:
  • ref=: The instance to point to, this points to the element holding the text content element, not the text content element itself.
  • offset=: The offset where this text is found, offsets start at 0
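To make the offset semantics concrete, here is a minimal sketch using plain strings rather than library objects. It assumes that an offset is valid when the child's text occurs in the referenced parent's text starting at that 0-based position; offset_matches is a hypothetical helper, not a library function, and the library's own validation may differ in detail.

```python
def offset_matches(parent_text, child_text, offset):
    """Return True if child_text occurs in parent_text at the 0-based offset.

    Assumption: this mirrors the idea behind text offsets only; it is
    not the library's validation code.
    """
    if offset < 0:
        return False
    return parent_text[offset:offset + len(child_text)] == child_text

sentence_text = "Hello world"
# "world" starts at position 6 when counting from 0.
world_ok = offset_matches(sentence_text, "world", 6)
# The same word checked against a wrong offset does not match.
world_wrong = offset_matches(sentence_text, "world", 5)
```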
append(child, *args, **kwargs)
This method is not implemented on purpose
finddefaultreference()

Find the default reference for text offsets: The parent of the current textcontent’s parent (counting only Structure Elements and Subtoken Annotation Elements)

Note: This returns the element that holds the TextContent, not a TextContent element itself. Whether the text content actually exists is checked later, elsewhere.

classmethod findreplacables(Class, parent, set, **kwargs)
(Method for internal usage, see AbstractElement)
classmethod parsexml(Class, node, doc)
(Method for internal usage, see AbstractElement)
postappend()
(Method for internal usage, see AbstractElement.postappend())
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
text()
Obtain the text (unicode instance)
validateref()
Validates the Text Content’s references. Raises UnresolvableTextContent when invalid
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.TextCorrectionLevel
class pynlpl.formats.folia.TimedEvent(doc, *args, **kwargs)
class pynlpl.formats.folia.TimingLayer(doc, *args, **kwargs)
Timing Layer: Annotation layer for TimedEvent span annotation elements.
exception pynlpl.formats.folia.UnresolvableTextContent
class pynlpl.formats.folia.Whitespace(doc, *args, **kwargs)
Whitespace element, signals a vertical whitespace
class pynlpl.formats.folia.Word(doc, *args, **kwargs)

Word (aka token) element. Holds a word/token and all its related token annotations.

context(size, placeholder=None)
Returns this word in context, {size} words to the left, the current word, and {size} words to the right
division()
Obtain the deepest division this word is a part of, otherwise return None
domain(set=None)
Shortcut: returns the FoLiA class of the domain annotation (will return only one if there are multiple!)
getcorrection(set=None, cls=None)
getcorrections(set=None, cls=None)
incorrection()
Is this word part of a correction? If it is, it returns the Correction element (evaluating to True), otherwise it returns None
leftcontext(size, placeholder=None)
Returns the left context for a word. This method crosses sentence/paragraph boundaries
lemma(set=None)
Shortcut: returns the FoLiA class of the lemma annotation (will return only one if there are multiple!)
next()
Returns the next word in the sentence, or None if no next word was found. This method does not cross sentence boundaries.
overridetextdelimiter()
May return a customised text delimiter that overrides the default text delimiter set by the parent. Defaults to None (do not override). Mostly for internal use.
paragraph()
Obtain the paragraph this word is a part of, otherwise return None
classmethod parsexml(Class, node, doc)
pos(set=None)
Shortcut: returns the FoLiA class of the PoS annotation (will return only one if there are multiple!)
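The shortcut methods can be combined to build a simple tagged view of a sentence. The sketch below is duck-typed: it assumes only that the sentence offers words() and that each word offers pos() and lemma() as listed here, and that str(word) gives its text. The name tag_words is hypothetical, and the sketch assumes every word carries both annotations; handling of missing annotations is deliberately left out.

```python
def tag_words(sentence):
    """Return (text, pos, lemma) triples for every word in a sentence.

    Assumptions: `sentence.words()` yields the words, and each word
    provides pos() and lemma() returning the annotation's class.
    Every word is assumed to have both annotations.
    """
    return [(str(word), word.pos(), word.lemma()) for word in sentence.words()]

# Hypothetical usage with the real library (not executed here):
#   for sentence in doc.sentences():
#       triples = tag_words(sentence)
```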
previous()
Returns the previous word in the sentence, or None if no previous word was found. This method does not cross sentence boundaries.
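Since next() returns None at the end of a sentence, it can drive a simple loop. The following sketch collects all words that follow a given word within its sentence; remaining_words is a hypothetical helper that relies only on the next() behaviour described above.

```python
def remaining_words(word):
    """Collect the words that follow `word` in the same sentence.

    Assumption: `word.next()` returns the following word, or None once
    the end of the sentence is reached.
    """
    following = []
    current = word.next()
    while current is not None:
        following.append(current)
        current = current.next()
    return following
```

The same loop with previous() in place of next() would walk back towards the start of the sentence.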
resolveword(id)
rightcontext(size, placeholder=None)
Returns the right context for a word. This method crosses sentence/paragraph boundaries
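The shape of the result of context(), leftcontext() and rightcontext() can be illustrated on a plain list of tokens. This is a sketch only, not the library's implementation: it assumes that positions falling outside the document are filled with the placeholder so that each side always has exactly size entries. The function context_window is hypothetical.

```python
def context_window(tokens, index, size, placeholder=None):
    """Return `size` tokens left of tokens[index], the token itself,
    and `size` tokens to its right, padding with `placeholder`.

    Assumption: padding with the placeholder at the edges reflects how
    the placeholder argument is meant to be used.
    """
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    # Pad on the outside so each side has exactly `size` entries.
    left = [placeholder] * (size - len(left)) + left
    right = right + [placeholder] * (size - len(right))
    return left + [tokens[index]] + right

tokens = ["a", "b", "c", "d"]
start_window = context_window(tokens, 0, 2)
# start_window is [None, None, "a", "b", "c"]
```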
sense(set=None)
Shortcut: returns the FoLiA class of the sense annotation (will return only one if there are multiple!)
sentence()
Obtain the sentence this word is a part of, otherwise return None
split(*newwords, **kwargs)
xml(attribs=None, elements=None, skipchildren=False)
class pynlpl.formats.folia.WordReference(doc, *args, **kwargs)

Word reference. Used to refer to words from span annotation elements. This Python class is only used when a word reference cannot be resolved; when it can, the Word object itself is used.

classmethod parsexml(Class, node, doc)
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None)
pynlpl.formats.folia.c
alias of PhonAnnotation
pynlpl.formats.folia.loadsetdefinition(filename)
pynlpl.formats.folia.parse_datetime(s)
Returns (datetime, tz offset in minutes) or (None, None).
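A caller has to handle both shapes of the return value. The sketch below takes such a pair and normalises it to UTC. It rests on two assumptions that are not stated in this reference: that a positive offset means east of UTC, and that the offset may be None when the input carried no time zone. The name to_utc is hypothetical.

```python
from datetime import datetime, timedelta

def to_utc(parsed):
    """Convert a (datetime, offset-in-minutes) pair to a naive UTC datetime.

    Assumptions: positive offsets lie east of UTC, and the offset may
    be None when no time zone was given.
    """
    moment, offset_minutes = parsed
    if moment is None:
        # Nothing could be parsed.
        return None
    if offset_minutes is None:
        # No time zone information: leave the value unchanged.
        return moment
    return moment - timedelta(minutes=offset_minutes)

# Hypothetical usage with the real library (not executed here):
#   utc = to_utc(folia.parse_datetime(timestamp_string))
```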
pynlpl.formats.folia.parsecommonarguments(object, doc, annotationtype, required, allowed, **kwargs)
pynlpl.formats.folia.relaxng(filename=None)
pynlpl.formats.folia.relaxng_declarations()
pynlpl.formats.folia.validate(filename, schema=None, deep=False)