Extending the AIR Equation Scoring Engine

The AIR Equation Scoring Engine supports a limited subset of the MathML standard. Users are most likely to extend the engine by adding support for MathML constructs that it does not yet handle. To do so, write subclasses of airscore.mathmlsympy.base_mathml_element.BaseMathmlElement or airscore.mathmlsympy.mathml_containers.BaseMathmlContainer, and register the new classes with the parser using the airscore.mathmlsympy.parser.mathml_element() decorator. The interfaces for the relevant classes and methods are described below.

BaseMathmlElement

class airscore.mathmlsympy.base_mathml_element.BaseMathmlElement(tag, attrib={}, **extra)

This is the base class for all MathML elements in the XML tree

is_implicit_addend

bool() - This node may appear as the implicit addend of an integer (as the fractional part of a mixed number).

is_implicit_multiplicand

bool() - When a number or a symbol appears to the left of this node, an implicit multiplication should be performed.

is_inequality

bool() - This is an equal sign or inequality operator. A chained equation can be broken on this node.

is_number

bool() - This node contains a number (digits, decimal points, etc).

is_non_neg_integer

bool() - This node is a non-negative integer. This flag is used to detect when the numerator and denominator of a fraction contain simple numbers, allowing us to use the fraction as part of a mixed number (see is_implicit_addend).
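To make the mixed-number rule concrete, the following sketch shows the arithmetic that is_implicit_addend and is_non_neg_integer are meant to enable: a whole number followed by a fraction of non-negative integers is read as a sum, not a product. The helper mixed_number_value is hypothetical and is not part of the engine; it only illustrates the intended interpretation.

```python
from fractions import Fraction


def mixed_number_value(whole, numerator, denominator):
    """Interpret 'whole numerator/denominator' as whole + numerator/denominator.

    Hypothetical helper, not part of airscore. All three arguments are digit
    strings, mirroring the is_non_neg_integer check on the fraction's parts.
    """
    parts = (whole, numerator, denominator)
    if not all(part.isdecimal() for part in parts):
        raise ValueError("mixed numbers require non-negative integer parts")
    # Implicit addition: 2 1/2 means 2 + 1/2, not 2 * 1/2.
    return Fraction(int(whole)) + Fraction(int(numerator), int(denominator))


print(mixed_number_value("2", "1", "2"))  # 5/2
```

A fraction whose numerator or denominator is anything other than a plain non-negative integer (for example "-1" or "x") is rejected here, which corresponds to such a fraction not qualifying as an implicit addend.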

validate_max_children

int() - Maximum number of children permitted for this node.

validate_min_children

int() - Minimum number of children permitted for this node.

validate_no_text

bool() - If True, it is an error for this node to contain text directly. Text may still exist inside of nested nodes.

validate_required_attributes

set of str() - A set containing the names of required attributes. Not namespace aware.

decoded_text

unicode() - The text content.

The default implementation decodes a limited dictionary of common unicode values to sympy equivalents.
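As a rough illustration of this kind of decoding, the sketch below maps a handful of Unicode characters to text that sympy can parse. The table contents and the function name decode_text are assumptions made for this example; the engine's actual dictionary may contain different entries.

```python
# Hypothetical translation table; the engine's actual dictionary may differ.
UNICODE_TO_SYMPY = str.maketrans({
    "\u00d7": "*",   # multiplication sign
    "\u00f7": "/",   # division sign
    "\u2212": "-",   # minus sign
    "\u03c0": "pi",  # Greek small letter pi
})


def decode_text(text):
    """Strip surrounding whitespace and replace the known characters."""
    return (text or "").strip().translate(UNICODE_TO_SYMPY)


print(decode_text(" 3\u00d7\u03c0 "))  # 3*pi
```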

get_sympy_text()

Get a string representing this node, including children

This is the main method that you will need to override in order to control how this node and its children are rendered in the sympy output.

The default implementation simply returns the decoded_text attribute.

Override to_sympy() instead if you need to control how this node relates to its neighbors.
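The sketch below shows what such an override might look like for a square-root element. StandInElement is a hypothetical stand-in that imitates only the documented default behavior of BaseMathmlElement so that the example runs without the engine installed; in real code you would subclass BaseMathmlElement itself.

```python
import xml.etree.ElementTree as ET


class StandInElement(ET.Element):
    """Hypothetical stand-in for BaseMathmlElement, NOT the engine's class."""

    @property
    def decoded_text(self):
        return (self.text or "").strip()

    def get_sympy_text(self):
        # Mirrors the documented default: return the decoded text.
        return self.decoded_text


class SquareRoot(StandInElement):
    """Renders its children inside a sympy sqrt() call."""

    def get_sympy_text(self):
        inner = "".join(child.get_sympy_text() for child in self)
        return "sqrt(" + inner + ")"


root = SquareRoot("msqrt")
operand = StandInElement("mn")
operand.text = "16"
root.append(operand)
print(root.get_sympy_text())  # sqrt(16)
```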

Returns:unicode - The string representing this node in a sympy expression

pick_subclass()

Change the leopard’s spots to zebra stripes

The xml.etree.ElementTree parsing mechanism forces us to choose the class for new elements when the start tag has been parsed, but before any of the content has been read. There are a few MathML constructs for which we need different classes, but they are represented by the same start tag. Our parser calls this method after the end tag has been processed, in order to give the element a chance to make any changes it needs to make to finalize its class selection.

In most cases, this method does nothing (the default behavior). In a few cases, however, this method will assign a new value to the instance’s __class__ property in order to change the object into an instance of a subclass of its original class.

We are using an admittedly obscure Python “feature,” and I can’t recommend that you make a habit of altering the classes of existing objects. But for this limited purpose it seemed the cleanest solution.
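For readers unfamiliar with the technique, here is a minimal, engine-independent sketch of reassigning __class__ once an element's content is known. The Operator and Inequality classes are hypothetical and stand in for whatever element classes need this treatment.

```python
class Operator:
    """Hypothetical stand-in for an operator element class."""

    is_inequality = False

    def __init__(self, text):
        self.text = text

    def pick_subclass(self):
        # Called once the element's content is known (after the end tag).
        if self.text in ("=", "<", ">"):
            self.__class__ = Inequality


class Inequality(Operator):
    is_inequality = True


plus = Operator("+")
equals = Operator("=")
for node in (plus, equals):
    node.pick_subclass()

print(type(plus).__name__, type(equals).__name__)  # Operator Inequality
```

Because Inequality is a subclass of Operator and adds no instance state, the existing object remains valid after the switch; only its method and attribute lookup changes.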

Returns:None

to_sympy(tail=<PartialSympyObject instance>)

Return oneself as a “partial sympy object”

You will need to override this routine if you are changing the way this node combines with its neighbor nodes. The default implementation concatenates this node with its right-hand neighbor, adding a multiplication operator to the list first if the right-hand neighbor is a suitable implicit multiplicand.

Override get_sympy_text() instead if you need to change how this node and its children are represented in the output, but not how this node is related to its siblings.
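The default behavior described above can be pictured with the following sketch. PartialNode and to_partial are hypothetical stand-ins, and the is_operand flag is an assumption used here to decide when a left-hand node is "a number or a symbol"; the engine's own logic may apply further conditions.

```python
class PartialNode:
    """Hypothetical stand-in for PartialSympyObject."""

    def __init__(self, text, tail=None, is_implicit_multiplicand=False):
        self.text = text
        self.next = tail
        self.is_implicit_multiplicand = is_implicit_multiplicand


def to_partial(text, tail, is_operand, is_implicit_multiplicand=False):
    """Prepend one node, inserting '*' when multiplication is implied."""
    if is_operand and tail is not None and tail.is_implicit_multiplicand:
        tail = PartialNode("*", tail)
    return PartialNode(text, tail, is_implicit_multiplicand)


# Build "2x+3" from right to left.
head = to_partial("3", None, is_operand=True)
head = to_partial("+", head, is_operand=False)
head = to_partial("x", head, is_operand=True, is_implicit_multiplicand=True)
head = to_partial("2", head, is_operand=True)

pieces = []
node = head
while node is not None:
    pieces.append(node.text)
    node = node.next
print("".join(pieces))  # 2*x+3
```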

Parameters:tail (airscore.mathmlsympy.partial_sympy_object.PartialSympyObject) – The head of a linked list of sympy objects which will become the tail of the newly created object
Returns:airscore.mathmlsympy.partial_sympy_object.PartialSympyObject

validate()

Validate the node content

This method is called during the processing of the XML end tag, immediately after the call to pick_subclass(). Subclasses should perform any required validation of the newly created MathML object. A validation failure should raise an error (usually ValueError).
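The sketch below shows one way a subclass might layer its own check on top of inherited validation. StandInElement is a hypothetical stand-in, and its handling of validate_min_children and validate_max_children is an assumption about how those attributes are used, not a description of the engine's code. The zero-denominator rule is likewise invented for illustration.

```python
import xml.etree.ElementTree as ET


class StandInElement(ET.Element):
    """Hypothetical stand-in for BaseMathmlElement, NOT the engine's class."""

    validate_min_children = 0
    validate_max_children = None  # None means "no limit" in this sketch only

    def validate(self):
        # Assumed behavior: enforce the declared child-count limits.
        count = len(self)
        if count < self.validate_min_children:
            raise ValueError("<%s> has too few children" % self.tag)
        if (self.validate_max_children is not None
                and count > self.validate_max_children):
            raise ValueError("<%s> has too many children" % self.tag)


class FractionElement(StandInElement):
    validate_min_children = 2
    validate_max_children = 2

    def validate(self):
        super().validate()
        # Extra, element-specific check layered on top of the inherited ones.
        if (self[1].text or "").strip() == "0":
            raise ValueError("denominator must not be zero")


frac = FractionElement("mfrac")
for digits in ("1", "0"):
    child = StandInElement("mn")
    child.text = digits
    frac.append(child)

try:
    frac.validate()
    error_message = None
except ValueError as error:
    error_message = str(error)
print(error_message)  # denominator must not be zero
```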

The default implementation performs the following steps:

Returns:None
Raises:ValueError

BaseMathmlContainer

class airscore.mathmlsympy.mathml_containers.BaseMathmlContainer(tag, attrib={}, **extra)

The base class for all MathML container elements.

The base class for all MathML elements that can contain an arbitrary list of other MathML elements. This includes elements like <math> and <mrow>, as well as elements listed in the MathML spec as containing an implicit <mrow> element.

get_sympy_text_list()

Return a list containing text representations of simple equations or inequalities.

This method is the main loop of the parser for most MathML expressions. Most containers will not need to override this method, but it is worth understanding how it works. One oddity worth noting is that the parser parses the expression from right to left, instead of the usual left-to-right.

If the expression is a “chained” equation, containing more than one equals or inequality operator, then this function will return multiple strings in its return list. Each return value will be a simple equation or inequality; that is, one that contains only one equals or inequality operator.

If the expression is already a simple equation or inequality, then a list containing a single string representing that equation or inequality will be returned.

If the expression contains no equality or inequality operators, then a list containing a single string representing that expression will be returned.

Returns:list() of str()
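The three cases above can be illustrated with a small, engine-independent sketch that scans a token list from right to left and breaks it at each equals or inequality operator. The function split_chain and its token format are hypothetical, and pairing each operator with its two adjacent operands is an assumption about how a chain is split.

```python
OPERATORS = {"=", "<", ">", "<=", ">="}


def split_chain(tokens):
    """Split a token list into simple equations or inequalities."""
    results = []
    pending = None  # (operator, right-hand operand) awaiting its left side
    current = []    # tokens of the operand being read, in reverse order
    for token in reversed(tokens):
        if token in OPERATORS:
            operand = "".join(reversed(current))
            if pending is not None:
                op, rhs = pending
                results.append("%s %s %s" % (operand, op, rhs))
            pending = (token, operand)
            current = []
        else:
            current.append(token)
    leftmost = "".join(reversed(current))
    if pending is None:
        return [leftmost]  # no operator at all: a plain expression
    op, rhs = pending
    results.append("%s %s %s" % (leftmost, op, rhs))
    results.reverse()      # restore left-to-right order
    return results


print(split_chain(["a", "<", "b", "<=", "c"]))  # ['a < b', 'b <= c']
print(split_chain(["y", "=", "2*x"]))           # ['y = 2*x']
print(split_chain(["x", "+", "1"]))             # ['x+1']
```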

mathml_element()

airscore.mathmlsympy.parser.mathml_element(*args)

A decorator which registers the decorated class as a mathml element class

This decorator registers the association between a class and an element name for the MathMLBuilder. In order to have the MathMLBuilder use a particular class instead of the default xml.etree.ElementTree.Element class to represent a given XML element, decorate your class definition with this decorator.

There are three permitted calling conventions. You can use the decorator without arguments, like so:

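from airscore.mathmlsympy.base_mathml_element import BaseMathmlElement
from airscore.mathmlsympy.parser import mathml_element

@mathml_element
class bob( BaseMathmlElement ):
    ...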

in which case the new class will be used for elements named <bob> in the MathML namespace. This usage is not ideal, as it requires your Python class to have the same name (including case) as the MathML element.

You can avoid this problem by specifying an element name like this:

@mathml_element( 'bob' )
class MathMLBob( BaseMathmlElement ):
    ...

Finally, if you have a class that should be associated with multiple MathML tag names, you can specify all of the names as arguments to the mathml_element() decorator:

@mathml_element( 'bob', 'jim', 'joe' )
class MathMLBob( BaseMathmlElement ):
    ...

In every case, the classes will only be used for elements in the MathML namespace (http://www.w3.org/1998/Math/MathML).

No special effort beyond the use of this decorator is required to register new classes for handling MathML elements. However, you must be certain that the modules containing your classes have been imported before attempting to process the XML data.

PartialSympyObject

class airscore.mathmlsympy.partial_sympy_object.PartialSympyObject(el, tail)

An element in a linked list that represents a Sympy expression

Parameters:
  • el (BaseMathmlElement) – The mathml element from which this object is derived
  • tail (PartialSympyObject) – The next item in the list

next

PartialSympyObject - The next rightward neighbor in the list.

is_closed

bool() - Used in balancing absolute value bars. True if the parser has encountered an odd number of absolute value bars to the right of this point.

is_implicit_mutliplicand

bool() - The is_implicit_multiplicand attribute of the airscore.mathmlsympy.base_mathml_element.BaseMathmlElement that generated this object.

is_implicit_addend

bool() - The is_implicit_addend attribute of the airscore.mathmlsympy.base_mathml_element.BaseMathmlElement that generated this object.

is_number

bool() - The is_number attribute of the airscore.mathmlsympy.base_mathml_element.BaseMathmlElement that generated this object.

text

unicode() - The result of the get_sympy_text() method of the airscore.mathmlsympy.base_mathml_element.BaseMathmlElement object that generated this object.

get_sympy_text()

The concatenated text attributes of this node and all of the nodes to its right.

Returns:unicode

itertext()

Iterate through the linked list, returning the text attribute of each node.

Returns:iterator of unicode
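The relationship between next, text, itertext() and get_sympy_text() can be summarized with a minimal linked-list sketch. PartialNode is a hypothetical stand-in, and using None to mark the end of the list is an assumption made for this example; the engine may terminate the list differently.

```python
class PartialNode:
    """Hypothetical stand-in for PartialSympyObject."""

    def __init__(self, text, tail=None):
        self.text = text
        self.next = tail  # next rightward neighbor, or None at the end

    def itertext(self):
        """Yield the text of this node and every node to its right."""
        node = self
        while node is not None:
            yield node.text
            node = node.next

    def get_sympy_text(self):
        """Concatenate the text of this node and all nodes to its right."""
        return "".join(self.itertext())


# Build the list for "x**2"; the innermost call creates the rightmost node.
head = PartialNode("x", PartialNode("**", PartialNode("2")))
print(list(head.itertext()))       # ['x', '**', '2']
print(head.next.get_sympy_text())  # **2
```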