Computer algebra systems like Mathematica or SymPy feel like magic.
You type in a derivative request and get back a perfect symbolic answer.
But how does a computer understand variables rather than strings? How
does it know the rules of calculus? I decided to build a symbolic
mathematics engine from scratch in Haskell to find out. I chose Haskell
because its functional nature mirrors mathematics perfectly. In
imperative languages, you might implement an expression as an object
with methods; in functional programming, an expression is just data, and
differentiation is just a transformation of that data.
Expression Trees
The fundamental insight is that any mathematical expression can be
represented as a tree. The expression 2x + 1 isn’t a flat string; it’s a
Sum operation where the left child is a Product (of 2 and x) and the
right child is 1. In Haskell, we define
this using an Algebraic Data Type that covers everything from simple
constants to trig functions. Once you have this tree structure, calculus
becomes a game of pattern matching. Differentiation isn’t a calculation;
it’s a recursive function that transforms one tree structure into
another based on the standard rules. For example, the product rule
becomes a direct translation where a Product node is transformed into a
Sum of two Product nodes representing the sub-derivatives.
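The shape of that translation can be sketched with a minimal ADT and a pattern-matching derivative. The constructor names (Num, Var, Sum, Product) and the diff function below are illustrative stand-ins of mine, not necessarily the library’s exact definitions:

```haskell
-- A minimal expression tree: constants, variables, and two operations.
data Expr
  = Num Double          -- numeric constant
  | Var String          -- symbolic variable
  | Sum Expr Expr       -- addition node
  | Product Expr Expr   -- multiplication node
  deriving (Show, Eq)

-- Differentiation is pure pattern matching: each rule of calculus
-- maps one tree shape to another.
diff :: String -> Expr -> Expr
diff _ (Num _) = Num 0
diff x (Var y)
  | x == y    = Num 1
  | otherwise = Num 0
diff x (Sum u v) = Sum (diff x u) (diff x v)
-- Product rule: (u * v)' = u' * v + u * v'
diff x (Product u v) =
  Sum (Product (diff x u) v) (Product u (diff x v))
```

Note how the product rule really is a direct translation: the Product node becomes a Sum of two Product nodes, each holding one sub-derivative.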
Implementing replacement and substitution required a careful
distinction between the structure of an expression and its evaluation.
The substitute function performs a purely structural traversal,
replacing Var nodes matching a target string with a
replacement sub-tree. This allows for powerful compositional operations;
for instance, computing f(g(x)) becomes a substitution of g(x) into
every variable node of f(x).
This is distinct from mathematical evaluation where variables are
replaced by numeric constants; in this system, variables can be replaced
by entire complex expression trees, enabling symbolic composition and
chain rule applications.
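A minimal sketch of such a substitute function, assuming an illustrative Expr ADT with constructors Num, Var, Sum, and Product (the real library’s names may differ):

```haskell
data Expr = Num Double | Var String | Sum Expr Expr | Product Expr Expr
  deriving (Show, Eq)

-- Purely structural traversal: every Var matching the target name is
-- replaced by an arbitrary sub-tree, not just a numeric constant.
substitute :: String -> Expr -> Expr -> Expr
substitute x r (Var y) | x == y = r
substitute x r (Sum u v)     = Sum (substitute x r u) (substitute x r v)
substitute x r (Product u v) = Product (substitute x r u) (substitute x r v)
substitute _ _ e             = e   -- constants and other variables pass through
```

Because the replacement is itself an Expr, composing f(g(x)) is just `substitute "x" g f`: the whole g tree is grafted in wherever x appeared.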
Simplification
However, raw symbolic differentiation is messy. If you differentiate
x^2 using the power rule, you strictly get 2 * x^1 * 1. A computer
doesn’t automatically know that x^1 = x or that multiplying by 1
changes nothing. Without simplification, the derivative of x^2 + x
might come out as 2 * x^1 * 1 + 1 * 1 + 0.
To fix this, I implemented a simplifier that repeatedly applies
algebraic identities until the expression stops changing (a
“fixed-point” algorithm). It applies rules like identity addition,
identity multiplication, and constant folding to turn that messy tree
back into the clean result we expect.
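One bottom-up pass of such rules might look like the following sketch (illustrative constructor names again; the real rule set is larger):

```haskell
data Expr = Num Double | Var String | Sum Expr Expr | Product Expr Expr
  deriving (Show, Eq)

-- A single simplification pass: identity addition and multiplication,
-- zero annihilation, and constant folding, applied bottom-up.
simplify :: Expr -> Expr
simplify (Sum a b) = case (simplify a, simplify b) of
  (Num 0, b')    -> b'              -- 0 + x = x
  (a', Num 0)    -> a'              -- x + 0 = x
  (Num m, Num n) -> Num (m + n)     -- constant folding
  (a', b')       -> Sum a' b'
simplify (Product a b) = case (simplify a, simplify b) of
  (Num 0, _)     -> Num 0           -- 0 * x = 0
  (_, Num 0)     -> Num 0
  (Num 1, b')    -> b'              -- 1 * x = x
  (a', Num 1)    -> a'
  (Num m, Num n) -> Num (m * n)
  (a', b')       -> Product a' b'
simplify e = e                      -- constants and variables are already simple
```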
The simplification algorithm operates on a fixed-point iteration
principle. A single pass of algebraic rules isn’t sufficient:
simplifying 1 * x to x might reveal further opportunities, such as
x + 0 = x, that weren’t visible in the first pass. My simplifyFully
function recursively applies the simplification rules up to a safety
limit (100 iterations) or until the expression stops changing
(converges). This ensures that deeply nested expressions like
((x + 0) * 1) + 0 collapse completely down to x without requiring
complex look-ahead logic in the individual rules.
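A minimal version of that fixed-point driver, paired here with a deliberately shallow one-pass simplifier so the repeated iteration is visible (both are sketches of mine, not the library’s exact code):

```haskell
data Expr = Num Double | Var String | Sum Expr Expr | Product Expr Expr
  deriving (Show, Eq)

-- A shallow one-pass simplifier: each call rewrites at most one layer.
simplify :: Expr -> Expr
simplify (Sum (Num 0) b)     = b
simplify (Sum a (Num 0))     = a
simplify (Product (Num 1) b) = b
simplify (Product a (Num 1)) = a
simplify (Sum a b)           = Sum (simplify a) (simplify b)
simplify (Product a b)       = Product (simplify a) (simplify b)
simplify e                   = e

-- Fixed-point iteration: re-apply simplify until the tree stops
-- changing, with a 100-iteration safety cap.
simplifyFully :: Expr -> Expr
simplifyFully = go (100 :: Int)
  where
    go 0 e = e
    go n e =
      let e' = simplify e
      in if e' == e then e else go (n - 1) e'
```

On ((x + 0) * 1) + 0, each pass peels away one identity; the driver keeps going until the tree converges to the bare variable.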
This engine is robust enough to handle advanced applications like
Taylor Series expansion. The Taylor Series construction showcases the
power of Haskell’s lazy evaluation and functional folding. The series is
generated by mapping a term-generating function over an integer sequence
and then folding the results with Sum. Each term combines
three distinct symbolic operations: computing the n-th derivative,
evaluating that derivative at the center point a via substitution, and
multiplying by the power term (x - a)^n / n!.
Because these operations return valid Expr trees, they can
be immediately simplified by the optimization pass, keeping the
polynomial expansion compact even for high-degree approximations.
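A runnable sketch of that construction, bundling minimal stand-ins for Expr, diff, and substitute since the library’s real definitions aren’t shown here; taylorTerm and the map/fold driver follow the recipe above, and the lack of a Power constructor (the power term is an expanded Product chain) is a simplification of mine:

```haskell
data Expr = Num Double | Var String | Sum Expr Expr | Product Expr Expr
  deriving (Show, Eq)

diff :: String -> Expr -> Expr
diff _ (Num _)       = Num 0
diff x (Var y)       = if x == y then Num 1 else Num 0
diff x (Sum u v)     = Sum (diff x u) (diff x v)
diff x (Product u v) = Sum (Product (diff x u) v) (Product u (diff x v))

substitute :: String -> Expr -> Expr -> Expr
substitute x r (Var y) | x == y = r
substitute x r (Sum u v)     = Sum (substitute x r u) (substitute x r v)
substitute x r (Product u v) = Product (substitute x r u) (substitute x r v)
substitute _ _ e             = e

-- n-th term: f^(n)(a) / n! * (x - a)^n, built from the three symbolic
-- operations described in the text.
taylorTerm :: Expr -> String -> Double -> Int -> Expr
taylorTerm f x a n =
  let fn    = iterate (diff x) f !! n           -- n-th derivative
      coef  = substitute x (Num a) fn           -- evaluate at the center a
      fact  = product [1 .. fromIntegral n]     -- n!
      power = foldr Product (Num 1)             -- (x - a)^n as a Product chain
                (replicate n (Sum (Var x) (Num (-a))))
  in Product (Product (Num (1 / fact)) coef) power

-- Map the term generator over [0..degree], then fold the results with Sum.
taylor :: Expr -> String -> Double -> Int -> Expr
taylor f x a degree = foldr1 Sum (map (taylorTerm f x a) [0 .. degree])
```

Every intermediate value here is an ordinary Expr, so each term can be handed straight to the simplifier before being folded into the series.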
Building this library highlighted why functional programming is so
effective for compiler-like tasks; the lack of mutable state means you
never worry about “breaking” an expression while transforming it.