Differential equations: how does separation of variables really work?
One of the simplest forms of differential equation to solve is that in
which the variables can be separated on either side of the equality.
Such equations occur frequently in science and engineering, and are usually
the first problems of this type to be taught to students.
Some of the operations that textbooks and classes teach to students, as
part of the process of solving separable-variable equations, appear to be
mathematically highly dubious. Students are typically taught to treat the
differential operator dy/dx as if it were a simple ratio of two quantities,
whose individual terms can be multiplied and divided independently.
However, dy/dx is not a ratio, and it is not clear why these procedures
work. Worse, it is unclear whether it is always safe to treat
the differential operator in such a cavalier fashion, or whether it is
only safe for particular classes of problem.
This situation irritated me when I was first taught calculus more
than thirty years ago, and it continues to irritate me to this day.
In this article I will attempt to describe how we can deal with separation
of variables without playing fast and loose with differentials.
In doing so, I hope to provide some insight into what is going on
behind the scenes when we follow the method that is usually taught.
The arguments in this article apply also to other procedures in calculus
that appear to rely on treating a differential operator as a ratio,
such as integration by u-substitution.
I'll start by describing the usual textbook procedure for solving a
simple differential equation with separable variables, and then
explain what's wrong with it, and how it can be put right.
I want to point out right from the start that this article is really
about the philosophy of mathematics. You can follow the textbook method
and get the right answer without understanding any of this stuff.
Please note that this page contains mathematical notation described using
the MathML standard. At the time of writing, Firefox seems to be the only
browser that supports MathML properly. Google Chrome is said to support it,
but the version I tried did not. If you are unable to view this page properly,
please consider using Firefox.
The textbook approach
Consider the following simple differential equation for x and y
Our job is to find a relationship between x and y that does not
contain a derivative term, that is, to solve for y in terms of x.
The usual textbook approach to solving an equation of this type is to
separate the variables — putting all the x terms on one side, and the
y terms on the other. The x terms have to include
dx, and the y terms dy. We can
multiply both sides by dx and by 1 - y^2, and
this gives us:
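As a sketch of this step, suppose the equation has the form shown below. The factor 1 - y^2 is the one named above; f(x) is a placeholder introduced here for whatever function of x appears on the right-hand side, so treat it as an assumption rather than as the specific equation being solved.

```latex
% Assumed general form; f(x) is a hypothetical placeholder for the x terms.
\[
  \frac{dy}{dx} = \frac{f(x)}{1 - y^2}
  \quad\Longrightarrow\quad
  (1 - y^2)\,dy = f(x)\,dx
\]
```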
But what does it mean to 'multiply by dx'? dx is not a
quantity: it is a component of the operator d/dx[f(x)]. While it is true that the 'd' terms do, in some senses, represent
small values ('infinitesimals', in the Leibniz formulation), the
operator itself is a limit expression (maybe; see the closing remarks for
discussion): it denotes the limiting value of the ratio of
two small changes as the change in x approaches zero. The components of
dy/dx have little significance on their own, and they certainly
don't form an arithmetical ratio.
Leaving that problem aside for now, the textbook procedure now typically calls
for us to write an integration sign in front of each side (whatever that signifies), to create a pair of integrals:
We now have something that is mathematically well-formed, but we've got to it
by a peculiar process. Like the differential operator, the indefinite
integral is something that is meaningful in
a particular form; the integration sign doesn't have
much meaning on its own.
Be that as it may, integrating both sides with respect
to their independent variables gives:
With a bit of tidying up:
This is a cubic equation in y, and not easy to juggle into a
straightforward relationship of the form y=f(x). Still,
we have a solution (or, rather, a family of solutions) to the original
differential equation; the calculus is done, and the rest is algebra.
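Written out for an equation of the assumed form dy/dx = f(x)/(1 - y^2), the steps above might look like this. Here f(x) is a placeholder for the function of x on the right-hand side and F(x) stands for any antiderivative of it; both symbols are introduced for illustration only.

```latex
% f(x), F(x) are hypothetical placeholders; C and c are arbitrary constants.
\begin{align*}
  \int (1 - y^2)\,dy &= \int f(x)\,dx \\
  y - \frac{y^3}{3} &= F(x) + C \\
  3y - y^3 &= 3F(x) + c, \qquad c = 3C
\end{align*}
```

The y^3 term is what makes the result a cubic in y, whatever the function of x turns out to be.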
So what's the problem?
Most textbooks and classes gloss over the fact that dy/dx is not
a ratio of two quantities, and treat the individual terms as though
they can be multiplied, divided and canceled at will. Some authors
do at least point out the problem, but usually to say "dy/dx is not
a ratio, but it can be helpful on occasions to treat it as if it were."
It's never very clear what these occasions are. Sometimes you'll
come across statements such as "This procedure is justified by the
chain rule" or "This procedure is justified by the Fundamental Theorem
of Calculus." Maybe that's true, but how is it justified?
You'll sometimes see expressions in dx or dy explained
away with hand-waving expressions like "formal form" or "infinitesimal
form". But what do those terms really mean?
With these questions left unanswered, students have no way
to figure out whether the approach being explained is generally
applicable, or works only for a limited class of problems.
Let's see if we can't improve the formulation of the separation of
variables procedure, and avoid some of these ugly mathematical
kludges. Starting with the original differential equation:
We can rearrange to give:
Doing this keeps dy/dx intact, for now.
Now let's take the indefinite integral of each side with respect to x.
This is a legitimate thing to do, as we're using the same independent
variable in the two integrals, even though the LHS is a function
of y as well as x:
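In symbols, writing f(x) as a placeholder (an assumption of this sketch) for the function of x on the right-hand side, the rearrangement and the integration with respect to x might be written as:

```latex
% f(x) is a hypothetical placeholder for the x terms.
\begin{align*}
  (1 - y^2)\,\frac{dy}{dx} &= f(x) \\
  \int (1 - y^2)\,\frac{dy}{dx}\,dx &= \int f(x)\,dx
\end{align*}
```

Note that dy/dx appears on the left as a single, undivided object inside the integrand.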
From here, we could "cancel out" the dx terms to leave:
and this is the same separated form we arrived at earlier.
But what does it mean to "cancel out" the dx terms? Is doing this
any better than manipulating dy and dx independently?
Not really. All
we've done is postpone the application of a kludge: we're still going to have
to treat dy/dx as a ratio, just a bit later. The
dx in the integration operator is not the same quantity as the
dx in the differentiation operator, if it is even a quantity at all.
Given the lack of mathematical rigour, it can be somewhat surprising
that these procedures do actually work: in this particular kind of
problem, at least, manipulating dy and dx independently
does allow the correct solution to be reached.
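One way to convince yourself that the separated solution really is correct is to check it numerically. The sketch below assumes, purely for illustration, that the equation is dy/dx = x^2/(1 - y^2); the x^2 on the right-hand side is an assumption, as are the starting point (0, 0) and all the function names. It steps along a solution with the classical Runge-Kutta method, which makes no use of separation of variables, and measures how far the quantity y - y^3/3 - x^3/3 drifts from its starting value.

```python
# Numerical sanity check for the ASSUMED equation dy/dx = x**2 / (1 - y**2).
# The right-hand side x**2 is an illustrative choice, not taken from the text.

def rhs(x, y):
    """Right-hand side of the assumed differential equation."""
    return x**2 / (1.0 - y**2)

def invariant(x, y):
    """Separation of variables predicts y - y^3/3 - x^3/3 is constant."""
    return y - y**3 / 3.0 - x**3 / 3.0

def rk4_path(x0, y0, x_end, steps):
    """Integrate dy/dx = rhs(x, y) with classical fourth-order Runge-Kutta.

    Returns the list of (x, y) points visited, including the start."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, y + h * k1 / 2)
        k3 = rhs(x + h / 2, y + h * k2 / 2)
        k4 = rhs(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
        points.append((x, y))
    return points

def max_drift(points):
    """Largest change in the predicted constant along the computed path."""
    c0 = invariant(*points[0])
    return max(abs(invariant(x, y) - c0) for x, y in points)

# Stay on 0 <= x <= 1, where y remains well below 1 and the equation
# is not singular.
path = rk4_path(0.0, 0.0, 1.0, 1000)
drift = max_drift(path)
print("largest drift in y - y^3/3 - x^3/3:", drift)
```

If the drift is tiny compared with the values of x and y themselves, the implicit relationship produced by separating the variables is at least consistent with the differential equation over that range.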
"Cancelling dx" as a shortcut to applying the chain rule
To figure out what's going on here, we need to think about what an
indefinite integral actually means. In essence, when we look to evaluate
the integral of (1 - y^2) dy/dx with respect to x,
what we're really asking is "What can be differentiated with respect
to x to give (1 - y^2) dy/dx?"
It's an odd question, on the face of it, because we aren't particularly used to seeing
the results of a differentiation having differential terms in them.
There's certainly no problem finding something that will differentiate to
1 - y^2: straightforward integration with respect to y gives us
y - y^3/3 + C, for any constant C.
Knowing this, it is possible to find an expression intuitively that will
differentiate with respect to x to give (1 - y^2) dy/dx. Such an expression is
y - y^3/3 + C,
where y is a function of x, and we differentiate in terms
of x, not y. To perform this differentiation we need
the chain rule.
If we take
a function f(u), where u is a function of x,
then its derivative with respect to x is f'(u) du/dx.
In other words, the LHS of
the integrated equation, the integral of (1 - y^2) dy/dx with respect to x,
can be replaced by y - y^3/3 + C.
This is exactly where we got to by "cancelling out" the dx
terms and performing the integration of the LHS with respect to y, as
explained above. We could then proceed by straightforwardly integrating
the RHS with respect to x.
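The chain-rule step described above can be written out explicitly. This is only a worked restatement of the argument, treating y as a function of x throughout:

```latex
\begin{align*}
  \frac{d}{dx}\left[\, y - \frac{y^3}{3} + C \,\right]
    &= \frac{d}{dy}\left[\, y - \frac{y^3}{3} + C \,\right]\frac{dy}{dx}
     = (1 - y^2)\,\frac{dy}{dx} \\[4pt]
  \text{so}\qquad
  \int (1 - y^2)\,\frac{dy}{dx}\,dx &= y - \frac{y^3}{3} + C
\end{align*}
```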
The above reasoning shows that the "cancelling out" of the dx
terms was justified, in the sense that doing it led to the same answer
that could be obtained by intuitively finding an expression that could
be differentiated to give the integrand in question.
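In practice, however, we need something more general: something that justifies the operation used, and can be applied routinely
in problems of this sort.
For that, we need to show with reasonable rigour that, for a general function g(y),
the integral of g(y) dy/dx with respect to x is equal to the integral of g(y) with respect to y.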
That is, we need to show that "cancelling the dx terms"
is mathematically valid.
To do that, we'll start with the chain rule again:
Substituting into the chain rule:
The first term on the RHS is simply a differentiation of an integration with
respect to the same variable, y, and so the equation reduces to:
Now if we integrate both sides with respect to x, we get:
As the RHS is just the integral of a derivative with respect to the
same variable, the equation reduces to:
which is what we set out to prove.
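For reference, the whole argument can be laid out in one place. The symbols g and G are introduced here for illustration: g(y) stands for whatever function of y multiplies dy/dx, and G(y) for its integral with respect to y. Constants of integration are left implicit.

```latex
% g and G are hypothetical names; G(y) = \int g(y)\,dy.
\begin{align*}
  \frac{d}{dx}\,G(y) &= \frac{d}{dy}\,G(y)\;\frac{dy}{dx}
    && \text{(chain rule)} \\
  \frac{d}{dx}\int g(y)\,dy
    &= \frac{d}{dy}\left[\int g(y)\,dy\right]\frac{dy}{dx}
    && \text{(substituting for } G\text{)} \\
  g(y)\,\frac{dy}{dx} &= \frac{d}{dx}\int g(y)\,dy
    && \text{(differentiation undoes integration in } y\text{)} \\
  \int g(y)\,\frac{dy}{dx}\,dx
    &= \int \frac{d}{dx}\left[\int g(y)\,dy\right] dx
    && \text{(integrating both sides in } x\text{)} \\
  \int g(y)\,\frac{dy}{dx}\,dx &= \int g(y)\,dy
    && \text{(integration undoes differentiation in } x\text{)}
\end{align*}
```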
So where does that all get us?
In summary, when we split up dy/dx into separate terms and
manipulate those terms separately,
what we're really doing is
integrating both sides of the equation with respect to the same variable,
and then applying the chain rule to remove the differential term
from the integrand. Treating dx and dy as elements of
a ratio is sloppy, but it is mathematically sound to do so in
cases where the chain rule would be capable of yielding the integrands in question.
And, of course, it's a lot more convenient than applying the chain rule
explicitly every time.
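The identity behind "cancelling dx" can also be spot-checked numerically. The sketch below uses definite rather than indefinite integrals, so that there are numbers to compare, and makes two illustrative assumptions that do not come from the discussion above: y(x) = sin x as the dependence of y on x, and Simpson's rule as the integration method. It compares the integral of g(y(x)) dy/dx over x with the integral of g(y) over the corresponding range of y, using g(y) = 1 - y^2.

```python
import math

def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def g(y):
    """The function of y that multiplies dy/dx."""
    return 1.0 - y**2

def y_of_x(x):
    """Assumed dependence of y on x (illustrative choice)."""
    return math.sin(x)

def dy_dx(x):
    """Derivative of the assumed y(x)."""
    return math.cos(x)

a, b = 0.0, 1.0

# Integral over x, with dy/dx kept intact inside the integrand.
integral_in_x = simpson(lambda x: g(y_of_x(x)) * dy_dx(x), a, b)

# Integral over y, as obtained by "cancelling" the dx terms.
integral_in_y = simpson(g, y_of_x(a), y_of_x(b))

print(integral_in_x, integral_in_y)
```

The two values should agree to within the accuracy of the numerical integration, which is the chain-rule result in action rather than any genuine cancellation.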
Closing remarks
The meaning of dy/dx is itself problematic. Traditionally
it has been seen as an operator, derived by considering infinitesimal changes
in one variable with respect to another. However, this formulation is not
without its problems, most prominently that there is no
'infinitesimally small' real number. The real numbers can be subdivided
without limit: any positive real number, however small, can be halved to give a smaller one. To some extent, problems
understanding what 'infinitesimal' meant in the context of real numbers
led to derivatives being treated as limiting expressions, rather than
ratios of infinitesimals. However, there are other ways to understand
a derivative, including Abraham Robinson's rigorous definition of
infinitesimals in terms of hyperreal fields. Some of these formulations
might allow dy and dx to be treated independently.
Nevertheless, it seems to me that explaining the methodology of separation
of variables and integration by substitution as a short-cut to the
application of the chain rule is more likely to be understood by
students than a highly technical discussion of the meaning of a differential.