# Fast polynomial evaluation

I guess everybody knows what a polynomial is.

Polynomials are really cool and used all over the place in programming. Evaluating a polynomial is really important, so how should we do it?

First of all lets see how the polynomial function looks like:

$a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0$

We can see that a polynomial is basically defined by it’s coefficients $a_i$. So we can actually store all we need for the evaluation by storing the coefficients into an array.

We can use a shorter notation $\sum\limits_{i=1}^n a_i x^i$ that actually implies the first algorithm.

We implement this in Javascript

function polyEvalSimple(x,coeff)
{
var xacc = 1;
var finalValue = 0.0
for(var i=0; i< coeff.length; i++)
{
finalValue = finalValue + xacc * coeff[i];
xacc = xacc * x;
}
return finalValue;
}


and then in Python

def polyeval(x,coeff):
xacc = 1;
finalValue = 0.0
for i in xrange(0,len(coeff)):
finalValue = finalValue + coeff[i] * xacc
xacc = xacc * x
return finalValue


and of course C

double polyEvalSimple(double x, double *coeff, unsigned int numCoeff)
{
double xacc = 1;
double finalValue = 0.0f;
for (unsigned int i = 0; i < numCoeff; i++)
{
finalValue = finalValue + coeff[i] * xacc;
xacc = xacc * x;
}
return finalValue;
}


Ok, now we can do better than this. We just use a little math trickery and split the polynomial into linear parts (monomial basis):

$p(x) = a_0 + x(a_1+x(a_2+ \dots +x(a_{n-1} + a_n x)))$

This is called the Horner method and it’s been around ages (even before Horner of course :) )

Now lets see the implementation:

Javascript

function polyEvalHorner(x,coeff)
{
var finalValue = 0
for(var i=coeff.length -1; i>= 0 ; i--)
{
finalValue = finalValue * x + coeff[i];
}
return finalValue;
}


and then in Python

def polyevalHorner(x,coeff):
finalValue = 0.0
for i in xrange(len(coeff)-1 , -1,-1):
finalValue = finalValue * x+ coeff[i]
return finalValue


and of course C

double polyEvalHorner(double x, double *coeff, unsigned int numCoeff)
{
double finalValue = 0.0f;
for (int i = numCoeff-1; i >= 0 ; i--)
{
finalValue = finalValue* x + coeff[i];
}
return finalValue;
}


As we can see, we basically avoid a multiplication by $x$ each iteration.

Now that’s all fine and dandy but will that actually matter on todays computers?

Let’s do some profiling.

We note the G = $\frac{t_{horner}}{t_{simple}}$ .

Python Javascript C
G 0.71 0.375 1.11

I did my tests on my home CPU: i7-2600 at 3.4GHZ (they could vary on your computers so feel free to run them and tell me your results)

We can see that there are some gains in speed when using the Horner method in Python and in Javascript.

We also see that the Horner method is actually worst in C . Why is that? We should be having better times, because we avoid doing a multiplication ! :)

We can also check the assembly code (I took the debug code so we can actually understand what is happening, the debug times are still better for the simple evaluation):

Simple method

finalValue = finalValue + coeff[i] * xacc;
00007FF69D321BCE  mov         eax,dword ptr [rbp+44h]
00007FF69D321BD1  mov         rcx,qword ptr [coeff]
00007FF69D321BD8  movsd       xmm0,mmword ptr [rcx+rax*8]
00007FF69D321BDD  mulsd       xmm0,mmword ptr [xacc]
00007FF69D321BE2  movsd       xmm1,mmword ptr [finalValue]
00007FF69D321BEB  movaps      xmm0,xmm1
00007FF69D321BEE  movsd       mmword ptr [finalValue],xmm0
xacc = xacc * x;
00007FF69D321BF3  movsd       xmm0,mmword ptr [xacc]
00007FF69D321BF8  mulsd       xmm0,mmword ptr [x]
00007FF69D321C00  movsd       mmword ptr [xacc],xmm0


Horner

finalValue = finalValue* x + coeff[i];
00007FF69D321B10  movsd       xmm0,mmword ptr [finalValue]
00007FF69D321B15  mulsd       xmm0,mmword ptr [x]
00007FF69D321B1D  movsxd      rax,dword ptr [rbp+24h]
00007FF69D321B21  mov         rcx,qword ptr [coeff]