April 20th, 2018 at 5:46 PM
We've been working on recursive sequences in python in classes lately, and I was a little surprised at how much math actually makes a difference in your ability to optimize code. I wrote a little demonstration for one of my assignments.
The fibonacci sequence makes no sense to me, but essentially, it goes like this.
0, 1, 1, 2, 3, 5, 8, 13, 21, ...
Each digit is the sum of the two previous digits. Pretty simple. We could definitely extrapolate this as far as we wanted to and get an infinitely large number of digits, but I won't go there. Apparently this is some sort of important sequence in computer science and I don't know why, but you should be able to see the pattern easily enough.
Say we wanted to know what, say, the 20th digit in the sequence was? It turns out that the method for finding this is a little bit more complicated than you'd think. It's rather tedious to go and actually add up all of the numbers. And particularly if you're going to use a recursive sequence, as fast as we think computers are these days, trying to go past about the 40th term or so in the sequence will crash my macbook because literally going through and generating this sequence is that tedious for a computer. Maybe yours might get a tad bit higher but I'd be willing to bet you won't get past 50 in Python.
Okay, but why in the world is this so freaking slow?
It's because it's recursive. If it wants to find the 5th number in the sequence, it first must find the 4th and the 3rd, and add those up. But because it doesn't know what the 4th and the 3rd are, in order to find the 4th, it must find the 3rd and the 2nd. And in order to find the 3rd, it must find the 2nd and the first. And it has to go through all of these steps before it can give a simple answer. Although this is easy for small numbers, as you can probably imagine, this starts to grow exponentially once you start finding, say, the 20th, or the 30th, or the 35th number in a sequence. The next thing you know, your computer can't even find the 40th term in a simple sequence without crashing, where you could probably do it by hand in a matter of a few minutes.
There are two ways to speed this up. Instead of calling the same function over and over again and creating a gigantic tree of function calls to recalculate the same data over and over and over again, you can either save what each digit is as you go, and only calculate if you haven't saved that particular digit yet, or you can use something called an exact solution (a simple formula you make out of the recursive sequence using some sort of voodoo magic) to do the trick. By saving these numbers, it prevents the results from growing exponentially because it never has to recalculate values it's gone over before. You can easily calculate, say, the 4000th term with this method.
I demonstrated all three methods with the following code. It calculates the 35th number in the sequence. Enough to take a few seconds on the recursive calculation, but not enough to crash my Macbook. Hopefully it won't crash yours.
This will run in Python 3 if you want to save it into a .py file and run it. And I attached a file with this post too.
This page (yes that's a link) pretty much describes what we spent an hour or two doing in class to optimize this. It's my very first semester in this new program, I haven't even taken Calculus yet, and I already feel like my head is hurting.
Needless to say, you can teach yourself to program, but it's hard to teach yourself everything. I feel that, even solving simple problems, I have learned a lot.
The fibonacci sequence makes no sense to me, but essentially, it goes like this.
0, 1, 1, 2, 3, 5, 8, 13, 21, ...
Each digit is the sum of the two previous digits. Pretty simple. We could definitely extrapolate this as far as we wanted to and get an infinitely large number of digits, but I won't go there. Apparently this is some sort of important sequence in computer science and I don't know why, but you should be able to see the pattern easily enough.
Say we wanted to know what, say, the 20th digit in the sequence was? It turns out that the method for finding this is a little bit more complicated than you'd think. It's rather tedious to go and actually add up all of the numbers. And particularly if you're going to use a recursive sequence, as fast as we think computers are these days, trying to go past about the 40th term or so in the sequence will crash my macbook because literally going through and generating this sequence is that tedious for a computer. Maybe yours might get a tad bit higher but I'd be willing to bet you won't get past 50 in Python.
Okay, but why in the world is this so freaking slow?
It's because it's recursive. If it wants to find the 5th number in the sequence, it first must find the 4th and the 3rd, and add those up. But because it doesn't know what the 4th and the 3rd are, in order to find the 4th, it must find the 3rd and the 2nd. And in order to find the 3rd, it must find the 2nd and the first. And it has to go through all of these steps before it can give a simple answer. Although this is easy for small numbers, as you can probably imagine, this starts to grow exponentially once you start finding, say, the 20th, or the 30th, or the 35th number in a sequence. The next thing you know, your computer can't even find the 40th term in a simple sequence without crashing, where you could probably do it by hand in a matter of a few minutes.
There are two ways to speed this up. Instead of calling the same function over and over again and creating a gigantic tree of function calls to recalculate the same data over and over and over again, you can either save what each digit is as you go, and only calculate if you haven't saved that particular digit yet, or you can use something called an exact solution (a simple formula you make out of the recursive sequence using some sort of voodoo magic) to do the trick. By saving these numbers, it prevents the results from growing exponentially because it never has to recalculate values it's gone over before. You can easily calculate, say, the 4000th term with this method.
I demonstrated all three methods with the following code. It calculates the 35th number in the sequence. Enough to take a few seconds on the recursive calculation, but not enough to crash my Macbook. Hopefully it won't crash yours.
Code:
import sys
sys.setrecursionlimit(2000000000)
print ("Will evaluate three methods of calculating the fibonacci sequence at 35.\n")
#define a simple recursive fibonacci sequence.
def recursive_f(n):
"""Slow recursive function. n = the number to evaluate. """
if n == 0:
return 0 #base case
elif n == 1:
return 1 #another base case
else:
return recursive_f(n-1)+recursive_f(n-2)
#same as the previous function, but store previous values.
#this keeps the function from re-evaluating the same values.
def mem_f(n, mem=dict()):
"""Memoization recursive method. n = num to eval. Don't pass mem. """
if (n in mem):
return mem[n] #welp we already saved a value. Don't re-evaluate.
if n == 0:
return 0 #base case
elif n == 1:
return 1 #base case
else:
value = mem_f(n-1, mem)+ mem_f(n-2, mem)
mem[n] = value #store value
return value
#This solution is a little different. This is a formula that is not recursive.
#In theory python shouldn't crash, but large numbers are beyond python's range.
def fib_exact(n):
"""Calculated exact solution from second order sequence. n = num to eval."""
value1 = ((1 + (5**(0.5)))**n)
value2 = ((1 - (5**(0.5)))**n)
value3 = (2**n) * (5**(0.5))
result = (value1 - value2) // value3
return int(result)
#recursive with slow method. will take forever.
print ("Slow recursive method: ", end="")
print (str(recursive_f(35)) + "\n")
#recursive with fast method. Much faster.
print ("Fast recursive method: ", end="")
print(str(mem_f(35)) + "\n")
#Exact solution. The fastest of them all.
print ("Exact non-recursive method: ", end="")
print (str(fib_exact(35)))
This will run in Python 3 if you want to save it into a .py file and run it. And I attached a file with this post too.
This page (yes that's a link) pretty much describes what we spent an hour or two doing in class to optimize this. It's my very first semester in this new program, I haven't even taken Calculus yet, and I already feel like my head is hurting.
Needless to say, you can teach yourself to program, but it's hard to teach yourself everything. I feel that, even solving simple problems, I have learned a lot.