Wednesday, March 23, 2016

Minecrafting Adders

George Boole supposedly proved that every operation in math can be reduced to a combination of three operations: "and," "or" and "not." This had a great influence on the development of computers years ago. When teaching a CS class for Learningtech.org I was introduced to the Cedar Logic software, which allows you to create interactive virtual logic gates. Here's a two bit adder made from a couple of gates (four, actually):
The two inputs on the left are on, so the output is the binary form of 2: on and off or "10."

A student (thanks, Nathan) showed me how he made logic gates in Minecraft using redstone, the game's "electrical" substance. I didn't think much more about it until recently, when I bought Craig Richardson's book on modding the Raspberry Pi's version of Minecraft using Python. It can be used on the full PC or Mac version as well, so I experimented with writing  code to build houses and cities and then blow them up with TNT. It's fun!

For those students who would rather create their structures by setting blocks the old fashioned way, you can simply scan an area (ok, a volume) with the "getBlock" function and some nested loops and save each block's information  for building a house in a list or .txt file. Then you can run the process backwards and recreate the structure at another location.

def duplicate(x,y,z,l,h,w):
    '''saves all the blocks in an area'''
    blockFile = [] #list for block info
    for i in range(l): #loops through the x-values
        blockFile.append([]) #starts new array for rows
        for j in range(h): #loops through the y-values
            blockFile[i].append([]) #starts new array for columns
            for k in range(w):  #loops through the z-values
                #put block data in row and column array:
                blockFile[i][j].append(mc.getBlock(x+i,y+j,z+k))
    return blockFile


For instance, the figure above was made using this "copy and paste" procedure. The structure on the left was the original, which I copied and pasted in the middle. The data for the color of the wool and the orientation of the torches was lost. I needed to modify the getBlock function to "getBlockWithData" and then the structure in the foreground was identical to the original.

Unlike in the minimized version of Minecraft available on the Raspberry Pi, in the Windows version I could also use redstone. It made me think of Nathan and the logic gates. A web search got me the design for an AND gate. The output on the left is only on if both switches on the right are ON.


A coworker helped me "optimize my code" and make an AND gate out of less blocks:


So I made the gates from the adder at the top and connected everything with redsone. Using my duplicate function I copied and pasted the components until I had a 4-bit adder:


(My previous adder is on the bottom of the screen.) The switches are set up to add 11 + 1. In binary that's 1011 and 1. The 5 glowstone pillars at the top of the screen are the digits, and they're off, on, on, off and off, so 01100 or 12 in decimal. It works!

Getting in creative in Minecraft is a great way to learn about 3D coordinates and interacting with programs using Python. Then you get to walk around in your creation!

Tuesday, March 22, 2016

Drawing Triangles the Hard Way

I'm a big fan of fractals, and dedicated a whole chapter of Hacking Math Class with Python to them. The Sierpinski Triangle (or "Sieve") is a famous shape that pops up in the most unexpected ways.


In my book I showed the easy way to draw one by making a turtle do it, but there's a little more complicated way that involves another famous triangle. It was known to the Chinese and Indians while we Europeans were still hiding from the Vikings, but we like to call it Pascal's Triangle after a more recent French mathematician.


The numbers can be generated by adding the numbers in the row above, but using a computer it's quicker to use the combination formula.



Here's the (Python 2) code I copied and pasted from dheerosaur on Stack Overflow:

import operator as op
...
def ncr(n, r):
    r = min(r, n-r)
    if r == 0: return 1
    numer = reduce(op.mul, xrange(n, n-r, -1))
    denom = reduce(op.mul, xrange(1, r+1))
    return numer//denom

(To convert it to Python 3, just change "xrange" to "range" and add "from itertools import reduce" to the imports.)

The 1st row and column are really the 0th, so the "20" in row 7 is actually row 6, column 3:

>>> ncr(6,3)
20

How could you possibly get The Sierpinski Sieve from Pascal's Triangle? Let's use Pygame.

I use a template I found at Professor Craven's excellent Pygame tutorial. This sets up the screen and the colors and the display loop. All I have to do is draw shapes; it even updates the display and quits for me. I have to create a function to draw a triangle where I want it and what size I want it. The number (calculated with ncr above) will determine whether it's filled with color or not:

def triangle(x,y,size,filled):
    '''draw an equilateral triangle'''
    #if the number isn't a multiple of 2, color it in.
    if filled % 2 != 0: 
        pygame.draw.polygon(screen,VIOLET,[[x,y],
                                       [x-size/2.,y+1.732*size/2.],
                                       [x+size/2.,y+1.732*size/2.]],
                                        0)
    else: thick = 0 #leave multiples of 2 empty

The "gasket" function will draw the triangles at the right spots:

def gasket(rows):
    size = 400./rows #resize depending on the number of rows
    for i in range(rows):
        for j in range(i+1):
            y = 1.732*i*size/2. #height of equilateral triangle
            x = -i*size/2. + j * size + WIDTH/2. #width of triangle
            triangle(x,y,size,ncr(i,j)) #draw the triangle


Now running "gasket(50)" will get you 50 rows of Pascal's triangle, with the multiples of 2 left empty:



Look familiar? You can change one digit in the "triangle" function to leave out the multiples of 3 instead:


Or the multiples of 8. The pattern of empty triangles changes every time. Try other numbers!


This blog entry has been brought to you by O'Reilly books.:-)

Sunday, March 20, 2016

Prime Testing

For decades, a common test of a computer's abilities (and its programmer's, too) is calculating prime numbers. Right now a job applicant is being asked to provide yet another function to generate primes. The largest known prime (discovered a couple of months ago) contains over 22 million digits. Yesterday a couple of students and I were working on answering the question, "What is the 10,001st prime number?" We started at the beginning, with the question of how we would tell whether a number is prime.

Obviously we'd divide it by every lower number. If there are no divisors, it's prime. The Python code for this is

def isPrime(n):
    for i in range(2,n):
        if n % i == 0:
            return False
    return True

One of my students suggested we start a list to store the primes in, so we can easily tell the program to stop when the list has reached the desired length. Here's the code for that:

def primeList(length):
    primes = []
    num = 2
    while len(primes) < length:
        if isPrime(num):
            primes.append(num)
        num += 1
    return primes

Once we got this working correctly, my student went straight for the 10,001st:
 
primes = primeList(10001)
print(primes[-1]) #prints the last number in the list

Using Python's time module, you can time how long it takes to run your program. At the top of your file, you import the time module and record the starting time:

import time
time1 = time.time()

At the bottom of the code, after printing the answer, you record the ending time and print the elapsed time:

time2 = time.time()
print(time2 - time1)

On the computer the student was on this took a minute and a half. On my desktop, it took 45 seconds. Still a long time, since we wanted to generate a much bigger list of primes and save it for future use. How could we save the computer some effort?

This process of optimizing algorithms is a big deal in computer programming, and it's the reason we'll still need mathematical thinking in the digital age. Our first effort was "brute force," meaning we checked all the numbers up to n as possible divisors, and then we checked every number until we got a list of 10,001 primes. Do we really need to check all these numbers?

Obviously, after 2 there will be no more even primes. So we can skip all the even numbers. We changed our code to this:

def isPrime(n):
    #start at 3 and go up to n by 2's
    for i in range(3,n,2):
        if n % i == 0:
            return False
    return True

and

def primeList(length):
    primes = [2] #the only even prime
    num = 3 #the next number to try
    while len(primes) < length:
        if isPrime(num):
            primes.append(num)
        num += 2 #only check odd numbers
    return primes

I was sure this would cut the processing time in half, and it did. We got our 10,001st prime in 22.6 seconds.

But could we do even better? In my book Hacking Math Class with Python, I show how to speed up the process much more. To check whether a number is prime, you only have to check the numbers up to the square root of that number. Past that point, if there's a divisor, it must be multiplied by a smaller number, and you've already checked every smaller number. We changed our "isPrime" code to this:

from math import sqrt

def isPrime(n):
    for i in range(3,int(sqrt(n))+1,2):
        if n % i == 0:
            return False
    return True

Now we got our answer in 0.3 seconds!

With that kind of speed we could create a .txt file of primes to have on our computer so we wouldn't need to wait:

f = open('primelist.txt','w')
for prime in primes:
    f.write(str(prime)+ ' ')
f.close()

Now we can just access the file anytime and loop through our primes without having to generate them all over again.

Saturday, March 12, 2016

Hacking the Election (Data)

A student at the Coder School was creating his own election results program, to print out who won or if there was a tie. He was doing a lot of stuff manually, like entering his own data, and writing lots of conditionals for "if Candidate_A > Candidate_B and Candidate_A > Candidate_C" and so on. I figured Python could offer us a way to get real data and visualize it in a graph.

Python has a built-in module for reading csv ("comma separated values") files you find on the web. I found a .csv file of the results of the Virginia Republican Primary, held less than two weeks ago. I downloaded it and opened it in a spreadsheet program, but it had a lot of info I wasn't interested in:

This wasn't even the whole sheet, which contained 22 columns, from "Election Name" to "District Type" to lots of crazy "IDs." I was only interested in the "Last Name" and "Total Votes" columns, D and F.

And instead of just a dozen or so rows of the total votes each candidate received in Virginia, I had 38,000 rows of all the totals for each candidate from every county in Virginia! Clearly some pruning was needed. I saved the file as "results.csv" and fired up Python to sort things out.

I started by adapting the "read_csv" function from Chapter 3 of Amit Saha's brilliant book Doing Math With Python. The csv module has a "reader" function which loops over the rows of the file. This code will open the file and start an iterator called "rows":

def read_csv(filename):
    '''Returns the results in a dictionary'''
    with open(filename) as f:
        rows = csv.reader(f)
        next(rows)

So now I can go through all the rows and add up everybody's votes. I'll put the 4 top candidates (Trump, Cruz, Rubio and Kasich) into a dictionary called "results" and start their totals at 0:

results = {'Trump':0, 'Rubio': 0, 'Cruz':0,'Kasich':0}

Now the program will loop over the rows and check if column D is one of those four names. If so, it'll add the number in column F to his total. The indices start at 0, so column D is the item in "row" with index 3, or "row[3]" and column F is "row[5]."

for row in rows:
    if row[3] in results:
         results[row[3]] += int(row[5])

That should work perfectly, but instead it gives me a ValueError because it hit a blank row or something. Let's just have it continue if it hits a snag like that:

for row in rows:
    try:
        if row[3] in results:
            results[row[3]] += int(row[5])
    except ValueError:
        continue

Now we can print the results!

print(results)

And at the bottom of the file, call the "read_csv" function on running the program:

if __name__ == '__main__':
    read_csv('results.csv')

These are the totals for the top 4 Republican candidates:

{'Kasich': 97791, 'Rubio': 327936, 'Trump': 356896, 'Cruz': 171162}

There was no mistake: those totals match the official results on http://results.elections.virginia.gov/. After printing the results we could add some code to print the winner. First we find the "key," in this case the name, associated with the maximum value. Then we print it.

winner = max(results.keys(), key=(lambda k: results[k]))
print('The winner is {0}, with {1} votes.'.\
       format(winner,results[winner]))

Now it adds the winner line:

The winner is Trump, with 356896 votes.

To make a nice bar chart out of the numbers, add this code:




And here's the chart:


Did I leave somebody out? If you want to add a candidate, just add them to the dictionary like this:

results = {'Trump':0, 'Rubio': 0, 'Cruz':0, 'Kasich':0, 
           'Carson':0, 'Bush':0}

Run the program and you'll get an updated chart:
(It looks a little different because I ran it in an IPython notebook.)

A great introduction to using Python for data analysis and very timely, too!

Update 3/13/16:
It's even easier to get the totals using a database like pandas: