RSS

I'm a data journalist working for The News Journal in Delaware.

You'll probably find a lot of stuff about journalism, data and data visualization (nerd alert) and other random musings.

All things
Patrick Sweet

Archive

Jun
15th
Fri
permalink

My new favorite tool: IronSpread

Two very smart and generally awesome students at MIT created an Excel plug-in called IronSpread that allows you to write Python scripts for the popular spreadsheet program, and I’m now in love with it.

I just started playing with the plug-in this week and have already applied it to a couple projects. In short, it lets you use a shell to manipulate cell values or run a python script from within Excel. For those who find themselves spending a lot of time cleaning data (e.g. people like me), this is huge if you don’t want to mess with lots of macros or VBA. For example, here’s a script I run with IronSpread that strips whitespace from every cell and returns the contents in uppercase.

def upperTrim(cell_value):
    """
    Strip whitespace from cell.
    Return uppercase value.
    """
    orig = str(cell_value)
    trimmed = orig.upper().strip()
    output = ' '.join(trimmed.split())
    return output

# Prompt for the number of rows and columns to clean.
y_range = raw_input("No. of Columns: ")
x_range = raw_input("No. of Rows: ")

# Run cells through upperTrim method and print status.
for y in xrange(1, int(y_range) + 1):
    print "Cleaning values for: %s" % Cell(1, y).value
    num_rows = 0
    for x in xrange(1, int(x_range) + 1):
        cell_value = Cell(x, y).value
        cleaned_value = upperTrim(cell_value)
        Cell(x, y).value = cleaned_value
        num_rows += 1
    print "%s rows cleaned." % num_rows

I have a spreadsheet of nuisance property violations with lots of whitespace in the address field. With IronSpread, I ran this script, typed 17 when prompted for the number of columns to clean and 5,549 for the number of rows, and sat back as the script cleaned every field in the spreadsheet in just a few seconds. This has definitely made my life easier.

If you didn’t notice from the script, all you have to do to get a cell’s value is access it with “Cell(x,y).value” or “Cell(‘a1’).value”. The first style makes iterating over a number of columns and rows a snap.

IronSpread is super handy and I’ve just started scratching the surface of what it can do. Because it’s Python, you can import any module that you have installed, so I plan on playing around with geocoding and scrapping webpages directly in Excel. The possibilities seem endless.

blog comments powered by Disqus