2nd

The News Journal now has a PANDA, a “newsroom data appliance” that will help fuel our reporting and hopefully make lives easier.
Needless to say, I’m pretty stoked. At the NICAR conference last February, I attended a presentation by the developers and knew right away that our newsroom would benefit from a PANDA. It took us a little while to track down a local server we could use, but we ended up re-purposing an old advertising server.
If you’ve never heard of PANDA before, it’s essentially a data “library” for the newsroom. Reporters can load data - or “feed the panda” - and it indexes it and makes it searchable. Already, we have loaded the registered voters database, state credit card payments and a few campaign finance tables. So, now all we have to do is type in a name and PANDA will tell us which databases that person is in. A search for Gov. Jack Markell’s last name shows entries in the registered voters database, the credit card data and the campaign finance reports for U.S. Senator Tom Carper. Once we get more data uploaded, we could ostensibly do a relatively thorough background check on sources just by searching for them with PANDA.
In a nutshell, this is going to greatly enhance our newsroom’s ability to capitalize on all the data we collect. Who wouldn’t be excited for that?
Oh IE, how the world hates you.
<HTML> <HEAD>
<meta HTTP-EQUIV="REFRESH" ...> </HEAD> <BODY> </BODY> </HTML>
<!--
- Unfortunately, Microsoft has added a clever new
- "feature" to Internet Explorer. If the text of
- an error's message is "too small", specifically
- less than 512 bytes, Internet Explorer returns
- its own error message. You can turn that off,
- but it's pretty tricky to find switch called
- "smart error messages". That means, of course,
- that short error messages are censored by default.
- IIS always returns error messages that are long
- enough to make Internet Explorer happy. The
- workaround is pretty simple: pad the error
- message with a big comment like this to push it
- over the five hundred and twelve bytes minimum.
- Of course, that's exactly what you're reading
- right now.
-->
I’m cleaning and preparing school test score data for later this week. We are ranking schools for each subject and grade level based on average test score. The somewhat annoying challenge in excel is getting the built-in RANK() function to work on filtered data. When trying to input a range of scores into the RANK function, excel is including the hidden rows. The solution is courtesy of Damon Ostrander (yeah, he’s a pianist as well as a VBA pro).
Function Vis(Rin As Range) As Range
Dim Cell As Range
Application.Volatile
Set Vis = Nothing
For Each Cell In Rin
If Not (Cell.EntireRow.Hidden Or Cell.EntireColumn.Hidden) Then
If Vis Is Nothing Then
Set Vis = Cell
Else
Set Vis = Union(Vis, Cell)
End If
End If
Next Cell
End Function
I very very rarely mess with VBA (though I should find some time to learn it), but this was just too perfect. All you have to do with this function is go from =RANK(a1, $a$1:$a$99) to =RANK(Vis(a1), Vis($a$1:$a$99)). Drag the auto fill down the column and Presto!
As always, if you have any ideas or know better ways to do this, leave me a note in the comments.
Two very smart and generally awesome students at MIT created an Excel plug-in called IronSpread that allows you to write Python scripts for the popular spreadsheet program, and I’m now in love with it.
I just started playing with the plug-in this week and have already applied it to a couple projects. In short, it lets you use a shell to manipulate cell values or run a python script from within Excel. For those who find themselves spending a lot of time cleaning data (e.g. people like me), this is huge if you don’t want to mess with lots of macros or VBA. For example, here’s a script I run with IronSpread that strips whitespace from every cell and returns the contents in uppercase.
def upperTrim(cell_value):
"""
Strip whitespace from cell.
Return uppercase value.
"""
orig = str(cell_value)
trimmed = orig.upper().strip()
output = ' '.join(trimmed.split())
return output
# Prompt for the number of rows and columns to clean.
y_range = raw_input("No. of Columns: ")
x_range = raw_input("No. of Rows: ")
# Run cells through upperTrim method and print status.
for y in xrange(1, int(y_range) + 1):
print "Cleaning values for: %s" % Cell(1, y).value
num_rows = 0
for x in xrange(1, int(x_range) + 1):
cell_value = Cell(x, y).value
cleaned_value = upperTrim(cell_value)
Cell(x, y).value = cleaned_value
num_rows += 1
print "%s rows cleaned." % num_rows
I have a spreadsheet of nuisance property violations with lots of whitespace in the address field. With IronSpread, I ran this script, typed 17 when prompted for the number of columns to clean and 5,549 for the number of rows, and sat back as the script cleaned every field in the spreadsheet in just a few seconds. This has definitely made my life easier.
If you didn’t notice from the script, all you have to do to get a cell’s value is access it with “Cell(x,y).value” or “Cell(‘a1’).value”. The first style makes iterating over a number of columns and rows a snap.
IronSpread is super handy and I’ve just started scratching the surface of what it can do. Because it’s Python, you can import any module that you have installed, so I plan on playing around with geocoding and scrapping webpages directly in Excel. The possibilities seem endless.
After hearing about LinkedIn being hacked, I dug around a little to find out how to check if your password was among the stolen ones. All the passwords are still stored as hashed values, but if you have a simple password, it could be easy to figure out and apparently many of the easier ones have already been cracked.
So, here’s a little python script I pulled together from a few different sources (read: more-or-less copied) to see whether yours is among the hacked passwords. The only caveat is that you’ll have to download the hacked passwords (be careful, and download at your own risk), which I found here http://www.mediafire.com/?n307hutksjstow3
from hashlib import sha1
import getpass
import sys
ifile = open("<INSERT PATH TO DOWLOADED PASSWORDS FILE HERE>")
hashes = []
for x in ifile:
hashes.append(x[0:40])
def hashMyPassword(password, offset=5):
hashed = sha1(password).hexdigest()
return (hashed, '0' * offset + hashed[offset:])
def checkPassword():
password = getpass.getpass("Enter Password:")
myHash, myHashBroken = hashMyPassword(password)
if myHash in hashes or myHashBroken in hashes:
print "Your password was STOLEN."
else:
print "Your password was NOT STOLEN."
checkAgain = raw_input("Check another password? (y/n): ")
if checkAgain == "y":
checkPassword()
else:
sys.exit()
if __name__ == '__main__':
checkPassword()
Save the script as “linkedIn_check.py” and run it from the command line using “python linkedIn_check.py”.
Hope this helps. Let me know in the comments if there is a better way to do this or if I missed something.
Credit: User stordoff on Hacker News for the majority of the script.
npr:
I make my living in the blue part of the pie; my semicolon game is on point like a sir.
I’m not even going to attempt to cleverly include a semicolon. —Sarah