This post has been updated. Scroll to the bottom for the new information.
We published a story today about the growing population of deer and the state’s warnings that motorists need to be extra mindful of deer in the roadway this time of year.
It just so happens that we have a database of car accidents between 2005 and 2012 that can tell us whether a deer in the roadway caused the accident. It’s a little spotty before 2010, but that still leaves us three years of accident data to play with. In this case, we identified more than 4,700 car accidents caused by deer in the roadway in those three years.
We don’t want to just throw 4,700 points on a map, however, especially when the map is just the size of Delaware. So, this was the perfect opportunity to try something that I think is pretty cool and that I’ve seen a few other news outlets do: turn the state into a hexagonal grid to identify areas with a lot of accidents.
Instead of leaving you in suspense, here’s a link to the map. The rest will walk you through how it was put together.
After months of analysis, reporting and delays, Melissa Nann Burke and I finally saw our analysis of the most dangerous intersections in Delaware grace A1 of The News Journal.
STORY: Delaware’s Dangerous Intersections
INTERACTIVE MAP: Most Dangerous Intersections
Our analysis focused on the 185 intersections that averaged at least 15 crashes per year between 2010 and 2012. I’ll defer to the story for a discussion of the findings, though. Here, I want to focus on how the analysis was done.
I’ve been meaning to post this for a while and was reminded when I saw a post from Anthony DeBarros about using xlrd to parse an Excel document.
A FOIA request for voting records for Delaware state legislators returned a ton of MS Word documents, each with a table of votes for a single legislator. The State House uses a Lotus Notes database and actually stores their voting records in this format. So, I had to extract all of the votes from the files and put them in a form that I could examine with excel and later put into MySQL for publication. Here’s the script that I used:
At the heart of this script is the win32com library. The client module allows you to open up Microsoft Office applications with Python and to parse whatever is in them. In this case, we are opening an instance of MS Word in the background with lines 12-13. From there, we open each document inside the writeVotes function with lines 22-23 and select the table with line 24, “table = doc.Tables(1)”. Luckily, each document only had one table that we cared about, though we could have easily iterated through each table in each document.
The rest of the script is pretty straightforward. For each row in the table, we grab the relevant information with “.Range.text” and join it all together.
I hope others find this helpful. If the data didn’t come in several dozen word files, it probably would have been easier to do a copy/paste job, but this script sped things up tremendously and made my life easier.