I'm a data journalist working for The News Journal in Delaware.

You'll probably find a lot of stuff about journalism, data and data visualization (nerd alert), and other random musings.

All things Patrick Sweet

Mar 5th, Tue
#NICAR13 is already paying off. Thanks to a few sessions from a few brilliant people, I decided to try using R to evaluate bail data. Unfortunately, all these boxplots show me is that our data was screwed up when they changed the field type in 2004, which is just another fun problem to solve…

Feb 6th, Wed
Salmon about to hit the oven. It’s spiced up with whole grain Dijon mustard, sliced tomato and capers.

Feb 5th, Tue

Parsing tables in MS Word with Python

I’ve been meaning to post this for a while and was reminded when I saw a post from Anthony DeBarros about using xlrd to parse an Excel document.

A FOIA request for voting records for Delaware state legislators returned a ton of MS Word documents, each with a table of votes for a single legislator. The State House uses a Lotus Notes database and actually stores their voting records in this format. So, I had to extract all of the votes from the files and get them into a form I could examine in Excel and later load into MySQL for publication. Here’s the script that I used:
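What follows is a minimal sketch of that script, assuming the documents sit in a votes/ folder and that a flat CSV is the output we want — the folder name, the CSV step and the cell-text cleanup here are my assumptions, not the original script verbatim:

```python
import csv
import glob
import os

import win32com.client

# Open an instance of MS Word in the background.
word = win32com.client.Dispatch("Word.Application")
word.Visible = False


def writeVotes(path, writer):
    """Open one legislator's document and write its vote table to CSV."""
    doc = word.Documents.Open(os.path.abspath(path))
    table = doc.Tables(1)  # each document has exactly one table we care about
    for row in table.Rows:
        # .Range.Text carries Word's cell-end markers (\r\x07); strip them off.
        writer.writerow(cell.Range.Text.strip("\r\x07") for cell in row.Cells)
    doc.Close(False)  # close without saving


with open("votes.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for path in glob.glob(os.path.join("votes", "*.doc")):
        writeVotes(path, writer)

word.Quit()
```

One nice side effect of doing it this way: a single hidden Word instance gets reused for every file, which is much faster than launching Word once per document.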

At the heart of this script is the win32com library. The client module allows you to open Microsoft Office applications with Python and to parse whatever is in them. In this case, we open an instance of MS Word in the background, then open each document inside the writeVotes function and select its table with “table = doc.Tables(1)”. Luckily, each document only had one table that we cared about, though we could have easily iterated through every table in each document.

The rest of the script is pretty straightforward. For each row in the table, we grab the relevant information with “.Range.text” and join it all together.

I hope others find this helpful. If the data hadn’t come in several dozen Word files, it probably would have been easier to do a copy/paste job, but this script sped things up tremendously and made my life easier.

Feb 4th, Mon
Homemade (giant) pigs in a blanket. Add sweet potato bisque and you have an awesome meal.

Jan 14th, Mon
In God we trust, all others bring data.
— William Edwards Deming (via an awesome free stats book you can download from Stanford).

Jan 6th, Sun
We created a monster!

Nov 8th, Thu

Locked out of the server on Election Day

Election Day at The News Journal was a pretty solid success.

We used PHP and internet duct tape (iframes) to scrape and display results for big races live on our homepage and results for all races on another landing page.

Unlike on Primary Election night, our cron job ran smoothly all night and my stress level wasn’t through the roof.

Click here to check out a “still-live” version of our election night banner on our testing server. [UPDATE: 02/05/2013] That link is a bit busted right now because our PHP server was hacked and we lost a few resources.

The day before, though, was a little more difficult.

Oct 15th, Mon

Linking Leaflet maps to DataTables

Along with likely every other newsroom in the country, we’ve been having a lot of fun with campaign finance data around here. Last week, we tore into Gov. Jack Markell and GOP challenger Jeff Cragg’s reports and were experimenting with how to present the data online.

Our initial idea was just a map, because the story focused heavily on geography after Markell asserted that most of the out-of-state money from surrounding states came from people who commuted to and worked in Delaware. Leaflet took care of that quickly: I built the map in ArcMap, converted it to GeoJSON and put it straight on the web.

"But wait," we told ourselves. We’ve cleaned up all this data, let’s hand it over to readers and make it searchable. In comes DataTables. I’ve used DataTables a lot since I joined The News Journal. It’s a fantastic jQuery plugin, and I love that I can easily plug it into a MySQL database with a PHP script.

Oct 2nd, Tue
buzzfeed:

Every once in a while you see something that changes your entire life.

(via pleatedjeans)

Sep 21st, Fri

The News Journal’s new PANDA

The News Journal now has a PANDA, a “newsroom data appliance” that will help fuel our reporting and hopefully make lives easier.

Needless to say, I’m pretty stoked. At the NICAR conference last February, I attended a presentation by the developers and knew right away that our newsroom would benefit from a PANDA. It took us a little while to track down a local server we could use, but we ended up re-purposing an old advertising server.

If you’ve never heard of PANDA before, it’s essentially a data “library” for the newsroom. Reporters can load data - or “feed the panda” - and PANDA indexes it and makes it searchable. Already, we have loaded the registered voters database, state credit card payments and a few campaign finance tables. So, now all we have to do is type in a name and PANDA will tell us which databases that person appears in. A search for Gov. Jack Markell’s last name shows entries in the registered voters database, the credit card data and the campaign finance reports for U.S. Senator Tom Carper. Once we get more data uploaded, we could ostensibly run a relatively thorough background check on sources just by searching for them with PANDA.

In a nutshell, this is going to greatly enhance our newsroom’s ability to capitalize on all the data we collect. Who wouldn’t be excited for that?