[Workshop] Tidying Data with Python and OpenRefine

Date: Thursday, April 16, 2020, 1:00pm to 3:30pm

Location: Lamont Library B-30

In his paper "Tidy Data," Hadley Wickham riffs on Tolstoy: "Like families, tidy datasets are all alike but every messy dataset is messy in its own way." When we spend 75% of our "analysis" time cleaning and preprocessing data, it makes sense to focus on strategies to standardize our data.

In this workshop, we will focus on correcting common errors in collected data and (re)structuring datasets to facilitate analysis. We will use OpenRefine and Python for these tasks. You don't need to be a Pythonista, but you should have some familiarity with Python or a similar scripting language, as we won't spend much time on syntax.
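To give a flavor of the two tasks named above (correcting common errors and restructuring a dataset), here is a minimal sketch using only the Python standard library. It is not taken from the workshop materials: the sample data, the column names, and the helper functions `clean_site` and `tidy` are hypothetical, and the workshop itself may well use other tools or libraries.

```python
import csv
import io

# Hypothetical messy data: one column per year (wide format),
# inconsistent capitalization, stray whitespace, and a missing value.
RAW = """site,2018,2019
 Boston ,12,15
boston,3,
CAMBRIDGE,7,9
"""


def clean_site(name):
    """Normalize a site label: trim whitespace and standardize capitalization."""
    return name.strip().title()


def tidy(raw_text):
    """Reshape wide rows into one (site, year, count) record per observation."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        site = clean_site(row.pop("site"))
        # Every remaining column is a year; each cell becomes its own record.
        for year, value in row.items():
            value = value.strip()
            if not value:  # assumption: drop missing observations
                continue
            records.append({"site": site, "year": int(year), "count": int(value)})
    return records


tidy_rows = tidy(RAW)
for record in tidy_rows:
    print(record)
```

After this step each record holds exactly one observation, and the three spellings of the site names collapse to two consistent labels. Dropping empty cells is only one possible choice; depending on the analysis, it may be better to keep them as explicit missing values.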

Free registration on the Harvard Training Portal.

 
