There’s a lot of data out there. But where do you start to find what you need?

Some basic strategies that work pretty well:

  • Google it. That’s never a bad place to start and it only takes a second. And use Google in a smart way: use keywords specific to your data, and use the filetype: operator to narrow your search to specific file types. For example, add filetype:csv (no space after the colon) to return only CSV files. Use the results to dig deeper and discover related agencies that may have the data.
  • Figure out who should have the data. Who might have it? Is this information that only the NYPD or the IRS can collect? The Departments of City Planning, Buildings, Housing, Finance and Taxation all keep tabs on who owns property in New York City, where that property is located and what it can be used for. If you know who ought to have the numbers you’re looking for, you can start your search by asking them.
  • Look at recent reporting about the subject. Who has been releasing reports? Who has been cited in stories? Go ask them for data, or ask them for help finding it.
  • Wikipedia is a fantastic resource. Don’t be afraid of it. Most information there comes with a citation — don’t take some Wikipedia author’s word for it, but do look at the source they cited and confirm that the numbers are there.
  • Look for think tanks and aid organizations that specialize in the issue you’re interested in.
  • Ask a librarian.
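
The filetype: tip above can be sketched in a few lines of Python. This is only an illustration: the function name build_search_url and its parameters are made up for this example, and the site: operator is an extra Google operator not mentioned above that limits results to one domain.

```python
from urllib.parse import urlencode


def build_search_url(keywords, filetype=None, site=None):
    """Build a Google search URL from keywords plus optional operators.

    Hypothetical helper for illustration; it only assembles a URL and
    does not contact Google.
    """
    terms = list(keywords)
    if filetype:
        # No space after the colon, e.g. filetype:csv
        terms.append(f"filetype:{filetype}")
    if site:
        # Restrict results to one domain, e.g. site:nyc.gov
        terms.append(f"site:{site}")
    # urlencode percent-encodes the query so it is safe to paste or open
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


print(build_search_url(["veteran", "suicide", "rates"], filetype="csv"))
```

Paste the printed URL into a browser to run the search. Swapping csv for xls or pdf is a quick way to see what other formats an agency publishes.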

Know your sources

You can get data anywhere, so it is up to you to decide whether or not you’re working with reliable data. You should know where your sources are coming from: do they have an agenda that can help you understand how they’re framing the data they put out? You can roughly guess who is behind the NRA Institute for Legislative Action, but what about the Law Center to Prevent Gun Violence? Don’t assume that a think tank is reliable just because it kind of feels professional.

A famous example is the misleading website www.martinlutherking.org. Though it appears to be an informational site about the civil rights leader Martin Luther King, Jr., it is actually a mouthpiece for the white supremacist group Stormfront.org. You can check who owns a domain with a WHOIS lookup service such as www.betterwhois.com.
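
If you would rather script a domain check than use a website, here is a minimal sketch of a raw WHOIS lookup, assuming a few things not covered above: WHOIS is a plain-text protocol on TCP port 43, whois.iana.org is used as the starting server, and the reply may contain a "refer:" line naming the registry server to ask next. The function names are hypothetical, and many registrations today hide the owner behind a privacy service, so treat the output as a lead and not as proof.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10.0):
    """Send one WHOIS query and return the raw text reply.

    Assumes the server speaks plain WHOIS: send the name followed by
    CRLF, then read until the server closes the connection.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_referral(reply):
    """Return the server named on a 'refer:' line, or None if absent."""
    for line in reply.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "refer":
            return value.strip() or None
    return None


# Example use (needs network access, so it is left commented out):
# first = whois_query("martinlutherking.org")
# registry = find_referral(first)
# if registry:
#     print(whois_query("martinlutherking.org", server=registry))
```

The two-step design mirrors how the lookup usually works: the first server points you to the registry for that ending (.org, .com and so on), and the registry holds the actual record.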

Provenance

It is also up to you to know where your data is coming from. Did the organization hire a research firm to conduct a comprehensive study? Or did they post a little box on their website asking visitors how they feel?

Be skeptical: an advocate (or government agency) insisting that these numbers mean something doesn’t make it so.

Where to look?

The Journalism School’s Research Center maintains an excellent roundup of guides, many of which will point you to great data sets. Check out the census, business and crime guides in particular.

NICAR’s database library is a great resource. So is the “data sources” tag on Amanda’s Tumblr.

Here’s a working guide from last semester: https://github.com/amandabee/cunyjdata/wiki/Where-to-Find-Data

The data visualization shows veteran suicide rates by state. Strengths: Each state is represented by a square of the same size, which makes for effective comparisons between states of wildly different geographic areas. The color coding is intuitive and the legend is clear. Weakness: The hover interactivity adds more detail, but offsetting the hovered state is distracting because it blocks the nearby data.

The bar charts below the map offer additional information by ranking each state, though the color scheme may confuse readers who try to connect it with the map colors. The second bar chart would work better as a stacked chart, showing veteran and civilian suicides as proportions of the total.

Tracking Veteran Suicides, from News21 2013
