The default settings in Excel spit out pretty terrible looking charts. But it’s not too difficult to turn something like this:
Into something more like this:
Note the specific steps for the transformation:
- The move from a legend to direct labeling
- The reduction of excessive axes lines and tick mark labels to reduce visual clutter
- Highlighting the data of interest and de-emphasizing the others
- Adding the headline, text, and source
Follow this tutorial from Storytelling with Data to see the step-by-step guide that took the default graph style to its more refined, and ultimately more effective, finished state.
Finally: an animated GIF of the process of stripping away unnecessary graphics to focus on the data and the message:
Bitmap formats (JPEG, PNG, BMP, GIF) are images that are displayed with pixels, or tiny colored dots. Vector graphics, on the other hand, are rendered by the computer based on mathematical formulas and code. For example, a circle would be defined by a radius of a certain size, the color of the outline, the color of the filled inside, its location, and so on. Because vector graphics are defined mathematically, they don’t have a fixed resolution. You can zoom in or out of a vector image, and the display remains sharp. If you zoom into a bitmap file, you’ll see the pixels that make up the image.
An example of charts rendered as vector graphics, from the Guardian.
Microsoft Excel can output charts as JPEG, GIF, BMP, or PNG (bitmap formats), or as PDF (vector). You can convert the PDF file into an SVG (Scaleable Vector Graphics), which is a common format for displaying vector graphics in a browser. Your images will be sharper and easier to edit in a vector graphics editing program like Adobe Illustrator.
- In Microsoft Excel, size your chart (width and height) in the FORMAT tab.
- Select the text in the horizontal and vertical axes, and choose Format Selection to change the Font to Arial. The default Excel font is a bit unusual and sometimes throws off the conversion of text in later steps.
- Right-click (Ctrl-click) on the chart frame and choose Save As Picture.
- Save your chart picture as a PDF.
- Open the PDF in Adobe Illustrator. If you’re comfortable with Illustrator, you can edit your chart.
- Choose File > Save As, and save as a SVG (compressed optional).
- If you don’t have access to Illustrator, you can use the free online tool CloudConvert (https://cloudconvert.org/svg-to-pdf), which can convert your PDF to SVG. The only downside is that you can’t edit your graphics.
- The resulting SVG file is simply an HTML text file that describes the shapes for your chart. You can open it up in a browser, or copy and paste it into another HTML document to display your chart.
Unfortunately, you can’t just paste the SVG code directly into a WordPress post, as WP (out of the box) doesn’t support SVG formats. There are plug-ins that allow SVG uploads and other hacks, but a simple solution is to upload your SVG to DigitalStorage and iframe it into your WP post, like this:
These are the studies and readings that we discussed in class regarding visual encoding of your data:
Summary findings of encodings, from most accurate to least accurate:
- Density and color saturation
- Color hue
Know some of the common chart types:
- Bar charts: Trends for categories
- Line charts: Trends for continuous series/continuous changes between x-axis (time series)
- Scatter plot: correlation
- Bubble plot: scatter plot + additional variable
- Pie chart: show proportions
- Area charts/stacked graphs: proportions
Often the data that you get needs cleaning. Misspellings and data entry errors need to be fixed. Open Refine is a powerful tool that can clean datasets quickly. It also allows you to look at your data in different ways, filtering the categories. These different filters are known as “facets” in Open Refine.
If you need a refresher on Refine, you can walk back through the exercise we led in class (below), or try one of these tutorials/resources:
Good examples of the kinds of stories you can write with this data, from Gothamist Shady Groups Spend “Unprecedented” Amount Of Cash In Mayoral Election and from The New York Times, Loophole in a Rule on Ad Spending, Group Financed by Business Leaders Has Put Nearly $7 Million Into Council Races. The Times also built a nice interactive guide toHow Much the N.Y.C. Mayoral Candidates Have Raised and Spent — they haven’t updated it since September 2, and it just goes to show that it takes more than money to win an NYC primary race, but you’ve got the data.
In this walk-through, we’ll use Olympic athlete data from London’s 2012 summer olympics. Use Guardian London Olympic data — look for the “download the data” link and be sure to save a copy so you can edit it. This blog post will use Google Spreadsheet’s Pivot Table function.
You can summarize your data with pivot tables in Excel, Google Spreadsheets or LibreOffice Calc. The details will between software, but the basic steps are the same. Follow these steps to look at the data in pivot tables on Google Spreadsheets:
Delimiters and Functions with Flu Data
To review spreadsheet basics, download the data from Google’s Flu Trends. The data is just text with a lot of commas. The goal is to get your data into tidy rows and columns in a spreadsheet, so you can start looking for interesting patterns or trends.
There’s a lot of data out there. But where do you start to find what you need?
Some basic strategies that work pretty well: Google it. That’s never a bad place to start and it only takes a second. (And use Google in a smart way. Use key words specific to your data. Use filetype: to narrow your search for specific file types. For example, use filetype: csv for only csv file formats. Use the results to dig deeper and discover related agencies that may have the data).
- Figure out who should have the data? Who might have it? Is this information only the NYPD or the IRS can collect? The Departments of City Planning, Buildings, Housing, Finance and Taxation all keep tabs on who owns property in New York City, where that property is located and what it can be used for. If you know who ought to have the numbers you’re looking for, you can start your search by asking them.
- Look at recent reporting about the subject. Who has been releasing reports? Who has been cited in stories? Go ask them for data, or ask them for help finding it.
- Wikipedia is a fantastic resource. Don’t be afraid of it. Most information there comes with a citation — don’t take some Wikipedia author’s word for it, but do look at the source they cited and confirm that the numbers are there.
- Look for think tanks and aid organizations that specialize in the issue you’re interested in.
- Ask a librarian
Know your sources
You can get data anywhere, so it is up to you to decide whether or not you’re working with reliable data. You should know where your sources are coming from — do they have an agenda that can help you understand how they’re framing the data they put out? You can roughly guess who is behind NRA Institute for Legislative Action, but what about Law Center to Prevent Gun Violence? Don’t assume that a think tank is reliable just because it kind of feels professional.
A famous example is the misleading website www.martinlutherking.org. Though the site appears to be an informational site about the civil rights leader Martin Luther King, Jr., it actually is a mouthpiece for the white supremacist group Stormfront.org. You can verify the ownership of domain sites using www.betterwhois.com.
It is also up to you to know where your data is coming from. Did the organization hire a research firm to conduct a comprehensive study? Or did they post a little box on their website asking visitors how they feel?
Be skeptical: an advocate (or government agency) insisting that these numbers mean something doesn’t make it so.
Where to look?
The Journalism School’s Research Center maintains an excellent roundup of guides, many of which will point you to great data sets. Check out the census, business and crime guides in particular.
NICAR’s database library is a great resource. So is Amanda’s tumblr’s “data sources” tag.
Here’s a working guide from last semester: https://github.com/amandabee/cunyjdata/wiki/Where-to-Find-Data
The data visualization shows veteran suicide rates, by state. Strengths: Each state is represented by the same-sized square, making effective comparisons between states of wildly different geographic areas. The color coding is intuitive and the legend is clear. Weakness: The hover interactivity adds more detail, but offsetting the state is distracting because it blocks the nearby data.
The bar charts below offers additional information by ranking each state, though the color scheme may confuse some readers who may try to make a connection with the map colors. The second bar chart would work better as a stacked chart to show proportions of veterans and civilian suicides of the total.
Tracking Veteran Suicides, from News21 2013