
These are the studies and readings that we discussed in class regarding visual encoding of your data:

Summary findings of encodings, from most accurate to least accurate:

  1. Position
  2. Length
  3. Angle
  4. Area
  5. Density and color saturation
  6. Color hue

Know some of the common chart types:

  • Bar charts: Trends for categories
  • Line charts: Trends in a continuous series, such as change along the x-axis over time (time series)
  • Scatter plot: correlation
  • Bubble plot: scatter plot + additional variable
  • Pie chart: show proportions
  • Area charts/stacked graphs: proportions


The New York Times’ The Upshot published an interactive graph representing a model which predicts an American’s lifetime voting habits based on their birth year. The study found that a person’s most formative years were between the ages of 14 and 24, and that political events and presidential approval ratings during this timeframe were strongly influential on a person’s voting behavior.

The study excluded African Americans because they have historically voted predominantly Democratic. It also excluded Hispanics, particularly recent immigrants, because their population numbers have increased over the years.

The model also noted that once we reach age 40, we are three times less likely to change our party allegiance in response to current events than at age 18.

One of the visualization’s strengths is that it is able to show how different age demographics voted in the 2012 election. Obama’s voters were mostly people in their twenties and early sixties, which demonstrates his ability to galvanize youth voters. So, given the model’s prediction, Obama voters in their twenties are more likely to remain Democrats than, say, their parents.

Another of its strengths is that it demonstrates that certain generations essentially “made up their minds.” For example, the WWII generation, influenced by Eisenhower in the 1950s, remained primarily Republican for the rest of their lives. Conversely, baby boomers, who grew up in the 1960s, remained Democrats.

In this same vein, I think the visualization falls short because it is unable to exhibit or quantify historical moments that may have had a great influence on a specific generation. At a minimum, it could have displayed, in an additional line, the approval rating and/or party affiliation of the president listed below.

Historical context makes the visualization compelling, but it’s difficult to incorporate that information in an objective way.

On an interactivity level, the visualization is incredibly easy to use and appealing. I’m sure most people, like me, scrolled the bar to their own date of birth to compare themselves with their generational compatriots.

I would assume that one data set the study may have used was party records, since registering for a party is public record.

Often the data that you get needs cleaning. Misspellings and data entry errors need to be fixed. Open Refine is a powerful tool that can clean datasets quickly. It also allows you to look at your data in different ways, filtering the categories. These different filters are known as “facets” in Open Refine.
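For readers who want to see the idea outside the tool, here is a minimal Python sketch of what a facet does: it lists each distinct value in a column with a count, so misspellings stand out. This is only an analogy, not how Open Refine is implemented. The names in the list are made up for illustration, and the `text_facet`, `normalize` and `cluster` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical column of contributor names with data entry errors (not real data).
names = ["Acme Corp", "acme corp ", "ACME Corp", "Beta LLC", "Beta  LLC", "Gamma Inc"]

def text_facet(values):
    """Count each distinct raw value, the way a facet lists categories."""
    return Counter(values)

def normalize(value):
    """Trim the ends, collapse repeated spaces, and lowercase the value."""
    return " ".join(value.split()).lower()

def cluster(values):
    """Group raw spellings that share the same normalized key."""
    groups = {}
    for value in values:
        groups.setdefault(normalize(value), set()).add(value)
    return groups

raw_facet = text_facet(names)   # every spelling counted separately
clusters = cluster(names)       # spellings grouped by normalized key

for key, spellings in clusters.items():
    print(key, "<-", sorted(spellings))
```

The raw facet treats every variant as its own category, while the clustered view shows which variants probably refer to the same entity and should be merged.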

If you need a refresher on Refine, you can walk back through the exercise we led in class (below), or try one of these tutorials/resources:

Good examples of the kinds of stories you can write with this data: from Gothamist, “Shady Groups Spend ‘Unprecedented’ Amount Of Cash In Mayoral Election,” and from The New York Times, “Loophole in a Rule on Ad Spending” and “Group Financed by Business Leaders Has Put Nearly $7 Million Into Council Races.” The Times also built a nice interactive guide to “How Much the N.Y.C. Mayoral Candidates Have Raised and Spent.” They haven’t updated it since September 2, and it just goes to show that it takes more than money to win an NYC primary race, but you’ve got the data.



In Climbing Income Ladder, Location Matters from The New York Times (July 22, 2013)

This data visualization breaks down, by county, the percentage chance that a child raised in the bottom fifth of the income distribution ladder will rise to the top fifth.

The study defines the top fifth as a family income of more than $70,000 for the child by age 30, or more than $100,000 by age 45 (shown in the third and final graph). For the parents, the bottom fifth is an income of less than $25,000 and the top fifth is an income above $107,000.

The study tied its findings on upward mobility in metropolitan areas mainly to education, family structure and the economic layout of metro areas. Areas with higher levels of mobility tended to have stronger secondary school systems.

Counties in states in the West, Northeast and Great Plains regions showed the most favorable opportunity for advancement while the Southeast (Mississippi, Georgia and South Carolina) showed the poorest chances.


-The story accompanying the data visualization provides color and depth as it leads with a character and then ends with the same character, spotlighting Atlanta.

-The colors on the first map representing the percentages, I think, work well for navigation and comprehension.

-The second map is clickable and includes a search bar that lets you type in a city and track where a child may sit on the income ladder by age 30, based on what his/her parents earned in the late 1990s. It is complex in that it provides a lot of information, but it is still simple and easy to understand.

-The third graph is a slope graph ranking the 30 most populous cities from “best to worst” on the left and right by chance of mobility, based on where the child or children were raised on the income ladder. The goal of the study and data was to target metro areas, so I think the third graph (slope graph) succeeds in accomplishing that, while the other two provide more of a broad picture of what’s happening across the country.


-Shows “correlation not causation.” But would causation still be difficult to show if there were more information about school systems, demographics and where the income is concentrated in these areas?

-missing a few counties, but oh well.

-Despite the massive amount of information, it would’ve been nice to see a small tidbit somewhere that showed demographics or information about the school systems in the areas, to tap into the correlation that information would have with income. But with three very interesting, detailed graphs already included, I thought it was fine that the school and demographics information was just left to text.

In this walk-through, we’ll use Olympic athlete data from London’s 2012 Summer Olympics. Use the Guardian’s London Olympic data: look for the “download the data” link and be sure to save a copy so you can edit it. This blog post will use Google Spreadsheets’ pivot table function.

You can summarize your data with pivot tables in Excel, Google Spreadsheets or LibreOffice Calc. The details will vary between software, but the basic steps are the same. Follow these steps to look at the data in pivot tables on Google Spreadsheets:
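If it helps to see what a pivot table computes, the same kind of summary can be sketched in a few lines of Python. This is a rough illustration, not a replacement for the spreadsheet steps. The rows and the column names (`country`, `sex`, `sport`) are made-up assumptions, not the actual layout of the Guardian file, and `pivot_count` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical athlete rows; the real spreadsheet's columns may differ.
athletes = [
    {"country": "USA", "sex": "F", "sport": "Swimming"},
    {"country": "USA", "sex": "M", "sport": "Athletics"},
    {"country": "USA", "sex": "F", "sport": "Athletics"},
    {"country": "GBR", "sex": "M", "sport": "Cycling"},
    {"country": "GBR", "sex": "F", "sport": "Rowing"},
    {"country": "KEN", "sex": "M", "sport": "Athletics"},
]

def pivot_count(rows, row_key, col_key):
    """Count records for each (row_key, col_key) pair, like a pivot table
    with one field in Rows, one in Columns, and a count as the value."""
    table = defaultdict(lambda: defaultdict(int))
    for record in rows:
        table[record[row_key]][record[col_key]] += 1
    # Convert nested defaultdicts to plain dicts for easier reading.
    return {row: dict(cols) for row, cols in table.items()}

summary = pivot_count(athletes, "country", "sex")

for country, counts in summary.items():
    print(country, counts)
```

Swapping the arguments, for example `pivot_count(athletes, "sport", "sex")`, is the code equivalent of dragging a different field into the rows of the pivot table.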



A little background: I was listening to WNYC last week and came across a debate about “selfies.” I started wondering what selfies really are. And then, I came across the following dataviz project.

“Selfiecity” is a project conducted by a computer science team at CUNY to analyze more than 3,000 selfies from Instagram in five cities of the world – New York, Bangkok, Berlin, Moscow and Sao Paulo.

The team collected more than 600,000 images in those five cities and selected 640 “single selfies” from each city, with the help of Amazon’s Mechanical Turk workers, for analysis. The team looked at age, gender, pose and facial expressions (smiling, angry, etc.) using facial analysis software.

Some of the interesting facts are:

  • Only three to five percent of images analyzed are actually selfies.
  • Significantly more women take selfies in all of the five cities.
  • Most people in the photos are fairly young with the median age of 23.7. Bangkok is the youngest city (21.0). New York City is the oldest (25.3).
  • The project’s mood analysis revealed that you can find a lot of smiling faces in Bangkok (0.68 average smile score) and Sao Paulo (0.64). Moscow had the fewest smiles.


The project is not very conventional or scientific. It has a lot of flaws, including how it identifies a person’s age. A person’s mood is subjective. It needs more samples from other cities to find global trends. But this project has made me think about how we can approach the vast amount of information/data on social media.