Posts from the Festival of Data Category

Igarapé is a Brazilian think tank that released a comprehensive dataset of worldwide murder statistics. It is presented in the form of an animated globe that you can drag around and click for information on homicide in each country. It is made in JavaScript.

Good things:

– The full dataset is available to download and clearly marked;

– Some countries have regional data, on states/provinces. There is an emphasis on Latin America and Caribbean;

– They provide the number of homicides, the homicide rate per 100,000 inhabitants, the evolution across the years, the weapons most used and the prevailing age and gender of the victims;

Not so good things:

– The datasets are very diverse in sources and dates. Some countries have very long historical series, some don’t. The application does a good job of making these sources and dates very clear, but I still doubt how comparable they are. For example, in African countries, whether a country was going through a civil war or a peaceful period when the data was collected will make a lot of difference;

– An animation shows the top ten homicide countries in the world. It isn’t very clear when it starts and stops, and it’s the only ranking available on the site. I can’t figure out how a particular country ranks;

– The site is available for mobile but doesn’t work very well on an iPhone;

All in all, a good dataviz tackling a very difficult subject and dataset.





Michaela Ross

May graduates might be celebrating this month, but next month many will have another thing coming: student loan repayments. Student loan debt in the U.S. has skyrocketed in the last decade, from a total balance of $350 billion in 2004 to nearly $1.2 trillion in 2014, according to the Federal Reserve Bank of New York. The Fed reports that the average student loan debt now stands at $26,700 per borrower. The New York Times has partnered with the Institute for College Access and Success and other debt research and advocacy groups to create useful tools and visualizations to track this growing burden:

The Student Debt Repayment Calculator:

My favorite visualization is the student loan repayment calculator, created in May 2014. The user can enter the student debt they owe, or search for the average debt owed by 2013 graduates of their alma mater. Next, they enter the interest rate of their loan, and finally they choose whether they’d like to pay it off over the standard period of 10 years, or a shorter or longer period.

The real magic comes in the “monthly payment” slider. Borrowers can increase or decrease this payment amount and instantly see in the graphic how it will shorten or lengthen their years of repayment. It also displays how the total interest paid on the loan will increase or decrease with this variable. The right side of the screen displays the salary amount one needs to make in order to keep payments at 20% of their discretionary income.
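The arithmetic behind a slider like this can be sketched in a few lines. This is only an illustration of standard loan amortization, not the Times’ actual code: the function names are made up for this example, the 6.8% interest rate is an assumed figure, and the $26,700 balance is the average debt quoted above. The salary estimate shown on the right side of the calculator is left out because the post does not say how discretionary income is defined.

```python
import math


def standard_payment(balance, annual_rate_pct, years=10):
    """Fixed monthly payment that clears the loan in `years` (standard amortization)."""
    monthly_rate = annual_rate_pct / 100 / 12
    n_payments = years * 12
    if monthly_rate == 0:
        return balance / n_payments
    return balance * monthly_rate / (1 - (1 + monthly_rate) ** -n_payments)


def months_to_repay(balance, annual_rate_pct, monthly_payment):
    """Return (months, total_interest) when paying a fixed amount every month."""
    monthly_rate = annual_rate_pct / 100 / 12
    if monthly_rate == 0:
        return math.ceil(balance / monthly_payment), 0.0
    if monthly_payment <= balance * monthly_rate:
        raise ValueError("payment does not even cover the monthly interest")
    months = 0
    total_interest = 0.0
    # Simulate month by month; stop once less than half a cent remains.
    while balance > 0.005:
        interest = balance * monthly_rate
        total_interest += interest
        balance = balance + interest - monthly_payment
        months += 1
    return months, round(total_interest, 2)


# Assumed inputs: the $26,700 average balance from the post, a hypothetical 6.8% rate.
base = standard_payment(26700, 6.8, years=10)
for extra in (0, 50, 100):  # "move the slider" by paying a bit more each month
    months, interest = months_to_repay(26700, 6.8, base + extra)
    print(f"pay ${base + extra:,.2f}/month -> {months} months, ${interest:,.2f} interest")
```

Raising the monthly payment shortens the repayment period and lowers the total interest, which is the relationship the slider makes visible.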

I think this visualization/calculator epitomizes a service-orientated approach to data. The Times has taken a calculation that many borrowers find very difficult to compute on their own and simplified it in a user-friendly format. It could actually affect whether people choose to take out loans or how they choose to budget their repayment, and may even save some of them from defaulting.

    What works:

-Highly individualized
-Highly interactive
-Aesthetic is simple
-Dashboard-like format makes all data visible at once
-Labels are used sparingly, but aid in understanding because of close proximity
-Incredibly simple
-Practical use

    What doesn’t:

-Colors used don’t show enough contrast to make them pop out (especially since the colors used in the triangular graphic match the colors of the corresponding values in the sidebar)
-School search bar doesn’t appear to function
-Quotes from expert sources clutter the right side of the graphic

Average Graduate Debt and Tuition Costs Tracker:

This second graphic, created in May 2012, is a bit more complicated to understand and use. The landing page shows a chart with blue dots (representing public colleges and universities) and orange dots (representing private schools) on a grid pattern. By rolling over the dots, one sees the name of the school the dot represents as well as the average student debt at graduation and average tuition and fees for 2010. The x axis displays the annual cost of tuition and fees, and the y axis displays the average graduate debt. The dots are therefore laid out on the grid according to where they fall with these two measures. The size of the dots reflects the schools’ enrollment.

There are several ways to manipulate the chart. One can search for their college or university in the search bar and its corresponding dot is highlighted. One can also limit the number of dots shown on the chart by public or private, enrollment size, graduation rate, share of graduates with debt or athletic conference of the school. These options are in drop-down menus to the left of the chart.

A timeline slider above these drop-down lists allows the viewer to watch the dots in their customized chart shift over time from 2004 to 2010.

A zoom slider on the right side of the screen lets the viewer get a closer look at dots that are close together.

In the lower left portion of the screen, one can enter their personal debt upon graduation and their graduation year. A horizontal line is drawn across the chart representing where one’s school would fall. The dots above the line represent schools where graduates average higher debt.

The graphic can also be switched from chart to map mode. The drop-down menus still allow one to limit the dots shown by private or public schools, enrollment, etc. The map mode does not allow for customization according to the viewer’s personal debt, like chart mode does.

    What works:

-The chart mode allows viewers to pick up on trends quickly, like the concentration of public schools at the low end of the debt/tuition spectrum.
-Highly interactive
-Fairly strong level of personalization
-Timeline slider dramatizes trends clearly for viewer

    What doesn’t:

-The learning curve for the user to understand and interact with the chart is more challenging, and some people might lose interest
-The map mode seems to add another layer of complexity without adding real value to understanding the data
-The data is limited to schools that reported their data to the research institute used, so it excludes many graduates

Although these two visualizations by the Times were made in different years, they complement each other in their content. They also allow viewers to get a better idea of where they and their student debt stand compared to other graduates in the U.S.

FiveThirtyEight broke down the data around airline delays last month in a very detailed interactive graphic with an accompanying article. According to the Bureau of Transportation Statistics, which collects all of this information, the 6 million domestic flights in 2014 took an extra 80 million minutes to reach their destinations. Of the major airlines, United and American were the worst offenders.

What works:
-This clearly took a lot of work in terms of data organization and analysis. While it’s still a little confusing at first, the methods are explained thoroughly in an accompanying article.
-The visualizations are clear and engaging. Having the box for entering specific flights at the top works well to draw readers in.
-The map itself is fun to play with. Both charts do a good job of displaying the key points.

What could be improved:
-It would be better if the airport information card popped up automatically when you hover over an airport on the map. Right now, with so many dots, it’s hard to tell what you’re looking at unless you click on one.
-The graph plotting out times by month is confusing. Flipping the axes, so months run along the x axis at the bottom and times run up the y axis on the left-hand side, would have made it easier to understand. Right now it just looks like a bunch of lines.

Overall, a strong interactive piece that makes me happy to fly Virgin America!


This is a page on Grantland’s site that has a bunch of data visualizations for the playoffs. My favorite is the one at the bottom of the page, which illustrates where the playoff teams make the most shots. You click on a team and the court changes. Blue means a terrible shooting percentage and red means great. The only problem I see with this map is that you can’t make it bigger. If it went full screen, or perhaps even opened in a new window like other graphics from Grantland, I think it would have been a stronger visualization.

By Alison Kanski

Water’s Edge by Reuters was an enormous investigative story about the rising sea level and its effects on coastal areas, mostly focusing on the U.S., but also looking briefly at southeast Asia. This article is very long and packed with data and visualizations, so I’m just going to look at a few of them.

The first graph was definitely the strongest. At first it looks overwhelming, but it clearly shows the general upward trend in coastal flooding. If the reader wants to find a particular city (I looked for Charleston, SC), the drop-down menu to the right of the chart makes it really easy. Rather than hunting through the jumbled lines, you can just pick a city that interests you and see the data for it. One issue with this graph is that some of the lines are broken due to missing data. The note below the graph explains that the data is missing because there were incomplete records for certain years.

The floods per station map was certainly a really cool thing to watch, but it moved too fast for me to even register the number of floods per station before it moved on to the next year. Still, the general idea that coastal flooding has increased is clear. One issue is that more stations are added as time goes on. In 1920, there are only nine stations, but in 2013 there are easily four times that. I think that may skew the reader’s perception of the floods. There are more pulses on the map both because there are more floods and because there are more stations recording the floods.
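One way to separate the two effects is to divide each year’s flood count by the number of stations reporting that year. The sketch below is purely illustrative: the nine stations in 1920 come from the map, the 36 stations in 2013 is just “four times that,” and the flood counts are placeholders made up for this example, not Reuters’ data.

```python
def floods_per_station(total_floods, station_count):
    """Average number of floods recorded per station in a given year."""
    if station_count <= 0:
        raise ValueError("station_count must be positive")
    return total_floods / station_count


# Placeholder figures for illustration only (flood counts are invented).
records = {
    1920: {"stations": 9, "floods": 18},
    2013: {"stations": 36, "floods": 180},
}

for year, rec in sorted(records.items()):
    rate = floods_per_station(rec["floods"], rec["stations"])
    print(f"{year}: {rec['floods']} floods at {rec['stations']} stations "
          f"= {rate:.1f} floods per station")
```

With these placeholder numbers the raw count grows tenfold while the per-station rate grows 2.5 times, which is the kind of gap a reader cannot see from the pulses alone.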

The two maps were not very interesting. The colors in the gradient were hard to distinguish, particularly the gray, and there was no interactivity to show specifics for each shaded area on either map. The layers icon in the top left is also an issue. I only noticed it after staring at the maps for a few minutes, so the average reader would most likely not notice it’s there. Although that may be for the best. When I started playing around with the layers, it almost made the maps more confusing because some of the layers overlaid each other and others didn’t. I was spending more time trying to figure out the layers than actually looking at the map. It was not the most user-friendly experience.

Overall, I thought this was a great data viz story. A massive amount of data and research clearly went into it and, in general, it was organized and presented well. It takes some time and dedication to the story to get through this entire article, but I certainly think it’s worth it.


By: Andrea González-Ramírez

The Refugee Project is an interactive map that documents the refugee migrations around the world from 1975 to 2012. The project is based on data from the United Nations and is complemented by original histories of the major refugee crises of the last four decades, situated in their individual contexts.

Why it works:

1) There’s a lot of information, but it’s very well organized

The map provides tons of information: refugee population per country, places where refugees have sought asylum, the historical context of each event that led to a refugee crisis and a timeline of the refugee migrations from 1975 to 2012, among other things. Having so much data could potentially lead to a very disorganized map, but that’s not the case. The stats are always in a left column, the historical context doesn’t appear unless you click on the country that had the crisis, and the option of zooming in and out makes it easier to find information for each country.

2) The data sources are legitimate

The project uses the UN’s data. The organization counts and tracks millions of displaced people through the offices of the High Commissioner for Refugees (UNHCR), and a separate agency for Palestinian refugees (UNRWA). The UNHCR has a map rather similar to the Refugee Project with the exact same data, but it doesn’t work as well. Bottom line is, I think the data sources are pretty legit.

3) It provides historical context for each event that led to a refugee crisis

Each crisis that has led to refugee migrations is described in detail whenever you hover over a specific country during a specific year. It’s nice that the map not only tells you that X number of people fled a country, but also why they were forced to.

4) It’s pretty

The design is simple, yet effective. The use of the dark background and only red and blue to make the information pop out was a good idea.

5) It has an additional list of sources for further readings

This is one of my favorite parts. Even though it is not accessible through the visualization per se, this project also includes a list of other sources in its “About Me” section, where you can find multiple sources of information per year.

Things that could be better:

1) It can be a little overwhelming

At first glance, the map can be a little bit confusing and it can be hard to decide where to start. For me it was easy to figure out, but I had a friend who seemed just plain confused when trying to figure out how it worked. Maybe if there were a legend or an established set of instructions to guide you through it, it would be much better.

2) Because it goes back to 1975, some data is missing or not available

This one was kind of obvious. Because it goes back four decades, there’s data that’s either missing or not available. This is not explained anywhere. But come on, during the ’70s the United States didn’t welcome any refugees? That doesn’t seem right.*

[*I’m spouting nonsense without verifying this information. Will get back to you guys on this fact.]

Final impressions:

Overall, I believe this is a very good visualization. This is a very important topic and this project did a good job of trying to capture all the information. I wish the group behind it would make an updated version, especially considering all that’s been going on in the last couple of years.

I was browsing The Washington Post’s graphics feed, as I am wont to do when I am bored and/or desperate to RT something, and I came across this nifty map of the United States, which shows all 435 congressional districts color-coded by House members’ religious affiliation. The accompanying article adds some good analysis of the general findings you can check out, but in the meantime, here are some observations from Yours Truly.

Why the map works:

1) It’s pretty. No, really. It’s a bright, colorful, interactive map that is (mostly) easy to read.

2) It breaks down nicely. There are four different versions of this map: (a) a broader Christian/non-Christian map to drive home how disproportionately Christian the House is (93 percent versus 76 percent of all Americans), (b) a Christian-centric map to showcase the diversity of denominations within that 93 percent, (c) a non-Christian-centric map to highlight the 30 districts that might otherwise get lost on the map and (d) a map to drive home the partisan divide between the Republican politics of the House’s Mormons (9 of 9) and the Democratic politics of the House’s Jews (18 of 19).

3) It’s interactive. If you hover the cursor over a district, the congressman’s name appears, along with his or her party, district number and religious affiliation.

4) It’s different. This isn’t so much a comment on the function of the map as it is on its uniqueness. There are multiple lists and even some charts and graphs showing the religious makeup of Congress, but other than this hideous, outdated and static map from BuzzFeed, I could not find the data presented as a map.

Why the map is broken:

1) It’s full of mistakes. Yes, the map isn’t accurate. For one thing, in the multi-denomination map, many of the Anglican/Episcopalian members’ districts are colored brown, the color for “other Christian,” instead of the correct green. Also, I found at least one congressman, David Valadao, whose party was misstated; he is a Republican, not a Democrat.

2) You can’t zoom. For much of the country, this isn’t a problem, but without a zoom feature, urban areas containing many small districts, including New York City, are difficult to examine.

3) No partisan breakdown on the map. If one reads the list or examines each district one by one, members’ parties are clearly listed, but there is no way to look at the breakdown of a religious affiliation by party on the map itself, with the exception of the Mormon/Jewish special feature. For example, if one wanted to see which Catholic members were Republicans and which were Democrats, one could have clicked that setting, and all Catholic Republican districts would appear red, all Catholic Democratic districts would appear blue and all non-Catholic districts would appear grey. Alas, this is not a feature, but it seems a rather obvious one. I rather like Pew’s presentation of the partisan breakdown, using a graphic that looks somewhat like the actual seats in Congress.

4) Where are the Lutherans? Full disclosure: I am not a Lutheran, but I found it odd that a denomination with so many members both in the House (21, versus 24 Presbyterians, who appear on the map in dark blue) and in the general population (there are nearly four times as many American Lutherans as American Anglicans/Episcopalians) was lumped in with “other Christian” on the map. Most of the Lutheran congressmen live in the Midwest, as do most American Lutherans overall, so this was a major missed opportunity to show this geographic trend.
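The party-by-religion filter suggested in point 3 boils down to a simple coloring rule. Here is a rough sketch of that rule; the function, the field names and the sample members are all hypothetical and have nothing to do with how the Post actually built its map.

```python
# Hypothetical party codes mapped to the colors described in point 3.
PARTY_COLORS = {"R": "red", "D": "blue"}


def district_color(member, selected_religion):
    """Color a district by party if its member matches the selected religion, else grey."""
    if member["religion"] != selected_religion:
        return "grey"
    # Members with any other party code fall back to grey as well.
    return PARTY_COLORS.get(member["party"], "grey")


# Made-up sample members, for illustration only.
members = [
    {"district": "Example-1", "religion": "Catholic", "party": "R"},
    {"district": "Example-2", "religion": "Catholic", "party": "D"},
    {"district": "Example-3", "religion": "Lutheran", "party": "R"},
]

for m in members:
    print(m["district"], district_color(m, "Catholic"))
```

Selecting “Lutheran” instead would light up only the third sample district, so the same rule would also cover the denomination missing from the map in point 4.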

Overall, this map was certainly an impressive undertaking on the part of the Post, but it is lacking in both accuracy and functionality. I would like to see a map that follows the Post’s model and implements my recommendations so we can have a graphic that is truly representative of our representatives.

More countries are urbanizing as we progress through the 21st century. Since 1950, urbanization has increased globally. Certain countries, like Saudi Arabia, have now become exclusively urban when they were once largely rural. Researchers and data journalists at KILN, a London-based data visualization organization, have created an interactive presentation on urbanization. They found that average income, fertility rates, childhood malnutrition and illiteracy affect a country’s rate of urban growth. They also found that most countries around the world have turned to urbanization and, to this day, are still migrating toward urban life.

KILN’s interactive presentation succeeds in presenting a large amount of information through simple visual media. The amount of reporting behind the project is evident, and facts are stated in chronological order. Through color-coded line graphs, readers will clearly see that most countries have become more urbanized from 1950 to 2015.

The presentation’s narration also aids the progression of the information, allowing readers to sit back and view the presentation before setting out to explore the project themselves.

Organization is a problem that I found with the project. Navigating through the graphs is slightly confusing because readers have the option to view countries by average income, continent, country and change since 1950. Graphs will sort themselves according to which buttons are pressed, but the color-coded distinction between countries becomes overwhelming when the graphs are stacked on top of each other. All countries in Africa, for example, are colored blue. Once these graphs are combined, the user has no way of telling which country is which unless they hover their mouse over each individual line to reveal the country’s name.

Not all countries are shown in each section and it’s not explained why certain countries were included while others were not. There are no source citations throughout the presentation. Rural areas seem to have the higher fertility rates in all countries. The dates of the data sets for the fertility rates vary from 1998 to 2012, so this information is not up to date.

The presentation succeeds in capturing visual attention, but the speed at which the project is presented keeps viewers from properly understanding the information and the accompanying narration. KILN seems to be an innovative organization unlike any other, but the flaws in this particular presentation are evident.


In Climbing Income Ladder, Location Matters from The New York Times (July 22, 2013)

This data visualization breaks down, by county, the percentage chance that a child raised in the bottom fifth of the income distribution ladder will rise to the top fifth.

The study’s range of the top fifth: family income of more than $70,000 for the child by age 30 or more than $100,000 by age 45. (Shown in the third and final graph) For the bottom fifth: parents’ income less than $25,000 (top fifth: parents’ income above $107,000).

The study based its findings on the possibility of upward mobility in metropolitan areas mainly on education, family structure and the economic layout of metro areas. Areas with higher levels of mobility tended to have stronger secondary school systems.

Counties in states in the West, Northeast and Great Plains regions showed the most favorable opportunity for advancement while the Southeast (Mississippi, Georgia and South Carolina) showed the poorest chances.


-The story accompanying the data visualization provides color and depth as it leads with a character and then ends with the same character, spotlighting Atlanta.

-The colors on the first map representing the percentages, I think, work well for navigation and comprehension.

-The second map, which is clickable and includes a search bar that lets you type in a city and track where a child may sit on the income ladder by the age of 30 based on what his/her parents earned in the late 1990s, is complex in that it provides a lot of information, but it was still kept simple and easy to understand.

-The third graph is a slope graph highlighting the 30 most populous cities from “best to worst” on the left and right by chance of mobility and based off of where the child or children were raised on the income ladder. The goal of the study and data was to target metro areas, so I think the third graph (slope graph) succeeds in accomplishing that while the other two provide more of a broad picture to show what’s happening across the country.


-Shows “correlation not causation.” But, would causation be difficult to show if there was more information about school systems, demographics and where the income lives in the areas?

-missing a few counties, but oh well.

-Despite the massive amount of information, it would’ve been nice to see a small tidbit somewhere that showed demographics or information about the school systems in the areas, to tap into the correlation that information would have with income. But with three very interesting, detailed graphs already included, I thought it was fine that the school and demographics information was just left to text.


A little background: I was listening to WNYC last week and came across a debate about “selfies.” I started wondering what selfies really are. And then, I came across the following dataviz project.

“Selfiecity” is a project conducted by a computer science team at CUNY to analyze more than 3,000 selfies from Instagram in five cities of the world – New York, Bangkok, Berlin, Moscow and Sao Paulo.

The team collected more than 600,000 images in those five cities and, with the help of Amazon’s Mechanical Turk workers, selected 640 “single selfies” from each city for analysis. The team looked at age, gender, pose and facial expressions (smiling, angry, etc.) using facial analysis software.

Some of the interesting facts are:

  • Only three to five percent of images analyzed are actually selfies.
  • Significantly more women take selfies in all of the five cities.
  • Most people in the photos are fairly young with the median age of 23.7. Bangkok is the youngest city (21.0). New York City is the oldest (25.3).
  • The project’s mood analysis revealed that you can find a lot of smiling faces in Bangkok (0.68 average smile score) and Sao Paulo (0.64). Moscow had the least smiles.


The project is unconventional and not very scientific. It has a lot of flaws, including how it identifies a person’s age. A person’s mood is also subjective. It needs more samples from other cities to find global trends. But this project has made me think about how we can approach the vast amount of information and data on social media.