Reading Nathan Yau’s recent post about the Rise of the Data Scientist inspired me to take a look at Ben Fry’s dissertation on Computational Information Design, in which he describes the process for understanding data as follows:
- acquire – the matter of obtaining the data, whether from a file on a disk or from a source over a network.
- parse – providing some structure around what the data means, ordering it into categories.
- filter – removing all but the data of interest.
- mine – the application of methods from statistics or data mining, as a way to discern patterns or place the data in mathematical context.
- represent – determination of a simple representation, whether the data takes one of many shapes such as a bar graph, list, or tree.
- refine – improvements to the basic representation to make it clearer and more visually engaging.
- interact – the addition of methods for manipulating the data or controlling what features are visible.
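To make the seven steps a little more concrete, here is a minimal Python sketch of how they might chain together. It is not taken from Fry's dissertation: the sample data, the function names, and the text bar chart are all assumptions made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a file on disk or a network source.
RAW = "country,year,income\nA,2008,10\nB,2008,30\nA,2009,20\nB,2009,40\n"


def acquire():
    """acquire: obtain the raw data (here, just an in-memory string)."""
    return RAW


def parse(text):
    """parse: give the raw text structure and types."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"country": r["country"], "year": int(r["year"]), "income": float(r["income"])}
        for r in rows
    ]


def filter_rows(rows, year):
    """filter: keep only the data of interest."""
    return [r for r in rows if r["year"] == year]


def mine(rows):
    """mine: apply a (very simple) statistical method."""
    return mean(r["income"] for r in rows)


def represent(rows):
    """represent: pick a basic form, here a text bar chart (one '#' per 10 units)."""
    return [f"{r['country']} {'#' * int(r['income'] // 10)}" for r in rows]


def refine(lines, average):
    """refine: make the basic representation clearer with a title and summary."""
    return ["Income by country"] + lines + [f"mean: {average:.1f}"]


def interact(year):
    """interact: let the viewer control which year is shown."""
    rows = filter_rows(parse(acquire()), year)
    return refine(represent(rows), mine(rows))


if __name__ == "__main__":
    print("\n".join(interact(2008)))
```

Calling `interact(2009)` instead reruns the same chain with a different filter, which is roughly what a year selector in an interactive graphic would do.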
I took his process and created a diagram that maps it onto my own skill set, with the addition of Interaction Design (my current profession), which I believe covers the represent, refine, and interact steps.
While I don’t disagree that these steps represent the process for understanding data for the individual creating the data visualization, they don’t cover a step needed to create a design that is readily understood or that is persuasive to others.
User research and testing of the design are needed to verify that the representation is clear and appropriate. Although this could be considered part of the refine step, testing may be needed at other points in the process (e.g., represent or interact). For anyone interested in creating data visualizations for other people, it should be considered an important part of the design process.


Great diagram, Catherine. As an Interaction Designer myself, I totally agree with you about the importance of user testing. There’s one thing I would alter, though: the order of 5. represent, 6. refine, 7. interact excludes the possibility of refining the visualization based on the interaction model. I’ve written up my thoughts in more detail on datavisualization.ch and would love to hear your feedback.
An excellent point.
I wrote a master’s thesis that is specifically focused on what it takes to do steps 5 and 6 well. This includes what you have to know about your user’s context (language, industry convention, etc.), as well as how the human brain deals with factors such as information density and relative and absolute placement.
The thesis is available at http://tinyurl.com/diagramthesis
In terms of your diagram, I’m curious why you chose a cyclical format (implying the cycle repeats?), and why steps 5, 6, and 7 are further from the origin (implying something?).
Best, Noah
So we have another land grab by designers. One of these days, non-designers should be preemptive about the land grabs. A land grab by designers wouldn’t be so bad if they didn’t deliberately exclude all others from the space. I suppose it keeps designers employed and rates high to ensure that only they can do InfoVis, IA, etc. Of course, collaboration across disciplines is very difficult. Such collaborations are made much harder by the insistence that only this way or that way works, and the assertion of hierarchy and disqualification.
All phases test. The tests will differ.