Making Devices That Matter


This work by Chesney Research is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Clustering


Clustering is an important part of a wide variety of fields, including epidemiology, genetics, computation, and sociology. Being able to cluster helps us chunk information, which is critical to any activity involving human understanding. On small scales, humans can often cluster easily by eye, but on very large scales this quickly becomes impossible, and computers are needed to cluster appropriately. With an ever-expanding and increasingly networked population, mathematical clustering techniques are becoming even more useful.

As far back as 1854, John Snow famously created an ingenious plot of cholera cases overlaid on a map of London, which was instrumental in solving a problem that had plagued England for years [1]. Through this visualization, Snow traced the cholera epidemic to a particular water pump at the epicenter of the outbreak. His work led to revolutions in sanitation that help us lead cleaner and healthier lives today.


An Example from the Reality Mining Project


The Reality Mining project at MIT conducted a study of cellular phone users in 2005 [2]. Ninety-four subjects participated, allowing the investigators to log basic data from their cellular phones. During the study, the participants answered a dyadic questionnaire indicating, among other things, whether or not they were friends with each of the other participants.

The first diagram to the right plots the 36 subjects who responded to the friendship questionnaire. Each subject is a numbered square. A connection between two subjects shows that each indicated they were friends with the other. Visually, it is difficult to tease out groups of friends among the subjects.
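Because a connection is drawn only when both subjects named each other, the questionnaire answers have to be reduced to reciprocated pairs before any clustering. The sketch below shows one way to do that. It assumes the answers are stored as a hypothetical `responses` dictionary mapping each subject ID to the set of IDs that subject named as friends; the function name and the toy data are illustrative only and do not come from the study.

```python
def mutual_friendships(responses):
    """Return undirected edges (a, b) with a < b where a named b AND b named a.

    `responses` is a hypothetical mapping: subject ID -> set of subject IDs
    that subject reported as friends.
    """
    edges = set()
    for a, named in responses.items():
        for b in named:
            # Keep the pair only if the friendship is reciprocated.
            if a in responses.get(b, set()):
                edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical toy responses, not data from the Reality Mining study.
responses = {
    1: {2, 3},
    2: {1},
    3: {4},
    4: {3},
    5: {1},
}
print(sorted(mutual_friendships(responses)))  # [(1, 2), (3, 4)]
```

Subject 1 named subject 3, but subject 3 did not name subject 1 back, so no edge is drawn between them; the same holds for subject 5's one-sided answer.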


Ordering the subjects according to how they cluster together makes the groups of friends much more distinct. Treating the subjects as nodes of a graph and friendships as edges, we can use results from graph theory to partition the subjects based on whether or not they are friends with each other. The subjects break up into 11 groups of friends: some groups contain only 2 subjects, while others contain as many as 5.
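The text does not say which graph-theoretic method produced the 11 groups. As one simple possibility, the sketch below partitions the friendship graph into its connected components using breadth-first search, then concatenates the groups into a single ordering so that friends sit next to one another, as in the reordered diagram. The function names and edge list are hypothetical, and the choice to leave out subjects with no reciprocated friendships is an assumption of this sketch, not something the source confirms.

```python
from collections import defaultdict, deque


def friend_groups(subjects, edges):
    """Partition subjects into connected components of the friendship graph.

    Assumption: subjects with no reciprocated friendships are skipped,
    since they belong to no group of friends.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in subjects:
        if start in seen or start not in adjacency:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        group = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(sorted(group))
    return groups


def cluster_order(groups):
    """Concatenate the groups so members of each group are adjacent."""
    return [node for group in groups for node in group]


# Hypothetical edges, not data from the Reality Mining study.
subjects = range(1, 9)
edges = {(1, 5), (5, 8), (2, 7), (3, 4), (4, 6), (3, 6)}
groups = friend_groups(subjects, edges)
print(groups)                 # [[1, 5, 8], [2, 7], [3, 4, 6]]
print(cluster_order(groups))  # [1, 5, 8, 2, 7, 3, 4, 6]
```

Connected components are the coarsest reasonable partition: any chain of friendships pulls subjects into the same group. Methods such as modularity-based community detection can split a loosely connected component further, which may matter on denser networks than this one.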




References

[1] E. Tufte, Visual Explanations. Cheshire, CT: Graphics Press, 1997.

[2] N. Eagle, A. Pentland, and D. Lazer, "Inferring friendship network structure by using mobile phone data," Proceedings of the National Academy of Sciences, vol. 106, no. 36, pp. 15274-15278, 2009.