Ethnography in Industry: Methods for distributed & large data sets (part two)

In previous posts in this series, we described the objectives organizations can pursue with ethnography, along with an overview of the data collection methods ethnographers use to understand a particular population or situation of interest.

A question that has come up is how to do ethnography with distributed populations (e.g., online communities) and massive data sets. To answer it, I’ll describe the work my colleague Nic Ducheneaut and I have done in virtual worlds (around here, we’re referred to as the “Nics”, and no, we didn’t name ourselves that!).

Virtual worlds: the opportunity

Virtual Worlds (VWs) provide a number of unique affordances and challenges for researchers:

  • Unlike the physical world, a VW comes inherently instrumented with high-precision movement sensors and perfectly transcribed conversations, and it allows instantaneous teleportation.
  • On the other hand, VWs bring interesting challenges in terms of ambiguous multiple identities (i.e., players having multiple characters) and, more importantly, the sheer scale of data that can be collected at the click of a button. For example, in our original “PlayOn” study, we collected only 7 character variables from the online game World of Warcraft, yet it took us about a year and a half (and about half a dozen publications) to complete our analyses.

The hidden variable for us? Time. Seven variables from 150,000 characters, sampled at 15-minute intervals over one year, offer many ways of looking at change over time.
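To give a sense of that scale, here is a back-of-the-envelope upper bound. It assumes, as a simplification, that every one of the 150,000 characters appears in every 15-minute snapshot for a full 365-day year; in practice characters are only observed while online, so the real volume is lower.

```python
# Back-of-the-envelope upper bound on the size of a PlayOn-style census.
# Assumption: every character is observed in every snapshot, which
# overstates the real volume (characters are only seen while online).

DAYS_PER_YEAR = 365
SNAPSHOTS_PER_DAY = 24 * 60 // 15  # one snapshot every 15 minutes -> 96
CHARACTERS = 150_000
VARIABLES = 7


def census_upper_bound(days: int, per_day: int, characters: int, variables: int) -> dict:
    """Return snapshot, observation, and value counts for a full census."""
    snapshots = days * per_day
    observations = snapshots * characters  # one row per character per snapshot
    values = observations * variables      # one cell per variable per row
    return {"snapshots": snapshots, "observations": observations, "values": values}


if __name__ == "__main__":
    totals = census_upper_bound(DAYS_PER_YEAR, SNAPSHOTS_PER_DAY, CHARACTERS, VARIABLES)
    for name, count in totals.items():
        print(f"{name:>12}: {count:,}")
```

Under these assumptions the bound works out to 35,040 snapshots, about 5.3 billion character observations, and roughly 36.8 billion individual values: the time dimension multiplies everything else.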

Virtual worlds: the methods and what’s needed

Over the years, we’ve used a broad, customized mix of methods to study VWs, including participant observation, structured in-world interviews, lab experiments, online surveys, server-side data mining, and more. More importantly, we’ve learned that making the most of research in this area requires a certain interdisciplinary agility, and that these collaborations may point to a larger intersection of social science methods made possible by these novel research spaces.

Consider, for example, the need to move back and forth between different levels of analytic scale. In our study of player guilds (i.e., player organizations in online games), we used the PlayOn data to map out the range of guild sizes and types. This allowed us to target representative members of representative guilds for structured interviews. Not only did the interview data help us understand the larger trends, but the high-level data also let us estimate how generalizable our qualitative findings were, a perennial concern for sophisticated ethnographers.
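The two-stage selection described above (first guilds, then members within them) can be sketched as stratified sampling. This is a minimal illustration, not the study’s actual procedure: the `Guild` record, the size cut-offs in `STRATA`, and the sample sizes are hypothetical, and guild size is the only stratifying variable.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Guild:
    name: str
    members: list  # character names observed in the guild (hypothetical record)


# Hypothetical size strata as (min inclusive, max exclusive); not the study's cut-offs.
STRATA = {"small": (1, 10), "medium": (10, 35), "large": (35, 10**9)}


def stratum_of(guild: Guild) -> str:
    """Assign a guild to a size stratum."""
    size = len(guild.members)
    for label, (lo, hi) in STRATA.items():
        if lo <= size < hi:
            return label
    raise ValueError(f"guild {guild.name!r} has no members")


def two_stage_sample(guilds, guilds_per_stratum, members_per_guild, seed=0):
    """Pick guilds within each size stratum, then interviewees within each guild.

    Also returns each stratum's share of all guilds, which could be used to
    weight qualitative findings when judging how far they generalize.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    by_stratum = defaultdict(list)
    for g in guilds:
        by_stratum[stratum_of(g)].append(g)

    picks = {}
    for label, pool in by_stratum.items():
        chosen = rng.sample(pool, min(guilds_per_stratum, len(pool)))
        picks[label] = {
            g.name: rng.sample(g.members, min(members_per_guild, len(g.members)))
            for g in chosen
        }
    shares = {label: len(pool) / len(guilds) for label, pool in by_stratum.items()}
    return picks, shares
```

For instance, with six guilds of sizes 3, 5, 12, 20, 40, and 80, `two_stage_sample(guilds, guilds_per_stratum=1, members_per_guild=2)` picks one guild per stratum and two interviewees from each, and reports that each stratum holds a third of all guilds. Stratifying first keeps small guilds from being swamped by large ones, while the returned shares preserve the information needed to reason back from interviews to the whole population.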

Another example is the use of real-time data monitoring to identify low-incidence events, such as leadership changes in guilds or guild fracturing. Using the PlayOn architecture, one could set up automated alerts that notify ethnographic researchers when these rare events occur. In this way, VWs make it possible to collect many data points on events that would otherwise be difficult to observe.
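One way such alerts could work is to compare consecutive snapshots of each guild and flag the rare transitions. The sketch below assumes, hypothetically, that each snapshot maps a guild name to its current leader and full member roster; the `GuildState` record, the 30% departure threshold for “fracturing”, and the `notify` helper (standing in for whatever channel would actually reach researchers) are all illustrative inventions, not part of PlayOn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuildState:
    leader: str
    members: frozenset  # full roster (hypothetical; not just online members)


# Hypothetical threshold: flag a possible fracture if this share of members leaves at once.
FRACTURE_SHARE = 0.30


def detect_events(prev: dict, curr: dict) -> list:
    """Compare two snapshots ({guild name: GuildState}) and list rare events."""
    events = []
    for name, before in prev.items():
        after = curr.get(name)
        if after is None:
            # Guild absent from this snapshot; skip rather than guess why.
            continue
        if after.leader != before.leader:
            events.append((name, f"leader change: {before.leader} -> {after.leader}"))
        departed = before.members - after.members
        if before.members and len(departed) / len(before.members) >= FRACTURE_SHARE:
            events.append(
                (name, f"possible fracture: {len(departed)} of {len(before.members)} left")
            )
    return events


def notify(events, send=print):
    """Forward each event to researchers; `send` is a placeholder channel."""
    for guild, description in events:
        send(f"[guild alert] {guild}: {description}")
```

Running `notify(detect_events(previous_snapshot, current_snapshot))` after each collection cycle would surface only the handful of guilds worth a closer ethnographic look, rather than asking researchers to watch every guild continuously.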

Our examples highlight the ways in which qualitative and quantitative research teams can complement each other, but they also point to the need for technical expertise in extracting and processing large data sets from VWs, expertise that psychologists and ethnographers often lack. Thus, we believe that VWs and similar Web 2.0 spaces signal an emerging mixed or “hybrid” methodology that depends on agile collaboration among quantitative researchers, qualitative researchers, and software engineers.

Why bother?

This is not just an academic enterprise. The ability to glean this data has many implications: for designing and scaffolding online communities, for learning about personality and social behavior in online worlds, and for mapping digital personas to physical needs.

Using this architecture for more tailored marketing is one commercial opportunity. In addition to basic demographics, inferences about personality may lead to more nuanced methods of targeted advertising. And the ability to infer demographics from online interaction metrics helps fill in the gaps left by zip code segmentation alone. After all, not everyone who lives in your neighborhood (or in your home!) is exactly like you…
