“If I could ever build the ideal researcher, I would clone someone who’s a third McKinsey consultant, a third data scientist, and a third someone who’s got the current research skills and instincts, who is concerned about the quality of data, and who knows how to do surveys, to explore complex issues that you can’t address through big data.”
~ Howard Shimmel, former CRO of Turner Broadcasting, quoted in The Insights Revolution: Questioning Everything
Insights professionals today tend to be well versed in the art of the survey, but most are far less familiar with the world of data science. The reality is that we need to learn about data science, and fast.
Fortunately, a powerful new book, Bit by Bit: Social Research in the Digital Age, by Princeton professor Matthew J. Salganik, provides a thorough and very readable introduction to the intersection of traditional survey-based methodologies and data science. “This book,” he writes, “is for social scientists who want to do more data science, data scientists who want to do more social science and anyone who is interested in the hybrid of these two fields.”
He makes this intersection relatable for insights professionals by focusing on research design. “If you think of social research as the process of asking and answering questions about human behavior, then research design is the connective tissue; research design links questions and answers.”
The book has chapters on observing behavior, asking questions, running experiments, creating mass collaboration, and ethics. It concludes with a chapter on the future of digital social research. Each chapter is illustrated with vivid examples drawn from hundreds of studies (the references alone run to 55 pages).
Bit by Bit is written to be “helpful, future oriented and optimistic.” Salganik’s enthusiasm for the subject shines through on every page. And he has an outstanding ability to distill what could be complex and dry subjects into very engaging bite-sized lessons. This one example, from the introduction of the book, neatly captures the power of combining survey research methods and big data:
“In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls from family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. These researchers were studying wealth and poverty by conducting a survey of a random sample of people from a database of 1.5 million customers of Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the randomly selected people if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.
Everything I have said so far makes this sound like a traditional social science survey. But what comes next is not traditional—at least not yet. In addition to the survey data, Blumenstock and colleagues also had the complete call records for all 1.5 million people. Combining these two sources of data, they used the survey data to train a machine learning model to predict a person’s wealth based on their call records. Next, they used this model to estimate the wealth of all 1.5 million customers in the database. They also estimated the places of residence of all 1.5 million customers using the geographic information embedded in the call records. Putting all of this together—the estimated wealth and the estimated place of residence—they were able to produce high-resolution maps of the geographic distribution of wealth in Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.
It was impossible to validate these estimates because nobody had ever produced estimates for such small geographic areas in Rwanda. But when Blumenstock and colleagues aggregated their estimates to Rwanda’s thirty districts, they found that these estimates were very similar to those from the Demographic and Health Survey, which is widely considered to be the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about ten times faster and fifty times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and cheaper estimates create new possibilities for researchers, governments, and companies.”
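For readers who think in code, the workflow in this example (survey a sample, train a model on it, predict for everyone, then aggregate by area) can be sketched in a few lines of Python. This is a toy illustration only, not the method Blumenstock and colleagues actually used. The data are synthetic, the single feature calls_per_day and the four regions are hypothetical, and a one-variable least-squares line stands in for their machine learning model.

```python
import random
from collections import defaultdict
from statistics import mean

random.seed(0)

# Everything below is synthetic: a made-up customer database in which
# each customer has a region and one call-record feature.
N_CUSTOMERS = 5000
N_SURVEYED = 200
REGIONS = ["north", "south", "east", "west"]  # hypothetical areas

customers = [
    {
        "id": i,
        "region": random.choice(REGIONS),
        "calls_per_day": random.uniform(0, 20),  # hypothetical feature
    }
    for i in range(N_CUSTOMERS)
]


def simulated_survey_answer(customer):
    """Stand-in for a survey response: a noisy wealth score (assumed)."""
    return 10 + 3 * customer["calls_per_day"] + random.gauss(0, 5)


# Step 1: survey a random sample of customers.
surveyed = random.sample(customers, N_SURVEYED)
xs = [c["calls_per_day"] for c in surveyed]
ys = [simulated_survey_answer(c) for c in surveyed]


# Step 2: train a model on the surveyed customers only.
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(xs, ys)

# Step 3: use the model to estimate wealth for every customer.
for c in customers:
    c["predicted_wealth"] = intercept + slope * c["calls_per_day"]

# Step 4: aggregate the individual estimates by region.
by_region = defaultdict(list)
for c in customers:
    by_region[c["region"]].append(c["predicted_wealth"])
region_estimates = {r: mean(v) for r, v in by_region.items()}

for region in sorted(region_estimates):
    print(f"{region}: {region_estimates[region]:.1f}")
```

The point of the sketch is the division of labor: the survey is asked of only 200 simulated people, but the model lets those answers inform an estimate for all 5,000 records, which can then be rolled up to whatever geographic level is needed.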
If you are interested in learning how to do research that is faster, cheaper, more accurate, and more actionable, I would strongly encourage you to pick up a copy of Bit by Bit: Social Research in the Digital Age.