Two views on Big Data

Big Data is here to stay, and like many others, I'm trying to make sense of its academic and social implications. In particular, how will Big Data contribute to our wellbeing?

I am reading two interesting but widely different perspectives. On the one hand, there are computer scientists trying to get the most out of a new tool. Take, for example, Alex Pentland, who directs the Human Dynamics Laboratory at MIT, and his new book "Social Physics: How Good Ideas Spread - The Lessons from a New Science". Pentland exudes confidence: "I live in the future... at the centre of the innovation universe". His book aims to "extend economic and political thinking by including not only political forces but exchanges of ideas, information, social pressure and social status". He suggests we can explain the social world through the "patterns of human experience" that can be derived from the traces people leave while using digital devices (call records, credit card purchases, GPS locations, etc.).

Pentland points out that his vision of a "data-driven society implicitly assumes that the data will not be abused". We know, however (from leaked documents and from human history), that this is not something we can simply rely on; we cannot assume that any government or company will respect our rights or our privacy. But the Big Data discussion is not just about ethics; it is also about what we can expect from Big Data methods.

Others, such as danah boyd and Kate Crawford, social scientists at Microsoft (danah is also at Harvard, and Kate at NYU and the University of New South Wales), have taken a more critical view.



In their paper (boyd & Crawford, 2012), they make six interesting points. The first four concern the limitations of the methods, while the last two are ethical:
  1. "Big Data changes the definition of knowledge" because, as an instrument used to see, understand, and influence the world, it imposes its own limitations.
  2. "Claims to objectivity and accuracy are misleading": observations are neither 'objective' (data must always be interpreted, which is a subjective process) nor accurate (there is data loss and noise).
  3. "Bigger data are not always better data" reinforces the point above and stresses that having more data does not eliminate, but actually increases, the need for valid research methods. Sampling is key: for example, the fact that certain emotions are more prevalent amongst people using Facebook does not mean ipso facto that they are more prevalent in the population at large.
  4. "Taken out of context, Big Data loses its meaning" is best illustrated by their example: the fact that two people spend a lot of time together (e.g. coworkers), possibly more than the time they spend with others (e.g. spouses), does not imply they are closer.
  5. "Just because it is accessible does not make it ethical" makes the case for considering research ethics in all Big Data problems. Although it is increasingly common to use Twitter or Facebook data, researchers need to be aware of the implications of using and releasing the data. Evidence suggests it is impossible to guarantee 100% anonymity once the data is released.
  6. "Limited access to Big Data creates new digital divides" focuses on which researchers can actually access this data. Large companies may have access to it, but the research they perform will be driven by their commercial interests. The authors focus on the divide that Big Data foments among researchers. This is not about the envy some may feel at not being able to access the wealth of information held by those companies; it is about the kinds of questions those with access are likely to ask, and not ask.
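Point 3 can be made concrete with a toy simulation. The sketch below is purely hypothetical: the `simulate` function, the 20% base rate and the joining probabilities are invented for illustration and are not taken from boyd and Crawford or from any Facebook data. It assumes that people who have some trait are more likely to join a platform, and compares a very large sample of platform users with a small random sample of the whole population.

```python
import random


def simulate(population_size=200_000, trait_rate=0.2,
             join_if_trait=0.8, join_if_not=0.3,
             random_sample_size=1_000, seed=42):
    """Compare a large self-selected sample with a small random sample.

    All parameter values are hypothetical, chosen only to illustrate
    selection bias; they do not describe any real platform.
    """
    rng = random.Random(seed)

    # Each person either has the trait (True) or not (False).
    population = [rng.random() < trait_rate for _ in range(population_size)]

    # Self-selection (assumed): people with the trait are more likely
    # to be on the platform, so they are over-represented among its users.
    platform_users = [
        has_trait for has_trait in population
        if rng.random() < (join_if_trait if has_trait else join_if_not)
    ]

    # A small simple random sample drawn from the whole population.
    random_sample = rng.sample(population, random_sample_size)

    def rate(group):
        # Fraction of the group that has the trait.
        return sum(group) / len(group)

    return {
        "true_rate": rate(population),
        "big_biased_n": len(platform_users),
        "big_biased_rate": rate(platform_users),
        "small_random_n": len(random_sample),
        "small_random_rate": rate(random_sample),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value}")
```

With these assumed numbers, the platform sample should contain roughly 80,000 people, about 80 times more than the random sample, yet its estimate should sit near 40% (0.16 / 0.40), double the true rate of about 20%. The 1,000-person random sample should land close to 20%. Adding more platform users would only make the wrong answer more precise.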

Perhaps Skinner's behaviourism contributed to our understanding of the mind in the same way that Big Data can help us understand societies. But Big Data also shares many of the limitations behaviourism had, e.g. a disregard for subjective, qualitative experiences (which Skinner dismissed as mentalism). Given that we now understand the limitations of radical behaviourism, and how it influenced our perceptions of the world, we should try not to repeat the same mistakes. As Pentland indicates, Big Data can be used to influence societies, not just individuals. We should make sure that Big Data acknowledges free will and does not aim to engineer behaviour (as behaviourism did), but rather supports human strengths and potential.


Pentland, Alex. Social Physics: How Good Ideas Spread - The Lessons from a New Science. Scribe Publications, 2014.
boyd, danah, and Kate Crawford. "Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon." Information, Communication & Society 15.5 (2012): 662-679.