I recently attended the Nordic-Baltic Security Summit in Tallinn, where I presented on two areas I have been working on for a very long time: data supply chains and blockchain, along with the security implications of both. I have been increasingly concerned about data supply chains for several years, so rather than summarise the event in this post, I decided to go into more detail about these concepts. While I work in an academic context, the impact of these data supply chains is not theoretical. It is real, and the consequences are increasingly affecting people’s lives. As usual, Facebook very kindly illustrated why data supply chains matter so much the day before I gave my presentation on them.

Dr. Catherine Mulligan, Nordic-Baltic Security Summit (Estonia, April 2019)

The world urgently needs collaboration on these issues. The risks emerging from data should be treated as seriously as those of nuclear energy: a potentially positive tool that also has monumentally destructive capabilities. We must start the work required to balance the negative aspects of technology against the positive ones and put appropriate controls in place; understanding the role of data supply chains is the first step in that process.

The real-world impact of these new types of supply chain is one of the reasons I am so excited about working with GovTech Lab on the projects we have planned, and with the great team that has formed around them. There’s lots of work to do, and I can’t think of a better place to do it from!

Data Supply Chains, Risk and Security

I first started working on data supply chains in about 2006, when it had become evident that a new form of economic activity was emerging around data and Open APIs. This phenomenon inherently contained the same issues of inequality and balance of power found in conventional physical supply chains such as coffee or chocolate. I eventually named this an “Information-Driven Global Commodity Chain” and published a book on it in 2011. Since 2006, I have amassed 13 years’ worth of data and analysis on these supply chains and have been investigating the new issues they raise for both economics and the commercial world: how will these need to evolve to respond to new ways of organising the world’s economy? How do we as engineers need to adapt and change as the systems we build become more critical to our society and economy?

Simply put, a data supply chain is a supply chain made up of data. What does that look like? One example is illustrated below.

Figure 1: An Information-Driven Global Commodity Chain (Mulligan, 2011)

On the far left, we have inputs such as IoT data from sensors made by many different companies, consumer data stored in corporate databases, or the data exhaust from an end user’s Android or iPhone. Data supply chains are everywhere, and most often they have been used to serve adverts to people. There are, however, numerous other uses, as the Facebook example above and the use of Google data to identify possible suspects in crimes both show. Data submitted to Facebook has been sold or given to third parties, who then stored it without proper security protection. No one seems to have been adversely affected so far, but what if they had been? What if user account data had been used to track and trace people, find out where they live or commit identity theft? Who is responsible for the possible negative consequences of such activities?

Google, meanwhile, has received requests from law enforcement for data on which mobile devices were at the location of a crime. This is another type of data supply chain: the one associated with national surveillance. There have already been several cases of innocent people being arrested on the basis of this type of data.

There are, therefore, many issues in data supply chains that remain to be fully understood. In previous research funded by an EPSRC grant, we laid out a research agenda for data supply chains and compared and contrasted them with existing physical supply chains. A summary is illustrated below:

Adapted from Gurguc, Z., Mulligan, CEA, Journal of International Production Economics, 2018

So, while physical supply chains have well-established rules of engagement and are well studied and relatively well understood, data supply chains still need a more detailed research agenda, to be delivered by the economics, technology and policy communities. This needs to happen as soon as possible to support our society’s transition to a digital economy.

Insurance and Consent

Insurance is the lifeblood of business: without it, many companies cannot operate. If I were an insurer looking at some of the data business models in the world today, I would start to ask myself one question: “how can I quantify the risk associated with such data supply chains?” What is the potential liability if something terrible happens because third parties take data and misuse it? A critical question, therefore, is how to price the risk associated with data supply chains. This remains a wide-open research question.

A second question that needs answering is how we create data supply chains that let people enjoy the benefits of digital technologies while minimising the use of the associated data exhaust without their knowledge and, in some cases, their consent. This requires data supply chains that can manage and control the provenance of data, provide assurance that the data has not been manipulated in transit, and ensure that the correct permissions are in place to use the data in the context of that supply chain. The context of the data supply chain can therefore also change the type and manner of data usage permitted within it. These are some of the complex technical and policy issues we are addressing in GovTech Lab.
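To make those three requirements (provenance, tamper evidence and context-specific permissions) a little more concrete, here is a minimal Python sketch. It assumes a simple hash-chained provenance log rather than a full blockchain, and every name in it (`ProvenanceRecord`, `DataSupplyChain`, the actors and the purposes) is hypothetical and purely illustrative, not a GovTech Lab design. Each stage records who handled the data, for what purpose, and a SHA-256 hash of what it passed on; each record also stores the hash of the record before it, so later edits to the history become detectable, and a stage is refused outright if its purpose lies outside those the data subject consented to.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    actor: str         # hypothetical: who handled the data at this stage
    action: str        # e.g. "collected", "aggregated", "shared"
    purpose: str       # purpose the data is used for at this stage
    payload_hash: str  # SHA-256 of the data as it left this stage
    prev_hash: str     # hash of the previous record, chaining the log

    def record_hash(self) -> str:
        # Hash every field, so changing any one of them changes the result.
        body = json.dumps(self.__dict__, sort_keys=True).encode()
        return sha256(body)


@dataclass
class DataSupplyChain:
    consented_purposes: set[str]  # hypothetical: purposes the data subject agreed to
    records: list[ProvenanceRecord] = field(default_factory=list)

    def append(self, actor: str, action: str, purpose: str, data: bytes) -> None:
        # Permission check: refuse any stage whose purpose lacks consent.
        if purpose not in self.consented_purposes:
            raise PermissionError(f"no consent for purpose {purpose!r}")
        prev = self.records[-1].record_hash() if self.records else "genesis"
        self.records.append(ProvenanceRecord(actor, action, purpose, sha256(data), prev))

    def verify(self, received: bytes) -> bool:
        # Provenance intact: each record points at the hash of its predecessor.
        prev = "genesis"
        for rec in self.records:
            if rec.prev_hash != prev:
                return False
            prev = rec.record_hash()
        # Payload intact: the data received matches what the last stage sent.
        return bool(self.records) and self.records[-1].payload_hash == sha256(received)


chain = DataSupplyChain(consented_purposes={"service delivery", "research"})
chain.append("sensor-vendor", "collected", "service delivery", b"reading=21.5")
chain.append("analytics-co", "aggregated", "research", b"mean=21.5")
print(chain.verify(b"mean=21.5"))  # True
print(chain.verify(b"mean=99.9"))  # False: payload altered in transit
try:
    chain.append("ad-broker", "shared", "advertising", b"mean=21.5")
except PermissionError as err:
    print(err)  # no consent for purpose 'advertising'
```

Hashing alone only makes tampering evident; it does not prevent it or prove who made a change. A production system would also need digital signatures binding each record to its actor, and a way for consent to be withdrawn, which this sketch deliberately leaves out.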

In another post, I will discuss software supply chains, which are distinct from data supply chains, and the need to manage and control them successfully in an era of heightened software security concerns and geopolitical tensions.