So far, 9.6 billion user accounts from 436 websites have been leaked. History has shown that data breaches can happen even to companies like Google and Microsoft, and events like the Facebook-Cambridge Analytica scandal have raised awareness of how companies store and use our data. If we can’t build bulletproof software, how can we reduce the impact those bullets have?
“I think computer viruses should count as life. I think it says something about human nature that the only form of life we have created so far is purely destructive. We’ve created life in our own image.”
Stephen Hawking
Our data isn’t safe
Most data breaches don’t happen by having your personal computer hacked. When you use an online service, you entrust it with your personal data. According to HaveIBeenPwned, 9.6 billion user accounts from 436 websites have been leaked so far. Stolen personal information ranges from email addresses to passwords, credit card details, home ownership statuses and financial investments. And there are probably more breaches that haven’t been made public yet.
One might believe that using well known and trusted online services is a way of protecting yourself from having your data leaked. Surely, Google and Microsoft must be just too big to get breached, right?
After working at Bitdefender for 2 years, I’ve learned that “perfect security” doesn’t exist in software. Investing in security is mandatory and it reduces the number of vulnerable points a tech product has, but becoming bulletproof is not feasible for complex software systems. Even a simple presentation website, like an online flyer programmed during a coffee break, depends on complex components like an operating system, some online protocols, a programming language, server hardware and so on, and each of these components might contain a flaw that hackers can exploit to access the flyer.
“Security in IT is like locking your house or car — it doesn’t stop the bad guys, but if it’s good enough they may move on to an easier target.”
Paul Herbka
And while big companies can afford improved tech security, they also own more customer data, which means hacking them is more profitable. Microsoft, Google, Adobe, eBay, Canva, Equifax, LinkedIn, Yahoo, Myspace, Uber, Ubisoft, Sony, Slack, Mozilla, Blizzard, British Airways, Bank of America, Patreon, Quora, Reddit, Steam, Twitter and Twitch are some of the big names which are believed to have had some of their data breached.
Lack of data regulations
Ok, we got it, our data isn’t safe. And we can’t know all the information companies store about us either. So, what can we do? Can we push governments to impose stricter software laws?
Well, awareness of potential disasters tends to rise only in hindsight. Humans understood the threat of earthquakes after surviving catastrophes. Unfortunately, we need a history of obviously damaging events before we take steps towards preventing them, and cybersecurity is no different. This is why most countries were not prepared for the current Coronavirus pandemic, despite some great minds warning us about such outbreaks. It is also why it takes us so long to understand and address the threats of global warming before we are affected by them. Humans are bad at factoring in the unseen benefits of things we take for granted.
We all know the Facebook-Cambridge Analytica scandal from early 2018, when it was revealed that Cambridge Analytica had harvested personal data from millions of Facebook profiles without their owners’ consent and used it for political advertising. Facebook was then fined $5bn for this, a sum which represents 40% of their first quarter revenue. This event, among others, has served as a wake-up call for tighter software regulations and for public awareness of how personal data is used.
“Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.”
Edward Snowden
Personal Data Accounts (PDAs)
Ok, so we can’t fully prevent data breaches and governments aren’t going to tighten our software rules overnight. Then what’s left to be done?
Well, we can limit the damage done by having our data stolen, whenever it happens. Think of your sensitive data as plutonium: it’s dangerous and once it’s leaked, there’s no going back.
“Ethical tech is plutonium (personal data) handling”
Irene Ng, CEO at Dataswift
So what if we store our plutonium separately, in a safe that can only be opened with our original database? This way, if our database gets disclosed, no sensitive data will be lost and the whole event is recoverable. And if the plutonium gets leaked, it’s useless without access to our database. The only way we’ll be in trouble is if both containers get stolen by the same entity, which is still possible, but far less likely.
“We should treat personal electronic data with the same care and respect as weapons-grade plutonium — it is dangerous, long-lasting and once it has leaked there’s no getting it back.”
Cory Doctorow
Let’s push this idea a bit further. If our purpose is protecting customers, what if we could store the plutonium in such a way that customers have full control over it, and they can always choose what sensitive data we store about them? This can be done using personal data accounts. The concept of a personal data account is the cornerstone of effective data sovereignty, for the simple reason that I can only control what is with me. It enables customers to own their sensitive information, it prevents data leaks from becoming catastrophes, and it helps build customers’ trust. And we don’t have to build everything from scratch: there are PDA providers out there.
“It used to be expensive to make things public and cheap to make them private. Now it’s expensive to make things private and cheap to make them public.”
Clay Shirky
Ok, but how does the user gain “full control” over their data? Well, after we issue their PDA, the customer will have direct access to their data through our provider. If the PDA were a credit card, our provider would be the customer’s “bank” and we’d only be a merchant. The customer owns the data, which is stored and hosted by another party; we merely have access to it for as long as the customer wants us to.
PDAs with conventional design
But how does this approach look compared to the well-known conventional design? Is it feasible to build around personal data accounts, or to integrate them into an already developed architecture?
Let’s consider an oversimplified architecture for a blogging platform. We’d store posts, comments, likes, users and other data in the same place. See the issue? With this approach, if our DB gets compromised, attackers will have access to everything.

Now, let’s try to move our plutonium somewhere else.

Instead of keeping whole user accounts in our own DB, we can store only unique identifiers for those PDAs, so we can fetch or update them at any time. Personal data account providers usually expose simple CRUD (Create, Read, Update and Delete) APIs for manipulating your users’ data. For example, PDAs can be hosted by the provider on separate servers and the interface could be exposed as a REST service.
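
To make the CRUD idea a bit more concrete, here is a minimal sketch of a thin client for such a provider. Everything specific in it is an assumption made for illustration: the `createPdaClient` name, the `/pdas` routes, the JSON payloads and the base URL are hypothetical, and a real provider will document its own API and authentication. The HTTP function is passed in, so the sketch can be tried against a stub instead of a live service.

```javascript
// Minimal sketch of a client for a HYPOTHETICAL PDA provider REST API.
// Routes, payload shapes and status codes are assumptions, not a real API.
function createPdaClient(baseUrl, fetchFn = fetch) {
  async function request(method, path, body) {
    const response = await fetchFn(`${baseUrl}${path}`, {
      method,
      headers: { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!response.ok) {
      throw new Error(`PDA provider responded with ${response.status}`);
    }
    // A successful delete is assumed to answer 204 with an empty body.
    return response.status === 204 ? null : response.json();
  }

  const pathFor = (userId) => `/pdas/${encodeURIComponent(userId)}`;

  return {
    create: (userId, data) => request("POST", "/pdas", { userId, data }),
    read: (userId) => request("GET", pathFor(userId)),
    update: (userId, data) => request("PUT", pathFor(userId), data),
    remove: (userId) => request("DELETE", pathFor(userId)),
  };
}
```

With a client like this, our own code only ever holds the `userId`; the sensitive fields travel between the customer and the provider.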
But how, exactly, will the new backend architecture look? Let’s dive into three common user flows: registering, fetching a profile page and updating it:

On registration, we’ll only store a new unique ID for the new customer’s entry in our database. Anything else which is considered sensitive, the plutonium, goes into the PDA. The only architectural change is that, instead of creating the full user entry in our DB, we send an HTTP request to our provider.
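
The registration flow could be sketched roughly as follows. The `db` and `pda` objects are hypothetical interfaces (`db.insertUser` writes to our own database, `pda.create` asks the provider to open a PDA), and the list of sensitive fields is an assumption; what counts as plutonium depends on the product.

```javascript
const { randomUUID } = require("node:crypto");

// Assumption for this sketch: these are the fields we treat as plutonium.
const SENSITIVE_FIELDS = ["email", "fullName", "birthDate"];

// `db` and `pda` are hypothetical interfaces, injected so the flow is testable.
async function registerUser(db, pda, form) {
  const userId = randomUUID();
  const sensitive = {};
  const harmless = {};
  for (const [field, value] of Object.entries(form)) {
    if (SENSITIVE_FIELDS.includes(field)) {
      sensitive[field] = value;
    } else {
      harmless[field] = value;
    }
  }

  // The plutonium goes to the provider, keyed by our unique ID...
  await pda.create(userId, sensitive);

  // ...while our own database keeps the ID and non-sensitive fields only.
  await db.insertUser({ userId, ...harmless });
  return userId;
}
```

Creating the PDA first means a provider failure leaves nothing behind in our database; the opposite failure (PDA created, database insert failed) would still need a cleanup step that is not shown here.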
Fetching and updating the customer’s profile are done similarly:
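
A rough sketch of both flows, under the same kind of assumptions: `db.findUser`, `pda.read` and `pda.update` are hypothetical interfaces standing in for our database layer and the provider’s client.

```javascript
// `db` and `pda` are hypothetical interfaces, injected for illustration.
async function getProfile(db, pda, userId) {
  const user = await db.findUser(userId);   // our DB: the ID and harmless data
  if (!user) return null;                   // unknown user: nothing to fetch
  const personal = await pda.read(userId);  // the plutonium, fetched on demand
  return { ...user, ...personal };
}

async function updateProfile(db, pda, userId, newData) {
  const user = await db.findUser(userId);
  if (!user) throw new Error(`Unknown user ${userId}`);
  // Sensitive fields never touch our database; they go straight to the provider.
  return pda.update(userId, newData);
}
```

The merged profile only exists in memory for the duration of the request, so a dump of our database still contains no sensitive fields.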

For example, if our backend were written with Node.js and MongoDB, the main architectural change for our users’ data would be that, instead of calling:
db.users.updateOne({ userId }, { $set: newData })
we would now call (with http standing in for whichever HTTP client we use):
http.put({ url: provider, data: { userId, newData } })
And that’s it, in short. It requires some more error handling and some initialisation steps, but integrating PDAs is easier than it looks, and their value outweighs the time spent adding them.
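
As an illustration of what that extra error handling might look like, here is a hedged sketch of the update call using the Fetch API (available globally in Node.js 18 and later). The provider URL, the `PdaError` class, the `updateUserData` name and the retry policy are all assumptions for the example, not part of any real provider’s API.

```javascript
// HYPOTHETICAL endpoint; a real provider documents its own URL and auth.
const PROVIDER_URL = "https://pda-provider.example/pdas";

// Marks failures reported by the provider itself (as opposed to the network).
class PdaError extends Error {}

async function updateUserData(userId, newData, { fetchFn = fetch, retries = 2 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const response = await fetchFn(`${PROVIDER_URL}/${encodeURIComponent(userId)}`, {
        method: "PUT",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(newData),
      });
      if (response.ok) return response.json();
      // A 4xx answer means our request is wrong; retrying will not help.
      if (response.status < 500) {
        throw new PdaError(`Provider rejected the update (${response.status})`);
      }
      lastError = new PdaError(`Provider failed (${response.status})`);
    } catch (error) {
      if (error instanceof PdaError) throw error;
      lastError = error; // network failure: try again
    }
  }
  throw lastError;
}
```

Retrying is reasonable here because PUT is idempotent: sending the same update twice leaves the PDA in the same state. A production version would likely add a timeout and a delay between attempts.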