Making the most of a bad thing – using public data breaches for good

Data breaches in which the contents are made public can be a security headache for many, but they can also serve a useful purpose. Companies, network defenders, and investigators can frequently extract value from examining leaked data and answer questions which might otherwise be impossible to address.

Public data breaches, where attackers have stolen personal data and leaked it publicly, are commonplace. At the time of writing, the security website HaveIBeenPwned had collected more than 10.5 billion online accounts which had been caught up in such data breaches, more than one for every person on the planet. While data breaches can have a severe impact on the businesses themselves, affecting customer confidence and even shareholder value, the collateral damage to their users can also be far-reaching.

Cybercriminals can be motivated to leak this data publicly to enhance their status among the cybercriminal underground. They may also be motivated to get access to this data as it often contains email addresses useful for sending phishing emails, and passwords which can be used for password-guessing attacks and account takeovers. The existence of these datasets, often easily downloaded or purchased from various data-breach sharing forums, presents a security threat to individuals and businesses.

Using public data breaches for enhanced security

Technology companies such as Google are now using databases of passwords from public data breaches to stop users from choosing passwords which have already been leaked – a smart way of improving user security. Similarly, public data breaches can be queried as part of reviews of individuals’ public digital footprints to reduce their cyber, reputational and physical risks.
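To make the idea of checking passwords against leaked data concrete, here is a minimal sketch in Python. It is not a description of how Google or any other company implements this. The function names and the three-entry sample list are hypothetical and chosen purely for illustration. The sketch assumes the breached passwords are available locally and compares SHA-1 hashes rather than plaintext, so the lookup set does not hold the passwords themselves.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the uppercase SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_breach_index(leaked_passwords):
    """Build a set of hashes from an iterable of leaked passwords.

    Hypothetical helper: a real corpus would hold many millions of
    entries and would normally be distributed as hashes already.
    """
    return {sha1_hex(p) for p in leaked_passwords}


def is_breached(candidate: str, breach_index) -> bool:
    """Return True if the candidate password's hash is in the index."""
    return sha1_hex(candidate) in breach_index


# Illustrative sample only, not taken from any specific breach.
SAMPLE_LEAKED = ["123456", "password", "qwerty"]
index = build_breach_index(SAMPLE_LEAKED)

if is_breached("password", index):
    print("This password has appeared in a breach; choose another.")
```

A sign-up or password-change form could call a check like `is_breached` before accepting a new password. In practice a service would more likely query a large hosted corpus, ideally in a way that avoids sending the full password or its full hash to a third party.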

Using public data breaches for due diligence

Ideologically-motivated public data breaches, in contrast to financially-motivated leaks, are less common but can have wide-reaching and profound effects. The leak of data belonging to the privacy-focused offshore Panamanian law firm Mossack Fonseca in 2016 led to resignations of senior political figures, as well as increased scrutiny of many more politicians and businesspeople due to the firm’s suspected involvement in money-laundering and tax evasion. This dataset is now routinely queried by businesses as part of due diligence screening for potential customers to check for corruption and money-laundering risks. However, while appearance in a dataset may be a “red flag” for due diligence analysts, the importance of critically reviewing leaked data should not be overlooked, given the propensity for data leaks to contain data which is false or which can be misinterpreted.

This week saw the announcement of a further ideologically-motivated compromise of data from Gab, a social networking service popular with users with far-right political leanings. The leak was revealed by the group DDOSSecrets, an organisation previously known for other leaks, including data obtained from the Cayman National Bank in late 2019 which contained details of the offshore bank’s customers. An attacker allegedly gained access to Gab backend databases and stole 70GB of data, including public and private chat data. Of particular interest will be details of those who used the platform and were involved in the January insurrection and storming of the Capitol building in Washington DC. These details will no doubt be used by some to identify the instigators, serving a purpose that many would strongly argue is in the public interest.

The importance of critical analysis

The leaked dataset has been announced publicly but only shared with a select group of journalists and researchers – a responsible approach, in marked contrast to some previous public data leaks which have been made freely available, including to those without the inclination to critically review their contents.

Data leaks can contain false data. In the case of the Ashley Madison leak, the site did not require email verification, meaning users could sign up with any email address, including those appearing to relate to other people. For example, an email address which looked like it belonged to Tony Blair was famously present in the dataset. Without an understanding of a site’s technology and security processes, it would be easy for the uncritical eye to jump to conclusions, underlining the importance of critically analysing such data. In some suspected nation-state “hack and leak” disinformation operations, false data has been shown to have been inserted among legitimate information, as was almost certainly the case in the Macron email leaks of 2017.

Public data leaks have many potentially harmful effects on the compromised business and its users, but it is now imperative that businesses consider the contents of some of these datasets to help improve their own decision making, whether that be to improve security for their users, or to critically analyse potential customers’ presence in them.

MDR Cyber use publicly breached datasets to help protect our customers from cybersecurity threats, to conduct individual enhanced due diligence exercises, and to examine and defend reputations. We do not only look for traces within these datasets: our analysts are trained to corroborate findings and look for inconsistencies, which allows for a nuanced approach to their use.