Can blockchain disrupt data aggregators?

Blockchain at its core is a database – a distributed database accessible to anyone. Information, and hence anything of value that can be represented digitally (money, deeds, music, art, intellectual property, even votes), can be stored, transferred and retrieved on it securely and privately.

On the blockchain, trust and veracity are enforced not by the government or other endowed intermediaries (‘centralized’) but by incentivised mass co-operation and code (‘decentralized’).

Several high-utility businesses today are primarily data aggregators, and their business models amount to some form of economic rent-seeking as custodians of these datasets.

For example, take Credit Reporting Agencies (CRAs) like Experian, TransUnion and Equifax – the three leading CRAs in the US. Their core business works as follows: banks and lending institutions report individual-level credit data (new loan issuance, new credit cards, repayments, defaults etc.) to the CRAs. The CRAs use proprietary algorithms to compute a credit-worthiness score for each individual based on all the data they have collected. This credit score (with some, but not all, of the raw data) is then sold to other businesses for a fee. The most typical use is when an individual applies for a new loan: the financial institution fetches the credit score from a CRA and uses it to decide whether to give the loan and how much interest to charge.

In the absence of CRAs, it would be very difficult for a bank to issue a loan or credit card to a new customer. Hence, CRAs are very important for a robust lending market and easy access to credit. The CRAs charge businesses several dollars for a single fetch for a particular individual. How big is this business? Together, the top three CRAs have a combined market capitalization of over USD 65 billion.

While CRAs in their existing form today play a very important role in the smooth functioning of our credit system, they also suffer from major flaws –  

  • Extractive business model – Pricing does not reflect the actual costs of collecting and warehousing credit data; it is a form of economic rent-seeking made possible by the CRAs' position as the centralized source of trust.
  • Lack of privacy – When financial institutions share data with the CRAs, it comes with a lot of personally identifiable information and is stored as such. When the data is retrieved, it is up to the CRA how much to share with the requesting business.
  • Lack of individual ownership – While the data being collected and traded is about individuals, they have no say in how it is stored or who has access to it.
  • No monetization for the individual – Because individuals don’t own their data, they don’t benefit from it financially either. The entire value is captured by the CRAs (and not by the banks or individuals who are the primary parties to the data).
  • Lack of alternate data – Because these datasets are private, they tend to be non-composable, which makes it difficult to combine alternate data (e.g. income data, investment behaviour etc.) with credit behaviour to build a more comprehensive picture. This especially hurts individuals with a thin credit history – they either get rejected or face high interest rates.
  • Limited credit scoring models – Since the data is private and the credit scoring is proprietary, alternate, innovative credit scoring mechanisms are prevented from emerging. At best, this limits innovation; at worst, it can perpetuate economic biases in lending.

Blockchain has the potential to solve all of the above issues while bringing down the costs of running a CRA. All credit behaviour data can be written onto the blockchain in a privacy-preserving way instead of into a private database, and reading from it can be very cheap. Access to a user’s data can be permissioned, with the user receiving a share of the fee paid by the business requesting the credit score. Different businesses can take the raw credit behaviour data, enrich it with alternate data and come up with their own credit models suited to their business model and consumers.
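To make the permissioned-access idea a little more concrete, here is a minimal Python sketch of the flow described above. It is only an in-memory toy, not an actual smart contract: the `CreditLedger` class, its method names, the `"network"` account and the 50% user share are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Toy in-memory stand-in for an on-chain credit registry (hypothetical)."""
    user_share: float = 0.5  # assumed fraction of each fee paid to the data owner
    records: dict = field(default_factory=dict)   # user -> list of credit events
    grants: dict = field(default_factory=dict)    # user -> set of approved requesters
    balances: dict = field(default_factory=dict)  # account -> accumulated fees

    def report(self, user, event):
        # A lender appends a credit event (e.g. a repayment) for a user.
        self.records.setdefault(user, []).append(event)

    def grant(self, user, requester):
        # The user explicitly permits a business to read their data.
        self.grants.setdefault(user, set()).add(requester)

    def fetch(self, user, requester, fee):
        # Reads succeed only with the user's permission; the fee is split
        # between the user and the network (e.g. node operators).
        if requester not in self.grants.get(user, set()):
            raise PermissionError(f"{requester} has no grant from {user}")
        user_cut = fee * self.user_share
        self.balances[user] = self.balances.get(user, 0.0) + user_cut
        self.balances["network"] = self.balances.get("network", 0.0) + (fee - user_cut)
        return list(self.records.get(user, []))
```

A lender would call `report`, the individual would call `grant` for a specific business, and only then would that business's `fetch` succeed and credit part of the fee to the individual. A real deployment would also need the privacy-preserving storage mentioned above, which this sketch leaves out.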

Let’s take another example of a data aggregator – employee background verification services. 

One study found that 96% of employers in the United States conduct at least one type of background check. Background checks are useful for obvious reasons – they help you ascertain that the person you are hiring is who they claim to be (in terms of identity, education, skill set etc.). Resume fraud has become quite rampant – the Cedalius Group, in conjunction with the Wall Street Journal, found that 34% of all application forms contained “outright lies about experience, education, and the ability to perform essential functions on the job.” Cognizant India recently laid off 6% of its employees over failed background checks.

But background checks are expensive and time-consuming – a typical background check can take several weeks and cost between $25 and $500 depending on the thoroughness and coverage of the check (criminal records, employment history, credit history, DMV records etc.). Sterling Check, an employee background verification company, went public on NASDAQ in 2021 at a valuation of $2.2 billion.

Employment history can be aggregated on the blockchain in much the same way as discussed above for credit reporting. Institutions can submit their employees' data to the blockchain in a privacy-preserving way and, in turn, gain the ability to verify data submitted by others. This will drastically cut the time and cost of verifying the employment history of prospective candidates. Alternatively, data can be sourced via tokenized experience letters that an employer issues to its outgoing employees. And again, the fees paid for data verification can be split between the blockchain nodes and the employees (and even the past employers).
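One way to picture a "privacy-preserving" experience letter is as a salted hash commitment: only a digest goes on-chain, the employee holds the actual letter, and a prospective employer checks that the two match. The Python sketch below illustrates that idea under stated assumptions; the function names and record fields are hypothetical, and a salted SHA-256 commitment is just one possible technique (a real system might rely on digital signatures or zero-knowledge proofs instead).

```python
import hashlib
import hmac
import json
import secrets


def commit(record: dict, salt: bytes) -> str:
    # Canonical JSON so the same record always produces the same digest.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(salt + payload).hexdigest()


def issue_letter(record: dict):
    # The past employer creates the letter and the digest to publish on-chain.
    # The random salt keeps outsiders from guessing the record from the digest.
    salt = secrets.token_bytes(16)
    letter = {"record": record, "salt": salt.hex()}  # held by the employee
    return letter, commit(record, salt)


def verify(letter: dict, onchain_commitment: str) -> bool:
    # A prospective employer recomputes the digest from the letter the
    # candidate shares and compares it with the published commitment.
    recomputed = commit(letter["record"], bytes.fromhex(letter["salt"]))
    return hmac.compare_digest(recomputed, onchain_commitment)
```

In this sketch nothing personally identifiable is published: the chain only stores the digest, and the candidate decides who gets to see the letter itself. Any change to the record, such as stretching the end date, makes `verify` return `False`.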

The biggest challenge in disrupting established data aggregators is the cold start problem of sourcing the data to begin with. Here too blockchain has an advantage as it provides a bundled means of incentivisation via tokens. However, as always, success will still be primarily dependent on the creativity and tenacity of entrepreneurs who take on this challenge.

Big disruptions happen when new technology enables better business models (e.g. streaming, SaaS, the sharing/renting economy, DeFi), and blockchain has the potential to do the same for data aggregator businesses while putting individuals back in control of their own data. At Woodstock, we are on the lookout for founders working on these problems. If you or anyone you know is, hit me up here.


The information provided on this website is for educational purposes only and should not be construed to be investment advice or considered to be a recommendation of any particular security, strategy or investment product. No portion of this content should be construed as an offer or solicitation for the purchase or sale of any security or investment. An offering may be made available only to certain sophisticated investors through official delivery of confidential offer documents along with other documents. Readers must understand that past performance is not a guarantee of future results.
