How zero-knowledge proofs can certify Machine Learning model accuracy

by **Pratyush Ranjan Tiwari, PhD student @Johns Hopkins**

ML as a service (MLaaS) is a business in which the model owner holds a model of high value: it either leverages sensitive data available only to the model owner, or it is the best model for its task. The latter can result from data asymmetry or from the model owner’s specialized ability to train high-accuracy models. In either case, the model is very valuable, so exposing its parameters publicly is undesirable. Hence, the model is kept private, and the only way to use it is to submit an inference request to the model owner.

However, current systems assume that the model owner is honest. The model owner self-certifies both the model’s accuracy and the fact that this model was actually used for inference. People pay to use ML models, and in the pay-per-inference setting it is not clear how to ensure that the model owner does not cheat. The model owner promises that the model achieves over x% accuracy, and there is no verification process.


Taking the Amazon Forecast feature as an example, assume Amazon promises 90% accuracy for forecasting business expenses for certain types of businesses:

- There is no way to verify the model’s accuracy: Amazon will not reveal the model parameters, since naively doing so would give away its very valuable model.
- Even if the model is very accurate, Amazon may want to save costs. Suppose it has two models: M1, with >90% accuracy, which costs Amazon 2 cents per inference, and M2, with 70% accuracy, which costs Amazon only 1 cent per inference.
- Even assuming a trusted third party that certifies M1’s accuracy, there is no way to know whether Amazon used M1, M2, or a random guess for your classification problem.
- Amazon can also use M2 every other time to save costs.
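
To make the incentive concrete, here is a small back-of-the-envelope calculation using the hypothetical figures above. It treats M1’s ">90%" as exactly 90% for simplicity, and the function name `blended` is just an illustrative choice.

```python
# Hypothetical figures from the example; M1's ">90%" is taken as exactly 90%.
cost_m1, acc_m1 = 2.0, 0.90  # cents per inference, accuracy
cost_m2, acc_m2 = 1.0, 0.70


def blended(frac_m2):
    """Expected (cost in cents, accuracy) when a fraction frac_m2 of requests go to M2."""
    cost = (1 - frac_m2) * cost_m1 + frac_m2 * cost_m2
    acc = (1 - frac_m2) * acc_m1 + frac_m2 * acc_m2
    return cost, acc


# "Every other time" means half of the requests are served by M2.
cost, acc = blended(0.5)
print(f"{cost:.2f} cents per inference, {acc:.0%} expected accuracy")
```

Under these assumptions, the provider cuts its cost by a quarter while the customer silently receives roughly 80% instead of the promised 90% accuracy, and a single inference result gives the customer no way to tell.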

Zero-knowledge (zk) proofs can help us resolve this issue. A zero-knowledge proof protocol allows a prover to convince a verifier that a given statement is true without revealing any other information. Interested readers can refer to the following material to learn more about zk proofs: non-technical introduction and relevance, theory of zk. In this case, the prover’s statement has two main parts. The prover proves that:

- I own a model for classification/inference for a particular task with high accuracy
- I used this same model to perform classification/inference on the given input x, and the classification/inference produces the output y

At a high level, the protocol proceeds as follows:

1. Amazon, the prover, cryptographically commits to M1. The commitment $c$ is binding (it cannot later be opened to a different model) and hiding (it does not reveal details about M1).
2. After receiving the commitment $c$, the verifier chooses a test input $i$.
3. The prover performs inference on $i$ under model M1 and prepares a proof $\pi$ which attests to the following statement: “The input $i$ classifies as output $o$ under a model whose commitment is $c$”.
4. The verifier checks this proof using the input $i$, the received inference output $o$, and the model commitment $c$.
5. Combining many of these proofs allows for proving model accuracy.
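
The commit step and the final "combine many proofs" step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a zk protocol: the commitment is a plain salted SHA-256 hash (a real system would use a commitment scheme that its proof system can reason about efficiently), the proof $\pi$ itself is not implemented, and the verifier is assumed to know the true labels of test inputs that it sampled independently. The helper names `commit`, `opens_to`, and `accuracy_lower_bound` are hypothetical, and the accuracy bound uses Hoeffding's inequality as one possible way to turn a count of verified correct outputs into an accuracy claim.

```python
import hashlib
import math
import secrets
from typing import Tuple


def commit(model_bytes: bytes) -> Tuple[str, bytes]:
    """Commit to a serialized model; returns (commitment c, secret nonce)."""
    nonce = secrets.token_bytes(32)  # fresh randomness hides the model
    c = hashlib.sha256(nonce + model_bytes).hexdigest()
    return c, nonce


def opens_to(c: str, model_bytes: bytes, nonce: bytes) -> bool:
    """Check that (model_bytes, nonce) is a valid opening of c (binding check)."""
    return hashlib.sha256(nonce + model_bytes).hexdigest() == c


def accuracy_lower_bound(correct: int, total: int, delta: float = 0.01) -> float:
    """Hoeffding lower bound on true accuracy, holding with probability >= 1 - delta.

    Assumes the `total` test inputs were sampled independently and that each
    counted output came with a proof that verified against the same commitment.
    """
    eps = math.sqrt(math.log(1 / delta) / (2 * total))
    return max(0.0, correct / total - eps)


# Step 1: the prover publishes c and keeps the model and the nonce private.
c, nonce = commit(b"serialized weights of M1")
assert opens_to(c, b"serialized weights of M1", nonce)
assert not opens_to(c, b"serialized weights of M2", nonce)

# Step 5: suppose 930 of 1000 verified outputs matched the true labels.
bound = accuracy_lower_bound(correct=930, total=1000)
print(f"accuracy is at least {bound:.3f} with 99% confidence")
```

In the actual protocol the commitment is never opened to the verifier; `opens_to` only illustrates why the prover cannot later swap M1 for M2. With 1000 test inputs the bound above comes out to roughly 0.88, which shows that the number of verified inferences directly controls how tight the certified accuracy can be.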

zk Proof protocols for ML models

Naively converting an ML model into an arithmetic circuit and proving statements about that circuit is not always feasible. In most cases such circuits are too large, and consequently the prover-side computation is very taxing. To resolve this issue, many research papers and projects have aimed to come up with specialized protocols for particular ML models. We know how to build (reasonably) efficient zk proof protocols to prove model classification and accuracy for the following ML models:

1) Linear and Logistic Regression

In linear regression, we are given a data set in which each data point has the form $(x_i, y_i)$, where $x_i$ is a vector of size $p$. The goal is to find a linear relationship between the inputs $x_i$ and the outputs $y_i$ such that the loss function is minimized. Mean squared error is a common loss function used for this purpose. Therefore, the model takes the form: