Gabriel Poesia

Increasing the Cost of Model Extraction with Calibrated Proof of Work (@ ICLR 2022)

Adam Dziedzic, Muhammad Ahmad Kaleem, Yu Shen Lu, Nicolas Papernot


This paper introduces a defense against model stealing. Suppose you make your (secret) ML model available online through an API. Legitimate users query it to get and use predictions, whereas an attacker might instead make queries in order to replicate the model. The authors propose adding a proof-of-work requirement to the API that gets harder the more information a query reveals about the model. This information is measured using the privacy cost from PATE, a framework from the ML differential privacy literature (it involves training a few teacher models and measuring disagreement between them).
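To make the mechanism concrete, here is a minimal sketch of a hashcash-style puzzle whose difficulty grows with a per-query leakage score. It is my own illustration under stated assumptions, not the authors' implementation: the names `disagreement`, `difficulty_bits`, `is_valid` and `solve`, the linear mapping from score to bits, and all constants are hypothetical, and the disagreement score is only a toy stand-in for PATE's actual privacy accounting.

```python
import hashlib
from collections import Counter
from itertools import count


def disagreement(votes):
    """Toy leakage proxy: fraction of teachers that do not vote for the
    plurality label. A stand-in, NOT PATE's privacy cost."""
    top_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - top_count / len(votes)


def difficulty_bits(leakage, base_bits=8, bits_per_unit=4.0, max_bits=20):
    """Map a leakage score (>= 0) to a required number of leading zero bits.
    The linear mapping and the constants are assumptions for illustration."""
    return min(max_bits, base_bits + int(bits_per_unit * leakage))


def is_valid(challenge, nonce, bits):
    """Server-side check: SHA-256(challenge || nonce) starts with `bits` zero bits."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(challenge, bits):
    """Client-side brute force: about 2**bits hash evaluations on average."""
    for nonce in count():
        if is_valid(challenge, nonce, bits):
            return nonce


# A query on which the teachers mostly agree gets a cheaper puzzle than one
# on which they are split.
easy = difficulty_bits(disagreement([3, 3, 3, 3, 1]))
hard = difficulty_bits(disagreement([0, 1, 2, 3, 3]))
nonce = solve(b"query-id-123", easy)
print(easy, hard, is_valid(b"query-id-123", nonce, easy))
```

Verification costs the server a single hash, while the expected solving cost doubles with every extra required bit, so a small increase in the score translates into a large increase in the work a client has to do.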

This is a really interesting use of proof-of-work. That said, thinking of a real scenario, I find it odd that the assumed difference between attackers and regular users is that regular users query in-distribution (ID) while attackers query out-of-distribution (OOD). Depending on the application, I think both would be either OOD or ID. For instance, let's say the model is an image classifier. Where would regular users be taking their images from? If from their private datasets, then that'd be OOD (as would an attacker's). If from some source that is associated with the system itself, that would be ID, but then the attacker could just do the same. There might be cases where this distinction makes sense, though my non-expert imagination cannot think of any on the spot.