Privacy-Preserving Natural Language Processing Using Homomorphic Encryption
Natural language, whether in its written or spoken form, contains some of the most sensitive information we produce. Both its content (what is said or written) and its method of production (how something is said or written) can contain personally identifiable information. These vulnerabilities make natural language documents prime targets for malicious actors. Although data is often encrypted at rest and in transit, it must still be decrypted before current NLP models can process it. To address this, we need NLP algorithms that operate on obfuscated data without sacrificing data utility or algorithmic accuracy.
One promising technology for this task is homomorphic encryption (HE). HE is a type of encryption that allows polynomial operations to be performed on encrypted data without decrypting it. A data owner can send their encrypted data to the cloud for processing; a service provider can then perform computations on that data without needing a decryption key and without learning anything about the underlying data, and send the encrypted results back. Only the data owner can use their decryption key to recover the results of the server's private and secure computations.
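To make the workflow above concrete, here is a minimal sketch of an additively homomorphic scheme (a toy Paillier cryptosystem) in which multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The tiny hard-coded primes are purely illustrative assumptions; a real deployment would use a vetted HE library and much larger parameters.

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic encryption.
# The primes below are illustrative only -- NOT secure.
p, q = 17, 19              # real keys use primes of ~1024+ bits
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(u):
    """Paillier's L function: L(u) = (u - 1) / n."""
    return (u - 1) // n

# Precomputed decryption constant mu = L(g^lam mod n^2)^-1 mod n.
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    """Encrypt plaintext m with fresh randomness r coprime to n."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# The homomorphic property: multiplying ciphertexts adds plaintexts,
# so a server can compute on data it cannot read.
c1, c2 = encrypt(12), encrypt(30)
c_sum = (c1 * c2) % n2
print(decrypt(c_sum))      # -> 42
```

The server in this sketch would hold only `c1` and `c2` (and `n`), compute `c_sum`, and return it; the decryption step happens only on the data owner's side, since only they hold `lam` and `mu`.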
This talk will discuss the advantages and limitations of using homomorphic encryption to build privacy-preserving NLP, from work by Pathak et al. (2011) on a homomorphic model for speaker recognition, to our more recent work on performing non-polynomial operations within the encrypted domain (Thaine, Gorbunov, and Penn, 2019). In the latter, we show how to approximate non-polynomial activation functions in neural networks (e.g., ReLU and Sigmoid) without decrypting the inputs to those functions. We will then cover alternative methods for privacy-preserving computation and discuss their respective privacy trade-offs.
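As an illustration of why such approximations matter: HE schemes natively support only additions and multiplications, so a non-polynomial function like Sigmoid must be replaced by a polynomial before it can be evaluated on ciphertexts. The sketch below fits a low-degree polynomial to Sigmoid by least squares; the degree and interval are illustrative assumptions, not the specific construction from Thaine, Gorbunov, and Penn (2019).

```python
import numpy as np

# Sketch: replace Sigmoid with a low-degree polynomial so it can be
# evaluated homomorphically (HE supports only adds and multiplies).
# Degree 3 and the interval [-5, 5] are illustrative choices.
xs = np.linspace(-5.0, 5.0, 1000)
sigmoid = 1.0 / (1.0 + np.exp(-xs))

coeffs = np.polyfit(xs, sigmoid, deg=3)   # least-squares polynomial fit
poly = np.poly1d(coeffs)

max_err = np.max(np.abs(poly(xs) - sigmoid))
print(f"max approximation error on [-5, 5]: {max_err:.3f}")
```

Once the coefficients are fixed, evaluating `poly` on an encrypted input requires only ciphertext additions and multiplications, which is exactly the operation set an HE scheme provides; the cost is the approximation error outside (and, to a lesser extent, inside) the fitted interval.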