Privacy Preserving Data Quality Assessment


Julien Freudiger
Shantnanu Rane
Ersin Uzun

Privacy Preserving Data Quality Assessment

In a data-driven economy that struggles to cope with the volume and diversity of information, data quality assessment has become a necessary precursor to data analytics. Real-world data often contains inconsistencies, conflicts and errors. Such dirty data increases processing costs and has a negative impact on analytics. Assessing the quality of a dataset is especially important when a party is con- sidering acquisition of data held by an untrusted entity. In this sce- nario, it is necessary to consider privacy risks of the stakeholders. This paper examines challenges in privacy-preserving data quality assessment. A two-party scenario is considered, consisting of a client that wishes to test data quality and a server that holds the dataset. Privacy-preserving protocols are presented for testing important data quality metrics: completeness, consistency, uniqueness, timeliness and validity. For semi-honest parties, the protocols ensure that the client does not discover any information about the data other than the value of the quality metric. The server does not discover the parameters of the clients query, the specific attributes being tested and the computed value of the data quality metric. The proposed protocols employ additively homomorphic encryption in conjunction with condensed data representations such as counting hash tables and histograms, serving as efficient alternatives to solutions based on private set intersection.

Additional information

Focus Areas

Our work is centered around a series of Focus Areas that we believe are the future of science and technology.

Licensing & Commercialization Opportunities

We’re continually developing new technologies, many of which are available for¬†Commercialization.


Our scientists and staffers are active members and contributors to the science and technology communities.