 

Privacy Appliance

Private data is collected as a normal part of our interactions with healthcare providers, insurers, retail stores, and the government. Although these data can be used to learn private information about us, this is not a requirement for many beneficial applications. For example, research on medical outcomes, social issues or purchase patterns doesn't necessarily require the identification of the individuals in the respective medical, census and marketing databases.

PARC has begun work on a "privacy appliance" that protects privacy while allowing the data to be put to beneficial use. Operating as a "privacy firewall," a privacy appliance sits between data consumers and data sources to filter queries into those data sources and return only data that do not violate privacy. The appliances are owned and operated by the data owners. Little or no change is required of the data sources.

Technical Challenges

For the privacy appliance to be effective, it must limit the possibility of direct and indirect disclosure of individual identities. A key challenge is to protect the privacy of the individuals represented in the data while retaining the usefulness of the data. Several techniques, including inference and access controls, are used to address this problem. In addition, searchable and immutable access logs are created to reduce the threat of abuses.

Inference Control.  The first means of preventing direct disclosure is simple: names, social security numbers, credit card numbers, addresses, phone numbers and other identifying attributes are withheld from query results.

Controls must also include methods of preventing the inference of identity from combinations of data. Even seemingly innocuous attributes can, when taken together, compromise an individual's privacy. In the example table below, social security number is clearly identifying. More surprisingly, every individual in the table can also be identified once their sex, zip code and year of birth are all known, because no two rows share the same combination of the three. Hence, those three attributes are said to form an inference channel. The effect is not limited to small tables: based on 1990 US Census data, 87% of the US population is uniquely identifiable by sex, zip code and full date of birth (month, day and year).

SSN          Sex     Zip code  Year of Birth
123-45-6789  Male    94305     1976
234-56-7891  Male    94305     1977
345-67-8912  Female  93165     1977
456-78-9123  Female  93165     1976
567-89-1234  Male    93165     1976
678-91-2345  Female  94305     1977
789-12-3456  Male    93165     1977
891-23-4567  Female  94305     1976
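The inference channel in this table can be checked mechanically. The following is a minimal sketch, not PARC's inference control tool: it assumes a channel is any attribute combination for which some group of matching rows is smaller than a threshold k. The function names and the choice k = 2 are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Rows from the example table. SSN is omitted because it is withheld outright.
ROWS = [
    {"sex": "Male", "zip": "94305", "year": 1976},
    {"sex": "Male", "zip": "94305", "year": 1977},
    {"sex": "Female", "zip": "93165", "year": 1977},
    {"sex": "Female", "zip": "93165", "year": 1976},
    {"sex": "Male", "zip": "93165", "year": 1976},
    {"sex": "Female", "zip": "94305", "year": 1977},
    {"sex": "Male", "zip": "93165", "year": 1977},
    {"sex": "Female", "zip": "94305", "year": 1976},
]


def smallest_group(rows, attrs):
    """Size of the smallest group of rows sharing the same values for attrs."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return min(counts.values())


def inference_channels(rows, attrs, k=2):
    """Attribute subsets for which some group of rows has fewer than k members.

    k is an assumed threshold: with k=2, a subset is flagged when at least
    one row is the only one with its combination of values.
    """
    channels = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if smallest_group(rows, subset) < k:
                channels.append(subset)
    return channels


print(inference_channels(ROWS, ["sex", "zip", "year"]))
# → [('sex', 'zip', 'year')]
```

On this table every pair of attributes leaves each person hidden among at least two rows, so only the full triple is flagged.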

Inference controls will also include statistical analysis of data. A statistic is considered sensitive if it reveals information about an individual or if sensitive information can be inferred from statistical summaries. Statistical inference control has been widely used to protect databases such as census data, and the standards of operation are well defined. If a query's result is computed over too few records, the privacy appliance can label the result as sensitive and manage access accordingly.
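The "too few records" rule can be sketched as a query-set-size check. This is an illustration under stated assumptions rather than the appliance's actual policy: the threshold of 3, the QueryResult type, the guarded_mean function and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

MIN_QUERY_SET_SIZE = 3  # assumed threshold, chosen only for illustration


@dataclass
class QueryResult:
    value: Optional[float]  # None when the statistic is withheld
    sensitive: bool


def guarded_mean(rows, predicate, field, min_size=MIN_QUERY_SET_SIZE):
    """Mean of `field` over matching rows, withheld if too few rows match."""
    matched = [row[field] for row in rows if predicate(row)]
    if len(matched) < min_size:
        return QueryResult(value=None, sensitive=True)
    return QueryResult(value=mean(matched), sensitive=False)


# Hypothetical records, invented for this example.
VISITS = [
    {"zip": "94305", "visits": 2},
    {"zip": "94305", "visits": 4},
    {"zip": "94305", "visits": 6},
    {"zip": "93165", "visits": 5},
]

wide = guarded_mean(VISITS, lambda r: r["zip"] == "94305", "visits")
narrow = guarded_mean(VISITS, lambda r: r["zip"] == "93165", "visits")
print(wide.sensitive, narrow.sensitive)
```

A size check alone is a baseline. It does not stop a user from combining several permitted statistics to isolate one record, which is why the access controls described next also track queries over time.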

Access Control.   The privacy appliance's access control will block queries that request identifying information and will block or modify queries that would enable any of the undesired inferences identified by the inference control tool.

The access controls also prevent queries that request combinations of data that have been identified as sensitive by the inference controls.  This mechanism inhibits the disclosure of information, both within a single query and over time, from which an individual identity could be inferred.  In the example used earlier, a user who has already retrieved the sex and zip code fields would be prevented from retrieving the final piece of information, year of birth, in any subsequent query. We are designing protocols that allow flexible information access and fast query responses, while ensuring that no inference channels are disclosed.
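The over-time behavior described here can be sketched by remembering which attributes each user has already retrieved. This is a simplified illustration, not the protocol under design: the InferenceGuard class and its per-user, attribute-level bookkeeping are assumptions made for the example.

```python
class InferenceGuard:
    """Denies any request that would complete a known inference channel."""

    def __init__(self, channels):
        # Each channel is a set of attributes that must not all be disclosed
        # to the same user.
        self.channels = [frozenset(c) for c in channels]
        self.seen = {}  # user -> set of attributes already released

    def request(self, user, attrs):
        """Return True and record the release, or False if it is blocked."""
        combined = self.seen.get(user, set()) | set(attrs)
        for channel in self.channels:
            if channel <= combined:
                return False  # the history is left unchanged on denial
        self.seen[user] = combined
        return True


guard = InferenceGuard([{"sex", "zip", "year"}])
print(guard.request("alice", ["sex", "zip"]))  # → True
print(guard.request("alice", ["year"]))        # → False
print(guard.request("bob", ["year"]))          # → True
```

Tracking history per user keeps queries by unrelated users independent, but this sketch does nothing about users who pool their results.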

Searchable Audit Logs.   Audit logs ensure that all access to the data is recorded immediately and permanently, with no possibility of alteration. This capability is important to protect individuals against potential abuse of personal data. Information that we believe is safe to release today may turn out to be privacy-compromising in the future. Logs reveal who has accessed this information. In addition, agents who have used the database may be compromised and in such an event, logs reveal what information the agents have accessed. No one would be able to misuse data without the strong probability of detection. 

However, the logs themselves are sensitive and must be protected. We are designing tamper-resistant logging mechanisms that protect the logs through encryption, while enabling controlled search through the use of identity-based encryption. With our mechanisms an escrow agent can issue a search capability to identify which queries pertain to a certain keyword, while releasing no unnecessary additional information.
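One common way to make a log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below uses that technique as a stand-in. It is not the mechanism described above: it provides no encryption and no identity-based keyword search, and the function names and record fields are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _digest(prev_hash, record):
    """SHA-256 over the previous hash and a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log, user, query):
    """Append a record whose hash depends on every earlier entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "query": query}
    log.append({"record": record, "hash": _digest(prev_hash, record)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "alice", "SELECT sex, zip FROM patients")
append_entry(log, "bob", "SELECT year FROM patients")
print(verify(log))  # → True

log[0]["record"]["query"] = "SELECT 1"  # simulate tampering
print(verify(log))  # → False
```

A chain like this only detects edits if the most recent hash is also kept somewhere the log writer cannot change. Otherwise the whole chain could be recomputed after tampering.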

Applications

Government Databases. Government agents mine intelligence data to build models capable of predicting future terrorist attacks. Our inference control and logging technologies would allow authorized agents to search for indications of terrorist-related activity while limiting the potential to compromise the privacy of individuals. Undesired inferences may also arise across data sources, even when each source is safe on its own, which is the reason for a cross-data-source privacy appliance. Its inference control component works the same as that of the individual privacy appliances, except that instead of analyzing a single data source, it analyzes the collection of data sources. While all of the analysis could be done at the cross-data-source appliance, it is safer for the privacy appliances closest to the data sources to do as much as possible, to keep the privacy mechanisms under the control of the data owners.

Consumer Settings. Individuals are often asked directly to release personal information. For example, in retail settings, individuals may be asked for demographic information in return for coupons or other discounts.  It is difficult for individuals to evaluate the privacy risks of releasing the information because they do not know the attributes of the other respondents. We are designing a personal privacy appliance that allows individual users to evaluate these risks. The personal privacy appliance will store an individual's personal information (e.g. shopping and entertainment preferences) and inform the user of the risk of identification posed by releasing any of it.

 

This work was partially funded by DARPA contract F30602-03-C-0037.

BUSINESS CONTACT
Mark Grandcolas
Director of Business Development, Computing Science Laboratory
650-812-4429


 
   


Copyright © 2002-2007 Palo Alto Research Center Incorporated. All Rights Reserved.
PARC, the PARC Logo, AspectJ, DataGlyph, Obje, Silx, StressedMetal, and ClawConnect
are trademarks or registered trademarks of Palo Alto Research Center Incorporated.