Explain Privacy-preserving data mining methods .
Privacy-preserving data mining is an area of data mining research in response to privacy protection in data mining. It is also known as privacy-enhanced or privacy-sensitive data mining. It deals with obtaining valid data mining results without disclosing the underlying sensitive data values. Most privacy-preserving data mining methods use some form of transformation on the data to perform privacy preservation. Typically, such methods reduce the granularity of representation to preserve privacy. For example, they may generalize the data from individual customers to customer groups. This reduction in granularity causes loss of information and possibly of the usefulness of the data mining results. This is the natural trade-off between information loss and privacy. Privacy-preserving data mining methods can be classified into the following categories.
Randomization methods: These methods add noise to the data to mask some attribute values of records. The noise added should be sufficiently large so that individual record values, especially sensitive ones, cannot be recovered. However, it should be added skillfully so that the final results of data mining are basically preserved. Techniques are designed to derive aggregate distributions from the perturbed data. Subsequently, data mining techniques can be developed to work with these aggregate distributions.
"The k-anonymity and I-diversity methods: Both of these methods alter individual records so that they cannot be uniquely identified. In the k-anonymity method, the granularity of data representation is reduced sufficiently so that any given record maps onto at least k other records in the data. It uses techniques like generalization and suppression. The k-anonymity method is weak in that if there is a homogeneity of sensitive values within a group, then those values may be inferred for the altered records. The l-diversity model was designed to handle this weakness by enforcing intragroup diversity of sensitive values to ensure anonymization. The goal is to make it sufficiently difficult for adversaries to use combinations of record attributes to exactly identify individual records.
Distributed privacy preservation: Large data sets could be partitioned and distributed either horizontally (i.e., the data sets are partitioned into different subsets of records and distributed across multiple sites) or vertically (i.e., the data sets are partitioned and distributed by their attributes), or even in a combination of both. While the individual sites may not want to share their entire data sets, they may consent to limited information sharing with the use of a variety of protocols. The overall effect of such methods is to maintain privacy for each individual object while deriving aggregate results over all of the data.
Downgrading the effectiveness of data mining results: In many cases, even though the data may not be available, the output of data mining (e.g, association rules and classification models) may result in violations of privacy. The solution could be to downgrade the effectiveness of data mining by either modifying data or mining results, such as hiding some association rules or slightly distorting some classification models.
Comments
Post a Comment