JOURNAL OF COMPUTING, VOLUME 2, ISSUE 10, OCTOBER 2010, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG


Security Information Hiding in Data Mining on the Basis of Privacy Preserving Technique
Dr. R. Dhanapal, Gayathri Subramanian, M. R. Raja Gopal, K. Hemamalini
Abstract—Data mining has attracted a great deal of attention in the information industry and in society as a whole in recent years. This is due to the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection, and customer retention to production control and science exploration. More and more information is accessible in electronic form and available on the web, and increasingly powerful data mining tools are being developed and put into use. As a result, data mining may pose a threat to our privacy and data security. The real privacy concerns arise from unconstrained access to individual records, such as credit card, banking, and customer ID records, which contain privacy-sensitive information. In this paper we investigate how data shared before mining can be shielded, and we model the approach with Unified Modeling Language (UML) diagrams. We describe the definition of privacy preservation, the problem statement, privacy preserving data mining techniques, and the architecture of the proposed work. We propose a unified framework for Privacy Preserving Data Mining that ensures the mining process will not violate privacy, up to a certain degree of security.
Index Terms—Association Rules, Clustering, Confidence, Data Snooping, Data Sanitization, Privacy, Privacy Preserving Data Mining, Sensitive Data, Unified Modeling Language.


1 INTRODUCTION
Data mining technology provides a number of advantages. Automated tools can analyze corporate, research and development, biological, and financial data, as well as data from the retail industry, the telecommunication industry, and other scientific applications. Such analysis can help find ways to increase the efficiency of an organization or industry, or of medical applications. Privacy preserving data mining [1, 2] is a novel research direction in data mining and statistical databases [3]. It analyzes data mining algorithms for the side effects they incur on data privacy. Mined knowledge can equally well compromise data privacy. Mining may reveal knowledge about individuals or groups that goes against privacy policies, especially if the discovered information may be disseminated. Another issue that arises from this concern is the appropriate use of data mining. Because data is valuable, databases of all sorts of content are regularly sold for the competitive advantage that can be attained from them. The main objective in privacy preserving data mining is to develop algorithms that modify the original data in some way, so that the private data and private knowledge remain private even after the mining process. The problem that arises when unauthorized users can derive confidential information from released data is also commonly called the "database inference" problem.

Using UML methodology, the privacy model is portrayed with several diagrams. These include logical diagrams, use case diagrams, scenario and activity diagrams, and collaboration and distribution diagrams. By analyzing different privacy preserving research project work scenarios, we were able to define the use case types and apply the appropriate UML diagrams. Conventional research project record maintenance poses ample obstacles to intruders. Those seeking to inspect records must have authorization, and they can view records only in person. Moreover, paper records were decentralized, so a single project's records might be dispersed across a number of places. In the event of a breach of security, illegitimate access would therefore be restricted.

The remainder of this paper is organized as follows. Section 2 offers an overview of privacy preserving data mining and analyzes the different problems in data mining and the existing solutions. Section 3 discusses the problem statement and PPDM techniques for the research project services. Section 4 presents the block diagram and PPDM techniques for the research lab services. Section 5 discusses the implementation details using UML diagrams. Section 6 concludes this paper with a brief summary.

Prof. Dr. R. Dhanapal is with the Department of Computer Applications, Easwari Engineering College, affiliated to Anna University of Technology, Chennai 600 089, Tamil Nadu, India.
Gayathri Subramanian is with the Department of Computer Science, R B Gothi Jain College for Women, affiliated to University of Madras, Chennai 600 052, Tamil Nadu, India. She is a research scholar pursuing a Ph.D. in Computer Science at Dravidian University.
M. R. Raja Gopal is with the Department of Computer Science, Swami Dayananda College of Arts and Science, Manjakkudi. He is a research scholar pursuing a Ph.D. in Computer Science at Dravidian University.
K. Hemamalini is a second-year MCA student at Easwari Engineering College, affiliated to Anna University of Technology, Chennai.


2 LITERATURE SURVEY
Security is an important issue for any data collection that is shared and/or intended for strategic decision making. Data may also be collected for customer profiling, understanding user behavior, correlating personal data with other information, and similar purposes. In these cases, large amounts of sensitive and private information about individuals or companies are gathered and stored. This becomes controversial given the confidential nature of some of this data and the potential for illegal access to it. Moreover, data mining could disclose new, implicitly discovered knowledge. Some important information could be withheld, while other information could be widely distributed and used without control. Complex issues such as those involved in Privacy Preserving Data Mining (PPDM) cannot simply be addressed by restricting data collection, or even by restricting the secondary use of information technology [4, 5, 6]. A fairly accurate definition of privacy could be adequate, depending on the application, since the appropriate level of privacy can be interpreted differently in diverse contexts [7, 8]. In some applications (e.g., association rules, classification, or clustering), an appropriate balance should be found between the need for privacy and the need for knowledge discovery. Preserving privacy when data are pooled for mining is a challenging problem. The usual methods in database security, such as access control and authentication, have been adopted to handle access to data successfully. However, they present some limitations in the context of data mining. Access control and authentication can safeguard against direct disclosures, but they do not address disclosures based on inferences that can be drawn from released data [9, 10, 11]. Preventing this sort of inference is beyond the reach of the existing methods [16, 19].
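A differencing attack makes the inference problem concrete. The sketch below is minimal and rests on two assumptions. First, there is a hypothetical table of project grants. Second, a query interface releases only SUM aggregates over at least k = 2 rows. The names `RECORDS`, `sum_query`, and `min_group_size_ok` are illustrative and not part of this paper's system. Each query passes the access check on its own, yet the difference of the two answers discloses one individual's value.

```python
# Hypothetical research-project records (illustrative data only).
RECORDS = [
    {"name": "A", "lab": "L1", "grant": 120},
    {"name": "B", "lab": "L1", "grant": 95},
    {"name": "C", "lab": "L2", "grant": 150},
    {"name": "D", "lab": "L2", "grant": 80},
]

def sum_query(predicate):
    """Aggregate-only interface: return the SUM of 'grant' over matching rows."""
    return sum(r["grant"] for r in RECORDS if predicate(r))

def min_group_size_ok(predicate, k=2):
    """Naive access-control style check: refuse queries touching fewer than k rows."""
    return sum(1 for r in RECORDS if predicate(r)) >= k

# Two queries, each of which passes the size check on its own ...
q_all = lambda r: True
q_all_but_d = lambda r: r["name"] != "D"
assert min_group_size_ok(q_all) and min_group_size_ok(q_all_but_d)

# ... but their difference reveals record D's individual grant.
inferred_d = sum_query(q_all) - sum_query(q_all_but_d)
print(inferred_d)  # 80
```

Access control alone does not catch this kind of disclosure. The check inspects each query in isolation, while the inference comes from combining the answers.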
In this paper we address the issue of privacy preserving Data Snooping for a scenario in which the parties owning confidential databases wish to run a Data Snooping algorithm on the union of their databases, without revealing any sensitive information.
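The secure sum protocol is one widely used cryptographic building block for this union scenario. The sketch below illustrates it under stated assumptions and is not the authors' method. It simulates all parties in one process and assumes semi-honest, non-colluding parties with horizontally partitioned data. The local support counts for a single itemset are made up, and the helper name `secure_sum` is hypothetical.

```python
import random

def secure_sum(local_values, modulus, rng=None):
    """Simulate the ring-based secure sum protocol in a single process.

    Site 0 masks its value with a random offset drawn uniformly from
    [0, modulus). Every other site only sees a masked running total, which
    is uniformly distributed and reveals nothing about earlier inputs.
    The true total must be smaller than `modulus`.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)
    running = (mask + local_values[0]) % modulus   # site 0 -> site 1
    for value in local_values[1:]:                 # each site adds its own count
        running = (running + value) % modulus      # and forwards the masked total
    return (running - mask) % modulus              # site 0 removes the mask

# Hypothetical local support counts of one itemset, and local database sizes,
# at three labs (same attributes, different records).
local_support = [12, 7, 30]
local_sizes = [100, 50, 250]
M = 10**9
global_support = secure_sum(local_support, M)
global_size = secure_sum(local_sizes, M)
print(global_support, global_size, global_support / global_size)  # 49 400 0.1225
```

Only the masked running total travels between sites, so no single site learns another site's local count. However, two colluding neighbours of a site could recover that site's value, which is why the non-collusion assumption matters.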

3 PRIVACY PRESERVING DATA MINING STATEMENT
Privacy preserving data mining analysis combines the data of heterogeneous users without disclosing those users' private and sensitive details.

3.1. Problem Statement
The aim is to specify a comprehensible yet formal approach for early privacy preserving analysis in the context of component-based software development. This approach should allow all the techniques to be evaluated and compared on a common platform. It should also support the design, construction, and implementation of functionalities such as a user-friendly framework, portability, etc.

3.2. Classification of Privacy Preserving Techniques
Many approaches have been adopted for privacy preserving data mining. We can classify them along the following dimensions:
• Data distribution
• Data modification
• Data mining algorithm
• Data or rule hiding
• Privacy preservation

The first dimension refers to the distribution of data. Some approaches have been developed for centralized data, while others address distributed data. Distributed data scenarios can be further classified as horizontal or vertical data distribution.

The second dimension refers to data modification. In general, data modification changes the original values of a database that needs to be released to the public, and in this way ensures high privacy protection. Data modification methods include:
• Perturbation, which alters an attribute value to a new value (e.g., changing a 1-value to a 0-value, or adding noise);
• Blocking, which replaces an existing attribute value with a "?";
• Aggregation or merging, which combines several values into a coarser category;
• Swapping, which interchanges values of individual records;
• Sampling, which releases data for only a sample of a population.

The third dimension refers to the data mining algorithm for which the data modification is taking place. This algorithm is usually not known beforehand, but knowing it facilitates the analysis and design of the data hiding algorithm. The fourth dimension refers to whether raw data or aggregated data should be hidden. Hiding aggregated data in the form of rules is more complex, and for this reason mostly heuristics have been developed. The last and most important dimension refers to the privacy preservation technique used for the selective modification of the data. Selective modification is required to achieve higher utility for the modified data while ensuring that privacy is not jeopardized.
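The five data modification methods listed above can be sketched as small functions over a table of records. This is an illustrative Python sketch, and `ROWS` and all helper names are hypothetical. Each function returns a modified copy and leaves the original rows untouched.

```python
import random

# A hypothetical table to be released: each row maps attribute -> value.
ROWS = [
    {"age": 34, "city": "Chennai", "smoker": 1},
    {"age": 51, "city": "Madurai", "smoker": 0},
    {"age": 27, "city": "Chennai", "smoker": 1},
    {"age": 45, "city": "Salem", "smoker": 0},
]

def perturb(rows, attr, rng, flip_prob=0.2):
    """Perturbation: flip a binary attribute (1 <-> 0) with probability flip_prob."""
    return [{**r, attr: 1 - r[attr] if rng.random() < flip_prob else r[attr]} for r in rows]

def add_noise(rows, attr, rng, scale=5.0):
    """Perturbation: add zero-mean Gaussian noise to a numeric attribute."""
    return [{**r, attr: r[attr] + rng.gauss(0.0, scale)} for r in rows]

def block(rows, attr):
    """Blocking: replace the attribute value with '?'."""
    return [{**r, attr: "?"} for r in rows]

def aggregate(rows, attr, width=10):
    """Aggregation: merge exact values into coarser ranges such as '30-39'."""
    out = []
    for r in rows:
        lo = (r[attr] // width) * width
        out.append({**r, attr: f"{lo}-{lo + width - 1}"})
    return out

def swap(rows, attr, rng):
    """Swapping: permute one attribute's values across records."""
    values = [r[attr] for r in rows]
    rng.shuffle(values)
    return [{**r, attr: v} for r, v in zip(rows, values)]

def sample(rows, k, rng):
    """Sampling: release only k randomly chosen records."""
    return rng.sample(rows, k)

rng = random.Random(0)
print(aggregate(ROWS, "age")[0]["age"])  # 30-39
print(block(ROWS, "city")[1]["city"])    # ?
print(len(sample(ROWS, 3, rng)))         # 3
```

In practice these operators are combined and applied selectively, only to the attributes that need protection. That selective application is what the last dimension above concerns.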
The techniques that have been applied for selective modification are:
• Heuristic-based techniques, such as adaptive modification. This modifies only selected values, chosen to minimize the utility loss, rather than all available values.
• Cryptography-based techniques, such as secure multiparty computation. Here a computation is secure if, at the end of the computation, no party knows anything except its own input and the results.
• Reconstruction-based techniques, where the original distribution of the data is reconstructed from the randomized data.
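Warner's randomized response for a single binary attribute is one concrete example of a reconstruction-based technique. It is a classic technique swapped in here for illustration, not necessarily the one the authors use. Each respondent's bit is kept with probability p and flipped otherwise, and the true fraction of 1s is then recovered from the randomized reports. The data and the names `randomize` and `reconstruct_fraction` are hypothetical.

```python
import random

def randomize(bits, p, rng):
    """Each respondent reports the true bit with probability p, else its complement."""
    return [b if rng.random() < p else 1 - b for b in bits]

def reconstruct_fraction(reported, p):
    """Estimate the true fraction of 1s from randomized reports.

    If pi is the true fraction, a report equals 1 with probability
    lam = p*pi + (1 - p)*(1 - pi), hence pi = (lam - (1 - p)) / (2p - 1).
    Requires p != 0.5, because at p = 0.5 the reports carry no information.
    """
    lam = sum(reported) / len(reported)
    return (lam - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
true_bits = [1] * 3000 + [0] * 7000   # hypothetical data: true fraction 0.30
reported = randomize(true_bits, p=0.8, rng=rng)
print(round(reconstruct_fraction(reported, 0.8), 3))  # close to 0.3, not exact
```

The estimate becomes noisier as p approaches 0.5. The parameter p therefore trades privacy (more flipping) against the accuracy of the reconstructed distribution.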


4 SHARING RESEARCH PROJECT WORK DATA
Investigating and analyzing the prevalence and frequency of various kinds of research project work is essential for understanding and handling research projects. Such analyses have a considerable impact on policy decisions. An obvious precondition for carrying out such studies is that the necessary data be available. First, data on similar projects has to be collected from several research lab providers, here projects 1 to n. The data then has to be subjected to data sanitization and integrated. Filtering selects only the data required for pattern evaluation and knowledge mining, and these heterogeneous data are converted to the desired format. This process is extremely time consuming and labor intensive. Privacy concerns are a major hindrance to streamlining these efforts, since infringing privacy can cause significant material and psychological damage to individuals. Today privacy is addressed by preventing the propagation of data rather than by integrating privacy constraints into the data sharing process. Privacy preserving integration and sharing of research data has become vital to enabling scientific innovation.
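The collection, sanitization, integration, filtering, and format-conversion steps above can be sketched as a small pipeline. The project extracts, field mappings, and function names are all assumptions made for illustration. One design choice departs from the order given in the text: records are first renamed into a common schema, because sanitization needs to know which field is the identifier. Sanitization here simply drops direct identifiers.

```python
from collections import Counter

# Hypothetical extracts from projects 1..n; each source uses its own field names.
PROJECT_1 = [
    {"Name": "res-A", "Topic": "clustering", "Tool": "Weka"},
    {"Name": "res-B", "Topic": "association", "Tool": "R"},
]
PROJECT_2 = [
    {"researcher": "res-C", "area": "clustering", "software": "Weka"},
]

# Assumed per-source mapping from local field names to a common schema.
FIELD_MAPS = {
    "project_1": {"Name": "name", "Topic": "topic", "Tool": "tool"},
    "project_2": {"researcher": "name", "area": "topic", "software": "tool"},
}

IDENTIFIERS = {"name"}  # direct identifiers removed during sanitization

def convert(source, record):
    """Format conversion: rename fields into the common schema."""
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def sanitize(record):
    """Minimal sanitization: drop direct identifiers."""
    return {k: v for k, v in record.items() if k not in IDENTIFIERS}

def integrate(sources):
    """Convert and sanitize every record, then take the union of all projects."""
    return [sanitize(convert(src, r)) for src, rows in sources.items() for r in rows]

def select(rows, attributes):
    """Filtering: keep only the attributes needed for pattern mining."""
    return [{a: r[a] for a in attributes} for r in rows]

rows = select(integrate({"project_1": PROJECT_1, "project_2": PROJECT_2}),
              ["topic", "tool"])
# A trivial mining step: count how often each (topic, tool) pair occurs.
counts = Counter((r["topic"], r["tool"]) for r in rows)
print(counts.most_common(1))  # [(('clustering', 'Weka'), 2)]
```

A real deployment would insert a PPDM technique between the selection and mining steps, for example one of the data modification methods of Section 3.2.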
Fig. 1. System Architecture (components: Projects 1 to n, unpreserved data, cleaning and data integration, database & DWH server, data warehouse, selection and transformation, filtering, PPDM technique, privacy preserving data, data snooping engine, knowledge base, patterns, pattern evaluator, GUI)

4.1. Unpreserved Data
Data from several project works is collected in research labs and kept for records. These project data come from several research labs, and similar project data will be in different formats. The data are given as input to the database server, where they are converted into the desired format and stored. One more data input comes from the data warehouse and is sent to the data warehouse servers. From the output of the database servers, we select only the desired data and transform it into the format required by the Data Snooping engine. We then apply different Data Snooping techniques, such as classification, association, and clustering, to the unpreserved input data. The extracted patterns are sent to the pattern evaluator, and the interesting patterns are displayed visually for further analysis.

4.2. Privacy Preserved Data Mining
A large amount of data on various similar projects is collected from various research labs and held in reserve for records. Because these data come from several research labs or research projects, they will be in dissimilar formats. The data are keyed into the database server, where they are converted into the preferred format and stored. The second data input, from the data warehouse, is sent to the data warehouse servers, which hold a collection of data on various subjects. From the database server pool, we choose only the data that is needed and transform it into the desired format. This transformed data is the input on which the Data Snooping techniques are run. Instead of sending the data directly for Data Snooping, we first obscure it with different privacy preserving Data Snooping techniques in order to preserve the sensitive information. This privacy preserved, obscured data is then subjected to various Data Snooping techniques, such as classification, association, and clustering. The extracted patterns are sent to the pattern evaluator, and the interesting patterns are displayed visually for further analysis.

5 UML PPDM MODELS
Fig. 2 shows an essential part of the use case diagram specifying the behavior of a PPDM system. An actor outside the box represents an external entity interacting with the system. The use cases inside the box represent the system functionalities provided to the external actors, and each use case can include, or be extended by, other use cases. The use case diagram is complemented by textual use cases with varying degrees of formality. These range from an informal, casual description to a semiformal template specifying the details of each use case.

5.1. PPDM System Use Cases
Current high-tech developments have erased the distinction between research project data kept in private and data kept in public, and we are unable to fully shield project privacy. Project records are kept privately in various research labs. In emerging project systems, the responsibility of the researcher or administrator for the privacy of research work is under serious threat. Relationships between research and development organizations and researchers have been transformed, so researchers may no longer be able to control project information as they once did. Furthermore, new information technologies have increased the significance and potential uses of project data. As a result, third-party demands for access have increased, with attendant risks to project privacy. The highly sensitive information in project work must therefore be protected first and only then mined for effective data dredging. The UML (Unified Modeling Language) methodology allows the PPDM model to be described with diagrams such as use case diagrams and class and class structure diagrams.


Fig. 2. Use case diagram of data mining system

5.2. Use Case Description
The use case description gives a comprehensive portrayal of how the system will be used and outlines its projected functionality. The PPDM main success scenario (basic flow) can be extracted from the use case diagram shown in Fig. 2 and is understandable by laymen as well as professionals. The class diagram in Fig. 3 shows the static structure of the objects, their internal structure, and their relationships.

Fig. 3. Class diagram

6 CONCLUSION
The work presented here reflects the ever-increasing interest of researchers in securing sensitive data and knowledge from malicious users. Our review of this area shows that privacy issues can be effectively considered only within the limits of certain data mining algorithms. In this paper we define privacy preservation in data mining and the implications of standard privacy principles for knowledge discovery. We also advocate a few policies for PPDM based on these privacy principles. These policies are vital for developing and deploying methodological solutions, and they will let vendors and developers make solid advances in the future of PPDM. Our exploration concludes that our Privacy Preserving Data Mining framework is reusable, customizable, and effective and meets privacy requirements. It also guarantees well-founded Data Snooping results while shielding vulnerable information (e.g., sensitive knowledge and individuals' privacy).

7 ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their constructive suggestions and comments.

REFERENCES
[1] C. Clifton and D. Marks, "Security and Privacy Implications of Data Mining," In Proc. of the ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, pages 15–19, 1996.
[2] D. E. O'Leary, "Knowledge Discovery as a Threat to Database Security," In Proc. of the 1st International Conference on Knowledge Discovery and Databases, pages 107–516, 1991.
[3] N. Adam and J. C. Wortmann, "Security-Control Methods for Statistical Databases: A Comparative Study," ACM Computing Surveys, 21(4):515–556, 1989.
[4] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Hippocratic Databases," In Proc. of the 28th Conference on Very Large Data Bases, Hong Kong, China, August 2002.
[5] L. Brankovic and V. Estivill-Castro, "Privacy Issues in Knowledge Discovery and Data Mining," In Proc. of the Australian Institute of Computer Ethics Conference (AICEC99), Melbourne, Victoria, Australia, July 1999.
[6] S. R. M. Oliveira and O. R. Zaiane, "Foundations for an Access Control Model for Privacy Preservation in Multi-Relational Association Rule Mining," In Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining, pages 19–26, Maebashi City, Japan, December 2002.
[7] C. Clifton, W. Du, M. Atallah, M. Kantarcıoğlu, X. Lin, and J. Vaidya, "Distributed Data Mining to Protect Information Privacy," Proposal to the National Science Foundation, December 2001.
[8] C. Clifton, "Using Sample Size to Limit Exposure to Data Mining," Journal of Computer Security, 8(4):281–307, November 2000.
[9] L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002.
[10] C. Farkas and S. Jajodia, "The Inference Problem: A Survey," SIGKDD Explorations, 4(2):6–11, December 2002.

Dr. R. Dhanapal obtained his Ph.D. in Computer Science from Bharathidasan University, India. He is currently Professor in the Department of Computer Applications, Easwari Engineering College, affiliated to Anna University of Technology Chennai, Tamil Nadu, India. He has 25 years of teaching, research, and administrative experience. Besides being a Professor, he is a prolific writer who has authored twenty-one books on various topics in Computer Science. His books have been prescribed as textbooks in Bharathidasan University and in autonomous colleges affiliated to it. He has served as Chairman of the Board of Studies in Computer Science of Bharathidasan University. He has also been a member of the Board of Studies in Computer Science of several universities and autonomous colleges. He is a member of the standing committee on Artificial Intelligence and Expert Systems of IASTED, Canada. He is also a Senior Member of the International Association of Computer Science and Information Technology (IACSIT), Singapore. He has visited the USA, Japan, Malaysia, and Singapore to present papers at international conferences and to demonstrate software he developed. He is the recipient of the prestigious 'Life-time Achievement' and 'Excellence' Awards. He is serving as Principal Investigator of UGC-sponsored innovative, major, and minor research projects worth about 1.6 crore. He is a recognized supervisor for research programmes in Computer Science leading to the Ph.D. and M.S. by research at several universities. These include Anna University of Technology Chennai, Bharathiar University, and Manonmaniam Sundaranar University. He has 47 papers to his credit in international and national journals.
Mrs. S. Gayathri Subramanian, M.Sc., M.Phil., is Head of the Department of Computer Science, R. B. Gothi Jain College for Women, Redhills, Chennai 52, India. She is pursuing research leading to a Ph.D. at Dravidian University, Andhra Pradesh, India, under the guidance and supervision of Prof. Dr. R. Dhanapal. Current research interest: Data Mining.
Mr. M. R. Raja Gopal, M.Sc., MCA, MBA, M.Phil., is Head of the Department of Computer Science, Swami Dayananda College of Arts and Science, Manjakkudi. He is pursuing research leading to a Ph.D. at Dravidian University, Andhra Pradesh, India, under the guidance and supervision of Prof. Dr. R. Dhanapal. Current research interest: Data Mining.
Ms. K. Hemamalini is a second-year MCA student at Easwari Engineering College, affiliated to Anna University of Technology, Chennai, India, under the guidance and supervision of Prof. Dr. R. Dhanapal. Current research interest: Data Mining.
