COMPUTER SECURITY

Published on July 2016


ARTIFICIAL IMMUNE SYSTEM FOR COMPUTER SECURITY
11/24/2013 Sweta leena Panda IIIT BHUBANESWAR

ARTIFICIAL IMMUNE SYSTEM
FOR COMPUTER SECURITY
A PROJECT SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Bachelor of Technology in Information Technology
By: Sweta Leena Panda (B410054)
Under the guidance of Prof. Anjali Mohapatra

DEPARTMENT OF INFORMATION TECHNOLOGY
INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY BHUBANESWAR (ODISHA)

CERTIFICATE

This is to certify that the dissertation entitled “Artificial Immune System for Computer Security”, submitted by Sweta Leena Panda (B410054), is approved for the award of the degree of Bachelor of Technology in Information Technology.

PROJECT GUIDE (Prof. Anjali Mohapatra) DATE: 24/11/13

CO-ORDINATOR, DEPARTMENT OF CSE AND IT, IIIT BHUBANESWAR DATE: 24/11/13

ACKNOWLEDGEMENTS

On the submission of my project, “Artificial Immune System for Computer Security”, I would like to extend my gratitude and sincere thanks to my mentor, Prof. Anjali Mohapatra, Department of Information Technology, for her constant motivation and support during the three months of this work. I truly appreciate and value her esteemed guidance and encouragement from the beginning to the end of this project. I am indebted to her for having helped me shape the problem and for providing insights towards the solution. Above all, I would like to thank all my friends, whose direct and indirect support helped me complete the project in time. This thesis would have been impossible without their perpetual moral support.

Sweta leena panda (B410054)

TABLE OF CONTENTS
Abstract
List of Figures
1. Introduction
1.1 Purpose
1.2 History
1.3 Scope
1.4 Aim
1.4.1 Computer security
1.4.2 Computer virus
2. Literature survey
2.1 Information security and virus
2.2 Anomaly detection and signature recognition
2.3 Automatic and manual detection techniques
2.4 Signature recognition techniques
3. Analysis
3.1 AIS & BIS
3.2 HIS vs AIS
3.3 Motivation
4. AIS models
4.1 Different models
5. Proposed work
5.1 Agent structure
5.2 Kosko's bidirectional associative memory
6. Implemented algorithm
6.1 Code for agent basis
6.2 Code to detect viral and legal code
7. Discussion
8. Current & Future Scope
9. Conclusion
10. References

List of Figures

fig. 1: T Cells and B Cells Interact with Antigens
fig. 2: Graphical Representation of the Life Cycle of T Cells and B Cells
fig. 3: Negative selection process
fig. 4: Partial matching rule
fig. 5: Self and non-self pattern
fig. 6: Self and non-self pattern at work in the human immune system
fig. 7: Hierarchical model for virus detection
fig. 8: Agent-based flow chart
fig. 9: BAM model structure

Abstract
This paper describes how to detect viruses using the BAM (bidirectional associative memory) model, which is useful as an administrative tool for virus detection. Alongside the BAM model, several other models that help to detect viruses are reviewed. The BAM model is developed to detect faults, abnormalities and intrusions, and is based on soft computing. It captures many features of the vertebrate immune system and places them in the context of the problem of protecting a network of computers from illegal intrusions and viruses.

1. Introduction
1.1 Purpose
Artificial Immune Systems (AIS) is a branch of biologically inspired computation focusing on many aspects of immune systems. AIS development can be seen as having two target domains: the provision of solutions to engineering problems through the adoption of immune system inspired concepts and the provision of models and simulations with which to study immune system theories.

1.2 History
Over the years, biology has provided a rich source of inspiration for many different scientists in many different domains, ranging from the design of aircraft wings to bulletproof vests. In computing, there has been an extensive amount of work undertaken on the use of biological metaphors, for example neural networks [Haykin 1999], swarm systems [Kennedy & Eberhart 2001], genetic algorithms [Holland 1975] and genetic programming [Banzhaf et al. 1998]. Recently, there has been increasing interest in using the natural immune system as a metaphor for computation in a variety of domains [de Castro & Timmis 2002a]. This field of research, Artificial Immune Systems (AIS), has seen the application of immune inspired algorithms to problems such as robotic control [Krohling et al. 2002], network intrusion detection [Forrest et al. 1997, Kim 2002], fault tolerance [Canham & Tyrrell 2002, Ayara 2005], bioinformatics [Cutello et al. 2004, Nicosia 2004] and machine learning [Kim & Bentley 2002a, Knight & Timmis 2003, Watkins et al. 2004], to name a few.

To many, trying to mimic how the immune system operates in a computer may seem an unusual thing to do. Why, then, would people in computing wish to do this? The answer is that, from a computational point of view, the immune system has many desirable properties that they would like their computer systems to possess: robustness, adaptability, diversity, scalability, multiple interactions on a variety of timescales, and so on. There is a real challenge in the world of computer science (and engineering) to build systems that can cope with increasingly complex problems, and are thus more scalable and robust (i.e. they break less!). Indeed, there is the notion of a Grand Challenge in computer science to try and do this very thing [Stepney et al. 2005a].

When working in the world of biologically inspired computing, a word of caution should be given. It is essential that metaphors are adopted carefully. Just because the immune system has a desirable property does not mean that it is necessarily suitable for solving a given problem; careful thought has to be given to the applicability of any technique [Freitas & Timmis 2003].

1.3 Scope
Artificial Immune Systems (AIS) are being used in many applications, such as:
1) anomaly detection
2) pattern recognition
3) data mining
4) computer security
5) adaptive control
6) fault detection

1.4 Aim
1.4.1 Computer Security
The world has become a more interconnected place. Electronic communication, e-commerce, network services and the Internet have become vital components of business strategies, government operations, and private communications. Many organizations have become dependent on the wired world for their daily activities. This interconnectivity has also brought forth those who wish to exploit it. Computer security has, thus, become a necessity in the digital age. While information dependence is increasing, the threat from malicious code, such as computer viruses, is also on the rise. The number of computer viruses has been increasing exponentially from their first appearance in 1986 to over 55 000 different strains identified today. Viruses were once spread by sharing disks; now, global connectivity allows malicious code to spread farther and faster. Similarly, computer misuse through network intrusion is on the rise.

With the rapid development of computer technology, new anti-malware technologies are required because malware is becoming more complex, with a faster propagation speed and a stronger ability for latency, destruction, and infection. Many companies have released anti-malware software, most of which is based on signatures and can detect known malware very quickly. However, such software often fails to detect new variations and unknown malware. Using metamorphic and polymorphic techniques, even a layman can easily develop new variations of known malware with automated malware-generation tools. Thus, traditional malware detection methods based on signatures are no longer suitable for new environments, and heuristic methods have started to emerge. Over the past few years, applying immune mechanisms to computer security has developed into a new field, attracting many researchers. Forrest applied immune theory to computer abnormality detection for the first time in 1994. Since then, many researchers have proposed various malware detection models and achieved some success.

1.4.2 Computer virus
A computer virus is a program that can “infect” other programs by modifying them to include a possibly evolved version of itself. One distinguishing feature of viruses is that they are parasitic. They require a host to run them and to spread their viral code. This is usually another executable program, although other hosts, such as disk boot sectors, can be infected. Computer viruses are usually classified by their method of infection. The common subclasses of viruses are file infector, boot sector and macro viruses.

File infector viruses work by inserting their code into executable files, just as a biological virus works by inserting its DNA code into living cells. The host file then executes the malicious code on behalf of the virus. Boot sector viruses attach themselves to specific areas of a disk that are loaded and executed on startup. By placing its viral code into the boot sector of the disk, a virus can gain control of the computer immediately upon boot up. This allows the virus to execute before anything can detect its existence. The macro virus is a section of code contained within an application document. The intent of the macro capability was to add automation to otherwise static documents. As a further boon to virus writers, macro viruses are much easier to write than earlier viruses because macros use high-level languages and do not require specific operating system knowledge.

On the detection side, generating antibodies degenerates into enumerating all possibilities, which leads to its classification as NP-complete (NPC). This indicates that no polynomial-time algorithm is known for generating antibodies, so an approximation algorithm is the practical choice.

Worms are programs that execute independently, with the distinguishing feature that they utilize a computer network in order to propagate themselves. The first worms were built at the Xerox Palo Alto Research Center. They were designed to perform useful work in a distributed environment, such as finding idle resources. These original worms would probably be called mobile agents today. The Melissa virus is more accurately termed a worm, as it used the features of Microsoft Exchange e-mail in order to spread itself across networks. More modern malicious code utilizes a variety of techniques, blurring the distinction among forms: Nimda uses e-mail as one of its several transport mechanisms, exhibits viral propagation on infected machines, and provides Trojan horse capabilities.

2. Literature survey
There are several sources for the related literature: 1) journals in the library and some online resources, 2) conference papers, and 3) books. I used several methods in this search. First, based on the problem definition, I looked for recent, good survey and review papers in related areas. In this process I found the following papers: [WCE2008_pp38-46, WangWei_CIS09, umi-okstate-1091, A00, FH01]. From the information in these papers I traced the valuable systems and algorithms and extended my collection of papers. Another way to get a good entry point to the whole pool of papers is to look at the bibliographies of people working in the related area. In the course of searching for these papers, the related journals and conferences were categorized, and these sources were then searched in detail.

Journals related to computer viruses include:
1) Computers & Security
2) Computer Networks

Conferences related to computer viruses (CD):
1) 8th Annual Genetic and Evolutionary Computation Conference
2) IEEE 37th Annual 2003 International Carnahan Conference on Security Technology
3) 1997 IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation (source of “Immunity-Based Systems: A Survey”)
4) IEEE International Conference on Systems, Man, and Cybernetics (Oct. 1998)

There is a large volume of literature on intrusion detection systems and techniques. Because computer security is a relatively new research area dating from the 1980s, most papers have been published in the last twenty years. I include over sixty papers in the bibliography. These papers can be categorized into the following topics:
1) Information security
2) Computer security systems
3) Algorithms used in computer security

The key papers selected, about 10 in total, are discussed in the following review sections. They are of two types. The first type is survey papers that classify and compare computer security systems and algorithms. The second type is papers that introduce signature recognition algorithms and related techniques.

2.1 Information security and virus
With the development of computer technology and its applications, computer security has become an important issue. Computer security has six elements: availability, integrity, confidentiality, utility, authenticity, and possession. A virus attack, like an intrusion in general, can be defined as any set of activities aimed at breaking the security of a computer network system [HLMS90]. Such attacks may take many forms: external attacks, internal misuse, network-based attacks, information gathering, denial of service, etc. [Immunological Computation: Theory and Applications, AdvancesAIS, YGFZ98, B00]. An intrusion detection system aims to detect attack activities effectively when they occur in the system. Information security with regard to such intrusions mainly includes the intrusion prevention, detection, diagnosis, response, and system recovery stages.

2.2 Anomaly detection and signature recognition
Based on detection schemas, there are two major types of approaches in practical use: anomaly detection, or behavior-based detection, and signature detection, or pattern matching [DDW99, A00, E981]. Anomaly detection tries to learn the normal behaviors of the subjects in a computer system and to build up profiles for them. These subjects may be users, host machines or networks. In detection, it classifies activities as attacks if they deviate significantly from the normal profiles. The techniques used in anomaly detection include specification-based profiling, artificial neural networks, regression, computer immunology, Markov chains, Bayesian networks, hidden Markov models and statistics-based profiling such as stochastic process control.

The signature recognition method, on the other hand, tries to recognize the signatures (patterns) of normal and intrusive activities predefined in training data. In detection, it matches these signatures against incoming data to determine the nature of the activities. Signature recognition looks for behavior that is “bad” by definition. Anomaly detection may detect unknown attacks. However, it tends to give false alarms if the anomalies are caused by behavioral irregularity instead of intrusions. Signature recognition produces fewer such false alarms but is likely to be fooled by novel intrusions. Hence anomaly detection and signature recognition techniques are often used together to complement each other. Signature recognition aims to give more accurate predictions than anomaly detection. It learns and abstracts the patterns of normal and intrusive activities from the training data and makes predictions in detection by matching the abstracted patterns with new data. This literature study focuses on the signature recognition approach.

2.3 Automatic and manual detection techniques
From another point of view, computer security systems fall into two types: self-learning or automatic systems, and programmed or manual systems [A00]. Programmed systems require the user or another party to teach the system, that is, to program it. The knowledge about the patterns and models of normal or intrusive activities in a computer system needs to be collected before being fed to the system manually. Self-learning systems learn by being presented with training examples, from which they automatically induce what constitutes normal and intrusive behavior. These training examples contain the information typically recorded for the computer system over a certain period, flagged by outside authorities as normal or intrusive. These systems then distinguish attacks from normal activities using the induced knowledge.

As discussed above, intrusion detection systems generally use a very large volume of data from information systems to build comprehensive models. The profiles of normal and intrusive activities change over time, and new patterns appear constantly. Thus a practical intrusion detection system has to modify or add new entries to its underlying model over time. Moreover, there is a lot of noise in such data. All these features make it almost impossible to manually program all the normal or intrusive patterns into a system. Anomaly detection methods such as rule-based methods or state series modeling can be used in the manual way. Anomaly detection can also be applied in automatic mode, using artificial neural networks, regression, Markov chains, Bayesian networks, hidden Markov models and statistics-based profiling. Many of these, including neural networks, Markov chains, Bayesian networks and statistics-based profiling, can easily support incremental learning. For some of them, such as statistics-based profiling, there is no scalability problem.
However, as shown in the next section, most existing signature recognition techniques, such as state-transition analysis, rule-based systems, and Petri nets, lack automatic and incremental learning capability. They are incapable of handling large volumes of data. These shortcomings limit signature recognition as an efficient intrusion detection approach. This study focuses on the automatic signature recognition technique. Automatic signature recognition takes both the intrusion and normal behavior models into consideration. It accommodates varying intrusion signatures merely by being confronted with different variations of the same theme of attack.

2.4 Signature recognition techniques and related key papers
The list of signature recognition algorithms includes string matching, state transition, Petri nets, rule-based systems, expert systems, decision trees, association rules, neural networks, genetic algorithms, and CCA-S.

i. String matching
String matching is also called keyword selection. This method is used in NSM [HDLMWW90, LC00] and other systems, including NetRanger and NID. The keywords used to detect suspicious activities may include “password”, “shadow”, “permission denied”, “login: guest” and so on. A keyword list can be built by hand. In detection, the method scans and counts the text transmitted between systems or arising in the use of the system, searching for attack-specific keywords. Alarms may be raised based on the keywords matched in a session. This approach is very rigid but simple to understand. It tends to produce many false alarms and will not be effective if the keywords are hidden using encryption techniques. It has been improved by applying neural networks. In this improvement, the number of times each keyword on the list occurs in each session is counted, and neural networks further process these keyword counts. One network weights the keyword counts to provide an estimate of the probability of an attack in the session. A second network is used to classify attack types. Although this procedure is automatic, string matching remains manual in the way its keywords are selected.

ii. Simple rule-based and expert system
Systems of the first type express intrusion signatures as simple rules and use them to decide whether activities are likely to be intrusive. These rules are normally expressed as “IF” and “THEN” pairs and combined with logic operations such as “AND” and “OR”. An example of such a rule is given below.

Rule: If the object in the system is listed as read-restricted AND user 210 is not a system user id, then read is not authorized and an alarm is generated.

The simple rule-based algorithm for detecting intrusion signatures often leads to speedy execution, but it is manual and rigid. The intrusion detection systems using this mechanism are NADIR [HJSMDF93], ASAX [HCMM92], Bro [P88], Stalker [SW94], and Haystack [S88]. Given rules that describe intrusive behavior, an expert system can be used to reason about the security status of the system. It increases the abstraction level of the data by attaching semantics to it. The system can be defined using the concepts of “subjects” and “objects”, “actions” and so on.
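
The two manual techniques above, keyword matching and IF/THEN rules, can be illustrated with a minimal Python sketch. It is not the implementation of NSM or any of the systems cited; the keyword list comes from the examples in the text, while the alarm threshold, the `ReadRequest` type and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Keyword list taken from the examples quoted in the text.
KEYWORDS = ["password", "shadow", "permission denied", "login: guest"]


def count_keywords(session_text, keywords=KEYWORDS):
    """Count how often each attack-specific keyword occurs in a session."""
    text = session_text.lower()
    return {kw: text.count(kw) for kw in keywords}


def keyword_alarm(session_text, threshold=2):
    """Raise an alarm when the total keyword count reaches the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return sum(count_keywords(session_text).values()) >= threshold


@dataclass
class ReadRequest:
    """Hypothetical record of a read attempt (not from the source)."""
    user_id: int
    object_name: str


def read_rule(request, read_restricted, system_user_ids):
    """IF the object is read-restricted AND the user is not a system user,
    THEN the read is not authorized and an alarm is generated."""
    restricted = request.object_name in read_restricted
    is_system_user = request.user_id in system_user_ids
    if restricted and not is_system_user:
        return {"authorized": False, "alarm": True}
    return {"authorized": True, "alarm": False}


if __name__ == "__main__":
    session = "login: guest\ncat /etc/shadow\nPermission denied"
    print(count_keywords(session))
    print(keyword_alarm(session))
    print(read_rule(ReadRequest(210, "/etc/shadow"), {"/etc/shadow"}, {0, 1}))
```

Both functions show why the text calls these methods rigid: the keywords and the rule are fixed by hand, so an attack that avoids the listed keywords or the listed objects passes unnoticed.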

iii. Genetic algorithms
Genetic algorithms are a family of techniques based on evolution and natural selection. They search for the optimal solution by keeping a relatively large population and allowing active interaction among the individuals. Potential solutions to the problem are coded as sequences of bits, characters or numbers, called genes. The sequence is called a chromosome. At the beginning there is a set of these chromosomes and an evaluation function that measures the fitness of each chromosome. The crossover and mutation methods are applied to the chromosomes to produce new individuals. Crossover operates on two or more chromosomes by switching gene segments according to predefined rules. Mutation operates on one chromosome and changes its genes randomly. The selection of chromosomes that survive into the next generation is biased towards chromosomes with high fitness values. This process iterates many times until it reaches some stop criterion. If the problem is well constructed, better chromosomes (solutions) gradually emerge.

NEDAA [SPM99] also discusses the implementation of GA conceptually. It gives the encoding of genes on some simple data. Each section of the chromosome corresponds to one attribute of the record. The crossover and mutation operations are designed accordingly. The chromosome is then used to filter a set of records marked as either anomalous or normal. The fitness function rewards partial matches of training examples that are anomalous; if the chromosome matches a normal record it is penalized. In order to get a set of chromosomes that can match multiple patterns of attacks, a technique called niching is used to find local maxima. Sharing and crowding methods use the concept of similarity to maintain population diversity. This is a very direct implementation schema.

iv. Neural network
Neural networks are algorithms that learn the relationship between input and output vectors and update the node weights to generalize from such vectors. They use an iterative process of adjustments to their internal structures. Each neural network consists of neurons in an input layer, hidden layers, and an output layer. There are directed links between neurons in different layers, and each link has a weight. Usually these links run from the input layer to the hidden layer and then to the output layer, in what is called a feed-forward network. If links in the reverse direction are allowed, the network is called a recurrent network, which is good at handling problems with time series information. Each neuron has an associated non-linear function that calculates its output from its input vector. The most commonly used is the sigmoid function, $f(\mathrm{net}) = \dfrac{1}{1 + e^{-\mathrm{net}}}$, where net is the input sum to the neuron.

A neural network is initialized by randomly assigning small values to all the weights. Each input-output vector pair is applied to the network, and the error between the expected and observed output vectors is computed. Then, based on the least squared error or a maximum likelihood estimate, the backpropagation algorithm is used to update the weight of each link. This process is repeated until a stop condition is reached, such as the error falling below a threshold. A neural network represents a non-linear regression of the information in the training data. In [BCDM98] and [E982], feed-forward neural networks have been used on training data representing suspicious and legitimate events. Input vectors are the selected features from the attributes of the record. The output is labeled 0 if normal and 1 otherwise. After enough iterations the networks are tested on another data set, and different network structures can be compared. One problem of neural network application is that the choice of network structure depends heavily on experience, despite some general guidelines.
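
The training procedure described above (small random initial weights, a sigmoid activation, and backpropagation on the squared error) can be sketched in plain Python. This is a minimal illustration, not the networks used in [BCDM98] or [E982]; the class name `TinyNet`, the toy data set, the learning rate and the number of epochs are all assumptions.

```python
import math
import random


def sigmoid(net):
    """Sigmoid activation: f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + math.exp(-net))


class TinyNet:
    """Feed-forward network with one hidden layer and one output neuron."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random initial weights; the last weight in each row is a bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update for E = 0.5 * (out - target)^2."""
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        self.w_out = [w - lr * delta_out * v for w, v in zip(self.w_out, hb)]
        for j, row in enumerate(self.w_hidden):
            self.w_hidden[j] = [w - lr * delta_hidden[j] * v
                                for w, v in zip(row, xb)]

    def error(self, data):
        return sum(0.5 * (self.forward(x)[1] - t) ** 2 for x, t in data)


# Toy records (assumed): two features per event, label 0 = normal, 1 = suspicious.
DATA = [([0.0, 0.0], 0), ([0.1, 1.0], 0), ([0.2, 0.0], 0),
        ([0.8, 1.0], 1), ([0.9, 0.0], 1), ([1.0, 1.0], 1)]

if __name__ == "__main__":
    net = TinyNet(n_in=2, n_hidden=3, seed=1)
    print("error before training:", net.error(DATA))
    for _ in range(2000):
        for x, t in DATA:
            net.train_step(x, t)
    print("error after training:", net.error(DATA))
```

The stop condition here is a fixed number of epochs; an error threshold, as mentioned in the text, would be an equally reasonable choice.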

3. Analysis
3.1 Artificial immune system (AIS) & Biological immune system (BIS)
In medical science, historically, the term immunity refers to the condition in which an organism can resist disease, more specifically infectious disease. However, a broader definition of immunity is a reaction to foreign (or dangerous) substances. Immunology concerns the study of the immune system and the effects of its operation on the body. The immune system is normally defined in relation to its perceived function: a defense system that has evolved to protect its host from pathogens (harmful micro-organisms such as bacteria, viruses and parasites) [Goldsby et al. 2003]. It comprises a variety of specialized cells that circulate and monitor the body, various extra-cellular molecules, and immune organs that provide an environment for immune cells to interact, mature and respond. The collective action of immune cells and molecules forms a complex network leading to the detection and recognition of pathogens within the body. This is followed by a specific effector response aimed at eliminating the pathogen. This recognition and response process is vastly complicated, with many of the details not yet properly understood.

The biological immune system responds to pathogens in two stages:

1. Primary immune response: this launches a response to invading pathogens.
2. Secondary immune response: the system remembers past encounters, giving a faster response the second time around.

The biological immune system has several features that make it an attractive system to simulate artificially.

Among these features:
• Recognition: it has the ability to distinguish self from non-self cells.
• Feature extraction
• Diversity
• Memory
• Distributed
• Multi-layered
• Adaptive

These features make the biological immune system a powerful system that the body can use to stay healthy without the need for any medical drugs. To achieve these features, the biological immune system relies on several processes:

1. Recognition (self/non-self)
2. Negative selection
3. Clonal selection
4. Immune networks

3.2 Human Immune System vs Artificial immune system
Bio and Artificial Immune mapping

Artificial Immune System      Biological Immune System
----------------------------  ------------------------
Computer network              Human body
Nodes / Files                 Organisms / Organs
Mobile agents                 Antibodies
Software virus                Antigens
Immunity, tolerance           Immunity, suppression
Server                        Neural controller
Look-up table                 Immune memory
Virus signatures              Training patterns
Detectors                     Receptors
Wireless / wired link         Bio connectivity
IP address                    Organ address
Time of virus detection       Time of attack
Replication                   Cloning
Agent / Agent life time       Recovery time
Built-in security             Natural immunity
Agent-based security          Acquired immunity
Dead PC                       Natural death

Fig. 1. The Process By Which T Cells And B Cells Interact With Antigens

Fig. 2. Graphical Representation of the Life Cycle of T Cells and B Cells and Their Interactions with Antigens. From University of Hartford, Department of Mathematics, Epidemics and AIDS web page.

3.3 Motivation
Why is it that engineers are attracted to the immune system for inspiration? The immune system exhibits several properties that engineers recognize as being desirable in their systems. [Timmis & Andrews 2007, Timmis et al. 2008a, de Castro & Timmis 2002a] have identified these as:-

1) Distribution and self-organization
The behavior of the immune system is deployed through the actions of billions of agents (cells and molecules) distributed throughout the body. Their collective effects can be highly complex with no central controller. An organized response emerges as a system-wide property derived from the low-level agent behaviors. These immune agents act concurrently, making immune processes naturally parallelized.

2) Learning, adaptation, and memory
The immune system is capable of recognizing previously unseen pathogens, and thus exhibits the ability to learn. Learning implies the presence of memory, which is present in the immune system, enabling it to ‘remember’ previously encountered pathogens. This is encapsulated by the phenomenon of primary and secondary responses: the first time a pathogen is encountered, an immune response (the primary response) is elicited. The next time that pathogen is encountered, a faster and often more aggressive response is mounted (the secondary response).

3) Pattern recognition
Through its various receptors and molecules, the immune system is capable of recognizing a diverse range of patterns. This is accomplished through receptors that perceive antigenic materials in differing contexts (processed molecules, whole molecules, additional signals, etc.). Receptors of the innate immune system vary little, whilst receptors of the adaptive immune system, such as antibodies and T-cell receptors, are subject to huge diversity.

4) Classification
The immune system is very effective at distinguishing harmful substances (non-self) from the body’s own tissues (self), and directing its actions accordingly. From a computational perspective, it does this with access to only a single class of data, self molecules [Stibor et al. 2005]. Creation of a system that effectively classifies data into two classes, having been trained on examples from only one, is a challenging task.

4. Artificial Immune Systems (AIS) Models
4.1 Different models
Artificial Immune Systems (AIS) emerged in the 1990s as a new branch of Computational Intelligence (CI). A number of AIS models exist, and they are used in pattern recognition, fault detection, computer security, and a variety of other applications that researchers are exploring in science and engineering. Although AIS research has been gaining momentum, the changes in the fundamental methodologies have not been dramatic. Among the various mechanisms of the biological immune system that are explored as AISs, negative selection, the immune network model and clonal selection are still the most discussed models.

There are several models for detecting viruses and malware on a computer:
• Negative selection model
• Partial matching rule
• Anomaly detection model
• Self & Non-self model
• A Hierarchical Artificial Immune Model
• Agent based algorithm

4.1.1 Negative Selection

Fig. 3. Shows negative selection process

Negative selection is a process of selection that takes place in the thymus gland. T cells are produced in the bone marrow and before they are released into the lymphatic system, undergo a maturation process in the thymus gland. The maturation of the T cells is conceptually very simple. T cells are exposed to self-proteins in a binding process. If this binding activates the T cell, then the T cell is killed, otherwise it is allowed into the lymphatic system. This process of censoring prevents cells that are reactive to self from entering the lymph system, thus endowing (in part) the host’s immune system with the ability to distinguish between self and non -self agents.

4.1.2 Artificial Negative Selection
The negative selection algorithm of Forrest et al. is one of the computational models of self/nonself discrimination, first designed as a change detection method. It is one of the earliest AIS algorithms to be applied in various real-world applications. Since it was first conceived, it has attracted many AIS researchers and practitioners and has evolved considerably. In spite of the evolution and diversification of this method, the main characteristics of the negative selection algorithm described by Forrest et al. remain. In the generation stage, detectors are generated by some random process and censored by trying to match them against self samples. Candidates that match are eliminated and the rest are kept as detectors. In the detection stage, the collection of detectors (the detector set) is used to check whether an incoming data instance is self or non-self. If it matches any detector, it is classified as non-self, i.e. an anomaly. This description is simplified, but it conveys the essential idea. Like any other Computational Intelligence technique, different negative selection algorithms are characterized by particular representation schemes, matching rules and detector generation processes.
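The two stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this project: the binary-string representation, the r-contiguous matching rule, the function names and the sample self set are all assumptions chosen for the example.

```python
import random

def r_contiguous_match(a: str, b: str, r: int) -> bool:
    """Assumed matching rule: a and b agree in at least r contiguous positions."""
    run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, length, r, n_detectors, rng, max_tries=100_000):
    """Generation stage: keep random candidates that match no self sample."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Censoring: a candidate that matches any self sample is eliminated.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def is_nonself(sample, detectors, r):
    """Detection stage: a sample matched by any detector is flagged as non-self."""
    return any(r_contiguous_match(sample, d, r) for d in detectors)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
self_set = ["00000000", "00001111", "11110000"]  # hypothetical self samples
detectors = generate_detectors(self_set, length=8, r=4, n_detectors=5, rng=rng)
print(detectors)
print([is_nonself(s, detectors, r=4) for s in self_set])  # self is never flagged
```

Because every detector was censored against the whole self set with the same symmetric matching rule, no self sample can be flagged in the detection stage. How much of the non-self space is covered depends on the number of detectors and on r.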

4.1.3 Partial matching rule

Fig. 4. Shows the partial matching rule

4.1.4 Anomaly detection

4.1.5 Self & Non-self Model

The Negative Selection Algorithm (NSA) is an algorithm for change detection based on the principles of self-nonself discrimination (by T cell receptors) in the immune system. The receptors can detect antigens. The universe of antigens is partitioned into two sets: self (S) and nonself (NS).

Fig. 5. Shows a self & non-self pattern.

Immunologists traditionally describe the problem solved by the immune system as the problem of distinguishing "self" from dangerous "other" (or "nonself") and eliminating the other. Self is taken to be the internal cells and molecules of the body, and nonself is any foreign material, particularly bacteria, parasites, and viruses, as well as degenerated self-cells. Distinguishing between self and nonself in natural immune systems is difficult for several reasons. The main reason is that the components of the body are constructed from the same basic building blocks as nonself, particularly proteins. Proteins are an important constituent of all cells, and the immune system processes them in various ways, including breaking them into fragments called peptides, which are short sequences of amino acids.

The problem of protecting computer systems from malicious intrusions can similarly be viewed as the problem of distinguishing self from nonself. In this case nonself might be an unauthorized user, foreign code in the form of a computer virus or worm, unanticipated code in the form of a Trojan horse, corrupted data, etc. In principle, information security could be specified entirely in terms of an abstract representation of self and nonself as sets of bit strings, which can even be designated as "proteins" and "peptides".

Example: the most elementary defence is the skin, which is the first barrier to infection. Another barrier is physiological, where conditions such as pH and temperature provide inappropriate living conditions for foreign organisms. Once pathogens have entered the body, they are dealt with by the innate immune system and by the acquired immune response system.

Fig. 6. How the self & non-self pattern works in the human immune system

4.1.6 A Hierarchical Artificial Immune Model
The model is composed of two modules: a virus gene library generating module and a self-nonself classification module. The first module is used in the training phase; its function is to generate a detecting gene library from the given training data. The second module is used in the detecting phase; it uses the results from the first module to detect suspicious programs.

Fig. 7. Shows the hierarchical model for virus detection.

5. Proposed work
5.1 Agent Structure

Fig. 8. Agent based flow chart

5.2 Kosko’s Bidirectional Associative Memory (BAM MODEL)
Kosko (1988) extended the Hopfield model by incorporating an additional layer to perform recurrent auto-associations as well as hetero-associations on the stored memories.

Fig. 9. BAM model structure

The network structure of the Bidirectional Associative Memory model is similar to that of the linear associator, but the connections are bidirectional, i.e., $w_{ij} = w_{ji}$ for $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$. Also, the units in both layers serve as both input and output units depending on the direction of propagation. Propagating signals from the X layer to the Y layer makes the units in the X layer act as input units while the units in the Y layer act as output units. The same is true for the other direction, i.e., propagating from the Y layer to the X layer makes the units in the Y layer act as input units while the units in the X layer act as output units. Fig. 9 illustrates the BAM architecture. Just like the linear associator and the Hopfield model, encoding in BAM can be carried out by using

$$W_k = X_k^T Y_k$$

to store a single associated pattern pair and

$$W = \sum_{k} W_k = \sum_{k} X_k^T Y_k$$

to simultaneously store several associated pattern pairs. After encoding, the network can be used for decoding. In BAM, decoding involves reverberating distributed information between the two layers until the network becomes stable.

In decoding, an input pattern can be applied either on the X layer or on the Y layer. When given an input pattern, the network propagates it to the other layer, allowing the units in the other layer to compute their output values. The pattern produced by the other layer is then propagated back to the original layer, whose units compute their output values in turn. The new pattern produced by the original layer is again propagated to the other layer. This process is repeated until further propagations and computations do not change the states of the units in either layer; ideally, the final pattern pair is one of the stored associated pattern pairs. The final pattern pair produced by the network depends on the initial pattern pair and the connection weight matrix.

Several modes can be used to update the states of the units in both layers, namely synchronous, asynchronous, and a combination of the two. In the synchronous updating scheme, the states of the units in a layer are updated as a group prior to propagating the output to the other layer. In asynchronous updating, units in both layers are updated in some order and the output is propagated to the other layer after each unit update. Lastly, in synchronous-asynchronous updating, there can be subgroups of units in each layer that are updated synchronously while units in each subgroup are updated asynchronously.

Since the BAM also uses the traditional Hebb's learning rule to build the connection weight matrix that stores the associated pattern pairs, it too has a severely low memory capacity. The BAM storage capacity for reliable recall was given by Kosko (1988) to be less than min(m, n), i.e., the minimum of the dimensions of the pattern spaces.
A more recent study by Tanaka et al. (2000) on the relative capacity of the BAM using statistical physics reveals that for a system having n units in each of the two layers, the capacity is around 0.1998 n.

5.2.1 Operation of BAM
1) Let there be N target pairs $\{(A_1, B_1), (A_2, B_2), \ldots, (A_i, B_i), \ldots, (A_N, B_N)\}$, where $A_i = (a_{i1}, a_{i2}, \ldots, a_{in})$ and $B_i = (b_{i1}, b_{i2}, \ldots, b_{ip})$, and each $a_{ij}$, $b_{ij}$ is either in the ON state or in the OFF state.

2) In binary mode ON = 1 and OFF = 0; in bipolar mode ON = 1 and OFF = -1. With $X_i$ and $Y_i$ denoting the bipolar forms of $A_i$ and $B_i$, we can frame the correlation matrix

$$M = \sum_{i=1}^{N} X_i^T Y_i$$

To retrieve the nearest pair $(A_i, B_i)$ given any pair $(\alpha, \beta)$, the recall equations are as follows.

3) Starting with $(\alpha, \beta)$ as the initial condition, we determine a finite sequence $(\alpha', \beta'), (\alpha'', \beta''), \ldots$ until an equilibrium point $(\alpha_F, \beta_F)$ is reached. Here

$$\beta' = \phi(\alpha M) \tag{1}$$

$$\alpha' = \phi(\beta' M^T) \tag{2}$$

$$\phi(F) = G = (g_1, g_2, \ldots, g_n) \tag{3}$$

$$F = (f_1, f_2, \ldots, f_n) \tag{4}$$

$$g_i = \begin{cases} 1 & \text{if } f_i > 0 \\ 0 \text{ (binary) or } -1 \text{ (bipolar)} & \text{if } f_i < 0 \\ \text{previous } g_i & \text{if } f_i = 0 \end{cases}$$

5.2.2 Example
Let $A_i$ be the legal code and $B_j$ be the pseudo code:

A1 = (100001), B1 = (11000)
A2 = (011000), B2 = (10100)
A3 = (001011), B3 = (01110)

Converting these to bipolar form:

$X_1 = (1, -1, -1, -1, -1, 1)$, $Y_1 = (1, 1, -1, -1, -1)$
$X_2 = (-1, 1, 1, -1, -1, -1)$, $Y_2 = (1, -1, 1, -1, -1)$
$X_3 = (-1, -1, 1, -1, 1, 1)$, $Y_3 = (-1, 1, 1, 1, -1)$

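Calculate the correlation matrix:

$$M = X_1^T Y_1 + X_2^T Y_2 + X_3^T Y_3 = \begin{bmatrix} 1 & 1 & -3 & -1 & 1 \\ 1 & -3 & 1 & -1 & 1 \\ -1 & -1 & 3 & 1 & -1 \\ -1 & -1 & -1 & 1 & 3 \\ -3 & 1 & 1 & 3 & 1 \\ -1 & 3 & -1 & 1 & -1 \end{bmatrix}$$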

let us suppose that we start with α = X3

$$\alpha M = (-1, -1, 1, -1, 1, 1)\, M = (-6, 6, 6, 6, -6)$$

$$\beta' = \phi(\alpha M) = (-1, 1, 1, 1, -1)$$

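$$\beta' M^T = (-5, -5, 5, -3, 7, 5), \qquad \alpha' = \phi(\beta' M^T) = (-1, -1, 1, -1, 1, 1)$$

$$\alpha' M = (-1, -1, 1, -1, 1, 1)\, M = (-6, 6, 6, 6, -6), \qquad \beta'' = \phi(\alpha' M) = (-1, 1, 1, 1, -1) = \beta'$$

Since $\beta'' = \beta'$, the network has reached equilibrium with $\alpha_F = \alpha' = X_3$ and $\beta_F = \beta' = Y_3$.

If the recalled pair equals the stored pair $(X_3, Y_3)$, the pseudo code is associated with the legal code and is accepted as legal code. If the recalled pair differs from $(X_3, Y_3)$, the pseudo code is treated as viral code. Here the recalled pair is $(X_3, Y_3)$, so we conclude that $B_j$ (here $B_3$) is legal code.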
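The worked example can be checked with a short Python sketch. It is only an illustration of the encoding and recall equations of Section 5.2.1 in bipolar mode with synchronous updates; the function names and the all-ones starting state for β (used only if a net input is exactly zero) are assumptions made for this sketch.

```python
def to_bipolar(bits):
    """Map a binary string such as '100001' to a bipolar list (0 -> -1, 1 -> +1)."""
    return [1 if b == "1" else -1 for b in bits]

def correlation_matrix(xs, ys):
    """M = sum_k X_k^T Y_k, built as a sum of outer products."""
    m = [[0] * len(ys[0]) for _ in xs[0]]
    for x, y in zip(xs, ys):
        for i, xi in enumerate(x):
            for j, yj in enumerate(y):
                m[i][j] += xi * yj
    return m

def threshold(f, previous):
    """Bipolar threshold: +1 if f_i > 0, -1 if f_i < 0, previous value if f_i == 0."""
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(f, previous)]

def forward(alpha, m):
    """Row vector times matrix: alpha M."""
    return [sum(a * row[j] for a, row in zip(alpha, m)) for j in range(len(m[0]))]

def backward(beta, m):
    """Row vector times transposed matrix: beta M^T."""
    return [sum(b * w for b, w in zip(beta, row)) for row in m]

def recall(alpha, m, max_iters=100):
    """Bounce between the two layers until the pair (alpha, beta) stops changing."""
    beta = [1] * len(m[0])  # assumed starting state, only used when a net input is 0
    for _ in range(max_iters):
        new_beta = threshold(forward(alpha, m), beta)
        new_alpha = threshold(backward(new_beta, m), alpha)
        if new_alpha == alpha and new_beta == beta:
            break
        alpha, beta = new_alpha, new_beta
    return alpha, beta

A = ["100001", "011000", "001011"]
B = ["11000", "10100", "01110"]
X = [to_bipolar(a) for a in A]
Y = [to_bipolar(b) for b in B]
M = correlation_matrix(X, Y)

alpha_f, beta_f = recall(X[2], M)
for row in M:
    print(row)
print(alpha_f == X[2], beta_f == Y[2])  # True True
```

Starting from α = X3, the sketch settles on the pair (X3, Y3), in agreement with the hand calculation.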

6. Implemented algorithms
6.1 Agent-based code to detect malware
initializeAgent() {
    Receive agent from server
    Load it into the target machine
}

Agent() {
    // scan memory with the help of the agent for known malware
    for each process in memory {
        signature = extractSignature(processID)
        Check the process against the "self" & "non-self" database entries
        if process is in the "non-self" database {
            status = PreventiveAction(processID, signature)
        }
    }
    // after assurance that all processes in memory are "self"
    while agentAge < criticalAge {
        Monitor all processes loaded in memory
        if loaded process is not self {
            Run agent to scan the process
            if it is a known malware {
                terminate process and update log file
            } else {
                status = PreventiveAction(processID, signature)
            }
        }
    }
}

6.2 Code to distinguish viral code from legal code
let Ni = Legal_code
let Nj = Pseudo_code
let No = Viral_code
create a training set comprised of self patterns
initially Ni != Nj and Ni != No
for (i = 0; i < 10; i++)
    for (j = 0; j <= 10; j++)
        compare Ni with Nj and No using the sliding window principle
        if Ni matches Nj and Ni mismatches No then
            Nj = legal code and No = viral code
End
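One possible reading of the sliding window step is sketched below in Python. It is an assumption-laden illustration rather than the project's actual procedure: the window width, the bit-string samples, the function names and the "every window must be known" decision rule are all hypothetical.

```python
def windows(code: str, width: int) -> set:
    """All contiguous substrings of the given width (the sliding window)."""
    return {code[i:i + width] for i in range(len(code) - width + 1)}

def self_fraction(candidate: str, self_patterns: set, width: int) -> float:
    """Fraction of the candidate's windows that also occur in the self patterns."""
    cand = [candidate[i:i + width] for i in range(len(candidate) - width + 1)]
    if not cand:
        return 0.0
    return sum(w in self_patterns for w in cand) / len(cand)

def classify(candidate: str, self_patterns: set, width: int, min_fraction: float = 1.0) -> str:
    """Assumed rule: 'legal' only if enough windows are known self patterns."""
    if self_fraction(candidate, self_patterns, width) >= min_fraction:
        return "legal"
    return "viral"

legal_code = "0110100110010110"          # hypothetical self sample (Ni)
self_patterns = windows(legal_code, 4)   # training set of self patterns

print(classify("1001100101", self_patterns, 4))  # legal: every window occurs in Ni
print(classify("0110111111", self_patterns, 4))  # viral: window 0111 never occurs in Ni
```

Lowering `min_fraction` would make the rule more tolerant of small differences, at the cost of accepting more unknown code as legal.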

7. Discussion
Kosko's BAM model is used for pattern recognition, but it cannot guarantee 100% correct recall of a particular pattern or of several patterns.

7.1 Example
Consider the following pattern pairs. Let $A_i$ be the legal code and $B_j$ be the pseudo code:

A1 = (000111001), B1 = (010000111)
A2 = (111001110), B2 = (100000001)
A3 = (110110101), B3 = (101001010)

Converting these to bipolar form:

$X_1 = (-1, -1, -1, 1, 1, 1, -1, -1, 1)$, $Y_1 = (-1, 1, -1, -1, -1, -1, 1, 1, 1)$
$X_2 = (1, 1, 1, -1, -1, 1, 1, 1, -1)$, $Y_2 = (1, -1, -1, -1, -1, -1, -1, -1, 1)$
$X_3 = (1, 1, -1, 1, 1, -1, 1, -1, 1)$, $Y_3 = (1, -1, 1, -1, -1, 1, -1, 1, -1)$

Calculate the correlation matrix:

$$M = X_1^T Y_1 + X_2^T Y_2 + X_3^T Y_3$$

Let us suppose that we start with $\alpha = X_2$:

$$\alpha M = (13, -13, -5, 1, 1, -5, -13, -19, 5)$$

$$\beta' = \phi(\alpha M) = (1, -1, -1, 1, 1, -1, -1, -1, 1)$$

$$\beta' M^T = (5, 5, 11, -11, -11, 5, 5, 11, -11)$$

$$\alpha' = \phi(\beta' M^T) = (1, 1, 1, -1, -1, 1, 1, 1, -1) = X_2$$

$$\alpha' M = (13, -13, -5, 1, 1, -5, -13, -19, 5)$$

$$\beta'' = \phi(\alpha' M) = (1, -1, -1, 1, 1, -1, -1, -1, 1) = \beta'$$

Hence the cycle terminates with $\alpha_F = \alpha' = X_2 = (1, 1, 1, -1, -1, 1, 1, 1, -1)$ and $\beta_F = \beta' = (1, -1, -1, 1, 1, -1, -1, -1, 1)$. However, this is an incorrect pair to recall, because $\beta_F \neq Y_2$. Computing the energy function $E = -\alpha M \beta^T$ for $(X_2, Y_2)$ and $(\alpha_F, \beta_F)$ yields

$$E_2 = -X_2 M Y_2^T = -71$$

$$E_F = -\alpha_F M \beta_F^T = -75$$

For a correct recall $E_2$ should match $E_F$, but here it does not: the network settles in a state with lower energy than the stored pair $(X_2, Y_2)$. So this model cannot give a 100% perfect result.
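The failed recall and the two energy values can be reproduced with the following Python sketch. It assumes bipolar mode with synchronous updates; the helper names and the all-ones starting state for β (which only matters if a net input is exactly zero) are assumptions of the sketch.

```python
def bipolar(bits):
    """Binary string -> bipolar list (0 -> -1, 1 -> +1)."""
    return [1 if b == "1" else -1 for b in bits]

A = ["000111001", "111001110", "110110101"]
B = ["010000111", "100000001", "101001010"]
X = [bipolar(a) for a in A]
Y = [bipolar(b) for b in B]

# Correlation matrix M = sum_k X_k^T Y_k (9 x 9 here)
M = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(9)] for i in range(9)]

def phi(f, previous):
    """Bipolar threshold that keeps the previous value when f_i == 0."""
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(f, previous)]

def times_m(alpha):
    """alpha M"""
    return [sum(alpha[i] * M[i][j] for i in range(9)) for j in range(9)]

def times_mt(beta):
    """beta M^T"""
    return [sum(beta[j] * M[i][j] for j in range(9)) for i in range(9)]

def energy(alpha, beta):
    """E = -alpha M beta^T"""
    return -sum(v * b for v, b in zip(times_m(alpha), beta))

alpha, beta = X[1], [1] * 9  # start from alpha = X2
while True:
    new_beta = phi(times_m(alpha), beta)
    new_alpha = phi(times_mt(new_beta), alpha)
    if (new_alpha, new_beta) == (alpha, beta):
        break
    alpha, beta = new_alpha, new_beta

print(alpha == X[1], beta == Y[1])  # True False: alpha is X2 but beta is not Y2
print(energy(X[1], Y[1]))           # -71
print(energy(alpha, beta))          # -75
```

The stored pair (X2, Y2) has energy -71, while the state the network actually reaches has energy -75, which is why recall drifts away from Y2.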

8. Current & Future Scope
With this approach we can detect faults, abnormalities, intrusions and erroneous elements, and scientists are continuing to work on it. In future it can help to detect viral code in a computer system and to find missing words (spelling correction), etc.

9. CONCLUSION
In my approach, a hetero-associative memory matching method is used to find the similarity between genes. I store different virus codes and legal codes together to keep all the information at the individual level, taking advantage of the relevance between the different signatures extracted from an individual. Finally, the classification decision is based on overall behavior, which reduces information loss to a great extent. The model can effectively and efficiently recognize obfuscated viruses and detect new variants of known viruses and some unknown viruses. Although these modifications have been made, the model still has its own vulnerabilities: it cannot maintain the diversity of the codes in the virus library. Some artificial intelligence algorithms, like the immune network model or Kosko's BAM (bidirectional associative memory) algorithm, could be used to address this in future.

REFERENCES
[1] P. S. Deng, J. Wang, W. Shieh et al. “Intelligent automatic malicious code signatures extraction”, IEEE 37th Annual 2003 International Carnahan Conference on Security Technology, pp. 600-603.
[2] K. P. Anchor, P. D. Williams, G. H. Gunsch et al. “The Computer Defense Immune System: Current and Future Research in Intrusion Detection”, Evolutionary Computation, 2002, pp. 1027-1032.
[3] J. O. Kephart. “A Biologically Inspired Immune System for Computers”, in Artificial Life IV, Proceedings of the Fourth International Workshop on the Synthesis and Simulation of Living Systems, 1994, pp. 130-139.
[4] S. Forrest, A. S. Perelson, L. Allen et al. “Self-Nonself Discrimination in a Computer”, Security and Privacy, Oakland CA, pp. 202-212, 1994.
[5] P. D’haeseleer, S. Forrest, P. Helman. “An immunological approach to change detection: algorithms, analysis, and implications”, Proceedings of IEEE Symposium on Research in Security and Privacy, Oakland, CA, pp. 110-119, May 1996.
[6] H. Lee, W. Kim, M. Hong. “Artificial Immune System against Viral Attack”, ICCS 2004, Lecture Notes in Computer Science 3037, pp. 499-506, 2004.
[7] K. S. Edge, G. B. Lamont, R. A. Raines. “A retrovirus inspired algorithm for virus detection & optimization”, 8th Annual Genetic and Evolutionary Computation Conference, Seattle WA, 2006, pp. 103-110.
[8] T. Li. Computer Immunology, Beijing: Publishing house of electronics industry, pp. 187-191, 2004.
[9] D. Dasgupta, N. Attoh-Okine. “Immunity-Based Systems: A survey”, 1997 IEEE International Conference on Systems, Man, and Cybernetics, Computational Cybernetics and Simulation, 1997, pp. 369-374.
[10] P. K. Harmer, P. D. Williams, G. H. Gunsch et al. “An Artificial Immune System Architecture for Computer Security Applications”, IEEE Transactions on Evolutionary Computation, vol. 6(3), pp. 252-280, 2002.
[11] M. D. Preda, M. Christodorescu, S. Jha et al. “A Semantics-Based Approach to Malware Detection”, 34th Annual Symposium on Principles of Programming Languages, vol. 42(1), pp. 377-388, 2007.
[12] O. Henchiri, N. Japkowicz, J. Nathalie. “A Feature Selection and Evaluation Scheme for Computer Virus Detection”, Sixth International Conference on Data Mining, Hong Kong, China, 2006, pp. 891-895.
[13] Beer, R.D., Chiel, H.J. and Sterling, S., A Biological Perspective on Autonomous Agent Design, In Robotics and Autonomous Systems, Vol. 6, (1990), 169-186.
[14] Dasgupta, D., Artificial Immune Systems and Their Applications, Heidelberg, Germany: Springer-Verlag, 1999.
[15] Dasgupta, D., An artificial immune system as a multi-agent decision support system, Proc. IEEE Int. Conf. Systems, Man and Cybernetics, (Oct. 1998), pp. 3816-3820.
[16] David Kotz and Robert S. Gray, Mobile Agents and the Future of the Internet, ACM Operating Systems Review, (Aug. 1999), 7-13.
[17] Desel, J., and Reisig, W., Place/Transition Petri Nets. In Lecture on Petri nets I: Basic Models, vol 1491 of Lecture Notes in Computer Science, Springer-Verlag, 1998.
[18] Forrest S., Perelson A.S., Allen L., and Cherukuri, R., Self-Nonself Discrimination in a Computer, Proceedings of the IEEE Symposium on Research in Security and Privacy (Los Alamos, CA: IEEE Computer Society Press), 1994.
[19] Goel, S. and Bush S.F., Biological Models of Security for Virus Propagation in Computer Networks, ;login:, vol. 29, no. 6, (Dec. 2004), 49-56.
[20] Kaariboga Mobile Agents (Sep. 2003). [Online]. Available: http://www.projectory.de/kaariboga/index
[21] Kephart, J.O., Biologically Inspired Defenses against Computer Viruses, Proceedings of IJCAI ’95, (1995) 985-996.
[22] Paul K. Harmer et al., An Artificial Immune System Architecture for Computer Security Applications, IEEE Transactions on Evolutionary Computation, vol. 6, no. 3, (Jun. 2002), 252-280.
[23] Virus Information and Statistics, [Online]. Available: http://www.avira.com/en/threats/
