Why Anti-virus Products Slow Down Your Machine?
Wei Yan Trend Micro, Inc. U.S.A. wei [email protected] Nirwan Ansari New Jersey Institute of Technology U.S.A. [email protected]
Abstract—Customers often complain that anti-virus software bogs down their computers by consuming much of the PC's memory and resources. With the popularity and variety of zero-day threats on the Internet, security companies have to keep inserting new virus signatures into their databases. However, is the growing size of the signature file the sole reason that computers slow to a crawl during a virus scan? This paper outlines three other reasons for the slowdown of software-protected computers which are not directly related to the signature file. First, the rising cost of de-obfuscating binary payloads through emulation requires anti-virus software to spend far more time scanning a packed file than an unpacked one. Second, the New Technology File System (NTFS) causes self-similarity in file index searching and data block accessing; even if file sizes fit a log-normal distribution, there are still many "spikes" of high virus-scanning latency which cannot be ignored. Last but not least, temporal changes in file size, file type, and storage capacity in modern operating systems are slowing down virus scans. The paper also discusses a cloud-based security infrastructure for deploying light-weight and fast anti-virus products.

I. INTRODUCTION

It is important to understand that the current threat landscape is changing: security vendors capture a large volume of new malware each day. Why is this happening? Online malware generators enable script kiddies to easily create new viruses and rootkits, and they challenge anti-virus (AV) pattern update schemes. For example, Panda Security (www.pandasecurity.com), a security company, detected more samples in 2008 than in the preceding 17-plus years combined. These threats came from software, appliances, and web services. This surge of malware infringing on Internet security creates urgent demand for security products.

Generally speaking, an AV scanner is a software application that checks whether a computer has been infected by spyware, rootkits, or other malware. To search an executable file for viruses, a scanner typically scans segments at certain offsets for known signatures. It also automatically checks for threats in attachments received through email, and in any file operation. The signature file encodes prior knowledge, and the scanner detects computer viruses via a scan engine. Moreover, automatic updates immunize users against new virus outbreaks.

Increasingly, the first thing computer users do after reinstalling an operating system is to install security software. They may then notice slowdowns in their machines; this is one of the top complaints about security software. Other discontents include, for example, long scan times and false positives. Fortunately, industry companies have acknowledged these complaints and improved their security applications. Symantec (www.symantec.com) successfully overhauled its system to make Norton products run faster in 2006.

A. Is AV dead?

Traditional emergency response involves malware collection, signature generation, and signature database updating. However, owing to the flood of malware, security companies usually receive thousands of suspicious samples daily from honeypots and customer submissions. It is very time consuming and resource intensive to analyze these samples manually and generate signatures. Is signature-based virus detection technology dead? There are concerns that this approach cannot catch up with the flood of new viruses, given that security vendors usually update virus signatures every hour, or even every twenty minutes. However, most customers are not willing to remove security software from their machines because they still consider these applications worthwhile and must-have. Signature-based virus recognition has been used for more than two decades, and it is one of the most cost-effective and mature methodologies to detect viruses while keeping a low false-positive rate. The debate still goes on.

One alternative solution is whitelisting. Can the whitelisting paradigm replace blacklisting? Blacklisting stores hash values or fingerprints of malicious programs, whereas whitelisting lists benign applications and system files. Almost all AV products use the blacklisting method, and the blacklist is effectively the signature file. On the contrary, whitelisting-based tools only allow operating systems to access benign files and websites, and always block non-listed names. At the time of writing, there are millions of malware samples listed in blacklists, and tens of millions of files in whitelists. If security companies are already working around the clock to cope with new blacklist samples, whitelisting protection might not be workable, because even more benign files appear each day.

B. Why does my machine slow down?

The signature file can be considered a malicious-fingerprint database which is updated frequently to cover the latest threats. It works with the scan engine to detect threats.
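To make the blacklisting model concrete, the following Python sketch shows a minimal signature scan of the kind described above: it loads a set of byte-pattern signatures and checks a file's contents against them. The signature values and file paths are invented for illustration; real scan engines match at specific offsets, use wildcards and hashes, and are far more optimized.

```python
import binascii
from pathlib import Path

# Hypothetical signature database: name -> raw byte pattern.
# Real signature files are binary, versioned, and far richer than this.
SIGNATURES = {
    "EICAR-Test-Fragment": b"X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR",
    "Demo.Dropper.A": binascii.unhexlify("deadbeefcafebabe"),
}

def scan_file(path: Path) -> list:
    """Return the names of all signatures found anywhere in the file."""
    data = path.read_bytes()
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

if __name__ == "__main__":
    for f in Path(".").glob("*.exe"):
        hits = scan_file(f)
        if hits:
            print(f"{f}: detected {', '.join(hits)}")
```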


As malware becomes more complicated, the signature file grows larger and needs to support various types of protection, including detection, cleaning, and recovery. In addition, these signatures are loaded into memory. Normally, the scan engine takes milliseconds to scan a file by traversing the signature file, so it is not surprising that a big signature file can drag down a computer considerably. However, users tend to exaggerate the downside of the signature file. They take for granted that a large signature file is the main reason why AV products bog down their computers, and as a result they often blame the security industry's hesitation to adopt new technology to shrink the signature size. The PC Spy (http://www.thepcspy.com/read/what slows windows down/) ran an interesting test showing how popular software applications slow down Windows. Besides anti-virus software, fonts, Yahoo's and AOL's chat programs, .NET, Visual Studio, and VMware all slowed down computers considerably. That work even showed that 1000 fonts had a bigger negative effect on the Windows load time than most AV products. If the size of the signature file could be reduced to what it was a few years ago, would the PC run almost as fast as it did before?

In this paper, we outline three other reasons for slow virus scans that are not directly related to the size of the signature file:

1) To evade detection, modern malware can obscure its fingerprints and make itself undetectable. Portable Executable (PE) packers have become the favorite binary tools for malware authors to apply code obfuscation. Thus, it is essential for AV scanners to support emulation, which can safely analyze obfuscated malware and then unpack its payload. Yan et al. [1] discussed three approaches to cope with packers. However, malware emulation is slow and expensive because it lets an executable file run within a virtual environment implemented in software rather than hardware.

2) By hiding deep inside the operating system using rootkit technology, modern malware can completely bypass personal firewalls and anti-virus scanners [2]. In this paper, we demonstrate how low-level file operations propagate self-similarity. This burstiness is caused by Microsoft's New Technology File System (NTFS) data accessing algorithm, and it gives rise to large scanning latencies.

3) The study in [3] showed that file size, file number, and storage capacity have increased over the past years. Accordingly, security products which scan data proportional to the number and size of files will take much longer.

The rest of the paper is organized as follows. Section 2 describes code obfuscation, unpacking, and emulation. In Section 3, the rootkit hiding problem in the NTFS file system is discussed, followed by the low-level file scanning work flow. Temporal changes in file size, file type, and storage capacity in modern operating systems are discussed in Section 4. Section 5 presents the concluding remarks.

II. UNPACKING AND EMULATION

A. Code obfuscation

Security researchers face a great challenge in coping with the complexity of malware. Microsoft Windows is without doubt the most heavily attacked platform nowadays; malware is written for it far more commonly than for Linux or Unix. A Portable Executable (PE) file is an executable format used mostly by Microsoft Windows. Reference [4] provides more information about the PE format. A PE file comprises various sections and headers which describe the section data, import table, export table, resources, etc. It starts with the DOS header and the PE header. The PE header contains general file properties, such as the number of sections, machine type, and time stamp. Another important header is the optional header, which includes a set of important information segments. The optional header is followed by the section table header, which summarizes each section's raw size, virtual size, section name, etc. Finally, at the end of the PE file is the section data, which contains the file's Original Entry Point (OEP), i.e., where file execution begins.

Conventional virus scanners search executable files for pre-defined fingerprints stored in the signature database. Unfortunately, this method can easily be defeated by packed or obfuscated viruses. For example, hackers can use packers, which are tools that compress and encrypt original payloads in advance and then restore them when loaded into memory, to hide malicious signatures from detection. This paradigm is referred to as code obfuscation. Code obfuscation has evolved from simple compression and encryption to polymorphism and metamorphism. Currently, packers have become the favorite toolkits for bypassing security applications; Armadillo (www.siliconrealms.com), Themida (www.oreans.com), and Obsidium (www.obsidium.de) are all commonly used packers. Therefore, it is vital for security products to be able to unpack and inspect the original payloads hidden inside packed programs. Unpacking is the process of stripping packer layers and restoring the original contents. Normally, a piece of software called an emulator or sandbox is developed to construct a virtual environment where packed programs can be "executed" until they are fully decrypted or unpacked.
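As a rough illustration of how a scanner might flag a file as likely packed before falling back to emulation, the following Python sketch uses the third-party pefile module to inspect section names and entropies. The entropy threshold and the list of well-known packer section names are heuristic choices made for this example, not part of any particular product.

```python
import pefile  # third-party: pip install pefile

# Section names commonly left behind by packers (illustrative, not exhaustive).
PACKER_SECTION_NAMES = {b"UPX0", b"UPX1", b".aspack", b".themida", b".vmp0"}
HIGH_ENTROPY = 7.2  # bits/byte; compressed or encrypted data sits near 8.0

def looks_packed(path: str) -> bool:
    """Heuristically decide whether a PE file is packed or obfuscated."""
    pe = pefile.PE(path, fast_load=True)
    for section in pe.sections:
        name = section.Name.rstrip(b"\x00")
        if name in PACKER_SECTION_NAMES:
            return True
        # Very high entropy suggests compressed or encrypted payload data.
        if section.get_entropy() > HIGH_ENTROPY:
            return True
    return False

if __name__ == "__main__":
    print(looks_packed("sample.exe"))
```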

B. Unpacking Obsidium

Reverse Engineering (RE) has become an important approach to analyzing a program's logic flow and structure, such as its system call functions. However, RE is a time-consuming process of discovering the specifications of a system or program by analyzing its outputs and internal logic. Obsidium is a Windows-based packer which encrypts PE files with advanced protection mechanisms. Its unpacking process involves four consecutive steps: anti-debugging checking, memory-page decryption, import table rebuilding, and jumping to the OEP.

Obsidium calls quite a few functions to detect debuggers, such as CheckRemoteDebuggerPresent(), CreateToolhelp32Snapshot(), FindWindowA(), IsDebuggerPresent(), and UnhandledExceptionFilter(). Runtime decryption is used by Obsidium as the encryption engine; specifically, Obsidium performs the decryption at the memory-page level. After decrypting a memory page and executing the corresponding assembly instructions, Obsidium wipes that page out right away and decrypts the next one. Therefore, it is very hard to dump the whole original code without debugging step by step. The import table rebuilding and OEP jumping stages are similar to those of other complicated packers. For the import table rebuilding, Obsidium inserts a large amount of junk code to defend against RE. It also applies six different types of protection to hide the import table data. Moreover, it takes advantage of a fake-OEP trick by stealing a segment of code around the original OEP and storing it elsewhere. Hence, the scan engine has to discover the stolen code first, and then patch it back to rebuild the original OEP.

C. Emulation speed

Despite its power and potential, emulation cannot be used heavily by AV products, mainly because of the complexity of implementing a fully virtual environment, and also because of its speed tradeoff. The emulator used by the scan engine is software that simulates the CPU hardware without affecting the actual computer environment, so that the computer will not be infected with viruses. The core problem is that emulation is very slow, because the emulator has to interpret assembly instructions one by one. Unfortunately, more and more new malware is packed or polymorphic, mutating as it spreads so that no two copies share the same code. To perform de-obfuscation, an emulator first needs to parse the PE internal structures to locate the OEP. Then it goes through the decompressing or decrypting routines to dump the original instructions into memory and execute them. Compared with the milliseconds spent scanning an unpacked malware sample, the emulator sometimes needs up to minutes to emulate a packed file; this is not tolerable for on-the-fly protection. If the scanner emulated an obfuscated sample for only milliseconds, it might not collect enough information to determine whether the sample is malicious. On the other hand, if a suspicious sample is given seconds or even minutes for a "wild run", desktop machines slow down dramatically. Therefore, even if the size of the signature file remained the same as before, the scan time would not be as fast as before.
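To see why instruction-by-instruction emulation is so much slower than native execution, consider the minimal interpreter sketch below. It executes a made-up three-opcode bytecode; every "instruction" pays a dispatch, bounds check, and state update at the host-language level, which is the same kind of per-instruction overhead a real x86 emulator pays (in a more elaborate form) on every iteration of an unpacking loop. The opcode set and the program are invented purely for illustration.

```python
# Toy bytecode: (opcode, operand) pairs. A real unpacking stub executes
# millions of such steps (XOR-decrypt loops, memory copies), each paying
# interpretation overhead instead of running natively on the CPU.
LOAD, ADD, JNZ = 0, 1, 2

def emulate(program, steps_limit=1_000_000):
    acc, pc, executed = 0, 0, 0
    while pc < len(program) and executed < steps_limit:
        op, arg = program[pc]            # fetch
        if op == LOAD:                   # decode + execute
            acc = arg
        elif op == ADD:
            acc += arg
        elif op == JNZ:
            if acc != 0:
                pc = arg
                executed += 1
                continue
        pc += 1
        executed += 1
    return acc, executed

# A "decryption-like" countdown loop: LOAD 100000; ADD -1; JNZ back to the ADD.
program = [(LOAD, 100_000), (ADD, -1), (JNZ, 1)]
print(emulate(program))   # roughly 200,000 interpreted steps for one tiny loop
```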

III. VIRUS SCANNING IN NTFS FILE SYSTEMS

Popular file systems include the New Technology File System (NTFS) for Windows, the Third Extended Filesystem (ext3) for Linux, and the Hierarchical File System Plus (HFS+) for Mac OS. Since Microsoft Windows is the dominant and most heavily attacked operating system, the scope of this paper is limited to NTFS.

A. Rootkit

Malware authors usually prevent the AV engine from detecting their malicious code by hiding their files in the infected system. A rootkit is a technique for manipulating the file system and system calls so that certain files become invisible or inaccessible to regular users and AV scanners. To achieve data hiding, a rootkit uses Application Programming Interface (API) hooking at both the user level and the kernel level. By intercepting system calls, replacing them with faked ones, and altering execution paths, a rootkit is able to hide files [2]. The presence of a rootkit compromises the reliability and security of the operating system, because attackers can modify system environment variables and hide malicious code in hidden files and processes.

Since a rootkit works by intercepting API calls, a high-level view obtained through Windows APIs will differ from the low-level view (accessing disk data without calling APIs) if a rootkit resides in the system. The mechanism of rootkit detection is therefore to list file discrepancies by comparing the results of high-level API scanning with low-level scanning. Understanding how NTFS fetches disk data at the low level is thus critical for developing such a rootkit scanner and integrating it into security software.

B. NTFS data accessing

The use of RE on NTFS structures and principles has been addressed by several researchers. For example, the Linux-NTFS project [5] was developed to create a new Linux kernel driver for NTFS, user-space utilities, and a function library. Nagar [6] presented the details of writing kernel-mode Windows NT file-system drivers. Files play a key role in Windows systems and constitute the largest percentage of hidden objects in NTFS. In this section, NTFS file accessing mechanisms and the low-level file scanning work flow are introduced.

Everything on an NTFS volume exists as a file record. NTFS uses a B-tree to index file record data, which allows efficient insertion, retrieval, and removal of those file records. For example, NTFS can quickly list all files' sizes, modified dates, and types in order under a certain directory without accessing their real data. When an NTFS volume is formatted, metadata files are created, including the Master File Table ($MFT), $BITMAP, $BOOT, etc. $MFT contains the descriptions of the metadata and the attributes of all files and directories. Every file record in $MFT stands for a file or directory, and if a file is small enough, its actual data is stored directly in the record itself; otherwise, a file index is saved instead. A file's attributes, both resident and non-resident, can be accessed by traversing the MFT. The most important attributes include the file name, data, index root, and index allocation attributes. The file name attribute contains both the file's long name and its MS-DOS short name. NTFS allows multiple data attributes in one file record, which makes the data attribute the most suitable place for a hacker to hide malicious files. Finally, the index root and index allocation attributes are used to implement folders and other indices. Since NTFS uses B-trees to access files, directories are indexed by index entries for quick searching.
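The cross-view comparison described above can be expressed very compactly. The sketch below is a simplified illustration rather than a working rootkit detector: it compares a high-level directory listing from the Windows API (via Python's os module) against a hypothetical low-level listing obtained by parsing the volume directly; names present only in the low-level view are candidates for rootkit-hidden files. The low_level_listdir helper is assumed to exist and would be implemented along the lines of the work flow in Fig. 1 below.

```python
import os

def low_level_listdir(directory: str) -> set:
    """Placeholder for a raw-disk directory walk that bypasses Windows APIs
    (boot sector -> $MFT -> index entries), as sketched in Fig. 1."""
    raise NotImplementedError

def find_hidden_files(directory: str) -> set:
    """Names visible to a low-level scan but missing from the API view."""
    api_view = set(os.listdir(directory))       # high-level: can be hooked
    disk_view = low_level_listdir(directory)    # low-level: reads raw NTFS data
    return disk_view - api_view

# Any non-empty result is a discrepancy worth flagging for deeper analysis:
# hidden = find_hidden_files(r"C:\Windows\system32")
```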

Fig. 1. Low-level file scanning.

Fig. 1 shows the work flow of an NTFS low-level file scanning tool. The scanner first reads the NTFS volume's boot sector, which stores the start address of the $MFT table. The $ROOT file record in the $MFT table contains the root directory information. From there, the scanner reads the index root (the root node of the B+ tree) or the index allocation attribute, which is the basic component of an index. In NTFS, a directory is a sequence of index entries, so a specific file record can be accessed from its index. Finally, the file's content can be accessed and copied to a new file, which is then scanned by an AV scanner.
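As a concrete, heavily simplified illustration of the first step of that work flow, the Python sketch below reads an NTFS boot sector from a raw volume image and computes where the $MFT starts. The field offsets follow the commonly documented NTFS BIOS Parameter Block layout; opening a live volume on Windows requires administrator rights and a raw device path, so a dump file is assumed here.

```python
import struct

def locate_mft(volume_image: str) -> int:
    """Return the byte offset of the $MFT inside an NTFS volume image."""
    with open(volume_image, "rb") as f:
        boot = f.read(512)                      # the NTFS boot sector
    # Offsets per the commonly documented NTFS boot sector layout:
    #   0x0B: bytes per sector (u16), 0x0D: sectors per cluster (u8),
    #   0x30: logical cluster number of the $MFT (u64).
    bytes_per_sector = struct.unpack_from("<H", boot, 0x0B)[0]
    sectors_per_cluster = boot[0x0D]
    mft_cluster = struct.unpack_from("<Q", boot, 0x30)[0]
    return mft_cluster * sectors_per_cluster * bytes_per_sector

# Example (assumed dump file): print(hex(locate_mft("ntfs_volume.img")))
```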
C. Burstiness and latency in NTFS file operations

Given a stationary time series $X(t)$, $t = 1, 2, \ldots$, where $X(t)$ is interpreted as the traffic at time instance $t$, the aggregated series $X^{(m)}$ of $X(t)$ at aggregation level $m$ is defined [7] as

$$X^{(m)}(k) = \frac{1}{m} \sum_{i=km-(m-1)}^{km} X(i) \qquad (1)$$

That is, $X(t)$ is partitioned into non-overlapping blocks of size $m$; their values are averaged, and $k$ indexes these blocks. Denote $r^{(m)}(k)$ as the auto-covariance function of $X^{(m)}(k)$. $X(t)$ is called self-similar with Hurst parameter $H$ ($0.5 < H < 1$) if, for all $k, m \ge 1$,

$$\mathrm{Var}(X^{(m)}) \propto m^{-\beta} \qquad (2)$$

$$r^{(m)}(k) \to r(k) \quad \text{as } m \to \infty \qquad (3)$$

The variance-time plot and the R/S plot are two of the most commonly used methods to calculate the Hurst parameter $H$. The variance-time plot is based on the slowly decaying variance of a self-similar trace. From Equation (2),

$$\log(\mathrm{Var}(X^{(m)})) = c - \beta \log(m) \qquad (4)$$

This plot is called the variance-time plot, with $H = 1 - \beta/2$. Given a series of observations $X(t)$, $t = 1, 2, \ldots$, with mean $\bar{X}(n)$ and sample variance $S^2(n)$,

$$\frac{R(n)}{S(n)} = \frac{1}{S(n)} \left[ \max(0, W_1, W_2, \ldots, W_n) - \min(0, W_1, W_2, \ldots, W_n) \right] \qquad (5)$$

where

$$W_k = (X_1 + X_2 + \ldots + X_k) - k\bar{X}(n), \quad k \ge 1 \qquad (6)$$

Self-similar traces satisfy

$$E\left[\frac{R(n)}{S(n)}\right] \sim n^H, \quad 0 < H < 1 \qquad (7)$$
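For readers who want to reproduce this kind of measurement, the sketch below estimates H with the variance-time method exactly as defined by Equations (1), (2), and (4): aggregate the trace at several block sizes, regress the log-variance on log(m), and set H = 1 - beta/2. It is a minimal illustration using NumPy; the aggregation levels and the synthetic input trace are arbitrary choices for this example.

```python
import numpy as np

def variance_time_hurst(x, levels=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H via the variance-time plot (Eqs. 1, 2, 4)."""
    log_m, log_var = [], []
    for m in levels:
        n_blocks = len(x) // m
        if n_blocks < 2:
            break
        # Equation (1): average the series over non-overlapping blocks of size m.
        agg = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(agg.var()))
    # Equation (4): the slope of log(Var) vs. log(m) is -beta; H = 1 - beta/2.
    beta = -np.polyfit(log_m, log_var, 1)[0]
    return 1.0 - beta / 2.0

# Sanity check on uncorrelated noise (expected H close to 0.5):
print(variance_time_hurst(np.random.normal(size=2**16)))
```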

In this section, NTFS file systems were scanned within a small-scale network; the data were collected from four hosts. Most file sizes range from 256 B to 512 kB. Our results show that their frequencies fit the log-normal distribution, and only the distribution tail presents self-similar behavior at a low bursty degree, which is similar to the findings reported in [8]. Table 1 shows the input directory, the file number, and the measured Hurst parameters of the input traces. For example, for the input trace of the "system32" directory with 5895 files, the variance-time measured H is 0.612107; for the input trace of the "F:" drive, it is 0.679351.
TABLE I
INPUT TRACES FOR SIMULATIONS

Input trace   Input directory   File number   Variance-time measured H   R/S measured H
Trace 1       system32          5895          0.612107                    0.632398
Trace 2       F:                101781        0.679351                    0.595343
Trace 3       system32          4940          0.608519                    0.630199
Trace 4       E:                6055          0.667326                    0.631928

Three file operation events are defined: listing, scanning, and content comparing. First, starting from the root node of the index B+ tree, all file names in a directory, or even on a whole raw disk, can be listed in alphabetical order one by one; by comparing them with the query results of high-level API calls, file name discrepancies can be found. Second, based on the index entry and the file record, the corresponding file's raw content can be accessed. Finally, to detect malware at the deepest level, the raw file content is compared with the results of API calls once more for any content discrepancies.
TABLE II
INPUT TRACES FOR LOW-LEVEL FILE PROCESSING

Trace   List v-t H   List R/S H   Scan v-t H   Scan R/S H   Compare v-t H   Compare R/S H
1       0.764        0.739        0.840        0.824        0.852           0.797
2       0.682        0.742        0.823        0.752        0.847           0.869
3       0.736        0.732        0.823        0.826        0.827           0.775
4       0.692        0.738        0.746        0.701        0.850           0.883

For low-level file processing, the searching time depends on both the file's location in the B-tree and the file's content size. We have shown that the file listing, scanning, and comparing time distributions are not log-normal. In [5], the B-tree searching mechanism is expounded in detail. Here, the inter-arrival file events were shown to present high self-similarity in the event traces, whose burstiness cannot be smoothed by averaging over a large time scale. For the four traces shown in Table 2, we have listed the measured Hurst parameters of the listing, scanning, and comparing event traces. It is clear that they have much higher bursty degrees. To the best of our knowledge, our simulation [9] was the first to provide evidence of the bursty feature in high-level file system input and output events, which is caused by the Pareto-distributed NTFS file index searching and data block accessing times. This conclusion explains the delay discrepancies of file scanning well. AV scan engines usually enumerate files by calling Windows APIs, such as FindFirstFile() and FindNextFile(), which in turn enumerate disk blocks using the NTFS low-level approach. Therefore, during virus scanning on NTFS file systems, even if the trace of file sizes fits the log-normal distribution, there are still many "spikes" of high virus-scanning latency which cannot be ignored. Furthermore, this kind of scan delay has nothing to do with the size of the signature file; it is only related to how Microsoft designs and implements the NTFS file accessing algorithms.

IV. WINDOWS: THE SYSTEM THAT SLOWS DOWN WINDOWS

Windows system metadata has changed in recent years. Does this trend have any effect on virus scanning? Metadata describes a set of characteristics of the files and directories in a file system, including file size, file number, timestamps, attributes, etc. The authors of [3] collected annual snapshots of file-system metadata from over 60000 Windows PC file systems. Their results showed how NTFS file system metadata changed from 2000 to 2004. Table 3 summarizes their observations of a few important properties.
TABLE III
CHANGES OF FILE SYSTEM METADATA

Property            2000   2004   Effect on AV products
File number         30k    90k    on-demand scan
File size           108k   189k   on-access scan
Directory number    2400   8900   on-demand scan
Storage capacity    8G     46G    on-demand scan

In AV products, on-demand scan is one of the main scan types; it is a full search and scan of the file system. On-demand scan works at the file level and scans all files on the hard disk. Whenever virus signatures are updated, users are advised to start an on-demand scan to make sure that all files are checked with the latest signatures. As shown in Table 3, the mean number of files in an NTFS file system has grown from 30k to 90k, implying that on-demand scan will take much more time. In addition, the number of directories and the total storage capacity of the whole file system have also increased steadily, which drags down machines further.

On-access scan is another mainstream scan type implemented inside the virus scanner. It continually monitors PC memory and every on-access file operation. The speed of on-access scan depends largely on the size of the accessed file. It was observed that the average file size increased from 108k to 189k over those four years. As a result, we expect users to wait longer for on-access scans owing to the grown average file size. On the other hand, the findings in [3] also showed that files deeper in the index tree tend to be smaller, whereas more and more large files reside at shallow levels. Since Trojans and Internet zero-day malware are generally much smaller than other types of viruses, their corresponding index searching and data block accessing times tend to be somewhat longer.

V. DISCUSSION AND CONCLUSION

A countermeasure to speed up virus scanning is to move AV functionality from the user desktop into the cloud. The AV In-the-Cloud service is becoming the next-generation security infrastructure designed to defend against virus threats. It provides a reliable protection service delivered through data centers worldwide which are built on virtualization technologies.

Fig. 2. Anti-virus In-the-Cloud infrastructure (user and AV desktop agent, anonymous communication network with entrance and exit nodes, AV cloud servers, signature database).

The AV In-the-Cloud service has been advocated as the next-generation model for virus detection by Trend Micro (http://www.trendmicro.com) and other AV vendors since June 2008. It is a software distribution model in which security services are hosted by vendors and made available to customers over the Internet. This approach employs a cloud server pool which analyzes and correlates new attacks and generates vaccinations online. The cloud infrastructure sharply reduces the computation burden on clients and enhances security products in mitigating new malware. Furthermore, customers only need to maintain a small, light-weight version of the virus signature file instead of a full copy. Benefits include easy deployment, low operating costs, and fast virus detection.

Fig. 2 shows the architecture of the AV In-the-Cloud service. The agent is an on-access scanner deployed on the desktop; it places itself between the applications and the operating system. The agent automatically examines the local machine's memory and file system whenever these resources are accessed by an application. For any suspicious file, the agent generates the hash value or a specific signature of the file, and sends it to the remote cloud server for security verification. A low-latency anonymous communication network is used to forward these requests from the desktop to the remote cloud.
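The client side of such a design can be quite thin. The sketch below is a hypothetical illustration of the agent's hash-and-query step, not Trend Micro's actual protocol: it hashes a suspicious file with SHA-256 and posts the digest to an assumed cloud lookup endpoint, falling back to "unknown" if the service cannot be reached.

```python
import hashlib
import json
import urllib.request

# Assumed cloud lookup endpoint; a real service, its URL, and its response
# format would be defined by the AV vendor.
CLOUD_LOOKUP_URL = "https://av-cloud.example.com/lookup"

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def query_cloud(path: str) -> str:
    """Return 'malicious', 'clean', or 'unknown' for the file's hash."""
    payload = json.dumps({"sha256": sha256_of(path)}).encode()
    req = urllib.request.Request(CLOUD_LOOKUP_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return json.load(resp).get("verdict", "unknown")
    except OSError:
        return "unknown"   # cloud unreachable: defer to local heuristics

# Example: print(query_cloud("suspicious.exe"))
```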

Our work is motivated by the need to explain why AV software drags down users' computers. In this paper, we have shown that the large signature file is not the only reason for the slowdown. The virtual emulation widely used in security products requires the AV scan engine to spend more time de-obfuscating polymorphic viruses than scanning unpacked files. In addition, low-level NTFS file operations and the recent changes in file system metadata also delay both on-demand and on-access scans.

REFERENCES
[1] W. Yan, Z. Zhang, and N. Ansari, "Revealing packed malware," IEEE Security and Privacy, vol. 6, no. 5, pp. 65-69, Sep./Oct. 2008.
[2] C. Kruegel, W. Robertson, and G. Vigna, "Detecting kernel-level rootkits through binary analysis," Proceedings of the 20th Annual Computer Security Applications Conference, pp. 91-100, Tucson, AZ, December 2004.
[3] N. Agrawal, W. Bolosky, J. Douceur, and J. Lorch, "A five-year study of file-system metadata," Proceedings of the 5th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2007.
[4] http://msdn.microsoft.com/msdnmag/issues/02/02/PE/
[5] Linux-NTFS Project, NTFS Documentation, http://www.linux-ntfs.org
[6] R. Nagar, Windows NT File System Internals, O'Reilly, 1997.
[7] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, "On the self-similar nature of Ethernet traffic," IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1-15, 1994.
[8] J. R. Douceur and W. J. Bolosky, "A large-scale study of file-system contents," Proceedings of the 1999 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pp. 59-70, Atlanta, GA, June 1999.
[9] W. Yan, "Revealing self-similarity in NTFS file operations," poster paper, Proceedings of the 7th USENIX Conference on File and Storage Technologies, San Francisco, CA, February 2009.
