
UNIT 4
MEMORY MANAGEMENT
4.1 OVERVIEW
Memory management is a complex field of computer science and there are many
techniques being developed to make it more efficient. This guide is designed to introduce you to
some of the basic memory management issues that programmers face.
This guide attempts to explain any terms it uses as it introduces them. In addition, there is
a Memory Management Glossary of memory management terms that gives fuller information;
some terms are linked to the relevant entries.
Memory management is usually divided into three areas, although the distinctions are a
little fuzzy:
- Hardware memory management
- Operating system memory management
- Application memory management
These are described in more detail below. In most computer systems, all three are present
to some extent, forming layers between the user’s program and the actual memory hardware. The
Memory Management Reference is mostly concerned with application memory management.

4.1.1. Hardware memory management
Memory management at the hardware level is concerned with the electronic devices that
actually store data. This includes things like RAM and memory caches.

4.1.2. Operating system memory management

In the operating system, memory must be allocated to user programs, and reused by other
programs when it is no longer required. The operating system can pretend that the computer has
more memory than it actually does, and also that each program has the machine’s memory to
itself; both of these are features of virtual memory systems.

4.1.3. Application memory management
Application memory management involves supplying the memory needed for a
program’s objects and data structures from the limited resources available, and recycling that
memory for reuse when it is no longer required. Because application programs cannot in general
predict in advance how much memory they are going to require, they need additional code to
handle their changing memory requirements.
Application memory management combines two related tasks:
i) Allocation
When the program requests a block of memory, the memory manager must allocate that
block out of the larger blocks it has received from the operating system. The part of the memory
manager that does this is known as the allocator. There are many ways to perform allocation, a
few of which are discussed in Allocation techniques.
ii) Recycling
When memory blocks have been allocated, but the data they contain is no longer required
by the program, then the blocks can be recycled for reuse. There are two approaches to recycling
memory: either the programmer must decide when memory can be reused (known as manual
memory management); or the memory manager must be able to work it out (known as automatic
memory management). These are both described in more detail below.
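A minimal sketch of the two tasks in C, using the standard malloc() and free() interfaces
(the array size here is purely illustrative):

#include <stdlib.h>

int main(void)
{
    /* Allocation: request a block from the memory manager. */
    int *values = malloc(100 * sizeof(int));
    if (values == NULL)
        return 1;       /* the allocator could not satisfy the request */

    values[0] = 42;     /* ... use the block ... */

    /* Recycling: hand the block back so it can be reused. */
    free(values);
    return 0;
}

Here the programmer decides when the block can be reused (manual memory management); with
automatic memory management the recycling step is performed by the memory manager itself.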
An application memory manager must usually work within several constraints, such as:
i) CPU overhead
The additional time taken by the memory manager while the program is running.
ii) Pause times
The time it takes for the memory manager to complete an operation and return control to
the program. This affects the program’s ability to respond promptly to interactive events, and
also to any asynchronous event such as a network connection.
iii) Memory overhead
How much space is wasted for administration, rounding (known as internal
fragmentation), and poor layout (known as external fragmentation). Some of the common
problems encountered in application memory management are considered in the next section.

4.1.4. Memory management problems
The basic problem in managing memory is knowing when to keep the data it contains,
and when to throw it away so that the memory can be reused. This sounds easy, but is, in fact,
such a hard problem that it is an entire field of study in its own right. In an ideal world, most
programmers wouldn’t have to worry about memory management issues. Unfortunately, there
are many ways in which poor memory management practice can affect the robustness and speed
of programs, both in manual and in automatic memory management.
Typical problems include:
i) Premature frees and dangling pointers
Many programs give up memory, but attempt to access it later and crash or behave
randomly. This condition is known as a premature free, and the surviving reference to the
memory is known as a dangling pointer. This is usually confined to manual memory
management.
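A sketch in C of how a premature free produces a dangling pointer (the variable names are
illustrative):

#include <stdlib.h>

int main(void)
{
    char *buf = malloc(64);
    char *alias = buf;   /* a second reference to the same block   */
    free(buf);           /* premature free: the block is recycled  */
    alias[0] = 'x';      /* dangling pointer: undefined behavior   */
    return 0;
}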

ii) Memory leak
Some programs continually allocate memory without ever giving it up and eventually run out of
memory. This condition is known as a memory leak.
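A sketch of a leak in C: the only pointer to each allocated block is overwritten on the next
iteration, so none of the earlier blocks can ever be freed (the loop and sizes are illustrative):

#include <stdlib.h>

int main(void)
{
    char *buf = NULL;
    for (int i = 0; i < 1000; i++)
        buf = malloc(4096);  /* the previous block's address is lost */
    /* only the final block can still be freed; the rest have leaked */
    free(buf);
    return 0;
}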
iii) External fragmentation
A poor allocator can do its job of giving out and receiving blocks of memory so badly
that it can no longer give out big enough blocks despite having enough spare memory. This is
because the free memory can become split into many small blocks, separated by blocks still in
use. This condition is known as external fragmentation.
iv) Poor locality of reference
Another problem with the layout of allocated blocks comes from the way that modern
hardware and operating system memory managers handle memory: successive memory accesses
are faster if they are to nearby memory locations. If the memory manager places far apart the
blocks a program will use together, then this will cause performance problems. This condition is
known as poor locality of reference.
v) Inflexible design
Memory managers can also cause severe performance problems if they have been designed with
one use in mind, but are used in a different way. These problems occur because any memory
management solution tends to make assumptions about the way in which the program is going to
use memory, such as typical block sizes, reference patterns, or lifetimes of objects. If these
assumptions are wrong, then the memory manager may spend a lot more time doing
bookkeeping work to keep up with what’s happening.
vi) Interface complexity

If objects are passed between modules, then the interface design must consider the
management of their memory.
A well-designed memory manager can make it easier to write debugging tools, because
much of the code can be shared. Such tools could display objects, navigate links, validate
objects, or detect abnormal accumulations of certain object types or block sizes.

4.1.5. Manual memory management
Manual memory management is where the programmer has direct control over when
memory may be recycled. Usually this is either by explicit calls to heap management functions
(for example, malloc and free in C), or by language constructs that affect the control stack
(such as local variables). The key feature of a manual memory manager is that it provides a way
for the program to say, “Have this memory back; I’ve finished with it”; the memory manager
does not recycle any memory without such an instruction.
The advantages of manual memory management are:
- it can be easier for the programmer to understand exactly what is going on;
- some manual memory managers perform better when there is a shortage of memory.

The disadvantages of manual memory management are:
- the programmer must write a lot of code to do repetitive bookkeeping of memory;
- memory management must form a significant part of any module interface;
- manual memory management typically requires more memory overhead per object;
- memory management bugs are common.
It is very common for programmers, faced with an inefficient or inadequate manual
memory manager, to write code to duplicate the behavior of a memory manager, either by
allocating large blocks and splitting them for use, or by recycling blocks internally. Such code is
known as a suballocator. Suballocators can take advantage of special knowledge of program
behavior, but are less efficient in general than fixing the underlying allocator. Unless written by a
memory management expert, suballocators may be inefficient or unreliable.
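A minimal sketch in C of the first kind of suballocator: it obtains one large block from the
underlying allocator and splits it into fixed-size chunks by advancing a cursor. The names and
sizes are illustrative, not taken from any real allocator:

#include <stdlib.h>

#define POOL_SIZE  65536
#define CHUNK_SIZE 64

static char  *pool;   /* one large block from the underlying allocator */
static size_t next;   /* offset of the first unused byte in the pool   */

void *sub_alloc(void)
{
    if (pool == NULL)
        pool = malloc(POOL_SIZE);           /* lazily grab the big block */
    if (pool == NULL || next + CHUNK_SIZE > POOL_SIZE)
        return NULL;                        /* pool exhausted */
    void *p = pool + next;
    next += CHUNK_SIZE;
    return p;
}

Note that this sketch never recycles individual chunks; a realistic suballocator would also keep
an internal free list, and that bookkeeping is exactly where such code tends to go wrong.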
The following languages use mainly manual memory management in most
implementations, although many have conservative garbage collection extensions: Algol; C;
C++; COBOL; Fortran; Pascal.

4.1.6. Automatic memory management
Automatic memory management is a service, either as a part of the language or as an extension,
that automatically recycles memory that a program would not otherwise use again. Automatic
memory managers (often known as garbage collectors, or simply collectors) usually do their job
by recycling blocks that are unreachable from the program variables (that is, blocks that cannot
be reached by following pointers).
The advantages of automatic memory management are:
- the programmer is freed to work on the actual problem;
- module interfaces are cleaner;
- there are fewer memory management bugs;
- memory management is often more efficient.

The disadvantages of automatic memory management are:
- memory may be retained because it is reachable, but won’t be used again;
- automatic memory managers (currently) have limited availability.

There are many ways of performing automatic recycling of memory, a few of which are
discussed in Recycling techniques.
Most modern languages use mainly automatic memory management: BASIC, Dylan, Erlang,
Haskell, Java, JavaScript, Lisp, ML, Modula-3, Perl, PostScript, Prolog, Python, Scheme,
Smalltalk, etc.
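One simple automatic recycling technique, shown here as an illustrative sketch in C rather than
as any particular collector’s implementation, is reference counting: each object records how
many references point to it and is recycled when the count reaches zero.

#include <stdlib.h>

struct object {
    int refcount;
    /* ... payload ... */
};

struct object *obj_new(void)
{
    struct object *o = calloc(1, sizeof(*o));
    if (o)
        o->refcount = 1;   /* count the creating reference */
    return o;
}

void obj_retain(struct object *o)
{
    o->refcount++;         /* a new reference now exists */
}

void obj_release(struct object *o)
{
    if (--o->refcount == 0)
        free(o);           /* unreachable from the program: recycle it */
}

Reference counting cannot recycle cycles of objects that refer to each other; tracing collectors
handle that case.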

4.2 Pages
4.2.1 Basic Information
Paging is a method of writing data to, and reading it from, secondary storage for use in
primary storage, also known as main memory. Paging plays a role in memory management for a
computer's OS (operating system).
In a memory management system that takes advantage of paging, the OS reads data from
secondary storage in blocks called pages, all of which have identical size. The physical region of
main memory that holds a single page is called a frame. When paging is used, a program’s pages
do not have to occupy a single physically contiguous region of memory. This approach offers an
advantage over earlier memory management methods, because it facilitates more efficient and
faster use of storage.

Fig. Page Memory Table (PMT)


- A page is a unit of logical memory of a program.
- A frame is a unit of physical memory.
- All pages are of the same size.
- All frames are of the same size.
- A frame is the same size as a page.

4.2.2 Physical Address Generation
- To produce a physical address, first look up the page in the PMT to find the frame
number in which it is stored; then multiply the frame number by the frame size and add
the offset to get the physical address (see the sketch after this list).
- A page table is kept in main memory. It is part of the process control block (PCB) for
each process.
- The page-table base register (PTBR) points to the page table.
- The page-table length register (PTLR) indicates the size of the page table.
- In this scheme, every data/instruction access requires two memory accesses: one for the
page table and one for the data/instruction itself.
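A sketch of this translation in C, assuming a 4 KB page size and a simple array as the PMT
(both assumptions are illustrative):

#include <stdint.h>

#define PAGE_SIZE 4096u   /* frames are the same size as pages */

/* pmt[page] holds the frame number for that page (hypothetical table) */
uint32_t translate(uint32_t vaddr, const uint32_t *pmt)
{
    uint32_t page   = vaddr / PAGE_SIZE;  /* which page of the program */
    uint32_t offset = vaddr % PAGE_SIZE;  /* position within the page  */
    uint32_t frame  = pmt[page];          /* first memory access       */
    return frame * PAGE_SIZE + offset;    /* physical address          */
}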

Fig. Paged Lookup

4.2.3 Examples of PMT
Fig. Windows PMT

Fig. Linux PMT

4.3 kmalloc and vmalloc
4.3.1 kmalloc

The kmalloc() function’s operation is similar to that of user-space’s familiar malloc()
routine, with the exception of the additional flags parameter. The kmalloc() function is a simple
interface for obtaining kernel memory in byte-sized chunks. If you need whole pages, the
previously discussed interfaces might be a better choice. For most kernel allocations, however,
kmalloc() is the preferred interface.
The function is declared in <linux/slab.h>:
void * kmalloc(size_t size, gfp_t flags)
The function returns a pointer to a region of memory that is at least size bytes in length.
The region of memory allocated is physically contiguous. On error, it returns NULL. Kernel
allocations always succeed, unless an insufficient amount of memory is available. Thus, you
must check for NULL after all calls to kmalloc() and handle the error appropriately.
Let’s look at an example. Assume you need to dynamically allocate enough room for an
abc structure:

struct abc *p;

p = kmalloc(sizeof(struct abc), GFP_KERNEL);
if (!p)
        /* handle error ... */
If the kmalloc() call succeeds, p now points to a block of memory that is at least the
requested size. The GFP_KERNEL flag specifies the behavior of the memory allocator while
trying to obtain the memory to return to the caller of kmalloc().
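The counterpart to kmalloc() is kfree(), declared in the same header; a block obtained with
kmalloc() should be returned with kfree() when it is no longer needed:

kfree(p);  /* release the block allocated above */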

4.3.2 vmalloc
The vmalloc() function works in a similar fashion to kmalloc(), except it allocates
memory that is only virtually contiguous and not necessarily physically contiguous. This is how
a user-space allocation function works: the pages returned by malloc() are contiguous within the
virtual address space of the processor, but there is no guarantee that they are actually contiguous
in physical RAM. The kmalloc() function guarantees that the pages are physically contiguous
(and virtually contiguous). The vmalloc() function ensures only that the pages are contiguous
within the virtual address space. It does this by allocating potentially noncontiguous chunks of
physical memory and “fixing up” the page tables to map the memory into a contiguous chunk of
the logical address space.
For the most part, only hardware devices require physically contiguous memory
allocations. On many architectures, hardware devices live on the other side of the memory
management unit and, thus, do not understand virtual addresses. Consequently, any regions of
memory that hardware devices work with must exist as a physically contiguous block and not
merely a virtually contiguous one. Blocks of memory used only by software — for example,
process-related buffers — are fine using memory that is only virtually contiguous. In your
programming, you never know the difference. All memory appears to the kernel as logically
contiguous.
Despite the fact that physically contiguous memory is required in only certain cases, most
kernel code uses kmalloc() and not vmalloc() to obtain memory. Primarily, this is for
performance. The vmalloc() function, to make nonphysically contiguous pages contiguous in the
virtual address space, must specifically set up the page table entries. Worse, pages obtained via
vmalloc() must be mapped by their individual pages (because they are not physically
contiguous), which results in much greater TLB thrashing than you see when directly mapped
memory is used. Because of these concerns, vmalloc() is used only when absolutely necessary —
typically, to obtain large regions of memory. For example, when modules are dynamically
inserted into the kernel, they are loaded into memory created via vmalloc().
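A usage sketch (the buffer size and function are illustrative); memory obtained with vmalloc()
is released with vfree():

#include <linux/errno.h>
#include <linux/vmalloc.h>

static int example(void)
{
    char *buf;

    buf = vmalloc(16 * 4096);  /* 16 pages, virtually contiguous only */
    if (!buf)
        return -ENOMEM;        /* allocation failed */

    /* ... use buf ... */

    vfree(buf);
    return 0;
}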
The kernel manages the system’s physical memory, which is available only in page-sized
chunks. As a result, kmalloc() looks rather different from a typical user-space malloc
implementation. A simple, heap-oriented allocation technique would quickly run into trouble; it
would have a hard time working around the page boundaries. Thus, the kernel uses a special
page-oriented allocation technique to get the best use from the system’s RAM. Linux handles
memory allocation by creating a set of pools of memory objects of fixed sizes. Allocation
requests are handled by going to a pool that holds sufficiently large objects and handing an entire
memory chunk back to the requester.

4.4 Zones
4.4.1 Introduction
Due to hardware limitations, the kernel cannot treat all pages as identical. There are two
shortcomings of hardware with respect to memory addressing:
- Some hardware is capable of performing DMA only to certain memory addresses.
- Some architectures are capable of physically addressing larger amounts of memory than
they can virtually address; as a result, some memory is not permanently mapped into the
kernel address space.

4.4.2 Three Memory Zones
- ZONE_DMA: This zone contains pages that are capable of undergoing DMA.
- ZONE_NORMAL: This zone contains normal, regularly mapped pages.
- ZONE_HIGHMEM: This zone contains “high memory”: pages not permanently mapped
into the kernel’s address space.

These zones are defined in <linux/mmzone.h>.
The actual use and layout of the memory zones is architecture-dependent.

Zones on x86:

Zone            Description                   Physical Memory
ZONE_DMA        DMA-able pages                < 16 MB
ZONE_NORMAL     Normally addressable pages    16–896 MB
ZONE_HIGHMEM    Dynamically mapped pages      > 896 MB

4.4.3 Zone Structure
struct zone {
        spinlock_t lock;            /* protects this structure          */
        unsigned long free_pages;   /* free pages remaining in the zone */
        unsigned long pages_min;    /* minimum free pages to maintain   */
        /* ... */
        char *name;                 /* "DMA", "Normal", or "HighMem"    */
        /* ... */
};

- lock: protects only the structure, not all the pages that reside in the zone.
- free_pages: the number of free pages in this zone. The kernel tries to keep at least
pages_min pages free (through swapping), if possible.
- name: a NULL-terminated string. The three zones are given the names “DMA”,
“Normal”, and “HighMem”.

4.5 Slab layer
4.5.1 Introduction
- Historically, the kernel kept ad hoc “free lists” of objects, with no global control over
them.
- The slab layer acts as a generic data structure-caching layer.
- It divides different objects into groups called caches, with one cache per object type.
For example, one cache holds process descriptors (a free list of task_struct structures).
- Caches are divided into slabs, composed of one or more physically contiguous pages.

4.5.2 Slab

The slab is the primary unit of currency in the slab allocator. When the allocator needs to
grow a cache, for example, it acquires an entire slab of objects at once. Similarly, the allocator
reclaims unused memory (shrinks a cache) by relinquishing a complete slab.
A slab consists of one or more pages of virtually contiguous memory carved up into
equal-size chunks, with a reference count indicating how many of those chunks have been
allocated. The benefits of using this simple data structure to manage the arena are somewhat
striking:
(1) Reclaiming unused memory is trivial. When the slab reference count goes to zero
the associated pages can be returned to the VM system. Thus a simple reference count replaces
the complex trees, bitmaps, and coalescing algorithms found in most other allocators [Knuth68,
Korn85, Standish80].
(2) Allocating and freeing memory are fast, constant-time operations. All we have to
do is move an object to or from a freelist and update a reference count.
(3) Severe external fragmentation (unused buffers on the freelist) is unlikely. Over
time, many allocators develop an accumulation of small, unusable buffers. This occurs as the
allocator splits existing free buffers to satisfy smaller requests. For example, the right sequence
of 32-byte and 40-byte allocations may result in a large accumulation of free 8-byte buffers —
even though no 8-byte buffers are ever requested [Standish80]. A segregated storage allocator
cannot suffer this fate, since the only way to populate its 8-byte freelist is to actually allocate and
free 8-byte buffers. Any sequence of 32-byte and 40-byte allocations — no matter how complex
— can only result in population of the 32-byte and 40-byte freelists. Since prior allocation is
a good predictor of future allocation [Weinstock88] these buffers are likely to be used again.







- Each slab contains some number of objects, the cached data structures.
- Each slab is in one of three states: full, partial, or empty.
- When the kernel requests a new object, the request is satisfied from a partial slab, if
one exists; otherwise, it is satisfied from an empty slab, if one exists; otherwise, a new
empty slab is created.

The above strategy reduces fragmentation.

Fig. Relationship between caches, slabs and objects.

4.6 Slab Layer Allocation
4.6.1 Introduction
Slab allocation is a memory management mechanism intended for the efficient memory
allocation of kernel objects which displays the desirable property of eliminating
fragmentation caused by allocations and deallocations. The technique is used to retain
allocated memory that contains a data object of a certain type for reuse upon subsequent
allocations of objects of the same type; it is analogous to an object pool, but only applies to
memory, not other resources. Slab allocation was first introduced in the Solaris 2.4 kernel by
Jeff Bonwick and now is widely used by many Unix and Unix-like operating systems
including FreeBSD and Linux.

4.6.2 Basics

The primary motivation for slab allocation is that the initialization and destruction of
kernel data objects can actually outweigh the cost of allocating memory for them. As object
creation and deletion are widely employed by the kernel, the overhead of repeated initialization
can result in significant performance drops. The notion of object caching was therefore
introduced in order to avoid the invocation of functions used to initialize object state.
With slab allocation, memory chunks suitable to fit data objects of certain type or size are
preallocated. The slab allocator keeps track of these chunks, known as caches, so that when a
request to allocate memory for a data object of a certain type is received it can instantly satisfy
the request with an already allocated slot. Destruction of the object, however, does not free up
the memory, but only opens a slot which is put in the list of free slots by the slab allocator. The
next call to allocate memory of the same size will return the now unused memory slot. This
process eliminates the need to search for suitable memory space and greatly alleviates memory
fragmentation. In this context a slab is one or more contiguous pages in the memory containing
pre-allocated memory chunks.

4.6.3 Implementation
Understanding the slab allocation algorithm requires defining and explaining some terms:
1. Cache: cache represents a small amount of very fast memory. A cache is a storage for a
specific type of object such as semaphores, process descriptors, file objects etc.
2. Slab: slab represents a contiguous piece of memory, usually made of several physically
contiguous pages. A cache is stored in one or more slabs. The slab is the actual container
of data associated to objects of the specific kind of the containing cache.
When a program sets up a cache, it allocates a number of objects to the slabs associated with
that cache. This number depends on the size of the associated slabs.
Slabs may exist in one of the following states :

1. empty - all objects on a slab marked as free
2. partial - slab consists of both used and free objects
3. full - all objects on a slab marked as used
Initially, the system marks each slab as "empty". When the process calls for a new kernel
object, the system tries to find a free location for that object on a partial slab in a cache for that
type of object. If no such location exists, the system allocates a new slab from contiguous
physical pages and assigns it to a cache. The new object gets allocated from this slab, and the
slab becomes marked as "partial".
The allocation takes place quickly, because the system builds the objects in advance and
readily allocates them from a slab.
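In Linux, this interface is exposed through the kmem_cache_* family of functions. A sketch,
reusing the hypothetical abc structure from the kmalloc() example above (error handling
abbreviated):

#include <linux/errno.h>
#include <linux/slab.h>

static struct kmem_cache *abc_cache;

static int abc_example(void)
{
    struct abc *p;

    /* create a cache of struct abc objects */
    abc_cache = kmem_cache_create("abc", sizeof(struct abc), 0, 0, NULL);
    if (!abc_cache)
        return -ENOMEM;

    /* allocate one object, from a partial slab when possible */
    p = kmem_cache_alloc(abc_cache, GFP_KERNEL);
    if (!p)
        return -ENOMEM;

    /* ... use the object ... */

    kmem_cache_free(abc_cache, p);  /* return the slot to the slab's free list */
    return 0;
}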

4.7 Non-contiguous memory management
4.7.1 Introduction
The earliest computing system required contiguous storage allocation in which each
program occupied a single contiguous memory block. In these systems, the technique of
multiprogramming was not possible.
In non-contiguous storage allocation, a program is divided into several blocks that may
be placed in different parts of main memory. It is more difficult for an operating system to
control non-contiguous storage allocation. The benefit is that if main memory has many small
holes available instead of a single large hole, then operating system can often load and execute a
program that would otherwise need to wait.
In contiguous memory allocation strategies, the operating system allocates memory to a
process in a single sequence for faster retrieval and less overhead, but this strategy supports
static memory allocation only, and it suffers more from internal memory fragmentation than
non-contiguous allocation strategies do. The operating system can also allocate memory
dynamically to a process if the memory is not in sequence; i.e., the pieces are placed in
non-contiguous memory segments. Memory is allotted to a process as it is required. When a
process no longer needs to be in memory, it is released from memory, producing a free region
of memory, or memory hole. These memory holes and the memory allocated to other processes
remain scattered in memory. The operating system can compact this memory at a later point in
time to ensure that the allocated memory is in a sequence and the memory holes are not
scattered. This strategy supports dynamic memory allocation and facilitates the use of virtual
memory. In dynamic memory allocation there are no instances of internal fragmentation.

4.7.2 Fragmentation
In computer storage, fragmentation is a phenomenon in which storage space is used
inefficiently, reducing capacity or performance and often both. The exact consequences of
fragmentation depend on the specific system of storage allocation in use and the particular form
of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that
case the term also refers to the wasted space itself. For other systems (e.g. the FAT file system)
the space used to store given data (e.g. files) is the same regardless of the degree of
fragmentation (from none to extreme).
Basic Principle
There are three different but related forms of fragmentation: external fragmentation,
internal fragmentation, and data fragmentation, which can be present in isolation or conjunction.
Fragmentation is often accepted in return for improvements in speed or simplicity.
When a computer program requests blocks of memory from the computer system, the
blocks are allocated in chunks. When the computer program is finished with a chunk, it can free
the chunk back to the system, making it available to later be allocated again to another or the
same program. The size and the amount of time a chunk is held by a program varies. During its
lifespan, a computer program can request and free many chunks of memory.
When a program is started, the free memory areas are long and contiguous. Over time
and with use, the long contiguous regions become fragmented into smaller and smaller
contiguous areas. Eventually, it may become impossible for the program to obtain large
contiguous chunks of memory.
Types
A) Internal Fragmentation
Internal fragmentation occurs when the memory allocator leaves extra space empty inside
of a block of memory that has been allocated for a client. This usually happens because the
processor’s design stipulates that memory must be cut into blocks of certain sizes — for example,
blocks may be required to be evenly divisible by four, eight, or 16 bytes. When this occurs, a
client that needs 57 bytes of memory, for example, may be allocated a block that contains 60
bytes, or even 64. The extra bytes that the client doesn’t need go to waste, and over time these
tiny chunks of unused memory can build up and create large quantities of memory that can’t be
put to use by the allocator. Because all of these useless bytes are inside larger memory blocks,
the fragmentation is considered internal.
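The rounding arithmetic, as a C sketch: rounding a 57-byte request up to an 8-byte allocation
granularity yields a 64-byte block, wasting 7 bytes inside it (the helper name is illustrative):

#include <stddef.h>

/* round n up to the next multiple of align (align must be a power of two) */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* align_up(57, 8) == 64, so 64 - 57 = 7 bytes of internal fragmentation */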
B) External Fragmentation
External fragmentation happens when the memory allocator leaves sections of unused
memory blocks between portions of allocated memory. For example, if several memory blocks
are allocated in a continuous line but one of the middle blocks in the line is freed (perhaps
because the process that was using that block of memory stopped running), the free block is
fragmented. The block is still available for use by the allocator later if there’s a need for memory
that fits in that block, but the block is now unusable for larger memory needs. It cannot be
lumped back in with the total free memory available to the system, as total memory must be
contiguous for it to be useable for larger tasks. In this way, entire sections of free memory can
end up isolated from the whole that are often too small for significant use, which creates an
overall reduction of free memory that over time can lead to a lack of available memory for key
tasks.
Memory managers can suffer from two types of fragmentation. External fragmentation
occurs when a series of memory requests results in lots of small free blocks, no one of which is
useful for servicing typical requests. Internal fragmentation occurs when more than m words
are allocated to a request for m words, wasting free storage. The difference between internal and
external fragmentation is illustrated by the figure below. The small white block labeled "External
fragmentation" is too small to satisfy typical memory requests. The small grey block labeled
"Internal fragmentation" was allocated as part of the grey block to its left, but it does not actually
store information.

Fig. An illustration of internal and external fragmentation
Some memory management schemes sacrifice space to internal fragmentation to make
memory management easier (and perhaps reduce external fragmentation). For example, external
fragmentation does not happen in file management systems that allocate file space in clusters.
Another example of sacrificing space to internal fragmentation so as to simplify memory
management is the buddy method.
