Notes on the Design and Implementation of UVM

I have been reading a dissertation called “The Design and Implementation of UVM” by Dr Chuck Cranor. The dissertation is about the virtual memory manager that NetBSD currently uses. It was really informative but a bit too long (I am pretty new to reading such technical material).

Below are some notes that I made on it. They are a bit long 🙂, and please keep in mind that they are rough. I hope they will be useful for people who want a quick peek at UVM, and also as a review after reading the dissertation. Feel free to ping me to add content.

What is UVM?

UVM, the virtual memory manager of NetBSD, was designed to replace the BSD VM system. Its aim was to keep the best features of the BSD VM system and add new ones.

DESIGN AND IMPLEMENTATION OF UVM

Chap 1 : Introduction

Modern OSes -> require unnecessary data copies

Data copies are expensive

  • Bandwidth of main memory is limited.
  • Copying data flushes important information from the cache.
  • Applications affected include video and file servers.

Modifying the VM is difficult:

  • Kernel and process share the data structures. (synchronization is a must)
  • VM must be able to handle synchronous and asynchronous I/O operations
  • Errors in VM system are harder to diagnose.

VM system in NetBSD:

  • The VA space is defined by a map structure. There is a map for each process, consisting of entries that describe the locations of memory objects.
  • The VM system manages the mapping of each virtual memory page to its corresponding physical page.

UVM features

  • Process can loan its memory to other processes.
  • Page transfer
  • Map entry passing – export regions of address space

Chap 2 : Role of VM system

  • When a process is run
    • Subsystem sets up va space.
    • MMU translates each address to physical and CPU gets back instruction.
  • Allows processes to use more memory than is physically present on the system.
  • Backing store – disk space used for storing memory data.
  • Key operations :
    • Allocation of physical memory. (keeping track of allocated and free pages, and allocating pages)
    • Allocation of VA space. (keeping a list of all allocated regions, and allocating free regions)
    • Mapping physical pages into va space.
    • Handling page faults. Happens when unmapped memory is referenced.
    • Moving data between physical memory and backing store.
    • Managing memory shortages. (freeing unused and inactive pages)
    • Duplication of address space during a fork
  • Service of VM can be requested in 3 ways :
    • Syscalls like mprotect, mmap
    • Vm process management services during fork,exec or exit.
    • Page fault.
  • VM can request services of other parts of kernel.
    • Devices like frame buffers – d_mmap (offset into device -> physical address)
    • Vnode system to perform I/O between the VM system’s buffers and the underlying file.

VM System and Process life cycle

  • Process startup :
    • Init is the first process created by the kernel after bootstrap.
    • Must map data to user (/sbin/init)
      • Text : code for init program
      • Data section
      • Symbols
    • In exec call – kernel opens file – reads the header.
    • Allocates memory for all the sections(text, data, bss)
    • copy-on-write – changes made to memory will not be reflected in the backing object.
  • Running processes and Page faults
    • The kernel begins executing the init process at the address specified in the file’s header.
    • The hardware tries to execute the first instruction and discovers that no physical page is mapped.
    • Page fault occurs
      • Process accesses a memory region not mapped. Process is frozen until page fault is resolved.
      • Processor specific part of the kernel catches the page fault and uses the MMU to determine the virtual address that caused the fault and access type of the fault.
      • The VM system’s page fault routine is called with the above info.
      • The VM system looks in the process’s mappings to see what data has to be mapped.
      • If process address space doesn’t allow access then return segfault
      • In case the memory access is valid – it loads the data and MMU maps the page. (faulting in data).
    • Init starts with page faults in multiple sections
      • If the process faults on a copy-on-write page, the data is fetched from the backing file and mapped in read-only.
      • Copy-on-write pages start in the “unwritten” state.
  • Forking and exiting
    • Init creates more processes using the fork system call.
    • If vm system copied all the pages then it would be expensive.
      • It uses copy-on-write feature again.
      • fork – all pages that are copied to child process are put into unwritten state.
      • The child does another exec operation on a different file. In preparing for the exec, the kernel asks the VM system to unmap the active regions of the process doing the exec.
      • Process returns to unmapped state.
    • On a process exit the VM system removes all mapping from process address space.
    • Handle errors occurring with copy-on-write and shared memory.
  • VM operations
    • Break – change process heap area
    • Mmap – memory maps a file
    • Munmap – remove mapping
    • Mprotect – changes protection
    • Minherit – changes inheritance
    • Msync – flushes modified memory back to backing store
    • Madvise – changes the current usage pattern.
    • Mlock – locks data into physical memory.
    • swapctl – configures a swap area.
    • Majority of the time is spent resolving page faults, and adding and removing mappings.
      • When the number of free pages drops below a threshold, the pagedaemon is started.
      • Pagedaemon – a special system process that looks for pages of memory that are allocated but have not been used in a while, and adds them back to the free list.
        • If a page has modified data then it needs to be flushed before it is freed. The pagedaemon does this as well.
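
To make the pagedaemon idea a bit more concrete, here is a rough toy model in Python. It is my own sketch, not NetBSD code: the `Page` class, the `run_pagedaemon` function, the `free_target` parameter and the "clear the used flag and skip once" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    recently_used: bool = False  # referenced since the last scan (assumed flag)
    dirty: bool = False          # modified data not yet written to backing store


def run_pagedaemon(allocated, free_list, free_target, flushed):
    """Move idle pages from `allocated` to `free_list` until free_target is met.

    `flushed` collects the numbers of pages that had to be written out first.
    """
    for page in list(allocated):          # iterate over a copy; we mutate `allocated`
        if len(free_list) >= free_target:
            break
        if page.recently_used:
            # Still in use: clear the flag so the page is a candidate next time.
            page.recently_used = False
            continue
        if page.dirty:
            flushed.append(page.number)   # stand-in for the real pageout I/O
            page.dirty = False
        allocated.remove(page)
        free_list.append(page)
    return free_list
```

The point of the sketch is only the ordering: a dirty page is flushed before it goes back on the free list, and the scan stops as soon as enough pages are free.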

The evolution of the VM system in BSD

Design overview of BSD VM system

  • Layering in the BSD VM system
    • Machine independent (MI) and Machine dependent(MD)
    • MD – called pmap – handles lower level details
    • Each architecture has its own pmap module.
    • Each layer does its own mapping
      • MI maps memory objects into the va space.
      • A memory object – a kernel data structure representing data mapped into the va space.
      • MD – only knows how to map physical pages of memory into va space.
  • Machine dependent Layer
    • Meant to be the interface for MI to the MMU
    • MD can be abstracted as a big array of mapping entries indexed by the va to produce a physical address and attributes.
    • Mapping entries are called PTEs. How PTEs are stored depends on the hardware.
    • PTE -> PT -> PD
    • Common pmap operations:
      • pmap_enter : adds a virtual-to-physical address mapping to the specified pmap with the specified protection
      • pmap_remove : removes a range of virtual-to-physical address mappings
      • pmap_protect : changes protection
      • pmap_page_protect : changes the protection of all mappings of a single page.
      • pmap_is_referenced, pmap_is_modified : test the referenced and modified attributes of a page.
      • pmap_copy : copies mappings
  • MI layer
    • Higher level functions are handled here.
    • Centered around 5 structures
      • vmspace – describes va space of a process. Contains pointers to vm_map and pmap and statistics
      • Vm_map : list of mappings in the va space and attributes
        • pmap pointer
        • Header to linked list of vm_map_entry
        • Size
        • Refcnt
        • Min va
        • Max va
      • Vm_object : describes a file that can be mapped into a va space.
      • vm_pager : describes how to access a backing store. List of functions used by the object to fetch and store pages.
      • Vm_page : describes a page of physical memory.
    • Data in VM pages is copied to and from backing store by VM pagers.
    • Each vm_map structure has a pmap structure to contain the lower level mapping info for the va space mapped by the vm_map.
    • In order to find which vm_page should be mapped at a certain address, the VM system must look in the vm_map for the mapping of the VA, and then check the vm_object for the needed page.
      • If the page is resident end of search
      • Else, VM system must issue a request to the vm_object’s vm_pager to fetch the data from the backing store.
  • VM Maps
    • Vm_map structure maps the objects to the regions of va space.
    • Structure entries given above
    • It also has a pointer to the backing object and attributes such as protection and inheritance code.
    • Map entry usually points to the VM object that it is mapping. (sometimes can point to another map)
    • vm_map_entry
      • Structure
        • Prev and next
        • Start
        • End
        • Object -> vm_object/vm_map
        • Map
        • Submap
        • Attributes
      • Submaps can only be used by the kernel. Main purpose is to break up the kernel va space into small units.
      • Share maps allow 2 or more processes to share a range of va space and thus all mappings within it.
    • There are a lot of functions that perform operations on VM maps. There are functions to create, free, add and drop references to a VM map and many more.
    • VM map entry structures are allocated with malloc. But the kernel can’t allocate the entries for its own VM map with malloc, because it might loop while allocating map entries. Hence it allocates its own VM map entries from a private pool.
  • VM Objects
    • VM map contains a list of map entries that show allocated regions in a map’s address space.
    • Each map entry points to a memory object that is mapped in that region.
    • All memory objects are defined by the VM object structure.
      • Memq -> vm_page
      • Obj_list -> vm_object (all active objects)
      • Pip (paging in progress)
      • refcnt
      • Copy -> copy object
      • Shadow -> shadow object
      • Pager -> vm_pager
      • Cached list
  • VM pagers
    • Structure
      • List of all pagers -> vm_pager
      • Handle – identification tag for the pager
      • Type
      • Pagerops
      • Data -> void
    • Used by vm_objects to read and write data from backing store into a page of memory.
    • 3 types of pagers
      • Device – /dev files
      • Swap – anonymous memory objects
      • vnode – used for normal files that are mmaped
    • Kernel maintains a linked list of all VM pager structures on the system.
    • Supported operations include :
      • Allocating a new pager structure
      • Freeing a pager structure
      • Reading pages in
      • Saving pages
  • VM pages
    • Physical memory divided into pages
    • VM page structure for every page that is available for the VM system to use.
    • Structure
      • Page list -> vm_page – active, inactive and free pages
      • Hash queue -> vm_page – allows a page to be quickly looked up
      • Obj list -list of pages that belong to an object
      • Pointer to object -> vm_object
      • Offset in object
      • Flags
      • Physical address
  • Copy-on-write and Object chaining
    • How VM handles objects that are mapped copy-on-write
    • In cow mapping
      • Mapped pages are not shared
      • private
      • uses shadow objects – anonymous memory object that contains modified pages of a copy-on-write mapping.
    • Types of cow mapping
      • Private – changes are private to that process. Changes made to the underlying object that are not shadowed by a page in the shadow object are seen.
      • Copy – mapping process gets a complete snapshot of the object being mapped at the time of mapping.
    • Private
      • VM map structure -> map entry -> VM object (corresponding to file).
      • Map entry will have both cow and needs copy attribute set.
      • When written to -> a shadow object will be allocated(to hold changed pages).
      • On fork
        • Child needs copy of the memory
        • Shadow object is treated like backing object.
        • Both child and parent enter “needs copy” state
        • If any process tries to write. VM will catch it and insert a new shadow object and clear “needs copy”
        • Called shadow object chaining
      • Once an item in the chain exits, the chain needs to be merged into the latest copy of the shadow object. This is not done by the BSD system. It is called the object collapse problem or swap memory leak.
      • Solution code is pretty complex
      • The solution to this problem by UVM shown later.
    • Copy
      • A copy object is used to shadow rather than the usual object
      • The list of objects is copy object chain
      • Copy objects are needed to support the non-standard copy cow semantics.
      • Adds another layer of complexity
  • Page Fault Handling
    • The MD layer catches the invalid mapping and calls the VM page fault routine.
    • If unable to handle then return a segfault
    • Fault routine operation :
      • The list of map entries is searched for the map entry whose va range contains the faulting address. In case none is found, return a segfault.
      • Fault routine starts at the VM object the map entry points to and searches for the needed page. It continues down the object chain until it finds the page or runs out of objects.
      • If the page is not resident, the object’s pager must retrieve it from backing store.
      • If I/O is needed to get the page, then all the data structures are unlocked during the I/O.
      • The page is mapped in through the pmap layer.
      • Fault routine returns success code.
  • Memory sharing
    • Multithread environment – VM map and pmap can be shared by threads
    • VM space shared using share maps
    • Memory objects mapped into multiple VM entry structures
    • Objects that are copied with cow use read-only pages to defer the copy.
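
The shadow object chain walk described above can be sketched roughly like this. This is a toy model of my own in Python, not the BSD VM code: `VMObject`, `lookup_page` and `write_page` are hypothetical names, and pages are just dictionary entries keyed by offset.

```python
class VMObject:
    """Toy memory object: resident pages keyed by offset, plus a shadow link."""

    def __init__(self, name, pages=None, shadow=None):
        self.name = name
        self.pages = dict(pages or {})
        self.shadow = shadow  # the object this one shadows (next in the chain)


def lookup_page(obj, offset):
    """Walk the chain from the top object down; return (object name, data) or None."""
    while obj is not None:
        if offset in obj.pages:
            return obj.name, obj.pages[offset]
        obj = obj.shadow
    return None


def write_page(top, offset, data):
    """Copy-on-write: a modified page always lands in the top-most shadow object."""
    top.pages[offset] = data
```

With a file object at the bottom, a parent shadow object in the middle and a child shadow object on top, a lookup from the child finds the first object in the chain that holds the page, and a write from the child never touches the objects below it.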

Chap 3 : Goals

(using this as a reference to know whether I am familiar with all the features)

  • Allow a process to safely let a shared cow copy of its memory be used by other processes.
  • Pages from IO or IPC can be easily inserted into another process’s address space
  • Process and kernel should be able to exchange large chunks of va space using the VM’s memory mapping structures.
  • Optimize parts of the VM that affect performance and complexity of the new VM
  • Improve secondary elements

UVM Features (again a reference)

  • Page loanout –
    • loaned pages are read only.
    • If a process tries to change data in a loaned page, the data is copied to another (non-loaned) page before the change is allowed.
  • Page transfer –
    • Kernel subsystems transfer pages under control to a process’s normal virtual memory
  • Map entry passing –
    • Process and kernel can dynamically exchange va space.
  • Partial deallocation –

Improved Features

  • Simplified copy-on-write
  • Clustered anonymous memory management
  • Improved page wiring
    • Data is said to be wired if it resides in a page that is set to be always resident in memory.
  • Efficient traversal
  • Reduced lock time during unmap

Design Overview

  • MI layer
    • Structures
      • Vmspace
      • Vmmap
  • Data structure locking
    • Use of multiple locks – fine grain locking
      • Sleep lock – semaphore
      • Spin lock – mutex
    • Map – only lock holder can change the sorted list of map entries.
    • Amap – adding or removing anon structures
    • Object – object lock protects the list of pages associated with the object from being changed.
    • Anon – page or disk it points to
  • VM maps
    • Map_entry modified
      • Pointer to backing UVM object
      • A vm_aref structure
    • Uvm_map function – all in one function to do a mapping with specified attributes.
    • Functions for page loanout etc
  • UVM Object
    • Uvm_object different from the vm_object
    • structure
      • Object lock
      • Ptr to pager operations
      • List of obj pages -> vm_page
      • Number of pages
      • Refs
  • Anonymous memory structures
    • Arefs, amaps, anons
    • Map entry contains aref structures.
    • Aref can point to amap structures
    • Amap -> top layer of the UVM 2 layered vm mapping scheme
    • Amaps contains one or more anon structures
  • Pager operations
    • pager_ops defines the set of operations that can be performed on an object.
    • Each object has a pointer to its pager operations
    • Extra vm_pager level is eliminated
  • Pages
    • Vm_page structure has been updated for amap and page loanout
    • Loan counter added

Other VM managers

  • Mach – microkernel – only the most important services live in the kernel.
  • All other features are separate server processes
  • Message passing – different processes hence communication is important.
    • Small, copied into kernel buffer and then copied out
    • Large, VM system is used .
    • The address space is copied into the copy map of the shadow and copy objects. The copy map is then passed.
  • External pagers – user processes can act as a pager for an object
    • Page Daemon can pageout a page managed by an external process
    • Copy map can access a pointer to a copy map copy object
    • Pages to be pageout are inserted into copy map’s copy object
    • Copy map’s copy pager is managed by the default pager
    • Pages can be double mapped
      • Have two or more vm_page structures
  • Fictitious pages
    • A vm_page structure with no physical memory
    • Used as busy-page placeholders to block access to certain data structures

FreeBSD VM

  • Data structures are similar
  • FreeBSD doesn’t have a lot of the features that UVM has

SunOS

  • Address space described by as structure
    • Contains a list of segments
  • Segments are described by a seg structure
  • Seg structure contains its starting virtual address, size and pointer to a segment driver
  • The format of the memory object structure is private to the object’s segment driver.
  • Therefore, pointer to the segment driver is in the seg structure
  • Also divided the vm system to two layers MI and MD
  • Also uses amaps and anons
    • Are not a general purpose VM abstraction.
  • The mapping type cannot be changed: cow remains cow

Linux

  • mm_struct – describes the address space.
  • Mapped area of memory has a set of flags, an offset and pointer to file
  • Each page has corresponding page structure
  • VM layering
    • MI expects the MD code to provide a set of page tables for a three-level forward-mapped MMU
    • MI code reads and writes into these page tables
    • If MMU matches the Linux MI model then the code can be directly written to the real page tables
    • Disadvantages
      • Hooks for md cache flushing must appear throughout the MI
      • MI virtual operations must walk all the three levels of layering
    • Entire contents of the physical memory are continuously mapped into the kernel’s address space.
    • Kernel va must be as large as the physical memory.
    • This allows faster page recovery
    • Cow
      • Each page has a reference counter
      • Each mapping counts as a reference
      • Pages in the buffer cache also have a reference
      • If refcnt > 2 then page is shared
      • Cow mem is identified with a cow flag in the vm_area struct that maps the region
      • First read-fault will cause the backing file to be mapped in read-only
      • On a write fault in a cow area, the page’s refcnt is checked to see if the page is shared
        • If page is shared, new page is allocated the data is copied. And new page is mapped in.
        • If process with cow is forked, refcnt is incremented
      • Cow memory is paged out to swap. Swap is divided into page sized blocks.
        • Blocks have reference counter
        • To page a cow page out to swap, the PTE is replaced with an invalid PTE containing the address on swap where the data is located.
        • Page offset is set to the address of its location on swap
        • It is retrieved by the page fault routine on a read or write fault.
    • Page Tables : Information stored in the hardware page tables
      • The information stored in hardware page tables can be thrown away.
      • It can be reconstructed based on the information in the address space’s map structures
    • Doesn’t support cow memory sharing or memory inheritance.

Windows

  • Each process has its own private va space described by VADs
  • Memory allocation –
    • Processes allocate and deallocate memory
      • VM can be reserved by adding an entry to the VAD list.
      • Process can decommit and free memory when no longer needed.
  • Page table and management
    • Has 2 level instead of 3.
    • Maintains a database for all pages
  • Prototype PTEs
    • Pages can be shared through this
    • Each memory mapped section has an array of prototype PTEs
    • These are used to determine location of each page the object contains

Chap 4 : Anonymous Memory handling

Anonymous memory – memory that is freed as soon as it is no longer referenced. It is called anonymous as it is not associated with a file and doesn’t have a name. Anonymous memory is paged out to swap when the system is running out of memory.

Overview

  • Anonymous memory used for a number of purposes
    • Zero fill mappings – bss and stack
    • To store changed pages in cow mapping
    • Shared memory regions that can be accessed with System V system calls
    • For pageable areas of kernel memory
  • BSD VM
    • Managed through objects whose pager is swap pager
  • UVM
    • Uses 2 level amap memory mapping scheme
    • Anonymous memory can be found at either level
    • At lower backing object level
      • Anonymous memory uvm_object structures supported
      • Objects use the aobj pager to access the backing store
    • Upper level (amap layer)
      • Entirely anonymous memory
      • Referenced through amaps
      • During page fault
        • UVM checks if fault is in amap layer
        • If not the backing object is checked

Amap Interface

  • The first time data is written to memory mapped by an anonymous mapping, an amap structure is created.
  • Amap contains slot for each page – an anon structure each
  • Amaps are referenced from map entry structures.
    • If map entry is split then the resulting entries should each point to parts of the amap
    • The original amap could be broken up, but that would require expensive data copying.
    • To deal with the splitting a vm_aref structure is used.
      • Slot offset
      • Pointer to amap
    • One of the new entries will use the slot offset.
    • Vm_anon structure
      • Spin lock
      • ref
      • Page -> vm_anon/vm_page
      • Swap slot
        • If non zero – anon’s data has been paged out
    • Unused anon structures use their pointer to form the free list
    • In-use anon structures use their pointer to point to the vm_page structure associated with the data
    • When UVM needs to access the data contained in an anon, it looks for the page structure. If there is none then it allocates one and arranges for the data to be brought in from the backing store
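
The slot offset trick for splitting a map entry can be sketched like this. It is my own toy model in Python, not UVM code: `Aref`, `MapEntry` and `split_entry` are hypothetical names, the amap is just a Python list of anon slots, and addresses are page numbers.

```python
from dataclasses import dataclass


@dataclass
class Aref:
    amap: list        # shared list of anon slots
    slot_offset: int  # first amap slot used by this map entry


@dataclass
class MapEntry:
    start: int   # first page number covered by the entry
    end: int     # one past the last page number
    aref: Aref

    def anon_at(self, page):
        """Find the anon for `page` via the aref's slot offset."""
        return self.aref.amap[self.aref.slot_offset + (page - self.start)]


def split_entry(entry, at):
    """Split `entry` at page `at`; both halves keep pointing at the same amap."""
    assert entry.start < at < entry.end
    amap, base = entry.aref.amap, entry.aref.slot_offset
    left = MapEntry(entry.start, at, Aref(amap, base))
    right = MapEntry(at, entry.end, Aref(amap, base + (at - entry.start)))
    return left, right
```

No anon data is copied during the split. Only the second entry's slot offset changes, so it starts reading the shared amap part-way through.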

Amap implementation options

  • Internal structure of the amap is considered private to the amap implementation.
  • Using arrays
    • Array of pointers to the anon structure
    • Array indexed by the slot number.
    • lookup operations fast
    • Operations requiring traversing all active slots will be hard
  • Using linked list
    • Dynamic sized array contains list of anons and their slot number
    • If array fills then newer one is allocated
    • Saves physical memory.
    • Slows lookup – entire list has to be searched for each lookup
  • UVM implementation
    • Structure
      • Lock
      • Reference counter – checks how many maps are referencing it.
      • Flag for sharing
      • Number of slots allocated for the amap
      • Number of slots active
      • No of active slots in use
      • 4 arrays that describe the anons
        • am_anon
          • Set of pointers to anon structures indexed by slot number
          • Quick amap lookup operations
        • Am_bckptr
          • Used to keep slots and anon in sync
          • Maps an active slot number to the index in the am_slots array in which its entry lies
        • Am_slots
          • Contiguous list of slots in the am_anon structure.
          • Used to quickly traverse all anons pointed by the amap
        • Am_ppref
    • Partial unmap of an Amap
      • When all references are removed due to unmap.
        • Physical memory and backing store can be freed as memory can never be referenced again.
        • Problem arises when only part of an amap is unmapped
      • Tradeoffs
        • Amap module does not free partially unmapped sections of an amap until the entire amap’s reference count goes to 0. This means less code complexity.
        • Newer VM systems use amap layer hence cause more partial unmaps.
      • Doesn’t cause swap memory leaks
        • Uses internal functions and am_ppref array
          • Am_ppref is a per page array of ref counters
          • Only allocated and active when a reference to a map is split
          • During startup – each page gets a ref count equal to that of the amap itself.
        • Parts of amap are referenced and freed per page reference counters.
        • If the reference count drops to 0 then any allocated memory is freed
      • Keeping a ref counter for each page in a large map would be expensive. Hence use one reference counter for each contiguous block of memory.
        • Using a second array to keep track of length of mapping will allow us to change the ref count on block of pages quickly.
        • It also doubles memory needed by per page reference count system.
        • UVM takes advantage of the large number of “don’t care” entries to compact the reference and length arrays into one single array
        • References divided into two groups
          • One page
          • More than one page
        • Single page – positive
        • Multi page – negative
        • To include 0 add 1 to the count during the start
      • Processing cost starts off small and rises as the amap is broken into smaller chunks.
      • Processes that do not do any partial unmaps will not have any per page ref count arrays active.
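
The compacted reference/length array is easier to see with a small example. Below is a sketch in Python of one way the encoding described above can work. The helper names (`set_ref_len`, `get_ref_len`, `blocks`) are hypothetical, and the exact layout is my assumption from the notes: a single-page block stores `ref + 1` as a positive number, while a multi-page block stores `-(ref + 1)` in its first slot and the block length in its second slot (the remaining slots are "don't care").

```python
def set_ref_len(ppref, offset, ref, length):
    """Record that `length` pages starting at `offset` share reference count `ref`."""
    if length == 0:
        return
    if length == 1:
        ppref[offset] = ref + 1          # positive value: single-page block
    else:
        ppref[offset] = -(ref + 1)       # negative value: multi-page block
        ppref[offset + 1] = length       # length is kept in the following slot


def get_ref_len(ppref, offset):
    """Return (ref, length) for the block that starts at `offset`."""
    if ppref[offset] > 0:
        return ppref[offset] - 1, 1
    return -ppref[offset] - 1, ppref[offset + 1]


def blocks(ppref):
    """Decode the whole array into a list of (start, ref, length) blocks."""
    out, i = [], 0
    while i < len(ppref):
        ref, length = get_ref_len(ppref, i)
        out.append((i, ref, length))
        i += length
    return out
```

Adding 1 before storing is what lets a reference count of 0 be represented, since a stored 0 could not be told apart as positive or negative. An amap that has never been split is described by a single block, so walking it stays cheap until partial unmaps break it up.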

Accessing Anonymous backing store

  • BSD – vm_pager structure is allocated and assigned to every vm_object that accesses backing store.
    • Anonymous memory lives in vm_object structures whose pagers are with the system swap area
    • These are usually shadow objects or copy objects
    • Location of backing store to which memory can be paged out is accessed through vm_pager
    • Vm_pager points to another structure for the swap pager that translates offsets into locations on the map
  • UVM
    • Anonymous page can belong either to aobj uvm_object or an anon.
    • Accessing backing store is done through uvm_objects pager operations pointer.
    • This will not work for anon (not uvm_object)
      • Since anon is backed by the swap area there is no need for such a pointer
      • However we still need to map the paged out data
      • Sun
        • Statically assigning a page sized block of swap.
        • Forces system swap size to be greater than size of physical memory
        • Static allocation makes it hard to cluster anonymous memory pageouts and thus take longer
      • UVM solution
      • Assignment of a swap block to an anon is done by the pagedaemon at pageout time
      • Swapping layer of vm system presents swap as a big file.
      • This can be accessed via /dev/drum
        • Each page sized chunk is assigned a slot number
        • Anon stores this slot number
      • Data in anon can be in one of three possible states
        • Resident with no backing store – an_page points to the page. An_swslot is empty. Starting state
        • Non resident – swap slot allocated for it. Page paged out and then freed. Page pointer = 0 swap slot != 0
        • Resident with backing slot assigned – prev non resident and then data paged in.

Effects of amaps on Pages and objects

  • BSD
    • Each allocated page can belong to a single structure at a specified offset. Kernel asks to perform an operation on the object and locks the object with the object pointer.
  • UVM
    • Pages can belong to uvm_object and also anon structures
    • Anon is likely to be used by one or more amaps.
    • There is a flag_bit PG_ANON which is true if page is part of an anon structure.

Effect of amaps on Forking and Copy on write

  • Copy-on-write
    • Va space is described by a map structure.
      • Mapped region is described by map entry structures. Stored in a linked list.
      • Map entry maps 2 levels
        • Top level anonymous layer
        • Bottom backing object
      • Both of these can be null
      • Cow mappings – cow anonymous data is stored in the amap layer.
      • There are 2 important boolean attributes
      • copy-on-write
        • Mapped region is cow. And all changes made to the memory should be done in anonymous memory.
        • If false – mapping is shared and all changes should be made directly to the mapped object.
      • Needs-copy
        • If true – Region needs its own private amap. Hasn’t been allocated yet
        • Allows deferring the creation of an amap until the first memory write occurs. Used for cow operations.
      • Data area mapped read-write cow.
        • Aref is null
        • Cow is true
        • Need-copy is true
      • The difference between a zero-fill mapping and the copy-on-write mapping of a program’s data area is that the zero-fill area has a null backing object, where as the data area has a file as a backing object.
      • When data is written to a cow area – an anon with a ref cnt of one is allocated in the amap to contain the data
      • Ref count is used to determine its cow status.
        • 1 -> only one is using. Safe to write directly
        • >1 -> cow operation must be performed
      • COW operation
        • New anon – ref count 1
        • New page allocated – data is copied from old to new anon’s page.
        • Old ref counter dropped by one.
        • New anon is installed in current amap
        • Cow is complete and write can proceed.
      • If system out of memory
        • Allocation fail
        • Wait for process to be killed.

Forking

  • Must create new address space for the child process.
  • Traditionally, regions mapped cow are copied and regions mapped shared are shared.
  • Each map entry in a map has an inheritance value
    • None : Child process should not get access to the area of memory mapped by the map entry.
    • Share : Parent and child process should share the area of memory.
      • Both the references to any amap and backing object should be shared.
      • Cow flag of a mapping doesn’t change.
    • Copy : Child should get a cow copy of the parents entry
      • All resident pages in parent are write protected.
      • Page fault occurs next time parent attempts to write to one of the pages.
        • Page fault handler – cow on the faulting page.
        • child gets a reference to parent’s backing object.
        • Child adds a reference to parent’s amap and sets cow flag.
        • Needs-copy flag is set in both.
  • Default inheritance values
    • Cow region – copy
    • Shared region – share
    • It can be changed with the minherit syscall
  • Shared inheritance – amap structure can be shared between the mappings.
    • Shared flag is set.
    • Amap is careful to propagate changes to all processes
  • Function to copy address space is uvmspace_fork
  • Diagrams for share, copy inheritance.
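
As a rough illustration of the three inheritance values, here is a toy fork in Python. It is my own sketch and not `uvmspace_fork`: map entries are plain dicts, `Amap` and `fork_entries` are hypothetical names, write-protecting the parent's pages is not modelled, and the corner cases (shared amaps, wired entries, needs-copy already set) are ignored.

```python
class Amap:
    def __init__(self):
        self.ref = 1         # number of map entries referencing this amap
        self.shared = False  # set when the amap is shared between mappings


def fork_entries(parent_entries):
    """Build the child's map entries from the parent's, by inheritance value."""
    child_entries = []
    for entry in parent_entries:
        inherit = entry["inherit"]
        if inherit not in ("none", "share", "copy"):
            raise ValueError("unknown inheritance value: %r" % inherit)
        if inherit == "none":
            continue                      # child gets no access to this region

        child = dict(entry)               # same amap and backing object references
        amap = entry["amap"]
        if amap is not None:
            amap.ref += 1                 # child adds a reference to the amap

        if inherit == "share":
            if amap is not None:
                amap.shared = True        # cow flag of the mapping is unchanged
        else:                             # "copy"
            child["cow"] = True
            child["needs_copy"] = True    # needs-copy is set in both
            entry["needs_copy"] = True
        child_entries.append(child)
    return child_entries
```

The actual copying is deferred: with copy inheritance both sides just end up marked needs-copy, and the real work happens later on the first write fault.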

Inheritance and Forking

  • Share inheritance when Needs-copy is true
    • Data should be shared between parent and child.
    • Previous configuration will not work
      • Reason shown in diagram
    • Problem avoided by checking the needs copy flag on the parent
      • If set, UVM calls amap_copy immediately so that amap to be shared can be created.
  • Copy inheritance when amap is shared
    • Amap shared by two processes
      • Share flag set and ref count of 2
      • 1 process mapping the map has its inheritance value set to copy.
      • Process 2 forks
    • A write fault by any process will create issues.
      • If p1 writes, p2 will see the change because it is sharing the amap. P3 will also see it.
      • If p2 writes, NC flag will cause a new amap to be allocated to store the changed page. (not shared with p1)
      • If p3 writes, it will have a new amap as required. But p2’s NC flag will not be cleared.
    • Correct way of handling will be to use amap_copy during the fork to immediately clear NC flag in the child.
  • Copy inheritance of a shared mapping.
    • Inheritance changed with minherit syscall.
    • Unusual
    • Handled using
      • Data in the amap layer becomes cow in both parent and child.
      • Data in the object layer remains shared in the parent process but cow in the child.
  • Copy inheritance when map entry is wired
    • Wired : it’s always resident and never needs to be fetched from the backing store.
      • Used to improve performance
    • If a process with wired memory whose inheritance value is copy forks
      • Fork calls amap_cow_now
      • Ensures that the child has an amap and cow faults are resolved.
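
The first case above (share inheritance while needs-copy is still set) can be sketched like this. It is a toy model under my own assumptions: `Amap`, `Entry` and `fork_share` are hypothetical names, and this `amap_copy` only copies a dictionary of slots, whereas the real function also has to manage anon reference counts and locking.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Amap:
    anons: Dict[int, str] = field(default_factory=dict)  # slot -> anon name
    refs: int = 1
    shared: bool = False


@dataclass
class Entry:
    amap: Optional[Amap]
    needs_copy: bool = False


def amap_copy(entry: Entry) -> None:
    """Toy version: resolve a pending needs-copy with a private amap."""
    old = entry.amap
    entry.amap = Amap(anons=dict(old.anons))  # new amap, same anon slots
    old.refs -= 1                             # drop our reference to the old one
    entry.needs_copy = False


def fork_share(parent: Entry) -> Entry:
    """Share inheritance: clear needs-copy first, then share the amap."""
    if parent.needs_copy:
        amap_copy(parent)  # done immediately, at fork time
    parent.amap.refs += 1
    parent.amap.shared = True
    return Entry(amap=parent.amap)
```

Because the copy happens before the child is created, parent and child end up pointing at the same amap, which is what share inheritance requires.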

Chapter 5 : Handling page faults

Page fault overview

  • Called by low-level machine-dependent code when the MMU detects a memory access that causes a page fault.
  • Fault handler must use information stored in the MI layer of the VM system to solve the fault.
  • If unable, an error signal is sent to the process
  • Two kinds of faults
    • Read fault – when a process or kernel tries to read from unmapped memory
    • Write fault – write to unmapped memory or read-only memory
  • BSD – page fault routine needs to traverse shadow and copy object chains.
  • UVM
    • Looking up the faulting address in the process’s VM map
    • Map entry is found, routine checks for faulting page.
      • Checks amap level first
      • Then requests page from vm_object
    • If cow fault, new anon is allocated and inserted into the amap.
    • Establishes a mapping in the faulting process’s pmap and returns.
  • Calling UVM Fault
    • Called from MD code
      • Faulting va – obtained from MMU
      • Faulting vm_map – based on Va, Hardware info and pointer to running process. Determine if fault occurred in current process’s vm_map or kernel’s vm_map.
      • Fault type –
        • Invalid – Va addressed was not mapped
        • Protect – Mapped but access violation
        • Wire – internal, while wiring pages
      • Access type – read or write
    • Fault routine uses above info to process the fault. It returns a status value to the MD code.
      • Success – resolved fault
      • Error
  • Fault address lookup
    • Determine what data is mapped.
    • Information is stored in the vm_map structure’s sorted linked list of map entries.
      • Search linked list for structure that maps the faulting address
    • Issues
      • Map lock
        • Each structure has its own lock
        • Maps are read locked when read and write locked when written to
        • Multiple processes can read lock at the same time
        • Only one process can write lock, and only if the map is not read locked.
        • When fault routine is called – map is unlocked.
          • Routine read locks in order to lookup the address.
          • As long as the routine holds the lock the lookup is valid
        • Case : routine has to do a lengthy IO operation,
          • It unlocks the map
          • After the IO it relocks to check whether the lookup was valid.
      • Map lookup leads to share or sub map
        • Then lookup must be repeated in that map
        • Only two levels of maps can be encountered.
          • Faulting map (parent)
          • Share or sub map
        • Data structure with info regarding current mapping of the faulting address. – uvm_faultinfo
        • Fault routine stores the map, address and size in it
        • Uvmfault_lookup is called with the structure as an argument.
          • If address is not mapped then it returns an error code
          • If address is in region mapped with submap
            • it makes the original the parent map,
            • locks the new map,
            • Repeats the lookup in the new map
            • If successful – fills the other fields
          • Addresses that map to a share/submap
            • Parent map field points to original map.
            • Map field points to the share map.
        • Lowest level map ends up in the map field
      • Faultinfo structure contains version numbers for both the parent and current map. These are used to check the integrity of the lookup operation in case any I/O happens in between.
        • In case the version numbers don’t match, the lookup is done again.
  • Post lookup Checks
    • If the lookup fails then an invalid address error is returned to the caller.
    • If successful – current map is obtained through the faultinfo structure.
    • Routine now has readlocks on the maps in the structure.
      • The address’s current protection is checked against the type of access that caused the fault.
      • The needs-copy flag is checked. If it is a write fault or the mapping is zero-fill, the needs-copy flag needs to be cleared by calling the uvmfault_amapcopy function.
      • Fault routine checks the current map entry for a mapping in at least one of the two levels. If there is no amap or uvm_object associated, it returns an invalid address error.
  • Establishing Fault Ranges
    • UVM page fault routine establishes a range of pages to examine during the fault.
      • Narrow fault – just the page
    • Size determined by the size of va mapped by the map entry
    • For sequential usage – pages behind the faulting page are flushed.
    • For random access usage the range is just the faulting page
    • Normal has a few pages in either direction
  • Mapping neighbouring Resident pages
    • Fault routine looks for neighbouring resident pages in the mapping’s amap
    • If found unmapped but resident – it maps it. (reducing future faults)
    • UVM ignores neighbouring non resident anon with data paged out to the swap area.
    • If the fault routine detects that a uvm_object’s page is needed to resolve the fault then it looks for resident pages in that object as well.
    • Neighbouring objects are mapped based on the memory usage hint.
    • Locked get – all data structures stay locked while the get function is called. Used when the data is resident.
    • Unlocked get – data is not resident, so the fault routine unlocks the data structures and calls the get function again to do I/O to the backing store.
    • This is the preliminary work of the fault routine.
      • Case 1 – anon faults
      • Case 2 – object faults
  • Anon Faults
    • Data faulted resides in an anon structure.
    • Content of mapping’s object layer is irrelevant.
    • Processing an anon fault
      • Ensuring the anon’s data is resident
      • Handling two sub cases of anon faults
      • Mapping in the page
  • STEP 1 : Ensuring the Faulting Data is resident
    • Done using the anonget function
      • Called with the faultinfo, amap and anon
      • All must be locked.
    • 3 cases to be handled
      • Non-resident : paged to swap
      • Unavailable resident page : resident but busy
      • Available resident page : resident and available
    • Non resident
      • Made resident.
      • Allocating a new page
      • Unlocking the fault structures
      • Issue request to swap I/O layer to read the data from swap
    • Resident but unavailable
      • Mark wanted
      • Unlock fault structures
      • Sleep until page is available
      • Types of pages
        • Busy – data in flux
          • Process setting the flag is the owner
        • Released – being freed
  • STEP 2 : Anon Fault Sub Cases
    • Read-fault or write fault on anon with ref count = 1
      • Faulting page must be mapped read only
    • Write fault on anon with ref count > 1
      • Fault routine must allocate new anon with new page
      • Data is copied to the new anon
      • New anon is installed in place of old one
  • STEP 3 : Mapping the Page in
    • New page can be entered into the process’s pmap.
    • Resume execution
  • Object Faults
    • 3 step procedure
  • STEP 1 : Fetching non resident data
    • If page not resident
      • Unlocks all data structures
      • Issues unlocked get request
      • Sleep until data is available
  • STEP 2 : Object fault subcases
    • Object’s page is mapped directly into the faulting process’ address space
      • Triggered by read fault or write fault (on share mapped)
      • Page will be mapped read – only
    • Write fault on an object mapped cow
      • Allocate a new anon
      • Copy data to new page
  • STEP 3 : Mapping page in
    • Enter page into the pmap with pmap_enter
    • Clear the page’s busy flag
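
The two-level lookup and the anon/object sub cases above can be put together in a small sketch. This is a simplified model and not the real uvm_fault: `Anon`, `Entry` and `resolve_fault` are hypothetical names, locking, I/O and neighbouring pages are ignored, and the choice to map read faults read-only is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Anon:
    data: bytes
    refs: int = 1


@dataclass
class Entry:
    amap: Dict[int, Anon] = field(default_factory=dict)  # page index -> anon
    obj: Optional[Dict[int, bytes]] = None                # page index -> object page
    cow: bool = True


def resolve_fault(entry: Entry, page: int, write: bool) -> Tuple[bytes, str]:
    """Return (data, protection) that would be entered into the pmap."""
    prot = "rw" if write else "r"
    anon = entry.amap.get(page)          # amap layer is checked first
    if anon is not None:                 # case 1: anon fault
        if write and anon.refs > 1:
            anon.refs -= 1               # drop our reference to the shared anon
            anon = Anon(anon.data)       # copy the data into a new anon
            entry.amap[page] = anon      # install it in place of the old one
        return anon.data, prot
    if entry.obj is None or page not in entry.obj:
        raise LookupError("invalid address")
    data = entry.obj[page]               # case 2: object fault
    if write and entry.cow:
        entry.amap[page] = Anon(data)    # cow: promote the data to a new anon
    return data, prot
```

After a cow write fault the page lives in the amap layer, so later faults on it never reach the object layer again.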

Error Recovery

  • Invalid VA
  • Protection violations
  • Null Mapping
  • Out of physical memory
    • Physical memory allocated
      • Faulting memory reference is on an unaccessed zero-fill area
      • Write to cow memory
    • Wakes the page daemon
  • Out of Anonymous VM
  • IO errors
    • Either re-try the fault
    • Make the fault routine give up

Chapter 6 : Pager Interface

Role of the Pager

  • Data structure that points to a set of functions that perform operations on the object.
  • 3 pagers
    • Aobj – used by the uvm_object that contain anonymous memory.
    • Device – used by the uvm_object that have device memory
    • Vnode – normal file memory
  • Pager related operations using pager functions
  • Pagers perform operations only on uvm_object

Pager operations

  • Composed of functions that perform operations on an object.
  • Pointers to all these functions except one are defined by a uvm_pagerops
  • Init operation
    • UVM startup routine will call the pager’s pgo_init functions
    • This allows each pager to set up any global data structures it needs before the pager is first used.
    • Pager init is not needed by all pagers. Pagers that do not need it set the pointer to null
  • Attach operation
    • Used to gain initial reference to an object.
    • Must be called before the VM system has a pointer to the object and its pagerops structure. Hence not included as a part of the pagerops.
    • Returns a pointer to the uvm_object structure
    • 3 pagers 3 functions
      • Uao create – aobj
        • Allocates new aobj of size and returns pointer to its uvm_object.
        • Used to map system V shared memory
      • Udv attach – device attach
        • Takes a dev_t device identifier and returns a pointer to its object.
        • If there is no uvm_object, it will allocate one.
      • Uvn attach – vnode attach function
        • Takes a pointer to a vnode structure and returns a pointer to the vnode’s object structure.
        • If the object is not being used, it will be initialized
        • Access level used to determine if mapped writeable or not.
    • Return null if the operation fails
  • Reference operation
    • Takes pointer to an unlocked object and adds a reference to it.
    • Called when performing mapping operations
    • Ex : forking, child gets access to any uvm_object it inherits
  • Detach operation
    • Takes unlocked object and drops a reference
    • Unmap operation.
  • Get operation
    • Used to obtain pages from the object.
    • Called by the page fault routine to get an object’s pages to resolve a fault
    • The page where the fault has occurred is called the center page.
    • Allpages flag = removes importance on the center page
    • Two modes
      • Locked get
        • Critical data structures locked
        • Does not unlock
        • Resident pages
        • Can’t have IO
      • Unlocked get
    • Arguments
      • uobj: A pointer to the locked object whose data is being requested.
      • offset: The location in the object of the first page being requested.
      • pps: An array of pointers to vm_page structures. There is one pointer per page in the requested range.
      • npagesp: A pointer to an integer that contains the number of page pointers in the “pps” array.
        • For locked get operations, the number of pages found
      • centeridx: The index of the “center” page in the pps array.
      • access type: Indicates how the caller wants to access the page. The access type will be either read or write.
      • advice: A hint to the pager as to the calling process’ access pattern on the object’s memory.
        • This value usually comes from the map entry data structure’s advice field.
      • flags: Flags for the get operation. The get function has two flags: “locked” and “allpages.”
    • Return Values
      • VM_PAGER_OK: The get operation was successful.
      • VM_PAGER_BAD: The requested pages are not part of the object.
      • VM_PAGER_ERROR: An I/O error occurred while fetching the data. This is usually the result of a hardware problem.
      • VM_PAGER_AGAIN: A temporary resource shortage occurred.
      • VM_PAGER_UNLOCK: The requested pages cannot be retrieved without unlocking the object for I/O. This value can only be returned by a locked get operation.
    • Get operation can be called by the fault routine or the loanout routine
    • The routine determines the range of interest.
    • The routine checks the amap layer for the page.
      • If not found, it checks whether there is an object in the object layer.
    • The routine passes an array of page pointers into the get routine.
    • Neighbouring page pointers are set as don’t care
    • The routine returns a value
      • VM_PAGER_OK – center page resident and available
      • Otherwise VM_PAGER_UNLOCK
        • Will unlock data structures and do an unlocked get
  • Asyncget operation
    • Asks the pager to start paging in an object’s data from backing store.
    • Returns after starting IO
    • Useful to preload the data in a object
  • The Fault operation
    • Operation to allow the pager more control over how pages are faulted in
    • Pagers either have a get or fault operation.
    • Pager get takes an object and returns pointers to the vm_page structures.
    • Device memory doesn’t have any vm_pages with it.
    • Page fault operation takes 8 arguments
      • ufi: A uvm_faultinfo data structure that contains the current state of the fault.
      • vaddr: The virtual address of the first page in the requested region in the current vm_map’s address space.
      • pps: An array of vm_page pointers. The fault routine ignores any page with a non-null pointer.
      • npages: The number of pages in the pps array.
      • centeridx: The index in pps of the page being faulted on.
      • fault type: The fault type.
      • access type: The access type.
      • flags: The flags (only “allpages” is allowed).
  • Put operation
    • Takes modified pages and flushes the changes to the backing store.
    • Called by pagedaemon and pager flush operation.
      • uobj: The locked object to which the pages belong. The put function will unlock the object before starting I/O and return with the object unlocked.
      • pps: An array of pointers to the pages being flushed. The pages must have been marked busy by the caller.
      • npages: The number of pages in the pps array.
      • flags: The flags. Currently there is only one flag: PGO_SYNCIO. If this flag is set then synchronous I/O rather than asynchronous I/O is used.
    • Return values
      • Same values as used by get
        • Calling function must clear the busy flag
      • VM_PAGER_PEND – when asynchronous IO started
        • Pager will unbusy the pages and set the clean flag.
      • If successful then the page and the backing store are in sync
  • Flush operation
    • Performs a number of flushing operations on pages in an object
      • Marking pages inactive
      • Removing pages from object
      • Writing dirty pages to backing store
    • Used by msync to write memory to backing store and invalidate pages
    • Takes arguments
      • uobj: The locked object to which the pages belong. The flush function returns with the object locked. The calling function must not be holding the page queue lock since the flush function may have to lock it. The flush function will unlock the object if it has to perform any I/O.
      • start: The starting offset in the object.
      • end: The ending offset in the object. Pages from the ending offset onward are not flushed.
      • flags: The flags.
    • Returns TRUE unless there was an IO error
    • Flags
      • PGO_CLEANIT
        • Write pages to backing store
        • Unlock object during IO
      • PGO_SYNCIO
        • Cleaning IO to be synchronous
        • Will not return until IO complete
      • PGO_DOACTCLUST
        • Flush function to consider pages that are currently mapped for clustering when writing to backing store
      • PGO_DEACTIVATE
        • Object pages marked as inactive
        • Likely to be recycled by pagedaemon
      • PGO_FREE
        • Object’s pages to be freed
      • PGO_ALLPAGES
        • Ignore start and end arguments
        • Perform flushing on all of the object’s pages
  • Cluster operation
    • Determines how large a clustered IO operation can be
    • Takes the object and the offset of a page
    • Returns start and end of the cluster
    • Max size is MAXBSIZE of data
  • Make put cluster
    • Optional pager operation that builds a cluster of pages for pageout operation
    • If the cluster function is null, IO will never be clustered on pageout
    • Uvm_mk_pcluster – generic make put cluster function that can be used if a special function is not needed
    • Called by the pagedaemon when paging data to backing store.
    • Use – helps to see if any neighbouring pages can be included in the IO
    • Candidate pages – that are possible
      • These are dirty but not busy
      • Will be made write protect and busy flagged
      • Caller responsible for clearing the flag when it is done
  • Share protect
    • Optional pager operation that changes the protection of an object’s pages in all maps.
    • Object must be locked by the caller.
    • Called when removing mappings from share map
      • If share map is unmapped – must be removed from all processes
      • VM does not keep track of which processes are using it
      • Hence removed from all processes
    • If a pager does not have a share protect operation, objects using that pager can’t have share maps.
  • Async IO Done operation
    • Sync IO –
      • pager starts IO
      • Waits for it to complete
      • State of IO stored on the kernel stack
    • Async IO
      • Pager starts IO
      • Immediately return VM_PAGER_PEND
      • Process continues to run while IO is in progress
      • IO done called to finish IO
      • State of IO stored in its own memory area (uvm_aiodesc)
        • Number of pages
        • Kva
        • Pointer to done function
        • Linked list
        • Pointer to Pager dependent data
    • PGO_SYNCIO – flag to specify synchronous IO
    • Steps in Async IO
      • Memory allocated to store state information
      • Starts IO
      • Returns VM_PAGER_PEND
      • IO completed – interrupt generated
      • Handler – uvm_aiodesc is placed on global list of async IO
      • Pagedaemon is woken
      • Page daemon calls done functions of all structures in the list
      • Done function un-busies pages involved in IO
      • Wake processes waiting for IO to be complete
  • Release page operations
    • Called when a released page is encountered
    • Page is released when a process tries to free a page but finds it busy
    • Return value
      • False if releasing the page caused the object to be terminated
      • True indicating the object is alive
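
The idea of a pager as a table of function pointers, and the locked get followed by an unlocked get, can be sketched as below. This is a toy model only: `PagerOps`, `UvmObject`, `toy_get` and `fetch_page` are names I made up, the status values are plain strings rather than the kernel's constants, and the "I/O" is just a dictionary copy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

VM_PAGER_OK = "VM_PAGER_OK"
VM_PAGER_BAD = "VM_PAGER_BAD"
VM_PAGER_UNLOCK = "VM_PAGER_UNLOCK"


@dataclass
class PagerOps:
    """Stand-in for a pagerops structure: one callable per operation."""
    get: Callable[..., Tuple[str, Optional[bytes]]]
    reference: Callable[..., None]
    detach: Callable[..., None]


@dataclass
class UvmObject:
    resident: Dict[int, bytes] = field(default_factory=dict)  # offset -> page in memory
    backing: Dict[int, bytes] = field(default_factory=dict)   # offset -> data on backing store
    refs: int = 1
    pgops: Optional[PagerOps] = None


def toy_get(uobj: UvmObject, offset: int, locked: bool):
    if offset in uobj.resident:
        return VM_PAGER_OK, uobj.resident[offset]
    if offset not in uobj.backing:
        return VM_PAGER_BAD, None          # page is not part of the object
    if locked:
        return VM_PAGER_UNLOCK, None       # needs I/O, not allowed while locked
    uobj.resident[offset] = uobj.backing[offset]  # pretend I/O from backing store
    return VM_PAGER_OK, uobj.resident[offset]


def toy_reference(uobj: UvmObject) -> None:
    uobj.refs += 1


def toy_detach(uobj: UvmObject) -> None:
    uobj.refs -= 1


TOY_PAGEROPS = PagerOps(get=toy_get, reference=toy_reference, detach=toy_detach)


def fetch_page(uobj: UvmObject, offset: int):
    """Caller side: try a locked get, fall back to an unlocked get."""
    status, page = uobj.pgops.get(uobj, offset, True)
    if status == VM_PAGER_UNLOCK:
        # the real caller would unlock its data structures here
        status, page = uobj.pgops.get(uobj, offset, False)
    return status, page
```

The caller only ever goes through `uobj.pgops`, so a device or anonymous-memory pager could be swapped in by supplying a different `PagerOps` table.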

BSD VM vs UVM Pager

  • Data structure layout
    • Significant difference in data structures
    • BSD VM used multiple structures that pointed to each other. UVM had everything inside the Vnode structure.
    • Figure to show hierarchy
  • Page access
    • UVM – all accesses to pages are through a pager’s get function.
      • Process fetching the data does not allocate anything
      • Pager allocates a free page if it needs one

Chapter 7 : Moving Data in UVM

Intro

  • Page loanout allows a process to safely loan read-only copies of its memory pages
  • Page transfer allows the process to receive pages from kernel or other processes
  • Map entry passing allows processes to move chunks of va space between them
  • Transfer a page through copying
    • First ensure that the source and destination are properly mapped
    • Copy the data byte by byte
  • Through virtual memory
    • Ensure that page is read only
    • Establish a second mapping of the page at the dest va
    • complicated
      • Use copy on write to prevent corruption
      • Kernel IO pages must be wired
      • Only useful for large chunks of data

Page loanout

  • Traditional kernel
    • Data copied from process address space to kernel buffer
    • Buffer handed off to its destination
    • Copy adds overhead to IO
    • Data must be copied
      • Data may not be resident
        • Kernel faults at any non resident pages
        • Non resident pages don’t interfere with IO
      • Process can modify data while IO in progress
        • Kernel makes copy of data
      • Data may be flushed or paged out
        • Data must be resident during IO
  • Intro
    • With loanout
      • Send data directly without copying to intermediate buffer
      • Reduce IO overhead
    • Abstract level
      • Mark page read only
      • Increment page loan counter
      • Map the page
  • Loan Types and attributes
    • Loan out categories
      • object to kernel: A page from a uvm_object loaned out to the kernel.
      • anon to kernel: A page from an anon loaned out to the kernel.
      • object to anon: A page from a uvm_object loaned out to an anon.
      • anon to anon: A page from an anon loaned out as an anon.
    • All pages are loaned read only
      • In order to modify data in a loaned out page
      • Need to terminate the loan
        • Involves allocating a new page
        • Copying data from loaned page to new page
        • Use new page
      • Lot of events cause this to happen
    • Pages loaned to kernel
      • Must be resident
      • Must be wired
        • If the pagedaemon paged out the page, the kernel would fault when accessing it
      • Pages are entered into the kernel’s pmap
        • Prevents page based pmap operations
        • UVM is able to do normal VM operations without worrying about kernel’s loaned out page
    • Pages loaned from object to anon
      • These pages are referenced by both object and anon
      • They are owned by the object – hence need to lock the object to lock fields of the page
      • A uvm_object’s page can only be loaned to one anon.
        • Further loans only require a reference to the anon
    • Ownerless pages
      • An object that owns a loaned page frees it
      • The page can’t be placed on the free list, since another process is using it
      • Page becomes ownerless
      • If loan count becomes 0 then it is put on the free list
  • Loaning and locking
    • To lock a page we need to lock the owner.
    • Lock ordering
      • UVM holding lock on anon needs to lock the object.
      • It can only try
      • Else it needs to drop the lock and try again.
    • Anon with page holds no reference to the object
      • While trying to lock the object
      • Possible for the object to terminate
      • Prevention
        • Page queues must be locked
        • Prevents page’s loan count from changing
    • The option of allowing an anon to hold a reference to the object was dropped
      • Possibility of deadlock
  • Loanout data structures and functions
    • A new loan count field was added
    • Page is loaned if loan count is non zero
    • Page owner field changed – now it’s a structure
    • Uvm_loan – takes a range of va in a vm_map
      • Returns array of pointers to pages or anons depending on the requester
      • Can fail
        • Part of va space is unmapped
        • Unable to get some pages
    • Drop loan with uvm_unloan page/anon
  • Anonget and loaned pages
    • Anon can point to a page loaned by a uvm_object
    • Anon doesn’t own the page
      • Anonget locks the object
    • Anon points to an ownerless page
      • Causes anon to get ownership of the page.
  • Loanout procedure
    • Object to kernel
      • Object is locked
      • Page fetched from the object
        • If not present – unlocked and IO
      • Page queues are locked
      • If loan counter is 0
        • Globally write protect the page
      • Page is wired
      • Unlock
    • Anon to kernel
      • Anon locked
      • Uvmfault_anonget called to make anon’s page resident
        • If page on loan
          • Object also locked
      • Page queues are locked
      • If needed page is write protected
      • Loan count incremented
      • Wired
      • Unlock
    • Object to anon
      • Object is locked
      • Page is looked up with the pager get function
      • If not resident then it is fetched
      • If loaned to a different anon
        • Locks the anon
        • Increments the reference count
        • Unlocks the anon object
        • Return pointer to anon
      • If not loaned
        • New anon allocated
      • Page queue is locked
      • Page is write protected if needed
      • A bidirectional link is established between the anon and the page
      • Increase the loan count
    • Anon to anon
      • Lock anon
      • Increment reference count
      • Unlock anon
      • Return anon
  • Dropping and freeing loaned out pages
    • Pages loaned to an anon
      • Locking anon
      • Decrementing ref count
      • Unlocking anon
    • If ref cnt == 0
      • Anon is freed with uvm_anfree function
      • Uvm_anfree handles pages that have been loaned to anon
      • If page loan count > 0
        • First tries to lock the true owner of the page
      • Else anon takes over ownership
      • Page is owned by a uvm_object
        • Page queues locked
        • Loan count decremented
        • Page pointer nulled
      • Loaned to kernel
        • Uvm_pagefree is called and the anon is freed
        • Page is ownerless
  • Pagedaemon and loaned out pages
    • Pages out and frees pages when free physical memory is scarce
    • Special care taken when it encounters a loaned out page
      • Loaned to anon from uvm object
    • Steps
      • Try to lock the page’s uvm object
      • If lock fails, page daemon will skip to next page
      • If page dirty – it will start a pageout IO
      • If page clean – then it calls uvm_pagefree
    • Page daemon can also detect ownerless pages
  • Faulting on a uvm_object with loaned out pages
    • Fault routine must be aware of loaned pages
    • Must ensure
      • Pages are never mapped read – write
      • Write faults on loaned pages cause loans to be broken
    • If fault is a read fault on a loaned page
      • Ensure that mapped read only
    • If write fault
      • Loan must be broken
    • To break the loan
      • Object is locked
      • New page is allocated
      • Data from old page is copied to new page
      • All mappings of the old page are removed
      • Old page is dropped from the object – ownership is dropped
      • New page is installed in the object at old page’s offset
      • Newly allocated page replaces old page in the object
      • Old page will be ownerless and will be freed when the process loaned to finishes with it.
  • Faulting on an Anon with loaned out pages
    • Special handling by the fault routine
    • uvmfault_anonget on an anon that points to a loaned ownerless page
      • Anon will take ownership
    • With an owned page
      • Lock object
      • If read fault
        • Page entered into faulting process
        • Pmap read only
        • Even if reference count of anon is 1
      • Write fault
        • Break the loan
        • Anon with ref cnt > 1
          • Copy on write
          • New anon created
          • Data copied
        • Anon with ref cnt = 1
          • Allocate new page
          • Wake pagedaemon if no pages available
          • Copy data
          • Remove old page from user pmap
          • Locks page queues
          • Decrements loan count
          • Zeroes page pointer
          • New page into the anon and fault can be resolved
  • Using page loanout
    • To quickly transfer a large chunk of data between two processes. The data can be loaned to anons and the anons inserted into the target.
    • Is used to improve both network and device IO
    • Can be used to partly replace the kernel physio interface.
      • Physio – device to read and write directly from a process memory without making a copy
        • Encapsulating memory into a buffer
        • Starting an IO operation
        • Waiting for the IO operation
      • Two problems
        • Assumption that no other process is performing IO
        • Only supports synchronous IO
      • Modified
        • Write operation invokes physio
        • Physio – uvm_loan to loan out the process’s buffers to the kernel
        • Loaned pages start Async IO
        • Function returns
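
A rough sketch of the loan counter and of breaking a loan on a write fault, following the object-to-kernel steps above. Everything here is a toy: `Page`, `UObject`, `loan_to_kernel`, `unloan`, `write_fault` and `can_free` are hypothetical names for this example, and locking, wiring and pmap work are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Page:
    data: bytearray
    loan_count: int = 0
    owner: Optional[object] = None   # owning object, None when ownerless
    writable: bool = True


@dataclass(eq=False)
class UObject:
    pages: Dict[int, Page] = field(default_factory=dict)  # offset -> page


def loan_to_kernel(obj: UObject, offset: int) -> Page:
    page = obj.pages[offset]
    if page.loan_count == 0:
        page.writable = False        # write protect on the first loan
    page.loan_count += 1
    return page


def unloan(page: Page) -> None:
    page.loan_count -= 1


def can_free(page: Page) -> bool:
    """An ownerless page goes to the free list once its loan count is 0."""
    return page.owner is None and page.loan_count == 0


def write_fault(obj: UObject, offset: int) -> Page:
    page = obj.pages[offset]
    if page.loan_count > 0:          # the loan must be broken
        new = Page(bytearray(page.data), owner=obj)  # copy old data to a new page
        page.owner = None            # old page is dropped and becomes ownerless
        obj.pages[offset] = new      # new page installed at the old offset
        page = new
    page.writable = True
    return page
```

The borrower keeps seeing the old, unmodified data until it calls `unloan`, at which point the ownerless page can finally be freed.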

Page Transfer

  • Allow kernel to inject pages of memory into a process’s va space
  • Types
    • Kernel Page transfer
    • Anonymous page transfer
  • Kernel page transfer
    • Used when a page-sized buffer from the kernel is passed to a process’s address space
    • Audio device driver
    • Networking subsystem
  • Anonymous Page transfer
    • Pages to be transferred are already a part of an anon
    • Can be used with page loanout as a part of an IPC mechanism
      • Improve performance of read syscalls if the buffer is page aligned and page sized
      • Buffers can be transferred using anons
    • Gives page-level granularity for mappings
  • Disposing of transferred data
    • Munmap syscall will not only unmap the transferred data but also the amap with it.
      • System needs to allocate a new amap for further transfers
    • Hence new anflush syscall
      • Used to restore an area of anonymous memory to zero-fill state

Map entry passing

  • Exchange data using pipes or shared memory
  • Pipes are implemented as a pair of network sockets
    • Sending process writes on pipe
    • Data copied into a kernel mbuf
    • Placed on queue for the receiving process
    • Data copied from the mbuf to the process
  • Shared memory
    • System V API
      • Shared memory segment
      • Other processes can attach to
    • Mmap – MAP_SHARED
      • File’s memory can be seen by all processes
    • Share inheritance through the VM inheritance code
      • Fork
  • New method – map entry passing
    • More flexible
    • Range of va to be shared
      • Multiple memory mappings and unmapped areas of vm
  • Export and import of VM
    • Map entry passing can export a range. The receiving process can import it
    • Mexpimp_info
      • Describes block of vm to export
      • Base and len field specify range
      • Type
        • Share
        • Copy
        • Donate
        • Donate with 0
    • On successful export
      • Tag structure is created – mexpimp_tag
      • Used to look up exported region of memory and import it
    • System calls for import and export
      • Mexport – mimport
      • Return zero on success, -1 on error
    • Export
      • Exporting process fills out the tag structure
      • Kernel fills out the tag structure with the tag of the exported region
    • Import
      • Fills tag structure and calls import
      • Mexpimp_info not null
        • Export the region of memory
  • Implementation of map entry passing
    • Map extraction function extracts part of the virtual memory mapped by a map and places it in another map.
    • Allows a process to access another process’s workspace
    • Uvm_map_extract is called with a source map, source va and a couple of other arguments
      • Flags are
        • Remove – remove from source after transfer
        • Zero – make zero fill
        • contig – abort if there are unmapped regions
        • Qref – quick references
        • Fixprot – area set to max protection
    • Uvm_map_extract :
      • Sanity check of uvm_map_extract’s arguments is performed.
        • Address must be on a page boundary
        • length multiple of the page size
      • Virtual space is reserved in the destination map
      • Source map searched for entry that contains the starting va
      • Each source map entry in range is copied into a list of map entries that will be inserted
      • New map entries are inserted in the destination
  • Usage of map entry passing
    • Move large chunks without data copying overhead.
    • More flexible than shared memory. Does not need to be a contiguous chunk of anonymous memory.
    • More effective than memory shared with mmap since it doesn’t require interaction with the file system layer.
    • Allows a process to grant another process access to only part of a file
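
To make the uvm_map_extract steps above a bit more concrete, here is a minimal toy model in Python. It is only a sketch of the idea (sanity checks, walking the source entries, clipping them to the range and rebasing them into the destination), not the kernel code. `MapEntry`, `map_extract`, the 4 KB page size and the `contig` keyword are names and values I made up for illustration; the real function works on kernel map structures and takes more arguments and flags.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class MapEntry:
    start: int  # first virtual address covered
    end: int    # one past the last address covered
    obj: str    # stand-in for the backing memory object


def map_extract(src, start, length, dst_base, contig=False):
    """Copy the entries of `src` covering [start, start+length) to dst_base."""
    # Sanity checks: page-aligned address, length a multiple of the page size.
    if start % PAGE_SIZE or length % PAGE_SIZE or length <= 0:
        raise ValueError("range must be page aligned")
    end = start + length
    copied = []
    expected = start  # next address we expect to find mapped
    for entry in sorted(src, key=lambda e: e.start):
        if entry.end <= start or entry.start >= end:
            continue  # entry lies outside the requested range
        lo, hi = max(entry.start, start), min(entry.end, end)
        if contig and lo != expected:
            raise ValueError("unmapped hole in source range")
        # Clip the entry to the range and rebase it into the destination.
        copied.append(MapEntry(lo - start + dst_base,
                               hi - start + dst_base, entry.obj))
        expected = hi
    if contig and expected != end:
        raise ValueError("unmapped hole in source range")
    return copied
```

With `contig=False` the holes in the source range simply stay unmapped in the destination, which is what lets map entry passing cover "multiple memory mappings and unmapped areas" in one go.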

Chapter 8 : Secondary design elements

  • Amap chunking
    • BSD kernel
      • Size of stack determined by the process’ resource limits
      • Kernel reserves space in each process’ map for the largest possible stack. This requires 2 entries.
        • Current stack – zero fill read write
        • Space from end of stack to the hard limit zero-fill no-access
      • Size of vm_object is constant.
      • Amount of kernel memory associated with the amap increases with the size of the amap.
    • UVM
      • Breaks very large amaps into smaller ones when possible
      • Current chunk size – 64 KB
        • Requires ~200 bytes of kernel memory
        • Amap_copy has parameter which decides whether chunking is allowed
      • Benefits
        • Kernel memory saved during process creation
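
A rough sketch of what chunking buys, assuming 4 KB pages and the 64 KB chunk size from the notes. The function name `amap_chunk` and the choice to align the chunk around the faulting address are my own assumptions for illustration; the point is only that the amap ends up covering one small chunk instead of the whole (possibly huge) stack entry.

```python
PAGE_SIZE = 4096          # assumed page size
CHUNK_SIZE = 64 * 1024    # chunk size from the notes


def amap_chunk(entry_start, entry_end, fault_addr):
    """Return the (start, end) range an amap would cover with chunking."""
    if not entry_start <= fault_addr < entry_end:
        raise ValueError("fault address outside the map entry")
    # Align down to a chunk boundary, then clip to the entry itself.
    aligned = fault_addr - fault_addr % CHUNK_SIZE
    start = max(entry_start, aligned)
    end = min(entry_end, aligned + CHUNK_SIZE)
    return start, end


def amap_slots(start, end):
    """Number of per-page slots an amap covering [start, end) needs."""
    return (end - start) // PAGE_SIZE
```

For a 128 MB stack entry, an amap for the whole entry needs 32768 slots, while the chunk around a single fault needs only 16.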
  • Clustered Anonymous memory pageout
    • BSD
      • When low memory – pagedaemon is invoked.
      • Pages with file data – paged out to backing store
      • Anonymous Pages – to the swap area
      • Types of pages
        • Clean pages – never modified
        • Dirty pages – modified. Different from backing store
      • Pageout dirty page
        • IO invoked
        • Page written to backing store
        • Page daemon frees the page
      • During IO
        • More efficient to transfer multiple contiguous pages at the same time
        • This is a clustered pageout
      • All pageable anonymous memory is a part of a vm_object
        • Pager is swap pager
        • Takes the object and divides it into swap blocks
          • Fixed size
          • One or more pages in length
          • Has contiguous location in the swap area
          • Assignment is static – always paged out to that location
    • UVM
      • No permanent home on backing store.
      • Aggressively clusters anonymous memory
        • Page daemon can reassign page out location
        • Hence can collect many dirty pages for a pageout
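
The difference between static and dynamic swap assignment can be shown with a small model. This is just an illustration under my own assumptions: page numbers and swap slots are plain integers, and one I/O operation can write one contiguous run of slots. The function names are hypothetical.

```python
def count_io_runs(slots):
    """Number of contiguous runs of swap slots (one I/O per run)."""
    runs = 0
    prev = None
    for slot in sorted(slots):
        if prev is None or slot != prev + 1:
            runs += 1
        prev = slot
    return runs


def static_assignment(dirty_pages, swap_base):
    """BSD-style: page i of the object always lives at swap_base + i."""
    return [swap_base + page for page in dirty_pages]


def dynamic_assignment(dirty_pages, next_free_slot):
    """UVM-style: hand out fresh, adjacent slots at pageout time."""
    return [next_free_slot + i for i in range(len(dirty_pages))]
```

With dirty pages 0, 3, 7 and 8, the static layout needs three separate writes, while reassigning the pageout locations lets all four pages go out in a single cluster.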
  • Anonymous Memory object pager
    • Anonymous memory is available in either of the two levels
      • Object layer – uvm_object backed by anonymous memory
      • Managed by the aobj pager hence called aobj_object.
    • Uvm_aobj structure
      • State information for an aobj_object
      • Keeps track of
        • Size of the object
        • Page out location in the backing store
      • Contains
        • Size of the object
        • Flags
        • Offset of the location on swap
        • Linked list of all aobj-objects (active)
      • Creation
        • Pager examines the size of the object
          • Small – allocated an array with an entry for each page
          • Entry – location on swap
          • Large – allocates a hash table
    • Aobj Pager Functions
      • Attach and IO functions
      • Uao_create
        • Creates a new aobj-based uvm_object
        • Aobj-object remains allocated until reference count is zero
      • UVM pageouts are handled by pagedaemon
        • Aobj put function not used
        • Get function – the front end that calls out to functions that handle IO to swap area.
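
Here is a small sketch of the "array for small objects, hash table for large ones" bookkeeping described under the uvm_aobj structure. The class name, the threshold value and the convention that slot 0 means "no swap assigned" are all assumptions of mine for the example, not values taken from the dissertation.

```python
class AobjSwapSlots:
    """Tracks the swap location of each page of an anonymous object."""

    SMALL_LIMIT = 1024  # hypothetical threshold, in pages

    def __init__(self, npages):
        self.npages = npages
        self.use_array = npages <= self.SMALL_LIMIT
        # Small object: one array entry per page.
        # Large object: a hash table holding only the pages that have a slot.
        self.slots = [0] * npages if self.use_array else {}

    def _check(self, page):
        if not 0 <= page < self.npages:
            raise IndexError("page outside the object")

    def set_slot(self, page, slot):
        self._check(page)
        if self.use_array or slot != 0:
            self.slots[page] = slot
        else:
            self.slots.pop(page, None)  # slot 0 clears the entry

    def get_slot(self, page):
        self._check(page)
        if self.use_array:
            return self.slots[page]
        return self.slots.get(page, 0)
```

The trade-off is the usual one: the array is cheap and direct for small objects, while the hash table avoids reserving an entry for every page of a large, sparsely paged-out object.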
  • Device Pager
    • Allows device memory to be mapped by user processes
    • Attach function
      • Takes a device specifier and returns the uvm_object for that device.
      • If a uvm_object is already present for the device its reference is increased.
    • Pager maintains list of all the active device objects
    • The device pager does no IO
      • Fault function –
        • Process faults on device memory
        • Device pager fault routine gets control
        • Consults driver’s d_mmap to get correct page to map in
        • Calls pmap_enter to resolve the fault
  • Vnode Pager
    • BSD
      • Vm_pager_allocate called with vnode as an argument. Calls BSD VM vnode pager.
      • Checks if the vnode has a vm_pager associated
        • Yes,
          • Allocate looks up the vm_object of the pager
          • Gains a reference
          • Returns pointer to pager
        • No,
          • Mallocs new vm_pager, vn_pager and vm_object
          • Ties them together
          • Returns pointer to the freshly allocated pager
      • Pager used by vm_mmap to lookup vnode’s vm_object (again)
        • Shared mappings
          • Vm_mmap calls the vm_allocate_with_pager
          • Enters object in map structure
        • Cow mappings
          • Object entered in the map structure with correct object chain configuration
      • Vm_object maintains an active reference to the vnode.
      • Object cache
        • Allows vm_object structures to remain active for a time after last reference is dropped
        • Such objects are called persisting
        • Useful for maps for files like /bin/ls
        • There is an upper limit for the number of objects
      • Vnode layer also has a caching mechanism
      • BSD VM has 2 layers of caching code and they interfere with each other.
    • UVM
      • Object cache removed
      • Supports persisting objects using the vnode cache.
      • All vm related data structures are embedded within the vnode structure
      • Vnode attach function is called with vnode as an argument. Returns a pointer to the uvm_object
        • If not in use
          • Initialize the object
          • Gain a reference
          • Return pointer
        • If already active
          • Check the reference count
          • If zero – persisting
            • Reference count incremented
            • Reference gained
          • Else
            • Increment ref count
      • Uvm_map is called to map the object with req. attributes
      • When the final reference is dropped the vnode is moved to the cache.
      • Uvm is notified when a vnode is being removed from the vnode cache
      • If it is removed from the cache then it frees all the pages and marks it invalid
    • Vnode helper functions
      • Several helper functions that are called by different parts of the kernel.
      • Setsize function –
        • Change the size of a file
      • Umount helper – Not present in UVM
        • Filesystem being unmounted
        • All vnodes associated with the filesystem must be removed
        • Problem when there is an unreferenced vnode with an object persisting in the cache.
        • Not needed in uvm – since there is no object cache
      • Uncache function –
        • Ensure vnode’s data will not persist when last reference is dropped
      • Terminate – present only in UVM
        • Used to clean a vnode’s VM data during recycling.
      • Sync helper –
        • Allow modified data to be periodically flushed out to backing store.
        • Builds a list of active vnode objects
        • Flushes all pages in them to backing store
        • UVM vnode pager also has this function
          • Uvm_attach also requires access level (rw)
          • Hence it maintains list of writable objects
          • This shortens the list of objects the sync helper has to go through
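
The UVM vnode attach logic from the notes above (not in use / already active / persisting with zero references) can be modelled in a few lines. This is a toy sketch: the `Vnode` class, its fields and the returned strings are hypothetical, and the real code also deals with locking and the vnode cache.

```python
class Vnode:
    """Toy vnode with its VM object state embedded, as in UVM."""

    def __init__(self, name):
        self.name = name
        self.obj_valid = False  # has the embedded object been initialized?
        self.refs = 0           # references held on the embedded object


def vnode_attach(vn):
    """Return how the embedded object was obtained."""
    if not vn.obj_valid:
        vn.obj_valid = True     # not in use: initialize and take a reference
        vn.refs = 1
        return "initialized"
    if vn.refs == 0:
        vn.refs = 1             # persisting: reuse the cached object
        return "revived"
    vn.refs += 1                # already active: just add a reference
    return "shared"


def vnode_detach(vn):
    """Drop a reference; the object persists while the vnode is cached."""
    if vn.refs == 0:
        raise ValueError("no reference to drop")
    vn.refs -= 1


def vnode_terminate(vn):
    """Vnode leaves the vnode cache: its VM data is invalidated."""
    vn.obj_valid = False
    vn.refs = 0
```

Because the object state lives inside the vnode, a file like /bin/ls that is mapped again shortly after its last unmap takes the "revived" path instead of being set up from scratch.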
  • Memory Mapping functions
    • kernel system calls
      • exec: maps a newly executed program’s text, data, and bss areas into the process’ address space.
      • mmap: maps a file or anonymous memory into a process’ address space.
      • obreak: grows a process’ heap by mapping zero-fill memory at the end of the heap.
      • shmat: maps a System V shared memory segment into a process’ address space.
    • Call path diagram
    • BSD has 8 functions whereas uvm has 5.
    • Uvm_map is used by all system calls
      • Makes code cleaner
      • Maps needed bytes of passed object into the given map.
      • The actual address used for the mapping is returned
      • Flag can specify protection, inheritance, advice etc
    • BSD can only make a mapping with default protection and inheritance
      • It then uses the map_protect function to change the protection.
      • This is dangerous and wasteful
  • Unmapping memory
    • BSD
      • vm_map_delete
        • Locks map
        • Searches for map entries to remove
        • Unlocks the map
      • If there are dirty pages then an async IO is started and the unmap function may exit
      • If the pager doesn’t have async IO then there is a problem
    • UVM
      • Uvm_unmap and uvm_unmap_remove
      • Uvm_unmap locks the map
      • Calls uvm_unmap_remove
        • Cleans out the map’s pmap
        • Removes the specified map entries
        • Doesn’t drop references to mapped objects
        • Returns list of map entries removed
      • Uvm_unmap unlocks the map
      • Calls uvm_unmap_detach
        • Drops references
        • Frees map entries
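
The two-phase unmap in UVM is easier to see in code. Below is a small model, assuming a map is a list of `(start, end, obj)` tuples and that entries lie wholly inside or outside the range (the real code also clips entries). All class and function names are stand-ins I picked for the sketch; the thing to notice is that references are dropped only after the map lock has been released.

```python
import threading


class Obj:
    """Stand-in for a mapped memory object with a reference count."""

    def __init__(self):
        self.refs = 1
        self.map_locked_at_release = None


class Map:
    def __init__(self, entries):
        self.lock = threading.Lock()
        self.entries = list(entries)  # (start, end, obj) tuples


def unmap_remove(m, start, end):
    """Phase 1 (map locked): unlink entries but keep their references."""
    dead = [e for e in m.entries if start <= e[0] and e[1] <= end]
    m.entries = [e for e in m.entries if e not in dead]
    return dead


def unmap_detach(m, dead):
    """Phase 2 (map unlocked): drop references, which may involve slow I/O."""
    for _, _, obj in dead:
        obj.map_locked_at_release = m.lock.locked()
        obj.refs -= 1


def unmap(m, start, end):
    with m.lock:
        dead = unmap_remove(m, start, end)
    unmap_detach(m, dead)  # the map lock is no longer held here
    return len(dead)
```

Dropping the last reference can end up waiting on a pager, so doing it outside the lock means other users of the map are not blocked during that time.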
  • Kernel Memory management
    • Contains kernel’s text, data and bss
    • Special set of functions used to establish and remove private kernel mappings
    • Uvm_km_alloc – allocates wired memory in kernel va
      • Memory is uninitialized
    • Uvm_km_zalloc – same as above
      • Memory is initialized to 0
    • uvm_km_valloc – allocates zero fill memory in kernel va
    • Uvm_km_valloc_wait – waits if map out of vm
    • uvm_km_kmemalloc – low level memory allocator
      • Allocates wired memory
    • Uvm_km_free – front end for the unmap function
    • Uvm_km_free_wakeup – after freeing it wakes up any processes waiting for space
    • Kernel memory object-
      • Kernel-object – full address space of the kernel
        • Used while allocating to find free virtual addresses
      • Advantages
        • All private pages are collected into a few objects
        • Reduces map entry fragmentation
    • Difference
      • Kernel map is accessed to establish a private kernel mapping
        • Offset within the kernel object depends on the address chosen
        • Steps
          • Lock the map
          • Establish the mapping
            • Find space in the kernel’s map
            • Insert a mapping from the va allocated
          • Unlocks the map
        • Uvm passes uvm_unknown_offset to the uvm_map
      • Allocation of pageable kernel vm
        • mapped zero-fill with no physical memory assigned until a page fault
        • BSD
          • Allocate mapping in a null object cow
        • UVM
          • Uses the unknown offset method
          • Allows it to use the kernel object
      • Operation of the low level kernel memory allocator
        • BSD
          • 2 step mapping to determine va
          • Pages of memory are allocated in the object
          • Pages in the object are looked up and entered in the pmap
          • Wasteful
            • Two loops
            • Doesn’t save the pointers
        • UVM
          • Establishes mapping with the uvm_map call
          • Enters a loop
            • Allocates page
            • Maps it in
            • Continues
  • Wired memory
    • Memory that is not paged out to the backing store
    • Reasons
      • Process can wire using mlock syscalls. For time critical operations
      • Used to store kernel’s code segments – even page fault handlers
      • Process’s proc and user structure
      • Sysctl copies data to a user buffer. This must be wired
      • Resident memory for physio is wired
    • Each vm_page has a wire count
      • If it is > 0 then the page is wired
      • Removed from page queues
    • Map entry has a wire count
      • If only a part is wired then it needs to be put in a new map entry for that area
    • Causes map entry fragmentation
    • UVM
      • Preventing map entry fragmentation
        • Coalesce adjoining map entries
        • Avoid fragmentation – UVM method
      • Advantage
        • No need to coalesce each time
        • No need for extra code to coalesce
      • UVM stores the data somewhere else
      • Approach
        • Private kmem –
          • Always wired
          • Only in kernel map
          • Wired state only stored in vm_page structures
        • Pageable kmem –
          • Used for processes’ user structure
          • Only flag and wire counters in the vm_map structure are adjusted properly
        • Sysctl and physio –
          • Memory is wired
          • Operation performed
          • Memory unwired
          • Wired state of memory is only stored in the processes’ kernel stack
      • Only time a map entry’s wire counter is used during sysctl and physio syscalls
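
The per-page wire count from the notes above behaves like a simple counter that also controls whether the page sits on a paging queue. A minimal sketch, where `Page`, `wire`, `unwire` and the use of a Python set as the page queue are assumptions made for the example:

```python
class Page:
    """Toy vm_page that carries only a wire count."""

    def __init__(self):
        self.wire_count = 0


def wire(page, queue):
    """Wire a page; the first wiring takes it off the page queue."""
    if page.wire_count == 0:
        queue.discard(page)
    page.wire_count += 1


def unwire(page, queue):
    """Unwire a page; the last unwiring puts it back on the page queue."""
    if page.wire_count == 0:
        raise ValueError("page is not wired")
    page.wire_count -= 1
    if page.wire_count == 0:
        queue.add(page)
```

A page wired twice (say by mlock and by an in-flight physio) stays off the queue until both users have unwired it, so the pagedaemon never sees it as a pageout candidate in between.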
  • Page Flags
    • Each page has a set of boolean flags
    • UVM
      • Two integers
        • Flags
        • Pqflags
      • Flags is locked by the object owning the page
      • Pqflags locked by the page queue
  • VM system startup
    • Physical memory config
      • Determine how physical memory is configured
        • Contiguous or not contiguous
      • Contiguous
        • Must define start and end of physical and kernel virtual memory
      • Non contiguous
        • pmap_virtual_space: returns values that indicate the range of kernel virtual space that is available for use by the VM system.
        • pmap_free_pages: returns the number of pages that are currently free.
        • pmap_next_page: returns the physical address of the next free page of available memory.
        • pmap_page_index: given a physical address, return the index of the page in the array of vm_page structures.
        • pmap_steal_memory: allocates memory. This allocator can be used only before the VM system is started.
    • Physical memory configuration interface
      • Physical memory managed using statically allocated array of memory segment descriptors
        • Contiguous – array has 1 element
        • Non contiguous – hardware dependent
      • Each entry
        • Start and end of physical memory
        • Pointers to array of vm_page for that memory
      • Noncontiguous
        • Search algorithms
          • Random
          • Bsearch
          • Bigfirst
    • UVM startup procedure
      • Uvm_init is used to bring up the rest of the VM system
      • This is called by the kernel’s main function.
    • UVM boot steps
      • Global data structures are initialized
      • Page sub system is brought up
        • Page queues
        • Allocating vm_page structures
      • Private pool of statically allocated maps
      • Kernel’s vm data structures
        • Map and submap
        • Kernel object
      • Machine dependent pmap_init called
      • Kernel memory allocator malloc is initialized
      • Pagers are initialized
      • An initial pool of anons is allocated
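
For the array of memory segment descriptors described under the physical memory configuration interface, here is a sketch of the binary search variant of the lookup. The segment layout is made up, the page size is assumed to be 4 KB, and `find_page` is a hypothetical name; it maps a physical address to a (segment index, page index within that segment) pair.

```python
import bisect

PAGE_SIZE = 4096  # assumed page size

# Made-up (start, end) physical address ranges, sorted by start address.
SEGMENTS = [
    (0x0000_0000, 0x0009_F000),
    (0x0010_0000, 0x0800_0000),
    (0x1000_0000, 0x1800_0000),
]


def find_page(segments, pa):
    """Binary search for the segment holding physical address `pa`."""
    starts = [start for start, _ in segments]
    i = bisect.bisect_right(starts, pa) - 1  # last segment starting <= pa
    if i < 0 or pa >= segments[i][1]:
        return None  # address falls in a hole between segments
    return i, (pa - segments[i][0]) // PAGE_SIZE
```

On a machine with contiguous memory the array has a single element and the search is trivial; the choice of search strategy only matters when there are many segments.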
  • New pmap interface
    • New API to support page loanout to kernel
    • Page functions
      • Pmap_page_protect – used to write protect all active mappings of a page
      • Earlier, the physical address was passed – from the vm_page structure
      • Current API takes the pointer to a vm_page
    • Kernel pmap enter functions
      • Mapping a page faster by not entering page’s mapping on the list of mappings for that page

 

There are two more chapters left

  • Implementation methods – It’s mostly how they used to debug UVM. I will take a look at it and add a separate article after trying it out.
  • Results – mostly Statistics on the performance. Worth checking out.

That’s the end of the Dissertation. Cheers!

2 thoughts on “Notes on the Design and Implementation of UVM

    • R3x says:

      The dissertation is pretty well written (though it is a bit old, there are not many major changes) and the code is really well documented. You might want to go through them both.
