CSC373/406: Lecture 11 (Final Exam)

Contents [0/13]

1 [1/13]
2 [2/13]
3 [3/13]
4 [4/13]
5 [5/13]
6 [6/13]
7 [7/13]
8 [8/13]
9 [9/13]
10 [10/13]
11 [11/13]
12 [12/13]
13 [13/13]

1 [1/13]

Figure 1.3: The compilation system.

intro-compilation

Figure 1.4: Hardware organization of a typical system.

intro-hardware

Figure 1.5: Reading the hello command from the keyboard.

intro-keyboardread

Figure 1.6: Loading the executable from disk into main memory.

intro-helloload

Figure 1.7: Writing the output string from memory to the display.

intro-displaywrite

Figure 1.8: Cache memories.

intro-cachebus

Figure 1.9: An example of a memory hierarchy.

intro-memhier

Figure 1.10: Layered view of a computer system.

intro-layers

Figure 1.11: Abstractions provided by an operating system.

intro-abstractions

Figure 1.12: Process context switching.

intro-switch

Figure 1.13: Process virtual address space.

intro-rtimage

Figure 1.14: A network is another I/O device.

intro-nethost

Figure 1.15: Using telnet to run hello remotely over a network.

intro-telnet

2 [2/13]

Figure 2.11: Conversion from two's complement to unsigned.

data-t2u

Figure 2.12: Conversion from unsigned to two's complement.

data-u2t

Figure 2.14: Integer addition.

data-add4

Figure 2.15: Relation between integer addition and unsigned addition.

data-uadd-ovf

Figure 2.16: Unsigned addition.

data-uadd4

Figure 2.17: Relation between integer and two's-complement addition.

data-tadd-ovf

Figure 2.19: two's-complement addition.

data-tadd4

Figure 2.22: Representable values for six-bit floating-point format.

data-fp-values

Figure 2.22: Representable values for six-bit floating-point format.

data-fp-values-small

3 [3/13]

Figure 3.5: Illustration of stack operation.

asm-stack

Figure 3.17: Stack frame structure.

asm-frame

Figure 3.19: Stack frames for caller and swap_add.

asm-swap

Figure 3.22: Stack frame for recursive Fibonacci function.

asm-fib-frame

Figure 3.28: Stack organization for echo function.

asm-buf-frame

4 [4/13]

Figure 4.1: Y86 programmer-visible state.

arch-state

Figure 4.2: Y86 instruction set.

arch-isa

Figure 4.3: Function codes for Y86 instruction set.

arch-icodes

Figure 4.8: Logic gate types.

arch-logic-gates

Figure 4.9: Combinational circuit to test for bit equality.

arch-logic-bit-equal

Figure 4.10: Single-bit multiplexor circuit.

arch-logic-bit-mux

Figure 4.11: Word-level equality test circuit.

arch-logic-word-equal

Figure 4.12: Word-level multiplexor circuit.

arch-logic-word-mux

Figure 4.13: Arithmetic/logic unit (ALU).

arch-logic-alu

Figure 4.14: Register operation.

arch-eg-pipe-reg

Figure 4.20: Abstract view of SEQ, a sequential implementation.

arch-seq-abs

Figure 4.21: Hardware structure of SEQ, a sequential implementation.

arch-seq-full

Figure 4.23: Tracing two cycles of execution by SEQ.

arch-seq-flow

Figure 4.25: SEQ fetch stage.

arch-seq-fetch

Figure 4.26: SEQ decode and write-back stage.

arch-seq-decode

Figure 4.27: SEQ execute stage.

arch-seq-execute

Figure 4.28: SEQ memory stage.

arch-seq-memory

Figure 4.29: SEQ PC update stage.

arch-seq-pc

Figure 4.30: SEQ+ abstract view.

arch-seq+-abs

Figure 4.31: SEQ+ hardware structure.

arch-seq+-full

Figure 4.32: Unpipelined computation hardware.

arch-eg-pipe0

Figure 4.33: Three-stage pipelined computation hardware.

arch-eg-pipe3

Figure 4.34: Three-stage pipeline timing.

arch-eg-pipe3-key

Figure 4.35: One clock cycle of pipeline operation.

arch-eg-pipe3-cyc

Figure 4.36: Limitations of pipelining due to nonuniform stage delays.

arch-eg-pipe3nu

Figure 4.37: Limitations of pipelining due to overhead.

arch-eg-pipe6

Figure 4.38: Limitations of pipelining due to logical dependencies.

arch-eg-pipefb

Figure 4.39: Abstract view of PIPE-, an initial pipelined implementation.

arch-pipe--abs

Figure 4.40: Example of instruction flow through pipeline.

arch-basic

Figure 4.41: Hardware structure of PIPE-, an initial pipelined implementation.

arch-pipe--full

Figure 4.42: Pipelined execution of prog1 without special pipeline control.

arch-prog1-uncontrol

Figure 4.43: Pipelined execution of prog2 without special pipeline control.

arch-prog2-uncontrol

Figure 4.44: Pipelined execution of prog3 without special pipeline control.

arch-prog3-uncontrol

Figure 4.45: Pipelined execution of prog4 without special pipeline control.

arch-prog4-uncontrol

Figure 4.46: Pipelined execution of prog2 using stalls.

arch-prog2-stall

Figure 4.47: Pipelined execution of prog3 using stalls.

arch-prog3-stall

Figure 4.48: Pipelined execution of prog4 using stalls.

arch-prog4-stall

Figure 4.49: Pipelined execution of prog2 using forwarding.

arch-prog2-forward

Figure 4.50: Pipelined execution of prog3 using forwarding.

arch-prog3-forward

Figure 4.51: Pipelined execution of prog4 using forwarding.

arch-prog4-forward

Figure 4.52: Abstract view of PIPE, our final pipelined implementation.

arch-pipe-abs

Figure 4.53: Hardware structure of PIPE, our final pipelined implementation.

arch-pipe-full

Figure 4.54: Example of load/use data hazard.

arch-prog5-hazard

Figure 4.55: Handling a load/use hazard by stalling.

arch-prog5-stall

Figure 4.56: PIPE PC selection and fetch logic.

arch-pipe-fetch

Figure 4.57: PIPE decode and write-back stage logic.

arch-pipe-decode

Figure 4.58: Demonstration of forwarding priority.

arch-prog6-forward

Figure 4.59: PIPE execute stage logic.

arch-pipe-execute

Figure 4.60: PIPE memory stage logic.

arch-pipe-memory

Figure 4.61: Simplified view of ret instruction processing.

arch-prog7-simple

Figure 4.62: Actual processing of the ret instruction.

arch-prog7

Figure 4.63: Processing mispredicted branch instructions.

arch-prog8

Figure 4.65: Additional pipeline register operations.

arch-eg-pipe-reg-full

Figure 4.67: Pipeline states for special control conditions.

arch-control-combination

Figure 4.68: PIPE pipeline control logic.

arch-pipe-control

Figure 4.69: Execute and memory stages capable of load forwarding.

arch-pipe-lf

5 [5/13]

Figure 5.2: Performance of vector sum functions.

opt-cpe-example

Figure 5.3: Vector abstract data type.

opt-vec-adt

Figure 5.8: Comparative performance of lower-case conversion routines.

opt-lower-linear

Figure 5.11: Block diagram of a modern processor.

opt-processor

Figure 5.13: Operations for first iteration of inner loop of combine4 for integer multiplication.

opt-ipc-stage0

Figure 5.14: Operations for first iteration of inner loop of combine4 for integer addition.

opt-isc-stage0

Figure 5.15: Scheduling of operations for integer multiplication with unlimited number of execution units.

opt-ipc-i

Figure 5.16: Scheduling of operations for integer addition with unbounded resource constraints.

opt-isc-i

Figure 5.17: Scheduling of operations for integer multiplication with actual resource constraints.

opt-ipc-r

Figure 5.18: Scheduling of operations for integer addition with actual resource constraints.

opt-isc-r

Figure 5.20: Operations for first iteration of inner loop of three-way unrolled integer addition.

opt-isc3-stage0

Figure 5.21: Scheduling of operations for three-way unrolled integer sum with bounded resource constraints.

opt-isc3-r

Figure 5.25: Operations for first iteration of inner loop of two-way unrolled, two-way parallel integer multiplication.

opt-ipc2x2-stage0

Figure 5.26: Scheduling of operations for two-way unrolled, two-way parallel integer multiplication with unlimited resources.

opt-ipc2x2-i

Figure 5.31: Scheduling of operations for list length function.

opt-ll-i

Figure 5.33: Code to write and read memory locations, along with illustrative executions.

opt-wr-sim

Figure 5.34: Detail of load and store units.

opt-load-store

Figure 5.35: Timing of write_read for example A.

opt-wr-i

Figure 5.36: Timing of write_read for example B.

opt-wrs-i

Figure 5.37: Profile results for different version of word frequency counting program.

opt-prof-all

Figure 5.37: Profile results for different version of word frequency counting program.

opt-prof-fast

6 [6/13]

Figure 6.1: Inverted pendulum.

mem-pendulum

Figure 6.3: High level view of a 128-bit 16 x 8 DRAM chip.

mem-dramarray

Figure 6.4: Reading the contents of a DRAM supercell.

mem-dramras

Figure 6.4: Reading the contents of a DRAM supercell.

mem-dramcas

Figure 6.5: Reading the contents of a memory module.

mem-drammodule

Figure 6.6: Typical bus structure that connects the CPU and main memory.

mem-membus

Figure 6.7: Memory read transaction for a load operation: movl A,%eax.

mem-memread1

Figure 6.7: Memory read transaction for a load operation: movl A,%eax.

mem-memread2

Figure 6.7: Memory read transaction for a load operation: movl A,%eax.

mem-memread3

Figure 6.8: Memory write transaction for a store operation: movl %eax,A.

mem-memwrite1

Figure 6.8: Memory write transaction for a store operation: movl %eax,A.

mem-memwrite2

Figure 6.8: Memory write transaction for a store operation: movl %eax,A.

mem-memwrite3

Figure 6.9: Disk geometry.

mem-surface

Figure 6.9: Disk geometry.

mem-cylinder

Figure 6.10: Disk dynamics.

mem-arm

Figure 6.10: Disk dynamics.

mem-cylinderarms

Figure 6.11: Typical bus structure that connects the CPU, main memory, and I/O devices.

mem-iobus

Figure 6.12: Reading a disk sector.

mem-diskread1

Figure 6.12: Reading a disk sector.

mem-diskread2

Figure 6.12: Reading a disk sector.

mem-diskread3

Figure 6.16: The increasing gap between DRAM, disk, and CPU speeds.

mem-cpumemgap

Figure 6.21: The memory hierarchy.

mem-memhier

Figure 6.22: The basic principle of caching in a memory hierarchy.

mem-cacheconcept

Figure 6.24: Typical bus structure for L1 and L2 caches.

mem-cachebus

Figure 6.25: General organization of cache (S, E, B, m).

mem-cacheorg

Figure 6.25: General organization of cache (S, E, B, m).

mem-physaddr

Figure 6.27: Direct-mapped cache (E = 1).

mem-dmcacheorg

Figure 6.28: Set selection in a direct-mapped cache.

mem-dmcacheindex

Figure 6.29: Line matching and word selection in a direct-mapped cache.

mem-dmcachematch

Figure 6.31: Why caches index with the middle bits.

mem-middlebits

Figure 6.32: Set associative cache (1 < E < C/B).

mem-sacacheorg

Figure 6.33: Set selection in a set associative cache.

mem-sacacheindex

Figure 6.34: Line matching and word selection in a set associative cache.

mem-sacachematch

Figure 6.35: Fully set associative cache (E = C/B).

mem-facacheorg

Figure 6.36: Set selection in a fully associative cache.

mem-facacheindex

Figure 6.37: Line matching and word selection in a fully associative cache.

mem-facachematch

Figure 6.38: A typical multi-level cache organization.

mem-multilevel

Figure 6.42: The memory mountain.

mem-xeonmountain

Figure 6.43: Ridges of temporal locality in the memory mountain.

mem-xeoncaches

Figure 6.44: A slope of spatial locality.

mem-xeonlines

Figure 6.47: Pentium III Xeon matrix multiply performance.

mem-xeonmm

Figure 6.49: Graphical interpretation of blocked matrix multiply

mem-bmmidea

Figure 6.50: Pentium III Xeon blocked matrix multiply performance.

mem-xeonbmm

7 [7/13]

Figure 7.2: Static linking.

link-linker

Figure 7.3: Typical ELF relocatable object file.

link-elfrelo

Figure 7.7: Linking with static libraries.

link-staticlibs

Figure 7.11: Typical ELF executable object file

link-elfexec

Figure 7.13: Linux run-time memory image

link-rtimage

Figure 7.15: Dynamic linking with shared libraries.

link-sharedlibs

8 [8/13]

Figure 8.1: Anatomy of an exception.

ecf-exception

Figure 8.2: Exception table.

ecf-intvec

Figure 8.3: Generating the address of an exception handler.

ecf-intaddr

Figure 8.5: Interrupt handling.

ecf-interrupt

Figure 8.6: Trap handling.

ecf-trap

Figure 8.7: Fault handling.

ecf-fault

Figure 8.8: Abort handling.

ecf-abort

Figure 8.10: Logical control flows.

ecf-process1

Figure 8.11: Process address space.

ecf-addrspace

Figure 8.12: Anatomy of a process context switch.

ecf-switch

Figure 8.14: Examples of fork programs.

ecf-fork1

Figure 8.14: Examples of fork programs.

ecf-fork2

Figure 8.14: Examples of fork programs.

ecf-fork3

Figure 8.17: Organization of an argument list.

ecf-argv

Figure 8.18: Organization of an environment variable list.

ecf-envp

Figure 8.19: Typical organization of the user stack when a new program starts.

ecf-stackinit

Figure 8.24: Foreground and background process groups.

ecf-procgroups

9 [9/13]

Figure 9.1: Time scale of computer system events.

perf-timescale

Figure 9.2: System's vs.~applications view of time.

perf-app-vs-sys

Figure 9.4: Graphical representation of trace in Figure ....

perf-events-l1

Figure 9.6: Graphical representation of activity periods for trace in Figure ....

perf-events-l2

Figure 9.7: Process timing by interval counting.

perf-interval

Figure 9.8: Measuring interval counting accuracy.

perf-interval-experiment

Figure 9.11: Measurements of long duration procedure under different loading conditions.

perf-process

Figure 9.12: measurements of short duration procedure under different loading conditions.

perf-process-small

Figure 9.14: Experimental validation of K-best measurement scheme on Linux system

perf-kbest-linux

Figure 9.15: Effectiveness of K-best scheme for different values of K.

perf-kbest-k1

Figure 9.15: Effectiveness of K-best scheme for different values of K.

perf-kbest-k2

Figure 9.15: Effectiveness of K-best scheme for different values of K.

perf-kbest-k3

Figure 9.15: Effectiveness of K-best scheme for different values of K.

perf-kbest-k5

Figure 9.16: Measurements with compensation for timer interrupt overhead.

perf-kbest-linux-comp

Figure 9.17: Experimental validation of K-best measurement scheme on IA32/Linux system with older version of the kernel.

perf-kbest-linux-old

Figure 9.18: Experimental validation of K-best measurement scheme on Windows-NT system.

perf-kbest-nt

Figure 9.19: Experimental validation of K-best measurement scheme on Compaq Alpha system.

perf-kbest-alpha

Figure 9.22: Experimental validation of K-best measurement scheme using gettimeofday function.

perf-kbest-tod

10 [10/13]

Figure 10.1: A system that uses physical addressing.

vm-physicaloverview

Figure 10.2: A system that uses virtual addressing.

vm-virtualoverview

Figure 10.3: How a VM system uses main memory as a cache.

vm-vaddrspace

Figure 10.4: Page table.

vm-pt

Figure 10.5: VM page hit.

vm-pthit

Figure 10.6: VM page fault (before).

vm-ptmissbefore

Figure 10.7: VM page fault (after).

vm-ptmissafter

Figure 10.8: Allocating a new virtual page.

vm-ptalloc

Figure 10.9: How VM provides processes with separate address spaces.

vm-separatespaces

Figure 10.10: The memory image of a Linux process.

vm-rtimage

Figure 10.11: Using VM to provide page-level memory protection.

vm-vmprotect

Figure 10.13: Address translation with a page table.

vm-addrtrans

Figure 10.14: Operational view of page hits and page faults.

vm-vmhit

Figure 10.14: Operational view of page hits and page faults.

vm-vmmiss

Figure 10.15: Integrating VM with a physically addressed cache.

vm-vmcache

Figure 10.16: Components of a virtual address that are used to access the TLB.

vm-tlbaddr

Figure 10.17: Operational view of a TLB hit and miss.

vm-tlbhit

Figure 10.17: Operational view of a TLB hit and miss.

vm-tlbmiss

Figure 10.18: A two-level page table hierarchy.

vm-multilevel

Figure 10.19: Address translation with a k-level page table.

vm-multiaddr

Figure 10.20: Addressing for small memory system.

vm-ataddr

Figure 10.21: TLB, page table, and cache for small memory system.

vm-attlb

Figure 10.21: TLB, page table, and cache for small memory system.

vm-atpt

Figure 10.21: TLB, page table, and cache for small memory system.

vm-atcache

Figure 10.22: The Pentium memory system.

vm-p6overview

Figure 10.23: Summary of Pentium address translation.

vm-p6addrtrans

Figure 10.24: Pentium multi level page table.

vm-p6multilevel

Figure 10.25: Formats of Pentium page directory entry (PDE) and page table entry (PTE).

vm-p6pde

Figure 10.25: Formats of Pentium page directory entry (PDE) and page table entry (PTE).

vm-p6pte

Figure 10.26: Pentium page table translation.

vm-p6ptmap

Figure 10.27: Pentium TLB translation.

vm-p6tlbtrans

Figure 10.28: The virtual memory of a Linux process.

vm-linuxrtimage

Figure 10.29: How Linux organizes virtual memory.

vm-linuxvm

Figure 10.30: Linux page fault handling.

vm-linuxfault

Figure 10.31: A shared object.

vm-sharedobj1

Figure 10.31: A shared object.

vm-sharedobj2

Figure 10.32: A private copy-on-write object.

vm-privateobj1

Figure 10.32: A private copy-on-write object.

vm-privateobj2

Figure 10.33: How the loader maps the areas of the user address space.

vm-execmap

Figure 10.34: Visual interpretation of mmap arguments.

vm-mmapargs

Figure 10.35: The heap.

vm-heapmap

Figure 10.36: Allocating and freeing blocks with malloc.

vm-mallocexa

Figure 10.36: Allocating and freeing blocks with malloc.

vm-mallocexb

Figure 10.36: Allocating and freeing blocks with malloc.

vm-mallocexc

Figure 10.36: Allocating and freeing blocks with malloc.

vm-mallocexd

Figure 10.36: Allocating and freeing blocks with malloc.

vm-mallocexe

Figure 10.37: Format of a simple heap block.

vm-implicitblock

Figure 10.38: Organizing the heap with an implicit free list.

vm-implicitlist

Figure 10.39: Splitting a free block to satisfy a three-word allocation request.

vm-implicitsplit

Figure 10.40: An example of false fragmentation.

vm-implicitfrag

Figure 10.41: Format of heap block that uses a boundary tag.

vm-boundarytags

Figure 10.42: Coalescing with boundary tags.

vm-coalesce1

Figure 10.42: Coalescing with boundary tags.

vm-coalesce2

Figure 10.42: Coalescing with boundary tags.

vm-coalesce3

Figure 10.42: Coalescing with boundary tags.

vm-coalesce4

Figure 10.44: Invariant form of the implicit free list.

vm-mmimplicitlist

Figure 10.50: Format of heap blocks that use doubly-linked free lists.

vm-linkedblocka

Figure 10.50: Format of heap blocks that use doubly-linked free lists.

vm-linkedblockf

Figure 10.51: A garbage collector's view of memory as a directed graph.

vm-gcmem

Figure 10.52: Integrating a conservative garbage collector and a C malloc package.

vm-cgc

Figure 10.54: Mark and sweep example.

vm-marksweepex

Figure 10.55: Left and right pointers in a balanced tree of allocated blocks.

vm-balancedtree

11 [11/13]

Figure 11.11: Typical kernel data structures for open files.

io-fileorg

Figure 11.12: File sharing.

io-filesharing

Figure 11.13: How a child process inherits the parent's open files.

io-afterfork

Figure 11.14: Kernel data structures after redirecting standard output by calling dup2(4,1)

io-dupafter

Figure 11.15: Relationship between Unix I/O, standard I/O, and RIO.

io-iofunctions

12 [12/13]

Figure 12.1: A client-server transaction.

netp-cliservsteps

Figure 12.2: Hardware organization of a network host.

netp-nethost

Figure 12.3: Ethernet segment.

netp-hub

Figure 12.4: Bridged Ethernet segments.

netp-bridge

Figure 12.5: Conceptual view of a LAN.

netp-lan

Figure 12.6: A small internet.

netp-internet

Figure 12.7: How data travels from one host to another on an internet.

netp-intertrans

Figure 12.8: Hardware and software organization of an Internet application.

netp-ipinternet

Figure 12.10: Subset of the Internet domain name hierarchy.

netp-domainnames

Figure 12.13: Anatomy of an Internet connection

netp-connection

Figure 12.14: Overview of the sockets interface.

netp-sockoverview

Figure 12.18: The roles of the listening and connected descriptors.

netp-accept

13 [13/13]

Figure 13.1: Step 1: Server accepts connection request from client.

conc-conc1

Figure 13.2: Step 2: Server forks a child process to service the client.

conc-conc2

Figure 13.3: Step 3: Server accepts another connection request.

conc-conc3

Figure 13.4: Step 4: Server forks another child to service the new client.

conc-conc4

Figure 13.7: State machine for a logical flow in a concurrent event-driven echo server.

conc-state

Figure 13.12: Concurrent thread execution.

conc-concthreads

Figure 13.17: IA32 assembly code for the counter loop in badcnt.c.

conc-badcntasm

Figure 13.19: Progress graph for the first loop iteration of badcnt.c.

conc-pg

Figure 13.20: An example trajectory.

conc-trajectory

Figure 13.21: Critical sections and unsafe regions.

conc-unsaferegion

Figure 13.22: Safe and unsafe trajectories.

conc-safetraj

Figure 13.23: Safe sharing with semaphores.

conc-pgsem

Figure 13.24: Producer-consumer model.

conc-prodconsdef

Figure 13.29: Organization of a prethreaded concurrent server.

conc-prethreaded

Figure 13.34: Relationships between the sets of reentrant, thread-safe, and nonthread-safe functions.

conc-threadsafe

Figure 13.39: Progress graph for a program that can deadlock.

conc-deadlock

Figure 13.40: Progress graph for a deadlock-free program.

conc-deadlock2

Revised: 2007/04/05 17:41