Certified software

Proof-carrying code

Machine model

Verification condition

Putting it all together

Issues in PCC

Code comes from many sources: there are *code producers*
and *code consumers*. Examples?

Code consumers may have a *safety policy* for their
machine. What is this? Examples?

How can a consumer ensure the safety policy is maintained?

What is a signed binary? Is a signed binary guaranteed to maintain the safety policy?

A complementary technology to signed binaries: bytecode verification. What is this? What guarantees does bytecode verification provide?

Problem: bytecode verification only applies to bytecode! Why is this a problem? What can we do about it?

Another possibility: model checking. Why not have the consumer model check the binary?

Proof-carrying code consists of:

- a native executable, together with
- a proof of safety for the executable.

Obligation for the code producer: produce the proof of safety.

Obligation for the code consumer: check the proof of safety.

Where is most of the burden being placed?

Claims for PCC:

1. Burden on the code producer, not consumer.

2. Consumer doesn't care where the proofs come from. Examples?

3. Code is tamper-proof. What happens if an attacker modifies the binary?

4. No reliance on trusted third parties. What is the trusted code base?

5. Static checking, not dynamic checking. Why is this good?

Example PCC and related systems:

DEC Alpha Touchstone: certifies Alpha machine code.

Java Touchstone: certifies x86 machine code from Java source.

Foundational PCC: minimal trusted computing base.

Typed Assembly Language: typed x86 machine code.

Teams from Berkeley, Carnegie Mellon, Cornell, Ottawa, Princeton, Yale...

A subset of the Alpha assembly language (real systems also handle x86, jumps, ...):

    op ::=                   (Operand)

        n                    (Integer Constant)

        r_i                  (Register)

    instr ::=                (Instruction)

        ADDQ r_s, op, r_d    (Put r_s + op in r_d)

        ...

        LDQ r_d, n(r_s)      (Load from address r_s + n)

        STQ r_s, n(r_d)      (Store to address r_d + n)

        RET                  (Return)
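This grammar can be encoded directly as data. A Python sketch (the class and field names are my own choices, not part of any PCC system):

```python
from dataclasses import dataclass

# Operands: an integer constant n, or a register r_i named by index.
@dataclass
class Const:
    n: int

@dataclass
class Reg:
    i: int

# The instruction subset above (RET omitted for brevity).
@dataclass
class ADDQ:        # ADDQ r_s, op, r_d : put r_s + op in r_d
    s: int
    op: object
    d: int

@dataclass
class LDQ:         # LDQ r_d, n(r_s) : load from address r_s + n
    d: int
    n: int
    s: int

@dataclass
class STQ:         # STQ r_s, n(r_d) : store to address r_d + n
    s: int
    n: int
    d: int

# The running example: 1. LDQ r2, 0(r1)  2. ADDQ r2, 1, r3  3. STQ r3, 0(r1)
program = [LDQ(d=2, n=0, s=1), ADDQ(s=2, op=Const(1), d=3), STQ(s=3, n=0, d=1)]
```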

For example (what does this do? when is it safe?):

1. LDQ r2, 0(r1)
2. ADDQ r2, 1, r3
3. STQ r3, 0(r1)

The model for the Alpha assembly language is based on *stores* giving
the contents of every register r_i.

We treat the memory as a special (very large!) register r_m.

For example:

(r_1 = 8, r_2 = 5, r_3 = 37, r_m = { 7, 2, 12 })

Memory is word-aligned with 8-byte words, so this memory has:

contents of address 0 = 7, contents of address 8 = 2, contents of address 16 = 12

What will happen to this store after running the program:

1. LDQ r2, 0(r1)
2. ADDQ r2, 1, r3
3. STQ r3, 0(r1)

Some operations on stores:

ρ[r_i ← n]          (Update store ρ so r_i is n)

sel(r_m, a)         (Select contents of memory address a)

upd(r_m, a, r_s)    (Update memory address a to contain contents of r_s)

For example, what is:

(r_1 = 8, r_2 = 5, r_3 = 37, r_m = { 7, 2, 12 })

[r_2 ← sel(r_m, r_1)]

[r_3 ← r_2 + 1]

[r_m ← upd(r_m, r_1, r_3)]
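These store operations are easy to model concretely. A Python sketch (the dict encoding of r_m and the register names are my own) that replays the updates above:

```python
# A store maps register names to 64-bit values; r_m is modelled as a dict
# from 8-byte-aligned addresses to words.
rho = {"r1": 8, "r2": 5, "r3": 37, "rm": {0: 7, 8: 2, 16: 12}}

def sel(rm, a):
    """sel(r_m, a): select the contents of memory address a."""
    return rm[a]

def upd(rm, a, v):
    """upd(r_m, a, v): a memory like r_m but with address a holding v."""
    return {**rm, a: v}

MASK = 2**64 - 1  # all arithmetic is mod 2^64

# Replay the example updates:
rho["r2"] = sel(rho["rm"], rho["r1"])             # [r2 <- sel(rm, r1)]: r2 = 2
rho["r3"] = (rho["r2"] + 1) & MASK                # [r3 <- r2 + 1]:      r3 = 3
rho["rm"] = upd(rho["rm"], rho["r1"], rho["r3"])  # [rm <- upd(rm, r1, r3)]
print(rho)
```

So address 8 ends up holding 3: the program loads a word, increments it, and stores it back.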

Machine model consists of steps:

(ρ_1, pc_1) → (ρ_2, pc_2)

Given by rules (incomplete rules here!):

1. If instruction *pc* is ADDQ r_s, op, r_d:

   (ρ, pc) → (ρ[r_d ← r_s + op], pc + 1)

2. If instruction *pc* is LDQ r_d, n(r_s):

   (ρ, pc) → (ρ[r_d ← sel(r_m, n + r_s)], pc + 1)

3. If instruction *pc* is STQ r_s, n(r_d):

   (ρ, pc) → ??? what goes here ???

All arithmetic is mod 2^{64}.

Run the example:

1. LDQ r2, 0(r1)
2. ADDQ r2, 1, r3
3. STQ r3, 0(r1)

in initial state (ρ, 1) where:

ρ = (r_1 = 8, r_2 = 5, r_3 = 37, r_m = { 7, 2, 12 })
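A minimal interpreter for these rules, as a sketch in Python. Note that it fills in the STQ transition with the evident candidate, ρ[r_m ← upd(r_m, n + r_d, r_s)], which is one answer to the question left open above:

```python
MASK = 2**64 - 1  # all arithmetic is mod 2^64

def step(rho, pc, program):
    """One transition (rho, pc) -> (rho', pc + 1) of the machine model."""
    instr = program[pc - 1]            # instructions are numbered from 1
    rho = dict(rho)
    rm = rho["rm"]
    if instr[0] == "ADDQ":             # ADDQ r_s, op, r_d
        _, s, op, d = instr
        rho[d] = (rho[s] + op) & MASK
    elif instr[0] == "LDQ":            # LDQ r_d, n(r_s)
        _, d, n, s = instr
        rho[d] = rm[(n + rho[s]) & MASK]
    elif instr[0] == "STQ":            # STQ r_s, n(r_d): the "???" case,
        _, s, n, d = instr             # filled with the evident update rule
        rho["rm"] = {**rm, (n + rho[d]) & MASK: rho[s]}
    return rho, pc + 1

program = [("LDQ", "r2", 0, "r1"),
           ("ADDQ", "r2", 1, "r3"),
           ("STQ", "r3", 0, "r1")]
rho = {"r1": 8, "r2": 5, "r3": 37, "rm": {0: 7, 8: 2, 16: 12}}
pc = 1
while pc <= len(program):
    rho, pc = step(rho, pc, program)
print(rho, pc)
```

Running it from (ρ, 1) ends with r_2 = 2, r_3 = 3, and address 8 updated to 3.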

Assume predicates rd(*a*) and wr(*a*) saying when we have read-access
and write-access to memory. Add side-conditions:

2. If instruction *pc* is LDQ r_d, n(r_s):

   (ρ, pc) → (ρ[r_d ← sel(r_m, n + r_s)], pc + 1)

   as long as rd(n + r_s)

3. If instruction *pc* is STQ r_s, n(r_d):

   (ρ, pc) → ??? what goes here ???

A program is *memory safe* as long as these side-conditions never fail!

1. LDQ r2, 0(r1)
2. ADDQ r2, 1, r3
3. STQ r3, 0(r1)

Each program has a *verification condition* VC_pc satisfying the property that:

If VC_pc is true in store ρ then (ρ, pc) is memory safe.

1. If instruction *pc* is ADDQ r_s, op, r_d:

   VC_pc = VC_{pc+1}[r_d ← r_s + op]

2. If instruction *pc* is LDQ r_d, n(r_s):

   VC_pc = rd(n + r_s) ∧ VC_{pc+1}[r_d ← sel(r_m, n + r_s)]

3. If instruction *pc* is STQ r_s, n(r_d):

   VC_pc = ??? what goes here ???

Assuming that *VC_4* is given, calculate *VC_1* for:

1. LDQ r2, 0(r1)
2. ADDQ r2, 1, r3
3. STQ r3, 0(r1)

Every architecture comes with a *calling convention*
which defines a *precondition* and a *postcondition*
for function calls.

4. If instruction *pc* is RET then:

VC_pc = Post

We can check that a program is memory safe by verifying:

Pre ⇒ VC_0

We call this the *safety predicate* (*SP*).
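A toy verification-condition generator for the example program, working backwards from the last instruction. This is only a sketch: the string-based substitution is naive, the postcondition is left as an opaque `Post`, and the STQ case uses wr(n + r_d) plus the memory update, which is one standard answer to the question left open above:

```python
def subst(phi, var, expr):
    """phi[var <- expr] by naive textual substitution (fine for this toy)."""
    return phi.replace(var, f"({expr})")

# Work backwards from VC_4, the condition after the last instruction.
vc = "Post"
# 3. STQ r3, 0(r1): wr(0 + r1) /\ VC_4[rm <- upd(rm, 0 + r1, r3)]
vc = f"wr(0 + r1) /\\ {subst(vc, 'rm', 'upd(rm, 0 + r1, r3)')}"
# 2. ADDQ r2, 1, r3: VC_3[r3 <- r2 + 1]
vc = subst(vc, "r3", "r2 + 1")
# 1. LDQ r2, 0(r1): rd(0 + r1) /\ VC_2[r2 <- sel(rm, 0 + r1)]
vc = f"rd(0 + r1) /\\ {subst(vc, 'r2', 'sel(rm, 0 + r1)')}"

# The safety predicate SP: precondition implies the VC at the entry point.
safety_predicate = f"Pre => {vc}"
print(safety_predicate)
```

Since `Post` mentions none of the substituted names, everything collapses to `Pre => rd(0 + r1) /\ wr(0 + r1) /\ Post`: the program is safe exactly when we may read and write address r_1.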

Code producer:

1. Generates native code.

2. Calculates *SP*.

3. Builds a proof of *SP*.

Code consumer:

1. Receives native code and proof from the producer.

2. Calculates *SP*.

3. Checks the proof of *SP*.

4. If the proof is valid, runs the code.

How to represent *SP* and its proof:
most PCC systems use an off-the-shelf
implementation called the *Edinburgh Logical Framework* (*LF*).

How to check proofs quickly: a dedicated C implementation of LF checker.

How to represent proofs compactly: lots of ad hoc compression techniques to make proofs smaller than the code!

How to minimize the trusted computing base
(in particular the function which generates *SP*).

Try it out on-line.

Information flow.