Verder naar navigatie Doorgaan naar hoofdinhoud Ga naar de voettekst

Software Verification and Analysis Using Z3

29 januari 2021

door Gérald Doussot

This post provides a technical introduction on how to leverage the Z3 Theorem Prover to reason about the correctness of cryptographic software, protocols and otherwise, and to identify potential security vulnerabilities.

In this document, we will cover two distinct use cases:

  • Modeling and analysis of an algorithm documented in an old version of the QUIC Transport protocol IETF draft.
  • Modeling of specific finite field arithmetic operations for elliptic curve cryptography, with integers represented using a uniform saturated limb schedule (four limbs of 64 bits), to prove equivalence with arbitrary-precision1 arithmetic, and for test cases generation.

Software, Protocols, Security Assurance and Formal Verification

Software system and protocol design and implementation are complex (even more so in cryptography), prone to errors and security vulnerabilities. The industry, for the most part, does not rest on its laurels and continues to refine strategies, tactics, processes and tools to provide some level of assurance that software is secure. There is a strong interest in formal methods to increase confidence in the security of cryptographic systems. Widely deployed software, BoringSSL makes use of formal methods to generate correct elliptic curve implementations. The TLS 1.3 protocol specification design was also influenced by these techniques. One could argue that this interest is due to a combination of several factors, including but not limited to the persistent difficulties in eradicating vulnerabilities (the TLS protocol and its implementations had a sizable share of these), the availability of more computing resources, easier-to-use tools, or even the incessant innovation coming out of the blockchain world. In any case, this is an interesting area of research, with tangible and beneficial outcomes.

Formal verification is a set of techniques to prove some properties about a system. It may target different stages of the software (or hardware) development life cycle, including design, implementation and testing. There are several approaches to verification, sometimes complementary. For the purpose of this introduction, we are going to look at Satisfiability Modulo Theories (SMT) solvers and at Z3 in particular, and how it can be used to reason about systems, prove some properties and generate interesting test cases.

SMT and Z3 Solver

SMT is a problem, expressed in a first order logic language, where one attempts to determine whether a logical formula, interpreted in domains of interest (theories), such as integers, lists or bit vectors, is satisfiable. An SMT solver primarily searches for a solution to the stated problem, i.e. whether the formula is satisfiable or not, and permits us to examine the solution’s model.

Z3 is a popular SMT solver developed by Microsoft Research, and that was open sourced in 2015. It is particularly suited for software verification and analysis. It has been successfully used in research and commercial environments.

SMT-LIB is the lingua franca of SMT solvers and has a Lisp language syntax. We will use SMT-LIB for the purpose of this post. Note that Z3 has several, perhaps more approachable APIs available, including in the Python language.

Let’s proceed with a simple example to illustrate these concepts. We can query an SMT solver such as Z3 to determine whether the expression x + y = 5 can be satisfied, in the context of integers and arithmetic, e.g. are there integer values x and y, that add up to value 5. We will express the problem in the SMT-LIB language:

; this is a comment - it is ignored by solvers

; declare x as integer
(declare-const x Int)

; declare y as an integer
(declare-const y Int)

; express the problem - e.g. add the formula to the list of formulas we are trying to prove
(assert (= (+ x y) 5)) 

; query the solver to determine if the SMT problem is satisfiable
(check-sat) 

; if it is, show one possible solution
(get-model) 

When we run the Z3 SMT solver, we obtain the following response:

sat
(model 
  (define-fun y () Int
    0)
  (define-fun x () Int
    5)
)

sat means that the solver determined that the formula can be satisfied. It identified one solution (model), with y set to 0 and x set to 5 that satisfies the problem as stated. We will work through more complex examples later.

Now that we have a basic understanding of SMT, Z3 and what it can do, we are going to study two use cases in more depth.

Analysis of the QUIC Packet Number Decoding Algorithm

In the course of reviewing a QUIC protocol implementation, the author found potentially concerning discrepancies with the sample, non constant-time2 decoding algorithm in draft 33 of the RFC. Specifically, the sample algorithm alluded to a number of checks, some of which were apparently absent including candidate_pn < (1 << 62) - pn_win, or incomplete e.g. candidate_pn > pn_win instead of candidate_pn >= pn_win, from the implementation:

// Note the extra checks to prevent overflow and underflow.
// (...)
if candidate_pn <= expected_pn - pn_hwin and
   candidate_pn < (1 << 62) - pn_win:
   return candidate_pn + pn_win
if candidate_pn > expected_pn + pn_hwin and
   candidate_pn >= pn_win:
   return candidate_pn - pn_win
// (...)

The discrepancies and their impact were privately reported to the maintainer. The algorithm steps are relatively easy to implement, but somehow more challenging to analyze. It is not straightforward to come up with inputs that may trigger over- and underflows indeed and to determine how likely these may occur (naturally, or in adverse conditions). Z3 permits us to easily query for (and generate) such inputs. The algorithm is easy to model as a bonus. We will do that shortly, but let’s introduce the QUIC protocol and packet numbers briefly first.

QUIC is a UDP-based multiplexed and secure transport protocol. It provides stream multiplexing, flow control, low latency connection establishment, connection migration and authenticated and encrypted headers and payloads. The latest draft version of the QUIC specification describes the QUIC protocol security properties, some of which are provided by TLS 1.3. The integration of QUIC with TLS is more thoroughly described in the Using TLS to Secure QUIC draft-ietf-quic-tls-33 internet draft memo, which is, as with the QUIC protocol specification, a work in progress. Many implementations of the QUIC protocol are available. It is widely used by organizations such as Facebook and Google, in clients and back-end services.

QUIC protocol data is embedded in “packet” logical units, which are in turn encapsulated in UDP datagrams. QUIC packets are organized in packet number spaces (initial, handshake and application data spaces); each packet number space starts at packet number 0 and is incremented independently, up to 2^62-1, at which stage the connection must be closed. Endpoints maintain a separate packet number for emission and reception of QUIC data. QUIC packet data is protected using encryption. The QUIC packet number is used in determining the cryptographic nonce for packet protection.

Packets use long packet headers during connection establishment and short headers after. Both formats employ a variable length encoding of the packet number (8, 16, 24 or 32 bits). Only the least significant bits of the packet numbers required to represent it are sent over the wire. In order to successfully decode and reconstruct the number, endpoints must track the largest packet number received. Furthermore, peers must use packet number lengths large enough to represent a sufficient range from the largest acknowledged packet number and the packet number being sent to permit successful decoding.

Coming back to modeling the packet number decoding algorithm in Z3, the author later discovered that the QUIC implementation used the algorithm presented in draft 17 of the RFC, in which researchers have found bugs. These were fixed in later iterations of the RFC. Interestingly, the researchers found these bugs using formal verification.

QUIC Transport RFC draft 17 provides the sample and erroneous packet number decoding algorithm, reproduced below:


DecodePacketNumber(largest_pn, truncated_pn, pn_nbits):
  expected_pn  = largest_pn + 1
  pn_win       = 1 << pn_nbits
  pn_hwin      = pn_win / 2
  pn_mask      = pn_win - 1
  # The incoming packet number should be greater than
  # expected_pn - pn_hwin and less than or equal to
  # expected_pn + pn_hwin
  #
  # This means we can't just strip the trailing bits from
  # expected_pn and add the truncated_pn because that might
  # yield a value outside the window.
  #
  # The following code calculates a candidate value and
  # makes sure it's within the packet number window.
  candidate_pn = (expected_pn   ~pn_mask) | truncated_pn
  if candidate_pn <= expected_pn - pn_hwin:
     return candidate_pn + pn_win
  # Note the extra check for underflow when candidate_pn
  # is near zero.
  if candidate_pn > expected_pn + pn_hwin and
     candidate_pn > pn_win:
     return candidate_pn - pn_win
  return candidate_pn

We will model the above problematic algorithm in Z3 and find input values to overflow the packet number decoding routine. We first create an empty text file, named decoding.smt2. The smt2 extension represents the SMT-LIB language format.

When expressing our problem, we need to think declaratively, a bit like writing a SQL SELECT statement or programming in Prolog. First, we declare the variables introduced in the RFC Draft 17 decoding algorithm, plus a result variable for convenience, using the SMT-LIB declare-const term. We will be performing operations over numbers – we choose to represent them as bit vectors of 64 bits in size (_ BitVec 64), akin to an unsigned 64 bit C variable type or computer memory register to closely match most QUIC implementations developed in general programming languages on modern platforms.

(declare-const largest-pn  (_ BitVec 64))
(declare-const truncated-pn  (_ BitVec 64))
(declare-const pn-nbits  (_ BitVec 64))
(declare-const expected-pn  (_ BitVec 64))
(declare-const pn-win  (_ BitVec 64))
(declare-const pn-hwin  (_ BitVec 64))
(declare-const pn-mask  (_ BitVec 64))
(declare-const candidate-pn  (_ BitVec 64))
(declare-const result  (_ BitVec 64))

Then we write a number of formulas that must always be true.

(assert (bvule truncated-pn (_ bv4294967295 64))) ; max 2**32 -1 bits 

(assert (bvule largest-pn (_ bv4611686018427387903 64))); max 2**62 -1 bits
(assert (bvuge largest-pn (_ bv0 64)))

We learnt in our QUIC transport protocol introduction earlier that encoded packet numbers (truncated-pn) cannot exceed 2*32 - 1. We also learnt that a decoded packet number ranges from 0 to 2^62 - 1. The above captures that. The first line for example asserts that the truncated-pn variable must always be lower or equal (bvule or “Bit Vector Unsigned Lower or Equal”) to 2^32 - 1. In declaring these constraints and the next, we also progressively but dramatically reduce the search space of the Z3 solver for potential solutions to overflowing the decoded packet number.

Note that in an adversarial environment, we cannot assume that these conditions are true. When performing a security assessment of a system such as a QUIC implementation, we need to verify that the extraction of the truncated_pn value for instance from a packet is correct, and that its value cannot fall out of range, among other things.

Let’s move further:

(assert (or (= pn-nbits  (_ bv8 64))
	    (= pn-nbits  (_ bv16 64))
	    (= pn-nbits  (_ bv24 64))
	    (= pn-nbits  (_ bv32 64))))
		
;; ensure that truncated-pn is < 2^pn-bits and >= 2^(pn-bits-8)
;; This may not not be the case in an adverse scenario
(assert
 (and
  (bvult truncated-pn (bvshl (_ bv1 64) pn-nbits))
  (bvuge truncated-pn (bvshl (_ bv1 64) (bvsub pn-nbits (_ bv8 64))))))

Remember that we explained that the encoded packet number size can be either 8, 16, 24 or 32 bits. The above first declares that the encoded packet number size pn-bits must take one of these values e.g. (_ bv8 64)), which means “Bit Vector of Value 8 and of size 64 bits”, our chosen variable representation.

We then ensured via an assertion, that the encoded packet number length pn-bits corresponds to the length of what is strictly required to encode the packet number truncated-pn, for the purpose of modeling the intended QUIC protocol specification behavior. For instance, packet number 0xff00 could be encoded as 0x00ff00(three bytes long) but the leading byte is superfluous so two bytes suffice to encode its length. Note that we cannot assume that correspondence in an adverse environment, and it would be interesting to investigate the potential impact of introducing a discrepancy when testing a QUIC implementation, e.g. what happens if one encodes packet number 0xff using a length of four bytes, 0x000000ff.

Next, we actually model the algorithm of Draft 17:

(assert ( = expected-pn
	    (bvadd largest-pn (_ bv1 64))))

We translated expected_pn = largest_pn + 1 by asserting, i.e. making a statement or describing a formula, that expected-pn must take some value equal (=) to largest-pn + 1 using bvadd for “Bit Vector Add”.

(assert( = pn-win
	   (bvshl (_ bv1 64) pn-nbits)))

Draft 17 algorithm states that pn_win takes the value 1 left shifted by pn_nbits. This is easily translated, using the SMT-LIB bvshl operation, for “Bit Vector Shift Left”, as illustrated above.

(assert ( =  pn-hwin 
	     (bvlshr pn-win (_ bv2 64) )))

Above, we translated pn_hwin = pn_win / 2. Division by 2 is equivalent to a binary right shift with an offset of 1, which is implemented in SMT-LIB using bvshlshr, “unsigned (logical) shift right”. We assert that pn-hwin must take some value equal to pn_win divided by 2.

(assert (= pn-mask
	   (bvsub pn-win (_ bv1 64) )))

This is getting tediously easy. We implemented pn_mask = pn_win - 1 above.

(assert (= candidate-pn
	   (bvor
	    (bvand expected-pn (bvnot pn-mask))
	    truncated-pn)))

In the above code, we broke candidate_pn = (expected_pn ~pn_mask) | truncated_pn into a number of SMT-LIB expressions. We assert that candidate-pn must take a value that is the bitwise or ( the | token in the algorithm and bvor or “Bit Vector Or” in SMT-LIB) operation of:

  • the bitwise AND ( token represented as bvand in SMT-LIB) operation of:
    • some value expected-pn
    • the bitwise negation (~ token is bvnot using SMT-LIB) operation of some value pn-mask
  • some value truncated-pn.

We finally model the last part of the algorithm:

(assert
 (or
  (and (bvule candidate-pn (bvsub expected-pn pn-hwin))
       (= result (bvadd candidate-pn pn-win)))
  (and (bvugt candidate-pn (bvadd expected-pn pn-win))
       (bvugt candidate-pn pn-win)
       (= result (bvsub candidate-pn pn-win)))
  (= result candidate-pn)))
  

The algorithm’s if clauses are implemented using OR boolean operations with bvor. Notice that the algorithm has 3 conditions (2 explicit and one implicit condition if the two first conditions were not met), with 3 different return statements. Within these if conditions, the algorithm AND boolean operations are implemented using SMT-LIB’s bvand statement. bvule and bvugt stand for “Binary Vector Unsigned Lower or Equal” and “Greater Than” respectively. If any condition matches, we assert that our result variable must take the appropriate value, for instance the previously “calculated” candidate-pn value in the preceding step (candidate_pn = (expected_pn ~pn_mask) | truncated_pn).

We now have fully implemented our algorithm using SMT-LIB. We need to ensure that it is correct. The Draft 17 standard provides one test case:

For example, if the highest successfully authenticated packet had a packet number of 0xa82f30ea, then a packet containing a 16-bit value of 0x9b32 will be decoded as 0xa82f9b32.

Let’s model it as a unit test case, using a decimal representation of these values:

(assert (and
	 (= largest-pn (_ bv2821665002 64))
	 (= pn-nbits (_ bv16 64)) 
	 (= truncated-pn (_ bv39730 64))
	 (= result (_ bv2821692210 64))))

In the above code snippet, we added one more assertion, which describes one valid state of potentially many. This state binds the different variables of our algorithms to the values of our unit test case. This is the last of all the formulas that we have written that must be satisfied.

(check-sat)
(get-model)

Append the above commands to our SMT-LIB formulas file. (check-sat) requests Z3 to determine if the set of formulas can be satisfied. If they can, Z3 will output sat and the (get-model) command will output one instance of solutions that Z3 found. Otherwise, Z3 will return unsat.

Now, let’s ask Z3 to find such a solution:

$ z3 decoding.smt2

sat
(model 
  (define-fun pn-hwin () (_ BitVec 64)
    #x0000000000008000)
  (define-fun pn-win () (_ BitVec 64)
    #x0000000000010000)
  (define-fun pn-nbits () (_ BitVec 64)
    #x0000000000000010)
  (define-fun pn-mask () (_ BitVec 64)
    #x000000000000ffff)
  (define-fun largest-pn () (_ BitVec 64)
    #x00000000a82f30ea)
  (define-fun expected-pn () (_ BitVec 64)
    #x00000000a82f30eb)
  (define-fun result () (_ BitVec 64)
    #x00000000a82f9b32)
  (define-fun truncated-pn () (_ BitVec 64)
    #x0000000000009b32)
  (define-fun candidate-pn () (_ BitVec 64)
    #x00000000a82f9b32)
)

Z3 did find a solution that matches the test data of Draft 17 of the standard! It also outputted variables that we declared in our model and bound to values that satisfy all our formulas. This gives us some confidence that we modeled the algorithm correctly. We can now attempt to find input that will trigger this overflowing condition.

In our text file, replace the following assertion:

(assert (and
	 (= largest-pn (_ bv2821665002 64))
	 (= pn-nbits (_ bv16 64)) 
	 (= truncated-pn (_ bv39730 64))
	 (= result (_ bv2821692210 64))))

with:

; over > 2**61 -1
(assert  (bvugt result (_ bv4611686018427387903 64)))

First, note that the ; character introduces a comment. It is ignored by Z3. Then, we ask Z3 to find a solution for our previous formulas, with the added condition that result, the decoded packet number, is superior to 2^62 - 1, which is forbidden by the QUIC standard.

Below is the output of running Z3 on the above model. It displays the variables, including input variables, required to trigger an overflow.

$ z3 decoding.smt2
sat
(model 
  (define-fun pn-hwin () (_ BitVec 64)
    #x0000000080000000)
  (define-fun pn-win () (_ BitVec 64)
    #x0000000100000000)
  (define-fun pn-nbits () (_ BitVec 64)
    #x0000000000000020)
  (define-fun pn-mask () (_ BitVec 64)
    #x00000000ffffffff)
  (define-fun largest-pn () (_ BitVec 64)
    #x3fffffffffffffff)
  (define-fun expected-pn () (_ BitVec 64)
    #x4000000000000000)
  (define-fun result () (_ BitVec 64)
    #x4000000080000000)
  (define-fun truncated-pn () (_ BitVec 64)
    #x0000000080000000)
  (define-fun candidate-pn () (_ BitVec 64)
    #x4000000080000000)
)

So, if the latest accepted packet number is 2^62 -1, and the received encoded packet number value is 1 or more, with a 32 bits packet number length, then the decoded packet number overflows. This was arguably an obvious failure scenario. We can add or changes assertions to obtain non-trivial cases, such as follows:

;; We want result > 2**62 -1 (overflow)
;; but dont "cheat" by starting with packet number 2**62-1
(assert  (and (bvugt result (_ bv4611686018427387903 64))
	      (bvult largest-pn ( _ bv4611686018427387902 64))))

Z3 then finds the following solution, with result overflowing to 2^62 - 1 + 1026, with the last accepted packet number set to 2**62 - 1 - 31743.

sat
(model 
  (define-fun pn-hwin () (_ BitVec 64)
    #x0000000000008000)
  (define-fun pn-win () (_ BitVec 64)
    #x0000000000010000)
  (define-fun pn-nbits () (_ BitVec 64)
    #x0000000000000010)
  (define-fun pn-mask () (_ BitVec 64)
    #x000000000000ffff)
  (define-fun largest-pn () (_ BitVec 64)
    #x3fffffffffff8400)
  (define-fun expected-pn () (_ BitVec 64)
    #x3fffffffffff8401)
  (define-fun result () (_ BitVec 64)
    #x4000000000000401)
  (define-fun truncated-pn () (_ BitVec 64)
    #x0000000000000401)
  (define-fun candidate-pn () (_ BitVec 64)
    #x3fffffffffff0401)
)

The impact of overflowing or underflowing when eliciting the QUIC packet number would vary with the QUIC implementation. It may include failure to decrypt protected packets in certain conditions (the most likely scenario – and these conditions can be enumerated using Z3). When modeling the RFC algorithm fix in Z3, Z3 cannot find a solution (unsat) to overflow the decoded packet number!

Finite Field Arithmetic Using Uniform Saturated Limb Schedule Verification

Modern elliptic curve cryptography deals with underlying field elements that are too large to fit in contemporaneous computer memory registers. For instance, P-384, the elliptic curve specified in NSA post-quantum transition phase algorithms, has a modulus p equal to 2^384 - 2^128 - 2^96 + 2^32 - 1, which does not fit in a single 64 bits register. Therefore, one must devise a representation of numbers that spans multiple registers.

Many programming languages provide arbitrary-precision arithmetic functions as part of their standard libraries. However, they are optimized for general use cases and arbitrarily large numbers, but not for specific curves, so they may not provide optimal performance. More importantly, these libraries are not suitable for cryptographic operations, as they are unlikely to be constant-time. This means attackers may be in a position to retrieve some or all of the secret information these libraries are handling, including cryptographic keys.

There are several ways to represent and handle field elements over multiple registers, called limb schedules. A limb represents one piece of a field element. For instance, a field element of 128 bits in size may be represented by 4 or more 32 bits registers as limbs in a 32-bit computer architecture. In a uniform limb schedule, the limbs making one field element have the same size, whereas in a non-uniform limb schedule, the limbs size differs. In a saturated limb schedule, operations on limbs may overflow their capacity; they do not in unsaturated limb schedules.

The choice of a limb schedule depends mostly of the underlying architecture and of the operations to apply on data (one important factor is whether one has access to the carry flag value in the programming language of their choice). The recently unveiled Double-Odd elliptic curves use a uniform saturated limb schedule in several implementations. This schedule does not appear to be very common but is not unique to Double-Odd implementations.

Below is the implementation of Double-Odd field addition and subtraction in Go:

// Internal function for field addition.
// Parameters:
//   d    destination
//   a    first operand
//   b    second operand
//   mq   modulus definition parameter
func gf_add(d, a, b *[4]uint64, mq uint64) {
	// First pass: sum over 256 bits + carry
	var cc uint64 = 0
	for i := 0; i < 4; i ++ {
		d[i], cc = bits.Add64(a[i], b[i], cc)
	}

	// Second pass: if there is a carry, subtract 2*p = 2^256 - 2*mq;
	// i.e. we add 2*mq.
	d[0], cc = bits.Add64(d[0], (mq << 1)   -cc, 0)
	for i := 1; i < 4; i ++ {
		d[i], cc = bits.Add64(d[i], 0, cc)
	}

	// If there is an extra carry, then this means that the initial
	// sum was at least 2^257 - 2*mq, in which case the current low
	// limb is necessarily lower than 2*mq, and adding 2*mq again
	// won't trigger an extra carry.
	d[0] += (mq << 1)   -cc
}

func gf_sub(d, a, b *[4]uint64, mq uint64) {
	// First pass: difference over 256 bits + borrow
	var cc uint64 = 0
	for i := 0; i < 4; i ++ {
		d[i], cc = bits.Sub64(a[i], b[i], cc)
	}

	// Second pass: if there is a borrow, add 2*p = 2^256 - 2*mq;
	// i.e. we subtract 2*mq.
	d[0], cc = bits.Sub64(d[0], (mq << 1)   -cc, 0)
	for i := 1; i < 4; i ++ {
		d[i], cc = bits.Sub64(d[i], 0, cc)
	}

	// If there is an extra borrow, then this means that the
	// subtraction of 2*mq above triggered a borrow, and the first
	// limb is at least 2^64 - 2*mq; we can subtract 2*mq again without
	// triggering another borrow.
	d[0] -= (mq << 1)   -cc
}

Both operations resort to three carry passes, which does arguably feel counter-intuitive. Furthermore, implementation of operations that trigger a carry is notoriously hard to get right and a common source of security vulnerabilities in cryptography software. We will try to determine whether this algorithm is correct and whether we can generate test cases to ensure all carry flags are set when they should be, thus providing good test coverage. We will attempt to model the field addition and subtraction in Z3 to achieve our objectives.

We first need to improve our knowledge of SMT-LIB by introducing a few more terms:

First, let introduces one or more variables in parallel and bind them to an expression or value, e.g. (let ((x 0) (y 1) ...) bind x to 0 and y to 1. Because bindings are introduced in parallel, they cannot refer to other bindings being evaluated e.g. (let ((x 0) (z (+ x 3)) ...) is not correct. To permit sequential bindings, we have to introduce another let term, as illustrated below:

(let ((x 0))
	  (y 1) 
	  ;; we want to bind z to x + 3, where x is a previous defined binding
  (let ((z (+ x y 3)))
	  ...))

Second, declare-datatypes introduces one or more data types, akin to a structure with a constructor and fields. The following defines the Pair tuple type of two integer elements and associated constructor, and field accessors first and second:

;; define our tuple type
(declare-datatypes () ((Pair (mk-pair (first Int) (second Int)))))

;; declare a constant of type Pair
(declare-const p1 (Pair))

;; assert that elements of our 2 elements tuple p1 are [0, 1]
(assert (=  (first p1) 0))
(assert (=  (second p1) 1))

Next, push and pop adds and removes elements to our solver list (a stack) of assertions, declarations and definitions. This is particularly useful to revert back to previous states after finding a solution to a problem where we need to start with a fresh model. This would have avoided replacing definitions and assertions in our previous worked QUIC example with new ones, as we progressed.

Finally, define-fun can be thought as introducing the definition of a function. The resulting function object can take a number of arguments and returns a value.

We are now ready to model the Double-Odd Go implementation addition and subtraction field operations in Z3.

;;;; Modeling Section
(declare-datatypes
 ()
 ;; CarryResult is a structure holding the result and carry flag of an operation
 ;; on a limb of 64 bits
 ((CarryResult
   (mk-carry-result (carry (_ BitVec 64))
		    (result (_ BitVec 64))))
  ;; Field is a struct holding 4 limbs of 64 bits
  (Field
   (mk-field
    (first (_ BitVec 64))
    (second (_ BitVec 64))
    (third (_ BitVec 64))
    (fourth (_ BitVec 64))))))

In the above code snippet, we declared two structures for convenience:

  • CarryResult is a structure of two elements, carry and result, two bit vectors of 64 bits.
  • Field implements Double-Odd’s chosen representation of finite fields with four limbs of 64 bits, whose values can be accessed using first, second, third and fourth.
(declare-datatypes
 ()
 ;; InternalCarryState is a structure holding the intermediary and final state
 ;; of computation over fields. This is so we can prove / query / assert certain
 ;; state combinations
 ((InternalCarryState
   (mk-internal-carry-state
    (result Field)
    (carry0 (_ BitVec 64))
    (carry1 (_ BitVec 64))
    (carry2 (_ BitVec 64))
    (carry3 (_ BitVec 64))
    (carrymid (_ BitVec 64))
    (carry4 (_ BitVec 64))
    (carry5 (_ BitVec 64))
    (carry6 (_ BitVec 64))
    (d0-lhs (_ BitVec 64))
    (d0-rhs (_ BitVec 64))))))

Next, we defined an InternalCarryState data structure type, that captures the number of times the carry (cc) variable in the Go source code is assigned (carry0, carry1, etc. in SMT-LIB). This will allow us to make assertions and prove properties about these, i.e. whether they are set to 1 or 0. It also holds the final result of the addition or subtraction operations (the “destination” variable in gf_add() and gf_sub(), which is composed of four limbs of 64 bits). We also track the left- and right-hand side values of the last statement in the gf_add() and gf_sub() functions.

Below, we annotated the relation between the SMT-LIB and the Go language implementation carries, result and other variables directly in the Go source code of the addition function (the relation is similar in the subtraction function) to help comprehension:

// d will contain the result of the gf_add operation 
// and correspond to the result field of SMT-LIB InternalCarryState structure.
func gf_add(d, a, b *[4]uint64, mq uint64) {

var cc uint64 = 0 
	for i := 0; i < 4; i ++ {
		// cc is carry0, carry1, carry2, carry3
		d[i], cc = bits.Add64(a[i], b[i], cc)
	}

	d[0], cc = bits.Add64(d[0], (mq << 1)   -cc, 0)
	// cc is carrymid

	for i := 1; i < 4; i ++ {
	// cc is carry4, carry5, carry6
		d[i], cc = bits.Add64(d[i], 0, cc)
	}

	// Here, d0-lhs corresponds to the Left Hand Side
	// and d0-rhs corresponds to the Right Hand Side of the following statement
	d[0] += (mq << 1)   -cc
}

This InternalCarryState structure will in effect permit to generate input data for test cases to assert different values for each carry. Next, we declare some constants:

;; Set mq and p
(declare-const mq (_ BitVec 64))
(assert (= mq (_ bv18651 64)))
(declare-const p (_ BitVec 264))
(assert (= p (_ bv57896044618658097711785492504343953926634992332820282019728792003956564801317 264)))

We are modeling field operations for the do255e elliptic curve. We asserted above that mq is equal to 18651 and p is equal to 2^255 - 18651. Note that we chose to define the p variable type as a bit vector of 264 bits. We will explain later why.

Now, we need to model Go’s Bits package Add64() and Sub64() functions:

;; add/carry 64 bits function
(define-fun addcarry((x (_ BitVec 64))
		     (y (_ BitVec 64))
		     (cc (_ BitVec 64)))
  CarryResult
  (let ((d (bvadd x y cc)))
    (let ((ccout
	   (bvlshr 
	    (bvor
	     (bvand x y)
	     (bvand (bvor x y)
		    (bvnot d)))
	    (_ bv63 64))))
      (mk-carry-result ccout d))))

;; sub/borrow 64 bits function
(define-fun subborrow((x (_ BitVec 64))
		      (y (_ BitVec 64))
		      (cc (_ BitVec 64)))
  CarryResult
  (let ((d (bvsub (bvsub x y) cc)))
    (let ((ccout
	   (bvlshr 
	    (bvor
	     (bvand (bvnot x) y)
	     (bvand (bvnot (bvxor x y))
		    d))
	    (_ bv63 64))))
      (mk-carry-result ccout d))))

Above, we just translated their source code to SMT-LIB. We now proceed to model the Double-Odd Go gf_add() function in SMT-LIB:

;; gf add function with mq := 18_651
(define-fun gf-add ((field-one Field) (field-two Field))
  InternalCarryState
  (let
      ((x0 (first field-one))
       (x1 (second field-one))
       (x2 (third field-one))
       (x3 (fourth field-one))
       (y0 (first field-two))
       (y1 (second field-two))
       (y2 (third field-two))
       (y3 (fourth field-two)))
    (let ((result0 (addcarry x0 y0 (_ bv0 64))))
      (let ((result1 (addcarry x1 y1 (carry result0))))
	(let ((result2 (addcarry x2 y2 (carry result1))))
	  (let ((result3 (addcarry x3 y3 (carry result2))))
	    (let ((resultmid
		   (addcarry (result result0)
			     (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result3)))
			     (_ bv0 64))))
	      (let ((result4 (addcarry (result result1) (_ bv0 64) (carry resultmid))))
		(let (( result5 (addcarry (result result2) (_ bv0 64) (carry result4))))
		  (let ((result6 (addcarry (result result3) (_ bv0 64) (carry result5))))
		    (let ((d0-internal-rhs (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result6)))))
		      (let ((d0
			     (bvadd
			      (result resultmid)
			      d0-internal-rhs)))
			(let ((d1 (result result4)))
			  (let ((d2 (result result5)))
			    (let ((d3 (result result6)))
			      (mk-internal-carry-state
			       (mk-field d0 d1 d2 d3)
			       (carry result0)
			       (carry result1)
			       (carry result2)
			       (carry result3)
			       (carry resultmid)
			       (carry result4)
			       (carry result5)
			       (carry result6)
			       (result resultmid)
			       d0-internal-rhs))))))))))))))))

We defined a function gf-add above that takes two fields of four 64-bit limbs, adds them and returns the aforementioned InternalCarryState structure, which contains the result of the computation, but also a model of all carries, i.e. whether they are set to 1 or 0. The listing appears somehow complex and it is because we want to track these carry variables; but it is a straight translation of the Double-Odd gf_add() function otherwise. Furthermore, this will allow us to express a large number of test cases more concisely after. We now translate the gf_sub() field subtraction operation:

;; gf sub function with mq := 18_651
(define-fun gf-sub ((field-one Field) (field-two Field))
  InternalCarryState
  (let
      ((x0 (first field-one))
       (x1 (second field-one))
       (x2 (third field-one))
       (x3 (fourth field-one))
       (y0 (first field-two))
       (y1 (second field-two))
       (y2 (third field-two))
       (y3 (fourth field-two)))
    (let ((result0 (subborrow x0 y0 (_ bv0 64))))
      (let ((result1 (subborrow x1 y1 (carry result0))))
	(let ((result2 (subborrow x2 y2 (carry result1))))
	  (let ((result3 (subborrow x3 y3 (carry result2))))
	    (let ((resultmid
		   (subborrow (result result0)
			      (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result3)))
			      (_ bv0 64))))
	      (let ((result4 (subborrow (result result1) (_ bv0 64) (carry resultmid))))
		(let (( result5 (subborrow (result result2) (_ bv0 64) (carry result4))))
		  (let ((result6 (subborrow (result result3) (_ bv0 64) (carry result5))))
		    (let ((d0-internal-rhs (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result6)))))
		      (let ((d0
			     (bvsub
			      (result resultmid)
			      d0-internal-rhs)))
			(let ((d1 (result result4)))
			  (let ((d2 (result result5)))
			    (let ((d3 (result result6)))
			      (mk-internal-carry-state
			       (mk-field d0 d1 d2 d3)
			       (carry result0)
			       (carry result1)
			       (carry result2)
			       (carry result3)
			       (carry resultmid)
			       (carry result4)
			       (carry result5)
			       (carry result6)
			       (result resultmid)
			       d0-internal-rhs))))))))))))))))				   

As you can see, the code for subtraction is extremely similar to that of addition for both the Z3 and Go implementations. We are ready to generate interesting test cases that can be used by implementations such as Go and others.

Before we proceed, we will add some formulas to support proving the equivalence of addition and subtraction using the Go implementation uniform saturated limb schedule (four limbs of 64 bits) with addition and subtraction using arbitrary precision arithmetic. Now we don’t need to have unlimited memory to simulate the latter. We are doing addition and subtraction with 256-bit numbers after all so we need at most 257 bits precision. We can round this number to the next byte, 264.

Recall that elliptic curve field operations are performed modulo p. At a high-level, this is how we are going to proceed to prove the equivalence:

  • we ask Z3 to compute some addition using the uniform saturated limb schedule (256 bits input and result).
  • We ask Z3 to compute the same addition using simulated arbitrary precision arithmetic (264 bits result).
  • Because the Go code does not completely reduce the addition or subtraction modulo p (it is performed elsewhere, when required), the above results are not equal. So next, we reduce both the uniform saturated limb schedule result and the arbitrary precision results modulo p.
  • We assert to Z3 that the reduced results are equal (Z3 will ensure that).

Because Z3 cannot compare 256 bits and 264 bits values, we will have to perform an intermediary step, before reduction, to extend the uniform saturated limb schedule result to 264 bits.

In SMT problems, we cannot prove that an algorithm is correct but we can prove that two different operations are equivalent and this is what we are trying to achieve here.

Now, let’s implement this equivalence proving formula:

;; Extend a 64 bits vector to 264bits (+200 bits). Used to compare results of field
;; operations between our implemented algorithm and Z3 normal operations on
;; vectors, which must be bigger than 264 bits to cater for overflows.
(define-fun extend-64-264((x (_ BitVec 64)))
  (_ BitVec 264)
  (concat #x00000000000000000000000000000000000000000000000000 x))

;; Apply extend-64-264 to each limb of a field
(define-fun field-extend-64-264((f Field))
  (_ BitVec 264)
  (bvadd
   (extend-64-264 (first f))
   (bvshl (extend-64-264 (second f)) (_ bv64 264))
   (bvshl (extend-64-264 (third f)) (_ bv128 264))
   (bvshl (extend-64-264 (fourth f)) (_ bv192 264))))

We implemented above the functions required to extend a field from 256 to 264 bits, using the SMT-LIB‘s concat statement. We then implement two functions that take two fields to add or subtract respectively using arbitrary precision arithmetic and the result of the uniform saturated limb schedule on these two fields, and compare the results, modulo p, as we explained earlier:

;; Check that Z3 and our addition algorithm find the same result modulo p
(define-fun self-test-add((field-one Field)
			  (field-two Field)
			  (result Field))
  Bool
  (let ((a264 (field-extend-64-264 result))
	(b264 (bvadd (field-extend-64-264 field-one)
 		     (field-extend-64-264 field-two))))
    (=
     (bvsmod a264 p)
     (bvsmod b264 p))))

;; Check that Z3 and our subtraction algorithm find the same result modulo p
(define-fun self-test-sub((field-one Field)
			  (field-two Field)
			  (result Field))
  Bool
  (let ((a264 (field-extend-64-264 result))
	(b264 (bvsub (field-extend-64-264 field-one)
 		     (field-extend-64-264 field-two))))
    (=
     (bvsmod a264 p)
     (bvsmod b264 p))))

Both functions above return true or false, depending on whether the results of the two methods of computation are equal or not. We are now ready to write out test cases! We first declare our test case variables:


;;;; "Queries" / Proofs
;; Below are sample "queries" - find a solution that satisfies formulas

;;; Setup
(declare-const x0 (_ BitVec 64))
(declare-const x1 (_ BitVec 64))
(declare-const x2 (_ BitVec 64))
(declare-const x3 (_ BitVec 64))
(declare-const y0 (_ BitVec 64))
(declare-const y1 (_ BitVec 64))
(declare-const y2 (_ BitVec 64))
(declare-const y3 (_ BitVec 64))

(declare-const d InternalCarryState)
(declare-const f1 Field)
(declare-const f2 Field)

(assert (= f1 (mk-field x0 x1 x2 x3)))
(assert (= f2 (mk-field y0 y1 y2 y3)))

;; save state
(push)

In the code snippet above, we declared two fields to operate on (f1 and f2), and the result of our uniform saturated limb schedule addition or subtraction (d). We also save the state so we can revert to it later using the (pop) expression. Now we write a first test case, to give a view of the possibilities:

;;; Addition

;; 1st query: find a solution where output (result of addition) is all 1s.
(echo "-- Field Addition: Find input where output all 1s (Prior to modulo reduction)")
(assert (= d (gf-add f1 f2)))
(assert (= (first (result d)) (_ bv18446744073709551615 64)))
(assert (= (second (result d)) (_ bv18446744073709551615 64)))
(assert (= (third (result d)) (_ bv18446744073709551615 64)))
(assert (= (fourth (result d)) (_ bv18446744073709551615 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

Above, we asked our Z3 SMT solver to find any input to our uniform saturated limb schedule addition that results in a field with a value with all bits set to 1. Furthermore, we asked the solver to determine whether the addition modulo p using the Go’s uniform saturated limb schedule algorithm and the arbitrary precision addition are equivalent. This is the response:

;;-- Field Addition: Find input where output all 1s (Prior to modulo reduction)
sat
((f1 (mk-field #xffffffffe1efc9bd
          #xfffffffffffffffe
          #x0000000000000001
          #x0000000000000000))
 (f2 (mk-field #x000000001e103642
          #x0000000000000001
          #xfffffffffffffffe
          #xffffffffffffffff))
 ((result d) (mk-field #xffffffffffffffff
          #xffffffffffffffff
          #xffffffffffffffff
          #xffffffffffffffff)))

Z3 found a model satisfying our query and provided values for this model, including for all the limbs of input fields f1 and f2. Now let’s query for models where the various carry variables are set to 1:

;; Find a solution where carry #0 is equal to 1
(echo "-- Field Addition: Find input where cc#0 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry0 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))


;; restore state
(pop)
(push)

;; Find a solution where last carry is equal to 1
(echo "-- Field Addition: Find input where cc#6 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry6 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))


;; restore state
(pop)
(push)
(echo "-- Field Subtraction: Find input where cc#1 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry1 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

Below, Z3 responds with sample results to our queries. This can be turned into unit test cases for implementations of Double-Odd (do255e):


;;-- Field Addition: Find input where cc#0 == 1 
sat
((f1 (mk-field #xffffffffe1efc9a5
          #x7ffffffffffffffe
          #x0000000000000001
          #x0000000000000000))
 (f2 (mk-field #x000000001e103662
          #x8000000000000001
          #xfffffffffffffffe
          #xfffffffffffffffd))
 ((result d) (mk-field #x0000000000000007
          #x0000000000000000
          #x0000000000000000
          #xfffffffffffffffe))

;;-- Field Addition: Find input where cc#6 == 1 
sat
((f1 (mk-field #xffffffffffffef2a
          #xffffffffffffffff
          #xffffffffffffffff
          #xffffffffffffffff))
 (f2 (mk-field #xffffffffffff8018
          #xffffffffffffffff
          #xffffffffffffffff
          #xffffffffffffffff))
 ((result d) (mk-field #x00000000000092ae
          #x0000000000000000
          #x0000000000000000
          #x0000000000000000)))

;;-- Field Subtraction: Find input where cc#1 == 1 
sat
((f1 (mk-field #xafffffffffffff2a
          #x0000000000000000
          #x7fffffffffffffff
          #x07ffdfffffffffff))
 (f2 (mk-field #xbfffffffffff6008
          #x0000000000000000
          #x8000000000000000
          #x7800000000000000))
 ((result d) (mk-field #xf000000000000d6c
          #xffffffffffffffff
          #xfffffffffffffffe
          #x8fffdffffffffffe)))

More tests are included at the end of this post. Let’s write two last tests. Why restrict ourselves to checking equivalence of operations on a restricted number of inputs. Can we ask Z3 to prove that there is absolutely no input for which the uniform saturated limb schedule algorithm and the arbitrary precision addition and subtraction are different? It turns out we can:

(echo "-- Field Subtraction: Find a solution mod p that differs from Z3 standard bitvector subtraction over a sufficiently large field. E.g. prove that our subtraction operation is correct. Should return unsat")
;; restore state
(pop)
(push)
(assert (= d (gf-sub f1 f2)))
(assert (= (self-test-sub f1 f2 (result d)) false))
(check-sat)


(echo "-- Field Addition: Find a solution mod p that differs from Z3 standard bitvector addition over a sufficiently large field. E.g. prove that our addition operation is correct. Should return unsat")
(pop)
(push)
(assert (= d (gf-add f1 f2)))
(assert (= (self-test-add f1 f2 (result d)) false))
(check-sat)

The above two tests should return unsat. It will take a few hours but this verification dramatically increased our confidence in the Double-Odd algorithms to perform addition or subtraction of fields using only four 64-bit limbs.

Conclusion

In this article, we used Z3, an SMT solver to reason about the QUIC Transport packet number decoding routine described in an early draft of the standard. In QUIC, the packet number is employed as a nonce to decrypt protected payloads and therefore must be decoded correctly to avert denial of service conditions. We generated input values that fail to decode correctly and that can be used to test QUIC implementations for defects.

We also evaluated the correctness of finite fields operations in the Go language implementation of the Double-Odd do255e elliptic curve, specifically of addition and subtraction using a uniform saturated limb schedule. We prove equivalence of the implementation of these operations with arbitrary precision arithmetic. We also generated unit test cases that can exercise fine-grained parts of the operations and specifically of carry operations, which are notoriously hard to implement correctly. These test cases can be incorporated by implementations in languages using the same algorithm.

Z3 and SMT solvers in general are useful tools to analyze and validate key aspects of cryptographic protocols and implementations. They are relatively straightforward to use, but as with everything, are not immune to defects, be it in their implementations or in the modeling of problems by users.

Resources

Z3 can be obtained from its GitHub home page.

The “Getting Started with Z3: A Guide” tutorial is a great place to start learning Z3. The “SAT/SMT by Example” book provides valuable examples on how to use different SAT and SMT solvers in different contexts.

Sample Code

decoding.smt2

The following SMT-LIB code implements a model for the QUIC Transport Draft RFC 17 (buggy) packet number decoding algorithm. It first validates that the model is consistent with the test data in the RFC. It then provides an example of input data required to trigger an overflow in the decoded packet number. This input data may then be employed to test QUIC implementations, as a unit test case or in a dynamic test.

(declare-const largest-pn  (_ BitVec 64))
(declare-const truncated-pn  (_ BitVec 64))
(declare-const pn-nbits  (_ BitVec 64))
(declare-const expected-pn  (_ BitVec 64))
(declare-const pn-win  (_ BitVec 64))
(declare-const pn-hwin  (_ BitVec 64))
(declare-const candidate-pn  (_ BitVec 64))
(declare-const pn-mask  (_ BitVec 64))
(declare-const result  (_ BitVec 64))

(assert (bvule truncated-pn (_ bv4294967295 64))) ; max 2**32 -1 bits 

(assert (bvule largest-pn (_ bv4611686018427387903 64))); max 2**62 -1 bits
(assert (bvuge largest-pn (_ bv0 64)))

(assert (or (= pn-nbits  (_ bv8 64))
	    (= pn-nbits  (_ bv16 64))
	    (= pn-nbits  (_ bv24 64))
	    (= pn-nbits  (_ bv32 64))))

;; ensure that truncated-pn is < 2^pn-bits and >= 2^(pn-bits-8)
;; This may not not be the case in an adverse scenario
(assert
 (and
  (bvult truncated-pn (bvshl (_ bv1 64) pn-nbits))
  (bvuge truncated-pn (bvshl (_ bv1 64) (bvsub pn-nbits (_ bv8 64))))))

(assert ( = expected-pn
	    (bvadd largest-pn (_ bv1 64))))

(assert( = pn-win
	   (bvshl (_ bv1 64) pn-nbits)))

(assert ( =  pn-hwin 
	     (bvlshr pn-win (_ bv1 64) )))

(assert (= pn-mask
	   (bvsub pn-win (_ bv1 64) )))

(assert (= candidate-pn
	   (bvor
	    (bvand expected-pn (bvnot pn-mask))
	    truncated-pn)))

(push) ;saving state
(echo "Draft 17 unit test case result")

(assert
 (or
  (and (bvule candidate-pn (bvsub expected-pn pn-hwin))
       (= result (bvadd candidate-pn pn-win)))
  (and (bvugt candidate-pn (bvadd expected-pn pn-hwin))
       (bvugt candidate-pn pn-win)
       (= result (bvsub candidate-pn pn-win)))
  (= result candidate-pn)))

(assert (and
	 (= largest-pn (_ bv2821665002 64))
	 (= pn-nbits (_ bv16 64))
	 (= truncated-pn (_ bv39730 64))
	 (= result (_ bv2821692210 64))))

(check-sat)
(get-model)

(pop)
(push)
;;;

(echo "Overflow model result")

(assert
 (or
  (and (bvule candidate-pn (bvsub expected-pn pn-hwin))
       (= result (bvadd candidate-pn pn-win)))
  (and (bvugt candidate-pn (bvadd expected-pn pn-hwin))
       (bvugt candidate-pn pn-win)
       (= result (bvsub candidate-pn pn-win)))
  (= result candidate-pn)))

;; We want result > 2**61 -1 (overflow)
;; but dont "cheat" by starting with packet number 2**61-1
(assert  (and (bvugt result (_ bv4611686018427387903 64))
	      (bvult largest-pn ( _ bv4611686018427387902 64))))

(check-sat)
(get-model)

field.smt2

The following SMT-LIB code implements a model for the Go implementation of the Double-Odd elliptic curve do255e field arithmetic addition and subtraction operations. It provides unit test cases data to ensure good coverage of the carry operations and that can be used by other implementations. It also demonstrates equivalence of these operations using uniform saturated limb schedule with operations using “simulated” arbitrary precision arithmetic.

;; Run with `z3 field.smt2`

;;;; Modeling Section
(declare-datatypes
 ()
 ;; CarryResult is a structure holding the result and carry flag of an operation
 ;; on a limb of 64 bits
 ((CarryResult
   (mk-carry-result (carry (_ BitVec 64))
		    (result (_ BitVec 64))))
  ;; Field is a struct holding 4 limbs of 64 bits
  (Field
   (mk-field
    (first (_ BitVec 64))
    (second (_ BitVec 64))
    (third (_ BitVec 64))
    (fourth (_ BitVec 64))))))

(declare-datatypes
 ()
 ;; InternalCarryState is a structure holding the intermediary and final state
 ;; of computation over fields. This is so we can prove / query / assert certain
 ;; state combinations
 ((InternalCarryState
   (mk-internal-carry-state
    (result Field)
    (carry0 (_ BitVec 64))
    (carry1 (_ BitVec 64))
    (carry2 (_ BitVec 64))
    (carry3 (_ BitVec 64))
    (carrymid (_ BitVec 64))
    (carry4 (_ BitVec 64))
    (carry5 (_ BitVec 64))
    (carry6 (_ BitVec 64))
    (d0-lhs (_ BitVec 64))
    (d0-rhs (_ BitVec 64))))))

;; Set mq and p
(declare-const mq (_ BitVec 64))
(assert (= mq (_ bv18651 64)))
(declare-const p (_ BitVec 264))
(assert (= p (_ bv57896044618658097711785492504343953926634992332820282019728792003956564801317 264)))

;; add/carry 64 bits function
(define-fun addcarry((x (_ BitVec 64))
		     (y (_ BitVec 64))
		     (cc (_ BitVec 64)))
  CarryResult
  (let ((d (bvadd x y cc)))
    (let ((ccout
	   (bvlshr 
	    (bvor
	     (bvand x y)
	     (bvand (bvor x y)
		    (bvnot d)))
	    (_ bv63 64))))
      (mk-carry-result ccout d))))

;; sub/borrow 64 bits function
(define-fun subborrow((x (_ BitVec 64))
		      (y (_ BitVec 64))
		      (cc (_ BitVec 64)))
  CarryResult
  (let ((d (bvsub (bvsub x y) cc)))
    (let ((ccout
	   (bvlshr 
	    (bvor
	     (bvand (bvnot x) y)
	     (bvand (bvnot (bvxor x y))
		    d))
	    (_ bv63 64))))
      (mk-carry-result ccout d))))


;; gf add function with mq := 18_651
(define-fun gf-add ((field-one Field) (field-two Field))
  InternalCarryState
  (let
      ((x0 (first field-one))
       (x1 (second field-one))
       (x2 (third field-one))
       (x3 (fourth field-one))
       (y0 (first field-two))
       (y1 (second field-two))
       (y2 (third field-two))
       (y3 (fourth field-two)))
    (let ((result0 (addcarry x0 y0 (_ bv0 64))))
      (let ((result1 (addcarry x1 y1 (carry result0))))
	(let ((result2 (addcarry x2 y2 (carry result1))))
	  (let ((result3 (addcarry x3 y3 (carry result2))))
	    (let ((resultmid
		   (addcarry (result result0)
			     (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result3)))
			     (_ bv0 64))))
	      (let ((result4 (addcarry (result result1) (_ bv0 64) (carry resultmid))))
		(let (( result5 (addcarry (result result2) (_ bv0 64) (carry result4))))
		  (let ((result6 (addcarry (result result3) (_ bv0 64) (carry result5))))
		    (let ((d0-internal-rhs (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result6)))))
		      (let ((d0
			     (bvadd
			      (result resultmid)
			      d0-internal-rhs)))
			(let ((d1 (result result4)))
			  (let ((d2 (result result5)))
			    (let ((d3 (result result6)))
			      (mk-internal-carry-state
			       (mk-field d0 d1 d2 d3)
			       (carry result0)
			       (carry result1)
			       (carry result2)
			       (carry result3)
			       (carry resultmid)
			       (carry result4)
			       (carry result5)
			       (carry result6)
			       (result resultmid)
			       d0-internal-rhs))))))))))))))))


;; gf sub function with mq := 18_651
(define-fun gf-sub ((field-one Field) (field-two Field))
  InternalCarryState
  (let
      ((x0 (first field-one))
       (x1 (second field-one))
       (x2 (third field-one))
       (x3 (fourth field-one))
       (y0 (first field-two))
       (y1 (second field-two))
       (y2 (third field-two))
       (y3 (fourth field-two)))
    (let ((result0 (subborrow x0 y0 (_ bv0 64))))
      (let ((result1 (subborrow x1 y1 (carry result0))))
	(let ((result2 (subborrow x2 y2 (carry result1))))
	  (let ((result3 (subborrow x3 y3 (carry result2))))
	    (let ((resultmid
		   (subborrow (result result0)
			      (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result3)))
			      (_ bv0 64))))
	      (let ((result4 (subborrow (result result1) (_ bv0 64) (carry resultmid))))
		(let (( result5 (subborrow (result result2) (_ bv0 64) (carry result4))))
		  (let ((result6 (subborrow (result result3) (_ bv0 64) (carry result5))))
		    (let ((d0-internal-rhs (bvand (bvshl mq (_ bv1 64)) (bvneg (carry result6)))))
		      (let ((d0
			     (bvsub
			      (result resultmid)
			      d0-internal-rhs)))
			(let ((d1 (result result4)))
			  (let ((d2 (result result5)))
			    (let ((d3 (result result6)))
			      (mk-internal-carry-state
			       (mk-field d0 d1 d2 d3)
			       (carry result0)
			       (carry result1)
			       (carry result2)
			       (carry result3)
			       (carry resultmid)
			       (carry result4)
			       (carry result5)
			       (carry result6)
			       (result resultmid)
			       d0-internal-rhs))))))))))))))))

;; Extend a 64 bits vector to 264bits (+200 bits). Used to compare results of field
;; operations between our implemented algorithm and Z3 normal operations on
;; vectors, which must be bigger than 264 bits to cater for overflows.
(define-fun extend-64-264((x (_ BitVec 64)))
  (_ BitVec 264)
  (concat #x00000000000000000000000000000000000000000000000000 x))

;; Apply extend-64-264 to each limb of a field
(define-fun field-extend-64-264((f Field))
  (_ BitVec 264)
  (bvadd
   (extend-64-264 (first f))
   (bvshl (extend-64-264 (second f)) (_ bv64 264))
   (bvshl (extend-64-264 (third f)) (_ bv128 264))
   (bvshl (extend-64-264 (fourth f)) (_ bv192 264))))


;; Check that Z3 and our addition algorithm find the same result modulo p
(define-fun self-test-add((field-one Field)
			  (field-two Field)
			  (result Field))
  Bool
  (let ((a264 (field-extend-64-264 result))
	(b264 (bvadd (field-extend-64-264 field-one)
 		     (field-extend-64-264 field-two))))
    (=
     (bvsmod a264 p)
     (bvsmod b264 p))))

;; Check that Z3 and our substraction algorithm find the same result modulo p
(define-fun self-test-sub((field-one Field)
			  (field-two Field)
			  (result Field))
  Bool
  (let ((a264 (field-extend-64-264 result))
	(b264 (bvsub (field-extend-64-264 field-one)
 		     (field-extend-64-264 field-two))))
    (=
     (bvsmod a264 p)
     (bvsmod b264 p))))

;;;; "Queries" / Proofs
;; Below are sample "queries" - find a solution that satisfies formulas

;;; Setup
(declare-const x0 (_ BitVec 64))
(declare-const x1 (_ BitVec 64))
(declare-const x2 (_ BitVec 64))
(declare-const x3 (_ BitVec 64))
(declare-const y0 (_ BitVec 64))
(declare-const y1 (_ BitVec 64))
(declare-const y2 (_ BitVec 64))
(declare-const y3 (_ BitVec 64))

(declare-const d InternalCarryState)
(declare-const f1 Field)
(declare-const f2 Field)

(assert (= f1 (mk-field x0 x1 x2 x3)))
(assert (= f2 (mk-field y0 y1 y2 y3)))

;; save state
(push)

;;; Addition

;; 1st query: find a solution where output (result of addition) is all 1s.
(echo "-- Field Addition: Find input where output all 1s (Prior to modulo reduction)")
(assert (= d (gf-add f1 f2)))
(assert (= (first (result d)) (_ bv18446744073709551615 64)))
(assert (= (second (result d)) (_ bv18446744073709551615 64)))
(assert (= (third (result d)) (_ bv18446744073709551615 64)))
(assert (= (fourth (result d)) (_ bv18446744073709551615 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

;; Find a solution where carry #0 is equal to 1
(echo "-- Field Addition: Find input where cc#0 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry0 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)
(echo "-- Field Addition: Find input where cc#1 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry1 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Addition: Find input where cc#2 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry2 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Addition: Find input where cc#3 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry3 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Addition: Find input where cc#mid == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carrymid d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Addition: Find input where cc#4 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry4 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Addition: Find input where cc#5 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry5 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

;; Find a solution where last carry is equal to 1
(echo "-- Field Addition: Find input where cc#6 == 1 ")
(assert (= d (gf-add f1 f2)))
(assert (= (carry6 d) (_ bv1 64)))
(assert (= (self-test-add f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

;; Find a solution where `(mq << 1)   -cc` + `d[0]` overflows. Should return no
;; solution (unsat).
(echo "-- Field Addition: Find a solution where `(mq << 1)   -cc` + `d[0]` overflows. Should return no solution (unsat)")
(assert (= d (gf-add f1 f2)))
(assert ( = 
 	  (_ bv1 64)
 	  (carry
 	   (addcarry (d0-lhs d)
 		     (d0-rhs d)
 		     (_ bv0 64)))))
(assert (= (self-test-add f1 f2 (result d)) true))
;; run
(check-sat)
;;(get-model)
;;(get-value (f1 f2 (result d)))

;;; Substractions tests

;; restore state
(pop)
(push)

;; Find a solution where carry #0 is equal to 1
(echo "-- Field Substraction: Find input where cc#0 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry0 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)
(echo "-- Field Substraction: Find input where cc#1 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry1 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Substraction: Find input where cc#2 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry2 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Substraction: Find input where cc#3 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry3 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Substraction: Find input where cc#mid == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carrymid d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Substraction: Find input where cc#4 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry4 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

(echo "-- Field Substraction: Find input where cc#5 == 1 ")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry5 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

;; Find a solution where last carry is equal to 1
(echo "-- Field Substraction: Find input where cc == 1 in `d[0] -= (mq << 1)   -cc`")
(assert (= d (gf-sub f1 f2)))
(assert (= (carry6 d) (_ bv1 64)))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
(get-value (f1 f2 (result d)))

;; restore state
(pop)
(push)

;; Find a solution where `(mq << 1)   -cc` + `d[0]` overflows. Should return no
;; solution (unsat).
(echo "-- Field Substraction: Find a solution where `d[0] -= (mq << 1)   -cc` overflows. Should return no solution (unsat)")
(assert (= d (gf-sub f1 f2)))
(assert ( = 
	  (_ bv1 64)
	  (carry
	   (subborrow (d0-lhs d)
		     (d0-rhs d)
		     (_ bv0 64)))))
(assert (= (self-test-sub f1 f2 (result d)) true))

;; run
(check-sat)
;;(get-model)
;(get-value (f1 f2 (result d)))


;;;; End


(echo "-- Field Substraction: Find a solution mod p that differs from Z3 standard bitvector substraction over a sufficiently large field. E.g. prove that our substraction operation is correct. Should return unsat")
;; restore state
(pop)
(push)
(assert (= d (gf-sub f1 f2)))
(assert (= (self-test-sub f1 f2 (result d)) false))
(check-sat)



(echo "-- Field Addition: Find a solution mod p that differs from Z3 standard bitvector addition over a sufficiently large field. E.g. prove that our addition operation is correct. Should return unsat")
(pop)
(push)
(assert (= d (gf-add f1 f2)))
(assert (= (self-test-add f1 f2 (result d)) false))
(check-sat)

Author

Gérald Doussot (@gerald_doussot) , NCC Group Cryptography Services

Thanks

Many thanks to my colleagues Paul Bottinelli and Eli Sohl for their thorough and insightful reviews!

  1. Arbitrary-precision arithmetic is typically not constrained by the underlying representation of numbers, but by the amount of resources, including memory, a computer has.
  2. The “Using TLS to Secure QUIC” Draft 33 RFC, in section “Header Protection Timing Side-Channels“, states that the process of recovering the packet number should be free from side-channels; however, the sample decoding algorithm is not constant-time.