270-newhope-hybrid-handshake - Tor design proposals

Filename: 270-newhope-hybrid-handshake.txt
Title: RebelAlliance: A Post-Quantum Secure Hybrid Handshake Based on NewHope
Author: Isis Lovecruft, Peter Schwabe
Created: 16 Apr 2016
Updated: 22 Jul 2016
Status: Obsolete
Depends: prop#220 prop#249 prop#264 prop#270

§0. Introduction

  RebelAlliance is a post-quantum secure hybrid handshake, comprised of an
  alliance between the X25519 and NewHope key exchanges.

  NewHope is a post-quantum-secure lattice-based key-exchange protocol based
  on the ring-learning-with-errors (Ring-LWE) problem.  We propose a hybrid
  handshake for Tor, based on a combination of Tor's current NTor handshake
  and a shared key derived through a NewHope ephemeral key exchange.

  For further details on the NewHope key exchange, the reader is referred to
  "Post-quantum key exchange - a new hope" by Alkim, Ducas, Pöppelmann, and
  Schwabe [0][1].

  For the purposes of brevity, we consider that NTor is currently the only
  handshake protocol in Tor; the older TAP protocol is ignored completely, due
  to the fact that it is currently deprecated and nearly entirely unused.


§1. Motivation

  An attacker currently monitoring and storing circuit-layer NTor handshakes
  who later has the ability to run Shor's algorithm on a quantum computer will
  be able to break Tor's current handshake protocol and decrypt previous
  communications.

  It is unclear if and when such attackers equipped with large quantum
  computers will exist, but various estimates by researchers in quantum
  physics and quantum engineering give estimates of only 1 to 2 decades.
  Clearly, the security requirements of many Tor users include secrecy of
  their messages beyond this time span, which means that Tor needs to update
  the key exchange to protect against such attackers as soon as possible.


§2. Design

  An initiator and responder, in parallel, conduct two handshakes:

  - An X25519 key exchange, as described in the description of the NTor
    handshake in Tor proposal #216.
  - A NewHope key exchange.

  The shared keys derived from these two handshakes are then concatenated and
  used as input to the SHAKE-256 extendable output function (XOF), as described
  in FIPS-PUB-202 [2], in order to produce a shared key of the desired length.
  The testvectors in §C assume that this key has a length of 32 bytes, but the
  use of a XOF allows arbitrary lengths to easily support future updates of
  the symmetric primitives using the key. See also §3.3.1.


§3. Specification

§3.1. Notation

  Let `a || b` be the concatenation of a with b.

  Let `a^b` denote the exponentiation of a to the bth power.

  Let `a == b` denote the equality of a with b, and vice versa.

  Let `a := b` be the assignment of the value of b to the variable a.

  Let `H(x)` be 32-bytes of output of the SHAKE-256 XOF (as described in
  FIPS-PUB-202) applied to message x.

  Let X25519 refer to the curve25519-based key agreement protocol described
  in RFC7748 §6.1. [3]

  Let `EXP(a, b) == X25519(., b, a)` with `g == 9`. Let X25519_KEYGEN() do
  the appropriate manipulations when generating the secret key (clearing the
  low bits, twidding the high bits).  Additionally, EXP() MUST include the
  check for all-zero output due to the input point being of small
  order (cf. RFC7748 §6).

  Let `X25519_KEYID(B) == B` where B is a valid X25519 public key.

  When representing an element of the Curve25519 subgroup as a byte string,
  use the standard (32-byte, little-endian, x-coordinate-only) representation
  for Curve25519 points.

  Let `ID` be a router's identity key taken from the router microdescriptor.
  In the case for relays possessing Ed25519 identity keys (cf. Tor proposal
  #220), this is a 32-byte string representing the public Ed25519 identity key.
  For backwards and forwards compatibility with routers which do not possess
  Ed25519 identity keys, this is a 32-byte string created via the output of
  H(ID).

  We refer to the router as the handshake "responder", and the client (which
  may be an OR or an OP) as the "initiator".


  ID_LENGTH      [32 bytes]
  H_LENGTH       [32 bytes]
  G_LENGTH       [32 bytes]

  PROTOID  :=    "pqtor-x25519-newhope-shake256-1"
  T_MAC    :=    PROTOID || ":mac"
  T_KEY    :=    PROTOID || ":key_extract"
  T_VERIFY :=    PROTOID || ":verify"

  (X25519_SK, X25519_PK) := X25519_KEYGEN()


§3.2. Protocol

 ========================================================================================
 |                                                                                      |
 | Fig. 1: The NewHope-X25519 Hybrid Handshake.                                         |
 |                                                                                      |
 | Before the handshake the Initiator is assumed to know Z, a public X25519 key for     |
 | the Responder, as well as the Responder's ID.                                        |
 ----------------------------------------------------------------------------------------
 |                                                                                      |
 | Initiator                             Responder                                      |
 |                                                                                      |
 | SEED         := H(randombytes(32))                                                   |
 | x, X         := X25519_KEYGEN()                                                      |
 | a, A         := NEWHOPE_KEYGEN(SEED)                                                 |
 | CLIENT_HDATA := ID || Z || X || A                                                    |
 |                                                                                      |
 |               --- CLIENT_HDATA --->                                                  |
 |                                                                                      |
 |                                       y, Y           := X25519_KEYGEN()              |
 |                                       NTOR_KEY, AUTH := NTOR_SHAREDB(X,y,Y,z,Z,ID,B) |
 |                                       M, NEWHOPE_KEY := NEWHOPE_SHAREDB(A)           |
 |                                       SERVER_HDATA   := Y || AUTH || M               |
 |                                       sk := SHAKE-256(NTOR_KEY || NEWHOPE_KEY)       |
 |                                                                                      |
 |               <-- SERVER_HDATA ----                                                  |
 |                                                                                      |
 | NTOR_KEY    := NTOR_SHAREDA(x, X, Y, Z, ID, AUTH)                                    |
 | NEWHOPE_KEY := NEWHOPE_SHAREDA(M, a)                                                 |
 | sk := SHAKE-256(NTOR_KEY || NEWHOPE_KEY)                                             |
 |                                                                                      |
 ========================================================================================


§3.2.1. The NTor Handshake

§3.2.1.1. Prologue

  Take a router with identity ID. As setup, the router generates a secret key z,
  and a public onion key Z with:

    z, Z := X25519_KEYGEN()

  The router publishes Z in its server descriptor in the "ntor-onion-key" entry.
  Henceforward, we refer to this router as the "responder".


§3.2.1.2. Initiator

  To send a create cell, the initiator generates a keypair:

    x, X := X25519_KEYGEN()

  and creates the NTor portion of a CREATE2V cell's HDATA section:

    CLIENT_NTOR    := ID || Z || X                   [96 bytes]

  The initiator includes the responder's ID and Z in the CLIENT_NTOR so that, in
  the event the responder OR has recently rotated keys, the responder can
  determine which keypair to use.

  The initiator then concatenates CLIENT_NTOR with CLIENT_NEWHOPE (see §3.2.2),
  to create CLIENT_HDATA, and creates and sends a CREATE2V cell (see §A.1)
  to the responder.

    CLIENT_NEWHOPE                                   [1824 bytes]  (see §3.2.2)
    CLIENT_HDATA   := CLIENT_NTOR || CLIENT_NEWHOPE  [1920 bytes]

  If the responder does not respond with a CREATED2V cell, the initiator SHOULD
  NOT attempt to extend the circuit through the responder by sending fragmented
  EXTEND2 cells, since the responder's lack of support for CREATE2V cells is
  assumed to imply the responder also lacks support for fragmented EXTEND2
  cells.  Alternatively, for initiators with a sufficiently late consensus
  method, the initiator MUST check that "proto" line in the responder's
  descriptor (cf. Tor proposal #264) advertises support for the "Relay"
  subprotocol version 3 (see §5).


§3.2.1.3. Responder

  The responder generates a keypair of y, Y = X25519_KEYGEN(), and does
  NTOR_SHAREDB() as follows:

  (NTOR_KEY, AUTH) ← NTOR_SHAREDB(X, y, Y, z, Z, ID, B):
    secret_input := EXP(X, y) || EXP(X, z) || ID || B || Z || Y || PROTOID
    NTOR_KEY     := H(secret_input, T_KEY)
    verify       := H(secret_input, T_VERIFY)
    auth_input   := verify || ID || Z || Y || X || PROTOID || "Server"
    AUTH         := H(auth_input, T_MAC)

  The responder sends a CREATED2V cell containing:

    SERVER_NTOR    := Y || AUTH                      [64 bytes]
    SERVER_NEWHOPE                                   [2048 bytes]  (see §3.2.2)
    SERVER_HDATA   := SERVER_NTOR || SERVER_NEWHOPE  [2112 bytes]

  and sends this to the initiator.


§3.2.1.4. Finalisation

  The initiator then checks Y is in G^* [see NOTE below], and does
  NTOR_SHAREDA() as follows:

  (NTOR_KEY) ← NTOR_SHAREDA(x, X, Y, Z, ID, AUTH)
    secret_input := EXP(Y, x) || EXP(Z, x) || ID || Z || X || Y || PROTOID
    NTOR_KEY     := H(secret_input, T_KEY)
    verify       := H(secret_input, T_VERIFY)
    auth_input   := verify || ID || Z || Y || X || PROTOID || "Server"
    if AUTH == H(auth_input, T_MAC)
       return NTOR_KEY

  Both parties now have a shared value for NTOR_KEY.  They expand this into
  the keys needed for the Tor relay protocol.

  [XXX We think we want to omit the final hashing in the production of NTOR_KEY
  here, and instead put all the inputs through SHAKE-256. --isis, peter]

  [XXX We probably want to remove ID and B from the input to the shared key
  material, since they serve for authentication but, as pre-established
  "prologue" material to the handshake, they should not be used in attempts to
  strengthen the cryptographic suitability of the shared key.  Also, their
  inclusion is implicit in the DH exponentiations.  I should probably ask Ian
  about the reasoning for the original design choice.  --isis]


§3.2.2. The NewHope Handshake

§3.2.2.1. Parameters & Mathematical Structures

  Let ℤ be the ring of rational integers. Let ℤq, for q ≥ 1, denote the quotient
  ring ℤ/qℤ.  We define R = ℤ[X]/((X^n)+1) as the ring of integer polynomials
  modulo ((X^n)+1), and Rq = ℤq[X]/((X^n)+1) as the ring of integer polynomials
  modulo ((X^n)+1) where each coefficient is reduced modulo q. When we refer to
  a polynomial, we mean an element of Rq.

    n := 1024
    q := 12289

    SEED         [32 Bytes]
    NEWHOPE_POLY [1792 Bytes]
    NEWHOPE_REC  [256 Bytes]
    NEWHOPE_KEY  [32 Bytes]

    NEWHOPE_MSGA := (NEWHOPE_POLY || SEED)
    NEWHOPE_MSGB := (NEWHOPE_POLY || NEWHOPE_REC)


§3.2.2.2. High-level Description of Newhope API Functions

  For a description of internal functions, see §B.

    (NEWHOPE_POLY, NEWHOPE_MSGA) ← NEWHOPE_KEYGEN(SEED):
        â    := gen_a(seed)
        s    := poly_getnoise()
        e    := poly_getnoise()
        ŝ    := poly_ntt(s)
        ê    := poly_ntt(e)
        b̂    := pointwise(â, ŝ) + ê
        sp   := poly_tobytes(ŝ)
        bp   := poly_tobytes(b̂)
        return (sp, (bp || seed))

    (NEWHOPE_MSGB, NEWHOPE_KEY) ← NEWHOPE_SHAREDB(NEWHOPE_MSGA):
        s'   := poly_getnoise()
        e'   := poly_getnoise()
        e"   := poly_getnoise()
        b̂    := poly_frombytes(bp)
        â    := gen_a(seed)
        ŝ'   := poly_ntt(s')
        ê'   := poly_ntt(e')
        û    := poly_pointwise(â, ŝ') + ê'
        v    := poly_invntt(poly_pointwise(b̂,ŝ')) + e"
        r    := helprec(v)
        up   := poly_tobytes(û)
        k    := rec(v, r)
        return ((up || r), k)

    NEWHOPE_KEY ← NEWHOPE_SHAREDA(NEWHOPE_MSGB, NEWHOPE_POLY):
        û    := poly_frombytes(up)
        ŝ    := poly_frombytes(sp)
        v'   := poly_invntt(poly_pointwise(û, ŝ))
        k    := rec(v', r)
        return k

  When a client uses a SEED within a CREATE2V cell, the client SHOULD NOT use
  that SEED in any other CREATE2V or EXTEND2 cells.  See §4 for further
  discussion.


§3.3. Key Expansion

  The client and server derive a shared key, SHARED, by:

    HKDFID := "THESE ARENT THE DROIDS YOURE LOOKING FOR"
    SHARED := SHAKE_256(HKDFID || NTorKey || NewHopeKey)


§3.3.1. Note on the Design Choice

  The reader may wonder why one would use SHAKE-256 to produce a 256-bit
  output, since the security strength in bits for SHAKE-256 is min(d/2,256)
  for collision resistance and min(d,256) for first- and second-order
  preimages, where d is the output length.

  The reasoning is that we should be aiming for 256-bit security for all of
  our symmetric cryptography.  One could then argue that we should just use
  SHA3-256 for the KDF.  We choose SHAKE-256 instead in order to provide an
  easy way to derive longer shared secrets in the future without requiring a
  new handshake.  The construction is odd, but the future is bright.
  As we are already using SHAKE-256 for the 32-byte output hash, we are also
  using it for all other 32-byte hashes involved in the protocol. Note that
  the only difference between SHA3-256 and SHAKE-256 with 32-byte output is
  one domain-separation byte.

  [XXX why would you want 256-bit security for the symmetric side? Are you
  talking pre- or post-quantum security? --peter]


§4. Security & Anonymity Implications

  This handshake protocol is one-way authenticated.  That is, the server is
  authenticated, while the client remains anonymous.

  The client MUST NOT cache and reuse SEED.  Doing so gives non-trivial
  adversarial advantages w.r.t. all-for-the-price-of-one attacks during the
  caching period.  More importantly, if the SEED used to generate NEWHOPE_MSGA
  is reused for handshakes along the same circuit or multiple different
  circuits, an adversary conducting a sybil attack somewhere along the path(s)
  will be able to correlate the identity of the client across circuits or
  hops.


§5. Compatibility

  Because our proposal requires both the client and server to send more than
  the 505 bytes possible within a CREATE2 cell's HDATA section, it depends
  upon the implementation of a mechanism for allowing larger CREATE cells
  (cf. Tor proposal #249).

  We reserve the following handshake type for use in CREATE2V/CREATED2V and
  EXTEND2V/EXTENDED2V cells:

    0x0003            [NEWHOPE + X25519 HYBRID HANDSHAKE]

  We introduce a new sub-protocol number, "Relay=3", (cf. Tor proposal #264
  §5.3) to signify support this handshake, and hence for the CREATE2V and
  fragmented EXTEND2 cells which it requires.

  There are no additional entries or changes required within either router
  descriptors or microdescriptors to support this handshake method, due to the
  NewHope keys being ephemeral and derived on-the-fly, and due to the NTor X25519
  public keys already being included within the "ntor-onion-key" entry.

  Add a "UseNewHopeKEX" configuration option and a corresponding consensus
  parameter to control whether clients prefer using this NewHope hybrid
  handshake or some previous handshake protocol.  If the configuration option
  is "auto", clients SHOULD obey the consensus parameter.  The default
  configuration SHOULD be "auto" and the consensus value SHOULD initially be "0".


§6. Implementation

  The paper by Alkim, Ducas, Pöppelmann and Schwabe describes two software
  implementations of NewHope, one C reference implementation and an optimized
  implementation using AVX2 vector instructions. Those implementations are
  available at [1].

  Additionally, there are implementations in Go by Yawning Angel, available
  from [4] and in Rust by Isis Lovecruft, available from [5].

  The software used to generate the test vectors in §C is based on the C
  reference implementation and available from:

  https://code.ciph.re/isis/newhope-tor-testvectors
  https://github.com/isislovecruft/newhope-tor-testvectors


§7. Performance & Scalability

  The computationally expensive part in the current NTor handshake is the
  X25519 key-pair generation and the X25519 shared-key computation. The
  current implementation in Tor is a wrapper to support various highly optimized
  implementations on different architectures. On Intel Haswell processors, the
  fastest implementation of X25519, as reported by the eBACS benchmarking
  project [6], takes 169920 cycles for key-pair generation and 161648 cycles
  for shared-key computation; these add up to a total of 331568 cycles on each
  side (initiator and responder).

  The C reference implementation of NewHope, also benchmarked on Intel
  Haswell, takes 358234 cycles for the initiator and 402058 cycles for the
  Responder. The core computation of the proposed combination of NewHope and
  X25519 will thus mean a slowdown of about a factor of 2.1 for the Initiator
  and a slowdown by a factor of 2.2 for the Responder compared to the current
  NTor handshake. These numbers assume a fully optimized implementation of the
  NTor handshake and a C reference implementation of NewHope. With optimized
  implementations of NewHope, such as the one for Intel Haswell described in
  [0], the computational slowdown will be considerably smaller than a factor
  of 2.


§8. References

[0]: https://cryptojedi.org/papers/newhope-20160328.pdf
[1]: https://cryptojedi.org/crypto/#newhope
[2]: http://www.nist.gov/customcf/get_pdf.cfm?pub_id=919061
[3]: https://tools.ietf.org/html/rfc7748#section-6.1
[4]: https://github.com/Yawning/newhope
[5]: https://code.ciph.re/isis/newhopers
[6]: http://bench.cr.yp.to


§A. Cell Formats

§A.1. CREATE2V Cells

  The client portion of the handshake should send CLIENT_HDATA, formatted
  into a CREATE2V cell as follows:

    CREATE2V {                                              [2114 bytes]
      HTYPE   := 0x0003                                     [2 bytes]
      HLEN    := 0x0780                                     [2 bytes]
      HDATA   := CLIENT_HDATA                               [1920 bytes]
      IGNORED := 0x00                                       [194 bytes]
    }

  [XXX do we really want to pad with IGNORED to make CLIENT_HDATA the
  same number of bytes as SERVER_HDATA? --isis]

§A.2. CREATED2V Cells

  The server responds to the client's CREATE2V cell with SERVER_HDATA,
  formatted into a CREATED2V cell as follows:

    CREATED2V {                                             [2114 bytes]
      HLEN    := 0x0800                                     [2 bytes]
      HDATA   := SERVER_HDATA                               [2112 bytes]
      IGNORED := 0x00                                       [0 bytes]
    }

§A.3. Fragmented EXTEND2 Cells

  When the client wishes to extend a circuit, the client should fragment
  CLIENT_HDATA into four EXTEND2 cells:

    EXTEND2 {
      NSPEC := 0x02 {                                     [1 byte]
        LINK_ID_SERVER                                    [22 bytes] XXX
        LINK_ADDRESS_SERVER                               [8 bytes]  XXX
      }
      HTYPE := 0x0003                                     [2 bytes]
      HLEN  := 0x0780                                     [2 bytes]
      HDATA := CLIENT_HDATA[0,461]                        [462 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := CLIENT_HDATA[462,954]                      [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := CLIENT_HDATA[955,1447]                     [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := CLIENT_HDATA[1448,1919] || 0x00[20]        [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := 0x00[172]                                  [172 bytes]
    }

  The client sends this to the server to extend the circuit from, and that
  server should format the fragmented EXTEND2 cells into a CREATE2V cell, as
  described in §A.1.

§A.4. Fragmented EXTENDED2 Cells

    EXTENDED2 {
      NSPEC := 0x02 {                                     [1 byte]
        LINK_ID_SERVER                                    [22 bytes] XXX
        LINK_ADDRESS_SERVER                               [8 bytes]  XXX
      }
      HTYPE := 0x0003                                     [2 bytes]
      HLEN  := 0x0800                                     [2 bytes]
      HDATA := SERVER_HDATA[0,461]                        [462 bytes]
    }
    EXTENDED2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := SERVER_HDATA[462,954]                      [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := SERVER_HDATA[955,1447]                     [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := SERVER_HDATA[1448,1939]                    [492 bytes]
    }
    EXTEND2 {
      NSPEC := 0x00                                       [1 byte]
      HTYPE := 0xFFFF                                     [2 bytes]
      HLEN  := 0x0000                                     [2 bytes]
      HDATA := SERVER_HDATA[1940,2112]                    [172 bytes]
    }


§B. NewHope Internal Functions

  gen_a(SEED):                  returns a uniformly random poly
  poly_getnoise():              returns a poly sampled from a centered binomial
  poly_ntt(poly):               number-theoretic transform; returns a poly
  poly_invntt(poly):            inverse number-theoretic transform; returns a poly
  poly_pointwise(poly, poly):   pointwise multiplication; returns a poly
  poly_tobytes(poly):           packs a poly to a NEWHOPE_POLY byte array
  poly_frombytes(NEWHOPE_POLY): unpacks a NEWHOPE_POLY byte array to a poly

  helprec(poly):                returns a NEWHOPE_REC byte array
  rec(poly, NEWHOPE_REC):       returns a NEWHOPE_KEY


  --- Description of the Newhope internal functions ---

  gen_a(SEED seed) receives as input a 32-byte (public) seed.  It expands
  this seed through SHAKE-128 from the FIPS202 standard. The output of SHAKE-128
  is considered a sequence of 16-bit little-endian integers. This sequence is
  used to initialize the coefficients of the returned polynomial from the least
  significant (coefficient of X^0) to the most significant (coefficient of
  X^1023) coefficient. For each of the 16-bit integers first eliminate the
  highest two bits (to make it a 14-bit integer) and then use it as the next
  coefficient if it is smaller than q=12289.
  Note that the amount of output required from SHAKE to initialize all 1024
  coefficients of the polynomial varies depending on the input seed.
  Note further that this function does not process any secret data and thus does
  not need any timing-attack protection.


  poly_getnoise() first generates 4096 bytes of uniformly random data. This can
  be done by reading these bytes from the system's RNG; efficient
  implementations will typically only read a 32-byte seed from the system's RNG
  and expand it through some fast PRG (for example, ChaCha20 or AES-256 in CTR
  mode). The output of the PRG is considered an array of 2048 16-bit integers
  r[0],...,r[2047]. The coefficients of the output polynomial are computed as
  HW(r[0])-HW(r[1]), HW(r[2])-HW(r[3]),...,HW(r[2046])-HW(r[2047]), where HW
  stands for Hamming weight.
  Note that the choice of RNG is a local decision; different implementations are
  free to use different RNGs.
  Note further that the output of this function is secret; the PRG (and the
  computation of HW) need to be protected against timing attacks.


  poly_ntt(poly f): For a mathematical description of poly_ntt see the [0]; a
  pseudocode description of a very naive in-place transformation of an input
  polynomial f = f[0] + f[1]*X + f[2]*X^2 + ... + f[1023]*X^1023 is the
  following code (all arithmetic on coefficients performed modulo q):

    psi   = 7
    omega = 49

    for i in range(0,n):
      t[i] = f[i] * psi^i

    for i in range(0,n):
      f[i] = 0
      for j in range(0,n):
        f[i] += t[j] * omega^((i*j)%n)

  Note that this is not how poly_ntt should be implemented if performance is
  an issue; in particular, efficient algorithms for the number-theoretic
  transform take time O(n*log(n)) and not O(n^2)
  Note further that all arithmetic in poly_ntt has to be protected against
  timing attacks.


  poly_invntt(poly f): For a mathematical description of poly_invntt see the
  [0]; a pseudocode description of a very naive in-place transformation of an
  input polynomial f = f[0] + f[1]*X + f[2]*X^2 + ... + f[1023]*X^1023 is the
  following code (all arithmetic on coefficients performed modulo q):

    invpsi = 8778;
    invomega = 1254;
    invn = 12277;

    for i in range(0,n):
      t[i] = f[i];

    for i in range(0,n):
      f[i]=0;
      for j in range(0,n):
        f[i] += t[j] * invomega^((i*j)%n)
      f[i] *= invpsi^i
      f[i] *= invn

  Note that this is not how poly_invntt should be implemented if performance
  is an issue; in particular, efficient algorithms for the inverse
  number-theoretic transform take time O(n*log(n)) and not O(n^2)
  Note further that all arithmetic in poly_invntt has to be protected against
  timing attacks.


  poly_pointwise(poly f, poly g) performs pointwise multiplication of the two
  polynomials.  This means that for f = (f0 + f1*X + f2*X^2 + ... +
  f1023*X^1023) and g = (g0 + g1*X + g2*X^2 + ... + g1023*X^1023) it computes
  and returns h = (h0 + h1*X + h2*X^2 + ... + h1023*X^1023) with h0 = f0*g0,
  h1 = f1*g1,..., h1023 = f1023*g1023.


  poly_tobytes(poly f) first reduces all coefficents of f modulo q, i.e.,
  brings them to the interval [0,q-1]. Denote these reduced coefficients as
  f0,..., f1023; note that they all fit into 14 bits. The function then packs
  those coefficients into an array of 1792 bytes r[0],..., r[1792] in "packed
  little-endian representation", i.e.,
  r[0]     = f[0] & 0xff;
  r[1]     = (f[0] >>  8) & ((f[1] & 0x03) << 6)
  r[2]     = (f[1] >>  2) & 0xff;
  r[3]     = (f[1] >> 10) & ((f[2] & 0x0f) << 4)
  .
  .
  .
  r[1790]  = (f[1022]) >> 12) & ((f[1023] & 0x3f) << 2)
  r[1791]  = f[1023] >> 6
  Note that this function needs to be protected against timing attacks. In
  particular, avoid non-constant-time conditional subtractions (or other
  non-constant-time expressions) in the reduction modulo q of the coefficients.


  poly_frombytes(NEWHOPE_POLY b) is the inverse of poly_tobytes; it receives
  as input an array of 1792 bytes and coverts it into the internal
  representation of a poly. Note that poly_frombytes does not need to check
  whether the coefficients are reduced modulo q or reduce coefficients modulo
  q. Note further that the function must not leak any information about its
  inputs through timing information, as it is also applied to the secret key
  of the initiator.


  helprec(poly f) computes 256 bytes of reconciliation information from the
  input poly f. Internally, one byte of reconciliation information is computed
  from four coefficients of f by a function helprec4. Let the input polynomial f
  = (f0 + f1*X + f2*X^2 + ... + f1023*X^1023); let the output byte array be
  r[0],...r[256]. This output byte array is computed as
  r[0]   = helprec4(f0,f256,f512,f768)
  r[1]   = helprec4(f1,f257,f513,f769)
  r[2]   = helprec4(f2,f258,f514,f770)
  .
  .
  .
  r[255] = helprec4(f255,f511,f767,f1023), where helprec4 does the following:

    helprec4(x0,x1,x2,x3):
      b = randombit()
      r0,r1,r2,r3 = CVPD4(8*x0+4*b,8*x1+4*b,8*x2+4*b,8*x3+4*b)
      r = (r0 & 0x03) | ((r1 & 0x03) << 2) | ((r2 & 0x03) << 4) | ((r3 & 0x03) << 6)
      return r

  The function CVPD4 does the following:

    CVPD4(y0,y1,y2,y3):
      v00 = round(y0/2q)
      v01 = round(y1/2q)
      v02 = round(y2/2q)
      v03 = round(y3/2q)
      v10 = round((y0-1)/2q)
      v11 = round((y1-1)/2q)
      v12 = round((y2-1)/2q)
      v13 = round((y3-1)/2q)
      t   = abs(y0 - 2q*v00)
      t  += abs(y1 - 2q*v01)
      t  += abs(y2 - 2q*v02)
      t  += abs(y3 - 2q*v03)
      if(t < 2q):
        v0 = v00
        v1 = v01
        v2 = v02
        v3 = v03
        k  = 0
      else
        v0 = v10
        v1 = v11
        v2 = v12
        v3 = v13
        r  = 1
      return (v0-v3,v1-v3,v2-v3,k+2*v3)

  In this description, round(x) is defined as ⌊x + 0.5⌋, where ⌊x⌋ rounds to
  the largest integer that does not exceed x; abs() returns the absolute
  value.
  Note that all computations involved in helprec operate on secret data and must
  be protected against timing attacks.


  rec(poly f, NEWHOPE_REC r) computes the pre-hash (see paper) Newhope key from
  f and r. Specifically, it computes one bit of key from 4 coefficients of f and
  one byte of r. Let f = f0 + f1*X + f2*X^2 + ... + f1023*X^1023 and let r =
  r[0],r[1],...,r[255]. Let the bytes of the output by k[0],...,k[31] and let
  the bits of the output by k0,...,k255, where
  k0   = k[0] & 0x01
  k1   = (k[0] >> 1) & 0x01
  k2   = (k[0] >> 2) & 0x01
  .
  .
  .
  k8   = k[1] & 0x01
  k9   = (k[1] >> 1) & 0x01
  .
  .
  .
  k255 = (k[32] >> 7)
  The function rec computes k0,...,k255 as
  k0   = rec4(f0,f256,f512,f768,r[0])
  k1   = rec4(f1,f257,f513,f769,r[1])
  .
  .
  .
  k255 = rec4(f255,f511,f767,f1023,r[255])

  The function rec4 does the following:

    rec4(y0,y1,y2,y3,r):
      r0 = r & 0x03
      r1 = (r >> 2) & 0x03
      r2 = (r >> 4) & 0x03
      r3 = (r >> 6) & 0x03
      Decode(8*y0-2q*r0, 8*y1-2q*r1, 8*y2-2q*r2, 8*y3-q*r3)

  The function Decode does the following:

    Decode(v0,v1,v2,v3):
      t0 = round(v0/8q)
      t1 = round(v1/8q)
      t2 = round(v2/8q)
      t3 = round(v3/8q)
      t  = abs(v0 - 8q*t0)
      t += abs(v0 - 8q*t0)
      t += abs(v0 - 8q*t0)
      t += abs(v0 - 8q*t0)
      if(t > 1) return 1
      else return 0


§C. Test Vectors