Computing a consensus from a set of votes

Given a set of votes, authorities compute the contents of the consensus.

The consensus status, along with as many signatures as the server currently knows (see section 3.10 below), should be available at

http://<hostname>/tor/status-vote/next/consensus.z

The contents of the consensus document are as follows:

The "valid-after", "valid-until", and "fresh-until" times are taken as the median of the respective values from all the votes.

The times in the "voting-delay" line are taken as the median of the VoteSeconds and DistSeconds times in the votes.

Known-flags is the union of all flags known by any voter.

Entries are given on the "params" line for every keyword on which a majority of authorities (total authorities, not just those participating in this vote) voted on, or if at least three authorities voted for that parameter. The values given are the low-median of all votes on that keyword.

(In consensus methods 7 to 11 inclusive, entries were given on the "params" line for every keyword on which any authority voted, the value given being the low-median of all votes on that keyword.)

    "client-versions" and "server-versions" are sorted in ascending
     order; A version is recommended in the consensus if it is recommended
     by more than half of the voting authorities that included a
     client-versions or server-versions lines in their votes.

With consensus methods 19 through 33, a package line is generated for a given PACKAGENAME/VERSION pair if at least three authorities list such a package in their votes. (Call these lines the "input" lines for PACKAGENAME.) The consensus will contain every "package" line that is listed verbatim by more than half of the authorities listing a line for the PACKAGENAME/VERSION pair, and no others.

The authority item groups (dir-source, contact, fingerprint, vote-digest) are taken from the votes of the voting authorities. These groups are sorted by the digests of the authorities identity keys, in ascending order. If the consensus method is 3 or later, a dir-source line must be included for every vote with legacy-key entry, using the legacy-key's fingerprint, the voter's ordinary nickname with the string "-legacy" appended, and all other fields as from the original vote's dir-source line.

     A router status entry:
        * is included in the result if some router status entry with the same
          identity is included by more than half of the authorities (total
          authorities, not just those whose votes we have).
          (Consensus method earlier than 21)

        * is included according to the rules in section 3.8.0.1 and
          3.8.0.2 below. (Consensus method 22 or later)

        * For any given RSA identity digest, we include at most
          one router status entry.

        * For any given Ed25519 identity, we include at most one router
          status entry.

        * A router entry has a flag set if that is included by more than half
          of the authorities who care about that flag.

        * Two router entries are "the same" if they have the same
          (descriptor digest, published time, nickname, IP, ports> tuple.
          We choose the tuple for a given router as whichever tuple appears
          for that router in the most votes.  We break ties first in favor of
          the more recently published, then in favor of smaller server
          descriptor digest.

       [
        * The Named flag appears if it is included for this routerstatus by
          _any_ authority, and if all authorities that list it list the same
          nickname. However, if consensus-method 2 or later is in use, and
          any authority calls this identity/nickname pair Unnamed, then
          this routerstatus does not get the Named flag.

        * If consensus-method 2 or later is in use, the Unnamed flag is
          set for a routerstatus if any authorities have voted for a different
          identities to be Named with that nickname, or if any authority
          lists that nickname/ID pair as Unnamed.

          (With consensus-method 1, Unnamed is set like any other flag.)

          [But note that authorities no longer vote for the Named flag,
          and the above two bulletpoints are now irrelevant.]
       ]

        * The version is given as whichever version is listed by the most
          voters, with ties decided in favor of more recent versions.

        * If consensus-method 4 or later is in use, then routers that
          do not have the Running flag are not listed at all.

        * If consensus-method 5 or later is in use, then the "w" line
          is generated using a low-median of the bandwidth values from
          the votes that included "w" lines for this router.

        * If consensus-method 5 or later is in use, then the "p" line
          is taken from the votes that have the same policy summary
          for the descriptor we are listing.  (They should all be the
          same.  If they are not, we pick the most commonly listed
          one, breaking ties in favor of the lexicographically larger
          vote.)  The port list is encoded as specified in section 3.8.2.

        * If consensus-method 6 or later is in use and if 3 or more
          authorities provide a Measured= keyword in their votes for
          a router, the authorities produce a consensus containing a
          Bandwidth= keyword equal to the median of the Measured= votes.

        * If consensus-method 7 or later is in use, the params line is
          included in the output.

        * If the consensus method is under 11, bad exits are considered as
          possible exits when computing bandwidth weights.  Otherwise, if
          method 11 or later is in use, any router that is determined to get
          the BadExit flag doesn't count when we're calculating weights.

        * If consensus method 12 or later is used, only consensus
          parameters that more than half of the total number of
          authorities voted for are included in the consensus.

        [ As of 0.2.6.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 13. ]

        * If consensus method 13 or later is used, microdesc consensuses
          omit any router for which no microdesc was agreed upon.

        * If consensus method 14 or later is used, the ns consensus and
          microdescriptors may include an "a" line for each router, listing
          an IPv6 OR port.

        * If consensus method 15 or later is used, microdescriptors
          include "p6" lines including IPv6 exit policies.

        * If consensus method 16 or later is used, ntor-onion-key
          are included in microdescriptors

        * If consensus method 17 or later is used, authorities impose a
          maximum on the Bandwidth= values that they'll put on a 'w'
          line for any router that doesn't have at least 3 measured
          bandwidth values in votes. They also add an "Unmeasured=1"
          flag to such 'w' lines.

        * If consensus method 18 or later is used, authorities include
          "id" lines in microdescriptors. This method adds RSA ids.

        * If consensus method 19 or later is used, authorities may include
          "package" lines in consensuses.

        * If consensus method 20 or later is used, authorities may include
          GuardFraction information in microdescriptors.

        * If consensus method 21 or later is used, authorities may include
          an "id" line for ed25519 identities in microdescriptors.

        [ As of 0.2.8.2-alpha, authorities no longer advertise or negotiate
          consensus method 21, because it contains bugs. ]

        * If consensus method 22 or later is used, and the votes do not
          produce a majority consensus about a relay's Ed25519 key (see
          3.8.0.1 below), the consensus must include a NoEdConsensus flag on
          the "s" line for every relay whose listed Ed key does not reflect
          consensus.

        * If consensus method 23 or later is used, authorities include
          shared randomness protocol data on their votes and consensus.

        * If consensus-method 24 or later is in use, then routers that
          do not have the Valid flag are not listed at all.

        [ As of 0.3.4.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 25. ]

        * If consensus-method 25 or later is in use, then we vote
          on recommended-protocols and required-protocols lines in the
          consensus.  We also include protocols lines in routerstatus
          entries.

        * If consensus-method 26 or later is in use, then we initialize
          bandwidth weights to 1 in our calculations, to avoid
          division-by-zero errors on unusual networks.

        * If consensus method 27 or later is used, the microdesc consensus
          may include an "a" line for each router, listing an IPv6 OR port.

        [ As of 0.4.3.1-alpha, authorities no longer advertise or negotiate
          any consensus methods lower than 28. ]

        * If consensus method 28 or later is used, microdescriptors no longer
          include "a" lines.

        * If consensus method 29 or later is used, microdescriptor "family"
          lines are canonicalized to improve compression.

        * If consensus method 30 or later is used, the base64 encoded
          ntor-onion-key does not include the trailing = sign.

        * If consensus method 31 or later is used, authorities parse the
          "bwweightscale" and "maxunmeasuredbw" parameters correctly when
          computing votes.

        * If consensus method 32 or later is used, authorities handle the
          "MiddleOnly" flag specially when computing a consensus.  When the
          voters agree to include "MiddleOnly" in a routerstatus, they
          automatically remove "Exit", "Guard", "V2Dir", and "HSDir".  If
          the BadExit flag is included in the consensus, they automatically
          add it to the routerstatus.

        * If consensus method 33 or later is used, and the consensus
          flavor is "microdesc", then the "Publication" field in the "r"
          line is set to "2038-01-01 00:00:00".

        * If consensus method 34 or later is used, the consensus
          does not include any "package" lines.

The signatures at the end of a consensus document are sorted in ascending order by identity digest.

All ties in computing medians are broken in favor of the smaller or earlier item.

Deciding which Ids to include

This sorting algorithm is used for consensus-method 22 and later.

  First, consider each listing by tuple of <Ed,Rsa> identities, where 'Ed'
    may be "None" if the voter included "id ed25519 none" to indicate that
    the authority knows what ed25519 identities are, and thinks that the RSA
    key doesn't have one.

  For each such <Ed, RSA> tuple that is listed by more than half of the
    total authorities (not just total votes), include it.  (It is not
    possible for any other <id-Ed, id-RSA'> to have as many votes.)  If more
    than half of the authorities list a single <Ed,Rsa> pair of this type, we
    consider that Ed key to be "consensus"; see description of the
    NoEdConsensus flag.

  Log any other id-RSA values corresponding to an id-Ed we included, and any
    other id-Ed values corresponding to an id-RSA we included.

  For each <id-RSA> that is not yet included, if it is listed by more than
    half of the total authorities, and we do not already have it listed with
    some <id-Ed>, include it, but do not consider its Ed identity canonical.

Deciding which descriptors to include

Deciding which descriptors to include.

A tuple belongs to an <id-RSA, id-Ed> identity if it is a new tuple that matches both ID parts, or if it is an old tuple (one with no Ed opinion) that matches the RSA part. A tuple belongs to an <id-RSA> identity if its RSA identity matches.

A tuple matches another tuple if all the fields that are present in both tuples are the same.

For every included identity, consider the tuples belonging to that identity. Group them into sets of matching tuples. Include the tuple that matches the largest set, breaking ties in favor of the most recently published, and then in favor of the smaller server descriptor digest.

Forward compatibility

Future versions of Tor will need to include new information in the consensus documents, but it is important that all authorities (or at least half) generate and sign the same signed consensus.

To achieve this, authorities list in their votes their supported methods for generating consensuses from votes. Later methods will be assigned higher numbers. Currently specified methods:

     "1" -- The first implemented version.
     "2" -- Added support for the Unnamed flag.
     "3" -- Added legacy ID key support to aid in authority ID key rollovers
     "4" -- No longer list routers that are not running in the consensus
     "5" -- adds support for "w" and "p" lines.
     "6" -- Prefers measured bandwidth values rather than advertised
     "7" -- Provides keyword=integer pairs of consensus parameters
     "8" -- Provides microdescriptor summaries
     "9" -- Provides weights for selecting flagged routers in paths
     "10" -- Fixes edge case bugs in router flag selection weights
     "11" -- Don't consider BadExits when calculating bandwidth weights
     "12" -- Params are only included if enough auths voted for them
     "13" -- Omit router entries with missing microdescriptors.
     "14" -- Adds support for "a" lines in ns consensuses and microdescriptors.
     "15" -- Adds support for "p6" lines.
     "16" -- Adds ntor keys to microdescriptors
     "17" -- Adds "Unmeasured=1" flags to "w" lines
     "18" -- Adds 'id' to microdescriptors.
     "19" -- Adds "package" lines to consensuses
     "20" -- Adds GuardFraction information to microdescriptors.
     "21" -- Adds Ed25519 keys to microdescriptors.
     "22" -- Instantiates Ed25519 voting algorithm correctly.
     "23" -- Adds shared randomness protocol data.
     "24" -- No longer lists routers that are not Valid in the consensus.
     "25" -- Vote on recommended-protocols and required-protocols.
     "26" -- Initialize bandwidth weights to 1 to avoid division-by-zero.
     "27" -- Adds support for "a" lines in microdescriptor consensuses.
     "28" -- Removes "a" lines from microdescriptors.
     "29" -- Canonicalizes families in microdescriptors.
     "30" -- Removes padding from ntor-onion-key.
     "31" -- Uses correct parsing for bwweightscale and maxunmeasuredbw
             when computing weights
     "32" -- Adds special handling for MiddleOnly flag.
     "33" -- Sets "publication" field in microdesc consensus "r" lines
             to a meaningless value.
     "34" -- Removes "package" lines from consensus.

Before generating a consensus, an authority must decide which consensus method to use. To do this, it looks for the highest version number supported by more than 2/3 of the authorities voting. If it supports this method, then it uses it. Otherwise, it falls back to the newest consensus method that it supports (which will probably not result in a sufficiently signed consensus).

All authorities MUST support method 25; authorities SHOULD support more recent methods as well. Authorities SHOULD NOT support or advertise support for any method before 25. Clients MAY assume that they will never see a current valid signed consensus for any method before method 25.

(The consensuses generated by new methods must be parsable by implementations that only understand the old methods, and must not cause those implementations to compromise their anonymity. This is a means for making changes in the contents of consensus; not for making backward-incompatible changes in their format.)

The following methods have incorrect implementations; authorities SHOULD NOT advertise support for them:

"21" -- Did not correctly enable support for ed25519 key collation.

Encoding port lists

Whether the summary shows the list of accepted ports or the list of rejected ports depends on which list is shorter (has a shorter string representation). In case of ties we choose the list of accepted ports. As an exception to this rule an allow-all policy is represented as "accept 1-65535" instead of "reject " and a reject-all policy is similarly given as "reject 1-65535".

Summary items are compressed, that is instead of "80-88,89-100" there only is a single item of "80-100", similarly instead of "20,21" a summary will say "20-21".

Port lists are sorted in ascending order.

The maximum allowed length of a policy summary (including the "accept " or "reject ") is 1000 characters. If a summary exceeds that length we use an accept-style summary and list as much of the port list as is possible within these 1000 bytes. [XXXX be more specific.]

Computing Bandwidth Weights

Let weight_scale = 10000, or the value of the "bwweightscale" parameter. (Before consensus method 31 there was a bug in parsing bwweightscale, so that if there were any consensus parameters after it alphabetically, it would always be treated as 10000. A similar bug existed for "maxunmeasuredbw".)

Starting with consensus method 26, G, M, E, and D are initialized to 1 and T to 4. Prior consensus methods initialize them all to 0. With this change, test tor networks that are small or new are much more likely to produce bandwidth-weights in their consensus. The extra bandwidth has a negligible impact on the bandwidth weights in the public tor network.

Let G be the total bandwidth for Guard-flagged nodes. Let M be the total bandwidth for non-flagged nodes. Let E be the total bandwidth for Exit-flagged nodes. Let D be the total bandwidth for Guard+Exit-flagged nodes. Let T = G+M+E+D

Let Wgd be the weight for choosing a Guard+Exit for the guard position. Let Wmd be the weight for choosing a Guard+Exit for the middle position. Let Wed be the weight for choosing a Guard+Exit for the exit position.

Let Wme be the weight for choosing an Exit for the middle position. Let Wmg be the weight for choosing a Guard for the middle position.

Let Wgg be the weight for choosing a Guard for the guard position. Let Wee be the weight for choosing an Exit for the exit position.

Balanced network conditions then arise from solutions to the following system of equations:

WggG + WgdD == M + WmdD + WmeE + WmgG (guard bw = middle bw) WggG + WgdD == WeeE + WedD (guard bw = exit bw) WedD + WmdD + WgdD == D (aka: Wed+Wmd+Wdg = weight_scale) WmgG + WggG == G (aka: Wgg = weight_scale-Wmg) WmeE + WeeE == E (aka: Wee = weight_scale-Wme)

We are short 2 constraints with the above set. The remaining constraints come from examining different cases of network load. The following constraints are used in consensus method 10 and above. There are another incorrect and obsolete set of constraints used for these same cases in consensus method 9. For those, see dir-spec.txt in Tor 0.2.2.10-alpha to 0.2.2.16-alpha.

Case 1: E >= T/3 && G >= T/3 (Neither Exit nor Guard Scarce)

In this case, the additional two constraints are: Wmg == Wmd, Wed == 1/3.

    This leads to the solution:
        Wgd = weight_scale/3
        Wed = weight_scale/3
        Wmd = weight_scale/3
        Wee = (weight_scale*(E+G+M))/(3*E)
        Wme = weight_scale - Wee
        Wmg = (weight_scale*(2*G-E-M))/(3*G)
        Wgg = weight_scale - Wmg

  Case 2: E < T/3 && G < T/3 (Both are scarce)

Let R denote the more scarce class (Rare) between Guard vs Exit. Let S denote the less scarce class.

Subcase a: R+D < S

In this subcase, we simply devote all of D bandwidth to the scarce class.

       Wgg = Wee = weight_scale
       Wmg = Wme = Wmd = 0;
       if E < G:
         Wed = weight_scale
         Wgd = 0
       else:
         Wed = 0
         Wgd = weight_scale

    Subcase b: R+D >= S

In this case, if M <= T/3, we have enough bandwidth to try to achieve a balancing condition.

Add constraints Wgg = weight_scale, Wmd == Wgd to maximize bandwidth in the guard position while still allowing exits to be used as middle nodes:

Wee = (weight_scale*(E - G + M))/E Wed = (weight_scale*(D - 2E + 4G - 2M))/(3D) Wme = (weight_scale*(G-M))/E Wmg = 0 Wgg = weight_scale Wmd = (weight_scale - Wed)/2 Wgd = (weight_scale - Wed)/2

If this system ends up with any values out of range (ie negative, or above weight_scale), use the constraints Wgg == weight_scale and Wee == weight_scale, since both those positions are scarce:

         Wgg = weight_scale
         Wee = weight_scale
         Wed = (weight_scale*(D - 2*E + G + M))/(3*D)
         Wmd = (weight_Scale*(D - 2*M + G + E))/(3*D)
         Wme = 0
         Wmg = 0
         Wgd = weight_scale - Wed - Wmd

      If M > T/3, then the Wmd weight above will become negative. Set it to 0
      in this case:
         Wmd = 0
         Wgd = weight_scale - Wed

  Case 3: One of E < T/3 or G < T/3

    Let S be the scarce class (of E or G).

    Subcase a: (S+D) < T/3:
      if G=S:
        Wgg = Wgd = weight_scale;
        Wmd = Wed = Wmg = 0;
        // Minor subcase, if E is more scarce than M,
        // keep its bandwidth in place.
        if (E < M) Wme = 0;
        else Wme = (weight_scale*(E-M))/(2*E);
        Wee = weight_scale-Wme;
      if E=S:
        Wee = Wed = weight_scale;
        Wmd = Wgd = Wme = 0;
        // Minor subcase, if G is more scarce than M,
        // keep its bandwidth in place.
        if (G < M) Wmg = 0;
        else Wmg = (weight_scale*(G-M))/(2*G);
        Wgg = weight_scale-Wmg;

    Subcase b: (S+D) >= T/3
      if G=S:
        Add constraints Wgg = weight_scale, Wmd == Wed to maximize bandwidth
        in the guard position, while still allowing exits to be
        used as middle nodes:
          Wgg = weight_scale
          Wgd = (weight_scale*(D - 2*G + E + M))/(3*D)
          Wmg = 0
          Wee = (weight_scale*(E+M))/(2*E)
          Wme = weight_scale - Wee
          Wmd = (weight_scale - Wgd)/2
          Wed = (weight_scale - Wgd)/2
      if E=S:
        Add constraints Wee == weight_scale, Wmd == Wgd to maximize bandwidth
        in the exit position:
          Wee = weight_scale;
          Wed = (weight_scale*(D - 2*E + G + M))/(3*D);
          Wme = 0;
          Wgg = (weight_scale*(G+M))/(2*G);
          Wmg = weight_scale - Wgg;
          Wmd = (weight_scale - Wed)/2;
          Wgd = (weight_scale - Wed)/2;

To ensure consensus, all calculations are performed using integer math with a fixed precision determined by the bwweightscale consensus parameter (defaults at 10000, Min: 1, Max: INT32_MAX). (See note above about parsing bug in bwweightscale before consensus method 31.)

For future balancing improvements, Tor clients support 11 additional weights for directory requests and middle weighting. These weights are currently set at weight_scale, with the exception of the following groups of assignments:

Directory requests use middle weights:

Wbd=Wmd, Wbg=Wmg, Wbe=Wme, Wbm=Wmm

Handle bridges and strange exit policies:

Wgm=Wgg, Wem=Wee, Weg=Wed

Computing consensus flavors

Consensus flavors are variants of the consensus that clients can choose to download and use instead of the unflavored consensus. The purpose of a consensus flavor is to remove or replace information in the unflavored consensus without forcing clients to download information they would not use anyway.

Directory authorities can produce and serve an arbitrary number of flavors of the same consensus. A downside of creating too many new flavors is that clients will be distinguishable based on which flavor they download. A new flavor should not be created when adding a field instead wouldn't be too onerous.

Examples for consensus flavors include:

      - Publishing hashes of microdescriptors instead of hashes of
        full descriptors (see section 3.9.2).
      - Including different digests of descriptors, instead of the
        perhaps-soon-to-be-totally-broken SHA1.

Consensus flavors are derived from the unflavored consensus once the voting process is complete. This is to avoid consensus synchronization problems.

Every consensus flavor has a name consisting of a sequence of one or more alphanumeric characters and dashes. For compatibility, the original (unflavored) consensus type is called "ns".

The supported consensus flavors are defined as part of the authorities' consensus method.

All consensus flavors have in common that their first line is "network-status-version" where version is 3 or higher, and the flavor is a string consisting of alphanumeric characters and dashes:

"network-status-version" SP version [SP flavor] NL

ns consensus

The ns consensus flavor is equivalent to the unflavored consensus. When the flavor is omitted from the "network-status-version" line, it should be assumed to be "ns". Some implementations may explicitly state that the flavor is "ns" when generating consensuses, but should accept consensuses where the flavor is omitted.

Microdescriptor consensus

The microdescriptor consensus is a consensus flavor that contains microdescriptor hashes instead of descriptor hashes and that omits exit-policy summaries which are contained in microdescriptors. The microdescriptor consensus was designed to contain elements that are small and frequently changing. Clients use the information in the microdescriptor consensus to decide which servers to fetch information about and which servers to fetch information from.

The microdescriptor consensus is based on the unflavored consensus with the exceptions as follows:

"network-status-version" SP version SP "microdesc" NL

[At start, exactly once.]

The flavor name of a microdescriptor consensus is "microdesc".

Changes to router status entries are as follows:

    "r" SP nickname SP identity SP publication SP IP SP ORPort
        SP DirPort NL

        [At start, exactly once.]

        Similar to "r" lines in section 3.4.1, but without the digest element.

    "a" SP address ":" port NL

        [Any number]

        Identical to the "r" lines in section 3.4.1.

(Only included when the vote is generated with consensus-method 14 or later, and the consensus is generated with consensus-method 27 or later.)

"p" ... NL

[At most once]

Not currently generated.

Exit policy summaries are contained in microdescriptors and therefore omitted in the microdescriptor consensus.

"m" SP digest NL

[Exactly once.*]

"digest" is the base64 of the SHA256 hash of the router's microdescriptor with trailing =s omitted. For a given router descriptor digest and consensus method there should only be a single microdescriptor digest in the "m" lines of all votes. If different votes have different microdescriptor digests for the same descriptor digest and consensus method, at least one of the authorities is broken. If this happens, the microdesc consensus should contain whichever microdescriptor digest is most common. If there is no winner, we break ties in the favor of the lexically earliest.

[*Before consensus method 13, this field was sometimes erroneously omitted.]

Additionally, a microdescriptor consensus SHOULD use the sha256 digest algorithm for its signatures.