BEP:8
Title:Tracker Peer Obfuscation
Version: a6d4c7fcc6be11d0093b6b3b5687e3209e97c2cb
Last-Modified:Wed Apr 30 01:09:14 2008 +0000
Author: David Harrison <dave@bittorrent.com>, Anthony Ciani <tony@ciani.phy.uic.edu>, Arvid Norberg <arvid@bittorrent.com>, Greg Hazel <greg@bittorrent.com>
Status: Deferred
Type:Standards Track
Created:31-Jan-2008
Post-History:

This extends the tracker protocol to support simple obfuscation of the peers it returns, using the infohash as a shared secret between the peer and the tracker. The obfuscation does not provide any security against eavesdroppers that know the infohash of the torrent. The goal is to prevent internet service providers and other network administrators from blocking or disrupting bittorrent traffic connections that span between the receiver of a tracker response and any peer IP-port appearing in that tracker response.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119.

Announce

When using this extension, instead of passing the info_hash parameter to the tracker, a sha_ih is passed.

The value of sha_ih MUST be the info-hash of the torrent, with a second SHA-1 applied to it.

For example if a torrent has infohash with hex representation aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d then its sha_ih is sha1(infohash)='6b4f89a54e2d27ecd7e8da5b4ab8fd9d1d8b119'.

The value MUST be url encoded, just like the info_hash. Thus the sha_ih above when url encoded becomes kO%89%A5N-%27%EC%D7%E8%DA%05%B4%AB%8F%D9%D1%D8%B1%19.

If the sha_ih is passed then the value for the port parameter should be treated as a 16 bit integer and MUST be obscured as described in the Obfuscation Method section. Similarly if the optional ip parameter is passed in the announce then its value MUST also be so obscured.

This extension does not change the semantics of any parameter passed in the peer's announce.

Announce Response

If the tracker supports this extension, the response should be exactly the same as if the info_hash had been passed, except that any field that contains peer information (such as peers, peers6 or any other field defined by another extension) MUST be obfuscated as described in the next section.

There are additional parameters the tracker may OPTIONALLY return. These are discussed in the optimizations section.

Obfuscation Method

The values for the ip and port announce parameters, the returned peer list and any other values that contain peer information are obscured using the method described in this section.

We distinguish between the tracker peer list and the returned peer list. The tracker peer list contains the ip-port pairs of all known peers in a given torrent, i.e., those peers that have reported to the tracker that they are transferring the file with a given infohash. The tracker may store this peer list however it wishes. The returned peer list contains a packed array of ip-port pairs conforming to the BitTorrent protocol specification. If the swarm is sufficiently large then the returned ip-port pairs constitute a subset of the ip-port pairs in the tracker peer list.

When a parameter is obscured, it is encrypted using RC4-drop768 encryption using the infohash as a shared secret and optionally employing an initialization vector.

For the remainder of this document RC4 refers to RC4-drop768. In the process of encryption, RC4 generates a pseudorandom string that is XOR'd with the plaintext to generate the ciphertext. The receiver recovers the plaintext by generating the same pseudorandom string and XOR'ing it with the ciphertext. In generating the pseudorandom string, the tracker and client MUST discard the first 768 bytes. The next 8 bytes in the pseudorandom string are reserved for optimizations discussed in the next section.

To communicate an initialization vector, the tracker includes in the bencoded response the parameter iv with value set to a byte string containing the initialization vector. The initialization vector can be of arbitrary length and is sent in plaintext. Initialization vectors can only be applied to parameters in tracker responses and NOT to announces.

If the tracker sends no initialization vector then the infohash is used as the RC4 key (160 bit key). If the tracker provides an initialization vector then the RC4 key is generated by appending the vector to the infohash and then hashing with SHA-1. The resulting hash is then used as the RC4 key.

For example, given infohash aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d and initialization vector abcd both represented in hex, the RC4 key is derived as follows:

key = sha1( 'aaf4c61ddcc5e8a2dabedef3b482cd9aea9434dabcd' )

The resulting key in hex is f36e9cae87cf33e07645ef5ca745a8a83469f31e.

It is RECOMMENDED that the tracker use the initialization vector, and that it change the iv on roughly the same period as the rerequest interval. The reasoning for this is contained in the rationale.

Optimizations

The described optimizations are OPTIONAL for the tracker, but the corresponding client-side MUST be implemented by clients that support this extension. These optimizations hobble the strength of the RC4 encryption in order to improve tracker performance. In the rationale section we discuss why hobbling RC4 is reasonable and in many cases has negligible foreseen effect on security.

For the purpose of these optimizations we assume that the tracker stores the tracker peer list for each infohash as a packed array that can be copied directly into the response. We further assume that the packed array is reused many times and that with each request the tracker either returns the entire packed array or copies a single contiguous substring from the tracker peer list into the response.

If the peerlist is represented and used as assumed then to improve randomness in the set of peers handed out by the tracker, it is RECOMMENDED that the tracker periodically reshuffle the peerlist with period similar to the rerequest interval. After each reshuffle the tracker reperforms the operations described in this section.

To reduce computation the tracker MAY cache the pseudorandom string generated by RC4 and reuse it as peers arrive and depart.

The tracker MAY also cache the encrypted tracker peer list. To support this the tracker MUST pass two additional parameters i and n each with 32-bit integer values, except the tracker MAY omit i and n when i=0 and the returned peer list is the entire tracker peer list. Whether the tracker returns i and n, the first 8 bytes of the RC4 psuedorandom string are reserved for obscuring i and n. We come back to this momentarily. Decryption starts by XORing from 6i bytes for ipv4 (or 18i for ipv6) into the pseudorandom string after the discarded and reserved bytes. Assuming that the tracker encrypted the tracker peer list starting from the first byte after the discarded and reserved bytes in the pseudorandom string then i also corresponds to the ith ip-port pair in the tracker peer list.

So that the client and the tracker do not have to generate an arbitrarily long pseudorandom string to support large swarms, we assume the tracker bounds the length of the pseudorandom string and reports the length in ip-port pairs as the value to parameter n. n excludes reserved and discarded bytes. We RECOMMEND that n be equal to the length of the tracker peer list or random but within constant factor of the longest peerlist returned by the tracker, whichever is smaller. Thus the tracker encrypts the jth byte of the ith ip-port pair in an ipv4 tracker peer list by XORing with the byte (6i+j) mod n bytes into the pseudorandom string.

Transmitting i and n as plaintext would significantly reduce the cost for an attacker to recover the pseudorandom string. The tracker MUST XOR the value of i with the first 32 bits of the pseudorandom string. The tracker then XORs n with the next 32 bits from the pseudorandom string (see Figure 1).

bep_0008_pseudo.png

Figure 1: The first 768 bytes of the RC4 pseudorandom string are discarded. The parameter i in the tracker response has value x xor i. The parameter n has value y xor n.

We describe encryption in the following example for an ipv4 tracker peer list consisting of 3 ip-port pairs, and using an RC4 pseudorandom string of length n=2. n is small for purposes of illustration. Also, for the purpose of illustration, the tracker returns only 2 peers at a time.

Given the following peer list
(208.72.193.86, 6881), (209.81.173.15,14321), (128.213.6.8, 6881)

As a packed array represented in hex it becomes

d048c1561ae1d151ad0f37f180d506081ae1

which we XOR with an RC4 pseudorandom string excluding discarded and
reserved bytes, e.g.,

a496e5f9b83e835013d42226

to generate

74de24afa2df5201bedb15d72443e3f1a2df

Because the RC4 pseudorandom string is shorter than the tracker peer list, we wrap to the beginning of the pseudorandom string.

A tracker returning the first two peers would return the bencoded equivalent of:

peers=74de24afa2df5201bedb15d7, i=0, n=2

A tracker returning the second and third peer would return the bencoded equivalent of:

peers=5201bedb15d72443e3f1a2df, i=1, n=2

In each response the tracker includes additional parameters such as the rerequest interval and the initialization vector iv.

The tracker response MUST remain a valid bencoded message.

Backwards Compatibility

Trackers that support obfuscation are identified in the .torrent file by the inclusion of an obfuscate-announce-list which otherwise has the same semantics as the announce-list parameter. Peers that do not support obfuscation simply ignore the obfuscate-announce-list.

A client that is configured to use this extension should always send the sha_ih to any tracker supporting obfuscation. The client SHOULD only contact trackers in the announce-list once the client has attempted all trackers in the obfuscate-announce-list and all failed.

If a tracker that supports obfuscation wishes to allow legacy peers to connect to the tracker then the announce URL should appear in both the obfuscate-announce-list and the announce-list.

If a tracker URL appears in both lists running on the same port, and the tracker failed to respond when selected from the obfuscate-announce-list then the client MAY treat the tracker in the announce-list as if it were temporarily unreachable and defer trying it until it has tried other trackers in the announce-list.

Peers MUST never send both the info_hash and sha_ih parameters in the same request, since that would defeat the purpose of the shared secret.

Any peer that requests with a sha_ih SHOULD implement Message Stream Encryption (MSE) [1]. Any peer returned from the tracker in response to a request with a sha_ih SHOULD be assumed to support Message Stream Encryption. We include these provisions because if a peer communicates with another peer without using MSE then the BitTorrent protocol is trivially identified from the first twenty bytes of the BitTorrent header and the info_hash appears in plaintext as the next twenty bytes, hence also defeating the purpose of the shared secret.

If the tracker does not know enough peers assumed to support MSE to return the desired number of peers then it MAY include peers that are not assumed to support MSE. If a peer closes a connection in response to an encrypted header then the initiating peer SHOULD assume that the peer does not support MSE. The initiating peer however SHOULD ONLY initiate unencrypted connections when all peers have been tried and those that support MSE fail to provide "adequate performance." We intentionally omit any definition of "adequate performance."

Rationale

This extension directly addresses a known attack on the BitTorrent protocol performed by some deployed network hardware. By obscuring the ip-port pairs network hardware can no longer easily identify ip-port pairs that are running BitTorrent by observing peer-to-tracker communications. This deployed hardware under some conditions disrupts BitTorrent connections by injecting forged TCP reset packets.

This hardware was presumably deployed to get around BitTorrent Message Stream Encryption [1]. Peers implementing BitTorrent Message Stream Encryption obfuscate peer-to-peer connections by employing RC4 encryption on every byte from the first byte transferred. BitTorrent Message Stream Encryption thus increases the difficulty for a device observing passing packets to identify BitTorrent peer-to-peer connections.

By using the SHA-1 of the infohash, the tracker is able to identify torrents without sending the plaintext infohash and without requiring an additional prior exchange of a shared secret. Where trackers now maintain mappings from infohash to the corresponding torrent's peerlist and other torrent-specific state, obfuscated trackers would need one additional mapping from sha_ih to the torrent's state. Trackers may also cache the encrypted version of each torrent's tracker peer list, to increase computational performance at the expense of increasing memory footprint by a constant factor.

The obfuscation method meets the following criteria:

  • The entire plaintext of the peer list is not easily obtained even if an eavesdropper identifies one or more subsequent connections as using BitTorrent and the corresponding ip-port pairs appeared in the ciphertext of the tracker response.
  • Even when a subsequent connection from a peer that has received a tracker response is observed by an eavesdropper, it is difficult to map the ip-port pair to specific ciphertext to verify that the connection is using BitTorrent.

When the optimizations are used,

  • Few computations are performed at request time.
  • Encryption may be performed at the time a peer is added. The encrypted peer ip and port may be handed out hundreds of times.
  • Security is minimally impacted.

The objective is NOT to create a cryptographically secure protocol that can survive unlimited observation of passing packets and substantial computational resources on network timescales. The objective is to raise the bar sufficiently to deter attacks based on observing ip-port numbers in peer-to-tracker communications.

If a tracker observes a large number of tracker requests and responses and subsequent connections, it is possible to attack the encryption. RC4 is known to have a number of weaknesses especially in the way it is used with WEP [2] [3] [4]. However, with tracker peer obfuscation, the number of bytes transferred between the tracker and a client is likely significantly smaller than transferred between a wireless computer and a basestation. An attacker faces a much larger task in obtaining sufficient ciphertext to directly break the encryption.

Hobbling the RC4 encryption by using a bounded-length RC4 pseudorandom string for small swarms is likely to have negilgible impact on security over any other encyption method since the pseudorandom string is probably equal to or longer than the plaintext and thus no part of it is repeated in the XOR except as peers arrive or leave the swarm. Thus on the timescales of rerequest intervals, nearly the same ciphertext is handed to every peer requesting the same infohash. Intercepting the same ciphertext multiple times provides no additional information to the attacker. The attacker could correlate ip-port pairs in connections following tracker responses, but an attacker could do this regardless of the encryption method employed. Furthermore more direct methods of traffic analysis applied to peer-to-peer communication is available to network operators.

For larger swarms, hobbling RC4 may simplify breaking the encryption since the same pseudorandom string is used repeatedly across the peer list. Some study is in order taking into account that the tracker can periodically change intiailization vectors.

We know from experience that periodically reshuffling peer lists on the order of the rerequest interval negligibly impacts tracker performance even with swarms containing millions of peers. Generating a new pseudorandom string using RC4 on this same time interval is likely to incur negligible performance penalty because 1) RC4 is a small constant factor more expensive than a shuffle on an input string of equal length, 2) the generated pseudorandom string is only n ip-port pairs long where recommended n is within a small constant factor larger than the largest returned peer list and thus much smaller than the tracker peer list for large swarms, and 3) the cost of the XOR operation is lighter weight than performing a random shuffle.

References

[1](1, 2) BitTorrent Message Stream Encryption (http://www.azureuswiki.com/index.php/Message_Stream_Encryption)
[2]Nikita Borisov, Ian Goldberg, and David Wagner. Intercepting mobile communications: the insecurity of 802.11. In ACM MobiCom 2001, pages 180-189. ACM Press, 2001.
[3]Scott R. Fluhrer, Itsik Mantin, and Adi Shamir. Weaknesses in the key scheduling algorithm of RC4. In Serge Vaudenay and Amr M. Youssef, editors, Selected Areas in Cryptography 2001, volume 2259 of Lecture Notes in Computer Science, pages 1-24. Springer, 2001.
[4]Adam Stubblefeld, John Ioannidis, and Aviel D. Rubin. A key recovery attack on the 802.11b wired equivalent privacy protocol (WEP). ACM Transactions on Information and System Security, 7(2):319-332, May 2004.

Example Python Code

Request handling in a dummy tracker implementing tracker peer obfuscation:

from sha import sha
from random import randint
from struct import unpack
from rc4 import rc4  # rc4(k) generates k RC4 pseudorandom bytes.

rand = open("/dev/random","r").read
rc4 = rc4()

# tracker configuration
MAX_PEERS = 100

# per torrent state.
infohash = sha("dummy_info").digest()
pseudo = ''                        # pseudorandom RC4 string.
num_peers = 1000                   # current swarm size.
tracker_peer_list = rand(6) * num_peers
obfuscated_tracker_peer_list = ''

def xor(plaintext,pseudo):
  isint = False
  if type(plaintext) == int: # convert to byte string.
    plaintext = "".join([chr(int(x,16)) for x in "%.4x" % plaintext])
    isint = True
  n = len(pseudo)
  ciphertext = "".join(
    [chr(ord(pseudo[i%n])^ord(plaintext[i])) for i in xrange(len(plaintext))])
  if isint:
    ciphertext = unpack("!I", ciphertext)[0]   # convert back to unsigned int
  return ciphertext

def init():  # called once per rerequest interval.
  global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
  iv = rand(20)
  rc4.key = sha(infohash + iv).digest()
  rc4(768)                         # discard first 768
  x = rc4(4)
  y = rc4(4)
  n = min(num_peers, randint(MAX_PEERS * 2, MAX_PEERS * 4))
  n_xor_y = xor(n,y)
  pseudo = rc4(n*6)
  obfuscated_tracker_peer_list = xor(tracker_peer_list,pseudo)

def getpeers( numwant ):
  global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
  response = {}
  response['iv'] = iv
  numwant = min(numwant, MAX_PEERS)
  if numwant >= num_peers:
    response['peers'] = obfuscated_tracker_peer_list
    return response

  i = randint(0,num_peers-numwant)
  response['i'] = xor(i,x)
  response['n'] = n_xor_y
  # peers at end of tracker peer list have lower probability of being picked,
  # but this requires only one copy.
  response['peers'] = obfuscated_tracker_peer_list[i*6:(i+numwant)*6]
  return response

init()
print getpeers(20)