BEP:49
Title:Distributed Torrent Feeds
Version: ff3365f62b5a398b4fc00525af3fd943211e0dec
Last-Modified:Mon Oct 3 01:03:39 2016 +0200
Author: The 8472 <the8472.bep@infinite-source.de>
Status: Draft
Type:Standards Track
Content-Type:text/x-rst
Created:14-Aug-2016
Post-History:

Abstract

This document specifies conventions for content feeds represented as torrents within torrents, i.e. metadata files which in turn list metadata files describing the actual content.

The goal is to provide functionality comparable to RSS Feeds [2] but in a distributed fashion by using the bittorrent protocol itself.

Intended Use

This BEP should be used when the distribution of torrent files themselves is the goal, especially when subscribers may only be interested in a subset of the data or may wish to search through the metadata before selecting torrents to download.

In cases where the publisher can reasonably assume that subscribers will want almost all content of a foreseeably small set of files then BEP 46 should be used directly by adding content to a multi-file torrent.

This BEP is specified in terms of torrents as feed items as that is its primary goal, but in principle it may also be used to distribute other types of feed items.

Design

Feeds can be used in many different ways, from publishing a small feed of individual items at irregular intervals to regular incremental snapshots of very large datasets for archiving purposes. Some may need append-only semantics, others may also wish to replicate deletions.

Subscribers may have different interests such as wanting access to the most recent items quickly with little overhead or having a complete copy of data at any time.

To cover these differing needs the feed is split into a HEAD element containing most recent items and an optional tail of one or more large archives for older items to which HEAD items are eventually retired.

Metadata files are used instead of a list of magnet links to remove a layer of indirection where subscribers would have to perform a large amount of DHT lookups and metadata exchanges to obtain the data they need to make a decision about which torrent to download. Unlike the metadata exchange this also preserves the outer dictionary, which is used by some BEPs.

Basic Conventions

  1. The info dictionary contains an entry "bep49": {} with zero or more entries in the sub-dictionary.

  2. Feed torrents always are multi-file torrents

  3. All files in the root directory of the torrent are feed items. Unless specified otherwise files contained in subdirectories are metadata describing the feed itself or part of future extensions

  4. All feed items MUST have a sha1 field in their file list entry, as defined by BEP 47 [5]. This is to make it simple for subscribers to reuse as much data as possible when retrieving an updated feed and thus avoid wasted bandwidth.

  5. The files list SHOULD NOT be sorted lexicographically. Instead new files should be appended at the end when updating the feed. Any content that is not ordered by insertion time should be put into a separate piece-aligned block so it does not interfere with appends or archival operations.

  6. When creating or updating a feed torrent an additional padding file [5] SHOULD be inserted at the end to pad it to a full piece boundary. This way append and archive operations can be performed without recomputing hashes. Future BEPs that offer piece-aligned files through other means may be used instead.

  7. It is RECOMMENDED that the filename of a feed item is the same as name field of its contents. Publishers MAY shorten the names to reduce overhead on very large archives containing many small torrents.

    Additionally publishers SHOULD avoid using duplicate names for files with different content. Subscribers MUST NOT rely on that.

    Similarly the .torrent extension should be used for all feed items, assuming that they are torrents.

  8. Feed items SHOULD be torrents themselves, but specialized applications using this specification MAY put other types of files directly into the feed if the indirection is too costly. Therefore subscribers MUST NOT assume that every feed item is a valid torrent file

Updates and Chain of Trust

This BEP mostly covers the semantics of creating and processing feed torrents. Establishing trust and acquiring updates is handled by other specifications.

Feeds can either be signed and updated via BEPs 35 [1] and 39 [3] or, the recommended way, via BEP 46 [4].

Since BEP 46 is a N:1 mapping the chain of trust is established based on the pubkey used to perform the DHT lookup. I.e. a feed could be published by multiple sources and eventually be forked when they publish diverging updates.

Optionally "bep49": {"bep46 source": <bep 46 magnet link (string)>} can be embedded in the info dictionary to inform subscribers where to obtain updates. If a feed has been obtained through a BEP 46 magnet then that takes precedence over the embedded magnet.

Archives

Publishers MAY move older items - from the beginning of the files list - to additional torrents to keep the HEAD small.

To indicate archives are present "bep49": {"archive next": <bep 9 magnet link (string)>} should be merged into the info dictionary.

To indicate that a feed torrent is not the head but part of the archive "bep49": {"archive": 1}. Additionally archive feed torrents can point to even older archives. Archives MUST NOT contain prev or bep46 source fields.

Since torrents are usually fairly small this mechanism will only be needed for feeds exceeding thousands of items. The need for multiple archives should be even less common.

Implementation note: Assuming that appends to the HEAD were done in a piece-aligned manner and the HEAD and archives share the same piece size then the publisher can move batches of items and their pieces hashes to the archive by exploiting that alignment.

Subscribers MUST NOT rely on this property.

Note that archives can be reused since it is not necessary to retire items to the archives every time new ones are added to the feed HEAD.

Revisions

Since the feed implicitly goes through multiple revisions publishers MAY make them explicit by pointing to the previous version of the feed head by merging "bep49": {"prev": <bep9 magnet link (string)>} into the info dictionary.

This BEP does not specify any semantics about the differences in file content between the current and previous revisions. I.e. how absent or overwritten files should be handled. It is left to the subscriber implementation whether to operate in append-only mode or honor removals and replacements.

On disk layout

This section is advisory.

The file names should only be seen as a hint and implementations should be prepared to remap them to a different directory layout due to potential file name collisions.

This already is the case with regular torrents, but when managing multiple feed revisions and archives in a shared directory structure extra care has to be taken to avoid accidental content loss, e.g. when an updated feed contains a feed item of the same name but with different content.

Feeds of Feeds

Feed items themselves may be feed torrents. This allows feeds to act as updatable index of more specialized feeds. In other words, it's possible for feeds to link to each other.

Examples

The following graph shows how a feed might evolve over time:

+---------------+
| Rev 1: 0 - 30 |
+---------------+
        ^
        |
+-----------------+
| Rev 2: 0 - 1000 |
+-----------------+
        ^
        |
+-------------------+   +------------------+
| Rev 3: 501 - 1500 |-->| Archive 0 - 500  |
+-------------------+   +------------------+
        ^                      ^
        |                      |
+-------------------+          |
| Rev 4: 501 - 2000 |----------+
+-------------------+
        ^
        |
        .
        .
        .
        |
+----------------------+   +-----------------------+   +-------------------+
| Rev N: 50000 - 51000 |-->| Archive 25000 - 49999 |-->| Archive 0 - 24999 |
+----------------------+   +-----------------------+   +-------------------+

References

[1]BEP_0035. Torrent Signing (http://www.bittorrent.org/beps/bep_0035.html)
[2]BEP_0036. Torrent RSS feeds (http://www.bittorrent.org/beps/bep_0036.html)
[3]BEP_0039. Updating Torrents Via Feed URL. (http://www.bittorrent.org/beps/bep_0039.html)
[4]BEP_0046. Updating Torrents Via DHT Mutable Items. (http://www.bittorrent.org/beps/bep_0046.html)
[5](1, 2) BEP_0047. Padding files and extended file attributes. (http://www.bittorrent.org/beps/bep_0047.html)