TODO

Comply with the newly defined protocol on the web page.

Various things need to be done to comply with the newly defined protocol:
 - use the compact encoding of contact information
 - remove the originated time from the value storage
 - add the token to find_node responses
 - use the token in store_node requests
 - standardize the error messages (especially for a bad token)
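A sketch of the compact contact encoding, assuming it uses the BitTorrent-style layout of a 4-byte IPv4 address followed by a 2-byte big-endian port. The exact layout should be checked against the protocol page; the function names are hypothetical:

```python
import socket
import struct


def compact_contact(ip: str, port: int) -> bytes:
    """Pack an IPv4 address and port into 6 bytes (assumed layout)."""
    return struct.pack("!4sH", socket.inet_aton(ip), port)


def uncompact_contact(data: bytes) -> tuple[str, int]:
    """Reverse of compact_contact: 6 bytes back to (ip, port)."""
    if len(data) != 6:
        raise ValueError("compact contact must be exactly 6 bytes")
    packed_ip, port = struct.unpack("!4sH", data)
    return socket.inet_ntoa(packed_ip), port


print(compact_contact("192.168.1.10", 9977).hex())  # → c0a8010a26f9
```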


Reduce the memory footprint by clearing the AptPackages caches.

The memory usage is a little high because the AptPackages caches are
kept loaded at all times. Instead, they should time out after a period
of inactivity (say 15 minutes) and unload themselves from memory. It
only takes a few seconds to reload them, so this should not be an issue.
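A minimal sketch of an inactivity timeout for a cache, assuming a loader callable that rebuilds the AptPackages data (IdleCache, loader and expire_if_idle are hypothetical names). The clock is injectable so the behaviour can be tested without waiting:

```python
import time
from typing import Any, Callable


class IdleCache:
    """Holds a lazily loaded object and unloads it after a period of inactivity."""

    def __init__(self, loader: Callable[[], Any], idle_timeout: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader              # hypothetical: rebuilds the cache
        self._idle_timeout = idle_timeout  # 15 minutes, as suggested above
        self._clock = clock
        self._value: Any = None
        self._loaded = False
        self._last_used = 0.0

    @property
    def loaded(self) -> bool:
        return self._loaded

    def get(self) -> Any:
        # Reload on demand if the cache was unloaded (or never loaded).
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        self._last_used = self._clock()
        return self._value

    def expire_if_idle(self) -> bool:
        """Unload if unused for idle_timeout seconds; returns True if unloaded."""
        if self._loaded and self._clock() - self._last_used >= self._idle_timeout:
            self._value = None
            self._loaded = False
            return True
        return False
```

In a Twisted program, expire_if_idle() could be called periodically from twisted.internet.task.LoopingCall.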


Packages.diff files need to be considered.

The Packages.diff/Index files contain hashes of Packages.diff/rred.gz
files, which themselves contain diffs to the Packages files previously
downloaded. Apt will request these files for the testing/unstable
distributions. They need to either be ignored, or dealt with properly by
adding them to the tracking done by the AptPackages module.
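If the files are tracked, the Index would need parsing. A sketch, assuming the usual Debian pdiff layout: a SHA1-Current line, then SHA1-History and SHA1-Patches fields whose indented lines hold a SHA1, a size and a patch name. Whether a given hash covers the compressed or uncompressed patch is not handled here and would need checking; parse_pdiff_index is a hypothetical name:

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[str, int]  # (sha1 hex digest, size in bytes)


def parse_pdiff_index(text: str) -> Tuple[Optional[Entry], Dict[str, Entry], Dict[str, Entry]]:
    """Parse a Packages.diff/Index file.

    Returns (current, history, patches), where history and patches map a
    patch name to its (sha1, size) entry.
    """
    current: Optional[Entry] = None
    sections: Dict[str, Dict[str, Entry]] = {"SHA1-History": {}, "SHA1-Patches": {}}
    section: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            # Continuation line "<sha1> <size> <name>" belonging to the open field.
            if section in sections:
                sha1, size, name = line.split()
                sections[section][name] = (sha1, int(size))
            continue
        field, _, value = line.partition(":")
        section = field.strip()  # unknown fields are remembered but ignored
        if section == "SHA1-Current":
            sha1, size = value.split()
            current = (sha1, int(size))
    return current, sections["SHA1-History"], sections["SHA1-Patches"]
```

Each patch name would then presumably correspond to a Packages.diff/<name>.gz file that the AptPackages module could track alongside the Packages file.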


PeerManager needs to download large files from multiple peers.

The PeerManager currently chooses a peer at random from the list of
possible peers, and downloads the entire file from there. This needs to
change if both a) the file is large (more than 512 KB), and b) there are
multiple peers with the file. The PeerManager should then break up the
large file into multiple pieces of size < 512 KB, and then send requests
to multiple peers for these pieces.
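A sketch of how the split could be planned, assuming a 256 KB piece size (the text only requires pieces smaller than 512 KB) and a simple round-robin assignment over a deterministic peer list in place of the current random choice; plan_download is a hypothetical name:

```python
from typing import List, Sequence, Tuple

LARGE_FILE = 512 * 1024  # files above this size get split (from the text)
PIECE_SIZE = 256 * 1024  # assumed piece size; the text only requires < 512 KB


def plan_download(file_size: int, peers: Sequence[str],
                  piece_size: int = PIECE_SIZE) -> List[Tuple[int, int, str]]:
    """Return (start, end, peer) byte ranges, with end exclusive.

    Small files, or files held by a single peer, are fetched whole from
    peers[0] (standing in for the current random choice).
    """
    if not peers:
        raise ValueError("no peers have the file")
    if file_size <= LARGE_FILE or len(peers) < 2:
        return [(0, file_size, peers[0])]
    plan = []
    for index, start in enumerate(range(0, file_size, piece_size)):
        end = min(start + piece_size, file_size)
        plan.append((start, end, peers[index % len(peers)]))  # round-robin
    return plan
```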

This complicates hash checking of the returned data, as the hashes of
the individual pieces are not known. Any file that fails a hash check
should be downloaded again, with each piece coming from a different peer
than it did previously. The peers are shifted by 1, so that if a peer
previously downloaded piece i, it now downloads piece i+1, and the first
piece is downloaded by the previous downloader of the last piece, or
preferably by a previously unused peer. As each piece is downloaded, the
running hash of the file should be compared with that of the previous
download to determine the point at which the two downloads differ.
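The re-download rules above can be sketched as follows; shift_peers, running_digests and first_divergence are hypothetical helpers, and SHA1 is assumed for the running hash:

```python
import hashlib
from typing import List, Optional, Sequence


def shift_peers(previous: Sequence[str], unused: Sequence[str] = ()) -> List[str]:
    """Reassign pieces so the peer that served piece i now serves piece i+1.

    Piece 0 goes to a previously unused peer if there is one, otherwise to
    the peer that previously served the last piece.
    """
    first = unused[0] if unused else previous[-1]
    return [first] + list(previous[:-1])


def running_digests(pieces: Sequence[bytes]) -> List[bytes]:
    """SHA1 of the file prefix after each piece (the running hash)."""
    h = hashlib.sha1()
    digests = []
    for piece in pieces:
        h.update(piece)
        digests.append(h.copy().digest())
    return digests


def first_divergence(old: Sequence[bytes], new: Sequence[bytes]) -> Optional[int]:
    """Index of the first piece at which two running hashes differ, or None."""
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return i
    return None
```

Because the running hashes agree up to piece i-1, the first mismatch marks the first piece whose contents differ (barring a hash collision).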

If the hash check then passes, the peer who originally provided the bad
piece can be blamed for the error. Otherwise, the peer who originally
provided the differing piece is probably still at fault, since he is now
providing a later piece. This does not work if the differing piece is
the first piece; in that case it is downloaded from a third peer, and
consensus reveals the misbehaving peer.
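A sketch of the blame rules, assuming the second download used the shifted assignment described earlier and that consensus means a strict majority among the piece-0 hashes from three or more peers (both function names are hypothetical):

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def blame_peer(divergent_piece: int, original_peers: Sequence[str],
               passed: bool) -> Optional[Tuple[str, str]]:
    """Blame the original provider of the first differing piece.

    Returns (peer, "certain") if the re-download passed its hash check,
    (peer, "probable") if it failed, or None for piece 0, which needs a
    third download and a consensus check instead.
    """
    if divergent_piece == 0:
        return None
    return original_peers[divergent_piece], ("certain" if passed else "probable")


def consensus_blame(piece0_hashes: Dict[str, bytes]) -> List[str]:
    """Return the peers whose piece-0 hash disagrees with a strict majority."""
    majority, count = Counter(piece0_hashes.values()).most_common(1)[0]
    if count <= len(piece0_hashes) // 2:
        return []  # no strict majority, so no one can be blamed
    return sorted(peer for peer, h in piece0_hashes.items() if h != majority)
```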


Store and share torrent-like strings for large files.

In addition to storing the file download location (which would still be
used for small files), a bencoded dictionary containing the peer's
hashes of the individual pieces could be stored for the larger files
(20% of all the files are larger than 512 KB). This dictionary would
have the normal piece size, the hash length, and a string containing the
piece hashes of length <hash length>*<#pieces>. These piece hashes could
be compared ahead of time to determine which peers have the same piece
hashes (they all should), and then used during the download to verify
the downloaded pieces.
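A sketch of building the torrent-like string, with a minimal bencoder included so the example is self-contained. The dictionary keys (piece_size, hash_len, pieces) are hypothetical, since the text only names the fields, and 20-byte SHA1 piece hashes are assumed:

```python
import hashlib


def bencode(obj) -> bytes:
    """Minimal bencoder for ints, strings (str or bytes), lists and dicts."""
    if isinstance(obj, int):
        return b"i%de" % obj
    if isinstance(obj, str):
        obj = obj.encode()
    if isinstance(obj, bytes):
        return b"%d:%s" % (len(obj), obj)
    if isinstance(obj, list):
        return b"l" + b"".join(bencode(x) for x in obj) + b"e"
    if isinstance(obj, dict):
        # Keys are byte strings, emitted in sorted raw-byte order.
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in obj.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(obj).__name__}")


def torrent_string(data: bytes, piece_size: int) -> bytes:
    """Bencoded dict of piece size, hash length and concatenated piece hashes."""
    hashes = b"".join(hashlib.sha1(data[i:i + piece_size]).digest()
                      for i in range(0, len(data), piece_size))
    # Hypothetical key names; len(hashes) == 20 * number of pieces.
    return bencode({"piece_size": piece_size, "hash_len": 20, "pieces": hashes})
```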

For very large files (5 or more pieces), the torrent strings are too
long to store in the DHT and retrieve (a single UDP packet should be
less than 1472 bytes to avoid fragmentation). Instead, the peers should
store the torrent-like string for large files separately, and only
include a reference to it in their stored value for the hash of the
file. The reference would be a hash of the bencoded dictionary. If the
torrent-like string is short enough to store in the DHT (i.e. less than
1472 bytes, or about 70 pieces for the SHA1 hash), then a lookup of
that hash in the DHT would return the torrent-like string. Otherwise, a
request to the peer for the hash (made just as files are downloaded)
should return the bencoded torrent-like string.
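With 20-byte SHA1 hashes, 70 pieces already need 70 * 20 = 1400 bytes, which is where the "about 70 pieces" limit comes from once the dictionary framing is added. A sketch of the reference and the storage decision, assuming the 1472-byte limit is applied to the bencoded string alone (real DHT messages add their own overhead); the names are hypothetical:

```python
import hashlib

MAX_PACKET = 1472  # bytes; keep a single UDP packet below this to avoid fragmentation


def torrent_reference(bencoded: bytes) -> bytes:
    """Reference stored under the file's hash: the SHA1 of the bencoded dict."""
    return hashlib.sha1(bencoded).digest()


def fetch_method(bencoded: bytes) -> str:
    """'dht' if the string is short enough to store in the DHT, else 'peer'."""
    return "dht" if len(bencoded) < MAX_PACKET else "peer"
```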


PeerManager needs to track peers' properties.

The PeerManager needs to keep track of the observed properties of seen
peers, to help determine the selection criteria for choosing peers to
download from. Each property will give a value from 0 to 1. The relevant
properties are:

 - hash errors in the last day (1 = 0 errors, 0 = 3+ errors)
 - recent download speed (1 = fastest, 0 = 0)
 - lag time from request to download (1 = 0s, 0 = 15s+)
 - number of pending requests (1 = 0, 0 = max (10))
 - whether a connection is open (1 = yes, 0.9 = no)

These should be combined (multiplied) to provide a sort order for peers
available to download from, which can then be used to assign new
downloads to peers. Pieces should be downloaded from the best peers
first (i.e. piece 0 from the absolute best peer).
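A sketch of the scoring, assuming each property falls off linearly between the endpoints listed above (the text does not specify the curve) and that speed is normalized by the fastest peer currently known. PeerStats, peer_score and rank_peers are hypothetical names:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PeerStats:
    name: str
    hash_errors_last_day: int
    recent_speed: float      # bytes/s
    lag_seconds: float       # time from request to start of download
    pending_requests: int
    connection_open: bool


def _falloff(value: float, zero_at: float) -> float:
    """1.0 at value 0, falling linearly (an assumption) to 0.0 at zero_at."""
    return max(0.0, 1.0 - value / zero_at)


def peer_score(p: PeerStats, fastest_speed: float) -> float:
    speed = p.recent_speed / fastest_speed if fastest_speed > 0 else 1.0
    return (_falloff(p.hash_errors_last_day, 3)
            * speed
            * _falloff(p.lag_seconds, 15)
            * _falloff(p.pending_requests, 10)
            * (1.0 if p.connection_open else 0.9))


def rank_peers(peers: Sequence[PeerStats]) -> List[PeerStats]:
    """Best peer first, by the product of the property scores."""
    fastest = max((p.recent_speed for p in peers), default=0.0)
    return sorted(peers, key=lambda p: peer_score(p, fastest), reverse=True)
```

Piece 0 would then go to the first peer returned by rank_peers, piece 1 to the next, and so on. Under this mapping a peer with no measured speed scores 0, so new peers may need a default speed.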


When looking up values, the DHT should return nodes and values.

When a key has multiple values in the DHT, returning only the stored
values may not be sufficient, as the querying peer then has no further
nodes to contact for more values. Instead, return both the stored values
and the list of closest nodes so that the peer doing the lookup can
decide when to stop looking (when it has received enough values).

Instead of returning both, a new method could be added, "lookup_value".
This method will be like "get_value", except that every node will always
return a list of nodes, as well as the number of values it has for that
key. Once a querying node has found enough values (or all of them), it
would send the "get_value" method to the nodes that have the most
values. The "get_value" query could also have a new parameter "number",
which is the maximum number of values to return.
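A sketch of the second phase of the proposed "lookup_value" scheme: given the value counts reported by the nodes, pick the nodes with the most values and the "number" to request from each. It assumes the values held by different nodes are distinct; since values are normally replicated, a real client would deduplicate the results and fall back to further nodes. plan_get_value is a hypothetical name:

```python
from typing import Dict, List, Tuple


def plan_get_value(counts: Dict[str, int], wanted: int) -> List[Tuple[str, int]]:
    """Choose (node, number) pairs for "get_value" after a "lookup_value" pass.

    counts maps a node id to the number of values it reported for the key.
    Nodes with the most values are asked first (ties broken by node id),
    and no more than `wanted` values are requested in total.
    """
    plan = []
    remaining = wanted
    for node, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining <= 0:
            break
        if count == 0:
            continue
        take = min(count, remaining)
        plan.append((node, take))  # send get_value with number=take
        remaining -= take
    return plan
```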


Missing Kademlia implementation details are needed.

The current implementation is missing some important features, mostly
focused on storing values:
 - values need to be republished (every hour?)