1 Files for which a hash cannot be found should not be added to the DHT.
3 If the hash can't found, it stands to reason that other peers will not
4 be able to find the hash either. So adding those files to the DHT will
5 just clutter it with useless information. Examples include Release.gpg,
6 Release, Translation-de.bz2, and Contents.gz.
9 Use python-debian for parsing RFC 822 files.
11 There are already routines for parsing these files, so there is no need
12 to write more. In the AptPackages, change the Release file parsing to
13 use the python-debian routines.
16 Packages.diff files need to be considered.
18 The Packages.diff/Index files contain hashes of Packages.diff/rred.gz
19 files, which themselves contain diffs to the Packages files previously
20 downloaded. Apt will request these files for the testing/unstable
21 distributions. They need to either be ignored, or dealt with properly by
22 adding them to the tracking done by the AptPackages module.
25 Change file identifier from path to hash.
27 Some files can change without changing the path, since the file was
28 added to the DHT by the peer. Examples are Release, Packages.gz, and
29 Sources.bz2. This would cause problems when requesting these files by
30 path. Instead, share the files by hash, then the request would be for
31 http://127.3.45.9:9977/~<urlencodedHash>, and it would always work. This
32 will require a database lookup for every request.
35 PeerManager needs to download large files from multiple peers.
37 The PeerManager currently chooses a peer at random from the list of
38 possible peers, and downloads the entire file from there. This needs to
39 change if both a) the file is large (more than 512 KB), and b) there are
40 multiple peers with the file. The PeerManager should then break up the
41 large file into multiple pieces of size < 512 KB, and then send requests
42 to multiple peers for these pieces.
44 This can cause a problem with hash checking the returned data, as hashes
45 for the pieces are not known. Any file that fails a hash check should be
46 downloaded again, with each piece being downloaded from different peers
47 than it was previously. The peers are shifted by 1, so that if a peers
48 previously downloaded piece i, it now downloads piece i+1, and the first
49 piece is downloaded by the previous downloader of the last piece, or
50 preferably a previously unused peer. As each piece is downloaded the
51 running hash of the file should be checked to determine the place at
52 which the file differs from the previous download.
54 If the hash check then passes, then the peer who originally provided the
55 bad piece can be assessed blame for the error. Otherwise, the peer who
56 originally provided the piece is probably at fault, since he is now
57 providing a later piece. This doesn't work if the differing piece is the
58 first piece, in which case it is downloaded from a 3rd peer, with
59 consensus revealing the misbehaving peer.
62 Store and share torrent-like strings for large files.
64 In addition to storing the file download location (which would still be
65 used for small files), a bencoded dictionary containing the peer's
66 hashes of the individual pieces could be stored for the larger files
67 (20% of all the files are larger than 512 KB). This dictionary would
68 have the normal piece size, the hash length, and a string containing the
69 piece hashes of length <hash length>*<#pieces>. These piece hashes could
70 be compared ahead of time to determine which peers have the same piece
71 hashes (they all should), and then used during the download to verify
72 the downloaded pieces.
74 For very large files (5 or more pieces), the torrent strings are too
75 long to store in the DHT and retrieve (a single UDP packet should be
76 less than 1472 bytes to avoid fragmentation). Instead, the peers should
77 store the torrent-like string for large files separately, and only
78 contain a reference to it in their stored value for the hash of the
79 file. The reference would be a hash of the bencoded dictionary. If the
80 torrent-like string is short enough to store in the DHT (i.e. less than
81 1472 bytes, or about 70 pieces for the SHA1 hash), then a
82 lookup of that hash in the DHT would give the torrent-like string.
83 Otherwise, a request to the peer for the hash (just like files are
84 downloaded), should return the bencoded torrent-like string.
87 PeerManager needs to track peers' properties.
89 The PeerManager needs to keep track of the observed properties of seen
90 peers, to help determine a selection criteria for choosing peers to
91 download from. Each property will give a value from 0 to 1. The relevant
94 - hash errors in last day (1 = 0, 0 = 3+)
95 - recent download speed (1 = fastest, 0 = 0)
96 - lag time from request to download (1 = 0, 0 = 15s+)
97 - number of pending requests (1 = 0, 0 = max (10))
98 - whether a connection is open (1 = yes, 0.9 = no)
100 These should be combined (multiplied) to provide a sort order for peers
101 available to download from, which can then be used to assign new
102 downloads to peers. Pieces should be downloaded from the best peers
103 first (i.e. piece 0 from the absolute best peer).
106 Missing Kademlia implementation details are needed.
108 The current implementation is missing some important features, mostly
109 focussed on storing values:
110 - values need to be republished (every hour?)
111 - original publishers need to republish values (every 24 hours)
112 - when a new node is found that is closer to some values, replicate the
113 values there without deleting them
114 - when a value lookup succeeds, store the value in the closest node
115 found that didn't have it
116 - make the expiration time of a value exponentially inversely
117 proportional to the number of nodes between the current node and the
118 node closest to the value