X-Git-Url: https://git.mxchange.org/?p=quix0rs-apt-p2p.git;a=blobdiff_plain;f=TODO;h=0654dba2b87abfd22450402df4f2c7de704e42b9;hp=49a33d8fc8227cc202a32a6e072f68ea6be12283;hb=95453ed929f4b9faa3ddb4702ceea9d01784e0e6;hpb=9a8119cf7bb5dbdea853a694c84aee7e638aa287

diff --git a/TODO b/TODO
index 49a33d8..0654dba 100644
--- a/TODO
+++ b/TODO
@@ -1,9 +1,41 @@
-Add all cache files to the database.
-
-All files in the cache should be added to the database, so that they can
-be checked to make sure nothing has happened to them. The database would
-then need a flag to indicate files that are hashed and available, but
-that shouldn't be added to the DHT.
+Use GPG signatures as a hash for files.
+
+A detached GPG signature, such as is found in Release.gpg, can be used
+as a hash for the file. This hash can be used to verify the file when
+it is downloaded, and a shortened version can be added to the DHT to
+look up peers for the file. To get the hash into a binary form from
+the ASCII-armored detached file, use the command
+'gpg --no-options --no-default-keyring --output - --dearmor -'. The
+hash should be stored as the reverse of the resulting binary string,
+as the bytes at the beginning are headers that are the same for most
+signatures. That way the shortened hash stored in the DHT will have a
+better chance of being unique and being stored on different peers. To
+verify a file, the binary hash must first be re-reversed, armored, and
+written to a temporary file with the command
+'gpg --no-options --no-default-keyring --output $tempfile --enarmor -'.
+Then the incoming file can be verified with the command
+'gpg --no-options --no-default-keyring --keyring /etc/apt/trusted.gpg
+--verify $tempfile -'.
+
+All communication with the command-line gpg should be done using pipes
+and the Python module python-gnupginterface. There needs to be a new
+module for GPG verification and hashing, which will make this easier.
+In particular, it would need to support hashlib-like functionality
+such as new(), update(), and digest(). Note that the verification
+would not involve signing the file again and comparing the signatures,
+as this is not possible. Instead, the verify() function would have to
+behave differently for GPG hashes, and check that the verification
+resulted in a VALIDSIG. CAUTION: the detached signature can have a
+variable length; it is usually 65 bytes, but 64 bytes has also
+been observed.
+
+
+Consider what happens when multiple requests for a file are received.
+
+When another request comes in for a file already being downloaded,
+the new request should wait for the old one to finish. This should
+also be done for multiple requests for peer downloads of files with
+the same hash.
 
 
 Packages.diff files need to be considered.
@@ -15,45 +47,27 @@ distributions. They need to be dealt with properly by adding them to
 the tracking done by the AptPackages module.
 
 
-Retransmit DHT requests before timeout occurs.
-
-Currently, only a single transmission to a peer is ever attempted. If
-that request is lost, a timeout will occur after 20 seconds, the peer
-will be declared unreachable, and the action will move on to the next
-peer. Instead, try to resend the request periodically using exponential
-backoff to make sure that lost packets don't delay the action so much.
-For example, send the request, wait 2 seconds and send again, wait 4
-seconds and send again, wait 8 seconds (14 seconds have now passed) and
-then declare the host unreachable. The same TID should be used in each
-retransmission, so receiving multiple responses should not be a problem,
-as the extra ones will be ignored.
-
-
-PeerManager needs to download large files from multiple peers.
-
-The PeerManager currently chooses a peer at random from the list of
-possible peers, and downloads the entire file from there. This needs to
-change if both a) the file is large (more than 512 KB), and b) there are
-multiple peers with the file. The PeerManager should then break up the
-large file into multiple pieces of size < 512 KB, and then send requests
-to multiple peers for these pieces.
-
-This can cause a problem with hash checking the returned data, as hashes
-for the pieces are not known. Any file that fails a hash check should be
-downloaded again, with each piece being downloaded from different peers
-than it was previously. The peers are shifted by 1, so that if a peer
-previously downloaded piece i, it now downloads piece i+1, and the first
-piece is downloaded by the previous downloader of the last piece, or
-preferably by a previously unused peer. As each piece is downloaded, the
-running hash of the file should be checked to determine the place at
-which the file differs from the previous download.
-
-If the hash check then passes, the peer who originally provided the
-bad piece can be assessed blame for the error. Otherwise, the peer who
-originally provided the piece is probably at fault, since he is now
-providing a later piece. This doesn't work if the differing piece is the
-first piece, in which case it is downloaded from a third peer, with
-consensus revealing the misbehaving peer.
+Improve the downloaded and uploaded data measurements.
+
+There are two places where this data is measured: for statistics, and
+for limiting the upload bandwidth. Both have deficiencies, as they
+sometimes miss the headers or the requests sent out. The upload
+bandwidth calculation only considers the stream in the upload and not
+the headers sent, and it also doesn't consider the upload bandwidth
+used when requesting downloads from peers (though that may be a good
+thing). The statistics calculations for downloads include the headers
+of downloaded files, but not the requests received from peers for
+upload files. The statistics for uploaded data only include the files
+sent and not the headers, and also miss the requests for downloads
+sent to other peers.
+
+
+Rehash changed files instead of removing them.
+
+When the modification time of a file changes but the size does not,
+the file could be rehashed to verify it is the same, instead of
+automatically removing it. The DB would have to be modified to return
+Deferreds from a lot of its functions.
 
 
 Consider storing deltas of packages.
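
A few illustrative sketches for the entries above follow. For the
GPG-hash entry, the new module's hashlib-like interface might look
roughly like this. None of this code is from the repository:
GPGSignatureHash, new() and hash_from_armored_signature are
hypothetical names, and subprocess stands in for the
python-gnupginterface pipes the entry calls for.

    import subprocess

    def hash_from_armored_signature(armored_bytes):
        # Dearmor the detached signature to binary, then reverse the
        # bytes so the distinctive tail comes first (per the entry
        # above, the leading bytes are near-constant headers).
        gpg = subprocess.Popen(
            ['gpg', '--no-options', '--no-default-keyring',
             '--output', '-', '--dearmor'],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        binary, _ = gpg.communicate(armored_bytes)
        return binary[::-1]

    class GPGSignatureHash(object):
        # Hashlib-like object whose 'digest' is the reversed detached
        # signature rather than a computed digest.
        def __init__(self, reversed_sig):
            self._sig = reversed_sig
            self._chunks = []

        def update(self, data):
            self._chunks.append(data)

        def digest(self):
            return self._sig

        def verify(self, armored_sig_path,
                   keyring='/etc/apt/trusted.gpg'):
            # Pipe the accumulated file data to gpg and look for
            # VALIDSIG in the status output, since signing again and
            # comparing signatures is not possible.
            gpg = subprocess.Popen(
                ['gpg', '--no-options', '--no-default-keyring',
                 '--keyring', keyring, '--status-fd', '1',
                 '--verify', armored_sig_path, '-'],
                stdin=subprocess.PIPE, stdout=subprocess.PIPE)
            status, _ = gpg.communicate(b''.join(self._chunks))
            return b'VALIDSIG' in status

    def new(reversed_sig=None):
        return GPGSignatureHash(reversed_sig)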
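
For the entry on multiple requests for the same file, the usual
Twisted pattern is to keep one download per hash and hand each later
requester its own Deferred. A minimal sketch, where DownloadDedup and
start_download are hypothetical names:

    from twisted.internet import defer

    class DownloadDedup(object):
        # Coalesce concurrent requests for one hash onto a single
        # download; later requesters wait on their own Deferred.
        def __init__(self):
            self._waiters = {}  # file_hash -> list of Deferreds

        def request(self, file_hash, start_download):
            # start_download(file_hash) must return a Deferred that
            # fires with the downloaded file.
            if file_hash in self._waiters:
                d = defer.Deferred()
                self._waiters[file_hash].append(d)
                return d
            self._waiters[file_hash] = []
            d = start_download(file_hash)
            d.addBoth(self._finished, file_hash)
            return d

        def _finished(self, result, file_hash):
            # Hand the result (or Failure) to everyone waiting.
            for waiter in self._waiters.pop(file_hash):
                waiter.callback(result)
            return result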
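
The removed retransmission entry describes a concrete schedule (send
at 0, 2 and 6 seconds; give up at 14). With Twisted that might be
sketched as below, where send() is a hypothetical fire-and-forget
transmit that reuses the same TID, and the caller wires incoming
responses for that TID to the returned on_response:

    from twisted.internet import reactor, defer

    def send_with_backoff(send, request, delays=(2, 4, 8)):
        # Transmit at t=0, 2 and 6 seconds, then declare the host
        # unreachable at t=14; duplicate responses for the same TID
        # arriving after the first one are simply dropped.
        result = defer.Deferred()
        timers = []

        def on_response(response):
            for t in timers:
                if t.active():
                    t.cancel()
            if not result.called:
                result.callback(response)

        def step(remaining):
            if result.called:
                return
            if not remaining:
                result.errback(defer.TimeoutError('host unreachable'))
                return
            send(request)  # same TID on every retransmission
            timers.append(reactor.callLater(remaining[0], step,
                                            remaining[1:]))

        step(delays)
        return result, on_response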
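
The removed PeerManager entry's retry rule (shift peers by 1, so the
peer that fetched piece i now fetches piece i+1) is compact enough to
state directly. A sketch with a hypothetical assign_pieces helper:

    def assign_pieces(peers, num_pieces, attempt=0):
        # On attempt 0, piece i goes to peers[i % n]. Each retry
        # shifts the assignment by one, so whoever previously fetched
        # piece i now fetches piece i+1, and (when the piece count is
        # a multiple of the peer count) the first piece goes to the
        # previous downloader of the last piece.
        # e.g. assign_pieces(['A', 'B', 'C'], 6) ->
        #      {0:'A', 1:'B', 2:'C', 3:'A', 4:'B', 5:'C'}
        n = len(peers)
        return dict((i, peers[(i - attempt) % n])
                    for i in range(num_pieces))

Checking a running hash as pieces arrive then localizes the first
differing piece, and comparing the two assignments points at the
likely culprit, as the entry describes.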
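
And for the rehash entry, the check itself is small. A sketch that
assumes a SHA-1 hex digest is what the DB stores (an assumption, as
are the recheck_cached_file name and its arguments), using
deferToThread so callers get the Deferred the entry asks for:

    import os
    import hashlib
    from twisted.internet import defer
    from twisted.internet.threads import deferToThread

    def recheck_cached_file(path, known_size, known_mtime, known_sha1):
        # Returns a Deferred firing True if the file may stay in the
        # DB; size, mtime and SHA-1 are hypothetical DB columns.
        st = os.stat(path)
        if st.st_size != known_size:
            return defer.succeed(False)  # size changed: remove it
        if st.st_mtime == known_mtime:
            return defer.succeed(True)   # untouched: keep it

        def rehash():
            # mtime changed but size did not: rehash in a thread
            # instead of removing the file outright.
            h = hashlib.sha1()
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
            return h.hexdigest() == known_sha1

        return deferToThread(rehash)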