X-Git-Url: https://git.mxchange.org/?p=quix0rs-apt-p2p.git;a=blobdiff_plain;f=TODO;h=bd8283c904d6ed372c9a84136ce204422bb6e183;hp=dd9524a40f3ce417eba51814d8430bb5d34372b1;hb=21f3067f2e5f694c696835f5eceab0eba5c3d479;hpb=136c82d6c138402e93bdefb499aed967ba059385

diff --git a/TODO b/TODO
index dd9524a..bd8283c 100644
--- a/TODO
+++ b/TODO
@@ -1,9 +1,51 @@
-Add all cache files to the database.
-
-All files in the cache should be added to the database, so that they can
-be checked to make sure nothing has happened to them. The database would
-then need a flag to indicate files that are hashed and available, but
-that shouldn't be added to the DHT.
+Rotate DNS entries for mirrors more reliably.
+
+Currently the mirrors are accessed by DNS name, which can cause some
+issues when there are mirror differences and the DNS gets rotated.
+Instead, the HTTP Downloader should handle DNS lookups itself, store
+the resulting addresses, and send requests to IP addresses. If there
+is an error from the mirror (hash check failure or 404 response), the
+next IP address in the rotation should be used.
+
+
+Use GPG signatures as a hash for files.
+
+A detached GPG signature, such as is found in Release.gpg, can be used
+as a hash for the file. This hash can be used to verify the file when
+it is downloaded, and a shortened version can be added to the DHT to
+look up peers for the file. To get the hash into a binary form from
+the ASCII-armored detached file, use the command
+'gpg --no-options --no-default-keyring --output - --dearmor -'. The
+hash should be stored as the reverse of the resulting binary string,
+as the bytes at the beginning are headers that are the same for most
+signatures. That way the shortened hash stored in the DHT will have a
+better chance of being unique and being stored on different peers. To
+verify a file, first the binary hash must be re-reversed, armored, and
+written to a temporary file with the command
+'gpg --no-options --no-default-keyring --output $tempfile --enarmor -'.
+Then the incoming file can be verified with the command
+'gpg --no-options --no-default-keyring --keyring /etc/apt/trusted.gpg
+--verify $tempfile -'.
+
+All communication with the command-line gpg should be done using pipes
+and the Python module python-gnupginterface. There needs to be a new
+module for GPG verification and hashing, which will make this easier.
+In particular, it would need to support hashlib-like functionality
+such as new(), update(), and digest(). Note that the verification
+would not involve signing the file again and comparing the signatures,
+as this is not possible. Instead, the verify() function would have to
+behave differently for GPG hashes, and check that the verification
+resulted in a VALIDSIG. CAUTION: the detached signature can have a
+variable length; it is usually 65 bytes, but 64 bytes has also been
+observed.
+
+
+Consider what happens when multiple requests for a file are received.
+
+When another request comes in for a file already being downloaded,
+the new request should wait for the old one to finish. This should
+also be done for multiple requests for peer downloads of files with
+the same hash.
 
 
 Packages.diff files need to be considered.
@@ -15,6 +57,42 @@ distributions. They need to be dealt with properly by adding them to
 the tracking done by the AptPackages module.
 
+
+Improve the estimation of the total number of nodes.
+
+The current total nodes estimation is based on the number of buckets.
+A better way is to look at the average inter-node spacing for the K
+closest nodes after a find_node/value completes. Be sure to measure
+the inter-node spacing in log2 space to dampen any ill effects. This
+can be used in the formula:
+    nodes = 2^160 / 2^(average of log2 spacing)
+The average should also be saved using an exponentially weighted
+moving average (of the log2 distance) over separate find_node/value
+actions to get a better calculation over time.
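+
+For the 'Improve the estimation of the total number of nodes' item
+above, a minimal sketch of how the running estimate could be kept.
+The class name, K, and the smoothing factor ALPHA are assumptions
+made for illustration, not values taken from the code:
+
+    from math import log
+
+    K = 8          # how many of the closest nodes to examine (assumed)
+    ALPHA = 0.1    # EWMA smoothing factor (assumed)
+
+    class NodeEstimator(object):
+        """Estimate the DHT size from the spacing of lookup results."""
+
+        def __init__(self):
+            self.avg_log_spacing = None   # EWMA of the average log2 spacing
+
+        def record(self, node_ids):
+            """node_ids: integer IDs of the closest nodes returned by
+            one find_node/find_value action."""
+            ids = sorted(node_ids)[:K]
+            # Average the inter-node spacing in log2 space to dampen
+            # the effect of outliers.
+            logs = [log(b - a, 2) for a, b in zip(ids, ids[1:]) if b > a]
+            if not logs:
+                return
+            sample = sum(logs) / len(logs)
+            if self.avg_log_spacing is None:
+                self.avg_log_spacing = sample
+            else:
+                # Exponentially weighted moving average over separate
+                # lookup actions.
+                self.avg_log_spacing = (ALPHA * sample +
+                                        (1 - ALPHA) * self.avg_log_spacing)
+
+        def num_nodes(self):
+            if self.avg_log_spacing is None:
+                return 0
+            # nodes = 2^160 / 2^(average of log2 spacing)
+            return int(2 ** (160 - self.avg_log_spacing))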
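+
+For the 'Consider what happens when multiple requests for a file are
+received' item above, a minimal sketch of the coalescing, assuming the
+downloads are driven by Twisted deferreds as in the rest of the code.
+The class and attribute names are only illustrative, not existing
+APIs:
+
+    from twisted.internet import defer
+    from twisted.python import failure
+
+    class DownloadPool(object):
+        """Coalesce concurrent requests for the same file hash."""
+
+        def __init__(self, fetch_func):
+            # fetch_func starts the real download and returns a Deferred
+            self.fetch_func = fetch_func
+            self.waiting = {}     # hash -> list of waiting Deferreds
+
+        def get(self, file_hash):
+            if file_hash in self.waiting:
+                # Already downloading: wait for the first request.
+                d = defer.Deferred()
+                self.waiting[file_hash].append(d)
+                return d
+            self.waiting[file_hash] = []
+            d = self.fetch_func(file_hash)
+            d.addBoth(self._finished, file_hash)
+            return d
+
+        def _finished(self, result, file_hash):
+            # Pass the result (or the failure) on to every queued request.
+            for d in self.waiting.pop(file_hash, []):
+                if isinstance(result, failure.Failure):
+                    d.errback(result)
+                else:
+                    d.callback(result)
+            return result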
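+
+For the 'Use GPG signatures as a hash for files' item above, a rough
+sketch of the hashlib-like module. It shells out to gpg with
+subprocess and the commands quoted in that item rather than going
+through python-gnupginterface as planned, the file to check is passed
+by path rather than on stdin, and the names GPGHash and verify_file
+are only illustrative:
+
+    import os
+    import subprocess
+    import tempfile
+
+    GPG = ['gpg', '--batch', '--yes', '--no-options', '--no-default-keyring']
+
+    class GPGHash(object):
+        """Treat a detached GPG signature (e.g. Release.gpg) as a hash."""
+
+        def __init__(self, data=b''):
+            self.armored = data
+
+        def update(self, data):
+            # Accumulate the ASCII-armored detached signature.
+            self.armored += data
+
+        def digest(self):
+            # Dearmor the signature and reverse the bytes, so that the
+            # header bytes common to most signatures end up at the tail
+            # and the shortened hash stored in the DHT stays unique.
+            p = subprocess.Popen(GPG + ['--output', '-', '--dearmor', '-'],
+                                 stdin=subprocess.PIPE,
+                                 stdout=subprocess.PIPE)
+            binary, _ = p.communicate(self.armored)
+            return binary[::-1]
+
+    def verify_file(digest, path, keyring='/etc/apt/trusted.gpg'):
+        """Check that the file at path verifies against the reversed
+        binary signature in digest."""
+        fd, sig = tempfile.mkstemp()
+        os.close(fd)
+        try:
+            # Re-reverse and re-armor the signature into a temporary file.
+            p = subprocess.Popen(GPG + ['--output', sig, '--enarmor', '-'],
+                                 stdin=subprocess.PIPE)
+            p.communicate(digest[::-1])
+            # Verify the file against it, checking for VALIDSIG in the
+            # machine-readable status output.
+            p = subprocess.Popen(GPG + ['--keyring', keyring,
+                                        '--status-fd', '1',
+                                        '--verify', sig, path],
+                                 stdout=subprocess.PIPE)
+            status, _ = p.communicate()
+            return b'VALIDSIG' in status
+        finally:
+            os.unlink(sig)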
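+
+And for the 'Rotate DNS entries for mirrors more reliably' item above,
+the address handling might be as simple as the following
+getaddrinfo-based class (the name MirrorAddresses is illustrative):
+
+    import socket
+
+    class MirrorAddresses(object):
+        """Resolve a mirror's hostname once and rotate through its IPs."""
+
+        def __init__(self, host, port=80):
+            self.host = host
+            self.port = port
+            # Keep every address the name resolves to, not just the first.
+            infos = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
+            self.addresses = [info[4][0] for info in infos]
+            self.current = 0
+
+        def address(self):
+            return self.addresses[self.current]
+
+        def rotate(self):
+            # Called after a hash check failure or 404 from the current IP.
+            self.current = (self.current + 1) % len(self.addresses)
+            return self.address()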
+
+
+Improve the downloaded and uploaded data measurements.
+
+There are two places where this data is measured: for statistics, and
+for limiting the upload bandwidth. Both have deficiencies, as they
+sometimes miss the headers or the requests sent out. The upload
+bandwidth calculation only considers the stream in the upload and not
+the headers sent, and it also doesn't consider the upload bandwidth
+from requesting downloads from peers (though that may be a good thing).
+The statistics calculations for downloads include the headers of
+downloaded files, but not the requests received from peers for upload
+files. The statistics for uploaded data only include the files sent
+and not the headers, and also miss the requests for downloads sent to
+other peers.
+
+
+Rehash changed files instead of removing them.
+
+When the modification time of a file changes but the size does not,
+the file could be rehashed to verify it is the same, instead of
+automatically removing it. The DB would have to be modified to return
+deferreds from many of its functions.
+
+
 Consider storing deltas of packages.
 
 Instead of downloading full package files when a previous version of