X-Git-Url: https://git.mxchange.org/?p=quix0rs-apt-p2p.git;a=blobdiff_plain;f=TODO;h=49a33d8fc8227cc202a32a6e072f68ea6be12283;hp=cb20020d29f1a73c085f0cacb0d910df31fc7b67;hb=9a8119cf7bb5dbdea853a694c84aee7e638aa287;hpb=3d6c833df60bcae0bb3aa809ea8c5813289e97f1

diff --git a/TODO b/TODO
index cb20020..49a33d8 100644
--- a/TODO
+++ b/TODO
@@ -1,13 +1,98 @@
-Missing Kademlia implementation details are needed.
-
-The current implementation is missing some important features, mostly
-focussed on storing values:
- - values need to be republished (every hour?)
- - original publishers need to republish values (every 24 hours)
- - when a new node is found that is closer to some values, replicate the
-   values there without deleting them
- - when a value lookup succeeds, store the value in the closest node
-   found that didn't have it
- - make the expiration time of a value exponentially inversely
-   proportional to the number of nodes between the current node and the
-   node closest to the value
+Add all cache files to the database.
+
+All files in the cache should be added to the database, so that they can
+be checked to make sure nothing has happened to them. The database would
+then need a flag to indicate files that are hashed and available, but
+that shouldn't be added to the DHT.
+
+
+Packages.diff files need to be considered.
+
+The Packages.diff/Index files contain hashes of Packages.diff/rred.gz
+files, which themselves contain diffs to the Packages files previously
+downloaded. Apt will request these files for the testing/unstable
+distributions. They need to be dealt with properly by
+adding them to the tracking done by the AptPackages module.
+
+
+Retransmit DHT requests before timeout occurs.
+
+Currently, only a single transmission to a peer is ever attempted. If
+that request is lost, a timeout will occur after 20 seconds, the peer
+will be declared unreachable and the action will move on to the next
+peer.
+Instead, try to resend the request periodically using exponential
+backoff to make sure that lost packets don't delay the action so much.
+For example, send the request, wait 2 seconds and send again, wait 4
+seconds and send again, wait 8 seconds (14 seconds have now passed) and
+then declare the host unreachable. The same TID should be used in each
+retransmission, so receiving multiple responses should not be a problem
+as the extra ones will be ignored.
+
+
+PeerManager needs to download large files from multiple peers.
+
+The PeerManager currently chooses a peer at random from the list of
+possible peers, and downloads the entire file from there. This needs to
+change if both a) the file is large (more than 512 KB), and b) there are
+multiple peers with the file. The PeerManager should then break up the
+large file into multiple pieces of size < 512 KB, and then send requests
+to multiple peers for these pieces.
+
+This can cause a problem with hash checking the returned data, as hashes
+for the pieces are not known. Any file that fails a hash check should be
+downloaded again, with each piece being downloaded from a different peer
+than it was previously. The peers are shifted by 1, so that if a peer
+previously downloaded piece i, it now downloads piece i+1, and the first
+piece is downloaded by the previous downloader of the last piece, or
+preferably a previously unused peer. As each piece is downloaded the
+running hash of the file should be checked to determine the place at
+which the file differs from the previous download.
+
+If the hash check then passes, then the peer who originally provided the
+differing piece can be blamed for the error. Otherwise, the peer who
+originally provided that piece is probably still at fault, since it is
+now providing a later piece. This doesn't work if the differing piece is
+the first piece, in which case it is downloaded from a 3rd peer, with
+consensus revealing the misbehaving peer.
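The piece splitting and the shift-by-1 retry described in the PeerManager
item can be sketched roughly as below. This is only an illustration: the
function names, the round-robin first assignment, and the 512 KB cap
(the text asks for pieces under 512 KB, so a smaller cap may be wanted)
are assumptions, not part of the existing PeerManager.

```python
# Hypothetical helpers; none of these names exist in apt-p2p.

def split_into_pieces(file_size, max_piece=512 * 1024):
    """Return (offset, length) ranges covering the file.

    Each range is at most max_piece bytes (assumed cap).
    """
    return [(start, min(max_piece, file_size - start))
            for start in range(0, file_size, max_piece)]


def initial_assignment(num_pieces, peers):
    """Assign pieces to peers round-robin (an assumed starting policy)."""
    return [peers[i % len(peers)] for i in range(num_pieces)]


def shifted_assignment(previous, unused_peers=()):
    """Shift the peers by one piece for a retry after a failed hash check.

    The peer that downloaded piece i now downloads piece i+1. The first
    piece goes to a previously unused peer if one is available, otherwise
    to the previous downloader of the last piece.
    """
    if not previous:
        return []
    first = unused_peers[0] if unused_peers else previous[-1]
    return [first] + list(previous[:-1])
```

For a 1200 KB file and three peers, split_into_pieces gives three
ranges, and shifted_assignment(["a", "b", "c"]) gives ["c", "a", "b"],
so every piece comes from a different peer on the second attempt.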
+
+
+Consider storing deltas of packages.
+
+Instead of downloading full package files when a previous version of
+the same package is available, peers could request a delta of the
+package to the previous version. This would only be done if the delta
+is significantly (>50%) smaller than the full package, and is not too
+large (absolutely). A peer that has a new package and an old one would
+add a list of deltas for the package to the value stored in the DHT.
+The delta information would specify the old version (by hash), the
+size of the delta, and the hash of the delta. A peer that has the same
+old package could then download the delta from the peer by requesting
+the hash of the delta. Alternatively, very small deltas could be
+stored directly in the DHT.
+
+
+Consider tracking security issues with packages.
+
+Since sharing information with others about what packages you have
+downloaded (and probably installed) is a possible security
+vulnerability, it would be advantageous to not share that information
+for packages that have known security vulnerabilities. This would
+require some way of obtaining a list of which packages (and versions)
+are vulnerable, which is not currently available.
+
+
+Consider adding peer characteristics to the DHT.
+
+Bad peers could be indicated in the DHT by adding a new value under a
+key that is the bitwise NOT of their ID (so they are guaranteed not to
+store it) indicating information about the peer. This could be bad
+votes on the peer, as otherwise a peer could add good info about itself.
+
+
+Consider adding pieces to the DHT instead of files.
+
+Instead of adding file hashes to the DHT, only piece hashes could be
+added. This would allow a peer to upload to other peers while it is
+still downloading the rest of the file. It is not clear that this is
+needed, since peers will not be uploading and downloading very much of
+the time.
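As a rough illustration of the last item, per-piece hashes could be
computed along these lines. The function name, the use of SHA-1, and
the 512 KB piece size (borrowed from the PeerManager item) are all
assumptions; the TODO does not say which hash or piece size would be
used.

```python
import hashlib


def piece_hashes(data, piece_size=512 * 1024):
    """Return one hex digest per piece of data (hypothetical helper).

    SHA-1 and the piece size are assumed here, not taken from the TODO.
    """
    return [hashlib.sha1(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]
```

Each digest could then be stored in the DHT as its own key, so a peer
holding only some pieces could already answer requests for them.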