-that fetches packages to be installed from a network of mirrors. The
-Debian project \cite{debian} (and other Debian-based distributions)
-uses the \texttt{apt} program, which downloads Debian packages in
-the \texttt{.deb} format from one of many mirrors. The program will
-first download index files that contain a listing of which packages
-are available, as well as important information such as their size,
-location, and a hash of their content. Red Hat's Fedora project
-\cite{fedora} (and other RPM-based distributions) use the
-\texttt{yum} program to obtain RPMs, and Gentoo \cite{gentoo} uses
-\texttt{portage} in a similar way.
-
-Other software vendors also use a similar system. CPAN \cite{cpan}
-distributes files containing software packages for the PERL
-programming language, using SOAP RPC requests to find and download
-files. Cygwin \cite{cygwin} provides many of the
-standard Unix/Linux tools in a Windows environment, using a
-package management tool that requests packages from websites. There
-are two software distribution systems for Mac OSX, fink and
-MacPorts, that also retrieve packages in this way.
-
-Also, some systems use direct web downloads, but with a hash
-verification file also available for download next to the desired
-file. These hash files usually have the same file name, but with an
-added extension identifying the hash used (e.g. \texttt{.md5} for
-the MD5 hash).
-
-Finally, there are other programs that use cryptographic hashing to
-identify files. Git is a version control system in which all files,
-commits, and tags, are identified by their SHA1 hash. These hashes
-are used to verify the origin of these items, but are also used when
-clients update their local files with new remote information.
-
-The important things to note for each of the systems mentioned
-is that they all have the following in
-common:
-\begin{itemize}
- \item The content is avaliable for anyone to freely download.
- \item The content is divided into distinct units (packages), each of which contains
- a small independent part of all the content.
- \item Users are typically not interested in downloading all of the content
- available.
- \item Hashes of the content and its size are available before the
- download is attempted.
- \item Requests for the content are sent by a tool, not directly by
- the user (though the tool is responding to requests from the user).
-\end{itemize}
-
-We also expect that there are a number of users of these systems
-that are motivated by altruism to want to help out with this
-distribution. This is common in these systems due to the free nature
-of the content being delivered, which encourages some to want to
-help out in some way. A number of the systems are also used by
-groups that are staffed mostly, or sometimes completely, by
-volunteers. This also encourages users to want to \emph{give back}
-to the volunteer community that has created the content they are
-using.
-
-Although at first it might seem that a single all-reaching solution
-is possible for this situation, there are some differences in each
-system that require independent solutions. The systems all use
-different tools for their distribution, so any implementation would
-have to be specific to the tool it is to integrate with. In
-particular, how to obtain requests from the tool or user, and how to
-find a hash that corresponds to the file being requested, is very
-specific to each system.
-
-Also, there may be no benefit in creating a single large solution to
-integrate all these problems. For one, though they sometimes
-distribute nearly identical content (e.g. the same software
-available in multiple Linux distributions), it is not exactly
-identical and has usually been tailored to the system. The small
-differences will change the hash of the files, and so
-will make it impossible to distribute similar content
-across systems. And, although peer-to-peer systems scale very well
-with the number of peers in the system, there is some overhead
-involved, so having a much larger system of peers would mean that
-requests could take longer to complete.
-
-The situation presents a clear
-opportunity to use some form of peer-to-peer file-sharing protocol.
-This sparse interest in a large number of packages, and constant
-updating, is well suited to the functionality provided by a
-Distributed Hash Table (DHT). DHTs require unique keys to store and
-retrieve strings of data, for which the cryptographic hashes used by
-these package management systems are perfect for. The stored and
-retrieved strings can then be pointers to the peers that have the
-content package that hashes to that key. A downloading peer can
-lookup the package hash in the DHT and, if it is found,
-download the file from those peers and verify the content. If the
-package hash can not be found in the DHT, the peer will fallback to
-downloading from the original content location (i.e. the network of
-mirrors), and once complete will add a new entry to the DHT
-indicating that it has the content.
-
-There are several ways to implement the desired P2P functionality
-into the existing package management software. The functionality can
-be directly integrated into the software, though this can be
-difficult as the DHT should be running at all times, both for
-efficient lookups and to support uploading of already downloaded
-content, whereas the tools typically only run until the download request is complete.
-Many of the package management software implementations use
-HTTP requests to download the files, which makes it possible to
-implement the P2P aspect as a standard HTTP caching proxy, which
-will get uncached requests first from the P2P system, and then
-fallback to the normal HTTP request from a server. For methods that
-don't use HTTP requests, other types of proxies may also be
-possible.
-
-Downloading a file efficiently from a number of peers is where
+that fetches packages to be installed from a network of mirrors.
+Debian-based distributions use the \texttt{apt} program, which
+downloads Debian packages from one of many mirrors. RPM-based
+distributions use \texttt{yum}, and Gentoo uses \texttt{portage},
+which both operate in a similar way. Other free software
+distributors also use a similar system: CPAN distributes files
+containing software packages for the Perl programming language,
+using SOAP RPC requests to find and download files; Cygwin provides
+many of the standard Unix/Linux tools in a Windows environment,
+using a package management tool that requests packages from
+websites; and two software distribution systems for Mac OS X, Fink
+and MacPorts, also retrieve packages in this way. Also,
+some systems use direct web downloads, but with a hash verification
+file also available for download next to the desired file. These
+hash files usually have the same file name, but with an added
+extension identifying the hash used (e.g. \texttt{.md5} for the MD5
+hash). Finally, other programs use cryptographic hashing to identify
+files: for example, Git is a version control system in which all
+files and changes are identified by their SHA-1 hash.
+
+These systems all share some commonalities: the content is available
+for anyone to freely download, the content is divided into distinct
+units (packages), users are typically not interested in downloading
+all of the content available, and hashes of the content are
+available before the download is attempted. We also expect that a
+number of users of these systems are motivated by altruism to help
+with this distribution. Such motivation is common in these systems
+because the content being delivered is free.
+
+% Although at first it might seem that a single all-reaching solution
+% is possible for this problem, there are some differences in each
+% system that require independent solutions. The systems all use
+% different tools for their distribution, so any implementation would
+% have to be specific to the tool it is to integrate with. In
+% particular, how to obtain requests from the tool or user, and how to
+% find a hash that corresponds to the file being requested, is very
+% specific to each system. Also, there may be no benefit in creating a
+% single large solution to integrate all these systems since, even
+% though they sometimes distribute nearly identical content, it is not
+% identical as it has been tailored to the system, which will change
+% the hash of the files and so will make it impossible to distribute
+% similar content across systems.
+
+\section{Solution}
+
+This situation presents a clear opportunity to use some form of
+peer-to-peer file-sharing protocol. The sparse interest in a large
+number of packages undergoing constant updating is well suited to
+the functionality provided by a Distributed Hash Table (DHT). DHTs
+require unique keys to store and retrieve strings of data, a role for
+which the cryptographic hashes used by these package management
+systems are well suited. The stored and retrieved strings can then be
+pointers
+to the peers that have the content package that hashes to that key.
+A downloading peer can look up the package hash in the DHT and, if it
+is found, download the file from those peers and verify the content.
+If the package hash cannot be found in the DHT, the peer will fall
+back to downloading from the original content location, and once the
+download is complete will add a new entry to the DHT indicating that
+it now has
+the content. The servers or mirrors thus act as \emph{seeds} for the
+P2P system without any modification to them.
+
+This desired P2P functionality could be integrated into the existing
+package management software in two ways. The functionality can be
+directly integrated into the software, though this could be
+difficult as the DHT should be running at all times, whereas the
+tools typically only run until the download request is complete.
+Alternatively, since many of the package management software
+implementations use HTTP requests to download the files, the P2P
+aspect can be implemented as a standard HTTP caching proxy, which
+will first try to get uncached requests from the P2P system, and
+then fall back to the normal HTTP request from a server. For
+methods that don't use HTTP requests, other types of proxies may
+also be possible.
+
+Downloading a large file efficiently from a number of peers is where