Generic Object-Sharing Protocol

- Node Identification
  - Every node generates a random id string
    - This is the node id (node identifier)
    - Only on first use
    - This should be globally unique
    - It is stored in the node's database for later reuse
    - It is generated as a hash
      - Hashed data:
        Node's IP address and hostname
        Some random characters
    - This id does not change as long as the database is not purged
  - Another id is generated per session
    - This is the SID (Session IDentifier)
    - It is distributed to the nodes
    - It is stored together with the Node-Id
      - So others can validate both together
    - Logging should only be enabled for debugging purposes
  - Blocking IPs or Node-Ids on master-nodes is not planned
    - Censorship would be far too easy
      - Government agencies or enterprise parties
    - Censorship makes no sense here
      - It can be bypassed very easily:
      - Delete the Node-Id in the database
        A new one is then generated
      - A blocked IP or port number can be bypassed by proxies
      - One or two master-nodes should listen on ports commonly unblocked by firewalls
        Like 80/443/110/25
  - Hubs can optionally be registered by master-nodes
    - Increases karma because the node admin is verified
    - Unregistered nodes do not receive negative votes
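
The identification scheme above could be sketched as follows. This is a minimal sketch, not a definitive implementation: SHA-256 as the hash, a plain dict standing in for the node's database, and the function names are all assumptions that the outline does not specify.

```python
import hashlib
import secrets


def generate_node_id(ip, hostname):
    """Hash the node's IP, hostname and some random characters into an id."""
    salt = secrets.token_hex(16)  # the "random characters" (length is an assumption)
    data = f"{ip}|{hostname}|{salt}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption


def load_or_create_node_id(db, ip, hostname):
    """Return the stored node id; generate and store one only on first use."""
    if "node_id" not in db:  # db is a hypothetical stand-in for the node's database
        db["node_id"] = generate_node_id(ip, hostname)
    return db["node_id"]


def new_session_id():
    """Create a fresh session id (SID) for every session."""
    return secrets.token_hex(16)
```

Purging the database (here: emptying the dict) yields a new node id on the next call, which matches the bypass argument made above.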
- Bootstrapping
  - At least one master-node is required, preferably 3 to 4
    - Also known as "Bootstrap-Nodes"
    - They should be listed in the configuration for all applications
    - A comma-separated list of node IPs, each followed by a colon (:) and its port number
  - Bootstrap-Nodes work stand-alone
    - No central "Super-Node" is required
    - Too much traffic would have to flow through it
    - Attacks on the network through censorship are reduced
    - Traffic does not increase the overall network load
    - Small disadvantage:
      - Hubs must register with ...
      - ... more than one master-node ...
      - ... or connect with each other
  - 1. Node checks whether a list of master nodes is already stored
    - If so, it skips the step of fetching the node list
  - 2. Node announces itself to the upstream bootstrap hub(s)
    - This should be done generically to keep things simple
    - An XML document with all necessary data is probably a good choice
    - The session id will not be included here
      - A bootstrap node will never try to connect clients with nodes
      - It should only "bootstrap" (tell the node where it should start sharing its objects)
  - 3. Node fetches a list of other nodes
    - They must have at least X matching object types
    - If a bootstrap node is full, it forwards the node to another bootstrap hub
    - If that one is full as well, the node is forwarded to a list node
    - If neither a free bootstrap node nor a free list node is available, the node waits some time and tries again
    - Hashes of node-lists distributed over the bootstrap and list nodes should match
      - This can be ensured by a DHT
        DHT = Distributed Hash Table
        Which format?
    - If too many are inconsistent:
      - No connection can be made
        Node list is rejected
      - Or the bootstrap-nodes work as regular nodes
      - Replication of the node-list is required by all bootstrap-nodes
  - 4. Node connects to gathered master nodes
    - It again announces its object types to the master nodes
    - It now also provides the session id so the master node can map session id -> node id
  - 5. If all authorization steps are completed:
    - The node starts to accept client connections
      - (It already listens for them but rejects them)
    - Objects will now be shared with other nodes that accept the same object types
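
Two parts of the bootstrapping steps above can be sketched in code: parsing the configured bootstrap list, and checking whether the node-list hashes reported by several bootstrap and list nodes agree. This is only a sketch; the function names, the use of SHA-256, and the "majority hash" rule for deciding what counts as inconsistent are assumptions, since the outline leaves the format open.

```python
import hashlib
from collections import Counter


def parse_bootstrap_nodes(value):
    """Parse 'ip:port,ip:port' into a list of (ip, port) tuples."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        if not host:
            raise ValueError(f"missing port in entry: {entry!r}")
        nodes.append((host, int(port)))
    return nodes


def node_list_hash(node_ids):
    """Hash a node list independent of its order (SHA-256 is an assumption)."""
    joined = "\n".join(sorted(node_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def accept_node_list(reported_hashes, max_inconsistent):
    """Accept if at most max_inconsistent hashes differ from the majority hash."""
    if not reported_hashes:
        return False
    _, majority_count = Counter(reported_hashes).most_common(1)[0]
    return len(reported_hashes) - majority_count <= max_inconsistent
```

For example, `parse_bootstrap_nodes("192.0.2.1:8080, 192.0.2.2:443")` yields two (ip, port) pairs, and a node would reject the list when `accept_node_list` returns False.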
- Karma
  - Karma is given for validating entries in the DHT
    - Last activity in the recent past
      - Does not affect karma
    - Returned pings
      - Number of pings sent
        If there is no reply, the node is dead-listed
      - Failed pings reduce karma
      - Slow responses reduce karma
    - Karma votes for other nodes should not be too negative
      - Reduces manipulation chances
        Prefer karma votes of trusted nodes
        Negative karma votes against untrusted nodes reduce the voter's own karma
    - Too many "spam packages" reduce karma
    - Validated packages increase karma
    - Protocol version should not be too old
      - This affects karma only negatively
      - An up-to-date protocol does not increase karma
      - Also serves as "spam protection"
      - Received protocol version of a node is older than the stored one
        Karma is reduced
      - Received protocol version differs much from that of the master-nodes
        Karma is reduced
    - Object types provided by the peer hub
      - This affects karma only negatively
      - New types must first be known by masters
      - This should be configurable:
        Karma should be reduced...
        ... or peer node should be black-listed
      - Because every node can be a master-node, censorship is really hard
    - Correct logout
      - Does not affect karma
      - Logout must be done by master node and active nodes
        "Bye" message
      - Rotation of dynamic IPs should be considered
        Must be registered by master-node
        ID is registered as "Dynamic IP"
        So connections are still possible
        No negative votes by other nodes
        Current IP spreads well in the network
        The master-node is queried only in case of doubt
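
The karma rules above can be read as a simple scoring function. The sketch below is illustrative only: the outline names the factors but gives no numbers, so every weight and the function name `karma_delta` are assumptions.

```python
def karma_delta(*, failed_pings=0, slow_responses=0, spam_packages=0,
                validated_packages=0, protocol_outdated=False,
                unknown_object_types=0):
    """Return the karma change for one observation period.

    All weights are hypothetical; only the sign of each factor
    follows the outline.
    """
    delta = 0
    delta -= 2 * failed_pings          # failed pings reduce karma
    delta -= 1 * slow_responses        # slow responses reduce karma
    delta -= 3 * spam_packages         # spam packages reduce karma
    delta += 1 * validated_packages    # validated packages increase karma
    if protocol_outdated:
        delta -= 5                     # an old protocol only counts negatively
    delta -= 2 * unknown_object_types  # types unknown to the masters count negatively
    return delta
```

Factors that the outline marks as neutral (recent activity, correct logout, an up-to-date protocol) are deliberately left out of the function.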
- Update Messages
  - Are only broadcast from bootstrap-nodes to master- and list-nodes
  - Regular nodes do not receive update messages, because of the heavy network load
    - Maybe only "good" nodes should receive this?
  - Contain update notes and an importance level
- "Client" Connections
  - Should be interpreted as "application software"
  - Clients should also generate a "client id"
    - Both id and sid
  - Also connect to bootstrap-nodes first
    - Ask for a node-list as well
  - Also receive karma from nodes
  - Dynamic IPs are also accepted and therefore must be registered
- Client<->Node Communication
  - After a client has bootstrapped, it announces all its object types to the nodes
    - Including acceptance of broadcasts, poll-mode and Ping-POST
  - This way the nodes know the clients and their accepted object types
  - Clients may download a node-list for a specific object type
    - Distinct-List-Mode
    - After selecting a node the client can request a list of clients from that hub
    - The client can accept objects from these clients and send objects to them
      - E.g. news by broadcast
  - Clients may send "broadcast" objects
    - Broadcast-Mode
    - Must be allowed by nodes
      - This consumes traffic
      - Acceptance of broadcasts is known to list-/master- and bootstrap-nodes
    - A client sends its broadcast to the master-nodes
      - They distribute it to their fellow nodes
      - A node knows which client accepts broadcasts and "deposits" it for the client
      - Clients request such broadcasts in poll-mode or are "pinged"
        In poll-mode the client asks the node for new broadcasts on a regular basis
        A Ping-POST is sent by the node as a regular HTTP-POST request to the client
        This also happens on a regular basis
        A node-admin may allow both types independently
        If none is allowed the node acts as a "relay"
        And therefore it cannot accept clients with broadcast-functionality enabled
  - Client-Client Communication
    - May be done "anonymously" over the node or directly with another client
      - Communication over the node is done in poll-mode or by Ping-POST
      - In direct client-client communication, client "A" sends a Ping-POST directly to client "B"
    - Wrongly sent Ping-POSTs (e.g. the admin doesn't allow them) may be answered with a regular HTTP status '4XX'
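
The "deposit" and poll-mode behaviour described above might look roughly like the following in-memory sketch. The class and method names are hypothetical, and network transport, Ping-POST delivery and admin permissions are left out entirely.

```python
from collections import defaultdict, deque


class BroadcastNode:
    """Hypothetical node that deposits broadcasts for polling clients."""

    def __init__(self):
        self._accepted = {}                   # client id -> set of accepted object types
        self._mailboxes = defaultdict(deque)  # client id -> deposited broadcasts

    def register_client(self, client_id, object_types):
        """Record which object types a client has announced."""
        self._accepted[client_id] = set(object_types)

    def deposit(self, object_type, payload):
        """Store a broadcast for every client accepting this object type."""
        count = 0
        for client_id, types in self._accepted.items():
            if object_type in types:
                self._mailboxes[client_id].append((object_type, payload))
                count += 1
        return count

    def poll(self, client_id):
        """Poll-mode: hand over and clear all deposited broadcasts."""
        box = self._mailboxes[client_id]
        items = list(box)
        box.clear()
        return items
```

A Ping-POST variant would push the same mailbox contents to the client over HTTP instead of waiting for `poll` to be called.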
- Usage of low-level protocols
  - Already existing low-level protocols like TCP/IP and UDP should be used
    - TCP should be used for "inter-communication"
    - UDP should be used for "streaming" the objects to other nodes
      - Both parties generate hashes of chunks for validation
      - Chunks should only be created for very big objects
        Total object size is larger than X KByte
      - The sender creates hashes and adds them to the chunk
        The receiver validates them
        No sequence numbers à la TCP are generated
      - The last chunk package contains both hashes
        Hash of itself and the final hash
      - If a hash fails to validate, it is collected
        After the final chunk has been sent, failed chunks are re-requested
      - This is retried X times per hash
        But always at the end of the whole transaction and all together
        If some hashes still fail to transfer
        The object is dropped or requested again in full
        This should be configurable by the admin
        To do so, the final hash and object type are submitted to the sender
        "Retransmit-Message"
        The sender then tries smaller chunks
      - If everything was received successfully
        The receiver sends a "done-message" to the sender with final hash and object type
    - There is also a "real" streaming mode
      - This is e.g. used for chat
      - For this TCP/IP is used and no hashes are generated
      - Also no chunks are generated
      - Only in this mode "multi-casting" is possible
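
The chunk validation for the UDP transfer described above can be sketched without any networking. This is a minimal sketch under assumptions: SHA-256 as the hash, chunks represented as (data, hash) pairs, and hypothetical function names. Failed chunks are identified by their hash, since the outline rules out sequence numbers.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    """Return ([(chunk, chunk_hash), ...], final_hash) for an object."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append((chunk, hashlib.sha256(chunk).hexdigest()))
    final_hash = hashlib.sha256(data).hexdigest()  # hash of the whole object
    return chunks, final_hash


def find_failed_hashes(received):
    """Collect the hashes of all received chunks that fail validation."""
    failed = []
    for chunk, claimed_hash in received:
        if hashlib.sha256(chunk).hexdigest() != claimed_hash:
            failed.append(claimed_hash)
    return failed
```

The receiver would call `find_failed_hashes` once after the final chunk has arrived and re-request all collected hashes together, up to the configured retry count.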
- Fault Tolerance / Reliability
  - After X failed connection attempts a node is removed
    - Other nodes report this to the master-node
    - The master-node probes the failed node and removes it
  - Failed list-node
    - Hubs report it to the master-node
    - The master-node probes the failed list-node and removes it
  - Failed master-node
    - List-nodes take over the role of a master-node if no bootstrap-nodes are available
      - This takeover should not be complete, and its extent should be defined
    - If there is no list-node, nodes look for an active master-node
      - They report the failed master-node to it
    - If additionally no master-node is up, a node will be elected as new master-node
      - To do so, all nodes identify the node with...
        ... the best karma
        This is known to many nodes
        ... most votes
        A "vote" is a positive karma rating
        Also known to many nodes
      - The "election" should take place within a specific timeout
      - If no election takes place, the node with the most connections is elected
  - If one of the bootstrap-nodes is up again
    - The elected node notifies some of its fellow nodes that the bootstrap-node is back
    - The elected node becomes a regular node and notifies other nodes on connection attempts
  - Disadvantages:
    - A new node that only knows about the bootstrap-nodes may not be able to connect to the nodes
      - Additional bootstrap-nodes on other servers and/or continents may help here
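
The master-node election above could be sketched as a deterministic selection that every node computes locally. The ordering used here (karma first, number of votes as tie-breaker, node id as last resort) and the function name are assumptions; the outline only lists karma and votes as criteria and the connection count as fallback.

```python
def elect_master(ratings, connections):
    """Pick the new master-node.

    ratings: node id -> (karma, votes), as known to the electing nodes
    connections: node id -> connection count, used when no election happens
    Raises ValueError if both mappings are empty.
    """
    if ratings:
        # Best karma wins; votes, then node id, break ties deterministically.
        return max(ratings, key=lambda node_id: (ratings[node_id][0],
                                                 ratings[node_id][1],
                                                 node_id))
    # Fallback: no election took place, so take the best-connected node.
    return max(connections, key=lambda node_id: (connections[node_id], node_id))
```

Because the tie-breaking is deterministic, nodes holding the same ratings arrive at the same winner without further coordination.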
- Object Types
  - New object types can only be added by updating the software
  - This is also possible for 3rd parties
    - Must be known by master/bootstrap-nodes
  - Outdated object types are marked "deprecated" for a longer time
    - Master-nodes may accept or reject them
    - A "deprecation message" is always being sent
    - A note of a required update can optionally be added
  - After deprecation time they are treated as "unknown"
  - Other nodes should ask bootstrap-nodes
    - This compensates for errors made by master-nodes
    - Object types wrongly deprecated by a master-node result in bad karma from the bootstrap-node
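
The life cycle of an object type described above (known, then deprecated for a while, then treated as unknown) might be modelled as below. The status names, the timestamp representation and the function name are assumptions made for illustration.

```python
def object_type_status(object_type, known_types, deprecated_at, now,
                       deprecation_period):
    """Classify an object type as 'active', 'deprecated' or 'unknown'.

    known_types: set of type names known to the master/bootstrap-nodes
    deprecated_at: type name -> timestamp at which it was deprecated
    now, deprecation_period: same time unit as the timestamps
    """
    if object_type not in known_types:
        return "unknown"
    if object_type not in deprecated_at:
        return "active"
    if now - deprecated_at[object_type] < deprecation_period:
        return "deprecated"  # still accepted, a deprecation message is sent
    return "unknown"         # deprecation time is over
```

A master-node would send the "deprecation message" whenever this returns "deprecated" and could reject the object outright on "unknown".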