Roland Häder [Tue, 29 Aug 2023 05:52:42 +0000 (07:52 +0200)]
Another attempt to rewrite:
- don't update nodeinfo URL and detection mode to STATIC_CHECK while fetching
blocks for Pleroma
- Pleroma has their block list exposed in that nodeinfo and not in a separate API
Roland Häder [Mon, 28 Aug 2023 14:30:09 +0000 (16:30 +0200)]
Continued:
- some misskey instances may have no nodeinfo URL, e.g. they were only detected
through APP_NAME method
- still they may provide a blocklist
- it is now rewritten so that a generic "/api/v1/instance/domain_blocks" is
fetched first and, if that fails, a software-specific attempt is made
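The fallback order described in this commit can be sketched roughly as follows. This is a minimal illustration, not FBA's actual code: fetch_blocks(), the fetch_json callable and the handlers mapping are hypothetical names; only the generic path comes from the commit message.

```python
from typing import Callable, Dict, Optional

# Generic endpoint named in the commit message.
GENERIC_PATH = "/api/v1/instance/domain_blocks"


def fetch_blocks(
    domain: str,
    software: Optional[str],
    fetch_json: Callable[[str, str], Optional[list]],
    handlers: Dict[str, Callable[[str], list]],
) -> list:
    """Try the generic endpoint first, then a software-specific handler.

    fetch_json (hypothetical) returns None when the request failed.
    handlers (hypothetical) maps a software name to its own fetcher.
    """
    rows = fetch_json(domain, GENERIC_PATH)
    if rows is not None:
        # Generic endpoint answered, even an empty list counts as success.
        return rows

    handler = handlers.get(software or "")
    if handler is None:
        # No software detected or no specific fetcher known.
        return []

    return handler(domain)
```

Passing the fetcher in as a callable keeps the sketch free of network access, so the ordering can be checked without contacting any instance.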
Roland Häder [Fri, 25 Aug 2023 23:31:52 +0000 (01:31 +0200)]
Continued:
- I hope this isn't too strict, some hosts return a "298 None" which the HTTP
library doesn't see as failed (failed means response.ok = False) but still
doesn't return JSON
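A stricter check along these lines could look like the sketch below. It assumes only that the caller has the "ok" flag and the raw body text at hand; parse_json_body() is a hypothetical helper, not a function from FBA or from the HTTP library.

```python
import json
from typing import Any, Optional


def parse_json_body(ok: bool, text: str) -> Optional[Any]:
    """Return the decoded JSON, or None if the request failed or the
    body is not valid JSON (hypothetical helper for illustration)."""
    if not ok:
        # The HTTP library already flagged the response as failed.
        return None

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Status looked fine, but the body is HTML, empty or otherwise broken.
        return None
```

Note that a literal JSON "null" body also decodes to None here, so the caller cannot tell it apart from a failure.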
Roland Häder [Thu, 24 Aug 2023 19:12:14 +0000 (21:12 +0200)]
Continued:
- added parameter --software3 which searches for a file 'software.txt'
- you can generate this by running e.g.
sqlite3 ./blocks.db "SELECT software FROM instances WHAT_EVER_PARAMETER;" > software.txt
- reset nodeinfo_url if software is now None
- always set complete URL, including domain
Roland Häder [Thu, 24 Aug 2023 18:35:55 +0000 (20:35 +0200)]
Continued:
- renamed fetch_nodeinfo() to fetch() as it is already part of module nodeinfo
- added 3rd optional parameter to it, fetching of /.well-known/* isn't then
required anymore and saves another request
- also the wanted URL can be directly used
Roland Häder [Tue, 22 Aug 2023 18:34:24 +0000 (20:34 +0200)]
Continued:
- added parameter --no-software which fetches instances with no software
detected and not recently checked
- the parameter --force does not re-check recently checked entries. If you want
this you need to use ./nodeinfo.sh --software --force but be kind to other
webmasters!
Roland Häder [Wed, 16 Aug 2023 22:56:19 +0000 (00:56 +0200)]
Continued:
- no, nope: validators.hostname() was a bad idea, it also lets IP addresses and
local host names in
- added command remove_invalid to remove those from database
- renamed recheck.sh -> nodeinfo.sh
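For illustration, a check that rejects the two cases named above (IP addresses and local host names) might look like this. It is a standard-library sketch under the assumption that a valid entry needs at least two labels; is_public_domain() is a hypothetical name and not the validation FBA ended up using.

```python
import ipaddress


def is_public_domain(name: str) -> bool:
    """Reject IP literals and dot-less local names (illustrative only)."""
    name = name.strip().lower().rstrip(".")

    try:
        ipaddress.ip_address(name)
        return False  # literal IPv4 or IPv6 address
    except ValueError:
        pass  # not an IP address, keep checking

    labels = name.split(".")
    if len(labels) < 2:
        return False  # e.g. "localhost" or a bare intranet name

    if labels[-1].isdigit():
        return False  # an all-numeric last label is not a TLD

    for label in labels:
        if not label or len(label) > 63:
            return False
        if label.startswith("-") or label.endswith("-"):
            return False
        if not all(char.isalnum() or char == "-" for char in label):
            return False

    return True
```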
Roland Häder [Wed, 16 Aug 2023 13:27:44 +0000 (15:27 +0200)]
Continued:
- chaos.social isn't part of oliphant, so it still requires being handled
separately
- fetch software/origin from local database instead of software from remote
nodeinfo (saves some requests to their servers)
Roland Häder [Tue, 15 Aug 2023 19:05:10 +0000 (21:05 +0200)]
Continued:
- don't invoke federation.fetch_instances() when last_instance_fetch was
recently updated (meaning the instance was already fetched recently)
- column for instances.is_recent() now only needs to start with "last_"
- don't determine unknown software if the instance was recently fetched
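A recency check keyed on any "last_*" column can be sketched as below. The signature, the dict-shaped row and the max_age parameter are assumptions made for this example; the real instances.is_recent() may differ.

```python
import time
from typing import Optional


def is_recent(row: dict, column: str, max_age: float,
              now: Optional[float] = None) -> bool:
    """Return True if row[column] is a timestamp younger than max_age seconds.

    Hypothetical stand-in for illustration; only the "last_" prefix rule
    comes from the commit message.
    """
    if not column.startswith("last_"):
        raise ValueError(f"column='{column}' must start with 'last_'")

    stamp = row.get(column)
    if stamp is None:
        return False  # never fetched or checked so far

    current = time.time() if now is None else now
    return current - stamp < max_age
```

A caller would then skip the fetch when, say, is_recent(row, "last_instance_fetch", 3600) returns True.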
Roland Häder [Tue, 15 Aug 2023 17:12:09 +0000 (19:12 +0200)]
Continued:
- typical for oliphant members: they hide their own blocklist and then hand it
over to the Codeberg repository of oliphant
- this helper now has a simple function to check if the provided domain should
be excluded
Roland Häder [Mon, 14 Aug 2023 04:03:15 +0000 (06:03 +0200)]
Continued:
- added detection-mode 'APP_NAME' which reflects meta information
name="application-name"
- allow checking generator type if status code 410 (Gone) is given, e.g.
wordpress.com still returns a full HTML page to check
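Reading the application-name meta tag could be done roughly like this with the standard library. It is only a sketch: detect_app_name() and the helper class are hypothetical, and the parser FBA actually uses is not stated in the log. Since the function only looks at the HTML body, it works the same for a page delivered with status 410.

```python
from html.parser import HTMLParser
from typing import Optional


class _MetaFinder(HTMLParser):
    """Collects the content of the first <meta name="application-name">."""

    def __init__(self) -> None:
        super().__init__()
        self.app_name: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.app_name is not None:
            return

        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "application-name":
            self.app_name = attributes.get("content")


def detect_app_name(html: str) -> Optional[str]:
    """Return the application-name meta value, or None if absent."""
    finder = _MetaFinder()
    finder.feed(html)
    return finder.app_name
```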
Roland Häder [Sun, 13 Aug 2023 17:13:36 +0000 (19:13 +0200)]
Continued:
- added --software2 for re-checking instances with `software` given and no
`detection_mode` given
- also added og:platform to HTML base template
Roland Häder [Tue, 8 Aug 2023 18:14:54 +0000 (20:14 +0200)]
Continued:
- with --feed=https://some-fba/feed.atom you can now specify another ATOM feed
from an FBA/Pleroma bot
- parser.set_defaults() is now specified first, then additional parameters
Roland Häder [Sat, 5 Aug 2023 21:54:19 +0000 (23:54 +0200)]
Continued:
- added network 'mammuthus[ experimental]' for retrieving peers
- moved software-related (not version number) functions to software.py
- strip off " experimental", so you can enter e.g. 'mammuthus' easier
Roland Häder [Sat, 5 Aug 2023 13:52:13 +0000 (15:52 +0200)]
Continued:
- throw exceptions again, if they happen then they won't be fixed within a
split second
- also make sure that home directory of FBA is properly set, sure you can
choose a different directory or take the default /home/fba/
- added recheck.sh, a small wrapper script I wrote for myself and you should
try. For example, the above exceptions (timeouts for sure) might cause the used
software not being detected, then you can run ./recheck.sh --software
to re-test them
Roland Häder [Thu, 27 Jul 2023 10:59:53 +0000 (12:59 +0200)]
Continued:
- move nodeinfo handling to new module 'nodeinfo'
- also had to rename variable nodeinfo to other names
- try the newest version first at /.well-known/x-nodeinfo2
Roland Häder [Wed, 26 Jul 2023 14:22:34 +0000 (16:22 +0200)]
Partly reverted cdcd2b0109e126bca887d0712a7ddf602e5d6e62:
- "Accept" is not being accepted by misskey (luckily only by these instances)
- it must be "Content-Type: application/json" or otherwise it is blocked
Roland Häder [Mon, 24 Jul 2023 22:51:40 +0000 (00:51 +0200)]
Continued:
- instances.is_recent() now checks recheck_block if 'last_blocked' is provided
- command fetch_blocks() now supports --force parameter
- blacklisted fnaf.stream as this domain has super-long sub-domains (troll)
Roland Häder [Mon, 24 Jul 2023 21:58:15 +0000 (23:58 +0200)]
Continued:
- added column `obfuscated_blocks` to save count of (still) obfuscated blocks
- also exposed it in infos.html view
- blacklisted gitpod.io as this domain floods `instances` table
Roland Häder [Mon, 24 Jul 2023 14:35:53 +0000 (16:35 +0200)]
Continued:
- added command fetch_relay() for fetching instances from ActivityPub relays
which show their peers in index page (/)
- added grid.tf as this flooded in a lot of "testing/developing" subdomains