Roland Häder [Mon, 24 Jul 2023 05:04:51 +0000 (07:04 +0200)]
Optimized:
- first simple checks then invoke methods
- recheck_obfuscation() is about block lists, not instances, therefore we need
to check 'last_blocked' timestamp
Roland Häder [Fri, 21 Jul 2023 07:05:39 +0000 (09:05 +0200)]
Continued:
- added mitra network, which supports fetch_instances (unfortunately not
domain_blocks)
- if I fetch domain blocks from chaos.social, it is reset to zero, so it is
better to bypass it here
Roland Häder [Fri, 21 Jul 2023 05:08:57 +0000 (07:08 +0200)]
Continued:
- prepared for reverse-proxy, e.g. Apache/nginx
- configuration keys "scheme" (newly added) and "hostname" state how your FBA
instance is reached from outside; I was not able to find any other way, as
url_for() was returning an http:// URL and not an https:// one ... :-(
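A minimal sketch of how the two keys could be combined into an externally
visible URL. The function name `external_url`, the dictionary layout and the
"https" fallback are assumptions for illustration; only the key names "scheme"
and "hostname" come from the note above.

```python
def external_url(config: dict, path: str = "/") -> str:
    """Build the URL under which the instance is reachable from outside."""
    # Assumption: fall back to https when "scheme" is not configured.
    scheme = config.get("scheme", "https")
    hostname = config["hostname"]
    # Keep exactly one slash between host and path.
    return f"{scheme}://{hostname}/{path.lstrip('/')}"


config = {"scheme": "https", "hostname": "fba.example"}
print(external_url(config, "/api/v1/instance/peers"))
```

Building the URL from configuration avoids relying on what the application
sees behind the proxy, which is typically a plain http:// request.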
Roland Häder [Thu, 20 Jul 2023 15:22:02 +0000 (17:22 +0200)]
Continued:
- only attempt to fetch peers when software was detected
- added API /api/v1/instance/domain_blocks
- for this, the blacklist needs to be rewritten to include "block" reasons
Roland Häder [Thu, 20 Jul 2023 13:29:39 +0000 (15:29 +0200)]
Continued:
- FBA is now a Fediverse "instance"
- outbound "rss" is supported as feeds are provided
- peer list is available at `/api/v1/instance/peers`, but it only contains
instances with valid nodeinfo
Roland Häder [Wed, 12 Jul 2023 09:05:03 +0000 (11:05 +0200)]
Continued:
- max "crawl" depth and min peer list size to go deeper are now configurable
- for example, on low-memory systems, keep max_crawl_depth small and
min_peers_length big
- the default values may cause python3 to consume ~550 MB RAM
- so you can practically say each depth level adds another MB of RAM usage
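How the two settings could interact is sketched below. This is not FBA's
crawler: the function `crawl`, its parameters and the in-memory `graph` are
hypothetical; only the names max_crawl_depth and min_peers_length are taken
from the notes above.

```python
def crawl(domain, fetch_peers, max_crawl_depth, min_peers_length,
          depth=0, seen=None):
    """Visit domains recursively; return the set of visited domains."""
    seen = set() if seen is None else seen
    if domain in seen:
        return seen
    seen.add(domain)

    peers = list(fetch_peers(domain))
    # Go deeper only below the depth limit and only when the peer list
    # is large enough to be worth the extra memory.
    if depth >= max_crawl_depth or len(peers) < min_peers_length:
        return seen

    for peer in peers:
        crawl(peer, fetch_peers, max_crawl_depth, min_peers_length,
              depth + 1, seen)
    return seen


# Hypothetical peer graph standing in for real network requests.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}


def fetch(domain):
    return graph.get(domain, [])


print(sorted(crawl("a", fetch, max_crawl_depth=1, min_peers_length=2)))
```

A small max_crawl_depth bounds the recursion, while a big min_peers_length
stops the crawl from descending into small instances.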
Roland Häder [Wed, 12 Jul 2023 08:30:27 +0000 (10:30 +0200)]
Continued:
- roadhouse is an alias for hubzilla; it is currently unsupported as it doesn't
provide the APIs needed for fetching peers and blocklists, but the alias is
there just in case they add them
- same with nextcloud and others
- shumihub is an alias for misskey
Roland Häder [Wed, 12 Jul 2023 05:28:43 +0000 (07:28 +0200)]
Continued:
- a recursion (aka. "crawl") depth of 500 is REALLY deep, practically the
whole Fediverse
- minimum peer count to deepen the "crawl" to max depth is 100 peers
- flush any pending data of current domain before continuing
Roland Häder [Tue, 11 Jul 2023 05:58:58 +0000 (07:58 +0200)]
Continued:
- alias "quarantined_instances" to "quarantined", you may have to run
`DELETE FROM blocks WHERE block_level='quarantined_instances';`
- ... and: `UPDATE instances SET last_blocked = NULL WHERE software IS NOT NULL AND last_status_code = 200;`
- ... to reset your database, then don't forget to execute ./fba.py fetch_blocks
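The two statements above could also be run from Python, sketched here against
a throwaway in-memory SQLite database. That the database is SQLite, the
function name `reset_quarantined` and every column not named in the
statements are assumptions for illustration.

```python
import sqlite3


def reset_quarantined(connection: sqlite3.Connection) -> None:
    """Run both clean-up statements in one transaction."""
    cursor = connection.cursor()
    cursor.execute(
        "DELETE FROM blocks WHERE block_level = 'quarantined_instances'"
    )
    cursor.execute(
        "UPDATE instances SET last_blocked = NULL "
        "WHERE software IS NOT NULL AND last_status_code = 200"
    )
    connection.commit()


# Hypothetical minimal schema and rows, just enough to show the effect.
connection = sqlite3.connect(":memory:")
connection.executescript("""
CREATE TABLE blocks (blocker TEXT, blocked TEXT, block_level TEXT);
CREATE TABLE instances (domain TEXT, software TEXT,
                        last_status_code INTEGER, last_blocked REAL);
INSERT INTO blocks VALUES ('a.example', 'b.example', 'quarantined_instances');
INSERT INTO blocks VALUES ('a.example', 'c.example', 'reject');
INSERT INTO instances VALUES ('a.example', 'pleroma', 200, 1689000000.0);
INSERT INTO instances VALUES ('d.example', NULL, 200, 1689000000.0);
""")
reset_quarantined(connection)
print(connection.execute("SELECT COUNT(*) FROM blocks").fetchone()[0])
```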
Roland Häder [Tue, 11 Jul 2023 04:42:34 +0000 (06:42 +0200)]
Continued:
- blacklisted hexbear.net as their JavaScript contains Shell commands + broken
JSON inside that script
- added parsing JSON from JavaScript starting with 'isoData' (encapsulated to
function parse_script())
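One way such parsing could work is sketched below. Only the name
parse_script() and the 'isoData' marker come from the note above; the body,
the `marker` parameter and the sample script are assumptions, not the actual
implementation.

```python
import json


def parse_script(script: str, marker: str = "isoData"):
    """Return the JSON object following `marker`, or None if unusable."""
    position = script.find(marker)
    if position == -1:
        return None
    start = script.find("{", position)
    if start == -1:
        return None
    try:
        # raw_decode() parses one JSON value starting at `start` and
        # ignores whatever JavaScript follows it.
        data, _end = json.JSONDecoder().raw_decode(script, start)
    except json.JSONDecodeError:
        # Broken JSON inside the script: give up instead of guessing.
        return None
    return data


sample = 'window.isoData = {"site": {"name": "demo"}}; init();'
print(parse_script(sample))
```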
Roland Häder [Mon, 10 Jul 2023 22:51:49 +0000 (00:51 +0200)]
Continued:
- added command convert_idna to convert UTF-8 encoded international domain
names to punycode domains (IDNA); having both forms caused some domains to be
added in both encodings
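The conversion itself can be sketched with Python's built-in "idna" codec.
The helper name `to_punycode` is hypothetical and says nothing about how
convert_idna is implemented.

```python
def to_punycode(domain: str) -> str:
    """Encode an internationalized domain name into its punycode form."""
    return domain.lower().encode("idna").decode("ascii")


# Both spellings of the same domain collapse into a single entry.
domains = ["bücher.example", "xn--bcher-kva.example"]
print({to_punycode(domain) for domain in domains})
```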
Roland Häder [Mon, 10 Jul 2023 19:12:11 +0000 (21:12 +0200)]
Continued:
- renamed utils.deobfuscate_domain() to deobfuscate()
- oliphant blocklists may contain obfuscated domains, need to deobfuscate them
first to get actual domain names
Roland Häder [Mon, 10 Jul 2023 17:35:39 +0000 (19:35 +0200)]
Continued:
- cannot get len() (number of rows) from reader
- instances.set_total_blocks() does not accept a direct count as its 2nd
parameter, so let's handle the domain list
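The len() problem can be illustrated as follows: a csv reader is an iterator,
so its rows have to be collected into a list before they can be counted. The
helper `read_blocked_domains` and the column name "domain" are assumptions
for illustration.

```python
import csv
import io


def read_blocked_domains(handle, column: str = "domain") -> list:
    """Collect the non-empty values of one column into a list."""
    reader = csv.DictReader(handle)
    # len(reader) would raise TypeError, so materialize the rows first.
    return [row[column] for row in reader if row.get(column)]


sample = io.StringIO("domain,severity\na.example,suspend\nb.example,silence\n")
domains = read_blocked_domains(sample)
print(len(domains), domains)
```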
Roland Häder [Sun, 9 Jul 2023 17:47:11 +0000 (19:47 +0200)]
Fixed:
- oops, too many renames, renamed 'domains' back to 'blocklist'
- also need to check combined arrays, or else always 2 will be found
- need to invoke commit() in sources.update() function
Roland Häder [Sat, 8 Jul 2023 21:22:23 +0000 (23:22 +0200)]
Continued:
- some instances or honeypots may return empty (None in Python) link[href]
entries
- you can run a honeypot and pay monthly domain fees for it, not my business,
but at least format your /.well-known/nodeinfo properly!
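A defensive filter for such entries might look like this. The function name
`usable_links` is hypothetical; the document layout with a "links" list of
"rel"/"href" pairs follows the /.well-known/nodeinfo convention.

```python
def usable_links(nodeinfo: dict) -> list:
    """Return only link entries whose href is a non-empty string."""
    links = nodeinfo.get("links")
    if not isinstance(links, list):
        return []
    return [
        link for link in links
        if isinstance(link, dict)
        and isinstance(link.get("href"), str)
        and link["href"] != ""
    ]


document = {
    "links": [
        {"rel": "http://nodeinfo.diaspora.software/ns/schema/2.0", "href": None},
        {"rel": "http://nodeinfo.diaspora.software/ns/schema/2.0",
         "href": "https://a.example/nodeinfo/2.0"},
    ]
}
print(usable_links(document))
```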
Roland Häder [Sat, 8 Jul 2023 20:23:54 +0000 (22:23 +0200)]
Continued:
- instances.social is a non-federating website, `origin` should always name a
federating instance
- please run SQL `DELETE FROM instances WHERE origin='instances.social'` and
afterwards ./fba.py fetch_instances --domain=<some-large-instance>
- then you can run this command (fetch_instances_social) again
Roland Häder [Thu, 6 Jul 2023 07:59:05 +0000 (09:59 +0200)]
Fixed:
- PeerTube's JSON response always includes mode2=following or mode2=follower,
depending on whether mode=followers or mode=following is set
- this caused PeerTube instances to be reported with a doubled number of peers
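The idea behind the fix can be sketched as below, assuming each record carries
both a `follower` and a `following` object with a `host` field. That record
layout and the function name `extract_peers` are assumptions, not the actual
code.

```python
def extract_peers(records: list, mode: str) -> set:
    """Pick only the side of each record that belongs to `mode`."""
    # mode is "followers" or "following"; reading both sides of every
    # record would count each relation twice.
    key = "follower" if mode == "followers" else "following"
    peers = set()
    for record in records:
        entry = record.get(key)
        if isinstance(entry, dict) and entry.get("host"):
            peers.add(entry["host"])
    return peers


records = [
    {"follower": {"host": "remote.example"},
     "following": {"host": "local.example"}},
]
print(extract_peers(records, "followers"))
```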
Roland Häder [Wed, 5 Jul 2023 21:25:25 +0000 (23:25 +0200)]
Continued:
- added "official" name 'nextcloudpi', others like 'crowncloud', 'darkcloud' are
just aliases created by their owners, I don't provide them a stage in my code
- provided template variable 'domain' might be None
Roland Häder [Wed, 5 Jul 2023 20:15:19 +0000 (22:15 +0200)]
Continued:
- added view /list which lists domains by some criteria (mode/value)
- renamed blocks.is_valid_level() to valid() but now requires 2nd parameter
with column to check
Roland Häder [Tue, 4 Jul 2023 18:28:01 +0000 (20:28 +0200)]
Renaming season:
- renamed table/model file 'apis' to 'sources' as wikis are not APIs but
all are (instance) sources
- renamed api_domain to source_domain
Roland Häder [Tue, 4 Jul 2023 12:57:02 +0000 (14:57 +0200)]
Continued:
- added domain.is_in_url() to check whether the domain matches the netloc or
hostname part of the URL. This function encodes the domain into punycode
before comparing it
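A minimal sketch of such a check using urllib.parse follows. Only the name
is_in_url() and the described behaviour come from the note above; the body is
an assumption.

```python
from urllib.parse import urlparse


def is_in_url(domain: str, url: str) -> bool:
    """Check whether `domain` equals the netloc or hostname of `url`."""
    # Encode the domain into punycode before comparing it.
    punycode = domain.lower().encode("idna").decode("ascii")
    if not punycode:
        return False
    parts = urlparse(url)
    # netloc may carry a port or credentials; hostname never does.
    return punycode in (parts.netloc.lower(), (parts.hostname or "").lower())


print(is_in_url("bücher.example", "https://xn--bcher-kva.example/about"))
```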