Roland Häder [Mon, 5 Jun 2023 20:09:42 +0000 (22:09 +0200)]
Continued:
- introduced argparse which is a more flexible way of handling command-line
arguments
- moved all commands to fba/command.py; you can now access them through
$ ./fba.py <command>
- please use --help to see which commands are supported; you can also use
it on a single command to list its supported arguments
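A minimal sketch of how argparse sub-commands can be wired up this way. The command name `check_instance`, its `--domain` argument and the handler function are assumptions for illustration, not the actual contents of fba/command.py:

```python
import argparse


def check_instance(args: argparse.Namespace) -> int:
    # Hypothetical command handler; the real commands live in fba/command.py.
    print(f"checking {args.domain}")
    return 0


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fba.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each command gets its own sub-parser, so --help also works per command.
    check = subparsers.add_parser("check_instance", help="check a single domain")
    check.add_argument("--domain", required=True, help="domain to check")
    check.set_defaults(handler=check_instance)

    return parser


def main(argv=None) -> int:
    # argv=None lets argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

With this layout, `./fba.py --help` lists the commands and `./fba.py check_instance --help` lists the arguments of that one command.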
Roland Häder [Mon, 5 Jun 2023 07:31:18 +0000 (09:31 +0200)]
Continued:
- some APIs return the same results over and over once the end is reached
- so we need to check whether the domain is already part of the created list
- for that purpose, has_element() is introduced
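A sketch of what such a check could look like, assuming the created list holds dicts and the lookup is by key and value. The signature is a guess, the real has_element() may differ:

```python
def has_element(elements: list, key: str, value) -> bool:
    # Assumed signature: True if any dict in "elements" has elements[i][key] == value.
    return any(
        isinstance(element, dict) and key in element and element[key] == value
        for element in elements
    )


peers = [{"domain": "a.example"}]

# Only append a domain that is not yet part of the list.
for domain in ["a.example", "b.example", "b.example"]:
    if not has_element(peers, "domain", domain):
        peers.append({"domain": domain})
```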
Roland Häder [Sun, 4 Jun 2023 12:54:47 +0000 (14:54 +0200)]
Continued:
- introduced json_from_response(), which handles decoding errors, e.g. when
a server has returned HTML instead of JSON, which is caused by improper
error handling
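Roughly, such a helper could look like the sketch below. It assumes the response object exposes its body as a `.text` attribute and that an empty list is an acceptable fallback. Both are assumptions, and `FakeResponse` is only a stand-in for illustration:

```python
import json


def json_from_response(response):
    # Assumption: "response" has a ".text" attribute holding the raw body.
    try:
        return json.loads(response.text)
    except json.JSONDecodeError:
        # e.g. the server sent an HTML error page instead of JSON
        return []


class FakeResponse:
    """Stand-in for an HTTP response object, for illustration only."""

    def __init__(self, text: str):
        self.text = text
```

The caller then only has to deal with an empty result instead of catching the decoding error at every call site.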
Roland Häder [Sun, 4 Jun 2023 09:47:07 +0000 (11:47 +0200)]
Continued:
- the value from "misskey_offset" went into the "limit" parameter, so let's rename
it to "misskey_limit"
- also misskey may return the same set of results, so this needs to be counted and
then the while loop aborted
- also TIME() did NOT store a timestamp, we need to use time.time() again ...
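The counting of repeated results could be sketched like this. `fetch_page` is a hypothetical callable standing in for the real API request, and `max_repeats` is an invented parameter name:

```python
def fetch_all(fetch_page, limit: int, max_repeats: int = 3) -> list:
    # fetch_page(limit=..., offset=...) is a hypothetical stand-in for the API call.
    seen = set()
    peers = []
    offset = 0
    repeats = 0

    while True:
        rows = fetch_page(limit=limit, offset=offset)
        if not rows:
            break

        found_new = False
        for row in rows:
            if row not in seen:
                seen.add(row)
                peers.append(row)
                found_new = True

        if found_new:
            repeats = 0
        else:
            # The same set of results came back again, count it ...
            repeats += 1
            if repeats >= max_repeats:
                # ... and abort the while loop
                break

        offset += limit

    return peers
```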
Roland Häder [Sun, 4 Jun 2023 04:47:28 +0000 (06:47 +0200)]
Continued:
- there are no private variables or methods in Python, everything is public
- there is, however, a convention that a leading _ indicates private access and
that you should look for getters/setters
- so when you see some code like `module._foo()`, then notify them about this
Roland Häder [Sat, 3 Jun 2023 13:31:18 +0000 (15:31 +0200)]
Continued:
- added script to check if an instance (aka "domain") is valid, not blacklisted
and not already registered
- for each of these conditions, a different status code is returned
Roland Häder [Fri, 2 Jun 2023 13:44:15 +0000 (15:44 +0200)]
Continued:
- first, thanks to `activitypub-trolls.cf` we have tons of registered
"instances" which need to be fetched, and then these trolls need to be
ignored
- then commented out some debug lines
- also fixed some code
- also let */? pass as obfuscations
Roland Häder [Fri, 2 Jun 2023 09:55:26 +0000 (11:55 +0200)]
Continued:
- renamed update_nodeinfos() to update_instance_data()
- also only delete pending instance data if the UPDATE statement has updated
something (at least the timestamps SHOULD cause a row update)
Roland Häder [Fri, 2 Jun 2023 09:52:14 +0000 (11:52 +0200)]
Continued:
- need to check whether 'peer' is of type 'None'
- 48 hours is enough for checking for new instances, new instances are not
created that fast
- updating 'last_updated' is now moved to update_nodeinfos()
Roland Häder [Thu, 1 Jun 2023 20:14:27 +0000 (22:14 +0200)]
Continued:
- moved all updates of columns in 'instances' table to central function
update_nodeinfos(), other functions are just wrappers to fill proper array
elements
- also save total found row count
Roland Häder [Tue, 30 May 2023 05:54:22 +0000 (07:54 +0200)]
Continued:
- cache access can be very noisy, others maybe not so much
- also check for og:site_name to "guess" the software type, old Mastodon (2.x.x)
versions don't provide nodeinfo data
- remove " hosted on " and everything following it (typical for og:site_name from Mastodon)
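The " hosted on " removal can be sketched as a plain string operation. The function name is made up for illustration:

```python
def strip_hosted_on(site_name: str) -> str:
    # Hypothetical helper: cut " hosted on " and everything after it.
    marker = " hosted on "
    index = site_name.find(marker)
    # Leave the name untouched if the marker is missing or nothing precedes it.
    return site_name[:index] if index > 0 else site_name
```

For example, an og:site_name of "Mastodon hosted on example.com" is reduced to "Mastodon", which can then be used as the guessed software type.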
Roland Häder [Mon, 29 May 2023 19:05:16 +0000 (21:05 +0200)]
Continued:
- added .json to all JSON responses
- added response class JSONResponse (I hope it is not overstated this way?)
- cleaned up imports a bit (names used only once can be referenced directly)
Roland Häder [Mon, 29 May 2023 17:13:00 +0000 (19:13 +0200)]
Continued:
- introduced config key 'host', so you can let this run on another IP address
- 127.0.0.1 is recommended, of course, and then set up a reverse proxy (in
Apache's terminology)
- cleaned up imports
Roland Häder [Mon, 29 May 2023 16:31:34 +0000 (18:31 +0200)]
Continued:
- removed superfluous 'get_peers_url' from database, it was only logged and
get_peers() cannot make use of it as it depends on detected software
Roland Häder [Mon, 29 May 2023 06:29:10 +0000 (08:29 +0200)]
Continued:
- is_instance_registered() caused tons of SQL queries, let's introduce some
cache here
- 1 hour for recheck is good for development (even shorter) but a bad idea in
the wild
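One way to avoid the repeated queries is to load all registered domains once and answer later checks from memory. This is only a sketch: the column name `domain` and the cache layout are assumptions, not necessarily what the real code does:

```python
import sqlite3

# Module-level cache, filled on first use (assumed layout).
_cache: dict = {}


def is_instance_registered(connection: sqlite3.Connection, domain: str) -> bool:
    if "is_registered" not in _cache:
        # One query fills the cache instead of one query per check.
        rows = connection.execute("SELECT domain FROM instances").fetchall()
        _cache["is_registered"] = {row[0] for row in rows}

    return domain in _cache["is_registered"]
```

A newly added instance would also have to be added to the cache, otherwise the check keeps reporting it as unregistered.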
Roland Häder [Sun, 28 May 2023 13:19:41 +0000 (15:19 +0200)]
Continued:
- 100 rows should work! (the fail-safe check "fetched versus expected" will
kick in here)
- also read origin and pass over 'origin' during fetching instances
Roland Häder [Sun, 28 May 2023 12:06:53 +0000 (14:06 +0200)]
Continued:
- encapsulated into function add_peers()
- need to add "Content-Type: application/json" for API requests, thanks to Kromonos
- introduced 'api_headers' for JSON API requests
Roland Häder [Sun, 28 May 2023 09:36:18 +0000 (11:36 +0200)]
Continued:
- also ngrok-free.app is a testing/development ground, no production/live
instances will be found there
- please don't abuse their kind services for hosting a live instance!
- didn't log variable "instance", oops
Roland Häder [Sun, 28 May 2023 09:19:13 +0000 (11:19 +0200)]
Continued:
- don't name your variables after packages, Python seems not to be strict about
checking data types when referencing them
- also scan misskey instances for new instances (no filter applied)
Roland Häder [Sun, 28 May 2023 07:58:12 +0000 (09:58 +0200)]
Continued:
- these aren't supposed to be real URLs, although they COULD actually be reached
- these URLs are references, not crawlable URLs
- so some people overdo the SSL here a little, as http:// is just enough
for referencing a specification
Roland Häder [Fri, 26 May 2023 15:10:52 +0000 (17:10 +0200)]
Continued:
- one character more to remove, which cuts off the separator, e.g. '/'
- also don't raise exceptions here, a returned unmodified software name is just fine
Roland Häder [Fri, 26 May 2023 04:41:53 +0000 (06:41 +0200)]
Continued:
- also strip out " by " and " see " (self-advertisement)
- same with " version"
- some version numbers had uncommonly long patch levels, e.g. 8.0.0000
Roland Häder [Thu, 25 May 2023 23:24:01 +0000 (01:24 +0200)]
Continued:
- old Friendica installations (I found one with 2019.03) may have the version
number in the software's name info, in the format YYYY.MM (and maybe later others)
Roland Häder [Thu, 25 May 2023 23:09:27 +0000 (01:09 +0200)]
Continued:
- an INFO message is okay, let the user know that the <meta name='generator'>
was found and taken as the software behind the instance/website
Roland Häder [Thu, 25 May 2023 20:07:18 +0000 (22:07 +0200)]
Continued:
- try to strip off version numbers from software name
- remove_version() will output a warning and return 'software' unmodified if it
fails to match version number against regex
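The described behaviour could be sketched as follows. The regex is an assumption (a name, then a space or '/' separator, then a dotted version number), not the pattern actually used:

```python
import logging
import re

# Assumed pattern: "<name><separator>[v]<digits>.<digits>[...]"
_VERSION_PATTERN = re.compile(r"^(?P<name>.+?)[\s/]+v?\d+(?:\.\d+)+.*$")


def remove_version(software: str) -> str:
    match = _VERSION_PATTERN.match(software)
    if match is None:
        # No exception here: warn and hand back the name unmodified.
        logging.warning("software=%r does not contain a version number", software)
        return software

    return match.group("name")
```

With this pattern, "Friendica 2019.03" becomes "Friendica", while a plain "Mastodon" only triggers the warning and is returned as-is.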