git.mxchange.org Git - fba.git/log
fba.git
10 months agoContinued:
Roland Häder [Thu, 20 Jul 2023 15:22:02 +0000 (17:22 +0200)]
Continued:
- only attempt to fetch peers when software was detected
- added API /api/v1/instance/domain_blocks
- for this the blacklist needs to be rewritten for having "block" reasons
  included

10 months agoContinued:
Roland Häder [Thu, 20 Jul 2023 13:29:39 +0000 (15:29 +0200)]
Continued:
- FBA is now a Fediverse "instance"
- outbound "rss" is supported as feeds are provided
- peer list is available at `/api/v1/instance/peers`, but only instances with
  valid nodeinfo

10 months agoContinued:
Roland Häder [Thu, 20 Jul 2023 12:51:22 +0000 (14:51 +0200)]
Continued:
- added another alias for misskey
- / (index page) should be last

10 months agoContinued:
Roland Häder [Wed, 19 Jul 2023 15:09:54 +0000 (17:09 +0200)]
Continued:
- added some checks for parameter 'origin'

10 months agoContinued:
Roland Häder [Wed, 19 Jul 2023 11:41:24 +0000 (13:41 +0200)]
Continued:
- renamed parameter --all to --force

10 months agoContinued:
Roland Häder [Wed, 19 Jul 2023 11:05:14 +0000 (13:05 +0200)]
Continued:
- in case of a database error (on their side), there is no navigation, which
  led here to a method invocation of findAll() on NoneType

11 months agoContinued:
Roland Häder [Mon, 17 Jul 2023 16:12:04 +0000 (18:12 +0200)]
Continued:
- fetch_blocks should NOT check if a blocked domain has recently been "crawled"
  as this would exclude it from being blocked (oops!)

11 months agoContinued:
Roland Häder [Mon, 17 Jul 2023 16:06:59 +0000 (18:06 +0200)]
Continued:
- blacklisted `ngrok.app` as this is another testing/developing domain name

11 months agoContinued:
Roland Häder [Mon, 17 Jul 2023 13:47:09 +0000 (15:47 +0200)]
Continued:
- log returned empty lists separately from warning (mostly caught exception)

11 months agoContinued:
Roland Häder [Mon, 17 Jul 2023 13:24:30 +0000 (15:24 +0200)]
Continued:
- rewrote a bit for better logging
- also don't nest function/method invocations, they are not easy to debug

11 months agoContinued:
Roland Häder [Mon, 17 Jul 2023 09:20:54 +0000 (11:20 +0200)]
Continued:
- skip below code when vital elements 'domain', 'severity' are not found

11 months agoContinued:
Roland Häder [Sun, 16 Jul 2023 22:17:44 +0000 (00:17 +0200)]
Continued:
- added "alias" (long version)

11 months agoContinued:
Roland Häder [Sun, 16 Jul 2023 22:14:53 +0000 (00:14 +0200)]
Continued:
- added flooder to blacklist
- added more networks for peers at least: 'gotosocial', 'brighteon',
  'wildebeest', 'bookwyrm'

11 months agoContinued:
Roland Häder [Sat, 15 Jul 2023 01:19:59 +0000 (03:19 +0200)]
Continued:
- apostrophed alias for 'takahe' network
- alias added for misskey

11 months agoContinued:
Roland Häder [Sat, 15 Jul 2023 00:47:38 +0000 (02:47 +0200)]
Continued:
- init rows list
- log type and length
- check if 'domain' and 'severity' are part of 'block' dictionary
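
A minimal sketch of the key check described above, assuming each block record is a plain dict. The helper name `has_vital_keys` and the sample records are made up for illustration and are not taken from the repository:

```python
def has_vital_keys(block: dict) -> bool:
    """Return True when the block record carries both required keys."""
    return all(key in block for key in ("domain", "severity"))


# Hypothetical sample data: the second record lacks 'severity'.
blocks = [
    {"domain": "example.com", "severity": "suspend"},
    {"domain": "example.org"},
]

# Records without the vital keys are skipped instead of raising KeyError later.
rows = [block for block in blocks if has_vital_keys(block)]
```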

11 months agoContinued:
Roland Häder [Fri, 14 Jul 2023 02:56:35 +0000 (04:56 +0200)]
Continued:
- alias for misskey (Russian)

11 months agoContinued:
Roland Häder [Thu, 13 Jul 2023 10:46:52 +0000 (12:46 +0200)]
Continued:
- check if dict 'row' has key 'hostname'

11 months agoContinued:
Roland Häder [Thu, 13 Jul 2023 07:32:12 +0000 (09:32 +0200)]
Continued:
- these values don't cause 3 GB of RAM usage, here they run fine with
  800 MB to 1 GB

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 09:38:28 +0000 (11:38 +0200)]
Continued:
- */* catches all, so let it pass as valid content type
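
One way such a check could look, as a sketch only: the function name `is_accepted_content_type` and the default list of accepted types are assumptions, not the project's actual code.

```python
def is_accepted_content_type(content_type: str, accepted=("application/json",)) -> bool:
    """Accept the wildcard '*/*' in addition to an explicit allow-list."""
    # Drop parameters such as '; charset=utf-8' before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "*/*" or media_type in accepted
```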

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 09:20:12 +0000 (11:20 +0200)]
Continued:
- don't let the user set something higher than what the system allows (minus 50)
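
The commit does not say which system limit is meant. Assuming it is Python's recursion limit, a clamp with a margin of 50 could be sketched like this (`clamp_depth` and `SAFETY_MARGIN` are hypothetical names):

```python
import sys

# Assumption: keep 50 frames of headroom below the interpreter's limit.
SAFETY_MARGIN = 50


def clamp_depth(requested: int) -> int:
    """Never return more than the recursion limit minus the safety margin."""
    allowed = sys.getrecursionlimit() - SAFETY_MARGIN
    return min(requested, allowed)
```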

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 09:05:03 +0000 (11:05 +0200)]
Continued:
- max "crawl" depth and min peer list size to go deeper are now configurable
- for example, on low-memory systems keep max_crawl_depth small and
  min_peers_length big
- the default values may cause python3 to consume ~550 MB RAM
- so you can practically say each depth adds another MB of RAM usage

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 08:30:27 +0000 (10:30 +0200)]
Continued:
- roadhouse is an alias for hubzilla, it is currently unsupported as it doesn't
  provide needed APIs for fetching peers and blocklists but just in case they
  add it
- same with nextcloud and others
- shumihub is an alias for misskey

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 05:28:43 +0000 (07:28 +0200)]
Continued:
- a recursion (aka. "crawl") depth of 500 is REALLY deep, practically the
  whole Fediverse
- minimum peer count to deepen the "crawl" to max depth is 100 peers
- flush any pending data of current domain before continuing

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 04:49:13 +0000 (06:49 +0200)]
Continued:
- recursive adding is totally okay, as it cannot go on endlessly
- renamed blocks.add_instance() to add()

11 months agoContinued:
Roland Häder [Wed, 12 Jul 2023 00:37:40 +0000 (02:37 +0200)]
Continued:
- okay, this instance floods with non-existing sub-domains

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 21:27:34 +0000 (23:27 +0200)]
Continued:
- added IDNA encoding

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 13:21:38 +0000 (15:21 +0200)]
Continued:
- oops, header was wrong here due to previous changes (search for all headers)
- but after a few renames, all is back in order!

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 12:26:58 +0000 (14:26 +0200)]
Continued:
- need to find all, not just first element ...

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 12:00:27 +0000 (14:00 +0200)]
Continued:
- fixed issues from pylint

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 11:48:05 +0000 (13:48 +0200)]
Continued:
- moved utils.alias_block_level() to blocks model
- fixed some pylint issues

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 10:31:23 +0000 (12:31 +0200)]
Continued:
- introduced new module "processing"
- renamed process_*() to *() ;-)

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 10:22:24 +0000 (12:22 +0200)]
Continued:
- strip off leading dots as the IDNA encoder gets confused about it
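
A small sketch of this step using Python's built-in `idna` codec (which implements IDNA 2003). The helper name `to_punycode` is hypothetical. A leading dot produces an empty first label, which the codec rejects, hence the strip:

```python
def to_punycode(domain: str) -> str:
    """Strip leading dots, then encode the domain to its ASCII (punycode) form."""
    cleaned = domain.strip().lstrip(".")
    return cleaned.encode("idna").decode("ascii")
```

Plain ASCII domains pass through unchanged, so the helper can be applied to every domain before it is stored or compared.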

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 08:24:04 +0000 (10:24 +0200)]
Continued:
- exclude chaos.social here as their API for fetching blocks is disabled and
  invoking set_total_blocks() would reset it to zero

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 06:22:04 +0000 (08:22 +0200)]
Continued:
- maybe does not contain any header at all?

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 06:14:33 +0000 (08:14 +0200)]
Continued:
- find more blocklists/peer lists from Lemmy by also scanning for the
  (outdated?) class=container

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 05:58:58 +0000 (07:58 +0200)]
Continued:
- alias "quarantined_instances" to "quarantined", you may have to run
  `DELETE FROM blocks WHERE block_level='quarantined_instances';`
- ... and: `UPDATE instances SET last_blocked = NULL WHERE software IS NOT NULL AND last_status_code = 200;`
- ... to reset your database, then don't forget to execute ./fba.py fetch_blocks

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 05:52:03 +0000 (07:52 +0200)]
Continued:
- just remove that single hash to get final SQL statements being printed

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 05:37:16 +0000 (07:37 +0200)]
Continued:
- nope, the returned 'blocking' list is differently structured

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 04:42:34 +0000 (06:42 +0200)]
Continued:
- blacklisted hexbear.net as their JavaScript contains Shell commands + broken
  JSON inside that script
- added parsing JSON from JavaScript starting with 'isoData' (encapsulated to
  function parse_script())

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 02:53:55 +0000 (04:53 +0200)]
Continued:
- more generic scan for possible linked/blocked instances

11 months agoContinued:
Roland Häder [Tue, 11 Jul 2023 02:32:43 +0000 (04:32 +0200)]
Continued:
- updating distinct domain nodeinfos can be forced

11 months agoFixed:
Roland Häder [Mon, 10 Jul 2023 23:04:13 +0000 (01:04 +0200)]
Fixed:
- "TypeError: 'sqlite3.Row' object does not support item assignment"
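
For context, `sqlite3.Row` objects are read-only. A common workaround, sketched here with a made-up in-memory table rather than the project's real schema, is to copy the row into a dict before changing values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# Illustrative table only, not the actual fba schema.
conn.execute("CREATE TABLE instances (domain TEXT, software TEXT)")
conn.execute("INSERT INTO instances VALUES ('example.com', NULL)")
row = conn.execute("SELECT domain, software FROM instances").fetchone()

# row["software"] = "mastodon" would raise the TypeError quoted above,
# so work on a mutable copy instead.
record = dict(row)
record["software"] = "mastodon"
conn.close()
```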

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 22:58:18 +0000 (00:58 +0200)]
Continued:
- encode UTF-8 international domains to punycode (IDNA) here, too

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 22:53:18 +0000 (00:53 +0200)]
Continued:
- okay, then don't check if they are punycode and then raise an exception ...

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 22:51:49 +0000 (00:51 +0200)]
Continued:
- added command convert_idna to convert UTF-8 encoded international domain
  names to punycode domains (IDNA), as some had been added in both
  encodings

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 21:06:51 +0000 (23:06 +0200)]
Continued:
- fixed missing dict key element
- added debug messages

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 20:19:40 +0000 (22:19 +0200)]
Continued:
- added CSV file gardenfence.csv for oliphant blocklist member sunny.garden

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 19:38:29 +0000 (21:38 +0200)]
Continued:
- alias severity level during fetch_oliphant, too

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 19:30:40 +0000 (21:30 +0200)]
Continued:
- hand over proper severity level

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 19:12:11 +0000 (21:12 +0200)]
Continued:
- renamed utils.deobfuscate_domain() to deobfuscate()
- oliphant blocklists may contain obfuscated domains, need to deobfuscate them
  first to get actual domain names

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 17:35:39 +0000 (19:35 +0200)]
Continued:
- cannot get len() (number of rows) from reader
- instances.set_total_blocks() does not accept a direct count as 2nd parameter,
  so let's hand over the domain list
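
The `len()` problem comes from `csv` readers being iterators. A sketch of the usual workaround, with invented CSV content and column names, is to materialize the rows first and then build the domain list from them:

```python
import csv
import io

# Hypothetical CSV content; real blocklists may use other column names.
data = io.StringIO("domain,severity\nexample.com,suspend\nexample.org,silence\n")
reader = csv.DictReader(data)

# len(reader) raises TypeError, so read all rows into a list first.
rows = list(reader)
domains = [row["domain"] for row in rows]
```

A list like `domains` can then be handed over wherever a list rather than a number is expected.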

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 12:07:39 +0000 (14:07 +0200)]
Continued:
- "storage share" is nextcloud again
- proper domain for bka.li

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 03:42:05 +0000 (05:42 +0200)]
Continued:
- a larger number of instances have changed (again) Nextcloud's software name to
  "storage share", so let's alias it back

11 months agoContinued:
Roland Häder [Mon, 10 Jul 2023 00:56:21 +0000 (02:56 +0200)]
Continued:
- yet another misskey instance (stella.place) returns the same results
  all over again while the offset is raised
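
One possible guard against such endpoints, sketched under the assumption that pages are fetched by offset: stop as soon as a page contains nothing new. `fetch_all`, `fetch_page` and `broken_endpoint` are hypothetical names, not functions from the repository.

```python
def fetch_all(fetch_page, step: int = 2, max_pages: int = 100) -> list:
    """Collect items page by page and stop when a page brings nothing new."""
    seen = set()
    collected = []
    offset = 0
    for _ in range(max_pages):
        page = fetch_page(offset)
        fresh = [item for item in page if item not in seen]
        if not fresh:
            # Empty page or only repeated entries: the endpoint ignores the offset.
            break
        seen.update(fresh)
        collected.extend(fresh)
        offset += step
    return collected


def broken_endpoint(offset: int) -> list:
    """Stand-in for an instance that returns the same page for every offset."""
    return ["a.example", "b.example"]
```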

11 months agoContinued:
Roland Häder [Sun, 9 Jul 2023 17:49:15 +0000 (19:49 +0200)]
Continued:
- it is 'ryona.agency';

11 months agoFixed:
Roland Häder [Sun, 9 Jul 2023 17:47:11 +0000 (19:47 +0200)]
Fixed:
- oops, too many renames, renamed 'domains' back to 'blocklist'
- also need to check combined arrays, or else always 2 will be found
- need to invoke commit() in sources.update() function

11 months agoContinued:
Roland Häder [Sun, 9 Jul 2023 17:27:16 +0000 (19:27 +0200)]
Continued:
- source_last_access is the proper configuration key now
- some spaces
- more average peer count

11 months agoContinued:
Roland Häder [Sun, 9 Jul 2023 03:51:16 +0000 (05:51 +0200)]
Continued:
- added further aliases

11 months agoContinued:
Roland Häder [Sat, 8 Jul 2023 21:22:23 +0000 (23:22 +0200)]
Continued:
- some instances or honeypots may return empty (None in Python) link[href]
  entries
- you can run a honeypot and pay monthly domain fees for it, not my business,
  but at least format your /.well-known/nodeinfo properly!

11 months agoContinued:
Roland Häder [Sat, 8 Jul 2023 21:06:22 +0000 (23:06 +0200)]
Continued:
- also update `last_nodeinfo` when nodeinfo has been auto-discovered or found by
  probing other well-known locations

11 months agoContinued:
Roland Häder [Sat, 8 Jul 2023 20:23:54 +0000 (22:23 +0200)]
Continued:
- instances.social is a non-federating website, `origin` should always bear a
  federating instance
- please run SQL `DELETE FROM instances WHERE origin='instances.social'` and
  afterwards ./fba.py fetch_instances --domain=<some-large-instance>
- then you can run this command (fetch_instances_social) again

11 months agoFixed:
Roland Häder [Sat, 8 Jul 2023 00:54:09 +0000 (02:54 +0200)]
Fixed:
- forgot to add [key] (value access)

11 months agoContinued:
Roland Häder [Sat, 8 Jul 2023 00:28:47 +0000 (02:28 +0200)]
Continued:
- oops, wrong bait, wrong fish

11 months agoContinued:
Roland Häder [Fri, 7 Jul 2023 20:42:51 +0000 (22:42 +0200)]
Continued:
- check response.status_code and length of response.text
- handle None (NULL) values for last_updated/last_seen

11 months agoContinued:
Roland Häder [Thu, 6 Jul 2023 10:49:07 +0000 (12:49 +0200)]
Continued:
- added missing cases for header (h1) and browser title bar
- fixed module name

11 months agoFixed:
Roland Häder [Thu, 6 Jul 2023 07:59:05 +0000 (09:59 +0200)]
Fixed:
- PeerTube's JSON response always includes mode2=following or mode2=follower
  depending on whether mode=followers or mode=following is set
- this caused PeerTube instances to be reported with a duplicated number of peers

11 months agoContinued:
Roland Häder [Thu, 6 Jul 2023 03:54:09 +0000 (05:54 +0200)]
Continued:
- cut out ' - ' only if found

11 months agoContinued:
Roland Häder [Thu, 6 Jul 2023 01:00:00 +0000 (03:00 +0200)]
Continued:
- added mode=command
- added links from list.html

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 23:41:33 +0000 (01:41 +0200)]
Continued:
- use fetch_response() instead of invoking reqto.get()
- more debug logging
- check response status of bot posts

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 22:21:58 +0000 (00:21 +0200)]
Continued:
- 11 commands are already there to fetch instances

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 22:14:55 +0000 (00:14 +0200)]
Continued:
- fixed issues reported by pylint

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 21:25:25 +0000 (23:25 +0200)]
Continued:
- added "official" name 'nextcloudpi', others like 'crowncloud', 'darkcloud' are
  just aliases created by their owners, I don't provide them a stage in my code
- provided template variable 'domain' might be None

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 20:15:19 +0000 (22:15 +0200)]
Continued:
- added view /list which lists domains by some criteria (mode/value)
- renamed blocks.is_valid_level() to valid() but now requires 2nd parameter
  with column to check

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 19:36:07 +0000 (21:36 +0200)]
Continued:
- no border for tables inside .notice

11 months agoContinued:
Roland Häder [Wed, 5 Jul 2023 17:59:29 +0000 (19:59 +0200)]
Continued:
- error fixed (oops?)

11 months agoRenaming season:
Roland Häder [Tue, 4 Jul 2023 18:28:01 +0000 (20:28 +0200)]
Renaming season:
- renamed table/model file 'apis' to 'sources' as wikis are not APIs but
  all are (instance) sources
- renamed api_domain to source_domain

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 17:18:27 +0000 (19:18 +0200)]
Continued:
- if response is not okay, throw exception
- avoids init of domain_data variable

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 16:42:21 +0000 (18:42 +0200)]
Continued:
- rewrote to fetch domain data over internal JSON API (seems not to be cached?)
- avoid local variable

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 16:27:51 +0000 (18:27 +0200)]
Continued:
- raise_on() on obfuscated domains was a wrong decision

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 14:49:13 +0000 (16:49 +0200)]
Continued:
- try to "crawl" instances being redirected to

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 13:26:12 +0000 (15:26 +0200)]
Continued:
- wording fixed: "Parameter foo[]='%s' is not type of '%'"

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 12:57:02 +0000 (14:57 +0200)]
Continued:
- added domain.is_in_url() to check if domain is matching netloc or hostname
  part of the URL. This function encodes the domain into punycode before
  comparing it
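
A rough sketch of what such a check might look like with `urllib.parse`. The signature and body are assumptions and not the repository's implementation:

```python
from urllib.parse import urlparse


def is_in_url(domain: str, url: str) -> bool:
    """Check whether the punycode form of domain equals the URL's netloc or hostname."""
    punycode = domain.encode("idna").decode("ascii")
    components = urlparse(url)
    # netloc may include a port, hostname is lower-cased and port-free (or None).
    return punycode in (components.netloc, components.hostname)
```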

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 11:56:54 +0000 (13:56 +0200)]
Continued:
- STATIC_CHECK means well-known URLs are "statically" checked

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 11:55:23 +0000 (13:55 +0200)]
Continued:
- PLATFORM documented

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 11:49:38 +0000 (13:49 +0200)]
Continued:
- unique and primary keys on single columns can be moved into table definition
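
To illustrate the idea, single-column keys can be declared inline in the column definitions instead of in separate constraint or index statements. The table below is invented for this sketch and is not the project's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE example_table (
        row_id INTEGER PRIMARY KEY,       -- primary key declared inline
        domain TEXT NOT NULL UNIQUE,      -- unique key declared inline
        last_accessed INTEGER
    )
    """
)
conn.execute("INSERT INTO example_table (domain) VALUES ('example.com')")

# The inline UNIQUE constraint rejects a second row with the same domain.
try:
    conn.execute("INSERT INTO example_table (domain) VALUES ('example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
conn.close()
```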

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 11:43:07 +0000 (13:43 +0200)]
Continued:
- also check og:platform
- don't set None for detected software type, maybe the website is just down for
  maintenance

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 10:47:48 +0000 (12:47 +0200)]
Continued:
- used domain_helper.raise_on(domain) instead

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 10:03:00 +0000 (12:03 +0200)]
Continued:
- first (if needed) acquire lock, then check (if needed) api_domain
- also check host name (components.netloc) for feed in fetch_fba_rss command

11 months agoContinued:
Roland Häder [Tue, 4 Jul 2023 08:39:50 +0000 (10:39 +0200)]
Continued:
- renamed column apis.hostname to apis.api_domain
- added apis.fetch()
- check if fetching a URL from fedilist.com was successful

11 months agoContinued:
Roland Häder [Mon, 3 Jul 2023 22:37:31 +0000 (00:37 +0200)]
Continued:
- added command fetch_instances_social to fetch new instances from
  instances.social
- you need to get an API key from them, please don't lower api_last_access too
  much, your API key/IP address might get banned!
- added table `apis` which keeps track of "APIs" accessed, including github
  and wikis, this is to lower traffic on these sites, again: please DO NOT
  overdo these requests! Your IP/API key might get blocked!

11 months agoContinued:
Roland Häder [Mon, 3 Jul 2023 21:30:15 +0000 (23:30 +0200)]
Continued:
- rewrote a bit to get fewer nested blocks
- "href" might not be set while "rel" is set ... too many broken
  /.well-known/nodeinfo replies!

11 months agoContinued:
Roland Häder [Mon, 3 Jul 2023 18:04:45 +0000 (20:04 +0200)]
Continued:
- renamed /api/index.json to /api/top.json
- count each checked domain and calculate percentage

11 months agoContinued:
Roland Häder [Mon, 3 Jul 2023 12:19:39 +0000 (14:19 +0200)]
Continued:
- reset software, detection mode and nodeinfo URL to None when redirection is
  done to other domain
- yes, some people have moved their instance to a subdomain and now redirect
  their traffic there
- still, this had caused another instance to be registered under a wrong
  domain name
- this fix solves this, please run ./fba.py update_nodeinfo
- added config key recheck_nodeinfo

11 months agoContinued:
Roland Häder [Mon, 3 Jul 2023 03:09:49 +0000 (05:09 +0200)]
Continued:
- some people have broken /.well-known/nodeinfo links (href), some contain
  scheme, but no netloc (host name)
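
A broken link of this kind can be detected with `urllib.parse`. The following is only a sketch, and the helper name `is_usable_href` is made up:

```python
from urllib.parse import urlparse


def is_usable_href(href) -> bool:
    """Accept only http(s) links that carry a host name (netloc)."""
    if not isinstance(href, str) or href == "":
        # Covers missing (None) and empty href values.
        return False
    components = urlparse(href)
    return components.scheme in ("http", "https") and components.netloc != ""
```

For example, `https:///nodeinfo/2.0` parses with a scheme but an empty netloc and is therefore rejected.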

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 23:12:14 +0000 (01:12 +0200)]
Continued:
- better word

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 20:34:28 +0000 (22:34 +0200)]
Continued:
- also don't output a warning for "application/jrd+json"
- same for "application/activity+json"

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 20:27:48 +0000 (22:27 +0200)]
Continued:
- acquire lock as this command changes database

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 19:19:58 +0000 (21:19 +0200)]
Continued:
- log response.content when used
- variable blocker or block["blocker"]? The latter one.

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 18:49:58 +0000 (20:49 +0200)]
Continued:
- return whole nodeinfo dict, including "json"
- also need to handle this for pleroma

11 months agoContinued:
Roland Häder [Sun, 2 Jul 2023 18:29:28 +0000 (20:29 +0200)]
Continued:
- issue a warning when determined software type has changed