Roland Häder [Tue, 8 Aug 2023 12:57:25 +0000 (14:57 +0200)]
WIP:
- added i18n support
- I still get an exception about a missing tag 'trans' and I don't know how to
hook i18n properly into Jinja2
- I found tutorials, but they used their own Python-based language files, while
this code follows: https://svn.python.org/projects/external/Jinja-2.1.1/docs/_build/html/extensions.html
- if someone knows how to do this correctly, please tell me
Roland Häder [Thu, 1 May 2025 11:55:23 +0000 (13:55 +0200)]
Continued:
- skip wordpress.com instances as the public API always differs from the
"instance"
- skip empty `doc` result from BeautifulSoup4 (HTML parser failed)
- typo fixed
Roland Häder [Tue, 29 Apr 2025 19:36:46 +0000 (21:36 +0200)]
Continued:
- oops, `value` is not a parameter of the daemon's function
- introduced parameter --force-recrawl (to include recently crawled instances)
to 2 commands
- updated --force-all help text
Roland Häder [Mon, 21 Apr 2025 03:03:19 +0000 (05:03 +0200)]
Continued:
- added `srv.us` and `linodeusercontent.com` as mass-hosters/tunnel services,
no real instance can be expected here
- if table `instances` contains no record then `is_recent()` should return
`False`
- removed parameter `--single` from command `fetch_instances` and moved SQL
statement into `else` block
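
A minimal sketch of the `is_recent()` behaviour described above. The table
layout (columns `domain` and `last_instance_fetch` as a Unix timestamp), the
`connection` argument and the `max_age` parameter are assumptions made for
illustration only; the real function may look different.

```python
import sqlite3
import time


def is_recent(connection: sqlite3.Connection, domain: str, max_age: int = 86400) -> bool:
    """Return True only if `domain` has a record fetched within `max_age` seconds."""
    row = connection.execute(
        "SELECT last_instance_fetch FROM instances WHERE domain = ? LIMIT 1",
        (domain,),
    ).fetchone()

    # No record (or a record that was never fetched) can never count as "recent"
    if row is None or row[0] is None:
        return False

    return time.time() - row[0] < max_age


# Hypothetical in-memory table, only to make the sketch runnable
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE instances (domain TEXT PRIMARY KEY, last_instance_fetch REAL)")
connection.execute("INSERT INTO instances VALUES (?, ?)", ("example.org", time.time()))
connection.execute("INSERT INTO instances VALUES (?, ?)", ("old.example", time.time() - 7 * 86400))
```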
Roland Häder [Mon, 21 Apr 2025 02:21:03 +0000 (04:21 +0200)]
Continued:
- fixed some typos (TypError -> TypeError, row[block] -> block[reason], ...)
- RuntimeException is RuntimeError in Python (the naming of errors vs.
exceptions is confusing to me)
- used f-string instead (pylint is now a bit happier)
- the import errors are still confusing?!
Roland Häder [Mon, 21 Apr 2025 01:06:54 +0000 (03:06 +0200)]
Continued:
- replaced abbreviation "e.g." with "for example"
- removed if() block, as a loop over an empty list does nothing anyway and the
else block only contained a debug line
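
A small illustration of why the guard was redundant. The function name
`collect_names()` and the `name` key are hypothetical and not taken from the
code base.

```python
def collect_names(rows: list) -> list:
    """Collect the `name` value of every row."""
    names = []

    # No `if len(rows) > 0:` guard needed: for an empty list the loop body
    # is simply never executed
    for row in rows:
        names.append(row["name"])

    return names
```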
Roland Häder [Sun, 20 Apr 2025 23:55:31 +0000 (01:55 +0200)]
Continued:
- let's not shorten names so much, else local functions may be confused with
imported libraries
- renamed variable `domain` to `hostname`, as it is not only a domain
- skip unwanted domains before invoking encode_idna()
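
A rough sketch of the ordering from the last point: filter first, encode
afterwards. The suffix list, `is_unwanted()` and `prepare()` are hypothetical,
and this `encode_idna()` is only a stand-in built on Python's standard `idna`
codec, not the project's own function.

```python
UNWANTED_SUFFIXES = (".onion", ".arpa")  # hypothetical example list


def is_unwanted(hostname: str) -> bool:
    """Cheap string check that needs no encoding."""
    return hostname.endswith(UNWANTED_SUFFIXES)


def encode_idna(hostname: str) -> str:
    """Stand-in: encode a hostname to its ASCII (punycode) form."""
    return hostname.encode("idna").decode("ascii")


def prepare(hostnames: list) -> list:
    encoded = []
    for hostname in hostnames:
        # Skip first, so unwanted names are never passed to the encoder
        if is_unwanted(hostname):
            continue
        encoded.append(encode_idna(hostname))
    return encoded
```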
Roland Häder [Tue, 21 Jan 2025 11:43:44 +0000 (12:43 +0100)]
Continued:
- added more exceptions to catch and handle
- split long 2-statement lines into single lines for better error handling
and debugging (if fedilist is back one day -> https://fedilist.com )
Roland Häder [Wed, 15 Jan 2025 02:12:11 +0000 (03:12 +0100)]
Continued:
- need to skip invalid table headers; they should be introduced with <thead>
and then a <th> for each column, but some websites may use <tr> instead of <thead>
- strip (trim) strings
Roland Häder [Mon, 13 Jan 2025 23:40:06 +0000 (00:40 +0100)]
Continued:
- avoided dangerous (= mutable) default arguments to functions/methods (thanks
to pylint)
- reduced invocation count of find_all("foo") by using a local variable
- added more checks in "quarantined" branch
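
For illustration, the mutable-argument pitfall that pylint reports as
`dangerous-default-value`, next to the usual `None` idiom. Both function names
are hypothetical and not taken from the code base.

```python
from typing import Optional


def add_block_bad(domain: str, blocks: list = []) -> list:
    # The default list is created once, at definition time, and is then
    # shared by every call that omits `blocks`
    blocks.append(domain)
    return blocks


def add_block(domain: str, blocks: Optional[list] = None) -> list:
    # A fresh list is created on each call that omits `blocks`
    if blocks is None:
        blocks = []
    blocks.append(domain)
    return blocks
```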
Roland Häder [Mon, 13 Jan 2025 22:33:37 +0000 (23:33 +0100)]
Continued:
- check if required key 'url' is in dict 'row'
- avoided superfluous else + indenting
- simpler check before complex checks (`if domain in domains` is cheaper
than invoking the "expensive" is_wanted())
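
A sketch of the check ordering from the last point. This `is_wanted()` is only
a stand-in with a call counter, and `should_add()` is a hypothetical helper;
`and` short-circuits, so the expensive call is skipped for known domains.

```python
stats = {"is_wanted_calls": 0}


def is_wanted(domain: str) -> bool:
    """Stand-in for the expensive check; counts how often it is invoked."""
    stats["is_wanted_calls"] += 1
    return not domain.endswith(".invalid")


def should_add(domain: str, domains: set) -> bool:
    # Cheap membership test first; is_wanted() only runs for unknown domains
    return domain not in domains and is_wanted(domain)
```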
Roland Häder [Sun, 12 Jan 2025 03:03:52 +0000 (04:03 +0100)]
Continued:
- logging the whole tag isn't a good idea
- yes, the HTML is sometimes broken, e.g. <meta> and not <meta /> (self-closing)
- the unclosed tag causes the warning
Roland Häder [Fri, 10 Jan 2025 08:41:22 +0000 (09:41 +0100)]
Continued:
- blacklisted pinggy.link, as it floods the instances table. Please register
a proper domain for yourself and don't use "free" hosters' server names as a
"domain name"
Roland Häder [Sat, 4 Jan 2025 00:49:52 +0000 (01:49 +0100)]
Continued:
- fixed some variables (e.g. there is no row[domain] but source_domain does exist)
- added some random sleep for fetch_observer() as they seem to block automated
requests that arrive too quickly
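
One way such a random delay could look. The helper name `polite_sleep()`, its
bounds and the injectable `sleep` argument are assumptions for illustration;
the actual delay used in fetch_observer() is not stated here.

```python
import random
import time


def polite_sleep(min_seconds: float = 1.0, max_seconds: float = 5.0, sleep=time.sleep) -> float:
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    # `sleep` can be swapped out (for example in tests) to avoid real waiting
    sleep(delay)
    return delay
```

Calling `polite_sleep()` between two requests spreads them out irregularly.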
Roland Häder [Tue, 1 Oct 2024 16:52:05 +0000 (18:52 +0200)]
Continued:
- added function update() for "model" obfuscation (module name and table name
must match as this is the naming convention)
- added column 'last_used' to table 'obfuscation'