Commit graph

15 commits

Author SHA1 Message Date
Thomas Hochstein b99d41010d Forcibly decode headers with unencoded 8bit chars.
Just assume UTF-8 for the time being.
Fixes database errors with illegal characters
when writing parsed data.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2021-05-29 10:17:00 +02:00
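
A minimal sketch of the idea behind this commit, written in Python for illustration rather than taken from the project's Perl code. The function name `decode_raw_header` is hypothetical, and the choice to substitute a replacement character for invalid bytes is an assumption; the commit only states that UTF-8 is assumed.

```python
def decode_raw_header(raw: bytes) -> str:
    """Decode a raw header value that may contain unencoded 8-bit bytes.

    Assumption: UTF-8 is simply assumed, as the commit message says.
    Bytes that are not valid UTF-8 are replaced with U+FFFD so that the
    result is always a well-formed string that can be stored safely.
    """
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_raw_header("Jürgen".encode("utf-8")))
    print(decode_raw_header(b"J\xfcrgen"))  # Latin-1 byte, not valid UTF-8
```

Replacing invalid bytes rather than raising an error keeps the parser running on bad input, at the cost of losing the original character.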
Thomas Hochstein 6deb7dbaa4 Add MID to error message to make it more useful.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2021-05-29 09:47:17 +02:00
Thomas Hochstein 48c8d4bb8e Add some input validation.
Our raw data doesn't have the quality one should
expect. There are empty header lines only containing
whitespace (leading to wrong joining of apparent
continuation lines); header lines that contain garbage
without ':' so split is failing; empty 'newsgroups'
fields; unsupported encodings in MIME encoded words
... and so on.

Add fixes for the aforementioned problems.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:25 +02:00
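
The kind of validation described in this commit could look roughly like the following Python sketch. It is not the code from parsedb.pl; the function name `parse_headers` and the decision to silently skip bad lines are assumptions made for illustration.

```python
def parse_headers(raw: str) -> dict:
    """Parse header lines defensively into a {name: value} dict.

    Hypothetical helper: skips whitespace-only lines, joins real
    continuation lines, and ignores lines that contain no ':'.
    """
    headers = {}
    last = None  # name of the header a continuation line would extend
    for line in raw.splitlines():
        if not line.strip():
            # Whitespace-only line: skip it so it is neither mistaken
            # for a continuation line nor joined to the previous header.
            continue
        if line[0] in " \t":
            # Continuation line: append to the previous header, if any.
            if last is not None:
                headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            # Garbage without ':' cannot be split into name and value.
            last = None
            continue
        last = name.strip().lower()
        headers[last] = value.strip()
    return headers


if __name__ == "__main__":
    sample = "Subject: hello\n   \n world\ngarbage line\nNewsgroups: de.test\n"
    print(parse_headers(sample))
```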
Thomas Hochstein 13e006104b Add documentation to parsedb.pl.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:23 +02:00
Thomas Hochstein aef5467bfe Handle more than one entity in From: etc.
From:, Sender: etc. may contain more than one
entity in a comma separated list, i.e. a From:
line like
"From: Me <me@example.com>, You <you@example.com>"
is perfectly valid.

Handle multiple entities when splitting those
headers and save all names and all addresses
as (new) comma separated lists in the
corresponding database fields.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:21 +02:00
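
As an illustration of splitting such a header into comma separated lists of names and addresses, here is a short Python sketch using the standard library's `email.utils.getaddresses`. The project itself uses Perl; the function name `split_entities` is hypothetical.

```python
from email.utils import getaddresses


def split_entities(header_value: str):
    """Split a From:-style header into names and addresses.

    Returns two comma separated strings, one with all display names
    and one with all addresses, in the order they appear.
    """
    pairs = getaddresses([header_value])
    names = ", ".join(name for name, _ in pairs)
    addresses = ", ".join(addr for _, addr in pairs)
    return names, addresses


if __name__ == "__main__":
    print(split_entities("Me <me@example.com>, You <you@example.com>"))
```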
Thomas Hochstein ca8ac4d50f Let gatherstats read its data from DBTableParse.
Switch gatherstats.pl over to the parsed database.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:19 +02:00
Thomas Hochstein 9630376c31 Add decoding and parsing of From: etc.
Decode From:, Sender:, Reply-To:, Subject:;
parse From:, Sender:, Reply-To:.

Add Mail::Address to prerequisites.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:17 +02:00
Thomas Hochstein 6d72dad2c0 Create a database table with parsed raw data.
Incoming data is written to DBTableRaw without
much interpretation. To allow for more and
better analysis that raw data should be parsed
daily and copied to another database table
with separate fields for most header lines.
All other scripts could use that pre-parsed
data.

* Add database schema to install.pl
* Add DBTableParse to newsstats.conf.sample
  and as mandatory to NewsStats.pm
* Add parsedb.pl

TODO:
- Documentation is only rudimentary.
- From:, Sender:, Reply-To: and Subject:
  are not yet parsed.
- gatherstats.pl does not yet use DBTableParse.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:27:50 +02:00
Thomas Hochstein 599fefbf6a Merge branch 'thh-bug51' into next
* thh-bug51:
  One more default sorting order ("grouping").
2013-09-03 22:25:23 +02:00
Thomas Hochstein 8dc6823e98 Small comment fixes.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 17:12:09 +02:00
Thomas Hochstein 17ef44085f --sums is not compatible with --checkgroups.
'Virtual' .ALL groups will never be present in
a checkgroups file, and we can't use them anyway
as they would contain postings from groups that
are filtered out by --checkgroups.

Add a warning, put a note in the documentation.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 15:10:07 +02:00
Thomas Hochstein ea91003a99 One more default sorting order ("grouping").
If --group-by is not set, output will be grouped
by month by default (as long as --boundary is
not set to 'level' or 'average', where grouping
by newsgroup is default).

Now we default to 'newsgroup' if just one newsgroup
is requested by --newsgroups, but more than one
month by --month.

Both defaults can be overridden.

The forced --group-by=month for --report type
'average' or 'sum' stays in front, though, so the
defaults are not checked in that case.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 14:56:17 +02:00
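
The default selection described in this commit can be sketched as a small decision function. This is a Python illustration under stated assumptions, not the project's Perl code: the function name, its parameters and the return values "month" and "newsgroup" are hypothetical, and the forced grouping for certain report types is left out.

```python
def default_group_by(group_by, boundary, newsgroup_count, month_count):
    """Pick the grouping when --group-by may be unset.

    group_by:        explicit --group-by value, or None if not given
    boundary:        --boundary value, or None
    newsgroup_count: number of newsgroups requested via --newsgroups
    month_count:     number of months requested via --month
    """
    if group_by:
        # An explicit setting always overrides the defaults.
        return group_by
    if boundary in ("level", "average"):
        # Existing default for these boundary types.
        return "newsgroup"
    if newsgroup_count == 1 and month_count > 1:
        # New default: one newsgroup over several months.
        return "newsgroup"
    return "month"


if __name__ == "__main__":
    print(default_group_by(None, None, 1, 12))
    print(default_group_by(None, None, 5, 12))
```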
Thomas Hochstein 23ab67a099 Make configuration file configurable.
Add --conffile option to all scripts to
override standard config file location
etc/newsstats.conf.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:20 +02:00
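
An option of this kind is usually a command line switch with the standard location as its default. The following Python sketch using `argparse` only illustrates that pattern; the scripts themselves are Perl, and the function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse command line arguments with an overridable config file path."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--conffile",
        default="etc/newsstats.conf",  # standard location named in the commit
        help="path to the configuration file",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    print(parse_args([]).conffile)
    print(parse_args(["--conffile", "/tmp/test.conf"]).conffile)
```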
Thomas Hochstein dfc2b81c37 Fix some whitespace.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:18 +02:00
Thomas Hochstein 2ad99c20bc Redo directory structure.
* Move all scripts to /bin
* Move configuration to /etc
* Move NewsStats.pm to /lib
* Add new path to NewsStats.pm to all scripts
* Set $HomePath to top level directory
* Move setting of config file name to ReadConf()

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:16 +02:00