Incoming data is written to DBTableRaw without
much interpretation. To allow for more and
better analysis, that raw data should be parsed
daily and copied to another database table
with separate fields for most header lines.
All other scripts could then use that pre-parsed
data.
* Add database schema to install.pl
* Add DBTableParse to newsstats.conf.sample
and make it a mandatory setting in NewsStats.pm
* Add parsedb.pl
TODO:
- Documentation is only rudimentary.
- From:, Sender:, Reply-To: and Subject:
are not yet parsed.
- gatherstats.pl does not yet use DBTableParse.
Signed-off-by: Thomas Hochstein <thh@inter.net>
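A rough idea of what "separate fields for most header lines" could look like, sketched in Python rather than the project's Perl. The header selection and the lower-case column names are assumptions, because the actual DBTableParse schema created by install.pl is not shown here. From:, Sender:, Reply-To: and Subject: are left out, which matches the TODO above.

```python
from email.parser import HeaderParser

# Hypothetical header selection; the real DBTableParse columns may differ.
FIELDS = ("Message-ID", "Date", "Newsgroups", "Path", "Lines", "User-Agent")


def parse_raw_headers(raw: str) -> dict:
    """Split a raw article (as stored in DBTableRaw) into one value per header field."""
    msg = HeaderParser().parsestr(raw)  # parses headers only, ignores the body
    # 'Message-ID' -> 'message_id' etc.; missing headers become None
    return {name.lower().replace("-", "_"): msg.get(name) for name in FIELDS}
```

A daily job could then insert one such row per raw record into the parsed table.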
GetTimePeriod() was written to take a month
('YYYY-MM') and work with that. Make it accept
not only a month, but also a day ('YYYY-MM-DD')
by adding a $Type modifier.
Rename LastMonth() to LastMonthDay() and rewrite
it accordingly.
Rename CheckMonth() to CheckPeriod() and rewrite
it accordingly.
As GetTimePeriod() defaults to 'month' if no
modifier is passed, this change should be backwards
compatible.
Signed-off-by: Thomas Hochstein <thh@inter.net>
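The month/day handling described above can be sketched in Python (the real GetTimePeriod() is Perl in NewsStats.pm). Returning the period as a pair of 'YYYY-MM-DD HH:MM:SS' boundaries is an assumption made for illustration. What the sketch does take from the commit is that the type defaults to 'month', so existing callers keep working.

```python
import datetime
import re


def get_time_period(period: str, period_type: str = "month") -> tuple[str, str]:
    """Return (start, end) timestamps covering a month ('YYYY-MM') or a day ('YYYY-MM-DD')."""
    if period_type == "month":
        if not re.fullmatch(r"\d{4}-\d{2}", period):
            raise ValueError(f"not a month (YYYY-MM): {period!r}")
        year, month = (int(part) for part in period.split("-"))
        first = datetime.date(year, month, 1)  # raises ValueError for month 0 or 13
        # Last day of the month = first day of the next month minus one day.
        next_first = datetime.date(year + month // 12, month % 12 + 1, 1)
        last = next_first - datetime.timedelta(days=1)
    elif period_type == "day":
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", period):
            raise ValueError(f"not a day (YYYY-MM-DD): {period!r}")
        first = last = datetime.date.fromisoformat(period)  # validates the date
    else:
        raise ValueError(f"unknown period type: {period_type!r}")
    return (f"{first.isoformat()} 00:00:00", f"{last.isoformat()} 23:59:59")
```

For example, get_time_period("2012-02") covers 2012-02-01 through 2012-02-29, while get_time_period("2012-02-15", "day") covers that single day.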
'Virtual' .ALL groups will never be present in
a checkgroups file, and we can't use them anyway
as they would contain postings from groups that
are filtered out by --checkgroups.
Add a warning and put a note in the documentation.
Signed-off-by: Thomas Hochstein <thh@inter.net>
If --group-by is not set, output will be grouped
by month by default (as long as --boundary is
not set to 'level' or 'average', where grouping
by newsgroup is default).
Now we default to 'newsgroup' if only one newsgroup
is requested with --newsgroups, but more than one
month with --month.
Both defaults can be overridden.
The forced --group-by=month for --report types
'average' and 'sum' is now applied first, so the
defaults are not checked in that case.
Signed-off-by: Thomas Hochstein <thh@inter.net>
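The order of the --group-by decisions described above, as a hypothetical Python helper (the real logic lives in the Perl script). Two points are assumptions: the forced 'month' for report types 'average'/'sum' also overrides an explicit --group-by, and "one newsgroup" is modeled simply as a one-element list.

```python
from typing import Optional, Sequence


def default_group_by(group_by: Optional[str], boundary: Optional[str],
                     report: Optional[str], newsgroups: Sequence[str],
                     months: int) -> str:
    """Pick the effective --group-by value (hypothetical helper)."""
    # Forced setting comes first, so none of the defaults below are consulted.
    if report in ("average", "sum"):
        return "month"
    # An explicit --group-by overrides both defaults.
    if group_by:
        return group_by
    # Boundaries 'level' and 'average' default to grouping by newsgroup.
    if boundary in ("level", "average"):
        return "newsgroup"
    # Exactly one newsgroup requested, but more than one month.
    if len(newsgroups) == 1 and months > 1:
        return "newsgroup"
    return "month"
```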
* Move all scripts to /bin
* Move configuration to /etc
* Move NewsStats.pm to /lib
* Add new path to NewsStats.pm to all scripts
* Set $HomePath to top level directory
* Move setting of config file name to ReadConf()
Signed-off-by: Thomas Hochstein <thh@inter.net>
* next: (26 commits)
Some documentation fixes and enhancements.
Improve INSTALL documentation.
README: Update copyright notice.
README: improve phrasing.
Change handling of warnings.
Improve output padding.
Check for invalid newsgroup names.
Add some basic validation to config parser.
Create better newsgroup lists for SQL clause.
Fix config path detection for install.pl.
Get empty 'virtual' hierarchies working.
Add some TODO entries.
Add database creation to installer.
Handle undefined previous version when installing.
Refactor database initialisation in feedlog.pl.
Add empty 'virtual' .ALL hierarchies as needed.
Change interpretation of --checkgroups to template
Be more fault-tolerant when reading checkgroups.
Remove call to &Bleat where not appropriate.
Allow more characters in TLH definitions.
...
Replace 'perl -W' with 'use warnings;'.
The latter is preferred, and '-W'
(as opposed to '-w') was causing problems with
warnings in DBD::mysql::GetInfo.
Signed-off-by: Thomas Hochstein <thh@inter.net>
Take the 'length' of numbers into account.
Change GetMaxLength() accordingly and use that
new information in FormatOutput().
Fixes #53.
Signed-off-by: Thomas Hochstein <thh@inter.net>
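The padding change above could be sketched in Python as follows. The names mirror GetMaxLength() and FormatOutput(), but the signatures and the (newsgroup, count) row layout are hypothetical. The key point is that the width of the count column is measured just like the width of the name column, instead of being fixed.

```python
from typing import List, Sequence, Tuple


def get_max_length(rows: Sequence[Tuple[str, int]]) -> Tuple[int, int]:
    """Return the widest newsgroup name and the widest posting count (as text)."""
    name_len = max((len(name) for name, _ in rows), default=0)
    count_len = max((len(str(count)) for _, count in rows), default=0)
    return name_len, count_len


def format_output(rows: Sequence[Tuple[str, int]]) -> List[str]:
    """Left-align names and right-align counts so all lines have equal width."""
    name_len, count_len = get_max_length(rows)
    return [f"{name:<{name_len}} {count:>{count_len}}" for name, count in rows]
```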
Build an 'IN(...)' list for single newsgroup
names without wildcards. Create an SQL clause
with a mix of wildcard and wildcard-less
group names.
More code for a better query ...
Fixes #37.
Signed-off-by: Thomas Hochstein <thh@inter.net>
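A Python sketch of such a mixed clause is below. The column name `newsgroup` and the use of `?` placeholders (which both Python DB-API qmark drivers and Perl DBI accept) are assumptions. Because '_' and '%' are wildcards in SQL LIKE, the sketch escapes them with '!' before turning '*' into '%'.

```python
from typing import List, Sequence, Tuple


def like_pattern(wildmat: str) -> str:
    """Turn a '*' wildcard pattern into a LIKE pattern, using '!' as escape character."""
    escaped = wildmat.replace("!", "!!").replace("%", "!%").replace("_", "!_")
    return escaped.replace("*", "%")


def build_newsgroup_clause(groups: Sequence[str]) -> Tuple[str, List[str]]:
    """Return an SQL condition plus bind parameters for the requested newsgroups."""
    if not groups:
        raise ValueError("no newsgroups given")
    plain = [g for g in groups if "*" not in g]
    wild = [g for g in groups if "*" in g]
    parts: List[str] = []
    params: List[str] = []
    if plain:
        # All wildcard-less names share a single IN(...) list.
        parts.append("newsgroup IN (" + ", ".join(["?"] * len(plain)) + ")")
        params.extend(plain)
    for pattern in wild:
        # Each wildcard pattern needs its own LIKE comparison.
        parts.append("newsgroup LIKE ? ESCAPE '!'")
        params.append(like_pattern(pattern))
    return "(" + " OR ".join(parts) + ")", params
```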
* installation:
Fix config path detection for install.pl.
Add some TODO entries.
Add database creation to installer.
Handle undefined previous version when installing.
Make use of $Path, which is set and checked
to display a correct 'newsfeeds' example, to
load our configuration from the correct location.
Signed-off-by: Thomas Hochstein <thh@inter.net>
Commit b5125b1099
was broken.
We didn't add empty .ALL hierarchies as needed;
we added empty (non-existent) hierarchies without
appended '.ALL', and didn't add the original
empty group we started with.
(What's more, gatherstats didn't even start
anymore due to the missing export and import of
&ParseHierarchies from NewsStats.pm.)
Fixes #52 (and some more breakage).
Signed-off-by: Thomas Hochstein <thh@inter.net>
$OptUpdate is undefined when not upgrading, so don't
prepare an upgrade notice in that case, to avoid
using an undefined variable.
Signed-off-by: Thomas Hochstein <thh@inter.net>
* Move database initialisation to a separate function.
* (Re-)try to connect every five seconds
(instead of going into an endless loop) and
log successful (re-)connections.
* Log postings that are dropped due to database failures
to syslog (Message-ID) for recovery.
* If the connection to the database is lost, try to
recover it (every five seconds) and try again to
write the pending data.
* Input will be buffered automatically by INN until
feedlog is able to process it (see man 5 newsfeeds).
Fixes #30, #31.
Signed-off-by: Thomas Hochstein <thh@inter.net>
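The retry-and-recover behavior listed above, sketched in Python (feedlog.pl itself is Perl). `connect` and `write` are hypothetical callables, and ConnectionError stands in for whatever error the database driver raises. The policy "retry the write once after reconnecting, then drop the posting and log its Message-ID" is an assumption about how the pieces fit together. The real script logs to syslog and relies on INN to buffer input meanwhile, and neither of those is modeled here.

```python
import logging
import time
from typing import Any, Callable

log = logging.getLogger("feedlog-sketch")


def connect_with_retry(connect: Callable[[], Any], delay: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call connect() until it succeeds, pausing `delay` seconds after each failure."""
    failures = 0
    while True:
        try:
            conn = connect()
        except ConnectionError as err:
            failures += 1
            log.warning("database connection failed (%s); retrying in %g s", err, delay)
            sleep(delay)
            continue
        # Log every successful (re-)connection.
        if failures:
            log.info("database connected after %d failed attempt(s)", failures)
        else:
            log.info("database connected")
        return conn


def store_posting(conn: Any, connect: Callable[[], Any],
                  write: Callable[[Any, Any], None], record: Any, message_id: str,
                  delay: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> Any:
    """Write one record, recovering a lost connection; return the connection to keep using."""
    try:
        write(conn, record)
        return conn
    except ConnectionError:
        # Connection lost: recover it, then try again to write the pending data.
        conn = connect_with_retry(connect, delay, sleep)
    try:
        write(conn, record)
    except ConnectionError:
        # Assumed policy: drop this posting but log its Message-ID for later recovery.
        log.error("dropped posting %s due to database failure", message_id)
    return conn
```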
When using a --checkgroups file while tabulating,
valid but empty groups will be added with a posting
count of zero as needed. However, if all groups in a
sub-hierarchy were empty, the virtual '.ALL' group
for that sub-hierarchy was not created.
For example, if local.test.dummy and local.test.binary
were both empty, both groups were added with a posting
count of '0', but local.test.ALL was not.
Now we loop through all hierarchy elements using
ParseHierarchies and add empty .ALL hierarchies as
needed.
Fixes #49.
Also fix a typo in a comment. :-)
Signed-off-by: Thomas Hochstein <thh@inter.net>
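The fix above, sketched in Python with the local.test example from the message. Two things are assumptions for illustration: that parse_hierarchies() returns every level of a group name from the top down, and that counts are held in a dict keyed by group name.

```python
from typing import Dict, Iterable, List


def parse_hierarchies(group: str) -> List[str]:
    """'local.test.dummy' -> ['local', 'local.test', 'local.test.dummy'] (assumed behavior)."""
    parts = group.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def add_empty_groups(counts: Dict[str, int], valid_groups: Iterable[str]) -> Dict[str, int]:
    """Add missing valid groups with 0 postings, plus any missing parent '.ALL' entries."""
    result = dict(counts)
    for group in valid_groups:
        result.setdefault(group, 0)  # keep existing counts untouched
        # Every parent hierarchy (all levels but the group itself) gets a '.ALL' entry.
        for hierarchy in parse_hierarchies(group)[:-1]:
            result.setdefault(hierarchy + ".ALL", 0)
    return result
```

With both local.test groups empty, local.test.ALL is now added with a count of 0, while an existing local.ALL total is left alone.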
In most hierarchies, the list of valid newsgroups will
change over time, so you'll have to use another
checkgroups file for each month. gatherstats will now
understand the value of --checkgroups to be a template
and amend it with each month it is processing.
Documentation changed accordingly.
Signed-off-by: Thomas Hochstein <thh@inter.net>
TLH may now also contain literal dots ('.'),
allowing second- or third-level hierarchies
to be used as "TLH". To facilitate that,
'+' and '-' will be allowed, too.
Signed-off-by: Thomas Hochstein <thh@inter.net>
The TLH was checked to match the beginning
of the newsgroup name, not the whole TLH part.
So the TLH "de" would match not only "de.test",
but also "denver.test", which was not the
desired outcome.
Signed-off-by: Thomas Hochstein <thh@inter.net>
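The two TLH changes above (dotted TLHs, and matching on whole hierarchy components rather than a plain string prefix) can be illustrated in Python. The exact allowed character set, letters, digits, '+' and '-' separated by single dots, is an assumption based on the description, not the project's actual regular expression.

```python
import re

# Assumed TLH syntax: components of letters, digits, '+' and '-', joined by single dots.
TLH_PATTERN = re.compile(r"[A-Za-z0-9+-]+(?:\.[A-Za-z0-9+-]+)*")


def is_valid_tlh(tlh: str) -> bool:
    """Check a TLH definition such as 'de' or 'de.comp'."""
    return TLH_PATTERN.fullmatch(tlh) is not None


def group_in_tlh(group: str, tlh: str) -> bool:
    """True if the group is the TLH itself or lies below it.

    A plain prefix test would wrongly accept 'denver.test' for the TLH 'de'.
    """
    return group == tlh or group.startswith(tlh + ".")
```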
The code introduced in 17ffbebad5
did not check the correct variable for being an array.
Improve an unrelated comment, too.
Signed-off-by: Thomas Hochstein <thh@inter.net>
* Switch to Getopt::Long, change coding style;
limit line length.
* Replace 'die' and 'warn' by calls to &Bleat().
* Completely change options due to new
Getopt::Long processing.
* Adapt to changes in NewsStats.pm.
* Redo documentation.
* Update TODO.
Signed-off-by: Thomas Hochstein <thh@inter.net>
* Switch to Getopt::Long, change coding style;
limit line length.
* Completely change options due to new
Getopt::Long processing.
* Adapt to changes in NewsStats.pm.
* Redo documentation.
* Update TODO.
Signed-off-by: Thomas Hochstein <thh@inter.net>
* Switch to Getopt::Long, change coding style;
limit line length.
* Replace 'die' and 'warn' by calls to &Bleat().
* Completely change options due to new
Getopt::Long processing:
- merge -m/-p into --month
* Adapt to changes in NewsStats.pm.
* Redo documentation.
* Update TODO.
Signed-off-by: Thomas Hochstein <thh@inter.net>
* Switch to Getopt::Long, change coding style;
limit line length.
* Replace 'die' and 'warn' by calls to &Bleat().
* Completely redo options and processing:
- merge -m/-p/-a into --month
- replace -i/-q/-d with the much more powerful
--group-by/--order-by
- replace -t/-l with the much more powerful
--lower/--upper/--boundary
- remove -b and replace it with --report
Fixes #33.
* Add new report types, boundaries and sorting options:
- report types 'average' and 'sums'
- boundaries 'average' and 'sums'
- upper and/or lower boundary
- sort output independently
Issue #35.
Fixes #34, #38.
* Add possibility to cross-check newsgroups against
checkgroups file.
* Complete rewrite of groupstats.pl internal logic:
- modularize construction of SQL queries
- remove unnecessary special cases
- refactor code into NewsStats.pm functions as much
as possible
Issue #37.
Fixes #36.
* Rework output formats; fix padding problem by making use
of modularized SQL queries.
Fixes #15, #32.
* Add some more consistency checks.
Issue #12.
* Redo documentation.
* Update TODO list.
Signed-off-by: Thomas Hochstein <thh@inter.net>