Compare commits

...

27 commits
master ... pu

Author SHA1 Message Date
Thomas Hochstein a586a217c9 Merge branch 'thh-parsedb' into pu
* thh-parsedb:
  Forcibly decode headers with unencoded 8bit chars.
  Add MID to error message to make it more useful.
2021-05-29 10:21:13 +02:00
Thomas Hochstein b99d41010d Forcibly decode headers with unencoded 8bit chars.
Just assume UTF-8 for the time being.
Fixes database errors with illegal characters
when writing parsed data.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2021-05-29 10:17:00 +02:00
Thomas Hochstein 6deb7dbaa4 Add MID to error message to make it more useful.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2021-05-29 09:47:17 +02:00
Thomas Hochstein 84e9923abe Merge branch 'thh-parsedb' into pu
* thh-parsedb:
  Add some input validation.
  Add documentation to parsedb.pl.
  Handle more than one entitiy in From: etc.
  Let gatherstats read its data from DBTableParse.
  Add decoding and parsing of From: etc.
  Create a database table with parsed raw data.
  Make GetTimePeriod() and others accept days.

# Conflicts:
#	bin/gatherstats.pl
2018-01-01 16:56:56 +01:00
Thomas Hochstein 48c8d4bb8e Add some input validation.
Our raw data doesn't have the qualitiy one should
expect. There are empty header lines only containing
whitespace (leading to wrong joining of apparent
continuation lines); header lines that contain garbage
without ':' so split is failing; empty 'newsgroups'
fields; unsupported encondings in MIME encoded words
... and so on.

Add fixes for the aforementioned problems.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:25 +02:00
Thomas Hochstein 13e006104b Add documentation to parsedb.pl.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:23 +02:00
Thomas Hochstein aef5467bfe Handle more than one entitiy in From: etc.
From:, Sender: etc. may contain more than one
entity in a comma separated list, i.e. a From:
line like
"From: Me <me@example.com>, You <you@example.com>"
is perfectly valid.

Handle multiple entities when splitting those
headers and save all names and all adresses
as (new) comma separated lists in the
corresponding database fields.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:21 +02:00
Thomas Hochstein ca8ac4d50f Let gatherstats read its data from DBTableParse.
Switch gatherstat.pl over to the parsed database.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:19 +02:00
Thomas Hochstein 9630376c31 Add decoding and parsing of From: etc.
Decode From:, Sender:, Reply-To:, Subject:;
parse From:, Sender:, Reply-To:.

Add Mail::Address to prerequisites.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:53:17 +02:00
Thomas Hochstein 6d72dad2c0 Create a database table with parsed raw data.
Incoming data is written to DBTableRaw without
much interpretation. To allow for more and
better analysis that raw data should be parsed
daily and copied to another database table
with separate fields for most header lines.
All other scripts could use that pre-parsed
data.

* Add database schema to install.pl
* Add DBTableParse to newsstats.conf.sample
  and as mandatory to NewsStats.pm
* Add parsedb.pl

TODO:
- Documentation is only rudimentary.
- From:, Sender:, Reply-To: and Subject:
  are not yet parsed.
- gatherstats.pl does not yet use DbTableParse.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-08 17:27:50 +02:00
Thomas Hochstein fd0717a15c Fix displayed path in install.
install.pl will display a sample newsfeeds entry.
Adapt the path to the changes in
2ad99c20bc.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:45:33 +02:00
Thomas Hochstein b3b170c357 Update INSTALL documentation.
Configuration files now reside in etc/.

This was an oversight from commit
2ad99c20bc.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:45:31 +02:00
Thomas Hochstein 44c197097b Fix documentation relating to conffile location.
CONFIGURATION section talks about newsstats.conf being
in the same directory which is not true any more since
2ad99c20bc.

It also didn't mention the --conffile option which was
added in 23ab67a099.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:43:13 +02:00
Thomas Hochstein e39d4207a6 Fix --conffile in POD.
Change '--conffile' to 'B<--conffile>'.
The wrong format was added to documentation
in commit
23ab67a099.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:30:28 +02:00
Thomas Hochstein 24d2011f32 Bump version numbers.
All scripts - and the package - have been
restructured in commit
2ad99c20bc,
but version numbers didn't change accordingly.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:29:50 +02:00
Thomas Hochstein 2871792120 Fix forgotten dates.
Some dates were not bumped when releasing v 0.01
in 07c0b2589a.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 11:29:21 +02:00
Thomas Hochstein 22d3d70a72 Fix ea91003a99.
Commit ea91003a99
was broken and did not check for undefined
variables.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-04 10:53:34 +02:00
Thomas Hochstein 3634010808 Make GetTimePeriod() and others accept days.
GetTimePeriod() was written to take a month
('YYYY-MM') and work with that. Make it accept
not only a month, but also a day ('YYYY-MM-DD')
by adding a $TYpe modifier.

Rename LastMonth() to LastMonthDay() and rewrite
it accordingly.

Rename CheckMonth() to CheckPeriod() and rewrite
it accordingly.

As GetTimePeriod() defaults to 'month' if no
modifier is passed this change should be backwards
compatible.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 22:44:36 +02:00
Thomas Hochstein 599fefbf6a Merge branch 'thh-bug51' into next
* thh-bug51:
  One more default sorting order ("grouping").
2013-09-03 22:25:23 +02:00
Thomas Hochstein 7624accb6e Merge branch 'thh-small-changes' into next
* thh-small-changes:
  Small comment fixes.
  --sums is not compatible with --checkgroups.
2013-09-03 22:25:13 +02:00
Thomas Hochstein 8dc6823e98 Small comment fixes.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 17:12:09 +02:00
Thomas Hochstein 17ef44085f --sums is not compatible with --checkgroups.
'Virtual' .ALL groups will never be present in
a checkgroups file, and we can't use them anyway
as they would contain postings from groups that
are filtered out by --checkgroups.

Add a warning, put a note in the documentation.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 15:10:07 +02:00
Thomas Hochstein ea91003a99 One more default sorting order ("grouping").
If --group-by is not set, output will be grouped
by month by default (as long as --boundary is
not set to 'level' or 'average', where grouping
by newsgroup is default).

Now we default to 'newsgroup' if just one newsgroup
is requested by --newsgroups, but more than one
month by --month.

Both defaults can be overridden.

But forced --group-by=month for --report type
'average' or 'sum' in front so defaults are
not checked.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 14:56:17 +02:00
Thomas Hochstein 1af57a5390 Merge branch 'thh-restructure' into next
* thh-restructure:
  Make configuration file configurable.
  Fix some whitespace.
  Redo directory structure.
2013-09-03 14:55:42 +02:00
Thomas Hochstein 23ab67a099 Make configuration file configurable.
Add --conffile option to all scripts to
overrride standard config file location
etc/newsstats.conf.

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:20 +02:00
Thomas Hochstein dfc2b81c37 Fix some whitespace.
Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:18 +02:00
Thomas Hochstein 2ad99c20bc Redo directory structure.
* Move all scripts to /bin
* Move configuration to /etc
* Move NewsStats.pm to /lib
* Add new path to NewsStats.pm to all scripts
* Set $HomePath to top level directory
* Move setting of config file name to ReadConf()

Signed-off-by: Thomas Hochstein <thh@inter.net>
2013-09-03 10:01:16 +02:00
10 changed files with 721 additions and 162 deletions

2
.gitignore vendored
View file

@ -1,3 +1,3 @@
tmp/
tmp/*
newsstats.conf
etc/newsstats.conf

View file

@ -4,18 +4,19 @@
#
# This script will log headers and other data to a database
# for further analysis by parsing a feed from INN.
#
#
# It is part of the NewsStats package.
#
# Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
BEGIN {
our $VERSION = "0.01";
our $VERSION = "0.02";
use File::Basename;
push(@INC, dirname($0));
# we're in .../bin, so our module is in ../lib
push(@INC, dirname($0).'/../lib');
}
use strict;
use warnings;
@ -68,14 +69,15 @@ sub PrepareDB {
################################# Main program #################################
### read commandline options
my ($OptDebug,$OptQuiet);
my ($OptDebug,$OptQuiet,$OptConfFile);
GetOptions ('d|debug!' => \$OptDebug,
'q|test!' => \$OptQuiet,
'conffile=s' => \$OptConfFile,
'h|help' => \&ShowPOD,
'V|version' => \&ShowVersion) or exit 1;
### read configuration
my %Conf = %{ReadConfig($HomePath.'/newsstats.conf')};
my %Conf = %{ReadConfig($OptConfFile)};
### init syslog
openlog($0, 'nofatal,pid', LOG_NEWS);
@ -129,7 +131,7 @@ while (<>) {
};
};
$DBQuery->finish;
warn sprintf("-----\nDay: %s\nDate: %s\nMID: %s\nTS: %s\nToken: %s\n".
"Size: %s\nPeer: %s\nPath: %s\nNewsgroups: %s\nHeaders: %s\n",
$Day, $Date, $Mid, $Timestamp, $Token, $Size, $Peer, $Path,
@ -151,7 +153,7 @@ feedlog - log data from an INN feed to a database
=head1 SYNOPSIS
B<feedlog> [B<-Vhdq>]
B<feedlog> [B<-Vhdq>] [B<--conffile> I<filename>]
=head1 REQUIREMENTS
@ -172,7 +174,8 @@ terminating would only result in a rapid respawn.
=head2 Configuration
B<feedlog> will read its configuration from F<newsstats.conf> which
should be present in the same directory via Config::Auto.
should be present in etc/ via Config::Auto or from a configuration file
submitted by the B<--conffile> option.
See L<doc/INSTALL> for an overview of possible configuration options.
@ -197,6 +200,10 @@ find that information most probably in your B<INN> F<errlog> file.
Suppress logging to syslog.
=item B<--conffile> I<filename>
Load configuration from I<filename> instead of F<newsstats.conf>.
=back
=head1 INSTALLATION
@ -218,15 +225,15 @@ See L<doc/INSTALL> for further information.
=over 4
=item F<feedlog.pl>
=item F<bin/feedlog.pl>
The script itself.
=item F<NewsStats.pm>
=item F<lib/NewsStats.pm>
Library functions for the NewsStats package.
=item F<newsstats.conf>
=item F<etc/newsstats.conf>
Runtime configuration file.
@ -259,7 +266,7 @@ Thomas Hochstein <thh@inter.net>
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2010-2012 Thomas Hochstein <thh@inter.net>
Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

59
gatherstats.pl → bin/gatherstats.pl Executable file → Normal file
View file

@ -3,19 +3,20 @@
# gatherstats.pl
#
# This script will gather statistical information from a database
# containing headers and other information from a INN feed.
#
# containing headers and other information from an INN feed.
#
# It is part of the NewsStats package.
#
# Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
BEGIN {
our $VERSION = "0.01";
our $VERSION = "0.02";
use File::Basename;
push(@INC, dirname($0));
# we're in .../bin, so our module is in ../lib
push(@INC, dirname($0).'/../lib');
}
use strict;
use warnings;
@ -37,7 +38,7 @@ my %LegalStats;
### read commandline options
my ($OptCheckgroupsFile,$OptClientsDB,$OptDebug,$OptGroupsDB,$OptTLH,
$OptHostsDB,$OptMonth,$OptRawDB,$OptStatsType,$OptTest);
$OptHostsDB,$OptMonth,$OptParseDB,$OptStatsType,$OptTest,$OptConfFile);
GetOptions ('c|checkgroups=s' => \$OptCheckgroupsFile,
'clientsdb=s' => \$OptClientsDB,
'd|debug!' => \$OptDebug,
@ -45,18 +46,19 @@ GetOptions ('c|checkgroups=s' => \$OptCheckgroupsFile,
'hierarchy=s' => \$OptTLH,
'hostsdb=s' => \$OptHostsDB,
'm|month=s' => \$OptMonth,
'rawdb=s' => \$OptRawDB,
'parsedb=s' => \$OptParseDB,
's|stats=s' => \$OptStatsType,
't|test!' => \$OptTest,
'conffile=s' => \$OptConfFile,
'h|help' => \&ShowPOD,
'V|version' => \&ShowVersion) or exit 1;
### read configuration
my %Conf = %{ReadConfig($HomePath.'/newsstats.conf')};
my %Conf = %{ReadConfig($OptConfFile)};
### override configuration via commandline options
my %ConfOverride;
$ConfOverride{'DBTableRaw'} = $OptRawDB if $OptRawDB;
$ConfOverride{'DBTableParse'} = $OptParseDB if $OptParseDB;
$ConfOverride{'DBTableGrps'} = $OptGroupsDB if $OptGroupsDB;
$ConfOverride{'DBTableClnts'} = $OptClientsDB if $OptClientsDB;
$ConfOverride{'DBTableHosts'} = $OptHostsDB if $OptHostsDB;
@ -71,6 +73,8 @@ $OptStatsType = 'all' if !$OptStatsType;
### get time period from --month
# get verbal description of time period, drop SQL code
my ($Period) = &GetTimePeriod($OptMonth);
# bail out if --month is invalid or set to 'ALL';
# we don't support the latter
&Bleat(2,"--month option has an invalid format - please use 'YYYY-MM' or ".
"'YYYY-MM:YYYY-MM'!") if (!$Period or $Period eq 'all time');
@ -120,15 +124,15 @@ foreach my $Month (&ListMonth($Period)) {
### ----------------------------------------------
### get groups data (number of postings per group)
# get groups data from raw table for given month
# get groups data from parsed table for given month
my $DBQuery = $DBHandle->prepare(sprintf("SELECT newsgroups FROM %s.%s ".
"WHERE day LIKE ? AND NOT disregard",
$Conf{'DBDatabase'},
$Conf{'DBTableRaw'}));
$Conf{'DBTableParse'}));
$DBQuery->execute($Month.'-%')
or &Bleat(2,sprintf("Can't get groups data for %s from %s.%s: ".
"$DBI::errstr\n",$Month,
$Conf{'DBDatabase'},$Conf{'DBTableRaw'}));
$Conf{'DBDatabase'},$Conf{'DBTableParse'}));
# count postings per group
my %Postings;
@ -160,7 +164,7 @@ foreach my $Month (&ListMonth($Period)) {
}
};
};
# delete old data for that month
if (!$OptTest) {
$DBQuery = $DBHandle->do(sprintf("DELETE FROM %s.%s WHERE month = ?",
@ -202,11 +206,11 @@ __END__
=head1 NAME
gatherstats - process statistical data from a raw source
gatherstats - process statistical data from a parsed source
=head1 SYNOPSIS
B<gatherstats> [B<-Vhdt>] [B<-m> I<YYYY-MM> | I<YYYY-MM:YYYY-MM>] [B<-s> I<stats>] [B<-c> I<filename template>]] [B<--hierarchy> I<TLH>] [B<--rawdb> I<database table>] [B<-groupsdb> I<database table>] [B<--clientsdb> I<database table>] [B<--hostsdb> I<database table>]
B<gatherstats> [B<-Vhdt>] [B<-m> I<YYYY-MM> | I<YYYY-MM:YYYY-MM>] [B<-s> I<stats>] [B<-c> I<filename template>]] [B<--hierarchy> I<TLH>] [B<--parsedb> I<database table>] [B<-groupsdb> I<database table>] [B<--clientsdb> I<database table>] [B<--hostsdb> I<database table>] [B<--conffile> I<filename>]
=head1 REQUIREMENTS
@ -215,7 +219,7 @@ See L<doc/README>.
=head1 DESCRIPTION
This script will extract and process statistical information from a
database table which is fed from F<feedlog.pl> for a given time period
database table which is filled from F<parsedb.pl> for a given time period
and write its results to (an)other database table(s). Entries marked
with I<'disregard'> in the database will be ignored; currently, you
have to set this flag yourself, using your database management tools.
@ -257,12 +261,13 @@ override that default through the B<--groupsdb> option.
=head2 Configuration
B<gatherstats> will read its configuration from F<newsstats.conf>
which should be present in the same directory via Config::Auto.
which should be present in etc/ via Config::Auto or from a configuration file
submitted by the B<--conffile> option.
See L<doc/INSTALL> for an overview of possible configuration options.
You can override configuration options via the B<--hierarchy>,
B<--rawdb>, B<--groupsdb>, B<--clientsdb> and B<--hostsdb> options,
B<--parsedb>, B<--groupsdb>, B<--clientsdb> and B<--hostsdb> options,
respectively.
=head1 OPTIONS
@ -291,7 +296,7 @@ conjunction with B<--test> ... everything else seems a bit pointless.
Set processing period to a single month in YYYY-MM format or to a time
period between two month in YYYY-MM:YYYY-MM format (two month, separated
by a colon).
by a colon).
=item B<-s>, B<--stats> I<type>
@ -323,9 +328,9 @@ will be added with a count of 0 (and logged to STDERR).
Override I<TLH> from F<newsstats.conf>.
=item B<--rawdb> I<table> (raw data table)
=item B<--parsedb> I<table> (parsed data table)
Override I<DBTableRaw> from F<newsstats.conf>.
Override I<DBTableParse> from F<newsstats.conf>.
=item B<--groupsdb> I<table> (postings per group table)
@ -339,6 +344,10 @@ Override I<DBTableClnts> from F<newsstats.conf>.
Override I<DBTableHosts> from F<newsstats.conf>.
=item B<--conffile> I<filename>
Load configuration from I<filename> instead of F<newsstats.conf>.
=back
=head1 INSTALLATION
@ -368,15 +377,15 @@ checking against checkgroups-*:
=over 4
=item F<gatherstats.pl>
=item F<bin/gatherstats.pl>
The script itself.
=item F<NewsStats.pm>
=item F<lib/NewsStats.pm>
Library functions for the NewsStats package.
=item F<newsstats.conf>
=item F<etc/newsstats.conf>
Runtime configuration file.
@ -409,7 +418,7 @@ Thomas Hochstein <thh@inter.net>
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2010-2012 Thomas Hochstein <thh@inter.net>
Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

View file

@ -4,18 +4,19 @@
#
# This script will get statistical data on newgroup usage
# from a database.
#
#
# It is part of the NewsStats package.
#
# Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
BEGIN {
our $VERSION = "0.01";
our $VERSION = "0.02";
use File::Basename;
push(@INC, dirname($0));
# we're in .../bin, so our module is in ../lib
push(@INC, dirname($0).'/../lib');
}
use strict;
use warnings;
@ -31,7 +32,7 @@ Getopt::Long::config ('bundling');
### read commandline options
my ($OptBoundType,$OptCaptions,$OptCheckgroupsFile,$OptComments,
$OptFileTemplate,$OptFormat,$OptGroupBy,$OptGroupsDB,$LowBound,$OptMonth,
$OptNewsgroups,$OptOrderBy,$OptReportType,$OptSums,$UppBound);
$OptNewsgroups,$OptOrderBy,$OptReportType,$OptSums,$UppBound,$OptConfFile);
GetOptions ('b|boundary=s' => \$OptBoundType,
'c|captions!' => \$OptCaptions,
'checkgroups=s' => \$OptCheckgroupsFile,
@ -47,6 +48,7 @@ GetOptions ('b|boundary=s' => \$OptBoundType,
'r|report=s' => \$OptReportType,
's|sums!' => \$OptSums,
'u|upper=i' => \$UppBound,
'conffile=s' => \$OptConfFile,
'h|help' => \&ShowPOD,
'V|version' => \&ShowVersion) or exit 1;
# parse parameters
@ -76,12 +78,19 @@ if ($OptReportType) {
$OptReportType = 'default';
}
}
# read list of newsgroups from --checkgroups
# into a hash reference
my $ValidGroups = &ReadGroupList($OptCheckgroupsFile) if $OptCheckgroupsFile;
# honor $OptCheckgroupsFile,
# warn for $OptSums if set concurrently
my $ValidGroups;
if ($OptCheckgroupsFile) {
# read list of newsgroups from --checkgroups
# into a hash reference
$ValidGroups = &ReadGroupList($OptCheckgroupsFile);
&Bleat(1,"--sums option can't possibly work with --checkgroups option set")
if $OptSums;
}
### read configuration
my %Conf = %{ReadConfig($HomePath.'/newsstats.conf')};
my %Conf = %{ReadConfig($OptConfFile)};
### override configuration via commandline options
my %ConfOverride;
@ -124,12 +133,17 @@ if ($OptBoundType and $OptBoundType ne 'default') {
}
### get sort order and build SQL 'ORDER BY' clause
# force to 'month' for $OptReportType 'average' or 'sum'
$OptGroupBy = 'month' if ($OptReportType and $OptReportType ne 'default');
# default to 'newsgroup' for $OptBoundType 'level' or 'average'
$OptGroupBy = 'newsgroup' if (!$OptGroupBy and
$OptBoundType and $OptBoundType ne 'default');
# force to 'month' for $OptReportType 'average' or 'sum'
$OptGroupBy = 'month' if ($OptReportType and $OptReportType ne 'default');
# default to 'newsgroup' if $OptGroupBy is not set and
# just one newsgroup is requested, but more than one month
$OptGroupBy = 'newsgroup' if (!$OptGroupBy and $OptMonth and $OptMonth =~ /:/
and $OptNewsgroups and $OptNewsgroups !~ /[:*%]/);
# parse $OptGroupBy to $GroupBy, create ORDER BY clause $SQLOrderClause
# if $OptGroupBy is still not set, SQLSortOrder() will default to 'month'
my ($GroupBy,$SQLOrderClause) = SQLSortOrder($OptGroupBy, $OptOrderBy);
# $GroupBy will contain 'month' or 'newsgroup' (parsed result of $OptGroupBy)
# set it to 'month' or 'key' for OutputData()
@ -244,7 +258,7 @@ if ($OptCaptions && $OptComments) {
($OptOrderBy and $OptOrderBy =~ /posting/i) ? 'by number of postings ' : '',
($OptOrderBy and $OptOrderBy =~ /-?desc$/i) ? 'descending' : 'ascending');
}
# output data
&OutputData($OptFormat,$OptComments,$GroupBy,$Precision,
$OptCheckgroupsFile ? $ValidGroups : '',
@ -263,7 +277,7 @@ groupstats - create reports on newsgroup usage
=head1 SYNOPSIS
B<groupstats> [B<-Vhcs> B<--comments>] [B<-m> I<YYYY-MM>[:I<YYYY-MM>] | I<all>] [B<-n> I<newsgroup(s)>] [B<--checkgroups> I<checkgroups file>] [B<-r> I<report type>] [B<-l> I<lower boundary>] [B<-u> I<upper boundary>] [B<-b> I<boundary type>] [B<-g> I<group by>] [B<-o> I<order by>] [B<-f> I<output format>] [B<--filetemplate> I<filename template>] [B<--groupsdb> I<database table>]
B<groupstats> [B<-Vhcs> B<--comments>] [B<-m> I<YYYY-MM>[:I<YYYY-MM>] | I<all>] [B<-n> I<newsgroup(s)>] [B<--checkgroups> I<checkgroups file>] [B<-r> I<report type>] [B<-l> I<lower boundary>] [B<-u> I<upper boundary>] [B<-b> I<boundary type>] [B<-g> I<group by>] [B<-o> I<order by>] [B<-f> I<output format>] [B<--filetemplate> I<filename template>] [B<--groupsdb> I<database table>] [B<--conffile> I<filename>]
=head1 REQUIREMENTS
@ -328,7 +342,8 @@ Captions and comments are automatically disabled in this case.
=head2 Configuration
B<groupstats> will read its configuration from F<newsstats.conf>
which should be present in the same directory via Config::Auto.
which should be present in etc/ via Config::Auto or from a configuration file
submitted by the B<--conffile> option.
See doc/INSTALL for an overview of possible configuration options.
@ -346,7 +361,7 @@ Print out version and copyright information and exit.
Print this man page and exit.
=item B<-m>, B<--month> I<YYYY-MM[:YYYY-MM]|all>
=item B<-m>, B<--month> I<YYYY-MM[:YYYY-MM]|all>
Set processing period to a single month in YYYY-MM format or to a time
period between two month in YYYY-MM:YYYY-MM format (two month, separated
@ -373,6 +388,9 @@ example:
See the B<gatherstats> man page for details.
This option does not work together with the B<--checkgroups> option as
all "virtual" groups will not be present in the checkgroups file.
=item B<--checkgroups> I<filename>
Restrict output to those newgroups present in a file in checkgroups format
@ -382,6 +400,9 @@ line is ignored). All other newsgroups will be removed from output.
Contrary to B<gatherstats>, I<filename> is not a template, but refers to
a single file in checkgroups format.
The B<--sums> option will not work together with this option as "virtual"
groups will not be present in the checkgroups file.
=item B<-r>, B<--report> I<default|average|sums>
Choose the report type: I<default>, I<average> or I<sums>
@ -592,6 +613,10 @@ B<--nocomments> is enforced, see above.
Override I<DBTableGrps> from F<newsstats.conf>.
=item B<--conffile> I<filename>
Load configuration from I<filename> instead of F<newsstats.conf>.
=back
=head1 INSTALLATION
@ -635,15 +660,15 @@ machine-readable form (without formatting):
=over 4
=item F<groupstats.pl>
=item F<bin/groupstats.pl>
The script itself.
=item F<NewsStats.pm>
=item F<lib/NewsStats.pm>
Library functions for the NewsStats package.
=item F<newsstats.conf>
=item F<etc/newsstats.conf>
Runtime configuration file.
@ -680,7 +705,7 @@ Thomas Hochstein <thh@inter.net>
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2010-2012 Thomas Hochstein <thh@inter.net>
Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

430
bin/parsedb.pl Executable file
View file

@ -0,0 +1,430 @@
#! /usr/bin/perl
#
# parsedb.pl
#
# This script will parse a database with raw header information
# from a INN feed to a structured database.
#
# It is part of the NewsStats package.
#
# Copyright (c) 2013 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
BEGIN {
our $VERSION = "0.01";
use File::Basename;
# we're in .../bin, so our module is in ../lib
push(@INC, dirname($0).'/../lib');
}
use strict;
use warnings;
use NewsStats qw(:DEFAULT :TimePeriods :SQLHelper);
use DBI;
use Getopt::Long qw(GetOptions);
Getopt::Long::config ('bundling');
use Encode qw/decode/;
use Mail::Address;
################################# Definitions ##################################
# define header names with separate database fields
my %DBFields = ('date' => 'date',
'references' => 'refs',
'followup-to' => 'fupto',
'from' => 'from_',
'sender' => 'sender',
'reply-to' => 'replyto',
'subject' => 'subject',
'organization' => 'organization',
'lines' => 'linecount',
'approved' => 'approved',
'supersedes' => 'supersedes',
'expires' => 'expires',
'user-agent' => 'useragent',
'x-newsreader' => 'xnewsreader',
'x-mailer' => 'xmailer',
'x-no-archive' => 'xnoarchive',
'content-type' => 'contenttype',
'content-transfer-encoding' => 'contentencoding',
'cancel-lock' => 'cancellock',
'injection-info' => 'injectioninfo',
'x-trace' => 'xtrace',
'nntp-posting-host' => 'postinghost');
# define field list for database
my @DBFields = qw/day mid refs date path newsgroups fupto from_ from_parsed
from_name from_address sender sender_parsed sender_name
sender_address replyto replyto_parsed replyto_name
replyto_address subject subject_parsed organization linecount
approved supersedes expires useragent xnewsreader xmailer
xnoarchive contenttype contentencoding cancellock injectioninfo
xtrace postinghost headers disregard/;
################################# Main program #################################
### read commandline options
my ($OptDay,$OptDebug,$OptParseDB,$OptRawDB,$OptTest,$OptConfFile);
GetOptions ('d|day=s' => \$OptDay,
'debug!' => \$OptDebug,
'parsedb=s' => \$OptParseDB,
'rawdb=s' => \$OptRawDB,
't|test!' => \$OptTest,
'conffile=s' => \$OptConfFile,
'h|help' => \&ShowPOD,
'V|version' => \&ShowVersion) or exit 1;
### read configuration
my %Conf = %{ReadConfig($OptConfFile)};
### override configuration via commandline options
my %ConfOverride;
$ConfOverride{'DBTableRaw'} = $OptRawDB if $OptRawDB;
$ConfOverride{'DBTableParse'} = $OptParseDB if $OptParseDB;
&OverrideConfig(\%Conf,\%ConfOverride);
### get time period
### and set $Period for output and expression for SQL 'WHERE' clause
my ($Period,$SQLWherePeriod) = &GetTimePeriod($OptDay,'day');
# bail out if --month is invalid or "all"
&Bleat(2,"--day option has an invalid format - please use 'YYYY-MM-DD' or ".
"'YYYY-MM-DD:YYYY-MM-DD'!") if (!$Period or $Period eq 'all time');
### init database
my $DBHandle = InitDB(\%Conf,1);
### get & write data
&Bleat(1,'Test mode. Database is not updated.') if $OptTest;
# create $SQLWhereClause
my $SQLWhereClause = SQLBuildClause('where',$SQLWherePeriod,'NOT disregard');
# delete old data for current period
if (!$OptTest) {
print "----------- Deleting old data ... -----------\n" if $OptDebug;
my $DBQuery = $DBHandle->do(sprintf("DELETE FROM %s.%s %s",
$Conf{'DBDatabase'},$Conf{'DBTableParse'},
$SQLWhereClause))
or &Bleat(2,sprintf("Can't delete old parsed data for %s from %s.%s: ".
"$DBI::errstr\n",$Period,
$Conf{'DBDatabase'},$Conf{'DBTableParse'}));
};
# read from DBTableRaw
print "-------------- Reading data ... -------------\n" if $OptDebug;
my $DBQuery = $DBHandle->prepare(sprintf("SELECT id, day, mid, peer, path, ".
"newsgroups, headers, disregard ".
"FROM %s.%s %s", $Conf{'DBDatabase'},
$Conf{'DBTableRaw'}, $SQLWhereClause));
$DBQuery->execute()
or &Bleat(2,sprintf("Can't get data for %s from %s.%s: ".
"$DBI::errstr\n",$Period,
$Conf{'DBDatabase'},$Conf{'DBTableRaw'}));
# set output and database connection to UTF-8
# as we're going to write decoded header contents containing UTF-8 chars
binmode(STDOUT, ":utf8");
$DBHandle->do("SET NAMES 'utf8'");
# create a list of supported encondings
my %LegalEncodings;
foreach (Encode->encodings()) {
$LegalEncodings{$_} = 1;
}
# parse data in a loop and write it out
print "-------------- Parsing data ... -------------\n" if $OptDebug;
while (my $HeadersR = $DBQuery->fetchrow_hashref) {
my %Headers = %{$HeadersR};
# parse $Headers{'headers'} ('headers' from DBTableRaw)
# remove empty lines (that should not even exist in a header!)
$Headers{'headers'} =~ s/\n\s*\n/\n/g;
# merge continuation lines
# from Perl Cookbook, 1st German ed. 1999, pg. 91
$Headers{'headers'} =~ s/\n\s+/ /g;
# split headers in single lines
my $OtherHeaders;
for (split(/\n/,$Headers{'headers'})) {
# split header lines in header name and header content
my ($key,$value);
if ($_ =~ /:/) {
($key,$value) = split(/:/,$_,2);
$key =~ s/\s*//;
$value =~ s/^\s*(.+)\s*$/$1/;
} else {
&Bleat(1,sprintf("Illegal header line in %s.%s id %s: %s",
$Conf{'DBDatabase'}, $Conf{'DBTableRaw'},
$Headers{'id'},$_));
next;
}
# check for empty (mandatory) fields from DBTableRaw
# and set them from $Headers{'headers', if necessary
if (lc($key) =~ /^(message-id|path|newsgroups)$/) {
my $HeaderName = lc($key);
$HeaderName = 'mid' if ($HeaderName eq 'message-id');
if (!defined($Headers{$HeaderName}) or $Headers{$HeaderName} eq '') {
$Headers{$HeaderName} = $value;
&Bleat(1,sprintf("Taking missing %s from 'headers' in %s.%s id %s.",
$HeaderName, $Conf{'DBDatabase'}, $Conf{'DBTableRaw'},
$Headers{'id'}));
}
}
# save each header, separate database fields in %Headers,
# the rest in $OtherHeaders (but not Message-ID, Path, Peer
# and Newsgroups as those do already exist)
if (defined($DBFields{lc($key)})) {
$Headers{$DBFields{lc($key)}} = $value;
} else {
$OtherHeaders .= sprintf("%s: %s\n",$key,$value)
if lc($key) !~ /^(message-id|path|peer|newsgroups)$/;
}
}
# replace old (now parsed) $Headers{'headers'} with remanining $OtherHeaders
chomp($OtherHeaders);
$Headers{'headers'} = $OtherHeaders;
foreach ('from_','sender', 'replyto', 'subject') {
if ($Headers{$_}) {
my $HeaderName = $_;
$HeaderName =~ s/_$//;
# decode From: / Sender: / Reply-To: / Subject:
if ($Headers{$_} =~ /\?(B|Q)\?/) {
# check for legal encoding and decode
(my $Encoding) = $Headers{$_} =~ /\?([^?]+)\?(B|Q)\?/;
$Headers{$HeaderName.'_parsed'} = decode('MIME-Header',$Headers{$_})
if (exists($LegalEncodings{$Encoding}));
}
# forcibly modify headers with un-encoded 8bit data assuming utf-8
# TODO: try to guess correct enconding
elsif ($Headers{$_} =~ /[^\x00-\x7F]/) {
$Headers{$_} = decode('utf-8',$Headers{$_});
}
# extract name(s) and mail(s) from From: / Sender: / Reply-To:
# in parsed form, if available
if ($_ ne 'subject') {
my @Address;
# start parser on header or parsed header
# @Address will have an array of Mail::Address objects, one for
# each name/mail (you can have more than one person in From:!)
if (defined($Headers{$HeaderName.'_parsed'})) {
@Address = Mail::Address->parse($Headers{$HeaderName.'_parsed'});
} else {
@Address = Mail::Address->parse($Headers{$_});
}
# split each Mail::Address object to @Names and @Adresses
my (@Names,@Adresses);
foreach (@Address) {
# take address part in @Addresses
push (@Adresses, $_->address());
# take name part form "phrase", if there is one:
# From: My Name <addr@ess> (Comment)
# otherwise, take it from "comment":
# From: addr@ess (Comment)
# and push it in @Names
my ($Name);
$Name = $_->comment() unless $Name = $_->phrase;
$Name =~ s/^\((.+)\)$/$1/;
push (@Names, $Name);
}
# put all @Adresses and all @Names in %Headers as comma separated lists
$Headers{$HeaderName.'_address'} = join(', ',@Adresses);
$Headers{$HeaderName.'_name'} = join(', ',@Names);
}
}
}
# order output for database entry: fill @SQLBindVars
print "-------------- Next entry:\n" if $OptDebug;
my @SQLBindVars;
foreach (@DBFields) {
if (defined($Headers{$_}) and $Headers{$_} ne '') {
push (@SQLBindVars,$Headers{$_});
printf ("FOUND: %s -> %s\n",$_,$Headers{$_}) if $OptDebug;
} else {
push (@SQLBindVars,undef);
}
}
# write data to DBTableParse
if (!$OptTest) {
print "-------------- Writing data ... -------------\n" if $OptDebug;
my $DBWrite =
$DBHandle->prepare(sprintf("INSERT INTO %s.%s (%s) VALUES (%s)",
$Conf{'DBDatabase'},
$Conf{'DBTableParse'},
# get field names from @DBFields
join(', ',@DBFields),
# create a list of '?' for each DBField
join(', ',
split(/ /,'? ' x scalar(@DBFields)))
));
$DBWrite->execute(@SQLBindVars)
or &Bleat(2,sprintf("Can't write parsed data for %s to %s.%s for %s: ".
"$DBI::errstr\n",$Period,
$Conf{'DBDatabase'},$Conf{'DBTableParse'}, $Headers{'mid'}));
$DBWrite->finish;
}
};
$DBQuery->finish;
### close handles
$DBHandle->disconnect;
print "------------------- DONE! -------------------\n" if $OptDebug;
__END__
################################ Documentation #################################
=head1 NAME
parsedb - parse raw data and save it to a database
=head1 SYNOPSIS
B<parsedb> [B<-Vht>] [B<--day> I<YYYY-MM-DD> | I<YYYY-MM-DD:YYYY-MM-DD>] [B<--rawdb> I<database table>] [B<--parsedb> I<database table>] [B<--conffile> I<filename>] [B<--debug>]
=head1 REQUIREMENTS
See L<doc/README>.
=head1 DESCRIPTION
This script will parse raw, unstructured headers from a database table which is
fed from F<feedlog.pl> for a given time period and write its results to
nother database table with separate fields (columns) for most (or even all)
relevant headers.
I<Subject:>, I<From:>, I<Sender:> and I<Reply-To:> will be parsed from MIME
encoded words to UTF-8 as needed while the unparsed copy is kept. From that
parsed copy, I<From:>, I<Sender:> and I<Reply-To:> will also be split into
separate name(s) and address(es) fields while the un-splitted copy is kept,
too.
B<parsedb> should be run nightly from cron for yesterdays data so all
other scripts get current information. The time period to act on defaults to
yesterday, accordingly; you can assign another time period or a single day via
the B<--day> option (see below).
=head2 Configuration
B<parsedb> will read its configuration from F<newsstats.conf>
should be present in etc/ via Config::Auto or from a configuration file
submitted by the B<--conffile> option.
See L<doc/INSTALL> for an overview of possible configuration options.
You can override configuration options via the B<--rawdb> and
B<--parsedb> options, respectively.
=head1 OPTIONS
=over 3
=item B<-V>, B<--version>
Print out version and copyright information and exit.
=item B<-h>, B<--help>
Print this man page and exit.
=item B<--debug>
Output (rather much) debugging information to STDOUT while processing.
=item B<-t>, B<--test>
Do not write results to database. You should use B<--debug> in
conjunction with B<--test> ... everything else seems a bit pointless.
=item B<-d>, B<--day> I<YYYY-MM-DD[:YYYY-MM-DD]>
Set processing period to a single day in YYYY-MM-DD format or to a time
period between two days in YYYY-MM-DD:YYYY-MM-DD format (two days, separated
by a colon).
Defaults to yesterday.
=item B<--rawdb> I<table> (raw data table)
Override I<DBTableRaw> from F<newsstats.conf>.
=item B<--parsedb> I<table> (parsed data table)
Override I<DBTableParse> from F<newsstats.conf>.
=item B<--conffile> I<filename>
Load configuration from I<filename> instead of F<newsstats.conf>.
=back
=head1 INSTALLATION
See L<doc/INSTALL>.
=head1 EXAMPLES
An example crontab entry:
0 1 * * * /path/to/bin/parsedb.pl
Do a dry run for yesterday's data, showing results of processing:
parsedb --debug --test | less
=head1 FILES
=over 4
=item F<bin/parsedb.pl>
The script itself.
=item F<lib/NewsStats.pm>
Library functions for the NewsStats package.
=item F<etc/newsstats.conf>
Runtime configuration file.
=back
=head1 BUGS
Please report any bugs or feature requests to the author or use the
bug tracker at L<http://bugs.th-h.de/>!
=head1 SEE ALSO
=over 2
=item -
L<doc/README>
=item -
L<doc/INSTALL>
=back
This script is part of the B<NewsStats> package.
=head1 AUTHOR
Thomas Hochstein <thh@inter.net>
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2013 Thomas Hochstein <thh@inter.net>
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.
=cut

View file

@ -13,6 +13,8 @@ INSTALLATION INSTRUCTIONS
1) Install the scripts
* Get INN, mysql, Perl, and the necessary modules installed (see README).
* Download the current version of NewsStats from
<http://th-h.de/download/scripts.php>.
@ -28,21 +30,21 @@ INSTALLATION INSTRUCTIONS
* Copy the sample configuration file newsstats.conf.sample to
newsstats.conf and modify it for your purposes:
# cp newsstats.conf.sample newsstats.conf
# vim newsstats.conf
# cp etc/newsstats.conf.sample etc/newsstats.conf
# vim etc/newsstats.conf
a) Mandatory configuration options
* DBDriver = mysql
Database driver used; currently only mysql is supported.
* DBHost = localhost
The host your mysql server is running on.
* DBUser =
The username to connect to the database server.
* DBPw =
* DBPw =
Matching password for your username.
* DBDatabase = newsstats
@ -61,17 +63,17 @@ INSTALLATION INSTRUCTIONS
* TLH = de
Limit examination to that top-level hierarchy.
3) Database (mysql) setup
* Setup your database server with a username, password and
database matching the NewsStats configuration (see 2 a).
* Start the installation script:
# install/install.pl
It will setup the necessary database tables and display some
It will setup the necessary database tables and display some
information on the next steps.
4) Feed (INN) setup

View file

@ -47,6 +47,7 @@ Prerequisites
- Config::Auto
- Date::Format
- DBI
- Mail::Address
* mysql 5.0.x

View file

@ -4,13 +4,14 @@
#
DBDriver = mysql
DBHost = localhost
DBUser =
DBPw =
DBUser =
DBPw =
DBDatabase = newsstats
#
# tables
#
DBTableRaw = raw_de
DBTableParse = parsed_de
DBTableGrps = groups_de
#DBTableClnts =
#DBTableHosts =

View file

@ -3,27 +3,25 @@
# install.pl
#
# This script will create database tables as necessary.
#
#
# It is part of the NewsStats package.
#
# Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
#
# It can be redistributed and/or modified under the same terms under
# It can be redistributed and/or modified under the same terms under
# which Perl itself is published.
BEGIN {
our $VERSION = "0.01";
our $VERSION = "0.02";
use File::Basename;
# we're in .../install, so our module is in ..
push(@INC, dirname($0).'/..');
# we're in .../install, so our module is in ../lib
push(@INC, dirname($0).'/../lib');
}
use strict;
use warnings;
use NewsStats qw(:DEFAULT);
use Cwd;
use DBI;
use Getopt::Long qw(GetOptions);
Getopt::Long::config ('bundling');
@ -31,18 +29,15 @@ Getopt::Long::config ('bundling');
################################# Main program #################################
### read commandline options
my ($OptUpdate);
my ($OptUpdate,$OptConfFile);
GetOptions ('u|update=s' => \$OptUpdate,
'conffile=s' => \$OptConfFile,
'h|help' => \&ShowPOD,
'V|version' => \&ShowVersion) or exit 1;
### change working directory to .. (as we're in .../install)
chdir dirname($FullPath).'/..';
my $Path = cwd();
### read configuration
print("Reading configuration.\n");
my %Conf = %{ReadConfig($Path.'/newsstats.conf')};
my %Conf = %{ReadConfig($OptConfFile)};
##### --------------------------------------------------------------------------
##### Database table definitions
@ -52,10 +47,10 @@ my $DBCreate = <<SQLDB;
CREATE DATABASE IF NOT EXISTS `$Conf{'DBDatabase'}` DEFAULT CHARSET=utf8;
SQLDB
my %DBCreate = ('DBTableRaw' => <<RAW, 'DBTableGrps' => <<GRPS);
--
my %DBCreate = ('DBTableRaw' => <<RAW, 'DBTableParse' => <<PARSE, 'DBTableGrps' => <<GRPS);
--
-- Table structure for table DBTableRaw
--
--
CREATE TABLE IF NOT EXISTS `$Conf{'DBTableRaw'}` (
`id` bigint(20) unsigned NOT NULL auto_increment,
@ -76,9 +71,59 @@ CREATE TABLE IF NOT EXISTS `$Conf{'DBTableRaw'}` (
KEY `peer` (`peer`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Raw data';
RAW
--
--
-- Table structure for table DBTableParse
--
CREATE TABLE IF NOT EXISTS `$Conf{'DBTableParse'}` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`day` date NOT NULL,
`mid` varchar(250) character set ascii NOT NULL,
`refs` varchar(1000) character set ascii,
`date` varchar(100) NOT NULL,
`path` varchar(1000) NOT NULL,
`newsgroups` varchar(1000) NOT NULL,
`fupto` varchar(200),
`from_` varchar(500),
`from_parsed` varchar(200),
`from_name` varchar(200),
`from_address` varchar(200),
`sender` varchar(500),
`sender_parsed` varchar(200),
`sender_name` varchar(200),
`sender_address` varchar(200),
`replyto` varchar(500),
`replyto_parsed` varchar(200),
`replyto_name` varchar(200),
`replyto_address` varchar(200),
`subject` varchar(1000) NOT NULL,
`subject_parsed` varchar(1000),
`organization` varchar(1000),
`linecount` int(4) unsigned,
`approved` varchar(250),
`supersedes` varchar(250),
`expires` varchar(100),
`useragent` varchar(500),
`xnewsreader` varchar(500),
`xmailer` varchar(500),
`xnoarchive` varchar(100),
`contenttype` varchar(500),
`contentencoding` varchar(500),
`cancellock` varchar(500),
`injectioninfo` varchar(500),
`xtrace` varchar(500),
`postinghost` varchar(1000),
`headers` longtext,
`disregard` tinyint(1) default '0',
PRIMARY KEY (`id`),
KEY `day` (`day`),
KEY `mid` (`mid`),
KEY `newsgroups` (`newsgroups`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Parsed data';
PARSE
--
-- Table structure for table DBTableGrps
--
--
CREATE TABLE IF NOT EXISTS `$Conf{'DBTableGrps'}` (
`id` bigint(20) unsigned NOT NULL auto_increment,
@ -108,7 +153,7 @@ Things left to do:
## gather statistics for NewsStats
newsstats!\\
:!*,de.*\\
:Tc,WmtfbsPNH,Ac:$Path/feedlog.pl
:Tc,WmtfbsPNH,Ac:$HomePath/bin/feedlog.pl
Please
@ -167,7 +212,7 @@ if (!$OptUpdate) {
my $DBQuery = $DBHandle->prepare($DBCreate);
$DBQuery->execute() or &Bleat(2, sprintf("Can't create database %s: %s%\n",
$Conf{'DBDatabase'}, $DBI::errstr));
printf("Database table %s created succesfully.\n",$Conf{'DBDatabase'});
$DBHandle->disconnect;
};
@ -185,7 +230,7 @@ if (!$OptUpdate) {
&CreateTable($Table);
};
print "Database table generation done.\n";
# Display install instructions
print $Install;
} else {
@ -255,7 +300,7 @@ install - installation script
=head1 SYNOPSIS
B<install> [B<-Vh> [--update I<version>]
B<install> [B<-Vh> [--update I<version>] [B<--conffile> I<filename>]
=head1 REQUIREMENTS
@ -267,8 +312,9 @@ This script will create database tables as necessary and configured.
=head2 Configuration
B<install> will read its configuration from F<newsstats.conf> via
Config::Auto.
B<install> will read its configuration from F<newsstats.conf> which should
be present in etc/ via Config::Auto or from a configuration file submitted
by the B<--conffile> option.
See L<doc/INSTALL> for an overview of possible configuration options.
@ -288,21 +334,25 @@ Print this man page and exit.
Don't do a fresh install, but update from I<version>.
=item B<--conffile> I<filename>
Load configuration from I<filename> instead of F<newsstats.conf>.
=back
=head1 FILES
=over 4
=item F<install.pl>
=item F<install/install.pl>
The script itself.
=item F<NewsStats.pm>
=item F<lib/NewsStats.pm>
Library functions for the NewsStats package.
=item F<newsstats.conf>
=item F<etc/newsstats.conf>
Runtime configuration file.
@ -335,7 +385,7 @@ Thomas Hochstein <thh@inter.net>
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2010-2012 Thomas Hochstein <thh@inter.net>
Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

View file

@ -4,7 +4,7 @@
#
# Copyright (c) 2010-2013 Thomas Hochstein <thh@inter.net>
#
# This module can be redistributed and/or modified under the same terms under
# This module can be redistributed and/or modified under the same terms under
# which Perl itself is published.
package NewsStats;
@ -49,20 +49,24 @@ require Exporter;
Output => [qw(OutputData FormatOutput)],
SQLHelper => [qw(SQLHierarchies SQLSortOrder SQLGroupList
SQLSetBounds SQLBuildClause GetMaxLength)]);
$VERSION = '0.01';
our $PackageVersion = '0.01';
$VERSION = '0.02';
our $PackageVersion = '0.02';
use Data::Dumper;
use File::Basename;
use Cwd qw(realpath);
use Config::Auto;
use DBI;
#####-------------------------------- Vars --------------------------------#####
# trim the path
# save $0 in $FullPath
our $FullPath = $0;
our $HomePath = dirname($0);
# strip filename and /bin or /install directory to create the $HomePath
our $HomePath = dirname(realpath($0));
$HomePath =~ s/\/(bin|install)//;
# trim $0
$0 =~ s%.*/%%;
# set version string
our $MyVersion = "$0 $::VERSION (NewsStats.pm $VERSION)";
@ -76,7 +80,7 @@ sub ShowVersion {
################################################################################
### display version and exit
print "NewsStats v$PackageVersion\n$MyVersion\n";
print "Copyright (c) 2010-2012 Thomas Hochstein <thh\@inter.net>\n";
print "Copyright (c) 2010-2013 Thomas Hochstein <thh\@inter.net>\n";
print "This program is free software; you may redistribute it ".
"and/or modify it under the same terms as Perl itself.\n";
exit(100);
@ -99,9 +103,11 @@ sub ReadConfig {
### IN : $ConfFile: config filename
### OUT: reference to a hash containing the configuration
my ($ConfFile) = @_;
# set default
$ConfFile = $HomePath . '/etc/newsstats.conf' if !$ConfFile;
# mandatory configuration options
my @Mandatory = ('DBDriver','DBHost','DBUser','DBPw','DBDatabase',
'DBTableRaw','DBTableGrps');
'DBTableRaw','DBTableParse','DBTableGrps');
# read config via Config::Auto
my $ConfR = Config::Auto::parse($ConfFile, format => 'equal');
my %Conf = %{$ConfR};
@ -238,7 +244,7 @@ sub ReadGroupList {
### ignoring everything after the first whitespace and so accepting files
### in checkgroups format as well as (parts of) an INN active file)
### IN : $Filename : file to read
### OUT: \%ValidGroups: hash containing all valid newsgroups
### OUT: \%ValidGroups: reference to a hash containing all valid newsgroups
my ($Filename) = @_;
my %ValidGroups;
open (my $LIST,"<$Filename") or &Bleat(2,"Cannot read $Filename: $!");
@ -259,33 +265,39 @@ sub ReadGroupList {
################################################################################
sub GetTimePeriod {
################################################################################
### get a time period to act on from --month option;
### if empty, default to last month
### IN : $Month: may be empty, 'YYYY-MM', 'YYYY-MM:YYYY-MM' or 'all'
### get a time period to act on from --month / --day option;
### if empty, default to last month / day
### IN : $Period: may be empty, 'YYYY-MM(-DD)', 'YYYY-MM(-DD):YYYY-MM(-DD)'
### or 'all'
### $Type : may be 'month' or 'day'
### OUT: $Verbal,$SQL: verbal description and WHERE-clause
### of the chosen time period
my ($Month) = @_;
my ($Period,$Type) = @_;
# define result variables
my ($Verbal, $SQL);
# define a regular expression for a month
my $REMonth = '\d{4}-\d{2}';
# default to last month if option is not set
if(!$Month) {
$Month = &LastMonth;
# check $Type
$Type = 'month' if (!$Type or ($Type ne 'month' and $Type ne 'day'));
# define a regular expressions for a month or day
my $REPeriod = '\d{4}-\d{2}';
$REPeriod .= '-\d{2}' if ($Type eq 'day');
# default to last month / day if option is not set
if(!$Period) {
$Period = &LastMonthDay($Type);
}
# check for valid input
if ($Month =~ /^$REMonth$/) {
# single month (YYYY-MM)
($Month) = &CheckMonth($Month);
$Verbal = $Month;
$SQL = sprintf("month = '%s'",$Month);
} elsif ($Month =~ /^$REMonth:$REMonth$/) {
# time period (YYYY-MM:YYYY-MM)
$Verbal = sprintf('%s to %s',&SplitPeriod($Month));
$SQL = sprintf("month BETWEEN '%s' AND '%s'",&SplitPeriod($Month));
} elsif ($Month =~ /^all$/i) {
if ($Period =~ /^$REPeriod$/) {
# single month/day [YYYY-MM(-DD)]
($Period) = &CheckPeriod($Type,$Period);
$Verbal = $Period;
$SQL = sprintf("%s = '%s'",$Type,$Period);
} elsif ($Period =~ /^$REPeriod:$REPeriod$/) {
# time period [YYYY-MM(-DD):YYYY-MM(-DD)]
$Verbal = sprintf('%s to %s',&SplitPeriod($Period,$Type));
$SQL = sprintf("%s BETWEEN '%s' AND '%s'",$Type,
&SplitPeriod($Period,$Type));
} elsif ($Period =~ /^all$/i) {
# special case: ALL
$Verbal = 'all time';
$SQL = '';
@ -293,63 +305,87 @@ sub GetTimePeriod {
# invalid input
return (undef,undef);
}
return ($Verbal,$SQL);
};
################################################################################
sub LastMonth {
sub LastMonthDay {
################################################################################
### get last month from todays date in YYYY-MM format
### OUT: last month as YYYY-MM
# get today's date
my (undef,undef,undef,undef,$Month,$Year,undef,undef,undef) = localtime(time);
# $Month is already defined from 0 to 11, so no need to decrease it by 1
### get last month/day from todays date in YYYY-MM format
### IN : $Type : may be 'month' or 'day'
### OUT: last month/day as YYYY-MM(-DD)
my ($Type) = @_;
my ($Day,$Month,$Year);
if ($Type eq 'day') {
# get yesterdays's date
(undef,undef,undef,$Day,$Month,$Year,undef,undef,undef) = localtime(time-86400);
# $Month is defined from 0 to 11, so add 1
$Month++;
} else {
# get today's date (month and year)
(undef,undef,undef,undef,$Month,$Year,undef,undef,undef) = localtime(time);
# $Month is already defined from 0 to 11, so no need to decrease it by 1
if ($Month < 1) {
$Month = 12;
$Year--;
};
}
$Year += 1900;
if ($Month < 1) {
$Month = 12;
$Year--;
};
# return last month
return sprintf('%4d-%02d',$Year,$Month);
# return last month / day
if ($Type eq 'day') {
return sprintf('%4d-%02d-%02d',$Year,$Month,$Day);
} else {
return sprintf('%4d-%02d',$Year,$Month);
}
};
################################################################################
sub CheckMonth {
sub CheckPeriod {
################################################################################
### check if input (in YYYY-MM form) is valid with MM between 01 and 12;
### check if input (in YYYY-MM(-DD) form) is a valid month / day;
### otherwise, fix it
### IN : @Month: array of month
### OUT: @Month: a valid month
my (@Month) = @_;
foreach my $Month (@Month) {
my ($OldMonth) = $Month;
my ($CalMonth) = substr ($Month, -2);
if ($CalMonth < 1 or $CalMonth > 12) {
### IN : $Type : may be 'month' or 'day'
### @Period: array of month/day
### OUT: @Period: a valid month/day
my ($Type,@Period) = @_;
foreach my $Period (@Period) {
my ($OldPeriod) = $Period;
my ($CalMonth,$CalDay);
$Period .= '-01' if ($Type eq 'month');
$CalDay = substr ($Period, -2);
$CalMonth = substr ($Period, 5, 2);
if ($CalMonth < 1 or $CalMonth > 12 or $CalDay < 1 or $CalDay > 31) {
$CalMonth = '12' if $CalMonth > 12;
$CalMonth = '01' if $CalMonth < 1;
substr($Month, -2) = $CalMonth;
&Bleat(1,sprintf("'%s' is an invalid date (MM must be between '01' ".
"and '12'), set to '%s'.",$OldMonth,$Month));
substr($Period, 5, 2) = $CalMonth;
$CalDay = '01' if $CalDay < 1;
$CalDay = '31' if $CalDay > 31;
# FIXME! - month with less than 31 days ...
substr($Period, -2) = $CalDay;
&Bleat(1,sprintf("'%s' is an invalid date, set to '%s'.",
$OldPeriod,$Period));
}
$Period = substr($Period,0,7) if ($Type eq 'month');
}
return @Month;
return @Period;
};
################################################################################
sub SplitPeriod {
################################################################################
### split a time period denoted by YYYY-MM:YYYY-MM into start and end month
### split a time period denoted by YYYY-MM(-DD):YYYY-MM(-DD) into start and end
### IN : $Period: time period
### OUT: $StartMonth, $EndMonth
my ($Period) = @_;
my ($StartMonth, $EndMonth) = split /:/, $Period;
($StartMonth,$EndMonth) = CheckMonth($StartMonth,$EndMonth);
### $Type : may be 'month' or 'day'
### OUT: $StartTime, $EndTime
my ($Period,$Type) = @_;
my ($StartTime, $EndTime) = split /:/, $Period;
($StartTime,$EndTime) = CheckPeriod($Type,$StartTime,$EndTime);
# switch parameters as necessary
if ($EndMonth gt $StartMonth) {
return ($StartMonth, $EndMonth);
if ($EndTime gt $StartTime) {
return ($StartTime, $EndTime);
} else {
return ($EndMonth, $StartMonth);
return ($EndTime, $StartTime);
};
};
@ -401,7 +437,7 @@ sub OutputData {
my %ValidKeys = %{$ValidKeys} if $ValidKeys;
my ($FileName, $Handle, $OUT);
our $LastIteration;
# define output types
my %LegalOutput;
@LegalOutput{('dump','list','pretty')} = ();
@ -433,7 +469,7 @@ sub OutputData {
# safeguards for filename creation:
# replace potential problem characters with '_'
$FileName = sprintf('%s-%s',$FileTempl,$Caption);
$FileName =~ s/[^a-zA-Z0-9_-]+/_/g;
$FileName =~ s/[^a-zA-Z0-9_-]+/_/g;
open ($OUT,">$FileName")
or &Bleat(2,sprintf("Cannot open output file '%s': $!",
$FileName));
@ -770,5 +806,3 @@ sub CheckValidNewsgroups {
#####------------------------------- done ---------------------------------#####
1;