LRpt - Perl extension for comparing csv files and reporting their contents.
###################################################################
#
# Simple report
#
###################################################################
lks.pl --keys=keys.txt selects.txt > sel_subs.txt
lcsvdmp.pl --conn_file=conn_file.txt --path=data sel_subs.txt
lrptxml.pl --rldf=rldf.xml --selects=sel_subs.txt --keys_file=keys.txt
data > report.xml
###################################################################
#
# Diff report (data in csv files)
#
###################################################################
lks.pl --keys=keys.txt selects.txt > sel_subs.txt
lcsvdmp.pl --conn_file=conn_file.txt --path=data_state1 sel_subs.txt
... some actions on the database...
lcsvdmp.pl --conn_file=conn_file.txt --path=data_state2 sel_subs.txt
lcsvdiff.pl --all --keys_file=keys.txt data_state1 data_state2
> diffs.txt
lrptxml.pl --diffs --rldf=rldf.xml --selects=sel_subs.txt
--keys_file=keys.txt diffs.txt > report_aft.xml
###################################################################
#
# Diff report (data in databases)
#
###################################################################
lks.pl --keys=keys.txt selects.txt > sel_subs.txt
lcsvdiff.pl --keys_file=keys.txt --conn_file_before=conn_file_before.txt
--conn_file_after=conn_file_after.txt
--selects=sel_subs.txt > diffs.txt
###################################################################
#
# Diff report (comparing database with csv files)
#
###################################################################
lks.pl --keys=keys.txt selects.txt > sel_subs.txt
lcsvdiff.pl --keys_file=keys.txt --conn_file_before=conn_file_before.txt
--selects=sel_subs.txt after > diffs.txt
###################################################################
#
# Expected-actual diff report
#
###################################################################
lks.pl --keys=keys.txt selects.txt > sel_subs.txt
lcsvdmp.pl --conn_file=conn_file.txt --path=data sel_subs.txt
lcsveadiff.pl --keys_file=keys.txt --expectations=exp.xml
--cmp_rules=cmp_rules.xml data > eadiffs.xml
You can get the most recent code from the lreport project on SourceForge: http://lreport.sourceforge.net. You are given 2 options - you can download either an ActiveState PPM package or a CPAN package.
LRpt (LReport) is a set of tools for comparing and reporting database row sets. The core logic operates on csv files; however, LReport also provides tools for converting select results to csv files.
The first part of this document is a reference manual.
If it's your first time with LReport, I recommend that you start with a stroll through the LReport section of this document. Then have a look at the 2 additional examples.
In this and the following chapters you will find a more systematic treatment of LReport. If you haven't read the examples above, I recommend that you at least skim over them. They will help you to understand the concepts explained below. Where appropriate, you are given links to other documentation pages.
LReport consists of the following tools:
lks.pl
Tool for replacing where key placeholders in select files. You can read more about it in the man page for LRpt::KeySubst.
lcsvdmp.pl
Tool for dumping the results of a group of selects to csv files. You can read more about it in the man page for LRpt::CSVDumper.
lcsvdiff.pl
Tool for comparing 2 sets of csv files. You can read more about it in the man page for LRpt::CSVDiff.
lcsveadiff.pl
Tool for comparing a group of csv files with expectations. You can read more about it in the man page for LRpt::CSVEADiff.
lrptxml.pl
Tool for creating an xml report from results generated by lcsvdmp.pl and lcsvdiff.pl. You can read more about it in the man page for LRpt::XMLReport.
There are 2 types of possible comparisons:
We talk about before-after comparison when a set of selects is executed twice - before and after some transaction which could potentially change the data. For each select, rows returned before the change are compared with rows returned by the same select after the change. When differences are found, they are reported.
We also use this type of comparison when we compare 2 databases.
The following differences can be reported:
A row which existed before the transaction does not exist any more.
A row which did not exist before the transaction appeared after it.
A row existed before the transaction and still exists after it, but some of its fields are modified.
In order to determine a direct relationship between before and after rows (i.e. to determine which after row should be compared with which before row), a row key is used. A row key is an equivalent of a primary key in a database table. It has to be unique and may consist of many columns.
During comparison for a particular select, LReport sorts the after and before rows by the row key (after and before rows are sorted separately). Then it iterates through all keys found in either collection of rows. If, for example, the before collection has 3 rows with keys (01,AB), (02,GT), (02,JT) and the after collection has rows with keys (01,AB), (03,KP), then LReport will iterate through the following list: (01,AB), (02,GT), (02,JT), (03,KP). So it is basically a union of the two key sets.
For each value of the row key, LReport finds the after and before rows with this key value. If there is no such row in the before collection, that means that a new row was created; it is reported as an additional row. If there is no such row in the after collection, that means that a row was deleted; it is then reported as a missing row. If both the after and before rows exist, they are compared field by field. If any difference is found, the row is reported as not equal. Otherwise the rows are considered equal.
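To make the merge concrete, here is a minimal perl sketch of the algorithm described above. It assumes the before and after collections are hashes mapping row key values to row hashrefs; this is an illustration only, not LReport's actual implementation:

use strict;
use warnings;

sub compare_collections {
    my ( $before, $after ) = @_;

    # Union of all keys found in either collection, compared as strings
    my %all_keys = map { $_ => 1 } ( keys %$before, keys %$after );

    foreach my $key ( sort keys %all_keys ) {
        if ( !exists $before->{$key} ) {
            print "INS( $key )\n";                 # additional row
        }
        elsif ( !exists $after->{$key} ) {
            print "DEL( $key )\n";                 # missing row
        }
        else {                                     # compare field by field
            my ( $b_row, $a_row ) = ( $before->{$key}, $after->{$key} );
            foreach my $col ( sort keys %$b_row ) {
                next if $b_row->{$col} eq $a_row->{$col};
                print "UPD( $key ): $col: $b_row->{$col} ==#> $a_row->{$col}\n";
            }
        }
    }
}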
NOTE: This functionality is experimental.
This type of comparison may be used for test automation. Sometimes you can create a static set of expected data, which you can use for comparison with actual data. There are many cases, however, when you cannot create static files with expectations, because some column values depend on dates, the environment or circumstances which are difficult to recreate. So instead, you can implement a script/program which predicts what the output of a test should be and then generates expectations.
Those expectations may then be compared with the results of selects done on actual data. Although it may seem that there is nothing special in expected-actual comparison and that it is a slightly modified version of before-after comparison, this is not the case. There are some nuances which force the introduction of some new notions and concepts.
In general, expected-actual comparison uses the same core logic as before-after comparison. It parses the expectations and builds collections of expected rows and then compares them with collections of actual rows.
The first big difference is that not all columns are defined in expectations. It makes sense. We often deal with rows which consist of 40 columns, but we are only interested in 10 of them. It would be infeasible to force a user to define expected values for all those 40 columns. So the user can decide not to specify them.
On the other hand, it would be good to have a protection mechanism which would warn the user when a really important column has no expectation defined. This is where comparing rules come to the rescue.
Comparing rules are defined for a particular select. They define what to do if an expected value for a given column is not specified. You can make the application terminate, do nothing, warn, or use a default value. You can find details in the COMPARING RULES chapter.
Another big difference is a direct consequence of the previous one. If you are allowed not to specify all column values, it may happen that the value for a column belonging to a row key is not defined. As a result, the row key value for this row cannot be calculated, so the mechanism of before-after comparison cannot be used. The solution is provided by unkeyed rows and column matching. It is explained below.
The following differences are reported:
The row was expected to exist but doesn't.
The row exists even though it is not expected.
The row exists as expected but some fields contain values different from expected.
This type of difference is used for rows for which row keys are not defined in expectations (due to not giving a value for one or more columns belonging to a row key). Such rows are called unkeyed rows.
In case of unkeyed rows (missing values in columns which are parts of a row key), rows cannot be compared on a row by row basis, since the row key is the only information allowing LReport to determine a direct relationship between rows. Hence, the following solution was developed for such rows:
LReport tries to match each unkeyed row defined in expectations, on all its fields, with any of the actual rows. If there is no such match, the row is reported as unmatched.
The detailed rules are as follows:
All actual rows which are matched with keyed expectation rows are excluded from further comparison.
For each unkeyed expectation row, LReport tries to find an actual row which matches it on all specified fields. If such a row is found, the expectation row is considered matched. If there is no such actual row, the expectation row is reported as unmatched. In addition, the result of the comparison with the closest match is reported (see below).
Reporting of matched and unmatched rows has some additional specifics. Since the row key is not defined, it is possible to define a set of identical expectation rows. LReport catches those identical rows and counts them. When it reports unmatched rows, it groups identical rows and shows how many of them were not matched. If, for example, the expectation rows for a given select are
1, ABC, ACT
1, ABC, ACT
1, ABC, ACT
and only 2 of them have been matched with actual rows, row (1, ABC, ACT) is reported to be unmatched once.
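For illustration, the grouping and counting of identical unkeyed rows can be sketched in perl like this (hypothetical code, not LReport's internals; rows are serialized to strings and counted in a hash):

use strict;
use warnings;

my @expected = ( [ 1, 'ABC', 'ACT' ], [ 1, 'ABC', 'ACT' ], [ 1, 'ABC', 'ACT' ] );
my @actual   = ( [ 1, 'ABC', 'ACT' ], [ 1, 'ABC', 'ACT' ] );

my %pending;                                  # serialized row => expected count
$pending{ join( ',', @$_ ) }++ for @expected; # identical rows are grouped

foreach my $row (@actual) {
    my $ser = join( ',', @$row );
    $pending{$ser}-- if $pending{$ser};       # one actual row matches one expectation
}

foreach my $ser ( keys %pending ) {
    print "unmatched ($ser) x $pending{$ser}\n" if $pending{$ser} > 0;
}
# prints: unmatched (1,ABC,ACT) x 1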
You can find details on configuring and using expected-actual comparison (with a step by step example) in the manual for LRpt::CSVEADiff.
LReport does not operate on database tables but on select results. A single csv file contains the results of one select statement. If the statement is select * from table, then it is actually the whole table, but this is a specific case. In general, a csv file contains the results of one select.
In order to manage select results, choose a proper row key, report format etc., each select should be given a name. In some simple cases it is not required to give a select a name, but it is recommended to use a meaningful name whenever possible.
The select name is used for the following purposes:
Results for each select are stored in a file select_name.ext, where ext is the csv file extension.
When a row key is defined, it contains information about the names of the selects to which it applies.
A report layout contains the name of the select to which it applies.
Selects are defined in files called select files. They use the jar record format. The definition of each select ends with a line beginning with %% (possibly followed by white space).
The definition of each select consists of 2 fields: name and select. Their meaning is as follows:
The name of the select, used for csv file creation, row key choosing and so on.
The full text of the select. The text can be split across several lines; no continuation characters are needed at the ends of lines. The only restrictions you have to obey are:
* No line of the select text may start with a field name followed by a colon (like name: or select:) or with %%. Adding a single space in front of such a line is enough to get around this limitation.
Example of a select file is given below:
name: employees
select: select * from employees where empl_id = 234
%%
name: payments
select: select * from payments where empl_id = 234
%%
LReport provides some facilities for generating nicely formatted reports of select results and found differences. It does that by generating xml reports, which can then be transformed to some document format using XSLT.
The LReport package on SourceForge also contains an XSLT stylesheet for converting the xml report to an RTF document.
Reporting is a wide topic by itself, so it is not described in detail here. You can read more on reporting in LRpt::XMLReport.
The row key is more or less an equivalent of a primary key in database theory. It defines a set of columns whose values uniquely identify each row.
The row key is crucial in before-after comparison, since it determines the direct link between before and after rows.
In its code, LReport treats a row key value as a string. If a row key consists of one column, the key value for a row equals the value of the specified column in this row.
If a row key consists of several columns, then LReport creates the key value by joining values from those columns, using # as a separator. So if the values from the columns are 3, Kaluski and 20050601, then the key value will be 3#Kaluski#20050601. Using a column whose values may contain # as a row key part may lead to unpredictable effects. It would be good to be able to define what the row key value separator should be, but it is currently not implemented. The hash character # was chosen for the following reasons:
* It is low in character order, so it should not disturb string comparisons. The strings abc#1234 and abcd#1234 are compared correctly: abc#1234 is less than abcd#1234. If # were high in character order (having ascii code 123 for example), then the comparison above would consider abc#1234 to be greater than abcd#1234.
* I found it very unlikely that such a character is used in columns which are parts of unique keys.
Row key values are treated as strings, so they are compared as strings. Therefore the key value 2 is greater than 1000. In order to achieve correct numerical comparison, the column value has to be left padded with zeros: 0002 is lower than 1000.
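A minimal perl sketch of building a key value as described above - joining the key columns with # and zero-padding numeric columns to a declared length (illustration only, not the actual LRpt code):

use strict;
use warnings;

sub build_key_value {
    my ( $row, @key_cols ) = @_;  # each element of @key_cols: [ name, optional length ]
    my @parts;
    foreach my $col (@key_cols) {
        my ( $name, $len ) = @$col;
        my $value = $row->{$name};
        $value = sprintf( "%0${len}s", $value ) if defined $len;  # left pad with zeros
        push @parts, $value;
    }
    return join( '#', @parts );
}

my %row = ( customer_id => 3, last_name => 'Kaluski' );
print build_key_value( \%row, [ 'customer_id', 4 ], [ 'last_name' ] ), "\n";
# prints: 0003#Kaluski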
There are several ways of defining a row key. You can define it in a file, give it on the command line or use the default key. Currently it is recommended to use only the file method. I am still working on the most convenient and expressive way of defining row keys on the command line, so things may change. The default row key is the first column of a table/select results.
A row key file is in the jar record format. The definition of each key ends with a line beginning with %% (possibly followed by white space).
The definition of each key consists of 2 fields: select_name and key_columns. Their meaning is as follows:
Name or names (comma separated) of the selects which should use this key. This is why giving names to all used selects is important.
Comma separated list of columns which constitute the row key. Column names should be given in the order in which they appear in the key. In order to left pad a given column containing a numeric value with zeros, the field length has to be specified after the column's name, separated by a colon (no spaces in between). See the example below.
Example of a row key file:
select_name: customer
key_columns: customer_id:4
%%
select_name: service, service_history
key_columns: customer_id:4, service_type
%%
As you can see, one row key may be defined for several selects. Note that values of the customer_id column will be left padded with zeros to create a 4-character string.
It's an ordinary csv file. LReport expects the first line of each such file to contain the columns' names. The default field separator is tab. You can read more on defaults and changing them in the DEFAULTS AND ENVIRONMENT VARIABLES section.
NOTE: csv files used as input to lcsvdiff.pl have to be sorted by a key used in comparison.
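For illustration, a minimal reader of this csv flavour might look like the sketch below (assuming the default tab separator; this is not LReport's actual parser):

use strict;
use warnings;

sub read_csv {
    my ($file) = @_;
    open( my $fh, '<', $file ) or die "Cannot open $file: $!";
    my $header = <$fh>;
    chomp $header;
    my @columns = split /\t/, $header;           # first line holds the column names
    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        my %row;
        @row{@columns} = split /\t/, $line, -1;  # -1 keeps trailing empty fields
        push @rows, \%row;
    }
    close $fh;
    return ( \@columns, \@rows );
}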
lcsvdiff.pl can compare data from different sources:
Comparison based on csv files is useful when you want to observe what changes are done by a given transaction. Using a tool like lcsvdmp.pl, you can dump the state of the data after each step and then compare those states.
At the other extreme is a comparison of 2 databases. When you are interested only in the differences between 2 databases, there is no need to keep their contents in csv files (which may take a huge amount of space for big databases).
Comparing rules are used in expected-actual comparison. They determine what to do when expectations for a given column are not specified. Currently, the following possibilities are available:
The column is important, but not so important as to stop the processing. A warning is printed and the comparison goes on.
The column is so important that further comparison does not make any sense. LReport (actually lcsveadiff.pl) terminates.
If the column value is not given, a default value is used.
If the column value is not given, a default value is used and a warning is logged.
[default] The column is not important. It is not a problem if expectations are not defined. The comparison goes on.
You can read more about comparing rules in the man page for LRpt::CSVEADiff.
A select template is an additional feature provided by LReport. It's useful in cases when a user executes the same select or set of selects many times, but for different values in a where clause.
Say we would like to run the selects from the previous example, not only for customer_id = 1234 but also for customer_id = 1334 and customer_id = 3476. And maybe some others in the future.
We can use a select template here. We define the following file (let's call it sel_tmpl.txt):
name: customer
select: select * from customer where --customer_id--
%%
name: service
select: select * from service where --customer_id--
%%
Then we have to create a where key file (wkey.txt):
name: customer_id
key: customer_id = 1234
%%
When we now run lks.pl:
lks.pl --keys=wkey.txt sel_tmpl.txt
The meaning of the parameters is as follows:
--keys=wkey.txt - a file containing the text with which a where key placeholder should be replaced.
sel_tmpl.txt - a file containing the select templates.
it will print the following lines on the standard output:
name: customer
select: select * from customer where customer_id = 1234
%%
name: service
select: select * from service where customer_id = 1234
%%
lks.pl basically replaces every --entry-- placeholder in sel_tmpl.txt with the proper definition from wkey.txt. Should we want to run these selects for customer_id = 1334, we only have to modify wkey.txt. Output generated by lks.pl can be fed as input to lcsvdmp.pl (via the --select option).
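The substitution itself boils down to a simple pattern replacement. A rough perl illustration (hypothetical code, not the actual LRpt::KeySubst source):

use strict;
use warnings;

# parsed where key file: key name => replacement text
my %where_keys = ( customer_id => 'customer_id = 1234' );

sub substitute_keys {
    my ($template_text) = @_;
    $template_text =~ s/--(\w+)--/$where_keys{$1}/g;  # replace each --name-- placeholder
    return $template_text;
}

print substitute_keys("select * from customer where --customer_id--\n");
# prints: select * from customer where customer_id = 1234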
You can read more on how lks.pl works in the man page for LRpt::KeySubst.
Also have a look at SYNOPSIS.
Where keys are defined in files called where key files. They use the jar record format. The definition of each key ends with a line beginning with %% (possibly followed by white space).
The definition of each key consists of 2 fields: name and key. Their meaning is as follows:
The name of the key. It has to match a placeholder in a select template. A where key named abc will replace the placeholder --abc-- in a select template.
The text with which the key placeholder in a select template should be replaced. The text can be split across several lines; no continuation characters are needed at the ends of lines. The only restrictions you have to obey are:
* No line of the key text may start with a field name followed by a colon (like name: or key:) or with %%. Adding a single space in front of such a line is enough to get around this limitation.
The core logic of LReport operates on csv files. However, before we can operate on them, we have to create them first. The tool for creating csv files from select results is lcsvdmp.pl; its usage is explained in example 2.
A database connection file contains instructions on how to connect to the database. It is used by lcsvdmp.pl. As you know, each database driver (DBD) has a slightly different flavour. The syntax of the connection string for Sybase is slightly different from that for Oracle, which is slightly different from that for Mysql, and so on. In order to make lcsvdmp.pl flexible and able to deal with any source for which a DBD exists, I passed the responsibility of opening a connection to the user. The connection file should contain perl code which opens a connection and assigns a reference to the database handle to a variable named $dbh. lcsvdmp.pl will load the contents of the connection file and eval it.
This is an example of opening a connection to an ODBC source. As you can see, you can even include additional configuration code there:
use Win32::ODBC;
my $DBName     = "INVOICES";
my $DBServer   = "MYSERVER";
my $DBUser     = "pkaluski";
my $DBPassword = "password";
no strict;
Win32::ODBC::ConfigDSN( ODBC_CONFIG_DSN,
                        "Sybase ASE ODBC Driver",
                        "DSN=DBSybaseSource",
                        "Database=$DBName",
                        "InterfacesFileServerName=$DBServer" );
use strict;
my $error = Win32::ODBC::Error();
if( $error ){
    die $error;
}
$dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                     { RaiseError => 1, AutoCommit => 1 } );
Do not forget to assign the reference to the opened connection to the $dbh variable.
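A sketch of the load-and-eval mechanism described above could look like this (illustration only, not the actual LRpt source):

use strict;
use warnings;
use DBI;

our $dbh;    # the eval'ed connection file is expected to assign to this variable

sub connect_from_file {
    my ($conn_file) = @_;
    open( my $fh, '<', $conn_file ) or die "Cannot open $conn_file : $!";
    my $code = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    eval $code;                           # run the user's connection code
    die "Connection file failed: $@" if $@;
    die "Connection file did not set \$dbh\n" unless $dbh;
    return $dbh;
}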
Since LReport core logic operates on csv files, you can use LReport with any database for which you are able to create csv files from select results. If you use lcsvdmp.pl to do it, you need a DBI driver for the database you are working with. LReport uses the following DBI methods:
connect
prepare
execute
fetchall_arrayref
If the DBD for your database supports those methods, lcsvdmp.pl should be able to do its job.
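As an illustration, a dump loop using only those four methods could look like the sketch below (the connection string, query and file name are made up for the example):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( "DBI:CSV:f_dir=database", undef, undef,
                        { RaiseError => 1 } );
my $sth = $dbh->prepare("select * from employees");
$sth->execute();

my @columns = @{ $sth->{NAME} };         # column names, read right after execute
my $rows    = $sth->fetchall_arrayref(); # all rows as arrayrefs

open( my $out, '>', 'employees.txt' ) or die "Cannot open employees.txt: $!";
print {$out} join( "\t", @columns ), "\n";   # header line
foreach my $row (@$rows) {
    # nulls become empty strings, as explained in LIMITATIONS AND WARNINGS
    print {$out} join( "\t", map { defined $_ ? $_ : '' } @$row ), "\n";
}
close $out;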
LReport's core logic works on csv files. This gives the tool significant flexibility, since it can work with virtually any database. However, such an approach does have some consequences which you should be aware of.
csv files carry no information on the data type of each column. LReport treats all columns as text. This may give surprising results for numeric data: you would expect that 1 and 1.0 are equal, but for LReport they are different texts.
LReport will help you to detect that a field's value has changed from 1 to 2. It will also detect that a particular column is new or does not exist any more. But it will not detect that a column's datatype has changed from int to char.
LReport does not see null values, since it is problematic how to represent null values in a csv file. No value (two consecutive tabs) in a csv file is treated as an empty string. So LReport will not be able to find a difference between a field containing a null value and a field containing an empty string.
LReport has some hardcoded defaults. Each of those defaults can be overridden by an environment variable. In some cases, those defaults can be further overridden by a command line switch. So the general rule is: hardcoded default -> environment variable -> command line option.
The following parameters are used by LReport tools. The list below is given in the format: meaning: hardcoded default -> environment variable -> command line option (a short sketch of the override chain follows the list):
csv file extension: txt -> LRPT_CSV_FILE_EXT -> --ext=ext
csv field separator: tab -> LRPT_CSV_FIELD_SEPARATOR -> --sep=separator
csv files path: . (current directory) -> LRPT_CSV_FILE_PATH -> --path=path
connection file: conn_file.txt -> LRPT_CONNECTION_FILE -> --conn_file=file.txt
chunk size: 1000 -> LRPT_CHUNK_SIZE -> --chunk_size=size
global keys file: keys.txt -> LRPT_GLOBAL_KEYS_FILE -> no command line switch
diff value separator: --#> -> LRPT_DIFF_VALUE_SEPARATOR -> no command line switch
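The override chain can be illustrated with a short perl sketch (not the actual LRpt::Config code):

use strict;
use warnings;

sub get_param {
    my ( $hardcoded, $env_name, $cmd_line_value ) = @_;
    my $value = $hardcoded;                                  # 1. hardcoded default
    $value = $ENV{$env_name} if defined $ENV{$env_name};     # 2. environment variable
    $value = $cmd_line_value if defined $cmd_line_value;     # 3. command line switch
    return $value;
}

# e.g. the field separator: tab by default, then LRPT_CSV_FIELD_SEPARATOR, then --sep
my $sep = get_param( "\t", 'LRPT_CSV_FIELD_SEPARATOR', undef );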
The tool was developed and tested on Windows. I can't think of any reason why it would not work on Unix, but it was not tested. Currently it assumes that the name separator in file paths is a slash (/), so it probably won't work on VMS and other systems which do not use a slash as a name separator.
NOTE: This section is under construction.
If you are brave enough to try to modify the LReport code and the ugliness of the code did not scare you to death, read this chapter. It will provide you with some guidance on how all things are organized. I hope it will make your coding adventure less painful.
All LReport logic is encapsulated in packages. All *.pl tools are only wrappers calling one exported function. For example, lcsvdmp.pl looks as follows:
use strict;
use LRpt::CSVDumper;
dump_selects( @ARGV );
Some packages are object oriented, some are not.
LRpt provides 5 main packages to perform the expected job (see SYNOPSIS):
LRpt::CSVDumper
Dumps results of selects to csv files. Wrapped by lcsvdmp.pl.
LRpt::CSVDiff
Compares two sets of csv files. Wrapped by lcsvdiff.pl.
LRpt::CSVEADiff
Compares a set of csv files with expectations. Wrapped by lcsveadiff.pl.
LRpt::XMLReport
Converts csv and diff files to xml format. Wrapped by lrptxml.pl.
LRpt::KeySubst
Substitutes where key placeholders in select templates. Wrapped by lks.pl.
Apart from those packages there is a bunch of other classes defined. Some of them may appear pretty useful. Be careful though - I do not consider them public, so I may decide to change them in the future. If you find some of them extremely useful, let me know, so I will be more careful in my future design decisions.
LRpt library consists of the following classes:
LRpt::CollDiff
Object for comparing two collections of rows
LRpt::CollEADiff
Object for comparing collection of rows with a set of expectations
LRpt::Collection
Container for data loaded from csv files.
LRpt::CollUnkeyed
Container for unkeyed rows.
LRpt::Config
Object for managing defaults and runtime parameters.
LRpt::JarReader
Object for reading jar records files.
LRpt::RKeysRdr
Object for reading row keys definition files.
Each of these modules has its own POD, so if you would like to know more about them, you can go ahead and read it.
LReport consists of a bunch of tools. Currently regression tests do not test particular modules separately. Instead there are some end-to-end test scenarios. For each test case there is a set of expectations, and the results are compared with those expectations.
As of 10-09-2006 tests are divided into 2 groups: tests of simple reporting and tests of difference reporting. The 't' directory contains 2 directories, 1 for each group - DiffRpt and Report. Let's have a look at the DiffRpt directory. It contains the following subdirectories/files:
dbschema
dbschema1
dbschema2
dbschema3
t1
t2
t3
t4
t5
driver.pl
driver.pl is the test driver for the diff reporting tests. It contains the perl code running the test scenarios.
dbschema and dbschema1 are databases. They are based on csv files and are handled by DBD::CSV. They are used to simulate changes in the database. Instead of changing the contents of the database, selects for the before state are done on the dbschema database and selects for the after state are done on dbschema1.
t1 to t5 are directories used by test cases. Let's have a look at t1. The test case code looks like this (it is in the t\DiffRpt\driver.pl file):
CODE
At the beginning, t1 contains configuration and input files. It also contains the expected directory, which contains a directory structure with the files expected to be generated by the tested tools. Once you run the test, it will create an actual directory in t1. All output files generated by the test case will be placed in the actual directory and its subdirectories. Once the scenario is done, the contents of the actual and expected directories are compared. If they are the same, that means that the test case generated exactly what we expected. Otherwise it behaves differently, which means that we either introduced a new bug or modified the functionality. We should either remove the bug or, if the changes were intentional, modify the test case.
As of 15-01-2006 the tool has only one user, which is me. I used it in one environment. You can imagine that there are plenty of undetected bugs which will jump out once you start using it for your specific needs. I expect that regression tests may fail on Unix. This is due to end-of-line incompatibilities.
In case you find a bug, let me know.
Thanks to chunking, LReport can dump and compare huge files. Some performance tuning has already been done, but there is still room for improvement.
Currently comparing rules (for expected-actual comparison) have to be specified in command line options. I expect that for intensive use of LReport it would be convenient to have a common place where all this is defined. LReport tools should look there if nothing is given on the command line.
This is the first version of the documentation. Some parts of it are still under construction. Some links do not work. I am aware of that. My goal is to provide you with something you can play with. I will keep working on documentation refinements.
This section is a tutorial which will guide you through the functionalities of LReport. It starts with a fairly simple example, in which basic concepts are presented. Then, by increasing the examples' difficulty, more and more advanced topics are presented.
After reading this you should have a good understanding of LReport. You should be able to use it and should feel comfortable when reading documentation.
This tutorial uses several example files. You can find a tarball with them on LReport web site.
Imagine that you have to compare 2 simple csv files, before1.txt and after1.txt. They are as follows:
before1.txt:
last_name salary occupation
Bush 20000 Clerk
Kowalski 30000 Programmer
Nowak 40000 Manager
Wilson 10000 Clerk
after1.txt:
last_name salary occupation
Bush 20000 Assistant
Kos 40000 Manager
Nowak 40000 Manager
Wilson 10000 Clerk
NOTE: Right at the very beginning, let me tell you one important thing - input csv files MUST BE SORTED BY THE KEY (later you will find out what a key is). If they are not sorted, you may get weird results.
We compare csv files using lcsvdiff.pl
lcsvdiff.pl before1.txt after1.txt
We will get the following output:
SCHEMA: last_name salary occupation
UPD( Bush ): occupation: Clerk ==#> Assistant
INS( Kos )
DEL( Kowalski )
Let's have a closer look at it, line by line. It says:
SCHEMA: last_name salary occupation
The csv file contains rows with 3 columns: last_name, salary and occupation.
UPD( Bush ): occupation: Clerk ==#> Assistant
The value of the column occupation in the row identified by Bush (we will talk about identification soon) has changed from Clerk to Assistant.
INS( Kos )
The row identified by Kos (last_name column) is new (additional) in the after1.txt file.
DEL( Kowalski )
The row identified by Kowalski (last_name column) from before1.txt does not exist any more (is missing) in the after1.txt file.
Now we need to compare files which also contain employees' first names:
before2.txt:
first_name last_name salary occupation
James Bush 20000 Clerk
Adam Kowalski 30000 Programmer
Adam Nowak 40000 Manager
John Wilson 10000 Clerk
after2.txt:
first_name last_name salary occupation
James Bush 20000 Assistant
James Kos 40000 Manager
Adam Nowak 40000 Manager
John Wilson 10000 Clerk
Let's try lcsvdiff.pl again:
lcsvdiff.pl before2.txt after2.txt
This time we get the following output:
Key is not unique. Some rows from file before2.txt will be lost!!! at C:/Perl/site/lib/LRpt/Collection.pm line 351, <GEN0> line 5.
Key is not unique. Some rows from file after2.txt will be lost!!! at C:/Perl/site/lib/LRpt/Collection.pm line 351, <GEN1> line 5.
SCHEMA: first_name last_name salary occupation
UPD( James ): last_name: Bush ==#> Kos
UPD( James ): salary: 20000 ==#> 40000
UPD( James ): occupation: Clerk ==#> Manager
We got 2 warnings and then a different, weird result. What happened?
It's time to talk about keys. When you compare rows (in csv files or a database), you need to define which column or columns are used to uniquely identify each row. This is the so-called primary key. It allows you to determine which rows should be compared with each other, and which should be considered missing or additional. Otherwise you could only compare rows based on their order in both files.
For before1.txt and after1.txt it worked fine, because lcsvdiff.pl assumed the default key - the first column. Values in this column were unique, so it worked well. In the second example, the first column contains first names, which are very likely to be repeated. We have Adam in 2 rows. In this case it would be much better to use the columns last_name and first_name as a key. We tell lcsvdiff.pl about the key in the following manner:
lcsvdiff.pl --key_cols=last_name,first_name before2.txt after2.txt
Now it works fine, giving the following output:
SCHEMA: first_name last_name salary occupation
UPD( Bush#James ): occupation: Clerk ==#> Assistant
INS( Kos#James )
DEL( Kowalski#Adam )
Note the difference between the comparison of before1.txt/after1.txt and that of before2.txt/after2.txt. Previously rows were identified by Bush, Kos and Kowalski. Now they are identified by Bush#James, Kos#James and Kowalski#Adam. This is a concatenation of values from the key columns - last_name and first_name. # is the concatenation character.
IMPORTANT NOTE - remember that input files for lcsvdiff.pl have to be sorted by the key. If they are not sorted, strange results may be returned.
Let's try with the following files:
before3.txt:
empl_id first_name last_name salary occupation
1 John Wilson 10000 Clerk
9 James Bush 20000 Clerk
11 Adam Kowalski 30000 Programmer
201 Adam Nowak 40000 Manager
after3.txt:
empl_id first_name last_name salary occupation
1 John Wilson 10000 Clerk
9 James Bush 20000 Assistant
201 Adam Nowak 40000 Manager
5043 James Kos 40000 Manager
Since empl_id is a unique key, we can again allow lcsvdiff.pl to use the default key, i.e. the first column.
The output is as follows:
SCHEMA: empl_id first_name last_name salary occupation
DEL( 11 )
INS( 5043 )
DEL( 9 )
What's wrong now? Rows are now identified by empl_id, but their order in the output seems random. It would be good to have them ordered. The current order is because lcsvdiff.pl treats all keys as strings. Numerically 5043 is bigger than 9, but alphabetically it is lower. You have to tell lcsvdiff.pl that empl_id is numeric. lcsvdiff.pl will zero-pad it, so that it compares correctly.
lcsvdiff.pl --key_cols=empl_id:5 before3.txt after3.txt
SCHEMA: empl_id first_name last_name salary occupation
UPD( 00009 ): occupation: Clerk ==#> Assistant
DEL( 00011 )
INS( 05043 )
--key_cols=empl_id:5
means that the column empl_id is numeric, so it should be zero-padded to a length of 5 characters.
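A two-line perl demonstration of why the padding matters (key values are sorted and compared as strings):

print join( " ", sort "9", "5043" ), "\n";         # prints: 5043 9  (string order - wrong)
print join( " ", sort "00009", "05043" ), "\n";    # prints: 00009 05043  (as intended)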
Sometimes you don't want to compare all the values in the rows. For example, in real life, rows often contain the date and time of their last modification. Look at the file below (before4.txt):
empl_id first_name last_name salary occupation last_update
1 John Wilson 10000 Clerk 2006-01-12 10:15
9 James Bush 20000 Clerk 2006-05-22 15:35
11 Adam Kowalski 30000 Programmer 2006-03-07 12:10
201 Adam Nowak 40000 Manager 2006-11-12 14:23
Imagine that the record for John Wilson changed twice - his salary changed from 10000 to 8000, but after a month it changed back to 10000. So the employment data remained unchanged. after4.txt will look something like this:
empl_id first_name last_name salary occupation last_update
1 John Wilson 10000 Clerk 2006-04-12 10:15
9 James Bush 20000 Assistant 2006-07-27 17:45
201 Adam Nowak 40000 Manager 2006-11-12 14:23
5043 James Kos 40000 Manager 2006-11-12 10:15
But lcsvdiff.pl shows:
lcsvdiff.pl --key_cols=empl_id:5 before4.txt after4.txt
SCHEMA: empl_id first_name last_name salary occupation last_update
UPD( 00001 ): last_update: 2006-01-12 10:15 ==#> 2006-04-12 10:15
UPD( 00009 ): occupation: Clerk ==#> Assistant
UPD( 00009 ): last_update: 2006-05-22 15:35 ==#> 2006-07-27 17:45
DEL( 00011 )
INS( 05043 )
It reports that there is a difference for row 00001 (John Wilson). From a technical point of view, that's right. But from the point of view of business logic, nothing has changed. We want to skip last_update in our comparisons:
lcsvdiff.pl --key_cols=empl_id:5 --skip_cols=last_update before4.txt after4.txt
SCHEMA: empl_id first_name last_name salary occupation last_update
UPD( 00009 ): occupation: Clerk ==#> Assistant
DEL( 00011 )
INS( 05043 )
This is exactly what we expected.
The previous example was used to illustrate the basic notions related to comparing csv files in general and lcsvdiff.pl in particular.
Now we are going to focus on some aspects which may become relevant in real life usage, especially if you often compare csv files of different formats. Testing, especially, is a job in which you are really likely to do this.
Your data may be distributed between several files, so in order to get a full picture of what has changed, you may have to compare several types of csv files.
Imagine that you have 2 directories - one with data before the change and the second with data after the change. The before directory contains the following files:
employees.txt:
empl_id first_name last_name salary occupation last_update
1 John Wilson 10000 Clerk 2006-01-12 10:15
9 James Bush 20000 Clerk 2006-05-22 15:35
11 Adam Kowalski 30000 Programmer 2006-03-07 12:10
201 Adam Nowak 40000 Manager 2006-11-12 14:23
payments.txt:
empl_id payment_number amount status payment_date last_update
1 1 100.00 PD 2006-05-23 2006-05-24 15:35
1 2 5100.00 PD 2006-06-23 2006-06-27 15:35
1 3 100.05 PD 2006-07-23 2006-10-22 15:35
9 1 8930.00 PD 2006-05-21 2006-05-22 15:35
9 2 100.00 PD 2006-05-22 2006-05-23 15:35
11 1 9580.00 PD 2006-05-03 2006-05-22 15:35
11 2 39840.00 PD 2006-06-23 2006-07-22 15:35
11 3 4530.00 PD 2006-07-10 2006-08-22 15:35
11 4 100.00 PD 2006-08-29 2006-12-22 15:35
201 1 300.00 PD 2006-05-23 2006-05-25 15:35
201 2 400.00 PD 2006-11-23 2006-11-25 15:35
201 3 500.00 PD 2006-12-23 2006-12-24 15:35
The after directory contains the following files:
employees.txt:
empl_id first_name last_name salary occupation last_update
1 John Wilson 10000 Manager 2006-11-12 10:15
9 James Bush 20000 Clerk 2006-05-22 15:35
11 Adam Kowalski 30000 Programmer 2006-03-07 12:10
201 Adam Nowak 40000 Manager 2006-11-12 14:23
payments.txt:
empl_id payment_number amount status payment_date last_update
1 1 100.00 PD 2006-05-23 2006-05-24 15:35
1 2 5100.00 PD 2006-06-23 2006-06-27 15:35
1 3 100.05 PD 2006-07-23 2006-10-22 15:35
1 4 10000.05 PD 2006-11-23 2006-12-22 15:35
9 1 8930.00 PD 2006-05-21 2006-05-22 15:35
9 2 100.00 PD 2006-05-22 2006-05-23 15:35
11 1 9580.00 PD 2006-05-03 2006-05-22 15:35
11 2 39840.00 PD 2006-06-23 2006-07-22 15:35
11 3 4530.00 PD 2006-07-10 2006-08-22 15:35
11 4 100.00 PD 2006-08-29 2006-12-22 15:35
201 1 300.00 PD 2006-05-23 2006-05-25 15:35
201 2 400.00 CN 2006-11-23 2006-11-25 15:35
201 3 500.00 PD 2006-12-23 2006-12-24 15:35
You have to compare employees.txt and payments.txt from the before and after directories respectively:
lcsvdiff.pl before after
Key is not unique. Some rows from file before/payments.txt will be lost!!! at C:/Perl/site/lib/LRpt/Collection.pm line 351, <GEN0> line 13.
Key is not unique. Some rows from file after/payments.txt will be lost!!! at C:/Perl/site/lib/LRpt/Collection.pm line 351, <GEN1> line 14.
lcsvdiff before/payments.txt after/payments.txt
SCHEMA: empl_id payment_number amount status payment_date last_update
UPD( 1 ): payment_number: 3 ==#> 4
UPD( 1 ): amount: 100.05 ==#> 10000.05
UPD( 1 ): last_update: 2006-07-23 ==#> 2006-11-23
DEL( 9 )
lcsvdiff before/employees.txt after/employees.txt
SCHEMA: empl_id first_name last_name salary occupation last_update
UPD( 1 ): occupation: Clerk ==#> Manager
UPD( 1 ): last_update: 2006-01-12 10:15 ==#> 2006-11-12 10:15
DEL( 9 )
The outcome is weird. No wonder - we did not define any keys. But how do we define a key which works for both files? We can't. We have to define a different key for each file, and we can do it by using a special file, called a keys file. Below is an example (keys_file.txt):
select_name: employees
key_columns: empl_id:5
%%
select_name: payments
key_columns: empl_id:5, payment_number:5
%%
It says that for the file employees.txt the key should be empl_id (numeric, 5 digits), and for payments.txt empl_id and payment_number.
So if you now call lcsvdiff.pl:
lcsvdiff.pl --keys_file=keys_file.txt before after
it will do the following:
It will scan the after and before directories for files with the same names.
For each pair it will look into the keys file and find the entry for the file's basename (filename without extension).
It will then take all columns from the key_columns entry of the keys file and treat them as key columns - see the sketch below.
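A rough perl sketch of that scan (hypothetical code; the parsed keys file content is hardcoded for the illustration):

use strict;
use warnings;
use File::Basename;

my %keys_file = (                  # parsed keys file: basename => key columns
    employees => 'empl_id:5',
    payments  => 'empl_id:5, payment_number:5',
);

foreach my $before_file ( glob "before/*.txt" ) {
    my $base       = basename( $before_file, '.txt' );
    my $after_file = "after/$base.txt";
    next unless -e $after_file;    # only compare files present in both directories
    my $key_cols = $keys_file{$base} or die "No key defined for $base";
    print "comparing $before_file with $after_file using key: $key_cols\n";
}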
Let's see:
lcsvdiff.pl --keys_file=keys_file.txt before after
lcsvdiff before/payments.txt after/payments.txt
SCHEMA: empl_id payment_number amount status payment_date last_update
INS( 00001#00004 )
UPD( 00201#00002 ): status: PD ==#> CN
lcsvdiff before/employees.txt after/employees.txt
SCHEMA: empl_id first_name last_name salary occupation last_update
UPD( 00001 ): occupation: Clerk ==#> Manager
UPD( 00001 ): last_update: 2006-01-12 10:15 ==#> 2006-11-12 10:15
You can go even further. If there is a set of csv files you often compare, you can put a file with definitions of the most frequently used keys in one location. You can then tell lcsvdiff.pl to look there for the definition of the key, instead of specifying a key or a keys file every time. Please refer to the documentation for a description of the LRPT_GLOBAL_KEYS_FILE environment variable.
If you have such a variable defined, calling lcsvdiff.pl will look like this:
lcsvdiff.pl --global_keys before after
The --global_keys switch tells lcsvdiff.pl to look for keys in the file specified by the LRPT_GLOBAL_KEYS_FILE variable.
In many cases the data is not in csv files but in a database. How do we compare it using lcsvdiff.pl?
lcsvdiff.pl works on csv files in which the first row contains column names. Most well-known database servers provide tools for generating such files. However, some of them require some time to understand, and some additional tricks if you want to dump the results of a select rather than a whole table. lcsvdmp.pl is a simple tool which does only one thing - it runs a select (or a group of selects) and dumps the results to csv files.
So if employees and payments are actually tables, before we can use lcsvdiff.pl we have to dump the results of 2 selects to csv files:
select * from employees
select * from payments
Input file for lcsvdmp.pl will look like this:
name: employees
select: select * from employees
%%
name: payments
select: select * from payments
%%
If we run it
lcsvdmp.pl selects.txt
We will get this:
Cannot open conn_file.txt : No such file or directory at C:/Perl/site/lib/LRpt/CSVDumper.pm line 201.
Apparently it failed. Of course it did - how would it know which database to connect to? As you can see in the message, it is looking for conn_file.txt, a so-called connection file. A connection file should contain perl code which opens a connection and assigns a database handle to a variable $dbh.
Confused? No problem. Read on. Perl uses DBI to access databases. DBI is a perl module - it is a common interface to database servers. In order to do anything via DBI, you have to open a connection to a server. The details of how you specify connection parameters differ from vendor to vendor. Moreover, it is possible that new drivers will be created. So instead of trying to be smart about currently available drivers, lcsvdmp.pl gives the user a chance to write the code which opens a connection and returns it. For ODBC, a connection file will look like this:
my $DBUser     = "pkaluski";
my $DBPassword = "password";
$dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                     { RaiseError => 1, AutoCommit => 1 } );
Since the connection file is actually perl code which will be evaled, you can add some more sophisticated logic there. Below is an example of code which connects to a database via ODBC, specifying the server, database and user:
use Win32::ODBC;
my $DBName     = "INVOICES";
my $DBServer   = "MYSERVER";
my $DBUser     = "pkaluski";
my $DBPassword = "password";
no strict;
Win32::ODBC::ConfigDSN( ODBC_CONFIG_DSN,
                        "Sybase ASE ODBC Driver",
                        "DSN=BolekSybase",
                        "Database=$DBName",
                        "InterfacesFileServerName=$DBServer" );
use strict;
my $error = Win32::ODBC::Error();
if( $error ){
    die $error;
}
$dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                     { RaiseError => 1, AutoCommit => 1 } );
Remember that in this code the opened connection has to be assigned to the variable named $dbh.
If you are not familiar with perl, just go to the LReport website and get the proper connection file for well known servers.
You can tell lcsvdmp.pl which connection file to take with the --conn_file option. If you don't specify it, lcsvdmp.pl will look in the current directory for conn_file.txt. If it doesn't find it, it will look for a file whose name is given in LRPT_CONNECTION_FILE. If that doesn't work either, lcsvdmp.pl will die, as it did in our case.
In this example we will use a driver allowing us to treat csv files as a database. We do it only for explanation purposes, so that you can repeat the example yourself without installing any particular database (you need to install DBD::CSV in order to use it). The connection file in our case will look like this:
$dbh = DBI->connect( "DBI:CSV:csv_sep_char=\t;f_dir=database",
                     undef, undef,          # DBD::CSV needs no user/password
                     { RaiseError => 1 } );
NOTE: Database files are in the tutorial package in a directory database.
So let's create a connection file in the current directory and run the program again:
lcsvdmp.pl selects.txt
Have a look at what it prints:
name: employees
select: select * from employees
results: ./employees.txt
%%
name: payments
select: select * from payments
results: ./payments.txt
%%
Note that it describes to which files it has dumped the results of the selects. As you can see, it creates the files in the current directory; each file name is the select name plus the txt extension.
If you run the same set of selects again and again, each run will overwrite the results of the previous run. It would be much better to have lcsvdmp.pl create the files in a specified directory. You can do this by using the --path option.
If you do
lcsvdmp.pl --path=before selects.txt
and after some actions
lcsvdmp.pl --path=after selects.txt
in the before directory you will have the files employees.txt and payments.txt with the data in the state before the modification, and in after - the data after the modification.
Now imagine that you want to check changes not only for one employee, but for several. You are given a group of 20 employees and you have to check how their data changed after a given operation. It would be difficult to check the data for all employees retrieved in one big select. It would be easier to select data for each employee separately.
So you have to run a select for each employee. You can do it by simply opening the selects.txt file from the previous stage, changing empl_id in all where clauses, running it, changing empl_id, running it again and so on. This may be tedious for 20 employees. It would be much more convenient to give a list of empl_ids and have the proper selects run automatically.
There are several ways to do it. The LReport toolkit offers one of them. There is a tool lks.pl, which replaces a specific pattern in a file with values given in another special file. Let's see how it works.
We have to create a selects file template like this:
name: employees
select: select * from employees where --empl_id--
%%
name: payments
select: select * from payments where --empl_id--
%%
The file with the detailed value for the placeholder --empl_id-- will look like this (keys.txt):
name: empl_id
key: empl_id = 1
%%
When you run
lks.pl --keys=keys.txt selects_tmpl.txt
it will produce the following text on standard output,
name: employees
select: select * from employees where empl_id = 1
%%
name: payments
select: select * from payments where empl_id = 1
%%
which can then be fed to lcsvdmp.pl as a selects file.
It replaced all placeholders with the value empl_id = 1.
If you now want to run the same set of selects for the employee with id 201, you don't have to modify the selects file. You modify keys.txt so that it looks like this:
name: empl_id
key: empl_id = 201
%%
and then run lks.pl and lcsvdmp.pl again on the newly generated set of selects.
You now know pretty much everything about the LReport toolkit. You should be able to use it to compare various csv files. You can now look at the additional examples; they will tell you something about creating reports. For more details on defaults and tool customization you can go directly to the REFERENCE MANUAL section.
Good luck and have fun.
I created LReport to help me in my tests. When I worked as a tester of business applications, I noticed that I was wasting much of my time checking what had changed in the database and documenting my observations. Although you cannot fully replace human beings in testing, you can at least support them in the most repetitive, boring and error prone tasks.
You want to check how a given transaction affects a database.
You are not going to observe the whole database - just the results of 2 selects:
select1: select * from customer where customer_id = 1234
select2: select * from service where customer_id = 1234
Say the results of those selects before the transaction are as follows:
select1 (customer) :
---------------------------------------------
| customer_id | name | last_name | address |
---------------------------------------------
| 1234 | Jan | Nowak | Warszawa |
---------------------------------------------
select2 (service) :
---------------------------------------------------
| customer_id | service_type | price | status |
---------------------------------------------------
| 1234 | MAIL | 1.30 | ACTIVE |
---------------------------------------------------
| 1234 | VOICE | 0.34 | ACTIVE |
---------------------------------------------------
After the transaction, the same selects return the following results:
select1:
---------------------------------------------
| customer_id | name | last_name | address |
---------------------------------------------
| 1234 | Jan | Nowak | Warszawa |
---------------------------------------------
select2:
---------------------------------------------------
| customer_id | service_type | price | status |
---------------------------------------------------
| 1234 | GPRS | 2.05 | ACTIVE |
---------------------------------------------------
| 1234 | VOICE | 0.34 | DEACTIVATED |
---------------------------------------------------
You would like to see the differences between the select results before and after the transaction. After processing the data through the tool chain, you get the following diff report:
lcsvdiff.pl before/service.txt after/service.txt
INS( 1234#GPRS )
DEL( 1234#MAIL )
UPD( 1234#VOICE ): status: ACTIVE ==#> DEACTIVATED
Those entries mean the following:
INS( 1234#GPRS )
A new row has been inserted with the key 1234#GPRS.
DEL( 1234#MAIL )
The row with the key 1234#MAIL has been removed.
UPD( 1234#VOICE ): status: ACTIVE ==#> DEACTIVATED
In the row with the key 1234#VOICE (the key being customer_id and service_type), the value in the column status has changed from ACTIVE to DEACTIVATED.
As you can see, there is nothing about select1. This is fine, because nothing has changed in the customer table.
You can also generate an xml report about the current results of selects. The report will look like this:
<report>
<customer>
<statement><![CDATA[select * from customer where customer_id = 1234]]>
</statement>
<header>
<customer_id/>
<name/>
<last_name/>
<address/>
</header>
<data>
<equal>
<customer_id>1234</customer_id>
<name>Jan</name>
<last_name>Nowak</last_name>
<address>Warszawa</address>
</equal>
</data>
</customer>
<service>
<statement><![CDATA[select * from service where customer_id = 1234]]>
</statement>
<header>
<customer_id/>
<service_type/>
<price/>
<status/>
</header>
<data>
<additional>
<customer_id>1234</customer_id>
<service_type>GPRS</service_type>
<price>2.05</price>
<status>ACTIVE</status>
</additional>
<missing>
<customer_id>1234</customer_id>
<service_type>MAIL</service_type>
<price>1.30</price>
<status>ACTIVE</status>
</missing>
<different>
<customer_id>1234</customer_id>
<service_type>VOICE</service_type>
<price>0.34</price>
<status>
<old_value>ACTIVE</old_value>
<new_value>DEACTIVATED</new_value>
</status>
</different>
</data>
</service>
</report>
You can transform it to some other format using XSLT, for example to RTF. You can find an RTF document for this example at http:/xxxx.
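Any XSLT processor will do the transformation. For example, with xsltproc (the stylesheet name here is made up for illustration):
xsltproc lrpt2rtf.xsl report.xml > report.rtf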
I hope you now know what you can expect from LReport. There is more LReport can offer you, for example expected-actual comparison, but for starters this example should be enough. If this is what you are looking for, read on. In the next example, I am going to be more specific about how the LReport tool chain works.
Now I am going to show you another example. It works on the same data as the first example. However, this time I will be more specific about how you use the LReport tools to achieve the results you want.
Let's see what steps we need to perform in order to get the results presented in the previous example.
You can split the whole processing into the following steps: dump the before data to csv files, run the transaction, dump the after data to csv files, compare the two states and, optionally, generate an xml report.
Let's observe those steps one by one.
It is important to understand that LReport operates on csv files. All comparisons and reporting are done with csv files as the source of data. I will elaborate in LIMITATIONS AND WARNINGS on the potential benefits and flaws of such an approach. LReport makes the following assumptions about the format of csv files:
Result of each select is placed in a separate file
File extension is txt
Field separator is tab (tabulation)
First row in each file should be a row with columns' names
You can modify some of those defaults. You can read about it in section DEFAULTS AND ENVIRONMENT VARIABLES.
The goal of step 1 is to put the data into csv files, so that other tools from the LReport tool chain can work with them. If your database server provides you with tools for creating such files, you don't have to use the tool provided by LReport. But let's assume we will use LReport to do it for us. The tool we have to use is lcsvdmp.pl, which stands for "LReport csv dump". You call lcsvdmp.pl in the following way:
lcsvdmp.pl --conn_file=conn_file.txt --path=data_dir selects.txt
Meaning of the parameters:
--path=data_dir - the directory in which csv files will be placed. If this parameter is not given, the files will be created in the current directory.
--conn_file=conn_file.txt - this may be difficult to understand at the beginning. It is the name of the database connection file, which contains instructions on how to connect to the database. lcsvdmp.pl uses DBI to connect to the database. As you know, each database driver (DBD) has a slightly different flavour, so the syntax of the connection string for Sybase is slightly different from that for Oracle, which is slightly different from that for Mysql, and so on. In order to make lcsvdmp.pl flexible and able to deal with any source for which a DBD exists, I passed the responsibility of opening a connection to the user. The connection file should contain perl code which opens a connection and assigns a reference to the database handle to a variable named $dbh. lcsvdmp.pl will load the contents of the connection file and eval it.
This is an example of opening a connection to an ODBC source. As you can see, you can even include additional configuration code there:
use Win32::ODBC;
my $DBName     = "INVOICES";
my $DBServer   = "MYSERVER";
my $DBUser     = "pkaluski";
my $DBPassword = "password";
no strict;
Win32::ODBC::ConfigDSN( ODBC_CONFIG_DSN,
                        "Sybase ASE ODBC Driver",
                        "DSN=BolekSybase",
                        "Database=$DBName",
                        "InterfacesFileServerName=$DBServer" );
use strict;
my $error = Win32::ODBC::Error();
if( $error ){
    die $error;
}
$dbh = DBI->connect( "DBI:ODBC:DBSybaseSource", $DBUser, $DBPassword,
                     { RaiseError => 1, AutoCommit => 1 } );
selects.txt - points to a file containing the selects to be executed and dumped to csv files. For our example it has the following format:
name: customer
select: select * from customer where customer_id = 1234
%%
name: service
select: select * from service where customer_id = 1234
%%
Each select should have a name. For each select, LReport will create a file whose name is the select's name plus an extension. So the results of the select named 'customer' will go to the file customer.txt.
In our example we run the tool in the following way:
lcsvdmp.pl --conn_file=conn_file.txt --path=before selects.txt
Note that we have given the --path option the value before. It makes lcsvdmp.pl put all csv files in the before directory, which should exist before we run the tool.
Once we run it, in the before directory you should have 2 files - customer.txt and service.txt - with the results of the corresponding selects.
The generated files will look like this:
customer.txt:
customer_id name last_name address
1234 Jan Nowak Warszawa
service.txt:
customer_id service_type price status
1234 VOICE 0.34 ACTIVE
1234 MAIL 1.30 ACTIVE
Fields are separated by tab characters.
It is time for step 2 - running the transaction. Just do it. Once you are done, you have to see what has changed.
To dump the after state we use lcsvdmp.pl again. We will run it as follows:
lcsvdmp.pl --conn_file=conn_file.txt --path=after selects.txt
As you can see, the only difference is that we specify a different destination directory for the csv files. If we used before, the new csvs would simply overwrite the previous ones and we would lose the information about the state before the transaction.
At this stage, we have information about database states before and after the transaction, stored in separate directories. It's time to compare them.
It's time for the next tool - lcsvdiff.pl.
The command line syntax is as follows
lcsvdiff.pl --keys_file=keys.txt $before_dir $after_dir
Meaning of the parameters:
--keys_file=keys.txt - the name of the file which contains definitions of row keys for each csv file. Some explanation is needed here. In order to be able to compare 2 sets of rows, you have to define a kind of primary key, which will be the direct link between the after and before rows being compared. You can read more about row keys in BEFORE-AFTER COMPARISON and ROW KEYS. The file for our example will look like this:
select_name: customer
key_columns: customer_id:4
%%
select_name: service
key_columns: customer_id:4, service_type
%%
For the customer select, the key is the field customer_id. Note the digit in the definition. It describes the length of the field, and you should always specify it for numeric fields. Why? As you know, lcsvdiff.pl operates on csv files. Those files, by their nature, carry no information about field types, so all columns are treated as text. This does not have severe consequences, but there is one effect which is good to be aware of: 2 is less than 1000 for numeric types, but greater than 1000 in text comparison. When you specify the field length, lcsvdiff.pl will left pad the value with zeros, so the comparison will be 0002 with 1000, which gives the correct result.
$before_dir - the directory containing csv files with data before the transaction.
$after_dir - the directory containing csv files with data after the transaction.
lcsvdiff.pl will work as follows:
For the customer select, the key is customer_id. In our case there is only one row, and it is identical before and after, so nothing is reported for it.
In case of service, there is a composite key, consisting of the columns customer_id and service_type.
The row with the key 1234#GPRS exists only in the after data, so lcsvdiff.pl concludes that this row was inserted by the transaction and generates an INS entry:
INS( 1234#GPRS )
The row with the key 1234#MAIL exists only in the before data, so it concludes that this row was removed by the transaction and generates a DEL entry:
DEL( 1234#MAIL )
For the row with the key 1234#VOICE, it will find out that the value of the column status is different, so it generates an UPD entry like the one below:
UPD( 1234#VOICE ): status: ACTIVE ==#> DEACTIVATED
The tool prints all the output on the standard output.
If you were only interested in finding the differences, you are done at this point. If you would like to generate a more readable, nicely formatted report, you go to step 5.
We have to use the lrptxml.pl tool. But I am not going to describe it here, to avoid bogging you down with too many details. You can read about reporting in the manual page for lrptxml.pl.
The project is maintained on SourceForge (http://lreport.sourceforge.net). You can find links to the documentation there.
Piotr Kaluski, <pkaluski@piotrkaluski.com>
Copyright 2004-2006 by Piotr Kaluski
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.