Useful tricks for data munging and gnuplot presentation

1.) Data processing (munging)

The numerical results produced by our own code (or someone else's, for that matter) are often not in the form we need. To avoid painful hours of copy-and-paste sessions, I have compiled an overview of a few useful tools.

Note: For the rest of this page we will assume a tabulated (multiple columns) format of the data files.

Simple merging

cat file1 >> file2

The contents of file1 are appended to the end of file2. Note that file2 is modified in place.

sort -n file1 file2 ...

All files are merged and sorted by the numeric value of the first entry (column). The result is written to standard output.

paste file1 file2 ...

All files are merged line by line, so that their columns are placed next to one another (side by side).
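The three commands above can be tried on throwaway data. The following is a minimal sketch; the file contents are made up for illustration, and the temporary directory is only there so that no real files are touched.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from any real run.
export LC_ALL=C                      # keep sort behaviour independent of the locale
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

printf '3 0.9\n1 0.1\n' > file1
printf '2 0.4\n10 1.7\n' > file2

# Numeric sort on the leading number: 10 ends up after 3.
sorted=$(sort -n file1 file2)
echo "$sorted"

# Side-by-side merge: line N of file1 is joined to line N of file2 by a tab.
pasted=$(paste file1 file2)
echo "$pasted"

# Append file1 to the end of file2 (this changes file2).
cat file1 >> file2
appended=$(cat file2)
echo "$appended"
```

Without -n, sort compares the lines as text, so the line starting with 10 would come before the one starting with 2.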

More involved manipulations

sed

Sed is a very powerful stream editor which allows us to manipulate data in various ways. A detailed description of all features would exceed the scope of this writeup. I find myself using the following features more than any other:

sed 's/search term/replace term/g' in-file > out-file

Replaces every instance of search term with replace term and writes the result to out-file. The input file itself is left unchanged.

sed '/^[ \t]*$/d' in-file > out-file

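This deletes every line that is completely empty (^$) or that contains nothing but spaces and tabs (^[ \t]*$). Note that \t is understood as a tab by GNU sed; for other sed implementations the POSIX character class [[:space:]] is the portable choice.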
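Both sed features can be combined in a single call. The sketch below assumes a made-up input file that uses commas as decimal separators and contains blank lines; it uses [[:space:]] instead of \t so that it does not depend on GNU sed.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical input: comma decimals, one empty line, one line holding only spaces.
printf '1 0,5\n\n   \n2 1,5\n' > "$tmp/in-file"

# First expression: replace every comma with a dot.
# Second expression: delete lines that are empty or whitespace-only.
cleaned=$(sed -e 's/,/./g' -e '/^[[:space:]]*$/d' "$tmp/in-file")
echo "$cleaned"
```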

awk

Although I mainly use Perl for more involved operations, awk can be quite useful for simple logic/search operations. Once again, here is one very useful 'mini-script':

awk ' { if ( $1==$2 ) print $0; }' in-file > out-file

If the value in column 1 is equal to the value in column 2 ($1==$2 is TRUE), the entire line ($0) is printed. To print a specific column instead, replace $0 with $i, where i is the column number.
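As a small illustration, the same test can be written in awk's shorter pattern form, where a bare condition prints the whole line. This is a sketch on made-up data; the three-column layout is an assumption.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical three-column input.
printf '1 1 0.2\n1 2 0.3\n2 2 0.5\n' > "$tmp/in-file"

# A pattern without an action prints the whole line when the pattern is true.
matches=$(awk '$1 == $2' "$tmp/in-file")
echo "$matches"

# Same condition, but only the third column of each matching line is printed.
third=$(awk '$1 == $2 { print $3 }' "$tmp/in-file")
echo "$third"
```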

2.) Advanced gnuplot options

Data set operations

Similarly to awk we can also specify logic operations in gnuplot:

gnuplot> plot 'filename' u ($1==$3 && $4==0.0346 ? $3 : 1/0):($2)

The data to be plotted is taken from filename, with the x value taken from column 3 ($3) and the y value from column 2 ($2). The logic instruction in the x value specifies that only those data points are used which have equal values in columns 1 and 3 AND a value of 0.0346 in column 4. For every other point the x expression evaluates to 1/0, which is undefined in gnuplot; points with an undefined coordinate are silently skipped.
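Before plotting, it can help to check on the command line which points such a condition would keep. The following sketch reproduces the same selection with awk on a made-up four-column file; it is a preview aid, not a replacement for the gnuplot expression.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical data: columns 1 to 4 as used in the plot command above.
printf '1 10 1 0.0346\n2 20 3 0.0346\n4 40 4 0.0500\n5 50 5 0.0346\n' > "$tmp/filename"

# Keep rows where column 1 equals column 3 and column 4 is 0.0346,
# then print the x value (column 3) and the y value (column 2).
selected=$(awk '$1 == $3 && $4 == 0.0346 { print $3, $2 }' "$tmp/filename")
echo "$selected"
```

Only the first and last rows pass both tests, so these are the two points gnuplot would draw.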

One very useful feature is that all the data parsing techniques mentioned in the first section, along with any other command line tools, can be used directly by gnuplot: if the file name starts with "<", the rest of the string is run as a shell command and its output is read as the data. One example is the combination of two files using paste:

gnuplot> plot "< paste file1 file2" u 1:2 w l

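This example plots column 2 against column 1 of the merged output, drawn with lines. Column numbers refer to the pasted output, so the columns of file2 are numbered after those of file1. This method allows one to involve more than one file in the data manipulation.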
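To find out which column number a value from the second file ends up with, the pasted output can be inspected first. This is a sketch with made-up two-column files; with such input, the second column of file2 becomes column 4, which would correspond to u 1:4 in gnuplot.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical inputs with two columns each.
printf '1 0.1\n2 0.4\n' > "$tmp/file1"
printf '1 7.0\n2 8.0\n' > "$tmp/file2"

# Number of whitespace-separated columns in the merged output.
ncols=$(paste "$tmp/file1" "$tmp/file2" | awk 'NR == 1 { print NF }')
echo "$ncols"

# Column 4 of the merged output is column 2 of file2.
col4=$(paste "$tmp/file1" "$tmp/file2" | awk '{ print $4 }')
echo "$col4"
```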

While on the topic of command line manipulation, it is useful to know that any shell command can be executed from within gnuplot if it is preceded by a "!". For example, the following prints the current working directory while in gnuplot:

gnuplot> !pwd
