GDS Software Free Python Scripts

Introduction

This document describes various python utilities and modules I've written. They are all released under the GNU General Public License (see the end of this page).

You can download a zipfile of all these files from here.

If you'd like to report problems with any of these, I'd appreciate it if you sent the feedback to support@gdssw.com.

banner.py Produce output like the UNIX banner(1) program. Here's the output for the string "Banner" using the character 'l':
     lll                                            
ll
ll lllll ll lll ll lll lllll ll lll
lllll l lll ll lll ll ll l lll ll
ll ll llllll ll ll ll ll lllllll ll
ll ll l ll ll ll ll ll ll ll
llllll lllll l ll ll ll ll lllll llll
bitfield.py Implements a class that allows you to create arbitrarily long bitfields. The bits are numbered from 0 to N-1, where N is the size of the bitfield.

Why you'd want to use it: this class is useful if you want to keep track of a single bit of information for a large set of things. For example, suppose you had a large list of people and the people are numbered from 0 to N-1. If you wanted to keep track of who has paid their dues, you could keep the information in a bitfield of N bits.

The methods are:

    is_set()            Return 1 if specified bit is set
    is_clear()          Return 1 if specified bit is clear
    set_bit()           Set a specified bit
    clear_bit()         Clear a specified bit
    set_bit_range()     Set a range of bits to one
    clear_bit_range()   Set a range of bits to zero
    set_to_zeros()      Set all the bits to zero
    set_to_ones()       Set all the bits to one
    num_bytes_used()    How many bytes the bitfield representation takes
The implementation uses a list of strings for the bitfields. You can choose the size of the strings in the list.

Here are some creation times for a 166 MHz Pentium with 32 MB of RAM running Windows NT 4.0:

    size = 10^6 bits, creation time = 0.05 sec
    size = 10^7 bits, creation time = 0.5 sec
    size = 10^8 bits, creation time = 7.2 sec
    size = 10^9 bits, creation time = 327 sec
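
To make the dues example concrete, here is a minimal sketch of a bitfield. It is not the code in bitfield.py: it assumes a bytearray for storage instead of a list of strings, and it covers only some of the methods listed above. The class name BitField is hypothetical.

```python
class BitField:
    """Fixed-size bitfield sketch; bits are numbered 0 to N-1."""

    def __init__(self, num_bits):
        if num_bits <= 0:
            raise ValueError("num_bits must be positive")
        self.num_bits = num_bits
        # One byte holds eight bits; round up to a whole number of bytes.
        self.data = bytearray((num_bits + 7) // 8)

    def _locate(self, bit):
        # Return the byte index and the mask that selects the bit in that byte.
        if not 0 <= bit < self.num_bits:
            raise IndexError("bit number out of range")
        return bit >> 3, 1 << (bit & 7)

    def set_bit(self, bit):
        index, mask = self._locate(bit)
        self.data[index] |= mask

    def clear_bit(self, bit):
        index, mask = self._locate(bit)
        self.data[index] &= ~mask & 0xFF

    def is_set(self, bit):
        index, mask = self._locate(bit)
        return 1 if self.data[index] & mask else 0

    def is_clear(self, bit):
        return 1 - self.is_set(bit)

    def num_bytes_used(self):
        return len(self.data)


# Track who has paid their dues among 1000 people numbered 0 to 999.
paid = BitField(1000)
paid.set_bit(42)
```

With this layout, 1000 people need 125 bytes of storage.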
bsearch.py Implements an object that will perform binary searches for a key. You can initialize it with either a sorted list (or a sorted tuple) or a file object. For a file object, you must give the record size and the number of records in the file. You can also specify an initial offset in the file to where the first record occurs. You also need to provide a compare function that returns -1 for <, 0 for ==, and 1 for >.
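
The compare-function convention described above can be sketched as follows. This is an illustration for the sorted-list case only, not the bsearch.py object; the names compare and binary_search are hypothetical.

```python
def compare(a, b):
    """Return -1 if a < b, 0 if a == b, and 1 if a > b."""
    return (a > b) - (a < b)


def binary_search(records, key, cmp=compare):
    """Return the index of key in the sorted sequence records, or None."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        result = cmp(records[mid], key)
        if result == 0:
            return mid
        if result < 0:
            low = mid + 1      # records[mid] < key, so look in the upper half
        else:
            high = mid - 1     # records[mid] > key, so look in the lower half
    return None


position = binary_search(("ant", "bee", "cat", "dog"), "cat")
```

For a file of fixed-size records, the same loop would apply with records[mid] replaced by a seek to offset + mid * record_size followed by a read of one record.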
cal.py Prints a 3 month calendar from the current month. If you give it a parameter, it adds that many months to the current month. For example, typing python cal.py 3 gives a 3 month calendar starting 3 months in the future.

I wrote this utility because I like to see out three months when planning.

Here's some typical output:

    Now = Tue Aug 20 16:49:08 2002

        August 2002            September 2002           October 2002
    Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa
                 1  2  3     1  2  3  4  5  6  7           1  2  3  4  5
     4  5  6  7  8  9 10     8  9 10 11 12 13 14     6  7  8  9 10 11 12
    11 12 13 14 15 16 17    15 16 17 18 19 20 21    13 14 15 16 17 18 19
    18 19 20 21 22 23 24    22 23 24 25 26 27 28    20 21 22 23 24 25 26
    25 26 27 28 29 30 31    29 30                   27 28 29 30 31
checksum.py Script to calculate an MD5 checksum for a group of files. It can recursively descend subdirectories and the output can be sorted by any of the three output fields. Here's some sample output:
    232DB58C75BE6C8FE761624572EB1813 5108      banner.py
    A68C0E70B9EAC8BA083F71B7083DE49A 4631      checksum.py
    EB4C99C658EB16376C1EFE1D742E8CE0 14762     dbase3.py
The first column is the MD5 hash of the file, the second is the size of the file in bytes, and the third is the file name. Similar to cksum(1).

Why you'd want to use it: This is a handy tool to get a quick description of a set of files. Later, the same description could be generated and compared to the first description. Any differences would tell you about files that have changed or been added/deleted.
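
A minimal sketch of how such a listing can be produced with Python's hashlib module. The function names md5_summary and summarize are hypothetical, and recursive descent is left out.

```python
import hashlib
import os


def md5_summary(path, chunk_size=65536):
    """Return (uppercase MD5 hex digest, size in bytes, path) for one file."""
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper(), os.path.getsize(path), path


def summarize(paths, sort_field=2):
    """Return one row per file, sorted by field 0 (hash), 1 (size), or 2 (name)."""
    return sorted((md5_summary(p) for p in paths), key=lambda row: row[sort_field])
```

Saving the rows to a file and comparing them with a later run shows which files changed, appeared, or disappeared.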

cmddecod.py Class to identify commands for a command interpreter. You pass the class a dictionary of the commands you want to allow (the commands are the dictionary's keys). Then you get commands from your user and send them to the class method identify_cmd(). You'll get back None if the command isn't recognized, a string (which is the command string) if the command was uniquely identified, or a list of all the commands that begin with the string if more than one matches.

For example, suppose your dictionary is

    cmds = { 'cd' : '',
             'cp' : '',
             'mv' : ''
           }
Then, if you pass the string 'c' to the identify_cmd() function, you'll get back the list
       [ 'cd', 'cp' ]
because the user could have meant either 'cd' or 'cp'. If you send in the string 'm', you'll get back the single string 'mv', since there is only one string that could match 'm'.

I wrote this class before I knew about the cmd module included with python.

Why you'd want to use it: if you'd like to write a console program that recognizes its commands from partial user input, then this class could be useful to you.
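
The matching rule can be sketched as a plain function. This is an assumption about the behavior, not the class in cmddecod.py. In particular, letting an exact match win over longer commands that share the same prefix is a choice made here, not something the description states.

```python
def identify_cmd(commands, text):
    """Return None, a single command string, or a sorted list of candidates."""
    if text in commands:
        return text            # assumption: an exact match is never ambiguous
    matches = sorted(cmd for cmd in commands if cmd.startswith(text))
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    return matches


cmds = {'cd': '', 'cp': '', 'mv': ''}
ambiguous = identify_cmd(cmds, 'c')
unique = identify_cmd(cmds, 'm')
```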

comb_perm.py Utility routines to return one combination or permutation of a set with each function call. Not thread safe.

Why you'd want to use it: you can do exhaustive searches without having to build a structure in memory or in a file that contains all the combinations or permutations.

I used these routines to find a number of distinct solutions of the "Einstein fish puzzle" that's popular on the web.
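
The same one-at-a-time style of exhaustive search is available today through the standard itertools module, which yields each combination or permutation lazily. The sketch below uses itertools rather than comb_perm.py, and first_match is a hypothetical helper.

```python
from itertools import combinations, permutations


def first_match(candidates, is_solution):
    """Return the first candidate that satisfies is_solution, or None."""
    for candidate in candidates:
        if is_solution(candidate):
            return candidate
    return None


# Search pairs drawn from 1..9 without building the full list of pairs.
pair = first_match(combinations(range(1, 10), 2), lambda c: sum(c) == 17)

# Search orderings of three letters for one that starts with "c".
order = first_match(permutations("abc"), lambda p: p[0] == "c")
```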

dbase3.py Class to read a dBase III file format. There's also a function to print out a string-delimited text form of the database.

Why you'd want to use it: if you need to access a dBase compatible file using python, this class may be useful. Personally, if I had to deal with a database in this format, I'd probably convert it to another form (e.g., dbm) if I needed to use it with python.

ebcdic.py Convert between ASCII and EBCDIC. The functions are:
    EbcdicToAscii(str)
    AsciiToEbcdic(str)
Why you'd want to use it: these functions can be useful to convert data from an IBM mainframe computer to a format usable on most other computers.
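
Current Python versions can do this kind of conversion with the built-in codecs. The sketch below assumes the cp037 code page (EBCDIC US/Canada); the mainframe data you have may use a different EBCDIC variant, and ebcdic.py itself may use a different table. The function names are hypothetical.

```python
def ascii_to_ebcdic(text):
    """Encode a str as EBCDIC bytes (assumes code page 037)."""
    return text.encode("cp037")


def ebcdic_to_ascii(data):
    """Decode EBCDIC bytes (assumes code page 037) back to a str."""
    return data.decode("cp037")


encoded = ascii_to_ebcdic("A0")
round_trip = ebcdic_to_ascii(ascii_to_ebcdic("Hello, mainframe"))
```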
envcmp.py This tool will compare two files A and B containing the output of a 'set' shell command and produce four groupings:
    * Those environment variables in A but not in B
    * Those environment variables in B but not in A
    * Those environment variables that are common and equal
    * Those environment variables that are common but unequal
    
This tool was useful in environments with hundreds of environment variables -- I was trying to determine the cause of firmware build differences hypothesized to be caused by environmental differences. Manually comparing the environments was tedious and error-prone.
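
The four groupings map directly onto set operations over dictionary keys. This sketch assumes each line of the 'set' output has the form NAME=value and ignores values that span several lines; parse_env and compare_env are hypothetical names.

```python
def parse_env(text):
    """Parse NAME=value lines into a dictionary; other lines are skipped."""
    env = {}
    for line in text.splitlines():
        name, separator, value = line.partition("=")
        if separator:
            env[name] = value
    return env


def compare_env(a, b):
    """Return the four groupings as sorted lists of variable names."""
    common = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "common_equal": sorted(k for k in common if a[k] == b[k]),
        "common_unequal": sorted(k for k in common if a[k] != b[k]),
    }


env_a = parse_env("HOME=/home/a\nCC=gcc\nOPT=-O2\n")
env_b = parse_env("HOME=/home/a\nCC=cc\nDEBUG=1\n")
report = compare_env(env_a, env_b)
```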
fcut.py Usage: fcut [options] file n:m [n1:m2 ...]

Prints specified line number ranges of a file. Line numbering is 1-based and ranges are inclusive.

    n:m      Print lines n through m
    n:       Print from line n to end of file
    :m       Print from line 1 to line m
    :        Print all lines of the file
If n is negative, it means to count from the last line of the file backwards.

Options: -n numbers the lines and -r reverses the sense of the specification.

Examples:

    fcut file 10:-10
        Chop off the first and last 10 lines of the file.

    fcut file :10 -10:
    fcut -r file 10:-10
        Prints the first 10 and last 10 lines of the file.

    fcut -n file :
        Number all the lines of the file.
Why you'd want to use it: this script lets you pick exactly what lines you want. Example: you want all the lines of a file except for the first one. You can do it with head, tail, wc, and expr, but it's clumsy. It can be done with fcut file 2:.
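
One way to interpret an n:m specification is sketched below. It assumes that -1 refers to the last line, -2 to the line before it, and so on; fcut.py may resolve negative numbers differently. The function select_lines is hypothetical and handles a single range without the -n and -r options.

```python
def select_lines(lines, spec):
    """Return the lines picked out by one 1-based, inclusive n:m range."""
    if ":" not in spec:
        raise ValueError("range must contain ':'")
    start_text, _, end_text = spec.partition(":")
    total = len(lines)

    def resolve(text, default):
        # An empty field takes the default; a negative number counts
        # back from the last line (assumption: -1 is the last line).
        if not text:
            return default
        number = int(text)
        return total + number + 1 if number < 0 else number

    start = max(resolve(start_text, 1), 1)
    end = min(resolve(end_text, total), total)
    return lines[start - 1:end]


numbered = [str(n) for n in range(1, 21)]
all_but_first = select_lines(numbered, "2:")
last_ten = select_lines(numbered, "-10:")
```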
fset.py This script lets you treat the lines of a text file as a set; a set element is the string making up the line. Two lines in different sets are equal if they compare as the same string (but you can use the -w option to ignore leading and trailing whitespace). You can then take the intersection, union, and difference of sets of lines. The difference operator only operates on the first two files and returns the set of elements in the first file that are not in the second file.

Why you'd want to use it: if you have two large text files that have some common lines, you can find out which lines are common and not common. This is especially handy when the sort order of the two files does not match.

The syntax is

    fset.py [-w] op file1 file2 [file3 ...]
where op is the operation:
    d[ifference]
    i[ntersection]
    u[nion]
The output is sent to stdout and is sorted.

The -w option means to ignore leading and trailing whitespace when comparing elements.
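
The operations can be sketched with Python's built-in set type. The sketch works on strings rather than files to stay self-contained; line_set and combine are hypothetical names, and accepting any prefix of the operation name is an assumption based on the d[ifference] notation above.

```python
def line_set(text, ignore_whitespace=False):
    """Return the set of lines in text, optionally stripped (like -w)."""
    lines = text.splitlines()
    if ignore_whitespace:
        lines = [line.strip() for line in lines]
    return set(lines)


def combine(op, sets):
    """Apply difference, intersection, or union and return a sorted list."""
    if not op:
        raise ValueError("missing operation")
    if "difference".startswith(op):
        result = sets[0] - sets[1]      # only the first two sets are used
    elif "intersection".startswith(op):
        result = set.intersection(*sets)
    elif "union".startswith(op):
        result = set.union(*sets)
    else:
        raise ValueError("unknown operation: %r" % op)
    return sorted(result)


first = line_set("pear\napple\nplum\n")
second = line_set("  plum\napple\nfig\n", ignore_whitespace=True)
common = combine("i", [first, second])
```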

julian.py Julian day routines from Meeus, "Astronomical Formulae for Calculators". The functions are:
    Julian(month, day, year)            Integer Julian day number
    JulianAstro(month, day, year)       Astronomical Julian day number
    JulianToMonthDayYear(julian_day)    Returns month, day, year tuple
    DayOfWeek(month, day, year)         0 = Sunday
    DayOfYear(month, day, year)         1 to 365 (366 in leap year)
    IsValidDate(month, day, year)       Returns true if date is valid
                                        Gregorian date.
    IsLeapYear(year)                    Returns true if year is leap year
    NumDaysInMonth(month, year)
Why you'd want to use it: "julian day" in common use means the day number of the year, starting with January 1 being day 1. However, astronomers reckon Julian days as days from 4713 B.C. Using astronomical Julian days can make it easy to calculate days between dates or a date a specified number of days from another date. Click here for more information on Julian days and time keeping.
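
As an illustration of the date arithmetic, here is a sketch of the widely published Meeus formula for the astronomical Julian day of a Gregorian calendar date at 0h. It is not copied from julian.py, and the name julian_astro is hypothetical.

```python
def julian_astro(month, day, year):
    """Astronomical Julian day at 0h for a Gregorian calendar date."""
    if month <= 2:
        # Treat January and February as months 13 and 14 of the previous year.
        year -= 1
        month += 12
    century = year // 100
    gregorian_correction = 2 - century + century // 4
    return (int(365.25 * (year + 4716))
            + int(30.6001 * (month + 1))
            + day + gregorian_correction - 1524.5)


# Days between two dates is a simple subtraction.
days_in_february_2000 = julian_astro(3, 1, 2000) - julian_astro(2, 1, 2000)
```

The value for 1 January 2000 at 0h is 2451544.5, which matches the usual reference epoch of JD 2451545.0 at noon on that day.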
lc.py Count lines in a file.
lock.py Pings a location periodically to keep an Internet connection open. ISPs will typically hang up a modem connection if there is no activity for a while. Pass the number of hours you want to stay connected on the command line; the default is 1 hour.
man2c.py Handy tool for C programmers. Write a man page for a program as a plain text file. Then give it to this tool and it will convert it to a C function called manpage(FILE *s) that prints the manpage text to a stream s. I use this with my C programs to build a manpage into the executable (a nice feature, since the program is then self-contained).

The script is tested by taking a text file, converting it to a C file, compiling and then calling manpage(). The resulting output has to match the original text file (well, the whitespace at the end of lines may not match).
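
The conversion can be sketched as below. This is an illustration, not the man2c.py code: it assumes one fputs call per input line and escapes only backslashes and double quotes, so text containing tabs or trigraph sequences would need more care. The name text_to_c is hypothetical.

```python
def text_to_c(text):
    """Return C source for a manpage(FILE *s) function that prints text."""
    out = ["#include <stdio.h>", "", "void manpage(FILE *s)", "{"]
    for line in text.splitlines():
        # Escape the characters that would end or corrupt a C string literal.
        escaped = line.replace("\\", "\\\\").replace('"', '\\"')
        out.append('    fputs("%s\\n", s);' % escaped)
    out.append("}")
    return "\n".join(out) + "\n"


c_source = text_to_c('NAME\n    demo - say "hi"\n')
```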

moon.py Prints the time of the moon's phases for a given year. Typical output is
    Moon phases for 2002
    New
    13 Jan 01:33
    12 Feb 19:45
    14 Mar 14:07
    12 Apr 07:25
    12 May 22:49
    10 Jun 11:50
    10 Jul 22:29
    08 Aug 07:17
    07 Sep 15:12
    06 Oct 23:20
    04 Nov 08:37
    04 Dec 19:38
    First
    etc.
mort.py Prints a table of factors to calculate a monthly payment for a mortgage, given the interest rate in %/yr and the length of the mortgage in years. The user divides the principal by 1000 and multiplies this by the table entry to get the monthly payment.

Example output (truncated):

    Monthly payment per $1000 principal

                   Years
    %/yr     10    15    20    25    30    
    ------------------------------------
     5.00  10.61 7.908 6.600 5.846 5.368 
     6.00  11.10 8.439 7.164 6.443 5.996 
     7.00  11.61 8.988 7.753 7.068 6.653 
     8.00  12.13 9.557 8.364 7.718 7.338 
     9.00  12.67 10.14 8.997 8.392 8.046 
    10.00  13.22 10.75 9.650 9.087 8.776 

    Example:  A 10 year loan of $38000 at 
    8.0% per year will require a payment 
    of 38 * 12.13 = $461.

    Formula:  let i = yearly interest in %
                  T = time in years
                  A = (1 + i/1200)^(-T*12)
    Then factor = 1000 * (i/1200)/(1 - A) 
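
The formula in the output above translates directly into code. This sketch uses the hypothetical name payment_factor and does not handle a zero interest rate, where the formula divides by zero.

```python
def payment_factor(rate_percent, years):
    """Monthly payment per $1000 of principal."""
    monthly_rate = rate_percent / 1200.0          # i/1200 in the formula
    a = (1 + monthly_rate) ** (-years * 12)       # A = (1 + i/1200)^(-T*12)
    return 1000 * monthly_rate / (1 - a)


# The worked example from the table: $38000 for 10 years at 8.0% per year.
factor = payment_factor(8.0, 10)
payment = 38000 / 1000 * factor
```

Rounded to two decimals, the factor is 12.13, and the payment rounds to $461, in agreement with the table.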
mp.py This is a macro processor that is primarily a string substitution tool. It is a bit different from token-oriented macro processors like m4 and cpp, so you may have to adjust your thinking a bit. It also uses the python machinery to provide programmability in your macro files.

Use python mp.py -h to get a manpage printed to stdout.

Here's an example of the things that can be done with it. If the file inputdata contains the following information:

    .include globals.mp
    .define mp_CompanyName =ABC Company
    \
    mp_CompanyName's bank balance is $\
    .code

    import string

    # Read the second line from the file and take the second field for
    # the bank balance.

    ifp = open("bank_balance")
    ifp.readline()
    line = ifp.readline()
    ifp.close()
    interest, balance = string.split(line)
    print "%.2f" % float(balance)

    .endcode

    mp_NameOfPresident, President
When run as python mp.py inputdata, it produces the following output:
    ABC Company's bank balance is $1897.33

    George Washington, President
The .include line allows you to include other files. This is handy to provide common macro definitions or boilerplate text.

The .define line is a macro definition that defines the string that will be used to replace the string mp_CompanyName wherever it is found on a line.

A \ at the end of a line escapes the newline, so the next line (or the next piece of output) is appended to the line with the escaped newline. This is what lets the output of the .code/.endcode section, which is Python code that opens a file and takes the second token from the second line, follow the dollar sign $ for the bank balance. Python code embedded like this is compiled and executed on the fly; any strings the code writes to stdout are included in the text.

ncss.py Counts lines of non-commented source code for C and C++. Strips out comments, then counts non-empty lines. Prints a report to stdout.
otp.py Class to generate one time pad sequences of 128 bit numbers. Typical output from 5 calls is:
    F793D8198D0F7945B2CEE4B573475222
    4632C131EEA88B80DF5C92B218236ADA
    6628BB06C4349FC46A211434224B5056
    D363C858D1365748FC2E90C0CC6617B0
    F34DBEC0183DFFB98581D0A3BEF0BDD0
Note that as constructed, this class does not generate cryptographically secure one time pads, since it relies on the whrandom module of python. If you substitute a cryptographically strong random number generator, you will get cryptographically strong one time pads.
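
For the substitution suggested above, current Python versions ship the secrets module, which draws from the operating system's cryptographically strong random source. The sketch below is not part of otp.py, and the function name is hypothetical.

```python
import secrets


def one_time_pad_block():
    """Return one 128-bit value as 32 uppercase hexadecimal digits."""
    return secrets.token_hex(16).upper()   # 16 bytes = 128 bits


blocks = [one_time_pad_block() for _ in range(5)]
```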
roman.py Two functions to convert back and forth between decimal numbers and roman numerals:
    RomanNumeralsToDecimal(roman_string)
    DecimalToRomanNumerals(base10_integer)
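
A minimal sketch of the two conversions, using subtractive notation and a range of 1 to 3999. These are assumptions; roman.py may accept other forms. The function names are hypothetical, and the parser does not reject malformed numerals.

```python
_PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
          (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
          (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def decimal_to_roman(number):
    """Convert an integer from 1 to 3999 to a Roman numeral string."""
    if not 0 < number < 4000:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in _PAIRS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_to_decimal(roman):
    """Convert a Roman numeral string to an integer."""
    roman = roman.upper()
    total = 0
    for position, symbol in enumerate(roman):
        value = _VALUES[symbol]
        # A smaller symbol in front of a larger one is subtracted (IV = 4).
        if position + 1 < len(roman) and _VALUES[roman[position + 1]] > value:
            total -= value
        else:
            total += value
    return total
```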
sample.py Routines to sample with and without replacement and to shuffle. Also includes routines to shuffle and deal "decks" of cards. The routines are:
    sample_wr(population_size, sample_size)
        Sample with replacement from a set of integers from 1 to
        population_size.  It returns a list of the integers that were
        selected.  The sampling distribution is binomial.

    sample_wor(population_size, sample_size)
        Sample without replacement from a set of integers from 1 to
        population_size.  It returns a list of the integers that were
        selected.  The sampling distribution is hypergeometric.

    shuffle(sample_size)
        Returns a random permutation of the integers 1 to sample_size.

    deal(deck_size, num_hands, num_per_hand)
        Returns a dictionary of hands (list of integers) dealt from
        the integers 1 to deck_size.  The hands are keyed by
        1, 2, ..., num_hands.  Any leftover "cards" go into the list
        keyed by 0.

    deal_deck(num_hands, num_per_hand)
        Returns a dictionary of a dealt card hand.  The deal() function
        is used, but the routine also maps the integers to a string that
        contains the card identifications.  For example, 1 -> 2S,
        2 -> 3S, ..., etc.
These routines are useful to help you select random samples from populations or to randomize the order of a sequence. A typical use is to randomize the order of experimental trials.
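
The first four routines can be sketched with the standard random module. This is an illustration under the assumption that the population is the integers 1 to population_size; it is not the code in sample.py.

```python
import random


def sample_wr(population_size, sample_size):
    """Sample with replacement from the integers 1 to population_size."""
    return [random.randint(1, population_size) for _ in range(sample_size)]


def sample_wor(population_size, sample_size):
    """Sample without replacement from the integers 1 to population_size."""
    return random.sample(range(1, population_size + 1), sample_size)


def shuffle(sample_size):
    """Return a random permutation of the integers 1 to sample_size."""
    items = list(range(1, sample_size + 1))
    random.shuffle(items)
    return items


def deal(deck_size, num_hands, num_per_hand):
    """Deal hands keyed 1..num_hands; leftover cards are keyed by 0."""
    if num_hands * num_per_hand > deck_size:
        raise ValueError("not enough cards for that many hands")
    cards = shuffle(deck_size)
    hands = {}
    for hand in range(1, num_hands + 1):
        start = (hand - 1) * num_per_hand
        hands[hand] = cards[start:start + num_per_hand]
    hands[0] = cards[num_hands * num_per_hand:]
    return hands


# Randomize the order of 12 experimental trials.
trial_order = shuffle(12)
```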
set.py, setf.py Both of these modules contain a Set class that implements sets. set.py is implemented with lists, and each set can hold arbitrary objects. Performance drops for sets with more than about 1000 elements. For a faster implementation, use setf.py, which implements sets with dictionaries; the tradeoff is that set elements must be hashable objects.

Here are some of the methods of the class Set:

    add_to_set(element)
        Adds element to the set. element can be a list, tuple, dictionary, string, number, or set. If it is a list, tuple, dictionary, or set, and self.decompose is true, element is broken into its component parts and those parts are stored in the set (otherwise element is just added to the set).
    delete_from_set(element)
        Deletes element from the set. element can be a list, tuple, dictionary, string, number, or a set. If it is a list, tuple, dictionary, or set and self.decompose is true, element is broken into its component parts and each part is deleted from the set. An exception will occur if one of the elements is not in the set and self.harsh is true.
    intersection(other_set)
        Returns a set that is the intersection of self and other_set.
    union(other_set)
        Returns a set that is the union of self and other_set.
    difference(other_set)
        Returns a set that consists of all elements in self that are not in other_set.
    is_in_set(element)
        Returns 1 if element is in the set, 0 if not.
    is_empty_set()
        Returns 1 if the set is empty.
    is_subset_of(other_set)
        Returns 1 if self is a subset of other_set; otherwise returns 0.
    is_proper_subset_of(other_set)
        Returns 1 if self is a proper subset of other_set; otherwise returns 0.
    list()
        Returns a list of the elements of the set.
space.py Calculates number of bytes per directory in a directory tree. Can display a list sorted by percentage of the total size of the tree. Example output for one of my directories is:
    For directory '.':     [total bytes = 3.0 MB]
    Percent
    of total   Directory
    --------   --------------------------------------------------
      31.9     ./from_web
      27.7     ./from_web/htmlgen
      23.1     ./from_web/htmlgen/html
       5.3     .
       4.6     ./from_web/htmlgen/image
       2.5     ./RCS
       2.2     ./script
       1.4     ./from_web/yarn
       1.3     ./from_web/htmlgen/data
    [0 directories not shown]
This tells me my current directory and all subdirectories have files consuming about 3 MB. A third of that space is in the from_web subdirectory and most of that is in the htmlgen directory. The [0 directories not shown] message tells how many directories were smaller than the threshold of 0.1% to be shown.
spinner.py Class implementation of a simple spinner to show progress of a computation.
stack.py Simple stack class.
sw.py Windows utility program that provides a command line stopwatch. When you press a key, the split time (time since the last key was pressed), total elapsed time, time/date, and key pressed are printed on a line. Special keys are:
    q        Quit
    Z        Rezero the timer
    C        Get prompted for a comment
If a file is included on the command line, the data are also logged to that file.

Here is a sample of the program's output:

    Times are in seconds

    Diff time   Total time
    ---------   ----------
          8.9          8.9   Tue Aug 20 19:09:37 2002
         43.6         52.5   Tue Aug 20 19:10:20 2002
         13.6         66.1   Tue Aug 20 19:10:34 2002  Quitting
tc.py Class to calculate temperatures and voltages of E, J, K, R, S, and T thermocouples. The methods are:
    mV_to_degF
    mV_to_degC
    degF_to_mV
    degC_to_mV
These are based on polynomial approximations published in the Omega Temperature Catalog vol 26, 1988, page T-12.
tree.py Defines the class Tree, which will return an ASCII representation of a directory tree (each line represents one directory level). Here's some sample output for a python directory:
    .
    | DLLs
    | Doc
    | | api
    | | doc
    | | ext
    | | icons
    | | lib
    | | ref
    | | tut
    | Lib
    | | Plat-Win
    | | lib-tk
    | | test
    | | | output
    | Tools
    | | Scripts
    | | idle
    | | pynche
    | | | X
    | | versioncheck
    | | webchecker
    | include
    | libs
util.py Various utilities:
    Ruler                 Return a ruler
    TensRuler             10's ruler to go along with Ruler()
    WindChillInDegF       Calculate wind chill given OAT & wind speed
    Deg2Rad               Converts degrees to radians
    Rad2Deg               Converts radians to degrees
    SpellCheck            Checks that a list of words is in a dictionary
    Keep                  Keep only specified characters in a string
    Remove                Remove a specified set of characters from a string
    ListInColumns         Produce a listing like ls
    Debug                 A class that helps with debugging
    Time                  Returns a string giving local time and date
    AWG                   Returns wire diam in inches for AWG gauge number
    NiceRound             Rounds a floating pt to nearest 1, 2, or 5.
    SignificantFiguresS   Rounds to specified num of sig figures (returns string)
    SignificantFigures    Rounds to specified num of sig figures (returns float)
    SignMantissaExponent  Returns tuple of sign, mantissa, exponent
where.py Searches the directories in the PATH variable for a file that matches one or more regular expressions. For example, the command where -i python on my computer results in:
    c:\bin\python22/python.exe
    c:\bin\python22/pythonw.exe
    c:\winnt\system32/python15.dll
    c:\winnt\system32/python15.lib
    c:\winnt\system32/python21.dll
    c:\winnt\system32/python22.dll
The -i option makes the search case insensitive.

Note that it finds all files whose name matches the regular expression, rather than just executables.
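
The search can be sketched as follows. This is an illustration, not the where.py code: the function name and parameters are hypothetical, and matching is done against the file name only, with any match anywhere in the name counting.

```python
import os
import re


def where(patterns, path=None, ignore_case=False):
    """Return files in the PATH directories whose names match any pattern."""
    if path is None:
        path = os.environ.get("PATH", "")
    flags = re.IGNORECASE if ignore_case else 0
    regexes = [re.compile(pattern, flags) for pattern in patterns]
    found = []
    for directory in path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue                      # skip empty or stale PATH entries
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue                      # skip directories we cannot read
        for name in names:
            if any(regex.search(name) for regex in regexes):
                found.append(os.path.join(directory, name))
    return found
```

Calling where(["python"], ignore_case=True) corresponds to the where -i python command shown above.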

wire.py Defines the Wire class, which can calculate various characteristics of wire. As an example, it prints out a copper wire table in terms of AWG diameters for 20 deg C and 40 deg C.

You set the characteristics of a wire. Then you can get the following properties:

    Resistance in ohms
    Length in m
    Mass in kg
    Diameter in m
    Resistivity in ohm*m at current temperature
    Specific gravity
    Temperature in K
    Material
xref.py Indexes whitespace-separated tokens in a text file by line number. Here's some sample output from indexing its own source code:
    def             : 16
    dict            : 34,40,41,43,44,45
    dictionary      : 2,4
    donp            : 11
    ds              : 60
    else            : 30
    except          : 23
    exit            : 51
    file            : 3,24,50
    filename        : 16,20,24
    for             : 26,35,38,56,61,63,69
    fp              : 20,21,22
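
An index of this kind can be built with a dictionary that maps each token to its line numbers. The sketch below is not the xref.py source; cross_reference and format_index are hypothetical names, and listing a line only once per token is an assumption.

```python
from collections import defaultdict


def cross_reference(text):
    """Map each whitespace-separated token to the lines it appears on."""
    index = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            # Record each line number only once per token.
            if not index[token] or index[token][-1] != number:
                index[token].append(number)
    return index


def format_index(index):
    """Return one 'token : n1,n2,...' line per token, sorted by token."""
    return ["%-15s : %s" % (token, ",".join(str(n) for n in numbers))
            for token, numbers in sorted(index.items())]


listing = format_index(cross_reference("a b\nb a a\nc"))
```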

GNU General Public License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

See http://www.gnu.org/licenses/licenses.html for more details.

File: gds_scripts.mp revision 1.4 [25 Aug 2002]