Hash Subroutines

Copying Hash Subroutines

The subroutines in the lib directory are compiled into the shared library, libbthash.so.

Subroutine Description
hinit Initialize a hash database
hopen Open the database
hopent Create a temporary database
hclose Close the database
hcloset Remove a temporary database from disk
hisrt Insert a record in the database
hupdt Replace a record in the database
hdlet Delete a record from the database
hget Read a record in the database
hsweep Read the next physical record in the database
hdbck Check every pointer in the database
hchain Count length of chains in the database
hprime Calculate a prime number
hreorg Reorganize the database
hlock Lock the database (UNIX)
hunlock Unlock the database (UNIX)

Update Transaction

The following calls may be used together to create an update transaction.


hinit

Read Initialize a Hash Database about initializing a database.


hopen

UNIX syntax:

hashfmt *hopen(char *dbname, int rwflag, int lock, int *reason);

Windows syntax:

hashfmt *hopen(char *dbname, int rwflag, int *reason);

Parameters:

Processing:

hopen tests for locks. If no lock is pending, it opens the database according to the read/write parameter. Then it reads the first 256 bytes of the database to gather schema information about the database. It validates the signature of the database before formatting the hashfmt structure. Then it allocates memory for the read and write buffer. The hashfmt structure is not freed until hclose. If more than one database is open, the buffers are kept separate.

Returns:

Error Codes


hopent

hashfmt *hopent(int totrcds, int rcdlen, int keylen, int keyofst, int *reason, unsigned char *seed);

Parameters:

Processing:

hopent creates a temporary database name. It then calls hinit to create the temporary database and initialize it. Finally, it calls hopen to make it available for update. The temporary database is locked until it is closed with hcloset.

Using a temporary database:

A temporary database is useful for joins and views of a permanent database. It is also useful for creating summary databases for reports.

A hashing temporary database is useful for data, where the program knows the total number of records being processed and the exact keys of the records. A hashing temporary database is also useful for data that doesn't need to be sorted.

Initializing a hashing temporary database is expensive, because all the records need to be initialized with blank data. A btree database may be more suitable as a temporary database, because it initializes only the root record in the database. A btree database is designed for fast growth. A hash database is designed for stable amounts of data, even though the turnover rate of the data may be high.

Returns:

Error Codes


hclose

int hclose(hashfmt *hshhdr);

Parameters:

Processing:

hclose closes the database. Then it frees all allocated memory in the hashfmt structure. Finally, it frees the hashfmt structure, itself.

Returns:

Zero for good close.

Error Codes


hcloset

int hcloset(hashfmt *hshhdr);

Parameters:

Processing:

hcloset closes the database by calling hclose. It then removes the temporary database from disk.

The link count for the temporary database must go to zero before it can be removed. If the program pipes to another process, the link count may not go to zero. In that case the file may not be removed.

Returns:

Zero for good close.

Error Codes


hisrt

int hisrt(hashfmt *hshhdr, unsigned char *rcd);

Parameters:

Processing:

Locking is mandatory during insert. If a lock is pending, hisrt returns with an error. After a successful insert, the file is unlocked only if it was not opened with a lock. If the file was opened with a lock, the lock remains in effect after a good insert.

hisrt first checks the key to see if it is reserved. If it is not reserved, it continues.

hisrt looks for an existing record with the same key. If no record exists with the same key, it tries to fit the record in the alloted hash location.

The Sedgewick algorithm is adapted to calculate the hash location. First the record key is scrambled with the md5 one-way hash. Then the digest from the hash is divided by the total number of records. The remainder becomes the location of the record.

If several records hash to the same location, there is a collision.

The colliding record fills the free block at the head of the free block stack and the free block stack is popped.

If the free block stack is empty, the database is extended by one record and the record is added to the beginning of the overflow chain.

In either case, the colliding record is logically inserted at the beginning of the overflow chain for the hashing address.

Returns:

Error Codes


hupdt

int hupdt(hashfmt *hshhdr, unsigned char *rcd);

Parameters:

Processing:

Locking is mandatory during update. If a lock is pending, hupdt returns with an error. After a successful update, the file is unlocked only if it was not opened with a lock. If the file was opened with a lock, the lock remains in effect after a good update.

hupdt looks for the record matching the embedded key. If the record is found, it is replaced in the same physical location and written back to disk.

Returns:

Error Codes


hdlet

int hdlet(hashfmt *hshhdr, unsigned char *key);

Parameters:

Processing:

Locking is mandatory during delete. If a lock is pending, hdlet returns with an error. After a successful delete, the file is unlocked only if it was not opened with a lock. If the file was opened with a lock, the lock remains in effect after a good delete.

hdlet looks for the record matching the key. If the record is not found, hdlet returns with an error code of 1.

If the record is found, hdlet sees if the record to delete is in the hash space. If so, it checks to see if the record to be deleted will orphan the overflow chain. If so, it copies the first record in the overflow chain into its location and puts the first overflow location on the free stack. If the record to be deleted does not orphan the overflow chain, it empties the record and returns.

If the record to be deleted is on the overflow chain, hdlet puts the record at the head of the free block stack.

hisrt will use the record on the free block stack the next time a record is inserted.

Returns:

Error Codes


hget

int hget(hashfmt *hshhdr, unsigned char *key, unsigned char *buf);

Parameters:

Processing:

hget searches for a record matching the key. If the record is not found at the hash address, the overflow chain is searched.

Returns:

Error Codes


hsweep

int hsweep(hashfmt *hshhdr, int position, unsigned char *buf);

Parameters:

Processing:

hsweep reads the database in physical sequence. The node is saved in hshhdr->node. The disk address of the record is saved in hshhdr->ofst. hsweep bypasses empty records in the hashing address space and deleted records in the overflow address space.

hsweep verifies that the record is in the correct location if it is in the hashing address space. It does not perform this verification if the record is in the overflow address space.

hsweep returns one record per call. This subroutine is useful for matching criteria in the data portion of the record. It is also useful for doing mass updates to the database. The keys will always be in random sequence.

Returns:

Error Codes


hdbck

int hdbck(hashfmt *hshhdr);

Parameters:

Processing:

hdbck reads the database in physical sequence. Every pointer is mapped to a bit map. If a pointer is mapped twice, it is an error. If a pointer points to the hash address space, it is an error. If the bit map lacks a pointer to a record in the overflow address space, it is an error.

Returns:

Error Codes


hchain

void hchain(hashfmt *hshhdr, int *count, int *smlen, int *smlensq, int *freelen);

Parameters:

Processing:

hchain reads the hashing part of the database in physical sequence. Every overflow chain is walked to determine its length and to gather statistics. Finally the free block chain is walked to determine its length.

Returns:

The output parameters are count, smlen, smlensq, and freelen. The output parameters allow you to calculate the standard deviation of the overflow chains.

Error Codes


hprime

int hprime(int *maxnum);

Parameters:

Processing:

hprime calculates the number of records to allocate to the hashing space in your database. This number should be a prime number.

The input parameter is an estimated prime number. The output from hprime is the largest prime number equal or less than the estimated number.

Returns:

The largest prime number equal or less than the estimated number.

Error Codes


hreorg

int hreorg(hashfmt *hshhdr, int totrcds, double freespc);

Parameters:

Processing:

Back up your database before calling hreorg. There is a risk of losing your data during the renames.

If the total number of records parameter is zero, hreorg changes the database size to the file size plus the free space percent. The totrcds parameter is calculated by dividing the record length into the file size.

hreorg initializes a temporary database with the new database size. It then copies the current database to the temporary database. Finally, it renames the temporary database back to the original.

When the totrcds parameter changes, the hashing location of each record also changes.

hreorg closes the original database with hclose before the rename occurs. When you receive control back from hreorg, you will have to open the database to use it.

Returns:

Zero - good reorganization.

Error Codes


hlock (UNIX)

int hlock(hashfmt *hshhdr);

Parameters:

Processing:

hlock returns if the database is locked by another user. If the database is not locked, the lock is applied and the database status is now LOCK;

Returns:

Error Codes


hunlock (UNIX)

int hunlock(hashfmt *hshhdr);

Parameters:

Processing:

If the database is locked, the unlock is applied and the database status is now NOLOCK;

Returns:

Error Codes