bsddb3 Python documentation

Berkeley DB 3.x & 4.x Python Extension Package

Introduction

This is a simple bit of documentation for the bsddb3.db Python extension module, which wraps the Berkeley DB 3.x or 4.x C library. The extension module is located in a Python package along with a few pure Python modules.

It is expected that this module will be used in the following general ways by different programmers in different situations. The goal of this module is to support all of these modes of use without making things too complex for the simple cases, and without leaving out functionality needed by the complex cases.

  1. Backwards compatibility -- It is desirable for this package to be a near drop-in replacement for the bsddb module shipped with Python, which is designed to wrap either DB 1.85 or the 1.85 compatibility interface. This means that equivalent object creation functions must be available (btopen(), hashopen(), and rnopen()), and the objects returned must have the same or at least similar methods (specifically, first(), last(), next(), and prev() must be available without the user needing to explicitly use a cursor). All of these have been implemented in Python code in the bsddb3 package's __init__.py module.
  2. Simple persistent dictionary -- One small step beyond the above. The programmer may be aware of and use the new DB object type directly, but only needs it from a single process and thread. The programmer should not have to be bothered with using a DBEnv, and the DB object should behave as much like a dictionary as possible.
  3. Concurrent access dictionaries -- This refers to the ability to simultaneously have one writer and multiple readers of a DB (either in multiple threads or processes) and is implemented simply by creating a DBEnv with certain flags. No extra work is required to allow this access mode in bsddb3.
  4. Advanced transactional data store -- This mode of use is where the full capabilities of the Berkeley DB library are called into action. The programmer will probably not use the dictionary access methods as much as the regular methods of the DB object, so that transaction objects can be passed to the methods. Again, most of this advanced functionality is activated simply by opening a DBEnv with the proper flags, and also by using transactions and being aware of and reacting to deadlock exceptions, etc.
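
Reacting to deadlock exceptions, as mentioned in mode 4, usually amounts to a retry loop around the transaction. The following is only a minimal sketch and is not part of bsddb3: run_with_retry and its parameters are hypothetical names, and the environment and the exception class are passed in by the caller, so the helper relies on nothing beyond the txn_begin(), commit() and abort() methods documented in this file. With bsddb3 one would presumably pass an opened transactional DBEnv and db.DBLockDeadlockError.

```python
def run_with_retry(env, work, deadlock_error, max_retries=5):
    """Run work(txn) inside a transaction, retrying on deadlock.

    env            -- object providing txn_begin(), e.g. an opened DBEnv
                      (assumption: supplied by the caller)
    work           -- callable that receives the transaction object
    deadlock_error -- exception class to retry on, e.g. DBLockDeadlockError
    """
    for _attempt in range(max_retries):
        txn = env.txn_begin()
        try:
            result = work(txn)
        except deadlock_error:
            # This transaction was chosen as the deadlock victim:
            # abort it and start over with a fresh transaction.
            txn.abort()
            continue
        except Exception:
            # Any other failure: undo the changes and let it propagate.
            txn.abort()
            raise
        txn.commit()
        return result
    raise RuntimeError("gave up after %d deadlocks" % max_retries)
```

The work callable should pass the transaction it receives to every DB method it calls (for example put(key, data, txn=txn)), otherwise its changes are not covered by the abort.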

Types Provided

The bsddb3.db extension module provides the following object types:

  • DB: The basic database object, capable of Hash, BTree, Recno, and Queue access methods.
  • DBEnv: Provides a Database Environment for more advanced database use. Apps using transactions, logging, concurrent access, etc. will need to have an environment object.
  • DBCursor: A pointer-like object used to traverse a database.
  • DBTxn: A database transaction. Allows for multi-file commit, abort and checkpoint of database modifications.
  • DBLock: An opaque handle for a lock. See DBEnv.lock_get() and DBEnv.lock_put(). Locks are not necessarily associated with anything in the database, but can be used for any synchronization task across all threads and processes that have the DBEnv open.

Exceptions Provided

The BerkeleyDB C API uses function return codes to signal various errors. The bsddb3.db module checks for these error codes and turns them into Python exceptions, allowing you to use familiar try:... except:... constructs and not have to bother with checking every method's return value.

Each of the error codes is turned into an exception specific to that error code, as outlined in the table below. If you are using the C API documentation, use the table to map the error return codes specified there to the name of the Python exception that will be raised.

Each exception derives from the DBError exception class so if you just want to catch generic errors you can use DBError to do it. Since DBNotFoundError is raised when a given key is not found in the database, DBNotFoundError also derives from the standard KeyError exception to help make a DB look and act like a dictionary.

When any of these exceptions is raised, the associated value is a tuple containing an integer representing the error code and a string for the error message itself.

Exception                Error code
-----------------------  ----------------------------------------
DBError                  Base class, all others derive from this
DBIncompleteError        DB_INCOMPLETE
DBKeyEmptyError          DB_KEYEMPTY
DBKeyExistError          DB_KEYEXIST
DBLockDeadlockError      DB_LOCK_DEADLOCK
DBLockNotGrantedError    DB_LOCK_NOTGRANTED
DBNotFoundError          DB_NOTFOUND (also derives from KeyError)
DBOldVersionError        DB_OLD_VERSION
DBRunRecoveryError       DB_RUNRECOVERY
DBVerifyBadError         DB_VERIFY_BAD
DBNoServerError          DB_NOSERVER
DBNoServerHomeError      DB_NOSERVER_HOME
DBNoServerIDError        DB_NOSERVER_ID
DBInvalidArgError        EINVAL
DBAccessError            EACCES
DBNoSpaceError           ENOSPC
DBNoMemoryError          ENOMEM
DBAgainError             EAGAIN
DBBusyError              EBUSY
DBFileExistsError        EEXIST
DBNoSuchFileError        ENOENT
DBPermissionsError       EPERM
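
The consequences of this hierarchy can be illustrated without Berkeley DB installed. The classes below are stand-ins that only mirror the inheritance described above; they are not the real bsddb3.db classes, lookup() and describe() are hypothetical helpers, and the error code -1 is a placeholder rather than the real DB_NOTFOUND value.

```python
class DBError(Exception):
    """Stand-in for the module's base exception (illustration only)."""


class DBNotFoundError(DBError, KeyError):
    """Stand-in: derives from both DBError and KeyError."""


def lookup(table, key):
    # Mimics a failed lookup: the associated value is a
    # (code, message) tuple. -1 is a placeholder code.
    if key not in table:
        raise DBNotFoundError(-1, "key not found")
    return table[key]


def describe(table, key):
    try:
        return lookup(table, key)
    except KeyError as exc:
        # A plain "except KeyError" also catches DBNotFoundError,
        # which is what lets dictionary-style code keep working.
        code, message = exc.args
        return "error %d: %s" % (code, message)
```

Catching DBError instead of KeyError would handle the same failure along with every other exception in the table.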

Other Package Modules

  • dbshelve.py: This is an implementation of the standard Python shelve concept for storing objects that uses bsddb3 specifically, and also exposes some of the more advanced methods and capabilities of the underlying DB.
  • dbtables.py: This is a module by Gregory Smith that implements a simplistic table structure on top of a DB.
  • dbutils.py: A catch-all for Python code that is generally useful when working with DBs.
  • dbobj.py: Contains subclassable versions of DB and DBEnv.
  • dbrecio.py: Contains the DBRecIO class that can be used to do partial reads and writes from a DB record using a file-like interface. Contributed by Itamar Shtull-Trauring.

Testing

A full unit test suite is being developed to exercise the various object types, their methods and the various usage modes described in the introduction. PyUnit is used and the tests are structured such that they can be run unattended and automated. There are currently over 150 test cases!

Reference

See the C language API online documentation at sleepycat.com (or the local copy) for more details of the functionality of each of these methods. The names of all the Python methods should be the same or similar to the names in the C API.

This version of the documentation was originally based on Berkeley DB 3.3. If you build the module with a different version of Berkeley DB then the items below and in the Sleepycat docs may not be entirely accurate. Refer to the Sleepycat documentation for the authoritative details.

NOTE: All the methods shown below having more than one keyword argument are actually implemented using keyword argument parsing, so you can use keywords to provide optional parameters as desired. Those that have only a single optional argument are implemented without keyword parsing to help keep the implementation simple. If this is too confusing let me know and I'll think about using keywords for everything.

DBEnv Attributes

db_home
database home directory (read-only)

DBEnv Methods

DBEnv(flags=0):
Constructor More info...
close(flags=0):
Close the database environment, freeing resources. More info...
open(homedir, flags=0, mode=0660):
Prepare the database environment for use. More info...
remove(homedir, flags=0):
Remove a database environment. More info...
set_cachesize(gbytes, bytes, ncache=0):
Set the size of the shared memory buffer pool More info...
set_data_dir(dir):
Set the environment data directory More info...
set_flags(flags, onoff):
Set additional flags for the DBEnv. The onoff parameter specifies if the flag is set or cleared. More info...
set_tmp_dir(dir):
Set the directory to be used for temporary files More info...
set_get_returns_none(flag):
By default when DB.get or DBCursor.get, get_both, first, last, next or prev encounter a DB_NOTFOUND error they return None instead of raising DBNotFoundError. This behaviour emulates Python dictionaries and is convenient for looping.

You can use this method to toggle that behaviour for all of the aforementioned methods or extend it to also apply to the DBCursor.set, set_both, set_range, and set_recno methods. Supported values of flag:

  • 0: All DB and DBCursor get and set methods will raise a DBNotFoundError rather than returning None.
  • 1: Default in module version <4.2.4. The DB.get and DBCursor.get, get_both, first, last, next and prev methods return None.
  • 2: Default in module version >=4.2.4. Extends the behaviour of 1 to the DBCursor set, set_both, set_range and set_recno methods.

The default of returning None makes it easy to do things like this without having to catch DBNotFoundError (KeyError):

                    data = mydb.get(key)
                    if data is not None:   # an empty record is still a hit
                        doSomething(data)

or this:

                    rec = cursor.first()
                    while rec:
                        print(rec)
                        rec = cursor.next()

Making the cursor set methods return None is useful in order to do this:

                    rec = cursor.set(key)
                    while rec:
                        key, val = rec
                        doSomething(key, val)
                        rec = cursor.next()

The downside to this is that it is inconsistent with the rest of the package and noticeably diverges from the Sleepycat DB API. If you prefer to have the get and set methods raise an exception when a key is not found, use this method to tell them to do so.

Calling this method on a DBEnv object will set the default for all DB objects later created within that environment. Calling it on a DB object sets the behaviour for that DB only.

The previous setting is returned.
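
Library code that cannot know which setting is in effect can be written to tolerate both. The sketch below is one hedged way to do that; iter_records is a hypothetical helper, not part of bsddb3, and it relies only on the cursor's first() and next() methods and on DBNotFoundError deriving from KeyError, as described in the Exceptions section.

```python
def iter_records(cursor):
    """Yield (key, data) pairs from the first record to the last.

    Works whether the cursor reports "no more records" by returning
    None (flag values 1 and 2) or by raising DBNotFoundError, which
    is a KeyError subclass (flag value 0).
    """
    try:
        rec = cursor.first()
        while rec is not None:
            yield rec
            rec = cursor.next()
    except KeyError:
        # Raised at the end of the data when exceptions are enabled.
        return
```

A caller can then write "for key, data in iter_records(cursor): ..." regardless of the flag value.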

set_lg_bsize(size):
Set the size of the in-memory log buffer, in bytes. More info...
set_lg_dir(dir):
The path of a directory to be used as the location of logging files. Log files created by the Log Manager subsystem will be created in this directory. More info...
set_lg_max(size):
Set the maximum size of a single file in the log, in bytes. More info...
set_lk_detect(mode):
Set the automatic deadlock detection mode More info...
set_lk_max(max):
Set the maximum number of locks. (This method is deprecated.) More info...
set_lk_max_locks(max):
Set the maximum number of locks supported by the Berkeley DB lock subsystem. More info...
set_lk_max_lockers(max):
Set the maximum number of simultaneous locking entities supported by the Berkeley DB lock subsystem. More info...
set_lk_max_objects(max):
Set the maximum number of simultaneously locked objects supported by the Berkeley DB lock subsystem. More info...
set_mp_mmapsize(size):
Files that are opened read-only in the memory pool (and that satisfy a few other criteria) are, by default, mapped into the process address space instead of being copied into the local cache. This can result in better-than-usual performance, as available virtual memory is normally much larger than the local cache, and page faults are faster than page copying on many systems. However, in the presence of limited virtual memory it can cause resource starvation, and in the presence of large databases, it can result in immense process sizes.

This method sets the maximum file size, in bytes, for a file to be mapped into the process address space. If no value is specified, it defaults to 10MB. More info...

log_archive(flags=0):
Returns a list of log or database file names. By default, log_archive returns the names of all of the log files that are no longer in use (e.g., no longer involved in active transactions), and that may safely be archived for catastrophic recovery and then removed from the system. More info...
lock_detect(atype, flags=0):
Runs one iteration of the deadlock detector and returns the number of transactions aborted. More info...
lock_get(locker, obj, lock_mode, flags=0):
Acquires a lock and returns a handle to it as a DBLock object. The locker parameter is an integer representing the entity doing the locking, and obj is a string representing the item to be locked. More info...
lock_id():
Acquires a locker id, guaranteed to be unique across all threads and processes that have the DBEnv open. More info...
lock_put(lock):
Release the lock. More info...
lock_stat(flags=0):
Returns a dictionary of locking subsystem statistics with the following keys:

  • lastid: Last allocated lock ID.
  • nmodes: Number of lock modes.
  • maxlocks: Maximum number of locks possible.
  • maxlockers: Maximum number of lockers possible.
  • maxobjects: Maximum number of objects possible.
  • nlocks: Number of current locks.
  • maxnlocks: Maximum number of locks at once.
  • nlockers: Number of current lockers.
  • nobjects: Number of current objects.
  • maxnobjects: Maximum number of objects at once.
  • maxnlockers: Maximum number of lockers at once.
  • nrequests: Total number of locks requested.
  • nreleases: Total number of locks released.
  • nnowaits: Total number of lock requests that failed because of DB_LOCK_NOWAIT.
  • nconflicts: Total number of locks not immediately available due to conflicts.
  • ndeadlocks: Number of deadlocks detected.
  • regsize: Size of the region.
  • region_wait: Number of times a thread of control was forced to wait before obtaining the region lock.
  • region_nowait: Number of times a thread of control was able to obtain the region lock without waiting.

More info...

set_tx_max(max):
Set the maximum number of active transactions More info...
txn_begin(parent=None, flags=0):
Creates and begins a new transaction. A DBTxn object is returned. More info...
txn_checkpoint(kbyte=0, min=0, flag=0):
Flushes the underlying memory pool, writes a checkpoint record to the log and then flushes the log. More info...
txn_stat():
Return a dictionary of transaction statistics with the following keys:

  • time_ckp: Time the last completed checkpoint finished (as the number of seconds since the Epoch, returned by the IEEE/ANSI Std 1003.1 POSIX time interface).
  • last_txnid: Last transaction ID allocated.
  • maxtxns: Max number of active transactions possible.
  • nactive: Number of transactions currently active.
  • maxnactive: Max number of active transactions at once.
  • nbegins: Number of transactions that have begun.
  • naborts: Number of transactions that have aborted.
  • ncommits: Number of transactions that have committed.
  • regsize: Size of the region.
  • region_wait: Number of times that a thread of control was forced to wait before obtaining the region lock.
  • region_nowait: Number of times that a thread of control was able to obtain the region lock without waiting.

More info...
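
Because the locks handed out by lock_get() must be released with lock_put(), it can be convenient to pair the two calls in a context manager. This is a minimal sketch under stated assumptions: held_lock is a hypothetical helper, not part of bsddb3, and env is any object providing the lock_get() and lock_put() methods listed above. With bsddb3 the locker would presumably come from env.lock_id() and lock_mode would be one of the Berkeley DB lock mode constants such as db.DB_LOCK_WRITE.

```python
from contextlib import contextmanager


@contextmanager
def held_lock(env, locker, name, lock_mode, flags=0):
    """Hold a lock on `name` for the duration of a with-block.

    env       -- object with lock_get()/lock_put(), e.g. an opened DBEnv
    locker    -- locker id, e.g. the value returned by env.lock_id()
    name      -- the item to lock (any agreed-upon name)
    lock_mode -- lock mode constant supplied by the caller
    """
    lock = env.lock_get(locker, name, lock_mode, flags)
    try:
        yield lock
    finally:
        # Always release, even if the block raised an exception.
        env.lock_put(lock)
```

Since locks need not be associated with anything in the database, the name can be any string that the cooperating threads and processes agree on.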

DB Methods

DB(dbEnv=None, flags=0)
Constructor. More info...
append(data, txn=None)
A convenient version of put() that can be used for Recno or Queue databases. The DB_APPEND flag is automatically used, and the record number is returned. More info...
associate(secondaryDB, callback, flags=0)
Used to associate secondaryDB to act as a secondary index for this (primary) database. The callback parameter should be a reference to a Python callable object that will construct and return the secondary key or DB_DONOTINDEX if the item should not be indexed. The parameters the callback will receive are the primaryKey and primaryData values. More info...
close(flags=0)
Flushes cached data and closes the database More info...
consume(txn=None, flags=0)
For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. More info...
consume_wait(txn=None, flags=0)
For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. If the Queue database is empty, the thread of control will wait until there is data in the queue before returning. More info...
cursor(txn=None, flags=0)
Creates a cursor on the DB and returns a DBCursor object. If a transaction is passed then the cursor can only be used within that transaction, and you must be sure to close the cursor before committing the transaction. More info...
delete(key, txn=None, flags=0)
Removes a key/data pair from the database More info...
fd()
Returns a file descriptor for the database More info...
get(key, default=None, txn=None, flags=0, dlen=-1, doff=-1)
Returns the data object associated with key. If key is an integer then the DB_SET_RECNO flag is automatically set for BTree databases and the actual key and the data value are returned as a tuple. If default is given then it is returned if the key is not found in the database. Partial records can be read using dlen and doff, however be sure to not read beyond the end of the actual data or you may get garbage. More info...
get_both(key, data, txn=None, flags=0)
A convenient version of get() that automatically sets the DB_GET_BOTH flag, and which will be successful only if both the key and data value are found in the database. (Can be used to verify the presence of a record in the database when duplicate keys are allowed.) More info...
get_byteswapped()
May be used to determine if the database was created on a machine with the same endianness as the current machine. More info...
get_size(key, txn=None)
Return the size of the data object associated with key.
get_type()
Return the database's access method type More info...
join(cursorList, flags=0)
Create and return a specialized cursor for use in performing joins on secondary indices More info...
key_range(key, txn=None, flags=0)
Returns an estimate of the proportion of keys that are less than, equal to and greater than the specified key. More info...
open(filename, dbname=None, dbtype=DB_UNKNOWN, flags=0, mode=0660)
Opens the database named dbname in the file named filename. The dbname argument is optional and allows applications to have multiple logical databases in a single physical file. It is an error to attempt to open a second database in a file that was not initially created using a database name. In-memory databases never intended to be shared or preserved on disk may be created by setting both the filename and dbname arguments to None. More info...
put(key, data, txn=None, flags=0, dlen=-1, doff=-1)
Stores the key/data pair in the database. If the DB_APPEND flag is used and the database is using the Recno or Queue access method then the record number allocated to the data is returned. Partial data objects can be written using dlen and doff. More info...
remove(filename, dbname=None, flags=0)
Remove a database More info...
rename(filename, dbname, newname, flags=0)
Rename a database More info...
set_bt_minkey(minKeys)
Set the minimum number of keys that will be stored on any single BTree page More info...
set_cachesize(gbytes, bytes, ncache=0)
Set the size of the database's shared memory buffer pool More info...
set_get_returns_none(flag):
Controls what get and related methods do when a key is not found.

See the DBEnv set_get_returns_none documentation.

The previous setting is returned.

set_flags(flags)
Set additional flags on the database before opening. More info...
set_h_ffactor(ffactor)
Set the desired density within the hash table More info...
set_h_nelem(nelem)
Set an estimate of the final size of the hash table More info...
set_lorder(lorder)
Set the byte order for integers in the stored database metadata. More info...
set_pagesize(pagesize)
Set the size of the pages used to hold items in the database, in bytes. More info...
set_re_delim(delim)
Set the delimiting byte used to mark the end of a record in the backing source file for the Recno access method. More info...
set_re_len(length)
For the Queue access method, specify that the records are of length length. For the Recno access method, specify that the records are fixed-length, not byte delimited, and are of length length. More info...
set_re_pad(pad)
Set the padding character for short, fixed-length records for the Queue and Recno access methods. More info...
set_re_source(source)
Set the underlying source file for the Recno access method More info...
set_q_extentsize(extentsize)
Set the size of the extents used to hold pages in a Queue database, specified as a number of pages. Each extent is created as a separate physical file. If no extent size is set, the default behavior is to create only a single underlying database file. More info...
stat(flags=0)
Return a dictionary containing database statistics with the following keys.

For Hash databases:

  • magic: Magic number that identifies the file as a Hash database.
  • version: Version of the Hash database.
  • nkeys: Number of unique keys in the database.
  • ndata: Number of key/data pairs in the database.
  • pagesize: Underlying Hash database page (& bucket) size.
  • nelem: Estimated size of the hash table specified at database creation time.
  • ffactor: Desired fill factor (number of items per bucket) specified at database creation time.
  • buckets: Number of hash buckets.
  • free: Number of pages on the free list.
  • bfree: Number of bytes free on bucket pages.
  • bigpages: Number of big key/data pages.
  • big_bfree: Number of bytes free on big item pages.
  • overflows: Number of overflow pages (overflow pages are pages that contain items that did not fit in the main bucket page).
  • ovfl_free: Number of bytes free on overflow pages.
  • dup: Number of duplicate pages.
  • dup_free: Number of bytes free on duplicate pages.

For BTree and Recno databases:

  • magic: Magic number that identifies the file as a Btree database.
  • version: Version of the Btree database.
  • nkeys: For the Btree access method, the number of unique keys in the database. For the Recno access method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.
  • ndata: For the Btree access method, the number of key/data pairs in the database. For the Recno access method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.
  • pagesize: Underlying database page size.
  • minkey: Minimum keys per page.
  • re_len: Length of fixed-length records.
  • re_pad: Padding byte value for fixed-length records.
  • levels: Number of levels in the database.
  • int_pg: Number of database internal pages.
  • leaf_pg: Number of database leaf pages.
  • dup_pg: Number of database duplicate pages.
  • over_pg: Number of database overflow pages.
  • free: Number of pages on the free list.
  • int_pgfree: Number of bytes free in database internal pages.
  • leaf_pgfree: Number of bytes free in database leaf pages.
  • dup_pgfree: Number of bytes free in database duplicate pages.
  • over_pgfree: Number of bytes free in database overflow pages.

For Queue databases:

  • magic: Magic number that identifies the file as a Queue database.
  • version: Version of the Queue file type.
  • nkeys: Number of records in the database.
  • ndata: Number of records in the database.
  • pagesize: Underlying database page size.
  • pages: Number of pages in the database.
  • re_len: Length of the records.
  • re_pad: Padding byte value for the records.
  • pgfree: Number of bytes free in database pages.
  • start: Start offset.
  • first_recno: First undeleted record in the database.
  • cur_recno: Last allocated record number in the database.

More info...

sync(flags=0)
Flushes any cached information to disk More info...
truncate(txn=None, flags=0)
Empties the database, discarding all records it contains. The number of records discarded from the database is returned. More info...
upgrade(filename, flags=0)
Upgrades all of the databases included in the file filename, if necessary. More info...
verify(filename, dbname=None, outfile=None, flags=0)
Verifies the integrity of all databases in the file specified by the filename argument, and optionally outputs the databases' key/data pairs to a file. More info...
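
The callback contract described for associate() above (receive the primary key and primary data, return the secondary key or DB_DONOTINDEX) can be sketched in plain Python. Everything here is illustrative: make_key_extractor is a hypothetical helper, the "|"-separated record layout is an assumption made up for the example, and the do-not-index marker is passed in by the caller rather than taken from the module.

```python
def make_key_extractor(field_index, do_not_index, sep=b"|"):
    """Build a callback suitable for passing to DB.associate().

    field_index  -- which sep-separated field of the data is the key
    do_not_index -- value to return for records lacking that field,
                    e.g. the module's DB_DONOTINDEX constant
    sep          -- field separator (assumed record layout)
    """
    def callback(primary_key, primary_data):
        fields = primary_data.split(sep)
        # Skip records where the field is missing or empty.
        if field_index >= len(fields) or not fields[field_index]:
            return do_not_index
        return fields[field_index]
    return callback
```

With bsddb3 this would presumably be used as primary.associate(secondary, make_key_extractor(1, db.DB_DONOTINDEX)), after which lookups on the secondary database go through the extracted field.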

DB Mapping and Compatibility Methods

These methods of the DB type are for implementing the Mapping Interface, as well as others for making a DB behave as much like a dictionary as possible. The main downside to using a DB as a dictionary is you are not able to specify a transaction object.

DB_length() [ usage: len(db) ]
Return the number of key/data pairs in the database.
DB_subscript(key) [ usage: db[key] ]
Return the data associated with key.
DB_ass_sub(key, data) [ usage: db[key] = data ]
Assign or update a key/data pair, or delete a key/data pair if data is NULL.
keys(txn=None)
Return a list of all keys in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
items(txn=None)
Return a list of tuples of all key/data pairs in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
values(txn=None)
Return a list of all data values in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
has_key(key, txn=None)
Returns true if key is present in the database.
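
One practical effect of the mapping interface is that code written against an ordinary dictionary can be handed a DB instead. The helper below is a hypothetical sketch that uses only subscript access and KeyError, so it can be tried with a plain dict; storing the counter as ASCII bytes is an assumption about how one might encode values, not something the module requires.

```python
def increment(store, key):
    """Add one to an integer counter stored under key.

    store -- any mapping supporting store[key] and store[key] = value,
             e.g. a dict or (presumably) an opened DB object
    Returns the new counter value.
    """
    try:
        current = int(store[key])
    except KeyError:
        # A missing key surfaces as KeyError for dicts; DBNotFoundError
        # derives from KeyError, so the same handler would apply.
        current = 0
    current += 1
    store[key] = str(current).encode("ascii")
    return current
```

As noted above, no transaction object can be specified through this interface, so a read-modify-write like this one is not protected by a transaction.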

DBCursor Methods

close()
Discards the cursor. If the cursor is created within a transaction then you must be sure to close the cursor before committing the transaction. More info...
count(flags=0)
Returns a count of the number of duplicate data items for the key referenced by the cursor. More info...
delete(flags=0)
Deletes the key/data pair currently referenced by the cursor. More info...
dup(flags=0)
Create a new cursor More info...
put(key, data, flags=0, dlen=-1, doff=-1)
Stores the key/data pair into the database. Partial data records can be written using dlen and doff. More info...
get(flags, dlen=-1, doff=-1)
See get(key, data, flags, dlen=-1, doff=-1) below.
get(key, flags, dlen=-1, doff=-1)
See get(key, data, flags, dlen=-1, doff=-1) below.
get(key, data, flags, dlen=-1, doff=-1)
Retrieves key/data pairs from the database using the cursor. All the specific functionalities of the get method are actually provided by the various methods below, which are the preferred way to fetch data using the cursor. These generic interfaces are only provided as an inconvenience. Partial data records are returned if dlen and doff are used in this method and in many of the specific methods below. More info...

DBCursor Get Methods

These DBCursor methods are all wrappers around the get() function in the C API.

current(flags=0, dlen=-1, doff=-1)
Returns the key/data pair currently referenced by the cursor. More info...
get_current_size()
Returns length of the data for the current entry referenced by the cursor.
first(flags=0, dlen=-1, doff=-1)
Position the cursor to the first key/data pair and return it. More info...
last(flags=0, dlen=-1, doff=-1)
Position the cursor to the last key/data pair and return it. More info...
next(flags=0, dlen=-1, doff=-1)
Position the cursor to the next key/data pair and return it. More info...
prev(flags=0, dlen=-1, doff=-1)
Position the cursor to the previous key/data pair and return it. More info...
consume(flags=0)
For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue.

NOTE: This method is deprecated in Berkeley DB version 3.2 in favor of the new consume method in the DB class.

get_both(key, data, flags=0)
Like set() but positions the cursor to the record matching both key and data. (An alias for this is set_both, which makes more sense to me...) More info...
get_recno()
Return the record number associated with the cursor. The database must use the BTree access method and have been created with the DB_RECNUM flag. More info...
join_item()
For cursors returned from the DB.join method, returns the combined key value from the joined cursors. More info...
next_dup(flags=0, dlen=-1, doff=-1)
If the next key/data pair of the database is a duplicate record for the current key/data pair, the cursor is moved to the next key/data pair of the database, and that pair is returned. More info...
next_nodup(flags=0, dlen=-1, doff=-1)
The cursor is moved to the next non-duplicate key/data pair of the database, and that pair is returned. More info...
prev_nodup(flags=0, dlen=-1, doff=-1)
The cursor is moved to the previous non-duplicate key/data pair of the database, and that pair is returned. More info...
set(key, flags=0, dlen=-1, doff=-1)
Move the cursor to the specified key in the database and return the key/data pair found there. More info...
set_range(key, flags=0, dlen=-1, doff=-1)
Identical to set() except that in the case of the BTree access method, the returned key/data pair is the smallest key greater than or equal to the specified key (as determined by the comparison function), permitting partial key matches and range searches. More info...
set_recno(recno, flags=0, dlen=-1, doff=-1)
Move the cursor to the specific numbered record of the database, and return the associated key/data pair. The underlying database must be of type Btree and it must have been created with the DB_RECNUM flag. More info...
set_both(key, data, flags=0)
See get_both(). The only difference in behaviour can be disabled using set_get_returns_none(2). More info...
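
The partial key matching that set_range() permits on a BTree can be turned into a simple prefix scan. This is a sketch under stated assumptions: iter_prefix is a hypothetical helper, the keys are byte strings in the default sort order, and the cursor returns None when nothing is found (set_get_returns_none value 2, described earlier); with exceptions enabled the loop would need a try/except instead.

```python
def iter_prefix(cursor, prefix):
    """Yield (key, data) pairs whose key starts with prefix.

    cursor -- object with set_range(key) and next(), e.g. a DBCursor
              on a BTree database (assumption)
    """
    # set_range positions on the smallest key >= prefix, which is the
    # first candidate match in sorted order.
    rec = cursor.set_range(prefix)
    while rec is not None and rec[0].startswith(prefix):
        yield rec
        rec = cursor.next()
```

The scan stops at the first key that no longer starts with the prefix, so it touches only the matching range plus one record.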

DBTxn Methods

abort()
Aborts the transaction More info...
commit(flags=0)
Ends the transaction, committing any changes to the databases. More info...
id()
The txn_id function returns the unique transaction id associated with the specified transaction. More info...
prepare(gid)
Initiates the beginning of a two-phase commit. Beginning with BerkeleyDB 3.3 a global identifier parameter is required, which is a value unique across all processes involved in the commit. It must be a string of DB_XIDDATASIZE bytes. More info...
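
Since the identifier passed to prepare() must have an exact length, a small padding helper can be handy. The following is only a sketch: make_gid is a hypothetical name, the required size is supplied by the caller (with bsddb3 presumably from the module's DB_XIDDATASIZE constant), and making the label unique across all participating processes remains the caller's responsibility.

```python
def make_gid(label, size):
    """Pad label with NUL bytes to exactly `size` bytes.

    label -- text identifying the global transaction (must be unique
             across all processes involved; not checked here)
    size  -- required identifier length in bytes, supplied by the caller
    """
    raw = label.encode("utf-8")
    if len(raw) > size:
        raise ValueError("label is longer than %d bytes" % size)
    return raw.ljust(size, b"\0")
```

A call such as txn.prepare(make_gid("order-42", size)) would then always hand over an identifier of the expected length.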

DBLock

The DBLock objects have no methods or attributes. They are just opaque handles to the lock in question.


Document Version: $Id: bsddb3.html,v 1.1.1.1 2004/06/21 18:10:43 vk Exp $

This is a StructuredTextNG document.