Introduction to databases

Concepts

flat-file database

Databases that can only contain one table

Multi-valued

http://www.attcanada.net/~kallal.msn/Articles/fog0000000006.html

ISAM

Indexed Sequential Access Method

key/value ("hashing")

""relational" is the key feature of relational databases: they are interesting not because of a given element (i.e. row in a table) on its own but because of the relations between rows and tables. If you don't need relations, then there's no reason to use a relational DB. The whole point is to use multiple tables ;) Otherwise you just have a flat file."

... or has engines like SleepyCat :-)

relational

object-oriented

XML

Keys

Primary, foreign, referential integrity

Indexes

Views

Normalization

First, second, third normal forms

Transactions

Ie. bunch of SQL instructions that sit between a BEGIN and COMMIT statements

Tuning

SQL

Commands

Triggers

Stored procedures

Atomic transactions, commit/rollback

Pessimistic/optimistic locking

Cursor

Joins

Inner (symetric, ie. returns all the rows common to the tables used; inner join, cross join, equijoin), outer join (asymetric, ie. only some of the rows are returned; outer join, left join, right join), self join (using only one table, and creating temporary aliases)

APIs

MDAC

(From MS) The Microsoft Data Access Components are the key technologies that enable Universal Data Access. They include ActiveX Data Objects (ADO), Remote Data Service (RDS, formerly known as Advanced Data Connector or ADC), OLE DB, and Open Database Connectivity (ODBC).

UDA

Universal Data Access. Here's the UDA architecture as seen on MS site:

OLE DB + ADO

OLE DB is a set of COM interfaces specific to access data sources; OLE DB providers are usually accessed through ADO, which encapsulates OLE DB to expose a simple interface through methods, properties and events.

RDS

(MS) Remote Data Service (RDS) is a feature of ADO. RDS delivers a new Web data access technology that allows developers to create data-centric applications within ActiveX®-enabled browsers such as Microsoft® Internet Explorer.

DAO/RDO

Here's a comparison of OLEDB-ADO and DAO-RDO as seen on MS site:

?????

ODBC

(MS) Open Database Connectivity (ODBC) is a widely accepted application programming interface (API) for database access. It is based on the Call-Level Interface (CLI) specifications from X/Open and ISO/IEC for database APIs and uses Structured Query Language (SQL) as its database access language.

DSN = Data Source Name, ie. name given in ODBC to a connection profile. User DSN = user-specific connections; System DSN = system-wide connections, ie. any user who logs on to this computer can use those DSN's; File DSN = access to DSN definition files (*.DSN)?; ODBC Drivers = list of drivers installed

ODBC configuration done through Control Panel.

JDBC

Implementing concurrency

If your database doesn't handle concurrent accesses, you'll have to implement this in your application. Three possibilities are available:

From optimistic locking:

Anyway... back to the specific question... the way I've done it in the past is to store the timestamp in the Value and View objects (you can store the timestamp in a hidden field on the jsp). Then when you save the object you do:
 
update table foo set x=? where id=? and timestamp=?
 
Check the return to see how many rows where affected. If non were affected then you basically have a stale update (either the timestamp was changed in the interim or someone deleted the row out from under you). I've used triggers or explicit setting of the timestamp in the past. Note that it does not even need to be a real timestamp... it could be a verion number too. In one application we were keeping audit trails for all changes and we decided to use a sequence instead of a timestamp.

From VFP 8.0 and Sql Server: implementing pessimistic locking:

Because we're still not sure of how to best implement sql server locking with a VFP front end, my boss has suggested a customzied locking design. His idea is that VFP would write to a central "lock" table when a user decides to modify a record. Then if another user comes in and tries to modify the same record, his session would check the contents of the central "lock" table to see if the first user is already modifying the same record.

From Code Charge:

From Handling Concurrency Issues in .NET by By Wayne Plourde

The only way to prevent concurrency errors is to lock the records that are being edited. There are two basic approaches towards locking - optimistic and pessimistic. One would consider an optimistic locking scenario when the likelihood of a concurrency condition is low. This is usually the case in systems where the activity is primarily additive, like an order entry system. On the other hand, one would consider pessimistic locking when the likelihood of a concurrency condition is high. This is usually true of management or workflow-oriented systems. [...]
 
One of the drawbacks of using pessimistic locks is that locks can be set then forgotten. Therefore, you may need to provide some way for the application to recover so that the entity in question can be accessed again. One approach is to allow locks to timeout. In this case, you would need to provide a DateTime data field to track when the lock was set. Then if the lock is older then some predetermined amount of time, you would ignore it, allowing users to access the record for updates.
 
Another approach is providing the ability to define permissions to allow an administrator to override or clear previously set locks. This may be especially important if you are not providing timeouts.

To read:

Embedded databases

Embedded means that the database component should ideally live in a single DLL containing both the client and server parts and save its data in the filesystem. Obviously, this means that the database server offers no data protection from external access, either locally or through the network. In other words, those lightweight solutions that offer no server process are meant for single-user usage since the only protection comes from the OS.

SQL

Ocelot

http://www.ocelot.ca

MySQL

FireBird

Speedy

hSQLDB

SQLite

A free C library that implements a DBMS. Comes with a standalone command-line access program that can be used to administer the database. Available here. Doesn't provide a server process to allow safe simultaneous write access, but relies on the file system locking mechanisme instead. Very fast when used with transactions, on par with MySQL when used in asynchronous mode, much slower than MySQL and PostgreSQL when used in synchronous mode. A mailing list is available at Yahoo : http://groups.yahoo.com/group/sqlite/

To import tab-delimited records (ie. CSV):

  1. sqlite.exe mydb
  2. create table test (i integer primary key, name);
  3. copy test from 'import.txt';

If the CVS data don't have a primary key, here's a trick to import those into SQLite:

  1. sqlite.exe mydb
  2. create temporary table table1 (...);
  3. copy table1 from 'import.txt';
  4. create table table2 (keyfield INTEGER PRIMARY KEY, ...);
  5. begin;
  6. insert into table2 select null, * from table1;
  7. commit;

In case you need reports, ie. long lock on a DB, you could create a temporary table, and read from this database instead (http://groups.yahoo.com/group/sqlite/message/3565).

CQL++

TurboPower FlashFiler

Maverick

Hashing, non-SQL

db.*

ZODB

My little base

http://www.mylittlebase.org : Delphi or C++

EZTree        

http://ezgui.com/

It is up to you the programmer, to make sure that you always pass your record data with the same length for each record and the same length for the Index Key.

Cheetah

Light, closed-source DBMS from Planet Squires.

XDB        

http://linux.techass.com/projects/xdb/ : C++

BerkeleyDB

Open-source. Generates hashed files, hence poor man's storage system. Available here.

Tsunami

Light (one DLL), closed-source solution available here.

DvBTree

Available here.

CDB

MetaKit

http://www.equi4.com/metakit/ : C++, Python, Tcl

DynDB

http://www.ohse.de/uwe/dyndb.html

Goods

PureDB

http://www.pureftpd.org/puredb/

Free Database Engines

TurboPower FlashFiler

Speedy

GNU SQL server

Adabase PE

PostGreSQL

Open-source solution available here.

SAP DB

http://www.sapdb.org/

Borland Interbase

An open-source version is available here, here and here.

FireBird Interbase

Enhanced version of Interbase. Available here. Because Borland's open source efforts regarding InterBase never really took off beyond prime release of the source code and the company returned its focus to closed commercial development, Firebird became THE Open Source version of InterBase. Yet more information here.

Jet

A complete ISAM relational database engine that is 9/10ths of Microsoft Access, for free. it's preinstalled on Windows 2000. Does not scale. General-purpose interface.

MSDE

Scaled down version of the SQL 7.0 engine, royalty free, and optimized for up to 5 concurrent users

MySQL

Open-source solution available here

MDBMS

OpenIsis

http://openisis.org

Commercial Database Engines

Besides the usual suspects (Oracle, DB2, Sybase SQL Server, MS SQL Server, etc.)...

Advantage

mSQL

CodeBase

Available here.

Informix SE

http://www-3.ibm.com/software/data/informix/se/

APIs

DAO

Designed to access the Jet database. Can't retrieve non-relational data. Being superseded by ADO, and ADO.Net.

RDO

ADO

Enhancement to DAO. Requires ADO drivers. Used to access SQL DBMS's. Unlike ODBC, can access both relational and non-relational data.

OLEDB

Used by ADO to access actual data.

ODBC

Limited to relational data.

Q&A

What is MDAC used for?

Basic database-related stuff that you need to install in Windows before installing such and such database-specific ODBC driver?

Temp stuff

From: Jon Rista (jrista@hotmail.com)
Subject: Re: Should ODBC be used ?
 
ADO is faster if you use it properly. There are lots of little "things" to be aware of when using ADO, such as lock type, cursor location and type, etc. Proper combinations of different ADO settings can result in very good performance.
 
The best results for SQL Server come when you use the native OLEDB provider for it. When you use the OLEDB provider for ODBC, you add at least one extra layer of abstraction between your presentation tier and the data itself. This decreases performance. Whenever possible, you want to use the native OLEDB provider for SQL Server, as it provides direct access to the database, no intermediate layers (other than ADO) are required. If you use a properly configured connection with a database with proper indexes and constraints, SQL Server 2000 can be faster than Oracle, and significantly faster than IBM DB2.

Resources