SQL Expression Language Tutorial
The SQLAlchemy Expression Language presents a system of representing relational database structures and expressions using Python constructs. These constructs are modeled to resemble those of the underlying database as closely as possible, while providing a modicum of abstraction of the various implementation differences between database backends. While the constructs attempt to represent equivalent concepts between backends with consistent structures, they do not conceal useful concepts that are unique to particular subsets of backends. The Expression Language therefore presents a method of writing backend-neutral SQL expressions, but does not attempt to enforce that expressions are backend-neutral.
The Expression Language is in contrast to the Object Relational Mapper, which is a distinct API that builds on top of the Expression Language. Whereas the ORM, introduced in Object Relational Tutorial, presents a high level and abstracted pattern of usage, which itself is an example of applied usage of the Expression Language, the Expression Language presents a system of representing the primitive constructs of the relational database directly without opinion.
While there is overlap among the usage patterns of the ORM and the Expression Language, the similarities are more superficial than they may at first appear. One approaches the structure and content of data from the perspective of a user-defined domain model which is transparently persisted and refreshed from its underlying storage model. The other approaches it from the perspective of literal schema and SQL expression representations which are explicitly composed into messages consumed individually by the database.
A successful application may be constructed using the Expression Language exclusively, though the application will need to define its own system of translating application concepts into individual database messages and from individual database result sets. Alternatively, an application constructed with the ORM may, in advanced scenarios, make occasional usage of the Expression Language directly in certain areas where specific database interactions are required.
A quick check to verify the version of SQLAlchemy:
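A minimal sketch of such a check, assuming only that the package is importable as sqlalchemy:

```python
import sqlalchemy

# The installed release is exposed as a plain string attribute.
version = sqlalchemy.__version__
print(version)
```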
For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine():
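The connection step might look like the following sketch. The URL "sqlite:///:memory:" is the standard SQLite form for a private in-memory database; the variable name engine is simply the name used in the rest of this text.

```python
from sqlalchemy import create_engine

# echo=True routes every emitted SQL statement to Python's logging module.
# No DBAPI connection is opened yet; the engine connects lazily.
engine = create_engine("sqlite:///:memory:", echo=True)
print(engine.dialect.name)
```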
The echo flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python’s standard logging module. With it enabled, we’ll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False.
The first time a method like Engine.execute() or Engine.connect() is called, the Engine establishes a real DBAPI connection to the database, which is then used to emit the SQL.
Define and Create Tables
The SQL Expression Language constructs its expressions in most cases against table columns. In SQLAlchemy, a column is most often represented by an object called Column, and in all cases a Column is associated with a Table. A collection of Table objects and their associated child objects is referred to as database metadata. In this tutorial we will explicitly lay out several Table objects, but note that SQLAlchemy can also “import” whole sets of Table objects automatically from an existing database (this process is called table reflection).
We define our tables all within a catalog called MetaData, using the Table construct, which resembles regular SQL CREATE TABLE statements. We’ll make two tables, one of which represents “users” in an application, and another which represents zero or more “email addresses” for each row in the “users” table:
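A sketch of the two table definitions follows. The column names and types (id, name, fullname, user_id, email_address) are assumptions inferred from how the tables are used later in this text, not a verbatim copy of the original listing.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table

metadata = MetaData()

# One row per application user.
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

# Zero or more email addresses per user, linked through a foreign key.
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

print(sorted(metadata.tables))
```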
All about how to define Table objects, as well as how to create them from an existing database automatically, is described in Describing Databases with MetaData.
Next, to tell the MetaData we’d actually like to create our selection of tables for real inside the SQLite database, we use create_all(), passing it the engine instance which points to our database. This will check for the presence of each table first before creating, so it’s safe to call multiple times:
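As a rough illustration, the sketch below creates the tables twice to show that the second call is harmless, then asks the database which tables exist. The table layout is the assumed one described above; inspect() is used here only to verify the result.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, create_engine, inspect,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)   # emits CREATE TABLE for both tables
metadata.create_all(engine)   # tables already exist, so nothing is created

table_names = sorted(inspect(engine).get_table_names())
print(table_names)
```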
The first SQL expression we’ll create is the Insert construct, which represents an INSERT statement. This is typically created relative to its target table:
To see a sample of the SQL this construct produces, use the str() function:
Notice above that the INSERT statement names every column in the users table. This can be limited by using the values() method, which establishes the VALUES clause of the INSERT explicitly:
Above, while the values method limited the VALUES clause to just two columns, the actual data we placed in values didn’t get rendered into the string; instead we got named bind parameters. As it turns out, our data is stored within our Insert construct, but it typically only comes out when the statement is actually executed; since the data consists of literal values, SQLAlchemy automatically generates bind parameters for them. We can peek at this data for now by looking at the compiled form of the statement:
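The behaviour described in the last few paragraphs can be sketched as follows, assuming the users table layout used throughout this text. No database is needed, since the statements are only stringified and compiled.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

# A bare insert() names every column of the table.
ins = users.insert()
full_sql = str(ins)

# values() restricts the VALUES clause to the columns given.
ins = users.insert().values(name="jack", fullname="Jack Jones")
limited_sql = str(ins)

# The literal data lives on the compiled form as bind parameter values.
params = dict(ins.compile().params)

print(full_sql)
print(limited_sql)
print(params)
```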
The interesting part of an Insert is executing it. In this tutorial, we will generally focus on the most explicit method of executing a SQL construct, and later touch upon some “shortcut” ways to do it. The engine object we created is a repository for database connections capable of issuing SQL to the database. To acquire a connection, we use the connect() method:
The Connection object represents an actively checked out DBAPI connection resource. Let’s feed it our Insert object and see what happens:
The INSERT statement has now been issued to the database, although the output shows positional “qmark” bind parameters instead of “named” bind parameters. Why? Because when executed, the Connection used the SQLite dialect to help generate the statement; when we use the str() function, the statement isn’t aware of this dialect, and falls back onto a default which uses named parameters. We can view this manually as follows:
What about the result variable we got when we called execute()? As the SQLAlchemy Connection object references a DBAPI connection, the result, known as a ResultProxy object, is analogous to the DBAPI cursor object. In the case of an INSERT, we can get important information from it, such as the primary key values which were generated from our statement using ResultProxy.inserted_primary_key:
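The steps from acquiring a connection to reading the generated key might be sketched like this. Compiling against sqlite.dialect() is one way to see the dialect-specific string without a connection; the table layout is the assumed one from earlier.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine
from sqlalchemy.dialects import sqlite

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

ins = users.insert().values(name="jack", fullname="Jack Jones")

# Compiled for SQLite, the statement uses positional "qmark" parameters.
sqlite_sql = str(ins.compile(dialect=sqlite.dialect()))

with engine.connect() as conn:
    result = conn.execute(ins)
    # A sequence with one entry per primary key column.
    new_pk = list(result.inserted_primary_key)

print(sqlite_sql)
print(new_pk)
```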
The value of 1 was automatically generated by SQLite, but only because we did not specify the id column in our Insert statement; otherwise, our explicit value would have been used. In either case, SQLAlchemy always knows how to get at a newly generated primary key value, even though the method of generating them is different across different databases; each database’s Dialect knows the specific steps needed to determine the correct value (or values; note that ResultProxy.inserted_primary_key returns a list so that it supports composite primary keys). Methods here range from using cursor.lastrowid, to selecting from a database-specific function, to using INSERT..RETURNING syntax; this all occurs transparently.
Executing Multiple Statements
Our insert example above was intentionally a little drawn out to show various behaviors of expression language constructs. In the usual case, an Insert statement is compiled against the parameters sent to the execute() method on Connection, so that there’s no need to use the values keyword with Insert. Let’s create a generic Insert statement again and use it in the “normal” way:
Above, because we specified all three columns in the execute() method, the compiled Insert included all three columns. The Insert statement is compiled at execution time based on the parameters we specified; if we specified fewer parameters, the Insert would have fewer entries in its VALUES clause.
To issue many inserts using DBAPI’s executemany() method, we can send in a list of dictionaries each containing a distinct set of parameters to be inserted, as we do here to add some email addresses:
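A sketch of both invocation styles follows. Parameters are passed as a dictionary (one row) or a list of dictionaries (executemany), a form accepted across SQLAlchemy releases; the sample names and addresses are illustrative data only.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, create_engine,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

with engine.connect() as conn:
    # Single row: the VALUES clause is derived from the keys supplied.
    conn.execute(users.insert(), {"id": 1, "name": "jack", "fullname": "Jack Jones"})
    conn.execute(users.insert(), {"id": 2, "name": "wendy", "fullname": "Wendy Williams"})

    # Many rows: every dictionary carries the same set of keys.
    conn.execute(addresses.insert(), [
        {"user_id": 1, "email_address": "jack@yahoo.com"},
        {"user_id": 1, "email_address": "jack@msn.com"},
        {"user_id": 2, "email_address": "www@www.org"},
        {"user_id": 2, "email_address": "wendy@aol.com"},
    ])

    address_ids = sorted(row[0] for row in conn.execute(addresses.select()))

print(address_ids)
```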
Above, we again relied upon SQLite’s automatic generation of primary key identifiers for each addresses row.
When executing multiple sets of parameters, each dictionary must have the same set of keys; i.e. you can’t have fewer keys in some dictionaries than others. This is because the Insert statement is compiled against the first dictionary in the list, and it’s assumed that all subsequent argument dictionaries are compatible with that statement.
The “executemany” style of invocation is available for each of the insert(), update() and delete() constructs.
We began with inserts just so that our test database had some data in it. The more interesting part of the data is selecting it! We’ll cover UPDATE and DELETE statements later. The primary construct used to generate SELECT statements is the select() function:
Above, we issued a basic select() call, placing the users table within the COLUMNS clause of the select, and then executing. SQLAlchemy expanded the users table into the set of each of its columns, and also generated a FROM clause for us. The result returned is again a ResultProxy object, which acts much like a DBAPI cursor, including methods such as fetchone() and fetchall(). The easiest way to get rows from it is to just iterate:
Above, we see that printing each row produces a simple tuple-like result. We have more options at accessing the data in each row. One very common way is through dictionary access, using the string names of columns:
Integer indexes work as well:
But another way, whose usefulness will become apparent later on, is to use the Column objects directly as keys:
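The selection and row-access styles above might look like the sketch below. It assumes SQLAlchemy 1.4 or later, where select() takes its columns positionally and keyed access goes through row._mapping; on the 1.2 series described here the equivalent spellings are select([users]), row['name'] and row[users.c.name].

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    result = conn.execute(select(users).order_by(users.c.id))
    rows = [row for row in result]        # iterating yields tuple-like rows

first = rows[0]
by_index = first[1]                       # integer index
by_attr = first.name                      # attribute access
by_name = first._mapping["name"]          # string key
by_col = first._mapping[users.c.name]     # Column object as key

print(tuple(first), by_index, by_attr, by_name, by_col)
```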
Result sets which have pending rows remaining should be explicitly closed before discarding. While the cursor and connection resources referenced by the ResultProxy will be respectively closed and returned to the connection pool when the object is garbage collected, it’s better to make it explicit as some database APIs are very picky about such things:
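A small sketch of an explicit close, assuming the same users table; only one of the two rows is fetched, so the result still has a pending row when it is closed.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    result = conn.execute(users.select())
    first = result.fetchone()   # one row consumed, one still pending
    result.close()              # release the cursor explicitly

print(tuple(first))
```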
If we’d like to more carefully control the columns which are placed in the COLUMNS clause of the select, we reference individual Column objects from our Table. These are available as named attributes off the c attribute of the Table object:
Let’s observe something interesting about the FROM clause. Whereas the generated statement contains two distinct sections, a “SELECT columns” part and a “FROM table” part, our select() construct only has a list containing columns. How does this work? Let’s try putting two tables into our select() statement:
It placed both tables into the FROM clause. But also, it made a real mess. Those who are familiar with SQL joins know that this is a Cartesian product; each row from the users table is produced against each row from the addresses table. So to put some sanity into this statement, we need a WHERE clause. We do that using Select.where():
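The three statements discussed here (individual columns, two tables without a criterion, and the same with a WHERE clause) can be sketched as below. This assumes the SQLAlchemy 1.4+ calling style with positional arguments; the 1.2 series takes the same items as a list. Only the restricted statement is executed.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, create_engine, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

# Individual columns come from the .c collection of the Table.
names_only = select(users.c.name, users.c.fullname)

# Two tables and no criterion: both land in FROM (a Cartesian product).
cartesian = select(users, addresses)

# The WHERE clause relates the rows of the two tables.
related = select(users, addresses).where(users.c.id == addresses.c.user_id)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)
with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    conn.execute(addresses.insert(), [
        {"user_id": 1, "email_address": "jack@yahoo.com"},
        {"user_id": 1, "email_address": "jack@msn.com"},
        {"user_id": 2, "email_address": "www@www.org"},
        {"user_id": 2, "email_address": "wendy@aol.com"},
    ])
    related_rows = conn.execute(related).fetchall()

print(" ".join(str(related).split()))
```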
That looks a lot better: we added an expression to our select() which had the effect of adding WHERE users.id = addresses.user_id to our statement, and our results were narrowed down so that the join of users and addresses rows made sense. But let’s look at that expression. It’s using just a Python equality operator between two different Column objects. It should be clear that something is up. Saying 1 == 1 produces True, and 1 == 2 produces False, not a WHERE clause. So let’s see exactly what that expression is doing:
Wow, surprise! This is neither a True nor a False. Well, what is it?
As you can see, the == operator is producing an object that is very much like the Insert and select() objects we’ve made so far, thanks to Python’s __eq__() special method; you call str() on it and it produces SQL. By now, one can see that everything we are working with is ultimately the same type of object. SQLAlchemy terms the base class of all of these expressions as ColumnElement.
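A short sketch of this inspection, assuming the table layout used throughout; no database is involved.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
from sqlalchemy.sql.expression import ColumnElement

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
)

# == between two Column objects builds an expression object, not a bool.
expr = users.c.id == addresses.c.user_id

expr_sql = str(expr)
is_column_element = isinstance(expr, ColumnElement)

print(type(expr).__name__, expr_sql)
```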
Since we’ve stumbled upon SQLAlchemy’s operator paradigm, let’s go through some of its capabilities. We’ve seen how to equate two columns to each other:
If we use a literal value (that is, a plain Python value rather than a SQLAlchemy clause object), we get a bind parameter:
The 7 literal is embedded in the resulting ColumnElement; we can use the same trick we did with the Insert object to see it:
Most Python operators, as it turns out, produce a SQL expression here, like equals, not equals, etc.:
If we add two integer columns together, we get an addition expression:
Interestingly, the type of the Column is important! If we use + with two string based columns (recall we put types like Integer and String on our Column objects at the beginning), we get something different:
Where || is the string concatenation operator used on most databases. But not all of them. MySQL users, fear not:
The above illustrates the SQL that’s generated for an Engine that’s connected to a MySQL database; the || operator now compiles as MySQL’s concat() function.
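The operator behaviours walked through above can be collected into one sketch. It assumes the same table layout; the MySQL string is produced by compiling against mysql.dialect(), which does not require a MySQL driver or server.

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table
from sqlalchemy.dialects import mysql

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
)

col_to_col = str(users.c.id == addresses.c.user_id)

# A plain Python value becomes a bind parameter.
literal_expr = users.c.id == 7
literal_sql = str(literal_expr)
literal_params = dict(literal_expr.compile().params)

not_equal = str(users.c.id != 7)
is_null = str(users.c.name == None)  # noqa: E711 (renders IS NULL)

# + depends on the column type: arithmetic for Integer, concatenation for String.
int_add = str(users.c.id + addresses.c.id)
str_concat = str(users.c.name + users.c.fullname)
mysql_concat = str((users.c.name + users.c.fullname).compile(dialect=mysql.dialect()))

print(literal_sql, literal_params)
print(int_add, "|", str_concat, "|", mysql_concat)
```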
If you have come across an operator which really isn’t available, you can always use the Operators.op() method; this generates whatever operator you need:
This function can also be used to make bitwise operators explicit. For example:
When using Operators.op(), the return type of the expression may be important, especially when the operator is used in an expression that will be sent as a result column. For this case, be sure to make the type explicit, if not what’s normally expected, using type_coerce():
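A sketch of Operators.op() and type_coerce() together. The column name somecolumn and the operator string "tiddlywinks" are purely illustrative; column() is used so that no Table is needed.

```python
from sqlalchemy import Integer, column, type_coerce

somecolumn = column("somecolumn")

# Any operator string can be rendered between the two operands.
custom = str(somecolumn.op("tiddlywinks")("foo"))

# Making a bitwise AND explicit.
bitwise = somecolumn.op("&")(0xFF)
bitwise_sql = str(bitwise)

# State the result type explicitly when the expression will be selected.
coerced = type_coerce(bitwise, Integer)

print(custom)
print(bitwise_sql, type(coerced.type).__name__)
```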
While Operators.op() is handy to get at a custom operator in a hurry, the Core supports fundamental customization and extension of the operator system at the type level. The behavior of existing operators can be modified on a per-type basis, and new operations can be defined which become available for all column expressions that are part of that particular type. See the section Redefining and Creating New Operators for a description.
We’d like to show off some of our operators inside of select() constructs. But we need to lump them together a little more, so let’s first introduce some conjunctions. Conjunctions are those little words like AND and OR that put things together. We’ll also hit upon NOT. AND, OR, and NOT are available through the corresponding and_(), or_(), and not_() functions SQLAlchemy provides (notice we also throw in a like()):
And you can also use the re-jiggered bitwise AND, OR and NOT operators, although because of Python operator precedence you have to watch your parentheses:
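Both spellings might be sketched as follows, with the same assumed tables; the email addresses are illustrative values. Note how each comparison in the operator form is wrapped in parentheses, because &, | and ~ bind more tightly than == and >.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, and_, not_, or_,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

# Function form.
with_functions = str(and_(
    users.c.name.like("j%"),
    users.c.id == addresses.c.user_id,
    or_(
        addresses.c.email_address == "wendy@aol.com",
        addresses.c.email_address == "jack@yahoo.com",
    ),
    not_(users.c.id > 5),
))

# Operator form of the same criteria.
with_operators = str(
    users.c.name.like("j%")
    & (users.c.id == addresses.c.user_id)
    & (
        (addresses.c.email_address == "wendy@aol.com")
        | (addresses.c.email_address == "jack@yahoo.com")
    )
    & ~(users.c.id > 5)
)

print(with_functions)
print(with_operators)
```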
So with all of this vocabulary, let’s select all users who have an email address at AOL or MSN, whose name starts with a letter between “m” and “z”, and we’ll also generate a column containing their full name combined with their email address. We will add two new constructs to this statement, between() and label(). between() produces a BETWEEN clause, and label() is used in a column expression to produce labels using the AS keyword; it’s recommended when selecting from expressions that otherwise would not have a name:
A shortcut to using and_() is to chain together multiple where() clauses. The above can also be written as:
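The chained-where() form of that query might look like the sketch below. It assumes the SQLAlchemy 1.4+ positional select() style (the 1.2 series wraps the column in a list) and uses illustrative sample rows.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table,
    create_engine, or_, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

stmt = (
    # label() names the concatenated expression with AS title.
    select((users.c.fullname + ", " + addresses.c.email_address).label("title"))
    .where(users.c.id == addresses.c.user_id)
    .where(users.c.name.between("m", "z"))
    .where(or_(
        addresses.c.email_address.like("%@aol.com"),
        addresses.c.email_address.like("%@msn.com"),
    ))
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)
with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    conn.execute(addresses.insert(), [
        {"user_id": 1, "email_address": "jack@yahoo.com"},
        {"user_id": 1, "email_address": "jack@msn.com"},
        {"user_id": 2, "email_address": "www@www.org"},
        {"user_id": 2, "email_address": "wendy@aol.com"},
    ])
    titles = [row[0] for row in conn.execute(stmt)]

print(titles)
```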
Using Textual SQL
Our last example really became a handful to type. Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard. That’s why SQLAlchemy lets you just use strings, for those cases when the SQL is already known and there isn’t a strong need for the statement to support dynamic features. The text() construct is used to compose a textual statement that is passed to the database mostly unchanged. Below, we create a text() object and execute it:
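A minimal sketch of executing a text() construct. The query itself is a simplified illustration, and the :x and :y names are bound by passing a dictionary to execute().

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, text,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# Named bind parameters use the :name format regardless of backend.
stmt = text(
    "SELECT users.fullname FROM users "
    "WHERE users.name BETWEEN :x AND :y"
)

with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    fullnames = [row[0] for row in conn.execute(stmt, {"x": "m", "y": "z"})]

print(fullnames)
```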
Specifying Bound Parameter Behaviors
The text() construct supports pre-established bound values using the TextClause.bindparams() method:
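For instance, values can be attached to the statement ahead of execution, as in this sketch (the query text is illustrative):

```python
from sqlalchemy import text

stmt = text("SELECT * FROM users WHERE users.name BETWEEN :x AND :y")

# bindparams() returns a copy of the statement carrying the given values.
stmt = stmt.bindparams(x="m", y="z")

bound = dict(stmt.compile().params)
print(bound)
```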
Specifying Result-Column Behaviors
We may also specify information about the result columns using the TextClause.columns() method; this method can be used to specify the return types, based on name:
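A sketch of name-based result typing, assuming the users table used throughout; the keyword names passed to columns() must match the column names the query returns.

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, text,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

engine = create_engine("sqlite:///:memory:")
metadata.create_all(engine)

# Associate a type with each named result column.
stmt = text("SELECT id, name FROM users ORDER BY id")
stmt = stmt.columns(id=Integer, name=String)

with engine.connect() as conn:
    conn.execute(users.insert(), [
        {"id": 1, "name": "jack", "fullname": "Jack Jones"},
        {"id": 2, "name": "wendy", "fullname": "Wendy Williams"},
    ])
    rows = [tuple(row) for row in conn.execute(stmt)]

print(rows)
```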
If on the other hand we used a string column key, the usual rules of name-based matching still apply, and we’d get an ambiguous column error for the id value:
It’s important to note that while accessing columns from a result set using Column objects may seem unusual, it is in fact the only system used by the ORM, which occurs transparently beneath the facade of the Query object; in this way, the TextClause.columns() method is typically very applicable to textual statements to be used in an ORM context. The example at Using Textual SQL illustrates a simple usage.
Using text() fragments inside bigger statements
text() can also be used to produce fragments of SQL that can be used freely within a select() object, which accepts text() objects as an argument for most of its builder functions. Below, we combine the usage of text() within a select() object. The select() construct provides the “geometry” of the statement, and the text() construct provides the textual content within this form. We can build a statement without the need to refer to any pre-established Table metadata:
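Such a statement might be assembled as in the sketch below, which only builds and prints the SQL. It assumes the SQLAlchemy 1.4+ positional select() style; the :x and :y names are placeholders to be bound at execution time.

```python
from sqlalchemy import and_, select, text

stmt = (
    select(text("users.fullname || ', ' || addresses.email_address AS title"))
    .where(and_(
        text("users.id = addresses.user_id"),
        text("users.name BETWEEN 'm' AND 'z'"),
        text("(addresses.email_address LIKE :x "
             "OR addresses.email_address LIKE :y)"),
    ))
    # No Table objects are involved; the FROM list is plain text too.
    .select_from(text("users, addresses"))
)

sql = " ".join(str(stmt).split())
print(sql)
```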
Using More Specific Text with table(), literal_column(), and column()
We can move our level of structure back in the other direction too, by using column(), literal_column(), and table() for some of the key elements of our statement. Using these constructs, we can get some more expression capabilities than if we used text() directly, as they provide to the Core more information about how the strings they store are to be used, but still without the need to get into full Table based metadata. Below, we also specify the String datatype for two of the key literal_column() objects, so that the string-specific concatenation operator becomes available. We also use literal_column() in order to use table-qualified expressions, e.g. users.fullname, that will be rendered as is; using column() implies an individual column name that may be quoted:
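A sketch of this middle ground, again only building the SQL string and assuming the SQLAlchemy 1.4+ positional select() style. Giving both literal_column() objects the String type is what makes + render as concatenation.

```python
from sqlalchemy import String, and_, literal_column, select, table, text

title = (
    literal_column("users.fullname", String)
    + ", "
    + literal_column("addresses.email_address", String)
).label("title")

stmt = (
    select(title)
    .where(and_(
        literal_column("users.id") == literal_column("addresses.user_id"),
        text("users.name BETWEEN 'm' AND 'z'"),
    ))
    # table() is a lightweight FROM element with no column metadata.
    .select_from(table("users"))
    .select_from(table("addresses"))
)

sql = " ".join(str(stmt).split())
print(sql)
```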
One place where we sometimes want to use a string as a shortcut is when our statement has some labeled column element that we want to refer to in a place such as the “ORDER BY” or “GROUP BY” clause; other candidates include fields within an “OVER” or “DISTINCT” clause. If we have such a label in our select() construct, we can refer to it directly by passing the string straight into select.order_by() or select.group_by(), among others. This will refer to the named label and also prevent the expression from being rendered twice:
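Referring to a label by its string name might look like this sketch, which assumes the SQLAlchemy 1.4+ positional select() style and the addresses table used throughout. The label name num_addresses is illustrative.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, func, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

stmt = (
    select(
        addresses.c.user_id,
        func.count(addresses.c.id).label("num_addresses"),
    )
    .group_by("user_id")
    # The string is matched to the label, so count(...) is not repeated.
    .order_by("user_id", "num_addresses")
)

sql = " ".join(str(stmt).split())
print(sql)
```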
The alias in SQL corresponds to a “renamed” version of a table or SELECT statement, which occurs anytime you say “SELECT .. FROM sometable AS someothername”. The AS creates a new name for the table. Aliases are a key construct as they allow any table or subquery to be referenced by a unique name. In the case of a table, this allows the same table to be named in the FROM clause multiple times. In the case of a SELECT statement, it provides a parent name for the columns represented by the statement, allowing them to be referenced relative to this name.
Since on the outside, we refer to the alias using the Alias construct itself, we don’t need to be concerned about the generated name. However, for the purposes of debugging, it can be specified by passing a string name to the FromClause.alias() method:
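A sketch of a named alias, assuming the SQLAlchemy 1.4+ positional select() style. The alias name a1 and the email value are illustrative.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, MetaData, String, Table, select,
)

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

# Passing a name fixes how the alias is rendered in SQL.
a1 = addresses.alias("a1")

stmt = (
    select(users.c.name)
    .where(users.c.id == a1.c.user_id)
    .where(a1.c.email_address == "jack@msn.com")
)

sql = " ".join(str(stmt).split())
print(sql)
```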
Aliases can of course be used for anything which you can SELECT from, including SELECT statements themselves. We can self-join the users table back to the select() we’ve created by making an alias of the entire statement. The correlate(None) directive is to avoid SQLAlchemy’s attempt to “correlate” the inner users table with the outer one:
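A rough sketch of aliasing a whole SELECT. It assumes SQLAlchemy 1.4 or later, where Select.subquery() is the spelling for turning a SELECT into a named FROM element; the inner query and the name jack_rows are simplified illustrations rather than the statement built earlier.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, select

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
    Column("fullname", String),
)

# correlate(None) keeps the inner "users" independent of the outer one.
inner = (
    select(users)
    .where(users.c.name == "jack")
    .correlate(None)
    .subquery("jack_rows")
)

# Columns of the subquery are reached through its own .c collection.
outer = select(users.c.name).where(users.c.id == inner.c.id)

sql = " ".join(str(outer).split())
print(sql)
```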
We’re halfway along to being able to construct any SELECT expression. The next cornerstone of the SELECT is the JOIN expression. We’ve already been doing joins in our examples, by just placing two tables in either the columns clause or the where clause of the select() construct. But if we want to make a real “JOIN” or “OUTER JOIN” construct, we use the join() and outerjoin() methods, most commonly accessed from the left table in the join:
The alert reader will see more surprises; SQLAlchemy figured out how to JOIN the two tables! The ON condition of the join, as it’s called, was automatically generated based on the ForeignKey object which we placed on the addresses table way at the beginning of this tutorial. Already the join() construct is looking like a much better way to join tables.
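Both methods can be tried without a database, as in this sketch using the assumed table layout:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

# No ON clause is passed to either call.
inner_join = str(users.join(addresses))
outer_join = str(users.outerjoin(addresses))

print(inner_join)
print(outer_join)
```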
Of course you can join on whatever expression you want, such as if we want to join on all users who use the same name in their email address as their username:
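An explicit ON expression is passed as the second argument to join(), as in this sketch with the same assumed tables:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, String, Table

metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
addresses = Table(
    "addresses", metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),
    Column("email_address", String, nullable=False),
)

# Match addresses whose email starts with the user's name.
on_clause = addresses.c.email_address.like(users.c.name + "%")
custom_join = str(users.join(addresses, on_clause))

print(custom_join)
```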