What the difference is between flush() and commit() in SQLAlchemy?
I'm particularly interested in their impact on memory usage. I'm loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over - it's a large database and a machine with not much memory.
I'm wondering if I'm using too many commit() and not enough flush() calls - but without really understanding what the difference is, it's hard to tell!
A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren't persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost). The session object registers transaction operations with session.add(), but doesn't yet communicate them to the database until session.flush() is called.
session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren't persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).
commit() commits (persists) those changes to the database. flush() is always called as part of a call to commit() (1).
When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled. Hopefully this example will make this clearer:
* FAQ - SQLAlchemy insert or update example