2018年9月18日 星期二

[ Python 文章收集 ] SQLAlchemy quick start with PostgreSQL

Source From Here 
Preface 
This is a quick tutorial for getting started with SQLAlchemy Core API. 

Prerequisites 
In this quick start guide, we’ll assume that PostgreSQL is already installed in some location (let’s say, %POSTGRESQL_DIR%), and %POSTGRESQL_DIR%/bin/ directory is in our PATH and the below command successfully works: 
# psql -U postgres 
psql (9.4.4) 
Type "help" for help. 

postgres=#

By default, PostgreSQL runs at port 5432. Let’s work with that. We also assume that python is installed and in PATH. 

Installing dependencies 
We need to install the sqlalchemy library along with the database adapter for PostgreSQL, which is psycopg2: 
# pip install psycopg2 sqlalchemy

If all goes well, we should be able to import sqlalchemy and psycopg2 in a python REPL: 
>>> import sqlalchemy 
>>> sqlalchemy.__version__ 
1.1.0

Creating a PostgreSQL user and database 
For the sake of this tutorial, let’s create a database called tennis and a user called federer with password grandestslam. Here’s how we do it: 
postgres=# CREATE DATABASE tennis; 
CREATE DATABASE 
postgres=# CREATE USER federer WITH PASSWORD 'grandestslam'; 
CREATE ROLE 
postgres=# GRANT ALL PRIVILEGES ON DATABASE tennis TO federer; 
GRANT

That should do. Now let’s connect to this database and create some tables. 

Connecting to the database 
Our first step will be to connect with the PostgreSQL server. Let’s write a function that connects to the database and returns two things: a connection object and a metadata object. This is important, as we’ll use these two objects to interact with the database later. 
  1. import sqlalchemy  
  2.   
  3. def connect(user, password, db, host='localhost', port=5432):  
  4.     '''Returns a connection and a metadata object'''  
  5.     # We connect with the help of the PostgreSQL URL  
  6.     # postgresql://federer:grandestslam@localhost:5432/tennis  
  7.     url = 'postgresql://{}:{}@{}:{}/{}'  
  8.     url = url.format(user, password, host, port, db)  
  9.   
  10.     # The return value of create_engine() is our connection object  
  11.     con = sqlalchemy.create_engine(url, client_encoding='utf8')  
  12.   
  13.     # We then bind the connection to MetaData()  
  14.     meta = sqlalchemy.MetaData(bind=con, reflect=True)  
  15.   
  16.     return con, meta  
Let’s now call connect to get con and meta: 
>>> con, meta = connect('federer', 'grandestslam', 'tennis') 
>>> con 
Engine(postgresql://federer:***@localhost:5432/tennis) 
>>> meta 
MetaData(bind=Engine(postgresql://federer:***@localhost:5432/tennis))

We’re now connected. 

Creating tables 
Let’s create two tables, slams and results that represents Roger Federer’s results in the major slams in his long and illustrous career. This is the rough vision for our two tables: 
  1. # slams  
  2. ---------------------------------  
  3. | name         | country        |  
  4. ---------------------------------  
  5. | Wimbledon    | United Kingdom |  
  6. | French Open  | France         |  
  7.   
  8.   
  9. # results  
  10. ---------------------------------  
  11. | slam        | year  | result  |  
  12. ---------------------------------  
  13. | Wimbledon   | 2003  | W       |  
  14. | French Open | 2004  | 3R      |  
The slam column of results table is naturally referenced from the name column of slams table. 
  1. from sqlalchemy import Table, Column, Integer, String, ForeignKey  
  2.   
  3. slams = Table('slams', meta,  
  4.     Column('name', String, primary_key=True),  
  5.     Column('country', String)  
  6. )  
  7.   
  8. results = Table('results', meta,  
  9.     Column('slam', String, ForeignKey('slams.name')),  
  10.     Column('year', Integer),  
  11.     Column('result', String)  
  12. )  
  13.   
  14. # Create the above tables  
  15. meta.create_all(con)  
Confirming that the database has been updated 
We can now confirm if the above code worked by checking if the tables have been created from the PostgreSQL prompt. 
  1. # psql -U federer tennis  
  2. Password for user federer:  
  3. psql (9.4.4)  
  4. Type "help" for help.  
  5.   
  6. tennis=> \d+  
  7.                        List of relations  
  8. Schema |  Name   | Type  |  Owner  |    Size    | Description  
  9. --------+---------+-------+---------+------------+-------------  
  10. public | results | table | federer | 8192 bytes |  
  11. public | slams   | table | federer | 8192 bytes |  
  12. (2 rows)  
Before we move on to inserting some values in these tables, let’s take a look at meta.tables first. 

Getting all created tables from MetaData 
To see all the tables that have been created, we just iterate over meta.tables: 
>>> for table in meta.tables: 
... print table 
... 
slams 
results

We can further get the sqlalchemy.schema.Table object on which we can perform inserts by table = meta.tables['slams'] or more generally for some table_name, table = meta.tables[table_name]. As you might have guessed, meta.tables is an immutabledict that has mapping of table names to corresponding Table object. 

Inserting records 
Once our tables are ready, the next step is to actually add records to these tables. For this, we usually create an object of Insert relative to a target table, for example: slams.insert() or results.insert(). 
>>> clause = slams.insert().values(name='Wimbledon', country='United Kingdom') 

>>> con.execute(clause) 

The execute returns a ResultProxy object. After an insert, we can use this object to get the successfully inserted Primary key. 
>>> clause = slams.insert().values(name='Roland Garros', country='France') 

>>> result = con.execute(clause) 

>>> result.inserted_primary_key 
['Roland Garros']

As a practice, when we have lots of records to insert, we instead pass a list of values directly to the execute function. 
  1. victories = [  
  2.     {'slam': 'Wimbledon', 'year': 2003, 'result': 'W'},  
  3.     {'slam': 'Wimbledon', 'year': 2004, 'result': 'W'},  
  4.     {'slam': 'Wimbledon', 'year': 2005, 'result': 'W'}  
  5. ]  
  6.   
  7. con.execute(meta.tables['results'].insert(), victories)  
Next, we’ll take a look at the very special table_name.c object. 

Getting all columns from a table 
To access all the columns of a table, say table_name, we iterate through the table_name.c object. It’s an object of ImmutableColumnCollection from sqlalchemy.sql.base: 
>>> results = meta.tables['result'] 

>>> results.c 
 

>>> for col in results.c: 
... print col 
... 
results.slam 
results.year 
results.result

We’ll now use our knowledge of table_name.c to select some specific things from the results table. 

Selecting 
The simplest way of selecting from a table would be to create an object of Select relative to our target table, and running it with execute(). 
>>> for row in con.execute(results.select()): 
... print row 
... 
(u'Wimbledon', 2003, u'W') 
(u'Wimbledon', 2004, u'W') 
(u'Wimbledon', 2005, u'W')

But what if we needed only results from 2005? For that we use Select.where(): 
>>> clause = results.select().where(results.c.year == 2005) 

>>> for row in con.execute(clause): 
... print row 
... 
(u'Wimbledon', 2005, u'W')

Conclusion 
Congratulations on getting started with SQLAlchemy Core and PostgreSQL. A more comprehensive documentation and tutorial is available here. 

Supplement 
* Using PostgreSQL through SQLAlchemy 
In this Write Stuff article, Gareth Dwyer writes about using SQLAlchemy, a Python SQL toolkit and ORM, discussing the advantages of using it while performing database operations. He shows the differences between using raw SQL and using an ORM, and gives examples that perform CRUD operations on a PostgreSQL database.


[ Py DS ] Ch3 - Data Manipulation with Pandas (Part5)

Source From  Here   Pivot Tables   We have seen how the  GroupBy  abstraction lets us explore relationships within a dataset. A pivot ta...