Sunteți pe pagina 1din 7

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

Compartir Informar sobre mal uso Siguiente blog» Crear un

Compartir

Informar sobre mal uso

Siguiente blog»

Crear un blog

Acceder

sobre mal uso Siguiente blog» Crear un blog Acceder Friday, January 12, 2007 Why SQLAlchemy impresses

Friday, January 12, 2007

Why SQLAlchemy impresses me

One of the reasons ORM tools have a spotted reputation is that it's really, really easy to write a dumb ORM that works fine for simple queries but performs like molasses once you start throwing real data at it.

Let me give an example of a situation where, to my knowledge, only SQLAlchemy of the Python (or Ruby) ORMs is really able to handle things elegantly, without gross hacks like "piggy backing."

Often you'll see a one-to-many relationship where you're not always interested in all of the -many side. For instance, you might have a users table, each associated with many orders. In SA you'd first define the Table objects, then create a mapper that's responsible for doing The Right Thing when you write "user.orders."

(I'm skipping connecting to the database for the sake of brevity, but that's pretty simple. I'm also avoiding specifying columns for the Tables by assuming they're in the database already and telling SA to autoload them. Besides keeping this code shorter, that's the way I prefer to work in real projects.)

users = Table('users', metadata, autoload=True) orders = Table('orders', metadata, autoload=True)

class User(object): pass class Order(object): pass

mapper(User, users, properties={ 'orders':relation(mapper(Order, orders), order_

})

That "properties" dict says that you want your User class to provide an "orders" attribute, mapped to the orders table. If you are using a sane database, SQLAlchemy will automatically use the foreign keys it finds in the relation; you don't need to explicitly specify that it needs to join on "orders.user_id = user.id."

We can thus write

for user in session.query(User).select():

print user.orders

So far this is nothing special: most ORMs can do this much. Most can also specify whether to do eager loading for the orders -- where all the data is pulled out via joins in the first

About Me

all the data is pulled out via joins in the first About Me JONATHAN ELLIS jbellis

JONATHAN

ELLIS

jbellis at

gmail dot

com

View my complete profile

Follow Jonathan on Twitter here

Most popular

Cassandra: fact vs fiction

Cassandra in action

Why you won't be building

your killer app on a DHT

All you ever wanted to

know about writing Bloom Filters

CouchDB: Not drinking the

kool-aid

FormAlchemy 1.0

Scala: first impressions

Anders Heljsberg doesn't

grok Python

Blog Archive

2011 (1)

2010 (8)

2009 (17)

2008 (33)

2007 (40)

December (2)

November (1)

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

select() -- or lazy loading, where orders are loaded via a separate query each time the attribute is accessed. Either of these can be "the right way" for performance, depending on the use case.

The tricky part is, what if I want to generate a list of all users and the most recent order for each? The naive way is to write

class User:

@property def max_order(self):

return self.orders[-1]

for user in session.query(User).select():

print user, user.max_order

This works, but it requires loading all the orders when we are really only interested in one. If we have a lot of orders, this can be painful.

One solution in SA is to create a new relation that knows how to load just the most recent order. Our new mapper will look like this:

mapper(User, users, properties={ 'orders':relation(mapper(Order, orders), order_ 'max_order':relation(mapper(Order, max_orders,

})

("non_primary" means the second mapper does not define persistence for Orders; you can only have one primary mapper at a time. "viewonly" means you can't assign to this relation directly.)

Now we have to define "max_orders." To do this, we'll leverage SQLAlchemy's ability to map not just Tables, but any Selectable:

max_orders_by_user = select([func.max(orders.c.order_id).l group_by=[orders.c.user_id]).a max_orders = orders.select(orders.c.order_id==max_orders_b

"max_orders_by_user" is a subselect whose rows are the max order_id for each user_id. Then we use that to define max_orders as the entire order row joined to that subselect on user_id.

October (4)

September (4)

July (3)

June (2)

May (2)

April (5)

March (2)

February (7)

January (8)

Komodo 4 released;

new free version

PyCon SQLAlchemy

tutorial full

Caution: upgrading to

new version of blogger may i

Abstract of "Advanced

PostgreSQL, part 1"

Why SQLAlchemy

impresses me

MySQL backend

performance

Fun with three-valued

logic

Good advice for

Tortoise SVN users

2006 (38)

2005 (78)

We could define this as eager-by-default in the mapper, but in this scenario we only want it eager on a per-query basis. That looks like this:

q = session.query(User).options(eagerload('max_order')) for user in q.select():

print user, user.max_order

For fun, here's the sql generated:

SELECT users.user_name AS users_user_name, users.user_id A anon_760c.order_id AS anon_760c_order_id, anon_760c.us anon_760c.description AS anon_760c_description, anon_760c.isopen AS anon_760c_isopen

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

FROM users LEFT OUTER JOIN ( SELECT orders.order_id AS order_id, orders.user_id AS orders.description AS description, orders.isopen A FROM orders, ( SELECT max(orders.order_id) AS order_id FROM orders GROUP BY orders.user_id) AS max_orders WHERE orders.order_id = max_orders_by_user.order_id) A ON users.user_id = anon_760c.user_id ORDER BY users.oid, anon_760c.oid

In SQLAlchemy, easy things are easy; hard things take some effort up-front, but once you have your relations defined, it's almost magical how it pulls complex queries together for you.

I'm giving a tutorial on Advanced Databases with SQLAlchemy at PyCon in February. Feel free to let me know if there is anything you'd like me to cover specifically.

Posted by Jonathan Ellis at 10:04 PM

to cover specifically. Posted by Jonathan Ellis at 10:04 PM 18 comments: Anonymous said Your "that's

18 comments:

Posted by Jonathan Ellis at 10:04 PM 18 comments: Anonymous said Your "that's the way I

Anonymous said

Your "that's the way I prefer to work" link isn't quite right

10:37 AM

Jonathan Ellis said said

Fixed.

10:41 AM

Anonymous said

"In SA you'd first define the Table objects, then create a mapper that's responsible for doing The Right Then when you write "user.orders."

Should that be "The Right Thing"? Nice post.

2:34 PM

Jonathan Ellis said said

Quite right. Fixed that too. :)

3:40 PM

Anonymous said

Ugh, three levels of selects, that's some ugly SQL. I know MySQL loves to choke on that kind of query.

You are using a sub-sub-select to get all users and their last order, then joining that to a sub-select to get the order information for that last order per user, and then again an outer select to join the users to get their information.

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

Why not use a single sub-select as a "relational" table to link the users and order tables. Like so:

SELECT u.*, o.* FROM users AS u LEFT JOIN (SELECT user_id, max(order_id) AS order_id FROM orders GROUP BY user_id) AS last_order ON u.user_id = last_order.user_id INNER JOIN orders AS o ON last_order.order_id = o.order_id ORDER BY u.user_id

10:28 PM

Jonathan Ellis said

I found my way more intuitive, but I'm pretty sure a

database with a decent optimizer will generate the same

plan for both. (This may not include MySQL, as you say, but I don't see a point in writing baroque code to accommodate a database that I never use. :)

10:40 PM

Anonymous said

It doesn't impress me.

I think the generic SQL that ORMs tend to generate

might be useful only part of the time. That's why for

really caring ORMs, they make it easy for you to customize the SQL to your unique needs.

Maybe iBATIS is one of the most famous and impressive ones in doing that?

I have my own as well, which is "impressive" in its own

way. hehe. It follows more of the way iBATIS works, without the XML need and with lots of conventions. :-)

Anyway, just ask yourself how could your ORM become more popular and solid at the same time? Is it easy to get there?

Best of luck to you.

3:42 AM

Jonathan Ellis said

SQLAlchemy allows custom SQL for several meanings of the phrase, but the point is, most of the time you don't need it. Which is the best of both worlds IMO.

(SA is not "my" ORM; I just recognize good software when I see it.)

6:56 AM

Anonymous said

Python and Ruby ORMs? Dunno. The main perl one

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

handles this fine. DBIx::Class will happily prefetch all sorts of complex things in a single relation, and it's also trivial to select parts of a one-many rel to prefetch on the fly if you know that's all you're going to need.

8:39 AM

Jonathan Ellis said

I do not see anything in the DBIx::Class documentation that implies that it can do this cleanly (and I see some suggestions that it cannot), but if you are correct, then it's good that people who have to use Perl have that option. :)

9:58 AM

Timur Izhbulatov said

Good job! Thanks! I've just started playing with SA and your post is really helpful.

1:44 PM

zzzeek said

my impression is that iBatis is strictly a rowset-mapper. that in itself is maybe 10% of SQLAlchemy, which if you prefer allows completely straight textual SQL that can be turned into objects at the result set level just like iBatis. But theres no need for that level of tedium.

the eager loading queries Jonathan illustrates work just great with MySQL. SA has several hundred unit tests, dozens of which contain far more exotic queries than that, all of which pass 100% on MySQL and are in use on many production systems today.

also, SQLAlchemy has been adopted by virtually every sub-category of the Python community in less than a year after it was first released so we are doing OK in the popularity department.

8:11 PM

Raven said

Does SQLAlchemy do any kind of internal caching?

For example, if you ask for the same data twice (or an obvious subset of the initially requested data) will the database be hit once or twice?

I recently wrote a caching database abstraction layer for an application and (while fun) it was a fair bit of work to get it to a minimally functional state. If SQLAlchemy did that I would seriously consider jumping on the bandwagon.

I've found things in the docs that imply something like this might be going on, but nothing explicit.

4:36 PM

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

Jonathan Ellis said

No; the author of SA [rightly, IMO] considers caching a separate concern.

What you saw in the docs is probably the SA identity map, which makes it so if you load an instance in two different places, they will refer to the same object. But the database will still be queried twice, so it is not a cache in the sense you mean.

6:56 PM

zzzeek said

what youre talking about is a second level cache, which is distinct from SA's "session" which is cache-like, but thats not its main purpose (identity mapping is its main purpose).

SA doesnt have an integrated second-level cache at the moment. i think theyre a little bit overrated and also its quite easy to write your own that is closely tailored to how your app functions (since theyre usually just dictionaries).

to design a generic one that does what everybody wants without being impossible to configure/understand is a

large undertaking, but we should eventually get one. i need to make a huge pass through myghtyutils first and

clean it up

that

would probably be the backend.

7:54 PM

Matt said

Jonathan,

In this line:

'max_order':relation(mapper(Order, max_orders, non_primary=True), uselist=False, viewonly=True),

How does SA know to join on the user_id column? You haven't set up any foreign keys, nor have you specified where to join.

I've got a tricky problem with an association table -- and

I think this technique (of using selects) would work, but I'm having trouble connecting all the pieces.

10:36 AM

Jonathan Ellis said

I did have FKs in the database, so autoload picked them up. Sorry that wasn't clear.

11:46 AM

love encounter flow said

i am impressed by sqlalchemy too but i recently dumped

it in favor of storm. i've been finding myself far too often in the jungle of documentation, trying to figure out how to do this, how to do that. frequently i was

Jonathan Ellis's Programming Blog - Spyced: Why SQLAlchemy impres

http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html

unable to come up with a solution and had to write sql anyway. sqlalchemy is just too complicated, and i feel i have to jump thru too many hoops to get things done. also i got a lot of strange errors i was unable to resolve with that relations things, when you tie several records from several tables together. i write a lot more raw sql now which i do not particularly like, but at least the level of complexity has decreased. using a db backend should not be so hard. my suspicion is that the current way to use orms to access db data is ultimately not the way to go, that there is a better solution waiting to be discovered.

5:07 AM

Post a Comment

Newer Post

Home

Older Post

Subscribe to: Post Comments (Atom)