
Foursquare & MongoDB

Harry Heymann
May 21, 2010
Foursquare Overview

A location-based social network. Allows users to "check in"
at bars, restaurants, shopping destinations, etc. to share
their location with friends.
Rewards users with virtual prizes (points, badges) that can
sometimes lead to real life rewards (5 checkins at a
restaurant might get you a free appetizer)
~1.3M registered users. ~615k checkins/day. Nearly 50M
checkins total. Very rapid growth.
Basic technical details

Written in Scala (a somewhat new language; it's what you'd get
if Java & ML had a baby. It runs on the JVM)
Uses a web framework called Lift
Originally used a single PostgreSQL instance as the data
store.
Scaling up on a SQL database can be frustrating
(replication & sharding don't work as easily as one would
like) so we're moving to MongoDB.
Built in geospatial capabilities (obviously very important to
foursquare) are a very nice bonus.
Transition to Mongo

Currently writing checkins, tips, venues (and various things
related to venues) to MongoDB. All writes still go to
PostgreSQL as well. Slowly migrating various reads.

Exclusively use MongoDB for our "Who's here" server, a
short-lived record of where our users are at any given time
(contains 3 hours' worth of data).

Migrating geo-related queries first. Other items later. Checkins
are a high priority (because they represent the bulk of our
data).
Geospatial Indexes

MongoDB conveniently supports geospatial indexes out of
the box.
Currently limited to Earth-like dimensions (+/- 180 degrees
in each of 2 axes).
It cheats to make the math easier/faster by assuming a flat
earth where 1 degree of latitude or longitude is always the
same distance. This is fine as long as you are dealing with
relatively small distances (as foursquare does)
Implemented using geographic hash codes atop standard
MongoDB b-trees
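
To make the geographic hash idea concrete, here is a minimal sketch of one common way to build such a key: quantize each coordinate and interleave the bits, so that points close together tend to share a long key prefix and therefore sit near each other in a sorted b-tree. The function names, the 16 bits per axis, and the sample coordinates are assumptions made for illustration; MongoDB's actual encoding may differ.

```javascript
// Interleave the bits of a quantized latitude and longitude into one key.
// `bits` per axis is a hypothetical parameter chosen for this sketch.
function geoHash(lat, lng, bits = 16) {
  const cells = 2 ** bits;
  // Map a coordinate from [-180, 180] onto an integer cell in [0, cells).
  const scale = (v) => Math.min(cells - 1, Math.floor(((v + 180) / 360) * cells));
  const x = scale(lat);
  const y = scale(lng);
  let hash = "";
  for (let i = bits - 1; i >= 0; i--) {
    hash += (x >> i) & 1; // next latitude bit
    hash += (y >> i) & 1; // next longitude bit
  }
  return hash; // binary string, most significant bits first
}

// Length of the common prefix of two hashes.
function sharedPrefix(a, b) {
  let n = 0;
  while (n < a.length && a[n] === b[n]) n++;
  return n;
}

// Sample points (assumed coordinates): two in Manhattan, one in Los Angeles.
const here = geoHash(40.7385, -73.9885);
const nearby = geoHash(40.742, -73.993);
const farAway = geoHash(34.05, -118.24);

console.log(sharedPrefix(here, nearby), sharedPrefix(here, farAway));
```

The two Manhattan points share a much longer prefix than the Manhattan and Los Angeles pair. The property is not perfect: two close points that straddle a cell boundary can still end up with quite different keys, which is one reason a real implementation has to look at neighboring cells too.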
Creating Geospatial Indexes

Indicate the "2d" index type:
db.venues.ensureIndex({latlng: "2d"})
Specify additional fields if you plan on using compound
geospatial queries (more on these in a moment):
db.venues.ensureIndex({latlng: "2d", closed: 1, keywordList: 1})
Take care: 1k limit on key size

If you have a compound geospatial index and you don't take
care, it can be easy to go over the MongoDB key size limit of 1K.
For example, if the index is on {latlng: "2d", keywordList: 1} then
the following venue would be a problem:

{latlng: [40, -72], keywordList: ["some", "venue", "with", "a",
"whole", "lot", "of", "different", "words", "in", "the", "name", "of",
"the", "venue", "it", "just", "keeps", "going", "on", "and", "on",
"forever", "without", "seeming", "to", "ever", "stop"]}

In these cases the individual item will be dropped from the
index, making it impossible to query.
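
One way to guard against this is to cap the keyword list before saving the venue. The sketch below is only an illustration: trimKeywords and the 800-byte budget are hypothetical, and the byte accounting (UTF-8 length plus one byte per keyword) is an assumption, not MongoDB's real key-size calculation.

```javascript
// Hypothetical budget that leaves headroom under the 1K limit for the
// geo hash and other overhead. The exact accounting is an assumption.
const KEYWORD_BYTE_BUDGET = 800;

const encoder = new TextEncoder();

// Keep keywords, in order, until the next one would exceed the budget.
function trimKeywords(keywords, budget = KEYWORD_BYTE_BUDGET) {
  const kept = [];
  let used = 0;
  for (const word of keywords) {
    const size = encoder.encode(word).length + 1; // +1 assumed per-keyword overhead
    if (used + size > budget) break;
    kept.push(word);
    used += size;
  }
  return kept;
}

// With a deliberately tiny budget, only the first two keywords fit.
console.log(trimKeywords(["some", "venue", "with"], 11));
```

Dropping trailing keywords loses some search recall for that venue, but the venue stays in the index and can still be found by location.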
Basic Geospatial Queries

Find the closest 20 venues to a given location:
db.venues.find({latlng: {$near: [40.72, -73.99]}}).limit(20)
Find up to the closest 20 venues to a given location that are
within 1 degree of the location:
db.venues.find({latlng: {$near: [40.72, -73.99, 1]}}).limit(20)

Foursquare uses this to find nearby venues, tips, specials,
and various other geolocated data.
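
Because distances in these queries are measured in degrees on a flat plane, it helps to know roughly what a degree means on the ground. The sketch below uses the usual approximation of about 111.32 km per degree of latitude (an assumed constant) and shrinks a degree of longitude by the cosine of the latitude. The function names are made up for this example.

```javascript
const KM_PER_DEGREE_LAT = 111.32; // approximate, assumed constant for this sketch

// Ground distance covered by one degree of longitude at a given latitude.
function kmPerDegreeLng(latDegrees) {
  return KM_PER_DEGREE_LAT * Math.cos((latDegrees * Math.PI) / 180);
}

// Flat "degree space" distance between two [lat, lng] points,
// i.e. the flat-earth simplification described in these slides.
function degreeDistance([lat1, lng1], [lat2, lng2]) {
  return Math.hypot(lat2 - lat1, lng2 - lng1);
}

const nyc = [40.72, -73.99];
const kmEastWest = kmPerDegreeLng(nyc[0]);
console.log(kmEastWest.toFixed(1), "km per degree of longitude near NYC");
```

Near New York a degree of longitude covers noticeably less ground than a degree of latitude, so a "1 degree" radius is really an ellipse on the ground. For city-scale searches the distortion is small enough to ignore.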
Complex Geospatial Queries

If you have a compound geospatial index defined you can
query on additional fields and still use the index:
db.venues.find({latlng: {$near: [40.72, -73.99]}, closed: false})
(because we generally don't want closed venues)
Basic search:
db.venues.find({latlng: {$near: [40.72, -73.99]}, closed: false, keywordList: {$all: ["nyc", "seminar"]}})
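
In application code, queries like these are usually assembled from user input. A minimal sketch of such a builder follows; buildVenueSearch is a hypothetical helper, and the field names are the ones used in the slides.

```javascript
// Build a compound geospatial query document for the venues collection.
// Only adds the keyword clause when keywords were actually supplied.
function buildVenueSearch(lat, lng, keywords = []) {
  const query = { latlng: { $near: [lat, lng] }, closed: false };
  if (keywords.length > 0) {
    query.keywordList = { $all: keywords };
  }
  return query;
}

// The resulting object can be passed to find() in the shell or a driver.
console.log(JSON.stringify(buildVenueSearch(40.72, -73.99, ["nyc", "seminar"])));
```

Putting latlng first and the filters after it mirrors the field order of the compound index, which keeps the query readable alongside the index definition.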
Bounded geospatial queries

Foursquare doesn't do much of this, but it's possible to find
all of the items in a collection that are within a given circle or
square:

db.venues.find({latlng: {"$within": {"$box": [[40, -72], [41, -73]]}}})

db.venues.find({latlng: {"$within": {"$circle": [[40, -72], 0.5]}}})

Can be combined with the complex geospatial queries that were
demonstrated on the last slide. In general though $near will be
more useful/performant than $within.
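
What the two bounded shapes mean can be sketched as plain predicates over [lat, lng] pairs in flat degree space. This illustrates the semantics only, not how MongoDB evaluates them. inBox and inCircle are hypothetical names, and normalizing the box corners with min/max is a choice made for this sketch.

```javascript
// True if the point lies inside the axis-aligned box given by two corners.
function inBox([lat, lng], [[latA, lngA], [latB, lngB]]) {
  return (
    lat >= Math.min(latA, latB) && lat <= Math.max(latA, latB) &&
    lng >= Math.min(lngA, lngB) && lng <= Math.max(lngA, lngB)
  );
}

// True if the point lies within `radius` degrees of the center.
function inCircle([lat, lng], [[centerLat, centerLng], radius]) {
  return Math.hypot(lat - centerLat, lng - centerLng) <= radius;
}

const box = [[40, -72], [41, -73]];
const circle = [[40, -72], 0.5];
console.log(inBox([40.5, -72.5], box), inCircle([40.2, -72.2], circle));
```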
MongoDB & Scala/Lift

Lift has a generic ORM layer called Record for which there is
a MongoDB implementation.
Originally called scamongo, but as of Lift 2.0 M5 it's
integrated into the core Lift codebase as lift-mongodb
It's a very thin wrapper around the Java driver provided by
10gen
It's new, so has some warts/oddities, but should improve
fairly rapidly (foursquare is working on this)
A foursquare venue using lift-mongodb

class Venue extends MongoRecord[Venue]
    with MongoId[Venue]
    with GeolocationMongo[Venue] {

  object venuename extends StringField(this, 255)
  object address extends OptionalStringField(this, 50)
  object closed extends BooleanField(this)
  // etc
}
Basic ORM operations work as you might expect
val venue = Venue.createRecord
  .venuename("NYC Seminar & Conference Center")
  .address("71 W 23rd St")
  .city("New York").state("NY").zip("10010")

venue.save

val query = QueryBuilder.start(Venue.venuename.name)
  .is("Gramercy Tavern")
  .get()
// findAll returns every match, so bind the result to a new name
val venues = Venue.findAll(query)
PS: We're hiring

All sorts of roles: engineering, operations, business. See
http://foursquare.com/jobs or come talk to me.
