Zed A. Shaw
Guillermo O. Tordek Freschi
July 2010
Contents

Preface
1 Introduction
    1.1 Language Agnostic
    1.2 Asynchronous
    1.3 Message Protocol
    1.4 Application Oriented
    1.5 Automated Management
    1.6 Using This Manual
2 Installing
    2.1 Install Dependencies
    2.2 Building Mongrel2
        2.2.1 Using the .tar.bz2 File
        2.2.2 Using git
    2.3 Building And Installing
        2.3.1 Other platforms than Linux
    2.4 Testing The Installation
    2.5 Upgrading from trunk
    2.6 Up Next
3 Managing
    3.1 Model-View-Controller
    3.2 Trying m2sh
        3.2.1 What The Hell Just Happened?
    3.3 A Simple Configuration File
    3.4 How A Config Is Structured
        3.4.1 Server
        3.4.2 Host
        3.4.3 Route
        3.4.4 Dir
        3.4.5 Proxy
        3.4.6 Handler
        3.4.7 Others
    3.5 A More Complex Example
    3.6 Routing And Host Patterns
        3.6.1 How Routing Works
        3.6.2 JSON/XML Message Routing Syntax
    3.7 Deployment Logs And Commits
    3.8 Control Port
    3.9 Multiple Servers
    3.10 Tweakable Expert Settings
    3.11 SSL Configuration
        3.11.1 Experimental SSL Caching
    3.12 Configuring Filters (BETA)
4 Deploying
    4.1 Mongrel2 Deployment Requirements
        4.1.1 Introducing procer
        4.1.2 Installing procer
    4.2 The Plan
    4.3 Step 1: The Deployment Area
    4.4 Step 2: The mongrel2.org Configuration
    4.5 Step 3: Setup procer
        4.5.1 The Python Examples
        4.5.2 Testing The New Setup
        4.5.3 Nice Features of Procer
    4.6 Step 4: Static Content
    4.7 Step 5: Testing And Troubleshooting
    4.8 Further Improvements
    4.9 Deployment Tips
5 Hacking
    5.1 Front-end Goodies
        5.1.1 HTTP
        5.1.2 Proxying
        5.1.3 WebSockets
        5.1.4 JSSocket
        5.1.5 Long Poll
        5.1.6 Streaming
        5.1.7 N:M Responses
        5.1.8 Async Uploads
    5.2 Introduction to ZeroMQ
    5.3 Handler ZeroMQ Format
        5.3.1 Socket Types Used
        5.3.2 UUID Addressing
        5.3.3 Numbers Identify Listeners
        5.3.4 Paths Identify Targets
        5.3.5 Request Headers And Body
        5.3.6 Complete Message Examples
        5.3.7 TNetStrings Alternative Protocol
        5.3.8 Python Handler API
    5.4 Basic Handler Demo
    5.5 Async File Upload Demo
    5.6 MP3 Streaming Demo
    5.7 Chat Demo
    5.8 Writing A Filter (BETA)
    5.9 Other Language APIs
    5.10 Writing Your Own m2sh
    5.11 Config From Anything: Experimental
6 Contributing
Preface
This manual will tell you about the most awesome webserver on the planet: Mongrel2. It is written for people with a sense of humor who want to get things done with Mongrel2. That means, if you're an operations professional, software developer, hacker or just curious, it's for you. However, if you're too serious and think flowery language (A.K.A. good, entertaining writing) does not belong in your software manuals, then you should just go read the source code and save everyone a huge headache dealing with you. In case you haven't figured it out, this book will be fun and slightly obnoxious. That's not intended to insult you, but just to keep you interested so that you want to read it.
Typography
Usually the people running the web can be divided into three types of people: Steves, Edsgers, and Knuths.

The Steves think that the entire internet should be a wonderful user experience where all pages are crafted with pixel-perfect fonts with high gloss visuals and coated with the most happy happy joy joy of all possible experiences. To them, design is paramount and actual stability isn't important unless it interferes with design. The Steves of the internet think the Edsgers of the internet are destroying the universe with things like functionality, security, and stability. Just like the real Steve Jobs, they would rather everything look fantastic and then use awesome marketing to cover up any technical flaws.

The Edsgers feel that the internet is completely unsafe, and until it is a fully curated and crafted set of academic, peer reviewed papers, it will be a festering pile of dung. To the Edsgers, the world is dangerous and only a truly paranoid attitude toward security and stability will ensure that it becomes safe. They want every single piece of software to reject all reality and be crafted from nothing but pure mathematics, and hate the fact that the Steves want to run around painting the world with useless frivolous colors and words and things.
PREFACE
The typography in this book, and the entire project, is for the Knuths of the world. I like to think of the Knuths as the practical yet professional types with a light sense of humor. They are the ones who are getting things done while still balancing between great typography and solid bug-free functionality. They aren't zealots, but practical, straightforward people. That is why this book is written in TeX, and why it uses whatever fonts TeX uses.
Chapter 1
Introduction
Mongrel2 is a web server. HTTP requests come in, HTTP responses go out. Request, response. There is nothing revolutionary or extravagant in what Mongrel2 does with a browser, apart from supporting fancy asynchronous socket protocols. To the browser, Mongrel2 is just this nice web server that has WebSockets and Flash Sockets in it. That's it.

What makes Mongrel2 special is how it satisfies these requests in a language agnostic and asynchronous way, using a simple messaging protocol to talk to applications rather than just serving files. Mongrel2 is also designed to be incredibly easy to manage automatically as part of your infrastructure.

Other web servers do some of these things, but they either do them in a bastardized way or not all of them at once. Plenty of language specific web servers like Node.js and Jetty have asynchronous operation, but they're not language agnostic[1]. Other web servers will let you talk to any language as a backend, but they insist on using HTTP proxying or FastCGI, which is not friendly to asynchronous operations. Mongrel2 is the only web server I know of that actively tries to focus on these features as a cohesive whole.

Note 1
TL;DR!
Don't want to read the manual?[2] You can read the Getting Started page, which is even available in many languages. It's a fast crash course in getting Mongrel2 up and running.
1 Who
CHAPTER 1. INTRODUCTION
1.2 Asynchronous
Many web servers are asynchronous internally, and some force you to know way too much about how they work internally to get anything done. What makes Mongrel2's version of asynchronous messaging different is that it extends outside the Mongrel2 server. This is a powerful concept: even your backends can operate asynchronously, using simple identification of connected clients.

Other servers assume that every request is received from a browser, then sent to a backend, and then the response is sent directly back to the client, and that's it. Mongrel2 assumes that there is a connected client, and it sends requests to backends, but it makes no assumptions about how those backends respond to the clients. All it requires is that the backend application send messages addressed to the client, and it will write them on the socket. Because of this design, Mongrel2 can easily serve classic HTTP clients, keep-alive style HTTP clients, chunked encoding responses, JSSockets, or WebSockets using the same code.
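The "messages addressed to the client" idea can be modelled in a few lines of Python. This is only a toy sketch, not Mongrel2's code: the ConnectionTable class, its method names, and the integer connection ids are all made up for illustration, and a list of byte strings stands in for a real socket.

```python
class ConnectionTable:
    """Toy model: the server only knows connected clients by id."""

    def __init__(self):
        self._outboxes = {}   # connection id -> list of payloads written
        self._next_id = 0

    def connect(self):
        """Register a new client and return its (hypothetical) id."""
        conn_id = self._next_id
        self._next_id += 1
        self._outboxes[conn_id] = []
        return conn_id

    def disconnect(self, conn_id):
        self._outboxes.pop(conn_id, None)

    def deliver(self, conn_ids, payload):
        """Write payload to every listed client that is still connected.

        Returns the ids that actually received it."""
        reached = []
        for conn_id in conn_ids:
            outbox = self._outboxes.get(conn_id)
            if outbox is not None:
                outbox.append(payload)
                reached.append(conn_id)
        return reached

    def written(self, conn_id):
        """Everything written to this client so far, as one byte string."""
        return b"".join(self._outboxes[conn_id])


table = ConnectionTable()
alice = table.connect()
bob = table.connect()

# A backend can answer one client, several at once, or the same client twice.
table.deliver([alice], b"hello ")
table.deliver([alice, bob], b"everyone")
```

The point of the model is that nothing ties a delivery to a particular request: any backend that knows a client's id can write to it, at any time, as often as it likes.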
powers is ZeroMQ, a language- and transport-mechanism-agnostic messaging system that does not require a centralized messaging server to operate. Using ZeroMQ lets Mongrel2 talk to a huge number of languages, operate within any kind of network architecture, and do it with a very simple communication model and API that most programmers can understand.
But, most importantly, you can write your own. You don't have to wait for a Mongrel2 developer to craft a configuration file parser for your favorite language, or use some hack job Nagios Perl junk to automate or scan it. It's SQLite3 with a solid, simple schema and even a well written Python and/or C code example showing you how it works. Nothing stops you from automating the hell out of Mongrel2 with that.
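To give a feel for what "write your own" could look like, here is a minimal Python sketch that reads servers out of a SQLite database with the standard sqlite3 module. Treat the table layout as an assumption: the cut-down server table below just mirrors a few Server attributes used later in this manual, and the real schema may differ, so inspect your own config database before relying on any column name. An in-memory database stands in for a real config file.

```python
import sqlite3

# Assumption: a cut-down "server" table that mirrors a few Server attributes
# used in this manual. The real schema may differ; inspect it first.
SCHEMA = """
CREATE TABLE server (
    id INTEGER PRIMARY KEY,
    uuid TEXT, name TEXT, chroot TEXT,
    pid_file TEXT, default_host TEXT, port INTEGER
);
"""


def list_servers(db):
    """Return (name, default_host, port) for every configured server."""
    rows = db.execute(
        "SELECT name, default_host, port FROM server ORDER BY name")
    return rows.fetchall()


db = sqlite3.connect(":memory:")   # stand-in for a real config database file
db.executescript(SCHEMA)
db.execute(
    "INSERT INTO server (uuid, name, chroot, pid_file, default_host, port) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("f400bf85-4538-4f7a-8908-67e313d515c2", "test", "./",
     "/run/mongrel2.pid", "localhost", 6767))

for name, host, port in list_servers(db):
    print(f"{name}: {host}:{port}")
```

Because the configuration is plain SQLite, the same query works from any language with a SQLite binding; Python is used here only because the manual's other examples use it.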
Chapter 2
Installing
Mongrel2 is designed to build on most modern Unix systems, specifically Linux and Mac OSX. It is written in C (not Ruby) and uses fairly vanilla C and standard libraries, except for one piece that implements the internal coroutines. Other than this, you should be able to compile and install Mongrel2 with nothing more than make all install after you've installed all the dependencies.

Now, if when I said dependencies you started to groan at having to install software to use my software, well my friend, welcome to the future. You said you don't want people reinventing the wheel, right? Great, that means you need to install software for my software to work. It's either that or wait 10 years for me to build everything from scratch like some arrogant jackass. We good now? Great, let's get started.
Source 1
CHAPTER 2. INSTALLING
Installing Dependencies on ArchLinux
    # install ZeroMQ
    wget http://download.zeromq.org/zeromq-2.1.4.tar.gz
    tar -xzvf zeromq-2.1.4.tar.gz
    cd zeromq-2.1.4/
    ./configure
    make
    sudo make install

    # install sqlite3
    sudo pacman -S sqlite3
If you run into parts that your OS is missing, which is likely on Debian and SuSE systems, then you'll have to go and figure out how to install them.

Some distributions (like Ubuntu) split dev and runtime packages. In order to build Mongrel2 on these distros, you must install libsqlite3-dev: this package contains sqlite3.h, which Mongrel2 needs during compilation. For the lazy, the command is:

    sudo apt-get install libsqlite3-dev

Other pieces known to be missing on Ubuntu-like systems:

uuid-runtime: Needed by the m2sh uuid command.
2.2.1 Using the .tar.bz2 File
The easiest way to build Mongrel2 is to use the .tar.bz2 file from the main page downloads section. Simply download it and you're done.
2.2.2
Using git
If you like living on the edge, then here's how to follow the development source tree while we work on it:

First, get git running on your system, either through your package manager or by fetching the proper sources or binaries from the Git Download page. Once you have git, you can get the Mongrel2 source and open it up:

Source 2 Cloning the Mongrel2 Source

Make sure you do this in order (just like with every set of instructions you follow) or else you'll get errors.
2.3.1 Other platforms than Linux

If you aren't running Linux, chances are good this standard procedure will not work for you. The Makefile lists several targets for various platforms. As of this writing, there are:
1 Please
So for example you would probably install zeromq and sqlite3 as ports and then compile it like so:

Source 3

    # install ZeroMQ
    cd /usr/ports/devel/zmq
    make
    make install

    # install sqlite3
    cd /usr/ports/databases/sqlite3
    make
    make install

    # install mongrel2
    cd /where/you/extracted
    gmake freebsd install
That's it. Just hit CTRL-c for now and we'll get into playing with this setup later.
2.6. UP NEXT
Source 5
    cd mongrel2
    git pull
    # make sure you get a clean build
    make clean all
    # install it once again
    sudo make install
2.6 Up Next
You now should have a working Mongrel2 system installed and the m2sh configuration interface ready to go. In the rest of this manual we'll be simply learning how to do more with Mongrel2, like making our own configs, writing handlers, and other fun stuff.
Chapter 3
Managing
Mongrel2 is designed to be easy to deploy, and to make that deployment easy to automate. This is why it uses SQLite to store the configuration, with m2sh as an interface for creating it. Doing this lets you access the configuration using any language that works for you, augment it, alter it, migrate it and automate it.

In this chapter, I'm going to show you how to make a basic configuration using m2sh and all the commands that are available. You'll learn how the configuration system is structured so that you know what goes where, but in the end it's just a simple storage mechanism.
3.1 Model-View-Controller
When you hear Model-View-Controller, you think about web applications. This is a design pattern where you place different concerns into different parts of your system and try not to mix them too much. For an interactive application, if you keep the part that stores data (Model) separated from the logic (Controller) and use another piece to display and interact with the user (View), then it's easier to change the system and adapt it over time to new features.

The power of MVC is simply that these things really are separate orthogonal pieces that get ugly if they're mixed together. There's no math or theory that says why; just lots of experience has told us it's usually a bad idea. When you start mixing them, you find out that it's hard to change for new requirements later, because you've sprinkled logic all over your web pages. Or you can't update your database because there's all these stored procedures that assume the tables are a certain way.
Note 2
CHAPTER 3. MANAGING
Apparently SQL Inspires FUD
When I first started talking about Mongrel2, I said I'd store the configuration in SQLite and do a Model-View-Controller kind of design. Immediately, people who can't read flipped out and thought this meant they'd be back in Windows registry hell, but with SQL as their only way to access it. They thought that they'd be stuck writing configurations with SQL; that SQL couldn't possibly configure a web server.

They were wrong on many levels. Nobody was ever going to make anyone use SQL. That was repeated over and over but, again, people don't read and love spreading FUD. The SQLite config database is nothing like the Windows Registry. No other web server really uses a true hierarchy; they just cram a relational model into a weirdo configuration format. The real goal was to make a web server that was easy to manage from any language, and then give people a nice tool to get their job done without having to ever touch SQL. EVER!

In the end, what we got despite all this fear mongering is a bad ass configuration tool and a design that is simple, elegant, and works fantastically. If you read that Mongrel2 uses SQLite and thought this was weird, well, welcome to the future. Sometimes it's weird out here (even though Postfix has been doing this for a decade or more).
Mongrel2 needed a way to allow you to use various languages and tools to automate its configuration. Letting you automate your deployments is the entire point of the server. The idea was that if we gave you the Controller and the Model, then you can craft any View you wanted, and there's no better Model than a SQL database like SQLite: it's embeddable, easily accessed from C or any language, portable, small, fast enough and full of all the features you need and then some.

What you are doing when you use m2sh (from tools/m2sh) to create a configuration for Mongrel2 is working with a View we've given you to create a Model for the Mongrel2 server to work with. That's it, and you can create your own View if you want. It could be automated deployment scripts, a web interface, monitoring scripts, anything you need.

The point is, if you just want to get Mongrel2 up and running, then use m2sh. If you want to do more advanced stuff, then get into the configuration database schema and see what you can do. The structure of the database very closely matches Mongrel2's internal structure, so understanding that means you understand how Mongrel2 works. This is a vast improvement over other web servers like Apache where you've got no idea why one stanza has to go in a
At this point, you should have seen lists of servers and hosts, seen that mongrel2 is not running, and then started it. You can find out about all the commands and get help for them with m2sh help or m2sh help command. You can now try doing some simple starting, stopping and reloading using sudo (make sure you CTRL-c to exit from the previous start command):

Awesome, right? Using just this one little management tool you are able to completely manage a Mongrel2 instance without having to hack on a config file at all. But you probably need to know how this is all working anyway.
3.2.1 What The Hell Just Happened?

You have now done nearly everything you can to a configuration, but you might not know exactly what's going on. Here's an explanation of what's happening behind the scenes:
Source 7 Starting, Stopping, Reloading

    # start it so it runs in the background via sudo
    m2sh start -db tests/config.sqlite -host localhost -sudo
    tail logs/error.log

    # reload it
    m2sh reload -db tests/config.sqlite -host localhost
    tail logs/error.log

    # hit it with curl to see it do the reload
    curl http://localhost:6767/
    tail logs/error.log

    # see if it's running, then stop it
    m2sh running -db tests/config.sqlite -host localhost
    m2sh stop -db tests/config.sqlite -host localhost
1. When you did m2sh start with the -sudo option, it actually ran sudo mongrel2 tests/config.sqlite localhost to start the server.
2. Mongrel2 is now running in the background as a daemon process, just like a regular server. However, what it did was chroot to the current directory and then drop privileges so that they match the owner of that directory (you). Use ps aux to take a look.
3. With Mongrel2 running, you can look in the logs/error.log file to see what it said. It should be a bunch of debug logging, but check out the messages: nice and detailed.
4. Next you did a soft reload with m2sh reload, and you should notice that your mongrel2 process was able to load the new config without restarting.
5. However, there's a slight bug that doesn't do the reload until the next request is served. That's what the curl http://localhost:6767/ was for.
6. Now that you can see this reload work in logs/error.log, you used m2sh running to see if it's running. This command just reads the config database to find out where the PID file is (run/mongrel2.pid) and then checks if that process is running.
7. Finally, you tell mongrel2 to stop, and since it dropped privileges to be owned by you, you can do that without having to use sudo.

All of this happens by reading the tests/config.sqlite file, not by reading any configuration files. You can now try building your own configuration that matches this one or some others.
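The check described in step 6 is easy to reproduce yourself. The following is a minimal Python sketch of that idea, not the actual m2sh implementation: read_pid and is_running are hypothetical helper names, the check is POSIX-only, and the PID file path is hard-coded in the usage note rather than looked up in the config database.

```python
import os


def read_pid(pid_path):
    """Return the PID stored in pid_path, or None if missing or malformed."""
    try:
        with open(pid_path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pid):
    """Check whether a process with this PID exists (POSIX only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)          # signal 0: existence check, nothing is sent
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # process exists but belongs to another user
    return True
```

From the directory you started the server in, is_running(read_pid("run/mongrel2.pid")) would then tell you whether the process named in the PID file is still alive. A stale PID file whose number has been reused by another process would fool this check, which is one reason to treat it as a sketch.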
    main = Server(
        uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
        access_log="/logs/access.log",
        error_log="/logs/error.log",
        chroot="./",
        default_host="localhost",
        name="test",
        pid_file="/run/mongrel2.pid",
        port=6767,
        hosts = [
            Host(name="localhost", routes={
                '/tests/': Dir(base='tests/', index_file='index.html',
                               default_ctype='text/plain')
            })
        ]
    )

    servers = [main]
If you aren't familiar with Python, then this code might look freaky, but it's really simple. We'll get into how it's structured in a second, but to load this file we would just do this:

Source 9 Loading The Simple Config

    m2sh load -config examples/configs/sample.conf
    ls -l config.sqlite
    m2sh servers
    m2sh hosts -server test
    m2sh start -name test
Notice that we didn't have to tell m2sh that the database was config.sqlite. It assumes that is the default, as well as that mongrel2.conf is the config file you want. If you use those two files, then you never have to type those parameters again.

With this sequence of commands you:

1. Create a fresh config database named config.sqlite and load examples/configs/sample.conf into it.
2. List the servers it has configured.
3. List the hosts that server has, along with their routes.
4. Start this server to try it out.

By now you should be getting the hang of the pattern here, which is to use m2sh and a configuration script to generate .sqlite files that Mongrel2 understands.
3.4.1
Server
The server is all about telling Mongrel2 where to listen on its port, where to chroot, and general server specific deployment gear.

uuid: A UUID is used to make sure that each deployed server is unique in your infrastructure. You could easily use any string that's letters, numbers, or - characters.
chroot: This is the directory that Mongrel2 should chroot to before it drops privileges.
access_log: The access log file, relative to the chroot. Usually starts with a /. Make sure you configure your server so that this and other files aren't accessible, or make this owned by root.
error_log: The error log file, just like access_log.
pid_file: Where the PID file is stored within the chroot directory, just like access_log.
default_host: The server has a bunch of hosts listed, but it needs to know what the default host is. This is also used as a convenient way to refer to this Server.
bind_addr: The IP address to bind to; the default is 0.0.0.0.
port: The port the server should listen on for new connections.
3.4.2
Host
A host is matched using a kind of inverse route that matches the ending of Host: headers against a pattern. You'll see how this works when we talk about routes, but for now you just need to know that requests to the Server.port are routed based on the Host configurations the Server contains.

name: The name that you use to talk about this Host in the server configuration.
matching: This is a pattern that's used to match incoming Host headers for routing purposes.
server: If you want to set the server separately, you can use this attribute.
maintenance: This will be a setting for the future that will let you have Mongrel2 throw up a maintenance page for this host.
routes: This is a dict (hashmap) of the URL patterns mapped to the targets that should be run.
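Matching "the ending of Host: headers" can be pictured with a small Python sketch. It is an illustration under assumptions, not Mongrel2's algorithm: pick_host is a hypothetical function, a plain suffix comparison stands in for the real pattern language, and stripping the port and lowercasing the header are choices made here for the example.

```python
def pick_host(host_header, patterns, default_host):
    """Pick the configured host whose pattern is the longest suffix of the
    Host: header; fall back to default_host when nothing matches.

    patterns maps a host name to the suffix it should match.
    """
    name = host_header.split(":", 1)[0].lower()   # drop an optional :port
    best = None
    for host_name, suffix in patterns.items():
        if name.endswith(suffix.lower()):
            if best is None or len(suffix) > len(patterns[best]):
                best = host_name
    return best if best is not None else default_host


# Hypothetical host names and suffixes, for illustration only.
patterns = {"main": "mongrel2.org", "docs": "docs.mongrel2.org"}
```

With these patterns, a header of docs.mongrel2.org:6767 picks docs because it is the longer suffix, www.mongrel2.org picks main, and anything else falls back to the default host.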
3.4.3
Route
The Route is the workhorse of the whole system. It uses some very fancy but still simple code in Mongrel2 to translate Host: headers to Hosts and URL paths to Handlers, Dirs, and Proxies.

path: This is the path pattern that matches a route. The pattern uses the Mongrel2 pattern language, which is a reduced version of the Lua pattern matching system.
reversed: Determines if this pattern is reversed, which is useful for matching file extensions, hostnames, and other naming systems where the ending is really the prefix. Usually you don't set this.
host: You can use this attribute to set the host manually.
target: This is the target that should handle the request, either a Dir, Handler or Proxy.

Later on, you'll learn about the pattern matching that's used. It's basically a stripped down version of your normal regular expressions, with a few convenient syntaxes for doing simple string matching. When you configure a route, you write something like /images/(.*.jpg). The part before the ( is used as a fast matched prefix, while the part inside the parentheses is considered a pattern to match. When a request comes in, Mongrel2 quickly finds the longest prefix that matches the URL, and then tests its pattern if there is one. If the pattern matches, the request goes through. If not, 404.
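The prefix-then-pattern lookup described above can be sketched in Python. Everything here is an approximation for illustration: split_route and match_route are hypothetical names, Python's re module stands in for the Mongrel2 pattern language, a linear scan replaces whatever fast structure Mongrel2 really uses, and requiring the pattern to match the whole remainder of the path is an assumption.

```python
import re


def split_route(route):
    """Split '/images/(.*.jpg)' into ('/images/', '.*.jpg')."""
    prefix, paren, rest = route.partition("(")
    if not paren:
        return route, None                      # plain prefix, no pattern
    return prefix, rest[:-1] if rest.endswith(")") else rest


def match_route(routes, path):
    """Return the target for path, or None (which would mean a 404)."""
    best = None
    for route, target in routes.items():
        prefix, pattern = split_route(route)
        if path.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, pattern, target)
    if best is None:
        return None
    prefix, pattern, target = best
    if pattern is None:
        return target
    # Python's re is only a stand-in for the Mongrel2 pattern language.
    if re.fullmatch(pattern, path[len(prefix):]):
        return target
    return None


# Hypothetical targets, named by strings instead of Dir/Handler/Proxy objects.
routes = {"/": "proxy", "/images/(.*.jpg)": "images", "/tests/": "dir"}
```

Note what happens with /images/cat.png in this model: the longest prefix is /images/, its pattern fails, and the result is a 404 rather than a fall back to the shorter / route, which is how the paragraph above reads.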
3.4.4
Dir
A Dir is a simple directory-serving route target that serves files out of a directory. It has caching built-in, handles if-modified-since, ETags, and all the various bizarre HTTP caching mechanisms as RFC-accurately as possible. It also has default content-types and index files.

base: This is the base directory from the chroot that is served. Files should not be served outside of this base directory, even if they're in the chroot.
index_file: This is the default index file to use if a request doesn't give one. The Dir also will do redirects if a request for a directory doesn't end in a slash.
default_ctype: The default Content-Type to use if none matches the MIMEType table.

Currently, we don't offer more parameters for configuration, but eventually you'll be able to tweak more and more of the settings to control how Dirs work.
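The three behaviours just described (staying inside base, adding the index file, redirecting a directory request that lacks the trailing slash) can be sketched as one small Python function. This is a model of the described behaviour, not Mongrel2's code: resolve is a hypothetical name, and a set of known directory paths stands in for real filesystem checks.

```python
import posixpath


def resolve(base, index_file, rel_path, directories):
    """Map a request path (relative to the route prefix) to an action.

    directories is a set of known directory paths, used here instead of
    touching the filesystem. Returns ("file", path), ("redirect", location)
    or ("forbidden", None).
    """
    root = posixpath.normpath(base)
    full = posixpath.normpath(posixpath.join(base, rel_path))
    # Refuse anything that escapes the base directory, e.g. via "..".
    if full != root and not full.startswith(root + "/"):
        return ("forbidden", None)
    if full in directories:
        if rel_path and not rel_path.endswith("/"):
            return ("redirect", rel_path + "/")
        return ("file", posixpath.join(full, index_file))
    return ("file", full)


# Hypothetical directory layout under the 'tests/' base.
dirs = {"tests", "tests/docs"}
```

For example, resolve("tests/", "index.html", "docs", dirs) asks for a redirect to docs/, while the same call with docs/ yields tests/docs/index.html.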
3.4.5
Proxy
A proxy is used so that you can use Mongrel2 but not have to throw out your existing infrastructure. Mongrel2 goes to great pains to make sure that it implements a fast and dead-accurate proxy system internally, but no matter how good it is, it can't compete with ZeroMQ handlers. The idea with giving Proxy functionality is that you can point Mongrel2 at existing servers, and then slowly carve out pieces that will work as handlers.

addr: The DNS address of the server.
port: The port to connect to.

Requests that match a Proxy route are still parsed by Mongrel2's incredibly accurate HTTP parser, so your backend servers should not be receiving badly formatted HTTP requests. Responses from a Proxy server, however, are sent unaltered to the browser directly.
3.4.6
Handler
Now we get to the best part: the ZeroMQ Handlers that will receive asynchronous requests from Mongrel2. You need to use the ZeroMQ syntax for configuring them, but this means with one configuration format you can use handlers that are using UDP, TCP, Unix, or PGM transports. Most testing has been done with TCP transports.

send_spec: This is the 0MQ sender specification. Something like tcp://127.0.0.1:9999 will use TCP to connect to a server on 127.0.0.1 at port 9999. The type of socket used is a PUSH socket, so that handlers receive messages in round-robin style.
send_ident: This is an identifier (usually a UUID) that will be used to register the send socket. This makes it so that messages are persisted between crashes.
recv_spec: Same as the send_spec, but it's for receiving responses from Handlers. The type of socket used is a SUB socket, so that a cluster of Mongrel2 servers will receive handler responses but only the one with the right recv_ident will process it.
recv_ident: This is another UUID if you want the receive socket to subscribe to its messages. Handlers properly mention the send_ident on all returned messages, so you should either set this to nothing and not subscribe, or set it to the same as send_ident.

The interesting thing about the Handler configuration is that you don't have to say where the actual backend handlers live. Did you notice you aren't declaring large clusters of proxies, proxy selection methods, or anything else, other than two 0MQ endpoints and some identifiers? This is because Mongrel2 is binding these sockets and listening. Mongrel2 doesn't actively connect to backends; they connect to Mongrel2. This means, if you want to fire up 10 more handlers, you just start them; no need to restart or reconfigure Mongrel2 to make them active.
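The recv_ident rule makes more sense once you know that a ZeroMQ SUB socket filters incoming messages by byte prefix, and that an empty subscription accepts everything. The Python sketch below models only that filtering step, without any ZeroMQ sockets. The function names are hypothetical, and the message layout (the identifier first, then a space, then the rest) is an assumption made for the example.

```python
def sub_accepts(subscription, message):
    """Model of SUB filtering: keep a message if it starts with the
    subscription bytes. An empty subscription keeps everything."""
    return message.startswith(subscription)


def servers_that_process(recv_idents, message):
    """Return the names of the servers whose subscription lets message in."""
    return [name for name, ident in recv_idents.items()
            if sub_accepts(ident, message)]


# Hypothetical cluster: two servers subscribed to their own ident, and one
# with recv_ident='' that sees every response.
recv_idents = {
    "m2-a": b"54c6755b-9628-40a4-9a2d-cc82a816345e",
    "m2-b": b"34f9ceee-cd52-4b7f-b197-88bf2f0ec378",
    "m2-all": b"",
}

# Assumed layout: the handler puts the send_ident at the start of its reply.
reply = b"54c6755b-9628-40a4-9a2d-cc82a816345e " + b"rest of the response"
```

Here servers_that_process(recv_idents, reply) gives m2-a and m2-all: the matching subscriber and the one that subscribes to everything.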
3.4.7
Others
There's also Log, MIMEType, and Setting objects/tables you can work with, but we'll get into those later, since you don't need to know about them to understand the Mongrel2 structure.
    # here's a sample directory
    test_directory = Dir(base='tests/',
                         index_file='index.html',
                         default_ctype='text/plain')

    # a sample proxy route
    web_app_proxy = Proxy(addr='127.0.0.1', port=8080)

    chat_demo_dir = Dir(base='examples/chat/static/',
                        index_file='index.html',
                        default_ctype='text/plain')

    # a sample of doing some handlers
    chat_demo = Handler(send_spec='tcp://127.0.0.1:9999',
                        send_ident='54c6755b-9628-40a4-9a2d-cc82a816345e',
                        recv_spec='tcp://127.0.0.1:9998', recv_ident='')

    handler_test = Handler(send_spec='tcp://127.0.0.1:9997',
                           send_ident='34f9ceee-cd52-4b7f-b197-88bf2f0ec378',
                           recv_spec='tcp://127.0.0.1:9996', recv_ident='')

    profiler = Filter(
        name="/home/tordek/src/C/mongrel2/tools/filters/profiler.so",
        settings={}
    )

    # your main host
    mongrel2 = Host(name="mongrel2.org", routes={
        '@chat': chat_demo,
        '/handlertest': handler_test,
        '/chat/': web_app_proxy,
        '/': web_app_proxy,
        '/tests/': test_directory,
        '/testsmulti/(.*.json)': test_directory,
        '/chatdemo/': chat_demo_dir,
        '/static/': chat_demo_dir,
        '/mp3stream': Handler(
            send_spec='tcp://127.0.0.1:9995',
            send_ident='53f9f1d1-1116-4751-b6ff-4fbe3e43d142',
            recv_spec='tcp://127.0.0.1:9994', recv_ident='')
    })

    # the server to run them all
    main = Server(
        uuid="2f62bd5-9e59-49cd-993c-3b6013c28f05",
        access_log="/logs/access.log",
        error_log="/logs/error.log",
        chroot="./",
        pid_file="/run/mongrel2.pid",
        default_host="mongrel2.org",
        name="main",
        port=6767,
        filters = [profiler],
        hosts=[mongrel2]
    )
8. With all those handler targets, we can now make the mongrel2 Host with all the routes assigned at once, nice and clean. However, see how I was lazy and just tossed the mp3stream demo right into the routes dict? You can totally do this and m2sh will figure it out. Remember also that you can use the r'blah' string format to not have to double up on your \ chars in the patterns.
9. We then assign this mongrel2 variable as the hosts for the main server.
10. There is also a settings feature, which is just a dict of global settings you can tweak. In this case, we're upping the number of threads that 0MQ is using for its operations.
11. Finally, we commit the whole thing to the database by passing in the servers to save and the settings to use.

And that, my friends, is the most complex configuration we have so far.
[set] Just like a regex's [], where set is a set of chars, like [0-9] for all digits.
[^set] Inverse character set, so [^0-9] is anything but digits.
* Longest match of 0 or more of the preceding character.
+ Longest match of 1 or more of the preceding character.
- Shortest match of 0 or more of the preceding character.
? 0 or 1 match of the preceding character.
\bxy Balanced match of a substring starting with x and ending in y. So \b() will match balanced parentheses.
$ End of the string.

Using the uppercase version of an escaped character makes it work the opposite way (e.g., \A matches any character that isn't a letter). The backslash can be used to escape the following character, disabling its special abilities (e.g., \\ will match a backslash). Anything that's not listed here is matched literally.

Here are some example routes you can try to get a feel for the system:

"/images/" This will just match any path that starts with /images/, without any patterns.
"/" The fastest possible route you can have.
"/images/(.*.jpg)" Match only requests for jpg images in the images directory. Keep in mind that this isn't actually looking in the directory, it's just matching the (.*.jpg) pattern.
"/images/(\a-\-\d+\.jpg)" A more complex example that matches a short sequence of 0 or more letters (remember -), then a dash (\- escapes the -), then a long sequence of 1 or more digits, and finally a .jpg, with the \. escaping the period.

That should give you an idea of how you can use them. Notice also that I'm using the Python "blah" string syntax, which is interchangeable with the 'blah' syntax, so I don't have to double escape everything.
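If it helps to see the last example route in action, here is a rough Python sketch. It does not use Mongrel2's matcher. It hand-translates the pattern into Python's re syntax, assuming that \a- (shortest run of 0 or more letters) corresponds to [A-Za-z]*? and that \-, \d+ and \. mean the same thing in both. The names PY_PATTERN and image_name and the sample paths are made up for illustration.

```python
import re

# Hand translation of the Mongrel2 route "/images/(\a-\-\d+\.jpg)".
# Assumption: \a- (shortest run of 0 or more letters) becomes [A-Za-z]*?
PY_PATTERN = re.compile(r"/images/([A-Za-z]*?-\d+\.jpg)")


def image_name(path):
    """Return the captured file name, or None if the path does not match."""
    m = PY_PATTERN.match(path)
    return m.group(1) if m else None


if __name__ == "__main__":
    for path in ["/images/cat-42.jpg", "/images/-7.jpg", "/images/cat.jpg"]:
        print(path, "->", image_name(path))
```

The third sample path has no dash and no digits, so it falls through, which is the same thing the route would do with it if the translation assumption holds.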
Note 3
Sorry, Unicodians, It's All ASCII
Yep, I get it. You think that everyone should use UTF-8 or some Unicode encoding for everything. You despise the dominance of the A in ASCII and hate that you can't put your spoken language right in a URL. Well, I hate to say it, but tough. Protocols are hard enough without having to worry about the bewildering mess that is Unicode. When you sit down to write a network protocol, the last thing you need is a format that's inconsistent, has multiple interpretations, can't be properly capitalized or lowercased, and requires extra translation steps for every operation. With ASCII, every computer just knows what it is, and it's the fastest for creating wire protocol formats. This is why, on the Internet, you have to do things to URLs to make them ASCII, like encoding them with % signs. It's in the standard, and it's the smart thing to do. I don't want to have to know the difference between the various accents in your spoken language to route a URL around. I just want to deal with a fixed set of characters and be done with it. Don't blame me or Mongrel2 for this; it's just the way the standard is and the way to get a server that is stable and works. Protocols work better when there's less politics in their design. This means you can't put Unicode into your URL patterns. I mean, you can try, but the behavior is completely undefined.
3.6.1
The routing algorithm is actually kind of simple, but it's an unfamiliar algorithm to most programmers. I won't go into the details of how a Ternary Search Tree works, but basically it lets you match one prefix against a bunch of other strings very fast. This data structure lets Mongrel2 very quickly determine the target for a route, and also know if it has a route at all. Typically, it can match a route in just a few characters, and reject a route in even fewer. For practical usage, it's better to just read how it works, rather than how it's implemented. Here's how Mongrel2 matches an incoming URL against routes you've given it:

1. Your configuration has a route for "/images/(.*.jpg)" and "/users".
2. Mongrel2 loads these and converts them to PREFIX/PATTERN pairs. For the first one the PREFIX=/images/, PATTERN=(.*.jpg). For the second one it's PREFIX=/users and PATTERN=None.
3. It stores these in the URL routes by their PREFIX, and there can be only one PREFIX at a time. This means you can't put "/foo/(.*)" and "/foo/" in at the same time (that's always redundant anyway).
4. A request comes in for /images/hello.jpg so Mongrel2 takes the whole URL and searches for the longest first route that can possibly match. In this case, that's the /images route.
5. It checks if the route it found has a pattern, and if it does then it runs the pattern match code for the whole thing. If they match, then this is the target and it's good. If not, it returns a 404. In this case the /images URL and patterns match so it's good.
6. Next, a request comes in for /users/johndoe/1234.
7. Mongrel2 does the PREFIX search again, and the longest matching prefix is the route for "/users" so it gets that from the routing table.
8. Since the /users route doesn't have a PATTERN, this is the route and it passes by default. No pattern matching code is run.
9. Now for a slightly confusing result: a request comes in for /us. Since a PREFIX for "/users" exists, and it's the longest first match, it will match that route. If you wanted this condition to fail, you'd need to be explicit and add on a pattern like "/users()$" to say you need an exact match. Another option is to give a "/" route for a default location (which usually happens).
10. Finally, a request comes in for /XRAY. This will match no prefix at all, so it gets a 404.

That example should show you how routes work, and the important thing to realize is that they'll try to match the longest first route as what we call the "best" route. If you get unexpected routing behavior, then you'll want to just make them explicit by putting a pattern at the end. Finally, here are some examples directly from the unit test that we have for the routing system.
Imagine we have these routes:

"/" == handler0
"/users/([0-9]+)" == handler1
"/users" == handler2
"/users/people/([0-9]+)$" == handler3
"/cars-fast/([a-z]-)$" == handler4

Then this is how a set of example requests would match:

/users/1234/testing - handler1
/users - handler2
/users/people/1234 - handler3
/cars-fast/cadillac - handler4
/users/1234 - handler1
/ - handler0
/usersBLAHAHAHAHA - handler2
/us - handler2
Work through those in your head so you make sure you understand them.
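If you'd rather check your answers with code, here is a small Python sketch that reproduces the table above. It is a simplified model and not Mongrel2's ternary search tree: it keeps the routes in a plain list, assumes that a URL which runs out inside a longer prefix picks the shortest route at or below it, and otherwise picks the longest prefix the URL starts with. The regexes are my own hand translations of the route patterns into Python's re syntax (for example [a-z]- becomes [a-z]*?), and ROUTES and route are hypothetical names.

```python
import re

# (prefix, whole-route regex or None, handler name)
# The regexes are hand translations into Python syntax, an assumption.
ROUTES = [
    ("/", None, "handler0"),
    ("/users/", r"/users/([0-9]+)", "handler1"),
    ("/users", None, "handler2"),
    ("/users/people/", r"/users/people/([0-9]+)$", "handler3"),
    ("/cars-fast/", r"/cars-fast/([a-z]*?)$", "handler4"),
]


def route(url):
    """Return the handler name for url, or None for a 404."""
    # Case 1: the URL ends at or before the end of some prefix ("/us").
    below = [r for r in ROUTES if r[0].startswith(url)]
    if below:
        best = min(below, key=lambda r: len(r[0]))
    else:
        # Case 2: take the longest prefix that the URL starts with.
        above = [r for r in ROUTES if url.startswith(r[0])]
        if not above:
            return None
        best = max(above, key=lambda r: len(r[0]))
    prefix, pattern, handler = best
    # Routes without a pattern pass by default; others must match the URL.
    if pattern is None or re.match(pattern, url):
        return handler
    return None


if __name__ == "__main__":
    for url in ["/users/1234/testing", "/users", "/us", "/cars-fast/cadillac"]:
        print(url, "-", route(url))
```

Note that, as in step 5 of the walkthrough, a failed pattern is a 404 here; the sketch does not fall back to a shorter prefix.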
3.6.2
Mongrel2 works with Flash sockets out of the box (with WebSockets coming soon) and can handle either XML messages or special JSON messages. It does this by modifying the parser it has internally to parse out HTTP or (exclusive) XML and JSON messages. This feature can be used by any TCP client, not just Flash; it just happens to be a simple way to send simple async messages without using HTTP.

To make it work, there's a slight modification to the routes used by JSON or XML messages. Basically, JSON routes start with a @ and XML routes start with a <, and both must be terminated with a NUL byte \0. When the parser sees these at the beginning of a request, it parses that message and sends it as-is to your target handler. Let's look at two examples from the chat demo and from some test suites:

"@chat": chat_demo
"<test": xml_demo

The first one will take any Flash (or just TCP) connection that sends lines like @chat {"msg": "hello"}\0 and route those to the chat_demo handler. You can connect, and then just stream these JSON messages all you want, and handlers can send back the same responses. In fact, as long as you don't include a \0 character, you could probably send anything you want.

The second route will take any XML that is wrapped in a <test> tag and send that to your handlers. That means you can send <test name="joe"><age>21</age></test> and it will send it to xml_demo.

This is powerful because Mongrel2 now becomes a generic XML or JSON messaging server very easily. For example, I wrote a simple little BBS demo with Mongrel2 and wrote a very basic terminal client in Python for people to use instead of the browser. Look at examples/bbs/client.py to see how that works in full, but the meat of it is:

Source 11
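import json
import socket
from base64 import b64decode

# host and port must be defined before this runs
CONN = socket.socket()
CONN.connect((host, port))

def read_msg():
    # read one byte at a time until the NUL terminator
    reply = ""
    ch = CONN.recv(1)

    while ch != '\0':
        reply += ch
        ch = CONN.recv(1)

    # responses come back base64 encoded
    return json.loads(b64decode(reply))

def post_msg(data):
    # "@bbs" is the route, then the JSON body, then the NUL terminator
    msg = '@bbs %s\x00' % (json.dumps({'type': 'msg', 'msg': data}))
    CONN.send(msg)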
In that code, notice how (for historical reasons due to Flash sucking) the response is base64 encoded, but your handler doesn't have to do that. You can just adopt the same protocol back. Other than that, the BBS example client is just opening a socket and sending messages, and Mongrel2 is converting them to messages to backend handlers for processing. Finally, here are the grammar rules in the parser for handling these messages:

Source 12 JSON/XML Message Grammar
rel_path = ( path? (";" params)? ) ("?" query)?;
SocketJSONStart = ("@" rel_path);
SocketJSONData = "{" any* "}" :>> "\0";
SocketXMLData = ("<" [a-z0-9A-Z\-.]+) ("/" | space | ">") any* ">" :>> "\0";
SocketJSON = SocketJSONStart " " SocketJSONData;
SocketXML = SocketXMLData;
If you read that carefully, you'll see you can actually pass query strings and path parameters to your JSON socket handlers. That's currently not used, but in the future we might.
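To make the framing concrete, here is a minimal Python 3 sketch of the JSON socket message format described above: a @ route, a space, a JSON object, and a NUL byte. It only builds and splits the bytes and does not talk to Mongrel2. The function names frame_json, split_frames and parse_frame are made up for this sketch.

```python
import json


def frame_json(route, payload):
    """Build one JSON socket message: '@route {json}' plus a NUL byte."""
    body = json.dumps(payload)
    return ("@%s %s" % (route, body)).encode("ascii") + b"\0"


def split_frames(buffer):
    """Split received bytes on NUL; return (complete frames, leftover)."""
    *frames, rest = buffer.split(b"\0")
    return frames, rest


def parse_frame(frame):
    """Split one '@route {json}' frame into (route, decoded payload)."""
    head, _, body = frame.decode("ascii").partition(" ")
    return head[1:], json.loads(body)


if __name__ == "__main__":
    wire = frame_json("chat", {"msg": "hello"})
    frames, rest = split_frames(wire + b"@chat {")
    print(frames, rest)
    print(parse_frame(frames[0]))
```

Keeping the leftover bytes around matters on a real socket, since a recv can end in the middle of a message.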
One caveat to this whole feature is that these targets can only be routed to the Server.default_host of the server. There's not enough information in these routes to determine a target host (like the Host: header in HTTP), so you can only send it to the default target host.
> m2sh log
[2010-07-18T04:14:53, mongrel2@zedshaw, init_command] /usr/bin/m2sh init
[2010-07-18T04:15:06, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:22:06, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:23:32, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T04:26:16, mongrel2@zedshaw, upgrade] Latest code for Mongrel2.
[2010-07-18T18:05:59, mongrel2@zedshaw, load_command] /usr/bin/m2sh load
[2010-07-18T20:09:01, mongrel2@zedshaw, init_command] /usr/bin/m2sh config
[2010-07-18T20:09:02, mongrel2@zedshaw, load_command] /usr/bin/m2sh config
> m2sh commit -what mongrel2.org -why "Testing things out."
The motivation for this feature is the trend that ops stores server configurations in revision control systems like git or etckeeper. This works great for holding the configuration files, but it doesn't tell you what happened on each server. In many cases, the configuration files also need to be reworked or altered for each deployment. With the m2sh log and commit system, you can augment your revision control with deployment action tracking.
Later versions of Mongrel2 will keep small amounts of statistics which will link these actions to changes in Mongrel2 behavior like frequent crashing, failures, slowness, or other problems. Basically, there's nowhere to hide. Mongrel2 will help operations figure out who needs to get fired the next time Twitter goes down.
kill id=ID Does a forced close on the socket that is at this ID from the status net command. This is a rather violent way to kill a connection so don't do it that often, but if you're overloaded then this is where to go.
control_stop Shuts down the control port permanently in case you want to keep it from being accessed for some reason.

You then use the control port by running m2sh:

m2sh control -every
m2 [test]> help
name          help
stop          stop the server (SIGINT)
reload        reload the server
help          this command
control_stop  stop control port
kill          kill a connection
status        status, what=['net'|'tasks']
terminate     terminate the server (SIGTERM)
time          the server's time
uuid          the server's uuid
info          information about this server

m2 [test]> info
port: 6767
bind_addr: 0.0.0.0
uuid: f400bf85-4538-4f7a-8908-67e313d515c2
chroot: ./
access_log: .//logs/access.log
error_log: /logs/error.log
pid_file: ./run/mongrel2.pid
default_hostname: localhost
m2 [test]>

The protocol to and from the control socket is a simple tnetstring in and out that any language can read. Here's a nearly complete Python client that is using the control port:

# ctl is assumed to be a 0MQ socket already connected to the control port
while True:
    cmd = raw_input("> ")
    # will only work with simple commands that have no arguments
    ctl.send(tnetstrings.dump([cmd, {}]))

    resp = ctl.recv()

    pprint(tnetstrings.parse(resp))

ctl.close()

You obviously don't need to do this, but should you want to do something special like a management interface, this is your start.
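To show what actually goes over the control socket, here is a small Python 3 sketch of tnetstring encoding for the kind of message the client above sends, a list holding a command name and an empty dict. It is not the tnetstrings module the client imports; tnet_dump is a hypothetical helper that covers only strings, integers, lists and dicts, based on my reading of the tnetstring format (length, colon, payload, then a one-character type tag).

```python
def tnet_dump(value):
    """Encode a small subset of Python values as tnetstring bytes."""
    if isinstance(value, str):
        payload, tag = value.encode("utf-8"), b","      # string
    elif isinstance(value, int) and not isinstance(value, bool):
        payload, tag = str(value).encode("ascii"), b"#"  # integer
    elif isinstance(value, list):
        payload, tag = b"".join(tnet_dump(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(
            tnet_dump(k) + tnet_dump(v) for k, v in value.items()
        )
        tag = b"}"
    else:
        raise TypeError("unsupported type: %r" % type(value))
    # length of the payload in bytes, a colon, the payload, the type tag
    return str(len(payload)).encode("ascii") + b":" + payload + tag


if __name__ == "__main__":
    print(tnet_dump(["help", {}]))
    print(tnet_dump(["status", {"what": "net"}]))
```

Because every element carries its own length, a reader never has to scan for delimiters, which is a large part of why the format is easy to implement in any language.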
...
Launching server localhost XXX on port 6767
...
> m2sh start -db config.sqlite -host localhost
Not sure which server to run, what I found:
NAME HOST UUID
--------------
localhost localhost XXX
localhost localhost XXX
* Use -every to run them all.
> m2sh start -db config.sqlite -uuid XXX
Launching server localhost XXX on port 6767
...
> m2sh running -db config.sqlite -every
Found server localhost XXX RUNNING at PID 28525
PID file run/mongrel2.pid not found for server localhost XXX
> m2sh stop -db config.sqlite -every
Mongrel2 will read these on the fly and write INFO log messages telling you what the settings are so you can debug them if they cause problems. The list of available settings is:
control_port=ipc://run/control This is where Mongrel2 will listen with 0MQ for control messages. You should use ipc:// for the spec, so that only a local user with file access can get at it.
limits.buffer_size=2 * 1024 Internal IO buffers, used for things like proxying and handling requests. This is a very conservative setting, so if you get HTTP headers greater than this, you'll want to increase this setting. You'll also want to shoot whoever is sending you those requests, because the average is 400-600 bytes.
limits.client_read_retries=5 How many times it will attempt to read a complete HTTP header from a client. This prevents attacks where a client trickles an incomplete request at you until you run out of resources.
limits.connection_stack_size=32 * 1024 Size of the stack used for connection coroutines. If you're trying to cram a ton of connections into very little RAM, see how low this can go.
limits.content_length=20 * 1024 Maximum allowed content length on submitted requests. This is, right now, a hard limit so requests that go over it are rejected. Later versions of Mongrel2 will use an upload mechanism that will allow any size upload.
limits.dir_max_path=256 Max path length you can set for Dir handlers.
limits.dir_send_buffer=16 * 1024 Maximum buffer used for file sending when we need to use one.
limits.fdtask_stack=100 * 1024 Stack frame size for the main IO reactor task. There's only one, so set it high if you can, but it could possibly go lower.
limits.handler_stack=100 * 1024 The stack frame size for any Handler tasks. You probably want this high, since there aren't many of these, but adjust and see what your system can handle.
limits.handler_targets=128 The maximum number of connection IDs a message from a Handler may target. It's not smart to set this really high.
limits.header_count=128 * 10 Maximum number of allowed headers from a client connection.
limits.host_name=256 Maximum hostname length for Host specifiers and other DNS related settings.
limits.mime_ext_len=128 Maximum length of MIME type extensions.
limits.proxy_read_retries=100 The number of read attempts Mongrel2 should make when reading from a backend proxy. Many backend servers don't buffer their I/O properly and Mongrel2 will ditch their HTTP response if it doesn't get a header after this many attempts.
limits.proxy_read_retry_warn=10 This is the threshold where you get a warning that a particular backend is having performance problems, useful for spotting potential errors before they become a problem.
limits.url_path=256 Max URL path length. Does not include the query string, just the path.
superpoll.hot_dividend=4 Ratio of the total (like 1/4th, 1/8th) that should be in the hot selection. Set this higher if you have lots of idle connections; set it lower if you have more active connections.
superpoll.max_fd=10 * 1024 Maximum possible open files. Do not set this above 64 * 1024, and expect it to take a bit while Mongrel2 sets up constant structures.
upload.temp_store=None This is not set by default. If you want large requests to reach your handlers, then set this to a directory they can access, and make sure they can handle it. Read about it in the Hacking section under Uploads. The file has to end in XXXXXX chars to work (read man mkstemp).
upload.temp_store_mode=0666 The mode to chmod any files uploaded to upload.temp_store.
zeromq.threads=1 Number of 0MQ IO threads to run. Careful, we've experienced thread bugs in 0MQ sometimes with high numbers of these.
limits.tick_timer=10 Mongrel2 keeps an internal clock for efficiency and to run the timeouts. This is how often that clock updates, and defaults to 10 seconds.
limits.min_ping=120 Minimum time since last activity before considering closing a socket. Set to 0 to disable it.
limits.min_write_rate=300 Minimum bytes/second written before considering closing a socket. Set to 0 to disable it.
limits.min_read_rate=300 Minimum bytes/second read before considering closing a socket. Set to 0 to disable it.
limits.kill_limit=2 How many of min_ping, min_write_rate, and min_read_rate have to trigger before a socket is killed.

You can also update your mimetypes in the same way, just set a variable with them:
Changing Mimetypes
I actually have a shell script kind of like this since I can never remember how to set this stuff up with openssl. Also, you should really adjust the RSA key strength from 512 to something you're comfortable with. I'm using a weak key here so you can do performance testing and thrashing and then compare with your real key later. Once you have that done, you just have to add three little settings to your mongrel2 conf:

1. Add the setting certdir pointed at ./certs/, and make sure it has the trailing slash.
2. Add the Server.use_ssl = 1 value to the Server that has the UUID you just created a cert for.
3. Optionally, set the setting ssl_ciphers to SSL_RSA_RC4_128_SHA so you can play with the performance of a weak cipher. If you unset this then Mongrel2 will use the best one a browser wants.

After you have those changes your config should look something like this:
Source 18
main = Server(
    uuid="2f62bd5-9e59-49cd-993c-3b6013c28f05",
    use_ssl=1,
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    pid_file="/run/mongrel2.pid",
    default_host="mongrel2.org",
    name="main",
    port=6767,
    hosts=[mysite]
)

settings = {
    "certdir": "./certs/",
    "ssl_ciphers": "SSL_RSA_RC4_128_SHA"
}

servers = [main]
Get that written, rerun m2sh config to make the new config, restart Mongrel2 (you can't reload to enable SSL), and it should be working. After you get this working you just have to get your own certificate, put it in the certs directory with the right filename, and you should be good to go.
3.11.1
We've got experimental SSL caching working, which will try to reuse the browser's SSL session if it's there. This is meant to be a trade-off between memory and performance, so it can chew a bunch of RAM if you have a lot of SSL traffic over a short period of time. We'll be making the caching more configurable, but for now, it's working and does speed up SSL clients that do it properly.
Source 19
null = Filter(name="/usr/local/lib/mongrel2/filters/null.so", settings={
    "extensions": ["*.html", "*.txt"],
    "min_size": 1000
})

main = Server(
    uuid="f400bf85-4538-4f7a-8908-67e313d515c2",
    access_log="/logs/access.log",
    error_log="/logs/error.log",
    chroot="./",
    default_host="localhost",
    name="test",
    pid_file="/run/mongrel2.pid",
    port=6767,
    hosts = [
        Host(name="localhost", routes={
            '/tests/': Dir(base='tests/', index_file='index.html',
                           default_ctype='text/plain'),
            '/nulltest/': Proxy(addr='127.0.0.1', port=8080)
        })
    ],
    filters = [null]
)

servers = [main]
Chapter 4
Deploying
I am now going to try to get you to set up a small, tiny, little version of a good deployment that matches the configuration of the site at http://mongrel2.org, with all the examples running. This configuration will give you all the tools you need to make automated and managed deployments, but it is using small scale tools. The idea is that you learn what is involved in a nice, easy-to-manage setup, using simple things first, then you can extrapolate that out into your own setup or something better.
Note 4
CHAPTER 4. DEPLOYING
Learning Python
Why should you learn programming? The trend is that if you are a system administrator who can't code, you are on your way out. Eventually, you'll be in charge of automating systems, not manually managing them, and if you don't believe me then what do you think all those managed service companies are doing? Alright, so you need to learn to code, but most of the books suck for really learning if you know nothing. This is why I started my own book, Learn Python The Hard Way, for people who know nothing about programming but need or want to learn. It teaches Python, but it mostly teaches all the things programmers actually learn before they learn programming. When you're done with my book, you'll have your programming brown belt. That means you can then move on to one of many other free online books and really learn programming, and have a higher chance of actually learning it. If you can't code Python then you can probably muddle through this and you may learn something, but learning Python will be important later. But don't read Dive Into Python. It is a horrible introduction.
4.1.1
Introducing procer
When I started working on this little manual, I wanted to get you into setting up a well-managed and automated deployment system. The m2sh program does much of the automation you need, but Mongrel2 also has to talk to quite a few separate little pieces that run as separate processes. Trying to juggle all these processes without a tool to help is a nightmare. You end up writing init scripts and merging them into your boot process and all sorts of crazy antics just so you can run a stupid "hello world" demo.

What I needed was a user space process manager. These are programs that run other programs, but, more importantly, try to keep those other programs running without much human intervention. When you need to deploy a ton of processes that all have to be running, these USPMs are fantastic. They usually read some startup profile describing what needs to start and what it depends on, and then they kick everything into gear and watch it. If any of the processes crash, they try to restart them. Very simple.

There's just one catch: all of them suck. There's daemontools, which barely builds (if at all) and then assumes that daemons don't fork. Stupid. There's minit, which bafflingly required dietlibc to even compile and assumed it was going to be the one true init (not user space at all). There's cinit, which got through a compile, then barfed on its documentation, and the end result is some huge number of weird shell scripts to make it work, and, again, it wants to be the one true init. Finally, runit is some of the worst C code I've seen in years and has the same weird design as daemontools.

After trying every single one, I just gave up. They either didn't build, were too complex, expected to be the one true init, were poorly documented, or weren't maintained, and they were definitely not going to work for this manual. My only choice was to shave a yak and write my own.

The end result is procer, which lives in examples/procer and does most of what you need in a USPM. It works a lot like daemontools or minit, but is much simpler, with these differences:

1. It is much simpler, with only a single command to start all your stuff and keep it running.
2. It will build anywhere Mongrel2 builds, because it reuses the libm2.a library from the Mongrel2 project.
3. It doesn't want to be the one true init, or even expect to be running constantly. You can start it and stop it and it will only run what's not already running.
4. It assumes that programs will always daemonize and create a PID file. This turns out to be way easier to manage than what daemontools does, so I'm sort of baffled why daemontools is how it is.
5. It has dependency management so that you can have processes start only after others have finished.
6. It still uses simple files to configure itself that are in separate directories.
7. It can be run as root and, like Mongrel2, it will drop privileges to the owner of the profile directory before it runs the command. This is incredibly useful because it lets you set up scripts that run as other users without much configuration or fuss.
8. It is dinky, tiny and well written so you can understand it, even though it's written in C.
9. Best of all, I can use it in this book and you won't go insane trying to install it or use it like the others.

Of course, if you have something else you like then, please, use it. Anything that automates process management will be your friend. In this manual, to keep things simple and easily understood, I'll be using procer to tell you how to set up everything.
Note 5
Alternatives to procer
I wrote procer mostly for this book, but I also use it for my Mongrel2 deployments. It works for me but you can try other solutions. By default, Mongrel2 will work with either daemontools/runit style, or init.d style launchers. If Mongrel2 runs as a regular user, it assumes that you want runit style (don't fork, write to stdout/stderr). If you run as root, it assumes you want init.d style like what procer uses (fork, drop priv, chroot, etc.). You should check out proclaunch as another alternative that is similar to procer, and inspired by procer, but written in Perl with a few more features. Either way, Mongrel2 is practical, and does generally the right thing with today's tools. Want to use daemontools? Fine, just run it as mongrel2 config.sqlite server_uuid and it'll work right. Want to put it in init.d or use procer or similar? Fine, run it as root.
4.1.2
Installing procer
Installing procer is very easy. It's a single little binary and it lives in tools/procer in the Mongrel2 source. Here's how you'd install it totally from scratch as if you hadn't even built Mongrel2 yet: Source 20
cd projects/mongrel2
make clean all && sudo make install
Install procer
That's the entire install process, and now procer is in /usr/local/bin so you can use it. In the rest of this chapter you'll learn how to use procer by just setting up the Mongrel2 demo completely and messing around with it.
5. Test out that procer is keeping things running and play with taking things down and up and using m2sh to work with the deployment.

Once you have this setup working, you can then start to make your own deployments and tweak things as you need for your own applications. Remember that the goal is to get you to automate everything as much as possible, so if you can go further than this then do it.
Hopefully, you're starting to see how you could easily automate this so that you don't have to do this all the time. I'm just showing you how to make the sausage so that you know where everything goes. Future versions of m2sh will most likely create deployment directories like this automatically. What we've done here is the following:
1. Set up a ~/deployment directory we'll put everything in.
2. Created run, tmp, logs, and profiles that Mongrel2 and procer need to run.
3. In profiles we started dirs for chat, mp3stream, handlertest, web and mongrel2, that procer will read files out of to get all our gear up and running.
4. Copied the mongrel2.conf example file over to our deployment so we can modify it.
5. Initialized the config.sqlite file we'll be filling in with our modified mongrel2.conf.
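Since the whole point is automation, here is a minimal Python 3 sketch of the directory part of those steps (items 1 to 3). It only creates directories; copying mongrel2.conf and initializing config.sqlite are left to m2sh. The function name make_layout is hypothetical, and the demo below builds the tree inside a throwaway temp directory rather than your real home directory.

```python
import tempfile
from pathlib import Path

# Directory names taken from the list above.
TOP_LEVEL = ["run", "tmp", "logs", "profiles"]
PROFILES = ["chat", "mp3stream", "handlertest", "web", "mongrel2"]


def make_layout(root):
    """Create the deployment directories under root and return its Path."""
    root = Path(root)
    for name in TOP_LEVEL:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in PROFILES:
        (root / "profiles" / name).mkdir(exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        root = make_layout(Path(scratch) / "deployment")
        print(sorted(p.name for p in (root / "profiles").iterdir()))
```

Using exist_ok=True means you can rerun it against an existing deployment without it complaining.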
m2sh start -db config.sqlite -host localhost
# hit CTRL-C to exit out
m2sh start -db config.sqlite -host localhost -sudo
less logs/error.log
m2sh stop -db config.sqlite -host localhost -murder
handlertest
mongrel2
mp3stream
web
# make all the restart settings
for i in *; do touch $i/restart; done
# make all the empty dependencies
for i in *; do touch $i/depends; done
# setup the pid_files to some sort of default
for i in *; do echo $PWD/$i/$i.pid > $i/pid_file; done
cat chat/pid_file
# get the run script setup to do nothing
for i in *; do echo '#!/bin/sh' > $i/run; done
for i in *; do chmod u+x $i/run; done
# check out what we did
ls -lR
With all of that, you can then try to run procer to watch it fail but still try to run everything:

sudo procer $PWD $PWD/../run/procer.pid
less error.log
This is assuming that you are still in the profiles directory. You should see the file error.log get created and probably some messages printed to the screen. Just ignore any mention of Mongrel2 since that's probably just cruft from the libm2.a we haven't removed. Take a look in the error.log and you'll see it's not necessarily errors but information on how things were run. You should see something like this for each profile:

Source 24 First Dummy Run Of procer
DEBUG procer.c:232: Loading 5 actions.
DEBUG procer.c:83: STARTED chat
ERROR Failed to open PID file /home/zedshaw/deployment/profiles/chat/chat.pid for reading.
ERROR Failed to open PID file /home/zedshaw/deployment/profiles/chat/chat.pid for reading.
INFO No previous Mongrel2 running, continuing on.
DEBUG procer.c:37: ACTION: command=/home/zedshaw/deployment/profiles/chat/run, pid_file=/home/zedshaw/deployment/profiles/chat/chat.pid, restart=1, depends=(null)
DEBUG procer.c:56: WAITING FOR CHILD.
INFO Now running as UID:1000, GID:1000
DEBUG procer.c:60: Command ran and exited successfully, now looking for the PID file.
ERROR chat didn't make pidfile /home/zedshaw/deployment/profiles/chat/chat.pid.
I've cleaned this up a bit and, again, ignore that it's saying Mongrel2; that's just cruft from the library since it was originally designed for Mongrel2. What you can see here is the following:

1. It starts up and says it found 5 profiles.
2. It starts chat, and says there's no PID file so it's good to continue.
3. It reports what ACTION it's running, so you can see the config.
4. It spawns off your run script, drops privilege and says it's WAITING for your script to exit.
5. After your script runs, it looks for the PID file you gave in pid_file and, if it's not there, it exits that action.
6. It does this for all of them and, since none of them run right, procer exits.
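The per-profile decision in that log can be sketched in a few lines of Python 3. This is not procer's code (procer is written in C); it is a simplified illustration that assumes the profile layout built in this chapter: a run script, a pid_file holding the path to the real PID file, an empty restart file whose presence means "restart", and a depends file that may be empty. The names load_profile, is_running and needs_start are hypothetical, and the os.kill check is POSIX only.

```python
import os
from pathlib import Path


def load_profile(profile_dir):
    """Read the four files a profile directory holds in this chapter."""
    d = Path(profile_dir)
    return {
        "command": str(d / "run"),
        "pid_file": (d / "pid_file").read_text().strip(),
        "restart": (d / "restart").exists(),  # assumption: presence means yes
        "depends": (d / "depends").read_text().split(),
    }


def is_running(pid_file):
    """True if pid_file exists and names a live process (POSIX only)."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return False  # no PID file, or junk in it: treat as not running
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def needs_start(profile_dir):
    """A profile needs starting when its PID file names no live process."""
    return not is_running(load_profile(profile_dir)["pid_file"])
```

With the dummy run scripts from the previous listing, needs_start would be true for every profile, which matches the "didn't make pidfile" errors in the log.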
echo "m2sh start -db config.sqlite -host localhost" >> profiles/mongrel2/run
# check out the results
cat profiles/mongrel2/run
#!/bin/sh
cd /home/YOU/deployment
m2sh start -db config.sqlite -host localhost
Obviously, you don't have to use a series of echo commands to make these scripts. You can edit them just fine; we're just doing it this way so that you can follow along more easily. Now, make sure you don't have any other Mongrel2 processes running, and then start procer again to see if it starts this configuration correctly. Source 26
cd ~/deployment
# clear out the error.log for testing
rm profiles/error.log
# start procer
sudo procer $PWD/profiles $PWD/procer.pid
# see if procer is running
ps ax | grep procer
# should see:
# 17934 ?
To watch procer in action, try doing m2sh stop -db config.sqlite -host localhost -murder and then look at profiles/error.log and watch Mongrel2 come right back.
4.5.1
We've got a good setup of procer going and it keeps Mongrel2 running, so let's set up a similar thing for each of our little Python demos that we'll need. In order to do this, though, we sort of have to hack in making them daemonize and create PID files with a little shell script help. Let's start with the chat demo and, assuming your mongrel2 source is in ~/projects/mongrel2, you will change profiles/chat/run to be like this: Source 27
#!/bin/sh
set -e

DEPLOY=/home/YOU/deployment
SOURCE=/home/YOU/projects/mongrel2

cd $SOURCE/examples/chat

# WARNING: on some systems the nohup doesn't work, like OSX
# try running it without
nohup python -u chat.py 2>&1 > chat.log &
echo $! > $DEPLOY/profiles/chat/chat.pid
This little script uses some funky features you might not be familiar with, but which are nice to learn, so let's take a look:

1. The first trick is set -e, which tells the shell to bail if there are any errors in your script. This is a huge life saver in system scripts.
2. Next, you point some variables at where the deployment and Mongrel2 source live, remembering to not type YOU but your username.
3. After that, you run chat.py using a program called nohup. This basically daemonizes your script by redirecting output and preventing the program from exiting, and then you background it with &.
4. The final thing we do is echo the magic variable $! (the PID of the last process started in the background) to the chat.pid file in the profile directory.

When you run this manually, you should see something like this:

./profiles/chat/run
nohup: redirecting stderr to stdout
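If you would rather do the same trick from Python than from a shell script, here is a minimal Python 3 sketch of the idea: start a process in the background, send its output to a log file, and write its PID to a file for the process manager to find. It is an illustration only, not part of Mongrel2 or procer. The name start_in_background is hypothetical, and start_new_session is my stand-in for nohup on POSIX systems, which is close in effect but not identical.

```python
import subprocess
import sys
from pathlib import Path


def start_in_background(argv, log_path, pid_path):
    """Start argv with output going to log_path and record its PID."""
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,  # stderr follows stdout into the log
            start_new_session=True,    # detach from our terminal (POSIX)
        )
    # Same job as: echo $! > chat.pid
    Path(pid_path).write_text("%d\n" % proc.pid)
    return proc


if __name__ == "__main__":
    child = start_in_background(
        [sys.executable, "-u", "-c", "print('hello from the child')"],
        "child.log",
        "child.pid",
    )
    child.wait()
    print(Path("child.pid").read_text().strip(), Path("child.log").read_text())
```

The child keeps its own copy of the log file handle, so closing it in the parent right after Popen is safe.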
After all that, you can then try out procer again to see if it properly runs the chat demo as well as mongrel2:
Source 28
# run procer to get stuff started
sudo procer $PWD/profiles $PWD/run/procer.pid
# see if it's all running
ps ax | grep procer
# should see:
# 19607 ?
Ss
Ssl
Sl
# try killing chat to see if it comes back
kill -TERM `cat profiles/chat/chat.pid`
ps ax | grep chat
# should see:
# 19669 ?
Sl
If you go look at profiles/error.log, you'll see that procer is also running each of them as the right user, with chat being run as you, but Mongrel2 being run as root so it can chroot/drop privileges properly. Rather than give you a walk through each of these setups, here are the run scripts for the remaining files:
Source 29
profiles/handlertest/run

#!/bin/sh
set -e

DEPLOY=/home/YOU/deployment
SOURCE=/home/YOU/projects/mongrel2

cd $SOURCE/examples/http_0mq

# WARNING: on some systems the nohup doesn't work, like OSX
# try running it without
nohup python -u http.py 2>&1 > http.log &
echo $! > $DEPLOY/profiles/handlertest/handlertest.pid

profiles/mp3stream/run

#!/bin/sh
set -e

DEPLOY=/home/YOU/deployment
SOURCE=/home/YOU/projects/mongrel2

cd $SOURCE/examples/mp3stream

# WARNING: on some systems the nohup doesn't work, like OSX
# try running it without
nohup python -u handler.py 2>&1 > mp3stream.log &
echo $! > $DEPLOY/profiles/mp3stream/mp3stream.pid

profiles/web/run

#!/bin/sh
set -e

DEPLOY=/home/YOU/deployment
SOURCE=/home/YOU/projects/mongrel2

cd $SOURCE/examples/chat

# WARNING: on some systems the nohup doesn't work, like OSX
# try running it without
nohup python -u www.py 2>&1 > www.log &
echo $! > $DEPLOY/profiles/web/web.pid
4.5.2
Once everything is running and procer is maintaining it, you just need to see if things work. Here are some curl commands to try:
Source 30
curl http://localhost:6767/
# Hello, World!
curl http://localhost:6767/handlertest
4.5.3
There are some nice subtle features you get from using procer to run your stuff:
Faster Development A great thing about procer is that once you get all of this set up, it cuts down on a lot of your setup time and development time because it will properly restart things for you. This means you can simply make changes to code or configs, and then just kill the process and procer will kick it back over automatically.
Easy Automation You should start to see how you could automate creating profiles for new processes since the setup is consistent.
profiles/run.log All your commands will have their output sent to this file so you can see how they might be blowing up in your scripts.
Restart State Maintained Since procer is just tracking PID files and processes, if you shut it down, it won't kill the world. When you start it back up, it just starts new stuff or stuff it needs, then goes back to supervising. This means you can change the configs for procer then just kick it over and it'll do the right thing.
The key thing, though, is that you now have the whole application for the mongrel2.org demo up and running, including automated process management, configuration, and managing everything.
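Since every run script shown earlier follows the same template, the "Easy Automation" point can be sketched as a bit of string formatting. This is only an illustration: the helper name make_run_script and its parameters are hypothetical, and the template is a simplification that names the log file after the profile (the handlertest and web scripts above use different log names).

```python
# Hypothetical helper: renders a run script shaped like the ones shown earlier.
# The template text mirrors those scripts; nothing here is part of procer itself.
RUN_TEMPLATE = """#!/bin/sh
set -e

DEPLOY={deploy}
SOURCE={source}

cd $SOURCE/{workdir}

nohup python -u {script} 2>&1 > {name}.log &
echo $! > $DEPLOY/profiles/{name}/{name}.pid
"""


def make_run_script(name, workdir, script, deploy, source):
    """Return the text of a run script for one profile."""
    return RUN_TEMPLATE.format(name=name, workdir=workdir, script=script,
                               deploy=deploy, source=source)


if __name__ == "__main__":
    # Print the chat profile's script; write it to profiles/chat/run yourself.
    print(make_run_script("chat", "examples/chat", "chat.py",
                          "/home/YOU/deployment",
                          "/home/YOU/projects/mongrel2"))
```

You would still need to create the profile directory and make the file executable, which this sketch leaves out.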
Source 31
Setting Up Static Content
cp -r ~/projects/mongrel2/examples/chat/static static/chatdemo
m2sh stop -db config.sqlite -host localhost -murder
curl -I http://localhost:6767/chatdemo/
and it came back because of procer. If you do your curl check too fast, you might miss it, so just wait a bit.
clusters. I know that as long as there's the right user on the target, I'm set.
5. Use GNU screen or die.
6. Keep your config.sqlite and the .conf file in your chroot, and keep your content and everything else under that. This makes sure that the config isn't accessible outside your content directories. Mongrel2 helps you get this right by not allowing certain Dir configurations that would expose your chroot to the world.
There are a few additional tips for people who want to use alternative process supervision like daemontools, runit, or init.d setups. No matter what you use, you should probably follow this advice:
1. Whatever you use for process management, make sure it can run stuff as not root and can do chroot for you. If you're running your Mongrel2 as root, you're doing it wrong. Actually, if you're running any services as root that don't absolutely need to be, you're doing it wrong.
2. Mongrel2 is happy to run as a regular user, and assumes that if you do not run as root, then you probably want to run under daemontools or similar. It won't chroot or drop priv, and it logs to stdout/stderr.
3. If you need to bind to port 80 but run under daemontools as a regular user, then use privbind to do it. This tool will run any command, like mongrel2, but it does it in a way that lets the executable grab ports below 1024. This restriction on ports is actually really stupid so don't worry about doing this.
4. Make sure your process monitor is not a single point of failure. Some of them out there will take your whole world down if they crash. Try doing a harsh kill on your process manager and see how it behaves. As much as they like to tell you not to worry about this because they run forever, everything has bugs and stupid people tend to kill things they don't get. If taking one process down nukes your whole server, then that's a bad design.
As we work on the next phase of Mongrel2 development, this will improve, so watch for news about deployment and real applications.
Chapter 5
Hacking
This chapter is all about making cool things with Mongrel2. It covers all the non-deployment features that you get from the browser's side and the handler/backend side of your application. I'll show you how the chat demo works for the async web sockets. I'll get into writing your own handlers using a few other demos. I'll cover some of the interesting things you can do with Mongrel2 that you can't do with other servers. Finally, I'll get into practical things, like when to do proxying and when to use a 0MQ handler.
For the majority of this chapter, I'll be using Python, but the demos should translate to the other languages that are implemented. I'll periodically show how another language does one of the demos, so you can get the idea that Mongrel2 is language agnostic. In no way should you take me using Python in this chapter to mean you can't use something else for your handlers. Currently supported languages are:
Python The directory examples/python contains the Mongrel2 Python library m2py.
Ruby Probably the most extensively supported language, with good Rack support, by perplexes on github.
C++ C++ support by akrennmair on github.
PHP PHP support by winks on github.
C You can also write handlers in C using the Mongrel2 library, but it's really rough, and not recommended yet. A C library will come, though.
Others? ZeroMQ supports Ada, Basic, C, C++, Common Lisp, Erlang, Go, Haskell, Java, Lua, .NET, Objective-C, ooc, Perl, PHP, Python, and Ruby, so after reading this chapter you can easily write handlers in any of those languages too.
CHAPTER 5. HACKING
However, no matter how many languages Mongrel2 supports, you will still have applications that can't fit into 0MQ handlers and just work better as classic web apps, either because you've already written them and have existing infrastructure, or because of some architectural issues that require them to run traditionally. Because of that, Mongrel2 supports HTTP proxying, which allows you to route requests to basic web server backends that don't support 0MQ.
Note 6 What About FastCGI/AJP/CGI/SCGI/WSGI/Rack?
Nothing prevents you from writing your own connector between Mongrel2 and your deployment protocol of choice. If you need to run FastCGI or AJP in your environment, then your best bet is to just make a handler that translates Mongrel2 requests to the protocol you need and back. The Mongrel2 format is very easy to parse and translate, so you should be able to do it with no problem. The Ruby library already supports Rack as an example, and Python will support WSGI soon. However, Mongrel2 itself doesn't support any of these directly. Doing so would bring back the language-specific infections that cause other web servers to go south. The design of most of these protocols tends to be either from before the modern web, or specific to one particular language. Instead of trying to cater to all the possible languages out there, Mongrel2 just gives you the tools to connect to it yourself.
5.1.1
HTTP
Mongrel2 uses the original Mongrel parser that powers quite a few other web servers and large, successful websites. This parser is rock solid, dead accurate, and by design blocks a lot of security attacks. For the most part you don't have to worry about this and just need to know Mongrel2 is using the same stable HTTP processing that has been working great for many years.
Another way to put this is if Mongrel2 says your request is invalid, it most definitely is.
5.1.2
Proxying
You've already seen configurations that have the Proxy routes working, so it should be easy to understand what's going on. You just create routes to backends that are HTTP servers and Mongrel2 shuttles requests to them, then proxies responses back. The Proxying support in Mongrel2 is accurate, but it's not very capable right now. For example, there's no round-robin backend selection, or page caching, or other things you might need for more serious deployments. Those features will come eventually, though.
What you do get with Mongrel2's proxying, though, is a dead accurate way of slicing up your application by routes. Other web servers make you go through great pain in order to have some URLs go to a proxy and others go to handlers or directories. They make you use odd file syntax, weird pseudo-turing logic if-statements, and other odd hacks to get flexible route selection. They also tend to not maintain keep-alives properly between proxy requests and other requests. Mongrel2 uses the exact same routing syntax for all backends and has no distinction between them. It also properly does keep-alives for as long as it is efficient to do so.
5.1.3
WebSockets
Mongrel2 does not support WebSockets because the original protocol was a complete ugly hack with security holes galore. They've since fixed the entire protocol and we'll be implementing the hybi-07 version of the protocol in the 1.7 or 1.8 release.
5.1.4
JSSocket
The Mongrel2 chat demo uses JSSocket to do its magic, and it works great, but it requires Flash and, oh, man, do I absolutely hate Flash. However, it works, and works now, and works in every browser, even really old, busted ones. That means it's the first thing we implemented and the one we'll keep for a while until it proves itself not useful. The chat demo we'll cover will show you how to hook this up for fast async messaging and presence detection.
Note 7
I don't know why, but people who implement RFCs pick up very weird cargo cult beliefs peddled by the people who write the standards. In HTTP it was two things which the creators of HTTP have actually back-pedaled on: accept everything, and keep-alives with pipe-lines.
The truth is, if you want a secure server of any kind, blindly accepting every single thing any idiot sends you is going to open your server up to a huge number of attacks. If you look at every attack on existing HTTP servers you'll find that about 80% of them are exploiting ambiguous parts of the HTTP grammar to pass through malicious content or overflow buffers. In Mongrel2 we use a parser that rejects invalid requests from first basic principles using technology that's 30 years old and backed by solid mathematics. Not only does Mongrel2 reject bad requests, but it can tell you why the request was bad, just like a compiler. This doesn't mean Mongrel2 is ruthless, but it definitely doesn't tolerate ambiguity or stupidity.
Mongrel2 completely supports keep-alives because now, since it's not using Ruby at all, it can scale up beyond 1024 file descriptors. Ruby was limited in the number of open files a process could have, so the original Mongrel had to break keep-alive and kill connections in order to save itself from greedy browsers that never close them. Mongrel2 doesn't have this limitation, so it uses full keep-alives and has a dead accurate state machine to manage them correctly.
Where problems come in is with pipe-lined requests, meaning a browser sends a bunch of requests in a big blast, then hangs out for all the responses. This was such a horrible stupid idea that pretty much everyone gets it wrong and doesn't support it fully, if at all. The reason is that it's much too easy to blast a server with a ton of requests, wait a bit so they hit proxied backends, and then close the socket. The web server and the backends are now screwed having to handle these requests which will go nowhere.
Mongrel2 does not support pipe-lined requests. The client sends one request and waits for the response, and if you want more, then tough. Screw you, because it has no advantage for Mongrel2 and dubious advantages to you. It is simply one more attack vector for the server and is rejected outright. These two things are rejected outright by Mongrel2 simply because they are stupid ideas and in 2010 nobody should be writing clients so badly that they need these features.
A quick note for people coming from other web servers. If you use nginx then you are probably familiar with the concept of proxying to a backend like Ruby on Rails or Django. If you use PHP or another language, you may be used to a system like mod_php which manages your code for you and reloads when you make changes. If you use Apache, then you probably think in terms of virtual hosts and mod_rewrite rules. In Mongrel2 all the same concepts are there, it's just cleaned up. If you want Mongrel2 to talk to another backend web server, nginx/mod_rewrite style, then that's Proxying. If you want to have fast backend handlers then that's 0MQ Handlers. We really don't have anything like mod_php because the whole idea of embedding a programming language runtime inside Mongrel2 would defeat the point of making it language agnostic.
5.1.5
Long Poll
Mongrel2 just works as if everything is an HTTP long poll; it's just that normal request/responses are super fast long polls. For the most part you don't even need to know this exists; it's just how things are and they make perfect sense. You get requests from a certain server with a certain connected identity, and then you send stuff to that target. That's it. If you send it one response, or a stream of them, or set up a long poll configuration, then that's up to you.
5.1.6
Streaming
Because everything in Mongrel2 is asynchronous, and it allows you to target any connected listeners from your handlers, even with partial messages, you can easily do efficient streaming applications. ZeroMQ is an incredibly efficient transport mechanism, and with it you can send tons of information to many browsers or clients at once. This means streaming video and MP3 streams to listeners is trivial. We'll cover the mp3stream example where you get to see a simple implementation of the ICY MP3 streaming protocol.
5.1.7
N:M Responses
What makes streaming, async messaging, and long poll designs so efficient in Mongrel2 is that you can send one message and target up to 128 clients with that one message. This means sending large scale replies to many browsers requires less copying of the message and fewer transports.
In addition to this, you can set up Mongrel2 with the help of some 0MQ to send one request from a browser to as many target handlers as you like. You can even send them messages using OpenPGM for sending UDP messages reliably to clusters of computers. This means that Mongrel2 is the only web server capable of sending one request from a browser to N backends at once, and then returning the replies from these handlers to M browsers. Not exactly sure what you could write with that, but it's probably something really damn cool.
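If you have more listeners than the 128-target limit mentioned above, one way to cope is to split the target list into batches and send the same message once per batch. The sketch below only shows the batching step; the helper name batch_targets is made up for illustration and is not part of any Mongrel2 library.

```python
MAX_TARGETS = 128  # per-message target limit stated in the text


def batch_targets(conn_ids, size=MAX_TARGETS):
    """Split a list of connection ids into chunks of at most `size` items."""
    return [conn_ids[i:i + size] for i in range(0, len(conn_ids), size)]


if __name__ == "__main__":
    # 300 hypothetical listeners end up in three batches.
    for batch in batch_targets(list(range(300))):
        print(len(batch))
```

Each batch would then be handed to whatever call your handler library uses to deliver one message to many targets.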
5.1.8
Async Uploads
Mongrel2 also solves the problem of large uploads choking your server because you can't stop them before they're complete. Mongrel2 will stream large requests to temporary files, but it sends your handlers an initial "upload started" message. When the upload is done, you get a final "upload finished" message. If, at any time, you want to kill the upload, you just send a 0-length reply (the official KILL MESSAGE) and the whole thing is aborted and cleaned up.
5. Be easy to parse and generate inside Mongrel2 without having to parse the entire message to do routing or analysis.
6. Be useful within ZeroMQ so that you can do subscriptions and routing.
To satisfy these features we use different types of ZeroMQ sockets (soon to be configurable), a request format that Mongrel2 sends, and a response format that the handlers send back. Most importantly, there is nothing about the request and response that must be connected. In most cases they will be connected, but you can receive a request from one browser and send a response to a totally different one.
5.3.1
First, the types of ZeroMQ sockets used are a ZMQ_PUSH socket for messages from Mongrel2 to Handlers, which means your Handler's receive socket should be a ZMQ_PULL. Mongrel2 then uses a ZMQ_SUB socket for receiving responses, which means your Handlers should send on a ZMQ_PUB socket. This setup allows multiple handlers to connect to a Mongrel2 server, but only one Handler will get a message in a round-robin style. The PUB/SUB reply sockets, though, will let Handlers send back replies to a cluster of Mongrel2 servers, but only the one with the right subscription will process the request (see footnote 2). In the various APIs we've implemented, you don't need to care about this. They provide an abstraction on top of this, but it does help to know it so that you understand why the message format is the way it is. This leads to rule number 1:
Rule 1: Handlers receive with PULL and send with PUB sockets.
5.3.2
UUID Addressing
Do you remember all those UUIDs all over the place in the configuration files? They may have seemed odd, but they identify specific server deployments and processes in a cluster. This will let you identify exactly which member of a cluster sent a message, so that you can return the right reply. This is the first part of our protocol format and it results in rule 2:
Rule 2: Every message to and from Mongrel2 has that Mongrel2 instance's UUID as the very first thing.
1 Except Erlang guys, 'cause they'll always complain that everything's not in Erlang.
2 The types of sockets used will be configurable in a later version.
5.3.3
You then need a way to identify a particular listener (browser, client, etc.) that your message should target, and Mongrel2 needs to tell you who is sending your handler the request. This means Mongrel2 sends you just one identifier, but you can send Mongrel2 a list of them. This leads to rule 3:
Rule 3: Mongrel2 sends requests with one number right after the server's UUID separated by a space. Handlers return a netstring with a list of numbers separated by spaces. The numbers indicate the connected browser the message is to/from.
In case you don't know what a netstring is, it is a very simple way to encode a block of data such that any language can read the block and know how big it is. A netstring is, simply, SIZE:DATA,. So, to send "HI", you would do 2:HI,, and it is incredibly easy to parse in every language, even C. It is also a fast format and you can read it even if you're a human.
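To make the SIZE:DATA, layout concrete, here is a minimal sketch of encoding and decoding a netstring. The function names are ours, chosen for illustration, and the code works on bytes since that is what goes over the wire.

```python
def encode_netstring(data: bytes) -> bytes:
    """Wrap raw bytes as SIZE:DATA, where SIZE is the decimal byte count."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(ns: bytes):
    """Return (payload, remainder) for the first netstring found in ns."""
    length, rest = ns.split(b":", 1)
    length = int(length)
    # The byte right after the payload must be the terminating comma.
    if rest[length:length + 1] != b",":
        raise ValueError("netstring did not end in ','")
    return rest[:length], rest[length + 1:]


if __name__ == "__main__":
    wire = encode_netstring(b"HI")
    print(wire)
    print(decode_netstring(wire + b"leftover"))
```

Returning the remainder alongside the payload is what lets you peel several netstrings off one message in sequence.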
5.3.4
In order to make it possible to route or analyze a request in your handlers without having to parse a full request, every request has the path that was matched in the server as the next piece. That gives us: Rule 4: Requests have the path as a single string followed by a space and no paths may have spaces in them.
5.3.5
We only have two more rules to complete the message format.
Rule 5: Mongrel2 sends requests with a netstring that contains a JSON hash (dict) of the request headers, and then another netstring with the body of the request.
Then there's a similar rule for responses:
Rule 6: Handlers return just the body after a space character. It can be any data that Mongrel2 is supposed to send to the listeners: HTTP headers, image data, HTML pages, streaming video, and so on. You can also send as many as you like to complete the request, and any handler can send it.
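Putting rules 1 through 5 together, a request on the wire is the UUID, the connection number, the path, a netstring of JSON headers, and a netstring of the body. The sketch below builds such a message by hand so you can see the pieces line up. The function names, the UUID, and the header values are all invented for illustration; real requests are produced by Mongrel2, not by your code.

```python
import json


def netstring(data: bytes) -> bytes:
    """Encode bytes as SIZE:DATA,"""
    return str(len(data)).encode("ascii") + b":" + data + b","


def build_request(uuid: str, conn_id: int, path: str,
                  headers: dict, body: bytes) -> bytes:
    """Assemble a request the way rules 1 to 5 describe it."""
    if " " in path:
        raise ValueError("paths may not contain spaces (Rule 4)")
    prefix = "%s %d %s " % (uuid, conn_id, path)
    return (prefix.encode("ascii")
            + netstring(json.dumps(headers).encode("utf-8"))
            + netstring(body))


if __name__ == "__main__":
    # Made-up UUID and headers, purely for demonstration.
    print(build_request("54c6755b-9628-40a4-9a2d-cc82a816345e", 7, "/chat",
                        {"METHOD": "GET"}, b"hello"))
```

A helper like this is mostly useful for feeding hand-made messages to your own parser in tests.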
5.3.6
Now, even though we laid out all of this as a series of rules, the actual code to implement these is very simple. First, here's a simple grammar for how a request that gets sent to your handlers is formatted:
UUID ID PATH SIZE:HEADERS,SIZE:BODY,
That's obviously a much simpler way to specify the request than all those rules, but it also doesn't tell you why. The above description, while boring as hell, tells you why each of these pieces exists. Also remember that this is a strict format, so to be more precise it's:
Identifier = digit+ ' '?;
IdentList = (Identifier)**;
Length = digit+;
UUID = (alpha | digit | '-')+;
Targets = Length ':' IdentList ",";
Request = UUID ' ' Targets ' ';
Mongrel2 will strictly enforce this grammar and reject any 0MQ messages that don't follow it. To parse this in Python we simply do this:
Source 32
import json

def parse_netstring(ns):
    # split "SIZE:DATA,rest" into the payload and whatever follows it
    length, rest = ns.split(':', 1)
    length = int(length)
    assert rest[length] == ',', "Netstring did not end in ','"
    return rest[:length], rest[length+1:]

def parse(msg):
    # the first three fields are space separated, the rest is two netstrings
    sender, conn_id, path, rest = msg.split(' ', 3)
    headers, rest = parse_netstring(rest)
    body, _ = parse_netstring(rest)
    headers = json.loads(headers)
    return sender, conn_id, path, headers, body
This is actually all of the code needed to parse a request, and it is fairly similar in many other languages. If you look at the file examples/python/mongrel2/request.py, you'll see a more complete example of making a full request object. A response is then just as simple and involves crafting a similar setup like this:
Notice I've got three IDs here, but you can do anywhere from 1 up to 128. Generating this is very easy in Python:
Source 33 Generating Responses
def send(self, uuid, conn_id, msg):
    # conn_id may be one ident or several idents joined with spaces
    header = "%s %d:%s," % (uuid, len(str(conn_id)), str(conn_id))
    self.resp.send(header + ' ' + msg)
That, again, is all there is to it. The send method is the one doing the real work of crafting the response, and the deliver method just calls send with all the target idents joined with a space.
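To see what actually goes out on the wire, here is a standalone sketch that builds the response bytes for a list of targets without needing a socket. It follows the rules above (UUID, a netstring of space-separated idents, a space, then the body); the function name build_response and the example UUID are ours, not the library's.

```python
MAX_TARGETS = 128  # upper bound on idents per message, as stated above


def build_response(uuid: str, idents, body: bytes) -> bytes:
    """Return 'UUID SIZE:ID ID ID, BODY' as bytes for the given targets."""
    idents = list(idents)
    if not 1 <= len(idents) <= MAX_TARGETS:
        raise ValueError("need between 1 and %d targets" % MAX_TARGETS)
    targets = " ".join(str(i) for i in idents)
    header = "%s %d:%s, " % (uuid, len(targets), targets)
    return header.encode("ascii") + body


if __name__ == "__main__":
    # "abc" stands in for a real server UUID.
    print(build_response("abc", [1, 2, 3], b"hi"))
```

Note that the netstring size counts the whole ident list including the separating spaces, not the number of idents.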
5.3.7
During the 1.6 development, it became clear that we needed a sort of internal protocol for some new Mongrel2 features. This internal protocol should be able to store all the same things that JSON can, but also store exact binary data. This came about because we want to send raw data to handlers and other parts of the system like the control port, but JSON involved too much work to parse and deal with that. We also did various analyses and found that much of our time was spent just generating JSON.
What we did, then, is create a small modification to netstrings that tags each element with its type. We did this by changing the (fairly useless) trailing , character so that it signified the type of what it contained. Types can be any of the main data types that JSON has (dicts, lists, integers, etc.), except that strings are now entirely raw binary strings, with no definition about whether they hold anything other than 8-bit octets. We also made the design so it was backward compatible with netstrings. This lets us use it to directly parse a ZeroMQ message from anyone, and it will work whether it's a TNetString-style nested structure, or just a string with JSON in it.
The end result is a simple specification at http://tnetstrings.org which includes a naive parser that anyone can copy to other languages easily. Many other people implemented the protocol and it looks like you can do it in every language in about 100 lines of code. Implementing a version with more performance (since every language needs tricks) seems to take about 500-1000 lines of code.
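To give a feel for how small the idea is, here is a rough sketch of a tagged netstring encoder and parser. It is not the reference implementation: the function names are ours, floats are left out, and the tag characters (, for strings, # for integers, ! for booleans, ~ for null, ] for lists, } for dicts) are our reading of the public spec, so check them against tnetstrings.org before relying on this.

```python
def tns_dump(value) -> bytes:
    """Encode a Python value as SIZE:PAYLOAD followed by a one-byte type tag."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):        # must come before the int check
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, list):
        payload, tag = b"".join(tns_dump(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(tns_dump(k) + tns_dump(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("unsupported type: %r" % type(value))
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def tns_parse(data: bytes):
    """Decode one value; return (value, remaining bytes)."""
    length, rest = data.split(b":", 1)
    length = int(length)
    payload, tag, remain = rest[:length], rest[length:length + 1], rest[length + 1:]
    if tag == b",":
        value = payload
    elif tag == b"#":
        value = int(payload)
    elif tag == b"!":
        value = payload == b"true"
    elif tag == b"~":
        value = None
    elif tag == b"]":
        value = []
        while payload:
            item, payload = tns_parse(payload)
            value.append(item)
    elif tag == b"}":
        value = {}
        while payload:
            key, payload = tns_parse(payload)
            value[key], payload = tns_parse(payload)
    else:
        raise ValueError("unknown type tag %r" % tag)
    return value, remain


if __name__ == "__main__":
    print(tns_dump({b"PATH": b"/", b"ids": [1, 2, 3]}))
```

Because a plain string still ends in a comma, an ordinary netstring parses unchanged, which is the backward compatibility described above.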
Mongrel2 now supports either TNetStrings or JSON as defined above, on the fly, and without any modification to existing handlers. Internally, Mongrel2 uses TNetStrings to create its internal control port protocol, which makes working with Mongrel2 programmatically even easier. To demonstrate this, here's the new code for parsing a request in Python:
Source 34
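from mongrel2 import tnetstrings
import json

def parse(msg):
    sender, conn_id, path, rest = msg.split(' ', 3)
    headers, rest = tnetstrings.parse(rest)
    body, _ = tnetstrings.parse(rest)

    # a plain netstring comes back as a str holding JSON, so decode it
    if type(headers) is str:
        headers = json.loads(headers)

    # returned in the same order as in Source 32
    return sender, conn_id, path, headers, body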
Our tests also show that TNetStrings are a good compromise between speed and ease of parsing. They're hard to get wrong in parsing, easy to write out, and faster than many other protocols out there. The few that are faster are also much, much harder to parse and more error prone. In our tests, we've found that TNetStrings in Python can be faster than Python's own pickle format when we use a C extension. The most important point about TNetStrings, though, is how it opens up Mongrel2 for even more control and automation.
5.3.8
Instead of building all of this yourself, I've created a Python library that wraps all this up and makes it easy to use. Each of the other libraries is designed around the same idea and should have a similar design. To check out how to use the Python API, we'll take a look at each of the demos that are available. These are the same demos you ran in the previous section to create a sample deployment. For the Python API, you may want to start by looking at two very small files that you should be able to understand quickly: examples/python/mongrel2/request.py and examples/python/mongrel2/handler.py.
http.py example
%r \n HEADERS: %r
3 This is the same code as the original file, but with extraneous prints removed for simplicity.
All this code does is print back a simple little dump of what it received, and it's not even a valid HTML document. Let's walk through everything that's going on:
1. Import the handler module from mongrel2 and json. The json module is really only used for logging.
2. Establish the UUID for our handler, and create a connection. It's not really a connection but more of a virtual circuit that you can just pretend is a connection. It's using ZeroMQ and the protocol we just described to create a simple API to use.
3. Go into a while loop forever and recv request objects off the connection.
4. One type of special message we can get from Mongrel2 is a disconnect message, which tells you that one of the listeners you tried to talk to was closed. You should either ignore those and read another, or update any internal state you may have. They can come asynchronously, and for the most part you can ignore them unless you need to keep them open as in, say, a chat application or streaming.
5. Craft the reply you're going to send back, which is just a dump of what you received.
6. Send this reply back to Mongrel2. Notice the subtle difference where you include the req object as part of how you reply? This is the major difference between this API and more traditional request/response APIs in that you need the request you are responding to so that it knows where to send things. In a normal socket-based server this is just assumed to be the socket you're talking about.
This is all you need at first to do simple HTTP handlers. In reality, the reply_http method is just syntactic sugar on crafting a decent HTTP response. Here's the actual method that is crafting these replies:
Source 36 HTTP Response Python Code
def http_response(body, code, status, headers):
    payload = {'code': code, 'status': status, 'body': body}
    headers['Content-Length'] = len(body)
    payload['headers'] = "\r\n".join('%s: %s' % (k, v) for k, v in headers.items())
    return HTTP_FORMAT % payload
This is then used by Connection.reply_http and Connection.deliver_http to send an actual HTTP response. That means all this is doing is creating the raw bytes you want to go to the real browser, and how it's delivered is irrelevant. For example, the deliver_http method means that, yes, you can have one handler send a single response to target multiple browsers at once.
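The HTTP_FORMAT template used by http_response is not shown in this section, so the sketch below assumes a plausible one just to show what the resulting bytes look like. Treat the template as our guess rather than the library's actual constant. This version also copies the headers dict so the caller's dict is not modified, which is a small departure from the method shown above.

```python
# Assumed template: status line, header lines, blank line, then the body.
HTTP_FORMAT = "HTTP/1.1 %(code)s %(status)s\r\n%(headers)s\r\n\r\n%(body)s"


def http_response(body, code, status, headers):
    """Render a complete HTTP response as one string."""
    payload = {"code": code, "status": status, "body": body}
    headers = dict(headers)                  # avoid mutating the caller's dict
    headers["Content-Length"] = len(body)
    payload["headers"] = "\r\n".join("%s: %s" % (k, v)
                                     for k, v in headers.items())
    return HTTP_FORMAT % payload


if __name__ == "__main__":
    print(repr(http_response("Hello", 200, "OK",
                             {"Content-Type": "text/plain"})))
```

Keep in mind that len(body) counts characters here, so for non-ASCII text you would want to encode the body to bytes first and measure that.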
ing your handler an initial message with just the headers, streaming the file to disk, and then a final message so you can read the resulting file. If you don't want the upload, then you can send a kill message (a 0 length message) and the connection closes, and the file never lands.
The upload mechanism works entirely on content length, and whether the file is larger than limits.content_length. This means if you don't want to deal with this for most form uploads, then just set limits.content_length high enough and you won't have to. However, if you want to handle file uploads or large requests, then you add the setting upload.temp_store to a mkstemp-compatible path like /tmp/mongrel2.upload.XXXXXX with the XXXXXX chars being replaced with random characters. It doesn't have to be /tmp either, and can be any store you want, network disk, anything. Here's an example handler in examples/http_0mq/upload.py that shows you how to do it:
You can test this with something like curl -T tests/config.sqlite http://localhost:6767 to upload a big file. What's happening is the following process:
1. Mongrel2 receives a request from a browser (or curl in this case) that is greater than limits.content_length in size. It actually doesn't read all of it yet, only about 2k.
2. Mongrel2 looks up the upload.temp_store setting and makes a temp file there to write the contents. If you don't have this setting then it aborts and returns an error to the browser.
3. Mongrel2 sees that the request is for a Handler, so it crafts an initial request message. This request message has all the original headers, plus an X-Mongrel2-Upload-Start header with the path of the expected tmpfile you will read later.
4. Your handler receives this message, which has no actual content, but has the original content length, all the headers, and this new header to indicate an upload is starting.
5. At this point, your handler can decide to kill the connection by simply responding with a kill message, or even with a valid HTTP error response followed by a kill message.
6. Otherwise your handler does nothing, and Mongrel2 is already streaming the file into the designated tmpfile for this upload.
7. When the upload is finally saved to the file, Mongrel2 adds a new header of X-Mongrel2-Upload-Done set to the same file as the first header. Remember that both headers are in this final request.
from mongrel2 import handler
import json
import hashlib

# sender_id is this handler's UUID, defined earlier in the file
conn = handler.Connection(sender_id, "tcp://127.0.0.1:9997",
                          "tcp://127.0.0.1:9996")

while True:
    print "WAITING FOR REQUEST"
    req = conn.recv()

    if req.is_disconnect():
        print "DISCONNECT"
        continue

    elif req.headers.get('x-mongrel2-upload-done', None):
        # final message: both upload headers must name the same file
        expected = req.headers.get('x-mongrel2-upload-start', "BAD")
        upload = req.headers.get('x-mongrel2-upload-done', None)

        if expected != upload:
            print "GOT THE WRONG TARGET FILE:", expected, upload
            continue

        body = open(upload, 'r').read()
        print "UPLOAD DONE: BODY IS %d long, content length is %s" % (
            len(body), req.headers['content-length'])

        response = "UPLOAD GOOD: %s" % hashlib.md5(body).hexdigest()

    elif req.headers.get('x-mongrel2-upload-start', None):
        # initial message: the file is still being streamed to disk
        print "UPLOAD starting, don't reply yet."
        print "Will read file from %s." % req.headers.get('x-mongrel2-upload-start', None)
        continue

    else:
        response = "<pre>\nSENDER: %r\nIDENT: %r\nPATH: %r\nHEADERS: %r\nBODY: %r</pre>" % (
            req.sender, req.conn_id, req.path,
            json.dumps(req.headers), req.body)
        print response

    conn.reply_http(req, response)
8. Your handler then gets this final request message that has both the X-Mongrel2-Upload-Start and X-Mongrel2-Upload-Done headers, which you can then use to read the upload contents. You should also make sure the headers match to prevent someone forging completed uploads.
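The decision your handler has to make in the steps above boils down to looking at two headers. Here is a small sketch of that check pulled out into its own function so it can be tested without a running server. The function name and the stage labels are ours; the lowercase header keys follow the example handler above.

```python
def upload_stage(headers):
    """Classify a request by its upload headers.

    Returns "start" for the initial message, "done" for a completed upload
    whose two headers agree, "mismatch" when they disagree (a possible
    forgery), and "normal" for an ordinary request.
    """
    start = headers.get("x-mongrel2-upload-start")
    done = headers.get("x-mongrel2-upload-done")
    if done:
        return "done" if done == start else "mismatch"
    if start:
        return "start"
    return "normal"


if __name__ == "__main__":
    print(upload_stage({"x-mongrel2-upload-start": "/tmp/mongrel2.upload.abc123"}))
```

A handler would wait on "start", read the file on "done", and refuse to touch anything on "mismatch".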
Note 9
Remember, when you run Mongrel2 it will store the file relative to its chroot setting. In testing you probably aren't running Mongrel2 as root, so it works fine. You just then have to make sure that your handler knows to look for the file in the same place. So if you have /var/www/mongrel2.org for your chroot and /uploads/file.XXXXXX then the actual file will be in /var/www/mongrel2.org/uploads/file.XXXXXX. The good thing is you can read the config database in your handlers and find out all this information as well.
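In handler code, that path arithmetic might look like the sketch below. The function name is hypothetical, and where the chroot value comes from (hard-coded, or read from the config database) is left up to you.

```python
import posixpath


def resolve_upload_path(chroot, upload_path):
    """Map a path reported by a chrooted Mongrel2 to the real filesystem path."""
    # Strip the leading slash so join() appends instead of replacing the chroot.
    return posixpath.join(chroot, upload_path.lstrip("/"))


if __name__ == "__main__":
    print(resolve_upload_path("/var/www/mongrel2.org", "/uploads/file.XXXXXX"))
```

If Mongrel2 is not running chrooted, as in the testing case mentioned above, you would use the reported path as it is.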
Source 38
from mp3stream import ConnectState, Streamer
from mongrel2 import handler
import glob

sender_id = "9703b4dd-227a-45c4-b7a1-ef62d97962b2"

CONN = handler.Connection(sender_id, "tcp://127.0.0.1:9995",
                          "tcp://127.0.0.1:9994")

STREAM_NAME = "Mongrel2 Radio"

MP3_FILES = glob.glob("*.mp3")
print "PLAYING:", MP3_FILES

CHUNK_SIZE = 8 * 1024

STATE = ConnectState()

# The streamer runs in its own thread and pushes chunks to connected clients.
STREAMER = Streamer(MP3_FILES, STATE, CONN, CHUNK_SIZE, sender_id)
STREAMER.start()

HEADERS = {'icy-metaint': CHUNK_SIZE,
           'icy-name': STREAM_NAME}

while True:
    req = CONN.recv()

    if req.is_disconnect():
        print "DISCONNECT", req.headers, req.body, req.conn_id
        STATE.remove(req)
    else:
        print "REQUEST", req.headers, req.body

        if STATE.count() > 20:
            print "TOO MANY", STATE.count()
            CONN.reply_http(req, "Too Many Connected. Try Later.")
        else:
            STATE.add(req)
            CONN.reply_http(req, "", headers=HEADERS)
6. If we have too many connected clients, we reply with a failure.
7. Otherwise, we add them to the STATE and then send the initial ICY protocol header to get things going.
That is the base of it, and if you point mplayer at it (which is the only player that works, really) you should hear it play:

mplayer http://localhost:6767/mp3stream

That is, assuming you put some mp3 files into the directory and started the handler again. For more on how the actual state and the protocol work, go look at mp3stream.py. Explaining it is far outside the scope of this manual, but the key point to realize is that this is one thread that's targeting randomly connected clients with a single message to the Mongrel2 server and streaming it.
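To give a rough idea of the bookkeeping involved, here is a small Python 3 sketch of a connection tracker that a receive loop and a streaming thread could share. It is not the code in mp3stream.py: ClientRegistry and admit are hypothetical names, and the only things taken from the listing are the add/remove/count operations and the "more than 20 connected" check.

```python
import threading


class ClientRegistry(object):
    """Hypothetical stand-in for a ConnectState-like tracker."""

    def __init__(self):
        self._lock = threading.Lock()
        self._conn_ids = set()

    def add(self, conn_id):
        with self._lock:
            self._conn_ids.add(conn_id)

    def remove(self, conn_id):
        with self._lock:
            # discard() ignores clients that were never registered.
            self._conn_ids.discard(conn_id)

    def count(self):
        with self._lock:
            return len(self._conn_ids)

    def snapshot(self):
        # Copy under the lock so the streaming thread can iterate safely
        # while the receive loop keeps adding and removing clients.
        with self._lock:
            return sorted(self._conn_ids)


def admit(registry, conn_id, limit=20):
    """Mirror the handler's check: refuse when over the limit, else register."""
    if registry.count() > limit:
        return False
    registry.add(conn_id)
    return True


if __name__ == "__main__":
    reg = ClientRegistry()
    print(admit(reg, "42"), reg.snapshot())
```

The check and the add are two separate locked steps, which is fine here because only the single receive loop ever admits clients.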
StateEvent filter_transition(StateEvent state, Connection *conn, tns_value_t *config)
{
    size_t len = 0;
    char *data = tns_render(config, &len);

    if(data != NULL) {
        log_info("CONFIG: %.*s", (int)len, data);
    }

    free(data);

    return CLOSE;
}

StateEvent *filter_init(Server *srv, bstring load_path, int *out_nstates)
{
    StateEvent states[] = {HANDLER, PROXY};
    *out_nstates = Filter_states_length(states);
    check(*out_nstates == 2, "Wrong state array length.");

    return Filter_state_list(states, *out_nstates);

error:
    return NULL;
}
In this code you are basically creating a .so file that Mongrel2 will load on the fly when told to. How it works is you make two functions, always named filter_init and filter_transition. The filter_init function sets up a simple array that lists all of the events (found in src/events.h) that you want to have your filter triggered on. It's important that you use the Filter_state_list function to return the actual list or else you'll get the memory allocation wrong. Mongrel2 will load this null.so, call the filter_init function, and wire it up for each of the events you indicate. Next, when a request comes in, the server will go through each event that triggers and call your filter_transition function. This function will get the StateEvent that is about to happen, the Connection it's happening on, and finally, the config that the user set in their config.sqlite database.
All your filter_transition function has to do is use the Mongrel2 APIs to do what it needs, alter the Connection, and work with the config to get its work done. When it's done, it can return the next state event that Mongrel2 should work with instead of the one you were handed (or just return the same one if you aren't changing how Mongrel2 works). That's all there is to it for now. Later releases will start having more filters that you can load, with example code you can look at and try.
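If the init/transition contract is hard to picture, a toy model may help. The following Python 3 sketch is not Mongrel2's C implementation: FilterRegistry, load, and dispatch are hypothetical names, the event names are plain strings borrowed from the listing, and stopping at the first filter that changes the event is an assumption of this sketch.

```python
# Event names borrowed from the C listing; here they are just strings.
HANDLER, PROXY, CLOSE = "HANDLER", "PROXY", "CLOSE"


class FilterRegistry(object):
    """Toy model of wiring filters to events (hypothetical, for illustration)."""

    def __init__(self):
        self._filters = {}  # event -> list of (transition, config)

    def load(self, init, transition, config):
        # init() plays the role of filter_init: it returns the events
        # this filter wants to be triggered on.
        for event in init():
            self._filters.setdefault(event, []).append((transition, config))

    def dispatch(self, event, conn):
        # Call each transition registered for this event. A transition
        # returns the event the server should continue with.
        for transition, config in self._filters.get(event, []):
            next_event = transition(event, conn, config)
            if next_event != event:
                # Assumption: the first filter that changes the event wins.
                return next_event
        return event


def null_init():
    return [HANDLER, PROXY]


def null_transition(event, conn, config):
    # Like the null filter above: report the config, then close.
    print("CONFIG: %r" % (config,))
    return CLOSE


if __name__ == "__main__":
    registry = FilterRegistry()
    registry.load(null_init, null_transition, {"example": "setting"})
    print(registry.dispatch(HANDLER, conn=None))
```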
The next thing to do is to make your tool craft databases and compare the results to what m2sh does for a similar configuration. I recommend you make a database that's correct with m2sh, and then dump it via sqlite3. After that, use your tool to make your own database, dump it, and then use diff to compare your results to mine.

You can also look at how the C version of m2sh that is installed by default is written. It lives in tools/m2sh and has a completely different design but does nearly the same things. If you know C then comparing the two is also educational.

Finally, you'll need to look at two base schema files: src/config/config.sql and src/config/mimetypes.sql, where the database schema is created and the large list of mimetypes that Mongrel2 knows is stored. Your tool should be able to use this SQL to make its database, or at least know what it does. If you do something cool with all of this, let us know.
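The dump-and-diff comparison can also be scripted. This is a minimal Python 3 sketch using only the standard sqlite3 and difflib modules. The function names are made up, and the one-table schema in the demo is a simplified placeholder, not the real schema from src/config/config.sql.

```python
import difflib
import sqlite3


def dump_lines(conn):
    """Return the SQL text dump of a database, one statement per entry."""
    return list(conn.iterdump())


def diff_databases(reference, candidate):
    """Unified diff between the dumps of two sqlite3 connections."""
    return list(difflib.unified_diff(
        dump_lines(reference), dump_lines(candidate),
        fromfile="m2sh", tofile="mytool", lineterm=""))


if __name__ == "__main__":
    # Placeholder schema for the demo only; use the real config databases
    # (for example sqlite3.connect("config.sqlite")) in practice.
    ref = sqlite3.connect(":memory:")
    mine = sqlite3.connect(":memory:")
    for conn, port in ((ref, 6767), (mine, 6768)):
        conn.execute("CREATE TABLE server (name TEXT, port INTEGER)")
        conn.execute("INSERT INTO server VALUES ('main', ?)", (port,))
        conn.commit()

    for line in diff_databases(ref, mine):
        print(line)
```

An empty diff means your tool produced the same statements as the reference database.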
Source 40
The null Config Module
/**
 *
 * Copyright (c) 2010, Zed A. Shaw and Mongrel2 Project Contributors.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are
 * met:
 *
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *
 *     * Neither the name of the Mongrel2 Project, Zed A. Shaw, nor the names
 *       of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written
 *       permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
 * IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include <filter.h>
#include <dbg.h>
#include <config/module.h>
#include <config/db.h>

struct tagbstring GOODPATH = bsStatic("goodpath");

int config_init(const char *path)
{
    if(biseqcstr(&GOODPATH, path)) {
        log_info("Got the good path.");
        return 0;
    } else {
        log_info("Got the bad path: %s", path);
        return -1;
    }
}

void config_close()
{
}

tns_value_t *config_load_handler(int handler_id)
{
    return NULL;
}
# OUTPUT:
#[INFO] (src/mongrel2.c:320) Using configuration module /usr/local/lib/mongrel2/config_modules
#[INFO] (null.c:11) Got the good path.
#[ERROR] (src/config/config.c:366: errno: None) Wrong type, expected valid rows.
#[ERROR] (src/mongrel2.c:124: errno: None) Failed to load global settings.
#[ERROR] (src/mongrel2.c:326: errno: None) Aborting since can't load server.
#[ERROR] (src/mongrel2.c:362: errno: None) Exiting due to error.

#[INFO] (src/mongrel2.c:320) Using configuration module /usr/local/lib/mongrel2/config_modules
#[INFO] (null.c:14) Got the bad path: badpath
#[ERROR] (src/mongrel2.c:121: errno: None) Failed to load config database at badpath
#[ERROR] (src/mongrel2.c:326: errno: None) Aborting since can't load server.
#[ERROR] (src/mongrel2.c:362: errno: None) Exiting due to error.
2. Add your .so to the list of ones to build in tools/config_modules/Makefile.
3. Run make to confirm that it builds, then sudo make install to make sure it shows up in $PREFIX/lib/mongrel2/config_modules.
4. Start making each function return the right tns_value_t * results that it needs. Look at src/config/module.c for what is currently being used.
5. Look at tests/config_tests.c:test_Config_load_module and write a similar unit test to make sure it works right.

Finally, the protocol that's being used is basically a translation of the sqlite3 tables defined in the src/config/config.sql schema into a TNetString data type that Mongrel2 can understand. The queries are checked for every error I could think up, and you should get meaningful error messages about column types. When in doubt, just look at src/config/module.c to see how it's being done and then replicate it exactly.
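If you haven't worked with TNetStrings before, the wire format is small: each value is written as its payload length, a colon, the payload, and a one-character type tag. Here is a minimal encoder sketch in Python 3 to show the shape of the data. The function name tns_dumps is made up, floats are left out, and the row layout in the demo is an assumption for illustration; src/config/module.c remains the authority on what Mongrel2 expects from each function.

```python
def tns_dumps(value):
    """Encode a Python value as TNetString bytes (illustrative sketch)."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):
        # Check bool before int, because bool is a subclass of int.
        payload, tag = (b"true" if value else b"false"), b"!"
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b","
    elif isinstance(value, (list, tuple)):
        payload, tag = b"".join(tns_dumps(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(
            tns_dumps(k) + tns_dumps(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot encode %r" % type(value))
    return str(len(payload)).encode("ascii") + b":" + payload + tag


if __name__ == "__main__":
    # One result "table" as a list of rows, each row a list of columns.
    # This layout is assumed for the example, not taken from Mongrel2.
    rows = [[1, "main"]]
    print(tns_dumps(rows))
```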
Note 10
You're On Your Own

There's also a way to run the same command using m2sh, but it's mostly a convenience to get you started. If you're doing your own configuration system it's assumed that you probably aren't using m2sh and have written your own tool. In order to make m2sh work with your config, we'd have to alter m2sh quite a lot and turn it into a generic "query the config" tool. That might happen, but it's not there yet. Rather than confuse the issue, I'll skip documenting it until a later release when it's more robust.
Chapter 6
Contributing
You have gone through a complete description of all the features that Mongrel2 has right now, but not all the features it will have. I tend to write small software that does exactly what it needs to do, and Mongrel2 is no different. What you see here covers the majority of the things you need to do right now, and we'll be slowly adding things people need. If you'd like to help, then join the mongrel2@librelist.com mailing list and then read the Contributor Instructions.

Thanks,
Zed