Auto-scale Rackspace Cloud servers with Fabric and Celery

Auto-Scaling Celery Workers with Rackspace Cloud Servers

I have been working on an automated stock trading system for some time. Part of the system involves heavy number crunching, such as technical analysis and neural networks. Processing large amounts of data for thousands of stock tickers takes a long time, so I have begun creating additional worker nodes to give me more processing power and finish the number crunching much faster.

I have created some tools to help me manage these additional worker nodes and simplify scaling the data processing workers up and down. The tools automatically create worker nodes using the Rackspace Cloud Servers API and then start my Python Celery workers, which process the tasks in my RabbitMQ queue. When processing is complete and the queue is empty, the auto-scaling script spins down and destroys the worker cloud servers.

The auto-scaling tool checks the size of my RabbitMQ queue. If there is a large number of tasks in the queue, it creates a number of cloud servers using the Rackspace Cloud API and an image template I have created and saved at Rackspace Cloud.
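A minimal sketch of how such a queue check could look, assuming the pika client library is installed and the tasks sit in a queue named "celery" (Celery's default queue name). The function names here are hypothetical and are not taken from auto-scale.py.

```python
def queue_depth(channel, queue_name="celery"):
    """Return the number of ready messages waiting in queue_name.

    A passive declare only inspects the queue: it never creates or
    modifies it, and the broker raises an error if the queue is missing.
    """
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count


def open_channel(host="localhost"):
    """Open a blocking pika connection and channel to the broker on host.

    Assumption: pika is available. It is imported lazily so that
    queue_depth() can be used with any channel-like object.
    """
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    return connection, connection.channel()
```

A cron-driven script could call open_channel() once per run, read queue_depth(channel), and close the connection before deciding whether to scale.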

Basically it works like this:

  • Each worker instance is built from a template image, so it has the exact same packages and code base.
  • If the queue size is very large, then create a bunch of workers.
  • If the queue size is 0, then delete the additional workers, to save money by removing instances we’re not actively using.
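The rules above can be sketched as a small decision function. The threshold, the tasks-per-worker ratio, and the worker cap are illustrative assumptions rather than the values my script uses, and plan_scaling is a hypothetical name.

```python
import math


def plan_scaling(queue_size, extra_workers, scale_up_threshold=1000,
                 tasks_per_worker=500, max_workers=10):
    """Decide what to do for the current queue size.

    Returns ("scale_up", n), ("scale_down", n) or ("hold", 0), where n is
    the number of worker servers to create or delete.
    """
    if queue_size == 0:
        # Nothing left to process: remove every additional worker.
        return ("scale_down", extra_workers) if extra_workers else ("hold", 0)
    if queue_size >= scale_up_threshold:
        # Size the pool to the backlog, but never exceed the cap.
        wanted = min(max_workers, math.ceil(queue_size / tasks_per_worker))
        if wanted > extra_workers:
            return ("scale_up", wanted - extra_workers)
    # Small backlog, or enough workers already running.
    return ("hold", 0)
```

With these assumed defaults, a backlog of 2400 tasks and two running workers gives ("scale_up", 3), while an empty queue with four workers gives ("scale_down", 4).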

I’m using Fabric for Python to set up the Celery worker servers. Fabric is a very powerful tool that can run commands automatically on newly created servers. I am also using the python-cloudservers package to interface with the Rackspace Cloud API and create the servers. After the servers are created, I use rsync to copy my code base over to them, and Fabric then starts the Celery worker daemon on each new worker node. The Celery daemon takes care of the rest and starts processing the tasks in the RabbitMQ message queue.
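To make the copy-then-start sequence concrete, here is a rough sketch that uses plain subprocess, rsync and ssh instead of Fabric, so it stays self-contained. The host, paths, the "tasks" app name and the root login are all assumptions, and it uses the modern `celery ... worker` command line (older Celery releases used `celeryd`).

```python
import shlex
import subprocess


def rsync_command(host, local_dir, remote_dir, user="root"):
    """Build the rsync argv that mirrors local_dir onto the worker."""
    # A trailing slash on the source copies the directory's contents
    # rather than nesting the directory inside remote_dir.
    source = local_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", "-e", "ssh",
            source, f"{user}@{host}:{remote_dir}"]


def start_worker_command(host, remote_dir, app="tasks", user="root"):
    """Build the ssh argv that starts a detached Celery worker on host."""
    remote = (f"cd {shlex.quote(remote_dir)} && "
              f"celery -A {shlex.quote(app)} worker --loglevel=INFO --detach")
    return ["ssh", f"{user}@{host}", remote]


def provision(host, local_dir, remote_dir, run=subprocess.check_call):
    """Sync the code base, then start the worker.

    run is injectable, so passing a list's append method gives a dry run
    that only records the commands.
    """
    run(rsync_command(host, local_dir, remote_dir))
    run(start_worker_command(host, remote_dir))
```

Passing `run=print` shows the two commands without touching any server, which is handy before pointing it at a freshly built instance.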

The script I’m using to scale up and down, auto-scale.py, is a custom script I have written. It runs via a crontab entry on my primary/master processing server.

You can find my scripts and code on GitHub:

My crontab entry for auto-scale.py:

*/5 * * * * /usr/bin/python /opt/codebase/autoscale-rackspacecloud-fabric-celery/auto-scale.py >> /tmp/auto-scale.log

2 thoughts on “Auto-scale Rackspace Cloud servers with Fabric and Celery”

  1. Thanks for this, I seem to be retracing your steps almost exactly and this is probably going to save me a boatload of time and prevent me from reinventing the wheel. I’ve similarly been working on a trading system modeller/tester and have set things up so I can scale out the computational work into the cloud, as it happens also with Rackspace. However, the process of adding nodes and so on is very manual, so I’ve been thinking of automating this for a while. As for the actual parallel computation, I’ve so far used ParallelPython (pp), which works rather well for me, but you mention a number of alternatives which I suspect might actually be better in the long run. Anyway, thanks again for this blog post.
