Auto-scale Rackspace Cloud servers with Fabric and Celery

Auto-Scaling Celery Workers with Rackspace Cloud Servers

I have been working on an automated stock trading system for some time. Part of the system involves a lot of number crunching, such as technical analysis and neural networks. Processing large amounts of data for thousands of stock tickers can take a very long time. I have begun creating additional worker nodes so that I have more processing power available and can complete the number crunching much faster.

I have created some tools to help me manage these additional worker nodes and simplify scaling the data processing workers up and down. My tools automatically scale up worker nodes using the Rackspace Cloud Servers API and then start my Python Celery workers to crunch the data and process the tasks in my RabbitMQ queue. When processing has completed and the queue is empty, the auto-scaling script spins down and destroys the worker cloud servers.

The auto-scaling tool checks my RabbitMQ queue size, and if there are a large number of tasks in the queue, it creates a number of cloud servers using the Rackspace Cloud API and an image template I have created and saved at Rackspace Cloud.
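One way to read the queue size is to parse the output of ‘rabbitmqctl list_queues name messages’. The following is a minimal sketch of that idea, not the code from my scripts: the function names are made up for illustration, and the queue name "celery" is an assumption (it is Celery's default queue name).

```python
import subprocess


def parse_queue_sizes(output):
    """Parse `rabbitmqctl list_queues name messages` output into {queue: count}."""
    sizes = {}
    for line in output.splitlines():
        parts = line.split("\t")
        # Skip banner lines such as "Listing queues ..." and any header row.
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        sizes[parts[0]] = int(parts[1])
    return sizes


def get_queue_size(queue_name="celery"):
    """Return the number of messages waiting in one queue (0 if it is absent)."""
    # Assumption: rabbitmqctl is on the PATH and this user may run it.
    output = subprocess.check_output(
        ["rabbitmqctl", "list_queues", "name", "messages"]
    ).decode("utf-8")
    return parse_queue_sizes(output).get(queue_name, 0)
```

Keeping the parsing separate from the subprocess call makes it easy to check the parser against saved output without a running broker.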

Basically it works like this:

  • Each worker instance is built from a template image, so it has the exact same packages and code base.
  • If the queue size is very large, then create a bunch of workers.
  • If the queue size is 0, then delete the additional workers, to save money by removing instances we’re not actively using.
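The rules above can be sketched as a small decision function. This is only an illustration: the threshold, batch size, and worker cap are hypothetical values, not the numbers my script uses.

```python
def workers_to_change(queue_size, current_workers,
                      scale_up_threshold=1000, batch_size=5, max_workers=10):
    """Return +N to create N workers, -N to delete N, or 0 to do nothing.

    All default values are illustrative assumptions.
    """
    if queue_size == 0:
        # Queue drained: remove every additional worker to stop paying for it.
        return -current_workers
    if queue_size >= scale_up_threshold:
        # Large backlog: add a batch, but never exceed the cap.
        room = max_workers - current_workers
        return max(0, min(batch_size, room))
    # Some work is queued, but not enough to justify more servers.
    return 0
```

With these defaults, a backlog of 5000 tasks and 8 running workers yields 2 (the cap is reached), and an empty queue with 7 workers yields -7.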

I’m using Fabric for Python to set up the Celery worker servers. The Celery workers then work on tasks in a RabbitMQ message queue. Fabric is a very powerful tool that can run commands automatically on the newly created servers. I am also using the python-cloudservers package to interface with the Rackspace Cloud API and create the servers. After the servers are created, I use rsync to copy my code base over to them. Fabric then starts the Celery worker daemon on the new worker nodes, and the daemon takes care of the rest and starts processing tasks.
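The rsync step might look something like the sketch below. It uses plain subprocess rather than Fabric to stay self-contained, and the function names, the "root" login, and the choice of flags are assumptions for illustration, not what my scripts necessarily do.

```python
import subprocess


def build_rsync_command(host, local_dir, remote_dir, user="root"):
    """Build an rsync-over-SSH command that mirrors local_dir onto the worker."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    source = local_dir.rstrip("/") + "/"
    destination = "{0}@{1}:{2}".format(user, host, remote_dir)
    # -a preserves permissions and times, -z compresses in transit,
    # --delete removes remote files that no longer exist locally.
    return ["rsync", "-az", "--delete", "-e", "ssh", source, destination]


def sync_code(host, local_dir, remote_dir):
    """Run the rsync command and raise CalledProcessError if it fails."""
    subprocess.check_call(build_rsync_command(host, local_dir, remote_dir))
```

Because every worker is built from the same image, mirroring with --delete keeps the code on each node identical to the master copy.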

The script I use to scale up and down, auto-scale.py, is a custom script I have written. It runs via a crontab entry on my primary/master processing server.

You can find my scripts and code on GitHub:

My crontab entry for auto-scale.py:

*/5 * * * * /usr/bin/python /opt/codebase/autoscale-rackspacecloud-fabric-celery/auto-scale.py >> /tmp/auto-scale.log

Debian & Ubuntu Equivalents of ‘yum whatprovides’

Introduction

For Debian and Ubuntu users there are three easy ways to find which package a file on your system comes from. Those moving from Red Hat, Fedora, or CentOS to a Debian or Ubuntu system may have become used to ‘yum whatprovides’. There is no whatprovides equivalent in aptitude or apt-get, but there are three easy-to-use Debian/Ubuntu alternatives: the Ubuntu Packages Search web site, the apt-file package, and the ‘dpkg -S’ command.

Ubuntu Packages Search

The first method is simple: Ubuntu provides a web site where you can search the package repositories and pull up detailed information on the packages. Visit the Ubuntu Packages Search site and scroll down to the section titled "Search the contents of packages", which searches the file lists of all packages. Enter the path of the file you are looking for, such as "/etc/ssl/certs/ca-certificates.crt" or "ca-certificates.crt", and hit search. Very easy to use.

apt-file

The apt-file tool can also substitute for ‘yum whatprovides’. The apt-file package must first be installed using aptitude or apt-get. Simply run ‘aptitude install apt-file’ to install it.

After you install the apt-file package, you must run ‘apt-file update’ to update its file database. Remember to run ‘apt-file update’ again if you haven’t used it in a while, to make sure the database is up to date.

Now, to determine which package a file is from, use the ‘apt-file search’ command. Here’s an example that lists every package containing a file whose path matches /usr/sbin/apache2:

# apt-file search /usr/sbin/apache2
apache2-dbg: /usr/lib/debug/usr/sbin/apache2-mpm-event
apache2-dbg: /usr/lib/debug/usr/sbin/apache2-mpm-prefork
apache2-dbg: /usr/lib/debug/usr/sbin/apache2-mpm-worker
apache2-mpm-event: /usr/sbin/apache2
apache2-mpm-itk: /usr/sbin/apache2
apache2-mpm-prefork: /usr/sbin/apache2
apache2-mpm-worker: /usr/sbin/apache2
apache2.2-common: /usr/sbin/apache2ctl

dpkg -S

Another way to find the package name is ‘dpkg -S’. This command will quickly tell you which package a file originated from. However, it only works for packages already installed on your system. You can also use ‘dpkg -S’ without providing a full path for a very broad search.

Let’s find out what package the python binary /usr/bin/python is from:

# dpkg -S /usr/bin/python
python-minimal: /usr/bin/python

Let’s find out what package /usr/bin/python-config is from:

# dpkg -S /usr/bin/python-config
python-dev: /usr/bin/python-config

Download VMware Appliances and Templates

Here are two places where you can download ready-to-use VMware appliances and operating system templates:

You’ll be able to find many different appliances and images. Some of the more popular downloads are Linux (Fedora, Ubuntu, Debian) and FreeBSD.

Also check out the VMware HOWTOs at thoughtpolice.