Aurora Configuration Best Practices

Use As Few .aurora Files As Possible

When creating your .aurora configuration, try to keep all versions of a particular job within the same .aurora file. For example, if you have separate jobs for cluster1, cluster1 staging, cluster1 testing, andcluster2, keep them as close together as possible.

Constructs shared across multiple jobs owned by your team (e.g. team-level defaults or structural templates) can be split into separate .aurorafiles and included via the include directive.

Avoid Boilerplate

If you see repetition or find yourself copy and pasting any parts of your configuration, it's likely an opportunity for templating. Take the example below:

redundant.aurora contains:

download = Process(
  name = 'download',
  cmdline = 'wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2',
  max_failures = 5,
  min_duration = 1)

unpack = Process(
  name = 'unpack',
  cmdline = 'rm -rf Python-2.7.3 && tar xzf Python-2.7.3.tar.bz2',
  max_failures = 5,
  min_duration = 1)

build = Process(
  name = 'build',
  cmdline = 'pushd Python-2.7.3 && ./configure && make && popd',
  max_failures = 1)

email = Process(
  name = 'email',
  cmdline = 'echo Success | mail feynman@tmc.com',
  max_failures = 5,
  min_duration = 1)

build_python = Task(
  name = 'build_python',
  processes = [download, unpack, build, email],
  constraints = [Constraint(order = ['download', 'unpack', 'build', 'email'])])

As you'll notice, there's a lot of repetition in the Process definitions. For example, almost every process sets a max_failures limit to 5 and a min_duration to 1. This is an opportunity for factoring into a common process template.

Furthermore, the Python version is repeated everywhere. This can be bound via structural templating as described in the Advanced Binding section.

less_redundant.aurora contains:

class Python(Struct):
  version = Required(String)
  base = Default(String, 'Python-{{version}}')
  package = Default(String, '{{base}}.tar.bz2')

ReliableProcess = Process(
  max_failures = 5,
  min_duration = 1)

download = ReliableProcess(
  name = 'download',
  cmdline = 'wget http://www.python.org/ftp/python/{{python.version}}/{{python.package}}')

unpack = ReliableProcess(
  name = 'unpack',
  cmdline = 'rm -rf {{python.base}} && tar xzf {{python.package}}')

build = ReliableProcess(
  name = 'build',
  cmdline = 'pushd {{python.base}} && ./configure && make && popd',
  max_failures = 1)

email = ReliableProcess(
  name = 'email',
  cmdline = 'echo Success | mail {{role}}@foocorp.com')

build_python = SequentialTask(
  name = 'build_python',
  processes = [download, unpack, build, email]).bind(python = Python(version = "2.7.3"))

Thermos Uses bash, But Thermos Is Not bash

Bad

Many tiny Processes makes for harder to manage configurations.

copy = Process(
  name = 'copy',
  cmdline = 'rcp user@my_machine:my_application .'
 )

 unpack = Process(
   name = 'unpack',
   cmdline = 'unzip app.zip'
 )

 remove = Process(
   name = 'remove',
   cmdline = 'rm -f app.zip'
 )

 run = Process(
   name = 'app',
   cmdline = 'java -jar app.jar'
 )

 run_task = Task(
   processes = [copy, unpack, remove, run],
   constraints = order(copy, unpack, remove, run)
 )

Good

Each cmdline runs in a bash subshell, so you have the full power of bash. Chaining commands with && or || is almost always the right thing to do.

Also for Tasks that are simply a list of processes that run one after another, consider using the SequentialTask helper which applies a linear ordering constraint for you.

stage = Process(
  name = 'stage',
  cmdline = 'rcp user@my_machine:my_application . && unzip app.zip && rm -f app.zip')

run = Process(name = 'app', cmdline = 'java -jar app.jar')

run_task = SequentialTask(processes = [stage, run])

Rarely Use Functions In Your Configurations

90% of the time you define a function in a .aurora file, you're probably Doing It Wrong(TM).

Bad

def get_my_task(name, user, cpu, ram, disk):
  return Task(
    name = name,
    user = user,
    processes = [STAGE_PROCESS, RUN_PROCESS],
    constraints = order(STAGE_PROCESS, RUN_PROCESS),
    resources = Resources(cpu = cpu, ram = ram, disk = disk)
 )

 task_one = get_my_task('task_one', 'feynman', 1.0, 32*MB, 1*GB)
 task_two = get_my_task('task_two', 'feynman', 2.0, 64*MB, 1*GB)

Good

This one is more idiomatic. Forced keyword arguments prevents accidents, e.g. constructing a task with "32*MB" when you mean 32MB of ram and not disk. Less proliferation of task-construction techniques means easier-to-read, quicker-to-understand, and a more composable configuration.

TASK_TEMPLATE = SequentialTask(
  user = 'wickman',
  processes = [STAGE_PROCESS, RUN_PROCESS],
)

task_one = TASK_TEMPLATE(
  name = 'task_one',
  resources = Resources(cpu = 1.0, ram = 32*MB, disk = 1*GB) )

task_two = TASK_TEMPLATE(
  name = 'task_two',
  resources = Resources(cpu = 2.0, ram = 64*MB, disk = 1*GB)
)