
I want to share a little piece of work that I recently completed. Hope you enjoy!

What I did

  • Using infrastructure-as-code on AWS, I built a system that automatically creates new testsets for our algorithms every month
  • Now, every month a cron job (EventsRule) triggers a piece of code (Lambda) that starts a computational routine (SageMaker job), which writes the new testset to a new table in our Athena warehouse (Athena table/S3 bucket)

Why I am proud of it

  • This was my first time setting up automated infrastructure-as-code and it was a real pain in the ass, but once I got it, it was very satisfying
  • I found it particularly interesting to build a new system by piecing together different services of AWS
  • It adds repeated business value by automating something we previously did manually

Technical details

Most of the work involved updating our infra-as-code cloudformation.yaml. I had to provision the following new infrastructure:

  • CreateTestsetLambda (AWS::Serverless::Function): This is the actual code, in the form of a lambda, that triggers the SageMaker job using boto3. We do it this way because currently (as of writing this) there is no way to start a SageMaker job directly from an Events rule
  • AWSLambdaCreateTestsetLambdaRole (AWS::IAM::Role): This is the role that the lambda needs to get executed
  • AWSSageMakerExecutorRole (AWS::IAM::Role): This is another role that we give to the lambda so that it has rights to start the SageMaker job
  • ScheduleMonthlyEventsRule (AWS::Events::Rule): This is the cron job that triggers the lambda every month
  • PermissionForEventsRuleToInvokeLambda (AWS::Lambda::Permission): Finally, we need to give the cron job this permission to be allowed to trigger the lambda
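
For illustration, here is a minimal sketch of what a lambda like CreateTestsetLambda could look like. It is not the actual code. It assumes the job is a SageMaker processing job (the post only says "SageMaker job"), and the environment variable names, job name prefix, instance type, volume size and timeout are all hypothetical placeholders.

```python
import datetime
import os


def build_job_request(now, role_arn, image_uri):
    """Build the keyword arguments for SageMaker's create_processing_job."""
    # Job names must be unique, so a timestamp is appended to a
    # hypothetical prefix.
    job_name = "create-testset-" + now.strftime("%Y-%m-%d-%H%M%S")
    return {
        "ProcessingJobName": job_name,
        # Role the job runs under (AWSSageMakerExecutorRole in the template).
        "RoleArn": role_arn,
        # Container image holding the testset routine (assumed to exist).
        "AppSpecification": {"ImageUri": image_uri},
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": "ml.m5.xlarge",  # assumed instance type
                "VolumeSizeInGB": 30,  # assumed volume size
            }
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},  # assumed limit
    }


def handler(event, context, sagemaker_client=None):
    """Lambda entry point, invoked by the monthly Events rule."""
    if sagemaker_client is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3

        sagemaker_client = boto3.client("sagemaker")

    request = build_job_request(
        now=datetime.datetime.now(datetime.timezone.utc),
        role_arn=os.environ["SAGEMAKER_ROLE_ARN"],  # hypothetical variable
        image_uri=os.environ["TESTSET_IMAGE_URI"],  # hypothetical variable
    )
    response = sagemaker_client.create_processing_job(**request)
    return {
        "job_name": request["ProcessingJobName"],
        "job_arn": response["ProcessingJobArn"],
    }
```

Keeping the request building in its own function means it can be checked locally with a fixed date, and passing in a fake client lets you test the handler without starting a real job.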

That’s it! I have to admit, it was a real pain in the ass sometimes, but once everything started working together it was sooooo satisfying.