Subscribing to GitHub Project Releases With AWS Lambda

Aug 5, 2018 16:04 · 2010 words · 10 minutes read

I used to use a free service called Sibbell (run by Dependencies.io) for subscribing to releases of GitHub projects I was interested in until it was discontinued recently due to costs. After a few months of missing releases for projects I care about I decided to make use of AWS Lambda’s free tier to set up a more permanent solution, hopefully less prone to unexpected shutdown.

If you’d rather not set this up (and hence maintain it) yourself, there are alternatives to Sibbell:

I wanted to run my own though, and Lambda reduces the maintenance burden significantly for such a small and specific task. There isn’t really money in doing this, so I personally don’t think I can confidently rely on any of the above alternatives still existing in one, two, or five years.

IF This Then That

Before we look at Lambda, there’s another option for simpler use cases than mine: IF This Then That (IFTTT). It can’t send a digest-style email once a day or week, which is something I wanted to be able to do, but I have added instructions below for those OK with this compromise.

GitHub Atom Feeds

GitHub provides Atom feeds for various things in a repository, but the useful one here is the releases feed:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xml:lang="en-US">
  <id>tag:github.com,2008:https://github.com/golang/go/releases</id>
  <link type="text/html" rel="alternate" href="https://github.com/golang/go/releases"/>
  <link type="application/atom+xml" rel="self" href="https://github.com/golang/go/releases.atom"/>
  <title>Release notes from go</title>
  <updated>2018-08-04T03:21:10+10:00</updated>
  <entry>
    <id>tag:github.com,2008:Repository/23096959/go1.11beta3</id>
    <updated>2018-08-04T03:21:10+10:00</updated>
    <link rel="alternate" type="text/html" href="https://github.com/golang/go/releases/tag/go1.11beta3"/>
    <title>go1.11beta3: net: skip flaky TestNotTemporaryRead on FreeBSD</title>
    <content type="html">&lt;p&gt;Updates &lt;a class=&quot;issue-link js-issue-link&quot; data-error-text=&quot;Failed to load issue title&quot; data-id=&quot;321205128&quot; data-permission-text=&quot;Issue title is private&quot; data-url=&quot;https://github.com/golang/go/issues/25289&quot; href=&quot;https://github.com/golang/go/issues/25289&quot;&gt;#25289&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Change-Id: I662760b921be625aca988cd0b43c648ac5dfd814&lt;br&gt;
Reviewed-on: &lt;a href=&quot;https://go-review.googlesource.com/127837&quot; rel=&quot;nofollow&quot;&gt;https://go-review.googlesource.com/127837&lt;/a&gt;&lt;br&gt;
Reviewed-by: Bryan C. Mills &lt;a href=&quot;mailto:bcmills@google.com&quot;&gt;bcmills@google.com&lt;/a&gt;&lt;br&gt;
Run-TryBot: Brad Fitzpatrick &lt;a href=&quot;mailto:bradfitz@golang.org&quot;&gt;bradfitz@golang.org&lt;/a&gt;&lt;br&gt;
TryBot-Result: Gobot Gobot &lt;a href=&quot;mailto:gobot@golang.org&quot;&gt;gobot@golang.org&lt;/a&gt;&lt;/p&gt;</content>
    <author>
      <name>bradfitz</name>
    </author>
    <media:thumbnail height="30" width="30" url="https://avatars3.githubusercontent.com/u/2621?s=60&amp;v=4"/>
  </entry>
...

Source: https://github.com/golang/go/releases.atom

Simply append /releases.atom to the end of a GitHub repository URL and you’re off to the races.
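As a rough sketch of what consuming that feed looks like (this is not part of the Lambda script further down), the following builds the feed URL and pulls the title, link, and timestamp out of each entry using only the standard library. The names `releases_feed_url` and `parse_release_entries` are my own, chosen for illustration, and the sample document is a trimmed copy of the feed above.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups need the prefix mapping.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def releases_feed_url(repo_url: str) -> str:
    """Append /releases.atom to a GitHub repository URL."""
    return repo_url.rstrip("/") + "/releases.atom"


def parse_release_entries(feed_xml: str) -> list:
    """Return one dict per <entry> with its title, link, and updated time."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "link": link.get("href") if link is not None else None,
        })
    return entries


# Trimmed sample based on the golang/go feed shown above.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Release notes from go</title>
  <entry>
    <id>tag:github.com,2008:Repository/23096959/go1.11beta3</id>
    <updated>2018-08-04T03:21:10+10:00</updated>
    <link rel="alternate" type="text/html" href="https://github.com/golang/go/releases/tag/go1.11beta3"/>
    <title>go1.11beta3: net: skip flaky TestNotTemporaryRead on FreeBSD</title>
  </entry>
</feed>
"""

if __name__ == "__main__":
    print(releases_feed_url("https://github.com/golang/go"))
    for item in parse_release_entries(SAMPLE_FEED):
        print(item["updated"], item["title"], item["link"])
```

To use it against a real repository you would fetch the URL first and pass the response text in; I have left the network call out so the sketch runs offline.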

Subscribing via IFTTT Applet

If all you want is an instant notification via email when a new release is published, you can skip the rest of this post and just set up the following IFTTT applet: https://ifttt.com/applets/wyiP45c8-rss-to-email using the Atom link for the repository you care about.

AWS Lambda

Subscribing via IFTTT requires manually adding a new applet for every repository I want to watch. I wanted more, specifically:

  • Custom formatted weekly digest (no matter how many releases)
  • Subscribe to new repositories by starring them

This requires a little more complexity than the IFTTT handlers can provide.

Python Handler

Lambda scripts contain handlers which are called on Lambda function invocation.

The following Python 3 script contains a Lambda handler that fetches the list of starred projects along with their releases using your GitHub API key, filters out any releases older than one week, then creates and sends an email with releases grouped by repository.

#!/usr/bin/env python3.6
'''
AWS Lambda handler to send a weekly email digest for GitHub repository releases.
'''

from base64 import b64decode
from datetime import datetime, timedelta
from email.mime.text import MIMEText
import html
import os
import smtplib
from typing import Generator, List, Dict, Tuple

import boto3
import requests

def _get_decrypted(key: str) -> str:
    '''
    Helper to decrypt the stored credentials from AWS KMS.

    Arguments:
        key (str): name of environment variable to fetch encrypted value from.

    Returns:
        decrypted (str): decrypted os.environ[key] value.
    '''
    return boto3.client('kms').decrypt(
        CiphertextBlob=b64decode(os.environ[key]))['Plaintext'].decode('UTF8')

def repos_with_releases(since: datetime = None) \
        -> Generator[Dict, None, None]:
    '''
    Generator that yields projects with ordered releases, for any projects
    with releases more recently than `since`.

    Arguments:
        since (datetime.datetime): Only yield releases more recent than this.
                                   If not provided, defaults to now - 7d
                                   (rounded down to 00:00:00).

    Yields:
        release (dict): Project with CREATED_AT DESC ordered releases.

    Returns:
        N/A
    '''
    graphql_query = '''
query {{
  viewer {{
    starredRepositories(first:100{}) {{
      pageInfo {{
        endCursor
        startCursor
      }}
      edges {{
        node {{
          id
          nameWithOwner
          releases(first:5, orderBy: {{field: CREATED_AT, direction: DESC}}) {{
            edges {{
              node {{
                name
                tag {{
                  name
                }}
                description
                url
                createdAt
              }}
            }}
          }}
        }}
      }}
    }}
  }}
}}
'''

    if not since:
        since = datetime.now() - timedelta(days=7)
        # Zero out everything after the day, this effectively rounds the
        # datetime down to midnight.
        for prop in ['hour', 'minute', 'second', 'microsecond']:
            since = since - timedelta(**{
                "{}s".format(prop): getattr(since, prop),
            })

    end_cursor_filter = ""
    while True:
        resp = requests.post("https://api.github.com/graphql", headers={
            "Authorization": "token {}".format(_get_decrypted("GITHUB_TOKEN")),
            "Content-Type": "application/json",
            "Accept": "application/json",
        }, json={
            "query": graphql_query.format(end_cursor_filter),
            "variables": {},
        })
        resp.raise_for_status()
        data = resp.json()

        repos = data["data"]["viewer"]["starredRepositories"]
        end_cursor = repos["pageInfo"]["endCursor"]
        if not end_cursor:
            break
        end_cursor_filter = ", after: \"{}\"".format(end_cursor)

        for edge in repos["edges"]:
            node = edge["node"]
            repo_name = node["nameWithOwner"]
            recent_releases = [release["node"]
                               for release in node["releases"]["edges"]
                               if datetime.strptime(
                                   release["node"]["createdAt"],
                                   "%Y-%m-%dT%H:%M:%SZ") > since]
            yield {
                "name": repo_name,
                "releases": recent_releases,
            }

def _build_email(releases: List[Dict[str, str]],
                 no_releases: List[Dict[str, str]]) \
        -> Tuple[str, str]:
    '''
    Build a basic email with releases (or lack thereof) for the given projects.

    Arguments:
        releases (list(dict)): Projects with list of releases, sorted by
                               project name.
        no_releases (list(dict)): Projects with no releases, sorted by
                                  project name.
    Returns:
        subject, body (tuple(str, str)): Email subject and body, html escaped.
    '''
    title = "Project Releases for the Week Ending {}".format(
        datetime.now().strftime("%Y-%m-%d"))
    body = "<h1>{}</h1>".format(title)

    for project in releases:
        body += '''<h2>
          <a href="https://github.com/{project}" title="{project}">
              {project}
          </a>
        </h2>'''.format(project=html.escape(project["name"]))

        for release in project["releases"]:
            body += '''<p>
              <ul>
                <li>
                  <a href="https://github.com/{project}/releases/tag/{tag}"
                     title="{tag}">{tag}</a> {created_at}<br />
                  {description}
                </li>
              </ul>
            </p>'''.format(
                project=html.escape(project["name"]),
                tag=html.escape(release["tag"]["name"]),
                created_at=html.escape(release["createdAt"]),
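                # description may be null in the API response, so fall back
                # to an empty string before escaping.
                description=html.escape(release["description"] or "").replace(
                    "\r\n", "<br />"))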

    if no_releases:
        body += "<h1>No Releases</h1>"
    for project in no_releases:
        body += '''<ul>
          <li><a href="https://github.com/{project}" title="{project}">{project}</a>
        </ul>'''.format(project=html.escape(project["name"]))

    return title, body.replace("\n", "")

def _send_email(subject: str, body: str) -> None:
    '''
    Send email using credentials from the environment.

    Arguments:
        subject (str): Subject of the email.
        body (str): Body of the email.

    Returns:
        N/A
    '''
    from_addr = _get_decrypted("FROM_EMAIL")
    to_addr = _get_decrypted("TO_EMAIL")
    email_pass = _get_decrypted("EMAIL_PASSWORD")
    conn = smtplib.SMTP(host="smtp.mailgun.org", port=587)
    conn.starttls()
    conn.login(from_addr, email_pass)

    msg = MIMEText(body, "html")
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    conn.sendmail(msg["From"], [msg["To"]], msg.as_string())
    conn.quit()

def digest_handler(event, context):
    '''
    Lambda entrypoint. Calls necessary functions to build and send the digest.

    Arguments:
        event (dict, list, str, int, float, None): Lambda event data.
        context (LambdaContext): Lambda runtime information and other context.
                                 Documentation on this type can be found here:
                                 https://docs.aws.amazon.com/lambda/latest/dg/python-context-object.html

    Returns:
        N/A (return value is unused by Lambda when using an asynchronous
             invocation method, such as periodic execution a la cron)
    '''
    main()

def main():
    ''' Main func. '''
    no_releases = []
    releases = []
    for repo in repos_with_releases(
        since=(datetime.now() - timedelta(days=7)).replace(
            hour=0, minute=0, second=0, microsecond=0)):
        if repo["releases"]:
            releases.append(repo)
        else:
            no_releases.append(repo)
    no_releases.sort(key=lambda x: x["name"])
    releases.sort(key=lambda x: x["name"])

    title, body = _build_email(releases, no_releases)
    _send_email(title, body)

if __name__ == "__main__":
    main()

(also available as a gist)
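The date filtering is the part of the handler most worth checking without GitHub credentials, so here is a small sketch that pulls it out into pure functions. `midnight_days_ago` and `filter_recent` are hypothetical helpers that do not exist in the script above; they mirror its logic under the assumption that `createdAt` always arrives in the `%Y-%m-%dT%H:%M:%SZ` form the script parses.

```python
from datetime import datetime, timedelta

# The timestamp format the handler above expects from the GitHub API.
GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def midnight_days_ago(now: datetime, days: int = 7) -> datetime:
    """Return midnight at the start of the day `days` before `now`."""
    return (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0)


def filter_recent(releases: list, since: datetime) -> list:
    """Keep only release dicts whose createdAt is later than `since`."""
    return [release for release in releases
            if datetime.strptime(release["createdAt"], GITHUB_TIME_FORMAT) > since]


# Made-up sample data, shaped like the release nodes the query returns.
now = datetime(2018, 8, 5, 16, 4)
since = midnight_days_ago(now)
sample = [
    {"name": "old", "createdAt": "2018-07-25T12:03:00Z"},
    {"name": "new", "createdAt": "2018-08-04T03:21:10Z"},
]
recent = filter_recent(sample, since)

if __name__ == "__main__":
    print(since, [release["name"] for release in recent])
```

Both the cutoff and the parsed timestamps are naive datetimes, so the comparison is only meaningful if the clock is UTC, which is my understanding of the Lambda runtime's default.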

Email Template

I wanted something simple. I don’t personally mind HTML emails, provided they are basic. Headers, paragraphs, lists, and links are sufficient here; let the device or platform render it however it thinks best.

<h1>Project Releases for the Week Ending 2006-01-02</h1>
<h2><a href="https://github.com/grafana/grafana" title="Grafana">grafana/grafana</a></h2>
<p>
  <ul>
    <li>
      <a href="https://github.com/grafana/grafana/releases/tag/v5.2.2" title="v5.2.2">v5.2.2</a> 25 Jul 2018, 08:03 GMT-4<br />
      * Prometheus: Fix graph panel bar width issue in aligned prometheus queries #12379<br />
      * Dashboard: Dashboard links not updated when changing variables #12506
      ...
    </li>
    ...
  </ul>
</p>
...
<h1>No Releases</h1>
<ul>
  <li><a href="https://github.com/prometheus/prometheus" title="Prometheus">prometheus/prometheus</a>
  ...
</ul>

Creating a GitHub API personal access token

GitHub affords you quite granular control over the permissions of access tokens. To create a personal access token so that the Lambda script can access the necessary GitHub endpoints, do the following:

  1. Navigate to https://github.com/settings/tokens/new
  2. Give your token a meaningful name, like Starred Project Email Digest (Lambda)
  3. Select the read:user checkbox in the list of permissions
  4. Click Generate token
  5. Copy your token to somewhere safe. GitHub will not show it to you again, so if you lose it you will need to create a new one
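Before wiring the token into Lambda, it can be worth checking that it works. The sketch below asks the GraphQL endpoint for the login of the token’s owner, using the same endpoint and `Authorization: token ...` header as the handler above. The function names are hypothetical, and I have used `urllib` rather than `requests` so it runs with nothing installed.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"


def build_viewer_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL request for the token owner's login."""
    body = json.dumps({"query": "query { viewer { login } }"}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": "token {}".format(token),
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_viewer_login(token: str) -> str:
    """Send the request and return the login name the token belongs to.

    urlopen raises urllib.error.HTTPError on an error status, which is
    what I would expect to see for a mistyped or revoked token.
    """
    with urllib.request.urlopen(build_viewer_request(token)) as resp:
        data = json.load(resp)
    return data["data"]["viewer"]["login"]
```

Calling `fetch_viewer_login("<your token>")` from a Python shell should print your GitHub username if the token was copied correctly.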

Creating Mailgun SMTP credentials

You could use your Gmail account credentials here instead, for example, but having my email credentials sitting in an AWS account (even if they are encrypted) makes me somewhat uncomfortable. Getting my free, personal, password-manager-password-generated Mailgun account compromised is vastly lower impact than getting my Gmail account compromised. If you have your own domain then you can sign up for Mailgun and get 10k emails free here, which should be more than enough for your personal use. You do not need to add a credit card to your account if you don’t want to. Alternatively, you could create a second Gmail account just for sending yourself emails.

Once you have a Mailgun account created and set up, do the following to create SMTP credentials:

  1. Navigate to https://app.mailgun.com/app/domains/<your_domain>/credentials
  2. Click New SMTP Credential
  3. Give it a name, like github@<your_domain>
  4. Give it a password, preferably a long, randomly generated one

Note: you could also use AWS SES here if you prefer. I didn’t learn about it until I wrote this blog post and a friend pointed it out (something something AWS has too many things to know) and I personally prefer to spread my eggs a little bit.

Creating and configuring the Lambda function

You can use the AWS CLI for this; however, I will explain how to do this with the UI, as that is the easiest way for people who don’t already have roles etc. set up.

Creating the Lambda function and roles

  1. Navigate to https://console.aws.amazon.com/lambda/home?region=us-east-1#/create?firstrun=true
  2. Select Author from scratch
  3. Give your new function a descriptive name, like github-release-digest
  4. Select Python 3.6 as the runtime
  5. Select Create a new role from template(s) (this will automatically grant access to things like CloudWatch for logging)
  6. Give your new role a descriptive name, like basic_lambda_execution
  7. Select KMS decryption permissions under Policy templates, so that we can store our credentials with encryption at rest
  8. Click Create function

Configuring a scheduled, periodic trigger

  1. Select CloudWatch Events from the list of available triggers on the left-hand side of the Lambda builder screen
  2. Under Rule select Create a new rule
  3. Give the rule a descriptive name, like “monday_9am”
  4. Give the rule a description, like “Fires every Monday at 9am UTC”
  5. Select Schedule expression
  6. Enter cron(0 9 ? * MON *) (the last * is non-standard, and represents the year)
  7. Click Add
  8. Click Save
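The six fields in that schedule expression are minutes, hours, day-of-month, month, day-of-week, and year, so `cron(0 9 ? * MON *)` reads as minute 0 of hour 9, on any day of the month, in every month, on Mondays, in every year. To sanity-check when the rule should next fire, here is a small sketch that computes the same thing in Python. `next_monday_9am_utc` is a hypothetical helper of my own and handles only this one schedule; it is not a general cron parser.

```python
from datetime import datetime, timedelta


def next_monday_9am_utc(now: datetime) -> datetime:
    """Return the first Monday 09:00 strictly after `now` (naive UTC)."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    # weekday() is 0 for Monday, so this is the number of days until Monday.
    candidate += timedelta(days=(0 - candidate.weekday()) % 7)
    if candidate <= now:
        # It is already Monday at or past 09:00, so roll to next week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_monday_9am_utc(datetime.utcnow()))
```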

Adding the requirements and the handler code

We imported requests, but Lambda doesn’t provide any way to satisfy module requirements automatically so we have to package and provide them ourselves.

  1. Install requests and its dependencies to a temp directory:
       pip install requests -t /tmp/github-subscription
  2. Zip the requirements from inside that directory, so that they sit at the top level of the zip where Lambda can import them:
       cd /tmp/github-subscription
       zip -r ../package.zip .
  3. Add the handler to the zip file (the -j is so that the handler file exists at the top level of the zip):
       zip -j /tmp/package.zip /path/to/handler.py
  4. Upload package.zip to the Lambda function by selecting Upload a .ZIP file as the code entry method under the function code section
  5. Select Python 3.6 as the runtime
  6. Set the Handler to handler.digest_handler
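If you would rather script the packaging, the steps above can be sketched with Python’s `zipfile` module. The important detail is that Lambda imports modules from the top level of the archive, so each dependency is stored under its path relative to the install directory rather than under the directory’s own name. `build_package` and its arguments are hypothetical names of my own, and the sketch assumes the output zip is written outside the dependency directory.

```python
import os
import zipfile


def build_package(deps_dir: str, handler_path: str, zip_path: str) -> None:
    """Zip everything in deps_dir plus the handler, all at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(deps_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to deps_dir so that packages such as
                # requests/ end up at the top level of the archive.
                archive.write(full_path, os.path.relpath(full_path, deps_dir))
        # Equivalent of `zip -j`: drop the handler's directory components.
        archive.write(handler_path, os.path.basename(handler_path))


if __name__ == "__main__":
    build_package("/tmp/github-subscription", "/path/to/handler.py",
                  "/tmp/package.zip")
```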

Adding the necessary secrets

AWS Lambda allows you to provide encrypted environment variables for your functions via KMS. To add your GitHub and Mailgun credentials this way, do the following:

  1. On the Lambda configuration screen, under the environment variables section, expand the encryption configuration and check the box to enable helpers for encryption in transit
  2. If you have no KMS keys already, click the link to create a new one
  3. Select the desired KMS key to use to encrypt the Lambda environment variables
  4. Enter the following variables and their values using the text input, selecting to encrypt all of them:
    • GITHUB_TOKEN
    • FROM_EMAIL
    • TO_EMAIL
    • EMAIL_PASSWORD
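A missing variable otherwise only shows up as a `KeyError` partway through a run, so it may be worth failing early. The sketch below is an optional addition, not something the handler above does: `missing_variables` is a hypothetical helper that reports which of the four names are unset or empty. It checks only for presence and does not attempt to decrypt anything.

```python
import os

# The four variables the handler reads via _get_decrypted().
REQUIRED_VARIABLES = ("GITHUB_TOKEN", "FROM_EMAIL", "TO_EMAIL", "EMAIL_PASSWORD")


def missing_variables(environ=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

One way to use it would be to call it at the top of `digest_handler` and raise if the list is non-empty, so that the failure appears in the CloudWatch logs with a readable message.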

Increase the default timeout

The default Lambda timeout is only 3 seconds, which may not be enough to query the GitHub API and successfully send the email. Since this function runs quite infrequently, you can probably safely bump this to 30s or so.
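If you want the function itself to notice when it is running short on time, the Lambda context object exposes `get_remaining_time_in_millis()`. The sketch below shows one way to use it, assuming you pass `context` down from `digest_handler` (the script above does not). `has_time_left`, the 5000 ms default, and `FakeContext` are hypothetical, the last existing only so the check can be tried locally.

```python
def has_time_left(context, needed_ms: int = 5000) -> bool:
    """Return True if the invocation has at least `needed_ms` remaining."""
    return context.get_remaining_time_in_millis() >= needed_ms


class FakeContext:
    """Stand-in for the Lambda context object, for local testing only."""

    def __init__(self, remaining_ms: int):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self._remaining_ms


if __name__ == "__main__":
    print(has_time_left(FakeContext(30000)))
    print(has_time_left(FakeContext(1000)))
```

A handler could call this just before sending the email and log a warning instead of being cut off mid-send.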

Test your function!

To make sure you’ve set up your Lambda function correctly, you can trigger a test run. Select the github-release-digest box in the Lambda designer window and then click the Test button at the top. Assuming everything works, you should receive an email.

You will now receive an email every Monday morning at 9am UTC with a list of releases for all your starred GitHub projects. To subscribe to or unsubscribe from repositories, simply star or unstar them.