Planet Debian

Subscribe to Planet Debian feed
Planet Debian - https://planet.debian.org/
Updated: 2 hours 35 min ago

Sergio Talens-Oliag: Kubernetes Static Content Server

8 hours 47 min ago

This post describes how I’ve put together a simple static content server for kubernetes clusters using a Pod with a persistent volume and multiple containers: an sftp server to manage contents, a web server to publish them with optional access control and another one to run scripts which need access to the volume filesystem.

The sftp server runs using MySecureShell, the web server is nginx and the script runner uses the webhook tool to publish endpoints to call them (the calls will come from other Pods that run backend servers or are executed from Jobs or CronJobs).

Note:

This service has been developed for Kyso and the version used in our current architecture includes an additional container to index documents for Elasticsearch, but as it is not relevant for the description of the service as a general solution I’ve decided to ignore it on this post.

History

The system was developed because we had a NodeJS API with endpoints to upload files and store them on S3 compatible services that were later accessed via HTTPS, but the requirements changed and we needed to be able to publish folders instead of individual files using their original names and apply access restrictions using our API.

Thinking about our requirements the use of a regular filesystem to keep the files and folders was a good option, as uploading and serving files is simple.

For the upload I decided to use the sftp protocol, mainly because I already had an sftp container image based on mysecureshell prepared; once we settled on that we added sftp support to the API server and configured it to upload the files to our server instead of using S3 buckets.

To publish the files we added a nginx container configured to work as a reverse proxy that uses the ngx_http_auth_request_module to validate access to the files (the sub request is configurable, in our deployment we have configured it to call our API to check if the user can access a given URL).

Finally we added a third container when we needed to execute some tasks directly on the filesystem (using kubectl exec with the existing containers did not seem a good idea, as that is not supported by CronJobs objects, for example).

The solution we found avoiding the NIH Syndrome (i.e. write our own tool) was to use the webhook tool to provide the endpoints to call the scripts; for now we have three:

  • one to get the disc usage of a PATH,
  • one to hardlink all the files that are identical on the filesystem,
  • one to copy files and folders from S3 buckets to our filesystem.
Container definitionsmysecureshell

The mysecureshell container can be used to provide an sftp service with multiple users (although the files are owned by the same UID and GID) using standalone containers (launched with docker or podman) or in an orchestration system like kubernetes, as we are going to do here.

The image is generated using the following Dockerfile:

ARG ALPINE_VERSION=3.16.2

FROM alpine:$ALPINE_VERSION as builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN apk update &&\
 apk add --no-cache alpine-sdk git musl-dev &&\
 git clone https://github.com/sto/mysecureshell.git &&\
 cd mysecureshell &&\
 ./configure --prefix=/usr --sysconfdir=/etc --mandir=/usr/share/man\
 --localstatedir=/var --with-shutfile=/var/lib/misc/sftp.shut --with-debug=2 &&\
 make all && make install &&\
 rm -rf /var/cache/apk/*

FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
COPY --from=builder /usr/bin/mysecureshell /usr/bin/mysecureshell
COPY --from=builder /usr/bin/sftp-* /usr/bin/
RUN apk update &&\
 apk add --no-cache openssh shadow pwgen &&\
 sed -i -e "s|^.*\(AuthorizedKeysFile\).*$|\1 /etc/ssh/auth_keys/%u|"\
 /etc/ssh/sshd_config &&\
 mkdir /etc/ssh/auth_keys &&\
 cat /dev/null > /etc/motd &&\
 add-shell '/usr/bin/mysecureshell' &&\
 rm -rf /var/cache/apk/*
COPY bin/* /usr/local/bin/
COPY etc/sftp_config /etc/ssh/
COPY entrypoint.sh /
EXPOSE 22
VOLUME /sftp
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]
Note:

Initially the container used the mysecureshell package included in alpine, but we wanted to be able to create hardlinks from the client and the support is only available on the master branch of the source repository, that is why we are compiling our own binary using a multi-stage Dockerfile.

Note that we are cloning the source from a fork that includes this pull request because we had to fix a couple of minor issues to make the ln command work as expected.

The /etc/sftp_config file is used to configure the mysecureshell server to have all the user homes under /sftp/data, only allow them to see the files under their home directories as if it were at the root of the server and close idle connections after 5m of inactivity:

etc/sftp_config
# Default mysecureshell configuration
<Default>
   # All users will have access their home directory under /sftp/data
   Home /sftp/data/$USER
   # Log to a file inside /sftp/logs/ (only works when the directory exists)
   LogFile /sftp/logs/mysecureshell.log
   # Force users to stay in their home directory
   StayAtHome true
   # Hide Home PATH, it will be shown as /
   VirtualChroot true
   # Hide real file/directory owner (just change displayed permissions)
   DirFakeUser true
   # Hide real file/directory group (just change displayed permissions)
   DirFakeGroup true
   # We do not want users to keep forever their idle connection
   IdleTimeOut 5m
</Default>
# vim: ts=2:sw=2:et

The entrypoint.sh script is the one responsible to prepare the container for the users included on the /secrets/user_pass.txt file (creates the users with their HOME directories under /sftp/data and a /bin/false shell and creates the key files from /secrets/user_keys.txt if available).

The script expects a couple of environment variables:

  • SFTP_UID: UID used to run the daemon and for all the files, it has to be different than 0 (all the files managed by this daemon are going to be owned by the same user and group, even if the remote users are different).
  • SFTP_GID: GID used to run the daemon and for all the files, it has to be different than 0.

And can use the SSH_PORT and SSH_PARAMS values if present.

It also requires the following files (they can be mounted as secrets in kubernetes):

  • /secrets/host_keys.txt: Text file containing the ssh server keys in mime format; the file is processed using the reformime utility (the one included on busybox) and can be generated using the gen-host-keys script included on the container (it uses ssh-keygen and makemime).
  • /secrets/user_pass.txt: Text file containing lines of the form username:password_in_clear_text (only the users included on this file are available on the sftp server, in fact in our deployment we use only the scs user for everything).

And optionally can use another one:

  • /secrets/user_keys.txt: Text file that contains lines of the form username:public_ssh_ed25519_or_rsa_key; the public keys are installed on the server and can be used to log into the sftp server if the username exists on the user_pass.txt file.

The contents of the entrypoint.sh script are:

entrypoint.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
# Expects SSH_UID & SSH_GID on the environment and uses the value of the
# SSH_PORT & SSH_PARAMS variables if present
# SSH_PARAMS
SSH_PARAMS="-D -e -p ${SSH_PORT:=22} ${SSH_PARAMS}"
# Fixed values
# DIRECTORIES
HOME_DIR="/sftp/data"
CONF_FILES_DIR="/secrets"
AUTH_KEYS_PATH="/etc/ssh/auth_keys"
# FILES
HOST_KEYS="$CONF_FILES_DIR/host_keys.txt"
USER_KEYS="$CONF_FILES_DIR/user_keys.txt"
USER_PASS="$CONF_FILES_DIR/user_pass.txt"
USER_SHELL_CMD="/usr/bin/mysecureshell"
# TYPES
HOST_KEY_TYPES="dsa ecdsa ed25519 rsa"
# ---------
# FUNCTIONS
# ---------
# Validate HOST_KEYS, USER_PASS, SFTP_UID and SFTP_GID
_check_environment() {
  # Check the ssh server keys ... we don't boot if we don't have them
  if [ ! -f "$HOST_KEYS" ]; then
    cat <<EOF
We need the host keys on the '$HOST_KEYS' file to proceed.

Call the 'gen-host-keys' script to create and export them on a mime file.
EOF
    exit 1
  fi
  # Check that we have users ... if we don't we can't continue
  if [ ! -f "$USER_PASS" ]; then
    cat <<EOF
We need at least the '$USER_PASS' file to provision users.

Call the 'gen-users-tar' script to create a tar file to create an archive that
contains public and private keys for users, a 'user_keys.txt' with the public
keys of the users and a 'user_pass.txt' file with random passwords for them 
(pass the list of usernames to it).
EOF
    exit 1
  fi
  # Check SFTP_UID
  if [ -z "$SFTP_UID" ]; then
    echo "The 'SFTP_UID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_UID" -eq "0" ]; then
    echo "The 'SFTP_UID' can't be 0, use a different 'UID'"
    exit 1
  fi
  # Check SFTP_GID
  if [ -z "$SFTP_GID" ]; then
    echo "The 'SFTP_GID' can't be empty, pass a 'GID'."
    exit 1
  fi
  if [ "$SFTP_GID" -eq "0" ]; then
    echo "The 'SFTP_GID' can't be 0, use a different 'GID'"
    exit 1
  fi
}
# Adjust ssh host keys
_setup_host_keys() {
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  reformime <"$HOST_KEYS" || ret="1"
  for kt in $HOST_KEY_TYPES; do
    key="ssh_host_${kt}_key"
    pub="ssh_host_${kt}_key.pub"
    if [ ! -f "$key" ]; then
      echo "Missing '$key' file"
      ret="1"
    fi
    if [ ! -f "$pub" ]; then
      echo "Missing '$pub' file"
      ret="1"
    fi
    if [ "$ret" -ne "0" ]; then
      continue
    fi
    cat "$key" >"/etc/ssh/$key"
    chmod 0600 "/etc/ssh/$key"
    chown root:root "/etc/ssh/$key"
    cat "$pub" >"/etc/ssh/$pub"
    chmod 0600 "/etc/ssh/$pub"
    chown root:root "/etc/ssh/$pub"
  done
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
}
# Create users
_setup_user_pass() {
  opwd="$(pwd)"
  tmpdir="$(mktemp -d)"
  cd "$tmpdir"
  ret="0"
  [ -d "$HOME_DIR" ] || mkdir "$HOME_DIR"
  # Make sure the data dir can be managed by the sftp user
  chown "$SFTP_UID:$SFTP_GID" "$HOME_DIR"
  # Allow the user (and root) to create directories inside the $HOME_DIR, if
  # we don't allow it the directory creation fails on EFS (AWS)
  chmod 0755 "$HOME_DIR"
  # Create users
  echo "sftp:sftp:$SFTP_UID:$SFTP_GID:::/bin/false" >"newusers.txt"
  sed -n "/^[^#]/ { s/:/ /p }" "$USER_PASS" | while read -r _u _p; do
    echo "$_u:$_p:$SFTP_UID:$SFTP_GID::$HOME_DIR/$_u:$USER_SHELL_CMD"
  done >>"newusers.txt"
  newusers --badnames newusers.txt
  # Disable write permission on the directory to forbid remote sftp users to
  # remove their own root dir (they have already done it); we adjust that
  # here to avoid issues with EFS (see before)
  chmod 0555 "$HOME_DIR"
  # Clean up the tmpdir
  cd "$opwd"
  rm -rf "$tmpdir"
  return "$ret"
}
# Adjust user keys
_setup_user_keys() {
  if [ -f "$USER_KEYS" ]; then
    sed -n "/^[^#]/ { s/:/ /p }" "$USER_KEYS" | while read -r _u _k; do
      echo "$_k" >>"$AUTH_KEYS_PATH/$_u"
    done
  fi
}
# Main function
exec_sshd() {
  _check_environment
  _setup_host_keys
  _setup_user_pass
  _setup_user_keys
  echo "Running: /usr/sbin/sshd $SSH_PARAMS"
  # shellcheck disable=SC2086
  exec /usr/sbin/sshd -D $SSH_PARAMS
}
# ----
# MAIN
# ----
case "$1" in
"server") exec_sshd ;;
*) exec "$@" ;;
esac
# vim: ts=2:sw=2:et

The container also includes a couple of auxiliary scripts, the first one can be used to generate the host_keys.txt file as follows:

$ docker run --rm stodh/mysecureshell gen-host-keys > host_keys.txt

Where the script is as simple as:

bin/gen-host-keys
#!/bin/sh
set -e
# Generate new host keys
ssh-keygen -A >/dev/null
# Replace hostname
sed -i -e 's/@.*$/@mysecureshell/' /etc/ssh/ssh_host_*_key.pub
# Print in mime format (stdout)
makemime /etc/ssh/ssh_host_*
# vim: ts=2:sw=2:et

And there is another script to generate a .tar file that contains auth data for the list of usernames passed to it (the file contains a user_pass.txt file with random passwords for the users, public and private ssh keys for them and the user_keys.txt file that matches the generated keys).

To generate a tar file for the user scs we can execute the following:

$ docker run --rm stodh/mysecureshell gen-users-tar scs > /tmp/scs-users.tar

To see the contents and the text inside the user_pass.txt file we can do:

$ tar tvf /tmp/scs-users.tar
-rw-r--r-- root/root        21 2022-09-11 15:55 user_pass.txt
-rw-r--r-- root/root       822 2022-09-11 15:55 user_keys.txt
-rw------- root/root       387 2022-09-11 15:55 id_ed25519-scs
-rw-r--r-- root/root        85 2022-09-11 15:55 id_ed25519-scs.pub
-rw------- root/root      3357 2022-09-11 15:55 id_rsa-scs
-rw------- root/root      3243 2022-09-11 15:55 id_rsa-scs.pem
-rw-r--r-- root/root       729 2022-09-11 15:55 id_rsa-scs.pub
$ tar xfO /tmp/scs-users.tar user_pass.txt
scs:20JertRSX2Eaar4x

The source of the script is:

bin/gen-users-tar
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
USER_KEYS_FILE="user_keys.txt"
USER_PASS_FILE="user_pass.txt"
# ---------
# MAIN CODE
# ---------
# Generate user passwords and keys, return 1 if no username is received
if [ "$#" -eq "0" ]; then
  return 1
fi
opwd="$(pwd)"
tmpdir="$(mktemp -d)"
cd "$tmpdir"
for u in "$@"; do
  ssh-keygen -q -a 100 -t ed25519 -f "id_ed25519-$u" -C "$u" -N ""
  ssh-keygen -q -a 100 -b 4096 -t rsa -f "id_rsa-$u" -C "$u" -N ""
  # Legacy RSA private key format
  cp -a "id_rsa-$u" "id_rsa-$u.pem"
  ssh-keygen -q -p -m pem -f "id_rsa-$u.pem" -N "" -P "" >/dev/null
  chmod 0600 "id_rsa-$u.pem"
  echo "$u:$(pwgen -s 16 1)" >>"$USER_PASS_FILE"
  echo "$u:$(cat "id_ed25519-$u.pub")" >>"$USER_KEYS_FILE"
  echo "$u:$(cat "id_rsa-$u.pub")" >>"$USER_KEYS_FILE"
done
tar cf - "$USER_PASS_FILE" "$USER_KEYS_FILE" id_* 2>/dev/null
cd "$opwd"
rm -rf "$tmpdir"
# vim: ts=2:sw=2:et
nginx-scs

The nginx-scs container is generated using the following Dockerfile:

ARG NGINX_VERSION=1.23.1

FROM nginx:$NGINX_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
RUN rm -f /docker-entrypoint.d/*
COPY docker-entrypoint.d/* /docker-entrypoint.d/

Basically we are removing the existing docker-entrypoint.d scripts from the standard image and adding a new one that configures the web server as we want using a couple of environment variables:

  • AUTH_REQUEST_URI: URL to use for the auth_request, if the variable is not found on the environment auth_request is not used.
  • HTML_ROOT: Base directory of the web server, if not passed the default /usr/share/nginx/html is used.

Note that if we don’t pass the variables everything works as if we were using the original nginx image.

The contents of the configuration script are:

docker-entrypoint.d/10-update-default-conf.sh
#!/bin/sh
# Replace the default.conf nginx file by our own version.
set -e
if [ -z "$HTML_ROOT" ]; then
  HTML_ROOT="/usr/share/nginx/html"
fi
if [ "$AUTH_REQUEST_URI" ]; then
  cat >/etc/nginx/conf.d/default.conf <<EOF
server {
  listen       80;
  server_name  localhost;
  location / {
    auth_request /.auth;
    root  $HTML_ROOT;
    index index.html index.htm;
  }
  location /.auth {
    internal;
    proxy_pass $AUTH_REQUEST_URI;
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-URI \$request_uri;
  }
  error_page   500 502 503 504  /50x.html;
  location = /50x.html {
    root /usr/share/nginx/html;
  }
}
EOF
else
  cat >/etc/nginx/conf.d/default.conf <<EOF
server {
  listen       80;
  server_name  localhost;
  location / {
    root  $HTML_ROOT;
    index index.html index.htm;
  }
  error_page   500 502 503 504  /50x.html;
  location = /50x.html {
    root /usr/share/nginx/html;
  }
}
EOF
fi
# vim: ts=2:sw=2:et

As we will see later the idea is to use the /sftp/data or /sftp/data/scs folder as the root of the web published by this container and create an Ingress object to provide access to it outside of our kubernetes cluster.

webhook-scs

The webhook-scs container is generated using the following Dockerfile:

ARG ALPINE_VERSION=3.16.2
ARG GOLANG_VERSION=alpine3.16

FROM golang:$GOLANG_VERSION AS builder
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
ENV WEBHOOK_VERSION 2.8.0
ENV WEBHOOK_PR 549
ENV S3FS_VERSION v1.91
WORKDIR /go/src/github.com/adnanh/webhook
RUN apk update &&\
 apk add --no-cache -t build-deps curl libc-dev gcc libgcc patch
RUN curl -L --silent -o webhook.tar.gz\
 https://github.com/adnanh/webhook/archive/${WEBHOOK_VERSION}.tar.gz &&\
 tar xzf webhook.tar.gz --strip 1 &&\
 curl -L --silent -o ${WEBHOOK_PR}.patch\
 https://patch-diff.githubusercontent.com/raw/adnanh/webhook/pull/${WEBHOOK_PR}.patch &&\
 patch -p1 < ${WEBHOOK_PR}.patch &&\
 go get -d && \
 go build -o /usr/local/bin/webhook
WORKDIR /src/s3fs-fuse
RUN apk update &&\
 apk add ca-certificates build-base alpine-sdk libcurl automake autoconf\
 libxml2-dev libressl-dev mailcap fuse-dev curl-dev
RUN curl -L --silent -o s3fs.tar.gz\
 https://github.com/s3fs-fuse/s3fs-fuse/archive/refs/tags/$S3FS_VERSION.tar.gz &&\
 tar xzf s3fs.tar.gz --strip 1 &&\
 ./autogen.sh &&\
 ./configure --prefix=/usr/local &&\
 make -j && \
 make install

FROM alpine:$ALPINE_VERSION
LABEL maintainer="Sergio Talens-Oliag <sto@mixinet.net>"
WORKDIR /webhook
RUN apk update &&\
 apk add --no-cache ca-certificates mailcap fuse libxml2 libcurl libgcc\
 libstdc++ rsync util-linux-misc &&\
 rm -rf /var/cache/apk/*
COPY --from=builder /usr/local/bin/webhook /usr/local/bin/webhook
COPY --from=builder /usr/local/bin/s3fs /usr/local/bin/s3fs
COPY entrypoint.sh /
COPY hooks/* ./hooks/
EXPOSE 9000
ENTRYPOINT ["/entrypoint.sh"]
CMD ["server"]

Again, we use a multi-stage build because in production we wanted to support a functionality that is not already on the official versions (streaming the command output as a response instead of waiting until the execution ends); this time we build the image applying the PATCH included on this pull request against a released version of the source instead of creating a fork.

The entrypoint.sh script is used to generate the webhook configuration file for the existing hooks using environment variables (basically the WEBHOOK_WORKDIR and the *_TOKEN variables) and launch the webhook service:

entrypoint.sh
#!/bin/sh
set -e
# ---------
# VARIABLES
# ---------
WEBHOOK_BIN="${WEBHOOK_BIN:-/webhook/hooks}"
WEBHOOK_YML="${WEBHOOK_YML:-/webhook/scs.yml}"
WEBHOOK_OPTS="${WEBHOOK_OPTS:--verbose}"
# ---------
# FUNCTIONS
# ---------
print_du_yml() {
  cat <<EOF
- id: du
  execute-command: '$WEBHOOK_BIN/du.sh'
  command-working-directory: '$WORKDIR'
  response-headers:
  - name: 'Content-Type'
    value: 'application/json'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-arguments-to-command:
  - source: 'url'
    name: 'path'
  pass-environment-to-command:
  - source: 'string'
    envname: 'OUTPUT_FORMAT'
    name: 'json'
EOF
}
print_hardlink_yml() {
  cat <<EOF
- id: hardlink
  execute-command: '$WEBHOOK_BIN/hardlink.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['GET']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
EOF
}
print_s3sync_yml() {
  cat <<EOF
- id: s3sync
  execute-command: '$WEBHOOK_BIN/s3sync.sh'
  command-working-directory: '$WORKDIR'
  http-methods: ['POST']
  include-command-output-in-response: true
  include-command-output-in-response-on-error: true
  pass-environment-to-command:
  - source: 'payload'
    envname: 'AWS_KEY'
    name: 'aws.key'
  - source: 'payload'
    envname: 'AWS_SECRET_KEY'
    name: 'aws.secret_key'
  - source: 'payload'
    envname: 'S3_BUCKET'
    name: 's3.bucket'
  - source: 'payload'
    envname: 'S3_REGION'
    name: 's3.region'
  - source: 'payload'
    envname: 'S3_PATH'
    name: 's3.path'
  - source: 'payload'
    envname: 'SCS_PATH'
    name: 'scs.path'
  stream-command-output: true
EOF
}
print_token_yml() {
  if [ "$1" ]; then
    cat << EOF
  trigger-rule:
    match:
      type: 'value'
      value: '$1'
      parameter:
        source: 'header'
        name: 'X-Webhook-Token'
EOF
  fi
}
exec_webhook() {
  # Validate WORKDIR
  if [ -z "$WEBHOOK_WORKDIR" ]; then
    echo "Must define the WEBHOOK_WORKDIR variable!" >&2
    exit 1
  fi
  WORKDIR="$(realpath "$WEBHOOK_WORKDIR" 2>/dev/null)" || true
  if [ ! -d "$WORKDIR" ]; then
    echo "The WEBHOOK_WORKDIR '$WEBHOOK_WORKDIR' is not a directory!" >&2
    exit 1
  fi
  # Get TOKENS, if the DU_TOKEN or HARDLINK_TOKEN is defined that is used, if
  # not if the COMMON_TOKEN that is used and in other case no token is checked
  # (that is the default)
  DU_TOKEN="${DU_TOKEN:-$COMMON_TOKEN}"
  HARDLINK_TOKEN="${HARDLINK_TOKEN:-$COMMON_TOKEN}"
  S3_TOKEN="${S3_TOKEN:-$COMMON_TOKEN}"
  # Create webhook configuration
  { 
    print_du_yml
    print_token_yml "$DU_TOKEN"
    echo ""
    print_hardlink_yml
    print_token_yml "$HARDLINK_TOKEN"
    echo ""
    print_s3sync_yml
    print_token_yml "$S3_TOKEN"
  }>"$WEBHOOK_YML"
  # Run the webhook command
  # shellcheck disable=SC2086
  exec webhook -hooks "$WEBHOOK_YML" $WEBHOOK_OPTS
}
# ----
# MAIN
# ----
case "$1" in
"server") exec_webhook ;;
*) exec "$@" ;;
esac

The entrypoint.sh script generates the configuration file for the webhook server calling functions that print a yaml section for each hook and optionally adds rules to validate access to them comparing the value of a X-Webhook-Token header against predefined values.

The expected token values are taken from environment variables, we can define a token variable for each hook (DU_TOKEN, HARDLINK_TOKEN or S3_TOKEN) and a fallback value (COMMON_TOKEN); if no token variable is defined for a hook no check is done and everybody can call it.

The Hook Definition documentation explains the options you can use for each hook, the ones we have right now do the following:

  • du: runs on the $WORKDIR directory, passes as first argument to the script the value of the path query parameter and sets the variable OUTPUT_FORMAT to the fixed value json (we use that to print the output of the script in JSON format instead of text).
  • hardlink: runs on the $WORKDIR directory and takes no parameters.
  • s3sync: runs on the $WORKDIR directory and sets a lot of environment variables from values read from the JSON encoded payload sent by the caller (all the values must be sent by the caller even if they are assigned an empty value, if they are missing the hook fails without calling the script); we also set the stream-command-output value to true to make the script show its output as it is working (we patched the webhook source to be able to use this option).
The du hook script

The du hook script code checks if the argument passed is a directory, computes its size using the du command and prints the results in text format or as a JSON dictionary:

hooks/du.sh
#!/bin/sh
set -e
# Script to print disk usage for a PATH inside the scs folder
# ---------
# FUNCTIONS
# ---------
print_error() {
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo "{\"error\":\"$*\"}"
  else
    echo "$*" >&2
  fi
  exit 1
}
usage() {
  if [ "$OUTPUT_FORMAT" = "json" ]; then
    echo "{\"error\":\"Pass arguments as '?path=XXX\"}"
  else
    echo "Usage: $(basename "$0") PATH" >&2
  fi
  exit 1
}
# ----
# MAIN
# ----
if [ "$#" -eq "0" ] || [ -z "$1" ]; then
  usage
fi
if [ "$1" = "." ]; then
  DU_PATH="./"
else
  DU_PATH="$(find . -name "$1" -mindepth 1 -maxdepth 1)" || true
fi
if [ -z "$DU_PATH" ] || [ ! -d "$DU_PATH/." ]; then
  print_error "The provided PATH ('$1') is not a directory"
fi
# Print disk usage in bytes for the given PATH
OUTPUT="$(du -b -s "$DU_PATH")"
if [ "$OUTPUT_FORMAT" = "json" ]; then
  # Format output as {"path":"PATH","bytes":"BYTES"}
  echo "$OUTPUT" |
    sed -e "s%^\(.*\)\t.*/\(.*\)$%{\"path\":\"\2\",\"bytes\":\"\1\"}%" |
    tr -d '\n'
else
  # Print du output as is
  echo "$OUTPUT"
fi
# vim: ts=2:sw=2:et:ai:sts=2
The hardlink hook script

The hardlink hook script is really simple, it just runs the util-linux version of the hardlink command on its working directory:

hooks/hardlink.sh
#!/bin/sh
hardlink --ignore-time --maximize .

We use that to reduce the size of the stored content; to manage versions of files and folders we keep each version on a separate directory and when one or more files are not changed this script makes them hardlinks to the same file on disc, reducing the space used on disk.

The s3sync hook script

The s3sync hook script uses the s3fs tool to mount a bucket and synchronise data between a folder inside the bucket and a directory on the filesystem using rsync; all values needed to execute the task are taken from environment variables:

hooks/s3sync.sh
#!/bin/ash
set -euo pipefail
set -o errexit
set -o errtrace
# Functions
finish() {
  ret="$1"
  echo ""
  echo "Script exit code: $ret"
  exit "$ret"
}
# Check variables
if [ -z "$AWS_KEY" ] || [ -z "$AWS_SECRET_KEY" ] || [ -z "$S3_BUCKET" ] ||
  [ -z "$S3_PATH" ] || [ -z "$SCS_PATH" ]; then
  [ "$AWS_KEY" ] || echo "Set the AWS_KEY environment variable"
  [ "$AWS_SECRET_KEY" ] || echo "Set the AWS_SECRET_KEY environment variable"
  [ "$S3_BUCKET" ] || echo "Set the S3_BUCKET environment variable"
  [ "$S3_PATH" ] || echo "Set the S3_PATH environment variable"
  [ "$SCS_PATH" ] || echo "Set the SCS_PATH environment variable"
  finish 1
fi
if [ "$S3_REGION" ] && [ "$S3_REGION" != "us-east-1" ]; then
  EP_URL="endpoint=$S3_REGION,url=https://s3.$S3_REGION.amazonaws.com"
else
  EP_URL="endpoint=us-east-1"
fi
# Prepare working directory
WORK_DIR="$(mktemp -p "$HOME" -d)"
MNT_POINT="$WORK_DIR/s3data"
PASSWD_S3FS="$WORK_DIR/.passwd-s3fs"
# Check the moutpoint
if [ ! -d "$MNT_POINT" ]; then
  mkdir -p "$MNT_POINT"
elif mountpoint "$MNT_POINT"; then
  echo "There is already something mounted on '$MNT_POINT', aborting!"
  finish 1
fi
# Create password file
touch "$PASSWD_S3FS"
chmod 0400 "$PASSWD_S3FS"
echo "$AWS_KEY:$AWS_SECRET_KEY" >"$PASSWD_S3FS"
# Mount s3 bucket as a filesystem
s3fs -o dbglevel=info,retries=5 -o "$EP_URL" -o "passwd_file=$PASSWD_S3FS" \
  "$S3_BUCKET" "$MNT_POINT"
echo "Mounted bucket '$S3_BUCKET' on '$MNT_POINT'"
# Remove the password file, just in case
rm -f "$PASSWD_S3FS"
# Check source PATH
ret="0"
SRC_PATH="$MNT_POINT/$S3_PATH"
if [ ! -d "$SRC_PATH" ]; then
  echo "The S3_PATH '$S3_PATH' can't be found!"
  ret=1
fi
# Compute SCS_UID & SCS_GID (by default based on the working directory owner)
SCS_UID="${SCS_UID:=$(stat -c "%u" "." 2>/dev/null)}" || true
SCS_GID="${SCS_GID:=$(stat -c "%g" "." 2>/dev/null)}" || true
# Check destination PATH
DST_PATH="./$SCS_PATH"
if [ "$ret" -eq "0" ] && [ -d "$DST_PATH" ]; then
  mkdir -p "$DST_PATH" || ret="$?"
fi
# Copy using rsync
if [ "$ret" -eq "0" ]; then
  rsync -rlptv --chown="$SCS_UID:$SCS_GID" --delete --stats \
    "$SRC_PATH/" "$DST_PATH/" || ret="$?"
fi
# Unmount the S3 bucket
umount -f "$MNT_POINT"
echo "Called umount for '$MNT_POINT'"
# Remove mount point dir
rmdir "$MNT_POINT"
# Remove WORK_DIR
rmdir "$WORK_DIR"
# We are done
finish "$ret"
# vim: ts=2:sw=2:et:ai:sts=2
Deployment objects

The system is deployed as a StatefulSet with one replica.

Our production deployment is done on AWS and to be able to scale we use EFS for our PersistenVolume; the idea is that the volume has no size limit, its AccessMode can be set to ReadWriteMany and we can mount it from multiple instances of the Pod without issues, even if they are in different availability zones.

For development we use k3d and we are also able to scale the StatefulSet for testing because we use a ReadWriteOnce PVC, but it points to a hostPath that is backed up by a folder that is mounted on all the compute nodes, so in reality Pods in different k3d nodes use the same folder on the host.

secrets.yaml

The secrets file contains the files used by the mysecureshell container that can be generated using kubernetes pods as follows (we are only creating the scs user):

$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-host-keys >"./host_keys.txt"
$ kubectl run "mysecureshell" --restart='Never' --quiet --rm --stdin \
  --image "stodh/mysecureshell:latest" -- gen-users-tar scs >"./users.tar"

Once we have the files we can generate the secrets.yaml file as follows:

$ tar xf ./users.tar user_keys.txt user_pass.txt
$ kubectl --dry-run=client -o yaml create secret generic "scs-secret" \
  --from-file="host_keys.txt=host_keys.txt" \
  --from-file="user_keys.txt=user_keys.txt" \
  --from-file="user_pass.txt=user_pass.txt" > ./secrets.yaml

The resulting secrets.yaml will look like the following file (the base64 would match the content of the files, of course):

secrets.yaml
apiVersion: v1
data:
  host_keys.txt: TWlt...
  user_keys.txt: c2Nz...
  user_pass.txt: c2Nz...
kind: Secret
metadata:
  creationTimestamp: null
  name: scs-secret
pvc.yaml

The persistent volume claim for a simple deployment (one with only one instance of the statefulSet) can be as simple as this:

pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi

On this definition we don’t set the storageClassName to use the default one.

Volumes in our development environment (k3d)

In our development deployment we create the following PersistentVolume as required by the Local Persistence Volume Static Provisioner (note that the /volumes/scs-pv has to be created by hand, in our k3d system we mount the same host directory on the /volumes path of all the nodes and create the scs-pv directory by hand before deploying the persistent volume):

k3d-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: scs-pv
  labels:
    app.kubernetes.io/name: scs
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  claimRef:
    name: scs-pvc
  storageClassName: local-storage
  local:
    path: /volumes/scs-pv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
          - k3s
Note:

The nodeAffinity section is required but in practice the current definition selects all k3d nodes.

And to make sure that everything works as expected we update the PVC definition to add the right storageClassName:

k3d-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: local-storage
Volumes in our production environment (aws)

In the production deployment we don’t create the PersistentVolume (we are using the aws-efs-csi-driver which supports Dynamic Provisioning) but we add the storageClassName (we set it to the one mapped to the EFS driver, i.e. efs-sc) and set ReadWriteMany as the accessMode:

efs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: scs-pvc
  labels:
    app.kubernetes.io/name: scs
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: efs-sc
statefulset.yaml

The definition of the statefulSet is as follows:

statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scs
  labels:
    app.kubernetes.io/name: scs
spec:
  serviceName: scs
  replicas: 1
  selector:
    matchLabels:
      app: scs
  template:
    metadata:
      labels:
        app: scs
    spec:
      containers:
      - name: nginx
        image: stodh/nginx-scs:latest
        ports:
        - containerPort: 80
          name: http
        env:
        - name: AUTH_REQUEST_URI
          value: ""
        - name: HTML_ROOT
          value: /sftp/data
        volumeMounts:
        - mountPath: /sftp
          name: scs-datadir
      - name: mysecureshell
        image: stodh/mysecureshell:latest
        ports:
        - containerPort: 22
          name: ssh
        securityContext:
          capabilities:
            add:
            - IPC_OWNER
        env:
        - name: SFTP_UID
          value: '2020'
        - name: SFTP_GID
          value: '2020'
        volumeMounts:
        - mountPath: /secrets
          name: scs-file-secrets
          readOnly: true
        - mountPath: /sftp
          name: scs-datadir
      - name: webhook
        image: stodh/webhook-scs:latest
        securityContext:
          privileged: true
        ports:
        - containerPort: 9000
          name: webhook-http
        env:
        - name: WEBHOOK_WORKDIR
          value: /sftp/data/scs
        volumeMounts:
        - name: devfuse
          mountPath: /dev/fuse
        - mountPath: /sftp
          name: scs-datadir
      volumes:
      - name: devfuse
        hostPath:
          path: /dev/fuse
      - name: scs-file-secrets
        secret:
          secretName: scs-secrets
      - name: scs-datadir
        persistentVolumeClaim:
          claimName: scs-pvc

Notes about the containers:

  • nginx: As this is an example the web server is not using an AUTH_REQUEST_URI and uses the /sftp/data directory as the root of the web (to get to the files uploaded for the scs user we will need to use /scs/ as a prefix on the URLs).
  • mysecureshell: We are adding the IPC_OWNER capability to the container to be able to use some of the sftp-* commands inside it, but they are not really needed, so adding the capability is optional.
  • webhook: We are launching this container in privileged mode to be able to use the s3fs-fuse, as it will not work otherwise for now (see this kubernetes issue); if the functionality is not needed the container can be executed with regular privileges; besides, as we are not enabling public access to this service we don’t define *_TOKEN variables (if required the values should be read from a Secret object).

Notes about the volumes:

  • the devfuse volume is only needed if we plan to use the s3fs command on the webhook container, if not we can remove the volume definition and its mounts.
service.yaml

To be able to access the different services on the statefulset we publish the relevant ports using the following Service object:

service.yaml
apiVersion: v1
kind: Service
metadata:
  name: scs-svc
  labels:
    app.kubernetes.io/name: scs
spec:
  ports:
  - name: ssh
    port: 22
    protocol: TCP
    targetPort: 22
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: webhook-http
    port: 9000
    protocol: TCP
    targetPort: 9000
  selector:
    app: scs
ingress.yaml

To download the scs files from the outside we can add an ingress object like the following (the definition is for testing using the localhost name):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: scs-ingress
  labels:
    app.kubernetes.io/name: scs
spec:
  ingressClassName: nginx
  rules:
  - host: 'localhost'
    http:
      paths:
      - path: /scs
        pathType: Prefix
        backend:
          service:
            name: scs-svc
            port:
              number: 80
Deployment

To deploy the statefulSet we create a namespace and apply the object definitions shown before:

$ kubectl create namespace scs-demo
namespace/scs-demo created
$ kubectl -n scs-demo apply -f secrets.yaml
secret/scs-secrets created
$ kubectl -n scs-demo apply -f pvc.yaml
persistentvolumeclaim/scs-pvc created
$ kubectl -n scs-demo apply -f statefulset.yaml
statefulset.apps/scs created
$ kubectl -n scs-demo apply -f service.yaml
service/scs-svc created
$ kubectl -n scs-demo apply -f ingress.yaml
ingress.networking.k8s.io/scs-ingress created

Once the objects are deployed we can check that all is working using kubectl:

$ kubectl  -n scs-demo get all,secrets,ingress
NAME        READY   STATUS    RESTARTS   AGE
pod/scs-0   3/3     Running   0          24s

NAME            TYPE       CLUSTER-IP  EXTERNAL-IP  PORT(S)                  AGE
service/scs-svc ClusterIP  10.43.0.47  <none>       22/TCP,80/TCP,9000/TCP   21s

NAME                   READY   AGE
statefulset.apps/scs   1/1     24s

NAME                         TYPE                                  DATA   AGE
secret/default-token-mwcd7   kubernetes.io/service-account-token   3      53s
secret/scs-secrets           Opaque                                3      39s

NAME                                   CLASS  HOSTS      ADDRESS     PORTS   AGE
ingress.networking.k8s.io/scs-ingress  nginx  localhost  172.21.0.5  80      17s

At this point we are ready to use the system.

Usage examplesFile uploads

As previously mentioned in our system the idea is to use the sftp server from other Pods, but to test the system we are going to do a kubectl port-forward and connect to the server using our host client and the password we have generated (it is on the user_pass.txt file, inside the users.tar archive):

$ kubectl -n scs-demo port-forward service/scs-svc 2020:22 &
Forwarding from 127.0.0.1:2020 -> 22
Forwarding from [::1]:2020 -> 22
$ PF_PID=$!
$ sftp -P 2020 scs@127.0.0.1                                                 1
Handling connection for 2020
The authenticity of host '[127.0.0.1]:2020 ([127.0.0.1]:2020)' can't be \
  established.
ED25519 key fingerprint is SHA256:eHNwCnyLcSSuVXXiLKeGraw0FT/4Bb/yjfqTstt+088.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[127.0.0.1]:2020' (ED25519) to the list of known \
  hosts.
scs@127.0.0.1's password: **********
Connected to 127.0.0.1.
sftp> ls -la
drwxr-xr-x    2 sftp     sftp         4096 Sep 25 14:47 .
dr-xr-xr-x    3 sftp     sftp         4096 Sep 25 14:36 ..
sftp> !date -R > /tmp/date.txt                                               2
sftp> put /tmp/date.txt .
Uploading /tmp/date.txt to /date.txt
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt
sftp> ln date.txt date.txt.1                                                 3
sftp> ls -l
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
sftp> put /tmp/date.txt date.txt.2                                           4
Uploading /tmp/date.txt to /date.txt.2
date.txt                                      100%   32    27.8KB/s   00:00
sftp> ls -l                                                                  5
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt
-rw-r--r--    2 sftp     sftp           32 Sep 25 15:21 date.txt.1
-rw-r--r--    1 sftp     sftp           32 Sep 25 15:21 date.txt.2
sftp> exit
$ kill "$PF_PID"
[1]  + terminated  kubectl -n scs-demo port-forward service/scs-svc 2020:22
  1. We connect to the sftp service on the forwarded port with the scs user.
  2. We put a file we have created on the host on the directory.
  3. We do a hard link of the uploaded file.
  4. We put a second copy of the file we created locally.
  5. On the file list we can see that the two first files have two hardlinks
File retrievals

If our ingress is configured right we can download the date.txt file from the URL http://localhost/scs/date.txt:

$ curl -s http://localhost/scs/date.txt
Sun, 25 Sep 2022 17:21:51 +0200
Use of the webhook container

To finish this post we are going to show how we can call the hooks directly, from a CronJob and from a Job.

Direct script call (du)

In our deployment the direct calls are done from other Pods, to simulate it we are going to do a port-forward and call the script with an existing PATH (the root directory) and a bad one:

$ kubectl -n scs-demo port-forward service/scs-svc 9000:9000 >/dev/null &
$ PF_PID=$!
$ JSON="$(curl -s "http://localhost:9000/hooks/du?path=.")"
$ echo $JSON
{"path":"","bytes":"4160"}
$ JSON="$(curl -s "http://localhost:9000/hooks/du?path=foo")"
$ echo $JSON
{"error":"The provided PATH ('foo') is not a directory"}
$ kill $PF_PID

As we only have files on the base directory we print the disk usage of the . PATH and the output is in json format because we export OUTPUT_FORMAT with the value json on the webhook configuration.

Cronjobs (hardlink)

As explained before, the webhook container can be used to run cronjobs; the following one uses an alpine container to call the hardlink script each minute (that setup is for testing, obviously):

webhook-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hardlink
  labels:
    cronjob: 'hardlink'
spec:
  schedule: "* */1 * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            cronjob: 'hardlink'
        spec:
          containers:
          - name: hardlink-cronjob
            image: alpine:latest
            command: ["wget", "-q", "-O-", "http://scs-svc:9000/hooks/hardlink"]
          restartPolicy: Never

The following console session shows how we create the object, allow a couple of executions and remove it (in production we keep it running but once a day, not each minute):

$ kubectl -n scs-demo apply -f webhook-cronjob.yaml                          1
cronjob.batch/hardlink created
$ kubectl -n scs-demo get pods -l "cronjob=hardlink" -w                      2
NAME                      READY   STATUS    RESTARTS   AGE
hardlink-27735351-zvpnb   0/1     Pending   0          0s
hardlink-27735351-zvpnb   0/1     ContainerCreating   0          0s
hardlink-27735351-zvpnb   0/1     Completed           0          2s
^C
$ kubectl -n scs-demo logs pod/hardlink-27735351-zvpnb                       3
Mode:                     real
Method:                   sha256
Files:                    3
Linked:                   1 files
Compared:                 0 xattrs
Compared:                 1 files
Saved:                    32 B
Duration:                 0.000220 seconds
$ sleep 60
$ kubectl -n scs-demo get pods -l "cronjob=hardlink"                         4
NAME                      READY   STATUS      RESTARTS   AGE
hardlink-27735351-zvpnb   0/1     Completed   0          83s
hardlink-27735352-br5rn   0/1     Completed   0          23s
$ kubectl -n scs-demo logs pod/hardlink-27735352-br5rn                       5
Mode:                     real
Method:                   sha256
Files:                    3
Linked:                   0 files
Compared:                 0 xattrs
Compared:                 0 files
Saved:                    0 B
Duration:                 0.000070 seconds
$ kubectl -n scs-demo delete -f webhook-cronjob.yaml                         6
cronjob.batch "hardlink" deleted
  1. This command creates the cronjob object.
  2. This checks the pods with our cronjob label, we interrupt it once we see that the first run has been completed.
  3. With this command we see the output of the execution, as this is the fist execution we see that date.txt.2 has been replaced by a hardlink (the summary does not name the file, but it is the only option knowing the contents from the original upload).
  4. After waiting a little bit we check the pods executed again to get the name of the latest one.
  5. The log now shows that nothing was done.
  6. As this is a demo, we delete the cronjob.
Jobs (s3sync)

The following job can be used to synchronise the contents of a directory in a S3 bucket with the SCS Filesystem:

job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: s3sync
  labels:
    cronjob: 's3sync'
spec:
  template:
    metadata:
      labels:
        cronjob: 's3sync'
    spec:
      containers:
      - name: s3sync-job
        image: alpine:latest
        command: 
        - "wget"
        - "-q"
        - "--header"
        - "Content-Type: application/json"
        - "--post-file"
        - "/secrets/s3sync.json"
        - "-O-"
        - "http://scs-svc:9000/hooks/s3sync"
        volumeMounts:
        - mountPath: /secrets
          name: job-secrets
          readOnly: true
      restartPolicy: Never
      volumes:
      - name: job-secrets
        secret:
          secretName: webhook-job-secrets

The file with parameters for the script must be something like this:

s3sync.json
{
  "aws": {
    "key": "********************",
    "secret_key": "****************************************"
  },
  "s3": {
    "region": "eu-north-1",
    "bucket": "blogops-test",
    "path": "test"
  },
  "scs": {
    "path": "test"
  }
}

Once we have both files we can run the Job as follows:

$ kubectl -n scs-demo create secret generic webhook-job-secrets \            1
  --from-file="s3sync.json=s3sync.json"
secret/webhook-job-secrets created
$ kubectl -n scs-demo apply -f webhook-job.yaml                              2
job.batch/s3sync created
$ kubectl -n scs-demo get pods -l "cronjob=s3sync"                           3
NAME           READY   STATUS      RESTARTS   AGE
s3sync-zx2cj   0/1     Completed   0          12s
$ kubectl -n scs-demo logs s3sync-zx2cj                                      4
Mounted bucket 's3fs-test' on '/root/tmp.jiOjaF/s3data'
sending incremental file list
created directory ./test
./
kyso.png

Number of files: 2 (reg: 1, dir: 1)
Number of created files: 2 (reg: 1, dir: 1)
Number of deleted files: 0
Number of regular files transferred: 1
Total file size: 15,075 bytes
Total transferred file size: 15,075 bytes
Literal data: 15,075 bytes
Matched data: 0 bytes
File list size: 0
File list generation time: 0.147 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 15,183
Total bytes received: 74

sent 15,183 bytes  received 74 bytes  30,514.00 bytes/sec
total size is 15,075  speedup is 0.99
Called umount for '/root/tmp.jiOjaF/s3data'

Script exit code: 0
$ kubectl -n scs-demo delete -f webhook-job.yaml                             5
job.batch "s3sync" deleted
$ kubectl -n scs-demo delete secrets webhook-job-secrets                     6
secret "webhook-job-secrets" deleted
  1. Here we create the webhook-job-secrets secret that contains the s3sync.json file.
  2. This command runs the job.
  3. Checking the label cronjob=s3sync we get the Pods executed by the job.
  4. Here we print the logs of the completed job.
  5. Once we are finished we remove the Job.
  6. And also the secret.
Final remarks

This post has been longer than I expected, but I believe it can be useful for someone; in any case, next time I’ll try to explain something shorter or will split it into multiple entries.

Shirish Agarwal: Rama II, Arthur C. Clarke, Aliens

25 September, 2022 - 16:07
Rama II

This would be more of a short post about the current book I am reading. Now people who have seen Arrival would probably be more at home. People who have also seen Avatar would also be familiar to the theme or concept I am sharing about. Now before I go into detail, it seems that Arthur C. Clarke wanted to use a powerful god or mythological character for the name and that is somehow the RAMA series started.

Now the first book in the series explores an extraterrestrial spaceship that earth people see/connect with. The spaceship is going somewhere and is doing an Earth flyby so humans don’t have much time to explore the spaceship and it is difficult to figure out how the spaceship worked. The spaceship is around 40 km. long. They don’t meet any living Ramans but mostly automated systems and something called biots.

As I’m still reading it, I can’t really say what happens next. Although in Rama or Rama I, the powers that be want to destroy it while in the end last they don’t. Whether they could have destroyed it or not would be whole another argument. What people need to realize is that the book is a giant ‘What IF’ scenario.

Aliens

If there were any intelligent life in the Universe, I don’t think they will take the pain of visiting Earth. And the reasons are far more mundane than anything else. Look at how we treat each other. One of the largest democracies on Earth, The U.S. has been so divided. While the progressives have made some good policies, the Republicans are into political stunts, consider the political stunt of sending Refugees to Martha’s Vineyard. The ex-president also made a statement that he can declassify anything just by thinking about it. Now understand this, a refugee is a legal migrant whose papers would be looked into by the American Govt. and till the time he/she/their application is approved or declined they can work, have a house, or do whatever to support themselves. There is a huge difference between having refugee status and being an undocumented migrant. And it isn’t as if the Republicans don’t know this, they did it because they thought they will be able to get away with it.

Both the above episodes don’t throw us in a good light. If we treat others like the above, how can we expect to be treated? And refugees always have a hard time, not just in the U.S, , the UK you name it. The UK just some months ago announced a controversial deal where they will send Refugees to Rwanda while their refugee application is accepted or denied, most of them would be denied.

The Indian Government is more of the same. A friend, a casual acquaintance Nishant Shah shared the same issues as I had shared a few weeks back even though he’s an NRI. So, it seems we are incapable of helping ourselves as well as helping others. On top of it, we have the temerity of using the word ‘alien’ for them.

Now, just for a moment, imagine you are an intelligent life form. An intelligent life-form that could coax energy from the stars, why would you come to Earth, where the people at large have already destroyed more than half of the atmosphere and still arguing about it with the other half. On top of it, we see a list of authoritarian figures like Putin, Xi Jinping whose whole idea is to hold on to power for as long as they can, damn the consequences. Mr. Modi is no different, he is the dumbest of the lot and that’s saying something. Most of the projects made by him are in disarray, Pune Metro, my city giving an example. And this is when Pune was the first applicant to apply for a Metro. Just like the UK, India too has tanked the economy under his guidance. Every time they come closer to target dates, the targets are put far into the future, for e.g. now they have said 2040 for a good economy. And just like in other countries, he has some following even though he has a record of failure in every sector of the economy, education, and defense, the list is endless. There isn’t a single accomplishment by him other than screwing with other religions. Most of my countrymen also don’t really care or have a bother to see how the economy grows and how exports play a crucial part otherwise they would be more alert. Also, just like the UK, India too gave tax cuts to the wealthy, most people don’t understand how economies function and the PM doesn’t care. The media too is subservient and because nobody asks the questions, nobody seems to be accountable :(.

Religion

There is another aspect that also has been to the fore, just like in medieval times, I see a great fervor for religion happening here, especially since the pandemic and people are much more insecure than ever before. Before, I used to think that insecurity and religious appeal only happen in the uneducated, and I was wrong. I have friends who are highly educated and yet still are blinded by religion. In many such cases or situations, I find their faith to be a sham. If you have faith, then there shouldn’t be any room for doubt or insecurity. And if you are not in doubt or insecure, you won’t need to talk about your religion. The difference between the two is that a person is satiated himself/herself/themselves with thirst and hunger. That person would be in a relaxed mode while the other person would continue to create drama as there is no peace in their heart.

Another fact is none of the major religions, whether it is Christianity, Islam, Buddhism or even Hinduism has allowed for the existence of extraterrestrials. We have already labeled them as ‘aliens’ even before meeting them & just our imagination. And more often than not, we end up killing them. There are and have been scores of movies that have explored the idea. Independence day, Aliens, Arrival, the list goes on and on. And because our religions have never thought about the idea of ET’s and how they will affect us, if ET’s do come, all the religions and religious practices would panic and die. That is the possibility why even the 1947 Roswell Incident has been covered up .

If the above was not enough, the bombing of Hiroshima and Nagasaki by the Americans would always be a black mark against humanity. From the alien perspective, if you look at the technology that they have vis-a-vis what we have, they will probably think of us as spoilt babies and they wouldn’t be wrong. Spoilt babies with nuclear weapons are not exactly a healthy mix

Earth

To add to our fragile ego, we didn’t even leave earth even though we have made sure we exploit it as much as we can. We even made the anthropocentric or homocentric view that makes man the apex animal and to top it we have this weird idea that extraterrestrials come here or will invade for water. A species that knows how to get energy out of stars but cannot make a little of H2O. The idea belies logic and again has been done to death. Why we as humans are so insecure even though we have been given so much I fail to understand. I have shared on numerous times the Kardeshev Scale on this blog itself.

The above are some of the reasons why Arthur C. Clarke’s works are so controversial and this is when I haven’t even read the whole book. It forces us to ask questions that we normally would never think about. And I have to repeat that when these books were published for the first time, they were new ideas. All the movies, from Stanley Kubrick’s 2001: Space Odyssey, Aliens, Arrival, and Avatar, somewhere or the other reference some aspect of this work. It is highly possible that I may read and re-read the book couple of times before beginning the next one. There is also quite a bit of human drama, but then that is to be expected. I have to admit I did have some nice dreams after reading just the first few pages, imagining being given the opportunity to experience an Extraterrestrial spaceship that is beyond our wildest dreams. While the Governments may try to cover up or something, the ones who get to experience that spacecraft would be unimaginable. And if they were able to share the pictures or a Livestream, it would be nothing short of amazing.

For those who want to, there is a lot going on with the New James Webb Telescope. I am sure it would give rise to more questions than answers.

Ian Jackson: Please vote in favour of the Debian Social Contract change

25 September, 2022 - 02:08

tl;dr: Please vote in favour of the Debian Social Contract change, by ranking all of its options above None of the Above. Rank the SC change options above corresponding options that do not change the Social Contract.

Vote to change the SC even if you think the change is not necessary for Debian to prominently/officially provide an installer with-nonfree-firmware.

Why vote for SC change even if I think it’s not needed?

I’m addressing myself primarily to the reader who agrees with me that Debian ought to be officially providing with-firmware images. I think it is very likely that the winning option will be one of the ones which asks for an official and prominent with-firmware installer.

However, many who oppose this change believe that it would be a breach of Debian’s Social Contract. This is a very reasonable and arguable point of view. Indeed, I’m inclined to share it.

If the winning option is to provide a with-firmware installer (perhaps, only a with-firmware installer) those people will feel aggrieved. They will, quite reasonably, claim that the result of the vote is illegitimate - being contrary to Debian’s principles as set out in the Social Contract, which require a 3:1 majority to change.

There is even the possibility that the Secretary may declare the GR result void, as contrary to the Constitution! (Sadly, I am not making this up.) This would cast Debian into (yet another) acrimonious constitutional and governance crisis.

The simplest answer is to amend the Social Contract to explicitly permit what is being proposed. Holger’s option F and Russ’s option E do precisely that.

Amending the SC is not an admission that it was legally necessary to do so. It is practical politics: it ensures that we have clear authority and legitimacy.

Aren’t we softening Debian’s principles?

I think prominently distributing an installer that can work out of the box on the vast majority of modern computers would help Debian advance our users’ freedom.

I see user freedom as a matter of practical capability, not theoretical purity. Anyone living in the modern world must make compromises. It is Debian’s job to help our users (and downstreams) minimise those compromises and retain as much control as possible over the computers in their life. Insisting that a user buys different hardware, or forcing them to a different distro, does not serve that goal.

I don’t really expect to convince anyone with such a short argument, but I do want to make the point that providing an installer that users can use to obtain a lot of practical freedom is also, for many of us, a matter of principle.



comments

Gunnar Wolf: 6237415

23 September, 2022 - 23:03

Years ago, it was customary that some of us stated publicly the way we think in time of Debian General Resolutions (GRs). And even if we didn’t, vote lists were open (except when voting for people, i.e. when electing a DPL), so if interested we could understand what our different peers thought.

This is the first vote, though, where a Debian vote is protected under voting secrecy. I think it is sad we chose that path, as I liken a GR vote more with a voting process within a general assembly of a cooperative than with a countrywide voting one; I feel that understanding who is behind each posture helps us better understand the project as a whole.

But anyway, I’m digressing… Even though I remained quiet during much of the discussion period (I was preparing and attending a conference), I am very much interested in this vote — I am the maintainer for the Raspberry Pi firmware, and am a seconder for two of them. Many people know me for being quite inflexible in my interpretation of what should be considered Free Software, and I’m proud of it. But still, I believer it to be fundamental for Debian to be able to run on the hardware most users have.

So… My vote was as follows:

[6] Choice 1: Only one installer, including non-free firmware
[2] Choice 2: Recommend installer containing non-free firmware
[3] Choice 3: Allow presenting non-free installers alongside the free one
[7] Choice 4: Installer with non-free software is not part of Debian
[4] Choice 5: Change SC for non-free firmware in installer, one installer
[1] Choice 6: Change SC for non-free firmware in installer, keep both installers
[5] Choice 7: None Of The Above

For people reading this not into Debian’s voting processes: Debian uses the cloneproof Schwatz sequential dropping Condorcet method, which means we don’t only choose our favorite option (which could lead to suboptimal strategic voting outcomes), but we rank all the options according to our preferences.

To read this vote, we should first locate position of “None of the above”, which for my ballot is #5. Let me reorder the ballot according to my preferences:

[1] Choice 6: Change SC for non-free firmware in installer, keep both installers
[2] Choice 2: Recommend installer containing non-free firmware
[3] Choice 3: Allow presenting non-free installers alongside the free one
[4] Choice 5: Change SC for non-free firmware in installer, one installer
[5] Choice 7: None Of The Above
[6] Choice 1: Only one installer, including non-free firmware
[7] Choice 4: Installer with non-free software is not part of Debian

This is, I don’t agree either with Steve McIntyre’s original proposal, Choice 1 (even though I seconded it, this means, I think it’s very important to have this vote, and as a first proposal, it’s better than the status quo — maybe it’s contradictory that I prefer it to the status quo, but ranked it below NotA. Well, more on that when I present Choice 5).

My least favorite option is Choice 4, presented by Simon Josefsson, which represents the status quo: I don’t want Debian not to have at all an installer that cannot be run on most modern hardware with reasonably good user experience (i.e. network support — or the ability to boot at all!)

Slightly above my acceptability threshold, I ranked Choice 5, presented by Russ Allbery. Debian’s voting and its constitution rub each other in interesting ways, so the Project Secretary has to run the votes as they are presented… but he has interpreted Choice 1 to be incompatible with the Social Contract (as there would no longer be a DFSG-free installer available), and if it wins, it could lead him to having to declare the vote invalid. I don’t want that to happen, and that’s why I ranked Choice 1 below None of the above.

Other than that, Choice 6 (proposed by Holger Levsen), Choice 2 (proposed by me) and Choice 3 (proposed by Bart Martens) are very much similar; the main difference is that Choice 6 includes a modification to the Social Contract expressing that:

The Debian official media may include firmware that is otherwise not
part of the Debian system to enable use of Debian with hardware that
requires such firmware.

I believe choices 2 and 3 to be mostly the same, being Choice 2 more verbose in explaining the reasoning than Choice 3.

Oh! And there are always some more bits to the discussion… For example, given they hold modifications to the Social Contract, both Choice 5 and Choice 6 need a 3:1 supermajority to be valid.

So, lets wait until the beginning of October to get the results, and to implement the changes they will (or not?) allow. If you are a Debian Project Member, please vote!

Steve Kemp: Lisp macros are magical

23 September, 2022 - 21:30

In my previous post I introduced yet another Lisp interpreter. When it was posted there was no support for macros.

Since I've recently returned from a visit to the UK, and caught COVID-19 while I was there, I figured I'd see if my brain was fried by adding macro support.

I know lisp macros are awesome, it's one of those things that everybody is told. Repeatedly. I've used macros in my emacs programming off and on for a good few years, but despite that I'd not really given them too much thought.

If you know anything about lisp you know that it's all about the lists, the parenthesis, and the macros. Here's a simple macro I wrote:

 (define if2 (macro (pred one two)
    `(if ~pred (begin ~one ~two))))

The standard lisp if function allows you to write:

 (if (= 1 a) (print "a == 1") (print "a != 1"))

There are three arguments supplied to the if form:

  • The test to perform.
  • A single statement to execute if the test was true.
  • A single statement to execute if the test was not true.

My if2 macro instead has three arguments:

  • The test to perform.
  • The first statement to execute if the test was true.
  • The second statement to execute if the test was true.
  • i.e. There is no "else", or failure, clause.

This means I can write:

 (if2 blah
    (one..)
    (two..))

Rather than:

 (if blah
    (begin
       (one..)
       (two..)))

It is simple, clear, and easy to understand and a good building-block for writing a while function:

 (define while-fun (lambda (predicate body)
    (if2 (predicate)
       (body)
       (while-fun predicate body))))

There you see that if the condition is true then we call the supplied body, and then recurse. Doing two actions as a result of the single if test is a neat shortcut.

Of course we need to wrap that up in a macro, for neatness:

(define while (macro (expression body)
                 (list 'while-fun
                       (list 'lambda '() expression)
                       (list 'lambda '() body))))

Now we're done, and we can run a loop five times like so:

(let ((a 5))
  (while (> a 0)
     (begin
        (print "(while) loop - iteration %s" a)
        (set! a (- a 1) true))))

Output:

(while) loop - iteration 5
(while) loop - iteration 4
(while) loop - iteration 3
(while) loop - iteration 2
(while) loop - iteration 1

We've gone from using lists to having a while-loop, with a couple of simple macros and one neat recursive function.

There are a lot of cute things you can do with macros, and now I'm starting to appreciate them a little more. Of course it's not quite as magical as FORTH, but damn close!

Reproducible Builds (diffoscope): diffoscope 222 released

23 September, 2022 - 07:00

The diffoscope maintainers are pleased to announce the release of diffoscope version 222. This version includes the following changes:

[ Mattia Rizzolo ]
* Use pep517 and pip to load the requirements. (Closes: #1020091)
* Remove old Breaks/Replaces in debian/control that have been obsoleted since
  bullseye

You find out more by visiting the project homepage.

Jonathan Dowland: Nine Inch Nails, Cornwall, June

22 September, 2022 - 17:09

In June I travelled to see Nine Inch Nails perform two nights at the Eden Project in Cornwall. It'd been eight years since I last saw them live and when they announced the Eden shows, I thought it might be the only chance I'd get to see them for a long time. I committed, and sods law, a week or so later they announced a handful of single-night UK club shows. On the other hand, on previous tours where they'd typically book two club nights in each city, I've attended one night and always felt I should have done both, so this time I was making that happen.

Newquay

approach by air

Towan Beach (I think)

For personal reasons it's been a difficult year so it was nice to treat myself to a mini holiday. I stayed in Newquay, a seaside town with many similarities to the North East coast, as well as many differences. It's much bigger, and although we have a thriving surfing community in Tynemouth, Newquay have it on another level. They also have a lot more tourism, which is a double-edged sword: in Newquay, besides surfing, there was not a lot to do. There's a lot of tourist tat shops, and bars and cafes (som very nice ones), but no book shops, no record shops, very few of the quaint, unique boutique places we enjoy up here and possibly take for granted.

If you want tie-dyed t-shirts though, you're sorted.

Nine Inch Nails have a long-established, independently fan-run forum called Echoing The Sound. There is now also an official Discord server. I asked on both whether anyone was around in Newquay and wanted to meet up: not many people were! But I did meet a new friend, James, for a quiet drink. He was due to share a taxi with Sarah, who was flying in but her flight was delayed and she had to figure out another route.

Eden Project

the Eden Project

The Eden Project, the venue itself, is a fascinating place. I didn't realise until I'd planned most of my time there that the gig tickets granted you free entry into the Project on the day of the gig as well as the day after. It was quite tricky to get from Newquay to the Eden project, I would have been better off staying in St Austell itself perhaps, so I didn't take advantage of this, but I did have a couple of hours total to explore a little bit at the venue before the gig on each night.

Friday 17th (sunny)

Once I got to the venue I managed to meet up with several names from ETS and the Discord: James, Sarah (who managed to re-arrange flights), Pete and his wife (sorry I missed your name), Via Tenebrosa (she of crab hat fame), Dave (DaveDiablo), Elliot and his sister and finally James (sheapdean), someone who I've been talking to online for over a decade and finally met in person (and who taped both shows). I also tried to meet up with a friend from the Debian UK community (hi Lief) but I couldn't find him!

Support for Friday was Nitzer Ebb, who I wasn't familiar with before. There were two men on stage, one operating instruments, the other singing. It was a tough time to warm up the crowd, the venue was still very empty and it was very bright and sunny, but I enjoyed what I was hearing. They're definitely on my list. I later learned that the band's regular singer (Doug McCarthy) was unable to make it, and so the guy I was watching (Bon Harris) was standing in for full vocal duties. This made the performance (and their subsequent one at Hellfest the week after) all the more impressive.

Via (with crab hat), Sarah, me (behind). pic by kraw

(Day) and night one, Thursday, was very hot and sunny and the band seemed a little uncomfortable exposed on stage with little cover. Trent commented as such at least once. The setlist was eclectic: and I finally heard some of my white whale songs. Highlights for me were The Perfect Drug, which was unplayed from 1997-2018 and has now become a staple, and the second ever performance of Everything, the first being a few days earlier. Also notable was three cuts in a row from the last LP, Bad Witch, Heresy and Love Is Not Enough.

Saturday 18th (rain)

with Elliot, before

Day/night 2, Friday, was rainy all day. Support was Yves Tumor, who were an interesting clash of styles: a Prince/Bowie-esque inspired lead clashing with a rock-out lead guitarist styling himself similarly to Brian May.

I managed to find Sarah, Elliot (new gig best-buddy), Via and James (sheapdean) again. Pete was at this gig too, but opted to take a more relaxed position than the rail this time. I also spent a lot of time talking to a Canadian guy on a press pass (both nights) that I'm ashamed to have forgotten his name.

The dank weather had Nine Inch Nails in their element. I think night one had the more interesting setlist, but night two had the best performance, hands down. Highlights for me were mostly a string of heavier songs (in rough order of scarcity, from common to rarely played): wish, burn, letting you, reptile, every day is exactly the same, the line begins to blur, and finally, happiness in slavery, the first UK performance since 1994. This was a crushing set.

A girl in front of me was really suffering with the cold and rain after waiting at the venue all day to get a position on the rail. I thought she was going to pass out. A roadie with NIN noticed, and came over and gave her his jacket. He said if she waited to the end of the show and returned his jacket he'd give her a setlist, and true to his word, he did. This was a really nice thing to happen and really gave the impression that the folks who work on these shows are caring people.

Yep I was this close

A fuckin' rainbow! Photo by "Lazereth of Nazereth"

Afterwards

Night two did have some gentler songs and moments to remember: a re-arranged Sanctified (which ended a nineteen-year hiatus in 2013) And All That Could Have Been (recorded 2002, first played 2018), La Mer, during which the rain broke and we were presented with a beautiful pink-hued rainbow. They then segued into Less Than, providing the comic moment of the night when Trent noticed the rainbow mid-song; now a meme that will go down in NIN fan history.

Wrap-up

This was a blow-out, once in a lifetime trip to go and see a band who are at the top of their career in terms of performance. One problem I've had with NIN gigs in the past is suffering gig flashback to them when I go to other (inferior) gigs afterwards, and I'm pretty sure I will have this problem again. Doing both nights was worth it, the two experiences were very different and each had its own unique moments. The venue was incredible, and Cornwall is (modulo tourist trap stuff) beautiful.

Simon Josefsson: Privilege separation of GSS-API credentials for Apache

20 September, 2022 - 13:40

To protect web resources with Kerberos you may use Apache HTTPD with mod_auth_gssapi — however, all web scripts (e.g., PHP) run under Apache will have access to the Kerberos long-term symmetric secret credential (keytab). If someone can get it, they can impersonate your server, which is bad.

The gssproxy project makes it possible to introduce privilege separation to reduce the attack surface. There is a tutorial for RPM-based distributions (Fedora, RHEL, AlmaLinux, etc), but I wanted to get this to work on a DPKG-based distribution (Debian, Ubuntu, Trisquel, PureOS, etc) and found it worthwhile to document the process. I’m using Ubuntu 22.04 below, but have tested it on Debian 11 as well. I have adopted the gssproxy package in Debian, and testing this setup is part of the scripted autopkgtest/debci regression testing.

First install the required packages:

root@foo:~# apt-get update
root@foo:~# apt-get install -y apache2 libapache2-mod-auth-gssapi gssproxy curl

This should give you a working and running web server. Verify it is operational under the proper hostname, I’ll use foo.sjd.se in this writeup.

root@foo:~# curl --head http://foo.sjd.se/
HTTP/1.1 200 OK

The next step is to create a keytab containing the Kerberos V5 secrets for your host, the exact steps depends on your environment (usually kadmin ktadd or ipa-getkeytab), but use the string “HTTP/foo.sjd.se” and then confirm using something like the following.

root@foo:~# ls -la /etc/gssproxy/httpd.keytab
-rw------- 1 root root 176 Sep 18 06:44 /etc/gssproxy/httpd.keytab
root@foo:~# klist -k /etc/gssproxy/httpd.keytab -e
Keytab name: FILE:/etc/gssproxy/httpd.keytab
KVNO Principal
---- --------------------------------------------------------------------------
   2 HTTP/foo.sjd.se@GSSPROXY.EXAMPLE.ORG (aes256-cts-hmac-sha1-96) 
   2 HTTP/foo.sjd.se@GSSPROXY.EXAMPLE.ORG (aes128-cts-hmac-sha1-96) 
root@foo:~# 

The file should be owned by root and not be in the default /etc/krb5.keytab location, so Apache’s libapache2-mod-auth-gssapi will have to use gssproxy to use it.

Then configure gssproxy to find the credential and use it with Apache.

root@foo:~# cat<<EOF > /etc/gssproxy/80-httpd.conf
[service/HTTP]
mechs = krb5
cred_store = keytab:/etc/gssproxy/httpd.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = www-data
process = /usr/sbin/apache2
EOF

For debugging, it may be useful to enable more gssproxy logging:

root@foo:~# cat<<EOF > /etc/gssproxy/gssproxy.conf
[gssproxy]
debug_level = 1
EOF
root@foo:~#

Restart gssproxy so it finds the new configuration, and monitor syslog as follows:

root@foo:~# tail -F /var/log/syslog &
root@foo:~# systemctl restart gssproxy

You should see something like this in the log file:

Sep 18 07:03:15 foo gssproxy[4076]: [2022/09/18 05:03:15]: Exiting after receiving a signal
Sep 18 07:03:15 foo systemd[1]: Stopping GSSAPI Proxy Daemon…
Sep 18 07:03:15 foo systemd[1]: gssproxy.service: Deactivated successfully.
Sep 18 07:03:15 foo systemd[1]: Stopped GSSAPI Proxy Daemon.
Sep 18 07:03:15 foo gssproxy[4092]: [2022/09/18 05:03:15]: Debug Enabled (level: 1)
Sep 18 07:03:15 foo systemd[1]: Starting GSSAPI Proxy Daemon…
Sep 18 07:03:15 foo gssproxy[4093]: [2022/09/18 05:03:15]: Kernel doesn't support GSS-Proxy (can't open /proc/net/rpc/use-gss-proxy: 2 (No such file or directory))
Sep 18 07:03:15 foo gssproxy[4093]: [2022/09/18 05:03:15]: Problem with kernel communication! NFS server will not work
Sep 18 07:03:15 foo systemd[1]: Started GSSAPI Proxy Daemon.
Sep 18 07:03:15 foo gssproxy[4093]: [2022/09/18 05:03:15]: Initialization complete.

The NFS-related errors is due to a default gssproxy configuration file, it is harmless and if you don’t use NFS with GSS-API you can silence it like this:

root@foo:~# rm /etc/gssproxy/24-nfs-server.conf
root@foo:~# systemctl try-reload-or-restart gssproxy

The log should now indicate that it loaded the keytab:

Sep 18 07:18:59 foo systemd[1]: Reloading GSSAPI Proxy Daemon…
Sep 18 07:18:59 foo gssproxy[4182]: [2022/09/18 05:18:59]: Received SIGHUP; re-reading config.
Sep 18 07:18:59 foo gssproxy[4182]: [2022/09/18 05:18:59]: Service: HTTP, Keytab: /etc/gssproxy/httpd.keytab, Enctype: 18
Sep 18 07:18:59 foo gssproxy[4182]: [2022/09/18 05:18:59]: New config loaded successfully.
Sep 18 07:18:59 foo systemd[1]: Reloaded GSSAPI Proxy Daemon.

To instruct Apache — or actually, the MIT Kerberos V5 GSS-API library used by mod_auth_gssap loaded by Apache — to use gssproxy instead of using /etc/krb5.keytab as usual, Apache needs to be started in an environment that has GSS_USE_PROXY=1 set. The background is covered by the gssproxy-mech(8) man page and explained by the gssproxy README.

When systemd is used the following can be used to set the environment variable, note the final command to reload systemd.

root@foo:~# mkdir -p /etc/systemd/system/apache2.service.d
root@foo:~# cat<<EOF > /etc/systemd/system/apache2.service.d/gssproxy.conf
[Service]
Environment=GSS_USE_PROXY=1
EOF
root@foo:~# systemctl daemon-reload

The next step is to configure a GSS-API protected Apache resource:

root@foo:~# cat<<EOF > /etc/apache2/conf-available/private.conf
<Location /private>
  AuthType GSSAPI
  AuthName "GSSAPI Login"
  Require valid-user
</Location>

Enable the configuration and restart Apache — the suggested use of reload is not sufficient, because then it won’t be restarted with the newly introduced GSS_USE_PROXY variable. This just applies to the first time, after the first restart you may use reload again.

root@foo:~# a2enconf private
Enabling conf private.
To activate the new configuration, you need to run:
systemctl reload apache2
root@foo:~# systemctl restart apache2

When you have debug messages enabled, the log may look like this:

Sep 18 07:32:23 foo systemd[1]: Stopping The Apache HTTP Server…
Sep 18 07:32:23 foo gssproxy[4182]: [2022/09/18 05:32:23]: Client [2022/09/18 05:32:23]: (/usr/sbin/apache2) [2022/09/18 05:32:23]: connected (fd = 10)[2022/09/18 05:32:23]: (pid = 4651) (uid = 0) (gid = 0)[2022/09/18 05:32:23]:
Sep 18 07:32:23 foo gssproxy[4182]: message repeated 4 times: [ [2022/09/18 05:32:23]: Client [2022/09/18 05:32:23]: (/usr/sbin/apache2) [2022/09/18 05:32:23]: connected (fd = 10)[2022/09/18 05:32:23]: (pid = 4651) (uid = 0) (gid = 0)[2022/09/18 05:32:23]:]
Sep 18 07:32:23 foo systemd[1]: apache2.service: Deactivated successfully.
Sep 18 07:32:23 foo systemd[1]: Stopped The Apache HTTP Server.
Sep 18 07:32:23 foo systemd[1]: Starting The Apache HTTP Server…
Sep 18 07:32:23 foo gssproxy[4182]: [2022/09/18 05:32:23]: Client [2022/09/18 05:32:23]: (/usr/sbin/apache2) [2022/09/18 05:32:23]: connected (fd = 10)[2022/09/18 05:32:23]: (pid = 4657) (uid = 0) (gid = 0)[2022/09/18 05:32:23]:
root@foo:~# Sep 18 07:32:23 foo gssproxy[4182]: message repeated 8 times: [ [2022/09/18 05:32:23]: Client [2022/09/18 05:32:23]: (/usr/sbin/apache2) [2022/09/18 05:32:23]: connected (fd = 10)[2022/09/18 05:32:23]: (pid = 4657) (uid = 0) (gid = 0)[2022/09/18 05:32:23]:]
Sep 18 07:32:23 foo systemd[1]: Started The Apache HTTP Server.

Finally, set up a dummy test page on the server:

root@foo:~# echo OK > /var/www/html/private

To verify that the server is working properly you may acquire tickets locally and then use curl to retrieve the GSS-API protected resource. The "--negotiate" enables SPNEGO and "--user :" asks curl to use username from the environment.

root@foo:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: jas@GSSPROXY.EXAMPLE.ORG

Valid starting Expires Service principal
09/18/22 07:40:37 09/19/22 07:40:37 krbtgt/GSSPROXY.EXAMPLE.ORG@GSSPROXY.EXAMPLE.ORG
root@foo:~# curl --negotiate --user : http://foo.sjd.se/private
OK
root@foo:~#

The log should contain something like this:

Sep 18 07:56:00 foo gssproxy[4872]: [2022/09/18 05:56:00]: Client [2022/09/18 05:56:00]: (/usr/sbin/apache2) [2022/09/18 05:56:00]: connected (fd = 10)[2022/09/18 05:56:00]: (pid = 5042) (uid = 33) (gid = 33)[2022/09/18 05:56:00]:
Sep 18 07:56:00 foo gssproxy[4872]: [CID 10][2022/09/18 05:56:00]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 33,socket: (null)
Sep 18 07:56:00 foo gssproxy[4872]: [CID 10][2022/09/18 05:56:00]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 33,socket: (null)
Sep 18 07:56:00 foo gssproxy[4872]: [CID 10][2022/09/18 05:56:00]: gp_rpc_execute: executing 1 (GSSX_INDICATE_MECHS) for service "HTTP", euid: 33,socket: (null)
Sep 18 07:56:00 foo gssproxy[4872]: [CID 10][2022/09/18 05:56:00]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 33,socket: (null)
Sep 18 07:56:00 foo gssproxy[4872]: [CID 10][2022/09/18 05:56:00]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 33,socket: (null)

The Apache log will look like this, notice the authenticated username shown.

127.0.0.1 - jas@GSSPROXY.EXAMPLE.ORG [18/Sep/2022:07:56:00 +0200] "GET /private HTTP/1.1" 200 481 "-" "curl/7.81.0"

Congratulations, and happy hacking!

Matthew Garrett: Handling WebAuthn over remote SSH connections

20 September, 2022 - 09:17
Being able to SSH into remote machines and do work there is great. Using hardware security tokens for 2FA is also great. But trying to use them both at the same time doesn't work super well, because if you hit a WebAuthn request on the remote machine it doesn't matter how much you mash your token - it's not going to work.

But could it?

The SSH agent protocol abstracts key management out of SSH itself and into a separate process. When you run "ssh-add .ssh/id_rsa", that key is being loaded into the SSH agent. When SSH wants to use that key to authenticate to a remote system, it asks the SSH agent to perform the cryptographic signatures on its behalf. SSH also supports forwarding the SSH agent protocol over SSH itself, so if you SSH into a remote system then remote clients can also access your keys - this allows you to bounce through one remote system into another without having to copy your keys to those remote systems.

More recently, SSH gained the ability to store SSH keys on hardware tokens such as Yubikeys. If configured appropriately, this means that even if you forward your agent to a remote site, that site can't do anything with your keys unless you physically touch the token. But out of the box, this is only useful for SSH keys - you can't do anything else with this support.

Well, that's what I thought, at least. And then I looked at the code and realised that SSH is communicating with the security tokens using the same library that a browser would, except it ensures that any signature request starts with the string "ssh:" (which a genuine WebAuthn request never will). This constraint can actually be disabled by passing -O no-restrict-websafe to ssh-agent, except that was broken until this weekend. But let's assume there's a glorious future where that patch gets backported everywhere, and see what we can do with it.

First we need to load the key into the security token. For this I ended up hacking up the Go SSH agent support. Annoyingly it doesn't seem to be possible to make calls to the agent without going via one of the exported methods here, so I don't think this logic can be implemented without modifying the agent module itself. But this is basically as simple as adding another key message type that looks something like:
type ecdsaSkKeyMsg struct {
       Type        string `sshtype:"17|25"`
       Curve       string
       PubKeyBytes []byte
       RpId        string
       Flags       uint8
       KeyHandle   []byte
       Reserved    []byte
       Comments    string
       Constraints []byte `ssh:"rest"`
}
Where Type is ssh.KeyAlgoSKECDSA256, Curve is "nistp256", RpId is the identity of the relying party (eg, "webauthn.io"), Flags is 0x1 if you want the user to have to touch the key, KeyHandle is the hardware token's representation of the key (basically an opaque blob that's sufficient for the token to regenerate the keypair - this is generally stored by the remote site and handed back to you when it wants you to authenticate). The other fields can be ignored, other than PubKeyBytes, which is supposed to be the public half of the keypair.

This causes an obvious problem. We have an opaque blob that represents a keypair. We don't have the public key. And OpenSSH verifies that PubKeyByes is a legitimate ecdsa public key before it'll load the key. Fortunately it only verifies that it's a legitimate ecdsa public key, and does nothing to verify that it's related to the private key in any way. So, just generate a new ECDSA key (ecdsa.GenerateKey(elliptic.P256(), rand.Reader)) and marshal it ( elliptic.Marshal(ecKey.Curve, ecKey.X, ecKey.Y)) and we're good. Pass that struct to ssh.Marshal() and then make an agent call.

Now you can use the standard agent interfaces to trigger a signature event. You want to pass the raw challenge (not the hash of the challenge!) - the SSH code will do the hashing itself. If you're using agent forwarding this will be forwarded from the remote system to your local one, and your security token should start blinking - touch it and you'll get back an ssh.Signature blob. ssh.Unmarshal() the Blob member to a struct like
type ecSig struct {
        R *big.Int
        S *big.Int
}
and then ssh.Unmarshal the Rest member to
type authData struct {
        Flags    uint8
        SigCount uint32
}
The signature needs to be converted back to a DER-encoded ASN.1 structure (eg,
var b cryptobyte.Builder
b.AddASN1(asn1.SEQUENCE, func(b *cryptobyte.Builder) {
        b.AddASN1BigInt(ecSig.R)
        b.AddASN1BigInt(ecSig.S)
})
signatureDER, _ := b.Bytes()
, and then you need to construct the Authenticator Data structure. For this, take the RpId used earlier and generate the sha256. Append the one byte Flags variable, and then convert SigCount to big endian and append those 4 bytes. You should now have a 37 byte structure. This needs to be CBOR encoded (I used github.com/fxamacker/cbor and just called cbor.Marshal(data, cbor.EncOptions{})).

Now base64 encode the sha256 of the challenge data, the DER-encoded signature and the CBOR-encoded authenticator data and you've got everything you need to provide to the remote site to satisfy the challenge.

There are alternative approaches - you can use USB/IP to forward the hardware token directly to the remote system. But that means you can't use it locally, so it's less than ideal. Or you could implement a proxy that communicates with the key locally and have that tunneled through to the remote host, but at that point you're just reinventing ssh-agent.

And you should bear in mind that the default behaviour of blocking this sort of request is for a good reason! If someone is able to compromise a remote system that you're SSHed into, they can potentially trick you into hitting the key to sign a request they've made on behalf of an arbitrary site. Obviously they could do the same without any of this if they've compromised your local system, but there is some additional risk to this. It would be nice to have sensible MAC policies that default-denied access to the SSH agent socket and only allowed trustworthy binaries to do so, or maybe have some sort of reasonable flatpak-style portal to gate access. For my threat model I think it's a worthwhile security tradeoff, but you should evaluate that carefully yourself.

Anyway. Now to figure out whether there's a reasonable way to get browsers to work with this.

comments

Antoine Beaupré: Looking at Wayland terminal emulators

19 September, 2022 - 23:41

Back in 2018, I made a two part series about terminal emulators that was actually pretty painful to write. So I'm not going to retry this here, not at all. Especially since I'm not submitting this to the excellent LWN editors so I can get away with not being very good at writing. Phew.

Still, it seems my future self will thank me for collecting my thoughts on the terminal emulators I have found out about since I wrote that article. Back then, Wayland was not quite at the level where it is now, being the default in Fedora (2016), Debian (2019), RedHat (2019), and Ubuntu (2021). Also, a bunch of folks thought they would solve everything by using OpenGL for rendering. Let's see how things stack up.

Recap

In the previous article, I touched on those projects:

Terminal Changes since review Alacritty releases! scrollback, better latency, URL launcher, clipboard support, still not in Debian, but close GNOME Terminal not much? couldn't find a changelog Konsole not much mlterm long changelog but: supports console mode (like GNU screen?!), Wayland support through libvte, sixel graphics, zmodem, mosh pterm changes: Wayland support st unparseable changelog, might include scrollback support through a third-party scroll(1) command I couldn't find Terminator moved to GitHub, Python 3 support, not being dead urxvt no significant changes, a single release, still in CVS! Xfce Terminal hard to parse changeloghttps://gitlab.xfce.org/apps/xfce4-terminal/-/blob/master/NEWS, presumably some improvements to paste safety? xterm notoriously hard to parse changelog, improvements to paste safety (disallowedPasteControls), fonts, clipboard improvements?

After writing those articles, bizarrely, I was still using rxvt even though it did not come up as shiny as I would have liked. The colors problems were especially irritating.

I briefly played around with Konsole and xterm, and eventually switched to XTerm as my default x-terminal-emulator "alternative" in my Debian system, while writing this.

I quickly noticed why I had stopped using it: clickable links are a huge limitation. I ended up adding keybindings to open URLs in a command. There's another keybinding to dump the history into a command. Neither are as satisfactory as just clicking a damn link.

Requirements

Figuring out my requirements is actually a pretty hard thing to do. In my last reviews, I just tried a bunch of stuff and collected everything, but a lot of things (like tab support) I don't actually care about. So here's a set of things I actually do care about:

  • latency
  • resource usage
  • proper clipboard support, that is:
    • mouse selection and middle button uses PRIMARY
    • control-shift-c and control-shift-v for CLIPBOARD
  • true color support
  • no known security issues
  • active project
  • paste protection
  • clickable URLs
  • scrollback
  • font resize
  • non-destructive text-wrapping (ie. resizing a window doesn't drop scrollback history)
  • proper unicode support (at least latin-1, ideally "everything")
  • good emoji support (at least showing them, ideally "nicely"), which involves font fallback

Latency is particularly something I wonder about in Wayland. Kitty seem to have been pretty dilligent at doing latency tests, claiming 35ms with a hardware-based latency tester and 7ms with typometer, but it's unclear how those would come up in Wayland because, as far as I know, typometer does not support Wayland.

Candidates

Those are the projects I am considering.

  • alacritty
  • darktile - GPU rendering, Unicode support, themable, ligatures (optional), Sixel, window transparency, clickable URLs, true color support, not in Debian
  • foot - Wayland only, daemon-mode, sixel images, scrollback search, true color, font resize, URLs not clickable, but keyboard-driven selection, proper clipboard support, in Debian
  • havoc - minimal, scrollback, configurable keybindings, not in Debian
  • kitty - "fast, feature-rich, GPU based", ligatures, emojis, hyperlinks, pluggable, scriptable, tabs, layouts, history, file transfer over SSH, graphics, oh dear, in Debian
  • sakura - libvte, Wayland support, tabs, no menu bar, original libvte gangster, dynamic font size, probably supports Wayland, in Debian
  • termonad - Haskell? in Debian
  • wez - Rust, Wayland, multiplexer, ligatures, scrollback search, clipboard support, bracketed paste, panes, tabs, serial port support, Sixel, Kitty, iTerm graphics, built-in SSH client (!?), not in Debian
  • XTerm - status quo, no Wayland port obviously
  • zutty: OpenGL rendering, true color, clipboard support, small codebase, no Wayland support, crashes on bremner's, in Debian

I would really, really like to like Alacritty, but it's still not packaged in Debian, and they haven't fully addressed the latency issues although, to be fair, maybe it's just an impossible task. Once it's in Debian, maybe I'll reconsider.

It seems like Arch Linux defaults to foot in Sway, and I keep seeing it everywhere, so it is probably my next thing to try, if/when I switch to Wayland. One question mark with all Wayland terminals, and Foot in particular, is how much latency they introduce in the rendering pipeline. The foot performance and benchmarks look excellent, but do not include latency benchmarks.

No conclusion

So I guess that's all I've got so far, I may try alacritty if it hits Debian, or foot if I switch to Wayland, but for now I'm hacking in xterm still. Happy to hear ideas in the comments.

Stay tuned for more happy days.

Matthew Garrett: Bring Your Own Disaster

19 September, 2022 - 14:12
After my last post, someone suggested that having employers be able to restrict keys to machines they control is a bad thing. So here's why I think Bring Your Own Device (BYOD) scenarios are bad not only for employers, but also for users.

There's obvious mutual appeal to having developers use their own hardware rather than rely on employer-provided hardware. The user gets to use hardware they're familiar with, and which matches their ergonomic desires. The employer gets to save on the money required to buy new hardware for the employee. From this perspective, there's a clear win-win outcome.

But once you start thinking about security, it gets more complicated. If I, as an employer, want to ensure that any systems that can access my resources meet a certain security baseline (eg, I don't want my developers using unpatched Windows ME), I need some of my own software installed on there. And that software doesn't magically go away when the user is doing their own thing. If a user lends their machine to their partner, is the partner fully informed about what level of access I have? Are they going to feel that their privacy has been violated if they find out afterwards?

But it's not just about monitoring. If an employee's machine is compromised and the compromise is detected, what happens next? If the employer owns the system then it's easy - you pick up the device for forensic analysis and give the employee a new machine to use while that's going on. If the employee owns the system, they're probably not going to be super enthusiastic about handing over a machine that also contains a bunch of their personal data. In much of the world the law is probably on their side, and even if it isn't then telling the employee that they have a choice between handing over their laptop or getting fired probably isn't going to end well.

But obviously this is all predicated on the idea that an employer needs visibility into what's happening on systems that have access to their systems, or which are used to develop code that they'll be deploying. And I think it's fair to say that not everyone needs that! But if you hold any sort of personal data (including passwords) for any external users, I really do think you need to protect against compromised employee machines, and that does mean having some degree of insight into what's happening on those machines. If you don't want to deal with the complicated consequences of allowing employees to use their own hardware, it's rational to ensure that only employer-owned hardware can be used.

But what about the employers that don't currently need that? If there's no plausible future where you'll host user data, or where you'll sell products to others who'll host user data, then sure! But if that might happen in future (even if it doesn't right now), what's your transition plan? How are you going to deal with employees who are happily using their personal systems right now? At what point are you going to buy new laptops for everyone? BYOD might work for you now, but will it always?

And if your employer insists on employees using their own hardware, those employees should ask what happens in the event of a security breach. Whose responsibility is it to ensure that hardware is kept up to date? Is there an expectation that security can insist on the hardware being handed over for investigation? What information about the employee's use of their own hardware is going to be logged, who has access to those logs, and how long are those logs going to be kept for? If those questions can't be answered in a reasonable way, it's a huge red flag. You shouldn't have to give up your privacy and (potentially) your hardware for a job.

Using technical mechanisms to ensure that employees only use employer-provided hardware is understandably icky, but it's something that allows employers to impose appropriate security policies without violating employee privacy.

comments

Axel Beckert: wApua 0.06.4 released

19 September, 2022 - 07:55
I today released version 0.06.4 of my WAP WML browser wApua and also uploaded that release to Debian Unstable.

It’s a bugfix release and the first upstream release since 2017.

It fixes the recognition of WAP WML pages with more recent DTD location URLs ending in .dtd instead of .xml (and some other small difference). No idea when these URLs changed, but I assume they have been changed to look more like the URLs of other DTDs. The old URLs of the DTD still work, but more recent WAP pages (yes, they do exist :-) seem to use the new DTD URLs, so there was a need to recognise them instead of throwing an annoying warning.

Thanks to Lian Begett for the bug report!

Russ Allbery: Effective altruism and the control trap

18 September, 2022 - 03:49

William MacAskill has been on a book tour for What We Owe to the Future, which has put effective altruism back in the news. That plus the decision by GiveWell to remove GiveDirectly from their top charity list got me thinking about charity again. I think effective altruism, by embracing long-termism, is falling into an ethical trap, and I'm going to start heavily discounting their recommendations for donations.

Background

Some background first for people who have no idea what I'm talking about.

Effective altruism is the idea that we should hold charities accountable for effectiveness. It's not sufficient to have an appealing mission. A charity should demonstrate that the money they spend accomplishes the goals they claimed it would. There is a lot of debate around defining "effective," but as a basic principle, this is sound. Mainstream charity evaluators such as Charity Navigator measure overhead and (arguable) waste, but they don't ask whether the on-the-ground work of the charity has a positive effect proportional to the resources it's expending. This is a good question to ask.

GiveWell is a charity research organization that directs money for donors based on effective altruism principles. It's one of the central organizations in effective altruism.

GiveDirectly is a charity that directly transfers money from donors to poor people. It doesn't attempt to build infrastructure, buy specific things, or fund programs. It identifies poor people and gives them cash with no strings attached.

Long-termism is part of the debate over what "effectiveness" means. It says we should value impact on future generations more highly than we tend to do. (In other words, we should have a much smaller future discount rate.) A sloppy but intuitive expression of long-termism is that (hopefully) there will be far more humans living in the future than are living today, and therefore a "greatest good for the greatest number" moral philosophy argues that we should invest significant resources into making the long-term future brighter. This has obvious appeal to those of us who are concerned about the long-term impacts of climate change, for example.

There is a lot of overlap between the communities of effective altruism, long-termism, and "rationalism." One way this becomes apparent is that all three communities have a tendency to obsess over the risks of sentient AI taking over the world. I'm going to come back to that.

Psychology of control

GiveWell, early on, discovered that GiveDirectly was measurably more effective than most charities. Giving money directly to poor people without telling them how to spend it produced more benefits for those people and their surrounding society than nearly all international aid charities.

GiveDirectly then became the baseline for GiveWell's evaluations, and GiveWell started looking for ways to be more effective than that. There is some logic to thinking more effectiveness is possible. Some problems are poorly addressed by markets and too large for individual spending. Health care infrastructure is an obvious example.

That said, there's also a psychological reason to look for other charities. Part of the appeal of charity is picking a cause that supports your values (whether that be raw effectiveness or something else). Your opinions and expertise are valued alongside your money. In some cases, this may be objectively true. But in all cases, it's more flattering to the ego than giving poor people cash.

At that point, the argument was over how to address immediate and objectively measurable human problems. The innovation of effective altruism is to tie charitable giving to a research feedback cycle. You measure the world, see if it is improving, and adjust your funding accordingly. Impact is measured by its effects on actual people. Effective altruism was somewhat suspicious of talking directly to individuals and preferred "objective" statistical measures, but the point was to remain in contact with physical reality.

Enter long-termism: what if you could get more value for your money by addressing problems that would affect vast numbers of future people, instead of the smaller number of people who happen to be alive today?

Rather than looking at the merits of that argument, look at its psychology. Real people are messy. They do things you don't approve of. They have opinions that don't fit your models. They're hard to "objectively" measure. But people who haven't been born yet are much tidier. They're comfortably theoretical; instead of having to go to a strange place with unfamiliar food and languages to talk to people who aren't like you, you can think hard about future trends in the comfort of your home. You control how your theoretical future people are defined, so the results of your analysis will align with your philosophical and ideological beliefs.

Problems affecting future humans are still extrapolations of problems visible today in the world, though. They're constrained by observations of real human societies, despite the layer of projection and extrapolation. We can do better: what if the most serious problem facing humanity is the possible future development of rogue AI?

Here's a problem that no one can observe or measure because it's never happened. It is purely theoretical, and thus under the control of the smart philosopher or rich western donor. We don't know if a rogue AI is possible, what it would be like, how one might arise, or what we could do about it, but we can convince ourselves that all those things can be calculated with some probability bar through the power of pure logic. Now we have escaped the uncomfortable psychological tension of effective altruism and returned to the familiar world in which the rich donor can define both the problem and the solution. Effectiveness is once again what we say it is.

William MacAskill, one of the originators of effective altruism, now constantly talks about the threat of rogue AI. In a way, it's quite sad.

Where to give money?

The mindset of long-termism is bad for the human brain. It whispers to you that you're smarter than other people, that you know what's really important, and that you should retain control of more resources because you'll spend them more wisely than others. It's the opposite of intellectual humility. A government funding agency should take some risks on theoretical solutions to real problems, and maybe a few on theoretical solutions to theoretical problems (although an order of magnitude less). I don't think this is a useful way for an individual donor to think.

So, if I think effective altruism is abandoning the one good idea it had and turning back into psychological support for the egos of philosophers and rich donors, where does this leave my charitable donations?

To their credit, GiveWell so far seems uninterested in shifting from concrete to theoretical problems. However, they believe they can do better by picking projects than giving people money, and they're committing to that by dropping GiveDirectly (while still praising them). They may be right. But I'm increasingly suspicious of the level of control donors want to retain. It's too easy to trick oneself into thinking you know better than the people directly affected.

I have two goals when I donate money. One is to make the world a better, kinder place. The other is to redistribute wealth. I have more of something than I need, and it should go to someone who does need it. The net effect should be to make the world fairer and more equal.

The first goal argues for effective altruism principles: where can I give money to have the most impact on making the world better? The second goal argues for giving across an inequality gradient. I should find the people who are struggling the most and transfer as many resources to them as I can. This is Peter Singer's classic argument for giving money to the global poor.

I think one can sometimes do better than transferring money, but doing so requires a deep understanding of the infrastructure and economies of scale that are being used as leverage. The more distant one is from a society, the more dubious I think one should be of one's ability to evaluate that, and the more wary one should be of retaining any control over how resources are used.

Therefore, I'm pulling my recurring donation to GiveWell. Half of it is going to go to GiveDirectly, because I think it is an effective way of redistributing wealth while giving up control. The other half is going to my local foodbank, because they have a straightforward analysis of how they can take advantage of economy of scale, and because I have more tools available (such as local news) to understand what problem they're solving and if they're doing so effectively.

I don't know that those are the best choices. There are a lot of good ones. But I do feel strongly that the best charity comes from embracing the idea that I do not have special wisdom, other people know more about what they need than I do, and deploying my ego and logic from the comfort of my home is not helpful. Find someone who needs something you have an excess of. Give it to them. Treat them as equals. Don't retain control. You won't go far wrong.

Shirish Agarwal: Books and Indian Tourism

18 September, 2022 - 02:32
Fiction

A few days ago somebody asked me and I think it is an often requested to perhaps all fiction readers as to why we like fiction? First of all, reading in itself is told as food for the soul. Because, whenever you write or read anything you don’t just read it, you also visualize it. And that visualization is and would be far greater than any attempt in cinema as there are no budget constraints and it takes no more than a minute to visualize a scenario if the writer is any good. You just close your eyes and in a moment you are transported to a different world. This is also what is known as ‘world building’. Something fantasy writers are especially gifted in. Also, with the whole parallel Universes being a reality, it is just so much fertile land for imagination that I just cannot believe that it hasn’t been worked to death to date. And you do need a lot of patience to make a world, to make characters, to make characters a bit eccentric one way or the other. And you have to know to put into a three, five, or whatever number of acts you want to put in. And then, of course, they have readers like us who dream and add more color to the story than the author did. As we take his, her, or their story and weave countless stories depending on where we are, where we are and who we are.

What people need to understand is that not just readers want escapism but writers too want to escape from the human condition. And they find solace in whatever they write. The well-known example of J.R.R. Tolkien is always there. How he must have felt each day coming after war, to somehow find the strength and just dream away, transport himself to a world of hobbits, elves, and other mysterious beings. It surely must have taken a lot of pain from him that otherwise, he would have felt. There are many others. What also does happen now and then, is authors believe in their own intelligence so much, that they commit crimes, but that’s par for the course.

Dean Koontz, Odd Apocalypse

Currently, I am reading the above title. It is perhaps one of the first horror title books that I have read which has so much fun. The ‘hero’ has a great sense of wit, humor, and sarcasm that you can cut butter with it. Now if you got that, this is par for the wordplay happening every second paragraph and I’m just 100 pages in of the 500-page Novel.

Now, while I haven’t read the whole book and I’m just speculating, what if at the end we realize that the hero all along was or is the villain. Sadly, we don’t have many such twisted stories and that too is perhaps because most people used to have black and white rather than grey characters. From all my reading, and even watching web series and whatnot, it is only the Europeans who seem to have a taste for exploring grey characters and giving twists at the end that people cannot anticipate. Even their heroes or heroines are grey characters. and they can really take you for a ride. It is also perhaps how we humans are, neither black nor white but more greyish. Having grey characters also frees the author quite a bit as she doesn’t have to use so-called tropes and just led the characters to lead themselves.

Indian Book publishing Industry

I do know Bengali stories do have a lot of grey characters, but sadly most of the good works are still in Bengali and not widely published compared to say European or American authors. While there is huge potential in the Indian publishing market for English books and there is also hunger, getting good and cheap publishers is the issue. Just recently SAGE publishing division shut down and this does not augur well for the Indian market. In the past few years, I and other readers have seen some very good publishing houses quit India for one reason or the other. GST has also made the sector more expensive. The only thing that works now and has been for some time is the seconds and thirds market. For e.g. I just bought today about 15-20 books @INR 125/- a kind of belated present for the self. That would be what, at the most 2 USD or 2 Euros per book. I bet even a burger costs more than that, but again India being a price-sensitive market, at these prices the seconds book sells. And these are all my favorite authors, Lee Child, Tom Clancy, Dean Koontz, and so on and so forth. I also saw a lot of fantasy books but they would have to wait for another day.

Tourism in India for Debconf 23

I had shared a while back that I would write a bit about tourism as Debconf or Annual Debian Conference will happen in India next year around this time. I was supposed to write it in the FAQ but couldn’t find a place or a corner where I could write it. There are actually two things that people need to be aware of. The one thing that people need to be very aware of is food poisoning or Delhi Belly. This is a far too common sight that I have witnessed especially with westerners when they come to visit India. I am somewhat shocked that it hasn’t been shared in the FAQ but then perhaps we cannot cover all the bases therein. I did find this interesting article and would recommend the suggestions given in it wholeheartedly. I would suggest people coming to India to buy and have purifying water tablets with them if they decide to stay back and explore India.

Now the problem with tourism is, that one can have as much tourism as one wants. One of the unique ways I found some westerners having the time of their life is buying an Indian Rickshaw or Tuk-Tuk and traveling with it. A few years ago, when I was more adventourous-spirited I was able to meet a few of them. There is also the Race with Rickshaws that happens in Rajasthan and you get to see about 10 odd cities in and around Rajasthan state and get to see the vibrancy in the North. If somebody really wants to explore India, then I would suggest getting down to Goa, specifically, South Goa, meeting with the hippie crowd, and getting one of the hippie guidebooks to India. Most people forget that the Hippies came to India in the 1960s and many of them just never left. Tap water in Pune is ok, have seen and experienced the same in Himachal, Garwhal, and Uttarakhand, although it has been a few years since I have been to those places. North-East is a place I have yet to venture into.

India does have a lot of beauty but most people are not clean-conscious so if you go to common tourist destinations, you will find a lot of garbage. Most cities in India do give you an option of homestays and some even offer food, so if you are on a budget as well as wanna experience life with an Indian family, that could be something you could look into. So you can see and share about India with different eyes.

There is casteism, racism, and all that. Generally speaking, you would see it wielded a lot more in your face in North India than in South India where it is there but far more subtle. About food, what has been shared in the India BOF. Have to say, it doesn’t even scratch the surface. If you stay with an Indian family, there is probably a much better chance of exploring the variety of food that India has to offer. From the western perspective, we tend to overcook stuff and make food with Masalas but that’s the way most people like it. People who have had hot sauces or whatnot would probably find India much easier to adjust to as tastes might be similar to some extent.

If you want to socialize with young people, while discos are an option, meetup.com also is a good place. You can share your passions and many people have taken to it with gusto. We also have been hosting Comiccons in India, but I haven’t had the opportunity to attend them so far. India has a rich oral culture reach going back a few thousand years, but many of those who are practicing those reside more in villages rather than in cities. And while there have been attempts in the past to record them, most of those have come to naught as money runs out as there is no commercial viability to such projects, but that probably is for another day.

In the end, what I have shared is barely a drop in the ocean that is India. Come, have fun, explore, enjoy and invigorate yourself and others

James Valleroy: How I avoid sysadmin work

17 September, 2022 - 21:55

The server running this blog is a RockPro64 sitting in my living room. Besides WordPress (the blogging software), I run various other services on it:

  • Bepasty for sharing files,
  • Ikiwiki for taking notes,
  • Quassel for staying connected to IRC chat servers,
  • Radicale for synchronizing my calendar and tasks,
  • Shaarli for sharing bookmarks, and
  • Tiny Tiny RSS for reading other people’s blogs.

Most of these are for my personal use, and a few of them have pages for public viewing (linked at the top of this page).

Despite running a server, I don’t really consider myself to be a system administrator (or “sysadmin” for short). I generally try to avoid doing system administration work as much as possible. I think this is due to a number of reasons:

  • It is not part of my educational or professional background (which mainly consists of embedded software engineering).
  • I think that I lack the kind of discipline needed to be a sysadmin. For example, taking notes of the commands that you are running, or testing things before you do them on a production server. I’m likely to just run commands, and lose track of my notes.
  • And finally a type of laziness. I just don’t want to spend a lot of time and effort setting up or maintaining some services.

These reasons might be surprising to some, but they also suggest that an alternative approach:

  • I do have a server running in my home, but I don’t consider myself a sysadmin.
  • I have a different kind of discipline than the one I described above.
  • I’m willing to spend time and effort to improve things, but I want to do it in a different way than usual.

So my approach is this: if I want to run an additional service, enhance an existing one, or fix a bug, I don’t do those changes directly on my server. Instead, I will make (or suggest, or request) the change somewhere upstream of my server:

  • In the various Debian packages of the services I have installed.
  • In FreedomBox configuration and integration of those packages.
  • Or even directly to the upstream software development.

So basically my system administration task turns into a software development task instead. And (in my opinion) there are much better tools available for this: source control systems such as git, test suites and Continuous Integration (CI) pipelines, and code review processes. These make it easier to keep track of and understand the changes, and reduce the possibility of making a catastrophic mistake.

Besides this, there is one other major advantage to working upstream: the work is not just benefiting the server running in my home, but many others. Anyone who is using the same software or packages will also get the improvements or bug fixes. And likewise, I get to benefit from the work done by many other contributors.

Some final notes about this approach:

  • The software I am running is very standardized, specifically to what is available in Debian and FreedomBox. This limits the number of services available: for example, I could not easily run a Mastodon server with this approach. Also, I am not maintaining any custom configurations for my own services.
  • I am presenting these ideas as they occur to me, and specific to my situation. But there are similar concepts out there, for example “Infrastructure as Code”.

Jonathan Dowland: Introducing Red Hat UBI9 OpenJDK runtime images

17 September, 2022 - 13:11

A few weeks ago we shipped the first RHEL UBI9-based OpenJDK container images.

Universal Base Image (UBI) is an initiative where you can obtain, share and build upon official Red Hat container images without needing a Red Hat subscription. They're exactly the same base images that Red Hat products are built upon, composed entirely of Open Source software. Your precise rights are covered in the EULA.

Nowadays we offer two flavours of images, the original style (now termed builder images) and leaner runtime images, which have a subset of the JDK, and no build tools like Maven, etc.

We provide JDK11 and JDK17 for UBI9:

podman pull registry.access.redhat.com/ubi9/openjdk-11
podman pull registry.access.redhat.com/ubi9/openjdk-17
podman pull registry.access.redhat.com/ubi9/openjdk-11-runtime
podman pull registry.access.redhat.com/ubi9/openjdk-17-runtime

In comparison to the UBI8 images, we have done a lot of housecleaning. If you are curious as to exactly what we've changed, you can read the list of commits in this pull request.

Perhaps most notable is a change in the way we tune the JVM's memory. In our existing images up to now, partially for legacy reasons, the container start up scripts interrogate the cgroups (v1) virtual filesystems to establish any memory limits imposed on the running container. From that, they calculated a percentage of the memory limit as an absolute value, and then ask the JVM to limit its heap to that calculated sum via the -Xmx flag.

This dates back to a time when the JVM was not container aware. It now is, so for the UBI9 images we instead ask the JVM directly for the percentage we want using -XX:MaxRAMPercentage. We've also changed the default percentage from 50% to 80%, to better utilise the memory assigned to Java containers.

One big advantage of this is the JVM is cgroups (v2) aware, and the legacy start up scripts we wrote are not. But another is reducing the amount of code we run in the start up scripts, easing maintenance and simplifying the containers as much as possible.

Please give them a go, and let me (or us) know what you think!

Jonathan Dowland: things I'd like to 3D print, revisited

15 September, 2022 - 20:55

Back in November I wrote up a list of 25 things I would 3D print. Let's revisit the list and see how things have developed.

Stuff I won't print
  • Some kind of 45° leaning prong to dry bottles and flasks on
  • A tea tray and coasters
  • Small tins to keep loose-leaf tea in

It was pointed out to me that you can't safely print things to store food in with most materials, as their porous/layered nature facilitates the growth of bacteria. So, I'll rule out those items.

A vinyl record.

The size of the grooves in a vinyl record are smaller than conventional FDM printers can achieve.

Things I've printed

a replacement prop arm/foot for my computer keyboard

Someone has modelled the exact part I need, and it worked great: https://www.printables.com/model/59132-lenovo-keyboard-kt-1255

replacement toy bolt

replacement bits for an Early Learning Centre Build It Deluxe Set

I was amazed to find that someone has already modelled one of these, too, and it worked beautifully: https://www.printables.com/model/79243-elc-build-it-compatible-bolt

Little kids trinkets. Pacman ghosts

The Pacman ghost family so far

So far, I've tried to print useful, functional things, but on a few occasions I've printed a little Pacman ghost when testing printer calibration or similar. I've mostly used these models: https://www.printables.com/model/199425-pac-man-ghost-v2

  • Lego storage/sorters
  • DIY bits-and-bobs sorter/storage (nuts and bolts etc)

I learned about a Slicer setting sometimes called "Vase mode" and found this interesting system of modular drawers that are designed to be printed in Vase mode, so I gave them a go: https://www.printables.com/model/139570-fast-printing-modular-drawer-system-vase-mode I printed one and four drawers for it and gave it to my daughter. It might be used for sorting Lego, or possibly as a chest-of-drawers for a dolls house.

A free-standing inclined vinyl record display stand

https://www.printables.com/model/174711-vinyl-stand-for-kallax

A bracket to install a Gotek drive in my Amiga 500

https://www.thingiverse.com/thing:2745049

Summary

From my original list of 25 things to print, I've done 7 of them and determined 4 are not viable. The things in this list I've printed have been off-the-shelf models that other people have constructed. The things I haven't printed are designs I will do myself, which is one reason I haven't printed them yet: building your own designs is the hard part!

Joachim Breitner: rec-def: Dominators case study

15 September, 2022 - 15:27

More ICFP-inspired experiments using the rec-def library: In Norman Ramsey’s very nice talk about his Functional Pearl “Beyond Relooper: Recursive Translation of Unstructured Control Flow to Structured Control Flow”, he had the following slide showing the equation for the dominators of a node in a graph:

Norman Ramsey shows a formula

He said “it’s ICFP and I wanted to say the dominance relation has a beautiful set of equations … you can read all these algorithms how to compute this, but the concept is simple”.

This made me wonder: If the concept is simple and this formula is beautiful – shouldn’t this be sufficient for the Haskell programmer to obtain the dominator relation, without reading all those algorithms?

Before we start, we have to clarify the formula a bit: If a node is an entry node (no predecessors) then the big intersection is over the empty set, and that is not a well-defined concept. For these nodes, we need that big intersection to return the empty set, as entry nodes are not dominated by any other node. (Let’s assume that the entry nodes are exactly those with no predecessors.)

Let’s try, first using plain Haskell data structures. We begin by implementing this big intersection operator on Data.Set, and also a function to find the predecessors of a node in a graph:

import qualified Data.Set as S
import qualified Data.Map as M

intersections :: [S.Set Int] -> S.Set Int
intersections [] = S.empty
intersections xs = foldl1 S.intersection xs

preds :: [(Int,Int)] -> M.Map Int [Int]
preds edges = M.fromListWith (<>) $
    [ (v1, [])   | (v1, _) <- edges ] ++ -- to make the map total
    [ (v2, [v1]) | (v1, v2) <- edges ]

Now we can write down the formula that Norman gave, quite elegantly:

domintors1 :: [(Int,Int)] -> M.Map Int [Int]
domintors1 edges = fmap S.toList doms
  where
    doms :: M.Map Int (S.Set Int)
    doms = M.mapWithKey
        (\v vs -> S.insert v (intersections [ doms M.! v' | v' <- vs]))
        (preds edges)

Does this work? It seems it does:

ghci> domintors1 []
fromList []
ghci> domintors1 [(1,2)]
fromList [(1,[1]),(2,[1,2])]
ghci> domintors1 [(1,2),(1,3),(2,4),(3,4)]
fromList [(1,[1]),(2,[1,2]),(3,[1,3]),(4,[1,4])]

But – not surprising if you have read my previous blog posts – it falls over once we have recursion:

ghci> domintors1 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,^CInterrupted.

So let us reimplement it with Data.Recursive.Set.

rIntersections :: [R (S.Set Int)] -> R (S.Set Int)
rIntersections [] = rEmpty
rIntersections xs = foldl1 rIntersection xs

domintors2 :: [(Int,Int)] -> M.Map Int [Int]
domintors2 edges = fmap (S.toList . getR) doms
  where
    doms :: M.Map Int (R (S.Set Int))
    doms = M.mapWithKey
        (\v vs -> rInsert v (rIntersections [ doms M.! v' | v' <- vs]))
        (preds edges)

The hope is that we can simply replace the operations, and that now it can suddenly handle cyclic graphs as well. Let’s see:

ghci> domintors2 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,[3]),(4,[4])]

It does! Well, it does return a result… but it looks strange. Clearly node 3 and 4 are also dominated by 1, but the result does not reflect that.

But the result is a solution to Norman’s equation. Was the equation wrong? No, but we failed to notice that the desired solution is the largest, not the smallest. And Data.Recursive.Set calculates, as documented, the least fixed point.

What now? Until the library has code for R (Dual (S.Set a)), we can work around this by using the dual formula to calculate the non-dominators. To do this, we

  • use union instead of intersection
  • delete instead of insert,
  • S.empty, use the set of all nodes (which requires some extra plumbing)
  • subtract the result from the set of all nodes to get the dominators

and thus the code turns into:

rUnions' :: S.Set Int -> [R (S.Set Int)] -> R (S.Set Int)
rUnions' univ [] = mkR univ
rUnions' _ xs = foldl1 rUnion xs

domintors3 :: [(Int,Int)] -> M.Map Int [Int]
domintors3 edges = fmap (S.toList . S.difference nodes . getR) nonDoms
  where
    nodes = S.fromList [v | (v1,v2) <- edges, v <- [v1,v2]]
    nonDoms :: M.Map Int (R (S.Set Int))
    nonDoms = M.mapWithKey
        (\v vs -> rDelete v (rUnions' nodes [ nonDoms M.! v' | v' <- vs]))
        (preds edges)

And with this, now we do get the correct result:

ghci> domintors3 [(1,2),(1,3),(2,4),(3,4),(4,3)]
fromList [(1,[1]),(2,[1,2]),(3,[1,3]),(4,[1,4])]

We worked a little bit on how to express the “beautiful formula” to Haskell, but at no point did we have to think about how to solve it. To me, this is the essence of declarative programming.

Matthew Garrett: git signatures with SSH certificates

15 September, 2022 - 08:34
Last night I complained that git's SSH signature format didn't support using SSH certificates rather than raw keys, and was swiftly corrected, once again highlighting that the best way to make something happen is to complain about it on the internet in order to trigger the universe to retcon it into existence to make you look like a fool. But anyway. Let's talk about making this work!

git's SSH signing support is actually just it shelling out to ssh-keygen with a specific set of options, so let's go through an example of this with ssh-keygen. First, here's my certificate:

$ ssh-keygen -L -f id_aurora-cert.pub
id_aurora-cert.pub:
Type: ecdsa-sha2-nistp256-cert-v01@openssh.com user certificate
Public key: ECDSA-CERT SHA256:(elided)
Signing CA: RSA SHA256:(elided)
Key ID: "mgarrett@aurora.tech"
Serial: 10505979558050566331
Valid: from 2022-09-13T17:23:53 to 2022-09-14T13:24:23
Principals:
mgarrett@aurora.tech
Critical Options: (none)
Extensions:
permit-agent-forwarding
permit-port-forwarding
permit-pty

Ok! Now let's sign something:

$ ssh-keygen -Y sign -f ~/.ssh/id_aurora-cert.pub -n git /tmp/testfile
Signing file /tmp/testfile
Write signature to /tmp/testfile.sig

To verify this we need an allowed signatures file, which should look something like:

*@aurora.tech cert-authority ssh-rsa AAA(elided)

Perfect. Let's verify it:

$ cat /tmp/testfile | ssh-keygen -Y verify -f /tmp/allowed_signers -I mgarrett@aurora.tech -n git -s /tmp/testfile.sig
Good "git" signature for mgarrett@aurora.tech with ECDSA-CERT key SHA256:(elided)

Woo! So, how do we make use of this in git? Generating the signatures is as simple as

$ git config --global commit.gpgsign true
$ git config --global gpg.format ssh
$ git config --global user.signingkey /home/mjg59/.ssh/id_aurora-cert.pub

and then getting on with life. Any commits will now be signed with the provided certificate. Unfortunately, git itself won't handle verification of these - it calls ssh-keygen -Y find-principals which doesn't deal with wildcards in the allowed signers file correctly, and then falls back to verifying the signature without making any assertions about identity. Which means you're going to have to implement this in your own CI by extracting the commit and the signature, extracting the identity from the commit metadata and calling ssh-keygen on your own. But it can be made to work!

But why would you want to? The current approach of managing keys for git isn't ideal - you can kind of piggy-back off github/gitlab SSH key infrastructure, but if you're an enterprise using SSH certificates for access then your users don't necessarily have enrolled keys to start with. And using certificates gives you extra benefits, such as having your CA verify that keys are hardware-backed before issuing a cert. Want to ensure that whoever made a commit was actually on an authorised laptop? Now you can!

I'll probably spend a little while looking into whether it's plausible to make the git verification code work with certificates or whether the right thing is to fix up ssh-keygen -Y find-principals to work with wildcard identities, but either way it's probably not much effort to get this working out of the box.

comments

Joachim Breitner: rec-def: Program analysis case study

15 September, 2022 - 04:53

At this week’s International Conference on Functional Programming I showed my rec-def Haskell library to a few people. As this crowd appreciates writing compilers, and example from the realm of program analysis is quite compelling.

To Throw or not to throw

Here is our little toy language to analyze: It has variables, lambdas and applications, non-recursive (lazy) let bindings and, so that we have something to analyze, a way to throw and to catch exceptions:

type Var = String

data Exp
  = Var Var
  | Throw
  | Catch Exp
  | Lam Var Exp
  | App Exp Exp
  | Let Var Exp Exp

Given such an expression, we would like to know whether it might throw an exception. Such an analysis is easy to write: We traverse the syntax tree, remembering in the env which of the variables may throw an exception:

canThrow1 :: Exp -> Bool
canThrow1 = go M.empty
  where
    go :: M.Map Var Bool -> Exp -> Bool
    go env (Var v)       = env M.! v
    go env Throw         = True
    go env (Catch e)     = False
    go env (Lam v e)     = go (M.insert v False env) e
    go env (App e1 e2)   = go env e1 || go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env

The most interesting case is the one for Let, where we extend the environment env with the information about the additional variable env_bind, which is calculated from analyzing the right-hand side e1.

So far so good:

ghci> someVal = Lam "y" (Var "y")
ghci> canThrow1 $ Throw
True
ghci> canThrow1 $ Let "x" Throw someVal
False
ghci> canThrow1 $ Let "x" Throw (App (Var "x") someVal)
True
Let it rec

To spice things up, let us add a recursive let to the language:

data Exp
  | LetRec [(Var, Exp)] Exp

How can we support this new constructor in canThrow1? Let use naively follow the pattern used for Let: Calculate the analysis information for the variables in env_bind, extend the environment with that, and pass it down:

    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

Note that, crucially, we use env', and not just env, when analyzing the right-hand sides. It has to be that way, as all the variables are in scope in all the right-hand sides.

In a strict language, such a mutually recursive definition, where env_bind uses env' which uses env_bind is basically unthinkable. But in a lazy language like Haskell, it might just work.

Unfortunately, it works only as long as the recursive bindings are not actually recursive, or if they are recursive, they are not used:

ghci> canThrow1 $ LetRec [("x", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "y") someVal), ("y", Throw)] (Var "x")
True
ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "y")
True

But with genuine recursion, it does not work, and simply goes into a recursive cycle:

ghci> canThrow1 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
^CInterrupted.

That is disappointing! Do we really have to toss that code and somehow do an explicit fixed-point calculation here? Obscuring our nice declarative code? And possibly having to repeat work (such as traversing the syntax tree) many times that we should only have to do once?

rec-def to the rescue

Not with rec-def! Using R Bool from Data.Recursive.Bool instead of Bool, we can write the exact same code, as follows:

canThrow2 :: Exp -> Bool
canThrow2 = getR . go M.empty
  where
    go :: M.Map Var (R Bool) -> Exp -> R Bool
    go env (Var v)       = env M.! v
    go env Throw         = rTrue
    go env (Catch e)     = rFalse
    go env (Lam v e)     = go (M.insert v rFalse env) e
    go env (App e1 e2)   = go env e1 ||| go env e2
    go env (Let v e1 e2) = go env' e2
      where
        env_bind = M.singleton v (go env e1)
        env' = M.union env_bind env
    go env (LetRec binds e) = go env' e
      where
        env_bind = M.fromList [ (v, go env' e) | (v,e) <- binds ]
        env' = M.union env_bind env

And it works!

ghci> canThrow2 $ LetRec [("x", App (Var "x") someVal), ("y", Throw)] (Var "x")
False
ghci> canThrow2 $ LetRec [("x", App (Var "x") Throw), ("y", Throw)] (Var "x")
True

I find this much more pleasing than the explicit naive fix-pointing you might do otherwise, where you stabilize the result at each LetRec independently: Not only is all that extra work hidden from the programmer, but now also a single traversal of the syntax tree creates, thanks to the laziness, a graph of R Bool values, which are then solved “under the hood”.

The issue with x=x

There is one downside worth mentioning: canThrow2 fails to produce a result in case we hit x=x:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
^CInterrupted.

This is, after all the syntax tree has been processed and all the map lookups have been resolved, equivalent to

ghci> let x = x in getR (x :: R Bool)
^CInterrupted.

which also does not work. The rec-def machinery can only kick in if at least one of its function is used on any such cycle, even if it is just a form of identity (which I ought to add to the library):

ghci> idR x = rFalse ||| x
ghci> let x = idR x in getR (x :: R Bool)
False

And indeed, if I insert a call to idR in the line

        env_bind = M.fromList [ (v, idR (go env' e)) | (v,e) <- binds ]

then our analyzer will no longer stumble over these nasty recursive equations:

ghci> canThrow2 $ LetRec [("x", Var "x")] (Var "x")
False

It is a bit disappointing to have to do that, but I do not see a better way yet. I guess the def-rec library expects the programmer to have a similar level of sophistication as other tie-the-know tricks with laziness (where you also have to ensure that your definitions are productive and that the sharing is not accidentally lost).

Pages

Creative Commons License ลิขสิทธิ์ของบทความเป็นของเจ้าของบทความแต่ละชิ้น
ผลงานนี้ ใช้สัญญาอนุญาตของครีเอทีฟคอมมอนส์แบบ แสดงที่มา-อนุญาตแบบเดียวกัน 3.0 ที่ยังไม่ได้ปรับแก้