Username specifications for Caliopen accounts

This document describes what constitutes a valid username when creating an account on a Caliopen instance.

Preamble

Caliopen users should be able to choose (almost) any username they want, as long as it can serve as the local-part of an email address.
The current specification is based on RFC 5322 (Internet Message Format), but it does not allow the "comment" and "quoted-string" lexical tokens. Consequently, some special characters are not allowed (see below).

Definition:

A username is the unique identifier a user chooses when creating an account within a Caliopen instance. It identifies the user's account: it is not necessarily the user's real name, email, or nickname. It is used as a credential to identify the user when logging in to Caliopen. The username is unique within a Caliopen instance.
The username can form the local-part of an email address within Caliopen's domain (e.g. username@caliopen.org).

NB: once a user has chosen a username, they will be able to create or add "identities", which consist of a first name, family name, email, etc.

Format:

Technical overview

From a technical point of view, a username is a UTF-8 encoded string of Unicode code points in which each character is represented by a single code point. For example, the character à (a with grave accent) should be encoded as U+00E0, and not as the sequence of the two code points U+0061 (a) and U+0300 (combining grave accent).
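
As an illustration of the single-code-point rule, the sketch below uses Python 3's standard unicodedata module to convert a decomposed sequence into its precomposed form (Unicode normalization form NFC). Applying NFC before validation is an assumption made for this example, not something this specification mandates, and the helper name to_precomposed is hypothetical.

```python
import unicodedata


def to_precomposed(text):
    """Return text in NFC form (hypothetical helper, not part of Caliopen)."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00e0"  # à as the single code point U+00E0
decomposed = "a\u0300"  # U+0061 (a) followed by U+0300 (combining grave accent)

print(len(precomposed))                           # 1
print(len(decomposed))                            # 2
print(to_precomposed(decomposed) == precomposed)  # True
```

Some letter and mark combinations have no precomposed form. NFC leaves the combining mark in place in that case, and the resulting string is rejected by the \p{M} exclusion of the regex given below.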
The regex engines used to validate the username string must be Unicode-aware: they must match against Unicode code points rather than bytes, and must support Unicode property escapes (e.g. \p{category}).
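
To make the \p{category} notation concrete, the following sketch prints the Unicode general category of a few characters using Python 3's standard unicodedata module. A property escape such as \p{M} matches every category whose code starts with M (Mn, Mc, Me), whereas \p{Lm} matches that exact category only. The sample characters are chosen purely for illustration.

```python
import unicodedata

# A letter, a combining mark, a modifier letter, a modifier symbol,
# a space separator and a control character.
samples = ["œ", "\u0300", "\u02b0", "`", " ", "\t"]

for ch in samples:
    # Prints the code point and its two-letter general category:
    # U+0153 Ll, U+0300 Mn, U+02B0 Lm, U+0060 Sk, U+0020 Zs, U+0009 Cc
    print("U+%04X %s" % (ord(ch), unicodedata.category(ch)))
```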

Regex:

Here is the general UTF-8 PCRE regex that implements the username format rules described above:

^[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]|[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]])){1,40}((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]])|([^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]))$
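
Read literally, the regex above accepts a string of 3 to 42 allowed characters in which a single dot may appear only between two allowed characters (never first, last, or doubled). The sketch below restates that reading using only Python 3's standard library, as a cross-check rather than a replacement for the regex. The function names are hypothetical, and results for rarely used code points may differ between Unicode database versions.

```python
import unicodedata

# Characters excluded explicitly by the regex, besides the dot:
# " , @ ` : ; < > [ \ ]
FORBIDDEN_CHARS = set('",@`:;<>[\\]')


def is_allowed_char(ch):
    """True if ch is in none of the excluded categories (C, M, Lm, Sk, Z)
    and is not one of the explicitly forbidden characters."""
    category = unicodedata.category(ch)
    if category[0] in "CMZ" or category in ("Lm", "Sk"):
        return False
    return ch not in FORBIDDEN_CHARS


def is_valid_username(name):
    """Hypothetical re-statement of the rules encoded by the regex."""
    parts = name.split(".")
    # An empty part means a leading, trailing or doubled dot.
    if "" in parts:
        return False
    chars = "".join(parts)
    # One first character, 1 to 40 middle characters, one last character.
    if not 3 <= len(chars) <= 42:
        return False
    return all(is_allowed_char(ch) for ch in chars)


for candidate in ["John.Dœuf", "a.b.c", "john..doe", ".john", "john doe", "jo"]:
    print(candidate, is_valid_username(candidate))
```

Under this reading, only the first two candidates are accepted; the others fail because of a doubled dot, a leading dot, a space, and a length below three characters respectively.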

Language implementations:

Go

The regex can be used as-is with the standard regexp package (which implements RE2 syntax rather than PCRE, but accepts this pattern).

package main

import (
    "fmt"
    "regexp"
)

// usernameRe is compiled once at start-up; MustCompile panics if the pattern is invalid.
var usernameRe = regexp.MustCompile(`^[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]|[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]])){1,40}((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]])|([^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\x{0022},@\x{0060}:;<>[\\\]]))$`)

func main() {
    username := `John.Dœuf`

    // The pattern is anchored with ^ and $, so MatchString tests the whole string.
    if usernameRe.MatchString(username) {
        fmt.Println("username is valid")
    } else {
        fmt.Println("INVALID username")
    }
}

Python

Install the third-party "regex" package (pip install regex): the standard re module does not support \p{...} property escapes.
Replace the \x{0000} code point notation with \u0000.

Note: on Python 2.7, prefix the regex with ur"" and the test string with u"".

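# coding=utf8
# Written for Python 3; on Python 2.7 use ur"..." for the pattern and u"..." for the test string.

import regex

pattern = r"^[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\u0022,@\u0060:;<>[\\\]]((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\u0022,@\u0060:;<>[\\\]]|[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\u0022,@\u0060:;<>[\\\]])){1,40}((\.[^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\u0022,@\u0060:;<>[\\\]])|([^\p{C}\p{M}\p{Lm}\p{Sk}\p{Z}.\u0022,@\u0060:;<>[\\\]]))$"

username = "John.Dœuf"

# fullmatch requires the whole string to match; with search, "$" would also
# accept a username followed by a trailing newline.
match = regex.fullmatch(pattern, username)

if match is not None:
    print("username is valid")
else:
    print("INVALID username")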

JavaScript

The built-in ECMAScript 6 regex implementation does not support Unicode property escapes. There are currently two options: