Snippet: Correctly capitalize names in PHP #

5 minute read    Published: 2019-07-17

When building websites with any kind of user registration, it's fascinating what people enter in name fields. no casing, Random CASING, a dozen spaces    between     words, or nospacingatall. Seeing this always irritates me, I'd fancy things to nice and be consistent.

It appears that correctly normalizing name capitalization is an unsolvable puzzle. There is no consistency in name casing, or for any kind of name formatting for that matter. See Falsehoods programmers believe about names.

I always wonder how big social networks handle this.

Okay, so this isn't solvable. But at least I could try to make it better. I came across this wonderful PHP snippet for name capitalization a while back, but it had a few shortages. It didn't correctly case with just a person's last name for instance (needed when storing first/last names separate). I love challenges like this and decided to improve, here is my take on it:

<?php

/**
 * Normalize the given (partial) name of a person.
 *
 * - re-capitalize, take last name inserts into account
 * - remove excess white spaces
 *
 * Snippet from: https://timvisee.com/blog/snippet-correctly-capitalize-names-in-php
 *
 * @param string $name The input name.
 * @return string The normalized name.
 */
function name_case($name) {
    // A list of properly cased parts
    $CASED = [
      "O'", "l'", "d'", 'St.', 'Mc', 'the', 'van', 'het', 'in', "'t", 'ten',
      'den', 'von', 'und', 'der', 'de', 'da', 'of', 'and', 'the', 'III', 'IV',
      'VI', 'VII', 'VIII', 'IX',
    ];

    // Trim whitespace sequences to one space, append space to properly chunk
    $name = preg_replace('/\s+/', ' ', $name) . ' ';

    // Break name up into parts split by name separators
    $parts = preg_split('/( |-|O\'|l\'|d\'|St\\.|Mc)/i', $name, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Chunk parts, use $CASED or uppercase first, remove unfinished chunks
    $parts = array_chunk($parts, 2);
    $parts = array_filter($parts, function($part) {
            return sizeof($part) == 2;
        });
    $parts = array_map(function($part) use($CASED) {
            // Extract to name and separator part
            list($name, $separator) = $part;

            // Use specified case for separator if set
            $cased = current(array_filter($CASED, function($i) use($separator) {
                return strcasecmp($i, $separator) == 0;
            }));
            $separator = $cased ? $cased : $separator;

            // Choose specified part case, or uppercase first as default
            $cased = current(array_filter($CASED, function($i) use($name) {
                return strcasecmp($i, $name) == 0;
            }));
            return [$cased ? $cased : ucfirst(strtolower($name)), $separator];
        }, $parts);
    $parts = array_map(function($part) {
            return implode($part);
        }, $parts);
    $name = implode($parts);

    // Trim and return normalized name
    return trim($name);
}
Tap here to expand a better version for use with Laravel.

This variant is more concise and uses a function approach using Laravel collections:

<?php

/**
 * Normalize the given (partial) name of a person.
 *
 * - re-capitalize, take last name inserts into account
 * - remove excess white spaces
 *
 * Snippet from: https://timvisee.com/blog/snippet-correctly-capitalize-names-in-php
 *
 * @param string $name The input name.
 * @return string The normalized name.
 */
function name_case($name) {
    // A list of properly cased parts
    $CASED = collect([
        "O'", "l'", "d'", 'St.', 'Mc', 'the', 'van', 'het', 'in', "'t", 'ten',
        'den', 'von', 'und', 'der', 'de', 'da', 'of', 'and', 'the', 'III', 'IV',
        'VI', 'VII', 'VIII', 'IX',
    ]);

    // Trim whitespace sequences to one space, append space to properly chunk
    $name = preg_replace('/\s+/', ' ', $name) . ' ';

    // Break name up into parts split by name separators
    $parts = preg_split('/( |-|O\'|l\'|d\'|St\\.|Mc)/i', $name, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Chunk parts, use $CASED or uppercase first, remove unfinished chunks
    $name = collect($parts)
        ->chunk(2)
        ->filter(function($part) {
            return $part->count() == 2;
        })
        ->mapSpread(function($name, $separator = null) use($CASED) {
            // Use specified case for separator if set
            $cased = $CASED->first(function($i) use($separator) {
                return strcasecmp($i, $separator) == 0;
            });
            $separator = $cased ?? $separator;

            // Choose specified part case, or uppercase first as default
            $cased = $CASED->first(function($i) use($name) {
                return strcasecmp($i, $name) == 0;
            });
            return [$cased ?? ucfirst(strtolower($name)), $separator];
        })
        ->map(function($part) {
            return implode($part);
        })
        ->join('');

    // Trim and return normalized name
    return trim($name);
}

Of course, this function fulfills the truth table presented with the original snippet:

InputBecomes
michael o’carrolMichael O’Carrol
lucas l’amourLucas l’Amour
george d’onofrioGeorge d’Onofrio
william stanley iiiWilliam Stanley III
UNITED STATES OF AMERICAUnited States of America
t. von lieres und wilkauT. von Lieres und Wilkau
paul van der knaapPaul van der Knaap
jean-luc picardJean-Luc Picard
JOHN MCLARENJohn McLaren
hENRIC vIIIHenric VIII
VAsco da GAmaVasco da Gama

It neatly passes additional previously problematic situations as well. Brilliant!

InputOriginal snippetThis snippet
van der knaapVan der Knaapvan der Knaap
l’amourL’Amourl’Amour
von lieres    UND wilkauVon Lieres    und Wilkauvon Lieres und Wilkau

Normalizing using a function like this makes it impossible for some to enter their name as formatted on their ID. Knowing the audience you serve, this is a risk you may be able to accept but it will never be perfect. You could always use this to suggest formatting improvements to the user, allowing them to choose what's right.


Using numbers to identify people would be a more rational choice, except when you're called Pi. /s

Feel free to use and share.

Special thanks to Armand Niculescu, for the snippet this was inspired by!

Categories

Tags

Copyright © Tim Visée 2019
Source