Generate Unique Usernames using Spreadsheet Functions

24 Oct 2017
functions 6 minute read

How to create a list of unique usernames from a list of given names and surnames.

Introduction

If you work with technology, you will often find you need to generate a list of unique usernames for importing into a system. You might only have a simple set of names to start from, and not fancy the job of manually checking all of them for duplicates (easy for tens, possible for hundreds, painful for thousands!). Here is how to do this automatically with just spreadsheet functions.

Concatenating a Username

Firstly, you need to know how you are going to be generating your usernames. In fact, these can be any strings you need to be unique, but we’ll stick to calling them usernames here. Imagine you have the following source data.

Given NameSurname
JohnDoe
PeterDoe
JaneDoe
JosephDoe

From this example list, we want to generate usernames. A typical format might be first letter of the given name, followed by the surname (all set as lowercase). The formula, using a concatenation, for this would be as follows:

LOWER(CONCAT(Left(A2,1),B2))

We use the lower function to wrap our concatenation, which ensures that all the usernames are lowercase (regardless of how our source data is formatted). If you are dealing with a long, or variable length list, and you want to just copy your username formula right down to the bottom of the sheet, you can wrap the entire formula in an if statement to check that the row has values before attempting the concatenation. If your source data is poorly formatted and has erroneous whitespace (e.g. spaces) before or after the values, you can also use trim to clean up the source values before you operate on them.

IF(AND(LEN(A2)>0,LEN(B2)>0), LOWER(CONCAT(Left(TRIM(A2),1),TRIM(B2))), "")

Avoiding Duplicates

Given NameSurnameUsername
JohnDoejdoe
PeterDoepdoe
JaneDoejdoe
JosephDoejdoe

After you have run the concatenation, you will end up with a table of data similar to the one above. This is great, apart from the duplicates. We need unique usernames. Ideally, we want our concatenation to step up, be a little more clever, and deal with these duplicates for us by appending a number to the end of each duplicate (to give us jdoe1, jdoe2 etc.).

To do this, we are going to need to break down the problem into a few steps. The first of these is looking for usernames that are duplicates of our current username. This requires us to add another column to the right of our username, to eventually contain our new unique username. Initially, though, we will use countif to work out how many duplicates of our username exist before our current row in the list.

COUNTIF(INDIRECT("C1:C"&(ROW()-1)),C2)
Given NameSurnameUsernameDuplicates
JohnDoejdoe0
PeterDoepdoe0
JaneDoejdoe1
JosephDoejdoe2

The complex part here is the use of indirect. It is used to build a dynamic range of all cells in the C column, before the current one, which then allows us to compare the values in previous rows to our one.

INDIRECT("C1:C"&(ROW()-1))

Once we have done that, we can concatenate together the duplicate count with our username to give a truly unique one.

CONCAT(C2,COUNTIF(INDIRECT("C1:C"&(ROW()-1)),C2))
Given NameSurnameUsernameUnique Username
JohnDoejdoejdoe0
PeterDoepdoepdoe0
JaneDoejdoejdoe1
JosephDoejdoejdoe2

This satisfies our requirement, but the zeros at the end of the usernames are rather ugly. Duplicates aren’t that common, so we should really just append the number if it’s more than zero. We can do this in two ways, the first using a simple if statement and the second using a more elegant regexreplace.

A simple IF
CONCAT(C2,IF(COUNTIF(INDIRECT("C1:C"&(ROW()-1)),C2)>0, COUNTIF(INDIRECT("C1:C"&(ROW()-1)),C2), ""))
An elegant REGEXREPLACE
CONCAT(C2,REGEXREPLACE(TEXT(COUNTIF(INDIRECT("C1:C"&(ROW()-1)),C2),"0"),"^0", ""))

The elegant solution uses a regular expression to check for a zero at the start of the count (^0) and replace it with an empty string if it is found. We can’t just replace all zeros because that would mean numbers like 10 and 100 would be reduced to 1, causing us new duplicate problems! Using either of these formulas, you will end up with nicely formatted unique usernames, like the ones in the table below.

Given NameSurnameUsernameUnique Username
JohnDoejdoejdoe
PeterDoepdoepdoe
JaneDoejdoejdoe1
JosephDoejdoejdoe2

Using a single column

The only annoying part of our solution above is that it requires two columns, and therefore two steps, to get a unique list. Firstly we have to generate all the normal usernames; then we have to use these usernames to check for duplicates and create a unique list. This is quite easy to follow (making it a simple solution), but it is quite convoluted (making it an inelegant solution). Can we take this further and do everything in a simple column/formula?

The answer is yes, and the secret is to use a combination of query and arrayformula functions to generate a dynamic list of all the previous usernames. The arrayformula function runs the username concatenation for each row, and the query returns only the usernames previous to the current one, by using the limit instruction and our ROW()-1 statement in the same way as our earlier indirect call.

QUERY(ARRAYFORMULA(LOWER(CONCAT(Left(A$2:A,1),B$2:B))), "select Col1 limit "&(Row()-1), 0)

Once we have that list, it’s straightforward to use another query to return only the duplicates of our current username. Remember that because we are using a query on a query, and a query on the output of arrayformula, we need to use Col1, rather than the column heading letter, in our query.

QUERY(QUERY(ARRAYFORMULA(LOWER(CONCAT(Left(A$2:A,1),B$2:B))), "select Col1 limit "&(Row()-1), 0), "select Col1 where Col1='"&LOWER(CONCAT(Left(A2,1),B2))&"'", 0)

As our data starts on row 2, we end up counting our current row too, so we have to adjust for that by taking one away from the result of our countif function. This gives us our duplicate count, as shown in the table below (without the need for the extra column).

COUNTIF(QUERY(QUERY(ARRAYFORMULA(LOWER(CONCAT(Left(A$2:A,1),B$2:B))), "select Col1 limit "&(Row()-1), 0), "select Col1 where Col1='"&LOWER(CONCAT(Left(A2,1),B2))&"'", 0), "<>") - 1
Given NameSurnameDuplicates
JohnDoe0
PeterDoe0
JaneDoe1
JosephDoe2

Finally, we can wrap it all up with the same value checks, lowercase formatting and regex replacement that we have described earlier, giving us a single formula that does everything.

IF(AND(LEN(A2)>0,LEN(B2)>0), LOWER(CONCATENATE(Left(A2,1),B2,REGEXREPLACE(TEXT(COUNTIF(QUERY(QUERY(ARRAYFORMULA(LOWER(CONCAT(Left(A$2:A,1),B$2:B))), "select Col1 limit "&(Row()-1), 0), "select Col1 where Col1='"&LOWER(CONCAT(Left(A2,1),B2))&"'", 0), "<>")-1,"0"), "^0", ""))),"")
Given NameSurnameUnique Username
JohnDoejdoe
PeterDoepdoe
JaneDoejdoe1
JosephDoejdoe2

Success! Take a bow and enjoy the applause.

Tagged Google, Sheet

Related Google Calendars on the Web (01 Sep 2018), Protecting Data with Google Drive (14 Aug 2018), Dealing with confidential information in schools (24 Jul 2018)