How to create a list of unique usernames from a list of given names and surnames.
Introduction
If you work with technology, you will often find you need to generate a list of unique usernames for importing into a system. You might only have a simple set of names to start from, and not fancy the job of manually checking all of them for duplicates (easy for tens, possible for hundreds, painful for thousands!). Here is how to do this automatically with just spreadsheet functions.
Concatenating a Username
Firstly, you need to know how you are going to be generating your usernames. In fact, these can be any strings you need to be unique, but we’ll stick to calling them usernames here. Imagine you have the following source data.
Given Name | Surname |
---|---|
John | Doe |
Peter | Doe |
Jane | Doe |
Joseph | Doe |
From this example list, we want to generate usernames. A typical format might be first letter of the given name, followed by the surname (all set as lowercase). The formula, using a concatenation, for this would be as follows:
We use the lower function to wrap our concatenation, which ensures that all the usernames are lowercase (regardless of how our source data is formatted). If you are dealing with a long, or variable length list, and you want to just copy your username formula right down to the bottom of the sheet, you can wrap the entire formula in an if statement to check that the row has values before attempting the concatenation. If your source data is poorly formatted and has erroneous whitespace (e.g. spaces) before or after the values, you can also use trim to clean up the source values before you operate on them.
Avoiding Duplicates
Given Name | Surname | Username |
---|---|---|
John | Doe | jdoe |
Peter | Doe | pdoe |
Jane | Doe | jdoe |
Joseph | Doe | jdoe |
After you have run the concatenation, you will end up with a table of data similar to the one above. This is great, apart from the duplicates. We need unique usernames. Ideally, we want our concatenation to step up, be a little more clever, and deal with these duplicates for us by appending a number to the end of each duplicate (to give us jdoe1, jdoe2 etc.).
To do this, we are going to need to break down the problem into a few steps. The first of these is looking for usernames that are duplicates of our current username. This requires us to add another column to the right of our username, to eventually contain our new unique username. Initially, though, we will use countif to work out how many duplicates of our username exist before our current row in the list.
Given Name | Surname | Username | Duplicates |
---|---|---|---|
John | Doe | jdoe | 0 |
Peter | Doe | pdoe | 0 |
Jane | Doe | jdoe | 1 |
Joseph | Doe | jdoe | 2 |
The complex part here is the use of indirect. It is used to build a dynamic range of all cells in the C column, before the current one, which then allows us to compare the values in previous rows to our one.
Once we have done that, we can concatenate together the duplicate count with our username to give a truly unique one.
Given Name | Surname | Username | Unique Username |
---|---|---|---|
John | Doe | jdoe | jdoe0 |
Peter | Doe | pdoe | pdoe0 |
Jane | Doe | jdoe | jdoe1 |
Joseph | Doe | jdoe | jdoe2 |
This satisfies our requirement, but the zeros at the end of the usernames are rather ugly. Duplicates aren’t that common, so we should really just append the number if it’s more than zero. We can do this in two ways, the first using a simple if statement and the second using a more elegant regexreplace.
A simple IF
An elegant REGEXREPLACE
The elegant solution uses a regular expression to check for a zero at the start of the count (^0) and replace it with an empty string if it is found. We can’t just replace all zeros because that would mean numbers like 10 and 100 would be reduced to 1, causing us new duplicate problems! Using either of these formulas, you will end up with nicely formatted unique usernames, like the ones in the table below.
Given Name | Surname | Username | Unique Username |
---|---|---|---|
John | Doe | jdoe | jdoe |
Peter | Doe | pdoe | pdoe |
Jane | Doe | jdoe | jdoe1 |
Joseph | Doe | jdoe | jdoe2 |
Using a single column
The only annoying part of our solution above is that it requires two columns, and therefore two steps, to get a unique list. Firstly we have to generate all the normal usernames; then we have to use these usernames to check for duplicates and create a unique list. This is quite easy to follow (making it a simple solution), but it is quite convoluted (making it an inelegant solution). Can we take this further and do everything in a simple column/formula?
The answer is yes, and the secret is to use a combination of query and arrayformula functions to generate a dynamic list of all the previous usernames. The arrayformula function runs the username concatenation for each row, and the query returns only the usernames previous to the current one, by using the limit instruction and our ROW()-1 statement in the same way as our earlier indirect call.
Once we have that list, it’s straightforward to use another query to return only the duplicates of our current username. Remember that because we are using a query on a query, and a query on the output of arrayformula, we need to use Col1, rather than the column heading letter, in our query.
As our data starts on row 2, we end up counting our current row too, so we have to adjust for that by taking one away from the result of our countif function. This gives us our duplicate count, as shown in the table below (without the need for the extra column).
Given Name | Surname | Duplicates |
---|---|---|
John | Doe | 0 |
Peter | Doe | 0 |
Jane | Doe | 1 |
Joseph | Doe | 2 |
Finally, we can wrap it all up with the same value checks, lowercase formatting and regex replacement that we have described earlier, giving us a single formula that does everything.
Given Name | Surname | Unique Username |
---|---|---|
John | Doe | jdoe |
Peter | Doe | pdoe |
Jane | Doe | jdoe1 |
Joseph | Doe | jdoe2 |
Success! Take a bow and enjoy the applause.
Related ► Google Calendars on the Web (01 Sep 2018), Protecting Data with Google Drive (14 Aug 2018), Dealing with confidential information in schools (24 Jul 2018)