Like a database query join, how to denormalise tabular data to create expanded rows, ideal for mail-merges and data imports.
Introduction
Before importing data into a system, or creating a mail-merge, you might need to denormalise your data. In simple terms, this means taking a more efficient data representation and expand it into a series of (often partially repeating) rows.
This is essential because whilst it is easy for you to read a compressed table of data, it isn’t good fodder for feeding into a mail-merge routine or another system. For this, you will need a single row of data for each ‘thing’ you are representing - e.g. label you want to print.
Source Data
Imagine we need to print book labels, with a single label for each book. Every individual will need one book for each subject, and each of these books will need a label. If we have 30 names, and 10 subjects, we will need 300 labels printed. To do this with a conventional mail-merge, we will need 300 rows in our source spreadsheet.
We might start with a simple table containing the names of each individual and a list of all the subjects (assuming everyone takes the same subjects).
Names [A] | Group [B] | Subjects [C] |
---|---|---|
John Doe | 6A | Maths |
Jane Doe | 6A | English |
Jimmy Doe | 6B | Science |
History | ||
Geography | ||
… |
We now need to turn this into our de-normalised rows, one row for each name (Column A) and subject (Column C). We can do this with a couple of clever little formulas.
Denormalising
The first of these formulas will duplicate the names so that there are multiple rows (matching the number of subjects in column C) for each name. We start with the second row (A2:A) because the first row contains our column headings.
To pull across the group codes for each name, we could use a similar formula, or we could fall back to using a simple index/match approach (here we are assuming the output data begins on column E).
Finally, we use a similar formula to above to create the list of subjects. But, instead of being the same value repeated a certain number of times, it is the entire list of subjects, repeated for each name.
Output Rows
Using these formulas on our example data from above (3 names and 5 subjects), we will end up with 15 output rows that look something like this:
Names [D] | Group [E] | Subjects [F] |
---|---|---|
John Doe | 6A | Maths |
John Doe | 6A | English |
John Doe | 6A | Science |
John Doe | 6A | History |
John Doe | 6A | Geography |
Jane Doe | 6A | Maths |
Jane Doe | 6A | English |
Jane Doe | 6A | Science |
Jane Doe | 6A | History |
Jane Doe | 6A | Geography |
Jimmy Doe | 6B | Maths |
Jimmy Doe | 6B | English |
Jimmy Doe | 6B | Science |
Jimmy Doe | 6B | History |
Jimmy Doe | 6B | Geography |
You can see the two patterns here, 5 (the number of subjects) rows for each of the names, and the 5 subjects repeated for each of the 3 names.
This data can then be used to create our label mailmerge. Best of all, if we need to add extra subjects or names to our source data; our output rows will automatically update to reflect the new changes!
Why the ✏?
No reason at all! You just need to use any character or symbol that will not appear in your values at all! The pencil unicode symbol is just a memorable one chosen to meet this requirement.
Related ► Google Calendars on the Web (01 Sep 2018), Protecting Data with Google Drive (14 Aug 2018), Dealing with confidential information in schools (24 Jul 2018)