Data journalists should know how to use Excel or the free spreadsheet software Google Sheets to clean and process data.
Some data sets reveal a story after nothing more than sorting or a pivot table, while others require some maths.
In this post, you will find some examples of basic formulas to analyse data and find stories in spreadsheets.
Simple calculations with spreadsheets
To follow the exercises, open this spreadsheet in Google Drive and copy it. If you prefer, you can download the data and work in Excel or Open Office.
The data show the salaries of managers in an imaginary company.
Before getting down to work, we have to freeze the first row of the data set. Although the spreadsheet is small, it is easier to work with the first row frozen.
Follow the instructions in the picture.
The first thing we want to know is the total amount of money the managers earn. To do this, we have to add up all the salaries.
Go down to the first cell that is free in the data set and write the following formula:
As you can see, when you type =SUM and the opening bracket, the program shows you the values the formula needs.
You can type the cell references manually or select all the cells with the Shift key held down.
TASK: Repeat the same exercise for the salaries in 2015. Compare your results with the picture.
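If you want to cross-check a total outside the spreadsheet, the same addition can be sketched in a few lines of Python. The figures below are invented for illustration and are not the values in the exercise spreadsheet; every manager name other than Phillip is hypothetical.

```python
# Hypothetical 2014 salaries: invented figures, not the exercise data.
salaries_2014 = {
    "Phillip": 50000,
    "Anna": 42000,
    "Marta": 42000,
    "John": 61000,
}

# Equivalent of =SUM over the salary column.
total_2014 = sum(salaries_2014.values())
print(total_2014)
```

If the spreadsheet total and the script total disagree, the usual culprit is a cell stored as text rather than as a number.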
Now we want to know the difference between the salaries in 2014 and in 2015 in absolute numbers. We will calculate the percentage later on.
Create a new column with a name that helps you identify the calculation. The formula we need is the following:
= (Cell with the salary of 2015 – cell with the salary of 2014)
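As a rough illustration of the same subtraction, here is a minimal Python sketch. The salaries and the names other than Phillip are made up for the example and do not come from the exercise spreadsheet.

```python
# Hypothetical salaries: invented figures, not the exercise data.
salary_2014 = {"Phillip": 50000, "Anna": 42000, "John": 61000}
salary_2015 = {"Phillip": 52000, "Anna": 42000, "John": 58000}

# Absolute difference per manager: salary in 2015 minus salary in 2014.
difference = {
    name: salary_2015[name] - salary_2014[name]
    for name in salary_2014
}

for name, diff in difference.items():
    print(name, diff)
```

A negative number means the salary went down, and zero means it did not change.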
TASK: Repeat the exercise for each salary. When you have finished, your spreadsheet should look like this:
Multiplication and division
The multiplication sign is the asterisk (*), and the formula is:
= (cell * number)
Division uses the slash (/), and the formula is:
= (cell / number)
We will practise these operators later, because we will use them as part of other formulas.
Average or mean
The Collins dictionary defines average as:
“The result obtained by adding the numbers or quantities in a set and dividing the total by the number of members in the set.”
In our example, the average, also called the mean, will be the sum of all the salaries divided by the number of salaries we have.
In the spreadsheet you only have to write the following formula:
Does your result have decimals? If so, the cell format is not appropriate.
Select all the numbers and change the cell format to currency without decimals as the picture shows.
TASK: Calculate the average of the salaries in 2015. After that, compare your result with the picture.
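The definition above translates directly into code. The following Python sketch, which uses invented salaries rather than the exercise data, computes the mean both by hand and with the standard library, then prints it without decimals, much as the currency format does in the spreadsheet.

```python
from statistics import mean

# Hypothetical salaries: invented figures, not the exercise data.
salaries = [50000, 42000, 42000, 61000]

# The definition: the sum of the values divided by how many there are.
average_by_hand = sum(salaries) / len(salaries)

# The same calculation with Python's standard library.
average = mean(salaries)

# Display with a thousands separator and no decimals.
print(f"{average:,.0f}")
```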
The mode is the value that appears most often in the data set.
In this example, the mode is useful to know, since it is the most common salary among the managers.
Write the following formula:
TASK: Calculate the mode for the salaries in column C. Remember to change the format of the cell to currency.
Your result should look like this:
Why do you think there is an error? The reason is simple. In column C, no number is repeated and, therefore, there is no mode for the list.
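The behaviour described above, a mode when a value repeats and an error when none does, can be imitated in Python. This is only a sketch: `spreadsheet_style_mode` is a hypothetical helper written for this example, the salaries are invented, and returning `None` stands in for the spreadsheet error. When several values tie, the helper returns the one that appears first, which is an assumption of this sketch.

```python
from collections import Counter


def spreadsheet_style_mode(values):
    """Return the most frequent value, or None when no value repeats."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    # A value that occurs only once is not a mode in the spreadsheet sense.
    return value if count > 1 else None


# Hypothetical salaries: invented figures, not the exercise data.
salaries_with_repeat = [50000, 42000, 42000, 61000]
salaries_all_unique = [52000, 43000, 44000, 58000]

print(spreadsheet_style_mode(salaries_with_repeat))
print(spreadsheet_style_mode(salaries_all_unique))
```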
The median is the value that sits exactly in the middle of the sorted data set. When the data set has an even number of values, the median is the average of the two middle values.
The formula is the following:
TASK: Calculate the median of the salaries in 2015. Your result should look like this:
As you can see in the example, the average, the mode and the median show different values, and that could be a lead to follow and find a story.
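To see why the mean and the median can tell different stories, consider this short Python sketch. The salaries are invented and include one deliberately large value; they are not the figures from the exercise spreadsheet.

```python
from statistics import mean, median

# Hypothetical salaries with one very high earner (invented figures).
salaries = [40000, 42000, 42000, 45000, 120000]

average = mean(salaries)    # pulled upwards by the single large salary
middle = median(salaries)   # the middle value of the sorted list

print(average, middle)

# With an even number of values, the median is the average of the two
# middle values of the sorted list.
even_median = median([40000, 42000, 44000, 50000])
print(even_median)
```

A mean far above the median usually means a few large values dominate the total, and that is often where the story is.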
Percentage over the total
Now we want to know what percentage of the total each salary represents.
Create a new column, and in the first cell write a name that helps you identify the content. Now, write the following formula:
The formula of the example is: Phillip’s salary in 2014 (cell B2) divided by the total of the salaries the same year (cell B9).
To convert the result to a percentage, click the % button in the menu.
In this calculation, cell B9 will always be the same. To keep it fixed, add a dollar sign between the letter and the number of the cell that holds the total, so that B9 becomes B$9.
The dollar sign tells the spreadsheet not to change the row number of this cell when the formula is copied down.
Press Enter and hover over the small blue square at the bottom-right corner of your cell. When a cross appears, click and drag down to row 8. This should be your result:
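The idea of one fixed total shared by every row can be sketched in Python as well. In this example the total is computed once and reused as the divisor for each salary, which plays the same role as the locked cell. The figures and the names other than Phillip are invented for illustration.

```python
# Hypothetical 2014 salaries: invented figures, not the exercise data.
salaries_2014 = {
    "Phillip": 50000,
    "Anna": 42000,
    "Marta": 42000,
    "John": 66000,
}

# Computed once and reused for every row, like the locked total cell.
total = sum(salaries_2014.values())

# Each salary divided by the same total.
share = {name: salary / total for name, salary in salaries_2014.items()}

for name, fraction in share.items():
    # The .1% format multiplies by 100 and appends the percent sign.
    print(f"{name}: {fraction:.1%}")
```

A quick sanity check is that the shares should add up to 100%.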
TASK: Repeat the same operation for the salaries in 2015. This is the result you should get:
Our company has made a profit and has raised almost all the managers' salaries. We now want to know how the salaries have changed in percentage terms, so we need to calculate the percentage change.
Data journalist Crina Boros showed me this formula during the CIJ Summer conference in London in 2015. She has a trick to remember the formula. Say NOO!
(New value – Old value) / Old value
In the spreadsheet, write the formula as the picture shows:
The example subtracts Phillip’s salary in 2014 from his salary in 2015 and divides the result by his salary in 2014.
The result needs to be converted to a percentage as we did before.
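The order of operations matters in this formula: the subtraction has to happen before the division, so it needs brackets. The Python sketch below shows the calculation with invented salaries and, for contrast, what happens when the brackets are left out. `percentage_change` is a hypothetical helper name chosen for this example.

```python
def percentage_change(new, old):
    """(New value - Old value) / Old value, returned as a fraction."""
    return (new - old) / old


# Hypothetical salaries: invented figures, not the exercise data.
old_salary = 50000
new_salary = 52000

change = percentage_change(new_salary, old_salary)
print(f"{change:.1%}")

# Without brackets, the division runs first and the result is meaningless.
without_brackets = new_salary - old_salary / old_salary
print(without_brackets)
```

The same rule applies in the spreadsheet: without brackets around the subtraction, the division is calculated first.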
Your final spreadsheet should look like this:
These are only a few formulas that journalists can use to find stories in spreadsheets, but there are many more. Which ones do you use? Leave your comments here or contact me at @Carlapedret.