Table of contents

This page contains best practice solutions to common tasks, such as modelling time, managing multiple versions of data and querying data in worksheets.

Time dimension

Most models will need one or more time dimensions to represent years, months quarters and days. CubeWeaver is a very versatile tool. There are always multiple ways to model the same concept. The following sections describe several ways of working with time:

Multiple independent flat time dimensions

The simplest way to represent time is to create separate flat lists for years, quarters, months and days. Each worksheet gets a set of dimensions depending on its granularity.

Advantages
- Easy to set up: just create a list with 12 items for months and another one for years
- Simple aggregation without using joins
Disadvantages
- Time based computations gets tricky: you will have to navigate both the year and the month dimensions, if you need to go two months back
- Working with quarters can be counterintuitive: we end up having only three months

A sample model which represents time using separate years and months dimensions can be downloaded below. A single formula which lags the series by two months shows why time based computations are not easy.

Independent time dimensions.cwa

A single hierarchical time dimension

Time can also be represented as a hierarchy of years, quarters, months and days. To create a hierarchy the the base lists like years and months are created first:

Years
- 2021
- 2022
- ...
Months
- January
- February
- ...

After that the link lists for year-month or year-quarter-month hierarchies are created:

Year-Months
- 2021 - January
- 2021 - February
- ...
- 2022 - January
- ...

The cross join list item auto-generation feature can be used to automatically create months for all years.

Advantages
- Time based computations are much more simple: ['Year-month'.prev()] will always work as expected
Disadvantages
- It takes a bit more time to setup the dimensions
- The join function has to be used for aggregation

A sample model which represents time as a hierarchy of years, quarters and months can be downloaded below. The model also contains a formula for computing yearly sums and a formula to access previous time periods.

Hierarchical time dimension.cwa

Accessing time from formulas

The standard spreadsheet date(year, month, day) function is used to construct a date from years, months and days. There are multiple ways to attach a numeric date to list items:

Ids with numeric values can be used to store the year and months numbers. The value() function can be used to access the id of a list item as a numeric value. For example: date(value(Year), value(Month), 1)
A better option is to create separate numeric columns like YEAR_NUM and MONTH_NUM and use them in formulas like: date(Year.YEAR_NUM, Month.MONTH_NUM, 1) or for hierarchies: date('Year-Month'.YEAR.YEAR_NUM, 'Year-Month'.MONTH.MONTH_NUM, 1)

The following example illustrates the latter method:

Date function.cwa

Queries and aggregation

It is often necessary to compute a sum over a subset of cells. The sumif function is used by spreadsheet models to solve such cases. In addition to the sumif function CubeWeaver offers several other ways to query data.

We will use a human resources model as an example. Let us say we have a list of employees. An employee has following properties: - Work location: Berlin or Paris - Department: R&D or Sales - Start year and end year of the work contract - Salary

We have the choice to store each of the properties as an attribute of the 'Employee' list or as a cell in one of the worksheets. Here are some general recommendations: - References to other list items that do not change over time (or from scenario to scenario) should be stored as an attribute - Numeric or time values should be stored in worksheets - Time-dependant and frequently updated information should be stored in worksheets

According to the recommendations above we will store the location and department info as an attribute. Salary and start/end dates will be stored in the 'Payroll' worksheet.

There are multiple ways to query the data depending on how the search parameters are stored:

1) Use the join() function inside a cell reference to query the data based on list item attribute
2) There are two options when working with data stored in worksheet cells:
- a) Create additional helper worksheets or measures and use the if(condition, value, 0) function to select the data which meets the condition first and use the sum() function to compute the sum in the second step. This is the preferred method.
- b) Compute the sum directly using a combination of map_cells, zip_values, filter_values and sum functions. These options should only be used by advanced users. It is much harder to understand what went wrong without seeing the intermediate results. This method will not work if a query scans more than one dimension at once. The functions are not compatible with Excel export.
3) Use a combination of 1) and 2) if to perform a combined query over parameters stored as attributes and as worksheet cells

Here is a list of some example queries with methods to implement them (a complete sample model can be downloaded below):

Compute the sum of salaries by department and location using method 1 (join by department and location):

sum(Payroll.['Work location'.join(Employee), Department.join(Employee)])
Compute the sum of salaries in the R&D department in a worksheet without the department dimension using method 1 (join using a constant):

sum(Payroll.[Department:1.join(Employee)])
Compute the sum of salaries of employees with an active work contract for each year using method 2a (with a helper sheet and two formulas):

if(and(Payroll.[Measures:02]<=value(Year), Payroll.[Measures:03]>=value(Year)), Payroll.[], 0)

sum('3. Active salaries (helper)'.[])
Compute the sum of salaries of employees with an active work contract for each year using method 2b (without a helper sheet, single formula):

sum(zip_values(Payroll.[], Payroll.[Measures:02], Payroll.[Measures:03], if(and($value2<=value(Year), $value3>=value(Year)), $value1, 0)))
Compute the sum of salaries of employees with an active work contract per department and year using method 3 (1 + 2a) (we will reuse the helper sheet from example 3.):

sum('3. Active salaries 2a (helper)'.[Department.join(Employee)])

A complete model with all 5 example queries:

Date function.cwa

Scenario and version management

When we have a working computational model we can use it to simulate multiple scenarios with different input parameters.

In most cases it is a bad idea to copy the whole model for several reasons: additional effort of keeping multiple models in sync, it is hard to transfer data between models, it is harder to compare the results.

The preferred solution is to create a "Scenario" or "Version" dimension and add it to all worksheets that have to be simulated. A good practice is to create a "Base" or "Primary" version and several more items for different scenarios. The data can be copied from the base case using formulas or buttons. Buttons should be used for the input parameters which should be changed for the simulation. Formulas are used for the parameters that should remain unchanged.

The example below contains the lemonade stand model with an additional "Scenario management" worksheet. The scenario dimension has two additional attributes:

SOURCE_SCENARIO is a scenario which is used as a data source for the current scenario
KEEP_SALES_SYNCED a flag which is used in the area of effect of a formula which automatically copies sales figures from the source scenario

The "Scenario management" worksheet contains buttons for each scenario, which let the user copy data from the source scenario, clear (set parameters to 0) the scenario and adjust all sales figures by a certain percentage with a single click.

Here is the formatting formula for the copy operation: formatbutton(concatenate("[confirm:Overwrite scenario?]Copy from '", text(Scenario.SOURCESCENARIO), "'"), update(Sales.[Measures.all()], [Scenario.SOURCESCENARIO]), update('Exchange rates'.[Measures.all()], [Scenario.SOURCE_SCENARIO]) )

The first update copies all measures from the source to target for the Sales worksheet. The second one copies the Exchange rates worksheet.

Scenario management.cwa