It started during a presentation of Cathy Dumas at the latest SQLBits where she compared the different charting capabilities of Excel, Power View, PerformancePoint and Reporting Service. One of those capabilities were trendlines who are missing in V1 of Power View. She suggested to add a trendline as a measure, but didn’t succeed, so she raised a challenge. And here I am taking this challenge to add a trendline to a PowerPivot model to be used in Power View.

## Trendline theory

The theory behind a linear trendline is minimizing the least-square of the given set. The Wikipedia page gives a lot of theory and I must admit that the Dutch version has besides the theory also a simple solution of solving the equation.

Given a trendline with the following equation:

Then *a* and *b* can be solved by using the following equations:

## Example

For the following dataset:

xy1 10 2 1 3 2 4 7

The following base values can be calculated and are input for *a* and *b*:

n 4 E(x) 10 E(y) 20 E(x*y) 46 E(x2) 30

So *a* and *b* are now:

a= –0.8b= 7

And the trendline is now *y = –0.8x + 7*. With this formula it is possible to calculate a trendline value for each *x* of the dataset, which gives a nice trendline as a extra line chart.

Plot this trendline with the original data and the build-in trendline in Excel, the two line are overlapping each other:

## PowerPivot

To convert this theory to a real life scenario I use the AdventueWorks database with the *FactResellerSales* fact table, the *DimProduct* tables and the *DimDate* table. With this dataset I ran into some challenges. First challenge is the series – I use a combination of Years and Quarters, like (20051, 20052, …) – which is not linear and continues. Also the data is not on the lowest grain of those series values.

The first step will be to convert the used series to a usable series for minimizing the least-square. Next step is adding all the needed measures to the *FactResellerSales* table to calculate the correct *a* and *b*.

### Converting the Series

I created a nice long date series by adding a calculated column to *DimDate* **YearQuarter **

=[CalendarYear] & [CalendarQuarter]

Next I added this calculated column to the *FactResellerSales* table by using a calculated column **RelYearQuarter **

=VALUE(RELATED(DimDate[YearQuarter]))

Problem is that this series is not nicely formed as need for a correct calculation, so a conversion is needed. The current version of PowerPivot has the function RANKX which can be used to rank our series. The default order of RANKX is ascending, but a descending order is needed. Also we need a *dense* ties to get a continues series. This will give as a calculation for the column **RankYQ**

=RANKX(ALL(FactResellerSales), [RelYearQuarter], [RelYearQuarter], 1, Dense)

### Calulation the Base Values

Most of these values needs to be calculated against the full selected set. Therefor most of the calculation have **ALLSELECTED(FactResellerSales)** included to make sure that the measure is calculated against the selected dataset creating the PivotTable.

*n : NrSeries*

To calculate the number of series value a maximum of the newly **RankYQ** column should be enough. Problem is that **RankYQ** is calculated again the complete set of the data, ignoring all PivotTable filters. So when the first series value is not selected in the PivotTable, the **RankYQ** will remain the same and is not recalculated. One solution is to convert the column to a measure, but this will cause a huge performance impact if a maximum is calculated. The best way is to correct the value of the maximum of **RankYQ** by subtracting the minimum of **RankYQ** value and adding 1 for the right offset. In this case the total number of series – **NrSeries** – will give the correct value for a continues selection of the series:

=MAXX(ALLSELECTED(FactResellerSales), [RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1

*E(x) : SumSeries*

To calculate the total sum of the series values the best way is to revert back to basic mathematics knowledge. What we need for our sum of the series is for a given *n*:

Which can be calculated for that same *n* as:

So the measure **SumSeries** can be calculated as follows

=0.5*[NrSeries]*([NrSeries]+1)

*E(x ^{2}) : SumSeries2*

With the same mathematical knowledge we can solve the sum of our series values squared:

The measure **SumSeries2** can be calculated as follows

=(1/6)*[NrSeries]*([NrSeries]+1)*((2*[NrSeries])+1)

*E(y) : SumAmount*

This is the most easiest calculation we need as it is a standard way to calculate the total of the selected values, here named **SumAmount**

=SUMX(ALLSELECTED(FactResellerSales), [SalesAmount])

*E(x*y) : SumRankSales*

To calculate this measure the same trick is needed as with the **NrSeries **measure: The selection of the series can start not as the first value, so it is not possible to simply multiply the **RankYQ **value with the **Sales Amount**. For this value also the minimal selected **RankYQ** value of the series needs to be subtracted of the current **RankYQ** value and corrected with 1. So the **SumRankSales** measure is calculated as follows

=SUMX(ALLSELECTED(FactResellerSales), ([RankYQ]-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)*[SalesAmount])

*a and b*

With all the base values in place *a* and *b* can be calculated as follows. In the calculation of *a* there is a check for division by zero (0).

=IF((([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries]))=0, BLANK(), (([NrSeries]*[SumRankSales])-([SumSeries]*[SumAmount]))/(([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries])))

=((1/[NrSeries])*[SumAmount])-((1/[NrSeries])*[a]*[SumSeries])

*SalesAmount Trend*

With *a* and *b* now correctly calculated we can calculate a measure – **SalesAmount Trend **– that gives the trendline usable for Power View. A **IF** statement is included to make sure that the trendline is only calculated when there is a original value of *SalesAmount* available. And the current series value is calculated using the current **RankYQ** value minus the minimal selected **RankYQ** value (and a correction of 1 to complete the calculation).

=IF(ISBLANK([Sum of SalesAmount]), BLANK(), [a]*(MAX([RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)+[b])

This trend value can now be used to validate with Excel if it shows the correct trendline.

## Limitations

Of course there are some limitation to this approach, which aren’t there if Power View supports trendlines natively:

- The value of series needs to be a continues series, so no gaps. Is this case the trendline will be off if 2006 is deselected and 2005 and 2007 are selected.
- The Trendline is only valid for the given series and values.
- Only a linear trendline is calculated at this moment. For Exponential and Logarithmic trendlines the calculation is almost the same, but maybe I blog about this some later time.
- In Power View it is not possible to combine a bar chart with a line chart, so if you need to show your data as bars this trendline will also show as a bar chart.

## Round-up

With all the intermediate measure in the model, some hiding is needed before it can be really used in Power View. But most of that is needed anyway, because of the different nature of PowerPivot and Power View.

There is probably a better and faster way to determine the total distinct series values. If anyone has a good suggestion: please let me know.

Unfortunately I don’t have a working Power View installation, so I cannot show the real result. But you can try it yourself by download the sample Excel sheet from here.

Enjoy trendlining

-JP

Wow nice work!

I uploaded it to my Power View site to give it a go. The queries for the sales trend line are s.l.o.w – we are talking on the order of minutes to return on my box (8 GB RAM for SharePoint, AS tabular instance, SQL Server, and Power View). However if you do have a few minutes to spare, you can get a line graph out of it.

Here is the result: http://cathydumas.files.wordpress.com/2012/04/powerviewtrendline.png

Nice

Thanks for the feedback and sharing the Power View result.

Indeed I didn’t completely optimized it for performance. Probably adding a default ranking to the time series will increase the performance a bit. But the rest needs to be calculated on the fly.

Still the best option will be that Power View as build in capabilities trendlines.