Adding #Trendlines to #PowerPivot for #PowerView

It started during a presentation of Cathy Dumas at the latest SQLBits where she compared the different charting capabilities of Excel, Power View, PerformancePoint and Reporting Service. One of those capabilities were trendlines who are missing in V1 of Power View. She suggested to add a trendline as a measure, but didn’t succeed, so she raised a challenge. And here I am taking this challenge to add a trendline to a PowerPivot model to be used in Power View.

Trendline theory

The theory behind a linear trendline is minimizing the least-square of the given set. The Wikipedia page gives a lot of theory and I must admit that the Dutch version has besides the theory also a simple solution of solving the equation.

Given a trendline with the following equation:

image

Then a and b can be solved by using the following equations:

imageimage

Example

For the following dataset:

x y
1 10
2 1
3 2
4 7

The following base values can be calculated and are input for a and b:

n 4
E(x) 10
E(y) 20
E(x*y) 46
E(x2) 30

So a and b are now:

a = –0.8
b = 7

And the trendline is now y = –0.8x + 7. With this formula it is possible to calculate a trendline value for each x of the dataset, which gives a nice trendline as a extra line chart.

Plot this trendline with the original data and the build-in trendline in Excel, the two line are overlapping each other:

image

PowerPivot

To convert this theory to a real life scenario I use the AdventueWorks database with the FactResellerSales fact table, the DimProduct tables and the DimDate table. With this dataset I ran into some challenges. First challenge is the series – I use a combination of Years and Quarters, like (20051, 20052, …) – which is not linear and continues. Also the data is not on the lowest grain of those series values.

The first step will be to convert the used series to a usable series for minimizing the least-square. Next step is adding all the needed measures to the FactResellerSales table to calculate the correct a and b.

Converting the Series

I created a nice long date series by adding a calculated column to DimDate YearQuarter

=[CalendarYear] & [CalendarQuarter]

Next I added this calculated column to the FactResellerSales table by using a calculated column RelYearQuarter

=VALUE(RELATED(DimDate[YearQuarter]))

Problem is that this series is not nicely formed as need for a correct calculation, so a conversion is needed. The current version of PowerPivot has the function RANKX which can be used to rank our series. The default order of RANKX is ascending, but a descending order is needed. Also we need a dense ties to get a continues series. This will give as a calculation for the column RankYQ

=RANKX(ALL(FactResellerSales), [RelYearQuarter], [RelYearQuarter], 1, Dense)

Calulation the Base Values

Most of these values needs to be calculated against the full selected set. Therefor most of the calculation have ALLSELECTED(FactResellerSales) included to make sure that the measure is calculated against the selected dataset creating the PivotTable.

n : NrSeries

To calculate the number of series value a maximum of the newly RankYQ column should be enough. Problem is that RankYQ is calculated again the complete set of the data, ignoring all PivotTable filters. So when the first series value is not selected in the PivotTable, the RankYQ will remain the same and is not recalculated. One solution is to convert the column to a measure, but this will cause a huge performance impact if a maximum is calculated. The best way is to correct the value of the maximum of RankYQ by subtracting the minimum of RankYQ value and adding 1 for the right offset. In this case the total number of series – NrSeries – will give the correct value for a continues selection of the series:

=MAXX(ALLSELECTED(FactResellerSales), [RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1

E(x) : SumSeries

To calculate the total sum of the series values the best way is to revert back to basic mathematics knowledge. What we need for our sum of the series is for a given n:

image

Which can be calculated  for that same n as:

image

So the measure SumSeries can be calculated as follows

=0.5*[NrSeries]*([NrSeries]+1)

E(x2) : SumSeries2

With the same mathematical knowledge we can solve the sum of our series values squared:

image

The measure SumSeries2 can be calculated as follows

=(1/6)*[NrSeries]*([NrSeries]+1)*((2*[NrSeries])+1)

E(y) : SumAmount

This is the most easiest calculation we need as it is a standard way to calculate the total of the selected values, here named SumAmount

=SUMX(ALLSELECTED(FactResellerSales), [SalesAmount])

E(x*y) : SumRankSales

To calculate this measure the same trick is needed as with the NrSeries measure: The selection of the series can start not as the first value, so it is not possible to simply multiply the RankYQ value with the Sales Amount. For this value also the minimal selected RankYQ value of the series needs to be subtracted of the current RankYQ value and corrected with 1. So the SumRankSales measure is calculated as follows

=SUMX(ALLSELECTED(FactResellerSales), ([RankYQ]-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)*[SalesAmount])

a and b

With all the base values in place a and b can be calculated as follows. In the calculation of a there is a check for division by zero (0).

=IF((([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries]))=0, BLANK(), (([NrSeries]*[SumRankSales])-([SumSeries]*[SumAmount]))/(([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries])))

=((1/[NrSeries])*[SumAmount])-((1/[NrSeries])*[a]*[SumSeries])

SalesAmount Trend

With a and b now correctly calculated we can calculate a measure – SalesAmount Trend – that gives the trendline usable for Power View. A IF statement is included to make sure that the trendline is only calculated when there is a original value of SalesAmount available. And the current series value is calculated using the current RankYQ value minus the minimal selected RankYQ value (and a correction of 1 to complete the calculation).

=IF(ISBLANK([Sum of SalesAmount]), BLANK(), [a]*(MAX([RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)+[b])

This trend value can now be used to validate with Excel if it shows the correct trendline.

image

Limitations

Of course there are some limitation to this approach, which aren’t there if Power View supports trendlines natively:

  • The value of series needs to be a continues series, so no gaps. Is this case the trendline will be off if 2006 is deselected and  2005 and 2007 are selected.
  • The Trendline is only valid for the given series and values.
  • Only a linear trendline is calculated at this moment. For Exponential and Logarithmic trendlines the calculation is almost the same, but maybe I blog about this some later time.
  • In Power View it is not possible to combine a bar chart with a line chart, so if you need to show your data as bars this trendline will also show as a bar chart.

Round-up

With all the intermediate measure in the model, some hiding is needed before it can be really used in Power View. But most of that is needed anyway, because of the different nature of PowerPivot and Power View.

There is probably a better and faster way to determine the total distinct series values. If anyone has a good suggestion: please let me know.

Unfortunately I don’t have a working Power View installation, so I cannot show the real result. But you can try it yourself by download the sample Excel sheet from here.

Enjoy trendlining

-JP

About these ads

About Jan Pieter

I am the Microsoft Business Intelligence technical lead at Inter Access, a local consultancy firm in The Netherlands.
This entry was posted in PoC and tagged , . Bookmark the permalink.

2 Responses to Adding #Trendlines to #PowerPivot for #PowerView

  1. cathydumas says:

    Wow nice work!

    I uploaded it to my Power View site to give it a go. The queries for the sales trend line are s.l.o.w – we are talking on the order of minutes to return on my box (8 GB RAM for SharePoint, AS tabular instance, SQL Server, and Power View). However if you do have a few minutes to spare, you can get a line graph out of it.

    Here is the result: http://cathydumas.files.wordpress.com/2012/04/powerviewtrendline.png
    Nice

    • Jan Pieter says:

      Thanks for the feedback and sharing the Power View result.

      Indeed I didn’t completely optimized it for performance. Probably adding a default ranking to the time series will increase the performance a bit. But the rest needs to be calculated on the fly.
      Still the best option will be that Power View as build in capabilities trendlines.

Comments are closed.