Multiple hierarchies and xTDs

Last week I was challenged by a co-worker: ‘create a way that a tabular cube can serve a YTD value for different time hierarchies.’ Because I currently work at a retailer, there is both a fiscal calendar and a 4-5-4 calendar, which of course don’t align in their start and end dates.

While I was writing this post, Marco Russo wrote a great post on how to create different xTD values. Please read it: http://www.sqlbi.com/articles/week-based-time-intelligence-in-dax/ (Note to self: do not postpone writing your posts.)

This has one advantage for me: I don’t have to explain the logic behind xTD calculations that are not based on a standard calendar. Lucky me :-).

The only challenge left is the one measure with different hierarchies, but that can easily be solved by using two date tables in PowerPivot: one table containing the standard calendar and one containing the year-week data. Both get a hierarchy and are linked to the same fact table using the same date key.

(Diagram view: the two date tables, each with its own hierarchy, linked to the same fact table)

In the fact table two measures are defined, YTDMonth and YTDWeek, each using a different date table. They are combined in a measure with an IF() statement that shows one of the two YTD values (a sketch of both underlying measures follows below).

YTD:=IF(HASONEVALUE('Calendar Week'[Year]), [YTDWeek],[YTDMonth])
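For reference, a minimal sketch of what the two underlying measures could look like. The fact table and column names (FactSales[SalesAmount], 'Calendar'[Date], 'Calendar Week'[Date]) are assumptions here, and the real week-based logic should follow the patterns from Marco’s article:

YTDMonth:=TOTALYTD(SUM(FactSales[SalesAmount]), 'Calendar'[Date])

YTDWeek:=CALCULATE(SUM(FactSales[SalesAmount]),
    FILTER(ALL('Calendar Week'),
        'Calendar Week'[Year] = MAX('Calendar Week'[Year])
            && 'Calendar Week'[Date] <= MAX('Calendar Week'[Date])))

The combined YTD measure then returns YTDWeek when the week hierarchy is in play (HASONEVALUE on 'Calendar Week'[Year]) and falls back to YTDMonth otherwise.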

Basically it is the same situation as with Analysis Services 2000: one hierarchy per dimension. And as long as DAX is not aware of which dimension is selected and has no scoping, this design paradigm can be used to solve the problem. Or stick with a separate measure per xTD.

-JP

Test file: Multiple hierarchies with one YTD measure


Adding #Trendlines to #PowerPivot for #PowerView

It started during a presentation by Cathy Dumas at the latest SQLBits, where she compared the charting capabilities of Excel, Power View, PerformancePoint and Reporting Services. One of those capabilities is trendlines, which are missing in V1 of Power View. She suggested adding a trendline as a measure, but didn’t succeed, so she raised it as a challenge. And here I am, taking that challenge: adding a trendline to a PowerPivot model to be used in Power View.

Trendline theory

The theory behind a linear trendline is minimizing the least squares of the given set. The Wikipedia page gives a lot of theory, and I must admit that the Dutch version contains, besides the theory, also a simple way of solving the equations.

Given a trendline with the following equation:

y = a·x + b

Then a and b can be solved by using the following equations:

a = (n·E(x*y) – E(x)·E(y)) / (n·E(x²) – E(x)²)
b = (1/n)·E(y) – (1/n)·a·E(x)

Example

For the following dataset:

x y
1 10
2 1
3 2
4 7

The following base values can be calculated and are input for a and b:

n 4
E(x) 10
E(y) 20
E(x*y) 46
E(x²) 30

So a and b are now:

a = (4·46 – 10·20) / (4·30 – 10²) = –16 / 20 = –0.8
b = (1/4)·20 – (1/4)·(–0.8)·10 = 5 + 2 = 7

And the trendline is now y = –0.8x + 7. With this formula it is possible to calculate a trendline value for each x of the dataset (6.2, 5.4, 4.6 and 3.8), which gives a nice trendline as an extra line chart.

Plotting this trendline together with the original data and the built-in trendline in Excel, the two lines overlap each other:

(chart: the calculated trendline overlapping Excel’s built-in trendline)

PowerPivot

To convert this theory to a real-life scenario I use the AdventureWorks database with the FactResellerSales fact table, the DimProduct table and the DimDate table. With this dataset I ran into some challenges. The first challenge is the series – I use a combination of years and quarters, like (20051, 20052, …) – which is neither linear nor contiguous. Also, the fact data is at a lower grain than those series values.

The first step will be to convert the series used into a series usable for minimizing the least squares. The next step is adding all the needed measures to the FactResellerSales table to calculate the correct a and b.

Converting the Series

I created a nice long date series by adding a calculated column YearQuarter to DimDate:

=[CalendarYear] & [CalendarQuarter]

Next I brought this value into the FactResellerSales table with a calculated column RelYearQuarter:

=VALUE(RELATED(DimDate[YearQuarter]))

The problem is that this series is not nicely formed as needed for a correct calculation, so a conversion is required. The current version of PowerPivot has the RANKX function, which can be used to rank our series. The default order of RANKX is descending, but an ascending order is needed here, so the earliest period gets rank 1. We also need dense ties to get a contiguous series. This gives the following calculation for the column RankYQ:

=RANKX(ALL(FactResellerSales), [RelYearQuarter], [RelYearQuarter], 1, Dense)

Calculating the Base Values

Most of these values need to be calculated against the full selected set. Therefore most of the calculations include ALLSELECTED(FactResellerSales) to make sure the measure is calculated against the dataset selected when creating the PivotTable.

n : NrSeries

To calculate the number of series values, a maximum of the new RankYQ column should be enough. The problem is that RankYQ is calculated against the complete set of data, ignoring all PivotTable filters. So when the first series value is not selected in the PivotTable, RankYQ remains the same and is not recalculated. One solution is to convert the column into a measure, but that causes a huge performance impact when a maximum has to be calculated. The better way is to correct the maximum of RankYQ by subtracting the minimum of RankYQ and adding 1 for the right offset. In this case the total number of series – NrSeries – gives the correct value for a contiguous selection of the series:

=MAXX(ALLSELECTED(FactResellerSales), [RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1

E(x) : SumSeries

To calculate the total sum of the series values, the best way is to go back to basic mathematics. What we need for the sum of the series, for a given n, is:

E(x) = 1 + 2 + … + n

which can be calculated for that same n as:

E(x) = ½·n·(n+1)

So the measure SumSeries can be calculated as follows

=0.5*[NrSeries]*([NrSeries]+1)

E(x²) : SumSeries2

With the same mathematical knowledge we can solve the sum of our series values squared:

E(x²) = 1² + 2² + … + n² = (1/6)·n·(n+1)·(2n+1)

The measure SumSeries2 can be calculated as follows

=(1/6)*[NrSeries]*([NrSeries]+1)*((2*[NrSeries])+1)

E(y) : SumAmount

This is the easiest calculation we need, as it is the standard way to calculate the total of the selected values, here named SumAmount:

=SUMX(ALLSELECTED(FactResellerSales), [SalesAmount])

E(x*y) : SumRankSales

To calculate this measure the same trick is needed as with the NrSeries measure: the selection of the series may not start at the first value, so it is not possible to simply multiply the RankYQ value by the SalesAmount. Here too, the minimum selected RankYQ value of the series needs to be subtracted from the current RankYQ value and corrected with 1. So the SumRankSales measure is calculated as follows:

=SUMX(ALLSELECTED(FactResellerSales), ([RankYQ]-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)*[SalesAmount])

a and b

With all the base values in place, a and b can be calculated as follows; the first formula defines a, the second defines b. The calculation of a includes a check for division by zero.

=IF((([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries]))=0, BLANK(), (([NrSeries]*[SumRankSales])-([SumSeries]*[SumAmount]))/(([NrSeries]*[SumSeries2])-([SumSeries]*[SumSeries])))

=((1/[NrSeries])*[SumAmount])-((1/[NrSeries])*[a]*[SumSeries])

SalesAmount Trend

With a and b now correctly calculated, we can create a measure – SalesAmount Trend – that gives the trendline usable for Power View. An IF statement is included to make sure the trendline is only calculated when an original SalesAmount value is available. The current series value is calculated as the current RankYQ value minus the minimum selected RankYQ value (plus 1 to complete the offset correction):

=IF(ISBLANK([Sum of SalesAmount]), BLANK(), [a]*(MAX([RankYQ])-MINX(ALLSELECTED(FactResellerSales), [RankYQ])+1)+[b])

This trend value can now be validated against Excel to check that it shows the correct trendline.


Limitations

Of course there are some limitations to this approach, which wouldn’t be there if Power View supported trendlines natively:

  • The series values need to form a contiguous series, so no gaps. In this case the trendline will be off if 2006 is deselected while 2005 and 2007 are selected.
  • The trendline is only valid for the given series and values.
  • Only a linear trendline is calculated at this moment. For exponential and logarithmic trendlines the calculation is almost the same, but maybe I will blog about this some other time.
  • In Power View it is not possible to combine a bar chart with a line chart, so if you need to show your data as bars, this trendline will also show as a bar chart.

Round-up

With all the intermediate measures in the model, some hiding is needed before it can really be used in Power View. But most of that is needed anyway, because of the different nature of PowerPivot and Power View.

There is probably a better and faster way to determine the total distinct series values. If anyone has a good suggestion: please let me know.
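One possible alternative – a rough, untested sketch – is to count the distinct RankYQ values in the current selection instead of taking the maximum minus the minimum:

=CALCULATE(DISTINCTCOUNT(FactResellerSales[RankYQ]), ALLSELECTED(FactResellerSales))

Note that this counts only the periods that actually contain data, so it matches NrSeries only when the selected series is contiguous and every period has sales.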

Unfortunately I don’t have a working Power View installation, so I cannot show the real result. But you can try it yourself by downloading the sample Excel sheet from here.

Enjoy trendlining

-JP


The beauty of the #SSAS DefaultMember

The DefaultMember property for a dimension is known by every Microsoft BI professional. This property sets the member of a dimension attribute that is selected by default.

The dimension DefaultMember can be declared in BIDS or by using an ALTER CUBE statement as part of the cube calculations. While the first is static, the latter can be used to set a DefaultMember dynamically. The DefaultMember can also be set by defining a role, which is very useful for dynamic role security.

But the dimension DefaultMember property isn’t the only one. Almost every object in Analysis Services has a DefaultMember property, but in this blog post I only look at the DefaultMember of the Measures ‘dimension’.

Determination of the DefaultMember

The big question is: how is the DefaultMember property determined when it is not set? If we take a look at Books Online there is a remark for the DefaultMember property (although the article refers to Analysis Services 2005, the behavior hasn’t changed):

If DefaultMember is not specified, or is set to an empty string, then Microsoft SQL Server 2005 Analysis Services (SSAS) chooses a member to use as the default member.

But what is the default behavior of Analysis Services when it has to choose the DefaultMember itself? It chooses the first member it can find. So if we query the DefaultMember of a hierarchy, the All member is returned.


And if you query an attribute that has the IsAggregatable property set to false, the first child is returned, like the Scenario attribute in the AdventureWorks cube.


But there is one very important DefaultMember that helps the overall experience of Analysis Services: the DefaultMember of the measures.

DefaultMember of the Measures

What will a query return without a measure added? In this case the query engine of Analysis Services adds the DefaultMember of the Measures ‘dimension’: the DefaultMeasure. If we look at the AdventureWorks cube, this value is set to the Reseller Sales Amount.


When it is not set, Analysis Services will take the first measure from the first measure group, in this case the Internet Sales Amount. You can manually remove the DefaultMeasure without having to process the AdventureWorks cube.


But how can this behavior impact the overall experience? When we put a dimension attribute on the rows or columns in Excel, the following query is fired against Analysis Services.

(screenshot: the generated query, containing only the dimension attribute and no measure)

This is a query without a measure included, so Analysis Services adds the DefaultMember of the Measures dimension (the DefaultMeasure) to the query, as described above.

This can result in a performance delay when the DefaultMeasure is part of a measure group with a lot of partitions and data: it can then take a long time before the members of the dimension attribute are shown in Excel.

A simple way to solve this is to either set the DefaultMeasure to a measure from a small measure group, or move such a small measure group to the top of the measure groups. In that case the measure-less query will only hit the partitions of that small measure group.


-JP (@jppp)


My PerformancePoint experiences

Finally I made some time to write about my experiences with PerformancePoint 2010, focusing especially on my dashboard for the iPad and on the new SP1 features.

Last April there was a business request to enable the existing PerformancePoint dashboard for viewing on an iPad. The first step, besides getting an iPad with the correct WiFi, 3G and VPN connections, was trying to open the existing dashboard in the iPad Safari browser.

This immediately gave the first challenge: with the iOS 4.3 update the JavaScript engine changed significantly, which determined whether the dashboard loaded at all. On a not-yet-updated iPad running iOS 4.2 the dashboard showed, and on an iPad with iOS 4.3 it did not!

The next challenge was that the original dashboard used a filter shown as a treeview. This is a rich HTML/JavaScript object and it failed to behave correctly on the iPad. When a filter uses a normal dropdown box, a core feature of any web browser, Safari shows it as a finger-touch-friendly control. So the first choice was to abandon the multi-select and use a dropdown as the filter. This works fine, but with more than 300 options the user-friendliness isn’t optimal.

At that moment we were running the RTM version of SharePoint, so the only dynamic option for filters was the visibility, not the initial content. And there were two filters based on the same dimension which controlled the content of the scorecards and reports. So how to make the dropdown box as small as possible, say one region, while respecting the original behavior? Luckily I could use an extra filter to ‘pre-filter’ the main filter, but that filter could only control the visibility of the main filter. In the end I solved it the following way:

First I added an extra dimension, which was just a copy of the original dimension used in the main filter. Then I based the second filter on a SharePoint list containing mainly members of that extra dimension; only the All level of that filter came from the original dimension, hence the need for the SharePoint list. Next I created, for every child level, a corresponding filter with its visibility linked. The last part was to attach all filters, in my case around 30, to all parts of the dashboard.

Now, with the new dynamic filter functionality in SP1 for PerformancePoint, the content of a filter can be based on another filter. In my case I can redesign the dashboard using just two filters for the situation above: just link the filters together and let PerformancePoint filter the content for you. How easy can it be? One important thing for everyone upgrading from RTM to SP1: redefine your filters by simply stepping through the wizard, or else you cannot link them. It looks like the PerformancePoint items are not updated automatically.

By the way: I haven’t tested iOS 4.3 with PerformancePoint 2010 SP1, so I don’t know if the dashboards show on an iPad at this moment. Although PerformancePoint works with the Safari browser (http://office.microsoft.com/en-us/dashboard-designer-help/using-performancepoint-services-with-multiple-web-browsers-HA102237261.aspx), the iOS Safari browser is a different browser than the desktop version and, of course, behaves differently. So it is a good choice of Microsoft to use the same browser engine for their phone devices as for the desktop: no worries that some JavaScript will not work on different devices.

-JP


Update on the future of SSAS

Last November I wrote about the future of SSAS and BISM (see here for the original post). Now, with the release of CTP3 of SQL Server codenamed ‘Denali’ (download here), the new BISM and the two different modes are open to the public, so I can check whether my thoughts are still valid.

First take a look at what Microsoft gave us with the release of CTP3. Analysis Services can now run in two modes: the ‘old’ UDM mode, now called Multidimensional mode, and the new VertiPaq mode, called Tabular mode. A good overview is given in the blog post by TK Anand. Basically the two modes can live perfectly side by side in a corporate environment.

But which of the two models has the future and therefore the priority of the product team? I still believe that the PowerPivot part of the team and the Tabular mode have the higher priority. They developed the very powerful VertiPaq engine and they are not going to drop that asset. On the other hand, the Multidimensional model has the very powerful formula engine. I got a high-level overview of that engine as part of the SSASMaestros course, and it is a state-of-the-art piece of technology. As PowerPivot started off as a branch of the SSAS code, it has the same formula engine, although that engine is not used that heavily there at the moment. That will probably change in future releases, especially if you look at the improvements in PowerPivot v2.

At this moment there are still big differences between the Multidimensional and Tabular modes of Analysis Services. If you look at the different features it is still easy to select the correct mode for the business need. The comparison table can be found at the following MSDN link: http://msdn.microsoft.com/en-us/library/hh212940(v=SQL.110).aspx. The table also contains the PowerPivot mode, which is actually the third mode of Analysis Services. (Technically there is also a fourth mode: the client-side PowerPivot mode.)

Feature                  | Multidimensional | Tabular       | PowerPivot
Actions                  | Yes              | No            | No
Aggregations             | Yes              | Yes           | Yes
Calculated Measures      | Yes              | Yes           | Yes
Custom Assemblies        | Yes              | No            | No
Custom Rollups           | Yes              | No            | No
Distinct Count           | Yes              | Yes (via DAX) | Yes (via DAX)
Drillthrough             | Yes              | No            | Yes (detail opens in separate worksheet)
Hierarchies              | Yes              | Yes           | Yes
KPIs                     | Yes              | Yes           | Yes
Linked objects           | Yes              | No            | Yes (linked tables)
Many-to-many             | Yes              | No            | No
Parent-child Hierarchies | Yes              | Yes (via DAX) | Yes (via DAX)
Partitions               | Yes              | Yes           | No
Perspectives             | Yes              | Yes           | Yes
Semi-additive Measures   | Yes              | Yes           | Yes
Translations             | Yes              | No            | No
User-defined Hierarchies | Yes              | Yes           | Yes
Writeback                | Yes              | No            | No

I think that in the end, with upcoming versions, the difference between the Multidimensional and Tabular modes will be gone. I expect that the MOLAP storage engine will become part of the Tabular mode and that the Multidimensional mode will become the PDW version of Analysis Services for the (very) big data warehousing solutions. I don’t think the Yahoo 12 TB cube would perform as well as it does now when using the Tabular mode.

-JP


Preparing for the #SSASMaestros in HK

After a long wait and a lot of arrangements, I am finally about to leave for Hong Kong for the SSASMaestro training.

At this moment I don’t have that much to tell, mainly because of the NDA. But the slides already shared look very nice: a lot of stuff I didn’t know, but luckily also parts I already know :-).

After I finish the course I will have to see what I can and may tell.

-JP


Analysis on tagged customers

It has been a while since my last blog post and of course I have the standard excuses. The two main ones: I was busy with my project/customer, and I intended to blog about the upcoming SQL Azure Reporting release, but as you know, it has not been released to the beta testers yet.

Therefore I revived an old PoC I created two years ago, which was lost during a laptop change: analysis on tagged customers. Tagging is very Web 2.0 (and should exist in any cloud solution), so how about letting the business users tag the customers? The advantage of tagging over (default) grouping is that it is possible to assign multiple tags to the same customer.

Now, how can we use that advantage in our analysis environment? First we start with the data model.


The model is very simple: a Customers table with an associated Sales table, and an N:N relation between Tags and Customers (via a TagCloud bridge table).

If we now create a cube in Visual Studio using the wizard, we get the base of the tagging cube. We can use the default settings; I only deselected the Sales Count measure. SSAS creates by default a many-to-many relation between the Tags dimension and the Sales measure group.


This relation already creates part of the desired behavior: it is now possible to filter the values based on the customers that are linked to one of the selected tags. In our case, when we select the tags International and Large, we see that the values are filtered to Cust B and Cust C.


But now we want to make sure that only the customers are shown that are tagged with both International and Large, in our case Cust C.

This is a little complicated: we need to look for every customer that has as many linked tags as there are tags selected. The count of tags linked to a customer is already available in the cube: Tag ID Count (I removed ‘ID’ from the measure name, so it becomes Tag Count). The number of tags selected can easily be determined by taking the distinct count of the tags at the total level of the customers. So we first add a distinct count based on the TagID of the TagCloud table to the cube; I renamed it Tag Distinct Count.


We also need an attribute to separate the default Or operator from the calculated And operator. First we add a column to the Tags table to host the Operator attribute, with default value Or.


After we add this column as an attribute to the Tags dimension, we can create the And operator and add it to that attribute. In the Calculations tab of the cube we add the following statement:

CREATE [Tags].[Operator].[All].[And];

This creates the new member on the attribute. For the calculation we add the following statement:

([Tags].[Operator].[All].[And]) =
  Aggregate(Filter(Descendants([Customers].[Customer Name].CurrentMember, , LEAVES)
        , ([Tags].[Operator].[All], [Customers].[Customer Name].CurrentMember, [Measures].[Tag Count]) =
          ([Tags].[Operator].[All], [Customers].[Customer Name].[All], [Measures].[Tag Distinct Count])),
    ([Tags].[Operator].[All], [Measures].CurrentMember));

The last statement needs some explanation. The first line makes sure that the And member is assigned the following expression. We aggregate the current value ([Measures].CurrentMember) of the unfiltered Operator attribute ([Tags].[Operator].[All]), but we filter the customers so that only those customers are aggregated that have the same number of linked tags as are selected on the Tags attribute. Because we are working in a calculated member, we always need to refer to the [Tags].[Operator].[All] member, because that is the level at which the cube has values.

If we now open Excel 2010 we can add the Operator attribute and use it in our analysis. Luckily, Excel 2010 has some major improvements in handling calculated members: before Excel 2010 it was not possible to deselect the calculated member, but that is now finally available. And with the new slicers we can create a nice and easy-to-use analysis environment.


One thing I need to say: the solution was created using a small dataset. When using it at a larger scale it is possible that performance drops (due to the distinct count, for example) or that the calculation becomes too complex to make it work for more customer attributes.
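As an aside, the same And behavior can be sketched in DAX for a PowerPivot model built on the same four tables (Customers, Sales, Tags and the TagCloud bridge table). This is a rough, untested sketch and the column names are assumptions:

SalesAmount And:=CALCULATE(SUM(Sales[SalesAmount]),
    FILTER(VALUES(Customers[CustomerID]),
        // tags linked to this customer within the current tag selection
        CALCULATE(COUNTROWS(TagCloud))
            // must equal the number of tags currently selected
            = COUNTROWS(VALUES(Tags[TagID]))))

The FILTER keeps only the customers whose number of linked (and selected) tags equals the number of selected tags, which is the same And semantics as the cube calculation above.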
