(More) DMX features in 2008: Better use of structure columns in models
In a previous post I presented some of the new DMX features in the November CTP of SQL Server 2008. Here are a few more cool DMX tricks.
Assume you want to use Microsoft Association Rules and Microsoft Decision Trees on the same data. Also, assume that the data contains one numeric column (say, Age). You may have noticed that Decision Trees supports continuous columns, while Association Rules does not. Not a big deal: we have discretization, and Age can be added twice to the same mining structure as, say, Age (continuous) and Age Disc (discretized). However, the two different names raise a problem in the case of NATURAL PREDICTION JOIN, where input columns are bound by name to the model columns: the same input query cannot be used against both models, because one expects a column named Age and the other a column named Age Disc.
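To make the naming problem concrete, here is a minimal DMX sketch of the setup described above. The structure, model, and column names (Customers, CustomersDT, CustomersAR, CustomerKey, Gender) and the bucket count are made up for illustration, and each statement is meant to be executed separately.

```dmx
-- Hypothetical structure: the same source value is stored twice,
-- once as a continuous column and once as a discretized one.
CREATE MINING STRUCTURE [Customers] (
    [CustomerKey] LONG KEY,
    [Age]         LONG CONTINUOUS,
    [Age Disc]    LONG DISCRETIZED(EQUAL_AREAS, 5),
    [Gender]      TEXT DISCRETE
)

-- Decision Trees can use the continuous column directly.
ALTER MINING STRUCTURE [Customers]
ADD MINING MODEL [CustomersDT] (
    [CustomerKey],
    [Age],
    [Gender] PREDICT
) USING Microsoft_Decision_Trees

-- Association Rules needs the discretized copy, so the model
-- column ends up with a different name: [Age Disc].
ALTER MINING STRUCTURE [Customers]
ADD MINING MODEL [CustomersAR] (
    [CustomerKey],
    [Age Disc],
    [Gender] PREDICT
) USING Microsoft_Association_Rules

-- This input binds by name to [CustomersDT] ...
SELECT Predict([Gender]) FROM [CustomersDT]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age]) AS t

-- ... but for [CustomersAR] the input column has to be renamed.
SELECT Predict([Gender]) FROM [CustomersAR]
NATURAL PREDICTION JOIN
(SELECT 35 AS [Age Disc]) AS t
```

The two prediction queries differ only in the input column alias, which is exactly the inconvenience: the caller has to know which model uses which copy of Age.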
In SQL Server 2005, the mining model columns typically had the same names as the mining structure columns. There was no way in DMX to change the model column names (well, there is a way in BI DevStudio).
Another problem: one would not include the email, name or phone number of a customer in a mining model because, at best, this would increase training time and, in the worst case, would unnecessarily complicate the model with fake patterns. But this makes it hard to link the training cases leading to one pattern (available with the drillthrough feature) to information that would perhaps make that pattern actionable (like contact information, if the pattern suggests a strong probability of buying a product).
In 2008, these problems are much easier to solve, with model column aliasing and structure column drillthrough. Here is how these work.