A very interesting, although time-consuming, project kept me away from posting for the last month. Mea maxima culpa; I will try to catch up and post new and interesting content here more often.
Recently, while trying to make sense of some prediction results, I realized that the DMX NATURAL PREDICTION JOIN statement has some obvious and some not-so-obvious dangers that deserve clarification.
To summarize, the PREDICTION JOIN operator applies the patterns of a trained mining model to new data. The general syntax is:
SELECT ... FROM MyModel
PREDICTION JOIN
OPENQUERY(MyDataSource, 'SELECT Height, HairColor, ShoeSize FROM CustomerInfo') AS T
ON T.Height = MyModel.Height AND
T.HairColor = MyModel.HairColor AND
T.ShoeSize = MyModel.ShoeSize
In the standard PREDICTION JOIN statement, the ON clause specifies how columns from the new data are to be mapped to the input columns of the mining model.
The statement supports a simplified format:
SELECT ... FROM MyModel
NATURAL PREDICTION JOIN
OPENQUERY(MyDataSource, 'SELECT Height, HairColor, ShoeSize FROM CustomerInfo') AS T
In the simplified format, the columns from the new data are mapped to the input columns of the mining model by name. Much easier to type, resulting in a more readable statement!
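To make the by-name mapping concrete, here is a minimal Python sketch of the idea. It is not how Analysis Services implements it. The function name natural_mapping is hypothetical, and the case-insensitive comparison is an assumption of this sketch. It simply pairs each model input column with the data column of the same name and reports any model column that finds no partner.

```python
def natural_mapping(model_inputs, data_columns):
    """Pair each model input column with the data column that has the same name.

    Hypothetical helper for illustration only; it models the idea of
    name-based mapping, not the server's actual implementation.
    """
    # Index data columns by lower-cased name.
    # Assumption of this sketch: names are compared case-insensitively.
    by_name = {col.lower(): col for col in data_columns}

    mapped = {}    # model column -> data column that feeds it
    unmapped = []  # model columns with no same-named data column
    for model_col in model_inputs:
        data_col = by_name.get(model_col.lower())
        if data_col is None:
            unmapped.append(model_col)
        else:
            mapped[model_col] = data_col
    return mapped, unmapped


if __name__ == "__main__":
    # Column names taken from the statements above.
    model_inputs = ["Height", "HairColor", "ShoeSize"]
    data_columns = ["Height", "HairColor", "ShoeSize"]
    mapped, unmapped = natural_mapping(model_inputs, data_columns)
    print("mapped:", mapped)
    print("unmapped:", unmapped)
```

With the three columns used above, every model input finds a partner and the unmapped list is empty. The sketch only reports unmatched columns; what the server does with them is not modelled here.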
Now, the natural mapping can end up in one of three scenarios. In the first scenario, every input column of the model finds a corresponding column in the new data. The mapping is performed correctly and everything works as expected. Nothing to blog about in this case!
The second case: some of the inputs are mapped correctly to model columns and some are not, although you would expect all of them to be mapped correctly. More than a simple user error, this situation is actually “facilitated” by the way BI Dev Studio works. Assume you start with a table containing four columns (Height, HairColor, ShoeSize, EducationLevel) and use BI Dev Studio to create a mining structure and a model. BI Dev Studio will generate friendlier names for the columns, and your mining model ends up with the structure below: