The point: I am proud to invite you to enjoy our new product, Predixion Insight. It is a cloud-based predictive analytics service that can be accessed from Excel 2007 or 2010 (32- or 64-bit) and allows you to perform advanced analytics on your Excel or PowerPivot data. To install the Predixion Insight client, just go to https://www.predixionsoftware.com/products, download Predixion Insight for Excel and subscribe to our Free Beta Trial.
The rest of this post is just me bragging about how cool Predixion Insight is and how one can have fun with it.
A few years ago, I was working with my adviser on a paper trying to offer biochemists a predictive tool for the bioactivity of HIV-1 protease inhibitors. I am a bit ashamed to admit this, but to this day an “HIV-1 protease inhibitor” is, for me (just like the day I first met one), mainly a vector of 4 numbers in one dataset and 25 numbers in another.
In analyzing those datasets, I used a few different tools, including the Microsoft data mining add-ins. Excel, empowered by the data mining add-ins, was by far the best data playground for me at the time (naturally, no trace of bias here!). What made this experience memorable is that, after working as a developer on those add-ins for a few years, I was using them as a user: not for a test, not for a demo, and not on a customer’s dataset.
Today, I played a bit with those datasets and tried to replicate that work using Excel 2010, PowerPivot and Predixion Insight. I tried to use the combo as a real user, not as a developer, and see how I could solve a specific problem.
Here are a few things that marked the experience. I will try to blog about each individual feature in more detail in the upcoming weeks, so here is just an enumeration:
- Asynchronous task execution. I can launch a task (say, a neural network processing task) and then, while the cloud is crunching the 25-dimensional protease inhibitors, I can schedule additional tasks or keep using Excel. Even better, the processing results are cached in the cloud and available for retrieval when the job completes, from wherever I log in. Naturally, result sets can be retrieved multiple times.
- PowerPivot integration. On one hand, it is nice to know that you can analyze more than 1M rows, although I found that sampling handles large data pretty well, at least for modeling. What I found more interesting is that I can synthesize very rich datasets using PowerPivot. Some things are just native there (cross-table RELATED calls, aggregations). Others are added by Predixion Insight (outlier and missing value handling, binning of numeric data). All these transformations are wonderfully refreshable and can be nicely applied to new data.
- Visual Macros. These are executable reports generated by Predixion Insight for Excel, and they make repetitive tasks much easier. In brief, after you go through a wizard (for example, specifying which of 25 input columns should be used in a regression model), a script is generated, which can then be modified and re-executed. Also, multiple such scripts can be chained on the same worksheet and executed as a package. This way, creating 5 regression models on the same 25 columns, with different algorithm parameters, stops being boring. These macros can be used for queries as well, and I intend to show some application worksheets based on Visual Macros in a future post.
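For readers who like to see a pattern in code, here is a rough sketch of the "launch several jobs, keep working, collect results later" idea from the list above. To be clear, this is not Predixion Insight's API: it uses a thread pool from Python's standard library as a stand-in for the cloud, the `fit_ridge_slope` function is a hypothetical toy model, and the data is made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_ridge_slope(xs, ys, penalty):
    """Toy model (hypothetical): fit y ~ slope * x with an L2 penalty."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Five parameter settings, standing in for five models on the same columns.
penalties = [0.0, 0.1, 1.0, 10.0, 100.0]

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately, so the caller is free to do other things.
    jobs = {p: pool.submit(fit_ridge_slope, xs, ys, p) for p in penalties}
    other_work = sum(range(10))  # placeholder for "using Excel" in the meantime

# A finished job keeps its result, so it can be read more than once.
slopes = {p: job.result() for p, job in jobs.items()}
slopes_again = {p: job.result() for p, job in jobs.items()}
```

The same loop-over-parameters shape is what makes building five variations of a model feel like one task instead of five.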
Add the fact that I ran the experiments on a tiny laptop running just Office and you’ll see why the 2010 version of the Excel-based advanced analytics tools is my new preferred data playground. Hence, the point at the beginning of this post. Enjoy!