Entries Tagged as 'DMX'

Predilexion for Insight

The point: I am proud to invite you to enjoy our new product, Predixion Insight. It is a cloud-based predictive analytics service that can be accessed from Excel 2007 or 2010 (32-bit or 64-bit) and lets you perform advanced analytics on your Excel or PowerPivot data. To install the Predixion Insight client, just go to https://www.predixionsoftware.com/products, download Predixion Insight for Excel and subscribe to our Free Beta Trial.

The rest of this post is just me bragging about how cool Predixion Insight is and how one can have fun with it.


A few years ago, I was working with my adviser on a paper trying to offer biochemists a predictive tool for the bioactivity of HIV-1 protease inhibitors. I am a bit ashamed to admit this, but to this day an “HIV-1 protease inhibitor” is, for me (just like the day I first met one), mainly a vector of 4 numbers in one dataset and 25 numbers in another.

In analyzing those datasets, I used a few different tools, including the Microsoft data mining add-ins. Excel, empowered by the data mining add-ins, was by far the best data playground for me at the time (naturally, no trace of bias here!). What made this experience memorable is that, after working as a developer on those add-ins for a few years, I was using them as a user, not for a test or a demo, and not on a customer’s dataset.

Today, I played a bit with those datasets and tried to replicate that work using Excel 2010, PowerPivot and Predixion Insight. I tried to use the combo as a real user, not as a developer, and see how I can solve a specific problem.

Here are a few things that marked the experience. I will try to blog about each individual feature in more detail in the upcoming weeks, so here is just an enumeration:

- Asynchronous task execution: I can launch a task (say, a neural network processing task) and then, while the clouds are crunching the 25-dimensional protease inhibitors, I can schedule additional tasks or use Excel. Even better, the processing results are cached in the cloud and available for retrieval when the job completes, from wherever I log in. Naturally, result sets can be retrieved multiple times.

- PowerPivot integration: on one hand, it is nice to know that you can analyze more than 1M rows. However, I found sampling pretty useful for dealing with tables that large, at least for modeling. What I found more interesting is that I can synthesize very rich datasets using PowerPivot. Some things are just native there (cross-table RELATED calls, aggregations). Others are added by Predixion Insight (outlier and missing value handling, binning of numeric data). All these transformations are wonderfully refreshable and can be nicely applied to new data.

- Visual Macros: these are executable reports generated by Predixion Insight for Excel, and they make repetitive tasks much easier. In brief, after you go through a wizard (for example, specifying which of 25 input columns should be used in a regression model), a script is generated, which can then be modified and re-executed. Also, multiple such scripts can be chained on the same worksheet and executed as a package. This way, creating 5 regression models on the same 25 columns, with different algorithm parameters, stops being boring. These macros can be used for queries as well, and I intend to show some Visual Macros based application worksheets in a future post.
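To make the "launch now, collect later" idea concrete, here is a minimal sketch in plain Python. It is not the Predixion Insight API: `train_model`, the `hidden_nodes` parameter and the settings list are hypothetical stand-ins, and a local thread pool plays the role of the cloud service. It only illustrates queuing the same task several times with different parameters and reading a finished result more than once.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def train_model(hidden_nodes: int) -> dict:
    """Hypothetical stand-in for a long-running training task."""
    time.sleep(0.1)  # pretend the service is busy
    return {"hidden_nodes": hidden_nodes, "status": "done"}


# Hypothetical parameter sweep: the same task queued five times.
settings = [2, 4, 8, 16, 32]

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately, so all five jobs are queued up front.
    jobs = {n: pool.submit(train_model, n) for n in settings}

    # The caller is free to do other work here while the jobs run.

    # A finished job keeps its result, so it can be read more than once.
    first_read = jobs[8].result()
    second_read = jobs[8].result()

# Collect every result once all jobs have completed.
results = {n: job.result() for n, job in jobs.items()}
```

The design point is the separation between submitting a job and asking for its result: the handle returned at submission time is all that is needed to come back for the output later.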

Add the fact that I ran the experiments on a tiny laptop running just Office and you’ll see why the 2010 version of the Excel-based advanced analytics tools is my new preferred data playground. Hence, the point at the beginning of this post. Enjoy!


Getting Ready For Showtime

Predixion is getting ready for show time. Today is the first day of the invitational Beta of Predixion Insight, the first product from our company. This Beta will become public shortly (mid August), so if you are a Data Miner, a PowerPivot pro or an Excel user, please keep an eye on www.predixionsoftware.com to get more information. In fact, you should probably check out that page today (if you haven’t recently), as it contains a very crisp description of the Predixion vision for removing the barriers to predictive analytics.

Ever since I spent a summer developing a cloud prototype two years ago, I have been hooked on the potential of predictive analytics in the cloud. Predixion Insight is unlocking that potential. I hope and believe that you will find our product a powerful instrument for analyzing your data, anywhere, anytime. We certainly put a lot of work into making it so.

Some information about the product features is already available on Jamie’s blog. I will follow up with more posts describing some product features in detail (after the public Beta becomes available).

My first day at Predixion Software

As of today I am working as a Principal Architect (tada!) for Predixion Software, a predictive analytics startup in Redmond, WA. So far, it’s pretty close to the Encarta definition of exhilarate.

The company aims to bring predictive analytics technology within reach of every information worker. My past professional incarnation taught me that it may be quite a challenge to make statistics and data mining appealing and intuitive. It also taught me that this is exactly what many users actually expect from an advanced analytics solution, so I jumped at this challenge. (By the way, if you find these kinds of problems interesting and appealing, then check out our job posting at stackoverflow.)

The fact that Jamie MacLennan is the CTO of this company also helped. I worked for quite a few years with Jamie on Microsoft’s SQL Server Data Mining and I am really looking forward to building other cool products.

 

So, until Predixion has a product, I will keep posting about SQL Server Data Mining but also about my trip in the startup world. Meanwhile, if you use SQL Server Data Mining, take this survey and win a copy of Jamie and Bogdan’s book!

Farewell, Microsoft

Today has been my last day at Microsoft. I spent more than 10 years with the company and loved every bit of these 10 years.  If not for a very promising startup, I would probably be looking forward to spending the next 10 years with Microsoft, in the same BI area that made me write this blog.  My best wishes to the company, the technology and, above all, to the wonderful people that made me love my Microsoft journey!

The next, very-soon-to-show-up-here post will discuss where I am going and what I will be doing there. Talking about the future, though, is talking about work, which would spoil this weekend.

This blog will come back to life (although it might move to a different platform). The main topic remains Data Mining and predictive analytics, the core technology remains SQL Server Analysis Services.

 

Regards,

bogdan

 

PS – The Cloud Table Analysis Tools service completed its prototype mission and has been retired. Thanks a lot to all the users! More in my next post.

Querying Rules and Itemsets (like the Data Mining Viewers do)

I will try to continue the series started by Jamie, presenting the other set of queries issued by the Microsoft Association Rules viewer. Recently, a question on these queries appeared on the MSDN Data Mining Forums and the poster raised a very good point: while the stored procedures were intended as internal calls for the built-in viewers, external applications and viewers may want to employ them.

So, here is how the rest of the Association Rules viewer works.

Once the viewer is loaded, the first call is something like:

CALL System.Microsoft.AnalysisServices.System.DataMining.AssociationRules.GetStatistics('Customers')

The single parameter of this stored procedure is the mining model name.  The result is a one-row table containing the following columns:

Column | Sample Value | Comments
MAX_PAGE_SIZE | 2000 | The maximum server-supported page size for fetching rules and itemsets. This parameter ensures that the viewer will not make requests that would make the server run out of memory (details later).
MIN_SUPPORT | 89 | Minimum actual support for rules detected by the model
MAX_SUPPORT | 2439 | Maximum actual support for rules detected by the model
MIN_ITEMSET_SIZE | 0 | Minimum itemset size
MAX_ITEMSET_SIZE | 3 | Maximum itemset size
MIN_RULE_PROBABILITY | 0.401529636711281 | Minimum actual rule probability
MAX_RULE_PROBABILITY | 0.993975903614458 | Maximum actual rule probability
MIN_RULE_LIFT | 0.514182044237125 | Minimum actual rule importance
MAX_RULE_LIFT | 2.13833283242171 | Maximum actual rule importance
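For an external application that wants to issue this call itself, a small helper along the following lines may be a useful starting point. This is only a sketch: the function names `build_get_statistics_call` and `parse_statistics` are hypothetical, the quote-doubling escape is an assumption about how DMX string literals behave, and the code deliberately stops short of opening a connection, since that depends on the client library you use. The sample row simply reuses the values from the table above.

```python
# Stored procedure name as it appears in the viewer's call.
PROC = ("System.Microsoft.AnalysisServices.System.DataMining."
        "AssociationRules.GetStatistics")

INT_COLUMNS = ("MAX_PAGE_SIZE", "MIN_SUPPORT", "MAX_SUPPORT",
               "MIN_ITEMSET_SIZE", "MAX_ITEMSET_SIZE")
FLOAT_COLUMNS = ("MIN_RULE_PROBABILITY", "MAX_RULE_PROBABILITY",
                 "MIN_RULE_LIFT", "MAX_RULE_LIFT")


def build_get_statistics_call(model_name: str) -> str:
    """Build the CALL statement for one mining model name."""
    # Assumption: a single quote inside the literal is escaped by doubling it.
    escaped = model_name.replace("'", "''")
    return f"CALL {PROC}('{escaped}')"


def parse_statistics(row: dict) -> dict:
    """Convert the one-row result (column name -> text) into typed values."""
    stats = {name: int(row[name]) for name in INT_COLUMNS}
    stats.update({name: float(row[name]) for name in FLOAT_COLUMNS})
    return stats


statement = build_get_statistics_call("Customers")

# Sample row, using the values from the table above.
sample_row = {
    "MAX_PAGE_SIZE": "2000",
    "MIN_SUPPORT": "89",
    "MAX_SUPPORT": "2439",
    "MIN_ITEMSET_SIZE": "0",
    "MAX_ITEMSET_SIZE": "3",
    "MIN_RULE_PROBABILITY": "0.401529636711281",
    "MAX_RULE_PROBABILITY": "0.993975903614458",
    "MIN_RULE_LIFT": "0.514182044237125",
    "MAX_RULE_LIFT": "2.13833283242171",
}
stats = parse_statistics(sample_row)
```

A custom viewer could use `stats["MAX_PAGE_SIZE"]` to cap the size of its later rule and itemset requests, and the minimum and maximum values to set the ranges of its filter sliders.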


Data Mining in the Cloud is temporarily down

As of Saturday, November 15th, the connection to the Table Analysis in the Cloud URL is broken. Until the problem is identified and fixed, here are some workarounds:

- For the web interface, use the http://www.sqlserverdatamining.com/cloud URL

- For the Excel add-in, please change the services connection URL. To do that, click the Connections button in the "Analyze (in the Cloud)" ribbon and change the destination URL to

http://131.107.181.101/CloudDM/TATServices/

NOTE: This temporary solution does not support SSL. Your data is transmitted in the clear.

I’ll post an update here as soon as the servers are up again.


Book’s Blog

The “Data Mining with SQL Server 2008” book now has a blog. You can check it out at http://www.SqlDataMiningBook.com

Soon enough, there will be some content there:

- various data mining related postings from Jamie and me: all the product-related postings on this blog will be replicated on the book’s blog. With a bit of discipline, all the postings will be tagged with the relevant chapter number

- errata (well, hopefully that won’t be the main topic of the blog :-) )

- any new downloads or other information that may be relevant for the readers

Data Mining with SQL Server 2008 + get your own free autographed copy!


 

The new version of the SQL Data Mining book is finally available, at least at Amazon. If you are currently a SQL DM user, you have an opportunity to get a free autographed copy by filling out a short survey about the way you use SQL Server Data Mining. For more details about the survey, check Jamie’s blog post.

More details about the book here: Data Mining with SQL Server 2008

Data Mining for the Cloud (or how I spent my summer)

This week, at the KDD (Knowledge Discovery and Data Mining) conference, we (as in the Microsoft SQL Server Data Mining team) presented the Table Analysis Tools for the Cloud, a preview of a technology that enables anybody to play with some of Microsoft’s data mining tools, without any bulky downloads and with zero configuration effort.

Around May this year I practically entered some sort of sabbatical: 3 months to work on an incubation project of my choice (yes, the Microsoft SQL Server organization does this kind of thing! If it sounds appealing, check out our recruiting site or, even better, directly contact our SQL Server Data Mining recruiter, Melsa Clarke - melsac AT microsoft DOT com). With some help from Jamie, various management levels and some nice guys in the SQL Server Data Services team, I gathered the infrastructure for a Software as a Service incubation and set up a web incarnation of the Table Analysis Tools add-in for Excel.

Now it is up and running. The entry page is at sqlserverdatamining.com/cloud, so if I got you bored already and you don’t want to read the rest of this stuff, go ahead and browse that page.


 

What it is

TAT Cloud is a set of canned data mining tasks that you can use without having SQL Server installed on your machine. It consists of encapsulations of some common data mining problems, such as detecting key influencers, forecasting, generating predictive scorecards or doing market basket analysis. The tasks can be executed directly from your browser: just go to the web page, upload your data (in CSV format) and run a tool from the toolbar. Even better, the tasks can be executed directly from Excel. For this, however, you will need to have Excel 2007 and install an add-in which can be downloaded from here.

All tasks work on a table (Excel table or a table in CSV format that you upload to the web interface). All tasks produce reports that can be used to learn more about the analyzed data.

Here is a complete list of features:

- Analyze Key Influencers: detects the columns that impact your target column. It presents a report of those values in other columns that correlate strongly with values in your target column.
- Detect Categories (clustering, for data miners): identifies groups of table rows that share similar characteristics. A categories report is generated, which details the characteristics of each category.
- Fill From Example: to some extent, similar to Excel’s Autofill feature. It learns from a few examples and extends the learned patterns to the remaining rows in the table.
- Forecasting: analyzes vertical series of numeric data, detects periodicity, trends and correlations between series and produces a forecast for those series.
- Highlight Exceptions: finds the interesting (or unusual, or out-of-ordinary) rows in your table.
- Scenario Analysis: What-If and Goal-Seek tools based on a probabilistic model built on top of your data.
- Prediction Calculator: a tool for generating prediction scorecards.
- Market Basket Analysis: analyzes transaction tables to identify groups of items that appear together in transactions.
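As a rough illustration of what "items that appear together in transactions" means for the last tool in the list, here is a minimal sketch that counts co-occurring item pairs. It is a naive count written for this post, not the algorithm the service runs; the `pair_support` helper and the sample baskets are made up for the example.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Count in how many transactions each pair of items appears together."""
    counts = Counter()
    for items in transactions:
        # De-duplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Made-up transaction table: one list of items per transaction.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
]

support = pair_support(baskets)
most_common_pair, occurrences = support.most_common(1)[0]
```

A real market basket tool goes further (larger itemsets, rules, probabilities), but the support count of an itemset is the basic quantity those are built on.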

All features work from Excel. However, not all the features are implemented in the web interface. They will show up eventually!

A really nice presentation of how to use the tools was written by Brent Ozar on SQLServerPedia. Thanks, Brent!

 

How it works

Your data (CSV file or Excel spreadsheet) is uploaded to the web service site. There, Analysis Services crunches it and produces the reports you get either in the browser or in a different spreadsheet. The data and the mining model used for analyzing it are deleted immediately after processing. If something bad happens and your session does not conclude successfully (fancy wording for "if it crashes") then both data and models are removed automatically after 15-20 minutes.
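The cleanup policy described above (delete right after processing, with a timed sweep as a safety net for crashed sessions) can be sketched as follows. This is not the service's actual code: the `SessionStore` class, its method names and the in-memory dictionary are hypothetical, and the 15-minute timeout is simply the lower end of the range mentioned above.

```python
import time


class SessionStore:
    """Hypothetical in-memory store for uploaded data, keyed by session id."""

    def __init__(self, ttl_seconds=15 * 60):
        self.ttl_seconds = ttl_seconds
        self._sessions = {}  # session id -> (created_at, payload)

    def upload(self, session_id, payload, now=None):
        created_at = time.time() if now is None else now
        self._sessions[session_id] = (created_at, payload)

    def finish(self, session_id):
        """Normal path: delete the data as soon as processing completes."""
        self._sessions.pop(session_id, None)

    def sweep(self, now=None):
        """Fallback path: remove sessions that never finished and are too old."""
        now = time.time() if now is None else now
        expired = [sid for sid, (created_at, _) in self._sessions.items()
                   if now - created_at > self.ttl_seconds]
        for sid in expired:
            del self._sessions[sid]
        return expired

    def __contains__(self, session_id):
        return session_id in self._sessions


store = SessionStore(ttl_seconds=900)

# A session that completes normally is removed right away.
store.upload("ok", "a,b\n1,2", now=0)
store.finish("ok")

# A session that crashed is only removed once the timeout has passed.
store.upload("crashed", "a,b\n3,4", now=0)
removed_early = store.sweep(now=600)   # 10 minutes in: nothing to remove yet
removed_late = store.sweep(now=1200)   # 20 minutes in: the stale session goes
```

Passing `now` explicitly keeps the example deterministic; a real service would run the sweep on a timer and use the current clock.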

You will notice in the Excel add-in (as well as in the web interface) a strange IP address, 131.107.181.99. This is the IP address of the service. It will be changed to something cleaner very soon.

Excel uses HTTPS to connect to the service, in an effort to protect your data. However, you should not use this technology on sensitive data.

 

What it is not

This is not an official shipping Microsoft technology. It is actually less than even a beta. It may crash, it may produce incorrect results, it may be shut down at any time. BTW, I would appreciate it if you posted a note here or on the Microsoft data mining forums if this happens (particularly about crashes, I should probably know if it gets shut down)!

 

What next

Well, try the tools: sqlserverdatamining.com/cloud. I recommend downloading the Excel 2007 add-in, rather than using the web application, as it is more functional. Post any questions you might have on the Microsoft data mining forums. And check out this blog periodically for announcements (typically new additions to the web interface) or for more details on how this stuff works.

Companion for MS Analysis Services

 

Today I discovered a very nice Analysis Services client tool. Produced by SQLMinds, the tool seems a great addition to Analysis Services. It provides many components, among them a performance tuning service, an OLAP cube browser for the web and a very nice web front end for data mining, the DM Companion tool which can be launched at http://x32.sqlminds.com/dmcompanion.

So, here are a few really nice things about the DM Companion tool (BTW, a fully working demo is running at the aforementioned URL).

The tool works to some extent like a Data Mining-specific SQL Server Management Studio for the web. Therefore, it allows you to connect to the server of your choice (through the Analysis Services HTTP pump). The demo seems to allow anonymous connections (pretty safe, as all the interactions offered by the tool are read-only). Next you get to choose your AS catalog and the mining model you want to use.

For each model you have the option to browse the content or execute predictions.

The prediction feature provides a nice interface for defining singleton predictions, as you can see below:

image

The interface supports specifying multiple nested table keys as input, so the tool can perform associative predictions as well. It reminds me of the XMLA Thin Miner sample running on SQLServerDataMining.com; however, it looks much better and is nicely integrated with the rest of the application. While prediction functions do not seem to be directly supported, the application is able to predict a cluster for clustering models.

The model browsing features are really nice. Analysis Services includes a set of sample web viewers for Naive Bayes, Trees and Clustering. This application provides some seriously better looking viewers for these algorithms, and extends the suite at least for Neural Networks (a really nice viewer), Sequence Clustering and Association Rules.  The DM Companion viewers offer all the features in the sample viewers, with a nicer implementation which uses AJAX and has better graphics, plus a solid set of new features, the most spectacular being the interactive dependency net browser and the pie chart visualization for decision trees, which you can see below.

image image

 

Overall, DM Companion looks like a really nice tool for sharing your data mining implementation on the web.