Machine Learning
by Miller
Over the past couple of months I have noticed a trend in conference presentations (and in tweets coming out of conferences) that seems to have moved a lot of the hope and hype around big data onto AI, or more specifically machine learning. I am not going to single out any specific examples, but I feel the hype rests on two basic beliefs:
- I don't need to know about my data or structure it to get useful information, and
- I won't need to configure things, because machine learning.
(Lack of) Data structure

I am not sure what is driving this need, but I guess it has something to do with the falling price of storage and the rise of worldwide data collection via the web. So lots of businesses now have lots of data sitting there. There is also hype about the businesses that do data analytics well (e.g. Amazon or Uber), which leads people to think "They make money from their data. I have data. I should be able to make money out of it".

There are also lots of blog posts and talks about how this tool or that (e.g. Hadoop or MapReduce) has enabled the data analytics. These usually gloss over the basics like linear regression because everyone does that...right?
The Computerworld article "Machine learning: Demystifying linear regression and feature selection" makes this very good point:

"Much of the art in data science is understanding the problem domain well enough to build up a clean set of features that are likely related to what you want to model."

The authors go on to show how training data that has been modelled with clean features outperforms using all the data without any domain insight. So yes, machine learning probably can be useful; just be prepared to understand the data and to run your own regression analysis on any training sets needed.
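To make the point about clean features concrete, here is a minimal sketch in plain Python. It is not taken from the Computerworld article: the data is synthetic and made up for illustration, and the names `fit_line`, `r_squared`, `widths`, `lengths` and `prices` are all hypothetical. It fits a one-variable least-squares line twice, once on a raw column and once on a feature built with a bit of domain knowledge (floor area), and compares how much of the variation each explains.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    slope, intercept = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic, made-up data: room dimensions in metres, with a price that is
# (by construction) driven by floor area rather than by either dimension alone.
widths = [2, 3, 4, 5, 6, 3, 5, 4]
lengths = [6, 2, 5, 3, 4, 7, 2, 3]
prices = [10 + 3 * w * l for w, l in zip(widths, lengths)]

# Without domain insight: regress price on a raw column.
r2_raw = r_squared(widths, prices)

# With domain insight: build the feature we believe matters, then regress.
areas = [w * l for w, l in zip(widths, lengths)]
r2_clean = r_squared(areas, prices)

print(f"raw width feature:  R^2 = {r2_raw:.2f}")
print(f"clean area feature: R^2 = {r2_clean:.2f}")
```

The toy data is rigged so that the engineered feature fits perfectly, which real data never will. The direction of the effect is the thing to notice: the regression itself is trivial, and the gain comes from knowing which feature to build.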
Configuring things

Again, the hype seems to have come from wanting to copy successful market leaders, and I can almost hear people thinking "Netflix adapts its recommendations without anyone configuring them, so can I do the same for my users' menu structure (or whatever)?". Adapting to changing usage, or to the conditions that your software operates in, is a perfectly reasonable desire. But here it's worth asking: how much "adaptability" do we really require, and how many of the conditions can I realistically foresee? For example, adaptive polling of mailboxes may not require a vast neural net that studies traffic cycles and usage behaviour to tweak the likely poll time. We might be able to write much less code that gets enough adaptability from a simple heuristic based on what the last poll found.
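As a rough illustration of how little code such a heuristic might need, here is a minimal sketch. The function name `next_poll_interval`, the halving and 1.5x back-off factors, and the 30-second and one-hour limits are all assumptions chosen for the example, not values from any real mail system. The rule looks only at what the last poll found: poll sooner if there was mail, back off if there was not.

```python
def next_poll_interval(current, found_mail, minimum=30.0, maximum=3600.0):
    """Return the next wait in seconds, based only on what the last poll found.

    The factors and limits here are illustrative assumptions.
    """
    if found_mail:
        proposed = current / 2    # mailbox is active: check again sooner
    else:
        proposed = current * 1.5  # mailbox is quiet: back off
    # Clamp so we never hammer the server or go silent for too long.
    return max(minimum, min(maximum, proposed))


# Simulate a quiet spell, a burst of mail, then quiet again.
interval = 300.0
history = []
for found in [False, False, True, True, True, False]:
    interval = next_poll_interval(interval, found)
    history.append(interval)

print(history)
```

The interval stretches while the mailbox is quiet and shrinks quickly once mail starts arriving, with no model to train. If this turns out to be too crude, it also gives a baseline that any learned approach would have to beat.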
So please, do explore the potential that machine learning has to offer, but don't expect it to be a silver bullet or to replace understanding your domain. I think that AI surrounds us too much now for another AI winter to set in, but too much hype isn't good for any tool or approach.