A Brief Overview of Cloud Machine Learning Solutions


Since mid-2015, Amazon has offered its own cloud machine learning solution, named Amazon Machine Learning, alongside Microsoft (MS Azure Machine Learning, recently renamed MS Cortana Analytics) and IBM (Watson Analytics).

A brief survey by KDnuggets:

Adding to the list of companies competing in this space, Amazon recently announced the launch of Amazon Machine Learning. In contrast to offerings from Microsoft and IBM, Amazon's product has a much more focused mission.
While Microsoft is betting that users will drag and drop boxes to perform each step of a data pipeline, and IBM offers an open-ended service for interrogating data, Amazon is focused squarely on a fully automatic tool for supervised machine learning.
Supervised machine learning refers to the setting where each data point is associated with some target variable.
When the target variable is a binary quantity, the problem is called binary classification.
When the target is categorical (with more than 2 categories), the problem is called multi-class classification.
When the target is real-valued (a floating point number), the problem is called regression.
These are the three services offered by Amazon Machine Learning. While many algorithms exist for each of these three tasks, Amazon places minimal responsibility for the algorithm in the users' hands, instead offering a nearly fully automated solution for supervised learning problems.
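
To make the three task types concrete, here is a minimal Python sketch that guesses the task from a list of target values. The function name `infer_task` and its heuristic (floats mean regression, two distinct values mean binary classification) are assumptions made for illustration; the article does not describe how Amazon Machine Learning decides this internally.

```python
def infer_task(targets):
    """Guess the supervised-learning task implied by a list of target values.

    Hypothetical heuristic for illustration only, not Amazon's actual logic.
    """
    distinct = set(targets)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct target values")
    # Real-valued (floating point) targets imply regression.
    if any(isinstance(t, float) for t in distinct):
        return "regression"
    # Exactly two distinct labels imply binary classification.
    if len(distinct) == 2:
        return "binary classification"
    # More than two distinct labels imply multi-class classification.
    return "multi-class classification"
```

For example, `infer_task([0, 1, 1, 0])` falls into the binary case, while a list of three or more distinct string labels falls into the multi-class case.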
Data Acquisition
In what is likely its killer feature, Amazon's machine learning software can load your data from wherever it lives in Amazon's vast network of web services.
This includes relational data stored in RDS, CSV files stored in S3, or data in Amazon's Redshift data warehouse. Given Amazon's primacy in virtualized web services, this is likely to appeal to internet companies, many of which already have their data in Amazon's ecosystem. For those who want to take the software for a spin but do not have any datasets in Amazon's cloud, Amazon provides a sample dataset (bank.csv) that contains dummy data for bank customers.

One nice feature of Amazon's service is that it automatically combs through the data, identifying which fields are numerical and which are categorical. Further preprocessing (whitening and dimensionality reduction) is presumably performed automatically, since the service never asks the user to select preprocessing options. It seems that Amazon has astutely surmised that a user who farms out machine learning tasks to a service provider is unlikely to have strong preferences about data preprocessing methodology.
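
The kind of field-type detection described above can be sketched in a few lines of Python. This is only an assumed approach (a column is numerical if every value parses as a number), not Amazon's actual method, and the column names in the usage note are hypothetical rather than taken from bank.csv.

```python
import csv
import io


def _is_number(text):
    """Return True if the string parses as a floating point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_field_types(csv_text):
    """Label each CSV column 'numerical' or 'categorical'.

    A column counts as numerical only if it has at least one row and
    every value in it parses as a number; otherwise it is categorical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    types = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[name] = "numerical" if is_numeric else "categorical"
    return types
```

Given a small file with hypothetical columns `age`, `job`, and `balance`, this would mark `age` and `balance` as numerical and `job` as categorical. A production service would also have to handle missing values and numeric codes that are really categories, which this sketch ignores.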

Model Selection 
Amazon Machine Learning simply requires that the user select a target variable. Then the user chooses whether to learn with default settings or to select a custom model. Even under the Advanced Settings for model creation, the choices available to the user are limited to model size, L1 vs. L2 regularization (or neither), and the magnitude of the regularization parameter (they call this the regularization amount). As with data preprocessing, Amazon Machine Learning assumes that target customers seek an automatic solution. Presumably, under the hood Amazon is running either logistic regression or an SVM for classification, and linear regression or low-order polynomial regression to predict numerical quantities.
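
For readers unfamiliar with the regularization choices mentioned above, the following Python sketch shows the standard textbook penalty terms. The function name `regularization_penalty` and its parameters are hypothetical; the article does not say how Amazon implements or scales the regularization amount.

```python
def regularization_penalty(weights, kind, amount):
    """Compute the penalty added to a model's training loss.

    kind   -- "l1", "l2", or "none" (hypothetical option names)
    amount -- the regularization strength, a non-negative number
    """
    if kind == "l1":
        # L1 sums absolute values, which tends to push weights to exactly zero.
        return amount * sum(abs(w) for w in weights)
    if kind == "l2":
        # L2 sums squares, which shrinks large weights most strongly.
        return amount * sum(w * w for w in weights)
    if kind == "none":
        return 0.0
    raise ValueError(f"unknown regularization kind: {kind!r}")
```

A larger amount makes the penalty dominate the loss and yields a simpler model, which is why exposing just the type and the amount gives the user a meaningful knob without exposing the algorithm itself.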

Pricing
Amazon's pricing model is straightforward. In addition to standard compute charges for data storage, etc., Amazon charges for making predictions. Each real-time prediction costs $0.0001 and batch predictions cost $0.10 per 1,000 predictions (the same price per prediction). Additionally, Amazon charges $0.42 per hour for model building. This is somewhat opaque, as the user may have no way to reason about how long a model should take to build, knowing neither which model is chosen, nor the number of model parameters, nor how many passes will be made through the data (assuming the default mode). A nice additional feature for the service might be cost estimation prior to running the algorithm, especially for models run with the default settings, which are hardest to reason about.
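
The arithmetic behind these prices can be sketched as a small Python estimator. The prices are the ones quoted above; the function name `estimate_cost` and its parameters are hypothetical, and storage and other compute charges are deliberately left out.

```python
# Prices quoted in the article, in US dollars.
REALTIME_PRICE = 0.0001        # per real-time prediction
BATCH_PRICE_PER_1000 = 0.10    # per 1,000 batch predictions
BUILD_PRICE_PER_HOUR = 0.42    # per hour of model building


def estimate_cost(realtime_predictions, batch_predictions, build_hours):
    """Estimate prediction and model-building charges in dollars.

    Excludes storage and other compute charges, which the article
    mentions but does not quantify.
    """
    realtime_cost = realtime_predictions * REALTIME_PRICE
    batch_cost = batch_predictions / 1000 * BATCH_PRICE_PER_1000
    build_cost = build_hours * BUILD_PRICE_PER_HOUR
    return realtime_cost + batch_cost + build_cost
```

For instance, 10,000 real-time predictions ($1.00), 50,000 batch predictions ($5.00), and two hours of model building ($0.84) come to about $6.84. The hard part in practice is the `build_hours` input, which is exactly the quantity the user cannot predict in advance.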
Conclusions
Amazon's cloud machine learning service is narrower in scope than either IBM's Watson Analytics or Microsoft's Azure ML offerings in the space. However, it is by far the smoothest of the services. The service has a clear use case, data acquisition is effortless, and it's clear who might use the product. In contrast, Azure ML assumes that its customers know nearly enough to build a model themselves but want a GUI. Watson Analytics, when we tested it, couldn't handle enterprise-scale data. Watson appeared focused more on data visualization and exploration than on specific prediction problems. As Amazon's service does not feature deep learning or machine perception functionality, and can only be trained on supplied datasets (as opposed to more universal datasets like ImageNet, or large text corpora), it's unlikely to compete directly with MetaMind.

http://www.kdnuggets.com/2015/04/cloud-machine-learning-amazon-ibm-watson-microsoft-azure.html
