Machine Learning - Workflow

A workflow represents the steps to be carried out to obtain a desired output. For machine learning, it describes the stages data goes through in order to produce a machine learning model. The outcome of the machine learning workflow is a model, often treated as a black box, that represents the relationship between the input data and the output information.
There are two main types of machine learning problems: supervised and unsupervised. Here we concentrate on supervised learning, where the data has labels along with features, whereas in unsupervised learning the data consists of features only.



What do you mean by "Feature set"? 
A feature set is the set of measurable properties of the data that are observed to get the result. Consider the example of judging whether a given fruit is an apple or an orange: we need to observe various characteristics of the fruit, such as color, taste and shape. Not every property that describes an object makes a good feature; a feature set has to satisfy certain criteria. There is a separate field called feature engineering that covers feature selection in more depth.

What do you mean by "Labels"?
Labels are the properties or decisions that are determined from the feature set. In the fruit example, the class of the fruit is the label. Most machine learning algorithms deal with single-valued labels, although multi-labeled data is also possible. A simple decision-making problem has a label that takes the value True/False or 0/1. In short, labels are the properties that are derived from the feature data.

So now we are familiar with feature sets and labels. For example, consider the sample data set shown below.
Dataset 1
In the above example, fields like sugar_content and calories are the features, and food_category is the label.

All these features are converted to a feature vector before being fed into machine learning algorithms. A feature vector is a vector that holds the numerical representation of the feature set.
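As a minimal sketch of this conversion, the Python snippet below turns a few rows into feature vectors and integer-encoded labels. The field names come from the example above, but the row values and the helper names (to_feature_vector, encode_labels) are invented for illustration and are not taken from Dataset 1.

```python
# Hypothetical rows: the values are made up for illustration only.
rows = [
    {"sugar_content": 10.0, "calories": 52.0, "food_category": "fruit"},
    {"sugar_content": 1.5, "calories": 25.0, "food_category": "vegetable"},
    {"sugar_content": 12.0, "calories": 89.0, "food_category": "fruit"},
]

# Fixed feature order, so every vector has the same layout.
FEATURE_NAMES = ["sugar_content", "calories"]


def to_feature_vector(row):
    """Return the numeric features of one row in a fixed order."""
    return [float(row[name]) for name in FEATURE_NAMES]


def encode_labels(rows, label_name="food_category"):
    """Map each distinct label to an integer, using sorted label order."""
    classes = sorted({row[label_name] for row in rows})
    index = {label: i for i, label in enumerate(classes)}
    return [index[row[label_name]] for row in rows], classes


X = [to_feature_vector(row) for row in rows]  # one feature vector per row
y, classes = encode_labels(rows)              # integer labels plus their names

print(X)
print(y, classes)
```

Keeping the feature order fixed matters because the algorithm only sees positions in the vector, not field names.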

Steps involved in the ML workflow:

Basically, we can divide the workflow into three stages: Extraction, Training and Prediction.
  • Extraction: This stage covers steps such as data pre-processing, feature extraction and data cleaning, depending on the type and state of the data set. In the real world, data is collected from various sources (IoT devices are one major source) and arrives in different forms, with or without a well-formed structure. The data should be studied carefully, and a set of features that determines the labels should be extracted. Once feature extraction is complete, the features are refined through processes such as data cleaning and feature hashing, and finally converted into feature vectors, the numerical representation of the feature set and labels. So the input of this stage is raw data from a real-world source, and the output is feature vectors representing the processed data. Natural language processing is used for data pre-processing in some problem types (we will study one interesting problem about it in a coming post!).
  • Training: Once the data is ready in feature vector form, we divide the data set into two parts: a training data set and a testing data set. The training data is used to train the machine learning model, and once the model is created it is tested against the testing data to measure its accuracy. This is also the stage where we select a particular machine learning algorithm (called model selection). Data is fed into the algorithm in numerical format. The algorithm is chosen based on the type of data set: for example, if the value to predict is continuous and depends roughly linearly on the features, one can go with regression algorithms, and if the data has no labels and we want to group similar points, we can make use of clustering algorithms. Model selection also depends on the type of problem: for a decision-making problem, one can use a decision tree (provided the data set meets a few criteria). Once model selection is complete, the selected model is trained on the prepared data set (the data set is usually huge, so training may take hours and is better run on high-end machines) and tested for accuracy using the test data. A good model has high accuracy on the test data. Different models can be compared against each other to come up with the best one, although this is the most painful task. The output of this stage is a well-trained machine learning model that can predict an output from the input we give it (the input must be related to the task the model was designed for and similar to the data set used for training).

  • Prediction: It's show time! We now have a well-trained model that answers the questions we ask (only questions related to the task for which it was trained). This is the application stage, where we create an interface for the model and pass it input to predict the output. Most of the time the interface is a web service API, with the model hosted on high-end servers.
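
The training and prediction stages above can be sketched in plain Python. This is only an illustration under stated assumptions: the data values are invented, the function names (train_test_split, train, predict, accuracy) are hypothetical helpers written here rather than a library API, and a simple nearest-centroid classifier stands in for whatever algorithm model selection would actually pick.

```python
import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the samples and split them into training and testing parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def train(X, y):
    """Training: compute the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for vector, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, vector):
    """Prediction: return the label whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(model, v) == label for v, label in zip(X, y))
    return hits / len(y)


# Invented feature vectors: [sugar_content, calories] -> food_category
X = [[10.0, 52.0], [12.0, 89.0], [14.0, 60.0], [11.0, 70.0],
     [1.5, 25.0], [2.0, 18.0], [3.0, 30.0], [1.0, 20.0]]
y = ["fruit"] * 4 + ["vegetable"] * 4

X_train, X_test, y_train, y_test = train_test_split(X, y)
model = train(X_train, y_train)

print("test accuracy:", accuracy(model, X_test, y_test))
print("prediction:", predict(model, [13.0, 75.0]))
```

In a real service, the predict call would sit behind the web API, while train would run offline on the full data set.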
The workflow discussed here gives an abstract view of the steps involved in machine learning, from raw data to a prediction API. There are more advanced steps in between, and they vary depending on the problem at hand, the structure and type of the data to be processed, the algorithm used for training and so on.
This post is just an introduction to the machine learning workflow, because a good start leads to great success.

Happy Coding :)



