Tagged: Data science, Life cycle, Machine Learning, Projects
- February 13, 2020 at 1:13 pm, by @idowu
Sometimes you can get so excited that you fail to follow the protocols without which you will never achieve your goals. Every project has a life cycle that must be followed through for the project to be completed.
For a machine learning project, the following life cycle must be adhered to if the model is to work well.
Defining the Problem
This is the first phase in the life cycle of any project. In machine learning, you need to identify and review the problem, weigh the different approaches to solving it, and carefully place the problem under a suitable machine learning model.
In essence, you should establish which type of machine learning model you'll be dealing with: it could be either supervised or unsupervised. If it's supervised learning, determine whether it's a classification or a regression problem; for an unsupervised learning model, ascertain whether you're dealing with a clustering or an association problem.
To learn more about the terminology used in machine learning, check out this article.
Getting the Data
Once the problem has been identified and the strategies for solving it are established, the next phase is to get the data sets needed for your project. Depending on the type of machine learning model you're building, the data could be labeled (with precise feature names) or unlabeled (without defined features; for example, photos and videos). You can then go ahead and apply the necessary strategies to acquire your data. Check my previous article for some insights about data collection.
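As a minimal sketch, here is how a collected data set might be loaded for a first look using pandas; the file name customers.csv is only a placeholder for whatever source you end up with.

```python
import pandas as pd

# Read a collected data set into a DataFrame ("customers.csv" is only a placeholder name)
df = pd.read_csv("customers.csv")

# Take a first look at the size and the first few rows
print(df.shape)
print(df.head())
```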
Data Cleaning and Sorting
Imagine you collected raw data of about 20,000 rows and were asked to analyze it. As an intern, you were excited about your new task, but as soon as you glanced through the data, your excitement gave way to anxiety: the data was riddled with errors and inconsistencies. Your manager doesn't want to concern himself with methods; all he wants is a clean output. You consulted a more experienced colleague, who told you, "You should consider cleaning and sorting the data set."
This phase is sometimes referred to as wrangling: the process of turning raw, dirty data into a more refined and readable format in preparation for analysis. For better comprehension, you can take a look at my previous article on Data cleaning with Python.
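To make the idea concrete, below is a minimal wrangling sketch with pandas; the file name, the column names, and the fill strategy are only assumptions for the example, not a prescription for every data set.

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical raw data set from the previous step

# Drop exact duplicate rows
df = df.drop_duplicates()

# Standardize the column names (strip stray spaces, lowercase)
df.columns = df.columns.str.strip().str.lower()

# Fill missing numeric values with each column's median
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Drop rows still missing a critical field ("age" is just an example column)
df = df.dropna(subset=["age"])
```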
Feature Selection and Data Preprocessing
After a series of brainstorming sessions and sleepless nights, you've transformed your data into a refined form, but your manager informs you that the data was meant for one of the company's machine learning projects. He then instructs you to preprocess the data and select its best features in preparation for the task ahead. This gets you thinking about what these two terms mean and how to go about executing them.
Feature selection is the act of picking out, from a data set, the features that best predict the output. I've prepared an article to help you understand feature selection; you can check it out here.
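For illustration, here is one possible approach using scikit-learn's SelectKBest on its bundled iris data; the scoring function and the choice of k=2 are only assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Bundled iris data, used purely for illustration
X, y = load_iris(return_X_y=True)

# Keep the two features with the strongest ANOVA F-score against the target
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (150, 2)
print(selector.get_support())  # boolean mask of the chosen features
```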
Data preprocessing is the method of transforming your data into formats that are more readable and interpretable by a machine. Some of the available preprocessing classes (illustrated in the sketch after this list) include:
- StandardScaler from sklearn.preprocessing
- Binarizer from sklearn.preprocessing
- MinMaxScaler from sklearn.preprocessing
- LabelEncoder from sklearn.preprocessing
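The sketch below applies each of the listed classes to a tiny, made-up array; the values and the binarization threshold are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, LabelEncoder, MinMaxScaler, StandardScaler

# Tiny made-up numeric data and string labels
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
labels = ["cat", "dog", "cat"]

# Scale each column to zero mean and unit variance
X_standard = StandardScaler().fit_transform(X)

# Rescale each column to the 0-1 range
X_minmax = MinMaxScaler().fit_transform(X)

# Map values to 0 or 1 depending on a threshold
X_binary = Binarizer(threshold=2.0).fit_transform(X)

# Encode string labels as integers
y = LabelEncoder().fit_transform(labels)

print(X_standard, X_minmax, X_binary, y, sep="\n")
```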
Splitting the Data, Training the Model and Testing Accuracy
This phase involves splitting the data into training and test sets. This is usually done by randomly selecting 30% of the data as the test set, while the other 70% is used for training.
A machine learning algorithm is supplied with the training set, while the test set is later presented to the trained model to test its accuracy. To split the data, the train_test_split function from sklearn.model_selection is used.
The higher the accuracy of a model, the better its performance and vice versa.
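Putting the three steps together, here is a minimal sketch using scikit-learn's bundled iris data; the logistic regression model is just an example choice, not a recommendation for every problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing and train on the remaining 70%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a model on the training set (logistic regression is just an example choice)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate the trained model on the unseen test set
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions))
```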
Model Deployment and Integration
Deployment and integration into an existing production environment (such as a business app) is about the last stage, and it is what makes the model ready for consumption. This is where your model starts to make applicable business decisions.
Despite being one of the final phases of the cycle, it can also be very tedious, as data scientists and software developers channel much of their time and energy into resolving the compatibility problems that usually arise during model integration.
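Deployment details vary widely between environments, but a common first step is persisting the trained model so the production app can load it. Here is a minimal sketch using joblib; the file name model.joblib is only a placeholder.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model (a stand-in for whatever model came out of the earlier phases)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the trained model to disk ("model.joblib" is only a placeholder path)
joblib.dump(model, "model.joblib")

# Inside the production app, reload the model and serve predictions
loaded_model = joblib.load("model.joblib")
print(loaded_model.predict(X[:5]))
```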
Monitoring and Optimization
Getting a working model and integrating it into an app is not the final phase. Monitoring the performance of the model is highly important if it is to keep performing well as it scales. Should any fault or loophole be detected during the monitoring phase, optimization of the model becomes necessary.
Thus, monitoring and optimization should be a continuous and periodic event in the life cycle of a machine learning model.
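As a rough sketch of what such a periodic check might look like in code, the helper below compares the model's accuracy on recently labeled production data against a threshold and flags it for optimization; the threshold value and function name are hypothetical.

```python
from sklearn.metrics import accuracy_score

# Hypothetical accuracy level below which the model should be revisited
ACCURACY_THRESHOLD = 0.85

def needs_retraining(model, X_recent, y_recent):
    """Check the model against recently labeled production data and flag it
    for optimization if its accuracy has drifted below the threshold."""
    live_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    return live_accuracy < ACCURACY_THRESHOLD
```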