What is Machine Learning?
Machine Learning is the process of teaching machines to understand data and draw conclusions from it, either by logically grouping the data or by predicting the nature of new data fed to them.
What are different types of Machine Learning?
There are two basic types of Machine Learning.
- Supervised Machine Learning
- Unsupervised Machine Learning
What is Supervised Machine Learning?
Supervised Machine Learning is the process of training the machines to predict a feature or an aspect of the new incoming data based on the existing data.
Why is it called Supervised Machine Learning?
In the Supervised Machine Learning technique, the existing data already records the known values of the feature or aspect you are going to predict. These known values, called labels, supervise the training.
Thus in Supervised Machine Learning, the data is structured and properly labeled.
What type of data does a Supervised Machine Learning model require for training?
Supervised Machine Learning models typically need Structured Data.
What do you mean by structured data?
Structured data is well-organized, properly formatted data that can be readily processed and analyzed. It is also easy to read with the naked eye, which makes it quick to visualize and draw conclusions from.
What are the Supervised Machine Learning Approaches?
Classification and Regression are two basic Supervised Machine Learning approaches.
What do you mean by Classification?
Classification is the process of identifying the group to which a particular observation belongs. It is used to predict categorical data.
Can you give an example of Classification?
Suppose you are teaching a 5-year-old boy to identify fruits. You explain various aspects of each fruit, such as Shape, Size, Weight and Color, along with the Name of the fruit.
Once you are done teaching the boy, you give him a fruit and ask him to name it. The boy names the fruit based on the training you gave him. This ability to recognize the fruit based on what he was taught is called Classification.
The words used in this example map to machine learning terminology as follows:
| Words Used in Example | Machine Learning Terminology |
| Teaching, Explaining the boy | Training the model |
| Shape, Size, Weight, Color, Name of the Fruit | |
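The fruit example can be sketched as a tiny nearest-neighbor classifier. This is only an illustrative sketch under stated assumptions: the feature values, the fruit names and the `classify` helper are hypothetical and not part of the original example. It shows the idea of training data (features plus a name) being used to name a new fruit.

```python
import math

# Hypothetical training data: (size in cm, weight in g) -> fruit name.
# The numbers are made up for illustration only.
TRAINING_DATA = [
    ((7.0, 150.0), "apple"),
    ((7.5, 170.0), "apple"),
    ((18.0, 120.0), "banana"),
    ((20.0, 130.0), "banana"),
    ((2.0, 5.0), "grape"),
    ((2.5, 6.0), "grape"),
]


def classify(features, training_data=TRAINING_DATA):
    """Return the name of the training fruit whose features are closest."""
    _, nearest_label = min(
        training_data,
        key=lambda pair: math.dist(pair[0], features),
    )
    return nearest_label


if __name__ == "__main__":
    # A new fruit the "boy" has not seen before.
    print(classify((7.2, 160.0)))
    print(classify((19.0, 125.0)))
```

In a real setting the features would usually be scaled first, since weight in grams would otherwise dominate size in centimeters when distances are compared.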
What do you mean by Regression?
Regression models the relationship between the features in the data provided and a numerical outcome, and uses that relationship to predict the numerical outcome for new data.
Can you give an example of Regression?
Suppose you have data of Temp High and Temp Low for the past 10 years with the following information.
| Temp Low (deg Celsius) | Temp High (deg Celsius) |
In this example, the data is structured and supervised: the recorded Temp High and Temp Low values are the ground truth. Based on this data, you may want to predict the highest temperature tomorrow, or even the Temp High for each of the next 7 days.
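As a rough illustration of Regression, the sketch below fits a straight line through a handful of temperature readings using ordinary least squares and then predicts Temp High from Temp Low. The readings, the choice of a straight-line model and the names `fit_line` and `predict` are assumptions made for this sketch; real temperature forecasting would need far more data and features.

```python
# Hypothetical readings in degrees Celsius, invented for illustration.
TEMP_LOW = [18.0, 20.0, 22.0, 24.0, 26.0]
TEMP_HIGH = [29.0, 31.0, 32.0, 35.0, 36.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the numerical outcome for a new input value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line(TEMP_LOW, TEMP_HIGH)
    # Predict Temp High for a day whose Temp Low is 23 degrees.
    print(predict(23.0, slope, intercept))
```

The output is a number rather than a category, which is what distinguishes Regression from Classification.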
What is an Observation?
Aspects, features or attributes are the properties of one particular Observation. A set of observations forms a supervised dataset.
In the example of predicting the highest temperature from the past 10 years of data, four aspects are captured for each day: Date, Season, Temp High and Temp Low. The aspects recorded for one date together make up an observation.
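One way to picture an observation in code is as a record with one field per aspect. The sketch below assumes a Python dataclass named `Observation`; the field names and the sample values are hypothetical and only mirror the four aspects named in the text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One day's record: each field is one aspect (feature)."""
    day: date
    season: str
    temp_high_c: float
    temp_low_c: float


# A dataset is simply a collection of observations.
# The values below are invented for illustration.
dataset = [
    Observation(date(2020, 1, 1), "winter", 24.0, 12.0),
    Observation(date(2020, 5, 1), "summer", 39.0, 27.0),
    Observation(date(2020, 8, 1), "monsoon", 30.0, 23.0),
]

if __name__ == "__main__":
    for observation in dataset:
        print(observation)
```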
What is the role of a Subject Matter Expert with regard to the data used for training in Supervised Machine Learning?
A Subject Matter Expert plays a major role in the Supervised Machine Learning approach. The expert:
- Ensures the data that is being fed to the Machine Learning model is valid.
- Ensures the correctness of the data.
- Enhances the predictive power of the model by providing more data scenarios for training.
- Validates the Supervised Machine Learning model’s prediction performance on the new data.
What is Unsupervised Machine Learning?
Unsupervised Machine Learning is the process of finding patterns in data that has no labels and is often unstructured.
Why is it called Unsupervised Machine Learning?
In the Unsupervised Machine Learning technique, there are no labels or other clues about the data being processed. Generally you do not know anything about the nature of the data, what to do with it, or where to start with a huge amount of unstructured data. Unsupervised Machine Learning is used to find patterns in such data or to group it logically.
What is Unstructured data?
Unstructured data is unorganized data that does not follow any predefined format. It is hard to read and process.
What is the role of a Subject Matter Expert with regard to the data fed to an Unsupervised Machine Learning model?
In the Unsupervised Machine Learning approach, there is no concept of a Subject Matter Expert.
In this approach, there is no such thing as a valid or invalid outcome. The unsupervised machine learning algorithms simply discover the hidden patterns in the data.
What are the Unsupervised Machine Learning Approaches?
Clustering and Association are the two unsupervised machine learning approaches.
What happens in Clustering?
Clustering, as the name indicates, forms clusters or groups from the chunk of unstructured data provided.
Clustering follows two basic steps:
- Identify similar data using a similarity measure.
- Group the data into clusters.
What are the various measures to find similar data?
To decide whether any two given documents are similar, the following similarity and distance measures are the most familiar:
- Cosine Similarity
- Jaccard Similarity
- Euclidean Distance
- Manhattan Distance
- Hamming Distance
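The measures listed above can be sketched in a few lines each. The function names below are chosen for this sketch and the inputs are assumed to be numeric vectors of equal length (cosine, Euclidean, Manhattan), sets of tokens (Jaccard) or equal-length sequences (Hamming). Note that the first two return a similarity (higher means more alike) while the last three return a distance (lower means more alike).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(cosine_similarity([1, 2], [2, 4]))
    print(jaccard_similarity({"python", "sql"}, {"python", "java"}))
    print(euclidean_distance((0, 0), (3, 4)))
    print(manhattan_distance((0, 0), (3, 4)))
    print(hamming_distance("karolin", "kathrin"))
```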
What are the various approaches to group the data into Clusters?
Clustering can be performed with approaches based on:
- Centroid (k-Means Clustering)
- Connectivity (Hierarchical Clustering)
- Density (DBSCAN, OPTICS)
- Distribution (Expectation Maximization)
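To make the centroid-based approach concrete, here is a minimal k-Means sketch in plain Python. It assumes small 2-D points and, as a simplification, uses the first k points as the initial centroids; library implementations typically use smarter initialization. The sample points and function names are hypothetical.

```python
import math

# Hypothetical 2-D points forming two visible groups.
POINTS = [
    (1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
    (9.0, 9.5), (1.0, 0.5), (8.5, 9.0),
]


def assign_to_clusters(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(point, centroids[i]),
        )
        clusters[nearest].append(point)
    return clusters


def k_means(points, k, max_iterations=100):
    """Return (centroids, clusters) after iterating until stable."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Simplification: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        clusters = assign_to_clusters(points, centroids)
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep old centroid if cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, assign_to_clusters(points, centroids)


if __name__ == "__main__":
    centroids, clusters = k_means(POINTS, k=2)
    for centroid, cluster in zip(centroids, clusters):
        print(centroid, cluster)
```

The loop alternates the two steps described earlier: measure similarity (here, Euclidean distance to each centroid) and then regroup the data.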
Can you explain Clustering by example?
Suppose you are a recruiter at a job consulting firm. You end up with a lot of CVs that are not organized by technology or expertise. Your job is to group the CVs that are logically related to each other: for example, all Software Engineer CVs in one group, all Scrum Master CVs in another, and so on. In this situation, a clustering approach helps you find similar CVs and group them logically.
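The CV scenario can be sketched with a very simple grouping scheme: each CV is reduced to a set of skill keywords, and a CV joins the first group whose first member is similar enough by Jaccard similarity, otherwise it starts a new group. This greedy scheme is a stand-in chosen for brevity, not one of the named clustering algorithms, and the CV names, keywords and threshold are all hypothetical.

```python
# Hypothetical CVs, each reduced to a non-empty set of skill keywords.
CVS = {
    "cv_a": {"python", "java", "git", "testing"},
    "cv_b": {"scrum", "jira", "agile", "coaching"},
    "cv_c": {"java", "git", "docker", "testing"},
    "cv_d": {"agile", "scrum", "kanban", "jira"},
}


def jaccard(a, b):
    """Shared keywords divided by all keywords across both CVs."""
    return len(a & b) / len(a | b)


def group_cvs(cvs, threshold=0.3):
    """Greedily group CVs by similarity to each group's first member."""
    groups = []  # each group is a list of (name, skills) pairs
    for name, skills in cvs.items():
        for group in groups:
            _, leader_skills = group[0]
            if jaccard(skills, leader_skills) >= threshold:
                group.append((name, skills))
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([(name, skills)])
    return [[name for name, _ in group] for group in groups]


if __name__ == "__main__":
    print(group_cvs(CVS))
```

The result depends on the order of the CVs and on the threshold, which is one reason the dedicated clustering approaches are usually preferred in practice.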
What do you mean by Association?
Association is the process of discovering patterns in the behavior of the data provided, such as items that tend to occur together.
Can you explain Association by example?
Consider consumer purchase behavior at D-Mart. The number of items purchased varies from one consumer to the next.
The data about the number and type of items purchased is therefore highly unorganized. Even so, it is possible to identify patterns in consumers’ purchase behavior.
Consider a set of consumers who bought utensils such as plates, bowls, spoons and other cutlery, along with door curtains and door mats. It is noticed that the majority of the consumers with these items in their purchase list have also bought plastic chairs.
Utensils and cutlery belong to, say, the kitchen category, while door curtains and door mats belong to home décor. Identifying that these consumers tend to buy a plastic chair along with the kitchen and home décor items in their purchase list is the job of Association rules mining.
With this approach, the associated item need not come from a similar category, such as kitchen or home décor in this example. A plausible explanation for this behavior is that the majority of these consumers are newly married and are setting up a new home, so they are buying the basic items needed to start their home.
The Association rules mining algorithm helped the store manager identify this hidden pattern, and the store could then place the chairs in the vicinity of the utensils and home décor.
PS: The D-Mart example is just for illustration purposes.
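Association rules are commonly scored with support (how often the items appear together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both for the rule "plates and door mat implies plastic chair". The baskets are invented for illustration and the function names are chosen for this sketch; a full mining algorithm would also search for candidate rules rather than test a single one.

```python
# Hypothetical shopping baskets, one set of items per consumer.
BASKETS = [
    {"plates", "spoons", "door mat", "plastic chair"},
    {"plates", "door mat", "door curtain", "plastic chair"},
    {"plates", "door mat", "plastic chair"},
    {"plates", "door mat"},
    {"rice", "soap"},
]


def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets)


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets with the antecedent, the fraction that also
    contain the consequent."""
    with_antecedent = count_containing(baskets, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies in this data
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / with_antecedent


if __name__ == "__main__":
    rule_from = {"plates", "door mat"}
    rule_to = {"plastic chair"}
    print(support(BASKETS, rule_from | rule_to))
    print(confidence(BASKETS, rule_from, rule_to))
```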