Data Collection and Preparation

The dataset used for prediction is not publicly available online; it was obtained privately from an aluminium company and can be found in the project repository. It is a collection of sensor values recorded over a period of time, with a label "Good/Bad", where "Bad" marks a bad cycle after which maintenance is required. The goal is to correctly predict whether a cycle is "Bad" or not. Since the first assignment requires using an API to collect data, I have used the Spotify API as a substitute.


Data Collection using Spotify Web API

Here is the code snippet I used to create an access token, which is then used to send requests to an endpoint:
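A minimal sketch of such a token request, assuming the Client Credentials flow of the Spotify Web API and using only the Python standard library. The helper names `build_token_request` and `get_access_token` are hypothetical, and the client ID and secret are placeholders to be replaced with the values from your own Spotify developer app; this is not necessarily the exact code used in the project.

```python
import base64
import json
import urllib.parse
import urllib.request

# Token endpoint of the Spotify Accounts service.
TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the POST request for the Client Credentials flow (hypothetical helper)."""
    # The client ID and secret are sent as a Base64-encoded "id:secret" pair.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")


def get_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the access token from the JSON response."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return payload["access_token"]
```

The returned token would then be sent with each API request in an `Authorization: Bearer <token>` header.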


Further, here is the code snippet that parses the data from JSON format into a DataFrame:
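As an illustration of this parsing step, the sketch below flattens a JSON response into a pandas DataFrame with `pd.json_normalize`. The response shape (a `tracks.items` list, as returned by a track search) and the sample values are assumptions made up for the example, since the report does not state which endpoint was queried; `response_to_frame` is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample payload, assumed to mimic a track search response.
sample_response = {
    "tracks": {
        "items": [
            {"name": "Track A", "popularity": 71, "duration_ms": 201000,
             "album": {"name": "Album X"}},
            {"name": "Track B", "popularity": 55, "duration_ms": 187500,
             "album": {"name": "Album Y"}},
        ]
    }
}


def response_to_frame(response: dict) -> pd.DataFrame:
    """Flatten the list of track objects into one row per track."""
    items = response.get("tracks", {}).get("items", [])
    # Nested fields become dotted column names, e.g. "album.name".
    return pd.json_normalize(items)


tracks_df = response_to_frame(sample_response)
print(tracks_df[["name", "album.name", "popularity"]])
```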

Data Cleaning

In this section, data cleaning is performed on the original dataset. Let's start by observing a subset of the data.

From the above image it can be inferred that most of the columns are numerical. Here is the summary of variables:

From the above image it is clear that the features and their data types are mismatched: numerical columns are stored with the object data type in pandas, which holds strings. Let's check for null values in the data:
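The type mismatch and the null check can be sketched as follows. The column names `sensor_1` and `sensor_2` and the values are invented for illustration, as the real dataset's columns are not listed here. Note that `errors="coerce"` turns any string that cannot be parsed into `NaN`, so it makes sense to count nulls after the conversion.

```python
import pandas as pd

# Toy stand-in for the sensor data: numbers stored as strings.
raw = pd.DataFrame({
    "sensor_1": ["0.51", "0.48", "n/a", "0.55"],
    "sensor_2": ["101", "99", "100", None],
    "Good/Bad": ["Good", "Bad", "Good", None],
})

# Convert every feature column to a numeric dtype; leave the label untouched.
feature_cols = [c for c in raw.columns if c != "Good/Bad"]
cleaned = raw.copy()
cleaned[feature_cols] = cleaned[feature_cols].apply(pd.to_numeric, errors="coerce")

print(cleaned.dtypes)          # data type of each column after conversion
print(cleaned.isnull().sum())  # number of null values per column
```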

Analyzing the target variable is critical for any prediction problem.

After removing the rows with a null target, it is evident that the label "Good/Bad" (the target) contains some noise. After cleaning:
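One way to carry out this target cleaning is sketched below. The kind of noise shown (stray whitespace and inconsistent capitalisation) is an assumption for the sake of the example; the actual noise in the dataset may differ and need other rules.

```python
import pandas as pd

# Toy data with a noisy label column (values invented for illustration).
df = pd.DataFrame({
    "sensor_1": [0.51, 0.48, 0.47, 0.55, 0.60],
    "Good/Bad": ["Good", " bad", None, "GOOD ", "Bad"],
})

# Drop rows where the target is missing.
df = df.dropna(subset=["Good/Bad"]).copy()

# Normalise the label text so that only "Good" and "Bad" remain.
df["Good/Bad"] = df["Good/Bad"].str.strip().str.capitalize()

# Any label outside the two expected classes would show up here.
unexpected = set(df["Good/Bad"]) - {"Good", "Bad"}
print(df["Good/Bad"].value_counts())
```

Checking `unexpected` after normalisation is a cheap guard: if it is not empty, the cleaning rules do not yet cover all the noise.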