Dataset: Candy Hierarchy Dataset 2017
Packages Used: pandas, FuzzyWuzzy, NumPy, country_list
Coding Application: Started in Jupyter Notebook, then moved on to PyCharm
Github link to the code: https://github.com/adithya9201/candyhierarchy/blob/master/candy.py
Data Cleaning – Candy Hierarchy 2017
Here is the link to the dataset, in case you guys want it.
Contents of the Dataset:
The Candy Hierarchy 2017 dataset contains the following variables:
Internal_ID: A unique identifier for every record in the database, so it is the natural choice for the index of this dataset.
Going Out?: Binary field, but we can see that there are a lot of missing values.
Gender: Four different options, again with missing values.
Age: Numerical field, with missing values.
Country: Text field, but users have typed their own versions of the names. For example, America appears as USA, us, US, and America.
State/Province: Text field with the same problem as Country: users have typed their own versions of the names.
Joy Or Despair: One question per kind of chocolate bar, each with three distinct options to choose from (Joy, Meh, Despair).
Joy Other: Text Field. Lots of missing values.
Despair Other: Text Field. Lots of missing values.
Other Comments: Text Field. Lots of missing values.
Dress: Binary field. Missing values present.
Day: Binary field. Missing values present.
Media: Images and click coordinates are specified.
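To make the field descriptions above concrete, here is a minimal sketch of the first cleaning steps they suggest: setting Internal_ID as the index, counting missing values, and collapsing the different spellings of a country into one name. The sample rows, the `ALIASES` table, the `CANONICAL` list, and the `normalize_country` helper are all made up for illustration and are not taken from candy.py. The original code uses FuzzyWuzzy for matching; this sketch swaps in the standard library's `difflib` so it runs with only pandas and NumPy installed.

```python
import difflib

import numpy as np
import pandas as pd

# Hypothetical sample rows. The column names follow the field list above,
# but the values are invented for this example.
raw = pd.DataFrame(
    {
        "Internal_ID": [101, 102, 103, 104],
        "Going Out?": ["Yes", np.nan, "No", np.nan],
        "Age": [34, 51, np.nan, 27],
        "Country": ["USA", "us", "America", "Canda"],
    }
)

# Internal_ID is unique per record, so use it as the index.
df = raw.set_index("Internal_ID")

# Count the missing values in every column.
missing = df.isna().sum()

# Assumed lookup tables: short aliases that fuzzy matching handles poorly,
# plus the canonical names we want to end up with.
ALIASES = {"usa": "United States", "us": "United States", "america": "United States"}
CANONICAL = ["United States", "Canada"]


def normalize_country(value):
    """Map a user-typed country to a canonical name where possible."""
    if pd.isna(value):
        return value  # keep missing values as they are
    text = str(value).strip()
    alias = ALIASES.get(text.lower())
    if alias is not None:
        return alias
    # Fall back to approximate matching to catch typos such as "Canda".
    close = difflib.get_close_matches(text.title(), CANONICAL, n=1, cutoff=0.8)
    return close[0] if close else value


df["Country"] = df["Country"].map(normalize_country)

print(missing)
print(df["Country"].tolist())
```

The alias table handles short forms like "us" explicitly because approximate string matching is unreliable on two- or three-letter inputs; the fuzzy step is left for longer entries with typos.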