EDA ~ Unemployment Rate

Today, let's explore the unemployment rate across the countries of the world. The dataset we will be using is "Unemployment data - World wide Figures", available on the Kaggle platform. It records the unemployment rate of each country for 31 years, from 1991 to 2021. Its columns are Country Name, Country Code, and one column per year from 1991 to 2021.


1. Import important libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

2. Reading the CSV file

df = pd.read_csv("/content/unemployment analysis.csv")
df.head()

3. Data Summary

I/P:
df.columns
O/P:
Index(['Country Name', 'Country Code', '1991', '1992', '1993', '1994', '1995','1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013','2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
dtype='object')
The info() method prints a concise summary of the DataFrame: the index range, the number of columns, the column labels, each column's data type and count of non-null values, and the memory usage.
df.info()
Observations:
* There are no missing values
* There are 33 columns in total
* There are 235 rows, each holding a country name, a country code, and the unemployment rates from 1991 to 2021
* There are 2 categorical features, Country Name and Country Code; the rest are numeric features.
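
The same observations can be checked in code instead of being read off the info() printout. Here is a minimal sketch, assuming a small made-up frame (`sample`, with fictional countries and values) standing in for the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset: same layout, made-up values
sample = pd.DataFrame({
    "Country Name": ["Freedonia", "Borduria", "Syldavia"],
    "Country Code": ["FRE", "BOR", "SYL"],
    "1991": [5.0, 7.5, 3.2],
    "1992": [5.5, 7.0, 3.0],
})

n_rows, n_cols = sample.shape                                     # (rows, columns)
categorical = sample.select_dtypes(exclude="number").columns.tolist()
numeric = sample.select_dtypes(include="number").columns.tolist()
total_missing = int(sample.isna().sum().sum())                    # missing cells overall

print(n_rows, n_cols)
print("Categorical:", categorical)
print("Numeric:", numeric)
print("Missing cells:", total_missing)
```

On the real data, replacing `sample` with `df` should report 235 rows, 33 columns, and the two categorical columns listed above.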

4. Finding and Removing the missing values

The isna() function is used to detect missing values.

I/P:
df.isna().sum()
O/P:
Country Name    0
Country Code    0
1991            0
1992            0
1993            0
1994            0
1995            0
1996            0
1997            0
1998            0
1999            0
2000            0
2001            0
2002            0
2003            0
2004            0
2005            0
2006            0
2007            0
2008            0
2009            0
2010            0
2011            0
2012            0
2013            0
2014            0
2015            0
2016            0
2017            0
2018            0
2019            0
2020            0
2021            0
dtype: int64
Observations:
* There are no missing values
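
This dataset has nothing to remove, but for completeness here is a rough sketch of what the removal step could look like when gaps do exist. The frame below is made up (fictional countries, one deliberately missing value), and dropping rows is only one possible strategy:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap, to show what removal would look like
sample = pd.DataFrame({
    "Country Name": ["Freedonia", "Borduria", "Syldavia"],
    "2020": [6.1, np.nan, 4.4],
    "2021": [5.8, 7.2, 4.0],
})

missing_per_column = sample.isna().sum()            # count of NaN per column
cleaned = sample.dropna().reset_index(drop=True)    # drop rows with any NaN, renumber rows

print(missing_per_column)
print(cleaned)
```

Dropping a whole row discards that country's other years as well, so for a time series like this one, filling or interpolating the gap may be the better choice.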

Top 10 Countries with highest Unemployment rate

I/P:
top_10 = df.groupby(by='Country Name')['2021'].sum().sort_values(ascending=False).head(10)
top_10
O/P:
Country Name
South Africa 33.56
Djibouti 28.39
Eswatini 25.76
West Bank and Gaza 24.90
Botswana 24.72
Lesotho 24.60
Congo, Rep. 23.01
Gabon 22.26
Namibia 21.68
St. Vincent and the Grenadines 21.62
Name: 2021, dtype: float64

Top 10 Countries with the lowest Unemployment rate

I/P:
# Named bottom_10 because these are the ten lowest rates
bottom_10 = df.groupby(by='Country Name')['2021'].sum().sort_values(ascending=True).head(10)
bottom_10
O/P:
Country Name
Qatar 0.26
Cambodia 0.61
Niger 0.75
Solomon Islands 1.03
Lao PDR 1.26
Thailand 1.42
Benin 1.57
Rwanda 1.61
Burundi 1.79
Bahrain 1.87
Name: 2021, dtype: float64
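
Both rankings can also be written without groupby and sort_values. This is a small sketch, assuming each country appears in exactly one row (so no aggregation is needed) and using a made-up four-row frame:

```python
import pandas as pd

# Hypothetical stand-in: fictional countries, made-up 2021 rates
sample = pd.DataFrame({
    "Country Name": ["Freedonia", "Borduria", "Syldavia", "Latveria"],
    "2021": [5.8, 27.2, 0.9, 12.5],
})

rates_2021 = sample.set_index("Country Name")["2021"]
highest = rates_2021.nlargest(2)     # largest values first
lowest = rates_2021.nsmallest(2)     # smallest values first

print(highest)
print(lowest)
```

On the full dataset, `nlargest(10)` and `nsmallest(10)` would play the role of the two top-10 lists.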

A low unemployment rate does not necessarily mean that a country's citizens are well-off. Living standards are more commonly gauged by measures such as GDP per capita. - Source

Visualizing the Unemployment rate of the world in the year 1991

# Color by the 1991 column so the map matches its title
fig = px.choropleth(df, locations='Country Name', locationmode='country names', color='1991',
                    hover_name='Country Name', title='1991 Unemployment rate',
                    color_continuous_scale='aggrnyl')
fig.show()

Visualizing the Unemployment rate of the world in the year 2021

Switching rows and Columns

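# Drop the non-numeric code column so every row of the transposed frame is a year
df = df.drop(columns="Country Code").set_index("Country Name").transpose()
df.index.names = ["Year"]
df.head()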
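
To see what the switch does, here is a small sketch on a made-up two-country frame. It drops the Country Code column before transposing, since otherwise the codes would become a text row among the year rows and turn every column into the object dtype:

```python
import pandas as pd

# Hypothetical wide frame: one row per country, one column per year
wide = pd.DataFrame({
    "Country Name": ["Freedonia", "Borduria"],
    "Country Code": ["FRE", "BOR"],
    "1991": [5.0, 7.5],
    "1992": [5.5, 7.0],
})

by_year = wide.drop(columns="Country Code").set_index("Country Name").transpose()
by_year.index.names = ["Year"]      # rows are now years, columns are countries

print(by_year)
print(by_year.dtypes)
```

After the transpose, a single value is addressed as `by_year.loc["1992", "Borduria"]`, year first and country second.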

Number of Countries:

I/P:
print(df.columns.tolist())
O/P:
['Africa Eastern and Southern', 'Afghanistan', 'Africa Western and Central', 'Angola', 'Albania', 'Arab World', 'United Arab Emirates', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh', 'Bulgaria', 'Bahrain', 'Bahamas, The', 'Bosnia and Herzegovina', 'Belarus', 'Belize', 'Bolivia', 'Brazil', 'Barbados', 'Brunei Darussalam', 'Bhutan', 'Botswana', 'Central African Republic', 'Canada', 'Central Europe and the Baltics', 'Switzerland', 'Channel Islands', 'Chile', 'China', "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.', 'Congo, Rep.', 'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica', 'Caribbean small states', 'Cuba', 'Cyprus', 'Czech Republic', 'Germany', 'Djibouti', 'Denmark', 'Dominican Republic', 'Algeria', 'East Asia & Pacific (excluding high income)', 'Early-demographic dividend', 'East Asia & Pacific', 'Europe & Central Asia (excluding high income)', 'Europe & Central Asia', 'Ecuador', 'Egypt, Arab Rep.', 'Euro area', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'European Union', 'Fragile and conflict affected situations', 'Finland', 'Fiji', 'France', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia, The', 'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Guatemala', 'Guam', 'Guyana', 'High income', 'Hong Kong SAR, China', 'Honduras', 'Heavily indebted poor countries (HIPC)', 'Croatia', 'Haiti', 'Hungary', 'IBRD only', 'IDA & IBRD total', 'IDA total', 'IDA blend', 'Indonesia', 'IDA only', 'India', 'Ireland', 'Iran, Islamic Rep.', 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan', 'Kazakhstan', 'Kenya', 'Kyrgyz Republic', 'Cambodia', 'Korea, Rep.', 'Kuwait', 'Latin America & Caribbean (excluding high income)', 'Lao PDR', 'Lebanon', 'Liberia', 'Libya', 'St. Lucia', 'Latin America & Caribbean', 'Least developed countries: UN classification', 'Low income', 'Sri Lanka', 'Lower middle income', 'Low & middle income', 'Lesotho', 'Late-demographic dividend', 'Lithuania', 'Luxembourg', 'Latvia', 'Macao SAR, China', 'Morocco', 'Moldova', 'Madagascar', 'Maldives', 'Middle East & North Africa', 'Mexico', 'Middle income', 'North Macedonia', 'Mali', 'Malta', 'Myanmar', 'Middle East & North Africa (excluding high income)', 'Montenegro', 'Mongolia', 'Mozambique', 'Mauritania', 'Mauritius', 'Malawi', 'Malaysia', 'North America', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'New Zealand', 'OECD members', 'Oman', 'Other small states', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Papua New Guinea', 'Poland', 'Pre-demographic dividend', 'Puerto Rico', "Korea, Dem. People's Rep.", 'Portugal', 'Paraguay', 'West Bank and Gaza', 'Pacific island small states', 'Post-demographic dividend', 'French Polynesia', 'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'South Asia', 'Saudi Arabia', 'Sudan', 'Senegal', 'Singapore', 'Solomon Islands', 'Sierra Leone', 'El Salvador', 'Somalia', 'Serbia', 'Sub-Saharan Africa (excluding high income)', 'South Sudan', 'Sub-Saharan Africa', 'Small states', 'Sao Tome and Principe', 'Suriname', 'Slovak Republic', 'Slovenia', 'Sweden', 'Eswatini', 'Syrian Arab Republic', 'Chad', 'East Asia & Pacific (IDA & IBRD countries)', 'Europe & Central Asia (IDA & IBRD countries)', 'Togo', 'Thailand', 'Tajikistan', 'Turkmenistan', 'Latin America & the Caribbean (IDA & IBRD countries)', 'Timor-Leste', 'Middle East & North Africa (IDA & IBRD countries)', 'Tonga', 'South Asia (IDA & IBRD)', 'Sub-Saharan Africa (IDA & IBRD countries)', 'Trinidad and Tobago', 'Tunisia', 'Turkiye', 'Tanzania', 'Uganda', 'Ukraine', 'Upper middle income', 'Uruguay', 'United States', 'Uzbekistan', 'St. Vincent and the Grenadines', 'Venezuela, RB', 'Virgin Islands (U.S.)', 'Vietnam', 'Vanuatu', 'World', 'Samoa', 'Yemen, Rep.', 'South Africa', 'Zambia', 'Zimbabwe']
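
The printed list mixes individual countries with aggregate groups such as 'World', 'Arab World' and 'High income', so its length is not quite the number of countries. Here is a rough sketch of counting with the aggregates left out, assuming a hand-picked `aggregates` list (hypothetical and deliberately incomplete) and six names taken from the output above:

```python
import pandas as pd

# Six names copied from the printed list; the real frame has many more
columns = pd.Index(["Afghanistan", "Arab World", "Brazil", "High income", "World", "Zimbabwe"])

# Hand-picked aggregate names (an assumption; the real list would be longer)
aggregates = ["Arab World", "High income", "World"]

countries_only = columns[~columns.isin(aggregates)]   # keep names not marked as aggregates

print("All entries:", len(columns))
print("Countries only:", len(countries_only))
```

On the transposed frame, `len(df.columns)` gives the total count, and the same `isin` filter narrows it to countries once the aggregate list is complete.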

Unemployment Rates for the World’s major Economies as of 1999

I/P:
Country = ["United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Italy", "Canada"]
for i in Country:
    # Columns are countries and the index holds the years
    print(f'{i} ~~~~> {df[i]["1999"]}')
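
The same lookup can be done in one step with label-based selection rather than indexing one country at a time. This is a minimal sketch on a made-up frame that is already in the year-by-country layout (fictional countries and values):

```python
import pandas as pd

# Hypothetical transposed frame: years as the index, countries as columns
by_year = pd.DataFrame(
    {"Freedonia": [5.0, 5.5], "Borduria": [7.5, 7.0], "Syldavia": [3.2, 3.0]},
    index=pd.Index(["1998", "1999"], name="Year"),
)

selected = ["Freedonia", "Syldavia"]
rates_1999 = by_year.loc["1999", selected]    # Series indexed by country

for name, rate in rates_1999.items():
    print(f"{name} ~~~~> {rate}")
```

With the real data, `df.loc["1999", Country]` would return all nine rates as a single Series.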

Effect of 2008 recession on World’s major economies

I/P:
Country = ["United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Italy", "Canada"]
for i in Country:
    print(i)
    # 2007 is the last full year before the recession, 2009 the first full year after it began
    print("Before 2008 Recession", df[i]["2007"])
    print("After 2008 Recession", df[i]["2009"])
    print()
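
Printing the two years side by side leaves the reader to do the subtraction. A small sketch of computing the change directly, assuming a made-up frame in the year-by-country layout (fictional countries and values):

```python
import pandas as pd

# Hypothetical transposed frame: years as the index, countries as columns
by_year = pd.DataFrame(
    {"Freedonia": [4.0, 4.5, 6.5], "Borduria": [8.0, 7.5, 7.0]},
    index=pd.Index(["2007", "2008", "2009"], name="Year"),
)

# Pick the two years, then flip so that each country is one row
comparison = by_year.loc[["2007", "2009"]].transpose()
comparison["change"] = comparison["2009"] - comparison["2007"]   # percentage points

print(comparison)
```

A positive change means unemployment rose over the period, and a negative one means it fell.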

Effect of COVID-19 pandemic on World’s major economies

for i in Country:
    print(i)
    print("Before Pandemic", df[i]["2018"])
    print("After Pandemic", df[i]["2021"])
    print()
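
Comparing only 2018 with 2021 can hide what happened in between, for example a spike in 2020. Here is a rough sketch that looks at the whole window instead, assuming a made-up frame in the year-by-country layout (fictional countries and values):

```python
import pandas as pd

# Hypothetical transposed frame: years as the index, countries as columns
by_year = pd.DataFrame(
    {
        "Freedonia": [4.2, 4.0, 3.8, 6.9, 5.1],
        "Borduria": [7.1, 7.0, 6.8, 9.5, 8.0],
    },
    index=pd.Index(["2017", "2018", "2019", "2020", "2021"], name="Year"),
)

# Label-based slicing includes both end points
pandemic_window = by_year.loc["2018":"2021"]
peak_year = pandemic_window.idxmax()     # year with the highest rate, per country

print(pandemic_window)
print(peak_year)
```

The slice works here because the year labels are sorted; on the real data, `df.loc["2018":"2021", Country]` would give the same view for the nine economies.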
