EDA ~ Unemployment Rate
Today let's explore the unemployment rate across different countries of the world. The dataset we will be using is "Unemployment data - World wide Figures", available on the Kaggle platform. It covers 31 years of unemployment rates for each country, from 1991 to 2021. The columns of the dataset are Country Name, Country Code, and one column per year from 1991 to 2021.
Note: For quick Pandas revision you can refer to this blog : Tutorial: Pandas
The unemployment rate is the number of unemployed people in a country divided by the total number of people in the civilian labor force, usually expressed as a percentage.
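As a quick illustration of the formula, here is a minimal sketch. The function name unemployment_rate and the figures passed to it are made up for the example and do not come from the dataset.

```python
def unemployment_rate(unemployed: float, labor_force: float) -> float:
    """Return the unemployment rate as a percentage of the labor force."""
    if labor_force <= 0:
        raise ValueError("labor_force must be positive")
    return unemployed / labor_force * 100


# Hypothetical figures: 6 million unemployed in a labor force of 160 million.
rate = unemployment_rate(6_000_000, 160_000_000)
print(f"{rate:.2f}%")  # prints 3.75%
```

The values in the dataset are already expressed this way, so 33.56 means 33.56% of the labor force.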
1. Import important libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
2. Reading the CSV file
df = pd.read_csv("/content/unemployment analysis.csv")
df.head()
3. Data Summary
I/P:
df.columns
O/P:
Index(['Country Name', 'Country Code', '1991', '1992', '1993', '1994', '1995','1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013','2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021'],
dtype='object')
- The info() method prints a summary of the DataFrame: the index range, the column labels and data types, the number of non-null values in each column, and the memory usage.
df.info()
Observations:
* There are no missing values
* There are 33 columns in total
* There are 235 rows, each with a country name, a country code, and the unemployment rates from 1991 to 2021
* There are 2 categorical features, Country Name and Country Code; the rest are numeric features.
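The observations above can also be checked programmatically rather than read off the info() printout. The sketch below uses a tiny stand-in DataFrame with invented country names and numbers; on the real data you would call the same methods on df.

```python
import pandas as pd

# Toy stand-in for the real file; names and numbers are invented for illustration.
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "Country Code": ["AAA", "BBB"],
    "1991": [5.1, 7.3],
    "1992": [5.4, 7.0],
})

# shape gives (rows, columns) in one call.
n_rows, n_cols = toy.shape

# Split the columns into non-numeric (categorical) and numeric features.
categorical = toy.select_dtypes(exclude="number").columns.tolist()
numeric = toy.select_dtypes(include="number").columns.tolist()

print(n_rows, n_cols)
print(categorical)
print(numeric)
```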
4. Finding and Removing the missing values
The isna() function is used to detect missing values.
I/P:
df.isna().sum()
O/P:
Country Name    0
Country Code    0
1991            0
1992            0
1993            0
1994            0
1995            0
1996            0
1997            0
1998            0
1999            0
2000            0
2001            0
2002            0
2003            0
2004            0
2005            0
2006            0
2007            0
2008            0
2009            0
2010            0
2011            0
2012            0
2013            0
2014            0
2015            0
2016            0
2017            0
2018            0
2019            0
2020            0
2021            0
dtype: int64
Observations:
* There are no missing values
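Since every count is zero, nothing has to be removed from this dataset. If a dataset did contain gaps, the removal step could look like the sketch below. The DataFrame here is invented for the example, with NaN values placed by hand.

```python
import numpy as np
import pandas as pd

# Invented data with two deliberate gaps.
toy = pd.DataFrame({
    "Country Name": ["Country A", "Country B", "Country C"],
    "2020": [4.2, np.nan, 6.1],
    "2021": [4.5, 3.9, np.nan],
})

# Count missing values per column, as in the step above.
missing_per_column = toy.isna().sum()

# Drop every row that has at least one missing value.
complete_rows = toy.dropna()

# Or drop rows only when a specific column is missing.
has_2021 = toy.dropna(subset=["2021"])

print(missing_per_column)
print(len(complete_rows), len(has_2021))
```

Dropping on a subset keeps more rows, which matters when only one year is needed for a given analysis.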
Top 10 Countries with highest Unemployment rate
I/P:
top_10 = df.groupby(by='Country Name')['2021'].sum().sort_values(ascending=False).head(10)
top_10
O/P:
Country Name
South Africa 33.56
Djibouti 28.39
Eswatini 25.76
West Bank and Gaza 24.90
Botswana 24.72
Lesotho 24.60
Congo, Rep. 23.01
Gabon 22.26
Namibia 21.68
St. Vincent and the Grenadines 21.62
Name: 2021, dtype: float64
Observations:
The countries with the highest unemployment rates include South Africa, Djibouti, and Eswatini.
Top 10 Countries with the lowest Unemployment rate
I/P:
bottom_10 = df.groupby(by='Country Name')['2021'].sum().sort_values(ascending=True).head(10)
bottom_10
O/P:
Country Name
Qatar 0.26
Cambodia 0.61
Niger 0.75
Solomon Islands 1.03
Lao PDR 1.26
Thailand 1.42
Benin 1.57
Rwanda 1.61
Burundi 1.79
Bahrain 1.87
Name: 2021, dtype: float64
Observations:
The countries that have the lowest unemployment rates are Qatar, Cambodia, and Niger.
Just because a country has a low unemployment rate does not mean its citizens are necessarily well-off; GDP per capita is a more direct indicator of that. - Source
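Because each country appears only once in the data, the two rankings can also be obtained without groupby. The sketch below uses nlargest() and nsmallest() on six 2021 values copied from the outputs above. The variable names are illustrative.

```python
import pandas as pd

# Six 2021 values copied from the two outputs above (three highest, three lowest).
sample = pd.DataFrame({
    "Country Name": ["South Africa", "Djibouti", "Eswatini", "Qatar", "Cambodia", "Niger"],
    "2021": [33.56, 28.39, 25.76, 0.26, 0.61, 0.75],
})

# Index by country so the result is labelled like the groupby output.
rates_2021 = sample.set_index("Country Name")["2021"]

highest = rates_2021.nlargest(3)   # sorted from highest to lowest
lowest = rates_2021.nsmallest(3)   # sorted from lowest to highest

print(highest)
print(lowest)
```

On the full DataFrame the same idea would be df.set_index('Country Name')['2021'].nlargest(10).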
Visualizing the Unemployment rate of the world in the year 1991
fig = px.choropleth(df, locations='Country Name', locationmode='country names', color='1991', hover_name='Country Name', title='1991 Unemployment rate',
                    color_continuous_scale='aggrnyl')
fig.show()
Visualizing the Unemployment rate of the world in the year 2021
fig = px.choropleth(df, locations='Country Name', locationmode='country names', color='2021', hover_name='Country Name', title='2021 Unemployment rate',
                    color_continuous_scale='aggrnyl')
fig.show()
Switching rows and Columns
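# Drop the code column first so that every row of the transposed frame is a year
df = df.drop(columns="Country Code").set_index("Country Name").transpose()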
df.index.names = ["Year"]
df.head()
Note:
The set_index() method allows one or more column values to become the row index.
Syntax: dataframe.set_index(keys, drop, append, inplace, verify_integrity)
The transpose() function reflects the DataFrame over its main diagonal, writing rows as columns and vice versa.
Source: W3Schools
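To see what the two calls do, here is a small sketch on an invented two-country table. The names wide and by_year are illustrative only.

```python
import pandas as pd

# Invented wide table: one row per country, one column per year.
wide = pd.DataFrame({
    "Country Name": ["Country A", "Country B"],
    "1991": [5.0, 7.0],
    "1992": [5.5, 6.5],
})

# Country names become the row index, then rows and columns are swapped.
by_year = wide.set_index("Country Name").transpose()
by_year.index.names = ["Year"]

# Each country is now a column, so a single value is looked up by (year, country).
value = by_year.loc["1992", "Country B"]

print(by_year)
print(value)
```

After the switch, selecting a country returns its whole time series, which is what the comparisons further down rely on.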
Number of Countries:
I/P:
print(df.columns.tolist())
O/P:
['Africa Eastern and Southern', 'Afghanistan', 'Africa Western and Central', 'Angola', 'Albania', 'Arab World', 'United Arab Emirates', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Burundi', 'Belgium', 'Benin', 'Burkina Faso', 'Bangladesh', 'Bulgaria', 'Bahrain', 'Bahamas, The', 'Bosnia and Herzegovina', 'Belarus', 'Belize', 'Bolivia', 'Brazil', 'Barbados', 'Brunei Darussalam', 'Bhutan', 'Botswana', 'Central African Republic', 'Canada', 'Central Europe and the Baltics', 'Switzerland', 'Channel Islands', 'Chile', 'China', "Cote d'Ivoire", 'Cameroon', 'Congo, Dem. Rep.', 'Congo, Rep.', 'Colombia', 'Comoros', 'Cabo Verde', 'Costa Rica', 'Caribbean small states', 'Cuba', 'Cyprus', 'Czech Republic', 'Germany', 'Djibouti', 'Denmark', 'Dominican Republic', 'Algeria', 'East Asia & Pacific (excluding high income)', 'Early-demographic dividend', 'East Asia & Pacific', 'Europe & Central Asia (excluding high income)', 'Europe & Central Asia', 'Ecuador', 'Egypt, Arab Rep.', 'Euro area', 'Eritrea', 'Spain', 'Estonia', 'Ethiopia', 'European Union', 'Fragile and conflict affected situations', 'Finland', 'Fiji', 'France', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Guinea', 'Gambia, The', 'Guinea-Bissau', 'Equatorial Guinea', 'Greece', 'Guatemala', 'Guam', 'Guyana', 'High income', 'Hong Kong SAR, China', 'Honduras', 'Heavily indebted poor countries (HIPC)', 'Croatia', 'Haiti', 'Hungary', 'IBRD only', 'IDA & IBRD total', 'IDA total', 'IDA blend', 'Indonesia', 'IDA only', 'India', 'Ireland', 'Iran, Islamic Rep.', 'Iraq', 'Iceland', 'Israel', 'Italy', 'Jamaica', 'Jordan', 'Japan', 'Kazakhstan', 'Kenya', 'Kyrgyz Republic', 'Cambodia', 'Korea, Rep.', 'Kuwait', 'Latin America & Caribbean (excluding high income)', 'Lao PDR', 'Lebanon', 'Liberia', 'Libya', 'St. Lucia', 'Latin America & Caribbean', 'Least developed countries: UN classification', 'Low income', 'Sri Lanka', 'Lower middle income', 'Low & middle income', 'Lesotho', 'Late-demographic dividend', 'Lithuania', 'Luxembourg', 'Latvia', 'Macao SAR, China', 'Morocco', 'Moldova', 'Madagascar', 'Maldives', 'Middle East & North Africa', 'Mexico', 'Middle income', 'North Macedonia', 'Mali', 'Malta', 'Myanmar', 'Middle East & North Africa (excluding high income)', 'Montenegro', 'Mongolia', 'Mozambique', 'Mauritania', 'Mauritius', 'Malawi', 'Malaysia', 'North America', 'Namibia', 'New Caledonia', 'Niger', 'Nigeria', 'Nicaragua', 'Netherlands', 'Norway', 'Nepal', 'New Zealand', 'OECD members', 'Oman', 'Other small states', 'Pakistan', 'Panama', 'Peru', 'Philippines', 'Papua New Guinea', 'Poland', 'Pre-demographic dividend', 'Puerto Rico', "Korea, Dem. People's Rep.", 'Portugal', 'Paraguay', 'West Bank and Gaza', 'Pacific island small states', 'Post-demographic dividend', 'French Polynesia', 'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'South Asia', 'Saudi Arabia', 'Sudan', 'Senegal', 'Singapore', 'Solomon Islands', 'Sierra Leone', 'El Salvador', 'Somalia', 'Serbia', 'Sub-Saharan Africa (excluding high income)', 'South Sudan', 'Sub-Saharan Africa', 'Small states', 'Sao Tome and Principe', 'Suriname', 'Slovak Republic', 'Slovenia', 'Sweden', 'Eswatini', 'Syrian Arab Republic', 'Chad', 'East Asia & Pacific (IDA & IBRD countries)', 'Europe & Central Asia (IDA & IBRD countries)', 'Togo', 'Thailand', 'Tajikistan', 'Turkmenistan', 'Latin America & the Caribbean (IDA & IBRD countries)', 'Timor-Leste', 'Middle East & North Africa (IDA & IBRD countries)', 'Tonga', 'South Asia (IDA & IBRD)', 'Sub-Saharan Africa (IDA & IBRD countries)', 'Trinidad and Tobago', 'Tunisia', 'Turkiye', 'Tanzania', 'Uganda', 'Ukraine', 'Upper middle income', 'Uruguay', 'United States', 'Uzbekistan', 'St. Vincent and the Grenadines', 'Venezuela, RB', 'Virgin Islands (U.S.)', 'Vietnam', 'Vanuatu', 'World', 'Samoa', 'Yemen, Rep.', 'South Africa', 'Zambia', 'Zimbabwe']
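The list above mixes individual countries with regional and income-group aggregates such as 'World', 'Arab World', and 'High income', so its length overstates the number of countries. A rough sketch of counting the columns and filtering aggregates out follows. The aggregates set is a hand-picked assumption covering only the labels in this small sample, and the values in the frame are placeholders.

```python
import pandas as pd

# A few column labels copied from the list above; the values are placeholders.
columns = ["Afghanistan", "Arab World", "Albania", "World", "High income", "Zimbabwe"]
sample = pd.DataFrame([[0.0] * len(columns)], index=["2021"], columns=columns)

# Hand-picked aggregate labels (assumption: the full list needs many more entries).
aggregates = {"Arab World", "World", "High income"}

# Keep only the columns that are not known aggregates.
countries_only = sample.drop(columns=[c for c in sample.columns if c in aggregates])

n_total = len(sample.columns)
n_countries = countries_only.shape[1]
print(n_total, n_countries)
```

For the real data, the aggregates set would have to be extended by reading through the full list.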
Unemployment Rates for the World’s major Economies as of 1999
I/P:
Country = ["United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Italy", "Canada"]
for i in Country:
    # Each country is a column; index it by year to get a single rate
    print(f'{i} ~~~~> {df[i]["1999"]}')
Observation:
At the end of the 20th century, China had the lowest unemployment rate among these major economies, while France and Italy had the highest.
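The same comparison can be made without a loop by selecting one year for several countries at once and sorting the result. The sketch below assumes a frame shaped like the transposed df (years as rows, countries as columns); the country labels and numbers are invented.

```python
import pandas as pd

# Invented numbers in the same layout as the transposed DataFrame.
toy = pd.DataFrame(
    {"Country A": [4.5, 4.2], "Country B": [3.2, 3.3], "Country C": [12.1, 12.0]},
    index=pd.Index(["1998", "1999"], name="Year"),
)
selected = ["Country A", "Country B", "Country C"]

# One row (the year), several columns (the countries), sorted ascending.
rates_1999 = toy.loc["1999", selected].sort_values()

lowest_country = rates_1999.idxmin()
highest_country = rates_1999.idxmax()
print(rates_1999)
print(lowest_country, highest_country)
```

With the real data this would be df.loc["1999", Country].sort_values(), which puts the lowest and highest rates at the two ends of the output.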
Effect of 2008 recession on World’s major economies
I/P:
Country = ["United States", "China", "Japan", "Germany", "India", "United Kingdom", "France", "Italy", "Canada"]
for i in Country:
    print(i)
    print("Before 2008 Recession", df[i]["2007"])
    print("After 2008 Recession", df[i]["2009"])
    print()
Observations:
Among these economies, the country most heavily impacted by the 2008 recession was the USA.
Compared to the other major economies, India's unemployment rate didn't change much.
Effect of COVID-19 pandemic on World’s major economies
for i in Country:
    print(i)
    print("Before Pandemic", df[i]["2018"])
    print("After Pandemic", df[i]["2021"])
    print()
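Both before/after comparisons (recession and pandemic) print pairs of numbers and leave the subtraction to the reader. A small helper can compute the change directly. The function name change_between is hypothetical, and the data below is invented; only the layout matches the transposed df.

```python
import pandas as pd


def change_between(frame, countries, start, end):
    """Percentage-point change in the rate from `start` to `end`, largest rise first."""
    change = frame.loc[end, countries] - frame.loc[start, countries]
    return change.sort_values(ascending=False)


# Invented numbers: years as rows, countries as columns.
toy = pd.DataFrame(
    {"Country A": [4.0, 9.0], "Country B": [5.0, 5.5], "Country C": [8.0, 7.0]},
    index=pd.Index(["2007", "2009"], name="Year"),
)

delta = change_between(toy, ["Country A", "Country B", "Country C"], "2007", "2009")
print(delta)
```

On the real data, change_between(df, Country, "2007", "2009") and change_between(df, Country, "2018", "2021") would rank the economies by how much their rates moved in each period.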
More EDA blogs are available below; check them out to pick up new concepts.