0.1 Project Motivation

As a nerdy Kaggler, picking 1 out of 54k+ public datasets was a no-brainer. Totally didn't have a paradox-of-choice existential crisis. It led me to something completely opposite to what I was feeling: a dataset of happiness levels around the globe.
The World Happiness Report 2021 focuses on the effects of COVID-19 and how people all over the world have fared. The report reviews the state of happiness in the world today and shows how the new science of happiness explains personal and national variations in happiness.
That’s how I decided to check out the pursuit of happi(y)ness.

0.1.1 Content

The happiness scores and rankings use data from the Gallup World Poll. The columns following the happiness score estimate the extent to which each of six factors (economic production, social support, life expectancy, freedom, absence of corruption and generosity) contributes to making life evaluations higher in each country than in Dystopia, a hypothetical country with values equal to the world's lowest national averages for each of the six factors.
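The 2021 file also reports each factor's contribution in its explained_by_* columns. Judging from the sample rows printed in section 0.4.2, ladder_score appears to equal the sum of those six contributions plus dystopia_residual, and dystopia_residual appears to already include the Dystopia baseline of 2.43 (its mean in the 2021 summary is 2.430). This is an inference from the rows shown in this report, not a definition quoted from the World Happiness Report. A minimal base-R check on the Finland and Denmark rows, with values copied from that table:

```r
# Values copied from the first two rows of the 2021 table (Finland, Denmark)
rows <- data.frame(
  country = c("Finland", "Denmark"),
  ladder_score = c(7.842, 7.620),
  explained_by_Log_GDP_per_capita = c(1.446, 1.502),
  explained_by_social_support = c(1.106, 1.108),
  explained_by_healthy_life_expectancy = c(0.741, 0.763),
  explained_by_freedom_to_make_life_choices = c(0.691, 0.686),
  explained_by_generosity = c(0.124, 0.208),
  explained_by_perceptions_of_corruption = c(0.481, 0.485),
  dystopia_residual = c(3.253, 2.868)
)

# Sum the six factor contributions, then add the residual
# (assumed here to carry the Dystopia baseline of 2.43)
explained_cols <- grep("^explained_by_", names(rows), value = TRUE)
rows$reconstructed <- rowSums(rows[explained_cols]) + rows$dystopia_residual

# Gaps should be within rounding error of the 3-decimal inputs
rows$gap <- round(rows$ladder_score - rows$reconstructed, 3)
print(rows[, c("country", "ladder_score", "reconstructed", "gap")])
```

Running the same rowSums() line on the full df_2021 would show whether the identity holds for every country or only approximately, given the three-decimal rounding.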

0.1.2 Data Source

  • For this analysis, the scope is limited to the World Happiness Report 2021 and World Happiness Report (2005-2020) datasets.

  • The Happiness 2021 dataset was sourced from Kaggle.

  • Due to the sheer volume of data, the analysis focused specifically on 10 variables. The table below describes the variables used in the analysis.

  • Data Dictionary

| Variable | Datatype | Explanation |
|---|---|---|
| country_name | character | Country name (141 countries) |
| year | integer | Year |
| ladder_score | numeric | Life evaluation score (0 to 10, where 10 is the best possible life) |
| regional_indicator | character | Region (10 regions) |
| logged_GDP_per_capita | numeric | GDP per capita on a log scale; its contribution to the ladder score is reported separately in explained_by_Log_GDP_per_capita |
| healthy_life_expectancy | numeric | Healthy life expectancy at birth, based on data extracted from the World Health Organisation (WHO) data repository |
| social_support | numeric | Share of respondents who have someone to count on in times of trouble (ranges from 0 to 1) |
| freedom_to_make_life_choices | numeric | National average of responses to the Gallup World Poll question "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?" |
| generosity | numeric | Based on the national average of responses to "Have you donated money to a charity in the past month?", adjusted for GDP per capita (a regression residual, hence the negative values) |
| perceptions_of_corruption | numeric | National average of responses to "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?" |
  • Details on the dataset's metadata are provided under the References section.

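Important steps:

  • To check out my RPubs, click here

  • To check out my GitHub website, click here

  • To check out my GitHub repository, click here. To reproduce the analysis:

    • Clone my git repository
    • Open index.Rmd
    • Run renv::restore() in the R console
    • Enter y when prompted, to install the package versions recorded for the project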

0.2 Research Questions

  • What is the effect of COVID-19 on happiness levels in 2020 and 2021?
    • What are the top 10 happiest and saddest countries in 2021?
    • What is the general trend of happiness in the world over the last 3 years?
    • Which factors contributed most to happiness scores?
    • Which regions showed a change in happiness levels?

0.3 Data Preparation

0.3.1 Importing libraries and datasets

packages <- c('tidyverse',    # for easy handling of data (includes ggplot2, dplyr, tidyr)
              'heatmaply',    # for interactive heatmaps built on plotly
              'visdat',       # for exploring missing data structure
              'ggplot2',      # for data visualization
              'naniar',       # for plotting missing values
              'dplyr',        # for data manipulation
              'tidyr',        # to create tidy data
              'hrbrthemes',   # to include additional themes for ggplot2
              'ggchicklet',   # for rounded bar charts (geom_chicklet)
              'ggalt',        # for extra geoms such as geom_dumbbell
              'corrplot',     # for correlograms
              'plotly',       # for interactive and publication-quality graphs
              'cowplot',      # add-on to ggplot
              'patchwork',    # to create layouts in ggplot
              'RColorBrewer', # for ready-to-use color palettes
              'ggbeeswarm',   # for beeswarm plots
              'scales',       # for scale helpers such as wrap_format()
              'kableExtra'    # to build tables
)

# ggchicklet is not on CRAN; if install.packages() fails for it,
# install it from GitHub instead, e.g. remotes::install_github("hrbrmstr/ggchicklet")
for (p in packages) {
  # Install a package only if it is missing, then attach it
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

#Reading the data
df_2021 <- read.csv("data/world-happiness-report-2021.csv")
df_all <- read.csv("data/world-happiness-report.csv")
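The ï.. prefix on the first column name (visible in the tables and summaries below) is what a UTF-8 byte-order mark (BOM) at the start of a CSV typically turns into when the file is read in a non-UTF-8 locale such as the Windows default. A minimal sketch, assuming the Kaggle files carry such a BOM: base R's read.csv() can strip it with fileEncoding = "UTF-8-BOM". The example builds a tiny BOM-prefixed file in a temporary location instead of touching the real data.

```r
# Write a tiny CSV that starts with a UTF-8 byte-order mark (bytes EF BB BF)
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("country_name,year\nFinland,2021\n")), tmp)

# "UTF-8-BOM" tells R to drop the mark before the header is parsed
df_clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(df_clean)
```

The rest of this report refers to ï..country_name, so switching the import this way would also mean renaming those references to country_name.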

0.4 Exploratory Data Analysis and cleaning

0.4.1 Full dataset (2005-2020)

Displaying the first rows of the 2005-2020 dataset

df1<-head(df_all)
kable(df1) %>%
kable_styling(bootstrap_options = "condensed", font_size= 8, full_width = F)
| ï..country_name | year | life_ladder | log_GDP_per_capita | social_support | healthy_life_expectancy_at_birth | freedom_to_make_life_choices | generosity | perceptions_of_corruption | positive_affect | negative_affect |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 2008 | 3.724 | 7.370 | 0.451 | 50.80 | 0.718 | 0.168 | 0.882 | 0.518 | 0.258 |
| Afghanistan | 2009 | 4.402 | 7.540 | 0.552 | 51.20 | 0.679 | 0.190 | 0.850 | 0.584 | 0.237 |
| Afghanistan | 2010 | 4.758 | 7.647 | 0.539 | 51.60 | 0.600 | 0.121 | 0.707 | 0.618 | 0.275 |
| Afghanistan | 2011 | 3.832 | 7.620 | 0.521 | 51.92 | 0.496 | 0.162 | 0.731 | 0.611 | 0.267 |
| Afghanistan | 2012 | 3.783 | 7.705 | 0.521 | 52.24 | 0.531 | 0.236 | 0.776 | 0.710 | 0.268 |
| Afghanistan | 2013 | 3.572 | 7.725 | 0.484 | 52.56 | 0.578 | 0.061 | 0.823 | 0.621 | 0.273 |

0.4.2 Dataset 2021

Displaying the first rows of the 2021 dataset

df2<-head(df_2021)
kable(df2) %>%
kable_styling(bootstrap_options = "condensed", font_size= 8, full_width = F)
| ï..country_name | regional_indicator | ladder_score | standard_error_of_ladder_score | upperwhisker | lowerwhisker | logged_GDP_per_capita | social_support | healthy_life_expectancy | freedom_to_make_life_choices | generosity | perceptions_of_corruption | ladder_score_in_dystopia | explained_by_Log_GDP_per_capita | explained_by_social_support | explained_by_healthy_life_expectancy | explained_by_freedom_to_make_life_choices | explained_by_generosity | explained_by_perceptions_of_corruption | dystopia_residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Finland | Western Europe | 7.842 | 0.032 | 7.904 | 7.780 | 10.775 | 0.954 | 72.0 | 0.949 | -0.098 | 0.186 | 2.43 | 1.446 | 1.106 | 0.741 | 0.691 | 0.124 | 0.481 | 3.253 |
| Denmark | Western Europe | 7.620 | 0.035 | 7.687 | 7.552 | 10.933 | 0.954 | 72.7 | 0.946 | 0.030 | 0.179 | 2.43 | 1.502 | 1.108 | 0.763 | 0.686 | 0.208 | 0.485 | 2.868 |
| Switzerland | Western Europe | 7.571 | 0.036 | 7.643 | 7.500 | 11.117 | 0.942 | 74.4 | 0.919 | 0.025 | 0.292 | 2.43 | 1.566 | 1.079 | 0.816 | 0.653 | 0.204 | 0.413 | 2.839 |
| Iceland | Western Europe | 7.554 | 0.059 | 7.670 | 7.438 | 10.878 | 0.983 | 73.0 | 0.955 | 0.160 | 0.673 | 2.43 | 1.482 | 1.172 | 0.772 | 0.698 | 0.293 | 0.170 | 2.967 |
| Netherlands | Western Europe | 7.464 | 0.027 | 7.518 | 7.410 | 10.932 | 0.942 | 72.4 | 0.913 | 0.175 | 0.338 | 2.43 | 1.501 | 1.079 | 0.753 | 0.647 | 0.302 | 0.384 | 2.798 |
| Norway | Western Europe | 7.392 | 0.035 | 7.462 | 7.323 | 11.053 | 0.954 | 73.3 | 0.960 | 0.093 | 0.270 | 2.43 | 1.543 | 1.108 | 0.782 | 0.703 | 0.249 | 0.427 | 2.580 |

0.4.3 Exploring data

Visualizing datasets to check datatypes.

vis_data_1 <- vis_dat(df_2021)+ labs(x = "Datatypes for 2021 dataset")
vis_data_2 <- vis_dat(df_all)+ labs(x = "Datatypes for all dataset")
vis_data_1 + vis_data_2
Figure 1: Plots showing datatypes

Both datasets show numeric and character datatypes with few NA values.

0.4.4 Checking for missing values

Plotting missing values of both datasets.

miss_data_1 <- gg_miss_var(df_2021) + labs(y = "Checking for the missing ones in 2021")
miss_data_2 <- gg_miss_var(df_all) + labs(y = "Checking for the missing ones in full_data")
miss_data_1 + miss_data_2
Figure 2: Plots of missing values

df_2021 has no missing values whereas df_all has some columns with missing values.

0.4.5 Treating NA values

Imputing missing values in each numeric column with that column's mean.

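# Replace NAs in every numeric column with that column's mean
num_cols <- sapply(df_all, is.numeric)
df_all[num_cols] <- lapply(df_all[num_cols],
                           function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
# Count the remaining NAs per column (all should be 0)
df_all %>% summarise(across(everything(), ~ sum(is.na(.))))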
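To see what the imputation step does, here is a sketch on a small made-up data frame (the countries and values are hypothetical, not from the report): NAs in each numeric column are replaced by that column's own mean, the character column is left alone, and the column means are unchanged by the imputation.

```r
# Hypothetical toy data with gaps in two numeric columns
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  life_ladder = c(4, NA, 6, 5),
  generosity = c(NA, 0.1, -0.1, 0.3)
)
means_before <- colMeans(toy[sapply(toy, is.numeric)], na.rm = TRUE)

# Same approach as above: fill each numeric column's NAs with its mean
num_cols <- sapply(toy, is.numeric)
toy[num_cols] <- lapply(toy[num_cols],
                        function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))

colSums(is.na(toy))       # every count is now 0
colMeans(toy[num_cols])   # same as means_before
```

One design caveat: a global column mean ignores country and year, so a country with gaps is pulled toward the world average; imputing within each country would be an alternative if that matters for the trend analysis.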

0.4.6 Cleaned dataset

Re-checking the dataset for NA values after imputation.

gg_miss_var(df_all) + labs(y = "Checking for the missing ones in full_data")
Figure 3: Plot of NA for cleaned dataset

The dataset has no missing values after imputation.

0.4.7 Summary of full dataset

summary(df_all)
##  ï..country_name         year       life_ladder    log_GDP_per_capita
##  Length:1949        Min.   :2005   Min.   :2.375   Min.   : 6.635    
##  Class :character   1st Qu.:2010   1st Qu.:4.640   1st Qu.: 8.478    
##  Mode  :character   Median :2013   Median :5.386   Median : 9.443    
##                     Mean   :2013   Mean   :5.467   Mean   : 9.368    
##                     3rd Qu.:2017   3rd Qu.:6.283   3rd Qu.:10.335    
##                     Max.   :2020   Max.   :8.019   Max.   :11.648    
##  social_support   healthy_life_expectancy_at_birth freedom_to_make_life_choices
##  Min.   :0.2900   Min.   :32.30                    Min.   :0.2580              
##  1st Qu.:0.7510   1st Qu.:58.90                    1st Qu.:0.6490              
##  Median :0.8340   Median :65.00                    Median :0.7590              
##  Mean   :0.8126   Mean   :63.36                    Mean   :0.7426              
##  3rd Qu.:0.9050   3rd Qu.:68.40                    3rd Qu.:0.8540              
##  Max.   :0.9870   Max.   :77.10                    Max.   :0.9850              
##    generosity         perceptions_of_corruption positive_affect
##  Min.   :-0.3350000   Min.   :0.0350            Min.   :0.322  
##  1st Qu.:-0.1060000   1st Qu.:0.6990            1st Qu.:0.627  
##  Median :-0.0160000   Median :0.7930            Median :0.718  
##  Mean   : 0.0001032   Mean   :0.7471            Mean   :0.710  
##  3rd Qu.: 0.0850000   3rd Qu.:0.8680            3rd Qu.:0.798  
##  Max.   : 0.6980000   Max.   :0.9830            Max.   :0.944  
##  negative_affect 
##  Min.   :0.0830  
##  1st Qu.:0.2070  
##  Median :0.2600  
##  Mean   :0.2685  
##  3rd Qu.:0.3190  
##  Max.   :0.7050

0.4.8 Summary of 2021 dataset

summary(df_2021)
##  ï..country_name    regional_indicator  ladder_score  
##  Length:149         Length:149         Min.   :2.523  
##  Class :character   Class :character   1st Qu.:4.852  
##  Mode  :character   Mode  :character   Median :5.534  
##                                        Mean   :5.533  
##                                        3rd Qu.:6.255  
##                                        Max.   :7.842  
##  standard_error_of_ladder_score  upperwhisker    lowerwhisker  
##  Min.   :0.02600                Min.   :2.596   Min.   :2.449  
##  1st Qu.:0.04300                1st Qu.:4.991   1st Qu.:4.706  
##  Median :0.05400                Median :5.625   Median :5.413  
##  Mean   :0.05875                Mean   :5.648   Mean   :5.418  
##  3rd Qu.:0.07000                3rd Qu.:6.344   3rd Qu.:6.128  
##  Max.   :0.17300                Max.   :7.904   Max.   :7.780  
##  logged_GDP_per_capita social_support   healthy_life_expectancy
##  Min.   : 6.635        Min.   :0.4630   Min.   :48.48          
##  1st Qu.: 8.541        1st Qu.:0.7500   1st Qu.:59.80          
##  Median : 9.569        Median :0.8320   Median :66.60          
##  Mean   : 9.432        Mean   :0.8147   Mean   :64.99          
##  3rd Qu.:10.421        3rd Qu.:0.9050   3rd Qu.:69.60          
##  Max.   :11.647        Max.   :0.9830   Max.   :76.95          
##  freedom_to_make_life_choices   generosity       perceptions_of_corruption
##  Min.   :0.3820               Min.   :-0.28800   Min.   :0.0820           
##  1st Qu.:0.7180               1st Qu.:-0.12600   1st Qu.:0.6670           
##  Median :0.8040               Median :-0.03600   Median :0.7810           
##  Mean   :0.7916               Mean   :-0.01513   Mean   :0.7274           
##  3rd Qu.:0.8770               3rd Qu.: 0.07900   3rd Qu.:0.8450           
##  Max.   :0.9700               Max.   : 0.54200   Max.   :0.9390           
##  ladder_score_in_dystopia explained_by_Log_GDP_per_capita
##  Min.   :2.43             Min.   :0.0000                 
##  1st Qu.:2.43             1st Qu.:0.6660                 
##  Median :2.43             Median :1.0250                 
##  Mean   :2.43             Mean   :0.9772                 
##  3rd Qu.:2.43             3rd Qu.:1.3230                 
##  Max.   :2.43             Max.   :1.7510                 
##  explained_by_social_support explained_by_healthy_life_expectancy
##  Min.   :0.0000              Min.   :0.0000                      
##  1st Qu.:0.6470              1st Qu.:0.3570                      
##  Median :0.8320              Median :0.5710                      
##  Mean   :0.7933              Mean   :0.5202                      
##  3rd Qu.:0.9960              3rd Qu.:0.6650                      
##  Max.   :1.1720              Max.   :0.8970                      
##  explained_by_freedom_to_make_life_choices explained_by_generosity
##  Min.   :0.0000                            Min.   :0.000          
##  1st Qu.:0.4090                            1st Qu.:0.105          
##  Median :0.5140                            Median :0.164          
##  Mean   :0.4987                            Mean   :0.178          
##  3rd Qu.:0.6030                            3rd Qu.:0.239          
##  Max.   :0.7160                            Max.   :0.541          
##  explained_by_perceptions_of_corruption dystopia_residual
##  Min.   :0.0000                         Min.   :0.648    
##  1st Qu.:0.0600                         1st Qu.:2.138    
##  Median :0.1010                         Median :2.509    
##  Mean   :0.1351                         Mean   :2.430    
##  3rd Qu.:0.1740                         3rd Qu.:2.794    
##  Max.   :0.5470                         Max.   :3.482

0.5 Visualization 1 - Top 10 happiest and saddest countries in 2021

0.5.1 Getting top and bottom 10 countries.

Mapping countries to regions and ranking countries by ladder_score (i.e. happiness level).

# Subsetting dimensions
dimensions <- c('ladder_score',
                'logged_GDP_per_capita',
                'social_support',
                'healthy_life_expectancy',
                'freedom_to_make_life_choices',
                'generosity',
                'perceptions_of_corruption')

# Mapping country to regions
country_region_dict = df_2021 %>% 
  select(country = ï..country_name, region = regional_indicator) %>% unique()

df_2021_long <- df_2021 %>% 
  select(country = ï..country_name, all_of(dimensions)) %>%
  mutate(absence_of_corruption = 1- perceptions_of_corruption) %>%
  pivot_longer(cols = c(all_of(dimensions),'absence_of_corruption'),
               names_to = 'dimension', values_to = 'score') %>%
  filter(dimension != "perceptions_of_corruption")

# Min-max scaling each dimension to the range [0, 1]
df_2021_tranformed <- df_2021_long %>%
  group_by(dimension) %>%
  mutate(min_value = min(score),
         max_value = max(score)) %>%
  mutate(score_pct = (score-min_value)/(max_value-min_value)) %>%
  ungroup()

# Getting top 10 countries
df_2021_top10 <- df_2021_tranformed %>%
  filter(dimension == "ladder_score") %>%
  slice_max(score, n = 10) %>%
  mutate(cat = 'top_10', 
         country_rank = rank(-score),
         country_label = paste0(country, ' (', country_rank, ')'))

# Getting bottom 10 countries
df_2021_bottom10 <- df_2021_tranformed %>%
  filter(dimension == "ladder_score") %>%
  mutate(country_rank = rank(score),
         country_label = paste0(country, ' (', country_rank, ')')) %>%
  slice_min(score, n = 10) %>%
  mutate(cat = 'bottom_10')
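The scaling step maps every dimension onto [0, 1] with score_pct = (score - min) / (max - min), so the best country on a dimension gets 1 and the worst gets 0. Because a higher perceptions_of_corruption is worse, the code first flips it into absence_of_corruption = 1 - perceptions_of_corruption, so that higher is better on every dimension. A base-R sketch of both steps on made-up numbers:

```r
# Min-max scaling: smallest value -> 0, largest value -> 1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical perceptions_of_corruption values for four countries
perceptions <- c(0.186, 0.292, 0.673, 0.900)

# Flip so that higher means "less corruption", as absence_of_corruption does
absence <- 1 - perceptions

scaled <- min_max(absence)
round(scaled, 3)
```

The country with the lowest perceived corruption ends up at 1. scales::rescale() performs the same 0-1 mapping by default, so it could replace the hand-written formula if preferred.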

0.5.2 Plotting top and bottom 10 countries

# Plotting top 10 happiest countries 
top_10 <- ggplot(df_2021_top10, aes(x = reorder(country_label, score))) + 
  geom_chicklet(aes(y = 10, fill = 4.9), width = 0.5, radius = grid::unit(5, "pt")) +
  geom_chicklet(aes(y = score, fill = score), width = 0.5, radius = grid::unit(5, "pt")) +
  geom_text(aes(y = score, label = round(score, 2)), nudge_y = 0.4, size = 3) + 
  scale_y_continuous(expand = c(0, 0.1), position = "right", limits = c(0, 10)) +
  scale_fill_gradient2(low = 'black', high = '#818aeb', mid = 'white', midpoint = 5) + 
  coord_flip() +
  labs(y="Best possible life = 10", x = '',
       title="Top 10 Happiest Countries in 2021",
       subtitle="9 of the 10 happiest countries are in Europe",
       caption="Source: The World Happiness Report 2021") + 
  theme_ipsum(grid = '')  +
  theme(plot.title = element_text(size=15),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 10),
        axis.title.x = element_text(size= 10, color = '#555955'),
        axis.text.y = element_text(size = 10, color = 'black'),
        axis.text.x = element_blank(),
        legend.position = 'none')

# Plotting 10 saddest countries
bottom_10 <- ggplot(df_2021_bottom10, aes(x = reorder(country_label, score))) + 
  geom_chicklet(aes(y = 10, fill = 4.9), width = 0.5, radius = grid::unit(5, "pt")) +
  geom_chicklet(aes(y = score, fill = score), width = 0.5, radius = grid::unit(5, "pt")) +
  geom_text(aes(y = score, label = round(score, 2)), nudge_y = 0.4, size = 3) + 
  scale_y_continuous(expand = c(0, 0.1), position = "right", limits = c(0, 10)) +
  scale_fill_gradient2(low = '#074040', high = '#4cc2c2', mid = 'white', midpoint = 5) + 
  coord_flip() +
  labs(y="Best possible life = 10", x = '',
       title="Top 10 Saddest Countries in 2021",
       subtitle="Mostly struck by poverty",
       caption="Source: The World Happiness Report 2021") + 
  theme_ipsum(grid = '') +
  theme(plot.title = element_text(size=15),
        plot.subtitle = element_text(size = 12),
        plot.caption = element_text(size = 10),
        axis.title.x = element_text(size= 10, color = '#555955'),
        axis.text.y = element_text(size = 10, color = 'black'),
        axis.text.x = element_blank(),
        legend.position = 'none')

# Displaying plots side by side
top_10 + bottom_10
Figure 4: Plot of 10 happiest and saddest countries in the world

Most of the happiest countries are in Europe, while most of the saddest appear to be under severe economic strain.


0.6 Visualization 2 - Happiness trend in 2019, 2020 and 2021 (insights w.r.t. COVID-19)

0.6.1 Extracting relevant columns for analysis

Subsetting country, region, ladder_score for the years 2019 and 2020.

# Filtering columns required to evaluate trends in 2019 and later
df_2019_2020 <- df_all %>% 
  filter(year >= 2019) %>%
  left_join(country_region_dict, by = c('ï..country_name' = 'country')) %>%
  select(country = ï..country_name, region, year, ladder = life_ladder)  %>%
  pivot_wider(names_from = 'year', names_prefix = 'year', values_from = 'ladder') %>%
  filter(!is.na(year2019) & !is.na(year2020)) %>%
  group_by(region) %>%
  summarize(happiness_2019 = mean(year2019, na.rm = TRUE),
            happiness_2020 = mean(year2020, na.rm = TRUE)) %>%
  mutate(diff = happiness_2020-happiness_2019) %>%
  arrange(diff) %>%
  mutate(region = factor(region, levels = region))
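The pipeline above keeps only countries with a ladder score in both 2019 and 2020 before averaging by region, so each regional change compares the same set of countries. A base-R sketch of that logic on hypothetical long-format data (country names, regions and scores are made up), using reshape() in place of pivot_wider() and aggregate() in place of group_by() plus summarize():

```r
# Hypothetical long-format data: one row per country-year
long <- data.frame(
  country = c("A", "A", "B", "B", "C", "D", "D"),
  region  = c("R1", "R1", "R1", "R1", "R1", "R2", "R2"),
  year    = c(2019, 2020, 2019, 2020, 2019, 2019, 2020),
  ladder  = c(6.0, 6.4, 5.0, 5.2, 7.0, 4.5, 4.1)
)

# Widen to one row per country with ladder.2019 and ladder.2020 columns
wide <- reshape(long, idvar = c("country", "region"), timevar = "year",
                direction = "wide")

# Keep countries observed in both years (country C drops out)
paired <- wide[!is.na(wide$ladder.2019) & !is.na(wide$ladder.2020), ]

# Regional means per year and their difference, sorted from largest drop
by_region <- aggregate(cbind(ladder.2019, ladder.2020) ~ region,
                       data = paired, FUN = mean)
by_region$diff <- by_region$ladder.2020 - by_region$ladder.2019
by_region[order(by_region$diff), ]
```

Without the pairing filter, country C's 2019 score would enter only the 2019 mean and could shift region R1's apparent change on its own.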

0.6.2 Plotting happiness levels during covid19

# Visualizing difference between happiness scores in 2019 and 2020
plot_2020 <- ggplot() + 
  geom_dumbbell(data = df_2019_2020 %>% filter(diff >0),
                aes(y=region, x=happiness_2019, xend=happiness_2020),
                size=1.5, color="#7FB185", 
                colour_xend = "#7FB185", colour_x = "#7FB185",
                size_x = 2.5, size_xend = 5,
                dot_guide=TRUE, dot_guide_size=0.5) +
  geom_dumbbell(data = df_2019_2020 %>% filter(diff <0),
                aes(y=region, x=happiness_2019, xend=happiness_2020),
                size=1.5, color="#edae52", 
                colour_xend = "#edae52", colour_x = "#edae52",
                size_x = 2.5, size_xend = 5,
                dot_guide=TRUE, dot_guide_size=0.5) +
  scale_y_discrete(limits = levels(df_2019_2020$region), expand=c(0.075,1)) +
  labs(x='', y=NULL,
       title="Happiness before vs. amid Covid",
       subtitle = 'Regions see increases in happiness, despite Covid',
       caption= 'Source: World Happiness Report (2021)') +
  geom_rect(data=df_2019_2020,
            aes(xmin=7.35, xmax=7.65, ymin=-Inf, ymax=Inf),
            fill="#e3e2e1") +
  geom_text(data=df_2019_2020 %>% filter(region == 'South Asia'),
            aes(x=happiness_2020, y=region, label= "2020"),
            color="gray15", size=3, vjust=-1.5) +
  geom_text(data=df_2019_2020 %>% filter(region == 'South Asia'),
            aes(x=happiness_2019, y=region, label= "2019"),
            color="gray15", size=3, vjust=-1.5) +
  geom_text(data=df_2019_2020 %>% filter(diff>0),
            aes(x=happiness_2020 , y=region, label=round(happiness_2020,2)),
            size=4, hjust=-0.5) +
  geom_text(data=df_2019_2020 %>% filter(diff>0),
            aes(x=happiness_2019 , y=region, label=round(happiness_2019,2)),
            color="gray15", size=4, hjust=1.3) +
  geom_text(data=df_2019_2020 %>% filter(diff<0),
            aes(x=happiness_2020 , y=region,
                label=round(happiness_2020,2)),size=4, hjust=1.5) +
  geom_text(data=df_2019_2020 %>% filter(diff<0),
            aes(x=happiness_2019 , y=region,
                label=round(happiness_2019,2)),
            color="gray15", size=4, hjust=-0.3) +
  geom_text(data=df_2019_2020 %>%
              filter(region == 'South Asia'),
            aes(x=7.5, y=region, label="DIFF"),
            size=4.5, vjust=-1.5, fontface="bold") +
  geom_text(data=df_2019_2020, aes(label=round(diff,2),
                                   y=region, x=7.5), size=3) + 
  theme_ipsum(grid="") +
  theme(plot.title = element_text(size=20),
        plot.subtitle = element_text(size = 15),
        plot.caption = element_text(size = 12),
        axis.title.x = element_text(size= 12, color = '#3a403a'),
        axis.text.y = element_text(size = 15, color = 'black'),
        axis.text.x = element_blank(),
        legend.position = 'left')

0.6.3 Creating a new dataframe to compare happiness levels amid COVID with 2021 levels

Combining columns from both datasets to form a new dataset with country, region, year and ladder_score.

# Adding year column to 2021 dataset
df_2021$year <- 2021

# Renaming 2021 `ladder_score` as `happiness_2021` in a copy of the dataset
df_2021_new <- df_2021
names(df_2021_new)[names(df_2021_new) == 'ladder_score'] <- 'happiness_2021'

# Joining 2020 and 2021 dataset
df_yr_score<-full_join(df_2019_2020, df_2021_new,
                       by=c("region"="regional_indicator"))

0.6.4 Creating a new dataframe with region, country and ladder_score columns for 2019, 2020 and 2021

# Merging country regions with countries 
df_all_region <- df_all %>% 
  left_join(country_region_dict, by = c('ï..country_name' = 'country')) %>%
  select(country = ï..country_name, region, year, ladder = life_ladder) 

# Renaming region and ladder score columns in df_all_region
names(df_all_region)[names(df_all_region) == 'region'] <- 'regional_indicator'
names(df_all_region)[names(df_all_region) == 'ladder'] <- 'ladder_score'


# Subsetting df_2021 dataset
df_2021_region<- df_2021 %>%
  select(country = ï..country_name, regional_indicator, year, ladder_score)

# Binding all the regions in `df_final` dataset
df_final <-rbind(df_all_region,df_2021_region) %>%
  filter(!is.na(year) & !is.na(regional_indicator)) 

# Making dataset of last 3 years
df_final_19_20_21 <- df_final %>% 
  filter(year >= 2019)
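With df_final_19_20_21 in long format (country, regional_indicator, year, ladder_score), a region-by-year table of mean happiness is one quick way to eyeball the three-year trend before plotting. A base-R sketch with tapply() on hypothetical rows that mimic those columns (the values are made up):

```r
# Hypothetical rows shaped like df_final_19_20_21
toy_final <- data.frame(
  country = c("A", "A", "A", "B", "B", "B", "C", "C"),
  regional_indicator = c("R1", "R1", "R1", "R1", "R1", "R1", "R2", "R2"),
  year = c(2019, 2020, 2021, 2019, 2020, 2021, 2020, 2021),
  ladder_score = c(6.0, 6.4, 6.2, 5.0, 5.2, 5.1, 4.8, 4.6)
)

# Mean ladder score per region (rows) and year (columns); NA where no data
trend <- with(toy_final, tapply(ladder_score, list(regional_indicator, year), mean))
round(trend, 2)

# Change from 2020 to 2021 for each region
round(trend[, "2021"] - trend[, "2020"], 2)
```

Unlike the paired 2019-2020 comparison, this averages whichever countries were surveyed in each year, so part of a regional change can come from the country mix rather than from happiness itself.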

0.8 Visualization 3 - Factors correlating with happiness

Plotting correlation among factors related to happiness in 2021 dataset.

# Subsetting numerical columns
df_cor <- df_2021 %>% 
  select(corruption = perceptions_of_corruption,
         generosity = generosity,
         freedom = freedom_to_make_life_choices, 
         life_expectancy = healthy_life_expectancy, 
         social_support = social_support,
         GDP_per_capita = logged_GDP_per_capita, 
         happiness = ladder_score
  )
# Displaying heatmap of correlation
corr <- cor(df_cor)
plot_ly(colors = "RdBu") %>%
  add_heatmap(x = rownames(corr), y = colnames(corr), z = corr) %>%
  colorbar(limits = c(-1, 1))

Figure 7: Correlation Matrix

plotly makes this plot interactive: hover over the figure to see individual data points.

The three factors most strongly correlated with happiness in the 2021 dataset are:

  1. Life Expectancy
  2. Social Support
  3. GDP per capita
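Since this ranking is read off the correlation matrix, one way to make it explicit is to sort each factor's correlation with happiness by absolute value. A sketch on made-up numbers (not the report's data); applied to the real data, the last two lines would run on corr from the heatmap chunk. Correlation measures association rather than contribution; the explained_by_* columns in the 2021 file are the report's own attribution of each factor.

```r
# Hypothetical data: happiness plus three factors with known relationships
toy <- data.frame(
  happiness = c(1, 2, 3, 4, 5),
  factor_a  = c(2, 4, 6, 8, 10),  # perfectly positively related
  factor_b  = c(5, 4, 3, 1, 2),   # strongly negatively related
  factor_c  = c(2, 1, 4, 3, 5)    # moderately positively related
)
toy_corr <- cor(toy)

# Correlation of every other column with happiness, strongest (by magnitude) first
with_happiness <- toy_corr[setdiff(rownames(toy_corr), "happiness"), "happiness"]
ranked <- with_happiness[order(abs(with_happiness), decreasing = TRUE)]
ranked
```

Sorting by absolute value keeps strong negative relationships (such as perceived corruption) from dropping to the bottom of the list.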

0.9 Visualization 4 - General trend of happiness levels across regions

0.9.1 Plotting overall trend of happiness scores across all regions.

# Mapping regions to happiness scores.
region_level <- ggplot(df_final, aes(x = regional_indicator, y = ladder_score,
                                     fill = regional_indicator, text = country)) +
  geom_beeswarm(aes(color = regional_indicator, alpha = 1))

region_level4 <- region_level +
  geom_boxplot(aes(alpha = 2)) +
  ggtitle("Country-wise happiness trends in regions") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 0, hjust = 1, size = 8)) +
  scale_x_discrete(labels = wrap_format(10)) +
  scale_fill_brewer(palette = "Spectral") +
  scale_color_brewer(palette = "Spectral")

# tooltip takes aesthetic names: "text" holds the country, "y" the ladder score
ggplotly(region_level4, tooltip = c("text", "y"))

Figure 8: Happiness trend plot across regions

plotly makes this plot interactive: hover over the figure to see individual data points.

The happiest regions in the world are Western Europe and North America & ANZ.


0.10 Summary

Happiness across the globe has taken a blow. None of the regions shows a rise in happiness in 2021 compared with 2020 (amid COVID).

Countries with high healthy life expectancy, strong social support and high GDP per capita tend to have higher happiness scores. Western Europe and North America & ANZ, the two happiest regions, further confirm this pattern.

0.10.1 What I learned

I definitely get what all the fuss about R is about. Despite being a Python enthusiast, I have grown to love R. I learned how R handles reproducibility and version control, how to share my project without the hassle of managing dependencies, and picked up new packages such as packrat, visdat and naniar. I also learned a lot about automation in R Markdown and publishing on RPubs.

0.10.2 What I would do next

Several variables were left untouched due to time constraints and the scope of the project. Next, I would examine how the six factors contribute to happiness scores and run a time-series analysis of the most influential factors. Furthermore, I would build an interactive dashboard to display regression and cluster analyses of the important features.


