msmehtashubham-blog · 6 years
Assignment 3
My research involves determining whether countries with higher female labour participation rates are more democratic.  The three variables which I have chosen from the Gapminder dataset are:
·        The female employment rate;
·        The employment rate; and
·        The polity score
I did some further reading this week on polity scores and discovered that the scores are grouped into 3 categories as follows:
·        A polity score between -10 and -6 indicates an autocracy, which is defined as a system of government where one person holds absolute power
·        A polity score between -5 and 5 indicates an anocracy.  According to Wikipedia, an anocracy is “a government regime featuring inherent qualities of political instability and ineffectiveness, as well as an incoherent mix of democratic and autocratic traits and practices”; and
·        A polity score between 6 and 10 indicates a democracy.
I decided, therefore, to break the polity score down into groups that match the polity categories described on Wikipedia, and to compare the female employment rate and general employment rate of each group to see whether my initial hypothesis was correct.
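The grouping described above can be sketched as follows. This is only a minimal illustration: the sample scores are hypothetical stand-ins for the Gapminder polityscore column, and using pd.cut with labels is one possible way to do the grouping, not necessarily the one used in the program further down.

```python
import pandas as pd

# Hypothetical sample scores; real values would come from the polityscore column.
scores = pd.Series([-10, -6, -5, 0, 5, 6, 10])

# pd.cut bins exclude the left edge and include the right edge by default,
# so these edges give (-11, -6], (-6, 5] and (5, 10].
polity_category = pd.cut(
    scores,
    bins=[-11, -6, 5, 10],
    labels=["Autocracy", "Anocracy", "Democracy"],
)

print(polity_category.tolist())
```

Because the right edge of each bin is included, a score of -6 falls in the autocracy group and a score of 5 in the anocracy group, which matches the three categories listed above.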
For this assignment, I made and implemented the following management decisions:
·        I replaced missing data, indicated by " " (a single space) in the dataset, with NaN
·        I split the polity score into three categories:
o   -10 to -6
o   -5 to 5
o   6 to 10
·        I recoded the polity score so that a score between -10  and -6 = Autocracy, a score between -5 and 5 = Anocracy and a score between 6 and 10 = Democracy
·        I cut the female employment rate into 10% bands from 10% to 90%
·        I renamed the columns "employrate", "femaleemployrate" and "polityscore" to "employ rate", "female employ rate" and "polity score" respectively so that they were easier to understand.
·        I ran various pivot tables and looked at the distribution by polity category (i.e. autocracy, anocracy and democracy) and female employment rate range.
·        I looked at the mean of the employment rate, female employment rate and polity score.
·        I split the data into 3 subsets based on female labour participation rates up to 40%, female labour participation rates between 40% and 60% and female labour participation rates over 60% and calculated the percentage by polity category for each subset.
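As a rough sketch of some of these decisions (missing data, 10% bands and a subset percentage), assuming a few hypothetical rows in place of gapminder.csv and a ready-made "polity category" column:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; a single space marks a missing value, as in the dataset.
raw = pd.DataFrame({
    "femaleemployrate": ["35.5", " ", "52.1", "67.0", "45.0", "38.2"],
    "polity category": ["Autocracy", "Anocracy", "Democracy",
                        "Democracy", "Democracy", "Anocracy"],
})

# Replace the blank marker with NaN, then convert the column to numbers.
clean = raw.replace(" ", np.nan)
clean["femaleemployrate"] = pd.to_numeric(clean["femaleemployrate"])

# 10% bands from 10% to 90%: (10, 20], (20, 30], ..., (80, 90].
clean["fer band"] = pd.cut(clean["femaleemployrate"], bins=list(range(10, 100, 10)))

# Subset: female employment rate above 40% and up to 60%.
mid = clean[(clean["femaleemployrate"] > 40) & (clean["femaleemployrate"] <= 60)]

# Share of each polity category within the subset.
share = mid["polity category"].value_counts(normalize=True)
print(share)
```

Rows with a missing rate drop out of every subset automatically, because comparisons against NaN are always false.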
2)      Running Frequency Distributions
Distribution by Female Employment Rate Range and Polity Category
Mean of Employment Rate, Female Employment Rate and Polity Score by Polity Category and Female Employment Rate Range
·        By polity category, the majority of countries in the Gapminder dataset are democracies.
·        There was no data for autocracies in the highest female labour participation range (above 80%), or for democracies in the lowest range (up to 20%).
·        The most frequent female labour participation rates were in the ranges between 40% and 60%.
·        Female labour participation rate frequencies tend to be clustered around the centre of the female employment rate range in anocracies, autocracies and democracies.
·        There is a greater difference between the female labour participation rate and the labour participation rate at the lower end of the female employment rate range.
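Tables like the two summarised above could be produced along these lines. This is a sketch under assumptions: the rows are made up for illustration (they are not Gapminder values), and pd.crosstab and groupby are just one way to build such pivot tables.

```python
import pandas as pd

# Made-up illustrative rows, using the renamed column names.
df = pd.DataFrame({
    "polity category": ["Autocracy", "Anocracy", "Democracy",
                        "Democracy", "Democracy", "Anocracy"],
    "female employ rate": [35.5, 48.0, 52.1, 67.0, 45.0, 38.2],
    "employ rate": [55.0, 60.0, 58.3, 70.1, 54.0, 57.9],
})

# Band the female employment rate into 10% ranges.
df["fer band"] = pd.cut(df["female employ rate"], bins=list(range(10, 100, 10)))

# Count of countries by female employment rate range and polity category.
counts = pd.crosstab(df["fer band"], df["polity category"])
print(counts)

# Mean rates per range, and the gap between overall and female employment.
by_band = df.groupby("fer band", observed=True)[["employ rate", "female employ rate"]].mean()
gap = by_band["employ rate"] - by_band["female employ rate"]
print(gap)
```

With these made-up rows the gap shrinks as the female employment rate range rises, which is the kind of pattern described in the last bullet above.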
Program
import pandas as pd
import numpy as np

data = pd.read_csv('gapminder.csv', low_memory=False)

# replace missing data (indicated by one space) with np.nan and create a new
# data set called data1 to allow conversion to numeric values
data1 = data.replace(" ", np.nan)

# set variables to numeric values (this has to happen before pd.cut is used)
data1['femaleemployrate'] = pd.to_numeric(data1['femaleemployrate'])
data1['employrate'] = pd.to_numeric(data1['employrate'])
data1['polityscore'] = pd.to_numeric(data1['polityscore'])

# categorize quantitative variable based on customized splits using cut function
# splits polity score into 3 groups (-10 to -6, -5 to 5, 6 to 10)
# NB pd.cut bins exclude the left edge and include the right edge
data1['pscategory3'] = pd.cut(data1.polityscore, [-11, -6, 5, 10])
psc = data1['pscategory3'].value_counts(sort=False, dropna=True, normalize=True)
print(psc)

# splits female employment rate into 8 groups
# (10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80, 80-90)
data1['fer'] = pd.cut(data1.femaleemployrate, [10, 20, 30, 40, 50, 60, 70, 80, 90])

# recode polity score into polity category
recode1 = {-10.0: 'Autocracy', -9.0: 'Autocracy', -8.0: 'Autocracy', -7.0: 'Autocracy', -6.0: 'Autocracy',
           -5.0: 'Anocracy', -4.0: 'Anocracy', -3.0: 'Anocracy', -2.0: 'Anocracy', -1.0: 'Anocracy',
           0.0: 'Anocracy', 1.0: 'Anocracy', 2.0: 'Anocracy', 3.0: 'Anocracy', 4.0: 'Anocracy', 5.0: 'Anocracy',
           6.0: 'Democracy', 7.0: 'Democracy', 8.0: 'Democracy', 9.0: 'Democracy', 10.0: 'Democracy'}
data1['polity category'] = data1['polityscore'].map(recode1)

# rename columns so that they are easier to understand
data1 = data1.rename(columns={'employrate': 'employ rate',
                              'femaleemployrate': 'female employ rate',
                              'polityscore': 'polity score'})

pcc = data1['polity category'].value_counts(sort=False, dropna=True)
print(pcc)
print("Polity Category Frequency Distribution")
ppcc = data1['polity category'].value_counts(sort=False, normalize=True)
print(ppcc)

# subset data: countries with a female employment rate higher than 10% and less than or equal to 40%
sub1 = data1[(data1['female employ rate'] > 10) & (data1['female employ rate'] <= 40)]

# frequency and distribution of polity category for sub1
print("Distribution by Polity Category - Subset of Countries with Female Employment Rate up to 40%")
s1c = sub1['polity category'].value_counts(sort=False, dropna=True)
print(s1c)
print("Percentage by Polity Category - Subset of Countries with Female Employment Rate up to 40%")
s1pc = sub1['polity category'].value_counts(sort=False, normalize=True)
print(s1pc)

# subset data: countries with a female employment rate higher than 40% and less than or equal to 60%
sub2 = data1[(data1['female employ rate'] > 40) & (data1['female employ rate'] <= 60)]

# frequency and distribution of polity category for sub2
print("Distribution by Polity Category - Subset of Countries with Female Employment Rate between 40% and 60%")
s2c = sub2['polity category'].value_counts(sort=False, dropna=True)
print(s2c)
print("Percentage by Polity Category - Subset of Countries with Female Employment Rate between 40% and 60%")
s2pc = sub2['polity category'].value_counts(sort=False, normalize=True)
print(s2pc)

# subset data: countries with a female employment rate higher than 60%
sub3 = data1[(data1['female employ rate'] > 60)]

# frequency and distribution of polity category for sub3
print("Distribution by Polity Category - Subset of Countries with Female Employment Rate >60%")
s3c = sub3['polity category'].value_counts(sort=False, dropna=True)
print(s3c)
print("Percentage by Polity Category - Subset of Countries with Female Employment Rate >60%")
s3pc = sub3['polity category'].value_counts(sort=False, normalize=True)
print(s3pc)
Assignment Week 2
/* Code to display label*/
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;
DATA new; set mydata.gapminder;
LABEL incomeperperson = "Label Code Example";
PROC FREQ; TABLE incomeperperson;
RUN;
Output : https://goo.gl/hynpf4
/* Code to group countries into High, Mid and Low income ranges in relation to life expectancy */
LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
proc sql;
create Table Testing as
SELECT
 CASE
   WHEN incomeperperson < 5000 THEN 'LOW INCOME'
   WHEN incomeperperson BETWEEN 5000 AND 8000 THEN 'MID INCOME'
   ELSE 'HIGH INCOME'
 END AS incomeperperson ,
 CASE
   WHEN lifeexpectancy < 60 THEN 'LOW AGE'
   WHEN lifeexpectancy BETWEEN 60 AND 80 THEN 'MID AGE'
   ELSE 'HIGH AGE'
 END AS lifeexpectancy, country
FROM mydata.gapminder
order by incomeperperson asc;
proc freq data=work.Testing;
run;
/* The above code looks at the relationship between countries, income and life expectancy.
My hypothesis was that when income rises, access to quality health care also increases and hence life expectancy would also increase.
However, based on the output of the above code, I found no relationship between these 2 parameters. */
Output :
https://goo.gl/bNZgNq
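For comparison, the same banding can be sketched in Python with pandas instead of SAS. The four rows below are hypothetical, not Gapminder values, and cross-tabulating the two bands is only one possible way to look at how income and life expectancy line up:

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to hit each band boundary.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "incomeperperson": [1200.0, 5000.0, 8000.0, 25000.0],
    "lifeexpectancy": [55.0, 60.0, 80.0, 81.5],
})

# Same cut-offs as the CASE expressions; between() includes both endpoints,
# like SQL BETWEEN. Missing values would need handling before this step.
df["income band"] = np.select(
    [df["incomeperperson"] < 5000, df["incomeperperson"].between(5000, 8000)],
    ["LOW INCOME", "MID INCOME"],
    default="HIGH INCOME",
)
df["life band"] = np.select(
    [df["lifeexpectancy"] < 60, df["lifeexpectancy"].between(60, 80)],
    ["LOW AGE", "MID AGE"],
    default="HIGH AGE",
)

# Count of countries in each combination of income band and life band.
table = pd.crosstab(df["income band"], df["life band"])
print(table)
```

A table like this shows the two bands side by side for each country, rather than as two separate frequency lists.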
Week 1
I have selected the following data set:
Gapminder.csv
The Gapminder data set, and similar data sets published by Gapminder, cover factors such as education, life expectancy, military spending and GDP for countries around the world, over periods ranging from a few to many years. I find life expectancy the most crucial of these factors and would like to research what drives it. My research questions are:
a) Does life expectancy depend on income per person globally? This compares life expectancy across different countries.
b) Is life expectancy related to the availability of healthcare in different countries? This takes into account the money spent by countries on health care as a percentage of GDP.
References :
https://ourworldindata.org/the-link-between-life-expectancy-and-health-spending-us-focus
http://www.npr.org/sections/goatsandsoda/2017/04/20/524774195/what-country-spends-the-most-and-least-on-health-care-per-person
https://www.ineteconomics.org/perspectives/blog/the-link-between-health-spending-and-life-expectancy-the-us-is-an-outlier
Hypothesis :
My topic of interest is the relationship between health care spending and urban population, and their impact on life expectancy. I expect that when an individual's income increases, access to health care increases too: people have more money to spend on treatment and can choose better treatment options, which leads to better outcomes and hence longer lifespans. I also predict that an urban population is better informed and has better access to health care, again leading to greater life expectancy.
My additional topic of interest is how urban population impacts life expectancy in a country. My research question is:
a) Are life expectancy and urban population growth related? When a country's urban population increases, does life expectancy also increase? Is there any relation between these two trends?
References :
http://www.nejm.org/doi/full/10.1056/NEJMsa020614
http://www.commonwealthfund.org/publications/press-releases/2014/jun/us-health-system-ranks-last