Friday, December 11, 2015

Improving the program

1) This is my program, created using SAS:

LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; set mydata.gapminder;

LABEL incomeperperson="Income per person"
      suicideper100TH="Suicide rate per 100,000 inhabitants"
      lifeexpectancy="Life expectancy"
      incomegroup="Aggregated income per person"
      suicidegroup="Aggregated suicide rate"
      lifegroup="Aggregated life expectancy";

/* Income groups: 1 = up to 1,000; 2 = over 1,000 and up to 5,000; 3 = more than 5,000 per person */
if incomeperperson LE 1000 then incomegroup=1;
else if incomeperperson LE 5000 then incomegroup=2;
else incomegroup=3;

/* Suicide groups: 1 = up to 5; 2 = over 5 and up to 10; 3 = more than 10 suicides per 100,000 */
if suicideper100TH LE 5 then suicidegroup=1;
else if suicideper100TH LE 10 then suicidegroup=2;
else suicidegroup=3;

/* Life expectancy groups: 1 = up to 60 years; 2 = over 60 and up to 75 years; 3 = over 75 years */
if lifeexpectancy LE 60 then lifegroup=1;
else if lifeexpectancy LE 75 then lifegroup=2;
else lifegroup=3;

PROC SORT; by COUNTRY;

PROC FREQ; TABLES incomeperperson suicideper100TH lifeexpectancy incomegroup suicidegroup lifegroup;

/* The variable is lifegroup (one word); "life group" would be read as two separate variables */
PROC PRINT; var incomegroup suicidegroup lifegroup;
RUN;

2) This is the output: frequency tables in PDF

3) Here is what I changed since last week: I added a third variable (life expectancy) and then grouped all three variables. Instead of only a long list of numbers, the results are now also aggregated into three groups (1 = low, 2 = medium, 3 = high) for income per person, suicide rate, and life expectancy.
The dataset already accounts for missing data, so I didn't have to change that. 
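One thing that may be worth double-checking here: SAS treats a missing numeric value as smaller than any number, so a test such as incomeperperson LE 1000 is also true when the value is missing. The sketch below is only an illustration, assuming the gapminder variables are numeric and that the data set new from the program above exists; the data set name checked is made up for this example. It leaves the group missing when the source value is missing and then shows missing values as their own row in the frequency table.

```sas
/* Hypothetical guard: "checked" is an illustrative data set name */
DATA checked;
  set new;
  /* Keep the group missing when the source value is missing */
  if missing(incomeperperson) then incomegroup = .;
  else if incomeperperson LE 1000 then incomegroup = 1;
  else if incomeperperson LE 5000 then incomegroup = 2;
  else incomegroup = 3;
run;

/* The MISSING option lists missing values as a separate category */
PROC FREQ data=checked;
  tables incomegroup / missing;
run;
```

The same pattern would apply to suicidegroup and lifegroup if those variables turn out to contain missing values.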
I still have to analyze the data further, but so far only the obvious pair, income per person and life expectancy, seems to correlate. Since my research question concerns the suicide rate in rich countries, I still need to look for more connections among the variables.
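As a possible next step for checking these connections, here is a minimal sketch, assuming the data set new from the program above. PROC CORR reports pairwise correlations for the three original variables, and PROC FREQ with two variables joined by an asterisk cross-tabulates the groups. This is one way to look at the relationships, not the required method for the assignment.

```sas
/* Pairwise correlations among the three quantitative variables */
PROC CORR data=new;
  var incomeperperson suicideper100TH lifeexpectancy;
run;

/* Cross-tabulation of income group by suicide group, with a chi-square test */
PROC FREQ data=new;
  tables incomegroup*suicidegroup / chisq;
run;
```

To focus on rich countries only, the cross-tabulation could be read row by row, comparing the suicide group distribution in incomegroup 3 against the other two groups.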
