STAT 541 Homework 4

NOTE: You MUST intersperse comments (lines that start with * and end with ; or lines that start with /* and end with */) in your code to explain what your SAS statements are supposed to be doing. Please be generous with your comments, since you will be graded not only on the correctness of the code, but also partially on the clarity of the comments.

NOTE: Submit your solution code via Blackboard (see course web page for instructions). Please save your work as a plain text file (e.g., a .txt file) and then submit that file in Blackboard.

NOTE: PLEASE put WITHIN COMMENTS any text in your file that is not actual SAS code (e.g., problem numbers, problem descriptions, your personal comments, output/results, if you choose to include them). This will make it easier and faster to grade. The grader should be able to copy and paste your entire file into SAS and have it run correctly.

1. The following problem uses the sashelp.us_data data set. The ultimate goal will be to write a SAS program that will apply the %dembplot macro to EVERY numeric variable in the sashelp.us_data data set, WITHOUT having to type the names of all those variables.

(a) The following code creates a SAS data set called mynumvars. Run the code and print this mynumvars data set, and in your comments describe what mynumvars contains. Also carefully provide comments for this block of code explaining what each piece does.

    proc contents data=sashelp.us_data out=mycontents noprint;
    run;

    data myvars;
       set mycontents(keep=NAME TYPE);
    run;

    data mynumvars;
       set myvars;
       IF TYPE=1;
       drop TYPE;
    run;

(b) Write SAS code to repeatedly call the %dembplot macro that we saw in the class examples, each time automatically entering a different numeric variable in the place of the positional parameter, until it has created a boxplot for EVERY numeric variable from the sashelp.us_data data set. (You should not have to type all the names of the numeric variables yourself!)
    [NOTE: You will have to alter the %dembplot macro slightly in a couple of places, specifically the data= part and the datalabel= part.]

(c) Do the same thing as in part (b), except now create side-by-side box plots for every numeric variable, where these side-by-side box plots are separate for each of the levels of the Region variable.

2. For this problem, we will use the two data sets which can be created using the code on the course web page labeled "Tennis Singles League Lists". This gives lists of male players in the singles-format league of the Columbia Tennis League for 2014 (first data set) and for 2015 (second data set). Alternatively, you can load the data using the code:

    DATA CTL2014;
       FILENAME webpage URL 'http://people.stat.sc.edu/hitchcock/tennissingleslists.txt';
       INFILE webpage DLM='09'X firstobs=5 obs=42;
       INPUT Name :$24. Gender $ City :$10. State $ Rating $ RatingDate MMDDYY10. RatingType $;
    run;

    DATA CTL2015;
       FILENAME webpage2 URL 'http://people.stat.sc.edu/hitchcock/tennissingleslists.txt';
       INFILE webpage2 DLM='09'X firstobs=50 obs=102;
       INPUT Name :$24. Gender $ City :$10. State $ Rating $ RatingDate MMDDYY10. RatingType $;
    run;

(a) Write a program that creates a new variable called "NewDate" based on the CTL2015 data set. This NewDate variable will shift the "RatingDate" variable in time, based on the value of the Rating variable:

    * If a player's Rating is 4.5, then NewDate is a shift of RatingDate EXACTLY 3 weeks forward in time.
    * If a player's Rating is 4.0, then NewDate is a shift of RatingDate EXACTLY 4 years forward in time.
    * If a player's Rating is 3.5, then NewDate is a shift of RatingDate EXACTLY 10 days forward in time.
    * If a player's Rating is 3.0, then NewDate is a shift of RatingDate EXACTLY 2 months back in time.
    * If a player's Rating is 2.5, then NewDate is a shift of RatingDate forward in time to the first day of the next quarter.

    Print ONLY the Name, Rating, RatingDate, and NewDate variables in the resulting data set.
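[ILLUSTRATION: The kinds of shifts listed in part (a) can be sketched on a single made-up date. The toy DATA step below is only one possible approach, assuming the INTNX function with the 'same' and 'beginning' alignment options; the starting date and all variable names here are hypothetical and are not part of the assignment data.]

```sas
/* Toy example only: shift ONE made-up date, not the CTL2015 data. */
data _null_;
   d = '15MAR2015'd;                               /* hypothetical starting date         */
   plus3wk   = intnx('week',  d,  3, 'same');      /* same weekday, 3 weeks later        */
   plus4yr   = intnx('year',  d,  4, 'same');      /* same month and day, 4 years later  */
   minus2mo  = intnx('month', d, -2, 'same');      /* same day of month, 2 months back   */
   nextqtr   = intnx('qtr',   d,  1, 'beginning'); /* first day of the following quarter */
   plus10day = d + 10;                             /* SAS dates count days, so add 10    */
   /* Write each value to the log in a readable date format. */
   put d= date9. plus3wk= date9. plus4yr= date9.;
   put minus2mo= date9. nextqtr= date9. plus10day= date9.;
run;
```

Check the log values against a calendar before relying on any of these calls. Note also that Rating is read in as a character variable in the code above, so any comparison on it would be against a quoted string.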
(b) Suppose the data lines that make up the CTL2014 and CTL2015 data sets were saved as external data files having the names "CTL2014.txt" and "CTL2015.txt". Write a program that uses the FILEVAR option to read in and automatically concatenate/stack the CTL2014 and CTL2015 data sets. You may name the external file specifications to have any path/directory name, but the actual file names should be CTL2014.txt and CTL2015.txt.

(c) The CTL2015more data set has data for the players in CTL2015 who played at least one match in the singles league in 2015. It has 4 additional variables: MatchWins, MatchLosses, GameWins, and GameLosses. The data can be read in using this code:

    DATA CTL2015more;
       FILENAME webpage3 URL 'http://people.stat.sc.edu/hitchcock/tennissinglesmore.txt';
       INFILE webpage3 DLM='09'X;
       INPUT Name :$24. Gender $ City :$10. State $ Rating $ RatingDate MMDDYY10. RatingType $ MatchWins MatchLosses GameWins GameLosses;
    run;

    Write a program that first creates an additional variable called GameWinPct, where

        GameWinPct = 100*(GameWins/(GameWins+GameLosses))

    Next, create an additional variable called MatchWinPct, where

        MatchWinPct = 100*(MatchWins/(MatchWins+MatchLosses))

    Your program should then create an additional variable called TotalMatches, which is the sum of MatchWins and MatchLosses for each player.

Create the following plots in SAS:

(d) a vertical bar plot in which the bars represent the different ratings, and the height of each bar represents the total number of matches played by individuals with that rating.
    [HINT: Look at the SAS online help for PROC SGPLOT, especially the VBAR statement.]

(e) a panel of several scatter plots (one for each rating), in which MatchWinPct is plotted on the y-axis and GameWinPct is plotted on the x-axis, plotted separately in different panels for each rating.
    [HINT: Look at the SAS online help for PROC SGPANEL.]
    For a 3.5-rated player who wins 60% of his games, what percentage of matches would you expect him to win? Explain in your comments.

Some of the examples on this webpage:
    https://people.stat.sc.edu/hitchcock/sgplot_examples.txt
from my STAT 540 class may also be helpful.
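[ILLUSTRATION: As a rough sketch of the two procedures named in the hints for (d) and (e), the code below uses the built-in sashelp.class data set rather than the tennis data. That data set and its variables (Sex, Height, Weight) are an assumption chosen only because they ship with SAS; the statements and options in your own solution may differ.]

```sas
/* Toy example only: sashelp.class stands in for the tennis data. */

/* One bar per level of Sex; bar height is the SUM of Height within that level. */
proc sgplot data=sashelp.class;
   vbar Sex / response=Height stat=sum;
run;

/* One scatter plot panel per level of Sex. */
proc sgpanel data=sashelp.class;
   panelby Sex;
   scatter x=Height y=Weight;
run;
```

In the first plot, RESPONSE= names the variable being summarized and STAT= chooses the summary; in the second, PANELBY names the variable whose levels define the panels.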