STAT 541 Homework 1 NOTE: You MUST intersperse comments (lines that start with * and end with ; ) in your code to explain what your SAS statements are supposed to be doing. Please be generous with your comments, since you will be graded not only on the correctness of the code, but partially on the clarity, readability, and amount of your comments. SAS comments can also be lines that start with /* and end with */ . As a general rule, to get the best possible grade on your comments, I recommend having at least as many "words" of comments as you do of code in the program that you submit. Look on the course web page for an example of a "well-commented" SAS program! Please write your comments so that your code could be completely understood by someone unfamiliar with your code; for example, imagine yourself writing a SAS program for a company and writing the comments for your successor who will use your program after you leave the job. NOTE: Submit your solution code via Blackboard (see course web page for instructions). Please save your work as a plain text file (e.g., a .txt file) (see course web page for instructions for how to do that) and then submit that file in Blackboard. NOTE: PLEASE put WITHIN COMMENTS any text (i.e., if you choose to include problem numbers, problem description, your personal comments, output/results) in your file that is not actual SAS code. This will make it easier and faster to grade. The grader should be able to copy and paste your entire file into SAS and have it run correctly. 1. Baseball teams that are strong "up the middle" have good catchers (position = C), 2nd basemen (position = 2B), shortstops (position = SS), and center fielders (position = CF). Use the sashelp.baseball data set to determine which teams were strongest up the middle in 1986: (a) Write a PROC SQL query that will result in the teams' summed number of runs (nRuns) for all players in the "up the middle" positions. Have the result be reported with the teams listed from most summed number of runs to least. (b) Which teams have the most powerful outfields? Write a PROC SQL query that will result in the teams' summed number of HOME RUNS (nHome) for all players in the outfield positions (LF, CF, RF). Have the result be reported with the teams listed from most summed number of home runs to least. The CAREER statistics are given in the sashelp.baseball data set as 'CrAtBat', 'CrHits', 'CrHome', 'CrRuns', 'CrRbi', 'CrBB'. (c) Suppose an "all-time great" player has a least 12 years of experience ('YrMajor'), at least 2000 career hits ('CrHits'), at least 1000 career runs ('CrRuns'), and at least 1000 career runs batted in ('CrRbi'). Write a PROC SQL query to report those players in the sashelp.baseball data set who were all-time greats in 1986, that is, at the times these statistics were gathered. Report the players along with their teams and their career statistics. (d) Write a a PROC SQL query to calculate career batting average (hits divided by at-bats) for the players in the sashelp.baseball data set. Report the players along with their teams and their career batting averages, listed from highest career batting average to lowest. Format the career batting average so that exactly 3 decimal places are given in the output listing, and give the career batting average column a clear label. 2. The SAT scores data set is given on the course web page as "Average SAT Scores by State". You can use the DATA step code given to create a SAS data set called 'satscore'. In addition, examine the built-in sashelp.us_data data set. NOTE: If your sashelp library does have us_data in it, then please ignore this message. If your version of SAS Studio does not include the us_data dataset in the sashelp library, you can run this code to read in the data -- it will create a dataset called us_data in the work library rather than in the sashelp library (and it will also convert the population_1990 and population_2000 variables back to numeric): filename testurl url "http://people.stat.sc.edu/hitchcock/us_data.csv"; proc import datafile=testurl out=us_data dbms=csv replace; getnames=yes; run; data us_data; set us_data; new = input(POPULATION_1990, comma15.); drop POPULATION_1990; rename new=POPULATION_1990; run; data us_data; set us_data; new = input(POPULATION_2000, comma15.); drop POPULATION_2000; rename new=POPULATION_2000; run; (a) Use PROC SQL to do an inner join of the SAT scores and sashelp.us_data data sets that reports the appropriate columns with these labels: "State Name", "SAT Scores for 1990", "SAT Scores for 2000", "Population for 1990", "Population for 2000", "Region of U.S." (b) Do a left outer join, a right outer join, and a full outer join. Ensure that the "State Name" column has a non-missing value in each row in all the results. Comment on any differences in the results from these types of joins. (c) From the inner join, use PROC SQL to report which states (for which data is available) both improved by MORE THAN 15 points in SAT score from 1990 to 2000 AND grew in population from 1990 to 2000. (d) Use PROC SQL to report the average 2000 SAT score separately by region (based on those states for which data is available here).