STAT 541 Homework 2 NOTE: You MUST intersperse comments (lines that start with * and end with ; ) in your code to explain what your SAS statements are supposed to be doing. Please be generous with your comments, since you will be graded not only on the correctness of the code, but partially on the clarity of comments. SAS comments can also be lines that start with /* and end with */ . As a general rule, to get the best possible grade on your comments, I recommend having at least as many "words" of comments as you do of code in the program that you submit. Look on the course web page for an example of a "well-commented" SAS program! Please write your comments so that your code could be completely understood by someone unfamiliar with your code; for example, imagine yourself writing a SAS program for a company and writing the comments for your successor who will use your program after you leave the job. NOTE: Submit your solution code via Blackboard (see course web page for instructions). Please save your work as a plain text file (e.g., a .txt file) (see course web page for instructions for how to do that) and then submit that file in Blackboard. NOTE: PLEASE put WITHIN COMMENTS any text (i.e., if you choose to include problem numbers, problem description, your personal comments, output/results) in your file that is not actual SAS code. This will make it easier and faster to grade. The grader should be able to copy and paste your entire file into SAS and have it run correctly. 1. For this problem we will begin with the sashelp.baseball data set. (a) Create a new table called baseball2 in the work library (so that the full name of the new table is work.baseball2) that has exactly the same rows as the sashelp.baseball table, but only has the following columns: name, team, league, YrMajor, CrAtBat, CrHits, CrHome, CrRuns, CrRbi, CrBB, position, salary (b) Insert the following two rows into the work.baseball2 table: * A row for Babe Ruth, a left fielder for New York in the American League who played for 22 years and in his career had 8399 at bats, 2873 hits, 714 home runs, 2174 runs, 2214 runs batted in, and 2062 bases on balls. * A row for Roberto Clemente, a right fielder for Pittsburgh in the National League who played for 18 years and in his career had 9454 at bats, 3000 hits, 240 home runs, 1416 runs, 1305 runs batted in, and 621 bases on balls. NOTE: Assume that the salary is missing for both Ruth and Clemente. (c) Add three different types of integrity constraint to the work.baseball2 table. Verify that each integrity constraint works by attempting to insert rows that fail each respective constraint. In your program's comments, report the result of the attempted insertions. (d) Convert the salary from 1987 dollars (which the table contains) to numbers that better reflect today's salaries. Suppose that infielders (positions 1B, 2B, SS, 3B) nowadays make 6 times as much as in 1987; catchers (position C) make 5 times as much; and outfielders (positions LF, CF, RF) and designated hitters (position DH) make 8 times as much. Create a variable called 'modsal' (with a label of "Modern Salary Equivalent") that reflects this for all the players in the work.baseball2 table. 2. For this problem, we will use the two data sets which can be created using the code on the course web page labeled "Tennis Singles League Lists". This gives lists of male players in the singles-format league of the Columbia Tennis League for 2014 (first data set) and for 2015 (second data set). Use PROC SQL set operation commands to do the following: (a) Write a query that will combine the tables into one table that has all the players who played in both years. Select only the "Name", "Rating" and "RatingDate" columns to be printed. Have the RatingDate printed out in an understandable format. (b) Do the same as in part (a), but have the data sorted from highest rating to lowest rating. (c) Write a query that will combine the tables into one table that has all the players who played in 2015 but not in 2014. Select only the "Name", "Rating" and "RatingDate" columns to be printed. Have the RatingDate printed out in an understandable format. (d) Write a query that will combine the tables into one table that has all the players who played in either 2014 or 2015 or both years. Select only the "Name", "Rating" and "RatingDate" columns to be printed. Have the RatingDate printed out in an understandable format. (e) Do the same as in part (d), but for players who played in both years, have them be listed multiple times (i.e., as many times as they appear in the tables).