STAT 19000: Project 9 — Fall 2020
Motivation: A key component to writing efficient code is writing functions. Functions allow us to repeat and reuse coding steps that we used previously, over and over again. If you find you are repeating code over and over, a function may be a good way to reduce lots of lines of code!
Context: We’ve been learning about and using functions all year! Now we are going to learn more about some of the terminology and components of a function, as you will certainly need to be able to write your own functions soon.
Scope: R, functions
Dataset
The following questions will use the dataset found in Scholar:
/class/datamine/data/goodreads/csv
Questions
Please make sure to double check that your submission does indeed contain the files you think it does. You can do this by downloading your submission from Gradescope after uploading. If you can see all of your files and they open up properly on your computer, you should be good to go.
Please make sure to look at your knit PDF before submitting. PDFs should be relatively short and not contain huge amounts of printed data. Remember you can use functions like
Question 1
We’ve provided you with a function below. How many arguments does the function have, and what are their names? You can get a book_id from the URL of a goodreads book’s webpage.
Two examples:
- If you search for the book Words of Radiance on goodreads, the book_id contained in the URL www.goodreads.com/book/show/17332218-words-of-radiance# is 17332218.
- The URL www.goodreads.com/book/show/157993.The_Little_Prince?from_search=true&from_srp=true&qid=JJGqUK9Vp9&rank=1 (The Little Prince) has a book_id of 157993.
Find 2 or 3 book_id values and test out the function until you get two successes. Explain in words what the function is doing and what options you have.
library(imager)

books <- read.csv("/class/datamine/data/goodreads/csv/goodreads_books.csv")
authors <- read.csv("/class/datamine/data/goodreads/csv/goodreads_book_authors.csv")

# Return the name of the author with the given author_id.
get_author_name <- function(my_authors_dataset, my_author_id) {
  return(my_authors_dataset[my_authors_dataset$author_id == my_author_id, 'name'])
}

fun_plot <- function(my_authors_dataset, my_books_dataset, my_book_id, display_cover = TRUE) {
  # Row for the requested book, then every book by the same author.
  book_info <- my_books_dataset[my_books_dataset$book_id == my_book_id, ]
  all_books_by_author <- my_books_dataset[my_books_dataset$author_id == book_info$author_id, ]
  author_name <- get_author_name(my_authors_dataset, book_info$author_id)

  # Load the cover image from the URL stored in the dataset.
  img <- load.image(book_info$image_url)

  # Optionally show the cover next to the scatter plot.
  if (display_cover) {
    par(mfrow = c(1, 2))
    plot(img, axes = FALSE)
  }

  # Pages versus rating for all of the author's books.
  plot(all_books_by_author$num_pages, all_books_by_author$average_rating,
       ylim = c(0, 5.1), pch = 21, bg = 'grey80',
       xlab = 'Number of pages', ylab = 'Average rating',
       main = paste('Books by', author_name))

  # Highlight the requested book in orange.
  points(book_info$num_pages, book_info$average_rating, pch = 21, bg = 'orange', cex = 1.5)
}
- How many arguments does the function have, and what are their names?
- The result of using the function on 2-3 `book_id`s.
- 1-2 sentences explaining what the function does (generally), and what (if any) options the function provides you with.
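Besides reading a function's definition, base R can list a function's arguments for you. The sketch below shows this on a toy function; scale_and_shift and its argument names are hypothetical and are not part of the project.

```r
# Hypothetical toy function, used only to illustrate argument inspection.
scale_and_shift <- function(x, multiplier = 2, shift = 0) {
  x * multiplier + shift
}

# formals() returns the argument list; names() extracts the argument names.
arg_names <- names(formals(scale_and_shift))
print(arg_names)          # "x" "multiplier" "shift"
print(length(arg_names))  # 3

# Arguments that have defaults can be omitted when calling the function.
print(scale_and_shift(10))             # 20
print(scale_and_shift(10, shift = 1))  # 21
```

Arguments with default values, such as multiplier and shift here, are the "options" a caller may override.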
Question 2
You may have encountered a situation where the my_book_id was not in our dataset, and hence, didn’t get plotted. When writing functions, it is usually best to try and foresee issues like this and have the function fail gracefully, instead of showing some ugly (and sometimes unclear) warning. Add some code at the beginning of our function that checks to see if my_book_id is within our dataset, and if it does not exist, prints "Book ID not found.", and exits the function. Test it out on book_id=123 and book_id=19063.
- R code with your new and improved function.
- The results from fun_plot(123).
- The results from fun_plot(19063).
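The general pattern of checking an input first and leaving the function early is often called a guard clause. Here is a minimal sketch on invented toy data; toy_books and get_title are hypothetical names chosen for illustration, and this is one possible approach rather than the required one.

```r
# Toy data, invented for illustration only.
toy_books <- data.frame(
  book_id = c(1, 2, 3),
  title = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up a title, exiting early if the id is missing.
get_title <- function(my_books_dataset, my_book_id) {
  # Guard clause: check membership before doing any real work.
  if (!(my_book_id %in% my_books_dataset$book_id)) {
    print("Book ID not found.")
    return(invisible(NULL))  # leave the function without raising an error
  }
  my_books_dataset[my_books_dataset$book_id == my_book_id, "title"]
}

found <- get_title(toy_books, 2)     # "B"
missing <- get_title(toy_books, 99)  # prints the message; value is NULL
```

Returning invisible(NULL) is a design choice: the caller gets a value it can test with is.null(), and nothing extra is printed at the console.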
Question 3
We have this nice get_author_name function that accepts a dataset (in this case, our authors dataset) and an author_id, and returns the name of the author. Write a new function called get_author_id that accepts a dataset and an author's name, and returns the author_id of the author.
You can test your function using some of these examples:
get_author_id(authors, "Brandon Sanderson") # 38550
get_author_id(authors, "J.K. Rowling") # 1077326
- R code containing your new function.
- The results of using your new function on a few authors.
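To see how the existing lookup by id behaves, it can help to run get_author_name on a tiny table first. The sketch below uses an invented toy_authors data frame (a hypothetical name, with made-up ids and names) together with the get_author_name definition from Question 1.

```r
# Toy authors table, invented for illustration; real ids and names differ.
toy_authors <- data.frame(
  author_id = c(10, 20),
  name = c("Author One", "Author Two"),
  stringsAsFactors = FALSE
)

# Same definition as the get_author_name function given in Question 1.
get_author_name <- function(my_authors_dataset, my_author_id) {
  return(my_authors_dataset[my_authors_dataset$author_id == my_author_id, "name"])
}

# Logical indexing keeps the rows whose author_id matches, then selects "name".
hit <- get_author_name(toy_authors, 20)   # "Author Two"

# An id that is absent yields a zero-length vector rather than an error.
miss <- get_author_name(toy_authors, 99)
print(length(miss))                       # 0
```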
Question 4
See the function below.
search_books_for_word <- function(word) {
  return(books[grepl(word, books$description, fixed = TRUE), ]$title)
}
Given a word, search_books_for_word returns the titles of books where the provided word is inside the book’s description. search_books_for_word utilizes the books dataset internally. It requires that the books dataset has been loaded into the environment prior to running (and with the correct name). By including and referencing objects defined outside of our function’s scope within our function (in this case the variable books), our search_books_for_word function will be more prone to errors, as any changes to those objects may break our function. For example:
our_function <- function(x) {
  print(paste("Our argument is:", x))
  print(paste("Our variable is:", my_variable))
}

# our variable outside the scope of our_function
my_variable <- "dog"

# run our_function
our_function("first")

# change the variable outside the scope of our function
my_variable <- "cat"

# run our_function again
our_function("second")

# imagine a scenario where "my_variable" doesn't exist, our_function would break!
rm(my_variable)
our_function("third")
Fix our search_books_for_word function to accept the books dataset as an argument called my_books_dataset and utilize my_books_dataset within the function instead of the global variable books.
- R code with your new and improved function.
- An example using the updated function.
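The same idea can be seen on the small our_function example from this question: once the outside variable becomes an explicit argument, the function no longer depends on what happens to exist in the global environment. This is a minimal sketch; our_function_fixed and value are hypothetical names, and the function returns its message rather than printing it so the result is easy to inspect.

```r
# Revised toy function: the outside variable is now an explicit argument.
our_function_fixed <- function(x, my_variable) {
  paste("Our argument is:", x, "/ Our variable is:", my_variable)
}

# The caller decides what to pass in.
value <- "dog"
first <- our_function_fixed("first", value)

# Removing the global no longer matters, because nothing inside the
# function refers to it.
rm(value)
third <- our_function_fixed("third", "cat")

print(first)
print(third)
```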
Question 5
Write your own custom function. Make sure your function includes at least 2 arguments. If you access one of our datasets from within your function (which you definitely should do), use what you learned in (4), to avoid future errors dealing with scoping. Your function could output a cool plot, interesting tidbits of information, or anything else you can think of. Get creative and make a function that is fun to use!
- R code used to solve the problem.
- Examples using your function with included output.