Session-2 Control Structures and Loops
2.1 Control Structures
Control structures, also known as conditional statements, are primarily represented by the if-else construct. They allow R programmers to execute specific code blocks based on conditions. For instance, you can use an if statement to check whether a variable meets a certain criterion and perform different actions accordingly:
2.1.1 If-else
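A minimal sketch of such a check, assuming x holds 10 (the value is an assumption; any number above 5 prints the same message):

```r
# Assumed value for illustration
x <- 10

# Run one branch or the other depending on the condition
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
```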
## [1] "x is greater than 5"
In this example, the if statement evaluates whether x is greater than 5 and prints the appropriate message.
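The same pattern works for comparing two variables. A sketch, assuming num1 is 3 and num2 is 7 (both values are assumptions chosen so that the else branch runs):

```r
# Assumed values for illustration
num1 <- 3
num2 <- 7

# Compare the two numbers and report the outcome
if (num1 > num2) {
  print("Num1 is greater than Num2")
} else {
  print("Num1 is less or equal to Num2")
}
```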
## [1] "Num1 is less or equal to Num2"
In this example, the if statement evaluates whether num1 is less than or equal to num2 and prints the appropriate message.
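Conditions can also be chained with else if to handle more than two cases. A sketch, assuming x is -4 (an assumed value; the zero branch is an addition for completeness):

```r
# Assumed value for illustration
x <- -4

# Chain conditions: the first one that is TRUE decides which branch runs
if (x > 0) {
  print("Positive number")
} else if (x < 0) {
  print("Negative number")
} else {
  print("Zero")
}
```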
## [1] "Negative number"
In this example, the if statement evaluates whether x is a positive or negative number and prints the appropriate message.
2.1.2 Switch case
# Define the day_number variable
day_number <- 3
# Use a switch statement to determine the day of the week
day <- switch(
  day_number,
  "Sunday",
  "Monday",
  "Tuesday",
  "Wednesday",
  "Thursday",
  "Friday",
  "Saturday"
)
cat("It's", day, "\n")
## It's Tuesday
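The switch statement above selects by position. switch() can also select by name, with an unnamed final argument acting as the default. The sketch below illustrates this; day_type is a hypothetical helper introduced only for this example.

```r
# Hypothetical helper (not from the original text): classify a day name
day_type <- function(day_name) {
  switch(day_name,
    "Saturday" = ,          # empty argument falls through to the next value
    "Sunday"   = "Weekend",
    "Weekday"               # unnamed final argument is the default
  )
}

day_type("Sunday")
day_type("Tuesday")

# With a numeric selector that is out of range, switch() returns NULL invisibly
is.null(switch(8, "Sunday", "Monday"))
```

Checking for NULL after a numeric switch() is one way to catch an invalid selection, such as a menu choice outside the offered range.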
Hands on exercise:
Problem Statement: Using the control structures above, build a simple calculator.
Instructions:
- Define a Menu: In your R script, define a menu to display the available operations (addition, subtraction, multiplication, division) to the user. You can use cat to print messages to the console.
- Get User Input: Use readline to get the user’s choice of operation and the numbers for the calculation.
- Perform Calculations: Depending on the user’s choice, perform the corresponding calculation. You can use if-else or switch statements.
- Display the Result: Use cat to display the result of the calculation.
Hint:
# Simple calculator
# Helper functions for the four arithmetic operations
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}
# Display the menu and take input from the user
cat("Select operation.\n")
cat("1. Add\n")
cat("2. Subtract\n")
cat("3. Multiply\n")
cat("4. Divide\n")
choice <- as.integer(readline(prompt = "Enter choice [1/2/3/4]: "))
# as.numeric() keeps decimals; as.integer() would truncate 2.5 to 2
num1 <- as.numeric(readline(prompt = "Enter first number: "))
num2 <- as.numeric(readline(prompt = "Enter second number: "))
# switch() picks the operator symbol and the result by position
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice,
  add(num1, num2),
  subtract(num1, num2),
  multiply(num1, num2),
  divide(num1, num2)
)
print(paste(num1, operator, num2, "=", result))
2.2 Loops in R
Loops are invaluable when you need to execute a block of code repeatedly. R offers several types of loops, with the most common being the for and while loops.
2.2.1 For loop
A for loop allows you to iterate over a sequence or a collection, executing code for each element. For example, you can use a for loop to go through a vector and perform operations on its elements:
Example - 1:
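A minimal sketch of such a loop, assuming my_vector holds the numbers 1 to 5 (an assumption consistent with the doubled values printed below):

```r
# Assumed input vector
my_vector <- c(1, 2, 3, 4, 5)

# Visit each element in turn and print twice its value
for (element in my_vector) {
  print(element * 2)
}
```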
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
This loop multiplies each element in my_vector by 2 and prints the results.
Example - 2:
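A for loop can also accumulate a running total. A sketch, assuming the goal is to add the integers 1 to 100 (the variable name total is an assumption):

```r
# Running total starts at zero
total <- 0

# Add each integer from 1 to 100 to the total
for (i in 1:100) {
  total <- total + i
}

cat("The Sum of the first 100 numbers is", total, "\n")
```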
## The Sum of the first 100 numbers is 5050
Hands on exercise:
Problem Statement-1: You are given a vector of numbers, and you need to calculate the sum of the squares of these numbers using a for loop. Your task is to write R code to perform this calculation.
You can use the vector c(2, 4, 6, 8, 10) as an example.
Hint:
- Initialize a variable, let’s call it sum_of_squares, to store the cumulative sum of the squares.
- Use a for loop to iterate through each element in the vector.
- Inside the loop, square the current number and add it to the sum_of_squares variable.
- After the loop completes, print the sum_of_squares value, which should be the sum of the squares of all the numbers in the vector.
# Create a vector of numbers (you can replace this with your own vector)
numbers <- c(2, 4, 6, 8, 10)
# Initialize a variable to store the sum of squares
sum_of_squares <- 0
# Use a for loop to iterate through each element in the vector
for (num in numbers) {
  # Square the current number and add it to the sum_of_squares variable
  sum_of_squares <- sum_of_squares + (num^2)
}
# Print the result
sum_of_squares
## [1] 220
Problem Statement-2: Given a number, find its factorial.
You can use n = 5 as an example.
Hint:
- Define the number for which to calculate factorial
- Initialize a variable to store the factorial
- Use a for loop to calculate the factorial
- After the loop completes, print the factorial value
# Define the number for which to calculate factorial
n <- 5
# Initialize a variable to store the factorial
factorial_result <- 1
# Use a for loop to calculate the factorial
for (i in seq_len(n)) { # seq_len(n) is empty when n is 0, so 0! correctly stays 1
  factorial_result <- factorial_result * i
}
# Print the result
cat("The factorial of", n, "is", factorial_result, "\n")
## The factorial of 5 is 120
Problem Statement-3: Given a number ‘n’, generate the first ‘n’ numbers of the Fibonacci series.
You can use a series length of 10 as an example.
Hint:
- Define the length of the Fibonacci series
- Initialize variables for the first two numbers in the series
- Use a for loop to generate the Fibonacci series
- Print the Fibonacci series.
# Define the length of the Fibonacci series
number <- 10
# Initialize variables for the first two numbers in the series
fibonacci <- numeric(number)
fibonacci[1] <- 0
fibonacci[2] <- 1
# Use a for loop to generate the Fibonacci series
# (this loop assumes number >= 3; 3:number would count backwards otherwise)
for (i in 3:number) {
  fibonacci[i] <- fibonacci[i - 1] + fibonacci[i - 2]
}
# Print the Fibonacci series
cat("Fibonacci Series (first", number, "numbers):", fibonacci, "\n")
## Fibonacci Series (first 10 numbers): 0 1 1 2 3 5 8 13 21 34
2.2.2 While Loop
A while loop continues execution as long as a specified condition remains true. It’s suitable when the number of iterations isn’t known in advance. Be cautious with while loops: if the condition never becomes false, the loop runs forever and your program hangs. Here’s an example of a while loop:
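A minimal sketch of a counting while loop, assuming a counter named count that starts at 1 and stops after 5:

```r
# Start the counter at 1
count <- 1

# Keep looping while the condition holds
while (count <= 5) {
  print(paste("Iteration:", count))
  count <- count + 1  # without this update the condition would never become FALSE
}
```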
## [1] "Iteration: 1"
## [1] "Iteration: 2"
## [1] "Iteration: 3"
## [1] "Iteration: 4"
## [1] "Iteration: 5"
This loop prints the message “Iteration: X” for each value of count from 1 to 5.
Problem Statement-1: Given a number ‘n’, create a countdown timer using a while loop that counts down from n to 1.
You can use a starting value of 10 as an example.
Hint:
Instructions:
- Define the initial countdown value.
- Use a while loop to create a countdown timer.
- Display the time remaining in seconds.
- Pause for 1 second between each countdown.
- Decrement the countdown value until it reaches 1.
# Define the initial countdown value
countdown <- 10
# Use a while loop to create a countdown timer
while (countdown >= 1) {
  cat("Time remaining:", countdown, "seconds\n")
  Sys.sleep(1) # Pause for 1 second
  countdown <- countdown - 1
}
# Announce the end of the countdown
cat("Countdown complete!\n")
## Time remaining: 10 seconds
## Time remaining: 9 seconds
## Time remaining: 8 seconds
## Time remaining: 7 seconds
## Time remaining: 6 seconds
## Time remaining: 5 seconds
## Time remaining: 4 seconds
## Time remaining: 3 seconds
## Time remaining: 2 seconds
## Time remaining: 1 seconds
## Countdown complete!
Problem Statement-2: Calculate the sum of numbers from 1 to a specified limit using a while loop.
Hint:
Instructions:
- Define the limit for the sum.
- Initialize variables for the sum result and the current number.
- Use a while loop to calculate the sum.
- Add the current number to the sum result.
- Increment the current number until it reaches the limit.
# Define the limit for the sum
limit <- 100
# Initialize variables
sum_result <- 0
current_number <- 1
# Use a while loop to calculate the sum
while (current_number <= limit) {
  sum_result <- sum_result + current_number
  current_number <- current_number + 1
}
cat("The sum of numbers from 1 to", limit, "is", sum_result, "\n")
## The sum of numbers from 1 to 100 is 5050
Problem Statement-3: Validate whether the user’s input is a valid number. You can use the built-in R functions as.numeric, readline, and is.na.
Hint:
Use a while loop to repeatedly prompt the user for input until they enter a valid number.
Instructions:
- Initialize a variable to store user input.
- Use a while loop for input validation.
- Prompt the user to enter a number.
- Check if the input is a valid numeric value.
- Display an error message for invalid input and continue the loop.
- Exit the loop when valid input is provided.
# Initialize a variable to store user input
user_input <- NULL
# Use a while loop to validate user input
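while (is.null(user_input)) {
  # as.numeric() returns NA (with a warning) for non-numeric text; the warning is suppressed here
  user_input <- suppressWarnings(as.numeric(readline("Enter a number: ")))
  if (is.na(user_input)) {
    cat("Invalid input. Please enter a valid number.\n")
    user_input <- NULL # Reset user_input to continue the loop
  }
}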
cat("You entered:", user_input, "\n")
Control Structures and Data Analysis
In data analysis with R, control structures and loops are essential. You can use them to filter, transform, and manipulate data frames, lists, and vectors. For instance, you can create a loop to process multiple data files, apply conditional operations to clean data, or iterate through rows and columns to perform calculations.
# Example: Calculating the mean of columns in a data frame
data <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
means <- numeric(ncol(data))
for (i in seq_along(data)) { # seq_along() is safe even if data has no columns
  means[i] <- mean(data[, i])
}
print(means)
## [1] 2 5
In this example, a for loop calculates the mean of each column in a data frame: 2 for x and 5 for y.
2.3 Loop alternatives
Command | Description |
---|---|
apply(X, MARGIN, FUN) | Obtain a vector/array/list by applying FUN along the specified MARGIN of an array or matrix X |
map(.x, .f, ...) | Obtain a list by applying .f to every element of a list or atomic vector .x |
map_<type>(.x, .f, ...) | For <type> given by lgl (logical), int (integer), dbl (double) or chr (character), return a vector of this type obtained by applying .f to each element of .x |
map_at(.x, .at, .f) | Obtain a list by applying .f to the elements of .x specified by name or index given in .at |
map_if(.x, .p, .f) | Obtain a list by applying .f to the elements of .x selected by .p (a predicate function, or a logical vector) |
mutate_all/_at/_if | Mutate all variables, specified (at) variables, or those selected by a predicate (if) |
summarize_all/_at/_if | Summarize all variables, specified (at) variables, or those selected by a predicate (if) |
These take practice to get used to, but they make analysis easier to debug and less prone to error when used effectively.
2.3.1 Example: apply()
## ── Attaching core tidyverse packages ──────────────────
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.3 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ───────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# For Cars93 data again
Cars93 <- MASS::Cars93
# For the clean survey data:
survey <- read.csv("data/survey_data2020.csv",
header=TRUE, stringsAsFactors = FALSE)
fake.data <- matrix(rnorm(500), ncol=5) # create fake 100 x 5 data set
head(fake.data, 5)
## [,1] [,2] [,3] [,4] [,5]
## [1,] -0.9686 1.06400 -0.6629 -0.913 -0.75931
## [2,] -0.6975 -0.04018 1.3387 1.482 -0.05027
## [3,] -0.1386 0.11574 -0.7495 1.250 0.76470
## [4,] -0.8709 0.26972 1.9563 1.749 -2.07938
## [5,] -0.9126 0.65895 0.4919 -1.764 0.30991
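The two identical rows of output that follow are consistent with computing the column means in two different ways. A sketch, assuming the two calls were colMeans() and apply() with FUN = mean; demo.data and the seed are stand-ins introduced here so the example runs on its own, and the same calls apply to fake.data.

```r
# Stand-in data (assumption): same shape as fake.data, seeded for reproducibility
set.seed(1)
demo.data <- matrix(rnorm(500), ncol = 5)

# Built-in shortcut for column means
col_means_builtin <- colMeans(demo.data)

# Same result with apply(): MARGIN = 2 means "apply FUN to each column"
col_means_apply <- apply(demo.data, MARGIN = 2, FUN = mean)

col_means_builtin
col_means_apply
```

MARGIN = 1 would apply the function to each row instead.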
## [1] -0.09517 0.05196 -0.05322 -0.01535 0.07402
## [1] -0.09517 0.05196 -0.05322 -0.01535 0.07402
# Function that calculates the proportion of vector elements that are > 0
propPositive <- function(x) mean(x > 0)
apply(fake.data, MARGIN=2, FUN=propPositive)
## [1] 0.46 0.57 0.40 0.50 0.55
2.3.2 Example: map, map_()
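The output below looks like the first rows of the survey data, followed by a per-column check for numeric columns returned first as a list and then as a named logical vector. A sketch of calls that produce output of this shape, assuming the purrr package is installed; survey_demo is a small stand-in for the survey data frame, built from a few of the values shown below.

```r
library(purrr)

# Stand-in for the survey data (assumption): three columns, three rows
survey_demo <- data.frame(
  Program = c("PPM", "Other", "MISM"),
  OperatingSystem = c("Windows", "Mac OS X", "Windows"),
  TVhours = c(10.5, 3, 0),
  stringsAsFactors = FALSE
)

head(survey_demo)

# map() always returns a list, one element per column
map(survey_demo, is.numeric)

# map_lgl() returns a named logical vector instead
numeric_cols <- map_lgl(survey_demo, is.numeric)
numeric_cols
```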
## Program PriorExp Rexperience
## 1 PPM Some experience Never used
## 2 Other Extensive experience Basic competence
## 3 MISM Never programmed before Basic competence
## 4 PPM Never programmed before Never used
## 5 PPM Never programmed before Never used
## OperatingSystem TVhours Editor
## 1 Windows 10.5 Other
## 2 Mac OS X 3.0 Microsoft Word
## 3 Windows 0.0 Microsoft Word
## 4 Windows 10.0 Excel
## 5 Windows 4.0 Microsoft Word
## $Program
## [1] FALSE
##
## $PriorExp
## [1] FALSE
##
## $Rexperience
## [1] FALSE
##
## $OperatingSystem
## [1] FALSE
##
## $TVhours
## [1] TRUE
##
## $Editor
## [1] FALSE
## Program PriorExp Rexperience
## FALSE FALSE FALSE
## OperatingSystem TVhours Editor
## FALSE TRUE FALSE
2.3.3 Example: apply(), map(), map_()
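The three outputs below match the column means of R's built-in cars data set, computed three ways. A sketch, assuming these were the calls used and that purrr is installed:

```r
library(purrr)

# apply() coerces the data frame to a matrix and returns a named vector
means_apply <- apply(cars, MARGIN = 2, FUN = mean)
means_apply

# map() returns a list with one mean per column
means_list <- map(cars, mean)
means_list

# map_dbl() returns a named double vector
means_dbl <- map_dbl(cars, mean)
means_dbl
```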
## speed dist
## 15.40 42.98
## $speed
## [1] 15.4
##
## $dist
## [1] 42.98
## speed dist
## 15.40 42.98
2.3.4 Example: mutate_if
Let’s convert all factor variables in Cars93 to lowercase
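A sketch of how this could be done, assuming dplyr is installed and that the result is stored in a data frame called Cars93.lower (the name is taken from the note further down; the exact original call is an assumption):

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Before: Type is a factor
head(Cars93$Type)

# Apply tolower() to every column for which is.factor() is TRUE
Cars93.lower <- mutate_if(Cars93, is.factor, tolower)

# After: Type is a lowercase character vector
head(Cars93.lower$Type)
```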
## [1] Small Midsize Compact Midsize Midsize Midsize
## Levels: Compact Large Midsize Small Sporty Van
## [1] "small" "midsize" "compact" "midsize" "midsize"
## [6] "midsize"
- Note: this has the effect of producing a copy of the Cars93 data where all of the factor variables have been replaced with versions containing lowercase values
2.3.5 Example: mutate_if, adding instead of replacing columns
If you pass the functions in as a list with named elements, those names get appended to the original variable names, creating modified copies of the variables instead of replacing the existing ones.
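A sketch of this variant, assuming dplyr is installed; the list name lower is an assumption, and it becomes the suffix of the new columns:

```r
library(dplyr)

Cars93 <- MASS::Cars93

# A named list of functions adds new columns such as Type_lower
Cars93.lower <- mutate_if(Cars93, is.factor, list(lower = tolower))

# The original factor column is untouched
head(Cars93.lower$Type)

# The lowercase copy sits alongside it
head(Cars93.lower$Type_lower)
```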
## [1] Small Midsize Compact Midsize Midsize Midsize
## Levels: Compact Large Midsize Small Sporty Van
## [1] "small" "midsize" "compact" "midsize" "midsize"
## [6] "midsize"
2.3.6 Example: mutate_at
Let’s convert from MPG to KMPL (miles per gallon to kilometres per litre), but this time using mutate_at
Cars93.metric <- Cars93 %>%
  mutate_at(c("MPG.city", "MPG.highway"),
            list(KMPL = ~ 0.425 * .x))
tail(colnames(Cars93.metric))
## [1] "Luggage.room" "Weight"
## [3] "Origin" "Make"
## [5] "MPG.city_KMPL" "MPG.highway_KMPL"
Here, ~ 0.425 * .x is an example of specifying a “lambda” (anonymous) function. It is permitted shorthand for function(.x) 0.425 * .x.
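To illustrate the lambda shorthand, the sketch below compares the formula form with an ordinary named function. It assumes dplyr is installed; to_kmpl, via_lambda and via_function are hypothetical names introduced only for this comparison.

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Ordinary function doing the same conversion (hypothetical helper)
to_kmpl <- function(.x) 0.425 * .x

# Formula ("lambda") form
via_lambda <- mutate_at(Cars93, c("MPG.city", "MPG.highway"),
                        list(KMPL = ~ 0.425 * .x))

# Named-function form
via_function <- mutate_at(Cars93, c("MPG.city", "MPG.highway"),
                          list(KMPL = to_kmpl))

# Both forms produce the same new columns
identical(via_lambda$MPG.city_KMPL, via_function$MPG.city_KMPL)
```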
2.3.7 Example: summarize_if
Let’s get the mean of every numeric column in Cars93
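The two outputs below differ in two ways: the first keeps the original column names and has NA for columns with missing values, while the second appends _mean and has no NA. A sketch of calls consistent with that, assuming dplyr is installed (the exact original calls and the result names are assumptions):

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Plain mean of every numeric column; columns containing NA give NA
means_plain <- Cars93 %>% summarize_if(is.numeric, mean)
means_plain

# Named list appends "_mean"; na.rm = TRUE is passed on to mean()
means_named <- Cars93 %>% summarize_if(is.numeric, list(mean = mean), na.rm = TRUE)
means_named
```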
## Min.Price Price Max.Price MPG.city MPG.highway
## 1 17.13 19.51 21.9 22.37 29.09
## EngineSize Horsepower RPM Rev.per.mile
## 1 2.668 143.8 5281 2332
## Fuel.tank.capacity Passengers Length Wheelbase Width
## 1 16.66 5.086 183.2 103.9 69.38
## Turn.circle Rear.seat.room Luggage.room Weight
## 1 38.96 NA NA 3073
## Min.Price_mean Price_mean Max.Price_mean
## 1 17.13 19.51 21.9
## MPG.city_mean MPG.highway_mean EngineSize_mean
## 1 22.37 29.09 2.668
## Horsepower_mean RPM_mean Rev.per.mile_mean
## 1 143.8 5281 2332
## Fuel.tank.capacity_mean Passengers_mean Length_mean
## 1 16.66 5.086 183.2
## Wheelbase_mean Width_mean Turn.circle_mean
## 1 103.9 69.38 38.96
## Rear.seat.room_mean Luggage.room_mean Weight_mean
## 1 27.83 13.89 3073
2.4 Another approach
We’ll learn about a bunch of select helper functions like contains() and starts_with().
Here’s one way of performing the previous operation with the help of these functions, and appending _mean to the resulting output.
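A sketch that yields a table of the shape shown below, assuming dplyr is installed and that the data were grouped by Type and summarized over the columns whose names contain "MPG" (the exact original call and the name mpg_by_type are assumptions):

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Group by car type, then average every column whose name contains "MPG"
mpg_by_type <- Cars93 %>%
  group_by(Type) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))

mpg_by_type
```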
## # A tibble: 6 × 3
## Type MPG.city_mean MPG.highway_mean
## <fct> <dbl> <dbl>
## 1 Compact 22.7 29.9
## 2 Large 18.4 26.7
## 3 Midsize 19.5 26.7
## 4 Small 29.9 35.5
## 5 Sporty 21.8 28.8
## 6 Van 17 21.9
2.5 More than one grouping variable
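Grouping by two variables gives one row per combination that occurs in the data. A sketch consistent with the output below, assuming dplyr is installed and that Origin and AirBags were the grouping variables (the exact original call and the name mpg_by_origin_airbags are assumptions):

```r
library(dplyr)

Cars93 <- MASS::Cars93

# Group by two variables, then average the MPG columns within each combination
mpg_by_origin_airbags <- Cars93 %>%
  group_by(Origin, AirBags) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))

mpg_by_origin_airbags
```

After summarizing, the result remains grouped by the first variable (Origin), which is why the printed tibble reports its groups.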
## # A tibble: 6 × 4
## # Groups: Origin [2]
## Origin AirBags MPG.city_mean MPG.highway_mean
## <fct> <fct> <dbl> <dbl>
## 1 USA Driver & Pass… 19 27.2
## 2 USA Driver only 20.2 27.5
## 3 USA None 23.1 29.6
## 4 non-USA Driver & Pass… 20.3 27
## 5 non-USA Driver only 23.2 29.4
## 6 non-USA None 25.9 32