Session-2 Control Structures and Loops

2.1 Control Structures

Control structures, also known as conditional statements, let R programmers execute specific code blocks depending on whether a condition holds. The primary construct is if-else: an if statement checks whether a variable meets a certain criterion, and the program performs different actions accordingly:

2.1.1 If-else

x <- 10
if (x > 5) {
  print("x is greater than 5")
} else {
  print("x is not greater than 5")
}
## [1] "x is greater than 5"

In this example, the if statement evaluates whether x is greater than 5 and prints the appropriate message.

num1 <- 10
num2 <- 20

if (num1 <= num2) {
  print("num1 is less than or equal to num2")
}
## [1] "num1 is less than or equal to num2"

In this example, the if statement checks whether num1 is less than or equal to num2 and prints the message only when the condition is true. With no else branch, nothing is printed otherwise.

x <- -5
if (x >= 0) {
  print("Non-negative number")
} else {
  print("Negative number")
}
## [1] "Negative number"

In this example, the if statement checks whether x is non-negative (zero or positive) or negative and prints the appropriate message.
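When more than two outcomes are possible, branches can be chained with else if. The following is a minimal sketch; the function name classify_sign is hypothetical and used only for illustration.

```r
# Hypothetical helper: classify a single number by its sign
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(classify_sign(-5))
print(classify_sign(0))
print(classify_sign(3))
```

Note that if() expects a single TRUE or FALSE. For element-wise decisions over a whole vector, base R provides the vectorised ifelse().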

2.1.2 Switch case

# Define the day_number variable
day_number <- 3

# Use a switch statement to determine the day of the week
day <- switch(
  day_number,
  "Sunday",
  "Monday",
  "Tuesday",
  "Wednesday",
  "Thursday",
  "Friday",
  "Saturday"
)

cat("It's", day, "\n")
## It's Tuesday
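With a numeric first argument, switch() picks the alternative at that position and returns NULL when the index is out of range. It also accepts a character string, in which case the alternatives are matched by name, an empty alternative falls through to the next one, and a final unnamed alternative acts as the default. A small sketch of both behaviours follows; the function name day_type is hypothetical.

```r
# Hypothetical helper: classify a day name as weekend or weekday
day_type <- function(day) {
  switch(day,
    Saturday = ,          # empty: falls through to the next alternative
    Sunday = "weekend",
    Monday = ,
    Tuesday = ,
    Wednesday = ,
    Thursday = ,
    Friday = "weekday",
    "unknown"             # unnamed last alternative is the default
  )
}

print(day_type("Saturday"))
print(day_type("Tuesday"))
print(day_type("Someday"))

# A numeric index beyond the alternatives yields NULL
print(is.null(switch(8, "Sunday", "Monday")))
```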

Hands-on exercise:

Problem Statement: Using the control structures above, build a simple calculator.

Instructions:

  • Define a Menu: In your R script, define a menu to display the available operations (addition, subtraction, multiplication, division) to the user. You can use cat to print messages to the console.
  • Get User Input: Use readline to get the user’s choice of operation and the numbers for the calculation.
  • Perform Calculations: Depending on the user’s choice, perform the corresponding calculation. You can use if-else or switch statements.
  • Display the Result: Use cat to display the result of the calculation.
Hint:
# Simple calculator

# Define one function per arithmetic operation
add <- function(x, y) {
  return(x + y)
}
subtract <- function(x, y) {
  return(x - y)
}
multiply <- function(x, y) {
  return(x * y)
}
divide <- function(x, y) {
  return(x / y)
}

# Display the menu
cat("Select operation.\n")
cat("1. Add\n")
cat("2. Subtract\n")
cat("3. Multiply\n")
cat("4. Divide\n")

# Take input from the user (readline() only waits for input in an interactive session)
choice <- as.integer(readline(prompt = "Enter choice [1/2/3/4]: "))
# as.numeric() rather than as.integer(), so decimal inputs such as 2.5 are kept
num1 <- as.numeric(readline(prompt = "Enter first number: "))
num2 <- as.numeric(readline(prompt = "Enter second number: "))

# Map the choice to its symbol and compute the result
operator <- switch(choice, "+", "-", "*", "/")
result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2), divide(num1, num2))
cat(num1, operator, num2, "=", result, "\n")
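Because readline() needs an interactive session, the calculation step can be easier to try out when it is wrapped in a function that takes the choice and the two numbers as arguments. The sketch below is one possible arrangement, not the only one. The name calculate and the decision to return NA for division by zero are assumptions made for illustration.

```r
# Hypothetical helper: apply the operation selected by `choice` to x and y
calculate <- function(choice, x, y) {
  if (!choice %in% 1:4) {
    stop("choice must be 1, 2, 3 or 4")
  }
  if (choice == 4 && y == 0) {
    # R itself would return Inf or NaN here; NA flags the problem explicitly
    return(NA_real_)
  }
  switch(choice, x + y, x - y, x * y, x / y)
}

print(calculate(1, 2, 3))    # addition
print(calculate(3, 4, 2.5))  # multiplication
print(calculate(4, 1, 0))    # division by zero is reported as NA
```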

2.2 Loops in R

Loops are invaluable when you need to execute a block of code repeatedly. R offers several types of loops, with the most common being the for and while loops.

2.2.1 For loop

A for loop allows you to iterate over a sequence or a collection, executing code for each element. For example, you can use a for loop to go through a vector and perform operations on its elements:

Example - 1:

my_vector <- c(1, 2, 3, 4, 5)
for (i in my_vector) {
  print(i * 2)
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10

This loop multiplies each element in my_vector by 2 and prints the results.
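A for loop can also iterate over positions rather than values, which is useful when the results need to be stored. The sketch below assumes the same my_vector and keeps the doubled values in a pre-allocated vector; the name doubled is chosen for illustration.

```r
my_vector <- c(1, 2, 3, 4, 5)

# Pre-allocate the result vector so it does not grow inside the loop
doubled <- numeric(length(my_vector))

# seq_along() yields the positions 1, 2, ..., length(my_vector)
for (i in seq_along(my_vector)) {
  doubled[i] <- my_vector[i] * 2
}

print(doubled)
```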

Example - 2:

n <- 100
k <- 0
for (i in 1:n) {
  k <- k + i
}

cat("The sum of the first", n, "numbers is", k, "\n")
## The sum of the first 100 numbers is 5050

Hands-on exercise:

Problem Statement-1: You are given a vector of numbers, and you need to calculate the sum of the squares of these numbers using a for loop. Your task is to write R code to perform this calculation.

You can use the following vector as an example:

numbers <- c(2, 4, 6, 8, 10)
Hint:
  • Initialize a variable, let’s call it sum_of_squares, to store the cumulative sum of the squares.
  • Use a for loop to iterate through each element in the vector.
  • Inside the loop, square the current number and add it to the sum_of_squares variable.
  • After the loop completes, print the sum_of_squares value, which should be the sum of the squares of all the numbers in the vector.
# Create a vector of numbers (you can replace this with your own vector)
numbers <- c(2, 4, 6, 8, 10)

# Initialize a variable to store the sum of squares
sum_of_squares <- 0

# Use a for loop to iterate through each element in the vector
for (num in numbers) {
  # Square the current number and add it to the sum_of_squares variable
  sum_of_squares <- sum_of_squares + (num^2)
}

# Print the result
sum_of_squares
## [1] 220

Problem Statement-2: Given a number, find its factorial.

You can use the following value as an example:

n <- 5
Hint:
  • Define the number for which to calculate factorial
  • Initialize a variable to store the factorial
  • Use a for loop to calculate the factorial
  • After the loop completes, print the factorial value
# Define the number for which to calculate factorial
n <- 5

# Initialize a variable to store the factorial
factorial_result <- 1

# Use a for loop to calculate the factorial
for (i in 1:n) {
  factorial_result <- factorial_result * i
}

# Print the result
cat("The factorial of", n, "is", factorial_result, "\n")
## The factorial of 5 is 120
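One caveat with the range 1:n is worth sketching, under the assumption that n may be 0. In that case 1:0 counts down (1, 0) instead of being empty, so the loop would multiply the result by 0. Using seq_len(n) avoids this. The function name factorial_loop is hypothetical; factorial() is R's built-in function, used here only as a check.

```r
# Hypothetical helper: factorial via a for loop that is safe for n = 0
factorial_loop <- function(n) {
  result <- 1
  for (i in seq_len(n)) {
    result <- result * i
  }
  result
}

print(length(1:0))         # 1:0 has two elements: 1 and 0
print(length(seq_len(0)))  # seq_len(0) is empty
print(factorial_loop(5))
print(factorial_loop(0) == factorial(0))
```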

Problem Statement-3: You are given a number ‘n’. Generate the first ‘n’ numbers of the Fibonacci series.

You can use the following value as an example:

number <- 10
Hint:
  • Define the length of the Fibonacci series
  • Initialize variables for the first two numbers in the series
  • Use a for loop to generate the Fibonacci series
  • Print the Fibonacci series.
# Define the length of the Fibonacci series
number <- 10

# Initialize variables for the first two numbers in the series
fibonacci <- numeric(number)
fibonacci[1] <- 0
fibonacci[2] <- 1

# Use a for loop to generate the Fibonacci series
for (i in 3:number) {
  fibonacci[i] <- fibonacci[i - 1] + fibonacci[i - 2]
}

# Print the Fibonacci series
cat("Fibonacci Series (first", number, "numbers):", fibonacci, "\n")
## Fibonacci Series (first 10 numbers): 0 1 1 2 3 5 8 13 21 34
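The loop above starts at 3:number, which counts backwards when number is below 3 (for example, 3:2 is 3, 2). A possible way to handle short series is sketched below. The function name fibonacci_series is hypothetical, and the series is assumed to start from 0 and 1 as in the solution above.

```r
# Hypothetical helper: first n Fibonacci numbers, starting from 0 and 1
fibonacci_series <- function(n) {
  fib <- numeric(n)  # all zeros, so fib[1] is already 0
  if (n >= 2) {
    fib[2] <- 1
  }
  # seq_len(n)[-(1:2)] is 3, 4, ..., n, and is empty when n < 3
  for (i in seq_len(n)[-(1:2)]) {
    fib[i] <- fib[i - 1] + fib[i - 2]
  }
  fib
}

print(fibonacci_series(10))
print(fibonacci_series(1))
```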

2.2.2 While Loop

A while loop continues execution as long as a specified condition remains true. It’s suitable when the number of iterations isn’t known in advance. Take care that the condition eventually becomes false; otherwise the loop never terminates and the program hangs. Here’s an example of a while loop:

count <- 1
while (count <= 5) {
  print(paste("Iteration:", count))
  count <- count + 1
}
## [1] "Iteration: 1"
## [1] "Iteration: 2"
## [1] "Iteration: 3"
## [1] "Iteration: 4"
## [1] "Iteration: 5"

This loop prints the message “Iteration: X” for each value of count from 1 to 5.
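A case where the number of iterations is not known in advance can be sketched as follows: halve a value until it drops below 1 and count the steps. The cap max_steps is an assumption added for this sketch; it guarantees the loop stops even if the condition were never met.

```r
x <- 100
steps <- 0
max_steps <- 1000  # safety cap (an assumption for this sketch)

# Halve x until it drops below 1, counting the iterations
while (x >= 1 && steps < max_steps) {
  x <- x / 2
  steps <- steps + 1
}

cat("Halvings needed:", steps, "\n")
```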

Problem Statement-1: You are given a number ‘n’. Create a countdown timer that uses a while loop to count down from ‘n’ to 1.

You can use the following value as an example:

countdown <- 10
Hint:

Instructions:

  • Define the initial countdown value.
  • Use a while loop to create a countdown timer.
  • Display the time remaining in seconds.
  • Pause for 1 second between each countdown.
  • Decrement the countdown value until it reaches 1.

# Define the initial countdown value
countdown <- 10

# Use a while loop to create a countdown timer
while (countdown >= 1) {
  cat("Time remaining:", countdown, "seconds\n")
  Sys.sleep(1)  # Pause for 1 second
  countdown <- countdown - 1
}
## Time remaining: 10 seconds
## Time remaining: 9 seconds
## Time remaining: 8 seconds
## Time remaining: 7 seconds
## Time remaining: 6 seconds
## Time remaining: 5 seconds
## Time remaining: 4 seconds
## Time remaining: 3 seconds
## Time remaining: 2 seconds
## Time remaining: 1 seconds
cat("Countdown complete!\n")
## Countdown complete!

Problem Statement-2: Calculate the sum of numbers from 1 to a specified limit using a while loop.

Hint:

Instructions:

  • Define the limit for the sum.
  • Initialize variables for the sum result and the current number.
  • Use a while loop to calculate the sum.
  • Add the current number to the sum result.
  • Increment the current number until it reaches the limit.
# Define the limit for the sum
limit <- 100

# Initialize variables
sum_result <- 0
current_number <- 1

# Use a while loop to calculate the sum
while (current_number <= limit) {
  sum_result <- sum_result + current_number
  current_number <- current_number + 1
}

cat("The sum of numbers from 1 to", limit, "is", sum_result, "\n")
## The sum of numbers from 1 to 100 is 5050

Problem Statement-3: Validate user input by checking whether it is a valid number. You can use the built-in R functions as.numeric, readline, and is.na.

Hint:

Use a while loop to repeatedly prompt the user for input until they enter a valid number.

Instructions:

  • Initialize a variable to store user input.
  • Use a while loop for input validation.
  • Prompt the user to enter a number.
  • Check if the input is a valid numeric value.
  • Display an error message for invalid input and continue the loop.
  • Exit the loop when valid input is provided.
# Initialize a variable to store user input
user_input <- NULL

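# Use a while loop to validate user input
# (run interactively: in a non-interactive session readline() returns "" without waiting)
while (is.null(user_input)) {
  # as.numeric() returns NA, with a warning, for text that is not a number
  user_input <- suppressWarnings(as.numeric(readline("Enter a number: ")))
  if (is.na(user_input)) {
    cat("Invalid input. Please enter a valid number.\n")
    user_input <- NULL  # Reset user_input to continue the loop
  }
}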

cat("You entered:", user_input, "\n")
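The validation check can also be tried without a console by feeding it a fixed set of candidate strings. In the sketch below, the helper parse_number and the vector entries are hypothetical stand-ins for readline() input.

```r
# Hypothetical helper: convert text to a number, or NA if it is not numeric
parse_number <- function(text) {
  suppressWarnings(as.numeric(text))
}

# Simulated user entries (assumed data standing in for readline())
entries <- c("abc", "", "12.5")

i <- 1
value <- NA
# Keep reading entries until one parses or the entries run out
while (is.na(value) && i <= length(entries)) {
  value <- parse_number(entries[i])
  if (is.na(value)) {
    cat("Invalid input:", shQuote(entries[i]), "\n")
  }
  i <- i + 1
}

cat("You entered:", value, "\n")
```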

Control Structures and Data Analysis

In data analysis with R, control structures and loops are essential. You can use them to filter, transform, and manipulate data frames, lists, and vectors. For instance, you can create a loop to process multiple data files, apply conditional operations to clean data, or iterate through rows and columns to perform calculations.

# Example: Calculating the mean of columns in a data frame
data <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
means <- numeric(ncol(data))
for (i in seq_len(ncol(data))) {
  means[i] <- mean(data[[i]])
}
print(means)
## [1] 2 5

In this example, a for loop calculates the mean of each column in a data frame and stores the results in a pre-allocated vector.
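For comparison, the same column means can be obtained without an explicit loop. The sketch below uses two base R functions, vapply() and colMeans(); the variable names are chosen for illustration.

```r
data <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))

# vapply() applies mean() to each column and checks each result is one number
means_vapply <- vapply(data, mean, numeric(1))

# colMeans() is the dedicated base R function for column means
means_col <- colMeans(data)

print(means_vapply)
print(means_col)
```

Both results keep the column names, which the loop version does not.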

2.3 Loop alternatives

Command                   Description
apply(X, MARGIN, FUN)     Obtain a vector/array/list by applying FUN along the specified MARGIN of an array or matrix X
map(.x, .f, ...)          Obtain a list by applying .f to every element of a list or atomic vector .x
map_<type>(.x, .f, ...)   For <type> given by lgl (logical), int (integer), dbl (double) or chr (character), return a vector of this type obtained by applying .f to each element of .x
map_at(.x, .at, .f)       Obtain a list by applying .f to the elements of .x specified by name or index given in .at
map_if(.x, .p, .f)        Obtain a list by applying .f to the elements of .x selected by .p (a predicate function, or a logical vector)
mutate_all/_at/_if        Mutate all variables, specified (at) variables, or those selected by a predicate (if)
summarize_all/_at/_if     Summarize all variables, specified (at) variables, or those selected by a predicate (if)

apply() is part of base R, the map functions come from purrr, and the mutate_ and summarize_ variants come from dplyr; both packages are loaded by library(tidyverse). These functions take practice to get used to, but they make analysis easier to debug and less prone to error when used effectively.

2.3.1 Example: apply()

# Our favourite library
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.3     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ───────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# For Cars93 data again
Cars93 <- MASS::Cars93 

# For the clean survey data:
survey <- read.csv("data/survey_data2020.csv", 
                   header=TRUE, stringsAsFactors = FALSE)

fake.data <- matrix(rnorm(500), ncol=5) # create fake 100 x 5 data set

head(fake.data, 5)
##         [,1]     [,2]    [,3]   [,4]     [,5]
## [1,] -0.9686  1.06400 -0.6629 -0.913 -0.75931
## [2,] -0.6975 -0.04018  1.3387  1.482 -0.05027
## [3,] -0.1386  0.11574 -0.7495  1.250  0.76470
## [4,] -0.8709  0.26972  1.9563  1.749 -2.07938
## [5,] -0.9126  0.65895  0.4919 -1.764  0.30991
colMeans(fake.data)
## [1] -0.09517  0.05196 -0.05322 -0.01535  0.07402
apply(fake.data, MARGIN=2, FUN=mean) # MARGIN = 1 for rows, 2 for columns
## [1] -0.09517  0.05196 -0.05322 -0.01535  0.07402
# Function that calculates the proportion of vector elements that are > 0
propPositive <- function(x) mean(x > 0)
apply(fake.data, MARGIN=2, FUN=propPositive) 
## [1] 0.46 0.57 0.40 0.50 0.55
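Because fake.data is random, its output changes from run to run. The effect of MARGIN can be checked on a small fixed matrix instead, as in this sketch (the matrix m is made up for illustration):

```r
# A small deterministic matrix: 2 rows, 3 columns
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, MARGIN = 1, FUN = sum)  # one value per row
col_sums <- apply(m, MARGIN = 2, FUN = sum)  # one value per column

print(row_sums)
print(col_sums)

# The dedicated base R functions give the same totals
print(all(row_sums == rowSums(m)))
print(all(col_sums == colSums(m)))
```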

2.3.2 Example: map, map_()

head(survey,5)
##   Program                PriorExp      Rexperience
## 1     PPM         Some experience       Never used
## 2   Other    Extensive experience Basic competence
## 3    MISM Never programmed before Basic competence
## 4     PPM Never programmed before       Never used
## 5     PPM Never programmed before       Never used
##   OperatingSystem TVhours         Editor
## 1         Windows    10.5          Other
## 2        Mac OS X     3.0 Microsoft Word
## 3         Windows     0.0 Microsoft Word
## 4         Windows    10.0          Excel
## 5         Windows     4.0 Microsoft Word
map(survey, is.numeric) # Returns a list
## $Program
## [1] FALSE
## 
## $PriorExp
## [1] FALSE
## 
## $Rexperience
## [1] FALSE
## 
## $OperatingSystem
## [1] FALSE
## 
## $TVhours
## [1] TRUE
## 
## $Editor
## [1] FALSE
map_lgl(survey, is.numeric) # Returns a logical vector with named elements
##         Program        PriorExp     Rexperience 
##           FALSE           FALSE           FALSE 
## OperatingSystem         TVhours          Editor 
##           FALSE            TRUE           FALSE

2.3.3 Example: apply(), map(), map_()

apply(cars, 2, FUN=mean) # apply() coerces the data frame to a matrix first
## speed  dist 
## 15.40 42.98
map(cars, mean) # Data frames are lists of columns
## $speed
## [1] 15.4
## 
## $dist
## [1] 42.98
map_dbl(cars, mean) # map output as a double vector
## speed  dist 
## 15.40 42.98

2.3.4 Example: mutate_if

Let’s convert all factor variables in Cars93 to lowercase

head(Cars93$Type)
## [1] Small   Midsize Compact Midsize Midsize Midsize
## Levels: Compact Large Midsize Small Sporty Van
Cars93.lower <- mutate_if(Cars93, is.factor, tolower)
head(Cars93.lower$Type)
## [1] "small"   "midsize" "compact" "midsize" "midsize"
## [6] "midsize"
  • Note: this produces a copy of the Cars93 data in which every factor variable has been replaced by a character vector of lowercase values.

2.3.5 Example: mutate_if, adding instead of replacing columns

If you pass the functions in as a list with named elements, those names are appended to the original column names to create new variables instead of replacing the existing ones.

Cars93.lower <- mutate_if(Cars93, is.factor, list(lower = tolower))
head(Cars93.lower$Type)
## [1] Small   Midsize Compact Midsize Midsize Midsize
## Levels: Compact Large Midsize Small Sporty Van
head(Cars93.lower$Type_lower)
## [1] "small"   "midsize" "compact" "midsize" "midsize"
## [6] "midsize"

2.3.6 Example: mutate_at

Let’s convert from MPG (miles per gallon) to KMPL (kilometres per litre), this time using mutate_at.

Cars93.metric <- Cars93 %>% 
  mutate_at(c("MPG.city", "MPG.highway"), 
            list(KMPL = ~ 0.425 * .x))
tail(colnames(Cars93.metric))
## [1] "Luggage.room"     "Weight"          
## [3] "Origin"           "Make"            
## [5] "MPG.city_KMPL"    "MPG.highway_KMPL"

Here, ~ 0.425 * .x is an example of specifying a “lambda” (anonymous) function. It is shorthand for

function(.x){0.425 * .x}
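The equivalence of the two forms can be checked directly with purrr's map_dbl(). This is a small sketch that assumes the purrr package is installed; the values in mpg_values are illustrative and not taken from Cars93.

```r
library(purrr)

mpg_values <- c(20, 30, 40)  # illustrative values, not taken from Cars93

# Formula shorthand, as used in the mutate_at() call
kmpl_formula <- map_dbl(mpg_values, ~ 0.425 * .x)

# The equivalent explicit anonymous function
kmpl_function <- map_dbl(mpg_values, function(.x) 0.425 * .x)

print(kmpl_formula)
print(identical(kmpl_formula, kmpl_function))
```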

2.3.7 Example: summarize_if

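Let’s get the mean of every numeric column in Cars93. The first call below returns NA for Rear.seat.room and Luggage.room because those columns contain missing values. The second call passes na.rm=TRUE through to mean() to drop them, and names the function in a list so that _mean is appended to each column name.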

Cars93 %>% summarize_if(is.numeric, mean)
##   Min.Price Price Max.Price MPG.city MPG.highway
## 1     17.13 19.51      21.9    22.37       29.09
##   EngineSize Horsepower  RPM Rev.per.mile
## 1      2.668      143.8 5281         2332
##   Fuel.tank.capacity Passengers Length Wheelbase Width
## 1              16.66      5.086  183.2     103.9 69.38
##   Turn.circle Rear.seat.room Luggage.room Weight
## 1       38.96             NA           NA   3073
Cars93 %>% summarize_if(is.numeric, list(mean = mean), na.rm=TRUE)
##   Min.Price_mean Price_mean Max.Price_mean
## 1          17.13      19.51           21.9
##   MPG.city_mean MPG.highway_mean EngineSize_mean
## 1         22.37            29.09           2.668
##   Horsepower_mean RPM_mean Rev.per.mile_mean
## 1           143.8     5281              2332
##   Fuel.tank.capacity_mean Passengers_mean Length_mean
## 1                   16.66           5.086       183.2
##   Wheelbase_mean Width_mean Turn.circle_mean
## 1          103.9      69.38            38.96
##   Rear.seat.room_mean Luggage.room_mean Weight_mean
## 1               27.83             13.89        3073

2.3.8 Example: summarize_at

Let’s get the average fuel economy of all vehicles, grouped by their Type

Cars93 %>%
  group_by(Type) %>%
  summarize_at(c("MPG.city", "MPG.highway"), mean)
## # A tibble: 6 × 3
##   Type    MPG.city MPG.highway
##   <fct>      <dbl>       <dbl>
## 1 Compact     22.7        29.9
## 2 Large       18.4        26.7
## 3 Midsize     19.5        26.7
## 4 Small       29.9        35.5
## 5 Sporty      21.8        28.8
## 6 Van         17          21.9

2.4 Another approach

dplyr also provides select helper functions such as contains() and starts_with(), which pick columns by a pattern in their names.

Here’s one way of performing the previous operation with the help of these functions, and appending _mean to the resulting output.

Cars93 %>%
  group_by(Type) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
## # A tibble: 6 × 3
##   Type    MPG.city_mean MPG.highway_mean
##   <fct>           <dbl>            <dbl>
## 1 Compact          22.7             29.9
## 2 Large            18.4             26.7
## 3 Midsize          19.5             26.7
## 4 Small            29.9             35.5
## 5 Sporty           21.8             28.8
## 6 Van              17               21.9
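In dplyr 1.0.0 and later, the scoped verbs (_all, _at, _if) are superseded by across(), used inside the ordinary mutate() and summarize() verbs. The sketch below shows what the same summary might look like in that style. It assumes dplyr 1.0.0 or newer and the MASS package are installed; the name mpg_by_type is chosen for illustration.

```r
library(dplyr)

Cars93 <- MASS::Cars93

# across() selects the MPG columns; .names appends _mean to each column name
mpg_by_type <- Cars93 %>%
  group_by(Type) %>%
  summarize(across(contains("MPG"), mean, .names = "{.col}_mean"))

print(mpg_by_type)
```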

2.5 More than one grouping variable

Cars93 %>%
  group_by(Origin, AirBags) %>%
  summarize_at(vars(contains("MPG")), list(mean = mean))
## # A tibble: 6 × 4
## # Groups:   Origin [2]
##   Origin  AirBags        MPG.city_mean MPG.highway_mean
##   <fct>   <fct>                  <dbl>            <dbl>
## 1 USA     Driver & Pass…          19               27.2
## 2 USA     Driver only             20.2             27.5
## 3 USA     None                    23.1             29.6
## 4 non-USA Driver & Pass…          20.3             27  
## 5 non-USA Driver only             23.2             29.4
## 6 non-USA None                    25.9             32