Session-1 Getting Started with R
In the world of data analysis and programming, everything boils down to the application of functions to data. Understanding how to manipulate and work with different types of data is fundamental to mastering the art of programming.
1.1 Functions and Data
- Data: Data can take various forms, from simple numbers like 4 or “four” to complex structures like matrices or even mathematical expressions.
4, “four”, 4.000, \(\left[ \begin{array}{ccc} 4 & 4 & 4 \\ 4 & 4 & 4\end{array}\right]\)
- Functions: Functions are the tools we use to process data. They can be as basic as addition or as intricate as logarithms, and they follow specific rules to transform input data into output, possibly with side effects.
\(\log{}\), \(+\) (two arguments), \(<\) (two), \(\mod{}\) (two), mean (one)
A function acts like a machine, taking input objects (arguments) and producing an output object (return value), all according to a predefined rule.
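As a minimal sketch of this "machine" picture, the calls below pass arguments to a few of the functions named above and keep the return values. The input values and variable names are illustrative only.

```r
# One-argument function: natural logarithm
log_result <- log(1)             # log of 1 is 0

# Operators are functions of two arguments; backticks let us call one by name
sum_result <- `+`(2, 3)          # same as 2 + 3
less_result <- 2 < 3             # a comparison returns a Boolean
mod_result <- 7 %% 3             # remainder of 7 divided by 3

# mean() takes one argument here: a vector of numbers
mean_result <- mean(c(2, 4, 6))

print(c(log_result, sum_result, mod_result, mean_result))
print(less_result)
```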
1.2 Types of Data
As you delve deeper into programming, you’ll encounter different types of data:
- Booleans: These are direct binary values, represented as TRUE or FALSE in R.
- Integers: Whole numbers, including both positive and negative values, as well as zero.
- Characters: These are fixed-length blocks of bits with special encoding. They are the building blocks of strings, which are sequences of characters.
- Floating Point Numbers: These are numbers represented as a fraction (with a finite number of bits) times an exponent, like \(1.87 \times {10}^{6}\).
- Missing or Ill-Defined Values: Programming languages provide special values like NA and NaN to represent missing or undefined data.
Understanding the intricacies of these data types is crucial for effective programming and data analysis. So, let’s dive in and explore the world of functions and data!
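The sketch below creates one value of each kind listed above and asks R how it stores them, using the built-in typeof() function. The variable names and values are illustrative assumptions, not part of the original material.

```r
flag <- TRUE             # Boolean (R calls this type "logical")
count <- 4L              # integer; the L suffix asks for an integer
word <- "four"           # character string
big <- 1.87e6            # floating point number, 1.87 times 10^6
missing_value <- NA      # a missing value
undefined <- 0 / 0       # an ill-defined result, stored as NaN

print(typeof(flag))
print(typeof(count))
print(typeof(word))
print(typeof(big))
print(is.na(missing_value))
print(is.nan(undefined))
```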
1.3 R as calculator
R is a versatile programming language that can be used as a powerful calculator. Its ability to perform basic arithmetic operations and more advanced mathematical calculations makes it a handy tool for quick calculations and data manipulation.
You can use R as a very, very fancy calculator
Command | Description |
---|---|
+, -, *, / | add, subtract, multiply, divide |
^ | raise to the power of |
%% | remainder after division (ex: 8 %% 3 = 2) |
( ) | change the order of operations |
log(), exp() | logarithms and exponents (ex: log(10) = 2.303) |
sqrt() | square root |
round() | round to the nearest whole number (ex: round(2.3) = 2) |
floor(), ceiling() | round down or round up |
abs() | absolute value |
Here are a few examples of using R as a calculator:
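A minimal sketch of the calculator commands, typed at the R console. The inputs are illustrative choices.

```r
7 + 5            # addition
7 - 5            # subtraction
7 * 5            # multiplication
7 / 5            # division uses the forward slash: 1.4
7 ^ 5            # 7 raised to the power 5: 16807
8 %% 3           # remainder after division: 2
(7 + 5) * 2      # parentheses change the order of operations: 24
log(10)          # natural logarithm, about 2.303
exp(1)           # about 2.718
sqrt(16)         # 4
round(2.3)       # 2
floor(2.7)       # rounds down: 2
ceiling(2.3)     # rounds up: 3
abs(-4)          # 4
```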
1.3.8 Comparisons:
Comparisons are binary operators: they take two objects, such as numbers, and return a Boolean
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] FALSE
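As an illustration, the comparison operators can be tried at the console like this. The numbers are illustrative and are not necessarily the ones behind the output above.

```r
7 > 5      # greater than: TRUE
7 < 5      # less than: FALSE
7 >= 7     # greater than or equal to: TRUE
7 <= 5     # less than or equal to: FALSE
7 == 5     # equality uses a doubled equals sign: FALSE
7 != 5     # not equal: TRUE
```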
1.3.9 Boolean operators:
Basically “and” and “or”:
## [1] FALSE
## [1] TRUE
(We will see the special doubled forms, && and ||, later.)
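A small sketch of "and", "or", and "not" applied to two comparisons. The comparisons themselves are illustrative assumptions.

```r
(5 > 7) & (6 * 7 == 42)    # "and" needs both sides TRUE: FALSE
(5 > 7) | (6 * 7 == 42)    # "or" needs at least one side TRUE: TRUE
!(5 > 7)                   # "not" flips a Boolean: TRUE
```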
1.3.10 More types
- typeof() function returns the type
- is.foo() functions return Booleans for whether the argument is of type foo
- as.foo() (tries to) “cast” its argument to type foo, that is, to translate it sensibly into a foo-type value
Special case: as.factor() will be important later for telling R when numbers are actually encodings and not numeric values (e.g., 1 = High school grad; 2 = College grad; 3 = Postgrad).
## [1] "double"
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE
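The sketch below exercises typeof(), a few is.foo() and as.foo() functions, and as.factor(). The inputs and the name education are hypothetical, chosen only to illustrate the pattern.

```r
typeof(7)             # "double"
is.numeric(7)         # TRUE
is.character(7)       # FALSE
is.character("7")     # TRUE
as.character(7)       # casts the number to the string "7"
as.numeric("7")       # casts the string back to the number 7
is.na(0 / 0)          # 0 / 0 is NaN, which is.na() also flags: TRUE

# Education codes (1 = High school grad, 2 = College grad, 3 = Postgrad)
education <- as.factor(c(1, 2, 3, 2))
levels(education)     # the distinct codes, stored as labels: "1" "2" "3"
```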
1.4 Variables and Data Types
Variables are fundamental in programming, serving as containers for storing and manipulating data. R supports various data types, including numeric, character, and logical types.
We can give names to data objects; these give us variables
A few built-in variables are:
## [1] 3.142
Variables can be arguments to functions or operators, just like constants:
## [1] 31.42
## [1] -1
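A minimal sketch using the built-in variable pi, first on its own and then as an argument to an operator and to a function. These calls are an assumption about what could produce output like the above.

```r
pi           # built-in constant
pi * 10      # a variable used as an argument to an operator
cos(pi)      # a variable used as an argument to a function: -1
```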
1.4.1 Numeric
Numeric variables store numeric values like integers and decimals.
Examples:
## [1] 25
## [1] 98.6
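One way to create numeric variables like these is sketched below. The names age and temperature appear in the workspace listing later in this session, but pairing them with these values is an assumption.

```r
age <- 25              # a whole number, stored as a double by default
temperature <- 98.6    # a decimal value

age
temperature
is.numeric(age)        # TRUE
typeof(temperature)    # "double"
```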
1.5 Basic Operators in R
In R, you can use various operators to perform different types of operations, including arithmetic, comparison, and logical operations.
1.5.1 Arithmetic Operators
These operators perform basic mathematical calculations:
## [1] 8
## [1] 3
## [1] 24
## [1] 4
## [1] 8
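A sketch of the five arithmetic operators follows. The variable names match entries in the workspace listing later in this session; the operands are hypothetical, picked so that the results are 8, 3, 24, 4, and 8.

```r
result_add <- 5 + 3      # addition
result_sub <- 7 - 4      # subtraction
result_mul <- 6 * 4      # multiplication
result_div <- 12 / 3     # division
result_exp <- 2 ^ 3      # exponentiation

print(c(result_add, result_sub, result_mul, result_div, result_exp))
```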
1.6 Assignment operator
Most variables are created with the assignment operator, <- or =
## [1] 12
## [1] 30
The assignment operator also changes values:
## [1] 30
## [1] 45
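A minimal sketch of creating and then changing variables. The names come from the workspace listing later in this session; the exact statements are an assumption.

```r
time.factor <- 12                                # create with <-
time.factor

time.in.years = 2.5                              # = also assigns at top level
time.in.months <- time.in.years * time.factor    # 2.5 * 12
time.in.months

time.in.months <- 45                             # assigning again replaces the old value
time.in.months
```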
1.7 Pipe operator
The %>% operator in R is part of the magrittr package and is commonly referred to as the “pipe” operator. It chains together multiple operations or functions in a way that improves code readability and conciseness, particularly in data manipulation and transformation tasks.
Here’s how the %>% operator works:
- It takes the result of the expression on its left-hand side and passes it as the first argument to the function on its right-hand side.
- It can be used to chain together a series of operations, allowing you to perform a sequence of actions on a data frame or other objects.
- It eliminates the need for nested function calls, making code more linear and easier to understand.
# Load dplyr, which provides filter(), mutate(), select() and makes %>% available
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Chain together operations on a data frame
result <- data.frame(x = 1:10, y = 11:20) %>%
  filter(x > 5) %>%        # keep rows where x is greater than 5
  mutate(z = x + y) %>%    # add a new column z
  select(x, z)             # keep only the columns x and z
# The result contains the filtered and mutated data frame
result
## x z
## 1 6 22
## 2 7 24
## 3 8 26
## 4 9 28
## 5 10 30
In this example, the %>% operator is used to filter rows, create a new column, and select specific columns in a data frame, making the code more readable and structured. It is a valuable tool for improving the clarity of data manipulation pipelines in R.
1.8 Creating and indexing vectors
In R, a vector is a fundamental data structure that stores a collection of values of the same data type. You can create vectors using various methods and access their elements through indexing.
Examples:
Using names and variables makes code: easier to design, easier to debug, less prone to bugs, easier to improve, and easier for others to read
Avoid “magic constants” (hard-coded values); use named variables
Use descriptive variable names
- Good: num.students <- 35
- Bad: ns <- 35
1.9 The workspace
What names have you defined values for?
## [1] "age" "gender"
## [3] "githubs" "has_license"
## [5] "is_student" "name"
## [7] "result" "result_add"
## [9] "result_and" "result_div"
## [11] "result_equal" "result_exp"
## [13] "result_greater_than" "result_less_equal"
## [15] "result_mul" "result_not"
## [17] "result_not_equal" "result_or"
## [19] "result_sub" "temperature"
## [21] "time.factor" "time.in.months"
## [23] "time.in.years"
Getting rid of variables:
## [1] "age" "gender"
## [3] "githubs" "has_license"
## [5] "is_student" "name"
## [7] "result" "result_add"
## [9] "result_and" "result_div"
## [11] "result_equal" "result_exp"
## [13] "result_greater_than" "result_less_equal"
## [15] "result_mul" "result_not"
## [17] "result_not_equal" "result_or"
## [19] "result_sub" "temperature"
## [21] "time.factor" "time.in.years"
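The two listings above are the kind of output produced by ls(), before and after removing a variable with rm(). A minimal sketch, assuming only two of the variables from this session exist:

```r
time.in.years <- 2.5
time.in.months <- 30

ls()                           # names currently defined in the workspace

rm("time.in.months")           # remove one variable by name
ls()                           # time.in.months no longer appears

"time.in.months" %in% ls()     # FALSE
```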
1.10 Vectors
Group related data values into one object, a data structure
A vector is a sequence of values, all of the same type
c() function returns a vector containing all its arguments in order
1.10.1 Creating Vectors
## [1] 1 2 3 4 5
# 2. Creating a character vector
character_vector <- c("apple", "banana", "cherry")
character_vector
## [1] "apple" "banana" "cherry"
## [1] TRUE FALSE TRUE
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 0 0 0 0 0
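A sketch of several ways to create vectors. The name numeric_vector is used later in this section; logical_vector, sequence_vector, and zero_vector are hypothetical names, and the calls are an assumption about what could produce output like the above.

```r
# Combine values with c()
numeric_vector <- c(1, 2, 3, 4, 5)
numeric_vector

# A vector of Booleans
logical_vector <- c(TRUE, FALSE, TRUE)
logical_vector

# The colon operator builds a sequence of consecutive integers
sequence_vector <- 1:10
sequence_vector

# rep() repeats a value; here, five zeros
zero_vector <- rep(0, 5)
zero_vector
```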
1.10.2 Indexing vectors
## [1] 1
## [1] 2 4
# 8. Indexing using logical condition
filtered_vector <- numeric_vector[numeric_vector > 3]
filtered_vector
## [1] 4 5
# 9. Named vector elements
names(numeric_vector) <- c("one", "two", "three", "four", "five")
numeric_vector
## one two three four five
## 1 2 3 4 5
## two
## 2
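The indexing forms in this subsection can be sketched together as follows. The specific indices are an assumption.

```r
numeric_vector <- c(1, 2, 3, 4, 5)

numeric_vector[1]                     # a single position; R counts from 1
numeric_vector[c(2, 4)]               # several positions at once
numeric_vector[numeric_vector > 3]    # a logical condition keeps matching elements

# After naming the elements, a name can be used as an index
names(numeric_vector) <- c("one", "two", "three", "four", "five")
numeric_vector["two"]
```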
1.10.3 Vector arithmetic
Operators apply to vectors “pairwise” or “elementwise”:
students <- c("SaiKumar", "Aditi", "Akshay", "Arun", "Deepika")
final <- c(87, 45, 98, 80, 75) # Final exam scores
midterm <- c(25, 28, 26, 28, 25) # Midterm exam scores
midterm + final # Sum of midterm and final scores
## [1] 112 73 124 108 100
## [1] 56.0 36.5 62.0 54.0 50.0
## [1] 62.2 38.2 69.2 59.2 55.0
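The last two lines of output are consistent with an elementwise average and a weighted grade of 40% midterm and 60% final. The sketch below makes that assumption explicit; the name course.grades is hypothetical.

```r
final <- c(87, 45, 98, 80, 75)
midterm <- c(25, 28, 26, 28, 25)

# Elementwise average of the two scores
(midterm + final) / 2

# Weighted grade; a single number such as 0.4 is applied to every element
course.grades <- 0.4 * midterm + 0.6 * final
course.grades
```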
1.10.4 Pairwise comparisons
Is the final score higher than the midterm score?
## [1] 25 28 26 28 25
## [1] 87 45 98 80 75
## [1] TRUE TRUE TRUE TRUE TRUE
Boolean operators can be applied elementwise:
## [1] FALSE FALSE FALSE FALSE FALSE
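A sketch of elementwise comparison and an elementwise Boolean operator on the exam scores. The threshold of 30 in the second expression is an assumption chosen for illustration.

```r
final <- c(87, 45, 98, 80, 75)
midterm <- c(25, 28, 26, 28, 25)

# Compare each student's final with that student's midterm
final > midterm

# & combines two Boolean vectors element by element;
# no midterm score exceeds 30, so every element is FALSE
(final > midterm) & (midterm > 30)
```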
1.10.5 Functions on vectors
Command | Description |
---|---|
sum(vec) | sums up all the elements of vec |
mean(vec) | mean of vec |
median(vec) | median of vec |
min(vec), max(vec) | the smallest or largest element of vec |
sd(vec), var(vec) | the standard deviation and variance of vec |
length(vec) | the number of elements in vec |
pmax(vec1, vec2), pmin(vec1, vec2) | elementwise maximum and minimum (example: pmax(quiz1, quiz2) returns the higher of quiz 1 and quiz 2 for each student) |
sort(vec) | returns vec in sorted order |
order(vec) | returns the indices that sort vec |
unique(vec) | lists the unique elements of vec |
summary(vec) | gives a five-number summary plus the mean |
any(vec), all(vec) | useful on Boolean vectors |
1.10.6 Functions on vectors
## [1] 62.2 38.2 69.2 59.2 55.0
## [1] 56.76
## [1] 59.2
## [1] 11.6
## [1] 38.2 55.0 59.2 62.2 69.2
## [1] 69.2
## [1] 38.2
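A minimal sketch applying several of these functions to the grade vector printed above. The name course.grades is hypothetical.

```r
course.grades <- c(62.2, 38.2, 69.2, 59.2, 55.0)

mean(course.grades)      # 56.76
median(course.grades)    # 59.2
sd(course.grades)        # about 11.6
sort(course.grades)      # ascending order
max(course.grades)       # 69.2
min(course.grades)       # 38.2
```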
1.10.7 Referencing elements of vectors
## [1] "SaiKumar" "Aditi" "Akshay" "Arun"
## [5] "Deepika"
Vector of indices:
## [1] "Aditi" "Arun"
Vector of negative indices: excludes the elements at the specified indices
## [1] "Aditi" "Arun" "Deepika"
which() returns the indices of the TRUE elements of a Boolean vector:
## [1] 62.2 38.2 69.2 59.2 55.0
## [1] FALSE FALSE FALSE FALSE FALSE
## integer(0)
## character(0)
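The referencing forms above can be sketched as follows. The name course.grades and the thresholds 90 and 60 are assumptions for illustration.

```r
students <- c("SaiKumar", "Aditi", "Akshay", "Arun", "Deepika")
course.grades <- c(62.2, 38.2, 69.2, 59.2, 55.0)

students[c(2, 4)]       # positive indices select elements
students[c(-1, -3)]     # negative indices drop elements

# No grade is above 90, so which() finds no TRUE positions
high <- which(course.grades > 90)
high                    # integer(0)
students[high]          # character(0)

# With a lower threshold, which() returns the matching positions
which(course.grades > 60)
```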
1.10.8 Named components
You can give names to elements or components of vectors
## [1] "SaiKumar" "Aditi" "Akshay" "Arun"
## [5] "Deepika"
## [1] "SaiKumar" "Aditi" "Akshay" "Arun"
## [5] "Deepika"
## Aditi Akshay Arun <NA>
## 38.2 69.2 59.2 NA
Note the labels in what R prints; these are not actually part of the value
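A sketch of naming the grade vector by student and then indexing by name. The name course.grades and the student "Maria", who is not in the class list, are hypothetical.

```r
students <- c("SaiKumar", "Aditi", "Akshay", "Arun", "Deepika")
course.grades <- c(62.2, 38.2, 69.2, 59.2, 55.0)

names(course.grades) <- students    # attach a name to each element
names(course.grades)

# Indexing by name; a name that does not exist yields NA labelled <NA>
picked <- course.grades[c("Aditi", "Akshay", "Arun", "Maria")]
picked

# The names are labels only; unname() shows the underlying values
unname(picked)
```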