If you have finished Weeks 1-5 of EDUSHARK, you can already analyse data with Excel and SQL, and you understand the statistics behind it. This week, we add the third leg of the analyst's tripod: a real programming language built for data. R is free, open-source, used in pharma, banks and research labs everywhere, and it pairs with R Shiny - the dashboarding tool we will build in Weeks 7-8.
This guide is written for absolute beginners. No prior programming required - we start with arithmetic and end with a complete data-wrangling pipeline.
What You'll Learn This Week
R, RStudio and Hello World
Set up the tool you will use for the next three weeks
Installing R and RStudio
R is the engine; RStudio is the comfortable cockpit you drive it from. Download both (both are free):
- Go to cran.r-project.org, click your operating system, install R.
- Go to posit.co/download/rstudio-desktop and install the free RStudio Desktop.
- Open RStudio. You will see four panes:
  - Script (top-left): where you write code you want to keep.
  - Console (bottom-left): where R executes commands.
  - Environment / History (top-right): what variables you have created.
  - Files / Plots / Help (bottom-right): file browser, charts, documentation.
Your first R commands
<- is R's preferred assignment operator. The keyboard shortcut in RStudio is Alt + - (Option + - on macOS). Use it consistently: = also works for assignment, but by convention it is reserved for function arguments.
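As a minimal sketch of a first session (the variable names here are illustrative, not taken from the course files), the commands below cover arithmetic, assignment with <- and the use of = for a function argument:

```r
# Arithmetic works directly at the console
2 + 3 * 4    # 14: multiplication binds tighter than addition
10 / 4       # 2.5
2 ^ 10       # 1024

# Assignment with <- (illustrative variable names)
price    <- 19.99
quantity <- 3
total    <- price * quantity

# = is used here for a function argument, not for assignment
round(total, digits = 1)

# The classic first program
greeting <- "Hello, World!"
print(greeting)
```

Type each line in the Console to see the result immediately, or write them in the Script pane and run them with Ctrl + Enter.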
Data Types and Structures
The five shapes of R data you'll see every day
The atomic types
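R recognises a handful of basic types: numeric (real numbers, stored as double-precision values), integer, character (text), logical (TRUE/FALSE), and a couple of rarely used ones (complex, raw). Use typeof(x) or class(x) to check; for a plain number, typeof() reports "double" while class() reports "numeric".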
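A short sketch of how these types look in practice (the variable names are made up for illustration):

```r
x_num <- 3.14      # numeric, stored as a double
x_int <- 42L       # the L suffix asks for an integer
x_chr <- "shark"   # character (text)
x_lgl <- TRUE      # logical

typeof(x_num)   # "double"
class(x_num)    # "numeric"
typeof(x_int)   # "integer"
typeof(x_chr)   # "character"
typeof(x_lgl)   # "logical"
```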
Vectors - R's workhorse
A vector is an ordered collection of values of the same type. Almost everything in R is a vector under the hood.
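The sketch below uses a made-up vector of scores to show creation with c(), vectorised arithmetic, indexing and what happens when types are mixed:

```r
# Illustrative data: four exam scores
scores <- c(72, 85, 90, 64)

scores * 1.1          # arithmetic applies to every element at once
scores[2]             # 85: indexing starts at 1, not 0
scores[scores > 80]   # 85 90: keep elements that meet a condition
mean(scores)          # 77.75
length(scores)        # 4

# Mixing types forces everything to the most general type
mixed <- c(1, "two", TRUE)
typeof(mixed)         # "character"
```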
Data frames - the table
A data frame is a table: rows are observations, columns are variables. This is what you read from CSV files and what most analyses revolve around.
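A minimal sketch with a hand-built table (the column names and values are invented for illustration) shows the usual ways to inspect and subset a data frame:

```r
# Illustrative data: three orders
orders <- data.frame(
  order_id = c(101, 102, 103),
  customer = c("Asha", "Ben", "Chen"),
  amount   = c(250.0, 99.5, 410.0)
)

str(orders)       # structure: column names and types
nrow(orders)      # 3 observations
ncol(orders)      # 3 variables
orders$amount     # one column, returned as a vector

# Rows where amount exceeds 100, all columns
big_orders <- orders[orders$amount > 100, ]
big_orders
```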
Lists and factors
A list can hold elements of different types and lengths - think of it as the JSON of R. A factor stores a categorical variable with a fixed set of levels, and optionally an order - always use it for ordered categories like ratings or T-shirt sizes.
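To make the distinction concrete, here is a small sketch with an invented customer record and a set of T-shirt sizes:

```r
# A list mixes types and lengths (illustrative record)
customer <- list(
  name   = "Asha",
  age    = 31,
  orders = c(101, 104, 110)
)
customer$name          # access an element by name
customer[["orders"]]   # same idea with double brackets
length(customer)       # 3 elements

# An ordered factor: the levels argument defines the order
sizes <- factor(
  c("M", "S", "L", "M"),
  levels  = c("S", "M", "L"),
  ordered = TRUE
)
levels(sizes)   # "S" "M" "L"
sizes < "L"     # TRUE TRUE FALSE TRUE
table(sizes)    # counts per level, in level order
sort(sizes)     # sorted by level order, not alphabetically
```

Without ordered = TRUE and explicit levels, R would sort the sizes alphabetically (L, M, S), which is rarely what you want.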
Control Flow and Functions
Make your code reusable in 90 seconds
If you catch yourself writing a for loop to transform data, there is almost always a dplyr or apply version that is shorter, cleaner and often far faster. Loops are for control flow, not for data manipulation.
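The sketch below defines a small function (grade() and its pass_mark argument are hypothetical names chosen for this example) and then compares a loop with two loop-free alternatives:

```r
# A hypothetical helper: classify one score
grade <- function(score, pass_mark = 50) {
  if (score >= pass_mark) "pass" else "fail"
}
grade(72)   # "pass"
grade(40)   # "fail"

scores <- c(72, 40, 55)

# Loop version: works, but verbose
results_loop <- character(length(scores))
for (i in seq_along(scores)) {
  results_loop[i] <- grade(scores[i])
}

# Vectorised version: one line, no loop
results_vec <- ifelse(scores >= 50, "pass", "fail")

# apply-family version: reuse the function on every element
results_apply <- vapply(scores, grade, character(1))

identical(results_loop, results_vec)   # TRUE
```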
The Tidyverse and the Pipe
One philosophy, eight packages, infinite analyses
The tidyverse is a collection of R packages that share a consistent philosophy. Install it once with install.packages("tidyverse"), then load it at the top of each script with library(tidyverse).
The pipe (%>%) takes the result on its left and passes it as the first argument of the function on its right. It turns nested function calls into a readable, left-to-right recipe.
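A minimal sketch of the same calculation written nested and piped, assuming dplyr is installed (it makes %>% available when loaded):

```r
library(dplyr)   # loading dplyr makes %>% available

# Nested: read from the inside out
nested <- round(mean(sqrt(c(4, 9, 16))), 1)

# Piped: read from left to right
piped <- c(4, 9, 16) %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)   # TRUE
```

Recent versions of R (4.1 and later) also ship a native pipe, |>, which behaves the same way for simple chains like this one.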
Reading data
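As a sketch of reading a CSV, assuming the readr package from the tidyverse is the reader in use, the example below writes a tiny file to a temporary location first so that it runs anywhere; in practice you would point read_csv() at your own file:

```r
library(readr)

# Create a small CSV in a temporary location (illustrative data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("order_id,amount", "101,250", "102,99.5"), csv_path)

# Read it back as a tibble; column types are guessed from the data
orders <- read_csv(csv_path, show_col_types = FALSE)
orders
```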
Data Manipulation with dplyr
Five verbs that run half your job
| Verb | What it does |
|---|---|
| select() | Keep or drop specific columns |
| filter() | Keep rows that match a condition |
| mutate() | Create or modify columns |
| arrange() | Sort rows ascending/descending |
| summarise() | Collapse a group into one row |
| group_by() | Set the grouping for subsequent verbs |
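The verbs in the table chain together naturally. The sketch below uses an invented sales table to show all six in one pipeline:

```r
library(dplyr)

# Illustrative data: five sales records
sales <- tibble(
  region = c("North", "North", "South", "South", "South"),
  units  = c(10, 4, 7, 12, 3),
  price  = c(5, 5, 6, 6, 6)
)

summary_tbl <- sales %>%
  select(region, units, price) %>%      # keep the columns we need
  filter(units >= 4) %>%                # drop small orders
  mutate(revenue = units * price) %>%   # add a computed column
  group_by(region) %>%                  # one group per region
  summarise(total_revenue = sum(revenue), .groups = "drop") %>%
  arrange(desc(total_revenue))          # largest first

summary_tbl
```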
Reshaping with tidyr and Joins
Long, wide and joined
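A small sketch of moving between wide and long layouts with tidyr's pivot_longer() and pivot_wider(), using an invented revenue table:

```r
library(dplyr)
library(tidyr)

# Wide: one column per month (illustrative data)
wide <- tibble(
  category = c("Toys", "Books"),
  jan      = c(100, 80),
  feb      = c(120, 90)
)

# Wide to long: one row per category-month pair
long <- wide %>%
  pivot_longer(cols = c(jan, feb), names_to = "month", values_to = "revenue")

# Long back to wide
back <- long %>%
  pivot_wider(names_from = month, values_from = revenue)

nrow(long)   # 4
```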
Joins
The dplyr joins map directly to SQL: inner_join(), left_join(), right_join(), full_join(), plus the useful filtering joins semi_join() (keep rows of x that have a match in y) and anti_join() (keep rows of x that have no match in y).
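To see how the joins differ, the sketch below runs each one on two invented tables in which one customer has two orders, two customers have none, and one order has no matching customer:

```r
library(dplyr)

# Illustrative data
customers <- tibble(
  customer_id = c(1, 2, 3),
  name        = c("Asha", "Ben", "Chen")
)
orders <- tibble(
  order_id    = c(101, 102, 103),
  customer_id = c(1, 1, 4),
  amount      = c(250, 99.5, 410)
)

inner <- inner_join(customers, orders, by = "customer_id")   # matches only
left  <- left_join(customers, orders, by = "customer_id")    # all customers
right <- right_join(customers, orders, by = "customer_id")   # all orders
full  <- full_join(customers, orders, by = "customer_id")    # everything

# Filtering joins return rows of x only, with no columns from y
semi <- semi_join(customers, orders, by = "customer_id")   # customers with orders
anti <- anti_join(customers, orders, by = "customer_id")   # customers without orders

c(inner = nrow(inner), left = nrow(left), right = nrow(right),
  full = nrow(full), semi = nrow(semi), anti = nrow(anti))
```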
Project: Data Wrangling Pipeline
Build a complete pipeline using everything from this week
Build a single R script (wrangle.R) that:
- Reads four CSV files (orders, items, customers, products) from /data/week6/.
- Joins them into one tidy table.
- Cleans column names with janitor::clean_names().
- Handles missing prices by imputing the category median.
- Pivots to a monthly category-revenue table (one row per category, one column per month).
- Writes the final tidy file to data/analysis_ready.csv.
A complete starter script is provided in /data/week6-r-wrangling-starter.R. Download it, walk through it line by line, and adapt it to your dataset.
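The two least obvious steps, the median imputation and the monthly pivot, can be sketched on toy data as follows. This is not the starter script: the items table and its column names (order_month, category, quantity, price) are assumptions made for illustration, standing in for the already joined and cleaned table.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the joined, cleaned table
items <- tibble(
  order_month = c("2024-01", "2024-01", "2024-02", "2024-02", "2024-02"),
  category    = c("Toys", "Toys", "Toys", "Books", "Books"),
  quantity    = c(1, 2, 1, 3, 1),
  price       = c(10, NA, 14, 8, NA)
)

monthly <- items %>%
  # Replace each missing price with the median price of its category
  group_by(category) %>%
  mutate(price = if_else(is.na(price), median(price, na.rm = TRUE), price)) %>%
  ungroup() %>%
  mutate(revenue = quantity * price) %>%
  # Total revenue per category and month
  group_by(category, order_month) %>%
  summarise(revenue = sum(revenue), .groups = "drop") %>%
  # One row per category, one column per month; empty cells become 0
  pivot_wider(
    names_from  = order_month,
    values_from = revenue,
    values_fill = 0,
    names_sort  = TRUE
  )

monthly
```

In the real script, the result would then be written out with a function such as readr::write_csv(). Grouping before mutate() is what makes median() work per category rather than over the whole table.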