Day 1

Subqueries & Nested Queries

Queries Within Queries - The Foundation of Complex Analysis

What is a Subquery?

Subquery: A subquery (also called an inner query or nested query) is a SQL query embedded inside another query. The inner query executes first, and its result is used by the outer query.

Think of a subquery as asking a question to get an answer, then using that answer to ask another question. For example: "What's the average salary?" (subquery) then "Show me all employees who earn more than that average" (outer query).

Why Use Subqueries?

Dynamic filtering: Filter based on calculated values, not hardcoded numbers
Multi-step analysis: Break complex problems into logical steps
Data comparison: Compare rows against aggregated results
Existence checks: Verify if related records exist

Types of Subqueries

Type	Returns	Used With	Example Use Case
Scalar	Single value	=, >, <, >=, <=	Find products priced above average
Column	List of values	IN, NOT IN, ANY, ALL	Find customers who ordered specific products
Table	Result set (rows & columns)	FROM clause, JOINs	Join with aggregated summary data

Scalar Subqueries

A scalar subquery returns exactly one value (one row, one column). It can be used anywhere a single value is expected.

Example: Find Products Above Average Price

-- Find all products priced above the average
SELECT product_name, price
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);

-- The subquery calculates: AVG(price) = 250.00
-- The outer query then finds: WHERE price > 250.00

The subquery (SELECT AVG(price) FROM products) executes first, returning a single number. The outer query then uses this value to filter the results.

Example: Compare to Department Average

-- Find employees earning more than their company's average salary
SELECT employee_name, salary, department
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
)
ORDER BY salary DESC;

Column Subqueries (List of Values)

A column subquery returns multiple values (multiple rows, one column). Use with IN, NOT IN, ANY, or ALL operators.

Example: Find Customers Who Made Purchases

-- Find customers who have placed at least one order
SELECT customer_name, email, city
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
);

-- The subquery returns: (1, 3, 5, 7, 12, ...)
-- Outer query finds customers with those IDs

Example: Find Products Never Ordered

-- Find products that have never been ordered
SELECT product_id, product_name, price
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
WHERE product_id IS NOT NULL
);

Watch Out: NULL Values with NOT IN

If the subquery returns any NULL values, NOT IN will return no results! Always add WHERE column IS NOT NULL in your subquery when using NOT IN.

Table Subqueries (Derived Tables)

A table subquery returns a complete result set that can be used in the FROM clause like a regular table. It must have an alias.

Example: Join with Aggregated Data

-- Compare each product's price to its category average
SELECT
p.product_name,
p.category,
p.price,
cat_avg.avg_price AS category_avg,
p.price - cat_avg.avg_price AS difference
FROM products p
JOIN (
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
) AS cat_avg ON p.category = cat_avg.category
ORDER BY difference DESC;

Subqueries in Different Clauses

In the SELECT Clause

-- Show each product with the total count of products in its category
SELECT
product_name,
category,
price,
(SELECT COUNT(*)
FROM products p2
WHERE p2.category = p1.category) AS products_in_category
FROM products p1;

In the FROM Clause

-- Get statistics from aggregated data
SELECT
AVG(order_total) AS avg_order_value,
MAX(order_total) AS max_order_value,
MIN(order_total) AS min_order_value
FROM (
SELECT order_id, SUM(quantity * unit_price) AS order_total
FROM order_items
GROUP BY order_id
) AS order_totals;

In the WHERE Clause

-- Find orders placed on the busiest day
SELECT *
FROM orders
WHERE DATE(order_date) = (
SELECT DATE(order_date)
FROM orders
GROUP BY DATE(order_date)
ORDER BY COUNT(*) DESC
LIMIT 1
);

EXISTS and NOT EXISTS

EXISTS checks if a subquery returns any rows. It's often more efficient than IN for large datasets.

Example: Customers With Orders (Using EXISTS)

-- Find customers who have placed orders (efficient method)
SELECT customer_name, email
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);

-- EXISTS stops as soon as it finds one matching row
-- More efficient than IN for large tables

Example: Customers Without Orders (Using NOT EXISTS)

-- Find customers who have never placed an order
SELECT customer_name, email, signup_date
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);

                        EXISTS vs IN: When to Use Which?
                        EXISTS: Better for large subquery results, stops at first match
IN: Better for small, static lists; easier to read
NOT EXISTS: Handles NULLs correctly, preferred over NOT IN

                    

Day 1 Key Takeaways

Subqueries let you use the result of one query inside another
Scalar subqueries return one value; column subqueries return a list
Table subqueries (derived tables) return full result sets
Use EXISTS for efficient existence checks on large tables
Be careful with NULL values when using NOT IN

Day 2

Correlated Subqueries

Row-by-Row Processing for Advanced Comparisons

What is a Correlated Subquery?

Correlated Subquery: A subquery that references columns from the outer query. Unlike regular subqueries that execute once, correlated subqueries execute once for each row in the outer query.

Think of it this way: A regular subquery answers a question once (e.g., "What's the average salary?"). A correlated subquery answers a question for each row (e.g., "For THIS employee, what's the average salary in THEIR department?").

Regular vs Correlated Subquery

Aspect	Regular Subquery	Correlated Subquery
Execution	Runs once, result cached	Runs once per outer row
References outer query?	No	Yes
Performance	Generally faster	Can be slower on large tables
Use case	Compare to a fixed value	Row-specific comparisons

Side-by-Side Comparison

-- REGULAR SUBQUERY: Compare to company average
SELECT employee_name, department, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees  -- Runs ONCE
);

-- CORRELATED SUBQUERY: Compare to department average
SELECT employee_name, department, salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department  -- References outer query!
);

How Correlated Subqueries Work

For each row in the outer query:

The outer query passes the current row's values to the subquery
The subquery executes using those values
The result is used to evaluate the current row
Move to the next row and repeat

Example: Employees Above Department Average

-- Find employees who earn more than their department's average
SELECT
employee_name,
department,
salary,
(SELECT ROUND(AVG(salary), 2)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department
)
ORDER BY department, salary DESC;

Result:

employee_name	department	salary	dept_avg
Priya	Engineering	95,000	78,000
Rahul	Engineering	88,000	78,000
Amit	Sales	72,000	62,000

Finding Latest/First Records Per Group

A common use case for correlated subqueries is finding the most recent (or first) record for each group.

Example: Latest Order for Each Customer

-- Get the most recent order for each customer
SELECT
c.customer_name,
o.order_id,
o.order_date,
o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (
SELECT MAX(order_date)
FROM orders o2
WHERE o2.customer_id = c.customer_id
)
ORDER BY o.order_date DESC;

Example: Top Product in Each Category

-- Find the highest-priced product in each category
SELECT product_name, category, price
FROM products p1
WHERE price = (
SELECT MAX(price)
FROM products p2
WHERE p2.category = p1.category
)
ORDER BY category;

Using ANY and ALL

ANY and ALL operators compare a value against a set of values returned by a subquery.

Operator	Meaning	Example
= ANY	Equal to at least one value (same as IN)	salary = ANY (subquery)
> ANY	Greater than the minimum value	salary > ANY (subquery)
> ALL	Greater than the maximum value	salary > ALL (subquery)
< ALL	Less than the minimum value	price < ALL (subquery)

Example: Using ALL

-- Find products more expensive than ALL products in 'Accessories' category
SELECT product_name, category, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Accessories'
);

-- This is equivalent to:
-- WHERE price > (SELECT MAX(price) FROM products WHERE category = 'Accessories')

Performance Considerations

Correlated Subqueries Can Be Slow

Since correlated subqueries execute once per row, they can be slow on large tables. For a table with 1 million rows, the subquery runs 1 million times!

When to Use Alternatives

Window functions: Often faster for ranking and comparing to group aggregates
JOINs with derived tables: Calculate aggregates once, then join
CTEs: More readable alternative that may optimize better

We'll learn these alternatives in the coming days!

Rewriting with a JOIN (Often Faster)

-- Instead of correlated subquery:
SELECT e1.employee_name, e1.department, e1.salary
FROM employees e1
JOIN (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept_avgs ON e1.department = dept_avgs.department
WHERE e1.salary > dept_avgs.avg_salary;

-- This calculates department averages ONCE, then joins

Day 2 Key Takeaways

Correlated subqueries reference the outer query and run for each row
Use them for row-specific comparisons (e.g., compare to group average)
Great for finding latest/first records per group
ANY and ALL operators compare against sets of values
Consider JOINs or window functions for better performance

Day 3

Window Functions - Part 1

Ranking Functions & The Power of OVER()

What are Window Functions?

Window Function: A function that performs calculations across a set of rows related to the current row (the "window"), without collapsing them into a single output row like GROUP BY does.

Window functions are one of the most powerful features in SQL. They let you perform calculations across rows while keeping all individual rows in your result. This is perfect for rankings, running totals, and comparisons.

Window Functions vs GROUP BY

Aspect	GROUP BY	Window Functions
Rows returned	One row per group	All original rows
Access to detail	Loses individual row data	Keeps all row data
Use case	Summaries, totals	Rankings, running totals, comparisons

GROUP BY vs Window Function Comparison

-- GROUP BY: Returns 3 rows (one per department)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

-- Result:
-- Engineering | 78000
-- Sales       | 62000
-- Marketing   | 55000

-- WINDOW FUNCTION: Returns ALL rows with avg added
SELECT
employee_name,
department,
salary,
AVG(salary) OVER(PARTITION BY department) AS dept_avg
FROM employees;

-- Result:
-- Rahul  | Engineering | 88000 | 78000
-- Priya  | Engineering | 95000 | 78000
-- Vikram | Engineering | 51000 | 78000
-- Amit   | Sales       | 72000 | 62000
-- ...

The OVER() Clause

The OVER() clause defines the "window" - which rows the function should consider for its calculation.

Syntax:
function_name() OVER(
[PARTITION BY column1, column2, ...]
[ORDER BY column3, column4, ...]
)

PARTITION BY: Divides rows into groups (like GROUP BY, but keeps all rows)
ORDER BY: Determines the order of rows within each partition

Ranking Functions

Ranking functions assign a rank to each row based on specified ordering.

Function	Behavior	Ties	Example Output
ROW_NUMBER()	Unique sequential number	No gaps, arbitrary tie-breaking	1, 2, 3, 4, 5
RANK()	Rank with gaps	Same rank for ties, skips next	1, 2, 2, 4, 5
DENSE_RANK()	Rank without gaps	Same rank for ties, no skip	1, 2, 2, 3, 4
NTILE(n)	Divides into n equal groups	N/A	1, 1, 2, 2, 3, 3

Example: Comparing Ranking Functions

SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER(ORDER BY salary DESC) AS row_num,
RANK() OVER(ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER(ORDER BY salary DESC) AS dense_rank
FROM employees;

Result (notice how ties are handled):

employee_name	salary	row_num	rank	dense_rank
Priya	95,000	1	1	1
Rahul	88,000	2	2	2
Amit	88,000	3	2	2
Sneha	75,000	4	4	3
Vikram	65,000	5	5	4

PARTITION BY: Ranking Within Groups

Use PARTITION BY to reset rankings for each group.

Example: Rank Employees Within Each Department

SELECT
employee_name,
department,
salary,
RANK() OVER(
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees
ORDER BY department, dept_rank;

Result:

employee_name	department	salary	dept_rank
Priya	Engineering	95,000	1
Rahul	Engineering	88,000	2
Vikram	Engineering	51,000	3
Amit	Sales	72,000	1
Sneha	Sales	65,000	2

NTILE: Creating Percentile Groups

NTILE divides rows into a specified number of roughly equal groups (quartiles, deciles, etc.).

Example: Divide Customers into Quartiles by Purchase Value

SELECT
customer_name,
total_purchases,
NTILE(4) OVER(ORDER BY total_purchases DESC) AS quartile,
CASE NTILE(4) OVER(ORDER BY total_purchases DESC)
WHEN 1 THEN 'Premium'
WHEN 2 THEN 'Gold'
WHEN 3 THEN 'Silver'
ELSE 'Bronze'
END AS customer_tier
FROM customer_summary
ORDER BY total_purchases DESC;

Practical Example: Top 3 Products Per Category

Finding Top N Per Group

-- Get top 3 selling products in each category
SELECT *
FROM (
SELECT
product_name,
category,
total_sales,
ROW_NUMBER() OVER(
PARTITION BY category
ORDER BY total_sales DESC
) AS sales_rank
FROM products
) ranked
WHERE sales_rank <= 3
ORDER BY category, sales_rank;

Why Use a Subquery?

You cannot use window functions directly in a WHERE clause. The window function runs after WHERE. So we wrap it in a subquery to filter on the calculated rank.

Day 3 Key Takeaways

Window functions perform calculations across rows without collapsing them
OVER() defines the window; PARTITION BY creates groups; ORDER BY determines order
ROW_NUMBER gives unique numbers; RANK and DENSE_RANK handle ties differently
NTILE divides rows into equal groups (great for percentiles)
To filter on window function results, use a subquery

Day 4

Window Functions - Part 2

Aggregate Functions, Running Totals & Row Navigation

Aggregate Window Functions

You can use familiar aggregate functions (SUM, AVG, COUNT, MIN, MAX) as window functions to calculate aggregates while keeping all rows.

Example: Compare Each Sale to Department Average

SELECT
employee_name,
department,
sale_amount,
AVG(sale_amount) OVER(PARTITION BY department) AS dept_avg,
sale_amount - AVG(sale_amount) OVER(PARTITION BY department) AS diff_from_avg,
ROUND(sale_amount * 100.0 / SUM(sale_amount) OVER(PARTITION BY department), 1) AS pct_of_dept
FROM sales;

Running Totals (Cumulative Sum)

Adding ORDER BY to an aggregate window function creates a running (cumulative) calculation.

Example: Running Total of Daily Sales

SELECT
sale_date,
daily_sales,
SUM(daily_sales) OVER(ORDER BY sale_date) AS running_total,
AVG(daily_sales) OVER(ORDER BY sale_date) AS running_avg
FROM daily_sales
ORDER BY sale_date;

Result:

sale_date	daily_sales	running_total	running_avg
2024-01-01	1,500	1,500	1,500.00
2024-01-02	2,200	3,700	1,850.00
2024-01-03	1,800	5,500	1,833.33
2024-01-04	3,100	8,600	2,150.00

LAG and LEAD: Accessing Adjacent Rows

LAG looks at previous rows; LEAD looks at following rows. Essential for period-over-period comparisons!

Syntax:
LAG(column, offset, default) OVER(ORDER BY ...)
LEAD(column, offset, default) OVER(ORDER BY ...)

- offset: How many rows back/forward (default: 1)
- default: Value if no row exists (default: NULL)

Example: Month-over-Month Sales Growth

SELECT
month,
monthly_sales,
LAG(monthly_sales, 1) OVER(ORDER BY month) AS prev_month_sales,
monthly_sales - LAG(monthly_sales, 1) OVER(ORDER BY month) AS growth,
ROUND(
(monthly_sales - LAG(monthly_sales, 1) OVER(ORDER BY month)) * 100.0
/ LAG(monthly_sales, 1) OVER(ORDER BY month), 1
) AS growth_pct
FROM monthly_sales;

Result:

month	monthly_sales	prev_month	growth	growth_pct
2024-01	45,000	NULL	NULL	NULL
2024-02	52,000	45,000	7,000	15.6%
2024-03	48,000	52,000	-4,000	-7.7%
2024-04	61,000	48,000	13,000	27.1%

Example: Year-over-Year Comparison

SELECT
year,
quarter,
revenue,
LAG(revenue, 4) OVER(ORDER BY year, quarter) AS same_quarter_last_year,
ROUND(
(revenue - LAG(revenue, 4) OVER(ORDER BY year, quarter)) * 100.0
/ LAG(revenue, 4) OVER(ORDER BY year, quarter), 1
) AS yoy_growth_pct
FROM quarterly_revenue;

FIRST_VALUE and LAST_VALUE

Get the first or last value in a window partition.

Example: Compare Each Sale to First and Best Sale

SELECT
sale_date,
product_name,
sale_amount,
FIRST_VALUE(sale_amount) OVER(
PARTITION BY product_name
ORDER BY sale_date
) AS first_sale,
MAX(sale_amount) OVER(
PARTITION BY product_name
) AS best_sale,
sale_amount - FIRST_VALUE(sale_amount) OVER(
PARTITION BY product_name
ORDER BY sale_date
) AS diff_from_first
FROM sales;

Frame Specifications: ROWS vs RANGE

Frame specifications let you define exactly which rows to include in the window calculation.

Syntax:
ROWS BETWEEN start AND end

Options for start/end:
- UNBOUNDED PRECEDING (all rows from beginning)
- n PRECEDING (n rows before current)
- CURRENT ROW
- n FOLLOWING (n rows after current)
- UNBOUNDED FOLLOWING (all rows to end)

Example: 7-Day Moving Average

SELECT
sale_date,
daily_sales,
ROUND(AVG(daily_sales) OVER(
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;

Example: Running Total vs Full Total

SELECT
sale_date,
daily_sales,
-- Running total (default frame when ORDER BY is present)
SUM(daily_sales) OVER(
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
-- Total of all rows
SUM(daily_sales) OVER() AS grand_total,
-- Percentage of total
ROUND(daily_sales * 100.0 / SUM(daily_sales) OVER(), 1) AS pct_of_total
FROM daily_sales;

                        Common Window Frame Patterns
                        Running total: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Moving average: ROWS BETWEEN n PRECEDING AND CURRENT ROW
Centered average: ROWS BETWEEN n PRECEDING AND n FOLLOWING
Entire partition: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

                    

Day 4 Key Takeaways

Aggregate functions with OVER() calculate across windows while keeping rows
Adding ORDER BY creates running/cumulative calculations
LAG looks back; LEAD looks forward - essential for period comparisons
Frame specifications (ROWS BETWEEN) control exactly which rows to include
Moving averages smooth out fluctuations in time-series data

Day 5

Common Table Expressions (CTEs)

Write Readable, Maintainable, and Reusable Queries

What is a CTE?

Common Table Expression (CTE): A temporary named result set defined within a query using the WITH clause. It exists only for the duration of that query and makes complex queries more readable.

Think of CTEs as giving a name to a subquery that you can reference later. Instead of nesting queries inside queries (hard to read), you define named "building blocks" at the top, then use them below.

Basic Syntax:
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;

CTE vs Subquery: Readability Comparison

Nested Subquery (Hard to Read)

SELECT
department,
avg_salary,
company_avg,
avg_salary - company_avg AS diff
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept_salaries
CROSS JOIN (
SELECT AVG(salary) AS company_avg
FROM employees
) company;

Same Query with CTEs (Easy to Read)

WITH dept_salaries AS (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
),
company_avg AS (
SELECT AVG(salary) AS avg_salary
FROM employees
)
SELECT
d.department,
d.avg_salary,
c.avg_salary AS company_avg,
d.avg_salary - c.avg_salary AS diff
FROM dept_salaries d
CROSS JOIN company_avg c;

Benefits of CTEs

Readability: Name your logic, making queries self-documenting
Reusability: Reference the same CTE multiple times in one query
Maintainability: Easier to modify and debug
Recursion: CTEs support recursive queries (hierarchies)

Multiple CTEs

You can define multiple CTEs separated by commas, and later CTEs can reference earlier ones.

Example: Multi-Step Sales Analysis

WITH
-- Step 1: Calculate monthly sales
monthly_sales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(total_amount) AS revenue
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
-- Step 2: Add previous month for comparison
sales_with_prev AS (
SELECT
month,
revenue,
LAG(revenue) OVER(ORDER BY month) AS prev_month_revenue
FROM monthly_sales
),
-- Step 3: Calculate growth metrics
growth_analysis AS (
SELECT
month,
revenue,
prev_month_revenue,
revenue - prev_month_revenue AS growth,
ROUND((revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 1) AS growth_pct
FROM sales_with_prev
)
-- Final: Select with classifications
SELECT
month,
revenue,
growth,
growth_pct,
CASE
WHEN growth_pct > 20 THEN 'Strong Growth'
WHEN growth_pct > 0 THEN 'Moderate Growth'
WHEN growth_pct > -10 THEN 'Slight Decline'
ELSE 'Significant Decline'
END AS performance
FROM growth_analysis
WHERE prev_month_revenue IS NOT NULL;

Recursive CTEs

Recursive CTEs are powerful for working with hierarchical data like org charts, category trees, or bill of materials.

Example: Employee Hierarchy

WITH RECURSIVE org_chart AS (
-- Base case: Start with the CEO (no manager)
SELECT
employee_id,
employee_name,
manager_id,
1 AS level,
employee_name AS path
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- Recursive case: Find employees who report to previous level
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
o.level + 1,
CONCAT(o.path, ' -> ', e.employee_name)
FROM employees e
JOIN org_chart o ON e.manager_id = o.employee_id
)
SELECT
REPEAT('  ', level - 1) || employee_name AS hierarchy,
level,
path
FROM org_chart
ORDER BY path;

Result:

hierarchy	level	path
Rajesh (CEO)	1	Rajesh (CEO)
Priya (VP Engineering)	2	Rajesh (CEO) -> Priya
Rahul (Manager)	3	Rajesh (CEO) -> Priya -> Rahul
Amit (Manager)	3	Rajesh (CEO) -> Priya -> Amit
Sneha (VP Sales)	2	Rajesh (CEO) -> Sneha

CTEs vs Subqueries vs Temp Tables

Feature	CTE	Subquery	Temp Table
Scope	Single query	Single location	Entire session
Reusable in query	Yes	No (must repeat)	Yes
Supports recursion	Yes	No	No
Can be indexed	No	No	Yes
Best for	Readability, recursion	Simple, one-time use	Large data, multiple queries

Day 5 Key Takeaways

CTEs make complex queries readable by naming intermediate steps
Use WITH clause before your main SELECT statement
Multiple CTEs can be chained, with later ones referencing earlier ones
Recursive CTEs solve hierarchical data problems elegantly
CTEs are preferred over deeply nested subqueries for maintainability

Day 6

Query Optimization & Best Practices

Writing Efficient, Production-Ready SQL

Why Query Performance Matters

A query that runs in 30 seconds on your test database might take 30 minutes on production with millions of rows. Understanding optimization is essential for any serious data work.

Understanding Query Execution

Most SQL databases process queries in this order:

FROM / JOIN: Identify and combine tables
WHERE: Filter rows
GROUP BY: Aggregate rows
HAVING: Filter aggregated results
SELECT: Choose columns and calculate expressions
DISTINCT: Remove duplicates
ORDER BY: Sort results
LIMIT: Restrict output rows

EXPLAIN: Understanding Query Plans

Use EXPLAIN to see how the database plans to execute your query.

EXPLAIN SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2024-01-01';

                        Key Things to Look For in EXPLAIN
                        Table scan: Reading entire table (slow for large tables)
Index usage: Using index for fast lookups (good)
Rows examined: How many rows the database checks
Join type: How tables are being joined

                    

Indexing Basics

Index: A data structure that improves the speed of data retrieval. Think of it like a book's index - instead of reading every page, you look up keywords to find specific pages quickly.

Column Type	Should Be Indexed?	Why
Primary Key	Automatic	Created automatically
Foreign Keys	Yes	Used frequently in JOINs
WHERE clause columns	Often yes	Speeds up filtering
ORDER BY columns	Sometimes	Speeds up sorting
Columns with few unique values	Usually no	Index won't help much

Common Performance Pitfalls

1. SELECT * (Selecting All Columns)

-- BAD: Fetches all columns, even if you only need a few
SELECT * FROM orders;

-- GOOD: Only fetch what you need
SELECT order_id, customer_id, order_date FROM orders;

2. Functions on Indexed Columns in WHERE

-- BAD: Function prevents index usage
SELECT * FROM orders
WHERE YEAR(order_date) = 2024;

-- GOOD: Compare directly to enable index usage
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01';

3. Using OR Instead of IN

-- Less efficient
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Clothing' OR category = 'Books';

-- More efficient and readable
SELECT * FROM products
WHERE category IN ('Electronics', 'Clothing', 'Books');

4. Not Using LIMIT During Development

-- When testing on large tables, always limit results first
SELECT * FROM orders
WHERE total_amount > 1000
LIMIT 100;  -- Test with small sample first!

SQL Best Practices

                        Writing Clean, Maintainable SQL
                        Use meaningful aliases: customers c not customers x
Format consistently: Keywords uppercase, columns lowercase
Comment complex logic: Explain why, not what
One column per line: Makes diffs and reviews easier
Use CTEs over nested subqueries: More readable

                    

Well-Formatted Production Query

/*
* Monthly Sales Report with YoY Comparison
* Author: Data Team
* Last Updated: 2024-01-15
*/
WITH monthly_revenue AS (
-- Calculate revenue by month
SELECT
DATE_FORMAT(order_date, '%Y-%m-01') AS month_start,
SUM(total_amount) AS revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM orders
WHERE order_status = 'completed'
GROUP BY DATE_FORMAT(order_date, '%Y-%m-01')
)
SELECT
month_start,
revenue,
unique_customers,
LAG(revenue, 12) OVER(ORDER BY month_start) AS revenue_ly,
ROUND(
(revenue - LAG(revenue, 12) OVER(ORDER BY month_start))
* 100.0 / NULLIF(LAG(revenue, 12) OVER(ORDER BY month_start), 0),
1
) AS yoy_growth_pct
FROM monthly_revenue
ORDER BY month_start DESC
LIMIT 24;  -- Last 2 years

Day 6 Key Takeaways

Use EXPLAIN to understand how your queries execute
Index columns used in WHERE, JOIN, and ORDER BY clauses
Avoid SELECT * - only fetch columns you need
Don't apply functions to indexed columns in WHERE clauses
Write clean, commented, well-formatted SQL for maintainability

Day 7

Week 3 Project & Assessment

Putting It All Together

Project: Business Analytics SQL Report

Create a comprehensive analytics report for a retail company using all the advanced SQL techniques you've learned this week.

Project Requirements

Using the e-commerce database from Week 2, write queries to answer the following business questions:

Task 1: Customer Segmentation (Window Functions)

Segment customers into tiers based on their total purchase value:

Use NTILE to divide customers into 4 quartiles
Label them as Premium, Gold, Silver, Bronze
Show the count and average purchase value for each tier

Task 2: Sales Trend Analysis (Running Totals)

Create a monthly sales summary with:

Monthly revenue
Running total (year-to-date)
Month-over-month growth percentage
3-month moving average

Task 3: Product Performance Ranking (RANK Functions)

For each product category:

Rank products by total revenue
Show top 3 products per category
Include each product's percentage of category revenue

Task 4: Year-over-Year Comparison (LAG)

Compare this year's quarterly performance to last year:

Revenue by quarter
Same quarter last year revenue
YoY growth percentage
Classify as "Growth", "Stable", or "Decline"

Task 5: Complex Analytics Query (Multiple CTEs)

Build a comprehensive customer analysis using multiple CTEs:

CTE 1: Customer purchase summary
CTE 2: Customer recency (days since last order)
CTE 3: Customer ranking by value
Final: Combine into RFM-style segmentation

Assessment Criteria

Component	Weight	Evaluation
Query Correctness	40%	Results are accurate
Technique Usage	30%	Appropriate use of advanced SQL
Code Quality	20%	Clean, formatted, commented
Efficiency	10%	No obvious performance issues

Starter Template

/*
* Week 3 Project: Business Analytics Report
* Student Name: [Your Name]
* Date: [Date]
*/

-- Task 1: Customer Segmentation
WITH customer_totals AS (
-- Your code here
)
SELECT ...;

-- Task 2: Sales Trend Analysis
-- Your code here

-- Continue for remaining tasks...

Week 3 Complete!

Congratulations! You've mastered advanced SQL techniques that are used daily by data professionals worldwide. These skills will set you apart in job interviews and enable you to tackle complex analytical challenges.

Next Week: Data Visualization - learn to communicate your insights effectively!

Self-Assessment Quiz

Test your understanding of this week's concepts. Try to answer without looking back!

Question 1: Subqueries

What type of subquery returns a single value that can be compared using =, >, or

Show Answer

Scalar Subquery - Returns exactly one row and one column (a single value).

Question 2: Correlated vs Regular

A correlated subquery executes ___ time(s), while a regular subquery executes ___ time(s).

Show Answer

Once per row / once total - Correlated subqueries reference the outer query and must execute for each row.

Question 3: Ranking Functions

If two employees have the same salary and are ranked, RANK() gives them both 2. What does RANK() assign to the next person?

Show Answer

4 - RANK() skips numbers after ties. DENSE_RANK() would give 3.

Question 4: Window Functions

Which function would you use to access the previous row's value in a window?

Show Answer

LAG() - LAG looks at previous rows; LEAD looks at following rows.

Question 5: CTEs

What keyword is used to start a Common Table Expression?

Show Answer

WITH - CTEs are defined using WITH cte_name AS (query).

Question 6: Optimization

Why is WHERE YEAR(date_col) = 2024 slower than WHERE date_col >= '2024-01-01'?

Show Answer

Applying a function to a column prevents the database from using an index on that column. The second form allows index usage.

Question 7: Frame Specification

To calculate a 7-day moving average, which frame specification would you use?

Show Answer

ROWS BETWEEN 6 PRECEDING AND CURRENT ROW - This includes the current row plus the 6 previous rows (7 total).

                        Scoring
                        7 correct: Excellent! You've mastered advanced SQL
5-6 correct: Good understanding, review weak areas
3-4 correct: Review the day's content for missed topics
Below 3: Consider re-reading the week's content

                    

About the Instructor

Pawan Rama Mali

Data Analytics professional and educator with extensive experience in SQL, database management, and teaching complex analytical concepts to beginners. Passionate about making data skills accessible to everyone and helping students build practical, job-ready expertise.

Specializations:

SQL & Database Design
Data Analytics & Business Intelligence
R Programming & R Shiny Dashboards
Python for Data Analysis

Visit Instructor's Portfolio

Week 3: Advanced SQL & Analytics Queries

What You'll Learn This Week

Subqueries & Nested Queries

What is a Subquery?

Why Use Subqueries?

Types of Subqueries

Scalar Subqueries

Example: Find Products Above Average Price

Example: Compare to Department Average

Column Subqueries (List of Values)

Example: Find Customers Who Made Purchases

Example: Find Products Never Ordered

Watch Out: NULL Values with NOT IN

Table Subqueries (Derived Tables)

Example: Join with Aggregated Data

Subqueries in Different Clauses

In the SELECT Clause

In the FROM Clause

In the WHERE Clause

EXISTS and NOT EXISTS

Example: Customers With Orders (Using EXISTS)

Example: Customers Without Orders (Using NOT EXISTS)

EXISTS vs IN: When to Use Which?

Day 1 Key Takeaways

Correlated Subqueries

What is a Correlated Subquery?

Regular vs Correlated Subquery

Side-by-Side Comparison

How Correlated Subqueries Work

Example: Employees Above Department Average

Finding Latest/First Records Per Group

Example: Latest Order for Each Customer

Example: Top Product in Each Category

Using ANY and ALL

Example: Using ALL

Performance Considerations

Correlated Subqueries Can Be Slow

When to Use Alternatives

Rewriting with a JOIN (Often Faster)

Day 2 Key Takeaways

Window Functions - Part 1

What are Window Functions?

Window Functions vs GROUP BY

GROUP BY vs Window Function Comparison

The OVER() Clause

Ranking Functions

Example: Comparing Ranking Functions

PARTITION BY: Ranking Within Groups

Example: Rank Employees Within Each Department

NTILE: Creating Percentile Groups

Example: Divide Customers into Quartiles by Purchase Value

Practical Example: Top 3 Products Per Category

Finding Top N Per Group

Why Use a Subquery?

Day 3 Key Takeaways

Window Functions - Part 2

Aggregate Window Functions

Example: Compare Each Sale to Department Average

Running Totals (Cumulative Sum)

Example: Running Total of Daily Sales

LAG and LEAD: Accessing Adjacent Rows

Example: Month-over-Month Sales Growth

Example: Year-over-Year Comparison

FIRST_VALUE and LAST_VALUE

Example: Compare Each Sale to First and Best Sale

Frame Specifications: ROWS vs RANGE

Example: 7-Day Moving Average

Example: Running Total vs Full Total

Common Window Frame Patterns

Day 4 Key Takeaways

Common Table Expressions (CTEs)

What is a CTE?

CTE vs Subquery: Readability Comparison

Nested Subquery (Hard to Read)

Same Query with CTEs (Easy to Read)

Benefits of CTEs

Multiple CTEs

Example: Multi-Step Sales Analysis

Recursive CTEs

Example: Employee Hierarchy