Welcome to Week 2! Now that you've built a solid foundation in data concepts and Excel, it's time to learn the most essential skill for any data analyst - SQL (Structured Query Language).
SQL is the universal language for communicating with databases. Whether you're working at a startup or a Fortune 500 company, SQL will be your primary tool for extracting, analyzing, and manipulating data. By the end of this week, you'll be writing queries that would have taken hours in Excel - but take just seconds in SQL.
Let's unlock the power of databases!
What You'll Learn This Week
Introduction to Databases
Understanding How Data is Stored and Organized
What is a Database?
Think of a database like a super-powered Excel workbook. While Excel can handle thousands of rows, databases can handle millions or even billions of records while still being fast and efficient.
Real-World Example: Your Favorite E-commerce Site
When you shop online, the website uses multiple databases to store:
- Product catalog: All products with names, descriptions, prices, images
- Customer information: Your name, address, email, order history
- Inventory: Stock levels for each product
- Orders: Every transaction, payment status, shipping details
- Reviews: Customer ratings and feedback
Types of Databases
| Type | Description | Examples | Best For |
|---|---|---|---|
| Relational (SQL) | Data stored in tables with rows and columns, linked by relationships | MySQL, PostgreSQL, SQL Server, Oracle | Structured data, business applications |
| NoSQL | Flexible schema, stores data in documents, key-value pairs, or graphs | MongoDB, Cassandra, Redis | Unstructured data, real-time apps |
| Data Warehouse | Optimized for analytics and reporting on large datasets | Snowflake, BigQuery, Redshift | Business intelligence, analytics |
Focus: Relational Databases
In this course, we focus on relational databases because they're the most common in business environments and essential for data analysts. Over 70% of enterprise data is stored in relational databases!
Relational Database Concepts
Tables
A table is like a spreadsheet - it has rows and columns. Each table stores data about one type of entity (customers, products, orders, etc.).
Example: Customers Table
| customer_id | first_name | last_name | city | |
|---|---|---|---|---|
| 1 | Rahul | Sharma | rahul@email.com | Mumbai |
| 2 | Priya | Patel | priya@email.com | Delhi |
| 3 | Amit | Kumar | amit@email.com | Bangalore |
Key Terminology
- Row (Record): A single entry in a table (e.g., one customer)
- Column (Field): A single attribute (e.g., email address)
- Primary Key: A unique identifier for each row (e.g., customer_id)
- Foreign Key: A column that links to another table's primary key
- Schema: The structure of the database (tables, columns, relationships)
Primary Keys and Foreign Keys
These are crucial for understanding how tables relate to each other:
Customers Table
Primary Key: customer_id uniquely identifies each customer
| customer_id (PK) | name |
|---|---|
| 1 | Rahul |
| 2 | Priya |
Orders Table
Foreign Key: customer_id links to the customers table
| order_id (PK) | customer_id (FK) | amount |
|---|---|---|
| 101 | 1 | 5000 |
| 102 | 1 | 3000 |
| 103 | 2 | 7500 |
Notice how customer_id in Orders references customer_id in Customers - this is the relationship!
Entity-Relationship (ER) Diagrams
ER diagrams visually show how tables relate to each other. Understanding these helps you write better queries.
Common Relationship Types
- One-to-One (1:1): Each record in Table A relates to exactly one record in Table B (e.g., User and User_Profile)
- One-to-Many (1:N): One record in Table A relates to many records in Table B (e.g., Customer to Orders)
- Many-to-Many (M:N): Multiple records in Table A relate to multiple records in Table B (e.g., Students and Courses)
Introduction to SQL
SQL allows you to:
- Query data: Retrieve information from tables
- Insert data: Add new records
- Update data: Modify existing records
- Delete data: Remove records
- Create structures: Design tables and databases
SQL Command Categories
| Category | Purpose | Commands |
|---|---|---|
| DQL (Data Query Language) | Retrieve data | SELECT |
| DML (Data Manipulation Language) | Modify data | INSERT, UPDATE, DELETE |
| DDL (Data Definition Language) | Define structure | CREATE, ALTER, DROP |
| DCL (Data Control Language) | Control access | GRANT, REVOKE |
For Data Analysts
As a data analyst, you'll primarily use SELECT statements (DQL) to query and analyze data. You typically won't need to create tables or modify data directly - that's usually handled by database administrators or data engineers.
Setting Up Your SQL Environment
To practice SQL, you'll need a database system. Here are the most popular options:
- MySQL + MySQL Workbench: Free, popular, great for beginners
- PostgreSQL + pgAdmin: Free, powerful, industry standard
- SQLite + DB Browser: Lightweight, no server needed, perfect for learning
- Online platforms: SQLZoo, Mode Analytics, W3Schools SQL Tryit
Recommended for Beginners
Start with MySQL Workbench or use online platforms like SQLZoo to practice without any installation.
Day 1 Key Takeaways
- A database is an organized collection of structured data
- Relational databases store data in tables linked by relationships
- Primary keys uniquely identify records; foreign keys link tables
- SQL is the standard language for querying relational databases
- As data analysts, we focus primarily on SELECT queries
Basic SELECT Queries
Your First Steps in Retrieving Data
Today you'll write your first SQL queries! The SELECT statement is the foundation of data analysis in SQL - it's how you retrieve data from tables.
Practice Database Setup
For this week's exercises, we'll use a sample e-commerce database with these tables:
- customers - Customer information
- products - Product catalog
- orders - Order transactions
- order_items - Items in each order
- categories - Product categories
The Basic SELECT Statement
SELECT column1, column2, ... FROM table_name;
Select All Columns
Use the asterisk (*) to select all columns from a table:
Result: Returns ALL columns and ALL rows from the customers table
| customer_id | first_name | last_name | city | |
|---|---|---|---|---|
| 1 | Rahul | Sharma | rahul@email.com | Mumbai |
| 2 | Priya | Patel | priya@email.com | Delhi |
Avoid SELECT * in Production
While SELECT * is convenient for exploring data, avoid it in production queries. It retrieves unnecessary columns, slows down queries, and can break if table structure changes. Always specify the columns you need.
Select Specific Columns
List only the columns you need:
Result: Only returns these three columns
| first_name | last_name | |
|---|---|---|
| Rahul | Sharma | rahul@email.com |
| Priya | Patel | priya@email.com |
Column Aliases with AS
Rename columns in your output for clarity using the AS keyword:
Result:
| First Name | Last Name | Email Address |
|---|---|---|
| Rahul | Sharma | rahul@email.com |
Pro Tip: Alias Syntax
The AS keyword is optional. You can also write: first_name 'First Name'. However, using AS makes your code more readable.
DISTINCT - Removing Duplicates
Use DISTINCT to get unique values only:
Result:
| city |
|---|
| Mumbai |
| Delhi |
| Bangalore |
LIMIT and OFFSET
Control how many rows are returned:
Pagination Example
To show results page by page (10 items per page):
- Page 1:
LIMIT 10 OFFSET 0 - Page 2:
LIMIT 10 OFFSET 10 - Page 3:
LIMIT 10 OFFSET 20
Comments in SQL
Good practice: Always comment your code!
Calculated Columns
You can perform calculations and create new columns in your SELECT:
Try It Yourself - Day 2 Exercises
Practice these queries on your sample database:
- Select all columns from the products table
- Select only product_name and price from products
- Get a list of all unique categories
- Select the first 5 products with aliases for column names
- Create a calculated column showing price with 18% GST added
Day 2 Key Takeaways
- SELECT retrieves data from tables; specify columns or use * for all
- Use AS to create meaningful column aliases
- DISTINCT removes duplicate values from results
- LIMIT controls how many rows to return; OFFSET skips rows
- You can perform calculations and create new columns in SELECT
Filtering Data with WHERE
Finding Exactly What You Need
Selecting all data is rarely useful. The real power of SQL comes from filtering - retrieving only the records that match specific criteria. The WHERE clause is your filtering tool.
The WHERE Clause
SELECT columns FROM table_name WHERE condition;
Comparison Operators
| Operator | Description | Example |
|---|---|---|
| = | Equal to | WHERE city = 'Mumbai' |
| <> or != | Not equal to | WHERE status <> 'Cancelled' |
| > | Greater than | WHERE price > 1000 |
| < | Less than | WHERE quantity < 10 |
| >= | Greater than or equal | WHERE rating >= 4 |
| <= | Less than or equal | WHERE age <= 30 |
Logical Operators: AND, OR, NOT
Combine multiple conditions using logical operators:
Watch Your Parentheses!
AND has higher precedence than OR. Use parentheses to ensure conditions are evaluated in the correct order. A OR B AND C is interpreted as A OR (B AND C), not (A OR B) AND C.
BETWEEN - Range Filtering
Use BETWEEN to filter values within a range (inclusive):
IN - Multiple Value Matching
Use IN to match against multiple values (cleaner than multiple ORs):
LIKE - Pattern Matching
Use LIKE for text pattern matching with wildcards:
| Wildcard | Meaning | Example |
|---|---|---|
| % | Any sequence of characters (including none) | 'A%' matches 'Apple', 'Air', 'A' |
| _ | Exactly one character | '_at' matches 'Cat', 'Bat', 'Hat' |
IS NULL and IS NOT NULL
Handle missing (NULL) values carefully:
Understanding NULL
NULL means "unknown" or "no value" - it's not the same as 0 or an empty string. You cannot use = or <> with NULL; you must use IS NULL or IS NOT NULL.
Try It Yourself - Day 3 Exercises
- Find all products priced between Rs. 500 and Rs. 2000
- Find customers from Mumbai, Delhi, or Bangalore
- Find products whose name contains "Wireless"
- Find orders placed in January 2024 that are still pending
- Find customers who haven't provided their phone number
Day 3 Key Takeaways
- WHERE filters rows based on specified conditions
- Use comparison operators (=, <>, >, <, >=, <=) for basic filtering
- Combine conditions with AND, OR, NOT; use parentheses for clarity
- BETWEEN filters ranges; IN matches multiple values; LIKE handles patterns
- Always use IS NULL / IS NOT NULL for NULL comparisons
Sorting & Aggregations
Organizing and Summarizing Data
Today we learn two powerful capabilities: sorting data with ORDER BY, and summarizing data with aggregate functions and GROUP BY. These are essential for creating meaningful reports and analytics.
ORDER BY - Sorting Results
SELECT columns FROM table ORDER BY column [ASC|DESC];
Aggregate Functions
Aggregate functions perform calculations across multiple rows and return a single value:
| Function | Description | Example |
|---|---|---|
| COUNT() | Counts number of rows | COUNT(*), COUNT(column) |
| SUM() | Adds up values | SUM(amount) |
| AVG() | Calculates average | AVG(price) |
| MIN() | Finds minimum value | MIN(price) |
| MAX() | Finds maximum value | MAX(price) |
GROUP BY - Grouping Data
GROUP BY divides data into groups and applies aggregate functions to each group:
SELECT column, AGGREGATE() FROM table GROUP BY column;
GROUP BY Rule
Every column in your SELECT that is NOT inside an aggregate function MUST appear in the GROUP BY clause. Otherwise, you'll get an error.
HAVING - Filtering Groups
HAVING filters groups after aggregation (WHERE filters rows before aggregation):
WHERE vs HAVING
| WHERE | HAVING |
|---|---|
| Filters individual rows | Filters grouped results |
| Applied BEFORE grouping | Applied AFTER grouping |
| Cannot use aggregate functions | Can use aggregate functions |
| WHERE price > 100 | HAVING SUM(price) > 100 |
Query Execution Order
Understanding the order helps you write correct queries:
- FROM - Identify the table(s)
- WHERE - Filter individual rows
- GROUP BY - Group the remaining rows
- HAVING - Filter the groups
- SELECT - Choose columns and calculate aggregates
- ORDER BY - Sort the results
- LIMIT - Limit the output
Try It Yourself - Day 4 Exercises
- Find the top 10 most expensive products (sorted by price descending)
- Calculate the total number of orders and total revenue
- Find total sales by region, sorted highest to lowest
- Count orders per customer, show only those with more than 5 orders
- Calculate monthly revenue for 2024, ordered chronologically
Day 4 Key Takeaways
- ORDER BY sorts results; use ASC (default) or DESC
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX) summarize data
- GROUP BY divides data into groups for aggregation
- HAVING filters groups (after aggregation); WHERE filters rows (before)
- Query execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
SQL Functions
Transform and Manipulate Your Data
SQL provides built-in functions to transform data within your queries. Today we'll cover string functions, numeric functions, date functions, and conditional logic.
String Functions
Manipulate text data:
Numeric Functions
Work with numbers:
Date Functions
Work with dates and times:
CASE WHEN - Conditional Logic
Create conditional columns (like IF in Excel):
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE default_result
END
COALESCE and IFNULL
Handle NULL values gracefully:
Try It Yourself - Day 5 Exercises
- Create full customer names by concatenating first and last name
- Extract the year and month from order dates
- Calculate days between order date and delivery date
- Create a price category column (Budget/Mid/Premium) using CASE
- Handle NULL phone numbers by replacing with 'Not Provided'
Day 5 Key Takeaways
- String functions: UPPER, LOWER, CONCAT, SUBSTRING, TRIM, REPLACE
- Numeric functions: ROUND, CEIL, FLOOR, ABS, MOD
- Date functions: YEAR, MONTH, DAY, DATEDIFF, DATE_ADD, DATE_FORMAT
- CASE WHEN creates conditional logic in queries
- COALESCE and IFNULL handle NULL values gracefully
Introduction to JOINs
Combining Data from Multiple Tables
Real-world databases have data spread across multiple related tables. JOINs allow you to combine data from these tables based on related columns. This is one of the most important SQL skills!
Why Do We Need JOINs?
Consider this scenario: You want to see order details including customer names. But:
- The orders table has order_id, customer_id, amount, date
- The customers table has customer_id, name, email, city
To get customer names with their orders, you need to JOIN these tables!
Types of JOINs
| JOIN Type | Description | Returns |
|---|---|---|
| INNER JOIN | Only matching records from both tables | Intersection |
| LEFT JOIN | All from left table + matches from right | Left + matching right |
| RIGHT JOIN | All from right table + matches from left | Right + matching left |
| FULL OUTER JOIN | All records from both tables | Everything (Union) |
INNER JOIN
Returns only rows that have matching values in both tables:
INNER JOIN Visual
If Customer A has orders but Customer B has no orders:
- Customer A's orders: Included
- Customer B: NOT included (no matching orders)
Only rows with matches in BOTH tables appear in results.
LEFT JOIN (LEFT OUTER JOIN)
Returns ALL rows from the left table, plus matching rows from the right table:
LEFT JOIN Use Cases
- Find customers who haven't ordered (NULL check)
- Find products that haven't sold
- Find employees without assigned tasks
- Any "find records without matches" scenario
RIGHT JOIN (RIGHT OUTER JOIN)
Returns ALL rows from the right table, plus matching rows from the left:
Joining Multiple Tables
You can join more than two tables:
JOIN with Aggregations
Combine JOINs with GROUP BY for powerful analysis:
Try It Yourself - Day 6 Exercises
- Join orders with customers to show customer names with each order
- Find all customers who have never placed an order
- Join orders, order_items, and products to see product names in orders
- Calculate total revenue by customer using JOINs and GROUP BY
- Find the top 5 selling products by total quantity
Day 6 Key Takeaways
- JOINs combine data from multiple related tables
- INNER JOIN returns only matching rows from both tables
- LEFT JOIN returns all from left table + matches from right
- Use table aliases (a, b, o, c) for cleaner, readable queries
- Combine JOINs with GROUP BY for powerful multi-table analysis
Week 2 Project & Assessment
E-commerce Database Analysis
Congratulations on reaching the final day of Week 2! Today, you'll apply everything you've learned to analyze a complete e-commerce database using SQL queries.
Project: E-commerce Database Analysis
Project Brief
You've been hired as a data analyst for an online retail company. The management team needs insights from the database to make strategic decisions. Write SQL queries to answer their business questions.
Download Practice Database
Use this SQL script to set up your practice database:
Download E-commerce Database (SQL)
5 tables | 1000+ records | Complete e-commerce schema
Database Schema
The database contains these tables:
- customers (customer_id, first_name, last_name, email, phone, city, state, join_date)
- products (product_id, product_name, category_id, price, stock_quantity)
- categories (category_id, category_name)
- orders (order_id, customer_id, order_date, status, total_amount)
- order_items (item_id, order_id, product_id, quantity, unit_price)
Project Requirements
Part 1: Basic Queries (20 points)
- List all unique cities where customers are located
- Find the 10 most expensive products
- Count total number of orders by status
- Find products with stock quantity less than 10
Part 2: Filtering & Aggregation (30 points)
- Calculate total revenue for each month in 2024
- Find the top 5 customers by total spending
- Calculate average order value by customer city
- List categories with more than Rs. 1,00,000 in total sales
- Find products that have never been ordered
Part 3: JOINs & Complex Queries (50 points)
- Create a sales report showing: order_id, customer_name, product_name, quantity, total_price
- Find the best-selling product in each category
- Calculate customer lifetime value (total spending per customer)
- Create a monthly revenue trend report with month-over-month growth
- Identify customers who made purchases in every month of 2024
Assessment Criteria
| Component | Weight | What's Evaluated |
|---|---|---|
| Query Correctness | 50% | Queries return accurate results |
| Query Efficiency | 20% | Using appropriate JOINs, avoiding redundancy |
| Code Quality | 15% | Formatting, comments, meaningful aliases |
| Quiz: SQL Concepts | 15% | Understanding of SQL fundamentals |
Tips for Success
- Start simple: Build complex queries step by step
- Test incrementally: Run parts of your query to verify each step
- Use aliases: Make your queries readable with meaningful table and column aliases
- Comment your code: Explain what each query does
- Check for NULLs: Consider edge cases with missing data
Week 2 Complete!
You've completed an intensive week covering:
- Database concepts and relational data models
- SELECT queries with filtering using WHERE
- Sorting with ORDER BY and aggregations with GROUP BY
- SQL functions for strings, numbers, dates, and conditional logic
- JOINs to combine data from multiple tables
You can now query any relational database!
Next week, you'll learn Advanced SQL techniques including subqueries, CTEs, and window functions!
Week 2 Self-Assessment Quiz
Test Your SQL Knowledge
Answer these questions to check your understanding of Week 2 SQL concepts.
Question 1: SELECT Basics
What does SELECT DISTINCT city FROM customers; return?
- All cities including duplicates
- Only unique city values (no duplicates)
- The first city in the table
- An error - DISTINCT is not valid
Show Answer
B. Only unique city values - DISTINCT removes duplicate values from the result set.
Question 2: WHERE Clause
Which operator finds values within a range (inclusive)?
- IN
- LIKE
- BETWEEN
- WITHIN
Show Answer
C. BETWEEN - BETWEEN x AND y returns values from x to y inclusive.
Question 3: NULL Values
How do you find rows where the phone column is empty (NULL)?
- WHERE phone = NULL
- WHERE phone IS NULL
- WHERE phone == NULL
- WHERE phone = ''
Show Answer
B. WHERE phone IS NULL - NULL cannot be compared using =; you must use IS NULL or IS NOT NULL.
Question 4: Aggregation
What's the difference between WHERE and HAVING?
- They are the same thing
- WHERE filters groups, HAVING filters rows
- WHERE filters rows before grouping, HAVING filters groups after
- WHERE is for numbers, HAVING is for text
Show Answer
C. WHERE filters rows before grouping, HAVING filters groups after - Use WHERE to filter individual rows, HAVING to filter aggregated results.
Question 5: GROUP BY
What does this query return? SELECT city, COUNT(*) FROM customers GROUP BY city;
- All customers with their cities
- The number of unique cities
- Each city with the count of customers in that city
- An error
Show Answer
C. Each city with the count of customers in that city - GROUP BY groups rows by city, and COUNT(*) counts rows in each group.
Question 6: JOINs
Which JOIN returns all rows from the left table, even if there's no match in the right table?
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- CROSS JOIN
Show Answer
B. LEFT JOIN - LEFT JOIN returns all rows from the left table plus matching rows from the right. Non-matching rows show NULL for right table columns.
Question 7: CASE WHEN
What does CASE WHEN do in SQL?
- Changes the case of text (uppercase/lowercase)
- Creates conditional logic to return different values
- Sorts data by case sensitivity
- Creates a new table
Show Answer
B. Creates conditional logic to return different values - Similar to IF/ELSE, CASE WHEN returns different values based on conditions.
Question 8: Query Order
What is the correct order of SQL clauses?
- SELECT, FROM, WHERE, ORDER BY, GROUP BY
- FROM, SELECT, WHERE, GROUP BY, ORDER BY
- SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY
- SELECT, WHERE, FROM, GROUP BY, ORDER BY
Show Answer
C. SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY - This is the standard clause order in SQL queries.
Quiz Complete!
How did you do?
- 8/8: Excellent! You've mastered SQL basics
- 6-7/8: Good understanding - review the missed topics
- 4-5/8: Fair - revisit Days 3-6 material
- Below 4: Consider re-reading the week's content
About the Instructor
Pawan Rama Mali
Data Analytics professional and educator with extensive experience in SQL, database management, and teaching complex analytical concepts to beginners. Passionate about making data skills accessible to everyone and helping students build practical, job-ready expertise.
Specializations:
- SQL & Database Design
- Data Analytics & Business Intelligence
- R Programming & R Shiny Dashboards
- Python for Data Analysis