Vicki Boykis; vickiboykis.com / @vboykis
Girl Develop It is here to provide affordable and accessible programs to learn software through mentorship and hands-on instruction.
Some rules
We're going to make a table out of this info in part 2, and the data will live here: [Table Link]
By the end of the class, you will be able to query and modify a SQL table to make sense of the data you have in a relational way.
What's a database? What's SQL?
A virtual organized file cabinet that makes data available to people and computers across the organization.
DB Schema→ Table → Row → Value
Excel File → Workbook → Row → Cell
Oracle | MySQL |
SQL Server | Postgres |
Access | Sybase |
DB2 |
All different flavors of the same language
The power is in the relationship between data sets.
1 SELECT | selects variables |
2 FROM | opens datasets |
3 WHERE | restricts observations |
4 GROUP BY | groups observations |
5 HAVING | restricts groups |
6 ORDER BY | sorts results |
We'll use MySQL, a popular version of SQL.
If you've done anything online, chances are the transation went through MySQL (Walmart, Kayak, Facebook, Etsy, Verizon)
We'll log on to the front-end of MySQL, PHPMYAdmin.
Logon site: [site]
Your Password: [password]
We have a test dataset of employee records we'll be using in the employee schema.
employees
is the database and it has 6 tables.
Database/schema used interchangeably
Database → Schema ---> Table
Book ---> Chapter ---> Page
MySQL Database → Employees Schema → employees Table
Usually it's bad practice to name tables the same thing as the schema (confusing), but we'll be ok with the small test data set.
SELECT * FROM employees.employees;
Type out all the SQL you write. It'll be easier to learn.
Try a couple of these:
SELECT * FROM employees.titles;
SELECT * FROM employees.dept_emp;
SELECT * FROM employees.salaries;
and check out the tables and info.
Now let's try selecting only certain columns
SELECT title, emp_no FROM employees.titles;
A note about the semi-colon: you don't really need it, depending on which flavor of SQL you're using, but it's good practice in general to include it. Like a period at the end of an English sentence.
Selecting one column:
SELECT dept_name FROM employees.departments;
Selecting multiple columns:
SELECT emp_no, dept_no FROM employees.dept_emp;
SELECT DISTINCT title FROM employees.titles;
title |
---|
Senior Engineer |
Staff |
Engineer |
Senior Staff |
Assistant Engineer |
Technique Leader |
Manager |
SELECT COUNT(DISTINCT title) FROM employees.titles;
7
SELECT COUNT(title) FROM employees.titles;
443,308
Distinct: Selecting only unique instances
Which result set is bigger?
What is each one telling us?
SELECT COUNT(title) FROM employees.titles;
433,308
SELECT COUNT(*) FROM employees.salaries;
2,844,047
SELECT COUNT(*) from employees.departments;
9
SELECT COUNT(*) FROM employees.dept_emp;
331,603
SELECT COUNT(*) FROM employees.dept_manager;
24
SELECT COUNT(*) FROM employees.employees;
300,024
SELECT COUNT(*) FROM employees.salaries;
2,844,047
SELECT COUNT(*) FROM employees.titles;
443,308
SQL clauses: required order
1 SELECT | selects variables |
2 FROM | opens datasets |
3 WHERE | restricts observations |
4 GROUP BY | groups observations |
5 HAVING | restricts groups |
6 ORDER BY | sorts results |
SELECT type_of_dog,COUNT(*)
FROM animals.total_dogs
GROUP BY type_of_dog;
Group statements corral data together into categories when we want to count things by groups of things
i.e. How many total dogs vs puppies vs parents.
SELECT title, COUNT(*) FROM employees.titles
GROUP by title;
SELECT dept_name, COUNT(*) FROM departments
GROUP BY dept_name;
How many instances of that department are there in that table?
SELECT emp_no, COUNT(DISTINCT salary)
FROM salaries
GROUP BY emp_no;
SELECT dept_no, COUNT(emp_no) FROM dept_manager
GROUP BY dept_no;
SELECT birth_date,first_name, COUNT(*) FROM employees.employees
GROUP BY birth_date, first_name;
When was your youngest employee born? (i.e. are you following labor laws?)
SELECT MAX(birth_date) FROM employees;
In this case, you don't need the group-by statement since you're not counting groups of people, but looking at the whole table
Which title has been around the longest?(Smallest employee number)
SELECT title, MIN(emp_no) from titles
GROUP BY title;
Will sort by first name ascending, unless you specify desc.
SELECT first_name, gender, COUNT(*)
FROM employees.employees
GROUP BY first_name, gender
ORDER BY first_name;
SELECT first_name, gender, COUNT(*)
FROM employees.employees
GROUP BY first_name, gender
ORDER by first_name DESC;
Ordering by more than one column, will sort by the first, then by the second, ascending
SELECT gender, birth_date FROM employees.employees
ORDER BY gender, birth_date;
Order by a number ascending
SELECT emp_no, salary FROM employees.salaries
ORDER BY salary ASC;
Order by a number descending
SELECT emp_no, salary FROM employees.salaries
ORDER BY salary DESC;
1 SELECT | selects variables |
2 FROM | opens datasets |
3 WHERE | restricts observations |
4 GROUP BY | groups observations |
5 HAVING | restricts groups |
6 ORDER BY | sorts results |
You only want to select certain things.
SELECT first_name, last_name, COUNT(*)
FROM employees.employees
WHERE first_name='Adam'
GROUP BY first_name, last_name;
first_name | last_name | COUNT(*) |
---|---|---|
Adam | Alpin | 1 |
Adam | Alvarado | 1 |
"Like" will pick up anything similar to what you entered.
SELECT last_name, COUNT( * )
FROM employees
WHERE last_name LIKE 'S%'
GROUP BY last_name;
Only all people named Mary
SELECT * FROM employees
WHERE first_name='Mary';
Only all of the engineers
SELECT * FROM employees.titles
WHERE title LIKE '%Engineer%';
WHERE number >='5'; --returns (5,6,7,8...)
WHERE number > '5'; --returns (6,7,8...)
WHERE day <= '2017-01-01'; --returns (Jan 1, Dec. 31, Dec. 30…)
WHERE day < '2017-01-01'; --returns (Dec. 31, Dec. 30…)
WHERE day != '2017-01-01'; --returns anything other than Jan 1
One table is great. But the power of SQL comes from being able to join related tables to get information.
What's a database? What's SQL?
emp_no
is the column that's the primary key for all three tables that allows us to join them
Often when we join tables, we make them easier to reference later on by aliasing them, but at this point it's a style choice.
Find out salary history:
SELECT first_name, last_name, hire_date,salary, from_date, to_date
FROM employees.employees a
JOIN employees.salaries b ON a.emp_no = b.emp_no
first_name | last_name | hire_date | salary | from_date | to_date |
---|---|---|---|---|---|
Georgi | Facello | 6/26/86 | 60117 | 6/26/86 | 6/26/87 |
Georgi | Facello | 6/26/86 | 62102 | 6/26/87 | 6/25/88 |
Georgi | Facello | 6/26/86 | 66074 | 6/25/88 | 6/25/89 |
When you have two columns from different tables with the same name, you have to reference them specifically.
Left join: All the fields in the first table, plus the ones that match in the second table
SELECT first_name, last_name, hire_date,title, c.from_date, c.to_date
FROM employees.employees a
LEFT JOIN titles c ON a.emp_no = c.emp_no;
What if you'd done a right join instead?
What is this query telling us?
SELECT first_name, last_name,a.emp_no, hire_date,
COUNT(DISTINCT title) , COUNT(DISTINCT salary)
FROM employees.employees a
JOIN employees.salaries b ON a.emp_no = b.emp_no
JOIN employees.titles c ON a.emp_no = c.emp_no
where first_name like 'A%'
GROUP BY first_name, last_name, hire_date,a.emp_no;
What is this query bring back?
SELECT first_name, last_name,hire_date,a.emp_no, salary, from_date as
salary_start_date, to_date AS salary_end_date
FROM employees.employees a
JOIN employees.salaries b ON a.emp_no = b.emp_no
WHERE salary >60000
AND from_date >'2000-01-01'
ORDER BY last_name
Anyone whose salary is greater than $60,000 and who was hired after 1/1/2000
Using nested logic datasets to query other data sets
A subquery or inner query or nested query is a query within a query. Most of the time, a subquery is used when you know how to search
for a value using a SELECT statement, but not the exact value
Exercises:
SELECT COUNT(DISTINCT emp_no)
FROM employees.employees;
SELECT * from (
SELECT emp_no, COUNT(*) AS number_of_salaries
FROM salaries
WHERE from_date>'1998-01-01'
AND from_date<'2000-01-01'
GROUP BY emp_no) a
WHERE number_of_salaries >1
SELECT title, COUNT(DISTINCT a.emp_no)
FROM titles a
JOIN employees b
ON a.emp_no=b.emp_no
GROUP BY title
Exercises:
What did we do so far?
Now that you see how powerful SQL is, let's make some tables of our own. What if we have tons of data we want to query?
Or we want to join our Excel tables to other data?
You have the ability to create tables. But also to delete them by accident, to create duplicate data,or to confuse people with data.
You have the power to make people hate you. Use it wisely :)
DB Schema→ Table → Row → Value
Excel File → Workbook → Row → Cell
Name it with your first name or something easy you can remember; You'll have to keep typing it, so under 10 characters is optimal
Let's do favorite movies that we did at the beginning of the class.
Let's have: First Name, Favorite Movie, Number of times watched
First, we need to design it a bit in our head. Which columns do we want? Which one will be the unique primary key? How to we want to name the columns?
Relational Database Model
Storing the data as relations
Tables are joined on special fields called keys
Data usually comes in two types: numbers and charactes
*Accepted mySQL formats: https://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
By going to STRUCTURE
For the second half of the class we'll be working with commands more than the UI, but there is usually more than 1 way to do everything in MySQL.
Run in the sample place you did your select statements:
CHAR() | A fixed section from 0 to 255 characters long. |
VARCHAR( ) | A variable section from 0 to 255 characters long. |
TINYTEXT | A string with a maximum length of 255 characters. |
TEXT | A string with a maximum length of 65,535 characters. |
BLOB | A string with a maximum length of 65,535 characters. |
MEDIUMTEXT | A string with a maximum length of 16777215 characters. |
MEDIUMBLOB | A string with a maximum length of 16777215 characters. |
LONGTEXT | A string with a maximum length of 4294967295 characters. |
LONGBLOB | A string with a maximum length of 4294967295 characters. |
CREATE TABLE employees (
emp_no INT NOT NULL,
first_name VARCHAR(14) NOT NULL,
last_name VARCHAR(16) NOT NULL,
gender ENUM ('M','F') NOT NULL,
hire_date DATE NOT NULL,
PRIMARY KEY (emp_no);
Which field is the primary key?
Which fields are numbers?
What's enum*?
Congratulations, it's a table!
SELECT * FROM vicki.moviesample;
SELECT first_name FROM vicki.moviesample;
SELECT COUNT(*) FROM vicki.moviesample;
What do we get?
Nothing! because our table has no data. :(...yet
Now that we have the table set up, we can insert the values. Let's load a row of data.
INSERT INTO
yourname.tablename(first_name, favorite_movie,
times_watched) -- column names
VALUES ('Jasmine', 'Aladdin', '456'); -- column values
INSERT INTO vicki.moviesample(first_name, favorite_movie, times_watched)
VALUES ('Alice','Alice in Wonderland', 5);
INSERT INTO vicki.moviesample
(first_name, favorite_movie,times_watched)
VALUES ('Vicki', 'Beauty and the Beast', '5');
INSERT INTO vicki.movie_sample
(first_name, favorite_movie,times_watched)
VALUES ('Jasmine', 'Aladdin','456');
What kind of issues do you see arising from wrong data entry?
Let's answer some exercises with SQL:
What's the most popular movie?SELECT favorite_movie,
COUNT(DISTINCT first_name)
FROM yourname.moviesample
GROUP BY favorite_movie
ORDER BY COUNT(DISTINCT first_name) DESC;
We want to see which decade was more popular among GDI members. This one has the data for the movie name and the year it was made.
SELECT * FROM vicki.moviesample a
JOIN vicki.movie_year b
ON a.movie_title=b. movie_title
What kind of issues do you see arising from wrong data entry?
This is the part where you have to be really careful. Most users won't have access to delete tables.
truncate table; Leaves the table structure in tact, just deletes the values. (deleting all values in a spreadsheet, keeping file)
drop table; Deletes EVERYTHING. BADBADBAD. (file goes in the trash)
Easy way in UI: The delete button. Be careful!
DELETE FROM user1.moviesample WHERE first_name='Shrek';
DELETE FROM moviesample
WHERE first_name like '%S';
DELETE FROM moviesample
WHERE times_watched >='500';
SELECT
is to count things
CREATE
is to create a table
DELETE
is to erase
UPDATE
is to change specific entries
UPDATE moviesample
SET first_name = 'Chihiro', times_watched= 3
WHERE movie = 'Spirited Away';
Try it now
SELECT
is to count things
CREATE
is to create a table
DELETE
is to erase
ALTER
is to fundamentally change a table:
ALTER TABLE icecream DROP COLUMN flavor;
ALTER TABLE icecream ADD unique (quantity);
ALTER TABLE icecream MODIFY flavor VARCHAR(35);
Try it now:
ALTER TABLE vicki.moviesample ADD COLUMN release_yr
varchar(4) after times_watched;
Here's what we've learned:
Practice, practice, practice! Upload your Excel spreadsheets.
We value your feedback and are always trying to improve.
Vicki Boykis vicki.boykis@gmail.com
vickiboykis.com | @vboykis
Sarah Canfield - converting slides to Reveal A.E.Lavender - Sanity and syntax checks