Introduced in 1974, SQL or Structured Query Language is a domain-specific language used in programming and for managing data held in a relational database management system or for stream processing in a relational data stream management system. It is the most used database environment and the third most popular language.
SQL is the standard language for dealing with relational databases in big tech companies like Facebook, Instagram, Accenture, Dell, etc. Essentially, SQL language is used to query a database.
Below, we look at the eight SQL queries all data scientists should know.
Where Clause
The WHERE clause filters records based on input statements. Developers can use extract only those records that fulfil a specified condition. They can use the * operator to display the entire table and narrow down the results using the ‘WHERE’ keyword that displays specific rows.
Syntax:
SELECT column1, column2, …
FROM table_name
WHERE [condition]
Date Functions
The usual date format is YYYY-MM-DD, with the date-time format being YYYY-MM-DD HH:MI: SS. The data function allows developers to find a date in the present or future and dissect it into components for the month, the year, the day or a combination facilitating large-scale database design and maintenance.
Syntax:
select current_timestamp; — date and time, standard ANSI SQL so compatible across DBs
select getdate(); — date and time, specific to SQL Server
SQL Joins
A JOIN clause combines rows from two or more tables based on a related column. There are four types of Joins in SQL– the inner, left, right and full. The Inner Join clause creates a new table by combining rows with matching values in two or more tables. The Right Join returns all records from the right table matching with the left table, and the Left Join returns all records from the left table matching with the right. Lastly, the Full Join returns all records when there is a match in the left or right table.
Syntax for Inner Join:
SELECT column_name(s)
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;
Correlated Subqueries
Correlated Subqueries are used for row-by-row processing and select data from a table referenced in the outer query. The function works where each subquery is executed once for every outer query row. This allows developers to read every row in a table and compare their values against related data. A correlated subquery is evaluated once for every row processed by the parent statement, such as SELECT, UPDATE, or DELETE.
Syntax:
SELECT column1, column2, ….
FROM table1 outer
WHERE column1 operator
(SELECT column1, column2
FROM table2
WHERE expr1 =
outer.expr2);
Case Clause
The CASE clause implements the logic when the developer wants to set the value of one column based on the values in other columns. The function goes through conditions and returns a value when the first condition is met. It works till it finds a true condition, and if no condition is true, it returns the value in the ELSE clause. The Case clause is similar to the IF-ELSE statement in Excel.
Syntax:
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition THEN resultN
ELSE result
END;
Insert Values/ Rows into Tables
This query allows developers to easily insert multiple values or rows in a table with a single query. Inserting values in a table is a DML (Data manipulation language) operation to add data to the database.
Syntax:
INSERT INTO table_name (column_list)
VALUES
(value_list_1),
(value_list_2),
…
(value_list_n);
SQL Queries to Retrieve All Data From A Table
One of the most used DLM commands, the SELECT query, is used to select data from a database, and the retrieved data is stored in a result table called the result-set. The query returns a result set of records from one or more tables.
Syntax:
SELECT column1, column2, …
FROM table_name;
Get Average Values for Data
The average value syntax helps developers find the average value in the given data. It is an aggregate function that calculates the average value of a numerical dataset. The function can be used with the SELECT query for retrieving data from a table.
Syntax:
SELECT AVG( column_name ) FROM table_name
Find Duplicate records
This query helps developers find duplicate records in the table using a unique identifier. It is done via the GROUP BY statement in SQL that arranges identical data into groups. For instance, if a column has the same values in different rows, the tool will arrange those rows in a group. Duplicate values can be found in two steps:
- Define criteria for duplicates.
- Write a query to search for duplicates.