Home > Education > Chapter 7 – SQL Joins

Chapter 7 – SQL Joins

Section I – Usage

What is a Join?

Up to this point, we’ve only used single-table queries.  If the only analysis we ever did were on single-table databases, the queries we’ve learned so far would be all we would need!  However, analysis is never that easy.  Most production databases will be normalized in a snowflake schema, which require lengthy joins, and even datamarts using star schemas will require multiple table queries.

So, how does an analyst cope?  With Joins.  SQL utilizes seven joins, six distinct, which allow us to combine multiple input-tables into a single output-table. 

Section II -                  Types of Joins

Sub-queries

Sub-queries are queries that are embedded within other queries.  A sub-query is not a traditional join in that the output will only come from a single input-table, but you will be constraining your results versus one or more other tables. 

Let’s start out with a simple example.  Here we want to grab only interval data that we’ve estimated.  Since the Data_Load table only has numbers, we need to get the data from the LKP_StatusCodes table.  The first thing we’ll need to do is determine what constitutes an estimate.  We could easily go to the table and lookup the values we need, but it’s easier to write a query.

Proc SQL;

Create Table Example01a as

Select ID

From Chapt07.LKP_StatusCodes

Where Lower(Description) contains “estimate”;

Quit;

Now, this lets us know what exactly it is we’re looking for, but since the Data_Load table doesn’t contain the word “estimate”, we need to get the tables in one query.  When using a sub-query, we join the two tables using a WHERE statement.  Please note that your sub-query can ONLY return a single column.  Also, since it’s possible the sub-query contains more than one row, you MUST use the “in” keyword.

Proc SQL;

Create Table Example01b as

Select      *

From Chapt07.Data_Load

Where StatusCodeID in   (Select ID

                         From Chapt07.LKP_StatusCodes

                         Where Lower(Description) contains “estimate”);

Quit;

WHERE (Implicit Inner Join)

Creating a join is fairly straight-forward.  All we need to do is specify every table name we wish to pull data from, and constrain the results using a WHERE statement.  However, with this type of join, the resultant dataset will ONLY contain the rows that adhere to the constraints.

Proc SQL;

Create Table Example02 as

Select      A.*,

            B.City as WeatherStationCity,

            B.State as WeatherStationState

From Chapt07.Obj_Recorders as A, Chapt07.Obj_Weatherstations as B

Where A.WeatherStationID = B.ID;

Quit;

We can also join multiple tables in the same way, as long as we remember to add the WHERE statement.

Proc SQL;

Create Table Example03 as

Select      A.OpCo,

            B.*,

            C.Code,

            C.Description

From Chapt07.Obj_Recorders as A,

      Chapt07.Data_Load as B,

      Chapt07.LKP_StatusCodes as C

Where A.ID = B.RecorderID

And B.StatusCodeID = C.ID;

Quit;

Cartesian Join

Before we get into the specific keywords involved in a proper join, let’s discuss the Cartesian Join.  A Cartesian join (or Cartesian product for those familiar with linear algebra) occurs whenever we forget the WHERE keyword.  You will probably experience the Cartesian join multiple times using SQL, but almost always due to forgetting a constraining column.  However, when you do wish to utilize the Cartesian join, it can be very powerful.  Take for example, suppose we wish to create a calendar dataset.  Now, we could open Excel and type all of it, but it’s much easier to create from scratch.  Let’s start by creating three sample datasets:

Proc SQL;

Create Table Example04a as Select Distinct IDYear from Chapt07.LKP_LoadPeriods;

Create Table Example04b as Select Distinct IDMonth from Chapt07.LKP_LoadPeriods;

Quit;

In the Year dataset, we found 6 years.  In the month dataset, we obviously found 12.  Now that we’ve got our base datasets, let’s merge them.

Proc SQL;

Create Table Example04c as

Select      IDYear as Year,

            IDMonth as Month

From Example04a, Example04b;

Quit;

Here, we can see that we have 72 rows (6*12), which is exactly what we’d expect.

Inner Join (Explicit Inner Join)

Now that we’ve seen how to merge multiple tables, let’s show another way of merging tables.  The INNER JOIN is exactly the same as the WHERE Join, except it’s easier for others to decipher our code.  This type of join is known as an equi-join based on the equality of two tables.  Let’s examine Examples 2 and 3 and rewrite them using INNER JOIN syntax.  In an INNER JOIN statement, we separate tables with the words “INNER JOIN” and then equate them using the “ON” keyword.  The “ON” keyword works the same as the WHERE statement.  In Set Theory, an Inner Join is also called an Intersection. (A Ç B)

Proc SQL;

Create Table Example05 as

Select      A.*,

            B.City as WeatherStationCity,

            B.State as WeatherStationState

From        Chapt07.Obj_Recorders as A

INNER JOIN Chapt07.Obj_Weatherstations as B  ON A.WeatherStationID = B.ID;

Quit;

Proc SQL;

Create Table Example06 as

Select      A.OpCo,

            B.*,

            C.Code,

            C.Description

From        Chapt07.Obj_Recorders as A

INNER JOIN  Chapt07.Data_Load as B ON A.ID = B.RecorderID

INNER JOIN Chapt07.LKP_StatusCodes as C ON B.StatusCodeID = C.ID;

Quit;

Please note, SAS will evaluate multi-table joins based on the order in which they are written, unless parentheses are used.  This can be very helpful when you need to speed up processes.  Join the largest tables last to minimize resource usage.


Section III -             Advanced Joins

Before we get to the more advanced joins, run the following query.  This will allow us to magnify how the advanced joins work.

Proc SQL;

Update Chapt07.Obj_Recorders Set WeatherStationID = “XXX” Where Opco = “SWP”;

Quit;

Left Outer Join

The Left Outer Join works similar to the Inner Join, except that in the resultant dataset, everything in the “Left” table will be returned, along with any corresponding rows in the “Right” table.  In Set theory, we’d represent this as “Everything in A”(A).

Proc SQL;

Create Table Example07 as

Select      A.*,

            B.*

From              Chapt07.Obj_WeatherStations as A

Left Outer Join   Chapt07.Obj_Recorders as B 

On A.ID = B.WeatherStationID;

Quit;

Right Outer Join

The Right Outer Join is the exact same as the Left Outer Join, except we’re keeping everything that occurs in the “Right” table and any corresponding values that are in the “Left” table.  In Set Theory, we’d represent this as “Everything in B” (B).

Proc SQL;

Create Table Example08 as

Select      A.*,

            B.*

From              Chapt07.Obj_WeatherStations as A

Right Outer Join  Chapt07.Obj_Recorders as B 

On A.ID = B.WeatherStationID;

Quit;

Full Outer Join

The Full Outer Join will return everything from A and B, and when the values in A are not equitable to values in B, a Cartesian product occurs.  In Set theory, we’d represent this as A Union B (A È B).

Proc SQL;

Create Table Example09 as

Select      A.*,

            B.*

From              Chapt07.Obj_WeatherStations as A

Full Outer Join   Chapt07.Obj_Recorders as B 

On A.ID = B.WeatherStationID;

Quit;


Section IV -                Helpful Images

Advertisement
Categories: Education Tags:
  1. No comments yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.