Thursday, September 4, 2008

MySQL UNION vs. UNION ALL

I just learned a new quirk with MySQL.

If you a SQL buff you might find this interesting. This is definitely going on my blog. It's cool:

MySQL rules:

  1. Querieswith a UNION end up doing a AUTO-DISTINCT type of functionality. Thismakes sense because later the secondary unioned queries are onlysupposed to get new unique records. (ie. first query alone returns 10,second query along returns 6 but 3 are the same as the first query, soonly return 3 resulting in 13 records total.)

  2. However,UNION ALL is supposed to override that and get all records no matterwhat. (ie, in the above example this would return all 16 records)

  3. However,if the first query in the UNION has *results* that have two recordsthat appear identical (ie. if you are not returning primary keys, youonly select first and last name - and you have two real people named"John Doe" then results are identical) and the second unioned query haszero records, UNION ALL does not have the expected results. You onlyget one result on John Doe.
    This is especially bad, for example, ifyou are summing hours for each person, say for a timecard. If bothemployees named John Doe had 40 hours, the *results* are stillidentical.

    This was an issue of four unioned queries, inmy tests I was only doing UNION ALL on the first two queries which werethe ones having this issue. I learned thatmixing "UNION" with "UNION ALL" DOES NOT WORK. If you want UNION ALL results, all unions must be UNION ALL. If any union is missing the 'ALL' then all 'ALL's are ignored.

Solution #1: Derived Queries - heaven sent!

  1. Selectthe primary keys and they wrap the unioned queries in another query toperform the SUMs that you need and don't select the PKs in the outerquery.

Solution #2: UNION *ALL*

  1. As explained in the SQL rule #2 above, simply making each "UNION" statement be "UNION ALL" will solve the issue.
*** *** ***

Some examples of simple derived queries:

select fname, lname, hours
from (select 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION
select 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0


GETSQL - 3 Records
FNAME HOURS LNAME
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam

_________________________________

WITH ID's but same results:

select fname, lname, hours
from (select 1 as ID, 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 2 as ID, 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 3 as ID, 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION
select 4 as ID, 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0

GETSQL - 4 Records
FNAME HOURS LNAME
John 40 Doe
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam



Larger Unioned Query Examples:

select *
from (select 'John' as fname, 'Doe' as lname, 40 as hours
UNION ALL
select 'John' as fname, 'Doe' as lname, 40 as hours
UNION ALL
select 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION ALL
select 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0
UNION ALL
select *
from (select 'Jim' as fname, 'Dim' as lname, 35 as hours
UNION
select 'Ken' as fname, 'Doll' as lname, 36 as hours
UNION
select 'Rod' as fname, 'Ram' as lname, 37 as hours
UNION
select 'Jack' as fname, 'Black' as lname, 38 as hours
) as D2
where hours = 0 ## Make sure second query returns none, just for this example


GETSQL - 4 Records
FNAME HOURS LNAME
John 40 Doe
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam

_________________________________

Mixed "UNION" and "UNION ALL"

select *
from (select 'John' as fname, 'Doe' as lname, 40 as hours
UNION ALL
select 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION ALL
select 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0
UNION ALL
select *
from (select 'Jim' as fname, 'Dim' as lname, 35 as hours
UNION
select 'Ken' as fname, 'Doll' as lname, 36 as hours
UNION
select 'Rod' as fname, 'Ram' as lname, 37 as hours
UNION
select 'Jack' as fname, 'Black' as lname, 38 as hours
) as D2
where hours = 0 ## Make sure second query returns none, just for this example

GETSQL - 3 Records
FNAME HOURS LNAME
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam

__________________________________

select *
from (select 'John' as fname, 'Doe' as lname, 40 as hours
UNION ALL
select 'John' as fname, 'Doe' as lname, 40 as hours
UNION ALL
select 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION ALL
select 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0
UNION
select *
from (select 'Jim' as fname, 'Dim' as lname, 35 as hours
UNION
select 'Ken' as fname, 'Doll' as lname, 36 as hours
UNION
select 'Rod' as fname, 'Ram' as lname, 37 as hours
UNION
select 'Jack' as fname, 'Black' as lname, 38 as hours
) as D2
where hours = 0


GETSQL - 3 Records
FNAME HOURS LNAME
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam

__________________________________

select *
from (select 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 'John' as fname, 'Doe' as lname, 40 as hours
UNION
select 'Joe' as fname, 'Shmoe' as lname, 25 as hours
UNION
select 'Sam' as fname, 'Dam' as lname, 37 as hours
) as D1
where hours > 0
UNION
select *
from (select 'Jim' as fname, 'Dim' as lname, 35 as hours
UNION
select 'Ken' as fname, 'Doll' as lname, 36 as hours
UNION
select 'Rod' as fname, 'Ram' as lname, 37 as hours
UNION
select 'Jack' as fname, 'Black' as lname, 38 as hours
) as D2
where hours = 0


GETSQL - 3 Records
FNAME HOURS LNAME
John 40 Doe
Joe 25 Shmoe
Sam 37 Dam



And that's a wrap!