Monday, March 19, 2012

An interview question

Hi,

My wife had an interview today. Below is the sql-question she was asked:

There are 2 tables, A and B. The tables have same structure - each of them consists of 100 columns named c_1 through c_100. There are no primary keys defined on the tables.
Each table has 10,000 rows. Most rows are identical for tables A and B, however there are few that are not.
She was required to show only those rows that have no match in the other table - by 1 query that will be short, i.e. - will not contain someting like "col_1,col_2,.._col_99,Col_100" .Hmm...interesting question. It's hard to think of how to do this if there is no field(s) that can be used to join the two tables together, but part of the way might be doing a UNION on the two tables since they have the same structure, then using the UNION as a sub-query for an outer GROUP BY and do a HAVING Count(*) = 1. That way, it'll only show records that do not have a match to another table.

The problem with this is that you don't know which table contains that record and you do have to do the GROUP BY on all the fields outputted from the UNION query.

I'm kinda reaching here, but maybe create some type of primary key composed of the data in each row...by creating an expression like SOUNDEX(field1 + field2 + ... field99 + field100). That way, you have a primary key that might be dependable for use in an OUTER or FULL join. Just another idea...

Maybe something like this could be part of the way...I'm curious as to what the answer would be myself, especially if the answer has to be "1 query that will be short".

Kael V. Dowdy MCSD, MCP|||I think the answer involves using the BINARY_CHECKSUM function, but I'll have to check it out.

blindman|||OK, here goes:

select *
from (select BINARY_CHECKSUM(*) CheckSum, * from TableA) TableA
full outer join (select BINARY_CHECKSUM(*) CheckSum, * from TableB) TableB
on TableA.CHeckSum = TableB.Checksum
where TableA.Checksum is null or TableB.Checksum is null

The full outer join should show any differences between the two tables.

No comments:

Post a Comment