What is a fast way of joining two tables and using the first table column to "filter" the second table?

I am trying to develop a SQL Server 2005 query but I'm being unsuccessful at the moment. I trying every different approach that I know, like derived tables, sub-queries, CTE's, etc, but I couldn't solve the problem. I won't post the queries I tried here because they involve many other columns and tables, but I will try to explain the problem with a simpler example:

  • There are two tables: PARTS_SOLD and PARTS_PURCHASED. The first contains products that were sold to customers, and the second contains products that were purchased from suppliers. Both tables contains a foreign key associated with the movement itself, that contains the dates, etc.

  • Here is the simplified schema:

Table PARTS_SOLD:

  • part_id
  • date
  • other columns

Table PARTS_PURCHASED

  • part_id
  • date
  • other columns

  • What I need is to join every row in PARTS_SOLD with a unique row from PARTS_PURCHASED, chose by part_id and the maximum "date", where the "date" is equal of before the "date" column from PARTS_PURCHASED. In other words, I need to collect some information from the last purchase event for the item for every event of selling this item.

The problem itself is that I didn't find a way of joining the PARTS_PURCHASED table with PARTS_SOLD table using the column "date" from PARTS_SOLD to limit the MAX(date) of the PARTS_PURCHASED table.

I could have done this with a cursor to solve the problem with the tools I know, but every table has millions of rows, and perhaps using cursors or sub-queries that evaluate a query for every row would make the process very slow.

Answers


You aren't going to like my answer. Your database is designed incorrectly which is why you can't get the data back out the way you want. Even using a cursor, you would not get good data from this. Assume that you purchased 5 of part 1 on May 31, 2010. Assume on June 1, you sold ten of part 1. Matching just on date, you would match all ten to the May 31 purchase even though that is clearly not correct, some parts might have been purchased on May 23 and some may have been purchased on July 19, 2008.

If you want to know which purchased part relates to which sold part, your database design should include the PartPurchasedID as part of the PartsSold record and this should be populated at the time of the purchase, not later for reporting when you have 1,000,000 records to sort through.


Perhaps the following would help:

SELECT S.*
  FROM PARTS_SOLD S
INNER JOIN (SELECT PART_ID, MAX(DATE)
              FROM PARTS_PURCHASED
              GROUP BY PART_ID) D
  ON (D.PART_ID = S.PART_ID)
WHERE D.DATE <= S.DATE

Share and enjoy.


I'll toss this out there, but it's likely to contain all kinds of mistakes... both because I'm not sure I understand your question and because my SQL is... weak at best. That being said, my thought would be to try something like:

SELECT * FROM PARTS_SOLD
INNER JOIN (SELECT part_id, max(date) AS max_date 
            FROM PARTS_PURCHASED 
            GROUP BY part_id) AS subtable
   ON PARTS_SOLD.part_id = subtable.part_id
      AND PARTS_SOLD.date < subtable.max_date

Need Your Help

iPhone SDK - UIButton not working in custom view

iphone custom-controls uibutton

This is driving me nuts... I've seen a few posts on similar issues but can't seem to make it work, so here it is:

dynamically allocate memory for a string in a linked list node

c string visual-studio-2010 linked-list calloc

I'm using Visual Studio 2010, which I know has some idiosyncrasies. I'm hoping it's not that.