Is it OK to return an HTTP 401 for a nonexistent resource instead of 404 to prevent information disclosure?
Inspired by a thought while looking at the question "Correct HTTP status code when resource is available but not accessible because of permissions", I will use the same scenario to illustrate my hypothetical question.
Imagine I am building a carpooling web service.
Suppose the following
retrieves the current position of user "angela". Only angela herself and a possible driver who is going to pick her up should be able to know her location, so if the request is not authenticated as an appropriate user, a 401 Unauthorized response is returned.
Also consider the request
when no user called john has registered with the system. There is no john resource let alone a resource for john's location, so this obviously returns a 404 Not Found. Or does it?
What if I don't want to reveal whether or not john is registered with the system?
(Perhaps the usernames are drawn from a small pool of university logins, and there is a very militant cycling group on campus that takes a very dim view of car usage, even if you are pooling. They could make requests to the URL for every user and, if they receive a 401 instead of a 404, infer that the user is a carpooler.)
Does it make sense to return a 401 Unauthorized for this request, even though the resource does not exist and there is no possible set of credentials that could be supplied in a request to have the server return a 200?
Actually, RFC 2616 (§10.4.4, 403 Forbidden) suggests doing the opposite. If someone attempts to access a resource but is not permitted to see it, you can return 404 rather than 403 (Forbidden). This still solves the information disclosure issue.
"If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead."
Thus, you would never return 403 (or 401). However, I think your solution is also reasonable.
EDIT: I think Gabe's on the right track. You would have to reconsider part of the design, but why not:
- Not found - 404
- User-specific insufficient permission - 404
- General insufficient permission (no one can access) - 403
- Not logged in - 401
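The four cases in this list can be sketched as a single decision function. This is only a minimal illustration, not a definitive implementation: the names `location_status`, `LOCATIONS`, `ALLOWED_VIEWERS`, and `LOCKED_RESOURCES`, as well as the sample users, are hypothetical and do not come from the question.

```python
from http import HTTPStatus
from typing import Dict, Optional, Set, Tuple

# Hypothetical in-memory data, purely for illustration.
LOCATIONS: Dict[str, Tuple[float, float]] = {"angela": (51.5, -0.12)}
ALLOWED_VIEWERS: Dict[str, Set[str]] = {"angela": {"angela", "driver42"}}
LOCKED_RESOURCES: Set[str] = {"system"}  # resources that no one may access


def location_status(target: str, viewer: Optional[str]) -> HTTPStatus:
    """Pick a status code for a location request, following the list above."""
    if viewer is None:
        # Not logged in: answered before any lookup, so it reveals nothing.
        return HTTPStatus.UNAUTHORIZED
    if target in LOCKED_RESOURCES:
        # General insufficient permission: the same answer for every user.
        return HTTPStatus.FORBIDDEN
    if target not in LOCATIONS:
        return HTTPStatus.NOT_FOUND
    if viewer not in ALLOWED_VIEWERS.get(target, set()):
        # User-specific insufficient permission looks identical to "missing".
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK
```

With this sketch, a logged-in user who asks for an unregistered "john" and one who asks for "angela" without permission both receive 404, so the two cases cannot be told apart from the status code alone.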
If usernames are sensitive information, then don't put them directly in the URI. If you use hypermedia within your representations, you can make it just as easy for authorized client applications to navigate your API without leaking information in your URLs.
Hackable URLs are great for information that you want everyone to be able to access easily. However, for a RESTful client, there is no problem using URIs that are completely opaque.
Once you have removed the direct correlation between the user and the URI, it becomes difficult to infer any information from a 401 response code.
I think the best solution would be to return 403 (forbidden) for every (potential) page in a class, if the user is not authenticated to see any of them. If the user is, return 404 for stuff that's not there and 200 for stuff that is.
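As a rough sketch of this idea, assuming the permission check is made once for the whole class of pages rather than per user, the decision could look like the following. The function name `class_status` and the `REGISTERED` set are hypothetical.

```python
from http import HTTPStatus
from typing import Set

# Hypothetical set of users who have a location resource.
REGISTERED: Set[str] = {"angela"}


def class_status(target: str, viewer_may_see_class: bool) -> HTTPStatus:
    """403 for the whole class if the viewer lacks access, else 200 or 404."""
    if not viewer_may_see_class:
        # Same answer for existing and nonexistent users alike.
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK if target in REGISTERED else HTTPStatus.NOT_FOUND
```

Because the 403 is returned before the target is ever looked up, a viewer without access to the class learns nothing about which users exist.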
I think it's fine if you want to return a 401 Unauthorized if the request is made by a client that is not a user. However, if a user makes the request and is authenticated, then I don't think that a 401 is the best solution. If you feel that returning a 404 would compromise the security of some users, then you may want to consider returning a 403 Forbidden or perhaps a 200 OK, but just don't specify a location. If I query for user bob and get a response and query for user sam and get an error response, be it 401, 403, 404, etc, then I can probably come to the conclusion that it means that user sam doesn't exist.
200 OK with no location specified may be the most disguised solution.
Edit: Just to illustrate what I am proposing. Return a 401 if the client isn't authorized. Otherwise, always return a 200 OK.
<user-location for="bob">
  <location>geo-coordinates here</location>
</user-location>
<user-location for="sam">
  <location/>
</user-location>
This doesn't reveal whether sam doesn't exist or whether there simply isn't any location data for him at the moment.
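A minimal sketch of how such a body might be built, using Python's standard `xml.etree.ElementTree`. The function name `user_location_xml`, the `known_locations` mapping, and the "lat,lon" text format are assumptions made for illustration; the answer itself only shows the shape of the XML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def user_location_xml(username: str,
                      known_locations: Dict[str, Tuple[float, float]]) -> str:
    """Build the 200 OK body; unknown users get an empty <location> element."""
    root = ET.Element("user-location", {"for": username})
    location = ET.SubElement(root, "location")
    coords = known_locations.get(username)
    if coords is not None:
        # Hypothetical "lat,lon" encoding of the coordinates.
        location.text = f"{coords[0]},{coords[1]}"
    # A missing user and a user without location data produce the same body.
    return ET.tostring(root, encoding="unicode")


known = {"bob": (51.5, -0.12)}
print(user_location_xml("bob", known))
print(user_location_xml("sam", known))
```

The design choice here is that the lookup never raises or branches on "user exists", so the response for an unregistered name is byte-for-byte the same as for a registered user with no location on file.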
Return 401 Unauthorized in any case in which the user is not allowed to see a particular page, whether it exists or not.
From RFC 2616: "If the request already included Authorization credentials, then the 401 response indicates that authorization has been refused for those credentials."
Consider HTTP servers that use separate lists of credentials for authentication to different URLs. A server should not have to check every list when a URL is requested. Because HTTP requests are completely independent of each other, it makes sense to return 401 Unauthorized whenever the supplied credentials are not in the one list that applies to the requested URL.
Furthermore, the description of 403 Forbidden includes: "Authorization will not help and the request SHOULD NOT be repeated." In contrast, if the user chooses to log in using the correct credentials, Authorization will help.
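The "always 401" approach might be sketched as follows. The names `respond` and `ALLOWED_VIEWERS`, the sample users, and the choice of the Basic scheme with realm "carpool" are all assumptions for illustration. RFC 2616 does require that a 401 response carry a `WWW-Authenticate` challenge, which is why the sketch includes one.

```python
from http import HTTPStatus
from typing import Dict, Optional, Set, Tuple

# Hypothetical data: who may view each existing user's location.
ALLOWED_VIEWERS: Dict[str, Set[str]] = {"angela": {"angela", "driver42"}}


def respond(target: str, viewer: Optional[str]) -> Tuple[HTTPStatus, Dict[str, str]]:
    """Return 401 whenever the viewer may not see the page, existing or not."""
    allowed = viewer is not None and viewer in ALLOWED_VIEWERS.get(target, set())
    if not allowed:
        # A nonexistent target has no allowed viewers, so it also lands here.
        headers = {"WWW-Authenticate": 'Basic realm="carpool"'}
        return HTTPStatus.UNAUTHORIZED, headers
    return HTTPStatus.OK, {}
```

In this sketch a request for an unregistered "john" and a request for "angela" by the wrong user produce identical responses, which is the property the question asks for.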