How big is too big for a PostgreSQL table?

I'm working on the design for a RoR project for my company, and our development team has already run into a bit of a debate about the design, specifically the database.

We have a model called Message that needs to be persisted. It's a very, very small model with only three db columns other than the id, however there will likely be A LOT of these models when we go to production. We're looking at as much as 1,000,000 insertions per day. The models will only ever be searched by two foreign keys on them which can be indexed. As well, the models never have to be deleted, but we also don't have to keep them once they're about three months old.

So, what we're wondering is if implementing this table in Postgres will present a significant performance issue? Does anyone have experience with very large SQL databases to tell us whether or not this will be a problem? And if so, what alternative should we go with?

Answers


Rows per a table won't be an issue on it's own.

So roughly speaking 1 million rows a day for 90 days is 90 million rows. I see no reason Postgres can't deal with that, without knowing all the details of what you are doing.

Depending on your data distribution you can use a mixture of indexes, filtered indexes, and table partitioning of some kind to speed thing up once you see what performance issues you may or may not have. Your problem will be the same on any other RDMS that I know of. If you only need 3 months worth of data design in a process to prune off the data you don't need any more. That way you will have a consistent volume of data on the table. Your lucky you know how much data will exist, test it for your volume and see what you get. Testing one table with 90 million rows may be as easy as:

select x,1 as c2,2 as c3
from generate_series(1,90000000) x;

https://wiki.postgresql.org/wiki/FAQ

Limit   Value
Maximum Database Size       Unlimited
Maximum Table Size          32 TB
Maximum Row Size            1.6 TB
Maximum Field Size          1 GB
Maximum Rows per Table      Unlimited
Maximum Columns per Table   250 - 1600 depending on column types
Maximum Indexes per Table   Unlimited

Another way to speed up your queries significantly on a table with > 100 million rows is in the off hours cluster the table on the index that is most often used in your queries. We have a table with > 218 million rows and have found 30X improvements.

Also, for a very large table, it's a good idea to create an index on your foreign keys.


Need Your Help

Forward SSH traffic through a middle machine

command-line bash ssh git tunnel

SSH tunneling is very confusing to me. I am wondering if I can do this in Linux.

What's the difference between faking, mocking, and stubbing?

unit-testing mocking terminology definition stub

I know how I use these terms, but I'm wondering if there are accepted definitions for faking, mocking, and stubbing for unit tests? How do you define these for your tests? Describe situations whe...