Since the NoSQL database model was introduced to the world, there’s been a flurry of proponents and detractors who seem to split roughly 50-50. Some of the discussions have become very heated; others are just laugh-out-loud funny.
One of the things that has been talked about in the blogosphere is that you shouldn’t embrace NoSQL as the first solution to your problem. There is time, they say, to scale with NoSQL later on. To me that makes no sense: NoSQL engines have become mature technologies that can be used by any enterprise, big or small.
Reddit user cogman10 commented on this blog post:
Picking Tech A over Tech B is NOT a premature optimization. Would the author claim that “Using InnoDB is a premature optimization because MySQL is better supported!” It is called planning, you do that whenever you write a new application.
Use the database that best matches your data. If some non-relational database is a perfect match for the data you want to store, by all means use it. Don’t give two shits about people like the author that think SQL is the one and only query language. (hell, I wish that SQL would die in flames, but it is heavily built into current business models. Not because it is the best, but because it is common.)
Insisting on the wrong tech is not premature optimization. It is stupidity.
But in some cases, there might be a mixed option available: using both SQL and NoSQL.
High Write / Low Reads
Writes are expensive on SQL engines. This is because, unless you use sharding, all writes usually go to a single master server. But sharding (in effect, writing to multiple “masters”) means your solution is no longer ACID (Atomic, Consistent, Isolated, Durable) across the whole data set: a write on master 1 might not be available on master 2 for some time.
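To make the atomicity point concrete, here is a minimal Python sketch that uses two independent in-memory SQLite databases as stand-ins for two shard masters. The `stock` and `orders` tables, the `place_order` function and the simulated crash are all hypothetical. Each shard commits its own local transaction, so a failure between the two commits leaves the data inconsistent, which a single ACID transaction would have prevented.

```python
import sqlite3

# Hypothetical setup: two independent in-memory SQLite databases stand in for
# two shard "masters". Each commits on its own; nothing coordinates them.
inventory_shard = sqlite3.connect(":memory:")
orders_shard = sqlite3.connect(":memory:")
inventory_shard.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
orders_shard.execute("CREATE TABLE orders (sku TEXT NOT NULL, qty INTEGER NOT NULL)")
inventory_shard.execute("INSERT INTO stock VALUES ('widget', 10)")
inventory_shard.commit()


def place_order(sku, qty, crash_between_commits=False):
    """Reserve stock on one shard, then record the order on another."""
    with inventory_shard:  # commits here if no exception is raised
        inventory_shard.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
    if crash_between_commits:
        # Simulated failure after the first shard has already committed.
        raise RuntimeError("lost connection to the orders shard")
    with orders_shard:
        orders_shard.execute("INSERT INTO orders VALUES (?, ?)", (sku, qty))


def state():
    """Return (units left in stock, units recorded as ordered)."""
    stock = inventory_shard.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0]
    ordered = orders_shard.execute("SELECT COALESCE(SUM(qty), 0) FROM orders").fetchone()[0]
    return stock, ordered


place_order("widget", 3)
print(state())  # (7, 3): stock + ordered == 10
try:
    place_order("widget", 2, crash_between_commits=True)
except RuntimeError:
    pass
print(state())  # (5, 3): 2 units left stock but no order was recorded
```

Real systems work around this with techniques such as two-phase commit, or by choosing shard keys so related rows live on the same shard; the sketch only shows the failure mode.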
This is where NoSQL shines: writes can happen on any box, and even though they’re BASE (Basically Available, Soft state, Eventually consistent), the “eventually” part is usually really fast.
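A toy model helps show what “eventually” means. This Python sketch is not how MongoDB replicates; it simply keeps one dictionary per box (the names `box1` to `box3` are made up), applies each write locally at once, and appends it to a shared log. Until `replicate()` drains that log, a read on another box can return stale data.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy multi-box store: writes are applied locally, then replicated later."""

    def __init__(self, boxes):
        self.data = {box: {} for box in boxes}  # one key/value dict per box
        self.log = deque()  # (key, value) writes not yet propagated, in arrival order

    def write(self, box, key, value):
        # Basically Available: any box accepts the write immediately.
        self.data[box][key] = value
        self.log.append((key, value))

    def read(self, box, key):
        # Soft state: a box may not have seen recent writes yet.
        return self.data[box].get(key)

    def replicate(self):
        # Eventually consistent: once the log is drained, every box holds
        # the last logged value for each key.
        while self.log:
            key, value = self.log.popleft()
            for box_data in self.data.values():
                box_data[key] = value


store = EventuallyConsistentStore(["box1", "box2", "box3"])
store.write("box1", "cart:42", ["widget"])
print(store.read("box2", "cart:42"))  # None: not replicated yet
store.replicate()
print(store.read("box2", "cart:42"))  # ['widget']
```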
The Mixed Approach
If you have a relational model that you would still like to use, you could leave the read-heavy data on a SQL engine (being careful to avoid very complex joins, which are also expensive in terms of load) and host the write-heavy tables on a NoSQL engine.
Why not code everything using NoSQL from the start? Because there are benefits to using SQL, like joins and other features that you might not want to give up. You might also need to write much more code to adapt to the different data approach that NoSQL requires.
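One way to make the split explicit is a small routing rule. The following Python sketch is a hypothetical heuristic, not a standard: tables the application joins against stay on the SQL engine, and the remaining tables move to NoSQL when they are written more often than they are read. The table names and traffic figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    name: str
    writes_per_sec: float
    reads_per_sec: float
    needs_joins: bool  # does the application rely on SQL joins against this table?


def choose_backend(t: TableProfile, write_read_ratio: float = 1.0) -> str:
    """Hypothetical heuristic for the mixed approach.

    Tables that are joined against stay in SQL; of the rest, those written
    more often than read (beyond the given ratio) go to the NoSQL engine.
    """
    if t.needs_joins:
        return "sql"
    reads = max(t.reads_per_sec, 1e-9)  # avoid division by zero
    return "nosql" if t.writes_per_sec / reads > write_read_ratio else "sql"


# Invented example traffic figures.
profiles = [
    TableProfile("products", writes_per_sec=0.01, reads_per_sec=500, needs_joins=True),
    TableProfile("inventory", writes_per_sec=2, reads_per_sec=500, needs_joins=True),
    TableProfile("page_views", writes_per_sec=800, reads_per_sec=5, needs_joins=False),
]
for p in profiles:
    print(p.name, "->", choose_backend(p))
# products -> sql, inventory -> sql, page_views -> nosql
```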
Take a look at the diagram below:
As you can see in the diagram, web users reach a load-balanced cluster of web server instances, each connected both to the MongoDB server instances (NoSQL) and to a MySQL slave server instance, all running in a cloud environment. This part of the network can grow or shrink horizontally very easily by adding web servers, MongoDB servers or MySQL slave servers. I would probably group up to four web servers per MySQL slave, and then create a new MySQL slave instance for every new group of four.
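The four-web-servers-per-slave rule of thumb is simple arithmetic, sketched here in Python. The constant comes from the text above; assigning web servers to slaves by index is an assumption for illustration.

```python
import math

WEB_SERVERS_PER_SLAVE = 4  # rule of thumb from the text above


def slaves_needed(web_servers: int) -> int:
    """One MySQL slave per group of up to four web servers."""
    return math.ceil(web_servers / WEB_SERVERS_PER_SLAVE)


def slave_for(web_server_index: int) -> int:
    """Hypothetical assignment: web servers 0-3 use slave 0, 4-7 use slave 1, and so on."""
    return web_server_index // WEB_SERVERS_PER_SLAVE


for n in (1, 4, 5, 9):
    print(f"{n} web servers -> {slaves_needed(n)} MySQL slave(s)")
# 1 -> 1, 4 -> 1, 5 -> 2, 9 -> 3
print([slave_for(i) for i in range(9)])  # [0, 0, 0, 0, 1, 1, 1, 1, 2]
```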
The MySQL Master lives in a physical colocation environment and there are processes running there that update the relational data.
Let’s see an example. Say you run a heavily visited shopping website with complex product information that involves many joins (product to manufacturer to inventory levels; that would be the SQL piece), but you also need to track which product pages people visit, a sort of log of their activity. This high-write activity could happen on the MongoDB servers, and because the writes are spread across sharded MongoDB servers, they scale.
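Here is a minimal Python sketch of that split. SQLite stands in for the MySQL read slave and a plain list of dictionaries stands in for a MongoDB collection, so it runs with the standard library only; the schema, the `product_page` function and the sample rows are hypothetical. The read path does the three-way join; the write path only appends a small document per page view.

```python
import sqlite3
from datetime import datetime, timezone

# SQL side (stand-in for the MySQL read slave): product -> manufacturer -> inventory.
sql = sqlite3.connect(":memory:")
sql.executescript("""
CREATE TABLE manufacturers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       manufacturer_id INTEGER REFERENCES manufacturers(id));
CREATE TABLE inventory (product_id INTEGER REFERENCES products(id), qty INTEGER NOT NULL);
INSERT INTO manufacturers VALUES (1, 'Acme');
INSERT INTO products VALUES (10, 'Anvil', 1);
INSERT INTO inventory VALUES (10, 25);
""")

# NoSQL side: a plain list of dicts stands in for a MongoDB collection of
# page-view documents (append-only, write-heavy).
page_views = []


def product_page(product_id: int, user_id: str) -> dict:
    # Read path: the join-heavy query stays on the SQL engine.
    name, manufacturer, qty = sql.execute(
        """SELECT p.name, m.name, i.qty
           FROM products p
           JOIN manufacturers m ON m.id = p.manufacturer_id
           JOIN inventory i ON i.product_id = p.id
           WHERE p.id = ?""",
        (product_id,),
    ).fetchone()
    # Write path: one small document per view, no joins involved.
    page_views.append({
        "user_id": user_id,
        "product_id": product_id,
        "viewed_at": datetime.now(timezone.utc),
    })
    return {"name": name, "manufacturer": manufacturer, "in_stock": qty}


print(product_page(10, "u1"))  # {'name': 'Anvil', 'manufacturer': 'Acme', 'in_stock': 25}
print(len(page_views))  # 1
```

With pymongo, the append would become something like `collection.insert_one(doc)` against the sharded cluster, and the SQL query would go to the MySQL slave through whichever driver you use.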
As traffic grows, you can add more MongoDB servers and the load will be redistributed across them. The product information might change once a day or on some other low-frequency schedule, which makes it a perfect fit for a read-only environment. Remember, reads are cheap.
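To see why adding servers spreads the load, here is a Python sketch that routes hypothetical session IDs to shards by a stable hash and counts how many land on each. It illustrates hash-based sharding in general, not MongoDB’s exact algorithm.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash routing (Python's built-in hash() is randomized per process for str)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_per_shard(keys, num_shards):
    """Count how many keys each shard receives."""
    return Counter(shard_for(k, num_shards) for k in keys)


session_ids = [f"session-{i}" for i in range(30_000)]  # hypothetical keys
for n in (2, 3, 4):
    counts = load_per_shard(session_ids, n)
    print(n, "shards:", sorted(counts.values()))
```

One caveat: plain modulo routing remaps most keys whenever the shard count changes. MongoDB avoids that by splitting data into chunks and letting its balancer migrate some of them to a newly added shard.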
If you need the MongoDB data in your colocation environment, you could run cron jobs that securely download it from the MongoDB servers on a schedule.
You could also put a load balancer in front of multiple MySQL slaves if you wanted a setup that scales quickly.
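A scheduled export job could look roughly like the following Python sketch. It copies only documents newer than the previous run’s checkpoint into a SQL table; the in-memory list stands in for documents fetched from MongoDB, and the table layout, the `export_new_views` function and the crontab path are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Example crontab entry (hypothetical path), hourly:
#   0 * * * * python3 /opt/jobs/export_page_views.py


def export_new_views(source_docs, dest, last_run):
    """Copy page-view documents newer than `last_run` into a SQL table.

    Returns the new checkpoint so the next scheduled run only picks up fresh data.
    """
    new_docs = [d for d in source_docs if d["viewed_at"] > last_run]
    with dest:  # one transaction per run
        dest.executemany(
            "INSERT INTO page_views (user_id, product_id, viewed_at) VALUES (?, ?, ?)",
            [(d["user_id"], d["product_id"], d["viewed_at"].isoformat()) for d in new_docs],
        )
    return max((d["viewed_at"] for d in new_docs), default=last_run)


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
mongo_docs = [  # stand-in for documents fetched from MongoDB
    {"user_id": "u1", "product_id": 10, "viewed_at": t0 + timedelta(minutes=1)},
    {"user_id": "u2", "product_id": 11, "viewed_at": t0 + timedelta(minutes=2)},
]
warehouse = sqlite3.connect(":memory:")  # stand-in for the colocation database
warehouse.execute("CREATE TABLE page_views (user_id TEXT, product_id INTEGER, viewed_at TEXT)")

checkpoint = export_new_views(mongo_docs, warehouse, last_run=t0)  # first run copies 2
mongo_docs.append({"user_id": "u3", "product_id": 10, "viewed_at": t0 + timedelta(minutes=3)})
checkpoint = export_new_views(mongo_docs, warehouse, checkpoint)  # second run copies only u3
print(warehouse.execute("SELECT COUNT(*) FROM page_views").fetchone()[0])  # 3
```

Against a real cluster, the filter would typically be a pymongo query such as `collection.find({"viewed_at": {"$gt": last_run}})`. A strict timestamp checkpoint can miss documents that arrive late with older timestamps, so a production job might re-read a small overlap window and de-duplicate.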
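In practice this role is usually played by a dedicated proxy such as HAProxy, but the core idea fits in a few lines. The Python sketch below hands out read replicas round-robin and skips any marked unhealthy; the host names are hypothetical, and real health checks are left out.

```python
import itertools


class RoundRobinReadPool:
    """Hand out MySQL read slaves in turn, skipping any marked unhealthy."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.healthy = set(self.hosts)
        self._cycle = itertools.cycle(self.hosts)

    def mark_down(self, host):
        self.healthy.discard(host)

    def mark_up(self, host):
        self.healthy.add(host)

    def next_host(self):
        # Try each host at most once per call before giving up.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            if host in self.healthy:
                return host
        raise RuntimeError("no healthy read replicas")


pool = RoundRobinReadPool(["mysql-slave-1", "mysql-slave-2", "mysql-slave-3"])
print([pool.next_host() for _ in range(4)])  # slave-1, slave-2, slave-3, slave-1
pool.mark_down("mysql-slave-2")
print([pool.next_host() for _ in range(3)])  # slave-3, slave-1, slave-3
```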
The main point I’m trying to make is this: use the right data solution for your problem, whether that’s SQL, NoSQL or both. Don’t fixate on the technology; focus on what you need to accomplish.

