Since the NoSQL database model was introduced to the world, there’s been a flurry of proponents and detractors who seem to be split roughly 50-50. Some of the discussions have become very heated; others are just laugh-out-loud funny.

One of the things that has been discussed in the blogosphere is that you shouldn’t embrace NoSQL as the first solution to your problem. There is time, they say, to scale with NoSQL later on. To me that makes no sense: NoSQL engines have become mature technologies that any enterprise, big or small, can use.

Reddit user cogman10 wrote the following about this blog post:

Picking Tech A over Tech B is NOT a premature optimization. Would the author claim that “Using InnoDB is a premature optimization because MySQL is better supported!” It is called planning, you do that whenever you write a new application.

Use the database that best matches your data. If some non-relational database is a perfect match for the data you want to store, by all means use it. Don’t give two shits about people like the author that think SQL is the one and only query language. (hell, I wish that SQL would die in flames, but it is heavily built into current business models. Not because it is the best, but because it is common.)

Insisting on the wrong tech is not premature optimization. It is stupidity.

But in some cases, there might be a mixed option available: using both SQL and NoSQL.

High Write / Low Reads

Writes are expensive on SQL engines because, unless you use sharding, you usually write to a single master server. But sharding (in effect, writing to multiple “masters”) means your solution is no longer ACID (“Atomic, Consistent, Isolated, Durable” transactions): for example, a write on master 1 might not be available on master 2 for some time.

This is where NoSQL shines: writes can happen on any box, and even though these engines are BASE (“Basically Available, Soft state, Eventually consistent”), the “eventually” part is usually really fast.
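
To make the “eventually” part concrete, here is a toy Python sketch of asynchronous replication between two nodes. It is not how MongoDB or any other real engine replicates; the `Node` class and the explicit `replicate()` call are hypothetical stand-ins for a background replication process. It only shows the window in which a write accepted on one node is not yet visible on the other.

```python
from collections import deque


class Node:
    """A toy key-value node that queues writes it has not yet shipped to peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # writes waiting to be replicated

    def write(self, key, value):
        # Accept and acknowledge the write locally right away (basically available),
        # deferring propagation to other nodes (soft state).
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key):
        return self.data.get(key)


def replicate(source, target):
    """Hypothetical background job: ship every pending write from source to target."""
    while source.outbox:
        key, value = source.outbox.popleft()
        target.data[key] = value


a, b = Node("a"), Node("b")
a.write("page:42", "viewed")
print(b.read("page:42"))  # None: the write has not reached node b yet
replicate(a, b)
print(b.read("page:42"))  # viewed: the two nodes have converged
```

In a real cluster the replication step runs continuously, which is why the inconsistency window is usually short.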

The Mixed Approach

If you have a relational model that you would still like to use, you could keep the read-heavy data on a SQL engine (taking care to avoid overly complex joins, which are also very expensive in terms of load) and host the write-heavy tables on a NoSQL engine.

Why not code everything using NoSQL from the start? Because SQL has benefits, such as joins and other features, that you might not want to give up. You might also need to write much more code to adapt to the different data model that NoSQL requires.

Take a look at the diagram below:

As you can see in the diagram, web users come into a load-balanced cluster of web server instances, each connected both to MongoDB server instances (NoSQL) and to a MySQL slave server instance in a cloud environment. This part of the network can grow or shrink horizontally very easily by adding web servers, MongoDB servers or MySQL slave servers. I would probably group up to four web servers per MySQL slave and create a new MySQL slave instance for every new group of four.
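
The “up to four web servers per MySQL slave” rule of thumb boils down to simple arithmetic. The sketch below, whose function names are hypothetical, computes how many slaves a given number of web servers needs and which slave each web server (numbered from 0) would connect to under a fixed grouping.

```python
import math

WEB_SERVERS_PER_SLAVE = 4  # the rule of thumb from the text; tune it for your own load


def slaves_needed(web_servers: int) -> int:
    """Number of MySQL slaves so that no slave serves more than four web servers."""
    return math.ceil(web_servers / WEB_SERVERS_PER_SLAVE)


def slave_for(web_server_index: int) -> int:
    """Index of the slave that a given web server (0-based) connects to."""
    return web_server_index // WEB_SERVERS_PER_SLAVE


for n in (1, 4, 5, 9):
    print(n, "web servers ->", slaves_needed(n), "slave(s)")
print([slave_for(i) for i in range(9)])
```

Adding a fifth web server is what triggers the second slave, which matches the “new slave for every new group of four” policy.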

The MySQL master lives in a physical colocation environment, where processes run that update the relational data.

Let’s see an example. Say you run a heavily visited shopping website. Its complex product information requires many joins (product to manufacturer to inventory levels); that is the SQL piece. But you also need to track which product pages people visit, a sort of log of their activity. This high-write activity could happen on the MongoDB servers, and because the writes are spread across sharded MongoDB servers, they will scale.
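
Here is a minimal sketch of that split in Python, under loudly stated assumptions: an in-memory SQLite database stands in for the read-only MySQL slave, a plain list stands in for a MongoDB collection, and the table and column names are hypothetical. Each product page view runs one read-only join on the SQL side and appends one activity document on the NoSQL side. With pymongo, the append would become an `insert_one` call on a collection.

```python
import sqlite3
from datetime import datetime, timezone

# Stand-in for the read-only MySQL slave: an in-memory SQLite database.
sql = sqlite3.connect(":memory:")
sql.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, manufacturer_id INTEGER);
    CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER);
    INSERT INTO manufacturer VALUES (1, 'Acme');
    INSERT INTO product VALUES (42, 'Anvil', 1);
    INSERT INTO inventory VALUES (42, 7);
""")

# Stand-in for a MongoDB collection of page views: a list of documents.
page_views = []


def product_page(product_id: int, user_id: str) -> dict:
    # Read path (SQL piece): product -> manufacturer -> inventory in one join.
    row = sql.execute(
        """
        SELECT p.name, m.name, i.quantity
        FROM product p
        JOIN manufacturer m ON m.id = p.manufacturer_id
        JOIN inventory i ON i.product_id = p.id
        WHERE p.id = ?
        """,
        (product_id,),
    ).fetchone()
    if row is None:
        raise KeyError(product_id)
    # Write path (NoSQL piece): append one activity-log document per page view.
    page_views.append({
        "user": user_id,
        "product_id": product_id,
        "at": datetime.now(timezone.utc),
    })
    return {"product": row[0], "manufacturer": row[1], "in_stock": row[2]}


print(product_page(42, "user-1"))
```

Because the page view is a write that never touches MySQL, the relational side only ever serves reads.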

As traffic grows, you can add more MongoDB servers and the load will be distributed properly. The product information might change once per day or on some other low-frequency schedule, which makes it a perfect fit for a read-only environment. Remember, reads are cheap.
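
To illustrate how a shard key spreads writes across servers, here is a deliberately simplified hash partitioner in Python. It is not MongoDB’s algorithm: MongoDB assigns ranges (or hashed ranges) of the shard key to chunks and a balancer migrates chunks between shards. Plain modulo hashing, as used here, would also remap most keys whenever a shard is added, which is exactly why real systems use chunks or consistent hashing. The function names and the user-ID key are assumptions for the example.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


users = [f"user-{i}" for i in range(10_000)]
for shards in (2, 3, 4):
    print(shards, "shards:", sorted(distribution(users, shards).items()))
```

Each added shard receives a roughly equal share of the keys, and therefore of the writes keyed on them.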

If you need the MongoDB data in your colocation environment, you could run cron-based jobs that securely download it from the MongoDB servers on a schedule.
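
A scheduled pull like this is usually incremental: each run exports only the documents newer than the last checkpoint. The Python sketch below assumes the page-view documents carry an `at` timestamp (a hypothetical field name) and uses a plain list in place of the MongoDB query. Persisting the checkpoint between runs, the secure transport and the cron schedule itself (for example, a crontab entry such as `0 * * * *` for hourly runs) are left out.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def export_since(documents, checkpoint, out_path):
    """Append documents newer than `checkpoint` to a JSON-lines file; return the new checkpoint."""
    # With pymongo, this filter would be roughly collection.find({"at": {"$gt": checkpoint}}).
    fresh = sorted((d for d in documents if d["at"] > checkpoint), key=lambda d: d["at"])
    with open(out_path, "a", encoding="utf-8") as out:
        for doc in fresh:
            out.write(json.dumps({**doc, "at": doc["at"].isoformat()}) + "\n")
    # Advance to the newest exported timestamp, or keep the old one if nothing was new.
    return fresh[-1]["at"] if fresh else checkpoint


out_file = os.path.join(tempfile.mkdtemp(), "page_views.jsonl")
views = [
    {"user": "u1", "product_id": 42, "at": datetime(2011, 11, 7, 12, 0, tzinfo=timezone.utc)},
    {"user": "u2", "product_id": 7, "at": datetime(2011, 11, 7, 12, 5, tzinfo=timezone.utc)},
]
checkpoint = datetime.min.replace(tzinfo=timezone.utc)
checkpoint = export_since(views, checkpoint, out_file)  # first run exports both documents
checkpoint = export_since(views, checkpoint, out_file)  # second run finds nothing new
```

A strict “greater than” comparison can skip a document written later with exactly the checkpoint timestamp, so a production job would want a tie-breaker such as a monotonically increasing ID.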

You could also put a load balancer in front of multiple MySQL slaves if you want a setup that scales quickly.
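
The simplest policy such a load balancer can use is round-robin. Here is a minimal application-side sketch in Python; the slave host names are hypothetical, and in practice a dedicated TCP load balancer (HAProxy, for example) would usually do this job instead of application code.

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out replica addresses in turn; a stand-in for a real load balancer."""

    def __init__(self, replicas):
        if not replicas:
            raise ValueError("need at least one replica")
        self._replicas = cycle(list(replicas))

    def next_replica(self) -> str:
        return next(self._replicas)


pool = RoundRobinPool(["mysql-slave-1:3306", "mysql-slave-2:3306", "mysql-slave-3:3306"])
print([pool.next_replica() for _ in range(4)])
```

Adding capacity then means adding one more address to the pool; round-robin ignores per-slave load, so slaves of different sizes would call for a weighted policy.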

The main point I’m trying to make is this: use the right data solution for your problem, whether that is SQL, NoSQL or both. Don’t fixate on the technology; focus on what you need to accomplish.

7 Habits For Highly Effective Developers

November 7, 2011

A new developer joined our tech team this week, and I’ve often seen how it takes some time for new recruits to get the hang of a new development environment. I thought it’d be a good idea to sit down with him and give him some pointers so that he can move in the right [...]

When Is It Right For a Startup To Pivot?

October 24, 2011

A pivot is defined as a quick turn by either a company or a project. Sometimes it’s a shift in focus at a small startup. Other times we see it at companies as large as Google, as when it announced it was shutting down Google Buzz, a service that it announced to great fanfare but [...]

I Ain’t Goin’ Nowhere

October 16, 2011

Blogs are becoming harder and harder to maintain. Some are calling blogs dead. My blog has been no exception. It’s not that I don’t have ideas that I want to continue discussing with you, my faithful readers. It’s more that the platforms on which to share those ideas are becoming more and more powerful. Take a [...]

Spotify: The New Way to Enjoy Vinyl (Or the Closest Thing to It)

July 20, 2011

Like many people, especially fellow Generation X’ers, I used to buy vinyl records when I was a teenager. Yes, they were fragile, could get scratched easily and, if you played them too many times, would become unusable. But for me, vinyl records represent the long form of an artist’s [...]

Google+: It’s Not About Social, It’s All About SEO’s Next Frontier

July 1, 2011

When I first got into Google+ (thanks to my fellow blogger Rob Diana), I was expecting to see, like everyone else, what Google had developed to finally put a good dent in the social media space. We all saw this chart emerge from AllThingsD where Facebook was basically killing, in terms of time spent on site, [...]

The Keys to the Cloud Are Inside Smart Caching

June 14, 2011

There is a strong wind blowing through the Cloud space these days, and we are about to be part of a great shift in computing. Web apps seem to be the next logical frontier, one where URLs will be a thing of the past. ReadWriteWeb wrote the following about the new version of Google [...]

Rackspace Cloud vs. Amazon Cloud — Which is the Winner?

April 15, 2011

I’ve been an EC2 customer for 3 years now, and have been using Rackspace for the past 3 months. But I’ve been reading or hearing this question for what feels like an eternity: Which one is better, Rackspace Cloud Servers or Amazon EC2? To add fuel to the fire, Dave Winer recently posted an article [...]

Google TV is Limping Without Studios’ Support

October 23, 2010

My wife works in television and film production, and one thing I can tell you is that producing quality content is very, very expensive. It takes a lot of effort by a lot of people (and don’t ask me about those fancy dressing room requests by actors). On May 20, Google officially confirmed at their [...]

Facebook Groups is the Needed Step Towards Social Curation

October 10, 2010

One of the most discussed news items in the blogosphere this week was the launch of the new Facebook Groups, a reboot of the original groups product with a number of new features, some of which have been criticized. I have been playing with the product for a few days and I have to say this is [...]
