- Size of your data: Evaluate how large your data is. Is storing it in a relational database like MySQL or Oracle an option? How many tables do you need to create? How many columns will each table have on average and, most importantly, how many rows will each table hold?
- Flexibility: In a relational database, you need to create the schema first. Do you need more flexibility than that? In one of my projects I worked on a logging module where the structure of the logs differed from one entry to the next, so I wanted a flexible schema for it.
- Data Retrieval or Faster Write Access: Some applications require fast data retrieval, some require fast write access, and a few require both. Think about Google Search, where fast data retrieval is very important, whereas an application like Twitter has to handle a large number of write operations because of the volume of tweets.
- Concurrent Read & Write Access: It’s not just the speed of the application that matters but also the concurrent read and write access that should be taken into consideration. Think about the number of Facebook users writing simultaneously on various sections of the website.
- Are you creating an application for Analytics?
- Social network Integration: Does your application have social-network features? Facebook is a big inspiration to choose NoSQL if your application has similar features. Facebook Engineering Notes and Facebook Engineering Papers are good sources of information about the latest technologies used at Facebook.
- Have you ruled out indexing, caching, etc. as a solution to your problem?
- MongoDB: Written in C++. It can be used from languages like Java, C, C++, PHP, etc. The data interchange format is BSON (Binary JSON).
- Cassandra: Written in Java, with Thrift used for the external client-facing API. It supports a wide range of languages (the ones Thrift supports) and blends features of Google’s BigTable and Amazon’s Dynamo. Cassandra was developed and later open-sourced by Facebook.
- HBase: It is built on top of Hadoop and written in Java. It is used when realtime read/write access to big data is needed.
- Neo4j: A NoSQL graph database.
- Redis: An advanced key-value store.
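To make the flexibility point concrete, here is a minimal sketch in plain Python, with no database involved. It mimics how a document-style store lets each log entry carry its own set of fields. The records, field names and the helpers `fields_in_use` and `find` are all made up for illustration; they are not part of any database API.

```python
import json

# Hypothetical log records: each one carries different fields.
# A fixed relational schema would need a nullable column for every field.
log_records = [
    {"level": "INFO", "message": "user logged in", "user_id": 42},
    {"level": "ERROR", "message": "payment failed", "order_id": "A-17", "retry_count": 3},
    {"level": "DEBUG", "message": "cache miss", "key": "session:42"},
]


def fields_in_use(records):
    """Return the sorted union of field names across all records."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


def find(records, **criteria):
    """Return the records whose fields match every given key/value pair."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Six distinct fields across three records, none shared by all except two.
    print(fields_in_use(log_records))
    # Records that lack a field simply do not match a query on it.
    print(json.dumps(find(log_records, level="ERROR"), indent=2))
```

In a relational table, those three records would need six columns with most of them empty; in a document model each record is stored as-is.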
My approach is to choose the one that suits your application's data model. If your application has something like a social graph (some of the social networking features), then use a graph database like Neo4j. If your application requires storing and processing a very large amount of data, then use a column-oriented database like HBase (Google’s BigTable also belongs to the column-family category).
If you want fast lookups, choose something like Redis, which stores key/value pairs. When the data structure can vary and you need document-style storage, go for something like MongoDB. If your application requires high concurrency and low-latency data access, go for Membase.
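The rule of thumb above can be written down as a small lookup table. This is only a sketch of my own reasoning, not an exhaustive selection tool: the requirement labels and the `recommend` function are hypothetical names chosen for this example.

```python
# Hypothetical labels for the data-model questions discussed above,
# mapped to the database suggested for each one.
RECOMMENDATIONS = {
    "social_graph": "Neo4j (graph database)",
    "very_large_data_processing": "HBase (column-oriented)",
    "fast_lookups": "Redis (key/value)",
    "varying_document_structure": "MongoDB (document store)",
    "high_concurrency_low_latency": "Membase",
}


def recommend(requirement):
    """Return the suggested store for a requirement label, or a fallback note."""
    return RECOMMENDATIONS.get(
        requirement,
        "no direct match; revisit the data model or stay relational",
    )


if __name__ == "__main__":
    for label in ("social_graph", "fast_lookups", "reporting"):
        print(label, "->", recommend(label))
```

Real applications often match more than one row, in which case the dominant access pattern should decide.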
Even once you have chosen an appropriate NoSQL database, there are still a few things you should check:
1. Is the DB that you have chosen easy to manage and administer?
2. Developer’s Angle – Do you have the right set of people who can get started on it quickly? Does it have enough forums and community support? Affirmative answers to these questions are a must, since NoSQL DBs have not matured yet and are still emerging.
3. Are open source communities actively building tools/frameworks/utilities around it in order to make the developer’s life easy?
Since I am a regular reader of the High Scalability site, I would recommend going through www.highscalability.com/blog/category/nosql, which has around 38 informative articles on NoSQL.
Apart from this, InfoQ also has good content for NoSQL: www.infoq.com/nosql
Another hot place these days to get smart answers is Quora. Do read the various NoSQL-related questions and their answers, written by developers, engineers and architects from top organizations like Facebook, Twitter, LinkedIn, Amazon and various hot startups, at www.quora.com/NoSQL.
The list is indeed long, but I am not going to publish too many URLs and divert your attention 🙂