Relational databases have been the mainstay of computing applications for decades, but the need to handle data in web-scale systems has led to the creation of numerous NoSQL databases. However, the differences between NoSQL databases are much greater than those that ever existed between one SQL database and another. This places a greater responsibility on software architects to choose the appropriate one right at the beginning of a project.
There are hundreds of readily available NoSQL databases, and each has different use cases. With that in mind, here is a comparison of open-source NoSQL databases:
Cassandra
Cassandra is an open-source, shared-nothing NoSQL wide-column store originally developed at Facebook, based on the ideas behind Google Bigtable and Amazon Dynamo. It supports a SQL-like query language called CQL, alongside other protocols.
Best used: When you need to store data so large that it doesn't fit on a single server, but still want a friendly interface.
When/Where: Web analytics, to count hits by hour, by browser, by IP, etc. Transaction logging.
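To make the hit-counting use case more concrete, here is a minimal plain-Python sketch of how such data is often modeled in Cassandra: a partition key decides which node holds a group of rows, and clustering columns order the rows inside that partition. This is a conceptual model only, not the Cassandra driver API, and the layout (site plus hour bucket as partition key, browser as clustering column) is a hypothetical example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical model of a Cassandra-style counter table (not the real API):
#   partition key  -> (site, hour bucket)
#   clustering key -> browser (returned in sorted order within a partition)
#   value          -> hit counter
partitions = defaultdict(lambda: defaultdict(int))


def record_hit(site: str, ts: datetime, browser: str) -> None:
    """Increment the counter for one hit, bucketed by hour."""
    hour_bucket = ts.strftime("%Y-%m-%d %H:00")
    partitions[(site, hour_bucket)][browser] += 1


def hits_for_hour(site: str, hour_bucket: str) -> list[tuple[str, int]]:
    """Read one whole partition, ordered by its clustering key (browser)."""
    rows = partitions.get((site, hour_bucket), {})
    return sorted(rows.items())


record_hit("example.com", datetime(2017, 5, 1, 10, 15), "firefox")
record_hit("example.com", datetime(2017, 5, 1, 10, 42), "chrome")
record_hit("example.com", datetime(2017, 5, 1, 10, 59), "chrome")
record_hit("example.com", datetime(2017, 5, 1, 11, 5), "safari")

print(hits_for_hour("example.com", "2017-05-01 10:00"))
# [('chrome', 2), ('firefox', 1)]
```

The design choice to illustrate is that a query for one site and one hour reads a single partition, so it can be served by the nodes that own that partition instead of scanning the whole cluster.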
Couchbase
Couchbase is a combination of Membase, a key-value system with memcached compatibility, and CouchDB. It can be used as a key-value store, but is generally considered a document store for JSON documents.
Best used: Any application where low-latency data access, high concurrency and high availability are requirements.
When/Where: Low-latency use cases such as ad targeting, or highly concurrent web apps such as online gaming.
Aerospike
Aerospike is a shared-nothing NoSQL key-value database that mainly offers AP characteristics (availability and partition tolerance, in CAP terms). It is an in-memory database with disk persistence, automatic data partitioning and synchronous replication, and it offers cross-datacenter replication and a configurable failover mechanism that can favor either availability or consistency.
Best used: As a session store or user profile store, and for ID mapping, dynamic web portals, fraud detection and real-time bidding (RTB).
When/Where: Storing massive amounts of profile data in online advertising or retail web sites.
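The "automatic data partitioning" mentioned above generally means hashing each key to one of a fixed number of partitions and assigning those partitions to nodes. The sketch below shows only that general idea; the partition count of 4,096, the use of SHA-1 as the digest and the round-robin partition map are assumptions chosen for illustration, not a description of Aerospike's internals.

```python
import hashlib

N_PARTITIONS = 4096  # assumed fixed partition count for this sketch


def partition_id(key: str) -> int:
    """Map a key to a partition by hashing it (SHA-1 is a stand-in digest)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Take the first two bytes of the digest and reduce them to 0..4095.
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes: list[str]) -> dict[int, str]:
    """Assign partitions to nodes round-robin (a simplified partition map)."""
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


nodes = ["node-a", "node-b", "node-c"]
pmap = build_partition_map(nodes)

for user in ["user:1001", "user:1002", "user:1003"]:
    pid = partition_id(user)
    print(user, "-> partition", pid, "on", pmap[pid])
```

Because the partition of a key is a pure function of the key, a client holding the partition map can compute where a record lives and send the request straight to that node.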
HBase
HBase is an open-source database written in Java and developed by the Apache Software Foundation. It is an open-source implementation of the Google Bigtable design and relies on the Apache Hadoop framework and Apache ZooKeeper.
Best used: It is probably still the best option for running MapReduce jobs on huge datasets.
When/Where: Any place where scanning huge, two-dimensional, join-less tables is a requirement.
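Scanning is efficient in Bigtable-style stores such as HBase because rows are kept sorted by row key, so a range or prefix scan reads a contiguous run of rows. The sketch below models this with a sorted Python list and `bisect`; it is not the HBase client API, and the row-key format `"<sensor id>#<timestamp>"` is a hypothetical design.

```python
import bisect

# Rows kept sorted by row key, as a Bigtable-style store keeps them on disk.
# Hypothetical row-key design: "<sensor id>#<timestamp>".
rows = sorted([
    ("sensor-2#2017-05-01T10:00", {"temp": "21.5"}),
    ("sensor-1#2017-05-01T10:00", {"temp": "19.0"}),
    ("sensor-1#2017-05-01T11:00", {"temp": "19.4"}),
    ("sensor-3#2017-05-01T10:00", {"temp": "23.1"}),
])
keys = [k for k, _ in rows]


def scan(start_row: str, stop_row: str):
    """Yield rows with start_row <= key < stop_row (a range scan)."""
    lo = bisect.bisect_left(keys, start_row)
    hi = bisect.bisect_left(keys, stop_row)
    for key, columns in rows[lo:hi]:
        yield key, columns


def prefix_scan(prefix: str):
    """Scan every row whose key starts with prefix."""
    # The smallest key greater than every key with this prefix is the
    # prefix with its last character incremented.
    stop_row = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan(prefix, stop_row)


for key, columns in prefix_scan("sensor-1#"):
    print(key, columns)
```

Choosing a row key that groups related data (here, all readings of one sensor) is what turns a query into a single sequential scan instead of scattered lookups.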
Redis
Redis is an open-source (BSD-licensed), in-memory data structure store with optional durability, used as a database, cache and message broker.
Best used: For rapidly changing data with a foreseeable database size.
When/Where: Storing real-time stock prices, real-time analytics, real-time communication, and anywhere you used memcached before.
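Replacing memcached usually means a cache-aside pattern over keys that expire; Redis supports per-key expiry, for example with `SET key value EX seconds`. The sketch below models that behavior in plain Python with an injectable clock so it can be tested. The `ExpiringStore` class and `get_price` helper are hypothetical names for illustration, not the API of any Redis client.

```python
import time
from typing import Callable, Optional


class ExpiringStore:
    """Toy in-memory key-value store with per-key TTL (Redis-like expiry)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired key on access
            return None
        return value


def get_price(store: ExpiringStore, symbol: str, fetch: Callable[[str], str]) -> str:
    """Cache-aside: try the cache, otherwise fetch and cache for 5 seconds."""
    cached = store.get(symbol)
    if cached is not None:
        return cached
    price = fetch(symbol)
    store.set(symbol, price, ttl=5)
    return price


now = [0.0]  # fake clock for the example
store = ExpiringStore(clock=lambda: now[0])
calls = []


def slow_fetch(symbol: str) -> str:
    """Hypothetical stand-in for a slow backing data source."""
    calls.append(symbol)
    return "101.25"


get_price(store, "ACME", slow_fetch)  # miss: fetched and cached
get_price(store, "ACME", slow_fetch)  # hit: served from the cache
now[0] = 6.0                          # the 5-second TTL has passed
get_price(store, "ACME", slow_fetch)  # expired: fetched again
print(len(calls))  # 2
```

The short TTL suits rapidly changing data such as prices: readers tolerate a few seconds of staleness in exchange for not hitting the backing source on every request.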
RethinkDB
RethinkDB is an open-source, distributed, document-oriented NoSQL database. It stores JSON documents with dynamic schemas and is designed to push real-time updates of query results to applications.
Best used: Applications where you need constant real-time updates.
When/Where: Displaying live sports scores on screens or online; monitoring systems.
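The push model described above (RethinkDB calls these subscriptions changefeeds, and each change carries an `old_val` and a `new_val`) can be sketched with a simple observer pattern. This plain-Python model only illustrates the idea of the database notifying subscribers on every write; `LiveTable` and its methods are hypothetical and are not the RethinkDB driver API.

```python
from typing import Callable, Optional

Change = dict  # shaped like {"old_val": ..., "new_val": ...}


class LiveTable:
    """Toy document table that pushes every change to its subscribers."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}
        self._subscribers: list[Callable[[Change], None]] = []

    def changes(self, callback: Callable[[Change], None]) -> None:
        """Register a callback, loosely mirroring a changefeed subscription."""
        self._subscribers.append(callback)

    def upsert(self, doc_id: str, doc: dict) -> None:
        """Insert or replace a document, then notify every subscriber."""
        old: Optional[dict] = self._docs.get(doc_id)
        new = dict(doc, id=doc_id)
        self._docs[doc_id] = new
        for notify in self._subscribers:
            notify({"old_val": old, "new_val": new})


scores = LiveTable()
scoreboard: list[str] = []

# A display that re-renders whenever a game's score changes.
scores.changes(lambda c: scoreboard.append(
    f"{c['new_val']['home']}-{c['new_val']['away']}"))

scores.upsert("game-1", {"home": 0, "away": 0})
scores.upsert("game-1", {"home": 1, "away": 0})
print(scoreboard)  # ['0-0', '1-0']
```

The point of the design is that the display never polls: it reacts to changes as the database emits them.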
Accumulo
Apache Accumulo is a sorted, distributed key/value store based on Google's Bigtable design. It provides cell-level access labels and server-side programming mechanisms (iterators).
Best used: When you need to restrict access at the cell level.
When/Where: Search engines, log data analysis and any place where scanning huge, two-dimensional, join-less tables is a requirement.
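Cell-level access control in Accumulo works by attaching a visibility expression to each cell, built from authorization tokens combined with `&` (and), `|` (or) and parentheses; a user can read the cell only if their authorizations satisfy the expression, and an empty label is visible to everyone. The sketch below is a small hand-written evaluator for such expressions, not Accumulo's own implementation, and the sample cells and authorizations are made up.

```python
import re

# Authorization tokens, or one of the operator/parenthesis characters.
TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def tokenize(expr: str) -> list[str]:
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def can_see(expr: str, auths: set[str]) -> bool:
    """Evaluate a visibility expression such as "admin|(hr&audit)"."""
    if not expr:
        return True  # an empty label is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_expr() -> bool:
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parsed so tokens are consumed
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return value


# Hypothetical cells: (row, column) -> (value, visibility label)
cells = {
    ("row1", "salary"): ("85000", "hr&finance"),
    ("row1", "name"): ("Alice", ""),
    ("row1", "ssn"): ("***-**-1234", "admin|(hr&audit)"),
}
user_auths = {"hr", "finance"}
visible = {col: val for (row, col), (val, label) in cells.items()
           if can_see(label, user_auths)}
print(sorted(visible))  # ['name', 'salary']
```

Filtering happens per cell, so two users scanning the same row can see different subsets of its columns.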
Neo4j
Neo4j is a graph database management system developed by Neo Technology, Inc. Its developers describe it as an ACID-compliant transactional database with native graph storage and processing.
Best used: For graph-style, rich or complex, interconnected data.
When/Where: Searching for routes in social networks, public transport links, road maps or network topologies.
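Route searching of the kind listed above boils down to graph traversal. In Neo4j such a query would normally be written in Cypher (which provides a `shortestPath` function); the plain-Python sketch below shows the underlying idea instead, a breadth-first search that finds the route with the fewest hops. The station graph is made-up example data.

```python
from collections import deque
from typing import Optional

# Hypothetical public-transport graph: station -> directly connected stations.
links = {
    "Central": ["Market", "Harbor"],
    "Market": ["Central", "University"],
    "Harbor": ["Central", "Airport"],
    "University": ["Market", "Airport"],
    "Airport": ["Harbor", "University"],
    "Depot": [],
}


def shortest_route(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return a path with the fewest hops, or None."""
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


print(shortest_route(links, "Central", "Airport"))  # ['Central', 'Harbor', 'Airport']
print(shortest_route(links, "Central", "Depot"))    # None
```

A graph database stores each station's connections directly with the node, so each step of this traversal is a pointer hop rather than a join, which is why such queries stay fast as the data grows.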
Before choosing a NoSQL database, we must evaluate several quality attributes: availability, consistency, durability, maintainability, performance, reliability, robustness and scalability. In a future post, we will look at each of these attributes in detail to see how each one affects the software quality of the database.
About Carlos Alvarez
Carlos is a software engineer with more than 8 years of experience developing web and mobile applications for some of the most important Fortune 500 companies.
Carlos currently works in the Engineering department at TISA, where he explores the latest technologies and frameworks for use in future projects.
Beyond his technical knowledge and passion for technology, Carlos enjoys motorbikes and mixed martial arts.