The switch from relational hadn’t been too hard because Riak is a key-value store, which made modeling relatively easy. Key-value stores are relatively simple database management systems that store just pairs of keys and values.
McCaul reckoned, too, that the migration of data had been made possible because the structure of patient records lent itself to Riak’s key-value model.
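To make the key-value idea concrete, here is a minimal sketch in Python. It is not Riak's API: the `KeyValueStore` class, the `patient:1001` key format, and the record fields are all hypothetical, chosen only to show a whole record stored and fetched as one opaque value under a single key.

```python
import json


class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical, not Riak's API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats the value as an opaque blob; here it is serialized JSON.
        self._data[key] = json.dumps(value)

    def get(self, key):
        # Return the deserialized value, or None when the key is absent.
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)


store = KeyValueStore()

# Hypothetical patient record: the whole document lives under one key.
store.put("patient:1001", {"name": "Example Patient", "allergies": ["penicillin"]})

record = store.get("patient:1001")
missing = store.get("patient:9999")
```

Because the record is read and written as a unit, there is no join or schema to model, which is presumably the kind of fit the quote describes.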
PostgreSQL’s structured format for saving JSON, called JSONB, eliminates the need for restructuring a document before it is committed to the database.
Even though our team specializes in MongoDB (and initially considered using CouchDB), we ended up using Amazon’s DynamoDB to complete the task. Here are the steps that led to the decision:
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high-performance data storage and retrieval system. Apache Accumulo is based on Google’s BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.
via Apache Accumulo.
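The two ideas in that description, sorted keys and cell-based access control, can be illustrated with a rough Python sketch. This is not Accumulo's API or data model: `SortedCellStore`, its methods, and the simple "set of required labels" rule are assumptions made for illustration, and real Accumulo keys and visibility rules are richer than what is shown here.

```python
import bisect


class SortedCellStore:
    """Toy sorted key/value store with per-cell labels (hypothetical, not Accumulo's API)."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column) keys
        self._cells = {}  # (row, column) -> (value, required_labels)

    def put(self, row, column, value, required_labels=()):
        key = (row, column)
        if key not in self._cells:
            # Keep keys in sorted order so range scans are cheap.
            bisect.insort(self._keys, key)
        self._cells[key] = (value, frozenset(required_labels))

    def scan(self, start_row, end_row, authorizations=()):
        """Yield cells with start_row <= row <= end_row that the caller may read."""
        auths = frozenset(authorizations)
        start = bisect.bisect_left(self._keys, (start_row, ""))
        for key in self._keys[start:]:
            if key[0] > end_row:
                break
            value, required = self._cells[key]
            # Simplified rule (an assumption): the caller must hold every required label.
            if required <= auths:
                yield key, value


store = SortedCellStore()
store.put("row2", "info:name", "Bob")
store.put("row1", "info:name", "Alice")
store.put("row1", "info:ssn", "000-00-0000", required_labels={"pii"})

public_view = list(store.scan("row1", "row2"))
privileged_view = list(store.scan("row1", "row2", authorizations={"pii"}))
```

A scan without the `pii` label simply never sees the restricted cell, so the access decision is made per cell rather than per table.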
So what can we conclude? Well, with the drivers here I focused primarily on ease of use. There are other factors that need to be considered as well. Do they support connection pooling, for example? Do they cache? What about pulling in large amounts of data? (Hint: Most of the better drivers for most of the popular languages support cursors, so you don’t have to pull all the data in at once.) Those are factors you’ll need to investigate as you choose a driver for the language and database you’re using. But in general, virtually all the popular languages today, including Java, PHP, Python, Perl, and even C++, have nice libraries that make database programming far easier than it used to be.
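As a small illustration of the cursor hint, here is a sketch using Python's standard `sqlite3` module. The `readings` table, the `iter_rows` helper, and the batch size are made up for the example; the point is only that `fetchmany` lets the caller walk a result set in chunks instead of loading every row with a single `fetchall`.

```python
import sqlite3

# In-memory database with a hypothetical table, just to have something to read.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings (value) VALUES (?)",
    [(i * 0.5,) for i in range(1000)],
)


def iter_rows(connection, query, batch_size=100):
    """Yield rows one at a time while fetching from the cursor in batches."""
    cursor = connection.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# Aggregate without ever holding the full result set in a Python list.
total = 0.0
count = 0
for _row_id, value in iter_rows(conn, "SELECT id, value FROM readings ORDER BY id"):
    total += value
    count += 1

conn.close()
```

With an in-memory SQLite database the saving is only illustrative, but the same pattern applies to drivers that talk to a remote server, where fetching in batches keeps memory use bounded.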
MongoDB does great with large, complex structures that are typically read individually, while the large relational databases do well when I’m processing huge amounts of data. And no, my clients’ data needs are nowhere near as big as Google’s, so we don’t encounter any performance or scalability problems.
MongoDB (from “humongous”) is a scalable, high-performance, open source NoSQL database. Written in C++, MongoDB features: