CloverDX Blog on Data Integration

Learn How to Manage CouchDB with CloverDX

Written by CloverDX | May 20, 2014

NoSQL databases are fast and used for high volumes of data and documents – and they are today's rising stars. When a customer asked us for help to manage CouchDB, we decided it would be helpful to others if we shared some tips in a blog post.

Apache’s CouchDB is an open source NoSQL document-oriented database, an alternative to the well known MongoDB. It's free and pretty functional, and because CloverDX can read and write JSON and has an HTTPconnector, we can work with and manage it pretty easily.

About Apache CouchDB

A brief introduction from the Apache CouchDB documentation describes the interface method and the way it stores data:

The CouchDB API is the primary method of interfacing to a CouchDB instance. Requests are made using HTTP and requests are used to request information from the database, store new data, and perform views and formatting of the information stored within the documents.

For nearly all operations, the submitted data, and the returned data structure, is defined within a JavaScript Object Notation (JSON) object.

CloverDX can actually both read and write JSON, and then use CloverDX HTTP connectors to control the database. The CouchDB returns HTTP Status Codes as a response to any operation, which can be used for later processing in CloverDX. Read more about CouchDB API in the documentation: http://docs.couchdb.org/en/latest/api/basics.html#errors.

Demo Project

Download sample project here: CouchDB_project.zip

Manage CouchDB Databases

In this example, we first delete the database created for demo purposes so we can show how CloverDX can easily control the CouchDB database via HTTPConnector using standard request methods. More information about this is available in CloverDX HTTPConnector documentation: http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/httpconnector.html

As you can see in the picture below, we delete the database with the name white_house, using the request method DELETE. The response is saved in the output file delete_db.out.

Right after, we create a new database for each different insert method using the request method PUT. This time, we get the output on an edge using standard CloverDX Output mapping.

JSON File Creation in CloverDX

There are a couple of ways to insert JSON documents into CouchDB. The following example demonstrates these three different ones:

  • Single insert from CloverDX to the DB
  • Bulk insert directly from CloverDX to the DB
  • JSON file upload to the DB

The sample data used in the example is provided in a CSV file and sent to the JSONWriter component for generating the final JSON file. The JSONWriter component uses specific mapping.

Further information about the component can be found in CloverDX documentation: http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/jsonwriter.html

In the following steps, we’ll be using previously created JSON files in the format described below.

JSON file for Single Record Inserting Method

The “_id” field is unique ID field which can be provided on insert or is generated by CouchDB.

{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}

JSON file for Bulk Insert Method

The bulk method requires a set of single documents to be wrapped in an array named “docs” as shown below:

More information about HTTP Bulk Document API is available here: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API

{
"docs":[
{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
},
{
"_id":"14208",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}
]
}

JSON file Upload to the DB

The format of the JSON file used in the third method (uploading a JSON file into the DB) is the same as in bulk method, only saved in a local file.

Creating the JSON file

The following picture shows the part of the graph where the UniversalDataReader component reads the CSV file with data and copies it to three different edges each for the JSON file types described above. Two of them are processed further later, while the last one is saved to the local file.

Inserting and Uploading JSON into CouchDB

For all of the methods we’ve mentioned, the Additional HTTP Headers property needs to be set to: Content-Type=application/json.

Read more about the HTTP headers: http://docs.couchdb.org/en/latest/api/basics.html#http-headers

The first used is actually the simplest one – inserting JSON documents one by one using POST request method. However, it’s also the slowest, due to multiple requests on the DB (for each single record).

CouchDB provides bulk_docs API for inserting multiple documents using a single JSON docs array.

The bulk_docs API is called using URL suffix: _bulk_docs – i.e.: http://URL/DB_NAME/_bulk_docs

Read more about this here: http://docs.couchdb.org/en/latest/api/database/bulk-api.html?highlight=bulk#db-bulk-docs

The third method is basically the same as the bulk method, only it uses the local file instead of data directly from the edge with defined Input file URL in CloverDX HTTPConnector.

Reading from CouchDB

Reading an Entire Database using Temp View

Reading from a database is performed using CloverDX HTTPConnector.

In the case of reading the whole database, API method _temp_view needs to be called using POST and with Request Content specified in CloverDX HTTPConnector component i.e.: { "map" : "function(doc) { emit(doc._id, doc); }"}

Read more about CouchDB views and mapping function: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views

Reading a Single Document

The CloverDX HTTPConnector reads a single record from the database using the GET method and a defined ID suffix in URL – i.e. http://URL/DB_NAME/ID

JSON Extract

After getting the JSON file from CouchDB, extract is desired for further data manipulation in CloverDX. For this, CloverDX has a special component – JSONExtract. Check out more on JSONExtract in the CloverDX documentation:
http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/jsonextract.html