
Posted on May 20, 2014
Discover how Gain Theory automated their data ingestion and improved collaboration, productivity and time-to-delivery thanks to CloverDX.
Read case studyNoSQL databases are fast and used for high volumes of data and documents – and they are today's rising stars. When a customer asked us for help to manage CouchDB, we decided it would be helpful to others if we shared some tips in a blog post.
Apache’s CouchDB is an open source NoSQL document-oriented database, an alternative to the well known MongoDB. It's free and pretty functional, and because CloverDX can read and write JSON and has an HTTPconnector, we can work with and manage it pretty easily.
A brief introduction from the Apache CouchDB documentation describes the interface method and the way it stores data:
The CouchDB API is the primary method of interfacing to a CouchDB instance. Requests are made using HTTP and requests are used to request information from the database, store new data, and perform views and formatting of the information stored within the documents.
For nearly all operations, the submitted data, and the returned data structure, is defined within a JavaScript Object Notation (JSON) object.
CloverDX can actually both read and write JSON, and then use CloverDX HTTP connectors to control the database. The CouchDB returns HTTP Status Codes as a response to any operation, which can be used for later processing in CloverDX. Read more about CouchDB API in the documentation: http://docs.couchdb.org/en/latest/api/basics.html#errors.
Download sample project here: CouchDB_project.zip
In this example, we first delete the database created for demo purposes so we can show how CloverDX can easily control the CouchDB database via HTTPConnector using standard request methods. More information about this is available in CloverDX HTTPConnector documentation: http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/httpconnector.html
As you can see in the picture below, we delete the database with the name white_house, using the request method DELETE. The response is saved in the output file delete_db.out.
Right after, we create a new database for each different insert method using the request method PUT. This time, we get the output on an edge using standard CloverDX Output mapping.
There are a couple of ways to insert JSON documents into CouchDB. The following example demonstrates these three different ones:
The sample data used in the example is provided in a CSV file and sent to the JSONWriter component for generating the final JSON file. The JSONWriter component uses specific mapping.
Further information about the component can be found in CloverDX documentation: http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/jsonwriter.html
In the following steps, we’ll be using previously created JSON files in the format described below.
The “_id” field is unique ID field which can be provided on insert or is generated by CouchDB.
{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}
The bulk method requires a set of single documents to be wrapped in an array named “docs” as shown below:
More information about HTTP Bulk Document API is available here: http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API
{
"docs":[
{
"_id":"1",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
},
{
"_id":"14208",
"NAMELAST":"LAST_NAME",
"NAMEFIRST":"FIRST_NAME",
.
.
.
"CALLER_NAME_LAST":"CALLER_LAST_NAME",
"CALLER_NAME_FIRST":"CALLER_FIRST_NAME",
"Description":"DESCRIPTION STRING",
"RELEASE_DATE":"MM/DD/YYYY\r"
}
]
}
The format of the JSON file used in the third method (uploading a JSON file into the DB) is the same as in bulk method, only saved in a local file.
The following picture shows the part of the graph where the UniversalDataReader component reads the CSV file with data and copies it to three different edges each for the JSON file types described above. Two of them are processed further later, while the last one is saved to the local file.
For all of the methods we’ve mentioned, the Additional HTTP Headers property needs to be set to: Content-Type=application/json.
Read more about the HTTP headers: http://docs.couchdb.org/en/latest/api/basics.html#http-headers
The first used is actually the simplest one – inserting JSON documents one by one using POST request method. However, it’s also the slowest, due to multiple requests on the DB (for each single record).
CouchDB provides bulk_docs API for inserting multiple documents using a single JSON docs array.
The bulk_docs API is called using URL suffix: _bulk_docs – i.e.: http://URL/DB_NAME/_bulk_docs
Read more about this here: http://docs.couchdb.org/en/latest/api/database/bulk-api.html?highlight=bulk#db-bulk-docs
The third method is basically the same as the bulk method, only it uses the local file instead of data directly from the edge with defined Input file URL in CloverDX HTTPConnector.
Reading from a database is performed using CloverDX HTTPConnector.
In the case of reading the whole database, API method _temp_view needs to be called using POST and with Request Content specified in CloverDX HTTPConnector component i.e.: { "map" : "function(doc) { emit(doc._id, doc); }"}
Read more about CouchDB views and mapping function: http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
The CloverDX HTTPConnector reads a single record from the database using the GET method and a defined ID suffix in URL – i.e. http://URL/DB_NAME/ID
After getting the JSON file from CouchDB, extract is desired for further data manipulation in CloverDX. For this, CloverDX has a special component – JSONExtract. Check out more on JSONExtract in the CloverDX documentation:
http://doc.CloverDX.com/documentation/UserGuide/topic/com.CloverDX.gui.docs/docs/jsonextract.html