Multi-Node Full Text Search in Couchbase 4.5 Beta

Scale out Full Text Search in Couchbase Server 4.5 Beta

Couchbase Server 4.5 includes a new service, full text search (FTS). In this blog, I'll talk about how FTS scales out across nodes, how to replicate indexes, and how it behaves in a rebalance.

Since the Couchbase Server 4.5 Developer Preview was released, the FTS team has been busy. The beta release of Couchbase Server 4.5 not only squashes a lot of FTS bugs, it also includes many big FTS improvements:

12x faster indexing performance
better statistics
support for authentication and role based access control
audit logging of administrator events
support for partial results

The most notable new search feature in the 4.5 beta is the ability to run the FTS service across multiple nodes. You can try it out today with the beta, and what you read here will still apply when Couchbase Server 4.5 goes GA.

Let me get the disclaimer out of the way right now: FTS will remain a developer preview in the GA version of Couchbase Server 4.5, so please don't run it on a production server. There's a lot of really great functionality in FTS, but we haven't yet ticked off all the performance and system testing check boxes on our to do list. On the plus side, that gives us a chance to address some of the feedback we're getting from early test users. (Got feedback? Feel free to email me directly at will dot gardella at couchbase dot com)

Ok, let's get to the good stuff. You can use the search service to index and search text in your Couchbase documents without relying on a third party search package. The new search service joins the data, index, and query services and can be managed like other services for the purposes of multi-dimensional scaling (MDS). Note that unlike N1QL, the search service does both full text query and indexing in a single service.

Distributed Search Service - Under the Hood

For the most part, distributed search indexes "just work": the Couchbase Full Text Search service takes advantage of new hardware as you add nodes, and full text indexes are failed over and rebalanced along with the data service. This section talks about the mechanisms that enable this. As a user you usually don't need to know about them, but you will sometimes run into their effects, as with partial search results (touched on later).

From the start, full text search was designed to distribute text indexes across nodes, in very much the same way that the data service distributes data in buckets. If you understand Couchbase Server's buckets and vBuckets, you've got a good mental model for understanding this. The bucket is a logical unit of data containment that is easy to work with, and the vBucket is a physical portion of the data in a bucket that lives on a specific node in the cluster. When you turn on replication on a bucket, Couchbase Server creates copies of all the necessary vBuckets that make up that bucket. Couchbase Server also makes sure that the layout of those vBuckets is optimal, which is itself a complex topic, but for now you can just think of it as balancing the locations of the vBuckets so that they are spread evenly across nodes.

FTS works similarly. Full text indexes are automatically divided into fragments called pindexes, which is short for "partitioned indexes" or "physical indexes", depending on who you ask. Like a bucket, a full text index is a logical concept. The pindex is the physical implementation of the index, just like the vBucket is the physical implementation of the bucket.

Pindexes are physically distributed across the Couchbase nodes that run the search service, however many that might be. In the developer preview, that was limited to a single node, but the restriction has been lifted in the beta. In the examples below, we'll use just two nodes to keep it simple.
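
To make the vBucket analogy a bit more concrete, here is a rough sketch of the arithmetic. It assumes 1024 vBuckets per bucket (the usual count on Linux and Windows) and uses 32 vBuckets per pindex, which is the maxPartitionsPerPIndex value in the index definition in this post. The function names are made up for illustration, and the round-robin placement is a deliberate simplification, not the planner FTS actually uses.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Couchbase: back-of-the-envelope math only.

# pindex_count VBUCKETS MAX_PER_PINDEX
# Number of pindexes needed if each one covers at most MAX_PER_PINDEX
# vBuckets (integer ceiling division).
pindex_count() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# pindexes_on_node PINDEXES NODES NODE_INDEX
# How many pindexes land on node NODE_INDEX (0-based) under a naive
# round-robin placement. This is an assumption made for illustration.
pindexes_on_node() {
  base=$(( $1 / $2 ))
  extra=$(( $1 % $2 ))
  if [ "$3" -lt "$extra" ]; then
    echo $(( base + 1 ))
  else
    echo "$base"
  fi
}

# Example: 1024 vBuckets, 32 per pindex, two search nodes.
total=$(pindex_count 1024 32)
echo "pindexes: $total"
echo "node 0: $(pindexes_on_node "$total" 2 0)"
echo "node 1: $(pindexes_on_node "$total" 2 1)"
```

Under those assumptions a single index splits into 32 pindexes, and an even spread over two nodes would put 16 on each.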

Adding a search node

Time to get hands on. If you have Couchbase Server 4.5 Beta or newer, you're all ready to go. You're going to need to set up more than one node, so get a VM ready. For this example, I'm going to use a couple of Windows Server 2012 VMs.

Let's go ahead and start with a single node running the search service. We'll create a simple index on the travel-sample bucket's hotel documents. If you don't have the travel sample installed, you can get it by clicking Settings > Sample Buckets and then checking the box.

For the purposes of this demo, any full text index will do. I created an index on type="hotel" (don't forget to disable the default type mapping) with two fields, "name" and "description", store = true, and "index only specified fields" just so it will build fast.

Here's the curl command, in case you have a UI allergy:

curl -XPUT -H "Content-Type: application/json" \
  http://localhost:8094/api/index/hotel \
  -d '{
  "type": "fulltext-index",
  "name": "hotel",
  "sourceType": "couchbase",
  "sourceName": "travel-sample",
  "planParams": {
    "maxPartitionsPerPIndex": 32,
    "numReplicas": 0,
    "hierarchyRules": null,
    "nodePlanParams": null,
    "pindexWeights": null,
    "planFrozen": false
  },
  "params": {
    "mapping": {
      "byte_array_converter": "json",
      "default_analyzer": "standard",
      "default_datetime_parser": "dateTimeOptional",
      "default_field": "_all",
      "default_mapping": {
        "display_order": "1",
        "dynamic": true,
        "enabled": false
      },
      "default_type": "_default",
      "index_dynamic": true,
      "store_dynamic": false,
      "type_field": "type",
      "types": {
        "hotel": {
          "display_order": "0",
          "dynamic": false,
          "enabled": true,
          "properties": {
            "description": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "",
                  "display_order": "0",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "description",
                  "store": true,
                  "type": "text"
                }
              ]
            },
            "name": {
              "dynamic": false,
              "enabled": true,
              "fields": [
                {
                  "analyzer": "",
                  "display_order": "1",
                  "include_in_all": true,
                  "include_term_vectors": true,
                  "index": true,
                  "name": "name",
                  "store": true,
                  "type": "text"
                }
              ]
            }
          }
        }
      }
    },
    "store": {
      "kvStoreName": "forestdb"
    }
  },
  "sourceParams": {
    "clusterManagerBackoffFactor": 0,
    "clusterManagerSleepInitMS": 0,
    "clusterManagerSleepMaxMS": 2000,
    "dataManagerBackoffFactor": 0,
    "dataManagerSleepInitMS": 0,
    "dataManagerSleepMaxMS": 2000,
    "feedBufferAckThreshold": 0,
    "feedBufferSizeBytes": 0
  }
}'

When you're done, you can search for a common word like "Inn" to make sure things are working.
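
If you'd rather run that test search from the command line too, something like the sketch below should do it. It assumes the search service exposes a query endpoint at /api/index/<index name>/query on the same port used to create the index, and that a simple query string body is accepted. The helper names build_query and search_index are mine, not part of Couchbase, and the body builder does no JSON escaping, so keep the search term to plain words.

```shell
#!/bin/sh
# Base URL of the search service: same host and port as the index-creation call.
FTS_URL="${FTS_URL:-http://localhost:8094}"

# build_query TERM SIZE  (hypothetical helper)
# Prints a minimal query-string search request body. No JSON escaping is
# done, so TERM should not contain quotes or backslashes.
build_query() {
  printf '{"query": {"query": "%s"}, "size": %d}' "$1" "$2"
}

# search_index INDEX TERM [SIZE]  (hypothetical helper)
# POSTs the request body to the assumed query endpoint of INDEX and prints
# the raw JSON response. SIZE defaults to 10 hits.
search_index() {
  curl -s -XPOST -H "Content-Type: application/json" \
    "$FTS_URL/api/index/$1/query" \
    -d "$(build_query "$2" "${3:-10}")"
}

# Usage, against a running node:
#   search_index hotel Inn 5
```

If your cluster requires credentials, curl's -u option can be added to the call inside search_index.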

When you take a look at your Couchbase data directory, you'll see an @fts directory. Open it and you will see a bunch of directories containing the pindexes.

