Along with the NoSQL movement came a problem - how to query the data in distributed NoSQL databases?
A possible solution is to use both SQL and NoSQL. Imagine using a conventional RDBMS for indexing your data along with a NoSQL database for storing the actual data. You then query the index in the RDBMS with SQL and fetch the full results from the NoSQL buckets.
Modeling
- Determine which parts of your data you want to index.
- Model your RDBMS so that it contains only the indexed fields.
- Each table that represents an entity must contain a NoSQL ID that links to the actual object in your distributed database.
- Model your NoSQL database so that it stores your objects in serialized form (JSON, XML, ...).
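As a rough illustration of the modeling steps, here is a minimal sketch that assumes SQLite as the RDBMS and JSON as the serialization format. The `article_index` table, its columns, and the helper names are hypothetical and chosen only for this example.

```python
import json
import sqlite3


def create_index_schema(conn: sqlite3.Connection) -> None:
    """Create the relational side: indexed fields plus the NoSQL key only."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS article_index (
            nosql_id TEXT PRIMARY KEY,  -- links to the full object in the NoSQL store
            title    TEXT NOT NULL,     -- indexed field
            author   TEXT NOT NULL      -- indexed field
        )
        """
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_article_author ON article_index (author)"
    )


def serialize(obj: dict) -> str:
    """NoSQL side: the whole object is stored as one serialized value."""
    return json.dumps(obj, sort_keys=True)


def deserialize(blob: str) -> dict:
    """Turn a stored value back into an object."""
    return json.loads(blob)
```

Note that the non-indexed parts of the object (an article body, for instance) appear nowhere in the relational schema; they live only in the serialized value.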
Inserting
- Generate an identifier that links your NoSQL entry to its RDBMS row. This can be an object hash code or an SQL sequence value.
- Write your object into the RDBMS (indexed fields only) along with the NoSQL ID. This can be done asynchronously.
- Write your object into the NoSQL DB.
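The insert steps above might look roughly like the sketch below. It makes several assumptions: SQLite stands in for the RDBMS, a plain Python dict stands in for the distributed NoSQL bucket, and a UUID is used as the shared identifier in place of a hash code or sequence value. The table and function names are hypothetical.

```python
import json
import sqlite3
import uuid

# Hypothetical index table: indexed fields plus the NoSQL key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS article_index (
    nosql_id TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    author   TEXT NOT NULL
)
"""


def insert_article(conn: sqlite3.Connection, store: dict, article: dict) -> str:
    # 1. Generate the identifier shared by both stores (assumption: a UUID).
    nosql_id = uuid.uuid4().hex
    # 2. Write only the indexed fields to the RDBMS, together with the NoSQL ID.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO article_index (nosql_id, title, author) VALUES (?, ?, ?)",
            (nosql_id, article["title"], article["author"]),
        )
    # 3. Write the full serialized object to the key-value store.
    store[nosql_id] = json.dumps(article)
    return nosql_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store = {}  # stand-in for a distributed NoSQL bucket
article_id = insert_article(
    conn, store, {"title": "SomeSQL", "author": "alice", "body": "full text"}
)
```

The two writes here are not atomic: if the second one fails, the index row points at a missing object, so a real system would need some form of retry or cleanup.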
Querying
- Query your RDBMS on the indexed fields.
- Retrieve the NoSQL IDs from the query results.
- Fetch the objects from the NoSQL DB using the NoSQL IDs.
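A minimal sketch of the query path, again assuming SQLite for the index and a plain dict as a stand-in for the NoSQL store. The `article_index` table, the `find_by_author` helper, and the sample data are hypothetical.

```python
import json
import sqlite3


def find_by_author(conn: sqlite3.Connection, store: dict, author: str) -> list:
    # 1. Query the RDBMS on an indexed field.
    # 2. Retrieve the NoSQL IDs from the result rows.
    rows = conn.execute(
        "SELECT nosql_id FROM article_index WHERE author = ? ORDER BY title",
        (author,),
    ).fetchall()
    # 3. Fetch the full objects from the key-value store by ID.
    #    IDs missing from the store are skipped, since the two stores
    #    can be briefly out of step when writes are asynchronous.
    return [json.loads(store[nosql_id]) for (nosql_id,) in rows if nosql_id in store]


# Small in-memory setup so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article_index (nosql_id TEXT PRIMARY KEY, title TEXT, author TEXT)"
)
store = {}
sample = [
    ("a1", {"title": "Second", "author": "alice", "body": "..."}),
    ("a2", {"title": "First", "author": "alice", "body": "..."}),
    ("a3", {"title": "Other", "author": "bob", "body": "..."}),
]
for key, doc in sample:
    conn.execute(
        "INSERT INTO article_index (nosql_id, title, author) VALUES (?, ?, ?)",
        (key, doc["title"], doc["author"]),
    )
    store[key] = json.dumps(doc)

results = find_by_author(conn, store, "alice")
```

Every lookup costs one SQL query plus one key-value fetch per matching row, so a store that supports multi-get would help keep the round trips down.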
This approach could be called SomeSQL (SQL + NoSQL). I haven’t tested it in real environments, so it would be interesting to know whether anyone has done this and whether the approach proved useful.
I think that this is a relatively standard setup, e.g. a video site: tags, title, etc. are in a relational DB (with memcached), and the content is in the cloud.
I believe so, this solution is just common sense. However, I haven't found any best practices for doing it the right way.
In your example, videos are very large binary files, so buckets are perfect for that: a file has an ID, you get it and use it. However, if you are keeping serialized objects instead, it gets more challenging and requires some consideration, e.g. how to query the non-indexed parts. By having a second index and constantly crawling your data?
The Lucandra project uses a similar technique. Lucandra is basically Lucene with Cassandra as its backing store. Cassandra gives you the ability to scale horizontally, but still retain the query features of Lucene.
http://github.com/tjake/Lucandra
nosql isn't a replacement for sql. the concept of nosql is basically "getting what you know and what you want". it's like a big hash table that maps each unique key to its value. you only use nosql for cases in which your data fits that model (a shopping cart, for example).
for indexing, it would be easier to implement the mechanism in the nosql implementation itself. in SomeSQL, you would have two "queries" and two network connections for a single record. when latency is high, that doubles the chance of losing data. when you insert, the first question is how to ensure that both writes complete successfully.
"nosql" is a bad name for the concept and technology, to my understanding. it should be called "cache persistence" :)