Looking for a replacement datastore

Shawn Parrish
Howdy Riak folk,

We're looking for a possible replacement datastore for our server
monitoring check results.  Maybe some of you can offer feedback on
whether Riak would be a good fit.

Each ping, HTTP request, etc. has a result with various metadata that
we store.  We're looking at about 250 million results a month, and that
number continues to grow.

We query this data for:
1. Last result (is the server up or down?)
2. If it's up, when was the last 'down', and conversely, if it's down,
when was the last 'up'?
3. Full detail of the last 5 results (to show recent results)
4. Last 24 hours of results (usually ~1440 results) to graph
5. Results in a date range (example: all results from July 1 through
July 31)... this can be very large.

We currently use BigCouch (CouchDB), but the views and the built-in
_all_docs slow down with so many results, especially when we call
them with 'include_docs', because we need the details of the results as
well.

We're trying to trim down the total results stored by summarizing
older data and deleting it, but that slows down the CouchDB views even
further.

Questions:
1. Is Riak a possible datastore for this use case?  Can I fetch that many
results, including all their details, quickly enough?
2. Do you know of another datastore that might be better?

Thanks,
Shawn

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Looking for a replacement datastore

Alexander Sicular
Hi Shawn,

tl;dr use Riak and Redis. Could you do it without Redis? Probably. Would I want to? No.

I'll take a stab at this. It goes without saying that there are many ways to do this and no "right" way. Each solution will have its own positives and negatives. It all depends on what you and your team are comfortable with and the needs of your app.

For those that follow my ramblings you can guess what I'm gonna say. I would put forward a solution of Riak and... Redis! Why Redis? Data structures (Riak doesn't have them... at the moment... or ever? Don't try to make it have them). You want them. Things like sorted sets, lists and hashes (which compress) are great for basically everything.

Things in your favor:
-immutable data
-trivially shardable
-constrained data set (data is not UGC with unbounded size)
-predictable growth rates
-deterministic keys (think %iso8601 date or unix epoch int%_%customerid%)
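
As a rough illustration of the deterministic-key idea, here is a minimal Python sketch. The function names, the minute granularity and the check id "web01-ping" are assumptions made for the example, not something from the original setup; the only point is that a key can be rebuilt from a timestamp and an id without any lookup.

```python
from datetime import datetime, timezone
from typing import Tuple


def result_key(check_id: str, when: datetime) -> str:
    """Build a key of the form '<unix epoch int>_<check id>'.

    `when` must be timezone-aware. It is truncated to the minute, so the
    same check in the same minute always maps to the same key.
    """
    minute = when.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return f"{int(minute.timestamp())}_{check_id}"


def parse_key(key: str) -> Tuple[datetime, str]:
    """Split a key back into its UTC timestamp and check id."""
    epoch, check_id = key.split("_", 1)
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc), check_id


# Hypothetical check id, purely for illustration.
key = result_key("web01-ping", datetime(2012, 8, 16, 22, 20, 50, tzinfo=timezone.utc))
when, check_id = parse_key(key)
```

Putting the epoch first means the id may itself contain underscores, because the key is only split on the first one.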

Keep 48 hours of live data in Redis. Run a culling process that dumps older data to Riak; it keeps your Redis memory footprint within known limits. You could run it every minute to minimize any data loss from downed Redis servers (outside of master/slave replication etc.). This is also where you could make do without Redis: if your app is holding on to, writing or requesting data every minute, you could write straight to Riak and have worker processes roll those minutes up into hours, days or whatever you need.
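
The culling step could look roughly like the sketch below. It is only a sketch under stated assumptions: plain Python dicts stand in for Redis (`hot`) and Riak (`cold`), keys are assumed to start with a Unix epoch followed by an underscore, and the `cull` function name is made up for the example. A real version would use the client libraries for both stores.

```python
from typing import Dict

HOT_WINDOW_SECONDS = 48 * 60 * 60  # keep 48 hours of live data


def cull(hot: Dict[str, dict], cold: Dict[str, dict], now_epoch: int) -> int:
    """Move results older than the hot window from `hot` to `cold`.

    Returns the number of results moved.
    """
    cutoff = now_epoch - HOT_WINDOW_SECONDS
    expired = [k for k in hot if int(k.split("_", 1)[0]) < cutoff]
    for key in expired:
        # Write to the long-term store first and delete afterwards, so a
        # crash between the two steps duplicates a result rather than losing it.
        cold[key] = hot[key]
        del hot[key]
    return len(expired)


# Stand-in data: one old result and one recent result.
hot = {"1000_web01": {"status": "up"}, "200000_web01": {"status": "down"}}
cold: Dict[str, dict] = {}
moved = cull(hot, cold, now_epoch=200060)
```

Because results are immutable, each key is written to the long-term store once and never updated afterwards.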

With deterministic keys you may not even need search, secondary indexes or key filters, although with those you can cover basically any permutation you could come up with. Your application handles fetching the correct key(s) in a deterministic fashion simply by manipulating date offsets.
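
Fetching by date offsets might be sketched like this in Python. It assumes one result per check per minute (which matches the ~1440 results per 24 hours mentioned in the original question) and the same '<unix epoch int>_<check id>' key layout; `keys_between` and the check id are hypothetical names for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def keys_between(check_id: str, start: datetime, end: datetime,
                 step: timedelta = timedelta(minutes=1)) -> Iterator[str]:
    """Yield one '<unix epoch int>_<check id>' key per step in [start, end).

    `start` and `end` must be timezone-aware datetimes.
    """
    current = start.replace(second=0, microsecond=0)
    while current < end:
        yield f"{int(current.timestamp())}_{check_id}"
        current += step


# Keys for the last 24 hours of a hypothetical check.
now = datetime(2012, 8, 17, 12, 0, tzinfo=timezone.utc)
last_day = list(keys_between("web01-ping", now - timedelta(hours=24), now))
```

The application can then issue plain gets for those keys, in parallel if needed, without asking the datastore to list or filter anything. The same function covers a July 1 through July 31 range by passing different bounds.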

Whatever you do, you do not want a situation where you write half-baked keys into Riak. Frequently updating keys will incur file compaction, which will make you want to cry and punch babies just to make the pain stop.

Best,
-Alexander Sicular

@siculars
http://siculars.posterous.com

Re: Looking for a replacement datastore

Alexander Zhuravlev
On Thu, Aug 16, 2012 at 04:20:50PM -0600, Shawn Parrish wrote:

> Questions:
> 1. Is Riak a possible datastore for this use case?  Can I get so many
> results, including all the details quickly enough?
> 2. Do you know of another datastore that might be better?

I would recommend taking a look at OpenTSDB (http://opentsdb.net/).
--
Alexander Zhuravlev

Re: Looking for a replacement datastore

Parnell Springmeyer
I use Riak at my company primarily for time series data. I quickly learned that key filters were a bad idea (when I designed our data model, I had the UIDs of the objects in MySQL plus the timestamp of the data piece). Once I moved to Map/Reduce using Erlang module/functions, response time and stability improved dramatically. It works really well for me IMHO.

OpenTSDB is also a great option, but we already use Riak for quite a few other things, alongside MySQL and Redis, so I didn't want 4 DBs running around (3 is enough). Riak is working very well for these needs.

On Aug 17, 2012, at 8:01 AM, Alexander Zhuravlev wrote:

> I would recommend taking a look at OpenTSDB (http://opentsdb.net/).
> --
> Alexander Zhuravlev