Index Design Question

Joe Olson
Index design question....

Suppose I have N customers I am tracking data for. All customer data has basically the same structure, and I have determined I need a simple secondary index on this data to satisfy a business goal.

Is it better to have N indexes (N ~ 100), or a single index, with the customer ID the most significant part of a compound index?

Assume the app will not access all customers equally; some customers' data will be accessed far more frequently than others.

I tend to think one giant index is more susceptible to failure, and I'm not sure how caching and swapping to disk are affected under each scenario.

Any thoughts? Target cluster size is 7 nodes, 16 GB RAM each. If N indexes is workable, how high can N be?
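
To make the two layouts in the question concrete, here is a minimal in-memory sketch in Python. It is not Riak's secondary index or search API; the class names, the sample customer IDs, and the integer index values are all hypothetical, chosen only to show that a compound key led by the customer ID and a set of per-customer indexes answer the same range query.

```python
import bisect
from collections import defaultdict


class SingleCompoundIndex:
    """One sorted index; customer_id is the most significant key part."""

    def __init__(self):
        self._entries = []  # sorted tuples: (customer_id, value, object_key)

    def add(self, customer_id, value, object_key):
        bisect.insort(self._entries, (customer_id, value, object_key))

    def range(self, customer_id, lo, hi):
        """Object keys for one customer with lo <= value < hi."""
        # A shorter tuple sorts before any longer tuple sharing its prefix,
        # so (customer_id, lo) lands just before the first matching entry.
        start = bisect.bisect_left(self._entries, (customer_id, lo))
        end = bisect.bisect_left(self._entries, (customer_id, hi))
        return [key for _, _, key in self._entries[start:end]]


class PerCustomerIndexes:
    """N separate sorted indexes, one per customer."""

    def __init__(self):
        self._by_customer = defaultdict(list)  # customer_id -> [(value, key)]

    def add(self, customer_id, value, object_key):
        bisect.insort(self._by_customer[customer_id], (value, object_key))

    def range(self, customer_id, lo, hi):
        """Object keys for one customer with lo <= value < hi."""
        entries = self._by_customer.get(customer_id, [])
        start = bisect.bisect_left(entries, (lo,))
        end = bisect.bisect_left(entries, (hi,))
        return [key for _, key in entries[start:end]]


# Hypothetical sample rows: (customer_id, indexed value, object key).
rows = [
    ("acme", 10, "a1"), ("acme", 20, "a2"), ("acme", 30, "a3"),
    ("zeta", 15, "z1"), ("zeta", 25, "z2"),
]
single, per_customer = SingleCompoundIndex(), PerCustomerIndexes()
for customer_id, value, object_key in rows:
    single.add(customer_id, value, object_key)
    per_customer.add(customer_id, value, object_key)

print(single.range("acme", 10, 30))        # ['a1', 'a2']
print(per_customer.range("acme", 10, 30))  # ['a1', 'a2']
```

The sketch only shows that the two layouts are equivalent for a single-customer range lookup. It says nothing about how either one behaves in a real cluster with respect to caching, disk use, or failure, which is the open part of the question.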

Thanks!


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Index Design Question

Luke Bakken
Hi Joe,

In order to provide the best answer, example (dummy) data and a
description of how the index will be queried would help a lot.

Thanks

--
Luke Bakken
Engineer
[hidden email]


On Wed, Mar 23, 2016 at 10:01 AM, Joe Olson <[hidden email]> wrote:

> Is it better to have N indexes (N ~ 100), or a single index, with the
> customer ID the most significant part of a compound index?

Inconsistency in Querying Riak TS?

Joe Olson
According to the documentation at

https://docs.basho.com/riak/ts/1.4.0/using/querying/guidelines/

"A query covering more than a certain number of quanta (5 by default) will generate the error too_many_subqueries and the query system will refuse to run it. Assuming a default quantum of 15 minutes, the maximum query time range is 75 minutes."

However, the example shows a table with a quantum of 15 seconds. After the example:

"The maximum time range we can query is 60s, anything beyond will fail."

...which seems to contradict the first assertion.

Furthermore, I have been getting inconsistent behavior using the quantum.

I have a code snippet placed here, demonstrating this behavior:

https://gist.github.com/anonymous/a4a4ccb8617a00d38fb47a6b11571d81

In this example, I set up two tables: one with a quantum of 6 hours, the other with a quantum of 12 days.

I am using the default limit (5 quanta) on my cluster.

A query spanning 5 quantum partitions is allowed on the 6-hour table, but a query spanning 5 quantum partitions on the 12-day table fails with the 'too many subqueries' error.

Is the number of allowed subqueries different depending on the quantum size?

If so, is there more detailed documentation on the subject?

Thanks!


Re: Inconsistency in Querying Riak TS?

John Daily
One of the catches regarding the quantum limit is that unless the
query starts exactly on a boundary, the effective limit is one fewer
because it is determined by the number of partitions the query has to
touch.

I suspect that's the behavior you're seeing.
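
The boundary effect can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Riak TS code: it assumes quanta are aligned to the Unix epoch, treats the query range as half-open (start inclusive, end exclusive), and uses a hypothetical start date of 2016-10-01 00:00 UTC that is not taken from the gist. The limit of 5 is the documented default quoted above.

```python
from datetime import datetime, timedelta, timezone

MAX_QUANTA = 5  # default limit quoted from the documentation
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

SEC = 1000          # milliseconds
HOUR = 3600 * SEC
DAY = 24 * HOUR


def to_ms(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def quanta_touched(start_ms, end_ms, quantum_ms):
    """Number of epoch-aligned quanta overlapped by [start_ms, end_ms)."""
    if end_ms <= start_ms:
        raise ValueError("end must be after start")
    first = start_ms // quantum_ms
    last = (end_ms - 1) // quantum_ms
    return last - first + 1


def allowed(start_ms, end_ms, quantum_ms, max_quanta=MAX_QUANTA):
    """True if the range touches no more than max_quanta partitions."""
    return quanta_touched(start_ms, end_ms, quantum_ms) <= max_quanta


# 15-second quantum, as in the documentation example.
q15 = 15 * SEC
print(allowed(0, 75 * SEC, q15))        # True: 75 s starting on a boundary, 5 quanta
print(allowed(5 * SEC, 80 * SEC, q15))  # False: 75 s off a boundary, 6 quanta
print(allowed(5 * SEC, 65 * SEC, q15))  # True: 60 s never touches more than 5

# Hypothetical start time: midnight UTC falls on a 6-hour boundary,
# but this date does not fall on a 12-day boundary counted from the epoch.
start = to_ms(datetime(2016, 10, 1, tzinfo=timezone.utc))
print(quanta_touched(start, start + 30 * HOUR, 6 * HOUR))  # 5
print(quanta_touched(start, start + 60 * DAY, 12 * DAY))   # 6
```

Under these assumptions the two documentation statements are compatible: 75 seconds is the maximum only for a query that starts exactly on a boundary, while 60 seconds is the longest range that stays within 5 quanta wherever it starts. The same reasoning would let a 5-quantum range pass on the 6-hour table and fail on the 12-day table if only the former happens to start on a boundary.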

> On Oct 17, 2016, at 9:38 PM, Joe Olson <[hidden email]> wrote:
>
> A query spanning 5 quantum partitions is allowed on the 6-hour table, but a query spanning 5 quantum partitions
> on the 12-day table fails with the 'too many subqueries' error.
>
> Is the number of allowed subqueries different depending on the quantum size?