Put failure: too many siblings


Put failure: too many siblings

Vladyslav Zakhozhai
Hi.

I am having trouble with PUTs to a Riak CS cluster. During PUTs I periodically see the following message in Riak's error.log:

2016-06-03 11:15:55.201 [error] <0.15536.142>@riak_kv_vnode:encode_and_put:2253 Put failure: too many siblings for object OBJECT_NAME (101)

and also

2016-06-03 12:41:50.678 [error] <0.20448.515>@riak_api_pb_server:handle_info:331 Unrecognized message {7345880,{error,{too_many_siblings,101}}}

Here OBJECT_NAME is the name of the Riak object that has too many siblings.

I am quite sure that these objects are static: nobody deletes them and nobody rewrites them. I have no idea why more than 100 siblings accumulate for such an object.
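
To see which objects are affected and how often, the error.log lines can be tallied per object. The following is only a sketch: the regular expression is modelled on the single log line quoted above, and the function name `summarize_sibling_failures` is made up for illustration. Other Riak versions may format the line differently.

```python
import re
from collections import Counter

# Pattern modelled on the error.log line quoted above; the exact layout in
# other Riak versions is an assumption.
SIBLING_RE = re.compile(
    r"Put failure: too many siblings for object (?P<key>\S+) \((?P<count>\d+)\)"
)


def summarize_sibling_failures(lines):
    """Return (failures per object, highest sibling count seen per object)."""
    failures = Counter()
    max_count = {}
    for line in lines:
        match = SIBLING_RE.search(line)
        if match is None:
            continue  # not a "too many siblings" line
        key = match.group("key")
        count = int(match.group("count"))
        failures[key] += 1
        max_count[key] = max(count, max_count.get(key, 0))
    return failures, max_count


if __name__ == "__main__":
    sample = [
        "2016-06-03 11:15:55.201 [error] <0.15536.142>@riak_kv_vnode:encode_and_put:2253 "
        "Put failure: too many siblings for object OBJECT_NAME (101)",
    ]
    failures, max_count = summarize_sibling_failures(sample)
    for key, n in failures.most_common():
        print(key, n, max_count[key])
```

In practice the `lines` argument could be an open file handle on error.log, since the function only iterates over it.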

This issue has the following effects:
  1. A large number of keys are loaded into RAM, and I am almost out of RAM. (Does each sibling have its own key, or a duplicate of the key?)
  2. Nodes are slow, and adding new nodes is too slow.
  3. The presence of "too many siblings" affects ownership handoffs.
So I have several questions:
  1. Can hinted or ownership handoffs affect the sibling count? (I mean, can siblings be created during ownership or hinted handoffs?)
  2. Is there any workaround for this issue? Do I need to remove siblings manually, or are they removed during merges, read repairs and so on?

My configuration:
  1. riak from basho's packages - 2.1.3-1
  2. riak cs from basho's packages - 2.1.0-1
  3. 24 riak/riak-cs nodes
  4. 32 GB RAM per node
  5. AAE is disabled

I appreciate your help.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Put failure: too many siblings

Luke Bakken
Hi Vladyslav,

If you recognize the full name of the object raising the sibling
warning, it is most likely a manifest object. Sometimes, during hinted
handoff, you can see these messages. They should resolve after handoff
completes.

Please see the documentation for the transfer-limit command as well:

http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit

--
Luke Bakken
Engineer
[hidden email]


Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hi, Luke.

Thank you for your answer. I did not completely understand your point about transfer-limit. How does it relate to my problem? As I understand it, the transfer limit is a limit on concurrent data transfers from different nodes. Am I right? Do you mean that Riak can hand off one partition from several nodes concurrently?

I now have transfer-limit set to 1 on all Riak nodes.

But I am not sure that my cluster will ever converge. All nodes are low on memory and are periodically killed by the OOM killer. I am trying to add new nodes to the cluster, but because of the OOM killer problem this process is very slow.

In the official docs I've read:

"Sibling explosion occurs when an object rapidly collects siblings that are not reconciled. This can lead to a variety of problems, including degraded performance, especially if many objects in a cluster suffer from siblings explosion. At the extreme, having an enormous object in a node can cause reads of that object to crash the entire node. Other issues include undue latency and out-of-memory errors."

I have noticed that new nodes in the cluster do not experience such problems (I mean running out of RAM).

Regarding siblings, maybe you are right and this is a manifest object. I can recognize the key name but not the bucket name. But more than 100 siblings on many keys really confuses me. Each time I try to PUT an object to Riak via the Riak CS S3 interface I get sibling errors.


Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hello.

I see a very interesting and confusing thing.

From my previous message you can see that the sibling count on manifest objects is about 100 (actually it is in the range 100-300). Unfortunately, my problem is that almost all PUT requests are failing with 500 Internal Server Error.

Today I tried setting the max_siblings Riak option to 500. PUT requests then succeeded, but not for long. Now I see the "max siblings" error in the Riak logs again, but the actual sibling count is 500+ (earlier it was 100-300, as I mentioned).

About 30 minutes passed between setting max_siblings=500 and the errors appearing in the log. I also want to draw your attention to the fact that I have blocked the PUT method on haproxy, the frontend for Riak CS.




Re: Put failure: too many siblings

Russell Brown-2
What version of riak_kv is behind this riak_cs install, please? Is it really 2.1.3 as stated below? This looks and sounds like sibling explosion, which is fixed in riak 2.0 and above. Are you sure that you are using the DVV enabled setting for riak_cs bucket properties? Can you post your bucket properties please?


Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hi Russell,

Thank you for your answer. I really appreciate your help.

2.1.3 is not actually the riak_kv version; it is the version of Basho's riak package. You can see the versions of the Riak subsystems below.
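
Because the package version and the riak_kv version differ, it can help to pull the subsystem version out mechanically. This is a rough sketch that assumes lines shaped like the "Subsystems versions" list in this message; the helper names `parse_versions` and `base_version` are invented here, and treating the trailing `-0-g...` part as a git-describe style suffix is a guess based on its appearance.

```python
import re

# Matches entries such as: "riak_kv_version" : "2.1.2-0-gf969bba",
VERSION_RE = re.compile(r'"(?P<name>\w+)_version"\s*:\s*"(?P<version>[^"]*)"')


def parse_versions(text):
    """Map subsystem name (without the _version suffix) to its version string."""
    return {m.group("name"): m.group("version") for m in VERSION_RE.finditer(text)}


def base_version(version):
    """Drop everything after the first '-', e.g. '2.1.2-0-gf969bba' -> '2.1.2'."""
    return version.split("-", 1)[0]


if __name__ == "__main__":
    sample = '"riak_core_version" : "2.1.5-0-gb02ab53",\n"riak_kv_version" : "2.1.2-0-gf969bba",'
    versions = parse_versions(sample)
    print("riak_kv:", base_version(versions["riak_kv"]))
```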

Bucket properties:
# riak-admin bucket-type list
default (active)

# riak-admin bucket-type status default
default is active

allow_mult: true
basic_quorum: false
big_vclock: 50
chash_keyfun: {riak_core_util,chash_std_keyfun}
dvv_enabled: false
dw: quorum
last_write_wins: false
linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
n_val: 3
notfound_ok: true
old_vclock: 86400
postcommit: []
pr: 0
precommit: []
pw: 0
r: quorum
rw: quorum
small_vclock: 50
w: quorum
write_once: false
young_vclock: 20

I did not mention that an upgrade from riak 1.5.4 took place about 6 months ago. As I understand it, DVV is disabled. Is it safe to migrate from vector clocks to DVV?
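
For checking many nodes or bucket types, the same inspection can be scripted. The sketch below assumes the `name: value` layout of the `riak-admin bucket-type status` output pasted above; `parse_bucket_props` and `sibling_risk` are illustrative names, and the rule inside `sibling_risk` only encodes the combination Russell asked about (siblings allowed, DVV off). It is not an official health check.

```python
def parse_bucket_props(text):
    """Parse 'name: value' lines into a dict of strings, skipping other lines."""
    props = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip lines without a colon (e.g. "default is active") and blank lines.
        if sep and name and " " not in name:
            props[name] = value.strip()
    return props


def sibling_risk(props):
    """True when siblings are allowed but DVV is disabled."""
    return props.get("allow_mult") == "true" and props.get("dvv_enabled") == "false"


if __name__ == "__main__":
    sample = """default is active

allow_mult: true
dvv_enabled: false
last_write_wins: false
n_val: 3
"""
    props = parse_bucket_props(sample)
    print("allow_mult =", props["allow_mult"], "dvv_enabled =", props["dvv_enabled"])
    print("siblings allowed without DVV:", sibling_risk(props))
```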

Package versions:
# dpkg -l | grep riak
ii  riak                                2.1.3-1                          amd64        Riak is a distributed data store
ii  riak-cs                             2.1.0-1                          amd64        Riak CS

Subsystem versions:
"clique_version" : "0.3.2-0-ge332c8f",
"bitcask_version" : "1.7.2",
"sys_driver_version" : "2.2",
"riak_core_version" : "2.1.5-0-gb02ab53",
"riak_kv_version" : "2.1.2-0-gf969bba",
"riak_pipe_version" : "2.1.1-0-gb1ac2cf",
"cluster_info_version" : "2.0.3-0-g76c73fc",
"riak_auth_mods_version" : "2.1.0-0-g31b8b30",
"erlydtl_version" : "0.7.0",
"os_mon_version" : "2.2.13",
"inets_version" : "5.9.6",
"erlang_js_version" : "1.3.0-0-g07467d8",
"riak_control_version" : "2.1.2-0-gab3f924",
"xmerl_version" : "1.3.4",
"protobuffs_version" : "0.8.1p5-0-gf88fc3c",
"riak_sysmon_version" : "2.0.0",
"compiler_version" : "4.9.3",
"eleveldb_version" : "2.1.10-0-g0537ca9",
"lager_version" : "2.1.1",
"sasl_version" : "2.3.3",
"riak_dt_version" : "2.1.1-0-ga2986bc",
"runtime_tools_version" : "1.8.12",
"yokozuna_version" : "2.1.2-0-g3520d11",
"riak_search_version" : "2.1.1-0-gffe2113",
"sys_system_version" : "Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
"basho_stats_version" : "1.0.3",
"crypto_version" : "3.1",
"merge_index_version" : "2.0.1-0-g0c8f77c",
"kernel_version" : "2.16.3",
"stdlib_version" : "1.19.3",
"riak_pb_version" : "2.1.0.2-0-g620bc70",
"syntax_tools_version" : "1.6.11",
"goldrush_version" : "0.1.7",
"ibrowse_version" : "4.0.2",
"mochiweb_version" : "2.9.0",
"exometer_core_version" : "1.0.0-basho2-0-gb47a5d6",
"ssl_version" : "5.3.1",
"public_key_version" : "0.20",
"pbkdf2_version" : "2.0.0-0-g7076584",
"sidejob_version" : "2.0.0-0-gc5aabba",
"webmachine_version" : "1.10.8-0-g7677c24",
"poolboy_version" : "0.8.1p3-0-g8bb45fb",
"riak_api_version" : "2.1.2-0-gd8d510f",
"asn1_version" : "2.0.3",



Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hello,

My Riak cluster still experiences "too many siblings" errors, and hinted handoffs are not able to finish completely. So "siblings will be resolved after hinted handoffs are finished" unfortunately does not apply in my case.

According to Basho's docs (http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion), I need to enable the DVV conflict resolution mechanism. So here is a question:

Is it safe to enable DVV on the default bucket type, and how does it affect existing data? Could it be a solution?

Why am I talking about the default bucket type? Because there is only one Riak client, Riak CS, and it does not manage the bucket types of the objects it PUTs (so the default bucket type is always used during PUTs). Is that correct?

Thank you in advance.

On Fri, Jun 17, 2016 at 11:45 AM Vladyslav Zakhozhai <[hidden email]> wrote:
Hi Russel,

thank you for your answer. I really appreciate your help.

2.1.3 is not actually riak_kv version. It is version of basho's riak package. Versions of riak subsystems you can see below.

Bucket properties:
# riak-admin bucket-type list
default (active)

# riak-admin bucket-type status default
default is active

allow_mult: true
basic_quorum: false
big_vclock: 50
chash_keyfun: {riak_core_util,chash_std_keyfun}
dvv_enabled: false
dw: quorum
last_write_wins: false
linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
n_val: 3
notfound_ok: true
old_vclock: 86400
postcommit: []
pr: 0
precommit: []
pw: 0
r: quorum
rw: quorum
small_vclock: 50
w: quorum
write_once: false
young_vclock: 20

I did not mentioned that upgrade from riak 1.5.4 have been took place couple months ago (about 6 months). As I understand DVV is disabled. Is it safe to migrate to setting DVV from Vector Clocks?

Package versions:
# dpkg -l | grep riak
ii  riak                                2.1.3-1                          amd64        Riak is a distributed data store
ii  riak-cs                             2.1.0-1                          amd64        Riak CS

Subsystems versions:
"clique_version" : "0.3.2-0-ge332c8f",
"bitcask_version" : "1.7.2",
"sys_driver_version" : "2.2",
"riak_core_version" : "2.1.5-0-gb02ab53",
"riak_kv_version" : "2.1.2-0-gf969bba",
"riak_pipe_version" : "2.1.1-0-gb1ac2cf",
"cluster_info_version" : "2.0.3-0-g76c73fc",
"riak_auth_mods_version" : "2.1.0-0-g31b8b30",
"erlydtl_version" : "0.7.0",
"os_mon_version" : "2.2.13",
"inets_version" : "5.9.6",
"erlang_js_version" : "1.3.0-0-g07467d8",
"riak_control_version" : "2.1.2-0-gab3f924",
"xmerl_version" : "1.3.4",
"protobuffs_version" : "0.8.1p5-0-gf88fc3c",
"riak_sysmon_version" : "2.0.0",
"compiler_version" : "4.9.3",
"eleveldb_version" : "2.1.10-0-g0537ca9",
"lager_version" : "2.1.1",
"sasl_version" : "2.3.3",
"riak_dt_version" : "2.1.1-0-ga2986bc",
"runtime_tools_version" : "1.8.12",
"yokozuna_version" : "2.1.2-0-g3520d11",
"riak_search_version" : "2.1.1-0-gffe2113",
"sys_system_version" : "Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
"basho_stats_version" : "1.0.3",
"crypto_version" : "3.1",
"merge_index_version" : "2.0.1-0-g0c8f77c",
"kernel_version" : "2.16.3",
"stdlib_version" : "1.19.3",
"riak_pb_version" : "2.1.0.2-0-g620bc70",
"syntax_tools_version" : "1.6.11",
"goldrush_version" : "0.1.7",
"ibrowse_version" : "4.0.2",
"mochiweb_version" : "2.9.0",
"exometer_core_version" : "1.0.0-basho2-0-gb47a5d6",
"ssl_version" : "5.3.1",
"public_key_version" : "0.20",
"pbkdf2_version" : "2.0.0-0-g7076584",
"sidejob_version" : "2.0.0-0-gc5aabba",
"webmachine_version" : "1.10.8-0-g7677c24",
"poolboy_version" : "0.8.1p3-0-g8bb45fb",
"riak_api_version" : "2.1.2-0-gd8d510f",
"asn1_version" : "2.0.3",



Re: Put failure: too many siblings

Russell Brown-4

On 24 May 2017, at 09:11, Vladyslav Zakhozhai <[hidden email]> wrote:

> Hello,
>
> My Riak cluster still experiences "too many siblings" errors, and hinted handoffs never finish completely. So "siblings will be resolved after hinted handoffs are finished" does not apply in my case, unfortunately.
>
> According to Basho's docs (http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion), I need to enable the DVV conflict resolution mechanism. So here is a question:
>
> Is it safe to enable DVV on the default bucket type, and how does it affect existing data?

It might not affect existing data enough. All the existing siblings are “undotted” and would need a read-put cycle to resolve.
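
A minimal sketch of what such a read-put cycle does, written in plain Python with no Riak client involved. Everything here is a toy model and an assumption for illustration: `StoredObject`, `put`, `read_resolve_write`, and the `newest` resolver are hypothetical names, and the integer `context` only stands in for a real causal context (vector clock or DVV). It shows why writes that do not carry the current context pile up as siblings, and how one write made with the context that was just read collapses them.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Toy stand-in for one key: its sibling values plus an opaque context."""
    siblings: list = field(default_factory=list)
    context: int = 0  # hypothetical counter standing in for a causal context


def put(obj, value, context=None):
    """Write a value into the toy object.

    A write that carries the current context supersedes everything stored.
    A write without it (or with a stale one) is kept as another sibling.
    """
    if context == obj.context:
        obj.siblings = [value]
    else:
        obj.siblings.append(value)
    obj.context += 1


def read_resolve_write(obj, resolve):
    """Read all siblings, merge them with `resolve`, write back with the context."""
    values, ctx = list(obj.siblings), obj.context
    put(obj, resolve(values), context=ctx)
    return obj


def newest(values):
    """Hypothetical resolver: keep the value with the highest sequence number."""
    return max(values, key=lambda v: v["seq"])


obj = StoredObject()
for i in range(5):
    put(obj, {"seq": i})  # blind writes, no context supplied

before = len(obj.siblings)
read_resolve_write(obj, newest)
after = len(obj.siblings)
print(before, after)  # prints: 5 1
```

The resolver is the part that matters in practice. Picking the newest value is only an example; Riak CS manifests have their own merge rules, so a resolver like this should not be applied to them as is.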

> Could it be a solution?

You may require further action. I remember basho support helping someone with a similar issue, and there was some manual intervention/scripted solution, but I can’t remember what it was right now. I think those objects (as logged) with the sibling issues need to be read and resolved. Maybe one of the ex-basho support people remembers? I’ll prod one in a back channel and see if they can help.
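
As a starting point for finding the affected objects, the following is a small sketch that pulls object names and sibling counts out of error.log lines. It assumes the single-line format of the "Put failure: too many siblings for object ..." message quoted at the start of this thread; the function name `worst_sibling_counts` is hypothetical, and real object names may be formatted differently from the `OBJECT_NAME` placeholder used in that post.

```python
import re
from collections import defaultdict

# Matches the vnode error quoted earlier in the thread, for example:
# "... Put failure: too many siblings for object OBJECT_NAME (101)"
PATTERN = re.compile(
    r"Put failure: too many siblings for object (?P<key>.+?) \((?P<count>\d+)\)\s*$"
)


def worst_sibling_counts(lines):
    """Map each logged object name to the highest sibling count seen for it."""
    worst = defaultdict(int)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = match.group("key")
            worst[key] = max(worst[key], int(match.group("count")))
    return dict(worst)


# The two log lines from the original post; only the first one matches.
sample = [
    "2016-06-03 11:15:55.201 [error] <0.15536.142>@riak_kv_vnode:encode_and_put:2253 "
    "Put failure: too many siblings for object OBJECT_NAME (101)",
    "2016-06-03 12:41:50.678 [error] <0.20448.515>@riak_api_pb_server:handle_info:331 "
    "Unrecognized message {7345880,{error,{too_many_siblings,101}}}",
]
result = worst_sibling_counts(sample)
print(result)
```

To run it against a real log, pass an open file object instead of `sample`; the resulting names are the candidates for a read-and-resolve pass.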

>
> Why am I talking about the default bucket type? Because there is only one Riak client, Riak CS, and it does not manage the bucket types of the objects it PUTs (so the default bucket type is always used for PUTs). Is that correct?

Yes.

>
> Thank you in advance.
>

Re: Put failure: too many siblings

Russell Brown-4
Also, this issue https://github.com/basho/riak_kv/issues/1188 suggests that setting the property `riak_kv.retry_put_coordinator_failure=false` may help in the future, but it won't help with the keys that already have too many siblings.


Re: Put failure: too many siblings

Vladyslav Zakhozhai
Russell, thank you for the answer.

> Maybe one of the ex-basho support people remembers? I’ll prod one in a back channel and see if they can help.

It would be great.

Thank you once more.

>>>> Presence of "too many siblings" affects ownership handoffs
>>>>
>>>> So I have several questions:
>>>>
>>>> Do hinted or ownership handoffs can affect siblings count (I mean can
>>>> siblings be created during ownership of hinted handoffs)
>>>> Is there any workaround of this issue. Do I need remove siblings manually or
>>>> it removes during merges, read repairs and so on
>>>>
>>>>
>>>> My configuration:
>>>>
>>>> riak from basho's packages - 2.1.3-1
>>>> riak cs from basho's packages - 2.1.0-1
>>>> 24 riak/riak-cs nodes
>>>> 32 GB RAM per node
>>>> AAE is disabled
>>>>
>>>>
>>>> I appreciate you help.
>>>>
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> [hidden email]
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> [hidden email]
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>> _______________________________________________
>> riak-users mailing list
>> [hidden email]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
> _______________________________________________
> riak-users mailing list
> [hidden email]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|

Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hi,

I've been trying to change dvv_enabled for the default bucket type, but this is impossible with riak-admin:

riak-admin bucket-type update default '{"props":{"dvv_enabled":true}}'
Error updating bucket type default:
no_default_update

I think the workaround for this is to change the default props in the Riak config:

{riak_core, [

             {default_bucket_props, [
                 {allow_mult, true}, 
                 {dvv_enabled, true}
             ]},
...

(yes, I still use old-style configs)

And then I need to restart all Riak nodes. Here are two questions:
1. Is this approach correct?
2. Is it OK to have different default_bucket_props values on different nodes of the same cluster (for a short period of time)?

I have to restart 27 Riak nodes. There are several billion keys in Riak and each node starts very slowly (20-30-60 min; bitcask backend). So I can't change default_bucket_props on all nodes simultaneously this way.

I can also change this parameter in the Riak console, i.e.

application:set_env(riak_core, default_bucket_props, [{dvv_enabled, true}, ......]).

But what do I need to do to apply these changes?


Re: Put failure: too many siblings

Magnus Kessler
Hi Vladyslav,

The recommended approach for changing the default bucket type's properties is to change the settings in `riak.conf` or `advanced.config`. However, I just checked that any settings changed through a set_env call also seem to be reflected in the runtime configuration.

If you'd like to try this, I recommend making the change on a test cluster first, as I have not verified whether this causes issues on a production CS cluster. The set_env call should pass in the complete set of bucket type properties, not just the changes. You can try the following (shown with the stock properties of the default bucket type):

riak_core_util:rpc_every_member_ann(application, set_env, [riak_core, default_bucket_props, [{allow_mult,false},{big_vclock,50},{chash_keyfun,{riak_core_util,chash_std_keyfun}},{dvv_enabled, false},{dw,quorum},{last_write_wins,false},{linkfun,{modfun,riak_kv_wm_link_walker,mapreduce_linkfun}},{n_val,3},{notfound_ok,true},{old_vclock,86400},{postcommit,[]},{pr,0},{precommit,[]},{pw,0},{r,quorum},{repl,true},{rw,quorum},{small_vclock,50},{w,quorum},{write_once,false},{young_vclock,20}]], 5000).


If you go down this route, please don't forget to also make the changes in the configuration files, in order for these settings to persist across a restart.
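To confirm that the runtime change actually reached every node, the same RPC helper can read the value back. The following is only a minimal sketch for the Riak console, assuming that rpc_every_member_ann/4 returns a {Results, DownNodes} tuple in which Results is a list of {Node, Reply} pairs; NotDvv is a hypothetical helper name, not part of Riak.

```erlang
%% Hypothetical helper: takes the {Results, DownNodes} tuple (assumed shape)
%% and returns the nodes whose runtime default_bucket_props do not have
%% dvv_enabled set to true, together with the nodes that were unreachable.
NotDvv = fun({Results, Down}) ->
    Stale = [Node || {Node, Reply} <- Results,
                     case Reply of
                         {ok, Props} ->
                             proplists:get_value(dvv_enabled, Props) =/= true;
                         _ ->
                             true   %% unset or unexpected reply: treat as not updated
                     end],
    {Stale, Down}
end.

%% Usage on a live node:
%% NotDvv(riak_core_util:rpc_every_member_ann(application, get_env,
%%        [riak_core, default_bucket_props], 5000)).
```

An empty first element means every reachable node reports dvv_enabled as true; any node listed in the second element was down and still needs the change in its configuration file.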

Kind Regards,

Magnus

--
Magnus Kessler
Client Services Engineer
Basho Technologies Limited

Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431

Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hi Magnus, 

Thank you for the tip and clarification. I'll give it a try.

Re: Put failure: too many siblings

Vladyslav Zakhozhai
Hi,

I changed the default bucket prop "dvv_enabled" on the production system (27 nodes) a couple of days ago without any issues:

%% application:get_env/2 returns {ok, Value}, so unwrap the list before iterating over it.
{ok, Props} = application:get_env(riak_core, default_bucket_props).
NewProps = [case Prop of {dvv_enabled,false} -> {dvv_enabled,true}; Other -> Other end || Prop <- Props].
riak_core_util:rpc_every_member_ann(application, set_env, [riak_core, default_bucket_props, NewProps], 5000).

This simple script does the trick. The Riak configuration files have also been changed.

Another question is about a workaround for the old siblings. Russell said that I need to handle them manually, but I have no idea what I can do in this case. Will the siblings be resolved during read repairs, hinted handoffs, etc., or do I need to resolve them manually?
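Since the affected objects are the ones named in error.log, a possible first step towards any manual cleanup is to collect those names. This is a minimal sketch for the Riak console, assuming the log line format quoted at the start of this thread ("Put failure: too many siblings for object OBJECT_NAME (101)"); ParseLine is a hypothetical helper, and real object names may be printed in a different form than this placeholder.

```erlang
%% Hypothetical helper: extract {ObjectName, SiblingCount} from one
%% error.log line, or return 'nomatch' for unrelated lines.
ParseLine = fun(Line) ->
    Pattern = "too many siblings for object (.+) \\((\\d+)\\)",
    case re:run(Line, Pattern, [{capture, all_but_first, list}]) of
        {match, [Name, Count]} -> {Name, list_to_integer(Count)};
        nomatch -> nomatch
    end
end.
```

Applied to every line of the log, this gives the list of keys that would need a read followed by a write to be resolved, which is the read-put cycle Russell mentioned.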




Re: Put failure: too many siblings

Vladyslav Zakhozhai
In reply to this post by Russell Brown-4
Hi Russell,

I am reading about the "retry_put_coordinator_failure" option and I do not understand it completely.


Here is what I understand.

Statement 1. If we have N=3, W=3, then a PUT operation (write) will be successful if at least one vnode write was successful (i.e. 2 vnodes failed due to high load). If none of the vnodes were able to write the data, the PUT request fails.

Statement 2. In the case of this "successful" PUT we have only one copy of the data, and it will be fixed during read repairs or AAE (if the latter is enabled).

This is how I understand "the risk of potentially increasing the likelihood of write failure" from the link above:
"Setting it to off will speed response times on PUT requests in general, but at the risk of potentially increasing the likelihood of write failure."

Russell or anybody on the list, are my statements correct or not?
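For reference when checking Statement 1, here is a minimal sketch of how quorum values are usually counted, assuming the documented Riak convention that "quorum" means N div 2 + 1 and that a PUT is reported as successful to the client only once W vnodes have acknowledged it. Resolve and PutOk are hypothetical names for illustration, not riak_kv functions, and the sketch says nothing about what retry_put_coordinator_failure itself does.

```erlang
%% Hypothetical helpers, not riak_kv code.
%% Resolve a symbolic or numeric quorum value against n_val.
Resolve = fun(all, N) -> N;
             (one, _N) -> 1;
             (quorum, N) -> N div 2 + 1;
             (W, N) when is_integer(W), W >= 1, W =< N -> W
          end.

%% The client sees a successful PUT only when at least W vnodes acknowledged.
PutOk = fun(Acks, W, N) -> Acks >= Resolve(W, N) end.
```

Under these assumptions, PutOk(1, 3, 3) is false: with N=3 and W=3 a single acknowledged write is reported to the client as a failure, even though the one replica that was written stays on disk and can later be propagated by read repair or AAE.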

Thank you in advance.

On Wed, May 24, 2017 at 11:36 AM Russell Brown <[hidden email]> wrote:
Also, this issue https://github.com/basho/riak_kv/issues/1188 suggests that adding the property `riak_kv.retry_put_coordinator_failure=false` may help in future. But won’t help with your keys with too many siblings.

On 24 May 2017, at 09:22, Russell Brown <[hidden email]> wrote:

>
> On 24 May 2017, at 09:11, Vladyslav Zakhozhai <[hidden email]> wrote:
>
>> Hello,
>>
>> My riak cluster still experiences "too many siblings". And hinted handoffs are not able to be finished completely. So "siblings will be resolved after hinted handoffs are finished" is not my case unfortunately.
>>
>> According to basho's docs (http://docs.basho.com/riak/kv/2.2.3/learn/concepts/causal-context/#sibling-explosion) I need to enable dvv conflict resolution mechanism. So here is a quesion:
>>
>> Is it safe to enable dvv on default bucket type and how it affects existing data?
>
> It might not affect existing data enough. All the existing siblings are “undotted” and would need a read-put cycle to resolve.
>
>> It may be a solution, is not it?
>
> You may require further action. I remember basho support helping someone with a similar issue, and there was some manual intervention/scripted solution, but I can’t remember what it was right now. I think those objects (as logged) with the sibling issues need to be read and resolved. Maybe one of the ex-basho support people remembers? I’ll prod one in a back channel and see if they can help.
>
>>
>> Why I talk about default bucket type? Because there is only one riak client - Riak CS and it does not manage bucket types of PUT'ed object (so, default bucket type always is used during PUT's). Is it correct?
>
> Yes.
>
>>
>> Thank you in advance.
>>
>> On Fri, Jun 17, 2016 at 11:45 AM Vladyslav Zakhozhai <[hidden email]> wrote:
>> Hi Russel,
>>
>> thank you for your answer. I really appreciate your help.
>>
>> 2.1.3 is not actually riak_kv version. It is version of basho's riak package. Versions of riak subsystems you can see below.
>>
>> Bucket properties:
>> # riak-admin bucket-type list
>> default (active)
>>
>> # riak-admin bucket-type status default
>> default is active
>>
>> allow_mult: true
>> basic_quorum: false
>> big_vclock: 50
>> chash_keyfun: {riak_core_util,chash_std_keyfun}
>> dvv_enabled: false
>> dw: quorum
>> last_write_wins: false
>> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
>> n_val: 3
>> notfound_ok: true
>> old_vclock: 86400
>> postcommit: []
>> pr: 0
>> precommit: []
>> pw: 0
>> r: quorum
>> rw: quorum
>> small_vclock: 50
>> w: quorum
>> write_once: false
>> young_vclock: 20
>>
>> I did not mentioned that upgrade from riak 1.5.4 have been took place couple months ago (about 6 months). As I understand DVV is disabled. Is it safe to migrate to setting DVV from Vector Clocks?
>>
>> Package versions:
>> # dpkg -l | grep riak
>> ii  riak                                2.1.3-1                          amd64        Riak is a distributed data store
>> ii  riak-cs                             2.1.0-1                          amd64        Riak CS
>>
>> Subsystems versions:
>> "clique_version" : "0.3.2-0-ge332c8f",
>> "bitcask_version" : "1.7.2",
>> "sys_driver_version" : "2.2",
>> "riak_core_version" : "2.1.5-0-gb02ab53",
>> "riak_kv_version" : "2.1.2-0-gf969bba",
>> "riak_pipe_version" : "2.1.1-0-gb1ac2cf",
>> "cluster_info_version" : "2.0.3-0-g76c73fc",
>> "riak_auth_mods_version" : "2.1.0-0-g31b8b30",
>> "erlydtl_version" : "0.7.0",
>> "os_mon_version" : "2.2.13",
>> "inets_version" : "5.9.6",
>> "erlang_js_version" : "1.3.0-0-g07467d8",
>> "riak_control_version" : "2.1.2-0-gab3f924",
>> "xmerl_version" : "1.3.4",
>> "protobuffs_version" : "0.8.1p5-0-gf88fc3c",
>> "riak_sysmon_version" : "2.0.0",
>> "compiler_version" : "4.9.3",
>> "eleveldb_version" : "2.1.10-0-g0537ca9",
>> "lager_version" : "2.1.1",
>> "sasl_version" : "2.3.3",
>> "riak_dt_version" : "2.1.1-0-ga2986bc",
>> "runtime_tools_version" : "1.8.12",
>> "yokozuna_version" : "2.1.2-0-g3520d11",
>> "riak_search_version" : "2.1.1-0-gffe2113",
>> "sys_system_version" : "Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:4:4] [async-threads:64] [kernel-poll:true] [frame-pointer]",
>> "basho_stats_version" : "1.0.3",
>> "crypto_version" : "3.1",
>> "merge_index_version" : "2.0.1-0-g0c8f77c",
>> "kernel_version" : "2.16.3",
>> "stdlib_version" : "1.19.3",
>> "riak_pb_version" : "2.1.0.2-0-g620bc70",
>> "syntax_tools_version" : "1.6.11",
>> "goldrush_version" : "0.1.7",
>> "ibrowse_version" : "4.0.2",
>> "mochiweb_version" : "2.9.0",
>> "exometer_core_version" : "1.0.0-basho2-0-gb47a5d6",
>> "ssl_version" : "5.3.1",
>> "public_key_version" : "0.20",
>> "pbkdf2_version" : "2.0.0-0-g7076584",
>> "sidejob_version" : "2.0.0-0-gc5aabba",
>> "webmachine_version" : "1.10.8-0-g7677c24",
>> "poolboy_version" : "0.8.1p3-0-g8bb45fb",
>> "riak_api_version" : "2.1.2-0-gd8d510f",
>> "asn1_version" : "2.0.3",
>>
>>
>> On Fri, Jun 17, 2016 at 10:45 AM Russell Brown <[hidden email]> wrote:
>> What version of riak_kv is behind this riak_cs install, please? Is it really 2.1.3 as stated below? This looks and sounds like sibling explosion, which is fixed in riak 2.0 and above. Are you sure that you are using the DVV enabled setting for riak_cs bucket properties? Can you post your bucket properties please?
>>
>> On 16 Jun 2016, at 23:54, Vladyslav Zakhozhai <[hidden email]> wrote:
>>
>>> Hello.
>>>
>>> I see a very interesting and confusing thing.
>>>
>>> From my previous letter you can see that the sibling count on manifest objects is about 100 (actually it is in the range 100-300). Unfortunately, my problem is that almost all PUT requests are failing with a 500 Internal Server Error.
>>>
>>> Today I tried setting the max_siblings Riak option to 500. There were successful PUT requests, but not for long. Now I see the "max siblings" error in the Riak logs again, but the actual sibling count is 500+ (earlier it was 100-300, as I mentioned).
>>>
>>> The time between setting max_siblings=500 and the errors appearing in the log was about 30 minutes. I also want to draw your attention to the fact that I have blocked the PUT method on haproxy, the frontend for Riak CS.
>>>
>>>
>>>
>>> On Mon, Jun 6, 2016 at 1:17 AM Vladyslav Zakhozhai <[hidden email]> wrote:
>>> Hi, Luke.
>>>
>>> Thank you for your answer. I did not completely understand your point about transfer-limit. How does it relate to my problem? Transfer limit is a limit on concurrent data transfers from different nodes. Am I right? Do you mean that Riak can hand off one partition from several nodes concurrently?
>>>
>>> Now I have transfer-limit 1 on all riak nodes.
>>>
>>> But I am not sure that my cluster will ever converge. All nodes experience low memory and are periodically killed by the OOM killer. I am trying to add new nodes to the cluster, but because of the OOM killer problem this process is very, very slow.
>>>
>>> In the official docs I've read:
>>>
>>> "Sibling explosion occurs when an object rapidly collects siblings that are not reconciled. This can lead to a variety of problems, including degraded performance, especially if many objects in a cluster suffer from siblings explosion. At the extreme, having an enormous object in a node can cause reads of that object to crash the entire node. Other issues include undue latency and out-of-memory errors."
>>>
>>> I noticed that new nodes in the cluster do not experience such problems (I mean running out of RAM).
>>>
>>> Regarding siblings, maybe you are right and this is a manifest object. I can recognize the key name but not the bucket name. But more than 100 siblings on many keys really confuses me. Each time I try to PUT an object to Riak via the Riak CS S3 interface I get sibling errors.
>>>
>>> On Fri, Jun 3, 2016 at 6:43 PM Luke Bakken <[hidden email]> wrote:
>>> Hi Vladyslav,
>>>
>>> If you recognize the full name of the object raising the sibling
>>> warning, it is most likely a manifest object. Sometimes, during hinted
>>> handoff, you can see these messages. They should resolve after handoff
>>> completes.
>>>
>>> Please see the documentation for the transfer-limit command as well:
>>>
>>> http://docs.basho.com/riak/kv/2.1.4/using/admin/riak-admin/#transfer-limit
>>>
>>> --
>>> Luke Bakken
>>> Engineer
>>> [hidden email]
>>>
>>>
>>> On Fri, Jun 3, 2016 at 2:55 AM, Vladyslav Zakhozhai
>>> <[hidden email]> wrote:
>>>> Hi.
>>>>
>>>> I am having trouble with PUTs to a Riak CS cluster. During this process I
>>>> periodically see the following message in the Riak error.log:
>>>>
>>>> 2016-06-03 11:15:55.201 [error]
>>>> <0.15536.142>@riak_kv_vnode:encode_and_put:2253 Put failure: too many
>>>> siblings for object OBJECT_NAME (101)
>>>>
>>>> and also
>>>>
>>>> 2016-06-03 12:41:50.678 [error]
>>>> <0.20448.515>@riak_api_pb_server:handle_info:331 Unrecognized message
>>>> {7345880,{error,{too_many_siblings,101}}}
>>>>
>>>> Here OBJECT_NAME is the name of the object in Riak that has too many
>>>> siblings.
>>>>
>>>> I am definitely sure that these objects are static. Nobody deletes them,
>>>> nobody rewrites them. I have no idea why more than 100 siblings of this
>>>> object occur.
>>>>
>>>> This issue has the following effects:
>>>>
>>>> 1. A great number of keys are loaded into RAM. I am almost out of RAM (does
>>>> each sibling have its own key, or a duplicate of the key?).
>>>> 2. Nodes are slow, and adding new nodes is too slow.
>>>> 3. The presence of "too many siblings" affects ownership handoffs.
>>>>
>>>> So I have several questions:
>>>>
>>>> 1. Can hinted or ownership handoffs affect the sibling count (I mean, can
>>>> siblings be created during ownership or hinted handoffs)?
>>>> 2. Is there any workaround for this issue? Do I need to remove siblings
>>>> manually, or are they removed during merges, read repairs and so on?
>>>>
>>>>
>>>> My configuration:
>>>>
>>>> 1. riak from basho's packages - 2.1.3-1
>>>> 2. riak cs from basho's packages - 2.1.0-1
>>>> 3. 24 riak/riak-cs nodes
>>>> 4. 32 GB RAM per node
>>>> 5. AAE is disabled
>>>>
>>>>
>>>> I appreciate your help.