Object not found after successful PUT on S3 API

Daniel Miller
I have a 9-node Riak CS cluster that has been working flawlessly for about 3 months. The cluster configuration, including backend and bucket parameters such as N-value, uses default settings. I'm using the S3 API to communicate with the cluster.

Within the past week I had an issue where two objects were PUT with a 200 (success) response, but all subsequent GET requests for those two keys return a 404 (not found) status. Other than the fact that they are now missing, there was nothing out of the ordinary about these two PUTs. Maybe I'm missing something, but this seems like a scenario that should never happen.

All information included here about PUTs and GETs comes from reviewing the CS access logs. Both objects were PUT on the same node, but GET requests returning 404 have been observed on all nodes. There is plenty of other GET and PUT traffic on the cluster that is not failing. I'm unsure how to troubleshoot further to find out what may have happened to those objects and why they are now missing. What is the best approach to figure out why an object that was successfully PUT seems to be missing?
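One way to catch this symptom at write time, rather than weeks later in the access logs, is to read each object back right after the PUT. The sketch below assumes a boto3-style S3 client (the access logs in this thread show Boto3 as the user agent). `put_and_verify` and `InMemoryClient` are hypothetical names, and the in-memory client only stands in for Riak CS so the logic can be exercised offline.

```python
"""Hypothetical read-after-write check for an S3-compatible store such as Riak CS."""
import io


def error_code(exc):
    """Pull the S3 error code out of a botocore-style ClientError, if present."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def put_and_verify(client, bucket, key, body):
    """PUT `body` under `key`, immediately GET it back, and report the outcome.

    Returns "ok", "content-mismatch", or "missing-after-put: <code>" instead of
    raising, so the result can be logged next to the key.
    """
    client.put_object(Bucket=bucket, Key=key, Body=body)
    try:
        fetched = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    except Exception as exc:  # botocore.exceptions.ClientError with a real client
        return "missing-after-put: %s" % error_code(exc)
    return "ok" if fetched == body else "content-mismatch"


class InMemoryClient:
    """Offline stand-in for an S3 client; lose_writes=True mimics the reported symptom."""

    def __init__(self, lose_writes=False):
        self.store = {}
        self.lose_writes = lose_writes

    def put_object(self, Bucket, Key, Body):
        if not self.lose_writes:
            self.store[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        if (Bucket, Key) not in self.store:
            exc = Exception("NoSuchKey")
            exc.response = {"Error": {"Code": "NoSuchKey"}}
            raise exc
        return {"Body": io.BytesIO(self.store[(Bucket, Key)])}
```

Against the cluster you would pass something like `boto3.client("s3", endpoint_url=...)` pointed at your Riak CS endpoint. A successful read-back only proves the object was readable at that moment; the case reported later in this thread involves an object that was readable for a while and then started returning 404, so this check complements, rather than replaces, watching for GET failures over time.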

Thanks!
Daniel Miller

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Object not found after successful PUT on S3 API

Luke Bakken
Hi Daniel -

This is a strange scenario. I recommend looking at all of the log
files for "[error]" or other entries at about the same time as these
PUTs or 404 responses.
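A small helper for the search suggested above: it keeps only lines at a given level whose timestamp falls within a window around an incident time. It assumes the `YYYY-MM-DD HH:MM:SS.mmm [level]` prefix seen in the console.log excerpts in this thread, and that `around` is expressed in the same time zone the log uses. `errors_near` is a hypothetical name.

```python
from datetime import datetime, timedelta

# Prefix format of Riak console.log lines, e.g. "2017-01-20 15:38:10.184 [info] ..."
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
LOG_TS_LEN = len("2017-01-20 15:38:10.184")


def errors_near(lines, around, window=timedelta(minutes=10), level="[error]"):
    """Return (timestamp, line) pairs at `level` within +/- `window` of `around`."""
    hits = []
    for line in lines:
        if level not in line:
            continue
        try:
            ts = datetime.strptime(line[:LOG_TS_LEN], LOG_TS_FORMAT)
        except ValueError:
            continue  # wrapped continuation line or a different log format
        if abs(ts - around) <= window:
            hits.append((ts, line.rstrip("\n")))
    return hits
```

For example, open `/var/log/riak/console.log` and pass the file object with `datetime(2017, 1, 20, 15, 40)`; repeat on every node, since the requests in this thread were served by more than one node. Passing `level="[warning]"` widens the search beyond errors.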

Is there anything unusual about the key being used?
--
Luke Bakken
Engineer
[hidden email]


On Wed, Jan 25, 2017 at 6:40 AM, Daniel Miller <[hidden email]> wrote:

Re: Object not found after successful PUT on S3 API

Daniel Miller
Thanks for the quick response, Luke.

There is nothing unusual about the keys. The format is a name + UUID + some other random URL-encoded characters, like most other keys in our cluster.
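Since these keys contain percent-encoded characters, one thing worth ruling out is an encoding mismatch: if a key is encoded one more (or one fewer) time when it is read than when it was written, the GET asks for a different key and a 404 is the expected answer. This hypothetical helper shows what a key looks like after successive rounds of percent-decoding; a key that still changes after the first round may have been encoded twice somewhere along the way. The example key is made up.

```python
from urllib.parse import unquote


def decoding_levels(key, max_rounds=3):
    """Return the key after 0, 1, 2, ... rounds of percent-decoding.

    Stops early once a round no longer changes the string.
    """
    levels = [key]
    for _ in range(max_rounds):
        decoded = unquote(levels[-1])
        if decoded == levels[-1]:
            break
        levels.append(decoded)
    return levels


# Made-up key: "%2F" is an encoded "/", and "%2520" is an encoded "%20".
print(decoding_levels("name%2Fabc%2520x"))
# ['name%2Fabc%2520x', 'name/abc%20x', 'name/abc x']
```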

There are no errors near the time of the incident in any of the logs (the last [error] is from over a month before). I see lots of messages like this in console.log:

/var/log/riak/console.log
2017-01-20 15:38:10.184 [info] <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'[hidden email]'} and {822094670998632891489572718402909198556462055424,'[hidden email]'}
2017-01-20 15:40:39.640 [info] <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during active anti-entropy exchange of {936274486415109681974235595958868809467081785344,3} between {959110449498405040071168171470060731649205731328,'[hidden email]'} and {981946412581700398168100746981252653831329677312,'[hidden email]'}
2017-01-20 15:46:40.918 [info] <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {662242929415565384811044689824565743281594433536,3} between {685078892498860742907977265335757665463718379520,'[hidden email]'} and {707914855582156101004909840846949587645842325504,'[hidden email]'}
2017-01-20 15:48:25.597 [info] <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during active anti-entropy exchange of {776422744832042175295707567380525354192214163456,3} between {776422744832042175295707567380525354192214163456,'[hidden email]'} and {799258707915337533392640142891717276374338109440,'[hidden email]'}
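To see whether the keys repaired by active anti-entropy cluster around particular partitions, the counts in these messages can be tallied per partition. This sketch assumes a ring size of 64 (Riak's default `ring_creation_size`; check the cluster's actual value) and reads each `{Index,3}` tuple as a partition index followed by what appears to be the n_val of the exchange. `repaired_by_partition` is a hypothetical name. These log lines only say how many keys were repaired, not which ones, so the tally alone cannot show whether the missing objects were among them.

```python
import re
from collections import Counter

# Matches e.g. "Repaired 2 keys during active anti-entropy exchange of {7764...,3}"
AAE_RE = re.compile(
    r"Repaired (\d+) keys during active anti-entropy exchange of \{(\d+),(\d+)\}"
)
RING_TOP = 2 ** 160  # Riak's consistent-hash ring spans 0 .. 2^160


def repaired_by_partition(lines, ring_size=64):
    """Sum repaired-key counts per partition number (index // partition width).

    ring_size is an assumption and must match the cluster's ring_creation_size.
    """
    width = RING_TOP // ring_size
    totals = Counter()
    for line in lines:
        m = AAE_RE.search(line)
        if m:
            count, index = int(m.group(1)), int(m.group(2))
            totals[index // width] += count
    return dict(totals)
```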

Thanks!
Daniel


On Wed, Jan 25, 2017 at 9:45 AM, Luke Bakken <[hidden email]> wrote:

Re: Object not found after successful PUT on S3 API

Luke Bakken
Hi Daniel -

I don't have any ideas at this point. Has this scenario happened again?

--
Luke Bakken
Engineer
[hidden email]


On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <[hidden email]> wrote:

Re: Object not found after successful PUT on S3 API

Daniel Miller
Hi Luke,

Sorry for the late response and thanks for following up. I haven't seen it happen since. At this point I'm going to wait and see if it happens again and hopefully get more details about what might be causing it.

Daniel

On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <[hidden email]> wrote:

Re: Object not found after successful PUT on S3 API

Daniel Miller
I recently had another case of a disappearing object. This time the object was successfully PUT, and (unlike the previous cases reported in this thread) for a period of time GETs were also successful. Then GETs started 404ing for no apparent reason. There are no errors in the logs to indicate that anything unusual happened. This is quite disconcerting. Is it normal that Riak CS just loses track of objects? At this point we are using CS as primary object storage, meaning we do not have the data stored in another database, so it's critical that the data is not randomly lost.

In the CS access logs I see:

# all prior GET requests for this object succeeded like this one; this is the last successful GET request:
[28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
...
# all GET requests for this object are now failing like this one (the first 404):
[02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"

The object name has been elided for readability. I do not know when this object was PUT into the cluster because I only have logs for the past month. Is there any way to dig further into Riak or Riak CS data to determine if the object content is actually completely lost or if there are any other details that might explain why it is now missing? Could I increase some logging parameters to get more information about what is going wrong when something like this happens?

I have searched the logs for other 404 responses but found none (other than the two reported earlier), so this is the 3rd known missing object in the cluster. We retain logs for one month only (I'm increasing this now because of this issue), so it is possible that other objects have also gone missing, but I cannot see them since the logs have been truncated.
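Rather than searching for 404s by hand, the access logs can be scanned for paths whose most recent successful GET or PUT is followed by a GET returning 404, which is the pattern in the access log excerpt above. This is a sketch: the regular expression is fitted to that excerpt (real lines may carry a client address or other fields before the timestamp, which `re.search` tolerates), the input is assumed to be in time order, and logs from all nodes would need to be merged and sorted first because requests for one key reach several nodes. `vanished_objects` is a hypothetical name.

```python
import re
from datetime import datetime

# Fitted to lines like:
# [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/<key> HTTP/1.0" 200 14923 ...
ACCESS_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def vanished_objects(lines):
    """Map path -> ((method, time) of last 200, time of first later GET 404).

    Only paths that returned 200 to a GET or PUT and afterwards 404 to a GET
    are reported. Lines must be in chronological order.
    """
    last_ok = {}
    vanished = {}
    for line in lines:
        m = ACCESS_RE.search(line)
        if not m or m.group("method") not in ("GET", "PUT"):
            continue
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        method, path, status = m.group("method"), m.group("path"), m.group("status")
        if status == "200":
            last_ok[path] = (method, ts)
        elif method == "GET" and status == "404" and path in last_ok:
            # Keep only the first 404 seen after a success.
            vanished.setdefault(path, (last_ok[path], ts))
    return vanished
```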

The cluster now has 7 nodes instead of 9 (see earlier emails in this thread), and the riak storage backend is now leveldb instead of multi. I have attached config file templates for riak, riak-cs and stanchion (these are deployed with ansible).

Bucket properties:
{
  "props": {
    "notfound_ok": true,
    "n_val": 3,
    "last_write_wins": false,
    "allow_mult": true,
    "dvv_enabled": false,
    "name": "blobdb",
    "r": "quorum",
    "precommit": [],
    "old_vclock": 86400,
    "dw": "quorum",
    "rw": "quorum",
    "small_vclock": 50,
    "write_once": false,
    "basic_quorum": false,
    "big_vclock": 50,
    "chash_keyfun": {
      "fun": "chash_std_keyfun",
      "mod": "riak_core_util"
    },
    "postcommit": [],
    "pw": 0,
    "w": "quorum",
    "young_vclock": 20,
    "pr": 0,
    "linkfun": {
      "fun": "mapreduce_linkfun",
      "mod": "riak_kv_wm_link_walker"
    }
  }
}
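Two of these properties interact in a way worth spelling out. The symbolic value `quorum` means a strict majority of `n_val`, so with `n_val` 3 the bucket-level `r` and `w` resolve to 2, and with `notfound_ok` set to true a replica that answers "not found" counts toward `r`. The toy model below (hypothetical `simplified_get`; it is not Riak's get FSM and ignores read repair, sibling merging, `basic_quorum`, and whatever read options Riak CS itself uses for manifests and blocks) shows the consequence: if an object is missing from two of its three replicas and those two answer first, the read reports not found even though one replica still holds it.

```python
def quorum(n_val):
    """Riak's symbolic 'quorum': a strict majority of the n_val replicas."""
    return n_val // 2 + 1


def simplified_get(replies, n_val=3, r="quorum"):
    """Toy model of a read with notfound_ok=true.

    `replies` lists each replica's answer in arrival order: "object" or
    "notfound". Because notfound counts toward R here, only the first R
    replies decide the outcome.
    """
    r_needed = quorum(n_val) if r == "quorum" else int(r)
    deciding = replies[:r_needed]
    return "object" if "object" in deciding else "notfound"


print(quorum(3))                                           # 2
print(simplified_get(["notfound", "notfound", "object"]))  # notfound
print(simplified_get(["notfound", "object", "notfound"]))  # object
```

Keep in mind that read repair and AAE would normally be expected to copy a surviving replica back after such a read, so this race on its own would not obviously explain GETs that keep returning 404 on every node.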

I'll be happy to provide more context to help troubleshoot this issue.

Thanks in advance for any help you can provide.

Daniel


On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <[hidden email]> wrote:

config-files.zip (15K)

Re: Object not found after successful PUT on S3 API

Russell Brown
Hi,
Would be good to know the riak version, and why the dvv_enabled bucket property is set to false, please? Also, is there multi-datacentre replication involved? Do you re-use your keys, for example, have the keys in question been created, deleted, and then re-created?

Cheers

Russell

On 6 Mar 2017, at 15:07, Daniel Miller <[hidden email]> wrote:

> I recently had another case of a disappearing object. This time the object was successfully PUT, and (unlike the previous cases reported in this thread) for a period of time GETs were also successful. Then GETs started 404ing for no apparent reason. There are no errors in the logs to indicate that anything unusual happened. This is quite disconcerting. Is it normal that Riak CS just loses track of objects? At this point we are using CS as primary object storage, meaning we do not have the data stored in another database so it's critical that the data is not randomly lost.
>
> In the CS access logs I see
>
> # all prior GET requests for this object succeeding like this one. This is the last successful GET request:
> [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> ...
> # all GET requests for this object are now failing like this one (the first 404):
> [02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
>
> The object name has been elided for readability. I do not know when this object was PUT into the cluster because I only have logs for the past month. Is there any way to dig further into Riak or Riak CS data to determine if the object content is actually completely lost or if there are any other details that might explain why it is now missing? Could I increase some logging parameters to get more information about what is going wrong when something like this happens?
>
> I have searched the logs for other 404 responses but found none (other than the two reported earlier), so this is the 3rd known missing object in the cluster. We retain logs for one month only (I'm increasing this now because of this issue), so it is possible that other objects have also gone missing, but I cannot see them since the logs have been truncated.
>
> The cluster now has 7 nodes instead of 9 (see earlier emails in this thread), and the riak storage backend is now leveldb instead of multi. I have attached config file templates for riak, raik-cs and stanchion (these are deployed with ansible).
>
> Bucket properties:
> {
>   "props": {
>     "notfound_ok": true,
>     "n_val": 3,
>     "last_write_wins": false,
>     "allow_mult": true,
>     "dvv_enabled": false,
>     "name": "blobdb",
>     "r": "quorum",
>     "precommit": [],
>     "old_vclock": 86400,
>     "dw": "quorum",
>     "rw": "quorum",
>     "small_vclock": 50,
>     "write_once": false,
>     "basic_quorum": false,
>     "big_vclock": 50,
>     "chash_keyfun": {
>       "fun": "chash_std_keyfun",
>       "mod": "riak_core_util"
>     },
>     "postcommit": [],
>     "pw": 0,
>     "w": "quorum",
>     "young_vclock": 20,
>     "pr": 0,
>     "linkfun": {
>       "fun": "mapreduce_linkfun",
>       "mod": "riak_kv_wm_link_walker"
>     }
>   }
> }
>
> I'll be happy to provide more context to help troubleshoot this issue.
>
> Thanks in advance for any help you can provide.
>
> Daniel
>
>
> On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <[hidden email]> wrote:
> Hi Luke,
>
> Sorry for the late response and thanks for following up. I haven't seen it happen since. At this point I'm going to wait and see if it happens again and hopefully get more details about what might be causing it.
>
> Daniel
>
> On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <[hidden email]> wrote:
> Hi Daniel -
>
> I don't have any ideas at this point. Has this scenario happened again?
>
> --
> Luke Bakken
> Engineer
> [hidden email]
>
>
> On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <[hidden email]> wrote:
> > Thanks for the quick response, Luke.
> >
> > There is nothing unusual about the keys. The format is a name + UUID + some
> > other random URL-encoded charaters, like most other keys in our cluster.
> >
> > There are no errors near the time of the incident in any of the logs (the
> > last [error] is from over a month before). I see lots of messages like this
> > in console.log:
> >
> > /var/log/riak/console.log
> > 2017-01-20 15:38:10.184 [info]
> > <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'[hidden email]'}
> > and
> > {822094670998632891489572718402909198556462055424,'[hidden email]'}
> > 2017-01-20 15:40:39.640 [info]
> > <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during
> > active anti-entropy exchange of
> > {936274486415109681974235595958868809467081785344,3} between
> > {959110449498405040071168171470060731649205731328,'[hidden email]'}
> > and
> > {981946412581700398168100746981252653831329677312,'[hidden email]'}
> > 2017-01-20 15:46:40.918 [info]
> > <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {662242929415565384811044689824565743281594433536,3} between
> > {685078892498860742907977265335757665463718379520,'[hidden email]'}
> > and
> > {707914855582156101004909840846949587645842325504,'[hidden email]'}
> > 2017-01-20 15:48:25.597 [info]
> > <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'[hidden email]'}
> > and
> > {799258707915337533392640142891717276374338109440,'[hidden email]'}
> >
> > Thanks!
> > Daniel
> >
> >
> >
> > On Wed, Jan 25, 2017 at 9:45 AM, Luke Bakken <[hidden email]> wrote:
> >>
> >> Hi Daniel -
> >>
> >> This is a strange scenario. I recommend looking at all of the log
> >> files for "[error]" or other entries at about the same time as these
> >> PUTs or 404 responses.
> >>
> >> Is there anything unusual about the key being used?
> >> --
> >> Luke Bakken
> >> Engineer
> >> [hidden email]
> >>
> >>
> >> On Wed, Jan 25, 2017 at 6:40 AM, Daniel Miller <[hidden email]> wrote:
> >> > I have a 9-node Riak CS cluster that has been working flawlessly for
> >> > about 3 months. The cluster configuration, including backend and bucket
> >> > parameters such as N-value, uses default settings. I'm using the S3 API
> >> > to communicate with the cluster.
> >> >
> >> > Within the past week I had an issue where two objects were PUT resulting
> >> > in a 200 (success) response, but all subsequent GET requests for those
> >> > two keys return a status of 404 (not found). Other than the fact that
> >> > they are now missing, there was nothing out of the ordinary with these
> >> > particular two PUTs. Maybe I'm missing something, but this seems like a
> >> > scenario that should never happen. All information included here about
> >> > PUTs and GETs comes from reviewing the CS access logs. Both objects were
> >> > PUT on the same node; however, GET requests returning 404 have been
> >> > observed on all nodes. There is plenty of other traffic on the cluster
> >> > involving GETs and PUTs that are not failing. I'm unsure of how to
> >> > troubleshoot further to find out what may have happened to those objects
> >> > and why they are now missing. What is the best approach to figure out
> >> > why an object that was successfully PUT seems to be missing?
> >> >
> >> > Thanks!
> >> > Daniel Miller
> >> >
> >> > _______________________________________________
> >> > riak-users mailing list
> >> > [hidden email]
> >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >> >
> >
> >
>
>
> <config-files.zip>

Re: Object not found after successful PUT on S3 API

Daniel Miller
> Would be good to know the riak version

Riak 2.1.1
Riak CS 2.1.0
Stanchion 2.1.0

> why the dvv_enabled bucket property is set to false, please?

Looks like that's the default. I haven't changed it.

> Also, is there multi-datacentre replication involved?

no

> Do you re-use your keys, for example, have the keys in question been created, deleted, and then re-created?

no

Thank you for the prompt follow-up.

Daniel


On Mon, Mar 6, 2017 at 10:38 AM, Russell Brown <[hidden email]> wrote:
Hi,
Would be good to know the riak version, and why the dvv_enabled bucket property is set to false, please? Also, is there multi-datacentre replication involved? Do you re-use your keys, for example, have the keys in question been created, deleted, and then re-created?

Cheers

Russell

On 6 Mar 2017, at 15:07, Daniel Miller <[hidden email]> wrote:

> I recently had another case of a disappearing object. This time the object was successfully PUT, and (unlike the previous cases reported in this thread) for a period of time GETs were also successful. Then GETs started 404ing for no apparent reason. There are no errors in the logs to indicate that anything unusual happened. This is quite disconcerting. Is it normal that Riak CS just loses track of objects? At this point we are using CS as primary object storage, meaning we do not have the data stored in another database so it's critical that the data is not randomly lost.
>
> In the CS access logs I see
>
> # all prior GET requests for this object succeeding like this one. This is the last successful GET request:
> [28/Feb/2017:14:42:35 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 200 14923 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
> ...
> # all GET requests for this object are now failing like this one (the first 404):
> [02/Mar/2017:08:36:11 +0000] "GET /buckets/blobdb/objects/commcarehq__apps%2F3d2b... HTTP/1.0" 404 240 "" "Boto3/1.4.0 Python/2.7.6 Linux/3.13.0-86-generic Botocore/1.4.53 Resource"
>
> The object name has been elided for readability. I do not know when this object was PUT into the cluster because I only have logs for the past month. Is there any way to dig further into Riak or Riak CS data to determine if the object content is actually completely lost or if there are any other details that might explain why it is now missing? Could I increase some logging parameters to get more information about what is going wrong when something like this happens?
>
> I have searched the logs for other 404 responses but found none (other than the two reported earlier), so this is the 3rd known missing object in the cluster. We retain logs for one month only (I'm increasing this now because of this issue), so it is possible that other objects have also gone missing, but I cannot see them since the logs have been truncated.
>
> The cluster now has 7 nodes instead of 9 (see earlier emails in this thread), and the riak storage backend is now leveldb instead of multi. I have attached config file templates for riak, riak-cs and stanchion (these are deployed with ansible).
>
> Bucket properties:
> {
>   "props": {
>     "notfound_ok": true,
>     "n_val": 3,
>     "last_write_wins": false,
>     "allow_mult": true,
>     "dvv_enabled": false,
>     "name": "blobdb",
>     "r": "quorum",
>     "precommit": [],
>     "old_vclock": 86400,
>     "dw": "quorum",
>     "rw": "quorum",
>     "small_vclock": 50,
>     "write_once": false,
>     "basic_quorum": false,
>     "big_vclock": 50,
>     "chash_keyfun": {
>       "fun": "chash_std_keyfun",
>       "mod": "riak_core_util"
>     },
>     "postcommit": [],
>     "pw": 0,
>     "w": "quorum",
>     "young_vclock": 20,
>     "pr": 0,
>     "linkfun": {
>       "fun": "mapreduce_linkfun",
>       "mod": "riak_kv_wm_link_walker"
>     }
>   }
> }
>
> I'll be happy to provide more context to help troubleshoot this issue.
>
> Thanks in advance for any help you can provide.
>
> Daniel
>
>
> On Tue, Feb 14, 2017 at 11:52 AM, Daniel Miller <[hidden email]> wrote:
> Hi Luke,
>
> Sorry for the late response and thanks for following up. I haven't seen it happen since. At this point I'm going to wait and see if it happens again and hopefully get more details about what might be causing it.
>
> Daniel
>
> On Thu, Feb 9, 2017 at 1:02 PM, Luke Bakken <[hidden email]> wrote:
> Hi Daniel -
>
> I don't have any ideas at this point. Has this scenario happened again?
>
> --
> Luke Bakken
> Engineer
> [hidden email]
>
>
> On Wed, Jan 25, 2017 at 2:11 PM, Daniel Miller <[hidden email]> wrote:
> > Thanks for the quick response, Luke.
> >
> > There is nothing unusual about the keys. The format is a name + UUID + some
> > other random URL-encoded characters, like most other keys in our cluster.
> >
> > There are no errors near the time of the incident in any of the logs (the
> > last [error] is from over a month before). I see lots of messages like this
> > in console.log:
> >
> > /var/log/riak/console.log
> > 2017-01-20 15:38:10.184 [info]
> > <0.22902.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'[hidden email]'}
> > and
> > {822094670998632891489572718402909198556462055424,'[hidden email]'}
> > 2017-01-20 15:40:39.640 [info]
> > <0.21789.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 1 keys during
> > active anti-entropy exchange of
> > {936274486415109681974235595958868809467081785344,3} between
> > {959110449498405040071168171470060731649205731328,'[hidden email]'}
> > and
> > {981946412581700398168100746981252653831329677312,'[hidden email]'}
> > 2017-01-20 15:46:40.918 [info]
> > <0.13986.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {662242929415565384811044689824565743281594433536,3} between
> > {685078892498860742907977265335757665463718379520,'[hidden email]'}
> > and
> > {707914855582156101004909840846949587645842325504,'[hidden email]'}
> > 2017-01-20 15:48:25.597 [info]
> > <0.29943.1193>@riak_kv_exchange_fsm:key_exchange:263 Repaired 2 keys during
> > active anti-entropy exchange of
> > {776422744832042175295707567380525354192214163456,3} between
> > {776422744832042175295707567380525354192214163456,'[hidden email]'}
> > and
> > {799258707915337533392640142891717276374338109440,'[hidden email]'}
> >
> > Thanks!
> > Daniel
> <config-files.zip>

Re: Object not found after successful PUT on S3 API

Russell Brown-4
Genuinely stumped then.

I’m surprised that dvv_enabled=false is the default, as sibling explosion is bad.

I don’t know the CS code very well, but I assume a not_found means that either the manifest or some chunk is not found. I wonder if you can get the manifest and then see if any/all of the chunks are present?
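A minimal Python sketch of the first step of that check: working out which Riak KV buckets to look in. It assumes (from the Riak CS source, not from this thread) that Riak CS keeps manifests in a Riak bucket named "0o:" followed by the raw 16-byte MD5 digest of the S3 bucket name, and blocks in a bucket named "0b:" plus the same digest; "0b:" is also the prefix used in the multi_backend_prefix_list setting discussed in this thread. The helper names are hypothetical, and the assumption that a manifest is stored under the S3 object key should be verified before relying on it.

```python
import hashlib
from urllib.parse import quote

# Assumed Riak CS internal bucket prefixes; not confirmed anywhere in this thread.
MANIFEST_PREFIX = b"0o:"  # assumed: Riak bucket holding object manifests
BLOCK_PREFIX = b"0b:"     # assumed: Riak bucket holding data blocks


def cs_riak_buckets(s3_bucket):
    """Return the assumed Riak KV bucket names for one S3 bucket's manifests and blocks."""
    digest = hashlib.md5(s3_bucket.encode("utf-8")).digest()  # 16 raw bytes, not hex
    return {"manifests": MANIFEST_PREFIX + digest, "blocks": BLOCK_PREFIX + digest}


def riak_http_path(riak_bucket, key):
    """Build a Riak KV HTTP GET path with a percent-encoded binary bucket name.

    Hypothetical helper: assumes the manifest is stored under the S3 object key.
    """
    return "/buckets/%s/keys/%s" % (quote(riak_bucket, safe=""), quote(key, safe=""))


if __name__ == "__main__":
    names = cs_riak_buckets("blobdb")
    print(names["manifests"].hex(), names["blocks"].hex())
    # "example/key" is a placeholder, not one of the keys from this thread.
    print(riak_http_path(names["manifests"], "example/key"))
```

Decoding the manifest value itself (assumed to be a serialized Erlang term) and locating the individual blocks it references are not attempted here.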

On 6 Mar 2017, at 17:21, Daniel Miller <[hidden email]> wrote:

> > Would be good to know the riak version
>
> Riak 2.1.1
> Riak CS 2.1.0
> Stanchion 2.1.0
>
> > why the dvv_enabled bucket property is set to false, please?
>
> Looks like that's the default. I haven't changed it.
>
> > Also, is there multi-datacentre replication involved?
>
> no
>
> > Do you re-use your keys, for example, have the keys in question been created, deleted, and then re-created?
>
> no
>
> Thank you for the prompt follow-up.
>
> Daniel

Re: Object not found after successful PUT on S3 API

Luke Bakken
In reply to this post by Daniel Miller
Hi Daniel -

Did you forget to include the advanced.config file in your archive of
configuration files? I only see three *.conf.j2 files. The reason I
ask is that the following settings are critical to Riak CS functioning
correctly:

http://docs.basho.com/riak/cs/2.1.1/cookbooks/configuration/riak-for-cs/#setting-up-the-proper-riak-backend

I realize you have replaced the "be_blocks" backend with leveldb, but I
would like to confirm that you have the other settings.

In fact it would be best to archive the generated.configs directory
from one of your Riak nodes to include here.

Thanks

--
Luke Bakken
Engineer
[hidden email]


On Mon, Mar 6, 2017 at 7:07 AM, Daniel Miller <[hidden email]> wrote:
> I recently had another case of a disappearing object. This time the object
> was successfully PUT, and (unlike the previous cases reported in this
> thread) for a period of time GETs were also successful. Then GETs started
> 404ing for no apparent reason. There are no errors in the logs to indicate
> that anything unusual happened. This is quite disconcerting. Is it normal
> that Riak CS just loses track of objects? At this point we are using CS as
> primary object storage, meaning we do not have the data stored in another
> database so it's critical that the data is not randomly lost


Re: Object not found after successful PUT on S3 API

Daniel Miller
Hi Luke,

I do not have an advanced.config file since I switched to the leveldb storage backend. Generated configs attached.

Hopefully this is not relevant, but the data root is on an ecryptfs volume.

On Mon, Mar 6, 2017 at 1:23 PM, Luke Bakken <[hidden email]> wrote:
Hi Daniel -

Did you forget to include the advanced.config file in your archive of
configuration files? I only see three *.conf.j2 files. The reason I
ask is that the following settings are critical to Riak CS functioning
correctly:

http://docs.basho.com/riak/cs/2.1.1/cookbooks/configuration/riak-for-cs/#setting-up-the-proper-riak-backend

generated.configs.tbz (3K)

Re: Object not found after successful PUT on S3 API

Luke Bakken
Hi Daniel,

Two questions:

* Do you happen to have an /etc/riak/app.config file present?

* On one of your Riak nodes, could you please execute the following commands:

riak attach
rp(application:get_all_env(riak_kv)).

Copy the output of the previous command and attach as a separate file
to your response. Please note that the period is significant. Use
CTRL-C CTRL-C to exit the "riak attach" session.
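Once that output has been saved to a file, a quick text check can show whether the Riak CS backend settings are present. A rough Python sketch, assuming the saved file contains the printed riak_kv environment; it does regular-expression matching on the printed Erlang terms rather than parsing them, and the setting names it looks for are taken from the advanced.config example given in this thread. The script and function names are hypothetical.

```python
import re

# Expected riak_kv settings, taken from the advanced.config example in this thread.
# Each pattern allows for the whitespace and line wrapping that rp/1 may add.
EXPECTED = {
    "storage_backend": r"\{storage_backend\s*,\s*riak_cs_kv_multi_backend\b",
    "multi_backend_prefix_list": r'\{multi_backend_prefix_list\s*,\s*\[\s*\{\s*<<"0b:">>',
    "add_paths": r"\{add_paths\s*,",
}


def missing_cs_settings(env_dump):
    """Return the names of expected settings not found in the printed riak_kv env."""
    return [name for name, pattern in EXPECTED.items() if not re.search(pattern, env_dump)]


if __name__ == "__main__":
    import sys

    # Usage: python check_env.py riak-env.txt
    with open(sys.argv[1]) as fh:
        print(missing_cs_settings(fh.read()) or "all expected settings found")
```

An empty result only means the expected terms appear in the dump; it does not check that the add_paths directory actually exists on the node.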

--
Luke Bakken
Engineer
[hidden email]


On Mon, Mar 6, 2017 at 11:29 AM, Daniel Miller <[hidden email]> wrote:
> Hi Luke,
>
> I do not have an advanced.config file since I switched to leveldb storage
> backend. Generated configs attached.
>
> Hopefully not relevant, the data root is on an ecryptfs volume.

_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Object not found after successful PUT on S3 API

Daniel Miller
Responses inline.

On Mon, Mar 6, 2017 at 3:04 PM, Luke Bakken <[hidden email]> wrote:
Hi Daniel,

Two questions:

* Do you happen to have an /etc/riak/app.config file present?

No.

Not sure if it's relevant, but I did notice that /etc/riak-cs/advanced.config does exist, which contradicts what I said earlier. This is surprising to me because I did not create this file. Maybe it was created by the riak installer? Anyway, the content is:

$ cat /etc/riak-cs/advanced.config
[
 {riak_cs,
  [
  ]}
].

 

* On one of your Riak nodes, could you please execute the following commands:

riak attach
rp(application:get_all_env(riak_kv)).

Copy the output of the previous command and attach as a separate file
to your response. Please note that the period is significant. Use
CTRL-C CTRL-C to exit the "riak attach" session.

Attached.



riak-env.txt (2K)

Re: Object not found after successful PUT on S3 API

Daniel Miller
Also, I just realized I gave the wrong platform_data_dir in the config-files.zip I sent yesterday. The correct path is shown in the riak-env.txt and generated.configs.tbz attachments I sent later.

On Tue, Mar 7, 2017 at 9:47 AM, Daniel Miller <[hidden email]> wrote:
Responses inline.


Re: Object not found after successful PUT on S3 API

Luke Bakken
In reply to this post by Daniel Miller
Hi Daniel,

Thanks for providing all of that information.

You are missing important configuration for riak_kv that can only be provided in an /etc/riak/advanced.config file. Please see the following document, especially the section to which I link here:

http://docs.basho.com/riak/cs/2.1.1/cookbooks/configuration/riak-for-cs/#setting-up-the-proper-riak-backend

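[
    {riak_kv, [
        % NOTE: double-check this path for your environment:
        % (Riak CS 2.1.0 is reported earlier in this thread, so the versioned
        % directory name may differ from riak_cs-2.1.1.)
        % add_paths lets Riak load the Riak CS backend module named below.
        {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
        % Riak CS's multi-backend, which routes data by bucket-name prefix.
        {storage_backend, riak_cs_kv_multi_backend},
        % Buckets whose names start with "0b:" (Riak CS block data) go to be_blocks...
        {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
        % ...and every other bucket (for example, manifests) goes to be_default.
        {multi_backend_default, be_default},
        {multi_backend, [
            {be_default, riak_kv_eleveldb_backend, [
                {data_root, "/opt/data/ecryptfs/riak"}
            ]},
            {be_blocks, riak_kv_eleveldb_backend, [
                {data_root, "/opt/data/ecryptfs/riak_blocks"}
            ]}
        ]}
    ]}
].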

Your configuration will look like the above. The contents of this file are merged with the contents of /etc/riak/riak.conf to produce the configuration that Riak uses.

Notice that I chose riak_kv_eleveldb_backend twice because of the discussion you had previously about RAM usage and bitcask (http://lists.basho.com/pipermail/riak-users_lists.basho.com/2016-November/018801.html)

In your current configuration, you are not using the expected prefix for the block data. My guess is that on very rare occasions your data happens to overwrite the manifest for a file. You may also have corrupted files at this point without noticing it at all.
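The role of the prefix can be illustrated with a small model of prefix-based routing: under the settings above, a Riak bucket whose name begins with "0b:" goes to be_blocks, and any other bucket falls through to be_default. This is a simplified Python model written for illustration, not the logic of riak_cs_kv_multi_backend itself; the bucket names in the example are hypothetical.

```python
# Simplified model of routing by multi_backend_prefix_list / multi_backend_default.
# Illustrative only; this is not the riak_cs_kv_multi_backend source.
PREFIX_LIST = [(b"0b:", "be_blocks")]
DEFAULT_BACKEND = "be_default"


def backend_for(riak_bucket, prefix_list=PREFIX_LIST, default=DEFAULT_BACKEND):
    """Return the name of the backend a Riak bucket would be routed to."""
    for prefix, backend in prefix_list:
        if riak_bucket.startswith(prefix):
            return backend
    return default


if __name__ == "__main__":
    # Hypothetical bucket names: one with the block prefix, two without.
    for bucket in (b"0b:\x01\x02", b"0o:\x01\x02", b"example-other-bucket"):
        print(bucket, "->", backend_for(bucket))
```

Passing an empty prefix list models a single-backend setup, in which every bucket is stored in the same backend.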

IMPORTANT: you can't switch from your current configuration to this new one without re-saving all of your data.
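Since some files may already be corrupted without anyone noticing, reading every object back before re-saving may be worthwhile. A minimal sketch, assuming a Boto3 S3 client already configured for the Riak CS endpoint (client setup omitted; the function name is hypothetical). It lists the bucket with the "list_objects" paginator, downloads each object, and compares its MD5 with the ETag. The ETag comparison assumes single-PUT uploads whose ETag is the hex MD5 of the body; ETags containing "-" (multipart-style) are skipped.

```python
import hashlib


def audit_bucket(s3, bucket):
    """Return (missing, mismatched) key lists for every object listed in `bucket`.

    `s3` is assumed to be a boto3 S3 client pointed at the Riak CS endpoint.
    """
    missing, mismatched = [], []
    paginator = s3.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=bucket):
        for entry in page.get("Contents", []):
            key = entry["Key"]
            try:
                obj = s3.get_object(Bucket=bucket, Key=key)
            except Exception as exc:  # botocore raises ClientError; not imported here
                code = getattr(exc, "response", {}).get("Error", {}).get("Code")
                if code in ("NoSuchKey", "404"):
                    missing.append(key)
                    continue
                raise
            body = obj["Body"].read()
            etag = obj.get("ETag", "").strip('"')
            # Assumed: single-PUT ETag == hex MD5 of the body; skip multipart-style ETags.
            if "-" not in etag and hashlib.md5(body).hexdigest() != etag:
                mismatched.append(key)
    return missing, mismatched
```

This can only flag keys that still appear in the bucket listing; an object whose manifest is gone may not be listed at all, so comparing against an independent list of expected keys, if one exists, would be more thorough.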

--
Luke Bakken
Engineer
[hidden email]

On Tue, Mar 7, 2017 at 6:47 AM, Daniel Miller <[hidden email]> wrote:
Responses inline.


Re: Object not found after successful PUT on S3 API

Daniel Miller
Thanks for taking the time to look into this, Luke. I should have asked more questions when I set up the configuration for the leveldb backend, since there is no clear documentation on how to configure CS with leveldb only.

In your current configuration, you are not using the expected prefix for the block data. My guess is that on very rare occasions your data happens to overwrite the manifest for a file. You may also have corrupted files at this point without noticing it at all.

IMPORTANT: you can't switch from your current configuration to this new one without re-saving all of your data.

Do you have a recommendation for getting my data into a correct state? For example, will it work if I create new nodes and replace each existing node with a new, correctly configured node? Or do I need a more involved migration process?
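One possible way to re-save the data (an assumption, not an answer given in this thread, and not verified against Riak CS) is to copy every object through the S3 API into a new cluster configured as Luke describes, reading each one back to check it. A minimal sketch, assuming two Boto3 S3 clients, one per cluster, and that the target bucket already exists; the function name is hypothetical.

```python
import hashlib


def copy_bucket(src, dst, bucket):
    """Copy every listed object in `bucket` from client `src` to client `dst`.

    Returns the keys whose read-back content did not match. Only Content-Type is
    carried over; user metadata and ACLs are not copied in this sketch.
    """
    failures = []
    for page in src.get_paginator("list_objects").paginate(Bucket=bucket):
        for entry in page.get("Contents", []):
            key = entry["Key"]
            obj = src.get_object(Bucket=bucket, Key=key)
            body = obj["Body"].read()  # whole object held in memory
            extra = {"ContentType": obj["ContentType"]} if "ContentType" in obj else {}
            dst.put_object(Bucket=bucket, Key=key, Body=body, **extra)
            # Read back from the new cluster and compare digests.
            check = dst.get_object(Bucket=bucket, Key=key)["Body"].read()
            if hashlib.md5(check).digest() != hashlib.md5(body).digest():
                failures.append(key)
    return failures
```

Because each object is held in memory in full, very large objects would need a streaming or multipart approach instead.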
 

On Tue, Mar 7, 2017 at 3:58 PM, Luke Bakken <[hidden email]> wrote:
Hi Daniel,

Thanks for providing all of that information.

You are missing important configuration for riak_kv that can only be provided in an /etc/riak/advanced.config file. Please see the following document, especially the section to which I link here:

http://docs.basho.com/riak/cs/2.1.1/cookbooks/configuration/riak-for-cs/#setting-up-the-proper-riak-backend

[
    {riak_kv, [
        % NOTE: double-check this path for your environment:
        {add_paths, ["/usr/lib/riak-cs/lib/riak_cs-2.1.1/ebin"]},
        {storage_backend, riak_cs_kv_multi_backend},
        {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]},
        {multi_backend_default, be_default},
        {multi_backend, [
            {be_default, riak_kv_eleveldb_backend, [
                {data_root, "/opt/data/ecryptfs/riak"}
            ]},
            {be_blocks, riak_kv_eleveldb_backend, [
                {data_root, "/opt/data/ecryptfs/riak_blocks"}
            ]}
        ]}
    ]}

].

Your configuration will look like the above. The contents of this file are merged with the contents of /etc/riak/riak.conf to produce the configuration that Riak uses.

Notice that I chose riak_kv_eleveldb_backend twice because of the discussion you had previously about RAM usage and bitcask (http://lists.basho.com/pipermail/riak-users_lists.basho.com/2016-November/018801.html)

In your current configuration, you are not using the expected prefix for the block data. My guess is that on very rare occasions your data happens to overwrite the manifest for a file. You may also have corrupted files at this point without noticing it at all.
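To make the prefix point concrete, here is a small, hypothetical shell sketch of the routing that the multi_backend_prefix_list entry above asks for. It is not Riak's code (the real dispatch happens inside riak_cs_kv_multi_backend); it only assumes, as the config implies, that buckets whose names begin with "0b:" hold block data and go to be_blocks, while every other bucket falls through to multi_backend_default. The function name route_bucket and the example bucket names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of prefix-based backend selection, mirroring:
#   {multi_backend_prefix_list, [{<<"0b:">>, be_blocks}]}
#   {multi_backend_default, be_default}
# Illustration only; Riak performs this inside riak_cs_kv_multi_backend.
route_bucket() {
    case "$1" in
        0b:*) echo "be_blocks" ;;   # block (chunk) buckets
        *)    echo "be_default" ;;  # everything else, e.g. manifests (0o: is an assumed prefix)
    esac
}

route_bucket '0b:example'   # prints be_blocks
route_bucket '0o:example'   # prints be_default
```

With a plain "storage_backend = leveldb" setting there is no such split at all: every bucket lands in the one configured backend.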

IMPORTANT: you can't switch from your current configuration to this new one without re-saving all of your data.

--
Luke Bakken
Engineer
[hidden email]

On Tue, Mar 7, 2017 at 6:47 AM, Daniel Miller <[hidden email]> wrote:
Responses inline.

On Mon, Mar 6, 2017 at 3:04 PM, Luke Bakken <[hidden email]> wrote:
Hi Daniel,

Two questions:

* Do you happen to have an /etc/riak/app.config file present?

No.

Not sure if relevant, but I did notice that /etc/riak-cs/advanced.config does exist, which contradicts what I said earlier. This is surprising to me because I did not create this file. Maybe it was created by the Riak installer? Anyway, the content is:

$ cat /etc/riak-cs/advanced.config
[
 {riak_cs,
  [
  ]}
].

 

* On one of your Riak nodes, could you please execute the following commands:

riak attach
rp(application:get_all_env(riak_kv)).

Copy the output of the previous command and attach as a separate file
to your response. Please note that the period is significant. Use
CTRL-C CTRL-C to exit the "riak attach" session.

Attached.

Fwd: Object not found after successful PUT on S3 API

Luke Bakken
> Thanks for taking the time to look into this, Luke. I should have asked more questions when I set up the configuration for the leveldb backend, since there is no clear documentation on how to configure CS with leveldb only.

The reason for this is that a leveldb-only configuration is neither supported nor tested. I re-read your previous thread and found this message, which gave instructions not to use the multi backend (https://goo.gl/BL6HXI). I now believe those instructions are incorrect: you still must use riak_cs_kv_multi_backend, with riak_kv_eleveldb_backend as each sub-backend.

> Do you have a recommendation for getting my data to the new state? For example, will it work if I create new nodes and replace each existing node with a new, correctly configured node? Or do I need a more involved migration process?

Since you are changing your backend configuration completely, the best path forward is to set up an entirely new cluster and re-save your data there through the API. As I mentioned in my last email, there is no guarantee your current data isn't corrupted somehow.
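Once the data has been re-saved on the new cluster, one way to confirm nothing was dropped along the way is to compare key listings from the old and new clusters. This is a minimal sketch, assuming you have already exported one object key per line from each cluster into plain-text files (for example from an S3 client's recursive listing; how those files are produced is left open). The function name missing_keys and the file names are hypothetical.

```shell
#!/bin/sh
# Print keys that appear in the old cluster's listing but not in the new one.
# Both arguments are plain-text files with one object key per line.
missing_keys() {
    old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted with the same collation; LC_ALL=C keeps
    # the ordering byte-wise and consistent between sort and comm.
    LC_ALL=C sort -u "$1" > "$old_sorted"
    LC_ALL=C sort -u "$2" > "$new_sorted"
    # -2 drops lines only in the new list, -3 drops lines in both,
    # leaving keys that exist only in the old listing.
    LC_ALL=C comm -23 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}
```

Usage: `missing_keys old-cluster-keys.txt new-cluster-keys.txt`; any output lines are keys that still need to be copied. It checks only that each key is present, not that its contents are intact, so it does not address the possible corruption mentioned above.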

--
Luke Bakken
Engineer
[hidden email]



Re: Object not found after successful PUT on S3 API

Daniel Miller
Hi Luke,

Again, thanks for your help. We are currently preparing to move all objects into a new cluster using the S3 API. One question on configuration: currently I have "storage_backend = leveldb" in my riak.conf. I assume that on the new cluster, in addition to using the advanced.config you provided, I also need to set "storage_backend = multi" in riak.conf – is that correct?

Referring back to the subject of this thread for a bit, I'm assuming your current theory for why the (most recent) object went missing is because we have a bad backend configuration. Note that that object went missing weeks after it was originally written into riak, and it was successfully retrieved many times before it went missing. Is there a way I can query riak to verify your theory that the manifest was overwritten? Russel Brown suggested: "I wonder if you can get the manifest and then see if any/all of the chunks are present?" Would that help to answer the question about why the object went missing? Can you provide any hints on how to do that?

While the bad configuration may be the cause of this most recent object going missing, it does not explain the original two objects that went missing immediately after they were PUT. Those original incidents happened while our cluster was still using the bitcask/multi backend, so they should not have been affected by the bad configuration.

~ Daniel


Re: Object not found after successful PUT on S3 API

Alexander Sicular-2
Hi Daniel,

Riak CS uses multi by default. By default the manifests are stored in leveldb and the blobs/chunks are stored in bitcask. If you're looking to force everything to level you should remove multi and use level as the backend setting. As Luke noted elsewhere, this configuration hasn't been fully tested and is not supported. 

Off the top of my head, take a look at the email Martin (?) sent about his modified level backend a few weeks ago for reasons why using level for data chunks may not be the best idea at this time. 

Thanks,
Alexander 

@siculars

Sent from my iRotaryPhone

On Mar 10, 2017, at 10:50, Daniel Miller <[hidden email]> wrote:

Hi Luke,

Again, thanks for your help. We are currently preparing to move all objects into a new cluster using the S3 API. One question on configuration: currently I have "storage_backend = leveldb" in my riak.conf. I assume that on the new cluster, in addition to using the advanced.config you provided, I also need to set "storage_backend = multi" in riak.conf – is that correct?

Referring back to the subject of this thread for a bit, I'm assuming your current theory for why the (most recent) object went missing is because we have a bad backend configuration. Note that that object went missing weeks after it was originally written into riak, and it was successfully retrieved many times before it went missing. Is there a way I can query riak to verify your theory that the manifest was overwritten? Russel Brown suggested: "I wonder if you can get the manifest and then see if any/all of the chunks are present?" Would that help to answer the question about why the object went missing? Can you provide any hints on how to do that?

While the bad configuration may be the cause of this most recent object going missing, it does not explain the original two objects that went missing immediately after they were PUT. Those original incidents happened while our cluster was still using the bitcask/multi backend, so they should not have been affected by the bad configuration.

~ Daniel


Re: Object not found after successful PUT on S3 API

Luke Bakken
Just to clarify ...

What Alexander is suggesting is what Daniel is currently using, and
what I suspect may be causing Daniel's issues.

If you wish to run a leveldb-only Riak CS cluster, you still *must*
use the advanced.config file and the riak_cs_kv_multi_backend, and the
other settings that I mention in my response and in the docs. Notice
the multi_backend_prefix_list setting, for one thing.
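A rough way to spot-check each node for the settings listed above is to search the node's advanced.config for them. This is only a heuristic text search, assuming the file lives at /etc/riak/advanced.config as in this thread; it does not parse the Erlang terms, and the authoritative check is still inspecting the running node's environment with `riak attach` and `rp(application:get_all_env(riak_kv)).`. The function name check_cs_backend_config is hypothetical.

```shell
#!/bin/sh
# Heuristic check: does an advanced.config mention the settings needed for a
# leveldb-only Riak CS setup (multi backend plus block prefix routing)?
# Plain text search only; it does not validate Erlang syntax or confirm
# what the running node actually loaded.
check_cs_backend_config() {
    conf="${1:-/etc/riak/advanced.config}"
    rc=0
    for pattern in riak_cs_kv_multi_backend multi_backend_prefix_list '<<"0b:">>' add_paths; do
        # -F: fixed-string match, so <<"0b:">> is taken literally
        if ! grep -qF -- "$pattern" "$conf"; then
            echo "missing: $pattern"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `check_cs_backend_config /etc/riak/advanced.config && echo ok` on each of the nodes; any "missing:" line points at a setting to compare against the config above.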

Daniel -

The storage_backend setting in advanced.config will *override*
storage_backend in riak.conf. If you wish to ensure the riak.conf
setting is overridden, you may comment it out in that file.
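Commenting out that riak.conf line can be done by hand, or with a small sed sketch like the one below. It assumes the default /etc/riak/riak.conf path and keeps a riak.conf.bak backup; the function name disable_riak_conf_backend is hypothetical, and the result should be reviewed before restarting the node.

```shell
#!/bin/sh
# Comment out an active "storage_backend = ..." line in riak.conf so that
# advanced.config is the only place the backend is chosen.
# sed -i.bak edits in place and keeps riak.conf.bak (GNU and BSD sed).
# Lines already starting with "#" are left alone, so reruns are harmless.
disable_riak_conf_backend() {
    riak_conf="${1:-/etc/riak/riak.conf}"
    sed -i.bak 's/^\([[:space:]]*storage_backend[[:space:]]*=\)/# \1/' "$riak_conf"
}
```

The anchored pattern only matches a top-level storage_backend key, so keys such as multi_backend.<name>.storage_backend are not touched.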

--
Luke Bakken
Engineer
[hidden email]

On Fri, Mar 10, 2017 at 9:08 AM, Alexander Sicular <[hidden email]> wrote:

>
> Hi Daniel,
>
> Riak CS uses multi by default. By default the manifests are stored in leveldb and the blobs/chunks are stored in bitcask. If you're looking to force everything to level you should remove multi and use level as the backend setting. As Luke noted elsewhere, this configuration hasn't been fully tested and is not supported.
>
> Off the top of my head, take a look at the email Martin (?) sent about his modified level backend a few weeks ago for reasons why using level for data chunks may not be the best idea at this time.
>
> Thanks,
> Alexander
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone