We recently upgraded our production Elasticsearch clusters from 0.90.x to 1.2.x. During the upgrade, some shards failed to initialize, leaving the cluster in a red state. The logs showed many errors like the following:
essms-07.prod.rpc: [2015-07-11 07:22:15,639][WARN ][cluster.action.shard ] [essms-07.prod.rpc] [mention_2013_05] sending failed shard for [mention_2013_05], node[fIMV3YJlQcWapAdYzE62bw], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[mention_2013_05] failed recovery]; nested: EngineCreationFailureException[[mention_2013_05] failed to open reader on writer]; nested: CorruptIndexException[did not read all bytes from file: read 1214 vs size 1215 (resource: BufferedChecksumIndexInput(MMapIndexInput(path="/usr/local/var/data/elasticsearch/revinate_sms/nodes/0/indices/mention_2013_05/3/index/_2xnd_2.del")))]; ]]
Every one of these exceptions indicated that Lucene had read n bytes from a file that was n+1 bytes on disk. After some digging, this appears to have been caused by a bug in Lucene:
The first line of the issue description is revealing:
BitVector before Lucene 3.4 had many bugs, particularly that it wrote extra bogus trailing crap at the end.
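To get a sense of how many files are affected, the path and the two byte counts can be pulled out of log lines like the one above. This is only a rough sketch: it assumes the message format shown in our logs, and `list_one_byte_mismatches` is a hypothetical helper name, not something Elasticsearch or Lucene provides.

```shell
#!/bin/sh
# Hypothetical helper: read Elasticsearch log lines on stdin and print
# "<path> <bytes read> <file size>" for every CorruptIndexException whose
# file is exactly one byte longer than Lucene expected.
# Assumes the "read N vs size M (... path="...")" message format shown above.
list_one_byte_mismatches() {
    grep 'did not read all bytes from file' |
        sed -n 's/.*read \([0-9]*\) vs size \([0-9]*\) .*path="\([^"]*\)".*/\3 \1 \2/p' |
        awk '$3 == $2 + 1'
}
```

It could be fed a log file with something like `list_one_byte_mismatches < elasticsearch.log | sort -u`, where the log file name is a placeholder for wherever your logs live.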
Although we had a copy of the 0.90.x data and could roll back, on a whim we decided to see what would happen if we simply truncated the problem files by 1 byte:
# mv _2xnd_2.del /tmp/foo && dd if=/tmp/foo of=_2xnd_2.del bs=1214 count=1
Success! The shards initialized, and subsequent analysis showed they were intact, with no missing or corrupt data. Please don’t attempt this without a known-good backup!
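If more than one file needs the same treatment, the dd command above can be generalized so the byte count is computed rather than typed by hand. The following is a minimal sketch, not what we actually ran: `trim_last_byte` is a hypothetical helper, and the `.bak` suffix for the backup copy is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: drop the final byte of a file, keeping a backup copy
# next to it. Generalizes the one-off dd command; not part of Elasticsearch.
trim_last_byte() {
    file="$1"
    backup="$file.bak"                    # assumed backup location
    size=$(wc -c < "$file") || return 1   # current size in bytes
    [ "$size" -gt 0 ] || return 1         # nothing to trim from an empty file
    cp -p "$file" "$backup" || return 1   # keep the original before touching it
    # Rewrite the file with everything except its last byte.
    head -c "$((size - 1))" "$backup" > "$file"
}
```

Usage would look like `trim_last_byte _2xnd_2.del`, run from the shard's index directory with the node stopped. The backup copy is left in place so the original file can be restored if the shard still fails to start.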