What I could do was make recovery possible after a namenode failure occurs. Below is the procedure I'm using; the goal is to recover the namenode as quickly as possible while minimizing data loss. Hope this is useful for others as well.
Procedures:
1. Make sure Hadoop DFS is stopped.
2. Back up the directories listed in hdfs-site.xml under the property 'dfs.name.dir' (from the namenode server) and 'fs.checkpoint.dir' (from the secondary namenode server) to a safe location (see the command sketch after this list). Optionally, diff the fsimage and edits files across the different copies to get a sense of how many unique copies of the metadata you can recover from, and possibly start with the one whose timestamp is closest to the current time.
3. Remove all remote locations listed in 'dfs.name.dir' and leave only the local location (if there is more than one local location, keep just one in the list) on the machine that takes over as the new namenode (see the hdfs-site.xml sketch after this list).
4. Restart Hadoop DFS.
5. Check the status page at namenode:50070. If the reported block ratio cannot reach 100%, stop HDFS and replace the metadata with a copy from another location (possibly a remote location or the secondary namenode location if there are no other local locations). Always leave only one location in 'dfs.name.dir'. If there is no more location to try, go to step 7; otherwise, you are done.
6. Start HDFS again and repeat step 5 until either the block ratio reaches 100% or there are no more locations to try.
7. From all the locations, select the metadata copy with the highest reported block ratio, copy it into a local location (if it is not already the local one), and empty the contents of the remote locations.
8. List the location with the highest reported block ratio first in the 'dfs.name.dir' property and append the emptied remote locations after it.
9. Start HDFS.
10. Use 'bin/hadoop fsck /' to check the health of the cluster.
11. Pay attention to the corrupted/missing blocks.
12. Leave safemode by executing 'bin/hadoop dfsadmin -safemode leave', then move all the corrupted and missing blocks to the /lost+found folder by executing 'bin/hadoop fsck / -move'.
13. Check the health of the cluster again using 'bin/hadoop fsck /'; the cluster should now be healthy.
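For reference, here is a minimal command sketch of the backup in step 2 and the checks in steps 10-13. The paths (/data/hadoop/name, /data/hadoop/namesecondary, /backup) are placeholders; substitute the directories actually configured in your hdfs-site.xml.

    # Steps 1-2: with HDFS stopped, back up the namenode and secondary namenode metadata.
    bin/stop-dfs.sh
    cp -a /data/hadoop/name          /backup/name-$(date +%Y%m%d%H%M)        # dfs.name.dir (namenode host)
    cp -a /data/hadoop/namesecondary /backup/checkpoint-$(date +%Y%m%d%H%M)  # fs.checkpoint.dir (secondary namenode host)
    # Compare the copies to see which metadata set is the most recent.
    ls -l /backup/*/current/
    md5sum /backup/*/current/fsimage /backup/*/current/edits

    # Steps 10-13: once HDFS is running again, check and repair the namespace.
    bin/hadoop fsck /                      # report corrupted/missing blocks
    bin/hadoop dfsadmin -safemode leave    # leave safemode so fsck -move can modify the namespace
    bin/hadoop fsck / -move                # move files with corrupted/missing blocks to /lost+found
    bin/hadoop fsck /                      # verify the cluster is now healthy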
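The configuration edits in steps 3 and 8 only touch the 'dfs.name.dir' property in hdfs-site.xml on the new namenode. A sketch, assuming a local path of /data/hadoop/name and an NFS-mounted remote copy at /mnt/nfs/hadoop/name (both placeholders):

    <!-- Step 3: bring the new namenode up from a single local copy of the metadata -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/hadoop/name</value>
    </property>

    <!-- Step 8: after picking the best copy, list it first and append the emptied
         remote location so the namenode writes to all of them again -->
    <property>
      <name>dfs.name.dir</name>
      <value>/data/hadoop/name,/mnt/nfs/hadoop/name</value>
    </property>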
Configurations:
We also noticed that some configurations are very important to Hadoop and HBase (0.90.4) reliability; they are:
Name: dfs.http.address
Description: put this property in hdfs-site.xml on the host of the secondary namenode. The value is the HTTP address (host:port, 50070 by default) of the primary namenode, which the secondary namenode needs in order to fetch the fsimage and edits files when it builds a checkpoint.
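A sketch of the entry in the secondary namenode's hdfs-site.xml; the hostname is a placeholder for your primary namenode:

    <property>
      <name>dfs.http.address</name>
      <value>namenode.example.com:50070</value>
    </property>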
Name: dfs.support.append
Value: true
Description: put this property in hdfs-site.xml and hbase-site.xml across the entire cluster. It enables durable sync, which minimizes data loss when the namenode fails.
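The entry is identical in both files; a sketch:

    <!-- add to both hdfs-site.xml and hbase-site.xml on every node -->
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>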