Hadoop data persistence: in which format is data stored?
I have some experience with Lucene, and I'm trying to understand how data is actually stored on the slave servers in the Hadoop framework.
Do we create an index on each slave server, with a set of attributes describing the documents we are storing? How does it work in reality?
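Hadoop's filesystem (HDFS) does not build a Lucene-style index; it stores files as plain bytes. Each file is split into blocks of a fixed, configurable size, and each block is replicated to other nodes in the cluster for reliability. This process is coordinated by a single "Name Node", which keeps track of which blocks have gone where.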
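The splitting and bookkeeping described above can be illustrated with a small sketch. This is a toy simulation, not Hadoop code: the function names, node names, and round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware, and typical defaults are 128 MB blocks with 3 replicas).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut raw bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list, replication: int) -> dict:
    """Toy stand-in for the Name Node's block map: block id -> node names.

    Uses simple round-robin placement (an assumption for illustration),
    so every replica of a block lands on a different node.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    data = b"x" * 300                      # pretend file contents
    blocks = split_into_blocks(data, 128)  # tiny block size for the demo
    block_map = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], 3)
    for block_id, holders in block_map.items():
        print(block_id, len(blocks[block_id]), holders)
```

Note that nothing here inspects the content of the file: the blocks are opaque byte ranges, and the only "index" is the map from block to nodes.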
Hadoop provides you with a virtual filesystem, similar to Unix, which you can query using various Hadoop filesystem tools (ls, get, put, etc.).
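As a rough sketch of how those tools might be driven from a script, the wrapper below builds and runs `hadoop fs` commands. The helper names and the example paths are hypothetical; it assumes the `hadoop` executable is on your PATH and a cluster is configured.

```python
import subprocess

# Subset of `hadoop fs` actions mentioned above.
ALLOWED_ACTIONS = {"ls", "put", "get"}


def hadoop_fs_command(action: str, *paths: str) -> list:
    """Build the argument list for e.g. `hadoop fs -ls /some/path`."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["hadoop", "fs", f"-{action}", *paths]


def run_hadoop_fs(action: str, *paths: str) -> str:
    """Run the command and return its standard output (raises on failure)."""
    result = subprocess.run(
        hadoop_fs_command(action, *paths),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the call shape.
    print(hadoop_fs_command("put", "local.txt", "/user/demo/local.txt"))
    print(hadoop_fs_command("ls", "/user/demo"))
```

From the client's point of view you only ever see files and directories; the block layout stays hidden behind these commands.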
This link should give you a comprehensive overview.