Hadoop sequential data access

According to "Hadoop: The Definitive Guide":

HDFS is a filesystem designed for storing very large files with streaming or sequential data access patterns

What is streaming or sequential data access? How does it reduce disk seek time?


This is not really specific to Hadoop.

A sequential access pattern is when you read your data in order, often from start to finish. Think of reading a novel: you start with page 1, then move on to page 2, and so on.

The other common pattern is random access, where you jump from one place to another, possibly even backwards, while reading. Think of a dictionary: you don't read it like a novel. Instead, you open it somewhere in the middle and search for your word. When you're done with that word, the next one you look up may be hundreds of pages away from where the book is currently open. Finding the place where you should start reading is called a "seek".

With sequential access, you only need to seek once and then read until you're done with that data. With random access, you need to seek every time you want to switch to a different place in your file. This can be quite a performance hit on magnetic hard drives, where every seek means physically moving the read head and waiting for the platter to rotate into position.
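To make the difference concrete, here is a minimal Python sketch that reads the same file both ways and counts the seek calls each pattern issues. The helper names (`make_demo_file`, `read_sequential`, `read_random`) and the 4096-byte block size are made up for this illustration. Note that it counts logical `seek()` calls only: the operating system's cache and the disk layout decide how many physical head movements actually happen.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def make_demo_file(n_blocks=64, block_size=BLOCK_SIZE):
    """Create a temporary file of n_blocks * block_size random bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(n_blocks * block_size))
    return path


def read_sequential(path, block_size=BLOCK_SIZE):
    """Seek once to the start, then read to the end.

    Returns (bytes_read, seek_calls).
    """
    total = 0
    seeks = 0
    with open(path, "rb") as f:
        f.seek(0)  # the only seek: position at the start
        seeks += 1
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, seeks


def read_random(path, block_size=BLOCK_SIZE, seed=0):
    """Read every block exactly once, but in shuffled order.

    Returns (bytes_read, seek_calls).
    """
    size = os.path.getsize(path)
    n_blocks = (size + block_size - 1) // block_size  # round up
    order = list(range(n_blocks))
    random.Random(seed).shuffle(order)  # fixed seed keeps runs repeatable

    total = 0
    seeks = 0
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * block_size)  # one seek per block
            seeks += 1
            total += len(f.read(block_size))
    return total, seeks
```

For a 64-block file, `read_sequential` returns `(262144, 1)` and `read_random` returns `(262144, 64)`: the same number of bytes, but one seek versus one per block. On a small, freshly written file both will run fast because the data is still in the page cache. The gap shows up on files too large to cache on a spinning disk.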
