Should Apache Kafka and Apache Hadoop share the same ZooKeeper instance?
Is it possible to use same ZooKeeper instance for coordinating Apache Kafka and Apache Hadoop clusters? If yes, what would be the appropriate configuration of ZooKeeper?
Yes, as far as my understanding goes, ideally there should be a single zookeeper cluster with dedicated machines for managing the co-ordination between different application in a distributed system. i would try to share few points here The zookeeper cluster consisting of several servers are typically called ensemble and basically manages to track and share states of your application.e.g Kafka uses it to commit offset changes to it so that in case of failure it can identify from where to start again. from the doc page : Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a sets of hosts(ensemble). whenever a change is made, it is not considered successful until it has been written to a quorum (at least half) of the servers in the ensemble.
Now Imagine both Kafka & Hadoop are having a dedicated cluster of 3 zookeeper servers each, in case couple of nodes get down in any of the two clusters it will result a service failure (ZK works based on simple majority voting, so it will tolerate up to 1 node failure keeping the service alive but not 2 ) . Instead if there is One Single cluster of 5zk servers managing both the applications and in case two of the nodes are down you still have the service available.Not only this offer better reliability also it reduces the hardware expenses as instead of managing 6 servers you only have to take care of 5.