Motivation for ParcelsThe first question I asked myself when I read “Parcel” in the release notes was why the h*** does Cloudera need yet another new packaging format? RPM and DEB etc… aren’t enough? And why is the format called parcel and not YAPF (Yet Another Packaging Format) to be consistent with Hadoop renaming (YARN)? Their main motivations are probably others…
Hadoop, unfortunately, became famous over time for its incompatibility between versions and components. How many spent a full night trying to run a component, eventually discovering that the embedded library is not compatible with that exact version of Hadoop? Having one package per component could lead to incompatibility problems when, for instance, one component is forgotten to be upgraded at the same time as the others. And with the rise of Hadoop 2.0 and its compatibility breaking changes, things are becoming even worse.
At the same time, companies have started to use Hadoop for mission critical processing. In such environments, downtime for component upgrades is no longer an option, so a way to achieve rolling upgrades is necessary.
If you were asked to design a solution that meet all the strong requirements mentioned above, what would your solution be? A custom, monolithic packaging format like parcel could be part of the answer.