03 July 2013

One Engineer’s Experience with Parcel

I wrote an article about Parcel that was published on Cloudera's blog, but one part was left out of that entry: the motivation for yet another packaging format.

Motivation for Parcels

The first question I asked myself when I read “Parcel” in the release notes was: why the h*** does Cloudera need yet another packaging format? Aren’t RPM, DEB, etc. enough? And why is the format called Parcel and not YAPF (Yet Another Packaging Format), to be consistent with Hadoop naming (YARN)? The real motivations probably lie elsewhere…
Hadoop has, unfortunately, become famous over time for incompatibilities between versions and components. How many of us have spent a full night trying to run a component, only to discover that its embedded library is not compatible with that exact version of Hadoop? Having one package per component can lead to incompatibility problems when, for instance, someone forgets to upgrade one component at the same time as the others. And with the rise of Hadoop 2.0 and its compatibility-breaking changes, things are becoming even worse.
At the same time, companies have started to use Hadoop for mission-critical processing. In such environments, downtime for component upgrades is no longer an option, so a way to achieve rolling upgrades is necessary.

And last, but not least, for commercial entities such as Cloudera, existing packaging formats (RPM, DEB…) have become a real nightmare to maintain across all the long-term versions supported by their respective vendors. Package names and dependencies change over time and between major versions, resulting in a huge cost in testing and maintenance.
If you were asked to design a solution that meets all the strong requirements mentioned above, what would your solution be? A custom, monolithic packaging format like Parcel could be part of the answer.
