The Data processing service for OpenStack (sahara) aims to provide users with a simple means to provision data processing (Hadoop, Spark) clusters by specifying a few parameters, such as the Hadoop version, cluster topology, and node hardware details. After the user fills in all the parameters, the Data processing service deploys the cluster in a few minutes. Sahara also provides a means to scale already provisioned clusters by adding or removing worker nodes on demand.
The solution addresses the following use cases:
Key features are:
Designed as an OpenStack component.
Managed through a REST API, with a UI available as part of the OpenStack dashboard.
Support for different Hadoop distributions:
Predefined templates of Hadoop configurations, with the ability to modify parameters.
User-friendly UI for ad-hoc analytics queries based on Hive or Pig.
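To make the "specify a few parameters" workflow more concrete, here is a minimal sketch of the kind of request bodies a client might assemble for provisioning and scaling a cluster through the REST API. The helper functions are hypothetical, the field names are modeled on the sahara cluster API but should be treated as assumptions to verify against the API reference for your release, and all IDs and version strings are placeholders. The sketch only builds JSON and does not contact any service.

```python
import json


def build_cluster_request(name, plugin_name, hadoop_version,
                          cluster_template_id, image_id, keypair=None):
    """Assemble a body for a cluster-creation request.

    Hypothetical helper: the field names are assumptions modeled on
    the sahara REST API and may differ between API versions.
    """
    body = {
        "name": name,
        "plugin_name": plugin_name,                  # e.g. a Hadoop or Spark plugin
        "hadoop_version": hadoop_version,            # version offered by that plugin
        "cluster_template_id": cluster_template_id,  # topology and node details
        "default_image_id": image_id,
    }
    if keypair is not None:
        # Optional keypair for SSH access to the cluster nodes.
        body["user_keypair_id"] = keypair
    return body


def build_scale_request(node_group_name, new_count):
    """Assemble a body that resizes one node group of an existing cluster.

    Hypothetical helper; "resize_node_groups" is an assumed field name.
    """
    if new_count < 0:
        raise ValueError("node count must be non-negative")
    return {
        "resize_node_groups": [
            {"name": node_group_name, "count": new_count},
        ]
    }


if __name__ == "__main__":
    # Placeholder values only; real IDs come from your own deployment.
    create_body = build_cluster_request(
        name="demo-cluster",
        plugin_name="vanilla",
        hadoop_version="2.7.1",
        cluster_template_id="TEMPLATE_ID_PLACEHOLDER",
        image_id="IMAGE_ID_PLACEHOLDER",
    )
    print(json.dumps(create_body, indent=2))
    print(json.dumps(build_scale_request("worker", 5), indent=2))
```

Keeping the topology in a cluster template and passing only its ID mirrors the predefined-templates feature listed above: the request stays short, and scaling touches only the node group count.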