Driven User Guide version 2.1.4-eap-9
Visualizing Hive Applications
Driven delivers the end-to-end visibility required for managing and monitoring your Hive applications, including directed acyclic graph (DAG) representations of flows and steps. With your Hive queries running in the Cascading execution framework, your Hive applications can maximize the benefits of the Cascading platform: dynamic management of all Hive objects, visibility into the end-to-end flow of the application, instrumentation, orchestration of your Hive modules for error recovery, and integration with major third-party systems such as Elasticsearch and Teradata.
You can move your Hive Query Language (HQL) queries into production by using the HiveFlow API together with the runtime monitoring capabilities of Driven.
Driven 2.1 can monitor applications that run with any Apache Hive deployment, including environments without HiveFlow. To monitor Hive-based applications without HiveFlow, see the Driven Agent 2.1 Installation Guide.
HiveFlow is a Java wrapper that simplifies chaining multiple HQL statements into a single maintainable application. It transparently sends telemetry to Driven so that an HQL-based application can be managed and monitored in real time.
Driven provides the current status of running applications and a searchable history of past application executions.
With HiveFlow, even applications based on multiple technologies, such as Hive, custom MapReduce, Cascading, and Scalding, can be chained together within the same application (an Apache Hadoop job JAR). The consolidation simplifies testing, deployment, maintenance, and monitoring.
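As a rough illustration of what chaining dependent HQL statements involves, the following plain-Java sketch orders a set of statements so that each one runs only after the statements it depends on. It does not use the HiveFlow, Cascading, or Driven APIs. The class HqlChainSketch, the Step type, and the table names are all hypothetical and exist only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: this is NOT the HiveFlow or Cascading API.
final class HqlChainSketch {

    // One HQL statement plus the names of the steps that must finish first.
    static final class Step {
        final String name;
        final String hql;
        final List<String> dependsOn;

        Step(String name, String hql, List<String> dependsOn) {
            this.name = name;
            this.hql = hql;
            this.dependsOn = dependsOn;
        }
    }

    // Returns step names in an order that respects every dependency
    // (Kahn's topological sort over the step graph).
    public static List<String> executionOrder(List<Step> steps) {
        Map<String, Integer> unmet = new LinkedHashMap<>();          // step -> unmet dependency count
        Map<String, List<String>> dependents = new LinkedHashMap<>(); // step -> steps waiting on it
        for (Step s : steps) {
            unmet.put(s.name, s.dependsOn.size());
            dependents.put(s.name, new ArrayList<>());
        }
        for (Step s : steps) {
            for (String dep : s.dependsOn) {
                List<String> waiting = dependents.get(dep);
                if (waiting == null) {
                    throw new IllegalArgumentException("Unknown dependency: " + dep);
                }
                waiting.add(s.name);
            }
        }

        // Start with every step that has no dependencies.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.remove();
            order.add(name);
            // Finishing this step may unblock the steps that wait on it.
            for (String next : dependents.get(name)) {
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != steps.size()) {
            throw new IllegalStateException("Steps contain a dependency cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        // Steps are deliberately declared out of order.
        List<Step> steps = Arrays.asList(
            new Step("report", "INSERT OVERWRITE TABLE report SELECT id, COUNT(*) FROM clean GROUP BY id",
                     Arrays.asList("clean")),
            new Step("load_raw", "LOAD DATA INPATH '/tmp/raw' INTO TABLE raw",
                     Collections.<String>emptyList()),
            new Step("clean", "INSERT OVERWRITE TABLE clean SELECT id FROM raw WHERE id IS NOT NULL",
                     Arrays.asList("load_raw")));
        System.out.println(executionOrder(steps));
    }
}
```

The sketch only computes an execution order. Submitting each statement to Hive, sending telemetry, and recovering from failures are left out.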
Driven for Hive
Using Driven, you can perform critical tasks necessary for operationalizing and maintaining your Hive applications:
Visualize all queries being executed, as well as all dependencies, to comprehend your end-to-end Hive application
Run full-text searches over HQL to find all applications that run a given query
Organize and compare your applications using tags and other search parameters
Get real-time and historical operational insights to identify bottlenecks and tune your application
Track current and historical operational metrics for audits, governance, and lineage
Automatically orchestrate dependent Hive scripts with fault recovery to make your application robust
Correlate your application behavior with other simultaneous events in the cluster
Figure 1: Sample usage of the flow details page: drilling down to view an HQL statement. Click the icon to copy the statement to your clipboard.