Driven User Guide

version 2.1.4-eap-9

Visualizing Hive Applications

Driven delivers the end-to-end visibility required for managing and monitoring your Hive applications, including directed acyclic graph (DAG) representations of flows and steps. With your Hive queries running in the Cascading execution framework, your Hive applications can maximize the benefits of the Cascading platform: dynamic management of all Hive objects, visibility into the end-to-end flow of the application, instrumentation, orchestration of your Hive modules for error recovery, and integration with major third-party systems such as Elasticsearch and Teradata.

Using HiveFlow

You can move your Hive Query Language (HQL) queries into production by combining the HiveFlow API with the runtime monitoring capabilities of Driven.

Note: Driven 2.1 can monitor applications that run with any Apache Hive deployment, including environments without HiveFlow. To monitor Hive-based applications without HiveFlow, see the Driven Agent 2.1 Installation Guide.

HiveFlow is a lightweight Java wrapper that chains multiple HQL statements into a single maintainable application. It transparently sends telemetry to Driven so that an HQL-based application can be managed and monitored in real time.

Driven provides the current status of running applications and a searchable history of past application executions.

With HiveFlow, even applications based on multiple technologies, such as Hive, custom MapReduce, Cascading, and Scalding, can be chained together within the same application (an Apache Hadoop job JAR). The consolidation simplifies testing, deployment, maintenance, and monitoring.
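To make the idea of chaining concrete, the following is a minimal, self-contained sketch of how HQL statements might be ordered by their table dependencies before being run. It is not the HiveFlow API: the `HqlChainSketch` class, the `Step` type, the `executionOrder` method, and the sample table names and queries are all hypothetical and exist only for illustration. It assumes that each step writes exactly one table and that a step depends on whichever step writes a table it reads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HqlChainSketch {

    /** Hypothetical description of one HQL step: the tables it reads and the table it writes. */
    static final class Step {
        final String name;
        final String hql;
        final List<String> reads;
        final String writes;

        Step(String name, String hql, List<String> reads, String writes) {
            this.name = name;
            this.hql = hql;
            this.reads = reads;
            this.writes = writes;
        }
    }

    /** Orders steps so that each one runs after the steps producing the tables it reads (Kahn's algorithm). */
    static List<String> executionOrder(List<Step> steps) {
        Map<String, Step> producerOf = new HashMap<>();          // table -> step that writes it
        Map<String, Integer> pending = new HashMap<>();          // step -> number of unmet dependencies
        Map<String, List<String>> dependents = new HashMap<>();  // step -> steps waiting on it
        for (Step s : steps) {
            producerOf.put(s.writes, s);
            pending.put(s.name, 0);
            dependents.put(s.name, new ArrayList<>());
        }
        for (Step s : steps) {
            for (String table : s.reads) {
                Step producer = producerOf.get(table);
                if (producer != null && producer != s) {
                    dependents.get(producer.name).add(s.name);
                    pending.merge(s.name, 1, Integer::sum);
                }
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Step s : steps) {
            if (pending.get(s.name) == 0) {
                ready.add(s.name);
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            order.add(name);
            for (String next : dependents.get(name)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != steps.size()) {
            throw new IllegalStateException("Cyclic dependency between HQL steps");
        }
        return order;
    }

    public static void main(String[] args) {
        // Steps are deliberately listed out of order; the queries are illustrative only.
        List<Step> steps = Arrays.asList(
            new Step("report",
                "INSERT OVERWRITE TABLE report SELECT key, count(*) FROM cleaned GROUP BY key",
                Arrays.asList("cleaned"), "report"),
            new Step("load",
                "LOAD DATA INPATH '/tmp/raw' INTO TABLE raw",
                Collections.<String>emptyList(), "raw"),
            new Step("clean",
                "INSERT OVERWRITE TABLE cleaned SELECT * FROM raw WHERE key IS NOT NULL",
                Arrays.asList("raw"), "cleaned"));
        System.out.println(executionOrder(steps)); // prints [load, clean, report]
    }
}
```

The sketch only computes an order and runs nothing against Hive. In a real application, the execution framework would be responsible for running each statement and reporting its status.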

Driven for Hive

Using Driven, you can perform critical tasks necessary for operationalizing and maintaining your Hive applications:

Visualize all queries being executed, as well as all dependencies, to comprehend your end-to-end Hive application
Run full-text searches over HQL to find all applications that run a given query
Organize and compare your applications leveraging tags and other search parameters
Get real-time and historical operational insights to identify bottlenecks and tune your application
Track current and historical operational metrics for audits, governance, and lineage
Automatically orchestrate dependent Hive scripts with fault recovery to make your application robust
Correlate your application behavior with other simultaneous events in the cluster

Figure 1: Sample usage of the flow details page: drilling down to view an HQL statement. Click the icon to copy the statement to your clipboard.
